Re: Streaming Uploads Discussion

2008-04-05 Thread Ivan Sagalaev

Ivan Sagalaev wrote:
> Nobody stops a developer from doing both things in parallel: storing on 
> disk and streaming to S3.

> they will gain heavily from not doing writes and 
> reads of the whole file on local disk.

Looks like I'm contradicting myself a bit here :-). But that's not 
actually the point :-)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: Streaming Uploads Discussion

2008-04-05 Thread Ivan Sagalaev

Malcolm Tredinnick wrote:
> Then those people deserve to be beaten heavily about the head and
> shoulders. S3 is NOT a reliable upload endpoint. They (Amazon) say
> there'll be approximately a 1% failure rate for attempted uploads. As 37
> signals have noted in the past (they use it for their whiteboard file
> storage), that doesn't mean that when you try again it will work,
> either. It means, in practice, that periodically for 1% of the time, all
> your uploads will fail.

Nobody stops a developer from doing both things in parallel: storing on 
disk and streaming to S3. Then, when S3 fails, the stored file can be 
scheduled for a repeat upload.

The reason for not doing it only the over-the-disk way is speed. Since 
99% of uploads do succeed, they gain heavily from not writing and then 
re-reading the whole file on local disk.

Anyway, S3 is just an example; it could be some local media server 
instead. I think the main difference between an upload handler and a 
file backend is that the latter is generally simpler to write and the 
former is generally more flexible.
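A rough sketch of that tee-style arrangement (all names here are illustrative, not the patch's actual API):

```python
import io

class TeeUploader:
    """Sketch: write every chunk to local disk *and* stream it to a
    remote store in one pass; if the remote fails, the local copy
    survives and the file is flagged for a later re-upload."""

    def __init__(self, local_file, remote):
        self.local_file = local_file      # e.g. an open temp file
        self.remote = remote              # anything with .send(chunk)
        self.needs_retry = False

    def receive_chunk(self, chunk):
        self.local_file.write(chunk)      # the reliable copy, always
        if not self.needs_retry:
            try:
                self.remote.send(chunk)   # the fast path (S3 etc.)
            except IOError:
                self.needs_retry = True   # re-upload from disk later

class FlakyRemote:
    """Stand-in for S3: accepts a few chunks, then starts failing."""
    def __init__(self, fail_after=1):
        self.sent = []
        self.fail_after = fail_after
    def send(self, chunk):
        if len(self.sent) >= self.fail_after:
            raise IOError("upload endpoint failure")
        self.sent.append(chunk)

local = io.BytesIO()
uploader = TeeUploader(local, FlakyRemote(fail_after=1))
for chunk in (b"aaaa", b"bbbb", b"cccc"):
    uploader.receive_chunk(chunk)

print(local.getvalue())      # full file kept locally: b'aaaabbbbcccc'
print(uploader.needs_retry)  # True -- schedule the repeat upload
```

The point is that the 99% case never touches the disk twice, while the 1% case still has a complete local copy to retry from.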




Re: Streaming Uploads Discussion

2008-04-05 Thread Ivan Sagalaev

Mike Axiak wrote:
> I didn't want to use the extra setting either, but I finally caved in
> after working with it (and discussing with Ivan). I will certainly
> explore any other possibilities.

My original reason for settings was that a single upload handler can't 
possibly know the semantics of other upload handlers and can't decide 
where to put itself in the list. It should be the decision of the 
developer who assembles a project from third-party apps with upload 
handlers how to arrange them. It's very similar to middleware: the order 
is decided in settings. The only exception is when a handler really 
depends on some other handler, which it can require by raising an 
exception somewhere.
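That exception escape hatch could look something like this (a sketch; the class and function names are made up for illustration, not part of the patch):

```python
class HandlerDependencyError(Exception):
    pass

def assert_runs_before(handlers, required, dependent):
    """Raise unless the handler named `required` is configured
    somewhere before the handler named `dependent`."""
    names = [type(h).__name__ for h in handlers]
    if dependent in names:
        if required not in names or names.index(required) > names.index(dependent):
            raise HandlerDependencyError(
                "%s must be listed before %s" % (required, dependent))

# dummy handlers standing in for real ones
class InMemoryUploadHandler: pass
class UploadProgressUploadHandler: pass

chain = [InMemoryUploadHandler(), UploadProgressUploadHandler()]
# the required handler comes first, so nothing is raised
assert_runs_before(chain, "InMemoryUploadHandler",
                   "UploadProgressUploadHandler")
```

The developer still decides the order in settings; the dependent handler just fails loudly if that order is wrong.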




Re: Streaming Uploads Discussion

2008-04-04 Thread Malcolm Tredinnick


On Fri, 2008-04-04 at 13:23 -0700, Mike Axiak wrote:
[...]
> Let's look at the S3 uploading process. There are two ways we can
> handle the upload and send to S3:
> 
>1. Stream the data directly to S3.
>2. Stream the data to disk, then send to S3.
> 
> I think a lot of people might opt to do option (1). 

Then those people deserve to be beaten heavily about the head and
shoulders. S3 is NOT a reliable upload endpoint. They (Amazon) say
there'll be approximately a 1% failure rate for attempted uploads. As
37signals have noted in the past (they use it for their whiteboard file
storage), that doesn't mean that when you try again it will work,
either. It means, in practice, that periodically for 1% of the time, all
your uploads will fail. Then, a little later, they'll start to work. But
there could be minutes on end where you cannot upload. So unless your
web application is designed to intentionally lose data (in which case I
can think of a big optimisation on the saving end to lose it even faster
and more reliably), you must save to disk before trying to upload to S3.

In short, I can't see any reason the file upload handler should care
about storage systems like this.

Regards,
Malcolm

-- 
Experience is something you don't get until just after you need it. 
http://www.pointy-stick.com/blog/





Re: Streaming Uploads Discussion

2008-04-04 Thread Mike Axiak

On Apr 4, 3:29 pm, "Marty Alchin" <[EMAIL PROTECTED]> wrote:
> I admit I haven't been following this terribly closely, but now that
> both #5361 and #2070 are nearing completion, I'm trying to get a good
> handle on all of this in case there are any interactions between the
> two that I can help with.
Yeah, it's an interesting interaction. We can talk about this more on
IRC, but let me try to outline a few ideas here.

I see both #2070 (one half of it) and #5361 doing the same thing:
representing files. However, they represent files at different stages
of life. One is tied to the database (#5361), and one can potentially
be half-written (#2070). I foresee a time when we actually merge these
into one, but I think we should do this incrementally.

Let's look at the S3 uploading process. There are two ways we can
handle the upload and send to S3:

   1. Stream the data directly to S3.
   2. Stream the data to disk, then send to S3.

I think a lot of people might opt for option (1). In this case, the
upload handler will need to stream the data to S3 directly, but it
needs to give request.FILES *something* that the user/forms/db layer
can interact with. In this case, I gave the example of an
S3UploadedFile. It has a size, a way to read data from it, and a
filename. In light of #5361, if your backend is S3FileBackend, I would
imagine it would notice the S3UploadedFile and say "Oh, saving this is
trivial. Let me just save attribute ". But that doesn't mean you can't
save a normal file in the S3FileBackend, does it?
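For concreteness, here's one possible shape of such an object (a sketch only: attribute names are guesses, and a real implementation would read from the S3 key rather than from memory):

```python
class S3UploadedFileSketch:
    """Illustrative stand-in for the S3UploadedFile described above:
    the minimal surface request.FILES needs is a filename, a size,
    and a way to read the data back."""

    def __init__(self, file_name, data):
        self.file_name = file_name
        self._data = data             # would be the S3 key in reality

    @property
    def file_size(self):
        return len(self._data)

    def read(self):
        return self._data

    def chunks(self, chunk_size=64 * 1024):
        # yield the content piecewise, as a local file would
        for start in range(0, len(self._data), chunk_size):
            yield self._data[start:start + chunk_size]

f = S3UploadedFileSketch("photo.jpg", b"x" * 100)
print(f.file_size)                         # 100
print(b"".join(f.chunks(40)) == f.read())  # True
```

A backend that recognised this type could skip the re-upload entirely, while any other backend would just fall back to .chunks()/.read().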

Again, this is a little tricky, and to get it right you and I will have
to work out some more of the details. But I do see them as useful,
somewhat separate entities.

Cheers,
Mike




Re: Streaming Uploads Discussion

2008-04-04 Thread Marty Alchin

On Fri, Apr 4, 2008 at 3:43 PM, Jacob Kaplan-Moss
<[EMAIL PROTECTED]> wrote:
>  Yeah, one thing we'll need to figure out PDQ is what's appropriate for
>  an upload handler, and what's appropriate for a storage backend.
>  Hopefully the two of you can work out the breakdown.

I'll read over the patch and the docs and see if I can get a better
handle on how it works, so I can be of more use there. Also, Mike and
I put our heads together in IRC sometimes, so we should be able to get
it sorted out soon.

>  FYI, I plan to merge 2070 first (sorry, Marty!) since I think it works
>  a bit better that way from a dependency POV.

No worries. I still have another of David's suggestions to integrate
before I can have it "finished" anyway. Plus, there's a 3291-ticket
age difference. The youngest always gets the shaft. :) I can live with
that.

-Gul




Re: Streaming Uploads Discussion

2008-04-04 Thread Jacob Kaplan-Moss

On Fri, Apr 4, 2008 at 2:29 PM, Marty Alchin <[EMAIL PROTECTED]> wrote:
>  Maybe I'm missing something obvious, but why would there ever be an
>  S3UploadHandler? Shouldn't that be handled by a file storage backend?

Yeah, one thing we'll need to figure out PDQ is what's appropriate for
an upload handler, and what's appropriate for a storage backend.
Hopefully the two of you can work out the breakdown.

FYI, I plan to merge 2070 first (sorry, Marty!) since I think it works
a bit better that way from a dependency POV.

Jacob




Re: Streaming Uploads Discussion

2008-04-04 Thread Marty Alchin

On Fri, Apr 4, 2008 at 2:04 PM, Mike Axiak <[EMAIL PROTECTED]> wrote:
>  However, the instant they pass the file to some remote location
>  (think: S3UploadHandler) or alter the content (think:
>  GZipUploadHandler) they will need their own way of defining what is
>  content and how it should be "saved".

Maybe I'm missing something obvious, but why would there ever be an
S3UploadHandler? Shouldn't that be handled by a file storage backend?

As for GZipUploadHandler, if you're talking about gzipping it as part
of the save process, shouldn't that also be a file storage backend? Of
course, if you're talking about *un*gzipping it during upload, so it
can be processed by Python, I withdraw the question.

I admit I haven't been following this terribly closely, but now that
both #5361 and #2070 are nearing completion, I'm trying to get a good
handle on all of this in case there are any interactions between the
two that I can help with.

-Gul




Re: Streaming Uploads Discussion

2008-04-04 Thread Mike Axiak

On Apr 4, 2:46 pm, "Jacob Kaplan-Moss" <[EMAIL PROTECTED]>
wrote:
> Hrm, good point. I'll chew a bit more, but I can't think of a good way
> to avoid the extra setting (as much as I dislike creeping settings).

I didn't want to use the extra setting either, but I finally caved in
after working with it (and discussing with Ivan). I will certainly
explore any other possibilities.




Re: Streaming Uploads Discussion

2008-04-04 Thread Jacob Kaplan-Moss

On Fri, Apr 4, 2008 at 1:04 PM, Mike Axiak <[EMAIL PROTECTED]> wrote:
>  Composition gets a little tricky. I realized this when I wrote the
>  stuff that handles the composition.
>  Right now we have (InMemoryUploadHandler, TemporaryFileUploadHandler)
>  in a sequence, and now the memory handler doesn't have to worry about
>  what happens nor does the TemporaryFileUploadHandler need to worry
>  about what's before.
>  Basically every upload handler I can think of writing benefits from
>  this. For example, the UploadProgressUploadHandler would be wedged
>  between:
>
> (InMemoryUploadHandler, UploadProgressUploadHandler,
>  TemporaryFileUploadHandler)
>
>  So that file progress only gets computed for uploads that go into the
>  file system. Avoiding the overhead for small uploads.

OK, I think I understand; thanks. It's hard sometimes figuring out a
thought process from its final result. I think you're probably right
to make this a list, then.

>  This [FILE_UPLOAD_HANDLERS] took a while for me to get sold on. I can
>  only think of 5 or 6 useful upload handlers, but that's still 1956
>  possible orderings. It'd be tough to make a handler easy to install
>  when it has to be aware of where it belongs. This functionality could
>  be replicated in a middleware which reads the settings, but I'm not
>  sure that's the best place for something like that.

Hrm, good point. I'll chew a bit more, but I can't think of a good way
to avoid the extra setting (as much as I dislike creeping settings).

Jacob




Re: Streaming Uploads Discussion

2008-04-04 Thread Mike Axiak

Thanks for the review!

On Apr 4, 12:28 pm, "Jacob Kaplan-Moss" <[EMAIL PROTECTED]>
wrote:
> >    TextFileForm(data={'description': u'Assistance'}, files={'file':
> >  {'filename': 'test1.txt', 'content': 'hello world'}})
>
> What would an equivalent line look like under the new system? That is, what do
> folks need to change their tests to?
The patch does this for all the tests in tests/, but here's an
example:

TextFileForm(data={'description': u'Assistance'}, files={'file':
SimpleUploadedFile('test1.txt', 'hello world')})

Note the SimpleUploadedFile, which takes the filename and the content.

> Yes, this is correct approach. Something along these lines:
>
>    if isinstance(uploadedfile, dict):
>         warn(...)
>         uploadedfile = uploadedfile_from_dict(uploadedfile)

I like this way better than special-casing it. I'll update this in the
patch.

> I'm thinking YAGNI here. Why would I need multiple upload handlers? I think
> you need to talk me through your thinking here, because at first glance this
> smacks of overengineering. Remember that multiple handlers can always be
> accomplished by composition anyway, so unless there's a good reason I think
> set_upload_handler() is just much cleaner.

Composition gets a little tricky; I realized this when I wrote the
code that handles the composition. Right now we have
(InMemoryUploadHandler, TemporaryFileUploadHandler) in a sequence, and
the memory handler doesn't have to worry about what comes after it,
nor does the TemporaryFileUploadHandler need to worry about what comes
before.

Basically every upload handler I can think of writing benefits from
this. For example, the UploadProgressUploadHandler would be wedged
between the two:

(InMemoryUploadHandler, UploadProgressUploadHandler,
TemporaryFileUploadHandler)

so that file progress only gets computed for uploads that go into the
file system, avoiding the overhead for small uploads.

> Similarly, I'm a bit suspicious of FILE_UPLOAD_HANDLERS. Couldn't you just
> write a simple middleware to do the same thing? As a general principle, you
> should try as hard as you can to avoid introducing new settings; if there's
> another way just do it that way. In this case, I'd just document that if you
> want to use a custom upload handler globally that you should write a
> middleware class.

This took a while for me to get sold on. I can only think of 5 or 6
useful upload handlers, but that's still 1956 possible orderings. It'd
be tough to make a handler easy to install when it has to be aware of
where it belongs. This functionality could be replicated in a
middleware which reads the settings, but I'm not sure that's the best
place for something like that.

> I quite like the API. I'm not sure why you'd need to return a custom
> UploadedFile from an upload handler, but props for documenting the interface
> anyway :)

If the upload handlers don't alter the content or represent any
storage change then, sure, they wouldn't need an additional
UploadedFile class.
However, the instant they pass the file to some remote location
(think: S3UploadHandler) or alter the content (think:
GZipUploadHandler) they will need their own way of defining what is
content and how it should be "saved".

Thanks for the thorough review.

Cheers,
Mike




Re: Streaming Uploads Discussion

2008-04-04 Thread Jacob Kaplan-Moss

On Mon, Mar 24, 2008 at 1:02 PM, Mike Axiak <[EMAIL PROTECTED]> wrote:
>  Now that we actually have a working patch [1], there are a few details
>  I'd like to raise here.

Woo! Thanks for your hard work. My thoughts on your questions follow inline:

>  Supporting dictionaries in form code
>  
>  [...]
>TextFileForm(data={'description': u'Assistance'}, files={'file':
>  {'filename': 'test1.txt', 'content': 'hello world'}})

What would an equivalent line look like under the new system? That is, what do
folks need to change their tests to?

>  I see three options to deal with this:
>  [...]
>   2. Modify the form code to access the attributes of the
>  UploadedFile, but on AttributeError use the old dict-style interface
>  and emit a DeprecationWarning.

Yes, this is correct approach. Something along these lines:

    if isinstance(uploadedfile, dict):
        warn(...)
        uploadedfile = uploadedfile_from_dict(uploadedfile)

Option #3 is unacceptable: if at all possible we want people's tests not
to break. Warnings are fine; avoidable breakage is a bad idea.
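A runnable sketch of that shim (uploadedfile_from_dict is the name suggested above; the UploadedFile shape and warning text are illustrative):

```python
import warnings

class UploadedFile:
    """Minimal object exposing the new attribute interface (sketch)."""
    def __init__(self, filename, content):
        self.filename = filename
        self.content = content
    def read(self):
        return self.content

def uploadedfile_from_dict(d):
    # accept the old {'filename': ..., 'content': ...} dict, warn,
    # and hand back an object with the new interface
    warnings.warn("dict-style uploaded files are deprecated; "
                  "pass an UploadedFile instead", DeprecationWarning)
    return UploadedFile(d['filename'], d['content'])

uploadedfile = {'filename': 'test1.txt', 'content': 'hello world'}
if isinstance(uploadedfile, dict):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")   # silence for the demo
        uploadedfile = uploadedfile_from_dict(uploadedfile)

print(uploadedfile.filename)   # test1.txt
print(uploadedfile.read())     # hello world
```

Old tests keep passing with a warning; new code never sees a bare dict.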

>  The other issue is what we should do with the tests. Should we
>  leave them? Should we copy them and create a copy of them for the new
>  style? Should we replace them with the new style?

The latter -- fix all the tests to use the new syntax as a demo of how it's
supposed to be done.

>  Having upload_handlers be a list object
>  ---
>  [...]
>  Therefore, I decided to just leave it as a plain old list. Thus, to
>  add an upload handler, you'd just write::
>
>request.upload_handlers.append(some_upload_handler)
>
>  And to replace them all::
>
>request.upload_handlers = [some_upload_handler]

If we do indeed need this -- see below -- then this is the right way to do it.

>  What do people think about this?

I'm thinking YAGNI here. Why would I need multiple upload handlers? I think you
need to talk me through your thinking here, because at first glance this smacks
of overengineering. Remember that multiple handlers can always be accomplished by
composition anyway, so unless there's a good reason I think set_upload_handler()
is just much cleaner.

Similarly, I'm a bit suspicious of FILE_UPLOAD_HANDLERS. Couldn't you just write
a simple middleware to do the same thing? As a general principle, you should try
as hard as you can to avoid introducing new settings; if there's another way
just do it that way. In this case, I'd just document that if you want to use a
custom upload handler globally that you should write a middleware class.

>  (Mis)Handling Content-Length
>  
>  [...]
>  There's probably not much room for argument here, but it's worth
>  asking a larger group. Currently when you try uploading Large files
>  (~2GB and greater), you will get a weird Content-Length header (less
>  than zero, overflowing).

Personally, I can't see any reason to care too much about people trying
to upload 2GB files through the web, anyway. Anyone allowing uploads should
have a sane upload size limit set at the web server level. Anyone who's allowing
uploads of over 2GB is just asking to get DOSed.

I think this is a place where we just add a note to the docs about setting the
Apache/lighttpd/etc. upload limit and move on.

>  Revised API
>  ===
>  [...]
>  Let me know if you have any comments on the API. (Things like how it
>  deals with multiple handlers could be up for discussion.)

I quite like the API. I'm not sure why you'd need to return a custom
UploadedFile from an upload handler, but props for documenting the interface
anyway :)

Thanks again for the hard work -- this looks very good!

Jacob




Re: Streaming Uploads Discussion

2008-03-26 Thread Malcolm Tredinnick


On Tue, 2008-03-25 at 22:59 -0700, Mike Axiak wrote:
> On Mar 25, 3:38 pm, Mike Axiak <[EMAIL PROTECTED]> wrote:
> > Yes. :-)
> Doh. I didn't realize you actually wanted me to do more work.

I didn't either. I thought I may have just been missing something
obvious. But now that you've written the extra bits at the top of the
document, it makes more sense to me as a user. Thanks.

Okay, so this all has to go on the review pile now I guess. That should
be fun, for some value of "fun". Nice work, Mike.

Regards,
Malcolm

-- 
No one is listening until you make a mistake. 
http://www.pointy-stick.com/blog/





Re: Streaming Uploads Discussion

2008-03-25 Thread Mike Axiak

On Mar 25, 3:38 pm, Mike Axiak <[EMAIL PROTECTED]> wrote:
> Yes. :-)
Doh. I didn't realize you actually wanted me to do more work.

The newly updated doc can be found at the same URL:
http://orodruin.axiak.net/upload_handling.html

Be sure to refresh as the browser caching is high on that page.

-Mike



Re: Streaming Uploads Discussion

2008-03-25 Thread Malcolm Tredinnick


On Mon, 2008-03-24 at 11:02 -0700, Mike Axiak wrote:
> Now that we actually have a working patch [1], there are a few details
> I'd like to raise here.

I haven't had time to sit down and devote to reading through the whole
patch, but I have a possibly very easy question that I can't answer from
the docs at the moment.

I'm a simple guy. I like simple stuff. So if I'm deploying a Django
system and it will handle file uploads from some forms and all I want to
do is ensure that large file uploads aren't held in memory, how do I do
that? In other words, I want to avoid the problem that originally
prompted this ticket and the related one.

Does this Just Work(tm) out of the box?

Malcolm

-- 
Remember that you are unique. Just like everyone else. 
http://www.pointy-stick.com/blog/





Re: Streaming Uploads Discussion

2008-03-25 Thread Mike Axiak

Hey,

On Mar 25, 6:04 am, Ivan Sagalaev <[EMAIL PROTECTED]> wrote:
> I don't like #1 because there's no point to keep deprecated code in
> Django where we can fix it. And #3 is indeed broken because there is
> much code in the wild that uses request.FILES directly. It's a public
> API after all.
I thought this might be confusing. Currently, when you want to upload
something to a form, you do something along the lines of::

f = SomeForm(request.POST, request.FILES)

This actually invokes two separate interfaces: the objects inside
request.FILES, and the way the form deals with it. Currently, the form
assumes request.FILES contains dictionaries, and so (like the
documentation) you can feed it dictionaries of data and it will work.
How the form treats this data is the specific interface I'm talking
about, not what request.FILES contains.

Anyway, I'll modify the patch to do #2 anyway. Thanks for the input.

-Mike



Re: Streaming Uploads Discussion

2008-03-25 Thread Ivan Sagalaev

Mike Axiak wrote:
>  1. Leave forms code alone, causing (a) DeprecationWarnings and (b)
> the files to be read into memory when saving.
>  2. Modify the form code to access the attributes of the
> UploadedFile, but on AttributeError use the old dict-style interface
> and emit a DeprecationWarning.
>  3. Modify form code to use the new UploadedFile object, but just
> break on the old dictionary access as it's largely an internal API
> anyway.

I don't like #1 because there's no point in keeping deprecated code in 
Django when we can fix it. And #3 is indeed broken because there is 
much code in the wild that uses request.FILES directly. It's a public 
API after all.

#2 looks reasonable to me.

> I realized that I just want people to have a list interface.
> Therefore, I decided to just leave it as a plain old list. Thus, to
> add an upload handler, you'd just write::
> 
>request.upload_handlers.append(some_upload_handler)
> 
> And to replace them all::
> 
>request.upload_handlers = [some_upload_handler]
> 
> I've made a few efforts to ensure that it will raise an error if the
> upload has already been handled. I know this isn't as simple as the
> .set_upload_handler() interface, but I think it's the simplest way we
> can support the list modification/replacement in a useful fashion.
> What do people think about this?

It would be good to invent a declarative setting for this, but since it's 
a per-view thing that would be hard. So maybe a list would indeed be enough.

> Currently when you try uploading Large files
> (~2GB and greater), you will get a weird Content-Length header (less
> than zero, overflowing). 
> ...
> Should
> I/we just ignore this and expect people to be sane and upload
> reasonable data?

If the Content-Length header is screwed up then neither we nor users can 
do anything about it. When such file sizes become more widespread, I 
believe browsers will be fixed to send a correct Content-Length.





Re: Streaming Uploads Discussion

2008-03-24 Thread Mike Axiak

Now that we actually have a working patch [1], there are a few details
I'd like to raise here.

Major Issues
============

Supporting dictionaries in form code
------------------------------------
While we can have file objects that support the dictionary lookup to
make those backwards compatible, there's still the API of the forms
code.
As an example, there are a lot of tests that look like::

   TextFileForm(data={'description': u'Assistance'}, files={'file':
{'filename': 'test1.txt', 'content': 'hello world'}})

This is an example of something using a dictionary instead of an
UploadedFile instance when using the forms. However, the forms should
*probably* be using the new file-object API (.chunks()/.read() rather
than ['content']), which means they would break if fed a dictionary
like the one above. I see three options to deal with this:

 1. Leave forms code alone, causing (a) DeprecationWarnings and (b)
the files to be read into memory when saving.
 2. Modify the form code to access the attributes of the
UploadedFile, but on AttributeError use the old dict-style interface
and emit a DeprecationWarning.
 3. Modify form code to use the new UploadedFile object, but just
break on the old dictionary access as it's largely an internal API
anyway.

Without thinking about it, I wrote the patch for (3) and modified the
tests appropriately. I'm now thinking people will probably want (2)
more. The other issue is what we should do with the tests. Should we
leave them? Should we copy them and create a copy of them for the new
style? Should we replace them with the new style?


Having upload_handlers be a list object
---------------------------------------
I know that .set_upload_handler() was really cool, but I've discarded
it to support multiple handlers (see below for details). So now I had
to think of what to replace it with. At first I thought:
.append_upload_handler(), but then I realized that I may want
.insert_upload_handler(), and .delete_upload_handler(). So before long
I realized that I just want people to have a list interface.
Therefore, I decided to just leave it as a plain old list. Thus, to
add an upload handler, you'd just write::

   request.upload_handlers.append(some_upload_handler)

And to replace them all::

   request.upload_handlers = [some_upload_handler]

I've made a few efforts to ensure that it will raise an error if the
upload has already been handled. I know this isn't as simple as the
.set_upload_handler() interface, but I think it's the simplest way we
can support the list modification/replacement in a useful fashion.
What do people think about this?
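One possible shape for that raise-after-handled guard (a sketch; the patch's actual mechanism may differ, and the names are illustrative):

```python
class UploadHandlerList(list):
    """Behave as a plain list until the upload has been handled,
    then refuse further modification."""

    def __init__(self, *args):
        list.__init__(self, *args)
        self._locked = False

    def mark_handled(self):
        self._locked = True

    def _check(self):
        if self._locked:
            raise AttributeError("You cannot alter upload handlers "
                                 "after the upload has been processed.")

    def append(self, item):
        self._check()
        list.append(self, item)

    def insert(self, index, item):
        self._check()
        list.insert(self, index, item)

handlers = UploadHandlerList(["memory_handler"])
handlers.append("temp_file_handler")   # fine: nothing handled yet
handlers.mark_handled()
try:
    handlers.append("too_late_handler")
except AttributeError as e:
    print(e)
```

So the view code keeps the familiar list interface, but a handler change that could never take effect fails loudly instead of silently.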


(Mis)Handling Content-Length
----------------------------
There's probably not much room for argument here, but it's worth
asking a larger group. Currently, when you try uploading large files
(~2GB and greater), you will get a weird Content-Length header (less
than zero, overflowing). The problem is two-fold:
 1. We can't easily exhaust the request data without knowing its
length.
 2. If we don't exhaust the request data, we get a sad Connection Reset
error from the browsers.
This means that even if we wanted to display a useful error page to
people telling them why this happened, it won't happen because we'll
just get Connection Reset errors. We can just ignore it, or we can try
modifying the request streams to set timeouts and exhaust the input.
My perusal of Apache docs tells me that there's discard_request_body,
but unfortunately even this seems to depend on Content-Length. Should
I/we just ignore this and expect people to be sane and upload
reasonable data?
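For what it's worth, the "less than zero" symptom is just signed 32-bit wraparound, which is easy to demonstrate (a sketch, not Django code):

```python
import struct

def as_signed_32bit(n):
    """Reinterpret a byte count as a signed 32-bit integer, the way a
    Content-Length parsed into a C int would come out: anything past
    2**31 - 1 bytes (about 2GB) wraps negative."""
    return struct.unpack('<i', struct.pack('<I', n & 0xFFFFFFFF))[0]

print(as_signed_32bit(1024))           # 1024 -- small uploads are fine
print(as_signed_32bit(2 * 1024 ** 3))  # -2147483648 -- "less than zero"
```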

Revised API
===========

I was about to write a huge document that described the API in here.
But then I realized I might as well write it in restructuredtext. Then
I realized I might as well make it formatted like a Django document.
And well...you can see the outcome at:
   http://orodruin.axiak.net/upload_handling.html

Let me know if you have any comments on the API. (Things like how it
deals with multiple handlers could be up for discussion.)

Addendum
--------

One last note: In order to test the API and everything, I created a
Google Code project 'django-uploadutils' [2], which I hope will become
a collection of upload handlers for doing common tasks. If anyone is
interested in helping out with this, I'd be happy to add you as a
contributor.


Regards,
Mike


1: http://code.djangoproject.com/ticket/2070
2: http://code.google.com/p/django-uploadutils/
