Re: Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
About the initial problem that started this thread, the associated bug number is #10948. On Apr 6, 10:41 pm, Karen Traceywrote: > On Mon, Apr 6, 2009 at 1:56 PM, Marty Alchin wrote: > > > On Mon, Apr 6, 2009 at 1:16 PM, Karen Tracey wrote: > > > I feel like I'm going around in circles thinking about this one -- is > > there > > > a way out that someone else sees that I'm blind to? > > > Welcome to my world! :( I spent a long time on issues like this prior > > to getting the new file storage system in place at first, and > > apparently skipped much of that step when doing r9766. At this point, > > I think the most useful approach is to take a step back and just look > > at the high-level options we have available. Trying to determine the > > name in advance is a no-go, because that would just make the existing > > known race conditions much more prominent. I see two options left: > > > 1. Write something to disk (maybe the whole file, as it was before, > > maybe just a placeholder, as the _save() does now) when the file is > > first assigned, so that we have a full, proper filename early on. > > Then, if the model doesn't get saved, somehow roll that back so we > > don't leave stuff lingering on the filesystem. This is currently > > something of a "then some magic happens" approach, since I'm not yet > > sure if there's a reasonable, reliable way to roll back a file save. > > But I'd like to keep an open mind, so there it is. This would also be > > preferable if we can detect transaction rollbacks, because there is > > also #6456 to consider. > > I'm not seeing where exactly we have a _save() that writes just a > placeholder? The _save() in django.core.files.storage.FileSystemStorage > writes the whole content, and I don't see any other _save as opposed to > save() methods in the FileField area? > > > > > > > 2. Save the file all at once, as soon as possible, and blatantly > > document this behavior when model validation goes in. Then, if people > > need to validate a model that has a file on it, they have two options: > > validate the model *before* saving the file to it, so the model is > > known to be valid and can all be saved at once, or (if they need to > > validate something in the file, such as its name or contents), add > > their own code to delete the file if validation fails. Or, I suppose a > > third option is to ignore the lingering files until they suck up too > > much space, requiring a manual purge. > > > I obviously hate to go with number 2, but if we can't come up with > > something solid, I think it's the better approach, at least for now. > > Documenting unfortunate behavior is certainly preferable to coding > > even more unfortunate behavior. > > What about: > > 3. Revert the removal of the FileField save_form_data override that was part > of r9766. It is that removal that is causing save() on a ModelForm to no > longer save the file to disk. If we restore it, names will get assigned for > code using ModelForms just as they used to be. > > I am missing why the save_form_data override had to be removed in order to > support the direct assignment of files to FileFields that is needed for > model validation to work without the side-effect of model validation saving > the file to disk. I can see that it might be preferable, in general, to > delay saving as long as possible so you don't wind up with files written to > disk for models that are ultimately not saved to the DB. But we didn't have > that behavior in 1.0 -- model form save (even with commit=False), in 1.0.X, > saves files to disk. Trying to move the save later at this point, for the > model form usage scenario, leads to backwards-compatibility problems. > > It's kind of ugly to have one form of model creation/assignment ( via model > forms) save data early and a 2nd (assignment of files to FileFields) save > data late, so I can see we might not want to do this. But I thought I'd > throw it out there. > > Reverting just that bit of r9766 also doesn't fix #10249 or #10300 which are > resulting form the mixing in of FieldFile in the class bases. It also > doesn't fix #10404 in the case where someone uses the new > assign-file-to-FileField without saving the field explicitly before the > model. So they'd still need fixing, but they seem more easily solved than > this file name issue. > > Karen --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
On Mon, Apr 6, 2009 at 1:56 PM, Marty Alchinwrote: > > On Mon, Apr 6, 2009 at 1:16 PM, Karen Tracey wrote: > > I feel like I'm going around in circles thinking about this one -- is > there > > a way out that someone else sees that I'm blind to? > > Welcome to my world! :( I spent a long time on issues like this prior > to getting the new file storage system in place at first, and > apparently skipped much of that step when doing r9766. At this point, > I think the most useful approach is to take a step back and just look > at the high-level options we have available. Trying to determine the > name in advance is a no-go, because that would just make the existing > known race conditions much more prominent. I see two options left: > > 1. Write something to disk (maybe the whole file, as it was before, > maybe just a placeholder, as the _save() does now) when the file is > first assigned, so that we have a full, proper filename early on. > Then, if the model doesn't get saved, somehow roll that back so we > don't leave stuff lingering on the filesystem. This is currently > something of a "then some magic happens" approach, since I'm not yet > sure if there's a reasonable, reliable way to roll back a file save. > But I'd like to keep an open mind, so there it is. This would also be > preferable if we can detect transaction rollbacks, because there is > also #6456 to consider. > I'm not seeing where exactly we have a _save() that writes just a placeholder? The _save() in django.core.files.storage.FileSystemStorage writes the whole content, and I don't see any other _save as opposed to save() methods in the FileField area? > > 2. Save the file all at once, as soon as possible, and blatantly > document this behavior when model validation goes in. Then, if people > need to validate a model that has a file on it, they have two options: > validate the model *before* saving the file to it, so the model is > known to be valid and can all be saved at once, or (if they need to > validate something in the file, such as its name or contents), add > their own code to delete the file if validation fails. Or, I suppose a > third option is to ignore the lingering files until they suck up too > much space, requiring a manual purge. > > I obviously hate to go with number 2, but if we can't come up with > something solid, I think it's the better approach, at least for now. > Documenting unfortunate behavior is certainly preferable to coding > even more unfortunate behavior. > What about: 3. Revert the removal of the FileField save_form_data override that was part of r9766. It is that removal that is causing save() on a ModelForm to no longer save the file to disk. If we restore it, names will get assigned for code using ModelForms just as they used to be. I am missing why the save_form_data override had to be removed in order to support the direct assignment of files to FileFields that is needed for model validation to work without the side-effect of model validation saving the file to disk. I can see that it might be preferable, in general, to delay saving as long as possible so you don't wind up with files written to disk for models that are ultimately not saved to the DB. But we didn't have that behavior in 1.0 -- model form save (even with commit=False), in 1.0.X, saves files to disk. Trying to move the save later at this point, for the model form usage scenario, leads to backwards-compatibility problems. It's kind of ugly to have one form of model creation/assignment ( via model forms) save data early and a 2nd (assignment of files to FileFields) save data late, so I can see we might not want to do this. But I thought I'd throw it out there. Reverting just that bit of r9766 also doesn't fix #10249 or #10300 which are resulting form the mixing in of FieldFile in the class bases. It also doesn't fix #10404 in the case where someone uses the new assign-file-to-FileField without saving the field explicitly before the model. So they'd still need fixing, but they seem more easily solved than this file name issue. Karen --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
On Mon, Apr 6, 2009 at 1:16 PM, Karen Traceywrote: > We cannot know for sure what the file name is until it is saved to disk, as > the save operation may tack on underscores when handling race conditions. > Thus we cannot delay file save to a field pre_save routine and report the > guaranteed correct file name in a pre_save signal handler, since the latter > runs before the former. Yes, very true. I wrote my reply in a hurry to get you some more information, and didn't really think all the way through that one. > I feel like I'm going around in circles thinking about this one -- is there > a way out that someone else sees that I'm blind to? Welcome to my world! :( I spent a long time on issues like this prior to getting the new file storage system in place at first, and apparently skipped much of that step when doing r9766. At this point, I think the most useful approach is to take a step back and just look at the high-level options we have available. Trying to determine the name in advance is a no-go, because that would just make the existing known race conditions much more prominent. I see two options left: 1. Write something to disk (maybe the whole file, as it was before, maybe just a placeholder, as the _save() does now) when the file is first assigned, so that we have a full, proper filename early on. Then, if the model doesn't get saved, somehow roll that back so we don't leave stuff lingering on the filesystem. This is currently something of a "then some magic happens" approach, since I'm not yet sure if there's a reasonable, reliable way to roll back a file save. But I'd like to keep an open mind, so there it is. This would also be preferable if we can detect transaction rollbacks, because there is also #6456 to consider. 2. Save the file all at once, as soon as possible, and blatantly document this behavior when model validation goes in. Then, if people need to validate a model that has a file on it, they have two options: validate the model *before* saving the file to it, so the model is known to be valid and can all be saved at once, or (if they need to validate something in the file, such as its name or contents), add their own code to delete the file if validation fails. Or, I suppose a third option is to ignore the lingering files until they suck up too much space, requiring a manual purge. I obviously hate to go with number 2, but if we can't come up with something solid, I think it's the better approach, at least for now. Documenting unfortunate behavior is certainly preferable to coding even more unfortunate behavior. -Gul --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
On Thu, Apr 2, 2009 at 12:33 PM, Marty Alchinwrote: > > A related, thus-far-unreported (I think) issue comes up when > attempting to access width_field and height_field attributes on a > model prior to saving the new file. > Actually it looks like this was kinda reported, it just wasn't recognized as related to r9766: http://code.djangoproject.com/ticket/10404 And the problem is slightly worse than just being unable to access the fields prior to saving: turns out if the fields are declared before the associated image field in the model, they aren't set properly at all. We've now got unresolved: #10249 - MRO exception on attempt to assign File to FileField #10300 - Storage backend cannot get length of content #10404 - Image width/height fields not set properly plus the file name issue (pre-save signal handler does not get actual file name) that started this thread, which I don't believe has a ticket. There are tentative patches for the first two (plus Marty you say you have some ideas how to fix them, but I'm not sure how those ideas differ from what's noted in the tickets). #10404 seems fixable as well given a bit of thought. But I really don't know what to do about the file name issue. We cannot know for sure what the file name is until it is saved to disk, as the save operation may tack on underscores when handling race conditions. Thus we cannot delay file save to a field pre_save routine and report the guaranteed correct file name in a pre_save signal handler, since the latter runs before the former. In fact, there may be code out there expecting to know the correct name much earlier than a pre-save signal. For example: form = ModelFormForModelContainingFileField(request.POST, request.FILES) if form.is_valid(): m = form.save(commit=False) # this actually saved the file to disk pre-9766 # ... code that uses the FileField's assigned file name ... works on 1.0.X but will run into trouble post-r9766 as the file won't be saved anymore as part of the form.save(commit=False). I can easily cause some of our existing model_forms tests to fail due to reported file name differences by passing commit=False on form save. Making the same commit=False additions pre-9766 does not cause the failures, as pre-r9766 the file was saved and the name was set during form save regardless of the commit argument. I don't see how we can support knowing the actual file name as early as we used to and simultaneously delay saving the file long enough so that model validation doesn't cause file data to be written to disk...though I don't know enough about the model validation implementation to understand exactly why it was triggering the write-to-disk, as in the code above it's happening during a form-specific save routine and a form wouldn't necessarily be involved in model validation. I feel like I'm going around in circles thinking about this one -- is there a way out that someone else sees that I'm blind to? Karen --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
I'm short on time right now so I just want to respond to one part of this that I just noticed and I'll come back later to digest the rest. Thanks for the background on the why to r9766. I knew it was model-validation related but didn't know what the specific issue was. It helps to have a clearer idea of the actual problem it's trying to solve. Now the additional wrinkle I just noticed: On Thu, Apr 2, 2009 at 12:33 PM, Marty Alchinwrote: > As for addressing this one in particular, I think we'll have to add > something similar to the _committed flag, so we can identify whether > the filename has been processed or not. If it has, return it; > otherwise, generate it using upload_to when it's requested. The > trouble with just moving it up into save_form_data() is that if > someone assigns a file manually, save_form_data() won't get the chance > to generate a proper filename and we'll have a separate issue on our > hands. I hate having to add all these flags to determine what part of > the process we're in, but at a glance, that seems the most reasonable > approach. > Upon further investigation, I think it's going to be pretty hard to disentangle setting the right file name from the actual save. While the adding in of the upload_path part can be done whenever, the possible underscore additions are done, I now see, as part of save. There's actually a loop in _save() that deals with multiple threads possibly picking the same name and colliding on save -- if it turns out the tentatively assigned name is already in use, get_available_name() is called again (adding at least one more underscore) to generate the next available name: http://code.djangoproject.com/browser/django/trunk/django/core/files/storage.py#L150 Thus, as it is right now, the actual file name on disk isn't going to be really known for sure until the file is saved. I need to head out the door in a few minutes and don't at the moment have any good idea for how to deal with this. "Reserving" the name by actually creating the file on disk before save I think just puts things back to where they were pre-r9766: you'd wind up with possibly something (maybe just an empty file) saved on disk when in the end ultimately the model containing the file field was not saved, unless we do something clever to un-do the "reservation" at some (what?) point. Ugh. Maybe there's a better way I'm overlooking? Finally one thought I'll throw out there -- since r9766 was needed for model validation, and that's not now going to be in 1.1, does r9766 need to stay in 1.1? Personally I'd prefer to solve the issues now if it's going to be needed for model validation anyway in the future...but if it comes down to the wire and we don't have good answers for some of these side-effects, could we just un-do r9766? Karen --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
On Thu, Apr 2, 2009 at 11:45 AM, Karen Traceywrote: > I think it's a bug, and I'm pretty sure it was introduced by r9766. The > setting of the actual file name that will be used (which pulls in the > upload_to path and possibly tacks on some underscores if the uploaded name > conflicts with already-present file(s) in the upload directory) is done by > django.db.models.fields.files.FieldFile save(). This used to be called > fairly early on from the django.db.models.fields.files.FileField > save_form_data() method but r9766 moved that to much later, namely a > newly-introduced pre_save() routine in FileField. Since (from a brief look > at django.db.models.base.Model save_base()) the field pre_save routine is > called after the pre_save signal is sent, the real file name has now not yet > been set when the pre_save signal is sent. Ugh. Thanks, Karen, for looking into this and so many others that came up after r9766. I've been fairly negligent on these, and it's starting to cause some pretty big problems. r9766 may well mark the last time I try to be clever. :( > There are a couple of still-open tickets for other side-effects of r9766 > (#10249, #10300) but I don't believe either of the potential fixes for those > would fix this issue, so this probably needs a ticket with the details you > provided (an integrated testcase would be even better, though I don't know > how much signal testing is done by the test suite). Not sure what the right > fix would be -- can just the setting of the name be pulled out of save() and > done earlier by save_form_data()? Possibly save() would have to also still > ensure it's done, in cases where save_form_data() is not involved? Can > anyone with a better handle on r9766 comment on this (and #10249, #10300)? For the record, #10044 (which was "fixed" by r9766) was entered as a precursor to model validation. After some discussions with Honza, we discovered that validating a model with a file would cause the file to get saved to storage even if the model itself would never get stored in the database. Our solution was to allow files to be assigned to models outside the save() logic, and only store the file on disk when the model actually hits the database. Unfortunately, that proved problematic, because if you assigned an UploadedFile to a model attribute, it didn't have the FieldFile methods necessary to actually store it in the backend. I briefly considered leaving that alone, and not relying on FieldFile when finally saving the delayed file, but that created an inconsistent API. After all, if you have a function that acts on models with files attached, but you pass it a model with an *unsaved* file, it would have a different set of methods than if the file had been saved. Thus, r9766 includes the task of creating a new hybrid class whenever a new file is assigned to a FileField attribute. This implementation detail is the cause of #10249 and #10300 that you mentioned, but this one stems from the simple fact that file saving is delayed as long as possible. A related, thus-far-unreported (I think) issue comes up when attempting to access width_field and height_field attributes on a model prior to saving the new file. I have a couple ideas to try out with regard to #10249 and #10300, so feel free to hit me up on IRC if you'd like to discuss them, but I'm not as certain how to approach this new one yet. There are certainly band-aids that would allow the filename to be generated prior to storing the file, but that wouldn't address the width_field/height_field concern, nor any other potential future issues that arise from delaying the file. I'd like to get to a point where we're confident this sort of thing won't happen again, but if the only option is to just keep putting out fires and they flare up, it'll have to do. As for addressing this one in particular, I think we'll have to add something similar to the _committed flag, so we can identify whether the filename has been processed or not. If it has, return it; otherwise, generate it using upload_to when it's requested. The trouble with just moving it up into save_form_data() is that if someone assigns a file manually, save_form_data() won't get the chance to generate a proper filename and we'll have a separate issue on our hands. I hate having to add all these flags to determine what part of the process we're in, but at a glance, that seems the most reasonable approach. -Gul --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
On Thu, Apr 2, 2009 at 8:59 AM, Lior Gradsteinwrote: > > I noticed an important change in behaviour between Django 1.0.2 and > Django 1.1+ (svn and beta versions). > > If I set a signal that intercepts as pre_save and try to access a > FileField/ImageField's path, I don't get the same result. It seems the > 'upload_to' attribute is not taken into account! > It seems to be set much later, compared to version 1.0.2. Is this a > feature or a bug? > I think it's a bug, and I'm pretty sure it was introduced by r9766. The setting of the actual file name that will be used (which pulls in the upload_to path and possibly tacks on some underscores if the uploaded name conflicts with already-present file(s) in the upload directory) is done by django.db.models.fields.files.FieldFile save(). This used to be called fairly early on from the django.db.models.fields.files.FileField save_form_data() method but r9766 moved that to much later, namely a newly-introduced pre_save() routine in FileField. Since (from a brief look at django.db.models.base.Model save_base()) the field pre_save routine is called after the pre_save signal is sent, the real file name has now not yet been set when the pre_save signal is sent. There are a couple of still-open tickets for other side-effects of r9766 (#10249, #10300) but I don't believe either of the potential fixes for those would fix this issue, so this probably needs a ticket with the details you provided (an integrated testcase would be even better, though I don't know how much signal testing is done by the test suite). Not sure what the right fix would be -- can just the setting of the name be pulled out of save() and done earlier by save_form_data()? Possibly save() would have to also still ensure it's done, in cases where save_form_data() is not involved? Can anyone with a better handle on r9766 comment on this (and #10249, #10300)? Karen --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2
I noticed an important change in behaviour between Django 1.0.2 and Django 1.1+ (svn and beta versions). If I set a signal that intercepts as pre_save and try to access a FileField/ImageField's path, I don't get the same result. It seems the 'upload_to' attribute is not taken into account! It seems to be set much later, compared to version 1.0.2. Is this a feature or a bug? Here's a very simple example to show the problem: * settings.py (relevant parts only): MEDIA_ROOT = '/tmp/static' MEDIA_URL = '/static/' * models.py: from django.db import models from django.db.models import signals class MyModel(models.Model): myfile = models.FileField(upload_to="uploads") def intercept(sender, instance, signal, *args, **kwargs): print instance.myfile print instance.myfile.path print instance.myfile.url signals.pre_save.connect(intercept, sender=MyModel) - In Django 1.0.2-final: uploads/a__.flv /tmp/static/uploads/a__.flv /static/uploads/a__.flv - In Django 1.1+ (1.1beta and latest SVN version 1.1 beta 1 SVN-10367): a.flv /tmp/static/a.flv /static/a.flv Which is really wrong. Thanks, Lior --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---