Re: Denormalisation, magic, and is it really that useful?
Hi, Erik. The main purpose is to have declarative form of composition field calculation definition. Not to write imperative actions/signal connection/etc. I've made only flexible generic solution. In future I plan to write high-level subclasses that will can with minimal input parameters make all work very comfortable for developers. And then there will be no need to write all denormalisation functionality once again and again, from project to project. On Sep 26, 8:56 pm, Erik Allik <[EMAIL PROTECTED]> wrote: > I was just wondering.. Why all this abstraction? > > Why do we need a separate field for denormalization? Can't we just use > regular fields and simply set up denormalization in a procedural way > in the constructor? All that needs to be done to create a denormalized > field is connect a few signals maybe or do some expensive compuation > after modifying a field. But maybe I'm missing something crucial. > > Erik > --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
I was just wondering.. Why all this abstraction? Why do we need a separate field for denormalization? Can't we just use regular fields and simply set up denormalization in a procedural way in the constructor? All that needs to be done to create a denormalized field is connect a few signals maybe or do some expensive compuation after modifying a field. But maybe I'm missing something crucial. Erik On 26.09.2008, at 18:45, Alex Koshelev wrote: > > Hi, guys! > > For a long time I have thoughts to make composition/denormalisation a > little bit easier and reusable. But I have no time to implement my > ideas. > Inspired by Anrew's blog post and its thread I recently wrote some > code. And I think that it really may be useful and cover most of > denormalisation use cases in Django. > > And this is an example of Anrew's code but with my CompositionField: > > class Picture(models.model): >user = models.ForeignKey(User) >user_username = CompositionField( > native=models.CharField(max_length=150), > trigger=dict( > sender_model=User, > field_holder_getter=lambda user: > user.picture_set.all(), > do=lambda picture, user, signal: > user.username >) > ) >image = models.ImageField(upload_to="foo") >title = models.CharField(max_length=100) > > Its more verbose but very generic and can be customized. And free > bonus - auto-generated `update_user_username` method. > > The code available here [1]. Its very generic but allows to write more > high-level subclasses with some introspection. > > Today I've written the blog post about it and my future suggestions. > It is in Russian[2], here is a google-translated version [3] (but with > broken code blocks) that I think can help to understand some concepts. > And it holds more usage examples. > > I have many new ideas of how my solution can be improved and I what to > hear you opinions about it. > > Thanks. > > [1]: http://svn.turbion.org/turbion/trunk/turbion/utils/composition.py > [2]: http://webnewage.org/post/2008/9/26/krasivaya-kompozitsiya/ > [3]: > http://translate.google.com/translate?u=http%3A%2F%2Fwebnewage.org%2Fpost%2F2008%2F9%2F26%2Fkrasivaya-kompozitsiya%2F%23comment_720=en=UTF-8=ru=en > > On Sep 23, 12:03 am, Andrew Godwin <[EMAIL PROTECTED]> wrote: >> So, hello everyone. I figure this list is the best place to ask this, >> but please feel free to deride me if not... >> >> After all the talk of multiple databases, and non-relational >> databases >> (bigtable, couchdb, etc.) that went on at DjangoCon and >> afterward,I've >> been thinking about denormali[s/z]ation and how to make it easier in >> Django; previously, I had thought denormalisation was something you >> did >> to ruin your database, but I can now see it can be additive as well. >> >> In this vein, last weekend I decided to see how easy it would be to >> write a magical DenormalisationField which handled all the >> synchronising >> of data from the relevant tables given a normal ForeignKey >> relationship; >> the results are athttp://www.aeracode.org/2008/9/14/denormalisation-follies/ >> >> Basically, yes, it is possible, and a lot of people (eight! for my >> blog, >> that's a lot) seem to like the idea; it certainly saves having to >> write >> that replicate-on-save logic that otherwise goes with this kind of >> schema design. >> >> Given the fact that it seems like a reasonably common thing to do in >> large-scale applications, I'd like to know what people think about >> adding this kind of thing into core (while I have a standalone >> implementation, that needs some hacks I'd like to see gone). My main >> issues with this idea are: >> >> 1) It relies on having a ForeignKey pointing to the other model, so >> this >> might not work for non-relational databases (well, unless ForeignKeys >> are still allowed in that case, but they just perform expensive >> SELECTs >> as well as possibly printing abusive messages to stderr). >> >> 2) I'm not sure how common a use case this is; it sounds like it >> might >> be verging into the long tail a bit. >> >> 3) People also want other things denormalised - count()s, aggregate >> queries, etc. Of course, writing fields for these too is similar, and >> I'd be happy to do it, but it might mean that even if the case for >> denormalisation is good this is only a piece of the puzzle. >> >> So, I'd love people's opinions about where I should take this - to a >> dark corner and hide it, to a separate app like django-extensions >> (which >> involves keeping the horrid hacks in there), into a separate app but >> with patches to core to make it less hackish (i.e. more signals), >> or add >> it as my 1.1 pony with the proviso that I'm happy to write all the >> code. >> >> Andrew > > --~--~-~--~~~---~--~~ You received this message
Re: Denormalisation, magic, and is it really that useful?
Hi, guys! For a long time I have thoughts to make composition/denormalisation a little bit easier and reusable. But I have no time to implement my ideas. Inspired by Anrew's blog post and its thread I recently wrote some code. And I think that it really may be useful and cover most of denormalisation use cases in Django. And this is an example of Anrew's code but with my CompositionField: class Picture(models.model): user = models.ForeignKey(User) user_username = CompositionField( native=models.CharField(max_length=150), trigger=dict( sender_model=User, field_holder_getter=lambda user: user.picture_set.all(), do=lambda picture, user, signal: user.username ) ) image = models.ImageField(upload_to="foo") title = models.CharField(max_length=100) Its more verbose but very generic and can be customized. And free bonus - auto-generated `update_user_username` method. The code available here [1]. Its very generic but allows to write more high-level subclasses with some introspection. Today I've written the blog post about it and my future suggestions. It is in Russian[2], here is a google-translated version [3] (but with broken code blocks) that I think can help to understand some concepts. And it holds more usage examples. I have many new ideas of how my solution can be improved and I what to hear you opinions about it. Thanks. [1]: http://svn.turbion.org/turbion/trunk/turbion/utils/composition.py [2]: http://webnewage.org/post/2008/9/26/krasivaya-kompozitsiya/ [3]: http://translate.google.com/translate?u=http%3A%2F%2Fwebnewage.org%2Fpost%2F2008%2F9%2F26%2Fkrasivaya-kompozitsiya%2F%23comment_720=en=UTF-8=ru=en On Sep 23, 12:03 am, Andrew Godwin <[EMAIL PROTECTED]> wrote: > So, hello everyone. I figure this list is the best place to ask this, > but please feel free to deride me if not... > > After all the talk of multiple databases, and non-relational databases > (bigtable, couchdb, etc.) that went on at DjangoCon and afterward,I've > been thinking about denormali[s/z]ation and how to make it easier in > Django; previously, I had thought denormalisation was something you did > to ruin your database, but I can now see it can be additive as well. > > In this vein, last weekend I decided to see how easy it would be to > write a magical DenormalisationField which handled all the synchronising > of data from the relevant tables given a normal ForeignKey relationship; > the results are athttp://www.aeracode.org/2008/9/14/denormalisation-follies/ > > Basically, yes, it is possible, and a lot of people (eight! for my blog, > that's a lot) seem to like the idea; it certainly saves having to write > that replicate-on-save logic that otherwise goes with this kind of > schema design. > > Given the fact that it seems like a reasonably common thing to do in > large-scale applications, I'd like to know what people think about > adding this kind of thing into core (while I have a standalone > implementation, that needs some hacks I'd like to see gone). My main > issues with this idea are: > > 1) It relies on having a ForeignKey pointing to the other model, so this > might not work for non-relational databases (well, unless ForeignKeys > are still allowed in that case, but they just perform expensive SELECTs > as well as possibly printing abusive messages to stderr). > > 2) I'm not sure how common a use case this is; it sounds like it might > be verging into the long tail a bit. > > 3) People also want other things denormalised - count()s, aggregate > queries, etc. Of course, writing fields for these too is similar, and > I'd be happy to do it, but it might mean that even if the case for > denormalisation is good this is only a piece of the puzzle. > > So, I'd love people's opinions about where I should take this - to a > dark corner and hide it, to a separate app like django-extensions (which > involves keeping the horrid hacks in there), into a separate app but > with patches to core to make it less hackish (i.e. more signals), or add > it as my 1.1 pony with the proviso that I'm happy to write all the code. > > Andrew --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
CopiedField sounds a bit off, but otherwise I like the proposed additions. On Thu, Sep 25, 2008 at 8:08 AM, Andrew Godwin <[EMAIL PROTECTED]>wrote: > > David Cramer wrote: > > I would say ignore triggers on the DB level, until they've been > > written in the framework. > > > > Yes, this was essentially my point earlier; triggers would be nice to > have from a consistency point of view, but it will be easier and quicker > to reimplement them in Python, not least because it allows more complex > functionality to be put straight in if we want it in the future. Then, > we can always extend it so that some types of fields are implemented > directly as triggers if you want once the API is settled. > > With all this in mind, my API proposal would be something like this: > > A CopiedField, called like so: user_username = > models.CopiedField("user", "username") [here, "user" is the name of > the ForeignKey to link through] > > An AggregateField: post_count = > models.AggregateField(Post.objects.filter(visible=True), "count") > > The CopiedField essentially works how my current DenormField does, but > using a foreignkey to avoid you having to specify both it _and_ the > model if you have more than one to the same model (a reasonable amount > of the DenormField code is just for finding which foreignkey to link > through, anyway) > > The AggregateField takes a simple queryset (from which it can extract > the model to hook a signal onto), and whenever something in the queryset > is saved (so, either an extra select on each save to see what items are > in the queryset, or just updating the aggregate every time - possibly > with either selectable, since the cost might be higher on either), and > it also takes the function to run on the set - either a builtin name > like "count" here, or an actual function which takes a queryset and > returns a result (in this situation the type of the underlying column > will also have to be specified). > > This should probably cover most cases, and has the advantage of people > being able to specify their own in-python aggregate functions (such as > working out the standard deviation of the users' ages) if they tell us > what kind of field to sling the result in. > > I'm happy to change the field names, too, CopiedField especially is a > little clunky. > > Andrew > > > > -- David Cramer Director of Technology iBegin http://www.ibegin.com/ --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
David Cramer wrote: > I would say ignore triggers on the DB level, until they've been > written in the framework. > Yes, this was essentially my point earlier; triggers would be nice to have from a consistency point of view, but it will be easier and quicker to reimplement them in Python, not least because it allows more complex functionality to be put straight in if we want it in the future. Then, we can always extend it so that some types of fields are implemented directly as triggers if you want once the API is settled. With all this in mind, my API proposal would be something like this: A CopiedField, called like so: user_username = models.CopiedField("user", "username") [here, "user" is the name of the ForeignKey to link through] An AggregateField: post_count = models.AggregateField(Post.objects.filter(visible=True), "count") The CopiedField essentially works how my current DenormField does, but using a foreignkey to avoid you having to specify both it _and_ the model if you have more than one to the same model (a reasonable amount of the DenormField code is just for finding which foreignkey to link through, anyway) The AggregateField takes a simple queryset (from which it can extract the model to hook a signal onto), and whenever something in the queryset is saved (so, either an extra select on each save to see what items are in the queryset, or just updating the aggregate every time - possibly with either selectable, since the cost might be higher on either), and it also takes the function to run on the set - either a builtin name like "count" here, or an actual function which takes a queryset and returns a result (in this situation the type of the underlying column will also have to be specified). This should probably cover most cases, and has the advantage of people being able to specify their own in-python aggregate functions (such as working out the standard deviation of the users' ages) if they tell us what kind of field to sling the result in. I'm happy to change the field names, too, CopiedField especially is a little clunky. Andrew --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
Denormalized for me personally happens a LOT when it's not just a simple foreign key. Maybe I want to store a many to many field as name,name,name in a column (tags). We're already inside of Django so it should already be available (it's not going to be much slower than it is executing the subquery in sql to get the denormalized output). On Sep 23, 4:19 pm, "Justin Fagnani" <[EMAIL PROTECTED]> wrote: > On Tue, Sep 23, 2008 at 12:52 AM, David Cramer <[EMAIL PROTECTED]> wrote:> > > For me, personally, it would be great if this could accept callables > > as well. So you could store the username, like so, or you could store > > a choices field like: > > > field = models.IntegerField(choices=CHOICES) > > denorm = models.DenormField('self', 'get_field_display') # which > > would rock if it was .field.display ;) > > I think denormalizing with callables is a very different thing than > denormalizing with expressions that can be evaluated by the database. > Not that they both aren't worth supporting, but db-level expressions > are going to be far easier and more efficient to validate and update > in bulk - no looping in Python, just executing SQL. In this case, I > think your example would be better suited as an FK, for denorm > purposes. > > Callables would be useful for something more complicated like > abstracting auto_now to not be limited to dates, that is allowing a > field value to be set by a callable on save, not just create, for any > field type. > > On Tue, Sep 23, 2008 at 2:42 AM, Andrew Godwin <[EMAIL PROTECTED]> wrote: > > Still, with an > > AggregateField(Sandwiches.filter(filling="cheese").count()) it's still > > possible to work out that you want to listen on the Sandwiches model, > > and you could then fall back to re-running the count on every Sandwich > > save, even if it ends up not having a cheese filling. > > I'm not sure I like the idea of accepting arbitrary QuerySets. It > could just be my point-of-view, but I see automatic denormalization > and aggregation as intimately tied, where a denormalized field is just > a declarative aggregate expression that's optionally cached in a > column. I think this makes it easy to understand and document, since > aggregation queries and calculation fields would support the same > features, and it also allows the implementation to share a lot with > aggregates. It's also better in terms of storing and updating a > calculation: you can calculate on reads or writes for N objects in one > query without N subqueries, though it may involve a lot of joins. > > > So, I think the best approach would be one to replicate fields (like my > > current DenormField; perhaps call it CopyField or something) and one to > > cache aggregates (an AggregateField, like above). > > I'd also be hesitant to have two separate fields for these cases, > since copying a related field value is just a simple SQL expression. I > think the same calculation field could be used for both, by assuming a > string is a field name: > > name = CalculationField('user.name') > > or by using F expressions: > > name = CalculationField(F('user.name')) > > Another approach to calculations, that doesn't necessarily involve > denormalization, would be to use views. Russell talked to me a bit > about this at djangocon, and I think the idea was that if you solve > basing models on views (Isn't it actually possible now? Maybe we need > read-only fields), and then have view creation support in the ORM, > then calculation fields can be implemented with views. I see that > un-stored calculations are re-implementing views, but I don't know > enough about views to know whether they have some performance > advantages over embedding the calculation in a query. > > -Justin --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
I just wanted to touch on this point: models.IntegerField(default=0) does not translate to INT(11) default 0; in SQL :) Django doesn't nescesarily try to interact with the database properly when it should, because it requires too much thinking (and more code that hasn't been written). I would say ignore triggers on the DB level, until they've been written in the framework. On Sep 23, 7:44 am, Steve Holden <[EMAIL PROTECTED]> wrote: > This appears to be a proposal to re-implement triggers inside Django. > > I can see there are benefits if the underlying DB platform won't support > triggers, but wouldn't triggers be the preferred solution when they're > available? That way there is no chance that changes can be made outside > the scope of the denormalization, and hence no need to recompute the > denormalized values. > > regards > Steve > > Andrew Godwin wrote: > > David Cramer wrote: > > >> If you're not doing denormalization in your database, most likely > >> you're doing something wrong. I really like the approach that is > >> offered here. > > >> For me, personally, it would be great if this could accept callables > >> as well. So you could store the username, like so, or you could store > >> a choices field like: > > >> field = models.IntegerField(choices=CHOICES) > >> denorm = models.DenormField('self', 'get_field_display') # which > >> would rock if it was .field.display ;) > > >> You could also make it somehow accept querysets or something similar > >> for things like count(). I see a lot of possibilities and am a bit > >> disappointed I didn't come up with something this easy for my use- > >> cases. > > > The key is making sure you can listen for changes on whatever's at the > > other end of your denormalisation. With my current snippet, it listens > > for a save on the model the foreignkey points to, then checks for the > > right ID; if we start accepting random querysets, then there has to be a > > way to resolve that back to conditions the signal listener can understand. > > > Still, with an > > AggregateField(Sandwiches.filter(filling="cheese").count()) it's still > > possible to work out that you want to listen on the Sandwiches model, > > and you could then fall back to re-running the count on every Sandwich > > save, even if it ends up not having a cheese filling. > > > So, I think the best approach would be one to replicate fields (like my > > current DenormField; perhaps call it CopyField or something) and one to > > cache aggregates (an AggregateField, like above). > > > Simon Willison wrote: > > >> Just so it's on the record, I'd like any denormalisation tools in > >> Django to include a mechanism for re-syncronizing them should > >> something go awry (like direct updates being applied to the database > >> without keeping the denormalised fields in sync). This mechanism could > >> then be exposed as a ./manage.py command which could be called > >> periodically to verify and fix any incorrect data. > > > Yes, this I very much agree with. The reason you always layer this stuff > > on top of a pre-normalised database is because you can then rebuild the > > data after playing with it externally. > > > Doing so shouldn't be too much of a problem; have a management command > > that loads the models, and then just executes the update method on each > > of the denormalisationalish fields. > > > Justin's idea of lazy updating is interesting, and probably quite doable > > (as well as what most people will want by default on aggregate queries). > > > I'm also hoping any kind of aggregate denormalisation will work with any > > future extended aggregate support, but if the field just takes a normal > > QuerySet, that might Just Work™. > > > Andrew --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
On Tue, Sep 23, 2008 at 12:52 AM, David Cramer <[EMAIL PROTECTED]> wrote:> > For me, personally, it would be great if this could accept callables > as well. So you could store the username, like so, or you could store > a choices field like: > >field = models.IntegerField(choices=CHOICES) >denorm = models.DenormField('self', 'get_field_display') # which > would rock if it was .field.display ;) I think denormalizing with callables is a very different thing than denormalizing with expressions that can be evaluated by the database. Not that they both aren't worth supporting, but db-level expressions are going to be far easier and more efficient to validate and update in bulk - no looping in Python, just executing SQL. In this case, I think your example would be better suited as an FK, for denorm purposes. Callables would be useful for something more complicated like abstracting auto_now to not be limited to dates, that is allowing a field value to be set by a callable on save, not just create, for any field type. On Tue, Sep 23, 2008 at 2:42 AM, Andrew Godwin <[EMAIL PROTECTED]> wrote: > Still, with an > AggregateField(Sandwiches.filter(filling="cheese").count()) it's still > possible to work out that you want to listen on the Sandwiches model, > and you could then fall back to re-running the count on every Sandwich > save, even if it ends up not having a cheese filling. I'm not sure I like the idea of accepting arbitrary QuerySets. It could just be my point-of-view, but I see automatic denormalization and aggregation as intimately tied, where a denormalized field is just a declarative aggregate expression that's optionally cached in a column. I think this makes it easy to understand and document, since aggregation queries and calculation fields would support the same features, and it also allows the implementation to share a lot with aggregates. It's also better in terms of storing and updating a calculation: you can calculate on reads or writes for N objects in one query without N subqueries, though it may involve a lot of joins. > So, I think the best approach would be one to replicate fields (like my > current DenormField; perhaps call it CopyField or something) and one to > cache aggregates (an AggregateField, like above). I'd also be hesitant to have two separate fields for these cases, since copying a related field value is just a simple SQL expression. I think the same calculation field could be used for both, by assuming a string is a field name: name = CalculationField('user.name') or by using F expressions: name = CalculationField(F('user.name')) Another approach to calculations, that doesn't necessarily involve denormalization, would be to use views. Russell talked to me a bit about this at djangocon, and I think the idea was that if you solve basing models on views (Isn't it actually possible now? Maybe we need read-only fields), and then have view creation support in the ORM, then calculation fields can be implemented with views. I see that un-stored calculations are re-implementing views, but I don't know enough about views to know whether they have some performance advantages over embedding the calculation in a query. -Justin --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
On Tue, Sep 23, 2008 at 7:44 AM, Steve Holden <[EMAIL PROTECTED]> wrote: > > This appears to be a proposal to re-implement triggers inside Django. > > I can see there are benefits if the underlying DB platform won't support > triggers, but wouldn't triggers be the preferred solution when they're > available? That way there is no chance that changes can be made outside > the scope of the denormalization, and hence no need to recompute the > denormalized values. It would be nice to have an option to make atomic changes to the normalized and denormalized tables. -Dave --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
On Sep 23, 2:00 pm, "Marty Alchin" <[EMAIL PROTECTED]> wrote: > Without some Python-based approach, all I could see is maybe adding a > cross-platform "create trigger" API (ugh) to Django, which an > application could then use to set up its triggers during syncdb. > Otherwise, something like that forum app would have to implement a > trigger for all available backends or just ship with instructions on > how to set it up yourself. Triggers also aren't supported in Drizzle ( https://launchpad.net/drizzle ), which Django will probably want to target at some point. Cheers, Simon --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
On Tue, Sep 23, 2008 at 8:44 AM, Steve Holden <[EMAIL PROTECTED]> wrote: > This appears to be a proposal to re-implement triggers inside Django. I suppose it is. But then, perhaps triggers were a re-implementation of application-based denormalization. ... The chicken? The egg? The world may never know. :) > I can see there are benefits if the underlying DB platform won't support > triggers, but wouldn't triggers be the preferred solution when they're > available? That way there is no chance that changes can be made outside > the scope of the denormalization, and hence no need to recompute the > denormalized values. Perhaps the biggest downside is that distributed applications can't make use of it, since triggers aren't cross-platform. Sure, you could argue that denormalization is best suited for individual projects, rather than distributed apps, but I could see uses for it. Imagine a full-blown forum app that wants to denormalize its post counts. Without some Python-based approach, all I could see is maybe adding a cross-platform "create trigger" API (ugh) to Django, which an application could then use to set up its triggers during syncdb. Otherwise, something like that forum app would have to implement a trigger for all available backends or just ship with instructions on how to set it up yourself. That said, you're right, I would think it's usually safest to put that stuff directly in the db. That way, if you have other applications using other languages or perhaps just other ORMs, everything's all intact. I'm pretty sure I've seen arguments on both sides of the issue, though, so I'm not sure there's any objective answer to it. -Gul --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
Yes, Steve, it's true that triggers do have much of the same functionality as this kind of proposal, but as you say, part of the proposal is to make these hopefully work with future non-SQL databases (although, admittedly, that's only a small piece of the puzzle). To be honest, my main drive is to have a nice, easy denormalisation API for Django; whether this means implementing triggers in all the database backends and using them behind this kind of API remains to be seen. The problems I can see with this at the moment are that: 1) If you change a DenormField, there's less writing migrations/fiddling with the database than if we managed it all in Django code (where you could just resync) 2) Triggers are a whole part of SQL Django just hasn't touched yet, and implementing support for them will take a reasonable amount of work in both debugging and testing for each backend. I'd really like to go ahead and implement this stuff 'vanilla' at first, relying on the tried-and-tested field and signal code, and then if it gets popular, add in the option to run denormalisations using triggers rather than inside Django. That way, we eventually end up with the best of both worlds, and get people using more efficient schemas first, which they can then easily bake into triggers when they come along, and if their DB supports them. Andrew Steve Holden wrote: > This appears to be a proposal to re-implement triggers inside Django. > > I can see there are benefits if the underlying DB platform won't support > triggers, but wouldn't triggers be the preferred solution when they're > available? That way there is no chance that changes can be made outside > the scope of the denormalization, and hence no need to recompute the > denormalized values. > > regards > Steve > --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
This appears to be a proposal to re-implement triggers inside Django. I can see there are benefits if the underlying DB platform won't support triggers, but wouldn't triggers be the preferred solution when they're available? That way there is no chance that changes can be made outside the scope of the denormalization, and hence no need to recompute the denormalized values. regards Steve Andrew Godwin wrote: > David Cramer wrote: > >> If you're not doing denormalization in your database, most likely >> you're doing something wrong. I really like the approach that is >> offered here. >> >> For me, personally, it would be great if this could accept callables >> as well. So you could store the username, like so, or you could store >> a choices field like: >> >> field = models.IntegerField(choices=CHOICES) >> denorm = models.DenormField('self', 'get_field_display') # which >> would rock if it was .field.display ;) >> >> You could also make it somehow accept querysets or something similar >> for things like count(). I see a lot of possibilities and am a bit >> disappointed I didn't come up with something this easy for my use- >> cases. >> >> > > The key is making sure you can listen for changes on whatever's at the > other end of your denormalisation. With my current snippet, it listens > for a save on the model the foreignkey points to, then checks for the > right ID; if we start accepting random querysets, then there has to be a > way to resolve that back to conditions the signal listener can understand. > > Still, with an > AggregateField(Sandwiches.filter(filling="cheese").count()) it's still > possible to work out that you want to listen on the Sandwiches model, > and you could then fall back to re-running the count on every Sandwich > save, even if it ends up not having a cheese filling. > > So, I think the best approach would be one to replicate fields (like my > current DenormField; perhaps call it CopyField or something) and one to > cache aggregates (an AggregateField, like above). > > Simon Willison wrote: > >> Just so it's on the record, I'd like any denormalisation tools in >> Django to include a mechanism for re-syncronizing them should >> something go awry (like direct updates being applied to the database >> without keeping the denormalised fields in sync). This mechanism could >> then be exposed as a ./manage.py command which could be called >> periodically to verify and fix any incorrect data. >> > Yes, this I very much agree with. The reason you always layer this stuff > on top of a pre-normalised database is because you can then rebuild the > data after playing with it externally. > > Doing so shouldn't be too much of a problem; have a management command > that loads the models, and then just executes the update method on each > of the denormalisationalish fields. > > Justin's idea of lazy updating is interesting, and probably quite doable > (as well as what most people will want by default on aggregate queries). > > I'm also hoping any kind of aggregate denormalisation will work with any > future extended aggregate support, but if the field just takes a normal > QuerySet, that might Just Work™. > > Andrew > > > > > > --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
David Cramer wrote: > If you're not doing denormalization in your database, most likely > you're doing something wrong. I really like the approach that is > offered here. > > For me, personally, it would be great if this could accept callables > as well. So you could store the username, like so, or you could store > a choices field like: > > field = models.IntegerField(choices=CHOICES) > denorm = models.DenormField('self', 'get_field_display') # which > would rock if it was .field.display ;) > > You could also make it somehow accept querysets or something similar > for things like count(). I see a lot of possibilities and am a bit > disappointed I didn't come up with something this easy for my use- > cases. > The key is making sure you can listen for changes on whatever's at the other end of your denormalisation. With my current snippet, it listens for a save on the model the foreignkey points to, then checks for the right ID; if we start accepting random querysets, then there has to be a way to resolve that back to conditions the signal listener can understand. Still, with an AggregateField(Sandwiches.filter(filling="cheese").count()) it's still possible to work out that you want to listen on the Sandwiches model, and you could then fall back to re-running the count on every Sandwich save, even if it ends up not having a cheese filling. So, I think the best approach would be one to replicate fields (like my current DenormField; perhaps call it CopyField or something) and one to cache aggregates (an AggregateField, like above). Simon Willison wrote: > Just so it's on the record, I'd like any denormalisation tools in > Django to include a mechanism for re-syncronizing them should > something go awry (like direct updates being applied to the database > without keeping the denormalised fields in sync). This mechanism could > then be exposed as a ./manage.py command which could be called > periodically to verify and fix any incorrect data. Yes, this I very much agree with. The reason you always layer this stuff on top of a pre-normalised database is because you can then rebuild the data after playing with it externally. Doing so shouldn't be too much of a problem; have a management command that loads the models, and then just executes the update method on each of the denormalisationalish fields. Justin's idea of lazy updating is interesting, and probably quite doable (as well as what most people will want by default on aggregate queries). I'm also hoping any kind of aggregate denormalisation will work with any future extended aggregate support, but if the field just takes a normal QuerySet, that might Just Work™. Andrew --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
If you're not doing denormalization in your database, most likely you're doing something wrong. I really like the approach that is offered here. For me, personally, it would be great if this could accept callables as well. So you could store the username, like so, or you could store a choices field like: field = models.IntegerField(choices=CHOICES) denorm = models.DenormField('self', 'get_field_display') # which would rock if it was .field.display ;) You could also make it somehow accept querysets or something similar for things like count(). I see a lot of possibilities and am a bit disappointed I didn't come up with something this easy for my use- cases. +1 from me! On Sep 22, 7:00 pm, Simon Willison <[EMAIL PROTECTED]> wrote: > On Sep 23, 12:21 am, "Justin Fagnani" <[EMAIL PROTECTED]> > wrote: > > > In my experience at least, denormalization occurs > > a lot and leaves a lot of room for mistakes, so it's something a > > framework should handle if possible. > > Just so it's on the record, I'd like any denormalisation tools in > Django to include a mechanism for re-syncronizing them should > something go awry (like direct updates being applied to the database > without keeping the denormalised fields in sync). This mechanism could > then be exposed as a ./manage.py command which could be called > periodically to verify and fix any incorrect data. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
On Sep 23, 12:21 am, "Justin Fagnani" <[EMAIL PROTECTED]> wrote: > In my experience at least, denormalization occurs > a lot and leaves a lot of room for mistakes, so it's something a > framework should handle if possible. Just so it's on the record, I'd like any denormalisation tools in Django to include a mechanism for re-syncronizing them should something go awry (like direct updates being applied to the database without keeping the denormalised fields in sync). This mechanism could then be exposed as a ./manage.py command which could be called periodically to verify and fix any incorrect data. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
I think any discussion of denormalization needs to include aggregation as well, since most uses will involve more than simply looking up one value form a related model. I also think this is something definitely to include in core. In my experience at least, denormalization occurs a lot and leaves a lot of room for mistakes, so it's something a framework should handle if possible. Ideally I'd like to see either a calculation field that can take aggregation expressions and options about how to store and synchronize. I imaging something like this syntax: class Order(models.Model): # updates the order total if the line items change subtotal = models.CalculationField(Sum('items.cost'), store=True, sync=True) # doesn't update the order's customer name if the customer's name changes customer = models.ForeignKey(Customer) customer_name = models.CalculationField(F('customer.name'), store=True, sync=False) # don't store a rarely used value. calculate on select weight = CalculationField(Sum('items.weight'), store=False) This approach would require some changes to the ORM and aggregation, but it'd be worth it since it'd make denormalization easy, flexible and less error-prone. Fields would need to be able to contribute more to queries and table creation, and we'd some simple need type inference for expressions/aggregation. If calculation support is added to all fields, or calculations have to have a declared type, then type inference wouldn't be necessary, but type inference for aggregation isn't hard. -Justin --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
Hi, this would be very useful for a lot of use cases. View counts that originally came from a foo viewed bar table with timestamps for every viewer, or storing some sort of sum calculation from a list of related objects. Basically all that stuff you would display in an "overview table" of some sort for which you would normally use aggregation wich gets very slow very fast on large data sets. I have written a generic denormalizer to store whole data sets in denormalized tables to speed up searching. It basically consists of a Metaclass that constructs a Model out of a values queryset and a base class that adds method to generate sql to keep the table up to date. This way most of the time you only have to write the values query set and for special cases you can still add "custom" Fields if you have to. The tables get updated periodically, but it would be quite possible to do this with signals instantly. A previous version even generated triggers to keep the tables updated, but this was dropped afterwards due to other reasons. Queryset refactor did help a lot to make this sort of stuff easier. Only quirk right now is that you have to use extra and "force" queryset.query to use aggregation wich you'll need very often when denormalizing. I think a django.contrib.denormalization app would be a very nice thingy. +1 from me ;). Regards --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Denormalisation, magic, and is it really that useful?
On Mon, Sep 22, 2008 at 4:03 PM, Andrew Godwin <[EMAIL PROTECTED]> wrote: > So, I'd love people's opinions about where I should take this - to a > dark corner and hide it, to a separate app like django-extensions (which > involves keeping the horrid hacks in there), into a separate app but > with patches to core to make it less hackish (i.e. more signals), or add > it as my 1.1 pony with the proviso that I'm happy to write all the code. Personally, I think it's best suited for a django-d13n (that's d13n for denormali(s|z)ation) app that stands on its own. That way, you can provide a host of other options as you see fit, have other people get onboard and provide whatever they use, and so on. I think there's definite merit to what you're doing, but I just don't see it ever being in core, unless it requires some freakish internal changes. If you do need changes to core, or you can at least frame them in a way that would provide wider benefits, you'll definitely get a fruitful discussion on it, at least. -Gul --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Denormalisation, magic, and is it really that useful?
So, hello everyone. I figure this list is the best place to ask this, but please feel free to deride me if not... After all the talk of multiple databases, and non-relational databases (bigtable, couchdb, etc.) that went on at DjangoCon and afterward,I've been thinking about denormali[s/z]ation and how to make it easier in Django; previously, I had thought denormalisation was something you did to ruin your database, but I can now see it can be additive as well. In this vein, last weekend I decided to see how easy it would be to write a magical DenormalisationField which handled all the synchronising of data from the relevant tables given a normal ForeignKey relationship; the results are at http://www.aeracode.org/2008/9/14/denormalisation-follies/ Basically, yes, it is possible, and a lot of people (eight! for my blog, that's a lot) seem to like the idea; it certainly saves having to write that replicate-on-save logic that otherwise goes with this kind of schema design. Given the fact that it seems like a reasonably common thing to do in large-scale applications, I'd like to know what people think about adding this kind of thing into core (while I have a standalone implementation, that needs some hacks I'd like to see gone). My main issues with this idea are: 1) It relies on having a ForeignKey pointing to the other model, so this might not work for non-relational databases (well, unless ForeignKeys are still allowed in that case, but they just perform expensive SELECTs as well as possibly printing abusive messages to stderr). 2) I'm not sure how common a use case this is; it sounds like it might be verging into the long tail a bit. 3) People also want other things denormalised - count()s, aggregate queries, etc. Of course, writing fields for these too is similar, and I'd be happy to do it, but it might mean that even if the case for denormalisation is good this is only a piece of the puzzle. So, I'd love people's opinions about where I should take this - to a dark corner and hide it, to a separate app like django-extensions (which involves keeping the horrid hacks in there), into a separate app but with patches to core to make it less hackish (i.e. more signals), or add it as my 1.1 pony with the proviso that I'm happy to write all the code. Andrew --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---