Re: Problems with concurrent DB access and get_or_create()
Hi,

I'm using rev 7713 (I need to port to the new upload handling before I can get back to trunk). The code that actually adds to the table:

    def do_update(self, event):
        if event.countable:
            matrix, new = self.get_or_create(date=event.time.date(),
                                             member=event.target,
                                             shop=event.shop,
                                             product=event.product)
            matrix.increment_column(event.type)
            matrix.save()

Ben

On Jul 16, 3:28 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> What version are you running, and what's the exact line you're using
> for get_or_create?

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---
Re: Problems with concurrent DB access and get_or_create()
What version are you running, and what's the exact line you're using for get_or_create?
Re: Problems with concurrent DB access and get_or_create()
I'm experiencing the issue with concurrent writes on a low-traffic site hosted on a single machine that runs both the DB and the web server. I'm logging reporting events in certain views, and every time I get indexed by a search engine, this error floods into my inbox.

If I implement the workaround of hiding the multiple-rows-returned exception, I will end up with multiple rows containing my data instead of one. Until aggregate support comes around, this means I would have to sum() the rows with some custom SQL. I suppose I could hide duplicates found by get_or_create and write a script to periodically tidy up duplicates into a single row. That seems rather a hack, though.

Regarding the prior conversation from December: I think it's important that this "just works," or that there is at least a note warning about this possibility in the get_or_create docs. Otherwise the same thing that happened to me will happen to others. They'll build their software merrily and think it's working until one day traffic hits the level at which this problem occurs, and then they have to modify code.

Ben
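The two stopgaps mentioned above (summing across duplicate rows when reading, and periodically folding duplicates into one row) could look roughly like the sketch below. It uses the standard sqlite3 module rather than Django, and the `matrix` table, its `day`, `member` and `views` columns, and the sample data are all hypothetical stand-ins, not the real schema. The merge is not safe against writers running at the same moment, so treat it as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no UNIQUE constraint, so duplicates can appear.
conn.execute(
    "CREATE TABLE matrix (id INTEGER PRIMARY KEY, day TEXT, member TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO matrix (day, member, views) VALUES (?, ?, ?)",
    [("2000-01-01", "ben", 3), ("2000-01-01", "ben", 2), ("2000-01-01", "ann", 1)],
)

# Reading: sum across duplicates instead of assuming one row per key.
totals = {
    (day, member): views
    for day, member, views in conn.execute(
        "SELECT day, member, SUM(views) FROM matrix GROUP BY day, member"
    )
}


def merge_duplicates(conn):
    """Fold rows sharing (day, member) into the lowest-id row, summing views."""
    with conn:  # one transaction: commit on success, roll back on error
        # Give each surviving row the total of its whole group.
        conn.execute(
            "UPDATE matrix SET views = ("
            " SELECT SUM(m.views) FROM matrix m"
            " WHERE m.day = matrix.day AND m.member = matrix.member)"
            " WHERE id IN (SELECT MIN(id) FROM matrix GROUP BY day, member)"
        )
        # Remove every row that is not the survivor of its group.
        conn.execute(
            "DELETE FROM matrix WHERE id NOT IN"
            " (SELECT MIN(id) FROM matrix GROUP BY day, member)"
        )
```

Calling `merge_duplicates(conn)` leaves one row per (day, member) pair. A UNIQUE constraint on the key columns would prevent the duplicates in the first place.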
Re: Problems with concurrent DB access and get_or_create()
Travis Terry wrote:
> It's my understanding that SELECT ... FOR UPDATE only locks the rows
> that it reads (so you can be sure they can be updated or referenced
> later in the same transaction). However, in the case of get_or_create(),
> there is no existing row

Oh... Indeed. Then this won't help here, agreed.
Re: Problems with concurrent DB access and get_or_create()
Travis Terry wrote:
> 4. P2's get_or_create() tries to create the item. One of two things
> happens:
> a. a second item is created with the same parameters if this doesn't
> violate a UNIQUE constraint
> b. the second create fails (because of a UNIQUE constraint) and
> raises an exception

Actually, there is a common solution to this problem that neither creates duplicates nor fails on the second transaction. And, as James correctly noted, it works at the database level. The solution is a form of SELECT called SELECT FOR UPDATE. When one transaction selects something explicitly for update, any other transaction trying to do the same thing will wait until the first one ends. That is, it works like a lock for threads, but since it lives in the database it also works for multiple Python processes that otherwise know nothing about each other.

The good part is that SELECT FOR UPDATE is implemented in MySQL, PostgreSQL and Oracle. I recall Malcolm once said that Adrian expressed a desire to have this in Django, and that it might happen after the queryset refactoring. Malcolm, Adrian, please confirm whether this is correct or I'm just hallucinating :-)
Re: Problems with concurrent DB access and get_or_create()
On Dec 4, 2007 2:21 PM, Travis Terry <[EMAIL PROTECTED]> wrote:
> exception escape, then everywhere I call get_or_create() will have to
> implement a very non-DRY piece of identical code to handle the situation.

I won't get into the issue of whether this should or shouldn't be in core, but you most certainly don't need any non-DRY code anywhere you use get_or_create(). Remember, it's a manager method, and you're already able to override the default manager. Sure, it's still more overhead in your code than if Django did it for you, but there's no need to be melodramatic.

    from django.db import models

    class ConcurrentManager(models.Manager):
        def get_or_create(self, **kwargs):
            try:
                return super(ConcurrentManager, self).get_or_create(**kwargs)
            except:
                # Assume a concurrent create won the race: fetch that row.
                # 'defaults' is not a lookup, so drop it before calling get().
                kwargs.pop('defaults', None)
                # Keep the (object, created) return shape of get_or_create().
                return self.get(**kwargs), False

    class MyModel(models.Model):
        # fields go here
        objects = ConcurrentManager()

You can set up the manager code once, then simply import it and slap it on whichever models you expect to have concurrency problems. Then all the rest of your code behaves as if it were included in trunk. And if you're already using custom managers, just add the above method to them and enjoy the ride.

That is, unless you're using get_or_create() on a QuerySet instead of the model's manager, at which point you might have to wait for the queryset-refactor, since that will make custom QuerySet classes easier to work with.

Hope this helps!

-Gul
Re: Problems with concurrent DB access and get_or_create()
On Dec 4, 2007 11:21 AM, Travis Terry <[EMAIL PROTECTED]> wrote:
> Otherwise, if it lets the exception escape, then everywhere I call
> get_or_create() will have to implement a very non-DRY piece of identical
> code to handle the situation.
>
> Travis
>

Couldn't you implement your own DRY solution? Write a decorator for the get_or_create() method that catches the exception thrown during the call to create() and calls get() again.

Jordan
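A minimal sketch of the decorator idea above, in plain Python with no Django dependency. The names `retry_with_get`, `store`, `get` and `flaky_get_or_create` are hypothetical, and the in-memory dict only stands in for a database table. Catching every `Exception` is an assumption made to keep the sketch short; it would also hide unrelated errors, so real code would more likely catch only the database's integrity error.

```python
import functools


def retry_with_get(get):
    """Build a decorator that falls back to get(**kwargs) when the wrapped call raises."""
    def decorator(get_or_create):
        @functools.wraps(get_or_create)
        def wrapper(**kwargs):
            try:
                return get_or_create(**kwargs)
            except Exception:
                # Assume the failure was a concurrent create and look the
                # object up again; report created=False because we did not
                # create it ourselves.
                return get(**kwargs), False
        return wrapper
    return decorator


# Toy stand-in for a table, keyed by the lookup arguments.
store = {}


def get(**kwargs):
    return store[tuple(sorted(kwargs.items()))]


@retry_with_get(get)
def flaky_get_or_create(**kwargs):
    # Simulate losing the race: a rival stores the object first,
    # then our own create fails.
    store[tuple(sorted(kwargs.items()))] = dict(kwargs)
    raise RuntimeError("duplicate key")


obj, created = flaky_get_or_create(name="widget")
```

Here `obj` is the rival's object and `created` is `False`, so callers that unpack the usual two-value result keep working.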
Problems with concurrent DB access and get_or_create()
I've run into a problem with get_or_create() with respect to concurrent access of the DB, and I have looked at the list archives for advice. I found some discussions a while back regarding others' problems, but no acceptable solution was ever implemented. I have another proposed solution that I thought I should throw out there to see if anyone likes it better.

First, let me restate the problem: two threads/processes/servers (let's call them P1 and P2) need to concurrently create a unique object.

1. P1 calls get_or_create(), which tries to get the item (it doesn't exist)
2. P2 calls get_or_create(), which tries to get the same item (it doesn't exist)
3. P1's get_or_create() tries to create the item (this works and returns the item)
4. P2's get_or_create() tries to create the item. One of two things happens:
   a. a second item is created with the same parameters if this doesn't violate a UNIQUE constraint
   b. the second create fails (because of a UNIQUE constraint) and raises an exception

In the case of 4a, a future get() or get_or_create() call will assert because multiple values have been returned. In the case of 4b, the caller will need to catch the exception and (since the exception probably means there was a concurrent create) most likely try to get the object again.

Previous proposals to address this issue involved adding either a thread lock or a DB table lock around the get_or_create() call. Both of these are unacceptable. The thread lock does nothing to prevent the problem when using multiple front-end servers, and the DB lock is just plain bad for performance.

It seems reasonable to require that the model be designed with unique_together=(...) on the fields that are used in the get_or_create() call. This will allow the DB to prevent duplicates from being created. Thus the only code change needed to make get_or_create() always return the correct object is to call get() again in the event of an exception from create().

Pseudo-code -

    def get_or_create(**kwargs):
        try:
            obj = get(**kwargs)
        except:
            # Not found: try to create it.
            try:
                obj = create(**kwargs)
            except:
                # Create failed, probably a concurrent create: get it again.
                obj = get(**kwargs)
        return obj

This solution is based on the following assumptions:

1. We always want get_or_create() to return the object we're looking for.
2. MOST of the time the object will exist, so calling get() first gives the best performance.
3. Occasionally the object will not exist and may be created concurrently by multiple threads/processes/servers. In this case the second get() is no more expensive than the get() the caller would have to make anyway when handling the exception.

This solution has no performance penalty in the "normal" case and takes full advantage of the DB's data integrity enforcement.

If this solution is favorable, I'll create a ticket with the patch and tests.

Travis
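The get, create, get-again sequence proposed above can be tried outside Django with the standard sqlite3 module. The sketch below is only an illustration under stated assumptions: the `item` table, its `name` column, and the `on_miss` hook (used here purely to simulate a rival writer slipping in between the lookup and the insert) are hypothetical and not part of any Django API.

```python
import sqlite3


def get_or_create(conn, name, on_miss=None):
    """Return (row_id, created) for name, retrying the lookup if the insert loses a race."""
    def get():
        row = conn.execute("SELECT id FROM item WHERE name = ?", (name,)).fetchone()
        return None if row is None else row[0]

    row_id = get()
    if row_id is not None:
        return row_id, False  # the common case: the row already exists

    if on_miss is not None:
        on_miss()  # hypothetical hook: lets a test simulate a concurrent writer

    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected our insert, so someone else
        # created the row after our first lookup. Fetch theirs.
        row_id = get()
        if row_id is None:
            raise  # the failure was something other than a lost race
        return row_id, False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def rival():
    # Plays the part of P1 winning the race for "gadget".
    conn.execute("INSERT INTO item (name) VALUES ('gadget')")
    conn.commit()


first = get_or_create(conn, "widget")
again = get_or_create(conn, "widget")
raced = get_or_create(conn, "gadget", on_miss=rival)
```

In this run `first` reports a newly created row, `again` finds the same row, and `raced` returns the rival's row with `created` set to `False` instead of raising. Without the UNIQUE constraint the second insert would succeed and leave a duplicate, which is case 4a above.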