Re: Problems with concurrent DB access and get_or_create()

2008-07-21 Thread Ben Godfrey


Hi,

I'm using rev 7713 (I need to port to the new upload handling before I
can get back to trunk).

The code that actually adds to the table:

def do_update(self, event):
    if event.countable:
        matrix, new = self.get_or_create(date=event.time.date(),
                                         member=event.target, shop=event.shop,
                                         product=event.product)
        matrix.increment_column(event.type)
        matrix.save()

Ben

On Jul 16, 3:28 pm, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:
> What version are you running, and what's the exact line you're using
> for get_or_create?
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: Problems with concurrent DB access and get_or_create()

2008-07-16 Thread [EMAIL PROTECTED]

What version are you running, and what's the exact line you're using
for get_or_create?



Re: Problems with concurrent DB access and get_or_create()

2008-07-15 Thread Ben Godfrey

I'm experiencing the issue with concurrent writes on a low-traffic
site hosted on a single machine (DB and web server on the same box).
I'm logging reporting events in certain views, and every time I get
indexed by a search engine, this error floods into my inbox.

If I implement the workaround of hiding the multiple-rows-returned
exception, I will end up with multiple rows containing my data instead
of one. Until aggregate support arrives, this means I would have to
sum() the rows with some custom SQL.

I suppose I could hide the duplicates found by get_or_create and write
a script to periodically tidy them up into a single row. That seems
rather a hack, though.
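A minimal sketch of such a tidy-up script, under stated assumptions: the standard-library sqlite3 module stands in for the real database, and the `stats` table with its `day`, `member` and `hits` columns is hypothetical. It also assumes nothing else writes to the table while it runs (e.g. a cron job in a quiet period).

```python
import sqlite3


def merge_duplicates(conn):
    """Collapse rows that share (day, member) into the lowest-id row,
    summing their hit counts. Returns the number of rows deleted.

    Hypothetical schema: stats(id, day, member, hits).
    Assumes no concurrent writers while it runs.
    """
    deleted = 0
    with conn:  # one transaction: commit on success, roll back on error
        groups = conn.execute(
            "SELECT day, member, MIN(id), SUM(hits), COUNT(*) "
            "FROM stats GROUP BY day, member HAVING COUNT(*) > 1"
        ).fetchall()
        for day, member, keep_id, total, count in groups:
            # Fold every duplicate's count into the surviving row...
            conn.execute("UPDATE stats SET hits = ? WHERE id = ?", (total, keep_id))
            # ...then drop the other rows for the same key.
            conn.execute(
                "DELETE FROM stats WHERE day = ? AND member = ? AND id != ?",
                (day, member, keep_id),
            )
            deleted += count - 1
    return deleted
```

Keeping the lowest id means any existing references to the first row stay valid; the whole pass runs in one transaction so a failure leaves the table untouched.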

Regarding the earlier conversation from December: I think it's
important that this "just works," or that there is at least a note in
the get_or_create docs warning about this possibility. Otherwise the
same thing that happened to me will happen to others. They'll build
their software merrily and think it's working until one day traffic
reaches the level at which this problem occurs, and then they have to
modify their code.

Ben



Re: Problems with concurrent DB access and get_or_create()

2007-12-04 Thread Ivan Sagalaev

Travis Terry wrote:
> It's my understanding that SELECT ... FOR UPDATE only locks the rows
> that it reads (so you can be sure they can be updated or referenced
> later in the same transaction).  However, in the case of get_or_create(),
> there is no existing row

Oh... Indeed. Then this won't help here, agreed.




Re: Problems with concurrent DB access and get_or_create()

2007-12-04 Thread Ivan Sagalaev

Travis Terry wrote:
> 4. P2's get_or_create() tries to create the item.  One of two things 
> happens:
> a. a second item is created with the same parameters if this doesn't 
> violate a UNIQUE constraint
> b. the second create fails (because of a UNIQUE constraint) and 
> raises an exception

Actually there is a common solution to this problem that doesn't create
duplicates and doesn't fail on the second transaction. And as James
correctly noted, it works at the database level. The solution is a form
of SELECT called SELECT FOR UPDATE. When one transaction selects
something explicitly for update, any other transaction trying to do the
same thing will wait until the first one ends. That is, it works like a
lock for threads, but since it lives in the database it works across
multiple Python processes that otherwise know nothing about each other.

The good part is that SELECT FOR UPDATE is implemented in MySQL,
PostgreSQL and Oracle. I recall Malcolm once said that Adrian
expressed a desire to have this in Django and that it might happen after
the queryset refactoring. Malcolm, Adrian, please confirm whether this is
correct or I'm just hallucinating :-)




Re: Problems with concurrent DB access and get_or_create()

2007-12-04 Thread Marty Alchin

On Dec 4, 2007 2:21 PM, Travis Terry <[EMAIL PROTECTED]> wrote:
> exception escape, then everywhere I call get_or_create() will have to
> implement a very non-DRY piece of identical code to handle the situation.

I won't get into the issue of whether this should or shouldn't be in
core, but you most certainly don't need any non-DRY code anywhere you
use get_or_create(). Remember, it's a manager method, and you're
already able to override the default manager. Sure, it's still more
overhead in your code than if Django did it for you, but there's no
need to be melodramatic.

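from django.db import IntegrityError, models

class ConcurrentManager(models.Manager):
    def get_or_create(self, **kwargs):
        try:
            return super(ConcurrentManager, self).get_or_create(**kwargs)
        except IntegrityError:
            # Another process created the row between our get() and
            # create(); fetch that row. get() doesn't accept 'defaults'.
            lookup = dict(kwargs)
            lookup.pop('defaults', None)
            # Keep the (object, created) return shape of get_or_create().
            return self.get(**lookup), False

class MyModel(models.Model):
    # fields go here
    objects = ConcurrentManager()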

You can set up the manager code once, then simply import it and slap
it on whichever models you expect to have concurrency problems. Then
all the rest of your code behaves as if it were included in trunk. And
if you're already using custom managers, just add the above method to
them and enjoy the ride.

That is, unless you're using get_or_create() on a QuerySet instead of
the model's manager, at which point you might have to wait for the
queryset-refactor, since that will make custom QuerySet classes easier
to work with.

Hope this helps!

-Gul




Re: Problems with concurrent DB access and get_or_create()

2007-12-04 Thread Jordan Levy
On Dec 4, 2007 11:21 AM, Travis Terry <[EMAIL PROTECTED]> wrote:

> Otherwise, if it lets the exception escape, then everywhere I call
> get_or_create() will have to implement a very non-DRY piece of identical
> code to handle the situation.
>
> Travis
>


Couldn't you implement your own DRY solution?  Write a decorator for the
get_or_create() method that catches exceptions thrown during the call to
create() and calls get() again.
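Jordan's decorator idea can be sketched without Django. Assumptions: the standard-library sqlite3 module stands in for the ORM, `TagManager`, `refetch_on_conflict` and the `before_insert` hook are hypothetical names, and a UNIQUE column plays the role of the database constraint. A Django version would presumably catch `django.db.IntegrityError` rather than `sqlite3.IntegrityError`.

```python
import functools
import sqlite3


def refetch_on_conflict(method):
    """Decorate a get_or_create-style method that returns (obj, created).

    If the create step trips a UNIQUE constraint, assume a concurrent
    writer inserted the same row first and fetch that row instead.
    The wrapped method must be called with keyword arguments only.
    """
    @functools.wraps(method)
    def wrapper(self, **lookup):
        try:
            return method(self, **lookup)
        except sqlite3.IntegrityError:
            return self.get(**lookup), False
    return wrapper


class TagManager:
    """Hypothetical stand-in for a model manager, backed by sqlite3."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
        )
        self.before_insert = None  # test hook: simulates a concurrent writer

    def get(self, name):
        row = self.conn.execute(
            "SELECT id FROM tag WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise LookupError(name)
        return row[0]

    @refetch_on_conflict
    def get_or_create(self, name):
        try:
            return self.get(name), False
        except LookupError:
            pass
        if self.before_insert is not None:
            self.before_insert()
        with self.conn:  # commit on success, roll back on IntegrityError
            cur = self.conn.execute("INSERT INTO tag (name) VALUES (?)", (name,))
        return cur.lastrowid, True
```

The undecorated method stays the plain get-then-create; all the race handling lives in the decorator, so it can be applied to several managers without repeating the try/except.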

Jordan




Problems with concurrent DB access and get_or_create()

2007-12-04 Thread Travis Terry

I've run into a problem with get_or_create() with respect to concurrent
access to the DB, and I have looked at the list archives for advice.  I
found some discussions a while back regarding others' problems, but no
acceptable solution was ever implemented.  I have another proposed
solution that I thought I should throw out there to see if anyone likes
it better.  First, let me restate the problem:

Two threads/processes/servers (let's call them P1 and P2) need to
concurrently create a unique object:
1. P1 calls get_or_create(), which tries to get the item (it doesn't exist).
2. P2 calls get_or_create(), which tries to get the same item (it
   doesn't exist).
3. P1's get_or_create() tries to create the item (this works and returns
   the item).
4. P2's get_or_create() tries to create the item.  One of two things
   happens:
   a. a second item is created with the same parameters, if this doesn't
      violate a UNIQUE constraint
   b. the second create fails (because of a UNIQUE constraint) and
      raises an exception

In the case of 4a, a future get() or get_or_create() call will raise an
assertion error because multiple objects are returned.  In the case of
4b, the caller will need to catch the exception and (since the exception
probably means there was a concurrent create) most likely try to get the
object again.
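The interleaving in steps 1-4 can be replayed deterministically with the standard-library sqlite3 module. This is a sketch under stated assumptions: the hypothetical `matrix` table stands in for the model, and a single connection executes P1's and P2's steps in the listed order rather than running two real processes.

```python
import sqlite3


def replay_race(unique):
    """Replay steps 1-4 on one connection; return (step 4 outcome, row count)."""
    conn = sqlite3.connect(":memory:")
    constraint = ", UNIQUE (day, member)" if unique else ""
    conn.execute(
        "CREATE TABLE matrix (id INTEGER PRIMARY KEY, day TEXT, member TEXT"
        + constraint + ")"
    )
    key = ("2007-12-04", "alice")
    select = "SELECT id FROM matrix WHERE day = ? AND member = ?"
    insert = "INSERT INTO matrix (day, member) VALUES (?, ?)"

    p1_rows = conn.execute(select, key).fetchall()  # 1. P1's get(): nothing
    p2_rows = conn.execute(select, key).fetchall()  # 2. P2's get(): nothing
    assert p1_rows == [] and p2_rows == []
    conn.execute(insert, key)  # 3. P1's create() succeeds
    try:
        conn.execute(insert, key)  # 4. P2's create()
        outcome = "duplicate row"  # case 4a: no UNIQUE constraint
    except sqlite3.IntegrityError:
        outcome = "IntegrityError"  # case 4b: the constraint rejects it
    rows = len(conn.execute(select, key).fetchall())
    return outcome, rows
```

Without the constraint, the table ends up with two rows for the same key (4a); with it, the second insert raises and only one row remains (4b).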

Previous proposals to address this issue involved adding either a thread 
lock or a DB table lock around the get_or_create() call.  Both of these 
are unacceptable.  The thread lock does nothing to prevent the problem 
when using multiple front-end servers, and the DB lock is just plain bad 
for performance.
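To illustrate why the thread-lock proposal only covers a single process, here is a minimal sketch with hypothetical names, where an in-memory dict stands in for the table. The lock serializes get-then-create for threads inside one Python interpreter, but another process or front-end server has its own copy of `_lock` and can still race.

```python
import threading

# Hypothetical names: an in-process lock and a dict standing in for the table.
_lock = threading.Lock()
_table = {}


def locked_get_or_create(key):
    """Serialize get-then-create so only one thread can create `key`.

    The lock exists only inside this Python process; a second process or
    front-end server has its own _lock and can still race.
    """
    with _lock:
        if key in _table:
            return _table[key], False
        obj = _table[key] = {"key": key}
        return obj, True
```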

It seems reasonable to require that the model be designed with
unique_together=(...) on the fields that are used in the get_or_create()
lookup.  This will allow the DB to prevent duplicates from being created.
Thus the only code change needed to make get_or_create() always return the
correct object is to call get() again in the event of an exception from
create().

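Pseudo-code
-----------
def get_or_create(**kwargs):
    try:
        obj = get(**kwargs)
    except DoesNotExist:
        try:
            obj = create(**kwargs)
        except IntegrityError:
            # A concurrent create won the race (the UNIQUE constraint
            # rejected ours), so fetch the row it created.
            obj = get(**kwargs)
    return obj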
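A runnable version of the get / create / get-again scheme, as a sketch under stated assumptions: the standard-library sqlite3 module replaces the ORM, a UNIQUE column plays the role of unique_together, and `DoesNotExist`, `setup`, `create` and the `before_create` hook are hypothetical names. The hook lets a test insert the row "concurrently" between the first get() and the create().

```python
import sqlite3


class DoesNotExist(Exception):
    """Hypothetical stand-in for Model.DoesNotExist."""


def setup():
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint plays the role of unique_together.
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    return conn


def get(conn, name):
    row = conn.execute("SELECT id FROM item WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise DoesNotExist(name)
    return row[0]


def create(conn, name):
    with conn:  # commit on success, roll back on error
        return conn.execute("INSERT INTO item (name) VALUES (?)", (name,)).lastrowid


def get_or_create(conn, name, before_create=None):
    """Return (id, created), following the get / create / get-again scheme."""
    try:
        return get(conn, name), False
    except DoesNotExist:
        pass
    if before_create is not None:
        before_create()  # test hook: lets a "concurrent" writer win the race
    try:
        return create(conn, name), True
    except sqlite3.IntegrityError:
        # Someone created the row after our first get(); fetch theirs.
        return get(conn, name), False
```

One caveat: on PostgreSQL a failed INSERT aborts the surrounding transaction, so a real implementation would need to roll back (for example to a savepoint) before the second get(); in this sketch the `with conn:` block performs that rollback.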

This solution is based on the following assumptions:

1. We always want get_or_create() to return the object we're looking for.
2. MOST of the time the object will exist, so calling get() first gives the
highest performance.
3. Occasionally the object will not exist and may be created
concurrently by multiple threads/processes/servers.  In this case the
second get() is no more expensive than the get() the caller would have
to make anyway when handling the exception.

This solution has no performance penalty in the "normal" case and takes
full advantage of the DB's data-integrity enforcement.

If this solution is favorable, I'll create a ticket with the patch and 
tests.

Travis

