Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Claudiu Saftoiu
I wonder: if you have a problem which an SQL database would be so good for that
you're mimicking an SQL database with ZODB, why not just use an SQL database? It
doesn't sound like you'll gain much from being able to persist objects, which is
one of the main reasons to use an object database...


On Aug 18, 2013, at 12:17 PM, Christian Tismer  wrote:

> On 18.08.13 17:09, Jim Fulton wrote:
>> On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  
>> wrote:
>> 
> 
> Explaining very concisely, now.
> 
>> I don't think I/we understand your problem well enough to answer. If data 
>> has a very low shelf life, then replacing it frequently might make sense. If 
>> the schema changes that frequently, I'd ask why. If this is a data analysis 
>> application, you might be better served by tools designed for that.
>>> Is Python still the way to go, or should I stop this and use something like
>>> PostgreSQL? (And I doubt that this would give a benefit, actually).
>> Ditto,
>> 
>>> Would you implement a column store, and how would you do that?
>> Ditto.
>> 
>>> Right now, everything gets too large, and I'm quite desperate. Therefore,
>>> I'm asking the master, which you definitely are!
>> "large" can mean many things. The examples you give don't
>> seem very large in terms of storage, at least not for ZODB.
>> 
>> Beyond that there are lots of dimensions of scale that ZODB
>> doesn't handle well (e.g. large transaction rates, very
>> high availability).
>> 
>> It's really hard to make specific recommendations without
>> knowing more about the problem. (And it's likely that someone
>> wouldn't be able to spend the time necessary to learn more
>> about the problem without a stake in it. IOW, don't assume I'll
>> read a much longer post getting into details. :)
>> 
> 
> Ok, just the sketch of it to make things clearer, don't waste time on this ;-)
> 
> We get a medication prescription database in a certain serialized format
> which is standard in Germany for all pharmacy support companies.
> 
> This database comes in ~25 files == tables in a zip file every two weeks.
> The DB is actually a structured set of SQL tables with references et al.
> 
> I actually did not want to change the design and simply created the table
> structure that they have, using ZODB, with tables as btrees that contain
> tuples for the records, so this is basically the SQL model, mimicked in Zodb.
> 
> What is tedious is the fact that the database gets incremental updates all 
> the time:
> changed prices, packaging info, etc.
> We need to cope with millions of prescriptions that come from certain dates
> and therefore need to query different versions of the database.
> 
> I just hate the huge redundancy that these database versions would have
> and tried to find a way to put this all into a single Zodb with a way to
> time-travel to every version.
> 
> The weird thing is that the DB also changes its structure over time:
> 
> - new fields are added, old fields dropped.
> 
> That's the reason why I thought to store the tables by column, with each
> column being a BTree of its own. Is that feasible at all?
> 
> Of the 25 tables, there are 4 quite large, like
> 4 tables x 500,000 rows x 100 columns,
> == 200,000,000 cells in one database.
> 
> With a btree bucket size of ~60, this gives ~ 3,333,333 buckets.
> With multiple versions, this will be even more.
> 
> -- Can Zodb handle so many objects and still open the db fast?
> -- Or will the huge index kill performance?
> 
> That's all I'm asking before doing another experiment ;-)
> 
> but don't waste time, just telling you the story -- chris
> 
> -- 
> Christian Tismer :^)   
> Software Consulting  : Have a break! Take a ride on Python's
> Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
> 14482 Potsdam: PGP key -> http://pgp.uni-mainz.de
> phone +49 173 24 18 776  fax +49 (30) 700143-0023
> PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
>  whom do you want to sponsor today?   http://www.stackless.com/
> 
> ___
> For more information about ZODB, see http://zodb.org/
> 
> ZODB-Dev mailing list  -  ZODB-Dev@zope.org
> https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Cache warm up time

2013-03-08 Thread Claudiu Saftoiu
On Fri, Mar 8, 2013 at 12:31 PM, Leonardo Santagada wrote:

>
> On Fri, Mar 8, 2013 at 2:17 PM, Claudiu Saftoiu wrote:
>
>> Once I know the difference I'll probably be able to answer this myself,
>> but I wonder why the ZEO server doesn't do the sort of caching that allows
>> the client to operate so quickly on the indices once they are loaded.
>
>
> IIRC ZEO doesn't just take bytes from the storage and put them on a socket;
> it has a fairly heavy protocol for sending objects, with overhead on each
> object, so lots of small objects (totalling 400mb) take a lot more time to
> send than a single 400mb blob.
>

Ah, that would make perfect sense. So ZEO and catalog indices really don't
mix well at all.
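A rough back-of-envelope illustrating that point (all numbers here are assumptions for the sketch, not measurements from this thread):

```python
# Why 400MB of small index objects loads much slower over ZEO than one
# 400MB blob: each object load costs a request/response round trip.
RTT = 0.001                 # assumed ~1ms per object round trip (local net)
OBJ_SIZE = 4 * 1024         # assumed ~4KB average pickle (e.g. a BTree bucket)
TOTAL = 400 * 1024 * 1024   # 400MB of index data

n_objects = TOTAL // OBJ_SIZE              # 102,400 separate object loads
round_trip_cost = n_objects * RTT          # ~102s spent just on round trips
stream_cost = TOTAL / (100 * 1024 * 1024)  # ~4s at an assumed 100MB/s stream

# In this sketch the per-object overhead dominates by roughly 25x.
```

That ratio is consistent with the observation in the thread that a warm client cache (no round trips at all) feels instant while a freshly restarted client does not.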


Re: [ZODB-Dev] Cache warm up time

2013-03-08 Thread Claudiu Saftoiu
Hmm alright, that makes sense. If all the ZEO server does is read from disk
and write to a socket, and the reads from disk are cached by the OS, then I
see how it would be redundant for ZEO to cache those bytes itself as well.

However, I still don't understand the following situation. Say you have
Roche's case (and my case a few weeks ago) where, when you restart a
client, it takes a while to load big catalog indexes. I am assuming that
the client and the ZEO server are both running on the same machine and that
the .fs file is entirely in the OS's disk cache. Why is it that when the
client restarts, it takes a while to load the big catalog index, whereas
once the client has loaded it into its own local cache, it can use it
almost instantly? It seems that in the former case the bytes are being read
from the OS Disk Cache and sent over a local socket whereas in the second
case the bytes are being read from the client's memory cache. Shouldn't
those two have comparable speeds?

Once I know the difference I'll probably be able to answer this myself, but
I wonder why the ZEO server doesn't do the sort of caching that allows the
client to operate so quickly on the indices once they are loaded.

Thanks if you take the time to explain it to me. I feel like my knowledge
is lacking here and that impacts my ability to write efficient zodb
programs.
- Claudiu


On Fri, Mar 8, 2013 at 12:05 PM, Leonardo Rochael Almeida <
leoroch...@gmail.com> wrote:

> The ZEO server *has* a cache already: it's called the Operating System
> Disk Cache.
>
> Remember: other than coordinating transactions, the only thing ZEO
> really does is to sling the bytes of the pickles (and eventually the
> bytes of blobs) across the ZEO connection to the client.
>
> Any memory allocation that ZEO did to keep these bytes around in
> memory would subtract from the memory available to the OS to do disk
> caching of these exact same bytes.
>
> (it could perhaps be made a bit more efficient if the underlying
> FileStorage used mmap() instead of stdio calls to read these bytes.
> This would eliminate a temporary copy of the data that ZEO does
> between reading the bytes and sending them across the network).
>
> Cheers,
>
> Leo
>
> On Fri, Mar 8, 2013 at 1:16 PM, Claudiu Saftoiu 
> wrote:
> > I'd be curious to know what your results are, whichever path you decide
> to
> > take! Might help inform me as to what might help on my server...
> >
> > One thing I haven't yet understood is - how come the ZEO server itself
> > doesn't have a cache? It seems that would be a logical place to put one
> as
> > the ZEO server generally rarely gets restarted, at least for the use
> case of
> > running both the ZEO server and the clients on the same machine.
> >
> >
> > On Fri, Mar 8, 2013 at 1:46 AM, Roché Compaan <
> ro...@upfrontsystems.co.za>
> > wrote:
> >>
> >> Thanks, there are definitely some settings relating to the persistent
> >> cache that I haven't tried before, simply because I've been avoiding
> >> them.
> >>
> >> I'd still be interested to know if one can leverage the Relstorage
> >> memcache code for a ZEO cache, so if Shane doesn't get around to it
> >> I'll have a stab at it myself. Loading objects from a persistent cache
> >> will still cause IO so to me it seems that it would be a big win to
> >> keep the cache in memory even while restarting.
> >>
> >> --
> >> Roché Compaan
> >> Upfront Systems   http://www.upfrontsystems.co.za
> >>
> >>
> >>
> >>
> >> On Thu, Mar 7, 2013 at 9:35 PM, Leonardo Rochael Almeida
> >>  wrote:
> >> > This mail from Jim on this list a couple of years ago was chock-full
> >> > of nice tips:
> >> >
> >> > https://mail.zope.org/pipermail/zodb-dev/2011-May/014180.html
> >> >
> >> > In particular:
> >> >
> >> > - Yes, use a persistent cache. Recent versions are reliable. Make it as
> >> > large as reasonable (e.g. at most the size of your packed database, at
> >> > least the size of objects that you want to be around after a restart).
> >> >
> >> > - Consider using zc.zlibstorage to compress the data that's stored in
> >> > ZODB
> >> >
> >> > - set drop-cache-rather-verify to true on the client (avoid long
> >> > restart time where your client is revalidating the ZEO cache)
> >> >
> >> > - set invalidation-age on the server to at least an hour or two so
> >> >   that you deal with being disconnected from the storage server for a
> >> >   reasonable period of time without having to verify.

Re: [ZODB-Dev] Cache warm up time

2013-03-08 Thread Claudiu Saftoiu
I'd be curious to know what your results are, whichever path you decide to
take! Might help inform me as to what might help on my server...

One thing I haven't yet understood is - how come the ZEO server itself
doesn't have a cache? It seems that would be a logical place to put one as
the ZEO server generally rarely gets restarted, at least for the use case
of running both the ZEO server and the clients on the same machine.

On Fri, Mar 8, 2013 at 1:46 AM, Roché Compaan wrote:

> Thanks, there are definitely some settings relating to the persistent
> cache that I haven't tried before, simply because I've been avoiding
> them.
>
> I'd still be interested to know if one can leverage the Relstorage
> memcache code for a ZEO cache, so if Shane doesn't get around to it
> I'll have a stab at it myself. Loading objects from a persistent cache
> will still cause IO so to me it seems that it would be a big win to
> keep the cache in memory even while restarting.
>
> --
> Roché Compaan
> Upfront Systems   http://www.upfrontsystems.co.za
>
>
>
>
> On Thu, Mar 7, 2013 at 9:35 PM, Leonardo Rochael Almeida
>  wrote:
> > This mail from Jim on this list a couple of years ago was chock-full
> > of nice tips:
> >
> > https://mail.zope.org/pipermail/zodb-dev/2011-May/014180.html
> >
> > In particular:
> >
> > - Yes, use a persistent cache. Recent versions are reliable. Make it as
> > large as reasonable (e.g. at most the size of your packed database, at
> > least the size of objects that you want to be around after a restart).
> >
> > - Consider using zc.zlibstorage to compress the data that's stored in
> ZODB
> >
> > - set drop-cache-rather-verify to true on the client (avoid long
> > restart time where your client is revalidating the ZEO cache)
> >
> > - set invalidation-age on the server to at least an hour or two so
> >   that you deal with being disconnected from the storage server for a
> >   reasonable period of time without having to verify.
> >
> > Cheers,
> >
> > Leo
> >
> > On Thu, Mar 7, 2013 at 3:54 PM, Roché Compaan
> >  wrote:
> >> We have a setup that is running just fine when the caches are warm but
> >> it takes several minutes after a restart before the cache warms up.
> >> As per usual, big catalog indexes seem to be the problem.
> >>
> >> I was wondering about two things. Firstly, in 2011 in this thread
> >> https://mail.zope.org/pipermail/zodb-dev/2011-October/014398.html
> >> about zeo.memcache, Shane said that he could adapt the caching code in
> >> RelStorage for ZEO. Shane do you still plan to do this? Do you think
> >> an instance can restart without having to reload most objects into the
> >> cache?
> >>
> >> Secondly, I was wondering to what extent using persistent caches can
> >> improve cache warm up time and if persistent caches are usable or not,
> >> given that at various times in the past, it was recommended that one
> >> try and avoid them.
> >>
> >> --
> >> Roché Compaan
> >> Upfront Systems   http://www.upfrontsystems.co.za
>
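Pulled together, those tips correspond roughly to configuration like the following (addresses, paths, and sizes are illustrative, not taken from the thread):

```
# ZEO client side:
<zeoclient>
  server localhost:8100
  client index                   # enables the persistent cache file
  var /path/to/zeocache          # where index-1.zec will be kept
  cache-size 3000MB
  drop-cache-rather-verify true  # drop stale cache instead of a long verify
</zeoclient>

# ZEO server side (zeo.conf):
<zeo>
  address 8100
  invalidation-age 7200          # seconds; ride out ~2h disconnects
</zeo>
```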


Re: [ZODB-Dev] Cache warm up time

2013-03-07 Thread Claudiu Saftoiu
I was having this same issue. Persistent caching helped a little bit but not
too much. I didn't end up implementing this but ultimately the best thing
to do
seemed to be to have a different server with a different zodb that only
handles indexing. That way it will never restart and lose its cache. The
downside
is you have to figure out a way to communicate between the two servers.
Maybe a clean way would be to use the multiprocessing module somehow.

On Thu, Mar 7, 2013 at 1:54 PM, Roché Compaan wrote:

> We have a setup that is running just fine when the caches are warm but
> it takes several minutes after a restart before the cache warms up.
> As per usual, big catalog indexes seem to be the problem.
>
> I was wondering about two things. Firstly, in 2011 in this thread
> https://mail.zope.org/pipermail/zodb-dev/2011-October/014398.html
> about zeo.memcache, Shane said that he could adapt the caching code in
> RelStorage for ZEO. Shane do you still plan to do this? Do you think
> an instance can restart without having to reload most objects into the
> cache?
>
> Secondly, I was wondering to what extent using persistent caches can
> improve cache warm up time and if persistent caches are usable or not,
> given that at various times in the past, it was recommended that one
> try and avoid them.
>
> --
> Roché Compaan
> Upfront Systems   http://www.upfrontsystems.co.za
>


Re: [ZODB-Dev] A certain code path seems to be blocking everything

2013-02-15 Thread Claudiu Saftoiu
I figured out what was wrong and it was so convoluted I thought I'd share
it.

It ended up having nothing to do with zodb *per se*. I had a thread which,
whenever it processed a certain kind of internal message, loaded & saved a
~1.5mb pickled object from disk. The 'special client' ended up triggering
this thread twice per request. It was using pickle instead of cPickle, and I
think due to the GIL it was thrashing everything else; the zodb commit code
in particular seemed to be hit especially hard.

Using cPickle instead of pickle seems to have fixed the issue for now, but
I'm going to change that part of the code to be more efficient.

- Claudiu

On Thu, Feb 14, 2013 at 4:24 PM, Claudiu Saftoiu  wrote:

> I've got a weird bug with my server and I'm wondering if anyone could
> provide some insight
> into it.
>
> The general idea of the server is that I have 8 or so clients constantly
> pinging it for information.
> Each ping usually only takes up to 2 seconds to process.
>
> Essentially, everything runs fine, until a certain code path is executed -
> basically, one of
> the clients causes the server to try to give it some other information
> than usual. When this
> client is turned on and starts pinging the server, suddenly all those
> requests that only took
> 0.5-2 seconds now take 10, 20, or even 30 seconds.
>
> I'm using paster and using its thread-tracker tool I can see what all the
> threads are doing at
> any given point. Usually there's anywhere from 0 to 4 requests going on,
> each having taken
> less than 1 second up to that point. When the 'special client' is turned
> on, there's more like
> 10 requests each having taken 6 seconds or more up to that point. There
> aren't more
> requests, it's just that they don't clear out quickly enough. The tail end
> of the tracebacks
> are one of two things:
>
>   File "/home/tsa/env/lib/python2.6/site-packages/transaction/_manager.py", 
> line 89, in commit
> return self.get().commit()
>   File 
> "/home/tsa/env/lib/python2.6/site-packages/transaction/_transaction.py", line 
> 329, in commit
> self._commitResources()
>   File 
> "/home/tsa/env/lib/python2.6/site-packages/transaction/_transaction.py", line 
> 441, in _commitResources
> rm.tpc_begin(self)
>   File "/home/tsa/env/lib/python2.6/site-packages/ZODB/Connection.py", line 
> 547, in tpc_begin
> self._normal_storage.tpc_begin(transaction)
>   File "/home/tsa/env/lib/python2.6/site-packages/ZEO/ClientStorage.py", line 
> 1118, in tpc_begin
> self._tpc_cond.wait(30)
>   File "/usr/lib/python2.6/threading.py", line 258, in wait
> _sleep(delay)
>
> Or:
>
>   File "/home/tsa/env/lib/python2.6/site-packages/ZODB/Connection.py", line 
> 856, in setstate
> self._setstate(obj)
>   File "/home/tsa/env/lib/python2.6/site-packages/ZODB/Connection.py", line 
> 897, in _setstate
> p, serial = self._storage.load(obj._p_oid, '')
>   File "/home/tsa/env/lib/python2.6/site-packages/ZEO/ClientStorage.py", line 
> 824, in load
> self._load_lock.acquire()
>
> I read through the code a bit and it seems that only one thread can commit
> at a time.
> So, my first guess is that the 'special' request is taking particularly
> long to commit, which
> ends up slowing everything else down. Would that be a fair estimation?
> What might
> cause a commit to take abnormally long? Just looking at the code I can't
> see anything
> that particularly stands out, especially since this bug only started
> happening recently
> and I haven't really changed that part of the code on the server-side
> recently. I did
> refactor the client code but I don't yet see how that might be causing
> this issue.
>
> One last thing. I have certain views which touch a lot of different parts
> of the database and
> do a lot of processing. I had many conflict errors in the past, and
> retrying the entire view
> all the time proved to be too inefficient, so I separated the code out
> into blocks, where
> I retry each block separately. This function describes the pattern well:
>
> def commit_and_inner_retry(transaction_manager, func, num_retries,
> ingdescription):
> """Using the `transaction_manager`:
> 1) commit the transaction thus far
> 2) try `num_retries` times to begin a new transaction, execute
> `func`, which is
>passed the current transaction as its only argument, and then
> commit the
>new transaction. if ConflictError is raised, abort & try again
> 3)

[ZODB-Dev] A certain code path seems to be blocking everything

2013-02-14 Thread Claudiu Saftoiu
I've got a weird bug with my server and I'm wondering if anyone could
provide some insight
into it.

The general idea of the server is that I have 8 or so clients constantly
pinging it for information.
Each ping usually only takes up to 2 seconds to process.

Essentially, everything runs fine, until a certain code path is executed -
basically, one of
the clients causes the server to try to give it some other information than
usual. When this
client is turned on and starts pinging the server, suddenly all those
requests that only took
0.5-2 seconds now take 10, 20, or even 30 seconds.

I'm using paster and using its thread-tracker tool I can see what all the
threads are doing at
any given point. Usually there's anywhere from 0 to 4 requests going on,
each having taken
less than 1 second up to that point. When the 'special client' is turned
on, there's more like
10 requests each having taken 6 seconds or more up to that point. There
aren't more
requests, it's just that they don't clear out quickly enough. The tail end
of the tracebacks
are one of two things:

  File "/home/tsa/env/lib/python2.6/site-packages/transaction/_manager.py",
line 89, in commit
return self.get().commit()
  File "/home/tsa/env/lib/python2.6/site-packages/transaction/_transaction.py",
line 329, in commit
self._commitResources()
  File "/home/tsa/env/lib/python2.6/site-packages/transaction/_transaction.py",
line 441, in _commitResources
rm.tpc_begin(self)
  File "/home/tsa/env/lib/python2.6/site-packages/ZODB/Connection.py",
line 547, in tpc_begin
self._normal_storage.tpc_begin(transaction)
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/ClientStorage.py",
line 1118, in tpc_begin
self._tpc_cond.wait(30)
  File "/usr/lib/python2.6/threading.py", line 258, in wait
_sleep(delay)

Or:

  File "/home/tsa/env/lib/python2.6/site-packages/ZODB/Connection.py",
line 856, in setstate
self._setstate(obj)
  File "/home/tsa/env/lib/python2.6/site-packages/ZODB/Connection.py",
line 897, in _setstate
p, serial = self._storage.load(obj._p_oid, '')
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/ClientStorage.py",
line 824, in load
self._load_lock.acquire()

I read through the code a bit and it seems that only one thread can commit
at a time.
So, my first guess is that the 'special' request is taking particularly
long to commit, which
ends up slowing everything else down. Would that be a fair estimation? What
might
cause a commit to take abnormally long? Just looking at the code I can't
see anything
that particularly stands out, especially since this bug only started
happening recently
and I haven't really changed that part of the code on the server-side
recently. I did
refactor the client code but I don't yet see how that might be causing this
issue.

One last thing. I have certain views which touch a lot of different parts
of the database and
do a lot of processing. I had many conflict errors in the past, and
retrying the entire view
all the time proved to be too inefficient, so I separated the code out into
blocks, where
I retry each block separately. This function describes the pattern well:

def commit_and_inner_retry(transaction_manager, func, num_retries,
                           ingdescription):
    """Using the `transaction_manager`:
    1) commit the transaction thus far
    2) try `num_retries` times to begin a new transaction, execute `func`
       (which is passed the current transaction as its only argument), and
       then commit the new transaction; if ConflictError is raised, abort
       and try again
    3) if a failure happens, print an error involving `ingdescription`
    4) raise ConflictError if retrying `num_retries` times does not work

    """
    tm = transaction_manager
    tm.commit()
    for retry in range(num_retries):
        tm.begin()
        return_value = func(tm.get())
        try:
            tm.commit()
        except ConflictError:
            print "Conflict error attempt #%d %s, trying again" % (
                retry + 1, ingdescription)
            tm.abort()
            continue
        break
    else:
        raise ConflictError("Was never able to commit %s" % ingdescription)

    return return_value

With the view code looking something like this:

def long_view(context, request):
    db_objs = prepare_stuff()

    def do_stuff_1(txn):
        return stuff_1_with_db_objs(db_objs)
    stuff1 = commit_and_inner_retry(
        transaction.manager, do_stuff_1, 10, 'doing stuff 1')

    def do_stuff_2(txn):
        return stuff_2_with_db_objs(stuff1)
    stuff2 = commit_and_inner_retry(
        transaction.manager, do_stuff_2, 10, 'doing stuff 2')

    def do_stuff_3(txn):
        stuff_3_with_db_objs(stuff2)
    try:
        commit_and_inner_retry(
            transaction.manager, do_stuff_3, 10, 'doing stuff 3')
    except ConflictError:
        # this stuff not important

Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-21 Thread Claudiu Saftoiu
On Sat, Jan 19, 2013 at 10:00 AM, Jim Fulton  wrote:

> - ZODB doesn't simply load your database into memory.
>   It loads objects when you try to access their state.
>   If you're using ZEO (or relstorage, or neo), each load requires a
>   round-trip to the server.  That's typically a millisecond or two,
>   depending on your network setup.  (Your database is small, so disk
>   access shouldn't be an issue as it is, presumably in your disk
>   cache.

I understand. It seems to be able to unghost about 10,000 catalog-related
objects/minute - does that sound about right?


> - You say it often takes you a couple of minutes to handle requests.
>   This is obviously very long.  It sounds like there's an issue
>   with the way you're using the catalog.  It's not that hard get this
>   wrong.  I suggest either hiring someone with experience in this
>   area to help you or consider using another tool, like solr.
>   (You could put more details of your application here, but I doubt
>   people will be willing to put in the time to really analyze it and
>   tell you how to fix it.  I know I can't.)


That's alright, I won't ask for such a time investment. As it is I greatly
appreciate everyone for replying and helping out already - thanks guys!

> - solr is so fast it almost makes me want to cry.  At ZC, we're
>   increasingly using solr instead of the catalog.  As the original
>   author of the catalog, this makes me sad, but we just don't have the
>   time to put in the effort to equal solr/lucene.
> - A common mistake when using ZODB is to use it like a relational
>   database, puting most data in catalog-like data structures and
>   querying to get most of your data.  The strength of a OODB is that
>   you don't have to query to get data from a well-designed object
>   model.


My use case is basically this: I have 400,000 'documents' with 17 attributes
that I want to search on. One of them is the date of the document. This
index
I could easily do away with as the documents are organized roughly by date.
However, if I want to get a 'document' made at any date but with a certain
attribute
in a certain range, I don't have a good way to do it based on how they are
stored now. I could try making my own indexing scheme, but I figured ZCatalog
would be well-suited for this...

On Thu, Jan 17, 2013 at 12:31 PM, Claudiu Saftoiu 
> wrote:
> ...
> > One potential thing is this: after a zeopack the index database .fs file
> is
> > about 400 megabytes, so I figure a cache of 3000 megabytes should more
> than
> > cover it. Before a zeopack, though - I do one every 3 hours - the file
> grows
> > to 7.6 gigabytes.
>
> In scanning over this thread while writing my last message, I noticed
> this.
>
> This is a ridiculous amount of churn. There is likely something
> seriously out of whack with your application.  Every application is
> different, but we typically see *weekly* packs reduce database size by
> at most 50%.
>

All that database contains is: a catalog with 17 indices of 400,000 objects,
the root object, a document map, and an object to hold the catalog. The
document map itself I put as a 'document_map' attribute of the catalog.
Because
of the nature of my app I have to add and re-index those objects quite
often (they
change a lot). This seems to cause the index .fs file to grow by a
ridiculous
amount... is there anything obviously wrong with the above picture?

The main database does not have quite so much churn. Right after a pack
just now, it
was 5715MB, and it gets to at most 6000MB or so after 3 hours (often just
up to 5800MB).
I don't have to run the pack quite so often - is there a significant
downside to packing often?


> > Shouldn't the relevant objects - the entire set of latest
> > versions of the objects - be the ones in the cache, thus it doesn't
> matter
> > that the .fs file is 7.6gb as the actual used bits of it are only 400mb
> or
> > so?
>
> Every object update invalidates cached versions of the obejct in all
> caches except the writer's.  (Even the writer's cached value is
> invalidated of conflict-resolution was performed.)
>
> > Another question is, does zeopacking destroy the cache?
>
> No, but lots of writing does.
>

I see. After all the above it really sounds like if I want fast indexing I
should
just drop zcatalog and go ahead and use solr. It doesn't seem zcatalog +
zodb,
the way they are now, are really made to handle many objects with many
indices
that get updated often...

Thanks for all the help,
- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
> I wonder if disk latency is the problem?. As a test you could put the
> index.fs file into a tmpfs and see if that improves things, or cat
> index.fs > /dev/null to try and force it into the fs cache.
>

Hmm, it would seem not... the cat happens instantly:

(env)tsa@sp2772c:~/sports$ time cat Data_IndexDB.fs > /dev/null

real    0m0.065s
user    0m0.000s
sys     0m0.064s

The database isn't even very big:

-rw-r--r-- 1 tsa tsa 233M Jan 18 14:34 Data_IndexDB.fs

Which makes me wonder why it takes so long to load it into memory. It's
just a bit frustrating that the server has 7gb of RAM and it's proving to
be so difficult to get ZODB to keep ~300 megs of it in there. Or indeed, if
linux already has the whole .fs file in a memory cache, where are these
delays coming from? There's something I don't quite understand about this
whole situation...

- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
> > Any suggestions? There must be a way to effectively use indexing with
> zodb
> > and what I'm doing isn't working.
>
> Have you confirmed that the ZEO client cache file is being used?
> Configure logging to display the ZEO messages to make sure.
>
> The client cache is transient by default, so you will need to enable
> persistent client caching to see an effect past restarts:
>
> <zeoclient>
>   client zeo1
>   ...
> </zeoclient>
>
> https://github.com/zopefoundation/ZODB/blob/master/doc/zeo-client-cache.txt
>

Yep, I specified a var of 'zeocache' and a client of 'index', and there is
indeed a ./zeocache/index-1.zec file and a ./zeocache/index-1.zec.lock
file.



> Laurence
>


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
>
> That is after having run the 'preloading'. It seems that when the query
> takes forever, the non-ghost-size is slowly increasing (~100
> objects/second) while the 'size' stays the same. Once the query is done
> after having taken a few minutes, each subsequent run is instant and the
> ngsize doesn't grow. My naive question is: it has plenty of RAM, why does
> it not just load everything into the RAM?
>

It's actually not *that* slow - I didn't realize that everything seems to
stop while it's asking for cacheDetailSize. It seems to load about 10,000
objects/minute, most of these being IFTreeSet/IFSet. This seems a bit
slow... if the index db has 750k objects in it, then it would take 75
minutes, at this rate, to read through it all, meaning an extensive query
would really take way too long...

Also my ZEO server is running locally, anyway, so the local socket transfer
speed shouldn't really be much slower than loading from the persistent
cache, should it? Either way it ends up loading from disk.

I don't quite understand why the zeoserver doesn't have any sort of
caching... hence my earlier thoughts of a memcachedb server to load all
this in RAM and to just run forever. Why would it not be a win in my
situation?

I'm pretty new to zodb so perhaps I don't understand a lot of the design
decisions very well and thus how best to take advantage of zodb, but I'm
willing to learn.

- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
> > Er, to be clearer: my goal is for the preload to load everything into the
> > cache that the query mechanism might use.
> >
> > It seems the bucket approach only takes ~10 seconds on the 350k-sized
> index
> > trees vs. ~60-90 seconds. This seems to indicate that less things end up
> > being pre-loaded...
>
> I guess I was too subtle before.
>
> Preloading is a waste of time.  Just use a persistent ZEO cache
> of adequate size and be done with it.
>

Okay. I did that, and I only tried the preloading because it didn't seem I
was getting what I wanted.

To wit: I ran a simple query and it took a good few minutes. It's true,
after it took a few minutes, it ran instantly, and even after a server
restart it only took a few seconds, but I don't understand why it took a
few minutes in the first place. There are only 750k objects in that
database, and I gave it a cache object size of 5 million; the packed
database .fs is only 400 megabytes, and I gave it a cache byte size of 3000
megabytes.

Then when I change one parameter of the query (to ask for objects with a
month of November instead of October), it takes another few minutes...

Speaking to your point, preloading didn't seem to help either (I had
'preloaded' dozens of times over the past few days and the queries still
took forever), but the fact remains: it does not seem unreasonable to want
these queries to run instantly from the get-go, given that is the point of
indexing in the first place. As it stands now, for certain queries I could
probably do better loading each object and filtering it via python because
I wouldn't have to deal with loading the indices in order to run the 'fast'
query, but this seems to defeat the point of indices entirely, and I'd like
to not have to create custom search routines for every separate query.
Again, maybe I'm doing something wrong, but I haven't been able to figure
it out yet.

I made a view to display the output of cacheDetailSize like Jeff suggested
and I got something like this:

db = ...
for conn_d in db.cacheDetailSize():
    writer.write("%(connection)s, size=%(size)s, non-ghost-size=%(ngsize)s\n"
                 % conn_d)

output:

, size=635683, non-ghost-size=209039
, size=3490, non-ghost-size=113

That is after having run the 'preloading'. It seems that when the query
takes forever, the non-ghost-size is slowly increasing (~100
objects/second) while the 'size' stays the same. Once the query is done
after having taken a few minutes, each subsequent run is instant and the
ngsize doesn't grow. My naive question is: it has plenty of RAM, why does
it not just load everything into the RAM?
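The slow growth of ngsize while size stays flat reflects ZODB's ghost mechanism: cached objects start out as "ghosts" (stubs with no state), and each one is filled in by a separate storage load the first time an attribute is touched. A toy stdlib-only illustration of the pattern (not ZODB's actual implementation):

```python
class Ghost(object):
    """Stand-in for a persistent object: state is loaded lazily."""
    def __init__(self, oid, loader):
        self.__dict__['_oid'] = oid
        self.__dict__['_loader'] = loader
        self.__dict__['_loaded'] = False

    def __getattr__(self, name):
        # Called only when normal lookup fails: simulate one storage
        # round-trip, then retry the lookup. Each ghost costs one load.
        if not self.__dict__['_loaded']:
            self.__dict__.update(self.__dict__['_loader'](self.__dict__['_oid']))
            self.__dict__['_loaded'] = True
            return getattr(self, name)
        raise AttributeError(name)

loads = []
def fake_storage_load(oid):
    loads.append(oid)               # one "disk/network" hit per ghost
    return {'value': oid * 10}

ghosts = [Ghost(oid, fake_storage_load) for oid in range(3)]
assert loads == []                  # creating ghosts loads nothing ("size" grows)
total = sum(g.value for g in ghosts)
assert loads == [0, 1, 2]           # touching them loads one record each ("ngsize" grows)
```

This is why a query over a cold cache pays one round-trip per BTree bucket it visits, even when plenty of RAM is free: nothing is loaded until it is touched.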

Any suggestions? There must be a way to effectively use indexing with zodb
and what I'm doing isn't working.

Thanks,
- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
>
>
>> If you want to load the btree item into cache, you need to do
>>
>>   item._p_activate()
>>
>
> That's not going to work, since `item` is a tuple. I don't want to load
> the item itself into the cache, I just want the btree to be in the cache.
>

Er, to be clearer: my goal is for the preload to load everything into the
cache that the query mechanism might use.

It seems the bucket approach only takes ~10 seconds on the 350k-sized index
trees vs. ~60-90 seconds. This seems to indicate that fewer things end up
being pre-loaded...

- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Claudiu Saftoiu
On Fri, Jan 18, 2013 at 9:02 AM, Marius Gedminas  wrote:

> On Thu, Jan 17, 2013 at 12:31:52PM -0500, Claudiu Saftoiu wrote:
> > I wrote the following code to preload the indices:
> >
> > def preload_index_btree(index_name, index_type, btree):
> > print "((Preloading '%s' %s index btree...))" % (index_name,
> > index_type)
> > start = last_print = time.time()
> > for i, item in enumerate(btree.items()):
> > item
>
> That's a no-op: you might as well just write 'pass' here.
>

True, I wanted to do something with 'item' but didn't know what.



> > print "((Preloaded '%s' %s index btree (%d items in %.2fs)))" % (
> > index_name, index_type, i, time.time() - start,
> > )
>
> If you ever get an empty btree, you'll get an UnboundLocalError: 'i' here.
>
> Drop the enumerate() trick and just use len(btree), it's efficient.
>

Thanks for catching that. `len` still takes a while on a large btree though
if it isn't in memory:

In [7]: start = time.time(); len(bt); end = time.time()
Out[7]: 350169
In [8]: end - start
Out[8]: 32.397267818450928

It actually seems to require loading the entire tree, because after running
`len`, subsequent operations (like iterating through the entire tree) start
happening instantly. However, since I just iterated through the entire
tree, it will definitely be fast at that point.


> If you want to load the btree item into cache, you need to do
>
>   item._p_activate()
>

That's not going to work, since `item` is a tuple. I don't want to load the
item itself into the cache, I just want the btree to be in the cache. I
figured iterating through the entire tree would force it to be loaded, but
is that not the case? If not then what should I call `_p_activate()` on? I
assume calling it on the tree itself won't cause all its internals to be
loaded. I'm not familiar with the internals of the BTree, however. Would
this be a better solution?

def preload_index_btree(index_name, index_type, btree):
    print "((Preloading '%s' %s index btree...))" % (index_name, index_type)
    start = time.time()
    num_buckets = 0
    bucket = btree._firstbucket
    while bucket:
        bucket._p_activate()
        num_buckets += 1
        bucket = bucket._next
    print "((Preloaded '%s' %s index btree (%d items/%d buckets in %.2fs)))" % (
        index_name, index_type, len(btree), num_buckets, time.time() - start,
    )

> def preload_catalog(catalog):
> > """Given a catalog, touch every persistent object we can find to
> > force
> > them to go into the cache."""
> > start = time.time()
> > num_indices = len(catalog.items())
> > for i, (index_name, index) in enumerate(catalog.items()):
> > print "((Preloading index %2d/%2d '%s'...))" % (i+1,
> > num_indices, index_name,)
> > preload_index_btree(index_name, 'fwd', index._fwd_index)
> > preload_index_btree(index_name, 'rev', index._rev_index)
> > print "((Preloaded catalog! Took %.2fs))" % (time.time() - start)
> >
> > And I run it on server start as follows (modified for the relevant
> parts; I
> > tried to make the example simple but it ended up needing a lot of parts).
> > This runs in a thread:
> >
> > from util import zodb as Z
> > from util import zodb_query as ZQ
> > for i in xrange(3):
> > connwrap = Z.ConnWrapper('index')
> > print "((Preload #%d...))" % (i+1)
> > with connwrap as index_root:
> > ZQ.preload_catalog(index_root.index.catalog)
> > connwrap.close()
>
> Every thread has its own in-memory ZODB object cache, but if you have
> configured a persistent ZEO client cache, it should help.
>

Gotcha. Thanks for the help!
- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-17 Thread Claudiu Saftoiu
Thanks, Jeff! I'll check out that cacheDetail stuff and see if I simply
have too few objects being stored (though I thought 5 million would be
enough...)

> Further, after having preloaded the indices once, shouldn't it preload
> quite rapidly upon further server restarts, if it's all in the cache and
> the cache is persisted?
>
>
> Again, there are two caches here and they are not really related. The
> "persistent cache" is for ZEO to keep local copies instead of having to
> constantly hit the network. The object or 'connection' cache is what is in
> memory being used by the application. It still requires IO operations to
> find all of the bytes from the persistent ZEO cache and move them into
> memory as objects. The connection/object cache does not get preserved
> between restarts. The client/persistent cache is not a memory dump. If you
> run the ZODB with just a local FileStorage file, there is no 'persistent
> cache' aside from the database file itself.
>

Understood. Is there any good way to memory dump a connection's cache, and
re-load it when a connection is made again? It seems that would be
particularly useful in my situation, and much simpler than making a new
server to deal solely with indexing. If no one has done it yet, then is it
feasible? Would it make sense in this context, I mean, and speed up the
warming-up process? If yes, what would be a good place to start?
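No one seems to have shipped such a cache dump, but the shape of it would be: enumerate the OIDs of the non-ghost objects at shutdown, persist that list, and re-activate each OID at startup so the caches refill along the way. A stdlib-only sketch of that shape; the `iter_hot_oids` and `load_object` callables are hypothetical stand-ins for connection internals that would need real ZODB wiring:

```python
import os
import pickle
import tempfile

def dump_hot_oids(iter_hot_oids, path):
    """Record which object ids were loaded ('hot') at shutdown."""
    with open(path, 'wb') as f:
        pickle.dump(list(iter_hot_oids()), f)

def prewarm(load_object, path):
    """Re-load every recorded oid at startup; returns the count.

    In a real ZODB setting load_object might look something like
        lambda oid: conn.get(oid)._p_activate()
    (hypothetical, untested against ZODB).
    """
    with open(path, 'rb') as f:
        oids = pickle.load(f)
    for oid in oids:
        load_object(oid)
    return len(oids)

# Demonstration with stubs standing in for a ZODB connection:
path = os.path.join(tempfile.mkdtemp(), 'hot.oids')
dump_hot_oids(lambda: iter([b'\x00' * 7 + b'\x01', b'\x00' * 7 + b'\x02']), path)
warmed = []
count = prewarm(warmed.append, path)
```

Note this only warms the in-memory object cache of one connection; a persistent ZEO client cache would still be the thing that survives restarts cheaply.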

Thanks again,
- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-17 Thread Claudiu Saftoiu
>
> > Okay, that makes sense. Would that be a server-side cache, or a
> client-side
> > cache?
>
> There are no server-side caches (other than the OS disk cache).
>

Ok, that's what I gathered before, was just checking.

> I believe I've already succeeded in getting a client-side persistent
> > disk-based cache to work (my zodb_indexdb_uri is
> >
> "zeo://%(here)s/zeo_indexdb.sock?cache_size=2000MB&connection_cache_size=50&connection_pool_size=5&var=zeocache&client=index"),
>
> This configuration syntax isn't part of ZODB.  I'm not familiar with
> the options there.


Ah yes it's a part of repoze -
http://docs.repoze.org/zodbconn/narr.html#zeo-uri-scheme . I looked into
this, and the following mappings from uri-syntax to xml-syntax hold true:

  cache_size --> zodb/zeoclient/cache-size
  connection_cache_size --> zodb/cache-size
  connection_pool_size --> zodb/pool-size
  var --> zodb/zeoclient/var
  client --> zodb/zeoclient/client
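Those query parameters can be pulled apart with the standard library, e.g. to sanity-check what repoze.zodbconn will see. Python 3 syntax below (the app in this thread is Python 2), with `%(here)s` replaced by a placeholder path:

```python
from urllib.parse import parse_qs, urlsplit

# The URI from this thread, placeholder path substituted for %(here)s.
uri = ("zeo:///srv/app/zeo_indexdb.sock"
       "?cache_size=2000MB&connection_cache_size=50"
       "&connection_pool_size=5&var=zeocache&client=index")

parts = urlsplit(uri)
params = {k: v[0] for k, v in parse_qs(parts.query).items()}

# Per the mapping above: cache_size is the on-disk ZEO client cache
# (zeoclient/cache-size); connection_cache_size is the per-connection
# in-memory object count (zodb/cache-size).
```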


> > but this doesn't seem to be what you're referring to as that is exactly
> the
> > same size as the in-memory cache.
>
> I doubt it, but who knows?
>

I meant that I have only one cache-size option in terms of bytes, and the
cache made on the disk is exactly that size (rather, it reserves all the
space on disk instantly, even if it isn't all used).

--

Here's a detailed description of the issues I'm having.

I wrote the following code to preload the indices:

def preload_index_btree(index_name, index_type, btree):
    print "((Preloading '%s' %s index btree...))" % (index_name, index_type)
    start = last_print = time.time()
    for i, item in enumerate(btree.items()):
        item
    print "((Preloaded '%s' %s index btree (%d items in %.2fs)))" % (
        index_name, index_type, i, time.time() - start,
    )

def preload_catalog(catalog):
    """Given a catalog, touch every persistent object we can find to force
    them to go into the cache."""
    start = time.time()
    num_indices = len(catalog.items())
    for i, (index_name, index) in enumerate(catalog.items()):
        print "((Preloading index %2d/%2d '%s'...))" % (i+1, num_indices, index_name,)
        preload_index_btree(index_name, 'fwd', index._fwd_index)
        preload_index_btree(index_name, 'rev', index._rev_index)
    print "((Preloaded catalog! Took %.2fs))" % (time.time() - start)

And I run it on server start as follows (modified for the relevant parts; I
tried to make the example simple but it ended up needing a lot of parts).
This runs in a thread:

from util import zodb as Z
from util import zodb_query as ZQ

for i in xrange(3):
    connwrap = Z.ConnWrapper('index')
    print "((Preload #%d...))" % (i+1)
    with connwrap as index_root:
        ZQ.preload_catalog(index_root.index.catalog)
    connwrap.close()

Z.ConnWrapper is something that uses my config to return connections such
that I only have one DB instance for the whole server process:

class ConnWrapper(object):
    def __init__(self, db_name):
        global_config = appconfig.get_config()
        db_conf = global_config['dbs'][db_name]

        db = db_conf['db']
        self.appmaker = db_conf['appmaker']

        conn = db.open()
        self.conn = conn
        self.cur_t = None
        #...

    def get_approot(self):
        return self.appmaker(self.conn.root())

    def __enter__(self):
        """.begin() transaction and return the app_root"""
        if self.cur_t:
            raise ValueError("transaction already in progress")
        self.cur_t = self.conn.transaction_manager.begin()
        return self.get_approot()

    def __exit__(self, typ, value, tb):
        if typ is None:
            try:
                self.cur_t.commit()
            except:
                self.cur_t = None
                raise
            self.cur_t = None
        else:
            self.cur_t.abort()
            self.cur_t = None

The relevant part of the global config setup is:

from repoze.zodbconn.uri import db_from_uri
from indexdb.models import appmaker as indexdb_appmaker
#...
zodb_indexdb_uri = global_config.get('zodb_indexdb_uri')
index_db = db_from_uri(zodb_indexdb_uri)
global_config['dbs'] = {
    'index': {
        'db': index_db,
        'appmaker': indexdb_appmaker,
    },
}

`zodb_indexdb_uri` is in my .ini file as mentioned above:

zodb_indexdb_uri = zeo://%(here)s/zeo_indexdb.sock?cache_size=3000MB&connection_cache_size=500&connection_pool_size=5&var=zeocache&client=index

The preloading seems to accomplish its purpose. When I restart the server,
it takes a while to run through all the indices the first time over and the
memory usage grows as this is happening, e.g.:

((Preloading index  3/17 'account'...))
((Preloading 'account' fwd index btree...))
((Preloaded 'account' fwd index btree (37 items in 0.00s)))
((Preloading 'account' rev index

Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-15 Thread Claudiu Saftoiu
On Tue, Jan 15, 2013 at 2:40 PM, Jim Fulton  wrote:

> So, first, a concise partial answer to a previous question:
>
> ZODB provides an in-memory object cache.  This is non-persistent.
> If you restart, it is lost.  There is a cache per connection and the
> cache size is limited by both object count and total object size (as
> estimated by database record size).
>
> ZEO also provides a disk-based cache of database records read
> from the server.  This is normally much larger than the in-memory cache.
> It can be configured to be persistent.  If you're using blobs, then there
> is a separate blob cache.
>
> On Tue, Jan 15, 2013 at 2:15 PM, Claudiu Saftoiu 
> wrote:
> >> You can't cause a specific object (or collection of objects) to stay
> >> ion the cache, but if you're working set is small enough to fit in
> >> the memory or client cache, you can get the same effect.
> >
> >
> > That makes sense. So, is there any way to give ZODB a Persistent and
> tell it
> > "load everything about the object now for this transaction" so  that the
> > cache mechanism then gets triggered, or do I have to do a custom search
> > through every aspect of the object, touching all Persistents it touches,
> > etc, in order to get everything loaded? Essentially, when  the server
> > restarts, I'd like to pre-load all these objects (my cache is indeed big
> > enough), so that if a few hours later someone makes a request that uses
> it,
> > the objects will already be cached instead of starting to be cached right
> > then.
>
> ZODB doesn't provide any pre-warming facility.  This would be
> application dependent.
>
> You're probably better off using a persistent ZEO cache
> and letting the cache fill with objects you actually use.
>

Okay, that makes sense. Would that be a server-side cache, or a client-side
cache? I believe I've already succeeded in getting a client-side persistent
disk-based cache to work (my zodb_indexdb_uri is
"zeo://%(here)s/zeo_indexdb.sock?cache_size=2000MB&connection_cache_size=50&connection_pool_size=5&var=zeocache&client=index"),
but this doesn't seem to be what you're referring to as that is exactly the
same size as the in-memory cache. Could you provide some pointers as to how
to get a persistent disk-based cache on the ZEO server, if that is what you
meant? It seems ZODB/ZEO really lacks centralized documentation.

Thanks,
- Claudiu


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-15 Thread Claudiu Saftoiu
>
> You can't cause a specific object (or collection of objects) to stay
> ion the cache, but if you're working set is small enough to fit in
> the memory or client cache, you can get the same effect.
>

That makes sense. So, is there any way to give ZODB a Persistent and tell
it "load everything about the object now for this transaction" so  that the
cache mechanism then gets triggered, or do I have to do a custom search
through every aspect of the object, touching all Persistents it touches,
etc, in order to get everything loaded? Essentially, when  the server
restarts, I'd like to pre-load all these objects (my cache is indeed big
enough), so that if a few hours later someone makes a request that uses it,
the objects will already be cached instead of starting to be cached right
then.
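Lacking a built-in pre-warming call, the "touch all Persistents it touches" idea can be written as a generic breadth-first walk: visit each object, read its attributes (which is exactly what forces a ZODB ghost to load its state), and queue whatever you find. A stdlib-only sketch; real ZODB code would restrict the walk to `Persistent` instances and keep the visit count below the connection cache size:

```python
from collections import deque

def touch_reachable(root, max_objects=100000):
    """Breadth-first walk that touches every attribute and container
    member reachable from root; in ZODB terms the attribute access is
    what would force a ghost to load. Returns objects visited."""
    seen = set()
    queue = deque([root])
    while queue and len(seen) < max_objects:
        obj = queue.popleft()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if hasattr(obj, '__dict__'):
            queue.extend(vars(obj).values())  # the "touch"
        if isinstance(obj, dict):
            queue.extend(obj.values())
        elif isinstance(obj, (list, tuple, set, frozenset)):
            queue.extend(obj)
    return len(seen)

class Node(object):
    def __init__(self, children=()):
        self.children = list(children)

visited = touch_reachable(Node([Node(), Node()]))
```

The `max_objects` cap matters: walking more objects than the cache holds just evicts what was loaded a moment earlier.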


Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-15 Thread Claudiu Saftoiu
On Tue, Jan 15, 2013 at 2:07 PM, Leonardo Santagada wrote:

>
>
>
> On Tue, Jan 15, 2013 at 3:10 PM, Jim Fulton  wrote:
>
>> On Tue, Jan 15, 2013 at 12:00 PM, Claudiu Saftoiu 
>> wrote:
>> > Hello all,
>> >
>> > I'm looking to speed up my server and it seems memcached would be a good
>> > way to do it - at least for the `Catalog` (I've already put the catalog
>> in a
>> > separate
>> > zodb with a separate zeoserver with persistent client caching enabled
>> and it
>> > still doesn't run as nice as I like...)
>> >
>> > I've googled around a bit and found nothing definitive, though...
>> what's the
>> > best way to combine zodb/zeo + memcached as of now?
>>
>> My opinion is that a distributed memcached isn't
>> a big enough win, but this likely depends on your  use cases.
>>
>> We (ZC) took a different approach.  If there is a reasonable way
>> to classify your corpus by URL (or other request parameter),
>> then check out zc.resumelb.  This fit our use cases well.
>>
>
> Maybe I don't understand zodb correctly but if the catalog is small enough
> to fit in memory wouldn't it be much faster to just cache the whole catalog
> on the clients? Then at least for catalog searches it is all mostly as fast
> as running through python objects. Memcache will put an extra
> serialize/deserialize step into it (plus network io, plus context
> switches).
>

That would be fine, actually. Is there a way to explicitly tell ZODB/ZEO to
load an entire object and keep it in the cache? I also want it to remain in
the cache on connection restart, but I think I've already accomplished that
with persistent client-side caching.


[ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-15 Thread Claudiu Saftoiu
Hello all,

I'm looking to speed up my server and it seems memcached would be a good
way to do it - at least for the `Catalog` (I've already put the catalog in
a separate
zodb with a separate zeoserver with persistent client caching enabled and it
still doesn't run as nice as I like...)

I've googled around a bit and found nothing definitive, though... what's
the
best way to combine zodb/zeo + memcached as of now?

Thanks,
- Claudiu


Re: [ZODB-Dev] ZODB-Dev Digest, Vol 118, Issue 7

2013-01-10 Thread Claudiu Saftoiu
Hey Vincent,

Thanks for the detailed reply! That really clears things up.

> > - Are there any other caching options?
> The option to make pickle cache persistent comes to mind. There may be
others,
> I don't recall right now.

This seems like a good idea - I would like for the server to maintain its
cache when I restart it (which is fairly often as I'm always working on
it). However, I seem to have run into some problems. I tried this uri
(added newlines for ease of reading):

zeo://%(here)s/zeo.sock?
 cache_size=2000MB&
 connection_cache_size=50&
 connection_pool_size=15&
 var=zeocache&
 client=main

Yet I get a "zc.lockfile.LockError: Couldn't lock 'zeocache/main-1.zec.lock'"
error. I've tried googling it and it seems like this happens if multiple
processes attempt to access the same database if one isn't using ZEO, but I
am using ZEO and this is only related to the cache. I'm using paster to
serve the app and I don't think it creates multiple processes. Why might
this be happening? Here is my `app` function that paster uses to get the
wsgi app along with all relevant code:

import logging
import time
import threading

from repoze.bfg.configuration import Configurator
from repoze.zodbconn.finder import PersistentApplicationFinder

from util.pyshell import in_shell

from mainapp.models import appmaker
from mainapp import server_threads

def check_start_threads():
    time.sleep(5)
    if not in_shell():
        server_threads.start_server_threads()

def app(global_config, **settings):
    logging.basicConfig()

    zodb_uri = global_config.get('zodb_uri')
    if zodb_uri is None:
        raise ValueError("No 'zodb_uri' in application configuration.")

    zcml_file = settings.get('configure_zcml', 'configure.zcml')

    finder = PersistentApplicationFinder(zodb_uri, appmaker)
    def get_root(request):
        return finder(request.environ)
    config = Configurator(root_factory=get_root, settings=settings)
    config.begin()
    config.load_zcml(zcml_file)
    config.end()

    th = threading.Thread(target=check_start_threads)
    th.start()

    return config.make_wsgi_app()


Seems fairly straightforward, no?

- Claudiu


[ZODB-Dev] documentation on the various caching options

2013-01-09 Thread Claudiu Saftoiu
I'm going a bit nuts trying to figure out what all the different caches
mean. This doesn't seem to be well documented, but I'll keep googling after
sending out this email.

Here's the various configs I've got. I have a zeo.conf:

%define INSTANCE .

<zeo>
  address $INSTANCE/zeo_indexdb.sock
  read-only false
  invalidation-queue-size 100
  pid-filename $INSTANCE/zeo_indexdb.pid
</zeo>

<blobstorage 1>
  <filestorage>
    path $INSTANCE/Data_IndexDB.fs
  </filestorage>
  blob-dir $INSTANCE/blobs_indexdb
</blobstorage>


I run this with `runzeo -C zeo.conf` .

Next I have an `app.ini` which I run with `paster server --verbose --reload
app.ini`. It has a `zodb_uri`:

zodb_uri = zeo://%(here)s/zeo.sock?cache_size=1000MB&connection_cache_size=10&connection_pool_size=30

I hear there is a `cache-size` option and a `cache-size-bytes` option,
which go in a `zodb.conf`, but I don't know where to put them in my
`zeo.conf`. So my question is:

In the `.conf` files:
- What is cache-size?
- What is cache-size-bytes?
- Are there any other caching options?
- Where would they go in my `zeo.conf`, or if they can't go there, what
should I change so I can configure this? I tried putting them under `<zeo>`,
`<filestorage>`, and `<blobstorage>`, but I get "not a known key name" errors.

In the `zodb_uri`:
- What is cache_size?
- What is connection_cache_size?

I'm basically concerned with: what does 'runzeo' cache and how do I
configure that, and what does the local connection cache and how do I
configure that?
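For the record, `cache-size` and `cache-size-bytes` belong in a zodb.conf-style `<zodb>` section on the client side, not in zeo.conf -- the server has no such cache. A sketch of where they would go (section and key names per the ZODB/ZEO documentation of that era; treat this as an approximation, not a verified config):

```
<zodb main>
  # per-connection in-memory object cache
  cache-size 10000          # max object count
  cache-size-bytes 500MB    # max estimated bytes (newer ZODB releases)
  <zeoclient>
    server /path/to/zeo.sock
    # on-disk ZEO client cache
    cache-size 1000MB
    var zeocache
    client main             # naming the client enables a persistent cache file
  </zeoclient>
</zodb>
```

The repoze URI parameters map onto exactly these keys, as the thread later works out.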

Thanks in advance,
- Claudiu


Re: [ZODB-Dev] runzeo perma-memory leak

2013-01-09 Thread Claudiu Saftoiu
Oh never mind, http://www.linuxatemyram.com/ . =)


[ZODB-Dev] runzeo perma-memory leak

2013-01-09 Thread Claudiu Saftoiu
Hello all,

It seems that something weird is happening with runzeo & my server's memory.

I have a Data.fs that is 5.5gb, and 7.6gb of RAM on my server. I do 'runzeo
-C zeo.conf' where the 'zeo.conf' is:

%define INSTANCE .

<zeo>
  address $INSTANCE/zeo.sock
  read-only false
  invalidation-queue-size 100
  pid-filename $INSTANCE/zeo.pid
</zeo>

<blobstorage 1>
  <filestorage>
    pack-gc false
    path $INSTANCE/Data.fs
  </filestorage>
  blob-dir $INSTANCE/blobs
</blobstorage>


To simplify things I use 'zeopack' instead of my server as the test case:

$ zeopack -u zeo.sock


The entire time the zeopack is running my available memory (seen using
"top") gradually decreases, from 7.5gb at the start to 50mb somewhere in
the middle, where it stays until the end. In all this time the VIRT and RES
columns of runzeo don't change from 249m and 171m respectively.

Once zeopack is done (Data.fs is now 5.4 gb), memory use goes up to 64mb.
If I CTRL-C out of runzeo, memory use goes up to 327mb. This is far, far
less than the initial 7.5gb that was free before I started runzeo &
zeopack, and both those programs are now done.

Grepping through my environment shows that I'm using
"ZODB3-3.10.2-py2.6.egg".

So, what's going on? Why does the memory not get released at least when I
stop runzeo? How would I go about not having this memory leak happen? Also
let me know if I should ask elsewhere but this seems relevant to zodb.

Thanks in advance,
- Claudiu


Re: [ZODB-Dev] zeopack error in zrpc.connection

2013-01-07 Thread Claudiu Saftoiu
>
> > I'm afraid this doesn't seem to help me figure out what's wrong...
>
> I suspect your database is corrupted.  You'd probably want to look at
> the record in question to be sure.
>

Sure, I've re-run the pack and will dump the pickled object to a file to
inspect it - is that what you meant? (How else would I figure out what the
record is?)


> You could disable garbage collection, but if you have a damaged
> record, you might want to use the previous version of the record
> (if it exists) to recover it.
>

What do you mean by disable garbage collection - you mean disable removing
old versions of records that are no longer used? I can't do that
unfortunately, the database gets too large.

How would I go about attempting to find the previous version of the record?
If I know what the record is I can just decide whether to let it be lost -
would catching the TypeError and 'pass'ing accomplish that?

Thanks,
- Claudiu


Re: [ZODB-Dev] zeopack error in zrpc.connection

2013-01-07 Thread Claudiu Saftoiu
> How do I go about fixing this? Let me know if I can provide any other
> information that would be helpful.
>

I took the advice in this thread:
https://mail.zope.org/pipermail/zodb-dev/2012-January/014526.html

The exception that comes up, from the zeo server log, is:

2013-01-07T13:01:49 ERROR ZEO.zrpc (14891) Error raised in delayed method
Traceback (most recent call last):
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/StorageServer.py",
line 1377, in run
result = self._method(*self._args)
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/StorageServer.py",
line 343, in _pack_impl
self.storage.pack(time, referencesf)
  File "/home/tsa/env/lib/python2.6/site-packages/ZODB/blob.py", line 796,
in pack
result = unproxied.pack(packtime, referencesf)
  File
"/home/tsa/env/lib/python2.6/site-packages/ZODB/FileStorage/FileStorage.py",
line 1078, in pack
pack_result = self.packer(self, referencesf, stop, gc)
  File
"/home/tsa/env/lib/python2.6/site-packages/ZODB/FileStorage/FileStorage.py",
line 1034, in packer
opos = p.pack()
  File
"/home/tsa/env/lib/python2.6/site-packages/ZODB/FileStorage/fspack.py",
line 397, in pack
self.gc.findReachable()
  File
"/home/tsa/env/lib/python2.6/site-packages/ZODB/FileStorage/fspack.py",
line 190, in findReachable
self.findReachableAtPacktime([z64])
  File
"/home/tsa/env/lib/python2.6/site-packages/ZODB/FileStorage/fspack.py",
line 275, in findReachableAtPacktime
for oid in self.findrefs(pos):
  File
"/home/tsa/env/lib/python2.6/site-packages/ZODB/FileStorage/fspack.py",
line 328, in findrefs
return self.referencesf(self._file.read(dh.plen))
  File "/home/tsa/env/lib/python2.6/site-packages/ZODB/serialize.py", line
630, in referencesf
u.noload()
TypeError: 'NoneType' object does not support item assignment


I'm afraid this doesn't seem to help me figure out what's wrong...

- Claudiu


[ZODB-Dev] zeopack error in zrpc.connection

2013-01-07 Thread Claudiu Saftoiu
I noticed my DB had swelled to 132 gigabytes (as of 3 days ago; it's 160
gigabytes today) and it seems to be because zeopack has started failing:

tsa@sp2772c:~/db$ /home/tsa/env/bin/zeopack -u /home/tsa/db/zeo.sock
Traceback (most recent call last):
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/scripts/zeopack.py",
line 159, in _main
cs.pack(packt, wait=True)
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/ClientStorage.py",
line 916, in pack
return self._server.pack(t, wait)
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/ServerStub.py", line
155, in pack
self.rpc.call('pack', t, wait)
  File "/home/tsa/env/lib/python2.6/site-packages/ZEO/zrpc/connection.py",
line 730, in call
raise inst # error raised by server
TypeError: 'NoneType' object does not support item assignment
Error:
Error packing storage 1 in '/home/tsa/db/zeo.sock'


The runzeo log is:

2013-01-07T11:29:11 INFO ZEO.StorageServer new connection :

--
2013-01-07T11:29:11 INFO ZEO.zrpc.Connection(S) () received handshake
'Z3101'
--
2013-01-07T11:29:11 INFO ZEO.StorageServer pack(time=1357576151.4019079)
started...
--
2013-01-07T11:55:37 ERROR ZEO.zrpc (8174) Error raised in delayed method
None
--
2013-01-07T11:55:37 INFO ZEO.StorageServer disconnected


The zeo.conf is:

%define INSTANCE .

<zeo>
  address $INSTANCE/zeo.sock
  read-only false
  invalidation-queue-size 100
  pid-filename $INSTANCE/zeo.pid
</zeo>

<blobstorage 1>
  <filestorage>
    path $INSTANCE/Data.fs
  </filestorage>
  blob-dir $INSTANCE/blobs
</blobstorage>



I tried shutting down the server that uses the database and re-running
zeopack, but the same thing happened.

I also tried re-starting 'runzeo' and re-running the pack (with the server
still off), and the same thing happened yet again.

How do I go about fixing this? Let me know if I can provide any other
information that would be helpful.

Thanks in advance,
- Claudiu


[ZODB-Dev] repoze.catalog.query very slow

2013-01-03 Thread Claudiu Saftoiu
Hello all,

Am I doing something wrong with my queries, or is repoze.catalog.query very
slow?

I have a `Catalog` with ~320,000 objects and 17 `CatalogFieldIndex`es. All
the objects are indexed and up to date. This is the query I ran (field
names renamed):

And(InRange('float_field', 0.01, 0.04),
    InRange('datetime_field', seven_days_ago, today),
    Eq('str1', str1),
    Eq('str2', str2),
    Eq('str3', str3),
    Eq('str4', str4))

It returned 15 results, so it's not a large result set by any means. The
string fields are like labels - there are fewer than 20 possible values
for any one of them.

This query took a few minutes to run the first time. Re-running it in the
same session took <1 second each time. After restarting the session it
took 30 seconds the first time, and again <1 second on each subsequent run.

What makes it run so slow? Is it that the catalog isn't fully in memory? If
so, is there any way I can guarantee the catalog will be in memory given
that my entire database doesn't fit in memory all at once?
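The slow first run is typically the ZODB object cache filling up: the query has to load thousands of BTree buckets from disk (or over ZEO) once, after which they stay cached. One possible warm-up is to iterate the index BTrees at startup. This sketch assumes each index keeps its data in `_fwd_index`/`_rev_index` BTrees, as zope.index's FieldIndex does - verify that against your repoze.catalog version:

```python
def warm_catalog(catalog):
    """Touch every bucket of every index so later queries hit the
    ZODB object cache instead of going to disk/ZEO.

    Assumes each index keeps its data in `_fwd_index` / `_rev_index`
    BTrees, as zope.index's FieldIndex does.
    """
    touched = 0
    for _name, index in catalog.items():
        for tree_name in ('_fwd_index', '_rev_index'):
            tree = getattr(index, tree_name, None)
            if tree is None:
                continue
            for _key in tree.keys():  # iterating loads each bucket
                touched += 1
    return touched

# Demo with plain dicts standing in for the catalog and its BTrees:
class FakeIndex(object):
    def __init__(self, data):
        self._fwd_index = data

catalog = {'float_field': FakeIndex({0.01: [1, 2], 0.02: [3]})}
print(warm_catalog(catalog))  # -> 2
```

Warming only helps if the connection cache is large enough to keep the catalog resident; the connection `cache-size` setting (or `ZODB.DB(..., cache_size=...)`) controls how many objects each connection holds in memory.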

Thanks,
- Claudiu


Re: [ZODB-Dev] Is any function called when an object is loaded from the database?

2012-06-19 Thread Claudiu Saftoiu
>
> You'll need to override ``__new__``. That's your hook. It's called
> when the database instantiates the object. Note that this is always
> true for Python. The ``__new__`` method is always called before an
> object is instantiated.
>

Actually, this doesn't seem to be what I want. ``__new__`` is called
*before* any attributes are set on the instance... so it's too early to
tell whether the instance is missing the attribute - at that point it is
missing *all* of its attributes.

Is there any hook to call *after* the instance attributes get set/loaded
from the database?

Here is my code that didn't work, in case I'm just doing something silly:

class Line(Persistent):
    def __new__(cls, *args, **kwargs):
        inst = super(Line, cls).__new__(cls, *args, **kwargs)

        try:
            inst.id  # every instance in the DB already has an 'id'
        except AttributeError:
            return inst
        print 'we are here...'  # this is never printed

        try:
            inst.expired  # the thing i actually want to guarantee
        except AttributeError:
            inst.expired = False

        return inst
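The post-load hook ZODB actually provides is `__setstate__`: `Persistent` objects have it called with the stored attribute dict when their state is loaded, which is late enough to back-fill missing attributes. A minimal sketch using plain `pickle` in place of the database (with a real `Persistent` subclass you would call the base class's `__setstate__` rather than update `__dict__` directly):

```python
import pickle

class Line(object):
    # With ZODB you would subclass Persistent; __setstate__ works the
    # same way: it receives the stored attribute dict on load, after
    # __new__ has run but before the object is handed back to you.
    def __setstate__(self, state):
        self.__dict__.update(state)
        # Back-fill attributes added after old instances were stored.
        if 'expired' not in self.__dict__:
            self.expired = False

old = Line.__new__(Line)   # simulate an old object that predates 'expired'
old.id = 1
restored = pickle.loads(pickle.dumps(old))
print(restored.id, restored.expired)  # -> 1 False
```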

Thanks,
- Claudiu


Re: [ZODB-Dev] Is any function called when an object is loaded from the database?

2012-06-19 Thread Claudiu Saftoiu
On Tue, Jun 19, 2012 at 2:29 PM, Malthe Borch  wrote:

> On 19 June 2012 19:54, Claudiu Saftoiu  wrote:
> > That is, a function called whenever the object is loaded, that does all
> the
> > necessary backwards-compatibility
> > work right there. It separates the backwards-compat code cleanly, and
> also
> > only updates the objects
> > as-needed... though still a minor performance hit as it does the check
> each
> > time the object is loaded.
> >
> > Is there a way to do that last option? What's the best practice for this
> > sort of thing, in general?
>
> You'll need to override ``__new__``. That's your hook. It's called
> when the database instantiates the object. Note that this is always
> true for Python. The ``__new__`` method is always called before an
> object is instantiated.
>

Thanks, I wasn't aware. Seems to work - much appreciated!

- Claudiu


[ZODB-Dev] How to cause commit with non-PersistentDict?

2012-06-19 Thread Claudiu Saftoiu
Hello all,

Say I have a:

class Foo(Persistent):
    def __init__(self, bar):
        self.my_dict = PersistentDict({'keyis': bar})
    def update_it(self, bar):
        self.my_dict['keyis'] = bar

If I want to use a plain `dict` instead (it seems it might be faster for
my larger example), how would I cause a change to the dict to be
committed? Is there any way other than this?

class Foo(Persistent):
    def __init__(self, bar):
        self.my_dict = {'keyis': bar}
    def update_it(self, bar):
        self.my_dict['keyis'] = bar
        self.my_dict = dict(self.my_dict)
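An alternative to reassigning the dict is the standard ZODB idiom: set `_p_changed = True` on the owning `Persistent` object after mutating a plain attribute. Sketch with a stub standing in for `persistent.Persistent` (which implements the real `_p_changed` bookkeeping):

```python
class Persistent(object):
    # Stub only: the real persistent.Persistent tracks _p_changed itself
    # and tells the connection to re-write the object at commit.
    _p_changed = False

class Foo(Persistent):
    def __init__(self, bar):
        self.my_dict = {'keyis': bar}

    def update_it(self, bar):
        self.my_dict['keyis'] = bar
        # A plain dict can't notify ZODB of mutations; mark the owner
        # dirty so the whole object is re-pickled at commit time.
        self._p_changed = True

f = Foo(1)
f.update_it(2)
print(f.my_dict['keyis'], f._p_changed)  # -> 2 True
```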

Thanks,
- Claudiu


[ZODB-Dev] Is any function called when an object is loaded from the database?

2012-06-19 Thread Claudiu Saftoiu
Hello all,

It is often the case that I have a Persistent object that evolves over
time. E.g., it might start off as this:

class Foo(Persistent):
    def __init__(self, a):
        self.a = a
    def calc_it(self, b):
        return expensive_function(self.a, b)

I'll then have a few hundred Foos in my database. Then I'll want to modify
the object, for example to cache the previous calculation.
That is, I want this to be the case, now:

class Foo(Persistent):
    def __init__(self, a):
        self.a = a
        self.b_cache = PersistentDict()
    def calc_it(self, b):
        if b in self.b_cache: return self.b_cache[b]
        res = expensive_function(self.a, b)
        self.b_cache[b] = res
        return res

However, this won't work with existing Foo objects, as they won't have the
`b_cache` attribute. Thus I have two options. One is
to make the modifications backwards-compatible, e.g.:

class Foo(Persistent):
    def __init__(self, a):
        self.a = a
        self.b_cache = PersistentDict()
    def calc_it(self, b):
        if not hasattr(self, 'b_cache'): self.b_cache = PersistentDict()

        if b in self.b_cache: return self.b_cache[b]
        res = expensive_function(self.a, b)
        self.b_cache[b] = res
        return res

The other is to go through the database and add 'b_cache' to all the
existing objects. Neither of these is really appealing. The former is OK,
but if multiple methods want to use the new functionality I'll have to
repeat the check in each of them, and the backwards-compat code won't be
cleanly separated. The latter is annoying because I have to find every
place Foos are stored and write throwaway code to migrate them all.

Ideally I could do something like this:

class Foo(Persistent):
    def __init__(self, a):
        self.a = a
        self.b_cache = PersistentDict()

    def __just_loaded__(self):
        if not hasattr(self, 'b_cache'): self.b_cache = PersistentDict()

    def calc_it(self, b):
        if b in self.b_cache: return self.b_cache[b]
        res = expensive_function(self.a, b)
        self.b_cache[b] = res
        return res

That is, a function called whenever the object is loaded, that does all the
necessary backwards-compatibility
work right there. It separates the backwards-compat code cleanly, and also
only updates the objects
as-needed... though still a minor performance hit as it does the check each
time the object is loaded.

Is there a way to do that last option? What's the best practice for this
sort of thing, in general?

Thanks,
- Claudiu


Re: [ZODB-Dev] all webserver threads blocking on db.open()

2012-05-07 Thread Claudiu Saftoiu
On Mon, May 7, 2012 at 12:13 PM, Hanno Schlichting wrote:

> I think you might get better help on one of the Pyramid support channels.
>
> Your problems all seem to be related to configuring a web server in
> production mode, rather than database issues.
>

Thanks, I'll check out the Repoze.BFG IRC channel as well.

> From what I can tell you are dealing with hung requests. I'd look at
> either the paster configuration options for anything related to
> timeouts, thread pools, handling of incomplete requests and so on. Or
> use a more production quality web server like Apache (mod_wsgi), Nginx
> (gevent/gunicon) which likely has better default configuration values
> for these things.
>

The problem from my previous email was indeed hung requests. However, the
stack trace for those looked different:

Thread 140605868680960:
  File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
self.__bootstrap_inner()
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 878, in worker_thread_callback
runnable()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 1052, in <lambda>
lambda: self.process_request_in_thread(request, client_address))
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 1068, in process_request_in_thread
self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 617, in __init__
self.handle()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 442, in handle
BaseHTTPRequestHandler.handle(self)
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 329, in handle
self.handle_one_request()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 437, in handle_one_request
self.wsgi_execute()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 287, in wsgi_execute
self.wsgi_start_response)
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/connector.py",
line 21, in __call__
result = self.next_app(environ, start_response)
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/cachecleanup.py",
line 25, in __call__
return self.next_app(environ, start_response)
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/retry/__init__.py", line
65, in __call__
chunk = original_wsgi_input.read(rest)
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 474, in read
data = self.file.read(length)
  File "/usr/lib/python2.6/socket.py", line 377, in read
data = self._sock.recv(left)

Note the line it blocks on is "self._sock.recv(left)", well after the
response started.
In the trace I just provided, the block was on opening the DB connection
*at the start of the request*:

  ...
*  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 287, in wsgi_execute*
*self.wsgi_start_response)*
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/connector.py",
line 18, in __call__
conn = self.db.open()
*  File "/home/tsa/env/lib/python2.6/site-packages/ZODB/DB.py", line 729,
in open*
*self._a()*
  File "/usr/lib/python2.6/threading.py", line 123, in acquire
rc = self.__block.acquire(blocking)

Why would the database start blocking on opening a new database connection?
The
issue does indeed seem to be with ZODB.

Thanks,
- Claudiu


[ZODB-Dev] all webserver threads blocking on db.open()

2012-05-07 Thread Claudiu Saftoiu
Hello all,

I'm using Repoze.BFG, with paster to launch the webserver. This is a
similar issue to the one I emailed about before titled
"server stops handling requests - nowhere near 100% CPU or Memory
used"

The situation is the same. I used z3c.deadlockdebugger, and what I notice
is that, when the server is blocked, there are about 100 threads running
(as opposed to the 15 or so when the server has just started), and all
their stack traces look like this:

Thread 140269004887808:
  File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
self.__bootstrap_inner()
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 878, in worker_thread_callback
runnable()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 1052, in <lambda>
lambda: self.process_request_in_thread(request, client_address))
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 1068, in process_request_in_thread
self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 617, in __init__
self.handle()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 442, in handle
BaseHTTPRequestHandler.handle(self)
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 329, in handle
self.handle_one_request()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 437, in handle_one_request
self.wsgi_execute()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 287, in wsgi_execute
self.wsgi_start_response)
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/connector.py",
line 18, in __call__
conn = self.db.open()
  File "/home/tsa/env/lib/python2.6/site-packages/ZODB/DB.py", line 729, in
open
self._a()
  File "/usr/lib/python2.6/threading.py", line 123, in acquire
rc = self.__block.acquire(blocking)

The server gets into a blocked state every 24 hours or so. Simply
restarting the webserver fixes it; however, I'd like to know what the
problem is so restarting won't be necessary, and to prevent it from
getting worse. Any ideas/suggestions?

Thanks,
- Claudiu


Re: [ZODB-Dev] ZODB-Dev Digest, Vol 110, Issue 1

2012-05-03 Thread Claudiu Saftoiu
> > However, the thread has to start a new transaction each time it
>  > processes something - which I know how to do:
>  >
>  > while True:
>  > #wait until asked to do something
>  > import transaction
>  > transaction.begin()
>  >
>  > However, the thread needs access to the root object in order to turn
>  > the identifiers gotten from the requests into persistent objects...
>  > how would I go about accessing the root object in such a circumstance?
>
>  You need to pass the database object to the thread, and the thread needs
>  to open a connection (connection = db.open()).  Then connection.root()
>  will give you the root object (or you could pass OIDs to the thread and
>  use connection.get(oid) to find the objects you need to work with).

That makes sense. I wasn't sure where to get a db object from, so here's
what I ended up doing. Let me know if there's a better way.

I use paster to run the webserver, so in run.py I now have:

GLOBAL_CONFIG = [None]

def app(global_config, **settings):
    GLOBAL_CONFIG[0] = global_config
    #...

Now in the thread I start to do the processing I have:

def proc_f(self, i):
    from repoze.zodbconn.uri import db_from_uri
    import run
    global_config = run.GLOBAL_CONFIG[0]
    uri = global_config['zodb_uri']
    db = db_from_uri(uri)

    conn = db.open()
    app_root = conn.root()['app_root']

    while True:
        # wait for request...

        t = conn.transaction_manager.begin()
        # do a bunch of read-only processing, queue results
        t.abort()

> Don't forget to commit or abort the transaction, and also don't forget
> that you may need to implement some kind of retry logic if commit()
> raises a ConflictError due to conflicting updates.

I figure that since I never write anything to the database in my worker
thread, I can always abort the transaction.

This all seems to work. Anything horribly wrong I'm doing?
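Reduced to a runnable sketch - with stub objects standing in for the real database and connection - the worker pattern above looks like this (one connection per thread opened once up front, a queue for jobs and one for results, and a sentinel value to shut the worker down):

```python
import threading
import queue

class StubConn(object):
    # Stand-in for a ZODB Connection.
    def root(self):
        return {'app_root': 'root-object'}
    def close(self):
        pass

class StubDB(object):
    # Stand-in for the DB object created at application startup.
    def open(self):
        return StubConn()

def worker(db, jobs, results):
    # One connection per worker thread, opened once; in real code each
    # job would be wrapped in its own transaction begin/abort.
    conn = db.open()
    app_root = conn.root()['app_root']
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down cleanly
            break
        results.put((job, app_root))
    conn.close()

jobs, results = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(StubDB(), jobs, results))
t.start()
jobs.put('job-1')
jobs.put(None)
t.join()
item = results.get()
print(item)  # -> ('job-1', 'root-object')
```

In real code the `db` passed in would be the one built at startup (as the quoted advice suggests), rather than re-created from `GLOBAL_CONFIG` inside the thread.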

Thanks muchly,
- Claudiu


[ZODB-Dev] how to get root obj from a new transaction?

2012-05-02 Thread Claudiu Saftoiu
Hey all,

I'm using a thread to do some server-side work. The thread will be asked
by different requests to do the same thing at the same time, so I want
the thread to do all the work once and return the data to the requests.
The problem is that the requests each have their own transaction and the
thread essentially has none.

I can communicate between the two only using identifiers - not persistent
objects - so that the thread can process data in a different transaction
than the requests yet still return a meaningful reply. However, the
thread has to start a new transaction each time it processes something -
which I know how to do:

while True:
    # wait until asked to do something
    import transaction
    transaction.begin()

However, the thread needs access to the root object in order to turn the
identifiers gotten from the requests into persistent objects... how would
I go about accessing the root object in such a circumstance?

Thanks,
- Claudiu


Re: [ZODB-Dev] server stops handling requests - nowhere near 100% CPU or Memory used

2012-04-20 Thread Claudiu Saftoiu
>
> Ah, so it seems that, as I leave the server running longer & longer,
> more & more threads are taken up with a `.recv()` call. I think one of
> my clients opens requests and does not read them/close them. Eventually
> all the threads are blocking in that fashion.
>
> I will fix my clients. But, is there a server-side fix to this (again,
> using Repoze.BFG)?
> Something to time out the connection after 60 seconds or so if nothing has
> happened?
>

Additional info: in particular the blocked threads' stack dumps look like
this:

Thread 140605868680960:
  File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
self.__bootstrap_inner()
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 878, in worker_thread_callback
runnable()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 1052, in <lambda>
lambda: self.process_request_in_thread(request, client_address))
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 1068, in process_request_in_thread
self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 617, in __init__
self.handle()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 442, in handle
BaseHTTPRequestHandler.handle(self)
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 329, in handle
self.handle_one_request()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 437, in handle_one_request
self.wsgi_execute()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 287, in wsgi_execute
self.wsgi_start_response)
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/connector.py",
line 21, in __call__
result = self.next_app(environ, start_response)
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/cachecleanup.py",
line 25, in __call__
return self.next_app(environ, start_response)
  File
"/home/tsa/env/lib/python2.6/site-packages/repoze/retry/__init__.py", line
65, in __call__
chunk = original_wsgi_input.read(rest)
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
line 474, in read
data = self.file.read(length)
  File "/usr/lib/python2.6/socket.py", line 377, in read
data = self._sock.recv(left)

Was my assessment of the situation accurate?


Re: [ZODB-Dev] server stops handling requests - nowhere near 100% CPU or Memory used

2012-04-20 Thread Claudiu Saftoiu
>
> I don't think that is_alive would be the cause of this, it looks like
> a simple enough view. But if your server uses a thread pool, and all
> the other threads are now occupied by something that got locked up,
> then it could be that the server is not answering your is_alive
> request at all because it is waiting for a thread to free up first.
>

Ah, so it seems that, as I leave the server running longer & longer,
more & more threads are taken up with a `.recv()` call. I think one of
my clients opens requests and does not read them/close them. Eventually
all the threads are blocking in that fashion.

I will fix my clients. But is there a server-side fix for this (again,
using Repoze.BFG)? Something to time out the connection after 60 seconds
or so if nothing has happened?
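On the server side, paste.httpserver accepts a `socket_timeout` option that drops idle connections. Whether your Paste version honors it from the `[server:main]` section should be verified, so treat this as a sketch:

```ini
[server:main]
use = egg:Paste#http
host = 0.0.0.0
port = 6543
# Close connections idle for more than 60 seconds, so a client that
# never reads or closes its request can't pin a worker thread forever.
# (socket_timeout is a paste.httpserver option; check your version.)
socket_timeout = 60
```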

Thanks,
- Claudiu


Re: [ZODB-Dev] server stops handling requests - nowhere near 100% CPU or Memory used

2012-04-19 Thread Claudiu Saftoiu
>
> I don't know, anything could lock up your site if something is waiting
> for a lock, for example.
>
> Use http://pypi.python.org/pypi/z3c.deadlockdebugger to figure out
> what the threads are doing at this time. Preferably, trigger the
> dump_threads() method of that module on SIGUSR1, like the Zope
> signalstack product does (see
>
> http://svn.plone.org/svn/collective/Products.signalstack/trunk/Products/signalstack/__init__.py
> for the exact code to bind the signal handler). That'll tell you
> exactly what each thread is busy with when you send the signal.
>

That module seems to be just the trick for seeing whether I have a
deadlock issue. I set up the SIGUSR1 handler, but whenever I sent the
signal to the server it would stop, complaining about a system call
interrupting the 'select', so instead I made it write the contents of
dump_threads() to a file once a minute. If it happens again I'll look at
the tail end of the file & see what the threads are all busy with.
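The periodic-dump approach can be sketched without z3c.deadlockdebugger at all: `sys._current_frames()` plus `traceback` yields the same information, and a daemon `threading.Timer` avoids the signal-interrupting-select problem entirely (`dump_threads` here is a simplified stand-in for the real one):

```python
import sys
import threading
import traceback

def dump_threads():
    # Minimal stand-in for z3c.deadlockdebugger.dump_threads():
    # render a stack trace for every live thread.
    parts = []
    for tid, frame in sys._current_frames().items():
        stack = ''.join(traceback.format_stack(frame))
        parts.append('Thread %d:\n%s' % (tid, stack))
    return '\n'.join(parts)

def start_periodic_dump(path, interval=60.0):
    # Append a dump to `path` every `interval` seconds from a daemon
    # timer thread -- no signals involved.
    def tick():
        with open(path, 'a') as f:
            f.write(dump_threads() + '\n')
        timer = threading.Timer(interval, tick)
        timer.daemon = True
        timer.start()
    tick()

report = dump_threads()
print('Thread' in report)  # -> True
```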

Thanks, seems like this will likely do the trick,
- Claudiu


Re: [ZODB-Dev] server stops handling requests - nowhere near 100% CPU or Memory used

2012-04-19 Thread Claudiu Saftoiu
>
> I have no idea; all you told is that you use the ZODB, not what server
> framework you use to register your views. Is this Grok, Bluebream,
> Repoze.BFG, Zope 2 or something else?
>

Ah yes, sorry about that. I'm using Repoze.BFG. Does that help any?

> I don't think that is_alive would be the cause of this, it looks like
> a simple enough view. But if your server uses a thread pool, and all
> the other threads are now occupied by something that got locked up,
> then it could be that the server is not answering your is_alive
> request at all because it is waiting for a thread to free up first.
>

Yea, I don't think it is the 'is_alive'. I mentioned it mostly to help
diagnose the problem: whatever the problem is, it also affects simple
views that don't touch the database at all.

Hmm, it might be the thread-pool issue you mention. That seems to make
sense. I'll have to see if I have any views that never finish. Thanks for
the pointer.

Thanks again,
- Claudiu


Re: [ZODB-Dev] server stops handling requests - nowhere near 100% CPU or Memory used

2012-04-19 Thread Claudiu Saftoiu
On Thu, Apr 19, 2012 at 11:33 AM, Martijn Pieters  wrote:

> On Thu, Apr 19, 2012 at 17:20, Claudiu Saftoiu  wrote:
> > My question is: what could possibly be causing the server to 'lock up',
> even
> > on a
> > simple view like 'is_alive', without using any memory or CPU? Is there
> some
> > ZODB
> > resource that might be getting gradually exhausted because I'm not
> handling
> > it properly?
>
> I don't know, anything could lock up your site if something is waiting
> for a lock, for example.
>

Are there locks that could possibly be used for the 'is_alive' function?
Here is the
definition in its entirety.

In 'configure.zcml':
  
in 'views.py':

def is_alive(request):
    return True

Whatever the problem is, it causes 'is_alive' to take forever, yet the
CPU is not spinning at 100% and memory usage is not high. Could this be a
lock problem?

Use http://pypi.python.org/pypi/z3c.deadlockdebugger to figure out
> what the threads are doing at this time. Preferably, trigger the
> dump_threads() method of that module on SIGUSR1, like the Zope
> signalstack product does (see
>
> http://svn.plone.org/svn/collective/Products.signalstack/trunk/Products/signalstack/__init__.py
> for the exact code to bind the signal handler). That'll tell you
> exactly what each thread is busy with when you send the signal.
>

Thanks, I'll check this out.
- Claudiu


[ZODB-Dev] server stops handling requests - nowhere near 100% CPU or Memory used

2012-04-19 Thread Claudiu Saftoiu
Hello all,

I recently made a lot of changes to my ZODB app, and now I'm experiencing
some puzzling behavior.

Normally I have ~15 clients making requests to the server every few
seconds. Some of these requests commit new data to the database, while
others just process existing data. I also have a 'heartbeat' client,
which accesses the view '/is_alive', entirely described by this function:

def is_alive(request): return True

Once a day or so, I will get a report from the heartbeat client that the
server is down - no heartbeat has succeeded in the past 10 minutes.
Indeed, any URL I go to on the server will simply not load, but hang
forever - even this '/is_alive' view.

I can still SSH into the server, however. Running 'top', I see that the
server is not using any CPU time and is not using any large amount of
memory. The computer itself runs just fine. I can even access the
database with the paster shell, make requests, commit things, etc.,
without any exceptional delays. If I Ctrl+C the webserver and restart it,
things immediately work just fine.

Looking at the immediately preceding stdout, I see no hint of errors of
any kind. However, a line is printed on the server every time either of
two particular clients completes a request, and leading up to the current
point in the stdout (where the webserver no longer responds) there are a
few "Pipe is broken" messages (which happen when, e.g., I go to a view
with a web browser and close the tab before the page loads). I also
notice that requests stopped completing from one of the two clients, then
from the other. It seems like a gradual slowdown of some kind, though I'm
not entirely certain.

My question is: what could possibly be causing the server to 'lock up',
even on a simple view like 'is_alive', without using any memory or CPU?
Is there some ZODB resource that might be getting gradually exhausted
because I'm not handling it properly?

Thanks in advance,
- Claudiu


Re: [ZODB-Dev] ZODB-Dev Digest, Vol 108, Issue 17

2012-04-05 Thread Claudiu Saftoiu
>
> Hello, retries are made on a per-transaction basis.
>
> I would say that while for regular_view the retry may be handled by
> middleware or the like (the publisher in Zope 2?), in my_view you may
> want to manage it yourself around your transaction.
>
> In my_view you commit the transaction, so you won't undo the
> slow_no_conflict part if you fail in the second part (if that was not
> your intent, you should use subtransactions).
>
> So you may do something like:
>
> from ZODB.POSException import ConflictError
>
> def my_view(request):
>
>     transaction.begin()
>     slow_no_conflict()
>     transaction.commit()
>
>     do_retry = True
>     while do_retry:
>         try:
>             transaction.begin()
>             fast_yes_conflict(avar)
>             transaction.commit()
>             do_retry = False
>         except Exception, e:
>             transaction.abort()
>             do_retry = isinstance(e, ConflictError)
>
>
> Hope this helps!
>

That does help, thanks! Makes a lot of sense, too. Question - will hooks
added with `addAfterCommitHook()` be called if a transaction is aborted?
Also, in what situation is a web request retried automatically - is it
when the function handling the request raises a ConflictError (e.g. by
not catching a .commit() that fails)?

Thanks,
- Claudiu


[ZODB-Dev] How does automatic retrying work?

2012-03-29 Thread Claudiu Saftoiu
Hello all,

I have an HTTP request which has roughly two parts to it. One part
requires a lot of processing with a small chance of a DatabaseConflict,
while the other part requires little processing with a high chance of
a DatabaseConflict. I'd like to do something like this:

def my_view(request):
    transaction.begin()
    slow_no_conflict()
    transaction.commit()

    for avar in something:
        transaction.begin()
        fast_yes_conflict(avar)
        transaction.commit()

My question is: how will automatic retrying work? Most of my views are
simply:

def regular_view(request):
    do_stuff()

and, if something conflicts, the whole thing is just retried. In
`my_view`, what will happen if the `slow_no_conflict()` function has a
conflict? What will happen if `fast_yes_conflict(avar)` has a conflict?
What if it's in the first iteration of the loop, or the last? I'm not
quite sure how to think properly about these things yet.
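The usual mental model: a conflict that escapes the view makes the framework (e.g. repoze.retry) replay the whole request from the top, while a conflict caught inside the view is yours to retry. The inner retry boundary can be sketched with a stub `ConflictError` (the real one lives in `ZODB.POSException`):

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""

calls = {'n': 0}

def fast_yes_conflict():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConflictError()  # conflict on the first two attempts

def run_with_retries(fn, attempts=5):
    # Each attempt is its own transaction: begin, do the work, commit;
    # on a conflict, abort and start the attempt over from the top.
    for _ in range(attempts):
        try:
            fn()              # transaction.begin()/commit() would wrap this
            return True
        except ConflictError:
            continue          # transaction.abort(), then retry
    return False

ok = run_with_retries(fast_yes_conflict)
print(ok, calls['n'])  # -> True 3
```

Note that only the conflicting unit is re-run; work already committed (like `slow_no_conflict()`) is not undone by a later conflict.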

Thanks,
- Claudiu


Re: [ZODB-Dev] database conflict fail

2012-03-22 Thread Claudiu Saftoiu
On Thu, Mar 22, 2012 at 5:04 PM, Vincent Pelletier wrote:

> On Thursday, 22 March 2012 at 21:13:34, Claudiu Saftoiu wrote:
> > In [14]: root._p_jar[0x139c35]
>
> Actually, you want to write:
>  root._p_jar['\x00\x00\x00\x00\x00\x13\x9c\x35']
> ie, OIDs are 8-byte binary strings.
>

Ah, great, thanks! Now I actually have some idea why these conflicts are
occurring, and where.
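The 8-byte form can be computed rather than typed by hand; ZODB ships `ZODB.utils.p64`/`u64` for exactly this, and a self-contained equivalent with `struct` is:

```python
import struct

def oid_from_int(n):
    # Same packing as ZODB.utils.p64: integer -> 8-byte big-endian
    # string, the form `root._p_jar[...]` expects as a key.
    return struct.pack('>Q', n)

def oid_to_int(oid):
    # Inverse mapping (ZODB.utils.u64).
    return struct.unpack('>Q', oid)[0]

print(oid_from_int(0x139c35) == b'\x00\x00\x00\x00\x00\x13\x9c\x35')  # -> True
```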

- Claudiu


Re: [ZODB-Dev] database conflict fail

2012-03-22 Thread Claudiu Saftoiu
>
> Stop!  Before you go down that road: application-level conflict resolution
> is an extremely advanced feature.  It's on the same level as metaclasses
> and custom import hooks in Python.  It's a last resort (or maybe the one
> after that).
>
> The first thing you need to ask yourself is why your application wants to
> update the same value from multiple threads.  This is a hot spot.
> Hot spots: bad.  Most applications will need to update the same value from
> multiple threads occasionally.  Occasionally retrying is fine.
>
> If your application wants to update the same value from multiple
> threads often, then that's a design problem you should solve.
>

Thanks for the advice, Jim. I was going for an all-or-nothing approach
(leave all the conflicts in, or try to eradicate all of them), but it
does seem like some of them are quite easy to resolve, and others don't
matter at all if they happen rarely. I'll make sure there are no other
solutions, and that it's actually important, before resorting to
application-level conflict resolution.

- Claudiu


Re: [ZODB-Dev] database conflict fail

2012-03-22 Thread Claudiu Saftoiu
On Thu, Mar 22, 2012 at 2:45 PM, Vincent Pelletier wrote:

> On Thursday, 22 March 2012 at 18:23:47, Claudiu Saftoiu wrote:
> > Ahh, now that looks promising. Are there any particularly good places to
> > get documentation on that sort of thing? All I see when I google are
> > mailing list archives.
>
> See ZODB/ConflictResolution.txt . Disclaimer: I didn't read it. I read the
> code - but I expect the text file to be easier to assimilate :) .
> BTrees/Length.py:Length class implements a simple _p_resolveConflict (yours
> will be even simpler).
>

Thanks, I will take a look.
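(For a flavour of what such a method looks like -- a toy sketch modeled on the
additive merge in BTrees.Length, not the actual class, which subclasses
Persistent and works on pickled states:)

```python
class Counter(object):
    # Toy sketch of ZODB conflict resolution, modeled on BTrees.Length:
    # here the "state" is just the integer value itself.
    def __init__(self, value=0):
        self.value = value

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   value when our transaction began
        # saved_state: value another transaction committed meanwhile
        # new_state:   value we are trying to commit
        # Merge by applying both increments to the original value.
        return saved_state + new_state - old_state
```

E.g. starting from 10, one transaction committing 12 and another committing 13
resolve to 15 -- both increments are kept.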


> There should be some details in the exception itself. Like, oids, currently
> committed TID and the TID the transaction started with.
> root()._p_jar[the_oid] will get you the object. Then the hard part starts:
> guess where in the object tree that object is. If you have varied classes,
> and significant data on your persistent instances, it will be easy.
>

Ok, I just got a ConflictError:

ConflictError: database conflict error (oid 0x139c35, class
BTrees.OOBTree.OOBucket, serial this txn started with 0x03954ed053c0ff88
2012-03-22 16:48:19.629820, serial currently committed 0x03954f996d61c944
2012-03-22 20:09:25.636401)

In my paster shell I do:

In [14]: root._p_jar[0x139c35]

However, this causes:

In [14]: root._p_jar[0x139c35]
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (85, 0))

---
TypeError Traceback (most recent call last)

/home/tsa/sports/ in ()

/home/tsa/env/lib/python2.6/site-packages/ZODB/Connection.pyc in get(self,
oid)
246 return obj
247
--> 248 p, serial = self._storage.load(oid, '')
249 obj = self._reader.getGhost(p)
250

/home/tsa/env/lib/python2.6/site-packages/ZEO/ClientStorage.pyc in
load(self, oid, version)
813 self._lock.acquire()# for atomic processing of
invalidations
814 try:
--> 815 t = self._cache.load(oid)
816 if t:
817 return t

/home/tsa/env/lib/python2.6/site-packages/ZEO/cache.pyc in call(*args, **kw)
141 inst._lock.acquire()
142 try:
--> 143 return self.func(inst, *args, **kw)
144 finally:
145 inst._lock.release()

/home/tsa/env/lib/python2.6/site-packages/ZEO/cache.pyc in load(self, oid)
487 @locked
488 def load(self, oid):
--> 489 ofs = self.current.get(oid)
490 if ofs is None:
491 self._trace(0x20, oid)

/home/tsa/env/lib/python2.6/site-packages/ZODB/fsIndex.pyc in get(self,
key, default)
123
124 def get(self, key, default=None):
--> 125 tree = self._data.get(key[:6], default)
126 if tree is default:
127 return default

TypeError: 'int' object is unsubscriptable

What am I doing wrong?

Thanks again,
- Claudiu


Re: [ZODB-Dev] database conflict fail

2012-03-22 Thread Claudiu Saftoiu
On Thu, Mar 22, 2012 at 5:29 AM, Vincent Pelletier wrote:

> Le Wed, 21 Mar 2012 21:04:20 -0400,
> Claudiu Saftoiu  a écrit :
> > I definitely want to keep the latest update.
>
> Then, if the change alters just one persistent object, you can write a
> conflict resolution method on the class of that object
> (_p_resolveConflict). Note, though, that this only means you will keep
> the latest in commit order, not in transaction-begin order.
>
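(A resolver with that "keep the latest committed write" behaviour could be as
small as this -- a hypothetical sketch, class name invented:)

```python
class LatestWins(object):
    # Hypothetical sketch: on a write/write conflict, keep the state of
    # the transaction that commits last (the one currently committing).
    def _p_resolveConflict(self, old_state, saved_state, new_state):
        return new_state  # discard the concurrently committed state
```

As noted above, "latest" here means latest in commit order, not in
transaction-begin order.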

Ahh, now that looks promising. Are there any particularly good places to
get documentation on that sort of thing? All I see when I google are mailing
list archives.

Also: is there an easier way to see which objects had a conflict when a
ConflictError is raised? Currently I'm doing a binary search by commenting
out code, but I figure there must be a better way...

Thanks,
- Claudiu


Re: [ZODB-Dev] database conflict fail

2012-03-21 Thread Claudiu Saftoiu
>
> You shouldn't need to be building your own locking system on top of
> ZODB, this suggests that either ZODB is the wrong tool for your
> problem, or that you're using it wrong ;)
>
> ZODB has MVCC which means you get a consistent view of the database
> from the time the transaction was begun (see
> http://en.wikipedia.org/wiki/Snapshot_isolation.) In a web application
> using ZODB, each new request begins a transaction and at the end of
> the request the transaction is committed. If that commit raises a
> conflict error then the request is retried (usually up to three
> times.) Perhaps you could build on top of one of the existing
> frameworks that has this behaviour built in, e.g.
> http://pyramid.readthedocs.org/en/1.3-branch/tutorials/wiki/index.html


Aye, that's the crux of it. If I allow the retries, my app gets too slow.
Instead of completing a request that takes 200ms, then retrying it after
a conflict and spending another 200ms on it, I'd rather wait 50ms until
the previous transaction finishes, then go ahead and do the request
knowing no conflict will occur.

> It's difficult to offer any useful advice without knowing what you are
> trying to achieve here. Certain data structures (mostly those in the
> BTrees package) have conflict resolution built in. For instance, you
> can concurrently modify multiple values in the dict like
> BTrees.OOBTree.OOBTree (or any of the other variants) so long as each
> concurrent transaction is modifying different keys.
>

I have begun using OOBTrees instead of PersistentDicts, and they took
care of most of my conflict issues - namely, the ones where different,
unrelated keys of the same data structure were being updated. That was
the majority of my problem, and it's solved now.
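(The merge behaviour described above -- concurrent changes to different keys
succeed, same-key changes conflict -- can be pictured with plain dicts standing
in for bucket states. A toy sketch, not BTrees' actual algorithm:)

```python
def merge_states(old, committed, mine):
    # Toy model of BTree-style conflict resolution: `old` is the shared
    # starting state, `committed` and `mine` are two divergent copies.
    merged = dict(committed)
    for key, value in mine.items():
        if old.get(key) != value:                       # a key we changed
            if committed.get(key) not in (old.get(key), value):
                raise ValueError("true conflict on %r" % (key,))
            merged[key] = value
    return merged

# Different keys changed: merges cleanly.
assert merge_states({"a": 1, "b": 2},
                    {"a": 9, "b": 2},                   # other txn changed "a"
                    {"a": 1, "b": 7}) == {"a": 9, "b": 7}   # we changed "b"
```

Changing the *same* key in both copies raises, which is the dict-level
analogue of the ConflictError below.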

The issue now is when the same key in the same data structure is updated
from two different locations. I definitely want to keep the latest update.
It would also be acceptable to abort the earlier transaction in favor of
the later one. However, I don't want to do both, have one fail, and re-do
one (three transactions total); that's too slow and I don't see it as
necessary.

Does that help any?

Thanks,
- Claudiu


[ZODB-Dev] database conflict fail

2012-03-21 Thread Claudiu Saftoiu
Hello ZODB List,

(This is also a stackoverflow question - you might prefer the formatting
there:
http://stackoverflow.com/questions/9810116/zodb-database-conflict-fail )

I have a server, and a client.

A client sends a request. The request has a certain key associated with it,
e.g. `a-1`, `a-2`, `b-1`, `b-4`.

If two requests for the same key come in at once, there will be a conflict
error, as the same data structures are being modified.

I can adapt the client to simply not send two requests of the same key at
once. However, I'd like this system to work with multiple clients, too. It
seems silly to have the clients coordinate what they send to the server.
Instead, I'd like the server to simply block on a request of a certain key
if that key is already being modified, until the other requests with that
same key are done.

To this end, I've created a locking system. At the beginning of the
function on the server, I do:

key = ...
print "Acquiring %s lock..." % (key,)
KEY_LOCKS[key].acquire()
print "%s lock acquired." % (key,)
def after_commit_hook(success):
    KEY_LOCKS[key].release()
    print "(after %s commit): Released %s lock" % (
        ('failed', 'successful')[success], key)
transaction.get().addAfterCommitHook(after_commit_hook)

where `KEY_LOCKS` is a dict mapping keys to `threading.Lock`s. Afterwards
follows the code that modifies the persistent data structures.
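(A `KEY_LOCKS` of that shape can be built lazily with a defaultdict -- a
stdlib-only sketch of the per-key locking idea:)

```python
import threading
from collections import defaultdict

# One lock per request key, created on first use. Assumes a bounded key
# space, since entries are never evicted.
KEY_LOCKS = defaultdict(threading.Lock)

lock = KEY_LOCKS["q-q"]   # every lookup of "q-q" returns the same Lock
lock.acquire()
try:
    pass                  # ... modify the persistent structures here ...
finally:
    lock.release()
```

Requests for different keys use different locks and never block each other;
two requests for the same key serialize on the shared Lock.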

What I assume would happen is that, if a request comes in for a key that's
already being processed, it would block when acquiring the lock. Only when
the earlier request **has already been committed** (thus being beyond any
conflict errors), would the new request resume. The requests do nothing
that would conflict until the lock is acquired.

Most of the requests work fine:

Acquiring a-b lock...
a-b lock acquired.
(after successful commit): Released a-b lock
Acquiring a-c lock...
a-c lock acquired.
(after successful commit): Released a-c lock

However, there is _still_ an issue when the same key is sent, even though
the locking seems to work:

Acquiring q-q lock...
q-q lock acquired.
Acquiring q-q lock...
(after successful commit): Released q-q lock
q-q lock acquired.
(after failed commit): Released q-q lock
repoze.retry retrying, count = 1
Traceback (most recent call last):
...
ConflictError: database conflict error (oid 0x13009b, class
persistent.list.PersistentList)

And then the request retries. Note that the `q-q lock` was only acquired
after the successful commit.

What gives? Why is this system not preventing conflict errors? Where is my
assumption incorrect?

---

If I put `transaction.begin()` before the
`transaction.get().addAfterCommitHook(after_commit_hook)` line, it works.
For the life of me I can't figure out why. Before the `transaction.begin()`
line, the entirety of my code is:

post = request.params
if not post: return Response("No data!")

data = eval(post['data'])
time_parsed = time.time()
my_app = request.context

This solves my problem, but I'm not posting it as an answer because I still
want to know: why does it give conflict errors if I don't start a fresh
transaction right before?

Thanks all,
- Claudiu