On Tuesday 20 March 2012 at 13:06:16, Sebastian Wain wrote:
> It is a persistent queue with acknowledgment. The issue I see is the sync
> of the BTree between threads. Consumers/Producers get/put elements, one at
> a time, so before that operation can take place it will need to sync to
> have the most updated version of the BTree in all threads.
Indeed, you need to get a fresh snapshot when looking for an item to process.
You also want that transaction to be as short as possible, to reduce the
time spent resolving conflicts (i.e., rolling back transactions and starting
over when an item has already been reserved by a competing transaction). It
should therefore probably be kept separate from the code actually processing
items.
This might be possible by providing ZODB.Connection:Connection.open() with a
transaction manager different from the threaded one, to use for those sub-
transactions. But I've never used this feature (I'm only using ZODB through
The way I see it, it would go like:
- threaded TM begin
- pop from queue:
  - other TM begin on a ZODB connection reserved to queue access
  - pop item
  - commit (or abort if something went wrong, and re-raise)
- other TM begin on a ZODB connection reserved to queue access (can reuse
- commit (etc, as above)
- threaded TM commit (again, or abort on exception)
If processing itself involves other persistent objects (ex: the queue item
describes an action to take on another persistent object), two connections
would have to be opened on the same database, which can lead to errors when
moving objects around (if not careful, an object fetched from one transaction
will be reused with another, which will raise an exception).
Also, using a popped item outside its connection's transaction will be
painful: it will probably need to be converted to some non-persistent form
before leaving that transaction (otherwise, any alteration of it will raise).
TL;DR: I don't know how to implement it correctly without actually doing it.
> Is this in the context of Python 3.y? because it is a multiprocess Queue or
> on Python 2.x?
Tested on 2.7. I don't know about 3.x. Verified *not* to occur on PyPy 1.7
(although I don't know how it actually fixes the issue).
The use case is a single-process, multi-threaded app reading a file with many
nested structures, each level sending chunks to the higher level. I wanted to
use ply.yacc to do the parsing so I could easily alter the grammar, but it
cannot (with its default API) accept partial inputs, and I need that. So I
used queues & threads. I then used a simple queue implementation (no count, no
ack), and finally modified ply to work on partial inputs - with great speed
improvement, even over the "simpler queue" version.
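
For illustration only, the queue-and-thread arrangement can be pictured as
below. This is one plausible reading of the description, not the actual code:
parse_all is a hypothetical stand-in for a parser that wants its whole input
at once, and the sentinel, function names and queue size are made up.

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

_END = object()  # sentinel marking the end of the stream


def read_chunks(chunks, out_queue):
    """Lower level: push chunks upward as they become available."""
    for chunk in chunks:
        out_queue.put(chunk)
    out_queue.put(_END)


def iter_queue(in_queue):
    """Expose the queue as an iterator, blocking until chunks arrive."""
    while True:
        item = in_queue.get()
        if item is _END:
            return
        yield item


def parse_all(chunks):
    """Stand-in parser: consumes everything, then splits on whitespace."""
    return ''.join(chunks).split()


def run(chunks):
    q = queue.Queue(maxsize=4)
    reader = threading.Thread(target=read_chunks, args=(chunks, q))
    reader.start()
    result = parse_all(iter_queue(q))
    reader.join()
    return result
```

Each nesting level needs one such thread and queue, which is where the
overhead comes from compared to a parser that accepts partial input directly.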
> There is a way to have a "history free" storage? obviously in the context
> of ZODB.
AFAIK, RelStorage supports this; I'm not sure about others.
Also, the mandatory loss of conflict resolution on history-free storages might
cause a performance regression. I believe periodically packing away the
history of reasonably old transactions (compared to processing duration) might
turn out to be better, and it is also readily available with any ZODB
back-end. If run on a big-enough ZODB (compared to available RAM and disk
speed), packing will start causing problems, though.
For more information about ZODB, see http://zodb.org/
ZODB-Dev mailing list - ZODB-Dev@zope.org