On Tuesday 20 March 2012 13:06:16, Sebastian Wain wrote:
> It is a persistent queue with acknowledgment. The issue I see is the sync
> of the BTree between threads. Consumers/Producers get/put elements, one at
> a time, so before that operation can take place it will need to sync to
> have the most updated version of the BTree in all threads.

Indeed, you need to get a fresh snapshot when looking for an item to process.
You also want that transaction to be as short as possible, to reduce the time
spent resolving conflicts (i.e., rolling back transactions and starting over
when an item has already been reserved by a competing transaction). Hence it
should probably be disconnected from the code actually processing items.

This might be possible by providing ZODB.Connection:Connection.open() with a
transaction manager different from the threaded one, to use for those
sub-transactions. But I've never used this feature (I'm only using ZODB
through Zope).

The way I see it, it would go like:
- threaded TM begin
- pop from queue:
  - other TM begin on a ZODB connection reserved to queue access
  - pop item
  - commit (or abort if something went wrong, and re-raise)
- process
- acknowledge:
  - other TM begin on a ZODB connection reserved to queue access (can reuse
    the one above)
  - ack
  - commit (etc, as above)
- threaded TM commit (again, or abort on exception)

If processing itself involves other persistent objects (e.g. the queue item
describes an action to take on another persistent object), two connections
would have to be opened on the same database. This can lead to errors when
moving objects around: if not careful, an object fetched in one transaction
will be reused in another, which will raise an exception.

Also, using a popped item outside its connection's transaction will cost some
hair: it will probably need to be converted to some non-persistent form before
leaving that transaction (otherwise, any alteration of it will raise).

TL;DR: I don't know how to implement it correctly without actually doing it.

> Is this in the context of Python 3.y? because it is a multiprocess Queue or
> on Python 2.x?

Tested on 2.7. I don't know about 3.x. Verified to *not* occur on PyPy 1.7
(although I don't know how it can really fix the issue).
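
As a rough illustration of the pop / process / acknowledge outline above, here
is a toy, standard-library-only model. It does not use ZODB at all:
ToyQueueStore, ConflictError, reserve(), worker() and run() are hypothetical
names made up for this sketch, and the version counter only imitates the
"fresh snapshot, then retry on conflict" behaviour described earlier. The
point is just that reserving and acknowledging are two short steps, with the
processing done in between on a plain (non-persistent) value.

```python
import threading


class ConflictError(Exception):
    """Toy stand-in for a commit-time conflict (not ZODB's exception)."""


class ToyQueueStore:
    """In-memory model of a queue with acknowledgment; not a ZODB API."""

    def __init__(self, items):
        self._lock = threading.Lock()
        self._version = 0            # bumped on every successful commit
        self._pending = list(items)  # items nobody has reserved yet
        self._reserved = {}          # ticket -> item awaiting acknowledgment
        self._next_ticket = 0

    def snapshot(self):
        """Return the current version and a copy of the pending items."""
        with self._lock:
            return self._version, tuple(self._pending)

    def commit_reserve(self, seen_version):
        """Reserve the head item, or fail if the snapshot is stale."""
        with self._lock:
            if seen_version != self._version:
                raise ConflictError()
            item = self._pending.pop(0)
            ticket = self._next_ticket
            self._next_ticket += 1
            self._reserved[ticket] = item
            self._version += 1
            return ticket, item

    def commit_ack(self, ticket):
        """Drop a reserved item once it has been processed."""
        with self._lock:
            del self._reserved[ticket]
            self._version += 1

    def unacknowledged(self):
        """Return a copy of the items reserved but not yet acknowledged."""
        with self._lock:
            return dict(self._reserved)


def reserve(store):
    """Short step: take a fresh snapshot, reserve, start over on conflict."""
    while True:
        version, pending = store.snapshot()
        if not pending:
            return None
        try:
            return store.commit_reserve(version)
        except ConflictError:
            continue  # a competing worker committed first; retry


def worker(store, results, results_lock):
    """Reserve, process outside the short steps, then acknowledge."""
    while True:
        reserved = reserve(store)
        if reserved is None:
            return
        ticket, item = reserved
        outcome = item * 2        # placeholder for the real processing
        store.commit_ack(ticket)  # second short step: acknowledge
        with results_lock:
            results.append(outcome)


def run(items, n_workers=4):
    """Process all items with several threads; return results and leftovers."""
    store = ToyQueueStore(items)
    results, results_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(store, results, results_lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), store.unacknowledged()
```

Calling run(range(20)) should process every item exactly once and leave
nothing unacknowledged. In a real ZODB setup the two short steps would be
commits on a dedicated connection and transaction manager, which this model
deliberately leaves out.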

The use case is a single-process, multi-threaded app reading a file with many
nested structures, each level sending chunks to the higher level. I wanted to
use ply.yacc to do the parsing so I could easily alter the grammar, but it
cannot (with its default API) accept partial inputs, and I need that. So I
used queues & threads. I then switched to a simple queue implementation (no
count, no ack), and finally modified ply to work on partial inputs, with a
great speed improvement, even over the "simpler queue" version.
More here:
https://github.com/vpelletier/ITI1480A-linux/blob/master/iti1480a/parser.py

> There is a way to have a "history free" storage? obviously in the context
> of ZODB.

AFAIK, RelStorage supports this; I'm not sure about others.
Also, the mandatory loss of conflict resolution on history-free storages might
cause a performance regression. I believe periodically packing away the
history of reasonably old transactions (compared to processing duration) might
turn out to be better, and it is readily available with any ZODB back-end. If
run on a big enough ZODB (compared to available RAM and disk speed), it will
start causing problems, though.

Regards,
-- 
Vincent Pelletier
_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev