Re: Python ORM library for distributed mostly-read-only objects?
On Jun 23, 2014, at 12:26 AM, smur...@gmail.com wrote:

> On Sunday, June 22, 2014 3:49:53 PM UTC+2, Roy Smith wrote:
>> Can you give us some more quantitative idea of your requirements? How
>> many objects? How much total data is being stored? How many queries
>> per second, and what is the acceptable latency for a query?
>
> Not yet; a whole lot; more than fits in memory; that depends.
>
> To explain: the data is a network of diverse related objects. I can
> keep the most-used objects in memory, but not all of them. Indeed, I
> _need_ to keep them, otherwise this will be too slow, even when using
> Mongo instead of SQLAlchemy. Which objects are most-used changes over
> time.

Are you sure it won't fit in memory? Default server memory configs these
days tend to start at 128 Gig, and scale to 256 or 384 Gig.

-Bill

> I could work with MongoEngine by judicious hacking (augment
> DocumentField dereferencing with a local cache), but that leaves the
> update problem.

-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Python ORM library for distributed mostly-read-only objects?
In article mailman.11202.1403534666.18130.python-l...@python.org,
William Ray Wing w...@mac.com wrote:

> Are you sure it won't fit in memory? Default server memory configs
> these days tend to start at 128 Gig, and scale to 256 or 384 Gig.

I'm not sure what "default" means here, but it's certainly possible to
get machines with that much RAM. On the other hand, even the amount of
RAM on a single machine is not really a limit. There are very
easy-to-use technologies these days (e.g. memcache) which let you build
clusters to effectively aggregate the physical RAM from multiple
machines. And database sharding lets you do a different flavor of
memory aggregation.
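The sharding flavor of aggregation can be sketched in pure Python: a
consistent-hash ring maps each object key to one of several cache nodes,
so the nodes' combined RAM behaves like one large cache. This is a toy
illustration with invented node addresses, not a real memcache client:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: deterministically maps keys to
    cache nodes so their combined RAM acts as a single large cache."""

    def __init__(self, nodes, replicas=64):
        # Place several virtual points per node to even out the load.
        self._points = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._points]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns it.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._points)
        return self._points[idx][1]

# Hypothetical cache-node addresses, purely for illustration.
nodes = ["cache1:11211", "cache2:11211", "cache3:11211"]
ring = HashRing(nodes)
owner = ring.node_for("object:42")
```

Because the mapping is deterministic, every worker that hashes the same
key asks the same node, so each object lives in exactly one node's RAM.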
Re: Python ORM library for distributed mostly-read-only objects?
On 22/06/14 10:46, smur...@gmail.com wrote:

> I've been doing this with a classic session-based SQLAlchemy ORM
> approach, but that ends up way too slow and memory intense, as each
> thread gets its own copy of every object it needs. I don't want that.

If you don't want each thread to have its own copy of the object, don't
use a thread-scoped session. Use an explicit scope instead.
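A minimal sketch of the explicit-scope suggestion, assuming SQLAlchemy
1.4+ is installed; the `Node` model and all names are invented for
illustration. Instead of `scoped_session` silently handing every thread
its own session (and thus its own object copies), one `Session` is
created with an explicit lifetime and passed to whichever worker needs
it:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Node(Base):
    # Illustrative table; not from the original poster's schema.
    __tablename__ = "nodes"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# Explicit scope: the session's lifetime is the `with` block, not a
# thread-local hidden behind scoped_session().
with Session(engine) as session:
    session.add(Node(name="root"))
    session.commit()
    loaded_name = session.query(Node).filter_by(name="root").one().name
```

The caller decides when the session begins and ends; sharing it across
threads then requires external synchronization, as discussed below in
the thread.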
Re: Python ORM library for distributed mostly-read-only objects?
Hi,

William Ray Wing:
> Are you sure it won't fit in memory? Default server memory configs
> these days tend to start at 128 Gig, and scale to 256 or 384 Gig.

I am not going to buy a new server; for that kind of money I can justify
writing a lot of custom code. Besides, the time to actually load all the
data into memory beforehand would be prohibitive (so I'd still need a
way to load referred-to data on demand), and the update problem remains.

-- 
Matthias Urlichs
Re: Python ORM library for distributed mostly-read-only objects?
memcache (or redis or ...) would be an option. However, I'm not going to
go through the network plus deserialization for every object access;
that'd be too slow. Thus I'd still need a local cache, which needs to be
invalidated.
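The pattern described here, a per-process cache in front of the shared
store that is emptied by change notices, can be sketched generically.
The backing store and key names below are invented; in practice the
notice would arrive over something like a redis pub/sub channel:

```python
class InvalidatingCache:
    """Per-process object cache in front of a shared store.  On an
    invalidation notice the entry is dropped, so the next access
    re-fetches the current version over the network."""

    def __init__(self, fetch):
        self._fetch = fetch   # loads one object from the shared store
        self._local = {}      # key -> cached object

    def get(self, key):
        if key not in self._local:       # miss: one network round-trip
            self._local[key] = self._fetch(key)
        return self._local[key]          # hit: no network, no deserialize

    def invalidate(self, key):
        # Called when a change notice for `key` arrives.
        self._local.pop(key, None)

# Illustrative in-process dict standing in for Mongo/memcache/redis.
store = {"obj:1": "v1"}
cache = InvalidatingCache(store.__getitem__)

first = cache.get("obj:1")   # fetched from the store
store["obj:1"] = "v2"        # someone else updates the shared store
stale = cache.get("obj:1")   # still the locally cached old value
cache.invalidate("obj:1")    # change notice arrives
fresh = cache.get("obj:1")   # re-fetched: new value
```

Frequently read objects cost nothing after the first fetch, and a write
only costs one dropped cache entry per interested process.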
Re: Python ORM library for distributed mostly-read-only objects?
On Monday, June 23, 2014 5:54:38 PM UTC+2, Lie Ryan wrote:

> If you don't want each thread to have its own copy of the object,
> don't use a thread-scoped session. Use an explicit scope instead.

How would that work when multiple threads traverse the in-memory object
structure and cause relationships to be loaded? IIRC SQLAlchemy's
sessions are not thread-safe.
Re: Python ORM library for distributed mostly-read-only objects?
On 23/06/14 19:05, smur...@gmail.com wrote:

> How would that work when multiple threads traverse the in-memory
> object structure and cause relationships to be loaded? IIRC
> SQLAlchemy's sessions are not thread-safe.

You're going to have that problem anyway. If, as you said, your problem
is that you don't want each thread to have its own copy, then you cannot
avoid having to deal with concurrent access. Note that SQLAlchemy
objects can be used from multiple threads as long as they are not used
concurrently and the underlying DBAPI is thread-safe (not every DBAPI
supported by SQLAlchemy is thread-safe).

You can detach/expunge an SQLAlchemy object from the session to avoid
unexpected loading of relationships.

Alternatively, if you are tied to neither SQLAlchemy nor an SQL-based
database, then you might want to check out ZODB's ZEO
(http://www.zodb.org/en/latest/documentation/guide/zeo.html):

    ZEO, Zope Enterprise Objects, extends the ZODB machinery to provide
    access to objects over a network. ... ClientStorage aggressively
    caches objects locally, so in order to avoid using stale data the
    ZEO server sends an invalidation message to all the connected
    ClientStorage instances on every write operation. ... As a result,
    reads from the database are far more frequent than writes, and ZEO
    is therefore better suited for read-intensive applications.

Warning: I have never used ZODB or ZEO personally.
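The expunge suggestion can be sketched as follows, assuming SQLAlchemy
1.4+ and a toy parent/child model invented for illustration. Once an
object is expunged, touching a not-yet-loaded relationship raises
`DetachedInstanceError` instead of silently firing a lazy load through
the session from another thread:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child")   # lazy-loaded by default

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(Parent(children=[Child()]))
session.commit()

p = session.query(Parent).first()
session.expunge(p)     # detach: no further lazy loads can fire
try:
    _ = p.children     # relationship was never loaded before detaching
    detached_error = False
except Exception:      # sqlalchemy.orm.exc.DetachedInstanceError
    detached_error = True
```

Failing loudly like this is arguably preferable to an unexpected query
sneaking through a session that another thread is using concurrently.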
Re: Python ORM library for distributed mostly-read-only objects?
In article 85659fdd-511b-4aea-9c4b-17a4bbb88...@googlegroups.com,
smur...@gmail.com wrote:

> My problem: I have a large database of interconnected objects which I
> need to process with a combination of short- and long-lived workers.
> These objects are mostly read-only (i.e. any of them can be
> changed/marked-as-deleted, but that happens infrequently). The workers
> may or may not be within one Python process, or even on one system.
>
> I've been doing this with a classic session-based SQLAlchemy ORM
> approach, but that ends up way too slow and memory intense, as each
> thread gets its own copy of every object it needs. I don't want that.
> My existing code does object loading and traversal by simple attribute
> access; I'd like to keep that if at all possible.
>
> Ideally, what I'd like to have is an object server which mediates
> write access to the database and then sends change/invalidation
> notices to the workers. (Changes are infrequent enough that I don't
> care if a worker gets a notice it's not interested in.) I don't care
> if updates are applied immediately or are only visible to the local
> process until committed. I also don't need fancy indexing or query
> abilities; if necessary I can go to the storage backend for that.
> (That should be SQL, though a NoSQL back-end would be nice to have.)
>
> Does something like this already exist, somewhere out there, or do I
> need to write this, or does somebody know of an alternate solution?

If you want to go NoSQL, I think what you're describing is a MongoDB
replica set (http://docs.mongodb.org/manual/replication/). One of the
replicas is the primary, to which all writes are directed. You can have
some number of secondaries, which get all the changes applied to the
primary and spread out the load for read access. If you want a vaguely
SQLAlchemy-flavored ORM, there's mongoengine (http://mongoengine.org/).

On the other hand, this may be overkill for what you're trying to do.
Can you give us some more quantitative idea of your requirements? How
many objects? How much total data is being stored? How many queries per
second, and what is the acceptable latency for a query?
Re: Python ORM library for distributed mostly-read-only objects?
On Sunday, June 22, 2014 3:49:53 PM UTC+2, Roy Smith wrote:

> Can you give us some more quantitative idea of your requirements? How
> many objects? How much total data is being stored? How many queries
> per second, and what is the acceptable latency for a query?

Not yet; a whole lot; more than fits in memory; that depends.

To explain: the data is a network of diverse related objects. I can keep
the most-used objects in memory, but not all of them. Indeed, I _need_
to keep them, otherwise this will be too slow, even when using Mongo
instead of SQLAlchemy. Which objects are most-used changes over time.

I could work with MongoEngine by judicious hacking (augment
DocumentField dereferencing with a local cache), but that leaves the
update problem.