Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread William Ray Wing
On Jun 23, 2014, at 12:26 AM, smur...@gmail.com wrote:

 On Sunday, June 22, 2014 3:49:53 PM UTC+2, Roy Smith wrote:
 
 Can you give us some more quantitative idea of your requirements?  How 
 many objects?  How much total data is being stored?  How many queries 
 per second, and what is the acceptable latency for a query?
 
 Not yet, A whole lot, More than fits in memory, That depends.
 
 To explain. The data is a network of diverse related objects. I can keep the 
 most-used objects in memory but not all of them. Indeed, I _need_ to keep 
 them, otherwise this will be too slow, even when using Mongo instead of 
 SQLAlchemy. Which objects are most-used changes over time.
 

Are you sure it won’t fit in memory?  Default server memory configs these days 
tend to start at 128 Gig, and scale to 256 or 384 Gig.

-Bill


 I could work with MongoEngine by judicious hacking (augment DocumentField 
 dereferencing with a local cache), but that leaves the update problem.
 -- 
 https://mail.python.org/mailman/listinfo/python-list



Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread Roy Smith
In article mailman.11202.1403534666.18130.python-l...@python.org,
 William Ray Wing w...@mac.com wrote:

 On Jun 23, 2014, at 12:26 AM, smur...@gmail.com wrote:
 
  On Sunday, June 22, 2014 3:49:53 PM UTC+2, Roy Smith wrote:
  
  Can you give us some more quantitative idea of your requirements?  How 
  many objects?  How much total data is being stored?  How many queries 
  per second, and what is the acceptable latency for a query?
  
  Not yet, A whole lot, More than fits in memory, That depends.
  
  To explain. The data is a network of diverse related objects. I can keep 
  the most-used objects in memory but not all of them. Indeed, I _need_ to 
  keep them, otherwise this will be too slow, even when using Mongo instead 
  of SQLAlchemy. Which objects are most-used changes over time.
  
 
 Are you sure it won't fit in memory?  Default server memory configs these 
 days tend to start at 128 Gig, and scale to 256 or 384 Gig.

I'm not sure what "default" means here, but it's certainly possible to get 
machines with that much RAM.  On the other hand, even the amount of RAM 
on a single machine is not really a limit.  There are very easy-to-use 
technologies these days (e.g., memcached) which let you build clusters that 
effectively aggregate the physical RAM of multiple machines.  And 
database sharding lets you do a different flavor of memory aggregation.
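The aggregation idea can be sketched without any particular client library: hash each key to one node, so every client agrees on where a key lives and the cluster's combined memory acts as one cache. A minimal sketch (the node names are made up for illustration; real memcached clients do this key-to-server hashing for you):

```python
import hashlib

# Hypothetical cache nodes; in practice these would be memcached hosts.
NODES = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]

def node_for(key: str, nodes=NODES) -> str:
    """Map a key to one cache node by hashing it, so each node holds a
    disjoint slice of the working set and their RAM aggregates."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Every client computes the same mapping, so a given key is always
# looked up on the same node.
assert node_for("user:42") == node_for("user:42")
```

(Production setups use consistent hashing instead of a plain modulus, so that adding or removing a node only remaps a fraction of the keys.)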


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread Lie Ryan

On 22/06/14 10:46, smur...@gmail.com wrote:


I've been doing this with a classic session-based SQLAlchemy ORM approach, 
but that ends up way too slow and memory-intensive, as each thread gets its own copy of 
every object it needs. I don't want that.


If you don't want each thread to have its own copy of the object, 
don't use a thread-scoped session. Use an explicit scope instead.
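The explicit scope suggested here is usually written as a context manager that creates, commits, and closes one session per unit of work, instead of pulling a session out of a thread-local registry. A sketch of the pattern using a stand-in Session class (not the real SQLAlchemy API; with SQLAlchemy you would pass a sessionmaker as the factory):

```python
from contextlib import contextmanager

class Session:
    """Stand-in for a database session; only the methods the pattern needs."""
    closed = False
    def commit(self):   pass
    def rollback(self): pass
    def close(self):    self.closed = True

@contextmanager
def session_scope(factory=Session):
    """One session per unit of work, shared by whatever code runs inside
    the block -- no thread-local magic, and the caller decides the scope."""
    session = factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()   # undo the unit of work on any error
        raise
    finally:
        session.close()      # always released, success or failure

with session_scope() as s:
    pass  # load objects, traverse relationships, etc.
```

The point is that the lifetime of the session is visible at the call site, rather than implicitly tied to whichever thread happens to be running.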




Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread Matthias Urlichs
Hi,

William Ray Wing:
 Are you sure it won’t fit in memory?  Default server memory configs these 
 days tend to start at 128 Gig, and scale to 256 or 384 Gig.
 
I am not going to buy a new server. For that kind of money I can justify
writing a lot of custom code.

Besides, the time to actually load all the data into memory beforehand
would be prohibitive (so I'd still need a way to load referred data on
demand), and the update problem remains.

-- 
-- Matthias Urlichs


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread smurfix
memcached (or Redis or ...) would be an option. However, I'm not going to go 
through the network plus deserialization for every object; that would be too slow. 
So I'd still need a local cache - which needs to be invalidated.
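The cache-plus-invalidation shape being described can be sketched independently of memcached or Mongo: each worker keeps a local dict and drops an entry when an invalidation notice for its key arrives. (The transport for the notices - a message queue, a socket from the object server - is deliberately left open; the names below are illustrative only.)

```python
class LocalCache:
    """Per-process object cache; entries are evicted when an
    invalidation notice for their key arrives."""
    def __init__(self, loader):
        self._store = {}
        self._loader = loader   # fetches from the backend on a miss

    def get(self, key):
        if key not in self._store:            # miss: one backend round-trip
            self._store[key] = self._loader(key)
        return self._store[key]               # hit: no network, no deserialization

    def invalidate(self, key):
        """Called when a change/invalidation notice for `key` arrives."""
        self._store.pop(key, None)

loads = []
cache = LocalCache(loader=lambda k: loads.append(k) or f"obj-{k}")
cache.get("a"); cache.get("a")   # second call is served locally
cache.invalidate("a")
cache.get("a")                   # reloaded from the backend after invalidation
```

With infrequent writes, spurious notices are cheap: the worst case is one extra reload for a key the worker wasn't using anyway.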


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread smurfix
On Monday, June 23, 2014 5:54:38 PM UTC+2, Lie Ryan wrote:

 If you don't want each thread to have its own copy of the object, 
 don't use a thread-scoped session. Use an explicit scope instead.

How would that work when multiple threads traverse the in-memory object 
structure and cause relationships to be loaded?

IIRC SQLAlchemy's sessions are not thread-safe.


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread Lie Ryan

On 23/06/14 19:05, smur...@gmail.com wrote:

On Monday, June 23, 2014 5:54:38 PM UTC+2, Lie Ryan wrote:


If you don't want each thread to have its own copy of the object,
don't use a thread-scoped session. Use an explicit scope instead.


How would that work when multiple threads traverse the in-memory object 
structure and cause relationships to be loaded?
IIRC SQLAlchemy's sessions are not thread-safe.


You're going to have that problem anyway: if, as you said, you don't want 
each thread to have its own copy, then you cannot avoid dealing with 
concurrent access. Note that SQLAlchemy objects can be used from multiple 
threads as long as they are not used concurrently and the underlying DBAPI 
is thread-safe (not all DBAPIs supported by SQLAlchemy are). You can 
detach/expunge an SQLAlchemy object from its session to avoid unexpected 
lazy loading of relationships.


Alternatively, if you are not tied to SQLAlchemy nor SQL-based database, 
then you might want to check out ZODB's ZEO 
(http://www.zodb.org/en/latest/documentation/guide/zeo.html):


 ZEO, Zope Enterprise Objects, extends the ZODB machinery to
 provide access to objects over a network. ... ClientStorage
 aggressively caches objects locally, so in order to avoid
 using stale data the ZEO server sends an invalidation message
 to all the connected ClientStorage instances on every write
 operation. ...  As a result, reads from the database are
 far more frequent than writes, and ZEO is therefore better
 suited for read-intensive applications.

Warning: I have never used ZODB or ZEO personally.



Re: Python ORM library for distributed mostly-read-only objects?

2014-06-22 Thread Roy Smith
In article 85659fdd-511b-4aea-9c4b-17a4bbb88...@googlegroups.com,
 smur...@gmail.com wrote:

 My problem: I have a large database of interconnected objects which I need to 
 process with a combination of short- and long-lived workers. These objects 
 are mostly read-only (i.e. any of them can be changed/marked-as-deleted, but 
 that happens infrequently). The workers may or may not be within one Python 
 process, or even on one system.
 
 I've been doing this with a classic session-based SQLAlchemy ORM approach, 
 but that ends up way too slow and memory-intensive, as each thread gets its own 
 copy of every object it needs. I don't want that.
 
 My existing code does object loading and traversal by simple attribute 
 access; I'd like to keep that if at all possible.
 
 Ideally, what I'd like to have is an object server which mediates write 
 access to the database and then sends change/invalidation notices to the 
 workers. (Changes are infrequent enough that I don't care if a worker gets a 
 notice it's not interested in.)
 
 I don't care if updates are applied immediately or are only visible to the 
 local process until committed. I also don't need fancy indexing or query 
 abilities; if necessary I can go to the storage backend for that. (That 
 should be SQL, though a NoSQL back-end would be nice to have.)
 
 Does something like this already exist, somewhere out there, or do I need to 
 write this, or does somebody know of an alternate solution?

If you want to go NoSQL, I think what you're describing is a MongoDB 
replica set (http://docs.mongodb.org/manual/replication/).  One of the 
replicas is the primary, to which all writes are directed.  You can have 
some number of secondaries, which get all the changes applied to the 
primary, and spread out the load for read access.  If you want a vaguely 
SQLAlchemy-flavored ORM, there's MongoEngine (http://mongoengine.org/).

On the other hand, this may be overkill for what you're trying to do.  
Can you give us some more quantitative idea of your requirements?  How 
many objects?  How much total data is being stored?  How many queries 
per second, and what is the acceptable latency for a query?


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-22 Thread smurfix
On Sunday, June 22, 2014 3:49:53 PM UTC+2, Roy Smith wrote:

 Can you give us some more quantitative idea of your requirements?  How 
 many objects?  How much total data is being stored?  How many queries 
 per second, and what is the acceptable latency for a query?

Not yet, A whole lot, More than fits in memory, That depends.

To explain. The data is a network of diverse related objects. I can keep the 
most-used objects in memory but not all of them. Indeed, I _need_ to keep them, 
otherwise this will be too slow, even when using Mongo instead of SQLAlchemy. 
Which objects are most-used changes over time.

I could work with MongoEngine by judicious hacking (augment DocumentField 
dereferencing with a local cache), but that leaves the update problem.