On Sep 3, 2007, at 9:01 AM, Gustavo Niemeyer wrote:

> Right. In Storm this won't be an issue. When objects get dirty they are
> added to a dictionary which will strong-reference them so that they are
> kept in memory at least up to the next flush/rollback. They continue to
> be in the weakref'd cache even then, and only leave that one when they die.
I'll tell you why we currently don't have a strong-referencing "dirty" list: it's because our session detects "dirty" changes at flush time. While most "dirty" objects are detected in the session using a regular "dirty" flag that was set when an attribute changed (this part could be replaced with a strong-referencing list instead), there are some which are detected by comparing the values of their attributes to what was loaded from the database. This approach was copied from Hibernate and supports "mutable" attribute types, such as a mapped attribute that points to another object which is pickled into a binary database column. If someone changes an attribute on the non-mapped, "pickled" object, that change needs to be detected as well, and the only way to do that is to compare against what was loaded. We only do the comparison operation on datatypes that are known to be "mutable". So even if we do reinstate the strong dirty list and the weakref'd identity map, that case would still remain as a caveat.

>> While we might someday reinstate a strongly-referenced "dirty"
>> collection, the basic idea of a strongly referenced identity map is
>> generally not a problem for our users; the use case where someone is
>> looping through many objects and throwing away as they iterate is
>> pretty rare and those folks either expunge the objects explicitly or
>> use the "weak referencing" option on their session.
>
> I see.. but then do they have to make sure by themselves that the
> object doesn't die before it gets flushed?

Well, I think the "weak referencing" option is probably not widely used; people just know to expunge/clear objects from the session which they don't need. We went with Hibernate's example in this area as "not that big a deal".

> Did you consider the creation of a more flexible caching system, and
> if so, can you tell me why you gave up?
> (maybe there's something we can learn from that)

We never "gave up"; as far as "caching" goes, we've never "begun" that. I don't really consider the Session's identity map to be much of a "cache". While we do use it as a cache in cases where we need to locate an object by primary key (such as lazy-loading a many-to-one attribute), I would consider a "more flexible" cache to be a second-level cache which is a distinct plugin to the whole system, configurable with things like cache size, expiration time, expire event handlers, and maybe even some form of query caching. When you really do "caching", people need fine-grained control over the lifespan of objects, which is something I know from all the caching work we did with Myghty and now Pylons. So we don't try to turn the Session into the full "caching" solution; its "cache" is primarily there to maintain identity uniqueness (and we say as much in our docs). Someday, we might tackle a real second-level solution that integrates nicely. Currently, people who need this tend to roll their own, or move the caching into a coarser-grained area (which often is the better place for it), such as page caching or "sub-template" caching, which is something Mako/Pylons supports.

> I'm actually a bit surprised that people don't seem to bother with
> the strong references for the duration of the transaction.
> In Landscape, for instance, we have web pages which show up thousands
> of objects, and there isn't a good reason to keep the object in
> memory after it has been displayed.
Our ORM's system of loading objects for a particular query still needs to store the full results of that query in a single in-memory collection. Since we support queries which add left outer joins of additional objects to be loaded as part of a collection, we can't just load a row, create an instance for it, and then throw it away; the next row might also represent the same instance, which needs to be "uniqued" against the total result set (i.e., we have a mini "identity map" that exists for a single ORM query). Not only that, but the eager loading of collections also means the same object, in rare circumstances, can be represented at different levels in the same result; object A might reference B, and also might reference C which *also* references B. While this is another area where I've proposed we could add options to not maintain a local "uniqued" set of instances for a query which doesn't need it, and just allow "streaming" of ORM'ed objects, it hasn't been needed, and I think folks who display thousands of rows tend to just use non-ORM result sets, which of course don't have any of these requirements. Though as it turns out, DBAPIs like psycopg2 already buffer all the rows of a result set by default, so there's a lot more "load it all into memory" going on than people might think anyway.

More commonly, people who are representing thousands of objects will only be displaying a subset of those on a single page, and only need to load a range of objects; our "eager loading" does support the usage of LIMIT and OFFSET in such a way that you limit the "primary" entities but still get the full list of "collection entities" associated with them. This is another area where we've looked at Hibernate, seen that there's no problem with their "non-streamed" approach, and decided that for now it's "good enough", with the door open to improve upon it if needed.

-- 
storm mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/storm
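The per-query uniquing described above can be sketched roughly as follows. This is only an illustration with invented names and rows, not SQLAlchemy's actual implementation: a joined eager load returns one row per (parent, child) pair, so the same parent primary key can appear on several rows and must be collapsed into a single instance with a merged collection.

```python
class User:
    def __init__(self, pk, name):
        self.pk, self.name, self.addresses = pk, name, []

# rows as they might come back from something like:
#   SELECT ... FROM users LEFT OUTER JOIN addresses ...
rows = [
    (1, "ed", "[email protected]"),
    (1, "ed", "[email protected]"),   # same user, second address
    (2, "wendy", "[email protected]"),
]

identity = {}   # primary key -> instance, scoped to this one result set
result = []     # ordered, uniqued list of primary entities
for pk, name, email in rows:
    user = identity.get(pk)
    if user is None:
        # first time this primary key appears: create the instance
        user = identity[pk] = User(pk, name)
        result.append(user)
    if email is not None:
        # every row contributes to the eager-loaded collection
        user.addresses.append(email)
```

Each row either creates a new primary instance or merges into one already seen, which is why the full result has to be assembled before it can be handed back rather than streamed row-by-row.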

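The flush-time comparison for "mutable" types described earlier in this post can also be sketched in a few lines. All the names here are hypothetical, not actual SQLAlchemy internals; the point is that an in-place mutation of a pickled value never passes through any attribute-set hook, so the only way to catch it is to compare against a snapshot taken at load time.

```python
import copy

class Session:
    def __init__(self):
        self._loaded = {}   # id(obj) -> deep copy of value as loaded from DB
        self._dirty = set() # objects whose attributes were reassigned

    def load(self, obj, value):
        obj.data = value
        # snapshot the value as it came from the database
        self._loaded[id(obj)] = copy.deepcopy(value)

    def set_attribute(self, obj, value):
        # instrumentation can catch a plain reassignment like obj.data = x...
        obj.data = value
        self._dirty.add(id(obj))

    def is_dirty(self, obj):
        # ...but an in-place mutation like obj.data["x"] = 1 bypasses
        # set_attribute entirely; it is noticed only by the comparison.
        return id(obj) in self._dirty or obj.data != self._loaded[id(obj)]

class Thing:
    data = None

sess = Session()
t = Thing()
sess.load(t, {"x": 1})      # simulates loading a pickled column value
assert not sess.is_dirty(t) # unchanged -> not dirty
t.data["x"] = 2             # in-place mutation; no setattr hook fires
assert sess.is_dirty(t)     # caught only by the flush-time comparison
```

The comparison is only worth paying for on types known to be mutable, which is exactly the caveat described above: a strong-referencing dirty list alone would miss these changes.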