On 2009-10-16 12.25, Niclas Hedhman wrote:
Isn't this the equivalent of "Loading Policy" in JDO/JPA ??
Pretty much, yes, I think so. Except aren't those defined at the structural level rather than the use-case level? I.e. regardless of use-case, "if you load property1, always load property2 and property3 at the same time".
I think this is a "per EntityStore type issue" more than it is a client code issue, although the client code will have a huge influence on which loading strategy is the best for a given EntityStore.
Agree.
So, I tend to lean towards introducing "Loading Policy" in the MapES first, and perhaps even my favorite approach of 'self learning' against a use-case, i.e. the ES will internally keep track of which properties (and possibly associations) a particular use-case uses, and pre-load those upon a request.
That's an interesting idea, and it's probably the easiest to use, with the best results. So basically the client developer doesn't have to care.
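To make the 'self learning' idea concrete, here is a minimal sketch in Java. It is not Qi4j API: `LearnedLoadingPolicy`, `recordAccess` and `propertiesToPreload` are hypothetical names, and it assumes a use-case can be identified by a plain string. The store would call `recordAccess` whenever a property is actually read, and ask `propertiesToPreload` before fetching state for the same use-case again.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Qi4j: remembers which properties each use-case has read.
final class LearnedLoadingPolicy {
    // use-case name -> names of the properties read so far under that use-case
    private final Map<String, Set<String>> usedByUseCase = new ConcurrentHashMap<>();

    // Called by the store whenever a property is actually read within a use-case.
    void recordAccess(String useCase, String propertyName) {
        usedByUseCase
            .computeIfAbsent(useCase, k -> ConcurrentHashMap.newKeySet())
            .add(propertyName);
    }

    // Properties worth pre-loading the next time this use-case requests an entity.
    // Returns an empty set for a use-case that has not been observed yet.
    Set<String> propertiesToPreload(String useCase) {
        Set<String> seen = usedByUseCase.get(useCase);
        return seen == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(seen);
    }
}
```

This sketch only ever grows the learned set. A real policy would probably also need to forget properties that a use-case has stopped reading, which is left out here.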
Another issue that I can see is that of performance in, for instance, JDBM... I would assume that making many key lookups is relatively expensive, and we would need to look at the "Lookup/Size speed ratio", i.e. how many extra bytes in a single-lookup blob cost as much as one additional key lookup?
I think there are two issues: one is the expense per lookup, and one is that with more key->value mappings the database is going to be bigger, so will consume more disk space. The indices will also be bigger and thus slower.
What would be interesting is to have the store work on two levels: one is the identity lookup, and then the next level would be property(/association) lookup. If one index can have only identity lookup and another has the per-property lookup, it should be much faster.
And that needs to be measured for each particular MapES implementation, as the values for JDBM would be dramatically different from those for a network-based store. But, you probably realize that reading the Blob is one step and creating the Property instances is another with its own overhead, potentially very substantial, and here it is probably a fixed time per instance, i.e. per first use --> create.
Right, so one thing we can do *today* is to change it so that the blob read does not do the property instantiation eagerly. There's no need to, as far as I can tell. That on its own will make a huge difference.
Some of my entities right now have like 10-15 interfaces already, each with its own state (typically 1-5 properties), and having to load and instantiate all of that just to read one property seems like a massive performance hit. With lazy-instantiation of properties a big part of the problem goes away.
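As a rough illustration of lazy instantiation, here is a sketch in Java under some assumptions: the blob has already been decoded into a map of raw values, and `PropertyInstance` and `LazyEntityState` are hypothetical stand-ins, not the Qi4j `Property` or entity state types. A property instance is created only when it is first asked for, and is reused afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a property instance; not the Qi4j Property interface.
final class PropertyInstance {
    private final String name;
    private final Object value;

    PropertyInstance(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    String name() { return name; }
    Object get() { return value; }
}

// Holds the raw values decoded from the blob and creates property instances on first use.
// Not thread-safe; assumes one unit of work accesses the state at a time.
final class LazyEntityState {
    private final Map<String, Object> rawValues;
    private final Map<String, PropertyInstance> instantiated = new HashMap<>();

    LazyEntityState(Map<String, Object> rawValues) {
        this.rawValues = new HashMap<>(rawValues); // defensive copy of the decoded blob
    }

    // Creates the instance on the first call for a name and returns the cached one afterwards.
    PropertyInstance property(String name) {
        if (!rawValues.containsKey(name)) {
            throw new IllegalArgumentException("Unknown property: " + name);
        }
        return instantiated.computeIfAbsent(name, n -> new PropertyInstance(n, rawValues.get(n)));
    }

    // Number of property instances created so far (for measuring the effect).
    int instantiatedCount() { return instantiated.size(); }
}
```

With an entity holding, say, 30 raw values, reading one property through `property(...)` creates one instance rather than 30. The blob itself is still read in full, so this addresses only the instantiation cost, not the lookup cost.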
The remaining question is whether having only an id->blob mapping like today is fine, or whether introducing extra fragmentation would be useful. One obvious issue is how to keep it all in sync. With the two-level solution it should be fine, since the first lookup should get {id,timestamp,version,app_version} and all such metadata about the entity as a whole, whereas the second level lookup would access the actual data. But I agree, that needs to be benchmarked.
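The two-level layout could look roughly like the following Java sketch. Everything here is an assumption for illustration: `EntityMetadata` and `TwoLevelStore` are hypothetical names, both indices are plain in-memory maps rather than JDBM or any real MapES backend, and the composite key assumes identities do not contain the '/' character. The first level maps identity to the entity-wide metadata, the second maps identity plus property name to the value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity-wide metadata, mirroring {id,timestamp,version,app_version}.
final class EntityMetadata {
    final String id;
    final long timestamp;
    final long version;
    final String appVersion;

    EntityMetadata(String id, long timestamp, long version, String appVersion) {
        this.id = id;
        this.timestamp = timestamp;
        this.version = version;
        this.appVersion = appVersion;
    }
}

// Sketch of a two-level store: an identity index plus a per-property index.
final class TwoLevelStore {
    private final Map<String, EntityMetadata> identityIndex = new HashMap<>();
    private final Map<String, Object> propertyIndex = new HashMap<>();

    // Assumption: ids never contain '/', so the composite key is unambiguous.
    private static String key(String id, String property) {
        return id + "/" + property;
    }

    // Writes the changed properties, then bumps the entity version in the identity index.
    void write(String id, String appVersion, long timestamp, Map<String, Object> changed) {
        EntityMetadata old = identityIndex.get(id);
        long nextVersion = (old == null) ? 1 : old.version + 1;
        for (Map.Entry<String, Object> e : changed.entrySet()) {
            propertyIndex.put(key(id, e.getKey()), e.getValue());
        }
        identityIndex.put(id, new EntityMetadata(id, timestamp, nextVersion, appVersion));
    }

    // First-level lookup: metadata about the entity as a whole, or null if unknown.
    EntityMetadata metadata(String id) {
        return identityIndex.get(id);
    }

    // Second-level lookup: a single property value, or null if it was never written.
    Object property(String id, String property) {
        return propertyIndex.get(key(id, property));
    }
}
```

Keeping the version in the identity index means there is a single place to check for concurrent modification. In a real store the two writes would also have to happen in one transaction to stay in sync, which this sketch does not attempt.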
I think it is a complex area, with room for significant speed improvements in large entities, but I also think that the answers will surprise us once we start measuring.
Any hints on what the surprise might be?
We need a Performance Expert :-) who can dedicate himself/herself to chasing performance in ES and Indexing/Query.
That would be very nice, yes.

/Rickard

