Re: [qi4j-dev] Entity persistence

Rickard Öberg Thu, 15 Oct 2009 23:25:50 -0700

On 2009-10-16 12.25, Niclas Hedhman wrote:

Isn't this the equivalent of "Loading Policy" in JDO/JPA ??

Pretty much, yes I think so. Except aren't those on structural rather than usecase level? I.e. regardless of usecase "if you load property1 always load property2, property3 at the same time".

I think this is a "per EntityStore type issue" more than it is a
client code issue, although the client code will have a huge influence
which loading strategy is the best for a given EntityStore.


Agree.

So, I tend to lean towards introducing "Loading Policy" in the MapES
first, and perhaps even my favorite approach of 'self learning'
against a use-case, i.e. the ES will internally keep track of which
properties (and possibly associations) that a particular use-case
uses, and pre-load those upon a request.

That's an interesting idea, and it's probably the most easy to use and with best results. So basically the client developer doesn't have to care.
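
A minimal sketch of how such a 'self learning' policy could look, assuming use-cases are identified by name and properties by string name. `LearnedLoadingPolicy` and its methods are hypothetical names for illustration, not existing Qi4j API:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Qi4j: remembers which properties each
// use-case has read, so that a store could pre-load them on the next request.
final class LearnedLoadingPolicy {
    // use-case name -> names of the properties that use-case has read so far
    private final Map<String, Set<String>> usedByUsecase = new ConcurrentHashMap<>();

    // The store would call this whenever a property is actually read.
    void recordAccess(String usecase, String propertyName) {
        usedByUsecase
            .computeIfAbsent(usecase, k -> ConcurrentHashMap.newKeySet())
            .add(propertyName);
    }

    // Properties worth pre-loading when the same use-case loads an entity again.
    // Returns an empty set for a use-case that has not been observed yet.
    Set<String> propertiesToPreload(String usecase) {
        Set<String> seen = usedByUsecase.get(usecase);
        return seen == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(seen);
    }
}
```

The store would consult `propertiesToPreload` before the first read in a use-case and fall back to its default strategy when nothing has been learned yet. Associations could presumably be tracked the same way.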

Another issue that I can see is that of performance in, for instance,
JDBM... I would assume that making many key lookups is relatively
expensive, and we would need to look at the "Lookup/Size speed ratio",
i.e. how many extra bytes in the single-lookup blob cost the same as
one additional key lookup?

I think there are two issues: one is the expense per lookup, and one is that with more key->value mappings the database is going to be bigger, so will consume more disk space. The indices will also be bigger and thus slower.
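
To make the "Lookup/Size speed ratio" concrete, here is a toy linear cost model. It assumes a read costs a fixed amount per key lookup plus a fixed amount per byte transferred, which is a simplification (index size, caching and disk layout are ignored). `LookupCostModel` is a hypothetical name, and the two cost parameters would have to come from measurements against the actual store:

```java
// Toy cost model: cost = lookups * costPerLookup + bytes * costPerByte.
// Both parameters are placeholders to be filled in from benchmarks.
final class LookupCostModel {
    private final double costPerLookupMicros;
    private final double costPerByteMicros;

    LookupCostModel(double costPerLookupMicros, double costPerByteMicros) {
        this.costPerLookupMicros = costPerLookupMicros;
        this.costPerByteMicros = costPerByteMicros;
    }

    // Estimated cost of a read that needs the given number of key lookups
    // and transfers the given number of bytes in total.
    double readCostMicros(int lookups, long totalBytes) {
        return lookups * costPerLookupMicros + totalBytes * costPerByteMicros;
    }

    // How many extra bytes a blob may carry before one additional key lookup
    // would have been cheaper than reading those bytes.
    double breakEvenBytesPerLookup() {
        return costPerLookupMicros / costPerByteMicros;
    }
}
```

Under this model, splitting an entity into more keys only pays off when the bytes it avoids reading exceed `breakEvenBytesPerLookup()` for each lookup it adds.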

What would be interesting is to have the store work on two levels: one is the identity lookup, and then the next level would be property(/association) lookup. If one index can have only identity lookup and another has the per-property lookup, it should be much faster.
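
A rough sketch of the two-level idea, using in-memory maps as a stand-in for the real indices. `TwoLevelStore`, its methods and the "identity/property" key format are assumptions made for illustration only:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a store with two separate indices.
final class TwoLevelStore {
    // Level 1: identities only, so this index stays small.
    private final Set<String> identityIndex = new HashSet<>();
    // Level 2: "identity/property" -> serialized value.
    // Assumes identities never contain '/', otherwise keys could collide.
    private final Map<String, String> propertyIndex = new HashMap<>();

    void put(String identity, String property, String value) {
        identityIndex.add(identity);
        propertyIndex.put(identity + "/" + property, value);
    }

    // First-level lookup: does the entity exist at all?
    boolean exists(String identity) {
        return identityIndex.contains(identity);
    }

    // Second-level lookup: fetches one property without touching the others.
    // Returns null if the entity or the property is unknown.
    String property(String identity, String property) {
        if (!exists(identity)) {
            return null;
        }
        return propertyIndex.get(identity + "/" + property);
    }
}
```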

And that is needed at the particular
MapES implementation, as the values for JDBM would be dramatically
different from a network based. But, you probably realize that reading
the Blob is one step and creating the Property instances is another
with its own overhead, potentially very substantial, and here it is
probably a fixed time per instance, i.e. per first use -->  create.

Right, so one thing we can do *today* is to change so that the blob read does not do the property instantiation eagerly. There's no need to as far as I can tell. That on its own will make a huge difference.

Some of my entities right now have like 10-15 interfaces already, each with its own state (typically 1-5 properties), and having to load and instantiate all of that just to read one property seems like a massive performance hit. With lazy-instantiation of properties a big part of the problem goes away.
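
Roughly what lazy instantiation could look like, assuming the blob has already been read and each property only holds on to a decoder for its own part of it. `LazyProperty` is a hypothetical wrapper for illustration, not the Qi4j `Property` type, and it is not thread-safe:

```java
import java.util.function.Supplier;

// Hypothetical wrapper: defers decoding of a property until first use.
final class LazyProperty<T> {
    private Supplier<T> decoder; // knows how to decode this property from the blob
    private T value;
    private boolean loaded;

    LazyProperty(Supplier<T> decoder) {
        this.decoder = decoder;
    }

    // Decodes on the first call only; later calls return the cached value.
    T get() {
        if (!loaded) {
            value = decoder.get();
            loaded = true;
            decoder = null; // release the decoder and whatever it references
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

With something like this, an entity with many properties pays the instantiation cost only for the ones a use-case actually reads.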

The remaining question is whether having only a id->blob mapping like today is fine, or whether introducing extra fragmentation would be useful. One obvious issue is how to keep it all in sync. With the two-level solution it should be fine, since the first lookup should get {id,timestamp,version,app_version} and all such metadata about the entity as a whole, whereas the second level lookup would access the actual data. But I agree, that needs to be benchmarked.
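
One possible way to keep the two levels in sync, sketched with in-memory maps: second-level keys include the entity version, and the header is replaced last, so a reader holding a header only ever sees data written for that version. This is an assumption about how it could be done, not a description of any existing EntityStore. `EntityHeader` and `VersionedEntityStore` are hypothetical names, every write rewrites all properties, and cleanup of old versions is left out:

```java
import java.util.HashMap;
import java.util.Map;

// First-level record: metadata about the entity as a whole.
final class EntityHeader {
    final String id;
    final long timestamp;
    final long version;
    final String appVersion;

    EntityHeader(String id, long timestamp, long version, String appVersion) {
        this.id = id;
        this.timestamp = timestamp;
        this.version = version;
        this.appVersion = appVersion;
    }
}

// Hypothetical store: headers on level one, version-qualified values on level two.
final class VersionedEntityStore {
    private final Map<String, EntityHeader> headers = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    private static String key(String id, long version, String property) {
        return id + "@" + version + "/" + property;
    }

    // Writes all properties under the next version, then swaps the header last.
    void write(String id, Map<String, String> properties, long timestamp, String appVersion) {
        EntityHeader old = headers.get(id);
        long next = (old == null) ? 1 : old.version + 1;
        for (Map.Entry<String, String> e : properties.entrySet()) {
            values.put(key(id, next, e.getKey()), e.getValue());
        }
        headers.put(id, new EntityHeader(id, timestamp, next, appVersion));
    }

    // First-level lookup; returns null for an unknown identity.
    EntityHeader header(String id) {
        return headers.get(id);
    }

    // Second-level lookup, tied to the version in the header the caller holds.
    String read(EntityHeader header, String property) {
        return values.get(key(header.id, header.version, property));
    }
}
```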

I think it is a complex area, with room for significant speed
improvements in large entities, but I also think that the answers will
surprise us once we start measuring.


Any hints on what the surprise might be?

We need a Performance Expert :-) who can dedicate himself/herself to
chasing performance in ES and Indexing/Query.


That would be very nice, yes.

/Rickard

