Loading policies based on usecases would be really awesome, and beyond cool!!

Maybe a compromise could be to separate persistence based on the mixins in the EntityComposite, so that the state for each mixin is stored separately? The data in a mixin will probably tend to be used together, at least in a good design. And it should cause far fewer fragments than storing each property separately.

I even think it makes sense conceptually to store each mixin as a single unit, since the basic atom in QI4J is the fragment. I am not sure whether any technical issues make it difficult or impossible.
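
To make the per-mixin idea a bit more concrete, here is a minimal sketch. It is only an illustration under assumptions: the class `MixinStateStoreSketch`, its methods, and the `identity#mixinType` key format are all hypothetical and are not part of the Qi4j EntityStore SPI.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one stored record per (entity, mixin) pair. Not Qi4j API. */
final class MixinStateStoreSketch {
    // Key is "identity#mixinType"; value is the state owned by that mixin.
    private final Map<String, Map<String, Object>> records = new HashMap<>();
    private int lookups = 0;

    private static String key(String identity, String mixinType) {
        return identity + "#" + mixinType;
    }

    /** Persists the state of a single mixin as one unit. */
    void store(String identity, String mixinType, Map<String, Object> state) {
        records.put(key(identity, mixinType), new HashMap<>(state));
    }

    /** Loads one mixin's state; this costs one lookup however many properties it holds. */
    Map<String, Object> load(String identity, String mixinType) {
        lookups++;
        Map<String, Object> state = records.get(key(identity, mixinType));
        if (state == null) {
            return new HashMap<>();
        }
        return new HashMap<>(state);
    }

    int lookups() {
        return lookups;
    }

    public static void main(String[] args) {
        MixinStateStoreSketch store = new MixinStateStoreSketch();
        Map<String, Object> name = new HashMap<>();
        name.put("firstName", "Ann");
        name.put("lastName", "Example");
        store.store("person-1", "HasName", name);

        Map<String, Object> loaded = store.load("person-1", "HasName");
        System.out.println(loaded.get("firstName") + " " + loaded.get("lastName")
                + " (lookups: " + store.lookups() + ")");
    }
}
```

Both properties of the hypothetical `HasName` mixin come back from a single lookup, which is the point of grouping by mixin rather than by property.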

Another point: the EntityStore might want to be called the first time a part of a composite is accessed, regardless of whether that part's state is already loaded. That would let it learn about usage patterns for the current usecase even when all state for the composite is eagerly fetched. Reducing the granularity to mixins instead of properties/associations would reduce the number of parts in a composite, and thus the number of callbacks, a lot!
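
A rough sketch of such a first-access callback, at mixin granularity, could look like the following. Everything here is an assumption for illustration: `AccessListener`, `AccessTracker` and `touch` are invented names, and passing an `alreadyLoaded` flag is just one possible way of telling the store whether the state was present.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a once-per-mixin access callback. Not Qi4j API. */
final class FirstAccessSketch {

    /** Assumed callback that a store could implement to observe usage. */
    interface AccessListener {
        void firstAccess(String identity, String mixinType, boolean alreadyLoaded);
    }

    /** Tracks which mixins have been touched, e.g. within one unit of work. */
    static final class AccessTracker {
        private final Set<String> touched = new HashSet<>();
        private final AccessListener listener;

        AccessTracker(AccessListener listener) {
            this.listener = listener;
        }

        /** Would be called by the entity instance before it hands out state from a mixin. */
        void touch(String identity, String mixinType, boolean alreadyLoaded) {
            // Set.add returns true only on first insertion, so the listener
            // hears about each (entity, mixin) pair at most once.
            if (touched.add(identity + "#" + mixinType)) {
                listener.firstAccess(identity, mixinType, alreadyLoaded);
            }
        }
    }

    public static void main(String[] args) {
        AccessTracker tracker = new AccessTracker((identity, mixinType, alreadyLoaded) ->
                System.out.println("first access: " + identity + " " + mixinType
                        + " loaded=" + alreadyLoaded));
        tracker.touch("person-1", "HasName", true);
        tracker.touch("person-1", "HasName", true);     // no second callback
        tracker.touch("person-1", "HasAddress", false);
    }
}
```

With 10-15 mixins per entity the store would see at most 10-15 callbacks per entity, instead of one per property or association.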

The drawback is, of course, that each association/property will now need to know which mixin it is a part of.

Any ideas on how to let the EntityStore know whether it is being called because someone wants to load an entity, or because another part of the composite is being accessed? And how will the EntityStore know whether the requested part of the composite is already loaded? Would it be possible to store that info in the EntityReference?

On a side note: what is the proper way to handle partially loaded entities when exiting a UnitOfWork? Should state not yet loaded be lazy-loaded on exit, or should the client be prepared for LazyLoadExceptions? If state is lazy-loaded on exit, that will probably be an incentive to use value objects.

/Kent

On 16/10/2009 at 08.25, Rickard Öberg wrote:

On 2009-10-16 12.25, Niclas Hedhman wrote:
Isn't this the equivalent of "Loading Policy" in JDO/JPA ??

Pretty much, yes, I think so. Except aren't those on a structural rather than a usecase level? I.e. regardless of usecase, "if you load property1 always load property2,property3 at the same time".

I think this is a "per EntityStore type issue" more than it is a
client code issue, although the client code will have a huge influence
which loading strategy is the best for a given EntityStore.

Agree.

So, I tend to lean towards introducing "Loading Policy" in the MapES
first, and perhaps even my favorite approach of 'self learning'
against a use-case, i.e. the ES will internally keep track of which
properties (and possibly associations) a particular use-case
uses, and pre-load those upon a request.

That's an interesting idea, and it's probably the easiest to use, with the best results. So basically the client developer doesn't have to care.
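
As a minimal sketch of what "self learning" could mean here, assuming the store is told which usecase is active: it records the properties each usecase touches and offers that set as the preload list next time. The class and method names are hypothetical, and a real policy would probably need thresholds or decay rather than remembering everything forever.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a loading policy learned per usecase. Not Qi4j API. */
final class LearnedLoadingPolicySketch {
    // usecase name -> property names observed so far, in first-seen order
    private final Map<String, Set<String>> seen = new HashMap<>();

    /** Records that the given usecase read the given property (or association). */
    void recordAccess(String usecase, String propertyName) {
        seen.computeIfAbsent(usecase, k -> new LinkedHashSet<>()).add(propertyName);
    }

    /** Returns what to preload for a usecase; empty if nothing has been learned yet. */
    Set<String> propertiesToPreload(String usecase) {
        Set<String> learned = seen.get(usecase);
        if (learned == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(learned);
    }

    public static void main(String[] args) {
        LearnedLoadingPolicySketch policy = new LearnedLoadingPolicySketch();
        policy.recordAccess("ListCustomers", "name");
        policy.recordAccess("ListCustomers", "city");
        policy.recordAccess("ListCustomers", "name"); // duplicates are ignored
        System.out.println(policy.propertiesToPreload("ListCustomers"));
        System.out.println(policy.propertiesToPreload("EditCustomer"));
    }
}
```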

Another issue that I can see is that of performance in for instance
JDBM... I would assume that making many key lookups is relatively
expensive, and we would need to look at the "Lookup/Size speed ratio",
i.e. how many extra bytes in the single-lookup blob cost as much as
one additional key lookup?

I think there are two issues: one is the expense per lookup, and one is that with more key->value mappings the database is going to be bigger, so will consume more disk space. The indices will also be bigger and thus slower.

What would be interesting is to have the store work on two levels: one is the identity lookup, and then the next level would be property(/association) lookup. If one index can have only identity lookup and another has the per-property lookup, it should be much faster.

And that is needed for the particular
MapES implementation, as the values for JDBM would be dramatically
different from those for a network-based store. But, you probably realize that reading
the Blob is one step and creating the Property instances is another
with its own overhead, potentially very substantial, and here it is
probably a fixed time per instance, i.e. per first use --> create.

Right, so one thing we can do *today* is to change it so that the blob read does not do the property instantiation eagerly. There's no need to, as far as I can tell. That on its own will make a huge difference.

Some of my entities right now have like 10-15 interfaces already, each with its own state (typically 1-5 properties), and having to load and instantiate all of that just to read one property seems like a massive performance hit. With lazy-instantiation of properties a big part of the problem goes away.
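
The lazy-instantiation idea can be sketched roughly as follows, assuming the blob has already been parsed into a map of raw values. `Property` and `LazyEntityState` here are simplified stand-ins invented for the example, not the actual Qi4j types.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of lazy property instantiation. Not Qi4j API. */
final class LazyPropertySketch {

    /** Simplified stand-in for a property instance. */
    static final class Property {
        private final Object value;

        Property(Object value) {
            this.value = value;
        }

        Object get() {
            return value;
        }
    }

    /** Holds the raw values from the blob and creates Property objects on first use only. */
    static final class LazyEntityState {
        private final Map<String, Object> rawValues;
        private final Map<String, Property> instantiated = new HashMap<>();

        LazyEntityState(Map<String, Object> rawValues) {
            this.rawValues = new HashMap<>(rawValues);
        }

        Property property(String name) {
            Property property = instantiated.get(name);
            if (property == null) {
                if (!rawValues.containsKey(name)) {
                    throw new IllegalArgumentException("No such property: " + name);
                }
                // The instantiation cost is paid here, on first use, and then cached.
                property = new Property(rawValues.get(name));
                instantiated.put(name, property);
            }
            return property;
        }

        int instantiatedCount() {
            return instantiated.size();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new HashMap<>();
        raw.put("name", "Ann");
        raw.put("city", "Malmo");
        raw.put("age", 42);
        LazyEntityState state = new LazyEntityState(raw);
        System.out.println(state.property("name").get());
        System.out.println("instantiated: " + state.instantiatedCount());
    }
}
```

Reading one property out of three creates one `Property` object; the other two stay as raw values until someone asks for them.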

The remaining question is whether having only an id->blob mapping like today is fine, or whether introducing extra fragmentation would be useful. One obvious issue is how to keep it all in sync. With the two-level solution it should be fine, since the first lookup should get {id,timestamp,version,app_version} and all such metadata about the entity as a whole, whereas the second-level lookup would access the actual data. But I agree, that needs to be benchmarked.
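
A minimal in-memory sketch of the two-level layout, to show the shape of it: one index from identity to entity-wide metadata, and one from identity plus property name to the value. All names and the `identity/property` key format are assumptions, and a real store would have to update both levels in one transaction to keep them in sync.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a two-level entity store. Not Qi4j API. */
final class TwoLevelStoreSketch {

    /** First level: metadata about the entity as a whole. */
    static final class EntityMeta {
        final String identity;
        final long timestamp;
        final long version;
        final String appVersion;

        EntityMeta(String identity, long timestamp, long version, String appVersion) {
            this.identity = identity;
            this.timestamp = timestamp;
            this.version = version;
            this.appVersion = appVersion;
        }
    }

    private final Map<String, EntityMeta> metaByIdentity = new HashMap<>();
    // Second level: key is "identity/propertyName".
    private final Map<String, Object> valueByProperty = new HashMap<>();

    /** Writes the given properties and bumps the entity version by one. */
    void write(String identity, String appVersion, long timestamp, Map<String, Object> properties) {
        EntityMeta previous = metaByIdentity.get(identity);
        long version = previous == null ? 1L : previous.version + 1L;
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            valueByProperty.put(identity + "/" + entry.getKey(), entry.getValue());
        }
        metaByIdentity.put(identity, new EntityMeta(identity, timestamp, version, appVersion));
    }

    /** Identity lookup: returns metadata only, or null if the entity is unknown. */
    EntityMeta meta(String identity) {
        return metaByIdentity.get(identity);
    }

    /** Property lookup: fetches one value without touching the others. */
    Object property(String identity, String name) {
        return valueByProperty.get(identity + "/" + name);
    }

    public static void main(String[] args) {
        TwoLevelStoreSketch store = new TwoLevelStoreSketch();
        Map<String, Object> props = new HashMap<>();
        props.put("name", "Ann");
        store.write("person-1", "1.0", System.currentTimeMillis(), props);
        System.out.println("version " + store.meta("person-1").version
                + ", name " + store.property("person-1", "name"));
    }
}
```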

I think it is a complex area, with room for significant speed
improvements in large entities, but I also think that the answers will
surprise us once we start measuring.

Any hints on what the surprise might be?

We need a Performance Expert :-) who can dedicate himself/herself to
chasing performance in ES and Indexing/Query.

That would be very nice, yes.

/Rickard

_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev
