Loading policies based on usecases would be really awesome, and beyond
cool!!
Maybe a compromise could be to separate persistence based on the
mixins in the EntityComposite, so that the state for each mixin is
stored separately? The data in a mixin will probably tend to be used
together, at least in a good design.
And it should cause far fewer fragments than storing each property
separately.
I even think it makes sense conceptually, since the basic atom in
QI4J is the fragment, to store each mixin as a single unit. Not sure
if any technical issues make it difficult/impossible.
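
To make the per-mixin idea concrete, here is a minimal sketch of a
store that keeps one stored unit per (entity identity, mixin type)
pair. Everything in it is hypothetical: PerMixinStore, its method
names and the "identity#mixinType" key format are made up for
illustration and are not part of the Qi4j API or SPI.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Qi4j API: state is persisted per mixin
// instead of per entity (one blob) or per property (many fragments).
final class PerMixinStore {
    // One stored unit per (entity identity, mixin type) pair.
    private final Map<String, Map<String, Object>> units = new HashMap<>();
    private int lookups = 0;

    // Assumed key format; a real store would pick its own encoding.
    private static String key(String identity, String mixinType) {
        return identity + "#" + mixinType;
    }

    // Stores a defensive copy of all properties belonging to one mixin.
    void putMixinState(String identity, String mixinType, Map<String, Object> state) {
        units.put(key(identity, mixinType), new HashMap<>(state));
    }

    // One lookup returns every property of the mixin, so properties that
    // are used together are loaded together. Unknown mixins yield an
    // empty map.
    Map<String, Object> loadMixinState(String identity, String mixinType) {
        lookups++;
        Map<String, Object> state = units.get(key(identity, mixinType));
        return state == null ? new HashMap<>() : new HashMap<>(state);
    }

    // Number of store lookups made so far, to compare granularities.
    int lookupCount() {
        return lookups;
    }
}
```

With this layout, reading three properties of the same mixin costs
one lookup rather than three, at the price of each property having
to know which mixin it belongs to.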
Another point is, the EntityStore might want to get called the first
time a part of a composite is being accessed regardless of whether
the state of that part is already loaded or not, to allow it to learn
about usage patterns for the current usecase, even if all state for
the composite is currently eagerly fetched. Reducing the granularity
to mixins instead of properties/associations could reduce the number
of parts of a composite, and thus the number of callbacks a lot!
The drawback is, of course, that each association/property will now
need to know which mixin it is a part of.
Any ideas how to let the EntityStore know whether it is being called
because someone wants to load an entity, or because another part of
the composite is being accessed? And how will the EntityStore know
whether the requested part of the composite is already loaded? Would
it be possible to store the info in the EntityReference?
On a sidenote: What is the proper way to handle partially loaded
entities when exiting a UnitOfWork? Should state not yet loaded be
lazy-loaded on exit, or should the client be prepared for
LazyLoadExceptions? If state is lazy loaded on exit, it will probably
be an incentive to use value objects.
/Kent
On 16/10/2009 at 08:25, Rickard Öberg wrote:
On 2009-10-16 12.25, Niclas Hedhman wrote:
Isn't this the equivalent of "Loading Policy" in JDO/JPA ??
Pretty much, yes I think so. Except aren't those on structural
rather than usecase level? I.e. regardless of usecase "if you load
property1 always load property2,property3 at the same time".
I think this is a "per EntityStore type issue" more than it is a
client code issue, although the client code will have a huge
influence on which loading strategy is the best for a given
EntityStore.
Agree.
So, I tend to lean towards introducing "Loading Policy" in the MapES
first, and perhaps even my favorite approach of 'self learning'
against a use-case, i.e. the ES will internally keep track of which
properties (and possibly associations) that a particular use-case
uses, and pre-load those upon a request.
That's an interesting idea, and it's probably the easiest to use,
with the best results. So basically the client developer doesn't
have to care.
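
As a rough illustration of the 'self learning' approach, the sketch
below records which properties each use-case touches and reports
them as the set to pre-load next time. The class and method names
(LearningLoadPolicy, recordAccess, propertiesToPreload) are
assumptions made for this example, not existing Qi4j types, and
thread safety and eviction of stale entries are deliberately left
out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Qi4j API: the store learns, per use-case,
// which properties/associations are actually read.
final class LearningLoadPolicy {
    // use-case name -> names of the properties accessed under it
    private final Map<String, Set<String>> usedByUsecase = new HashMap<>();

    // Called by the store the first time a part of a composite is
    // accessed within the given use-case.
    void recordAccess(String usecase, String propertyName) {
        usedByUsecase
            .computeIfAbsent(usecase, k -> new LinkedHashSet<>())
            .add(propertyName);
    }

    // Properties worth fetching eagerly when an entity is loaded under
    // this use-case. Empty until something has been learned.
    Set<String> propertiesToPreload(String usecase) {
        Set<String> seen = usedByUsecase.get(usecase);
        return seen == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(seen);
    }
}
```

The client code only names its use-case; which state gets pre-loaded
is decided entirely inside the store.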
Another issue that I can see is that of performance in, for instance,
JDBM... I would assume that making many key lookups is relatively
expensive, and we would need to look at a "Lookup/Size speed ratio",
i.e. how many extra bytes in the single-lookup blob cost as much as
one additional key lookup?
I think there are two issues: one is the expense per lookup, and one
is that with more key->value mappings the database is going to be
bigger, so will consume more disk space. The indices will also be
bigger and thus slower.
What would be interesting is to have the store work on two levels:
one is the identity lookup, and then the next level would be
property(/association) lookup. If one index can have only identity
lookup and another has the per-property lookup, it should be much
faster.
And that is needed per MapES implementation, as the values for JDBM
would be dramatically different from those for a network-based
store. But, you probably realize that reading the Blob is one step
and creating the Property instances is another with its own
overhead, potentially very substantial, and here it is probably a
fixed time per instance, i.e. per first use --> create.
Right, so one thing we can do *today* is to change so that the blob
read does not do the property instantiation eagerly. There's no need
to as far as I can tell. That on its own will make a huge difference.
Some of my entities right now have like 10-15 interfaces already,
each with its own state (typically 1-5 properties), and having to
load and instantiate all of that just to read one property seems
like a massive performance hit. With lazy-instantiation of
properties a big part of the problem goes away.
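
A minimal sketch of what lazy instantiation could look like: the
blob is read once, but each property value is only decoded and
created on first access. LazyProperty and its decoder argument are
hypothetical names used for illustration; this is not how the
current Qi4j property classes are implemented, and the sketch is
not thread-safe.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not Qi4j API: a property whose value is created
// from the already-read blob only when someone asks for it.
final class LazyProperty<T> {
    private Supplier<T> decoder; // decodes this property's slice of the blob
    private T value;
    private boolean materialized;

    LazyProperty(Supplier<T> decoder) {
        this.decoder = decoder;
    }

    // First call pays the decode/instantiation cost; later calls return
    // the cached value.
    T get() {
        if (!materialized) {
            value = decoder.get();
            materialized = true;
            decoder = null; // allow the decoder and its captures to be collected
        }
        return value;
    }

    // True once the value has been created.
    boolean isMaterialized() {
        return materialized;
    }
}
```

An entity with 10-15 interfaces would then create one cheap wrapper
per property on load, and pay the real instantiation cost only for
the properties a use-case actually reads.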
The remaining question is whether having only a id->blob mapping
like today is fine, or whether introducing extra fragmentation would
be useful. One obvious issue is how to keep it all in sync. With the
two-level solution it should be fine, since the first lookup should
get {id,timestamp,version,app_version} and all such metadata about
the entity as a whole, whereas the second level lookup would access
the actual data. But I agree, that needs to be benchmarked.
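
The two-level lookup could be sketched roughly as follows: a first
index maps identity to whole-entity metadata, and a second one maps
(identity, property) to the actual data. TwoLevelStore, EntityHeader
and the "identity/property" key format are assumptions for this
example; only the metadata fields {id, timestamp, version,
app_version} come from the description above. Both levels are
written in one put() call, which is how this sketch keeps them in
sync.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Qi4j API: identity lookup and per-property
// lookup are kept in two separate indices.
final class TwoLevelStore {
    // Metadata about the entity as a whole (first-level value).
    static final class EntityHeader {
        final String identity;
        final long timestamp;
        final long version;
        final String appVersion;

        EntityHeader(String identity, long timestamp, long version, String appVersion) {
            this.identity = identity;
            this.timestamp = timestamp;
            this.version = version;
            this.appVersion = appVersion;
        }
    }

    private final Map<String, EntityHeader> headers = new HashMap<>(); // level 1
    private final Map<String, Object> values = new HashMap<>();        // level 2

    // Writes header and property values together so the levels agree.
    void put(EntityHeader header, Map<String, Object> properties) {
        headers.put(header.identity, header);
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            values.put(header.identity + "/" + e.getKey(), e.getValue());
        }
    }

    // Level 1: small, identity-only lookup. Returns null if unknown.
    EntityHeader header(String identity) {
        return headers.get(identity);
    }

    // Level 2: fetches a single property without touching the others.
    Object property(String identity, String name) {
        return values.get(identity + "/" + name);
    }
}
```

Whether the extra index pays off for a given store is exactly the
kind of question that would have to be settled by benchmarking.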
I think it is a complex area, with room for significant speed
improvements in large entities, but I also think that the answers
will surprise us once we start measuring.
Any hints on what the surprise might be?
We need a Performance Expert :-) who can dedicate himself/herself to
chasing performance in ES and Indexing/Query.
That would be very nice, yes.
/Rickard
_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev