On 6/1/12 11:37, Niclas Hedhman wrote:
Gang,
I am contemplating the possibility of going the full distance with DDD
support and restrictions when it comes to Entities.
<snip>
Here's my general take on it. Modeling apps using DDD and the
Aggregate/Entity/Value breakdown I think works well for most apps.
Having entities output events, a la Greg Young, as their main result, is
generally the right way to go. This is just a continuation of what we
already have started, with separating entity store from querying, but
taking it to its logical conclusion.
Some thoughts/issues:
* When you work with aggregates instead of UoW, any usecase that
involves two Aggregates, such as "move child X from aggregate A to B",
has to involve a saga somehow. You would first send the command to A,
which would verify the removal and produce the event. A saga would then
consume that event and create a new command to B for adding X (see the
first sketch after this list). To keep it simple, that second step
should not be allowed to fail or throw exceptions. The move is hence
not atomic but eventually consistent, and it involves a bit of
infrastructure to handle. Without that infrastructure in place I don't
see using aggregates as being very realistic, since only simplistic
usecases that don't involve many aggregates are possible.
* If the above is done while taking advantage of the sharding
possibility that comes with using aggregates (an aggregate and all
subentities live on one server, i.e. you have a "document view" of the
whole thing), a "move" such as the above essentially involves taking
the child entity X, creating a VALUE for it in the command sent to B,
and then recreating it in B with new identities etc. With that you have
a fully shardable solution with no cross-server references, but it
relies on quite a bit of infrastructure to get going.
* If the aggregate store is events, a la Greg Young, where snapshots are
only created once in a while, then you need to consider how to handle
updates of the model. The pure way is to only do it through events, but
this is tricky and not a lot of people are used to it. It can be made
reasonably simple I think by using something like the Migration API
(which is data-centric), but that MUST be in place.
* If you go this route the emphasis would then also be on creating event
consumers that denormalize events into whatever read store you are using.
* If you go this route, then it becomes clear that the domain model is
ONLY used for processing commands, i.e. you don't create POJOs that you
use for both read AND write. This is a GOOD THING, as it just makes no
sense at all to go through an object model on view queries. I mean
seriously: make a database query, get the ids, load the objects, pick
out the stuff you need, and serialize to JSON/XML for the client,
versus make a database query and stream it to JSON/XML for the client.
As long as you are using a Query DSL for the query I don't see a
problem with that, and it would allow for much richer results involving
aggregates and such, which we just can't do with
queries-in-domain-model.
* On how to construct the domain model, I'm leaning towards DCI to a
large extent replacing how we use mixins now. You would load an entity
with state, and then add a DCI context and roles around it/them to
process the commands and the business rules. The entities would then
pretty much only be simple state, i.e. we could still use
interfaces + Property<>/Association<> as now, but they would have no
logic. Logic would be in DCI, which can be implemented using POJOs (see
the second sketch after this list). This simplifies testing as well, as
the interfaces can be implemented using TransientComposite for tests.
* Once you have all of this, and your main store is the event store, you
can plug in any number of NOSQL thingies as event consumers for
denormalized views, and you don't really have to rely on transactions or
blah blah in them for consistency, since that is already done at the
command level. All you need is an event consumer that feeds one or more
NOSQL stores for *view purposes only*. Awesome.
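To make the first point concrete, here's a rough sketch of how such a
cross-aggregate move could look. All the names (ChildRemoved, AddChild,
CommandBus, MoveChildSaga) are made up for illustration, this is not an
existing Qi4j API:

// Hypothetical event produced by aggregate A once it has verified and
// performed the removal. The child's state travels as a plain value,
// so B can recreate it with new identities.
class ChildRemoved
{
    final String sourceAggregateId;
    final String targetAggregateId;
    final String childSnapshot; // serialized value of child X

    ChildRemoved( String sourceAggregateId, String targetAggregateId, String childSnapshot )
    {
        this.sourceAggregateId = sourceAggregateId;
        this.targetAggregateId = targetAggregateId;
        this.childSnapshot = childSnapshot;
    }
}

// Hypothetical command sent to aggregate B in the second step.
class AddChild
{
    final String targetAggregateId;
    final String childSnapshot;

    AddChild( String targetAggregateId, String childSnapshot )
    {
        this.targetAggregateId = targetAggregateId;
        this.childSnapshot = childSnapshot;
    }
}

// Minimal command bus abstraction, purely for illustration.
interface CommandBus
{
    void send( Object command );
}

// The saga consumes the event from A and issues the follow-up command to B.
// This step must not fail or throw; the move as a whole is only
// eventually consistent.
class MoveChildSaga
{
    private final CommandBus commandBus;

    MoveChildSaga( CommandBus commandBus )
    {
        this.commandBus = commandBus;
    }

    void on( ChildRemoved event )
    {
        // Recreate X in B; B assigns new (aggregate-scoped) identities.
        commandBus.send( new AddChild( event.targetAggregateId, event.childSnapshot ) );
    }
}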
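And for the DCI point, a sketch of what "entities as pure state, logic
in roles" could look like. Again the names are made up, and I use a
local stand-in for Property<> just to keep the example self-contained:

// Stand-in for Qi4j's Property<>, just enough to make the sketch compile.
interface Property<T>
{
    T get();
    void set( T value );
}

// Entity state only: no logic, just properties. Could be an
// EntityComposite in production and a TransientComposite in tests.
interface OrderState
{
    Property<String> customerId();
    Property<Boolean> shipped();
}

// Hypothetical command handled in this context.
class ShipOrder
{
    final String orderId;
    ShipOrder( String orderId ) { this.orderId = orderId; }
}

// Hypothetical event produced when the command succeeds.
class OrderShipped
{
    final String orderId;
    OrderShipped( String orderId ) { this.orderId = orderId; }
}

// DCI context: wraps the loaded entity state in a role and runs the interaction.
class ShippingContext
{
    private final OrderState order;

    ShippingContext( OrderState order ) { this.order = order; }

    OrderShipped ship( ShipOrder command )
    {
        return new ShippableOrderRole( order ).ship( command );
    }
}

// The role is a plain POJO around the state; all business rules live here.
class ShippableOrderRole
{
    private final OrderState state;

    ShippableOrderRole( OrderState state ) { this.state = state; }

    OrderShipped ship( ShipOrder command )
    {
        if( Boolean.TRUE.equals( state.shipped().get() ) )
        {
            throw new IllegalStateException( "Order already shipped" );
        }
        state.shipped().set( true );
        return new OrderShipped( command.orderId );
    }
}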
That's my general take on it.
For your specific points:
* Entities are bound to an Aggregate, and the Aggregate has an
Aggregate Root, which is the only Entity within the Aggregate that is
globally reachable.
See above. Yes, with the condition that infrastructure around it is
needed for cross-aggregate stuff.
* Only changes within the Aggregate are atomic. Changes across
Aggregates are eventually consistent.
Agreed, as above.
* Invariants are declared on the Aggregate Root or assigned to the
Aggregate Root at assembly.
Yup.
* Aggregates are declared via @Aggregated annotation on Associations
and ManyAssociations.
Yup.
* The Aggregated entities' Identity is scoped by the Aggregate Root
(under the hood, Aggregate Root identity is prefixed to the aggregated
entity).
* When a non-Aggregated Association is traversed the retrieved Entity
is read-only.
Traversed for what reason is the question. Usually that will be for
query/view purposes, but as above: why not skip the model entirely,
just use a query DSL on the store, and stream the result to the client?
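Just to illustrate what I mean by "query, stream, done", here is a
rough sketch using plain JDBC against a read store. It is not any
existing Qi4j API, the names are made up, and JSON escaping is omitted
for brevity:

import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Stream a view query straight from the store to the client as JSON,
// without ever materializing domain objects.
class CaseListResource
{
    private final Connection connection; // read store connection, however it is obtained

    CaseListResource( Connection connection )
    {
        this.connection = connection;
    }

    void streamOpenCases( Writer out ) throws SQLException, IOException
    {
        String query = "SELECT id, description FROM cases WHERE status = ?";
        try( PreparedStatement stmt = connection.prepareStatement( query ) )
        {
            stmt.setString( 1, "OPEN" );
            try( ResultSet rs = stmt.executeQuery() )
            {
                out.write( "[" );
                boolean first = true;
                while( rs.next() )
                {
                    if( !first )
                    {
                        out.write( "," );
                    }
                    first = false;
                    // No domain objects: result row goes straight to the response
                    out.write( "{\"id\":\"" + rs.getString( "id" )
                               + "\",\"description\":\"" + rs.getString( "description" ) + "\"}" );
                }
                out.write( "]" );
            }
        }
    }
}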
Would then that mean that UnitOfWork is not needed at all?? The
Aggregate IS effectively the UnitOfWork, and obtaining an Aggregate
can be done directly on the EntityFactory/Module, and the aggregated
entities are created from the AggregateRoot. Various posts on the DDD
group also seem to suggest the same thing: IF you are modelling with
Aggregates, UnitOfWork should not exist.
Agreed. Takes more effort to model, or at least get used to, but
philosophically it's the right thing to do, given all we now know about
DDD and CQRS/EventSourcing.
In all, this seems to suggest that the whole persistence system can be
simplified, GoodThing(tm), yet with the Aggregates being the
Distribution boundary, Consistency boundary, Transaction boundary and
Concurrency boundary, I think we can obtain a more solid semantic
model for how things are expected to work, both locally as well as
distributed.
Agreed, see above. We provide the transaction boundary on the aggregate,
and make sure that isolation is done properly there. Once that is taken
care of, the rest becomes eventually consistent.
To add to the above, I would like to get in place an asynchronous
model for the Entity Store SPI as well;
* All changes to Entities are captured as Transitions.
Why not Events?
* Such transitions are pushed to the Entity Store SPI asynchronously.
Optimistic success, with callback for success/failures.
If Events from Aggregates are the main way to decide whether command
processing went well or not, I would make this part synchronous.
Processing these events to generate denormalized views would be
asynchronous though. This would be great! Right now, for example, I know
that a big performance issue in Streamflow is the updates to Sesame and
Solr as indexes upon transaction commit. With this, all of that would
architecturally be done asynchronously, and out of the critical path of
execution.
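As a sketch of what that asynchronous part could look like: command
handling returns once the events are stored, and a background consumer
applies them to the index afterwards. IndexUpdater and DomainEvent are
made-up names here, not the Streamflow or Qi4j API:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical domain event and index abstraction, for illustration only.
class DomainEvent
{
    final String entityId;
    final String name;

    DomainEvent( String entityId, String name )
    {
        this.entityId = entityId;
        this.name = name;
    }
}

interface IndexUpdater
{
    void apply( DomainEvent event ); // e.g. update Solr/Sesame documents
}

// Events are handed off only after the synchronous command/store step has
// completed, so index updates never sit in the critical path of execution.
class AsyncIndexer
{
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final IndexUpdater indexUpdater;

    AsyncIndexer( IndexUpdater indexUpdater )
    {
        this.indexUpdater = indexUpdater;
    }

    void onEventsStored( final List<DomainEvent> events )
    {
        executor.submit( new Runnable()
        {
            public void run()
            {
                for( DomainEvent event : events )
                {
                    indexUpdater.apply( event );
                }
            }
        } );
    }
}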
* Retrieval is likewise asynchronous. The request contains a callback
whereto deliver the transition stream.
Retrieval of an aggregate happens at the start of command processing.
You can definitely make that asynchronous, but what is the payoff? If
you are backing a REST API the response would have to be "command
received, processing" rather than worked/failed.
* Perhaps retrieval requests can be persistent, so that one can
register a Specification, which will continue to feed the callback
with all changes matching the specification. Not sure if this will be
useful though.
"Retrieval" is too vague. If we do separate between command processing
and view requests, then I (with what I know now) usually opt for
skipping the domain model COMPLETELY. Query, stream, done. The domain
model doesn't give me anything, other than schema help. But I can get
that, if I want to, by using query DSL's and composite interfaces for
view definition, or something like that.
This could also simplify the EntityStore SPI quite a bit, since the
only interface needed would be something like;
public interface EventStore<T extends Event>
{
    void save( Identity identity, Iterable<T> events, EventOutcome<T> handler );
    void load( Identity identity, EventRetriever<T> callback );
}
I would use Future<> as result for both of these, for outcome and
retriever, IF you want them to be asynch. I.e.
public interface EventStore
{
    Future<EventOutcome> save( Identity identity, Iterable<Event> events );
    EntityComposite load( Identity identity ); // Load given aggregate
}
This is for the purpose of command processing. For event consuming
there would be a different interface that allows for paging, either
over all events or per identity. EventOutcome above would be a POJO
rather than a callback.
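Something like this is what I have in mind for the consumer side. The
names are made up (the real one I link to further down is EventSource):

import java.util.List;

// Consumer-side view of the event store: paged access to events,
// either globally or per aggregate identity. Names are illustrative only.
interface EventFeed<T>
{
    // All events in store order, starting at the given offset.
    List<T> events( long offset, int maxCount );

    // Events for one aggregate, starting at the given sequence number.
    List<T> events( String aggregateIdentity, long fromSequence, int maxCount );
}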
public interface StateTransition extends Event // super interface for all ES transitions
{
    Identity entityIdentity();
    long sequenceNumber();
    DateTime timestamp();
}
Have a look at the event Value I did in the EventSourcing library:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/eventsourcing/src/main/java/org/qi4j/library/eventsourcing/domain/api/DomainEventValue.java
Since the aggregate outputs an iterable of events, you don't actually
put the timestamp into the event itself, but rather in a wrapper of that
iterable:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/eventsourcing/src/main/java/org/qi4j/library/eventsourcing/domain/api/UnitOfWorkDomainEventsValue.java
Notice that it also contains a bit of extra context, such as the
version of the app used to create it (helps with migration), the
usecase (helps with understanding the scope of the event), and who
triggered the usecase. With this, the EventStore becomes:
public interface EventStore
{
    Future<EventOutcome> save( AggregateEvents events ); // Contains identity, timestamp, and list of events
    Iterable<Event> load( Identity identity ); // Load events for given aggregate
}
Something like that. Again, the consumer of events would have a
different API, something like this:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/eventsourcing/src/main/java/org/qi4j/library/eventsourcing/domain/source/EventSource.java
IF the entity state is represented as a List of Transitions, the
"current state" must be rebuilt from these transitions, which seems to
suggest things will be much slower. This is probably true if the
number of modifications to a Property or Association is magnitudes
larger than the snapshot value, but only actual trials will tell what
can be expected, and how much will be serialization overhead versus
reconstruction of the snapshot state. A later optimization could be to
allow for a "snapshot", which the ES understands as a "temporal
starting point".
Having snapshot events is something I think we would need to have from
the start. In the Streamflow version the EntityStore and EventStore are
separated, but if you just join the two, with the introduction of
snapshot events, you're good to go.
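As a sketch of what load-with-snapshots could look like, with made-up
types: find the latest snapshot event and replay only what comes after
it.

import java.util.List;

// Hypothetical event hierarchy: a SnapshotEvent carries full state,
// ordinary events carry incremental changes.
abstract class StoredEvent
{
    final long sequenceNumber;
    StoredEvent( long sequenceNumber ) { this.sequenceNumber = sequenceNumber; }
}

class SnapshotEvent extends StoredEvent
{
    final String state; // serialized full state of the aggregate
    SnapshotEvent( long sequenceNumber, String state )
    {
        super( sequenceNumber );
        this.state = state;
    }
}

class AggregateLoader
{
    // Rebuild current state: start from the latest snapshot (if any),
    // then apply only the events that came after it.
    String load( List<StoredEvent> events )
    {
        String state = ""; // empty initial state
        int startIndex = 0;
        for( int i = events.size() - 1; i >= 0; i-- )
        {
            if( events.get( i ) instanceof SnapshotEvent )
            {
                state = ( (SnapshotEvent) events.get( i ) ).state;
                startIndex = i + 1;
                break;
            }
        }
        for( int i = startIndex; i < events.size(); i++ )
        {
            state = apply( state, events.get( i ) );
        }
        return state;
    }

    // Placeholder for applying one event to the state.
    String apply( String state, StoredEvent event )
    {
        return state + "|" + event.sequenceNumber;
    }
}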
The above is basically how I think a proper, modern, DDD-friendly app
building environment would look, and it would embrace NOSQL fully,
along with understanding how sharding and vertical scaling work.
/Rickard
_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev