On 6/1/12 11:37, Niclas Hedhman wrote:
Gang,

I am contemplating the possibility of going the full distance with DDD
support and restrictions when it comes to Entities.
<snip>

Here's my general take on it. Modeling apps using DDD and the Aggregate/Entity/Value breakdown I think works well for most apps. Having entities output events, a la Greg Young, as their main result, is generally the right way to go. This is just a continuation of what we already have started, with separating entity store from querying, but taking it to its logical conclusion.

Some thoughts/issues:
* When you work with aggregates instead of UoW, any usecase that involves two Aggregates, such as "move child X from aggregate A to B", has to involve a saga somehow. You would first send the command to A, which would verify the remove and produce the event. Then you would have to consume that in a saga and create a new command to B for adding X. To keep it simple that should not be allowed to fail or throw exceptions. The move will hence not be atomic but eventually consistent, and involves a bit of infrastructure to handle it. If you don't have this infrastructure in place I don't see using aggregates as being very realistic, as only simplistic usecases that don't involve many aggregates would be possible.
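
A rough sketch of such a saga, in plain Java with made-up names (ChildRemoved, AddChild, CommandBus and MoveChildSaga are all illustrative, not existing Qi4j types):

// Event emitted by aggregate A when the child is removed as part of a move.
interface ChildRemoved
{
    String targetAggregateId();

    Object childState();
}

// Command sent to aggregate B to complete the move.
class AddChild
{
    final String aggregateId;
    final Object childState;

    AddChild( String aggregateId, Object childState )
    {
        this.aggregateId = aggregateId;
        this.childState = childState;
    }
}

// Minimal command bus abstraction the saga depends on.
interface CommandBus
{
    void send( Object command );
}

// The saga: consume the event from A, issue the command to B. It is not
// allowed to fail, so retries/logging would live inside the bus.
class MoveChildSaga
{
    private final CommandBus commandBus;

    MoveChildSaga( CommandBus commandBus )
    {
        this.commandBus = commandBus;
    }

    void on( ChildRemoved event )
    {
        commandBus.send( new AddChild( event.targetAggregateId(), event.childState() ) );
    }
}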

* If the above is done while taking advantage of the sharding possibility that comes with using aggregates (an aggregate and all its subentities live on one server, i.e. you have a "document view" of the whole thing), a "move" such as the above essentially involves taking the child entity X, creating a VALUE for it in a command to B, and then recreating it in B with new identities etc. With that you have a fully shardable solution with no cross-server references, but it relies on quite a bit of infrastructure to get it going.
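
Continuing the made-up example from the sketch above, the command to B would carry a value snapshot of X rather than a reference, and B would recreate it under a fresh, B-scoped identity (all names are still illustrative):

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// The child entity is turned into a plain value before it leaves A, so no
// cross-server object reference ever exists.
class ChildValue
{
    final String name;
    final String description;

    ChildValue( String name, String description )
    {
        this.name = name;
        this.description = description;
    }
}

class AggregateB
{
    private final Map<String, ChildValue> children = new HashMap<String, ChildValue>();

    // Handling the "add child" command: X is recreated locally with a new
    // identity scoped to B.
    String addChild( ChildValue child )
    {
        String newChildId = UUID.randomUUID().toString();
        children.put( newChildId, child );
        return newChildId;
    }
}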

* If the aggregate store is events, a la Greg Young, where snapshots are only created once in a while, then you need to consider how to handle updates of the model. The pure way is to only do it through events, but this is tricky and not a lot of people are used to it. It can be made reasonably simple I think by using something like the Migration API (which is data-centric), but that MUST be in place.
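
As a hedged illustration of the data-centric approach (none of this is the actual Migration API, just a sketch of the idea), a migration step could treat stored events purely as data and rewrite them before replay:

import java.util.Map;

// Illustrative only: rewrites old events into the current shape before they
// are replayed into the model, without needing the old domain classes.
interface EventMigration
{
    boolean appliesTo( String eventName, String appVersion );

    void migrate( Map<String, Object> eventBody );  // e.g. parsed JSON
}

class RenameAmountToTotal implements EventMigration
{
    public boolean appliesTo( String eventName, String appVersion )
    {
        return "OrderPlaced".equals( eventName ) && appVersion.compareTo( "1.2" ) < 0;
    }

    public void migrate( Map<String, Object> eventBody )
    {
        if( eventBody.containsKey( "amount" ) )
        {
            eventBody.put( "total", eventBody.remove( "amount" ) );
        }
    }
}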

* If you go this route the emphasis would then also be on creating event consumers that denormalize events into whatever read store you are using.
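
For example, a consumer that keeps a flat read table up to date could look roughly like this (JDBC is only a stand-in for "whatever read store you are using", and the event type is invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative denormalizer: consumes a domain event and updates a flat,
// query-friendly table in the read store.
class OrderSummaryDenormalizer
{
    interface OrderPlaced
    {
        String orderId();

        String customerName();

        long total();
    }

    private final Connection readStore;

    OrderSummaryDenormalizer( Connection readStore )
    {
        this.readStore = readStore;
    }

    void on( OrderPlaced event ) throws SQLException
    {
        PreparedStatement insert = readStore.prepareStatement(
            "INSERT INTO order_summary (order_id, customer_name, total) VALUES (?, ?, ?)" );
        try
        {
            insert.setString( 1, event.orderId() );
            insert.setString( 2, event.customerName() );
            insert.setLong( 3, event.total() );
            insert.executeUpdate();
        }
        finally
        {
            insert.close();
        }
    }
}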

* If you go this route, then it becomes clear that the domain model is ONLY used for processing commands, i.e. you don't create POJOs that you use for both read AND write. This is a GOOD THING, as it just makes no sense at all to go through an object model on view queries. I mean seriously: make a database query, get the ids, load the objects, pick out the stuff you need, and serialize to JSON/XML for the client, versus make a database query and stream it straight to JSON/XML for the client. As long as you are using a Query DSL for the query I don't see a problem with that, and it would allow for much richer results involving aggregates and such, which we just can't do with queries-in-domain-model.
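
To make the second option concrete, a view request could look roughly like this (plain JDBC and hand-rolled JSON are used only to show the shape; a real query DSL would sit where the SQL string is):

import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative view query: no entities are loaded, the result set is streamed
// straight to the client as JSON.
class OrderListView
{
    void streamOpenOrders( Connection readStore, Writer out )
        throws SQLException, IOException
    {
        Statement stmt = readStore.createStatement();
        ResultSet rs = stmt.executeQuery(
            "SELECT order_id, customer_name, total FROM order_summary WHERE status = 'OPEN'" );
        out.write( "[" );
        boolean first = true;
        while( rs.next() )
        {
            if( !first )
            {
                out.write( "," );
            }
            first = false;
            out.write( "{\"orderId\":\"" + rs.getString( "order_id" )
                       + "\",\"customer\":\"" + rs.getString( "customer_name" )
                       + "\",\"total\":" + rs.getLong( "total" ) + "}" );
        }
        out.write( "]" );
        rs.close();
        stmt.close();
    }
}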

* On how to construct the domain model, I'm sort of leaning towards DCI replacing, to a large extent, how we use mixins now. You would load an entity with state, and then add a DCI context and roles around it/them to process the commands and the business rules. The entities would then pretty much only be simple state, i.e. we could still use interfaces+Property<>/Associations<> as now, but they would have no logic. Logic would be in DCI, which can be implemented using POJOs. This simplifies testing as well, as the interfaces can be implemented using TransientComposite for tests.
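
A rough shape of what that could look like, with the data part as a dumb state holder (in Qi4j terms an interface with Property<>/Association<>, here just a plain interface) and the logic in a DCI role and context; all names are made up:

// Data: only state, no behaviour. In Qi4j this would be the entity interface
// with Property<>/Association<> members.
interface OrderData
{
    String status();

    void setStatus( String status );
}

// Role: the methodful object that knows how to process the command and
// enforce the business rule.
class ConfirmableOrder
{
    private final OrderData data;

    ConfirmableOrder( OrderData data )
    {
        this.data = data;
    }

    void confirm()
    {
        if( !"DRAFT".equals( data.status() ) )
        {
            throw new IllegalStateException( "Only draft orders can be confirmed" );
        }
        data.setStatus( "CONFIRMED" );
    }
}

// Context: binds the loaded state into its role and runs the usecase. In a
// test, OrderData can be backed by a TransientComposite or a trivial POJO.
class ConfirmOrderContext
{
    void confirm( OrderData order )
    {
        new ConfirmableOrder( order ).confirm();
    }
}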

* Once you have all of this, and your main store is the event store, you can plug in any number of NOSQL thingies as event consumers for denormalized views, and you don't really have to rely on transactions or blah blah in them for consistency, since that is already done at the command level. All you need is an event consumer that feeds one or more NOSQL stores for *view purposes only*. Awesome.

That's my general take on it.

For your specific points:
* Entities are bound to an Aggregate, and the Aggregate has an
Aggregate Root, which is the only Entity within the Aggregate that is
globally reachable.

See above. Yes, with the condition that infrastructure around it is needed for cross-aggregate stuff.

* Only changes within the Aggregate are atomic. Changes across
Aggregates are eventually consistent.

Agreed, as above.

* Invariants are declared on the Aggregate Root or assigned to the
Aggregate Root at assembly.

Yup.

* Aggregates are declared via @Aggregated annotation on Associations
and ManyAssociations.

Yup.

* The Aggregated entity's Identity is scoped by the Aggregate Root
(under the hood, the Aggregate Root identity is prefixed to the aggregated
entity).

* When a non-Aggregated Association is traversed the retrieved Entity
is read-only.

The question is: traversed for what reason? Usually that will be for query/view purposes, but as above, why not skip the model entirely and just use a query DSL on the store, and stream the result to the client?

Would then that mean that UnitOfWork is not needed at all?? The
Aggregate IS effectively the UnitOfWork, and obtaining an Aggregate
can be done directly on the EntityFactory/Module, and the aggregated
entities are created from the AggregateRoot. Various posts on the DDD
group also seem to suggest the same thing: IF you are modelling with
Aggregates, UnitOfWork should not exist.

Agreed. Takes more effort to model, or at least get used to, but philosophically it's the right thing to do, given all we now know about DDD and CQRS/EventSourcing.

In all, this seems to suggest that the whole persistence system can be
simplified, GoodThing(tm), yet with the Aggregates being the
Distribution boundary, Consistency boundary, Transaction boundary and
Concurrency boundary, I think we can obtain a more solid semantic
model for how things are expected to work, both locally as well as
distributed.

Agreed, see above. We provide the transaction boundary on the aggregate, and make sure that isolation is done properly there. Once that is done with, the rest becomes eventually consistent.

To add to the above, I would like to get in place an asynchronous
model for the Entity Store SPI as well;

* All changes to Entities are captured as Transitions.

Why not Events?

* Such transitions are pushed to the Entity Store SPI asynchronously.
Optimistic success, with callback for success/failures.

If Events from Aggregates are the main thing deciding whether command processing went well or not, I would make this part synchronous. Processing these events to generate denormalized views would be asynchronous though. This would be great! Right now, for example, I know that a big performance issue in Streamflow is the updates to Sesame and Solr as indexes upon transaction commit. With this, all of that would architecturally be done asynchronously, and out of the critical path of execution.
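
To make that concrete, the command side could look roughly like this: events are stored synchronously, so the caller gets worked/failed, while the view updates are handed off to a background executor (all names are invented):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: synchronous event persistence, asynchronous view
// denormalization off the critical path.
class CommandProcessor
{
    interface Event {}

    interface EventStorage
    {
        void save( String aggregateId, List<Event> events );  // throws on failure
    }

    interface ViewProjector
    {
        void project( List<Event> events );
    }

    private final EventStorage storage;
    private final ViewProjector projector;
    private final ExecutorService viewUpdates = Executors.newSingleThreadExecutor();

    CommandProcessor( EventStorage storage, ViewProjector projector )
    {
        this.storage = storage;
        this.projector = projector;
    }

    void process( String aggregateId, final List<Event> events )
    {
        storage.save( aggregateId, events );   // synchronous: the command fails here if the store fails
        viewUpdates.submit( new Runnable()     // asynchronous: Sesame/Solr-style index updates
        {
            public void run()
            {
                projector.project( events );
            }
        } );
    }
}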

* Retrieval is likewise asynchronous. The request contains a callback
whereto deliver the transition stream.

Retrieval of the aggregate happens at the start of command processing. You can definitely make that asynchronous, but what is the payoff? If you are backing a REST API the response would have to be "command received, processing", rather than worked/failed.

* Perhaps retrieval requests can be persistent, so that one can
register a Specification, which will continue to feed the callback
with all changes matching the specification. Not sure if this will be
useful though.

"Retrieval" is too vague. If we do separate between command processing and view requests, then I (with what I know now) usually opt for skipping the domain model COMPLETELY. Query, stream, done. The domain model doesn't give me anything, other than schema help. But I can get that, if I want to, by using query DSL's and composite interfaces for view definition, or something like that.

This could also simplify the EntityStore SPI quite a bit, since the
only interface needed would be something like;

public interface EventStore<T extends Event>
{
    void save( Identity identity, Iterable<T> events, EventOutcome<T> handler );

    void load( Identity identity, EventRetriever<T> callback );
}

I would use Future<> as the result for both of these, for outcome and retriever, IF you want them to be async. I.e.
public interface EventStore
{
  Future<EventOutcome> save(Identity identity, Iterable<Event> events);
  EntityComposite load(Identity identity); // Load given aggregate
}

This is for the purpose of command processing. For event consuming there would be a different interface that allows for paging, either for all events or per identity. EventOutcome above would be a plain POJO rather than a callback.
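
Usage would then be something like this (assuming EventOutcome is a simple result POJO with, say, succeeded() and reason(); a caller that wants synchronous behaviour just blocks on get()):

import java.util.concurrent.Future;

// Illustrative caller of the Future-based EventStore sketched above.
class SaveExample
{
    void handle( EventStore store, Identity id, Iterable<Event> events ) throws Exception
    {
        Future<EventOutcome> outcome = store.save( id, events );

        // A REST command endpoint that wants to answer worked/failed simply
        // blocks here and maps the outcome to a response.
        EventOutcome result = outcome.get();
        if( !result.succeeded() )
        {
            throw new IllegalStateException( "Command failed: " + result.reason() );
        }
    }
}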

public interface StateTransition extends Event // super interface for all ES transitions
{
    Identity entityIdentity();

    long sequenceNumber();

    DateTime timestamp();
}

Have a look at the event Value I did in the EventSourcing library:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/eventsourcing/src/main/java/org/qi4j/library/eventsourcing/domain/api/DomainEventValue.java

Since the aggregate outputs an iterable of events, you don't actually put the timestamp into the event itself, but rather in a wrapper of that iterable:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/eventsourcing/src/main/java/org/qi4j/library/eventsourcing/domain/api/UnitOfWorkDomainEventsValue.java

Notice that it also contains a bit of extra context, such as the version of the app used to create it (helps with migration), the usecase (helps in understanding the scope of the event), and who triggered this usecase. With this, the EventStore becomes:
public interface EventStore
{
    Future<EventOutcome> save( AggregateEvents events ); // AggregateEvents contains identity, timestamp, and list of events

    Iterable<Event> load( Identity identity ); // Load events for given aggregate
}

Something like that. Again, the consumer of events would have a different API, something like this:
https://github.com/Qi4j/qi4j-sdk/blob/develop/libraries/eventsourcing/src/main/java/org/qi4j/library/eventsourcing/domain/source/EventSource.java
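
As a plain-Java paraphrase of that wrapper idea (only loosely modelled on UnitOfWorkDomainEventsValue; field names are illustrative, see the linked source for the real definition), the AggregateEvents parameter used in save() above could carry something like:

import java.util.List;

// Illustrative wrapper around one usecase's worth of events, with the extra
// context kept outside the individual events.
class AggregateEvents
{
    final String identity;      // aggregate the events belong to
    final long timestamp;       // when the usecase was executed
    final String version;       // application version that produced the events
    final String usecase;       // name of the usecase
    final String user;          // who triggered it
    final List<Object> events;  // the domain events themselves

    AggregateEvents( String identity, long timestamp, String version,
                     String usecase, String user, List<Object> events )
    {
        this.identity = identity;
        this.timestamp = timestamp;
        this.version = version;
        this.usecase = usecase;
        this.user = user;
        this.events = events;
    }
}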

IF the entity state is represented as a List of Transitions, the
"current state" must be rebuilt from these transitions, which seems to
suggest things will be much slower. This is probably true if the
number of modifications to a Property or Association is magnitudes
larger than the snapshot value, but only actual trials will tell what
can be expected, how much will be in serialization overhead versus
reconstruction of the snapshot state. A later optimization could be to
allow for "snapshot", which the ES understands as "temporal starting
point".

Having snapshot events is something I think we would need to have from the start. In the Streamflow version the EntityStore and EventStore are separated, but if you just join the two, with the introduction of snapshot events, you're good to go.
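
A minimal sketch of the load path once snapshot events are in place (the store and event types are invented; the point is just "start from the latest snapshot, replay the rest"):

import java.util.List;

// Illustrative load with snapshots: the latest snapshot is the temporal
// starting point, and only events recorded after it are replayed.
class AggregateLoader
{
    interface Event {}

    interface Snapshot
    {
        long sequenceNumber();
    }

    interface SnapshotStore
    {
        Snapshot latest( String aggregateId );  // may return null
    }

    interface EventLog
    {
        List<Event> eventsAfter( String aggregateId, long sequenceNumber );
    }

    interface Aggregate
    {
        void restoreFrom( Snapshot snapshot );

        void apply( Event event );
    }

    void load( String aggregateId, Aggregate aggregate,
               SnapshotStore snapshots, EventLog log )
    {
        Snapshot snapshot = snapshots.latest( aggregateId );
        long from = 0;
        if( snapshot != null )
        {
            aggregate.restoreFrom( snapshot );
            from = snapshot.sequenceNumber();
        }
        for( Event event : log.eventsAfter( aggregateId, from ) )
        {
            aggregate.apply( event );  // rebuild the remaining state
        }
    }
}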

The above is basically how I think a proper modern DDD-friendly app-building environment would look, and it would embrace NOSQL fully, along with understanding how sharding and vertical scaling work.

/Rickard
