Niclas Hedhman wrote:
> I watched this; http://www.youtube.com/watch?v=4XpnKHJAok8
> (Is it just me, or does the Google crowd laugh at the 'wrong' places??)
>
> And there has also been a debate over GIT in Apache, and I think I am
> in the pro-Git camp at the moment, but not totally convinced.
I've listened to the presentation now, and the ideas are really
interesting. I would be OK with going to Git as long as there are tools
to use it. Windows use seems to require Cygwin, which is bad (but hey,
who uses Windows anyway?), and I'm not sure about the status of the Git
plugin for IDEA. Is there one for Eclipse? Anyway, if these practical
things are resolved I can't see any major reason not to go to Git,
as the model itself seems very attractive.
Speaking of which, what really got me excited was the distributed
model itself. My first thought was: why can't we use this for object
models? It would solve a whole shitload of problems, including (but not
limited to) "clustering", backups (which would no longer be needed),
offline usage (which in today's laptop/mobile-oriented world is becoming
increasingly important!), globalization of users (having a single
central server when users are on all continents is not very appealing),
and so on.
In PetStore terms: if I'm a vet doing a housecall, I would like to be
able to bring up the medical history of the pets of the owner I am
visiting, look at the typical ailments for those types of pets, update
the medical history while offline, and then "merge" that data with the
"central" repository when I get back to the office ("central" really
being the wrong notion, since there might be other offices that should
also be "merged" with). This is the correct way to view these things in
a connected world, I think, and it requires the infrastructure to think
about data, consistency, partitioning and such things very differently.
The CAP theorem becomes central here. In the housecall scenario I am
willing to sacrifice Consistency in order to keep Availability in the
face of a network Partition: I get all the data for the owner's pets
onto the laptop (Availability), but since the laptop is disconnected
(a Partition) that data might not be up to date when I view it (no
Consistency), and I might not be able to look at the histories of pets
I didn't bring along at all. That may be perfectly fine, and making
this trade-off is what allows me to have an offline client in the
first place! When I (as a vet) update the medical history, I am
essentially creating a diff (a sequence of changes) that will be
applied to the "central" repository later on, something like the
sketch below.
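To make that concrete, here is a minimal sketch in plain Java of what
recording such a diff while offline could look like. All the names here
are made up for illustration, not any actual Qi4j API:

    import java.util.ArrayList;
    import java.util.List;

    // One recorded edit: which entity was touched, which property
    // changed, and the value the client set it to.
    final class Change {
        final String entityId;
        final String property;
        final Object newValue;

        Change(String entityId, String property, Object newValue) {
            this.entityId = entityId;
            this.property = property;
            this.newValue = newValue;
        }
    }

    // The "diff" the vet brings back to the office: an ordered
    // sequence of changes made while disconnected.
    class ChangeSet {
        private final List<Change> changes = new ArrayList<Change>();

        void record(String entityId, String property, Object newValue) {
            changes.add(new Change(entityId, property, newValue));
        }

        List<Change> changes() {
            return changes;
        }
    }

The offline client would call record() for every edit instead of
writing straight to a shared store, and ship the ChangeSet when it
gets back online.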
However, and this goes back to my recent description of how the Remote
EntityStore should work, this will not just copy the data into the
central repository; it will instead emulate a client making those
changes, so all the constraints and business rules are run again,
ensuring that the resulting state is still consistent. If it is not,
the UnitOfWork will throw an exception which has to be resolved first.
Doing "clustering", then, is to some extent just a matter of sending
these sequences of changes around to any interested party and running
them again (see the sketch below). This is, by the way, exactly how
replication worked in SiteVision, and it worked surprisingly well.
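A rough sketch of that replay step, building on the Change/ChangeSet
types above. The UnitOfWork interface here is just a hypothetical
stand-in for illustration, not the real Qi4j one:

    // Thrown when a replayed change breaks a constraint or rule.
    class ConstraintViolationException extends Exception {
        ConstraintViolationException(String message) {
            super(message);
        }
    }

    // Stand-in for a unit of work on the receiving side.
    interface UnitOfWork {
        void apply(Change change) throws ConstraintViolationException;
        void complete() throws ConstraintViolationException;
        void discard();
    }

    class Replicator {
        // Re-execute the changes as if a client made them, so
        // constraints and business rules run again on this side.
        void replay(ChangeSet changeSet, UnitOfWork uow)
                throws ConstraintViolationException {
            try {
                for (Change change : changeSet.changes()) {
                    uow.apply(change);
                }
                uow.complete(); // commit only if everything holds
            } catch (ConstraintViolationException e) {
                uow.discard();  // conflict: must be resolved first
                throw e;
            }
        }
    }

Every interested party in a cluster would run the same replay, which
is what makes replication "send the changes around and run them again".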
There's a whole host of resulting things to think about here, such as
"objects vs graphs" (cf. "files vs content" in Linus' talk), where we
don't really care about the version of a single object, but rather the
hash of a whole graph, i.e. an "aggregate" in DDD lingo (here's another
reason why having explicit support for DDD things like Aggregates is
super important, I think). I would like to explore where all of this
could take us, but it's too long for a single email, so I'll stop here,
after one quick sketch of aggregate hashing below, and see what
associations and reactions you guys have to this stuff.
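For the hashing idea, a minimal sketch (again with invented names, just
illustrating the Git-style approach): hash the serialized state of the
root entity together with the state of everything inside the aggregate,
so a change anywhere in the graph gives the whole aggregate a new hash,
exactly like a Git tree:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.List;

    class AggregateHasher {
        // Hash the whole aggregate: root state plus the state of
        // every contained entity, in a stable order.
        String hash(String rootState, List<String> containedStates) {
            try {
                MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
                sha1.update(rootState.getBytes(StandardCharsets.UTF_8));
                for (String state : containedStates) {
                    sha1.update(state.getBytes(StandardCharsets.UTF_8));
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : sha1.digest()) {
                    hex.append(String.format("%02x", b));
                }
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                // SHA-1 ships with every JVM, so this can't happen
                throw new IllegalStateException(e);
            }
        }
    }

Two aggregates with the same hash are the same version, so comparing
root hashes would be enough to tell whether two offices have diverged.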
regards,
Rickard