After listening to the FeatherCast episode about Hadoop I spent a
couple of hours reading about it, filling in a gap in my knowledge of
the Map-Reduce distributed processing pattern.
It might be interesting to hook up a distributed database like HBase,
but whether it would do better than a good traditional database
in clustered mode is a tough question. In fact, for operational/
transactional systems, where quick response matters more than handling
lots of data, my guess is that in most cases it would be significantly
less efficient.
On the other hand, for data warehousing and running reports, something
like Hadoop combined with HBase would probably significantly improve
performance, and even allow huge reports that would normally take
many minutes to finish in a few seconds.
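To make the reporting case a bit more concrete, here is a minimal
sketch of the Map-Reduce pattern in plain Java. It runs in a single JVM
and uses no Hadoop or HBase API at all. The OrderLine class, its fields,
and the per-product totals report are made up for illustration; a real
Hadoop job would spread the map and reduce steps across many machines.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ReportSketch {
    // Hypothetical record of one order line: a product id and an amount in cents.
    static final class OrderLine {
        final String productId;
        final long amountCents;

        OrderLine(String productId, long amountCents) {
            this.productId = productId;
            this.amountCents = amountCents;
        }
    }

    // Map step: turn one input record into a (key, value) pair.
    static Map.Entry<String, Long> mapLine(OrderLine line) {
        return new AbstractMap.SimpleEntry<>(line.productId, line.amountCents);
    }

    // Reduce step: collapse all values that share a key into one result.
    static long reduceTotals(String productId, List<Long> amounts) {
        long total = 0;
        for (long amount : amounts) {
            total += amount;
        }
        return total;
    }

    static Map<String, Long> run(List<OrderLine> lines) {
        // Map each record independently (in parallel here), then group the
        // emitted pairs by key, which stands in for the shuffle phase.
        Map<String, List<Long>> grouped = lines.parallelStream()
            .map(ReportSketch::mapLine)
            .collect(Collectors.groupingBy(
                entry -> entry.getKey(),
                TreeMap::new,
                Collectors.mapping(entry -> entry.getValue(), Collectors.toList())));

        // Reduce each key's values to a single total.
        Map<String, Long> totals = new TreeMap<>();
        grouped.forEach((productId, amounts) ->
            totals.put(productId, reduceTotals(productId, amounts)));
        return totals;
    }

    public static void main(String[] args) {
        List<OrderLine> lines = Arrays.asList(
            new OrderLine("WIDGET", 1000L),
            new OrderLine("GADGET", 250L),
            new OrderLine("WIDGET", 500L));
        System.out.println(run(lines));
    }
}
```

The point is only that each record is mapped on its own and each key is
reduced on its own, which is what lets a big report be split across
many workers.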
From the execution perspective the Map-Reduce stuff is possibly even
somewhat compatible with the way the Service Engine does things.
That's kind of neither here nor there though, because the OFBiz
infrastructure is all about handling a large number of small requests
doing mostly very small and simple operations. That is very
concurrency friendly by nature, and in fact has to be concurrent.
In other words there isn't a reason to introduce algorithm
changes to make things easier to parallelize; they are mostly that way
already in OFBiz.
For the data warehousing stuff (which we're just getting into; Jacopo
has been driving most of that to date), it would be really cool to
support HBase with the entity engine. That would be really easy if it
has a JDBC driver, and if not it may still not be too hard as long
as it organizes data in a relational way and supports general
relational operations such as SQL-ish queries.
-David
On Apr 30, 2008, at 4:42 PM, Calum Miller wrote:
Admittedly this is a bit off the wall but any reason why the
EntityEngine could not have a Cloud interface to something like
Hadoop/HBase?
Strikes me that the simple EntityEngine map view of the world is not
too far from the MapReduce view. It would be very cool to dump an
existing OFBiz database running on Oracle/Derby then re-import into
a cloud model to support massive data sets and parallel processing
of queries.
Calum