After listening to the FeatherCast episode about Hadoop I spent a couple of hours reading about it and filling in the gaps in my knowledge of the Map-Reduce distributed processing pattern.

It might be interesting to hook up a distributed database like HBase, but whether it would do better than a good traditional database in clustered mode is a tough question. In fact, for operational/transactional systems, where quick response matters more than handling lots of data, my guess is that in most cases it would be significantly less efficient.

On the other hand, for data warehousing and running reports, something like Hadoop combined with HBase would probably improve performance significantly, and might even allow huge reports that would normally take many minutes to run in a few seconds.
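
As a rough illustration of why report-style aggregation fits the pattern, here is a minimal sketch in plain Java with no Hadoop dependency. The SaleLine class, its fields, and the method names are made up for the example; they are not OFBiz entities or Hadoop APIs. The map step emits one (key, value) pair per row, the pairs are grouped by key, and the reduce step combines the values for each key. Because each row is mapped independently and each key is reduced independently, the work can in principle be spread across many machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MapReduceSketch {

    /** Hypothetical sales row, invented for this example. */
    static final class SaleLine {
        final String productId;
        final long amountCents;

        SaleLine(String productId, long amountCents) {
            this.productId = productId;
            this.amountCents = amountCents;
        }
    }

    /** Map phase: emit one (productId, amount) pair per input row. */
    static Map.Entry<String, Long> map(SaleLine line) {
        return new SimpleEntry<>(line.productId, line.amountCents);
    }

    /** Reduce phase: combine all values emitted for a single key. */
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map, groups the emitted pairs by key, then reduces each group. */
    static Map<String, Long> totalsByProduct(List<SaleLine> lines) {
        // Rows are mapped in parallel; grouping by key stands in for the
        // "shuffle" step that a real cluster would do across machines.
        Map<String, List<Long>> grouped = lines.parallelStream()
                .map(MapReduceSketch::map)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Each key is reduced independently of the others.
        Map<String, Long> totals = new TreeMap<>();
        grouped.forEach((productId, amounts) -> totals.put(productId, reduce(amounts)));
        return totals;
    }

    public static void main(String[] args) {
        List<SaleLine> lines = Arrays.asList(
                new SaleLine("PROD-A", 1500),
                new SaleLine("PROD-B", 1200),
                new SaleLine("PROD-A", 2000));
        System.out.println(totalsByProduct(lines));
    }
}
```

On a single machine this is no faster than a SQL GROUP BY; the point is only that the two steps have no shared state, which is what lets a framework like Hadoop distribute them.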

From the execution perspective the Map-Reduce approach is possibly even somewhat compatible with the way the Service Engine does things. That's kind of neither here nor there though, because the OFBiz infrastructure is all about handling a large number of small requests, each doing mostly very small and simple operations. That is very concurrency friendly by nature, and in fact needs to be concurrent. In other words, there isn't a reason to introduce algorithm changes to make things easier to parallelize; they are mostly that way already in OFBiz.

For the data warehousing stuff (which we're just getting into; Jacopo has mostly been driving that to date), it would be really cool to support HBase with the Entity Engine. That would be really easy if it has a JDBC driver, and if not it may still not be too hard as long as it organizes data in a relational way and supports general relational operations such as SQL-ish queries.

-David


On Apr 30, 2008, at 4:42 PM, Calum Miller wrote:
Admittedly this is a bit off the wall but any reason why the EntityEngine could not have a Cloud interface to something like Hadoop/HBase?

Strikes me that the simple EntityEngine map view of the world is not too far from the MapReduce view. It would be very cool to dump an existing OFBiz database running on Oracle/Derby then re-import into a cloud model to support massive data sets and parallel processing of queries.

Calum

