On 15/10/11 23:34, Christian Schäfer wrote:
But nevertheless I will try on using data nucleus' jpa for hbase and make some benchmarks to compare it with the hbase native interface;-)
Hi there,
If you do run such a study, it would be great to publish the results (here?)!
What about proposing a simple application that everyone who has created an ORM (Datanucleus, Kundera, ...) could implement and submit (to you?) for a benchmark?
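To make the proposal a bit more concrete, here is a minimal sketch of what such a shared benchmark could look like. Everything in it is an assumption made for illustration: the `PersistenceAdapter` interface, the `InMemoryAdapter` stand-in and the `OrmBench.run` harness are hypothetical names, not part of any existing ORM or of the HBase API. Each ORM author would supply one adapter, and the same harness would time all of them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract that each ORM author would implement for the benchmark.
interface PersistenceAdapter {
    void save(String id, String payload);
    String load(String id);
}

// In-memory stand-in so the harness runs without a cluster; a real entry
// would delegate to an ORM or to the raw HBase client instead.
final class InMemoryAdapter implements PersistenceAdapter {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void save(String id, String payload) {
        rows.put(id, payload);
    }

    @Override
    public String load(String id) {
        return rows.get(id);
    }
}

final class OrmBench {
    private OrmBench() {}

    // Writes `count` records, reads them back, checks each round trip,
    // and returns the elapsed time in nanoseconds.
    static long run(PersistenceAdapter adapter, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            adapter.save("row-" + i, "payload-" + i);
        }
        for (int i = 0; i < count; i++) {
            String read = adapter.load("row-" + i);
            if (!("payload-" + i).equals(read)) {
                throw new IllegalStateException("round trip failed for row-" + i);
            }
        }
        return System.nanoTime() - start;
    }
}
```

Calling `OrmBench.run(new InMemoryAdapter(), 1000)` gives a baseline; a serious comparison would also need warm-up runs, realistic data sizes and scans, which this sketch deliberately leaves out.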
I'm one of those guys. We created n-orm (http://code.google.com/p/n-orm/) to separate responsibilities in our team (functional vs. non-functional) and to centralize data management (improving separation of concerns, and thus maintainability), while still understanding what really happens under the hood (and still being able to change platform in case of problems...). Actually, our ORM treats POJOs as a kind of schema for the store (query-driven), so the philosophy is to use Java objects while keeping in mind how HBase should be used, in the hope of not losing too much of what HBase makes possible.
I agree with Michel that the HBase API is easy, but when it comes to the details, it's really hard to think of everything, especially when it's interleaved with functional code (scan caching, inter-process schema management, compression, migration, error handling, new versions of the API, new possibilities... or just learning some important new thing that has to be integrated into the whole application!).
As our application becomes more and more complex, it's inconceivable for us to re-implement it using only the raw HBase API. As a consequence, though, I have no real idea of the price we pay in performance for that help in development...
Another ORM that deserves attention is https://github.com/ghelmling/meetup.beeno, which is built on the same philosophy. We didn't choose it because it's too tightly coupled to HBase, but I guess it must perform really well for that very reason.
I think the real danger of ORMs is designing your schema in a domain-driven (classical) fashion instead of a query-driven one. This danger may well be smaller when you use the raw API.
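As an illustration of what "query-driven" can mean in practice, here is a small Java sketch. It is not n-orm code: the class and method names are hypothetical, and a `TreeMap` only stands in for HBase's lexicographically sorted row keys. The row key is shaped by the query to be answered ("latest events of a user") and not by the domain model: the user id comes first so that a prefix scan finds the user, followed by an inverted timestamp so that newer events sort first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;

final class QueryDrivenKeys {
    private QueryDrivenKeys() {}

    // Builds "<userId>/<inverted timestamp>", zero-padded to 19 digits so that
    // lexicographic order matches numeric order (assumes a non-negative timestamp).
    static String eventKey(String userId, long timestampMillis) {
        long inverted = Long.MAX_VALUE - timestampMillis;
        return userId + "/" + String.format(Locale.ROOT, "%019d", inverted);
    }

    // Simulated prefix scan: returns up to `limit` values whose key starts
    // with "<userId>/", in key order (newest event first).
    static List<String> latestEvents(NavigableMap<String, String> table,
                                     String userId, int limit) {
        String prefix = userId + "/";
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : table.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix) || result.size() >= limit) {
                break;
            }
            result.add(entry.getValue());
        }
        return result;
    }
}
```

With this key layout, "the two latest events of alice" is a short sorted read that stops early. A domain-driven layout keyed by an event id would need a full scan or a secondary index to answer the same question.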
Cheers, Frédéric.
