Hi Stack, Hi everyone,

>> I do feel the HBase project would benefit from some example metrics
>> for various operations and hardware or else it will remain a difficult
>> technology for some people to get into with confidence. We'll blog
>> our findings, and hopefully it might be of benefit to other
>> leprechauns. If we can prove the concept, we're more likely to be
>> able to get $ to grow.
>
> Agree (except for the bit where you look like a leprechaun). Would be
> cool if folks published what stats they see doing various operations
> in hbase on a specific hardware. Previous I'd have thought the
> deploys, configs., etc., too various but I suppose you have to start
> somewhere.

I agree too. From my experience, there are a lot of small companies[4] which can't afford or don't need large clusters, and which lack the knowledge and resources to fully optimize a cluster. We're certainly one of those organizations. It's already a challenge for us to keep up with the rapid development of the projects we're using (Hadoop, HBase, Oozie, Hive, etc.). Even so, we're putting Hadoop and HBase to good use, and they're tremendously helpful.

As all our work is Open Source, we're in the very fortunate position of being able to point to all our configs[1], workflows[2], metrics (Ganglia is now up and public)[3], etc. and ask for recommendations based on them, but a lot of other companies don't enjoy that privilege. We're more than willing to provide information and even test out different configurations on our (admittedly small and aging) cluster, and we hope this will prove helpful for others as well.

It is worth noting that we do plan to buy new and better hardware, but we need to understand the technologies and their capabilities well enough to make informed choices before spending our entire yearly hardware budget. Understanding the behavior even on lower-quality hardware is therefore still important for us.

Thanks for all the past and (hopefully) future help. It's great to finally be able to work with HBase again.

Cheers,
Lars

PS: Tim and I work at the same organization.

[1] <http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet>
[2] <http://code.google.com/p/gbif-occurrencestore/source/browse/#svn%2Ftrunk%2Foozie-apps%2Frollover>
[3] <http://dev.gbif.org/ganglia/>
[4] See also the cluster sizes on <http://wiki.apache.org/hadoop/PoweredBy>
