I am very interested in this also. I posed this question somewhere a couple of years ago and never heard anything back. We decided to go with HBase to store a "working set" of the data -- data that we want to view with low latency and in a relatively random access pattern. Everything else is stored in HDFS for later processing.
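The split above relies on the working-set rows being scannable by sensor and time. A minimal sketch of that idea, assuming a composite row key of sensor ID plus zero-padded timestamp (the key format, function names, and sensor IDs here are illustrative, not from our actual schema):

```python
# Sketch of HBase-style composite row keys for a time-series "working set".
# All names and formats here are hypothetical.

def make_row_key(sensor_id: str, epoch_ms: int) -> str:
    # Zero-pad the timestamp so lexicographic order matches chronological
    # order; an HBase Scan over [start_key, stop_key) then returns one
    # sensor's readings in time order.
    return f"{sensor_id}#{epoch_ms:013d}"

def scan_range(rows: dict, start_key: str, stop_key: str) -> list:
    # Stand-in for an HBase Scan: iterate keys in sorted order and keep
    # every (key, value) pair with start_key <= key < stop_key.
    return [(k, rows[k]) for k in sorted(rows) if start_key <= k < stop_key]

if __name__ == "__main__":
    rows = {
        make_row_key("ecg-01", 1338300000000): 0.91,
        make_row_key("ecg-01", 1338300001000): 0.87,
        make_row_key("ecg-02", 1338300000500): 1.02,
    }
    start = make_row_key("ecg-01", 1338300000000)
    stop = make_row_key("ecg-01", 1338300002000)
    print(scan_range(rows, start, stop))  # both ecg-01 readings, in order
```

In real HBase the same range would be expressed with Scan.setStartRow/setStopRow; the point is only that key design, not a secondary index, is what makes the low-latency random reads work.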
We are working with medical/physiological sensor data.

-- Andrew Nguyen

On Tuesday, May 29, 2012 at 10:13 AM, Josh Patterson wrote:
> Unless you need low-latency access to all of this time series data, it
> might be a more cost-efficient path to store large archives of the
> data in plain HDFS.
>
> The scanning can be done more efficiently in a lot of cases with MapReduce +
> HDFS.
>
> Some links:
>
> OSCON Data presentation (good TVA story here):
> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
> http://www.slideshare.net/cloudera/hadoop-as-the-platform-for-the-smartgrid-at-tva
>
> Engineering literature:
> http://openpdc.codeplex.com/
>
> Josh
>
> On Thu, May 17, 2012 at 7:23 PM, Rita <rmorgan...@gmail.com> wrote:
> > Hello,
> >
> > Currently, I am using HBase to store sensor data -- basically large time series
> > data hitting close to 2 billion rows for one type of sensor. I was wondering
> > how HBase differs from the HDF5 (http://www.hdfgroup.org/HDF5/) file format.
> > Most of my operations are scanning a range and getting its values, but it
> > seems I can achieve this using HDF5. Does anyone have experience with this
> > file container format who can shed some light?
> >
> > --
> > --- Get your facts first, then you can distort them as you please. --
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
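For what it's worth, the range-scan-and-read operation Rita describes maps naturally onto a sorted array layout like an HDF5 dataset's: two binary searches give the index window, and the read is one contiguous slice. A minimal sketch of that access pattern (plain Python lists standing in for on-disk datasets; the data and function name are illustrative only):

```python
import bisect

# Sketch of a time-range scan over readings stored as parallel sorted
# arrays -- the shape an HDF5 dataset on disk would have. Hypothetical
# example, not HDF5 API code.

def scan_time_range(timestamps, values, t_start, t_stop):
    # Binary-search the half-open index window [lo, hi) covering
    # t_start <= t < t_stop, then take one contiguous slice of values.
    lo = bisect.bisect_left(timestamps, t_start)
    hi = bisect.bisect_left(timestamps, t_stop)
    return values[lo:hi]

if __name__ == "__main__":
    ts = [10, 20, 30, 40, 50]
    vs = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(scan_time_range(ts, vs, 20, 45))  # readings at t = 20, 30, 40
```

With real HDF5 (e.g. via h5py), the slice would be a dataset read, so only the requested range comes off disk -- which is why HDF5 can look competitive with HBase for pure range scans, while HBase keeps its advantage for random single-row access and online writes.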