I am very interested in this also.  I posed the question somewhere a couple 
of years ago and never heard anything back.  We decided to go with hbase to 
store a "working set" of the data - data that we want to view with low 
latency and in a relatively random-access pattern.  Everything else we store 
in HDFS for later processing. 

We are working with medical/physiological sensor data. 
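The row-key layout matters a lot for that working-set pattern, since HBase range scans follow lexicographic key order. Here is a minimal sketch of one common scheme (sensor id plus a zero-padded timestamp - the names and layout are hypothetical, not necessarily what we used), with a plain-Python stand-in for the scan:

```python
# Sketch of a time-series row-key scheme for an HBase-style "working set".
# Keys are sensor_id + "#" + zero-padded epoch millis, so lexicographic
# key order equals time order and a range scan covers one sensor's window.

def row_key(sensor_id: str, epoch_ms: int) -> str:
    # Zero-pad the timestamp to a fixed width so string order == numeric order.
    return f"{sensor_id}#{epoch_ms:013d}"

def scan(sorted_keys, start_key, stop_key):
    # Mimics an HBase scan: return keys in [start_key, stop_key).
    return [k for k in sorted_keys if start_key <= k < stop_key]

keys = sorted(row_key("ecg01", t) for t in (1_000, 2_000, 3_000, 999_999_999))
window = scan(keys, row_key("ecg01", 1_500), row_key("ecg01", 3_500))
print(window)  # the readings at 2000 ms and 3000 ms
```

The zero-padding is the whole trick: without a fixed width, "100" would sort after "20" and time-range scans would return the wrong rows.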

-- 
Andrew Nguyen



On Tuesday, May 29, 2012 at 10:13 AM, Josh Patterson wrote:

> unless you need low latency access to all of this time series, it
> might be a more cost efficient path to store large archives of the
> data in plain HDFS.
> 
> The scanning can be done more efficiently in a lot of cases in MapReduce + 
> HDFS.
> 
> Some links:
> 
> OSCON-data presentation (good TVA story here):
> 
> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
> 
> http://www.slideshare.net/cloudera/hadoop-as-the-platform-for-the-smartgrid-at-tva
> 
> 
> Engineering Literature:
> 
> http://openpdc.codeplex.com/
> 
> Josh
> 
> On Thu, May 17, 2012 at 7:23 PM, Rita <rmorgan...@gmail.com 
> (mailto:rmorgan...@gmail.com)> wrote:
> > Hello,
> > 
> > Currently, I am using hbase to store sensor data -- basically large time
> > series data hitting close to 2 billion rows for one type of sensor. I was
> > wondering how hbase differs from the HDF (http://www.hdfgroup.org/HDF5/)
> > file format. Most of my operations scan a range and read its values, and it
> > seems I could achieve this using HDF. Does anyone have experience with this
> > file container format who can shed some light?
> > 
> > 
> > 
> > 
> > --
> > --- Get your facts first, then you can distort them as you please.--
> > 
> 
> 
> 
> 
> -- 
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
> 
> 
