Re: HFileInputFormat for MapReduce

Tim Robertson Thu, 09 Feb 2012 15:01:25 -0800

Hey Stack,

We see a scan through HBase running 10x slower than a
TextFileInputFormat job over the same data stored as CSV.  This is
what prompted me to look at MR using an HFileInputFormat, just out of
curiosity.


Cheers,
Tim



On Thu, Feb 9, 2012 at 7:32 PM, Stack <st...@duboce.net> wrote:
> On Thu, Feb 9, 2012 at 12:55 AM, Tim Robertson
> <timrobertson...@gmail.com> wrote:
>> From the limitations you mention, 1) and 2) we can live with, but 3)
>> could be why my quick tests are already giving incorrect record
>> counts.  That sounds like a show stopper straight away right?
>>
>
> So Tim, you are going against the hfiles directly and not via the
> HBase API?  If so, you'll need to do a merge read of the multiple
> hfiles like hbase does (as per Amandeep).  You need this facility?
>
>> One option for us would be HBase for the primary store for random
>> access, and periodic (e.g. 12 hourly) exports to HDFS for all the full
>> scanning.  Would you consider that sane?
>>
>
> You are not getting good scan performance from hbase Tim?
> St.Ack
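
For anyone following the thread, here is a rough sketch of what the
"merge read" mentioned above involves, and why counting records per
hfile can give the wrong total. It is not HBase code: each hfile is
modelled as a hypothetical sorted list of (row_key, timestamp, value)
tuples, a value of None stands in for a delete marker, and the name
merge_read is made up for illustration. Real hfiles also carry column
families, qualifiers, typed delete markers and version limits, all of
which are ignored here.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_read(*sorted_files):
    """Yield (row_key, value) for the newest live cell of each row.

    Each input is a hypothetical stand-in for one hfile: a list of
    (row_key, timestamp, value) tuples sorted by row_key ascending and
    timestamp descending. value=None is an assumed delete marker.
    """
    # k-way merge of the already-sorted inputs; within a row the newest
    # timestamp sorts first.
    merged = heapq.merge(*sorted_files, key=lambda cell: (cell[0], -cell[1]))
    for row_key, cells in groupby(merged, key=itemgetter(0)):
        _, _, value = next(cells)  # newest version of this row wins
        if value is not None:      # skip rows whose newest cell is a delete
            yield row_key, value


# Two made-up "hfiles": the newer one overwrites row2 and deletes row3.
older = [("row1", 1, "a"), ("row2", 1, "b"), ("row3", 1, "c")]
newer = [("row2", 2, "b2"), ("row3", 2, None)]

naive_count = len(older) + len(newer)         # counts every cell in every file
merged_rows = list(merge_read(older, newer))  # one entry per live row

print(naive_count, len(merged_rows))
```

Reading each file on its own counts the overwritten and deleted cells
as records, which is one way the totals can drift from what a scan
through the HBase API reports.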
