Is Hive involved? Or is it just a raw scan compared to TFIF (TextFileInputFormat)?
No Hive.
Is this an MR scan or just a serial scan from the shell (or is it still PE)?
We are using PE scan to try and standardize as much as possible.
You only want to speed up this scan? You are not interested in figuring out how to
On Fri, Feb 10, 2012 at 3:21 AM, Tim Robertson timrobertson...@gmail.com wrote:
We are using PE scan to try and standardize as much as possible.
Fair enough.
Since CDH3u3 is ongoing as I type, I'm not sure about the regions (50
regions on 3 RSs with the PE TestTable).
Why are you not sure?
Hi all,
Can anyone elaborate on the pitfalls or implications of running
MapReduce using an HFileInputFormat extending FileInputFormat?
I'm sure scanning goes through the RS for good reasons (I'm guessing
handling splits, locking, RS monitoring, etc.), but can it ever be safe
to run MR over HFiles directly? E.g. for scenarios like a region
split, would the MR
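One way a file-level read can drift from a RegionServer scan: a scan merges the memstore and all store files of a region, keeps only the newest versions and applies delete markers, while a FileInputFormat that hands each HFile to its own mapper does none of that. The Java sketch below is a deliberately simplified toy model, not HBase code; the `Cell` class, the two "flush" files and their timestamps are all hypothetical, and unflushed memstore data is not modelled at all.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model, NOT HBase code: shows how counting rows file-by-file can disagree
 * with a merged, RegionServer-style view. Cells, files and timestamps are
 * hypothetical; memstore (unflushed) data is not modelled.
 */
public class NaiveHFileCount {

    /** Simplified cell: row key, timestamp, and whether it is a whole-row delete marker. */
    static final class Cell {
        final String row;
        final long ts;
        final boolean deleteMarker;

        Cell(String row, long ts, boolean deleteMarker) {
            this.row = row;
            this.ts = ts;
            this.deleteMarker = deleteMarker;
        }
    }

    /** Like one mapper per file: each file counts its own rows; delete markers are ignored. */
    static int naivePerFileCount(List<List<Cell>> files) {
        int total = 0;
        for (List<Cell> file : files) {
            Set<String> rows = new HashSet<>();
            for (Cell c : file) {
                if (!c.deleteMarker) {
                    rows.add(c.row);
                }
            }
            total += rows.size();
        }
        return total;
    }

    /** Merged view: a row is live if its newest put is newer than its newest delete marker. */
    static int mergedCount(List<List<Cell>> files) {
        Map<String, Long> newestPut = new HashMap<>();
        Map<String, Long> newestDelete = new HashMap<>();
        for (List<Cell> file : files) {
            for (Cell c : file) {
                Map<String, Long> target = c.deleteMarker ? newestDelete : newestPut;
                Long prev = target.get(c.row);
                if (prev == null || c.ts > prev) {
                    target.put(c.row, c.ts);
                }
            }
        }
        int live = 0;
        for (Map.Entry<String, Long> e : newestPut.entrySet()) {
            Long del = newestDelete.get(e.getKey());
            if (del == null || e.getValue() > del) {
                live++;
            }
        }
        return live;
    }

    /** Two hypothetical flushes: r1 is updated and r3 is deleted in the newer one. */
    static List<List<Cell>> exampleFiles() {
        List<Cell> olderFlush = Arrays.asList(
                new Cell("r1", 1, false), new Cell("r2", 1, false), new Cell("r3", 1, false));
        List<Cell> newerFlush = Arrays.asList(
                new Cell("r1", 2, false), new Cell("r3", 3, true));
        return Arrays.asList(olderFlush, newerFlush);
    }

    public static void main(String[] args) {
        List<List<Cell>> files = exampleFiles();
        // prints "naive per-file count: 4, merged count: 2"
        System.out.println("naive per-file count: " + naivePerFileCount(files)
                + ", merged count: " + mergedCount(files));
    }
}
```

In this toy model the naive count double-counts the updated row r1 and still counts the deleted row r3; the two counts would only agree if the files had first been merged into one with deletes applied, which is roughly what a major compaction does.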
Amandeep
On Feb 9, 2012, at 12:19 AM, Tim Robertson timrobertson...@gmail.com wrote:
On Thu, Feb 9, 2012 at 12:55 AM, Tim Robertson timrobertson...@gmail.com wrote:
Of the limitations you mention, we can live with 1) and 2), but 3)
could be why my quick tests are already giving incorrect record
counts. That sounds like a showstopper straight away, right?
So Tim, you are
Hey Stack,
We see a scan being 10x slower than TextFileInputFormat over the
same data stored as CSV. This is what prompted me to look at MR
using an HFIF (HFileInputFormat), just out of curiosity.
Cheers,
Tim
On Thu, Feb 9, 2012 at 7:32 PM, Stack st...@duboce.net wrote:
One option for us would be HBase as the primary store for random
access, and periodic (e.g. 12
I also encountered this issue when comparing Hive+HBase with
Hive+HDFS (native Hive tables). After some tuning (ensuring data locality,
using scan caching, an appropriate number of mappers per node, etc.),
Hive+HBase is around 4-5x slower.
I guess the two main reasons are:
1) HFile repeats keys for each K/V
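To put rough numbers on point 1): in the HBase KeyValue layout every cell carries its own key length, value length, row length, row key, family length, family, qualifier, timestamp and key type, so a wide record repeats its row key and family once per column. The back-of-the-envelope sketch below assumes 20 fixed bytes per cell (4 + 4 + 2 + 1 + 8 + 1) and ignores HFile block headers, block indexes, compression and any per-cell MVCC/memstore timestamp; the row key, family and column names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Rough size model of one logical record stored as HBase-style KeyValues
 * versus as one CSV line. Ignores block headers, indexes, compression and
 * any per-cell MVCC/memstore timestamp (assumption, see lead-in).
 */
public class KeyValueOverhead {

    // Fixed bytes per KeyValue: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_BYTES_PER_CELL = 4 + 4 + 2 + 1 + 8 + 1;

    static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes for a single cell: row key and family are repeated in every cell. */
    static int cellSize(String row, String family, String qualifier, String value) {
        return FIXED_BYTES_PER_CELL + len(row) + len(family) + len(qualifier) + len(value);
    }

    /** Bytes for one record stored as one KeyValue per column. */
    static int hfileRecordSize(String row, String family, Map<String, String> columns) {
        int total = 0;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            total += cellSize(row, family, e.getKey(), e.getValue());
        }
        return total;
    }

    /** Bytes for the same record as a CSV line: row key, comma-separated values, newline. */
    static int csvRecordSize(String row, Map<String, String> columns) {
        int total = len(row);
        for (String v : columns.values()) {
            total += 1 + len(v); // leading comma + value
        }
        return total + 1; // trailing newline
    }

    public static void main(String[] args) {
        // Hypothetical record with three columns in family "o".
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("scientificName", "Puma concolor");
        cols.put("latitude", "55.67");
        cols.put("longitude", "12.56");
        String row = "occ-0000000001";
        // prints "HFile cells: 159 bytes, CSV line: 41 bytes"
        System.out.println("HFile cells: " + hfileRecordSize(row, "o", cols)
                + " bytes, CSV line: " + csvRecordSize(row, cols) + " bytes");
    }
}
```

Under these assumptions the same three-column record is almost 4x larger as cells, partly because the CSV line never stores column names, so a scan simply has more bytes to read and decode per record.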
On Thu, Feb 9, 2012 at 3:00 PM, Tim Robertson timrobertson...@gmail.com wrote: