Re: Custom partitioning and order for optimum hbase store

Alan Gates Mon, 24 Jan 2011 13:29:10 -0800

Do you want to order the groups or just within the groups? If you want to order within the groups you can do that in Pig in a single job.


Alan.


On Jan 24, 2011, at 1:20 PM, Dmitriy Lyubimov wrote:

Thanks.

So I take it there's no way in Pig to specify a custom partitioner and the
ordering in one MR step?
I don't think prebuilding HFiles is the best strategy in my case, since my job
is incremental (i.e. I am not replacing 100% of the data). However, it is
big enough that I don't want to create random writes.
Using a custom partitioner in the GROUP statement along with PARALLEL, and
somehow specifying ordering as well, would probably be ideal.
I wonder if a sequential spec of GROUP and ORDER BY could translate into a
single MR job? I guess not, would it?



-d
On Mon, Jan 24, 2011 at 1:12 PM, Dmitriy Ryaboy <[email protected]> wrote:
Pushing this logic into the storefunc would force an MR boundary before the
store (unless the StoreFunc passed, I suppose) which can make things overly
complex.
I think for the purposes of bulk-loading into HBase, a better approach might
be to use the native map-reduce functionality and feed results you want to
store into a map-reduce job created as per
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
(the bulk loading section).

D

On Mon, Jan 24, 2011 at 11:51 AM, Dmitriy Lyubimov <[email protected]> wrote:
Better yet, it would seem logical if partitioning and advice on
partition #s were somehow tailored to a storefunc. It would stand to reason
that for as long as we are not storing to HDFS, the store func is in the best
position to determine optimal save parameters such as order, partitioning
and parallelism.
On Mon, Jan 24, 2011 at 11:47 AM, Dmitriy Lyubimov <dlieu.[email protected]> wrote:
Hi,

so it seems to be more efficient, when storing to HBase, to partition by
regions and order by HBase keys.
I see that Pig 0.8 (PIG-282) added a custom partitioner in a group, but I am
not sure if order is enforced there.
Is there a way to run a single MR that orders and partitions data as per
above and uses an explicitly specified store func in reducers?

Thank you.
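
For illustration only, the layout asked about above (one partition per region, rows sorted by key inside each partition) can be sketched in plain Python. This is not Pig or HBase code: the region start keys and the names region_for and partition_and_sort are hypothetical, and a real job would read the region boundaries from the table rather than hard-code them.

```python
from bisect import bisect_right

# Hypothetical region start keys (assumption); the first region starts at the empty key.
REGION_START_KEYS = ["", "g", "n", "t"]


def region_for(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose key range contains row_key."""
    # bisect_right counts the start keys <= row_key; the last of those owns the row.
    return bisect_right(start_keys, row_key) - 1


def partition_and_sort(records, start_keys=REGION_START_KEYS):
    """Bucket (row_key, value) pairs by region, then sort each bucket by row key."""
    buckets = [[] for _ in start_keys]
    for row_key, value in records:
        buckets[region_for(row_key, start_keys)].append((row_key, value))
    # Each bucket stands in for one reducer's input, written in key order.
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


if __name__ == "__main__":
    rows = [("pear", 1), ("apple", 2), ("grape", 3), ("zebra", 4), ("fig", 5)]
    for region, bucket in enumerate(partition_and_sort(rows)):
        print(region, bucket)
```

In MR terms the first function plays the role of the partitioner and the per-bucket sort plays the role of the shuffle ordering, so each reducer would see one region's rows in ascending key order.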
