Hi Stack,
My problem is that I have a large number of smaller objects and a few
larger objects. My strategy is to store the smaller objects (size <
5MB) in hbase and the larger objects (size > 5MB) on hdfs. I also want
to run MapReduce tasks on those objects. Loan suggested that I put all
objects in a MapFile/SequenceFile on hdfs and insert into hbase a
reference to the object stored in the file. But then if I run a
mapreduce task over the hbase table, my mappers would be local with
respect to the object references and not to the actual dfs blocks
where the objects themselves reside.
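
In code, the scheme I have in mind looks roughly like this (the
'objects' table, the 'f' family with 'data'/'ref' qualifiers, and the
one-SequenceFile-per-batch layout are just placeholders I made up for
the sketch, not anything settled in this thread):

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ObjectStore {
  static final long CUTOFF = 5L * 1024 * 1024; // the 5MB split point
  static final byte[] FAMILY = Bytes.toBytes("f");

  public static void storeBatch(Configuration conf,
      Map<String, byte[]> objects) throws IOException {
    HTable table = new HTable(conf, "objects");
    FileSystem fs = FileSystem.get(conf);
    // Many large objects share one SequenceFile so hdfs is not
    // littered with lots of small files.
    Path file = new Path("/objects/batch-"
        + System.currentTimeMillis() + ".seq");
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, file, Text.class, BytesWritable.class);
    try {
      for (Map.Entry<String, byte[]> e : objects.entrySet()) {
        byte[] obj = e.getValue();
        Put put = new Put(Bytes.toBytes(e.getKey()));
        if (obj.length < CUTOFF) {
          // Small object: the bytes go straight into hbase.
          put.add(FAMILY, Bytes.toBytes("data"), obj);
        } else {
          // Large object: the bytes go to the SequenceFile, keyed by
          // the same id; hbase only gets the file path as a reference.
          writer.append(new Text(e.getKey()), new BytesWritable(obj));
          put.add(FAMILY, Bytes.toBytes("ref"),
              Bytes.toBytes(file.toString()));
        }
        table.put(put);
      }
    } finally {
      writer.close();
      table.close();
    }
  }
}

Reading a small object back is then just a Get; reading a large one is
a Get for the ref followed by a SequenceFile read, which is the extra
hop my locality question is about.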

- Rohit Kelkar

On Mon, Jan 30, 2012 at 11:12 AM, Stack <[email protected]> wrote:
> On Sun, Jan 29, 2012 at 8:36 PM, Rohit Kelkar <[email protected]> wrote:
>> Hi Loan, this seems interesting. But in your approach I have a
>> follow-up question - would I be able to take advantage of data locality
>> while running map-reduce tasks? My understanding is that the locality
>> would be with respect to the references to those objects and not the
>> actual objects themselves.
>>
>
> If you are mapreducing against hbase, locality should be respected:
> i.e. the <5mb objects should be on the local regionserver.  The bigger
> stuff may be local -- it would depend on how it was written.
>
> Not sure what you mean above when you talk of references vs actual objects.
>
> Yours,
> St.Ack
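
Re the locality point above, for anyone who finds this later: when
mapreducing against hbase, TableInputFormat creates one input split
per region, and each split reports the regionserver's host as its
location, so the framework tries to run each mapper next to its
region. That is where the locality for the <5MB objects comes from;
chasing a ref column out to a SequenceFile is a separate hdfs read
that may not be node-local. A minimal sketch of such a job (same
placeholder table/column names as in my sketch above):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ObjectScanJob {
  static class ObjectMapper
      extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value,
        Context context) throws IOException, InterruptedException {
      byte[] data = value.getValue(Bytes.toBytes("f"),
          Bytes.toBytes("data"));
      if (data != null) {
        // Small object: bytes came straight off the (likely local)
        // regionserver.
        context.getCounter("objects", "small").increment(1);
      } else {
        // Large object: the row holds only the SequenceFile path in
        // 'f:ref'; fetching the actual bytes is a separate hdfs read
        // that may not be node-local.
        context.getCounter("objects", "large").increment(1);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan objects");
    job.setJarByClass(ObjectScanJob.class);
    Scan scan = new Scan();
    scan.setCaching(500);          // fewer RPCs per mapper
    scan.setCacheBlocks(false);    // don't churn the block cache
    // One split per region; mappers are scheduled near their region.
    TableMapReduceUtil.initTableMapperJob("objects", scan,
        ObjectMapper.class, ImmutableBytesWritable.class, Result.class,
        job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}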
