Hi Stack, My problem is that I have a large number of smaller objects and a few larger objects. My strategy is to store the smaller objects (size < 5MB) in hbase and the larger objects (size > 5MB) on hdfs. I also want to run MapReduce tasks on those objects. Loan suggested that I put all objects in a MapFile/SequenceFile on hdfs and insert into hbase a reference to each object stored in the file. Now if I run a mapreduce task, my mappers would be local with respect to the object references and not the actual dfs blocks where the objects reside.
- Rohit Kelkar

On Mon, Jan 30, 2012 at 11:12 AM, Stack <[email protected]> wrote:
> On Sun, Jan 29, 2012 at 8:36 PM, Rohit Kelkar <[email protected]> wrote:
>> Hi Loan, this seems interesting. But in your approach I have a
>> follow-up question - would I be able to take advantage of data
>> locality while running map-reduce tasks? My understanding is that
>> the locality would be with respect to the references to those
>> objects and not the actual objects themselves.
>>
>
> If you are mapreducing against hbase, locality should be respected:
> i.e. the <5mb objects should be on the local regionserver. The bigger
> stuff may be local -- it would depend on how it was written.
>
> Not sure what you mean above when you talk of references vs actual objects.
>
> Yours,
> St.Ack
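For anyone finding this thread later, here is a minimal sketch of the scheme discussed above: small objects stored inline in hbase, large objects appended to a SequenceFile on hdfs with only a reference kept in hbase. The table name "objects", column family "o", qualifiers "data"/"ref", and the path /objects/blobs.seq are all made up for illustration; only the 5MB cutoff comes from the thread.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class HybridObjectStore {

    static final long THRESHOLD = 5L * 1024 * 1024;   // the 5MB cutoff from the thread
    static final byte[] FAMILY = Bytes.toBytes("o");  // hypothetical column family

    // Small objects go inline into hbase; large ones are appended to a shared
    // SequenceFile on hdfs, keyed by object id, and only the file path is
    // stored in hbase as the reference.
    static void store(HTable table, SequenceFile.Writer blobWriter, Path blobFile,
                      String id, byte[] data) throws IOException {
        Put put = new Put(Bytes.toBytes(id));
        if (data.length < THRESHOLD) {
            put.add(FAMILY, Bytes.toBytes("data"), data);
        } else {
            blobWriter.append(new Text(id), new BytesWritable(data));
            put.add(FAMILY, Bytes.toBytes("ref"), Bytes.toBytes(blobFile.toString()));
        }
        table.put(put);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "objects");      // hypothetical table name
        Path blobFile = new Path("/objects/blobs.seq");  // hypothetical path
        SequenceFile.Writer writer = SequenceFile.createWriter(
            FileSystem.get(conf), conf, blobFile, Text.class, BytesWritable.class);
        try {
            store(table, writer, blobFile, "obj-1", new byte[1024]);              // inline
            store(table, writer, blobFile, "obj-2", new byte[10 * 1024 * 1024]);  // reference
        } finally {
            writer.close();
            table.close();
        }
    }
}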

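And a sketch of mapreducing against that table, which is the case where Stack says locality is respected. TableMapReduceUtil creates one map task per region, so the inline (<5MB) cells are served by the mapper's local regionserver; the blobs behind a "ref" cell are ordinary hdfs reads that may or may not be node-local. Class and column names carry over from the sketch above and are equally hypothetical.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ObjectMapJob {

    static class ObjectMapper extends TableMapper<Text, NullWritable> {
        static final byte[] FAMILY = Bytes.toBytes("o");

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            byte[] inline = value.getValue(FAMILY, Bytes.toBytes("data"));
            if (inline != null) {
                // Small object: served by the local regionserver (data-local read).
                process(inline);
            } else {
                // Large object: only the reference is local; the blob itself is
                // a plain hdfs read that may or may not hit a local block.
                byte[] ref = value.getValue(FAMILY, Bytes.toBytes("ref"));
                // ... open the referenced SequenceFile and look up this row's key ...
            }
        }

        private void process(byte[] data) { /* application logic */ }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "object-scan");
        job.setJarByClass(ObjectMapJob.class);
        Scan scan = new Scan();
        scan.setCaching(100);        // fewer RPCs per mapper
        scan.setCacheBlocks(false);  // keep a full scan from churning the block cache
        TableMapReduceUtil.initTableMapperJob(
            "objects", scan, ObjectMapper.class,
            Text.class, NullWritable.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The split points come from the table's regions, so each map task is scheduled on or near the regionserver hosting its region; nothing similar is guaranteed for the blobs behind the "ref" cells, which matches Stack's "the bigger stuff may be local" caveat.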