Hi Ioan, this seems interesting, but I have a follow-up question about your approach: would I be able to take advantage of data locality while running map-reduce tasks? My understanding is that the locality would be with respect to the references to those objects and not the actual objects themselves.
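
Just to check that I am reading your suggestion correctly, this is roughly what I am picturing (untested sketch against the Hadoop 1.x / HBase client APIs; the class, table, and column names are made up, and the SequenceFile.Writer is assumed to be created once per data file and kept open for appends):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class ObjectStoreSketch {                        // name made up
    private static final byte[] FAMILY = Bytes.toBytes("ref");

    // Append one object to the shared data file and index its position in
    // HBase. 'writer' is assumed to have been created once with
    // SequenceFile.createWriter(fs, conf, objectsFile, Text.class, <valueClass>).
    public static void store(SequenceFile.Writer writer, HTable index,
                             Path objectsFile, String objectId, Writable obj)
            throws IOException {
        long offset = writer.getLength();               // byte position before the append
        writer.append(new Text(objectId), obj);         // object streams straight to HDFS

        Put put = new Put(Bytes.toBytes(objectId));     // only the reference goes to HBase
        put.add(FAMILY, Bytes.toBytes("file"), Bytes.toBytes(objectsFile.toString()));
        put.add(FAMILY, Bytes.toBytes("offset"), Bytes.toBytes(offset));
        index.put(put);
    }

    // Reading: look up file + offset in HBase (lookup not shown), then seek
    // and deserialize the record straight into 'value'.
    public static void load(Configuration conf, Path objectsFile, long offset,
                            Writable value) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, objectsFile, conf);
        reader.seek(offset);        // works when the offset is a record/sync boundary,
                                    // i.e. the file is not block-compressed
        Text key = new Text();
        reader.next(key, value);
        reader.close();
    }
}

If that is the shape of it, then the HBase cell only ever holds the file path and offset, so a map task reading from HBase would be local to the region holding the reference but not necessarily to the HDFS blocks holding the object itself - that is what prompted the question above.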
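
Also, on your point about toByteArray() keeping a second copy of the object in memory: for the small-vs-large decision I could measure the serialized size by streaming into a counting sink instead of materializing the bytes first. Again just a sketch with made-up names:

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.io.Writable;

public class SizeCheckSketch {
    // Counts bytes and discards them, so measuring the serialized size never
    // buffers the whole object.
    static class CountingStream extends OutputStream {
        long count = 0;
        @Override public void write(int b) { count++; }
        @Override public void write(byte[] b, int off, int len) { count += len; }
    }

    static final long THRESHOLD = 5L * 1024 * 1024;     // the 5 MB cut-off

    public static boolean fitsInHBase(Writable obj) throws IOException {
        CountingStream counter = new CountingStream();
        obj.write(new DataOutputStream(counter));        // serialize into the counter
        return counter.count < THRESHOLD;
    }
}

The downside is that a large object gets serialized twice (once to measure, once to store), but it avoids building a ~50 MB byte[] only to learn the object should have gone to HDFS; for the small case the HBase Put needs a byte[] anyway, so WritableUtils.toByteArray() there seems fine.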
- Rohit Kelkar

On Fri, Jan 27, 2012 at 4:21 PM, Ioan Eugen Stan <[email protected]> wrote:
> Hello Rohit,
>
> I would try to write most objects to a Hadoop SequenceFile or a MapFile and
> store the index/byte offset in HBase.
>
> When reading: open the file, seek() to the position, and start reading the
> key:value. I don't think that using toByteArray() is good because, I think,
> you are creating a copy of the object in memory. If it's big you will end up
> with two copies of it. Try to stream the object directly to disk.
>
> I don't know if 5 MB is a good threshold or not; I hope someone can shed some light.
>
> If the objects are changing: append to the SequenceFile and update the
> reference in HBase. From time to time, run a MR job that cleans the file.
>
> You can use ZooKeeper to coordinate writing to many SequenceFiles.
>
> If you go this way, please post your results.
>
> Cheers,
>
> On 27.01.2012 10:42, Rohit Kelkar wrote:
>
>> Hi,
>> I am using HBase to store Java objects. The objects implement the
>> Writable interface. The size of the objects to be stored in each row
>> ranges from a few KB to ~50 MB. The strategy that I am planning to use is:
>>
>> if object size < 5 MB
>>     store it in HBase
>> else
>>     store it on HDFS and insert its HDFS location in HBase
>>
>> While storing the objects I am using the
>> WritableUtils.toByteArray(myObject) method. Can I use
>> WritableUtils.toByteArray(myObject).length to determine whether the object
>> should go in HBase or HDFS? Is this an acceptable strategy? Is the 5 MB
>> limit a safe enough threshold?
>>
>> - Rohit Kelkar
>
> --
> Ioan Eugen Stan
> http://ieugen.blogspot.com
