Hello Rohit,

I would write the large objects to a Hadoop SequenceFile or a MapFile and store the index/byte offset in HBase.

When reading: open the file, seek() to the stored position, and read the key/value from there. I would avoid toByteArray() because it creates a full copy of the serialized object in memory; for a big object you end up holding two instances at once. Try to stream the object directly to disk instead.
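To make the idea concrete, here is a minimal sketch of the offset-index pattern using only java.io: a RandomAccessFile stands in for the SequenceFile and a HashMap stands in for the HBase table mapping row key to byte offset. The class name and record layout are my own invention for illustration, not a Hadoop API.

```java
import java.io.*;
import java.util.*;

/** Sketch only: RandomAccessFile plays the data file (the "SequenceFile"),
 *  a HashMap plays the HBase index of row key -> {offset, length}. */
public class OffsetStore implements Closeable {
    private final RandomAccessFile data;
    private final Map<String, long[]> index = new HashMap<>();

    public OffsetStore(File f) throws IOException {
        this.data = new RandomAccessFile(f, "rw");
    }

    // Append: remember the end-of-file offset, then stream the value to
    // disk in small chunks -- no full in-memory copy like toByteArray().
    public void put(String key, InputStream value) throws IOException {
        long offset = data.length();
        data.seek(offset);
        byte[] buf = new byte[8192];
        int n;
        while ((n = value.read(buf)) != -1) {
            data.write(buf, 0, n);
        }
        index.put(key, new long[] { offset, data.getFilePointer() - offset });
    }

    // Read: look up the offset in the "HBase" index, seek(), read the bytes.
    public byte[] get(String key) throws IOException {
        long[] loc = index.get(key);
        data.seek(loc[0]);
        byte[] out = new byte[(int) loc[1]];
        data.readFully(out);
        return out;
    }

    @Override public void close() throws IOException { data.close(); }
}
```

With the real thing you would keep the {file path, offset} pair in an HBase column instead of the HashMap, and the values would be your Writable objects serialized into the file.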

I don't know whether 5 MB is a good threshold or not; I hope someone else can shed some light on that.

If the objects change: append the new version to the SequenceFile and update the reference in HBase. From time to time, run a MapReduce job that compacts the file by dropping the stale versions.
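A compaction pass along those lines could look like the sketch below (plain java.io again; the index map stands in for the HBase references, and only entries present in it are considered live). Class and method names are hypothetical.

```java
import java.io.*;
import java.util.*;

/** Sketch: compact an append-only data file by copying only the records
 *  still referenced from the index (the live, latest versions) into a
 *  fresh file. Older, superseded versions are simply not copied. */
public class Compactor {
    /** liveIndex: key -> {offset, length} of the latest version.
     *  Returns the rewritten index for the new file. */
    public static Map<String, long[]> compact(File oldFile, File newFile,
            Map<String, long[]> liveIndex) throws IOException {
        Map<String, long[]> newIndex = new HashMap<>();
        try (RandomAccessFile in = new RandomAccessFile(oldFile, "r");
             RandomAccessFile out = new RandomAccessFile(newFile, "rw")) {
            for (Map.Entry<String, long[]> e : liveIndex.entrySet()) {
                long[] loc = e.getValue();
                byte[] buf = new byte[(int) loc[1]];
                in.seek(loc[0]);          // jump to the live record
                in.readFully(buf);
                long newOffset = out.getFilePointer();
                out.write(buf);           // append it to the compacted file
                newIndex.put(e.getKey(), new long[] { newOffset, loc[1] });
            }
        }
        return newIndex; // write these back to HBase, then swap the files
    }
}
```

In the MapReduce version, the mappers would read the old file, the job would emit only live records, and a final step would update the HBase references before deleting the old file.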

You can use ZooKeeper to coordinate writes across many SequenceFiles.

If you go this way, please post your results.

Cheers,

On 27.01.2012 10:42, Rohit Kelkar wrote:
Hi,
I am using hbase to store java objects. The objects implement the
Writable interface. The size of objects to be stored in each row
ranges from a few KB to ~50 MB. The strategy that I am planning to use
is
if object size < 5 MB
store it in hbase
else
store it on hdfs and insert its hdfs location in hbase

While storing the objects I am using the
WritableUtils.toByteArray(myObject) method. Can I use
WritableUtils.toByteArray(myObject).length to determine whether the
object should go in hbase or hdfs? Is this an acceptable strategy? Is
the 5 MB limit a safe enough threshold?

- Rohit Kelkar


--
Ioan Eugen Stan
http://ieugen.blogspot.com
