Hello Rohit,
I would try to write most objects to a Hadoop SequenceFile or a MapFile
and store the index/byte offset in HBase.
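If I recall correctly, SequenceFile.Writer exposes the current file position (getLength()) so you can record the offset before each append; but the idea itself needs nothing Hadoop-specific. A minimal sketch with plain java.io (file name and the length-prefixed record format are made up for illustration, standing in for a SequenceFile key/value entry):

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class OffsetWriter {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("records", ".dat");
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            // Remember the byte offset before each record; this offset is
            // what you would store in the HBase cell instead of the object.
            long offsetA = out.size();          // 0
            writeRecord(out, "objectA-payload");
            long offsetB = out.size();          // position of the second record
            writeRecord(out, "objectB-payload");
            System.out.println(offsetA + " " + offsetB);
        }
    }

    // Length-prefixed record, analogous to a SequenceFile entry.
    static void writeRecord(DataOutputStream out, String payload) throws IOException {
        byte[] bytes = payload.getBytes("UTF-8");
        out.writeInt(bytes.length);
        out.write(bytes);
    }
}
```

The program prints "0 19": the first record starts at byte 0, and the second starts after the 4-byte length prefix plus the 15-byte payload of the first.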
When reading: open the file, seek() to the position, and start reading
the key/value. I don't think using toByteArray() is a good idea because,
as far as I can tell, it creates a copy of the object in memory; if the
object is big you will end up with two instances of it. Try to stream
the object directly to disk instead.
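The read side of the same pattern, again sketched with plain java.io rather than the actual SequenceFile reader (record format and names are made up for illustration): seek straight to the stored offset, no scanning needed. Note that writing via obj.write(out) streams the object into the file, whereas WritableUtils.toByteArray(obj) first materializes the whole serialized form as a byte[] in memory.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class OffsetReader {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("records", ".dat");
        long offsetB;
        // Write two length-prefixed records, remembering the second offset.
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            byte[] a = "objectA".getBytes("UTF-8");
            out.writeInt(a.length);
            out.write(a);
            offsetB = out.size();               // offset you'd keep in HBase
            byte[] b = "objectB".getBytes("UTF-8");
            out.writeInt(b.length);
            out.write(b);
        }
        // Reading: seek directly to the stored offset and read one record.
        try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
            in.seek(offsetB);
            byte[] buf = new byte[in.readInt()];
            in.readFully(buf);
            System.out.println(new String(buf, "UTF-8")); // prints objectB
        }
    }
}
```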
I don't know whether 5 MB is a good threshold or not; I hope someone can
shed some light.
If the objects change: append the new version to the SequenceFile and
update the reference in HBase. From time to time, run a MapReduce job
that cleans up the file.
You can use ZooKeeper to coordinate writing to many SequenceFiles.
If you go this way, please post your results.
Cheers,
On 27.01.2012 10:42, Rohit Kelkar wrote:
Hi,
I am using HBase to store Java objects. The objects implement the
Writable interface. The size of the objects to be stored in each row
ranges from a few KB to ~50 MB. The strategy that I am planning to use
is:
if object size < 5 MB
    store it in HBase
else
    store it on HDFS and insert its HDFS location in HBase
While storing the objects I am using the
WritableUtils.toByteArray(myObject) method. Can I use
WritableUtils.toByteArray(myObject).length to decide whether the object
should go in HBase or HDFS? Is this an acceptable strategy? Is the 5 MB
limit a safe enough threshold?
- Rohit Kelkar
--
Ioan Eugen Stan
http://ieugen.blogspot.com