For large binary objects, you could consider Google Protocol Buffers. They are very compact for large lists of numbers and similar data, where Java serialization adds a lot of overhead (for example: a single BigInteger object of value 0 takes 50 bytes in serialized form).
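To see that overhead for yourself, something along these lines should do. It is only a sketch: the class and method names are made up for illustration, and it compares Java's built-in serialization against the raw two's-complement bytes of the value rather than against an actual Protocol Buffers message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.math.BigInteger;

// Hypothetical helper class, not part of any library.
class SerializationOverhead {

    /** Returns how many bytes Java's built-in serialization produces for the given object. */
    static int javaSerializedSize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        BigInteger zero = BigInteger.ZERO;
        // The value itself, as raw two's-complement bytes.
        int rawSize = zero.toByteArray().length;
        // The same value, wrapped in stream header, class descriptor and field data.
        int serializedSize = javaSerializedSize(zero);
        System.out.println("raw bytes:             " + rawSize);
        System.out.println("Java-serialized bytes: " + serializedSize);
    }
}
```

The difference between the two numbers is the per-object cost you pay for the class metadata that Java serialization writes into the stream.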
If you anticipate a volume of data that justifies the use of HBase and Hadoop, I would not want to fix any data corruption manually, so you should probably automate this with some kind of sanity checks. I don't think you have to worry about HBase / HDFS corrupting your data. It has proven to be very stable in that area.

Friso

On 7 dec 2010, at 00:45, Buttler, David wrote:

> A couple of thoughts here:
> 1) for some types of objects, you want your fields to be column qualifiers in HBase. So in effect, you are serializing to the hbase format
> 2) Some objects you might want to serialize with json -- it is a very lightweight serialization protocol -- and you can use Gson to do most of the work for you
> 3) some objects you might want to invent your own human-readable format for legacy or convenience reasons.
>
> I do all three in a single table and find it very flexible
>
> Dave
>
> -----Original Message-----
> From: Hiller, Dean (Contractor) [mailto:[email protected]]
> Sent: Monday, December 06, 2010 1:40 PM
> To: [email protected]
> Subject: serialized objects as strings or as object? & data corruption?
>
> Is there a good tool out there for serialization to hbase for a java entity? If I have an Account, and then have a List<Activities> in the account, I preferably want to serialize that as all strings so data corruption issues can be fixed easier independent of the objects.....or do I just create MapReduce short lived jobs that fix data corruption? How do people deal with data corruption and serializing objects to HBase storage?
>
> I also like the ability to query command line and actually be able to read the storage(but maybe I just build something that knows about my objects?)....how do people deal with this today? Just looking for thoughts on this subject.
>
> Thanks,
> Dean
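
As an illustration of Dave's first point (fields as column qualifiers, with values kept as readable strings), here is a minimal sketch in plain Java. Everything in it is an assumption made for the example: the Account and Activity classes, their fields, and the "activity.<index>.<field>" qualifier naming are hypothetical. It deliberately stops at building the qualifier/value pairs and leaves out the actual HBase write.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: flattens an entity into qualifier -> string value pairs.
class AccountRowSketch {

    // Made-up entity classes, modelled on the Account / List<Activities> example in the question.
    static final class Activity {
        final String type;
        final long amountCents;

        Activity(String type, long amountCents) {
            this.type = type;
            this.amountCents = amountCents;
        }
    }

    static final class Account {
        final String owner;
        final List<Activity> activities;

        Account(String owner, List<Activity> activities) {
            this.owner = owner;
            this.activities = activities;
        }
    }

    /** Builds one entry per field; each entry would become one cell in the account's row. */
    static Map<String, String> toColumns(Account account) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("owner", account.owner);
        for (int i = 0; i < account.activities.size(); i++) {
            Activity activity = account.activities.get(i);
            // Assumed naming scheme: the list index becomes part of the qualifier.
            String prefix = "activity." + i + ".";
            columns.put(prefix + "type", activity.type);
            columns.put(prefix + "amountCents", Long.toString(activity.amountCents));
        }
        return columns;
    }

    public static void main(String[] args) {
        Account account = new Account("dean", Arrays.asList(
                new Activity("deposit", 2500L),
                new Activity("withdrawal", 1000L)));
        // Print the pairs the way they would be read back from storage.
        toColumns(account).forEach((qualifier, value) ->
                System.out.println(qualifier + " = " + value));
    }
}
```

Because every value is a plain string, a single bad cell can be read and corrected from the command line without deserializing the whole object.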
