Yes, I have been wondering about that exact scenario of "rolling back" to earlier versions. I also wonder: if I set it to store the last 3 versions, do I triple my current 7 terabytes into 21 terabytes? I don't know yet whether that is :( or :). Any thoughts on versioning from experienced users? (I am completely new to this and am just putting a prototype together to bring a 12-hour job down to 1 hour or less.)

Thanks,
Dean
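A rough way to reason about the 7 TB vs. 21 TB question, assuming HBase's usual semantics: each put to a cell adds a new timestamped version, and the column family's maximum-versions setting only caps how many are retained. Setting it to 3 therefore reserves nothing up front. A cell holds extra versions only after it has actually been rewritten, so tripling is the worst case, reached only if every cell is written at least three times. The sketch below is a back-of-the-envelope model with hypothetical names, not an HBase API. It deliberately ignores per-cell key overhead, compression, HDFS replication (which multiplies raw disk usage on top of this if the 7 TB figure is pre-replication), and the fact that versions beyond the cap are physically removed only when compactions run.

```java
import java.util.List;

/** Back-of-the-envelope model (hypothetical names, not an HBase API): a version
 *  cap limits how many versions a cell may keep; it does not preallocate space. */
class VersionStorageEstimate {

    /** A slice of the table: size of one version of all its cells, in TB,
     *  and how many times each cell in the slice has been written. */
    record DataGroup(double terabytes, int writesPerCell) {}

    /** A cell written w times keeps min(w, maxVersions) versions. Ignores key
     *  overhead, compression, HDFS replication and compaction timing. */
    static double storedTerabytes(List<DataGroup> groups, int maxVersions) {
        double total = 0;
        for (DataGroup g : groups) {
            total += g.terabytes() * Math.min(g.writesPerCell(), maxVersions);
        }
        return total;
    }

    public static void main(String[] args) {
        int maxVersions = 3;
        // Every cell written once: old versions never accumulate.
        System.out.println(storedTerabytes(List.of(new DataGroup(7, 1)), maxVersions));  // 7.0
        // Every cell rewritten at least twice more: the worst case, 3x.
        System.out.println(storedTerabytes(List.of(new DataGroup(7, 5)), maxVersions));  // 21.0
        // 6 TB written once, 1 TB of "hot" cells rewritten often.
        System.out.println(storedTerabytes(
                List.of(new DataGroup(6, 1), new DataGroup(1, 10)), maxVersions));       // 9.0
    }
}
```

Under this model the answer depends more on the update pattern than on the cap: mostly write-once data stays close to 7 TB, and only frequently rewritten cells approach the 3x bound.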
-----Original Message-----
From: Buttler, David [mailto:[email protected]]
Sent: Tuesday, December 07, 2010 12:14 PM
To: [email protected]
Subject: RE: serialized objects as strings or as object? & data corruption?

If you are not doing any type of aggregation, then a reduce job adds unnecessary overhead. For your example I would definitely recommend a single map job that does a get/put operation pair.

Also, don't forget that HBase stores versions, so you may be able to simply delete a corrupted value.

Dave

-----Original Message-----
From: Hiller, Dean (Contractor) [mailto:[email protected]]
Sent: Tuesday, December 07, 2010 9:16 AM
To: [email protected]
Subject: RE: serialized objects as strings or as object? & data corruption?

Purely application bugs are what I am thinking about, along with the plan for fixing that data corruption when it happens (i.e., a bug is in prod for 1 day and I need to fix all the records it touched).

I really like that JSON approach. It sounds quite nice, and then I think a short-lived MapReduce job might fix the corruption. Actually, I wonder if I could just do a map without any reduce, pick the data out, and write it back with the corruption fixed?

Thanks,
Dean

-----Original Message-----
From: Jonathan Gray [mailto:[email protected]]
Sent: Monday, December 06, 2010 2:57 PM
To: [email protected]
Subject: RE: serialized objects as strings or as object? & data corruption?

Hey Dean,

Why are you so concerned about data corruption? Is your concern about application-level bugs causing corruption, or HBase/HDFS causing the corruption? HDFS provides checksumming, and if a replica of a block is found to be corrupt, it will be re-replicated from a correct replica.

As for a CLI, I imagine it wouldn't be too hard to extend the existing JRuby shell to suit your needs if you have experience with JRuby.
JG

> -----Original Message-----
> From: Hiller, Dean (Contractor) [mailto:[email protected]]
> Sent: Monday, December 06, 2010 1:40 PM
> To: [email protected]
> Subject: serialized objects as strings or as object? & data corruption?
>
> Is there a good tool out there for serializing a Java entity to HBase?
> If I have an Account that contains a List<Activities>, I would prefer
> to serialize it all as strings so that data corruption issues can be
> fixed more easily, independent of the objects... or do I just create
> short-lived MapReduce jobs that fix data corruption? How do people deal
> with data corruption and with serializing objects to HBase storage?
>
> I also like being able to query from the command line and actually
> read the storage (but maybe I just build something that knows about my
> objects?)... how do people deal with this today? Just looking for
> thoughts on this subject.
>
> Thanks,
>
> Dean
>
> This message and any attachments are intended only for the use of the
> addressee and may contain information that is privileged and confidential.
> If the reader of the message is not the intended recipient or an
> authorized representative of the intended recipient, you are hereby
> notified that any dissemination of this communication is strictly
> prohibited. If you have received this communication in error, please
> notify us immediately by e-mail and delete the message and any
> attachments from your system.
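Dean's original question, storing an Account and its List<Activities> as plain strings so that individual fields can be read and fixed from a shell, can be sketched as one HBase row per account with one string column per field. The layout below is an assumption for illustration, not a tool mentioned in this thread: the `Account`/`Activity` classes and the `actNNNN.field` qualifier scheme are hypothetical, and the row is modeled as an in-memory sorted map rather than written through the HBase client (a real job would still convert each string to bytes and write it with a `Put`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical entity classes and a hypothetical string-only column layout. */
class AccountCodec {

    record Activity(String type, String amount) {}

    record Account(String id, String owner, List<Activity> activities) {}

    /** Encodes an account as qualifier -> string value, one qualifier per field.
     *  The id is the row key, so it is not stored as a column. Activity indexes
     *  are zero-padded so qualifiers sort in list order, e.g. "act0003.type". */
    static SortedMap<String, String> encode(Account a) {
        SortedMap<String, String> cols = new TreeMap<>();
        cols.put("owner", a.owner());
        for (int i = 0; i < a.activities().size(); i++) {
            String prefix = String.format("act%04d.", i);
            cols.put(prefix + "type", a.activities().get(i).type());
            cols.put(prefix + "amount", a.activities().get(i).amount());
        }
        return cols;
    }

    /** Rebuilds the account from its row key and string columns. */
    static Account decode(String rowKey, SortedMap<String, String> cols) {
        List<Activity> acts = new ArrayList<>();
        for (int i = 0; cols.containsKey(String.format("act%04d.type", i)); i++) {
            String prefix = String.format("act%04d.", i);
            acts.add(new Activity(cols.get(prefix + "type"), cols.get(prefix + "amount")));
        }
        return new Account(rowKey, cols.get("owner"), acts);
    }

    public static void main(String[] args) {
        Account acct = new Account("acct-42", "Dean",
                List.of(new Activity("deposit", "100.00"), new Activity("withdrawal", "25.50")));
        SortedMap<String, String> row = encode(acct);
        // Every column is human-readable, which is the point of the string layout.
        row.forEach((q, v) -> System.out.println(q + " = " + v));
        System.out.println(decode("acct-42", row).equals(acct)); // true
    }
}
```

Zero-padding the activity index keeps the qualifiers in list order when sorted as strings (HBase sorts qualifiers by their bytes), and repairing one bad field means rewriting one small string cell instead of re-serializing the whole object. The `%04d` padding caps this particular sketch at 10,000 activities per account.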
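Dave's two suggestions in this thread, a map-only pass over the affected rows and using stored versions to undo a bad write, can be combined into the per-row logic below. This is an in-memory model, not the HBase API: `RollbackRepair`, `repairRow`, the toy bug-window timestamps, and the `fix` function are all hypothetical. In a real job the same logic would presumably live in the mapper of a MapReduce job configured with zero reduce tasks, reading each row and issuing either a delete of the newest version or a put of a repaired value. Rollback only works while the pre-bug value is still within the column family's version cap; with a cap of 3, a cell rewritten three or more times during the bug window has already lost it, and the model falls back to `fix`.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

/** In-memory stand-in for one column of a versioned table (hypothetical, not HBase):
 *  row key -> (timestamp -> value), newest timestamp first. */
class RollbackRepair {

    static final int MAX_VERSIONS = 3; // mirrors a column family's version cap

    final Map<String, TreeMap<Long, String>> rows = new HashMap<>();

    void put(String row, long ts, String value) {
        TreeMap<Long, String> versions =
                rows.computeIfAbsent(row, r -> new TreeMap<Long, String>(Comparator.reverseOrder()));
        versions.put(ts, value);
        while (versions.size() > MAX_VERSIONS) versions.pollLastEntry(); // drop oldest
    }

    String get(String row) {
        TreeMap<Long, String> v = rows.get(row);
        return v == null ? null : v.firstEntry().getValue(); // newest version wins
    }

    static boolean inWindow(long ts, long start, long end) {
        return ts >= start && ts < end;
    }

    /** Per-row "map" step; no reduce is needed because each row is fixed independently. */
    void repairRow(String row, long bugStart, long bugEnd, long now, UnaryOperator<String> fix) {
        TreeMap<Long, String> versions = rows.get(row);
        if (versions == null) return;
        // Roll back: delete versions the bug wrote while an older version remains.
        while (versions.size() > 1 && inWindow(versions.firstKey(), bugStart, bugEnd)) {
            versions.pollFirstEntry();
        }
        // No clean version left: write a repaired value as a new version.
        if (inWindow(versions.firstKey(), bugStart, bugEnd)) {
            put(row, now, fix.apply(versions.firstEntry().getValue()));
        }
    }

    public static void main(String[] args) {
        RollbackRepair table = new RollbackRepair();
        long bugStart = 1000, bugEnd = 2000;   // toy timestamps for the buggy release
        table.put("acct-1", 500, "good-1");
        table.put("acct-1", 1500, "BAD-1");    // overwritten by the buggy release
        table.put("acct-2", 1200, "BAD-2");    // created by the buggy release
        table.put("acct-3", 700, "good-3");    // never touched
        UnaryOperator<String> fix = v -> v.replace("BAD", "fixed");
        for (String row : List.of("acct-1", "acct-2", "acct-3")) {
            table.repairRow(row, bugStart, bugEnd, 3000, fix);
        }
        System.out.println(table.get("acct-1")); // good-1
        System.out.println(table.get("acct-2")); // fixed-2
        System.out.println(table.get("acct-3")); // good-3
    }
}
```

Restricting the pass to rows written during the bug window keeps the job proportional to the damage rather than to the whole table.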
