Yes, I have been wondering about exactly that "rollback" scenario using
versions. I also wonder: if I set it to store the last 3 versions, do I
triple my 7 terabytes into 21 terabytes? As it stands, I don't know yet
whether that is :( or :).  Any thoughts on versioning from experienced
users?  (I am completely new to this and am just putting a prototype
together to bring a 12-hour job down to 1 hour or less.)

Thanks,
Dean
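
On the 7 TB vs. 21 TB question: the version limit is an upper bound per
cell, and a cell only holds several versions if it has actually been
written several times, so tripling is the worst case rather than the
expected case. A rough sketch of that arithmetic follows. The class and
method names are hypothetical, the per-cell write counts are made-up
sample data, and it assumes 7 TB is the size with exactly one version per
cell, ignoring compression, key overhead, and compaction timing.

```java
public class VersionStorageEstimate {

    /**
     * Average number of versions retained per cell, given how many times
     * each sampled cell was written and the configured version limit.
     */
    public static double retainedMultiplier(int[] writesPerCell, int maxVersions) {
        if (writesPerCell.length == 0) {
            throw new IllegalArgumentException("need at least one sampled cell");
        }
        long retained = 0;
        for (int writes : writesPerCell) {
            // A cell never keeps more versions than it was written,
            // and never more than the limit.
            retained += Math.min(writes, maxVersions);
        }
        return (double) retained / writesPerCell.length;
    }

    /** Estimated size when the one-version size is baseTerabytes. */
    public static double estimateTerabytes(double baseTerabytes, int[] writesPerCell, int maxVersions) {
        return baseTerabytes * retainedMultiplier(writesPerCell, maxVersions);
    }

    public static void main(String[] args) {
        // Hypothetical sample: three cells written once, one cell written five times.
        int[] sample = {1, 1, 1, 5};
        System.out.println(estimateTerabytes(7.0, sample, 3)); // prints 10.5
    }
}
```

Only when every cell has been rewritten at least 3 times does the
estimate reach the full 21 TB.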

-----Original Message-----
From: Buttler, David [mailto:[email protected]] 
Sent: Tuesday, December 07, 2010 12:14 PM
To: [email protected]
Subject: RE: serialized objects as strings or as object? & data corruption?

If you are not doing any type of aggregation, then a reduce job adds
unnecessary overhead.  For your example I would definitely recommend a
single map job that does a get/put operation pair.

Also, don't forget that HBase stores versions, so you may be able to
simply delete a corrupted value and fall back to the previous version.

Dave
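
To make the "delete a corrupted value" idea concrete, here is a small
in-memory model of a versioned cell. It is only an illustration of the
concept, not the HBase client API: the class and its methods are
hypothetical, and it ignores tombstones and compaction entirely.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class VersionedCell {
    private final int maxVersions;
    // Newest timestamp first, so the first entry is what a plain read returns.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    public VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Store a value at a timestamp, dropping the oldest versions beyond the limit. */
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry is the oldest timestamp
        }
    }

    /** Return the newest surviving value, or null if the cell is empty. */
    public String get() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    /** Remove exactly one version; an older version then becomes the current one. */
    public void deleteVersion(long timestamp) {
        versions.remove(timestamp);
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(1L, "good");
        cell.put(2L, "corrupt");
        cell.deleteVersion(2L);
        System.out.println(cell.get()); // prints good
    }
}
```

The rollback only works while the older version still exists, so a bug
that rewrites the same cell more times than the version limit would push
the last good value out.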


-----Original Message-----
From: Hiller, Dean (Contractor) [mailto:[email protected]] 
Sent: Tuesday, December 07, 2010 9:16 AM
To: [email protected]
Subject: RE: serialized objects as strings or as object? & data corruption?

Purely application bugs are what I am thinking about, along with the plan
for fixing that data corruption when it happens (i.e., a bug is in prod
for 1 day and I need to fix all the records it touched).

I really like that JSON approach.  That sounds quite nice, and then I
think a short-lived MapReduce job might fix the corruption.  Actually,
I wonder if I could just do a Map without any Reduce: pick the data out
and write it back, fixing the corruption?

Thanks,
Dean
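
The map-without-reduce idea works because each record can be repaired on
its own, with no aggregation across records. Below is a minimal sketch of
that per-record shape in plain Java, without Hadoop. The class name, the
"BAD:" prefix, and the sample rows are all hypothetical; in a real Hadoop
job the same logic would sit in a mapper with the number of reduce tasks
set to 0 via Job.setNumReduceTasks(0).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class MapOnlyRepair {

    /**
     * Apply a fix to every corrupted value independently. No row depends on
     * any other row, which is why no reduce step is needed.
     */
    public static Map<String, String> repairAll(Map<String, String> rows,
                                                Predicate<String> isCorrupt,
                                                UnaryOperator<String> fix) {
        Map<String, String> repaired = new LinkedHashMap<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            String value = row.getValue();
            // "get" the value, fix it if needed, and "put" it back under the same key.
            repaired.put(row.getKey(), isCorrupt.test(value) ? fix.apply(value) : value);
        }
        return repaired;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("account-1", "balance=10");
        rows.put("account-2", "BAD:balance=20"); // hypothetical damage left by the bug

        Map<String, String> fixed = repairAll(
                rows,
                v -> v.startsWith("BAD:"),
                v -> v.substring("BAD:".length()));
        System.out.println(fixed); // prints {account-1=balance=10, account-2=balance=20}
    }
}
```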

-----Original Message-----
From: Jonathan Gray [mailto:[email protected]] 
Sent: Monday, December 06, 2010 2:57 PM
To: [email protected]
Subject: RE: serialized objects as strings or as object? & data corruption?

Hey Dean,

Why are you so concerned about data corruption?

Is your concern about application level bugs causing corruption, or
HBase/HDFS causing the corruption?

HDFS provides checksumming and if a replica of a block is found to be
corrupt it will be re-replicated from a correct replica.
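
As a rough illustration of how a stored checksum exposes a damaged copy,
here is a sketch using the JDK's CRC32. It is not HDFS code: the class
and method names are hypothetical, and the block contents are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class BlockChecksum {

    /** Compute a CRC32 checksum over the whole byte array. */
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** A copy is treated as intact only if it still matches the stored checksum. */
    public static boolean isIntact(byte[] data, long expected) {
        return checksum(data) == expected;
    }

    public static void main(String[] args) {
        byte[] block = "account-1:balance=10".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(block); // recorded when the block is written

        byte[] damaged = block.clone();
        damaged[0] ^= 0x01; // flip a single bit to simulate corruption

        System.out.println(isIntact(block, stored));   // prints true
        System.out.println(isIntact(damaged, stored)); // prints false
    }
}
```

A check like this catches storage-level damage only; a value that an
application bug wrote incorrectly still checksums as valid.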


As for a CLI, I imagine it wouldn't be too hard to extend the existing
JRuby shell to suit your needs if you have experience with JRuby.

JG

> -----Original Message-----
> From: Hiller, Dean (Contractor) [mailto:[email protected]]
> Sent: Monday, December 06, 2010 1:40 PM
> To: [email protected]
> Subject: serialized objects as strings or as object? & data corruption?
> 
> Is there a good tool out there for serializing a Java entity to HBase?
> If I have an Account, and then have a List<Activities> in the account,
> I would prefer to serialize that as all strings so data corruption
> issues can be fixed more easily, independent of the objects... or do I
> just create short-lived MapReduce jobs that fix data corruption?  How do
> people deal with data corruption and serializing objects to HBase
> storage?
> 
> I also like the ability to query from the command line and actually be
> able to read the storage (but maybe I just build something that knows
> about my objects?)... how do people deal with this today?  Just looking
> for thoughts on this subject.
> 
> Thanks,
> Dean
> 
> 
> This message and any attachments are intended only for the use of the
> addressee and may contain information that is privileged and
> confidential. If the reader of the message is not the intended
> recipient or an authorized representative of the intended recipient,
> you are hereby notified that any dissemination of this communication is
> strictly prohibited. If you have received this communication in error,
> please notify us immediately by e-mail and delete the message and any
> attachments from your system.
> 