This thread might be useful:
https://forums.aws.amazon.com/thread.jspa?threadID=34936

There's also a section on S3 here:

http://ofps.oreilly.com/titles/9781449396107/installation.html
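
On the periodic export question (2) quoted below, here is a rough sketch of how one incremental export window could be assembled. It assumes the Export MapReduce tool that ships with HBase (org.apache.hadoop.hbase.mapreduce.Export), which takes a table name, an output directory, and optionally a version count, start time, and end time in epoch milliseconds. The table name, bucket path, and helper names are made up for illustration, and the script only builds the command; it does not run it.

```python
from datetime import datetime, timezone

# Class name of the Export tool bundled with HBase.
EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"


def to_millis(dt):
    """Convert an aware datetime to epoch milliseconds (HBase cell timestamps)."""
    return int(dt.timestamp() * 1000)


def export_command(table, output_dir, start, end, versions=1):
    """Build the argument list for one export window, start to end.

    Hypothetical helper: it only assembles arguments in the order
    <tablename> <outputdir> <versions> <starttime> <endtime>.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    return [
        "hbase", EXPORT_CLASS,
        table, output_dir,
        str(versions),
        str(to_millis(start)),
        str(to_millis(end)),
    ]


if __name__ == "__main__":
    # Example: export one day of writes. Table and bucket are placeholders.
    start = datetime(2011, 9, 27, tzinfo=timezone.utc)
    end = datetime(2011, 9, 28, tzinfo=timezone.utc)
    cmd = export_command("users", "s3n://example-bucket/users/2011-09-27", start, end)
    print(" ".join(cmd))
```

Running one such window per day gives a set of exports that the matching Import tool (org.apache.hadoop.hbase.mapreduce.Import, which takes a table name and an input directory) could replay one at a time. Whether that is fast enough for a full restore is something you would have to test on your own data.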

On Wed, Sep 28, 2011 at 12:11 PM, Vinod Gupta Tankala
<tvi...@readypulse.com> wrote:
> Thanks, Li. I didn't know about using S3 as a datastore. I will look into this
> more.
>
> I understand that HDFS replication will help with partial hardware failure. I
> wanted to protect myself against inconsistencies, as I have been bitten by
> them in the past. Those were caused by HBase fatal exceptions. One possible
> reason is that I was running in standalone mode, which is not production
> ready according to the HBase documentation.
> Another use case I have: I will be writing sweeper jobs to delete user
> data that is more than x months old. In case we need to retrieve old
> user data, I would like to be able to get it back from
> exported tables. Of course, I understand that to do so for selected user
> accounts, I would have to write custom jobs.
>
> thanks
> vinod
>
> On Wed, Sep 28, 2011 at 11:49 AM, Li Pi <l...@ucsd.edu> wrote:
>
>> What kinds of situations are you looking to guard against? Partial
>> hardware failure, full hardware failure (of the live cluster), or
>> accidentally deleting all data?
>>
>> HDFS provides replication that already guards against partial hardware
>> failure. If this is all you need, an ephemeral store should be fine.
>>
>> Also, HBase can use S3 directly as a datastore. You can choose the raw
>> mode, in which HBase treats S3 as a disk. There used to be a block-based
>> mode as well, but now that S3 has increased the object size limit
>> to 5 TB, this isn't needed anymore. (Somebody correct me if I'm wrong.)
>>
>> On Wed, Sep 28, 2011 at 9:15 AM, Vinod Gupta Tankala
>> <tvi...@readypulse.com> wrote:
>> > Hi,
>> > Can someone answer these basic but important questions for me?
>> > We are using HBase for our datastore and want to safeguard ourselves from
>> > data corruption and data loss. We are hosted on AWS EC2. Currently, I only
>> > have a single node, but I want to prepare for scale right away, as things
>> > are going to change over the next couple of weeks. Also, I am currently
>> > using the ephemeral store for HBase data.
>> >
>> > 1) What is the recommended AWS data store method for HBase? Should you use
>> > the ephemeral store and do S3 backups, or use EBS? I have read and heard
>> > that EBS can be expensive and also unreliable in terms of read/write
>> > latency. Of course, it provides data replication and protection, so you
>> > don't have to worry about that.
>> >
>> > 2) What is the recommended backup/restore method for HBase? I would like to
>> > take periodic data snapshots and then have an import utility that will
>> > incrementally import data in case I lose some regions due to corruption or
>> > table inconsistencies. Also, if something catastrophic happens, I can
>> > restore all of the data.
>> >
>> > 3) While we are at it, what are the recommended EC2 instance types for
>> > running master/ZooKeeper/region servers? I get conflicting answers from
>> > Google searches, ranging from c1.xlarge to m1.xlarge.
>> >
>> > I would really appreciate it if someone could help me.
>> >
>> > thanks
>> > vinod
>> >
>>
>
