Hi Iain,

Random reads for now, with further MR work w/ Hive possibly down the line.

Thanks,
-t

On Wed, Sep 9, 2015 at 9:31 PM, iain wright <[email protected]> wrote:

> Hi Anthony,
>
> Just curious, you mention your access pattern is mostly reads. Is it random
> reads, M/R jobs over a portion of the dataset, or other?
>
> Best,
>
> --
> Iain Wright
>
> This email message is confidential, intended only for the recipient(s)
> named above and may contain information that is privileged, exempt from
> disclosure under applicable law. If you are not the intended recipient, do
> not disclose or disseminate the message to anyone except the intended
> recipient. If you have received this message in error, or are not the named
> recipient(s), please immediately notify the sender by return email, and
> delete all copies of this message.
>
> On Wed, Sep 9, 2015 at 5:55 PM, Anthony Nguyen <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > Thanks for your help! I'm attempting to give it Hbase on S3 another try :).
> > Let me see if I can address the two things you pointed out and you can call
> > me crazy for even thinking that they'd be okay:
> >
> > First, some background - my initial use case will be using HBase in a
> > mostly read-only fashion (bulk loads are the primary method of data load).
> >
> > Consistency issues: Since S3 has read after write consistency for new
> > objects, we could potentially structure the naming scheme within HBase file
> > writes such that the same location is never overwritten. In this way, any
> > new files (which would include updates / appends to an ultimately new file)
> > in S3 will be immediately consistent. Files can be cleaned up after a set
> > period of time.
> >
> > Appends: An append in S3 can be modeled as a read / copy / delete
> > operation. Performance will definitely suffer, but we can configure WAL
> > settings to perhaps roll the WAL very often to minimize appends, right?
> >
> > Atomic folder renames can also probably be worked around too, similar to
> > how hadoop-azure does it.
> >
> > Thanks again. Looking forward to your insight.
> >
> > On Wed, Sep 9, 2015 at 7:32 PM, Andrew Purtell <[email protected]> wrote:
> >
> > > It cannot work to use S3 as a backing store for HBase. This has been
> > > attempted in the past (although not by me, so this isn't firsthand
> > > knowledge). One basic problem is HBase expects to be able to read what it
> > > has written immediately after the write completes. For example, opening a
> > > store file after a flush to service reads, or reopening store files after a
> > > compaction. S3's notion of availability is too eventual. You can get 404 or
> > > 500 responses for some time after completing a file write. HBase will
> > > panic. I'm not sure we'd accept kludges around this problem. Another
> > > fundamental problem is S3 doesn't provide any means for appending to files,
> > > so the HBase write ahead log cannot work. I suppose this could be worked
> > > around with changes that allow specification of one type of filesystem for
> > > HFiles and another for write ahead logs. There are numerous finer points
> > > about HDFS behavior used as synchronization primitive, such as atomic
> > > renames.
> > >
> > >
> > > On Wed, Sep 9, 2015 at 4:23 PM, Anthony Nguyen <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm investigating the use of S3 as a backing store for HBase. Would there
> > > > be any major issues with modifying HBase in such a way where when an S3
> > > > location is set for the rootdir, writes to .tmp are removed and minimized,
> > > > instead writing directly to the final destination? The reason I'd like to
> > > > do this is because renames in S3 are expensive and performance for
> > > > operations such as compactions and snapshot restores that have many renames
> > > > suffer.
> > > >
> > > > Thanks!
> > > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
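
The two workarounds proposed in the thread (never overwriting an existing location, and modeling an append as read / copy / delete) might look roughly like the sketch below. It is only an illustration under stated assumptions: `WriteOnceStore`, `put_new`, and `append` are hypothetical names, the store is an in-memory dictionary rather than S3, and nothing here reflects actual HBase or S3 client code. It does not model S3's consistency behavior or the 404/500 responses Andrew describes.

```python
import uuid


class WriteOnceStore:
    """Toy in-memory stand-in for an object store (hypothetical, not S3)."""

    def __init__(self):
        self._objects = {}

    def put_new(self, logical_name, data):
        """Store data under a fresh, unique key so no key is ever overwritten."""
        key = f"{logical_name}.{uuid.uuid4().hex}"
        if key in self._objects:
            raise KeyError(f"key already exists: {key}")
        self._objects[key] = bytes(data)
        return key

    def get(self, key):
        """Return the bytes stored under key."""
        return self._objects[key]

    def delete(self, key):
        """Remove an object; in practice this cleanup could be deferred."""
        del self._objects[key]

    def append(self, logical_name, old_key, extra):
        """Model an append as read / copy / delete and return the new key."""
        combined = self.get(old_key) + bytes(extra)     # read the old object
        new_key = self.put_new(logical_name, combined)  # copy into a new object
        self.delete(old_key)                            # delete the old object
        return new_key

    def keys(self):
        """Return all stored keys in sorted order."""
        return sorted(self._objects)


store = WriteOnceStore()
first = store.put_new("wal/region-1", b"edit-1;")
second = store.append("wal/region-1", first, b"edit-2;")
print(store.get(second))
```

Every append rewrites the whole object, so the cost grows with file size, which is why the thread suggests rolling the WAL often to keep appended files small.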
