Using S3 as a backing store for HBase cannot work. This has been attempted in the past (although not by me, so this isn't firsthand knowledge).

One basic problem is that HBase expects to be able to read what it has written immediately after the write completes: for example, opening a store file after a flush to service reads, or reopening store files after a compaction. S3's notion of availability is too eventual. You can get 404 or 500 responses for some time after completing a file write, and HBase will panic. I'm not sure we'd accept kludges around this problem.

Another fundamental problem is that S3 doesn't provide any means of appending to files, so the HBase write-ahead log cannot work. I suppose this could be worked around with changes that allow specifying one type of filesystem for HFiles and another for write-ahead logs.

There are also numerous finer points of HDFS behavior that HBase uses as a synchronization primitive, such as atomic renames.
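The read-after-write expectation above can be sketched as follows. This is an illustration on the local filesystem, not actual HBase code (the class name and file names are made up): the write-then-immediately-reopen pattern always succeeds on a strongly consistent filesystem like HDFS or local disk, whereas on eventually consistent S3 the read could 404 for a while after the write completes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the pattern HBase relies on: a flush writes a
// store file, then the region server immediately reopens it to serve reads.
// On HDFS or local disk this read always sees the write; on eventually
// consistent S3 it may fail with a 404 for some time afterward.
public class ReadAfterWrite {
    public static void main(String[] args) throws IOException {
        Path storeFile = Files.createTempFile("storefile", ".hfile");
        Files.write(storeFile, "cell data".getBytes());  // "flush" completes

        // HBase expects this immediate read-back to succeed.
        String readBack = new String(Files.readAllBytes(storeFile));
        if (!readBack.equals("cell data")) {
            throw new AssertionError("read-after-write violated");
        }
        System.out.println("read-after-write OK");
        Files.delete(storeFile);
    }
}
```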
On Wed, Sep 9, 2015 at 4:23 PM, Anthony Nguyen <[email protected]> wrote:

> Hi all,
>
> I'm investigating the use of S3 as a backing store for HBase. Would there
> be any major issues with modifying HBase in such a way where when an S3
> location is set for the rootdir, writes to .tmp are removed and minimized,
> instead writing directly to the final destination? The reason I'd like to
> do this is because renames in S3 are expensive and performance for
> operations such as compactions and snapshot restores that have many renames
> suffer.
>
> Thanks!

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
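As background on the quoted performance concern: S3 offers no native rename, so filesystem clients emulate it as copy-then-delete, which costs time proportional to the data copied rather than being a constant-time metadata operation as in HDFS. A minimal sketch of that emulation, with local files standing in for S3 objects (names are illustrative, not from any real client):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: mimics how a "rename" must be done against S3, which
// has no atomic rename. The copy step moves all the data, so the cost
// scales with object size, unlike an HDFS rename.
public class RenameAsCopyDelete {
    static void s3StyleRename(Path src, Path dst) throws IOException {
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING); // full data copy
        Files.delete(src);                                          // then delete source
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("compacted", ".hfile");
        Files.write(src, "merged cells".getBytes());
        Path dst = src.resolveSibling("final.hfile");

        s3StyleRename(src, dst);

        System.out.println("destination exists: " + Files.exists(dst));
        System.out.println("source gone: " + !Files.exists(src));
        Files.delete(dst);
    }
}
```

A compaction or snapshot restore that performs many such renames pays this data-copy cost each time, which is why the quoted proposal tries to avoid the .tmp-then-rename pattern entirely.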
