[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics on object stores

2019-05-15 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Labels: HBOSS  (was: )

> HBOSS: A FileSystem implementation to provide HBase's required semantics on 
> object stores
> -
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
>  Labels: HBOSS
> Fix For: hbase-filesystem-1.0.0-alpha1
>
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase-filesystem-1.patch, 
> HBASE-22149-hbase-filesystem-1.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics on object stores

2019-05-09 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-22149:

Release Note: 


Initial implementation of the hbase-oss module. Defines a wrapper 
implementation of Apache Hadoop's FileSystem interface that bridges the gap 
between Apache HBase, which assumes that many operations are atomic, and 
object-store implementations of FileSystem (such as s3a) which inherently 
cannot provide atomic semantics to those operations natively.

The implementation can be used e.g. with the s3a filesystem by using a root fs 
like `s3a://bucket/` and defining

* `fs.s3a.impl`  set to `org.apache.hadoop.hbase.oss.HBaseObjectStoreSemantics`
* `fs.hboss.fs.s3a.impl` set to `org.apache.hadoop.fs.s3a.S3AFileSystem`

more details in the module's README.md

NOTE: This module is labeled with an ALPHA version. It is not considered 
production ready and makes no promises about compatibility between versions.

  was:


Initial implementation of the hbase-oss module. Defines a wrapper FileSystem 
implementation of Apache Hadoop's FileSystem interface that bridges the gap 
between Apache HBase, which assumes that many operations are atomic, and 
object-store implementations of FileSystem (such as s3a) which inherently 
cannot provide atomic semantics to those operations natively.

The implementation can be used e.g. with the s3a filesystem by using a root fs 
like `s3a://bucket/` and defining

* `fs.s3a.impl`  set to `org.apache.hadoop.hbase.oss.HBaseObjectStoreSemantics`
* `fs.hboss.fs.s3a.impl` set to `org.apache.hadoop.fs.s3a.S3AFileSystem`

more details in the module's README.md

NOTE: This module is labeled with an ALPHA version. It is not considered 
production ready and makes no promises about compatibility between versions.


> HBOSS: A FileSystem implementation to provide HBase's required semantics on 
> object stores
> -
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Fix For: hbase-filesystem-1.0.0-alpha1
>
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase-filesystem-1.patch, 
> HBASE-22149-hbase-filesystem-1.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 

[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics on object stores

2019-04-30 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Summary: HBOSS: A FileSystem implementation to provide HBase's required 
semantics on object stores  (was: HBOSS: A FileSystem implementation to provide 
HBase's required semantics)

> HBOSS: A FileSystem implementation to provide HBase's required semantics on 
> object stores
> -
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase-filesystem-1.patch, 
> HBASE-22149-hbase-filesystem-1.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-30 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase-filesystem-1.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase-filesystem-1.patch, 
> HBASE-22149-hbase-filesystem-1.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-29 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase-filesystem-1.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase-filesystem-1.patch, 
> HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-25 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase-5.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-25 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: (was: HBASE-22149-hbase-5.patch)

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-25 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase-5.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, 
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-22 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase-4.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-09 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase-3.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-09 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase-2.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-03 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hbase.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-02 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Description: 
(Have been using the name HBOSS for HBase / Object Store Semantics)

I've had some thoughts about how to solve the problem of running HBase on 
object stores. There has been some thought in the past about adding the 
required semantics to S3Guard, but I have some concerns about that. First, it's 
mixing complicated solutions to different problems (bridging the gap between a 
flat namespace and a hierarchical namespace vs. solving inconsistency). Second, 
it's S3-specific, whereas other objects stores could use virtually identical 
solutions. And third, we can't do things like atomic renames in a true sense. 
There would have to be some trade-offs specific to HBase's needs and it's 
better if we can solve that in an HBase-specific module without mixing all that 
logic in with the rest of S3A.

Ideas to solve this above the FileSystem layer have been proposed and 
considered (HBASE-20431, for one), and maybe that's the right way forward 
long-term, but it certainly seems to be a hard problem and hasn't been done 
yet. But I don't know enough of all the internal considerations to make much of 
a judgment on that myself.

I propose a FileSystem implementation that wraps another FileSystem instance 
and provides locking of FileSystem operations to ensure correct semantics. 
Locking could quite possibly be done on the same ZooKeeper ensemble as an HBase 
cluster already uses (I'm sure there are some performance considerations here 
that deserve more attention). I've put together a proof-of-concept on which 
I've tested some aspects of atomic renames and atomic file creates. Both of 
these tests fail reliably on a naked s3a instance. I've also done a small YCSB 
run against a small cluster to sanity check other functionality and was 
successful. I will post the patch, and my laundry list of things that still 
need work. The WAL is still placed on HDFS, but the HBase root directory is 
otherwise on S3.

Note that my prototype is built on Hadoop's source tree right now. That's 
purely for my convenience in putting it together quickly, as that's where I 
mostly work. I actually think long-term, if this is accepted as a good 
solution, it makes sense to live in HBase (or it's own repository). It only 
depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
needs, so it should be able to iterate on the HBase community's terms alone.

Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
FileSystem that keeps hierarchical metadata in a more appropriate store that 
would allow the required transactions (maybe a special table in HBase could 
provide that store itself for other tables), and stores the underlying files 
with unique identifiers on S3. This allows renames to actually become fast 
instead of just large atomic operations. It does however place a strong 
dependency on the metadata store. I have not explored this idea much. My 
current proof-of-concept has been pleasantly simple, so I think it's the right 
solution unless it proves unable to provide the required performance 
characteristics.

  was:
I've had some thoughts about how to solve the problem of running HBase on 
object stores. There has been some thought in the past about adding the 
required semantics to S3Guard, but I have some concerns about that. First, it's 
mixing complicated solutions to different problems (bridging the gap between a 
flat namespace and a hierarchical namespace vs. solving inconsistency). Second, 
it's S3-specific, whereas other objects stores could use virtually identical 
solutions. And third, we can't do things like atomic renames in a true sense. 
There would have to be some trade-offs specific to HBase's needs and it's 
better if we can solve that in an HBase-specific module without mixing all that 
logic in with the rest of S3A.

Ideas to solve this above the FileSystem layer have been proposed and 
considered (HBASE-20431, for one), and maybe that's the right way forward 
long-term, but it certainly seems to be a hard problem and hasn't been done 
yet. But I don't know enough of all the internal considerations to make much of 
a judgment on that myself.

I propose a FileSystem implementation that wraps another FileSystem instance 
and provides locking of FileSystem operations to ensure correct semantics. 
Locking could quite possibly be done on the same ZooKeeper ensemble as an HBase 
cluster already uses (I'm sure there are some performance considerations here 
that deserve more attention). I've put together a proof-of-concept on which 
I've tested some aspects of atomic renames and atomic file creates. Both of 
these tests fail reliably on a naked s3a instance. I've also done a small YCSB 
run against a small cluster to sanity check other functionality and was 

[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-02 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-22149:

Priority: Critical  (was: Major)

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch
>
>
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-02 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-22149:

Component/s: Filesystem Integration

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch
>
>
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-02 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-22149:

Issue Type: New Feature  (was: Bug)

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HBASE-22149-hadoop.patch
>
>
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-02 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Attachment: HBASE-22149-hadoop.patch

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HBASE-22149-hadoop.patch
>
>
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-02 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Description: 
I've had some thoughts about how to solve the problem of running HBase on 
object stores. There has been some thought in the past about adding the 
required semantics to S3Guard, but I have some concerns about that. First, it's 
mixing complicated solutions to different problems (bridging the gap between a 
flat namespace and a hierarchical namespace vs. solving inconsistency). Second, 
it's S3-specific, whereas other objects stores could use virtually identical 
solutions. And third, we can't do things like atomic renames in a true sense. 
There would have to be some trade-offs specific to HBase's needs and it's 
better if we can solve that in an HBase-specific module without mixing all that 
logic in with the rest of S3A.

Ideas to solve this above the FileSystem layer have been proposed and 
considered (HBASE-20431, for one), and maybe that's the right way forward 
long-term, but it certainly seems to be a hard problem and hasn't been done 
yet. But I don't know enough of all the internal considerations to make much of 
a judgment on that myself.

I propose a FileSystem implementation that wraps another FileSystem instance 
and provides locking of FileSystem operations to ensure correct semantics. 
Locking could quite possibly be done on the same ZooKeeper ensemble as an HBase 
cluster already uses (I'm sure there are some performance considerations here 
that deserve more attention). I've put together a proof-of-concept on which 
I've tested some aspects of atomic renames and atomic file creates. Both of 
these tests fail reliably on a naked s3a instance. I've also done a small YCSB 
run against a small cluster to sanity check other functionality and was 
successful. I will post the patch, and my laundry list of things that still 
need work. The WAL is still placed on HDFS, but the HBase root directory is 
otherwise on S3.

Note that my prototype is built on Hadoop's source tree right now. That's 
purely for my convenience in putting it together quickly, as that's where I 
mostly work. I actually think long-term, if this is accepted as a good 
solution, it makes sense to live in HBase (or it's own repository). It only 
depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
needs, so it should be able to iterate on the HBase community's terms alone.

Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
FileSystem that keeps hierarchical metadata in a more appropriate store that 
would allow the required transactions (maybe a special table in HBase could 
provide that store itself for other tables), and stores the underlying files 
with unique identifiers on S3. This allows renames to actually become fast 
instead of just large atomic operations. It does however place a strong 
dependency on the metadata store. I have not explored this idea much. My 
current proof-of-concept has been pleasantly simple, so I think it's the right 
solution unless it proves unable to provide the required performance 
characteristics.

  was:
I've had some thoughts about how to solve the problem of running HBase on 
object stores. There has been some thought in the past about adding the 
required semantics to S3Guard, but I have some concerns about that. First, it's 
mixing complicated solutions to different problems (bridging the gap between a 
flat namespace and a hierarchical namespace vs. solving inconsistency). Second, 
it's S3-specific, whereas other objects stores could use virtually identical 
solutions. And third, we can't do things like atomic renames in a true sense. 
There would have to be some trade-offs specific to HBase's needs and it's 
better if we can solve that in an HBase-specific module without mixing all that 
logic in with the rest of S3A.

Ideas to solve this above the FileSystem layer have been proposed and 
considered (HBASE-20431, for one), and maybe that's the right way forward 
long-term, but it certainly seems to be a hard problem and hasn't been done 
yet. But I don't know enough of all the internal considerations to make much of 
a judgment on that myself.

I propose a FileSystem implementation that wraps another FileSystem instance 
and provides locking of FileSystem operations to ensure correct semantics. 
Locking could quite possibly be done on the same ZooKeeper ensemble as an HBase 
cluster already uses (I'm sure there are some performance considerations here 
that deserve more attention). I've put together a proof-of-concept one which 
I've tested some aspects of atomic renames and atomic file creates. Both of 
these tests fail on a naked s3a instance. I've also done a small YCSB run 
against a small cluster to sanity check other functionality and was successful. 
I will post the patch, and my laundry list of things that still 

[jira] [Updated] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-02 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HBASE-22149:
--
Description: 
I've had some thoughts about how to solve the problem of running HBase on 
object stores. There has been some thought in the past about adding the 
required semantics to S3Guard, but I have some concerns about that. First, it's 
mixing complicated solutions to different problems (bridging the gap between a 
flat namespace and a hierarchical namespace vs. solving inconsistency). Second, 
it's S3-specific, whereas other objects stores could use virtually identical 
solutions. And third, we can't do things like atomic renames in a true sense. 
There would have to be some trade-offs specific to HBase's needs and it's 
better if we can solve that in an HBase-specific module without mixing all that 
logic in with the rest of S3A.

Ideas to solve this above the FileSystem layer have been proposed and 
considered (HBASE-20431, for one), and maybe that's the right way forward 
long-term, but it certainly seems to be a hard problem and hasn't been done 
yet. But I don't know enough of all the internal considerations to make much of 
a judgment on that myself.

I propose a FileSystem implementation that wraps another FileSystem instance 
and provides locking of FileSystem operations to ensure correct semantics. 
Locking could quite possibly be done on the same ZooKeeper ensemble as an HBase 
cluster already uses (I'm sure there are some performance considerations here 
that deserve more attention). I've put together a proof-of-concept one which 
I've tested some aspects of atomic renames and atomic file creates. Both of 
these tests fail on a naked s3a instance. I've also done a small YCSB run 
against a small cluster to sanity check other functionality and was successful. 
I will post the patch, and my laundry list of things that still need work. The 
WAL is still placed on HDFS, but the HBase root directory is otherwise on S3.

Note that my prototype is built on Hadoop's source tree right now. That's 
purely for my convenience in putting it together quickly, as that's where I 
mostly work. I actually think long-term, if this is accepted as a good 
solution, it makes sense to live in HBase (or it's own repository). It only 
depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
needs, so it should be able to iterate on the HBase community's terms alone.

Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
FileSystem that keeps hierarchical metadata in a more appropriate store that 
would allow the required transactions (maybe a special table in HBase could 
provide that store itself for other tables), and stores the underlying files 
with unique identifiers on S3. This allows renames to actually become fast 
instead of just large atomic operations. It does however place a strong 
dependency on the metadata store. I have not explored this idea much. My 
current proof-of-concept has been pleasantly simple, so I think it's the right 
solution unless it proves unable to provide the required performance 
characteristics.

  was:
I've had some thoughts about how to solve the problem of running HBase on 
object stores. There has been some thought in the past about adding the 
required semantics to S3Guard, but I have some concerns about that. First, it's 
mixing complicated solutions to different problems (bridging the gap between a 
flat namespace and a hierarchical namespace vs. solving inconsistency). Second, 
it's S3-specific, whereas other objects stores could use virtually identical 
solutions. And third, we can't do things like atomic renames in a true sense. 
There would have to be some trade-offs and it's better if we can solve that in 
an HBase-specific module without mixing all that logic in with the rest of S3A.

Ideas to solve this above the FileSystem layer have been proposed and 
considered (HBASE-20431, for one), and maybe that's the right way forward 
long-term, but it certainly seems to be a hard problem and hasn't been done 
yet. But I don't know enough of all the internal considerations to make much of 
a judgment on that myself.

I propose a FileSystem implementation that wraps another FileSystem instance 
and provides locking of FileSystem operations to ensure correct semantics. 
Locking could quite possibly be done on the same ZooKeeper ensemble as an HBase 
cluster already uses (I'm sure there are some performance considerations here 
that deserve more attention). I've put together a proof-of-concept one which 
I've tested some aspects of atomic renames and atomic file creates. Both of 
these tests fail on a naked s3a instance. I've also done a small YCSB run 
against a small cluster to sanity check other functionality and was successful. 
I will post the patch, and my laundry list of things that still need work. The 
WAL is still placed