Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-29 Thread Steve Loughran
I do have a WiP library to hide that hoop-jumping behind a normal API, with a goal of 3.2+ support only. It does actually compile against hadoop 2, but it isn't tested there: https://github.com/steveloughran/fs-api-shim
1. my goal is to make it a hadoop library like hadoop-third-party, with its

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-28 Thread Nick Dimiduk
On Mon, Mar 27, 2023 at 20:29 Wei-Chiu Chuang wrote:
> For complex applications such as HBase it is almost impossible to achieve true FS agnosticity without proper contract tests, as now I am starting to realize.
This is absolutely true. HBase jumps through all sorts of painful reflective

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-27 Thread Wei-Chiu Chuang
I think moving up interfaces to FileSystem or some abstract FileSystem class has a few benefits:
1. Applications can potentially be made FS-agnostic, with a hasPathCapabilities() check. At the very least, the code can be made to compile.
2. We will be able to add a contract test to ensure behavior is as expected. The

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-27 Thread Steve Loughran
Side issue, as I think about what bulk delete call would also keep HBase happy (https://issues.apache.org/jira/browse/HADOOP-18679): should we think about new API calls only raising RuntimeExceptions? The more work I do on futures, the more the way we always raise IOEs complicates life. Java has

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-24 Thread Steve Loughran
On Thu, 23 Mar 2023 at 10:07, Ayush Saxena wrote:
> Second idea mentioned in the original mail is also similar to mentioned in the comment in the above ticket and is still quite acceptable, name can be negotiated though, Add an interface to pull the relevant methods up in that without

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-23 Thread Ayush Saxena
They both need it for a similar use case: "to support Ozone", not anything core that we handle as part of "Apache Hadoop" and I suppose both are working fine with HDFS, because of adding dependency with HDFS? and now they don't want to add Ozone for whatever reasons and folks chasing this

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-23 Thread Tsz Wo Sze
(Clicked "send" too accidentally. Please ignore my previous email. Sorry.) Hi, We probably should exclude HBase from this discussion. I guess Wei-Chiu was mentioning it as an example use case. There are other projects, such as Apache Solr, requiring similar features. (1) We already have the

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-23 Thread Tsz Wo Sze
Hi, We probably should exclude HBase from this discussion. I guess Wei-Chiu was mentioning it as an example use case. There are other projects, such as Apache Solr, requiring similar features. (1) We already have the Syncable (hsync/hflush) interface in Hadoop, so it makes sense to have a recover() method

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-21 Thread Ayush Saxena
Well, whether reflections are good or not will drag this somewhere else. I will respect what Tsz-Wo said and put this in my rule book for the future :) If I get into why we don’t have “all” the APIs in FileSystem itself, it will drag this to another area: what and where to use abstraction and stuff like that, which

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-21 Thread Tsz Wo Sze
Ayush, Yes, reflection is a part of Java. But then why do we have to define the FileSystem APIs at all, rather than simply use reflection all the time? Reflection is good for dealing with unknown code such as loading a plugin, code analysis, etc. However, it probably is not a good way to define APIs.
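To make the contrast concrete, here is a minimal Java sketch of the reflective style being debated. The class and method names (HdfsLikeClient, OtherClient, ReflectiveLease.tryRecoverLease) are hypothetical and are not real Hadoop or HBase code; only the java.lang.reflect calls are real.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical client that happens to expose a recoverLease(String) method.
final class HdfsLikeClient {
    public boolean recoverLease(String path) {
        return true; // pretend the lease was recovered
    }
}

// Hypothetical client with no such method.
final class OtherClient {
}

final class ReflectiveLease {
    // Looks the method up by name at runtime; the compiler cannot check
    // the name, the parameter types, or the return type.
    static boolean tryRecoverLease(Object fs, String path) {
        try {
            Method m = fs.getClass().getMethod("recoverLease", String.class);
            return (Boolean) m.invoke(fs, path);
        } catch (NoSuchMethodException e) {
            return false; // method absent: treated as "not supported"
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }
}
```

In this sketch a rename or signature change on the target class still compiles and only surfaces at runtime as "not supported", which is one way to read the objection that reflection is a poor substitute for a declared API.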

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-20 Thread Ayush Saxena
I am not sure what classifies as a hack and what does not; I thought reflections are part of Java. Whatever the solution, pulling just the HDFS-specific stuff into FileSystem just for Ozone, because the Hbase guys didn’t agree and we have people in Hadoop who we can convince: I am -1 to such an approach

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-20 Thread Tsz Wo Sze
It makes a lot of sense to use PathCapabilities. Reflection is a hack but not a solution. Tsz-Wo
On Tuesday, March 21, 2023, 09:43:15 AM GMT+8, Ayush Saxena wrote: Hbase doesn’t want to add Ozone as a dependency sounds to me like a ‘Hbase having resistance against the people proposing or

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-20 Thread Ayush Saxena
Hbase doesn’t want to add Ozone as a dependency sounds to me like a ‘Hbase having resistance against the people proposing or against Ozone’. Anyway, doesn’t ViewDistributedFileSystem solve this Ozone problem? I remember Uma chasing that to solve exactly these problems. Pulling up the core HDFS

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-20 Thread Wei-Chiu Chuang
Thank you. Makes sense to me. Yes, as part of this effort we are going to need contract tests.
On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran wrote:
> 1. I think a new interface would be good as FileContext could do the same thing
> 2. using PathCapabilities probes should still be

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-17 Thread Steve Loughran
1. I think a new interface would be good, as FileContext could do the same thing.
2. Using PathCapabilities probes should still be mandatory, as for FileContext it would depend on the back end.
3. Whoever does this gets to specify what the API does and write the contract tests.
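As a rough illustration of points 1 and 2, here is a minimal, self-contained Java sketch. Everything in it is a hypothetical stand-in: CapabilityProbe only mimics the shape of Hadoop's PathCapabilities.hasPathCapability(Path, String) using a plain String path, and LeaseRecoverable, the capability string, the two toy stores and WalRecovery are invented names, not the API proposed in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a capability probe (String path for brevity).
interface CapabilityProbe {
    boolean hasPathCapability(String path, String capability) throws IOException;
}

// Hypothetical "pulled-up" interface carrying the lease recovery call.
interface LeaseRecoverable {
    boolean recoverLease(String path) throws IOException;
}

// Toy store that implements the interface and advertises the capability.
final class LeaseAwareStore implements CapabilityProbe, LeaseRecoverable {
    @Override
    public boolean hasPathCapability(String path, String capability) {
        return WalRecovery.LEASE_CAPABILITY.equals(capability);
    }

    @Override
    public boolean recoverLease(String path) {
        return true; // pretend the lease was recovered immediately
    }
}

// Toy store with no lease concept at all.
final class PlainStore implements CapabilityProbe {
    @Override
    public boolean hasPathCapability(String path, String capability) {
        return false;
    }
}

final class WalRecovery {
    // Made-up capability name, used only inside this sketch.
    static final String LEASE_CAPABILITY = "example.capability.lease.recoverable";

    // Probe first, then call through the interface; no reflection and no
    // compile-time dependency on any one store implementation.
    static String recover(Object fs, String path) {
        try {
            if (fs instanceof CapabilityProbe
                    && fs instanceof LeaseRecoverable
                    && ((CapabilityProbe) fs).hasPathCapability(path, LEASE_CAPABILITY)) {
                return ((LeaseRecoverable) fs).recoverLease(path) ? "recovered" : "pending";
            }
            return "unsupported";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller checks both the interface and the capability because, as point 2 notes, an object may expose the interface while the back end behind it does not support the operation. What recoverLease() must actually guarantee is exactly what the contract tests in point 3 would have to pin down; the sketch does not attempt that.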

[DISCUSS] Move HDFS specific APIs to FileSystem abstration

2023-03-16 Thread Wei-Chiu Chuang
Hi, Stephen and I are working on a project to make HBase run on Ozone. HBase, born out of the Hadoop project, depends on a number of HDFS-specific APIs, including recoverLease() and isInSafeMode(). The HBase community [1] strongly voiced that they don't want the project to have direct