Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-09 Thread Steve Loughran
On 8 August 2013 21:51, Matevz Tadel mta...@ucsd.edu wrote: Hi Steve, Thank you very much for the reality check! Some more answers inline ... On 8/8/13 1:30 PM, Steve Loughran wrote: On 7 August 2013 10:59, Jeff Dost jd...@ucsd.edu wrote: Hello, We work in a software development team

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-09 Thread Jeff Dost
Hi Steve, Thanks again for your in-depth replies; we found your comments quite useful. A few responses inline: On 8/9/13 10:31 AM, Steve Loughran wrote: On 8 August 2013 21:51, Matevz Tadel mta...@ucsd.edu wrote: We already do fallback to xrootd on open failures from our application

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Matevz Tadel
Hi everybody, I'm jumping in as Jeff is away due to an unexpected annoyance involving Californian wildlife. On 8/7/13 7:47 PM, Andrew Wang wrote: Blocks are supposed to be an internal abstraction within HDFS, and aren't an inherent part of FileSystem (the user-visible class used to access

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Colin McCabe
There is work underway to decouple the block layer and the namespace layer of HDFS from each other. Once this is done, block behaviors like the one you describe will be easy to implement. It's a use case very similar to the hierarchical storage management (HSM) use case that we've discussed

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Steve Loughran
On 7 August 2013 10:59, Jeff Dost jd...@ucsd.edu wrote: Hello, We work in a software development team at the UCSD CMS Tier2 Center. We would like to propose a mechanism to allow one to subclass the DFSInputStream in a clean way from an external package. First I'd like to give some

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Suresh Srinivas
This is being targeted for release 2.3. The 2.1.x release stream is for stabilizing. When it reaches stability, 2.2 GA will be released. The current features in development will make it to 2.3, including HDFS-2832. On Thu, Aug 8, 2013 at 2:04 PM, Matevz Tadel mta...@ucsd.edu wrote: Thanks Colin,

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Matevz Tadel
Hi Steve, Thank you very much for the reality check! Some more answers inline ... On 8/8/13 1:30 PM, Steve Loughran wrote: On 7 August 2013 10:59, Jeff Dost jd...@ucsd.edu wrote: Hello, We work in a software development team at the UCSD CMS Tier2 Center. We would like to propose a

Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Jeff Dost
Hello, We work in a software development team at the UCSD CMS Tier2 Center. We would like to propose a mechanism to allow one to subclass the DFSInputStream in a clean way from an external package. First I'd like to give some motivation on why and then will proceed with the details. We

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Joe Bounour
Hello Jeff, Is it something that could go under the HCFS project? http://wiki.apache.org/hadoop/HCFS (I might be wrong?) Joe On 8/7/13 10:59 AM, Jeff Dost jd...@ucsd.edu wrote: Hello, We work in a software development team at the UCSD CMS Tier2 Center. We would like to propose a mechanism to

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Andrew Wang
I don't think exposing DFSClient and DistributedFileSystem members is necessary to achieve what you're trying to do. We've got wrapper FileSystems like FilterFileSystem and ViewFileSystem which you might be able to use for inspiration, and the HCFS wiki lists some third-party FileSystems that

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Jeff Dost
Thank you for the suggestion, but we don't see how simply wrapping a FileSystem object would be sufficient in our use case. The reason is that we need to catch and handle read exceptions at the block level. There aren't any public methods available in the high-level FileSystem abstraction

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Andrew Wang
Blocks are supposed to be an internal abstraction within HDFS, and aren't an inherent part of FileSystem (the user-visible class used to access all Hadoop filesystems). Is it possible to instead deal with files and offsets? On a read failure, you could open a stream to the same file on the backup
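
The file-and-offset idea in the message above could be sketched roughly as follows. This is an illustration only, written in plain Python rather than against the Hadoop API: `FallbackReader`, `FlakyStream`, `open_primary`, and `open_backup` are hypothetical names that do not appear in the thread, and the sketch assumes the backup holds a byte-identical copy of the file so that resuming at the same offset is valid.

```python
import io
from typing import BinaryIO, Callable


class FallbackReader:
    """Reads from a primary source and switches to a backup on a read error.

    Hypothetical helper for illustration only; not part of Hadoop or HDFS.
    """

    def __init__(self, open_primary: Callable[[], BinaryIO],
                 open_backup: Callable[[], BinaryIO]) -> None:
        self._open_backup = open_backup
        self._stream = open_primary()
        self._offset = 0          # bytes successfully returned to the caller
        self._on_backup = False

    def read(self, size: int) -> bytes:
        try:
            data = self._stream.read(size)
        except OSError:
            if self._on_backup:
                raise             # the backup failed too; nothing left to try
            # Reopen the same file on the backup and resume at the same offset.
            self._stream.close()
            self._stream = self._open_backup()
            self._stream.seek(self._offset)
            self._on_backup = True
            data = self._stream.read(size)
        self._offset += len(data)
        return data


class FlakyStream(io.BytesIO):
    """Stand-in for a failing source: raises OSError once the read
    position reaches fail_at (assumed behavior, for demonstration)."""

    def __init__(self, payload: bytes, fail_at: int) -> None:
        super().__init__(payload)
        self._fail_at = fail_at

    def read(self, size: int = -1) -> bytes:
        if self.tell() >= self._fail_at:
            raise OSError("simulated read failure")
        return super().read(size)


if __name__ == "__main__":
    payload = b"0123456789"
    reader = FallbackReader(lambda: FlakyStream(payload, fail_at=4),
                            lambda: io.BytesIO(payload))
    print(reader.read(4), reader.read(4), reader.read(4))
```

The reader tracks only a file-level offset, so the caller never sees block boundaries; whether that granularity is enough for the use case described earlier in the thread is exactly the point under discussion.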