Great, thanks a lot.

b.
On Mon, Apr 20, 2015 at 7:03 PM, Chris Nauroth <[email protected]> wrote:
> Hi Bram,
>
> Your gut feeling is correct. These 2 properties are used in private
> implementation details of cluster communication. I believe these 2
> properties are currently the only difference compared to the public REST
> API.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
> From: Bram Biesbrouck <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, April 20, 2015 at 4:19 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>
> Hi Chris,
>
> Thanks for your insights. Last question: can you tell me the main
> differences (from a Hadoop dev point of view) between the public REST API
> and the HDFS wire protocol?
> My gut feeling tells me the HDFS wire protocol is mainly used in cluster
> communication and the public one is, well, for public APIs. But maybe I'm
> missing some more subtle differences?
>
> cheers,
>
> b.
>
> On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <[email protected]>
> wrote:
>
>> Hello Bram,
>>
>> I'm glad to hear the information was helpful.
>>
>> If you'd like to request access to childrenNum as part of a guaranteed
>> public API, then I encourage you to create a jira issue in the HDFS
>> project. We could consider it for the future.
>>
>> HdfsFileStatus is a representation of the HDFS wire protocol, and it's
>> intended to be decoupled from the public API FileStatus object so that the
>> two can evolve independently. From a pure code reuse perspective, I
>> suppose the two could share a common base class, but then that common base
>> class would need to creep into the public API too.
>>
>> Currently, there is no guarantee about the availability of these fields
>> in the public REST API. We're going to remove mention of them in the
>> documentation.
>> We're not necessarily planning to remove the fields from
>> the JSON immediately, but there is also no guarantee that they'll stay
>> there.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>> From: Bram Biesbrouck <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, April 17, 2015 at 5:26 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>>
>> Hi Chris,
>>
>> Thanks for this reply. I thought something funny was happening.
>>
>> The childrenNum field is actually very useful (e.g. for (not) rendering an
>> expansion marker next to a folder in a GUI when it has children), so it's a
>> pity the info is there but gets "eaten up" by the general interface, only
>> to be recalculated later on.
>> It would be nice to have the info as an optional field in the FileStatus
>> class (initialized to -1 like it is right now), so we can use it if it's
>> there or just ignore it when not initialized. While I'm
>> ranting, HdfsFileStatus should extend FileStatus because it's 95%
>> the same code anyway.
>>
>> If I read your reply correctly, I assume the fields will be deleted
>> from the WebHDFS JSON responses as well in the future?
>>
>> Thanks again for the extensive reply, very useful and appreciated.
>>
>> cheers,
>>
>> b.
>>
>>
>>
>> On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <[email protected]>
>> wrote:
>>
>>> Hello Bram,
>>>
>>> There are a few Apache jiras with background discussion of the
>>> introduction of these fields in WebHDFS.
>>>
>>> https://issues.apache.org/jira/browse/HDFS-4502
>>>
>>> https://issues.apache.org/jira/browse/HDFS-4772
>>>
>>> https://issues.apache.org/jira/browse/HDFS-4969
>>>
>>> The new fields could not be supported in HttpFS (only WebHDFS), and
>>> they were not intended to be guaranteed in the public REST API.
>>> Unfortunately, the fields were added to the documentation mistakenly in
>>> Apache Hadoop 2.5.0.
>>>
>>> https://issues.apache.org/jira/browse/HDFS-6153
>>>
>>> We're going to revert that documentation change in Apache Hadoop
>>> 2.8.0. I suggest that your application does not rely on these fields, or
>>> at least includes fallback logic to keep working as best as it can if the
>>> fields are not present. Another way to determine the number of children
>>> would be to make a subsequent LISTSTATUS call on the child path.
>>>
>>> I apologize if this caused any inconvenience, and I hope the
>>> information helps.
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>> From: Bram Biesbrouck <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Thursday, April 16, 2015 at 7:58 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>>>
>>> Hi all,
>>>
>>> I'm experiencing something strange while developing against the HttpFS
>>> front-end webapp on Hadoop 2.6.0.
>>>
>>> I'm currently digging into WebHdfsFileSystem and HttpFS to understand
>>> them better and to understand how the REST API works. I've set up a local
>>> single-node Hadoop instance, which I can query successfully with, e.g.,
>>> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
>>> which returns, e.g., this FileStatus object:
>>>
>>> {
>>>   "accessTime": 0,
>>>   "blockSize": 0,
>>>   "childrenNum": 0,
>>>   "fileId": 16386,
>>>   "group": "supergroup",
>>>   "length": 0,
>>>   "modificationTime": 1417964248854,
>>>   "owner": "hadoop",
>>>   "pathSuffix": "user",
>>>   "permission": "755",
>>>   "replication": 0,
>>>   "storagePolicy": 0,
>>>   "type": "DIRECTORY"
>>> }
>>>
>>> Now, when I start HttpFS and ask for the same data over its interface
>>> (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
>>> reply.
>>> In particular, the childrenNum and fileId fields are missing compared
>>> to the first result (same file or directory):
>>>
>>> {
>>>   "pathSuffix": "user",
>>>   "type": "DIRECTORY",
>>>   "length": 0,
>>>   "owner": "hadoop",
>>>   "group": "supergroup",
>>>   "permission": "755",
>>>   "accessTime": 0,
>>>   "modificationTime": 1417964248854,
>>>   "blockSize": 0,
>>>   "replication": 0
>>> }
>>>
>>> Since I need the childrenNum property, I started digging into the code
>>> to see where it's "lost" and found that WebHdfsFileSystem performs a
>>> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
>>> before the list of FileStatus objects is returned. Basically, it converts
>>> HdfsFileStatus objects into FileStatus objects, effectively chopping off
>>> those two properties.
>>>
>>> The sources for HdfsFileStatus clearly state that it's an "Interface
>>> that represents the over the wire information for a file.", so I wonder why
>>> this happens, since HdfsFileStatus contains all the right properties,
>>> according to the docs at
>>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>>>
>>> It feels like the FileStatus class hasn't been updated to match the
>>> HdfsFileStatus class, but since they don't share any interfaces or
>>> superclasses, I get the feeling it's intentional; I just can't find or
>>> figure out why.
>>>
>>> Can somebody help or shed some light?
>>>
>>> thanks,
>>>
>>> b.
>>> --
>>>
>>> Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of
>>> reinvention
>>>
>>>
>>
>>
>> --
>>
>> Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of
>> reinvention
>>
>>
>
>
> --
>
> Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of
> reinvention
>
>
--
Bram Biesbrouck - 0486/118280 - www.beligum.com - the republic of reinvention
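Chris's advice in the thread (don't rely on childrenNum, or fall back to a LISTSTATUS call on the child path when it is absent) could look roughly like the sketch below, using only the Python standard library. This is not Hadoop code: `liststatus_url`, `list_status` and `child_count` are hypothetical helper names. It assumes the `{"FileStatuses": {"FileStatus": [...]}}` response envelope described in the WebHDFS documentation, no authentication (no `user.name` parameter or delegation token), and it treats a missing or negative childrenNum as "unknown".

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

# One FileStatus entry from a LISTSTATUS response (a plain JSON object).
Status = dict


def liststatus_url(base_url: str, path: str) -> str:
    """Build a LISTSTATUS URL, e.g. http://localhost:14000/webhdfs/v1/?op=LISTSTATUS."""
    return f"{base_url.rstrip('/')}/webhdfs/v1{urllib.parse.quote(path)}?op=LISTSTATUS"


def list_status(base_url: str, path: str) -> List[Status]:
    """Fetch the FileStatus entries of a directory from WebHDFS or HttpFS.

    Assumption: the body is {"FileStatuses": {"FileStatus": [...]}} and no
    authentication parameters are needed.
    """
    with urllib.request.urlopen(liststatus_url(base_url, path)) as resp:
        body = json.load(resp)
    return body["FileStatuses"]["FileStatus"]


def child_count(parent: str, status: Status,
                lister: Callable[[str], List[Status]]) -> Optional[int]:
    """Return the number of children of a directory entry, or None for files.

    Uses childrenNum when the server sent a non-negative value (the NameNode
    endpoint on port 50070 did in the thread); otherwise, as with HttpFS,
    falls back to one extra LISTSTATUS call on the child path via `lister`.
    """
    if status.get("type") != "DIRECTORY":
        return None
    n = status.get("childrenNum")
    if isinstance(n, int) and n >= 0:
        return n
    # Build the child's absolute path; parent "/" plus "user" gives "/user".
    child_path = parent.rstrip("/") + "/" + status["pathSuffix"]
    return len(lister(child_path))
```

Passing the lister as a function keeps the fallback testable without a running cluster; against the HttpFS instance from the thread it would be called as `child_count("/", st, lambda p: list_status("http://localhost:14000", p))` for each `st` in `list_status("http://localhost:14000", "/")`. The fallback costs one extra request per directory, which is the trade-off for not depending on a field with no compatibility guarantee.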
