Great, thanks a lot.

b.

On Mon, Apr 20, 2015 at 7:03 PM, Chris Nauroth <[email protected]>
wrote:

>   Hi Bram,
>
>  Your gut feeling is correct.  These 2 properties are used in private
> implementation details of cluster communication.  I believe these 2
> properties are currently the only difference compared to the public REST
> API.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Bram Biesbrouck <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, April 20, 2015 at 4:19 AM
>
> To: "[email protected]" <[email protected]>
> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>
>   Hi Chris,
>
>  Thanks for your insights. Last question: can you tell me the main
> differences (from a Hadoop dev point of view) between the public REST API
> and the HDFS wire protocol?
> My gut feeling tells me the wire protocol is mainly used in cluster
> communication and the public one is, well, for public APIs. But maybe I'm
> missing some more
> subtle differences?
>
>  cheers,
>
>  b.
>
> On Fri, Apr 17, 2015 at 7:28 PM, Chris Nauroth <[email protected]>
> wrote:
>
>>   Hello Bram,
>>
>>  I'm glad to hear the information was helpful.
>>
>>  If you'd like to request access to childrenNum as part of a guaranteed
>> public API, then I encourage you to create a JIRA issue in the HDFS
>> project.  We could consider it for the future.
>>
>>  HdfsFileStatus is a representation of the HDFS wire protocol, and it's
>> intended to be decoupled from the public API FileStatus object so that the
>> two can evolve independently.  From a pure code reuse perspective, I
>> suppose the two could share a common base class, but then that common base
>> class would need to creep into the public API too.
>>
>>  Currently, there is no guarantee about the availability of these fields
>> in the public REST API.  We're going to remove mention of them in the
>> documentation.  We're not necessarily planning to remove the fields from
>> the JSON immediately, but there is also no guarantee that they'll stay
>> there.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Bram Biesbrouck <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, April 17, 2015 at 5:26 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Found weird issue with HttpFS and WebHdfsFileSystem
>>
>>   Hi Chris,
>>
>>  Thanks for this reply. I thought something funny was happening.
>>
>>  The childrenNum field is actually very useful (e.g. for deciding whether
>> to render an expansion marker next to a folder in a GUI), so it's a pity
>> the info is there but gets "eaten up" by the general interface, only to be
>> recalculated later on.
>> It would be nice to have the info as an optional field in the FileStatus
>> class (initialized to -1 like it is right now), so we can use it if it's
>> there or just ignore it when not initialized. While I'm
>> ranting, HdfsFileStatus should extend FileStatus because it's 95%
>> the same code anyway.
>>
>>  If I read your reply correctly, I assume the fields will be deleted
>> from the webhdfs JSON responses as well in the future?
>>
>>  Thanks again for the extensive reply, very useful and appreciated.
>>
>>  cheers,
>>
>>  b.
>>
>>
>>
>> On Fri, Apr 17, 2015 at 12:59 AM, Chris Nauroth <[email protected]>
>> wrote:
>>
>>>  Hello Bram,
>>>
>>>  There are a few Apache jiras with background discussion of the
>>> introduction of these fields in WebHDFS.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4502
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4772
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-4969
>>>
>>>  The new fields could not be supported in HttpFS (only WebHDFS), and
>>> they were not intended to be guaranteed in the public REST API.
>>> Unfortunately, the fields were mistakenly added to the documentation in
>>> Apache Hadoop 2.5.0.
>>>
>>>  https://issues.apache.org/jira/browse/HDFS-6153
>>>
>>>  We're going to revert that documentation change in Apache Hadoop
>>> 2.8.0.  I suggest that your application does not rely on these fields, or
>>> at least includes fallback logic to keep working as best as it can if the
>>> fields are not present.  Another way to determine the number of children
>>> would be to make a subsequent LISTSTATUS call on the child path.
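A minimal sketch of the fallback logic suggested above, in Python. It assumes each directory entry has already been parsed into a dict and that the caller supplies `list_children`, a hypothetical callable standing in for a LISTSTATUS request on that entry's path. Neither name comes from Hadoop. A negative value is treated as "not set", matching the -1 default mentioned elsewhere in this thread.

```python
from typing import Callable, Mapping, Sequence


def children_count(
    status: Mapping[str, object],
    list_children: Callable[[], Sequence[Mapping[str, object]]],
) -> int:
    """Return the number of children of one listing entry.

    Uses the optional childrenNum field when the server sent a usable
    value; otherwise falls back to listing the directory and counting.
    """
    # Only directories can have children.
    if status.get("type") != "DIRECTORY":
        return 0

    # childrenNum is not guaranteed by the public REST API, so treat a
    # missing or negative value as "unknown".
    value = status.get("childrenNum")
    if isinstance(value, int) and value >= 0:
        return value

    # Fallback: one extra round trip (hypothetical LISTSTATUS wrapper).
    return len(list_children())
```

The fallback costs one extra request per directory, so a GUI that only needs to know whether a folder has any children may prefer to call it lazily, when the folder becomes visible.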
>>>
>>>  I apologize if this caused any inconvenience, and I hope the
>>> information helps.
>>>
>>>   Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>   From: Bram Biesbrouck <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Thursday, April 16, 2015 at 7:58 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Found weird issue with HttpFS and WebHdfsFileSystem
>>>
>>>   Hi all,
>>>
>>>  I'm experiencing something strange while developing against the HttpFS
>>> front-end webapp on Hadoop 2.6.0.
>>>
>>>  I'm currently digging into WebHdfsFileSystem and HttpFS to understand
>>> them better and to understand how the REST API works. I've set up a local
>>> single-node Hadoop instance, which I can query successfully with e.g.
>>> http://localhost:50070/webhdfs/v1/?op=LISTSTATUS
>>> which returns e.g. this FileStatus object:
>>>
>>>  {
>>>    "accessTime": 0,
>>>    "blockSize": 0,
>>>    "childrenNum": 0,
>>>    "fileId": 16386,
>>>    "group": "supergroup",
>>>    "length": 0,
>>>    "modificationTime": 1417964248854,
>>>    "owner": "hadoop",
>>>    "pathSuffix": "user",
>>>    "permission": "755",
>>>    "replication": 0,
>>>    "storagePolicy": 0,
>>>    "type": "DIRECTORY"
>>>  }
>>>
>>>  Now, when I start HttpFS and ask for the same data over its interface
>>> (http://localhost:14000/webhdfs/v1/?op=LISTSTATUS), I get a different
>>> reply. Specifically, the childrenNum and fileId fields are missing compared
>>> to the first result (same file or directory):
>>>
>>>  {
>>>    "pathSuffix": "user",
>>>    "type": "DIRECTORY",
>>>    "length": 0,
>>>    "owner": "hadoop",
>>>    "group": "supergroup",
>>>    "permission": "755",
>>>    "accessTime": 0,
>>>    "modificationTime": 1417964248854,
>>>    "blockSize": 0,
>>>    "replication": 0
>>>  }
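A quick way to see exactly which fields differ between the two replies is a set difference over their keys. The sketch below uses the two objects quoted above as Python dicts. The variable and function names are illustrative only. For these two samples the difference also includes storagePolicy, in addition to childrenNum and fileId.

```python
# The two FileStatus objects quoted above, as Python dicts.
WEBHDFS_REPLY = {
    "accessTime": 0, "blockSize": 0, "childrenNum": 0, "fileId": 16386,
    "group": "supergroup", "length": 0, "modificationTime": 1417964248854,
    "owner": "hadoop", "pathSuffix": "user", "permission": "755",
    "replication": 0, "storagePolicy": 0, "type": "DIRECTORY",
}
HTTPFS_REPLY = {
    "pathSuffix": "user", "type": "DIRECTORY", "length": 0,
    "owner": "hadoop", "group": "supergroup", "permission": "755",
    "accessTime": 0, "modificationTime": 1417964248854,
    "blockSize": 0, "replication": 0,
}


def missing_fields(reference: dict, other: dict) -> list:
    """Return the keys present in `reference` but absent from `other`."""
    return sorted(set(reference) - set(other))


print(missing_fields(WEBHDFS_REPLY, HTTPFS_REPLY))
# → ['childrenNum', 'fileId', 'storagePolicy']
```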
>>>
>>>  Since I need the childrenNum property, I started digging into the code
>>> to see where it's "lost" and found that WebHdfsFileSystem performs a
>>> makeQualified() step (around line 1287 in WebHdfsFileSystem.java), just
>>> before the list of FileStatus objects is returned. Basically, it converts
>>> HdfsFileStatus objects into FileStatus objects, effectively chopping off
>>> those two properties.
>>>
>>>  The sources for HdfsFileStatus clearly state that it's an "Interface
>>> that represents the over the wire information for a file.", so I wonder why
>>> this happens, since the HdfsFileStatus contains all the right properties,
>>> according to the docs at
>>> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory
>>>
>>>  It feels like the FileStatus class hasn't been updated to match the
>>> HdfsFileStatus class. Since they don't share any interfaces or
>>> superclasses, I get the feeling it's intentional, but I can't find or
>>> figure out why.
>>>
>>>  Can somebody help or shed some light?
>>>
>>>  thanks,
>>>
>>>  b.
>>> --
>>>
>>>  Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
>>> reinvention
>>>
>>>


-- 

 Bram Biesbrouck - 0486/118280 - www.beligum.com -  the republic of
reinvention
