Actually, I was wrong: the WebHDFS REST call does take an offset.
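
A rough sketch of what such a request looks like, in case it helps with the experiments. It builds the `op=OPEN` URL with the `offset` and `length` query parameters from the WebHDFS REST API, works out which block a given offset falls into, and pulls the host out of the `Location` header of the namenode's redirect. The helper names, the host names, and the 128 MB block size are assumptions made up for illustration. Whether the redirect actually points at a datanode holding that block is exactly the open question here; the sketch only makes it easier to check.

```python
from urllib.parse import quote, urlencode, urlsplit


def block_index(offset, block_size):
    """Zero-based index of the HDFS block that contains byte `offset`."""
    if offset < 0 or block_size <= 0:
        raise ValueError("offset must be >= 0 and block_size must be > 0")
    return offset // block_size


def open_url(namenode, path, offset=0, length=None):
    """Build a WebHDFS OPEN URL.

    `namenode` is a hypothetical "host:port" string for the namenode's
    HTTP endpoint; `path` is the absolute HDFS path of the file.
    """
    params = [("op", "OPEN"), ("offset", offset)]
    if length is not None:
        params.append(("length", length))
    return "http://%s/webhdfs/v1%s?%s" % (namenode, quote(path), urlencode(params))


def redirect_host(location):
    """Host named in the Location header of the namenode's redirect."""
    return urlsplit(location).hostname
```

To test locality, one could request `open_url(...)` without following redirects, read the `Location` header from the redirect response, and compare `redirect_host(...)` against the hosts known to store block `block_index(offset, block_size)`.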

Alejandro
(phone typing)

> On Mar 17, 2014, at 10:07, Alejandro Abdelnur <[email protected]> wrote:
> 
> I don't recall how skips are handled in WebHDFS, but I would assume that you'll 
> get to the first block as usual, and the skip is handled by the DN serving 
> the file (as WebHDFS does not know at open that you'll skip).
> 
> Alejandro
> (phone typing)
> 
>> On Mar 17, 2014, at 9:47, RJ Nowling <[email protected]> wrote:
>> 
>> Hi Alejandro,
>> 
>> The WebHDFS API allows specifying an offset and length for the request.  If 
>> I specify an offset that starts in the second block of a file (thus skipping 
>> the first block altogether), will the namenode still direct me to a 
>> datanode with the first block, or will it direct me to a datanode with the 
>> second block?  I.e., am I assured data locality only on the first block of 
>> the file (as you're saying) or on the first block I am accessing?
>> 
>> If it is as you say, then I may want to reach out to the WebHDFS developers and 
>> see if they would be interested in the additional functionality.
>> 
>> Thank you,
>> RJ
>> 
>> 
>>> On Mon, Mar 17, 2014 at 2:40 AM, Alejandro Abdelnur <[email protected]> 
>>> wrote:
>>> I may have expressed myself wrong. You don't need to do any test to see how 
>>> locality works with files of multiple blocks. If you are accessing a file 
>>> of more than one block over webhdfs, you only have assured locality for the 
>>> first block of the file.
>>> 
>>> Thanks.
>>> 
>>> 
>>>> On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <[email protected]> wrote:
>>>> Thank you, Mingjiang and Alejandro.
>>>> 
>>>> This is interesting.  Since we will use the data locality information for 
>>>> scheduling, we could "hack" this to get the data locality information, at 
>>>> least for the first block.  As Alejandro says, we'd have to test what 
>>>> happens for other data blocks -- e.g., what if, knowing the block sizes, 
>>>> we request the second or third block?
>>>> 
>>>> Interesting food for thought!  I see some experiments in my future!  
>>>> 
>>>> Thanks!
>>>> 
>>>> 
>>>>> On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <[email protected]> 
>>>>> wrote:
>>>>> Well, this is for the first block of the file; the rest of the file 
>>>>> (whether its blocks are local or not) is streamed out by the same datanode. 
>>>>> For small files (one block) you'll get locality; for large files, only for 
>>>>> the first block, and by chance for other blocks if they are local to that 
>>>>> datanode.
>>>>> 
>>>>> 
>>>>> Alejandro
>>>>> (phone typing)
>>>>> 
>>>>>> On Mar 16, 2014, at 18:53, Mingjiang Shi <[email protected]> wrote:
>>>>>> 
>>>>>> According to this page: 
>>>>>> http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>>>>>>> Data Locality: The file read and file write calls are redirected to the 
>>>>>>> corresponding datanodes. It uses the full bandwidth of the Hadoop 
>>>>>>> cluster for streaming data.
>>>>>>> 
>>>>>>> A HDFS Built-in Component: WebHDFS is a first class built-in component 
>>>>>>> of HDFS. It runs inside Namenodes and Datanodes, therefore, it can use 
>>>>>>> all HDFS functionalities. It is a part of HDFS – there are no 
>>>>>>> additional servers to install
>>>>>>> 
>>>>>> 
>>>>>> So it looks like data locality is built into WebHDFS: the client will be 
>>>>>> redirected to the datanode automatically.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <[email protected]> wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I'm writing up a Google Summer of Code proposal to add HDFS support to 
>>>>>>> Disco, an Erlang MapReduce framework.  
>>>>>>> 
>>>>>>> We're interested in using WebHDFS.  I have two questions:
>>>>>>> 
>>>>>>> 1) Does WebHDFS allow querying data locality information?
>>>>>>> 
>>>>>>> 2) If the data locality information is known, can data on specific 
>>>>>>> datanodes be accessed via WebHDFS?  Or do all WebHDFS requests have to go 
>>>>>>> through a single server?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> RJ
>>>>>>> 
>>>>>>> -- 
>>>>>>> em [email protected]
>>>>>>> c 954.496.2314
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Cheers
>>>>>> -MJ
>>> 
>>> 
>>> 
>>> -- 
>>> Alejandro
