Actually, I am wrong: the WebHDFS REST call has an offset.

Alejandro
(phone typing)

> On Mar 17, 2014, at 10:07, Alejandro Abdelnur <[email protected]> wrote:
>
> I don't recall how skips are handled in WebHDFS, but I would assume that you'll
> get to the first block as usual, and the skip is handled by the DN serving
> the file (as WebHDFS does not know at open that you'll skip).
>
> Alejandro
> (phone typing)
>
>> On Mar 17, 2014, at 9:47, RJ Nowling <[email protected]> wrote:
>>
>> Hi Alejandro,
>>
>> The WebHDFS API allows specifying an offset and length for the request. If
>> I specify an offset that starts in the second block of a file (thus skipping
>> the first block altogether), will the namenode still direct me to a
>> datanode with the first block, or will it direct me to a datanode with the
>> second block? I.e., am I assured data locality only on the first block of
>> the file (as you're saying) or on the first block I am accessing?
>>
>> If it is as you say, then I may want to reach out to the WebHDFS developers
>> and see if they would be interested in the additional functionality.
>>
>> Thank you,
>> RJ
>>
>>
>>> On Mon, Mar 17, 2014 at 2:40 AM, Alejandro Abdelnur <[email protected]>
>>> wrote:
>>> I may have expressed myself poorly. You don't need to do any test to see how
>>> locality works with files of multiple blocks. If you are accessing a file
>>> of more than one block over WebHDFS, you only have assured locality for the
>>> first block of the file.
>>>
>>> Thanks.
>>>
>>>
>>>> On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <[email protected]> wrote:
>>>> Thank you, Mingjiang and Alejandro.
>>>>
>>>> This is interesting. Since we will use the data locality information for
>>>> scheduling, we could "hack" this to get the data locality information, at
>>>> least for the first block. As Alejandro says, we'd have to test what
>>>> happens for other data blocks -- e.g., what if, knowing the block sizes,
>>>> we request the second or third block?
>>>>
>>>> Interesting food for thought! I see some experiments in my future!
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>> On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <[email protected]>
>>>>> wrote:
>>>>> Well, this is for the first block of the file; the rest of the file
>>>>> (blocks being local or not) is streamed out by the same datanode. For
>>>>> small files (one block) you'll get locality; for large files, only for
>>>>> the first block, and by chance if other blocks are local to that datanode.
>>>>>
>>>>>
>>>>> Alejandro
>>>>> (phone typing)
>>>>>
>>>>>> On Mar 16, 2014, at 18:53, Mingjiang Shi <[email protected]> wrote:
>>>>>>
>>>>>> According to this page:
>>>>>> http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>>>>>>> Data Locality: The file read and file write calls are redirected to the
>>>>>>> corresponding datanodes. It uses the full bandwidth of the Hadoop
>>>>>>> cluster for streaming data.
>>>>>>>
>>>>>>> A HDFS Built-in Component: WebHDFS is a first class built-in component
>>>>>>> of HDFS. It runs inside Namenodes and Datanodes, therefore, it can use
>>>>>>> all HDFS functionalities. It is a part of HDFS – there are no
>>>>>>> additional servers to install
>>>>>>>
>>>>>>
>>>>>> So it looks like data locality is built into WebHDFS; the client will be
>>>>>> redirected to the datanode automatically.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <[email protected]> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm writing up a Google Summer of Code proposal to add HDFS support to
>>>>>>> Disco, an Erlang MapReduce framework.
>>>>>>>
>>>>>>> We're interested in using WebHDFS. I have two questions:
>>>>>>>
>>>>>>> 1) Does WebHDFS allow querying data locality information?
>>>>>>>
>>>>>>> 2) If the data locality information is known, can data on specific data
>>>>>>> nodes be accessed via WebHDFS? Or do all WebHDFS requests have to go
>>>>>>> through a single server?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> RJ
>>>>>>>
>>>>>>> --
>>>>>>> em [email protected]
>>>>>>> c 954.496.2314
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Cheers
>>>>>> -MJ
>>>>
>>>>
>>>>
>>>> --
>>>> em [email protected]
>>>> c 954.496.2314
>>>
>>>
>>>
>>> --
>>> Alejandro
>>
>>
>>
>> --
>> em [email protected]
>> c 954.496.2314
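For anyone who wants to try the experiment RJ describes, here is a minimal Python sketch of the request side. It only builds the documented WebHDFS OPEN URL with the `offset` and `length` parameters and pulls the host name out of a redirect `Location` header. The helper names, the host names, the ports (50070 and 50075, the usual Hadoop 2 defaults), the file path, and the 128 MiB block size are all assumptions for illustration; none of them come from the thread. The sketch does not show which datanode the namenode actually picks; that has to be checked against a real cluster.

```python
from urllib.parse import quote, urlencode, urlsplit


def block_offset(block_index, block_size):
    """Byte offset at which the zero-based block `block_index` starts."""
    if block_index < 0 or block_size <= 0:
        raise ValueError("block_index must be >= 0 and block_size > 0")
    return block_index * block_size


def open_url(namenode, path, offset=0, length=None):
    """Build a WebHDFS OPEN URL. `namenode` is 'host:port' (hypothetical)."""
    params = {"op": "OPEN", "offset": offset}
    if length is not None:
        params["length"] = length
    # quote() keeps '/' by default, so an absolute HDFS path stays readable.
    return "http://{}/webhdfs/v1{}?{}".format(
        namenode, quote(path), urlencode(params)
    )


def redirect_host(location):
    """Host name taken from the Location header of the namenode's redirect."""
    return urlsplit(location).hostname


if __name__ == "__main__":
    block_size = 128 * 1024 * 1024  # assumed block size, check your cluster
    second_block = block_offset(1, block_size)
    # Request exactly the second block of a hypothetical file.
    print(open_url("nn.example.com:50070", "/data/big.log",
                   offset=second_block, length=block_size))
    # A scheduler could compare this host with the hosts it can run tasks on.
    print(redirect_host("http://dn3.example.com:50075/webhdfs/v1/data/big.log"
                        "?op=OPEN&offset=134217728"))
```

To see the redirect target, issue the GET against the namenode without following redirects (for example with `curl -i`) and pass the `Location` header to `redirect_host`.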
