Hi Tim,

I just created a review: https://reviews.apache.org/r/27483/

It's my first stab at it, and I will try to add more tests as I figure out how to do the Hadoop mocking. Have a look and let me know what you think so far.
-- Ankur

> On 1 Nov 2014, at 20:05, Ankur Chauhan <[email protected]> wrote:
>
> Yeah, I saw that the minute I pressed send. I'll start the Review Board request so that people can have a look at the change.
>
> -- Ankur
>
>> On 1 Nov 2014, at 20:01, Tim Chen <[email protected]> wrote:
>>
>> Hi Ankur,
>>
>> There is a fetcher_tests.cpp in src/tests.
>>
>> Tim
>>
>> On Sat, Nov 1, 2014 at 7:27 PM, Ankur Chauhan <[email protected]> wrote:
>> Hi Tim,
>>
>> I am trying to find/write some test cases. I couldn't find a fetcher_tests.{cpp|hpp}, so once I have something, I'll post it on Review Board. I am new to gmock/gtest, so bear with me while I get up to speed.
>>
>> -- Ankur
>>
>>> On 1 Nov 2014, at 19:23, Timothy Chen <[email protected]> wrote:
>>>
>>> Hi Ankur,
>>>
>>> Can you post on Review Board? We can discuss the code further there.
>>>
>>> Tim
>>>
>>> Sent from my iPhone
>>>
>>> On Nov 1, 2014, at 6:29 PM, Ankur Chauhan <[email protected]> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> I don't think there is an issue directly in line with what I wanted, but the closest one I could find in JIRA is https://issues.apache.org/jira/browse/MESOS-1711
>>>>
>>>> I have a branch (https://github.com/ankurcha/mesos/compare/prefer_hadoop_fetcher) with a change that would let users pass any HDFS-compatible URI to the mesos-fetcher, but maybe you can weigh in on it. Do you think this is the right track? If so, I would like to pick up this issue and submit a patch for review.
>>>>
>>>> -- Ankur
>>>>
>>>>> On 1 Nov 2014, at 04:32, Tom Arnfeld <[email protected]> wrote:
>>>>>
>>>>> Completely +1 to this.
>>>>> There are now quite a lot of Hadoop-compatible filesystem wrappers out in the wild, and this would certainly be very useful.
>>>>>
>>>>> I'm happy to contribute a patch. Here are a few related issues that might be of interest:
>>>>>
>>>>> - https://issues.apache.org/jira/browse/MESOS-1887
>>>>> - https://issues.apache.org/jira/browse/MESOS-1316
>>>>> - https://issues.apache.org/jira/browse/MESOS-336
>>>>> - https://issues.apache.org/jira/browse/MESOS-1248
>>>>>
>>>>> On 31 October 2014 22:39, Tim Chen <[email protected]> wrote:
>>>>> I believe there is already a JIRA ticket for this; if you search for "fetcher" in the Mesos JIRA I think you can find it.
>>>>>
>>>>> Tim
>>>>>
>>>>> On Fri, Oct 31, 2014 at 3:27 PM, Ankur Chauhan <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I have been looking at some of the code around the fetcher and saw something interesting. The fetcher::fetch method depends on a hard-coded list of URL schemes. No doubt this works, but it is very restrictive. Hadoop/HDFS, in contrast, is quite flexible when it comes to fetching from URLs: it can handle a large number of URL types and can be extended by adding configuration to conf/hdfs-site.xml and core-site.xml.
>>>>>
>>>>> What I am proposing is that we refactor fetcher.cpp to prefer HDFS (via hdfs/hdfs.hpp) for all fetching whenever HADOOP_HOME is set and $HADOOP_HOME/bin/hadoop is available. This logic already exists and we can just use it.
>>>>> The fallback logic of using net::download or a local file copy may be left in place for installations that do not have Hadoop configured. This means that if Hadoop is present we can directly fetch URLs such as tachyon://..., snackfs://..., cfs://..., ftp://..., s3://..., http://..., and file://... with no extra effort. This makes for a much better experience when it comes to debugging and extensibility.
>>>>>
>>>>> What do others think about this?
>>>>>
>>>>> - Ankur

