We'd be very excited to see a pluggable mesos fetcher!
--
Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150

________________________________
From: Ken Sipe [[email protected]]
Sent: 11 May 2016 08:40
To: [email protected]
Subject: Re: Enable s3a for fetcher

Jamie,

I’m in Europe this week… so the timing of my responses is out of sync / delayed. There are 2 issues to work with here. The first is having a pluggable mesos fetcher… sounds like that is scheduled for 0.30. The other is what is available on dcos. Could you move that discussion to that mailing list? I will definitely work with you on getting this resolved.

ken

On May 10, 2016, at 3:45 PM, Briant, James <[email protected]> wrote:

Ok. Thanks Joseph. I will figure out how to get a more recent hadoop onto my dcos agents then.

Jamie

From: Joseph Wu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 10, 2016 at 1:40 PM
To: user <[email protected]>
Subject: Re: Enable s3a for fetcher

I can't speak to what DCOS does or will do (you can ask on the associated mailing list: [email protected]).

We will be maintaining existing functionality for the fetcher, which means supporting the schemes:
* file
* http, https, ftp, ftps
* hdfs, hftp, s3, s3n <-- These rely on hadoop.

And we will retain the --hadoop_home agent flag, which you can use to specify the hadoop binary.

Other schemes might work right now, if you hack around with your node setup. But there's no guarantee that your hack will work between Mesos versions. In future, we will associate a fetcher plugin for each scheme. And you will be able to load custom fetcher plugins for additional schemes.
TLDR: no "nerfing" and less hackiness :)

On Tue, May 10, 2016 at 12:58 PM, Briant, James <[email protected]> wrote:

This is the mesos latest documentation:

If the requested URI is based on some other protocol, then the fetcher tries to utilise a local Hadoop client and hence supports any protocol supported by the Hadoop client, e.g., HDFS, S3. See the slave configuration documentation <http://mesos.apache.org/documentation/latest/configuration/> for how to configure the slave with a path to the Hadoop client. [emphasis added]

What you are saying is that dcos simply won't install hadoop on agents?

Next question then: will you be nerfing fetcher.cpp, or will I be able to install hadoop on the agents myself, such that mesos will recognize s3a?

From: Joseph Wu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 10, 2016 at 12:20 PM
To: user <[email protected]>
Subject: Re: Enable s3a for fetcher

Mesos does not explicitly support HDFS and S3. Rather, Mesos will assume you have a hadoop binary and use it (blindly) for certain types of URIs. If the hadoop binary is not present, the mesos-fetcher will fail to fetch your HDFS or S3 URIs.

Mesos does not ship/package hadoop, so these URIs are not expected to work out of the box (for plain Mesos distributions). In all cases, the operator must preconfigure hadoop on each node (similar to how Docker in Mesos works).

Here's the epic tracking the modularization of the mesos-fetcher (I estimate it'll be done by 0.30):
https://issues.apache.org/jira/browse/MESOS-3918

^ Once done, it should be easier to plug in more fetchers, such as one for your use-case.

On Tue, May 10, 2016 at 11:21 AM, Briant, James <[email protected]> wrote:

I’m happy to have a default IAM role on the box that can read-only fetch from my s3 bucket.
s3a gets the credentials from AWS instance metadata. It works.

If hadoop is gone, does that mean that hdfs: URIs don’t work either? Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.

In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?

Jamie

From: Cody Maloney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 10, 2016 at 10:58 AM
To: "[email protected]" <[email protected]>
Subject: Re: Enable s3a for fetcher

The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).

Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.

Cody

On Tue, May 10, 2016 at 9:55 AM Briant, James <[email protected]> wrote:

I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:

hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
aws-java-sdk-1.7.4.jar
hadoop-aws-2.5.0-cdh5.3.3.jar

What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal" works?

Thanks,
Jamie
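One possible approach to the classpath question above, offered as a sketch rather than a confirmed fix: the stock hadoop launcher script honors the HADOOP_CLASSPATH environment variable, so the aws jars could be appended there. The helper name aws_classpath and the HADOOP_TOOLS_LIB variable are hypothetical, and whether the Mesos agent's environment actually reaches the hadoop client that the fetcher runs is an assumption to verify on your own nodes (setting the variable in hadoop-env.sh may be more reliable).

```shell
# Hypothetical helper (not part of Mesos, DC/OS, or Hadoop): print a
# colon-separated list of every *aws*.jar found in the given directory.
aws_classpath() {
  lib_dir="$1"
  classpath=""
  for jar in "$lib_dir"/*aws*.jar; do
    # If the glob matched nothing it stays literal, so skip non-existent paths.
    [ -e "$jar" ] || continue
    classpath="${classpath:+$classpath:}$jar"
  done
  printf '%s\n' "$classpath"
}

# Usage (not run here; HADOOP_TOOLS_LIB is a placeholder for the
# tools/lib directory of your own hadoop install):
#   export HADOOP_CLASSPATH="$(aws_classpath "$HADOOP_TOOLS_LIB")"
#   hadoop fs -copyToLocal s3a://<bucket>/<key> .
```

As Joseph notes in his reply, a node-level hack like this carries no guarantee of working across Mesos versions.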

