For info, the patch was committed today and made the cut to 0.24-rc1.

Thanks to @vinodkone for super-quick turnaround.
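
In case it helps anyone who finds this thread later, here is a minimal sketch of the failure mode described in the quoted messages below. It is not the committed patch and does not use stout: `SimpleTry` and `checkHadoopClient` are hypothetical names made up for illustration, and `SimpleTry` only imitates the one behaviour that matters here (asking a non-error result for its error message aborts). The idea is simply to call error() only when the result really is an error, and to handle a plain `false` separately.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a Try<T>-style result type.
// It holds either a value or an error message, never both.
template <typename T>
class SimpleTry {
public:
  static SimpleTry some(T value) {
    SimpleTry t;
    t.hasValue_ = true;
    t.value_ = std::move(value);
    return t;
  }

  static SimpleTry failure(std::string message) {
    SimpleTry t;
    t.message_ = std::move(message);
    return t;
  }

  bool isError() const { return !hasValue_; }

  // Only valid when a value is held.
  const T& get() const {
    assert(hasValue_);
    return value_;
  }

  // Only valid when an error is held; calling this on a value
  // (even `false`) aborts, as in the log quoted in this thread.
  const std::string& error() const {
    assert(!hasValue_);
    return message_;
  }

private:
  SimpleTry() : hasValue_(false), value_() {}

  bool hasValue_;
  T value_;
  std::string message_;
};

// Returns an empty string when the client is usable, otherwise a reason.
// The two failure cases are handled separately so that error() is never
// called on a result that merely holds `false`.
std::string checkHadoopClient(const SimpleTry<bool>& available) {
  if (available.isError()) {
    return "Hadoop Client not available: " + available.error();
  }
  if (!available.get()) {
    return "Hadoop Client not available";
  }
  return "";
}
```

With this shape, `checkHadoopClient(SimpleTry<bool>::some(false))` reports that the client is missing without ever touching error(), which is the case that crashed the fetcher.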

On Tue, Aug 18, 2015 at 10:45 AM, Marco Massenzio <[email protected]>
wrote:

> Hi Ashwanth,
> I've pushed a fix out for review <https://reviews.apache.org/r/37584/>;
> we'll see if it makes it in time for 0.24.
> As for the version, you can quickly verify that by running `mesos-master
> --version` (or just look at the very beginning of the logs, it will tell
> you a bunch of stuff about version, build, etc.)
> I am sorry, I don't really know enough about setting up Hadoop on Mesos to
> give you any useful guidance; from a quick glance at the code, it seems to
> me that, if the URI is a `hdfs://` one, the only way to retrieve the
> tarball is via HDFS (so you will need the hdfs client to be available on
> the Slave(s)).
> If you do use an HTTP URI (http://....) then it should work just fine.
> Hopefully others will be able to chime in with a more informed view.
> *Marco Massenzio*
> *Distributed Systems Engineer*
> http://codetrips.com
> On Tue, Aug 18, 2015 at 2:46 AM, Ashwanth Kumar <[email protected]> wrote:
>> Thanks Marco for the update.
>>
>> My understanding of the hadoop mesos framework was that the executor would
>> download the hadoop distro from mapred.mesos.executor.uri and launch the
>> TaskTrackers (TTs). I didn't know that downloading from HDFS requires the
>> `hdfs` binary in the PATH. I don't have a hadoop setup on the mesos slaves.
>> Should I go ahead and add one?
>>
>> Regarding the line number mismatch, I installed the package through
>> Mesosphere; not sure if that's the reason.
>>
>>
>> On Tue, Aug 18, 2015 at 1:22 PM, Marco Massenzio <[email protected]>
>> wrote:
>>
>>> Are you sure this is a 0.21.1 cluster? The line numbers in the logs match
>>> the code in Mesos 0.23.0.
>>>
>>> This is, however, a genuine bug (src/launcher/fetcher.cpp#L99):
>>>
>>>   Try<bool> available = hdfs.available();
>>>
>>>   if (available.isError() || !available.get()) {
>>>     return Error("Skipping fetch with Hadoop Client as"
>>>                  " Hadoop Client not available: " + available.error());
>>>   }
>>>
>>> The root cause is (probably) that the HDFS client is not available on the
>>> slave. In that case hdfs.available() does not return an error; it returns
>>> a plain `false`, which is fine.
>>> The bug is in the return statement: because `!available.get()` is true, we
>>> enter the branch and then call available.error(), even though `available`
>>> holds `false` rather than an error. That call trips the assertion.
>>>
>>> This was a 'latent' bug that *may* have been exposed by (my) recent
>>> refactoring of os::shell which is used by hdfs.available() under the covers.
>>> (this is a bit unclear, though, as that refactoring is post-0.23)
>>>
>>> Be that as it may, I've filed
>>> https://issues.apache.org/jira/browse/MESOS-3287: the fix is trivial and
>>> I may be able to sneak it into 0.24 (which we're cutting now).
>>>
>>> Thanks for reporting!
>>>
>>> PS - bad code aside, the root cause is that the `hdfs` binary seems to be
>>> unreachable on the slave: is it installed in the PATH of the user under
>>> which the slave binary executes?
>>>
>>>
>>>
>>> *Marco Massenzio*
>>>
>>> *Distributed Systems Engineer*
>>> http://codetrips.com
>>>
>>> On Mon, Aug 17, 2015 at 10:46 PM, Ashwanth Kumar <[email protected]>
>>> wrote:
>>>
>>>> We have a 20-node mesos cluster running mesos v0.21.1. We have been
>>>> running marathon on top of this setup without any problems for ~4 months.
>>>> I'm now trying to get the hadoop mesos <https://github.com/mesos/hadoop/>
>>>> integration working, but the TaskTrackers that get launched are failing
>>>> with the following error:
>>>>
>>>> I0818 05:36:35.058688 24428 fetcher.cpp:409] Fetcher Info:
>>>> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150706-075218-1611773194-5050-28439-S473\/hadoop","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/hdfs.prod:54310\/user\/ashwanth\/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/20150706-075218-1611773194-5050-28439-S473\/frameworks\/20150706-075218-1611773194-5050-28439-4532\/executors\/executor_Task_Tracker_4129\/runs\/c26f52d4-4055-46fa-b999-11d73f2096dd","user":"hadoop"}
>>>> I0818 05:36:35.059806 24428 fetcher.cpp:364] Fetching URI 'hdfs://hdfs.prod:54310/user/ashwanth/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz'
>>>> I0818 05:36:35.059821 24428 fetcher.cpp:238] Fetching directly into the sandbox directory
>>>> I0818 05:36:35.059835 24428 fetcher.cpp:176] Fetching URI 'hdfs://hdfs.prod:54310/user/ashwanth/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz'
>>>> *mesos-fetcher: /tmp/mesos-build/mesos-repo/3rdparty/libprocess/3rdparty/stout/include/stout/try.hpp:90: const string& Try<T>::error() const [with T = bool; std::string = std::basic_string<char>]: Assertion `data.isNone()' failed.*
>>>> *** Aborted at 1439876195 (unix time) try "date -d @1439876195" if you are using GNU date ***
>>>> PC: @       0x343ee32635 (unknown)
>>>> *** SIGABRT (@0x5f6c) received by PID 24428 (TID 0x7f988832f820) from PID 24428; stack trace: ***
>>>>     @       0x343f20f710 (unknown)
>>>>     @       0x343ee32635 (unknown)
>>>>     @       0x343ee33e15 (unknown)
>>>>     @       0x343ee2b75e (unknown)
>>>>     @       0x343ee2b820 (unknown)
>>>>     @           0x408b0a Try<>::error()
>>>>     @           0x40cbcf download()
>>>>     @           0x4098a3 main
>>>>     @       0x343ee1ed5d (unknown)
>>>>     @           0x40aeb5 (unknown)
>>>> Failed to synchronize with slave (it's probably exited)
>>>>
>>>> Environment
>>>> - EC2 Machines
>>>> - Output of lsb_release -a
>>>> LSB Version:  :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>>> Distributor ID: CentOS
>>>> Description:  CentOS release 6.5 (Final)
>>>> Release:  6.5
>>>> Codename: Final
>>>>
>>>> Any ideas what I'm doing wrong?
>>>>
>>>> --
>>>> -- Ashwanth Kumar
>>>>
>>>
>>>
>>
>>
>> --
>> -- Ashwanth Kumar
>>
