As my friend Apache would say: It Works!
Tomorrow I will try to launch a cluster with more machines (100+) and I will watch for any other issues.

PS: Looking forward to the 0.8 release; being able to launch different hardware for each role will be awesome for me.

Thanks.

On Thu, Feb 23, 2012 at 8:34 PM, Edmar Ferreira <[email protected]> wrote:

> Cool! I will build from branch 0.7 and launch a test cluster.
>
> On Thu, Feb 23, 2012 at 8:14 PM, Andrei Savu <[email protected]> wrote:
>
>> See the last 3 comments on https://issues.apache.org/jira/browse/WHIRR-490 - I will commit a fix now, both to branch-0.7 and trunk. Sorry for any inconvenience. I am happy we found this problem now, before building the release candidate.
>>
>> Thanks!
>>
>> On Thu, Feb 23, 2012 at 9:07 PM, Andrei Savu <[email protected]> wrote:
>>
>>> This is the branch for 0.7.1 RC0 and all tests are working as expected:
>>> https://svn.apache.org/repos/asf/whirr/branches/branch-0.7
>>>
>>> Can you give it a try? I'm still checking mapred.child.ulimit.
>>>
>>> On Thu, Feb 23, 2012 at 9:05 PM, Edmar Ferreira <[email protected]> wrote:
>>>
>>>> Just some changes in install_hadoop.sh to install Ruby and some dependencies.
>>>> I'm running Whirr from trunk and I built it 5 days ago, I guess.
>>>> Do you think I need to do an svn checkout and build it again?
>>>>
>>>> On Thu, Feb 23, 2012 at 6:53 PM, Andrei Savu <[email protected]> wrote:
>>>>
>>>>> It's strange this is happening, because the integration tests work as expected (we are actually running MR jobs).
>>>>>
>>>>> Are you adding any other options?
>>>>>
>>>>> On Thu, Feb 23, 2012 at 8:50 PM, Andrei Savu <[email protected]> wrote:
>>>>>
>>>>>> That looks like a change we've made in https://issues.apache.org/jira/browse/WHIRR-490
>>>>>>
>>>>>> It seems like "unlimited" is not a valid value for mapred.child.ulimit. Let me investigate a bit more.
>>>>>>
>>>>>> In the meantime you can add to your .properties file something like:
>>>>>>
>>>>>> hadoop-mapreduce.mapred.child.ulimit=<very-large-number>
>>>>>>
>>>>>> On Thu, Feb 23, 2012 at 8:36 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>
>>>>>>> I changed it and the cluster is running and I can access the fs and submit jobs, but all jobs always fail with this strange error:
>>>>>>>
>>>>>>> java.lang.NumberFormatException: For input string: "unlimited"
>>>>>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>>>>>>>     at java.lang.Integer.parseInt(Integer.java:481)
>>>>>>>     at java.lang.Integer.valueOf(Integer.java:570)
>>>>>>>     at org.apache.hadoop.util.Shell.getUlimitMemoryCommand(Shell.java:86)
>>>>>>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:379)
>>>>>>>
>>>>>>> Also, when I try to access the full error log I see this in the browser:
>>>>>>>
>>>>>>> HTTP ERROR: 410
>>>>>>>
>>>>>>> Failed to retrieve stdout log for task: attempt_201202232026_0001_m_000005_0
>>>>>>>
>>>>>>> RequestURI=/tasklog
>>>>>>>
>>>>>>> My proxy is running and I'm using the SOCKS proxy on localhost 6666.
>>>>>>>
>>>>>>> On Thu, Feb 23, 2012 at 5:25 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>
>>>>>>>> That should work, but I recommend you try:
>>>>>>>>
>>>>>>>> http://apache.osuosl.org/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
>>>>>>>>
>>>>>>>> archive.apache.org is extremely unreliable.
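(For reference, a minimal sketch of the workaround discussed above, assuming a Whirr recipe file such as hadoop.properties. The ulimit value is an arbitrary example, not something from this thread; mapred.child.ulimit is a number of KB, so pick something comfortably above the child JVM heap.)

# Hypothetical hadoop.properties fragment -- illustrative values only.
# Work around WHIRR-490: "unlimited" is not a valid value, so pass an explicit limit in KB.
hadoop-mapreduce.mapred.child.ulimit=2097152

# Pin the Hadoop tarball to a mirror that is more reliable than archive.apache.org.
whirr.hadoop.version=0.20.2
whirr.hadoop.tarball.url=http://apache.osuosl.org/hadoop/common/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz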
>>>>>>>>
>>>>>>>> On Thu, Feb 23, 2012 at 7:18 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I will destroy this cluster and launch it again with these lines in the properties:
>>>>>>>>>
>>>>>>>>> whirr.hadoop.version=0.20.2
>>>>>>>>> whirr.hadoop.tarball.url=http://archive.apache.org/dist/hadoop/core/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz
>>>>>>>>>
>>>>>>>>> Any other ideas?
>>>>>>>>>
>>>>>>>>> On Thu, Feb 23, 2012 at 5:16 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Yep, so I think this is the root cause. I'm pretty sure you need to make sure you are running the same version.
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 23, 2012 at 7:14 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> When I run hadoop version on one cluster machine I get:
>>>>>>>>>>>
>>>>>>>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>>>>>>>
>>>>>>>>>>> Hadoop 0.20.205.0
>>>>>>>>>>> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205 -r 1179940
>>>>>>>>>>> Compiled by hortonfo on Fri Oct 7 06:20:32 UTC 2011
>>>>>>>>>>>
>>>>>>>>>>> When I run hadoop version on my local machine I get:
>>>>>>>>>>>
>>>>>>>>>>> Hadoop 0.20.2
>>>>>>>>>>> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
>>>>>>>>>>> Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 23, 2012 at 5:05 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Does the local Hadoop version match the remote one?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Feb 23, 2012 at 7:00 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I did
>>>>>>>>>>>>>
>>>>>>>>>>>>> export HADOOP_CONF_DIR=~/.whirr/hadoop/
>>>>>>>>>>>>>
>>>>>>>>>>>>> before running hadoop fs -ls
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 23, 2012 at 4:56 PM, Ashish <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did you set HADOOP_CONF_DIR=~/.whirr/<your cluster name> in the shell where you are running the hadoop command?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 24, 2012 at 12:23 AM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>>>>>> > That looks fine.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Anything interesting in the Hadoop logs on the remote machines? Are all the daemons running as expected?
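(A short sketch of the version/config sanity check suggested above, assuming the cluster is named "hadoop" so the generated files live under ~/.whirr/hadoop; the remote host and user are placeholders, not values from this thread.)

# Point the local client at the Whirr-generated configuration.
export HADOOP_CONF_DIR=~/.whirr/hadoop/

# The local client should report the same release the cluster is running.
hadoop version

# Compare with one of the cluster machines (placeholder host/user).
ssh [email protected] 'hadoop version'

# Once the versions match, this should list HDFS instead of failing with
# "Bad connection to FS".
hadoop fs -ls /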
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Thu, Feb 23, 2012 at 6:48 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Last lines:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,241 INFO [org.apache.whirr.actions.ScriptBasedClusterAction] (main) Finished running configure phase scripts on all cluster instances
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,241 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Completed configuration of hadoop role hadoop-namenode
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,241 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Namenode web UI available at http://ec2-23-20-110-12.compute-1.amazonaws.com:50070
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,242 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Wrote Hadoop site file /Users/edmaroliveiraferreira/.whirr/hadoop/hadoop-site.xml
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Wrote Hadoop proxy script /Users/edmaroliveiraferreira/.whirr/hadoop/hadoop-proxy.sh
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopJobTrackerClusterActionHandler] (main) Completed configuration of hadoop role hadoop-jobtracker
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopJobTrackerClusterActionHandler] (main) Jobtracker web UI available at http://ec2-23-20-110-12.compute-1.amazonaws.com:50030
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopDataNodeClusterActionHandler] (main) Completed configuration of hadoop role hadoop-datanode
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopTaskTrackerClusterActionHandler] (main) Completed configuration of hadoop role hadoop-tasktracker
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,253 INFO [org.apache.whirr.actions.ScriptBasedClusterAction] (main) Finished running start phase scripts on all cluster instances
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,257 DEBUG [org.apache.whirr.service.ComputeCache] (Thread-3) closing ComputeServiceContext {provider=aws-ec2, endpoint=https://ec2.us-east-1.amazonaws.com, apiVersion=2010-06-15, buildVersion=, identity=08WMRG9HQYYGVQDT57R2, iso3166Codes=[US-VA, US-CA, US-OR, BR-SP, IE, SG, JP-13]}
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Thu, Feb 23, 2012 at 4:31 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> I think it's the first time I see this. Anything interesting in the logs?
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> On Thu, Feb 23, 2012 at 6:27 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Hi guys,
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> When I launch a cluster and run the proxy everything seems to be right, but when I try to use any command in hadoop I get this error:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Bad connection to FS. command aborted.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Any suggestions?
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> --
>>>>>>>>>>>>>> >>>> Edmar Ferreira
>>>>>>>>>>>>>> >>>> Co-Founder at Everwrite
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>> ashish
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Blog: http://www.ashishpaliwal.com/blog
>>>>>>>>>>>>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal

--
Edmar Ferreira
Co-Founder at Everwrite
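(Closing note: a rough outline of the end-to-end check once the cluster is up, assuming the same ~/.whirr/hadoop layout seen in the logs above. The examples jar path is an assumption based on a locally unpacked Hadoop 0.20.2 tarball, not something quoted in this thread.)

# In one terminal: start the SOCKS proxy (listens on localhost:6666) that the
# Whirr-generated client configuration expects.
sh ~/.whirr/hadoop/hadoop-proxy.sh

# In another terminal: point the client at the generated config and run a
# small job that exercises HDFS and MapReduce end to end.
export HADOOP_CONF_DIR=~/.whirr/hadoop/
hadoop jar $HADOOP_HOME/hadoop-0.20.2-examples.jar pi 10 1000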
