This is the branch for 0.7.1 RC0 and all tests are working as expected: https://svn.apache.org/repos/asf/whirr/branches/branch-0.7

Can you give it a try? I'm still checking mapred.child.ulimit.
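A minimal way to try that branch locally, assuming a standard Subversion checkout and Maven build (the target directory name below is arbitrary):

    # check out the 0.7.1 RC0 branch and build it; assumes svn and Maven are installed
    svn checkout https://svn.apache.org/repos/asf/whirr/branches/branch-0.7 whirr-branch-0.7
    cd whirr-branch-0.7
    mvn clean install -DskipTests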
On Thu, Feb 23, 2012 at 9:05 PM, Edmar Ferreira <[email protected]> wrote:

Just some changes in install_hadoop.sh to install ruby and some dependencies. I'm running Whirr from trunk and I built it 5 days ago, I guess. Do you think I need to do an svn checkout and build it again?

On Thu, Feb 23, 2012 at 6:53 PM, Andrei Savu <[email protected]> wrote:

It's strange that this is happening, because the integration tests work as expected (we are actually running MR jobs).

Are you adding any other options?

On Thu, Feb 23, 2012 at 8:50 PM, Andrei Savu <[email protected]> wrote:

That looks like a change we've made in https://issues.apache.org/jira/browse/WHIRR-490

It seems like "unlimited" is not a valid value for mapred.child.ulimit. Let me investigate a bit more.

In the meantime you can add to your .properties file something like:

hadoop-mapreduce.mapred.child.ulimit=<very-large-number>
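For reference, that override goes into the cluster recipe (.properties file) along these lines; mapred.child.ulimit is a virtual memory limit expressed in kilobytes, and the number below is only an illustrative placeholder, not a value suggested in this thread:

    # hypothetical Whirr recipe fragment; 4194304 KB (4 GB) is just a placeholder
    hadoop-mapreduce.mapred.child.ulimit=4194304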
On Thu, Feb 23, 2012 at 8:36 PM, Edmar Ferreira <[email protected]> wrote:

I changed it and the cluster is running and I can access the fs and submit jobs, but all jobs always fail with this strange error:

java.lang.NumberFormatException: For input string: "unlimited"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:481)
        at java.lang.Integer.valueOf(Integer.java:570)
        at org.apache.hadoop.util.Shell.getUlimitMemoryCommand(Shell.java:86)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:379)

Also, when I try to access the full error log I see this in the browser:

HTTP ERROR: 410
Failed to retrieve stdout log for task: attempt_201202232026_0001_m_000005_0
RequestURI=/tasklog

My proxy is running and I'm using the SOCKS proxy on localhost 6666.

On Thu, Feb 23, 2012 at 5:25 PM, Andrei Savu <[email protected]> wrote:

That should work, but I recommend you try:

http://apache.osuosl.org/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz

archive.apache.org is extremely unreliable.

On Thu, Feb 23, 2012 at 7:18 PM, Edmar Ferreira <[email protected]> wrote:

I will destroy this cluster and launch it again with these lines in the properties:

whirr.hadoop.version=0.20.2
whirr.hadoop.tarball.url=http://archive.apache.org/dist/hadoop/core/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz

Any other ideas?

On Thu, Feb 23, 2012 at 5:16 PM, Andrei Savu <[email protected]> wrote:

Yep, so I think this is the root cause. I'm pretty sure you need to make sure you are running the same version.

On Thu, Feb 23, 2012 at 7:14 PM, Edmar Ferreira <[email protected]> wrote:

When I run hadoop version on one cluster machine I get:

Warning: $HADOOP_HOME is deprecated.

Hadoop 0.20.205.0
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205 -r 1179940
Compiled by hortonfo on Fri Oct 7 06:20:32 UTC 2011

When I run hadoop version on my local machine I get:

Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

On Thu, Feb 23, 2012 at 5:05 PM, Andrei Savu <[email protected]> wrote:

Does the local Hadoop version match the remote one?

On Thu, Feb 23, 2012 at 7:00 PM, Edmar Ferreira <[email protected]> wrote:

Yes, I did a

export HADOOP_CONF_DIR=~/.whirr/hadoop/

before running hadoop fs -ls

On Thu, Feb 23, 2012 at 4:56 PM, Ashish <[email protected]> wrote:

Did you set HADOOP_CONF_DIR=~/.whirr/<your cluster name> from the shell where you are running the hadoop command?

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
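Putting the client-side steps together, a minimal sketch looks like this; the cluster name "hadoop" and the proxy script path are taken from the Whirr log excerpt below, and listing / is only an example command:

    # terminal 1: start the SOCKS proxy script written by Whirr
    sh ~/.whirr/hadoop/hadoop-proxy.sh

    # terminal 2: point the local Hadoop client at the Whirr-generated config
    export HADOOP_CONF_DIR=~/.whirr/hadoop/
    hadoop fs -ls /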
On Fri, Feb 24, 2012 at 12:23 AM, Andrei Savu <[email protected]> wrote:

That looks fine.

Anything interesting in the Hadoop logs on the remote machines? Are all the daemons running as expected?

On Thu, Feb 23, 2012 at 6:48 PM, Edmar Ferreira <[email protected]> wrote:

Last lines:

2012-02-23 16:04:30,241 INFO [org.apache.whirr.actions.ScriptBasedClusterAction] (main) Finished running configure phase scripts on all cluster instances
2012-02-23 16:04:30,241 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Completed configuration of hadoop role hadoop-namenode
2012-02-23 16:04:30,241 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Namenode web UI available at http://ec2-23-20-110-12.compute-1.amazonaws.com:50070
2012-02-23 16:04:30,242 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Wrote Hadoop site file /Users/edmaroliveiraferreira/.whirr/hadoop/hadoop-site.xml
2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Wrote Hadoop proxy script /Users/edmaroliveiraferreira/.whirr/hadoop/hadoop-proxy.sh
2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopJobTrackerClusterActionHandler] (main) Completed configuration of hadoop role hadoop-jobtracker
2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopJobTrackerClusterActionHandler] (main) Jobtracker web UI available at http://ec2-23-20-110-12.compute-1.amazonaws.com:50030
2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopDataNodeClusterActionHandler] (main) Completed configuration of hadoop role hadoop-datanode
2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopTaskTrackerClusterActionHandler] (main) Completed configuration of hadoop role hadoop-tasktracker
2012-02-23 16:04:30,253 INFO [org.apache.whirr.actions.ScriptBasedClusterAction] (main) Finished running start phase scripts on all cluster instances
2012-02-23 16:04:30,257 DEBUG [org.apache.whirr.service.ComputeCache] (Thread-3) closing ComputeServiceContext {provider=aws-ec2, endpoint=https://ec2.us-east-1.amazonaws.com, apiVersion=2010-06-15, buildVersion=, identity=08WMRG9HQYYGVQDT57R2, iso3166Codes=[US-VA, US-CA, US-OR, BR-SP, IE, SG, JP-13]}

On Thu, Feb 23, 2012 at 4:31 PM, Andrei Savu <[email protected]> wrote:

I think it's the first time I see this. Anything interesting in the logs?

On Thu, Feb 23, 2012 at 6:27 PM, Edmar Ferreira <[email protected]> wrote:

Hi guys,

When I launch a cluster and run the proxy everything seems to be right, but when I try to use any command in hadoop I get this error:

Bad connection to FS. command aborted.

Any suggestions?

Thanks

--
Edmar Ferreira
Co-Founder at Everwrite
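For context on why the proxy has to be running: the hadoop-site.xml that Whirr writes under ~/.whirr/hadoop routes client RPC through the SOCKS proxy. The usual generated settings look roughly like the sketch below (reconstructed from how Whirr normally configures this, not copied from this cluster's file):

    <!-- route Hadoop client RPC through the SOCKS proxy started by hadoop-proxy.sh -->
    <property>
      <name>hadoop.rpc.socket.factory.class.default</name>
      <value>org.apache.hadoop.net.SocksSocketFactory</value>
    </property>
    <property>
      <name>hadoop.socks.server</name>
      <value>localhost:6666</value>
    </property>

If the proxy is not running (or listens on a different port than the config expects), client commands cannot reach the namenode and fail with errors like the "Bad connection to FS" above.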
