As my friend Apache would say: It Works!
Tomorrow I will try to launch a cluster with more machines (100+) and I will watch for any other issues.

PS: Looking forward to the 0.8 release; being able to launch different hardware for each role will be awesome for me.

Thanks.

On Thu, Feb 23, 2012 at 8:34 PM, Edmar Ferreira <[email protected]> wrote:

> Cool! I will build from branch 0.7 and launch a test cluster.
>
> On Thu, Feb 23, 2012 at 8:14 PM, Andrei Savu <[email protected]> wrote:
>
>> See the last 3 comments on https://issues.apache.org/jira/browse/WHIRR-490 - I will commit a fix now, both to branch-0.7 and trunk. Sorry for any inconvenience. I am happy we found this problem now, before building the release candidate.
>>
>> Thanks!
>>
>> On Thu, Feb 23, 2012 at 9:07 PM, Andrei Savu <[email protected]> wrote:
>>
>>> This is the branch for 0.7.1 RC0 and all tests are working as expected:
>>> https://svn.apache.org/repos/asf/whirr/branches/branch-0.7
>>>
>>> Can you give it a try? I'm still checking mapred.child.ulimit.
>>>
>>> On Thu, Feb 23, 2012 at 9:05 PM, Edmar Ferreira <[email protected]> wrote:
>>>
>>>> Just some changes in install_hadoop.sh to install Ruby and some dependencies.
>>>> I'm running Whirr from trunk and I built it 5 days ago, I guess.
>>>> Do you think I need to do an svn checkout and build it again?
>>>>
>>>> On Thu, Feb 23, 2012 at 6:53 PM, Andrei Savu <[email protected]> wrote:
>>>>
>>>>> It's strange this is happening, because the integration tests work as expected (we are actually running MR jobs).
>>>>>
>>>>> Are you adding any other options?
>>>>>
>>>>> On Thu, Feb 23, 2012 at 8:50 PM, Andrei Savu <[email protected]> wrote:
>>>>>
>>>>>> That looks like a change we've made in https://issues.apache.org/jira/browse/WHIRR-490
>>>>>>
>>>>>> It seems like "unlimited" is not a valid value for mapred.child.ulimit. Let me investigate a bit more.
>>>>>>
>>>>>> In the meantime you can add to your .properties file something like:
>>>>>>
>>>>>> hadoop-mapreduce.mapred.child.ulimit=<very-large-number>
>>>>>>
>>>>>> On Thu, Feb 23, 2012 at 8:36 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>
>>>>>>> I changed it and the cluster is running and I can access the fs and submit jobs, but all jobs always fail with this strange error:
>>>>>>>
>>>>>>> java.lang.NumberFormatException: For input string: "unlimited"
>>>>>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>>>>>>>     at java.lang.Integer.parseInt(Integer.java:481)
>>>>>>>     at java.lang.Integer.valueOf(Integer.java:570)
>>>>>>>     at org.apache.hadoop.util.Shell.getUlimitMemoryCommand(Shell.java:86)
>>>>>>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:379)
>>>>>>>
>>>>>>> Also, when I try to access the full error log I see this in the browser:
>>>>>>>
>>>>>>> HTTP ERROR: 410
>>>>>>>
>>>>>>> Failed to retrieve stdout log for task: attempt_201202232026_0001_m_000005_0
>>>>>>>
>>>>>>> RequestURI=/tasklog
>>>>>>>
>>>>>>> My proxy is running and I'm using the SOCKS proxy on localhost 6666.
>>>>>>>
>>>>>>> On Thu, Feb 23, 2012 at 5:25 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>
>>>>>>>> That should work, but I recommend you try:
>>>>>>>>
>>>>>>>> http://apache.osuosl.org/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
>>>>>>>>
>>>>>>>> archive.apache.org is extremely unreliable.
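(For reference, a minimal sketch of the workaround discussed above, assuming a Whirr recipe file such as hadoop.properties. The ulimit value is an arbitrary example, not something from this thread; mapred.child.ulimit is a number of KB, so pick something comfortably above the child JVM heap.)

# Hypothetical hadoop.properties fragment -- illustrative values only.
# Work around WHIRR-490: "unlimited" is not a valid value, so pass an explicit limit in KB.
hadoop-mapreduce.mapred.child.ulimit=2097152

# Pin the Hadoop tarball to a mirror that is more reliable than archive.apache.org.
whirr.hadoop.version=0.20.2
whirr.hadoop.tarball.url=http://apache.osuosl.org/hadoop/common/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz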
>>>>>>>>
>>>>>>>> On Thu, Feb 23, 2012 at 7:18 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I will destroy this cluster and launch it again with these lines in the properties:
>>>>>>>>>
>>>>>>>>> whirr.hadoop.version=0.20.2
>>>>>>>>> whirr.hadoop.tarball.url=http://archive.apache.org/dist/hadoop/core/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz
>>>>>>>>>
>>>>>>>>> Any other ideas?
>>>>>>>>>
>>>>>>>>> On Thu, Feb 23, 2012 at 5:16 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Yep, so I think this is the root cause. I'm pretty sure you need to make sure you are running the same version.
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 23, 2012 at 7:14 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> When I run hadoop version on one cluster machine I get:
>>>>>>>>>>>
>>>>>>>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>>>>>>>
>>>>>>>>>>> Hadoop 0.20.205.0
>>>>>>>>>>> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205 -r 1179940
>>>>>>>>>>> Compiled by hortonfo on Fri Oct 7 06:20:32 UTC 2011
>>>>>>>>>>>
>>>>>>>>>>> When I run hadoop version on my local machine I get:
>>>>>>>>>>>
>>>>>>>>>>> Hadoop 0.20.2
>>>>>>>>>>> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
>>>>>>>>>>> Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 23, 2012 at 5:05 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Does the local Hadoop version match the remote one?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Feb 23, 2012 at 7:00 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I did
>>>>>>>>>>>>>
>>>>>>>>>>>>> export HADOOP_CONF_DIR=~/.whirr/hadoop/
>>>>>>>>>>>>>
>>>>>>>>>>>>> before running hadoop fs -ls
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 23, 2012 at 4:56 PM, Ashish <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did you set HADOOP_CONF_DIR=~/.whirr/<your cluster name> in the shell where you are running the hadoop command?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 24, 2012 at 12:23 AM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>>>>>> > That looks fine.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Anything interesting in the Hadoop logs on the remote machines? Are all the daemons running as expected?
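(A short sketch of the version/config sanity check suggested above, assuming the cluster is named "hadoop" so the generated files live under ~/.whirr/hadoop; the remote host and user are placeholders, not values from this thread.)

# Point the local client at the Whirr-generated configuration.
export HADOOP_CONF_DIR=~/.whirr/hadoop/

# The local client should report the same release the cluster is running.
hadoop version

# Compare with one of the cluster machines (placeholder host/user).
ssh [email protected] 'hadoop version'

# Once the versions match, this should list HDFS instead of failing with
# "Bad connection to FS".
hadoop fs -ls /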
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Thu, Feb 23, 2012 at 6:48 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Last lines:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,241 INFO [org.apache.whirr.actions.ScriptBasedClusterAction] (main) Finished running configure phase scripts on all cluster instances
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,241 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Completed configuration of hadoop role hadoop-namenode
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,241 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Namenode web UI available at http://ec2-23-20-110-12.compute-1.amazonaws.com:50070
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,242 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Wrote Hadoop site file /Users/edmaroliveiraferreira/.whirr/hadoop/hadoop-site.xml
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler] (main) Wrote Hadoop proxy script /Users/edmaroliveiraferreira/.whirr/hadoop/hadoop-proxy.sh
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopJobTrackerClusterActionHandler] (main) Completed configuration of hadoop role hadoop-jobtracker
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopJobTrackerClusterActionHandler] (main) Jobtracker web UI available at http://ec2-23-20-110-12.compute-1.amazonaws.com:50030
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopDataNodeClusterActionHandler] (main) Completed configuration of hadoop role hadoop-datanode
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,246 INFO [org.apache.whirr.service.hadoop.HadoopTaskTrackerClusterActionHandler] (main) Completed configuration of hadoop role hadoop-tasktracker
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,253 INFO [org.apache.whirr.actions.ScriptBasedClusterAction] (main) Finished running start phase scripts on all cluster instances
>>>>>>>>>>>>>> >> 2012-02-23 16:04:30,257 DEBUG [org.apache.whirr.service.ComputeCache] (Thread-3) closing ComputeServiceContext {provider=aws-ec2, endpoint=https://ec2.us-east-1.amazonaws.com, apiVersion=2010-06-15, buildVersion=, identity=08WMRG9HQYYGVQDT57R2, iso3166Codes=[US-VA, US-CA, US-OR, BR-SP, IE, SG, JP-13]}
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Thu, Feb 23, 2012 at 4:31 PM, Andrei Savu <[email protected]> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> I think it's the first time I see this. Anything interesting in the logs?
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> On Thu, Feb 23, 2012 at 6:27 PM, Edmar Ferreira <[email protected]> wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Hi guys,
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> When I launch a cluster and run the proxy everything seems to be right, but when I try to use any command in hadoop I get this error:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Bad connection to FS. command aborted.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Any suggestions?
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> --
>>>>>>>>>>>>>> >>>> Edmar Ferreira
>>>>>>>>>>>>>> >>>> Co-Founder at Everwrite
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>> ashish
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Blog: http://www.ashishpaliwal.com/blog
>>>>>>>>>>>>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal

--
Edmar Ferreira
Co-Founder at Everwrite
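(Closing note: a rough outline of the end-to-end check once the cluster is up, assuming the same ~/.whirr/hadoop layout seen in the logs above. The examples jar path is an assumption based on a locally unpacked Hadoop 0.20.2 tarball, not something quoted in this thread.)

# In one terminal: start the SOCKS proxy (listens on localhost:6666) that the
# Whirr-generated client configuration expects.
sh ~/.whirr/hadoop/hadoop-proxy.sh

# In another terminal: point the client at the generated config and run a
# small job that exercises HDFS and MapReduce end to end.
export HADOOP_CONF_DIR=~/.whirr/hadoop/
hadoop jar $HADOOP_HOME/hadoop-0.20.2-examples.jar pi 10 1000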
