Re: How to run hadoop jar command in a clustered environment
@Chris thanks a lot, that helped a lot.

On Mon, Apr 15, 2013 at 11:02 PM, Chris Nauroth cnaur...@hortonworks.com wrote:

Hello Thoihen,

I'm moving this discussion from common-dev (questions about developing Hadoop) to user (questions about using Hadoop).

If you haven't already seen it, then I recommend reading the cluster setup documentation. It's a bit different depending on the version of the Hadoop code that you're deploying and running. You mentioned JobTracker, so I expect that you're using something from the 1.x line, but here are links to both 1.x and 2.x docs just in case:

1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
2.x/trunk: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

To address your specific questions:

1. You can run the hadoop jar command and submit MapReduce jobs from any machine that has the Hadoop software and configuration deployed and has network connectivity to the machines that make up the Hadoop cluster.

2. Yes, you can use a separate machine that is not a member of the cluster (meaning it does not run Hadoop daemons like DataNode, TaskTracker, or NodeManager). This is your choice. I've found it valuable to isolate nodes like this to prevent MR job tasks from taking processing resources away from interactive user commands, but this does mean that the resources on that node can't be utilized by MR jobs during user idle times, so it causes a small hit to overall utilization.

Hope this helps,
--Chris

On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam thoihen...@gmail.com wrote:

Hi All,

I am really new to Hadoop and installed it on my local Ubuntu machine. I created a wordcount.jar, started Hadoop with start-all.sh (which started all the Hadoop daemons; I used jps to confirm it), cd'd to hadoop/bin, ran hadoop jar x.jar, and successfully ran the MapReduce program.
Now, can someone please help me with how to run the hadoop jar command over a clustered environment, say a cluster with 50 nodes? I understand that one dedicated machine would be the namenode, another the jobtracker, and the others datanodes and tasktrackers.

1. From which machine should I run the hadoop jar command, given that I have a MapReduce jar in hand? Should I run it from the jobtracker machine, or can I run it from any machine in the cluster?

2. Can I run the MapReduce job from another machine which is not part of the cluster? If yes, how should I do it?

Please help me.

Regards
thoihen
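Chris's first answer above can be sketched as a short session on a client ("gateway") machine. This is an illustrative sketch, not from the thread itself: the install path, hostnames, jar name, and class name are all placeholder examples for a Hadoop 1.x setup.

```shell
# On the client machine (need not run any cluster daemons):
# point at the local Hadoop 1.x install that holds the cluster's config.
export HADOOP_HOME=/opt/hadoop-1.1.2        # example path
export PATH="$HADOOP_HOME/bin:$PATH"

# The client's conf/core-site.xml and conf/mapred-site.xml must name
# the cluster's NameNode and JobTracker, e.g.:
#   fs.default.name    -> hdfs://namenode-host:8020
#   mapred.job.tracker -> jobtracker-host:8021

# Submit the job exactly as on a single-node setup; the client only
# needs network connectivity to the NameNode and JobTracker.
hadoop jar wordcount.jar WordCount /input /output
```

The same invocation works from any cluster node too; the only requirement is that the deployed configuration points at the cluster's master daemons.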
Re: Newbie question - How to start working on an issue?
@Mohammad Mustaqeem, Hadoop can be built using Maven or Ant.

1. Building with Maven (assuming you are using Ubuntu):

First download the Hadoop source by issuing the command svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk (you can find this in the 'How to contribute to Hadoop' wiki). Now cd to hadoop-trunk; there you will find a file called pom.xml.

a. Type the command sudo apt-get install maven (this will install the latest Maven) if you have not installed Maven already; an old version may give you errors. Type mvn -version to see what version you have; if you don't have the latest one, just follow a.

b. Then use the command mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true (Maven will start downloading the dependencies and build).

c. The final hadoop-x.x.x-SNAPSHOT.tar.gz will be inside hadoop-trunk/hadoop-dist/target/, along with other files.

d. Unpack the tar.gz and start Hadoop as if you had downloaded it from Cloudera or from Apache. I hope you know how to start the daemons and run the basic MapReduce programs.

As for Ant: since you mentioned earlier that you are not using it, I am leaving it out here; I have already discussed building with Ant earlier.

Regards
Niranjan Singh
(sorry, my caps lock is not working properly)

On Wed, Apr 10, 2013 at 6:24 PM, Mohammad Mustaqeem 3m.mustaq...@gmail.com wrote:

@Chandrashekhar, how did you build Hadoop? Please guide me; I also want to build it. Which version of Hadoop are you using?

On Wed, Apr 10, 2013 at 5:01 PM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote:

Hello everyone,

It's been some time since I used Hadoop in my projects, and now I want to contribute back to Hadoop. This is my first time trying to contribute to Hadoop, and I do not have experience contributing to any open source project. I would like to know how to start working on an issue. Till now I have downloaded the Hadoop source code and successfully built it.
Now I have chosen one trivial issue which I think I can solve, but I do not know how to start working on it. If I have a question regarding the functionality of some piece of code, whom can I ask? Do we need to learn by debugging, or will other people who know that piece of code help us? Request you to please help.

Thanks and Regards,
Chandrash3khar

--
*With regards ---*
*Mohammad Mustaqeem*, M.Tech (CSE)
MNNIT Allahabad
9026604270
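Niranjan's Maven steps above, gathered into one sequence. The exact snapshot version in the tarball name varies with the state of trunk, so a glob is used; the unpack destination is an arbitrary example.

```shell
# Step 1: check out trunk (per the 'How to contribute to Hadoop' wiki).
svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk
cd hadoop-trunk

# Step 2: build a binary distribution, skipping tests and javadoc
# for a much faster first build.
mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true

# Step 3: the distribution tarball lands under hadoop-dist/target/.
ls hadoop-dist/target/hadoop-*-SNAPSHOT.tar.gz

# Step 4: unpack it and start the daemons as with a downloaded release.
tar -xzf hadoop-dist/target/hadoop-*-SNAPSHOT.tar.gz -C "$HOME"
```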
Re: git clone hadoop taking too much time almost 12 hrs
Thanks Andrew for your suggestion, I will clone it from the mirror.

Regards
Niranjan Singh

On Wed, Apr 10, 2013 at 11:04 PM, Andrew Wang andrew.w...@cloudera.com wrote:

Hi Niranjan,

Try doing your initial clone from the github mirror instead; I found it to be much faster: https://github.com/apache/hadoop-common

I use the apache git for subsequent pulls.

Best,
Andrew

On Tue, Apr 9, 2013 at 6:15 PM, maisnam ns maisnam...@gmail.com wrote:

Hi,

I am trying to execute git clone git://git.apache.org/hadoop-common.git so that I can set up a development environment for Hadoop under the Eclipse IDE, but it is taking too much time. Can somebody let me know why it is taking so long? I have a high-speed internet connection, and I don't think connectivity is the issue here.

Thanks
Niranjan Singh
Re: git clone hadoop taking too much time almost 12 hrs
Thanks moses and Harsh. Harsh, I've bookmarked your blog, nice info.

Regards
Niranjan

On Thu, Apr 11, 2013 at 12:09 AM, Harsh J ha...@cloudera.com wrote:

I once blogged about cloning big repositories after experiencing how mammoth Android's repos were: http://www.harshj.com/2010/08/29/a-less-known-thing-about-cloning-git-repositories/

Try a git clone with a --depth=1 option, to reduce the total download by not getting all the history objects. This has some side effects vs. a regular clone, but should be fine for contributions.

On Wed, Apr 10, 2013 at 11:53 PM, mugisha moses mossp...@gmail.com wrote:

The whole repo is about 290 MB, so make sure you have a decent internet connection.

--
Harsh J
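Harsh's shallow-clone suggestion, combined with Andrew's mirror URL from the earlier reply, looks like this as a sketch. The --unshallow step is an assumption on my part about what you'd run later, not something from the thread:

```shell
# Shallow clone: fetch only the latest snapshot, not the full history,
# which cuts the download dramatically on a slow link.
git clone --depth=1 https://github.com/apache/hadoop-common

# Later, if full history is needed (e.g. for git log or git bisect),
# convert the shallow clone into a complete one.
cd hadoop-common
git fetch --unshallow
```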
Re: Building Hadoop from source code
@Mohammad Mustaqeem - Once you create the patch as described in the 'How to contribute to Hadoop' wiki and apply the patch, the changes will be reflected in the files you have modified.

1. Now trigger the build with Ant or Maven. I tried with Ant, so the command I gave is ant clean compile bin-package. Don't forget to download ivy.jar and copy it into your ANT_HOME/lib folder. Once the build is triggered, Hadoop should get built along with the changes you made.

If I am not mistaken, you modified some Hadoop files, say BlockLocation.java, in your Hadoopx.x\src\core\org\apache\hadoop\fs\BlockLocation.java. The jar will be in Hadoopx.x\build\hadoop-0.20.3-dev-core.jar (in my version).

Hope this clears your doubt.

Regards
Niranjan Singh

On Tue, Apr 9, 2013 at 1:38 PM, Mohammad Mustaqeem 3m.mustaq...@gmail.com wrote:

@Steve, I am new to Hadoop development. Can you please tell me the location of the tar file?

On Tue, Apr 9, 2013 at 12:09 AM, Steve Loughran ste...@hortonworks.com wrote:

On 8 April 2013 16:08, Mohammad Mustaqeem 3m.mustaq...@gmail.com wrote:

Please tell me what I am doing wrong. What's the problem?

A lot of these seem to be network-related tests. You can turn off all the tests; look in BUILDING.txt at the root of the source tree for the various operations, then add -DskipTests to the end of every command, such as:

mvn package -Pdist -Dtar -DskipTests

to build the .tar packages, or:

mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true

to turn off the javadoc creation too, for an even faster build.

--
*With regards ---*
*Mohammad Mustaqeem*, M.Tech (CSE)
MNNIT Allahabad
9026604270
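Niranjan's Ant workflow above, as one sketch with the ivy step made explicit. The ivy jar filename, ANT_HOME path, and source directory name are illustrative; they depend on your setup and Hadoop version.

```shell
# Ant needs ivy on its classpath to resolve Hadoop's dependencies;
# drop the downloaded ivy jar into Ant's lib directory first.
cp ivy-2.2.0.jar "$ANT_HOME/lib/"        # example ivy version

# Apply your patch, then clean, compile, and package the tree.
cd hadoop-0.20.x                          # example source directory
ant clean compile bin-package

# The rebuilt core jar (version string varies) appears under build/.
ls build/hadoop-*-core.jar
```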
Re: Building Hadoop from source code
@Mohammad Mustaqeem, please follow the screenshots one by one:

cd build - there you will find the hadoop-0.20.3-dev directory. Now cd hadoop-0.20.3-dev. The bin directory inside it contains all the scripts you may need to start Hadoop.

1. Go to that bin directory and format your namenode.
2. Configure your site XMLs.
3. Run start-all.sh (note there are other ways also).
4. Run jps to see if all daemons are running.

On Tue, Apr 9, 2013 at 4:54 PM, Mohammad Mustaqeem 3m.mustaq...@gmail.com wrote:

Please, can anyone tell me where the built Hadoop is, that can be used to install the Hadoop cluster?

On Tue, Apr 9, 2013 at 2:32 PM, Mohammad Mustaqeem 3m.mustaq...@gmail.com wrote:

@Ling, you mean to say to run find -name *tar*? I ran it but don't know which file will be used to install Hadoop.

@Niranjan, I haven't changed anything. I just executed svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk to get the Hadoop source code in the hadoop-trunk directory. After that I executed cd hadoop-trunk and finally mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true. I want to ask: what is the location of the tar file created which can be used to install Hadoop?

--
*With regards ---*
*Mohammad Mustaqeem*, M.Tech (CSE)
MNNIT Allahabad
9026604270
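The four startup steps Niranjan lists can be sketched as commands. The build directory name matches his example; the fs.default.name value is an illustrative single-node setting, not from the thread.

```shell
cd build/hadoop-0.20.3-dev/bin

# 1. Format the namenode (first run only -- this wipes HDFS metadata).
./hadoop namenode -format

# 2. The site XMLs in ../conf must be configured before starting,
#    e.g. fs.default.name = hdfs://localhost:8020 in core-site.xml.

# 3. Start all daemons (HDFS and MapReduce).
./start-all.sh

# 4. Verify: jps should list NameNode, DataNode, SecondaryNameNode,
#    JobTracker, and TaskTracker.
jps
```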
git clone hadoop taking too much time almost 12 hrs
Hi,

I am trying to execute git clone git://git.apache.org/hadoop-common.git so that I can set up a development environment for Hadoop under the Eclipse IDE, but it is taking too much time. Can somebody let me know why it is taking so long? I have a high-speed internet connection, and I don't think connectivity is the issue here.

Thanks
Niranjan Singh