Re: How to run hadoop jar command in a clustered environment

2013-04-15 Thread maisnam ns
@Chris, thanks a lot, that helped a lot.


On Mon, Apr 15, 2013 at 11:02 PM, Chris Nauroth cnaur...@hortonworks.com wrote:

 Hello Thoihen,

 I'm moving this discussion from common-dev (questions about developing
 Hadoop) to user (questions about using Hadoop).

 If you haven't already seen it, then I recommend reading the cluster setup
 documentation.  It's a bit different depending on the version of the Hadoop
 code that you're deploying and running.  You mentioned JobTracker, so I
 expect that you're using something from the 1.x line, but here are links to
 both 1.x and 2.x docs just in case:

 1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
 2.x/trunk:

 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

 To address your specific questions:

 1. You can run the hadoop jar command and submit MapReduce jobs from any
 machine that has the Hadoop software and configuration deployed and has
 network connectivity to the machines that make up the Hadoop cluster.

 2. Yes, you can use a separate machine that is not a member of the cluster
 (meaning it does not run Hadoop daemons like DataNode, TaskTracker, or
 NodeManager).  This is your choice.  I've found it valuable to isolate
 nodes like this to prevent MR job tasks from taking processing resources
 away from interactive user commands, but this does mean that the resources
 on that node can't be utilized by MR jobs during user idle times, so it
 causes a small hit to overall utilization.
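Point 1 can be sketched as follows; the install path, jar name, and class name here are hypothetical, assuming a 1.x-style client machine that has the cluster's configuration deployed:

```shell
# Hypothetical sketch: submitting a job from a client node that runs no
# Hadoop daemons. Assumes the Hadoop 1.x software is installed here and
# conf/ (core-site.xml, mapred-site.xml) points at the cluster's
# NameNode and JobTracker.
export HADOOP_HOME=/opt/hadoop-1.1.2       # assumed install location
export HADOOP_CONF_DIR=$HADOOP_HOME/conf   # must mirror the cluster config

# The client reads the JobTracker address from mapred-site.xml and
# submits the job over the network, exactly as on a single-node setup.
$HADOOP_HOME/bin/hadoop jar wordcount.jar WordCount /input /output
```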

 Hope this helps,
 --Chris


 On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam thoihen...@gmail.com
 wrote:

  Hi All,
 
  I am really new to Hadoop and installed it on my local Ubuntu machine.
  I also created a wordcount.jar and started Hadoop with start-all.sh, which
  started all the Hadoop daemons, and I used jps to confirm it. I cd'd to
  hadoop/bin, ran hadoop jar x.jar, and successfully ran the MapReduce
  program.
 
  Now, can someone please tell me how I should run the hadoop jar command
  in a clustered environment, say a cluster with 50 nodes? I know a
  dedicated machine would be the NameNode, another the JobTracker, and the
  others DataNodes and TaskTrackers.
 
  1. From which machine should I run the hadoop jar command, given that I
  have a MapReduce jar in hand? Is it the JobTracker machine from which I
  should run it, or can I run the hadoop jar command from any machine in
  the cluster?

  2. Can I run the MapReduce job from another machine which is not part of
  the cluster? If yes, how should I do it?
 
  Please help me.
 
  Regards
  thoihen
 



Re: Newbie question - How to start working on an issue?

2013-04-10 Thread maisnam ns
@Mohammad Mustaqeem, Hadoop can be built using Maven or Ant.

1. Building with Maven, assuming you are using Ubuntu:
First download Hadoop by issuing the command svn checkout
http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk (you can
find this in the 'How to Contribute to Hadoop' wiki).
Now cd to hadoop-trunk; there you will find a file called pom.xml.

a. Type the command sudo apt-get install maven (this will install the
latest Maven) if you have not installed Maven; if you have a Maven version
older than 3, it will give you errors. Type mvn -version to see what
version you have, and if you don't have a recent one just follow step a.
b. Then use the command
mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true (Maven
will now start downloading the dependencies and building).
c. The final hadoop-x.x.x-SNAPSHOT.tar.gz will be inside
hadoop-trunk/hadoop-dist/target/, along with other files.
d. Unpack the tar.gz and start your Hadoop as if you had downloaded it
from Cloudera or from Apache. I hope you know how to start the daemons
and run the basic MapReduce programs.
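The steps above can be collected into one shell session (a sketch; the exact tarball version in step c varies with trunk):

```shell
# Steps a-d above in one place (Ubuntu assumed).
sudo apt-get install maven                  # step a: installs Maven 3.x
svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk
cd hadoop-trunk                             # pom.xml lives here
mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true  # step b
ls hadoop-dist/target/                      # step c: hadoop-x.x.x-SNAPSHOT.tar.gz
tar -xzf hadoop-dist/target/hadoop-*-SNAPSHOT.tar.gz            # step d: unpack
```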

As for Ant: as you mentioned earlier, you are not using it, so I am leaving
it out; I have already discussed building with Ant earlier.

Regards
niranjan singh

(sorry my caps lock is not working properly )




On Wed, Apr 10, 2013 at 6:24 PM, Mohammad Mustaqeem
3m.mustaq...@gmail.com wrote:

 @Chandrashekhar, how did you build Hadoop?
 Please guide me.
 I also want to build it.
 Which version of Hadoop are you using?


 On Wed, Apr 10, 2013 at 5:01 PM, Chandrashekhar Kotekar 
 shekhar.kote...@gmail.com wrote:

  Hello everyone,
 
  It's been some time since I have used Hadoop in my projects, and now I
  want to contribute back to Hadoop. This is my first time trying to
  contribute to Hadoop; I do not have experience of contributing to any
  open source project.
 
  I would like to know how to start working on an issue. So far I have
  downloaded the Hadoop source code and successfully built it.

  Now I have chosen one trivial issue which I think I can solve, but I do
  not know how to start working on it.
 
  If I have a question regarding the functionality of some piece of code,
  whom can I ask?

  Do we need to learn by debugging, or will other people who know that
  piece of code help us?
 
  Request you to please help.
 
  Thanks and Regards,
  Chandrash3khar
 



 --
 *With regards ---*
 *Mohammad Mustaqeem*,
 M.Tech (CSE)
 MNNIT Allahabad
 9026604270



Re: git clone hadoop taking too much time almost 12 hrs

2013-04-10 Thread maisnam ns
Thanks Andrew for your suggestion,I will clone it from the mirror.

Regards
Niranjan Singh


On Wed, Apr 10, 2013 at 11:04 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi Niranjan,

 Try doing your initial clone from the GitHub mirror instead; I found it to
 be much faster:

 https://github.com/apache/hadoop-common

 I use the apache git for subsequent pulls.

 Best,
 Andrew


 On Tue, Apr 9, 2013 at 6:15 PM, maisnam ns maisnam...@gmail.com wrote:

  Hi,
 
  I am trying to execute git clone
  git://git.apache.org/hadoop-common.git so that I can set up a
  development environment for Hadoop under the Eclipse IDE, but it is
  taking too much time.
 
  Can somebody let me know why it is taking so long? I have a high-speed
  internet connection, and I don't think connectivity is the issue here.
 
  Thanks
  Niranjan Singh
 



Re: git clone hadoop taking too much time almost 12 hrs

2013-04-10 Thread maisnam ns
Thanks moses and Harsh.

Harsh , I 've bookmarked your blog, nice info.

Regards
Niranjan


On Thu, Apr 11, 2013 at 12:09 AM, Harsh J ha...@cloudera.com wrote:

 I once blogged about cloning big repositories after experiencing how
 mammoth Android's repos were:

 http://www.harshj.com/2010/08/29/a-less-known-thing-about-cloning-git-repositories/

 Try a git clone with the --depth=1 option to reduce the total download by
 not fetching all the history objects. This has some side effects vs. a
 regular clone, but should be fine for contributions.
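Harsh's suggestion looks like this in practice; --depth is standard git, and the history can be fetched later if you end up needing it:

```shell
# Shallow clone: fetch only the most recent commit, not the full history.
git clone --depth=1 git://git.apache.org/hadoop-common.git

# If you later need the full history (e.g. for git log or git bisect),
# deepen the clone (git >= 1.8.3):
cd hadoop-common
git fetch --unshallow
```

A shallow clone is enough for building and preparing patches, which is the contribution case mentioned here.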


 On Wed, Apr 10, 2013 at 11:53 PM, mugisha moses mossp...@gmail.com
 wrote:

  The whole repo is like 290 MB, so make sure you have a decent internet
  connection.
 
 



 --
 Harsh J



Re: Building Hadoop from source code

2013-04-09 Thread maisnam ns
@Mohammad Mustaqeem - Once you create the patch as described in the 'How to
Contribute to Hadoop' wiki and apply it, the changes will be reflected in
the files you have modified.

1. Now trigger the build with Ant or Maven. I tried with Ant, and the
command I used was ant clean compile bin-package. Don't forget to download
ivy.jar and copy it into your ANT_HOME/lib folder. Once the build is
triggered, Hadoop should get built along with the changes you made.

If I am not mistaken, you modified some Hadoop files, say
BlockLocation.java, in your
Hadoopx.x\src\core\org\apache\hadoop\fs\BlockLocation.java.

The jar will be in Hadoopx.x\build\hadoop-0.20.3-dev-core.jar (in my
version).
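The apply-and-rebuild flow above might look like this; the patch file name is a placeholder and ANT_HOME is assumed to be set:

```shell
# Hypothetical sketch of the patch-then-rebuild flow for a 0.20-era tree.
cd hadoop-0.20.x                         # assumed source root

# Apply a patch made per the 'How to Contribute to Hadoop' wiki
# (HADOOP-NNNN.patch is a placeholder name).
patch -p0 < HADOOP-NNNN.patch

# The Ant build needs Ivy; copy ivy.jar into ANT_HOME/lib first.
cp ivy.jar "$ANT_HOME/lib/"

# Rebuild; the core jar containing your changes lands under build/.
ant clean compile bin-package
ls build/hadoop-*-core.jar
```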

Hope this clears your doubt.

Regards
Niranjan Singh


On Tue, Apr 9, 2013 at 1:38 PM, Mohammad Mustaqeem
3m.mustaq...@gmail.com wrote:

 @Steve
 I am new to Hadoop development.
 Can you please tell me what the location of the tar file is?


 On Tue, Apr 9, 2013 at 12:09 AM, Steve Loughran ste...@hortonworks.com
 wrote:

  On 8 April 2013 16:08, Mohammad Mustaqeem 3m.mustaq...@gmail.com
 wrote:
 
   Please tell me what I am doing wrong.
   What's the problem?
  
 
  a lot of these seem to be network-related tests. You can turn off all the
  tests; look in BUILDING.txt at the root of the source tree for the various
  operations, then add -DskipTests to the end of every command, such as

  mvn package -Pdist -Dtar -DskipTests

  to build the .tar packages, or

  mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true

  to turn off the javadoc creation too, for an even faster build
 



 --
 *With regards ---*
 *Mohammad Mustaqeem*,
 M.Tech (CSE)
 MNNIT Allahabad
 9026604270



Re: Building Hadoop from source code

2013-04-09 Thread maisnam ns
@Mohammad Mustaqeem, please follow the screenshots one by one (the
screenshots did not survive in this archive):

cd build and you will find the extracted build output.

Now cd hadoop-0.20.3-dev and you will find the layout shown in the next
screenshot.

The bin directory contains all the scripts you may need to start Hadoop:
1. Go to that bin directory and format your NameNode
2. Configure your site XMLs
3. Run start-all.sh (note there are other ways also)
4. Run jps to see if all daemons are running
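Steps 1-4 above as a sketch, using the 0.20-era paths from this thread; the site XMLs under conf/ must already be filled in before starting:

```shell
# Sketch of steps 1-4 for the freshly built tree (paths from the message).
cd build/hadoop-0.20.3-dev/bin

# Step 1: format the NameNode (first run only -- this wipes HDFS metadata).
./hadoop namenode -format

# Step 2 happens in ../conf: core-site.xml, hdfs-site.xml, mapred-site.xml.

# Step 3: start all daemons (NameNode, DataNode, JobTracker, TaskTracker).
./start-all.sh

# Step 4: confirm the daemons are running.
jps
```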



On Tue, Apr 9, 2013 at 4:54 PM, Mohammad Mustaqeem
3m.mustaq...@gmail.com wrote:

 Please, can anyone tell me where the built Hadoop is?
 I mean the file that can be used to install the Hadoop cluster.


 On Tue, Apr 9, 2013 at 2:32 PM, Mohammad Mustaqeem
 3m.mustaq...@gmail.com wrote:

  @Ling, you mean to say to run find -name *tar*?
  I ran it but don't know which file will be used to install Hadoop.

  @Niranjan
  I haven't changed anything; I just executed svn checkout
  http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk to get
  the Hadoop source code in the hadoop-trunk directory.
  After that I executed cd hadoop-trunk and finally executed mvn
  package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true.
  I want to ask: what is the location of the tar file created, which can
  be used to install Hadoop?
 
 


 --
 *With regards ---*
 *Mohammad Mustaqeem*,
 M.Tech (CSE)
 MNNIT Allahabad
 9026604270



git clone hadoop taking too much time almost 12 hrs

2013-04-09 Thread maisnam ns
Hi,

I am trying to execute git clone
git://git.apache.org/hadoop-common.git so that I can set up a development
environment for Hadoop under the Eclipse IDE, but it is taking too much time.

Can somebody let me know why it is taking so long? I have a high-speed
internet connection, and I don't think connectivity is the issue here.

Thanks
Niranjan Singh