Re: How to run hadoop jar command in a clustered environment

2013-04-15 Thread Chris Nauroth
Hello Thoihen,

I'm moving this discussion from common-dev (questions about developing
Hadoop) to user (questions about using Hadoop).

If you haven't already seen it, then I recommend reading the cluster setup
documentation.  It's a bit different depending on the version of the Hadoop
code that you're deploying and running.  You mentioned JobTracker, so I
expect that you're using something from the 1.x line, but here are links to
both 1.x and 2.x docs just in case:

1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
2.x/trunk:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

To address your specific questions:

1. You can run the hadoop jar command and submit MapReduce jobs from any
machine that has the Hadoop software and configuration deployed and has
network connectivity to the machines that make up the Hadoop cluster.

2. Yes, you can use a separate machine that is not a member of the cluster
(meaning it does not run Hadoop daemons like DataNode, TaskTracker, or
NodeManager).  This is your choice.  I've found it valuable to isolate
nodes like this to prevent MR job tasks from taking processing resources
away from interactive user commands, but this does mean that the resources
on that node can't be utilized by MR jobs during user idle times, so it
causes a small hit to overall utilization.
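
As a rough sketch of what "Hadoop software and configuration deployed"
means for a 1.x client machine: the client's conf directory needs to point
at the cluster's NameNode and JobTracker. The small Java program below only
generates the two XML files as text; it does not use any Hadoop classes.
The property names fs.default.name and mapred.job.tracker are the 1.x
ones, while the host names and ports are hypothetical placeholders, so
substitute your cluster's values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: renders minimal Hadoop 1.x client-side config files as strings. */
class ClientConfigSketch {

    /** Renders one <property> element (values here need no XML escaping). */
    static String property(String name, String value) {
        return "  <property>\n"
             + "    <name>" + name + "</name>\n"
             + "    <value>" + value + "</value>\n"
             + "  </property>\n";
    }

    /** Wraps the given properties in a <configuration> document. */
    static String configuration(Map<String, String> props) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append(property(e.getKey(), e.getValue()));
        }
        return xml.append("</configuration>\n").toString();
    }

    /** core-site.xml: tells the client where the NameNode (HDFS) is. */
    static String coreSite(String nameNodeHost, int port) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.default.name", "hdfs://" + nameNodeHost + ":" + port);
        return configuration(props);
    }

    /** mapred-site.xml: tells the client where the JobTracker is. */
    static String mapredSite(String jobTrackerHost, int port) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.job.tracker", jobTrackerHost + ":" + port);
        return configuration(props);
    }

    public static void main(String[] args) {
        // Hypothetical hosts and ports, for illustration only.
        System.out.print(coreSite("namenode.example.com", 8020));
        System.out.print(mapredSite("jobtracker.example.com", 8021));
    }
}
```

With equivalent files in the client's Hadoop conf directory, the same
hadoop jar command used on a single-node install submits the job to the
cluster instead of to local daemons.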

Hope this helps,
--Chris


On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam thoihen...@gmail.com wrote:

 Hi All,

 I am really new to Hadoop and have installed it on my local Ubuntu machine.
 I also created a wordcount.jar, started Hadoop with start-all.sh (which
 started all the Hadoop daemons), and used jps to confirm it. I then changed
 to hadoop/bin, ran hadoop jar x.jar, and the MapReduce program ran
 successfully.

 Now, can someone please tell me how I should run the hadoop jar command
 in a clustered environment, for example a cluster with 50 nodes? I know
 that one dedicated machine would be the namenode, another the jobtracker,
 and the rest datanodes and tasktrackers.

 1. From which machine should I run the hadoop jar command, given that I
 have a MapReduce jar in hand? Should I run it from the jobtracker machine,
 or can I run it from any machine in the cluster?

 2. Can I run the MapReduce job from another machine that is not part of
 the cluster? If yes, how should I do it?

 Please help me.

 Regards
 thoihen



Re: How to run hadoop jar command in a clustered environment

2013-04-15 Thread maisnam ns
@Chris, thanks a lot; that helped a lot.





[jira] [Created] (HADOOP-9476) Some test cases in TestUserGroupInformation fail if ran after testSetLoginUser.

2013-04-15 Thread Kihwal Lee (JIRA)
Kihwal Lee created HADOOP-9476:
--

         Summary: Some test cases in TestUserGroupInformation fail if ran after testSetLoginUser.
             Key: HADOOP-9476
             URL: https://issues.apache.org/jira/browse/HADOOP-9476
         Project: Hadoop Common
      Issue Type: Bug
      Components: security, test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
        Reporter: Kihwal Lee


HADOOP-9352 added a new test case testSetLoginUser. If it runs prior to other 
test cases, some of them fail.
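
The report does not state the cause, but one common way this kind of order
dependence arises is process-wide static state that one test sets and later
tests silently inherit. The sketch below illustrates that pattern under
that assumption only; LoginState and the test method names are hypothetical
stand-ins, not the real UserGroupInformation code or its tests.

```java
/** Sketch: how a test that sets static state can break tests that run after it. */
class OrderDependenceSketch {

    /** Hypothetical stand-in for a class that caches a process-wide login user. */
    static class LoginState {
        static final String DEFAULT_USER = "os-user";
        private static String loginUser;   // shared by every test in the JVM

        static synchronized String getLoginUser() {
            if (loginUser == null) {
                loginUser = DEFAULT_USER;  // lazily fall back to the default
            }
            return loginUser;
        }

        static synchronized void setLoginUser(String user) {
            loginUser = user;
        }

        /** Clears the cached user; a per-test setup hook would call this. */
        static synchronized void reset() {
            loginUser = null;
        }
    }

    /** Mutates the shared state and never restores it. */
    static void testSetLoginUser() {
        LoginState.setLoginUser("user-set-by-test");
    }

    /** Passes only if nothing has replaced the default login user. */
    static boolean testDefaultLogin() {
        return LoginState.DEFAULT_USER.equals(LoginState.getLoginUser());
    }

    public static void main(String[] args) {
        LoginState.reset();
        System.out.println("run alone:        " + testDefaultLogin());
        testSetLoginUser();
        System.out.println("run after setter: " + testDefaultLogin());
        LoginState.reset();
        System.out.println("run after reset:  " + testDefaultLogin());
    }
}
```

Resetting the shared state before each test removes the dependence on
execution order in this sketch.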

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-8630) rename isSingleSwitch() methods in new topo base class to isFlatTopology()

2013-04-15 Thread Steve Loughran (JIRA)

     [ https://issues.apache.org/jira/browse/HADOOP-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-8630.


Resolution: Won't Fix
  Assignee: Tsuyoshi OZAWA

Marking as Won't Fix, but crediting Tsuyoshi Ozawa for his contribution.

 rename isSingleSwitch() methods in new topo base class to isFlatTopology()
 --

                 Key: HADOOP-8630
                 URL: https://issues.apache.org/jira/browse/HADOOP-8630
             Project: Hadoop Common
          Issue Type: Improvement
          Components: util
    Affects Versions: 2.0.0-alpha, 3.0.0
            Reporter: Steve Loughran
            Assignee: Tsuyoshi OZAWA
            Priority: Trivial
         Attachments: HADOOP-8630.2.patch, HADOOP-8630.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 The new topology logic, which is not yet turned on in HDFS, uses the method
 {{isSingleSwitch()}} for implementations to declare whether or not they are
 single switch.
 The use of a switch is an implementation issue; the big VM-based patch shows
 that it is really about flat vs. hierarchical, with Hadoop assuming that
 subtrees in the hierarchy have better bandwidth (good) but correlated
 failures (bad).
 Now, before it's fixed and used, is the time to rename the method.



[jira] [Created] (HADOOP-9477) posixGroups support for LDAP groups mapping service

2013-04-15 Thread Kai Zheng (JIRA)
Kai Zheng created HADOOP-9477:
-

         Summary: posixGroups support for LDAP groups mapping service
             Key: HADOOP-9477
             URL: https://issues.apache.org/jira/browse/HADOOP-9477
         Project: Hadoop Common
      Issue Type: Improvement
        Reporter: Kai Zheng
        Assignee: Kai Zheng
         Fix For: 2.0.5-beta


It would be nice to support posixGroups in the LdapGroupsMapping service. Below
is the current description from the provider:
hadoop.security.group.mapping.ldap.search.filter.group:
An additional filter to use when searching for LDAP groups. This should be
changed when resolving groups against a non-Active Directory installation.
posixGroups are currently not a supported group class.
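
For context, a posixGroup entry (RFC 2307) lists its members by user name in
the memberUid attribute, whereas Active Directory style groups list member
DNs. The sketch below shows what a search filter for posixGroup membership
could look like, with the user name escaped per RFC 4515. It is only an
illustration of the LDAP side; the class and method names are hypothetical
and this is not how LdapGroupsMapping builds its queries.

```java
/** Sketch: building an LDAP search filter for RFC 2307 posixGroup membership. */
class PosixGroupFilterSketch {

    /** Escapes a value for safe use inside an LDAP search filter (RFC 4515). */
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Filter matching posixGroup entries that list the user in memberUid. */
    static String posixGroupFilter(String userName) {
        return "(&(objectClass=posixGroup)(memberUid=" + escape(userName) + "))";
    }

    public static void main(String[] args) {
        // "alice" is a hypothetical user name.
        System.out.println(posixGroupFilter("alice"));
    }
}
```

The key difference from a DN-based lookup is that the filter matches on the
plain user name, so no prior search for the user's DN is needed.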

