Re: How to run hadoop jar command in a clustered environment
Hello Thoihen,

I'm moving this discussion from common-dev (questions about developing Hadoop) to user (questions about using Hadoop).

If you haven't already seen it, then I recommend reading the cluster setup documentation. It's a bit different depending on the version of the Hadoop code that you're deploying and running. You mentioned JobTracker, so I expect that you're using something from the 1.x line, but here are links to both 1.x and 2.x docs just in case:

1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
2.x/trunk: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

To address your specific questions:

1. You can run the hadoop jar command and submit MapReduce jobs from any machine that has the Hadoop software and configuration deployed and has network connectivity to the machines that make up the Hadoop cluster.

2. Yes, you can use a separate machine that is not a member of the cluster (meaning it does not run Hadoop daemons like DataNode, TaskTracker, or NodeManager). This is your choice. I've found it valuable to isolate nodes like this to prevent MR job tasks from taking processing resources away from interactive user commands, but this does mean that the resources on that node can't be utilized by MR jobs during user idle times, so it causes a small hit to overall utilization.

Hope this helps,
--Chris

On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam thoihen...@gmail.com wrote:

> Hi All,
>
> I am really new to Hadoop and installed Hadoop on my local Ubuntu machine. I also created a wordcount.jar, started Hadoop with start-all.sh, which started all the Hadoop daemons, and used jps to confirm it. I changed to hadoop/bin, ran hadoop jar x.jar, and successfully ran the MapReduce program.
>
> Now, can someone please help me understand how I should run the hadoop jar command in a clustered environment, for example a cluster with 50 nodes? I know one dedicated machine would be the NameNode, another the JobTracker, and the others DataNodes and TaskTrackers.
>
> 1. From which machine should I run the hadoop jar command, considering I have a MapReduce jar in hand? Should I run it from the JobTracker machine, or can I run it from any machine in the cluster?
>
> 2. Can I run the MapReduce job from another machine which is not part of the cluster? If yes, how should I do it?
>
> Please help me.
>
> Regards,
> thoihen
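
As an illustration of answer 1 above ("Hadoop software and configuration deployed"), here is a minimal sketch of what the client-side configuration might contain on a 1.x cluster. The hostnames and ports are placeholders invented for this example, not values from the thread; fs.default.name and mapred.job.tracker are the 1.x property names, and in practice the simplest approach is usually to copy the cluster's own conf directory to the client machine.

```xml
<!-- Sketch only: two separate files shown together.
     Hostnames and ports below are hypothetical placeholders. -->

<!-- conf/core-site.xml: tells the client where HDFS (the NameNode) is -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: tells the client where to submit jobs (the JobTracker) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```

With files like these in place, the submission command should be the same hadoop jar invocation used on the single-node setup; only the configuration decides which cluster receives the job. On 2.x the property names differ (for example fs.defaultFS, with job submission going to the YARN ResourceManager), so check the cluster setup documentation for the version in use.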
Re: How to run hadoop jar command in a clustered environment
@Chris, thanks a lot, that helped a lot.
[jira] [Created] (HADOOP-9476) Some test cases in TestUserGroupInformation fail if ran after testSetLoginUser.
Kihwal Lee created HADOOP-9476:
-------------------------------

Summary: Some test cases in TestUserGroupInformation fail if ran after testSetLoginUser.
Key: HADOOP-9476
URL: https://issues.apache.org/jira/browse/HADOOP-9476
Project: Hadoop Common
Issue Type: Bug
Components: security, test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Kihwal Lee

HADOOP-9352 added a new test case testSetLoginUser. If it runs prior to other test cases, some of them fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8630) rename isSingleSwitch() methods in new topo base class to isFlatTopology()
[ https://issues.apache.org/jira/browse/HADOOP-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-8630.
-----------------------------------

Resolution: Won't Fix
Assignee: Tsuyoshi OZAWA

Marking as wontfix but crediting Tsuyoshi Ozawa for his contribution

> rename isSingleSwitch() methods in new topo base class to isFlatTopology()
> --------------------------------------------------------------------------
>
> Key: HADOOP-8630
> URL: https://issues.apache.org/jira/browse/HADOOP-8630
> Project: Hadoop Common
> Issue Type: Improvement
> Components: util
> Affects Versions: 2.0.0-alpha, 3.0.0
> Reporter: Steve Loughran
> Assignee: Tsuyoshi OZAWA
> Priority: Trivial
> Attachments: HADOOP-8630.2.patch, HADOOP-8630.patch
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The new topology logic that is not yet turned on in HDFS uses the method {{isSingleSwitch()}} for implementations to declare whether or not they are single switch.
> The use of switch is an implementation issue; the big VM-based patch shows that really it's about flat vs hierarchical, with Hadoop assuming that subtrees in the hierarchy have better bandwidth (good) but correlated failures (bad).
> Renaming the method now, before it's fixed and used, is the time to do it.
[jira] [Created] (HADOOP-9477) posixGroups support for LDAP groups mapping service
Kai Zheng created HADOOP-9477:
------------------------------

Summary: posixGroups support for LDAP groups mapping service
Key: HADOOP-9477
URL: https://issues.apache.org/jira/browse/HADOOP-9477
Project: Hadoop Common
Issue Type: Improvement
Reporter: Kai Zheng
Assignee: Kai Zheng
Fix For: 2.0.5-beta

It would be nice to support posixGroups for LdapGroupsMapping service. Below is from current description for the provider:

hadoop.security.group.mapping.ldap.search.filter.group:
An additional filter to use when searching for LDAP groups. This should be changed when resolving groups against a non-Active Directory installation. posixGroups are currently not a supported group class.
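
For context, the property quoted in the description is normally set in core-site.xml. The fragment below is a minimal sketch rather than a recommended configuration: it assumes the LDAP-based mapping is enabled through hadoop.security.group.mapping, and the filter value shown is what I believe to be the Active Directory-oriented default, so check core-default.xml for your release before relying on it.

```xml
<!-- Sketch only: enable the LDAP groups mapping provider and show the group filter.
     The filter value is assumed to match the shipped default; verify for your release. -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.search.filter.group</name>
  <value>(objectClass=group)</value>
</property>
```

In RFC 2307 style directories, groups typically use the posixGroup object class and list members by uid in memberUid rather than by full DN, which is presumably why changing this filter alone is not enough and dedicated support is being requested.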