Re: map task execution time
Thanks Kai, I will try those. On Thu, Apr 5, 2012 at 3:15 AM, Kai Voigt k...@123.org wrote: Hi, On 05.04.2012 at 00:20, bikash sharma wrote: Is it possible to get the execution time of the constituent map/reduce tasks of a MapReduce job (say sort) at the end of a job run? Preferably, can we obtain this programmatically? You can access the JobTracker's web UI and see the start and stop timestamps for every individual task. Since the JobTracker Java API is exposed, you can write your own application to fetch that data through your own code. Also, hadoop job on the command line can be used to read job statistics. Kai -- Kai Voigt k...@123.org
Re: map task execution time
Yes, how can we use hadoop job to get MR job stats, especially constituent task finish times? On Thu, Apr 5, 2012 at 9:02 AM, Jay Vyas jayunit...@gmail.com wrote: (excuse the typo in the last email: I meant I've been playing with Cinch, not I've been with Cinch) On Thu, Apr 5, 2012 at 7:54 AM, Jay Vyas jayunit...@gmail.com wrote: How can hadoop job be used to read m/r statistics? On Thu, Apr 5, 2012 at 7:30 AM, bikash sharma sharmabiks...@gmail.com wrote: Thanks Kai, I will try those. On Thu, Apr 5, 2012 at 3:15 AM, Kai Voigt k...@123.org wrote: Hi, On 05.04.2012 at 00:20, bikash sharma wrote: Is it possible to get the execution time of the constituent map/reduce tasks of a MapReduce job (say sort) at the end of a job run? Preferably, can we obtain this programmatically? You can access the JobTracker's web UI and see the start and stop timestamps for every individual task. Since the JobTracker Java API is exposed, you can write your own application to fetch that data through your own code. Also, hadoop job on the command line can be used to read job statistics. Kai -- Kai Voigt k...@123.org -- Jay Vyas MMSB/UCHC -- Jay Vyas MMSB/UCHC
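For the command-line route, the classic hadoop job options cover this; a sketch (the job ID and history location below are placeholders):

$ hadoop job -list all                      # all job IDs and their states
$ hadoop job -status job_201204050315_0001  # progress and counters for one job
$ hadoop job -history all output-dir        # per-task start/finish times and durations

The -history variant reads the history files written under the job's output directory, so it also works after the job has finished.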
Re: getting the process id of mapreduce tasks
Thanks so much Harsh! On Thu, Sep 29, 2011 at 12:42 AM, Harsh J ha...@cloudera.com wrote: Hello Bikash, The tasks run on the tasktracker, so that is where you'll need to look for the process ID -- not the JobTracker/client. Crudely speaking, $ ssh tasktracker01 # or whichever. $ jps | grep Child | cut -d ' ' -f 1 # And lo, PIDs to play with. On Thu, Sep 29, 2011 at 12:15 AM, bikash sharma sharmabiks...@gmail.com wrote: Hi, Is it possible to get the process id of each task in a MapReduce job? When I run a mapreduce job and do monitoring in Linux using ps, I just see the id of the mapreduce job process but not its constituent map/reduce tasks. The use case is to monitor the resource usage of each task by using the sar utility in Linux with the specific process id of a task. Thanks, Bikash -- Harsh J
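Note that sar itself reports system-wide metrics; for per-PID sampling, pidstat (from the same sysstat package) can be pointed at the PIDs Harsh's one-liner yields. A sketch, assuming the task JVMs appear under the Child class name:

for pid in $(jps | grep Child | cut -d ' ' -f 1); do
  pidstat -u -r -p "$pid" 5 > "task-$pid.log" &   # CPU (-u) and memory (-r) every 5 seconds
done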
Re: getting the process id of mapreduce tasks
Thanks Varad. On Wed, Sep 28, 2011 at 9:35 PM, Varad Meru meru.va...@gmail.com wrote: The process ids of each individual task can be seen using the jps and jconsole commands provided by Java. The jconsole command, run from the command line, provides a GUI for monitoring running Java tasks. The tasks are only visible as Java virtual machine instances in the OS's system monitoring tools. Regards, Varad Meru --- Sent from my iPod On 29-Sep-2011, at 0:15, bikash sharma sharmabiks...@gmail.com wrote: Hi, Is it possible to get the process id of each task in a MapReduce job? When I run a mapreduce job and do monitoring in Linux using ps, I just see the id of the mapreduce job process but not its constituent map/reduce tasks. The use case is to monitor the resource usage of each task by using the sar utility in Linux with the specific process id of a task. Thanks, Bikash
linux containers with Hadoop
Hi, Does anyone know if Linux containers (a kernel-supported virtualization technique for providing resource isolation across processes/applications) have ever been used with Hadoop to provide resource isolation for map/reduce tasks? If yes, what would be the up/down sides of such an approach, and how feasible is it in the context of Hadoop? Any pointers, e.g., papers, would be useful. Thanks, Bikash
Re: linux containers with Hadoop
Thanks Edward. So mostly Linux containers are used in Hadoop for ensuring isolation in terms of security across mapreduce jobs from different users (even Mesos seems to leverage the same), not for resource fairness? On Fri, Sep 30, 2011 at 1:39 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Fri, Sep 30, 2011 at 9:03 AM, bikash sharma sharmabiks...@gmail.com wrote: Hi, Does anyone know if Linux containers (a kernel-supported virtualization technique for providing resource isolation across processes/applications) have ever been used with Hadoop to provide resource isolation for map/reduce tasks? If yes, what would be the up/down sides of such an approach, and how feasible is it in the context of Hadoop? Any pointers, e.g., papers, would be useful. Thanks, Bikash Previously, hadoop launched map/reduce tasks as a single user; now, with security, tasks can launch as different users in the same OS/VM. I would say the closest you can get to that isolation is the work done with Mesos. http://www.mesosproject.org/
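To see what per-task isolation would look like with the kernel's own machinery, here is a minimal cgroup sketch; the group name, the 512 MB cap, and the PID are all hypothetical, and it assumes the cgroup memory controller is mounted at /sys/fs/cgroup/memory:

sudo mkdir /sys/fs/cgroup/memory/mr-task-0001
echo $((512*1024*1024)) | sudo tee /sys/fs/cgroup/memory/mr-task-0001/memory.limit_in_bytes
echo 12345 | sudo tee /sys/fs/cgroup/memory/mr-task-0001/tasks   # PID of one task JVM, e.g. from jps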
getting the process id of mapreduce tasks
Hi, Is it possible to get the process id of each task in a MapReduce job? When I run a mapreduce job and do monitoring in Linux using ps, I just see the id of the mapreduce job process but not its constituent map/reduce tasks. The use case is to monitor the resource usage of each task by using the sar utility in Linux with the specific process id of a task. Thanks, Bikash
Re: Getting the cpu, memory usage of map/reduce tasks
Thanks Ralf. On Mon, Sep 26, 2011 at 2:01 PM, Ralf Heyde ralf.he...@gmx.de wrote: Hi Bikash, every map/reduce task is, as far as I know, a single JVM instance, which you can configure and/or run with JVM options. Maybe you can track these JVMs by using some system tools. Regards, Ralf -Original Message- From: bikash sharma [mailto:sharmabiks...@gmail.com] Sent: Friday, 23 September 2011 20:58 To: common-user@hadoop.apache.org; common-...@hadoop.apache.org Subject: Getting the cpu, memory usage of map/reduce tasks Hi -- Is it possible to get the cpu and memory usage of individual map/reduce tasks when a mapreduce job is run? I came across this jira issue, but was not sure about the exact way to access it in the current hadoop distribution: https://issues.apache.org/jira/browse/MAPREDUCE-220 Any help is highly appreciated. Thanks, Bikash
configuring different number of slaves for MR jobs
Hi -- Can we specify a different set of slaves for each mapreduce job run? I tried using the --config option and specified a different set of slaves in the slaves config file. However, it does not use the selected slaves set but the one initially configured. Any help? Thanks, Bikash
Re: configuring different number of slaves for MR jobs
Thanks Suhas. I will try using HOD. The use case for me is some research experiments with a different set of slaves for each job run. On Tue, Sep 27, 2011 at 1:03 PM, Vitthal Suhas Gogate gog...@hortonworks.com wrote: The slaves file is used only by control scripts like {start/stop}-dfs.sh and {start/stop}-mapred.sh to start the data nodes and task trackers on the specified set of slave machines; it cannot be used effectively to change the size of the cluster for each M/R job (unless you want to restart the task trackers with a different number of slaves before every M/R job :) You can use the Hadoop JobTracker schedulers (Capacity/Fair-share) to allocate and share the cluster capacity effectively. There is also the option of using HOD (Hadoop On Demand) for dynamically allocating a cluster of the required number of nodes, typically used by QA/RE folks for testing purposes. Again, in production, resizing the HDFS cluster is not easy, as the nodes hold the data. --Suhas On Tue, Sep 27, 2011 at 8:50 AM, bikash sharma sharmabiks...@gmail.com wrote: Hi -- Can we specify a different set of slaves for each mapreduce job run? I tried using the --config option and specified a different set of slaves in the slaves config file. However, it does not use the selected slaves set but the one initially configured. Any help? Thanks, Bikash
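If restarting between runs is acceptable (as it usually is for experiments), one conf directory per cluster profile works. A sketch, where conf-8nodes is a hypothetical copy of conf/ whose slaves file lists only the wanted nodes:

bin/stop-mapred.sh
bin/start-mapred.sh --config /path/to/conf-8nodes   # control scripts read conf-8nodes/slaves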
Getting the cpu, memory usage of map/reduce tasks
Hi -- Is it possible to get the cpu and memory usage of individual map/reduce tasks when a mapreduce job is run? I came across this jira issue, but was not sure about the exact way to access it in the current hadoop distribution: https://issues.apache.org/jira/browse/MAPREDUCE-220 Any help is highly appreciated. Thanks, Bikash
automatic monitoring the utilization of slaves
Hi -- Is there a way for a slave to get a trigger when a Hadoop job finishes on the master? The use case is as follows: I need to monitor cpu and memory utilization automatically, for which I need to know the timestamps at which to start and stop the sar utility, corresponding to the start and finish of the Hadoop job at the master. It is simple to do at the master, since the Hadoop job runs there, but how do we do it for the slaves? Thanks. Bikash
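Absent a push-style trigger, one workaround is to drive sar from the master around the job invocation itself. A sketch, assuming passwordless ssh to the slaves and the stock examples jar (the jar name is a guess):

for h in $(cat conf/slaves); do ssh "$h" 'nohup sar -o /tmp/sar-$(hostname).bin 5 >/dev/null 2>&1 &'; done
bin/hadoop jar hadoop-examples.jar sort input output   # the job being profiled
for h in $(cat conf/slaves); do ssh "$h" 'pkill -x sar; pkill -x sadc'; done   # sar drives sadc underneath

Each slave's binary file can then be replayed afterwards with sar -f /tmp/sar-<host>.bin.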
/etc/hosts related error?
Hi, I am experiencing a lot of task failures while running any Hadoop application. In particular, I get the following warnings: Error initializing attempt_201106081500_0018_r_00_0: java.io.IOException: Could not obtain block: blk_-7386162385184325734_1214 file=/home/hadoop/data/mapred/system/job_201106081500_0018/job.xml Looking in the forums, it seems to have something to do with the /etc/hosts settings, because I also cannot access the jobtracker web interface via the hostname, but can access it via the actual IP address. I set /etc/hosts in all the VMs with entries of the form ip-address hostname. Any idea? Thanks
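For reference, entries of that form would look like the following (a hypothetical layout reusing hosts that appear elsewhere in these threads); every node needs a line for every peer, and a node's own hostname must not resolve to 127.0.0.1:

130.203.58.207  inti79.cse.psu.edu  inti79
130.203.58.212  inti84.cse.psu.edu  inti84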
hadoop cluster installation problems
Hi, I need to install hadoop on a 16-node cluster. I have a couple of related questions: 1. I have installed hadoop in a shared directory, i.e., there is just one place where the whole hadoop installation exists and all 16 nodes use the same installation. Is that an issue, or do I need to install hadoop on each of these nodes in a local directory separately? 2. I installed hadoop-0.21 and, after following the installation instructions, when I tried formatting I got the following error: / Re-format filesystem in /var/tmp/data/dfs/name ? (Y or N) Y 11/04/13 09:16:23 INFO namenode.FSNamesystem: defaultReplication = 3 11/04/13 09:16:23 INFO namenode.FSNamesystem: maxReplication = 512 11/04/13 09:16:23 INFO namenode.FSNamesystem: minReplication = 1 11/04/13 09:16:23 INFO namenode.FSNamesystem: maxReplicationStreams = 2 11/04/13 09:16:23 INFO namenode.FSNamesystem: shouldCheckForEnoughRacks = false 11/04/13 09:16:23 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30 11/04/13 09:16:23 INFO namenode.FSNamesystem: fsOwner=bus145 11/04/13 09:16:23 INFO namenode.FSNamesystem: supergroup=supergroup 11/04/13 09:16:23 INFO namenode.FSNamesystem: isPermissionEnabled=true 11/04/13 09:16:23 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 11/04/13 09:16:24 INFO common.Storage: Cannot lock storage /var/tmp/data/dfs/name. The directory is already locked. 11/04/13 09:16:24 ERROR namenode.NameNode: java.io.IOException: Cannot lock storage /var/tmp/data/dfs/name. The directory is already locked. at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:617) at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1426) at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1444) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1242) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1348) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368) 11/04/13 09:16:24 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at inti79.cse.psu.edu/130.203.58.207 / 3. I was using hadoop-0.20 before, and formatting was working fine. 4. Also, when I do bin/start-dfs.sh, I am able to see the Namenode and Datanode up; however, on bin/start-mapred.sh, I am not able to see the Jobtracker up on the master node, though the Tasktracker seems to be up on the slaves. Before upgrading to hadoop-0.21, everything was working fine with hadoop-0.20, including running benchmarks and getting stats. Any suggestions in this regard are highly appreciated. Thanks, Bikash
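The Cannot lock storage error in point 2 is consistent with several nodes (or a stale daemon) contending for the same storage path, which a fully shared installation invites. A sketch of node-local settings for conf/hdfs-site.xml; the /local paths are hypothetical and must exist on each node's own disk:

<property>
  <name>dfs.name.dir</name>
  <value>/local/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/local/hadoop/dfs/data</value>
</property>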
Re: hadoop cluster installation problems
p.s. Also, while starting dfs using bin/start-dfs.sh, I get the following error: 2011-04-13 09:42:31,729 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = inti84.cse.psu.edu/130.203.58.212 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2011-04-13 09:42:31,853 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:175) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965) 2011-04-13 09:42:31,854 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at inti84.cse.psu.edu/130.203.58.212 / 2011-04-13 09:44:03,265 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = inti84.cse.psu.edu/130.203.58.212 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2011-04-13 09:44:03,384 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:175) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965) On Wed, Apr 13, 2011 at 9:20 AM, bikash sharma sharmabiks...@gmail.com wrote: Hi, I need to install hadoop on a 16-node cluster. I have a couple of related questions: 1. I have installed hadoop in a shared directory, i.e., there is just one place where the whole hadoop installation exists and all 16 nodes use the same installation. Is that an issue, or do I need to install hadoop on each of these nodes in a local directory separately? 2. I installed hadoop-0.21 and, after following the installation instructions, when I tried formatting I got the following error: / Re-format filesystem in /var/tmp/data/dfs/name ?
(Y or N) Y 11/04/13 09:16:23 INFO namenode.FSNamesystem: defaultReplication = 3 11/04/13 09:16:23 INFO namenode.FSNamesystem: maxReplication = 512 11/04/13 09:16:23 INFO namenode.FSNamesystem: minReplication = 1 11/04/13 09:16:23 INFO namenode.FSNamesystem: maxReplicationStreams = 2 11/04/13 09:16:23 INFO namenode.FSNamesystem: shouldCheckForEnoughRacks = false 11/04/13 09:16:23 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30 11/04/13 09:16:23 INFO namenode.FSNamesystem: fsOwner=bus145 11/04/13 09:16:23 INFO namenode.FSNamesystem: supergroup=supergroup 11/04/13 09:16:23 INFO namenode.FSNamesystem: isPermissionEnabled=true 11/04/13 09:16:23 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 11/04/13 09:16:24 INFO common.Storage: Cannot lock storage /var/tmp/data/dfs/name. The directory is already locked. 11/04/13 09:16:24 ERROR namenode.NameNode: java.io.IOException: Cannot lock storage /var/tmp/data/dfs/name. The directory is already locked. at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:617) at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1426) at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1444) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java
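The NullPointerException at NetUtils.createSocketAddr during NameNode startup is, in the 0.20 line, the usual signature of fs.default.name missing from core-site.xml (the default file:/// URI has no host:port to parse). A minimal entry as a sketch; the hostname and port below are taken from elsewhere in these threads and may not match this setup:

<property>
  <name>fs.default.name</name>
  <value>hdfs://inti84.cse.psu.edu:54310</value>
</property>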
cluster restart error
Hi, I changed some config parameters in the core-site.xml/mapred-site.xml files and then stopped the dfs and mapred services. While restarting them, I am unable to do so; looking at the logs, the following error occurs: 2011-04-12 17:27:39,343 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2011-04-12 17:27:39,453 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.NoClassDefFoundError: org/json/JSONException at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:169) at org.apache.hadoop.metrics.ContextFactory.getContext(ContextFactory.java:132) at org.apache.hadoop.metrics.MetricsUtil.getContext(MetricsUtil.java:56) at org.apache.hadoop.metrics.MetricsUtil.getContext(MetricsUtil.java:45) at org.apache.hadoop.mapred.TaskTracker$ShuffleServerMetrics.init(TaskTracker.java:250) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:917) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2834) Caused by: java.lang.ClassNotFoundException: org.json.JSONException at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) ... 8 more 2011-04-12 17:27:39,455 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down TaskTracker at inti79.cse.psu.edu/130.203.58.207 / Also, I am unable to do namenode -format to fresh-start the cluster. Any suggestions please? Thanks, Bikash
Re: cluster restart error
p.s. Also, I am unable to connect when doing hadoop/bin/hadoop fs -ls, with the following error: inti76.cse.psu.edu: starting tasktracker, logging to /i3c/hpcl/bus145/cse598g/hadoop/bin/../logs/hadoop-bus145-tasktracker-inti76.cse.psu.edu.out inti79.cse.psu.edu 36% hadoop/bin/hadoop fs -ls 11/04/12 17:37:34 INFO ipc.Client: Retrying connect to server: inti79.cse.psu.edu/130.203.58.207:54310. Already tried 0 time(s). 11/04/12 17:37:35 INFO ipc.Client: Retrying connect to server: inti79.cse.psu.edu/130.203.58.207:54310. Already tried 1 time(s). 11/04/12 17:37:36 INFO ipc.Client: Retrying connect to server: inti79.cse.psu.edu/130.203.58.207:54310. Already tried 2 time(s). On Tue, Apr 12, 2011 at 5:34 PM, bikash sharma sharmabiks...@gmail.com wrote: Hi, I changed some config parameters in the core-site.xml/mapred-site.xml files and then stopped the dfs and mapred services. While restarting them, I am unable to do so; looking at the logs, the following error occurs: 2011-04-12 17:27:39,343 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2011-04-12 17:27:39,453 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.NoClassDefFoundError: org/json/JSONException at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:169) at org.apache.hadoop.metrics.ContextFactory.getContext(ContextFactory.java:132) at org.apache.hadoop.metrics.MetricsUtil.getContext(MetricsUtil.java:56) at org.apache.hadoop.metrics.MetricsUtil.getContext(MetricsUtil.java:45) at org.apache.hadoop.mapred.TaskTracker$ShuffleServerMetrics.init(TaskTracker.java:250) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:917) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2834) Caused by: java.lang.ClassNotFoundException: org.json.JSONException at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) ... 8 more 2011-04-12 17:27:39,455 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down TaskTracker at inti79.cse.psu.edu/130.203.58.207 / Also, I am unable to do namenode -format to fresh-start the cluster. Any suggestions please? Thanks, Bikash
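On the original NoClassDefFoundError: the stack trace runs through ContextFactory.getContext, which suggests the metrics context configured in conf/hadoop-metrics.properties references the org.json classes while no JSON jar is on the TaskTracker's classpath. Two hedged checks (the jar name is a guess):

ls lib/ | grep -i json                        # is any json jar shipped under lib/?
grep -v '^#' conf/hadoop-metrics.properties   # which context did the recent config change select?

Either supply the missing jar under lib/ or revert the metrics change.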
available hadoop logs
Hi, For research purposes, I need some real Hadoop MapReduce job traces (ideally both inter- and intra-job, in terms of Hadoop job configuration parameters like mapred.io.sort.factor). Are there any freely available Hadoop traces corresponding to some real, large setup? Thanks, Bikash
runtime resource change of applications
Hi, Can we dynamically vary the resource allocation/consumption (say memory, cores) of Hadoop MR applications like sort? Thanks, Bikash
Chukwa setup issues
Hi, I am trying to set up Chukwa for a 16-node Hadoop cluster. I followed the admin guide - http://incubator.apache.org/chukwa/docs/r0.4.0/admin.html#Agents However, I ran into the following two issues: 1. What should be the collector port that needs to be specified in the conf/collectors file? 2. I am unable to see the collector running via a web browser. Am I missing something? Thanks in advance. -bikash p.s. - after I run the collector, nothing happens % bin/chukwa collector 2011-04-01 16:07:16.410::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2011-04-01 16:07:16.523::INFO: jetty-6.1.11 2011-04-01 16:07:17.707::INFO: Started SelectChannelConnector@0.0.0.0: started Chukwa http collector on port
Re: Chukwa setup issues
Thanks Bill. I am able to connect via the web now; I had actually put the wrong http port in the config file. One follow-up question: if I run a mapreduce program, say terasort, how can we link chukwa to collect job metrics via the web? On Fri, Apr 1, 2011 at 5:37 PM, Bill Graham billgra...@gmail.com wrote: Unfortunately conf/collectors is used in two different ways in Chukwa, each with a different syntax. This should really be fixed. 1. The script that starts the collectors looks at it for a list of hostnames (no ports) to start collectors on. To start it just on one host, set it to localhost. 2. The agent looks at that file for the list of collectors to attempt to communicate with. In that case the format is a list of HTTP urls with ports of the collectors. Can you telnet to port ? It looks like it's listening, but nothing's being sent. Is there anything in logs/collector.log? On Fri, Apr 1, 2011 at 1:09 PM, bikash sharma sharmabiks...@gmail.com wrote: Hi, I am trying to set up Chukwa for a 16-node Hadoop cluster. I followed the admin guide - http://incubator.apache.org/chukwa/docs/r0.4.0/admin.html#Agents However, I ran into the following two issues: 1. What should be the collector port that needs to be specified in the conf/collectors file? 2. I am unable to see the collector running via a web browser. Am I missing something? Thanks in advance. -bikash p.s. - after I run the collector, nothing happens % bin/chukwa collector 2011-04-01 16:07:16.410::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2011-04-01 16:07:16.523::INFO: jetty-6.1.11 2011-04-01 16:07:17.707::INFO: Started SelectChannelConnector@0.0.0.0: started Chukwa http collector on port
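To make Bill's two formats concrete, a hypothetical conf/collectors in each role (localhost and the port are placeholders; the port must match whatever the collector config actually listens on):

# as read by the collector start script: bare hostnames
localhost
# as read by the agents: full HTTP URLs including the port
http://localhost:8080/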
Re: Chukwa setup issues
I was trying to install HICC in Chukwa, but hicc.sh does not exist in the repository. Any idea? -bikash On Fri, Apr 1, 2011 at 5:57 PM, bikash sharma sharmabiks...@gmail.com wrote: Thanks Bill. I am able to connect via the web now; I had actually put the wrong http port in the config file. One follow-up question: if I run a mapreduce program, say terasort, how can we link chukwa to collect job metrics via the web? On Fri, Apr 1, 2011 at 5:37 PM, Bill Graham billgra...@gmail.com wrote: Unfortunately conf/collectors is used in two different ways in Chukwa, each with a different syntax. This should really be fixed. 1. The script that starts the collectors looks at it for a list of hostnames (no ports) to start collectors on. To start it just on one host, set it to localhost. 2. The agent looks at that file for the list of collectors to attempt to communicate with. In that case the format is a list of HTTP urls with ports of the collectors. Can you telnet to port ? It looks like it's listening, but nothing's being sent. Is there anything in logs/collector.log? On Fri, Apr 1, 2011 at 1:09 PM, bikash sharma sharmabiks...@gmail.com wrote: Hi, I am trying to set up Chukwa for a 16-node Hadoop cluster. I followed the admin guide - http://incubator.apache.org/chukwa/docs/r0.4.0/admin.html#Agents However, I ran into the following two issues: 1. What should be the collector port that needs to be specified in the conf/collectors file? 2. I am unable to see the collector running via a web browser. Am I missing something? Thanks in advance. -bikash p.s. - after I run the collector, nothing happens % bin/chukwa collector 2011-04-01 16:07:16.410::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2011-04-01 16:07:16.523::INFO: jetty-6.1.11 2011-04-01 16:07:17.707::INFO: Started SelectChannelConnector@0.0.0.0: started Chukwa http collector on port
Re: observe the effect of changes to Hadoop
Thanks Steve. It worked. On Sun, Mar 27, 2011 at 2:08 PM, Steve Loughran ste...@apache.org wrote: On 25/03/2011 14:10, bikash sharma wrote: Hi, For my research project, I need to add a couple of functions to the JobTracker.java source file to include additional information about TaskTrackers' resource usage through heartbeat messages. I made those changes to the JobTracker.java file. However, I am not very clear on how to see these effects. I mean, what are the next steps in terms of building the entire Hadoop code base, using the built distribution, installing it again on the cluster, etc.? If you are working with the Job Tracker you only need to rebuild the mapreduce JARs and push the new JAR out to the Job Tracker server, then restart that process. For more safety, put the same JAR on all the task trackers and shut down HDFS before the updates, but that's potentially overkill. Any elaborate updates on these will be very useful, since I do not have much experience in making modifications to a huge code base like Hadoop and observing the effects of those changes. I'd recommend getting everything working on a local machine in a single VM (the MiniMRCluster class helps), then move to multiple VMs and finally, if the code looks good, a real cluster with data you don't value. -steve
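For the ant-based builds of that era, the rebuild-and-push cycle Steve describes looks roughly like this. A sketch: the jar target drops a jar under build/ whose exact name varies by branch, and the host and path here are hypothetical:

ant jar
scp build/hadoop-*.jar jobtracker-host:/opt/hadoop/
ssh jobtracker-host 'bin/hadoop-daemon.sh stop jobtracker && bin/hadoop-daemon.sh start jobtracker'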
pointers to Hadoop eclipse
Hi, Can someone please point to a good reference that explains clearly how to check out the Hadoop code base in Eclipse, make changes, and re-compile? Actually, I want to change some part of Hadoop and then see the effect of the change, preferably in Eclipse. Thanks, Bikash
Re: definition of slots in Hadoop scheduling
Thanks Allen. On Sat, Mar 12, 2011 at 11:34 AM, Allen Wittenauer a...@apache.org wrote: (Removing common-dev, because this isn't really a dev question) On Feb 25, 2011, at 5:52 AM, bikash sharma wrote: Hi, How is a task slot in Hadoop defined with respect to scheduling a map/reduce task on the slots available on TaskTrackers? On a TaskTracker, one sets how many maps and reduces one wants to run on that node. The JobTracker is informed of this value. When a job is being scheduled, it compares the various tasks' input to see if a DataNode is providing a matching block. If a block exists or is nearby, the task is scheduled on that node.
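The per-node caps Allen refers to are set in mapred-site.xml on each TaskTracker; a minimal sketch (the values are illustrative):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>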
slot related question
Hi, This is a conceptual question: 1. Are the various resources shared across slots in Hadoop, or are resources partitioned across slots? 2. Any thoughts on experiments using a Hadoop setup that could help confirm the above? Thanks, Bikash
conceptual question regarding slots
Hi, Could someone throw some light on how, intuitively, fixed-type slots in Hadoop have a negative impact on cluster utilization, as mentioned in Arun's blog? http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/ Thanks, Bikash
Re: TaskTracker not starting on all nodes
Hi James, Sorry for the late response. No, the same problem persists. I reformatted HDFS, stopped the mapred and hdfs daemons and restarted them (using start-dfs.sh and start-mapred.sh from the master node). But surprisingly, out of the 4-node cluster, two nodes have the TaskTracker running while the other two do not have TaskTrackers on them (verified using jps). I guess that since I have Hadoop installed on shared storage, that might be the issue? Btw, how do I start the services independently on each node? -bikash On Sun, Feb 27, 2011 at 11:05 PM, James Seigel ja...@tynt.com wrote: Did you get it working? What was the fix? Sent from my mobile. Please excuse the typos. On 2011-02-27, at 8:43 PM, Simon gsmst...@gmail.com wrote: Hey Bikash, Maybe you can manually start a tasktracker on the node and see if there are any error messages. Also, don't forget to check your config files for mapreduce and hdfs and make sure the datanode can start successfully first. After all these steps, you can submit a job on the master node and see if there is any communication between these failed nodes and the master node. Post your error messages here if possible. HTH. Simon - On Sat, Feb 26, 2011 at 10:44 AM, bikash sharma sharmabiks...@gmail.com wrote: Thanks James. Well, all the config files and shared keys are on shared storage that is accessed by all the nodes in the cluster. At times everything runs fine on initialization, but at other times the same problem persists, so I was a bit confused. Also, I checked the TaskTracker logs on those nodes; there does not seem to be any error. -bikash On Sat, Feb 26, 2011 at 10:30 AM, James Seigel ja...@tynt.com wrote: Maybe your ssh keys aren’t distributed the same on each machine or the machines aren’t configured the same? J On 2011-02-26, at 8:25 AM, bikash sharma wrote: Hi, I have a 10-node Hadoop cluster, where I am running some benchmarks for experiments. Surprisingly, when I initialize the Hadoop cluster (hadoop/bin/start-mapred.sh), in many instances only some nodes have the TaskTracker process up (seen using jps), while other nodes do not have TaskTrackers. Could anyone please explain? Thanks, Bikash -- Regards, Simon
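On the closing question: the per-node control scripts let you bring up (and watch) a single TaskTracker in isolation:

bin/hadoop-daemon.sh start datanode      # run on the node itself, or via ssh
bin/hadoop-daemon.sh start tasktracker
tail -f logs/hadoop-*-tasktracker-*.log  # any startup error lands here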
disable pipelining in Hadoop
Hi, Is there a way to disable the use of pipelining, i.e., so that the reduce phase is started only after the map phase has completed? -bikash
Re: disable pipelining in Hadoop
Hi, Thanks Benjamin and Bibek for the detailed explanations and pointers. The question came after reading the paper Real-time MapReduce Scheduling (http://repository.upenn.edu/cis_reports/942/), where, in their experimental setup, they say they disabled the use of speculative execution and the use of pipelining. Thus, I was wondering how to enforce the latter. -bikash On Tue, Mar 1, 2011 at 9:48 AM, Benjamin Gufler benjamin.guf...@tum.de wrote: On 2011-03-01 15:42, Bibek Paudel wrote: On Tue, Mar 1, 2011 at 3:27 PM, Benjamin Gufler benjamin.guf...@tum.de wrote: Is there a way to disable the use of pipelining, i.e., the reduce phase is started only after the map phase is completed? You need to configure the mapred.reduce.slowstart.completed.maps property in mapred-site.xml. It gives the percentage of mappers which must be complete before the first reducers are launched. By setting it to 1, you should obtain the wanted behaviour. I think this only schedules the reducers, and the scheduled reducers start the copy (followed by sort) stages. The actual reduce functions are called only after all the intermediate data from all mappers have been copied over. The reduce functions cannot be called earlier anyway, as the last mapper to complete might produce output which must be processed on the first reduce invocation. So, if it was not the early copying and sorting, I think I didn't get your initial question, sorry. Benjamin
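For reference, the setting Benjamin describes, in mapred-site.xml (1.0 means every map must complete before any reducer is launched):

<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.0</value>
</property>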
Re: TaskTracker not starting on all nodes
Hi Sonal, Thanks. I guess you are right. ps -ef exposes such processes. -bikash On Tue, Mar 1, 2011 at 1:29 PM, Sonal Goyal sonalgoy...@gmail.com wrote: Bikash, I have sometimes found hanging processes which jps does not report, but a ps -ef shows them. Maybe you can check this on the errant nodes. Thanks and Regards, Sonal https://github.com/sonalgoyal/hiho Hadoop ETL and Data Integration https://github.com/sonalgoyal/hiho Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Tue, Mar 1, 2011 at 7:37 PM, bikash sharma sharmabiks...@gmail.com wrote: Hi James, Sorry for the late response. No, the same problem persists. I reformatted HDFS, stopped the mapred and hdfs daemons and restarted them (using start-dfs.sh and start-mapred.sh from the master node). But surprisingly, out of the 4-node cluster, two nodes have the TaskTracker running while the other two do not have TaskTrackers on them (verified using jps). I guess that since I have Hadoop installed on shared storage, that might be the issue? Btw, how do I start the services independently on each node? -bikash On Sun, Feb 27, 2011 at 11:05 PM, James Seigel ja...@tynt.com wrote: Did you get it working? What was the fix? Sent from my mobile. Please excuse the typos. On 2011-02-27, at 8:43 PM, Simon gsmst...@gmail.com wrote: Hey Bikash, Maybe you can manually start a tasktracker on the node and see if there are any error messages. Also, don't forget to check your config files for mapreduce and hdfs and make sure the datanode can start successfully first. After all these steps, you can submit a job on the master node and see if there is any communication between these failed nodes and the master node. Post your error messages here if possible. HTH. Simon - On Sat, Feb 26, 2011 at 10:44 AM, bikash sharma sharmabiks...@gmail.com wrote: Thanks James. Well, all the config files and shared keys are on shared storage that is accessed by all the nodes in the cluster. At times everything runs fine on initialization, but at other times the same problem persists, so I was a bit confused. Also, I checked the TaskTracker logs on those nodes; there does not seem to be any error. -bikash On Sat, Feb 26, 2011 at 10:30 AM, James Seigel ja...@tynt.com wrote: Maybe your ssh keys aren’t distributed the same on each machine or the machines aren’t configured the same? J On 2011-02-26, at 8:25 AM, bikash sharma wrote: Hi, I have a 10-node Hadoop cluster, where I am running some benchmarks for experiments. Surprisingly, when I initialize the Hadoop cluster (hadoop/bin/start-mapred.sh), in many instances only some nodes have the TaskTracker process up (seen using jps), while other nodes do not have TaskTrackers. Could anyone please explain? Thanks, Bikash -- Regards, Simon
TaskTracker not starting on all nodes
Hi, I have a 10-node Hadoop cluster, where I am running some benchmarks for experiments. Surprisingly, when I initialize the Hadoop cluster (hadoop/bin/start-mapred.sh), in many instances only some nodes have the TaskTracker process up (seen using jps), while other nodes do not have TaskTrackers. Could anyone please explain? Thanks, Bikash
Re: TaskTracker not starting on all nodes
Thanks James. Well, all the config files and shared keys are on shared storage that is accessed by all the nodes in the cluster. At times everything runs fine on initialization, but at other times the same problem persists, so I was a bit confused. Also, I checked the TaskTracker logs on those nodes; there does not seem to be any error. -bikash On Sat, Feb 26, 2011 at 10:30 AM, James Seigel ja...@tynt.com wrote: Maybe your ssh keys aren’t distributed the same on each machine or the machines aren’t configured the same? J On 2011-02-26, at 8:25 AM, bikash sharma wrote: Hi, I have a 10-node Hadoop cluster, where I am running some benchmarks for experiments. Surprisingly, when I initialize the Hadoop cluster (hadoop/bin/start-mapred.sh), in many instances only some nodes have the TaskTracker process up (seen using jps), while other nodes do not have TaskTrackers. Could anyone please explain? Thanks, Bikash
definition of slots in Hadoop scheduling
Hi, How is a task slot in Hadoop defined with respect to scheduling a map/reduce task on the slots available on TaskTrackers? Thanks, Bikash
Re: definition of slots in Hadoop scheduling
Thanks very much, Harsh. It seems, then, that slots are not defined in terms of actual machine resource capacities (cpu, memory, disk, and network bandwidth). -bikash On Fri, Feb 25, 2011 at 11:33 AM, Harsh J qwertyman...@gmail.com wrote: Please see this archived thread for a very similar question on what tasks really are: http://mail-archives.apache.org/mod_mbox/hadoop-general/201011.mbox/%3c126335.8536...@web112111.mail.gq1.yahoo.com%3E Right now, they're just a cap number for parallelization, hand-configured and irrespective of the machine's capabilities. However, a Scheduler may take a machine's state into account while assigning tasks to one. On Fri, Feb 25, 2011 at 7:22 PM, bikash sharma sharmabiks...@gmail.com wrote: Hi, How is a task slot in Hadoop defined with respect to scheduling a map/reduce task on the slots available on TaskTrackers? Thanks, Bikash -- Harsh J www.harshj.com
measure the resource usage of each map/reduce task
Hi, Is there any way we can measure the resource usage of each running map/reduce task? I was trying to use the sar utility to track each process's resource usage, but it seems these individual map/reduce tasks are not listed as processes when I do ps -ex. Thanks, Bikash
task scheduling based on slots in Hadoop
Hi, Can anyone throw some more light on resource-based scheduling in Hadoop? Specifically, are resources like CPU and memory partitioned across slots? From Arun's blog post on the capacity scheduler, http://developer.yahoo.com/blogs/hadoop/posts/2011/02/capacity-scheduler/ I understand that memory is the only resource supported; does that mean both memory and CPU are partitioned across the map/reduce tasks in slots? Thanks in advance. -bikash
measure the time taken by stragglers
Hi, Is there a way to measure the execution time of straggler and non-straggler tasks separately in Hadoop mapreduce? -bikash