HBase and ghost regionservers
Hi,

When I start my HBase cluster, HBase sometimes lists 'ghost' regionservers without any regions:

kongs2.medisin.ntnu.no:60020 1345115409411 requests=0, regions=0, usedHeap=0, maxHeap=0

netstat does not list any service on port 60020. If I start a regionserver locally, I get one real server and the ghost server:

kongs2.medisin.ntnu.no:60020 1345119112497 requests=0, regions=64, usedHeap=141, maxHeap=1487
kongs2.medisin.ntnu.no:60020 1345119112497 requests=0, regions=0, usedHeap=0, maxHeap=0

Is it possible to manually blacklist or remove a regionserver via the hbase shell? I use the latest HBase from Cloudera.

-Håvard
Relevance of mapreduce.* configuration properties for MR V2
Hi,

I am currently trying to tune a CDH 4.0.1 cluster running HDFS, YARN, and HBase, managed by Cloudera Manager 4.0.3 (Free Edition). In CM, there are a number of options for setting mapreduce.* configuration properties on the YARN client page. Some of the explanations in the GUI still refer to the JobTracker and TaskTracker, e.g.:

- mapreduce.jobtracker.handler.count
- mapreduce.tasktracker.map.tasks.maximum
- mapreduce.tasktracker.reduce.tasks.maximum

I wonder whether these and a number of other mapreduce.* properties (e.g., mapreduce.job.reduces) are observed by the MR2 ApplicationMaster or not. Can anyone clarify or point to the relevant documentation?

Thanks,
Martin
Hadoop idioms for reporting cluster and counter stats.
Hi guys,

I want to start automating the output of counter stats, cluster size, etc. at the end of the main MapReduce jobs which we run. Is there a simple way to do this? Here is my current thought:

1) Run all jobs from a driver class (we already do this).
2) At the end of each job, intercept the global counters and write them out to a text file. This would presumably be on the local FS.
3) Export the local filesystem.
4) Maybe the NameNode also has access to such data, perhaps via an API (clearly, the Hadoop web UI gets this data from somewhere, e.g. for the cluster summary header).

-- Jay Vyas MMSB/UCHC
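For step 2, the driver can read the finished job's aggregate counters and print one line per counter. The sketch below is a minimal, untested illustration: the Hadoop-specific calls (new-API `org.apache.hadoop.mapreduce.Job#getCounters()` and friends) are shown in comments so the snippet stays self-contained; only the report-line formatting helper is live code, and the counter names used are examples.

```java
// Sketch of step 2: dump a finished job's counters to a text report.
// The Hadoop calls are left as comments so this compiles standalone;
// in a real driver they come from org.apache.hadoop.mapreduce.*.
public class CounterReport {

    // Format one counter as "group.name=value" — one line of the report.
    static String line(String group, String name, long value) {
        return group + "." + name + "=" + value;
    }

    public static void main(String[] args) {
        // In the actual driver, after job.waitForCompletion(true):
        //
        //   Counters counters = job.getCounters();
        //   try (PrintWriter out = new PrintWriter("counters.txt")) {
        //       for (CounterGroup g : counters)
        //           for (Counter c : g)
        //               out.println(line(g.getDisplayName(),
        //                                c.getDisplayName(),
        //                                c.getValue()));
        //   }
        //
        // Here we just show the report format with a made-up value:
        System.out.println(line("Map-Reduce Framework", "Map input records", 1000));
    }
}
```

Since the driver already runs every job, this keeps the whole report pipeline in one place, with no scraping of the web UI needed.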
Re: Number of Maps running more than expected
Also, could you tell us more about your task statuses? You might also have failed tasks...

Bertrand

On Thu, Aug 16, 2012 at 11:01 PM, Bertrand Dechoux decho...@gmail.com wrote:

Well, there is speculative execution too.
http://developer.yahoo.com/hadoop/tutorial/module4.html

*Speculative execution:* One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example, if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes.

By forcing tasks to run in isolation from one another, individual tasks do not know *where* their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed *multiple times in parallel*, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as *speculative execution*. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first.

Speculative execution is enabled by default. You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively.

Can you tell us your configuration with regards to those parameters?
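The two JobConf options named in the quoted tutorial text can also be set in configuration XML, either cluster-wide in mapred-site.xml or per job. A minimal fragment:

```xml
<!-- mapred-site.xml (or a per-job configuration): disable speculative execution -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

With both set to false, the map/reduce task counts reported by the JobTracker should match the input-split arithmetic exactly, which makes this a quick way to test whether speculation explains the extra tasks.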
Regards,
Bertrand

On Thu, Aug 16, 2012 at 8:36 PM, in.abdul in.ab...@gmail.com wrote:

Hi Gaurav,

The number of maps does not depend on the number of blocks; it really depends on the number of input splits. If you had 100 GB of data divided into 10 input splits, you would see only 10 maps. Please correct me if I am wrong.

Thanks and regards,
Syed Abdul Kather

On Aug 16, 2012 7:44 PM, Gaurav Dasgupta [via Lucene] ml-node+s472066n4001631...@n3.nabble.com wrote:

Hi users,

I am working on a CDH3 cluster of 12 nodes (TaskTrackers running on all 12 nodes and 1 node running the JobTracker). In order to perform a WordCount benchmark test, I did the following:

- Executed RandomTextWriter first to create 100 GB of data (note that I changed only the test.randomtextwrite.total_bytes parameter; everything else was kept at its default).
- Next, executed the WordCount program on that 100 GB dataset.

The block size in hdfs-site.xml is set to 128 MB. Now, according to my calculation, the total number of maps executed by the WordCount job should be 100 GB / 128 MB, i.e., 102400 MB / 128 MB = 800. But when I execute the job, it runs a total of 900 maps, i.e., 100 extra. So, why these extra maps? The job does complete successfully without any error.

Again, if I don't execute the RandomTextWriter job to create the data, but instead put my own 100 GB text file in HDFS and run WordCount, I can then see that the number of maps matches my calculation, i.e., 800.

Can anyone tell me why Hadoop behaves this way regarding the number of maps for WordCount only when the dataset is generated by RandomTextWriter? And what is the purpose of these extra maps?
Regards,
Gaurav Dasgupta

--
View this message in context: http://lucene.472066.n3.nabble.com/Number-of-Maps-running-more-than-expected-tp4001631p4001683.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

-- Bertrand Dechoux
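One way the 100 extra maps can arise: FileInputFormat-style inputs get roughly one split per full block per input file, plus one split for any trailing partial block. RandomTextWriter writes its output as many separate files (one per write map), so if each file is slightly larger than a multiple of the 128 MB block size, each file contributes one extra small split. The sketch below is back-of-the-envelope arithmetic only; the file count (100) and per-file size (1025 MB) are assumptions chosen to reproduce the 800-vs-900 numbers from the question, not measurements from the cluster in question.

```java
// Back-of-the-envelope split arithmetic for FileInputFormat-style inputs.
public class SplitMath {

    // Splits contributed by one file: ceil(fileBytes / splitBytes).
    static long splitsForFile(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long blockMb = 128;

        // One contiguous 100 GB file: 102400 MB / 128 MB = exactly 800 splits.
        System.out.println(splitsForFile(100L * 1024, blockMb)); // 800

        // Hypothetical: 100 files of ~1 GB each, each just over a block
        // multiple (1025 MB). Each needs ceil(1025/128) = 9 splits,
        // so the job runs 900 maps — 100 more than the single-file case.
        long total = 0;
        for (int i = 0; i < 100; i++) {
            total += splitsForFile(1025, blockMb);
        }
        System.out.println(total); // 900
    }
}
```

The per-job split details can be checked against the "splits" information in the JobTracker web UI to confirm whether the extra maps really are small trailing splits.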
Re: Number of Maps running more than expected
You probably have speculative execution on. Extra map and reduce tasks are run in case some of them fail.

Raj

Sent from my iPad. Please excuse the typos.

On Aug 16, 2012, at 11:36 AM, in.abdul in.ab...@gmail.com wrote:
[quoted message from Syed Abdul Kather and Gaurav Dasgupta elided; identical to the quote in the previous reply]
Re: Number of Maps running more than expected
It would be helpful to see some statistics out of both jobs, like bytes read and written, number of errors, etc.

On Thu, Aug 16, 2012 at 8:02 PM, Raj Vishwanathan rajv...@yahoo.com wrote:
[quoted messages from Raj Vishwanathan, Syed Abdul Kather, and Gaurav Dasgupta elided; identical to the quotes in the earlier replies]