HBase and ghost regionservers

2012-08-16 Thread Håvard Wahl Kongsgård
Hi, when I start my HBase cluster, HBase sometimes lists 'ghost
regionservers' without any regions:

kongs2.medisin.ntnu.no:60020 1345115409411
requests=0, regions=0, usedHeap=0, maxHeap=0

netstat does not list any service listening on port 60020.

If I start a region server locally, I get one real service and the ghost
server:

kongs2.medisin.ntnu.no:60020 1345119112497
requests=0, regions=64, usedHeap=141, maxHeap=1487
kongs2.medisin.ntnu.no:60020 1345119112497
requests=0, regions=0, usedHeap=0, maxHeap=0

Is it possible to manually blacklist or remove a regionserver via the
hbase shell?

I use the latest HBase from Cloudera.
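
Something like the following (an untested sketch, assuming a 0.92-era CDH4
client API; the class name is just for illustration) should list which
regionservers the master currently considers live. The number after
host:port in the listings above appears to be the server's start code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ListLiveRegionServers {
    public static void main(String[] args) throws Exception {
        // reads hbase-site.xml from the classpath for the quorum address
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // a ghost entry should show up here even though nothing is
        // listening on its port
        for (ServerName sn : admin.getClusterStatus().getServers()) {
            System.out.println(sn.getHostname() + ":" + sn.getPort()
                    + " " + sn.getStartcode());
        }
    }
}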

-Håvard


Relevance of mapreduce.* configuration properties for MR V2

2012-08-16 Thread mg

Hi,

I am currently trying to tune a CDH 4.0.1 cluster running HDFS, YARN, 
and HBase managed by Cloudera Manager 4.0.3 (Free Edition).


In CM, there are a number of options for setting mapreduce.* 
configuration properties on the YARN client page.


Some of the explanations in the GUI still refer to JobTracker and 
TaskTracker, e.g.,

- mapreduce.jobtracker.handler.count,
- mapreduce.tasktracker.map.tasks.maximum,
- mapreduce.tasktracker.reduce.tasks.maximum

I wonder whether these, and a number of other mapreduce.* properties 
(e.g., mapreduce.job.reduces), are actually observed by the MR2 
ApplicationMaster.
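
A quick (untested) way to at least see what the client side resolves for
these properties is to load them through the old-API JobConf, which pulls
in mapred-default.xml and mapred-site.xml; whether the ApplicationMaster
then honors them is exactly the open question (class name is mine):

import org.apache.hadoop.mapred.JobConf;

public class PropCheck {
    public static void main(String[] args) {
        // JobConf's static initializer adds mapred-default.xml and
        // mapred-site.xml as resources, unlike a plain Configuration
        JobConf conf = new JobConf();
        for (String key : new String[] {
                "mapreduce.job.reduces",
                "mapreduce.tasktracker.map.tasks.maximum",
                "mapreduce.tasktracker.reduce.tasks.maximum" }) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}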


Can anyone clarify this, or point me to the relevant documentation?

Thanks,
Martin


Hadoop idioms for reporting cluster and counter stats.

2012-08-16 Thread Jay Vyas
Hi guys: I want to start automating the output of counter stats, cluster
size, etc. at the end of the main MapReduce jobs that we run.  Is there
a simple way to do this?

Here is my current thought:

1) Run all jobs from a driver class (we already do this).

2) At the end of each job, intercept the global counters and write them out
to a text file, presumably on the local fs (see the sketch after this list).

3) Export those files from the local filesystem.

4) Maybe the NameNode also has access to such data, perhaps via an API
(clearly, the Hadoop web UI gets this data from somewhere, e.g., in the
cluster summary header).


-- 
Jay Vyas
MMSB/UCHC


Re: Number of Maps running more than expected

2012-08-16 Thread Bertrand Dechoux
Also, could you tell us more about your task statuses?
You might also have failed tasks...


Bertrand

On Thu, Aug 16, 2012 at 11:01 PM, Bertrand Dechoux decho...@gmail.com wrote:

 Well, there is speculative execution too.

 http://developer.yahoo.com/hadoop/tutorial/module4.html

 *Speculative execution:* One problem with the Hadoop system is that by
 dividing the tasks across many nodes, it is possible for a few slow nodes
 to rate-limit the rest of the program. For example if one node has a slow
 disk controller, then it may be reading its input at only 10% the speed of
 all the other nodes. So when 99 map tasks are already complete, the system
 is still waiting for the final map task to check in, which takes much
 longer than all the other nodes.
 By forcing tasks to run in isolation from one another, individual tasks
 do not know *where* their inputs come from. Tasks trust the Hadoop
 platform to just deliver the appropriate input. Therefore, the same input
 can be processed *multiple times in parallel*, to exploit differences in
 machine capabilities. As most of the tasks in a job are coming to a close,
 the Hadoop platform will schedule redundant copies of the remaining tasks
 across several nodes which do not have other work to perform. This process
 is known as *speculative execution*. When tasks complete, they announce
 this fact to the JobTracker. Whichever copy of a task finishes first
 becomes the definitive copy. If other copies were executing speculatively,
 Hadoop tells the TaskTrackers to abandon the tasks and discard their
 outputs. The Reducers then receive their inputs from whichever Mapper
 completed successfully, first.
 Speculative execution is enabled by default. You can disable speculative
 execution for the mappers and reducers by setting the
 mapred.map.tasks.speculative.execution and
 mapred.reduce.tasks.speculative.execution JobConf options to false,
 respectively.
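
 For the old API, those two switches look like this (a minimal, untested
 sketch; the class name is just for illustration):

 import org.apache.hadoop.mapred.JobConf;

 public class NoSpeculation {
     public static void main(String[] args) {
         JobConf conf = new JobConf();
         // mapred.map.tasks.speculative.execution = false
         conf.setMapSpeculativeExecution(false);
         // mapred.reduce.tasks.speculative.execution = false
         conf.setReduceSpeculativeExecution(false);
         System.out.println("speculative execution disabled in this conf");
     }
 }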



 Can you tell us your configuration with regards to those parameters?

 Regards

 Bertrand

 On Thu, Aug 16, 2012 at 8:36 PM, in.abdul in.ab...@gmail.com wrote:

 Hi Gaurav,
 The number of maps does not depend on the number of blocks; it depends
 on the number of input splits. If you had 100 GB of data in 10 splits,
 you would see only 10 maps.

 Please correct me if I am wrong.

 Thanks and regards,
 Syed abdul kather
 On Aug 16, 2012 7:44 PM, Gaurav Dasgupta [via Lucene] 
 ml-node+s472066n4001631...@n3.nabble.com wrote:

  Hi users,
 
  I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all
  the 12 nodes and 1 node running the Job Tracker).
  In order to perform a WordCount benchmark test, I did the following:
 
  - Executed RandomTextWriter first to create 100 GB of data (note that
    I changed only the test.randomtextwrite.total_bytes parameter; the
    rest are kept at their defaults).
  - Next, executed the WordCount program on that 100 GB dataset.
 
  The Block Size in hdfs-site.xml is set to 128 MB. Now, according to my
  calculation, the total number of Maps to be executed by the wordcount
  job should be 100 GB / 128 MB, i.e., 102400 MB / 128 MB = 800.
  But when I execute the job, it runs a total of 900 Maps, i.e., 100
  extra. So, why this extra number of Maps? My job does complete
  successfully without any error.

  Again, if I don't execute the RandomTextWriter job to create data for
  my wordcount, but instead put my own 100 GB text file in HDFS and run
  WordCount, I can see that the number of Maps matches my calculation,
  i.e., 800.
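
  For reference, that 800 figure as a tiny sketch (assuming one map per
  split and a split size equal to the 128 MB block size; the class name
  is just for illustration):

  public class SplitMath {
      public static void main(String[] args) {
          long dataBytes  = 100L * 1024 * 1024 * 1024; // 100 GB of input
          long splitBytes = 128L * 1024 * 1024;        // 128 MB block/split
          // one map per split, rounding up for a partial final split
          long maps = (dataBytes + splitBytes - 1) / splitBytes;
          System.out.println(maps + " expected maps"); // prints 800
      }
  }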
 
  Can anyone explain this odd behaviour of Hadoop, where the number of
  Maps is off only when the dataset is generated by RandomTextWriter?
  And what is the purpose of these extra Maps?
 
  Regards,
  Gaurav Dasgupta
 
 




 --
 Bertrand Dechoux




-- 
Bertrand Dechoux


Re: Number of Maps running more than expected

2012-08-16 Thread Raj Vishwanathan
You probably have speculative execution on. Extra map and reduce tasks are
run in case some of them fail or run slowly.

Raj


Sent from my iPad
Please excuse the typos. 



Re: Number of Maps running more than expected

2012-08-16 Thread Mohit Anchlia
It would be helpful to see some statistics from both jobs, like bytes read
and written, number of errors, etc.

On Thu, Aug 16, 2012 at 8:02 PM, Raj Vishwanathan rajv...@yahoo.com wrote:

 You probably have speculative execution on. Extra map and reduce tasks
 are run in case some of them fail or run slowly.

 Raj


 Sent from my iPad
 Please excuse the typos.
