2 tasks at the same time, for a total of 25 tasks at the end. Maybe, as you are saying, I'm not reaching the right jobtracker? I'm running the command line on the master server.

If I look at the map tasks, I can see:
Input Split Locations    /default-rack/node1
with different values depending on the task, but on the same page I can see machine=/default-rack/node3 (which is my master).

How/where should I run this? Should I point it to the ZooKeeper instance instead?

Thanks,

JM

2012/10/11 Jean-Daniel Cryans <[email protected]>:
> 2 tasks total or that are running at the same time? If the latter, it just
> means that you are using the local job tracker instead of your job
> tracker because HBase couldn't find your MR config.
>
> J-D
>
> On Thu, Oct 11, 2012 at 1:36 PM, Jean-Marc Spaggiari
> <[email protected]> wrote:
>> Hi J-D,
>>
>> I have about 20M rows over 25 regions on 6 nodes. So that means I
>> should see something like 6 tasks or even 25, right? And not just 2?
>> Keys are 128 bytes long. Value is 1 byte.
>>
>> I also tried to update mapreduce.tasktracker.map.tasks.maximum but
>> this is "the number of map tasks that should be launched on each node,
>> not the number of nodes to be used for each map task.", so there was
>> no change, as expected.
>>
>> JM
>>
>> 2012/10/11 Jean-Daniel Cryans <[email protected]>:
>>> On Thu, Oct 11, 2012 at 1:20 PM, Jean-Marc Spaggiari
>>> <[email protected]> wrote:
>>>> I'm now using this command line and it's working fine (except for the
>>>> number of tasks).
>>>> HADOOP_CLASSPATH=`/home/hbase/hbase-0.94.0/bin/hbase
>>>> classpath`:`/home/hadoop/hadoop-1.0.3/bin/hadoop classpath`
>>>> /home/hadoop/hadoop-1.0.3/bin/hadoop jar
>>>> /home/hbase/hbase-0.94.0/hbase-0.94.1.jar rowcounter
>>>> -Dhbase.client.scanner.caching=100 -Dmapred.map.tasks=6
>>>> -Dmapred.map.tasks.speculative.execution=false work_proposed
>>>>
>>>> I simply don't know if the -D parameters are taken into consideration
>>>> since I get the same results (number of tasks, time of exec, etc.)
>>>> with and without them.
>>>
>>> Using a higher caching value won't do much good if you don't have a
>>> lot of rows. Since you didn't include any data like that in your
>>> email, I won't guess how much 100 would help your case.
>>>
>>> The number of map tasks when mapping an HBase table will be the number
>>> of regions you have in that table. Unfortunately you can't change it
>>> unless you write your own input format for HBase.
>>>
>>> J-D
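
A rough sketch of the arithmetic behind J-D's explanation, using only the figures from this thread. It assumes one map task per region and a fixed number of tasks running at any moment. The helper name map_waves is made up for illustration and is not part of Hadoop or HBase.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop/HBase command): how many scheduling
# "waves" are needed to run every map task.
#   $1 = number of regions (one map task per region, per J-D's explanation)
#   $2 = number of map tasks observed running at the same time
map_waves() {
  regions=$1
  concurrent=$2
  # Integer ceiling of regions / concurrent.
  echo $(( (regions + concurrent - 1) / concurrent ))
}

# Figures from this thread: 25 regions, 2 tasks running at a time.
echo "map tasks: 25, waves: $(map_waves 25 2)"
```

With these numbers the job still runs 25 map tasks in total, so -Dmapred.map.tasks=6 changing nothing is consistent with what J-D describes. Only the number of tasks running concurrently affects how many waves it takes.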
