Okay, maybe you are right. Thanks!
2013/11/4 Pradeep Gollakota <pradeep...@gmail.com>

> You would only be able to set it for the script... which means it will
> apply to all 8 jobs. However, my guess is that you don't need to control
> the number of map tasks per machine.
>
> On Sun, Nov 3, 2013 at 4:21 PM, John <johnnyenglish...@gmail.com> wrote:
>
> > Thanks for your answer! How can I set the
> > mapred.tasktracker.map.tasks.maximum value only for this specific job?
> > For example, the Pig script creates 8 jobs, and I only want to modify
> > this value for the first job. I think there is no option in Pig Latin
> > to influence this value?
> >
> > kind regards
> >
> > 2013/11/4 Pradeep Gollakota <pradeep...@gmail.com>
> >
> > > I think you're misunderstanding how HBaseStorage works. HBaseStorage
> > > uses HBaseInputFormat under the hood. The number of map tasks that
> > > are spawned depends on the number of regions you have, and the tasks
> > > are scheduled so that they are local to the regions they're reading
> > > from. You will typically not have to worry about problems such as
> > > this with MapReduce. If you do have performance concerns, you can set
> > > the mapred.tasktracker.map.tasks.maximum setting in the job conf and
> > > it will not affect all the other jobs.
> > >
> > > On Sun, Nov 3, 2013 at 3:04 PM, John <johnnyenglish...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > is it possible to limit the number of map slots used for the load
> > > > function? For example, I have 5 nodes with 10 map slots (each node
> > > > has 2 slots per CPU), and I want only one map task on every node.
> > > > Is there a way to set this only for the load function? I know there
> > > > is an option called "mapred.tasktracker.map.tasks.maximum", but
> > > > this would influence every MapReduce job. I want to change the
> > > > number only for this specific job.
> > > >
> > > > My use case is the following: I'm using a modified version of the
> > > > HBaseStorage function. I try to load, for example, 10 different
> > > > rowkeys with very large column sizes and join them afterwards.
> > > > Since the columns all have the same column family, every row can be
> > > > stored on a different server. For example, rowkeys 1-5 are stored
> > > > on node1 and the other rowkeys on the other nodes. So if I create a
> > > > Pig script to load the 10 keys and join them afterwards, this ends
> > > > up as one MapReduce job with 10 map tasks and some reduce tasks
> > > > (depending on the parallel factor). The problem is that 2 map tasks
> > > > will be created on node1, because there are 2 slots available
> > > > there. That means both tasks are simultaneously reading a large
> > > > number of columns from the same local hard drive. Maybe I'm wrong,
> > > > but shouldn't this be a performance issue? It should be faster to
> > > > read each rowkey one after another!?
> > > >
> > > > kind regards
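A minimal sketch of the SET approach discussed above, assuming Pig on
classic MapReduce (MRv1); the table names, column family, and schema are
made up for illustration. Note that SET applies to every job the script
compiles to, and mapred.tasktracker.map.tasks.maximum is normally read by
each TaskTracker daemon at startup, so a per-script override may simply be
ignored:

  -- Applies to ALL jobs generated from this script, not just the load.
  -- NOTE: in MRv1 this property configures the TaskTracker daemon itself,
  -- so overriding it here is generally ineffective; it is shown only to
  -- illustrate how job properties are passed from a Pig script.
  SET mapred.tasktracker.map.tasks.maximum 1;

  -- Hypothetical HBase table 'mytable' with column family 'cf';
  -- '-loadKey true' makes the rowkey the first field of each tuple, and
  -- 'cf:*' loads the whole column family as a map.
  rows = LOAD 'hbase://mytable'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
             'cf:*', '-loadKey true')
         AS (rowkey:chararray, columns:map[]);

  -- Hypothetical second relation to join against.
  keys = LOAD 'hbase://othertable'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
             'cf:*', '-loadKey true')
         AS (rowkey:chararray, columns:map[]);

  -- PARALLEL controls only the number of reducers, not map slots per node.
  joined = JOIN rows BY rowkey, keys BY rowkey PARALLEL 5;
  STORE joined INTO '/tmp/joined';

Since the number of map tasks comes from the number of regions (one task
per region, scheduled for locality), the practical way to keep two heavy
tasks from landing on the same node is usually to spread the regions
across the cluster, not to cap map slots per job.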