Re: No. of Map and reduce tasks
What if I had multiple files in the input directory? Should Hadoop then fire parallel map tasks?

On Thu, May 26, 2011 at 7:21 PM, jagaran das jagaran_...@yahoo.co.in wrote:
> If you give really small files, then the benefit of Hadoop's big block size goes away. Instead try merging files.
No. of Map and reduce tasks
How can I tell how the map and reduce tasks were spread across the cluster? I looked at the JobTracker web page but can't find that info. Also, can I specify how many map or reduce tasks I want to be launched? From what I understand, it's based on the number of input files passed to Hadoop. So if I have 4 files there will be 4 map tasks launched, and the reducer count depends on the hash partitioner.
Re: No. of Map and reduce tasks
Hi Mohit,

No. of maps: it depends on the total file size divided by the block size.
No. of reducers: you can specify it yourself.

Regards,
Jagaran

From: Mohit Anchlia mohitanch...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thu, 26 May, 2011 2:48:20 PM
Subject: No. of Map and reduce tasks

> Can I specify how many map or reduce tasks I want to be launched?
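As a rough illustration of that rule (a sketch of the arithmetic only, not Hadoop's actual InputFormat code):

```python
import math

def estimated_map_tasks(total_size_bytes, block_size_bytes):
    """Rule of thumb: one map task per input split, where the default
    split size equals the HDFS block size. A file smaller than one
    block still gets at least one map task."""
    return max(1, math.ceil(total_size_bytes / block_size_bytes))

# A 1 GiB file with the then-default 64 MiB block size -> 16 map tasks.
print(estimated_map_tasks(1 * 1024**3, 64 * 1024**2))   # 16

# A 208348-byte file (like excite-small.log below) is far smaller than
# one block, so it yields a single map task.
print(estimated_map_tasks(208348, 64 * 1024**2))        # 1
```

This is why adding more small files doesn't necessarily add parallelism: each one is below the block size, so each contributes only one map task's worth of (tiny) work.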
Re: No. of Map and reduce tasks
I ran a simple Pig script on this file:

-rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log

The script orders the contents by name, but the job created only one mapper. How can I change this to distribute the work across multiple machines?

On Thu, May 26, 2011 at 3:08 PM, jagaran das jagaran_...@yahoo.co.in wrote:
> No. of maps: it depends on the total file size divided by the block size.
Re: No. of Map and reduce tasks
Have more data for it to process :)

On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote:
> But it only created one mapper. How can I change this to distribute across multiple machines?
Re: No. of Map and reduce tasks
I think I understand that from the last 2 replies :) But my question is: can I change this configuration to, say, split the file into 250K chunks so that multiple mappers can be invoked?

On Thu, May 26, 2011 at 3:41 PM, James Seigel ja...@tynt.com wrote:
> Have more data for it to process :)
Re: No. of Map and reduce tasks
Set the input split size really low and you might get something. I'd rather you fire up some *nix commands, pack that file onto itself a bunch of times, and then put it back into HDFS and let 'er rip.

Sent from my mobile. Please excuse the typos.

On 2011-05-26, at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> Can I change this configuration to, say, split the file into 250K chunks so that multiple mappers can be invoked?
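To see why the split size would have to be very low for this particular file: at 208,348 bytes it fits entirely inside a single 250 KB split, so a 250K split size still produces one mapper. A quick sanity check of the arithmetic (an illustration, not Hadoop's actual split code):

```python
file_size = 208348          # bytes, from the ls -l output earlier in the thread
split_250k = 250 * 1024     # the proposed 250K split size
split_64k = 64 * 1024       # a much smaller split size, for comparison

# -(-a // b) is ceiling division in plain Python.
splits_at_250k = -(-file_size // split_250k)
splits_at_64k = -(-file_size // split_64k)

print(splits_at_250k)  # 1  -> still only one mapper
print(splits_at_64k)   # 4  -> four mappers, each doing trivial work
```

Even with four mappers, per-task startup overhead would likely dominate on a file this small, which is why the replies push toward making the input bigger instead.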
Re: No. of Map and reduce tasks
If you feed it really small files, the benefit of Hadoop's big block size goes away. Instead, try merging the files. Hope that helps.

From: James Seigel ja...@tynt.com
To: common-user@hadoop.apache.org
Sent: Thu, 26 May, 2011 6:04:07 PM
Subject: Re: No. of Map and reduce tasks

> Set the input split size really low and you might get something.
separate JVM flags for map and reduce tasks
Hi,

I'd like to pass different JVM options for map tasks and for reduce tasks. I think it should be straightforward to add mapred.mapchild.java.opts and mapred.reducechild.java.opts to my conf/mapred-site.xml and to process the new options accordingly in src/mapred/org/apache/mapreduce/TaskRunner.java. Let me know if you think it's more involved than what I described.

My question is: if mapred.job.reuse.jvm.num.tasks is set to -1 (always reuse), can the same JVM be reused for different types of tasks, e.g. first by a map task and then by a reduce task? I am assuming this is definitely possible, though I haven't verified it in the code. So, if one wants to pass different JVM options to map tasks and to reduce tasks, perhaps mapred.job.reuse.jvm.num.tasks should be set to 1 (never reuse)?

Thanks for your help,
- Vasilis
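For reference, the proposal above would presumably look roughly like this in conf/mapred-site.xml. Note that mapred.mapchild.java.opts and mapred.reducechild.java.opts are the names proposed in this message, not existing Hadoop options, and the heap sizes are made-up examples:

```xml
<!-- Hypothetical: these property names exist only in this proposal. -->
<property>
  <name>mapred.mapchild.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapred.reducechild.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```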