Re: No. of Map and reduce tasks

2011-05-31 Thread Mohit Anchlia
What if I had multiple files in the input directory? Hadoop should then
fire parallel map tasks, right?
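Something like this, say, where the input directory holds several small
files (FileInputFormat splits each file separately, so presumably one
mapper per file):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Hypothetical directory of small files; each file is split on its
    // own, so each should get (at least) its own map task.
    FileInputFormat.setInputPaths(conf, new Path("/user/mohit/input"));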


No. of Map and reduce tasks

2011-05-26 Thread Mohit Anchlia
How can I tell how the map and reduce tasks were spread across the
cluster? I looked at the JobTracker web page but can't find that info.

Also, can I specify how many map or reduce tasks I want launched?

From what I understand, it's based on the number of input files passed
to Hadoop. So if I have 4 files, there will be 4 map tasks launched,
and the reducers depend on the HashPartitioner.


Re: No. of Map and reduce tasks

2011-05-26 Thread jagaran das
Hi Mohit,

No. of maps - it depends on the total file size / block size.
No. of reducers - you can specify.
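In concrete terms, with the default 64 MB block size a 1 GB input comes
to roughly 16 maps, while reducers you set on the job yourself. A minimal
sketch with the old JobConf API:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Reducers: set directly. Maps: only a hint -- the framework derives
    // the real count from the input splits (~ total size / block size).
    conf.setNumReduceTasks(4);   // arbitrary example value
    conf.setNumMapTasks(16);     // hint only; the splits win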

Regards,
Jagaran 


Re: No. of Map and reduce tasks

2011-05-26 Thread Mohit Anchlia
I ran a simple Pig script on this file:

-rw-r--r-- 1 root root   208348 May 26 13:43 excite-small.log

that orders the contents by name. But it only created one mapper. How
can I change this to distribute across multiple machines?


Re: No. of Map and reduce tasks

2011-05-26 Thread James Seigel
have more data for it to process :)


Re: No. of Map and reduce tasks

2011-05-26 Thread Mohit Anchlia
I think I understand that from the last 2 replies :)  But my question is:
can I change this configuration to, say, split the file into 250K chunks
so that multiple mappers are invoked?


Re: No. of Map and reduce tasks

2011-05-26 Thread James Seigel
Set the input split size really low, and you might get something.

I'd rather you fire up some *nix commands, pack that file together onto
itself a bunch of times, then put it back into HDFS, and let 'er rip.
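If you really do want to play with the split knob on a tiny file, the
new-API FileInputFormat can cap the split size. A rough sketch (0.20-style,
untested; 208 KB capped at 64 KB should come out to 4 splits):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    Job job = new Job(new Configuration());
    // Cap each split at 64 KB: a 208 KB input then yields 4 map tasks
    // instead of 1. Only sensible for experiments.
    FileInputFormat.setMaxInputSplitSize(job, 64 * 1024L);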

Sent from my mobile. Please excuse the typos.


Re: No. of Map and reduce tasks

2011-05-26 Thread jagaran das
If you feed it really small files, then the benefit of Hadoop's big block
size goes away. Instead, try merging the files.
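For example, something like this with FileUtil.copyMerge (the paths are
made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Concatenate every file under small-logs/ into one big file, so a
    // job reads a few block-sized splits instead of many tiny ones.
    FileUtil.copyMerge(fs, new Path("/user/mohit/small-logs"),
                       fs, new Path("/user/mohit/merged.log"),
                       false /* keep the sources */, conf, "\n");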

Hope that helps


separate JVM flags for map and reduce tasks

2010-04-22 Thread Vasilis Liaskovitis
Hi,

I'd like to pass different JVM options for map tasks and different
ones for reduce tasks. I think it should be straightforward to add
mapred.mapchild.java.opts and mapred.reducechild.java.opts to my
conf/mapred-site.xml and process the new options accordingly in
src/mapred/org/apache/hadoop/mapred/TaskRunner.java. Let me know if you
think it's more involved than what I described.
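For illustration, the lookup inside TaskRunner might end up looking
something like this (the two property names are just my proposal above,
not existing Hadoop keys; mapred.child.java.opts is the current one):

    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical helper: per-task-type JVM opts, falling back to the
    // single mapred.child.java.opts that Hadoop supports today.
    static String childJavaOpts(JobConf conf, boolean isMapTask) {
      String fallback = conf.get("mapred.child.java.opts", "-Xmx200m");
      return isMapTask
          ? conf.get("mapred.mapchild.java.opts", fallback)
          : conf.get("mapred.reducechild.java.opts", fallback);
    }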

My question is: if mapred.job.reuse.jvm.num.tasks is set to -1 (always
reuse), can the same JVM be reused for different types of tasks, i.e. the
same JVM used first by a map task and then by a reduce task? I am
assuming this is possible, though I haven't verified it in the code.
So, if one wants to pass different JVM options to map tasks and reduce
tasks, perhaps mapred.job.reuse.jvm.num.tasks should be set to 1
(never reuse)?
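If so, turning reuse off per job would just be (setNumTasksToExecutePerJvm
is JobConf's wrapper around mapred.job.reuse.jvm.num.tasks):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    conf.setNumTasksToExecutePerJvm(1);  // 1 = never reuse; -1 = unlimited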

thanks for your help,

- Vasilis