Hi,
>>What would really be great for me is if I could have the Reducer start
>>processing the map outputs as they are ready, and not after all Mappers finish
Check the property mapred.reduce.slowstart.completed.maps
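The property is the fraction of map tasks that must complete before reducers are
scheduled. A minimal sketch of setting it in the driver (old mapred API; MyJob is
a placeholder driver class, and the exact default varies by version):

// Schedule reducers once 5% of the maps are done, so the shuffle of
// finished map outputs starts early.
JobConf conf = new JobConf(MyJob.class);
conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.05f);
JobClient.runJob(conf);

Note that only the copy/shuffle overlaps with the maps; the reduce() calls
themselves still wait until all maps have finished.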
>>I've read about chaining mappers, but to the best of my understanding the
>>se
Hi,
You might want to check https://issues.apache.org/jira/browse/HADOOP-4179
And http://hadoop.apache.org/common/docs/current/vaidya.html
Amogh
On 6/2/10 1:24 PM, "WANG Shicai" wrote:
Hi,
This message is a little long. I beg your patience.
Our team would like to tune MR performance by chang
Hi,
A hack that immediately comes to my mind is having the mapper touch a
predetermined filepath and using that to clean up. Alternatively, check the
RunningJob interface available via JobClient; you can monitor and kill tasks
from there too.
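A rough sketch of that second approach (old mapred API; the job id and the
shouldStop() check are placeholders):

import org.apache.hadoop.mapred.*;

public class JobWatcher {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    JobClient client = new JobClient(conf);
    // Look up an already-submitted job by its id (placeholder id shown).
    RunningJob job = client.getJob(JobID.forName("job_201005040001_0042"));
    while (!job.isComplete()) {
      if (shouldStop()) {          // hypothetical user-defined condition
        job.killJob();             // or job.killTask(attemptId, false) for one attempt
        break;
      }
      Thread.sleep(5000);
    }
  }
  private static boolean shouldStop() { return false; } // placeholder check
}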
Amogh
On 5/4/10 9:46 AM, "Ersoy Bayramoglu" wrot
Hi,
Not sure if this can be done.
Here's a relevant snippet of code:
{
super(inputCounter, conf, reporter);
combinerClass = cls;
keyClass = (Class<K>) job.getMapOutputKeyClass();
valueClass = (Class<V>) job.getMapOutputValueClass();
comparator = (RawComparator<K>) job.getOutpu
OK, the chaining has to be defined before the job is started, right?
But because I don't know the value of K beforehand,
I want the chain to continue forever until some counter in reduce task is zero.
Felix Halim
On Thu, Feb 4, 2010 at 3:53 PM, Amogh Vasekar wrote:
>
>>>However, from ri
n this case? If that is the
case, can a custom scheduler be written -- will it be an easy task?
Regards,
Raghava.
On Thu, Feb 4, 2010 at 2:52 AM, Amogh Vasekar wrote:
Hi,
>>Will there be a re-assignment of Map & Reduce nodes by the Master?
In general using available schedulers, I
Hi,
Yes, the same location is populated with different values ( returned by
iter.next() ) for optimization reasons. There is a new patch which will allow
you to mark() and reset() the iterator so that you can buffer required values (
equivalently you can do that yourself; it's anyway in-mem for the patch
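Until mark()/reset() is available, a minimal sketch of doing the buffering
yourself (old mapred API; note the deep copy, since the framework reuses the
value object on every next() call):

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class BufferingReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    List<Text> buffered = new ArrayList<Text>();
    while (values.hasNext()) {
      buffered.add(new Text(values.next())); // copy; values.next() returns the same reused instance
    }
    // ... make a second (or nth) pass over 'buffered' here ...
  }
}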
>>However, from ri to m(i+1) there is an unnecessary barrier. m(i+1) should not
>>need to wait for all reducers ri to finish, right?
Yes, but r(i+1) can't be in the same job, since that requires another sort and
shuffle phase ( barrier ). So you would end up doing job(i) : m(i) r(i) m(i+1).
Job
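For the record, one way to get the job(i) = m(i) r(i) m(i+1) shape in a single
job is the old-API chain classes; a sketch, with MapI, ReduceI and MapIPlus1 as
placeholder classes implementing the usual Mapper/Reducer interfaces:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainDriver {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(ChainDriver.class);
    FileInputFormat.setInputPaths(job, args[0]);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // m(i): runs as the job's map phase
    ChainMapper.addMapper(job, MapI.class,
        LongWritable.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
    // r(i): the single sort/shuffle barrier of this job
    ChainReducer.setReducer(job, ReduceI.class,
        Text.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
    // m(i+1): runs right after r(i) in the same reduce task, no extra barrier
    ChainReducer.addMapper(job, MapIPlus1.class,
        Text.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));

    JobClient.runJob(job);
  }
}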
get the same nodes always to run your map
>>> reduce job on a
>>> shared cluster?
while (!done) { JobClient.runJob(jobConf); }
If I write something like that in the code, would not the Map node run on the
same data chunk it has each time? Will there be a re-assignment o
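As an aside, a sketch of how such a driver loop typically decides when to stop,
using a user-defined counter published by the reduce tasks (old mapred API; the
counter name and the elided job setup are illustrative):

import org.apache.hadoop.mapred.*;

public class IterativeDriver {
  // Hypothetical counter that reduce tasks increment for unfinished work.
  public enum MyCounters { REMAINING }

  public static void main(String[] args) throws Exception {
    boolean done = false;
    while (!done) {
      JobConf conf = new JobConf(IterativeDriver.class);
      // ... set mapper/reducer classes and per-iteration input/output paths ...
      RunningJob job = JobClient.runJob(conf); // blocks until this iteration finishes
      done = (job.getCounters().getCounter(MyCounters.REMAINING) == 0);
    }
  }
}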
Hi,
If each of your sequential iterations is map+reduce, then no.
The lifetime of a split is confined to a single map-reduce job. The split is
actually a reference to data, which is used to schedule the job as close as
possible to the data. The record reader then uses the same object to read the
records in the split.
W
.clements=disney....@hadoop.apache.org]
On Behalf Of Amogh Vasekar
Sent: Tuesday, January 19, 2010 10:53 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: chained mappers & reducers
Hi,
Can you elaborate on your case a little?
If you need sort and shuffle ( i.e. outputs of different reducer tasks of R1 to
be aggregated in some way ), you have to write another map-red job. If you
need to process only local reducer data ( i.e. your reducer output key is the
same as the input key ), you
Hi,
>>so I wanted to try and lower the number to 10 and see how the performance is
The number of mappers is provided as only a hint to the framework; it is not
guaranteed to be that number.
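A minimal sketch of the difference (old mapred API; MyJob and the 512 MB figure
are illustrative):

JobConf conf = new JobConf(MyJob.class);
conf.setNumMapTasks(10); // only a hint; the actual count comes from the input splits
// Raising the minimum split size is one way to actually reduce the map count
// for file-based input formats.
conf.setLong("mapred.min.split.size", 512L * 1024 * 1024);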
>>I have been digging around in the hadoop source code and it looks like the
>>JobClient actually sets the
e second job and
appending the sum value to each record.
Kind regards
Steve Watt
From: Amogh Vasekar
To: "mapreduce-user@hadoop.apache.org"
Date: 01/12/2010 02:01 PM
Subject: Re: How do I sum by Key in the Reduce Phase AND keep the initial value
Hi,
I ran into a very similar situation quite some time back and encountered this
at the time: http://issues.apache.org/jira/browse/HADOOP-475
After speaking to a few Hadoop folks, I learned that complete cloning was not a
straightforward option, for optimization reasons.
There were a few things
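One of the usual workarounds is to copy each value yourself and make two passes
over the buffered copies, roughly like this (old mapred API; the tab-separated
record layout and numeric last field are assumptions for illustration):

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapred.*;

public class SumAndKeepReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  private JobConf conf;

  public void configure(JobConf job) { this.conf = job; }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // First pass: clone each value (the framework reuses the object) and sum.
    List<Text> records = new ArrayList<Text>();
    long sum = 0;
    while (values.hasNext()) {
      Text copy = WritableUtils.clone(values.next(), conf);
      String[] fields = copy.toString().split("\t");
      sum += Long.parseLong(fields[fields.length - 1]); // assumed numeric field
      records.add(copy);
    }
    // Second pass: emit every original record with the per-key sum appended.
    for (Text record : records) {
      output.collect(key, new Text(record.toString() + "\t" + sum));
    }
  }
}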
rther assume, I need only apply the latest patch, which is 5.
Am I correct?
On Wed, Dec 9, 2009 at 7:30 AM, Amogh Vasekar wrote:
http://issues.apache.org/jira/browse/MAPREDUCE-370
You'll have to work around for now / try to apply patch.
Amogh
On 12/9/09 8:54 PM, "Geoffry Roberts" wrote:
Aaron,
I am using 0.20.1 and I'm not finding
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs. I'm using the download
page w
Hi,
The NLineInputFormat (o.a.h.mapreduce.lib.input) achieves more or less the
same, and should help guide you in writing a custom input format :)
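A rough sketch of using it (new-API classes from later releases; the helper
method and property names may differ slightly by version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineExample {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "nline-example");
    job.setInputFormatClass(NLineInputFormat.class);
    // Each map task receives (at most) 100 input lines as its split.
    NLineInputFormat.setNumLinesPerSplit(job, 100);
    // ... set mapper, input/output paths, then job.waitForCompletion(true) ...
  }
}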
Amogh
On 12/1/09 11:47 AM, "Kunal Gupta" wrote:
Can someone explain how to override the "FileInputFormat" and
"RecordReader" in order to be able to rea
Hi,
I'm pretty sure you need to specify the Unicode equivalent, or at least that is
what I used in my Java map-red program.
Amogh
On 11/10/09 9:24 AM, "wd" wrote:
hi,
I'm trying to write a Hadoop Streaming job in Perl, but I'm completely confused
by the key/value separator.
I found lots of separat
Hi,
The file name generated depends on the output pair and hence, if you are
modifying the key from mapper to reducer output, collisions are possible. You
may split and append 'name' (* from part-*) to get unique reducer files, which
can be merged later. Or see if MultipleOutputs fits the bill
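If it does, a rough sketch with the old-API class (the named output "named" is
illustrative and must match what the driver registers via addNamedOutput):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MultiOutReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  private MultipleOutputs mos;

  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  @SuppressWarnings("unchecked")
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      // Driver side (illustrative): MultipleOutputs.addNamedOutput(conf,
      //     "named", TextOutputFormat.class, Text.class, Text.class);
      mos.getCollector("named", reporter).collect(key, values.next());
    }
  }

  public void close() throws IOException {
    mos.close();
  }
}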
Hi,
Can you let us know if the count of attempt_* entries is 32k - 1? I remember
reading about a similar error some time back.
Amogh
On 10/26/09 9:06 AM, "Ed Mazur" wrote:
I'm having problems on 0.20.0 when map output compression is enabled.
Map tasks complete (TaskRunner: Task 'attempt_*' done), but
Hi All,
Regarding the JVM reuse feature that was incorporated, the documentation says
reuse is generally recommended for streaming and pipes jobs. I'm a little
unclear on this and any pointers would be appreciated.
Also, in what scenarios will this feature be helpful for Java mapred jobs?
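For context, a minimal sketch of how reuse is switched on (0.20-era property;
MyJob is a placeholder driver class, and -1 means reuse the JVM for an
unlimited number of tasks):

JobConf conf = new JobConf(MyJob.class);
conf.setNumTasksToExecutePerJvm(-1);
// equivalently: conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);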
Thanks,
Amogh
er-specific
processing) ---> Store mails to designated boxes.
Do you have any suggestion? I am thinking about JVM re-use feature of Hadoop or
I can set up a chain of two map-reduce pairs.
Best regards.
Fang.
On Mon, Aug 24, 2009 at 1:25 PM, Amogh Vasekar
mailto:am...@yahoo-inc.com>> wrote
No, but if you want reducer-like functionality on the same node, have a
look at combiners. To get the exact functionality you might need to tweak
around a little w.r.t. buffers, flushes, etc.
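A minimal sketch of wiring in a combiner (old mapred API; MyJob, MyMap and
MyReduce are placeholder classes, and the combiner must be safe to apply zero
or more times, e.g. a sum):

JobConf conf = new JobConf(MyJob.class);
conf.setMapperClass(MyMap.class);
conf.setCombinerClass(MyReduce.class); // runs on the map side, per spill/merge
conf.setReducerClass(MyReduce.class);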
Cheers!
Amogh
From: fan wei fang [mailto:eagleeye8...@gmail.com]
Sent: Monda
The same amount of data will have to be read and transferred over the network,
whether it is one file or multiple files. If you do merge to a single file, the
sort-and-shuffle phase actually can't start till all mappers have finished, as
opposed to fetching outputs from individual mapper tasks, which can happen as
soon as each one has finis