Think of it in partition terms. If you know that your map-splits X, Y
and Z won't emit any key of partition P, then the Pth reducer can jump
ahead and run without those X, Y and Z completing their processing.
Otherwise, a reducer can't run until all maps have completed, for fear
of losing a few keys.
Thanks Harsh
I did set mapred.map.tasks = 1
but still I can consistently see 3 mappers being invoked
and the order is always like this:
_2_0
***_0_0
***_1_0
the 2_0 and 1_0 tasks are the ones that consume 0 data
this does look like a bug
you could try with a simp
Er, sorry I meant mapred.map.tasks = 1
On Thu, Jul 12, 2012 at 10:44 AM, Harsh J wrote:
> Try passing mapred.map.tasks = 0 or set a higher min-split size?
>
> On Thu, Jul 12, 2012 at 10:36 AM, Yang wrote:
>> Thanks Harsh
>>
>> I see
>>
>> then there seems to be some small problems with the Splitter / InputFormat.
Try passing mapred.map.tasks = 0 or set a higher min-split size?
On Thu, Jul 12, 2012 at 10:36 AM, Yang wrote:
> Thanks Harsh
>
> I see
>
> then there seems to be some small problems with the Splitter / InputFormat.
>
> I'm just reading a 1-line text file through pig:
>
> A = LOAD 'myinput.txt' ;
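The min-split-size suggestion above could look like this on the command line (script name is hypothetical; the property is the pre-YARN Hadoop 1.x name, newer releases use mapreduce.input.fileinputformat.split.minsize):

```shell
# Raise the minimum split size (bytes) so a tiny input file is not
# split across multiple map tasks when the script runs.
pig -Dmapred.min.split.size=134217728 myscript.pig
```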
yes, let me try that
changing the max mapper slot actually requires changing the hadoop config,
since I just found that
it's "final" param
On Wed, Jul 11, 2012 at 10:05 PM, Harsh J wrote:
> Your problem is more from the fact that you are running > 1 map slot
> per TT, and multiple mappers are getting run at the same time, all
> trying to bind to the same port.
Thanks Harsh
I see
then there seems to be some small problems with the Splitter / InputFormat.
I'm just reading a 1-line text file through pig:
A = LOAD 'myinput.txt' ;
supposedly it should generate at most 1 mapper.
but in reality, it seems that pig generated 3 mappers, and basically fed
em
Your problem is more from the fact that you are running > 1 map slot
per TT, and multiple mappers are getting run at the same time, all
trying to bind to the same port. Limit your TT's max map tasks to 1
when you're relying on such techniques to debug, or use the
LocalJobRunner/Apache MRUnit instead.
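Running the job under the LocalJobRunner might look like this (jar and driver names are hypothetical; the property is the Hadoop 1.x name, and passing -D this way assumes the driver uses GenericOptionsParser/Tool):

```shell
# Run the job in-process with LocalJobRunner instead of on the cluster,
# so only one task executes at a time and port-binding debug tricks work.
hadoop jar myjob.jar MyDriver -Dmapred.job.tracker=local input output
```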
Yang,
No, those three are individual task attempts.
This is how you may generally dissect an attempt ID when reading it:
attempt_201207111710_0024_m_000000_0
1. "attempt" - indicates it's an attempt ID you'll be reading
2. "201207111710" - The job tracker timestamp ID, indicating which
instance of the JobTracker the job ran under
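The dissection Harsh describes can be sketched with plain string splitting (pre-YARN attempt-ID layout; the sample ID is illustrative):

```java
// Sketch: dissecting a pre-YARN attempt ID into its parts:
// attempt_<jobtracker-start-timestamp>_<job-number>_<m|r>_<task-number>_<attempt-number>
public class AttemptId {
    static String[] dissect(String attemptId) {
        return attemptId.split("_");
    }

    public static void main(String[] args) {
        String[] p = dissect("attempt_201207111710_0024_m_000002_0");
        System.out.println("job=" + p[1] + "_" + p[2]
            + " type=" + (p[3].equals("m") ? "map" : "reduce")
            + " task=" + p[4] + " attempt=" + p[5]);
    }
}
```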
Yes speculative execution will affect your tasks, please read the FAQ
to understand the use of OutputCommitters:
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
On Thu, May 17, 2012 at 2:02 PM, Abhay Ratnaparkhi
wrote:
>
I have multiple reducers running simultaneously. Each reducer is supposed
to output data in a different file.
I'm creating a file on HDFS using fs.create() command in each reducer.
Will speculative execution of tasks affects the output as I'm not using any
outputFormat provided?
~Abhay
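One hypothetical way to see why speculative attempts collide when calling fs.create() directly: two attempts of the same reducer will compute the same output path unless something attempt-specific is in the name. The robust fix is to write under the committer-managed work directory as the FAQ describes, but a name like the one this sketch builds at least avoids collisions (helper name and layout are illustrative, not a Hadoop API):

```java
// Hypothetical helper: derive a per-attempt output name so two speculative
// attempts of the same reducer never create the same HDFS file.
public class AttemptScopedName {
    static String outputNameFor(String attemptId, String basename) {
        // Embeds the full attempt ID, which differs between the original
        // attempt (..._0) and a speculative one (..._1).
        return basename + "-" + attemptId;
    }

    public static void main(String[] args) {
        System.out.println(
            outputNameFor("attempt_201207111710_0024_r_000001_0", "spellchecker"));
    }
}
```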
Thanks,
Anil
On Fri, Mar 30, 2012 at 9:54 PM, Harsh J wrote:
> Anil,
>
> You can also disable speculative execution on a per-job basis. See
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setMapSpeculativeExecution(boolean)
> (Which is
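Disabling speculation per-job from the command line might look like this (jar and driver names are hypothetical; the property names are the Hadoop 1.x equivalents of the Job API linked above, and passing -D this way assumes the driver uses GenericOptionsParser/Tool):

```shell
# Per-job: turn off speculative attempts for both map and reduce tasks.
hadoop jar myjob.jar MyDriver \
  -Dmapred.map.tasks.speculative.execution=false \
  -Dmapred.reduce.tasks.speculative.execution=false \
  input output
```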
> -rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen
>
> part-4/data/spellchecker:
> total 16
> -rw-r--r-- 1 vmc visible 32 Oct 29 09:54 segments_1
> -rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen
>
>
> What might cause that attempt path to be lying around at the time of
What might cause that attempt path to be lying around at the time of
completion? Has anyone seen anything like this? My gut says if we were able
to disable speculative execution, we would probably see this go away. But
that might be overreacting.
In this job, of the 12
Hi,
Just wanted to post an update on this issue. I didn't spend a lot of time to
verify for sure what was going wrong but speculative execution definitely
was not the cause of the problem here. I was seeing job failures even with
speculative execution set to ON.
By recreating the HDFS environment
Hi Matei,
Thanks for your feedback. I am trying to verify/debug whether the failures
are actually due to speculative execution. I will send an update once I more
info on this.
-Shrinivas
On Thu, Jun 2, 2011 at 12:40 AM, Matei Zaharia wrote:
> Usually the number of speculatively executed tasks
I don't think OOM errors would be caused by not having speculation though;
there must be another problem causing that.
Matei
On Jun 1, 2011, at 12:42 PM, Shrinivas Joshi wrote:
> To find out whether it had any positive performance impact, I am trying with
> turning OFF speculative execution.
To find out whether it had any positive performance impact, I am trying with
turning OFF speculative execution. Surprisingly, the job starts to fail in
reduce phase with OOM errors when I disable speculative execution for both
map and reduce tasks. Has anybody noticed similar behavior? Is there a
On Mar 3, 2011, at 3:29 PM, Jacob R Rideout wrote:
> On Thu, Mar 3, 2011 at 2:04 PM, Keith Wiley wrote:
>> On Mar 3, 2011, at 2:51 AM, Steve Loughran wrote:
>>
>>> yes, but the problem is determining which one will fail. Ideally you should
>>> find the root cause, which is often some race condition or hardware fault.
On Thu, Mar 3, 2011 at 2:04 PM, Keith Wiley wrote:
> On Mar 3, 2011, at 2:51 AM, Steve Loughran wrote:
>
>> yes, but the problem is determining which one will fail. Ideally you should
>> find the root cause, which is often some race condition or hardware fault.
>> If it's the same server every time, turn it off.
On Mar 3, 2011, at 2:51 AM, Steve Loughran wrote:
> yes, but the problem is determining which one will fail. Ideally you should
> find the root cause, which is often some race condition or hardware fault.
> If it's the same server every time, turn it off.
> You can play with the specex parameters
On 02/03/11 21:01, Keith Wiley wrote:
I realize that the intended purpose of speculative execution is to overcome individual
slow tasks...and I have read that it explicitly is *not* intended to start copies of a
task simultaneously and to then race them, but rather to start copies of tasks
I realize that the intended purpose of speculative execution is to overcome
individual slow tasks...and I have read that it explicitly is *not* intended to
start copies of a task simultaneously and to then race them, but rather to
start copies of tasks that "seem slow" after running f
zip files. I have
> to
> > play games so that the names of the zip files don't collide - and I am
> not
> > sure if this is stable.
> >
> > What am I missing in my understanding?
> >
> > Thank you,
> > Mark
> >
>
> You should take a look at