Did you mean "timeout" rather than "spill"? Spills don't cause task failures (unless the spill itself fails). The default task timeout is 10 minutes, which matches the ~10 minutes you describe. A stack trace would be very helpful here, at the very least.
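If it does turn out to be a timeout, one thing to try is raising the limit from inside the script. This is only a sketch: it assumes a Hadoop 1.x-era cluster where the property is named mapred.task.timeout (in milliseconds), and the 30-minute value is an arbitrary illustration, not a recommendation.

```pig
-- Sketch only: raise the per-task timeout for this script.
-- Assumption: the cluster uses the Hadoop 1.x property name mapred.task.timeout,
-- whose default is 600000 ms (10 minutes). Newer Hadoop releases name it
-- mapreduce.task.timeout instead.
-- 1800000 ms = 30 minutes, chosen purely for illustration.
SET mapred.task.timeout 1800000;
```

Keep in mind that a longer timeout only buys the slow reducer more time. If a single skewed key is behind it, the reducer will still be the bottleneck.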
On Fri, Jan 10, 2014 at 7:53 AM, Zebeljan, Nebojsa <[email protected]> wrote:
> Hi Serega,
> Default task attempts = 4
> --> Yes, 4 task attempts
>
> Do you use any "balancing" properties, for example
> pig.exec.reducers.bytes.per.reducer
> --> No
>
> I suppose you have unbalanced data
> --> I guess so
>
> It's better to provide logs
> --> Unfortunately not possible any more: "May be cleaned up by Task
> Tracker, if older logs"
>
> Regards,
> Nebo
> ________________________________________
> From: Serega Sheypak [[email protected]]
> Sent: Friday, January 10, 2014 2:32 PM
> To: [email protected]
> Subject: Re: Spilling issue - Optimize "GROUP BY"
>
> "and after trying it on several datanodes in the end it fails"
> Default task attempts = 4?
>
> 1. It's better to provide logs
> 2. Do you use any "balancing" properties, for example
> pig.exec.reducers.bytes.per.reducer ?
>
> I suppose you have unbalanced data
>
>
> 2014/1/10 Zebeljan, Nebojsa <[email protected]>
>
> > Hi,
> > I'm running into spilling issues with a "simple" Pig script. All map tasks
> > and reducers succeed pretty fast except the last reducer!
> > The last reducer always starts spilling after ~10 mins, and after trying it
> > on several datanodes in the end it fails.
> >
> > Do you have any idea how I could optimize the GROUP BY so I don't run
> > into spilling issues?
> >
> > Thanks in advance!
> >
> > Below the pig script:
> > ###
> > dataImport = LOAD <some data>;
> > generatedData = FOREACH dataImport GENERATE Field_A, Field_B, Field_C;
> > groupedData = GROUP generatedData BY (Field_B, Field_C);
> >
> > result = FOREACH groupedData {
> >     counter_1 = FILTER generatedData BY <some fields>;
> >     counter_2 = FILTER generatedData BY <some fields>;
> >     GENERATE
> >         group.Field_B,
> >         group.Field_C,
> >         COUNT(counter_1),
> >         COUNT(counter_2);
> > }
> >
> > STORE result INTO <some path> USING PigStorage();
> > ###
> >
> > Regards,
> > Nebo
