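How about passing the input through a logging UDF (a basic UDF that just echoes its input to stderr)? The stderr log of each task attempt, e.g. attempt_201101201235_0064_m_000243_0, would then show the records that attempt received.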
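
A minimal sketch of such a UDF, written as a Python (Jython) UDF. The function name log_and_echo, the "INPUT:" prefix, and the chararray schema are only illustrative assumptions, and the fallback decorator is there so the file can also be run outside Pig:

```python
import sys

# Assumption: Pig provides `outputSchema` when the script is registered
# with Jython. Fall back to a no-op decorator so the file also runs standalone.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("record:chararray")
def log_and_echo(record):
    """Write the record to stderr, then return it unchanged."""
    # stderr output ends up in the stderr log of the task attempt
    # that processed this record.
    sys.stderr.write("INPUT: %r\n" % (record,))
    return record
```

Assuming the file is saved as log_udf.py (a hypothetical name), something along the lines of REGISTER 'log_udf.py' USING jython AS dbg; followed by FOREACH data GENERATE dbg.log_and_echo(line); should route every record through it. Expect the task logs to get large, so it is probably worth running this only while debugging.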
On Thu, Jan 27, 2011 at 8:46 AM, Greg Langmead <[email protected]> wrote:

> Thank you, Dmitriy. I do see which relations my map-only job was working on,
> but how do I see which subset of the data a given piece of that Map job was
> working on, e.g. attempt_201101201235_0064_m_000243_0
>
> If I save the input data by storing it before the Map job runs, I will still
> have the conundrum of identifying which subset of it went to piece 243,
> unless I'm misunderstanding.
>
> Greg
>
> On 1/26/11 6:23 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
>
> > Greg,
> > Pig 8 tells you which job is responsible for which set of operators; you can
> > save all the inputs to the map only job by inserting intermediate stores,
> > and debug just the map-only job.
> >
> > D
> >
> > On Wed, Jan 26, 2011 at 2:49 PM, Greg Langmead <[email protected]> wrote:
> >
> >> Pig 0.8 executes my script by running six jobs. One of them is identified as
> >> "MAP_ONLY" and it always fails, with the innermost error I can find either
> >> saying "GC overhead limit exceeded" or "Java heap space". I suspect I have a
> >> piece that is too large. How can I get my hands on the actual data it was
> >> processing, so I can ascertain the cause? The task log says "Input records
> >> from tmp1872359169" can I view that data?
> >>
> >> Thanks,
> >>
> >> Greg Langmead | Senior Research Scientist | SDL Language Weaver | (t) +1 310 437 7300
