Pass the input through a logging UDF (a basic UDF that just echoes its input
to stderr)? Each task attempt's stderr log would then show the records that
attempt actually read.
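
A minimal sketch of that idea, written as a streaming filter rather than a
Java UDF so that it stays self-contained. It assumes records arrive as
tab-delimited text lines, which is how Pig's STREAM operator serializes tuples
by default; the function name echo_records and the INPUT log prefix are made
up for illustration.

```python
import sys


def echo_records(lines, out=sys.stdout, log=sys.stderr):
    """Copy each record to `log`, then emit it unchanged on `out`.

    Returns the number of records seen. The INPUT prefix is an arbitrary
    marker (an assumption) to make these lines easy to grep in a task log.
    """
    count = 0
    for line in lines:
        # Log before emitting, so the last logged record is the one being
        # handled if the task dies right afterwards.
        log.write("INPUT\t" + line.rstrip("\n") + "\n")
        log.flush()
        out.write(line)
        count += 1
    log.write("records seen: %d\n" % count)
    log.flush()
    return count


# To run as a filter, add this call at the bottom of the file:
#     echo_records(sys.stdin)
```

With the final call added, a script like this could be shipped with
DEFINE ... SHIP and applied with STREAM ... THROUGH just before the failing
step. If the script's stderr ends up in the task logs, the last INPUT line in
the log of a failed attempt such as attempt_201101201235_0064_m_000243_0 would
point at the record being handled when the heap ran out.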

On Thu, Jan 27, 2011 at 8:46 AM, Greg Langmead <[email protected]> wrote:

> Thank you, Dmitriy. I do see which relations my map-only job was working on,
> but how do I see which subset of the data a given piece of that Map job was
> working on, e.g. attempt_201101201235_0064_m_000243_0?
>
> If I save the input data by storing it before the Map job runs, I will still
> have the conundrum of identifying which subset of it went to piece 243,
> unless I'm misunderstanding.
>
> Greg
>
> On 1/26/11 6:23 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
>
> > Greg,
> > Pig 0.8 tells you which job is responsible for which set of operators; you
> > can save all the inputs to the map-only job by inserting intermediate
> > stores, and debug just the map-only job.
> >
> > D
> >
> > On Wed, Jan 26, 2011 at 2:49 PM, Greg Langmead <[email protected]> wrote:
> >
> >> Pig 0.8 executes my script by running six jobs. One of them is identified
> >> as "MAP_ONLY" and it always fails, with the innermost error I can find
> >> either saying "GC overhead limit exceeded" or "Java heap space". I suspect
> >> I have a piece that is too large. How can I get my hands on the actual data
> >> it was processing, so I can ascertain the cause? The task log says "Input
> >> records from tmp1872359169"; can I view that data?
> >>
> >> Thanks,
> >>
> >> Greg Langmead | Senior Research Scientist | SDL Language Weaver
> >> (t) +1 310 437 7300
