Daniel,

Awesome, thank you.  I will try that out.

--Eric


On Wed, Jan 5, 2011 at 1:14 AM, Daniel Dai <[email protected]> wrote:
> You are right. setLocation is called in the frontend; however, that call
> happens in the context of InputFormat.getSplits(), which is too late to save
> anything in UDFContext. Your best bet is relativeToAbsolutePath, which is
> called in the frontend, and there you can save your stuff in UDFContext.
>
> Daniel
>
> -----Original Message----- From: Eric Tschetter
> Sent: Tuesday, January 04, 2011 11:52 AM
> To: [email protected]
> Subject: UDFContext in 0.8 LoadFunc?
>
> I have a custom LoadFunc (I'm actually just extending PigStorage) that
> has some added logic to spider a given path and pick out the paths
> that I want.  I am currently doing the spidering in setLocation
> because that seemed like the place to do it.  It appears that setLocation
> is called on both the client and the cluster side, though, so my
> mappers are spidering a path that was already spidered on the client
> (wasted effort).  When the spidering covers a lot of directories,
> this adds a significant amount of unneeded overhead to my jobs.
>
> I looked into using UDFContext to save the paths and then having
> the cluster-side processes look up the paths in UDFContext and
> just use them if they exist.  But it looks like the actual job
> configuration object is being created before the calls to
> setLocation(), so the stuff that I set in UDFContext is not making it
> across the wire.
>
> Is there a method that is called before setLocation that I can use to
> set the value in UDFContext?  (I'd prefer something that is given a
> Context/Configuration object.)  Or is my only option to build a
> Configuration object in the constructor, do the crawl, and set the
> UDFContext there (will that even work)?
>
> --Eric
>
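
The approach discussed above (crawl once on the frontend, save the result, and reuse it on the cluster side) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the class SpideredPaths, its methods, and the property key are hypothetical names, and a plain java.util.Properties stands in for the Properties object that UDFContext.getUDFContext().getUDFProperties(...) would supply inside a real LoadFunc. It also assumes that no path contains a comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of Pig) for caching spidered paths in a Properties object. */
final class SpideredPaths {
    // Hypothetical property key; any name unique to the loader would do.
    static final String KEY = "spidered.paths";

    private SpideredPaths() {}

    /** Frontend side: store the crawl result as one comma-separated value. */
    static void save(Properties props, List<String> paths) {
        // Assumption: no individual path contains a comma.
        props.setProperty(KEY, String.join(",", paths));
    }

    /** Returns the cached paths, or null when nothing was saved and a crawl is still needed. */
    static List<String> load(Properties props) {
        String joined = props.getProperty(KEY);
        if (joined == null) {
            return null;
        }
        if (joined.isEmpty()) {
            return Collections.emptyList();
        }
        return new ArrayList<>(Arrays.asList(joined.split(",")));
    }
}
```

In a loader, save(...) would presumably be called from relativeToAbsolutePath after the crawl, and setLocation would call load(...) first and only fall back to crawling when it returns null.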
