Richard,
I think you want to use UDFContext to pass along the schema you get from
checkSchema. Here are the docs:

http://hadoop.apache.org/pig/docs/r0.7.0/udf.html#Passing+Configurations+to+UDFs
http://hadoop.apache.org/pig/docs/r0.7.0/api/org/apache/pig/impl/util/UDFContext.html
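
A rough sketch of the hand-off, in case it helps. It is not tested against a
real job. To keep it runnable without Pig on the classpath, the class below
uses a small stand-in map where a real StoreFunc would call
UDFContext.getUDFContext().getUDFProperties(getClass(), new String[] {signature}).
The class name SchemaHandoff, the key "example.store.schema", and the helper
method names are all made up for illustration. The schema is carried as a
plain string here; how you serialize your ResourceSchema is up to you.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative only: mimics the "write on the front end, read on the back end"
// pattern that UDFContext supports. Names here are hypothetical, not Pig API.
final class SchemaHandoff {
    // Hypothetical property key under which the schema string is stored.
    static final String SCHEMA_KEY = "example.store.schema";

    // Stand-in for UDFContext: one Properties object per (class, signature).
    private static final Map<String, Properties> CONTEXT = new HashMap<>();

    // Stand-in for UDFContext.getUDFContext().getUDFProperties(cls, new String[] {signature}).
    static Properties udfProperties(Class<?> udfClass, String signature) {
        String key = udfClass.getName() + "#" + signature;
        return CONTEXT.computeIfAbsent(key, k -> new Properties());
    }

    // Front end: roughly what checkSchema(ResourceSchema) would do with the
    // signature received in setStoreFuncUDFContextSignature(String).
    static void rememberSchema(String signature, String schemaAsString) {
        udfProperties(SchemaHandoff.class, signature).setProperty(SCHEMA_KEY, schemaAsString);
    }

    // Back end: roughly what the store side would do just before building the
    // SequenceFile.Metadata. Returns null if nothing was stored for this signature.
    static String recallSchema(String signature) {
        return udfProperties(SchemaHandoff.class, signature).getProperty(SCHEMA_KEY);
    }

    public static void main(String[] args) {
        rememberSchema("store-1", "name:chararray,age:int");
        System.out.println(recallSchema("store-1"));
    }
}
```

Keying the properties by signature matters if the same StoreFunc class is used
for more than one store in a script; otherwise the instances would overwrite
each other's schema. This assumes the UDFContext has already been restored on
the back end by the time your getRecordWriter runs, which is worth verifying.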

-D

On Tue, May 25, 2010 at 7:08 PM, Richard Park <[email protected]> wrote:

> Hi,
> I’m porting our Load/Store funcs from Pig 0.6 to 0.7. Currently we’re
> storing data in serialized binary JSON. The format requires that the
> metadata for the schema be stored in the header of the output file.
> Converting our LoadFunc was a fairly painless experience.
>
> However, I’ve hit a snag while doing the StoreFunc. We’re using a custom
> SequenceFileOutputFormat, and when SequenceFileOutputFormat.getRecordWriter
> is invoked, we create a SequenceFile.Metadata with the schema and pass
> it to the SequenceFile.Writer constructor. Unfortunately, with Pig 0.7 the
> schema doesn’t seem to be available at the time the Writer is constructed.
> In 0.6, there was a MapRedUtil function that allowed us to get the schema
> through the StoreConfig, but that seems to have been removed.
>
> checkSchema gets called on the client end, and StoreMetadata.storeSchema
> seems to be invoked late. How would I go about getting this schema data
> early (before the writer is created)? I suppose I could add the schema as a
> parameter in Configuration, but I’m not sure whether Configuration
> parameters will be properly propagated between the LoadFunc and the
> StoreFunc. I’ll play around with configs next.
>
> Any advice would be appreciated.
>
> Thanks,
> -Richard
>
>
