Richard, I think you want to use UDFContext to carry the schema you get in checkSchema() from the client side over to the back end, where the writer is created. Here are the docs:
http://hadoop.apache.org/pig/docs/r0.7.0/udf.html#Passing+Configurations+to+UDFs
http://hadoop.apache.org/pig/docs/r0.7.0/api/org/apache/pig/impl/util/UDFContext.html

-D

On Tue, May 25, 2010 at 7:08 PM, Richard Park wrote:
> Hi,
> I’m porting our Load/Store funcs from pig 0.6 to 0.7. Currently we’re
> storing data in serialized binary JSON. The format requires that the
> metadata for the schema is stored in the header of the output file.
> Converting our LoadFunc was a fairly painless experience.
>
> However, I’ve hit a snag while doing the StoreFunc. We’re using a custom
> SequenceFileOutputFormat and at the invocation of SequenceFileOutputFormat
> getRecordWriter, we create a SequenceFile.Metadata with the schema and pass
> it to the SequenceFile.Writer constructor. Unfortunately with pig 0.7, the
> schema doesn’t seem to be available at the time the Writer is constructed.
> In 0.6, there was a MapRedUtil function that allowed us to get the schema
> through the StoreConfig, but that seems to have been removed.
>
> CheckSchema gets called on the client end, and StoreMetadata.storeSchema
> seems to be invoked late. How would I go about getting this schema data
> early (before the writer is created)? I suppose I could add the schema as a
> parameter in Configuration, but I’m not sure if the Configuration parameters
> will be properly propagated between the Load and Store func. I’ll play
> around with configs next.
>
> Any advice would be appreciated.
>
> Thanks,
> -Richard
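To make the round trip concrete, here is a minimal, self-contained model of the pattern, not Pig code. It assumes the documented flow: Pig hands the same signature to the front-end and back-end instances of your StoreFunc through setStoreFuncUDFContextSignature(), you stash the schema in the UDF properties during checkSchema(), Pig ships those properties with the job, and the back-end instance reads them before building the writer. FakeUdfContext, SchemaHeaderStorer, SCHEMA_KEY and roundTrip are hypothetical names. In a real StoreFunc you would instead call UDFContext.getUDFContext().getUDFProperties(getClass(), new String[] { signature }) on both sides, and on the back end hand the recovered schema to your OutputFormat (for example from getOutputFormat()) so getRecordWriter can put it in the SequenceFile.Metadata.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class UdfContextSketch {

    /** Hypothetical stand-in for Pig's UDFContext: one Properties per (UDF class, signature). */
    static final class FakeUdfContext {
        private final Map<String, Properties> props = new HashMap<>();

        Properties getUDFProperties(Class<?> udfClass, String signature) {
            return props.computeIfAbsent(udfClass.getName() + "/" + signature, k -> new Properties());
        }

        /** Front end: flatten all UDF properties into one string, as if stored in the job conf. */
        String serialize() {
            Properties flat = new Properties();
            props.forEach((scope, p) -> {
                for (String name : p.stringPropertyNames()) {
                    flat.setProperty(scope + "#" + name, p.getProperty(name));
                }
            });
            StringWriter out = new StringWriter();
            try {
                flat.store(out, null);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return out.toString();
        }

        /** Back end: rebuild the per-UDF Properties from the shipped string. */
        static FakeUdfContext deserialize(String serialized) {
            Properties flat = new Properties();
            try {
                flat.load(new StringReader(serialized));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            FakeUdfContext ctx = new FakeUdfContext();
            for (String key : flat.stringPropertyNames()) {
                int cut = key.lastIndexOf('#'); // property names below contain no '#'
                ctx.props.computeIfAbsent(key.substring(0, cut), k -> new Properties())
                        .setProperty(key.substring(cut + 1), flat.getProperty(key));
            }
            return ctx;
        }
    }

    /** Models only the StoreFunc hooks involved; this is not an org.apache.pig.StoreFunc. */
    static final class SchemaHeaderStorer {
        static final String SCHEMA_KEY = "header.schema"; // hypothetical property name
        private String signature;

        // Mirrors StoreFunc.setStoreFuncUDFContextSignature(String), called on both sides.
        void setStoreFuncUDFContextSignature(String signature) {
            this.signature = signature;
        }

        // Mirrors checkSchema(): runs on the client, so stash the schema for the back end.
        void checkSchema(FakeUdfContext ctx, String schema) {
            ctx.getUDFProperties(getClass(), signature).setProperty(SCHEMA_KEY, schema);
        }

        // Mirrors the back-end point where the writer and its header metadata are built.
        String schemaForHeader(FakeUdfContext ctx) {
            String schema = ctx.getUDFProperties(getClass(), signature).getProperty(SCHEMA_KEY);
            if (schema == null) {
                throw new IllegalStateException("no schema stored for signature " + signature);
            }
            return schema;
        }
    }

    /** Client instance stores the schema; a fresh back-end instance reads it back. */
    static String roundTrip(String signature, String schema) {
        FakeUdfContext frontEnd = new FakeUdfContext();
        SchemaHeaderStorer client = new SchemaHeaderStorer();
        client.setStoreFuncUDFContextSignature(signature);
        client.checkSchema(frontEnd, schema);
        String shippedWithJob = frontEnd.serialize();

        FakeUdfContext backEnd = FakeUdfContext.deserialize(shippedWithJob);
        SchemaHeaderStorer task = new SchemaHeaderStorer();
        task.setStoreFuncUDFContextSignature(signature);
        return task.schemaForHeader(backEnd);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("store_0", "name:chararray,age:int"));
    }
}
```

Run as-is, it prints name:chararray,age:int. The properties are keyed by signature as well as class because a single script can contain several STORE statements that use the same StoreFunc class, each with its own schema.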
