Hi,
I’m porting our LoadFunc and StoreFunc from Pig 0.6 to 0.7. We currently store
data as serialized binary JSON, and the format requires the schema metadata to
be stored in the header of the output file. Converting our LoadFunc was fairly
painless.

However, I’ve hit a snag with the StoreFunc. We use a custom
SequenceFileOutputFormat: when its getRecordWriter is invoked, we create a
SequenceFile.Metadata containing the schema and pass it to the
SequenceFile.Writer constructor. Unfortunately, with Pig 0.7 the schema doesn’t
seem to be available when the Writer is constructed. In 0.6, a MapRedUtil
function let us get the schema through the StoreConfig, but that function seems
to have been removed.
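To make the ordering constraint concrete, here is a minimal plain-Java sketch that assumes nothing from Pig or Hadoop. `HeaderFirstWriter` is a hypothetical stand-in for SequenceFile.Writer, and its `#key=value` text header is illustrative only (the real SequenceFile header is binary). The point it shows: the metadata map is consumed in the constructor, so the schema has to be known before the writer exists.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for SequenceFile.Writer: metadata is fixed at
// construction time and written as a header before any record.
class HeaderFirstWriter implements AutoCloseable {
    private final Writer out;

    HeaderFirstWriter(OutputStream stream, Map<String, String> metadata) throws IOException {
        this.out = new OutputStreamWriter(stream, StandardCharsets.UTF_8);
        // Header: one "#key=value" line per entry, in sorted key order.
        // Nothing can be added to the header after this point.
        for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
            out.write("#" + e.getKey() + "=" + e.getValue() + "\n");
        }
    }

    void append(String record) throws IOException {
        out.write(record + "\n");
    }

    @Override
    public void close() throws IOException {
        out.close(); // flushes buffered header and records
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Map<String, String> meta = new TreeMap<>();
        meta.put("schema", "{\"fields\":[\"name\",\"count\"]}");
        try (HeaderFirstWriter w = new HeaderFirstWriter(buf, meta)) {
            w.append("alice\t3");
        }
        System.out.print(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

A schema that only arrives later (for example via a storeSchema-style callback) has nowhere to go in this design, which is the situation described above.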

checkSchema gets called on the client side, and StoreMetadata.storeSchema seems
to be invoked too late. How can I get the schema earlier, before the writer is
created? I suppose I could add the schema as a parameter in the Configuration,
but I’m not sure whether Configuration parameters are propagated properly
between the LoadFunc and the StoreFunc. I’ll experiment with the configuration
next.
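The Configuration idea can be sketched in plain Java, under clearly stated assumptions. `java.util.Properties` stands in for the Hadoop job Configuration, and `example.store.schema` is a hypothetical key, not a Pig setting. The sketch records the schema on the client, simulates shipping the configuration to the task by serializing and re-parsing it, and reads the schema back before the writer would be built. Whether Pig 0.7 actually carries a value set during checkSchema through to the task is the open question here; Pig's `UDFContext` is another mechanism worth checking in your version's API docs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical sketch: Properties stands in for the Hadoop job Configuration.
class SchemaViaConf {
    // Illustrative key name, not a real Pig or Hadoop property.
    static final String SCHEMA_KEY = "example.store.schema";

    // Client side (e.g. where checkSchema runs): record the schema.
    static void recordSchema(Properties conf, String schemaJson) {
        conf.setProperty(SCHEMA_KEY, schemaJson);
    }

    // Simulates shipping the configuration to a task: serialize, then parse.
    static Properties shipToTask(Properties clientConf) throws IOException {
        StringWriter wire = new StringWriter();
        clientConf.store(wire, null);
        Properties taskConf = new Properties();
        taskConf.load(new StringReader(wire.toString()));
        return taskConf;
    }

    // Task side (e.g. inside getRecordWriter): fetch the schema before the
    // writer and its header metadata are created; fail loudly if absent.
    static String schemaForWriter(Properties taskConf) {
        String schema = taskConf.getProperty(SCHEMA_KEY);
        if (schema == null) {
            throw new IllegalStateException("schema missing from configuration: " + SCHEMA_KEY);
        }
        return schema;
    }

    public static void main(String[] args) throws IOException {
        Properties clientConf = new Properties();
        recordSchema(clientConf, "{\"fields\":[\"name\",\"count\"]}");
        Properties taskConf = shipToTask(clientConf);
        System.out.println(schemaForWriter(taskConf));
    }
}
```

Failing with an exception when the key is missing, rather than writing a file without a header, makes a propagation problem visible on the first task instead of producing unreadable output.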

Any advice would be appreciated.

Thanks,
-Richard