Ah, I see. The null I was getting happened whilst the map/reduce tasks were running, because checkSchema is never called there.
I'll have a go at serialising the schema and sending it through the config, which should be fine.

Thanks,

On 1 February 2011 16:28, jacob wrote:
> Thanks, that's ultimately what I went with. (I saw how it was done in the
> AvroStorage class.) I thought there might be a cleaner/simpler/better way
> I was missing.
>
> --jacob
> @thedatachef
>
> On Tue, 2011-02-01 at 21:22 +0530, Harsh J wrote:
>> I remember facing this problem when trying to implement a Load/Store
>> quite a while ago.
>>
>> The issue (not really an issue, I guess) is that checkSchema is a
>> front-end method. It is used, perhaps multiple times, in Pig's
>> front-end code. It isn't called by the back-end code of Pig that
>> runs on a given platform (Local or Hadoop).
>>
>> To persist your schema, ensure you put it onto the 'JobConf' (in loose
>> terms). Pig lets you do this by using the UDFContext class for UDFs.
>> Get a UDFContext for your UDF, then set a property in it with a key
>> signifying your schema/other data and the value. Similarly, retrieve
>> it in the other methods wherever you need it
>> (getOutputFormat, putNext, etc.).
>>
>> On Tue, Feb 1, 2011 at 10:16 AM, Jacob Perkins wrote:
>> > Trying to write a simple storefunc that makes use of the input data's
>> > field names. Is there a way to gain access to these inside the call to
>> > putNext? Ostensibly you could set a variable with the schema during the
>> > call to checkSchema, e.g. in HBaseStorage, but as far as I can tell this
>> > is null by the time putNext is called. Is there some other way, or am I
>> > missing something obvious?
>> >
>> > --jacob
>> > @thedatachef

--
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey
Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
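The approach discussed in this thread (store the schema in the job configuration on the front end in checkSchema, then read it back on the back end in putNext) can be sketched in plain Java. This is a simulation under stated assumptions, not Pig code: `FakeUdfContext` is a hypothetical in-memory stand-in for Pig's `UDFContext` (in Pig the properties travel to the tasks inside the job configuration), `FieldNameStore` is a hypothetical store function, and the schema is reduced to an array of field names serialised as a comma-separated string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for Pig's UDFContext: one Properties object per
// store-function signature. In Pig these properties are shipped to the
// back end with the job; here a static map simply plays that role.
final class FakeUdfContext {
    private static final Map<String, Properties> PROPS = new HashMap<>();

    static Properties propertiesFor(String signature) {
        return PROPS.computeIfAbsent(signature, k -> new Properties());
    }
}

// Hypothetical store function. The front end and the back end use
// different instances, so an instance field set in checkSchema would
// be null in putNext; the config-backed property survives instead.
final class FieldNameStore {
    private static final String SCHEMA_KEY = "fieldnamestore.schema";
    private final String signature;
    private String[] fieldNames; // filled lazily on the back end

    FieldNameStore(String signature) {
        this.signature = signature;
    }

    // Front end (analogous to checkSchema): serialise the field names
    // into the shared properties. Assumes names contain no commas.
    void checkSchema(String[] names) {
        FakeUdfContext.propertiesFor(signature)
                .setProperty(SCHEMA_KEY, String.join(",", names));
    }

    // Back end (analogous to putNext): deserialise the field names on
    // first use, then render the tuple as name=value pairs.
    String putNext(Object[] tuple) {
        if (fieldNames == null) {
            String serialised = FakeUdfContext.propertiesFor(signature)
                    .getProperty(SCHEMA_KEY);
            if (serialised == null) {
                throw new IllegalStateException("schema was never stored");
            }
            fieldNames = serialised.split(",", -1);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tuple.length; i++) {
            if (i > 0) {
                out.append('\t');
            }
            out.append(fieldNames[i]).append('=').append(tuple[i]);
        }
        return out.toString();
    }
}
```

Usage: one instance calls `checkSchema(new String[] {"user", "count"})`, and a separate instance created with the same signature then calls `putNext(new Object[] {"jacob", 3})`, producing `user=jacob` and `count=3` separated by a tab. Keying the properties by signature mirrors Harsh's advice to use a key specific to your UDF, so that two stores in the same script do not overwrite each other's schema.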
