So I don't get null when I read the schema in the checkSchema method. I set the class's internal schema variable, as in your gist, and it's not null in the very next call to 'setStoreLocation'. However, it is null on all later calls to 'setStoreLocation' and on every call to putNext. I'm not entirely sure at what point it gets lost.

Now, if this were vanilla map-reduce, I'd say that checkSchema is called once during the initial map-reduce job setup phase, and that anything you do in there is not going to be accessible to your later tasks, which run on many different machines in the cluster. You could set the schema in checkSchema and then, on the FIRST call to setStoreLocation, place the schema in the job's configuration as a string.

What I'm not sure about is exactly how many times setStoreLocation is actually called. I suspect (any Pig devs wanna help me out here?) that it's called exactly once per task (i.e. during the call to 'setup()' in vanilla map-reduce land). If that's true, then all you'd have to do is set it the first time and read it back on all subsequent calls to setStoreLocation.

Could try it out at least...

--jacob
@thedatachef

On Tue, 2011-02-01 at 15:23 +0000, Dan Harvey wrote:
> This is the same problem I was getting. I've put a snippet of the code
> I was using here: https://gist.github.com/804551
>
> With this I get null whenever I try to read the ResourceSchema object
> in the checkSchema() method.
>
> I've had a look over the AvroStorage and it seems to assume the
> ResourceSchema won't be null at this point in time so I'm not sure
> what's going on for me.
> Does anyone know if this is the best way to get the schema, or if pig
> will ever send a null schema to the checkSchema method?
>
> Thanks,
>
> On 1 February 2011 04:46, Jacob Perkins <[email protected]> wrote:
> >
> > Trying to write a simple storefunc that makes use of the input data's
> > field names. Is there a way to gain access to this inside of the call to
> > putNext? Ostensibly you could set a variable with the schema during the
> > call to checkSchema, e.g. in HBaseStorage, but as far as I can tell this
> > is null by the time putNext is called. Is there some other way or am I
> > missing something obvious?
> >
> > --jacob
> > @thedatachef
> >
>
> --
> Dan Harvey | Datamining Engineer
> www.mendeley.com/profiles/dan-harvey
>
> Mendeley Limited | London, UK | www.mendeley.com
> Registered in England and Wales | Company Number 6419015
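
For anyone who wants to try the idea from the reply above, here is a rough, self-contained sketch of the handoff. It is not the real Pig StoreFunc API: SketchStorer, SCHEMA_KEY, and the use of java.util.Properties as a stand-in for the job's configuration are all hypothetical, and it assumes (as the reply does, without confirmation) that whatever is written to the configuration on the front end is visible when setStoreLocation runs in a task. The "task" here is just a fresh instance whose field was never set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a store function; NOT the real Pig StoreFunc API.
class SketchStorer {
    // Hypothetical key under which the schema string is stashed.
    static final String SCHEMA_KEY = "sketch.storer.schema";

    // Only populated on the instance that saw checkSchema (the front end).
    private List<String> fieldNames;

    // Front end: called once during job setup.
    void checkSchema(List<String> names) {
        this.fieldNames = names;
    }

    // Called on the front end and again in each task.
    // 'conf' stands in for the job's configuration (an assumption).
    void setStoreLocation(Properties conf) {
        if (fieldNames != null) {
            // We still hold the schema: stash it as a string.
            conf.setProperty(SCHEMA_KEY, String.join(",", fieldNames));
        } else if (conf.getProperty(SCHEMA_KEY) != null) {
            // Fresh instance in a task: read the schema back.
            fieldNames = Arrays.asList(conf.getProperty(SCHEMA_KEY).split(","));
        }
    }

    // Uses the field names while "storing" a tuple; returns the rendered record.
    String putNext(List<Object> tuple) {
        if (fieldNames == null) {
            throw new IllegalStateException("schema was never handed to this instance");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append(fieldNames.get(i)).append("=").append(tuple.get(i));
        }
        return sb.toString();
    }
}

class SchemaHandoffSketch {
    static String demo() {
        Properties jobConf = new Properties();

        // Front end: checkSchema, then the first setStoreLocation call.
        SketchStorer frontEnd = new SketchStorer();
        frontEnd.checkSchema(Arrays.asList("id", "name"));
        frontEnd.setStoreLocation(jobConf);

        // Task side: a brand-new instance that never saw checkSchema.
        SketchStorer task = new SketchStorer();
        task.setStoreLocation(jobConf);
        return task.putNext(Arrays.<Object>asList(1, "pig"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints id=1,name=pig
    }
}
```

In a real cluster the configuration is serialized and shipped to the tasks rather than shared in memory as it is here, so this only shows the shape of the stash-and-read-back step. Whether Pig actually propagates it this way would need to be checked.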
