I've written a fair number of these; let me know if something is unclear.

D

On Mon, Sep 12, 2011 at 1:44 PM, Reza <[email protected]> wrote:
> Sorry, I didn't fully understand what you said. I think this will work now.
>
> thanks
>
>
> ________________________________
> From: Reza <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, September 12, 2011 4:31 PM
> Subject: Re: LoadFunc and schemas (pig 0.9)
>
> That would work, but it would overload the cluster since the tuples are
> roughly 1k of data each. I really need the ability to parse the data down
> to the defined schema...
>
>
> ________________________________
> From: Dmitriy Ryaboy <[email protected]>
> To: [email protected]; Reza <[email protected]>
> Sent: Monday, September 12, 2011 4:18 PM
> Subject: Re: LoadFunc and schemas (pig 0.9)
>
> Don't provide an AS clause. Instead, implement the LoadMetadata interface
> and return the appropriate schema in getSchema().
>
> D
>
> On Mon, Sep 12, 2011 at 12:44 PM, Reza <[email protected]> wrote:
>
> > Using pig 0.9. My data is very dynamic, so I use a custom LoadFunc to
> > parse it. The problem is that I can't figure out how to access the schema
> > that is defined in the load statement. I am forced to do something like
> > this:
> >
> > A = LOAD '/test/loadfiles/*' USING
> > com.custom.pig.LogStorage('(site:chararray,zone:chararray,pos:chararray)')
> > AS (site:chararray,zone:chararray,pos:chararray);
> >
> >
> > I have to define my schema twice, once for my custom loader and once for
> > pig. I can see that there is a LoadCaster interface, but it's not clear
> > to me how to use it in LoadFunc. All I need to do is get access to the
> > schema (the text after 'AS') inside of my LogStorage class. What's the
> > proper way to load custom (non-uniform) data into a schema?
> >
> > thanks
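
For the question quoted above, here is a minimal sketch of the schema-handling side of such a loader, assuming the schema is passed once as the constructor string from the LOAD statement. SchemaSpec, parse, and project are hypothetical names and are not part of Pig. The sketch only shows how a loader might keep one copy of the schema and parse each record down to it; in a real loader, the getSchema() method from LoadMetadata would return a schema built from the same string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Pig class): holds the schema string given to the loader. */
public class SchemaSpec {

    /** Parses "(name:type,name:type,...)" into an ordered name -> type map. */
    public static LinkedHashMap<String, String> parse(String spec) {
        String s = spec.trim();
        // Strip the surrounding parentheses if present.
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1);
        }
        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        if (s.trim().isEmpty()) {
            return fields;
        }
        for (String part : s.split(",")) {
            String[] nameAndType = part.trim().split(":", 2);
            if (nameAndType.length != 2 || nameAndType[0].trim().isEmpty()) {
                throw new IllegalArgumentException("Bad field: " + part);
            }
            fields.put(nameAndType[0].trim(), nameAndType[1].trim());
        }
        return fields;
    }

    /** Keeps only the declared fields, in declared order; missing fields become null. */
    public static List<String> project(Map<String, String> record,
                                       LinkedHashMap<String, String> fields) {
        List<String> out = new ArrayList<>();
        for (String name : fields.keySet()) {
            out.add(record.get(name));
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> fields =
            parse("(site:chararray,zone:chararray,pos:chararray)");
        // A made-up parsed log record with one undeclared field and one missing field.
        Map<String, String> record = new HashMap<>();
        record.put("site", "a.example");
        record.put("zone", "z1");
        record.put("extra", "ignored");
        System.out.println(project(record, fields)); // prints [a.example, z1, null]
    }
}
```

With getSchema() returning the schema, the LOAD statement should no longer need the AS clause, so the schema would be written only once, in the LogStorage constructor argument.
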
