I've written a fair number of these; let me know if anything is unclear.
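
A minimal sketch, with no Pig dependency, of the idea from the quoted thread: take the schema string once (the way the LogStorage constructor argument does) and use it to cut each parsed record down to the declared fields. SchemaSpec and project are hypothetical names, and the Map-based record is an assumption about what the custom parser produces. In a real loader, getSchema() from LoadMetadata would report these same fields to Pig, so the AS clause is not needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: holds the fields declared in a schema string such as
// "(site:chararray,zone:chararray,pos:chararray)".
public class SchemaSpec {
    // Field name -> type name, kept in declaration order.
    private final Map<String, String> fields = new LinkedHashMap<>();

    public SchemaSpec(String spec) {
        String s = spec.trim();
        // Strip the optional surrounding parentheses.
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1);
        }
        for (String part : s.split(",")) {
            String[] nameAndType = part.trim().split(":", 2);
            if (nameAndType.length != 2 || nameAndType[0].trim().isEmpty()) {
                throw new IllegalArgumentException("Bad field spec: '" + part + "'");
            }
            fields.put(nameAndType[0].trim(), nameAndType[1].trim());
        }
    }

    public List<String> fieldNames() {
        return new ArrayList<>(fields.keySet());
    }

    public String typeOf(String name) {
        return fields.get(name);
    }

    // Keep only the declared fields, in schema order.
    // A field missing from the record becomes null.
    public List<String> project(Map<String, String> record) {
        List<String> out = new ArrayList<>();
        for (String name : fields.keySet()) {
            out.add(record.get(name));
        }
        return out;
    }
}
```

With the schema held in one place like this, the loader can drop undeclared fields before building each tuple, so the roughly 1k records never reach the cluster in full.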

D

On Mon, Sep 12, 2011 at 1:44 PM, Reza <[email protected]> wrote:

> Sorry, I didn't fully understand what you said. I think this will work now.
>
> thanks
>
>
> ________________________________
> From: Reza <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, September 12, 2011 4:31 PM
> Subject: Re: LoadFunc and schemas (pig 0.9)
>
> That would work, but it would overload the cluster since the tuples are
> roughly 1k of data each. I really need the ability to parse the data down
> to the defined schema...
>
>
> ________________________________
> From: Dmitriy Ryaboy <[email protected]>
> To: [email protected]; Reza <[email protected]>
> Sent: Monday, September 12, 2011 4:18 PM
> Subject: Re: LoadFunc and schemas (pig 0.9)
>
> Don't provide an AS clause. Instead, implement the LoadMetadata interface
> and return the appropriate schema in getSchema().
>
> D
>
> On Mon, Sep 12, 2011 at 12:44 PM, Reza <[email protected]> wrote:
>
> > Using Pig 0.9. My data is very dynamic, so I use a custom LoadFunc to
> > parse it. The problem is that I can't figure out how to access the schema
> > that is defined in the load statement. I am forced to do something like
> > this:
> >
> > A = LOAD '/test/loadfiles/*' USING
> > com.custom.pig.LogStorage('(site:chararray,zone:chararray,pos:chararray)')
> > AS (site:chararray,zone:chararray,pos:chararray);
> >
> >
> > I have to define my schema twice, once for my custom loader and once for
> > Pig. I can see that there is a LoadCaster interface, but it's not clear to
> > me how to use it in LoadFunc. All I need to do is get access to the schema
> > (the text after 'AS') inside of my LogStorage class. What's the proper way
> > to load custom (non-uniform) data into a schema?
> >
> > thanks
>
