Sorry, I didn't fully understand what you said. I think this will work now.

thanks


________________________________
From: Reza <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, September 12, 2011 4:31 PM
Subject: Re: LoadFunc and schemas (pig 0.9)

That would work, but it would overload the cluster since the tuples are roughly
1 KB of data each. I really need the ability to parse the data down to the
defined schema...


________________________________
From: Dmitriy Ryaboy <[email protected]>
To: [email protected]; Reza <[email protected]>
Sent: Monday, September 12, 2011 4:18 PM
Subject: Re: LoadFunc and schemas (pig 0.9)

Don't provide an AS clause. Instead, implement the LoadMetadata interface
and return the appropriate schema in getSchema().
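
A rough sketch of the idea, assuming the loader keeps taking the schema string
as its constructor argument. SchemaSpec is a hypothetical helper, not a Pig
class. It parses the string once, so the same field list can drive both the
record parsing and whatever getSchema() returns (in Pig that return value is a
ResourceSchema; building it is left out here). It only handles flat schemas,
not nested tuples or bags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pig): parses a flat schema string such as
// "(site:chararray,zone:chararray,pos:chararray)" into field names and types.
final class SchemaSpec {
    final List<String> names = new ArrayList<String>();
    final List<String> types = new ArrayList<String>();

    static SchemaSpec parse(String spec) {
        String s = spec.trim();
        // Strip the optional surrounding parentheses.
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1).trim();
        }
        SchemaSpec out = new SchemaSpec();
        if (s.isEmpty()) {
            return out; // no fields declared
        }
        for (String field : s.split(",")) {
            String[] parts = field.trim().split(":", 2);
            String name = parts[0].trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Empty field name in: " + spec);
            }
            out.names.add(name);
            // Pig treats a field with no declared type as bytearray.
            out.types.add(parts.length == 2 ? parts[1].trim() : "bytearray");
        }
        return out;
    }
}
```

The loader's constructor would call SchemaSpec.parse(...) on its argument, and
the LOAD statement would then drop the AS clause so the schema is written only
once.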

D

On Mon, Sep 12, 2011 at 12:44 PM, Reza <[email protected]> wrote:

> Using Pig 0.9. My data is very dynamic, so I use a custom LoadFunc to parse
> it. The problem is that I can't figure out how to access the schema that is
> defined in the load statement. I am forced to do something like this:
>
> A = LOAD '/test/loadfiles/*' USING
> com.custom.pig.LogStorage('(site:chararray,zone:chararray,pos:chararray)')
> AS (site:chararray,zone:chararray,pos:chararray);
>
>
> I have to define my schema twice, once for my custom loader and once for
> Pig. I can see that there is a LoadCaster interface, but it's not clear to me
> how to use it in LoadFunc. All I need to do is get access to the schema (the
> text after 'AS') inside of my LogStorage class. What's the proper way to load
> custom (non-uniform) data into a schema?
>
> thanks
