Okay, so how can I make use of the -schema option with PigStorage?
Suppose my JSON schema is:
{
"name":"Student_Data",
"properties":
{
"id":
{
"type":"INTEGER",
"description":"Student id"
},
"name":
{
"type":"CHARARRAY",
"description":"Name of the student"
},
"marks":
{
"type":"INTEGER",
"description":"Marks of the student"
}
}
}
I tried to write the above schema using Pig data types. Can I use it, or is
there a different way to use the "-schema" option?
-schema: Reads/Stores the schema of the relation using a hidden
JSON file.
Or is there some other way to directly pass a schema defined in a separate
plain-text file and read it using PigStorage?
Thanks,
Praveenesh
On Mon, Feb 6, 2012 at 12:18 PM, Dmitriy Ryaboy <[email protected]> wrote:
> It's a json serialization of the Pig schema object, and isn't really meant
> to be created by hand.
> Patches to make it more human-friendly would be quite welcome.
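>
> A minimal sketch of the usual workaround, assuming a Pig version where
> PigStorage itself accepts the -schema option (the trunk behaviour mentioned
> further down the thread). The paths, delimiter, and field names are
> illustrative, taken from the example schema above. The idea is to declare
> the schema once in a LOAD and then STORE with -schema, so that Pig writes
> the hidden JSON schema file itself instead of it being written by hand:
>
>     -- declare the schema explicitly once (hypothetical input path)
>     students = LOAD 'students.tsv' USING PigStorage('\t')
>                AS (id:int, name:chararray, marks:int);
>
>     -- -schema makes Pig write the hidden schema file next to the data
>     STORE students INTO 'students_with_schema'
>           USING PigStorage('\t', '-schema');
>
>     -- later loads read the stored schema, so no AS clause is needed
>     reloaded = LOAD 'students_with_schema'
>                USING PigStorage('\t', '-schema');
>     DESCRIBE reloaded;
>
> This does not help when the schema only exists in some other format; that
> case still needs a loader that implements LoadMetadata.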
>
> D
>
> On Sun, Feb 5, 2012 at 10:35 PM, praveenesh kumar <[email protected]> wrote:
>
> > Thanks,
> > I was also looking at the -schema option in PigStorage.
> > But can anyone explain how we can define that JSON schema file?
> > A tutorial or small example would be very helpful.
> >
> > Praveenesh
> >
> > On Mon, Feb 6, 2012 at 11:55 AM, Dmitriy Ryaboy <[email protected]> wrote:
> >
> > > It's pretty straightforward; that's why the LoadMetadata interface exists.
> > > You just have to implement it and translate however you store the schema
> > > to a Pig Schema object.
> > >
> > > PigStorageSchema will read a JSON file that describes the schema; you can
> > > look at how that's done there (actually, PigStorage itself will do that in
> > > trunk).
> > >
> > > You can also check out what the Elephant-Bird library does for loading
> > > protocol buffers and thrift objects, where schema is derived from the
> > > object itself.
> > >
> > > -Dmitriy
> > >
> > > On Fri, Feb 3, 2012 at 4:35 AM, praveenesh kumar <[email protected]> wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I am new to Pig.
> > > > I was wondering whether it is possible to pass a schema to the Pig LOAD
> > > > statement when loading the data for the first time.
> > > >
> > > > Suppose I have a huge dataset containing around 100 columns. Is there a
> > > > way to pass a schema defined in some other file (some kind of meta file)
> > > > into the Pig LOAD statement, or do I have to define it every time inside
> > > > the LOAD statement?
> > > >
> > > > Thanks,
> > > > Praveenesh
> > > >
> > >
> >
>