it reads the schema file *it creates*. So, you process some data, store
it, then read it back later, and the schema is back.
Like I said, the JSON is not very human-readable: the types are integers
rather than words like "chararray", etc.
Try saving something and check out the .pig_schema file to see an example.
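The Python sketch below shows what the integer type codes mean by turning a .pig_schema file into a readable schema string. It rests on a few assumptions:

- The file is the JSON form of Pig's ResourceSchema.
- It has a top-level "fields" list, and each entry carries "name", an integer "type", and an optional nested "schema" (used for tuples and bags).
- The code-to-name table follows the constants in org.apache.pig.data.DataType.

The helper names are hypothetical and not part of Pig. Check the mapping against your Pig version.

```python
import json

# Integer type codes as defined in org.apache.pig.data.DataType
# (assumed mapping; verify against your Pig version).
PIG_TYPE_NAMES = {
    1: "null",
    5: "boolean",
    6: "byte",
    10: "int",
    15: "long",
    20: "float",
    25: "double",
    50: "bytearray",
    55: "chararray",
    100: "map",
    110: "tuple",
    120: "bag",
}


def describe_fields(fields):
    """Render serialized field dicts as a comma-separated 'name:type' string.

    Hypothetical helper: assumes each field is a dict with "name", "type" and
    an optional nested "schema" dict holding its own "fields" list.
    """
    parts = []
    for field in fields:
        code = field["type"]
        type_name = PIG_TYPE_NAMES.get(code, "unknown(%d)" % code)
        nested = field.get("schema")
        # Tuples and bags carry an inner schema; render it recursively.
        if nested and nested.get("fields"):
            inner = describe_fields(nested["fields"])
            if type_name == "bag":
                type_name = "bag{%s}" % inner
            elif type_name == "tuple":
                type_name = "tuple(%s)" % inner
        name = field.get("name") or "?"
        parts.append("%s:%s" % (name, type_name))
    return ", ".join(parts)


def describe_pig_schema(path):
    """Read a local copy of a .pig_schema file and return '(name:type, ...)'."""
    with open(path) as f:
        schema = json.load(f)
    return "(" + describe_fields(schema["fields"]) + ")"
```

For example, copy the output directory to local disk first (for instance with hadoop fs -get). Calling describe_pig_schema on the .pig_schema file inside it should then return something like (id:int, name:chararray, marks:int) for a relation with those three fields.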

D

On Sun, Feb 5, 2012 at 10:59 PM, praveenesh kumar <[email protected]> wrote:

> Okay, so how can I make use of the -schema option with PigStorage?
>
> Suppose my JSON schema is:
>
> {
>        "name":"Student_Data",
>        "properties":
>        {
>                "id":
>                {
>                        "type":"INTEGER",
>                        "description":"Student id"
>                },
>                "name":
>                {
>                        "type":"CHARARRAY",
>                        "description":"Name of the student"
>                },
>                "marks":
>                {
>                        "type":"INTEGER",
>                        "description":"Marks of the student"
>                }
>        }
> }
>
> I tried to write the above schema using Pig data types. Can I use it, or is
> there a different way to use the "-schema" option?
> -schema: Reads/stores the schema of the relation using a hidden JSON file.
>
> Or is there some other way to pass a schema defined in a separate plain-text
> file directly and read it using PigStorage?
>
> Thanks,
> Praveenesh
>
>
> On Mon, Feb 6, 2012 at 12:18 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
> > It's a json serialization of the Pig schema object, and isn't really meant
> > to be created by hand.
> > Patches to make it more human-friendly would be quite welcome.
> >
> > D
> >
> > On Sun, Feb 5, 2012 at 10:35 PM, praveenesh kumar <[email protected]> wrote:
> >
> > > Thanks,
> > > I was also looking for the -schema option in PigStorage.
> > > But can anyone explain how to define that JSON schema file?
> > > A tutorial or small example would be very helpful.
> > >
> > > Praveenesh
> > >
> > > On Mon, Feb 6, 2012 at 11:55 AM, Dmitriy Ryaboy <[email protected]> wrote:
> > >
> > > > It's pretty straightforward; that's why the LoadMetadata interface exists.
> > > > You just have to implement it and translate however you store the schema
> > > > into a Pig Schema object.
> > > >
> > > > PigStorageSchema will read a JSON file that describes the schema; you can
> > > > look at how that's done there (actually, PigStorage itself will do that in
> > > > trunk).
> > > >
> > > > You can also check out what the Elephant-Bird library does for loading
> > > > protocol buffers and Thrift objects, where the schema is derived from the
> > > > object itself.
> > > >
> > > > -Dmitriy
> > > >
> > > > On Fri, Feb 3, 2012 at 4:35 AM, praveenesh kumar <[email protected]> wrote:
> > > >
> > > > > Hey guys,
> > > > >
> > > > > I am new to Pig.
> > > > > I was wondering whether it is possible to pass a schema in the LOAD
> > > > > statement when loading data for the first time.
> > > > >
> > > > > Suppose I have a huge dataset containing around 100 columns. Is there a
> > > > > way to pass a schema defined in some other file (some kind of metadata
> > > > > file) into the LOAD statement, or do I have to define it inside the LOAD
> > > > > statement every time?
> > > > >
> > > > > Thanks,
> > > > > Praveenesh
> > > > >
> > > >
> > >
> >
>
