The integer values for types come from org.apache.pig.data.DataType
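
For reference, here is a minimal sketch of turning those codes back into type names. The lookup table is copied by hand from the constants in org.apache.pig.data.DataType as I understand them, so please verify it against the DataType source for your Pig version. The helper name decode_schema is made up for this example, and only the common simple and complex types are listed.

```python
import json

# Type codes hand-copied from org.apache.pig.data.DataType.
# Assumption: these match your Pig version; check the class to be sure.
PIG_TYPE_NAMES = {
    5: "boolean",
    10: "int",
    15: "long",
    20: "float",
    25: "double",
    50: "bytearray",
    55: "chararray",
    100: "map",
    110: "tuple",
    120: "bag",
}


def decode_schema(schema_json):
    """Return (field name, type name) pairs for a .pig_schema JSON string."""
    schema = json.loads(schema_json)
    return [
        (field["name"],
         PIG_TYPE_NAMES.get(field["type"], "unknown(%d)" % field["type"]))
        for field in schema["fields"]
    ]


# The sample .pig_schema content from the message quoted below.
SAMPLE = """
{
  "fields": [
    {"name": "name", "type": 55,
     "description": "autogenerated from Pig Field Schema", "schema": null},
    {"name": "age", "type": 10,
     "description": "autogenerated from Pig Field Schema", "schema": null},
    {"name": "gpa", "type": 20,
     "description": "autogenerated from Pig Field Schema", "schema": null}
  ],
  "version": 0,
  "sortKeys": [],
  "sortKeyOrders": []
}
"""

if __name__ == "__main__":
    for name, type_name in decode_schema(SAMPLE):
        print("%s: %s" % (name, type_name))
```

With that table, the sample decodes to name as chararray, age as int, and gpa as float.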
On Mon, Feb 6, 2012 at 1:17 AM, praveenesh kumar <[email protected]> wrote:
> Yeah, I tried that.
> Here's what I get for a small data sample:
>
> {
> "fields":
> [
> {"name":"name","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
> {"name":"age","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
> {"name":"gpa","type":20,"description":"autogenerated from Pig Field Schema","schema":null}
> ],
>
> "version":0,
> "sortKeys":[],
> "sortKeyOrders":[]
> }
>
>
> I want to see whether I can decode this format, define my own schema the
> same way, and use it in a Pig load function.
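
If you want to experiment with that, here is a rough sketch that builds JSON in the same layout as your sample from a list of (name, type) pairs. The layout simply mirrors the file shown above. Whether PigStorage accepts a hand-written file has not been confirmed anywhere in this thread, the type codes are hand-copied from org.apache.pig.data.DataType, the helper name build_pig_schema is made up, and complex types (which need a nested "schema") are not handled.

```python
import json

# Codes hand-copied from org.apache.pig.data.DataType; verify for your version.
PIG_TYPE_CODES = {
    "boolean": 5,
    "int": 10,
    "long": 15,
    "float": 20,
    "double": 25,
    "bytearray": 50,
    "chararray": 55,
}


def build_pig_schema(fields):
    """Build a dict shaped like the sample .pig_schema for simple fields.

    `fields` is a list of (name, type name) pairs. Raises KeyError for a
    type name that is not in PIG_TYPE_CODES.
    """
    return {
        "fields": [
            {
                "name": name,
                "type": PIG_TYPE_CODES[type_name],
                # Same description string Pig wrote in the sample file.
                "description": "autogenerated from Pig Field Schema",
                "schema": None,  # serialized as null; simple types only
            }
            for name, type_name in fields
        ],
        "version": 0,
        "sortKeys": [],
        "sortKeyOrders": [],
    }


if __name__ == "__main__":
    schema = build_pig_schema(
        [("name", "chararray"), ("age", "int"), ("gpa", "float")]
    )
    # Print the JSON; saving it as .pig_schema next to the data is up to you.
    print(json.dumps(schema, indent=2))
```

The safest check is to compare the output against a .pig_schema file that Pig itself wrote for the same columns.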
>
> Thanks,
> Praveenesh
>
> On Mon, Feb 6, 2012 at 2:41 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
> > It reads the schema file *it creates*. So, you process some data, store
> > it, then read it back later, and the schema is back.
> > Like I said, the JSON is not very human-readable -- the types are integers
> > rather than words like "chararray", etc.
> > Try saving something and check out the .pig_schema file to see an example.
> >
> > D
> >
> > On Sun, Feb 5, 2012 at 10:59 PM, praveenesh kumar <[email protected]> wrote:
> >
> > > OK, so how can I make use of the -schema option with PigStorage?
> > >
> > > Suppose my JSON schema is:
> > >
> > > {
> > >   "name":"Student_Data",
> > >   "properties":
> > >   {
> > >     "id":
> > >     {
> > >       "type":"INTEGER",
> > >       "description":"Student id"
> > >     },
> > >     "name":
> > >     {
> > >       "type":"CHARARRAY",
> > >       "description":"Name of the student"
> > >     },
> > >     "marks":
> > >     {
> > >       "type":"INTEGER",
> > >       "description":"Marks of the student"
> > >     }
> > >   }
> > > }
> > >
> > > I tried to write the above schema using Pig data types. Can I use it, or
> > > is there a different way to use the "-schema" option?
> > > -schema: Reads/Stores the schema of the relation using a hidden JSON file.
> > >
> > > Or is there some other way to pass a schema defined in a separate
> > > plain-text file and read it using PigStorage?
> > >
> > > Thanks,
> > > Praveenesh
> > >
> > >
> > > On Mon, Feb 6, 2012 at 12:18 PM, Dmitriy Ryaboy <[email protected]>
> > > wrote:
> > >
> > > > It's a JSON serialization of the Pig schema object, and isn't really
> > > > meant to be created by hand.
> > > > Patches to make it more human-friendly would be quite welcome.
> > > >
> > > > D
> > > >
> > > > On Sun, Feb 5, 2012 at 10:35 PM, praveenesh kumar <[email protected]> wrote:
> > > >
> > > > > Thanks,
> > > > > I was also looking at the -schema option in PigStorage.
> > > > > But can anyone explain how to define that JSON schema file?
> > > > > A tutorial or small example would be very helpful.
> > > > >
> > > > > Praveenesh
> > > > >
> > > > > On Mon, Feb 6, 2012 at 11:55 AM, Dmitriy Ryaboy <[email protected]> wrote:
> > > > >
> > > > > > It's pretty straightforward; that's why the LoadMetadata interface
> > > > > > exists. You just have to implement it and translate however you
> > > > > > store the schema to a Pig Schema object.
> > > > > >
> > > > > > PigStorageSchema will read a JSON file that describes the schema;
> > > > > > you can look at how that's done there (actually, PigStorage itself
> > > > > > will do that in trunk).
> > > > > >
> > > > > > You can also check out what the Elephant-Bird library does for
> > > > > > loading protocol buffers and thrift objects, where the schema is
> > > > > > derived from the object itself.
> > > > > >
> > > > > > -Dmitriy
> > > > > >
> > > > > > On Fri, Feb 3, 2012 at 4:35 AM, praveenesh kumar <[email protected]> wrote:
> > > > > >
> > > > > > > Hey guys,
> > > > > > >
> > > > > > > I am new to Pig.
> > > > > > > I was wondering whether it is possible to pass a schema to the
> > > > > > > Pig LOAD statement when loading the data for the first time.
> > > > > > >
> > > > > > > Suppose I have a huge dataset with around 100 columns. Is there
> > > > > > > a way to pass a schema defined in some other file (some kind of
> > > > > > > meta file) into the LOAD statement, or do I have to define it
> > > > > > > every time inside the LOAD statement?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Praveenesh