it reads the schema file *it creates*. So, you process some data, store it, then read it back later, and the schema is back. Like I said, the JSON is not very human-readable -- the types are integers rather than words like "chararray", etc. Try saving something and check out the .pig_schema file to see an example.
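A minimal sketch of that round trip, assuming a Pig build where PigStorage itself accepts the -schema option (trunk, as of this thread). The paths, the tab delimiter, and the field names are made up for illustration:

```pig
-- Hypothetical input path and field names, for illustration only.
students = LOAD 'students.tsv' USING PigStorage('\t')
           AS (id:int, name:chararray, marks:int);

-- Storing with '-schema' writes a hidden .pig_schema JSON file
-- alongside the data in the output directory.
STORE students INTO 'students_out' USING PigStorage('\t', '-schema');

-- Reading the same directory back: the schema comes from the
-- .pig_schema file, so no AS clause is given here.
restored = LOAD 'students_out' USING PigStorage('\t', '-schema');
DESCRIBE restored;
```

The AS clause is only written once, on the first load; after that the stored schema file carries it.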
D

On Sun, Feb 5, 2012 at 10:59 PM, praveenesh kumar <[email protected]> wrote:

> Okie.. so how can I make use of -schema option with PigStorage.
>
> Suppose my JSON schema is -
>
> {
>   "name":"Student_Data",
>   "properties":
>   {
>     "id":
>     {
>       "type":"INTEGER",
>       "description":"Student id"
>     },
>     "name":
>     {
>       "type":"CHARARRAY",
>       "description":"Name of the student"
>     },
>     "marks":
>     {
>       "type":"INTEGER",
>       "description":"Marks of the student"
>     },
>   }
> }
>
> I tried to create the above schema in Pig Datatypes. Can I use it or Is
> there a different way to use "-schema" option ?
> -schema: Reads/Stores the schema of the relation using a hidden JSON file.
>
> Or is there some other way to directly pass the schema defined in some
> other file as plain text file and read it using PigStorage ?
>
> Thanks,
> Praveenesh
>
> On Mon, Feb 6, 2012 at 12:18 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
> > It's a json serialization of the Pig schema object, and isn't really
> > meant to be created by hand.
> > Patches to make it more human-friendly would be quite welcome.
> >
> > D
> >
> > On Sun, Feb 5, 2012 at 10:35 PM, praveenesh kumar <[email protected]> wrote:
> >
> > > Thanks,
> > > I was also looking for -schema option in PigStorage.
> > > But Can anyone explain how can we define that json schema file.
> > > Some tutorial/small example would be very helpful.
> > >
> > > Praveenesh
> > >
> > > On Mon, Feb 6, 2012 at 11:55 AM, Dmitriy Ryaboy <[email protected]> wrote:
> > >
> > > > It's pretty straightforward, that's why the LoadMetadata interface
> > > > exists. You just have to implement it and translate however you
> > > > store the schema to a Pig Schema object.
> > > >
> > > > PigStorageSchema will read a json file that describes the schema,
> > > > you can look at how that's done there (actually, PigStorage itself
> > > > will do that in trunk).
> > > >
> > > > You can also check out what the Elephant-Bird library does for
> > > > loading protocol buffers and thrift objects, where schema is
> > > > derived from the object itself.
> > > >
> > > > -Dmitriy
> > > >
> > > > On Fri, Feb 3, 2012 at 4:35 AM, praveenesh kumar <[email protected]> wrote:
> > > >
> > > > > Hey guys,
> > > > >
> > > > > I am new to Pig.
> > > > > I was wondering is it possible to pass schema in pig load
> > > > > statement while loading it first time.
> > > > >
> > > > > Suppose if I have a huge dataset.. containing around 100 cols..
> > > > > Is there a way through which I can pass the schema defined in
> > > > > some other file (some kind of meta file) into pig load statement
> > > > > or do I have to define it every time inside LOAD statement ?
> > > > >
> > > > > Thanks,
> > > > > Praveenesh
