Dmitriy V. Ryaboy commented on PIG-760:


If/when I get complex schemas to work, this could theoretically be promoted 
to PigStorage proper, which would be cool. For now, if you try to deserialize a 
complex schema, everything blows up. So that's not so good (especially since I 
let you serialize complex schemas! Actually, maybe I should turn that off).

I'll add some docs in the next iteration, good call. Briefly: it's a JSON 
representation of the ResourceSchema, as described in the LoadStore redesign 
proposal: http://wiki.apache.org/pig/LoadStoreRedesignProposal . Once you know 
what the fields are, it's pretty easy to read; the one complexity is that types 
are represented using constants from the DataType class, which are not publicly 
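To illustrate what "types as DataType constants" means in the serialized file, here is a minimal sketch that writes a flat schema (no nested tuples or bags, matching the current limitation on complex schemas) as JSON. It assumes a top-level "fields" array of {"name", "type"} objects; the actual field names the patch emits are not given here. The numeric codes are the values of Pig's DataType constants as I recall them (e.g. INTEGER = 10, CHARARRAY = 55), so check them against the DataType class in your Pig version. The class, record, and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: serializes a flat (non-nested) schema to JSON,
// writing each field's type as a numeric DataType-style code.
public class FlatSchemaJson {

    // Assumed to mirror org.apache.pig.data.DataType constants; verify per Pig version.
    static final Map<String, Byte> TYPE_CODES = new LinkedHashMap<>();
    static {
        TYPE_CODES.put("int", (byte) 10);
        TYPE_CODES.put("long", (byte) 15);
        TYPE_CODES.put("float", (byte) 20);
        TYPE_CODES.put("double", (byte) 25);
        TYPE_CODES.put("bytearray", (byte) 50);
        TYPE_CODES.put("chararray", (byte) 55);
    }

    // One named, typed column. Names are assumed to need no JSON escaping.
    record Field(String name, String type) {}

    static String toJson(List<Field> fields) {
        StringBuilder sb = new StringBuilder("{\"fields\":[");
        for (int i = 0; i < fields.size(); i++) {
            Field f = fields.get(i);
            Byte code = TYPE_CODES.get(f.type());
            if (code == null) {
                // Complex types (tuple, bag, map) would need nested schemas.
                throw new IllegalArgumentException("unsupported type: " + f.type());
            }
            if (i > 0) sb.append(',');
            sb.append("{\"name\":\"").append(f.name())
              .append("\",\"type\":").append(code).append('}');
        }
        return sb.append("]}").toString();
    }

    // Reverse lookup: turns a stored numeric code back into a Pig type name.
    static String typeName(byte code) {
        for (Map.Entry<String, Byte> e : TYPE_CODES.entrySet()) {
            if (e.getValue() == code) return e.getKey();
        }
        throw new IllegalArgumentException("unknown type code: " + code);
    }

    public static void main(String[] args) {
        // Schema from the example script: ( a: int , b: int )
        String json = toJson(List.of(new Field("a", "int"), new Field("b", "int")));
        System.out.println(json); // {"fields":[{"name":"a","type":10},{"name":"b","type":10}]}
        System.out.println(typeName((byte) 10)); // int
    }
}
```

The reverse lookup is the part a human reader has to do by hand when reading the file, which is why the raw codes make it harder to read than plain type names would.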
> Serialize schemas for PigStorage() and other storage types.
> -----------------------------------------------------------
>                 Key: PIG-760
>                 URL: https://issues.apache.org/jira/browse/PIG-760
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: David Ciemiewicz
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.6.0
>         Attachments: pigstorageschema-2.patch, pigstorageschema.patch
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if PigStorage() could read a default schema from a 
> .schema file stored with the data when loading, and write a .schema file 
> alongside the data when storing.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }
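The reporter's observation that a .schema file in a directory of part files is ignored matches Hadoop's FileInputFormat convention of skipping input files whose names start with "_" or ".". A rough local-filesystem sketch of the store/load round trip under that rule follows; it uses java.nio instead of HDFS, and the class name, method names, and schema contents are hypothetical, not Pig's loader.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a .schema side file next to part files, with data
// files listed the way Hadoop's hidden-file filter would list them.
public class SchemaSideFile {

    // Same rule as FileInputFormat: names starting with "_" or "." are hidden.
    static boolean isDataFile(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    static List<String> dataFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (isDataFile(p)) names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names); // directory iteration order is unspecified
        return names;
    }

    // Returns the stored schema text, or null if no .schema file exists.
    static String readSchema(Path dir) throws IOException {
        Path schema = dir.resolve(".schema");
        return Files.exists(schema) ? Files.readString(schema, StandardCharsets.UTF_8) : null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data-2");
        Files.writeString(dir.resolve("part-00000"), "1\t2\n");
        // Illustrative schema text only; the real file format is defined by the patch.
        Files.writeString(dir.resolve(".schema"), "{\"fields\":[{\"name\":\"a\",\"type\":10}]}");
        System.out.println(dataFiles(dir)); // [part-00000]
        System.out.println(readSchema(dir));
    }
}
```

Because the side file is hidden from input listing, a loader can read it explicitly for the schema while the data files are processed exactly as before.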

This message is automatically generated by JIRA.