Alan Gates commented on PIG-760:

Question for other pig committers:

Dmitriy proposes, with this patch, to include and begin using some of the 
load-store redesign changes (see PIG-966).  Specifically, he includes versions 
of ResourceSchema, ResourceStatistics, LoadMetadata, and StoreMetadata.  
These are also currently being implemented on the load-store-redesign branch, 
with the assumption that they will be rolled into trunk for the 0.7 (or possibly 
a later) release.  He wants to include these new classes in this patch because 
he is using them for the cost-based optimizer he is working on.

Are we OK with introducing these classes now, given that we know they are still 
under development and thus not yet stable?  I am, provided it is done with the 
stipulation that they will certainly change before they are officially 
released.  To make this clear to developers, I suggest moving them into a 
package org.apache.pig.experimental.

> Serialize schemas for PigStorage() and other storage types.
> -----------------------------------------------------------
>                 Key: PIG-760
>                 URL: https://issues.apache.org/jira/browse/PIG-760
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: David Ciemiewicz
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.7.0
>         Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
> pigstorageschema_3.patch, pigstorageschema_4.patch
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if PigStorage() could read a default schema from a .schema 
> file stored with the data when loading, and write a .schema file alongside 
> the data when storing.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }
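The round trip requested above can be sketched in Java (the language Pig is written in). This is a minimal illustration under stated assumptions, not the format or API of the attached patches: it assumes a hypothetical one-line "name:type,..." encoding for the .schema file, uses plain java.nio instead of Hadoop's FileSystem, and the class and method names (SchemaFileSketch, storeSchema, loadSchema, dataFiles) are made up for illustration. The isDataFile filter mimics the behavior noted above: Hadoop's FileInputFormat skips input files whose names begin with "_" or ".", which is likely why a .schema file among the part files is ignored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SchemaFileSketch {
    // Hypothetical file name and encoding: a single line such as "a:int,b:int".
    static final String SCHEMA_FILE = ".schema";

    // Treat names starting with "_" or "." as hidden, like Hadoop's input filter.
    static boolean isDataFile(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // "store": write the schema next to the part files.
    static void storeSchema(Path dir, LinkedHashMap<String, String> schema) throws IOException {
        String line = schema.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
        Files.write(dir.resolve(SCHEMA_FILE), Collections.singletonList(line), StandardCharsets.UTF_8);
    }

    // "load": read the schema back; an absent file yields an empty (unnamed) schema.
    static LinkedHashMap<String, String> loadSchema(Path dir) throws IOException {
        LinkedHashMap<String, String> schema = new LinkedHashMap<>();
        Path f = dir.resolve(SCHEMA_FILE);
        if (!Files.exists(f)) return schema;
        String line = new String(Files.readAllBytes(f), StandardCharsets.UTF_8).trim();
        if (line.isEmpty()) return schema;
        for (String field : line.split(",")) {
            String[] parts = field.split(":", 2);
            if (parts.length != 2) throw new IOException("malformed field: " + field);
            schema.put(parts[0].trim(), parts[1].trim());
        }
        return schema;
    }

    // Render the schema roughly the way the description expects describe to.
    static String describe(Map<String, String> schema) {
        return schema.entrySet().stream()
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.joining(", ", "{ ", " }"));
    }

    // List only the data (part) files, skipping .schema and similar hidden files.
    static List<String> dataFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.filter(SchemaFileSketch::isDataFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data-2");
        Files.write(dir.resolve("part-00000"), Arrays.asList("1\t2", "3\t4"), StandardCharsets.UTF_8);
        LinkedHashMap<String, String> schema = new LinkedHashMap<>();
        schema.put("a", "int");
        schema.put("b", "int");
        storeSchema(dir, schema);                       // store A into 'data-2'
        System.out.println(describe(loadSchema(dir)));  // { a: int, b: int }
        System.out.println(dataFiles(dir));             // [part-00000]
    }
}
```

A LinkedHashMap is used so that column order survives the round trip, which matters for PigStorage since fields are positional.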

This message is automatically generated by JIRA.