[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

Alan Gates (JIRA) Tue, 10 Nov 2009 13:29:59 -0800

    [ https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776086#action_12776086 ]


Alan Gates commented on PIG-760:
--------------------------------

The issue is that we want to break interfaces only once, so we don't want to
introduce any of the new interfaces now.  The load/store redesign obviously
won't be going into 0.6.  The other issue is that any classes and interfaces
introduced now as part of the redesign are inherently unstable.  So even if we
just sneak in ResourceSchema and ResourceStatistics, which won't break anything,
I doubt they'll look the same once the redesign is done.  And I certainly don't
want to be bound by any backward-compatibility guarantees for those classes
between 0.6 and the redesign.

I suggest that you build your own versions of these classes and use them in
your load/store functions and your optimizer.  Then, when the redesign comes
out, your code can switch over.  Since we'd be changing the classes anyway, I
don't think you're creating any extra work for yourself.

> Serialize schemas for PigStorage() and other storage types.
> -----------------------------------------------------------
>
>                 Key: PIG-760
>                 URL: https://issues.apache.org/jira/browse/PIG-760
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: David Ciemiewicz
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.6.0
>
>         Attachments: pigstorageschema-2.patch, pigstorageschema.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange
> because it compresses well and imports well into Excel and other analysis
> environments.
> However, it is a pain when it comes to maintenance because the columns are in
> fixed locations, and I'd like to add columns in some cases.
> It would be great if PigStorage() could read a default schema from a .schema
> file stored with the data when loading, and write a .schema file alongside
> the data when storing.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I execute a chain of Pig statements such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> then describe B should output something like { a: int, b: int }.
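One way the proposed .schema side file could work is sketched below in Java against the local filesystem (java.nio), rather than Hadoop's FileSystem API. The helper names (storeSchema, loadSchema, listDataFiles) and the one-line schema text format are hypothetical. The listing filter skips names beginning with "." or "_", which mirrors the default hidden-file filter in Hadoop's FileInputFormat and is consistent with the observation above that a .schema file among the part files is ignored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helpers for a ".schema" side file stored next to part files. */
final class SchemaSidecar {

    /** Name of the side file proposed in the issue. */
    static final String SCHEMA_FILE = ".schema";

    /** Store side: write the schema text into the output directory. */
    static void storeSchema(Path outputDir, String schemaText) throws IOException {
        Files.createDirectories(outputDir);
        Files.write(outputDir.resolve(SCHEMA_FILE),
                schemaText.getBytes(StandardCharsets.UTF_8));
    }

    /** Load side: return the stored schema text, or null if no side file exists. */
    static String loadSchema(Path inputDir) throws IOException {
        Path schemaPath = inputDir.resolve(SCHEMA_FILE);
        if (!Files.exists(schemaPath)) {
            return null;
        }
        return new String(Files.readAllBytes(schemaPath), StandardCharsets.UTF_8).trim();
    }

    /** Lists data files, skipping hidden names that start with "." or "_". */
    static List<String> listDataFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (!name.startsWith(".") && !name.startsWith("_")) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data-2");
        // Simulate one tab-delimited part file written by the store.
        Files.write(dir.resolve("part-00000"), "1\t2\n".getBytes(StandardCharsets.UTF_8));
        storeSchema(dir, "a: int, b: int");

        System.out.println(loadSchema(dir));    // a: int, b: int
        System.out.println(listDataFiles(dir)); // [part-00000]
    }
}
```

Because the schema lives in a file the data reader already skips, old scripts that read the directory without a schema keep working unchanged.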

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
