[ https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770573#action_12770573 ]
Alan Gates commented on PIG-760:
--------------------------------

I know I'm wandering dangerously close to being fanatical here, but I really dislike taking a struct, making all the members private/protected, and then adding getters and setters. If some tools need getters and setters, feel free to add them. But please leave the members public.

I notice you snuck in your names for LoadMetadata and StoreMetadata. I'm fine with motions to change the names, but let's get everyone to agree on the new names before we start using them.

On the StoreMetadata interface, Pradeep had some thoughts on getting rid of it, as he felt all the necessary information could be communicated in StoreFunc.allFinished(). He should be publishing an update to the load/store redesign wiki ( http://wiki.apache.org/pig/LoadStoreRedesignProposal ) soon. He also wanted to change LoadMetadata.getSchema() to take a location so that the loader could find the file.

Other changes all look good.

One general thought: I want to figure out how to keep the ResourceStatistics object flexible enough that it's easy to add new statistics to it. One thought I'd had previously (I can't remember if we discussed this or not) was to add a Map<String, Object> to it. That way we can add new stats between versions of the object. Once the stats are accepted as valid and take hold, they could be moved into the object proper. The upside of this is that it's flexible. The downside is that we risk devolving into an unknown-properties object, and every stat has to go through a transition. Thoughts?

> Serialize schemas for PigStorage() and other storage types.
> ------------------------------------------------------------
>
>                 Key: PIG-760
>                 URL: https://issues.apache.org/jira/browse/PIG-760
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: David Ciemiewicz
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.6.0
>
>         Attachments: pigstorageschema-2.patch, pigstorageschema.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange because it compresses well and imports into Excel and other analysis environments well.
> However, it is a pain when it comes to maintenance because the columns are in fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a .schema file stored with the data, and if store PigStorage() could store a .schema file with the data.
> I have tested this out, and both Hadoop HDFS and Pig in -exectype local mode will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }
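On the LoadMetadata.getSchema() point in the comment above, here is a minimal sketch of what passing a location might look like. This is illustrative only: the exact parameter list was still being settled in the load/store redesign, and the ResourceSchema type below is a stand-in stub, not the real Pig class.

{code:java}
import java.io.IOException;

// Stand-in stub: the real ResourceSchema class lives in the Pig codebase;
// it is declared here only so the sketch compiles on its own.
interface ResourceSchema {}

// Hypothetical shape of LoadMetadata with getSchema() taking a location,
// as suggested in the comment above.
public interface LoadMetadata {
    // The location (e.g. an HDFS path) lets the loader find side files such
    // as a .schema file stored alongside the part files.
    ResourceSchema getSchema(String location) throws IOException;
}
{code}

With the location in hand, a PigStorage-style loader could look for a .schema side file under that path before falling back to an unspecified schema.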
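To make the flexible-statistics idea concrete, here is a rough sketch of how a Map<String, Object> escape hatch could sit alongside ordinary public members. The field names (mBytes, numRecords, extraStats) are hypothetical, not taken from the patch.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Rough sketch of the "keep ResourceStatistics extensible" idea discussed
// above.  Field names here are hypothetical, not taken from the patch.
public class ResourceStatistics {

    // Established, agreed-upon statistics live as plain public members
    // (per the "leave the members public" point earlier in the comment).
    public Long mBytes;      // total size of the data, in megabytes
    public Long numRecords;  // total number of records

    // Experimental or newly proposed statistics can be carried here until
    // they prove useful, at which point they would be promoted to real
    // members in a later version of the object.
    public Map<String, Object> extraStats = new HashMap<String, Object>();
}
{code}

A newly proposed statistic could start out as an entry in extraStats and only be promoted to a dedicated member once it has proven useful, which is exactly the per-stat transition the comment counts as the downside.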