[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791612#action_12791612
 ] 

Hadoop QA commented on PIG-760:
---

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12428185/pigstorageschema_7.patch
  against trunk revision 890596.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/131/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/131/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/131/console

This message is automatically generated.

 Serialize schemas for PigStorage() and other storage types.
 ---

 Key: PIG-760
 URL: https://issues.apache.org/jira/browse/PIG-760
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
 pigstorageschema_3.patch, pigstorageschema_4.patch, pigstorageschema_5.patch, 
 pigstorageschema_7.patch, 
 TEST-org.apache.pig.piggybank.test.TestPigStorageSchema.txt


 I'm finding PigStorage() really convenient for storage and data interchange 
 because it compresses well and imports into Excel and other analysis 
 environments well.
 However, it is a pain when it comes to maintenance because the columns are in 
 fixed locations and I'd like to add columns in some cases.
 It would be great if load PigStorage() could read a default schema from a 
 .schema file stored with the data and if store PigStorage() could store a 
 .schema file with the data.
 I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
 will ignore a file called .schema in a directory of part files.
 So, for example, if I have a chain of Pig scripts I execute such as:
 A = load 'data-1' using PigStorage() as ( a: int , b: int );
 store A into 'data-2' using PigStorage();
 B = load 'data-2' using PigStorage();
 describe B;
 describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-12-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791079#action_12791079
 ] 

Alan Gates commented on PIG-760:


I reran the test; it looks fine.

Given that LoadMetadata et al. have moved to experimental and PigStorageSchema to
PiggyBank, it doesn't seem to me that JsonMetadata belongs in builtin.  Are you
ok with me moving it to experimental as part of applying the patch?




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788265#action_12788265
 ] 

Hadoop QA commented on PIG-760:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12427459/pigstorageschema_5.patch
  against trunk revision 888704.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/109/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/109/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/109/console

This message is automatically generated.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-12-09 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788300#action_12788300
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

The core test failure is in junit.framework -- doesn't seem related. Can 
someone confirm this is just Hudson acting out? Here's the error report: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/109/testReport/junit.framework/TestSuite$1/warning/

The javadoc fix is trivial; I'm holding off on uploading a new patch in case I
need to do something about the JUnit test failure.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-12-01 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784441#action_12784441
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

Right -- now that PigStorageSchema is in the piggybank, I need to use the full
package name when referring to it.
I'll update this ticket with a fix, and move the new interfaces to experimental.
This probably won't happen until next week.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783890#action_12783890
 ] 

Alan Gates commented on PIG-760:


Is this patch ready to be reviewed and checked in, or is it still in the
development stage?




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-30 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783911#action_12783911
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

Ready




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783979#action_12783979
 ] 

Alan Gates commented on PIG-760:


Question for other pig committers:

With this patch, Dmitriy proposes to include and begin using some of the
load-store redesign changes (see PIG-966).  Specifically, he includes versions
of ResourceSchema, ResourceStatistics, LoadMetadata, and StoreMetadata.
Currently these are also being implemented on the load-store-redesign branch,
with the assumption that they'll be rolled into trunk for the 0.7 (or possibly
a later) release.  He wants to include these new classes in this patch because
he is using them for the cost-based optimizer he is working on.

Are we ok with introducing these classes now, knowing they are still under
development and thus not yet stable?  I am, provided it is done with the
stipulation that they will certainly change before they are officially
released.  To make that clear to developers, I suggest moving them into a
package org.apache.pig.experimental to signal that they are not yet stable.
Thoughts?





[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-30 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783995#action_12783995
 ] 

Pradeep Kamath commented on PIG-760:


I agree that these classes should go into org.apache.pig.experimental as part
of this patch.  The only issue I see is that when the load-store-redesign
branch's changes are eventually merged back to trunk, the code in this patch
will need to use the right package to refer to the classes implemented in core
Pig.  With this patch committed, we will need a reminder to make that change
later; do we need a companion jira to track it?




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783188#action_12783188
 ] 

Hadoop QA commented on PIG-760:
---

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426297/pigstorageschema_4.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/64/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/64/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/64/console

This message is automatically generated.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782978#action_12782978
 ] 

Hadoop QA commented on PIG-760:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426186/pigstorageschema_3.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/61/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/61/console

This message is automatically generated.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-25 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782759#action_12782759
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

Argh. I totally missed Alan's comment before creating this new patch.

To address Alan's concerns: the patch doesn't break any current interfaces, 
just adds a few new ones, which aren't public.  I can commit to updating this 
issue as the definitions of the interfaces change. Where my implementation of 
the spec differs from the code in the load-store redesign branch, I believe 
it's because I am actually writing code that uses stats and schema loaders, and 
these changes should be considered for the branch.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-10 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776065#action_12776065
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

Alan et al --
Could this be considered for trunk and 0.6, assuming I bring it up to parity
with the interfaces as they are currently defined in the load/store redesign
branch?  Or do you want to wait for the redesign to be complete before this can
go in (naturally, I prefer the former)?




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776086#action_12776086
 ] 

Alan Gates commented on PIG-760:


The issue is that we want to break interfaces only once, so we don't want to
introduce any of these interfaces now.  The load/store redesign obviously won't
be going into 0.6.  The other issue is that any classes and interfaces
introduced now as part of the redesign are inherently unstable.  So even if we
just sneak in ResourceSchema and ResourceStatistics, which won't break
anything, I doubt they'll look the same once the redesign is done.  And I
certainly don't want to be bound to any backward compatibility for those
classes between 0.6 and the redesign.

I suggest that you build your own version of these classes and use them in your 
load/store functions and your optimizer.  Then when the redesign comes out, 
your code can switch.  As we'd change the classes anyway, I don't think you're 
creating any extra work for yourself.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770573#action_12770573
 ] 

Alan Gates commented on PIG-760:


I know I'm wandering dangerously close to being fanatical here, but I really 
dislike taking a struct, making all the members private/protected, and then 
adding getters and setters.  If some tools need getters and setters, feel free 
to add them.  But please leave the members public.

I notice you snuck in your names for LoadMetadata and StoreMetadata.  I'm fine 
with motions to change the names.  But let's get everyone to agree on the new 
names before we start using them.

On the StoreMetadata interface, Pradeep had some thoughts on getting rid of it, 
as he felt all the necessary information could be communicated in 
StoreFunc.allFinished().  He should be publishing an update to the load/store 
redesign wiki ( http://wiki.apache.org/pig/LoadStoreRedesignProposal ) soon.  
He also wanted to change LoadMetadata.getSchema() to take a location so that 
the loader could find the file.

Other changes all look good.  

One general thought.  I want to figure out how to keep the ResourceStatistics
object flexible enough that it's easy to add new statistics to it.  One thought
I'd had previously (I can't remember if we discussed this or not) was to add a
Map<String, Object> to it.  That way we can add new stats between versions of
the object.  Once the stats are accepted as valid and take hold, they could be
moved into the object proper.  The upside of this is that it's flexible.  The
downside is that we risk devolving into an unknown-properties object, and every
stat has to go through a transition.  Thoughts?
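
As a rough, purely illustrative sketch of that idea (the class and field names
below are made up; this is not the ResourceStatistics class from the patch or
the redesign branch), a statistics holder with public members plus an
extensible map might look like this:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch only -- not the actual ResourceStatistics class.
    public class StatisticsSketch {
        // Established statistics stay as plain public members, keeping the
        // struct-like style preferred above.
        public Long numRecords;
        public Long sizeInBytes;

        // New or experimental statistics ride along in a map until they are
        // promoted to real members in a later version of the object.
        public Map<String, Object> extraStats = new HashMap<String, Object>();
    }

A consumer that does not recognize a key in extraStats can simply ignore it,
which is where both the flexibility and the risk of an anything-goes properties
bag come from.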




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770618#action_12770618
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

bq. If some tools need getters and setters, feel free to add them. But please 
leave the members public. 

I'll revert the change.

bq. I notice you snuck in your names for LoadMetadata and StoreMetadata. I'm 
fine with motions to change the names. But let's get everyone to agree on the 
new names before we start using them.

Yeah, I kind of figured we'd get to discuss it again if I did that :-). It seems
like we didn't really reach a final decision last time.  Are we sure the only
time it might be reasonable to read or write metadata is during loads and
stores? I am not. I can envision future uses where the storage is some
ephemeral state that we have operators reporting stats into, to enable adaptive
optimizations.  Also, and I know I am nitpicking, LoadMetadata is an
instruction, whereas MetadataLoader is a thing. Same with StoreMetadata and
MetadataStorer (but "storer" isn't a real word, so I chose Writer).

bq. On the StoreMetadata interface, Pradeep had some thoughts on getting rid of 
it, as he felt all the necessary information could be communicated in 
StoreFunc.allFinished(). He should be publishing an update to the load/store 
redesign wiki ( http://wiki.apache.org/pig/LoadStoreRedesignProposal ) soon. 

I was envisioning the setStatistics() and setSchema() methods as methods used 
to alter state, whereas allFinished()  essentially does the job of flushing 
whatever is needed (you'll notice I fake an allFinished() method in my finish() 
implementation by simply checking if any other task has started creating the 
necessary file yet -- a suboptimal workaround, but the best that can be done 
with the current interface).

bq. He also wanted to change LoadMetadata.getSchema() to take a location so 
that the loader could find the file.

A location by itself may not be sufficient -- for example, for the JsonMetadata
implementation, I need the DataStorage as well.
I solved that by passing the location and storage into JsonMetadata's 
constructor. 
There is something to be said for being able to reuse the same MetadataLoader 
to load schemas for multiple locations, however. Assuming we can't come up with 
any scenarios where by the time we need to get the schema, we no longer have 
the location -- but we might have created the MetadataLoader beforehand, and 
set the internal location at that time -- I agree with the change.
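
To make that trade-off concrete, here is a tiny self-contained Java sketch (all
names are hypothetical, and String merely stands in for the schema type; these
are not the Pig interfaces under discussion) contrasting a loader bound to one
location at construction time with one that takes the location per call:

    // Hypothetical sketch only.
    interface BoundMetadataLoader {
        // The location (and storage) were given to the constructor, so this
        // instance can only ever answer for one path.
        String getSchema();
    }

    interface ReusableMetadataLoader {
        // The location is supplied per call, so one instance can serve many
        // paths, provided the caller still has the location at lookup time.
        String getSchema(String location);
    }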

bq. One thought I'd had previously (I can't remember if we discussed this or
not) was to add a Map<String, Object> to it

I have a feeling we did discuss this, or something like this, possibly in a
different context, but I can't find the mention either.  I am not sure what we
would gain by this -- the only consumers of stats would be various
optimizers/compilers/translators, right? So they would need to be updated to
deal with new stats, and code that propagates / estimates stats down a logical
plan would need to be updated whenever a new statistic is added. That sounds
pretty extensive. If we instead assume that any field is nullable (or, if it is
a collection, can be empty), and make sure that all missing fields are filled
in as nulls/empties when the stat objects are deserialized, we should be ok
with upgrades.





[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769141#action_12769141
 ] 

Hadoop QA commented on PIG-760:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12422958/pigstorageschema-2.patch
  against trunk revision 828891.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 403 javac compiler warnings (more 
than the trunk's current 401 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/112/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/112/console

This message is automatically generated.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769435#action_12769435
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

The Javac warnings are about JobConf deprecation. There is a separate patch in 
the queue to turn this warning off until migration to the new APIs is finished.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-22 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768956#action_12768956
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

David,

If / when I get complex schemas to work, this could theoretically be promoted
to PigStorage proper, which would be cool. For now, if you try to deserialize a
complex schema, everything blows up, so that's not so good (especially since I
let you serialize complex schemas! Actually, maybe I should turn that off).

I'll add some docs on the next iteration, good call.  Briefly -- it's a JSON 
representation of the ResourceSchema, as described on the LoadStore redesign 
proposal: http://wiki.apache.org/pig/LoadStoreRedesignProposal . Once you know 
what the fields are, it's pretty easy to read; the one complexity is that types 
are represented using constants from the DataType class, which are not publicly 
documented.
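
For readers who have not seen one, a .schema side file of the shape described
might contain something along these lines (an illustration of the general
structure only, not the exact output of the patch; the numeric type codes come
from org.apache.pig.data.DataType, e.g. 10 for an int field, assuming the usual
constant values):

    {"fields":[{"name":"a","type":10},{"name":"b","type":10}]}

That is, one entry per field of the relation, with the field name and its type
given as a DataType constant.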




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767862#action_12767862
 ] 

Alan Gates commented on PIG-760:


I don't take javac or findbugs warnings as final truth.  If you can give a good
reason why the warning is wrong or not relevant, or you've chosen to take that
risk to get some other benefit (such as skipping an instanceof check before a
cast for performance, where you believe the risk is acceptable), then put that
reasoning in comments and suppress the warning in the code.
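
A minimal sketch of what that can look like in practice for the kind of warning
mentioned elsewhere in this thread (EI_EXPOSE_REP, returning an internal
mutable object from a getter).  The class and field names below are made up,
and this assumes the FindBugs annotations jar is on the classpath; it is not
code from the patch:

    public class SchemaHolder {
        private final int[] columnTypes = new int[8];

        // Deliberately expose the internal array: callers are trusted here and
        // a defensive copy on every call would cost more than it buys us.
        @edu.umd.cs.findbugs.annotations.SuppressWarnings(
                value = "EI_EXPOSE_REP",
                justification = "intentional; copying the array on each call is not worth it")
        public int[] getColumnTypes() {
            return columnTypes;
        }
    }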




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767605#action_12767605
 ] 

Hadoop QA commented on PIG-760:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12422565/pigstorageschema.patch
  against trunk revision 826110.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 410 javac compiler warnings (more 
than the trunk's current 408 warnings).

-1 findbugs.  The patch appears to introduce 15 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/101/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/101/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/101/console

This message is automatically generated.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-19 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767651#action_12767651
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

Hmm.. I'll check out the javac warnings.

What do I do if I disagree with 14 out of 15 Findbugs warnings?  It wants me to
copy objects in getters/setters, but I don't think that's necessary in this
case. Committers?

Also -- if someone gave me the ability to assign Jiras to myself, that would be 
great...




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765603#action_12765603
 ] 

Alan Gates commented on PIG-760:


At this point no one has contributed a PigStorageSchema as suggested above.  We 
remain open to such a contribution if someone has the time.  




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-14 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765626#action_12765626
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

This would be a nice proof-of-concept task for the new Load/StoreMetadata 
interfaces, as it removes the complexity of dealing with something like Owl.




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-07 Thread mark meissonnier (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763294#action_12763294
 ] 

mark meissonnier commented on PIG-760:
--

Any new development on this issue? I'm finding it painful to have to modify the
input schema in all child pig scripts any time I modify my root pig script.
I was thinking of developing something quick, and then I figured someone might
already have done something, or I could help the overall effort.
Please let me know.
Thanks




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-04-09 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697631#action_12697631
 ] 

David Ciemiewicz commented on PIG-760:
--

Sure, you could do that, create PigStorageSchema.

The thing is, I don't think it is necessary, and it is possible to do this in a
backward-compatible way.

First, if the user specifies a LOAD ... AS clause schema, then PigStorage could
simply use that casting to override what is in the .schema file.  Of course,
PigStorage might want to warn at run time that there is an override, or warn
only if there are incompatible differences between the serialized schema and
the explicit AS clause schema.

Next, is there really any harm in creating the serialized schema file on each
and every STORE?

Finally, why subclass when we could parameterize?

In other words, instead of writing:

store A into 'file' using PigStorageSchema();

why not do:

store A into 'file' using PigStorage('schema=yes');  -- redundant, since
schema=yes would be the default

I think it would be more useful to have single classes with parameterized
options than a proliferation of classes.

Or, better yet, why can't I just define the behavior of PigStorage() for all of 
the instances in my script:

define PigStorage PigStorage(
'sep=\t',
'schema=yes',
'erroronmissingcolumn=no'
);

I have recently done similar things for other functions and it turns out to be 
a nice way of capturing global parameterizations for cleaner Pig code.



