[jira] Commented: (PIG-990) Provide a way to pin LogicalOperator Options

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783032#action_12783032
 ] 

Hadoop QA commented on PIG-990:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426246/pinned_options_5.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 363 release audit warnings 
(more than the trunk's current 362 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/63/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/63/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/63/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/63/console

This message is automatically generated.

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch, 
> pinned_options_3.patch, pinned_options_4.patch, pinned_options_5.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, pig 
> philoophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783008#action_12783008
 ] 

Hadoop QA commented on PIG-760:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426245/pigstorageschema_3.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/62/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/62/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/62/console

This message is automatically generated.

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
> pigstorageschema_3.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Is there any zebra design document ?

2009-11-26 Thread Dmitriy Ryaboy
https://issues.apache.org/jira/browse/PIG-833 indicates that at this point
javadocs+code are basically all the available documentation
The wiki has a couple of usage examples: http://wiki.apache.org/pig/zebra

-D

On Thu, Nov 26, 2009 at 10:32 AM, Jeff Zhang  wrote:

> Hi all,
>
> I'd like to learn zebra, so is there any zebra design document, I did not
> found any on wiki
> Thank you.
>
>
> Jeff Zhang
>


[jira] Updated: (PIG-990) Provide a way to pin LogicalOperator Options

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-990:
--

Attachment: (was: pinned_options_5.patch)

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch, 
> pinned_options_3.patch, pinned_options_4.patch, pinned_options_5.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, pig 
> philoophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-990) Provide a way to pin LogicalOperator Options

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-990:
--

Attachment: pinned_options_5.patch

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch, 
> pinned_options_3.patch, pinned_options_4.patch, pinned_options_5.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, pig 
> philoophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-990) Provide a way to pin LogicalOperator Options

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-990:
--

Status: Patch Available  (was: Open)

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch, 
> pinned_options_3.patch, pinned_options_4.patch, pinned_options_5.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, pig 
> philoophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-760:
--

Status: Patch Available  (was: Open)

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
> pigstorageschema_3.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-760:
--

Attachment: (was: pigstorageschema_3.patch)

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
> pigstorageschema_3.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-760:
--

Attachment: pigstorageschema_3.patch

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
> pigstorageschema_3.patch, pigstorageschema_3.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-760:
--

Status: Open  (was: Patch Available)

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
> pigstorageschema_3.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-990) Provide a way to pin LogicalOperator Options

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-990:
--

Attachment: pinned_options_5.patch

added a test for pinning group and join options.


> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch, 
> pinned_options_3.patch, pinned_options_4.patch, pinned_options_5.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, pig 
> philoophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782978#action_12782978
 ] 

Hadoop QA commented on PIG-760:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426186/pigstorageschema_3.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/61/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/61/console

This message is automatically generated.

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
> pigstorageschema_3.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782976#action_12782976
 ] 

Hadoop QA commented on PIG-922:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426180/PIG-922-p3_10.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 60 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

-1 release audit.  The applied patch generated 368 release audit warnings 
(more than the trunk's current 362 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/60/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/60/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/60/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/60/console

This message is automatically generated.

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
> PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
> PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
> PIG-922-p3_10.patch, PIG-922-p3_2.patch, PIG-922-p3_3.patch, 
> PIG-922-p3_4.patch, PIG-922-p3_5.patch, PIG-922-p3_6.patch, 
> PIG-922-p3_7.patch, PIG-922-p3_8.patch, PIG-922-p3_9.patch
>
>
> This is a continuation work of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: Push up project, ie, prune columns as 
> early as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-990) Provide a way to pin LogicalOperator Options

2009-11-26 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-990:
--

Status: Open  (was: Patch Available)

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch, 
> pinned_options_3.patch, pinned_options_4.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, pig 
> philoophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-990) Provide a way to pin LogicalOperator Options

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782941#action_12782941
 ] 

Hadoop QA commented on PIG-990:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426178/pinned_options_4.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/59/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/59/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/59/console

This message is automatically generated.

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: pinned_options.patch, pinned_options_2.patch, 
> pinned_options_3.patch, pinned_options_4.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, pig 
> philoophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Why we name it zebra ?

2009-11-26 Thread Jeff Zhang
Hi all,

I'd like to know where's the name zebra come from ? does it convey the
meaning of this meta data system that the columnar storage format is like
the lines on the zebra's skin.


Thank you

Jeff Zhang


Is there any zebra design document ?

2009-11-26 Thread Jeff Zhang
Hi all,

I'd like to learn zebra, so is there any zebra design document, I did not
found any on wiki
Thank you.


Jeff Zhang


[jira] Commented: (PIG-1074) Zebra store function should allow '::' in column names in output schema

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782884#action_12782884
 ] 

Hadoop QA commented on PIG-1074:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426177/PIG-1074.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/58/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/58/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/58/console

This message is automatically generated.

> Zebra store function should allow '::' in column names in output schema
> ---
>
> Key: PIG-1074
> URL: https://issues.apache.org/jira/browse/PIG-1074
> Project: Pig
>  Issue Type: Bug
>Reporter: Pradeep Kamath
>Assignee: Yan Zhou
> Fix For: 0.6.0, 0.7.0
>
> Attachments: PIG-1074.patch, PIG-1074.patch, PIG-1074.patch
>
>
> the following script fails: 
>  {noformat}
> a = load '/zebra/singlefile/studenttab10k' using 
> org.apache.hadoop.zebra.pig.TableLoader() as (name, age, gpa);
> b = load '/zebra/singlefile/votertab10k' using 
> org.apache.hadoop.zebra.pig.TableLoader() as (name, age, registration, 
> contributions);
> c = filter a by age < 20;
> d = filter b by age < 20;
> store c into 
> '/user/pig/out//ZebraMultiQuery_30.out.1' using 
> org.apache.hadoop.zebra.pig.TableStorer('');
> store d into 
> '/user/pig/out//ZebraMultiQuery_30.out.2' using 
> org.apache.hadoop.zebra.pig.TableStorer('');
> e = cogroup c by name, d by name;
> f = foreach e generate flatten(c), flatten(d);
> store f into '/user/pig//ZebraMultiQuery_30.out.3' 
> using org.apache.hadoop.zebra.pig.TableStorer('');
> {noformat}
> Here the schema of f has names like c::name and it looks like zebra storefunc 
> does not allow '::' in column name 
> The stack trace is
>  
> ERROR 2997: Unable to recreate exception from backend error: 
> java.io.IOException: ColumnGroup.Writer constructor failed : Partition 
> constructor failed :Encountered " ":" ": "" at line 1, column 3.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1113) Diamond query optimization throws error in JOIN

2009-11-26 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782877#action_12782877
 ] 

Ankur commented on PIG-1113:


The script fails even if correct schema is specified for the c:bag{}. So the 
following change does not alleviate the problem

set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, 
c:bag{T:tuple(l:chararray)});

> Diamond query optimization throws error in JOIN
> ---
>
> Key: PIG-1113
> URL: https://issues.apache.org/jira/browse/PIG-1113
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>
> The following script results in 1 M/R job as a result of diamond query 
> optimization but the script fails.
> set1 = LOAD 'set1' USING PigStorage as (a:chararray, b:chararray, 
> c:chararray);
> set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, c:bag{});
> set2_1 = FOREACH set2 GENERATE a as f1, b as f2, (chararray) 0 as f3;
> set2_2 = FOREACH set2 GENERATE a as f1, FLATTEN((IsEmpty(c) ? null : c)) as 
> f2, (chararray) 1 as f3;
> all_set2 = UNION set2_1, set2_2;
> joined_sets = JOIN set1 BY (a,b), all_set2 BY (f2,f3);
> dump joined_sets;
> And here is the error
> org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot 
> convert a bag to a String
>   at org.apache.pig.data.DataType.toString(DataType.java:739)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:625)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:159)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1113) Diamond query optimization throws error in JOIN

2009-11-26 Thread Ankur (JIRA)
Diamond query optimization throws error in JOIN
---

 Key: PIG-1113
 URL: https://issues.apache.org/jira/browse/PIG-1113
 Project: Pig
  Issue Type: Bug
Reporter: Ankur


The following script results in 1 M/R job as a result of diamond query 
optimization but the script fails.

set1 = LOAD 'set1' USING PigStorage as (a:chararray, b:chararray, c:chararray);
set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, c:bag{});

set2_1 = FOREACH set2 GENERATE a as f1, b as f2, (chararray) 0 as f3;
set2_2 = FOREACH set2 GENERATE a as f1, FLATTEN((IsEmpty(c) ? null : c)) as f2, 
(chararray) 1 as f3;

all_set2 = UNION set2_1, set2_2;

joined_sets = JOIN set1 BY (a,b), all_set2 BY (f2,f3);
dump joined_sets;

And here is the error

org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot 
convert a bag to a String
at org.apache.pig.data.DataType.toString(DataType.java:739)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:625)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data

2009-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782806#action_12782806
 ] 

Hadoop QA commented on PIG-1107:


+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426168/pig_piglinerecordreader_bug.patch
  against trunk revision 884235.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/57/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/57/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/57/console

This message is automatically generated.

> PigLineRecordReader bails out on an empty line for compressed data
> --
>
> Key: PIG-1107
> URL: https://issues.apache.org/jira/browse/PIG-1107
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankit Modi
>Assignee: Ankit Modi
> Fix For: 0.6.0
>
> Attachments: pig_piglinerecordreader_bug.patch
>
>
> PigLineRecordReader bails out with an exception when it encounters an empty 
> line in a compressed file
> java.lang.ArrayIndexOutOfBoundsException: -1
>at 
> org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
> at 
> org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
> at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
> at 
> org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1112) FLATTEN eliminates the alias

2009-11-26 Thread Ankur (JIRA)
FLATTEN eliminates the alias


 Key: PIG-1112
 URL: https://issues.apache.org/jira/browse/PIG-1112
 Project: Pig
  Issue Type: Bug
Reporter: Ankur
 Fix For: 0.6.0


If schema for a field of type 'bag' is partially defined then FLATTEN() 
incorrectly eliminates the field and throws an error. 
Consider the following example:-

A = LOAD 'sample' using PigStorage() as (first:chararray, second:chararray, 
ladder:bag{});  
B = FOREACH A GENERATE first,FLATTEN(ladder) as third,second;   

C = GROUP B by (first,third);

This throws the error
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
Invalid alias: third in {first: chararray,second: chararray}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.