
[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-02 Thread Santhosh Srinivasan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917246#action_12917246
 ] 

Santhosh Srinivasan commented on PIG-1661:
--

Sure, worth a try.

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917133#action_12917133
 ] 

Daniel Dai commented on PIG-1661:
-

The site looks good. I would vote yes. 

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.


[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917130#action_12917130
 ] 

Ashutosh Chauhan commented on PIG-1661:
---

+1 for experimenting with search-hadoop.
Patch itself is small enough, so even if we find otherwise, it can easily be 
reverted.

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.


[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917117#action_12917117
 ] 

Alan Gates commented on PIG-1661:
-

I like that JIRA, source code, javadocs, etc. get added in.  So I'm willing to 
switch.

Anyone else have an opinion?

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.


[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema

2010-10-01 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1656:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to 0.8 branch and trunk.


> TOBAG  udfs ignores columns with null value;  it does not use input type to 
> determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1656.1.patch, PIG-1656.2.patch
>
>
> TOBAG udf ignores columns with null value
> {code}
> R4= foreach B generate $0,  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000{(1),(1)}
> 1000{(2),(2)}
> 1000{(3),(3)}
> 1000{(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}


[jira] Commented: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema

2010-10-01 Thread Richard Ding (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917108#action_12917108
 ] 

Richard Ding commented on PIG-1656:
---

+1

> TOBAG  udfs ignores columns with null value;  it does not use input type to 
> determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1656.1.patch, PIG-1656.2.patch
>
>
> TOBAG udf ignores columns with null value
> {code}
> R4= foreach B generate $0,  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000{(1),(1)}
> 1000{(2),(2)}
> 1000{(3),(3)}
> 1000{(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}


[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917103#action_12917103
 ] 

Otis Gospodnetic commented on PIG-1661:
---

search-hadoop.com (SH) is indexing:
* User & dev ML messages
* JIRA issues
* Wiki
* Web site
* Source code
* Javadocs

I think Google is only indexing the web site and probably the Wiki.  SH has 
nice auto-complete/suggest-as-you-type functionality, facets (project, data 
type, author), different sorting options, and source code with syntax 
highlighting.

The SH index is continuously refreshed (new documents added, deleted ones 
removed, etc.) and new changes become visible in search results every 10 minutes.


> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.


[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema

2010-10-01 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1656:
---

Attachment: PIG-1656.2.patch

PIG-1656.2.patch
Updated patch to include documentation of details of output schema generation.


> TOBAG  udfs ignores columns with null value;  it does not use input type to 
> determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1656.1.patch, PIG-1656.2.patch
>
>
> TOBAG udf ignores columns with null value
> {code}
> R4= foreach B generate $0,  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000{(1),(1)}
> 1000{(2),(2)}
> 1000{(3),(3)}
> 1000{(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}


[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

2010-10-01 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917081#action_12917081
 ] 

Alan Gates commented on PIG-1565:
-

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 8 new or modified tests.
 [exec]
 [exec] -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec]
 [exec]

The javadoc warning is:

  [javadoc] /home/gates/src/pig/PIG-1565/trunk/src/org/apache/pig/builtin/INDEXOF.java:78: warning - Tag @link: can't find INDEX_OF(int, int) in java.lang.String

Building Piggybank now fails as well, since some of the ErrorCatchingBase class 
was moved into main Pig.

Also, the patch fails a couple of unit tests in TestStringUDFs.  It fails 
testIndexOf() and testLastIndexOf() because it doesn't properly handle the 
null case.

I'll attach the output from running the tests.
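
The null-case failure described above can be illustrated with a minimal Java sketch. The class and method names below are hypothetical and are not taken from the patch, and returning null for a null input is an assumption about the expected behavior, which this thread does not state.

```java
class NullSafeIndexOf {
    // Hypothetical helper: returns null when either argument is null
    // (an assumed convention), otherwise delegates to String.indexOf.
    static Integer indexOf(String source, String search) {
        if (source == null || search == null) {
            return null;
        }
        return source.indexOf(search);
    }

    // Same null guard, delegating to String.lastIndexOf.
    static Integer lastIndexOf(String source, String search) {
        if (source == null || search == null) {
            return null;
        }
        return source.lastIndexOf(search);
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcabc", "b"));     // 1
        System.out.println(lastIndexOf("abcabc", "b")); // 4
        System.out.println(indexOf(null, "b"));         // null
    }
}
```

Without the guard, calling indexOf on a null source would throw a NullPointerException, which is one plausible way such tests could fail.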

> additional piggybank datetime and string UDFs
> -
>
> Key: PIG-1565
>     URL: https://issues.apache.org/jira/browse/PIG-1565
> Project: Pig
>  Issue Type: Improvement
>Reporter: Andrew Hitchcock
>Assignee: Andrew Hitchcock
> Fix For: 0.8.0
>
> Attachments: PIG-1565-1.patch, PIG-1565-2.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing 
> Pig scripts.


[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

2010-10-01 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1565:


Status: Open  (was: Patch Available)

> additional piggybank datetime and string UDFs
> -
>
> Key: PIG-1565
> URL: https://issues.apache.org/jira/browse/PIG-1565
> Project: Pig
>  Issue Type: Improvement
>Reporter: Andrew Hitchcock
>Assignee: Andrew Hitchcock
> Fix For: 0.8.0
>
> Attachments: PIG-1565-1.patch, PIG-1565-2.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing 
> Pig scripts.


[jira] Commented: (PIG-1542) log level not propogated to MR task loggers

2010-10-01 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917079#action_12917079
 ] 

Daniel Dai commented on PIG-1542:
-

Yes, -d xxx should be treated as -Ddebug=xxx. System properties already have 
higher priority in the current code. (In my view, we should deprecate -d in 
favor of -Ddebug.)

> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks .
> This was fixed earlier in PIG-882 .


[jira] Updated: (PIG-1531) Pig gobbles up error messages

2010-10-01 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1531:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to both trunk and 0.8. Thanks, Niraj!

> Pig gobbles up error messages
> -
>
> Key: PIG-1531
> URL: https://issues.apache.org/jira/browse/PIG-1531
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: pig-1531_3.patch, pig-1531_4.patch, PIG-1531_5.patch, 
> PIG_1531.patch, PIG_1531_2.patch
>
>
> Consider the following. I have my own Storer implementing StoreFunc and I am 
> throwing FrontEndException (and other Exceptions derived from PigException) 
> in its various methods. I expect those error messages to be shown in error 
> scenarios. Instead Pig gobbles up my error messages and shows its own generic 
> error message like: 
> {code}
> 010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2116: Unexpected error. Could not validate the output specification for: 
> default.partitoned
> Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log
> {code}
> Instead I expect it to display my error messages which it stores away in that 
> log file.
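
One way to surface the storer's own message instead of a generic one is to walk the exception's cause chain. The sketch below is only an illustration under that assumption; the class and method names are hypothetical, it uses plain JDK exceptions rather than PigException, and it is not the approach taken in the attached patches.

```java
class RootMessage {
    // Hypothetical helper: returns the message of the deepest cause that
    // has one, falling back to the outermost message otherwise.
    static String mostSpecificMessage(Throwable t) {
        String message = t.getMessage();
        Throwable current = t.getCause();
        while (current != null) {
            if (current.getMessage() != null) {
                message = current.getMessage();
            }
            current = current.getCause();
        }
        return message;
    }

    public static void main(String[] args) {
        Exception inner = new IllegalStateException("message thrown by the storer");
        Exception outer = new RuntimeException("Unexpected error.", inner);
        System.out.println(mostSpecificMessage(outer));
    }
}
```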


[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917064#action_12917064
 ] 

Alan Gates commented on PIG-1661:
-

I have a few questions about this:

# Can you tell me what's better about using search-hadoop than google?
# What all are you indexing?  How does this differ from what google is indexing?
# How often do you refresh your index?  That is, how long will it take changes 
to show up in your search?


> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.


[jira] Commented: (PIG-1542) log level not propogated to MR task loggers

2010-10-01 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917058#action_12917058
 ] 

Thejas M Nair commented on PIG-1542:


Comment on the patch -
In case of log level settings, it is not possible to override the config file 
setting using command line options. In other cases, the command line values 
usually override what is specified in configuration file. For example in case 
of hadoop properties, this is what happens. This is also very convenient, 
because you can easily change the setting for a particular invocation of pig. 
You don't have to change config file which you might potentially share with 
other users.
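
The precedence described above (a command-line value overriding the config file) can be sketched in Java with java.util.Properties, whose constructor accepts a defaults table. The class and method names are hypothetical; only the "debug" property name comes from this thread, and this is not the code in the patch.

```java
import java.util.Properties;

class LogLevelPrecedence {
    // Hypothetical helper: config-file values act as defaults, and any
    // command-line value with the same key takes priority over them.
    static String resolve(Properties fromConfigFile, Properties fromCommandLine,
                          String key, String fallback) {
        Properties merged = new Properties(fromConfigFile);
        for (String name : fromCommandLine.stringPropertyNames()) {
            merged.setProperty(name, fromCommandLine.getProperty(name));
        }
        return merged.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        Properties file = new Properties();
        file.setProperty("debug", "INFO");
        Properties cli = new Properties();
        cli.setProperty("debug", "DEBUG");
        System.out.println(resolve(file, cli, "debug", "WARN")); // DEBUG
    }
}
```

With this layering, a shared config file never has to be edited to change the level for a single invocation.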


> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks .
> This was fixed earlier in PIG-882 .


[jira] Commented: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema

2010-10-01 Thread Richard Ding (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917050#action_12917050
 ] 

Richard Ding commented on PIG-1656:
---


We need to make it clear how the output schema of TOBAG is generated. For 
example, in the first case, the type is preserved in the inner schema:

{code}
grunt> a = load 'input' as (a0:int, a1:int);
grunt> b = foreach a generate TOBAG(a0, a1);
grunt> describe b;
b: {{int}}
{code}

but not in the second case:

{code}
grunt> a = load 'input' as (a0:int, a1:int);
grunt> c = group a by a0 ;
grunt> b = foreach c generate TOBAG(a.a0, a.a1);
grunt> describe b;
b: {{NULL}}
{code}
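
One possible rule for the inner schema, shown here only as a sketch, is to keep the type when every argument has the same type and fall back to NULL otherwise. This rule is an assumption made for illustration, not the documented behavior of the patch; the class name, method name, and the use of plain strings for type names are all hypothetical. It covers flat type names only, so it does not model the bag-typed arguments in the second case.

```java
import java.util.Arrays;
import java.util.List;

class BagSchemaSketch {
    // Hypothetical rule: the inner type is preserved only when all
    // argument types are identical; otherwise it is reported as NULL.
    static String innerType(List<String> inputTypes) {
        if (inputTypes.isEmpty()) {
            return "NULL";
        }
        String first = inputTypes.get(0);
        for (String type : inputTypes) {
            if (!type.equals(first)) {
                return "NULL";
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(innerType(Arrays.asList("int", "int")));       // int
        System.out.println(innerType(Arrays.asList("int", "chararray"))); // NULL
    }
}
```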

> TOBAG  udfs ignores columns with null value;  it does not use input type to 
> determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1656.1.patch
>
>
> TOBAG udf ignores columns with null value
> {code}
> R4= foreach B generate $0,  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000{(1),(1)}
> 1000{(2),(2)}
> 1000{(3),(3)}
> 1000{(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}


[jira] Updated: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY

2010-10-01 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1659:


Attachment: PIG-1659-1.patch

> sortinfo is not set for store if there is a filter after ORDER BY
> -
>
> Key: PIG-1659
> URL: https://issues.apache.org/jira/browse/PIG-1659
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1659-1.patch
>
>
> This has caused 6 (of 7) failures in the Zebra test 
> TestOrderPreserveVariableTable.


[jira] Created: (PIG-1663) Order by only allows ordering on columns, not expressions

2010-10-01 Thread Alan Gates (JIRA)

Order by only allows ordering on columns, not expressions
-

 Key: PIG-1663
 URL: https://issues.apache.org/jira/browse/PIG-1663
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


Currently the following Pig Latin will fail:

{code}
A = LOAD '/Users/gates/test/data/studenttab10' as (name, age, gpa);
B = order A by (int)age;
dump B;
{code}

with an error message
{code}
ERROR 1000: Error during parsing. Encountered " "int" "int "" at line 2, column 
17.
Was expecting one of:
 ...
 ...
{code}

The failure occurs because Pig expects a column, not an expression, for Order 
By.  If the cast is removed, the script passes.  Order by should take an 
expression for its key, just as group, join, etc. do.


[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema

2010-10-01 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1656:
---

Status: Patch Available  (was: Open)

Patch passes unit tests and test-patch .


> TOBAG  udfs ignores columns with null value;  it does not use input type to 
> determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1656.1.patch
>
>
> TOBAG udf ignores columns with null value
> {code}
> R4= foreach B generate $0,  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000{(1),(1)}
> 1000{(2),(2)}
> 1000{(3),(3)}
> 1000{(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}


[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema

2010-10-01 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1656:
---

Attachment: PIG-1656.1.patch

> TOBAG  udfs ignores columns with null value;  it does not use input type to 
> determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1656.1.patch
>
>
> TOBAG udf ignores columns with null value
> {code}
> R4= foreach B generate $0,  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000{(1),(1)}
> 1000{(2),(2)}
> 1000{(3),(3)}
> 1000{(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}


[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-10-01 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1658:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to both trunk and the 0.8 branch.

> ORDER BY does not work properly on integer/short keys that are -1
> -
>
> Key: PIG-1658
> URL: https://issues.apache.org/jira/browse/PIG-1658
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1658.patch, PIG-1658.patch
>
>
> In fact, all these types of keys of values that are negative but within the 
> byte or short's range would have the problem.
> Basically, a byte value of -1 & 0xff will return 255, not -1.
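
The masking behavior described in the issue can be reproduced in a few lines of Java. This is a standalone illustration with hypothetical class and method names, not code from Pig's comparators.

```java
class ByteMaskDemo {
    // Masking with 0xff widens the byte to an int without sign extension,
    // so negative bytes come back as values in the range 128..255.
    static int maskedToInt(byte b) {
        return b & 0xff;
    }

    // A plain widening conversion keeps the sign.
    static int signedToInt(byte b) {
        return b;
    }

    public static void main(String[] args) {
        byte minusOne = (byte) -1;
        System.out.println(maskedToInt(minusOne)); // 255
        System.out.println(signedToInt(minusOne)); // -1
    }
}
```

A comparison based on the masked value would therefore place -1 after every non-negative key, which matches the misordering the issue title describes.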


[jira] Commented: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-10-01 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917014#action_12917014
 ] 

Thejas M Nair commented on PIG-1658:


Looks good . +1


> ORDER BY does not work properly on integer/short keys that are -1
> -
>
> Key: PIG-1658
> URL: https://issues.apache.org/jira/browse/PIG-1658
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1658.patch, PIG-1658.patch
>
>
> In fact, all these types of keys of values that are negative but within the 
> byte or short's range would have the problem.
> Basically, a byte value of -1 & 0xff will return 255, not -1.


[jira] Commented: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY

2010-10-01 Thread Yan Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917012#action_12917012
 ] 

Yan Zhou commented on PIG-1659:
---

Need to make sure it is invoked after optimization in both old and new logical 
plans.

> sortinfo is not set for store if there is a filter after ORDER BY
> -
>
> Key: PIG-1659
> URL: https://issues.apache.org/jira/browse/PIG-1659
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> This has caused 6 (of 7) failures in the Zebra test 
> TestOrderPreserveVariableTable.


[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-10-01 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1658:
--

Attachment: PIG-1658.patch

Add Zebra test TestMergeJoinPartial to the "pigtest" target.

> ORDER BY does not work properly on integer/short keys that are -1
> -
>
> Key: PIG-1658
> URL: https://issues.apache.org/jira/browse/PIG-1658
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1658.patch, PIG-1658.patch
>
>
> In fact, all these types of keys of values that are negative but within the 
> byte or short's range would have the problem.
> Basically, a byte value of -1 & 0xff will return 255, not -1.


[jira] Commented: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY

2010-10-01 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916998#action_12916998
 ] 

Daniel Dai commented on PIG-1659:
-

We should set sortInfo after optimization. So we should add SetSortInfo after 
the optimization of new logical plan. This code is missing.

> sortinfo is not set for store if there is a filter after ORDER BY
> -
>
> Key: PIG-1659
> URL: https://issues.apache.org/jira/browse/PIG-1659
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> This has caused 6 (of 7) failures in the Zebra test 
> TestOrderPreserveVariableTable.


[jira] Updated: (PIG-1662) Need better error message for MalFormedProbVecException

2010-10-01 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1662:
--

Status: Patch Available  (was: Open)

> Need better error message for MalFormedProbVecException
> ---
>
> Key: PIG-1662
> URL: https://issues.apache.org/jira/browse/PIG-1662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1662.patch
>
>
> Instead of the generic error message:
> Backend error message
> -
> Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.MalFormedProbVecException: ERROR 2122: Sum of probabilities should be one
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.DiscreteProbabilitySampleGenerator.<init>(DiscreteProbabilitySampleGenerator.java:56)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:128)
>   ... 10 more
> it can easily print out the content of the malformed probability vector.


[jira] Updated: (PIG-1662) Need better error message for MalFormedProbVecException

2010-10-01 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1662:
--

Attachment: PIG-1662.patch

> Need better error message for MalFormedProbVecException
> ---
>
> Key: PIG-1662
> URL: https://issues.apache.org/jira/browse/PIG-1662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1662.patch
>
>
> Instead of the generic error message:
> Backend error message
> -
> Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.MalFormedProbVecException: ERROR 2122: Sum of probabilities should be one
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.DiscreteProbabilitySampleGenerator.<init>(DiscreteProbabilitySampleGenerator.java:56)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:128)
>   ... 10 more
> it can easily print out the content of the malformed probability vector.


[jira] Created: (PIG-1662) Need better error message for MalFormedProbVecException

2010-10-01 Thread Richard Ding (JIRA)

Need better error message for MalFormedProbVecException
---

 Key: PIG-1662
 URL: https://issues.apache.org/jira/browse/PIG-1662
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0


Instead of the generic error message:

Backend error message
-

Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.MalFormedProbVecException: ERROR 2122: Sum of probabilities should be one
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.DiscreteProbabilitySampleGenerator.<init>(DiscreteProbabilitySampleGenerator.java:56)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:128)
    ... 10 more

it can easily print out the content of the malformed probability vector.
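
As a rough illustration of the request above, the check could build a message that carries the vector itself. This is only a sketch: the class name ProbVecCheck, the method describeProblem and the tolerance EPSILON are assumptions made for the example and are not taken from Pig's DiscreteProbabilitySampleGenerator.

```java
import java.util.Arrays;

// Hypothetical helper, not Pig's actual code.
class ProbVecCheck {
    // Assumed tolerance for floating-point rounding.
    static final double EPSILON = 1e-5;

    /** Returns null when the vector sums to one, otherwise a descriptive message. */
    static String describeProblem(float[] probVec) {
        double sum = 0.0;
        for (float p : probVec) {
            sum += p;
        }
        if (Math.abs(sum - 1.0) <= EPSILON) {
            return null;
        }
        // Include the offending vector so the user can see what went wrong.
        return "ERROR 2122: Sum of probabilities should be one: "
                + Arrays.toString(probVec);
    }

    public static void main(String[] args) {
        System.out.println(describeProblem(new float[] {0.5f, 0.5f}));
        System.out.println(describeProblem(new float[] {0.5f, 0.25f}));
    }
}
```

A caller would throw the exception with this message whenever describeProblem returns a non-null value.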


[jira] Commented: (PIG-1542) log level not propogated to MR task loggers

2010-10-01 Thread niraj rai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916974#action_12916974
 ] 

niraj rai commented on PIG-1542:


All the tests: end to end, unit and test-patch tests passed. If Thejas does not 
have any feedback, please commit the patch.
Thanks
Niraj

> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks.
> This was fixed earlier in PIG-882.


[jira] Updated: (PIG-1531) Pig gobbles up error messages

2010-10-01 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1531:
---

Attachment: PIG-1531_5.patch

Reviewed the patch and made the required changes after discussion with 
Ashutosh. Ran test-patch and the unit tests, and everything looks fine.
Ashutosh, please commit the patch if you don't have any further comments.
Thanks
Niraj

> Pig gobbles up error messages
> -
>
> Key: PIG-1531
> URL: https://issues.apache.org/jira/browse/PIG-1531
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: pig-1531_3.patch, pig-1531_4.patch, PIG-1531_5.patch, 
> PIG_1531.patch, PIG_1531_2.patch
>
>
> Consider the following. I have my own Storer implementing StoreFunc and I am 
> throwing FrontEndException (and other Exceptions derived from PigException) 
> in its various methods. I expect those error messages to be shown in error 
> scenarios. Instead Pig gobbles up my error messages and shows its own generic 
> error message like: 
> {code}
> 2010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2116: Unexpected error. Could not validate the output specification for: 
> default.partitoned
> Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log
> {code}
> Instead I expect it to display my error messages which it stores away in that 
> log file.


[jira] Updated: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Alex Baranau (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Baranau updated PIG-1661:
--

Status: Patch Available  (was: Open)

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.


[jira] Updated: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Alex Baranau (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Baranau updated PIG-1661:
--

Attachment: PIG-1661.patch

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use search-hadoop.com service to make available search in Pig sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.


[jira] Created: (PIG-1661) Add alternative search-provider to Pig site

2010-10-01 Thread Alex Baranau (JIRA)

Add alternative search-provider to Pig site
---

 Key: PIG-1661
 URL: https://issues.apache.org/jira/browse/PIG-1661
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Alex Baranau
Priority: Minor


Use search-hadoop.com service to make available search in Pig sources, MLs, 
wiki, etc.
This was initially proposed on user mailing list. The search service was 
already added in site's skin (common for all Hadoop related projects) via 
AVRO-626 so this issue is about enabling it for Pig.


[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-09-30 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1658:
--

Attachment: PIG-1658.patch

This problem is caused by the PIG-1295 patch.

test-core passes. Zebra's nightly passes too.

test-patch output:

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no tests are needed for 
this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Zebra's TestMergeJoinPartial is used to verify the fix.

> ORDER BY does not work properly on integer/short keys that are -1
> -
>
> Key: PIG-1658
> URL: https://issues.apache.org/jira/browse/PIG-1658
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1658.patch
>
>
> In fact, keys of these types whose values are negative but within the 
> byte or short range all have the problem.
> Basically, a byte value of -1 & 0xff will return 255, not -1.


[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-09-30 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1658:
--

Status: Patch Available  (was: Open)

> ORDER BY does not work properly on integer/short keys that are -1
> -
>
> Key: PIG-1658
> URL: https://issues.apache.org/jira/browse/PIG-1658
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1658.patch
>
>
> In fact, keys of these types whose values are negative but within the 
> byte or short range all have the problem.
> Basically, a byte value of -1 & 0xff will return 255, not -1.


[jira] Created: (PIG-1660) Consider passing result of COUNT/COUNT_STAR to LIMIT

2010-09-30 Thread Viraj Bhat (JIRA)

Consider passing result of COUNT/COUNT_STAR to LIMIT 
-

 Key: PIG-1660
 URL: https://issues.apache.org/jira/browse/PIG-1660
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Viraj Bhat
 Fix For: 0.9.0


In realistic scenarios we need to split a dataset into segments by using LIMIT, 
and would like to achieve that goal within the same pig script. Here is a case:

{code}
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A ALL;
C = foreach B generate COUNT_STAR(A) as row_cnt;
-- get the low 20% segment
D = order A by pvs;
E = limit D (C.row_cnt * 0.2);
store E into '$Eoutput';
-- get the high 20% segment
F = order A by pvs DESC;
G = limit F (C.row_cnt * 0.2);
store G into '$Goutput';
{code}

Since LIMIT only accepts constants, we have to split the operation into two steps 
in order to pass in the constants for the LIMIT statements. Please consider 
bringing this feature in so the processing can be more efficient.
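
To make the intended result concrete outside of Pig, here is a minimal sketch of "take the lowest fraction of rows, where the row count is computed from the data itself". The class FractionLimit and the method lowestFraction are hypothetical names for this illustration only; nothing here is Pig API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a data-dependent LIMIT; not Pig code.
class FractionLimit {
    /** Returns the first floor(rows.size() * fraction) values in ascending order. */
    static List<Integer> lowestFraction(List<Integer> rows, double fraction) {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        // The limit is derived from the row count instead of being a constant.
        int limit = (int) (sorted.size() * fraction);
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Integer> pvs = Arrays.asList(7, 3, 9, 1, 5, 8, 2, 6, 4, 10);
        System.out.println(lowestFraction(pvs, 0.2));
    }
}
```

The high segment would follow the same pattern with the sort order reversed.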

Viraj


[jira] Updated: (PIG-1638) sh output gets mixed up with the grunt prompt

2010-09-30 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1638:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

> sh output gets mixed up with the grunt prompt
> -
>
> Key: PIG-1638
> URL: https://issues.apache.org/jira/browse/PIG-1638
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.8.0
>Reporter: niraj rai
>Assignee: niraj rai
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1638_0.patch
>
>
> Many times, the grunt prompt gets mixed up with the sh output, e.g.
> grunt> sh ls
> 000
> autocomplete
> bin
> build
> build.xml
> grunt> CHANGES.txt
> conf
> contrib
> In the above case,  grunt> is mixed up with the output.


[jira] Resolved: (PIG-1297) algebraic interface of udf does not get used if the foreach with udf projects column within group

2010-09-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1297.


Resolution: Duplicate

> algebraic interface of udf does not get used if the foreach with udf projects 
> column within group
> -
>
> Key: PIG-1297
> URL: https://issues.apache.org/jira/browse/PIG-1297
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> grunt> l = load 'file' as (a,b,c);
> grunt> g = group l by (a,b);
> grunt> f = foreach g generate SUM(l.c), group.a;
> grunt> explain f;
> ...
> ...
> #--
> # Map Reduce Plan
> #--
> MapReduce node 1-752
> Map Plan
> Local Rearrange[tuple]{tuple}(false) - 1-742
> |   |
> |   Project[bytearray][0] - 1-743
> |   |
> |   Project[bytearray][1] - 1-744
> |
> |---Load(file:///Users/tejas/pig/trunk/file:org.apache.pig.builtin.PigStorage)
>  - 1-739
> Reduce Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-751
> |
> |---New For Each(false,false)[bag] - 1-750
> |   |
> |   POUserFunc(org.apache.pig.builtin.SUM)[double] - 1-747
> |   |
> |   |---Project[bag][2] - 1-746
> |   |
> |   |---Project[bag][1] - 1-745
> |   |
> |   Project[bytearray][0] - 1-749
> |   |
> |   |---Project[tuple][0] - 1-748
> |
> |---Package[tuple]{tuple} - 1-741
> Global sort: false
> 


[jira] Commented: (PIG-1638) sh output gets mixed up with the grunt prompt

2010-09-30 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916725#action_12916725
 ] 

Daniel Dai commented on PIG-1638:
-

+1

> sh output gets mixed up with the grunt prompt
> -
>
> Key: PIG-1638
> URL: https://issues.apache.org/jira/browse/PIG-1638
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.8.0
>Reporter: niraj rai
>Assignee: niraj rai
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1638_0.patch
>
>
> Many times, the grunt prompt gets mixed up with the sh output, e.g.
> grunt> sh ls
> 000
> autocomplete
> bin
> build
> build.xml
> grunt> CHANGES.txt
> conf
> contrib
> In the above case,  grunt> is mixed up with the output.


[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository

2010-09-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1607:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to 0.8 branch and trunk.


> pig should have separate javadoc.jar in the maven repository
> 
>
> Key: PIG-1607
> URL: https://issues.apache.org/jira/browse/PIG-1607
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: niraj rai
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, 
> PIG-1607_3.patch, PIG-1607_4.patch
>
>
> At this moment, javadoc is part of the source.jar but pig should have 
> separate javadoc.jar in the maven repository.


[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository

2010-09-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1607:
---

Fix Version/s: 0.8.0
Affects Version/s: 0.8.0

> pig should have separate javadoc.jar in the maven repository
> 
>
> Key: PIG-1607
> URL: https://issues.apache.org/jira/browse/PIG-1607
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: niraj rai
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, 
> PIG-1607_3.patch, PIG-1607_4.patch
>
>
> At this moment, javadoc is part of the source.jar but pig should have 
> separate javadoc.jar in the maven repository.


[jira] Assigned: (PIG-1655) code duplicated for udfs that were moved from piggybank to builtin

2010-09-30 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai reassigned PIG-1655:
--

Assignee: niraj rai  (was: Thejas M Nair)

> code duplicated for udfs that were moved from piggybank to builtin
> --
>
> Key: PIG-1655
> URL: https://issues.apache.org/jira/browse/PIG-1655
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
>
> As part of PIG-1405, some udfs from piggybank were made standard udfs. But 
> now the code is duplicated in piggybank and org.apache.pig.builtin. . This 
> can cause confusion.
> I am planning to make these udfs in piggybank subclasses of those in 
> org.apache.pig.builtin. so that users don't have to change their scripts.
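
The subclassing plan described above might look roughly like this. It is a sketch under assumptions: BuiltinUpper and PiggybankUpper are hypothetical stand-ins for a builtin UDF and its old piggybank counterpart, and the exec signature is simplified rather than taken from Pig's EvalFunc.

```java
import java.util.Locale;

class UdfSubclassDemo {
    // Hypothetical stand-in for a UDF that now lives in org.apache.pig.builtin.
    static class BuiltinUpper {
        String exec(String input) {
            return input == null ? null : input.toUpperCase(Locale.ROOT);
        }
    }

    // Hypothetical stand-in for the old piggybank class. It keeps its name so
    // existing scripts still resolve it, but inherits all of its behaviour.
    static class PiggybankUpper extends BuiltinUpper {
    }

    public static void main(String[] args) {
        System.out.println(new PiggybankUpper().exec("pig"));
    }
}
```

With this shape there is a single implementation to maintain, and scripts that still reference the piggybank name keep working.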


[jira] Updated: (PIG-1651) PIG class loading mishandled

2010-09-30 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1651:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

> PIG class loading mishandled
> 
>
> Key: PIG-1651
> URL: https://issues.apache.org/jira/browse/PIG-1651
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1651.patch
>
>
> If zebra.jar is only registered in a PIG script but is not in the 
> CLASSPATH, a query using zebra fails because the same class appears to be 
> loaded into the JVM more than once, so a static variable set earlier is not 
> seen after an instance of the class is created through reflection. (After 
> zebra.jar is added to the CLASSPATH, it works fine.) The exception stack is 
> as follows:
> Backend error message during job submission
> ---
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to 
> create input splits for: hdfs://hostname/pathto/zebra_dir :: null
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284)
> at 
> org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123)
> at 
> org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718)
> at 
> org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084)
> at 
> org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919)
> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
> ... 7 more


[jira] Created: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY

2010-09-30 Thread Yan Zhou (JIRA)

sortinfo is not set for store if there is a filter after ORDER BY
-

 Key: PIG-1659
 URL: https://issues.apache.org/jira/browse/PIG-1659
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Daniel Dai
 Fix For: 0.8.0


This has caused 6 (of 7) failures in the Zebra test 
TestOrderPreserveVariableTable.


[jira] Assigned: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-09-30 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1658:
-

Assignee: Yan Zhou

> ORDER BY does not work properly on integer/short keys that are -1
> -
>
> Key: PIG-1658
> URL: https://issues.apache.org/jira/browse/PIG-1658
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
>
> In fact, keys of these types whose values are negative but within the 
> byte or short range all have the problem.
> Basically, a byte value of -1 & 0xff will return 255, not -1.


[jira] Commented: (PIG-1607) pig should have separate javadoc.jar in the maven repository

2010-09-30 Thread Giridharan Kesavan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916594#action_12916594
 ] 

Giridharan Kesavan commented on PIG-1607:
-

looks good +1 

able to do mvn-install and mvn-deploy to install/deploy javadoc jar to the fs 
and apache mvn repo.



> pig should have separate javadoc.jar in the maven repository
> 
>
> Key: PIG-1607
> URL: https://issues.apache.org/jira/browse/PIG-1607
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, 
> PIG-1607_3.patch, PIG-1607_4.patch
>
>
> At this moment, javadoc is part of the source.jar but pig should have 
> separate javadoc.jar in the maven repository.


[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-09-30 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1658:
--

Fix Version/s: 0.8.0
Affects Version/s: 0.8.0

> ORDER BY does not work properly on integer/short keys that are -1
> -
>
> Key: PIG-1658
> URL: https://issues.apache.org/jira/browse/PIG-1658
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
>
> In fact, keys of these types whose values are negative but within the 
> byte or short range all have the problem.
> Basically, a byte value of -1 & 0xff will return 255, not -1.


[jira] Created: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1

2010-09-30 Thread Yan Zhou (JIRA)

ORDER BY does not work properly on integer/short keys that are -1
-

 Key: PIG-1658
 URL: https://issues.apache.org/jira/browse/PIG-1658
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou


In fact, keys of these types whose values are negative but within the 
byte or short range all have the problem.

Basically, a byte value of -1 & 0xff will return 255, not -1.
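
The arithmetic can be checked with a few lines of plain Java. This sketch only demonstrates the masking behaviour named above; the class SignedByteMask and its methods are made up for the example, and it does not claim to reproduce the code path in Pig where the mask is applied.

```java
class SignedByteMask {
    // Masking widens the byte to int and clears the sign-extended high bits.
    static int masked(byte b) {
        return b & 0xff;
    }

    // Plain widening keeps the sign.
    static int widened(byte b) {
        return b;
    }

    public static void main(String[] args) {
        byte minusOne = -1;
        byte zero = 0;
        System.out.println(masked(minusOne));   // 255
        System.out.println(widened(minusOne));  // -1
        // With the mask, -1 compares as greater than 0, which breaks ordering.
        System.out.println(Integer.compare(masked(minusOne), masked(zero)) > 0);   // true
        System.out.println(Integer.compare(widened(minusOne), widened(zero)) < 0); // true
    }
}
```

Any comparison built on the masked value treats small negative keys as large positive ones, which is consistent with the wrong ORDER BY results reported here.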


[jira] Created: (PIG-1657) reduce the ivy verbosity during build.

2010-09-30 Thread Giridharan Kesavan (JIRA)

reduce the ivy verbosity during build.
--

 Key: PIG-1657
 URL: https://issues.apache.org/jira/browse/PIG-1657
 Project: Pig
  Issue Type: Improvement
Reporter: Giridharan Kesavan


ivy is very verbose during the build; making it less verbose would let us see 
what the build actually does.


[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1650:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Niraj confirmed that unit tests and test-patch have succeeded.
Patch looks good. +1 .
Committed to trunk and 0.8 branch.


> pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
> -
>
> Key: PIG-1650
> URL: https://issues.apache.org/jira/browse/PIG-1650
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1650_0.patch, PIG-1650_1.patch, PIG-1650_2.patch
>
>
> grunt shell breaks for many unix commands


[jira] Commented: (PIG-1629) Need ability to limit bags produced during GROUP + LIMIT

2010-09-30 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916538#action_12916538
 ] 

Thejas M Nair commented on PIG-1629:


Similar optimization can be done for inner filter as well - 
C = foreach B{ D = filter A by x > 0; generate group, MyUDF(D);}

Changes required-
- group physical/MR plan implementation to have an inner limit/filter.
- logical optimizer rules to make the load/filter an inner plan of group


> Need ability to limit bags produced during GROUP + LIMIT
> 
>
> Key: PIG-1629
> URL: https://issues.apache.org/jira/browse/PIG-1629
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> Currently, the code below will construct the full group in memory and then 
> trim it. This requires more memory than needed.
> A = load 'data' as (x, y, z);
> B = group A by x;
> C = foreach B{
> D = limit A 100;
> generate group, MyUDF(D);}


[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-30 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

unit tests passed. PIG-1649.5.patch committed to trunk and 0.8 branch.


> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, 
> PIG-1649.4.patch, PIG-1649.5.patch
>
>
> In FRJoin, if input path has curly braces, it fails to compute number of 
> input files and logs the following exception in the log -
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of 
> input files
> java.net.URISyntaxException: Illegal character in path at index 12: 
> /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause a query to fail. But since the number of input files 
> doesn't get calculated, the optimizations added in PIG-1458 to reduce load on 
> name node will not get used.
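
The exception in the log above can be reproduced with java.net.URI alone, since a curly brace is not a legal character in a URI path. The class GlobPathUri and the method firstIllegalIndex are hypothetical names used only for this sketch; it does not show how MRCompiler builds its paths or how the patch fixes the problem.

```java
import java.net.URI;
import java.net.URISyntaxException;

class GlobPathUri {
    /** Returns the index of the first illegal character, or -1 if the path parses. */
    static int firstIllegalIndex(String path) {
        try {
            new URI(path);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // The glob from the report: the '{' sits at index 12.
        System.out.println(firstIllegalIndex("/user/tejas/{std*txt}"));
        // A path without glob braces parses without error.
        System.out.println(firstIllegalIndex("/user/tejas/std.txt"));
    }
}
```

Glob patterns therefore have to be expanded or handled as plain strings before anything is passed to the URI constructor.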


[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository

2010-09-29 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1607:
---

Status: Patch Available  (was: Open)

> pig should have separate javadoc.jar in the maven repository
> 
>
> Key: PIG-1607
> URL: https://issues.apache.org/jira/browse/PIG-1607
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, 
> PIG-1607_3.patch, PIG-1607_4.patch
>
>
> At this moment, javadoc is part of the source.jar but pig should have 
> separate javadoc.jar in the maven repository.


[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-29 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1650:
---

Attachment: PIG-1650_2.patch

> pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
> -
>
> Key: PIG-1650
> URL: https://issues.apache.org/jira/browse/PIG-1650
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1650_0.patch, PIG-1650_1.patch, PIG-1650_2.patch
>
>
> grunt shell breaks for many unix commands


[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-29 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Attachment: PIG-1649.5.patch


I committed PIG-1649.4.patch to 0.8 and trunk after unit tests completed, but 
only after that did I realize that I had run unit tests against an older patch 
(PIG-1649.2.patch). While running unit tests again I found 2 failures in 
TestJobSubmission. Patch PIG-1649.5.patch has the fix. I am waiting for unit 
tests to complete.

> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, 
> PIG-1649.4.patch, PIG-1649.5.patch
>
>
> In FRJoin, if the input path has curly braces, Pig fails to compute the number of
> input files and logs the following exception:
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files
> java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause the query to fail. But since the number of input files
> doesn't get calculated, the optimizations added in PIG-1458 to reduce load on
> the name node will not be used.
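
For anyone following along, the failure is easy to reproduce outside Pig. A minimal sketch, assuming nothing beyond the JDK's java.net.URI (the names CurlyBraceUriDemo and firstIllegalIndex are made up for illustration and are not part of any patch): constructing a URI from the glob in the log is rejected at the opening brace, which sits at index 12, the same index the warning reports.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CurlyBraceUriDemo {
    /**
     * Returns the index of the first character java.net.URI rejects,
     * or -1 if the string parses as a URI.
     */
    static int firstIllegalIndex(String path) {
        try {
            new URI(path);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // The glob from the issue: '{' is not a legal URI path character.
        System.out.println(firstIllegalIndex("/user/tejas/{std*txt}"));
        // A plain path with no glob characters parses without error.
        System.out.println(firstIllegalIndex("/user/tejas/std.txt"));
    }
}
```

This only shows why a Hadoop glob cannot be handed to java.net.URI directly; how the attached patches avoid the call is not shown here.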


[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema

2010-09-29 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1656:
---

Summary: TOBAG  udfs ignores columns with null value;  it does not use 
input type to determine output schema  (was: TOBAG & TOTUPLE udfs ignores 
columns with null value;  TOBAG does not use input type to determine output 
schema)
Description: 
TOBAG udf ignores columns with null value
{code}
R4= foreach B generate $0,  TOBAG( id, null, id,null );
grunt> dump R4;
1000{(1),(1)}
1000{(2),(2)}
1000{(3),(3)}
1000{(4),(4)}
{code}


 TOBAG does not use input type to determine output schema
{code}
grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
grunt> describe B1;
B1: {{null}}
{code}


  was:
TOBAG & TOTUPLE udfs ignores columns with null value
{code}
R4= foreach B generate $0, TOTUPLE(null, id, null),  TOBAG( id, null, id,null );
grunt> dump R4;
1000(,1,)   {(1),(1)}
1000(,2,)   {(2),(2)}
1000(,3,)   {(3),(3)}
1000(,4,)   {(4),(4)}
{code}


 TOBAG does not use input type to determine output schema
{code}
grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
grunt> describe B1;
B1: {{null}}
{code}



> TOBAG  udfs ignores columns with null value;  it does not use input type to 
> determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> TOBAG udf ignores columns with null value
> {code}
> R4= foreach B generate $0,  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000{(1),(1)}
> 1000{(2),(2)}
> 1000{(3),(3)}
> 1000{(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}
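
To make the two expectations concrete, here is a rough sketch in plain Java. It does not use Pig's Tuple or DataBag types, and the names ToBagSketch, toBag and commonType are hypothetical. It assumes the desired behavior is that null arguments become tuples with a null field rather than being dropped, and that a bag built from arguments of one type can report that type. The rule the actual fix uses for the schema may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ToBagSketch {
    /** Wraps every argument in a one-field tuple; null arguments are kept, not skipped. */
    static List<List<Object>> toBag(Object... args) {
        List<List<Object>> bag = new ArrayList<>();
        for (Object arg : args) {
            bag.add(Collections.singletonList(arg));
        }
        return bag;
    }

    /** Returns the shared runtime type of the non-null arguments, or null if they differ or none exist. */
    static Class<?> commonType(Object... args) {
        Class<?> type = null;
        for (Object arg : args) {
            if (arg == null) {
                continue;
            }
            if (type == null) {
                type = arg.getClass();
            } else if (type != arg.getClass()) {
                return null;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        // Four arguments in, four tuples out, two of them holding null.
        System.out.println(toBag(1, null, 1, null).size());
        // All-integer input yields a usable element type instead of an unknown one.
        System.out.println(commonType(1, 2, 3));
    }
}
```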


[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-29 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1650:
---

Status: Patch Available  (was: Open)

> pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
> -
>
> Key: PIG-1650
> URL: https://issues.apache.org/jira/browse/PIG-1650
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1650_0.patch, PIG-1650_1.patch
>
>
> grunt shell breaks for many unix commands


[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-29 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1650:
---

Attachment: PIG-1650_1.patch

> pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
> -
>
> Key: PIG-1650
> URL: https://issues.apache.org/jira/browse/PIG-1650
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1650_0.patch, PIG-1650_1.patch
>
>
> grunt shell breaks for many unix commands


[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-29 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1650:
---

Status: Open  (was: Patch Available)

> pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
> -
>
> Key: PIG-1650
> URL: https://issues.apache.org/jira/browse/PIG-1650
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1650_0.patch
>
>
> grunt shell breaks for many unix commands


[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-29 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

  Status: Patch Available  (was: Open)
Hadoop Flags: [Reviewed]

> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, 
> PIG-1649.4.patch
>
>
> In FRJoin, if the input path has curly braces, Pig fails to compute the number of
> input files and logs the following exception:
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files
> java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause the query to fail. But since the number of input files
> doesn't get calculated, the optimizations added in PIG-1458 to reduce load on
> the name node will not be used.


[jira] Resolved: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1637.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

All tests pass except for TestSortedTableUnion / TestSortedTableUnionMergeJoin 
for zebra, which were already failing and will be addressed by 
[PIG-1649|https://issues.apache.org/jira/browse/PIG-1649].

Patch committed to both trunk and 0.8 branch.

> Combiner not use because optimizor inserts a foreach between group and 
> algebric function
> 
>
> Key: PIG-1637
> URL: https://issues.apache.org/jira/browse/PIG-1637
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1637-1.patch, PIG-1637-2.patch
>
>
> The following script does not use the combiner after the new optimization change.
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> This is because, after the group, the optimizer detects that the group key is
> not used afterward and adds a foreach statement after C. This is how the
> script looks after optimization:
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> C1 = foreach C generate B;
> D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> That cancels the combiner optimization for D. 
> The way to solve the issue is to merge the inserted C1 with D. Currently, 
> we do not merge these two foreach statements. The reason is that one output 
> of the first foreach (B) is referenced twice in D, and the current rule 
> assumes that after the merge we would need to calculate B twice in D. 
> Actually, C1 only does a projection, with no calculation of B, so merging C1 
> and D will not result in calculating B twice. C1 and D should therefore be merged.


[jira] Commented: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-28 Thread Richard Ding (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915985#action_12915985
 ] 

Richard Ding commented on PIG-1649:
---

+1. Looks good.

> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, 
> PIG-1649.4.patch
>
>
> In FRJoin, if the input path has curly braces, Pig fails to compute the number of
> input files and logs the following exception:
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files
> java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause the query to fail. But since the number of input files
> doesn't get calculated, the optimizations added in PIG-1458 to reduce load on
> the name node will not be used.


[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Attachment: PIG-1649.4.patch

New patch addressing comments from Richard
- In UriUtil.isHDFSFile(String uri) return false if uri is null
- Modified a test in TestFRJoin2 to use comma separated file name.
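
The null guard in the first bullet could look roughly like the following. This is only a sketch: the class name UriUtilSketch is a stand-in, and the prefix rule used to decide what counts as an HDFS file is an assumption for illustration, not taken from the patch. The only part grounded in the comment above is that a null uri returns false instead of throwing.

```java
class UriUtilSketch {
    /**
     * Hypothetical stand-in for UriUtil.isHDFSFile(String uri).
     * The null check mirrors the review comment; the prefix test is assumed.
     */
    static boolean isHDFSFile(String uri) {
        if (uri == null) {
            // Reviewer-requested behavior: null is simply "not an HDFS file".
            return false;
        }
        // Assumed rule: absolute paths and hdfs: URIs are treated as HDFS files.
        return uri.startsWith("/") || uri.startsWith("hdfs:");
    }

    public static void main(String[] args) {
        System.out.println(isHDFSFile(null));
        System.out.println(isHDFSFile("/user/tejas/a.txt,/user/tejas/b.txt"));
    }
}
```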

> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, 
> PIG-1649.4.patch
>
>
> In FRJoin, if the input path has curly braces, Pig fails to compute the number of
> input files and logs the following exception:
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files
> java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause the query to fail. But since the number of input files
> doesn't get calculated, the optimizations added in PIG-1458 to reduce load on
> the name node will not be used.


[jira] Updated: (PIG-1542) log level not propogated to MR task loggers

2010-09-28 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1542:
---

Status: Open  (was: Patch Available)

> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks.
> This was fixed earlier in PIG-882.


[jira] Updated: (PIG-1542) log level not propogated to MR task loggers

2010-09-28 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1542:
---

Status: Patch Available  (was: Open)

> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks.
> This was fixed earlier in PIG-882.


[jira] Updated: (PIG-1542) log level not propogated to MR task loggers

2010-09-28 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1542:
---

Attachment: PIG-1542_2.patch

> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks.
> This was fixed earlier in PIG-882.


[jira] Created: (PIG-1656) TOBAG & TOTUPLE udfs ignores columns with null value; TOBAG does not use input type to determine output schema

2010-09-28 Thread Thejas M Nair (JIRA)

TOBAG & TOTUPLE udfs ignores columns with null value;  TOBAG does not use input 
type to determine output schema
---

 Key: PIG-1656
 URL: https://issues.apache.org/jira/browse/PIG-1656
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair


TOBAG & TOTUPLE udfs ignores columns with null value
{code}
R4= foreach B generate $0, TOTUPLE(null, id, null),  TOBAG( id, null, id,null );
grunt> dump R4;
1000(,1,)   {(1),(1)}
1000(,2,)   {(2),(2)}
1000(,3,)   {(3),(3)}
1000(,4,)   {(4),(4)}
{code}


 TOBAG does not use input type to determine output schema
{code}
grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
grunt> describe B1;
B1: {{null}}
{code}



[jira] Updated: (PIG-1656) TOBAG & TOTUPLE udfs ignores columns with null value; TOBAG does not use input type to determine output schema

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1656:
---

Fix Version/s: 0.8.0
Affects Version/s: 0.8.0

> TOBAG & TOTUPLE udfs ignores columns with null value;  TOBAG does not use 
> input type to determine output schema
> ---
>
> Key: PIG-1656
> URL: https://issues.apache.org/jira/browse/PIG-1656
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> TOBAG & TOTUPLE udfs ignores columns with null value
> {code}
> R4= foreach B generate $0, TOTUPLE(null, id, null),  TOBAG( id, null, id,null );
> grunt> dump R4;
> 1000(,1,)   {(1),(1)}
> 1000(,2,)   {(2),(2)}
> 1000(,3,)   {(3),(3)}
> 1000(,4,)   {(4),(4)}
> {code}
>  TOBAG does not use input type to determine output schema
> {code}
> grunt> B1 = foreach B generate TOBAG( 1, 2, 3); 
> grunt> describe B1;
> B1: {{null}}
> {code}


[jira] Created: (PIG-1655) code duplicated for udfs that were moved from piggybank to builtin

2010-09-28 Thread Thejas M Nair (JIRA)

code duplicated for udfs that were moved from piggybank to builtin
--

 Key: PIG-1655
 URL: https://issues.apache.org/jira/browse/PIG-1655
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


As part of PIG-1405, some udfs from piggybank were made standard udfs. But now 
the code is duplicated in piggybank and org.apache.pig.builtin. This can 
cause confusion.
I am planning to make these udfs in piggybank subclasses of those in 
org.apache.pig.builtin so that users don't have to change their scripts.
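
The subclassing plan can be pictured with a small self-contained sketch. The class names below are hypothetical stand-ins (nested classes play the role of the two packages, and exec takes a String rather than a Pig Tuple); the point is only that the old class keeps its name and inherits all behavior, so the code lives in one place.

```java
import java.util.Locale;

class DelegatingUdfSketch {
    /** Stand-in for a udf that now lives in org.apache.pig.builtin (hypothetical). */
    static class BuiltinUpper {
        String exec(String input) {
            return input == null ? null : input.toUpperCase(Locale.ROOT);
        }
    }

    /**
     * Stand-in for the old piggybank class. It adds no logic of its own,
     * so scripts that reference the old name keep working unchanged.
     */
    @Deprecated
    static class PiggybankUpper extends BuiltinUpper {
    }

    public static void main(String[] args) {
        // The old name still resolves and behaves exactly like the builtin.
        System.out.println(new PiggybankUpper().exec("pig"));
    }
}
```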



[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Attachment: PIG-1649.3.patch

New patch that includes apache license header for UriUtil.java. Passes 
test-patch, waiting for unit tests to finish.


> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch
>
>
> In FRJoin, if the input path has curly braces, Pig fails to compute the number of
> input files and logs the following exception:
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files
> java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause the query to fail. But since the number of input files
> doesn't get calculated, the optimizations added in PIG-1458 to reduce load on
> the name node will not be used.


[jira] Updated: (PIG-1654) Pig should check schema alias duplication at any levels.

2010-09-28 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1654:
-

Description: 
The following script appears valid to Pig but it shouldn't:

A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray);
dump A;

Pig tries to launch map/reduce jobs for this.

However, for the following script, Pig correctly reports error message:

A = load 'file' as (a:int, a:long, c:bytearray);
dump A;

Error message is:
2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1108: Duplicate schema alias: b in "A"

Thus, Pig only checks alias duplication at the top level, which is confirmed by 
looking at the code. The right behavior is that the same check should be 
applied at all levels. 

This should be addressed in the new parser.



  was:
The following script appears valid to Pig but it shouldn't:

A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray);
dump A;

Pig tries to launch map/reduce jobs for this.

However, for the following script, Pig correctly reports error message:

A = load 'file' as (a:int, b:long, c:bytearray);
dump A;

Error message is:
2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1108: Duplicate schema alias: b in "A"

Thus, Pig only checks alias duplication at the top level, which is confirmed by 
looking at the code. The right behavior is that the same check should be 
applied at all levels. 

This should be addressed in the new parser.




> Pig should check schema alias duplication at any levels.
> 
>
>     Key: PIG-1654
> URL: https://issues.apache.org/jira/browse/PIG-1654
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The following script appears valid to Pig but it shouldn't:
> A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray);
> dump A;
> Pig tries to launch map/reduce jobs for this.
> However, for the following script, Pig correctly reports error message:
> A = load 'file' as (a:int, a:long, c:bytearray);
> dump A;
> Error message is:
> 2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1108: Duplicate schema alias: b in "A"
> Thus, Pig only checks alias duplication at the top level, which is confirmed 
> by looking at the code. The right behavior is that the same check should be 
> applied at all levels. 
> This should be addressed in the new parser.


[jira] Created: (PIG-1654) Pig should check schema alias duplication at any levels.

2010-09-28 Thread Xuefu Zhang (JIRA)

Pig should check schema alias duplication at any levels.


 Key: PIG-1654
 URL: https://issues.apache.org/jira/browse/PIG-1654
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.9.0


The following script appears valid to Pig but it shouldn't:

A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray);
dump A;

Pig tries to launch map/reduce jobs for this.

However, for the following script, Pig correctly reports an error message:

A = load 'file' as (a:int, b:long, c:bytearray);
dump A;

Error message is:
2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1108: Duplicate schema alias: b in "A"

Thus, Pig only checks alias duplication at the top level, which is confirmed by 
looking at the code. The right behavior is that the same check should be 
applied at all levels. 

This should be addressed in the new parser.
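
A minimal sketch of what "the same check at all levels" could look like, assuming a simplified field model. The FieldSketch and AliasCheckSketch types and the findDuplicateAlias method are hypothetical names for illustration and are not Pig's actual Schema API. Each nesting level is treated as its own scope, so an alias is only compared against its siblings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a schema field; not Pig's real Schema classes.
final class FieldSketch {
    final String alias;
    final List<FieldSketch> children; // empty for simple types such as int

    FieldSketch(String alias, FieldSketch... children) {
        this.alias = alias;
        this.children = Arrays.asList(children);
    }
}

final class AliasCheckSketch {
    // Returns the first alias that occurs twice among siblings at any
    // nesting level, or null when no level contains a duplicate.
    static String findDuplicateAlias(List<FieldSketch> fields) {
        Set<String> seen = new HashSet<String>();
        for (FieldSketch field : fields) {
            if (field.alias != null && !seen.add(field.alias)) {
                return field.alias; // duplicate at this level
            }
        }
        for (FieldSketch field : fields) {
            String nested = findDuplicateAlias(field.children); // recurse
            if (nested != null) {
                return nested;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Models (a:tuple(u:int, u:bytearray, w:long), b:int, c:chararray)
        List<FieldSketch> schema = Arrays.asList(
            new FieldSketch("a",
                new FieldSketch("u"), new FieldSketch("u"), new FieldSketch("w")),
            new FieldSketch("b"),
            new FieldSketch("c"));
        System.out.println(findDuplicateAlias(schema));
    }
}
```

For the nested schema from the first script, this sketch reports the alias u, which a top-level-only check would miss.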




[jira] Commented: (PIG-1651) PIG class loading mishandled

2010-09-28 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915959#action_12915959
 ] 

Daniel Dai commented on PIG-1651:
-

+1

> PIG class loading mishandled
> 
>
> Key: PIG-1651
> URL: https://issues.apache.org/jira/browse/PIG-1651
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1651.patch
>
>
> If just having zebra.jar as being registered in a PIG script but not in the 
> CLASSPATH, the query using zebra fails since there appear to be multiple 
> classes loaded into JVM, causing static variable set previously not seen 
> after one instance of the class is created through reflection. (After the 
> zebra.jar is specified in CLASSPATH, it works fine.) The exception stack is 
> as follows:
> Backend error message during job submission
> ---
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to 
> create input splits for: hdfs://hostname/pathto/zebra_dir :: null
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284)
> at 
> org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123)
> at 
> org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718)
> at 
> org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084)
> at 
> org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919)
> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
> ... 7 more


[jira] Updated: (PIG-1651) PIG class loading mishandled

2010-09-28 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1651:
--

Attachment: PIG-1651.patch



[jira] Updated: (PIG-1651) PIG class loading mishandled

2010-09-28 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1651:
--

Status: Patch Available  (was: Open)



[jira] Resolved: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1652.


Resolution: Duplicate

Marking as duplicate of PIG-1649 because the code path to consolidate input 
files in FRJoin also has the same issue. 


> TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
> estimateNumberOfReducers bug
> 
>
> Key: PIG-1652
> URL: https://issues.apache.org/jira/browse/PIG-1652
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
> the input size estimation. Here is the stack of TestSortedTableUnionMergeJoin:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias records3
> at org.apache.pig.PigServer.storeEx(PigServer.java:877)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at 
> org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: 
> Unexpected error during execution.
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Illegal character in scheme name at index 69: 
> org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
> at org.apache.hadoop.fs.Path.initialize(Path.java:140)
> at org.apache.hadoop.fs.Path.<init>(Path.java:126)
> at org.apache.hadoop.fs.Path.<init>(Path.java:50)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at 
> org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902)
> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69)
> at 
> org.apache.pig.backend.hadoop.execu

[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Status: Open  (was: Patch Available)

The patch also includes changes to fix the issue in PIG-1652, since the FRJoin 
code path faces a similar issue.



> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch, PIG-1649.2.patch
>
>
> In FRJoin, if input path has curly braces, it fails to compute number of 
> input files and logs the following exception in the log -
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of 
> input files
> java.net.URISyntaxException: Illegal character in path at index 12: 
> /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause a query to fail. But since the number of input files 
> don't get calculated, the optimizations added in PIG-1458 to reduce load on 
> name node will not get used.


[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Attachment: PIG-1649.2.patch

PIG-1649.2.patch
Addressing review comments from Richard:
- Richard pointed out that the HDFS Path class constructor can fail on a valid 
URI, such as the format used for JDBC. So this patch checks whether the input 
location URI has an hdfs scheme before using the HDFS Path constructor.
- The code here can run into the same problem as the one in PIG-1652. The patch 
also includes changes to handle comma-separated file names.

A better long-term solution would be to have support in LoadFunc or related 
interfaces to check the input size and to check whether parts of the file 
should be consolidated.
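
The two points above can be illustrated with a rough sketch. This is not the code from PIG-1649.2.patch; the InputLocationSketch class and its methods are hypothetical. It shows that java.net.URI rejects a glob containing curly braces, and one possible way to split a comma-separated location string and keep only entries with an hdfs scheme. The comma split is deliberately naive and is an assumption of this sketch: a glob such as {a,b} has commas inside braces and would need a brace-aware splitter.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class InputLocationSketch {
    // True when java.net.URI accepts the string; globs with '{' do not parse.
    static boolean parsesAsUri(String location) {
        try {
            new URI(location);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Prefix test instead of URI parsing, so glob characters cannot throw.
    static boolean hasHdfsScheme(String location) {
        return location.trim().toLowerCase(Locale.ROOT).startsWith("hdfs:");
    }

    // Naive split on ','; assumes no commas inside glob braces.
    static List<String> hdfsLocations(String commaSeparated) {
        List<String> result = new ArrayList<String>();
        for (String part : commaSeparated.split(",")) {
            if (hasHdfsScheme(part)) {
                result.add(part.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parsesAsUri("/user/tejas/{std*txt}"));
        System.out.println(hdfsLocations("hdfs://nn/a, jdbc:mysql://db/t ,hdfs://nn/b"));
    }
}
```

Only the locations that pass hasHdfsScheme would then be handed to file-system-specific code; everything else is skipped rather than parsed.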





[jira] Commented: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915950#action_12915950
 ] 

Daniel Dai commented on PIG-1637:
-

Yes, it could be improved as per Xuefu's suggestion. In any case, the current 
patch solves the "combiner not used" issue, so I will commit this part first 
and open another Jira for the improvement. Also, MergeForEach is a good example 
for exercising the cloning framework 
[PIG-1587|https://issues.apache.org/jira/browse/PIG-1587], so it is better to 
improve it once PIG-1587 is available.

> Combiner not use because optimizor inserts a foreach between group and 
> algebric function
> 
>
> Key: PIG-1637
>     URL: https://issues.apache.org/jira/browse/PIG-1637
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1637-1.patch, PIG-1637-2.patch
>
>
> The following script does not use combiner after new optimization change.
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> This is because after group, optimizer detect group key is not used 
> afterward, it add a foreach statement after C. This is how it looks like 
> after optimization:
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> C1 = foreach C generate B;
> D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> That cancel the combiner optimization for D. 
> The way to solve the issue is to merge the C1 we inserted and D. Currently, 
> we do not merge these two foreach. The reason is that one output of the first 
> foreach (B) is referred twice in D, and currently rule assume after merge, we 
> need to calculate B twice in D. Actually, C1 is only doing projection, no 
> calculation of B. Merging C1 and D will not result calculating B twice. So C1 
> and D should be merged.


[jira] Commented: (PIG-1651) PIG class loading mishandled

2010-09-28 Thread Richard Ding (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915945#action_12915945
 ] 

Richard Ding commented on PIG-1651:
---

The problem here is that PigContext uses LogicalPlanBuilder.classloader to 
instantiate the LoadFuncs, but the Thread's context ClassLoader is a different 
class loader. Hence a static variable set on the class loaded by one loader is 
not visible to the same class loaded by the other loader. The solution is to 
use LogicalPlanBuilder.classloader as the context ClassLoader for the Thread 
as well.
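
As an illustration only (this is not the attached patch), aligning the context ClassLoader with a chosen loader can be sketched as below. ContextLoaderSketch and withContextLoader are hypothetical names, and restoring the previous loader afterwards is an assumption of this sketch, not something stated in the comment above.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

final class ContextLoaderSketch {
    // Runs body with the given loader installed as the current thread's
    // context ClassLoader, then puts the previous loader back.
    static <T> T withContextLoader(ClassLoader loader, Supplier<T> body) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return body.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the loader that knows about the registered jars.
        ClassLoader registered = new URLClassLoader(
            new URL[0], ContextLoaderSketch.class.getClassLoader());
        boolean same = withContextLoader(registered,
            () -> Thread.currentThread().getContextClassLoader() == registered);
        System.out.println(same);
    }
}
```

Code that resolves classes through the context ClassLoader inside the body then sees the same Class objects, and therefore the same static fields, as code that used the registered loader directly.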

> PIG class loading mishandled
> 
>
> Key: PIG-1651
> URL: https://issues.apache.org/jira/browse/PIG-1651
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>
> If just having zebra.jar as being registered in a PIG script but not in the 
> CLASSPATH, the query using zebra fails since there appear to be multiple 
> classes loaded into JVM, causing static variable set previously not seen 
> after one instance of the class is created through reflection. (After the 
> zebra.jar is specified in CLASSPATH, it works fine.) The exception stack is 
> as follows:
> Backend error message during job submission
> ---
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to 
> create input splits for: hdfs://hostname/pathto/zebra_dir :: null
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284)
> at 
> org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123)
> at 
> org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718)
> at 
> org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084)
> at 
> org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919)
> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017)
> at 
> org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
> ... 7 more


[jira] Commented: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915943#action_12915943
 ] 

Xuefu Zhang commented on PIG-1637:
--

+1

Patch looks good, except that we don't have to require that all output 
expressions in the first foreach contain only simple projections. As long as 
every output expression of the first foreach that is referenced multiple times 
in the second foreach is a simple projection, the merge can proceed. With that 
change, the following two loops might be better merged into one.

@@ -93,14 +93,17 @@
         // Otherwise, we may do expression calculation more than once, defeat the benefit of this
         // optimization
         Set<Integer> inputs = new HashSet<Integer>();
+        boolean duplicateInputs = false;
         for (Operator op : foreach2.getInnerPlan().getSources()) {
             // If the source is not LOInnerLoad, then it must be LOGenerate. This happens when 
             // the 1st ForEach does not rely on any input of 2nd ForEach
             if (op instanceof LOInnerLoad) {
                 LOInnerLoad innerLoad = (LOInnerLoad)op;
                 int input = innerLoad.getProjection().getColNum();
-                if (inputs.contains(input))
-                    return false;
+                if (inputs.contains(input)) {
+                    duplicateInputs = true;
+                    break;
+                }
                 else
                     inputs.add(input);
 
@@ -109,6 +112,27 @@
             }
         }
 
+        // Duplicate inputs in the case first foreach only containing LOInnerLoad and
+        // LOGenerate is allowed, and output plan is simple projection
+        if (duplicateInputs) {
+            Iterator<Operator> it1 = foreach1.getInnerPlan().getOperators();
+            while( it1.hasNext() ) {
+                Operator op = it1.next();
+                if(!(op instanceof LOGenerate) && !(op instanceof LOInnerLoad))
+                    return false;
+                if (op instanceof LOGenerate) {
+                    List<LogicalExpressionPlan> outputPlans = ((LOGenerate)op).getOutputPlans();
+                    for (LogicalExpressionPlan outputPlan : outputPlans) {
+                        Iterator<Operator> iter = outputPlan.getOperators();
+                        while (iter.hasNext()) {
+                            if (!(iter.next() instanceof ProjectExpression))
+                                return false;
+                        }
+                    }
+                }
+            }
+        }
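
The relaxed condition suggested in this comment can be sketched in isolation as follows. This is a simplified model and not Pig's optimizer code: MergeForEachSketch and canMerge are hypothetical, the first foreach is reduced to one flag per output column saying whether that output is a simple projection, and the second foreach is reduced to the list of first-foreach columns it reads. It also folds the duplicate detection and the projection check into one loop.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MergeForEachSketch {
    // The merge is allowed unless some column that the second foreach reads
    // more than once is produced by a non-trivial expression in the first.
    static boolean canMerge(boolean[] firstOutputIsSimpleProjection,
                            List<Integer> columnsReadBySecond) {
        Set<Integer> seen = new HashSet<Integer>();
        for (int column : columnsReadBySecond) {
            boolean readBefore = !seen.add(column);
            if (readBefore && !firstOutputIsSimpleProjection[column]) {
                return false; // merging would evaluate the expression twice
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // C1 = foreach C generate B;  D reads B twice (SUM and AVG).
        System.out.println(canMerge(new boolean[] {true}, Arrays.asList(0, 0)));
        // Same access pattern, but the first output is a computed expression.
        System.out.println(canMerge(new boolean[] {false}, Arrays.asList(0, 0)));
    }
}
```

Under this model, other outputs of the first foreach may be arbitrary expressions as long as each is read at most once by the second foreach.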


> Combiner not use because optimizor inserts a foreach between group and 
> algebric function
> 
>
> Key: PIG-1637
> URL: https://issues.apache.org/jira/browse/PIG-1637
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1637-1.patch, PIG-1637-2.patch
>
>
> The following script does not use combiner after new optimization change.
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> This is because, after the group, the optimizer detects that the group key is
> not used afterward and adds a foreach statement after C. This is what the
> script looks like after optimization:
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> C1 = foreach C generate B;
> D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> That cancels the combiner optimization for D.
> The way to solve the issue is to merge the inserted C1 with D. Currently,
> we do not merge these two foreach statements. The reason is that one output of
> the first foreach (B) is referenced twice in D, and the current rule assumes a

[jira] Commented: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-09-28 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915941#action_12915941
 ] 

Daniel Dai commented on PIG-1579:
-

Rolled back the change and ran the test many times; all tests pass. It seems some
change between r990721 and now (r1002348) fixed this issue. Will roll back the
change and close the Jira.

> Intermittent unit test failure for 
> TestScriptUDF.testPythonScriptUDFNullInputOutput
> ---
>
> Key: PIG-1579
> URL: https://issues.apache.org/jira/browse/PIG-1579
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1579-1.patch
>
>
> Error message:
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error 
> executing function: Traceback (most recent call last):
>   File "", line 5, in multStr
> TypeError: can't multiply sequence by non-int of type 'NoneType'
> at 
> org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)


[jira] Updated: (PIG-1648) Split combination may return too many block locations to map/reduce framework

2010-09-28 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1648:
--

Status: Patch Available  (was: Open)

> Split combination may return too many block locations to map/reduce framework
> -
>
> Key: PIG-1648
> URL: https://issues.apache.org/jira/browse/PIG-1648
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1648.patch
>
>
> For instance, if one small split has block locations h1, h2 and h3, and another
> small split has h1, h3 and h4, then after combination the composite split
> contains 4 block locations. If the number of component splits is large, the
> number of block locations can be large too. In fact, the list of block
> locations only serves as a hint to M/R about the best hosts to run this
> composite split on, so it should be a short list, say 5, of the hosts that
> contain the most data in this composite split.
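
The idea in the description can be sketched as follows: add up the bytes each host holds across the component splits, then keep only the few hosts that hold the most data. This is a minimal illustration under stated assumptions, not the code in PIG-1648.patch. The class TopHostsSketch, the method topHosts, and the representation of a split as a map from host name to bytes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopHostsSketch {
    /** Returns up to maxHosts host names, ordered by total bytes held (largest first). */
    static List<String> topHosts(List<Map<String, Long>> componentSplits, int maxHosts) {
        // Sum the bytes stored on each host over all component splits.
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Map<String, Long> split : componentSplits) {
            for (Map.Entry<String, Long> e : split.entrySet()) {
                bytesPerHost.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        // Sort by bytes descending; break ties by host name so the result is deterministic.
        entries.sort((a, b) -> {
            int byBytes = Long.compare(b.getValue(), a.getValue());
            return byBytes != 0 ? byBytes : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < maxHosts; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two small splits as in the description; the byte counts are made up.
        Map<String, Long> s1 = new HashMap<>();
        s1.put("h1", 10L); s1.put("h2", 10L); s1.put("h3", 10L);
        Map<String, Long> s2 = new HashMap<>();
        s2.put("h1", 20L); s2.put("h3", 20L); s2.put("h4", 20L);
        List<Map<String, Long>> splits = new ArrayList<>();
        splits.add(s1);
        splits.add(s2);
        System.out.println(topHosts(splits, 2));
    }
}
```

Capping the list keeps the locality hint small no matter how many component splits were combined, while still favoring the hosts where most of the data lives.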


[jira] Updated: (PIG-1648) Split combination may return too many block locations to map/reduce framework

2010-09-28 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1648:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch committed to both trunk and the 0.8 branch.



[jira] Commented: (PIG-1648) Split combination may return too many block locations to map/reduce framework

2010-09-28 Thread Richard Ding (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915889#action_12915889
 ] 

Richard Ding commented on PIG-1648:
---

+1



[jira] Commented: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915880#action_12915880
 ] 

Daniel Dai commented on PIG-1637:
-

test-patch result for PIG-1637-2.patch:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


> Combiner not use because optimizor inserts a foreach between group and 
> algebric function
> 
>
> Key: PIG-1637
> URL: https://issues.apache.org/jira/browse/PIG-1637
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1637-1.patch, PIG-1637-2.patch
>
>
> The following script does not use combiner after new optimization change.
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> This is because, after the group, the optimizer detects that the group key is
> not used afterward and adds a foreach statement after C. This is what the
> script looks like after optimization:
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> C1 = foreach C generate B;
> D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> That cancels the combiner optimization for D.
> The way to solve the issue is to merge the inserted C1 with D. Currently,
> we do not merge these two foreach statements. The reason is that one output of
> the first foreach (B) is referenced twice in D, and the current rule assumes
> that after merging we would need to calculate B twice in D. Actually, C1 only
> does a projection, with no calculation of B, so merging C1 and D will not
> result in calculating B twice. C1 and D should therefore be merged.
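
The reasoning in the description boils down to a small decision rule, sketched below: a duplicate reference to an output of the first foreach blocks the merge only when that foreach actually computes something. This is a simplified model, not Pig's optimizer code. The class MergeForEachSketch, the method canMerge, and the encoding of references as integer column indexes are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MergeForEachSketch {
    /**
     * @param referencedOutputs     indexes of the first foreach's outputs read by the second foreach
     * @param firstIsPureProjection true if the first foreach only projects columns and computes nothing
     * @return whether the two foreach statements may be merged under this simplified rule
     */
    static boolean canMerge(List<Integer> referencedOutputs, boolean firstIsPureProjection) {
        Set<Integer> seen = new HashSet<>();
        boolean duplicate = false;
        for (int ref : referencedOutputs) {
            if (!seen.add(ref)) {   // add() returns false when the index was already seen
                duplicate = true;
                break;
            }
        }
        // A duplicate reference is harmless when nothing would be recomputed after the merge.
        return !duplicate || firstIsPureProjection;
    }

    public static void main(String[] args) {
        // D reads B twice (SUM and AVG), and C1 is a plain projection of B.
        System.out.println(canMerge(Arrays.asList(0, 0), true));
        // Same references, but the first foreach computes its output.
        System.out.println(canMerge(Arrays.asList(0, 0), false));
    }
}
```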


[jira] Created: (PIG-1653) Scripting UDF fails if the path to script is an absolute path

2010-09-28 Thread Daniel Dai (JIRA)

Scripting UDF fails if the path to script is an absolute path
-

 Key: PIG-1653
 URL: https://issues.apache.org/jira/browse/PIG-1653
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0


The following script fails:
{code}
register '/homes/jianyong/pig/aaa/scriptingudf.py' using jython as myfuncs;
a = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage() as 
(name, age, gpa:double);
b = foreach a generate myfuncs.square(gpa);
dump b;
{code}

If we change the register statement to use a relative path (such as
"aaa/scriptingudf.py"), it succeeds.


[jira] Updated: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1637:


Attachment: PIG-1637-2.patch

Xuefu caught a bug. Reattaching the patch.



[jira] Assigned: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug

2010-09-28 Thread Olga Natkovich (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1652:
---

Assignee: Thejas M Nair

> TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
> estimateNumberOfReducers bug
> 
>
> Key: PIG-1652
> URL: https://issues.apache.org/jira/browse/PIG-1652
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
> the input size estimation. Here is the stack of TestSortedTableUnionMergeJoin:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias records3
> at org.apache.pig.PigServer.storeEx(PigServer.java:877)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at 
> org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: 
> Unexpected error during execution.
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Illegal character in scheme name at index 69: 
> org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
> at org.apache.hadoop.fs.Path.initialize(Path.java:140)
> at org.apache.hadoop.fs.Path.<init>(Path.java:126)
> at org.apache.hadoop.fs.Path.<init>(Path.java:50)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
> at 
> org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902)
> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
> at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.M

[jira] Commented: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug

2010-09-28 Thread Olga Natkovich (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915860#action_12915860
 ] 

Olga Natkovich commented on PIG-1652:
-

I think the code needs to be modified to default to 1 reducer if we can't
perform the computation.
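
A rough sketch of that fallback, under stated assumptions: the input size is supplied by a callback that may throw, and any failure yields an estimate of 1. The class ReducerEstimateSketch, the method estimateReducers, and the parameters bytesPerReducer and maxReducers are hypothetical names for illustration, not Pig's actual signatures.

```java
import java.util.concurrent.Callable;

final class ReducerEstimateSketch {
    /** Estimates a reducer count from the input size, falling back to 1 if the size is unavailable. */
    static int estimateReducers(Callable<Long> inputSizeInBytes, long bytesPerReducer, int maxReducers) {
        long totalBytes;
        try {
            totalBytes = inputSizeInBytes.call();
        } catch (Exception e) {
            // The size could not be computed (for example, the location is not a
            // file system path), so fall back to a single reducer instead of failing.
            return 1;
        }
        long reducers = (totalBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(maxReducers, reducers));
    }

    public static void main(String[] args) {
        System.out.println(estimateReducers(() -> 2500L, 1000L, 999));
        System.out.println(estimateReducers(() -> {
            throw new IllegalArgumentException("location is not a valid path");
        }, 1000L, 999));
    }
}
```

With this shape, a location that cannot be resolved to files degrades the estimate rather than aborting compilation of the job.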


[jira] Created: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug

2010-09-28 Thread Daniel Dai (JIRA)

TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
estimateNumberOfReducers bug


 Key: PIG-1652
 URL: https://issues.apache.org/jira/browse/PIG-1652
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0


TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to the
input size estimation. Here is the stack trace of TestSortedTableUnionMergeJoin:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store 
alias records3
at org.apache.pig.PigServer.storeEx(PigServer.java:877)
at org.apache.pig.PigServer.store(PigServer.java:815)
at org.apache.pig.PigServer.openIterator(PigServer.java:727)
at 
org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: 
Unexpected error during execution.
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326)
at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
at org.apache.pig.PigServer.storeEx(PigServer.java:873)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
Illegal character in scheme name at index 69: 
org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
at org.apache.hadoop.fs.Path.initialize(Path.java:140)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:50)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at 
org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at 
index 69: 
org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
at java.net.URI$Parser.fail(URI.java:2809)
at java.net.URI$Parser.checkChars(URI.java:2982)
at java.net.URI$Parser.parse(URI.java:3009)
at java.net.URI.<init>(URI.java:736)
at org.apache.hadoop.fs.Path.initialize(Path.java:137)

The reason is we are trying to do globStatus
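
The innermost exception in the trace can be reproduced with java.net.URI alone: when two locations joined by a comma are parsed as a single URI, the comma falls inside what the parser takes to be the scheme. The sketch below only illustrates that failure mode. The "/tmp/t2" suffix and the class name are made up, and splitting on commas is an assumption about one possible workaround, not the fix adopted for this issue.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class CommaInSchemeSketch {
    /** Returns the index reported by URISyntaxException, or -1 if the string parses as a URI. */
    static int failureIndex(String location) {
        try {
            new URI(location);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Everything before the first ':' is treated as the scheme, and ',' is not a legal scheme character.
        String joined =
            "org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:/tmp/t2";
        System.out.println("joined -> " + failureIndex(joined));

        // Parsing each comma-separated location on its own avoids the problem.
        for (String part : joined.split(",")) {
            System.out.println(part + " -> " + failureIndex(part));
        }
    }
}
```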

[jira] Commented: (PIG-1648) Split combination may return too many block locations to map/reduce framework

2010-09-28 Thread Yan Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915852#action_12915852
 ] 

Yan Zhou commented on PIG-1648:
---

test-patch results:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

test-core tests pass too.




[jira] Updated: (PIG-1648) Split combination may return too many block locations to map/reduce framework

2010-09-28 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1648:
--

Attachment: PIG-1648.patch



[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Status: Patch Available  (was: Open)

> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch
>
>
> In FRJoin, if the input path contains curly braces, Pig fails to compute the
> number of input files and logs the following exception:
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of 
> input files
> java.net.URISyntaxException: Illegal character in path at index 12: 
> /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause the query to fail. But since the number of input files 
> is not calculated, the optimizations added in PIG-1458 to reduce load on 
> the name node are not used.
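
The parse failure in the trace above can be reproduced outside Pig with plain java.net.URI. The following is a minimal sketch, not the code from the attached patch; the class and method names (GlobPathUriDemo, firstIllegalIndex) are made up for illustration. It only shows that the single-argument URI constructor rejects the glob character '{' at index 12 of the path from the report.

```java
import java.net.URI;
import java.net.URISyntaxException;

class GlobPathUriDemo {
    // Hypothetical helper: returns the index that URISyntaxException reports
    // for the first illegal character, or -1 if the string parses as a URI.
    static int firstIllegalIndex(String path) {
        try {
            new URI(path);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Glob path taken from the report: '{' is not a legal URI path character.
        System.out.println(firstIllegalIndex("/user/tejas/{std*txt}")); // prints 12
        // A path without glob braces parses cleanly.
        System.out.println(firstIllegalIndex("/user/tejas/std.txt"));   // prints -1
    }
}
```

Note that '*' on its own is accepted by java.net.URI, so only the curly braces trigger the exception here.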

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input

2010-09-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1649:
---

Attachment: PIG-1649.1.patch

Patch passes unit tests and test-patch.


> FRJoin fails to compute number of input files for replicated input
> --
>
> Key: PIG-1649
> URL: https://issues.apache.org/jira/browse/PIG-1649
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1649.1.patch
>
>
> In FRJoin, if the input path contains curly braces, Pig fails to compute the number of 
> input files and logs the following exception:
> 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of 
> input files
> java.net.URISyntaxException: Illegal character in path at index 12: 
> /user/tejas/{std*txt}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> at java.net.URI$Parser.parse(URI.java:3024)
> at java.net.URI.<init>(URI.java:578)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
> at org.apache.pig.PigServer.storeEx(PigServer.java:873)
> at org.apache.pig.PigServer.store(PigServer.java:815)
> at org.apache.pig.PigServer.openIterator(PigServer.java:727)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:453)
> at org.apache.pig.Main.main(Main.java:107)
> This does not cause the query to fail. But since the number of input files 
> is not calculated, the optimizations added in PIG-1458 to reduce load on 
> the name node are not used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1648) Split combination may return too many block locations to map/reduce framework

2010-09-28 Thread Yan Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915815#action_12915815
 ] 

Yan Zhou commented on PIG-1648:
---

Top 5 locations with most data will be used. This has been agreed upon by the 
M/R dev.

> Split combination may return too many block locations to map/reduce framework
> -
>
> Key: PIG-1648
> URL: https://issues.apache.org/jira/browse/PIG-1648
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
>
> For instance, suppose one small split has block locations h1, h2 and h3, and another 
> small split has h1, h3 and h4. After combination, the composite split contains 4 
> block locations. If the number of component splits is large, the number of 
> block locations can be large too. In fact, the block locations 
> serve as a hint to M/R about the best hosts to run this composite split 
> on, so the list should be short, say 5, and contain the hosts that hold 
> the most data in this composite split.
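
The "top 5 hosts with the most data" idea can be sketched as follows. This is only an illustration under stated assumptions, not the change made in Pig: ComponentSplit and topHosts are hypothetical names, and the split sizes (100 and 50 bytes) are invented, since the report gives only the host lists h1..h4.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopSplitLocations {
    // Hypothetical stand-in for one component split: its size in bytes and
    // the hosts that hold its block. Not a Pig or Hadoop class.
    static final class ComponentSplit {
        final long length;
        final String[] hosts;

        ComponentSplit(long length, String... hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Returns at most `limit` hosts, ordered by the bytes they hold (descending).
    // Ties are broken by host name so the result is deterministic.
    static List<String> topHosts(List<ComponentSplit> splits, int limit) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (ComponentSplit split : splits) {
            for (String host : split.hosts) {
                bytesPerHost.merge(host, split.length, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        entries.sort((a, b) -> {
            int byBytes = Long.compare(b.getValue(), a.getValue());
            return byBytes != 0 ? byBytes : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<ComponentSplit> splits = new ArrayList<>();
        splits.add(new ComponentSplit(100L, "h1", "h2", "h3")); // assumed size
        splits.add(new ComponentSplit(50L, "h1", "h3", "h4"));  // assumed size
        System.out.println(topHosts(splits, 5)); // prints [h1, h3, h2, h4]
        System.out.println(topHosts(splits, 2)); // prints [h1, h3]
    }
}
```

With the assumed sizes, h1 and h3 each hold 150 bytes, h2 holds 100 and h4 holds 50, so a limit of 2 keeps only h1 and h3 as location hints.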

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (PIG-1651) PIG class loading mishandled

2010-09-27 Thread Yan Zhou (JIRA)

PIG class loading mishandled


 Key: PIG-1651
 URL: https://issues.apache.org/jira/browse/PIG-1651
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Richard Ding
 Fix For: 0.8.0


If zebra.jar is registered in a Pig script but is not on the CLASSPATH, a 
query using Zebra fails. The same class appears to be loaded into the JVM 
more than once, so a static variable set earlier is not visible after an 
instance of the class is created through reflection. (Once zebra.jar is 
added to the CLASSPATH, the query works fine.) The exception stack is as follows:

Backend error message during job submission
---
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to 
create input splits for: hdfs://hostname/pathto/zebra_dir :: null
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at 
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at 
org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123)
at 
org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413)
at 
org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718)
at 
org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084)
at 
org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
at 
org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
at 
org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863)
at 
org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017)
at 
org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
... 7 more



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-27 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1650:
---

Status: Patch Available  (was: Open)

> pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
> -
>
> Key: PIG-1650
> URL: https://issues.apache.org/jira/browse/PIG-1650
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1650_0.patch
>
>
> grunt shell breaks for many unix commands

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-27 Thread niraj rai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1650:
---

Attachment: PIG-1650_0.patch

This patch will fix many broken commands inside the grunt shell.

> pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
> -
>
> Key: PIG-1650
> URL: https://issues.apache.org/jira/browse/PIG-1650
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Attachments: PIG-1650_0.patch
>
>
> grunt shell breaks for many unix commands

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc

2010-09-27 Thread niraj rai (JIRA)

pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
-

 Key: PIG-1650
 URL: https://issues.apache.org/jira/browse/PIG-1650
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai


grunt shell breaks for many unix commands

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-27 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1637:


Attachment: PIG-1637-1.patch

> Combiner not use because optimizor inserts a foreach between group and 
> algebric function
> 
>
> Key: PIG-1637
> URL: https://issues.apache.org/jira/browse/PIG-1637
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1637-1.patch
>
>
> The following script does not use the combiner after the new optimization change.
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> This is because, after the group, the optimizer detects that the group key is not used 
> afterward and adds a foreach statement after C. This is how the script looks 
> after optimization:
> {code}
> A = load ':INPATH:/pigmix/page_views' using 
> org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent, 
> (double)estimated_revenue as estimated_revenue;
> C = group B all; 
> C1 = foreach C generate B;
> D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> That cancels the combiner optimization for D. 
> The way to solve the issue is to merge the inserted C1 with D. Currently, 
> we do not merge these two foreach statements. The reason is that one output of the first 
> foreach (B) is referenced twice in D, and the current rule assumes that after the merge we 
> would need to calculate B twice in D. In fact, C1 only does a projection, with no 
> calculation of B. Merging C1 and D will not result in calculating B twice, so C1 
> and D should be merged.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
