[jira] Commented: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug

2010-09-28 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915860#action_12915860
 ] 

Olga Natkovich commented on PIG-1652:
-

I think the code needs to be modified to default to 1 if we can't perform the 
computation
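
A minimal sketch of the fallback suggested above, written in Python for brevity rather than in Pig's Java. The function name, the size_of callback, and the bytes_per_reducer and max_reducers defaults are assumptions made for illustration; they are not taken from JobControlCompiler.

```python
import math
from typing import Callable, Iterable


def estimate_reducers(
    input_paths: Iterable[str],
    size_of: Callable[[str], int],
    bytes_per_reducer: int = 1_000_000_000,  # assumed default, for illustration
    max_reducers: int = 999,                 # assumed default, for illustration
) -> int:
    """Estimate a reducer count from total input size; fall back to 1 on failure."""
    try:
        total = sum(size_of(p) for p in input_paths)
    except (ValueError, OSError):
        # The input size could not be computed (for example, the location
        # is not a plain file path), so default to a single reducer.
        return 1
    estimate = math.ceil(total / bytes_per_reducer)
    return max(1, min(max_reducers, estimate))


def strict_size(path: str) -> int:
    """Hypothetical size lookup that rejects locations that are not one path."""
    if "," in path:
        raise ValueError(f"not a single path: {path!r}")
    return 2_500_000_000
```

With this shape, a loader location that cannot be parsed as a path no longer aborts compilation; the job simply runs with one reducer unless the user sets the parallelism explicitly.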

 TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
 estimateNumberOfReducers bug
 

 Key: PIG-1652
 URL: https://issues.apache.org/jira/browse/PIG-1652
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0


 TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
 the input size estimation. Here is the stack of TestSortedTableUnionMergeJoin:
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
 store alias records3
 at org.apache.pig.PigServer.storeEx(PigServer.java:877)
 at org.apache.pig.PigServer.store(PigServer.java:815)
 at org.apache.pig.PigServer.openIterator(PigServer.java:727)
 at 
 org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: 
 Unexpected error during execution.
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
 at org.apache.pig.PigServer.storeEx(PigServer.java:873)
 Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
 Illegal character in scheme name at index 69: 
 org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
 at org.apache.hadoop.fs.Path.initialize(Path.java:140)
 at org.apache.hadoop.fs.Path.<init>(Path.java:126)
 at org.apache.hadoop.fs.Path.<init>(Path.java:50)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
 at 
 org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902)
 at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
 at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41)
 at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
 at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
 at 
 org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
 Caused by: java.net.URISyntaxException: Illegal 

[jira] Assigned: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug

2010-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1652:
---

Assignee: Thejas M Nair

 TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
 estimateNumberOfReducers bug
 

 Key: PIG-1652
 URL: https://issues.apache.org/jira/browse/PIG-1652
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Thejas M Nair
 Fix For: 0.8.0



[jira] Resolved: (PIG-1646) Error message for pig root directory does not exist can be more meaningful

2010-09-24 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1646.
-

Resolution: Invalid

This ticket is for a particular deployment scenario; it has nothing to do with 
core Pig functionality.

 Error message for pig root directory does not exist can be more meaningful
 

 Key: PIG-1646
 URL: https://issues.apache.org/jira/browse/PIG-1646
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Sherry Chen
Priority: Minor

 Currently, the error message shown when the pig root directory does not exist is:
* You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, 
 symlink /grid/0/gs/pig/0.8 does not exist
 It can be corrected as:
* Pig root directory should be /grid/0/gs/pig/0.8, however, symlink 
 /grid/0/gs/pig/0.8 does not exist
 Steps to test:
 1. submit a pig job:  pig -useversion 0.8 -exectype local local.pig
 2. Read the error message

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1600) Pig 080 Documentation

2010-09-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914731#action_12914731
 ] 

Olga Natkovich commented on PIG-1600:
-

Patch committed to trunk and the 0.8 branch. Thanks, Corinne.

 Pig 080 Documentation
 -

 Key: PIG-1600
 URL: https://issues.apache.org/jira/browse/PIG-1600
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.8.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Fix For: 0.8.0

 Attachments: pig080-1.patch, pig080-2-2.patch, pig080-2.patch, 
 pig080-3.patch


 Pig 080 documentation: new features, updates, and fixes.




[jira] Resolved: (PIG-1504) need to document new functions moved from piggybank to builtin

2010-09-24 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1504.
-

Resolution: Fixed

 need to document new functions moved from piggybank to builtin
 --

 Key: PIG-1504
 URL: https://issues.apache.org/jira/browse/PIG-1504
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.8.0


 We need to document the following new functions:
 ABS
 ACOS
 ASIN
 ATAN
 CBRT
 CEIL
 COR
 COSH
 COS
 COV
 EXP
 FLOOR
 INDEXOF
 LAST_INDEX_OF
 LCFIRST
 LOG10
 LOG
 LOWER
 RANDOM
 REGEX_EXTRACT_ALL
 REGEX_EXTRACT
 REPLACE
 ROUND
 SINH
 SIN
 SPLIT
 SQRT
 SUBSTRING
 TANH
 TAN
 TOBAG
 TOP
 TOTUPLE
 TRIM
 UCFIRST
 UPPER
 A large part of them are math functions, and descriptions can be found here: 
 http://download.oracle.com/docs/cd/E17409_01/javase/7/docs/api/java/lang/Math.html
 For the rest, we would need to provide descriptions.




[jira] Updated: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1632:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch, pig-1632-2.patch


 The core jar in the tarball contains the kitchen sink, it's not the same core 
 jar built by ant jar. This is problematic since other projects that want to 
 depend on the pig core jar just want pig core, but 
 pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff 
 (hadoop, com.google, commons, etc) that may conflict with the packages also 
 on a user's classpath.
 {noformat}
 pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l
 12
 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz
 ...
 pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v 
 pig|wc -l
 4819
 {noformat}
 How about restricting the core jar to just Pig classes?
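
 The check in the {noformat} block above can be approximated without the jar tool. The following is a minimal Python sketch; the function name and the in-memory sample archive are hypothetical, and it filters on entry names only, whereas grep -v filters whole listing lines.

```python
import io
import zipfile


def count_non_pig_entries(jar) -> int:
    """Count archive entries whose name does not contain 'pig'.

    Roughly mirrors `jar tvf <jar> | grep -v pig | wc -l`.
    """
    with zipfile.ZipFile(jar) as zf:
        return sum(1 for name in zf.namelist() if "pig" not in name)


# Hypothetical sample archive built in memory, standing in for a real jar.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("org/apache/pig/Main.class", b"")
    zf.writestr("org/apache/hadoop/fs/Path.class", b"")
    zf.writestr("com/google/common/base/Joiner.class", b"")
```

 Passing a path such as build/pig-0.8.0-SNAPSHOT-core.jar instead of the in-memory sample would give a count comparable to the ones reported above.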




[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913733#action_12913733
 ] 

Olga Natkovich commented on PIG-1632:
-

Hi Eli, thanks for the patch.

I don't think this is the approach we want to take. I think we should publish 
just core pig jar in maven since users have a way to pull the dependencies. 
However, as part of our release package we should include bundled pig.jar so 
that it works for users out of the box and they get exactly the version we have 
been testing for. I am fine if additionally we include the core jar as well if 
we do not do this already.

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch





[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913743#action_12913743
 ] 

Olga Natkovich commented on PIG-1632:
-

I am fine with your second proposal, which is what I also suggested in my last 
comment. The first one makes it harder for users to compile their UDFs.

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch





[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913759#action_12913759
 ] 

Olga Natkovich commented on PIG-1632:
-

+1, patch looks good. I will commit it to trunk and the 0.8 branch shortly.

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch, pig-1632-2.patch





[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913788#action_12913788
 ] 

Olga Natkovich commented on PIG-1632:
-

Patch committed to both the 0.8 branch and trunk. Thanks, Eli, for contributing!

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch, pig-1632-2.patch





[jira] Updated: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1635:


Fix Version/s: 0.8.0

 Logical simplifier does not simplify away constants under AND and OR; after 
 simplification the ordering of operands of AND and OR may get changed
 

 Key: PIG-1635
 URL: https://issues.apache.org/jira/browse/PIG-1635
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.8.0


 b = FILTER a by (( f1  1) AND (1 == 1))
 or 
 b = FILTER a by ((f1  1) OR ( 1==0))
 should be simplified to
 b = FILTER a by f1  1;
 Regarding ordering change, an example is that 
 b = filter a by ((f1 is not null) AND (f2 is not null));
 Even without possible simplification, the expression is changed to
 b = filter a by ((f2 is not null) AND (f1 is not null));
 The ordering change in this case, and probably in most other cases, makes no 
 difference to the result. Some users might still care about the ordering for 
 two reasons: stateful UDFs may be used as operands of AND or OR, and the 
 application designer may have chosen the ordering to maximize the chances of 
 short-circuiting the composite boolean evaluation.
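
 The requested behaviour can be sketched as a small expression rewriter. This is an illustrative Python model, not Pig's logical simplifier: expressions are assumed to be nested tuples ("and" | "or", left, right), constants are Python booleans, and any other leaf is an opaque predicate string.

```python
def simplify(expr):
    """Fold boolean constants out of AND/OR without reordering operands."""
    if not isinstance(expr, tuple) or expr[0] not in ("and", "or"):
        return expr  # leaf: a boolean constant or an opaque predicate

    op = expr[0]
    left, right = simplify(expr[1]), simplify(expr[2])

    # Identity element: x AND true == x, x OR false == x.
    identity = (op == "and")
    if left is identity:
        return right
    if right is identity:
        return left

    # Absorbing element: x AND false == false, x OR true == true.
    absorbing = not identity
    if left is absorbing or right is absorbing:
        return absorbing

    # Nothing to fold: keep the operands in their original order.
    return (op, left, right)
```

 Note that folding an absorbing constant drops the other operand entirely, which matters if that operand is a stateful UDF; a more conservative rewriter could skip that rule.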




[jira] Updated: (PIG-1339) International characters in column names not supported

2010-09-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1339:


Fix Version/s: 0.9.0

We should see if the new parser makes this easier and if so fix it. 

 International characters in column names not supported
 --

 Key: PIG-1339
 URL: https://issues.apache.org/jira/browse/PIG-1339
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Viraj Bhat
 Fix For: 0.9.0


 There is a particular use case in which someone specifies a column name 
 using international characters.
 {code}
 inputdata = load '/user/viraj/inputdata.txt' using PigStorage() as (あいうえお);
 describe inputdata;
 dump inputdata;
 {code}
 ==
 Pig Stack Trace
 ---
 ERROR 1000: Error during parsing. Lexical error at line 1, column 64.  
 Encountered: \u3042 (12354), after : 
 org.apache.pig.impl.logicalLayer.parser.TokenMgrError: Lexical error at line 
 1, column 64.  Encountered: \u3042 (12354), after : 
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1791)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_scan_token(QueryParser.java:8959)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_51(QueryParser.java:7462)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_120(QueryParser.java:7769)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_106(QueryParser.java:7787)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_63(QueryParser.java:8609)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_32(QueryParser.java:8621)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3_4(QueryParser.java:8354)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_2_4(QueryParser.java:6903)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1249)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:391)
 ==
 Thanks Viraj
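
 The lexical error is consistent with an identifier rule that only accepts ASCII letters. The Python sketch below contrasts such a rule with a Unicode-aware one; both helper names and the ASCII pattern are assumptions for illustration and are not Pig's actual grammar.

```python
import re

# Hypothetical ASCII-only identifier rule, similar in spirit to many lexers.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_ascii_identifier(name: str) -> bool:
    """Accept only names made of ASCII letters, digits, and underscores."""
    return ASCII_IDENTIFIER.match(name) is not None


def is_unicode_identifier(name: str) -> bool:
    """Accept any name that is a valid Unicode identifier."""
    return name.isidentifier()
```

 Under the first rule the column name in the script above is rejected at its first character, which matches the reported position of the error; under the second rule it is accepted.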




[jira] Created: (PIG-1640) bin/pig does not run in local mode due to classes missing from classpath

2010-09-21 Thread Olga Natkovich (JIRA)
bin/pig does not run in local mode due to classes missing from classpath


 Key: PIG-1640
 URL: https://issues.apache.org/jira/browse/PIG-1640
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
 Fix For: 0.8.0


This issue was reported by a Yahoo user. I have not verified the problem. 
Here is the report:

When running bin/pig -x local, the shell doesn't come up. It complains about 
jline not being found. Here is a patch to bin/pig:

+for f in $PIG_HOME/build/ivy/lib/Pig/*.jar; do
+CLASSPATH=${CLASSPATH}:$f;
+done
+





[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1639:


Assignee: Xuefu Zhang  (was: Daniel Dai)

 New logical plan: PushUpFilter should not optimize if filter condition 
 contains UDF
 ---

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0


 The following script fails:
 {code}
 a = load 'file' AS (f1, f2, f3);
 b = group a by f1;
 c = filter b by COUNT(a)  1;
 dump c;
 {code}




[jira] Commented: (PIG-1617) 'group all' should always use one reducer

2010-09-20 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912655#action_12912655
 ] 

Olga Natkovich commented on PIG-1617:
-

Looks good. +1

 'group all' should always use one reducer
 -

 Key: PIG-1617
 URL: https://issues.apache.org/jira/browse/PIG-1617
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1617.1.patch


 'group all' sends all rows to a single reducer, so it does not make sense to 
 spawn more than one reducer for it. But if a higher value of parallelism is 
 specified, or if the input is large enough that the changes in PIG-1249 result 
 in a larger value being set, additional reducers are spawned that don't 
 do anything useful.
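
 Why the extra reducers stay idle can be seen with a small Python model of hash partitioning. It assumes rows are routed by hash(key) modulo the number of reducers; the function name and the constant key "all" are illustrative only.

```python
def rows_per_reducer(keys, num_reducers):
    """Count how many rows each reducer receives under hash partitioning."""
    counts = {r: 0 for r in range(num_reducers)}
    for key in keys:
        counts[hash(key) % num_reducers] += 1
    return counts


# Every row of a 'group all' carries the same grouping key.
group_all_counts = rows_per_reducer(["all"] * 1000, 10)
```

 Since every row shares one key, exactly one reducer receives all 1000 rows and the other nine receive none, so requesting more than one reducer only adds startup cost.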




[jira] Assigned: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-09-20 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1579:
---

Assignee: Daniel Dai

 Intermittent unit test failure for 
 TestScriptUDF.testPythonScriptUDFNullInputOutput
 ---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1579-1.patch


 Error message:
 org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error 
 executing function: Traceback (most recent call last):
   File "<iostream>", line 5, in multStr
 TypeError: can't multiply sequence by non-int of type 'NoneType'
 at 
 org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
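
 The TypeError above is what Python raises when a sequence is multiplied by None. The body of the multStr UDF is not shown in this message, so the following is only a hypothetical null-safe variant; the name mult_str and its two parameters are assumptions.

```python
def mult_str(word, count):
    """Repeat word count times, passing nulls through instead of failing."""
    if word is None or count is None:
        # Null in, null out: avoids multiplying a sequence by None.
        return None
    return word * count
```

 Whether returning null is the right behaviour for the test is a separate question; the sketch only shows how the reported exception can be avoided.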




[jira] Updated: (PIG-1624) FOREACH AS documentation is incorrect

2010-09-17 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1624:


Fix Version/s: 0.8.0
   (was: 0.9.0)

We are still updating docs so we should be able to get this in for 0.8

 FOREACH AS documentation is incorrect
 -

 Key: PIG-1624
 URL: https://issues.apache.org/jira/browse/PIG-1624
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
 Fix For: 0.8.0


 According to the Pig Latin manual 
 (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#FOREACH) the 
 correct usage of AS in a FOREACH clause is:
 {code}
 B = foreach A generate $0, $1, $2 as (user, age, gpa);
 {code}
 However, this is incorrect and produces a syntax error.  The correct syntax 
 for AS in FOREACH is:
 {code}
 B = foreach A generate $0 as user, $1 as age, $2 as gpa;
 {code}




[jira] Created: (PIG-1626) Need to clarify how COUNT handles nulls

2010-09-17 Thread Olga Natkovich (JIRA)
Need to clarify how COUNT handles nulls
---

 Key: PIG-1626
 URL: https://issues.apache.org/jira/browse/PIG-1626
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.8.0


The current documentation just states: The COUNT function ignores NULL values. 
If you want to include NULL values in the count computation, use COUNT_STAR. 

The new text should be something like

The COUNT function follows syntax semantics and ignores nulls. What this means 
is that a tuple in the bag will not be counted if the first field in this tuple 
is NULL. If you want to include NULL values in the count computation, use 
COUNT_STAR. 
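
The proposed wording can be illustrated with a small Python model. This is not Pig code: it assumes a bag is a list of tuples and that NULL maps to None, and `count` and `count_star` are hypothetical stand-ins for COUNT and COUNT_STAR.

```python
def count(bag):
    """Model of COUNT: skip tuples whose first field is NULL (None)."""
    return sum(1 for t in bag if len(t) > 0 and t[0] is not None)


def count_star(bag):
    """Model of COUNT_STAR: count every tuple, NULLs included."""
    return len(bag)


bag = [("joe", 100), (None, 20), ("bob", None)]
print(count(bag))       # 2: (None, 20) is skipped, ("bob", None) still counts
print(count_star(bag))  # 3
```

Note that only the first field matters in this model: a NULL in a later field does not exclude the tuple.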





[jira] Created: (PIG-1629) Need ability to limit bags produced during GROUP + LIMIT

2010-09-17 Thread Olga Natkovich (JIRA)
Need ability to limit bags produced during GROUP + LIMIT


 Key: PIG-1629
 URL: https://issues.apache.org/jira/browse/PIG-1629
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Thejas M Nair
 Fix For: 0.9.0


Currently, the code below will construct the full group in memory and then trim 
it. This requires more memory than needed.

A = load 'data' as (x, y, z);
B = group A by x;
C = foreach B{
D = limit A 100;
generate group, MyUDF(D);}
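
A rough Python model of the difference, under the assumption that a bag is a list and that keeping the first tuples seen is an acceptable LIMIT. The function names are hypothetical and nothing here reflects Pig's actual implementation.

```python
from collections import defaultdict


def group_then_limit(rows, limit):
    """Behaviour described above: build every full bag, then trim it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    peak = max((len(bag) for bag in groups.values()), default=0)
    return {key: bag[:limit] for key, bag in groups.items()}, peak


def group_with_limit(rows, limit):
    """Requested behaviour: stop adding to a bag once it holds `limit` tuples."""
    groups = defaultdict(list)
    for row in rows:
        bag = groups[row[0]]
        if len(bag) < limit:
            bag.append(row)
    peak = max((len(bag) for bag in groups.values()), default=0)
    return dict(groups), peak


rows = [("a", i, 0) for i in range(1000)] + [("b", 1, 0)]
full, full_peak = group_then_limit(rows, 100)
capped, capped_peak = group_with_limit(rows, 100)
print(full == capped, full_peak, capped_peak)  # True 1000 100
```

Both variants hand the same bags to the UDF, but the largest bag ever held in memory drops from 1000 tuples to 100.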




[jira] Resolved: (PIG-1615) Return code from Pig is 0 even if the job fails when using -M flag

2010-09-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1615.
-

Resolution: Fixed

 Return code from Pig is 0 even if the job fails when using -M flag
 --

 Key: PIG-1615
 URL: https://issues.apache.org/jira/browse/PIG-1615
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Viraj Bhat
 Fix For: 0.8.0


 I have a Pig script of this form, which I used inside a workflow system such 
 as Oozie.
 {code}
 A = load  '$INPUT' using PigStorage();
 store A into '$OUTPUT';
 {code}
 I run this with Multi-query optimization turned off:
 {quote}
 $java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -p 
 INPUT=/user/viraj/junk1 -M -p OUTPUT=/user/viraj/junk2 loadpigstorage.pig
 {quote}
 The directory /user/viraj/junk1 is not present
 I get the following results:
 {quote}
 Input(s):
 Failed to read data from /user/viraj/junk1
 Output(s):
 Failed to produce result in /user/viraj/junk2
 {quote}
 This is expected, but the return code is still 0
 {code}
 $ echo $?
 0
 {code}
 If I run this script with Multi-query optimization turned on, it gives a 
 return code of 2, which is correct.
 {code}
 $ java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -p 
 INPUT=/user/viraj/junk1 -p OUTPUT=/user/viraj/junk2 loadpigstorage.pig
 ...
 $ echo $?
 2
 {code}
 I believe the wrong return code from Pig is causing Oozie to believe that the 
 Pig script succeeded.
 Viraj
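
To show why the exit status matters to a workflow system, here is a minimal Python sketch of a step runner that trusts only the return code. It is an assumption about how such a wrapper might behave, not Oozie's implementation, and the two commands are stand-ins for the Pig invocations above.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one workflow step; treat any non-zero exit status as failure."""
    result = subprocess.run(cmd)
    return result.returncode == 0


# Stand-ins for the two Pig runs: one exits with 2, the other with 0.
exits_two = [sys.executable, "-c", "import sys; sys.exit(2)"]
exits_zero = [sys.executable, "-c", "import sys; sys.exit(0)"]
print(run_step(exits_two))   # False: the failure is visible to the caller
print(run_step(exits_zero))  # True: a failed job that exits 0 looks successful
```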




[jira] Assigned: (PIG-1247) Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error

2010-09-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1247:
---

Assignee: Xuefu Zhang

 Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 -

 Key: PIG-1247
 URL: https://issues.apache.org/jira/browse/PIG-1247
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Xuefu Zhang
 Fix For: 0.9.0


 I have a large script in which there are intermediate store statements; one 
 of them writes to a directory I do not have permission to write to. 
 The stack trace I get from Pig is this:
 2010-02-20 02:16:32,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2999: Unexpected internal error. 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 Details at logfile: /home/viraj/pig_1266632145355.log
 Pig Stack Trace
 ---
 ERROR 2999: Unexpected internal error. 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 java.lang.ClassCastException: 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3583)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1407)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:949)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:762)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1036)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:986)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:386)
 
 The only way to find the error was to look at the javacc generated 
 QueryParser.java code and do a System.out.println()
 Here is a script to reproduce the problem:
 {code}
 A = load '/user/viraj/three.txt' using PigStorage();
 B = foreach A generate ['a'#'12'] as b:map[] ;
 store B into '/user/secure/pigtest' using PigStorage();
 {code}
 three.txt has 3 lines which contain nothing but the number 1.
 {code}
 $ hadoop fs -ls /user/secure/
 ls: could not get get listing for 'hdfs://mynamenode/user/secure' : 
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=viraj, access=READ_EXECUTE, inode=secure:secure:users:rwx--
 {code}
 Viraj




[jira] Assigned: (PIG-1592) ORDER BY distribution is uneven when record size is correlated with order key

2010-09-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1592:
---

Assignee: Thejas M Nair

 ORDER BY distribution is uneven when record size is correlated with order key
 -

 Key: PIG-1592
 URL: https://issues.apache.org/jira/browse/PIG-1592
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Thejas M Nair
 Fix For: 0.9.0


 The partitioner contributed in PIG-545 distributes the order key space 
 between partitions so that each partition gets approximately the same number 
 of keys, even when the keys have a non-uniform distribution over the key 
 space.
 Unfortunately this still allows for severe partition imbalance when record 
 size is correlated with the order key. By way of motivating example, consider 
 this script which attempts to produce a list of genuses based on how many 
 species each genus contains:
 {code}
 set default_parallel 60;
 critters = load 'biodata' as (genus, species);
 genus_counts = foreach (group critters by genus) generate group as genus, 
 COUNT(critters) as num_species, critters;
 ordered_genuses = order genus_counts by num_species desc;
 store ordered_genuses
 {code}
 The higher the value of num_species, the more species tuples will be 
 contained in the critters bag, and the wider the row. This can cause a severe 
 processing imbalance, as the partition holding the records with the 
 highest values of num_species will have the same number of *records* as the 
 partition holding the lowest values, but it will have far more actual 
 *bytes* to work on.
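
The imbalance can be reproduced with a toy Python model. The data is hypothetical (row width is assumed to be 16 bytes per species) and `partition_by_record_count` is an illustrative helper, not the partitioner from PIG-545.

```python
def partition_by_record_count(rows, num_partitions):
    """Split already-sorted rows into partitions of near-equal record count."""
    size, extra = divmod(len(rows), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


# (num_species, row_bytes), sorted descending by num_species; each row grows
# with its key because the critters bag is carried along.
rows = [(n, 16 * n) for n in range(12, 0, -1)]
parts = partition_by_record_count(rows, 3)
print([len(p) for p in parts])                       # [4, 4, 4]
print([sum(size for _, size in p) for p in parts])   # [672, 416, 160]
```

Every partition gets four records, yet the first one carries more than four times the bytes of the last.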




[jira] Commented: (PIG-1606) flatten documentation does not discuss flatten of empty bag

2010-09-15 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909810#action_12909810
 ] 

Olga Natkovich commented on PIG-1606:
-

If we are not planning to change the semantics I will ask Corinne to document 
for 0.8

 flatten documentation does not discuss flatten of empty bag
 ---

 Key: PIG-1606
 URL: https://issues.apache.org/jira/browse/PIG-1606
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
 Fix For: 0.9.0


 From the existing flatten documentation, it is not clear that flatten of an 
 empty bag results in that row being discarded.
 For example, the following query gives no output:
 {code}
 grunt> cat /tmp/empty.bag
 {}  1
 grunt> l = load '/tmp/empty.bag' as (b : bag{}, i : int);
 grunt> f = foreach l generate flatten(b), i;
 grunt> dump f;
 grunt>
 {code}
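
The behaviour reported above can be modelled in a few lines of Python. This is a sketch of the observed semantics only, assuming rows of the form (bag, i) with a bag represented as a list of tuples; `foreach_flatten` is a hypothetical name.

```python
def foreach_flatten(rows):
    """Model of `foreach l generate flatten(b), i` for rows of (bag, i)."""
    out = []
    for bag, i in rows:
        # One output row per tuple in the bag; an empty bag yields none,
        # so the whole input row disappears from the result.
        for t in bag:
            out.append(tuple(t) + (i,))
    return out


print(foreach_flatten([([], 1)]))                # []
print(foreach_flatten([([("a",), ("b",)], 2)]))  # [('a', 2), ('b', 2)]
```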




[jira] Updated: (PIG-1606) flatten documentation does not discuss flatten of empty bag

2010-09-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1606:


 Assignee: Corinne Chandel
Fix Version/s: 0.8.0
   (was: 0.9.0)

 flatten documentation does not discuss flatten of empty bag
 ---

 Key: PIG-1606
 URL: https://issues.apache.org/jira/browse/PIG-1606
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
Assignee: Corinne Chandel
 Fix For: 0.8.0


 From the existing flatten documentation, it is not clear that flatten of an 
 empty bag results in that row being discarded.
 For example, the following query gives no output:
 {code}
 grunt> cat /tmp/empty.bag
 {}  1
 grunt> l = load '/tmp/empty.bag' as (b : bag{}, i : int);
 grunt> f = foreach l generate flatten(b), i;
 grunt> dump f;
 grunt>
 {code}




[jira] Created: (PIG-1613) Explain how different UDF interfaces are used

2010-09-15 Thread Olga Natkovich (JIRA)
Explain how different UDF interfaces are used
-

 Key: PIG-1613
 URL: https://issues.apache.org/jira/browse/PIG-1613
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.7.0
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.8.0


The current documentation describes individual UDF interfaces such as Algebraic 
and Accumulator but not their precedence or how they interact with each other 
and why you might want to implement several of them.

Corinne, I will add release notes to this JIRA shortly. Don't worry about it 
till then.




[jira] Updated: (PIG-1613) Explain how different UDF interfaces are used

2010-09-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1613:


Release Note: 
I think this should go into Advanced Topics in the UDF manual

There are multiple ways for a UDF to be invoked. The simplest UDF can just 
extend EvalFunc, which requires only the exec function to be implemented, as 
described in the How to Write a Simple Eval Function section. Every eval UDF 
must implement this. Additionally, if a function is algebraic, it can implement 
the Algebraic interface to significantly improve query performance in cases 
where the combiner can be used. The Aggregate Functions section covers this 
topic in detail. Finally, a function that can process tuples incrementally can 
also implement the Accumulator interface to reduce query memory consumption. 
The Accumulator Interface section explains this interface.

The exact method by which a UDF is invoked is selected by the optimizer based on 
the UDF type and the query. Note that only a single interface is used at any 
given time. The optimizer tries to find the most efficient way to execute the 
function. If the combiner is used and the function implements the Algebraic 
interface, then that interface is used to invoke the function. If the combiner 
is not invoked but the accumulator can be used and the function implements the 
Accumulator interface, then that interface is used. If neither condition is 
satisfied, then the exec function is used to invoke the UDF.
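
The precedence described in this note can be written down as a small decision function. The Python below is only a reading of the paragraph above, not the optimizer's code: `select_invocation` and its parameters are hypothetical names, and the second branch follows the text literally in requiring that the combiner is not used.

```python
def select_invocation(implements, combiner_used, accumulator_usable):
    """Return which entry point is used, following the precedence in the note."""
    if combiner_used and "Algebraic" in implements:
        return "Algebraic"
    if not combiner_used and accumulator_usable and "Accumulator" in implements:
        return "Accumulator"
    return "exec"


both = {"Algebraic", "Accumulator"}
print(select_invocation(both, combiner_used=True, accumulator_usable=True))
# Algebraic
print(select_invocation(both, combiner_used=False, accumulator_usable=True))
# Accumulator
print(select_invocation({"Accumulator"}, combiner_used=False, accumulator_usable=False))
# exec
```

A UDF that implements several interfaces therefore still runs through exactly one of them for a given query.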


Can one of the developers review the release notes to make sure they are 
accurate? Thanks.

 Explain how different UDF interfaces are used
 -

 Key: PIG-1613
 URL: https://issues.apache.org/jira/browse/PIG-1613
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.7.0
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.8.0


 The current documentation describes individual UDF interfaces such as 
 Algebraic and Accumulator but not their precedence or how they interact with 
 each other and why you might want to implement several of them.
 Corinne, I will add release notes to this JIRA shortly. Don't worry about it 
 till then.




[jira] Updated: (PIG-1578) PigServer.executeBatch does not return status of failed job for native mapreduce statement

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1578:


Fix Version/s: (was: 0.8.0)

 PigServer.executeBatch does not return status of failed job for native 
 mapreduce statement
 --

 Key: PIG-1578
 URL: https://issues.apache.org/jira/browse/PIG-1578
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Richard Ding

 For a failed job, PigServer.executeBatch does not return an ExecJob. 
 ExecJobs are created using output statistics, and the output statistics for 
 jobs that failed do not seem to exist.
 The query I tried was a native mapreduce job, where the output file of the 
 native MR job already exists, causing that job to fail.
 {code}
 A = load ' + INPUT_FILE + ';
 B = mapreduce ' + jarFileName + '  +
 Store A into 'table_testNativeMRJobSimple_input' +
 Load 'table_testNativeMRJobSimple_output' +
 `WordCount table_testNativeMRJobSimple_input  + INPUT_FILE + 
 `;);
 Store B into 'table_testNativeMRJobSimpleDir';);
 {code}




[jira] Resolved: (PIG-815) misleading error message when streaming fails

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-815.


Resolution: Won't Fix

I don't think we have sufficient information to act on this

 misleading error message when streaming fails
 -

 Key: PIG-815
 URL: https://issues.apache.org/jira/browse/PIG-815
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.9.0


 One of the users reported seeing a confusing message: Jobs not found in the 
 JobClient. Please try to use Local, Hadoop Distributed or Hadoop MiniCluster 
 modes instead of Hadoop LocalExecution ERROR 2055: Received Error while 
 processing the map plan: 'process.pl ' failed with exit status: 255 




[jira] Updated: (PIG-638) error handling - enforce error codes

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-638:
---

Fix Version/s: (was: 0.9.0)

 error handling - enforce error codes
 

 Key: PIG-638
 URL: https://issues.apache.org/jira/browse/PIG-638
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Santhosh Srinivasan

 We should not allow exceptions that don't set error code as that kind of 
 information is not helpful for users.




[jira] Updated: (PIG-1017) Converts strings to text in Pig

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1017:


Assignee: Thejas M Nair  (was: Sriranjan Manjunath)

We need to decide if this is something we should do for 0.9

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
Assignee: Thejas M Nair
 Fix For: 0.9.0

 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per char. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.
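
The size difference between the two encodings is easy to check with Python's codecs. This sketch compares only the encoded payload; it assumes nothing about object headers or other per-object overhead in Java's String or Hadoop's Text, and `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(s):
    """Return (utf16_bytes, utf8_bytes) for s, without a byte-order mark."""
    return len(s.encode("utf-16-le")), len(s.encode("utf-8"))


print(encoded_sizes("hadoop"))       # (12, 6)
print(encoded_sizes("na\u00efve"))   # (10, 6)
```

For mostly-ASCII data the UTF-8 payload is about half the size; text dominated by characters that need three UTF-8 bytes would not see that saving.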




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908984#action_12908984
 ] 

Olga Natkovich commented on PIG-366:


I think it used to use true local mode in Pig. However, we no longer support 
this and the new version needs to be connected to the current local mode in Pig, 
which is basically Hadoop's local mode.

 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Robert Gibbon
Priority: Minor
 Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
 org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
 org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
 org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI that can help users create 
 PigLatin scripts and see the example generator outputs on the fly and submit 
 the jobs to hadoop clusters.




[jira] Commented: (PIG-1600) Pig 080 Documentation

2010-09-10 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908139#action_12908139
 ] 

Olga Natkovich commented on PIG-1600:
-

I have reviewed the patch and will be committing it to trunk and 0.7 branch as 
soon as I have a successful doc build. Thanks, Corinne!

 Pig 080 Documentation
 -

 Key: PIG-1600
 URL: https://issues.apache.org/jira/browse/PIG-1600
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.8.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Fix For: 0.8.0

 Attachments: pig080-1.patch, pig080-2-2.patch, pig080-2.patch


 Pig 080 documentation  - new features, updates, and fixes.




[jira] Commented: (PIG-1600) Pig 080 Documentation

2010-09-10 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908150#action_12908150
 ] 

Olga Natkovich commented on PIG-1600:
-

pig080-2-2.patch committed to both trunk and 0.8 branch

 Pig 080 Documentation
 -

 Key: PIG-1600
 URL: https://issues.apache.org/jira/browse/PIG-1600
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.8.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Fix For: 0.8.0

 Attachments: pig080-1.patch, pig080-2-2.patch, pig080-2.patch


 Pig 080 documentation  - new features, updates, and fixes.




[jira] Updated: (PIG-1606) flatten documentation does not discuss flatten of empty bag

2010-09-10 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1606:


Fix Version/s: 0.9.0

 flatten documentation does not discuss flatten of empty bag
 ---

 Key: PIG-1606
 URL: https://issues.apache.org/jira/browse/PIG-1606
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
 Fix For: 0.9.0


 From the existing flatten documentation, it is not clear that flatten of an 
 empty bag results in that row being discarded.
 For example, the following query gives no output:
 {code}
 grunt> cat /tmp/empty.bag
 {}  1
 grunt> l = load '/tmp/empty.bag' as (b : bag{}, i : int);
 grunt> f = foreach l generate flatten(b), i;
 grunt> dump f;
 grunt>
 {code}




[jira] Commented: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-10 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908249#action_12908249
 ] 

Olga Natkovich commented on PIG-1608:
-

pig-default is the only one we include. The other one is for users.

 pig should always include pig-default.properties and pig.properties in the 
 pig.jar
 --

 Key: PIG-1608
 URL: https://issues.apache.org/jira/browse/PIG-1608
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai

 pig should always include pig-default.properties and pig.properties as a part 
 of the pig.jar file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1606) flatten documentation does not discuss flatten of empty bag

2010-09-10 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908250#action_12908250
 ] 

Olga Natkovich commented on PIG-1606:
-

Is this even the semantics we want? I would expect a single row with an empty 
field.

 flatten documentation does not discuss flatten of empty bag
 ---

 Key: PIG-1606
 URL: https://issues.apache.org/jira/browse/PIG-1606
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
 Fix For: 0.9.0


 From the existing flatten documentation, it is not clear that flatten of an 
 empty bag results in that row being discarded.
 For example, the following query gives no output:
 {code}
 grunt> cat /tmp/empty.bag
 {}  1
 grunt> l = load '/tmp/empty.bag' as (b : bag{}, i : int);
 grunt> f = foreach l generate flatten(b), i;
 grunt> dump f;
 grunt>
 {code}




[jira] Commented: (PIG-1518) multi file input format for loaders

2010-09-09 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907805#action_12907805
 ] 

Olga Natkovich commented on PIG-1518:
-

Hi Justin, thanks for the patch!

I don't think we can commit it to 0.7 patch because we have already done the 
official 0.7 release and we can't introduce non-backward compatible changes to 
this branch.

However, I think it is great to have the patch on the JIRA so that anybody who 
is interested in this patch can apply it to their own tree and run with it. We 
have done similar things in the past (with hadoop versions) and it worked fine.

 multi file input format for loaders
 ---

 Key: PIG-1518
 URL: https://issues.apache.org/jira/browse/PIG-1518
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1518-0.7.0.patch, PIG-1518.patch, PIG-1518.patch, 
 PIG-1518.patch, PIG-1518.patch, PIG-1518.patch, PIG-1518.patch, 
 PIG-1518.patch, PIG-1518.patch


 We frequently run into the situation where Pig needs to deal with small files 
 in the input. In this case a separate map is created for each file, which 
 could be very inefficient. 
 It would be great to have an umbrella input format that can take multiple 
 files and use them in a single split. We would like to see this working with 
 different data formats if possible.
 There are already a couple of input formats doing a similar thing: 
 MultifileInputFormat as well as CombinedInputFormat; however, neither works 
 with the new Hadoop 20 API. 
 We at least want to do a feasibility study for Pig 0.8.0.
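
As a rough illustration of the idea, the Python sketch below greedily packs small files into combined splits. It is an assumption about one possible packing strategy, not the design adopted in the attached patches; `combine_into_splits` and the file names are hypothetical.

```python
def combine_into_splits(file_sizes, max_split_bytes):
    """Greedily pack (name, size) pairs into splits of at most max_split_bytes."""
    splits, current, current_bytes = [], [], 0
    for name, size in file_sizes:
        # Start a new split when adding this file would exceed the budget.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


files = [("part-0", 10), ("part-1", 20), ("part-2", 90), ("part-3", 5), ("part-4", 5)]
print(combine_into_splits(files, 100))
# [['part-0', 'part-1'], ['part-2', 'part-3', 'part-4']]
```

Five input files end up in two splits, so two maps would run instead of five. A file larger than the budget still gets a split of its own in this model.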




[jira] Commented: (PIG-1600) Pig 080 Documentation

2010-09-03 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906094#action_12906094
 ] 

Olga Natkovich commented on PIG-1600:
-

patch committed to 0.8 branch; trunk is next

 Pig 080 Documentation
 -

 Key: PIG-1600
 URL: https://issues.apache.org/jira/browse/PIG-1600
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.8.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Attachments: pig080-1.patch


 Pig 080 documentation  - new features, updates, and fixes.




[jira] Commented: (PIG-1600) Pig 080 Documentation

2010-09-03 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906107#action_12906107
 ] 

Olga Natkovich commented on PIG-1600:
-

patch committed to the trunk as well. thanks, corinne!

 Pig 080 Documentation
 -

 Key: PIG-1600
 URL: https://issues.apache.org/jira/browse/PIG-1600
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.8.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Fix For: 0.8.0

 Attachments: pig080-1.patch


 Pig 080 documentation  - new features, updates, and fixes.




[jira] Updated: (PIG-1600) Pig 080 Documentation

2010-09-03 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1600:


Fix Version/s: 0.8.0

 Pig 080 Documentation
 -

 Key: PIG-1600
 URL: https://issues.apache.org/jira/browse/PIG-1600
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.8.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Fix For: 0.8.0

 Attachments: pig080-1.patch


 Pig 080 documentation  - new features, updates, and fixes.




[jira] Commented: (PIG-1544) proactive-spill bags should share the memory alloted for it

2010-09-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905628#action_12905628
 ] 

Olga Natkovich commented on PIG-1544:
-

I am going to take my previous comment back and say that we should make this 
work for UDFs as well. The main reason for this is that we don't have another 
way to make sure that UDFs do not run out of memory. One approach that Alan 
proposed was to have bags ask for memory when they are created and to have a 
central broker in charge of the memory pool. The details of this, and whether 
there is a better approach, still need to be thought through.

 proactive-spill bags should share the memory alloted for it
 ---

 Key: PIG-1544
 URL: https://issues.apache.org/jira/browse/PIG-1544
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair

 Initially proactive spill bags were designed for use in (co)group 
 (InternalCacheBag) and they knew the total number of proactive bags that were 
 present, and shared the memory limit specified using the property 
 pig.cachedbag.memusage.
 But the two proactive bag implementations that were added later, 
 InternalDistinctBag and InternalSortedBag, are not aware of the actual number of 
 bags being used; their users always assume total-numbags = 3. 
 This needs to be fixed and all proactive-spill bags should share the 
 memory limit.
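
A back-of-the-envelope Python model of the problem. The formula, the 10% fraction and the heap size are assumptions chosen for illustration, not values taken from Pig's source; `per_bag_limit` is a hypothetical helper.

```python
def per_bag_limit(max_heap_bytes, memusage_percent, num_bags):
    """Bytes each proactive-spill bag may hold before it spills."""
    return max_heap_bytes * memusage_percent // (100 * max(num_bags, 1))


heap = 1_000_000_000
actual_bags = 30

# Each bag sized as if only 3 bags existed, but 30 are alive at once.
assumed_total = per_bag_limit(heap, 10, 3) * actual_bags
# Each bag sized from the real number of bags.
shared_total = per_bag_limit(heap, 10, actual_bags) * actual_bags
print(assumed_total)  # 999999990
print(shared_total)   # 99999990
```

With the fixed assumption of 3 bags, the 30 bags together may hold almost the whole heap instead of the intended 10%.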




[jira] Updated: (PIG-1544) proactive-spill bags should share the memory alloted for it

2010-09-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1544:


 Assignee: Thejas M Nair
Fix Version/s: 0.9.0

 proactive-spill bags should share the memory alloted for it
 ---

 Key: PIG-1544
 URL: https://issues.apache.org/jira/browse/PIG-1544
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.9.0


 Initially proactive spill bags were designed for use in (co)group 
 (InternalCacheBag) and they knew the total number of proactive bags that were 
 present, and shared the memory limit specified using the property 
 pig.cachedbag.memusage.
 But the two proactive bag implementations that were added later, 
 InternalDistinctBag and InternalSortedBag, are not aware of the actual number of 
 bags being used; their users always assume total-numbags = 3. 
 This needs to be fixed and all proactive-spill bags should share the 
 memory limit.




[jira] Updated: (PIG-1309) Sort Merge Cogroup

2010-09-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1309:


Summary: Sort Merge Cogroup  (was: Map-side Cogroup)

 Sort Merge Cogroup
 --

 Key: PIG-1309
 URL: https://issues.apache.org/jira/browse/PIG-1309
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0, 0.8.0

 Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch, 
 PIG_1309_7.patch


 In the never-ending quest to make Pig go faster, we want to parallelize as many 
 relational operations as possible. It's already possible to do Group-by 
 (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This jira 
 is to add a map-side implementation of Cogroup in Pig. Details to follow.




[jira] Commented: (PIG-1550) better error handling in casting relations to scalars

2010-09-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905717#action_12905717
 ] 

Olga Natkovich commented on PIG-1550:
-

I will review the patch


 better error handling in casting relations to scalars
 -

 Key: PIG-1550
 URL: https://issues.apache.org/jira/browse/PIG-1550
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1550.1.patch


 I ran the following script:
 Input data:
 joe 100
 sam 20
 bob 134
 Script:
 A = load 'user_clicks' as (user: chararray, clicks: int);
 B = group A by user;
 C = foreach B generate group, SUM(A.clicks);
 D = foreach A generate clicks/(double)C.$1;
 dump C;
 Since C contains more than 1 tuple, I expected to get an error which I did. 
 However, the error was not very clear. When the job failed, I did see a valid 
 error (however it lacked the error code): 210630 [main] ERROR 
 org.apache.pig.tools.pigstats.PigStats  - ERROR 0: Scalar has more than one 
 row in the output
  However at the end of processing, I saw a misleading error:
 210709 [main] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 2088: Unable to 
 get results for: 
 hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/tmp/temp818551960/tmp1063730945:org.apache.pig.impl.io.InterStorage
 10/08/19 17:16:22 ERROR grunt.Grunt: ERROR 2088: Unable to get results for: 
 hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/tmp/temp818551960/tmp1063730945:org.apache.pig.impl.io.InterStorage




[jira] Commented: (PIG-1550) better error handling in casting relations to scalars

2010-09-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905731#action_12905731
 ] 

Olga Natkovich commented on PIG-1550:
-

+1, looks good

 better error handling in casting relations to scalars
 -

 Key: PIG-1550
 URL: https://issues.apache.org/jira/browse/PIG-1550
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1550.1.patch


 I ran the following script:
 Input data:
 joe 100
 sam 20
 bob 134
 Script:
 A = load 'user_clicks' as (user: chararray, clicks: int);
 B = group A by user;
 C = foreach B generate group, SUM(A.clicks);
 D = foreach A generate clicks/(double)C.$1;
 dump C;
 Since C contains more than 1 tuple, I expected to get an error which I did. 
 However, the error was not very clear. When the job failed, I did see a valid 
 error (however it lacked the error code): 210630 [main] ERROR 
 org.apache.pig.tools.pigstats.PigStats  - ERROR 0: Scalar has more than one 
 row in the output
  However at the end of processing, I saw a misleading error:
 210709 [main] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 2088: Unable to 
 get results for: 
 hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/tmp/temp818551960/tmp1063730945:org.apache.pig.impl.io.InterStorage
 10/08/19 17:16:22 ERROR grunt.Grunt: ERROR 2088: Unable to get results for: 
 hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/tmp/temp818551960/tmp1063730945:org.apache.pig.impl.io.InterStorage




[jira] Updated: (PIG-1594) NullPointerException in new logical planner

2010-09-01 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1594:


 Assignee: Daniel Dai
Fix Version/s: 0.8.0

 NullPointerException in new logical planner
 ---

 Key: PIG-1594
 URL: https://issues.apache.org/jira/browse/PIG-1594
 Project: Pig
  Issue Type: Bug
Reporter: Andrew Hitchcock
Assignee: Daniel Dai
 Fix For: 0.8.0


 I've been testing the trunk version of Pig on Elastic MapReduce against our 
 log processing sample application(1). When I try to run the query it throws a 
 NullPointerException and suggests I disable the new logical plan. Disabling 
 it works and the script succeeds. Here is the query I'm trying to run:
 {code}
 register file:/home/hadoop/lib/pig/piggybank.jar
   DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
   RAW_LOGS = LOAD '$INPUT' USING TextLoader as (line:chararray);
   LOGS_BASE= foreach RAW_LOGS generate FLATTEN(EXTRACT(line, '^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) as (remoteAddr:chararray, remoteLogname:chararray, user:chararray, time:chararray, request:chararray, status:int, bytes_string:chararray, referrer:chararray, browser:chararray);
   REFERRER_ONLY = FOREACH LOGS_BASE GENERATE referrer;
   FILTERED = FILTER REFERRER_ONLY BY referrer matches '.*bing.*' OR referrer matches '.*google.*';
   SEARCH_TERMS = FOREACH FILTERED GENERATE FLATTEN(EXTRACT(referrer, '.*[&\\?]q=([^&]+).*')) as terms:chararray;
   SEARCH_TERMS_FILTERED = FILTER SEARCH_TERMS BY NOT $0 IS NULL;
   SEARCH_TERMS_COUNT = FOREACH (GROUP SEARCH_TERMS_FILTERED BY $0) GENERATE $0, COUNT($1) as num;
   SEARCH_TERMS_COUNT_SORTED = LIMIT(ORDER SEARCH_TERMS_COUNT BY num DESC) 50;
   STORE SEARCH_TERMS_COUNT_SORTED into '$OUTPUT';
 {code}
 And here is the stack trace that results:
 {code}
 ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2042: Error in 
 new logical plan. Try -Dpig.usenewlogicalplan=false.
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:285)
 at org.apache.pig.PigServer.compilePp(PigServer.java:1301)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1154)
 at org.apache.pig.PigServer.execute(PigServer.java:1148)
 at org.apache.pig.PigServer.access$100(PigServer.java:123)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:1464)
 at org.apache.pig.PigServer.executeBatchEx(PigServer.java:350)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:111)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
 at org.apache.pig.Main.run(Main.java:491)
 at org.apache.pig.Main.main(Main.java:107)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.NullPointerException
 at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76)
 at 
 org.apache.pig.piggybank.impl.ErrorCatchingBase.outputSchema(ErrorCatchingBase.java:76)
 at 
 org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:111)
 at 
 org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:175)
 at 
 org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143)
 at 
 org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:55)
 at 
 org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:69)
 at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
 at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:87)
 at 
 org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:149)
 at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74)
 at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:76)
 at 
 

[jira] Updated: (PIG-1199) help includes obsolete options

2010-09-01 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1199:


Release Note: 
Help now takes the properties keyword to show all Java properties supported by Pig:

The following properties are supported:
Logging:
verbose=true|false; default is false. This property is the same as -v 
switch
brief=true|false; default is false. This property is the same as -b 
switch
debug=OFF|ERROR|WARN|INFO|DEBUG; default is INFO. This property is the 
same as -d switch
...

 help includes obsolete options
 --

 Key: PIG-1199
 URL: https://issues.apache.org/jira/browse/PIG-1199
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1199.patch, PIG-1199_2.patch


 This is confusing to users




[jira] Commented: (PIG-1585) Add new properties to help and documentation

2010-09-01 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905323#action_12905323
 ] 

Olga Natkovich commented on PIG-1585:
-

Since this is just a minor cosmetic patch, I am just planning to commit the 
changes to both the branch and the trunk without tests and review.

 Add new properties to help and documentation
 

 Key: PIG-1585
 URL: https://issues.apache.org/jira/browse/PIG-1585
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1585.patch


 New properties:
 Compression:
 pig.tmpfilecompression, default to false, tells if the temporary files should 
 be compressed or not. If true, then 
 pig.tmpfilecompression.codec specifies which compression codec to use. 
 Currently, PIG only accepts gz and lzo as possible values. Since LZO is 
 under GPL license, Hadoop may need to be configured to use LZO codec. Please 
 refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for 
 details. 
 Combining small files:
 pig.noSplitCombination - disables combining multiple small files to the block 
 size
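
As an illustration of how the two compression properties relate, here is a small sketch that builds command-line flags for them. The helper name `tmpfile_compression_flags` is hypothetical, and passing properties as -D flags is an assumption based on the -Dpig.usenewlogicalplan=false hint in the PIG-1594 report elsewhere in this archive. The allowed codec values (gz, lzo) come from the description above.

```python
ALLOWED_CODECS = {"gz", "lzo"}  # the only values the issue text names


def tmpfile_compression_flags(enabled, codec=None):
    """Build -D flags for the temp-file compression properties (hypothetical helper)."""
    flags = ["-Dpig.tmpfilecompression=" + ("true" if enabled else "false")]
    if enabled:
        # The codec property only matters when compression is switched on.
        if codec not in ALLOWED_CODECS:
            raise ValueError("codec must be one of: " + ", ".join(sorted(ALLOWED_CODECS)))
        flags.append("-Dpig.tmpfilecompression.codec=" + codec)
    return flags


flags = tmpfile_compression_flags(True, "gz")
```

The codec is validated only when compression is enabled, which mirrors the "if true, then" wording of the description.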




[jira] Updated: (PIG-1585) Add new properties to help and documentation

2010-09-01 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1585:


Attachment: PIG-1585.patch

 Add new properties to help and documentation
 

 Key: PIG-1585
 URL: https://issues.apache.org/jira/browse/PIG-1585
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1585.patch


 New properties:
 Compression:
 pig.tmpfilecompression, default to false, tells if the temporary files should 
 be compressed or not. If true, then 
 pig.tmpfilecompression.codec specifies which compression codec to use. 
 Currently, PIG only accepts gz and lzo as possible values. Since LZO is 
 under GPL license, Hadoop may need to be configured to use LZO codec. Please 
 refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for 
 details. 
 Combining small files:
 pig.noSplitCombination - disables combining multiple small files to the block 
 size




[jira] Resolved: (PIG-1585) Add new properties to help and documentation

2010-09-01 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1585.
-

Resolution: Fixed

Patch committed to both trunk and the 0.8 branch. I also added 
LogicalExpressionSimplifier to the help.

 Add new properties to help and documentation
 

 Key: PIG-1585
 URL: https://issues.apache.org/jira/browse/PIG-1585
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1585.patch


 New properties:
 Compression:
 pig.tmpfilecompression, default to false, tells if the temporary files should 
 be compressed or not. If true, then 
 pig.tmpfilecompression.codec specifies which compression codec to use. 
 Currently, PIG only accepts gz and lzo as possible values. Since LZO is 
 under GPL license, Hadoop may need to be configured to use LZO codec. Please 
 refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for 
 details. 
 Combining small files:
 pig.noSplitCombination - disables combining multiple small files to the block 
 size




[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1429:


Fix Version/s: (was: 0.8.0)

Unlinking because we are branching for release today

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Attachments: working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  PIG-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Updated: (PIG-1549) Provide utility to construct CNF form of predicates

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1549:


Fix Version/s: (was: 0.8.0)

Unlinking from 0.8 release since we are about to branch

 Provide utility to construct CNF form of predicates
 ---

 Key: PIG-1549
 URL: https://issues.apache.org/jira/browse/PIG-1549
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Attachments: 0001-Add-CNF-utility-class.patch


 Provide utility to construct CNF form of predicates




[jira] Resolved: (PIG-1530) PIG Logical Optimization: Push LOFilter above LOCogroup

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1530.
-

Resolution: Duplicate

Xuefu is addressing this issue as part of 
https://issues.apache.org/jira/browse/PIG-1575.

  PIG Logical Optimization: Push LOFilter above LOCogroup
 

 Key: PIG-1530
 URL: https://issues.apache.org/jira/browse/PIG-1530
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Swati Jain
Assignee: Swati Jain
Priority: Minor
 Fix For: 0.8.0


 Consider the following:
 {noformat}
 A = load 'any file' USING PigStorage(',') as (a1:int,a2:int,a3:int);
 B = load 'any file' USING PigStorage(',') as (b1:int,b2:int,b3:int);
 G = COGROUP A by (a1,a2) , B by (b1,b2);
 D = Filter G by group.$0 + 5  group.$1;
 explain D;
 {noformat}
 In the above example, LOFilter can be pushed above LOCogroup. Note there are 
 some tricky NULL issues to think about when the Cogroup is not of type INNER 
 (Similar to issues that need to be thought through when pushing LOFilter on 
 the right side of a LeftOuterJoin).
 Also note that typically the LOFilter in user programs will be below a 
 ForEach-Cogroup pair. To make this really useful, we need to also implement 
 LOFilter pushed across ForEach. 




[jira] Updated: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1494:



Unlinking from 0.8 since we are about to branch for release

 PIG Logical Optimization: Use CNF in PushUpFilter
 -

 Key: PIG-1494
 URL: https://issues.apache.org/jira/browse/PIG-1494
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Swati Jain
Assignee: Swati Jain
Priority: Minor

 The PushUpFilter rule is not able to handle complicated boolean expressions.
 For example, SplitFilter rule is splitting one LOFilter into two by AND. 
 However it will not be able to split LOFilter if the top level operator is 
 OR. For example:
 *ex script:*
 A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
 B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
 C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
 J1 = JOIN B by b1, C by c1;
 J2 = JOIN J1 by $0, A by a1;
 D = *Filter J2 by ( (c1  10) AND (a3+b3  10) ) OR (c2 == 5);*
 explain D;
 In the above example, the PushUpFilter is not able to push any filter 
 condition across any join as it contains columns from all branches (inputs). 
 But if we convert this expression into Conjunctive Normal Form (CNF) then 
 we would be able to push filter condition c1 10 and c2 == 5 below both join 
 conditions. Here is the CNF expression for highlighted line:
 ( (c1  10) OR (c2 == 5) ) AND ( (a3+b3  10) OR (c2 ==5) )
 *Suggestion:* It would be a good idea to convert LOFilter's boolean 
 expression into CNF, it would then be easy to push parts (conjuncts) of the 
 LOFilter boolean expression selectively. We would also not require rule 
 SplitFilter anymore if we were to add this utility to rule PushUpFilter 
 itself.
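
The suggested conversion can be sketched as below. This is a minimal illustration, not the attached patch or any Pig class: expressions are modeled as nested tuples, the function names are hypothetical, NOT is not handled, and the atoms P, Q and R stand in for the three comparisons of the example filter (whose operators are not reproduced here).

```python
def to_cnf(expr):
    """Convert an AND/OR tree to CNF.

    expr is an atom (str) or a tuple ('and' | 'or', left, right).
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = to_cnf(left), to_cnf(right)
    if op == "and":
        return ("and", left, right)
    # op == "or": distribute OR over an AND found on either side.
    if isinstance(left, tuple) and left[0] == "and":
        return ("and",
                to_cnf(("or", left[1], right)),
                to_cnf(("or", left[2], right)))
    if isinstance(right, tuple) and right[0] == "and":
        return ("and",
                to_cnf(("or", left, right[1])),
                to_cnf(("or", left, right[2])))
    return ("or", left, right)


def conjuncts(expr):
    """Flatten the top-level ANDs into a list of independently pushable parts."""
    if isinstance(expr, tuple) and expr[0] == "and":
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]


# (P AND Q) OR R, the shape of the filter in the example script
filter_expr = ("or", ("and", "P", "Q"), "R")
cnf = to_cnf(filter_expr)
parts = conjuncts(cnf)
```

Each element of `parts` is a conjunct that an optimizer rule could examine on its own; a conjunct that only references columns from one join input is a candidate for pushing.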




[jira] Commented: (PIG-1506) Need to clarify the difference between null handling in JOIN and COGROUP

2010-08-31 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904785#action_12904785
 ] 

Olga Natkovich commented on PIG-1506:
-

This is what we need to document:

In the case of GROUP/COGROUP, the data with NULL key from the same input is 
grouped together. For instance:

Input data:

joe 5   2.5
sam 3.0
bob 3.5

script:

A = load 'small' as (name, age, gpa);
B = group A by age;
dump B;

Output:

(5,{(joe,5,2.5)})
(,{(sam,,3.0),(bob,,3.5)})

Note that both records with null age are grouped together.

However, data with null keys from different inputs is considered different and 
will generate multiple tuples in case of cogroup. For instance:

Input: Self cogroup on the same input.

Script:

A = load 'small' as (name, age, gpa);
B = load 'small' as (name, age, gpa);
C = cogroup A by age, B by age;
dump C;

Output:

(5,{(joe,5,2.5)},{(joe,5,2.5)})
(,{(sam,,3.0),(bob,,3.5)},{})
(,{},{(sam,,3.0),(bob,,3.5)})

Note that there are 2 tuples in the output corresponding to the null key: one 
that contains tuples from the first input (with no match from the second) and 
one the other way around.

JOIN adds another interesting twist to this because it follows the SQL standard, 
which means that JOIN by default represents an inner join, which throws away all 
the nulls.

Input: the same as for COGROUP

Script:

A = load 'small' as (name, age, gpa);
B = load 'small' as (name, age, gpa);
C = join A by age, B by age;
dump C;

Output:

(joe,5,2.5,joe,5,2.5)

Note that all tuples that had NULL key got filtered out.
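
The three behaviours described above can be mimicked with a short Python model. This is only a sketch of the semantics as stated in this comment, not of how Pig implements them; the function names are illustrative and None stands in for a null key.

```python
rows = [("joe", 5, 2.5), ("sam", None, 3.0), ("bob", None, 3.5)]


def group(rel, key):
    """GROUP: rows with a null key from the same input land in one group."""
    out = {}
    for row in rel:
        out.setdefault(row[key], []).append(row)
    return out


def cogroup(a, b, key):
    """COGROUP: null keys from different inputs are never matched to each other."""
    ga, gb = group(a, key), group(b, key)
    keys = {k for k in list(ga) + list(gb) if k is not None}
    result = [(k, ga.get(k, []), gb.get(k, [])) for k in sorted(keys)]
    if None in ga:
        result.append((None, ga[None], []))
    if None in gb:
        result.append((None, [], gb[None]))
    return result


def inner_join(a, b, key):
    """JOIN (inner by default): rows with a null key are dropped."""
    return [ra + rb for ra in a for rb in b
            if ra[key] is not None and ra[key] == rb[key]]


grouped = group(rows, 1)
cogrouped = cogroup(rows, rows, 1)
joined = inner_join(rows, rows, 1)
```

`cogrouped` has two entries for the null key, one per input, while `joined` keeps only the joe row, matching the outputs shown above.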


 Need to clarify the difference between null handling in JOIN and COGROUP
 

 Key: PIG-1506
 URL: https://issues.apache.org/jira/browse/PIG-1506
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.8.0







[jira] Created: (PIG-1584) deal with inner cogroup

2010-08-31 Thread Olga Natkovich (JIRA)
deal with inner cogroup
---

 Key: PIG-1584
 URL: https://issues.apache.org/jira/browse/PIG-1584
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
 Fix For: 0.9.0


The current implementation of inner in the case of cogroup conflicts with 
join. We need to decide whether to fix inner cogroup or just remove the 
functionality if it is not widely used.




[jira] Commented: (PIG-1506) Need to clarify the difference between null handling in JOIN and COGROUP

2010-08-31 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904829#action_12904829
 ] 

Olga Natkovich commented on PIG-1506:
-

I verified that 0.8 code does deal correctly with multi-column keys with nulls

 Need to clarify the difference between null handling in JOIN and COGROUP
 

 Key: PIG-1506
 URL: https://issues.apache.org/jira/browse/PIG-1506
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.8.0







[jira] Created: (PIG-1585) Add new properties to help and documentation

2010-08-31 Thread Olga Natkovich (JIRA)
Add new properties to help and documentation


 Key: PIG-1585
 URL: https://issues.apache.org/jira/browse/PIG-1585
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0


New properties:

Compression:

pig.tmpfilecompression, default to false, tells if the temporary files should 
be compressed or not. If true, then 
pig.tmpfilecompression.codec specifies which compression codec to use. 
Currently, PIG only accepts gz and lzo as possible values. Since LZO is 
under GPL license, Hadoop may need to be configured to use LZO codec. Please 
refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for details. 

Combining small files:

pig.noSplitCombination - disables combining multiple small files to the block 
size





[jira] Commented: (PIG-1501) need to investigate the impact of compression on pig performance

2010-08-31 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904848#action_12904848
 ] 

Olga Natkovich commented on PIG-1501:
-

Ashutosh,

The reason it is off by default is that the default compression is gzip, 
which is really slow and most of the time not what you want. Because of the 
licensing issue with lzo, users need to set it up on their own. Once they do the 
setup, they can enable the compression.

 need to investigate the impact of compression on pig performance
 

 Key: PIG-1501
 URL: https://issues.apache.org/jira/browse/PIG-1501
 Project: Pig
  Issue Type: Test
Reporter: Olga Natkovich
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: compress_perf_data.txt, compress_perf_data_2.txt, 
 PIG-1501.patch, PIG-1501.patch, PIG-1501.patch


 We would like to understand how compressing map results as well as 
 reducer output in a chain of MR jobs impacts performance. We can use PigMix 
 queries for this investigation.




[jira] Assigned: (PIG-1586) Parameter subsitution using -param option runs into problems when substituing entire pig statements in a shell script (maybe this is a bash problem)

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1586:
---

Assignee: Viraj Bhat

Viraj volunteered to print the line that pig gets as part of parameter 
substitution to see if the escapes and quotes are eaten by the shell. Thanks 
Viraj

 Parameter subsitution using -param option runs into problems when substituing 
 entire pig statements in a shell script (maybe this is a bash problem)
 

 Key: PIG-1586
 URL: https://issues.apache.org/jira/browse/PIG-1586
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Viraj Bhat
Assignee: Viraj Bhat

 I have a Pig script as a template:
 {code}
 register Countwords.jar;
 A = $INPUT;
 B = FOREACH A GENERATE
 examples.udf.SubString($0,0,1),
 $1 as num;
 C = GROUP B BY $0;
 D = FOREACH C GENERATE group, SUM(B.num);
 STORE D INTO $OUTPUT;
 {code}
 I attempt to do Parameter substitutions using the following:
 Using Shell script:
 {code}
 #!/bin/bash
 java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -r 
 -file sub.pig \
  -param INPUT=(foreach (COGROUP(load '/user/viraj/dataset1' 
 USING PigStorage() AS (word:chararray,num:int)) by (word),(load 
 '/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by 
 (word)) generate flatten(examples.udf.CountWords(\\$0,\\$1,\\$2))) \
  -param OUTPUT=\'/user/viraj/output\' USING PigStorage()
 {code}
 {code}
 register Countwords.jar;
 A = (foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS 
 (word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING 
 PigStorage() AS (word:chararray,num:int)) by (word)) generate 
 flatten(examples.udf.CountWords(runsub.sh,,)));
 B = FOREACH A GENERATE
 examples.udf.SubString($0,0,1),
 $1 as num;
 C = GROUP B BY $0;
 D = FOREACH C GENERATE group, SUM(B.num);
 STORE D INTO /user/viraj/output;
 {code}
 The shell substitutes the $0 before passing it to java. 
 a) Is there a workaround for this?  
 b) Is this a Pig param problem?
 Viraj
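
One way to check the shell layer in isolation is to quote the parameter value so that the shell passes it through untouched. The sketch below uses Python's standard `shlex` module; the helper name `param_arg` and the shortened value are illustrative. It addresses only what the shell does to the value; whether Pig's own parameter preprocessor also needs `$` escaped is the separate question raised in PIG-1588.

```python
import shlex


def param_arg(name, value):
    """Return a shell-safe '-param NAME=VALUE' fragment (hypothetical helper)."""
    # shlex.quote wraps the text in single quotes when it contains characters
    # the shell would interpret, so $0, parentheses and spaces survive as-is.
    return "-param " + shlex.quote(name + "=" + value)


value = "CountWords($0,$1,$2)"
fragment = param_arg("INPUT", value)

# shlex.split approximates POSIX shell word splitting, so this shows what
# the program would receive as its arguments.
received = shlex.split(fragment)
```

Printing `fragment` gives a string that can be pasted into the bash script in place of the unquoted -param argument.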




[jira] Resolved: (PIG-1588) Parameter pre-processing of values containing pig positional variables ($0, $1 etc)

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1588.
-

Resolution: Duplicate

This is a duplicate of https://issues.apache.org/jira/browse/PIG-1586, and at this 
point we do not believe that either is a bug in Pig. Viraj is verifying that, 
but we think that the shell removes the escapes before giving the value to Pig.

 Parameter pre-processing of values containing pig positional variables ($0, 
 $1 etc)
 ---

 Key: PIG-1588
 URL: https://issues.apache.org/jira/browse/PIG-1588
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Laukik Chitnis
 Fix For: 0.7.0


 Pig 0.7 requires the positional variables to be escaped by a \\ when passed 
 as part of a parameter value (either through a command-line param or through 
 a param_file), which was not the case in Pig 0.6. Assuming that this was not an 
 intended breakage of backward compatibility (I could not find it in the release 
 notes), this would be a bug.
 For example, we need to pass
 INPUT=CountWords(\\$0,\\$1,\\$2)
 instead of simply
 INPUT=CountWords($0,$1,$2)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1537) Column pruner causes wrong results when using both Custom Store UDF and PigStorage

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1537.
-

Resolution: Fixed

 Column pruner causes wrong results when using both Custom Store UDF and 
 PigStorage
 --

 Key: PIG-1537
 URL: https://issues.apache.org/jira/browse/PIG-1537
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.8.0


 I have a script of this pattern that uses 2 StoreFuncs:
 {code}
 register loader.jar
 register piggy-bank/java/build/storage.jar;
 %DEFAULT OUTPUTDIR /user/viraj/prunecol/
 ss_sc_0 = LOAD '/data/click/20100707/0' USING Loader() AS (a, b, c);
 ss_sc_filtered_0 = FILTER ss_sc_0 BY
 a#'id' matches '1.*' OR
 a#'id' matches '2.*' OR
 a#'id' matches '3.*' OR
 a#'id' matches '4.*';
 ss_sc_1 = LOAD '/data/click/20100707/1' USING Loader() AS (a, b, c);
 ss_sc_filtered_1 = FILTER ss_sc_1 BY
 a#'id' matches '65.*' OR
 a#'id' matches '466.*' OR
 a#'id' matches '043.*' OR
 a#'id' matches '044.*' OR
 a#'id' matches '0650.*' OR
 a#'id' matches '001.*';
 ss_sc_all = UNION ss_sc_filtered_0,ss_sc_filtered_1;
 ss_sc_all_proj = FOREACH ss_sc_all GENERATE
 a#'query' as query,
 a#'testid' as testid,
 a#'timestamp' as timestamp,
 a,
 b,
 c;
 ss_sc_all_ord = ORDER ss_sc_all_proj BY query,testid,timestamp PARALLEL 10;
 ss_sc_all_map = FOREACH ss_sc_all_ord  GENERATE a, b, c;
 STORE ss_sc_all_map INTO '$OUTPUTDIR/data/20100707' using Storage();
 ss_sc_all_map_count = group ss_sc_all_map all;
 count = FOREACH ss_sc_all_map_count GENERATE 'record_count' as 
 record_count,COUNT($1);
 STORE count INTO '$OUTPUTDIR/count/20100707' using PigStorage('\u0009');
 {code}
 I run this script using:
 a) java -cp pig0.7.jar script.pig
 b) java -cp pig0.7.jar -t PruneColumns script.pig
 What I observe is that the alias count produces the same number of records, 
 but ss_sc_all_map has different sizes when run with the above 2 options.
 Is this due to the fact that there are 2 store funcs used?
 Viraj




[jira] Updated: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-747:
---

Fix Version/s: 0.9.0
   (was: 0.8.0)

 Logical to Physical Plan Translation fails when temporary alias are created 
 within foreach
 --

 Key: PIG-747
 URL: https://issues.apache.org/jira/browse/PIG-747
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.9.0

 Attachments: physicalplan.txt, physicalplanprob.pig, PIG-747-1.patch


 Consider the Pig script below, which calculates a new column F inside the 
 foreach:
 {code}
 A = load 'physicalplan.txt' as (col1,col2,col3);
 B = foreach A {
D = col1/col2;
E = col3/col2;
F = E - (D*D);
generate
F as newcol;
 };
 dump B;
 {code}
 This gives the following error:
 ===
 Caused by: 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
  ERROR 2015: Invalid physical operators in the physical plan
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908)
 at 
 org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
 at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
 ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give 
 operator of type 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide
  multiple outputs.  This operator does not support multiple outputs.
 at 
 org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373)
 ... 19 more
 ===




[jira] Updated: (PIG-1319) New logical optimization rules

2010-08-31 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1319:


Fix Version/s: 0.9.0
   (was: 0.8.0)

 New logical optimization rules
 --

 Key: PIG-1319
 URL: https://issues.apache.org/jira/browse/PIG-1319
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.9.0


 In [PIG-1178|https://issues.apache.org/jira/browse/PIG-1178], we build a new 
 logical optimization framework. One design goal for the new logical optimizer 
 is to make it easier to add new logical optimization rules. In this Jira, we 
 keep track of the development of these new logical optimization rules.




[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904450#action_12904450
 ] 

Olga Natkovich commented on PIG-1563:
-

Dmitry, thanks for the review. I did not discard your function - it was part of 
the patch. I did not change the code to use it just because I already finished 
testing the changes and did not have time to redo the code.

I am fixing some javadoc and release audit failures and will commit the code 
shortly.

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904462#action_12904462
 ] 

Olga Natkovich commented on PIG-1563:
-

 +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 13 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec]


 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904467#action_12904467
 ] 

Olga Natkovich commented on PIG-1563:
-

I made one additional change and renamed SPLIT into STRSPLIT to avoid conflict 
with SPLIT operator

 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1563:


Attachment: PIG_1563_v3.patch

latest patch

 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch, PIG_1563_v3.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1563:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

patch committed. Thanks Dmitry for the help and review

 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch, PIG_1563_v3.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903578#action_12903578
 ] 

Olga Natkovich commented on PIG-1563:
-

I was able to get it working (without wrapping) for the functions 
that have a fixed number of arguments:

LAST_INDEX_OF
REPLACE
TRIM

I don't believe there is currently a way to make it work with variable number 
of args (even if the number of combinations is fixed.) Moreover, if we add the 
mapping table in this case, it breaks the case of typed data which is bad. This 
is the case with the remaining functions - INDEXOF and SPLIT.

So my suggestion is to fix only the first set of functions and delay the rest to 
0.9, when we fix the mapping code.

Dmitry and others, are you ok with this? If so, I can update the patch to 
reflect this.




 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1502) Document and track system limits

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1502:


Fix Version/s: 0.9.0
   (was: 0.8.0)

 Document and track system limits
 

 Key: PIG-1502
 URL: https://issues.apache.org/jira/browse/PIG-1502
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.9.0


 We need to be able to publish what the system limitations are to make sure that 
 Pig is used in the way it was intended and tested. For instance, if you 
 combine 30 joins in a single MR job (via multiquery) this might not work.




[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903581#action_12903581
 ] 

Olga Natkovich commented on PIG-1150:
-

Dmitry, are you planning to add unit tests? Do we still want this in for 0.8? 
(Since it is going into piggybank, we can do this post branching but then we 
need to test in 2 places.)

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.
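
 The count / sum / sum-of-squares approach described above can be sketched in
 plain Python. This is only an illustration of the idea, not the code in
 var.patch: the function names (initial, combine, final, variance) are
 hypothetical and merely mirror the three stages an Algebraic UDF goes through.
 It computes the population variance as E[x^2] - (E[x])^2.

```python
from functools import reduce

def initial(x):
    # Per-record partial state: (count, sum, sum of squares).
    return (1, x, x * x)

def combine(a, b):
    # Partial states merge by component-wise addition, which is what
    # makes the computation safe to run in combiners and reducers.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def final(state):
    # Population variance from the merged state.
    n, s, ss = state
    mean = s / n
    return ss / n - mean * mean

def variance(chunks):
    # Each chunk stands in for the records seen by one map task
    # (an assumption made for this illustration).
    partials = [reduce(combine, (initial(x) for x in chunk))
                for chunk in chunks if chunk]
    return final(reduce(combine, partials))

print(variance([[1.0, 2.0], [3.0, 4.0]]))  # 1.25
```

 Note that this formula can lose precision through cancellation when the mean
 is large relative to the spread of the data, and the sketch raises an error
 when every chunk is empty.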




[jira] Commented: (PIG-1549) Provide utility to construct CNF form of predicates

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903591#action_12903591
 ] 

Olga Natkovich commented on PIG-1549:
-

I don't think this patch applies. Can you regenerate the patch with svn diff 
from the latest code and also add unit tests? Thanks.

 Provide utility to construct CNF form of predicates
 ---

 Key: PIG-1549
 URL: https://issues.apache.org/jira/browse/PIG-1549
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: 0001-Add-CNF-utility-class.patch


 Provide utility to construct CNF form of predicates




[jira] Commented: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903593#action_12903593
 ] 

Olga Natkovich commented on PIG-1494:
-

Can this be moved from 0.8 to 0.9 release since we are about to branch for 0.9?

 PIG Logical Optimization: Use CNF in PushUpFilter
 -

 Key: PIG-1494
 URL: https://issues.apache.org/jira/browse/PIG-1494
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Swati Jain
Assignee: Swati Jain
Priority: Minor
 Fix For: 0.8.0


 The PushUpFilter rule is not able to handle complicated boolean expressions.
 For example, SplitFilter rule is splitting one LOFilter into two by AND. 
 However it will not be able to split LOFilter if the top level operator is 
 OR. For example:
 *ex script:*
 A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
 B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
 C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
 J1 = JOIN B by b1, C by c1;
 J2 = JOIN J1 by $0, A by a1;
 D = *Filter J2 by ( (c1  10) AND (a3+b3  10) ) OR (c2 == 5);*
 explain D;
 In the above example, the PushUpFilter is not able to push any filter 
 condition across any join as it contains columns from all branches (inputs). 
 But if we convert this expression into Conjunctive Normal Form (CNF) then 
 we would be able to push filter condition c1 10 and c2 == 5 below both join 
 conditions. Here is the CNF expression for highlighted line:
 ( (c1  10) OR (c2 == 5) ) AND ( (a3+b3  10) OR (c2 ==5) )
 *Suggestion:* It would be a good idea to convert LOFilter's boolean 
 expression into CNF, it would then be easy to push parts (conjuncts) of the 
 LOFilter boolean expression selectively. We would also not require rule 
 SplitFilter anymore if we were to add this utility to rule PushUpFilter 
 itself.
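
 The conversion suggested above can be sketched in plain Python. This is an
 illustration under stated assumptions, not Pig's optimizer code: expressions
 are modelled as nested tuples ('and', l, r) / ('or', l, r) with opaque
 string atoms, NOT is not handled, and the names to_cnf, distribute and
 conjuncts are hypothetical. The atoms P, Q and R stand for the three
 comparisons in the example script.

```python
def distribute(left, right):
    # Push an OR below any AND found on either side:
    # (a AND b) OR c  ->  (a OR c) AND (b OR c)
    if not isinstance(left, str) and left[0] == 'and':
        return ('and', distribute(left[1], right), distribute(left[2], right))
    if not isinstance(right, str) and right[0] == 'and':
        return ('and', distribute(left, right[1]), distribute(left, right[2]))
    return ('or', left, right)

def to_cnf(expr):
    # Atoms are already in CNF.
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = to_cnf(left), to_cnf(right)
    if op == 'and':
        return ('and', left, right)
    return distribute(left, right)

def conjuncts(expr):
    # Flatten the top-level ANDs into a list of independently pushable parts.
    if not isinstance(expr, str) and expr[0] == 'and':
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]

# (P AND Q) OR R, the shape of the filter in the example script.
cnf = to_cnf(('or', ('and', 'P', 'Q'), 'R'))
print(conjuncts(cnf))  # [('or', 'P', 'R'), ('or', 'Q', 'R')]
```

 Each element of the resulting list is a conjunct that a rule could examine
 on its own to decide whether all of its columns come from a single join
 input. CNF conversion can grow the expression exponentially in the worst
 case, so a real rule would probably want a size limit.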




[jira] Assigned: (PIG-1542) log level not propogated to MR task loggers

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1542:
---

Assignee: niraj rai

This will be looked at after the branch since this is a regression and we don't 
have time to do it now.

 log level not propogated to MR task loggers
 ---

 Key: PIG-1542
 URL: https://issues.apache.org/jira/browse/PIG-1542
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: niraj rai
 Fix For: 0.8.0


 Specifying -d DEBUG does not affect the logging of the MR tasks .
 This was fixed earlier in PIG-882 .




[jira] Assigned: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1543:
---

Assignee: Daniel Dai

Daniel, can you check whether this is related to the limit optimizer and whether 
it was addressed with the new optimizer? (This can be done post branch since it 
is a bug split.)

 IsEmpty returns the wrong value after using LIMIT
 -

 Key: PIG-1543
 URL: https://issues.apache.org/jira/browse/PIG-1543
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Hu
Assignee: Daniel Dai
 Fix For: 0.8.0


 1. Two input files:
 1a: limit_empty.input_a
 1
 1
 1
 1b: limit_empty.input_b
 2
 2
 2. The pig script: limit_empty.pig
 -- A contains only 1's  B contains only 2's
 A = load 'limit_empty.input_a' as (a1:int);
 B = load 'limit_empty.input_b' as (b1:int);
 C =COGROUP A by a1, B by b1;
 D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), 
 COUNT(B);
 store D into 'limit_empty.output/d';
 -- After the script done, we see the right results:
 -- {(1),(1),(1)}   {}  1   0   3   0
 -- {} {(2),(2)}  0   1   0   2
 C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
 D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 
 0:1), COUNT(Alim), COUNT(Blim);
 store D1 into 'limit_empty.output/d1';
 -- After the script done, we see the unexpected results:
 -- {(1)}   {}1   1   1   0
 -- {}  {(2)} 1   1   0   1
 dump D;
 dump D1;
 3. Run the script and redirect stdout (2 dumps) to a file. There are two issues:
 The major one:
 IsEmpty() returns FALSE for empty bag in limit_empty.output/d1/*, while 
 IsEmpty() returns correctly in limit_empty.output/d/*.
 The difference is that one has been applied with LIMIT before using 
 IsEmpty().
 The minor one:
 The redirected output only contains the first dump:
 ({(1),(1),(1)},{},1,0,3L,0L)
 ({},{(2),(2)},0,1,0L,2L)
 We expect two more lines like:
 ({(1)},{},1,1,1L,0L)
 ({},{(2)},1,1,0L,1L)
 In addition, the following error is reported:
 [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - 
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
 org.apache.pig.data.Tuple
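
 For reference, the semantics the report appears to expect can be modelled in
 plain Python. This is only a sketch of the intended behaviour, not Pig code:
 bags are modelled as lists, and cogroup, limit, is_empty and row are
 hypothetical helpers written for this illustration. The flag columns follow
 the script's (IsEmpty(X) ? 0 : 1) expression.

```python
def cogroup(a, b):
    # Group both inputs by value; each group holds one bag per input.
    keys = sorted(set(a) | set(b))
    return [(k, [x for x in a if x == k], [x for x in b if x == k])
            for k in keys]

def limit(bag, n):
    # Keep at most n elements of the bag.
    return bag[:n]

def is_empty(bag):
    return len(bag) == 0

def row(a_bag, b_bag):
    # Mirrors: (IsEmpty(A) ? 0 : 1), (IsEmpty(B) ? 0 : 1), COUNT(A), COUNT(B)
    return (0 if is_empty(a_bag) else 1,
            0 if is_empty(b_bag) else 1,
            len(a_bag), len(b_bag))

groups = cogroup([1, 1, 1], [2, 2])
d = [row(a, b) for _, a, b in groups]
d1 = [row(limit(a, 1), limit(b, 1)) for _, a, b in groups]
print(d)   # [(1, 0, 3, 0), (0, 1, 0, 2)]
print(d1)  # [(1, 0, 1, 0), (0, 1, 0, 1)]
```

 Under this model a bag that is empty before LIMIT stays empty after it, so
 the flag columns of d1 match those of d, which is not what the report
 observes in limit_empty.output/d1.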




[jira] Assigned: (PIG-1567) Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1567:
---

Assignee: Xuefu Zhang

 Optimization rule FilterAboveForeach is too restrictive and doesn't handle 
 project * correctly
 --

 Key: PIG-1567
 URL: https://issues.apache.org/jira/browse/PIG-1567
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0


 FilterAboveForeach rule is to optimize the plan by pushing up filter above 
 previous foreach operator. However, during code review, two major problems 
 were found:
 1. Current implementation assumes that if no projection is found in the 
 filter condition then all columns from foreach are projected. This issue 
 prevents the following optimization:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY 8  5;
   STORE C INTO 'empty';
 2. Current implementation doesn't handle * projection, which means project 
 all columns. As a result, it wasn't able to optimize the following:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY Identity.class.getName(*)  5;
   STORE C INTO 'empty';
   




[jira] Assigned: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1570:
---

Assignee: Thejas M Nair

 native mapreduce operator MR job does not follow same failure handling logic 
 as other pig MR jobs
 -

 Key: PIG-1570
 URL: https://issues.apache.org/jira/browse/PIG-1570
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 The code path for handling failure in MR job corresponding to native MR is 
 different and does not have the same behavior.
 For example, even if the MR job for mapreduce operator fails, the number of 
 jobs that failed is being reported as 0 in PigStats log.




[jira] Assigned: (PIG-1572) change default datatype when relations are used as scalar to bytearray

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1572:
---

Assignee: Thejas M Nair

 change default datatype when relations are used as scalar to bytearray
 --

 Key: PIG-1572
 URL: https://issues.apache.org/jira/browse/PIG-1572
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 When relations are cast to scalar, the current default type is chararray. 
 This is inconsistent with the behavior in rest of pig-latin.




[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903637#action_12903637
 ] 

Olga Natkovich commented on PIG-1150:
-

So should we unlink this from the release?

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.




[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903640#action_12903640
 ] 

Olga Natkovich commented on PIG-1563:
-

which JIRA is that?

I will just get this in - I think that's all I have time today but I can look 
at the other one as well next week

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1150:


Fix Version/s: 0.9.0
   (was: 0.8.0)

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.9.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.




[jira] Resolved: (PIG-529) Want support for loading CSV files

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-529.


Resolution: Duplicate

This is a duplicate of PIG-1555, which has been resolved for Pig 0.8.

 Want support for loading CSV files
 --

 Key: PIG-529
 URL: https://issues.apache.org/jira/browse/PIG-529
 Project: Pig
  Issue Type: New Feature
  Components: data
Reporter: Tom White

 Want to be able to load CSV data into Pig. This needs to handle quoting 
 correctly.




[jira] Resolved: (PIG-771) PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-771.


Fix Version/s: 0.7.0
   Resolution: Fixed

PigDump is no longer supported

 PigDump does not properly output Chinese UTF8 characters - they are displayed 
 as question marks ??
 --

 Key: PIG-771
 URL: https://issues.apache.org/jira/browse/PIG-771
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz
 Fix For: 0.7.0


 PigDump does not properly output Chinese UTF8 characters.
 The reason for this is that the function Tuple.toString() is called.
 DefaultTuple implements Tuple.toString() and it calls Object.toString() on 
 the opaque object d.
 Instead, I think that the code should be changed instead to call the new 
 DataType.toString() function.
 {code}
 @Override
 public String toString() {
     StringBuilder sb = new StringBuilder();
     sb.append('(');
     for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
         Object d = it.next();
         if (d != null) {
             if (d instanceof Map) {
                 sb.append(DataType.mapToString((Map<Object, Object>) d));
             } else {
                 sb.append(DataType.toString(d));  // Change this one line
                 if (d instanceof Long) {
                     sb.append("L");
                 } else if (d instanceof Float) {
                     sb.append("F");
                 }
             }
         } else {
             sb.append("");
         }
         if (it.hasNext())
             sb.append(",");
     }
     sb.append(')');
     return sb.toString();
 }
 {code}




[jira] Created: (PIG-1577) support to variable number of arguments in UDF

2010-08-27 Thread Olga Natkovich (JIRA)
support to variable number of arguments in UDF
--

 Key: PIG-1577
 URL: https://issues.apache.org/jira/browse/PIG-1577
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
 Fix For: 0.9.0


In the current implementation, the functionality that maps arguments to 
classes does not support functions with a variable number of arguments. It 
also does not support functions whose number of arguments can vary within a 
fixed set of possibilities. 

This causes problems for string UDFs such as CONCAT, which can take an arbitrary 
number of arguments, or TRIM, which can take 1, 2, or 3 arguments.




[jira] Updated: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1563:


Attachment: PIG_1563_v2.patch

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903744#action_12903744
 ] 

Olga Natkovich commented on PIG-1563:
-

Uploaded new patch which does the following:

(1) Adds mapping function for functions with fixed number of arguments: 
SUBSTRING, LAST_INDEX_OF, REPLACE,TRIM
(2) Leaves the rest of the functions alone, which means that until 0.9 they will 
only work on typed data. CONCAT is in the same category
(3) Re-uses applicable tests that Dmitry created, thanks!
(4) Adds a couple of e2e tests to make sure that we test the mapping function 
as well

Please review. 

We will keep the issue open until we address (2) in 0.9.



 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1562) Fix the version for the dependent packages for the maven

2010-08-24 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1562:


Fix Version/s: 0.8.0

 Fix the version for the dependent packages for the maven 
 -

 Key: PIG-1562
 URL: https://issues.apache.org/jira/browse/PIG-1562
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai
 Fix For: 0.8.0


 We need to fix the set version so that the version is properly set for the 
 dependent packages in the maven repository.




[jira] Commented: (PIG-1560) Build target 'checkstyle' fails

2010-08-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901975#action_12901975
 ] 

Olga Natkovich commented on PIG-1560:
-

please, commit

 Build target 'checkstyle' fails
 ---

 Key: PIG-1560
 URL: https://issues.apache.org/jira/browse/PIG-1560
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Richard Ding
Assignee: Giridharan Kesavan
 Fix For: 0.8.0

 Attachments: pig-1560.patch


 Stack trace:
 {code}
 /trunk/build.xml:894: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
 at org.apache.commons.beanutils.ConvertUtilsBean.init(ConvertUtilsBean.java:130)
 at com.puppycrawl.tools.checkstyle.api.AutomaticBean.createBeanUtilsBean(AutomaticBean.java:73)
 at com.puppycrawl.tools.checkstyle.api.AutomaticBean.contextualize(AutomaticBean.java:222)
 at com.puppycrawl.tools.checkstyle.CheckStyleTask.createChecker(CheckStyleTask.java:372)
 at com.puppycrawl.tools.checkstyle.CheckStyleTask.realExecute(CheckStyleTask.java:304)
 at com.puppycrawl.tools.checkstyle.CheckStyleTask.execute(CheckStyleTask.java:265)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
 at org.apache.tools.ant.Task.perform(Task.java:348)
 at org.apache.tools.ant.Target.execute(Target.java:390)
 at org.apache.tools.ant.Target.performTasks(Target.java:411)
 at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360)
 at org.apache.tools.ant.Project.executeTarget(Project.java:1329)
 at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
 at org.apache.tools.ant.Project.executeTargets(Project.java:1212)
 at org.apache.tools.ant.Main.runBuild(Main.java:801)
 at org.apache.tools.ant.Main.startAnt(Main.java:218)
 at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
 at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
 Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
 at org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1386)
 at org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1336)
 at org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1074)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 ... 22 more
 {code}




[jira] Commented: (PIG-1559) Several things stated in Pig philosophy page are out of date

2010-08-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901979#action_12901979
 ] 

Olga Natkovich commented on PIG-1559:
-

Looks like the limit issue I was seeing has been addressed in the latest trunk. 

I think we need to add unit tests to catch these things in the future.

 Several things stated in Pig philosophy page are out of date
 

 Key: PIG-1559
 URL: https://issues.apache.org/jira/browse/PIG-1559
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1559.patch


 The Pig philosophy page says several things that are no longer true (such as 
 that Pig does not have an optimizer (it does now), that we someday hope to 
 support streaming (we already do), that we some day hope to control splits 
 (we don't, we just use what Hadoop gives us now)).  These need to be updated 
 to reflect the current situation.




[jira] Commented: (PIG-1559) Several things stated in Pig philosophy page are out of date

2010-08-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901984#action_12901984
 ] 

Olga Natkovich commented on PIG-1559:
-

sorry, wrong JIRA

 Several things stated in Pig philosophy page are out of date
 

 Key: PIG-1559
 URL: https://issues.apache.org/jira/browse/PIG-1559
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1559.patch


 The Pig philosophy page says several things that are no longer true (such as 
 that Pig does not have an optimizer (it does now), that we someday hope to 
 support streaming (we already do), that we some day hope to control splits 
 (we don't, we just use what Hadoop gives us now)).  These need to be updated 
 to reflect the current situation.




[jira] Commented: (PIG-1557) couple of issue mapping aliases to jobs

2010-08-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901985#action_12901985
 ] 

Olga Natkovich commented on PIG-1557:
-

Looks like the limit issue I was seeing has been addressed in the latest trunk. 

I think we need to add unit tests to catch these things in the future.



 couple of issue mapping aliases to jobs
 ---

 Key: PIG-1557
 URL: https://issues.apache.org/jira/browse/PIG-1557
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1557.patch


 I have a simple script:
 A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
 B = group A by name;
 C = foreach B generate group, COUNT(A);
 D = order C by $1;
 E = limit D 10;
 dump E;
 I noticed a couple of issues with alias to job mapping: neither load(A) nor 
 limit(E) shows in the output




[jira] Created: (PIG-1563) SUBSTRING function is broken

2010-08-24 Thread Olga Natkovich (JIRA)
SUBSTRING function is broken


 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Yan Zhou
 Fix For: 0.8.0


Script:

A = load 'studenttab10k' as (name, age, gpa);
C = foreach A generate SUBSTRING(name, 0,5);
E = limit C 10;
dump E;

Output is always empty:

()
()
()
()
()
()
()
()
()
()





[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902211#action_12902211
 ] 

Olga Natkovich commented on PIG-1563:
-

The same needs to be done (and we need unit tests) for the following string 
manipulation functions:

INDEXOF
LAST_INDEX_OF
REPLACE
SPLIT
TRIM
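
A minimal sketch of the reported script with declared field types, for 
comparison. It assumes, based on the note elsewhere in this thread that these 
functions only work on typed data, that the empty output comes from name being 
loaded without a type; the chararray/int/double types are an assumption about 
studenttab10k, not something stated in the report.

```pig
-- Assumption: with a typed schema, SUBSTRING receives a chararray
-- instead of an untyped (bytearray) field.
A = load 'studenttab10k' as (name:chararray, age:int, gpa:double);
C = foreach A generate SUBSTRING(name, 0, 5);
E = limit C 10;
dump E;

-- Alternative sketch: keep the untyped load and cast at the call site.
A2 = load 'studenttab10k' as (name, age, gpa);
C2 = foreach A2 generate SUBSTRING((chararray)name, 0, 5);
```

If the typed variants return data while the untyped script returns empty 
tuples, that would point at the untyped-argument handling rather than at 
SUBSTRING itself.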

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Yan Zhou
 Fix For: 0.8.0


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-908) Need a way to correlate MR jobs with Pig statements

2010-08-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-908:
---


With Pig 0.8.0 we print a summary of the execution that shows (among other 
things) how aliases map to jobs. Example:

JobId                   Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature            Outputs
job_201004271216_12712  1     1        3           3           3           12             12             12             B,C    GROUP_BY,COMBINER
job_201004271216_12713  1     1        3           3           3           12             12             12             D      SAMPLER
job_201004271216_12714  1     1        3           3           3           12             12             12             D      ORDER_BY,COMBINER  hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/tmp/temp743703298/tmp-2019944040,


 Need a way to correlate MR jobs with Pig statements
 ---

 Key: PIG-908
 URL: https://issues.apache.org/jira/browse/PIG-908
 Project: Pig
  Issue Type: Wish
Reporter: Dmitriy V. Ryaboy
Assignee: Richard Ding
 Fix For: 0.8.0


 Complex Pig Scripts often generate many Map-Reduce jobs, especially with the 
 recent introduction of multi-store capabilities.
 For example, the first script in the Pig tutorial produces 5 MR jobs.
 There is currently very little support for debugging resulting jobs; if one 
 of the MR jobs fails, it is hard to figure out which part of the script it 
 was responsible for. Explain plans help, but even with the explain plan, a 
 fair amount of effort (and sometimes, experimentation) is required to 
 correlate the failing MR job with the corresponding PigLatin statements.
 This ticket is created to discuss approaches to alleviating this problem.




[jira] Updated: (PIG-1488) Make HDFS temp dir configurable

2010-08-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1488:


Release Note: Pig stores intermediate data generated between MR jobs in a 
temp location on HDFS. In Pig 0.8.0 this location is configurable through the 
pig.temp.dir property. The default is /tmp, the same location that was 
hardcoded in Pig 0.7.0 and earlier versions.

 Make HDFS temp dir configurable
 ---

 Key: PIG-1488
 URL: https://issues.apache.org/jira/browse/PIG-1488
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.8.0


 Currently it is hardcoded to /tmp. It should be made into a property.




[jira] Updated: (PIG-1484) BinStorage should support comma seperated path

2010-08-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1484:


Release Note: 
In Pig 0.7.0, BinStorage supports only a single input location. (This 
location can be a file, a directory, or a glob.) With Pig 0.8.0, BinStorage, 
like PigStorage, supports a comma-separated list of locations.

Example:

a = load '1.bin,2.bin' using BinStorage();



 BinStorage should support comma seperated path
 --

 Key: PIG-1484
 URL: https://issues.apache.org/jira/browse/PIG-1484
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1484-1.patch, PIG-1484-2.patch, PIG-1484-3.patch


 BinStorage does not accept a comma-separated path. The following script fails:
 a = load '1.bin,2.bin' using BinStorage();
 dump a;




[jira] Created: (PIG-1557) couple of issue mapping aliases to jobs

2010-08-23 Thread Olga Natkovich (JIRA)
couple of issue mapping aliases to jobs
---

 Key: PIG-1557
 URL: https://issues.apache.org/jira/browse/PIG-1557
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Richard Ding


I have a simple script:

A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
B = group A by name;
C = foreach B generate group, COUNT(A);
D = order C by $1;
E = limit D 10;
dump E;

I noticed a couple of issues with alias to job mapping: neither load(A) nor 
limit(E) shows in the output





[jira] Commented: (PIG-1447) Tune memory usage of InternalCachedBag

2010-08-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901576#action_12901576
 ] 

Olga Natkovich commented on PIG-1447:
-

This is probably the smallest patch I have reviewed recently :). +1

 Tune memory usage of InternalCachedBag
 --

 Key: PIG-1447
 URL: https://issues.apache.org/jira/browse/PIG-1447
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: L15_modified.pig, L15_modified2.pig, PIG-1447.1.patch


 We need to find a better value for pig.cachedbag.memusage.




[jira] Commented: (PIG-1354) UDFs for dynamic invocation of simple Java methods

2010-08-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901577#action_12901577
 ] 

Olga Natkovich commented on PIG-1354:
-

Dmitriy, could you add release notes on how to use this?

 UDFs for dynamic invocation of simple Java methods
 --

 Key: PIG-1354
 URL: https://issues.apache.org/jira/browse/PIG-1354
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG-1354.patch, PIG-1354.patch, PIG-1354.patch


 The need to create wrapper UDFs for simple Java functions creates unnecessary 
 work for Pig users, slows down the development process, and produces a lot of 
 trivial classes. We can use Java's reflection to allow invoking a number of 
 methods on the fly, dynamically, by creating a generic UDF to accomplish this.
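
 As a rough illustration of the idea described above, here is how a generic 
 invoker might be used from a script. This is a sketch, not the patch's 
 documented interface: it assumes the UDF is exposed under the name 
 InvokeForString and takes the fully qualified static method plus a 
 space-separated list of parameter types, as in the Pig 0.8 built-in 
 documentation; the input file name is hypothetical.

```pig
-- Assumption: InvokeForString is the generic reflection-based UDF;
-- 'encoded_strings.txt' is a hypothetical input file.
DEFINE UrlDecode InvokeForString('java.net.URLDecoder.decode', 'String String');

encoded_strings = LOAD 'encoded_strings.txt' AS (encoded:chararray);

-- Calls java.net.URLDecoder.decode(encoded, "UTF-8") without a wrapper class.
decoded_strings = FOREACH encoded_strings GENERATE UrlDecode(encoded, 'UTF-8');
DUMP decoded_strings;
```

 The DEFINE line replaces what would otherwise be a hand-written wrapper UDF 
 class for a single static Java method.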



