[jira] Commented: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY

2010-10-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916998#action_12916998
 ] 

Daniel Dai commented on PIG-1659:
-

We should set sortInfo after optimization, so we should add SetSortInfo after 
the optimization of the new logical plan. This code is currently missing.
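A minimal sketch of that missing pass (a hypothetical model, not Pig's actual SetSortInfo code): after the optimizer has rewritten the plan, walk back from each STORE through order-preserving operators such as FILTER and copy the sort columns from an upstream ORDER BY onto the store.

```python
# Illustrative post-optimization pass; operator names and the plan shape
# are assumptions, not Pig's internal representation.
ORDER_PRESERVING = {"FILTER", "LIMIT"}

def set_sort_info(plan):
    """plan: linear list of (op_name, attrs) pairs, source to sink."""
    for i, (op, attrs) in enumerate(plan):
        if op != "STORE":
            continue
        # scan upstream, skipping operators that preserve sort order
        j = i - 1
        while j >= 0 and plan[j][0] in ORDER_PRESERVING:
            j -= 1
        if j >= 0 and plan[j][0] == "SORT":
            attrs["sortInfo"] = plan[j][1]["columns"]
    return plan

plan = [("LOAD", {}), ("SORT", {"columns": ["x"]}),
        ("FILTER", {}), ("STORE", {})]
set_sort_info(plan)   # the STORE now carries sortInfo despite the FILTER
```

Running the pass after optimization, rather than before, is the point: the optimizer may legally move a FILTER between the sort and the store, and this walk tolerates that.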

 sortinfo is not set for store if there is a filter after ORDER BY
 -

 Key: PIG-1659
 URL: https://issues.apache.org/jira/browse/PIG-1659
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Daniel Dai
 Fix For: 0.8.0


 This has caused 6 (of 7) failures in the Zebra test 
 TestOrderPreserveVariableTable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY

2010-10-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1659:


Attachment: PIG-1659-1.patch

 sortinfo is not set for store if there is a filter after ORDER BY
 -

 Key: PIG-1659
 URL: https://issues.apache.org/jira/browse/PIG-1659
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1659-1.patch


 This has caused 6 (of 7) failures in the Zebra test 
 TestOrderPreserveVariableTable.




[jira] Commented: (PIG-1542) log level not propogated to MR task loggers

2010-10-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917079#action_12917079
 ] 

Daniel Dai commented on PIG-1542:
-

Yes, -d xxx should be treated as -Ddebug=xxx, and system properties already 
have higher priority in the current code. (In my mind, we should deprecate -d 
in favor of -Ddebug.)
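The intended precedence can be sketched as follows (an assumption based on the comment; resolve_log_level is a made-up helper, not Pig's code): a -Ddebug=... system property wins over the legacy -d flag, which wins over the default.

```python
# Illustrative precedence resolution for the log level.
def resolve_log_level(cli_d=None, sys_props=None, default="INFO"):
    props = sys_props or {}
    if "debug" in props:          # -Ddebug=... has highest priority
        return props["debug"]
    if cli_d is not None:         # legacy -d flag, treated as -Ddebug
        return cli_d
    return default

assert resolve_log_level(cli_d="DEBUG") == "DEBUG"
assert resolve_log_level(cli_d="DEBUG", sys_props={"debug": "WARN"}) == "WARN"
assert resolve_log_level() == "INFO"
```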

 log level not propogated to MR task loggers
 ---

 Key: PIG-1542
 URL: https://issues.apache.org/jira/browse/PIG-1542
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch


 Specifying -d DEBUG does not affect the logging of the MR tasks.
 This was fixed earlier in PIG-882.




[jira] Commented: (PIG-1638) sh output gets mixed up with the grunt prompt

2010-09-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916725#action_12916725
 ] 

Daniel Dai commented on PIG-1638:
-

+1

 sh output gets mixed up with the grunt prompt
 -

 Key: PIG-1638
 URL: https://issues.apache.org/jira/browse/PIG-1638
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.8.0
Reporter: niraj rai
Assignee: niraj rai
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1638_0.patch


 Many times, the grunt prompt gets mixed up with the sh output, e.g.:
 grunt> sh ls
 000
 autocomplete
 bin
 build
 build.xml
 grunt> CHANGES.txt
 conf
 contrib
 In the above case, grunt> is mixed up with the output.
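The usual cause of this interleaving is printing the next prompt before the child process has finished. A minimal model of the fix (illustrative Python, not the grunt Java code): block on the child before returning control to the prompt loop.

```python
import subprocess

def run_sh(cmd):
    # subprocess.run blocks until the command exits, so the caller can
    # safely print the next "grunt>" prompt only after the output is back
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

print(run_sh(["echo", "hello"]), end="")
print("grunt> ", end="")   # prompt is printed strictly after the output
```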




[jira] Updated: (PIG-1638) sh output gets mixed up with the grunt prompt

2010-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1638:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 sh output gets mixed up with the grunt prompt
 -

 Key: PIG-1638
 URL: https://issues.apache.org/jira/browse/PIG-1638
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.8.0
Reporter: niraj rai
Assignee: niraj rai
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1638_0.patch


 Many times, the grunt prompt gets mixed up with the sh output, e.g.:
 grunt> sh ls
 000
 autocomplete
 bin
 build
 build.xml
 grunt> CHANGES.txt
 conf
 contrib
 In the above case, grunt> is mixed up with the output.




[jira] Created: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug

2010-09-28 Thread Daniel Dai (JIRA)
TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to 
estimateNumberOfReducers bug


 Key: PIG-1652
 URL: https://issues.apache.org/jira/browse/PIG-1652
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0


TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to the 
input size estimation. Here is the stack of TestSortedTableUnionMergeJoin:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store 
alias records3
at org.apache.pig.PigServer.storeEx(PigServer.java:877)
at org.apache.pig.PigServer.store(PigServer.java:815)
at org.apache.pig.PigServer.openIterator(PigServer.java:727)
at 
org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: 
Unexpected error during execution.
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326)
at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
at org.apache.pig.PigServer.storeEx(PigServer.java:873)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
Illegal character in scheme name at index 69: 
org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
at org.apache.hadoop.fs.Path.initialize(Path.java:140)
at org.apache.hadoop.fs.Path.init(Path.java:126)
at org.apache.hadoop.fs.Path.init(Path.java:50)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
at 
org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at 
index 69: 
org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
at java.net.URI$Parser.fail(URI.java:2809)
at java.net.URI$Parser.checkChars(URI.java:2982)
at java.net.URI$Parser.parse(URI.java:3009)
at java.net.URI.init(URI.java:736)
at org.apache.hadoop.fs.Path.initialize(Path.java:137)

The reason is we are trying to 
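The URISyntaxException above shows a comma-joined string (job signature plus input location) being parsed as a single URI; the comma is likely the "illegal character" at index 69. A minimal model of the needed handling (hypothetical helper, not the actual JobControlCompiler fix): split on the comma before treating each piece as a path.

```python
from urllib.parse import urlparse

def split_input_locations(location):
    # "signature,file:/tmp/input" is two fields, not one URI;
    # feeding the whole string to a URI parser is what fails above
    return location.split(",")

parts = split_input_locations("someSignature,file:/tmp/input")
scheme = urlparse(parts[1]).scheme   # parses cleanly once split
```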

[jira] Created: (PIG-1653) Scripting UDF fails if the path to script is an absolute path

2010-09-28 Thread Daniel Dai (JIRA)
Scripting UDF fails if the path to script is an absolute path
-

 Key: PIG-1653
 URL: https://issues.apache.org/jira/browse/PIG-1653
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0


The following script fails:
{code}
register '/homes/jianyong/pig/aaa/scriptingudf.py' using jython as myfuncs;
a = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage() as 
(name, age, gpa:double);
b = foreach a generate myfuncs.square(gpa);
dump b;
{code}

If we change the register statement to use a relative path (such as 
aaa/scriptingudf.py), it succeeds.
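One plausible failure mode (an assumption, not confirmed by the report): deriving the Jython module name directly from the register path behaves differently for absolute paths. Taking only the basename makes both forms equivalent; the helper below is illustrative.

```python
import os

def script_module_name(path):
    # '/homes/jianyong/pig/aaa/scriptingudf.py' and 'aaa/scriptingudf.py'
    # should both yield the module name 'scriptingudf'
    return os.path.splitext(os.path.basename(path))[0]

assert script_module_name("/homes/jianyong/pig/aaa/scriptingudf.py") == "scriptingudf"
assert script_module_name("aaa/scriptingudf.py") == "scriptingudf"
```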




[jira] Commented: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915880#action_12915880
 ] 

Daniel Dai commented on PIG-1637:
-

test-patch result for PIG-1637-2.patch:

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


 Combiner not use because optimizor inserts a foreach between group and 
 algebric function
 

 Key: PIG-1637
 URL: https://issues.apache.org/jira/browse/PIG-1637
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1637-1.patch, PIG-1637-2.patch


 The following script does not use the combiner after the new optimization change.
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 This is because, after the group, the optimizer detects that the group key 
 is not used afterward and adds a foreach statement after C. This is how it 
 looks after optimization:
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 C1 = foreach C generate B;
 D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 That cancels the combiner optimization for D. 
 The way to solve the issue is to merge the inserted C1 and D. Currently, we 
 do not merge these two foreach statements, because one output of the first 
 foreach (B) is referred to twice in D, and the current rule assumes that 
 after the merge we would need to calculate B twice in D. Actually, C1 only 
 does projection, with no calculation of B, so merging C1 and D will not 
 result in calculating B twice. Therefore, C1 and D should be merged.
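The merge criterion described above can be modeled like this (illustrative names, not Pig's MergeForEach rule): merging two consecutive foreach operators is safe, even when a field is referenced twice downstream, as long as the earlier foreach only projects.

```python
def is_projection_only(exprs):
    # a foreach is projection-only if every expression is a bare field ref
    return all(kind == "project" for kind, _ in exprs)

def can_merge(first_exprs, refs_in_second):
    # old conservative rule: refuse when any output of the first foreach is
    # referenced more than once downstream -- unless it is pure projection,
    # in which case no computation can be duplicated by the merge
    multiply_referenced = any(refs_in_second.count(r) > 1
                              for r in set(refs_in_second))
    return is_projection_only(first_exprs) or not multiply_referenced

c1 = [("project", "B")]        # C1 = foreach C generate B;
d_refs = ["B", "B"]            # SUM(B.timespent), AVG(B.estimated_revenue)
assert can_merge(c1, d_refs)   # safe to merge C1 into D
```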




[jira] Commented: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-09-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915941#action_12915941
 ] 

Daniel Dai commented on PIG-1579:
-

Rolled back the change and ran the tests many times; all tests pass. It seems 
some change between r990721 and now (r1002348) fixed this issue. I will roll 
back the change and close the Jira.

 Intermittent unit test failure for 
 TestScriptUDF.testPythonScriptUDFNullInputOutput
 ---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1579-1.patch


 Error message:
 org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error 
 executing function: Traceback (most recent call last):
   File "<iostream>", line 5, in multStr
 TypeError: can't multiply sequence by non-int of type 'NoneType'
 at 
 org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)




[jira] Commented: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915950#action_12915950
 ] 

Daniel Dai commented on PIG-1637:
-

Yes, it could be improved as per Xuefu's suggestion. Anyway, the current patch 
solves the combiner-not-used issue, so I will commit this part first and open 
another Jira for the improvement. Also, MergeForEach is a good example for 
exercising the cloning framework 
[PIG-1587|https://issues.apache.org/jira/browse/PIG-1587], so it is better to 
improve it once PIG-1587 is available.

 Combiner not use because optimizor inserts a foreach between group and 
 algebric function
 

 Key: PIG-1637
 URL: https://issues.apache.org/jira/browse/PIG-1637
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1637-1.patch, PIG-1637-2.patch


 The following script does not use the combiner after the new optimization change.
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 This is because, after the group, the optimizer detects that the group key 
 is not used afterward and adds a foreach statement after C. This is how it 
 looks after optimization:
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 C1 = foreach C generate B;
 D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 That cancels the combiner optimization for D. 
 The way to solve the issue is to merge the inserted C1 and D. Currently, we 
 do not merge these two foreach statements, because one output of the first 
 foreach (B) is referred to twice in D, and the current rule assumes that 
 after the merge we would need to calculate B twice in D. Actually, C1 only 
 does projection, with no calculation of B, so merging C1 and D will not 
 result in calculating B twice. Therefore, C1 and D should be merged.




[jira] Commented: (PIG-1651) PIG class loading mishandled

2010-09-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915959#action_12915959
 ] 

Daniel Dai commented on PIG-1651:
-

+1

 PIG class loading mishandled
 

 Key: PIG-1651
 URL: https://issues.apache.org/jira/browse/PIG-1651
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1651.patch


 If zebra.jar is just registered in a Pig script but not in the CLASSPATH, a 
 query using zebra fails, since there appear to be multiple copies of the 
 class loaded into the JVM, causing a static variable set previously not to 
 be seen after an instance of the class is created through reflection. 
 (After zebra.jar is specified in CLASSPATH, it works fine.) The exception 
 stack is as follows:
 Backend error message during job submission
 ---
 org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to 
 create input splits for: hdfs://hostname/pathto/zebra_dir :: null
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284)
 at 
 org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123)
 at 
 org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413)
 at 
 org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718)
 at 
 org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084)
 at 
 org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919)
 at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
 at 
 org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
 at 
 org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863)
 at 
 org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017)
 at 
 org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
 ... 7 more
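A Python analogy of the symptom (illustrative only; the real issue involves Java classloaders, not Python modules): loading the same source through two separate loaders yields two independent copies of "static" state, so a value set through one copy is invisible through the other, just as a class loaded by two classloaders has two sets of statics.

```python
import importlib.util, os, tempfile

# A tiny stand-in for a class with a static field
src = "prefix = None\ndef set_prefix(p):\n    global prefix\n    prefix = p\n"
path = os.path.join(tempfile.mkdtemp(), "column_group.py")
with open(path, "w") as f:
    f.write(src)

def load_copy(name):
    # each call builds an independent module object from the same source
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

a = load_copy("cg_a")   # e.g. loaded via the registered jar
b = load_copy("cg_b")   # e.g. loaded via the CLASSPATH
a.set_prefix(".meta")
# b.prefix is still None -- the analogue of the NPE in getNonDataFilePrefix
```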




[jira] Resolved: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1637.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

All tests pass except TestSortedTableUnion / TestSortedTableUnionMergeJoin for 
zebra, which already fail and will be addressed by 
[PIG-1649|https://issues.apache.org/jira/browse/PIG-1649].

Patch committed to both trunk and 0.8 branch.

 Combiner not use because optimizor inserts a foreach between group and 
 algebric function
 

 Key: PIG-1637
 URL: https://issues.apache.org/jira/browse/PIG-1637
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1637-1.patch, PIG-1637-2.patch


 The following script does not use the combiner after the new optimization change.
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 This is because, after the group, the optimizer detects that the group key 
 is not used afterward and adds a foreach statement after C. This is how it 
 looks after optimization:
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 C1 = foreach C generate B;
 D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 That cancels the combiner optimization for D. 
 The way to solve the issue is to merge the inserted C1 and D. Currently, we 
 do not merge these two foreach statements, because one output of the first 
 foreach (B) is referred to twice in D, and the current rule assumes that 
 after the merge we would need to calculate B twice in D. Actually, C1 only 
 does projection, with no calculation of B, so merging C1 and D will not 
 result in calculating B twice. Therefore, C1 and D should be merged.




[jira] Commented: (PIG-1647) Logical simplifier throws a NPE

2010-09-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915365#action_12915365
 ] 

Daniel Dai commented on PIG-1647:
-

+1. Please commit.

 Logical simplifier throws a NPE
 ---

 Key: PIG-1647
 URL: https://issues.apache.org/jira/browse/PIG-1647
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1647.patch, PIG-1647.patch


 A query like:
 A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray);
 B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' 
 and ((d is not null and d != '') or (e is not null and e != ''));
 will cause the logical expression simplifier to throw an NPE.
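A common cause of this class of NPE in an expression simplifier (an assumption; the patch itself is not shown here) is recursing into a child that a prior rewrite already removed. A null-tolerant AND-simplification sketch with made-up names:

```python
def simplify_and(left, right):
    # treat a missing child as neutral instead of dereferencing it
    if left is None:
        return right
    if right is None:
        return left
    if left == right:
        return left            # idempotence: x AND x -> x
    return ("and", left, right)

assert simplify_and(None, "d != ''") == "d != ''"
assert simplify_and("a == 'v'", "a == 'v'") == "a == 'v'"
```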




[jira] Updated: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1637:


Attachment: PIG-1637-1.patch

 Combiner not use because optimizor inserts a foreach between group and 
 algebric function
 

 Key: PIG-1637
 URL: https://issues.apache.org/jira/browse/PIG-1637
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1637-1.patch


 The following script does not use the combiner after the new optimization change.
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 This is because, after the group, the optimizer detects that the group key 
 is not used afterward and adds a foreach statement after C. This is how it 
 looks after optimization:
 {code}
 A = load ':INPATH:/pigmix/page_views' using 
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
 as (user, action, timespent, query_term, ip_addr, timestamp, 
 estimated_revenue, page_info, page_links);
 B = foreach A generate user, (int)timespent as timespent, 
 (double)estimated_revenue as estimated_revenue;
 C = group B all; 
 C1 = foreach C generate B;
 D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
 store D into ':OUTPATH:';
 {code}
 That cancels the combiner optimization for D. 
 The way to solve the issue is to merge the inserted C1 and D. Currently, we 
 do not merge these two foreach statements, because one output of the first 
 foreach (B) is referred to twice in D, and the current rule assumes that 
 after the merge we would need to calculate B twice in D. Actually, C1 only 
 does projection, with no calculation of B, so merging C1 and D will not 
 result in calculating B twice. Therefore, C1 and D should be merged.




[jira] Commented: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-26 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915037#action_12915037
 ] 

Daniel Dai commented on PIG-1643:
-

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

All tests pass.

 join fails for a query with input having 'load using pigstorage without 
 schema' + 'foreach'
 ---

 Key: PIG-1643
 URL: https://issues.apache.org/jira/browse/PIG-1643
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch, 
 PIG-1643.4.patch


 {code}
 l1 = load 'std.txt';
 l2 = load 'std.txt'; 
 f1 = foreach l1 generate $0 as abc, $1 as  def;
 -- j =  join f1 by $0, l2 by $0 using 'replicated';
 -- j =  join l2 by $0, f1 by $0 using 'replicated';
 j =  join l2 by $0, f1 by $0 ;
 dump j;
 {code}
 the error -
 {code}
 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2044: The type null cannot be collected as a Key type
 {code}
 The MR plan from explain  -
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node scope-21
 Map Plan
 Union[tuple] - scope-22
 |
 |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
 |   |   |
 |   |   Project[bytearray][0] - scope-12
 |   |
 |   |---l2: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-0
 |
 |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
 |   |
 |   Project[NULL][0] - scope-14
 |
 |---f1: New For Each(false,false)[bag] - scope-6
 |   |
 |   Project[bytearray][0] - scope-2
 |   |
 |   Project[bytearray][1] - scope-4
 |
 |---l1: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-1
 Reduce Plan
 j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
 |
 |---POJoinPackage(true,true)[tuple] - scope-23
 Global sort: false
 
 {code}
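The plan shows the f1 side of the join carrying a NULL key type while the l2 side is bytearray. The idea behind the fix can be modeled as a fallback (hypothetical helper, not Pig's actual type-resolution code): an undeclared key type should widen to bytearray instead of remaining null.

```python
BYTEARRAY = "bytearray"

def resolve_join_key_type(declared_types):
    # declared_types: one entry per join input; None means "no schema"
    types = {t for t in declared_types if t is not None}
    if not types:
        return BYTEARRAY          # nothing declared on any input
    if len(types) == 1:
        return types.pop()        # all declared inputs agree
    return BYTEARRAY              # mixed/unknown -> widen to bytearray

assert resolve_join_key_type([None, None]) == "bytearray"
assert resolve_join_key_type(["bytearray", None]) == "bytearray"
assert resolve_join_key_type(["int", "int"]) == "int"
```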




[jira] Resolved: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1643.
-

Release Note: PIG-1643.4.patch committed to both trunk and 0.8 branch.
  Resolution: Fixed

 join fails for a query with input having 'load using pigstorage without 
 schema' + 'foreach'
 ---

 Key: PIG-1643
 URL: https://issues.apache.org/jira/browse/PIG-1643
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch, 
 PIG-1643.4.patch


 {code}
 l1 = load 'std.txt';
 l2 = load 'std.txt'; 
 f1 = foreach l1 generate $0 as abc, $1 as  def;
 -- j =  join f1 by $0, l2 by $0 using 'replicated';
 -- j =  join l2 by $0, f1 by $0 using 'replicated';
 j =  join l2 by $0, f1 by $0 ;
 dump j;
 {code}
 the error -
 {code}
 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2044: The type null cannot be collected as a Key type
 {code}
 The MR plan from explain  -
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node scope-21
 Map Plan
 Union[tuple] - scope-22
 |
 |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
 |   |   |
 |   |   Project[bytearray][0] - scope-12
 |   |
 |   |---l2: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-0
 |
 |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
 |   |
 |   Project[NULL][0] - scope-14
 |
 |---f1: New For Each(false,false)[bag] - scope-6
 |   |
 |   Project[bytearray][0] - scope-2
 |   |
 |   Project[bytearray][1] - scope-4
 |
 |---l1: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-1
 Reduce Plan
 j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
 |
 |---POJoinPackage(true,true)[tuple] - scope-23
 Global sort: false
 
 {code}
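
The plan above shows the root cause: the foreach's projection ends up typed NULL (Project[NULL]), and the join key for that input cannot be built. As a rough, hypothetical sketch of the idea behind the fix (not the actual patch): resolve an unknown key type to bytearray, Pig's convention for fields loaded without a schema, instead of aborting with error 2044.

```java
// Hypothetical sketch, not the actual patch: resolve an unknown (null) key
// type to bytearray instead of raising "cannot be collected as a Key type".
public class KeyTypeSketch {
    enum PigType { NULL, BYTEARRAY, INT, CHARARRAY }

    static PigType resolveKeyType(PigType declared) {
        // the failing path aborted on NULL; defaulting to BYTEARRAY matches
        // how Pig types fields loaded without a schema
        return declared == PigType.NULL ? PigType.BYTEARRAY : declared;
    }

    public static void main(String[] args) {
        System.out.println(resolveKeyType(PigType.NULL));      // BYTEARRAY
        System.out.println(resolveKeyType(PigType.CHARARRAY)); // CHARARRAY
    }
}
```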

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-4.patch

PIG-1644-4.patch fixes findbugs warnings and additional unit test failures.

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch, PIG-1644-2.patch, PIG-1644-3.patch, 
 PIG-1644-4.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at its origin and destination, and reuse that position when 
 connecting to the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}
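
For illustration, the position-preserving reconnect pattern described above can be sketched against a toy plan. ToyPlan and its methods are stand-ins for Pig's OperatorPlan API, not the real implementation; the point is that reusing the saved slot keeps multi-input operators such as joins wired to the right inputs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy plan tracking each operator's ordered input edges. Input position
// matters: for a join, slot 0 and slot 1 are different join inputs.
class ToyPlan {
    private final Map<String, List<String>> inputs = new HashMap<>();

    List<String> inputs(String op) {
        return inputs.computeIfAbsent(op, k -> new ArrayList<>());
    }

    // connect pred to succ's input slot inPos
    void connect(String pred, String succ, int inPos) {
        inputs(succ).add(inPos, pred);
    }

    // disconnect pred -> succ, returning the input slot the edge occupied
    int disconnect(String pred, String succ) {
        List<String> in = inputs(succ);
        int pos = in.indexOf(pred);
        in.remove(pos);
        return pos;
    }

    // the pattern from the description: save the slot, reuse it on reconnect
    void insertBetween(String pred, String newNode, String succ) {
        int pos = disconnect(pred, succ);
        connect(pred, newNode, 0);
        connect(newNode, succ, pos);  // reconnecting at a fixed slot instead would swap join inputs
    }
}

public class ReconnectDemo {
    static List<String> run() {
        ToyPlan plan = new ToyPlan();
        plan.connect("l2", "join", 0);              // first join input
        plan.connect("f1", "join", 1);              // second join input
        plan.insertBetween("l2", "filter", "join"); // splice a filter onto input 0
        return plan.inputs("join");                 // filter took over slot 0
    }

    public static void main(String[] args) {
        System.out.println(run()); // [filter, f1]
    }
}
```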

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1644.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

All tests pass. 

Patch committed to both trunk and 0.8 branch.

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch, PIG-1644-2.patch, PIG-1644-3.patch, 
 PIG-1644-4.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at its origin and destination, and reuse that position when 
 connecting to the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914662#action_12914662
 ] 

Daniel Dai commented on PIG-1635:
-

+1, patch looks good. Also, can you review all connect/disconnect usage in 
ExpressionSimplifier, in light of 
[PIG-1644|https://issues.apache.org/jira/browse/PIG-1644]? I see lots of misuse 
in the other rules.

 Logical simplifier does not simplify away constants under AND and OR; after 
 simplification the ordering of operands of AND and OR may get changed
 

 Key: PIG-1635
 URL: https://issues.apache.org/jira/browse/PIG-1635
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1635.patch


 b = FILTER a by ((f1 > 1) AND (1 == 1))
 or 
 b = FILTER a by ((f1 > 1) OR (1 == 0))
 should be simplified to
 b = FILTER a by f1 > 1;
 Regarding the ordering change, an example is that 
 b = filter a by ((f1 is not null) AND (f2 is not null));
 is changed, even though no simplification is possible, to
 b = filter a by ((f2 is not null) AND (f1 is not null));
 Even though the ordering change in this case, and probably in most other 
 cases, makes no difference, some users might care about the ordering for two 
 reasons: stateful UDFs may be used as operands of AND or OR, and the ordering 
 may be intended by the application designer to maximize the chances of 
 short-circuiting the composite boolean evaluation. 
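
The two requested behaviors can be sketched on a toy boolean AST: fold identity constants under AND/OR, and never swap operand order while doing so. This is a minimal illustration, not Pig's actual simplifier; the `Cmp` leaf stands in for a comparison like `f1 > 1` whose constant counterpart has already been folded to TRUE or FALSE.

```java
// Toy expression AST for illustrating constant folding under AND/OR.
abstract class Expr { }

final class Const extends Expr {          // a boolean literal, e.g. the folded (1 == 1)
    final boolean value;
    Const(boolean v) { value = v; }
    @Override public String toString() { return Boolean.toString(value); }
}

final class Cmp extends Expr {            // stands in for a leaf like "f1 > 1"
    final String text;
    Cmp(String t) { text = t; }
    @Override public String toString() { return text; }
}

final class Bin extends Expr {
    final String op; final Expr lhs, rhs;
    Bin(String op, Expr l, Expr r) { this.op = op; lhs = l; rhs = r; }
    @Override public String toString() { return "(" + lhs + " " + op + " " + rhs + ")"; }
}

public class SimplifierSketch {
    static Expr simplify(Expr e) {
        if (!(e instanceof Bin)) return e;
        Bin b = (Bin) e;
        Expr l = simplify(b.lhs), r = simplify(b.rhs);
        if (b.op.equals("AND")) {
            if (isConst(l, true)) return r;   // TRUE AND e  -> e
            if (isConst(r, true)) return l;   // e AND TRUE  -> e
        } else if (b.op.equals("OR")) {
            if (isConst(l, false)) return r;  // FALSE OR e  -> e
            if (isConst(r, false)) return l;  // e OR FALSE  -> e
        }
        return new Bin(b.op, l, r);           // operand order is preserved
    }

    static boolean isConst(Expr e, boolean v) {
        return e instanceof Const && ((Const) e).value == v;
    }

    public static void main(String[] args) {
        // (f1 > 1) AND (1 == 1), with the constant comparison pre-folded to TRUE
        Expr e = new Bin("AND", new Cmp("f1 > 1"), new Const(true));
        System.out.println(simplify(e));  // f1 > 1
    }
}
```

Because the rewrite only ever returns the surviving operand or rebuilds the node with `(l, r)` in their original order, stateful UDFs and deliberate short-circuit ordering are left undisturbed.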

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-3.patch

Found one bug introduced by the refactoring. Attaching PIG-1644-3.patch with the 
fix, and running the tests again.

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch, PIG-1644-2.patch, PIG-1644-3.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at its origin and destination, and reuse that position when 
 connecting to the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914675#action_12914675
 ] 

Daniel Dai commented on PIG-1635:
-

+1 for commit.

 Logical simplifier does not simplify away constants under AND and OR; after 
 simplification the ordering of operands of AND and OR may get changed
 

 Key: PIG-1635
 URL: https://issues.apache.org/jira/browse/PIG-1635
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1635.patch


 b = FILTER a by ((f1 > 1) AND (1 == 1))
 or 
 b = FILTER a by ((f1 > 1) OR (1 == 0))
 should be simplified to
 b = FILTER a by f1 > 1;
 Regarding the ordering change, an example is that 
 b = filter a by ((f1 is not null) AND (f2 is not null));
 is changed, even though no simplification is possible, to
 b = filter a by ((f2 is not null) AND (f1 is not null));
 Even though the ordering change in this case, and probably in most other 
 cases, makes no difference, some users might care about the ordering for two 
 reasons: stateful UDFs may be used as operands of AND or OR, and the ordering 
 may be intended by the application designer to maximize the chances of 
 short-circuiting the composite boolean evaluation. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-1643:
-


The following script does not produce the right result after the patch:
{code}
a = load '/grid/2/dev/pigqa/in/singlefile/studenttab10k';
b = foreach a generate *;
store b into '/grid/2/dev/pigqa/out/log/hadoopqa.1285338379/Foreach_2.out';
{code}

 join fails for a query with input having 'load using pigstorage without 
 schema' + 'foreach'
 ---

 Key: PIG-1643
 URL: https://issues.apache.org/jira/browse/PIG-1643
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1643.1.patch, PIG-1643.2.patch


 {code}
 l1 = load 'std.txt';
 l2 = load 'std.txt'; 
 f1 = foreach l1 generate $0 as abc, $1 as  def;
 -- j =  join f1 by $0, l2 by $0 using 'replicated';
 -- j =  join l2 by $0, f1 by $0 using 'replicated';
 j =  join l2 by $0, f1 by $0 ;
 dump j;
 {code}
 the error -
 {code}
 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2044: The type null cannot be collected as a Key type
 {code}
 The MR plan from explain  -
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node scope-21
 Map Plan
 Union[tuple] - scope-22
 |
 |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
 |   |   |
 |   |   Project[bytearray][0] - scope-12
 |   |
 |   |---l2: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-0
 |
 |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
 |   |
 |   Project[NULL][0] - scope-14
 |
 |---f1: New For Each(false,false)[bag] - scope-6
 |   |
 |   Project[bytearray][0] - scope-2
 |   |
 |   Project[bytearray][1] - scope-4
 |
 |---l1: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-1
 Reduce Plan
 j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
 |
 |---POJoinPackage(true,true)[tuple] - scope-23
 Global sort: false
 
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1643:


Attachment: PIG-1643.2.patch

Attaching a fix.

 join fails for a query with input having 'load using pigstorage without 
 schema' + 'foreach'
 ---

 Key: PIG-1643
 URL: https://issues.apache.org/jira/browse/PIG-1643
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1643.1.patch, PIG-1643.2.patch


 {code}
 l1 = load 'std.txt';
 l2 = load 'std.txt'; 
 f1 = foreach l1 generate $0 as abc, $1 as  def;
 -- j =  join f1 by $0, l2 by $0 using 'replicated';
 -- j =  join l2 by $0, f1 by $0 using 'replicated';
 j =  join l2 by $0, f1 by $0 ;
 dump j;
 {code}
 the error -
 {code}
 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2044: The type null cannot be collected as a Key type
 {code}
 The MR plan from explain  -
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node scope-21
 Map Plan
 Union[tuple] - scope-22
 |
 |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
 |   |   |
 |   |   Project[bytearray][0] - scope-12
 |   |
 |   |---l2: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-0
 |
 |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
 |   |
 |   Project[NULL][0] - scope-14
 |
 |---f1: New For Each(false,false)[bag] - scope-6
 |   |
 |   Project[bytearray][0] - scope-2
 |   |
 |   Project[bytearray][1] - scope-4
 |
 |---l1: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-1
 Reduce Plan
 j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
 |
 |---POJoinPackage(true,true)[tuple] - scope-23
 Global sort: false
 
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1639:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 New logical plan: PushUpFilter should not push before group/cogroup if filter 
 condition contains UDF
 

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1639-1.patch


 The following script fails:
 {code}
 a = load 'file' AS (f1, f2, f3);
 b = group a by f1;
 c = filter b by COUNT(a) > 1;
 dump c;
 {code}
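
Conceptually, the fix adds a guard before pushing a filter above a group: the condition may not contain a UDF, because an aggregate like COUNT over the grouped bag is only defined after grouping. The sketch below is a crude stand-in; `containsUdf` here is a toy string check, not Pig's real walk over the filter's expression plan.

```java
// Hypothetical guard sketch for PushUpFilter, not Pig's actual code.
public class PushUpFilterGuard {
    static boolean containsUdf(String condition) {
        // crude stand-in: treat any NAME( with an upper-case name as a UDF call
        return condition.matches(".*\\b[A-Z]+\\s*\\(.*");
    }

    static boolean canPushAboveGroup(String condition) {
        return !containsUdf(condition);
    }

    public static void main(String[] args) {
        System.out.println(canPushAboveGroup("COUNT(a) > 1")); // false: stay after group
        System.out.println(canPushAboveGroup("f1 > 1"));       // true: safe to push
    }
}
```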

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1643:


Attachment: PIG-1643.3.patch

PIG-1643.3.patch is more general than PIG-1643.2.patch. It solves this null 
schema issue for all expressions.

 join fails for a query with input having 'load using pigstorage without 
 schema' + 'foreach'
 ---

 Key: PIG-1643
 URL: https://issues.apache.org/jira/browse/PIG-1643
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch


 {code}
 l1 = load 'std.txt';
 l2 = load 'std.txt'; 
 f1 = foreach l1 generate $0 as abc, $1 as  def;
 -- j =  join f1 by $0, l2 by $0 using 'replicated';
 -- j =  join l2 by $0, f1 by $0 using 'replicated';
 j =  join l2 by $0, f1 by $0 ;
 dump j;
 {code}
 the error -
 {code}
 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2044: The type null cannot be collected as a Key type
 {code}
 The MR plan from explain  -
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node scope-21
 Map Plan
 Union[tuple] - scope-22
 |
 |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
 |   |   |
 |   |   Project[bytearray][0] - scope-12
 |   |
 |   |---l2: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-0
 |
 |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
 |   |
 |   Project[NULL][0] - scope-14
 |
 |---f1: New For Each(false,false)[bag] - scope-6
 |   |
 |   Project[bytearray][0] - scope-2
 |   |
 |   Project[bytearray][1] - scope-4
 |
 |---l1: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-1
 Reduce Plan
 j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
 |
 |---POJoinPackage(true,true)[tuple] - scope-23
 Global sort: false
 
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914126#action_12914126
 ] 

Daniel Dai commented on PIG-1643:
-

+1 if tests pass.

 join fails for a query with input having 'load using pigstorage without 
 schema' + 'foreach'
 ---

 Key: PIG-1643
 URL: https://issues.apache.org/jira/browse/PIG-1643
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1643.1.patch


 {code}
 l1 = load 'std.txt';
 l2 = load 'std.txt'; 
 f1 = foreach l1 generate $0 as abc, $1 as  def;
 -- j =  join f1 by $0, l2 by $0 using 'replicated';
 -- j =  join l2 by $0, f1 by $0 using 'replicated';
 j =  join l2 by $0, f1 by $0 ;
 dump j;
 {code}
 the error -
 {code}
 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2044: The type null cannot be collected as a Key type
 {code}
 The MR plan from explain  -
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node scope-21
 Map Plan
 Union[tuple] - scope-22
 |
 |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
 |   |   |
 |   |   Project[bytearray][0] - scope-12
 |   |
 |   |---l2: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-0
 |
 |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
 |   |
 |   Project[NULL][0] - scope-14
 |
 |---f1: New For Each(false,false)[bag] - scope-6
 |   |
 |   Project[bytearray][0] - scope-2
 |   |
 |   Project[bytearray][1] - scope-4
 |
 |---l1: 
 Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage)
  - scope-1
 Reduce Plan
 j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
 |
 |---POJoinPackage(true,true)[tuple] - scope-23
 Global sort: false
 
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914147#action_12914147
 ] 

Daniel Dai commented on PIG-1644:
-

Yes, I think we can do replace/remove/insert. They should be simple and clear 
enough to use. Here are the new methods to add to OperatorPlan:
{code}
replace(Operator oldOperator, Operator newOperator)
remove(Operator operatorToRemove) // connect all its successors to its 
predecessor, and all its predecessors to its successor
insertBefore(Operator operatorToInsert, Operator pos) // insert 
operatorToInsert before pos, connecting all of pos's predecessors to it
insertAfter(Operator operatorToInsert, Operator pos) // insert operatorToInsert 
after pos, connecting it to all of pos's successors
{code}

How does that sound?

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at its origin and destination, and reuse that position when 
 connecting to the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF

2010-09-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1639:


Summary: New logical plan: PushUpFilter should not push before 
group/cogroup if filter condition contains UDF  (was: New logical plan: 
PushUpFilter should not optimize if filter condition contains UDF)

 New logical plan: PushUpFilter should not push before group/cogroup if filter 
 condition contains UDF
 

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1639-1.patch


 The following script fails:
 {code}
 a = load 'file' AS (f1, f2, f3);
 b = group a by f1;
 c = filter b by COUNT(a) > 1;
 dump c;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1639) New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF

2010-09-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914154#action_12914154
 ] 

Daniel Dai commented on PIG-1639:
-

+1 if all tests pass.

 New logical plan: PushUpFilter should not push before group/cogroup if filter 
 condition contains UDF
 

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1639-1.patch


 The following script fails:
 {code}
 a = load 'file' AS (f1, f2, f3);
 b = group a by f1;
 c = filter b by COUNT(a) > 1;
 dump c;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914317#action_12914317
 ] 

Daniel Dai commented on PIG-1644:
-

After looking into the existing code, it seems insertBetween is a more useful 
method. So I want to drop insertBefore/insertAfter and add insertBetween:
{code}
insertBetween(Operator pred, Operator operatorToInsert, Operator succ)
{code}

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at its origin and destination, and reuse that position when 
 connecting to the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-2.patch

Attaching the patch with the new methods and a refactoring of the existing code.

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch, PIG-1644-2.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at its origin and destination, and reuse that position when 
 connecting to the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913714#action_12913714
 ] 

Daniel Dai commented on PIG-1636:
-

test-patch result:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

All tests pass.

 Scalar fail if the scalar variable is generated by limit
 

 Key: PIG-1636
 URL: https://issues.apache.org/jira/browse/PIG-1636
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1636-1.patch


 The following script fails:
 {code}
 a = load 'studenttab10k' as (name: chararray, age: int, gpa: float);
 b = group a all;
 c = foreach b generate SUM(a.age) as total;
 c1= limit c 1;
 d = foreach a generate name, age/(double)c1.total as d_sum;
 store d into '111';
 {code}
 The problem is that d holds a reference to c1. In the optimizer, we push the 
 limit before the foreach; d still references the limit, and we get the wrong 
 schema for the scalar.




[jira] Resolved: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1636.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 Scalar fail if the scalar variable is generated by limit
 

 Key: PIG-1636
 URL: https://issues.apache.org/jira/browse/PIG-1636
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1636-1.patch


 The following script fails:
 {code}
 a = load 'studenttab10k' as (name: chararray, age: int, gpa: float);
 b = group a all;
 c = foreach b generate SUM(a.age) as total;
 c1= limit c 1;
 d = foreach a generate name, age/(double)c1.total as d_sum;
 store d into '111';
 {code}
 The problem is that d holds a scalar reference to c1. In the optimizer, we 
 push the limit above the foreach, but d still references the limit, and we get 
 the wrong schema for the scalar.




[jira] Created: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)
New logical plan: Plan.connect with position is misused in some places
--

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


When we replace/remove/insert a node, we use the disconnect/connect methods of 
OperatorPlan. When we disconnect an edge, we should save the position of the 
edge at both the origin and the destination, and use those positions when 
connecting the new predecessor/successor. Some of the patterns are:

Insert a new node:
{code}
Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
plan.connect(pred, pos.first, newnode, 0);
plan.connect(newnode, 0, succ, pos.second);
{code}

Remove a node:
{code}
Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
plan.connect(pred, pos1.first, succ, pos2.second);
{code}

Replace a node:
{code}
Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
plan.connect(pred, pos1.first, newNode, pos1.second);
plan.connect(newNode, pos2.first, succ, pos2.second);
{code}

There are a couple of places where we do not follow this pattern, which results 
in some errors. For example, the following script fails:
{code}
a = load '1.txt' as (a0, a1, a2, a3);
b = foreach a generate a0, a1, a2;
store b into 'aaa';
c = order b by a2;
d = foreach c generate a2;
store d into 'bbb';
{code}
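The position-preserving pattern above can be sketched with a toy plan whose per-node edge lists are ordered. Plan, Node, and Pair here are illustrative stand-ins under assumed semantics, not the real org.apache.pig.newplan API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy ordered-edge plan: disconnect() reports the positions it vacated,
// and connect() reinserts at a given position, so sibling edges keep
// their order. Class names are illustrative, not the real Pig API.
class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
}

class Node {
    final String name;
    final List<Node> succs = new ArrayList<>();
    final List<Node> preds = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class Plan {
    // Remove the edge and return (position among pred's successors,
    // position among succ's predecessors).
    Pair<Integer, Integer> disconnect(Node pred, Node succ) {
        int outPos = pred.succs.indexOf(succ);
        int inPos = succ.preds.indexOf(pred);
        pred.succs.remove(outPos);
        succ.preds.remove(inPos);
        return new Pair<>(outPos, inPos);
    }

    // Reconnect, restoring the saved positions.
    void connect(Node pred, int outPos, Node succ, int inPos) {
        pred.succs.add(outPos, succ);
        succ.preds.add(inPos, pred);
    }

    // The "insert a new node" pattern from the description.
    void insertBetween(Node pred, Node newNode, Node succ) {
        Pair<Integer, Integer> pos = disconnect(pred, succ);
        connect(pred, pos.first, newNode, 0);
        connect(newNode, 0, succ, pos.second);
    }
}
```

If insertBetween reconnected at position 0 unconditionally instead of at pos.first, a multi-output node like b in the script above (one edge to the store, one to the order) would have its first output displaced, which is exactly the kind of misuse this issue describes.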




[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: (was: PIG-1644-1.patch)

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at both the origin and the destination, and use those positions when 
 connecting the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in some errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}




[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-1.patch

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at both the origin and the destination, and use those positions when 
 connecting the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in some errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}




[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-1.patch

Attached a patch to address all such places in the new logical plan, except for 
ExpressionSimplifier. There is work underway on ExpressionSimplifier 
([PIG-1635|https://issues.apache.org/jira/browse/PIG-1635]) that includes some 
of these changes, and I don't want to conflict with that patch. After PIG-1635 
is in, we can also review the connect/disconnect usage of ExpressionSimplifier.

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch


 When we replace/remove/insert a node, we use the disconnect/connect methods 
 of OperatorPlan. When we disconnect an edge, we should save the position of 
 the edge at both the origin and the destination, and use those positions when 
 connecting the new predecessor/successor. Some of the patterns are:
 Insert a new node:
 {code}
 Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
 plan.connect(pred, pos.first, newnode, 0);
 plan.connect(newnode, 0, succ, pos.second);
 {code}
 Remove a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
 plan.connect(pred, pos1.first, succ, pos2.second);
 {code}
 Replace a node:
 {code}
 Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
 Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
 plan.connect(pred, pos1.first, newNode, pos1.second);
 plan.connect(newNode, pos2.first, succ, pos2.second);
 {code}
 There are a couple of places where we do not follow this pattern, which 
 results in some errors. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2, a3);
 b = foreach a generate a0, a1, a2;
 store b into 'aaa';
 c = order b by a2;
 d = foreach c generate a2;
 store d into 'bbb';
 {code}




[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Attachment: PIG-1605-1.patch

 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1605-1.patch


 In the scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
 problem by adding a LOScalar operator. Here is a different approach: we add a 
 soft link to the plan, and the soft link is only visible to the walkers. By 
 doing this, we can make sure we visit the LOStore which generates the scalar 
 first, and then the LOForEach which uses the scalar. No other part of the 
 logical plan knows the soft link exists. The benefits are:
 1. The logical plan does not need to deal with LOScalar, which makes it 
 cleaner.
 2. Conceptually, a scalar dependency is different. A regular link represents 
 data flow in the pipeline; for a scalar, the dependency means an operator 
 depends on a file generated by another operator. It is a different type of 
 data dependency.
 3. Soft links can solve other dependency problems in the future. If we 
 introduce another UDF that depends on a file generated by another operator, 
 we can use the same mechanism.
 4. With soft links, we can use scalars coming from different sources in the 
 same statement, which in my mind is not a rare use case. (e.g.: D = foreach C 
 generate c0/A.total, c1/B.count; )
 Currently, there are two cases where we can use a soft link:
 1. scalar dependency, where the ReadScalar UDF uses a file generated by a 
 LOStore
 2. store-load dependency, where we load a file that is generated by a store 
 in the same script. This happens in the multi-store case. Currently we solve 
 it with a regular link; it is better to use a soft link.
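 A minimal sketch of the idea (class and method names here are hypothetical, not the Pig implementation): hard edges define the plan that regular consumers see, while a walker's topological order honors soft edges as well.

```java
import java.util.*;

// Sketch of soft links: soft edges participate in dependency ordering
// (what a walker sees) but are invisible to the regular plan structure.
class SoftPlan {
    final Map<String, List<String>> hard = new HashMap<>();
    final Map<String, List<String>> soft = new HashMap<>();
    final List<String> nodes = new ArrayList<>();

    void addNode(String n) {
        nodes.add(n);
        hard.put(n, new ArrayList<>());
        soft.put(n, new ArrayList<>());
    }

    void connect(String a, String b) { hard.get(a).add(b); }      // pipeline edge
    void softConnect(String a, String b) { soft.get(a).add(b); }  // file dependency

    // Regular consumers of the plan only see hard edges.
    List<String> successors(String n) { return hard.get(n); }

    // A dependency walker honors both edge kinds (topological order).
    List<String> walkOrder() {
        Map<String, Integer> indeg = new HashMap<>();
        for (String n : nodes) indeg.put(n, 0);
        for (String n : nodes) {
            for (String s : hard.get(n)) indeg.merge(s, 1, Integer::sum);
            for (String s : soft.get(n)) indeg.merge(s, 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String n : nodes) if (indeg.get(n) == 0) ready.add(n);
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String s : hard.get(n)) if (indeg.merge(s, -1, Integer::sum) == 0) ready.add(s);
            for (String s : soft.get(n)) if (indeg.merge(s, -1, Integer::sum) == 0) ready.add(s);
        }
        return order;
    }
}
```

 With a soft edge from the scalar-producing store to the consuming foreach, the walker is guaranteed to visit the store first, while successors() still shows the clean pipeline-only plan.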




[jira] Created: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-21 Thread Daniel Dai (JIRA)
Scalar fail if the scalar variable is generated by limit


 Key: PIG-1636
 URL: https://issues.apache.org/jira/browse/PIG-1636
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


The following script fails:
{code}
a = load 'studenttab10k' as (name: chararray, age: int, gpa: float);
b = group a all;
c = foreach b generate SUM(a.age) as total;
c1= limit c 1;
d = foreach a generate name, age/(double)c1.total as d_sum;
store d into '111';
{code}

The problem is that d holds a scalar reference to c1. In the optimizer, we push 
the limit above the foreach, but d still references the limit, and we get the 
wrong schema for the scalar.




[jira] Created: (PIG-1637) Combiner not used because optimizer inserts a foreach between group and algebraic function

2010-09-21 Thread Daniel Dai (JIRA)
Combiner not used because optimizer inserts a foreach between group and 
algebraic function


 Key: PIG-1637
 URL: https://issues.apache.org/jira/browse/PIG-1637
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


The following script does not use the combiner after the new optimization change.

{code}
A = load ':INPATH:/pigmix/page_views' using 
org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, 
estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)timespent as timespent, 
(double)estimated_revenue as estimated_revenue;
C = group B all; 
D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
store D into ':OUTPATH:';
{code}

This is because, after the group, the optimizer detects that the group key is 
not used afterward, so it adds a foreach statement after C. This is how it 
looks after optimization:
{code}
A = load ':INPATH:/pigmix/page_views' using 
org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, 
estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)timespent as timespent, 
(double)estimated_revenue as estimated_revenue;
C = group B all; 
C1 = foreach C generate B;
D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
store D into ':OUTPATH:';
{code}

That cancels the combiner optimization for D. 

The way to solve the issue is to merge the inserted C1 with D. Currently, we do 
not merge these two foreach statements. The reason is that one output of the 
first foreach (B) is referenced twice in D, and the current rule assumes that 
after the merge we would need to calculate B twice in D. Actually, C1 only does 
a projection and performs no calculation of B, so merging C1 and D will not 
calculate B twice. C1 and D should therefore be merged.
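The safety argument can be sketched outside Pig (names are hypothetical, not Pig internals): if C1 only projects, the merged operator materializes B once and both aggregates read that single value, so no work is duplicated.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of why merging a projection-only foreach (C1) into its consumer
// (D) is safe: the projection is materialized once, and D's two references
// to B read that one value rather than recomputing it.
class ForeachMergeSketch {
    // Models the merged foreach: project B once, then evaluate both
    // aggregates over the single projection. Returns {SUM, AVG}
    // (integer average is enough for the sketch).
    static long[] mergedForeach(Supplier<int[]> projectB) {
        int[] B = projectB.get();   // the projection C1 performed, done once
        long sum = 0;
        for (int v : B) sum += v;
        long avg = sum / B.length;
        return new long[] { sum, avg };
    }
}
```

Counting invocations of the projection supplier confirms that B is computed exactly once even though two aggregates consume it, which is why the merge cannot duplicate the work the rule is worried about.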




[jira] Created: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-21 Thread Daniel Dai (JIRA)
New logical plan: PushUpFilter should not optimize if filter condition contains 
UDF
---

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0







[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1639:


Description: 
The following script fails:
{code}
a = load 'file' AS (f1, f2, f3);
b = group a by f1;
c = filter b by COUNT(a) > 1;
dump c;
{code}
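A sketch of the guard the rule needs (the class name is hypothetical, and a real check would walk the expression plan rather than match text): a condition like COUNT(a) > 1 evaluates a UDF over the grouped bag, which does not exist before the group, so such filters must not be pushed up.

```java
// Toy guard for PushUpFilter: refuse to push a filter above a group when
// its condition contains a UDF call. As a crude textual stand-in for
// walking the expression plan, any identifier followed by '(' counts
// as a UDF invocation.
class PushUpFilterGuard {
    static boolean canPushAboveGroup(String condition) {
        return !condition.matches(".*\\b[A-Za-z_]\\w*\\s*\\(.*");
    }
}
```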

 New logical plan: PushUpFilter should not optimize if filter condition 
 contains UDF
 ---

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 The following script fails:
 {code}
 a = load 'file' AS (f1, f2, f3);
 b = group a by f1;
 c = filter b by COUNT(a) > 1;
 dump c;
 {code}




[jira] Updated: (PIG-1598) Pig gobbles up error messages - Part 2

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1598:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch looks good. Committed to both trunk and 0.8 branch.

 Pig gobbles up error messages - Part 2
 --

 Key: PIG-1598
 URL: https://issues.apache.org/jira/browse/PIG-1598
 Project: Pig
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: PIG-1598_0.patch


 Another case of PIG-1531 .




[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Attachment: PIG-1605-2.patch

PIG-1605-2.patch fixes the findbugs warnings.

test-patch result:
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] -1 release audit.  The applied patch generated 455 release 
audit warnings (more than the trunk's current 453 warnings).

 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1605-1.patch, PIG-1605-2.patch


 In the scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
 problem by adding a LOScalar operator. Here is a different approach: we add a 
 soft link to the plan, and the soft link is only visible to the walkers. By 
 doing this, we can make sure we visit the LOStore which generates the scalar 
 first, and then the LOForEach which uses the scalar. No other part of the 
 logical plan knows the soft link exists. The benefits are:
 1. The logical plan does not need to deal with LOScalar, which makes it 
 cleaner.
 2. Conceptually, a scalar dependency is different. A regular link represents 
 data flow in the pipeline; for a scalar, the dependency means an operator 
 depends on a file generated by another operator. It is a different type of 
 data dependency.
 3. Soft links can solve other dependency problems in the future. If we 
 introduce another UDF that depends on a file generated by another operator, 
 we can use the same mechanism.
 4. With soft links, we can use scalars coming from different sources in the 
 same statement, which in my mind is not a rare use case. (e.g.: D = foreach C 
 generate c0/A.total, c1/B.count; )
 Currently, there are two cases where we can use a soft link:
 1. scalar dependency, where the ReadScalar UDF uses a file generated by a 
 LOStore
 2. store-load dependency, where we load a file that is generated by a store 
 in the same script. This happens in the multi-store case. Currently we solve 
 it with a regular link; it is better to use a soft link.




[jira] Resolved: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1605.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

The release audit warning is due to jdiff; no new files were added. Patch 
committed to both trunk and 0.8 branch.

 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1605-1.patch, PIG-1605-2.patch


 In the scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
 problem by adding a LOScalar operator. Here is a different approach: we add a 
 soft link to the plan, and the soft link is only visible to the walkers. By 
 doing this, we can make sure we visit the LOStore which generates the scalar 
 first, and then the LOForEach which uses the scalar. No other part of the 
 logical plan knows the soft link exists. The benefits are:
 1. The logical plan does not need to deal with LOScalar, which makes it 
 cleaner.
 2. Conceptually, a scalar dependency is different. A regular link represents 
 data flow in the pipeline; for a scalar, the dependency means an operator 
 depends on a file generated by another operator. It is a different type of 
 data dependency.
 3. Soft links can solve other dependency problems in the future. If we 
 introduce another UDF that depends on a file generated by another operator, 
 we can use the same mechanism.
 4. With soft links, we can use scalars coming from different sources in the 
 same statement, which in my mind is not a rare use case. (e.g.: D = foreach C 
 generate c0/A.total, c1/B.count; )
 Currently, there are two cases where we can use a soft link:
 1. scalar dependency, where the ReadScalar UDF uses a file generated by a 
 LOStore
 2. store-load dependency, where we load a file that is generated by a store 
 in the same script. This happens in the multi-store case. Currently we solve 
 it with a regular link; it is better to use a soft link.




[jira] Commented: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909821#action_12909821
 ] 

Daniel Dai commented on PIG-1608:
-

Two comments:
1. the buildJar-withouthadoop target should also include this change
2. formatting: use spaces instead of tabs

The jar and package targets look good.

 pig should always include pig-default.properties and pig.properties in the 
 pig.jar
 --

 Key: PIG-1608
 URL: https://issues.apache.org/jira/browse/PIG-1608
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai
 Attachments: PIG-1608_0.patch


 pig should always include pig-default.properties and pig.properties as a part 
 of the pig.jar file




[jira] Updated: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1608:


Fix Version/s: 0.9.0
Affects Version/s: 0.8.0

 pig should always include pig-default.properties and pig.properties in the 
 pig.jar
 --

 Key: PIG-1608
 URL: https://issues.apache.org/jira/browse/PIG-1608
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: niraj rai
Assignee: niraj rai
 Fix For: 0.9.0

 Attachments: PIG-1608_0.patch, PIG-1608_1.patch


 pig should always include pig-default.properties and pig.properties as a part 
 of the pig.jar file




[jira] Updated: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1608:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to trunk. Thanks Niraj!

 pig should always include pig-default.properties and pig.properties in the 
 pig.jar
 --

 Key: PIG-1608
 URL: https://issues.apache.org/jira/browse/PIG-1608
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: niraj rai
Assignee: niraj rai
 Fix For: 0.9.0

 Attachments: PIG-1608_0.patch, PIG-1608_1.patch


 pig should always include pig-default.properties and pig.properties as a part 
 of the pig.jar file




[jira] Created: (PIG-1614) javacc.jar pulled twice from maven repository

2010-09-15 Thread Daniel Dai (JIRA)
javacc.jar pulled twice from maven repository
-

 Key: PIG-1614
 URL: https://issues.apache.org/jira/browse/PIG-1614
 Project: Pig
  Issue Type: Bug
  Components: build
Reporter: Daniel Dai
Priority: Trivial


ant pulls javacc.jar twice from maven: once as javacc.jar and once as 
javacc-4.2.jar.




[jira] Commented: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908886#action_12908886
 ] 

Daniel Dai commented on PIG-1608:
-

Pig should include pig-default.properties in pig.jar, but not pig.properties, 
just like Hadoop does for core-default.xml and core-site.xml.

 pig should always include pig-default.properties and pig.properties in the 
 pig.jar
 --

 Key: PIG-1608
 URL: https://issues.apache.org/jira/browse/PIG-1608
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai

 pig should always include pig-default.properties and pig.properties as a part 
 of the pig.jar file




[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Description: 
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
problem by adding a LOScalar operator. Here is a different approach: we add a 
soft link to the plan, and the soft link is only visible to the walkers. By 
doing this, we can make sure we visit the LOStore which generates the scalar 
first, and then the LOForEach which uses the scalar. No other part of the 
logical plan knows the soft link exists. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes it 
cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline; for a scalar, the dependency means an operator 
depends on a file generated by another operator. It is a different type of 
data dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use the same mechanism.
4. With soft links, we can use scalars coming from different sources in the 
same statement, which in my mind is not a rare use case. (e.g.: D = foreach C 
generate c0/A.total, c1/B.count;)

Currently, there are two cases where we can use a soft link:
1. scalar dependency, where the ReadScalar UDF uses a file generated by a LOStore
2. store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link; it is better to use a soft link.

  was:
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
problem by adding a LOScalar operator. Here is a different approach: we add a 
soft link to the plan, and the soft link is only visible to the walkers. By 
doing this, we can make sure we visit the LOStore which generates the scalar 
first, and then the LOForEach which uses the scalar. No other part of the 
logical plan knows the soft link exists. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes it 
cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline; for a scalar, the dependency means an operator 
depends on a file generated by another operator. It is a different type of 
data dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use the same mechanism.

Currently, there are two cases where we can use a soft link:
1. scalar dependency, where the ReadScalar UDF uses a file generated by a LOStore
2. store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link; it is better to use a soft link.


 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 In the scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
 problem by adding a LOScalar operator. Here is a different approach: we add a 
 soft link to the plan, and the soft link is only visible to the walkers. By 
 doing this, we can make sure we visit the LOStore which generates the scalar 
 first, and then the LOForEach which uses the scalar. No other part of the 
 logical plan knows the soft link exists. The benefits are:
 1. The logical plan does not need to deal with LOScalar, which makes it 
 cleaner.
 2. Conceptually, a scalar dependency is different. A regular link represents 
 data flow in the pipeline; for a scalar, the dependency means an operator 
 depends on a file generated by another operator. It is a different type of 
 data dependency.
 3. Soft links can solve other dependency problems in the future. If we 
 introduce another UDF that depends on a file generated by another operator, 
 we can use the same mechanism.
 4. With soft links, we can use scalars coming from different sources in the 
 same statement, which in my mind is not a rare use case. (e.g.: D = foreach C 
 generate c0/A.total, c1/B.count;)
 Currently, there are two cases where we can use a soft link:
 1. scalar dependency, where the ReadScalar UDF uses a file generated by a 
 LOStore
 2. store-load dependency, where we will load a file which is 

[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Description: 
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
problem by adding an LOScalar operator. Here is a different approach: we add a 
soft link to the plan, and the soft link is visible only to the walkers. This 
way we can make sure we visit the LOStore that generates the scalar before the 
LOForEach that uses it. No other part of the logical plan knows the soft link 
exists. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes it cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline; with a scalar, the dependency means one operator 
depends on a file generated by another operator. It is a different type of 
data dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use the same mechanism.
4. With soft links, we can use scalars coming from different sources in the 
same statement, which in my mind is not a rare use case (eg: D = foreach C 
generate c0/A.total, c1/B.count;).

Currently, there are two cases where we can use a soft link:
1. scalar dependency, where the ReadScalars UDF reads a file generated by an 
LOStore
2. store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link; it would be better to use a soft link.

  was:
In scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
the problem by adding a LOScalar operator. Here is a different approach. We 
will add a soft link to the plan, and soft link is only visible to the walkers. 
By doing this, we can make sure we visit LOStore which generate scalar first, 
and then LOForEach which use the scalar. All other part of the logical plan 
does not know the existence of the soft link. The benefits are:

1. Logical plan do not need to deal with LOScalar, this makes logical plan 
cleaner
2. Conceptually scalar dependency is different. Regular link represent a data 
flow in pipeline. In scalar, the dependency means an operator depends on a file 
generated by the other operator. It's different type of data dependency.
3. Soft link can solve other dependency problem in the future. If we introduce 
another UDF dependent on a file generated by another operator, we can use this 
mechanism to solve it. 
4. With soft link, we can use scalar come from different sources in the same 
statement, which in my mind is not a rare use case. (eg: D = foreach C generate 
c0/A.total, c1/B.count;)

Currently, there are two cases we can use soft link:
1. scalar dependency, where ReadScalar UDF will use a file generate by a LOStore
2. store-load dependency, where we will load a file which is generated by a 
store in the same script. This happens in multi-store case. Currently we solve 
it by regular link. It is better to use a soft link.


 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 In scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
 the problem by adding a LOScalar operator. Here is a different approach. We 
 will add a soft link to the plan, and soft link is only visible to the 
 walkers. By doing this, we can make sure we visit LOStore which generate 
 scalar first, and then LOForEach which use the scalar. All other part of the 
 logical plan does not know the existence of the soft link. The benefits are:
 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
 cleaner
 2. Conceptually scalar dependency is different. Regular link represent a data 
 flow in pipeline. In scalar, the dependency means an operator depends on a 
 file generated by the other operator. It's different type of data dependency.
 3. Soft link can solve other dependency problem in the future. If we 
 introduce another UDF dependent on a file generated by another operator, we 
 can use this mechanism to solve it. 
 4. With soft link, we can use scalar come from different sources in the same 
 statement, which in my mind is not a rare use case. (eg: D = foreach C 
 generate c0/A.total, c1/B.count; )
 Currently, there 

[jira] Commented: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909007#action_12909007
 ] 

Daniel Dai commented on PIG-1605:
-

Changes are reasonably small. Here is a summary:
1. Add the following methods to the plan (both old and new):
{code}
public void createSoftLink(E from, E to)
public List<E> getSoftLinkPredecessors(E op)
public List<E> getSoftLinkSuccessors(E op)
{code}

2. All walkers need to change. When a walker gets predecessors or successors, 
it needs to fetch both the regular-link and the soft-link neighbors. The 
changes are straightforward, eg from:
{code}
Collection<O> newSuccessors = mPlan.getSuccessors(suc);
{code}
to:
{code}
Collection<O> newSuccessors = mPlan.getSuccessors(suc);
newSuccessors.addAll(mPlan.getSoftLinkSuccessors(suc));
{code}

3. Change plan utility functions, such as replace, replaceAndAddSucessors, 
replaceAndAddPredecessors, etc.
In the new logical plan no change is needed, since we only have a minimal set 
of utility functions. In the old logical plan some changes would be needed to 
make those utility functions aware of soft links. But if we decide not to 
support the old logical plan going forward, no change is needed; we only need 
to note that those utility functions do not handle soft links.

4. Change scalar to use soft links.
This includes creating the soft link and maintaining it through transforms 
(migrating to the new plan, translating to the physical plan).

5. Change store-load to use soft links.
This is an optional step. Currently we use a regular link; conceptually we 
should use a soft link. It is OK if we don't do this for now.

Also note that in most cases there is no soft link, so the plan behaves just 
as before; this change should be safe enough.
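The plan-side bookkeeping for step 1 can be sketched roughly as follows. This 
is a minimal illustration, not the actual Pig implementation; only the three 
method names come from the proposal, everything else (class name, field 
layout) is made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: soft links live in side maps, separate from the regular
// edges, so only walkers that explicitly ask for them can see them.
class SketchPlan<E> {
    private final Map<E, List<E>> softSucc = new HashMap<>();
    private final Map<E, List<E>> softPred = new HashMap<>();

    public void createSoftLink(E from, E to) {
        softSucc.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        softPred.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    public List<E> getSoftLinkSuccessors(E op) {
        return softSucc.getOrDefault(op, Collections.emptyList());
    }

    public List<E> getSoftLinkPredecessors(E op) {
        return softPred.getOrDefault(op, Collections.emptyList());
    }
}
```

Because the soft edges are stored separately, existing plan operations that 
only look at regular edges keep working unchanged, which is why a plan with no 
soft links behaves exactly as before.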

 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 In scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
 the problem by adding a LOScalar operator. Here is a different approach. We 
 will add a soft link to the plan, and soft link is only visible to the 
 walkers. By doing this, we can make sure we visit LOStore which generate 
 scalar first, and then LOForEach which use the scalar. All other part of the 
 logical plan does not know the existence of the soft link. The benefits are:
 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
 cleaner
 2. Conceptually scalar dependency is different. Regular link represent a data 
 flow in pipeline. In scalar, the dependency means an operator depends on a 
 file generated by the other operator. It's different type of data dependency.
 3. Soft link can solve other dependency problem in the future. If we 
 introduce another UDF dependent on a file generated by another operator, we 
 can use this mechanism to solve it. 
 4. With soft link, we can use scalar come from different sources in the same 
 statement, which in my mind is not a rare use case. (eg: D = foreach C 
 generate c0/A.total, c1/B.count; )
 Currently, there are two cases we can use soft link:
 1. scalar dependency, where ReadScalar UDF will use a file generate by a 
 LOStore
 2. store-load dependency, where we will load a file which is generated by a 
 store in the same script. This happens in multi-store case. Currently we 
 solve it by regular link. It is better to use a soft link.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1604) 'relation as scalar' does not work with complex types

2010-09-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908096#action_12908096
 ] 

Daniel Dai commented on PIG-1604:
-

+1, patch looks good.

 'relation as scalar' does not work with complex types 
 --

 Key: PIG-1604
 URL: https://issues.apache.org/jira/browse/PIG-1604
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1604.1.patch


 A statement such as 
 sclr = limit b 1;
 d = foreach a generate name, age/(double)sclr.mapcol#'it' as some_sum;
 Results in the following parse error:
  ERROR 1000: Error during parsing. Non-atomic field expected but found atomic 
 field




[jira] Created: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-10 Thread Daniel Dai (JIRA)
Adding soft link to plan to solve input file dependency
---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
problem by adding an LOScalar operator. Here is a different approach: we add a 
soft link to the plan, and the soft link is visible only to the walkers. No 
other part of the logical plan knows the soft link exists. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes it cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline; with a scalar, the dependency means one operator 
depends on a file generated by another operator. It is a different type of 
data dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use the same mechanism.

Currently, there are two cases where we can use a soft link:
1. scalar dependency, where the ReadScalars UDF reads a file generated by an 
LOStore
2. store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link; it would be better to use a soft link.




[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Description: 
In scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
the problem by adding a LOScalar operator. Here is a different approach. We 
will add a soft link to the plan, and soft link is only visible to the walkers. 
By doing this, we can make sure we visit LOStore which generate scalar first, 
and then LOForEach which use the scalar. All other part of the logical plan 
does not know the existence of the soft link. The benefits are:

1. Logical plan do not need to deal with LOScalar, this makes logical plan 
cleaner
2. Conceptually scalar dependency is different. Regular link represent a data 
flow in pipeline. In scalar, the dependency means an operator depends on a file 
generated by the other operator. It's different type of data dependency.
3. Soft link can solve other dependency problem in the future. If we introduce 
another UDF dependent on a file generated by another operator, we can use this 
mechanism to solve it. 

Currently, there are two cases we can use soft link:
1. scalar dependency, where ReadScalar UDF will use a file generate by a LOStore
2. store-load dependency, where we will load a file which is generated by a 
store in the same script. This happens in multi-store case. Currently we solve 
it by regular link. It is better to use a soft link.

  was:
In scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
the problem by adding a LOScalar operator. Here is a different approach. We 
will add a soft link to the plan, and soft link is only visible to the walkers. 
All other part of the logical plan does not know the existence of the soft 
link. The benefits are:

1. Logical plan do not need to deal with LOScalar, this makes logical plan 
cleaner
2. Conceptually scalar dependency is different. Regular link represent a data 
flow in pipeline. In scalar, the dependency means an operator depends on a file 
generated by the other operator. It's different type of data dependency.
3. Soft link can solve other dependency problem in the future. If we introduce 
another UDF dependent on a file generated by another operator, we can use this 
mechanism to solve it. 

Currently, there are two cases we can use soft link:
1. scalar dependency, where ReadScalar UDF will use a file generate by a LOStore
2. store-load dependency, where we will load a file which is generated by a 
store in the same script. This happens in multi-store case. Currently we solve 
it by regular link. It is better to use a soft link.


 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 In scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
 the problem by adding a LOScalar operator. Here is a different approach. We 
 will add a soft link to the plan, and soft link is only visible to the 
 walkers. By doing this, we can make sure we visit LOStore which generate 
 scalar first, and then LOForEach which use the scalar. All other part of the 
 logical plan does not know the existence of the soft link. The benefits are:
 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
 cleaner
 2. Conceptually scalar dependency is different. Regular link represent a data 
 flow in pipeline. In scalar, the dependency means an operator depends on a 
 file generated by the other operator. It's different type of data dependency.
 3. Soft link can solve other dependency problem in the future. If we 
 introduce another UDF dependent on a file generated by another operator, we 
 can use this mechanism to solve it. 
 Currently, there are two cases we can use soft link:
 1. scalar dependency, where ReadScalar UDF will use a file generate by a 
 LOStore
 2. store-load dependency, where we will load a file which is generated by a 
 store in the same script. This happens in multi-store case. Currently we 
 solve it by regular link. It is better to use a soft link.




[jira] Updated: (PIG-1322) Logical Optimizer: change outer join into regular join

2010-09-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1322:


 Assignee: Xuefu Zhang  (was: Daniel Dai)
Fix Version/s: 0.9.0

 Logical Optimizer: change outer join into regular join
 --

 Key: PIG-1322
 URL: https://issues.apache.org/jira/browse/PIG-1322
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.9.0


 In some cases, we can change an outer join into a regular join. The benefit 
 is that a regular join is easier to optimize in subsequent optimizations. 
 Example:
 C = join A by a0 LEFT OUTER, B by b0;
 D = filter C by b0 > 0;
 =>
 C = join A by a0, B by b0;
 D = filter C by b0 > 0;
 Because of this change, the PushUpFilter rule can further push the filter in 
 front of the regular join, which it otherwise could not.
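Why the rewrite is safe can be checked with a toy example: a filter on the 
right-side key is never true on the null-padded rows an outer join adds, so 
outer join plus filter keeps exactly the inner-join rows. Data, field names, 
and the join helpers below are all made up for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of the outer-to-regular-join rewrite condition.
public class OuterJoinRewrite {
    record Row(String a0, Integer b0) {}

    // LEFT OUTER: every left key survives; unmatched keys get a null b0
    static List<Row> leftOuterJoin(List<String> a, Map<String, Integer> b) {
        return a.stream().map(k -> new Row(k, b.get(k)))
                .collect(Collectors.toList());
    }

    // regular (inner) join: only matched keys survive
    static List<Row> innerJoin(List<String> a, Map<String, Integer> b) {
        return a.stream().filter(b::containsKey)
                .map(k -> new Row(k, b.get(k)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> a = List.of("x", "y", "z");
        Map<String, Integer> b = Map.of("x", 3, "z", -1);

        // the filter rejects the null-padded row for "y", so the outer
        // join followed by the filter equals the inner join result
        List<Row> outerFiltered = leftOuterJoin(a, b).stream()
                .filter(r -> r.b0() != null && r.b0() > 0)
                .collect(Collectors.toList());
        List<Row> innerFiltered = innerJoin(a, b).stream()
                .filter(r -> r.b0() > 0)
                .collect(Collectors.toList());
        System.out.println(outerFiltered.equals(innerFiltered)); // prints true
    }
}
```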




[jira] Updated: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct

2010-09-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1437:


 Assignee: Xuefu Zhang
Fix Version/s: 0.9.0

 [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
 -

 Key: PIG-1437
 URL: https://issues.apache.org/jira/browse/PIG-1437
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 It's possible to rewrite queries like this
 {code}
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate group.name, group.age;
 dump C;
 {code}
 or
 {code} 
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate flatten(group);
 dump C;
 {code}
 to
 {code}
 A = load 'data' as (name,age);
 B = distinct A;
 dump B;
 {code}
 This could only be done if no columns within the bags are referenced 
 subsequently in the script. Since in the Pig-Hadoop world DISTINCT executes 
 more efficiently than GROUP BY, this would be a huge win. 
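The equivalence behind the proposed rewrite can be sanity-checked with a toy 
example: grouping on all columns and emitting only the group key yields the 
same set of rows as DISTINCT. The sample data below is made up:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy check: group-by-all-columns keys == distinct rows.
public class GroupFlattenVsDistinct {
    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("alice", "20"),
                List.of("bob", "30"),
                List.of("alice", "20"));   // duplicate row

        // B = group A by (name,age); C = foreach B generate flatten(group);
        Set<List<String>> viaGroup = rows.stream()
                .collect(Collectors.groupingBy(r -> r))
                .keySet();

        // B = distinct A;
        Set<List<String>> viaDistinct = new HashSet<>(rows);

        System.out.println(viaGroup.equals(viaDistinct)); // prints true
    }
}
```

The equivalence holds only because nothing downstream touches the grouped 
bags, which is exactly the precondition stated above.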




[jira] Resolved: (PIG-1601) Make scalar work for secure hadoop

2010-09-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1601.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 Make scalar work for secure hadoop
 --

 Key: PIG-1601
 URL: https://issues.apache.org/jira/browse/PIG-1601
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1601-1.patch


 Error message:
 open file
 'hdfs://gsbl90890.blue.ygrid.yahoo.com/tmp/temp851711738/tmp727366271'; error 
 =
 java.io.IOException: Delegation Token can be issued only with kerberos or web
 authentication at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:4975)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNode.getDelegationToken(NameNode.java:432)
 at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597) at
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523) at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1301) at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1297) at
 java.security.AccessController.doPrivileged(Native Method) at
 javax.security.auth.Subject.doAs(Subject.java:396) at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1295) at
 org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:66) at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:313)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:448)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:441)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide.getNext(Divide.java:72)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at
 org.apache.hadoop.mapred.Child$4.run(Child.java:217) at
 java.security.AccessController.doPrivileged(Native Method) at
 javax.security.auth.Subject.doAs(Subject.java:396) at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
 at org.apache.hadoop.mapred.Child.main(Child.java:211) 




[jira] Commented: (PIG-1595) casting relation to scalar- problem with handling of data from non PigStorage loaders

2010-09-07 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906932#action_12906932
 ] 

Daniel Dai commented on PIG-1595:
-

+1 for the test failure fix.

 casting relation to scalar- problem with handling of data from non PigStorage 
 loaders
 -

 Key: PIG-1595
 URL: https://issues.apache.org/jira/browse/PIG-1595
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1595.1.patch, PIG-1595.2.patch


 If 'casting relation to scalar' (PIG-1434) is used with load functions that 
 don't follow the same bytearray format as PigStorage for the other supported 
 datatypes, or with load functions that don't implement the LoadCaster 
 interface, the query can fail or produce incorrect results.
 The root cause of the problem is that there is a real dependency between the 
 ReadScalars udf that returns the scalar value and the LogicalOperator that 
 acts as its input, but the logical plan does not capture this dependency. So 
 in the SchemaResetter visitor used by the optimizer, the order in which 
 schemas are reset and evaluated does not take the dependency into account. If 
 the schema of the input LogicalOperator is not evaluated before the 
 ReadScalars udf, the result type of the ReadScalars udf becomes bytearray. 
 POUserFunc will then convert the input to bytearray using 'new 
 DataByteArray(inp.toString().getBytes())'. But this bytearray encoding of the 
 other supported types might not match what the load function associated with 
 the column expects, and that can result in failures or incorrect results.
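The encoding mismatch can be made concrete with a toy comparison. Only the 
`new DataByteArray(inp.toString().getBytes())` fallback comes from the issue 
text above; the binary layout below is a hypothetical example of what a 
non-PigStorage loader might use:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Shows why a text fallback encoding can disagree with a loader's
// own bytearray layout for the same logical value.
public class BytearrayMismatch {
    // what the 'inp.toString().getBytes()' fallback produces
    static byte[] textFallback(Object value) {
        return value.toString().getBytes();
    }

    // what a hypothetical binary load function expects for an int:
    // a 4-byte big-endian encoding
    static byte[] binaryLayout(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] text = textFallback(5);    // one byte: the character '5'
        byte[] binary = binaryLayout(5);  // four bytes: 0x00 0x00 0x00 0x05
        // the loader's caster cannot decode the text bytes correctly
        System.out.println(Arrays.equals(text, binary)); // prints false
    }
}
```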




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-09-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Attachment: PIG-1178-11.patch

PIG-1178-11.patch changes the layout of explain output, error codes, comments, 
etc. No real functional changes.

test-patch result:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 11 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-10.patch, PIG-1178-11.patch, PIG-1178-4.patch, 
 PIG-1178-5.patch, PIG-1178-6.patch, PIG-1178-7.patch, PIG-1178-8.patch, 
 PIG-1178-9.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-09-07 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907061#action_12907061
 ] 

Daniel Dai commented on PIG-1178:
-

PIG-1178-11.patch committed to both trunk and 0.8 branch. 

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-10.patch, PIG-1178-11.patch, PIG-1178-4.patch, 
 PIG-1178-5.patch, PIG-1178-6.patch, PIG-1178-7.patch, PIG-1178-8.patch, 
 PIG-1178-9.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-09-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-10.patch, PIG-1178-11.patch, PIG-1178-4.patch, 
 PIG-1178-5.patch, PIG-1178-6.patch, PIG-1178-7.patch, PIG-1178-8.patch, 
 PIG-1178-9.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-09-06 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906592#action_12906592
 ] 

Daniel Dai commented on PIG-1178:
-

Patch PIG-1178-10.patch committed.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-10.patch, PIG-1178-4.patch, PIG-1178-5.patch, 
 PIG-1178-6.patch, PIG-1178-7.patch, PIG-1178-8.patch, PIG-1178-9.patch, 
 pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, 
 pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1594) NullPointerException in new logical planner

2010-09-06 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1594.
-

Resolution: Fixed

This issue is fixed by PIG-1178-10.patch.

 NullPointerException in new logical planner
 ---

 Key: PIG-1594
 URL: https://issues.apache.org/jira/browse/PIG-1594
 Project: Pig
  Issue Type: Bug
Reporter: Andrew Hitchcock
Assignee: Daniel Dai
 Fix For: 0.8.0


 I've been testing the trunk version of Pig on Elastic MapReduce against our 
 log processing sample application(1). When I try to run the query it throws a 
 NullPointerException and suggests I disable the new logical plan. Disabling 
 it works and the script succeeds. Here is the query I'm trying to run:
 {code}
 register file:/home/hadoop/lib/pig/piggybank.jar
   DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
   RAW_LOGS = LOAD '$INPUT' USING TextLoader as (line:chararray);
   LOGS_BASE= foreach RAW_LOGS generate FLATTEN(EXTRACT(line, '^(\\S+) (\\S+) 
 (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] (.+?) (\\S+) (\\S+) ([^]*) 
 ([^]*)')) as (remoteAddr:chararray, remoteLogname:chararray, 
 user:chararray, time:chararray, request:chararray, status:int, 
 bytes_string:chararray, referrer:chararray, browser:chararray);
   REFERRER_ONLY = FOREACH LOGS_BASE GENERATE referrer;
   FILTERED = FILTER REFERRER_ONLY BY referrer matches '.*bing.*' OR referrer 
 matches '.*google.*';
   SEARCH_TERMS = FOREACH FILTERED GENERATE FLATTEN(EXTRACT(referrer, 
 '.*[\\?]q=([^]+).*')) as terms:chararray;
   SEARCH_TERMS_FILTERED = FILTER SEARCH_TERMS BY NOT $0 IS NULL;
   SEARCH_TERMS_COUNT = FOREACH (GROUP SEARCH_TERMS_FILTERED BY $0) GENERATE 
 $0, COUNT($1) as num;
   SEARCH_TERMS_COUNT_SORTED = LIMIT(ORDER SEARCH_TERMS_COUNT BY num DESC) 50;
   STORE SEARCH_TERMS_COUNT_SORTED into '$OUTPUT';
 {code}
 And here is the stack trace that results:
 {code}
 ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2042: Error in 
 new logical plan. Try -Dpig.usenewlogicalplan=false.
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:285)
 at org.apache.pig.PigServer.compilePp(PigServer.java:1301)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1154)
 at org.apache.pig.PigServer.execute(PigServer.java:1148)
 at org.apache.pig.PigServer.access$100(PigServer.java:123)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:1464)
 at org.apache.pig.PigServer.executeBatchEx(PigServer.java:350)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:111)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
 at org.apache.pig.Main.run(Main.java:491)
 at org.apache.pig.Main.main(Main.java:107)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.NullPointerException
 at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76)
 at 
 org.apache.pig.piggybank.impl.ErrorCatchingBase.outputSchema(ErrorCatchingBase.java:76)
 at 
 org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:111)
 at 
 org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:175)
 at 
 org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143)
 at 
 org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:55)
 at 
 org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:69)
 at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
 at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:87)
 at 
 org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:149)
 at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74)
 at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:76)
 at 
 

[jira] Updated: (PIG-1575) Complete the migration of optimization rule PushUpFilter including missing test cases

2010-09-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1575:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 Complete the migration of optimization rule PushUpFilter including missing 
 test cases
 -

 Key: PIG-1575
 URL: https://issues.apache.org/jira/browse/PIG-1575
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1575-1.patch, jira-1575-2.patch, jira-1575-3.patch, 
 jira-1575-4.patch, jira-1575-5.patch


 The PushUpFilter optimization rule under the new logical plan covers only a 
 subset of the optimization scenarios handled by the same rule under the old 
 logical plan. For instance, it only considers a filter after a join, but the 
 old optimization also considers other operators such as CoGroup, Union, 
 Cross, etc. The migration of the rule should be completed.
 Also, the test cases created for testing the old PushUpFilter weren't 
 migrated to the new logical plan code base. They should also be migrated. (A 
 few have been migrated in PIG-1574.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1575) Complete the migration of optimization rule PushUpFilter including missing test cases

2010-09-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1575:


Attachment: jira-1575-5.patch

Patch looks good. Attaching the final patch. 

test patch result:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

All tests pass.

Patch committed to both trunk and 0.8 branch.

 Complete the migration of optimization rule PushUpFilter including missing 
 test cases
 -

 Key: PIG-1575
 URL: https://issues.apache.org/jira/browse/PIG-1575
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1575-1.patch, jira-1575-2.patch, jira-1575-3.patch, 
 jira-1575-4.patch, jira-1575-5.patch


 The PushUpFilter optimization rule under the new logical plan covers only a 
 subset of the optimization scenarios handled by the same rule under the old 
 logical plan. For instance, it only considers a filter after a join, but the 
 old optimization also considers other operators such as CoGroup, Union, 
 Cross, etc. The migration of the rule should be completed.
 Also, the test cases created for testing the old PushUpFilter weren't 
 migrated to the new logical plan code base. They should also be migrated. (A 
 few have been migrated in PIG-1574.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1548) Optimize scalar to consolidate the part file

2010-09-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906321#action_12906321
 ] 

Daniel Dai commented on PIG-1548:
-

The patch breaks TestFRJoin2.testConcatenateJobForScalar3. Commented out 
TestFRJoin2.testConcatenateJobForScalar3 temporarily.

 Optimize scalar to consolidate the part file
 

 Key: PIG-1548
 URL: https://issues.apache.org/jira/browse/PIG-1548
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1548.patch, PIG-1548_1.patch


 The current scalar implementation writes a scalar file onto DFS. When Pig 
 needs the scalar, it opens the DFS file directly. Each scalar output 
 contains more than one part file even though it holds only one record. This 
 puts a huge load on the namenode. We should consolidate the part files 
 before opening them. An optional further step is to put the consolidated 
 file into the distributed cache, which brings the namenode load down even 
 further.
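The consolidation step proposed above can be sketched with plain java.nio (a real implementation would go through Hadoop's FileSystem API; the class and method names here are illustrative only, not Pig's actual code):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: merge the part-* files of a scalar output into a
// single file, so readers open one file instead of one per reducer.
public class ScalarConsolidator {
    public static Path consolidate(Path scalarDir) throws IOException {
        // Collect the part files in deterministic (sorted) order.
        List<Path> parts;
        try (var listing = Files.list(scalarDir)) {
            parts = listing
                .filter(p -> p.getFileName().toString().startsWith("part-"))
                .sorted()
                .collect(Collectors.toList());
        }
        Path merged = scalarDir.resolve("scalar-consolidated");
        try (OutputStream out = Files.newOutputStream(merged,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                Files.copy(part, out); // append each part's bytes in order
            }
        }
        return merged;
    }
}
```

The sorted listing keeps the merge order stable across runs, which matters if the scalar record spans a header and data line in separate parts.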

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1595) casting relation to scalar- problem with handling of data from non PigStorage loaders

2010-09-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906322#action_12906322
 ] 

Daniel Dai commented on PIG-1595:
-

The patch breaks TestScalarAliases.testScalarErrMultipleRowsInInput. Commented out 
TestScalarAliases.testScalarErrMultipleRowsInInput temporarily.

 casting relation to scalar- problem with handling of data from non PigStorage 
 loaders
 -

 Key: PIG-1595
 URL: https://issues.apache.org/jira/browse/PIG-1595
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1595.1.patch


 If load functions that don't follow the same bytearray format as PigStorage 
 for other supported datatypes, or those that don't implement the LoadCaster 
 interface are used in 'casting relation to scalar' (PIG-1434), it can cause 
 the query to fail or create incorrect results.
 The root cause of the problem is that there is a real dependency between the 
 ReadScalars udf that returns the scalar value and the LogicalOperator that 
 acts as its input, but the logical plan does not capture this dependency. So 
 in the SchemaResetter visitor used by the optimizer, the order in which the 
 schema is reset and evaluated does not take this into consideration. If the 
 schema of the input LogicalOperator is not evaluated before the ReadScalars 
 udf, the result type of the ReadScalars udf becomes bytearray. POUserFunc 
 will convert the input to bytearray using 'new 
 DataByteArray(inp.toString().getBytes())'. But this bytearray encoding of 
 other supported types might not be the same for the load function associated 
 with the column, and that can result in problems.
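The encoding mismatch described in that report can be illustrated with a small, self-contained sketch (the class and both encodings are hypothetical, not Pig's real code): the text fallback via toString().getBytes() produces different bytes than a loader that uses a binary int encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the mismatch: when the result type degrades
// to bytearray, the fallback is a text encoding of the value, while a
// custom loader's caster may expect its own binary encoding.
public class ByteArrayMismatch {
    // Fallback path, analogous to inp.toString().getBytes().
    static byte[] textEncoding(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // An illustrative loader that stores ints as 4-byte big-endian values.
    static byte[] binaryEncoding(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static int decodeBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }
}
```

A caster expecting the binary form would misinterpret the text bytes (and vice versa), which is exactly the class of failure the issue describes.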

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1591) pig does not create a log file, if the MR job succeeds but front end fails.

2010-09-03 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905966#action_12905966
 ] 

Daniel Dai commented on PIG-1591:
-

+1. No unit test needed since it is only about an error message. Manually tested 
and it works. Will commit it shortly.

 pig does not create a log file, if the MR job succeeds but front end fails.
 ---

 Key: PIG-1591
 URL: https://issues.apache.org/jira/browse/PIG-1591
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai
 Attachments: pig_1591.patch


 When I run this script:
 A = load 'limit_empty.input_a' as (a1:int);
 B = load 'limit_empty.input_b' as (b1:int);
 C =COGROUP A by a1, B by b1;
 C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
 D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 
 0:1), COUNT(Alim), COUNT(Blim);
 dump D1;
 The MR job succeeds but the pig job fails with the following error:
 2010-08-31 13:33:09,960 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2010-08-31 13:33:09,962 [main] INFO  org.apache.pig.impl.io.InterStorage - 
 Pig Internal storage in use
 2010-08-31 13:33:09,963 [main] INFO  org.apache.pig.impl.io.InterStorage - 
 Pig Internal storage in use
 2010-08-31 13:33:09,963 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 2010-08-31 13:33:09,964 [main] INFO  org.apache.pig.impl.io.InterStorage - 
 Pig Internal storage in use
 2010-08-31 13:33:09,965 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2010-08-31 13:33:09,969 [main] INFO  
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to 
 process : 1
 2010-08-31 13:33:09,969 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
 paths to process : 1
 2010-08-31 13:33:09,973 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.HJob - 
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
 org.apache.pig.data.Tuple
 Since the MR job succeeded, Pig does not create a log file, but it should 
 still create one, giving the cause of the failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1591) pig does not create a log file, if the MR job succeeds but front end fails.

2010-09-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1591.
-

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.8.0
   Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 pig does not create a log file, if the MR job succeeds but front end fails.
 ---

 Key: PIG-1591
 URL: https://issues.apache.org/jira/browse/PIG-1591
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: pig_1591.patch


 When I run this script:
 A = load 'limit_empty.input_a' as (a1:int);
 B = load 'limit_empty.input_b' as (b1:int);
 C =COGROUP A by a1, B by b1;
 C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
 D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 
 0:1), COUNT(Alim), COUNT(Blim);
 dump D1;
 The MR job succeeds but the pig job fails with the following error:
 2010-08-31 13:33:09,960 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2010-08-31 13:33:09,962 [main] INFO  org.apache.pig.impl.io.InterStorage - 
 Pig Internal storage in use
 2010-08-31 13:33:09,963 [main] INFO  org.apache.pig.impl.io.InterStorage - 
 Pig Internal storage in use
 2010-08-31 13:33:09,963 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 2010-08-31 13:33:09,964 [main] INFO  org.apache.pig.impl.io.InterStorage - 
 Pig Internal storage in use
 2010-08-31 13:33:09,965 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2010-08-31 13:33:09,969 [main] INFO  
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to 
 process : 1
 2010-08-31 13:33:09,969 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
 paths to process : 1
 2010-08-31 13:33:09,973 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.HJob - 
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
 org.apache.pig.data.Tuple
 Since the MR job succeeded, Pig does not create a log file, but it should 
 still create one, giving the cause of the failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1601) Make scalar work for secure hadoop

2010-09-03 Thread Daniel Dai (JIRA)
Make scalar work for secure hadoop
--

 Key: PIG-1601
 URL: https://issues.apache.org/jira/browse/PIG-1601
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0
 Attachments: PIG-1601-1.patch

Error message:
open file
'hdfs://gsbl90890.blue.ygrid.yahoo.com/tmp/temp851711738/tmp727366271'; error =
java.io.IOException: Delegation Token can be issued only with kerberos or web
authentication at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:4975)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getDelegationToken(NameNode.java:432)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1301) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1297) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:396) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1295) at
org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:66) at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:313)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:448)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:441)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide.getNext(Divide.java:72)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at
org.apache.hadoop.mapred.Child$4.run(Child.java:217) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:396) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
at org.apache.hadoop.mapred.Child.main(Child.java:211) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1601) Make scalar work for secure hadoop

2010-09-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1601:


Attachment: PIG-1601-1.patch

 Make scalar work for secure hadoop
 --

 Key: PIG-1601
 URL: https://issues.apache.org/jira/browse/PIG-1601
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1601-1.patch


 Error message:
 open file
 'hdfs://gsbl90890.blue.ygrid.yahoo.com/tmp/temp851711738/tmp727366271'; error 
 =
 java.io.IOException: Delegation Token can be issued only with kerberos or web
 authentication at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:4975)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNode.getDelegationToken(NameNode.java:432)
 at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597) at
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523) at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1301) at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1297) at
 java.security.AccessController.doPrivileged(Native Method) at
 javax.security.auth.Subject.doAs(Subject.java:396) at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1295) at
 org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:66) at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:313)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:448)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:441)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide.getNext(Divide.java:72)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
 at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at
 org.apache.hadoop.mapred.Child$4.run(Child.java:217) at
 java.security.AccessController.doPrivileged(Native Method) at
 javax.security.auth.Subject.doAs(Subject.java:396) at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
 at org.apache.hadoop.mapred.Child.main(Child.java:211) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-09-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905587#action_12905587
 ] 

Daniel Dai commented on PIG-1543:
-

test-patch result:

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

All tests pass

 IsEmpty returns the wrong value after using LIMIT
 -

 Key: PIG-1543
 URL: https://issues.apache.org/jira/browse/PIG-1543
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Hu
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1543-1.patch


 1. Two input files:
 1a: limit_empty.input_a
 1
 1
 1
 1b: limit_empty.input_b
 2
 2
 2.
 The pig script: limit_empty.pig
 -- A contains only 1's and B contains only 2's
 A = load 'limit_empty.input_a' as (a1:int);
 B = load 'limit_empty.input_a' as (b1:int);
 C =COGROUP A by a1, B by b1;
 D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), 
 COUNT(B);
 store D into 'limit_empty.output/d';
 -- After the script done, we see the right results:
 -- {(1),(1),(1)}   {}  1   0   3   0
 -- {} {(2),(2)}  0   1   0   2
 C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
 D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 
 0:1), COUNT(Alim), COUNT(Blim);
 store D1 into 'limit_empty.output/d1';
 -- After the script done, we see the unexpected results:
 -- {(1)}   {}1   1   1   0
 -- {}  {(2)} 1   1   0   1
 dump D;
 dump D1;
 3. Run the script and redirect the stdout (2 dumps) to a file. There are two issues:
 The major one:
 IsEmpty() returns FALSE for empty bag in limit_empty.output/d1/*, while 
 IsEmpty() returns correctly in limit_empty.output/d/*.
 The difference is that one has been applied with LIMIT before using 
 IsEmpty().
 The minor one:
 The redirected output only contains the first dump:
 ({(1),(1),(1)},{},1,0,3L,0L)
 ({},{(2),(2)},0,1,0L,2L)
 We expect two more lines like:
 ({(1)},{},1,1,1L,0L)
 ({},{(2)},1,1,0L,1L)
 Besides, there is an error that says:
 [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - 
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
 org.apache.pig.data.Tuple

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1587) Cloning utility functions for new logical plan

2010-09-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1587:


Description: 
We sometimes need to copy a logical operator/plan when writing an optimization 
rule. Currently, copying an operator/plan is awkward. We need to write some 
utilities to facilitate this process. Swati contributed PIG-1510, but we feel 
it still cannot address most use cases. I propose to add some more utilities 
to the new logical plan:

all LogicalExpressions:
{code}
copy(LogicalExpressionPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical expression operator (except for fieldSchema, 
uidOnlySchema, ProjectExpression.attachedRelationalOp)
* Set the plan to newPlan
* If keepUid is true, further copy uidOnlyFieldSchema

all LogicalRelationalOperators:
{code}
copy(LogicalPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical relational operator (except for schema and 
uid-related fields)
* Set the plan to newPlan
* If the operator has an inner plan/expression plan, copy the whole inner plan 
with the same keepUid flag (in particular, LOInnerLoad will copy its inner 
project with the same keepUid flag)
* If keepUid is true, further copy uid related fields (LOUnion.uidMapping, 
LOCogroup.groupKeyUidOnlySchema, LOCogroup.generatedInputUids)

LogicalExpressionPlan.java
{code}
LogicalExpressionPlan copy(LogicalRelationalOperator attachedRelationalOp, 
boolean keepUid);
LogicalExpressionPlan copyAbove(LogicalExpression leave, 
LogicalRelationalOperator attachedRelationalOp, boolean keepUid);
LogicalExpressionPlan copyBelow(LogicalExpression root, 
LogicalRelationalOperator attachedRelationalOp, boolean keepUid);
{code}
* Create a new logical expression plan and copy the expression operators 
along with their connections, using the same keepUid flag
* Set all ProjectExpression.attachedRelationalOp to attachedRelationalOp 
parameter

{code}
Pair<List<Operator>, List<Operator>> merge(LogicalExpressionPlan plan, 
LogicalRelationalOperator attachedRelationalOp);
{code}
* Merge plan into the current logical expression plan as an independent tree
* attachedRelationalOp is the destination operator the new logical expression 
plan is attached to
* return the sources/sinks of this independent tree


LogicalPlan.java
{code}
LogicalPlan copy(LOForEach foreach, boolean keepUid);
LogicalPlan copyAbove(LogicalRelationalOperator leave, LOForEach foreach, 
boolean keepUid);
LogicalPlan copyBelow(LogicalRelationalOperator root, LOForEach foreach, 
boolean keepUid);
{code}
* Main use case to copy inner plan of ForEach
* Create a new logical plan and copy the relational operators along with their 
connections
* Copy all expression plans inside relational operator, set plan and 
attachedRelationalOp properly
* If the plan is a ForEach inner plan, the foreach param is the destination 
ForEach operator; otherwise, pass null

{code}
Pair<List<Operator>, List<Operator>> merge(LogicalPlan plan, LOForEach foreach);
{code}
* Merge plan into the current logical plan as an independent tree
* foreach is the destination LOForEach if the destination plan is a ForEach 
inner plan; otherwise, pass null
* return the sources/sinks of this independent tree
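The proposed copy(newPlan, keepUid) contract above can be sketched with a deliberately simplified, hypothetical operator class (class and field names are illustrative, not Pig's real ones): shallow-copy the operator's own state, re-point it at the new plan, and carry uid-related state across only when keepUid is true.

```java
import java.util.Objects;

// Hypothetical, minimal stand-in for a logical expression operator,
// showing the shape of the proposed copy(newPlan, keepUid) contract.
class ExprOp {
    final Object plan;   // the owning plan (opaque here)
    final String name;   // ordinary operator state, always shallow-copied
    Long uidFieldSchema; // uid-related state, copied only when keepUid is true

    ExprOp(Object plan, String name) {
        this.plan = Objects.requireNonNull(plan);
        this.name = name;
    }

    ExprOp copy(Object newPlan, boolean keepUid) {
        ExprOp copied = new ExprOp(newPlan, name); // shallow copy onto new plan
        if (keepUid) {
            copied.uidFieldSchema = uidFieldSchema; // preserve the uid mapping
        } // otherwise leave it null so a later schema reset reassigns uids
        return copied;
    }
}
```

Dropping the uid state on keepUid=false mirrors the intent of the proposal: a copy destined for a fresh schema pass should not carry stale uid mappings.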


  was:
We sometimes need to copy a logical operator/plan when writing an optimization 
rule. Currently, copying an operator/plan is awkward. We need to write some 
utilities to facilitate this process. Swati contributed PIG-1510, but we feel 
it still cannot address most use cases. I propose to add some more utilities 
to the new logical plan:

all LogicalExpressions:
{code}
copy(LogicalExpressionPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical expression operator (except for fieldSchema, 
uidOnlySchema, ProjectExpression.attachedRelationalOp)
* Set the plan to newPlan
* If keepUid is true, further copy uidOnlyFieldSchema

all LogicalRelationalOperators:
{code}
copy(LogicalPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical relational operator (except for schema and 
uid-related fields)
* Set the plan to newPlan
* If the operator has an inner plan/expression plan, copy the whole inner plan 
with the same keepUid flag (in particular, LOInnerLoad will copy its inner 
project with the same keepUid flag)
* If keepUid is true, further copy uid related fields (LOUnion.uidMapping, 
LOCogroup.groupKeyUidOnlySchema, LOCogroup.generatedInputUids)

LogicalExpressionPlan.java
{code}
LogicalExpressionPlan copy(LogicalRelationalOperator attachedRelationalOp, 
boolean keepUid);
{code}
* Copy the expression operators along with their connections, using the same 
keepUid flag
* Set all ProjectExpression.attachedRelationalOp to attachedRelationalOp 
parameter

{code}
List<Operator> merge(LogicalExpressionPlan plan);
{code}
* Merge plan into the current logical expression plan as an independent tree
* return the sources of this independent tree


LogicalPlan.java
{code}
LogicalPlan copy(boolean keepUid);
{code}
* Main use case to copy inner plan of ForEach

[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-09-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1543:


Status: Patch Available  (was: Open)

 IsEmpty returns the wrong value after using LIMIT
 -

 Key: PIG-1543
 URL: https://issues.apache.org/jira/browse/PIG-1543
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Hu
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1543-1.patch


 1. Two input files:
 1a: limit_empty.input_a
 1
 1
 1
 1b: limit_empty.input_b
 2
 2
 2. The pig script: limit_empty.pig
 -- A contains only 1's & B contains only 2's
 A = load 'limit_empty.input_a' as (a1:int);
 B = load 'limit_empty.input_b' as (b1:int);
 C =COGROUP A by a1, B by b1;
 D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), 
 COUNT(B);
 store D into 'limit_empty.output/d';
 -- After the script done, we see the right results:
 -- {(1),(1),(1)}   {}  1   0   3   0
 -- {} {(2),(2)}  0   1   0   2
 C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
 D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 
 0:1), COUNT(Alim), COUNT(Blim);
 store D1 into 'limit_empty.output/d1';
 -- After the script done, we see the unexpected results:
 -- {(1)}   {}1   1   1   0
 -- {}  {(2)} 1   1   0   1
 dump D;
 dump D1;
 3. Run the script and redirect the stdout (2 dumps) to a file. There are two issues:
 The major one:
 IsEmpty() returns FALSE for empty bag in limit_empty.output/d1/*, while 
 IsEmpty() returns correctly in limit_empty.output/d/*.
 The difference is that one has been applied with LIMIT before using 
 IsEmpty().
 The minor one:
 The redirected output only contains the first dump:
 ({(1),(1),(1)},{},1,0,3L,0L)
 ({},{(2),(2)},0,1,0L,2L)
 We expect two more lines like:
 ({(1)},{},1,1,1L,0L)
 ({},{(2)},1,1,0L,1L)
 Besides, there is an error saying:
 [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - 
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
 org.apache.pig.data.Tuple
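The invariant this bug violates can be modeled outside Pig. The sketch below is plain Python with illustrative names (not Pig's implementation): LIMIT can only shrink a bag, so it must preserve emptiness, and IsEmpty should still report true for an empty bag after LIMIT is applied.

```python
def is_empty(bag):
    """Model of Pig's IsEmpty: a bag is empty iff it holds no tuples."""
    return len(bag) == 0

def limit(bag, n):
    """Model of LIMIT inside a nested foreach: keep at most n tuples."""
    return bag[:n]

# COGROUP A by a1, B by b1 over A = {1,1,1} and B = {2,2} yields two groups:
groups = [
    (1, [(1,), (1,), (1,)], []),   # key 1: three tuples from A, none from B
    (2, [], [(2,), (2,)]),         # key 2: none from A, two from B
]

for key, a_bag, b_bag in groups:
    a_lim, b_lim = limit(a_bag, 1), limit(b_bag, 1)
    # LIMIT must preserve emptiness: an empty bag stays empty.
    assert is_empty(a_lim) == is_empty(a_bag)
    assert is_empty(b_lim) == is_empty(b_bag)
```

The reported behavior corresponds to `is_empty` returning False for the empty limited bags, which breaks the last two assertions' real-world counterpart.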

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1572) change default datatype when relations are used as scalar to bytearray

2010-09-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905293#action_12905293
 ] 

Daniel Dai commented on PIG-1572:
-

Patch looks good. One minor doubt: when we migrate to the new logical plan, 
UserFuncExpression already has the necessary cast inserted, so it seems we do 
not need to change the new logical plan's UserFuncExpression.getFieldSchema(), 
am I right?

 change default datatype when relations are used as scalar to bytearray
 --

 Key: PIG-1572
 URL: https://issues.apache.org/jira/browse/PIG-1572
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1572.1.patch, PIG-1572.2.patch


 When relations are cast to scalar, the current default type is chararray. 
 This is inconsistent with the behavior in rest of pig-latin.




[jira] Updated: (PIG-1583) piggybank unit test TestLookupInFiles is broken

2010-09-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1583:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 piggybank unit test TestLookupInFiles is broken
 ---

 Key: PIG-1583
 URL: https://issues.apache.org/jira/browse/PIG-1583
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1583-1.patch


 Error message:
 10/08/31 09:32:12 INFO mapred.TaskInProgress: Error from 
 attempt_20100831093139211_0001_m_00_3: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught 
 error from UDF: org.apache.pig.piggybank.evaluation.string.LookupInFiles 
 [LookupInFiles : Cannot open file one]
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:262)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:283)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:355)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.io.IOException: LookupInFiles : Cannot open file one
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:92)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:115)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:49)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 ... 10 more
 Caused by: java.io.IOException: hdfs://localhost:47453/user/hadoopqa/one 
 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:224)
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:172)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:89)
 ... 13 more




[jira] Created: (PIG-1583) piggybank unit test TestLookupInFiles is broken

2010-08-31 Thread Daniel Dai (JIRA)
piggybank unit test TestLookupInFiles is broken
---

 Key: PIG-1583
 URL: https://issues.apache.org/jira/browse/PIG-1583
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0
 Attachments: PIG-1583-1.patch

Error message:
10/08/31 09:32:12 INFO mapred.TaskInProgress: Error from 
attempt_20100831093139211_0001_m_00_3: 
org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught 
error from UDF: org.apache.pig.piggybank.evaluation.string.LookupInFiles 
[LookupInFiles : Cannot open file one]
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:262)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:283)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:355)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: LookupInFiles : Cannot open file one
at 
org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:92)
at 
org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:115)
at 
org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:49)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
... 10 more
Caused by: java.io.IOException: hdfs://localhost:47453/user/hadoopqa/one 
does not exist
at 
org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:224)
at 
org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:172)
at 
org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:89)
... 13 more





[jira] Updated: (PIG-1583) piggybank unit test TestLookupInFiles is broken

2010-08-31 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1583:


Attachment: (was: PIG-1583-1.patch)

 piggybank unit test TestLookupInFiles is broken
 ---

 Key: PIG-1583
 URL: https://issues.apache.org/jira/browse/PIG-1583
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1583-1.patch


 Error message:
 10/08/31 09:32:12 INFO mapred.TaskInProgress: Error from 
 attempt_20100831093139211_0001_m_00_3: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught 
 error from UDF: org.apache.pig.piggybank.evaluation.string.LookupInFiles 
 [LookupInFiles : Cannot open file one]
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:262)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:283)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:355)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.io.IOException: LookupInFiles : Cannot open file one
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:92)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:115)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:49)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 ... 10 more
 Caused by: java.io.IOException: hdfs://localhost:47453/user/hadoopqa/one 
 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:224)
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:172)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:89)
 ... 13 more




[jira] Updated: (PIG-1583) piggybank unit test TestLookupInFiles is broken

2010-08-31 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1583:


Attachment: PIG-1583-1.patch

 piggybank unit test TestLookupInFiles is broken
 ---

 Key: PIG-1583
 URL: https://issues.apache.org/jira/browse/PIG-1583
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1583-1.patch


 Error message:
 10/08/31 09:32:12 INFO mapred.TaskInProgress: Error from 
 attempt_20100831093139211_0001_m_00_3: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught 
 error from UDF: org.apache.pig.piggybank.evaluation.string.LookupInFiles 
 [LookupInFiles : Cannot open file one]
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:262)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:283)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:355)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.io.IOException: LookupInFiles : Cannot open file one
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:92)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:115)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:49)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 ... 10 more
 Caused by: java.io.IOException: hdfs://localhost:47453/user/hadoopqa/one 
 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:224)
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:172)
 at 
 org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:89)
 ... 13 more




[jira] Created: (PIG-1587) Cloning utility functions for new logical plan

2010-08-31 Thread Daniel Dai (JIRA)
Cloning utility functions for new logical plan
--

 Key: PIG-1587
 URL: https://issues.apache.org/jira/browse/PIG-1587
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.9.0


We sometimes need to copy a logical operator/plan when writing an optimization 
rule. Currently, copying an operator/plan is awkward. We need to write some 
utilities to facilitate this process. Swati contributed PIG-1510, but we feel it 
still cannot address most use cases. I propose to add some more utilities to the 
new logical plan:

all LogicalExpressions:
{code}
copy(LogicalExpressionPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical expression operator (except for fieldSchema, 
uidOnlySchema, ProjectExpression.attachedRelationalOp)
* Set the plan to newPlan
* If keepUid is true, further copy uidOnlyFieldSchema

all LogicalRelationalOperators:
{code}
copy(LogicalPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical relational operator (except for schema, uid 
related fields)
* Set the plan to newPlan;
* If the operator has an inner plan/expression plan, copy the whole inner plan 
with the same keepUid flag (in particular, LOInnerLoad will copy its inner 
project with the same keepUid flag)
* If keepUid is true, further copy uid related fields (LOUnion.uidMapping, 
LOCogroup.groupKeyUidOnlySchema, LOCogroup.generatedInputUids)

LogicalExpressionPlan.java
{code}
LogicalExpressionPlan copy(LogicalRelationalOperator attachedRelationalOp, 
boolean keepUid);
{code}
* Copy expression operators along with connections, using the same keepUid flag
* Set all ProjectExpression.attachedRelationalOp to the attachedRelationalOp 
parameter

{code}
List<Operator> merge(LogicalExpressionPlan plan);
{code}
* Merge plan into the current logical expression plan as an independent tree
* return the sources of this independent tree


LogicalPlan.java
{code}
LogicalPlan copy(boolean keepUid);
{code}
* Main use case to copy inner plan of ForEach
* Copy all relational operators along with connections
* Copy all expression plans inside relational operators, setting plan and 
attachedRelationalOp properly

{code}
List<Operator> merge(LogicalPlan plan);
{code}
* Merge plan into the current logical plan as an independent tree
* return the sources of this independent tree
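The copy contract proposed above (shallow-copy each operator into a new plan, replay the connections, and optionally carry the uid fields) can be sketched generically. The snippet below is a plain-Python toy model with hypothetical names, not Pig's LogicalPlan code:

```python
class Operator:
    """Toy stand-in for a logical relational operator."""
    def __init__(self, name, uid=None):
        self.name, self.uid, self.plan = name, uid, None

class Plan:
    """Minimal DAG of operators with explicit edges."""
    def __init__(self):
        self.ops, self.edges = [], []   # edges: (src_op, dst_op) pairs

    def add(self, op):
        op.plan = self                  # the copy must point at its new plan
        self.ops.append(op)
        return op

    def connect(self, src, dst):
        self.edges.append((src, dst))

    def copy(self, keep_uid=False):
        """Shallow-copy every operator into a new plan and replay the
        connections, mirroring the proposed LogicalPlan.copy(keepUid):
        uid-related state is dropped unless keep_uid is set."""
        new_plan, mapping = Plan(), {}
        for op in self.ops:
            clone = Operator(op.name, op.uid if keep_uid else None)
            mapping[op] = new_plan.add(clone)
        for src, dst in self.edges:
            new_plan.connect(mapping[src], mapping[dst])
        return new_plan

# Usage: copy a two-operator chain and check the contract.
p = Plan()
load = p.add(Operator("LOLoad", uid=1))
filt = p.add(Operator("LOFilter", uid=2))
p.connect(load, filt)

q = p.copy()
assert len(q.ops) == 2 and len(q.edges) == 1
assert q.ops[0] is not load and q.ops[0].plan is q   # new objects, new plan
assert q.ops[0].uid is None                          # uids dropped by default
assert p.copy(keep_uid=True).ops[0].uid == 1         # kept with keepUid
```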





[jira] Updated: (PIG-1574) Optimization rule PushUpFilter causes filter to be pushed up out joins

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1574:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

test-patch result:
jira-1574-1.patch

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

This patch does not push the filter before the join if the join is an outer 
join. Actually, we can push the filter to the outer side of the join; I assume 
that will be addressed in PIG-1575.

Patch jira-1574-1.patch committed. Thanks Xuefu!

 Optimization rule PushUpFilter causes filter to be pushed up out joins
 --

 Key: PIG-1574
 URL: https://issues.apache.org/jira/browse/PIG-1574
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1574-1.patch


 The PushUpFilter optimization rule in the new logical plan moves the filter 
 up to one of the join branches. It does this aggressively by finding an 
 operator that has all the projection UIDs. However, it did not consider that 
 the found operator might be another join. If that join is outer, then we 
 cannot simply move the filter to one of its branches.
 As an example, the following script will be erroneously optimized:
 A = load 'myfile' as (d1:int);
 B = load 'anotherfile' as (d2:int);
 C = join A by d1 full outer, B by d2;
 D = load 'xxx' as (d3:int);
 E = join C by d1, D by d3;
 F = filter E by d1 > 5;
 G = store F into 'dummy';
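Why the push is unsafe can be seen with a toy relational model (plain Python, illustrative only, not Pig code). Filtering after a full outer join discards null-padded rows; if the same filter is pushed into the A branch before the outer join, those null-padded rows survive and the result changes:

```python
def full_outer_join(left, right):
    """Toy FULL OUTER JOIN of A(d1) with B(d2) on d1 == d2; the missing
    side is padded with None, matching Pig's outer-join semantics."""
    rows = []
    for l in left:
        matches = [r for r in right if r["d2"] == l["d1"]]
        rows += [{**l, **r} for r in matches] or [{**l, "d2": None}]
    lkeys = {l["d1"] for l in left}
    rows += [{"d1": None, **r} for r in right if r["d2"] not in lkeys]
    return rows

A = [{"d1": 10}]
B = [{"d2": 3}]

# Correct plan: filter d1 > 5 applied above the outer join.
joined = full_outer_join(A, B)
correct = [r for r in joined if r["d1"] is not None and r["d1"] > 5]

# Erroneous optimization: the same filter pushed into the A branch.
pushed = full_outer_join([a for a in A if a["d1"] > 5], B)

assert correct == [{"d1": 10, "d2": None}]   # one surviving row
assert len(pushed) == 2                      # null-padded B row wrongly survives
```

The two plans disagree on the null-padded row from B, which is exactly the case the rule must guard against.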




[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1568:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

test-patch result:

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Patch committed. Thanks Xuefu!

 Optimization rule FilterAboveForeach is too restrictive and doesn't handle 
 project * correctly
 --

 Key: PIG-1568
 URL: https://issues.apache.org/jira/browse/PIG-1568
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1568-1.patch, jira-1568-1.patch


 The FilterAboveForeach rule optimizes the plan by pushing a filter above the 
 preceding foreach operator. However, during code review, two major problems 
 were found:
 1. Current implementation assumes that if no projection is found in the 
 filter condition then all columns from foreach are projected. This issue 
 prevents the following optimization:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY 8 > 5;
   STORE C INTO 'empty';
 2. Current implementation doesn't handle the * projection, which means project 
 all columns. As a result, it wasn't able to optimize the following:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY Identity.class.getName(*) > 5;
   STORE C INTO 'empty';
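The safe case the rule should recognize can be sketched with a toy model (plain Python, illustrative names only): when the filter's predicate references no columns at all, or only columns the foreach preserves, filtering before or after the projection yields the same rows, so the push is legal.

```python
def foreach_project(rows, cols):
    """Toy FOREACH ... GENERATE: keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]

def filt(rows, pred):
    """Toy FILTER ... BY: keep rows satisfying the predicate."""
    return [r for r in rows if pred(r)]

rows = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
const = lambda r: 8 > 5          # constant condition: projects no columns

# Filter after foreach vs. filter pushed above the foreach: same result.
after = filt(foreach_project(rows, ["a", "b"]), const)
pushed = foreach_project(filt(rows, const), ["a", "b"])
assert after == pushed == [{"a": 1, "b": 2}, {"a": 4, "b": 5}]
```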




[jira] Created: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-08-30 Thread Daniel Dai (JIRA)
Intermittent unit test failure for 
TestScriptUDF.testPythonScriptUDFNullInputOutput
---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0







[jira] Updated: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1579:


Attachment: PIG-1579-1.patch

Attaching a fix. However, this fix is shallow and may need an in-depth look. 
Committing the temporary fix and leaving the Jira open.

 Intermittent unit test failure for 
 TestScriptUDF.testPythonScriptUDFNullInputOutput
 ---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1579-1.patch







[jira] Updated: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1579:


Description: 
Error message:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error executing 
function: Traceback (most recent call last):
  File "<iostream>", line 5, in multStr
TypeError: can't multiply sequence by non-int of type 'NoneType'

at 
org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
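The failure is reproducible in plain Python: multiplying a string by None raises exactly this TypeError. A UDF that may receive null input from Pig therefore needs an explicit guard; the null-safe variant below is a hypothetical sketch, not the committed fix.

```python
def mult_str(s, n):
    """Model of the Jython UDF: repeats a string n times."""
    return s * n

# A null input from Pig maps to None and reproduces the reported error:
# TypeError: can't multiply sequence by non-int of type 'NoneType'
try:
    mult_str("ab", None)
    raised = False
except TypeError:
    raised = True
assert raised

def mult_str_safe(s, n):
    """Null-safe variant: propagate None instead of raising."""
    if s is None or n is None:
        return None
    return s * n

assert mult_str_safe("ab", None) is None
assert mult_str_safe("ab", 2) == "abab"
```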


 Intermittent unit test failure for 
 TestScriptUDF.testPythonScriptUDFNullInputOutput
 ---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1579-1.patch


 Error message:
 org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error 
 executing function: Traceback (most recent call last):
   File "<iostream>", line 5, in multStr
 TypeError: can't multiply sequence by non-int of type 'NoneType'
 at 
 org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)




[jira] Resolved: (PIG-365) Map side optimization for Limit (top k case)

2010-08-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-365.


Resolution: Won't Fix

 Map side optimization for Limit (top k case)
 

 Key: PIG-365
 URL: https://issues.apache.org/jira/browse/PIG-365
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor

 In map side, only collect top k records to improve performance




[jira] Commented: (PIG-365) Map side optimization for Limit (top k case)

2010-08-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903996#action_12903996
 ] 

Daniel Dai commented on PIG-365:


Hi, Gianmarco,
Yes, you are right. This is quite an old Jira and it is no longer applicable, so 
I will close it. A more recent limit optimization we are still looking at is 
[PIG-1270|https://issues.apache.org/jira/browse/PIG-1270]. 

 Map side optimization for Limit (top k case)
 

 Key: PIG-365
 URL: https://issues.apache.org/jira/browse/PIG-365
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor

 In map side, only collect top k records to improve performance




[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903803#action_12903803
 ] 

Daniel Dai commented on PIG-1178:
-

test-patch result for PIG-1178-8:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Patch committed.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, PIG-1178-6.patch, 
 PIG-1178-7.patch, PIG-1178-8.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-08-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903263#action_12903263
 ] 

Daniel Dai commented on PIG-506:


Patch looks good. One minor comment: PlanHelper.LoadStoreFinder may be better 
named PlanHelper.LoadStoreNativeFinder.

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, 
 NativeMapReduceFinale2.patch, NativeMapReduceFinale3.patch, PIG-506.2.patch, 
 PIG-506.3.patch, PIG-506.patch, TestWordCount.jar


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce).  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from Java, using the PigServer interface to run Pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not worry about coordination between the jobs; Pig will 
 take care of it.  Also, the user can make use of existing Java applications 
 without being a Java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1512:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

This is already fixed in the latest code. Thanks Swati!

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1321) Logical Optimizer: Merge cascading foreach

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1321:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed. Thanks Xuefu!

 Logical Optimizer: Merge cascading foreach
 --

 Key: PIG-1321
 URL: https://issues.apache.org/jira/browse/PIG-1321
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1321-2.patch, jira-1321-3.patch, pig-1321.patch


 We can merge consecutive foreach statements.
 Eg:
 b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
 c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
 =&gt; c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1321) Logical Optimizer: Merge cascading foreach

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1321:


Attachment: jira-1321-3.patch

Reposting the pre-conditions:
1. Two consecutive foreach statements.
2. The second foreach statement has a simple inner plan in which the only 
statement is a GENERATE statement. In other words, the second foreach statement 
must be something like FOREACH A GENERATE 
3. The first foreach statement cannot contain flatten, due to its complexity.
4. No output of the first foreach is referenced more than once in the second 
foreach, eg: B = foreach ; C = foreach B generate $0, $1, $0 will not be 
merged. The reason is that if we merged, $0 would be calculated twice, which 
defeats the benefit of merging.

All tests pass. test-patch result:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

 Logical Optimizer: Merge cascading foreach
 --

 Key: PIG-1321
 URL: https://issues.apache.org/jira/browse/PIG-1321
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1321-2.patch, jira-1321-3.patch, pig-1321.patch


 We can merge consecutive foreach statements.
 Eg:
 b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
 c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
 =&gt; c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1515) Migrate logical optimization rule: PushDownForeachFlatten

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1515:


Attachment: jira-1515-2.patch

All tests pass. 

test-patch result:

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


 Migrate logical optimization rule: PushDownForeachFlatten
 -

 Key: PIG-1515
 URL: https://issues.apache.org/jira/browse/PIG-1515
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1515-1.patch, jira-1515-2.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1515) Migrate logical optimization rule: PushDownForeachFlatten

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1515:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed. Thanks Xuefu!

 Migrate logical optimization rule: PushDownForeachFlatten
 -

 Key: PIG-1515
 URL: https://issues.apache.org/jira/browse/PIG-1515
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1515-1.patch, jira-1515-2.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Attachment: PIG-1178-8.patch

PIG-1178-8.patch fixes TestPruneColumn.testMapKey3

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, PIG-1178-6.patch, 
 PIG-1178-7.patch, PIG-1178-8.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1514) Migrate logical optimization rule: OpLimitOptimizer

2010-08-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1514:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Did a combined test-patch with PIG-1497:

 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 80 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 443 release 
audit warnings (more than the trunk's current 433 warnings).

All new source code has a license header except for the test benchmarks 
(new-optlimitplan*.dot)

Patch committed. Thanks Xuefu!

 Migrate logical optimization rule: OpLimitOptimizer
 ---

 Key: PIG-1514
 URL: https://issues.apache.org/jira/browse/PIG-1514
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1514-0.patch, jira-1514-1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1497) Mandatory rule PartitionFilterOptimizer

2010-08-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1497:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Did a combined test-patch with PIG-1514:

[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 80 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac 
compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 443 release audit warnings 
(more than the trunk's current 433 warnings).

All new source code has the license header.

Patch committed. Thanks Xuefu!

 Mandatory rule PartitionFilterOptimizer
 ---

 Key: PIG-1497
 URL: https://issues.apache.org/jira/browse/PIG-1497
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1497-0.patch


 Need to migrate PartitionFilterOptimizer to new logical optimizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1514) Migrate logical optimization rule: OpLimitOptimizer

2010-08-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1514:


Status: Patch Available  (was: Open)

 Migrate logical optimization rule: OpLimitOptimizer
 ---

 Key: PIG-1514
 URL: https://issues.apache.org/jira/browse/PIG-1514
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1514-0.patch, jira-1514-1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



  1   2   3   4   5   6   7   8   9   10   >