[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer

2009-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711480#action_12711480
 ] 

Hadoop QA commented on PIG-697:
---

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12408636/OptimizerPhase3_parrt1.patch
  against trunk revision 776106.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/51/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/51/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/51/console

This message is automatically generated.

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 for (Rule rule : mRules) {
 if (matcher.match(rule)) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches)
 {
   if (rule.transformer.check(match)) {
   // The transformer approves.
   rule.transformer.transform(match);
   }
 }
 }
 }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 boolean sawMatch;
 int iterators = 0;
 do {
 sawMatch = false;
 for (Rule rule : mRules) {
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 if (rule.transformer.check(match)) {
 // The transformer approves.
 sawMatch = true;
 rule.transformer.transform(match);
 }
 }
 }
 // Not sure if 1000 is the right number of iterations, maybe it
 // should be configurable so that large scripts don't stop too 
 // early.
 } while (sawMatch  numIterations++  1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple
 times as necessary.  This allows us to write simple rules, mostly 

[jira] Updated: (PIG-811) Globs with ? in the pattern are broken in local mode

2009-05-21 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated PIG-811:
---

Attachment: local_engine_glob.patch

This patch should fix the problem. Globs are working again in local engine mode.


 Globs with ? in the pattern are broken in local mode
 --

 Key: PIG-811
 URL: https://issues.apache.org/jira/browse/PIG-811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: local_engine_glob.patch


 Script:
 a = load 'studenttab10?';
 dump a;
 Actual file name: studenttab10k
 Stack trace:
 ERROR 2081: Unable to setup the load function.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to 
 setup the load function.
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:128)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:163)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:763)
 at org.apache.pig.PigServer.execute(PigServer.java:756)
 at org.apache.pig.PigServer.access$100(PigServer.java:88)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:923)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:242)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:151)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:123)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:372)
 Caused by: java.io.IOException: 
 file:/home/y/share/pigtest/local/data/singlefile/studenttab10 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at 
 org.apache.pig.impl.io.FileLocalizer.openLFSFile(FileLocalizer.java:244)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:299)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:96)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:124)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-811) Globs with ? in the pattern are broken in local mode

2009-05-21 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated PIG-811:
---

Status: Patch Available  (was: Open)

 Globs with ? in the pattern are broken in local mode
 --

 Key: PIG-811
 URL: https://issues.apache.org/jira/browse/PIG-811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: local_engine_glob.patch


 Script:
 a = load 'studenttab10?';
 dump a;
 Actual file name: studenttab10k
 Stack trace:
 ERROR 2081: Unable to setup the load function.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to 
 setup the load function.
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:128)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:163)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:763)
 at org.apache.pig.PigServer.execute(PigServer.java:756)
 at org.apache.pig.PigServer.access$100(PigServer.java:88)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:923)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:242)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:151)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:123)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:372)
 Caused by: java.io.IOException: 
 file:/home/y/share/pigtest/local/data/singlefile/studenttab10 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at 
 org.apache.pig.impl.io.FileLocalizer.openLFSFile(FileLocalizer.java:244)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:299)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:96)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:124)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-811) Globs with ? in the pattern are broken in local mode

2009-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711549#action_12711549
 ] 

Hadoop QA commented on PIG-811:
---

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12408664/local_engine_glob.patch
  against trunk revision 776106.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/52/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/52/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/52/console

This message is automatically generated.

 Globs with ? in the pattern are broken in local mode
 --

 Key: PIG-811
 URL: https://issues.apache.org/jira/browse/PIG-811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: local_engine_glob.patch


 Script:
 a = load 'studenttab10?';
 dump a;
 Actual file name: studenttab10k
 Stack trace:
 ERROR 2081: Unable to setup the load function.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to 
 setup the load function.
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:128)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:163)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:763)
 at org.apache.pig.PigServer.execute(PigServer.java:756)
 at org.apache.pig.PigServer.access$100(PigServer.java:88)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:923)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:242)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:151)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:123)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:372)
 Caused by: java.io.IOException: 
 file:/home/y/share/pigtest/local/data/singlefile/studenttab10 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at 
 org.apache.pig.impl.io.FileLocalizer.openLFSFile(FileLocalizer.java:244)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:299)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:96)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:124)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-21 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711769#action_12711769
 ] 

Pradeep Kamath commented on PIG-802:


Review comments:
In MRCompiler, does POPackageLite need to be used in the following too:
{noformat}
if (limit!=-1) {
 POPackage pkg_c = new POPackage(new 
OperatorKey(scope,nig.getNextNodeId(scope)));
...
}
{noformat}

In POPackage, the following declarations :
{noformat}
IteratorNullableTuple tupIter; 

Object key; 
{noformat}
should have protected access specifier to make the intent that these are used 
in POPackageLite explicit.

In ReadOnceBag.equals() you could also check if the keyInfo maps are equal.

The getValueTuple() in ReadOnceBag had duplicate code from 
POPackage.getValueTuple(). Instead of having the same code in two places, I am 
wondering if you could just construct ReadOnceBag with a POPackageLite instance 
passed in the constructor. Then if you make the POPackageLite.getValueTuple() 
method public, you can just invoke it from ReadOnceBag code. This way the code 
remains in one place. 

 PERFORMANCE: not creating bags for ORDER BY
 ---

 Key: PIG-802
 URL: https://issues.apache.org/jira/browse/PIG-802
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: OrderByOptimization.patch


 Order by should be changed to not use POPackage to put all of the tuples in a 
 bag on the reduce side, as the bag is just immediately flattened. It can 
 instead work like join does for the last input in the join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-813) Semantics of * and count

2009-05-21 Thread George Mavromatis (JIRA)
Semantics of * and count


 Key: PIG-813
 URL: https://issues.apache.org/jira/browse/PIG-813
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: George Mavromatis
Priority: Critical
 Fix For: 0.2.0


Pig script to count the number of rows in a studenttab10k file which contains 
10k records.
{code}
studenttab = LOAD 'studenttab10k' AS (name:chararray, age:int,gpa:float);

X2 = GROUP studenttab ALL;

describe X2;

Y2 = FOREACH X2 GENERATE COUNT(*);

explain Y2;

DUMP Y2;

{code}

returns the following error

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator 
for alias Y2
Details at logfile: /homes/viraj/pig-svn/trunk/pig_1242783700970.log


If you look at the log file:

Caused by: java.lang.ClassCastException
at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:76)
at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:68)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:223)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:245)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:236)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:88)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-813) Semantics of * and count

2009-05-21 Thread George Mavromatis (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Mavromatis updated PIG-813:
--

Component/s: (was: impl)
 documentation
   Priority: Major  (was: Critical)
Description: 
Continuation of PIG-812. See PIG-812 for more details.


In order for this to be resolved in the right manner the following must added 
in the http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html

1) The semantics of * as explained by Olga.
2) An example of GROUP ALL

Otherwise people will waste their time doing the same (documentation-caused) 
mistakes again.

  was:
Pig script to count the number of rows in a studenttab10k file which contains 
10k records.
{code}
studenttab = LOAD 'studenttab10k' AS (name:chararray, age:int,gpa:float);

X2 = GROUP studenttab ALL;

describe X2;

Y2 = FOREACH X2 GENERATE COUNT(*);

explain Y2;

DUMP Y2;

{code}

returns the following error

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator 
for alias Y2
Details at logfile: /homes/viraj/pig-svn/trunk/pig_1242783700970.log


If you look at the log file:

Caused by: java.lang.ClassCastException
at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:76)
at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:68)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:223)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:245)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:236)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:88)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)



 Semantics of * and count
 

 Key: PIG-813
 URL: https://issues.apache.org/jira/browse/PIG-813
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.2.0
Reporter: George Mavromatis
 Fix For: 0.2.0


 Continuation of PIG-812. See PIG-812 for more details.
 In order for this to be resolved in the right manner the following must added 
 in the http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html
 1) The semantics of * as explained by Olga.
 2) An example of GROUP ALL
 Otherwise people will waste their time doing the same (documentation-caused) 
 mistakes again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-21 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711811#action_12711811
 ] 

Pradeep Kamath commented on PIG-802:


I think even in the future if ReadOnceBags are used in places other than order 
by, they would need to be used immediately after a POPackageLite. So tying the 
two together is not bad and would reduce code duplication. 

 PERFORMANCE: not creating bags for ORDER BY
 ---

 Key: PIG-802
 URL: https://issues.apache.org/jira/browse/PIG-802
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: OrderByOptimization.patch


 Order by should be changed to not use POPackage to put all of the tuples in a 
 bag on the reduce side, as the bag is just immediately flattened. It can 
 instead work like join does for the last input in the join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-814) Make Binstorage more robust when data contains record markers

2009-05-21 Thread Pradeep Kamath (JIRA)
Make Binstorage more robust when data contains record markers
-

 Key: PIG-814
 URL: https://issues.apache.org/jira/browse/PIG-814
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0


When the inputstream for BinStorage is at a position where the data has the 
record marker sequence, the code incorrectly assumes that it is at the 
beginning of a record (tuple) and calls DataReaderWriter.readDatum() trying to 
read the tuple. The problem is more likely when RandomSampleLoader (used in 
order by implementation) skips the input stream for sampling and calls 
Binstorage.getNext(). The code should be more robust in such cases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer

2009-05-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711861#action_12711861
 ] 

Alan Gates commented on PIG-697:


Comments on OptimizerPhase3_parrt1.patch

Why does LOSplit say it requires no fields?  If the split has filter conditions 
then it seems like it would need those fields.

Shouldn't LOStream require all fields rather than none?  It seems like users 
will have written their scripts assuming that their stream executable gets all 
of the fields coming out of the previous operator.



 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 for (Rule rule : mRules) {
 if (matcher.match(rule)) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches)
 {
   if (rule.transformer.check(match)) {
   // The transformer approves.
   rule.transformer.transform(match);
   }
 }
 }
 }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 boolean sawMatch;
 int iterators = 0;
 do {
 sawMatch = false;
 for (Rule rule : mRules) {
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 if (rule.transformer.check(match)) {
 // The transformer approves.
 sawMatch = true;
 rule.transformer.transform(match);
 }
 }
 }
 // Not sure if 1000 is the right number of iterations, maybe it
 // should be configurable so that large scripts don't stop too 
 // early.
 } while (sawMatch  numIterations++  1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple
 times as necessary.  This allows us to write simple rules, mostly swaps 
 between neighboring operators, without worrying that we get the plan right in 
 one pass.
 For example, we might have a plan that looks like:  
 Load-Join-Filter-Foreach, and we want to optimize it to 
 Load-Foreach-Filter-Join.  With two simple
 rules (swap filter and join and swap foreach and filter), applied 
 iteratively, we can get from the initial to final plan, without needing to 
 understanding the
 big picture of the entire plan.
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
 

[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer

2009-05-21 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711863#action_12711863
 ] 

Santhosh Srinivasan commented on PIG-697:
-

LOSplit is a no-op operator. LOSplitOutput is modeled after filter.

Fair comment about LOStream.  I will make this change and resubmit the patch.

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 for (Rule rule : mRules) {
 if (matcher.match(rule)) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches)
 {
   if (rule.transformer.check(match)) {
   // The transformer approves.
   rule.transformer.transform(match);
   }
 }
 }
 }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 boolean sawMatch;
 int iterators = 0;
 do {
 sawMatch = false;
 for (Rule rule : mRules) {
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 if (rule.transformer.check(match)) {
 // The transformer approves.
 sawMatch = true;
 rule.transformer.transform(match);
 }
 }
 }
 // Not sure if 1000 is the right number of iterations, maybe it
 // should be configurable so that large scripts don't stop too 
 // early.
 } while (sawMatch  numIterations++  1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple
 times as necessary.  This allows us to write simple rules, mostly swaps 
 between neighboring operators, without worrying that we get the plan right in 
 one pass.
 For example, we might have a plan that looks like:  
 Load-Join-Filter-Foreach, and we want to optimize it to 
 Load-Foreach-Filter-Join.  With two simple
 rules (swap filter and join and swap foreach and filter), applied 
 iteratively, we can get from the initial to final plan, without needing to 
 understanding the
 big picture of the entire plan.
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed 

[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-05-21 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: In Progress  (was: Patch Available)

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 for (Rule rule : mRules) {
 if (matcher.match(rule)) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches)
 {
   if (rule.transformer.check(match)) {
   // The transformer approves.
   rule.transformer.transform(match);
   }
 }
 }
 }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 boolean sawMatch;
 int iterators = 0;
 do {
 sawMatch = false;
 for (Rule rule : mRules) {
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 if (rule.transformer.check(match)) {
 // The transformer approves.
 sawMatch = true;
 rule.transformer.transform(match);
 }
 }
 }
 // Not sure if 1000 is the right number of iterations, maybe it
 // should be configurable so that large scripts don't stop too 
 // early.
 } while (sawMatch  numIterations++  1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple
 times as necessary.  This allows us to write simple rules, mostly swaps 
 between neighboring operators, without worrying that we get the plan right in 
 one pass.
 For example, we might have a plan that looks like:  
 Load-Join-Filter-Foreach, and we want to optimize it to 
 Load-Foreach-Filter-Join.  With two simple
 rules (swap filter and join and swap foreach and filter), applied 
 iteratively, we can get from the initial to final plan, without needing to 
 understanding the
 big picture of the entire plan.
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed to.
  * @param first operator, assumed to have multiple inputs.
  * @param second operator, will be pushed in front of first
  * @param inputNum, 

[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-05-21 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Attachment: OptimizerPhase3_parrt1-1.patch

Attaching patch incorporating the review comments.

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 for (Rule rule : mRules) {
 if (matcher.match(rule)) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches)
 {
   if (rule.transformer.check(match)) {
   // The transformer approves.
   rule.transformer.transform(match);
   }
 }
 }
 }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 boolean sawMatch;
 int iterators = 0;
 do {
 sawMatch = false;
 for (Rule rule : mRules) {
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 if (rule.transformer.check(match)) {
 // The transformer approves.
 sawMatch = true;
 rule.transformer.transform(match);
 }
 }
 }
 // Not sure if 1000 is the right number of iterations, maybe it
 // should be configurable so that large scripts don't stop too 
 // early.
 } while (sawMatch  numIterations++  1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple
 times as necessary.  This allows us to write simple rules, mostly swaps 
 between neighboring operators, without worrying that we get the plan right in 
 one pass.
 For example, we might have a plan that looks like:  
 Load-Join-Filter-Foreach, and we want to optimize it to 
 Load-Foreach-Filter-Join.  With two simple
 rules (swap filter and join and swap foreach and filter), applied 
 iteratively, we can get from the initial to final plan, without needing to 
 understanding the
 big picture of the entire plan.
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed to.
  * @param first operator, assumed to have multiple inputs.
  * 

[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-05-21 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: Patch Available  (was: In Progress)

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 for (Rule rule : mRules) {
 if (matcher.match(rule)) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches)
 {
   if (rule.transformer.check(match)) {
   // The transformer approves.
   rule.transformer.transform(match);
   }
 }
 }
 }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 boolean sawMatch;
 int iterators = 0;
 do {
 sawMatch = false;
 for (Rule rule : mRules) {
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 if (rule.transformer.check(match)) {
 // The transformer approves.
 sawMatch = true;
 rule.transformer.transform(match);
 }
 }
 }
 // Not sure if 1000 is the right number of iterations, maybe it
 // should be configurable so that large scripts don't stop too 
 // early.
 } while (sawMatch  numIterations++  1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple
 times as necessary.  This allows us to write simple rules, mostly swaps 
 between neighboring operators, without worrying that we get the plan right in 
 one pass.
 For example, we might have a plan that looks like:  
 Load-Join-Filter-Foreach, and we want to optimize it to 
 Load-Foreach-Filter-Join.  With two simple
 rules (swap filter and join and swap foreach and filter), applied 
 iteratively, we can get from the initial to final plan, without needing to 
 understanding the
 big picture of the entire plan.
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed to.
  * @param first operator, assumed to have multiple inputs.
  * @param second operator, will be pushed in front of 

[jira] Commented: (PIG-811) Globs with ? in the pattern are broken in local mode

2009-05-21 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711878#action_12711878
 ] 

Olga Natkovich commented on PIG-811:


+1, patch looks good. will be committing shortly

 Globs with ? in the pattern are broken in local mode
 --

 Key: PIG-811
 URL: https://issues.apache.org/jira/browse/PIG-811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: local_engine_glob.patch


 Script:
 a = load 'studenttab10?';
 dump a;
 Actual file name: studenttab10k
 Stack trace:
 ERROR 2081: Unable to setup the load function.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to 
 setup the load function.
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:128)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:163)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:763)
 at org.apache.pig.PigServer.execute(PigServer.java:756)
 at org.apache.pig.PigServer.access$100(PigServer.java:88)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:923)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:242)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:151)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:123)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:372)
 Caused by: java.io.IOException: 
 file:/home/y/share/pigtest/local/data/singlefile/studenttab10 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at 
 org.apache.pig.impl.io.FileLocalizer.openLFSFile(FileLocalizer.java:244)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:299)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:96)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:124)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-811) Globs with ? in the pattern are broken in local mode

2009-05-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-811:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed; thanks, gunther!

 Globs with ? in the pattern are broken in local mode
 --

 Key: PIG-811
 URL: https://issues.apache.org/jira/browse/PIG-811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: local_engine_glob.patch


 Script:
 a = load 'studenttab10?';
 dump a;
 Actual file name: studenttab10k
 Stack trace:
 ERROR 2081: Unable to setup the load function.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to 
 setup the load function.
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:128)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:163)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:763)
 at org.apache.pig.PigServer.execute(PigServer.java:756)
 at org.apache.pig.PigServer.access$100(PigServer.java:88)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:923)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:242)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:151)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:123)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:372)
 Caused by: java.io.IOException: 
 file:/home/y/share/pigtest/local/data/singlefile/studenttab10 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at 
 org.apache.pig.impl.io.FileLocalizer.openLFSFile(FileLocalizer.java:244)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:299)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:96)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:124)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer

2009-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711912#action_12711912
 ] 

Hadoop QA commented on PIG-697:
---

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12408756/OptimizerPhase3_parrt1-1.patch
  against trunk revision 776106.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/53/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/53/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/53/console

This message is automatically generated.

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 for (Rule rule : mRules) {
 if (matcher.match(rule)) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches)
 {
   if (rule.transformer.check(match)) {
   // The transformer approves.
   rule.transformer.transform(match);
   }
 }
 }
 }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
 RuleMatcher matcher = new RuleMatcher();
 boolean sawMatch;
 int iterators = 0;
 do {
 sawMatch = false;
 for (Rule rule : mRules) {
 ListListO matches = matcher.getAllMatches();
 for (ListO match:matches) {
 // It matches the pattern.  Now check if the transformer
 // approves as well.
 if (rule.transformer.check(match)) {
 // The transformer approves.
 sawMatch = true;
 rule.transformer.transform(match);
 }
 }
 }
 // Not sure if 1000 is the right number of iterations, maybe it
 // should be configurable so that large scripts don't stop too 
 // early.
 } while (sawMatch  numIterations++  1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple
 times as necessary.  This