[jira] Commented: (PIG-811) Globs with ? in the pattern are broken in local mode

2009-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712039#action_12712039
 ] 

Hudson commented on PIG-811:


Integrated in Pig-trunk #450 (See 
[http://hudson.zones.apache.org/hudson/job/Pig-trunk/450/])
PIG-811: Globs with ? in the pattern are broken in local mode
(hagleitn via olgan)


 Globs with ? in the pattern are broken in local mode
 --

 Key: PIG-811
 URL: https://issues.apache.org/jira/browse/PIG-811
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.3.0

 Attachments: local_engine_glob.patch


 Script:
 a = load 'studenttab10?';
 dump a;
 Actual file name: studenttab10k
 Stack trace:
 ERROR 2081: Unable to setup the load function.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to 
 setup the load function.
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:128)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:163)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:763)
 at org.apache.pig.PigServer.execute(PigServer.java:756)
 at org.apache.pig.PigServer.access$100(PigServer.java:88)
 at org.apache.pig.PigServer$Graph.execute(PigServer.java:923)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:242)
 at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:151)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:123)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:372)
 Caused by: java.io.IOException: 
 file:/home/y/share/pigtest/local/data/singlefile/studenttab10 does not exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at 
 org.apache.pig.impl.io.FileLocalizer.openLFSFile(FileLocalizer.java:244)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:299)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:96)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:124)
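
 A hedged aside on the fix: the attached local_engine_glob.patch is not shown
 here, but the behavior a local-mode loader needs is to expand the glob against
 the directory listing before opening, so 'studenttab10?' resolves to
 'studenttab10k' rather than being opened as a literal path. A minimal,
 illustrative Java sketch (class and method names are invented for the example):
 {code}
 import java.io.File;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.regex.Pattern;

 public class LocalGlob {
     // Translate a simple glob into a regex: '?' matches one character,
     // '*' matches any run; everything else is literal.
     static Pattern globToRegex(String glob) {
         StringBuilder re = new StringBuilder();
         for (char c : glob.toCharArray()) {
             if (c == '?') re.append('.');
             else if (c == '*') re.append(".*");
             else re.append(Pattern.quote(String.valueOf(c)));
         }
         return Pattern.compile(re.toString());
     }

     // List the files in dir whose names match the glob.
     static List<File> expand(File dir, String glob) {
         Pattern p = globToRegex(glob);
         List<File> out = new ArrayList<File>();
         File[] children = dir.listFiles();
         if (children != null) {
             for (File f : children) {
                 if (p.matcher(f.getName()).matches()) {
                     out.add(f);
                 }
             }
         }
         return out;
     }

     public static void main(String[] args) {
         // expand(new File("/data/singlefile"), "studenttab10?") would match
         // "studenttab10k" instead of failing with "does not exist".
         System.out.println(expand(new File("."), "*"));
     }
 }
 {code}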

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer

2009-05-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712153#action_12712153
 ] 

Alan Gates commented on PIG-697:


+1 for latest rev of part 3.

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern. 
 This has the downside that if a given rule applies to two patterns (say 
 Load->Filter->Group and Load->Group) you have to write two rules.  But it has 
 the upside that the resulting rules know exactly what they are getting.  The 
 original intent of the required array was to reduce the number of rules that 
 needed to be written, but the resulting rules have to do a lot of work to 
 understand the operators they are working with.  With exact matches only, each 
 rule will know exactly which operators it is working on and can apply the 
 logic of shifting the operators around.  All four of the existing rules set 
 all entries of required to true, so removing it will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     for (Rule rule : mRules) {
         if (matcher.match(rule)) {
             // It matches the pattern.  Now check if the transformer
             // approves as well.
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     rule.transformer.transform(match);
                 }
             }
         }
     }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     boolean sawMatch;
     int numIterations = 0;
     do {
         sawMatch = false;
         for (Rule rule : mRules) {
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 // It matches the pattern.  Now check if the transformer
                 // approves as well.
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     sawMatch = true;
                     rule.transformer.transform(match);
                 }
             }
         }
         // Not sure if 1000 is the right number of iterations, maybe it
         // should be configurable so that large scripts don't stop too
         // early.
     } while (sawMatch && numIterations++ < 1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple times as necessary.  This allows us to write simple rules, mostly 
 swaps between neighboring operators, without worrying about getting the plan 
 right in one pass.
 For example, we might have a plan that looks like 
 Load->Join->Filter->Foreach, and we want to optimize it to 
 Load->Foreach->Filter->Join.  With two simple rules (swap filter and join, 
 and swap foreach and filter), applied iteratively, we can get from the 
 initial plan to the final plan without needing to understand the big picture 
 of the entire plan.
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed to.
  * @param first operator, assumed to have multiple inputs.
  * @param second operator, will be 
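
 Illustrating point 2 above: a self-contained toy fixpoint driver, with string
 rewriting standing in for plan transformations. All names here are invented
 for the example (this is not Pig's optimizer API), and the 1000-pass cap
 mirrors the one debated in the patch.
 {code}
 import java.util.ArrayList;
 import java.util.List;
 import java.util.function.UnaryOperator;

 public class FixpointDriver {
     // Apply each rewrite in order; repeat passes until a pass changes
     // nothing or the iteration cap is hit.
     public static String optimize(String plan, List<UnaryOperator<String>> rules) {
         int iterations = 0;
         boolean changed;
         do {
             changed = false;
             for (UnaryOperator<String> rule : rules) {
                 String next = rule.apply(plan);
                 if (!next.equals(plan)) {
                     plan = next;
                     changed = true;
                 }
             }
         } while (changed && ++iterations < 1000);
         return plan;
     }

     public static void main(String[] args) {
         List<UnaryOperator<String>> rules = new ArrayList<>();
         // One local swap rule: push Filter left past a neighboring Join.
         rules.add(s -> s.replace("Join->Filter", "Filter->Join"));
         // Pass 1: Load->Join->Filter->Join; pass 2: Load->Filter->Join->Join;
         // pass 3 changes nothing, so the loop exits at the fixpoint.
         System.out.println(optimize("Load->Join->Join->Filter", rules));
     }
 }
 {code}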

[jira] Commented: (PIG-814) Make Binstorage more robust when data contains record markers

2009-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712234#action_12712234
 ] 

Hadoop QA commented on PIG-814:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408830/PIG-814.patch
  against trunk revision 777334.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/console

This message is automatically generated.

 Make Binstorage more robust when data contains record markers
 -

 Key: PIG-814
 URL: https://issues.apache.org/jira/browse/PIG-814
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-814.patch


 When the input stream for BinStorage is at a position where the data contains 
 the record marker sequence, the code incorrectly assumes that it is at the 
 beginning of a record (tuple) and calls DataReaderWriter.readDatum() trying 
 to read the tuple. The problem is especially likely when RandomSampleLoader 
 (used in the order by implementation) skips through the input stream for 
 sampling and calls BinStorage.getNext(). The code should be more robust in 
 such cases.
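
 A hedged sketch of the resynchronization idea: after seeking to an arbitrary
 offset, don't trust the first marker-looking bytes; attempt a parse and, on
 failure, fall back to scanning. The marker constants and parseTuple() below
 are illustrative stand-ins, not BinStorage's actual constants or API.
 {code}
 import java.io.DataInputStream;
 import java.io.EOFException;
 import java.io.IOException;

 public class ResyncReader {
     static final int RECORD_1 = 0x01, RECORD_2 = 0x02, RECORD_3 = 0x03; // hypothetical

     // Advance until the 3-byte marker is seen, then attempt a parse; if the
     // "record" does not decode, keep scanning instead of failing outright.
     static Object nextRecord(DataInputStream in) throws IOException {
         int b1 = in.read(), b2 = in.read(), b3 = in.read();
         while (b3 != -1) {
             if (b1 == RECORD_1 && b2 == RECORD_2 && b3 == RECORD_3) {
                 try {
                     return parseTuple(in);  // may have hit marker-like data bytes
                 } catch (IOException e) {
                     // False marker inside a record's payload: resync and continue.
                 }
             }
             b1 = b2; b2 = b3; b3 = in.read();  // slide the 3-byte window
         }
         return null;  // end of split
     }

     static Object parseTuple(DataInputStream in) throws IOException {
         // Stub: the real code would call DataReaderWriter.readDatum().
         throw new EOFException("stub parser");
     }
 }
 {code}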

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-Patch-minerva.apache.org #54

2009-05-22 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/changes

Changes:

[olga] PIG-811: Globs with ? in the pattern are broken in local mode
(hagleitn via olgan)

--
[...truncated 91347 lines...]
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block 
blk_-2984143651417806180_1010 of size 6 from /127.0.0.1
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 2 
for block blk_-2984143651417806180_1010 terminating
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:39985 is added to 
blk_-2984143651417806180_1010 size 6
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:42285 is added to 
blk_-2984143651417806180_1010 size 6
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input2.txt. blk_8913105092576416601_1011
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Receiving block 
blk_8913105092576416601_1011 src: /127.0.0.1:43940 dest: /127.0.0.1:43013
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Receiving block 
blk_8913105092576416601_1011 src: /127.0.0.1:58247 dest: /127.0.0.1:42285
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Receiving block 
blk_8913105092576416601_1011 src: /127.0.0.1:47324 dest: /127.0.0.1:37818
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block 
blk_8913105092576416601_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 0 
for block blk_8913105092576416601_1011 terminating
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:37818 is added to 
blk_8913105092576416601_1011 size 6
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block 
blk_8913105092576416601_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:42285 is added to 
blk_8913105092576416601_1011 size 6
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 1 
for block blk_8913105092576416601_1011 terminating
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:43013 is added to 
blk_8913105092576416601_1011 size 6
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block 
blk_8913105092576416601_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 2 
for block blk_8913105092576416601_1011 terminating
 [exec] [junit] 09/05/22 13:31:40 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:59719
 [exec] [junit] 09/05/22 13:31:40 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:42267
 [exec] [junit] 09/05/22 13:31:40 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/05/22 13:31:40 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Deleting block 
blk_119542544782224106_1005 file dfs/data/data7/current/blk_119542544782224106
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Deleting block 
blk_1396965902325380469_1006 file dfs/data/data8/current/blk_1396965902325380469
 [exec] [junit] 09/05/22 13:31:41 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/05/22 13:31:41 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200905221330_0002/job.jar. 
blk_-812068944017254210_1012
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Receiving block 
blk_-812068944017254210_1012 src: /127.0.0.1:47325 dest: /127.0.0.1:37818
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Receiving block 
blk_-812068944017254210_1012 src: /127.0.0.1:43944 dest: /127.0.0.1:43013
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Receiving block 
blk_-812068944017254210_1012 src: /127.0.0.1:36802 dest: /127.0.0.1:39985
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Received block 
blk_-812068944017254210_1012 of size 1393103 from /127.0.0.1
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: PacketResponder 0 
for block blk_-812068944017254210_1012 terminating
 [exec] [junit] 09/05/22 13:31:41 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 

[jira] Updated: (PIG-67) FileLocalizer doesn't work on reduce side

2009-05-22 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-67:
---

Attachment: FileLocalizer.java

Get the JobConf from the PigMapReduce class so that reducers can operate on files 
as well.

 FileLocalizer doesn't work on reduce side
 -

 Key: PIG-67
 URL: https://issues.apache.org/jira/browse/PIG-67
 Project: Pig
  Issue Type: Bug
Reporter: Utkarsh Srivastava
 Attachments: FileLocalizer.java


 FileLocalizer.openDFSFile() does not work on the reduce side. This is 
 probably because FileLocalizer uses PigRecordReader which exists only on the 
 map task.
 The correct solution will be for FileLocalizer to have a hadoop conf that is 
 initialized by the reduce task on the reduce side, and the pig record reader 
 on the map side.
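
 The attached FileLocalizer.java is not reproduced in this digest; as a hedged
 sketch of the approach described in the comment above (ConfHolder and its
 methods are invented for illustration), a static holder that both the map and
 the reduce tasks populate from their configure() methods, so file-opening code
 can reach the configuration on either side:
 {code}
 import org.apache.hadoop.mapred.JobConf;

 public class ConfHolder {
     private static JobConf conf;

     // Called from both the mapper's and the reducer's configure(JobConf).
     public static void set(JobConf jc) {
         conf = jc;
     }

     public static JobConf get() {
         if (conf == null) {
             throw new IllegalStateException("JobConf not set for this task");
         }
         return conf;
     }
 }
 {code}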

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer

2009-05-22 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712260#action_12712260
 ] 

Santhosh Srinivasan commented on PIG-697:
-

Patch OptimizerPhase3_part-1.patch has been committed.

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern. 
 This has the downside that if a given rule applies to two patterns (say 
 Load->Filter->Group and Load->Group) you have to write two rules.  But it has 
 the upside that the resulting rules know exactly what they are getting.  The 
 original intent of the required array was to reduce the number of rules that 
 needed to be written, but the resulting rules have to do a lot of work to 
 understand the operators they are working with.  With exact matches only, each 
 rule will know exactly which operators it is working on and can apply the 
 logic of shifting the operators around.  All four of the existing rules set 
 all entries of required to true, so removing it will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     for (Rule rule : mRules) {
         if (matcher.match(rule)) {
             // It matches the pattern.  Now check if the transformer
             // approves as well.
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     rule.transformer.transform(match);
                 }
             }
         }
     }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     boolean sawMatch;
     int numIterations = 0;
     do {
         sawMatch = false;
         for (Rule rule : mRules) {
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 // It matches the pattern.  Now check if the transformer
                 // approves as well.
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     sawMatch = true;
                     rule.transformer.transform(match);
                 }
             }
         }
         // Not sure if 1000 is the right number of iterations, maybe it
         // should be configurable so that large scripts don't stop too
         // early.
     } while (sawMatch && numIterations++ < 1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple times as necessary.  This allows us to write simple rules, mostly 
 swaps between neighboring operators, without worrying about getting the plan 
 right in one pass.
 For example, we might have a plan that looks like 
 Load->Join->Filter->Foreach, and we want to optimize it to 
 Load->Foreach->Filter->Join.  With two simple rules (swap filter and join, 
 and swap foreach and filter), applied iteratively, we can get from the 
 initial plan to the final plan without needing to understand the big picture 
 of the entire plan.
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed to.
  * @param first operator, assumed to have multiple 
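
 A hedged sketch of what the proposed swap() might do on a plan stored as
 predecessor/successor maps. This is illustrative only, not OperatorPlan's
 actual field layout or implementation.
 {code}
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;

 public class PlanSketch<E> {
     private final Map<E, List<E>> preds = new HashMap<>();
     private final Map<E, List<E>> succs = new HashMap<>();

     public void connect(E from, E to) {
         succs.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
         preds.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
     }

     private List<E> edges(Map<E, List<E>> m, E op) {
         return m.containsKey(op) ? m.get(op) : new ArrayList<E>();
     }

     // Swap two adjacent operators; both must be single-input, single-output.
     public void swap(E first, E second) {
         List<E> fp = edges(preds, first), fs = edges(succs, first);
         List<E> sp = edges(preds, second), ss = edges(succs, second);
         if (fp.size() > 1 || fs.size() != 1 || !fs.get(0).equals(second)
                 || sp.size() != 1 || ss.size() > 1) {
             throw new IllegalStateException("need adjacent single-input/output operators");
         }
         E p = fp.isEmpty() ? null : fp.get(0);  // operator feeding the pair, if any
         E s = ss.isEmpty() ? null : ss.get(0);  // operator consuming the pair, if any
         preds.remove(first);  succs.remove(first);
         preds.remove(second); succs.remove(second);
         if (p != null) { succs.get(p).remove(first); connect(p, second); }
         connect(second, first);                 // relink as p -> second -> first -> s
         if (s != null) { preds.get(s).remove(second); connect(first, s); }
     }

     public static void main(String[] args) {
         PlanSketch<String> plan = new PlanSketch<>();
         plan.connect("Load", "Join");
         plan.connect("Join", "Filter");
         plan.connect("Filter", "Foreach");
         plan.swap("Join", "Filter");  // now Load -> Filter -> Join -> Foreach
     }
 }
 {code}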

[jira] Updated: (PIG-67) FileLocalizer doesn't work on reduce side

2009-05-22 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-67:
---

Attachment: (was: FileLocalizer.java)

 FileLocalizer doesn't work on reduce side
 -

 Key: PIG-67
 URL: https://issues.apache.org/jira/browse/PIG-67
 Project: Pig
  Issue Type: Bug
Reporter: Utkarsh Srivastava

 FileLocalizer.openDFSFile() does not work on the reduce side. This is 
 probably because FileLocalizer uses PigRecordReader which exists only on the 
 map task.
 The correct solution will be for FileLocalizer to have a hadoop conf that is 
 initialized by the reduce task on the reduce side, and the pig record reader 
 on the map side.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



UDF with parameters?

2009-05-22 Thread Brian Long
Hi,

I'm interested in developing a PERCENTILE UDF, e.g. for calculating a
median, 99th percentile, 90th percentile, etc. I'd like the UDF to be
parametric with respect to the percentile being requested, but I don't see
any way to do that, and it seems like I might need to create PERCENTILE_50,
PERCENTILE_90, etc. type UDFs explicitly, versus being able to do something
like GENERATE PERCENTILE(90, duration).

I'm new to Pig, so I might be missing the way to do this... is it possible?

Thanks,
Brian


Re: UDF with parameters?

2009-05-22 Thread Alan Gates
Yes, it is possible.  The UDF should take the percentage you want as a  
constructor argument.  It will have to be passed as a string and  
converted.  Then in your Pig Latin, you will use the DEFINE statement  
to pass the argument to the constructor.


REGISTER /src/myfunc.jar
DEFINE percentile myfunc.percentile('90');
A = LOAD 'students' as (name, gpa);
B = FOREACH A GENERATE percentile(gpa);

See http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html#DEFINE for  
more details.


Alan.

On May 22, 2009, at 3:37 PM, Brian Long wrote:

 Hi,

 I'm interested in developing a PERCENTILE UDF, e.g. for calculating a
 median, 99th percentile, 90th percentile, etc. I'd like the UDF to be
 parametric with respect to the percentile being requested, but I don't see
 any way to do that, and it seems like I might need to create PERCENTILE_50,
 PERCENTILE_90, etc. type UDFs explicitly, versus being able to do something
 like GENERATE PERCENTILE(90, duration).

 I'm new to Pig, so I might be missing the way to do this... is it possible?

 Thanks,
 Brian
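
To make that concrete, here is a rough sketch of such a UDF against Pig's
EvalFunc API. The class name, the nearest-rank percentile math, and the
assumption that the input is a bag of single-field numeric tuples are
illustrative choices, not anything prescribed in this thread; in practice
you would GROUP first and pass the resulting bag.

 {code}
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;

 import org.apache.pig.EvalFunc;
 import org.apache.pig.data.DataBag;
 import org.apache.pig.data.Tuple;

 public class Percentile extends EvalFunc<Double> {
     private final double pct;

     public Percentile(String pct) {  // DEFINE passes '90' as a string
         this.pct = Double.parseDouble(pct);
     }

     @Override
     public Double exec(Tuple input) throws IOException {
         if (input == null || input.size() == 0) return null;
         DataBag bag = (DataBag) input.get(0);
         List<Double> values = new ArrayList<Double>();
         for (Tuple t : bag) {  // DataBag iterates over its tuples
             Object v = t.get(0);
             if (v != null) values.add(((Number) v).doubleValue());
         }
         if (values.isEmpty()) return null;
         Collections.sort(values);
         // Nearest-rank percentile: index ceil(p/100 * n) - 1.
         int idx = (int) Math.ceil(pct / 100.0 * values.size()) - 1;
         return values.get(Math.max(idx, 0));
     }
 }
 {code}

With the DEFINE from the example above, percentile(...) then refers to this
class constructed with '90'.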




[jira] Updated: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception

2009-05-22 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated PIG-656:
--

Attachment: reserved.patch

This patch allows the use of reserved words in function names. To avoid parsing 
ambiguity, the first part of the fully qualified function name (i.e., the text 
before the first '.') cannot be a reserved word, but the remaining parts of the 
fully qualified name can be any identifier, including a reserved word.

So, for example, with this patch, the statement:

{code}
define X com.yahoo.load();
{code}

or

{code}
modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
{code}

now compiles and runs perfectly well.

 Use of eval or any other keyword in the package hierarchy of a UDF causes 
 parse exception
 -

 Key: PIG-656
 URL: https://issues.apache.org/jira/browse/PIG-656
 Project: Pig
  Issue Type: Bug
  Components: documentation, grunt
Affects Versions: 0.2.1
Reporter: Viraj Bhat
Assignee: Milind Bhandarkar
 Fix For: 0.3.0

 Attachments: mywordcount.txt, reserved.patch, TOKENIZE.jar


 Consider a Pig script which does something similar to a word count. It uses 
 the built-in TOKENIZE function, but packages it inside a class hierarchy such 
 as mypackage.eval
 {code}
 register TOKENIZE.jar
 my_src  = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t')  AS 
 (mlist: chararray);
 modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
 describe modules;
 grouped = GROUP modules BY $0;
 describe grouped;
 counts  = FOREACH grouped GENERATE COUNT(modules), group;
 ordered = ORDER counts BY $0;
 dump ordered;
 {code}
 The parser complains:
 ===
 2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
 ===
 I looked at the source code 
 (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt), and it seems 
 that EVAL is a keyword in Pig. Here are some clarifications:
 1) Is there documentation on what the EVAL keyword actually is?
 2) Is the EVAL keyword actually implemented?
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-22 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-802:
-

Attachment: (was: OrderByOptimization.patch)

 PERFORMANCE: not creating bags for ORDER BY
 ---

 Key: PIG-802
 URL: https://issues.apache.org/jira/browse/PIG-802
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: OrderByOptimization.patch


 Order by should be changed to not use POPackage to put all of the tuples in a 
 bag on the reduce side, as the bag is just immediately flattened. It can 
 instead work like join does for the last input in the join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-22 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-802:
-

Attachment: OrderByOptimization.patch

Attaching the modified patch. The detachInput method in POPackageLite will set 
key and tupIter to null. So ReadOnceBag maintains separate references to them. 
POPackageLite overloads the getValueTuple method with the additional key 
parameter to use the one provided by ReadOnceBag. The implementation of 
POPackage is untouched.
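
A hedged sketch of the read-once idea described above: wrap the reducer's
value iterator in a bag-like view so ORDER BY can stream tuples instead of
materializing them. The field names mirror the comment (key, tupIter); the
real ReadOnceBag/POPackageLite classes surely differ in detail.
{code}
import java.util.Iterator;

public class ReadOnceBagSketch<T> implements Iterable<T> {
    private final Object key;           // held separately, since detachInput nulls it upstream
    private final Iterator<T> tupIter;  // the underlying reduce-side iterator
    private boolean consumed = false;

    public ReadOnceBagSketch(Object key, Iterator<T> tupIter) {
        this.key = key;
        this.tupIter = tupIter;
    }

    public Object getKey() {
        return key;
    }

    @Override
    public Iterator<T> iterator() {
        if (consumed) {
            throw new IllegalStateException("a read-once bag can be traversed only once");
        }
        consumed = true;
        return tupIter;
    }
}
{code}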

 PERFORMANCE: not creating bags for ORDER BY
 ---

 Key: PIG-802
 URL: https://issues.apache.org/jira/browse/PIG-802
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: OrderByOptimization.patch


 Order by should be changed to not use POPackage to put all of the tuples in a 
 bag on the reduce side, as the bag is just immediately flattened. It can 
 instead work like join does for the last input in the join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-Patch-minerva.apache.org #55

2009-05-22 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/changes

Changes:

[sms] PIG-697: Proposed improvements to pig's optimizer

[pradeepkth] PIG-814:Make Binstorage more robust when data contains record 
markers (pradeepkth)

--
[...truncated 91184 lines...]
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Deleting block 
blk_-1090490746339115162_1005 file 
dfs/data/data1/current/blk_-1090490746339115162
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input2.txt. blk_11151567103307144_1011
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Receiving block 
blk_11151567103307144_1011 src: /127.0.0.1:38664 dest: /127.0.0.1:49311
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Receiving block 
blk_11151567103307144_1011 src: /127.0.0.1:34216 dest: /127.0.0.1:51371
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Receiving block 
blk_11151567103307144_1011 src: /127.0.0.1:34352 dest: /127.0.0.1:59469
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Received block 
blk_11151567103307144_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: PacketResponder 0 
for block blk_11151567103307144_1011 terminating
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59469 is added to 
blk_11151567103307144_1011 size 6
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Received block 
blk_11151567103307144_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: PacketResponder 1 
for block blk_11151567103307144_1011 terminating
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Received block 
blk_11151567103307144_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: PacketResponder 2 
for block blk_11151567103307144_1011 terminating
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:51371 is added to 
blk_11151567103307144_1011 size 6
 [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:49311 is added to 
blk_11151567103307144_1011 size 6
 [exec] [junit] 09/05/22 18:59:35 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:33302
 [exec] [junit] 09/05/22 18:59:35 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:39094
 [exec] [junit] 09/05/22 18:59:35 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/05/22 18:59:35 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/05/22 18:59:36 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/05/22 18:59:36 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/05/22 18:59:36 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200905221858_0002/job.jar. 
blk_-5203208028961802060_1012
 [exec] [junit] 09/05/22 18:59:36 INFO dfs.DataNode: Receiving block 
blk_-5203208028961802060_1012 src: /127.0.0.1:34353 dest: /127.0.0.1:59469
 [exec] [junit] 09/05/22 18:59:36 INFO dfs.DataNode: Receiving block 
blk_-5203208028961802060_1012 src: /127.0.0.1:36700 dest: /127.0.0.1:48790
 [exec] [junit] 09/05/22 18:59:36 INFO dfs.DataNode: Receiving block 
blk_-5203208028961802060_1012 src: /127.0.0.1:34220 dest: /127.0.0.1:51371
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: Received block 
blk_-5203208028961802060_1012 of size 1405185 from /127.0.0.1
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: PacketResponder 0 
for block blk_-5203208028961802060_1012 terminating
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:51371 is added to 
blk_-5203208028961802060_1012 size 1405185
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: Received block 
blk_-5203208028961802060_1012 of size 1405185 from /127.0.0.1
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: PacketResponder 1 
for block blk_-5203208028961802060_1012 terminating
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48790 is added to 
blk_-5203208028961802060_1012 size 1405185
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: Received block 
blk_-5203208028961802060_1012 of size 1405185 from /127.0.0.1
 [exec] [junit] 09/05/22 18:59:37 INFO dfs.StateChange: BLOCK* 

[jira] Commented: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception

2009-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712333#action_12712333
 ] 

Hadoop QA commented on PIG-656:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408857/reserved.patch
  against trunk revision 08.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/console

This message is automatically generated.

 Use of eval or any other keyword in the package hierarchy of a UDF causes 
 parse exception
 -

 Key: PIG-656
 URL: https://issues.apache.org/jira/browse/PIG-656
 Project: Pig
  Issue Type: Bug
  Components: documentation, grunt
Affects Versions: 0.2.1
Reporter: Viraj Bhat
Assignee: Milind Bhandarkar
 Fix For: 0.3.0

 Attachments: mywordcount.txt, reserved.patch, TOKENIZE.jar


 Consider a Pig script which does something similar to a word count. It uses 
 the built-in TOKENIZE function, but packages it inside a class hierarchy such 
 as mypackage.eval
 {code}
 register TOKENIZE.jar
 my_src  = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t')  AS 
 (mlist: chararray);
 modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
 describe modules;
 grouped = GROUP modules BY $0;
 describe grouped;
 counts  = FOREACH grouped GENERATE COUNT(modules), group;
 ordered = ORDER counts BY $0;
 dump ordered;
 {code}
 The parser complains:
 ===
 2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
 ===
 I looked at the source code 
 (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt), and it seems 
 that EVAL is a keyword in Pig. Here are some clarifications:
 1) Is there documentation on what the EVAL keyword actually is?
 2) Is the EVAL keyword actually implemented?
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-816) PigStorage() does not accept Unicode characters in its constructor

2009-05-22 Thread Viraj Bhat (JIRA)
PigStorage() does not accept Unicode characters in its constructor 
--

 Key: PIG-816
 URL: https://issues.apache.org/jira/browse/PIG-816
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Priority: Critical
 Fix For: 0.3.0


A simple Pig script that uses Unicode characters in the PigStorage() constructor 
fails with the following error:

{code}
studenttab = LOAD '/user/viraj/studenttab10k' AS (name:chararray, 
age:int,gpa:float);
X2 = GROUP studenttab by age;
Y2 = FOREACH X2 GENERATE group, COUNT(studenttab);
store Y2 into '/user/viraj/y2' using PigStorage('\u0001');
{code}


ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate 
exception from backend error: org.apache.hadoop.ipc.RemoteException: 
java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: 
Character reference #1 is an invalid XML character.

Attaching log file.
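
As background on why '\u0001' blows up: Hadoop ships job configuration as XML
(job.xml), which is presumably how the PigStorage argument reaches the backend,
and XML 1.0 simply has no representation for most control characters, encoded
or not. A quick illustrative check of the XML 1.0 Char production (a sketch,
not Pig code):

{code}
public class XmlCharCheck {
    // The XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    static boolean isValidXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isValidXml10Char('\u0001')); // false -> SAXParseException
        System.out.println(isValidXml10Char('\t'));     // true  -> tab works fine
    }
}
{code}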


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-816) PigStorage() does not accept Unicode characters in its constructor

2009-05-22 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-816:
---

Attachment: pig_1243043613713.log

Log file for detailed error message

 PigStorage() does not accept Unicode characters in its constructor 
 --

 Key: PIG-816
 URL: https://issues.apache.org/jira/browse/PIG-816
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Priority: Critical
 Fix For: 0.3.0

 Attachments: pig_1243043613713.log


 A simple Pig script that uses Unicode characters in the PigStorage() 
 constructor fails with the following error:
 {code}
 studenttab = LOAD '/user/viraj/studenttab10k' AS (name:chararray, 
 age:int,gpa:float);
 X2 = GROUP studenttab by age;
 Y2 = FOREACH X2 GENERATE group, COUNT(studenttab);
 store Y2 into '/user/viraj/y2' using PigStorage('\u0001');
 {code}
 
 ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate 
 exception from backend error: org.apache.hadoop.ipc.RemoteException: 
 java.io.IOException: java.lang.RuntimeException: 
 org.xml.sax.SAXParseException: Character reference #1 is an invalid XML 
 character.
 
 Attaching log file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.