[jira] Commented: (PIG-811) Globs with ? in the pattern are broken in local mode
[ https://issues.apache.org/jira/browse/PIG-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712039#action_12712039 ] Hudson commented on PIG-811: Integrated in Pig-trunk #450 (See [http://hudson.zones.apache.org/hudson/job/Pig-trunk/450/]): Globs with ? in the pattern are broken in local mode (hagleitn via olgan)

Globs with ? in the pattern are broken in local mode -- Key: PIG-811 URL: https://issues.apache.org/jira/browse/PIG-811 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Olga Natkovich Assignee: Gunther Hagleitner Fix For: 0.3.0 Attachments: local_engine_glob.patch

Script:
{code}
a = load 'studenttab10?';
dump a;
{code}

Actual file name: studenttab10k

Stack trace:
{code}
ERROR 2081: Unable to setup the load function.
org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:128)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:163)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:763)
at org.apache.pig.PigServer.execute(PigServer.java:756)
at org.apache.pig.PigServer.access$100(PigServer.java:88)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:923)
at org.apache.pig.PigServer.executeBatch(PigServer.java:242)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:151)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:123)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
at org.apache.pig.Main.main(Main.java:372)
Caused by: java.io.IOException: file:/home/y/share/pigtest/local/data/singlefile/studenttab10 does not exist
at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
at org.apache.pig.impl.io.FileLocalizer.openLFSFile(FileLocalizer.java:244)
at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:299)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:96)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:124)
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer
[ https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712153#action_12712153 ] Alan Gates commented on PIG-697: +1 for latest rev of part 3. Proposed improvements to pig's optimizer Key: PIG-697 URL: https://issues.apache.org/jira/browse/PIG-697 Project: Pig Issue Type: Bug Components: impl Reporter: Alan Gates Assignee: Santhosh Srinivasan Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, OptimizerPhase3_parrt1.patch I propose the following changes to pig optimizer, plan, and operator functionality to support more robust optimization: 1) Remove the required array from Rule. This will change rules so that they only match exact patterns instead of allowing missing elements in the pattern. This has the downside that if a given rule applies to two patterns (say Load-Filter-Group, Load-Group) you have to write two rules. But it has the upside that the resulting rules know exactly what they are getting. The original intent of this was to reduce the number of rules that needed to be written. But the resulting rules have to do a lot of work to understand the operators they are working with. With exact matches only, each rule will know exactly the operators it is working on and can apply the logic of shifting the operators around. All four of the existing rules set all entries of required to true, so removing this will have no effect on them. 2) Change PlanOptimizer.optimize to iterate over the rules until there are no conversions or a certain number of iterations has been reached. Currently the function is:
{code}
public final void optimize() throws OptimizerException {
    RuleMatcher matcher = new RuleMatcher();
    for (Rule rule : mRules) {
        if (matcher.match(rule)) {
            // It matches the pattern.  Now check if the transformer
            // approves as well.
            List<List<O>> matches = matcher.getAllMatches();
            for (List<O> match : matches) {
                if (rule.transformer.check(match)) {
                    // The transformer approves.
                    rule.transformer.transform(match);
                }
            }
        }
    }
}
{code}
It would change to be:
{code}
public final void optimize() throws OptimizerException {
    RuleMatcher matcher = new RuleMatcher();
    boolean sawMatch;
    int numIterations = 0;
    do {
        sawMatch = false;
        for (Rule rule : mRules) {
            List<List<O>> matches = matcher.getAllMatches();
            for (List<O> match : matches) {
                // It matches the pattern.  Now check if the transformer
                // approves as well.
                if (rule.transformer.check(match)) {
                    // The transformer approves.
                    sawMatch = true;
                    rule.transformer.transform(match);
                }
            }
        }
        // Not sure if 1000 is the right number of iterations, maybe it
        // should be configurable so that large scripts don't stop too
        // early.
    } while (sawMatch && numIterations++ < 1000);
}
{code}
The reason for limiting the number of iterations is to avoid infinite loops. The reason for iterating over the rules is so that each rule can be applied multiple times as necessary. This allows us to write simple rules, mostly swaps between neighboring operators, without worrying that we get the plan right in one pass. For example, we might have a plan that looks like: Load-Join-Filter-Foreach, and we want to optimize it to Load-Foreach-Filter-Join. With two simple rules (swap filter and join and swap foreach and filter), applied iteratively, we can get from the initial to the final plan, without needing to understand the big picture of the entire plan. 3) Add three calls to OperatorPlan:
{code}
/**
 * Swap two operators in a plan.  Both of the operators must have single
 * inputs and single outputs.
 * @param first operator
 * @param second operator
 * @throws PlanException if either operator is not single input and output.
 */
public void swap(E first, E second) throws PlanException { ... }

/**
 * Push one operator in front of another.  This function is for use when
 * the first operator has multiple inputs.  The caller can specify
 * which input of the first operator the second operator should be pushed to.
 * @param first operator, assumed to have multiple inputs.
 * @param second operator, will be
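The iterate-until-fixpoint loop and the pair-swap rules sketched in the proposal above can be demonstrated with a small, self-contained program. This is an illustration only, not Pig's actual Rule/RuleMatcher code: operators are plain strings, and a third swap rule (Join, Foreach) is added here so that the Load-Join-Filter-Foreach example can actually reach Load-Foreach-Filter-Join through adjacent swaps alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the fixpoint idea: tiny "rules" that each swap one
// exact adjacent operator pair, applied repeatedly until no rule matches
// (or an iteration cap is hit). Names are illustrative, not Pig classes.
public class SwapFixpoint {
    // Each rule matches an exact adjacent pair (a, b) and rewrites it to (b, a).
    static final String[][] RULES = {
        {"Join", "Filter"},   // push Filter above Join
        {"Join", "Foreach"},  // extra rule, added so this demo reaches a fixpoint
        {"Filter", "Foreach"} // push Foreach above Filter
    };

    static List<String> optimize(List<String> plan) {
        List<String> p = new ArrayList<>(plan);
        boolean sawMatch;
        int numIterations = 0;
        do {
            sawMatch = false;
            for (String[] rule : RULES) {
                for (int i = 0; i + 1 < p.size(); i++) {
                    if (p.get(i).equals(rule[0]) && p.get(i + 1).equals(rule[1])) {
                        p.set(i, rule[1]);      // swap the adjacent pair
                        p.set(i + 1, rule[0]);
                        sawMatch = true;
                    }
                }
            }
        } while (sawMatch && numIterations++ < 1000); // cap guards against cycles
        return p;
    }

    public static void main(String[] args) {
        System.out.println(optimize(Arrays.asList("Load", "Join", "Filter", "Foreach")));
        // -> [Load, Foreach, Filter, Join]
    }
}
```

Each rule only has to know about two neighboring operators; the outer do/while, mirroring the proposed PlanOptimizer.optimize, is what lets these local swaps compose into the global rewrite.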
[jira] Commented: (PIG-814) Make Binstorage more robust when data contains record markers
[ https://issues.apache.org/jira/browse/PIG-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712234#action_12712234 ] Hadoop QA commented on PIG-814: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12408830/PIG-814.patch against trunk revision 777334. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/console This message is automatically generated. Make Binstorage more robust when data contains record markers - Key: PIG-814 URL: https://issues.apache.org/jira/browse/PIG-814 Project: Pig Issue Type: Bug Affects Versions: 0.2.1 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.3.0 Attachments: PIG-814.patch When the inputstream for BinStorage is at a position where the data has the record marker sequence, the code incorrectly assumes that it is at the beginning of a record (tuple) and calls DataReaderWriter.readDatum() trying to read the tuple. 
The problem is more likely when RandomSampleLoader (used in order by implementation) skips the input stream for sampling and calls Binstorage.getNext(). The code should be more robust in such cases -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
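The failure mode the issue describes, a record-marker byte sequence that also occurs inside record payloads, can be illustrated with a generic scanner. The marker bytes and method names below are made up for the demo; they are not BinStorage's actual constants.

```java
// Generic illustration of the misparse: a fixed byte sequence used as a
// record marker can also appear inside record data, so "seek to the next
// marker" (as a sampling loader must) may land in the middle of a record.
public class MarkerScan {
    static final byte[] MARKER = {1, 2, 3}; // hypothetical marker, not BinStorage's

    // Returns the offset of the first marker match at or after 'from', or -1.
    static int findMarker(byte[] stream, int from) {
        outer:
        for (int i = from; i + MARKER.length <= stream.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (stream[i + j] != MARKER[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // One record starting at 0, whose PAYLOAD also contains bytes 1,2,3:
        byte[] stream = {1, 2, 3, 9, 1, 2, 3, 9, 1, 2, 3, 8};
        // Scanning from offset 1 "finds" the payload bytes at offset 4,
        // which is not a real record boundary -- the false positive that
        // makes the subsequent readDatum() call fail.
        System.out.println(findMarker(stream, 1)); // -> 4
    }
}
```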
Build failed in Hudson: Pig-Patch-minerva.apache.org #54
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/54/changes Changes: [olga] PIG-811: Globs with ? in the pattern are broken in local mode (hagleitn via olgan) -- [...truncated 91347 lines...] [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block blk_-2984143651417806180_1010 of size 6 from /127.0.0.1 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 2 for block blk_-2984143651417806180_1010 terminating [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:39985 is added to blk_-2984143651417806180_1010 size 6 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:42285 is added to blk_-2984143651417806180_1010 size 6 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/hudson/input2.txt. blk_8913105092576416601_1011 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Receiving block blk_8913105092576416601_1011 src: /127.0.0.1:43940 dest: /127.0.0.1:43013 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Receiving block blk_8913105092576416601_1011 src: /127.0.0.1:58247 dest: /127.0.0.1:42285 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Receiving block blk_8913105092576416601_1011 src: /127.0.0.1:47324 dest: /127.0.0.1:37818 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block blk_8913105092576416601_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 0 for block blk_8913105092576416601_1011 terminating [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:37818 is added to blk_8913105092576416601_1011 size 6 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block blk_8913105092576416601_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: 
blockMap updated: 127.0.0.1:42285 is added to blk_8913105092576416601_1011 size 6 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 1 for block blk_8913105092576416601_1011 terminating [exec] [junit] 09/05/22 13:31:40 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:43013 is added to blk_8913105092576416601_1011 size 6 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: Received block blk_8913105092576416601_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/05/22 13:31:40 INFO dfs.DataNode: PacketResponder 2 for block blk_8913105092576416601_1011 terminating [exec] [junit] 09/05/22 13:31:40 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:59719 [exec] [junit] 09/05/22 13:31:40 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:42267 [exec] [junit] 09/05/22 13:31:40 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1 [exec] [junit] 09/05/22 13:31:40 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Deleting block blk_119542544782224106_1005 file dfs/data/data7/current/blk_119542544782224106 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Deleting block blk_1396965902325380469_1006 file dfs/data/data8/current/blk_1396965902325380469 [exec] [junit] 09/05/22 13:31:41 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/05/22 13:31:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/05/22 13:31:41 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200905221330_0002/job.jar. 
blk_-812068944017254210_1012 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Receiving block blk_-812068944017254210_1012 src: /127.0.0.1:47325 dest: /127.0.0.1:37818 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Receiving block blk_-812068944017254210_1012 src: /127.0.0.1:43944 dest: /127.0.0.1:43013 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Receiving block blk_-812068944017254210_1012 src: /127.0.0.1:36802 dest: /127.0.0.1:39985 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: Received block blk_-812068944017254210_1012 of size 1393103 from /127.0.0.1 [exec] [junit] 09/05/22 13:31:41 INFO dfs.DataNode: PacketResponder 0 for block blk_-812068944017254210_1012 terminating [exec] [junit] 09/05/22 13:31:41 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated:
[jira] Updated: (PIG-67) FileLocalizer doesn't work on reduce side
[ https://issues.apache.org/jira/browse/PIG-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-67: --- Attachment: FileLocalizer.java get JobConf from PigMapReduce class so that reducers can operate on files as well. FileLocalizer doesn't work on reduce side - Key: PIG-67 URL: https://issues.apache.org/jira/browse/PIG-67 Project: Pig Issue Type: Bug Reporter: Utkarsh Srivastava Attachments: FileLocalizer.java FileLocalizer.openDFSFile() does not work on the reduce side. This is probably because FileLocalizer uses PigRecordReader, which exists only on the map task. The correct solution will be for FileLocalizer to have a hadoop conf that is initialized by the reduce task on the reduce side, and the pig record reader on the map side. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer
[ https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712260#action_12712260 ] Santhosh Srinivasan commented on PIG-697: - Patch OptimizerPhase3_part-1.patch has been committed. Proposed improvements to pig's optimizer Key: PIG-697 URL: https://issues.apache.org/jira/browse/PIG-697 Project: Pig Issue Type: Bug Components: impl Reporter: Alan Gates Assignee: Santhosh Srinivasan Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, OptimizerPhase3_parrt1.patch
[jira] Updated: (PIG-67) FileLocalizer doesn't work on reduce side
[ https://issues.apache.org/jira/browse/PIG-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-67: --- Attachment: (was: FileLocalizer.java) FileLocalizer doesn't work on reduce side - Key: PIG-67 URL: https://issues.apache.org/jira/browse/PIG-67 Project: Pig Issue Type: Bug Reporter: Utkarsh Srivastava FileLocalizer.openDFSFile() does not work on the reduce side. This is probably because FileLocalizer uses PigRecordReader, which exists only on the map task. The correct solution will be for FileLocalizer to have a hadoop conf that is initialized by the reduce task on the reduce side, and the pig record reader on the map side. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
UDF with parameters?
Hi, I'm interested in developing a PERCENTILE UDF, e.g. for calculating a median, 99th percentile, 90th percentile, etc. I'd like the UDF to be parametric with respect to the percentile being requested, but I don't see any way to do that, and it seems like I might need to create PERCENTILE_50, PERCENTILE_90, etc type UDFs explicitly, versus being able to do something like GENERATE PERCENTILE(90, duration) I'm new to Pig, so I might be missing the way to do this... is it possible? Thanks, Brian
Re: UDF with parameters?
Yes, it is possible. The UDF should take the percentage you want as a constructor argument. It will have to be passed as a string and converted. Then in your Pig Latin, you will use the DEFINE statement to pass the argument to the constructor:

REGISTER /src/myfunc.jar
DEFINE percentile myfunc.percentile('90');
A = LOAD 'students' as (name, gpa);
B = FOREACH A GENERATE percentile(gpa);

See http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html#DEFINE for more details. Alan.

On May 22, 2009, at 3:37 PM, Brian Long wrote: Hi, I'm interested in developing a PERCENTILE UDF, e.g. for calculating a median, 99th percentile, 90th percentile, etc. I'd like the UDF to be parametric with respect to the percentile being requested, but I don't see any way to do that, and it seems like I might need to create PERCENTILE_50, PERCENTILE_90, etc type UDFs explicitly, versus being able to do something like GENERATE PERCENTILE(90, duration) I'm new to Pig, so I might be missing the way to do this... is it possible? Thanks, Brian
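To make the constructor-argument pattern concrete, here is a plain-Java sketch of the UDF side. A real Pig UDF would extend org.apache.pig.EvalFunc and receive its input from a Tuple/DataBag; the Pig plumbing is stripped away here so the class is runnable on its own, and the nearest-rank percentile formula is just one reasonable choice. Class and method names are illustrative.

```java
import java.util.Arrays;

// Sketch of a parametric percentile function. Pig passes DEFINE arguments
// as strings, so the constructor does the string-to-number conversion,
// exactly as the reply above describes.
public class Percentile {
    private final double p; // requested percentile, e.g. 90.0

    public Percentile(String percentile) {
        this.p = Double.parseDouble(percentile);
    }

    // Nearest-rank percentile over a sorted copy of the input values.
    public double exec(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Mirrors: DEFINE percentile myfunc.percentile('50');
        Percentile median = new Percentile("50");
        System.out.println(median.exec(new double[]{3.1, 2.0, 4.0, 3.5}));
    }
}
```

One instance per DEFINE means PERCENTILE_50 and PERCENTILE_90 become two DEFINE lines over the same class rather than two separate UDFs.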
[jira] Updated: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception
[ https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar updated PIG-656: -- Attachment: reserved.patch This patch allows the use of reserved words in function names. To avoid parsing ambiguity, the first part of the fully qualified function name (i.e., the text before the first '.') cannot be a reserved word, but the rest of the parts in a fully qualified function name can be any identifier, including a reserved word. So, for example, with this patch, the statement:
{code}
define X com.yahoo.load();
{code}
or
{code}
modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
{code}
now compiles and runs perfectly well. Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception - Key: PIG-656 URL: https://issues.apache.org/jira/browse/PIG-656 Project: Pig Issue Type: Bug Components: documentation, grunt Affects Versions: 0.2.1 Reporter: Viraj Bhat Assignee: Milind Bhandarkar Fix For: 0.3.0 Attachments: mywordcount.txt, reserved.patch, TOKENIZE.jar Consider a Pig script which does something similar to a word count. It uses the built-in TOKENIZE function, but packages it inside a class hierarchy such as mypackage.eval:
{code}
register TOKENIZE.jar
my_src = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t') AS (mlist: chararray);
modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
describe modules;
grouped = GROUP modules BY $0;
describe grouped;
counts = FOREACH grouped GENERATE COUNT(modules), group;
ordered = ORDER counts BY $0;
dump ordered;
{code}
The parser complains:
===
2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
===
I looked at the source code (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt) and it seems that EVAL is a keyword in Pig. Here are some clarifications: 1) Is there documentation on what the EVAL keyword actually is? 2) Is the EVAL keyword actually implemented? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
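The qualification rule the patch enforces, that only the first dotted segment of a function name must avoid reserved words, can be sketched as a small check. The reserved-word set below is a tiny illustrative subset, not Pig's full keyword list, and the class is not the actual QueryParser code.

```java
import java.util.Set;

// Sketch of the rule from the patch: reject a function name only when its
// FIRST segment is a reserved word; later segments (e.g. "eval" in
// mypackage.eval.TOKENIZE) may be anything.
public class FuncNameCheck {
    static final Set<String> RESERVED = Set.of("eval", "load", "store", "filter", "foreach");

    static boolean isValidFuncName(String name) {
        String first = name.split("\\.", 2)[0]; // segment before the first '.'
        return !RESERVED.contains(first.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(isValidFuncName("mypackage.eval.TOKENIZE")); // allowed
        System.out.println(isValidFuncName("eval.TOKENIZE"));           // rejected
    }
}
```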
[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY
[ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Setty updated PIG-802: - Attachment: (was: OrderByOptimization.patch) PERFORMANCE: not creating bags for ORDER BY --- Key: PIG-802 URL: https://issues.apache.org/jira/browse/PIG-802 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Olga Natkovich Attachments: OrderByOptimization.patch Order by should be changed to not use POPackage to put all of the tuples in a bag on the reduce side, as the bag is just immediately flattened. It can instead work like join does for the last input in the join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY
[ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Setty updated PIG-802: - Attachment: OrderByOptimization.patch Attaching the modified patch. The detachInput method in POPackageLite will set key and tupIter to null. So ReadOnceBag maintains separate references to them. POPackageLite overloads the getValueTuple method with the additional key parameter to use the one provided by ReadOnceBag. The implementation of POPackage is untouched. PERFORMANCE: not creating bags for ORDER BY --- Key: PIG-802 URL: https://issues.apache.org/jira/browse/PIG-802 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Olga Natkovich Attachments: OrderByOptimization.patch Order by should be changed to not use POPackage to put all of the tuples in a bag on the reduce side, as the bag is just immediately flattened. It can instead work like join does for the last input in the join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Pig-Patch-minerva.apache.org #55
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/changes Changes: [sms] PIG-697: Proposed improvements to pig's optimizer [pradeepkth] PIG-814:Make Binstorage more robust when data contains record markers (pradeepkth) -- [...truncated 91184 lines...] [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Deleting block blk_-1090490746339115162_1005 file dfs/data/data1/current/blk_-1090490746339115162 [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/hudson/input2.txt. blk_11151567103307144_1011 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Receiving block blk_11151567103307144_1011 src: /127.0.0.1:38664 dest: /127.0.0.1:49311 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Receiving block blk_11151567103307144_1011 src: /127.0.0.1:34216 dest: /127.0.0.1:51371 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Receiving block blk_11151567103307144_1011 src: /127.0.0.1:34352 dest: /127.0.0.1:59469 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Received block blk_11151567103307144_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: PacketResponder 0 for block blk_11151567103307144_1011 terminating [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59469 is added to blk_11151567103307144_1011 size 6 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Received block blk_11151567103307144_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: PacketResponder 1 for block blk_11151567103307144_1011 terminating [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: Received block blk_11151567103307144_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/05/22 18:59:35 INFO dfs.DataNode: PacketResponder 2 for block blk_11151567103307144_1011 terminating [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:51371 
is added to blk_11151567103307144_1011 size 6 [exec] [junit] 09/05/22 18:59:35 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:49311 is added to blk_11151567103307144_1011 size 6 [exec] [junit] 09/05/22 18:59:35 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:33302 [exec] [junit] 09/05/22 18:59:35 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:39094 [exec] [junit] 09/05/22 18:59:35 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1 [exec] [junit] 09/05/22 18:59:35 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1 [exec] [junit] 09/05/22 18:59:36 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/05/22 18:59:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/05/22 18:59:36 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200905221858_0002/job.jar. 
blk_-5203208028961802060_1012 [exec] [junit] 09/05/22 18:59:36 INFO dfs.DataNode: Receiving block blk_-5203208028961802060_1012 src: /127.0.0.1:34353 dest: /127.0.0.1:59469 [exec] [junit] 09/05/22 18:59:36 INFO dfs.DataNode: Receiving block blk_-5203208028961802060_1012 src: /127.0.0.1:36700 dest: /127.0.0.1:48790 [exec] [junit] 09/05/22 18:59:36 INFO dfs.DataNode: Receiving block blk_-5203208028961802060_1012 src: /127.0.0.1:34220 dest: /127.0.0.1:51371 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: Received block blk_-5203208028961802060_1012 of size 1405185 from /127.0.0.1 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: PacketResponder 0 for block blk_-5203208028961802060_1012 terminating [exec] [junit] 09/05/22 18:59:37 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:51371 is added to blk_-5203208028961802060_1012 size 1405185 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: Received block blk_-5203208028961802060_1012 of size 1405185 from /127.0.0.1 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: PacketResponder 1 for block blk_-5203208028961802060_1012 terminating [exec] [junit] 09/05/22 18:59:37 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48790 is added to blk_-5203208028961802060_1012 size 1405185 [exec] [junit] 09/05/22 18:59:37 INFO dfs.DataNode: Received block blk_-5203208028961802060_1012 of size 1405185 from /127.0.0.1 [exec] [junit] 09/05/22 18:59:37 INFO dfs.StateChange: BLOCK*
[jira] Commented: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception
[ https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712333#action_12712333 ] Hadoop QA commented on PIG-656: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12408857/reserved.patch against trunk revision 08. +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/55/console This message is automatically generated. Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception - Key: PIG-656 URL: https://issues.apache.org/jira/browse/PIG-656 Project: Pig Issue Type: Bug Components: documentation, grunt Affects Versions: 0.2.1 Reporter: Viraj Bhat Assignee: Milind Bhandarkar Fix For: 0.3.0 Attachments: mywordcount.txt, reserved.patch, TOKENIZE.jar Consider a Pig script which does something similar to a word count. 
It uses the built-in TOKENIZE function, but packages it inside a class hierarchy such as mypackage.eval:
{code}
register TOKENIZE.jar
my_src = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t') AS (mlist: chararray);
modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
describe modules;
grouped = GROUP modules BY $0;
describe grouped;
counts = FOREACH grouped GENERATE COUNT(modules), group;
ordered = ORDER counts BY $0;
dump ordered;
{code}
The parser complains:
===
2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
===
I looked at the source code (src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt), and it appears that EVAL is a keyword in Pig. Two clarifications:
1) Is there documentation on what the EVAL keyword actually does?
2) Is the EVAL keyword actually implemented?
Viraj
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
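The parse failure above is what happens when a lexer reserves keywords globally: once `eval` appears as a package segment in `mypackage.eval.TOKENIZE`, the tokenizer emits the EVAL keyword token instead of an identifier, and the dotted UDF name no longer parses. A minimal Python sketch of that behavior (the keyword subset and function name here are illustrative, not Pig's actual parser):

```python
# Illustrative subset of Pig's reserved words; this is a hypothetical model,
# not the QueryParser.jjt grammar itself.
KEYWORDS = {'eval', 'load', 'store', 'foreach', 'generate', 'group'}

def split_udf_name(qualified):
    """Resolve a dotted UDF name, failing on any segment that matches a
    reserved keyword -- mimicking the behavior reported in PIG-656."""
    parts = qualified.split('.')
    for part in parts:
        if part.lower() in KEYWORDS:
            raise ValueError('Invalid alias: ' + part)
    return parts

print(split_udf_name('org.apache.pig.builtin.TOKENIZE'))
# split_udf_name('mypackage.eval.TOKENIZE') raises ValueError('Invalid alias: eval')
```

One obvious workaround until the grammar is fixed is to place the UDF in a package whose segments avoid Pig keywords (e.g. a name like mypackage.evalfuncs).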
[jira] Created: (PIG-816) PigStorage() does not accept Unicode characters in its constructor
PigStorage() does not accept Unicode characters in its constructor
--
Key: PIG-816 URL: https://issues.apache.org/jira/browse/PIG-816 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Viraj Bhat Priority: Critical Fix For: 0.3.0

A simple Pig script which uses a Unicode character in the PigStorage() constructor fails with the following error:
{code}
studenttab = LOAD '/user/viraj/studenttab10k' AS (name:chararray, age:int, gpa:float);
X2 = GROUP studenttab by age;
Y2 = FOREACH X2 GENERATE group, COUNT(studenttab);
store Y2 into '/user/viraj/y2' using PigStorage('\u0001');
{code}
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backend error: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference #1 is an invalid XML character.
Attaching log file.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
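The SAXParseException suggests the delimiter is being carried through the Hadoop job configuration, which is serialized as XML, and XML 1.0 forbids most control characters, including U+0001, even when written as numeric character references. A minimal Python sketch of that XML-level restriction (this models only the XML rule, not Pig's actual code path):

```python
import xml.etree.ElementTree as ET

def survives_xml(ch):
    """Return True if a character is legal in an XML 1.0 document
    when written as a numeric character reference."""
    try:
        ET.fromstring('<v>&#%d;</v>' % ord(ch))
        return True
    except ET.ParseError:
        return False

print(survives_xml('\u0001'))  # False: U+0001 is not an XML 1.0 Char
print(survives_xml('\t'))      # True: tab is one of the few control chars XML allows
```

This is why a Ctrl-A delimiter fails at job-submission time rather than at parse time: the script itself is fine, but the serialized configuration is not well-formed XML.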
[jira] Updated: (PIG-816) PigStorage() does not accept Unicode characters in its constructor
[ https://issues.apache.org/jira/browse/PIG-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-816:
---
Attachment: pig_1243043613713.log
Log file for detailed error message.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.