[jira] Updated: (PIG-1574) Optimization rule PushUpFilter causes filter to be pushed up out joins
[ https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1574:
    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

test-patch result for jira-1574-1.patch:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

This patch does not push a filter before a join if the join is an outer join. Actually, we can push the filter to the outer side of the join. I assume that will be addressed in PIG-1575.

Patch jira-1574-1.patch committed. Thanks Xuefu!

Optimization rule PushUpFilter causes filter to be pushed up out joins
    Key: PIG-1574
    URL: https://issues.apache.org/jira/browse/PIG-1574
    Project: Pig
    Issue Type: Bug
    Reporter: Xuefu Zhang
    Assignee: Xuefu Zhang
    Fix For: 0.8.0
    Attachments: jira-1574-1.patch

The PushUpFilter optimization rule in the new logical plan moves the filter up to one of the join branches. It does this aggressively by finding an operator that has all the projection UIDs. However, it does not consider that the operator it finds might be another join. If that join is outer, then we cannot simply move the filter to one of its branches. As an example, the following script will be erroneously optimized:

A = load 'myfile' as (d1:int);
B = load 'anotherfile' as (d2:int);
C = join A by d1 full outer, B by d2;
D = load 'xxx' as (d3:int);
E = join C by d1, D by d3;
F = filter E by d1 5;
G = store F into 'dummy';

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
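The remark above about pushing a filter only to the outer side of a join can be sketched as follows. This is a minimal illustration assuming standard outer-join semantics; the function names are hypothetical and this is not the code in jira-1574-1.patch, which (per the comment) skips outer joins entirely.

```python
# Hypothetical sketch, not Pig's PushUpFilter implementation.
# Assumption: a filter that references columns of only one join input may be
# moved into that input only if the join preserves that input's rows.

def branches_accepting_filter(join_type):
    """Return the join inputs ("left"/"right") that may receive a pushed filter."""
    if join_type == "inner":
        return {"left", "right"}
    if join_type == "left outer":
        return {"left"}      # left rows are preserved, so filtering them first is safe
    if join_type == "right outer":
        return {"right"}
    # Full outer: either input may be null-extended, so neither branch is safe.
    return set()


def can_push(join_type, filter_side):
    """True if a filter on columns from filter_side can move below the join."""
    return filter_side in branches_accepting_filter(join_type)
```

Under these assumptions, the script in this issue is rejected: C is a full outer join, so a filter on d1 cannot be moved into A.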
[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly
[ https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1568:
    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

test-patch result:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

Patch committed. Thanks Xuefu!

Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly
    Key: PIG-1568
    URL: https://issues.apache.org/jira/browse/PIG-1568
    Project: Pig
    Issue Type: Bug
    Reporter: Xuefu Zhang
    Assignee: Xuefu Zhang
    Fix For: 0.8.0
    Attachments: jira-1568-1.patch, jira-1568-1.patch

The FilterAboveForeach rule optimizes the plan by pushing a filter above the preceding foreach operator. However, during code review, two major problems were found:

1. The current implementation assumes that if no projection is found in the filter condition, then all columns from the foreach are projected. This issue prevents the following optimization:

A = LOAD 'file.txt' AS (a(u,v), b, c);
B = FOREACH A GENERATE $0, b;
C = FILTER B BY 8 5;
STORE C INTO 'empty';

2. The current implementation doesn't handle the * projection, which means project all columns. As a result, it wasn't able to optimize the following:

A = LOAD 'file.txt' AS (a(u,v), b, c);
B = FOREACH A GENERATE $0, b;
C = FILTER B BY Identity.class.getName(*) 5;
STORE C INTO 'empty';
[jira] Created: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput
Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput --- Key: PIG-1579 URL: https://issues.apache.org/jira/browse/PIG-1579 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Fix For: 0.8.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput
[ https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1579: Attachment: PIG-1579-1.patch Attach a fix. However, this fix is shallow and may need an in-depth look. Commit the temporary fix and leave the Jira open. Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput --- Key: PIG-1579 URL: https://issues.apache.org/jira/browse/PIG-1579 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1579-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput
[ https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1579:
    Description:
Error message:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error executing function: Traceback (most recent call last):
  File iostream, line 5, in multStr
TypeError: can't multiply sequence by non-int of type 'NoneType'
    at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput
    Key: PIG-1579
    URL: https://issues.apache.org/jira/browse/PIG-1579
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.8.0
    Reporter: Daniel Dai
    Fix For: 0.8.0
    Attachments: PIG-1579-1.patch
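For readers unfamiliar with the TypeError in this trace: it is what Python raises when a string is multiplied by None. The body of the test's multStr UDF is not shown in this thread, so the following is only a guess at a null-tolerant version; the two-argument signature is an assumption, and this is not necessarily what PIG-1579-1.patch does.

```python
# Hypothetical null-tolerant variant of a string-repeat UDF.
# Assumption: the UDF takes a string and a repeat count; the real test UDF
# is not shown in the thread.

def multStr(s, n):
    """Repeat s n times, returning None when either input is null."""
    if s is None or n is None:
        # Pig nulls arrive as None in Jython; "ab" * None would raise
        # "can't multiply sequence by non-int of type 'NoneType'".
        return None
    return s * n
```

A guard like this would only avoid the exception. It would not explain why the failure is intermittent, which is presumably why the issue was left open.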
[jira] Updated: (PIG-1482) Pig gets confused when more than one loader is involved
[ https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1482: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to trunk. Xuefu, thanks for the fix. Pig gets confused when more than one loader is involved --- Key: PIG-1482 URL: https://issues.apache.org/jira/browse/PIG-1482 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ankur Assignee: Xuefu Zhang Fix For: 0.8.0 Attachments: jira-1482-final-1.patch, jira-1482-final-2.patch, jira-1482-final.patch, jira-1482-final.patch, jira-1482-final.patch In case of two relations being loaded using different loader, joined, grouped and projected, pig gets confused in trying to find appropriate loader for the requested cast. Consider the following script :- A = LOAD 'data1' USING PigStorage() AS (s, m, l); B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3; C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 :0) as v3:int; D = LOAD 'data2' USING TextLoader() AS (a); E = JOIN C BY v1, D BY a USING 'replicated'; F = GROUP E BY (v1, a); G = FOREACH F GENERATE (chararray)group.v1, group.a; dump G; This throws the error, stack trace of which is in the next comment -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs
[ https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904258#action_12904258 ] Thejas M Nair commented on PIG-1570: Regarding bq. Another thing to investigate (somewhat related) - there seems to be a problem when PigServer is used to execute query having native mr operator - i was unable to run the tests in local mode . But i am able to run query in local mode from commandline. The problem was that in test setup, the MiniCluster hadoop-site.xml (~/pigtest/conf/hadoop-site.xml) is in classpath. The WordCount.jar would end up trying to run the MR job using minicluster and fail, if rest of the test is using local mode. native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs - Key: PIG-1570 URL: https://issues.apache.org/jira/browse/PIG-1570 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 The code path for handling failure in MR job corresponding to native MR is different and does not have the same behavior. For example, even if the MR job for mapreduce operator fails, the number of jobs that failed is being reported as 0 in PigStats log. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs
[ https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1570: --- Attachment: PIG-1570.1.patch Patch passed test-patch and core tests. Patch is ready for review. [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 5 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs - Key: PIG-1570 URL: https://issues.apache.org/jira/browse/PIG-1570 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1570.1.patch The code path for handling failure in MR job corresponding to native MR is different and does not have the same behavior. For example, even if the MR job for mapreduce operator fails, the number of jobs that failed is being reported as 0 in PigStats log. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs
[ https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904263#action_12904263 ] Thejas M Nair commented on PIG-1570: The code path that is followed in case of the native MR job is still different because the jar is a black box, and pig just calls the main function, pig doesn't even know if it is a MR job that is actually being run. This fixes the pig stats reporting (log messages) for failed native MR job and also the feature list in the native MR job. native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs - Key: PIG-1570 URL: https://issues.apache.org/jira/browse/PIG-1570 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1570.1.patch The code path for handling failure in MR job corresponding to native MR is different and does not have the same behavior. For example, even if the MR job for mapreduce operator fails, the number of jobs that failed is being reported as 0 in PigStats log. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails
[ https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904267#action_12904267 ]

Richard Ding commented on PIG-1343:

Patch is committed to the trunk. Thanks Niraj.

pig_log file missing even though Main tells it is creating one and an M/R job fails
    Key: PIG-1343
    URL: https://issues.apache.org/jira/browse/PIG-1343
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Viraj Bhat
    Assignee: niraj rai
    Fix For: 0.8.0
    Attachments: 1343.patch, PIG-1343-1.patch, PIG-1343_6.patch, pig_1343_2.patch, pig_1343_4.patch, PIG_1343_5.patch

There is a particular case where I was running with the latest trunk of Pig.

{code}
$java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig
[main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig_1263420012601.log
$ls -l pig_1263420012601.log
ls: pig_1263420012601.log: No such file or directory
{code}

The job failed and the log file did not contain anything; the only way to debug was to look into the Jobtracker logs.

Here are some reasons which could have caused this behavior:
1) The underlying filer/NFS had some issues. In that case, do we not report an error on stdout?
2) There are some errors from the backend which are not being captured.

Viraj
[jira] Updated: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails
[ https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1343: -- Attachment: PIG-1343_6.patch pig_log file missing even though Main tells it is creating one and an M/R job fails Key: PIG-1343 URL: https://issues.apache.org/jira/browse/PIG-1343 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: niraj rai Fix For: 0.8.0 Attachments: 1343.patch, PIG-1343-1.patch, PIG-1343_6.patch, pig_1343_2.patch, pig_1343_4.patch, PIG_1343_5.patch There is a particular case where I was running with the latest trunk of Pig. {code} $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig_1263420012601.log $ls -l pig_1263420012601.log ls: pig_1263420012601.log: No such file or directory {code} The job failed and the log file did not contain anything, the only way to debug was to look into the Jobtracker logs. Here are some reasons which would have caused this behavior: 1) The underlying filer/NFS had some issues. In that case do we not error on stdout? 2) There are some errors from the backend which are not being captured Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails
[ https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1343: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed pig_log file missing even though Main tells it is creating one and an M/R job fails Key: PIG-1343 URL: https://issues.apache.org/jira/browse/PIG-1343 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: niraj rai Fix For: 0.8.0 Attachments: 1343.patch, PIG-1343-1.patch, PIG-1343_6.patch, pig_1343_2.patch, pig_1343_4.patch, PIG_1343_5.patch There is a particular case where I was running with the latest trunk of Pig. {code} $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig_1263420012601.log $ls -l pig_1263420012601.log ls: pig_1263420012601.log: No such file or directory {code} The job failed and the log file did not contain anything, the only way to debug was to look into the Jobtracker logs. Here are some reasons which would have caused this behavior: 1) The underlying filer/NFS had some issues. In that case do we not error on stdout? 2) There are some errors from the backend which are not being captured Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1580) new syntax for native mapreduce operator
new syntax for native mapreduce operator
    Key: PIG-1580
    URL: https://issues.apache.org/jira/browse/PIG-1580
    Project: Pig
    Issue Type: Task
    Reporter: Thejas M Nair
    Assignee: Thejas M Nair

The mapreduce operator (PIG-506) and the stream operator have some similarities, so it makes sense to use a similar syntax for both. Alan has proposed the following syntax for the mapreduce operator, and that we also move the stream operator to a similar syntax in a future release.

MAPREDUCE id jar
    INPUT 'path' USING LoadFunc
    OUTPUT 'path' USING StoreFunc
    [SHIP 'path' [, 'path' ...]]
    [CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]
[jira] Updated: (PIG-1580) new syntax for native mapreduce operator
[ https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1580:
    Fix Version/s: 0.8.0

new syntax for native mapreduce operator
    Key: PIG-1580
    URL: https://issues.apache.org/jira/browse/PIG-1580
    Project: Pig
    Issue Type: Task
    Reporter: Thejas M Nair
    Assignee: Thejas M Nair
    Fix For: 0.8.0

The mapreduce operator (PIG-506) and the stream operator have some similarities, so it makes sense to use a similar syntax for both. Alan has proposed the following syntax for the mapreduce operator, and that we also move the stream operator to a similar syntax in a future release.

MAPREDUCE id jar
    INPUT 'path' USING LoadFunc
    OUTPUT 'path' USING StoreFunc
    [SHIP 'path' [, 'path' ...]]
    [CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]
[jira] Created: (PIG-1581) Parser fails to recognize semicolons in quoted strings
Parser fails to recognize semicolons in quoted strings
    Key: PIG-1581
    URL: https://issues.apache.org/jira/browse/PIG-1581
    Project: Pig
    Issue Type: Bug
    Components: grunt
    Affects Versions: 0.7.0
    Environment: CentOS 5.5
    Reporter: Christopher Hackman
    Priority: Minor

Within some contexts, the parser fails to treat semicolons correctly, and sees them as an EOL.

Given an input file /test1.txt (in the hdfs):

1;a
2;b
3;c
4;d
5;e

And the following Pig script:

REGISTER /tmp/piggybank.jar ;
DEFINE REGEXEXTRACTALL org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
lines = LOAD '/test1.txt' AS (line:chararray);
delimited = FOREACH lines GENERATE FLATTEN (
    REGEXEXTRACTALL(line, '^(\\d+);(\\w+)$')
) AS (
    digit:int,
    word:chararray
);
DUMP delimited;

I receive the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 5, column 40. Encountered: EOF after : \'^(d+);
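To illustrate the behavior the reporter expects, here is a rough sketch of a statement splitter that ignores semicolons inside single-quoted strings. It is not Grunt's actual lexer; the function name and the simplified quoting rules (single quotes only, backslash escapes, no comment handling) are assumptions made for the example.

```python
# Hypothetical sketch of quote-aware statement splitting; not Grunt's parser.
# Assumptions: strings use single quotes, a backslash inside a string escapes
# the next character, and comments are not handled.

def split_statements(script):
    """Split script on ';' characters that fall outside quoted strings."""
    statements = []
    current = []
    in_quote = False
    escaped = False
    for ch in script:
        if escaped:
            # Character following a backslash inside a string: keep verbatim.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quote:
            current.append(ch)
            escaped = True
        elif ch == "'":
            in_quote = not in_quote
            current.append(ch)
        elif ch == ";" and not in_quote:
            # A real statement terminator.
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

With this approach, the regular expression '^(\\d+);(\\w+)$' from the script above stays inside a single statement instead of being cut at its semicolon.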
[jira] Commented: (PIG-1580) new syntax for native mapreduce operator
[ https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904298#action_12904298 ]

Thejas M Nair commented on PIG-1580:

Updating syntax to include support for parameters:

MAPREDUCE id jar 'params'
    INPUT 'path' USING LoadFunc
    OUTPUT 'path' USING StoreFunc
    [SHIP 'path' [, 'path' ...]]
    [CACHE 'dfs_path#dfs_file' , 'dfs_path#dfs_file' ...]

new syntax for native mapreduce operator
    Key: PIG-1580
    URL: https://issues.apache.org/jira/browse/PIG-1580
    Project: Pig
    Issue Type: Task
    Reporter: Thejas M Nair
    Assignee: Thejas M Nair
    Fix For: 0.8.0

The mapreduce operator (PIG-506) and the stream operator have some similarities, so it makes sense to use a similar syntax for both. Alan has proposed the following syntax for the mapreduce operator, and that we also move the stream operator to a similar syntax in a future release.

MAPREDUCE id jar
    INPUT 'path' USING LoadFunc
    OUTPUT 'path' USING StoreFunc
    [SHIP 'path' [, 'path' ...]]
    [CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]
[jira] Commented: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs
[ https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904321#action_12904321 ] Richard Ding commented on PIG-1570: --- +1. native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs - Key: PIG-1570 URL: https://issues.apache.org/jira/browse/PIG-1570 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1570.1.patch The code path for handling failure in MR job corresponding to native MR is different and does not have the same behavior. For example, even if the MR job for mapreduce operator fails, the number of jobs that failed is being reported as 0 in PigStats log. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-1205: --- Attachment: PIG_1205_9.patch Patch with the StoreCaster changes as suggested by Alan. With +1s from Alan and Jeff, committing. Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: hbase-0.20.6-test.jar, hbase-0.20.6.jar, PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch, PIG_1205_5.path, PIG_1205_6.patch, PIG_1205_7.patch, PIG_1205_8.patch, PIG_1205_9.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904325#action_12904325 ] Dmitriy V. Ryaboy commented on PIG-1205: Re HBASE-1933, they are publishing snapshots of current trunk, not the 0.20 branch. We'll be able to start using maven to pull down hbase when we upgrade to their 0.9 release (which iirc depends on hdfs appends...) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: hbase-0.20.6-test.jar, hbase-0.20.6.jar, PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch, PIG_1205_5.path, PIG_1205_6.patch, PIG_1205_7.patch, PIG_1205_8.patch, PIG_1205_9.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1458) aggregate files for replicated join
[ https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1458: -- Attachment: PIG-1458_1.patch New patch addressing review comments. aggregate files for replicated join --- Key: PIG-1458 URL: https://issues.apache.org/jira/browse/PIG-1458 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1458.patch, PIG-1458_1.patch We have noticed that if the smaller data in replicated join has many files, this puts unneeded burden on the name node. pre-aggregating the files can improve the situation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1205:
    Status: Resolved (was: Patch Available)
    Release Note:
HBaseStorage has been significantly reworked with this release.

Usage:
{code}
my_data = LOAD 'hbase://table_name' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2', '-caching 100') as (col1:int, col2:chararray);
STORE my_data INTO 'hbase://other_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2');
{code}

HBaseStorage can now write data into HBase as well as read it.

The first argument is a space-delimited list of columns to be loaded (or stored). Columns are specified as columnfamily:column_name.

The second argument is an optional set of key-value pairs used to control HBaseStorage behavior. Available arguments are:

* {{-loadKey}} Used to load the row key; false by default. If true, the first field in the returned tuple will be the value of the row key.
* {{-gt}}, {{-gte}}, {{-lt}}, and {{-lte}} Used to specify bounds on row keys to be scanned. The keys are specified as binary data, using the hex representation. Any slashes have to be double-escaped (two slashes per single real slash) to be parsed correctly.
* {{-caching}} Used to specify the number of rows to be cached per HBase RPC call. See http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#setScannerCaching%28int%29 for more information about this HBase feature.
* {{-limit}} Used to control how many rows *per scanned region* will be retrieved. This can speed up processing if you just want a few rows. The total number of rows returned will be at most (number of regions * limit). The limit is applied after any -gt, -lt, etc. filters. Pig's LIMIT operator can be used in conjunction with this argument.
* {{-caster}} Used to specify a LoadCaster (or LoadStoreCaster, for storage) used to convert the data stored in HBase into Pig data. By default, the Utf8StorageConverter is used, which stores all data as its string representation. The string HBaseBinaryConverter can be used to specify that data is stored in HBase's native binary format. Note that the HBaseBinaryConverter does not work with complex data types such as maps, tuples, and bags. You can also specify a full class path such as org.apache.pig.backend.hadoop.hbase.HBaseBinaryConverter to use your own Caster. The default caster can be changed by setting the pig.hbase.caster property in pig.properties.

HBaseStorage matches column arguments to tuple fields based on their ordinal position. When storing, the first field is expected to be the key value.

    Resolution: Fixed

Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
    Key: PIG-1205
    URL: https://issues.apache.org/jira/browse/PIG-1205
    Project: Pig
    Issue Type: Sub-task
    Affects Versions: 0.7.0
    Reporter: Jeff Zhang
    Assignee: Dmitriy V. Ryaboy
    Fix For: 0.8.0
    Attachments: hbase-0.20.6-test.jar, hbase-0.20.6.jar, PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch, PIG_1205_5.path, PIG_1205_6.patch, PIG_1205_7.patch, PIG_1205_8.patch, PIG_1205_9.patch
[jira] Commented: (PIG-1458) aggregate files for replicated join
[ https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904346#action_12904346 ] Koji Noguchi commented on PIG-1458: --- Can we increase the replication to 10 for the aggregated file (if not already done)? aggregate files for replicated join --- Key: PIG-1458 URL: https://issues.apache.org/jira/browse/PIG-1458 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1458.patch, PIG-1458_1.patch We have noticed that if the smaller data in replicated join has many files, this puts unneeded burden on the name node. pre-aggregating the files can improve the situation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904356#action_12904356 ] Alan Gates commented on PIG-1399: -
{code}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] -1 findbugs. The patch appears to introduce 2 new Findbugs warnings.
{code}
I'll attach the results of findbugs separately.
Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 > 5+7; => B = filter A by a0 > 12; 2. Boolean expression optimization Example: B = filter A by not (not(a0>5) or a>10); => B = filter A by a0>5 and a<=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1569) java properties not honored in case of properties such as stop.on.failure
[ https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1569: -- Status: Patch Available (was: Open) java properties not honored in case of properties such as stop.on.failure - Key: PIG-1569 URL: https://issues.apache.org/jira/browse/PIG-1569 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1569.patch In org.apache.pig.Main , properties are being set to default value without checking if the java system properties have been set to something else. stop.on.failure, opt.multiquery, aggregate.warning are some properties that have this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
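The problem described above is one of precedence: a default should only fill in a property that nobody has set. The following Python sketch illustrates that idea in isolation. It is not the code in org.apache.pig.Main or the attached patch; the function name and the default values in `DEFAULTS` are illustrative assumptions.

```python
# Illustrative defaults only; the real values live in Pig's own code.
DEFAULTS = {
    "stop.on.failure": "false",
    "opt.multiquery": "true",
    "aggregate.warning": "true",
}


def apply_defaults(properties, defaults):
    """Return a copy of `properties` with defaults filled in.

    A default is applied only when the property is absent, so values
    that were already supplied (for example via -D system properties)
    are never overwritten.
    """
    merged = dict(properties)
    for name, value in defaults.items():
        merged.setdefault(name, value)
    return merged
```

With this ordering, a user who passes stop.on.failure=true keeps that value, while the properties they did not mention still receive their defaults.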
[jira] Updated: (PIG-1569) java properties not honored in case of properties such as stop.on.failure
[ https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1569: -- Attachment: PIG-1569.patch java properties not honored in case of properties such as stop.on.failure - Key: PIG-1569 URL: https://issues.apache.org/jira/browse/PIG-1569 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1569.patch In org.apache.pig.Main , properties are being set to default value without checking if the java system properties have been set to something else. stop.on.failure, opt.multiquery, aggregate.warning are some properties that have this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1399: Attachment: newPatchFindbugsWarnings.html Results of findbugs from manual run of test-patch Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: newPatchFindbugsWarnings.html, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 > 5+7; => B = filter A by a0 > 12; 2. Boolean expression optimization Example: B = filter A by not (not(a0>5) or a>10); => B = filter A by a0>5 and a<=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1458) aggregate files for replicated join
[ https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904358#action_12904358 ] Thejas M Nair commented on PIG-1458: +1 aggregate files for replicated join --- Key: PIG-1458 URL: https://issues.apache.org/jira/browse/PIG-1458 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1458.patch, PIG-1458_1.patch We have noticed that if the smaller data in replicated join has many files, this puts unneeded burden on the name node. pre-aggregating the files can improve the situation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1399: -- Attachment: PIG-1399.patch I use findbugs 1.3.9, and it finds the patch clean. The attached findbugs results were generated with 1.3.8, which might explain the difference. In any case, I made a minor modification that should fix the warnings reported by 1.3.8. Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: newPatchFindbugsWarnings.html, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 > 5+7; => B = filter A by a0 > 12; 2. Boolean expression optimization Example: B = filter A by not (not(a0>5) or a>10); => B = filter A by a0>5 and a<=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1569) java properties not honored in case of properties such as stop.on.failure
[ https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904369#action_12904369 ] Thejas M Nair commented on PIG-1569: looks good. +1 java properties not honored in case of properties such as stop.on.failure - Key: PIG-1569 URL: https://issues.apache.org/jira/browse/PIG-1569 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1569.patch In org.apache.pig.Main , properties are being set to default value without checking if the java system properties have been set to something else. stop.on.failure, opt.multiquery, aggregate.warning are some properties that have this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1399: -- Status: Patch Available (was: Open) Release Note: This logical simplification contains the following types of simplifications:
1) Constant pre-calculation. Example: B = filter A by a0 > 5+7; is simplified to B = filter A by a0 > 12;
2) Elimination of negations. Example: B = filter A by not (not(a0>5) or a>10); is simplified to B = filter A by a0>5 and a<=10;
3) Elimination of logical implied expression in AND. Example: B = filter A by (a0 > 5 and a0 > 7); is simplified to B = filter A by a0 > 7;
4) Elimination of logical implied expression in OR. Example: B = filter A by ((a0 > 5) or (a0 > 6 and a1 > 15)); is simplified to B = filter A by a0 > 5;
5) Equivalence elimination. Example: B = filter A by (a0 > 5 and a0 > 5); is simplified to B = filter A by a0 > 5;
6) Elimination of complementary expressions in OR. Example: B = filter A by (a0 > 5 OR a0 <= 5); is simplified to non-filtering
7) Elimination of naive TRUE expression. Example: B = filter A by 1==1; is simplified to non-filtering
Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: newPatchFindbugsWarnings.html, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 > 5+7; => B = filter A by a0 > 12; 2. Boolean expression optimization Example: B = filter A by not (not(a0>5) or a>10); => B = filter A by a0>5 and a<=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
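A minimal sketch of simplifications 1) and 2) from the release note, written in Python over a toy tuple-based expression tree. This is not the optimizer rule from the patch: the node shapes, the `simplify` and `negate` helpers, and the operator table are all hypothetical, and the example reads the second case as not (not(a0 > 5) or a > 10). Constant sub-expressions are folded bottom-up, and negations are pushed inward with De Morgan's laws until they flip a comparison operator.

```python
# Expression nodes (assumed shapes, for illustration only):
#   ("const", value)  ("col", name)  ("not", expr)  (op, left, right)
COMPLEMENT = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "==": "!=", "!=": "=="}


def negate(expr):
    """Return an expression equivalent to NOT expr, pushing the negation inward."""
    op = expr[0]
    if op == "not":
        return expr[1]  # double negation cancels
    if op in COMPLEMENT:
        return (COMPLEMENT[op], expr[1], expr[2])  # flip the comparison
    if op == "and":
        return ("or", negate(expr[1]), negate(expr[2]))  # De Morgan
    if op == "or":
        return ("and", negate(expr[1]), negate(expr[2]))  # De Morgan
    return ("not", expr)  # nothing to push into (e.g. a bare column)


def simplify(expr):
    """Fold constant additions and eliminate negations, bottom-up."""
    op = expr[0]
    if op in ("const", "col"):
        return expr
    if op == "not":
        return negate(simplify(expr[1]))
    left, right = simplify(expr[1]), simplify(expr[2])
    if op == "+" and left[0] == "const" and right[0] == "const":
        return ("const", left[1] + right[1])  # constant pre-calculation
    return (op, left, right)
```

For example, `simplify((">", ("col", "a0"), ("+", ("const", 5), ("const", 7))))` folds the right-hand side into a single constant node, so the addition is done once at planning time rather than once per record.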
[jira] Commented: (PIG-1458) aggregate files for replicated join
[ https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904385#action_12904385 ] Richard Ding commented on PIG-1458: --- Koji, Please open a jira on increasing the replication factor of the replicated files. Now it uses the default replication factor. Thanks, -Richard aggregate files for replicated join --- Key: PIG-1458 URL: https://issues.apache.org/jira/browse/PIG-1458 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1458.patch, PIG-1458_1.patch We have noticed that if the smaller data in replicated join has many files, this puts unneeded burden on the name node. pre-aggregating the files can improve the situation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1569) java properties not honored in case of properties such as stop.on.failure
[ https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1569: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed java properties not honored in case of properties such as stop.on.failure - Key: PIG-1569 URL: https://issues.apache.org/jira/browse/PIG-1569 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1569.patch In org.apache.pig.Main , properties are being set to default value without checking if the java system properties have been set to something else. stop.on.failure, opt.multiquery, aggregate.warning are some properties that have this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1572) change default datatype when relations are used as scalar to bytearray
[ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1572: --- Attachment: PIG-1572.1.patch Summary of changes:
- Changed the default type (i.e., the type used when the input relation to the scalar has no type) to bytearray.
- Replaced PigStorage with InterStorage for load/store of scalar data, so typed data is stored.
- Changes to track lineage of the ReadScalars udf to the load function(s).
- Removed unnecessary casts on output of ReadScalars.
- describe alias; PigServer code now checks the alias of the leaf logical operators.
- Changed test cases: explicit cast is no longer required when bytearray is used in arithmetic operations. Moved some of the tests to local mode to reduce test run time.
change default datatype when relations are used as scalar to bytearray -- Key: PIG-1572 URL: https://issues.apache.org/jira/browse/PIG-1572 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1572.1.patch When relations are cast to scalar, the current default type is chararray. This is inconsistent with the behavior in the rest of pig-latin. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs
[ https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved PIG-1570. Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to trunk. native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs - Key: PIG-1570 URL: https://issues.apache.org/jira/browse/PIG-1570 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1570.1.patch The code path for handling failure in MR job corresponding to native MR is different and does not have the same behavior. For example, even if the MR job for mapreduce operator fails, the number of jobs that failed is being reported as 0 in PigStats log. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1458) aggregate files for replicated join
[ https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-1458. --- Hadoop Flags: [Reviewed] Resolution: Fixed aggregate files for replicated join --- Key: PIG-1458 URL: https://issues.apache.org/jira/browse/PIG-1458 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1458.patch, PIG-1458_1.patch We have noticed that if the smaller data in replicated join has many files, this puts unneeded burden on the name node. pre-aggregating the files can improve the situation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1563) SUBSTRING function is broken
[ https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904450#action_12904450 ] Olga Natkovich commented on PIG-1563: - Dmitry, thanks for the review. I did not discard your function - it was part of the patch. I did not change the code to use it just because I already finished testing the changes and did not have time to redo the code. I am fixing some javadoc and release audit failures and will commit the code shortly. SUBSTRING function is broken Key: PIG-1563 URL: https://issues.apache.org/jira/browse/PIG-1563 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: PIG_1563.patch, PIG_1563_v2.patch Script: A = load 'studenttab10k' as (name, age, gpa); C = foreach A generate SUBSTRING(name, 0,5); E = limit C 10; dump E; Output is always empty: () () () () () () () () () () -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1458) aggregate files for replicated join
[ https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904451#action_12904451 ] Richard Ding commented on PIG-1458: --- Patch committed to trunk. aggregate files for replicated join --- Key: PIG-1458 URL: https://issues.apache.org/jira/browse/PIG-1458 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1458.patch, PIG-1458_1.patch We have noticed that if the smaller data in replicated join has many files, this puts unneeded burden on the name node. pre-aggregating the files can improve the situation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1483) [piggybank] Add HadoopJobHistoryLoader to the piggybank
[ https://issues.apache.org/jira/browse/PIG-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904453#action_12904453 ] Richard Ding commented on PIG-1483: --- Patch committed to trunk. [piggybank] Add HadoopJobHistoryLoader to the piggybank --- Key: PIG-1483 URL: https://issues.apache.org/jira/browse/PIG-1483 Project: Pig Issue Type: New Feature Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1483.patch, PIG-1483_1.patch PIG-1333 added many script-related entries to the MR job xml file and thus it's now possible to use Pig for querying Hadoop job history/xml files to get script-level usage statistics. What we need is a Pig loader that can parse these files and generate corresponding data objects. The goal of this jira is to create a HadoopJobHistoryLoader in piggybank. Here is an example that shows the intended usage:
*Find all the jobs grouped by script and user:*
{code}
a = load '/mapred/history/_logs/history/' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'PIG_SCRIPT_ID' as id, (Chararray) j#'USER' as user, (Chararray) j#'JOBID' as job;
c = filter b by not (id is null);
d = group c by (id, user);
e = foreach d generate flatten(group), c.job;
dump e;
{code}
A couple more examples:
*Find scripts that use only the default parallelism:*
{code}
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;
{code}
*Find the running time of each script (in seconds):*
{code}
a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
c = group b by (id, user, script_name);
d = foreach c generate group.user, group.script_name, (MAX(b.end) - MIN(b.start))/1000;
dump d;
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
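The running-time example groups the jobs of one script together and measures from the earliest submit time to the latest finish time. As a rough cross-check of that logic outside Pig, here is a Python sketch over in-memory records. The function name and the record keys are hypothetical stand-ins for the loader's fields, and timestamps are assumed to be in milliseconds, which is what the division by 1000 in the script suggests.

```python
def script_runtimes(jobs):
    """Map (id, user, script_name) to elapsed seconds across all its jobs.

    Each job is a dict with 'id', 'user', 'script_name', and 'start'/'end'
    timestamps in milliseconds (assumed units).
    """
    spans = {}
    for job in jobs:
        key = (job["id"], job["user"], job["script_name"])
        start, end = spans.get(key, (job["start"], job["end"]))
        # Keep the earliest start and the latest end seen for this script.
        spans[key] = (min(start, job["start"]), max(end, job["end"]))
    return {key: (end - start) / 1000 for key, (start, end) in spans.items()}
```

Note that the subtraction happens before the division; dividing only the minimum start time by 1000 would mix milliseconds with seconds and give a meaningless result.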
[jira] Commented: (PIG-1557) couple of issue mapping aliases to jobs
[ https://issues.apache.org/jira/browse/PIG-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904456#action_12904456 ] Richard Ding commented on PIG-1557: --- Patch committed to trunk. couple of issue mapping aliases to jobs --- Key: PIG-1557 URL: https://issues.apache.org/jira/browse/PIG-1557 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1557.patch, PIG-1557_1.patch I have a simple script: A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa); B = group A by name; C = foreach B generate group, COUNT(A); D = order C by $1; E = limit D 10; dump E; I noticed a couple of issues with alias to job mapping: neither load(A) nor limit(E) shows in the output -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1563) Some string functions don't work with bytearray arguments
[ https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904462#action_12904462 ] Olga Natkovich commented on PIG-1563: - +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 13 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] Some string functions don't work with bytearray arguments - Key: PIG-1563 URL: https://issues.apache.org/jira/browse/PIG-1563 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: PIG_1563.patch, PIG_1563_v2.patch Script: A = load 'studenttab10k' as (name, age, gpa); C = foreach A generate SUBSTRING(name, 0,5); E = limit C 10; dump E; Output is always empty: () () () () () () () () () () -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1563) Some string functions don't work with bytearray arguments
[ https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904467#action_12904467 ] Olga Natkovich commented on PIG-1563: - I made one additional change and renamed SPLIT into STRSPLIT to avoid conflict with SPLIT operator Some string functions don't work with bytearray arguments - Key: PIG-1563 URL: https://issues.apache.org/jira/browse/PIG-1563 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: PIG_1563.patch, PIG_1563_v2.patch Script: A = load 'studenttab10k' as (name, age, gpa); C = foreach A generate SUBSTRING(name, 0,5); E = limit C 10; dump E; Output is always empty: () () () () () () () () () () -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1563) Some string functions don't work with bytearray arguments
[ https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1563: Attachment: PIG_1563_v3.patch latest patch Some string functions don't work with bytearray arguments - Key: PIG-1563 URL: https://issues.apache.org/jira/browse/PIG-1563 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: PIG_1563.patch, PIG_1563_v2.patch, PIG_1563_v3.patch Script: A = load 'studenttab10k' as (name, age, gpa); C = foreach A generate SUBSTRING(name, 0,5); E = limit C 10; dump E; Output is always empty: () () () () () () () () () () -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1563) Some string functions don't work with bytearray arguments
[ https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1563: Status: Resolved (was: Patch Available) Resolution: Fixed patch committed. Thanks Dmitry for the help and review Some string functions don't work with bytearray arguments - Key: PIG-1563 URL: https://issues.apache.org/jira/browse/PIG-1563 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: PIG_1563.patch, PIG_1563_v2.patch, PIG_1563_v3.patch Script: A = load 'studenttab10k' as (name, age, gpa); C = foreach A generate SUBSTRING(name, 0,5); E = limit C 10; dump E; Output is always empty: () () () () () () () () () () -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1531) Pig gobbles up error messages
[ https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904497#action_12904497 ] Ashutosh Chauhan commented on PIG-1531: --- Niraj ran all the unit tests. All passed. No complaints from test-patch either. Committed to the trunk. Thanks, Niraj ! Pig gobbles up error messages - Key: PIG-1531 URL: https://issues.apache.org/jira/browse/PIG-1531 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: niraj rai Fix For: 0.8.0 Attachments: pig-1531_3.patch, PIG_1531.patch, PIG_1531_2.patch Consider the following. I have my own Storer implementing StoreFunc and I am throwing FrontEndException (and other Exceptions derived from PigException) in its various methods. I expect those error messages to be shown in error scenarios. Instead Pig gobbles up my error messages and shows its own generic error message like: {code} 010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2116: Unexpected error. Could not validate the output specification for: default.partitoned Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log {code} Instead I expect it to display my error messages which it stores away in that log file. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.