[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Status: Resolved (was: Patch Available) Resolution: Fixed Unit tests passed. PIG-1649.5.patch committed to trunk and 0.8 branch. FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, PIG-1649.4.patch, PIG-1649.5.patch
In FRJoin, if the input path contains curly braces, Pig fails to compute the number of input files and logs the following exception:
10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files
java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt}
    at java.net.URI$Parser.fail(URI.java:2809)
    at java.net.URI$Parser.checkChars(URI.java:2982)
    at java.net.URI$Parser.parseHierarchical(URI.java:3066)
    at java.net.URI$Parser.parse(URI.java:3024)
    at java.net.URI.<init>(URI.java:578)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
    at org.apache.pig.PigServer.storeEx(PigServer.java:873)
    at org.apache.pig.PigServer.store(PigServer.java:815)
    at org.apache.pig.PigServer.openIterator(PigServer.java:727)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:453)
    at org.apache.pig.Main.main(Main.java:107)
This does not cause the query to fail. But since the number of input files does not get calculated, the optimizations added in PIG-1458 to reduce load on the name node will not be used.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
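The exception reported in PIG-1649 can be reproduced outside Pig. The following is a minimal sketch, not code from any of the attached patches: it relies only on the standard java.net.URI constructor and the path taken from the log above, and the class name GlobPathUriDemo and the helper uriErrorIndex are hypothetical names chosen for illustration. It shows that a glob pattern containing curly braces is not a legal URI path, so such an input has to be treated as a pattern rather than parsed directly as a URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: GlobPathUriDemo and uriErrorIndex are hypothetical names,
// not part of Pig or Hadoop.
class GlobPathUriDemo {

    /** Returns the error index reported by java.net.URI, or -1 if the string parses. */
    static int uriErrorIndex(String path) {
        try {
            new URI(path);
            return -1;
        } catch (URISyntaxException e) {
            // getIndex() is the position of the first illegal character.
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // The glob from the PIG-1649 log: '{' at position 12 is rejected.
        System.out.println(uriErrorIndex("/user/tejas/{std*txt}")); // prints 12
        // A plain path without glob braces parses cleanly.
        System.out.println(uriErrorIndex("/user/tejas/std.txt"));   // prints -1
    }
}
```

The index 12 matches the position reported in the logged URISyntaxException.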
[jira] Commented: (PIG-1629) Need ability to limit bags produced during GROUP + LIMIT
[ https://issues.apache.org/jira/browse/PIG-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916538#action_12916538 ] Thejas M Nair commented on PIG-1629: A similar optimization can be done for an inner filter as well:
C = foreach B { D = filter A by x 0; generate group, MyUDF(D); }
Changes required:
- group physical/MR plan implementation to have an inner limit/filter.
- logical optimizer rules to make the load/filter an inner plan of group.
Need ability to limit bags produced during GROUP + LIMIT Key: PIG-1629 URL: https://issues.apache.org/jira/browse/PIG-1629 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Thejas M Nair Fix For: 0.9.0
Currently, the code below constructs the full group in memory and then trims it, which uses more memory than needed.
A = load 'data' as (x, y, z);
B = group A by x;
C = foreach B { D = limit A 100; generate group, MyUDF(D); }
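The memory difference described in PIG-1629 can be sketched in plain Java. This is only an illustration under stated assumptions, not Pig's group or bag implementation: the class BoundedBagSketch and both methods are hypothetical, and a java.util.List stands in for a bag. The first method mirrors the current behavior (build the full group, then trim); the second applies the limit while collecting, which is the idea behind an inner limit in the group plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustration only: hypothetical names, with a List standing in for a Pig bag.
class BoundedBagSketch {

    /** Builds the whole group first and trims afterwards: memory grows with the group size. */
    static <T> List<T> collectThenTrim(Iterator<T> group, int limit) {
        List<T> bag = new ArrayList<>();
        while (group.hasNext()) {
            bag.add(group.next());
        }
        return new ArrayList<>(bag.subList(0, Math.min(limit, bag.size())));
    }

    /** Applies the limit while collecting: at most `limit` values are ever held. */
    static <T> List<T> collectWithLimit(Iterator<T> group, int limit) {
        List<T> bag = new ArrayList<>();
        while (bag.size() < limit && group.hasNext()) {
            bag.add(group.next());
        }
        return bag;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(collectThenTrim(values.iterator(), 3));  // prints [1, 2, 3]
        System.out.println(collectWithLimit(values.iterator(), 3)); // prints [1, 2, 3]
    }
}
```

Both methods return the same result; only the peak number of values held in memory differs.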
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1650: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Niraj confirmed that unit tests and test-patch have succeeded. Patch looks good. +1. Committed to trunk and 0.8 branch. pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch, PIG-1650_1.patch, PIG-1650_2.patch The grunt shell breaks for many Unix commands.
[jira] Created: (PIG-1657) reduce the ivy verbosity during build.
reduce the ivy verbosity during build. -- Key: PIG-1657 URL: https://issues.apache.org/jira/browse/PIG-1657 Project: Pig Issue Type: Improvement Reporter: Giridharan Kesavan Ivy is very verbose during the build; making it less verbose would let us see what the build actually does.
[jira] Created: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Reporter: Yan Zhou In fact, all keys of these types whose values are negative but within the byte or short range have the problem. Basically, a byte value of -1 (0xff) will return 255, not -1.
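The 255-versus-minus-one symptom in PIG-1658 is the usual difference between unsigned and sign-extending widening of a byte. The sketch below is not Pig's comparator or serialization code, which is not shown in the message; the class SignedByteKeyDemo and its two helpers are hypothetical names used only to illustrate how a key of -1 ends up sorting after positive keys when it is widened as an unsigned value.

```java
// Illustration only: hypothetical names, not Pig's key-handling code.
class SignedByteKeyDemo {

    /** Widens a byte as an unsigned value: the bit pattern 0xff becomes 255. */
    static int widenUnsigned(byte b) {
        return b & 0xff;
    }

    /** Widens a byte with sign extension: the bit pattern 0xff stays -1. */
    static int widenSigned(byte b) {
        return b;
    }

    public static void main(String[] args) {
        byte key = (byte) 0xff; // the byte value -1

        System.out.println(widenUnsigned(key)); // prints 255
        System.out.println(widenSigned(key));   // prints -1

        // Compared against a key of 1, the unsigned reading sorts -1 last
        // while the signed reading sorts it first.
        System.out.println(Integer.compare(widenUnsigned(key), 1)); // prints 1
        System.out.println(Integer.compare(widenSigned(key), 1));   // prints -1
    }
}
```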
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1658: -- Fix Version/s: 0.8.0 Affects Version/s: 0.8.0 ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 In fact, all keys of these types whose values are negative but within the byte or short range have the problem. Basically, a byte value of -1 (0xff) will return 255, not -1.
[jira] Assigned: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1658: - Assignee: Yan Zhou ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 In fact, all keys of these types whose values are negative but within the byte or short range have the problem. Basically, a byte value of -1 (0xff) will return 255, not -1.
[jira] Commented: (PIG-1607) pig should have separate javadoc.jar in the maven repository
[ https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916594#action_12916594 ] Giridharan Kesavan commented on PIG-1607: - Looks good, +1. I was able to run mvn-install and mvn-deploy to install/deploy the javadoc jar to the fs and the Apache mvn repo. pig should have separate javadoc.jar in the maven repository Key: PIG-1607 URL: https://issues.apache.org/jira/browse/PIG-1607 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, PIG-1607_3.patch, PIG-1607_4.patch At the moment, javadoc is part of the source.jar, but Pig should have a separate javadoc.jar in the Maven repository.
[jira] Created: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY
sortinfo is not set for store if there is a filter after ORDER BY - Key: PIG-1659 URL: https://issues.apache.org/jira/browse/PIG-1659 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Daniel Dai Fix For: 0.8.0 This has caused 6 (of 7) failures in the Zebra test TestOrderPreserveVariableTable.
[jira] Updated: (PIG-1651) PIG class loading mishandled
[ https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1651: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed PIG class loading mishandled Key: PIG-1651 URL: https://issues.apache.org/jira/browse/PIG-1651 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1651.patch
If zebra.jar is only registered in a Pig script but is not on the CLASSPATH, a query using Zebra fails because the same class appears to be loaded into the JVM more than once, so a static variable set earlier is not visible after an instance of the class is created through reflection. (After zebra.jar is added to the CLASSPATH, it works fine.) The exception stack is as follows:
Backend error message during job submission ---
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://hostname/pathto/zebra_dir :: null
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123)
    at org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413)
    at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718)
    at org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084)
    at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
    at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
    at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863)
    at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017)
    at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
    ... 7 more
[jira] Assigned: (PIG-1655) code duplicated for udfs that were moved from piggybank to builtin
[ https://issues.apache.org/jira/browse/PIG-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai reassigned PIG-1655: -- Assignee: niraj rai (was: Thejas M Nair) code duplicated for udfs that were moved from piggybank to builtin -- Key: PIG-1655 URL: https://issues.apache.org/jira/browse/PIG-1655 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: niraj rai Fix For: 0.8.0 As part of PIG-1405, some udfs from piggybank were made standard udfs. But now the code is duplicated in piggybank and org.apache.pig.builtin, which can cause confusion. I am planning to make these udfs in piggybank subclasses of those in org.apache.pig.builtin so that users don't have to change their scripts.
[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository
[ https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1607: --- Fix Version/s: 0.8.0 Affects Version/s: 0.8.0 pig should have separate javadoc.jar in the maven repository Key: PIG-1607 URL: https://issues.apache.org/jira/browse/PIG-1607 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Fix For: 0.8.0 Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, PIG-1607_3.patch, PIG-1607_4.patch At the moment, javadoc is part of the source.jar, but Pig should have a separate javadoc.jar in the Maven repository.
[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository
[ https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1607: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to 0.8 branch and trunk. pig should have separate javadoc.jar in the maven repository Key: PIG-1607 URL: https://issues.apache.org/jira/browse/PIG-1607 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Fix For: 0.8.0 Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, PIG-1607_3.patch, PIG-1607_4.patch At the moment, javadoc is part of the source.jar, but Pig should have a separate javadoc.jar in the Maven repository.
[jira] Commented: (PIG-1638) sh output gets mixed up with the grunt prompt
[ https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916725#action_12916725 ] Daniel Dai commented on PIG-1638: - +1 sh output gets mixed up with the grunt prompt - Key: PIG-1638 URL: https://issues.apache.org/jira/browse/PIG-1638 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Priority: Minor Fix For: 0.8.0 Attachments: PIG-1638_0.patch
Many times, the grunt prompt gets mixed up with the sh output, e.g.
grunt> sh ls
000 autocomplete bin build build.xml grunt> CHANGES.txt conf contrib
In the above case, the grunt> prompt is mixed up with the output.
[jira] Resolved: (PIG-1297) algebraic interface of udf does not get used if the foreach with udf projects column within group
[ https://issues.apache.org/jira/browse/PIG-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved PIG-1297. Resolution: Duplicate algebraic interface of udf does not get used if the foreach with udf projects column within group - Key: PIG-1297 URL: https://issues.apache.org/jira/browse/PIG-1297 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.9.0
grunt> l = load 'file' as (a,b,c);
grunt> g = group l by (a,b);
grunt> f = foreach g generate SUM(l.c), group.a;
grunt> explain f;
...
...
#--
# Map Reduce Plan
#--
MapReduce node 1-752
Map Plan
Local Rearrange[tuple]{tuple}(false) - 1-742
|   |
|   Project[bytearray][0] - 1-743
|   |
|   Project[bytearray][1] - 1-744
|
|---Load(file:///Users/tejas/pig/trunk/file:org.apache.pig.builtin.PigStorage) - 1-739
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-751
|
|---New For Each(false,false)[bag] - 1-750
    |   |
    |   POUserFunc(org.apache.pig.builtin.SUM)[double] - 1-747
    |   |
    |   |---Project[bag][2] - 1-746
    |       |
    |       |---Project[bag][1] - 1-745
    |   |
    |   Project[bytearray][0] - 1-749
    |   |
    |   |---Project[tuple][0] - 1-748
    |
    |---Package[tuple]{tuple} - 1-741
Global sort: false
[jira] Updated: (PIG-1638) sh output gets mixed up with the grunt prompt
[ https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1638: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to both trunk and 0.8 branch. sh output gets mixed up with the grunt prompt - Key: PIG-1638 URL: https://issues.apache.org/jira/browse/PIG-1638 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Priority: Minor Fix For: 0.8.0 Attachments: PIG-1638_0.patch
Many times, the grunt prompt gets mixed up with the sh output, e.g.
grunt> sh ls
000 autocomplete bin build build.xml grunt> CHANGES.txt conf contrib
In the above case, the grunt> prompt is mixed up with the output.
[jira] Created: (PIG-1660) Consider passing result of COUNT/COUNT_STAR to LIMIT
Consider passing result of COUNT/COUNT_STAR to LIMIT - Key: PIG-1660 URL: https://issues.apache.org/jira/browse/PIG-1660 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Viraj Bhat Fix For: 0.9.0
In realistic scenarios we need to split a dataset into segments by using LIMIT, and we would like to do that within the same Pig script. Here is a case:
{code}
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A by ALL;
C = foreach B generate COUNT_STAR(A) as row_cnt;
-- get the low 20% segment
D = order A by pvs;
E = limit D (C.row_cnt * 0.2);
store E into '$Eoutput';
-- get the high 20% segment
F = order A by pvs DESC;
G = limit F (C.row_cnt * 0.2);
store G into '$Goutput';
{code}
Since LIMIT only accepts constants, we currently have to split the operation into two steps in order to pass constants to the LIMIT statements. Please consider adding this feature so the processing can be more efficient.
Viraj
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1658: -- Status: Patch Available (was: Open) ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1658.patch In fact, all keys of these types whose values are negative but within the byte or short range have the problem. Basically, a byte value of -1 (0xff) will return 255, not -1.
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1658: -- Attachment: PIG-1658.patch
This problem is caused by the PIG-1295 patch. test-core passes. Zebra's nightly passes too. test-patch output:
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
Zebra's TestMergeJoinPartial is used to verify the fix.
ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1658.patch In fact, all keys of these types whose values are negative but within the byte or short range have the problem. Basically, a byte value of -1 (0xff) will return 255, not -1.