[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join
[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781415#action_12781415 ] Hadoop QA commented on PIG-872: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12425805/PIG_872.patch.1 against trunk revision 882818. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/165/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/165/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/165/console This message is automatically generated. use distributed cache for the replicated data set in FR join Key: PIG-872 URL: https://issues.apache.org/jira/browse/PIG-872 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Sriranjan Manjunath Attachments: PIG_872.patch.1 Currently, the replicated file is read directly from DFS by all maps. If the number of concurrent maps is huge, we can overwhelm the NameNode with open calls. Using distributed cache will address the issue and might also give a performance boost since the file will be copied locally once and then reused by all tasks running on the same machine. 
The basic approach would be to use cacheArchive to place the file into the cache on the frontend; on the backend, the tasks would need to refer to the data using the path from the cache. Note that cacheArchive does not work in Hadoop local mode. (Not a problem for us right now as we don't use it.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
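A back-of-the-envelope sketch of the argument above, assuming each map task issues one open call for the replicated file when it reads directly from DFS, and each node fetches the file only once when a node-local cache is used. The function names, the one-open-per-task model, and the cluster sizes are illustrative assumptions, not taken from the patch.

```python
def opens_direct(num_map_tasks):
    """Direct DFS reads: every map task opens the replicated file itself
    (assumed: exactly one open call per task)."""
    return num_map_tasks


def opens_with_cache(task_nodes):
    """Node-local cache: only the first task on each node triggers a fetch,
    so the count equals the number of distinct nodes running tasks."""
    return len(set(task_nodes))


# Hypothetical job: 10,000 map tasks spread round-robin over 500 nodes.
task_nodes = ["node%d" % (i % 500) for i in range(10000)]
print(opens_direct(len(task_nodes)))   # 10000
print(opens_with_cache(task_nodes))    # 500
```

Under these assumptions the load on the NameNode scales with the number of nodes rather than the number of tasks.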
[jira] Updated: (PIG-872) use distributed cache for the replicated data set in FR join
[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-872: --- Status: Open (was: Patch Available) use distributed cache for the replicated data set in FR join Key: PIG-872 URL: https://issues.apache.org/jira/browse/PIG-872 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Sriranjan Manjunath Attachments: PIG_872.patch.1 Currently, the replicated file is read directly from DFS by all maps. If the number of concurrent maps is huge, we can overwhelm the NameNode with open calls. Using distributed cache will address the issue and might also give a performance boost since the file will be copied locally once and then reused by all tasks running on the same machine. The basic approach would be to use cacheArchive to place the file into the cache on the frontend; on the backend, the tasks would need to refer to the data using the path from the cache. Note that cacheArchive does not work in Hadoop local mode. (Not a problem for us right now as we don't use it.)
[jira] Updated: (PIG-872) use distributed cache for the replicated data set in FR join
[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-872: --- Status: Patch Available (was: Open) Resubmitting the patch. Looks like we had problems running tests. use distributed cache for the replicated data set in FR join Key: PIG-872 URL: https://issues.apache.org/jira/browse/PIG-872 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Sriranjan Manjunath Attachments: PIG_872.patch.1 Currently, the replicated file is read directly from DFS by all maps. If the number of concurrent maps is huge, we can overwhelm the NameNode with open calls. Using distributed cache will address the issue and might also give a performance boost since the file will be copied locally once and then reused by all tasks running on the same machine. The basic approach would be to use cacheArchive to place the file into the cache on the frontend; on the backend, the tasks would need to refer to the data using the path from the cache. Note that cacheArchive does not work in Hadoop local mode. (Not a problem for us right now as we don't use it.)
[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split
[ https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1091: -- Fix Version/s: 0.6.0 [zebra] Exception when load with projection of map keys on a map column that is not map split -- Key: PIG-1091 URL: https://issues.apache.org/jira/browse/PIG-1091 Project: Pig Issue Type: Bug Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 Attachments: PIG-1091.patch With schema of f1:string, f2:map, storage info of [f1]; [f2], a projection of f2#{a} will see exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-524) ORDER (x,y) gives syntax error
[ https://issues.apache.org/jira/browse/PIG-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-524. Resolution: Duplicate This is a duplicate of PIG-900. ORDER (x,y) gives syntax error -- Key: PIG-524 URL: https://issues.apache.org/jira/browse/PIG-524 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Olga Natkovich In trunk, this is a valid notation: A = load 'data' as (x, y, z); B = order A by (x,y); However, the new code only allows: B = order A by x,y;
[jira] Resolved: (PIG-807) PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the Hadoop values iterator)
[ https://issues.apache.org/jira/browse/PIG-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-807. Resolution: Won't Fix The accumulator interface has been introduced for UDFs to solve this issue. PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the Hadoop values iterator) Key: PIG-807 URL: https://issues.apache.org/jira/browse/PIG-807 Project: Pig Issue Type: Improvement Affects Versions: 0.2.1 Reporter: Pradeep Kamath Currently all bags resulting from a group or cogroup are materialized as bags containing all of the contents. The issue with this is that if a particular key has many corresponding values, all these values get stuffed into a bag which may run out of memory and hence spill, causing a slowdown in performance and sometimes memory exceptions. In many cases, the UDFs which use these bags coming out of a group or cogroup only need to iterate over the bag in a unidirectional, read-once manner. This can be implemented by having the bag implement its iterator by simply iterating over the underlying Hadoop iterator provided in the reduce. This kind of bag is also needed in http://issues.apache.org/jira/browse/PIG-802, so the code can be reused for this issue too. The other part of this issue is to have some way for the UDFs to communicate to Pig that any input bags they need are read-once bags. This can be achieved by having an interface, say UsesReadOnceBags, which serves as a tag to indicate the intent to Pig. Pig can then rewire its execution plan to use ReadOnceBags if feasible.
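A minimal sketch of the read-once bag idea described above: the bag wraps an underlying iterator (standing in for the Hadoop values iterator) and allows a single forward pass instead of materializing every value. The class and function names are hypothetical and are not Pig's API.

```python
class ReadOnceBag:
    """Bag that streams tuples from an underlying iterator and can be
    iterated only once; nothing is buffered in memory."""

    def __init__(self, values):
        self._values = iter(values)
        self._consumed = False

    def __iter__(self):
        if self._consumed:
            raise RuntimeError("read-once bag has already been iterated")
        self._consumed = True
        return self._values


def count_udf(bag):
    """Example UDF that needs only one unidirectional pass over its input."""
    return sum(1 for _ in bag)


bag = ReadOnceBag(iter([("a", 1), ("a", 2), ("a", 3)]))
print(count_udf(bag))  # 3
```

A second attempt to iterate the same bag raises an error, which is the trade-off a UDF would accept by declaring that it only needs read-once input.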
[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1088: --- Attachment: PIG-1088.1.patch Changes address Pradeep's comments. All mergejoin test cases pass. Also ran test-commit test cases and ensured that they match results seen in PIG-1094 . test-patch results - [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. change merge join and merge join indexer to work with new LoadFunc interface Key: PIG-1088 URL: https://issues.apache.org/jira/browse/PIG-1088 Project: Pig Issue Type: Sub-task Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: PIG-1088.1.patch, PIG-1088.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-843) PERFORMANCE: improvements in memory management
[ https://issues.apache.org/jira/browse/PIG-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-843. Resolution: Fixed I believe the memory issue has been sufficiently addressed. PERFORMANCE: improvements in memory management -- Key: PIG-843 URL: https://issues.apache.org/jira/browse/PIG-843 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Currently, Pig uses way too much memory. We need to understand where the memory goes and come up with a strategy to minimize the memory footprint.
[jira] Updated: (PIG-1078) [zebra] merge join with empty table failed
[ https://issues.apache.org/jira/browse/PIG-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1078: -- Fix Version/s: 0.7.0 [zebra] merge join with empty table failed -- Key: PIG-1078 URL: https://issues.apache.org/jira/browse/PIG-1078 Project: Pig Issue Type: Bug Reporter: Jing Huang Fix For: 0.6.0, 0.7.0 Attachments: PIG-1078.patch Got indexOutOfBound exception. Here is the pig script: register /grid/0/dev/hadoopqa/jars/zebra.jar; --a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); --a2 = load 'empty.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); --dump a1; --a1order = order a1 by a; --a2order = order a2 by a; --store a1order into 'a1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]'); --store a2order into 'empty' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]'); rec1 = load 'a1' using org.apache.hadoop.zebra.pig.TableLoader(); rec2 = load 'empty' using org.apache.hadoop.zebra.pig.TableLoader(); joina = join rec1 by a, rec2 by a using merge ; dump joina; == please note that table a1 and empty are created correctly. 
Here is the stack trace: Backend error message - java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478) at org.apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) Pig Stack Trace --- ERROR 6015: During execution, encountered a Hadoop error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina at org.apache.pig.PigServer.openIterator(PigServer.java:481) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:386) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error. 
at org.apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478) at org.apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 ... 10 more
[jira] Updated: (PIG-1074) Zebra store function should allow '::' in column names in output schema
[ https://issues.apache.org/jira/browse/PIG-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1074: -- Fix Version/s: 0.7.0 0.6.0 Zebra store function should allow '::' in column names in output schema --- Key: PIG-1074 URL: https://issues.apache.org/jira/browse/PIG-1074 Project: Pig Issue Type: Bug Reporter: Pradeep Kamath Fix For: 0.6.0, 0.7.0 the following script fails: {noformat} a = load '/zebra/singlefile/studenttab10k' using org.apache.hadoop.zebra.pig.TableLoader() as (name, age, gpa); b = load '/zebra/singlefile/votertab10k' using org.apache.hadoop.zebra.pig.TableLoader() as (name, age, registration, contributions); c = filter a by age 20; d = filter b by age 20; store c into '/user/pig/out//ZebraMultiQuery_30.out.1' using org.apache.hadoop.zebra.pig.TableStorer(''); store d into '/user/pig/out//ZebraMultiQuery_30.out.2' using org.apache.hadoop.zebra.pig.TableStorer(''); e = cogroup c by name, d by name; f = foreach e generate flatten(c), flatten(d); store f into '/user/pig//ZebraMultiQuery_30.out.3' using org.apache.hadoop.zebra.pig.TableStorer(''); {noformat} Here the schema of f has names like c::name and it looks like zebra storefunc does not allow '::' in column name The stack trace is ERROR 2997: Unable to recreate exception from backend error: java.io.IOException: ColumnGroup.Writer constructor failed : Partition constructor failed :Encountered : : at line 1, column 3. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1098) [zebra] Zebra Performance Optimizations
[ https://issues.apache.org/jira/browse/PIG-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1098: -- Fix Version/s: 0.7.0 0.6.0 [zebra] Zebra Performance Optimizations --- Key: PIG-1098 URL: https://issues.apache.org/jira/browse/PIG-1098 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 Many in-core performance optimization opportunities exist in zebra, such as removal of redundant precautionary checks, use of better collection types to reduce levels of indirection to the memory objects, and changing the order of input splits from ascending sizes to descending sizes. Improvements observed in prototyping are around 10% in wall clock time.
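The message does not say why descending split order helps. One plausible reading, offered here only as an assumption, is the usual scheduling effect: when splits are handed out in order to whichever task slot frees up first, starting the largest splits early tends to shorten the overall finish time. The sketch below models that with a hypothetical `makespan` helper and made-up split sizes; it is not zebra code.

```python
import heapq


def makespan(split_sizes, num_slots):
    """Assign splits, in the given order, to whichever slot becomes free
    first, and return the time at which the last slot finishes."""
    slots = [0] * num_slots          # current finish time of each slot
    heapq.heapify(slots)
    for size in split_sizes:
        earliest = heapq.heappop(slots)
        heapq.heappush(slots, earliest + size)
    return max(slots)


sizes = [1, 2, 3, 4, 8]              # made-up split sizes (time units)
print(makespan(sorted(sizes), 2))                # ascending order: 12
print(makespan(sorted(sizes, reverse=True), 2))  # descending order: 9
```

With ascending order the largest split starts last and dominates the tail of the job; with descending order the small splits fill in the gaps.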
[jira] Updated: (PIG-1095) [zebra] Schema support of anonymous fields in COLECTION fails
[ https://issues.apache.org/jira/browse/PIG-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1095: -- Fix Version/s: 0.7.0 0.6.0 [zebra] Schema support of anonymous fields in COLECTION fails - Key: PIG-1095 URL: https://issues.apache.org/jira/browse/PIG-1095 Project: Pig Issue Type: Bug Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 The schema parser fails on schemas of COLLECTION columns like c:collection(int). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: TPC-H benchmark
I don't know of any. Officially Pig cannot publish a TPC-H number because it is not a transaction based store. But I still think it would be very interesting to see the results if someone took the time to translate the queries. Alan. On Nov 22, 2009, at 6:20 PM, RichardGUO Fei wrote: Hi, Apart from Pig Performance and Pig Mix, do you know any TPC-H benchmark rewritten for Pig? Thanks, Richard
[jira] Resolved: (PIG-844) PERFORMANCE: streaming data to the UDFs in foreach
[ https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-844. The accumulator interface took care of this. PERFORMANCE: streaming data to the UDFs in foreach -- Key: PIG-844 URL: https://issues.apache.org/jira/browse/PIG-844 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Currently, Pig places the data passed to UDFs into a bag. This can cause the process to use more memory than actually needed, as in many cases it would be better to push the data one tuple at a time to the UDFs. For the case where the combiner is invoked, this might not be that important; however, for non-algebraic UDFs as well as other cases where the combiner can't be used, this can provide a significant memory improvement. Another possible use case is where the data is already grouped going into Pig and we don't need to group it again. How this will affect the UDF interface needs to be further discussed.
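A rough sketch of the streaming idea above: instead of receiving one fully materialized bag, the UDF is fed small batches and keeps only a running result. The class, its method names, and the `batches` helper are hypothetical illustrations in Python; they are not the actual Pig interface.

```python
class SumAccumulator:
    """Accumulator-style UDF: keeps a running total rather than holding
    every value for a key in memory at once."""

    def __init__(self):
        self._total = 0

    def accumulate(self, batch):
        # Called repeatedly with small batches of values for the same key.
        for value in batch:
            self._total += value

    def get_value(self):
        return self._total

    def cleanup(self):
        # Reset state before the next key is processed.
        self._total = 0


def batches(values, size):
    """Yield consecutive chunks of at most `size` items from an iterable."""
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


udf = SumAccumulator()
for batch in batches(range(1, 11), 3):
    udf.accumulate(batch)
print(udf.get_value())  # 55
udf.cleanup()
```

Memory use is bounded by the batch size rather than by the number of values per key, which is the benefit the issue is after.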
[jira] Resolved: (PIG-856) PERFORMANCE: reduce number of replicas
[ https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-856. Resolution: Won't Fix We tried reducing the number of replicas and the performance actually degraded, probably because there were fewer places to read the data from. PERFORMANCE: reduce number of replicas -- Key: PIG-856 URL: https://issues.apache.org/jira/browse/PIG-856 Project: Pig Issue Type: Improvement Affects Versions: 0.3.0 Reporter: Olga Natkovich Currently Pig uses the default number of replicas between MR jobs. Currently, the number is 3. Given the temp nature of the data, we should never need more than 2 and should explicitly set it to improve performance and to be nicer to the name node.
[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-598: -- Status: Open (was: Patch Available) Parameter substitution ($PARAMETER) should not be performed in comments --- Key: PIG-598 URL: https://issues.apache.org/jira/browse/PIG-598 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: David Ciemiewicz Assignee: Thejas M Nair Attachments: PIG-598.patch Compiling the following code example will generate an error that $NOT_A_PARAMETER is an Undefined Parameter. This is problematic as sometimes you want to comment out parts of your code, including parameters, so that you don't have to define them. Thus, I think it would be really good if parameter substitution was not performed in comments. {code} -- $NOT_A_PARAMETER {code} {code} -bash-3.00$ pig -exectype local -latest comment.pig USING: /grid/0/gs/pig/current java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86) at org.apache.pig.Main.runParamPreprocessor(Main.java:394) at org.apache.pig.Main.main(Main.java:296) {code}
[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-598: -- Patch Info: (was: [Patch Available]) Parameter substitution ($PARAMETER) should not be performed in comments --- Key: PIG-598 URL: https://issues.apache.org/jira/browse/PIG-598 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: David Ciemiewicz Assignee: Thejas M Nair Attachments: PIG-598.patch Compiling the following code example will generate an error that $NOT_A_PARAMETER is an Undefined Parameter. This is problematic as sometimes you want to comment out parts of your code, including parameters so that you don't have to define them. This I think it would be really good if parameter substitution was not performed in comments. {code} -- $NOT_A_PARAMETER {code} {code} -bash-3.00$ pig -exectype local -latest comment.pig USING: /grid/0/gs/pig/current java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86) at org.apache.pig.Main.runParamPreprocessor(Main.java:394) at org.apache.pig.Main.main(Main.java:296) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781510#action_12781510 ] Thejas M Nair commented on PIG-598: --- bq. One issue I faced while working on PIG-928 was when trying to name variables in ruby bound to java variables. Ashutosh, You can use \ to escape parameter substitution . Use 'return \$input.split();' instead of 'return $input.split();' . After parameter substitution, it becomes 'return $input.split();' . Parameter substitution ($PARAMETER) should not be performed in comments --- Key: PIG-598 URL: https://issues.apache.org/jira/browse/PIG-598 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: David Ciemiewicz Assignee: Thejas M Nair Attachments: PIG-598.patch Compiling the following code example will generate an error that $NOT_A_PARAMETER is an Undefined Parameter. This is problematic as sometimes you want to comment out parts of your code, including parameters so that you don't have to define them. This I think it would be really good if parameter substitution was not performed in comments. {code} -- $NOT_A_PARAMETER {code} {code} -bash-3.00$ pig -exectype local -latest comment.pig USING: /grid/0/gs/pig/current java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86) at org.apache.pig.Main.runParamPreprocessor(Main.java:394) at org.apache.pig.Main.main(Main.java:296) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
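The escaping rule mentioned in the comment above can be sketched as a small substitution pass: `$name` is replaced from a parameter map, while `\$` is emitted as a literal `$`. This is an illustrative model only; the regular expression, the `substitute` function, and the choice of exception are assumptions and do not reproduce Pig's preprocessor.

```python
import re

# Matches either an escaped "\$" or a "$name" parameter reference.
_PARAM = re.compile(r"\\\$|\$(\w+)")


def substitute(text, params):
    """Replace $name with params[name]; turn an escaped "\\$" into "$"."""
    def repl(match):
        if match.group(0) == "\\$":
            return "$"                      # escaped: keep a literal dollar
        name = match.group(1)
        if name not in params:
            raise ValueError("Undefined parameter : %s" % name)
        return params[name]
    return _PARAM.sub(repl, text)


print(substitute(r"return \$input.split();", {}))           # return $input.split();
print(substitute("A = load '$input';", {"input": "data"}))  # A = load 'data';
```

The first call needs no parameter definitions because the only `$` in the line is escaped.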
[jira] Reopened: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath reopened PIG-1090: - Reopening since we need to implement the LoadMetadata interface in BinStorage so as to implement the getSchema() method in that interface. This will depend on the decision for the comment - http://issues.apache.org/jira/browse/PIG-966?focusedCommentId=12780873page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12780873 Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1090.patch There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces. This jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-598: -- Attachment: PIG-598.1.patch Additional changes in this patch- * Fixed parsing in PigFileParser.jj * Modified test input file for testCommentWithParam() - inputComment.pig, to include comments within and at end of statements Parameter substitution ($PARAMETER) should not be performed in comments --- Key: PIG-598 URL: https://issues.apache.org/jira/browse/PIG-598 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: David Ciemiewicz Assignee: Thejas M Nair Attachments: PIG-598.1.patch, PIG-598.patch Compiling the following code example will generate an error that $NOT_A_PARAMETER is an Undefined Parameter. This is problematic as sometimes you want to comment out parts of your code, including parameters so that you don't have to define them. This I think it would be really good if parameter substitution was not performed in comments. {code} -- $NOT_A_PARAMETER {code} {code} -bash-3.00$ pig -exectype local -latest comment.pig USING: /grid/0/gs/pig/current java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86) at org.apache.pig.Main.runParamPreprocessor(Main.java:394) at org.apache.pig.Main.main(Main.java:296) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-598: -- Status: Patch Available (was: Open) Parameter substitution ($PARAMETER) should not be performed in comments --- Key: PIG-598 URL: https://issues.apache.org/jira/browse/PIG-598 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: David Ciemiewicz Assignee: Thejas M Nair Attachments: PIG-598.1.patch, PIG-598.patch Compiling the following code example will generate an error that $NOT_A_PARAMETER is an Undefined Parameter. This is problematic as sometimes you want to comment out parts of your code, including parameters so that you don't have to define them. This I think it would be really good if parameter substitution was not performed in comments. {code} -- $NOT_A_PARAMETER {code} {code} -bash-3.00$ pig -exectype local -latest comment.pig USING: /grid/0/gs/pig/current java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86) at org.apache.pig.Main.runParamPreprocessor(Main.java:394) at org.apache.pig.Main.main(Main.java:296) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781546#action_12781546 ] Ashutosh Chauhan commented on PIG-598: -- I guess my question is what the behavior should be when $ is specified in the script and no substitution for it is provided. There are two options: a) If Pig encounters a $ and doesn't find a substitution for it, it fails right there. b) Pig logs a warning message and continues, assuming the user wants a literal $ and not the substitution. The advantage of b) is that there is no need for escaping. The disadvantage is that when the missing substitution was unintentional, Pig will fail later, possibly with a different error message. The disadvantage of a) is that it forces the user to escape $ where it would be possible not to have such a requirement. The advantage is that a clear error message can be thrown if the missing substitution was unintentional. Which option do you think we should choose? Parameter substitution ($PARAMETER) should not be performed in comments --- Key: PIG-598 URL: https://issues.apache.org/jira/browse/PIG-598 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: David Ciemiewicz Assignee: Thejas M Nair Attachments: PIG-598.1.patch, PIG-598.patch Compiling the following code example will generate an error that $NOT_A_PARAMETER is an Undefined Parameter. This is problematic as sometimes you want to comment out parts of your code, including parameters, so that you don't have to define them. Thus, I think it would be really good if parameter substitution was not performed in comments. 
{code} -- $NOT_A_PARAMETER {code} {code} -bash-3.00$ pig -exectype local -latest comment.pig USING: /grid/0/gs/pig/current java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106) at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86) at org.apache.pig.Main.runParamPreprocessor(Main.java:394) at org.apache.pig.Main.main(Main.java:296) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split
[ https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781551#action_12781551 ]

Alan Gates commented on PIG-1091:

Patch applied to 0.6 branch.

[zebra] Exception when load with projection of map keys on a map column that is not map split
Key: PIG-1091
URL: https://issues.apache.org/jira/browse/PIG-1091
Project: Pig
Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
Fix For: 0.6.0, 0.7.0
Attachments: PIG-1091.patch

With a schema of f1:string, f2:map and storage info of [f1]; [f2], a projection of f2#{a} results in an exception.
[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join
[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781560#action_12781560 ]

Hadoop QA commented on PIG-872:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425805/PIG_872.patch.1
against trunk revision 882818.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/console

This message is automatically generated.

use distributed cache for the replicated data set in FR join
Key: PIG-872
URL: https://issues.apache.org/jira/browse/PIG-872
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
Attachments: PIG_872.patch.1

Currently, the replicated file is read directly from DFS by all maps. If the number of concurrent maps is huge, we can overwhelm the NameNode with open calls. Using the distributed cache will address the issue and might also give a performance boost, since the file will be copied locally once and then reused by all tasks running on the same machine.
The basic approach would be to use cacheArchive to place the file into the cache on the frontend; on the backend, the tasks would need to refer to the data using the path from the cache. Note that cacheArchive does not work in Hadoop local mode. (Not a problem for us right now as we don't use it.)
[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781562#action_12781562 ]

Thejas M Nair commented on PIG-598:

I prefer option (a). It is just a matter of putting a \ before the $ in the scripts :)
Compared to the cost of time spent debugging a weird error or unexpected output, the cost of (a) for the user is trivial.
Ideally, I think we should support an option that lets the user change from the default behavior (a) to (b) using a command-line switch or a statement in the script.
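For illustration only, here is a minimal sketch of how the two options could behave while leaving comments untouched: option (a) as the default with \$ escaping, and option (b) behind a switch. This is not the code in PIG-598.patch. The names ParamSubstitutionSketch, substituteLine and strict are made up for this example, and it assumes only single-line -- comments; it ignores /* */ blocks and a -- that appears inside a quoted string.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamSubstitutionSketch {
    // An optional backslash followed by $NAME. Group 1 is the backslash, group 2 the name.
    // Positional references such as $0 do not match because a name cannot start with a digit.
    private static final Pattern PARAM =
            Pattern.compile("(\\\\?)\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Substitutes $NAME in one script line and copies everything from "--" onward unchanged. */
    public static String substituteLine(String line, Map<String, String> params, boolean strict) {
        int commentStart = line.indexOf("--");
        String code = commentStart < 0 ? line : line.substring(0, commentStart);
        String comment = commentStart < 0 ? "" : line.substring(commentStart);

        Matcher m = PARAM.matcher(code);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2);
            String replacement;
            if (!m.group(1).isEmpty()) {
                replacement = "$" + name;            // escaped with \, keep a literal $NAME
            } else if (params.containsKey(name)) {
                replacement = params.get(name);      // normal substitution
            } else if (strict) {
                // option (a): fail right away with a clear message
                throw new RuntimeException("Undefined parameter : " + name);
            } else {
                // option (b): warn and keep the text as written
                System.err.println("WARN: no value for $" + name + ", keeping it literally");
                replacement = "$" + name;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString() + comment;
    }

    public static void main(String[] args) {
        Map<String, String> params = Collections.singletonMap("INPUT", "data.txt");
        // prints: a = load 'data.txt'; -- $NOT_A_PARAMETER
        System.out.println(substituteLine("a = load '$INPUT'; -- $NOT_A_PARAMETER", params, true));
    }
}
```

With strict set to true, an undefined $NAME outside a comment still fails immediately, which matches the argument for (a); passing false gives the behavior described for (b).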
[jira] Assigned: (PIG-1078) [zebra] merge join with empty table failed
[ https://issues.apache.org/jira/browse/PIG-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-1078:
Assignee: Yan Zhou

[zebra] merge join with empty table failed
Key: PIG-1078
URL: https://issues.apache.org/jira/browse/PIG-1078
Project: Pig
Issue Type: Bug
Reporter: Jing Huang
Assignee: Yan Zhou
Fix For: 0.6.0, 0.7.0
Attachments: PIG-1078.patch

Got indexOutOfBound exception.

Here is the pig script:

register /grid/0/dev/hadoopqa/jars/zebra.jar;
--a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
--a2 = load 'empty.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
--dump a1;
--a1order = order a1 by a;
--a2order = order a2 by a;
--store a1order into 'a1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]');
--store a2order into 'empty' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]');
rec1 = load 'a1' using org.apache.hadoop.zebra.pig.TableLoader();
rec2 = load 'empty' using org.apache.hadoop.zebra.pig.TableLoader();
joina = join rec1 by a, rec2 by a using merge ;
dump joina;

Please note that table a1 and empty are created correctly.

Here is the stack trace:

Backend error message
java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478)
    at org.apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)

Pig Stack Trace
---
ERROR 6015: During execution, encountered a Hadoop error.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina
    at org.apache.pig.PigServer.openIterator(PigServer.java:481)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:386)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
    at org.apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478)
    at org.apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    ... 10 more
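The trace reports index 0 as out of bounds, which is what indexing the first element of an empty array produces. Whether TableInputFormat does exactly that is an assumption, since the code is not shown in this thread. As a rough sketch of the kind of guard that avoids this class of failure, with EmptyTableGuardSketch and openFirstSplit as hypothetical names that do not exist in Zebra or Pig:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class EmptyTableGuardSketch {
    /**
     * Hypothetical stand-in for opening a reader on the right-hand table of a merge join.
     * Each inner array models the rows of one split; an empty table is assumed to have no splits.
     */
    public static Iterator<String> openFirstSplit(String[][] splits) {
        if (splits.length == 0) {
            // Indexing splits[0] here would throw ArrayIndexOutOfBoundsException: 0,
            // so hand back an iterator with no rows instead.
            return Collections.emptyIterator();
        }
        return Arrays.asList(splits[0]).iterator();
    }

    public static void main(String[] args) {
        String[][] empty = new String[0][];
        String[][] nonEmpty = { { "row1", "row2" } };
        System.out.println(openFirstSplit(empty).hasNext());   // false
        System.out.println(openFirstSplit(nonEmpty).next());   // row1
    }
}
```

Returning an empty iterator lets the join see "no rows on the right side" rather than an exception; the actual fix in PIG-1078.patch may take a different approach.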
[jira] Commented: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781608#action_12781608 ]

Pradeep Kamath commented on PIG-1088:

The patch did not include tests since there were existing tests in TestMergeJoin which I confirmed work in the load-store branch with this patch.

change merge join and merge join indexer to work with new LoadFunc interface
Key: PIG-1088
URL: https://issues.apache.org/jira/browse/PIG-1088
Project: Pig
Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: PIG-1088.1.patch, PIG-1088.patch
[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1088:
Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]
Status: Resolved (was: Patch Available)

+1, Patch committed to load-store-redesign with the minor change made in consultation with Thejas: DataType.isAtomic returns true for GENERIC_WRITABLECOMPARABLE and DataType.isComplex returns false for it.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781692#action_12781692 ]

Dmitriy V. Ryaboy commented on PIG-966:

LoadFunc has a method called determineSchema, not getSchema. This implies some sort of introspection, so I can see interpreting this as "if you are looking at the data, use determineSchema, and if you have a metadata store/repo, implement LoadMetadata". But I agree this is clunky and potentially confusing.

I am of two minds about this. On one hand, moving the method makes sense as it's metadata-related. On the other hand, it makes implementations that work with self-describing formats like Avro implement a heavy-looking interface, and it requires further changes to existing LoadFunc implementations that will have to be ported.

Another issue is that LoadMetadata.getSchema() returns a ResourceSchema, whereas LoadFunc.determineSchema() returns Pig's Schema. The two are compatible (I have a translation from one to the other in PIG-760), but not the same.

Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
Key: PIG-966
URL: https://issues.apache.org/jira/browse/PIG-966
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Alan Gates
Assignee: Alan Gates

I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781695#action_12781695 ]

Dmitriy V. Ryaboy commented on PIG-966:

Regarding Streaming: We should support Typed Bytes as a binary protocol for streaming. This was a huge performance win for Dumbo (and I think Hive, as well).
Here's a 7-slide intro: http://static.last.fm/johan/huguk-20090414/klaas-hadoop-1722.pdf
Patch/discussion here: https://issues.apache.org/jira/browse/HADOOP-1722
[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781705#action_12781705 ]

Hadoop QA commented on PIG-598:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425862/PIG-598.1.patch
against trunk revision 882818.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 48 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 213 javac compiler warnings (more than the trunk's current 211 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 361 release audit warnings (more than the trunk's current 356 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/console

This message is automatically generated.
[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments
[ https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781711#action_12781711 ]

Thejas M Nair commented on PIG-598:

bq. -1 javac. The applied patch generated 213 javac compiler warnings (more than the trunk's current 211 warnings).
The additional warnings are from code generated by javacc, which cannot be fixed in the .jj files.

bq. -1 release audit. The applied patch generated 361 release audit warnings (more than the trunk's current 356 warnings).
The release audit warnings are from new test input and benchmark files, because they don't have the Apache license header.
[jira] Commented: (PIG-1095) [zebra] Schema support of anonymous fields in COLECTION fails
[ https://issues.apache.org/jira/browse/PIG-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781781#action_12781781 ]

Hadoop QA commented on PIG-1095:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425897/PIG-1095.patch
against trunk revision 883515.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/53/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/53/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/53/console

This message is automatically generated.

[zebra] Schema support of anonymous fields in COLECTION fails
Key: PIG-1095
URL: https://issues.apache.org/jira/browse/PIG-1095
Project: Pig
Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
Fix For: 0.6.0, 0.7.0
Attachments: PIG-1095.patch

The schema parser fails on schemas of COLLECTION columns like c:collection(int).
Re: TPC-H benchmark
Hey,

It's not Pig, but if you're looking for TPC-H on Hadoop, the Hive team has run the TPC-H benchmarks: http://issues.apache.org/jira/browse/HIVE-600.

Regards,
Jeff

2009/11/23 Alan Gates ga...@yahoo-inc.com

I don't know of any. Officially Pig cannot publish a TPC-H number because it is not a transaction based store. But I still think it would be very interesting to see the results if someone took the time to translate the queries.

Alan.

On Nov 22, 2009, at 6:20 PM, RichardGUO Fei wrote:

Hi,

Apart from Pig Performance and Pig Mix, do you know any TPC-H benchmark rewritten for Pig?

Thanks,
Richard