[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join
[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779867#action_12779867 ]

Sriranjan Manjunath commented on PIG-872:

Olga, I agree with your 1st point. I will get rid of the test case. To rectify 2, shouldn't mapReduceOper.getReplFiles() return only the replicated files? What's the rationale behind returning a null for the fragmented input? I could change it to what Ashutosh suggested, but it would be cleaner if the fragmented input were not represented by a null.

use distributed cache for the replicated data set in FR join

Key: PIG-872
URL: https://issues.apache.org/jira/browse/PIG-872
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
Attachments: PIG_872.patch

Currently, the replicated file is read directly from DFS by all maps. If the number of concurrent maps is huge, we can overwhelm the NameNode with open calls. Using the distributed cache will address the issue and might also give a performance boost, since the file will be copied locally once and then reused by all tasks running on the same machine.

The basic approach would be to use cacheArchive to place the file into the cache on the frontend; on the backend, the tasks would need to refer to the data using the path from the cache.

Note that cacheArchive does not work in Hadoop local mode. (Not a problem for us right now as we don't use it.)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-420) Limit on nothing functionality
[ https://issues.apache.org/jira/browse/PIG-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780011#action_12780011 ]

Thejas M Nair commented on PIG-420:

The idea proposed by Rekha seems to be a better alternative to 'limit on nothing'. It would be good to have something similar to C++ preprocessor macros. This way the debug decisions can be made at compile time, and there will not be any performance impact. Pig could have some syntax to denote debug-only sections of the Pig script, something like:

{code}
a = load 'file';
b = #IFDEF DEBUG { limit a, 100; } #ELSE { a; /* assuming we start supporting the syntax b = a; */ }
c = filter b by $0 == 1;
#IFDEF DEBUG { store c into 'debug_file'; }
{code}

Limit on nothing functionality

Key: PIG-420
URL: https://issues.apache.org/jira/browse/PIG-420
Project: Pig
Issue Type: Improvement
Reporter: Anand Murugappan

Pig 2.0 implements the limit feature but as a standalone statement. Limit is very useful in debug mode, where we could run queries on a smaller amount of data (faster and on fewer nodes) to iron out issues, but in production mode we would like to run through all the data. It would be good to have an easy switch between debug and prod mode using the limit statement without having to change the underlying code templates.

Given that LIMIT is a separate standalone statement, it gets hard to parametrize the code. For instance, a query template might look like:

A = LOAD '...';
B = LIMIT A $N;
C = FOREACH B

In debug mode, we would like to set the variable $N to 100, but in prod mode we would like to set it to a 'special value' that would not apply LIMIT, letting us run it on all the data.
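Until Pig offers such a switch, the debug/prod distinction discussed in PIG-420 can be approximated outside Pig by generating the script from a template. The following is only a sketch in Python under stated assumptions: `render_script` is a hypothetical helper, not part of Pig, and `None` stands in for the 'special value' that disables LIMIT.

```python
from typing import Optional


def render_script(input_path: str, limit: Optional[int] = None) -> str:
    """Build a Pig Latin script; emit a LIMIT statement only when limit is set.

    Hypothetical helper for illustration only. None plays the role of the
    'special value' that means "do not apply LIMIT".
    """
    lines = [f"A = LOAD '{input_path}';"]
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        # Debug mode: cap the number of rows that flow downstream.
        lines.append(f"B = LIMIT A {limit};")
        source = "B"
    else:
        # Production mode: skip LIMIT and read straight from A.
        source = "A"
    lines.append(f"C = FILTER {source} BY $0 == 1;")
    return "\n".join(lines)


debug_script = render_script("file", limit=100)
prod_script = render_script("file")
print(debug_script)
print(prod_script)
```

The downstream statement refers to whichever alias is live, so the production script contains no LIMIT operator at all rather than a LIMIT with a very large value.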
[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join
[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780148#action_12780148 ]

Olga Natkovich commented on PIG-872:

I am fine if you want to remove it as long as it does not break any existing functionality. I am not sure why it is present in the list.
[jira] Commented: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator
[ https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780169#action_12780169 ]

Pradeep Kamath commented on PIG-1064:

Patch committed to trunk.

Behvaiour of COGROUP with and without schema when using * operator

Key: PIG-1064
URL: https://issues.apache.org/jira/browse/PIG-1064
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
Fix For: 0.6.0
Attachments: PIG-1064-2.patch, PIG-1064-3.patch, PIG-1064-4.patch, PIG-1064-5.patch, PIG-1064.patch

I have 2 tab-separated files, 1.txt and 2.txt:

$ cat 1.txt
1 2
2 3
$ cat 2.txt
1 2
2 3

I use the COGROUP feature of Pig in the following way:

$ java -cp pig.jar:$HADOOP_HOME org.apache.pig.Main

{code}
grunt> A = load '1.txt';
grunt> B = load '2.txt' as (b0, b1);
grunt> C = cogroup A by *, B by *;
{code}

2009-10-29 12:46:04,150 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1012: Each COGroup input has to have the same number of inner plans
Details at logfile: pig_1256845224752.log

If I reverse the order of the schemas:

{code}
grunt> A = load '1.txt' as (a0, a1);
grunt> B = load '2.txt';
grunt> C = cogroup A by *, B by *;
{code}

2009-10-29 12:49:27,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1013: Grouping attributes can either be star (*) or a list of expressions, but not both.
Details at logfile: pig_1256845224752.log

Now running without a schema:

{code}
grunt> A = load '1.txt';
grunt> B = load '2.txt';
grunt> C = cogroup A by *, B by *;
grunt> dump C;
{code}

2009-10-29 12:55:37,202 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: file:/tmp/temp-319926700/tmp-1990275961
2009-10-29 12:55:37,202 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 2
2009-10-29 12:55:37,202 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 154
2009-10-29 12:55:37,202 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-10-29 12:55:37,202 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
((1,2),{(1,2)},{(1,2)})
((2,3),{(2,3)},{(2,3)})

Is this a bug or a feature?

Viraj
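For readers unfamiliar with grouping by `*`, the result of the schema-less run in PIG-1064 can be modelled as grouping every input by the whole tuple. The sketch below is in Python and is only an illustrative model of that result, not Pig's implementation; `cogroup_by_star` is a hypothetical name, and the string values are an assumption standing in for fields loaded without a schema.

```python
from collections import defaultdict


def cogroup_by_star(*relations):
    """Group every input relation by the whole tuple (the '*' key).

    Returns a list of (key, bag_1, ..., bag_n) rows sorted by key, where
    bag_i holds the tuples of relation i that are equal to the key.
    """
    bags = defaultdict(lambda: [[] for _ in relations])
    for index, relation in enumerate(relations):
        for row in relation:
            bags[tuple(row)][index].append(tuple(row))
    return [(key, *bags[key]) for key in sorted(bags)]


# Contents of 1.txt and 2.txt from the report.
a = [("1", "2"), ("2", "3")]
b = [("1", "2"), ("2", "3")]
result = cogroup_by_star(a, b)
for row in result:
    print(row)
```

Each output row carries the key followed by one bag per input, which matches the shape of the two rows in the dump above. The model also shows why both inputs must contribute the same number of key expressions: the key is the full tuple on every side.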
[jira] Updated: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator
[ https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1064:

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
[jira] Commented: (PIG-1097) Pig do not support group by boolean type
[ https://issues.apache.org/jira/browse/PIG-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780223#action_12780223 ]

David Ciemiewicz commented on PIG-1097:

I think that one could argue that Filter functions are REALLY just EvalBoolean functions in disguise, and that Filter functions were a way of adding a return type to Pig for Boolean cases when Pig had no types.

Further, I'd argue that now that Pig does have data types, Filter should be deprecated and all Filter functions should become EvalBoolean.

In other words, I believe it was an oversight in the types migration not to migrate Filter to EvalBoolean.

Pig do not support group by boolean type

Key: PIG-1097
URL: https://issues.apache.org/jira/browse/PIG-1097
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
Fix For: 0.6.0

My script is as follows; TestUDF returns a boolean type.

{color:blue}
DEFINE testUDF org.apache.pig.piggybank.util.TestUDF();
raw = LOAD 'data/input';
raw = FOREACH raw GENERATE testUDF();
raw = GROUP raw BY $0;
DUMP raw;
{color}

*The above script will throw an exception:*

Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias raw
    at org.apache.pig.PigServer.openIterator(PigServer.java:481)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.PigServer.registerScript(PigServer.java:409)
    at PigExample.main(PigExample.java:13)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias raw
    at org.apache.pig.PigServer.store(PigServer.java:536)
    at org.apache.pig.PigServer.openIterator(PigServer.java:464)
    ... 5 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:269)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
    at org.apache.pig.PigServer.store(PigServer.java:528)
    ... 6 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2036: Unhandled key type boolean
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.selectComparator(JobControlCompiler.java:856)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:561)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:251)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:128)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
    ... 8 more
[jira] Created: (PIG-1100) PIG hangs on second call to DUMP or STORE
PIG hangs on second call to DUMP or STORE

Key: PIG-1100
URL: https://issues.apache.org/jira/browse/PIG-1100
Project: Pig
Issue Type: Bug
Affects Versions: 0.5.0
Environment:
Linux mniv-laptop 2.6.24-25-generic #1 SMP Tue Oct 20 07:31:10 UTC 2009 i686 GNU/Linux
java version 1.6.0_16
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) Server VM (build 14.2-b01, mixed mode)
Apache Pig version 0.5.0 (r829623)
Hadoop 0.20.1
Subversion http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1 -r 810220
Reporter: Michael Niv

Pig hangs on the last line of the script below when I run with -x local. It runs fine when run on Hadoop. Happy to provide the files used in bugrep.pig below: v2.txt and document-date.pl (michael...@gmail.com).

I initially ran into a problem which involved cogrouping two things like id_docdate_s1 below, but this is what I came up with while tightening down my bug report. Thanks in advance.

-- bugrep.pig
DEFINE get_doc_date `document-date.pl`;
id_text1 = LOAD 'v2.txt' AS (id,text);
id_docdate1 = STREAM id_text1 THROUGH get_doc_date AS (id,docdate);
id_docdate_s1 = ORDER id_docdate1 BY docdate;
store id_docdate_s1 into 'f1.out';
id_text2 = LOAD 'v2.txt' AS (id,text);
id_docdate2 = STREAM id_text2 THROUGH get_doc_date AS (id,docdate);
id_docdate_s2 = ORDER id_docdate2 BY docdate;
store id_docdate_s2 into 'f2.out'; -- second store call hangs pig
[jira] Updated: (PIG-1100) PIG hangs on second call to DUMP or STORE
[ https://issues.apache.org/jira/browse/PIG-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Niv updated PIG-1100:

Attachment: bugrep.tar

Attached are repro details, including pig script, perl streaming program, and sufficient data sample. Also a console-session of what I saw.
[jira] Commented: (PIG-1094) Fix unit tests corresponding to source changes so far
[ https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780312#action_12780312 ]

Richard Ding commented on PIG-1094:

The change of PIG-879 added the following failure cases:

||Testcase Class||Testcase Method||Cause||
|TestLoad|testLoadRemoteRel|local mode needs to be fixed|
|TestLoad|testLoadRemoteRelScheme|local mode needs to be fixed|
|TestLoad|testGlobChars|local mode needs to be fixed|

Fix unit tests corresponding to source changes so far

Key: PIG-1094
URL: https://issues.apache.org/jira/browse/PIG-1094
Project: Pig
Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Attachments: PIG-1094.patch

The check-ins so far on the load-store-redesign branch have not addressed unit test failures due to interface changes. This jira is to track the task of making the common-case unit tests work with the new interfaces. Some aspects of the new proposal, like using the LoadCaster interface for casting and making local mode work, have not been completed yet. Tests which are failing for those reasons will not be fixed in this jira; they will be addressed in the jiras corresponding to those tasks.
[jira] Commented: (PIG-1085) Pass JobConf and UDF specific configuration information to UDFs
[ https://issues.apache.org/jira/browse/PIG-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780313#action_12780313 ] Alan Gates commented on PIG-1085: - Applied the patch to the 0.6 branch as well. Pass JobConf and UDF specific configuration information to UDFs --- Key: PIG-1085 URL: https://issues.apache.org/jira/browse/PIG-1085 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Alan Gates Attachments: udfconf-2.patch, udfconf.patch Users have long asked for a way to get the JobConf structure in their UDFs. It would also be nice to have a way to pass properties between the front end and back end so that UDFs can store state during parse time and use it at runtime. This patch does part of what is proposed in PIG-602, but not all of it. It does not provide a way to give user specified configuration files to UDFs. So I will mark 602 as depending on this bug, but it isn't a duplicate. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1099) [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG
[ https://issues.apache.org/jira/browse/PIG-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1099: Resolution: Fixed Fix Version/s: 0.7.0 Status: Resolved (was: Patch Available) Patch checked in. [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG -- Key: PIG-1099 URL: https://issues.apache.org/jira/browse/PIG-1099 Project: Pig Issue Type: Bug Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Fix For: 0.7.0 Attachments: PIG_1099.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Welcome Jeff Zhang
All, I would like to welcome Jeff Zhang as our newest Pig committer. Jeff has been contributing to Pig for about nine months now. He's been active on the mailing lists, in contributing patches, and in helping other users with their patches. Congratulations Jeff, and thanks for your contributions to Pig. Alan.
[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split
[ https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1091: -- Attachment: PIG-1091.patch [zebra] Exception when load with projection of map keys on a map column that is not map split -- Key: PIG-1091 URL: https://issues.apache.org/jira/browse/PIG-1091 Project: Pig Issue Type: Bug Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Attachments: PIG-1091.patch With schema of f1:string, f2:map, storage info of [f1]; [f2], a projection of f2#{a} will see exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split
[ https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1091:

Status: Patch Available (was: Open)
[jira] Commented: (PIG-1094) Fix unit tests corresponding to source changes so far
[ https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780325#action_12780325 ]

Olga Natkovich commented on PIG-1094:

We should fix the failures as part of the patch that caused them. Now that we have the baseline, we should make sure that we don't introduce any new failures.
[jira] Resolved: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath resolved PIG-879.

Resolution: Fixed
Hadoop Flags: [Reviewed]

+1, patch committed on load-store-redesign branch with minor change in TestLoad to correctly set up file on MiniCluster.

Pig should provide a way for input location string in load statement to be passed as-is to the Loader

Key: PIG-879
URL: https://issues.apache.org/jira/browse/PIG-879
Project: Pig
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Pradeep Kamath
Assignee: Richard Ding
Attachments: PIG-879.patch, PIG-879.patch, PIG-879.patch, PIG-879.patch, PIG-879.patch

Due to multiquery optimization, Pig always converts the filenames to absolute URIs (see http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section about Incompatible Changes - Path Names and Schemes). This is necessary since the script may have cd .. statements between load or store statements, and if the load statements have relative paths, we would need to convert them to absolute paths to know where to load/store from. To do this, QueryParser.massageFilename() has the code below[1], which basically gives the fully qualified hdfs path.

However, the issue with this approach is that if the filename string is something like hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2, the code below[1] actually translates this to hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2 and throws an exception that it is an incorrect path.

Some loaders may want to interpret the filenames (the input location string in the load statement) in any way they wish and may want Pig to not make absolute paths out of them. There are a few options to address this:

1) A command line switch to indicate to Pig that pathnames in the script are all absolute, and hence Pig should not alter them and should pass them as-is to Loaders and Storers.
2) A keyword in the load and store statements to indicate the same intent to Pig.
3) A property which users can supply on the command line or in pig.properties to indicate the same intent.
4) A method in LoadFunc - relativeToAbsolutePath(String filename, String curDir) - which does the conversion to absolute. This way the Loader can choose to implement it as a no-op.

Thoughts?
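Option 4 can be sketched as follows. This is a minimal model in Python, not the LoadFunc code from the patch: the function names mirror the proposed relativeToAbsolutePath(String filename, String curDir), and the rule that entries with a scheme or a leading '/' pass through unchanged is an assumption made for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def relative_to_absolute_path(location: str, cur_dir: str) -> str:
    """Resolve each comma-separated entry of location against cur_dir.

    Entries that already carry a scheme (e.g. hdfs://host:port/...) or
    start with '/' are treated as absolute and passed through unchanged.
    """
    resolved = []
    for entry in location.split(","):
        entry = entry.strip()
        if urlsplit(entry).scheme or entry.startswith("/"):
            resolved.append(entry)
        else:
            resolved.append(posixpath.normpath(posixpath.join(cur_dir, entry)))
    return ",".join(resolved)


def passthrough(location: str, cur_dir: str) -> str:
    """No-op variant for loaders that interpret the location string themselves."""
    return location


multi = ("hdfs://localhost.localdomain:39125/user/bla/1,"
         "hdfs://localhost.localdomain:39125/user/bla/2")
print(relative_to_absolute_path(multi, "/user/bla"))
print(relative_to_absolute_path("../other/1,2", "/user/bla"))
```

Putting the conversion behind a per-loader method keeps the default behaviour for ordinary file loaders while letting a loader with its own location syntax opt out entirely.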
Re: Welcome Jeff Zhang
I am very glad to join the Pig family. I have grown and learned a lot with others' help in the last nine months. I will continue to contribute to Pig and learn from others.

Jeff Zhang

On Thu, Nov 19, 2009 at 2:48 PM, Alan Gates ga...@yahoo-inc.com wrote:

All, I would like to welcome Jeff Zhang as our newest Pig committer. Jeff has been contributing to Pig for about nine months now. He's been active on the mailing lists, in contributing patches, and in helping other users with their patches. Congratulations Jeff, and thanks for your contributions to Pig. Alan.
Re: Welcome Jeff Zhang
Congrats Jeff!

On Thu, Nov 19, 2009 at 7:47 PM, Jeff Zhang zjf...@gmail.com wrote:

I am very glad to join the Pig family. I have grown and learned a lot with others' help in the last nine months. I will continue to contribute to Pig and learn from others. Jeff Zhang

On Thu, Nov 19, 2009 at 2:48 PM, Alan Gates ga...@yahoo-inc.com wrote:

All, I would like to welcome Jeff Zhang as our newest Pig committer. Jeff has been contributing to Pig for about nine months now. He's been active on the mailing lists, in contributing patches, and in helping other users with their patches. Congratulations Jeff, and thanks for your contributions to Pig. Alan.
[jira] Assigned: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig
[ https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy reassigned PIG-909: - Assignee: Dmitriy V. Ryaboy Allow Pig executable to use hadoop jars not bundled with pig Key: PIG-909 URL: https://issues.apache.org/jira/browse/PIG-909 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Attachments: pig_909.patch The current pig executable (bin/pig) looks for a file named hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig. The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop jars, if that variable is set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split
[ https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Wang updated PIG-1091:

Patch reviewed. +1
[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1088:

Attachment: PIG-1088.patch

* LoadOrderedInput (proposed) is called OrderedLoadFunc. It is an abstract class because LoadFunc is an abstract class.
* TextInputOrder (proposed) is called FileInputLoadFunc.
* New internal type called GENERIC_WRITABLECOMPARABLE has been added, to be used for WritableComparable classes. Tuples can read/write this type.
* ReadToEndLoader takes a list of input splits to be read.

All TestMergeJoin test cases are passing.

testpatch results:

[exec] -1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

change merge join and merge join indexer to work with new LoadFunc interface

Key: PIG-1088
URL: https://issues.apache.org/jira/browse/PIG-1088
Project: Pig
Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: PIG-1088.patch
[jira] Commented: (PIG-1097) Pig do not support group by boolean type
[ https://issues.apache.org/jira/browse/PIG-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780380#action_12780380 ]

Jeff Zhang commented on PIG-1097:

Agreed. FilterFunc is equivalent to EvalFunc<Boolean> in my opinion. I do not know the history of FilterFunc; did it predate Pig's support for types? Either way, I think it should now be deprecated. And why does Pig not support the boolean type in foreach projection and group by? Is there a performance consideration?

Pig do not support group by boolean type

Key: PIG-1097
URL: https://issues.apache.org/jira/browse/PIG-1097
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
Fix For: 0.6.0

My script is as follows; TestUDF returns a boolean type.

{color:blue}
DEFINE testUDF org.apache.pig.piggybank.util.TestUDF();
raw = LOAD 'data/input';
raw = FOREACH raw GENERATE testUDF();
raw = GROUP raw BY $0;
DUMP raw;
{color}

*The above script throws an exception:*

Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias raw
    at org.apache.pig.PigServer.openIterator(PigServer.java:481)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.PigServer.registerScript(PigServer.java:409)
    at PigExample.main(PigExample.java:13)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias raw
    at org.apache.pig.PigServer.store(PigServer.java:536)
    at org.apache.pig.PigServer.openIterator(PigServer.java:464)
    ... 5 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:269)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
    at org.apache.pig.PigServer.store(PigServer.java:528)
    ... 6 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2036: Unhandled key type boolean
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.selectComparator(JobControlCompiler.java:856)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:561)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:251)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:128)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
    ... 8 more
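The innermost cause in the PIG-1097 trace ("ERROR 2036: Unhandled key type boolean", raised from selectComparator) suggests a comparator lookup keyed on the group key's type, with no entry for boolean. The following is only a minimal sketch of that failure shape, not Pig's actual JobControlCompiler code: the class KeyComparatorSketch, the KeyType enum, and the registerBoolean method are hypothetical names introduced for illustration.

```java
import java.util.Comparator;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a type-keyed comparator lookup; not Pig's real code. */
public class KeyComparatorSketch {

    /** Illustrative key types only; these are not Pig's DataType constants. */
    public enum KeyType { INTEGER, CHARARRAY, BOOLEAN }

    private final Map<KeyType, Comparator<Object>> comparators =
            new EnumMap<>(KeyType.class);

    public KeyComparatorSketch() {
        // Only some key types get a comparator up front; BOOLEAN is left out,
        // which reproduces the "unhandled key type" situation.
        comparators.put(KeyType.INTEGER,
                (a, b) -> Integer.compare((Integer) a, (Integer) b));
        comparators.put(KeyType.CHARARRAY,
                (a, b) -> ((String) a).compareTo((String) b));
    }

    /** Assumed fix for illustration: add an ordering for boolean keys (false < true). */
    public void registerBoolean() {
        comparators.put(KeyType.BOOLEAN,
                (a, b) -> Boolean.compare((Boolean) a, (Boolean) b));
    }

    /** Returns the comparator for a key type, or fails if none is registered. */
    public Comparator<Object> selectComparator(KeyType type) {
        Comparator<Object> comparator = comparators.get(type);
        if (comparator == null) {
            throw new IllegalStateException(
                    "Unhandled key type " + type.name().toLowerCase(Locale.ROOT));
        }
        return comparator;
    }

    public static void main(String[] args) {
        KeyComparatorSketch sketch = new KeyComparatorSketch();
        try {
            sketch.selectComparator(KeyType.BOOLEAN);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        sketch.registerBoolean();
        // With a boolean comparator registered, grouping keys can be ordered.
        System.out.println(
                sketch.selectComparator(KeyType.BOOLEAN).compare(false, true));
    }
}
```

In this sketch the lookup fails for boolean keys until a comparator is registered for them. Whether the real fix in Pig takes this form is not stated in the thread.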
[jira] Commented: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780384#action_12780384 ]

Hadoop QA commented on PIG-1088:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12425554/PIG-1088.patch against trunk revision 882340.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/50/console

This message is automatically generated.

change merge join and merge join indexer to work with new LoadFunc interface

Key: PIG-1088
URL: https://issues.apache.org/jira/browse/PIG-1088
Project: Pig
Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: PIG-1088.patch
[jira] Commented: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split
[ https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780456#action_12780456 ]

Hadoop QA commented on PIG-1091:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12425542/PIG-1091.patch against trunk revision 882340.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/162/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/162/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/162/console

This message is automatically generated.

[zebra] Exception when load with projection of map keys on a map column that is not map split

Key: PIG-1091
URL: https://issues.apache.org/jira/browse/PIG-1091
Project: Pig
Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
Attachments: PIG-1091.patch

With a schema of f1:string, f2:map and storage info of [f1]; [f2], a projection of f2#{a} throws an exception.