[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples
[ https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated PIG-712:
---
Attachment: (was: Pig_712_Patch.txt)

Need utilities to create schemas for bags and tuples
Key: PIG-712
URL: https://issues.apache.org/jira/browse/PIG-712
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor
Fix For: 0.3.0

Pig should provide utilities to create bag and tuple schemas. Currently, users return schemas in the outputSchema method and end up with very verbose boilerplate code. It would be very nice if Pig encapsulated the boilerplate code in utility methods.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples
[ https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated PIG-712:
---
Attachment: Pig_712_Patch_Merged.txt

I've merged the two patches into one patch, because the test case depends on the implementation.

Need utilities to create schemas for bags and tuples
Key: PIG-712
URL: https://issues.apache.org/jira/browse/PIG-712
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor
Fix For: 0.3.0
Attachments: Pig_712_Patch_Merged.txt

Pig should provide utilities to create bag and tuple schemas. Currently, users return schemas in the outputSchema method and end up with very verbose boilerplate code. It would be very nice if Pig encapsulated the boilerplate code in utility methods.
[jira] Issue Comment Edited: (PIG-712) Need utilities to create schemas for bags and tuples
[ https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695985#action_12695985 ]

Jeff Zhang edited comment on PIG-712 at 4/6/09 3:43 AM:

I've merged the two patches into one patch, because TestSchemaUtil depends on SchemaUtil. But it seems I cannot submit this patch. Why?

was (Author: zjffdu):
I've merged the two patches into one patch,because the testcase dependent on the implemenation.

Need utilities to create schemas for bags and tuples
Key: PIG-712
URL: https://issues.apache.org/jira/browse/PIG-712
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor
Fix For: 0.3.0
Attachments: Pig_712_Patch_Merged.txt

Pig should provide utilities to create bag and tuple schemas. Currently, users return schemas in the outputSchema method and end up with very verbose boilerplate code. It would be very nice if Pig encapsulated the boilerplate code in utility methods.
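The kind of helper the issue asks for can be sketched as follows. This is only an illustration in Python with stand-in types, not the contents of Pig_712_Patch_Merged.txt and not Pig's Java API: the `FieldSchema` and `Schema` classes and the functions `new_tuple_schema` and `new_bag_schema` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSchema:
    """Hypothetical stand-in for one schema field: alias, type name, optional nested schema."""
    alias: Optional[str]
    type: str
    schema: Optional["Schema"] = None


@dataclass
class Schema:
    """Hypothetical stand-in for a schema: an ordered list of fields."""
    fields: List[FieldSchema] = field(default_factory=list)


def new_tuple_schema(names, types, alias="t"):
    """Wrap the given columns in a single tuple-typed field."""
    if len(names) != len(types):
        raise ValueError("names and types must have the same length")
    inner = Schema([FieldSchema(n, t) for n, t in zip(names, types)])
    return Schema([FieldSchema(alias, "tuple", inner)])


def new_bag_schema(names, types, alias="b"):
    """Wrap a tuple schema in a single bag-typed field."""
    return Schema([FieldSchema(alias, "bag", new_tuple_schema(names, types))])
```

With helpers of this shape, a UDF author would describe a bag of (name, age) tuples in one call, `new_bag_schema(["name", "age"], ["chararray", "int"])`, instead of building each nested level by hand.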
[jira] Commented: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input
[ https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696244#action_12696244 ]

Pradeep Kamath commented on PIG-733:

Tests are not included in this patch since there are existing tests for order by. All core unit tests passed, and FindBugs reported the same number of warnings (665) with and without the patch (output below). The excess warnings produced by the patch have been addressed in the new version of the patch (PIG-733-v2.patch).

{noformat}
=== CORE UNIT TESTS OUTPUT WITH PATCH
[prade...@afterside:/tmp/PIG-733/trunk]
test-core:
[mkdir] Created dir: /tmp/PIG-733/trunk/build/test/logs
[junit] Running org.apache.pig.test.TestAdd
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.056 sec
...
[junit] Running org.apache.pig.test.TestTypeCheckingValidatorNoSchema
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.629 sec
[junit] Running org.apache.pig.test.TestUnion
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 49.94 sec
test-contrib:
BUILD SUCCESSFUL
Total time: 77 minutes 47 seconds

=== FINDBUGS OUTPUT WITH PATCH
[prade...@afterside:/tmp/PIG-733/trunk]
[prade...@chargesize:/tmp/PIG-733/trunk]ant -Dfindbugs.home=/homes/pradeepk/findbugs-1.3.8 findbugs
Buildfile: build.xml
...
findbugs:
[mkdir] Created dir: /tmp/PIG-733/trunk/build/test/findbugs
[findbugs] Executing findbugs from ant task
[findbugs] Running FindBugs...
[findbugs] Warnings generated: 665
[findbugs] Calculating exit code...
[findbugs] Setting 'bugs found' flag (1)
[findbugs] Exit code set to: 1
[findbugs] Java Result: 1
[findbugs] Output saved to /tmp/PIG-733/trunk/build/test/findbugs/pig-findbugs-report.xml
[xslt] Processing /tmp/PIG-733/trunk/build/test/findbugs/pig-findbugs-report.xml to /tmp/PIG-733/trunk/build/test/findbugs/pig-findbugs-report.html
[xslt] Loading stylesheet /homes/pradeepk/findbugs-1.3.8/src/xsl/default.xsl

=== FINDBUGS OUTPUT WITHOUT PATCH
[prade...@chargesize:/tmp/svncheckout/trunk]ant -Dfindbugs.home=/homes/pradeepk/findbugs-1.3.8 findbugs
Buildfile: build.xml
check-for-findbugs:
...
findbugs:
[mkdir] Created dir: /tmp/svncheckout/trunk/build/test/findbugs
[findbugs] Executing findbugs from ant task
[findbugs] Running FindBugs...
[findbugs] Warnings generated: 665
[findbugs] Calculating exit code...
[findbugs] Setting 'bugs found' flag (1)
[findbugs] Exit code set to: 1
[findbugs] Java Result: 1
[findbugs] Output saved to /tmp/svncheckout/trunk/build/test/findbugs/pig-findbugs-report.xml
[xslt] Processing /tmp/svncheckout/trunk/build/test/findbugs/pig-findbugs-report.xml to /tmp/svncheckout/trunk/build/test/findbugs/pig-findbugs-report.html
[xslt] Loading stylesheet /homes/pradeepk/findbugs-1.3.8/src/xsl/default.xsl
{noformat}

Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input
---
Key: PIG-733
URL: https://issues.apache.org/jira/browse/PIG-733
Project: Pig
Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.3.0
Attachments: PIG-733.patch

Order by has a sampling job which samples the input and creates a sorted list of sample items. Currently the number of items sampled is 100 per map task, so if the input is large and results in many maps (say 50,000), the sample is big. This sorted sample is stored on DFS. The WeightedRangePartitioner computes quantile boundaries and weighted probabilities for repeating values in each map by reading the samples file from DFS. In queries with many maps (on the order of 50,000), the DFS read of the sample file fails with a FileSystem closed error. This seems to point to a DFS issue wherein a big DFS file being read simultaneously by many DFS clients (in this case all maps) causes the clients to be closed.

However, on the Pig side, loading the sample from each map in the final map reduce job and computing the quantile boundaries and weighted probabilities is inefficient. We should do this computation through a FindQuantiles UDF in the same map reduce job which produces the sorted samples. This way less data is written to DFS, and in the final map reduce job the WeightedRangePartitioner just needs to load the computed information.
[jira] Updated: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input
[ https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-733:
---
Attachment: PIG-733-v2.patch

Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input
---
Key: PIG-733
URL: https://issues.apache.org/jira/browse/PIG-733
Project: Pig
Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.3.0
Attachments: PIG-733-v2.patch, PIG-733.patch

Order by has a sampling job which samples the input and creates a sorted list of sample items. Currently the number of items sampled is 100 per map task, so if the input is large and results in many maps (say 50,000), the sample is big. This sorted sample is stored on DFS. The WeightedRangePartitioner computes quantile boundaries and weighted probabilities for repeating values in each map by reading the samples file from DFS. In queries with many maps (on the order of 50,000), the DFS read of the sample file fails with a FileSystem closed error. This seems to point to a DFS issue wherein a big DFS file being read simultaneously by many DFS clients (in this case all maps) causes the clients to be closed.

However, on the Pig side, loading the sample from each map in the final map reduce job and computing the quantile boundaries and weighted probabilities is inefficient. We should do this computation through a FindQuantiles UDF in the same map reduce job which produces the sorted samples. This way less data is written to DFS, and in the final map reduce job the WeightedRangePartitioner just needs to load the computed information.
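To make the proposal concrete, here is a rough sketch of what "computing quantile boundaries and weighted probabilities" from a sorted sample could look like. It is written in Python for brevity and is not the code in PIG-733.patch; the function name `find_quantiles`, the parameter `num_partitions`, and the exact rule for placing boundaries are assumptions made for illustration.

```python
from collections import defaultdict


def find_quantiles(sorted_sample, num_partitions):
    """Return (boundaries, weights) for an already sorted sample.

    boundaries: the first sample value of partitions 1..num_partitions-1.
    weights: for each value that falls into more than one partition, the
             fraction of its occurrences that lands in each partition.
    """
    n = len(sorted_sample)
    if num_partitions < 1 or n < num_partitions:
        raise ValueError("need at least one sample per partition")

    # Sample index idx is assigned to partition (idx * num_partitions) // n.
    # The first index of partition i is therefore ceil(i * n / num_partitions).
    boundaries = [
        sorted_sample[-(-i * n // num_partitions)]
        for i in range(1, num_partitions)
    ]

    # Count how often each value occurs in each partition.
    counts = defaultdict(lambda: defaultdict(int))
    for idx, value in enumerate(sorted_sample):
        counts[value][(idx * num_partitions) // n] += 1

    # Only values that straddle partitions need a probability table.
    weights = {}
    for value, per_partition in counts.items():
        if len(per_partition) > 1:
            total = sum(per_partition.values())
            weights[value] = {p: c / total for p, c in per_partition.items()}
    return boundaries, weights
```

The point of the issue is where this runs: if the boundaries and the small weight table are produced once, in the job that already holds the sorted sample, every map in the final job only has to load that compact result instead of re-reading the whole sample file from DFS.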
[jira] Commented: (PIG-754) Bugs with load and store and filenames passed with -param containing periods
[ https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696263#action_12696263 ]

Viraj Bhat commented on PIG-754:

Ciemo, there is a workaround: if we create a param_file named testparamfile that contains the infile parameter, it works. The -param_file option goes through a different code path, which avoids the problems that -param faces.

{code}
testmachine~/pigscripts cat testparamfile
infile = 'file.right'
pig -exectype local -param_file testparamfile infile.pig
{code}

I suspect that this issue is related to the following JIRA; maybe the period (.) should be included as a problematic special character.
http://issues.apache.org/jira/browse/PIG-564

Bugs with load and store and filenames passed with -param containing periods
Key: PIG-754
URL: https://issues.apache.org/jira/browse/PIG-754
Project: Pig
Issue Type: Bug
Reporter: David Ciemiewicz

This one drove me batty. I have two files, file and file.right.

file:
{code}
WRONG
This is file, not file.right.
{code}

file.right:
{code}
RIGHT
This is file.right..
{code}

infile.pig:
{code}
A = load '$infile' using PigStorage();
dump A;
{code}

When I pass in file.right as the infile parameter value, the wrong file is read:
{code}
-bash-3.00$ pig -exectype local -param infile=file.right infile.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:18:36,291 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:18:36,292 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(WRONG )
(This is file, not file.right.)
{code}

However, if I pass in infile as ./file.right, the script magically works.
{code}
-bash-3.00$ pig -exectype local -param infile=./file.right infile.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:20:46,735 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:20:46,736 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(RIGHT)
(This is file.right.)
{code}

I do not have this problem if I use the file name with a period in the script itself:

infile2.pig
{code}
A = load 'file.right' using PigStorage();
dump A;
{code}

{code}
-bash-3.00$ pig -exectype local infile2.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:22:47,022 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:22:47,023 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(RIGHT)
(This is file.right.)
{code}

I also experience similar problems when I try to pass in param outfile in a store statement.
[jira] Commented: (PIG-754) Bugs with load and store and filenames passed with -param containing periods
[ https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696265#action_12696265 ] Viraj Bhat commented on PIG-754: Another workaround as suggested in PIG:564 :) {code} pig -exectype local -param infile=\'file.right\' infile.pig (RIGHT) (This is file.right..) {code} Bugs with load and store and filenames passed with -param containing periods Key: PIG-754 URL: https://issues.apache.org/jira/browse/PIG-754 Project: Pig Issue Type: Bug Reporter: David Ciemiewicz This one drove me batty. I have two files file and file.right. file: {code} WRONG This is file, not file.right. {code} file.right: {code} RIGHT This is file.right.. {code} infile.pig: {code} A = load '$infile' using PigStorage(); dump A; {code} When I pass in file.right as the infile parameter value, the wrong file is read: {code} -bash-3.00$ pig -exectype local -param infile=file.right infile.pig USING: /grid/0/gs/pig/current 2009-04-05 23:18:36,291 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 2009-04-05 23:18:36,292 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! (WRONG ) (This is file, not file.right.) {code} However, if I pass in infile as ./file.right, the script magically works. {code} -bash-3.00$ pig -exectype local -param infile=./file.right infile.pig USING: /grid/0/gs/pig/current 2009-04-05 23:20:46,735 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 2009-04-05 23:20:46,736 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! (RIGHT) (This is file.right.) 
{code} I do not have this problem if I use the file name with a period in the script itself: infile2.pig {code} A = load 'file.right' using PigStorage(); dump A; {code} {code} -bash-3.00$ pig -exectype local infile2.pig USING: /grid/0/gs/pig/current 2009-04-05 23:22:47,022 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 2009-04-05 23:22:47,023 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! (RIGHT) (This is file.right.) {code} I also experience similar problems when I try to pass in param outfile in a store statement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (PIG-754) Bugs with load and store and filenames passed with -param containing periods
[ https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696265#action_12696265 ] Viraj Bhat edited comment on PIG-754 at 4/6/09 2:43 PM: Another workaround as suggested in PIG-564 :) {code} pig -exectype local -param infile=\'file.right\' infile.pig (RIGHT) (This is file.right..) {code} was (Author: viraj): Another workaround as suggested in PIG:564 :) {code} pig -exectype local -param infile=\'file.right\' infile.pig (RIGHT) (This is file.right..) {code} Bugs with load and store and filenames passed with -param containing periods Key: PIG-754 URL: https://issues.apache.org/jira/browse/PIG-754 Project: Pig Issue Type: Bug Reporter: David Ciemiewicz This one drove me batty. I have two files file and file.right. file: {code} WRONG This is file, not file.right. {code} file.right: {code} RIGHT This is file.right.. {code} infile.pig: {code} A = load '$infile' using PigStorage(); dump A; {code} When I pass in file.right as the infile parameter value, the wrong file is read: {code} -bash-3.00$ pig -exectype local -param infile=file.right infile.pig USING: /grid/0/gs/pig/current 2009-04-05 23:18:36,291 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 2009-04-05 23:18:36,292 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! (WRONG ) (This is file, not file.right.) {code} However, if I pass in infile as ./file.right, the script magically works. {code} -bash-3.00$ pig -exectype local -param infile=./file.right infile.pig USING: /grid/0/gs/pig/current 2009-04-05 23:20:46,735 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 2009-04-05 23:20:46,736 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! (RIGHT) (This is file.right.) 
{code} I do not have this problem if I use the file name with a period in the script itself: infile2.pig {code} A = load 'file.right' using PigStorage(); dump A; {code} {code} -bash-3.00$ pig -exectype local infile2.pig USING: /grid/0/gs/pig/current 2009-04-05 23:22:47,022 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 2009-04-05 23:22:47,023 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! (RIGHT) (This is file.right.) {code} I also experience similar problems when I try to pass in param outfile in a store statement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
[ https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696266#action_12696266 ]

David Ciemiewicz commented on PIG-564:
--
Period (.) is also a special character that seems to cause problems. See related JIRA PIG-754.

Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
---
Key: PIG-564
URL: https://issues.apache.org/jira/browse/PIG-564
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat

Consider the following Pig script which uses parameter substitution:
{code}
%default qual '/user/viraj'
%default mydir 'mydir_myextraqual'
VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
dump VISIT_LOGS;
{code}

If you run the script as:
==
java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param mydir=mydir-myextraqual mypigparamsub.pig
==
You get the following error:
==
2008-12-15 19:49:43,964 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: /user/viraj/mydir does not exist
at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:619)
java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job terminated with anomalous status FAILED]
at org.apache.pig.PigServer.openIterator(PigServer.java:389)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
at org.apache.pig.Main.main(Main.java:306)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
... 6 more
==
Also tried using: -param mydir='mydir\-myextraqual'

This behavior occurs if the parameter value contains characters such as +,=, ?.

A workaround for this behavior is to use a param_file which contains param_name=param_value on each line, with the param_value enclosed in quotes. For example:
mydir='mydir-myextraqual'
and then running the pig script as:
java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param_file myparamfile mypigparamsub.pig

The following issues need to be fixed:
1) In the -param option, if the parameter value contains special characters, it is truncated
2) In param_file, if param_value contains special characters, it should be enclosed in quotes
3) If 2 is a known issue then it should be documented in http://wiki.apache.org/pig/ParameterSubstitution
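The param_file workaround above is easy to script. The following is a minimal sketch, assuming only the file layout described in this issue (one param_name='param_value' pair per line); the helper name `format_param_file` is hypothetical and nothing here is part of Pig itself.

```python
def format_param_file(params):
    """Render {name: value} as param_file text with every value single-quoted.

    Values containing a single quote or a newline are rejected, because the
    issue does not say how such values should be escaped.
    """
    lines = []
    for name, value in params.items():
        if "'" in value or "\n" in value:
            raise ValueError(f"cannot quote value for {name!r}")
        lines.append(f"{name}='{value}'\n")
    return "".join(lines)
```

Writing the returned text to a file such as myparamfile and passing it with -param_file, as in the command line above, keeps values like mydir-myextraqual away from the -param code path that truncates them.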
[jira] Commented: (PIG-754) Bugs with load and store and filenames passed with -param containing periods
[ https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696273#action_12696273 ]

Viraj Bhat commented on PIG-754:

What I still do not understand is why the following works when you supply a relative path with a leading ./, even though the value still contains a special character:
{code}
-bash-3.00$ pig -exectype local -param infile=./file.right infile.pig
(RIGHT)
(This is file.right..)
{code}
or, for that matter, when you supply the full path:
{code}
-bash-3.00$ pig -exectype local -param infile=/full/path/to/file.right infile.pig
(RIGHT)
(This is file.right..)
{code}

Bugs with load and store and filenames passed with -param containing periods
Key: PIG-754
URL: https://issues.apache.org/jira/browse/PIG-754
Project: Pig
Issue Type: Bug
Reporter: David Ciemiewicz

This one drove me batty. I have two files, file and file.right.

file:
{code}
WRONG
This is file, not file.right.
{code}

file.right:
{code}
RIGHT
This is file.right..
{code}

infile.pig:
{code}
A = load '$infile' using PigStorage();
dump A;
{code}

When I pass in file.right as the infile parameter value, the wrong file is read:
{code}
-bash-3.00$ pig -exectype local -param infile=file.right infile.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:18:36,291 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:18:36,292 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(WRONG )
(This is file, not file.right.)
{code}

However, if I pass in infile as ./file.right, the script magically works.
{code}
-bash-3.00$ pig -exectype local -param infile=./file.right infile.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:20:46,735 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:20:46,736 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(RIGHT)
(This is file.right.)
{code}

I do not have this problem if I use the file name with a period in the script itself:

infile2.pig
{code}
A = load 'file.right' using PigStorage();
dump A;
{code}

{code}
-bash-3.00$ pig -exectype local infile2.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:22:47,022 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:22:47,023 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(RIGHT)
(This is file.right.)
{code}

I also experience similar problems when I try to pass in param outfile in a store statement.
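When comparing these command lines it helps to separate what the shell does to the -param argument from what Pig does with it afterwards. The sketch below uses Python's standard `shlex` module, which follows POSIX shell word-splitting rules, to show the value that reaches the program in each case; the helper name `param_values` is made up for this example, and it says nothing about Pig's own parsing.

```python
import shlex


def param_values(command_line):
    """Return {name: value} for each `-param name=value` pair after shell word splitting."""
    argv = shlex.split(command_line)
    params = {}
    for flag, arg in zip(argv, argv[1:]):
        if flag == "-param" and "=" in arg:
            name, value = arg.split("=", 1)
            params[name] = value
    return params


# Plain value: the program receives file.right unchanged.
plain = param_values("pig -exectype local -param infile=file.right infile.pig")

# Backslash-escaped quotes survive the shell, so the value arrives quoted.
escaped = param_values(r"pig -exectype local -param infile=\'file.right\' infile.pig")

# Unescaped quotes are consumed by the shell and never reach the program.
stripped = param_values("pig -exectype local -param infile='file.right' infile.pig")
```

Here `plain["infile"]` and `stripped["infile"]` are both file.right, while `escaped["infile"]` is 'file.right' including the quotes. That is consistent with the escaped-quote workaround from PIG-564 behaving like the quoted value in a param_file, whereas ordinary shell quoting makes no difference to what Pig sees.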
[jira] Created: (PIG-755) Difficult to debug parameter substitution problems based on the error messages when running in local mode
Difficult to debug parameter substitution problems based on the error messages when running in local mode - Key: PIG-755 URL: https://issues.apache.org/jira/browse/PIG-755 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Fix For: 0.3.0 I have a script in which I do a parameter substitution for the input file. I have a use case where I find it difficult to debug based on the error messages in local mode. {code} A = load '$infile' using PigStorage() as ( date: chararray, count : long, gmean : double ); dump A; {code} 1) I run it in local mode with the input file in the current working directory {code} prompt $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main -exectype local -param infile='inputfile.txt' localparamsub.pig {code} 2009-04-07 00:03:51,967 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore - Received error from storer function: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function. 2009-04-07 00:03:51,970 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Failed jobs!! 2009-04-07 00:03:51,971 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 1 out of 1 failed! 
2009-04-07 00:03:51,974 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062631414.log ERROR 1066: Unable to open iterator for alias A org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A at org.apache.pig.PigServer.openIterator(PigServer.java:439) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352) Caused by: java.io.IOException: Job terminated with anomalous status FAILED at org.apache.pig.PigServer.openIterator(PigServer.java:433) ... 5 more 2) I run it in map reduce mode {code} prompt $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main -param infile='inputfile.txt' localparamsub.pig {code} 2009-04-07 00:07:31,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-04-07 00:07:32,074 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-04-07 00:07:34,543 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 2009-04-07 00:07:39,540 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2009-04-07 00:07:39,540 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed 2009-04-07 00:07:39,563 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: inputfile does not exist. 
Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062851400.log ERROR 2100: inputfile does not exist. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A at org.apache.pig.PigServer.openIterator(PigServer.java:439) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A at org.apache.pig.PigServer.store(PigServer.java:470) at org.apache.pig.PigServer.openIterator(PigServer.java:427) ... 5 more Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A at
[jira] Updated: (PIG-755) Difficult to debug parameter substitution problems based on the error messages when running in local mode
[ https://issues.apache.org/jira/browse/PIG-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-755: --- Attachment: localparamsub.pig inputfile.txt Script and testfile Difficult to debug parameter substitution problems based on the error messages when running in local mode - Key: PIG-755 URL: https://issues.apache.org/jira/browse/PIG-755 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Fix For: 0.3.0 Attachments: inputfile.txt, localparamsub.pig I have a script in which I do a parameter substitution for the input file. I have a use case where I find it difficult to debug based on the error messages in local mode. {code} A = load '$infile' using PigStorage() as ( date: chararray, count : long, gmean : double ); dump A; {code} 1) I run it in local mode with the input file in the current working directory {code} prompt $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main -exectype local -param infile='inputfile.txt' localparamsub.pig {code} 2009-04-07 00:03:51,967 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore - Received error from storer function: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function. 2009-04-07 00:03:51,970 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Failed jobs!! 2009-04-07 00:03:51,971 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 1 out of 1 failed! 
2009-04-07 00:03:51,974 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062631414.log ERROR 1066: Unable to open iterator for alias A org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A at org.apache.pig.PigServer.openIterator(PigServer.java:439) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352) Caused by: java.io.IOException: Job terminated with anomalous status FAILED at org.apache.pig.PigServer.openIterator(PigServer.java:433) ... 5 more 2) I run it in map reduce mode {code} prompt $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main -param infile='inputfile.txt' localparamsub.pig {code} 2009-04-07 00:07:31,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-04-07 00:07:32,074 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-04-07 00:07:34,543 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 2009-04-07 00:07:39,540 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2009-04-07 00:07:39,540 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed 2009-04-07 00:07:39,563 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: inputfile does not exist. 
Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062851400.log ERROR 2100: inputfile does not exist. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A at org.apache.pig.PigServer.openIterator(PigServer.java:439) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352)
[jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696350#action_12696350 ]

Pradeep Kamath commented on PIG-627:

+1, patch committed. Thanks for the contribution, Gunther!

PERFORMANCE: multi-query optimization
-
Key: PIG-627
URL: https://issues.apache.org/jira/browse/PIG-627
Project: Pig
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Attachments: file_cmds-0305.patch, fix_store_prob.patch, merge_741727_HEAD__0324.patch, merge_741727_HEAD__0324_2.patch, multi-store-0303.patch, multi-store-0304.patch, multiquery-phase2_0313.patch, multiquery-phase2_0323.patch, multiquery_0223.patch, multiquery_0224.patch, multiquery_0306.patch, multiquery_explain_fix.patch, non_reversible_store_load_dependencies.patch, non_reversible_store_load_dependencies_2.patch, noop_filter_absolute_path_flag.patch, noop_filter_absolute_path_flag_0401.patch

Currently, if your Pig script contains multiple stores and some shared computation, Pig will execute several independent queries. For instance:

A = load 'data' as (a, b, c);
B = filter A by a 5;
store B into 'output1';
C = group B by b;
store C into 'output2';

This script results in a map-only job that generates output1, followed by a map-reduce job that generates output2. As a result, the data is read, parsed, and filtered twice, which is unnecessary and costly.
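The saving the issue is after can be illustrated outside Pig. The sketch below, in Python, mirrors the shape of the script above: rows are read and filtered once, and both the filtered output and the grouped output are derived from that single pass. The function name `run_shared` and the `keep` predicate are placeholders for illustration; this is not how the committed patches implement the optimization.

```python
from collections import defaultdict


def run_shared(rows, keep):
    """Filter rows once and build both outputs from the same filtered stream.

    rows: iterable of (a, b, c) tuples, standing in for `load 'data'`.
    keep: predicate standing in for the filter condition on column a.
    """
    output1 = []                # stands in for: store B into 'output1'
    groups = defaultdict(list)  # stands in for: C = group B by b
    for row in rows:
        if keep(row):
            output1.append(row)
            groups[row[1]].append(row)  # group key is column b
    return output1, dict(groups)
```

Running the two stores as independent queries corresponds to calling the load and filter steps twice; sharing them means the input is scanned and parsed a single time regardless of how many stores depend on B.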
[jira] Commented: (PIG-732) Utility UDFs
[ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696404#action_12696404 ]

Ankur commented on PIG-732:
---
Hi Olga, can you please take a look and suggest what's wrong?

Utility UDFs
-
Key: PIG-732
URL: https://issues.apache.org/jira/browse/PIG-732
Project: Pig
Issue Type: New Feature
Reporter: Ankur
Priority: Minor
Attachments: udf.v1.patch, udf.v2.patch, udf.v3.patch

Two utility UDFs and their respective test cases.
1. TopN - Accepts the number of tuples (N) to retain in the output, the field number (type long) to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag containing the top N tuples.
2. SearchQuery - Accepts an encoded URL from any of the 4 search engines (Yahoo, Google, AOL, Live) and extracts and normalizes the search query present in it.
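The TopN behaviour described in item 1 can be sketched in a few lines. This is a Python illustration of the selection step only, not the Java UDF in the attached patches; the function name `top_n` and the choice of a heap-based selection are assumptions.

```python
import heapq


def top_n(n, field_index, bag):
    """Return the n tuples with the largest value in column field_index.

    The bag may be sorted or unsorted; the result is ordered from largest
    to smallest on the comparison field.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return heapq.nlargest(n, bag, key=lambda t: t[field_index])
```

For example, `top_n(2, 1, [("a", 3), ("b", 10), ("c", 7)])` keeps the tuples for b and c. Selecting with a bounded heap avoids sorting the whole bag, which matters when N is small relative to the bag size.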