[jira] Commented: (PIG-6) Addition of Hbase Storage Option In Load/Store Statement
[ https://issues.apache.org/jira/browse/PIG-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715385#action_12715385 ]

Amr Awadallah commented on PIG-6:
---------------------------------

Any progress on this?

Addition of Hbase Storage Option In Load/Store Statement

Key: PIG-6
URL: https://issues.apache.org/jira/browse/PIG-6
Project: Pig
Issue Type: New Feature
Environment: all environments
Reporter: Edward J. Yoon
Fix For: 0.2.0
Attachments: hbase-0.18.1-test.jar, hbase-0.18.1.jar, m34813f5.txt, PIG-6.patch, PIG-6_V01.patch

Pig needs to be able to load a full table from HBase. (Maybe difficult? I'm not sure yet.) Also, as described below, it needs to compose an abstract 2-D table containing only certain data filtered from the HBase array structure, using an arbitrary delimited query.

{code}
A = LOAD table('hbase_table');
or
B = LOAD table('hbase_table') Using HbaseQuery('Query-delimited by attributes timestamp') as (f1, f2[, f3]);
{code}

Once testing is done on my local machines, I will clarify the grammar and give you more examples to help explain more storage options. Any advice welcome.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Pig-Patch-minerva.apache.org #66
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/

[...truncated 91192 lines...]
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 0 for block blk_-1934744134976204728_1010 terminating
[exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:39568 is added to blk_-1934744134976204728_1010 size 6
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block blk_-1934744134976204728_1010 of size 6 from /127.0.0.1
[exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54659 is added to blk_-1934744134976204728_1010 size 6
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 1 for block blk_-1934744134976204728_1010 terminating
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block blk_-1934744134976204728_1010 of size 6 from /127.0.0.1
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 2 for block blk_-1934744134976204728_1010 terminating
[exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:58588 is added to blk_-1934744134976204728_1010 size 6
[exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/hudson/input2.txt. blk_4153181122005715837_1011
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Receiving block blk_4153181122005715837_1011 src: /127.0.0.1:34228 dest: /127.0.0.1:58588
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Receiving block blk_4153181122005715837_1011 src: /127.0.0.1:52486 dest: /127.0.0.1:54659
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Receiving block blk_4153181122005715837_1011 src: /127.0.0.1:45265 dest: /127.0.0.1:39568
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block blk_4153181122005715837_1011 of size 6 from /127.0.0.1
[exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:39568 is added to blk_4153181122005715837_1011 size 6
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 0 for block blk_4153181122005715837_1011 terminating
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block blk_4153181122005715837_1011 of size 6 from /127.0.0.1
[exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54659 is added to blk_4153181122005715837_1011 size 6
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 1 for block blk_4153181122005715837_1011 terminating
[exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:58588 is added to blk_4153181122005715837_1011 size 6
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block blk_4153181122005715837_1011 of size 6 from /127.0.0.1
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 2 for block blk_4153181122005715837_1011 terminating
[exec] [junit] 09/06/02 01:05:52 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:59653
[exec] [junit] 09/06/02 01:05:52 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:47970
[exec] [junit] 09/06/02 01:05:52 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
[exec] [junit] 09/06/02 01:05:52 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Deleting block blk_4578647973586690435_1006 file dfs/data/data1/current/blk_4578647973586690435
[exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Deleting block blk_5949797326287727563_1005 file dfs/data/data2/current/blk_5949797326287727563
[exec] [junit] 09/06/02 01:05:53 INFO mapReduceLayer.JobControlCompiler: Setting up single store job
[exec] [junit] 09/06/02 01:05:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[exec] [junit] 09/06/02 01:05:53 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200906020105_0002/job.jar. blk_456729402385173_1012
[exec] [junit] 09/06/02 01:05:53 INFO dfs.DataNode: Receiving block blk_456729402385173_1012 src: /127.0.0.1:50300 dest: /127.0.0.1:42011
[exec] [junit] 09/06/02 01:05:53 INFO dfs.DataNode: Receiving block blk_456729402385173_1012 src: /127.0.0.1:52489 dest: /127.0.0.1:54659
[exec] [junit] 09/06/02 01:05:53 INFO dfs.DataNode: Receiving block
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715406#action_12715406 ]

Hadoop QA commented on PIG-796:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12409612/796.patch
against trunk revision 780722.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 225 javac compiler warnings (more than the trunk's current 224 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/console

This message is automatically generated.

support conversion from numeric types to chararray
--------------------------------------------------

Key: PIG-796
URL: https://issues.apache.org/jira/browse/PIG-796
Project: Pig
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Attachments: 796.patch, pig-796.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
Port Apache Log parsing piggybank contrib to Pig 0.2

Key: PIG-830
URL: https://issues.apache.org/jira/browse/PIG-830
Project: Pig
Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor

The piggybank contribs (pig-472, pig-473, pig-474, pig-476, pig-486, pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was merged in. They should be updated to work with the current APIs and added back into trunk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig
[ https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715639#action_12715639 ]

Alan Gates commented on PIG-826:
--------------------------------

It can be done like this:

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);
Grouped = group Logs all;
foreach Grouped {
    duser = distinct Logs.user;
    dcountry = distinct Logs.country;
    durl = distinct Logs.url;
    generate COUNT(duser), COUNT(dcountry), COUNT(durl);
};
{code}

DISTINCT as Function/Operator rather than statement/operator - High Level Pig
-----------------------------------------------------------------------------

Key: PIG-826
URL: https://issues.apache.org/jira/browse/PIG-826
Project: Pig
Issue Type: New Feature
Reporter: David Ciemiewicz

In SQL, a user would think nothing of doing something like:

{code}
select COUNT(DISTINCT(user)) as user_count,
       COUNT(DISTINCT(country)) as country_count,
       COUNT(DISTINCT(url)) as url_count
from server_logs;
{code}

But in Pig, we'd need to do something like the following. And this is about the most compact version I could come up with.

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);
DistinctUsers = distinct (foreach Logs generate user);
DistinctCountries = distinct (foreach Logs generate country);
DistinctUrls = distinct (foreach Logs generate url);
DistinctUsersCount = foreach (group DistinctUsers all) generate group, COUNT(DistinctUsers) as user_count;
DistinctCountriesCount = foreach (group DistinctCountries all) generate group, COUNT(DistinctCountries) as country_count;
DistinctUrlCount = foreach (group DistinctUrls all) generate group, COUNT(DistinctUrls) as url_count;
AllDistinctCounts = cross DistinctUsersCount, DistinctCountriesCount, DistinctUrlCount;
Report = foreach AllDistinctCounts generate DistinctUsersCount::user_count, DistinctCountriesCount::country_count, DistinctUrlCount::url_count;
store Report into 'log_report' using PigStorage();
{code}

It would be good if there was a higher level version of Pig that permitted the code to be written as:

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);
Report = overall Logs generate
    COUNT(DISTINCT(user)) as user_count,
    COUNT(DISTINCT(country)) as country_count,
    COUNT(DISTINCT(url)) as url_count;
store Report into 'log_report' using PigStorage();
{code}

I do want this in Pig and not as SQL. I'd expect High Level Pig to generate Lower Level Pig.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
[ https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715651#action_12715651 ]

Hadoop QA commented on PIG-830:
-------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12409682/pig-830.patch
against trunk revision 780722.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 27 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/67/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/67/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/67/console

This message is automatically generated.

Port Apache Log parsing piggybank contrib to Pig 0.2

Key: PIG-830
URL: https://issues.apache.org/jira/browse/PIG-830
Project: Pig
Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
Attachments: pig-830.patch

The piggybank contribs (pig-472, pig-473, pig-474, pig-476, pig-486, pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was merged in. They should be updated to work with the current APIs and added back into trunk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-831) Records and bytes written reported by pig are wrong in a multi-store program
Records and bytes written reported by pig are wrong in a multi-store program

Key: PIG-831
URL: https://issues.apache.org/jira/browse/PIG-831
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor

The stats features checked in as part of PIG-626 (reporting the number of records and bytes written at the end of the query) print wrong values (often but not always 0) when the pig script being run contains more than 1 store.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-831) Records and bytes written reported by pig are wrong in a multi-store program
[ https://issues.apache.org/jira/browse/PIG-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715657#action_12715657 ]

Alan Gates commented on PIG-831:
--------------------------------

There are a couple of issues going on here. One, PigStats looks through the plan until it finds the first root and then stops. So for multi-store scripts that have multiple roots in their plans, this does not work. Two, Hadoop does not return accurate numbers for records written in many cases. I do not know if this is a bug in hadoop or a bug in the output format pig uses when doing multiple stores in one job.

Records and bytes written reported by pig are wrong in a multi-store program

Key: PIG-831
URL: https://issues.apache.org/jira/browse/PIG-831
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor

The stats features checked in as part of PIG-626 (reporting the number of records and bytes written at the end of the query) print wrong values (often but not always 0) when the pig script being run contains more than 1 store.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
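[Editor's note] For concreteness, a minimal multi-store script of the kind the issue describes might look like the following sketch (aliases, schema, and paths are illustrative, not taken from the issue). A plan built from it has two store roots, so stats logic that stops at the first root would miss the second store:

{code}
Logs = load 'input' using PigStorage() as (user: chararray, bytes: long);
Big = filter Logs by bytes > 1000;
store Logs into 'out_all' using PigStorage();
store Big into 'out_big' using PigStorage();
{code}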
[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig
[ https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715680#action_12715680 ]

Amr Awadallah commented on PIG-826:
-----------------------------------

neat.

DISTINCT as Function/Operator rather than statement/operator - High Level Pig
-----------------------------------------------------------------------------

Key: PIG-826
URL: https://issues.apache.org/jira/browse/PIG-826
Project: Pig
Issue Type: New Feature
Reporter: David Ciemiewicz

In SQL, a user would think nothing of doing something like:

{code}
select COUNT(DISTINCT(user)) as user_count,
       COUNT(DISTINCT(country)) as country_count,
       COUNT(DISTINCT(url)) as url_count
from server_logs;
{code}

But in Pig, we'd need to do something like the following. And this is about the most compact version I could come up with.

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);
DistinctUsers = distinct (foreach Logs generate user);
DistinctCountries = distinct (foreach Logs generate country);
DistinctUrls = distinct (foreach Logs generate url);
DistinctUsersCount = foreach (group DistinctUsers all) generate group, COUNT(DistinctUsers) as user_count;
DistinctCountriesCount = foreach (group DistinctCountries all) generate group, COUNT(DistinctCountries) as country_count;
DistinctUrlCount = foreach (group DistinctUrls all) generate group, COUNT(DistinctUrls) as url_count;
AllDistinctCounts = cross DistinctUsersCount, DistinctCountriesCount, DistinctUrlCount;
Report = foreach AllDistinctCounts generate DistinctUsersCount::user_count, DistinctCountriesCount::country_count, DistinctUrlCount::url_count;
store Report into 'log_report' using PigStorage();
{code}

It would be good if there was a higher level version of Pig that permitted the code to be written as:

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);
Report = overall Logs generate
    COUNT(DISTINCT(user)) as user_count,
    COUNT(DISTINCT(country)) as country_count,
    COUNT(DISTINCT(url)) as url_count;
store Report into 'log_report' using PigStorage();
{code}

I do want this in Pig and not as SQL. I'd expect High Level Pig to generate Lower Level Pig.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
[ https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-564:
-------------------------------

Attachment: PIG-564.patch

Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
-----------------------------------------------------------------------------------------------------------------------------

Key: PIG-564
URL: https://issues.apache.org/jira/browse/PIG-564
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
Attachments: PIG-564.patch

Consider the following Pig script which uses parameter substitution:

{code}
%default qual '/user/viraj'
%default mydir 'mydir_myextraqual'
VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
dump VISIT_LOGS;
{code}

If you run the script as:
==
java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param mydir=mydir-myextraqual mypigparamsub.pig
==
you get the following error:
==
2008-12-15 19:49:43,964 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: /user/viraj/mydir does not exist
    at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job terminated with anomalous status FAILED]
    at org.apache.pig.PigServer.openIterator(PigServer.java:389)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
    at org.apache.pig.Main.main(Main.java:306)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    ... 6 more
==

Also tried using: -param mydir='mydir\-myextraqual'

This behavior occurs if the parameter value contains characters such as +, =, ?. A workaround for this behavior is using a param_file which contains param_name=param_value on each line, with the param_value enclosed by quotes. For example: mydir='mydir-myextraqual', and then running the pig script as:

java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param_file myparamfile mypigparamsub.pig

The following issues need to be fixed:
1) In the -param option, if the parameter value contains special characters, it is truncated.
2) In a param_file, if a param_value contains special characters, it should be enclosed in quotes.
3) If 2 is a known issue then it should be documented in http://wiki.apache.org/pig/ParameterSubstitution

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
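[Editor's note] Putting the workaround described above together, the param_file might look like the following config fragment (the file name `myparamfile` and the values are taken from the report; the `qual` line is an added illustration following the same quoting rule):

{code}
qual='/user/viraj'
mydir='mydir-myextraqual'
{code}

With parameter substitution applied, `$qual/$mydir` should then expand to `/user/viraj/mydir-myextraqual`, which is the path the -param form failed to produce.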
[jira] Updated: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
[ https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-564:
-------------------------------

Status: Patch Available (was: Open)

Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
-----------------------------------------------------------------------------------------------------------------------------

Key: PIG-564
URL: https://issues.apache.org/jira/browse/PIG-564
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
Attachments: PIG-564.patch

Consider the following Pig script which uses parameter substitution:

{code}
%default qual '/user/viraj'
%default mydir 'mydir_myextraqual'
VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
dump VISIT_LOGS;
{code}

If you run the script as:
==
java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param mydir=mydir-myextraqual mypigparamsub.pig
==
you get the following error:
==
2008-12-15 19:49:43,964 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: /user/viraj/mydir does not exist
    at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job terminated with anomalous status FAILED]
    at org.apache.pig.PigServer.openIterator(PigServer.java:389)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
    at org.apache.pig.Main.main(Main.java:306)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    ... 6 more
==

Also tried using: -param mydir='mydir\-myextraqual'

This behavior occurs if the parameter value contains characters such as +, =, ?. A workaround for this behavior is using a param_file which contains param_name=param_value on each line, with the param_value enclosed by quotes. For example: mydir='mydir-myextraqual', and then running the pig script as:

java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param_file myparamfile mypigparamsub.pig

The following issues need to be fixed:
1) In the -param option, if the parameter value contains special characters, it is truncated.
2) In a param_file, if a param_value contains special characters, it should be enclosed in quotes.
3) If 2 is a known issue then it should be documented in http://wiki.apache.org/pig/ParameterSubstitution

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
[ https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-830:
---------------------------

Attachment: TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt

Log file for failing unit test.

Port Apache Log parsing piggybank contrib to Pig 0.2

Key: PIG-830
URL: https://issues.apache.org/jira/browse/PIG-830
Project: Pig
Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
Attachments: pig-830.patch, TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt

The piggybank contribs (pig-472, pig-473, pig-474, pig-476, pig-486, pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was merged in. They should be updated to work with the current APIs and added back into trunk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-809) number of input lines it processed, number of output lines it produced for PIG job
[ https://issues.apache.org/jira/browse/PIG-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715709#action_12715709 ]

Alan Gates commented on PIG-809:
--------------------------------

Sorry, I referenced the wrong jira in the previous comment. I meant PIG-626.

number of input lines it processed, number of output lines it produced for PIG job
----------------------------------------------------------------------------------

Key: PIG-809
URL: https://issues.apache.org/jira/browse/PIG-809
Project: Pig
Issue Type: Improvement
Components: impl
Environment: Linux
Reporter: Supreeth

Excerpt from the mail conversation:

It will be a great addition to Pig. Hadoop currently provides all these counters. All Pig has to do is to add them up for all Hadoop jobs in the script, and emit them at the end of the script. File a jira?
- Milind

On 5/13/09 8:16 AM, Supreeth Hosur Nagesh Rao supre...@yahoo-inc.com wrote:

Hi Olga,
With every PIG job is there any way for us to trap into the operational stats of that job, like the number of input lines it processed and the number of output lines it produced? I don't want to have a separate PIG script to do the same, as it may mean additional parsing, so is there such a stat? If not, can that be provided and exposed as a config parameter?
-Supreeth

This will be a great feature to have for our processing.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
[ https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-830:
---------------------------

Status: Open (was: Patch Available)

When I run the unit tests I get a failure in TestMyRegExLoader. I'll attach the log file.

Port Apache Log parsing piggybank contrib to Pig 0.2

Key: PIG-830
URL: https://issues.apache.org/jira/browse/PIG-830
Project: Pig
Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
Attachments: pig-830.patch, TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt

The piggybank contribs (pig-472, pig-473, pig-474, pig-476, pig-486, pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was merged in. They should be updated to work with the current APIs and added back into trunk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
[ https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-830:
----------------------------------

Attachment: pig-830-v2.patch

Sorry about that. New version attached, passes the test this time.

Port Apache Log parsing piggybank contrib to Pig 0.2

Key: PIG-830
URL: https://issues.apache.org/jira/browse/PIG-830
Project: Pig
Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
Attachments: pig-830-v2.patch, pig-830.patch, TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt

The piggybank contribs (pig-472, pig-473, pig-474, pig-476, pig-486, pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was merged in. They should be updated to work with the current APIs and added back into trunk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig
[ https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715726#action_12715726 ]

David Ciemiewicz commented on PIG-826:
--------------------------------------

Alan, thanks! But what if I want to do the following?

{code}
foreach Grouped {
    dcountryurl = distinct Logs.(country,url);
    generate COUNT(dcountryurl);
};
{code}

Projecting multiple aliases doesn't seem to work. I also tried the following and it doesn't work either.

{code}
foreach Grouped {
    dcountryurl = distinct Logs.country, Logs.url;
    generate COUNT(dcountryurl);
};
{code}

DISTINCT as Function/Operator rather than statement/operator - High Level Pig
-----------------------------------------------------------------------------

Key: PIG-826
URL: https://issues.apache.org/jira/browse/PIG-826
Project: Pig
Issue Type: New Feature
Reporter: David Ciemiewicz

In SQL, a user would think nothing of doing something like:

{code}
select COUNT(DISTINCT(user)) as user_count,
       COUNT(DISTINCT(country)) as country_count,
       COUNT(DISTINCT(url)) as url_count
from server_logs;
{code}

But in Pig, we'd need to do something like the following. And this is about the most compact version I could come up with.

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);
DistinctUsers = distinct (foreach Logs generate user);
DistinctCountries = distinct (foreach Logs generate country);
DistinctUrls = distinct (foreach Logs generate url);
DistinctUsersCount = foreach (group DistinctUsers all) generate group, COUNT(DistinctUsers) as user_count;
DistinctCountriesCount = foreach (group DistinctCountries all) generate group, COUNT(DistinctCountries) as country_count;
DistinctUrlCount = foreach (group DistinctUrls all) generate group, COUNT(DistinctUrls) as url_count;
AllDistinctCounts = cross DistinctUsersCount, DistinctCountriesCount, DistinctUrlCount;
Report = foreach AllDistinctCounts generate DistinctUsersCount::user_count, DistinctCountriesCount::country_count, DistinctUrlCount::url_count;
store Report into 'log_report' using PigStorage();
{code}

It would be good if there was a higher level version of Pig that permitted the code to be written as:

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);
Report = overall Logs generate
    COUNT(DISTINCT(user)) as user_count,
    COUNT(DISTINCT(country)) as country_count,
    COUNT(DISTINCT(url)) as url_count;
store Report into 'log_report' using PigStorage();
{code}

I do want this in Pig and not as SQL. I'd expect High Level Pig to generate Lower Level Pig.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
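[Editor's note] One way to count distinct (country, url) pairs without the nested multi-column projection that fails above is to pre-project the pair into its own relation before applying distinct, following the same non-nested pattern already used in the issue description. A sketch only, not verified against Pig 0.2 syntax:

{code}
Pairs = foreach Logs generate country, url;
DPairs = distinct Pairs;
DPairCount = foreach (group DPairs all) generate COUNT(DPairs) as countryurl_count;
{code}

This trades the nested foreach for a separate relation per distinct combination, at the cost of an extra map-reduce pass.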
[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
[ https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-830:
----------------------------------

Status: Patch Available (was: Open)

Port Apache Log parsing piggybank contrib to Pig 0.2

Key: PIG-830
URL: https://issues.apache.org/jira/browse/PIG-830
Project: Pig
Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
Attachments: pig-830-v2.patch, pig-830.patch, TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt

The piggybank contribs (pig-472, pig-473, pig-474, pig-476, pig-486, pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was merged in. They should be updated to work with the current APIs and added back into trunk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Pig-Patch-minerva.apache.org #68
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/ -- [...truncated 91202 lines...] [exec] [junit] 09/06/02 16:37:14 INFO dfs.DataNode: PacketResponder 1 for block blk_-396386958455995109_1011 terminating [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:52766 is added to blk_-396386958455995109_1011 size 6 [exec] [junit] 09/06/02 16:37:14 INFO dfs.DataNode: Received block blk_-396386958455995109_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/06/02 16:37:14 INFO dfs.DataNode: PacketResponder 2 for block blk_-396386958455995109_1011 terminating [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48955 is added to blk_-396386958455995109_1011 size 6 [exec] [junit] 09/06/02 16:37:14 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:54255 [exec] [junit] 09/06/02 16:37:14 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:43852 [exec] [junit] 09/06/02 16:37:14 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1 [exec] [junit] 09/06/02 16:37:14 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1 [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* ask 127.0.0.1:48955 to delete blk_-1150111323764591607_1005 blk_1008659681632345014_1006 [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* ask 127.0.0.1:52766 to delete blk_-8809735407422622866_1004 [exec] [junit] 09/06/02 16:37:15 WARN dfs.DataNode: Unexpected error trying to delete block blk_-8809735407422622866_1004. BlockInfo not found in volumeMap. [exec] [junit] 09/06/02 16:37:15 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks. 
[exec] [junit] at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146) [exec] [junit] at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793) [exec] [junit] at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663) [exec] [junit] at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888) [exec] [junit] at java.lang.Thread.run(Thread.java:619) [exec] [junit] [exec] [junit] 09/06/02 16:37:15 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/06/02 16:37:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200906021636_0002/job.jar. blk_-9174002834871825284_1012 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Receiving block blk_-9174002834871825284_1012 src: /127.0.0.1:57456 dest: /127.0.0.1:48955 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Receiving block blk_-9174002834871825284_1012 src: /127.0.0.1:57970 dest: /127.0.0.1:52766 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Receiving block blk_-9174002834871825284_1012 src: /127.0.0.1:45566 dest: /127.0.0.1:40635 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Received block blk_-9174002834871825284_1012 of size 1411482 from /127.0.0.1 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: PacketResponder 0 for block blk_-9174002834871825284_1012 terminating [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40635 is added to blk_-9174002834871825284_1012 size 1411482 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Received block blk_-9174002834871825284_1012 of size 1411482 from /127.0.0.1 [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:52766 is added to blk_-9174002834871825284_1012 size 1411482 
[exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: PacketResponder 1 for block blk_-9174002834871825284_1012 terminating [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Received block blk_-9174002834871825284_1012 of size 1411482 from /127.0.0.1 [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48955 is added to blk_-9174002834871825284_1012 size 1411482 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: PacketResponder 2 for block blk_-9174002834871825284_1012 terminating [exec] [junit] 09/06/02 16:37:15 INFO fs.FSNamesystem: Increasing replication for file /tmp/hadoop-hudson/mapred/system/job_200906021636_0002/job.jar. New replication is 2
[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
[ https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715742#action_12715742 ] Hadoop QA commented on PIG-564:
---
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12409702/PIG-564.patch against trunk revision 780722.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 24 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 159 release audit warnings (more than the trunk's current 156 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/console

This message is automatically generated.
Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
---
Key: PIG-564
URL: https://issues.apache.org/jira/browse/PIG-564
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
Attachments: PIG-564.patch

Consider the following Pig script, which uses parameter substitution:

{code}
%default qual '/user/viraj'
%default mydir 'mydir_myextraqual'
VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
dump VISIT_LOGS;
{code}

If you run the script as:

==
java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param mydir=mydir-myextraqual mypigparamsub.pig
==

you get the following error:

==
2008-12-15 19:49:43,964 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: /user/viraj/mydir does not exist
    at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job terminated with anomalous status FAILED]
    at org.apache.pig.PigServer.openIterator(PigServer.java:389)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
    at org.apache.pig.Main.main(Main.java:306)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    ... 6 more
==

Also tried using: -param mydir='mydir\-myextraqual'

This behavior occurs if the parameter value contains characters such as +, =, or ?. A workaround for this behavior is using a param_file which contains param_name=param_value on each line, with the param_value enclosed by quotes. For example:
[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig
[ https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715745#action_12715745 ] Mridul Muralidharan commented on PIG-826:

This would be a welcome change! Another use case that would get enabled (which, IMO, can't be done easily now) is to use DISTINCT in a filter, like:

B = FILTER A by COUNT(DISTINCT($1)) > 1;

DISTINCT as Function/Operator rather than statement/operator - High Level Pig
---
Key: PIG-826
URL: https://issues.apache.org/jira/browse/PIG-826
Project: Pig
Issue Type: New Feature
Reporter: David Ciemiewicz

In SQL, a user would think nothing of doing something like:

{code}
select
    COUNT(DISTINCT(user)) as user_count,
    COUNT(DISTINCT(country)) as country_count,
    COUNT(DISTINCT(url)) as url_count
from server_logs;
{code}

But in Pig, we'd need to do something like the following. And this is about the most compact version I could come up with.

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);

DistinctUsers = distinct (foreach Logs generate user);
DistinctCountries = distinct (foreach Logs generate country);
DistinctUrls = distinct (foreach Logs generate url);

DistinctUsersCount = foreach (group DistinctUsers all) generate group, COUNT(DistinctUsers) as user_count;
DistinctCountriesCount = foreach (group DistinctCountries all) generate group, COUNT(DistinctCountries) as country_count;
DistinctUrlCount = foreach (group DistinctUrls all) generate group, COUNT(DistinctUrls) as url_count;

AllDistinctCounts = cross DistinctUsersCount, DistinctCountriesCount, DistinctUrlCount;
Report = foreach AllDistinctCounts generate DistinctUsersCount::user_count, DistinctCountriesCount::country_count, DistinctUrlCount::url_count;

store Report into 'log_report' using PigStorage();
{code}

It would be good if there was a higher-level version of Pig that permitted the code to be written as:

{code}
Logs = load 'log' using PigStorage() as (user: chararray, country: chararray, url: chararray);

Report = overall Logs generate
    COUNT(DISTINCT(user)) as user_count,
    COUNT(DISTINCT(country)) as country_count,
    COUNT(DISTINCT(url)) as url_count;

store Report into 'log_report' using PigStorage();
{code}

I do want this in Pig and not as SQL. I'd expect High Level Pig to generate Lower Level Pig.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
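The semantics the proposal asks for can be sketched outside Pig: a single pass over the log tuples that builds one distinct set per column, instead of the three separate distinct/group/cross pipelines the low-level script needs. This is an illustrative Python sketch of the intended result, not Pig code or Pig internals.

```python
# Sketch (assumption: logs is an iterable of (user, country, url) tuples).
# Computes what the proposed COUNT(DISTINCT(...)) report would produce:
# one row of distinct counts, in a single pass over the data.
def distinct_counts(logs):
    users, countries, urls = set(), set(), set()
    for user, country, url in logs:
        users.add(user)
        countries.add(country)
        urls.add(url)
    # One output row: (user_count, country_count, url_count)
    return len(users), len(countries), len(urls)
```

In a MapReduce setting each set would of course be a distributed distinct-aggregate rather than an in-memory set, which is exactly why the low-level plan needs a group/COUNT per column; the point here is only the surface semantics.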
[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
[ https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715764#action_12715764 ] Alan Gates commented on PIG-564: Questions/comments on the patch. 1) Why did output1.pig change to look exactly like the new input5.pig? It seems like output1.pig shouldn't have changed. 2) A comment in the javacc files on how OTHER and IDENTIFIER interact in the pattern matching might be helpful, as it isn't immediately obvious (at least to me :) ). As long as 1 is ok, then +1. Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,' --- Key: PIG-564 URL: https://issues.apache.org/jira/browse/PIG-564 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Olga Natkovich Attachments: PIG-564.patch Consider the following Pig script which uses parameter substitution {code} %default qual '/user/viraj' %default mydir 'mydir_myextraqual' VISIT_LOGS = load '$qual/$mydir' as (a,b,c); dump VISIT_LOGS; {code} If you run the script as: == java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param mydir=mydir-myextraqual mypigparamsub.pig == You get the following error: == 2008-12-15 19:49:43,964 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: /user/viraj/mydir does not exist at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109) at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59) at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370) at 
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job terminated with anomalous status FAILED] at org.apache.pig.PigServer.openIterator(PigServer.java:389) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64) at org.apache.pig.Main.main(Main.java:306) Caused by: java.io.IOException: Job terminated with anomalous status FAILED ... 6 more ==

Also tried using: -param mydir='mydir\-myextraqual'

This behavior occurs if the parameter value contains characters such as +, =, or ?. A workaround for this behavior is using a param_file which contains param_name=param_value on each line, with the param_value enclosed by quotes. For example:

mydir='mydir-myextraqual'

and then running the Pig script as:

java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param_file myparamfile mypigparamsub.pig

The following issues need to be fixed:
1) In the -param option, if the parameter value contains special characters, it is truncated.
2) In a param_file, if a param_value contains special characters, it should be enclosed in quotes.
3) If 2 is a known issue, then it should be documented in http://wiki.apache.org/pig/ParameterSubstitution

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
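The truncation described in issue 1 is consistent with a parameter parser that only accepts word characters in an unquoted value, so it stops at the first `-`, `+`, `=`, or `?`. The following Python sketch is hypothetical (Pig's real preprocessor is javacc-based and differs); it only illustrates the failure mode and the obvious fix of taking everything after the first `=`:

```python
import re

# Hypothetical sketch of the bug described above: if the -param value is
# tokenized as word characters only, 'mydir-myextraqual' truncates to 'mydir'.
def parse_param(arg):
    m = re.match(r"(\w+)=(\w+)", arg)  # value match stops at '-', '+', '?'...
    return (m.group(1), m.group(2)) if m else None

# Sketch of the fix: split on the first '=' and keep the rest of the
# argument verbatim (stripping optional surrounding quotes).
def parse_param_fixed(arg):
    name, _, value = arg.partition("=")
    return (name, value.strip("'"))
```

For example, `parse_param("mydir=mydir-myextraqual")` yields the truncated `("mydir", "mydir")`, while `parse_param_fixed` preserves the full value, matching the quoted-param_file workaround above.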
Hudson build is back to normal: Pig-Patch-minerva.apache.org #69
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/changes
[jira] Commented: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
[ https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715765#action_12715765 ] Hadoop QA commented on PIG-830:
---
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12409709/pig-830-v2.patch against trunk revision 781206.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 27 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/console

This message is automatically generated.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
[ https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715767#action_12715767 ] Olga Natkovich commented on PIG-564:

Alan, thanks for the review.

(1) output1.pig is a generated file. I think it was checked in initially by mistake. Its content is irrelevant.

(2) I might have to resubmit the patch anyway if I figure out the extra warnings (the link is broken at the moment). If I have to do that, I will also add comments.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'
[ https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715792#action_12715792 ] Giridharan Kesavan commented on PIG-564:

Use this link for the releaseaudit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt

I've fixed the test-patch scripts for the broken link.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2
[ https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-830:
--
Attachment: pig-830-v3.patch

As I experimented with these classes, I realized that the naive implementation, which used a regex to capture strings and return a tuple of strings, is not appropriate for the typed version of Pig, since one may want to cast various fields into integers, etc. The attached version returns a tuple of DataByteArrays instead.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
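The design point in Dmitriy's comment, capture fields as raw bytes with a regex and defer typing to the consumer, can be sketched as follows. This is an illustrative Python sketch, not the attached patch (which implements a Pig loader in Java); the Apache common-log regex and field layout here are assumptions for the example.

```python
import re

# Sketch of the "return raw bytes, cast later" design: fields come back as
# bytes (mirroring Pig's DataByteArray) rather than pre-parsed strings, so
# the caller decides which fields to cast, e.g. the status code to int.
COMMON_LOG = re.compile(
    rb'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d+) (\S+)'
)

def parse_line(line: bytes):
    m = COMMON_LOG.match(line)
    return m.groups() if m else None  # tuple of raw bytes fields, untyped
```

For a line like `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 2326`, every field is returned as bytes, and only a consumer that needs the status as a number pays for the `int(...)` cast, which is the rationale for DataByteArray over chararray in the patch.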