[jira] Commented: (PIG-6) Addition of Hbase Storage Option In Load/Store Statement

2009-06-02 Thread Amr Awadallah (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715385#action_12715385
 ] 

Amr Awadallah commented on PIG-6:
-

Any progress on this?

 Addition of Hbase Storage Option In Load/Store Statement
 

 Key: PIG-6
 URL: https://issues.apache.org/jira/browse/PIG-6
 Project: Pig
  Issue Type: New Feature
 Environment: all environments
Reporter: Edward J. Yoon
 Fix For: 0.2.0

 Attachments: hbase-0.18.1-test.jar, hbase-0.18.1.jar, m34813f5.txt, 
 PIG-6.patch, PIG-6_V01.patch


 It needs to be able to load a full table from HBase. (Maybe difficult? I'm 
 not sure yet.)
 Also, as described below, it needs to compose an abstract 2D table containing 
 only certain data, filtered from the HBase array structure using an arbitrary 
 delimited query. 
 {code}
 A = LOAD table('hbase_table');
 or
 B = LOAD table('hbase_table') Using HbaseQuery('Query-delimited by attributes 
  timestamp') as (f1, f2[, f3]);
 {code}
 Once testing is done on my local machines, 
 I will clarify the grammar and give more examples to help explain 
 more storage options. 
 Any advice is welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-Patch-minerva.apache.org #66

2009-06-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/

--
[...truncated 91192 lines...]
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 0 
for block blk_-1934744134976204728_1010 terminating
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:39568 is added to 
blk_-1934744134976204728_1010 size 6
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block 
blk_-1934744134976204728_1010 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54659 is added to 
blk_-1934744134976204728_1010 size 6
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 1 
for block blk_-1934744134976204728_1010 terminating
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block 
blk_-1934744134976204728_1010 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 2 
for block blk_-1934744134976204728_1010 terminating
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:58588 is added to 
blk_-1934744134976204728_1010 size 6
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input2.txt. blk_4153181122005715837_1011
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Receiving block 
blk_4153181122005715837_1011 src: /127.0.0.1:34228 dest: /127.0.0.1:58588
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Receiving block 
blk_4153181122005715837_1011 src: /127.0.0.1:52486 dest: /127.0.0.1:54659
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Receiving block 
blk_4153181122005715837_1011 src: /127.0.0.1:45265 dest: /127.0.0.1:39568
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block 
blk_4153181122005715837_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:39568 is added to 
blk_4153181122005715837_1011 size 6
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 0 
for block blk_4153181122005715837_1011 terminating
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block 
blk_4153181122005715837_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54659 is added to 
blk_4153181122005715837_1011 size 6
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 1 
for block blk_4153181122005715837_1011 terminating
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:58588 is added to 
blk_4153181122005715837_1011 size 6
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Received block 
blk_4153181122005715837_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: PacketResponder 2 
for block blk_4153181122005715837_1011 terminating
 [exec] [junit] 09/06/02 01:05:52 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:59653
 [exec] [junit] 09/06/02 01:05:52 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:47970
 [exec] [junit] 09/06/02 01:05:52 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/02 01:05:52 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Deleting block 
blk_4578647973586690435_1006 file dfs/data/data1/current/blk_4578647973586690435
 [exec] [junit] 09/06/02 01:05:52 INFO dfs.DataNode: Deleting block 
blk_5949797326287727563_1005 file dfs/data/data2/current/blk_5949797326287727563
 [exec] [junit] 09/06/02 01:05:53 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/02 01:05:53 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/02 01:05:53 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906020105_0002/job.jar. 
blk_456729402385173_1012
 [exec] [junit] 09/06/02 01:05:53 INFO dfs.DataNode: Receiving block 
blk_456729402385173_1012 src: /127.0.0.1:50300 dest: /127.0.0.1:42011
 [exec] [junit] 09/06/02 01:05:53 INFO dfs.DataNode: Receiving block 
blk_456729402385173_1012 src: /127.0.0.1:52489 dest: /127.0.0.1:54659
 [exec] [junit] 09/06/02 01:05:53 INFO dfs.DataNode: Receiving block 

[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715406#action_12715406
 ] 

Hadoop QA commented on PIG-796:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409612/796.patch
  against trunk revision 780722.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 225 javac compiler warnings (more 
than the trunk's current 224 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/console

This message is automatically generated.

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: 796.patch, pig-796.patch







[jira] Created: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Dmitriy V. Ryaboy (JIRA)
Port Apache Log parsing piggybank contrib to Pig 0.2


 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor


The piggybank contribs (pig-472, pig-473,  pig-474, pig-476, pig-486, pig-487, 
pig-488, pig-503, pig-509) got dropped after the types branch was merged in.
They should be updated to work with the current APIs and added back into trunk.




[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig

2009-06-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715639#action_12715639
 ] 

Alan Gates commented on PIG-826:


It can be done like this:

{code}
Logs = load 'log' using PigStorage()
as ( user: chararray, country: chararray, url: chararray);

Grouped = group Logs all;
foreach Grouped {
   duser = distinct Logs.user;
   dcountry = distinct Logs.country;
   durl = distinct Logs.url;
   generate COUNT(duser), COUNT(dcountry), COUNT(durl);
};
{code}
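
For readers less familiar with nested distinct, here is a rough Python model of what the script above computes: all rows land in one group, each column is de-duplicated on its own, and the sizes are reported. The sample rows and the helper name `distinct_counts` are made up for illustration; this is a sketch of the semantics, not of how Pig executes it, and Pig's null handling in COUNT is not modelled.

```python
# Toy model of "group ... all" followed by nested DISTINCT and COUNT.
# The rows and the helper name are hypothetical, chosen only for illustration.
logs = [
    ("alice", "US", "/index.html"),
    ("bob", "US", "/index.html"),
    ("alice", "CA", "/about.html"),
    ("carol", "US", "/index.html"),
]


def distinct_counts(rows):
    """Count the distinct values in each column of rows."""
    if not rows:
        return ()
    # zip(*rows) turns the list of rows into one sequence per column.
    return tuple(len(set(column)) for column in zip(*rows))


user_count, country_count, url_count = distinct_counts(logs)
print(user_count, country_count, url_count)  # prints "3 2 2"
```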

 DISTINCT as Function/Operator rather than statement/operator - High Level 
 Pig
 ---

 Key: PIG-826
 URL: https://issues.apache.org/jira/browse/PIG-826
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz

 In SQL, a user would think nothing of doing something like:
 {code}
 select
 COUNT(DISTINCT(user)) as user_count,
 COUNT(DISTINCT(country)) as country_count,
 COUNT(DISTINCT(url)) as url_count
 from
 server_logs;
 {code}
 But in Pig, we'd need to do something like the following.  And this is about 
 the most
 compact version I could come up with.
 {code}
 Logs = load 'log' using PigStorage()
 as ( user: chararray, country: chararray, url: chararray);
 DistinctUsers = distinct (foreach Logs generate user);
 DistinctCountries = distinct (foreach Logs generate country);
 DistinctUrls = distinct (foreach Logs generate url);
 DistinctUsersCount = foreach (group DistinctUsers all) generate
 group, COUNT(DistinctUsers) as user_count;
 DistinctCountriesCount = foreach (group DistinctCountries all) generate
 group, COUNT(DistinctCountries) as country_count;
 DistinctUrlCount = foreach (group DistinctUrls all) generate
 group, COUNT(DistinctUrls) as url_count;
 AllDistinctCounts = cross
 DistinctUsersCount, DistinctCountriesCount, DistinctUrlCount;
 Report = foreach AllDistinctCounts generate
 DistinctUsersCount::user_count,
 DistinctCountriesCount::country_count,
 DistinctUrlCount::url_count;
 store Report into 'log_report' using PigStorage();
 {code}
 It would be good if there was a higher level version of Pig that permitted 
 code to be written as:
 {code}
 Logs = load 'log' using PigStorage()
 as ( user: chararray, country: chararray, url: chararray);
 Report = overall Logs generate
 COUNT(DISTINCT(user)) as user_count,
 COUNT(DISTINCT(country)) as country_count,
 COUNT(DISTINCT(url)) as url_count;
 store Report into 'log_report' using PigStorage();
 {code}
 I do want this in Pig and not as SQL.  I'd expect High Level Pig to generate 
 Lower Level Pig.




[jira] Commented: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715651#action_12715651
 ] 

Hadoop QA commented on PIG-830:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409682/pig-830.patch
  against trunk revision 780722.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/67/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/67/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/67/console

This message is automatically generated.

 Port Apache Log parsing piggybank contrib to Pig 0.2
 

 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig-830.patch





[jira] Created: (PIG-831) Records and bytes written reported by pig are wrong in a multi-store program

2009-06-02 Thread Alan Gates (JIRA)
Records and bytes written reported by pig are wrong in a multi-store program


 Key: PIG-831
 URL: https://issues.apache.org/jira/browse/PIG-831
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor


The stats features checked in as part of PIG-626 (reporting the number of 
records and bytes written at the end of the query) print wrong values (often 
but not always 0) when the pig script being run contains more than 1 store.




[jira] Commented: (PIG-831) Records and bytes written reported by pig are wrong in a multi-store program

2009-06-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715657#action_12715657
 ] 

Alan Gates commented on PIG-831:


There are a couple of issues going on here.

One, PigStats looks through the plan until it finds the first root and then 
stops.  So for multi-store scripts that have multiple roots in their plans, 
this does not work.

Two, Hadoop does not return accurate numbers for records written in many cases. 
 I do not know if this is a bug in hadoop or a bug in the output format pig 
uses when doing multiple stores in one job.
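
To make the first point concrete, here is a toy Python model. It is not PigStats code: the plan is reduced to a list of root names plus a hypothetical map of records written per root, and both function names are invented. It only shows why stopping at the first root under-reports a script with more than one store.

```python
# Toy model only: "roots" and "written" are hypothetical stand-ins for a plan's
# roots and the record counts attributed to each of them.

def records_first_root(roots, written):
    """Mimic the described behaviour: report the first root found, then stop."""
    for root in roots:
        return {root: written.get(root, 0)}
    return {}


def records_all_roots(roots, written):
    """Visit every root so that each store is reported."""
    return {root: written.get(root, 0) for root in roots}


roots = ["store_a", "store_b"]
written = {"store_a": 10, "store_b": 25}

print(records_first_root(roots, written))  # only store_a is reported
print(records_all_roots(roots, written))   # both stores are reported
```

The second issue (inaccurate numbers coming back from Hadoop) is outside what this sketch models.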

 Records and bytes written reported by pig are wrong in a multi-store program
 

 Key: PIG-831
 URL: https://issues.apache.org/jira/browse/PIG-831
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor




[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig

2009-06-02 Thread Amr Awadallah (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715680#action_12715680
 ] 

Amr Awadallah commented on PIG-826:
---

neat.

 DISTINCT as Function/Operator rather than statement/operator - High Level 
 Pig
 ---

 Key: PIG-826
 URL: https://issues.apache.org/jira/browse/PIG-826
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz




[jira] Updated: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-564:
---

Attachment: PIG-564.patch

 Parameter Substitution using -param option does not seem to work when 
 parameters contain special characters such as +,=,-,?,' 
 ---

 Key: PIG-564
 URL: https://issues.apache.org/jira/browse/PIG-564
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
 Attachments: PIG-564.patch


 Consider the following Pig script which uses parameter substitution
 {code}
 %default qual '/user/viraj'
 %default mydir 'mydir_myextraqual'
 VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
 dump VISIT_LOGS;
 {code}
 If you run the script as:
 ==
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param mydir=mydir-myextraqual mypigparamsub.pig
 ==
 You get the following error:
 ==
 2008-12-15 19:49:43,964 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - java.io.IOException: /user/viraj/mydir does not exist
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
 at java.lang.Thread.run(Thread.java:619)
 java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job 
 terminated with anomalous status FAILED]
 at org.apache.pig.PigServer.openIterator(PigServer.java:389)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 ... 6 more
 ==
 Also tried using:  -param mydir='mydir\-myextraqual'
 This behavior occurs if the parameter value contains characters such as +,=, 
 ?. 
 A workaround for this behavior is using a param_file which contains 
 param_name=param_value on each line, with the param_value enclosed by 
 quotes. For example:
 mydir='mydir-myextraqual' and then running the pig script as:
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param_file myparamfile mypigparamsub.pig
 The following issues need to be fixed:
 1) In the -param option, if the parameter value contains special characters, 
 it is truncated
 2) In param_file, if param_value contains special characters, it should be 
 enclosed in quotes
 3) If 2 is a known issue then it should be documented in 
 http://wiki.apache.org/pig/ParameterSubstitution
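
 The report does not say where the truncation happens, so the following Python sketch is only an assumption about one way such a symptom can arise: a parser that accepts nothing but word characters in the value stops at the first '-', which would turn mydir-myextraqual into mydir, matching the path in the error above. Both function names are hypothetical and neither is Pig's actual parser.

```python
import re


def parse_word_chars_only(arg):
    """Hypothetical lossy parser: the value may contain only word characters."""
    match = re.match(r"(\w+)=(\w+)", arg)
    return (match.group(1), match.group(2)) if match else None


def parse_split_once(arg):
    """Split on the first '=' only and keep the rest of the value untouched."""
    name, sep, value = arg.partition("=")
    if not sep or not name:
        raise ValueError("expected name=value, got %r" % arg)
    return name, value


print(parse_word_chars_only("mydir=mydir-myextraqual"))  # value is cut at '-'
print(parse_split_once("mydir=mydir-myextraqual"))       # value is kept whole
```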




[jira] Updated: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-564:
---

Status: Patch Available  (was: Open)

 Parameter Substitution using -param option does not seem to work when 
 parameters contain special characters such as +,=,-,?,' 
 ---

 Key: PIG-564
 URL: https://issues.apache.org/jira/browse/PIG-564
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
 Attachments: PIG-564.patch





[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-830:
---

Attachment: TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt

Log file for failing unit test.

 Port Apache Log parsing piggybank contrib to Pig 0.2
 

 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig-830.patch, 
 TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt





[jira] Commented: (PIG-809) number of input lines it processed, number of output lines it produced for PIG job

2009-06-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715709#action_12715709
 ] 

Alan Gates commented on PIG-809:


Sorry, I referenced the wrong jira in the previous comment.  I meant PIG-626.

 number of input lines it processed, number of output lines it produced for 
 PIG job
 --

 Key: PIG-809
 URL: https://issues.apache.org/jira/browse/PIG-809
 Project: Pig
  Issue Type: Improvement
  Components: impl
 Environment: Linux
Reporter: Supreeth

 Excerpt from the mail conversation.
 It will be a great addition to Pig. Hadoop currently provides all these
 counters. All Pig has to do is to add them up for all Hadoop jobs in the
 script, and emit them at the end of the script. File a jira ?
 - Milind
 On 5/13/09 8:16 AM, Supreeth Hosur Nagesh Rao supre...@yahoo-inc.com
 wrote:
   Hi Olga
   
   With every PIG job is there any way for us to trap into the operational
   stats of that job, like number of input lines it processed, number of
   output lines it produced?
   
   I don't want to have a separate PIG script to do the same, as it may mean
   additional parsing, so is there such a stat? If not, can that be
   provided and exposed as a config parameter?
   
   -Supreeth
 This will be a great feature to have for our processing.
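
 As a rough illustration of the "add them up for all Hadoop jobs" idea, here is a small Python sketch. The counter names and the per-job dictionaries are invented for the example and do not correspond to Hadoop's real counter names. Note that a plain sum over chained jobs also counts intermediate records, so a real implementation would probably need to pick which jobs contribute to the input and output totals.

```python
from collections import Counter


def total_counters(jobs):
    """Sum the counters of every job in a script into one set of totals."""
    totals = Counter()
    for job in jobs:
        totals.update(job)  # Counter.update adds counts instead of replacing
    return dict(totals)


# Hypothetical per-job counters for a script that ran two jobs.
jobs = [
    {"input_records": 1000, "output_records": 400},
    {"input_records": 400, "output_records": 50},
]

print(total_counters(jobs))
```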




[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-830:
---

Status: Open  (was: Patch Available)

When I run the unit tests I get a failure in TestMyRegexLoader.  I'll attach 
the log file.

 Port Apache Log parsing piggybank contrib to Pig 0.2
 

 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig-830.patch, 
 TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt





[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-830:
--

Attachment: pig-830-v2.patch

Sorry about that. New version attached, passes the test this time.

 Port Apache Log parsing piggybank contrib to Pig 0.2
 

 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig-830-v2.patch, pig-830.patch, 
 TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt





[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig

2009-06-02 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715726#action_12715726
 ] 

David Ciemiewicz commented on PIG-826:
--

Alan, thanks!  But what if I want to do the following:

{code}
foreach Grouped {
   dcountryurl = distinct Logs.(country,url);
   generate COUNT(dcountryurl);
};
{code}

Projecting multiple aliases doesn't seem to work. I also tried the following 
and it doesn't work either.

{code}
foreach Grouped {
   dcountryurl = distinct Logs.country, Logs.url;
   generate COUNT(dcountryurl);
};
{code}
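
To pin down the result being asked for, here is a small Python sketch of counting distinct (country, url) pairs. It says nothing about which Pig syntax should express this; the rows and the helper name `count_distinct_pairs` are hypothetical.

```python
# Hypothetical sample rows in (user, country, url) order.
logs = [
    ("alice", "US", "/index.html"),
    ("bob", "US", "/about.html"),
    ("carol", "CA", "/index.html"),
    ("alice", "US", "/index.html"),
]


def count_distinct_pairs(rows, first, second):
    """Count the distinct combinations of two columns, given by position."""
    return len({(row[first], row[second]) for row in rows})


# Three distinct pairs here, although there are only two distinct countries
# and two distinct urls, so the pair count cannot be derived from the
# single-column counts.
print(count_distinct_pairs(logs, 1, 2))  # prints 3
```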

 DISTINCT as Function/Operator rather than statement/operator - High Level 
 Pig
 ---

 Key: PIG-826
 URL: https://issues.apache.org/jira/browse/PIG-826
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz

 In SQL, a user would think nothing of doing something like:
 {code}
 select
 COUNT(DISTINCT(user)) as user_count,
 COUNT(DISTINCT(country)) as country_count,
 COUNT(DISTINCT(url) as url_count
 from
 server_logs;
 {code}
 But in Pig, we'd need to do something like the following.  And this is about 
 the most
 compact version I could come up with.
 {code}
 Logs = load 'log' using PigStorage()
 as ( user: chararray, country: chararray, url: chararray);
 DistinctUsers = distinct (foreach Logs generate user);
 DistinctCountries = distinct (foreach Logs generate country);
 DistinctUrls = distinct (foreach Logs generate url);
 DistinctUsersCount = foreach (group DistinctUsers all) generate
 group, COUNT(DistinctUsers) as user_count;
 DistinctCountriesCount = foreach (group DistinctCountries all) generate
 group, COUNT(DistinctCountries) as country_count;
 DistinctUrlCount = foreach (group DistinctUrls all) generate
 group, COUNT(DistinctUrls) as url_count;
 AllDistinctCounts = cross
 DistinctUsersCount, DistinctCountriesCount, DistinctUrlCount;
 Report = foreach AllDistinctCounts generate
 DistinctUsersCount::user_count,
 DistinctCountriesCount::country_count,
 DistinctUrlCount::url_count;
 store Report into 'log_report' using PigStorage();
 {code}
 It would be good if there was a higher level version of Pig that permitted 
 code to be written as:
 {code}
 Logs = load 'log' using PigStorage()
 as ( user: chararray, country: chararray, url: chararray);
 Report = overall Logs generate
 COUNT(DISTINCT(user)) as user_count,
 COUNT(DISTINCT(country)) as country_count,
 COUNT(DISTINCT(url)) as url_count;
 store Report into 'log_report' using PigStorage();
 {code}
 I do want this in Pig and not as SQL.  I'd expect High Level Pig to generate 
 Lower Level Pig.
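
As a rough illustration of what the requested report would contain, here is a small Python sketch. It is not Pig and not a proposed implementation; the function name overall_distinct_counts and the sample rows are assumptions introduced only for this example. It makes one pass over the rows and keeps a set of seen values per column.

```python
def overall_distinct_counts(rows, columns):
    """Return {column + "_count": number of distinct values in that column}."""
    seen = {name: set() for name in columns}
    for row in rows:
        # Pair each column name with the value at the same position in the row.
        for name, value in zip(columns, row):
            seen[name].add(value)
    return {name + "_count": len(values) for name, values in seen.items()}


# Made-up rows in (user, country, url) order.
logs = [
    ("alice", "US", "/a"),
    ("bob", "US", "/a"),
    ("alice", "CA", "/b"),
]
report = overall_distinct_counts(logs, ("user", "country", "url"))
```

A single pass is enough here because the per-column sets are independent, which is the same property that would let a higher-level Pig construct avoid the three separate distinct/group/cross steps.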




[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-830:
--

Status: Patch Available  (was: Open)

 Port Apache Log parsing piggybank contrib to Pig 0.2
 

 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig-830-v2.patch, pig-830.patch, 
 TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt


 The piggybank contribs (pig-472, pig-473,  pig-474, pig-476, pig-486, 
 pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was 
 merged in.
 They should be updated to work with the current APIs and added back into 
 trunk.




Build failed in Hudson: Pig-Patch-minerva.apache.org #68

2009-06-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/

--
[...truncated 91202 lines...]
 [exec] [junit] 09/06/02 16:37:14 INFO dfs.DataNode: PacketResponder 1 
for block blk_-396386958455995109_1011 terminating
 [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:52766 is added to 
blk_-396386958455995109_1011 size 6
 [exec] [junit] 09/06/02 16:37:14 INFO dfs.DataNode: Received block 
blk_-396386958455995109_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/02 16:37:14 INFO dfs.DataNode: PacketResponder 2 
for block blk_-396386958455995109_1011 terminating
 [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48955 is added to 
blk_-396386958455995109_1011 size 6
 [exec] [junit] 09/06/02 16:37:14 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:54255
 [exec] [junit] 09/06/02 16:37:14 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:43852
 [exec] [junit] 09/06/02 16:37:14 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/02 16:37:14 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:48955 to delete  blk_-1150111323764591607_1005 
blk_1008659681632345014_1006
 [exec] [junit] 09/06/02 16:37:14 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:52766 to delete  blk_-8809735407422622866_1004
 [exec] [junit] 09/06/02 16:37:15 WARN dfs.DataNode: Unexpected error 
trying to delete block blk_-8809735407422622866_1004. BlockInfo not found in 
volumeMap.
 [exec] [junit] 09/06/02 16:37:15 WARN dfs.DataNode: 
java.io.IOException: Error in deleting blocks.
 [exec] [junit] at 
org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
 [exec] [junit] at java.lang.Thread.run(Thread.java:619)
 [exec] [junit] 
 [exec] [junit] 09/06/02 16:37:15 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/02 16:37:15 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906021636_0002/job.jar. 
blk_-9174002834871825284_1012
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Receiving block 
blk_-9174002834871825284_1012 src: /127.0.0.1:57456 dest: /127.0.0.1:48955
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Receiving block 
blk_-9174002834871825284_1012 src: /127.0.0.1:57970 dest: /127.0.0.1:52766
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Receiving block 
blk_-9174002834871825284_1012 src: /127.0.0.1:45566 dest: /127.0.0.1:40635
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Received block 
blk_-9174002834871825284_1012 of size 1411482 from /127.0.0.1
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: PacketResponder 0 
for block blk_-9174002834871825284_1012 terminating
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40635 is added to 
blk_-9174002834871825284_1012 size 1411482
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Received block 
blk_-9174002834871825284_1012 of size 1411482 from /127.0.0.1
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:52766 is added to 
blk_-9174002834871825284_1012 size 1411482
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: PacketResponder 1 
for block blk_-9174002834871825284_1012 terminating
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: Received block 
blk_-9174002834871825284_1012 of size 1411482 from /127.0.0.1
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48955 is added to 
blk_-9174002834871825284_1012 size 1411482
 [exec] [junit] 09/06/02 16:37:15 INFO dfs.DataNode: PacketResponder 2 
for block blk_-9174002834871825284_1012 terminating
 [exec] [junit] 09/06/02 16:37:15 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200906021636_0002/job.jar. New replication 
is 2
 

[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715742#action_12715742
 ] 

Hadoop QA commented on PIG-564:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409702/PIG-564.patch
  against trunk revision 780722.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 24 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 159 release audit warnings 
(more than the trunk's current 156 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/console

This message is automatically generated.

 Parameter Substitution using -param option does not seem to work when 
 parameters contain special characters such as +,=,-,?,' 
 ---

 Key: PIG-564
 URL: https://issues.apache.org/jira/browse/PIG-564
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
 Attachments: PIG-564.patch


 Consider the following Pig script which uses parameter substitution
 {code}
 %default qual '/user/viraj'
 %default mydir 'mydir_myextraqual'
 VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
 dump VISIT_LOGS;
 {code}
 If you run the script as:
 ==
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param mydir=mydir-myextraqual mypigparamsub.pig
 ==
 You get the following error:
 ==
 2008-12-15 19:49:43,964 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - java.io.IOException: /user/viraj/mydir does not exist
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
 at java.lang.Thread.run(Thread.java:619)
 java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job 
 terminated with anomalous status FAILED]
 at org.apache.pig.PigServer.openIterator(PigServer.java:389)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 ... 6 more
 ==
 Also tried using:  -param mydir='mydir\-myextraqual'
 This behavior occurs if the parameter value contains characters such as +,=, 
 ?. 
 A workaround for this behavior is using a param_file which contains 
 param_name=param_value on each line, with the param_value enclosed by 
 quotes. For example:
 

[jira] Commented: (PIG-826) DISTINCT as Function/Operator rather than statement/operator - High Level Pig

2009-06-02 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715745#action_12715745
 ] 

Mridul Muralidharan commented on PIG-826:
-

This would be a welcome change!
Another use case this would enable (which, IMO, can't be done easily now) is 
using DISTINCT in a filter, like:

B = FILTER A by COUNT(DISTINCT($1)) > 1;








[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715764#action_12715764
 ] 

Alan Gates commented on PIG-564:


Questions/comments on the patch.  

1) Why did output1.pig change to look exactly like the new input5.pig?  It 
seems like output1.pig shouldn't have changed.

2) A comment in the javacc files on how OTHER and IDENTIFIER interact in the 
pattern matching might be helpful, as it isn't immediately obvious (at least to 
me :) ).

As long as 1 is ok, then +1.

 Parameter Substitution using -param option does not seem to work when 
 parameters contain special characters such as +,=,-,?,' 
 ---

 Key: PIG-564
 URL: https://issues.apache.org/jira/browse/PIG-564
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
 Attachments: PIG-564.patch


 Consider the following Pig script which uses parameter substitution
 {code}
 %default qual '/user/viraj'
 %default mydir 'mydir_myextraqual'
 VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
 dump VISIT_LOGS;
 {code}
 If you run the script as:
 ==
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param mydir=mydir-myextraqual mypigparamsub.pig
 ==
 You get the following error:
 ==
 2008-12-15 19:49:43,964 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - java.io.IOException: /user/viraj/mydir does not exist
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
 at java.lang.Thread.run(Thread.java:619)
 java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job 
 terminated with anomalous status FAILED]
 at org.apache.pig.PigServer.openIterator(PigServer.java:389)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 ... 6 more
 ==
 Also tried using:  -param mydir='mydir\-myextraqual'
 This behavior occurs if the parameter value contains characters such as +,=, 
 ?. 
 A workaround for this behavior is using a param_file which contains 
 param_name=param_value on each line, with the param_value enclosed by 
 quotes. For example:
 mydir='mydir-myextraqual' and then running the pig script as:
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param_file myparamfile mypigparamsub.pig
 The following issues need to be fixed:
 1) With the -param option, a parameter value that contains special characters 
 is truncated
 2) In a param_file, a param_value that contains special characters has to be 
 enclosed in quotes
 3) If 2 is a known issue then it should be documented in 
 http://wiki.apache.org/pig/ParameterSubstitution
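
One plausible way a value gets cut off at a special character is a token pattern that only accepts identifier characters. The Python sketch below illustrates that idea and nothing more: both patterns and the helper read_value are hypothetical, and Pig's real grammar (in its javacc files) is not reproduced here. It is a guess at the class of bug, not a description of the actual cause.

```python
import re

# Hypothetical token patterns, made up for this illustration.
IDENTIFIER_ONLY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
ANY_NON_SPACE = re.compile(r"\S+")


def read_value(pattern, text):
    """Return the leading part of text that the pattern accepts."""
    match = pattern.match(text)
    return match.group(0) if match else ""


# The identifier-only pattern stops at the first '-'.
truncated = read_value(IDENTIFIER_ONLY, "mydir-myextraqual")
# A more permissive pattern keeps the whole value.
whole = read_value(ANY_NON_SPACE, "mydir-myextraqual")
```

With the identifier-only pattern the value stops at "mydir", which would be consistent with the "/user/viraj/mydir does not exist" error shown above.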




Hudson build is back to normal: Pig-Patch-minerva.apache.org #69

2009-06-02 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/changes




[jira] Commented: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715765#action_12715765
 ] 

Hadoop QA commented on PIG-830:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409709/pig-830-v2.patch
  against trunk revision 781206.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/69/console

This message is automatically generated.




[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715767#action_12715767
 ] 

Olga Natkovich commented on PIG-564:


Alan, thanks for the review. 

(1) output1.pig is a generated file. I think it was checked in initially by 
mistake. Its content is irrelevant.
(2) I might have to resubmit a patch anyway if I figure out the extra warnings 
(the link is broken at the moment). If I have to do that, I will also add 
comments.




[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-02 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715792#action_12715792
 ] 

Giridharan Kesavan commented on PIG-564:


Use this link for the release audit warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/68/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt

I've fixed the broken link in the test-patch scripts.




[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-02 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-830:
--

Attachment: pig-830-v3.patch

As I experimented with these classes, I realized that the naive implementation, 
which used a regex to capture strings and returned a tuple of strings, is not 
appropriate for the typed version of Pig, since one may want to cast various 
fields into integers, etc.  The attached version returns a tuple of 
DataByteArrays instead.
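
The idea can be sketched in a few lines of Python, used here only as a stand-in for the Java loader. This is not the code in the patch: the pattern, the sample line, and the helper parse_line are hypothetical. The captured fields are handed back as raw bytes, and the caller decides which ones to cast.

```python
import re

# Hypothetical pattern for a "host status size" line; not the piggybank regex.
LINE = re.compile(rb"(\S+) (\d+) (\d+)")


def parse_line(raw):
    """Return the captured fields as raw bytes, or None if the line does not match."""
    match = LINE.fullmatch(raw)
    return match.groups() if match else None


fields = parse_line(b"example.org 200 5120")
host, status, size = fields
# The cast happens on the caller's side, only for the fields that need it.
status_code = int(status.decode("ascii"))
```

Returning untyped bytes leaves the choice of type to whoever declares the schema, which is the property the comment above is after.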

 Port Apache Log parsing piggybank contrib to Pig 0.2
 

 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig-830-v2.patch, pig-830-v3.patch, pig-830.patch, 
 TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt


 The piggybank contribs (pig-472, pig-473,  pig-474, pig-476, pig-486, 
 pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was 
 merged in.
 They should be updated to work with the current APIs and added back into 
 trunk.
