[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716246#action_12716246
 ] 

Hudson commented on PIG-564:


Integrated in Pig-trunk #463 (See [http://hudson.zones.apache.org/hudson/job/Pig-trunk/463/])
: problem with parameter substitution and special characters (olgan)


 Parameter Substitution using -param option does not seem to work when 
 parameters contain special characters such as +,=,-,?,' 
 ---

 Key: PIG-564
 URL: https://issues.apache.org/jira/browse/PIG-564
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Olga Natkovich
 Attachments: PIG-564.patch


 Consider the following Pig script which uses parameter substitution
 {code}
 %default qual '/user/viraj'
 %default mydir 'mydir_myextraqual'
 VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
 dump VISIT_LOGS;
 {code}
 If you run the script as:
 ==
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param mydir=mydir-myextraqual mypigparamsub.pig
 ==
 You get the following error:
 ==
 2008-12-15 19:49:43,964 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: /user/viraj/mydir does not exist
 at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
 at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
 at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
 at java.lang.Thread.run(Thread.java:619)
 java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job terminated with anomalous status FAILED]
 at org.apache.pig.PigServer.openIterator(PigServer.java:389)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 ... 6 more
 ==
 Also tried using:  -param mydir='mydir\-myextraqual'
 The same behavior occurs if the parameter value contains characters such as 
 +, =, or ?.
 A workaround is to use a param_file that contains one param_name=param_value 
 pair per line, with the param_value enclosed in quotes. For example, put
 mydir='mydir-myextraqual'
 in the file and then run the Pig script as:
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
 -param_file myparamfile mypigparamsub.pig
 The following issues need to be fixed:
 1) With the -param option, a parameter value that contains special 
 characters is truncated.
 2) In a param_file, a param_value that contains special characters has to 
 be enclosed in quotes.
 3) If 2 is a known issue, it should be documented in 
 http://wiki.apache.org/pig/ParameterSubstitution
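
 The param_file workaround described above can be sketched as follows. This is 
 a minimal Python sketch and not part of Pig: format_param_line and 
 write_param_file are hypothetical helper names, and the only format assumed 
 is the one stated in the report (one param_name=param_value pair per line, 
 with the value enclosed in single quotes). Values that themselves contain a 
 single quote are rejected rather than guessed at, since the report does not 
 say how they should be escaped.

```python
import os
import tempfile


def format_param_line(name, value):
    """Return one param_file line: name='value' (value wrapped in single quotes)."""
    if "'" in value:
        # The report does not describe how to escape quotes, so refuse to guess.
        raise ValueError("value contains a single quote: %r" % value)
    return "%s='%s'" % (name, value)


def write_param_file(path, params):
    """Write one name='value' line per parameter to path."""
    with open(path, "w") as handle:
        for name, value in params.items():
            handle.write(format_param_line(name, value) + "\n")


# Example: the parameter from the report, written to a temporary param file.
example_path = os.path.join(tempfile.mkdtemp(), "myparamfile")
write_param_file(example_path, {"mydir": "mydir-myextraqual"})
with open(example_path) as handle:
    example_contents = handle.read()
```

 The resulting file could then be passed to Pig with -param_file, as in the 
 command shown in the report.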

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-830:
--

Status: Patch Available  (was: Open)

 Port Apache Log parsing piggybank contrib to Pig 0.2
 

 Key: PIG-830
 URL: https://issues.apache.org/jira/browse/PIG-830
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig-830-v2.patch, pig-830-v3.patch, pig-830.patch, 
 TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt


 The piggybank contribs (pig-472, pig-473,  pig-474, pig-476, pig-486, 
 pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was 
 merged in.
 They should be updated to work with the current APIs and added back into 
 trunk.




[jira] Updated: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-06-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-564:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed




[jira] Commented: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716336#action_12716336
 ] 

Hadoop QA commented on PIG-830:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409730/pig-830-v3.patch
  against trunk revision 781599.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/71/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/71/console

This message is automatically generated.




[jira] Updated: (PIG-796) support conversion from numeric types to chararray

2009-06-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-796:
---

   Resolution: Fixed
Fix Version/s: 0.3.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Patch committed - thanks for contributing, Ashutosh!

 support  conversion from numeric types to chararray
 ---

 Key: PIG-796
 URL: https://issues.apache.org/jira/browse/PIG-796
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Fix For: 0.3.0

 Attachments: 796.patch, pig-796.patch, pig-796.patch







[jira] Created: (PIG-833) Storage access layer

2009-06-04 Thread Jay Tang (JIRA)
Storage access layer


 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang


A layer is needed to provide a high-level data access abstraction and a tabular 
view of data in Hadoop; it could free Pig users from implementing their own 
data storage/retrieval code.  This layer should also include a columnar storage 
format in order to provide fast data projection, CPU/space-efficient data 
serialization, and a schema language to manage physical storage metadata.  
Eventually it could also support predicate pushdown for further performance 
improvement.  Initially, this layer could be a contrib project in Pig and 
become a Hadoop subproject later on.
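
To illustrate why a columnar format helps data projection, here is a minimal 
Python sketch. It is not the proposed storage layer: the sample records and 
the helper names (to_columns, project_rows, project_columns) are made up for 
illustration. It only contrasts a row layout, where every record is visited 
to read one field, with a column layout, where a single column is returned 
directly.

```python
# Hypothetical sample records, used only for illustration.
rows = [
    {"name": "alice", "age": 30, "gpa": 3.5},
    {"name": "bob", "age": 25, "gpa": 3.1},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def project_rows(records, field):
    """Row layout: every full record is visited to extract one field."""
    return [record[field] for record in records]


def project_columns(columns, field):
    """Column layout: only the requested column is touched."""
    return columns[field]


columns = to_columns(rows)
ages_from_rows = project_rows(rows, "age")
ages_from_columns = project_columns(columns, "age")
```

Both projections return the same values; the difference is how much data 
has to be read to produce them.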





[jira] Issue Comment Edited: (PIG-817) Pig Docs for 0.3.0 Release

2009-06-04 Thread Corinne Chandel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712496#action_12712496
 ] 

Corinne Chandel edited comment on PIG-817 at 6/4/09 11:50 AM:
--

(1) PIG-817-2.patch - patch file



  was (Author: chandec):
(1) PIG_817.patch - patch file

(2) Doc-Build.zip - local doc build (for review)

(3) Doc-XML-Files - copies of the updated XML files (in case you need them)
  
 Pig Docs for 0.3.0 Release
 --

 Key: PIG-817
 URL: https://issues.apache.org/jira/browse/PIG-817
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.3.0
Reporter: Corinne Chandel
 Attachments: PIG-817-2.patch


 Update Pig docs for 0.3.0 release
  Getting Started 
  Pig Latin




[jira] Commented: (PIG-817) Pig Docs for 0.3.0 Release

2009-06-04 Thread Corinne Chandel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716370#action_12716370
 ] 

Corinne Chandel commented on PIG-817:
-

Please delete this file (no longer in use): quickstart.xml

\Trunk\src\docs\src\documentation\content\xdocs\quickstart.xml




[jira] Commented: (PIG-831) Records and bytes written reported by pig are wrong in a multi-store program

2009-06-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716392#action_12716392
 ] 

Olga Natkovich commented on PIG-831:


+1 on the patch. Please keep the bug open, since at some point we should 
correctly report numbers for multiquery.


 Records and bytes written reported by pig are wrong in a multi-store program
 

 Key: PIG-831
 URL: https://issues.apache.org/jira/browse/PIG-831
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Attachments: PIG-831.patch


 The stats features checked in as part of PIG-626 (reporting the number of 
 records and bytes written at the end of the query) print wrong values (often 
 but not always 0) when the pig script being run contains more than 1 store.




[jira] Created: (PIG-834) incorrect plan when algebraic functions are nested

2009-06-04 Thread Thejas M Nair (JIRA)
incorrect plan when algebraic functions are nested
--

 Key: PIG-834
 URL: https://issues.apache.org/jira/browse/PIG-834
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
Priority: Critical


a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that the Distinct UDF is missing from the combine and reduce stages of 
the plan below. As a result, distinct does not take effect and incorrect 
results are produced.
Distinct should have been evaluated in all 3 stages, and the output of Distinct 
should be given to COUNT in the reduce stage.


# Map Reduce Plan  
#--
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
|   |
|   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
|   |
|   |---Project[bag][2] - 1-123
|   |
|   |---Project[bag][1] - 1-124
|   |
|   Project[bytearray][0] - 1-133
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
|
|---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
|   |
|   |---Project[bag][0] - 1-135
|   |
|   Project[bytearray][1] - 1-134
|
|---POCombinerPackage[tuple]{bytearray} - 1-137
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
|   |
|   |---Project[bag][0] - 1-136
|
|---POCombinerPackage[tuple]{bytearray} - 1-145
Global sort: false
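
The effect of applying Distinct only in the map stage can be sketched with a 
simplified model. This is plain Python, not Pig's execution engine: the two 
function names are hypothetical, and the model assumes that the combine and 
reduce stages simply add up the partial counts produced per map split, which 
is how the plan above reads.

```python
def count_distinct_correct(splits):
    """Distinct is applied over the whole group before COUNT (expected result)."""
    seen = set()
    for split in splits:
        seen.update(split)
    return len(seen)


def count_distinct_map_only(splits):
    """Distinct is applied only inside each map split; partial counts are summed."""
    partial_counts = [len(set(split)) for split in splits]  # map stage
    return sum(partial_counts)  # combine/reduce stages just add the partials


# One group whose values are spread over two map splits; "b" appears in both.
splits = [["a", "b", "b"], ["b", "c"]]
expected = count_distinct_correct(splits)
observed = count_distinct_map_only(splits)
```

In this model, a value that occurs in more than one split is counted once per 
split, so the map-only variant overcounts.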




[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2009-06-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-834:
--

Description: 
a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that the Distinct UDF is missing from the combine and reduce stages of 
the plan below. As a result, distinct does not take effect and incorrect 
results are produced.
Distinct should have been evaluated in all 3 stages, and the output of Distinct 
should be given to COUNT in the reduce stage.

{code}
# Map Reduce Plan  
#--
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
|   |
|   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
|   |
|   |---Project[bag][2] - 1-123
|   |
|   |---Project[bag][1] - 1-124
|   |
|   Project[bytearray][0] - 1-133
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
|
|---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
|   |
|   |---Project[bag][0] - 1-135
|   |
|   Project[bytearray][1] - 1-134
|
|---POCombinerPackage[tuple]{bytearray} - 1-137
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
|   |
|   |---Project[bag][0] - 1-136
|
|---POCombinerPackage[tuple]{bytearray} - 1-145
Global sort: false
{code}


[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716463#action_12716463
 ] 

Jeff Hammerbacher commented on PIG-823:
---

Hey Olga,

Really looking forward to seeing more discussion on this issue. The NameNode 
already contains file metadata like ctime, mtime, the block list, permissions, 
etc. Will the proposed metadata service subsume those attributes as well? 
Curious to see the proposed design.

Thanks,
Jeff

 Hadoop Metadata Service
 ---

 Key: PIG-823
 URL: https://issues.apache.org/jira/browse/PIG-823
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich

 This JIRA is created to track development of a metadata system for Hadoop. 
 The goal of the system is to allow users and applications to register data 
 stored on HDFS, search for the data available on HDFS, and associate metadata 
 such as schema, statistics, etc. with a particular data unit or a data set 
 stored on HDFS. The initial goal is to provide a fairly generic, low-level 
 abstraction that any user or application on HDFS can use to store and retrieve 
 metadata. Over time, higher-level abstractions closely tied to particular 
 applications or tools can be developed.
 It would eventually make sense for the metadata service to become a 
 subproject within Hadoop. For now, the proposal is to make it a contrib to 
 Pig, since Pig SQL is likely to be the first user of the system.
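
 The three operations named above (register, search, associate metadata) can 
 be pictured with a toy in-memory registry. This is only a sketch under 
 assumptions: the class name MetadataRegistry, its methods, the example paths, 
 and the metadata values are all hypothetical, and nothing here reflects the 
 actual design, which had not been posted at this point.

```python
class MetadataRegistry:
    """Toy in-memory registry mapping an HDFS path to a metadata dict."""

    def __init__(self):
        self._entries = {}

    def register(self, path, **metadata):
        """Register a data unit, or add metadata to one already registered."""
        self._entries.setdefault(path, {}).update(metadata)

    def get(self, path):
        """Return a copy of the metadata stored for path."""
        return dict(self._entries[path])

    def search(self, prefix):
        """Return the registered paths that start with prefix, sorted."""
        return sorted(p for p in self._entries if p.startswith(prefix))


# Hypothetical paths and metadata values, for illustration only.
registry = MetadataRegistry()
registry.register("/user/example/visits", schema="a,b,c")
registry.register("/user/example/visits", row_count=1000)
registry.register("/user/other/clicks", schema="x,y")
```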




[jira] Commented: (PIG-833) Storage access layer

2009-06-04 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716470#action_12716470
 ] 

Hong Tang commented on PIG-833:
---

Jeff, just like the SQL effort, the space of columnar storage is also wide 
open, and I think it is more beneficial to the overall health of the Hadoop 
ecosystem.

With that being said, I also looked at the patch attached to HIVE-352. It 
appears that what the patch does is a level below our stated objectives. 
Specifically, the guts of the implementation (RCFile) are very close in spirit 
to TFile as described in HADOOP-3315, which seems to have had its first 
comprehensive patch back in December 2008.




[jira] Commented: (PIG-833) Storage access layer

2009-06-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716471#action_12716471
 ] 

Jeff Hammerbacher commented on PIG-833:
---

Hey Hong,

I never mentioned SQL or an ecosystem in my comment, but thanks for your 
observation. I was simply referring to the existence of a fairly detailed 
discussion in a related subproject that the Pig team may not have been 
following. I'll add an additional one here: 
https://issues.apache.org/jira/browse/HIVE-279 addresses the predicate pushdown 
feature.

Regards,
Jeff 




[jira] Updated: (PIG-765) to implement jdiff

2009-06-04 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-765:
---

Status: In Progress  (was: Patch Available)

 to implement jdiff
 --

 Key: PIG-765
 URL: https://issues.apache.org/jira/browse/PIG-765
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Attachments: pig-765.patch, pig-765.patch, pig-765.patch







[jira] Updated: (PIG-765) to implement jdiff

2009-06-04 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-765:
---

Attachment: pig-765.patch

This jdiff patch was created after resolving the author tag issue mentioned in 
PIG-806.




Build failed in Hudson: Pig-Patch-minerva.apache.org #72

2009-06-04 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/72/changes

Changes:

[olga] PIG-813: documentation updates (chandec via olgan)

[pradeepkth] PIG-796: support conversion from numeric types to chararray (Ashutosh Chauhan via pradeepkth)

--
started
Building remotely on minerva.apache.org (Ubuntu)
Updating http://svn.apache.org/repos/asf/hadoop/pig/trunk
U test/org/apache/pig/test/TestPOCast.java
C CHANGES.txt
U src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java
U src/org/apache/pig/data/DataType.java
D src/docs/src/documentation/content/xdocs/quickstart.xml
U src/docs/src/documentation/content/xdocs/site.xml
U src/docs/src/documentation/content/xdocs/index.xml
U src/docs/src/documentation/content/xdocs/piglatin.xml
Fetching 'http://svn.apache.org/repos/asf/hadoop/core/nightly/test-patch' at -1 into 'http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/trunk/test/bin'
At revision 781914
At revision 781914
no change for http://svn.apache.org/repos/asf/hadoop/core/nightly/test-patch since the previous build
[Pig-Patch-minerva.apache.org] $ /bin/bash /tmp/hudson7154531927977690732.sh
/home/hudson/tools/java/latest1.6/bin/java
Buildfile: build.xml

check-for-findbugs:

findbugs.check:

java5.check:

forrest.check:

hudson-test-patch:
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Testing patch for PIG-765.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec] Reverted 'CHANGES.txt'
 [exec] 
 [exec] Fetching external item into 'test/bin'
 [exec] A    test/bin/test-patch.sh
 [exec] Updated external to revision 781914.
 [exec] 
 [exec] Updated to revision 781914.
 [exec] PIG-765 patch is being downloaded at Thu Jun  4 22:48:13 PDT 2009 from
 [exec] http://issues.apache.org/jira/secure/attachment/12409932/pig-765.patch
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Pre-building trunk to determine trunk number
 [exec] of release audit, javac, and Findbugs warnings.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec] /home/hudson/tools/ant/latest/bin/ant  -Djava5.home=/home/hudson/tools/java/latest1.5 -Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= releaseaudit > http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/patchprocess/trunkReleaseAuditWarnings.txt 2>&1
 [exec] /home/hudson/tools/ant/latest/bin/ant  -Djavac.args=-Xlint -Xmaxwarns 1000 -Declipse.home=/home/nigel/tools/eclipse/latest -Djava5.home=/home/hudson/tools/java/latest1.5 -Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= clean tar > http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/patchprocess/trunkJavacWarnings.txt 2>&1
 [exec] Trunk compilation is broken?
 [exec]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
 [exec]                                  Dload  Upload   Total   Spent    Left  Speed
 [exec]   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 

BUILD FAILED
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/trunk/build.xml:653: exec returned: 1

Total time: 1 minute 45 seconds
Recording test results
Description found: PIG-765