[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join

2009-11-19 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779867#action_12779867
 ] 

Sriranjan Manjunath commented on PIG-872:
-

Olga, I agree with your first point. I will remove the test case.
On point 2, shouldn't mapReduceOper.getReplFiles() return only the 
replicated files? What is the rationale for returning a null for the 
fragmented input? I could change it to what Ashutosh suggested, but it would 
just be cleaner if fragmented input was not represented by a null.
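
To make the question concrete, here is a minimal sketch of the two styles. It is not Pig's MapReduceOper code: the class ReplFilesSketch and both method names are hypothetical, and plain strings stand in for file specs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not Pig code: strings play the role of file specs.
final class ReplFilesSketch {
    // Style under discussion: one slot per join input, with null marking
    // the fragmented input.
    static List<String> withNullPlaceholder(String[] inputs, int fragmentIndex) {
        List<String> files = new ArrayList<>(Arrays.asList(inputs));
        files.set(fragmentIndex, null);
        return files;
    }

    // Alternative: return only the replicated inputs, so callers never
    // have to skip a null entry.
    static List<String> replicatedOnly(String[] inputs, int fragmentIndex) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < inputs.length; i++) {
            if (i != fragmentIndex) {
                files.add(inputs[i]);
            }
        }
        return files;
    }
}
```

With the second style, code that registers files with the cache can iterate over the result directly; the cost is that list positions no longer line up with join input positions.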


 use distributed cache for the replicated data set in FR join
 

 Key: PIG-872
 URL: https://issues.apache.org/jira/browse/PIG-872
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
 Attachments: PIG_872.patch


 Currently, the replicated file is read directly from DFS by all maps. If the 
 number of the concurrent maps is huge, we can overwhelm the NameNode with 
 open calls.
 Using the distributed cache will address the issue and might also give a 
 performance boost, since the file will be copied locally once and then reused 
 by all tasks running on the same machine.
 The basic approach would be to use cacheArchive on the frontend to place the 
 file into the cache; on the backend, the tasks would need to refer to the 
 data using the path from the cache.
 Note that cacheArchive does not work in Hadoop local mode. (Not a problem for 
 us right now as we don't use it.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-420) Limit on nothing functionality

2009-11-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780011#action_12780011
 ] 

Thejas M Nair commented on PIG-420:
---

The idea proposed by Rekha seems to be a better alternative to 'limit on 
nothing'. It would be good to have something similar to C++ preprocessor 
macros. That way the debug decisions can be made at compile time, and 
there will not be any performance impact at run time.

Pig could have some syntax to denote debug-only sections of the Pig script, 
something like:
{code}
a = load 'file';
b = #IFDEF DEBUG { limit a, 100; } #ELSE { a; /* assuming we start supporting the syntax b = a; */ }
c = filter b by $0 == 1;
#IFDEF DEBUG { store c into 'debug_file' ; }

{code}
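
As a rough illustration of resolving such sections before the script is parsed, here is a minimal sketch. The #IFDEF/#ELSE syntax is only the proposal above, not something Pig supports, and the class IfdefSketch and its resolve method are hypothetical names. The sketch assumes the blocks contain no nested braces.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessor for the proposed syntax; not part of Pig.
final class IfdefSketch {
    // Matches "#IFDEF NAME { ... }" with an optional "#ELSE { ... }".
    // Blocks must not contain nested braces.
    private static final Pattern IFDEF = Pattern.compile(
        "#IFDEF\\s+(\\w+)\\s*\\{([^{}]*)\\}(?:\\s*#ELSE\\s*\\{([^{}]*)\\})?");

    // Replaces every conditional section with the branch selected by the
    // set of defined symbols; a missing #ELSE branch resolves to nothing.
    static String resolve(String script, Set<String> defined) {
        Matcher m = IFDEF.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String chosen;
            if (defined.contains(m.group(1))) {
                chosen = m.group(2);
            } else {
                chosen = m.group(3) != null ? m.group(3) : "";
            }
            // quoteReplacement keeps "$0" and backslashes in the script literal.
            m.appendReplacement(out, Matcher.quoteReplacement(chosen.trim()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the substitution happens on the script text, the parser only ever sees the selected branch, which is what keeps the debug switch free at run time.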

 Limit on nothing functionality
 --

 Key: PIG-420
 URL: https://issues.apache.org/jira/browse/PIG-420
 Project: Pig
  Issue Type: Improvement
Reporter: Anand Murugappan

 Pig 2.0 implements the limit feature, but only as a standalone statement. 
 Limit is very useful in debug mode, where we can run queries on a smaller 
 amount of data (faster and on fewer nodes) to iron out issues, whereas in 
 production mode we want to run through all the data. It would be good 
 to have an easy switch between debug and prod mode using the limit statement 
 without having to change the underlying code templates. Because LIMIT is a 
 separate standalone statement, it is hard to parametrize the code. 
 For instance a query template might look like, 
 A = LOAD '...';
 B = LIMIT A $N;
 C = FOREACH B  
 In debug mode, we would like to set the variable $N to 100, but in prod mode 
 we would like to set it to a 'special value' that does not apply LIMIT, 
 letting us run on all the data. 




[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join

2009-11-19 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780148#action_12780148
 ] 

Olga Natkovich commented on PIG-872:


I am fine if you want to remove it as long as it does not break any existing 
functionality. I am not sure why it is present in the list.




[jira] Commented: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-19 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780169#action_12780169
 ] 

Pradeep Kamath commented on PIG-1064:
-

Patch committed to trunk.

 Behvaiour of COGROUP with and without schema when using * operator
 

 Key: PIG-1064
 URL: https://issues.apache.org/jira/browse/PIG-1064
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1064-2.patch, PIG-1064-3.patch, PIG-1064-4.patch, 
 PIG-1064-5.patch, PIG-1064.patch


 I have 2 tab separated files, 1.txt and 2.txt
 $ cat 1.txt 
 
 1   2
 2   3
 
 $ cat 2.txt 
 1   2
 2   3
 I use the COGROUP feature of Pig in the following way:
 $ java -cp pig.jar:$HADOOP_HOME org.apache.pig.Main
 {code}
 grunt> A = load '1.txt';
 grunt> B = load '2.txt' as (b0, b1);
 grunt> C = cogroup A by *, B by *;
 {code}
 2009-10-29 12:46:04,150 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1012: Each COGroup input has to have the same number of inner plans
 Details at logfile: pig_1256845224752.log
 ==
 If I reverse the order of the schemas:
 {code}
 grunt> A = load '1.txt' as (a0, a1);
 grunt> B = load '2.txt';
 grunt> C = cogroup A by *, B by *;
 {code}
 2009-10-29 12:49:27,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1013: Grouping attributes can either be star (*) or a list of expressions, but not both.
 Details at logfile: pig_1256845224752.log
 ==
 Now running without any schema:
 {code}
 grunt> A = load '1.txt';
 grunt> B = load '2.txt';
 grunt> C = cogroup A by *, B by *;
 grunt> dump C;
 {code}
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
 stored result in: file:/tmp/temp-319926700/tmp-1990275961
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
 written : 2
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
 : 154
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 ((1,2),{(1,2)},{(1,2)})
 ((2,3),{(2,3)},{(2,3)})
 ==
 Is this a bug or a feature?
 Viraj




[jira] Updated: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-19 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1064:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)




[jira] Commented: (PIG-1097) Pig do not support group by boolean type

2009-11-19 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780223#action_12780223
 ] 

David Ciemiewicz commented on PIG-1097:
---

I think one could argue that Filter functions are REALLY just 
EvalBoolean functions in disguise: they were a way of adding a Boolean 
return type to Pig when Pig had no types.

Further, I'd argue that now that Pig does have data types, Filter should 
be deprecated and all Filter functions should become EvalBoolean.

In other words, I believe it was an oversight in the types migration not to 
migrate Filter to EvalBoolean.

 Pig do not support group by boolean type
 

 Key: PIG-1097
 URL: https://issues.apache.org/jira/browse/PIG-1097
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Fix For: 0.6.0


 My script is as follows; TestUDF returns a boolean.
 {color:blue}
 DEFINE testUDF org.apache.pig.piggybank.util.TestUDF();
 raw = LOAD 'data/input';
 raw = FOREACH raw GENERATE testUDF();
 raw = GROUP raw BY $0;
 DUMP raw;
 {color}
 *The above script will throw exception:*
 Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias raw
   at org.apache.pig.PigServer.openIterator(PigServer.java:481)
   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
   at org.apache.pig.PigServer.registerScript(PigServer.java:409)
   at PigExample.main(PigExample.java:13)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias raw
   at org.apache.pig.PigServer.store(PigServer.java:536)
   at org.apache.pig.PigServer.openIterator(PigServer.java:464)
   ... 5 more
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:269)
   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
   at org.apache.pig.PigServer.store(PigServer.java:528)
   ... 6 more
 Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2036: Unhandled key type boolean
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.selectComparator(JobControlCompiler.java:856)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:561)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:251)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:128)
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
   ... 8 more
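
The innermost cause is raised while a comparator is being selected for the group key. The following is a minimal sketch of that kind of dispatch, assuming nothing about the actual JobControlCompiler code: the class KeyComparatorSketch, its KeyType enum and the select method are hypothetical. It only shows how a missing case leads to an "Unhandled key type" failure and what a boolean case could look like.

```java
import java.util.Comparator;

// Hypothetical illustration; not Pig's JobControlCompiler.
final class KeyComparatorSketch {
    enum KeyType { INTEGER, CHARARRAY, BOOLEAN, BAG }

    // Picks a comparator by key type and fails for types with no case,
    // which mirrors the shape of the error in the trace above.
    static Comparator<Object> select(KeyType type) {
        switch (type) {
            case INTEGER:
                return (a, b) -> Integer.compare((Integer) a, (Integer) b);
            case CHARARRAY:
                return (a, b) -> ((String) a).compareTo((String) b);
            case BOOLEAN:
                // Boolean.compare orders false before true.
                return (a, b) -> Boolean.compare((Boolean) a, (Boolean) b);
            default:
                throw new IllegalArgumentException("Unhandled key type " + type);
        }
    }
}
```

Removing the BOOLEAN case from this sketch makes a boolean key fall through to the default branch, which is the behaviour the report describes.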




[jira] Created: (PIG-1100) PIG hangs on second call to DUMP or STORE

2009-11-19 Thread Michael Niv (JIRA)
PIG hangs on second call to DUMP or STORE
-

 Key: PIG-1100
 URL: https://issues.apache.org/jira/browse/PIG-1100
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
 Environment: Linux mniv-laptop 2.6.24-25-generic #1 SMP Tue Oct 20 
07:31:10 UTC 2009 i686 GNU/Linux

java version 1.6.0_16
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) Server VM (build 14.2-b01, mixed mode)

Apache Pig version 0.5.0 (r829623) 

Hadoop 0.20.1
Subversion 
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1 -r 810220

Reporter: Michael Niv


pig hangs on the last line of the script below when I run with -x local. It 
runs fine when run on hadoop.
Happy to provide the files used in bugrep.pig below: v2.txt and 
document-date.pl (michael...@gmail.com)
I initially ran into a problem which involved cogrouping two things like 
id_docdate_s1 below, but this is what I came up with while narrowing down my 
bug report.
Thanks in advance.

-- bugrep.pig

DEFINE get_doc_date `document-date.pl`;

id_text1 = LOAD 'v2.txt' AS (id,text);
id_docdate1 = STREAM id_text1 THROUGH  get_doc_date AS (id,docdate);
id_docdate_s1 = ORDER id_docdate1 BY docdate;
store id_docdate_s1 into 'f1.out';

id_text2 = LOAD 'v2.txt' AS (id,text);
id_docdate2 = STREAM id_text2 THROUGH  get_doc_date AS (id,docdate);
id_docdate_s2 = ORDER id_docdate2 BY docdate;
store id_docdate_s2 into 'f2.out'; -- second store call hangs pig





[jira] Updated: (PIG-1100) PIG hangs on second call to DUMP or STORE

2009-11-19 Thread Michael Niv (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Niv updated PIG-1100:
-

Attachment: bugrep.tar

Attached are repro details, including pig script, perl streaming program, and 
sufficient data sample.  Also a console-session of what I saw.





[jira] Commented: (PIG-1094) Fix unit tests corresponding to source changes so far

2009-11-19 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780312#action_12780312
 ] 

Richard Ding commented on PIG-1094:
---

The change of PIG-879 added the following failure cases:

||Testcase Class||Testcase Method||Cause||
|TestLoad|testLoadRemoteRel|local mode needs to be fixed|
|TestLoad|testLoadRemoteRelScheme|local mode needs to be fixed| 
|TestLoad|testGlobChars|local mode needs to be fixed| 

 Fix unit tests corresponding to source changes so far
 -

 Key: PIG-1094
 URL: https://issues.apache.org/jira/browse/PIG-1094
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1094.patch


 The check-ins so far on the load-store-redesign branch have not addressed unit 
 test failures due to interface changes. This jira tracks the task of 
 making the common-case unit tests work with the new interfaces. Some aspects 
 of the new proposal, like using the LoadCaster interface for casting and 
 making local mode work, have not been completed yet. Tests that fail for 
 those reasons will not be fixed in this jira; they will be addressed in the 
 jiras corresponding to those tasks.




[jira] Commented: (PIG-1085) Pass JobConf and UDF specific configuration information to UDFs

2009-11-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780313#action_12780313
 ] 

Alan Gates commented on PIG-1085:
-

Applied the patch to the 0.6 branch as well.

 Pass JobConf and UDF specific configuration information to UDFs
 ---

 Key: PIG-1085
 URL: https://issues.apache.org/jira/browse/PIG-1085
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: udfconf-2.patch, udfconf.patch


 Users have long asked for a way to get the JobConf structure in their UDFs.  
 It would also be nice to have a way to pass properties between the front end 
 and back end so that UDFs can store state during parse time and use it at 
 runtime.
 This patch does part of what is proposed in PIG-602, but not all of it.  It 
 does not provide a way to give user specified configuration files to UDFs.  
 So I will mark 602 as depending on this bug, but it isn't a duplicate.




[jira] Updated: (PIG-1099) [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG

2009-11-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1099:


   Resolution: Fixed
Fix Version/s: 0.7.0
   Status: Resolved  (was: Patch Available)

Patch checked in.

 [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG
 --

 Key: PIG-1099
 URL: https://issues.apache.org/jira/browse/PIG-1099
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.7.0

 Attachments: PIG_1099.patch







Welcome Jeff Zhang

2009-11-19 Thread Alan Gates

All,

I would like to welcome Jeff Zhang as our newest Pig committer.  Jeff  
has been contributing to Pig for about nine months now.  He's been  
active on the mailing lists, in contributing patches, and in helping  
other users with their patches.  Congratulations Jeff, and thanks for  
your contributions to Pig.


Alan.


[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-19 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1091:
--

Attachment: PIG-1091.patch

 [zebra] Exception when load with projection of map keys on a map column that 
 is not map split 
 --

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Attachments: PIG-1091.patch


 With a schema of f1:string, f2:map and storage info of [f1]; [f2], a 
 projection of f2#{a} throws an exception.




[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-19 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1091:
--

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-1094) Fix unit tests corresponding to source changes so far

2009-11-19 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780325#action_12780325
 ] 

Olga Natkovich commented on PIG-1094:
-

We should fix the failures as part of the patch that caused them. Now that we 
have the baseline, we should make sure that we don't introduce any new failures.




[jira] Resolved: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader

2009-11-19 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath resolved PIG-879.


  Resolution: Fixed
Hadoop Flags: [Reviewed]

+1, patch committed on load-store-redesign branch with minor change in TestLoad 
to correctly set up file on MiniCluster.

 Pig should provide a way for input location string in load statement to be 
 passed as-is to the Loader
 -

 Key: PIG-879
 URL: https://issues.apache.org/jira/browse/PIG-879
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Pradeep Kamath
Assignee: Richard Ding
 Attachments: PIG-879.patch, PIG-879.patch, PIG-879.patch, 
 PIG-879.patch, PIG-879.patch


  Due to multiquery optimization, Pig always converts the filenames to 
 absolute URIs (see 
 http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section 
 about Incompatible Changes - Path Names and Schemes). This is necessary since 
 the script may have cd .. statements between load or store statements and 
 if the load statements have relative paths, we would need to convert to 
 absolute paths to know where to load/store from. To do this 
 QueryParser.massageFilename() has the code below[1] which basically gives the 
 fully qualified hdfs path
  
 However the issue with this approach is that if the filename string is 
 something like 
 hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2,
  the code below[1] actually translates this to 
 hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2
  and throws an exception that it is an incorrect path.
  
 Some loaders may want to interpret the filenames (the input location string 
 in the load statement) in any way they wish and may want Pig to not make 
 absolute paths out of them.
  
 There are a few options to address this:
 1)A command line switch to indicate to Pig that pathnames in the script 
 are all absolute and hence Pig should not alter them and pass them as-is to 
 Loaders and Storers. 
 2)A keyword in the load and store statements to indicate the same intent 
 to pig
 3)A property which users can supply on cmdline or in pig.properties to 
 indicate the same intent.
 4)A method in LoadFunc - relativeToAbsolutePath(String filename, String 
 curDir) which does the conversion to absolute - this way a Loader can choose to 
 implement it as a no-op.
 Thoughts?
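
 Option 4 could look roughly like the sketch below. Only the method name and 
 its two String parameters come from the proposal; the classes PathSketch, 
 DefaultLoader and AsIsLoader are hypothetical, and the default conversion is 
 a simplified assumption rather than what QueryParser.massageFilename() does.

```java
// Hypothetical illustration of option 4; not the real LoadFunc.
final class PathSketch {
    // Stand-in for a loader base class with a default conversion.
    static class DefaultLoader {
        String relativeToAbsolutePath(String filename, String curDir) {
            // Assumption: anything with a scheme or a leading slash is
            // already absolute and is left untouched.
            if (filename.startsWith("/") || filename.contains("://")) {
                return filename;
            }
            String base = curDir.endsWith("/") ? curDir : curDir + "/";
            return base + filename;
        }
    }

    // A loader that interprets the location string itself, for example a
    // comma-separated list, and therefore opts out of the conversion.
    static class AsIsLoader extends DefaultLoader {
        @Override
        String relativeToAbsolutePath(String filename, String curDir) {
            return filename; // no-op: pass the string through as-is
        }
    }
}
```

 The appeal of this option is that the decision sits with the loader author 
 rather than with a script-wide switch, so one script can mix both kinds of 
 loaders.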
  




Re: Welcome Jeff Zhang

2009-11-19 Thread Jeff Zhang
I am very glad to join the Pig family. I have grown and learned a lot with
others' help in the last nine months. I will continue to contribute to Pig and
learn from others.


Jeff Zhang





Re: Welcome Jeff Zhang

2009-11-19 Thread Dmitriy Ryaboy
Congrats Jeff!






[jira] Assigned: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2009-11-19 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy reassigned PIG-909:
-

Assignee: Dmitriy V. Ryaboy

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.
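
 The lookup order described above can be sketched in Java, purely for 
 illustration: the real launcher is bin/pig, not Java code, and the class 
 HadoopJarSketch, its method, its parameters and the lib/ directory are all 
 assumptions.

```java
import java.util.Map;

// Hypothetical illustration of the proposed lookup order; not bin/pig.
final class HadoopJarSketch {
    // Prefers $HADOOP_HOME when it is set and non-empty; otherwise falls
    // back to the bundled hadoop${PIG_HADOOP_VERSION}.jar.
    static String hadoopJarLocation(Map<String, String> env,
                                    String pigHome,
                                    String hadoopVersion) {
        String hadoopHome = env.get("HADOOP_HOME");
        if (hadoopHome != null && !hadoopHome.isEmpty()) {
            return hadoopHome;
        }
        // Assumption: the bundled jar lives under a lib/ directory.
        return pigHome + "/lib/hadoop" + hadoopVersion + ".jar";
    }
}
```

 Passing the environment in as a map, rather than reading System.getenv() 
 directly, keeps the fallback easy to exercise with and without the variable.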




[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-19 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1091:
---


Patch reviewed. +1




[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface

2009-11-19 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1088:
---

Attachment: PIG-1088.patch

* LoadOrderedInput (as proposed) is now called OrderedLoadFunc. It is an 
abstract class because LoadFunc is an abstract class.
* TextInputOrder (as proposed) is now called FileInputLoadFunc.
* A new internal type, GENERIC_WRITABLECOMPARABLE, has been added for 
WritableComparable classes. Tuples can read and write this type.
* ReadToEndLoader takes a list of input splits to be read.

All TestMergeJoin test cases are passing.

testpatch results -
 [exec]
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no tests are needed for this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.



 change merge join and merge join indexer to work with new LoadFunc interface
 

 Key: PIG-1088
 URL: https://issues.apache.org/jira/browse/PIG-1088
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-1088.patch







[jira] Commented: (PIG-1097) Pig do not support group by boolean type

2009-11-19 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780380#action_12780380
 ] 

Jeff Zhang commented on PIG-1097:
-

Agreed. In my opinion, FilterFunc is equivalent to EvalFunc<Boolean>. I do not 
know the history of FilterFunc; did it predate Pig's support for types? 
In any case, I think it should now be deprecated.

Also, why does Pig not support the boolean type in foreach projections and 
group by? Is there a performance consideration?
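
To make the equivalence concrete, here is a minimal sketch using stand-in classes rather than Pig's own. SketchEvalFunc, SketchFilterFunc and IsPositive are hypothetical names, and the input is simplified to a List<Object> instead of a Pig tuple.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an evaluation function generic in its return type.
abstract class SketchEvalFunc<T> {
    abstract T exec(List<Object> input);
}

// A filter function is then an evaluation function fixed to Boolean.
abstract class SketchFilterFunc extends SketchEvalFunc<Boolean> {
}

// Example filter: true when the first field is a positive integer.
class IsPositive extends SketchFilterFunc {
    @Override
    Boolean exec(List<Object> input) {
        if (input == null || input.isEmpty() || !(input.get(0) instanceof Integer)) {
            return Boolean.FALSE;
        }
        return (Integer) input.get(0) > 0;
    }
}

class FilterFuncSketch {
    public static void main(String[] args) {
        // The filter can be used anywhere a Boolean-valued function is expected.
        SketchEvalFunc<Boolean> f = new IsPositive();
        System.out.println(f.exec(Arrays.<Object>asList(5)));
        System.out.println(f.exec(Arrays.<Object>asList(-3)));
    }
}
```

In this sketch the filter type adds nothing beyond fixing the return type, which is the sense in which the two are equivalent.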



 Pig do not support group by boolean type
 

 Key: PIG-1097
 URL: https://issues.apache.org/jira/browse/PIG-1097
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Fix For: 0.6.0


 My script is as follows; TestUDF returns a boolean.

 DEFINE testUDF org.apache.pig.piggybank.util.TestUDF();
 raw = LOAD 'data/input';
 raw = FOREACH raw GENERATE testUDF();
 raw = GROUP raw BY $0;
 DUMP raw;

 The above script throws an exception:
 Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias raw
   at org.apache.pig.PigServer.openIterator(PigServer.java:481)
   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
   at org.apache.pig.PigServer.registerScript(PigServer.java:409)
   at PigExample.main(PigExample.java:13)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias raw
   at org.apache.pig.PigServer.store(PigServer.java:536)
   at org.apache.pig.PigServer.openIterator(PigServer.java:464)
   ... 5 more
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:269)
   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
   at org.apache.pig.PigServer.store(PigServer.java:528)
   ... 6 more
 Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2036: Unhandled key type boolean
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.selectComparator(JobControlCompiler.java:856)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:561)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:251)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:128)
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
   ... 8 more
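
The innermost cause in the trace is ERROR 2036, raised from JobControlCompiler.selectComparator. As a rough illustration of how a per-type comparator lookup can fail this way, here is a minimal sketch. It does not reproduce Pig's actual implementation: KeyType, ComparatorSelector, select() and selectWithBoolean() are hypothetical names, and the exception type is a stand-in.

```java
import java.util.Comparator;
import java.util.Locale;

// Hypothetical key types, used only for this sketch.
enum KeyType { INTEGER, CHARARRAY, BOOLEAN }

class ComparatorSelector {
    // Dispatch with no BOOLEAN case: a boolean key falls through to the error.
    static Comparator<Object> select(KeyType type) {
        switch (type) {
            case INTEGER:
                return (a, b) -> Integer.compare((Integer) a, (Integer) b);
            case CHARARRAY:
                return (a, b) -> ((String) a).compareTo((String) b);
            default:
                throw new IllegalStateException(
                        "Unhandled key type " + type.name().toLowerCase(Locale.ROOT));
        }
    }

    // Same dispatch with one extra case, as one possible way to accept boolean keys.
    static Comparator<Object> selectWithBoolean(KeyType type) {
        if (type == KeyType.BOOLEAN) {
            return (a, b) -> Boolean.compare((Boolean) a, (Boolean) b);
        }
        return select(type);
    }

    public static void main(String[] args) {
        try {
            select(KeyType.BOOLEAN);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(selectWithBoolean(KeyType.BOOLEAN).compare(false, true) < 0);
    }
}
```

Under this reading, the failure happens while the job is being set up, before any data is processed, which matches the JobCreationException in the trace.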




[jira] Commented: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface

2009-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780384#action_12780384
 ] 

Hadoop QA commented on PIG-1088:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425554/PIG-1088.patch
  against trunk revision 882340.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/50/console

This message is automatically generated.

 change merge join and merge join indexer to work with new LoadFunc interface
 

 Key: PIG-1088
 URL: https://issues.apache.org/jira/browse/PIG-1088
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-1088.patch







[jira] Commented: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780456#action_12780456
 ] 

Hadoop QA commented on PIG-1091:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425542/PIG-1091.patch
  against trunk revision 882340.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/162/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/162/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/162/console

This message is automatically generated.

 [zebra] Exception when load with projection of map keys on a map column that 
 is not map split 
 --

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Attachments: PIG-1091.patch


 With a schema of f1:string, f2:map and storage info of [f1]; [f2], a 
 projection of f2#{a} throws an exception.
