[jira] Commented: (PIG-1312) Make Pig work with hadoop security

2010-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847894#action_12847894
 ] 

Hadoop QA commented on PIG-1312:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12439352/PIG-1312-1.patch
  against trunk revision 925513.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/258/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/258/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/258/console

This message is automatically generated.

 Make Pig work with hadoop security
 --

 Key: PIG-1312
 URL: https://issues.apache.org/jira/browse/PIG-1312
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1312-1.patch


 In order to make Pig work with hadoop security, we need to set 
 mapreduce.job.credentials.binary in the JobConf before we call getSplit() 
 in the backend. We need to change code in merge join.
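
A minimal sketch of the idea described above, not the actual patch. It uses a plain Map as a stand-in for the Hadoop JobConf so that it runs without Hadoop on the classpath. The class and method names are hypothetical, and reading the token file path from the HADOOP_TOKEN_FILE_LOCATION environment variable is an assumption about where the path comes from.

```java
import java.util.HashMap;
import java.util.Map;

final class CredentialsSketch {
    // Property name taken from the issue description.
    static final String CREDENTIALS_KEY = "mapreduce.job.credentials.binary";

    /**
     * Copies the token file location into the job configuration, if one is set.
     * jobConf is a Map stand-in for a Hadoop JobConf (assumption for this sketch).
     */
    static void propagateTokenFile(Map<String, String> jobConf, String tokenFileLocation) {
        if (tokenFileLocation != null && !tokenFileLocation.isEmpty()) {
            jobConf.put(CREDENTIALS_KEY, tokenFileLocation);
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        // Assumption: the backend task learns the token file path from this variable.
        propagateTokenFile(jobConf, System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
        // Only after this point would the backend code go on to compute splits.
        System.out.println(jobConf);
    }
}
```

The point of the sketch is ordering: the property has to be in the configuration before the split computation runs, which is why the merge join code path needs the change.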

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-21 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847909#action_12847909
 ] 

Ankur commented on PIG-1229:


@Ashtosh Chauhan 
I read the HSQLDB license and it looked OK to me, but I am not a lawyer :-). 
Besides, Apache Cocoon uses it, so I think we should be OK pulling it in through 
ivy.

I'll make the ivy and load-store related changes and submit a new patch on 
Monday.

Sorry for the delay.
 

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.7.0

 Attachments: hsqldb.jar, jira-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1308) Inifinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]

2010-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847920#action_12847920
 ] 

Hadoop QA commented on PIG-1308:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12439354/PIG-1308.patch
  against trunk revision 925513.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/console

This message is automatically generated.

 Inifinite loop in JobClient when reading from BinStorage Message: 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to 
 process : 2]
 

 Key: PIG-1308
 URL: https://issues.apache.org/jira/browse/PIG-1308
 Project: Pig
  Issue Type: Bug
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1308.patch


 A simple script fails to read files using BinStorage() and fails to submit jobs 
 to the JobTracker. This occurs on trunk but not on the Pig 0.6 branch.
 {code}
 data = load 'binstoragesample' using BinStorage() as (s, m, l);
 A = foreach data generate s#'key' as value;
 X = limit A 20;
 dump X;
 {code}
 When this script is submitted to the JobTracker, the following message is logged repeatedly:
 2010-03-18 22:31:22,296 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:32:01,574 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:32:43,276 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:33:21,743 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:34:02,004 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:34:43,442 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:35:25,907 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:36:07,402 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:36:48,596 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:37:28,014 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:38:04,823 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:38:38,981 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 2010-03-18 22:39:12,220 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
 The stack trace revealed:
 at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144)
 at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115)
 at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404)
 at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167)
 at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263)
 at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112)
 at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
 at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
 at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
 at 

[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized

2010-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847951#action_12847951
 ] 

Hadoop QA commented on PIG-1285:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12439393/PIG-1285.2.patch
  against trunk revision 925513.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/260/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/260/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/260/console

This message is automatically generated.

 Allow SingleTupleBag to be serialized
 -

 Key: PIG-1285
 URL: https://issues.apache.org/jira/browse/PIG-1285
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: PIG-1285.2.patch, PIG-1285.patch


 Currently, Pig uses a SingleTupleBag for efficiency in the Combiner optimization, 
 where a full-blown spillable bag implementation is not needed.
 Unfortunately, this can create problems. The Initial.exec() code below fails 
 at run time with the message that a SingleTupleBag cannot be serialized:
 {code}
 @Override
 public Tuple exec(Tuple in) throws IOException {
   // single record. just copy.
   if (in == null) return null;
   try {
     Tuple resTuple = tupleFactory_.newTuple(in.size());
     for (int i = 0; i < in.size(); i++) {
       resTuple.set(i, in.get(i));
     }
     return resTuple;
   } catch (IOException e) {
     log.warn(e);
     return null;
   }
 }
 {code}
 The code below can fix the problem in the UDF, but it seems like something 
 that should be handled transparently, not requiring UDF authors to know about 
 SingleTupleBags.
 {code}
 @Override
 public Tuple exec(Tuple in) throws IOException {
   // single record. just copy.
   if (in == null) return null;

   /*
    * Unfortunately SingleTupleBags are not serializable. We cache whether
    * a given index contains a bag in the map below, and copy all bags into
    * DefaultBags before returning to avoid serialization exceptions.
    */
   Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();

   try {
     Tuple resTuple = tupleFactory_.newTuple(in.size());
     for (int i = 0; i < in.size(); i++) {
       Object obj = in.get(i);
       if (!isBagAtIndex.containsKey(i)) {
         isBagAtIndex.put(i, obj instanceof SingleTupleBag);
       }
       if (isBagAtIndex.get(i)) {
         DataBag newBag = bagFactory_.newDefaultBag();
         newBag.addAll((DataBag) obj);
         obj = newBag;
       }
       resTuple.set(i, obj);
     }
     return resTuple;
   } catch (IOException e) {
     log.warn(e);
     return null;
   }
 }
 {code}




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-03-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847986#action_12847986
 ] 

Julien Le Dem commented on PIG-928:
---

@Woody

The main advantage of embedding Pig calls in the scripting language is that it 
enables iterative algorithms, which Pig is not very good at currently. Why would 
we limit users to UDFs when they can have their whole program in their 
scripting language of choice?

4. Python is a very interesting language to integrate with Pig because it has 
all the same native data structures (tuple:tuple, list:bag, dictionary:map), 
which makes the UDFs compact and easy to code. That said, in scripting 
languages that don't match the Pig types as well as Python does, using the schema 
to disambiguate will be a must-have.
When do we need to convert sequences and iterators? Pig has only tuple, bag 
and map as complex types AFAIK.
5. Agreed, it should be cached or initialised at the beginning.
3. and 6. I'll investigate passing the main script through the classpath when I 
have time. One interpreter would be nice to save memory and initialization 
time. I'm not sure the shared state is such an advantage, as UDFs should not 
rely on being run in the same process. Maybe I'm just missing something.

About multi-language support: I'm not against it, but there's not that much code to 
share.
The conversion between scripting types and Pig types is specific to each language, as you 
mentioned. Calling functions, getting a list of functions, and defining output 
schemas will also be specific.

How I see multi-language support working:

pig local|mapred -script {language} {scriptfile}

main program:
- generic: loads the script file
- generic: makes the script available in the classpath of the tasks (through a 
jar generated on the fly?)
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the main (in my case: 
decorators, pig server instance)
- generic: loads the script in the interpreter
- specific: figures out the list of functions and registers them automatically 
as UDFs in PIG using a dedicated UDF wrapper class
- specific: runs the main

Pig execute call from the script:
- generic: parse the Pig string to replace ${expression} by the value of the 
expression as evaluated by the interpreter in the local scope.
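
The substitution step above could be sketched as follows. This is only an illustration under stated assumptions: the class name is hypothetical, and the evaluator function stands in for the scripting interpreter evaluating the expression in the local scope (here it is just a map lookup).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PigStringTemplate {
    // Matches ${...} with no nested braces; group 1 is the expression text.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    /**
     * Replaces every ${expression} in the Pig string with the value the
     * evaluator returns for that expression.
     */
    static String substitute(String pig, Function<String, Object> evaluator) {
        Matcher m = PLACEHOLDER.matcher(pig);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = evaluator.apply(m.group(1).trim());
            // quoteReplacement stops '$' and '\' in the value from being
            // interpreted as group references by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // A map lookup plays the role of the interpreter's local scope.
        Map<String, Object> scope = new HashMap<>();
        scope.put("input", "'data.txt'");
        scope.put("n", 20);
        System.out.println(substitute("A = load ${input}; X = limit A ${n};", scope::get));
    }
}
```

Keeping the evaluator behind a function interface is what makes this step generic: only the evaluator would change per scripting language.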

UDF init:
- generic: loads the script from the classpath
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the UDFs (in my case: 
decorators)
- generic: loads the script in the interpreter
- specific: figures out the runtime for the outputSchema: function call or 
static schema (parsing of schema generic)

UDF call:
- specific: convert a pig tuple to a parameter list in the scripting language 
types
- specific: call the function with the parameters
- specific: convert the result to Pig types
- generic: return the result
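
To make the generic/specific split of the UDF call concrete, here is a rough sketch. All names are hypothetical, a List<Object> stands in for a Pig tuple, and the toy adapter only imitates a scripting language with a single built-in "concat" function; none of this is taken from the attached code.

```java
import java.util.Arrays;
import java.util.List;

/** Generic wrapper: the order of the steps is the same for every language. */
final class ScriptUdfWrapper {
    private final ScriptLanguageAdapter adapter;
    private final String functionName;

    ScriptUdfWrapper(ScriptLanguageAdapter adapter, String functionName) {
        this.adapter = adapter;
        this.functionName = functionName;
    }

    Object exec(List<Object> input) {
        Object[] args = adapter.toScriptArgs(input);       // specific: Pig tuple -> script parameters
        Object result = adapter.call(functionName, args);  // specific: call the function
        return adapter.toPigValue(result);                 // specific: convert; generic: return
    }

    public static void main(String[] args) {
        ScriptUdfWrapper udf = new ScriptUdfWrapper(new ToyAdapter(), "concat");
        System.out.println(udf.exec(Arrays.<Object>asList("a", 1, "b")));
    }
}

/** Hypothetical per-language hooks: the "specific" steps of the outline. */
interface ScriptLanguageAdapter {
    Object[] toScriptArgs(List<Object> pigTuple);
    Object call(String functionName, Object[] args);
    Object toPigValue(Object scriptResult);
}

/** Toy adapter used only for the demo; a real one would wrap an interpreter. */
final class ToyAdapter implements ScriptLanguageAdapter {
    public Object[] toScriptArgs(List<Object> pigTuple) {
        return pigTuple.toArray();
    }

    public Object call(String functionName, Object[] args) {
        if (!"concat".equals(functionName)) {
            throw new IllegalArgumentException("unknown function: " + functionName);
        }
        StringBuilder sb = new StringBuilder();
        for (Object a : args) {
            sb.append(a);
        }
        return sb.toString();
    }

    public Object toPigValue(Object scriptResult) {
        return scriptResult;
    }
}
```

With this shape, adding a language means writing one adapter; the wrapper class registered with Pig stays the same.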
 

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip, pyg.tgz, scripting.tgz, scripting.tgz


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster

2010-03-21 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1282:
---

Attachment: PIG-1282.patch

added hadoop license comment to BaseTestCase.java

 [zebra] make Zebra's pig test cases run on real cluster
 ---

 Key: PIG-1282
 URL: https://issues.apache.org/jira/browse/PIG-1282
 Project: Pig
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.7.0

 Attachments: PIG-1282.patch, PIG-1282.patch, PIG-1282.patch, 
 PIG-1282.patch


 The goal of this task is to make Zebra's pig test cases run on a real cluster.
 Currently Zebra's pig test cases are mostly tested using MiniCluster. We want 
 to use a real hadoop cluster to test them.




[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster

2010-03-21 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1282:
---

Status: Open  (was: Patch Available)

 [zebra] make Zebra's pig test cases run on real cluster
 ---

 Key: PIG-1282
 URL: https://issues.apache.org/jira/browse/PIG-1282
 Project: Pig
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.7.0

 Attachments: PIG-1282.patch


 The goal of this task is to make Zebra's pig test cases run on a real cluster.
 Currently Zebra's pig test cases are mostly tested using MiniCluster. We want 
 to use a real hadoop cluster to test them.




[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster

2010-03-21 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1282:
---

Status: Patch Available  (was: Open)

 [zebra] make Zebra's pig test cases run on real cluster
 ---

 Key: PIG-1282
 URL: https://issues.apache.org/jira/browse/PIG-1282
 Project: Pig
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.7.0

 Attachments: PIG-1282.patch


 The goal of this task is to make Zebra's pig test cases run on a real cluster.
 Currently Zebra's pig test cases are mostly tested using MiniCluster. We want 
 to use a real hadoop cluster to test them.




[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster

2010-03-21 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1282:
---

Attachment: (was: PIG-1282.patch)

 [zebra] make Zebra's pig test cases run on real cluster
 ---

 Key: PIG-1282
 URL: https://issues.apache.org/jira/browse/PIG-1282
 Project: Pig
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.7.0

 Attachments: PIG-1282.patch


 The goal of this task is to make Zebra's pig test cases run on a real cluster.
 Currently Zebra's pig test cases are mostly tested using MiniCluster. We want 
 to use a real hadoop cluster to test them.



