[jira] Commented: (PIG-1312) Make Pig work with hadoop security
[ https://issues.apache.org/jira/browse/PIG-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847894#action_12847894 ] Hadoop QA commented on PIG-1312: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439352/PIG-1312-1.patch against trunk revision 925513. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/258/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/258/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/258/console This message is automatically generated. Make Pig work with hadoop security -- Key: PIG-1312 URL: https://issues.apache.org/jira/browse/PIG-1312 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1312-1.patch In order to make Pig work with hadoop security, we need to set mapreduce.job.credentials.binary in the JobConf before we call getSplit() in the backend. We need to change code in merge join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
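The step described in PIG-1312 (set mapreduce.job.credentials.binary in the job configuration before getSplit() is called in the backend) could look roughly like the sketch below. It is only an illustration and is not taken from PIG-1312-1.patch. The class and method names are hypothetical, a plain Map stands in for Hadoop's JobConf so the example runs without Hadoop on the classpath, and reading the token file path from the HADOOP_TOKEN_FILE_LOCATION environment variable is an assumption that the issue text does not state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not taken from PIG-1312-1.patch. */
final class CredentialsConfSketch {
    // Property named in the issue description.
    static final String CREDENTIALS_KEY = "mapreduce.job.credentials.binary";
    // Assumption: the token file path reaches the task through this variable.
    static final String TOKEN_FILE_ENV = "HADOOP_TOKEN_FILE_LOCATION";

    /**
     * Copies the token file location from the environment into the
     * configuration. The conf map is a stand-in for a JobConf.
     * Returns true only if the property was set.
     */
    static boolean propagateTokenFile(Map<String, String> env, Map<String, String> conf) {
        String tokenFile = env.get(TOKEN_FILE_ENV);
        if (tokenFile == null || tokenFile.isEmpty()) {
            return false; // no token file available; leave the configuration untouched
        }
        conf.put(CREDENTIALS_KEY, tokenFile);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        boolean set = propagateTokenFile(System.getenv(), conf);
        System.out.println("credentials property set: " + set + " -> " + conf);
        // Backend code would compute the input splits only after this point.
    }
}
```

Returning a boolean keeps the no-security case explicit: when no token file is present the configuration is left unchanged rather than being given an empty value.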
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847909#action_12847909 ] Ankur commented on PIG-1229: @Ashutosh Chauhan I read the HSQLDB license and it looked OK to me, but I am not a lawyer :-). Besides, Apache Cocoon uses it, so I think we should be OK pulling it in through Ivy. I'll make the Ivy and load/store related changes and submit a new patch on Monday. Sorry for the delay. allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.7.0 Attachments: hsqldb.jar, jira-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1308) Inifinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847920#action_12847920 ] Hadoop QA commented on PIG-1308: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439354/PIG-1308.patch against trunk revision 925513. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/259/console This message is automatically generated. Inifinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1308.patch Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker. This occurs with trunk and not with Pig 0.6 branch. 
{code}
data = load 'binstoragesample' using BinStorage() as (s, m, l);
A = foreach ULT generate s#'key' as value;
X = limit A 20;
dump X;
{code}
When this script is submitted to the JobTracker, the following message is logged repeatedly:
2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
The stack trace revealed:
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115)
at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404)
at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167)
at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263)
at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112)
at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847951#action_12847951 ] Hadoop QA commented on PIG-1285: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439393/PIG-1285.2.patch against trunk revision 925513. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/260/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/260/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/260/console This message is automatically generated. Allow SingleTupleBag to be serialized - Key: PIG-1285 URL: https://issues.apache.org/jira/browse/PIG-1285 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: PIG-1285.2.patch, PIG-1285.patch Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The below Initial.exec() code fails at run-time with the message that a SingleTupleBag cannot be serialized: {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy. 
    if (in == null) return null;
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            resTuple.set(i, in.get(i));
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}
The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags.
{code}
@Override
public Tuple exec(Tuple in) throws IOException {
    // single record. just copy.
    if (in == null) return null;
    /*
     * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag
     * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions.
     */
    Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            Object obj = in.get(i);
            if (!isBagAtIndex.containsKey(i)) {
                isBagAtIndex.put(i, obj instanceof SingleTupleBag);
            }
            if (isBagAtIndex.get(i)) {
                DataBag newBag = bagFactory_.newDefaultBag();
                newBag.addAll((DataBag) obj);
                obj = newBag;
            }
            resTuple.set(i, obj);
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847986#action_12847986 ] Julien Le Dem commented on PIG-928: ---
@Woody The main advantage of embedding Pig calls in the scripting language is that it enables iterative algorithms, which Pig is not very good at currently. Why would we limit users to UDFs when they can have their whole program in their scripting language of choice?
4. Python is a very interesting language to integrate with Pig because it has all the same native data structures (tuple:tuple, list:bag, dictionary:map), which makes the UDFs compact and easy to code. That said, in scripting languages that don't match the Pig types as well as Python does, using the schema to disambiguate will be a must-have. When do we need to convert sequences and iterators? Pig has only tuple, bag and map as complex types AFAIK.
5. Agreed, it should be cached or initialised at the beginning.
3. and 6. I'll investigate passing the main script through the classpath when I have time. One interpreter would be nice to save memory and initialization time. I'm not sure the shared state is such an advantage, as UDFs should not rely on being run in the same process. Maybe I'm just missing something.
About the multi-language support: I'm not against it, but there's not that much code to share. The scripting-to-Pig type conversion is specific to each language, as you mentioned. Calling functions, getting a list of functions and defining output schemas will also be specific.
How I see the multi-language support:
pig local|mapred -script {language} {scriptfile}
main program:
- generic: loads the script file
- generic: makes the script available in the classpath of the tasks (through a jar generated on the fly?)
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the main (in my case: decorators, pig server instance)
- generic: loads the script in the interpreter
- specific: figures out the list of functions and registers them automatically as UDFs in PIG using a dedicated UDF wrapper class
- specific: run the main
Pig execute call from the script:
- generic: parse the Pig string to replace ${expression} by the value of the expression as evaluated by the interpreter in the local scope.
UDF init:
- generic: loads the script from the classpath
- specific: initializes the interpreter for the scripting language
- specific: add the global variables defined by pig for the UDFs (in my case: decorators)
- generic: loads the script in the interpreter
- specific: figures out the runtime for the outputSchema: function call or static schema (parsing of schema generic)
UDF call:
- specific: convert a pig tuple to a parameter list in the scripting language types
- specific: call the function with the parameters
- specific: convert the result to Pig types
- generic: return the result
UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Attachments: package.zip, pyg.tgz, scripting.tgz, scripting.tgz It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
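The generic "replace ${expression} by the value of the expression" step in the PIG-928 comment above could be sketched as follows. This is a minimal illustration, not code from any of the attached prototypes: the class name PigStringExpander is hypothetical, and the evaluator function is a stand-in for whatever the scripting interpreter offers for evaluating an expression in the caller's local scope. The sketch also assumes that an expression never contains a closing brace.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Pig. */
final class PigStringExpander {
    // Matches ${...} with a non-empty body that contains no closing brace.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Replaces every ${expression} in pigText with the string form of the
     * value that the evaluator returns for that expression.
     */
    static String expand(String pigText, Function<String, Object> evaluator) {
        Matcher m = PLACEHOLDER.matcher(pigText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = evaluator.apply(m.group(1).trim());
            // quoteReplacement stops '$' or '\' in the value from being
            // interpreted as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // A map lookup stands in for the interpreter's local scope.
        Map<String, Object> scope = Map.of("input", "'data.txt'", "n", 20);
        String script = "A = load ${input}; X = limit A ${n};";
        System.out.println(expand(script, scope::get));
    }
}
```

Keeping the evaluator behind a Function is what makes this step generic in the sense of the comment: only the evaluator has to know which scripting language is in use.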
[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1282: --- Attachment: PIG-1282.patch added hadoop license comment to BaseTestCase.java [zebra] make Zebra's pig test cases run on real cluster --- Key: PIG-1282 URL: https://issues.apache.org/jira/browse/PIG-1282 Project: Pig Issue Type: Task Affects Versions: 0.6.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.7.0 Attachments: PIG-1282.patch, PIG-1282.patch, PIG-1282.patch, PIG-1282.patch The goal of this task is to make Zebra's pig test cases run on real cluster. Currently Zebra's pig test cases are mostly tested using MiniCluster. We want to use a real hadoop cluster to test them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1282: --- Status: Open (was: Patch Available) [zebra] make Zebra's pig test cases run on real cluster --- Key: PIG-1282 URL: https://issues.apache.org/jira/browse/PIG-1282 Project: Pig Issue Type: Task Affects Versions: 0.6.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.7.0 Attachments: PIG-1282.patch The goal of this task is to make Zebra's pig test cases run on real cluster. Currently Zebra's pig test cases are mostly tested using MiniCluster. We want to use a real hadoop cluster to test them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1282: --- Status: Patch Available (was: Open) [zebra] make Zebra's pig test cases run on real cluster --- Key: PIG-1282 URL: https://issues.apache.org/jira/browse/PIG-1282 Project: Pig Issue Type: Task Affects Versions: 0.6.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.7.0 Attachments: PIG-1282.patch The goal of this task is to make Zebra's pig test cases run on real cluster. Currently Zebra's pig test cases are mostly tested using MiniCluster. We want to use a real hadoop cluster to test them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1282: --- Attachment: (was: PIG-1282.patch) [zebra] make Zebra's pig test cases run on real cluster --- Key: PIG-1282 URL: https://issues.apache.org/jira/browse/PIG-1282 Project: Pig Issue Type: Task Affects Versions: 0.6.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.7.0 Attachments: PIG-1282.patch The goal of this task is to make Zebra's pig test cases run on real cluster. Currently Zebra's pig test cases are mostly tested using MiniCluster. We want to use a real hadoop cluster to test them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.