[jira] Reopened: (PIG-1182) Pig reference manual does not mention syntax for comments
[ https://issues.apache.org/jira/browse/PIG-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Ciemiewicz reopened PIG-1182: --- Corinne, I am not sure why you are so resistant to following the basic principle of documenting ALL syntax, including comments, in the reference manual. If the document is open to the community to edit, I'm more than willing to do the work myself, since I have contributed as a technical writer for programming language reference manuals in the past, as well as having been a developer of compilers and software development tools. Also, I think the passage you cited could use a little work on the English:
Using Comments in Scripts
If you place Pig Latin statements in a script, the script can include comments.
For multi-line comments use /* */
For single-line comments use --
/* myscript.pig
My script includes three simple Pig Latin statements. */
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float); -- load statement
B = FOREACH A GENERATE name; -- foreach statement
DUMP B; -- dump statement
Case Sensitivity
Pig reference manual does not mention syntax for comments - Key: PIG-1182 URL: https://issues.apache.org/jira/browse/PIG-1182 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.5.0 Reporter: David Ciemiewicz Assignee: Corinne Chandel Fix For: 0.7.0 The Pig 0.5.0 reference manual does not mention how to write comments in your pig code using -- (two dashes). http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html Also, does /* */ also work? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front
[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reopened PIG-1216: --- Reopening as the assumption made for the patch doesn't hold. New load store design does not allow Pig to validate inputs and outputs up front Key: PIG-1216 URL: https://issues.apache.org/jira/browse/PIG-1216 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Alan Gates Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1216.patch, pig-1216_1.patch In Pig 0.6 and before, Pig attempts to verify the existence of inputs and non-existence of outputs during parsing, to avoid run time failures when inputs don't exist or outputs can't be overwritten. The downside to this was that Pig assumed all inputs and outputs were HDFS files, which made implementation harder for non-HDFS based load and store functions. In the load store redesign (PIG-966) this was delegated to InputFormats and OutputFormats to avoid this problem and to make use of the checks already being done in those implementations. Unfortunately, for Pig Latin scripts that run more than one MR job, this does not work well. MR does not do input/output verification on all the jobs at once. It does them one at a time. So if a Pig Latin script results in 10 MR jobs and the file to store to at the end already exists, the first 9 jobs will be run before the 10th job discovers that the whole thing was doomed from the beginning. To avoid this, a validate call needs to be added to the new LoadFunc and StoreFunc interfaces. Pig needs to pass this method enough information that the load function implementer can delegate to InputFormat.getSplits() and the store function implementer to OutputFormat.checkOutputSpecs() if s/he decides to. 
Since 90% of all load and store functions use HDFS and PigStorage will also need to, the Pig team should implement a default file existence check on HDFS and make it available as a static method to other Load/Store function implementers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
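The upfront check Alan describes - validating every job's inputs and outputs before any job in the pipeline runs, while treating intermediate outputs as satisfied by earlier jobs - can be sketched as follows. This is a hypothetical illustration in Python, not Pig's actual implementation; `validate_pipeline` and the existence predicate are invented names standing in for an HDFS check.

```python
def validate_pipeline(jobs, exists):
    """Validate all inputs/outputs of a multi-job pipeline up front.

    jobs   -- list of (inputs, outputs) pairs, in execution order
    exists -- predicate mimicking an HDFS existence check (stand-in)

    Returns a list of error strings; empty means the plan is safe to run.
    """
    errors = []
    produced = set()  # outputs of earlier jobs, not yet materialized
    for i, (inputs, outputs) in enumerate(jobs):
        for p in inputs:
            # An input is fine if a real file exists or an earlier job makes it.
            if p not in produced and not exists(p):
                errors.append("job %d: input %r does not exist" % (i, p))
        for p in outputs:
            if exists(p):
                errors.append("job %d: output %r already exists" % (i, p))
            produced.add(p)
    return errors

# A two-job plan whose final output already exists: caught before job 0 runs.
fs = {"/data/in", "/data/final"}             # pretend filesystem contents
jobs = [(["/data/in"], ["/tmp/stage1"]),     # job 0
        (["/tmp/stage1"], ["/data/final"])]  # job 1
print(validate_pipeline(jobs, fs.__contains__))
# ["job 1: output '/data/final' already exists"]
```

The point of the sketch is the ordering: the error on the last job's output is reported before the first job is ever submitted, instead of after nine jobs have already run.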
COMPLETED merge of load-store-redesign branch to trunk
The merge from the load-store-redesign branch to trunk is now completed. New commits can now proceed on trunk. The load-store-redesign branch is deprecated with this merge and no more commits should be made on that branch. Pradeep From: Pradeep Kamath Sent: Thursday, February 18, 2010 11:20 AM To: Pradeep Kamath; 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: BEGINNING merge of load-store-redesign branch to trunk - hold off commits! Hi, I will begin this activity now - a request to all committers to not commit to trunk or load-store-redesign till I send an all-clear message - I am anticipating this will hopefully be completed by end of day (Pacific time) tomorrow. Thanks, Pradeep From: Pradeep Kamath Sent: Tuesday, February 16, 2010 11:34 AM To: 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: Plan to merge load-store-redesign branch to trunk Hi, We would like to merge the load-store-redesign branch to trunk, tentatively on Thursday. To do this, I would like to request all committers not to commit anything to the load-store-redesign branch or trunk during the period of the merge. I will send out a mail to indicate the begin and end of this activity - tentatively I am expecting this to be a day's period from 9 AM PST Thursday to 9 AM PST Friday so I can resolve any conflicts and run all tests. Pradeep
[jira] Commented: (PIG-1188) Padding nulls to the input tuple according to input schema
[ https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835944#action_12835944 ] Richard Ding commented on PIG-1188: --- To summarize where we are: Right now the Pig project operator pads null if the value to be projected doesn't exist. As a consequence, the desired result is achieved if PigStorage is used and a schema with data types is specified, since in this case Pig inserts a project+cast operator for each field in the schema. In the case where no schema is specified in the load statement, Pig is doing a good job of adhering to Pig's philosophy, letting the program run without throwing a runtime exception. That leaves the case where a schema is specified without data types. There are several options:
* Pig automatically inserts a project operator for each field in the schema to ensure the input data matches the schema. The trade-off for this is the performance penalty. Is it worthwhile if most user data is well-behaved?
* Users can explicitly add a foreach statement after the load statement which projects all the fields in the schema. This is similar to the existing practice of running a map job first to clean up the data.
* Pig can also delegate the padding work to the loaders. The problem is that the schema currently isn't passed to the loaders.
Padding nulls to the input tuple according to input schema -- Key: PIG-1188 URL: https://issues.apache.org/jira/browse/PIG-1188 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Richard Ding Fix For: 0.7.0 Currently, the number of fields in the input tuple is determined by the data. When we have a schema, we should generate the input data according to the schema, padding nulls if necessary. 
Here is one example: Pig script:
{code}
a = load '1.txt' as (a0, a1);
dump a;
{code}
Input file:
{code}
1 2
1 2 3
1
{code}
Current result:
{code}
(1,2)
(1,2,3)
(1)
{code}
Desired result:
{code}
(1,2)
(1,2)
(1, null)
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
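The desired behaviour above can be illustrated with a small sketch. This is hypothetical Python, not Pig's implementation; `pad_to_schema` is an invented helper that pads short tuples with nulls and drops extra fields to match the declared schema length:

```python
def pad_to_schema(tup, schema_len):
    """Pad short tuples with None (and drop extra fields) so every tuple
    matches the declared schema length. Hypothetical helper, not Pig code."""
    tup = tuple(tup)
    if len(tup) < schema_len:
        return tup + (None,) * (schema_len - len(tup))
    return tup[:schema_len]

# The rows from the example input, against a two-field schema (a0, a1):
rows = [(1, 2), (1, 2, 3), (1,)]
print([pad_to_schema(r, 2) for r in rows])
# [(1, 2), (1, 2), (1, None)]
```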
Re: COMPLETED merge of load-store-redesign branch to trunk
Great stuff guys, I've been keen on refactoring the pig HiveRCLoader reader and writer to use the new load-store redesign. - Original Message - From: Pradeep Kamath prade...@yahoo-inc.com To: pig-dev@hadoop.apache.org; pig-u...@hadoop.apache.org Sent: Fri Feb 19 20:05:54 2010 Subject: COMPLETED merge of load-store-redesign branch to trunk
[jira] Updated: (PIG-1215) Make Hadoop jobId more prominent in the client log
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1215: -- Attachment: pig-1215_4.patch Change as suggested by Olga. Other parts of the patch are as before. Make Hadoop jobId more prominent in the client log -- Key: PIG-1215 URL: https://issues.apache.org/jira/browse/PIG-1215 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1215.patch, pig-1215.patch, pig-1215_1.patch, pig-1215_3.patch, pig-1215_4.patch This is a request from applications that want to be able to programmatically parse client logs to find hadoop Ids. They would like to see each job id on a separate line in the following format: hadoopJobId: job_123456789 They would also like to see the jobs in the order they are executed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1245) Remove the connection to the namenode in HExecutionEngine.init()
Remove the connection to the namenode in HExecutionEngine.init() Key: PIG-1245 URL: https://issues.apache.org/jira/browse/PIG-1245 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Fix For: 0.7.0 PigContext.connect() calls HExecutionEngine.init(). The former is called from the backend map/reduce tasks in DefaultIndexableLoader, used in merge join. It is not clear that a connection to the namenode is required in HExecutionEngine.init(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1188) Padding nulls to the input tuple according to input schema
[ https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1188: Fix Version/s: (was: 0.7.0) Looks like most common cases are already working. Unlinking from the 0.7.0 release. Padding nulls to the input tuple according to input schema -- Key: PIG-1188 URL: https://issues.apache.org/jira/browse/PIG-1188 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Richard Ding Currently, the number of fields in the input tuple is determined by the data. When we have a schema, we should generate the input data according to the schema, padding nulls if necessary. Here is one example: Pig script:
{code}
a = load '1.txt' as (a0, a1);
dump a;
{code}
Input file:
{code}
1 2
1 2 3
1
{code}
Current result:
{code}
(1,2)
(1,2,3)
(1)
{code}
Desired result:
{code}
(1,2)
(1,2)
(1, null)
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1215) Make Hadoop jobId more prominent in the client log
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1215: -- Resolution: Fixed Status: Resolved (was: Patch Available) Patch checked in. Make Hadoop jobId more prominent in the client log -- Key: PIG-1215 URL: https://issues.apache.org/jira/browse/PIG-1215 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1215.patch, pig-1215.patch, pig-1215_1.patch, pig-1215_3.patch, pig-1215_4.patch This is a request from applications that want to be able to programmatically parse client logs to find hadoop Ids. They would like to see each job id on a separate line in the following format: hadoopJobId: job_123456789 They would also like to see the jobs in the order they are executed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836035#action_12836035 ] Pradeep Kamath commented on PIG-966: LoadFunc is now an abstract class with default implementations for some of the methods - we hope this will aid implementers. I would like to make the same change for StoreFunc. Since PigStorage currently does both load and store, we would need to also introduce an interface - StoreFuncInterface - so that PigStorage can extend LoadFunc and implement StoreFuncInterface. To be symmetrical, we would need to also introduce a LoadFuncInterface. This interface can be used by implementers if they want their LoadFunc implementation to extend some other class. We can document and strongly recommend that users only use our abstract classes, since that would make them less vulnerable to incompatible additions in the future (hopefully when we add new methods to these abstract classes we will give a default implementation). I will upload a patch for this unless anyone has strong objections. Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1218) Use distributed cache to store samples
[ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1218: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed patch PIG-1218_2.patch since the merge join changes need to be re-worked and will be handled in a different patch. Thanks Richard! Use distributed cache to store samples -- Key: PIG-1218 URL: https://issues.apache.org/jira/browse/PIG-1218 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1218.patch, PIG-1218_2.patch, PIG-1218_3.patch Currently, in the case of skew join and order by, we use a sample that is just written to the dfs (not the distributed cache) and, as a result, gets opened and copied around more than necessary. This impacts query performance and also places unnecessary load on the name node -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1182) Pig reference manual does not mention syntax for comments
[ https://issues.apache.org/jira/browse/PIG-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836055#action_12836055 ] Olga Natkovich commented on PIG-1182: - Ciemo, There is a reason why Corinne created two sections of the document. A single document was just too large, so it was hard to manage changes, and even loading it took some time. If I understand correctly, the real issue that you are pointing out is that it is hard to quickly find the specific information you are looking for. Traditionally, indices are used for this purpose, and the Pig documentation does not have one. Short term, Corinne does not have time to work on it due to other commitments. If you or other users would like to help with that, it would certainly be appreciated. Pig reference manual does not mention syntax for comments - Key: PIG-1182 URL: https://issues.apache.org/jira/browse/PIG-1182 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.5.0 Reporter: David Ciemiewicz Assignee: Corinne Chandel Fix For: 0.7.0 The Pig 0.5.0 reference manual does not mention how to write comments in your pig code using -- (two dashes). http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html Also, does /* */ also work? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1246) SequenceFileLoader problem with compressed values
SequenceFileLoader problem with compressed values - Key: PIG-1246 URL: https://issues.apache.org/jira/browse/PIG-1246 Project: Pig Issue Type: Bug Reporter: Derek Brown I sent the following to the pig-users list, and Dmitriy said to open a ticket. http://mail-archives.apache.org/mod_mbox/hadoop-pig-user/201002.mbox/%3c357a70951002191451n6136a3en8475652fc0bd3...@mail.gmail.com%3e I'm having a problem getting the SequenceFileLoader, from the Piggybank, to read sequence files whose values are block compressed (gzip'd). I'm using Pig 0.4.99.0+10 and Hadoop 0.20.1+152, via Cloudera. I did the following:
* Copied the SequenceFileLoader class into my own project
* Removed public LoadFunc.RequiredFieldResponse fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList) because LoadFunc.RequiredFieldList isn't resolvable, and added public void fieldsToRead(Schema schema)
* Jarred up the .class file
* Programmatically created a trivial sequence file of a few lines, with IntWritable keys and Text values, using the basic code in an example in Hadoop: The Definitive Guide
* That file is successfully read and keys/values displayed, with hadoop fs -text, as well as with pig, doing the following:
grunt> register sequencefileloader.jar;
grunt> r = load '/path/to/sequence_file' using com.foobar.SequenceFileLoader();
grunt> dump r;
* The sequence file with the compressed values is successfully read with hadoop fs -text
* When doing the load step in pig with that file, the following results:
--
2010-02-19 16:59:14,489 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2010-02-19 16:59:14,490 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2010-02-19 16:59:14,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1018: Problem determining schema during load
Details at logfile: /path/to/pig_1266616744562.log
--
That log file contains the following:
--
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Problem determining schema during load
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:363)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema during load
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
... 8 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining schema during load
at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
... 10 more
Caused by: java.io.EOFException
at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
at java.util.zip.GZIPInputStream.&lt;init&gt;(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.&lt;init&gt;(GZIPInputStream.java:68)
at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.&lt;init&gt;(GzipCodec.java:92)
at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.&lt;init&gt;(GzipCodec.java:101)
at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
at org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1428)
at org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1417)
at org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1412)
at com.media6.SequenceFileLoader.inferReader(SequenceFileLoader.java:140)
at
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836080#action_12836080 ] Pradeep Kamath commented on PIG-966: In retrospect, I think we can skip creating a LoadFuncInterface, since there is currently no real use case for it - we would be adding it only to keep symmetry with StoreFuncInterface and to allow implementations that extend other classes to implement the interface. The first motivation is not very strong, and the second can also be achieved through composition rather than inheritance - it is unclear how inheriting from a different class would benefit a Loader implementation over composing with it and delegating. By introducing a LoadFuncInterface we would be exposing users who implement it to backward incompatible additions in the future. So I think we should not add a LoadFuncInterface now, and add it ONLY if a real need arises. The rest of my proposal (making StoreFunc an abstract class and adding a new StoreFuncInterface) still holds. Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
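The composition-over-inheritance argument above can be illustrated with a minimal sketch. This is hypothetical Python rather than Pig's Java API; the names (`LoadFunc.get_next`, `LegacyReader`) are invented for illustration. An abstract base class with default implementations covers the common case, and a loader that wants behaviour from some other class can hold an instance of it as a field instead of inheriting from it:

```python
class LoadFunc:
    """Abstract base with default implementations (mirroring the idea that
    new methods can later be added with defaults without breaking users)."""
    def relative_to_absolute_path(self, location, cwd):
        # Default implementation inherited for free by every loader.
        return location if location.startswith("/") else cwd + "/" + location
    def get_next(self):
        raise NotImplementedError

class LegacyReader:
    """Some pre-existing class whose behaviour a loader author wants."""
    def read_line(self):
        return "1\t2"

class MyLoader(LoadFunc):
    # Composition: hold a LegacyReader instead of inheriting from it,
    # keeping the single base-class slot free for LoadFunc.
    def __init__(self):
        self.reader = LegacyReader()
    def get_next(self):
        return tuple(self.reader.read_line().split("\t"))

loader = MyLoader()
print(loader.get_next())                                   # ('1', '2')
print(loader.relative_to_absolute_path("out", "/user/x"))  # /user/x/out
```

The same reuse is available either way, which is why an interface whose only purpose is to free up the inheritance slot adds little beyond extra compatibility surface.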
[jira] Updated: (PIG-1233) NullPointerException in AVG
[ https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1233: Resolution: Fixed Status: Resolved (was: Patch Available) NullPointerException in AVG Key: PIG-1233 URL: https://issues.apache.org/jira/browse/PIG-1233 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ankur Assignee: Ankur Fix For: 0.7.0 Attachments: jira-1233.patch The overridden method getValue() in AVG throws a NullPointerException when accumulate() is not called, leaving the variable 'intermediateCount' null. This causes Java to throw the exception when it tries to 'unbox' the value for numeric comparison. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1233) NullPointerException in AVG
[ https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836086#action_12836086 ] Olga Natkovich commented on PIG-1233: - Patch committed to the trunk. Thanks, Ankur! NullPointerException in AVG Key: PIG-1233 URL: https://issues.apache.org/jira/browse/PIG-1233 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ankur Assignee: Ankur Fix For: 0.7.0 Attachments: jira-1233.patch The overridden method getValue() in AVG throws a NullPointerException when accumulate() is not called, leaving the variable 'intermediateCount' null. This causes Java to throw the exception when it tries to 'unbox' the value for numeric comparison. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-961) Integration with Hadoop 21
[ https://issues.apache.org/jira/browse/PIG-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-961. Resolution: Fixed We have already integrated with the Hadoop 20 API. Integration with Hadoop 21 -- Key: PIG-961 URL: https://issues.apache.org/jira/browse/PIG-961 Project: Pig Issue Type: New Feature Reporter: Olga Natkovich Assignee: Ying He Attachments: hadoop21.jar, PIG-961.patch, PIG-961.patch2 Hadoop 21 is not yet released, but we know that a switch to the new MR API is coming there. This JIRA is for early integration with that API -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1241) Accumulator is turned on when a map is used with a non-accumulative UDF
[ https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1241: Affects Version/s: 0.6.0 Fix Version/s: 0.7.0 Assignee: Ying He Accumulator is turned on when a map is used with a non-accumulative UDF --- Key: PIG-1241 URL: https://issues.apache.org/jira/browse/PIG-1241 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ying He Assignee: Ying He Fix For: 0.7.0 Attachments: accum.patch An exception is thrown for a script like the following:
register /homes/yinghe/owl/string.jar;
a = load 'a.txt' as (id, url);
b = group a by (id, url);
c = foreach b generate COUNT(a), (CHARARRAY) string.URLPARSE(group.url)#'url';
dump c;
In this query, URLPARSE() is not accumulative, and it returns a map. The accumulator optimizer fails to check the UDF in this case and tries to run the job in accumulative mode. A ClassCastException is thrown when trying to cast the UDF to the Accumulator interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1241) Accumulator is turned on when a map is used with a non-accumulative UDF
[ https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1241: Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to the trunk. Thanks, Ying Accumulator is turned on when a map is used with a non-accumulative UDF --- Key: PIG-1241 URL: https://issues.apache.org/jira/browse/PIG-1241 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ying He Assignee: Ying He Fix For: 0.7.0 Attachments: accum.patch An exception is thrown for a script like the following:
register /homes/yinghe/owl/string.jar;
a = load 'a.txt' as (id, url);
b = group a by (id, url);
c = foreach b generate COUNT(a), (CHARARRAY) string.URLPARSE(group.url)#'url';
dump c;
In this query, URLPARSE() is not accumulative, and it returns a map. The accumulator optimizer fails to check the UDF in this case and tries to run the job in accumulative mode. A ClassCastException is thrown when trying to cast the UDF to the Accumulator interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
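The fix described above amounts to verifying that every UDF in the plan actually implements the Accumulator contract before the optimizer switches the job to accumulative mode, instead of casting and failing at runtime. A minimal sketch in hypothetical Python rather than Pig's Java internals (`can_run_accumulative` and the toy UDF classes are invented):

```python
class Accumulator:
    """The contract a UDF must satisfy to run in accumulative mode."""
    def accumulate(self, batch): ...
    def get_value(self): ...

class COUNT(Accumulator):
    def __init__(self):
        self.n = 0
    def accumulate(self, batch):
        self.n += len(batch)
    def get_value(self):
        return self.n

class URLPARSE:
    """Returns a map; deliberately NOT accumulative."""
    def exec(self, url):
        return {"url": url}

def can_run_accumulative(udfs):
    # Enable the optimizer only if *every* UDF implements the contract;
    # a single non-accumulative UDF (like URLPARSE) disables it.
    return all(isinstance(u, Accumulator) for u in udfs)

print(can_run_accumulative([COUNT()]))             # True
print(can_run_accumulative([COUNT(), URLPARSE()])) # False
```

With this check in place the second plan simply runs in regular mode, rather than blowing up with a ClassCastException mid-job.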
[jira] Created: (PIG-1247) Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error - Key: PIG-1247 URL: https://issues.apache.org/jira/browse/PIG-1247 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.7.0 I have a large script in which there are intermediate store statements; one of them writes to a directory I do not have permission to write to. The stack trace I get from Pig is this:
2010-02-20 02:16:32,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
Details at logfile: /home/viraj/pig_1266632145355.log
Pig Stack Trace
---
ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
java.lang.ClassCastException: org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3583)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1407)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:949)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:762)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1036)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:986)
at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:386)
The only way to find the error was to look at the javacc-generated QueryParser.java code and do a System.out.println(). Here is a script to reproduce the problem:
{code}
A = load '/user/viraj/three.txt' using PigStorage();
B = foreach A generate ['a'#'12'] as b:map[] ;
store B into '/user/secure/pigtest' using PigStorage();
{code}
three.txt has 3 lines which contain nothing but the number 1.
{code}
$ hadoop fs -ls /user/secure/
ls: could not get get listing for 'hdfs://mynamenode/user/secure' : org.apache.hadoop.security.AccessControlException: Permission denied: user=viraj, access=READ_EXECUTE, inode=secure:secure:users:rwx--
{code}
Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1247) Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
[ https://issues.apache.org/jira/browse/PIG-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836107#action_12836107 ] Daniel Dai commented on PIG-1247: - This error handling code is hard coded by javacc. Seems we do not have a way to get around currently. Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error - Key: PIG-1247 URL: https://issues.apache.org/jira/browse/PIG-1247 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.7.0 I have a large script in which there are intermediate stores statements, one of them writes to a directory I do not have permission to write to. The stack trace I get from Pig is this: 2010-02-20 02:16:32,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error Details at logfile: /home/viraj/pig_1266632145355.log Pig Stack Trace --- ERROR 2999: Unexpected internal error. 
org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error

{code}
java.lang.ClassCastException: org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3583)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1407)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:949)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:762)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1036)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:986)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
        at org.apache.pig.Main.main(Main.java:386)
{code}

The only way to find the actual error was to look at the javacc-generated QueryParser.java code and add a System.out.println(). Here is a script to reproduce the problem:

{code}
A = load '/user/viraj/three.txt' using PigStorage();
B = foreach A generate ['a'#'12'] as b:map[];
store B into '/user/secure/pigtest' using PigStorage();
{code}

three.txt has 3 lines, each containing nothing but the number 1.

{code}
$ hadoop fs -ls /user/secure/
ls: could not get listing for 'hdfs://mynamenode/user/secure' : org.apache.hadoop.security.AccessControlException: Permission denied: user=viraj, access=READ_EXECUTE, inode=secure:secure:users:rwx--
{code}

Viraj -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
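To make the failure mode concrete: javacc wraps the semantic actions of each production in a catch (Throwable) block and rethrows via casts, assuming anything that is not a RuntimeException (or ParseException) must be a java.lang.Error. A checked exception such as DataStorageException therefore surfaces as a ClassCastException that hides the real cause. The sketch below is a standalone illustration of that pattern, not the actual generated parser; the class names are stand-ins.

```java
public class JavaccCastDemo {
    // Stand-in for org.apache.pig.backend.datastorage.DataStorageException:
    // a checked exception thrown from inside a parser semantic action.
    static class DataStorageException extends Exception {
        DataStorageException(String msg) { super(msg); }
    }

    // Approximation of the rethrow logic javacc generates around actions:
    // anything that is not a RuntimeException is assumed to be an Error.
    static void rethrowLikeJavacc(Throwable t) {
        if (t instanceof RuntimeException) throw (RuntimeException) t;
        // (the generated code also checks for ParseException here)
        throw (Error) t; // ClassCastException if t is a checked exception
    }

    public static void main(String[] args) {
        try {
            rethrowLikeJavacc(new DataStorageException("Permission denied"));
        } catch (ClassCastException e) {
            // The original cause is lost; only the cast failure surfaces,
            // which is why ERROR 2999 reports the cast instead of the
            // underlying permission problem.
            System.out.println(e.getMessage());
        }
    }
}
```

This is why the message says "cannot be cast to java.lang.Error": the ClassCastException thrown by the cast itself replaces the checked exception, so any fix would have to wrap checked exceptions in a RuntimeException before they escape the semantic action, or change the grammar's error-recovery code.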
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836108#action_12836108 ] Ashutosh Chauhan commented on PIG-928: -- Hey Woody, Great work! This will definitely be useful for a lot of Pig users. I have only skimmed your work so far. One question that struck me: you are doing a lot of heavy lifting to provide multi-language support, figuring out which language the user is asking for and then using reflection to load the appropriate interpreter. I think it might be easier to use one of the existing frameworks (BSF or javax.script), which hide this and allow multiple languages to be handled transparently (at least, that's what they claim to do). Have you taken a look at them? These frameworks would arguably let us support more languages without maintaining a lot of code ourselves, though I am sure they come at a performance cost (certainly CPU, and possibly memory too). UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Attachments: package.zip, scripting.tgz, scripting.tgz It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
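As a concrete illustration of the javax.script (JSR-223) route Ashutosh mentions: the ScriptEngineManager discovers engines by name from the classpath, so the host never hard-codes interpreter classes or uses reflection directly. This is a minimal sketch, not Pig code; which engines exist depends entirely on the runtime classpath (e.g. Jython registers a "python" engine, and some JDKs bundle a "JavaScript" engine), so the lookup is guarded.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class ScriptUdfSketch {
    public static void main(String[] args) throws Exception {
        ScriptEngineManager manager = new ScriptEngineManager();

        // Enumerate every JSR-223 engine available on this classpath;
        // adding a language means adding a jar, not writing loader code.
        for (ScriptEngineFactory f : manager.getEngineFactories()) {
            System.out.println(f.getEngineName() + " -> " + f.getNames());
        }

        // Look up an interpreter by the name the user asked for.
        ScriptEngine engine = manager.getEngineByName("JavaScript");
        if (engine == null) {
            System.out.println("no JavaScript engine on this classpath");
            return;
        }

        // Bind a host object and evaluate a snippet, the same shape a
        // script-language UDF invocation would take.
        engine.put("input", "pig");
        System.out.println(engine.eval("input.toUpperCase()"));
    }
}
```

The trade-off Ashutosh notes is real: eval goes through the engine's generic Object-based binding layer on every call, so CPU (and possibly memory) overhead is expected compared to a hand-wired interpreter, in exchange for much less code to maintain per language.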