[jira] Commented: (PIG-812) COUNT(*) does not work
[ https://issues.apache.org/jira/browse/PIG-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729550#action_12729550 ]

Hadoop QA commented on PIG-812:
---

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12413078/PIG-812.patch against trunk revision 792663.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/121/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/121/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/121/console

This message is automatically generated.

COUNT(*) does not work
---
Key: PIG-812
URL: https://issues.apache.org/jira/browse/PIG-812
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Benjamin Reed
Fix For: 0.2.0
Attachments: PIG-812.patch, PIG-812.pdf, studenttab10k

Pig script to count the number of rows in a studenttab10k file which contains 10k records.
{code}
studenttab = LOAD 'studenttab10k' AS (name:chararray, age:int, gpa:float);
X2 = GROUP studenttab ALL;
describe X2;
Y2 = FOREACH X2 GENERATE COUNT(*);
explain Y2;
DUMP Y2;
{code}

returns the following error

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias Y2
Details at logfile: /homes/viraj/pig-svn/trunk/pig_1242783700970.log

If you look at the log file:

Caused by: java.lang.ClassCastException
  at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:76)
  at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:68)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:223)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:245)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:236)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:88)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729700#action_12729700 ]

Alan Gates commented on PIG-794:

I agree with Doug's comments that it's better to use an API to build the schema, since that will give us compile-time checking. I think it will also (hopefully) be easier to figure out the schema when reading the code, as it will avoid the need to read JSON directly.

I have a general question on the approach. This is a direct port of Pig's BinStorage to use Avro, including the writing of indicator bytes for types. I do not have a deep knowledge of Avro, but I had assumed that since it is a de/serialization framework with types, part of what it would provide is type recognition. That is, can't this code rely on Avro to set the type for it? Do we need to be writing those indicator bytes ourselves? Perhaps this is the same comment that Doug is making about using GenericDatumReader and addField.

In response to Hong's comment, the sync marks are vulnerable, as you point out. But the loader needs some way to find a proper starting place when it's handed any block but the initial block of a file. I wonder if we could create a new sync type. It would always consist of a 100 byte marker (say the first 25 prime numbers, or the first 25 digits of pi or something). We could then write a tuple with that sync type every 1000 records in the data. A loader that doesn't start at position 0 could then seek to the first sync type it finds before it begins reading. All loaders would read past the end of their assigned range until they see a sync type.

As for this being compatible with non-Pig apps, that isn't the purpose of this AvroStorage function. This is for Pig to pass data between MR jobs for itself. Having a tool-independent storage format is a bigger project, as it requires agreeing on things like sync marks, how to represent different Avro objects, etc.
Use Avro serialization in Pig
-
Key: PIG-794
URL: https://issues.apache.org/jira/browse/PIG-794
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
Fix For: 0.2.0
Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, jackson-asl-0.9.4.jar, PIG-794.patch

We would like to use Avro serialization in Pig to pass data between MR jobs instead of the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly better compared to BinStorage on our benchmarks.
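The sync-type idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions, not code from the attached patch: the class name, the method name, and the choice of encoding the first 25 primes as 4-byte integers (which happens to give exactly 100 bytes) are all hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the proposed sync type; none of these names come from Pig or Avro.
class SyncMarkerSketch {
    // Assumed marker layout: the first 25 primes as 4-byte big-endian ints (25 * 4 = 100 bytes).
    static final byte[] MARKER = buildMarker();

    static byte[] buildMarker() {
        int[] primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97};
        ByteBuffer buf = ByteBuffer.allocate(primes.length * 4);
        for (int p : primes) {
            buf.putInt(p);
        }
        return buf.array();
    }

    // Returns the offset of the first byte after the first marker found at or
    // after 'start', or -1 if no complete marker occurs in the remaining data.
    static int firstRecordAfterSync(byte[] data, int start) {
        for (int i = Math.max(start, 0); i + MARKER.length <= data.length; i++) {
            int j = 0;
            while (j < MARKER.length && data[i + j] == MARKER[j]) {
                j++;
            }
            if (j == MARKER.length) {
                return i + MARKER.length;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[250];
        // Place one marker at offset 120 to stand in for a sync tuple inside a block.
        System.arraycopy(MARKER, 0, data, 120, MARKER.length);
        // A loader handed a split starting at offset 50 skips ahead to the first sync.
        System.out.println(firstRecordAfterSync(data, 50));
    }
}
```

A loader whose split does not start at position 0 would call something like firstRecordAfterSync before reading its first record, and every loader would keep reading past the end of its split until it hits the next marker.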
[jira] Created: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
Pig should provide a way for input location string in load statement to be passed as-is to the Loader
-
Key: PIG-879
URL: https://issues.apache.org/jira/browse/PIG-879
Project: Pig
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Pradeep Kamath

Due to multiquery optimization, Pig always converts the filenames to absolute URIs (see http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section about Incompatible Changes - Path Names and Schemes). This is necessary since the script may have cd .. statements between load or store statements, and if the load statements have relative paths, we would need to convert them to absolute paths to know where to load/store from. To do this, QueryParser.massageFilename() has the code below[1], which basically gives the fully qualified hdfs path.

However, the issue with this approach is that if the filename string is something like hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2, the code below[1] actually translates this to hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2 and throws an exception that it is an incorrect path.

Some loaders may want to interpret the filenames (the input location string in the load statement) in any way they wish and may want Pig to not make absolute paths out of them. There are a few options to address this:

1) A command line switch to indicate to Pig that pathnames in the script are all absolute, and hence Pig should not alter them and should pass them as-is to Loaders and Storers.
2) A keyword in the load and store statements to indicate the same intent to Pig.
3) A property which users can supply on the command line or in pig.properties to indicate the same intent.
4) A method in LoadFunc - relativeToAbsolutePath(String filename, String curDir) - which does the conversion to absolute. This way a Loader can choose to implement it as a no-op.

Thoughts?
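Option 4 could look roughly like the sketch below. It assumes a stand-alone interface rather than the real LoadFunc, and the interface and class names are hypothetical; only the method name and its two String parameters are taken from the proposal. The default conversion shown is a deliberately simplified stand-in, not the logic in QueryParser.massageFilename().

```java
// Hypothetical stand-in for the proposed LoadFunc method; not Pig's actual API.
interface LocationResolver {
    String relativeToAbsolutePath(String filename, String curDir);
}

// Simplified default behavior: leave absolute paths and URIs alone,
// and resolve anything else against the current directory.
class DefaultResolver implements LocationResolver {
    @Override
    public String relativeToAbsolutePath(String filename, String curDir) {
        if (filename.startsWith("/") || filename.contains("://")) {
            return filename;
        }
        String base = curDir.endsWith("/") ? curDir : curDir + "/";
        return base + filename;
    }
}

// A loader that wants the location string untouched implements the method as a no-op.
class AsIsResolver implements LocationResolver {
    @Override
    public String relativeToAbsolutePath(String filename, String curDir) {
        return filename;
    }
}

class LocationResolverDemo {
    public static void main(String[] args) {
        String multi = "hdfs://localhost.localdomain:39125/user/bla/1,"
                     + "hdfs://localhost.localdomain:39125/user/bla/2";
        System.out.println(new DefaultResolver().relativeToAbsolutePath("data/in", "/user/bla"));
        System.out.println(new AsIsResolver().relativeToAbsolutePath(multi, "/user/bla"));
    }
}
```

The point of this design is that the decision lives with the loader: a loader with its own location syntax (for example a comma-separated list) overrides one method, and scripts need no new keyword or global switch.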
[jira] Created: (PIG-880) Order by is borken with complex fields
Order by is borken with complex fields
--
Key: PIG-880
URL: https://issues.apache.org/jira/browse/PIG-880
Project: Pig
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Fix For: 0.4.0

Pig script:

a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
f = foreach a generate smap#'name', smap#'age', smap#'gpa';
s = order f by $0;
store s into 'sc.out'

Stack:

Caused by: java.lang.ArrayStoreException
  at java.lang.System.arraycopy(Native Method)
  at java.util.Arrays.copyOf(Arrays.java:2763)
  at java.util.ArrayList.toArray(ArrayList.java:305)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
  ... 5 more
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
  at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
  at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
  at org.apache.pig.PigServer.execute(PigServer.java:762)
  at org.apache.pig.PigServer.access$100(PigServer.java:91)
  at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
  at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
  at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
  at org.apache.pig.Main.main(Main.java:389)
[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729744#action_12729744 ]

Hong Tang commented on PIG-879:
---

Options 1) and 3) are more or less equivalent from the user's point of view, and are preferred for customized loaders that do not want Pig to do the escaping at all.
[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729751#action_12729751 ]

Dmitriy V. Ryaboy commented on PIG-879:
---

Having this be a global flag through properties wouldn't work for scripts that require both behaviors in different load statements. Maybe a boolean performPathConversion flag which is true by default, and can be overridden via the load statement? Custom Loaders could change what their default is. I think a boolean flag is more straightforward than a method you have to override with a no-op.
[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729755#action_12729755 ]

Thejas M Nair commented on PIG-879:
---

The problem with 1) and 3) is that the setting is universal to the grunt shell or script. In cases where the user wants to read from multiple sources with different loaders, it will be inconvenient to be forced to use absolute URIs for all of them.
[jira] Created: (PIG-881) Pig should ship load udfs to the backend
Pig should ship load udfs to the backend
-
Key: PIG-881
URL: https://issues.apache.org/jira/browse/PIG-881
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Fix For: 0.4.0

Currently, when we use load UDFs, we have to use a register statement. Ideally, if the user puts the UDF jars in the classpath, we could omit the register statement and Pig would pick up the UDF from the classpath automatically. However, Pig does not currently ship load UDFs to the backend, so the classpath approach does not work. register works because Pig ships the entire jar. Pig does ship eval UDFs and storage UDFs; we should ship load UDFs as well.
[jira] Assigned: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-881:
--
Assignee: Daniel Dai
[jira] Commented: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729770#action_12729770 ]

Daniel Dai commented on PIG-881:

I found a problem with the patch; I will deliver it again shortly.
[jira] Updated: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-881:
---
Attachment: (was: PIG-881-1.patch)
[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729771#action_12729771 ]

Hong Tang commented on PIG-879:
---

Both are valid arguments. The problem with 2) and 4) is that they require changes to the load statement syntax or the load-func API, and would take longer to get there. I guess we could structure the fix in two phases:

Phase One: support 1) and 3), so that we have the minimum needed to move along without having to disable multi-query optimization completely. Users should be able to modify their scripts to change all relative paths to absolute ones (such usage should be rare enough that most people will not be impacted).

Phase Two: support either 2) or 4) (I do not think we need both). Personally, I think 4) would be better, because the loader should be the one that interprets the location string syntax.
[jira] Updated: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-881:
---
Attachment: PIG-881-1.patch
[jira] Commented: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729785#action_12729785 ]

Olga Natkovich commented on PIG-881:

+1; the patch looks good!
[jira] Created: (PIG-882) log level not propogated to loggers
log level not propogated to loggers
-
Key: PIG-882
URL: https://issues.apache.org/jira/browse/PIG-882
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Thejas M Nair

Pig accepts the log level as a parameter. However, the log level it captures is not applied appropriately, so loggers in different classes do not log at the specified level.
[jira] Updated: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-881:
---
Status: Patch Available (was: Open)
Attachments: PIG-881-1.patch, PIG-881-2.patch
[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729859#action_12729859 ]

Milind Bhandarkar commented on PIG-879:
---

I see some long-term issues with all the approaches/options.

First, not all loaders require a path (e.g. DBLoader). Some paths (e.g. hftp:// or hsftp://) do not have a notion of relative or absolute. Indeed, the right way to fix this is to change the syntax of the load and store statements, so that the loader itself deals with the path handling, and not Pig.

Second, take copyToLocal, cp, mv, and all the dfs shell functionality out of Pig. These are side effects and impose a barrier to optimization. In their current form, they do not belong in a dataflow language. Grunt could still support them.
[jira] Commented: (PIG-880) Order by is borken with complex fields
[ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729882#action_12729882 ]

Pradeep Kamath commented on PIG-880:

The root cause of this issue is that, when interpreting map data, PigStorage returns each value in the map with the type it deduces from the data. So string values are returned as String and integer values are returned as Integer. However, the logical layer in Pig assumes that the values in the map are of type ByteArray, since it cannot assume any other type.

If one of the sampled values forming the quantile list is a null, it is assumed to have the type of the reduce key of the final order by job. In this case, since the order by key is smap#'name', it is thought to be of type ByteArray. However, the values resulting from the map lookup are actually of type String. This mismatch results in the exception shown in the issue description. If nulls are filtered out, map.collect() fails instead, because Hadoop thinks the map key type is bytearray but it gets a Text (string).

A proposal to fix this is to change TextDataParser, which PigStorage uses for reading map data, to return ByteArray type for the values in the map. Thoughts?
Order by is borken with complex fields

Key: PIG-880
URL: https://issues.apache.org/jira/browse/PIG-880
Project: Pig
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Fix For: 0.4.0

Pig script:

a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
f = foreach a generate smap#'name', smap#'age', smap#'gpa';
s = order f by $0;
store s into 'sc.out';

Stack:

Caused by: java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at java.util.Arrays.copyOf(Arrays.java:2763)
at java.util.ArrayList.toArray(ArrayList.java:305)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
... 5 more
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
at org.apache.pig.PigServer.execute(PigServer.java:762)
at org.apache.pig.PigServer.access$100(PigServer.java:91)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
at org.apache.pig.Main.main(Main.java:389)
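The top frames of the stack (ArrayList.toArray, Arrays.copyOf, System.arraycopy) match what the JDK does when a list holding one element type is copied into an array declared with another. The sketch below reproduces only that JDK behaviour. ByteArrayKey is a hypothetical stand-in for Pig's bytearray key type, and convertToArray here is an assumption about the shape of the failing call, not the code in WeightedRangePartitioner.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for Pig's bytearray key type; hypothetical, for illustration only. */
final class ByteArrayKey {
    final byte[] bytes;
    ByteArrayKey(byte[] bytes) { this.bytes = bytes; }
}

final class QuantileArrayDemo {
    private QuantileArrayDemo() {}

    /** Copies the sampled keys into an array of the declared key type. */
    static ByteArrayKey[] convertToArray(List<Object> samples) {
        // toArray copies with System.arraycopy, which checks every element
        // against the component type of the target array at run time.
        return samples.toArray(new ByteArrayKey[0]);
    }

    /** Returns true when the copy is rejected with an ArrayStoreException. */
    static boolean failsWithArrayStoreException(List<Object> samples) {
        try {
            convertToArray(samples);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> samples = new ArrayList<>();
        samples.add("alice"); // a String where a bytearray key was declared
        System.out.println(failsWithArrayStoreException(samples));
    }
}
```

The copy succeeds as soon as every sampled value really has the declared type, which is what returning bytearray values from the map parser would guarantee.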
[jira] Updated: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-881:

Attachment: PIG-881-3.patch

All unit tests now pass.

Pig should ship load udfs to the backend

Key: PIG-881
URL: https://issues.apache.org/jira/browse/PIG-881
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.4.0
Attachments: PIG-881-1.patch, PIG-881-2.patch, PIG-881-3.patch

Currently, a load UDF can only be used after a register statement. Ideally, if the user puts the UDF jar on the classpath, the register statement could be omitted and Pig would pick the UDF up from the classpath automatically. However, Pig does not currently ship load UDFs to the backend, so the classpath approach does not work. register works because Pig ships the entire registered jar. Pig already ships eval UDFs and storage UDFs, and it should ship load UDFs as well.
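To ship a UDF that was found on the classpath, the client first has to know which jar or directory the class came from. The sketch below shows one way to ask the JVM for that using only standard JDK calls. JarLocator is a hypothetical helper written for this illustration and is not the approach taken in the attached patches.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

final class JarLocator {
    private JarLocator() {}

    /**
     * Returns the jar or directory a class was loaded from, if the JVM
     * records one. Classes from the bootstrap loader have no code source.
     */
    static Optional<URL> containerOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(source.getLocation());
    }
}
```

A client could call containerOf on the loader class named in the load statement and add the returned location to the set of jars sent to the backend, skipping classes for which the result is empty.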
[jira] Updated: (PIG-724) Treating map values in PigStorage
[ https://issues.apache.org/jira/browse/PIG-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-724:

Summary: Treating map values in PigStorage (was: Treating integers and strings in PigStorage)

Treating map values in PigStorage

Key: PIG-724
URL: https://issues.apache.org/jira/browse/PIG-724
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.1
Reporter: Santhosh Srinivasan
Fix For: 0.2.1

Currently, PigStorage treats the materialized string 123 as an integer with the value 123. If the user intended this to be the string 123, PigStorage cannot deal with it. The same reasoning applies to doubles. As a result, when a map contains values that are meant to be of the same type but trigger the behaviour just described, Pig fails at run time.

An example helps to illustrate the problem. In the example below, a sample row in the data (map.txt) contains the following:

[key01#35,key02#value01]

When Pig tries to convert the stream to a map, it creates a Map<Object, Object> where the key is a string and the value is an integer. Running the script shown below results in a run-time error.

{code}
grunt> a = load 'map.txt' as (themap: map[]);
grunt> b = filter a by (chararray)(themap#'key01') == 'hello';
grunt> dump b;
2009-03-18 15:19:03,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-03-18 15:19:28,797 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
2009-03-18 15:19:28,817 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1081: Cannot cast to chararray. Expected bytearray but received: int
{code}

There are two ways to resolve this issue:

1. Change the conversion routine for bytesToMap to return a map where the value is a bytearray and not the actual type. This change breaks backward compatibility.
2. Introduce checks in POCast so that conversions that are legal in the type checking world are allowed, i.e., run-time checks will be made for compatible casts. In the above example, an int can be converted to a chararray, so the cast will be made. If, on the other hand, it were a chararray to int conversion, an exception would be thrown.
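The run-time check described in option 2 could look roughly like the sketch below. It is an assumption-laden illustration: LenientCast is a hypothetical class, plain Java types stand in for Pig's data types (byte[] for bytearray, String for chararray), and none of this is the actual POCast code.

```java
import java.nio.charset.StandardCharsets;

final class LenientCast {
    private LenientCast() {}

    /** Casts to chararray, accepting any value that converts without loss. */
    static String toChararray(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            return new String((byte[]) value, StandardCharsets.UTF_8);
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double) {
            return value.toString(); // numeric to chararray is a compatible cast
        }
        throw new IllegalArgumentException(
                "Cannot cast to chararray: " + value.getClass().getSimpleName());
    }

    /** Casts to int, rejecting chararray input as described in option 2. */
    static Integer toInt(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            String text = new String((byte[]) value, StandardCharsets.UTF_8);
            return Integer.valueOf(text.trim());
        }
        if (value instanceof Integer) {
            return (Integer) value;
        }
        throw new IllegalArgumentException(
                "Cannot cast to int: " + value.getClass().getSimpleName());
    }
}
```

With a check of this kind, the filter in the example would compare the string 35 with 'hello' instead of failing with ERROR 1081, while a chararray to int cast would still be reported as an error.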
[jira] Updated: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-881:

Status: Patch Available (was: In Progress)
[jira] Updated: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-881:

Status: In Progress (was: Patch Available)
[jira] Commented: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729887#action_12729887 ]

Olga Natkovich commented on PIG-881:

+1 on the patch. The patch process seems to be stuck again. We ran the tests manually and they passed, so please commit the patch.
[jira] Commented: (PIG-881) Pig should ship load udfs to the backend
[ https://issues.apache.org/jira/browse/PIG-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729892#action_12729892 ]

Hadoop QA commented on PIG-881:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12413156/PIG-881-2.patch
against trunk revision 792663.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/122/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/122/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/122/console

This message is automatically generated.