[jira] Created: (PIG-1371) Pig should handle deep casting of complex types
Pig should handle deep casting of complex types
-----------------------------------------------
                 Key: PIG-1371
                 URL: https://issues.apache.org/jira/browse/PIG-1371
             Project: Pig
          Issue Type: Bug
            Reporter: Pradeep Kamath
             Fix For: 0.8.0

Consider input data in BinStorage format which has a field of bag type - bg:{t:(i:int)}. If the schema specified in the load statement gives the type for this field as bg:{t:(c:chararray)}, the current behavior is that Pig treats the field as being of the type specified in the load statement (bg:{t:(c:chararray)}), but no deep cast from bag of int (the real data) to bag of chararray (the user-specified schema) is made. There are two issues currently:

1) The TypeCastInserter only considers the byte 'type' of the loader-presented schema and the user-specified schema to decide whether to introduce a cast. In the above case, since both schemas have the type bag, no cast is inserted. This check has to be extended to consider the full FieldSchema (with inner subschema) in order to decide whether a cast is needed.

2) POCast should be changed to handle casting a complex type to the type specified in the user-supplied FieldSchema. There is one issue to be considered here - if the user specified the cast type to be bg:{t:(i:int, j:int)} and the real data had only one field, what should the result of the cast be?
* A bag with two fields - the int field and a null? In this approach Pig assumes the lone field in the data is the first field, which might be incorrect if it is in fact the second field.
* A null bag, to indicate that the bag is of unknown value - this is the one I personally prefer.
* The cast throws an IncompatibleCastException.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
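The check described in (1) can be sketched as a recursive comparison over field schemas. The classes below are illustrative stand-ins, not Pig's actual FieldSchema API:

```java
import java.util.Arrays;
import java.util.List;

public class DeepSchemaCheck {
    // Hypothetical stand-in for a schema node: a type name plus inner subschema.
    static class FieldSchema {
        final String type;              // e.g. "bag", "tuple", "int", "chararray"
        final List<FieldSchema> inner;  // empty for simple types
        FieldSchema(String type, FieldSchema... inner) {
            this.type = type;
            this.inner = Arrays.asList(inner);
        }
    }

    // A cast is needed if the types differ at any nesting level.
    static boolean castNeeded(FieldSchema data, FieldSchema user) {
        if (!data.type.equals(user.type)) return true;
        if (data.inner.size() != user.inner.size()) return true;
        for (int i = 0; i < data.inner.size(); i++)
            if (castNeeded(data.inner.get(i), user.inner.get(i))) return true;
        return false;
    }

    public static void main(String[] args) {
        // bg:{t:(i:int)} in the data vs. bg:{t:(c:chararray)} in the load statement
        FieldSchema data = new FieldSchema("bag", new FieldSchema("tuple", new FieldSchema("int")));
        FieldSchema user = new FieldSchema("bag", new FieldSchema("tuple", new FieldSchema("chararray")));
        // Comparing only the top-level type says the schemas match (no cast inserted):
        System.out.println(data.type.equals(user.type));  // true
        // The recursive walk detects the int/chararray mismatch inside the tuple:
        System.out.println(castNeeded(data, user));       // true
    }
}
```

With only the top-level type compared, both sides look like a bag; the deep comparison is what catches the mismatch the issue describes.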
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1369:
--------------------------------
    Status: Patch Available  (was: Open)

POProject does not handle null tuples and non existent fields in some cases
---------------------------------------------------------------------------
                 Key: PIG-1369
                 URL: https://issues.apache.org/jira/browse/PIG-1369
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
         Attachments: PIG-1369.patch

If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple) POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and in other cases where similar situations occur.
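The proposed behavior can be sketched as follows; safeGet is a hypothetical helper (not POProject's actual code), using a plain List to stand in for a Pig tuple:

```java
import java.util.Arrays;
import java.util.List;

public class ProjectSketch {
    // Returns the field at index, or null when the tuple is null or the
    // field does not exist - mirroring how POProject already handles a
    // non-existent field in the input tuple.
    static Object safeGet(List<?> tuple, int index) {
        if (tuple == null) return null;          // null tuple: return null, not NPE
        try {
            return tuple.get(index);
        } catch (IndexOutOfBoundsException e) {  // missing field: return null
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeGet(null, 0));                     // null
        System.out.println(safeGet(Arrays.asList("a", "b"), 5));  // null
        System.out.println(safeGet(Arrays.asList("a", "b"), 1));  // b
    }
}
```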
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1369:
--------------------------------
    Status: Open  (was: Patch Available)

The unit tests all run successfully on my local machine - the Hudson QA failure was due to a transient port conflict. I will resubmit; in the meantime the patch is ready for review.

POProject does not handle null tuples and non existent fields in some cases
---------------------------------------------------------------------------
                 Key: PIG-1369
                 URL: https://issues.apache.org/jira/browse/PIG-1369
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
         Attachments: PIG-1369.patch

If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple) POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and in other cases where similar situations occur.
[jira] Commented: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856088#action_12856088 ]

Olga Natkovich commented on PIG-1364:
-------------------------------------
+1 for all the branches

Public javadoc on apache site still on 0.2, needs to be updated for each version release
----------------------------------------------------------------------------------------
                 Key: PIG-1364
                 URL: https://issues.apache.org/jira/browse/PIG-1364
             Project: Pig
          Issue Type: Bug
          Components: documentation
    Affects Versions: 0.4.0, 0.5.0, 0.6.0
            Reporter: Alan Gates
            Assignee: Alan Gates
            Priority: Critical
             Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0
         Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch

See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains javadocs for 0.2. It is also versionless. It needs to be changed so that javadocs for recent versions are posted. It also needs to change so that the version is in the API path, so that multiple versions of the API can be posted. It's probably too late to do this for 0.6 and before, but it needs to happen for 0.7.
[jira] Updated: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajay Kidave updated PIG-1331:
-----------------------------
    Attachment: owl.contrib.4.tar.gz

The test issue was because the jettyrunner jar had not been copied into the contrib/owl/ci directory. The manual steps required are:

cp jetty-runner-7.0.0.pre5.jar contrib/owl/ci
cp log4j-1.2.15.jar contrib/owl/java/lib
chmod +x contrib/owl/bin/owl.sh contrib/owl/ci/derby_initialize_schema.sh contrib/owl/ci/derby_cleanup.sh contrib/owl/ci/jetty_check.sh contrib/owl/ci/jetty_start.sh contrib/owl/ci/jetty_stop.sh

Attached an updated patch with the javadoc and findbugs issues fixed.

Owl Hadoop Table Management Service
-----------------------------------
                 Key: PIG-1331
                 URL: https://issues.apache.org/jira/browse/PIG-1331
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.8.0
            Reporter: Jay Tang
         Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz

This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and abstracts away the complexity of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra via a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Commented: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856096#action_12856096 ]

Daniel Dai commented on PIG-1369:
---------------------------------
+1

POProject does not handle null tuples and non existent fields in some cases
---------------------------------------------------------------------------
                 Key: PIG-1369
                 URL: https://issues.apache.org/jira/browse/PIG-1369
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
         Attachments: PIG-1369.patch

If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple) POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and in other cases where similar situations occur.
[jira] Commented: (PIG-1361) [Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
[ https://issues.apache.org/jira/browse/PIG-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856122#action_12856122 ]

Daniel Dai commented on PIG-1361:
---------------------------------
+1. I tested from Pig; getSchema behaves as described. I didn't look at the code - someone who knows the code should also take a look.

[Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                 Key: PIG-1361
                 URL: https://issues.apache.org/jira/browse/PIG-1361
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Gaurav Jain
            Assignee: Gaurav Jain
            Priority: Minor
             Fix For: 0.8.0
         Attachments: PIG-1361.patch

Pig requests, for consistency among the different TableLoaders, that Zebra TableLoader.getSchema() return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig.
[jira] Updated: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1338:
----------------------------
    Fix Version/s: 0.8.0

Pig should exclude hadoop conf in local mode
--------------------------------------------
                 Key: PIG-1338
                 URL: https://issues.apache.org/jira/browse/PIG-1338
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.8.0
         Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch, PIG-1338-6.patch

Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will go wrong

We should bring this to a more intuitive behavior:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message
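The proposed lookup behavior can be sketched as a small decision function (names here are illustrative, not Pig's actual launcher code):

```java
public class ConfLookup {
    enum ExecMode { LOCAL, HADOOP }

    // Proposed behavior: local mode ignores any hadoop conf entirely;
    // hadoop mode requires a conf and bails out loudly when it is missing.
    static String launchDecision(ExecMode mode, boolean hadoopConfPresent) {
        if (mode == ExecMode.LOCAL)
            return "launch local";  // always, whether a hadoop conf exists or not
        return hadoopConfPresent
            ? "launch with hadoop conf"
            : "bail out with a meaningful message";
    }

    public static void main(String[] args) {
        System.out.println(launchDecision(ExecMode.LOCAL, true));
        System.out.println(launchDecision(ExecMode.HADOOP, false));
    }
}
```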
[jira] Updated: (PIG-1330) Move pruned schema tracking logic from LoadFunc to core code
[ https://issues.apache.org/jira/browse/PIG-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1330:
----------------------------
    Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

No unit test, since this is mostly a documentation change; no behavior changes for now, it is just to stay consistent as the code evolves. Committed to both the 0.7 branch and trunk.

Move pruned schema tracking logic from LoadFunc to core code
------------------------------------------------------------
                 Key: PIG-1330
                 URL: https://issues.apache.org/jira/browse/PIG-1330
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.7.0
         Attachments: PIG-1330-1.patch

Currently, LoadFunc.getSchema requires a schema after column pruning. The good side of this is that LoadFunc.getSchema matches the data it actually loads, which gives a sense of consistency. However, by doing this, every LoadFunc needs to keep track of the columns pruned. This is an unnecessary burden on the LoadFunc writer and is very error prone. This issue is to move this logic from LoadFunc to Pig core. LoadFunc.getSchema then only needs to return the original schema, even after pruning.
[jira] Updated: (PIG-1199) help includes obsolete options
[ https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1199:
--------------------------------
    Fix Version/s: 0.8.0
                       (was: 0.7.0)

help includes obsolete options
------------------------------
                 Key: PIG-1199
                 URL: https://issues.apache.org/jira/browse/PIG-1199
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.6.0
            Reporter: Olga Natkovich
            Assignee: Olga Natkovich
             Fix For: 0.8.0

This is confusing to users.
[jira] Assigned: (PIG-803) Pig Latin Reference Manual - discussion of Pig streaming is incomplete
[ https://issues.apache.org/jira/browse/PIG-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-803:
----------------------------------
    Assignee: Corinne Chandel  (was: Olga Natkovich)

Looks like the streaming discussion in http://hadoop.apache.org/pig/docs/r0.6.0/piglatin_ref2.html#STREAM is not very clear on how DEFINE is used with streaming. Corinne, could we make the connection clearer? Thanks.

Pig Latin Reference Manual - discussion of Pig streaming is incomplete
----------------------------------------------------------------------
                 Key: PIG-803
                 URL: https://issues.apache.org/jira/browse/PIG-803
             Project: Pig
          Issue Type: Bug
          Components: documentation
            Reporter: David Ciemiewicz
            Assignee: Corinne Chandel
             Fix For: 0.7.0

The Pig Latin Reference Manual section on STREAM (http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_STREAM_) is missing broad swaths of information, such as a discussion of the ship() clause. A more complete definition seems to be here: http://wiki.apache.org/pig/PigStreamingFunctionalSpec However, it discusses auto-shipping of scripts, which doesn't seem to be working.
[jira] Commented: (PIG-1359) bin/pig script does not pick up correct jar libraries
[ https://issues.apache.org/jira/browse/PIG-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856165#action_12856165 ]

Daniel Dai commented on PIG-1359:
---------------------------------
Hi Gianmarco, thanks for your concern. Actually, we need one additional step to make bin/pig work: copy $PIG_HOME/build/pig-0.8.0-dev.jar to $PIG_HOME/pig-0.8.0-core.jar. This is handled in ant's package target when releasing, but if you check out from svn, you need this additional step for bin/pig to work.

bin/pig script does not pick up correct jar libraries
-----------------------------------------------------
                 Key: PIG-1359
                 URL: https://issues.apache.org/jira/browse/PIG-1359
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
         Environment: Linux Ubuntu 8.10, java-6-sun
            Reporter: Gianmarco De Francisci Morales
            Priority: Trivial
             Fix For: 0.8.0
         Attachments: pig-1359.patch

The bin/pig script tries to load the pig jar libraries from the pig-*-core.jar using this bash fragment:

{code}
# for releases, add core pig to CLASSPATH
for f in $PIG_HOME/pig-*core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done

# during development pig jar might be in build
for f in $PIG_HOME/build/pig-*-core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done
{code}

The pig-\*-core.jar does not contain the dependencies for pig that are found in build/ivy/lib/Pig/\*.jar (jline). The script does not even pick up the pig.jar in PIG_HOME that is produced as a result of the ant build process. This results in the following error after successfully building pig:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: jline/ConsoleReaderInputStream
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
{code}
[jira] Updated: (PIG-1361) [Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
[ https://issues.apache.org/jira/browse/PIG-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1361:
----------------------------
    Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

Yan has already reviewed it. Committed the patch to both trunk and the 0.7 branch.

[Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                 Key: PIG-1361
                 URL: https://issues.apache.org/jira/browse/PIG-1361
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Gaurav Jain
            Assignee: Gaurav Jain
            Priority: Minor
             Fix For: 0.8.0
         Attachments: PIG-1361.patch

Pig requests, for consistency among the different TableLoaders, that Zebra TableLoader.getSchema() return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig.
[jira] Updated: (PIG-1361) [Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
[ https://issues.apache.org/jira/browse/PIG-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1361:
----------------------------
    Fix Version/s: 0.7.0
                       (was: 0.8.0)
    Affects Version/s: 0.7.0
                       (was: 0.8.0)
    Description: (whitespace-only change to the existing description)

[Zebra] Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                 Key: PIG-1361
                 URL: https://issues.apache.org/jira/browse/PIG-1361
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
            Reporter: Gaurav Jain
            Assignee: Gaurav Jain
            Priority: Minor
             Fix For: 0.7.0
         Attachments: PIG-1361.patch

Pig requests, for consistency among the different TableLoaders, that Zebra TableLoader.getSchema() return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig.
[jira] Updated: (PIG-1359) bin/pig script does not pick up correct jar libraries
[ https://issues.apache.org/jira/browse/PIG-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1359:
----------------------------
    Status: Resolved  (was: Patch Available)
    Resolution: Won't Fix

bin/pig script does not pick up correct jar libraries
-----------------------------------------------------
                 Key: PIG-1359
                 URL: https://issues.apache.org/jira/browse/PIG-1359
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
         Environment: Linux Ubuntu 8.10, java-6-sun
            Reporter: Gianmarco De Francisci Morales
            Priority: Trivial
             Fix For: 0.8.0
         Attachments: pig-1359.patch

The bin/pig script tries to load the pig jar libraries from the pig-*-core.jar using this bash fragment:

{code}
# for releases, add core pig to CLASSPATH
for f in $PIG_HOME/pig-*core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done

# during development pig jar might be in build
for f in $PIG_HOME/build/pig-*-core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done
{code}

The pig-\*-core.jar does not contain the dependencies for pig that are found in build/ivy/lib/Pig/\*.jar (jline). The script does not even pick up the pig.jar in PIG_HOME that is produced as a result of the ant build process. This results in the following error after successfully building pig:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: jline/ConsoleReaderInputStream
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
{code}
[jira] Commented: (PIG-1359) bin/pig script does not pick up correct jar libraries
[ https://issues.apache.org/jira/browse/PIG-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856179#action_12856179 ]

Daniel Dai commented on PIG-1359:
---------------------------------
The comment change from "# Set the version for Hadoop, default to 17" to "# Set the version for Hadoop, default to 20" is totally valid; we will change it.

bin/pig script does not pick up correct jar libraries
-----------------------------------------------------
                 Key: PIG-1359
                 URL: https://issues.apache.org/jira/browse/PIG-1359
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
         Environment: Linux Ubuntu 8.10, java-6-sun
            Reporter: Gianmarco De Francisci Morales
            Priority: Trivial
             Fix For: 0.8.0
         Attachments: pig-1359.patch

The bin/pig script tries to load the pig jar libraries from the pig-*-core.jar using this bash fragment:

{code}
# for releases, add core pig to CLASSPATH
for f in $PIG_HOME/pig-*core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done

# during development pig jar might be in build
for f in $PIG_HOME/build/pig-*-core.jar; do
    CLASSPATH=${CLASSPATH}:$f;
done
{code}

The pig-\*-core.jar does not contain the dependencies for pig that are found in build/ivy/lib/Pig/\*.jar (jline). The script does not even pick up the pig.jar in PIG_HOME that is produced as a result of the ant build process. This results in the following error after successfully building pig:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: jline/ConsoleReaderInputStream
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
{code}
[jira] Created: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
Restore PigInputFormat.sJob for backward compatibility
------------------------------------------------------
                 Key: PIG-1372
                 URL: https://issues.apache.org/jira/browse/PIG-1372
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: 0.7.0

The preferred method to get the job's Configuration object is to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will mark it deprecated and indicate that UDFContext.getJobConf() should be used instead) to be backward compatible - we can remove it from pig in a future release.
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1323:
--------------------------------
    Status: Resolved  (was: Patch Available)
    Resolution: Invalid

There is already a hadoop property, mapred.task.id, which is set to the map/reduce task id in the backend and is not set in the front end; it can be used to figure this out. Hence it is best not to introduce new properties in the configuration for this purpose.

Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
---------------------------------------------------------------------------------------------------
                 Key: PIG-1323
                 URL: https://issues.apache.org/jira/browse/PIG-1323
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: 0.7.0
         Attachments: PIG-1323.patch

Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information.
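The check Pradeep describes can be sketched as follows, using a plain Properties object as a stand-in for Hadoop's Configuration (which has an equivalent get method); the task id value shown is illustrative:

```java
import java.util.Properties;

public class FrontendBackendCheck {
    // mapred.task.id is set by Hadoop on each map/reduce task (back end)
    // and is absent on the client machine (front end).
    static boolean isBackend(Properties conf) {
        return conf.getProperty("mapred.task.id") != null;
    }

    public static void main(String[] args) {
        Properties frontend = new Properties();  // no task id on the client
        Properties backend = new Properties();
        backend.setProperty("mapred.task.id", "attempt_201004140000_0001_m_000000_0");
        System.out.println(isBackend(frontend)); // false
        System.out.println(isBackend(backend));  // true
    }
}
```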
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856208#action_12856208 ]

Alan Gates commented on PIG-1331:
---------------------------------
Now that I've figured out how to read directions, I've run the tests on the new patch and they pass. I've also run javadocs and findbugs and all looks good. I'll start a vote on whether to accept this as a contrib.

Owl Hadoop Table Management Service
-----------------------------------
                 Key: PIG-1331
                 URL: https://issues.apache.org/jira/browse/PIG-1331
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.8.0
            Reporter: Jay Tang
         Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz

This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and abstracts away the complexity of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra via a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Updated: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1363:
----------------------------------
    Attachment: pig-1363.patch

The ideal solution to this problem is to have {{LoadFunc}} implement {{Serializable}}. Then the LoadFunc would be instantiated once, the first time it is needed (in LOLoad), and this one object would be used everywhere. But this would be backward incompatible, as all LoadFunc implementations would then necessarily have to implement Serializable. So, for now, we will live with this. This patch gets rid of the multiple load func instantiations in the front end where they could be avoided without making LoadFunc Serializable. No test cases are needed since this is purely code cleanup and doesn't add/delete/modify any existing functionality, so the current regression tests suffice.

Unnecessary loadFunc instantiations
-----------------------------------
                 Key: PIG-1363
                 URL: https://issues.apache.org/jira/browse/PIG-1363
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Ashutosh Chauhan
             Fix For: 0.8.0
         Attachments: pig-1363.patch

In MRCompiler, loadfuncs are instantiated at multiple locations in different visit methods. This is inconsistent and confusing. LoadFunc should be instantiated in only one place, ideally in LogToPhyTranslation#visit(LOLoad). A getter should be added to POLoad to retrieve this instantiated loadFunc wherever it is needed in later stages of compilation.
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856239#action_12856239 ]

Alan Gates commented on PIG-1331:
---------------------------------
After looking through the rules, it looks like we don't need a vote on contrib projects. This JIRA serves as the place for people to voice their concerns and vote for or against. I'd like to commit this within a few days unless I hear otherwise.

Owl Hadoop Table Management Service
-----------------------------------
                 Key: PIG-1331
                 URL: https://issues.apache.org/jira/browse/PIG-1331
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.8.0
            Reporter: Jay Tang
         Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz

This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and abstracts away the complexity of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra via a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Attachment: PIG-1372.patch The attached patch restores PigInputFormat.sJob - however, it is marked deprecated (as is PigMapReduce.sJobConf for user code) and the javadoc comment directs users to UDFContext.getUDFContext().getJobConf() instead. No tests are included since this simply restores a static variable for backward compatibility and it is not used in Pig code. Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372.patch The preferred method to get the job's Configuration object is UDFContext.getJobConf(). This JIRA restores PigInputFormat.sJob (marked deprecated and pointing users to UDFContext.getJobConf() instead) for backward compatibility - we can remove it from Pig in a future release.
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Patch Available (was: Open)
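The backward-compatibility pattern described in PIG-1372 can be sketched as follows. This is a hedged illustration, not Pig's actual source: the old static field is kept but marked @Deprecated, with javadoc pointing at the replacement accessor. JobConf is represented by a plain Object stand-in so the example is self-contained.

```java
// Stand-in for UDFContext: the preferred path to the job's configuration.
class UDFContextSketch {
    private static final UDFContextSketch INSTANCE = new UDFContextSketch();
    private final Object jobConf = new Object(); // stand-in for org.apache.hadoop.mapred.JobConf

    static UDFContextSketch getUDFContext() { return INSTANCE; }

    /** The preferred way to get the job's configuration. */
    Object getJobConf() { return jobConf; }
}

// Stand-in for PigInputFormat: the old static field survives, deprecated.
class PigInputFormatSketch {
    /**
     * @deprecated Use {@code UDFContext.getUDFContext().getJobConf()} instead.
     *             Kept only for backward compatibility; may be removed in a
     *             future release.
     */
    @Deprecated
    static Object sJob = UDFContextSketch.getUDFContext().getJobConf();
}
```

Existing user code compiled against the old field keeps working (with a deprecation warning at recompile time), while new code is steered toward the UDFContext accessor.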
[jira] Commented: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856246#action_12856246 ] Alan Gates commented on PIG-1370: - bq. In Expression.java the comment reads - // Because this isn't actually used yet. - Expression class is currently used in LoadMetadata.setPartitionFilter() - I agree though that we should mark this not stable - Evolving perhaps? Changed to Evolving, which matches LoadMetadata. bq. In SortColInfo.java, I think we should not mark this public since this is used entirely inside Pig code and any communication with external StoreFuncs is through ResourceSchema. The same comment holds for SortInfo.java But it is accepted as an arg to one of ResourceSchema's constructors. I think that makes it public, unless we want to say that constructor isn't intended for public use (in which case, why is it public?). bq. Is there a way to mark public methods in a class as internal - for example PigServer.getAliases() - currently this is used by unit tests - should we be exposing this method to the user? If not, can we mark it not public through an annotation? (Is there a different policy, e.g. if there are no javadoc comments for a public method, then it is not truly public?) I don't know how to do this with annotations. I've changed the javadocs to have an initial sentence of "Intended to be used by unit tests only." bq. I think CollectableLoadFunc should be evolving Motion carried, I've changed it to Evolving. bq. ComparisonFunc.java unfortunately already had ctrl-m chars - the new additions in the patch also do - if it isn't extensive, we could remove the ctrl-m chars. Another comment is that per http://wiki.apache.org/pig/Pig070IncompatibleChanges, custom comparators are no longer supported since 0.7 - if so, should this be @deprecated? - I think currently custom comparators don't work in local mode. I did mark ComparisonFunc as deprecated.
Are you saying we should just remove it instead of deprecating it? bq. I hope marking LoadFunc stable will not prevent additions to this abstract class (which should not break backward compatibility if default impls are provided) The definition of stable is that it will work across versions without a recompile of Java code. So this won't prevent growing existing abstract classes. Marking Pig interfaces for org.apache.pig package - Key: PIG-1370 URL: https://issues.apache.org/jira/browse/PIG-1370 Project: Pig Issue Type: Sub-task Components: documentation Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1370.patch Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of changes.
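The classification scheme being debated above can be sketched with Hadoop-style stability annotations. This is a hedged illustration only: the annotation types are declared locally so the example is self-contained, and the `*Sketch` names are stand-ins, not Pig's real classes.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class StabilitySketch {
    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Stable {}    // must keep working across versions without recompiling callers

    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}  // may change incompatibly between releases

    // Per the thread, Expression (and LoadMetadata) end up marked Evolving:
    @Evolving
    static class ExpressionSketch {}

    // LoadFunc stays Stable; per Alan's definition, new methods with default
    // implementations can still be added to a Stable abstract class without
    // breaking binary compatibility for existing subclasses.
    @Stable
    abstract static class LoadFuncSketch {}
}
```

With RUNTIME retention the markings are also inspectable programmatically, e.g. by a compatibility-checking tool.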
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Attachment: PIG-1370_2.patch New patch with changes based on Pradeep's feedback. Marking Pig interfaces for org.apache.pig package - Key: PIG-1370 URL: https://issues.apache.org/jira/browse/PIG-1370 Project: Pig Issue Type: Sub-task Components: documentation Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1370.patch, PIG-1370_2.patch Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of changes.
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Status: Open (was: Patch Available)
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Status: Patch Available (was: Open)
[jira] Assigned: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reassigned PIG-1363: - Assignee: Ashutosh Chauhan Unnecessary loadFunc instantiations --- Key: PIG-1363 URL: https://issues.apache.org/jira/browse/PIG-1363 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: pig-1363.patch In MRCompiler, loadFuncs are instantiated at multiple locations in different visit methods. This is inconsistent and confusing. LoadFunc should be instantiated in only one place, ideally in LogToPhyTranslation#visit(LOLoad). A getter should be added to POLoad to retrieve this instantiated loadFunc wherever it is needed in later stages of compilation.
[jira] Updated: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1363: -- Status: Patch Available (was: Open)
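The refactoring PIG-1363 proposes can be sketched as follows. This is a hedged illustration with minimal stand-in types (not Pig's real LoadFunc/POLoad): the loader is constructed exactly once during logical-to-physical translation, stored on the load operator, and later compilation stages fetch it through a getter instead of re-instantiating it.

```java
// Stand-in for Pig's LoadFunc; the single method is illustrative only.
interface LoaderSketch {
    String relativeToAbsolutePath(String location);
}

// Stand-in for POLoad: owns the one-and-only LoadFunc instance.
class POLoadSketch {
    private final LoaderSketch loadFunc; // created once, in LogToPhyTranslation#visit(LOLoad)

    POLoadSketch(LoaderSketch loadFunc) {
        this.loadFunc = loadFunc;
    }

    /** Later stages (e.g. MRCompiler visit methods) reuse this instance. */
    LoaderSketch getLoadFunc() {
        return loadFunc;
    }
}
```

The getter guarantees every stage of compilation sees the same object, so any state the loader accumulates (configuration, schema) is not silently lost between visit methods.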
[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1364: Status: Resolved (was: Patch Available) Resolution: Fixed All of the patches checked in. Confirmed that for 0.4, 0.5, and 0.6 on the website the link now points to the version specific docs. Public javadoc on apache site still on 0.2, needs to be updated for each version release Key: PIG-1364 URL: https://issues.apache.org/jira/browse/PIG-1364 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.4.0, 0.5.0, 0.6.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.7.0, 0.6.0, 0.5.0, 0.4.0 Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains javadocs for 0.2. It is also versionless. It needs to be changed so that javadocs for recent versions are posted. It also needs to change so that the version is in the api so that multiple versions of the API can be posted. It's probably too late to do this for 0.6 and before, but it needs to happen for 0.7.
[jira] Created: (PIG-1373) We need to add jdiff output to docs on the website
We need to add jdiff output to docs on the website -- Key: PIG-1373 URL: https://issues.apache.org/jira/browse/PIG-1373 Project: Pig Issue Type: Bug Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Fix For: 0.8.0 Our build process constructs a jdiff between APIs for different versions, but we don't post the results to the website when we deploy the docs. We should, in order to help users understand changes across versions of Pig.
[jira] Created: (PIG-1374) Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag -- Key: PIG-1374 URL: https://issues.apache.org/jira/browse/PIG-1374 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0, 0.7.0 Reporter: Viraj Bhat The script loads data via BinStorage(), flattens columns, and then sorts on the second column in descending order. The order by fails with a ClassCastException:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $1 desc;
dump d;
{code}
The sampling job fails with the following error:
===
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:407)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:188)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:329)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
===
The schemas for b, c, and d are as follows:
b: {bag_of_tuples: {tuple: (uuid: chararray,velocity: double)}}
c: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}
d: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}
If we modify this script to order on the first column, it seems to work:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $0 desc;
dump d;
{code}
(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493) (ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)
There is a workaround: do a projection before the ORDER:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
newc = foreach c generate $0 as uuid, $1 as velocity;
newd = order newc by velocity desc;
dump newd;
{code}
(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493) (ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)
The outputSchema for the loader is as follows:
{code}
public Schema outputSchema(Schema input) {
    try {
        List<Schema.FieldSchema> list = new ArrayList<Schema.FieldSchema>();
        list.add(new Schema.FieldSchema("uuid", DataType.CHARARRAY));
        list.add(new Schema.FieldSchema("velocity", DataType.DOUBLE));
        Schema tupleSchema = new Schema(list);
        Schema.FieldSchema tupleFs = new Schema.FieldSchema("tuple", tupleSchema, DataType.TUPLE);
        Schema bagSchema = new Schema(tupleFs);
        bagSchema.setTwoLevelAccessRequired(true);
        Schema.FieldSchema bagFs = new Schema.FieldSchema("bag_of_tuples", bagSchema, DataType.BAG);
        return new Schema(bagFs);
    } catch (Exception e) {
        return null;
    }
}
{code}
[jira] Assigned: (PIG-1374) Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
[ https://issues.apache.org/jira/browse/PIG-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-1374: --- Assignee: Daniel Dai
[jira] Updated: (PIG-1374) Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
[ https://issues.apache.org/jira/browse/PIG-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1374: Fix Version/s: 0.7.0