[jira] [Updated] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2181:
    Resolution: Fixed
    Fix Version/s: 0.9.0
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Chinna!

Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
    Key: HIVE-2181
    URL: https://issues.apache.org/jira/browse/HIVE-2181
    Project: Hive
    Issue Type: Bug
    Components: Server Infrastructure
    Affects Versions: 0.8.0
    Environment: Suse linux, Hadoop 20.1, Hive 0.8
    Reporter: sanoj mathew
    Assignee: Chinna Rao Lalam
    Priority: Minor
    Fix For: 0.9.0
    Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, HIVE-2181.4.patch, HIVE-2181.5.patch, HIVE-2181.6.patch, HIVE-2181.patch
    Original Estimate: 48h
    Remaining Estimate: 48h

Currently, queries leave their map outputs under scratch.dir after execution. If the Hive server is stopped, we need not keep the stopped server's map outputs, so while starting the server we can clear the scratch.dir. This helps reduce disk usage.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2380) Add ByteArray Datatype
[ https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113770#comment-13113770 ]

John Sichi commented on HIVE-2380:

I'm planning to review this one next week.

Add ByteArray Datatype
    Key: HIVE-2380
    URL: https://issues.apache.org/jira/browse/HIVE-2380
    Project: Hive
    Issue Type: New Feature
    Components: Serializers/Deserializers
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Attachments: hive-2380.patch, hive-2380_1.patch

Add bytearray as a primitive data type.
[jira] [Updated] (HIVE-2442) Metastore upgrade script and schema DDL for Hive 0.8.0
[ https://issues.apache.org/jira/browse/HIVE-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2442:
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

+1, committed to trunk, thanks Carl! I did not do testing; we can do that with the release candidate, and then if there are problems, submit a corrective patch.

I noticed that you omitted PostgreSQL?

Metastore upgrade script and schema DDL for Hive 0.8.0
    Key: HIVE-2442
    URL: https://issues.apache.org/jira/browse/HIVE-2442
    Project: Hive
    Issue Type: Task
    Components: Metastore
    Reporter: Carl Steinbach
    Assignee: Carl Steinbach
    Priority: Blocker
    Fix For: 0.8.0
    Attachments: HIVE-2442-branch-08.1.patch.txt, HIVE-2442-trunk.1.patch.txt
[jira] [Created] (HIVE-2462) make INNER a non-reserved keyword
make INNER a non-reserved keyword
    Key: HIVE-2462
    URL: https://issues.apache.org/jira/browse/HIVE-2462
    Project: Hive
    Issue Type: Improvement
    Reporter: John Sichi
    Assignee: John Sichi

HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards compatibility for queries which were using it as an identifier. This patch addresses that.
[jira] [Commented] (HIVE-2462) make INNER a non-reserved keyword
[ https://issues.apache.org/jira/browse/HIVE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112861#comment-13112861 ]

John Sichi commented on HIVE-2462:

Not sure whether we want/need this, but if so, here's the patch.

make INNER a non-reserved keyword
    Key: HIVE-2462
    URL: https://issues.apache.org/jira/browse/HIVE-2462
    Project: Hive
    Issue Type: Improvement
    Reporter: John Sichi
    Assignee: John Sichi
    Fix For: 0.9.0
    Attachments: HIVE-2462.1.patch

HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards compatibility for queries which were using it as an identifier. This patch addresses that.
[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112881#comment-13112881 ]

John Sichi commented on HIVE-2181:

+1. Will commit when tests pass.

Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
    Key: HIVE-2181
    URL: https://issues.apache.org/jira/browse/HIVE-2181
    Project: Hive
    Issue Type: Bug
    Components: Server Infrastructure
    Affects Versions: 0.8.0
    Environment: Suse linux, Hadoop 20.1, Hive 0.8
    Reporter: sanoj mathew
    Assignee: Chinna Rao Lalam
    Priority: Minor
    Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, HIVE-2181.4.patch, HIVE-2181.5.patch, HIVE-2181.6.patch, HIVE-2181.patch
    Original Estimate: 48h
    Remaining Estimate: 48h

Currently, queries leave their map outputs under scratch.dir after execution. If the Hive server is stopped, we need not keep the stopped server's map outputs, so while starting the server we can clear the scratch.dir. This helps reduce disk usage.
[jira] [Assigned] (HIVE-1558) introducing the dual table
[ https://issues.apache.org/jira/browse/HIVE-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1558:
    Assignee: Marcin Kurczych

introducing the dual table
    Key: HIVE-1558
    URL: https://issues.apache.org/jira/browse/HIVE-1558
    Project: Hive
    Issue Type: New Feature
    Components: Query Processor
    Reporter: Ning Zhang
    Assignee: Marcin Kurczych

The dual table in MySQL and Oracle is very convenient in testing UDFs or constructing rows without reading any other tables. If dual is the only data source we could leverage the local mode execution.
[jira] [Assigned] (HIVE-2244) Add a Plugin Developer Kit to Hive
[ https://issues.apache.org/jira/browse/HIVE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-2244:
    Assignee: John Sichi

Add a Plugin Developer Kit to Hive
    Key: HIVE-2244
    URL: https://issues.apache.org/jira/browse/HIVE-2244
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: John Sichi
    Assignee: John Sichi
    Attachments: HIVE-2244.patch

See https://cwiki.apache.org/confluence/display/Hive/PluginDeveloperKit
[jira] [Created] (HIVE-2463) fix Eclipse for javaewah upgrade
fix Eclipse for javaewah upgrade
    Key: HIVE-2463
    URL: https://issues.apache.org/jira/browse/HIVE-2463
    Project: Hive
    Issue Type: Bug
    Components: Build Infrastructure
    Affects Versions: 0.9.0
    Reporter: John Sichi
    Assignee: John Sichi

I always forget this.
[jira] [Updated] (HIVE-2463) fix Eclipse for javaewah upgrade
[ https://issues.apache.org/jira/browse/HIVE-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2463:
    Attachment: HIVE-2463.1.patch

fix Eclipse for javaewah upgrade
    Key: HIVE-2463
    URL: https://issues.apache.org/jira/browse/HIVE-2463
    Project: Hive
    Issue Type: Bug
    Components: Build Infrastructure
    Affects Versions: 0.9.0
    Reporter: John Sichi
    Assignee: John Sichi
    Fix For: 0.9.0
    Attachments: HIVE-2463.1.patch

I always forget this.
[jira] [Updated] (HIVE-2463) fix Eclipse for javaewah upgrade
[ https://issues.apache.org/jira/browse/HIVE-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2463:
    Fix Version/s: 0.9.0
    Status: Patch Available (was: Open)

fix Eclipse for javaewah upgrade
    Key: HIVE-2463
    URL: https://issues.apache.org/jira/browse/HIVE-2463
    Project: Hive
    Issue Type: Bug
    Components: Build Infrastructure
    Affects Versions: 0.9.0
    Reporter: John Sichi
    Assignee: John Sichi
    Fix For: 0.9.0
    Attachments: HIVE-2463.1.patch

I always forget this.
[jira] [Updated] (HIVE-2244) Add a Plugin Developer Kit to Hive
[ https://issues.apache.org/jira/browse/HIVE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2244:
    Attachment: HIVE-2244.1.patch

HIVE-2244.1.patch has the optimization.

Add a Plugin Developer Kit to Hive
    Key: HIVE-2244
    URL: https://issues.apache.org/jira/browse/HIVE-2244
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: John Sichi
    Assignee: John Sichi
    Attachments: HIVE-2244.1.patch, HIVE-2244.patch

See https://cwiki.apache.org/confluence/display/Hive/PluginDeveloperKit
[jira] [Updated] (HIVE-2244) Add a Plugin Developer Kit to Hive
[ https://issues.apache.org/jira/browse/HIVE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2244:
    Status: Patch Available (was: Open)

Review Board at https://reviews.apache.org/r/2030

Add a Plugin Developer Kit to Hive
    Key: HIVE-2244
    URL: https://issues.apache.org/jira/browse/HIVE-2244
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: John Sichi
    Assignee: John Sichi
    Attachments: HIVE-2244.1.patch, HIVE-2244.patch

See https://cwiki.apache.org/confluence/display/Hive/PluginDeveloperKit
[jira] [Updated] (HIVE-2458) Group-by query optimization Followup: add flag in conf/hive-default.xml
[ https://issues.apache.org/jira/browse/HIVE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2458:
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Prajakta!

Group-by query optimization Followup: add flag in conf/hive-default.xml
    Key: HIVE-2458
    URL: https://issues.apache.org/jira/browse/HIVE-2458
    Project: Hive
    Issue Type: Improvement
    Components: Indexing, Query Processor
    Affects Versions: 0.7.1
    Reporter: Prajakta Kalmegh
    Assignee: Prajakta Kalmegh
    Fix For: 0.9.0
    Attachments: HIVE-2458.1.patch, HIVE-2458.2.patch

Followup patch to HIVE-1694.
[jira] [Updated] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2181:
    Status: Open (was: Patch Available)

I ran TestHiveServer, and even though it passed, I saw the exception below in the test output. That's because one of the test cases leaves the socket in use, so the second one fails to open it. Rather than actually starting the server, maybe just unit-test the cleanup method in isolation?

{noformat}
[junit] org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:1.
[junit]     at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:93)
[junit]     at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:75)
[junit]     at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:68)
[junit]     at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:688)
[junit]     at org.apache.hadoop.hive.service.TestHiveServer$2.run(TestHiveServer.java:423)
[junit] - ---
[junit]
{noformat}

Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
    Key: HIVE-2181
    URL: https://issues.apache.org/jira/browse/HIVE-2181
    Project: Hive
    Issue Type: Bug
    Components: Server Infrastructure
    Affects Versions: 0.8.0
    Environment: Suse linux, Hadoop 20.1, Hive 0.8
    Reporter: sanoj mathew
    Assignee: Chinna Rao Lalam
    Priority: Minor
    Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, HIVE-2181.4.patch, HIVE-2181.5.patch, HIVE-2181.patch
    Original Estimate: 48h
    Remaining Estimate: 48h

Currently, queries leave their map outputs under scratch.dir after execution. If the Hive server is stopped, we need not keep the stopped server's map outputs, so while starting the server we can clear the scratch.dir. This helps reduce disk usage.
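The suggestion in this comment, testing the cleanup logic on its own instead of starting a server that binds a socket, could look roughly like the sketch below. It is not the committed HIVE-2181 patch: the class name ScratchDirCleaner and the method cleanScratchDir are hypothetical, and the sketch works on a local directory through java.nio.file rather than through the Hadoop FileSystem API that Hive would presumably use.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not a Hive class: clears a scratch directory at startup.
final class ScratchDirCleaner {

    /**
     * Deletes every file and subdirectory under scratchDir but keeps
     * scratchDir itself. Returns the number of entries removed.
     */
    static int cleanScratchDir(Path scratchDir) throws IOException {
        if (!Files.isDirectory(scratchDir)) {
            return 0; // nothing to clean
        }
        final int[] removed = {0};
        Files.walkFileTree(scratchDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // Children are already gone; remove the directory unless it is the root.
                if (!dir.equals(scratchDir)) {
                    Files.delete(dir);
                    removed[0]++;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway scratch directory with one leftover map output.
        Path scratch = Files.createTempDirectory("hive-scratch-demo");
        Files.createDirectories(scratch.resolve("query-1/map-out"));
        Files.write(scratch.resolve("query-1/map-out/part-00000"), new byte[] {1, 2, 3});

        System.out.println("removed " + cleanScratchDir(scratch) + " entries");
        Files.delete(scratch); // tidy up the demo directory itself
    }
}
```

Because the method only takes a path, a unit test can point it at a temporary directory and never needs to open a server socket, which avoids the TTransportException shown in the test output.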
[jira] [Resolved] (HIVE-2459) remove all @author tags from source
[ https://issues.apache.org/jira/browse/HIVE-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi resolved HIVE-2459.
    Resolution: Fixed
    Fix Version/s: 0.9.0
    Hadoop Flags: [Reviewed]

Committed to trunk. Thanks Ashutosh!

remove all @author tags from source
    Key: HIVE-2459
    URL: https://issues.apache.org/jira/browse/HIVE-2459
    Project: Hive
    Issue Type: Bug
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Fix For: 0.9.0
    Attachments: hive-2459.patch

$ grep --exclude-dir=build --exclude-dir=.svn -r @author .
./ql/src/java/org/apache/hadoop/hive/ql/parse/ASTNode.java: * @author athusoo
./ql/src/java/org/apache/hadoop/hive/ql/index/IndexSearchCondition.java: * @author John Sichi
[jira] [Commented] (HIVE-1496) enhance CREATE INDEX to support immediate index build
[ https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108953#comment-13108953 ]

John Sichi commented on HIVE-1496:

Ashutosh, the DEFERRED REBUILD refers to the data portion (not the metadata for the index definition).

enhance CREATE INDEX to support immediate index build
    Key: HIVE-1496
    URL: https://issues.apache.org/jira/browse/HIVE-1496
    Project: Hive
    Issue Type: Improvement
    Components: Indexing
    Affects Versions: 0.7.0, 0.8.0
    Reporter: John Sichi
    Assignee: Ashutosh Chauhan
    Attachments: hive-1496.patch

Currently we only support WITH DEFERRED REBUILD.
[jira] [Updated] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2181:
    Status: Open (was: Patch Available)

Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
    Key: HIVE-2181
    URL: https://issues.apache.org/jira/browse/HIVE-2181
    Project: Hive
    Issue Type: Bug
    Components: Server Infrastructure
    Affects Versions: 0.8.0
    Environment: Suse linux, Hadoop 20.1, Hive 0.8
    Reporter: sanoj mathew
    Assignee: Chinna Rao Lalam
    Priority: Minor
    Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, HIVE-2181.4.patch, HIVE-2181.patch
    Original Estimate: 48h
    Remaining Estimate: 48h

Currently, queries leave their map outputs under scratch.dir after execution. If the Hive server is stopped, we need not keep the stopped server's map outputs, so while starting the server we can clear the scratch.dir. This helps reduce disk usage.
[jira] [Commented] (HIVE-2458) Group-by query optimization Followup: add flag in conf/hive-default.xml
[ https://issues.apache.org/jira/browse/HIVE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109091#comment-13109091 ]

John Sichi commented on HIVE-2458:

For the _Of_ casing, I searched the code base and found a few more instances:

* RewriteCanApplyProcFactory.java
* RewriteQueryUsingAggregateIndex.java

Do these need to be changed too?

Group-by query optimization Followup: add flag in conf/hive-default.xml
    Key: HIVE-2458
    URL: https://issues.apache.org/jira/browse/HIVE-2458
    Project: Hive
    Issue Type: Improvement
    Components: Indexing, Query Processor
    Affects Versions: 0.7.1
    Reporter: Prajakta Kalmegh
    Assignee: Prajakta Kalmegh
    Fix For: 0.9.0
    Attachments: HIVE-2458.1.patch

Followup patch to HIVE-1694.
[jira] [Commented] (HIVE-2458) Group-by query optimization Followup: add flag in conf/hive-default.xml
[ https://issues.apache.org/jira/browse/HIVE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109132#comment-13109132 ]

John Sichi commented on HIVE-2458:

+1. Will commit when tests pass.

Group-by query optimization Followup: add flag in conf/hive-default.xml
    Key: HIVE-2458
    URL: https://issues.apache.org/jira/browse/HIVE-2458
    Project: Hive
    Issue Type: Improvement
    Components: Indexing, Query Processor
    Affects Versions: 0.7.1
    Reporter: Prajakta Kalmegh
    Assignee: Prajakta Kalmegh
    Fix For: 0.9.0
    Attachments: HIVE-2458.1.patch, HIVE-2458.2.patch

Followup patch to HIVE-1694.
[jira] [Assigned] (HIVE-1079) CREATE VIEW followup: derive dependencies on underlying base table partitions from view definition
[ https://issues.apache.org/jira/browse/HIVE-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1079:
    Assignee: Prajakta Kalmegh (was: John Sichi)

CREATE VIEW followup: derive dependencies on underlying base table partitions from view definition
    Key: HIVE-1079
    URL: https://issues.apache.org/jira/browse/HIVE-1079
    Project: Hive
    Issue Type: Improvement
    Components: Query Processor
    Affects Versions: 0.6.0
    Reporter: John Sichi
    Assignee: Prajakta Kalmegh

When querying a view, it would be useful to know which underlying base table partitions it depends on in order to know how fresh the result is (or to be able to wait until all of those partitions have been loaded consistently).

The task is to come up with a way to perform this analysis automatically (possibly overconservatively), or alternately to let the view creator annotate the view definition with this dependency information, or some combination of the two.

Note that this would be useful for any complex query which directly accesses base tables (not just view definitions).
[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3
[ https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2448:
    Attachment: javaewah-0.3.jar

Upgrade JavaEWAH to 0.3
    Key: HIVE-2448
    URL: https://issues.apache.org/jira/browse/HIVE-2448
    Project: Hive
    Issue Type: Improvement
    Components: Indexing
    Reporter: John Sichi
    Assignee: John Sichi
    Attachments: javaewah-0.3.jar

It contains performance improvements and should be a drop-in replacement.
[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3
[ https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2448:
    Attachment: HIVE-2448.1.patch

Upgrade JavaEWAH to 0.3
    Key: HIVE-2448
    URL: https://issues.apache.org/jira/browse/HIVE-2448
    Project: Hive
    Issue Type: Improvement
    Components: Indexing
    Reporter: John Sichi
    Assignee: John Sichi
    Attachments: HIVE-2448.1.patch, javaewah-0.3.jar

It contains performance improvements and should be a drop-in replacement.
[jira] [Commented] (HIVE-2448) Upgrade JavaEWAH to 0.3
[ https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108138#comment-13108138 ]

John Sichi commented on HIVE-2448:

Did you look in the src .zip? There's a file unit.java there.

Upgrade JavaEWAH to 0.3
    Key: HIVE-2448
    URL: https://issues.apache.org/jira/browse/HIVE-2448
    Project: Hive
    Issue Type: Improvement
    Components: Indexing
    Reporter: John Sichi
    Assignee: John Sichi
    Attachments: HIVE-2448.1.patch, javaewah-0.3.jar

It contains performance improvements and should be a drop-in replacement.
[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3
[ https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2448:
    Fix Version/s: 0.9.0
    Status: Patch Available (was: Open)

Upgrade JavaEWAH to 0.3
    Key: HIVE-2448
    URL: https://issues.apache.org/jira/browse/HIVE-2448
    Project: Hive
    Issue Type: Improvement
    Components: Indexing
    Reporter: John Sichi
    Assignee: John Sichi
    Fix For: 0.9.0
    Attachments: HIVE-2448.1.patch, javaewah-0.3.jar

It contains performance improvements and should be a drop-in replacement.
[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3
[ https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2448:
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

Committed to trunk since Ed already +1'd it.

Upgrade JavaEWAH to 0.3
    Key: HIVE-2448
    URL: https://issues.apache.org/jira/browse/HIVE-2448
    Project: Hive
    Issue Type: Improvement
    Components: Indexing
    Reporter: John Sichi
    Assignee: John Sichi
    Fix For: 0.9.0
    Attachments: HIVE-2448.1.patch, javaewah-0.3.jar

It contains performance improvements and should be a drop-in replacement.
[jira] [Commented] (HIVE-198) Parse errors report incorrectly.
[ https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105680#comment-13105680 ]

John Sichi commented on HIVE-198:

The updated patch does not apply cleanly.

Also, I tried the original test query from the description. Before your patch, the message is

    cannot recognize input near ',' 'bigint' ')' in column type

After your patch, the message is

    unexpected input token 'userid' near ',' 'bigint' ')' in column type

It's not clear that this is an improvement, since userid is fine; the syntax error is in what follows it.

The problem as originally reported (referring to KW_TEMPORARY) seems to have been fixed long ago.

Parse errors report incorrectly.
    Key: HIVE-198
    URL: https://issues.apache.org/jira/browse/HIVE-198
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: S. Alex Smith
    Assignee: Aviv Eyal
    Labels: parse
    Attachments: HIVE-198.2.patch.txt, PraseErrorMessage.patch

The following two queries fail:

    CREATE TABLE output_table(userid, bigint);
    CREATE TABLE output_table(userid bigint, age int, sex string, location string);

each giving the error message

    FAILED: Parse Error: line 1:16 mismatched input 'TABLE' expecting KW_TEMPORARY

Although one might not catch it from the error message, the problem with the first is that there is a comma between userid and bigint, and the problem with the second is that location is a reserved keyword. Reported errors should more accurately describe the nature of the error, such as "no type given for column 'userid'" or "'location' is not a valid column name".
[jira] [Created] (HIVE-2448) Upgrade JavaEWAH to 0.3
Upgrade JavaEWAH to 0.3
    Key: HIVE-2448
    URL: https://issues.apache.org/jira/browse/HIVE-2448
    Project: Hive
    Issue Type: Improvement
    Components: Indexing
    Reporter: John Sichi
    Assignee: John Sichi

It contains performance improvements and should be a drop-in replacement.
[jira] [Commented] (HIVE-1040) use sed rather than diff for masking out noise in diff-based tests
[ https://issues.apache.org/jira/browse/HIVE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105732#comment-13105732 ]

John Sichi commented on HIVE-1040:

Another benefit is to show us exactly what is being masked out just by examining the .q.out files (something that currently makes some tests give less coverage than they should).

use sed rather than diff for masking out noise in diff-based tests
    Key: HIVE-1040
    URL: https://issues.apache.org/jira/browse/HIVE-1040
    Project: Hive
    Issue Type: Improvement
    Components: Testing Infrastructure
    Affects Versions: 0.4.1
    Reporter: John Sichi
    Priority: Minor

The current diff -I approach has two problems: (1) it does not allow resolution finer than line-level, so it's impossible to mask out pattern occurrences within a line, and (2) it produces unmasked files, so if you run diff on the command line to compare the result .q.out with the checked-in file, you see the noise.

My suggestion is to first run sed to replace noise patterns with an unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files without using any -I.

This would require a one-time hit to update all existing .q.out files so that they would contain the pre-masked results.
[jira] [Created] (HIVE-2449) streamline .q.out format
streamline .q.out format
    Key: HIVE-2449
    URL: https://issues.apache.org/jira/browse/HIVE-2449
    Project: Hive
    Issue Type: Improvement
    Components: Testing Infrastructure
    Reporter: John Sichi

Currently, we enable all available testing hooks (e.g. lineage, input/output) for all tests. This creates a huge amount of noise in the .q.out files, making it very difficult to read them and to review diffs in them.

To fix this, we should only selectively enable specific hooks for specific tests where the coverage is needed.

Undertaking this will necessitate a one-time hit for updating all existing .q.out files. Probably best to do together with HIVE-1040.
[jira] [Commented] (HIVE-1040) use sed rather than diff for masking out noise in diff-based tests
[ https://issues.apache.org/jira/browse/HIVE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105759#comment-13105759 ]

John Sichi commented on HIVE-1040:

Good point. We can probably figure out how to do it completely within Java by filtering the CLI output stream via java.util.regex. That's what I did for Eigenbase, and it worked fine.

use sed rather than diff for masking out noise in diff-based tests
    Key: HIVE-1040
    URL: https://issues.apache.org/jira/browse/HIVE-1040
    Project: Hive
    Issue Type: Improvement
    Components: Testing Infrastructure
    Affects Versions: 0.4.1
    Reporter: John Sichi
    Priority: Minor

The current diff -I approach has two problems: (1) it does not allow resolution finer than line-level, so it's impossible to mask out pattern occurrences within a line, and (2) it produces unmasked files, so if you run diff on the command line to compare the result .q.out with the checked-in file, you see the noise.

My suggestion is to first run sed to replace noise patterns with an unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files without using any -I.

This would require a one-time hit to update all existing .q.out files so that they would contain the pre-masked results.
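The java.util.regex idea from this comment might be sketched as follows. This is only an illustration of in-line masking, not code from Hive or Eigenbase: the class OutputMasker and its two noise patterns (local file URIs and epoch-style timestamps) are assumptions chosen for the example, while the mask token ZYZZYZVA is the one proposed in the issue description.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hive class: masks noisy fragments inside a line.
final class OutputMasker {

    // Replacement token suggested in the HIVE-1040 description.
    static final String MASK = "ZYZZYZVA";

    // Assumed noise patterns, for illustration only.
    private static final List<Pattern> NOISE = Arrays.asList(
            Pattern.compile("file:/\\S+"),        // local paths that differ per machine
            Pattern.compile("\\b\\d{10,13}\\b")); // epoch seconds or milliseconds

    /** Returns the line with every noise occurrence replaced by MASK. */
    static String maskLine(String line) {
        String masked = line;
        for (Pattern p : NOISE) {
            masked = p.matcher(masked).replaceAll(MASK);
        }
        return masked;
    }

    public static void main(String[] args) {
        System.out.println(
                maskLine("location file:/tmp/hive-root/t1 transient_lastDdlTime 1316649600"));
    }
}
```

Unlike diff -I, which can only ignore whole lines, this replaces just the matching fragment, so the rest of the line is still compared and the stored .q.out file shows exactly what was masked.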
[jira] [Assigned] (HIVE-2446) Introduction of client statistics publishers possibility
[ https://issues.apache.org/jira/browse/HIVE-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-2446:
    Assignee: Robert Surówka

Introduction of client statistics publishers possibility
    Key: HIVE-2446
    URL: https://issues.apache.org/jira/browse/HIVE-2446
    Project: Hive
    Issue Type: Improvement
    Components: Clients, Statistics
    Reporter: Robert Surówka
    Assignee: Robert Surówka
    Priority: Minor
    Attachments: HIVE-2446.1.patch, HIVE-2446.1.patch
    Original Estimate: 1h
    Remaining Estimate: 1h

The purpose of this change is to allow counters to be published or stored while the job is running.

Introduced two new variables in hive-default.xml and HiveConf.java: hive.client.stats.publishers and hive.client.stats.counters. The first specifies the names of classes whose instances will be executed by HadoopJobExecHelper.java (in the same way as hooks) in its method progress(ExecDriverTaskHandle): MapRedStats. The second specifies the list of counters that client stats publishers should publish or store. The format of this list is up to a specific deployment (it is up to the client stats publishers to parse it), but it must use the display names of counter groups and counters.

Added the interface ClientStatsPublishers in the org.apache.hadoop.hive.ql.stats package, which must be implemented by all stats publishers.

Added code to progress(ExecDriverTaskHandle): MapRedStats in HadoopJobExecHelper.java that puts the counters' values into a Java map and then executes the registered client stats publishers, passing them that map and the running job id.

Added two new methods to HadoopJobExecHelper, extractAllCounterValues(Counters) and getClientStatsPublishers(), which are used by the code described in the previous paragraph.

Made cosmetic changes in two other classes.
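To make the flow in this description concrete, here is a minimal sketch of a publisher being handed a counter map and a job id. Everything in it is an assumption made for illustration: the interface name ClientStatsPublisherSketch, the run(Map, String) signature, the "group::counter" key format, and the filtering by a wanted-counter set are not taken from the patch, and no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Assumed shape of a client stats publisher; the real interface may differ.
interface ClientStatsPublisherSketch {
    void run(Map<String, Double> counterValues, String jobId);
}

// Example publisher that records only the counters it was configured to keep.
final class RecordingPublisher implements ClientStatsPublisherSketch {
    final List<String> published = new ArrayList<>();
    private final Set<String> wanted;

    RecordingPublisher(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    public void run(Map<String, Double> counterValues, String jobId) {
        for (Map.Entry<String, Double> e : counterValues.entrySet()) {
            if (wanted.contains(e.getKey())) {
                published.add(jobId + " " + e.getKey() + "=" + e.getValue());
            }
        }
    }
}

// Stand-in for the progress loop: hands the current counters to each publisher.
final class ProgressLoopSketch {
    static void publish(List<ClientStatsPublisherSketch> publishers,
                        Map<String, Double> counters, String jobId) {
        for (ClientStatsPublisherSketch p : publishers) {
            try {
                p.run(counters, jobId);
            } catch (RuntimeException ex) {
                // A failing publisher should not break job monitoring.
                System.err.println("publisher failed: " + ex);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Double> counters = new LinkedHashMap<>();
        counters.put("Job Counters::Launched map tasks", 4.0);
        counters.put("FileSystemCounters::HDFS_BYTES_READ", 1024.0);

        RecordingPublisher p = new RecordingPublisher(
                Collections.singleton("Job Counters::Launched map tasks"));
        publish(Arrays.<ClientStatsPublisherSketch>asList(p), counters, "job_demo_0001");
        System.out.println(p.published);
    }
}
```

Catching RuntimeException around each publisher is a design choice of this sketch, so that one misbehaving publisher cannot interrupt the others; the description does not say how the patch handles publisher failures.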
[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2182: - Resolution: Fixed Fix Version/s: 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Chinna! Avoid null pointer exception when executing UDF --- Key: HIVE-2182 URL: https://issues.apache.org/jira/browse/HIVE-2182 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0, 0.8.0 Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.9.0 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, HIVE-2182.4.patch, HIVE-2182.5.patch, HIVE-2182.patch To use a UDF, the following steps were executed: {noformat} add jar /home/udf/udf.jar; create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} But if the first step (add jar) is skipped and only the remaining steps are executed: {noformat} create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} the tasktracker throws this exception: {noformat} Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 18 more Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107) ... 31 more {noformat} Instead of a null pointer exception, it should throw a meaningful exception.
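The request in this issue, failing with a descriptive message instead of a bare NullPointerException, can be sketched as follows. This is not the committed fix, and it does not use Hive's or Hadoop's classes: `UdfLoader` and `newUdfInstance` are hypothetical names, and the sketch only shows the general idea of checking that the UDF class can be resolved before instantiating it.

```java
// Hypothetical helper illustrating the idea; not taken from the HIVE-2182 patches.
class UdfLoader {
    // Instantiates a UDF class by name, failing with a descriptive message
    // when the class cannot be found (for example, because ADD JAR was skipped).
    static Object newUdfInstance(String className, ClassLoader loader) {
        Class<?> udfClass;
        try {
            udfClass = Class.forName(className, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "UDF class '" + className + "' not found; was its jar registered with ADD JAR?", e);
        }
        try {
            return udfClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Could not instantiate UDF class '" + className + "'", e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = UdfLoader.class.getClassLoader();
        try {
            // 'udf.Grade' is the class from the report; it is not on this classpath.
            newUdfInstance("udf.Grade", loader);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, the user sees the missing class name directly instead of having to read it out of a nested stack trace.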
[jira] [Commented] (HIVE-198) Parse errors report incorrectly.
[ https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104948#comment-13104948 ] John Sichi commented on HIVE-198: - Did you miss create_or_replace_view6.q.out? Perhaps it was committed after you started. jsichi-mac:clientnegative jsichi$ grep "cannot recognize" *.q.out column_rename3.q.out:FAILED: Parse Error: line 1:27 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in column type create_or_replace_view6.q.out:FAILED: Parse Error: line 2:52 cannot recognize input near 'blah' '<EOF>' '<EOF>' in select clause invalid_select_expression.q.out:FAILED: Parse Error: line 1:32 cannot recognize input near '.' 'foo' '<EOF>' in expression specification invalid_tbl_name.q.out:FAILED: Parse Error: line 1:20 cannot recognize input near '-' 'name' '(' in create table statement Parse errors report incorrectly. Key: HIVE-198 URL: https://issues.apache.org/jira/browse/HIVE-198 Project: Hive Issue Type: Bug Components: Query Processor Reporter: S. Alex Smith Assignee: Aviv Eyal Labels: parse Attachments: HIVE-198.2.patch.txt, PraseErrorMessage.patch The following two queries fail: CREATE TABLE output_table(userid, bigint); CREATE TABLE output_table(userid bigint, age int, sex string, location string); each giving the error message FAILED: Parse Error: line 1:16 mismatched input 'TABLE' expecting KW_TEMPORARY Although one might not catch it from the error message, the problem with the first is that there is a comma between userid and bigint, and the problem with the second is that location is a reserved keyword. Reported errors should describe the nature of the error more accurately, such as "no type given for column 'userid'" or "'location' is not a valid column name".
[jira] [Commented] (HIVE-2380) Add ByteArray Datatype
[ https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104980#comment-13104980 ] John Sichi commented on HIVE-2380: -- I don't see any references to it, so I think you're free to use it. Add ByteArray Datatype -- Key: HIVE-2380 URL: https://issues.apache.org/jira/browse/HIVE-2380 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: hive-2380.patch Add bytearray as a primitive data type.
[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105014#comment-13105014 ] John Sichi commented on HIVE-2181: -- Oops, looks like I typed in the wrong JIRA issue number in the commit message :( Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. Key: HIVE-2181 URL: https://issues.apache.org/jira/browse/HIVE-2181 Project: Hive Issue Type: Bug Components: Server Infrastructure Affects Versions: 0.8.0 Environment: Suse linux, Hadoop 20.1, Hive 0.8 Reporter: sanoj mathew Assignee: Chinna Rao Lalam Priority: Minor Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.patch Original Estimate: 48h Remaining Estimate: 48h Currently, queries leave their map outputs under scratch.dir after execution. If the Hive server is stopped, there is no need to keep the stopped server's map outputs, so scratch.dir can be cleared when the server starts. This can reduce disk usage.
[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105025#comment-13105025 ] John Sichi commented on HIVE-2182: -- Oops, looks like I typed in the wrong JIRA issue number in the commit message (I typed in HIVE-2181 instead of HIVE-2182), so the Hudson commit message went there instead. I've fixed it in the svn log though. Avoid null pointer exception when executing UDF --- Key: HIVE-2182 URL: https://issues.apache.org/jira/browse/HIVE-2182 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0, 0.8.0 Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.9.0 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, HIVE-2182.4.patch, HIVE-2182.5.patch, HIVE-2182.patch To use a UDF, the following steps were executed: {noformat} add jar /home/udf/udf.jar; create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} But if the first step (add jar) is skipped and only the remaining steps are executed: {noformat} create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} the tasktracker throws this exception: {noformat} Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 18 more Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107) ... 31 more {noformat} Instead of a null pointer exception, it should throw a meaningful exception.
[jira] [Commented] (HIVE-2380) Add ByteArray Datatype
[ https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103894#comment-13103894 ] John Sichi commented on HIVE-2380: -- For accessor functions: * length * substring * concat We can follow up later with search capabilities. For conversions: * to/from hex string * to/from string using a specific encoding (or default JVM encoding if not specified) * to/from base64 string We can follow up later with more interesting conversions for non-string types. Add ByteArray Datatype -- Key: HIVE-2380 URL: https://issues.apache.org/jira/browse/HIVE-2380 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: hive-2380.patch Add bytearray as a primitive data type.
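The accessor and conversion functions listed in this comment could be sketched in plain Java as below. This is only an illustration of the intended semantics under stated assumptions: the class `ByteArrayOps` and all of its method names are hypothetical, offsets are assumed to be 0-based, and nothing here comes from the attached patch or from Hive's UDF API.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper; none of these names come from the HIVE-2380 patch.
class ByteArrayOps {
    static int length(byte[] b) { return b.length; }

    // Returns len bytes starting at the 0-based offset start.
    static byte[] substring(byte[] b, int start, int len) {
        if (start < 0 || len < 0 || start + len > b.length) {
            throw new IndexOutOfBoundsException("start=" + start + ", len=" + len);
        }
        return Arrays.copyOfRange(b, start, start + len);
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder(b.length * 2);
        for (byte x : b) {
            sb.append(Character.forDigit((x >> 4) & 0xF, 16));
            sb.append(Character.forDigit(x & 0xF, 16));
        }
        return sb.toString();
    }

    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd-length hex string");
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("non-hex character");
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    static String toBase64(byte[] b) { return Base64.getEncoder().encodeToString(b); }
    static byte[] fromBase64(String s) { return Base64.getDecoder().decode(s); }

    // A null charset falls back to the JVM default encoding, as the comment suggests.
    static String decode(byte[] b, Charset cs) {
        return new String(b, cs != null ? cs : Charset.defaultCharset());
    }

    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs != null ? cs : Charset.defaultCharset());
    }

    public static void main(String[] args) {
        byte[] joined = concat(fromHex("00ff"), fromHex("10"));
        System.out.println(toHex(joined) + " has length " + length(joined));
    }
}
```

Falling back to the JVM default encoding makes results depend on the machine running the task, so callers that need reproducible output would likely pass an explicit charset.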
[jira] [Commented] (HIVE-2223) support grouping on complex types in Hive
[ https://issues.apache.org/jira/browse/HIVE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103900#comment-13103900 ] John Sichi commented on HIVE-2223: -- I can't seem to view the diff on Review Board? support grouping on complex types in Hive - Key: HIVE-2223 URL: https://issues.apache.org/jira/browse/HIVE-2223 Project: Hive Issue Type: New Feature Reporter: Kate Ting Assignee: Jonathan Chang Priority: Minor Attachments: HIVE-2223.patch Creating a query with a GROUP BY statement when an array type column is part of the column list is not yet supported: CREATE TABLE test_group_by ( key INT, group INT, terms ARRAY<STRING>); SELECT key, terms, count(group) FROM test_group_by GROUP BY key, terms; ... Hash code on complex types not supported yet. java.lang.RuntimeException: Error while closing operators at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470) at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211) ... 4 more Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:348) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:187) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:746) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:780) ... 9 more
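Grouping on a complex column needs a hash code that is defined for nested values, which is what the exception above says is missing. A minimal sketch of one way to compute such a hash recursively over lists and maps follows. It works on plain Java collections, not on Hive's ObjectInspector types; the class `ComplexHash` and its combining rules are assumptions for illustration, not the approach taken in the attached patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; Hive's ObjectInspectorUtils works on inspectors, not raw collections.
class ComplexHash {
    static int hash(Object value) {
        if (value == null) {
            return 0;
        }
        if (value instanceof List) {
            // Order-sensitive: [a, b] and [b, a] should land in different groups.
            int h = 0;
            for (Object element : (List<?>) value) {
                h = 31 * h + hash(element);
            }
            return h;
        }
        if (value instanceof Map) {
            // Order-insensitive: summing makes the result independent of iteration order.
            int h = 0;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                h += hash(e.getKey()) ^ hash(e.getValue());
            }
            return h;
        }
        // Primitive-like values fall back to their own hashCode.
        return value.hashCode();
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("a", "b");
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("x", 1);
        System.out.println(hash(terms) + " " + hash(m));
    }
}
```

Equal values must hash equally for GROUP BY to place them in the same group, which is why the map case avoids depending on entry order.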
[jira] [Commented] (HIVE-2223) support grouping on complex types in Hive
[ https://issues.apache.org/jira/browse/HIVE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104096#comment-13104096 ] John Sichi commented on HIVE-2223: -- It applies cleanly for me, but I was also able to upload it to Review Board successfully. Did you try choosing hive-git for the repository? support grouping on complex types in Hive - Key: HIVE-2223 URL: https://issues.apache.org/jira/browse/HIVE-2223 Project: Hive Issue Type: New Feature Reporter: Kate Ting Assignee: Jonathan Chang Priority: Minor Attachments: HIVE-2223.patch Creating a query with a GROUP BY statement when an array type column is part of the column list is not yet supported: CREATE TABLE test_group_by ( key INT, group INT, terms ARRAY<STRING>); SELECT key, terms, count(group) FROM test_group_by GROUP BY key, terms; ... Hash code on complex types not supported yet. java.lang.RuntimeException: Error while closing operators at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470) at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211) ... 4 more Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:348) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:187) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:746) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:780) ... 9 more
[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege
[ https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2405: - Resolution: Fixed Fix Version/s: 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Yongqiang! get_privilege does not get user level privilege --- Key: HIVE-2405 URL: https://issues.apache.org/jira/browse/HIVE-2405 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.9.0 Attachments: HIVE-2405.1.patch, HIVE-2405.2.patch hive> set hive.security.authorization.enabled=true; hive> grant all to user heyongqiang; hive> show grant user heyongqiang; principalName heyongqiang principalType USER privilege All grantTime Wed Aug 24 11:51:54 PDT 2011 grantor heyongqiang Time taken: 0.032 seconds hive> CREATE TABLE src (foo INT, bar STRING); Authorization failed:No privilege 'Create' found for outputs { database:default}. Use show grant to get more details.
[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102927#comment-13102927 ] John Sichi commented on HIVE-2182: -- Yeah, I hit those failures too while testing. I'll rerun with the latest patch. Avoid null pointer exception when executing UDF --- Key: HIVE-2182 URL: https://issues.apache.org/jira/browse/HIVE-2182 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0, 0.8.0 Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, HIVE-2182.4.patch, HIVE-2182.patch To use a UDF, the following steps were executed: {noformat} add jar /home/udf/udf.jar; create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} But if the first step (add jar) is skipped and only the remaining steps are executed: {noformat} create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} the tasktracker throws this exception: {noformat} Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 18 more Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107) ... 31 more {noformat} Instead of a null pointer exception, it should throw a meaningful exception.
[jira] [Commented] (HIVE-2441) Metastore upgrade scripts for schema change introduced in HIVE-2215
[ https://issues.apache.org/jira/browse/HIVE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102944#comment-13102944 ] John Sichi commented on HIVE-2441: -- @Ashutosh: we provide the create scripts since DBAs may choose to control schema modification (rather than letting Hive do it automatically). Metastore upgrade scripts for schema change introduced in HIVE-2215 --- Key: HIVE-2441 URL: https://issues.apache.org/jira/browse/HIVE-2441 Project: Hive Issue Type: Task Components: Metastore Reporter: Carl Steinbach Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.8.0
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102988#comment-13102988 ] John Sichi commented on HIVE-1694: -- Prajakta, can you re-attach your latest patch granting rights to ASF (so the feather shows up next to the attachment), and then click the Submit Patch button? Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks supporting indexes in the Hive compiler and execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in the optimizer and query execution. The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans and operator implementations for the above mentioned cases.
[jira] [Commented] (HIVE-2441) Metastore upgrade scripts for schema change introduced in HIVE-2215
[ https://issues.apache.org/jira/browse/HIVE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102995#comment-13102995 ] John Sichi commented on HIVE-2441: -- Since we defined the feature generically, the tables should always be created; their presence will not cause any problem, and having that unconditional actually seems less confusing to me (we don't currently have any feature-specific portion of the metastore). Metastore upgrade scripts for schema change introduced in HIVE-2215 --- Key: HIVE-2441 URL: https://issues.apache.org/jira/browse/HIVE-2441 Project: Hive Issue Type: Task Components: Metastore Reporter: Carl Steinbach Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.8.0
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103013#comment-13103013 ] John Sichi commented on HIVE-1694: -- +1. Will commit when tests pass. Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Fix For: 0.8.0 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks supporting indexes in the Hive compiler and execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in the optimizer and query execution. The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans and operator implementations for the above mentioned cases.
[jira] [Commented] (HIVE-730) Allow Hive UDF/UDAF to use scala
[ https://issues.apache.org/jira/browse/HIVE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103112#comment-13103112 ] John Sichi commented on HIVE-730: - Apparently we just need to document it better: http://mail-archives.apache.org/mod_mbox/hive-user/201109.mbox/%3CCAKi8Xk3XQHJu1y++BM=oOS6M=astg3mbaojs+9zszugjjf1...@mail.gmail.com%3E Allow Hive UDF/UDAF to use scala Key: HIVE-730 URL: https://issues.apache.org/jira/browse/HIVE-730 Project: Hive Issue Type: New Feature Reporter: Zheng Shao Scala is a concise programming language that runs on top of the JVM. http://www.scala-lang.org/ We should have some examples of Hive UDF/UDAF in Scala, and make it easy for people to write Hive UDF/UDAF in Scala. Thanks to Venky for the information and the idea on Scala and Hive integration.
[jira] [Resolved] (HIVE-2327) UDFs should be made aware when their arguments are constants.
[ https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi resolved HIVE-2327. -- Resolution: Duplicate UDFs should be made aware when their arguments are constants. - Key: HIVE-2327 URL: https://issues.apache.org/jira/browse/HIVE-2327 Project: Hive Issue Type: Improvement Reporter: Adam Kramer There are a lot of UDFs which would show major performance differences if they could assume that some of their arguments are constant. Consider, for example, any UDF that takes a regular expression as input: it can be compiled once (fast) if it's a constant, or once per row (wicked slow) if it's not a constant. Or, consider any UDF that reads from a file and/or takes a filename as input; it would have to re-read the whole file if the filename changes.
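The regular-expression case in this issue can be illustrated with a small sketch. A function that is not told its argument is constant can still avoid recompiling on every row by caching the last compiled pattern. The class `RegexMatchUdf`, its `evaluate` method, and the `compileCount` field are hypothetical names for this example and do not use Hive's UDF API.

```java
import java.util.regex.Pattern;

// Hypothetical per-row function; it does not extend any Hive UDF class.
class RegexMatchUdf {
    private String lastRegex;
    private Pattern compiled;
    int compileCount; // exposed only so the effect of caching can be observed

    // Called once per row; recompiles only when the regex argument changes.
    boolean evaluate(String input, String regex) {
        if (compiled == null || !regex.equals(lastRegex)) {
            compiled = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        return compiled.matcher(input).find();
    }

    public static void main(String[] args) {
        RegexMatchUdf udf = new RegexMatchUdf();
        String[] rows = {"hive-1", "pig-2", "hive-3"};
        for (String row : rows) {
            System.out.println(row + " -> " + udf.evaluate(row, "^hive-\\d+$"));
        }
        System.out.println("compilations: " + udf.compileCount);
    }
}
```

Caching still costs a string comparison per row; if the function were told up front that the argument is constant, as the issue proposes, it could compile once during initialization and skip that check.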
[jira] [Resolved] (HIVE-200) ant test will fail with apache-ant-1.7.1
[ https://issues.apache.org/jira/browse/HIVE-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi resolved HIVE-200. - Resolution: Won't Fix We're on 1.8.x these days. ant test will fail with apache-ant-1.7.1 Key: HIVE-200 URL: https://issues.apache.org/jira/browse/HIVE-200 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Zheng Shao ant test succeeded with Apache Ant version 1.6.5 compiled on June 2 2005, but fails with apache-ant-1.7.1. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2380) Add ByteArray Datatype
[ https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2380: - Status: Open (was: Patch Available) Add ByteArray Datatype -- Key: HIVE-2380 URL: https://issues.apache.org/jira/browse/HIVE-2380 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: hive-2380.patch Add bytearray as a primitive data type. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2327) UDFs should be made aware when their arguments are constants.
[ https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103150#comment-13103150 ] John Sichi commented on HIVE-2327: -- OK, then I guess this issue should be renamed? UDFs should be made aware when their arguments are constants. - Key: HIVE-2327 URL: https://issues.apache.org/jira/browse/HIVE-2327 Project: Hive Issue Type: Improvement Reporter: Adam Kramer There are a lot of UDFs which would show major performance differences if they could assume that some of their arguments are constant. Consider, for example, any UDF that takes a regular expression as input: it can be compiled once (fast) if it's a constant, or once per row (wicked slow) if it's not a constant. Or, consider any UDF that reads from a file and/or takes a filename as input; it would have to re-read the whole file if the filename changes.
[jira] [Updated] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1694: - Resolution: Fixed Fix Version/s: (was: 0.8.0) 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Prajakta! Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Prajakta Kalmegh Fix For: 0.9.0 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks supporting indexes in the Hive compiler and execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in the optimizer and query execution. The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans and operator implementations for the above mentioned cases.
[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2182: - Status: Open (was: Patch Available) I got merge conflicts trying to apply the latest patch. At revision 1170007. (Stripping trailing CRs from patch.) patching file ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java (Stripping trailing CRs from patch.) patching file ql/src/test/queries/clientnegative/udfnull.q (Stripping trailing CRs from patch.) patching file ql/src/test/results/clientnegative/udfnull.q.out (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/cast1.q.xml Hunk #2 FAILED at 62. Hunk #3 FAILED at 124. Hunk #4 FAILED at 160. Hunk #5 succeeded at 371 (offset 4 lines). Hunk #7 succeeded at 455 (offset 4 lines). Hunk #9 succeeded at 526 (offset 4 lines). Hunk #11 succeeded at 622 (offset 4 lines). Hunk #13 succeeded at 1066 (offset 4 lines). Hunk #15 FAILED at 1131. Hunk #16 FAILED at 1193. 5 out of 16 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/cast1.q.xml.rej (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/groupby1.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/groupby2.q.xml Hunk #13 succeeded at 1408 (offset 4 lines). (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/groupby3.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/groupby4.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/groupby5.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/groupby6.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/input20.q.xml Hunk #1 FAILED at 1. Hunk #2 FAILED at 62. Hunk #3 FAILED at 124. Hunk #6 FAILED at 850. Hunk #7 FAILED at 862. Hunk #8 FAILED at 919. 
Hunk #9 FAILED at 981. Hunk #10 FAILED at 1015. 8 out of 10 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input20.q.xml.rej (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/input8.q.xml Hunk #1 FAILED at 1. Hunk #2 FAILED at 62. Hunk #3 FAILED at 124. Hunk #4 FAILED at 156. Hunk #5 succeeded at 314 (offset 4 lines). Hunk #7 succeeded at 403 (offset 4 lines). Hunk #8 FAILED at 641. Hunk #9 FAILED at 653. Hunk #10 FAILED at 710. Hunk #11 FAILED at 772. 8 out of 11 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input8.q.xml.rej (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/join2.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/sample1.q.xml Hunk #5 succeeded at 555 (offset 4 lines). Hunk #7 succeeded at 639 (offset 4 lines). Hunk #9 succeeded at 885 (offset 4 lines). Hunk #11 succeeded at 1021 (offset 4 lines). (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/sample2.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/sample3.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/sample4.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/sample5.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/sample6.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/sample7.q.xml (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/udf1.q.xml Hunk #5 succeeded at 510 (offset 4 lines). Hunk #7 succeeded at 606 (offset 4 lines). Hunk #9 succeeded at 702 (offset 4 lines). Hunk #11 succeeded at 798 (offset 4 lines). Hunk #13 succeeded at 894 (offset 4 lines). Hunk #15 succeeded at 997 (offset 4 lines). Hunk #17 succeeded at 1093 (offset 4 lines). 
Hunk #19 succeeded at 1203 (offset 4 lines). Hunk #21 succeeded at 1306 (offset 4 lines). Hunk #23 succeeded at 1904 (offset 4 lines). Hunk #25 succeeded at 2023 (offset 4 lines). (Stripping trailing CRs from patch.) patching file ql/src/test/results/compiler/plan/udf4.q.xml Hunk #5 succeeded at 523 (offset 4 lines). Hunk #7 succeeded at 585 (offset 4 lines). Hunk #9 succeeded at 662 (offset 4 lines). Hunk #11 succeeded at 717 (offset 4 lines). Hunk #13 succeeded at 794 (offset 4 lines). Hunk #15 succeeded at 849 (offset 4 lines). Hunk #17 succeeded at 919 (offset 4 lines). Hunk #19 succeeded at 996 (offset 4 lines). Hunk #21 succeeded at 1051 (offset 4 lines). Hunk #23 succeeded at 1126 (offset 4 lines). Hunk #25 succeeded at 1212 (offset 4 lines). Hunk #27 succeeded at 1296 (offset 4 lines). Hunk #29 succeeded at 1846 (offset 4 lines). Hunk #31 succeeded at 1965 (offset 4 lines). (Stripping trailing CRs from patch.) patching file
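A log like the one above is easier to act on once it is reduced to the files that actually have rejects. The following is only a sketch: it assumes the usual `patch` layout of one message per line (the archive has flattened it), the sample text is an abridged excerpt of the output quoted above, and the helper name `failed_hunks_by_file` is made up for illustration.

```python
import re

# Abridged excerpt of the `patch` output quoted in the message above,
# restored to one message per line.
SAMPLE_LOG = """\
patching file ql/src/test/results/compiler/plan/cast1.q.xml
Hunk #2 FAILED at 62.
Hunk #3 FAILED at 124.
Hunk #5 succeeded at 371 (offset 4 lines).
5 out of 16 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/cast1.q.xml.rej
patching file ql/src/test/results/compiler/plan/groupby1.q.xml
patching file ql/src/test/results/compiler/plan/input20.q.xml
Hunk #1 FAILED at 1.
8 out of 10 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input20.q.xml.rej
"""

PATCHING = re.compile(r"^patching file (\S+)$")
SUMMARY = re.compile(r"^(\d+) out of (\d+) hunks? FAILED")


def failed_hunks_by_file(log_text):
    """Map each file with rejects to (failed, total) hunks.

    Files that applied cleanly produce no summary line and are omitted.
    """
    failures = {}
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = PATCHING.match(line)
        if match:
            current = match.group(1)
            continue
        match = SUMMARY.match(line)
        if match and current is not None:
            failures[current] = (int(match.group(1)), int(match.group(2)))
    return failures


if __name__ == "__main__":
    for path, (failed, total) in failed_hunks_by_file(SAMPLE_LOG).items():
        print(f"{path}: {failed}/{total} hunks failed")
```

Only the per-file summary lines are trusted here; the individual "Hunk #N" lines are skipped, so the counts come straight from what `patch` itself reported.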
[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101499#comment-13101499 ] John Sichi commented on HIVE-2182: -- It's still failing for me with the latest patch. Did you use -Doverwrite=true to regenerate the log? {noformat} [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I Location -I LOCATION ' -I transient_lastDdlTime -I last_modified_ -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I LOCK_QUERYID: -I grantTime -I [.][.][.] [0-9]* more -I job_[0-9]*_[0-9]* -I USING 'java -cp /data/users/jsichi/open/test-trunk/build/ql/test/logs/clientnegative/udfnull.q.out /data/users/jsichi/open/test-trunk/ql/src/test/results/clientnegative/udfnull.q.out [junit] 18c18,27 [junit] /data/users/jsichi/open/test-trunk/build/ql/tmp//hive.log [junit] --- [junit] /home/opensrc/9thsep/build/ql/tmp//hive.log [junit] FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask [junit] PREHOOK: query: CREATE TEMPORARY FUNCTION example_arraysum AS 'org.apache.hadoop.hive.contrib.udf.example.UDFExampleArraySum' [junit] PREHOOK: type: CREATEFUNCTION [junit] POSTHOOK: query: CREATE TEMPORARY FUNCTION example_arraysum AS 'org.apache.hadoop.hive.contrib.udf.example.UDFExampleArraySum' [junit] POSTHOOK: type: CREATEFUNCTION [junit] PREHOOK: query: SELECT example_arraysum(lint)FROM src_thrift [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@src_thrift [junit] PREHOOK: Output: file:/tmp/root/hive_2011-05-25_10-05-57_126_4632621650656424226/-mr-1 {noformat} Avoid null pointer exception when executing UDF --- Key: HIVE-2182 URL: https://issues.apache.org/jira/browse/HIVE-2182 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0, 0.8.0 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 
SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, HIVE-2182.patch For using UDF's executed following steps {noformat} add jar /home/udf/udf.jar; create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} But from the above steps if we miss the first step (add jar) and execute remaining steps {noformat} create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} In tasktracker it is throwing this exception {noformat} Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444) at 
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 18 more Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107) ... 31 more {noformat} Instead of null
[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101501#comment-13101501 ]

John Sichi commented on HIVE-2182:

Oops, sorry, ignore the comment above... I misapplied the latest patch.

Avoid null pointer exception when executing UDF

Key: HIVE-2182
URL: https://issues.apache.org/jira/browse/HIVE-2182
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, HIVE-2182.patch

To use a UDF, the following steps were executed:

{noformat}
add jar /home/udf/udf.jar;
create temporary function grade as 'udf.Grade';
select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
{noformat}

But if the first step (add jar) is skipped and only the remaining steps are executed:

{noformat}
create temporary function grade as 'udf.Grade';
select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
{noformat}

the tasktracker throws this exception:

{noformat}
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
    ... 18 more
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
    ... 18 more
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
    ... 31 more
{noformat}

Instead of a null pointer exception, it should throw a meaningful exception.
[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101508#comment-13101508 ]

John Sichi commented on HIVE-2182:

+1. Will commit when tests pass.

Avoid null pointer exception when executing UDF

Key: HIVE-2182
URL: https://issues.apache.org/jira/browse/HIVE-2182
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, HIVE-2182.patch
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101606#comment-13101606 ]

John Sichi commented on HIVE-1694:

Looks great. One last change: for all the SELECT queries in the .q file, can you add an ORDER BY on a full key for test determinism?

Accelerate GROUP BY execution using indexes

Key: HIVE-1694
URL: https://issues.apache.org/jira/browse/HIVE-1694
Project: Hive
Issue Type: New Feature
Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, HIVE-1694.6.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql
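To make the idea behind this issue concrete, here is a toy model of answering a GROUP BY count from an index instead of a full scan, with the output sorted on the full key as requested in the review comment. This is a sketch only: the table contents, the in-memory index layout, and all function names are hypothetical, and it does not reproduce Hive's index format or the rewrite implemented in the patch.

```python
from collections import Counter

# Hypothetical base table rows: (key, value).
BASE_TABLE = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5), ("b", 6)]


def build_index(rows):
    """Toy index: one entry per distinct key, listing the row offsets where it occurs."""
    index = {}
    for offset, (key, _value) in enumerate(rows):
        index.setdefault(key, []).append(offset)
    return index


def count_by_key_full_scan(rows):
    """SELECT key, count(key) ... GROUP BY key, computed by reading every base row."""
    counts = Counter(key for key, _value in rows)
    # Sorting on the full key plays the role of ORDER BY: the output order
    # no longer depends on how rows happened to be processed.
    return sorted(counts.items())


def count_by_key_from_index(index):
    """Same aggregate from the index alone: the count is the number of offsets per key."""
    return sorted((key, len(offsets)) for key, offsets in index.items())


if __name__ == "__main__":
    index = build_index(BASE_TABLE)
    print(count_by_key_full_scan(BASE_TABLE))
    print(count_by_key_from_index(index))
```

The index has one entry per distinct key, so the second function touches far fewer records than the first whenever keys repeat often, which is the kind of saving the issue is after.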
[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2182:

Status: Open (was: Patch Available)

Can you add the test case back in? Also create a review board request?

Avoid null pointer exception when executing UDF

Key: HIVE-2182
URL: https://issues.apache.org/jira/browse/HIVE-2182
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2182.1.patch, HIVE-2182.patch
[jira] [Commented] (HIVE-2402) Function like with empty string is throwing null pointer exception
[ https://issues.apache.org/jira/browse/HIVE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100569#comment-13100569 ]

John Sichi commented on HIVE-2402:

+1. Will commit when tests pass.

Function like with empty string is throwing null pointer exception

Key: HIVE-2402
URL: https://issues.apache.org/jira/browse/HIVE-2402
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.8.0
Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2402.1.patch, HIVE-2402.patch

select emp.ename from emp where ename like ''

This query throws a null pointer exception.
[jira] [Commented] (HIVE-2223) support grouping on complex types in Hive
[ https://issues.apache.org/jira/browse/HIVE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100570#comment-13100570 ]

John Sichi commented on HIVE-2223:

Jonathan, fill in the bug field in Review Board with HIVE-2223 so that the comments from there will automatically get propagated here.

support grouping on complex types in Hive

Key: HIVE-2223
URL: https://issues.apache.org/jira/browse/HIVE-2223
Project: Hive
Issue Type: New Feature
Reporter: Kate Ting
Assignee: Jonathan Chang
Priority: Minor
Attachments: HIVE-2223.patch

A query with a GROUP BY clause is not yet supported when an array-typed column is part of the grouping column list:

CREATE TABLE test_group_by ( key INT, group INT, terms ARRAY<STRING>);
SELECT key, terms, count(group) FROM test_group_by GROUP BY key, terms;
...
Hash code on complex types not supported yet.
java.lang.RuntimeException: Error while closing operators
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Hash code on complex types not supported yet.
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
    ... 4 more
Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet.
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:348)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:187)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:746)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:780)
    ... 9 more
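The failure above comes down to needing a hash code for nested values so that rows can be routed and grouped by a key that contains an array. A minimal sketch of one way to do that is a recursive hash that combines element hashes. The function names, the 31-multiplier scheme, and the sample rows are assumptions chosen for illustration; this is not the code in the attached patch or in `ObjectInspectorUtils`.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits


def complex_hash(value):
    """Deterministic hash for primitives, arrays (list/tuple) and maps (dict)."""
    if value is None:
        return 0
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return int(value)
    if isinstance(value, int):
        return value & MASK
    if isinstance(value, str):
        h = 0
        for ch in value:
            h = (31 * h + ord(ch)) & MASK
        return h
    if isinstance(value, (list, tuple)):
        # Order-sensitive: ["x", "y"] and ["y", "x"] are different arrays.
        h = 0
        for element in value:
            h = (31 * h + complex_hash(element)) & MASK
        return h
    if isinstance(value, dict):
        # Order-insensitive: a map has no defined entry order.
        h = 0
        for k, v in value.items():
            h = (h + (complex_hash(k) ^ complex_hash(v))) & MASK
        return h
    raise TypeError(f"unsupported type for hashing: {type(value).__name__}")


def group_count(rows):
    """Count rows per (key, terms) group, bucketing by the complex hash."""
    buckets = {}
    for key, terms in rows:
        bucket = buckets.setdefault(complex_hash([key, terms]), [])
        for entry in bucket:
            if entry[0] == (key, terms):  # equality check resolves hash collisions
                entry[1] += 1
                break
        else:
            bucket.append([(key, terms), 1])
    return sorted((k, t, c) for bucket in buckets.values() for (k, t), c in bucket)


if __name__ == "__main__":
    rows = [(1, ["x", "y"]), (1, ["x", "y"]), (2, ["x"]), (1, ["y", "x"])]
    for key, terms, count in group_count(rows):
        print(key, terms, count)
```

A hand-rolled string hash is used rather than Python's built-in `hash`, because the built-in one is randomized per process for strings, and a grouping key must hash the same way in every task that sees it.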
[jira] [Assigned] (HIVE-198) Parse errors report incorrectly.
[ https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-198:

Assignee: Aviv Eyal

Parse errors report incorrectly.

Key: HIVE-198
URL: https://issues.apache.org/jira/browse/HIVE-198
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith
Assignee: Aviv Eyal
Labels: parse
Attachments: PraseErrorMessage.patch

The following two queries fail:

CREATE TABLE output_table(userid, bigint);
CREATE TABLE output_table(userid bigint, age int, sex string, location string);

each giving the error message

FAILED: Parse Error: line 1:16 mismatched input 'TABLE' expecting KW_TEMPORARY

Although one might not catch it from the error message, the problem with the first is that there is a comma between userid and bigint, and the problem with the second is that location is a reserved keyword. Reported errors should more accurately describe the nature of the error, such as "no type given for column 'userid'" or "'location' is not a valid column name".
[jira] [Updated] (HIVE-198) Parse errors report incorrectly.
[ https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-198:

Status: Open (was: Patch Available)

Could you add a test case, and also submit a review board request?

https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess

Parse errors report incorrectly.

Key: HIVE-198
URL: https://issues.apache.org/jira/browse/HIVE-198
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith
Assignee: Aviv Eyal
Labels: parse
Attachments: PraseErrorMessage.patch
[jira] [Assigned] (HIVE-2250) DESCRIBE EXTENDED table_name shows inconsistent compression information.
[ https://issues.apache.org/jira/browse/HIVE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-2250:

Assignee: subramanian raghunathan

DESCRIBE EXTENDED table_name shows inconsistent compression information.

Key: HIVE-2250
URL: https://issues.apache.org/jira/browse/HIVE-2250
Project: Hive
Issue Type: Bug
Components: CLI, Diagnosability
Affects Versions: 0.7.0
Environment: RHEL, Full Cloudera stack
Reporter: Travis Powell
Assignee: subramanian raghunathan
Priority: Critical
Attachments: HIVE-2250.patch

Commands executed in this order:

user@node # hive
hive> SET hive.exec.compress.output=true;
hive> SET io.seqfile.compression.type=BLOCK;
hive> CREATE TABLE table_name ( [...] ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS SEQUENCEFILE;
hive> CREATE TABLE staging_table ( [...] ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> LOAD DATA LOCAL INPATH 'file:///root/input/' OVERWRITE INTO TABLE staging_table;
hive> INSERT OVERWRITE TABLE table_name SELECT * FROM staging_table;
(Map reduce job to change to sequence file...)
hive> DESCRIBE EXTENDED table_name;
Detailed Table Information Table(tableName:table_name, dbName:benchmarking, owner:root, createTime:1309480053, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:session_key, type:string, comment:null), FieldSchema(name:remote_address, type:string, comment:null), FieldSchema(name:canister_lssn, type:string, comment:null), FieldSchema(name:canister_session_id, type:bigint, comment:null), FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, type:string, comment:null), FieldSchema(name:tltvid, type:string, comment:null), FieldSchema(name:canister_server, type:string, comment:null), FieldSchema(name:session_timestamp, type:string, comment:null), FieldSchema(name:session_duration, type:string, comment:null), FieldSchema(name:hit_count, type:bigint, comment:null), FieldSchema(name:http_user_agent, type:string, comment:null), FieldSchema(name:extractid, type:bigint, comment:null), FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim=

*** SEE ABOVE: compressed is reported as false, even though the contents of the table are compressed.
[jira] [Assigned] (HIVE-2217) add Query text for debugging in lock data
[ https://issues.apache.org/jira/browse/HIVE-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-2217: Assignee: Jiayan Jiang add Query text for debugging in lock data - Key: HIVE-2217 URL: https://issues.apache.org/jira/browse/HIVE-2217 Project: Hive Issue Type: Improvement Affects Versions: 0.7.1 Reporter: Namit Jain Assignee: Jiayan Jiang Attachments: hive_diff2 Currently, the queryId is stored in the lock data - Query text would improve the debuggability
[jira] [Created] (HIVE-2432) Bring project into compliance with Apache Software Foundation Branding Requirements
Bring project into compliance with Apache Software Foundation Branding Requirements --- Key: HIVE-2432 URL: https://issues.apache.org/jira/browse/HIVE-2432 Project: Hive Issue Type: Improvement Reporter: John Sichi Assignee: John Sichi http://www.apache.org/foundation/marks/pmcs.html I will be creating sub-tasks for the various work items needed.
[jira] [Created] (HIVE-2433) add DOAP file for Hive
add DOAP file for Hive -- Key: HIVE-2433 URL: https://issues.apache.org/jira/browse/HIVE-2433 Project: Hive Issue Type: Sub-task Reporter: John Sichi http://www.apache.org/foundation/marks/pmcs.html#metadata
[jira] [Created] (HIVE-2434) add a TM to Hive logo image
add a TM to Hive logo image --- Key: HIVE-2434 URL: https://issues.apache.org/jira/browse/HIVE-2434 Project: Hive Issue Type: Sub-task Reporter: John Sichi http://www.apache.org/foundation/marks/pmcs.html#graphics And maybe the feather?
[jira] [Updated] (HIVE-2435) Update project naming and description in Hive wiki
[ https://issues.apache.org/jira/browse/HIVE-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2435: - Description: http://www.apache.org/foundation/marks/pmcs.html#naming Update project naming and description in Hive wiki -- Key: HIVE-2435 URL: https://issues.apache.org/jira/browse/HIVE-2435 Project: Hive Issue Type: Sub-task Reporter: John Sichi Assignee: John Sichi http://www.apache.org/foundation/marks/pmcs.html#naming -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2436) Update project naming and description in Hive website
Update project naming and description in Hive website - Key: HIVE-2436 URL: https://issues.apache.org/jira/browse/HIVE-2436 Project: Hive Issue Type: Sub-task Reporter: John Sichi http://www.apache.org/foundation/marks/pmcs.html#naming -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2435) Update project naming and description in Hive wiki
Update project naming and description in Hive wiki -- Key: HIVE-2435 URL: https://issues.apache.org/jira/browse/HIVE-2435 Project: Hive Issue Type: Sub-task Reporter: John Sichi Assignee: John Sichi -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2437) update project website navigation links
update project website navigation links --- Key: HIVE-2437 URL: https://issues.apache.org/jira/browse/HIVE-2437 Project: Hive Issue Type: Sub-task Reporter: John Sichi http://www.apache.org/foundation/marks/pmcs.html#navigation -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2438) add trademark attributions to Hive homepage
add trademark attributions to Hive homepage --- Key: HIVE-2438 URL: https://issues.apache.org/jira/browse/HIVE-2438 Project: Hive Issue Type: Sub-task Reporter: John Sichi http://www.apache.org/foundation/marks/pmcs.html#attributions -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2402) Function like with empty string is throwing null pointer exception
[ https://issues.apache.org/jira/browse/HIVE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2402: - Resolution: Fixed Fix Version/s: 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Chinna! Function like with empty string is throwing null pointer exception -- Key: HIVE-2402 URL: https://issues.apache.org/jira/browse/HIVE-2402 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.9.0 Attachments: HIVE-2402.1.patch, HIVE-2402.patch select emp.ename from emp where ename like '' This query is throwing null pointer exception -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF
[ https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2182: - Status: Open (was: Patch Available) I am getting the failure below when running the new test with latest trunk. Did you update the .q.out? {noformat} [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I Location -I LOCATION ' -I transient_lastDdlTime -I last_modified_ -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I LOCK_QUERYID: -I grantTime -I [.][.][.] [0-9]* more -I job_[0-9]*_[0-9]* -I USING 'java -cp /data/users/jsichi/open/test-trunk/build/ql/test/logs/clientnegative/udfnull.q.out /data/users/jsichi/open/test-trunk/ql/src/test/results/clientnegative/udfnull.q.out [junit] 8,18c8 [junit] PREHOOK: Output: file:/tmp/jsichi/hive_2011-09-08_16-48-29_269_6749666372366482183/-mr-1 [junit] Execution failed with exit status: 2 [junit] Obtaining error information [junit] [junit] Task failed! [junit] Task ID: [junit]Stage-1 [junit] [junit] Logs: [junit] [junit] /data/users/jsichi/open/test-trunk/build/ql/tmp//hive.log [junit] --- [junit] PREHOOK: Output: file:/tmp/root/hive_2011-05-25_10-05-57_126_4632621650656424226/-mr-1 [junit] Exception: Client execution results failed with error code = 1 [junit] See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. 
[junit] Cleaning up TestNegativeCliDriver [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 5.496 sec [junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED {noformat} Avoid null pointer exception when executing UDF --- Key: HIVE-2182 URL: https://issues.apache.org/jira/browse/HIVE-2182 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0, 0.8.0 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.patch To use a UDF, the following steps were executed: {noformat} add jar /home/udf/udf.jar; create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} But if the first step (add jar) is skipped and only the remaining steps are executed: {noformat} create temporary function grade as 'udf.Grade'; select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m; {noformat} the tasktracker throws this exception: {noformat} Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 
18 more Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 18 more Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107) ... 31 more {noformat} Instead of null pointer exception
[jira] [Updated] (HIVE-2426) Test that views with joins work properly
[ https://issues.apache.org/jira/browse/HIVE-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2426: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1, passed tests, committed to trunk. Thanks Charles! Test that views with joins work properly Key: HIVE-2426 URL: https://issues.apache.org/jira/browse/HIVE-2426 Project: Hive Issue Type: Test Reporter: Charles Chen Assignee: Charles Chen Fix For: 0.9.0 Attachments: HIVE-2426.3.patch, HIVE-2426v2.patch With the testcase {noformat} drop table invites; drop table invites2; create table invites (foo int, bar string) partitioned by (ds string); create table invites2 (foo int, bar string) partitioned by (ds string); set hive.mapred.mode=strict; -- test join views: see HIVE-1989 create view v as select invites.bar, invites2.foo, invites2.ds from invites join invites2 on invites.ds=invites2.ds; explain select * from v where ds='2011-09-01'; drop view v; drop table invites; drop table invites2; {noformat} We should not have the partition pruner complain about invites.ds not having a predicate because the predicate invites2.ds='2011-09-01' will be inferred with the ppd transitivity optimization -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd
[ https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099939#comment-13099939 ] John Sichi commented on HIVE-2420: -- Yongqiang, didn't we already temporarily set that to false in our own config due to HIVE-2344? That has since been fixed, but if there are other problems, we can keep it disabled until all are resolved. partition pruner expr is not populated due to some bug in ppd - Key: HIVE-2420 URL: https://issues.apache.org/jira/browse/HIVE-2420 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-2420.reproduce.diff -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2380) Add ByteArray Datatype
[ https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098322#comment-13098322 ] John Sichi commented on HIVE-2380: -- You can find design doc examples here: https://cwiki.apache.org/confluence/display/Hive/DesignDocs Add ByteArray Datatype -- Key: HIVE-2380 URL: https://issues.apache.org/jira/browse/HIVE-2380 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: hive-2380.patch Add bytearray as a primitive data type. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1143) CREATE VIEW followup: updatable views
[ https://issues.apache.org/jira/browse/HIVE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098413#comment-13098413 ] John Sichi commented on HIVE-1143: -- Charles, if you had a patch in progress for this one, can you post it here as a checkpoint in case someone else has time to pick it up later? CREATE VIEW followup: updatable views -- Key: HIVE-1143 URL: https://issues.apache.org/jira/browse/HIVE-1143 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen For HIVE-972, we only implemented read-only views. Updatable views are difficult in general, but for simple cases where views are being used to impose a rename layer on existing tables/columns, update support would be high value (for consistent read/write access) and not a lot of work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2426) Test that views with joins work properly
[ https://issues.apache.org/jira/browse/HIVE-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2426: - Attachment: HIVE-2426.3.patch Made a few minor improvements to the overlap handling code and comments. Test that views with joins work properly Key: HIVE-2426 URL: https://issues.apache.org/jira/browse/HIVE-2426 Project: Hive Issue Type: Test Reporter: Charles Chen Assignee: Charles Chen Fix For: 0.9.0 Attachments: HIVE-2426.3.patch, HIVE-2426v2.patch With the testcase {noformat} drop table invites; drop table invites2; create table invites (foo int, bar string) partitioned by (ds string); create table invites2 (foo int, bar string) partitioned by (ds string); set hive.mapred.mode=strict; -- test join views: see HIVE-1989 create view v as select invites.bar, invites2.foo, invites2.ds from invites join invites2 on invites.ds=invites2.ds; explain select * from v where ds='2011-09-01'; drop view v; drop table invites; drop table invites2; {noformat} We should not have the partition pruner complain about invites.ds not having a predicate because the predicate invites2.ds='2011-09-01' will be inferred with the ppd transitivity optimization -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)
[ https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2369: - Resolution: Fixed Fix Version/s: 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Clément! Minor typo in error message in HiveConnection.java (JDBC) - Key: HIVE-2369 URL: https://issues.apache.org/jira/browse/HIVE-2369 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.7.1, 0.8.0 Environment: Linux Reporter: Clément Notin Assignee: Clément Notin Priority: Trivial Fix For: 0.9.0 Attachments: HIVE-2369.patch Original Estimate: 2m Remaining Estimate: 2m There is a minor typo in HiveConnection.java (jdbc): {code}throw new SQLException("Could not establish connecton to " + uri + ": " + e.getMessage(), "08S01");{code} It seems like there's an "i" missing in "connecton". I know it's a very minor typo but I report it anyway. I won't attach a patch because it would take too long for me to do an SVN checkout just for one letter. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2408) Perpetually degrading performance in checkPaths
[ https://issues.apache.org/jira/browse/HIVE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2408: - Component/s: (was: HBase Handler) Query Processor Perpetually degrading performance in checkPaths --- Key: HIVE-2408 URL: https://issues.apache.org/jira/browse/HIVE-2408 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1, 0.8.0 Reporter: Grisha Trubetskoy In ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, checkPaths() tacks on a copy_N if a file exists, working its way up until an available file name is found. The problem is that the exists() check is quite expensive in HDFS, and if you have hundreds of files to go through this becomes a serious bottleneck. A better solution would be to use a timestamp in the file name, then followed by the copy_N scheme. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
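The naming scheme proposed in HIVE-2408 (timestamp first, copy_N only as a fallback) can be sketched as follows. This is a minimal illustration, not the actual Hive.checkPaths() code: the class DestinationNamer, the method chooseName, and the "_" separators are hypothetical, and the costly HDFS exists() call is stood in for by a plain Predicate so the sketch runs without Hadoop.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical helper illustrating the HIVE-2408 proposal; not Hive code. */
final class DestinationNamer {
    private DestinationNamer() {}

    /**
     * Picks a destination name for baseName. The timestamp suffix is tried
     * first, so the common case costs a single existence check; copy_N
     * probing only happens if that timestamped name is already taken.
     *
     * @param exists stand-in for the expensive file system exists() call
     */
    static String chooseName(String baseName, long timestampMillis, Predicate<String> exists) {
        String candidate = baseName + "_" + timestampMillis;
        if (!exists.test(candidate)) {
            return candidate;
        }
        // Fallback: walk copy_1, copy_2, ... until a free name is found.
        for (int n = 1; ; n++) {
            String withCopy = candidate + "_copy_" + n;
            if (!exists.test(withCopy)) {
                return withCopy;
            }
        }
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        existing.add("000000_0_1315500000000");
        // Collides with an existing timestamped name, so copy_N is appended.
        System.out.println(chooseName("000000_0", 1315500000000L, existing::contains));
        // No collision: one existence check, no copy_N probing.
        System.out.println(chooseName("000000_0", 1315500000001L, existing::contains));
    }
}
```

With this ordering the number of existence checks depends on how many files share the same timestamp, not on how many copies of the base name have accumulated over time.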
[jira] [Updated] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1989: - Status: Open (was: Patch Available) I got failures in the following tests: index_auto_mult_tables index_auto_mult_tables_compact outer_join_ppr ppd_gby_join ppd_join ppd_join2 ppd_join3 ppd_outer_join3 ppd_outer_join5 ppd_union recognize transitivity of predicates on join keys - Key: HIVE-1989 URL: https://issues.apache.org/jira/browse/HIVE-1989 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Fix For: 0.8.0 Attachments: HIVE-1989v1.patch, HIVE-1989v10.patch, HIVE-1989v11.patch, HIVE-1989v4.patch, HIVE-1989v5-WITH-HIVE-2382v1.patch, HIVE-1989v6-WITH-HIVE-2383v1.patch, HIVE-1989v8.patch, HIVE-1989v9.patch Given {noformat} set hive.mapred.mode=strict; create table invites (foo int, bar string) partitioned by (ds string); create table invites2 (foo int, bar string) partitioned by (ds string); select count(*) from invites join invites2 on invites.ds=invites2.ds where invites.ds='2011-01-01'; {noformat} currently an error occurs: {noformat} Error in semantic analysis: No Partition Predicate Found for Alias invites2 Table invites2 {noformat} The optimizer should be able to infer a predicate on invites2 via transitivity. The current lack places a burden on the user to add a redundant predicate, and makes impossible (at least in strict mode) join views where both underlying tables are partitioned (the join select list has to pick one of the tables arbitrarily). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096433#comment-13096433 ] John Sichi commented on HIVE-1989: -- +1. Will commit when tests pass. recognize transitivity of predicates on join keys - Key: HIVE-1989 URL: https://issues.apache.org/jira/browse/HIVE-1989 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Fix For: 0.8.0 Attachments: HIVE-1989v1.patch, HIVE-1989v10.patch, HIVE-1989v11.patch, HIVE-1989v12.patch, HIVE-1989v4.patch, HIVE-1989v5-WITH-HIVE-2382v1.patch, HIVE-1989v6-WITH-HIVE-2383v1.patch, HIVE-1989v8.patch, HIVE-1989v9.patch Given {noformat} set hive.mapred.mode=strict; create table invites (foo int, bar string) partitioned by (ds string); create table invites2 (foo int, bar string) partitioned by (ds string); select count(*) from invites join invites2 on invites.ds=invites2.ds where invites.ds='2011-01-01'; {noformat} currently an error occurs: {noformat} Error in semantic analysis: No Partition Predicate Found for Alias invites2 Table invites2 {noformat} The optimizer should be able to infer a predicate on invites2 via transitivity. The current lack places a burden on the user to add a redundant predicate, and makes impossible (at least in strict mode) join views where both underlying tables are partitioned (the join select list has to pick one of the tables arbitrarily). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1989: - Resolution: Fixed Fix Version/s: (was: 0.8.0) 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Charles! recognize transitivity of predicates on join keys - Key: HIVE-1989 URL: https://issues.apache.org/jira/browse/HIVE-1989 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Fix For: 0.9.0 Attachments: HIVE-1989v1.patch, HIVE-1989v10.patch, HIVE-1989v11.patch, HIVE-1989v12.patch, HIVE-1989v4.patch, HIVE-1989v5-WITH-HIVE-2382v1.patch, HIVE-1989v6-WITH-HIVE-2383v1.patch, HIVE-1989v8.patch, HIVE-1989v9.patch Given {noformat} set hive.mapred.mode=strict; create table invites (foo int, bar string) partitioned by (ds string); create table invites2 (foo int, bar string) partitioned by (ds string); select count(*) from invites join invites2 on invites.ds=invites2.ds where invites.ds='2011-01-01'; {noformat} currently an error occurs: {noformat} Error in semantic analysis: No Partition Predicate Found for Alias invites2 Table invites2 {noformat} The optimizer should be able to infer a predicate on invites2 via transitivity. The current lack places a burden on the user to add a redundant predicate, and makes impossible (at least in strict mode) join views where both underlying tables are partitioned (the join select list has to pick one of the tables arbitrarily). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
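The inference described in HIVE-1989 can be sketched in isolation as follows. This is only an illustration of the idea, assuming inner equi-joins and constant equality predicates; it is not the committed patch, and the class PredicateTransitivity and its string-based representation of columns and literals are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of predicate transitivity over join keys; not Hive code. */
final class PredicateTransitivity {
    private PredicateTransitivity() {}

    /**
     * @param joinEqualities pairs {left, right} taken from ON left = right (inner joins assumed)
     * @param constants      column -> literal, taken from WHERE column = literal
     * @return the given constants plus every constant inferred through the equalities
     */
    static Map<String, String> infer(List<String[]> joinEqualities, Map<String, String> constants) {
        Map<String, String> result = new TreeMap<>(constants);
        boolean changed = true;
        // Iterate to a fixed point so chains such as a = b, b = c are covered.
        while (changed) {
            changed = false;
            for (String[] eq : joinEqualities) {
                changed |= copy(result, eq[0], eq[1]);
                changed |= copy(result, eq[1], eq[0]);
            }
        }
        return result;
    }

    /** Copies the constant of 'from' to 'to' if 'to' has none yet. */
    private static boolean copy(Map<String, String> m, String from, String to) {
        String value = m.get(from);
        if (value == null || m.containsKey(to)) {
            return false;
        }
        m.put(to, value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> where = new HashMap<>();
        where.put("invites.ds", "'2011-01-01'");
        List<String[]> on = Collections.singletonList(new String[] {"invites.ds", "invites2.ds"});
        // The result also carries a predicate for invites2.ds.
        System.out.println(infer(on, where));
    }
}
```

For the query in the description, the inferred entry for invites2.ds is the partition predicate that strict mode otherwise reports as missing.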
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095607#comment-13095607 ] John Sichi commented on HIVE-2337: -- +1. Will commit when tests pass. Predicate pushdown erroneously conservative with outer joins Key: HIVE-2337 URL: https://issues.apache.org/jira/browse/HIVE-2337 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Charles Chen Assignee: Charles Chen Fix For: 0.9.0 Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates. In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as: {noformat} /** * Figures out the aliases for whom it is safe to push predicates based on * ANSI SQL semantics For inner join, all predicates for all aliases can be * pushed For full outer join, none of the predicates can be pushed as that * would limit the number of rows for join For left outer join, all the * predicates on the left side aliases can be pushed up For right outer * join, all the predicates on the right side aliases can be pushed up Joins * chain containing both left and right outer joins are treated as full * outer join. [...] * * @param op * Join Operator * @param rr * Row resolver * @return set of qualified aliases */ {noformat} Since hive joins are left associative, something like a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible. 
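The left-associative reading above can be sketched as a single left-to-right pass over the join chain. This is a hedged illustration of the reasoning, not the code in OpProcFactory.JoinPPD.getQualifiedAliases; the class QualifiedAliasesSketch and its Join and JoinType types are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of alias qualification under left associativity; not Hive code. */
final class QualifiedAliasesSketch {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One step of the chain: the accumulated left side joined with rightAlias. */
    static final class Join {
        final JoinType type;
        final String rightAlias;

        Join(JoinType type, String rightAlias) {
            this.type = type;
            this.rightAlias = rightAlias;
        }
    }

    /** Returns the aliases whose predicates may be pushed, evaluating the chain left to right. */
    static Set<String> qualifiedAliases(String firstAlias, List<Join> chain) {
        Set<String> safe = new LinkedHashSet<>();
        safe.add(firstAlias);
        for (Join join : chain) {
            switch (join.type) {
                case INNER:
                    safe.add(join.rightAlias);   // both sides stay eligible
                    break;
                case LEFT_OUTER:
                    break;                       // left side keeps its set, right alias is excluded
                case RIGHT_OUTER:
                    safe.clear();                // nothing accumulated on the left survives
                    safe.add(join.rightAlias);
                    break;
                case FULL_OUTER:
                    safe.clear();                // neither side is eligible
                    break;
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        List<Join> chain = new ArrayList<>();
        chain.add(new Join(JoinType.RIGHT_OUTER, "b"));
        chain.add(new Join(JoinType.LEFT_OUTER, "c"));
        chain.add(new Join(JoinType.INNER, "d"));
        // a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d
        System.out.println(qualifiedAliases("a", chain));
    }
}
```

For the chain in the description this leaves b and d eligible, in line with the text, whereas treating any mix of left and right outer joins as a full outer join leaves none.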
Using: {noformat} create table t1 (id int, key string, value string); create table t2 (id int, key string, value string); create table t3 (id int, key string, value string); create table t4 (id int, key string, value string); {noformat} For example, the query {noformat} explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20; {noformat} currently gives {noformat} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: t1 TableScan alias: t1 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 0 value expressions: expr: id type: int expr: key type: string expr: value type: string t2 TableScan alias: t2 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 1 value expressions: expr: id type: int expr: key type: string expr: value type: string t3 TableScan alias: t3 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 2 value expressions: expr: id type: int expr: key type: string expr: value type: string Reduce Operator Tree: Join Operator condition map: Outer Join 0 to 1 Inner Join 1 to 2 condition expressions: 0 {VALUE._col0} {VALUE._col1} {VALUE._col2} 1 {VALUE._col0} {VALUE._col1} {VALUE._col2} 2 {VALUE._col0} {VALUE._col1} {VALUE._col2} handleSkewJoin: false
[jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095621#comment-13095621 ] John Sichi commented on HIVE-1545: -- I'm way behind on the PDK (probably not gonna make it for 0.8), but I'm planning to rework the UDFUtils into annotations as part of it. Cyril, I think they are mostly used for validation purposes, in which case you can just comment out the calls for now if you want to use the UDF without validation. Add a bunch of UDFs and UDAFs - Key: HIVE-1545 URL: https://issues.apache.org/jira/browse/HIVE-1545 Project: Hive Issue Type: New Feature Components: UDF Reporter: Jonathan Chang Assignee: Jonathan Chang Priority: Minor Attachments: core.tar.gz, ext.tar.gz, udfs.tar.gz, udfs.tar.gz Here are some UD(A)Fs which can be incorporated into the Hive distribution: UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 5, 3) returns 1. UDFBucket - Find the bucket in which the first argument belongs. e.g., BUCKET(x, b_1, b_2, b_3, ...), will return the smallest i such that x > b_{i} but <= b_{i+1}. Returns 0 if x is smaller than all the buckets. UDFFindInArray - Finds the 1-based index of the first argument in the array given as the second argument. Returns 0 if not found. Returns NULL if either argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, array(1,2,3)) will return 0. UDFGreatCircleDist - Finds the great circle distance (in km) between two lat/long coordinates (in degrees). UDFLDA - Performs LDA inference on a vector given fixed topics. UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 whenever any of its parameters changes. UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 5. UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches in an array. UDFUnescape - Returns the string unescaped (using C/Java style unescaping). 
UDFWhich - Given a boolean array, return the indices which are TRUE. UDFJaccard UDAFCollect - Takes all the values associated with a row and converts it into a list. Make sure to have: set hive.map.aggr = false; UDAFCollectMap - Like collect except that it takes tuples and generates a map. UDAFEntropy - Compute the entropy of a column. UDAFPearson (BROKEN!!!) - Computes the pearson correlation between two columns. UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value of VAL. UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated with the N (passed as the third parameter) largest values of VAL. UDAFHistogram -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
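The indexing conventions of two of the functions listed above (0-indexed ARGMAX, 1-indexed FIND_IN_ARRAY with 0 for "not found" and NULL propagation) can be illustrated in plain Java. This is a sketch of the documented semantics only, not the code from the attached tarballs; the class UdfSemanticsSketch is hypothetical, and returning the first position on ties in argMax is an assumption the description does not settle.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the documented ARGMAX / FIND_IN_ARRAY semantics. */
final class UdfSemanticsSketch {
    private UdfSemanticsSketch() {}

    /** 0-indexed position of the largest argument (first position wins on ties: an assumption). */
    static int argMax(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one argument is required");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    /** 1-indexed position of needle in haystack, 0 if absent, null if either argument is null. */
    static Integer findInArray(Integer needle, List<Integer> haystack) {
        if (needle == null || haystack == null) {
            return null;
        }
        // indexOf returns -1 when absent, which maps to the documented 0.
        return haystack.indexOf(needle) + 1;
    }

    public static void main(String[] args) {
        System.out.println(argMax(4, 5, 3));                         // ARGMAX(4, 5, 3)
        System.out.println(findInArray(5, Arrays.asList(1, 2, 5)));  // FIND_IN_ARRAY(5, array(1,2,5))
        System.out.println(findInArray(5, Arrays.asList(1, 2, 3)));  // FIND_IN_ARRAY(5, array(1,2,3))
    }
}
```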
[jira] [Commented] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095643#comment-13095643 ] John Sichi commented on HIVE-1989: -- Charles, can you add a test case for the original partitioned join view use case? Separate JIRA is fine. recognize transitivity of predicates on join keys - Key: HIVE-1989 URL: https://issues.apache.org/jira/browse/HIVE-1989 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Fix For: 0.8.0 Attachments: HIVE-1989v1.patch, HIVE-1989v4.patch, HIVE-1989v5-WITH-HIVE-2382v1.patch, HIVE-1989v6-WITH-HIVE-2383v1.patch, HIVE-1989v8.patch, HIVE-1989v9.patch Given {noformat} set hive.mapred.mode=strict; create table invites (foo int, bar string) partitioned by (ds string); create table invites2 (foo int, bar string) partitioned by (ds string); select count(*) from invites join invites2 on invites.ds=invites2.ds where invites.ds='2011-01-01'; {noformat} currently an error occurs: {noformat} Error in semantic analysis: No Partition Predicate Found for Alias invites2 Table invites2 {noformat} The optimizer should be able to infer a predicate on invites2 via transitivity. The current lack places a burden on the user to add a redundant predicate, and makes impossible (at least in strict mode) join views where both underlying tables are partitioned (the join select list has to pick one of the tables arbitrarily). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2380) Add ByteArray Datatype
[ https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095669#comment-13095669 ] John Sichi commented on HIVE-2380: -- Ashutosh, maybe we can discuss this one at the contributor meetup next week (and then record the conclusions here). A few questions that I've heard so far: * Is there a design doc somewhere? * Since Hive already has an array type, but this feature is independent, we probably want a different type name than bytearray. * For conversions, is going through string for all types a good default behavior? An alternative would be to prevent implicit conversions altogether, and force users to pick the UDF with the desired behavior. E.g. for string/binary conversion, it's a good idea to be able to specify an encoding rather than always using the JVM default. * How does the new type work with TRANSFORM scripts, UDF's, saving to textfile, etc? * Don't we need more accessor functions (e.g. making the existing string functions such as LENGTH work)? Add ByteArray Datatype -- Key: HIVE-2380 URL: https://issues.apache.org/jira/browse/HIVE-2380 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: hive-2380.patch Add bytearray as a primitive data type. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
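The encoding concern raised in the comment above can be shown with standard Java calls. This is a small sketch, not part of the HIVE-2380 patch; the class BinaryStringConversion and its method names are hypothetical, while String.getBytes(Charset) and new String(byte[], Charset) are the standard JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: string/binary conversion with an explicit encoding. */
final class BinaryStringConversion {
    private BinaryStringConversion() {}

    /** Encodes text with the caller's charset instead of the JVM default. */
    static byte[] toBinary(String text, Charset charset) {
        return text.getBytes(charset);
    }

    /** Decodes bytes with the caller's charset instead of the JVM default. */
    static String toText(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "Cl\u00e9ment"; // contains one non-ASCII character
        byte[] utf8 = toBinary(text, StandardCharsets.UTF_8);
        byte[] latin1 = toBinary(text, StandardCharsets.ISO_8859_1);
        // The same string yields different byte sequences per encoding.
        System.out.println(utf8.length + " bytes as UTF-8, " + latin1.length + " bytes as ISO-8859-1");
        // Decoding with the wrong charset does not round-trip.
        System.out.println(text.equals(toText(utf8, StandardCharsets.UTF_8)));
        System.out.println(text.equals(toText(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

Because the byte sequence depends on the charset, an implicit conversion that relies on the JVM default could produce different binary values on differently configured machines, which is the argument for letting the user pick the encoding.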
[jira] [Updated] (HIVE-2401) Show functions with regex not working
[ https://issues.apache.org/jira/browse/HIVE-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2401: - Status: Open (was: Patch Available) The wiki already explains how to do this. I don't think we need any behavior change here. hive> show functions 'm.*'; OK map map_keys map_values max min minute month Show functions with regex not working - Key: HIVE-2401 URL: https://issues.apache.org/jira/browse/HIVE-2401 Project: Hive Issue Type: Improvement Components: CLI Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2401.patch show functions a; If this listed all the function names starting with a, searching would be easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2402) Function like with empty string is throwing null pointer exception
[ https://issues.apache.org/jira/browse/HIVE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2402: - Status: Open (was: Patch Available) Function like with empty string is throwing null pointer exception -- Key: HIVE-2402 URL: https://issues.apache.org/jira/browse/HIVE-2402 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2402.patch select emp.ename from emp where ename like '' This query is throwing null pointer exception -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)
[ https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-2369: Assignee: Clément Notin Minor typo in error message in HiveConnection.java (JDBC) - Key: HIVE-2369 URL: https://issues.apache.org/jira/browse/HIVE-2369 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.7.1, 0.8.0 Environment: Linux Reporter: Clément Notin Assignee: Clément Notin Priority: Trivial Attachments: HIVE-2369.patch Original Estimate: 2m Remaining Estimate: 2m There is a minor typo in HiveConnection.java (jdbc): {code}throw new SQLException("Could not establish connecton to " + uri + ": " + e.getMessage(), "08S01");{code} It seems like there's an i missing in "connecton". I know it's a very minor typo but I report it anyway. I won't attach a patch because it would take me too long to do an SVN checkout just for 1 letter. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HIVE-2401) Show functions with regex not working
[ https://issues.apache.org/jira/browse/HIVE-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095671#comment-13095671 ] John Sichi edited comment on HIVE-2401 at 9/1/11 11:31 PM: --- The wiki already explains how to do this. I don't think we need any behavior change here. hive> show functions 'm.*'; OK map map_keys map_values max min minute month was (Author: jvs): The wiki already explains how to do this. I don't hink we need any behavior change here. hive> show functions 'm.*'; OK map map_keys map_values max min minute month Show functions with regex not working - Key: HIVE-2401 URL: https://issues.apache.org/jira/browse/HIVE-2401 Project: Hive Issue Type: Improvement Components: CLI Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2401.patch show functions a; It would be easier to search if this returned all the function names starting with a. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2337: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Charles! Predicate pushdown erroneously conservative with outer joins Key: HIVE-2337 URL: https://issues.apache.org/jira/browse/HIVE-2337 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Charles Chen Assignee: Charles Chen Fix For: 0.9.0 Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates. In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as: {noformat} /** * Figures out the aliases for whom it is safe to push predicates based on * ANSI SQL semantics For inner join, all predicates for all aliases can be * pushed For full outer join, none of the predicates can be pushed as that * would limit the number of rows for join For left outer join, all the * predicates on the left side aliases can be pushed up For right outer * join, all the predicates on the right side aliases can be pushed up Joins * chain containing both left and right outer joins are treated as full * outer join. [...] * * @param op * Join Operator * @param rr * Row resolver * @return set of qualified aliases */ {noformat} Since hive joins are left associative, something like a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible. 
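A minimal sketch of the left-associative reading described above, written as a left fold over the join chain. It only illustrates the rule as stated in this description and is not the code in getQualifiedAliases; the names QualifiedAliasSketch, JoinStep, and qualifiedAliases are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not Hive's OpProcFactory.JoinPPD implementation.
class QualifiedAliasSketch {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One step of a left-associative chain: (everything so far) JOIN rightAlias. */
    static final class JoinStep {
        final JoinType type;
        final String rightAlias;

        JoinStep(JoinType type, String rightAlias) {
            this.type = type;
            this.rightAlias = rightAlias;
        }
    }

    /** Folds the chain left to right, tracking which aliases may still receive pushed predicates. */
    static Set<String> qualifiedAliases(String firstAlias, List<JoinStep> chain) {
        Set<String> pushable = new LinkedHashSet<String>();
        pushable.add(firstAlias);
        for (JoinStep step : chain) {
            switch (step.type) {
                case INNER:
                    // Both sides stay eligible.
                    pushable.add(step.rightAlias);
                    break;
                case LEFT_OUTER:
                    // Left side is preserved; the new right alias is not eligible.
                    break;
                case RIGHT_OUTER:
                    // Only the right side is preserved; everything accumulated so far drops out.
                    pushable.clear();
                    pushable.add(step.rightAlias);
                    break;
                case FULL_OUTER:
                    // Neither side is eligible.
                    pushable.clear();
                    break;
            }
        }
        return pushable;
    }

    public static void main(String[] args) {
        // ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d
        List<JoinStep> chain = Arrays.asList(
            new JoinStep(JoinType.RIGHT_OUTER, "b"),
            new JoinStep(JoinType.LEFT_OUTER, "c"),
            new JoinStep(JoinType.INNER, "d"));
        System.out.println(qualifiedAliases("a", chain)); // prints [b, d]
    }
}
```

Under the same rule, t1 FULL OUTER JOIN t2 JOIN t3 leaves only t3 eligible, so a predicate such as t3.id=20 could be pushed.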
Using: {noformat} create table t1 (id int, key string, value string); create table t2 (id int, key string, value string); create table t3 (id int, key string, value string); create table t4 (id int, key string, value string); {noformat} For example, the query {noformat} explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20; {noformat} currently gives {noformat} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: t1 TableScan alias: t1 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 0 value expressions: expr: id type: int expr: key type: string expr: value type: string t2 TableScan alias: t2 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 1 value expressions: expr: id type: int expr: key type: string expr: value type: string t3 TableScan alias: t3 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 2 value expressions: expr: id type: int expr: key type: string expr: value type: string Reduce Operator Tree: Join Operator condition map: Outer Join 0 to 1 Inner Join 1 to 2 condition expressions: 0 {VALUE._col0} {VALUE._col1} {VALUE._col2} 1 {VALUE._col0} {VALUE._col1} {VALUE._col2} 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
[jira] [Updated] (HIVE-2184) Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()
[ https://issues.apache.org/jira/browse/HIVE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2184: - Resolution: Fixed Fix Version/s: 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Chinna! Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close() --- Key: HIVE-2184 URL: https://issues.apache.org/jira/browse/HIVE-2184 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.5.0, 0.8.0 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5) Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.9.0 Attachments: HIVE-2184.1.patch, HIVE-2184.1.patch, HIVE-2184.2.patch, HIVE-2184.3.patch, HIVE-2184.patch 1) Hive.close() calls HiveMetaStoreClient.close(); in that method the variable standAloneClient never becomes true, so client.shutdown() is never called. 2) After calling metaStoreClient.close(), Hive.close() needs to set metaStoreClient=null. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2383) Incorrect alias filtering for predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2383: - Resolution: Fixed Fix Version/s: (was: 0.8.0) 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Passed tests and committed to trunk. Thanks Charles! Incorrect alias filtering for predicate pushdown Key: HIVE-2383 URL: https://issues.apache.org/jira/browse/HIVE-2383 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Charles Chen Assignee: Charles Chen Priority: Critical Fix For: 0.9.0 Attachments: HIVE-2383v1.patch, HIVE-2383v2.patch, HIVE-2383v5.patch The predicate pushdown optimizer starts at the topmost operators and traverses the operator tree, at each stage collecting predicates to be pushed down. At each operator, hive.ql.ppd.OpProcFactory.DefaultPPD.mergeWithChildrenPred is called, which merges the predicates of the children nodes into the current node. The predicates are stored in hive.ql.ppd.ExprWalkerInfo.pushdownPreds as a map from the alias a predicate refers to (a predicate may only refer to one alias at a time as only such predicates can be pushed) to a list of such predicates. Since at each stage the alias the predicate refers to may change (subqueries may change aliases), this is updated for each operator (hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds is called which walks the ExprNodeDesc for each predicate). When a JoinOperator is encountered, mergeWithChildrenPred is passed an optional parameter aliases which contains a set of aliases that can be pushed per ANSI semantics (see hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases). The incorrect part is that aliases are filtered in mergeWithChildrenPred before extractPushdownPreds is called; extractPushdownPreds is what associates each predicate with the correct alias in the current operator's context, so the filtering should happen after it. 
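The ordering problem described above can be illustrated with a small sketch. It is a simplified stand-in for the interaction between mergeWithChildrenPred and extractPushdownPreds, not Hive's code; ColumnRef, q2Mapping, filterThenTranslate, and translateThenFilter are hypothetical names, and the mapping assumes that in test case Q2 of this issue the outer a.bar resolves to the inner b.bar, as the subquery's select list indicates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "filter before translate" vs "translate before filter".
class AliasFilterOrderSketch {
    /** A column reference such as a.bar: the alias plus the column name. */
    static final class ColumnRef {
        final String alias;
        final String column;

        ColumnRef(String alias, String column) {
            this.alias = alias;
            this.column = column;
        }

        String key() {
            return alias + "." + column;
        }

        @Override
        public String toString() {
            return key();
        }
    }

    /** Outer-scope column -> inner-scope column for the subquery in Q2 (assumed mapping). */
    static Map<String, ColumnRef> q2Mapping() {
        Map<String, ColumnRef> m = new HashMap<>();
        m.put("a.foo1", new ColumnRef("a", "foo"));
        m.put("a.foo2", new ColumnRef("b", "foo"));
        m.put("a.bar", new ColumnRef("b", "bar"));
        return m;
    }

    /** Order described as broken: the alias check uses the incoming (outer) alias. */
    static List<ColumnRef> filterThenTranslate(List<ColumnRef> incoming, Set<String> qualified,
                                               Map<String, ColumnRef> outerToInner) {
        List<ColumnRef> pushed = new ArrayList<>();
        for (ColumnRef ref : incoming) {
            if (qualified.contains(ref.alias)) {
                pushed.add(outerToInner.get(ref.key()));
            }
        }
        return pushed;
    }

    /** Order the description asks for: translate into the join's context, then check. */
    static List<ColumnRef> translateThenFilter(List<ColumnRef> incoming, Set<String> qualified,
                                               Map<String, ColumnRef> outerToInner) {
        List<ColumnRef> pushed = new ArrayList<>();
        for (ColumnRef ref : incoming) {
            ColumnRef inner = outerToInner.get(ref.key());
            if (qualified.contains(inner.alias)) {
                pushed.add(inner);
            }
        }
        return pushed;
    }

    public static void main(String[] args) {
        List<ColumnRef> incoming = Collections.singletonList(new ColumnRef("a", "bar"));
        // For pokes a LEFT OUTER JOIN pokes2 b, only the left alias qualifies.
        Set<String> qualified = Collections.singleton("a");
        System.out.println(filterThenTranslate(incoming, qualified, q2Mapping())); // prints [b.bar]
        System.out.println(translateThenFilter(incoming, qualified, q2Mapping())); // prints []
    }
}
```

In the first ordering the predicate is accepted because the outer alias happens to be named a, and it ends up on the right side of the left outer join; in the second it is rejected.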
In test case Q2 below, when the predicate a.bar=3 comes into the JoinOperator, the alias is a coming in so it is accepted for pushdown. When brought into the JoinOperator's context, however, since the predicate refers to b.foo in the inner scope, we should not actually accept this for pushdown. With the test cases {noformat} -- Q1: predicate should not be pushed on the right side of a left outer join (this is correct in trunk) explain SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo WHERE b.bar=3; -- Q2: predicate should not be pushed on the right side of a left outer join (this is broken in trunk) explain SELECT * FROM (SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo) a WHERE a.bar=3; -- Q3: predicate should be pushed (this is correct in trunk) explain SELECT * FROM (SELECT a.foo as foo1, b.foo as foo2, a.bar FROM pokes a JOIN pokes2 b ON a.foo=b.foo) a WHERE a.bar=3; {noformat} The current output is {noformat} hive -- Q1: predicate should not be pushed on the right side of a left outer join explain SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo WHERE b.bar=3; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME pokes) a) (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. (TOK_TABLE_OR_COL a) foo) (. (TOK_TABLE_OR_COL b) foo (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) foo) foo1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) foo2) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) bar))) (TOK_WHERE (= (. 
(TOK_TABLE_OR_COL b) bar) 3 STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: foo type: int sort order: + Map-reduce partition columns: expr: foo type: int tag: 0 value expressions: expr: foo type: int b TableScan alias: b Reduce Output Operator key expressions: expr: foo type: int sort order: + Map-reduce partition columns: expr: foo type: int tag: 1
[jira] [Resolved] (HIVE-1395) Table aliases are ambiguous
[ https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi resolved HIVE-1395. -- Resolution: Won't Fix We're fixing the bugs and sticking with the normal SQL rules, which allow duplicate aliases, for the reasons mentioned above. Table aliases are ambiguous --- Key: HIVE-1395 URL: https://issues.apache.org/jira/browse/HIVE-1395 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Adam Kramer Consider this query: SELECT a.num FROM ( SELECT a.num AS num, b.num AS num2 FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num ) a WHERE a.num2 IS NULL; ...in this case, the table alias 'a' is ambiguous. It could be the outer table (i.e., the subquery result), or it could be the inner table (foo). In the above case, Hive silently parses the outer reference to a as the inner reference. The result, then, is akin to: SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad. The bigger problem, however, is that Hive even lets people use the same table alias at multiple points in the query. We should simply throw an exception during the parse stage if there is any ambiguity in which table is which, just like we do if the column names are ambiguous. Or, if for some reason we need people to be able to use 'a' to refer to multiple tables or subqueries, it would be excellent if the exact parsing structure were made clear and added to the wiki. In that case, I will file a separate bug JIRA to complain about how it should be different. :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name
[ https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi resolved HIVE-1342. -- Resolution: Fixed Fix Version/s: 0.9.0 Fixed by committing sub-issues (not the patches attached to this issue). Predicate push down get error result when sub-queries have the same alias name --- Key: HIVE-1342 URL: https://issues.apache.org/jira/browse/HIVE-1342 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Ted Xu Assignee: Charles Chen Priority: Critical Fix For: 0.9.0 Attachments: HIVE-1342v1.patch, HIVE-1342v2.patch, HIVE-1342v3.patch, HIVE-1342v4.patch, cmd.hql, explain, ppd_same_alias_1.patch, ppd_same_alias_2.patch The query is over-optimized by PPD when sub-queries have the same alias name. See the query: --- create table if not exists dm_fact_buyer_prd_info_d ( category_id string ,gmv_trade_num int ,user_id int ) PARTITIONED BY (ds int); set hive.optimize.ppd=true; set hive.map.aggr=true; explain select category_id1,category_id2,assoc_idx from ( select category_id1 , category_id2 , count(distinct user_id) as assoc_idx from ( select t1.category_id as category_id1 , t2.category_id as category_id2 , t1.user_id from ( select category_id, user_id from dm_fact_buyer_prd_info_d group by category_id, user_id ) t1 join ( select category_id, user_id from dm_fact_buyer_prd_info_d group by category_id, user_id ) t2 on t1.user_id=t2.user_id ) t1 group by category_id1, category_id2 ) t_o where category_id1 <> category_id2 and assoc_idx > 2; - The query above fails on execution, throwing the exception: can not cast UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text). 
I ran EXPLAIN on the query and the execution plan looks really weird (only Stage-1 is shown; see the highlighted predicate): --- Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: t_o:t1:t1:dm_fact_buyer_prd_info_d TableScan alias: dm_fact_buyer_prd_info_d Filter Operator predicate: expr: *(category_id <> user_id)* type: boolean Select Operator expressions: expr: category_id type: string expr: user_id type: bigint outputColumnNames: category_id, user_id Group By Operator keys: expr: category_id type: string expr: user_id type: bigint mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string expr: _col1 type: bigint sort order: ++ Map-reduce partition columns: expr: _col0 type: string expr: _col1 type: bigint tag: -1 Reduce Operator Tree: Group By Operator keys: expr: KEY._col0 type: string expr: KEY._col1 type: bigint mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: true GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
[jira] [Commented] (HIVE-2383) Incorrect alias filtering for predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095020#comment-13095020 ] John Sichi commented on HIVE-2383: -- Oh, um, also: +1. Incorrect alias filtering for predicate pushdown Key: HIVE-2383 URL: https://issues.apache.org/jira/browse/HIVE-2383 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Charles Chen Assignee: Charles Chen Priority: Critical Fix For: 0.9.0 Attachments: HIVE-2383v1.patch, HIVE-2383v2.patch, HIVE-2383v5.patch The predicate pushdown optimizer starts at the topmost operators and traverses the operator tree, at each stage collecting predicates to be pushed down. At each operator, hive.ql.ppd.OpProcFactory.DefaultPPD.mergeWithChildrenPred is called, which merges the predicates of the children nodes into the current node. The predicates are stored in hive.ql.ppd.ExprWalkerInfo.pushdownPreds as a map from the alias a predicate refers to (a predicate may only refer to one alias at a time as only such predicates can be pushed) to a list of such predicates. Since at each stage the alias the predicate refers to may change (subqueries may change aliases), this is updated for each operator (hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds is called which walks the ExprNodeDesc for each predicate). When a JoinOperator is encountered, mergeWithChildrenPred is passed an optional parameter aliases which contains a set of aliases that can be pushed per ANSI semantics (see hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases). The incorrect part is that aliases are filtered in mergeWithChildrenPred before extractPushdownPreds is called; extractPushdownPreds is what associates each predicate with the correct alias in the current operator's context, so the filtering should happen after it. In test case Q2 below, when the predicate a.bar=3 comes into the JoinOperator, the alias is a coming in so it is accepted for pushdown. 
When brought into the JoinOperator's context, however, since the predicate refers to b.foo in the inner scope, we should not actually accept this for pushdown. With the test cases {noformat} -- Q1: predicate should not be pushed on the right side of a left outer join (this is correct in trunk) explain SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo WHERE b.bar=3; -- Q2: predicate should not be pushed on the right side of a left outer join (this is broken in trunk) explain SELECT * FROM (SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo) a WHERE a.bar=3; -- Q3: predicate should be pushed (this is correct in trunk) explain SELECT * FROM (SELECT a.foo as foo1, b.foo as foo2, a.bar FROM pokes a JOIN pokes2 b ON a.foo=b.foo) a WHERE a.bar=3; {noformat} The current output is {noformat} hive -- Q1: predicate should not be pushed on the right side of a left outer join explain SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo WHERE b.bar=3; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME pokes) a) (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. (TOK_TABLE_OR_COL a) foo) (. (TOK_TABLE_OR_COL b) foo (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) foo) foo1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) foo2) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) bar))) (TOK_WHERE (= (. 
(TOK_TABLE_OR_COL b) bar) 3 STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: foo type: int sort order: + Map-reduce partition columns: expr: foo type: int tag: 0 value expressions: expr: foo type: int b TableScan alias: b Reduce Output Operator key expressions: expr: foo type: int sort order: + Map-reduce partition columns: expr: foo type: int tag: 1 value expressions: expr: foo type: int expr: bar type: int
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095105#comment-13095105 ] John Sichi commented on HIVE-2337: -- Charles, did you intentionally omit the new ppd_outer_join5.q from the latest patch? Also, there's a weird non-ASCII character in the Javadoc. Predicate pushdown erroneously conservative with outer joins Key: HIVE-2337 URL: https://issues.apache.org/jira/browse/HIVE-2337 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Charles Chen Assignee: Charles Chen Fix For: 0.9.0 Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates. In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as: {noformat} /** * Figures out the aliases for whom it is safe to push predicates based on * ANSI SQL semantics For inner join, all predicates for all aliases can be * pushed For full outer join, none of the predicates can be pushed as that * would limit the number of rows for join For left outer join, all the * predicates on the left side aliases can be pushed up For right outer * join, all the predicates on the right side aliases can be pushed up Joins * chain containing both left and right outer joins are treated as full * outer join. [...] * * @param op * Join Operator * @param rr * Row resolver * @return set of qualified aliases */ {noformat} Since hive joins are left associative, something like a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. 
Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible. Using: {noformat} create table t1 (id int, key string, value string); create table t2 (id int, key string, value string); create table t3 (id int, key string, value string); create table t4 (id int, key string, value string); {noformat} For example, the query {noformat} explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20; {noformat} currently gives {noformat} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: t1 TableScan alias: t1 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 0 value expressions: expr: id type: int expr: key type: string expr: value type: string t2 TableScan alias: t2 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 1 value expressions: expr: id type: int expr: key type: string expr: value type: string t3 TableScan alias: t3 Reduce Output Operator key expressions: expr: id type: int sort order: + Map-reduce partition columns: expr: id type: int tag: 2 value expressions: expr: id type: int expr: key type: string expr: value type: string Reduce Operator Tree: Join Operator condition map: Outer Join 0 to 1 Inner Join 1 to 2 condition expressions: 0 {VALUE._col0} {VALUE._col1} {VALUE._col2} 1 {VALUE._col0} {VALUE._col1} {VALUE._col2} 2 {VALUE._col0}
[jira] [Updated] (HIVE-2382) Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation
[ https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2382: - Resolution: Fixed Fix Version/s: (was: 0.8.0) 0.9.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Charles! Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation --- Key: HIVE-2382 URL: https://issues.apache.org/jira/browse/HIVE-2382 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Charles Chen Assignee: Charles Chen Priority: Critical Fix For: 0.9.0 Attachments: HIVE-2382v1.patch, HIVE-2382v2.patch When a GROUP BY is specified, a select operator is added before the GROUP BY in SemanticAnalyzer.insertSelectAllPlanForGroupBy. Currently, the column expression map for this is set to the column expression map for the parent operator. This behavior is incorrect as, for example, the parent operator could rearrange the order of the columns (_col0 = _col0, _col1 = _col2, _col2 = _col1) and the new operator should not repeat this. The predicate pushdown optimization uses the column expression map to track which columns a filter expression refers to at different operators. This results in a filter on incorrect columns. 
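A minimal sketch of why reusing the parent's column expression map sends the filter to the wrong column. Each operator is modeled only as a map from output column to input column; the class ColExprMapSketch and its helpers are hypothetical, and the three maps are assumptions modeled on the query in this issue (inner select bar, foo; outer select foo, bar; inserted select-all before the GROUP BY).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; operators are reduced to output -> input column maps.
class ColExprMapSketch {
    static Map<String, String> mapOf(String... pairs) {
        Map<String, String> m = new HashMap<String, String>();
        for (int i = 0; i < pairs.length; i += 2) {
            m.put(pairs[i], pairs[i + 1]);
        }
        return m;
    }

    // select bar, foo from invites: _col0 is bar, _col1 is foo.
    static final Map<String, String> SCAN_SELECT = mapOf("_col0", "bar", "_col1", "foo");
    // select foo, bar from (...): swaps the two columns.
    static final Map<String, String> SWAP_SELECT = mapOf("_col0", "_col1", "_col1", "_col0");
    // What the inserted select-all should use: every column maps to itself.
    static final Map<String, String> IDENTITY = mapOf("_col0", "_col0", "_col1", "_col1");

    /** Walks a column reference down through the operators, topmost first. */
    static String resolve(String column, List<Map<String, String>> topDown) {
        String current = column;
        for (Map<String, String> colExprMap : topDown) {
            current = colExprMap.get(current);
            if (current == null) {
                throw new IllegalArgumentException("unmapped column below " + column);
            }
        }
        return current;
    }

    /** Inserted select-all carries its own identity map. */
    static List<Map<String, String>> fixedPipeline() {
        return Arrays.asList(IDENTITY, SWAP_SELECT, SCAN_SELECT);
    }

    /** Inserted select-all reuses its parent's (swapping) map, so the swap is applied twice. */
    static List<Map<String, String>> buggyPipeline() {
        return Arrays.asList(SWAP_SELECT, SWAP_SELECT, SCAN_SELECT);
    }

    public static void main(String[] args) {
        // "having bar=1": bar is _col1 at the output of the inserted select-all.
        System.out.println(resolve("_col1", fixedPipeline())); // prints bar
        System.out.println(resolve("_col1", buggyPipeline())); // prints foo
    }
}
```

The second result corresponds to a filter on foo being placed at the table scan even though the query filters on bar.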
Here is a simple case of this going wrong: Using {noformat} create table invites (id int, foo int, bar int); {noformat} executing the query {noformat} explain select * from (select foo, bar from (select bar, foo from invites c union all select bar, foo from invites d) b) a group by bar, foo having bar=1; {noformat} results in {noformat} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: a-subquery1:b-subquery1:c TableScan alias: c Filter Operator predicate: expr: (foo = 1) type: boolean Select Operator expressions: expr: bar type: int expr: foo type: int outputColumnNames: _col0, _col1 Union Select Operator expressions: expr: _col1 type: int expr: _col0 type: int outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: int expr: _col1 type: int outputColumnNames: _col0, _col1 Group By Operator bucketGroup: false keys: expr: _col1 type: int expr: _col0 type: int mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: int expr: _col1 type: int sort order: ++ Map-reduce partition columns: expr: _col0 type: int expr: _col1 type: int tag: -1 a-subquery2:b-subquery2:d TableScan alias: d Filter Operator predicate: expr: (foo = 1) type: boolean Select Operator expressions: expr: bar type: int expr: foo type: int outputColumnNames: _col0, _col1 Union Select Operator expressions: expr: _col1 type: int expr: _col0 type: int outputColumnNames: _col0, _col1 Select Operator expressions: