[jira] [Updated] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-5538:
---------------------------------------
    Status: Open  (was: Patch Available)

> Turn on vectorization by default.
> ---------------------------------
>          Key: HIVE-5538
>          URL: https://issues.apache.org/jira/browse/HIVE-5538
>      Project: Hive
>   Issue Type: Sub-task
>     Reporter: Jitendra Nath Pandey
>     Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5538.1.patch
>
> Vectorization should be turned on by default, so that users don't have to specifically enable it. The vectorization code validates each query and ensures that it falls back to row mode if it is not supported on the vectorized code path.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-5538:
---------------------------------------
    Attachment: HIVE-5538.2.patch

Good point! Uploading the patch rebased against the latest trunk.

> Turn on vectorization by default.
> ---------------------------------
>          Key: HIVE-5538
>          URL: https://issues.apache.org/jira/browse/HIVE-5538
>      Project: Hive
>   Issue Type: Sub-task
>     Reporter: Jitendra Nath Pandey
>     Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch
>
> Vectorization should be turned on by default, so that users don't have to specifically enable it. The vectorization code validates each query and ensures that it falls back to row mode if it is not supported on the vectorized code path.
[jira] [Updated] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-5538:
---------------------------------------
    Status: Patch Available  (was: Open)

> Turn on vectorization by default.
> ---------------------------------
>          Key: HIVE-5538
>          URL: https://issues.apache.org/jira/browse/HIVE-5538
>      Project: Hive
>   Issue Type: Sub-task
>     Reporter: Jitendra Nath Pandey
>     Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch
>
> Vectorization should be turned on by default, so that users don't have to specifically enable it. The vectorization code validates each query and ensures that it falls back to row mode if it is not supported on the vectorized code path.
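The fallback contract described in the issue (use the vectorized path when the query is supported, otherwise silently drop to row mode with identical results) can be sketched in miniature. This is a toy illustration, not Hive code; the batch size comment reflects Hive's default VectorizedRowBatch size, everything else is hypothetical:

```java
// Toy sketch of "vectorize when supported, fall back to row mode otherwise".
public class VectorFallbackSketch {
    static final int BATCH_SIZE = 1024;  // Hive's VectorizedRowBatch also defaults to 1024 rows

    // "Vectorized" path: tight loop over a primitive batch.
    static long sumVectorized(long[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i += BATCH_SIZE) {
            int end = Math.min(i + BATCH_SIZE, values.length);
            for (int j = i; j < end; j++) total += values[j];
        }
        return total;
    }

    // Row-mode path: process boxed rows one at a time.
    static long sumRowMode(Long[] rows) {
        long total = 0;
        for (Long row : rows) total += row;
        return total;
    }

    // Planner-style check: take the vectorized path only when the input shape is supported.
    static long sum(Object input) {
        if (input instanceof long[]) return sumVectorized((long[]) input);
        return sumRowMode((Long[]) input);  // unsupported shape: fall back to row mode
    }

    public static void main(String[] args) {
        System.out.println(sum(new long[]{1, 2, 3}));     // vectorized path
        System.out.println(sum(new Long[]{1L, 2L, 3L}));  // row-mode fallback, same answer
    }
}
```

The point of the validation step is exactly this: both paths must return the same answer, so turning vectorization on by default is safe for unsupported queries.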
[jira] [Commented] (HIVE-6318) Document SSL support added to HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972361#comment-13972361 ]

Vaibhav Gumashta commented on HIVE-6318:
----------------------------------------
[~leftylev] Sorry for the late reply. After you shared the list of missing docs in the Hive 13 release thread, I gave the description to [~rhbutani], who has submitted those as a cumulative patch. Thanks a lot for the nudge!

> Document SSL support added to HiveServer2
> -----------------------------------------
>          Key: HIVE-6318
>          URL: https://issues.apache.org/jira/browse/HIVE-6318
>      Project: Hive
>   Issue Type: Sub-task
>   Components: HiveServer2, JDBC
> Affects Versions: 0.13.0
>     Reporter: Vaibhav Gumashta
>     Assignee: Vaibhav Gumashta
>      Fix For: 0.13.0
>
> SSL support is/will be added to HiveServer2 running in both binary and http mode, in unsecured auth modes. Need to document the usage and setup. Linking relevant jiras.
[jira] [Commented] (HIVE-6466) Add support for pluggable authentication modules (PAM) in Hive
[ https://issues.apache.org/jira/browse/HIVE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972362#comment-13972362 ]

Vaibhav Gumashta commented on HIVE-6466:
----------------------------------------
[~leftylev] Thanks a lot for the edits!

> Add support for pluggable authentication modules (PAM) in Hive
> --------------------------------------------------------------
>          Key: HIVE-6466
>          URL: https://issues.apache.org/jira/browse/HIVE-6466
>      Project: Hive
>   Issue Type: New Feature
>   Components: HiveServer2
> Affects Versions: 0.13.0
>     Reporter: Vaibhav Gumashta
>     Assignee: Vaibhav Gumashta
>      Fix For: 0.13.0
> Attachments: HIVE-6466.1.patch, HIVE-6466.2.patch
>
> More on PAM in these articles:
> http://www.tuxradar.com/content/how-pam-works
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Pluggable_Authentication_Modules.html
> Usage from the JPAM API:
> http://jpam.sourceforge.net/JPamUserGuide.html#id.s7.1
[jira] [Commented] (HIVE-6468) HS2 out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972363#comment-13972363 ]

Vaibhav Gumashta commented on HIVE-6468:
----------------------------------------
Thanks a lot for the edits and corrections [~leftylev]! The doc looks good.

> HS2 out of memory error when curl sends a get request
> -----------------------------------------------------
>          Key: HIVE-6468
>          URL: https://issues.apache.org/jira/browse/HIVE-6468
>      Project: Hive
>   Issue Type: Bug
> Affects Versions: 0.12.0
>  Environment: Centos 6.3, hive 12, hadoop-2.2
>     Reporter: Abin Shahab
>     Assignee: Navis
> Attachments: HIVE-6468.1.patch.txt
>
> We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary)
> curl localhost:1
> Exception in thread "pool-2-thread-8" java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181)
> 	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
> 	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
> 	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> 	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> 	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
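[Editor's note] A plausible mechanism for the OOM above: length-prefixed Thrift transports read a few header bytes and then a 4-byte big-endian payload length, so when arbitrary HTTP text from curl arrives on the binary port, ASCII bytes get interpreted as an enormous length and the transport tries to allocate a matching buffer. The SASL transport in the trace reads a status byte before the length, so the exact bytes differ, but the failure mode is the same. The classic framed-transport example, just as a sketch:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// What happens when the first bytes of "GET / HTTP/1.1" are read as a
// big-endian frame length: a ~1.1 GB allocation request, hence the
// java.lang.OutOfMemoryError in the server thread.
public class FrameLengthSketch {
    public static void main(String[] args) {
        byte[] firstFour = "GET ".getBytes(StandardCharsets.US_ASCII);
        int frameLength = ByteBuffer.wrap(firstFour).getInt();
        System.out.println(frameLength);  // 1195725856 bytes
    }
}
```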
[jira] [Updated] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions
[ https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Stepachev updated HIVE-6427:
-----------------------------------
    Attachment: (was: HIVE-6427-2.patch)

> Hive Server2 should reopen Metastore client in case of any Thrift exceptions
> ----------------------------------------------------------------------------
>          Key: HIVE-6427
>          URL: https://issues.apache.org/jira/browse/HIVE-6427
>      Project: Hive
>   Issue Type: Bug
>   Components: HiveServer2
> Affects Versions: 0.12.0
>     Reporter: Andrey Stepachev
>     Assignee: Andrey Stepachev
>     Priority: Critical
> Attachments: HIVE-6427.patch
>
> In case of a metastore restart, the hive server doesn't reopen its connection to the metastore. Any command gives broken pipe or similar exceptions: http://paste.ubuntu.com/6926215/
> Any subsequent command doesn't re-establish the connection and tries to use the stale (closed) connection. Looks like we shouldn't blindly convert any MetaException to HiveSQLException, but should distinguish between fatal exceptions and logical exceptions.
[jira] [Updated] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions
[ https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Stepachev updated HIVE-6427:
-----------------------------------
    Attachment: HIVE-6427.patch

> Hive Server2 should reopen Metastore client in case of any Thrift exceptions
> ----------------------------------------------------------------------------
>          Key: HIVE-6427
>          URL: https://issues.apache.org/jira/browse/HIVE-6427
>      Project: Hive
>   Issue Type: Bug
>   Components: HiveServer2
> Affects Versions: 0.12.0
>  Environment: cloudera cdh5 beta2
>     Reporter: Andrey Stepachev
>     Assignee: Andrey Stepachev
>     Priority: Critical
> Attachments: HIVE-6427.patch
>
> In case of a metastore restart, the hive server doesn't reopen its connection to the metastore. Any command gives broken pipe or similar exceptions: http://paste.ubuntu.com/6926215/
> Any subsequent command doesn't re-establish the connection and tries to use the stale (closed) connection. Looks like we shouldn't blindly convert any MetaException to HiveSQLException, but should distinguish between fatal exceptions and logical exceptions.
[jira] [Updated] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions
[ https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Stepachev updated HIVE-6427:
-----------------------------------
    Environment: (was: cloudera cdh5 beta2)

> Hive Server2 should reopen Metastore client in case of any Thrift exceptions
> ----------------------------------------------------------------------------
>          Key: HIVE-6427
>          URL: https://issues.apache.org/jira/browse/HIVE-6427
>      Project: Hive
>   Issue Type: Bug
>   Components: HiveServer2
> Affects Versions: 0.12.0
>     Reporter: Andrey Stepachev
>     Assignee: Andrey Stepachev
>     Priority: Critical
> Attachments: HIVE-6427.patch
>
> In case of a metastore restart, the hive server doesn't reopen its connection to the metastore. Any command gives broken pipe or similar exceptions: http://paste.ubuntu.com/6926215/
> Any subsequent command doesn't re-establish the connection and tries to use the stale (closed) connection. Looks like we shouldn't blindly convert any MetaException to HiveSQLException, but should distinguish between fatal exceptions and logical exceptions.
Review Request 20444: Hive Server2 should reopen Metastore client connection in case of any Thrift exceptions
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20444/
-----------------------------------------------------------

Review request for hive.

Bugs: HIVE-6427
    https://issues.apache.org/jira/browse/HIVE-6427

Repository: hive-git

Description
-------
Connection to the metastore should be re-established. TExceptions should not be swallowed.

Diffs
-----
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/cli/TestSemanticAnalysis.java 3cc548e
  hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClientHMSImpl.java c4b5971
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java 5410b45
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java a9d5902

Diff: https://reviews.apache.org/r/20444/diff/

Testing
-------
Used in our production for more than one month.

Thanks,

Andrey Stepachev
[jira] [Commented] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions
[ https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972527#comment-13972527 ]

Andrey Stepachev commented on HIVE-6427:
----------------------------------------
https://reviews.apache.org/r/20444/

> Hive Server2 should reopen Metastore client in case of any Thrift exceptions
> ----------------------------------------------------------------------------
>          Key: HIVE-6427
>          URL: https://issues.apache.org/jira/browse/HIVE-6427
>      Project: Hive
>   Issue Type: Bug
>   Components: HiveServer2
> Affects Versions: 0.12.0
>     Reporter: Andrey Stepachev
>     Assignee: Andrey Stepachev
>     Priority: Critical
> Attachments: HIVE-6427.patch
>
> In case of a metastore restart, the hive server doesn't reopen its connection to the metastore. Any command gives broken pipe or similar exceptions: http://paste.ubuntu.com/6926215/
> Any subsequent command doesn't re-establish the connection and tries to use the stale (closed) connection. Looks like we shouldn't blindly convert any MetaException to HiveSQLException, but should distinguish between fatal exceptions and logical exceptions.
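[Editor's note] The fix under review touches RetryingMetaStoreClient, which wraps the real client in a dynamic proxy so that transport-level failures trigger a reconnect-and-retry instead of surfacing a stale-connection error. A minimal sketch of that pattern, with hypothetical names and IOException standing in for Thrift's TException:

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy reconnect-on-failure proxy: on a connection-level exception, reopen the
// underlying client once and replay the call (names hypothetical, not Hive's API).
public class ReconnectSketch {
    interface MetaClient { String getTable(String name) throws IOException; }

    static MetaClient retrying(final Supplier<MetaClient> factory) {
        return (MetaClient) Proxy.newProxyInstance(
            MetaClient.class.getClassLoader(), new Class<?>[]{MetaClient.class},
            new InvocationHandler() {
                MetaClient delegate = factory.get();
                public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
                    try {
                        return m.invoke(delegate, args);
                    } catch (InvocationTargetException e) {
                        if (!(e.getCause() instanceof IOException)) throw e.getCause();
                        delegate = factory.get();     // reconnect: the metastore restarted
                        return m.invoke(delegate, args);  // replay the call once
                    }
                }
            });
    }

    public static void main(String[] args) throws IOException {
        final AtomicInteger opens = new AtomicInteger();
        // Hypothetical factory: the first connection is stale and fails like a broken pipe.
        MetaClient client = retrying(() -> {
            final boolean broken = opens.getAndIncrement() == 0;
            return name -> {
                if (broken) throw new IOException("Broken pipe");
                return "table:" + name;
            };
        });
        System.out.println(client.getTable("t1"));  // succeeds after one reconnect
        System.out.println(opens.get());            // two connections were opened
    }
}
```

The design point from the comment thread carries over: only connection-level exceptions should trigger the reconnect; logical MetaExceptions should still propagate to the caller.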
[jira] [Commented] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972790#comment-13972790 ]

Justin Coffey commented on HIVE-6920:
-------------------------------------
cc: [~brocknoland] [~xuefuz]

> Parquet Serde Simplification
> ----------------------------
>          Key: HIVE-6920
>          URL: https://issues.apache.org/jira/browse/HIVE-6920
>      Project: Hive
>   Issue Type: Improvement
>   Components: Serializers/Deserializers
> Affects Versions: 0.13.0
>     Reporter: Justin Coffey
>     Assignee: Justin Coffey
>     Priority: Minor
>      Fix For: 0.14.0
> Attachments: HIVE-6920.patch
>
> Various fixes and code simplification in the ParquetHiveSerde (with minor optimizations).
[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing
[ https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick White updated HIVE-538:
----------------------------
    Attachment: HIVE-538.patch

I've attached a patch that builds a self-containing jar.

> make hive_jdbc.jar self-containing
> ----------------------------------
>          Key: HIVE-538
>          URL: https://issues.apache.org/jira/browse/HIVE-538
>      Project: Hive
>   Issue Type: Improvement
>   Components: JDBC
> Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0
>     Reporter: Raghotham Murthy
>     Assignee: Nick White
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch
>
> Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are required in the classpath to run JDBC applications on Hive. We need to do at least the following to get rid of most unnecessary dependencies:
> 1. get rid of dynamic serde and use a standard serialization format, maybe tab-separated, JSON, or Avro
> 2. don't use Hadoop configuration parameters
> 3. repackage thrift and fb303 classes into hive_jdbc.jar
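[Editor's note] One common way to get point 3 (repackaging thrift and fb303 into the driver jar) is a "shaded" jar with relocated packages, so the bundled classes cannot clash with an application's own Thrift. The attached patch may do this differently; the fragment below is only a sketch of the general technique with the Maven Shade plugin, and the shadedPattern names are hypothetical:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- bundle and rename thrift/fb303 so they can't clash with the app's copies -->
          <relocation>
            <pattern>org.apache.thrift</pattern>
            <shadedPattern>org.apache.hive.shaded.thrift</shadedPattern>
          </relocation>
          <relocation>
            <pattern>com.facebook.fb303</pattern>
            <shadedPattern>org.apache.hive.shaded.fb303</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```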
[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere
[ https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick White updated HIVE-6923:
-----------------------------
    Assignee: Nick White
      Status: Patch Available  (was: Open)

> Use slf4j For Logging Everywhere
> --------------------------------
>          Key: HIVE-6923
>          URL: https://issues.apache.org/jira/browse/HIVE-6923
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Nick White
>     Assignee: Nick White
>      Fix For: 0.13.0
> Attachments: HIVE-6923.patch
>
> Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've attached a patch to tidy this up by just using slf4j for all loggers. This means that applications using the JDBC driver can make Hive log through their own slf4j implementation consistently.
[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere
[ https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick White updated HIVE-6923:
-----------------------------
    Attachment: HIVE-6923.patch

> Use slf4j For Logging Everywhere
> --------------------------------
>          Key: HIVE-6923
>          URL: https://issues.apache.org/jira/browse/HIVE-6923
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Nick White
>      Fix For: 0.13.0
> Attachments: HIVE-6923.patch
>
> Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've attached a patch to tidy this up by just using slf4j for all loggers. This means that applications using the JDBC driver can make Hive log through their own slf4j implementation consistently.
[jira] [Created] (HIVE-6923) Use slf4j For Logging Everywhere
Nick White created HIVE-6923:
-----------------------------

     Summary: Use slf4j For Logging Everywhere
         Key: HIVE-6923
         URL: https://issues.apache.org/jira/browse/HIVE-6923
     Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
    Reporter: Nick White
     Fix For: 0.13.0
 Attachments: HIVE-6923.patch

Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've attached a patch to tidy this up by just using slf4j for all loggers. This means that applications using the JDBC driver can make Hive log through their own slf4j implementation consistently.
[jira] [Updated] (HIVE-6912) HWI not working - HTTP ERROR 500
[ https://issues.apache.org/jira/browse/HIVE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sunil ranjan khuntia updated HIVE-6912:
---------------------------------------
    Priority: Critical  (was: Major)

> HWI not working - HTTP ERROR 500
> --------------------------------
>          Key: HIVE-6912
>          URL: https://issues.apache.org/jira/browse/HIVE-6912
>      Project: Hive
>   Issue Type: Bug
>     Reporter: sunil ranjan khuntia
>     Priority: Critical
>
> I tried to use the Hive Web Interface (HWI) to write Hive queries in a UI. As per the steps mentioned here https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface I set up Ant and ran the hive hwi service, but in the browser when I hit http://localhost:/hwi I got the error below:
> HTTP ERROR 500
> Problem accessing /hwi/. Reason:
>     Unable to find a javac compiler; com.sun.tools.javac.Main is not on the classpath. Perhaps JAVA_HOME does not point to the JDK. It is currently set to /usr/java/jdk1.6.0_32/jre
> Caused by:
>     Unable to find a javac compiler; com.sun.tools.javac.Main is not on the classpath. Perhaps JAVA_HOME does not point to the JDK. It is currently set to /usr/java/jdk1.6.0_32/jre
>         at org.apache.tools.ant.taskdefs.compilers.CompilerAdapterFactory.getCompiler(CompilerAdapterFactory.java:129)
> I have checked and changed JAVA_HOME, but it's still the same.
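[Editor's note] The error message above is the standard symptom of a JVM started from a JRE path (note the trailing /jre in the reported JAVA_HOME): a JRE ships no javac, so JSP compilation fails. A quick stdlib-only check for which one a process is actually running on:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Diagnostic for the "Unable to find a javac compiler" error: the system Java
// compiler is available on a JDK but not on a plain JRE. If this prints the
// JRE message, point JAVA_HOME at the JDK root (e.g. /usr/java/jdk1.6.0_32,
// not .../jdk1.6.0_32/jre) and restart the service.
public class JdkCheck {
    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        System.out.println(compiler == null
            ? "Running on a JRE: no javac available"
            : "Running on a JDK: javac available");
    }
}
```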
[jira] [Updated] (HIVE-6903) Change default value of hive.metastore.execute.setugi to true
[ https://issues.apache.org/jira/browse/HIVE-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-6903:
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.14.0
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Thejas, for the review!

> Change default value of hive.metastore.execute.setugi to true
> -------------------------------------------------------------
>          Key: HIVE-6903
>          URL: https://issues.apache.org/jira/browse/HIVE-6903
>      Project: Hive
>   Issue Type: Task
>   Components: Metastore
> Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0
>     Reporter: Ashutosh Chauhan
>     Assignee: Ashutosh Chauhan
>      Fix For: 0.14.0
> Attachments: HIVE-6903.1.patch, HIVE-6903.patch
>
> Since its introduction in HIVE-2616 I haven't seen any bug reported for it, only grief from users who expect the system to work as if this were true by default.
[jira] [Commented] (HIVE-6913) Hive unable to find the hashtable file during complex multi-staged map join
[ https://issues.apache.org/jira/browse/HIVE-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973072#comment-13973072 ]

Ashutosh Chauhan commented on HIVE-6913:
----------------------------------------
+1

> Hive unable to find the hashtable file during complex multi-staged map join
> ---------------------------------------------------------------------------
>          Key: HIVE-6913
>          URL: https://issues.apache.org/jira/browse/HIVE-6913
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Brock Noland
>     Assignee: Brock Noland
> Attachments: HIVE-6913.patch, HIVE-6913.patch
>
> If a query has multiple map joins and one of the tables to be map-joined is empty, the query can result in a "no such file or directory" error when looking for the hashtable. This is because when we generate a dummy hash table, we do not close the TableScan (TS) operator for that table. Additionally, HashTableSinkOperator (HTSO) outputs its hash tables in the closeOp method. However, when close is called on HTSO, we check to ensure that all parents are closed:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L333
> which is not true in this case, because the TS operator for the empty table was never closed.
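[Editor's note] The parent-close bookkeeping the description walks through can be modeled in a few lines. This is a toy model, not Hive's actual Operator class: an operator runs its closeOp() logic (where HashTableSinkOperator writes out hash table files) only after every parent has closed, so one never-closed TableScan silently blocks the flush:

```java
import java.util.ArrayList;
import java.util.List;

// Toy operator DAG: close() propagates down, but a child only "really" closes
// (runs its closeOp equivalent) once all of its parents have closed.
public class OperatorCloseSketch {
    final String name;
    final List<OperatorCloseSketch> parents = new ArrayList<>();
    final List<OperatorCloseSketch> children = new ArrayList<>();
    int closedParents = 0;
    boolean closed = false;  // true once closeOp() has run

    OperatorCloseSketch(String name) { this.name = name; }

    void connectTo(OperatorCloseSketch child) {
        children.add(child);
        child.parents.add(this);
    }

    // Framework-driven close of a root operator (a TableScan).
    void close() {
        closed = true;
        for (OperatorCloseSketch c : children) c.parentClosed();
    }

    // Notification from one parent; closeOp() runs only when ALL parents closed.
    void parentClosed() {
        closedParents++;
        if (closedParents < parents.size()) return;  // still waiting on a parent
        closed = true;  // closeOp(): this is where hash tables would be flushed
        for (OperatorCloseSketch c : children) c.parentClosed();
    }

    public static void main(String[] args) {
        OperatorCloseSketch tsBig = new OperatorCloseSketch("TS_big");
        OperatorCloseSketch tsEmpty = new OperatorCloseSketch("TS_empty");
        OperatorCloseSketch htso = new OperatorCloseSketch("HTSO");
        tsBig.connectTo(htso);
        tsEmpty.connectTo(htso);

        tsBig.close();
        // Bug scenario: TS_empty is never closed, so HTSO never flushes its
        // hash table file, and the map join later fails to find it on disk.
        System.out.println("HTSO closed? " + htso.closed);  // false
    }
}
```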
[jira] [Updated] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-1608:
-----------------------------------
    Status: Open  (was: Patch Available)

[~appodictic] Do you know what the factor usually is? How large is a sequence file compared to a text file in the usual scenario?
[~brocknoland] It would be good to enlist the benefits we will get by switching over to sequence files.

> use sequencefile as the default for storing intermediate results
> ----------------------------------------------------------------
>          Key: HIVE-1608
>          URL: https://issues.apache.org/jira/browse/HIVE-1608
>      Project: Hive
>   Issue Type: Bug
>   Components: Query Processor
> Affects Versions: 0.7.0
>     Reporter: Namit Jain
>     Assignee: Brock Noland
>      Fix For: 0.14.0
> Attachments: HIVE-1608.patch
>
> The only argument for having a text file for storing intermediate results seems to be better debuggability. But tailing a sequence file is possible, and it should be more space efficient.
[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere
[ https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick White updated HIVE-6923:
-----------------------------
    Attachment: (was: HIVE-6923.patch)

> Use slf4j For Logging Everywhere
> --------------------------------
>          Key: HIVE-6923
>          URL: https://issues.apache.org/jira/browse/HIVE-6923
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Nick White
>     Assignee: Nick White
>      Fix For: 0.13.0
> Attachments: HIVE-6923.patch
>
> Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've attached a patch to tidy this up by just using slf4j for all loggers. This means that applications using the JDBC driver can make Hive log through their own slf4j implementation consistently.
[jira] [Updated] (HIVE-6923) Use slf4j For Logging Everywhere
[ https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick White updated HIVE-6923:
-----------------------------
    Attachment: HIVE-6923.patch

> Use slf4j For Logging Everywhere
> --------------------------------
>          Key: HIVE-6923
>          URL: https://issues.apache.org/jira/browse/HIVE-6923
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Nick White
>     Assignee: Nick White
>      Fix For: 0.13.0
> Attachments: HIVE-6923.patch
>
> Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've attached a patch to tidy this up by just using slf4j for all loggers. This means that applications using the JDBC driver can make Hive log through their own slf4j implementation consistently.
[jira] [Commented] (HIVE-4576) templeton.hive.properties does not allow values with commas
[ https://issues.apache.org/jira/browse/HIVE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973089#comment-13973089 ]

Ashutosh Chauhan commented on HIVE-4576:
----------------------------------------
Seems like blindly replacing \ after the split may run into problems if \ is used in a non-escaping context, e.g. a Windows path like D:\hive\hive-site.xml. Or maybe I am misreading the patch.

> templeton.hive.properties does not allow values with commas
> -----------------------------------------------------------
>          Key: HIVE-4576
>          URL: https://issues.apache.org/jira/browse/HIVE-4576
>      Project: Hive
>   Issue Type: Bug
>   Components: WebHCat
> Affects Versions: 0.5.0
>     Reporter: Vitaliy Fuks
>     Assignee: Eugene Koifman
>     Priority: Minor
> Attachments: HIVE-4576.patch
>
> templeton.hive.properties accepts a comma-separated list of key=value property pairs that will be passed to Hive. However, this makes it impossible to use any value that itself has a comma in it. For example:
> {code:xml}
> <property>
>   <name>templeton.hive.properties</name>
>   <value>hive.metastore.sasl.enabled=false,hive.metastore.uris=thrift://foo1.example.com:9083,foo2.example.com:9083</value>
> </property>
> {code}
> {noformat}
> templeton: starting [/usr/bin/hive, --service, cli, --hiveconf, hive.metastore.sasl.enabled=false, --hiveconf, hive.metastore.uris=thrift://foo1.example.com:9083, --hiveconf, foo2.example.com:9083 etc..
> {noformat}
> because the value is parsed using a standard org.apache.hadoop.conf.Configuration.getStrings() call, which simply splits on commas, from here:
> {code:java}
> for (String prop : appConf.getStrings(AppConfig.HIVE_PROPS_NAME))
> {code}
> This is problematic for any hive property that itself has multiple values, such as hive.metastore.uris above or hive.aux.jars.path. There should be some way to escape commas, or a different delimiter should be used.
> NO PRECOMMIT TESTS
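[Editor's note] The parsing problem in the report reduces to a plain split on commas, which is essentially what Configuration.getStrings() does to the property value. A self-contained demonstration of how a thrift URI list gets torn apart:

```java
// Sketch of the bug: splitting the property value on commas breaks any value
// (a thrift URI list, hive.aux.jars.path, ...) that itself contains a comma.
public class CommaSplitSketch {
    public static void main(String[] args) {
        String props = "hive.metastore.sasl.enabled=false,"
                     + "hive.metastore.uris=thrift://foo1.example.com:9083,foo2.example.com:9083";
        String[] parts = props.split(",");
        for (String p : parts) System.out.println(p);
        // Three entries instead of two: "foo2.example.com:9083" becomes its own
        // bogus --hiveconf argument, exactly as in the templeton log above.
    }
}
```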
[jira] [Commented] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973115#comment-13973115 ]

Ashutosh Chauhan commented on HIVE-6361:
----------------------------------------
[~julianhyde] If you are working on this, this may be the right time to get this work into Hive. We are just getting started after doing a release, so this seems like the right time to absorb the code churn we may have here.

> Un-fork Sqlline
> ---------------
>          Key: HIVE-6361
>          URL: https://issues.apache.org/jira/browse/HIVE-6361
>      Project: Hive
>   Issue Type: Improvement
>   Components: CLI
> Affects Versions: 0.12.0
>     Reporter: Julian Hyde
>
> I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline.
> How did the forks come about? Hive's SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq.
> Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.)
> I propose the following steps:
> 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
> 2. Port fixes to hive-beeline into hive-sqlline.
> 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions.
> 4. Make hive-sqlline the official successor to Julian Hyde's sqlline.
> This achieves continuity for Hive's users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof.
[jira] [Commented] (HIVE-6908) TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures
[ https://issues.apache.org/jira/browse/HIVE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973119#comment-13973119 ]

Ashutosh Chauhan commented on HIVE-6908:
----------------------------------------
I am not sure what the original author of the test had in mind for this. Perhaps [~vgumashta] may know more.

> TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures
> ------------------------------------------------------------------------------
>          Key: HIVE-6908
>          URL: https://issues.apache.org/jira/browse/HIVE-6908
>      Project: Hive
>   Issue Type: Bug
>   Components: Tests
> Affects Versions: 0.13.0
>     Reporter: Szehon Ho
>     Assignee: Szehon Ho
> Attachments: HIVE-6908.patch
>
> This has failed sometimes in the pre-commit tests. ThriftCLIServiceTest.testExecuteStatementAsync runs two statements. They are given a 100-second timeout in total; I'm not sure whether that's intentional. As the first is a select query, it will take a majority of the time. The second statement (create table) should be quicker, but it fails sometimes because the timeout is already mostly used up.
> The timeout should probably be reset after the first statement. If the operation finishes before the timeout, the reset won't have any effect, as the wait loop breaks out early.
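[Editor's note] The proposed fix (reset the timeout after the first statement) is the difference between one shared deadline and a fresh deadline per statement. A minimal sketch with hypothetical names, not the test's actual code:

```java
// Shared deadline vs. per-statement deadline: with a shared deadline the
// second statement inherits whatever budget the first one left over.
public class DeadlineSketch {
    static long remaining(long deadlineMillis) {
        return deadlineMillis - System.currentTimeMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        long budget = 100_000;  // the test's 100-second budget

        long shared = System.currentTimeMillis() + budget;  // one deadline for BOTH statements
        Thread.sleep(50);                                   // "statement 1" consumes time
        long leftForStmt2 = remaining(shared);              // already reduced for statement 2

        long perStatement = System.currentTimeMillis() + budget;  // reset: fresh deadline
        System.out.println(leftForStmt2 < budget);                // true: budget was consumed
        System.out.println(remaining(perStatement) <= budget);    // true: full budget again
    }
}
```

If a statement finishes before its deadline, the polling loop exits early, so the reset is harmless in the fast case, which is the observation in the description.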
[jira] [Updated] (HIVE-5870) Move TestJDBCDriver2.testNewConnectionConfiguration to TestJDBCWithMiniHS2
[ https://issues.apache.org/jira/browse/HIVE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-5870:
-----------------------------------
    Status: Open  (was: Patch Available)

This one seems to have fallen through the cracks. Let's get this in. [~szehon] The patch needs a rebase.

> Move TestJDBCDriver2.testNewConnectionConfiguration to TestJDBCWithMiniHS2
> --------------------------------------------------------------------------
>          Key: HIVE-5870
>          URL: https://issues.apache.org/jira/browse/HIVE-5870
>      Project: Hive
>   Issue Type: Bug
>   Components: Tests
> Affects Versions: 0.13.0
>     Reporter: Szehon Ho
>     Assignee: Szehon Ho
> Attachments: HIVE-5870.patch
>
> TestJDBCDriver2.testNewConnectionConfiguration() attempts to start a HiveServer2 instance in the test. This can cause issues, as creating HiveServer2 needs the correct environment/path. This test should be moved to TestJdbcWithMiniHS2, which uses MiniHS2. MiniHS2 exists for this purpose (setting all the environment properly before starting a HiveServer2 instance).
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973129#comment-13973129 ]

Hive QA commented on HIVE-5538:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12640595/HIVE-5538.2.patch

{color:red}ERROR:{color} -1 due to 34 failed/errored test(s), 5405 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quote1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_decimal_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/precommit-hive/17/testReport
Console output: http://bigtop01.cloudera.org:8080/job/precommit-hive/17/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 34 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12640595

> Turn on vectorization by default.
> ---------------------------------
>          Key: HIVE-5538
>          URL: https://issues.apache.org/jira/browse/HIVE-5538
>      Project: Hive
>   Issue Type: Sub-task
>     Reporter: Jitendra Nath Pandey
>     Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch
>
> Vectorization should be turned on by default, so that users don't have to specifically enable it. The vectorization code validates each query and ensures that it falls back to row mode if it is not supported on the vectorized code path.
[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973141#comment-13973141 ] Xuefu Zhang commented on HIVE-6835: --- Just curious. If the avro serde is initialized with the table schema (which is the latest), is there a problem for it to read the old data, that is, data that conforms to the partition-level metadata? I have seen so many JIRAs about schema evolution, and am not quite sure what is possible and what is not. The example given here is adding a new column at the beginning. What about other cases, such as adding it at the end, or changing a data type, etc.? Reading of partitioned Avro data fails if partition schema does not match table schema -- Key: HIVE-6835 URL: https://issues.apache.org/jira/browse/HIVE-6835 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch To reproduce: {code} create table testarray (a array<string>); load data local inpath '/home/ahsu/test/array.txt' into table testarray; # create partitioned Avro table with one array column create table avroarray partitioned by (y string) row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'; insert into table avroarray partition(y=1) select * from testarray; # add an int column with a default value of 0 alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}'); # fails with ClassCastException
select * from avroarray; {code} The select * fails with: {code} Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973160#comment-13973160 ] Ashutosh Chauhan commented on HIVE-5538: I think it's a good idea to turn vectorization on by default. Let's triage these failures. Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973166#comment-13973166 ] Ashutosh Chauhan commented on HIVE-6835: I would also like to know the answer to Xuefu's questions. It will be good to document what kind of schema evolution is supported by Avro Serde and more importantly what kinds are *not* supported. Reading of partitioned Avro data fails if partition schema does not match table schema -- Key: HIVE-6835 URL: https://issues.apache.org/jira/browse/HIVE-6835 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch To reproduce: {code} create table testarray (a array<string>); load data local inpath '/home/ahsu/test/array.txt' into table testarray; # create partitioned Avro table with one array column create table avroarray partitioned by (y string) row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'; insert into table avroarray partition(y=1) select * from testarray; # add an int column with a default value of 0 alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}'); # fails with ClassCastException select * from avroarray; {code} The select * fails with: {code} Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
Sergey Shelukhin created HIVE-6924: -- Summary: MapJoinKeyBytes::hashCode() should use Murmur hash Key: HIVE-6924 URL: https://issues.apache.org/jira/browse/HIVE-6924 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Existing hashCode is bad, causes HashMap to cluster -- This message was sent by Atlassian JIRA (v6.2#6252)
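The clustering the issue describes can be illustrated outside Hive. Below is a small Python sketch (illustrative only; the actual fix is in Hive's Java code): a naive byte-sum hash maps regularly spaced join keys onto a handful of HashMap buckets, while a MurmurHash3-style 64-bit finalizer (the fmix64 mix from Austin Appleby's public-domain reference) spreads them out.

```python
# Compare bucket spread of a weak additive hash vs a MurmurHash3-style
# 64-bit finalizer. Illustrative sketch, not Hive's implementation.

MASK = (1 << 64) - 1

def naive_hash(key: bytes) -> int:
    # Weak hash: sums bytes, so regularly spaced keys cluster in few buckets.
    return sum(key) & MASK

def fmix64(h: int) -> int:
    # MurmurHash3 64-bit finalization mix (constants from the reference impl).
    h ^= h >> 33
    h = (h * 0xFF51AFD7ED558CCD) & MASK
    h ^= h >> 33
    h = (h * 0xC4CEB9FE1A85EC53) & MASK
    h ^= h >> 33
    return h

def murmur_style_hash(key: bytes) -> int:
    return fmix64(int.from_bytes(key.ljust(8, b"\0"), "little"))

def buckets_used(hash_fn, keys, n_buckets=64):
    # How many of n_buckets distinct buckets the keys land in.
    return len({hash_fn(k) % n_buckets for k in keys})

# 64 regularly spaced integer keys, e.g. surrogate join keys.
keys = [i.to_bytes(4, "little") for i in range(0, 1024, 16)]
print(buckets_used(naive_hash, keys), buckets_used(murmur_style_hash, keys))
```

With these keys the additive hash occupies only 16 of 64 buckets, while the mixed hash fills most of the table; that is the clustering the JIRA is about.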
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973180#comment-13973180 ] Sergey Shelukhin commented on HIVE-6430: We should probably do the same in actual codebase... I'll file a JIRA MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have java hash table there. We can either use primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing
[ https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick White updated HIVE-538: Attachment: (was: HIVE-538.patch) make hive_jdbc.jar self-containing -- Key: HIVE-538 URL: https://issues.apache.org/jira/browse/HIVE-538 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0 Reporter: Raghotham Murthy Assignee: Nick White Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are required in the classpath to run jdbc applications on hive. We need to do at least the following to get rid of most unnecessary dependencies: 1. get rid of dynamic serde and use a standard serialization format, maybe tab separated, json or avro 2. don't use hadoop configuration parameters 3. repackage thrift and fb303 classes into hive_jdbc.jar -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing
[ https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick White updated HIVE-538: Attachment: HIVE-538.patch make hive_jdbc.jar self-containing -- Key: HIVE-538 URL: https://issues.apache.org/jira/browse/HIVE-538 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0 Reporter: Raghotham Murthy Assignee: Nick White Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are required in the classpath to run jdbc applications on hive. We need to do at least the following to get rid of most unnecessary dependencies: 1. get rid of dynamic serde and use a standard serialization format, maybe tab separated, json or avro 2. don't use hadoop configuration parameters 3. repackage thrift and fb303 classes into hive_jdbc.jar -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-538) make hive_jdbc.jar self-containing
[ https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973198#comment-13973198 ] Ashutosh Chauhan commented on HIVE-538: --- [~njw45] Can you take a look at HIVE-6593 to see if it satisfies your needs? make hive_jdbc.jar self-containing -- Key: HIVE-538 URL: https://issues.apache.org/jira/browse/HIVE-538 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0 Reporter: Raghotham Murthy Assignee: Nick White Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are required in the classpath to run jdbc applications on hive. We need to do at least the following to get rid of most unnecessary dependencies: 1. get rid of dynamic serde and use a standard serialization format, maybe tab separated, json or avro 2. don't use hadoop configuration parameters 3. repackage thrift and fb303 classes into hive_jdbc.jar -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6925) show query progress in Beeline
Gwen Shapira created HIVE-6925: -- Summary: show query progress in Beeline Key: HIVE-6925 URL: https://issues.apache.org/jira/browse/HIVE-6925 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Gwen Shapira In the old Hive CLI, the MR output was written to the screen, making it easy to watch the progress - map and reduce % done. In Beeline, there is no output until the query is done (or fails). Showing some kind of progress indicator would be nice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973210#comment-13973210 ] Edward Capriolo commented on HIVE-1608: --- It is not much. SequenceFile + none (codec) only adds some block information around text. I still think sequence by default is a good idea. It makes it easier to add compression later without sacrificing splittability. use sequencefile as the default for storing intermediate results Key: HIVE-1608 URL: https://issues.apache.org/jira/browse/HIVE-1608 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-1608.patch The only argument for having a text file for storing intermediate results seems to be better debuggability. But, tailing a sequence file is possible, and it should be more space efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
Improving test coverage for HiveServer2
First, I would like to thank Prasad Mujumdar for his recent contributions of MiniHS2 and MiniKDC. Those are awesome test infra components to make testing easier for HS2 and Kerberos. Thanks, Prasad! With those checked in now in our repo, we can start making use of them to improve our test coverage. There are a variety of new features which have landed recently in trunk for HS2, like HTTP transport, pluggable authentication, and a new authorization model, to name a few. There are test cases for these in isolation, but not in combination with other features, e.g., HS2 running on HTTP transport with the new auth model with Kerberos authentication, or HS2 running in binary mode with LDAP authentication, and so on. I don't have a good sense of which combinations we want to support and thus test, but at least for those which we do want to support, it seems possible to write tests using MiniHS2 and MiniKDC. I think we can take our existing tests in TestJdbcDriver2 (possibly with a little refactoring) and run them against MiniHS2 in various server configurations. Also, we have TestBeelineDriver, which is currently turned off by default. Shall we turn it on? I think it can test at various levels of concurrency. Maybe to begin with we can set the concurrency level at 1 and, if things look good, bump that number higher. Thoughts? Thanks, Ashutosh
Remove HCat cli
As far as I can see, all the functionality it provides can be provided by the Hive CLI with some configuration. There is functionality like the -g and -p options which it has; if there are users who need them, they can be added to the Hive CLI. So it seems we can get rid of HCatCli.java and its friends, as well as bin/hcat. If dev folks think positively about this, we can ask on the user list to see how users feel about it. Thanks, Ashutosh
Remove HiveServer1
HiveServer2 was introduced in Hive 0.10; since then we have had 3 releases: 0.11, 0.12, and soon to be 0.13. I think it's high time we remove HS1 from our trunk. Thoughts? Thanks, Ashutosh
[jira] [Commented] (HIVE-1643) support range scans and non-key columns in HBase filter pushdown
[ https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973241#comment-13973241 ] Sandy Pratt commented on HIVE-1643: --- Craig, I've been running my patch for this issue in production for at least a year now, and it seems to work well enough. I have an item on my plate to contribute the source, but it will have to wait until I have an opening in my schedule. Because the HBase handler is a pluggable SerDe, and my implementation strays a bit from the one in Hive, I'll probably stick it on Github or something and post a pointer here. support range scans and non-key columns in HBase filter pushdown Key: HIVE-1643 URL: https://issues.apache.org/jira/browse/HIVE-1643 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: John Sichi Assignee: bharath v Labels: patch Attachments: HIVE-1643.patch, Hive-1643.2.patch, hbase_handler.patch HIVE-1226 added support for WHERE rowkey=3. We would like to support WHERE rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus conjunctions etc). Non-rowkey conditions can't be used to filter out entire ranges, but they can be used to push the per-row filter processing as far down as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
[ https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6924: --- Attachment: HIVE-6924.patch MapJoinKeyBytes::hashCode() should use Murmur hash -- Key: HIVE-6924 URL: https://issues.apache.org/jira/browse/HIVE-6924 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6924.patch Existing hashCode is bad, causes HashMap to cluster -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
[ https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6924: --- Status: Patch Available (was: Open) MapJoinKeyBytes::hashCode() should use Murmur hash -- Key: HIVE-6924 URL: https://issues.apache.org/jira/browse/HIVE-6924 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6924.patch Existing hashCode is bad, causes HashMap to cluster -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
[ https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973244#comment-13973244 ] Sergey Shelukhin commented on HIVE-6924: [~t3rmin4t0r] fyi [~ashutoshc] can you please +1? :) MapJoinKeyBytes::hashCode() should use Murmur hash -- Key: HIVE-6924 URL: https://issues.apache.org/jira/browse/HIVE-6924 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6924.patch Existing hashCode is bad, causes HashMap to cluster -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Remove HiveServer1
I am +1 on it. I'd also add that we removed JDBC-1 which was supposed to work with HiveServer1. Thanks, --Vaibhav On Thu, Apr 17, 2014 at 11:26 AM, Ashutosh Chauhan hashut...@apache.orgwrote: HiveServer2 was introduced in Hive 0.10 since than we have 3 releases 0.11, 0.12 soon to be 0.13. I think its a high time we remove HS1 from our trunk. Thoughts? Thanks, Ashutosh -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too
[ https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-6756: --- Attachment: HIVE-6756.patch In alter table set fileformat, the ORC and RC file formats set their corresponding serdes, but the remaining file formats do not. In create table, if no serde is specified for formats other than ORC and RC, LazySimpleSerDe is used; this patch makes alter table set fileformat behave like create table. alter table set fileformat should set serde too --- Key: HIVE-6756 URL: https://issues.apache.org/jira/browse/HIVE-6756 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Chinna Rao Lalam Attachments: HIVE-6756.patch Currently doing alter table set fileformat doesn't change the serde. This is unexpected by customers because the serdes are largely file format specific. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too
[ https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-6756: --- Status: Patch Available (was: Open) alter table set fileformat should set serde too --- Key: HIVE-6756 URL: https://issues.apache.org/jira/browse/HIVE-6756 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Chinna Rao Lalam Attachments: HIVE-6756.patch Currently doing alter table set fileformat doesn't change the serde. This is unexpected by customers because the serdes are largely file format specific. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-6912) HWI not working - HTTP ERROR 500
[ https://issues.apache.org/jira/browse/HIVE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-6912. Resolution: Duplicate Fix Version/s: 0.13.0 This has been fixed via HIVE-5132 HWI not working - HTTP ERROR 500 Key: HIVE-6912 URL: https://issues.apache.org/jira/browse/HIVE-6912 Project: Hive Issue Type: Bug Reporter: sunil ranjan khuntia Priority: Critical Fix For: 0.13.0 I tried to use Hive HWI to write Hive queries in a UI. As per the steps mentioned here https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface I set up Ant and ran the Hive HWI service, but in the browser when I hit http://localhost:/hwi I got the below error: HTTP ERROR 500 Problem accessing /hwi/. Reason: Unable to find a javac compiler; com.sun.tools.javac.Main is not on the classpath. Perhaps JAVA_HOME does not point to the JDK. It is currently set to /usr/java/jdk1.6.0_32/jre Caused by: Unable to find a javac compiler; com.sun.tools.javac.Main is not on the classpath. Perhaps JAVA_HOME does not point to the JDK. It is currently set to /usr/java/jdk1.6.0_32/jre at org.apache.tools.ant.taskdefs.compilers.CompilerAdapterFactory.getCompiler(CompilerAdapterFactory.java:129) I have checked and changed JAVA_HOME, but it's still the same. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
[ https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973253#comment-13973253 ] Ashutosh Chauhan commented on HIVE-6924: Not sure, but I hear Cuckoo hashing is even better. We have internal implementation of it in ql/exec/vector/expressions/CuckooSetBytes.java Shall we use that? MapJoinKeyBytes::hashCode() should use Murmur hash -- Key: HIVE-6924 URL: https://issues.apache.org/jira/browse/HIVE-6924 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6924.patch Existing hashCode is bad, causes HashMap to cluster -- This message was sent by Atlassian JIRA (v6.2#6252)
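For context on the alternative mentioned in the comment, here is a minimal sketch of the cuckoo-hashing idea in Python. It is conceptual only, not Hive's CuckooSetBytes (only the class name comes from the comment): every key has exactly two candidate slots, so a membership probe touches at most two positions; inserts displace existing keys, and the table grows when a displacement chain runs too long.

```python
# Minimal two-table cuckoo hash set: contains() probes at most two slots.

class CuckooSet:
    def __init__(self, capacity=16):
        self.t1 = [None] * capacity
        self.t2 = [None] * capacity
        self.n = capacity

    def _h1(self, key):
        return hash(("a", key)) % self.n

    def _h2(self, key):
        return hash(("b", key)) % self.n

    def contains(self, key):
        # At most two probes, regardless of load.
        return self.t1[self._h1(key)] == key or self.t2[self._h2(key)] == key

    def add(self, key):
        if self.contains(key):
            return
        for _ in range(32):  # bounded displacement chain
            i = self._h1(key)
            key, self.t1[i] = self.t1[i], key  # place key, pick up evictee
            if key is None:
                return
            j = self._h2(key)
            key, self.t2[j] = self.t2[j], key
            if key is None:
                return
        self._grow(key)  # chain too long: rebuild at double capacity

    def _grow(self, pending):
        items = [k for k in self.t1 + self.t2 if k is not None] + [pending]
        self.__init__(self.n * 2)
        for k in items:
            self.add(k)

s = CuckooSet(4)
for i in range(50):
    s.add(i)
```

Whether this beats a Murmur-hashed HashMap depends on workload; the appeal for a hot join-probe loop is the worst-case-constant number of probes per lookup.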
[jira] [Created] (HIVE-6926) HiveServer2 should use tcp instead of binary as the name of the transport mode
Vaibhav Gumashta created HIVE-6926: -- Summary: HiveServer2 should use tcp instead of binary as the name of the transport mode Key: HIVE-6926 URL: https://issues.apache.org/jira/browse/HIVE-6926 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 I think the name binary really doesn't convey anything. I'll make the change in a backward compatible way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
[ https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973268#comment-13973268 ] Ashutosh Chauhan commented on HIVE-6862: +1 add DB schema DDL and upgrade 12to13 scripts for MS SQL Server -- Key: HIVE-6862 URL: https://issues.apache.org/jira/browse/HIVE-6862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-6862.2.patch, HIVE-6862.3.patch, HIVE-6862.patch need to add a unifed 0.13 script and a separate script for ACID support NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6768) remove hcatalog/webhcat/svr/src/main/config/override-container-log4j.properties
[ https://issues.apache.org/jira/browse/HIVE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973271#comment-13973271 ] Ashutosh Chauhan commented on HIVE-6768: In addition to this file, I assume we also need to revert the changes introduced in HIVE-5511. [~ekoifman], would you like to attach a patch for this? remove hcatalog/webhcat/svr/src/main/config/override-container-log4j.properties --- Key: HIVE-6768 URL: https://issues.apache.org/jira/browse/HIVE-6768 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman now that MAPREDUCE-5806 is fixed we can remove override-container-log4j.properties and all the logic around it, which was introduced in HIVE-5511 to work around MAPREDUCE-5806 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973279#comment-13973279 ] Anthony Hsu commented on HIVE-6835: --- The AvroSerDe handles schema evolution as described in http://avro.apache.org/docs/current/spec.html#Schema+Resolution. However, in the Hive code, the AvroSerDe needs to always be initialized with the latest schema so that ObjectInspectorConverters.getConvertedOI() (in FetchOperator:getRecordReader()) will work. When the AvroSerDe actually reads the Avro file, it will then compare the latest schema to the actual schema stored in the Avro file and do schema resolution/evolution. Reading of partitioned Avro data fails if partition schema does not match table schema -- Key: HIVE-6835 URL: https://issues.apache.org/jira/browse/HIVE-6835 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch To reproduce: {code} create table testarray (a array<string>); load data local inpath '/home/ahsu/test/array.txt' into table testarray; # create partitioned Avro table with one array column create table avroarray partitioned by (y string) row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'; insert into table avroarray partition(y=1) select * from testarray; # add an int column with a default value of 0 alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}');
# fails with ClassCastException select * from avroarray; {code} The select * fails with: {code} Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
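The resolution rule Anthony describes can be sketched as a toy in Python (a hypothetical resolver for illustration, not the AvroSerDe itself): a reader field absent from the writer's record is filled from its declared default, which is why the evolved schema with intfield defaulting to 0 can still read the pre-ALTER rows.

```python
# Toy illustration of Avro-style schema resolution: fields missing from the
# writer's data are filled from the reader schema's defaults.

def resolve_record(writer_row: dict, reader_fields: list) -> dict:
    out = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_row:
            out[name] = writer_row[name]      # value present in old data
        elif "default" in field:
            out[name] = field["default"]      # new field: use its default
        else:
            raise ValueError(f"field {name!r} missing and has no default")
    return out

# Reader schema after the ALTER TABLE: new int field with default 0.
reader_fields = [
    {"name": "intfield", "type": "int", "default": 0},
    {"name": "a", "type": {"type": "array", "items": "string"}},
]
old_row = {"a": ["x", "y"]}  # row written before the schema change
print(resolve_record(old_row, reader_fields))
```

The bug in this JIRA is not in this resolution step but upstream of it: the serde must be handed the latest (table-level) schema before the object-inspector conversion runs, or the inspectors disagree on the column types.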
[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too
[ https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6756: --- Status: Open (was: Patch Available) I think instead of always defaulting to LazySimpleSerDe, it is better to set LazySimpleSerDe for the TextFile and SequenceFile formats only and throw an exception in cases where the serde is not specified. We can't assume other file formats use LazySimpleSerDe. alter table set fileformat should set serde too --- Key: HIVE-6756 URL: https://issues.apache.org/jira/browse/HIVE-6756 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Chinna Rao Lalam Attachments: HIVE-6756.patch Currently doing alter table set fileformat doesn't change the serde. This is unexpected by customers because the serdes are largely file format specific. -- This message was sent by Atlassian JIRA (v6.2#6252)
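The policy proposed in the review comment can be sketched as a small table-driven helper. This is hypothetical illustration code, not Hive's DDL implementation; the serde class names are the standard Hive ones to the best of our knowledge: default the serde only for formats where it is unambiguous, and fail fast otherwise.

```python
# Sketch of "default serde per file format, else require an explicit one".

FORMAT_DEFAULT_SERDE = {
    "TEXTFILE": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "SEQUENCEFILE": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "ORC": "org.apache.hadoop.hive.ql.io.orc.OrcSerde",
    "RCFILE": "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",
}

def serde_for(file_format: str, explicit_serde: str = None) -> str:
    # An explicitly named serde always wins.
    if explicit_serde:
        return explicit_serde
    try:
        return FORMAT_DEFAULT_SERDE[file_format.upper()]
    except KeyError:
        # No safe default: refuse rather than silently pick LazySimpleSerDe.
        raise ValueError(
            f"no default serde for {file_format}; specify one explicitly")
```

Failing fast here matches the review comment: silently pairing an unknown format with LazySimpleSerDe would just move the error to read time.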
Re: Remove HiveServer1
+1 removing server1 and related. However, +1 on keeping Hive CLI. On Thu, Apr 17, 2014 at 11:34 AM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: I am +1 on it. I'd also add that we removed JDBC-1 which was supposed to work with HiveServer1. Thanks, --Vaibhav On Thu, Apr 17, 2014 at 11:26 AM, Ashutosh Chauhan hashut...@apache.org wrote: HiveServer2 was introduced in Hive 0.10 since than we have 3 releases 0.11, 0.12 soon to be 0.13. I think its a high time we remove HS1 from our trunk. Thoughts? Thanks, Ashutosh
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973285#comment-13973285 ] Brock Noland commented on HIVE-1608: The big win here is that columns with new lines don't get screwed up by default. That is they work out of the box. use sequencefile as the default for storing intermediate results Key: HIVE-1608 URL: https://issues.apache.org/jira/browse/HIVE-1608 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-1608.patch The only argument for having a text file for storing intermediate results seems to be better debuggability. But, tailing a sequence file is possible, and it should be more space efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
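The out-of-the-box win Brock mentions comes down to framing. A Python sketch of the failure mode (illustrative only; SequenceFile's real container format differs in detail but is likewise record-framed): newline-delimited text splits a value containing a newline into bogus rows, while length-prefixed record framing round-trips it intact.

```python
# Delimited text vs length-prefixed record framing for intermediate rows.

rows = [["a", "line1\nline2"], ["b", "plain"]]

# Text framing: one row per '\n', one column per '\t'. The embedded newline
# in the first row's value is indistinguishable from a row boundary.
text_blob = "\n".join("\t".join(cols) for cols in rows)
text_rows = [line.split("\t") for line in text_blob.split("\n")]  # 3 "rows"!

# Record framing: every row and field carries an explicit length, so content
# bytes are never reinterpreted as delimiters.
def frame(rows):
    out = bytearray()
    for cols in rows:
        out += len(cols).to_bytes(4, "big")
        for c in cols:
            b = c.encode()
            out += len(b).to_bytes(4, "big") + b
    return bytes(out)

def unframe(blob):
    rows, i = [], 0
    while i < len(blob):
        ncols = int.from_bytes(blob[i:i + 4], "big"); i += 4
        cols = []
        for _ in range(ncols):
            n = int.from_bytes(blob[i:i + 4], "big"); i += 4
            cols.append(blob[i:i + n].decode()); i += n
        rows.append(cols)
    return rows
```

Here the text round trip yields three rows from two, which is exactly the "columns with new lines get screwed up" problem; the framed round trip reproduces the input.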
Re: Remove HCat cli
The HCat wikidoc lists some of the differences between the Hive and HCat CLIs here: https://cwiki.apache.org/confluence/display/Hive/HCatalog+CLI#HCatalogCLI-HiveCLI -- Lefty On Thu, Apr 17, 2014 at 2:24 PM, Ashutosh Chauhan hashut...@apache.org wrote: As far as I can see, all the functionality it provides can be provided by hive cli with some configuration. There is functionality like -g and -p option which it has, for which if there are users can be added to hive cli. So, it seems we can get rid of HCatCli.java and its friends as well as bin/hcat If dev folks think positively about this we can ask on user list to see how users feel about it. Thanks, Ashutosh
[jira] [Created] (HIVE-6927) Add support for MSSQL in schematool
Deepesh Khandelwal created HIVE-6927: Summary: Add support for MSSQL in schematool Key: HIVE-6927 URL: https://issues.apache.org/jira/browse/HIVE-6927 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Schematool is the preferred way of initializing schema for Hive. Since HIVE-6862 provided the script for MSSQL it would be nice to add the support for it in schematool. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6927) Add support for MSSQL in schematool
[ https://issues.apache.org/jira/browse/HIVE-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-6927: - Attachment: HIVE-6927.patch Attaching the patch for review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-6919) hive sql std auth select query fails on partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-6919. Resolution: Fixed Fix Version/s: 0.14.0 Committed to trunk. Thanks, Thejas! hive sql std auth select query fails on partitioned tables -- Key: HIVE-6919 URL: https://issues.apache.org/jira/browse/HIVE-6919 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Fix For: 0.14.0 Attachments: HIVE-6919.1.patch {code} analyze table studentparttab30k partition (ds) compute statistics; Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied. Principal [name=hadoopqa, type=USER] does not have following privileges on Object [type=PARTITION, name=null] : [SELECT] (state=42000,code=4) {code} Sql std auth is supposed to ignore partition level objects for privilege checks, but that is not working as intended. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973327#comment-13973327 ] Xuefu Zhang commented on HIVE-6835: --- {quote} in the Hive code, the AvroSerDe needs to always be initialized with the latest schema so that ObjectInspectorConverters.getConvertedOI() (in FetchOperator:getRecordReader()) will work. {quote} [~erwaman] I guess I don't quite follow this. The exception stack shows that the casting error happens when reading old data with the partition schema, which is the old schema. If the schema matches the data, I'm not sure why we'd have this casting error. On the other hand, if we use the new schema to read old data, might an error arise from that instead? Anyway, I'm not fully understanding the real cause of the problem and how the change will address all other possible scenarios. Reading of partitioned Avro data fails if partition schema does not match table schema -- Key: HIVE-6835 URL: https://issues.apache.org/jira/browse/HIVE-6835 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch To reproduce:
{code}
create table testarray (a array<string>);
load data local inpath '/home/ahsu/test/array.txt' into table testarray;
# create partitioned Avro table with one array column
create table avroarray partitioned by (y string) row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
insert into table avroarray partition(y=1) select * from testarray;
# add an int column with a default value of 0
alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}');
# fails with ClassCastException
select * from avroarray;
{code}
The select * fails with:
{code}
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6928) Beeline should not chop off describe extended results by default
Szehon Ho created HIVE-6928: --- Summary: Beeline should not chop off describe extended results by default Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho By default, beeline truncates long results based on the console width, like:
{noformat}
+-----------------------------+------------------------------------------------------------------------+
| col_name                    |                                                                        |
+-----------------------------+------------------------------------------------------------------------+
| pat_id                      | string                                                                 |
| score                       | float                                                                  |
| acutes                      | float                                                                  |
|                             |                                                                        |
| Detailed Table Information  | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
+-----------------------------+------------------------------------------------------------------------+
5 rows selected (0.4 seconds)
{noformat}
This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6929) hcatalog packaging is not fully integrated with hive
Ashutosh Chauhan created HIVE-6929: -- Summary: hcatalog packaging is not fully integrated with hive Key: HIVE-6929 URL: https://issues.apache.org/jira/browse/HIVE-6929 Project: Hive Issue Type: Task Components: HCatalog, WebHCat Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Currently, if you run {{mvn package}}, hcatalog jars end up in {{hcatalog/share/hcatalog}}, and similarly for webhcat jars. All other hive jars are in lib/, and that's where hcatalog jars should also be. Similar is the story for webhcat. To reduce confusion, it's better that hcatalog follow hive's dir structure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6929) hcatalog packaging is not fully integrated with hive
[ https://issues.apache.org/jira/browse/HIVE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973340#comment-13973340 ] Ashutosh Chauhan commented on HIVE-6929: The practical problem it leads to is that hcatalog jars are not available by default in hive's classpath. So, if you want to make use of hcatalog functionality, you need to somehow get those into hive's classpath. This makes for a bad user experience. cc: [~susanths] [~ekoifman] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6929) hcatalog packaging is not fully integrated with hive
[ https://issues.apache.org/jira/browse/HIVE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973369#comment-13973369 ] Sushanth Sowmyan commented on HIVE-6929: Agreed. We should streamline jar locations for hcat to be in hive standard locations for 0.14. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-2302) Allow grant privileges on granting privileges.
[ https://issues.apache.org/jira/browse/HIVE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-2302. Resolution: Fixed Fix Version/s: 0.13.0 This is now possible via the SQL std based authorization introduced in HIVE-5837, which is going to be available in Hive 0.13. Allow grant privileges on granting privileges. -- Key: HIVE-2302 URL: https://issues.apache.org/jira/browse/HIVE-2302 Project: Hive Issue Type: Improvement Components: Authorization, Security Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: Guy Doulberg Assignee: Mohammad Kamrul Islam Fix For: 0.13.0 Today any user can grant himself and any other user privileges on schemas and tables. This way the administrator cannot be sure that the rules he applied are fulfilled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF
[ https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973438#comment-13973438 ] Jason Dere commented on HIVE-6922: -- Would you be able to add a testcase for this bug? NullPointerException in collect_set() UDAF -- Key: HIVE-6922 URL: https://issues.apache.org/jira/browse/HIVE-6922 Project: Hive Issue Type: Bug Components: UDF Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-6922.patch Steps to reproduce the bug: {noformat} create table temp(key int, value string); -- leave the table empty select collect_set(key) from temp where key=0; Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132) ... 
9 more {noformat} The root cause is that in GenericUDAFMkCollectionEvaluator.merge(), partialResult can be null but is not validated before it is used:
{code}
List<Object> partialResult = (ArrayList<Object>) internalMergeOI.getList(partial);
for (Object i : partialResult) {
  putIntoCollection(i, myagg);
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
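The null guard the comment implies can be sketched outside Hive. This is a simplified stand-in, not the actual patch: the real evaluator works on ObjectInspectors and an AggregationBuffer, and `merge` here is a hypothetical plain-list version of that logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeGuardDemo {

    // Simplified stand-in for GenericUDAFMkCollectionEvaluator.merge():
    // fold a partial result into the aggregation buffer, skipping nulls.
    public static List<Object> merge(List<Object> myagg, List<Object> partialResult) {
        // The fix: a null partial (e.g. from a reducer that saw no rows)
        // must be skipped, otherwise the for-each loop below throws NPE.
        if (partialResult == null) {
            return myagg;
        }
        for (Object i : partialResult) {
            if (!myagg.contains(i)) {   // collect_set keeps distinct values
                myagg.add(i);
            }
        }
        return myagg;
    }

    public static void main(String[] args) {
        List<Object> agg = new ArrayList<>();
        merge(agg, null);                               // no NPE on empty input
        merge(agg, Arrays.<Object>asList(1, 2, 2));
        System.out.println(agg);                        // [1, 2]
    }
}
```

With the guard, `select collect_set(key) from an empty table` simply leaves the aggregation buffer empty instead of crashing the reducer.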
[jira] [Updated] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
[ https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6862: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Eugene! Lefty, I edited that line while committing. add DB schema DDL and upgrade 12to13 scripts for MS SQL Server -- Key: HIVE-6862 URL: https://issues.apache.org/jira/browse/HIVE-6862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-6862.2.patch, HIVE-6862.3.patch, HIVE-6862.patch need to add a unified 0.13 script and a separate script for ACID support NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
[ https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973455#comment-13973455 ] Sergey Shelukhin commented on HIVE-6924: Hmm.. cuckoo hashing is a method for conflict resolution, right? This is the hash function itself. MapJoinKeyBytes::hashCode() should use Murmur hash -- Key: HIVE-6924 URL: https://issues.apache.org/jira/browse/HIVE-6924 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6924.patch Existing hashCode is bad, causes HashMap to cluster -- This message was sent by Atlassian JIRA (v6.2#6252)
Minimum supported versions for DB backing Metastore
I don't think we have documented anywhere what versions of mysql / postgres / oracle / ms-sql we are supporting. It would be good to document those. I propose the following versions:

* Derby - 10.10.1.1 - defined in hive's pom, so all unit tests run with it.
* MySQL - 5.6.17 - minimum version supported by the MySQL community.
* Postgres - 9.1.13 - has support for create table if not exists, which is good to have.
* Oracle - 11g - oldest oracle version available to download from their site.
* MSSQL server - 2008 R2 - the one which is currently tested against.

Thoughts? Ashutosh
Re: Minimum supported versions for DB backing Metastore
Do we have ms-sql scripts these days? Last time I checked we did not. I think we need them to claim ms-sql support. On Apr 17, 2014 4:59 PM, Ashutosh Chauhan hashut...@apache.org wrote: I don't think we have documented anywhere what versions of mysql / postgres / oracle / ms-sql we are supporting. It will be good to document those. I propose following versions: * Derby - 10.10.1.1 - defined in hive's pom, so all unit tests run with it. * MySQL - 5.6.17 - minimum supported version by mysql community * Postgres - 9.1.13- has support for create table if not exists which is good to have * Oracle - 11g - oldest oracle version available to download from their site * MSSQL server - 2008 R2 - one which is currently tested against. Thoughts? Ashutosh
[jira] [Commented] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973500#comment-13973500 ] Julian Hyde commented on HIVE-6361: --- Agreed. Expect a patch in about a week. Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Minimum supported versions for DB backing Metastore
we do: https://issues.apache.org/jira/browse/HIVE-6862 On Thu, Apr 17, 2014 at 3:12 PM, Brock Noland br...@cloudera.com wrote: Do we have ms-sql scripts these days? Last time I checked we did not. I think we need them to claim ms-sql support.
[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
[ https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973531#comment-13973531 ] Ashutosh Chauhan commented on HIVE-6924: I see. That's correct. +1 -- This message was sent by Atlassian JIRA (v6.2#6252)
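The distinction Sergey draws (hash function vs. conflict resolution) is about bit mixing. As an illustration only, and not the actual HIVE-6924 patch, here is Murmur3's well-known 32-bit finalization mix applied on top of a naive additive fold; the `hashBytes` helper is hypothetical:

```java
public class MurmurMixDemo {

    // Murmur3's 32-bit finalization mix: avalanches every input bit
    // across the whole hash word, unlike a simple additive hashCode.
    public static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical byte[] hashCode in the spirit of the JIRA: fold the
    // bytes in, then run the finalizer so nearby keys don't cluster
    // into the same HashMap buckets.
    public static int hashBytes(byte[] data) {
        int h = 0;
        for (byte b : data) {
            h = h * 31 + b;
        }
        return mix(h);
    }

    public static void main(String[] args) {
        // Sequential keys land in very different buckets after mixing.
        System.out.println(hashBytes(new byte[]{1}) != hashBytes(new byte[]{2}));
    }
}
```

The point of the mix step is that two keys differing in one low bit end up with roughly half their hash bits different, which is what keeps an open-addressed or bucketed table from clustering.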
[jira] [Created] (HIVE-6930) Beeline should nicely format timestamps when displaying results
Gwen Shapira created HIVE-6930: -- Summary: Beeline should nicely format timestamps when displaying results Key: HIVE-6930 URL: https://issues.apache.org/jira/browse/HIVE-6930 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Gwen Shapira When I have a timestamp column in my query, I get the results back as the bigint with number of seconds since epoch. Not very user friendly or readable. This means that all my queries need to include stuff like: select from_unixtime(cast(round(transaction_ts/1000) as bigint))... which is not too readable either :) Other SQL query tools automatically convert timestamps to some standard readable date format. They even let users specify the default formatting by setting a parameter (for example NLS_DATE_FORMAT for Oracle). I'd love to see something like that in beeline. -- This message was sent by Atlassian JIRA (v6.2#6252)
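A client-side sketch of the kind of formatting the ticket asks for. This is not a beeline feature; the class, method, and pattern are illustrative, using only standard java.time APIs:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EpochFormatDemo {

    // Render epoch milliseconds as a readable timestamp instead of a raw
    // long, the way other SQL clients do by default. The pattern plays the
    // role of an NLS_DATE_FORMAT-style user setting.
    public static String format(long epochMillis) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(0L));   // 1970-01-01 00:00:00
    }
}
```

A configurable pattern like this would replace the `from_unixtime(cast(round(ts/1000) as bigint))` boilerplate the reporter complains about.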
[jira] [Commented] (HIVE-6843) INSTR for UTF-8 returns incorrect position
[ https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973566#comment-13973566 ] Jason Dere commented on HIVE-6843: -- Should this also work for unicode characters which require more than one Java character? If you add these checks to TestGenericUDFUtils, the 2nd check fails:
{code}
Assert.assertEquals(3, GenericUDFUtils.findText(new Text("123\uD801\uDC00456"), new Text("\uD801\uDC00"), 0));
Assert.assertEquals(4, GenericUDFUtils.findText(new Text("123\uD801\uDC00456"), new Text("4"), 0));
{code}
This would require using String.codePointCount() on the indexOf() result. INSTR for UTF-8 returns incorrect position -- Key: HIVE-6843 URL: https://issues.apache.org/jira/browse/HIVE-6843 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.11.0, 0.12.0 Reporter: Clif Kranish Assignee: Szehon Ho Priority: Minor Attachments: HIVE-6843.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
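The String.codePointCount() suggestion can be sketched with a hypothetical helper (not Hive's actual GenericUDFUtils code): find the match with indexOf(), then convert the UTF-16 unit index into a code-point index so a supplementary character counts as one position.

```java
public class CodePointDemo {

    // Hypothetical helper: position of needle in haystack, counted in
    // code points rather than Java chars, which is what INSTR should
    // report for text containing surrogate pairs.
    public static int codePointIndex(String haystack, String needle) {
        int charIdx = haystack.indexOf(needle);
        if (charIdx < 0) {
            return -1;
        }
        // Convert the UTF-16 unit index into a code-point index.
        return haystack.codePointCount(0, charIdx);
    }

    public static void main(String[] args) {
        String s = "123\uD801\uDC00456";             // U+10400 occupies two chars
        System.out.println(codePointIndex(s, "4"));  // 4 (raw indexOf gives 5)
    }
}
```

This reproduces both expectations from the comment: the supplementary character is found at position 3, and the "4" after it at position 4 rather than 5.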
[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973615#comment-13973615 ] Anthony Hsu commented on HIVE-6835: --- What happens is Hive tries to build ObjectInspectorConverters from the partition schema to the table schema. If the partition schema is different from the table schema, you may get a ClassCastException like above. When you add new columns at the end, this is not a problem because these new columns are chopped off. See ObjectInspectorConverters:StructConverter:
{code}
int minFields = Math.min(inputFields.size(), outputFields.size());
fieldConverters = new ArrayList<Converter>(minFields);
{code}
It's only when you insert new columns at the beginning or in the middle that you might run into ClassCastExceptions. For the AvroSerDe, if it always uses the latest schema (which should be the table-level schema), Hive will not get confused when constructing its ObjectInspectorConverters. Then, later, when the AvroSerDe actually goes to read the Avro files, it can compare the latest schema with the (possibly old) schemas stored in the Avro data files themselves, and do the proper schema resolution, omitting fields or substituting default values, following the [schema resolution rules|http://avro.apache.org/docs/current/spec.html#Schema+Resolution].
-- This message was sent by Atlassian JIRA (v6.2#6252)
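The minFields trimming Anthony describes can be sketched in plain Java. This is a hypothetical simplification: the real StructConverter builds a per-field Converter from ObjectInspectors where this sketch just copies the value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinFieldsDemo {

    // Sketch of the minFields behavior in
    // ObjectInspectorConverters.StructConverter: converters exist only for
    // the first min(input, output) fields, so trailing extra columns on
    // either side are silently dropped rather than force-cast.
    public static List<Object> convert(List<Object> inputFields, int outputWidth) {
        int minFields = Math.min(inputFields.size(), outputWidth);
        List<Object> converted = new ArrayList<>(minFields);
        for (int i = 0; i < minFields; i++) {
            converted.add(inputFields.get(i));   // real code applies fieldConverters[i] here
        }
        return converted;
    }

    public static void main(String[] args) {
        // Old two-column partition row read against a three-column table
        // schema: both fields convert; the appended column is simply absent.
        System.out.println(convert(Arrays.<Object>asList("a", "b"), 3));   // [a, b]
    }
}
```

This positional pairing is why appending columns is safe but inserting a column at the beginning or in the middle misaligns field i with converter i, producing the ClassCastException in the report.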
[jira] [Updated] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Hsu updated HIVE-6835: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6916) Export/import inherit permissions from parent directory
[ https://issues.apache.org/jira/browse/HIVE-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973623#comment-13973623 ] Szehon Ho commented on HIVE-6916: - [~xuefuz] can you please help review this? Export/import inherit permissions from parent directory --- Key: HIVE-6916 URL: https://issues.apache.org/jira/browse/HIVE-6916 Project: Hive Issue Type: Bug Components: Security Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6916.patch Exporting a table to an external location and importing it into hive should set the table to have the permission of the parent directory, if the flag hive.warehouse.subdir.inherit.perms is set. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6430: --- Attachment: HIVE-6430.08.patch Fixed bugs, improved tests; TPCDS q27 can now run on the cluster I have access to (it previously failed with OOM even with 8Gb containers). Profiling shows the results are actually much better now, with little own time for the hashmap. MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 18936: HIVE-6430 MapJoin hash table has large memory overhead
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18936/ --- (Updated April 18, 2014, 1 a.m.) Review request for hive, Gopal V and Gunther Hagleitner. Changes --- Another iteration Repository: hive-git Description --- See JIRA Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e0e1339 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 142bfd8 ql/src/java/org/apache/hadoop/hive/ql/Driver.java bf9d4c1 ql/src/java/org/apache/hadoop/hive/ql/debug/Utils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 2b1438d ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 1104a2b ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 8854b19 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 9df425b ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 64f0be2 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinPersistableTableContainer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java 008a8db ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 988959f ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 55b7415 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java e392592 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java eef7656 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedColumnarSerDe.java d4be78d ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 118b339 
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestBytesBytesMultiHashMap.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java 65e3779 ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java 755d783 ql/src/test/queries/clientpositive/mapjoin_decimal.q b65a7be ql/src/test/queries/clientpositive/mapjoin_mapjoin.q 1eb95f6 ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 8350670 ql/src/test/results/clientpositive/tez/mapjoin_decimal.q.out 3c55b5c ql/src/test/results/clientpositive/tez/mapjoin_mapjoin.q.out 284cc03 serde/src/java/org/apache/hadoop/hive/serde2/ByteStream.java 73d9b29 serde/src/java/org/apache/hadoop/hive/serde2/WriteBuffers.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 5870884 serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java bab505e serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 6f344bb serde/src/java/org/apache/hadoop/hive/serde2/io/DateWritable.java 1f4ccdd serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java a99c7b4 serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 435d6c6 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 82c1263 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java b188c3f serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java caf3517 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 6c14081 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java 06d5c5e serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyPrimitive.java 868dd4c serde/src/test/org/apache/hadoop/hive/serde2/thrift_test/CreateSequenceFile.java 1fb49e5 Diff: https://reviews.apache.org/r/18936/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973654#comment-13973654 ] Anthony Hsu commented on HIVE-6835: --- On a side note: if you create an Avro table and store the schema in the TBLPROPERTIES -
{code}
CREATE TABLE ... TBLPROPERTIES ('avro.schema.literal'='...');
{code}
- everything works fine with partitions, because TBLPROPERTIES are NOT copied to the partition, so the partition ends up using the table-level TBLPROPERTIES when initializing the Avro SerDe. It's only when you store the schema in the SERDEPROPERTIES -
{code}
CREATE TABLE ... WITH SERDEPROPERTIES ('avro.schema.literal'='...');
{code}
- that problems arise. SERDEPROPERTIES DO get copied to the partitions, so if you later change the SERDEPROPERTIES stored at the table level, the SERDEPROPERTIES in the table and the partitions get out of sync, which sometimes leads to ClassCastExceptions in the AvroSerDe.

Reading of partitioned Avro data fails if partition schema does not match table schema -- Key: HIVE-6835 URL: https://issues.apache.org/jira/browse/HIVE-6835 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch

To reproduce:
{code}
create table testarray (a array<string>);
load data local inpath '/home/ahsu/test/array.txt' into table testarray;
# create partitioned Avro table with one array column
create table avroarray partitioned by (y string)
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}')
stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
insert into table avroarray partition(y=1) select * from testarray;
# add an int column with a default value of 0
alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}');
# fails with ClassCastException
select * from avroarray;
{code}
The select * fails with:
{code}
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6924) MapJoinKeyBytes::hashCode() should use Murmur hash
[ https://issues.apache.org/jira/browse/HIVE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973661#comment-13973661 ] Sergey Shelukhin commented on HIVE-6924: Will commit tomorrow MapJoinKeyBytes::hashCode() should use Murmur hash -- Key: HIVE-6924 URL: https://issues.apache.org/jira/browse/HIVE-6924 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6924.patch Existing hashCode is bad, causes HashMap to cluster -- This message was sent by Atlassian JIRA (v6.2#6252)
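The change proposed in HIVE-6924, replacing a weak hashCode with a Murmur-style hash, can be sketched as follows. This is a standalone Murmur3 (x86_32) implementation for illustration only; the class name is hypothetical, not Hive's actual MapJoinKeyBytes code.

```java
// Sketch of a Murmur3 (x86_32) hash suitable for a byte[]-backed hashCode.
// Illustrative only; not the actual MapJoinKeyBytes implementation.
class MurmurSketch {
    static int murmur32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = seed;
        int i = 0;
        // Body: mix the input four little-endian bytes at a time.
        while (data.length - i >= 4) {
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
            i += 4;
        }
        // Tail: mix the 1-3 trailing bytes, if any.
        int k = 0;
        switch (data.length - i) {
            case 3: k ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: k ^= data[i] & 0xff;
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        // Finalization: avalanche the bits so similar keys don't cluster.
        h ^= data.length;
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

A hashCode() override would then just return murmur32(keyBytes, someFixedSeed). The finalization steps are what address the clustering complaint: flipping a single input bit flips roughly half of the output bits, so nearby keys land in unrelated HashMap buckets.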
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973660#comment-13973660 ] Sergey Shelukhin commented on HIVE-6430: er, 72 MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have java hash table there. We can either use primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
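The primitive-friendly layout described in HIVE-6430, keys and values stored in flat arrays rather than one object per entry, can be sketched as a minimal open-addressing long-to-int map. This is an illustrative toy (fixed capacity, no resizing or deletion), not HPPC's or Hive's actual implementation.

```java
// Minimal open-addressing (linear probing) map from long keys to int values.
// No Entry objects: keys and values live in flat parallel arrays, which is
// the memory-layout idea described above. Illustrative sketch only.
class LongIntFlatMap {
    private final long[] keys;
    private final int[] values;
    private final boolean[] used;
    private final int mask;

    LongIntFlatMap(int capacityPow2) { // capacity must be a power of two
        keys = new long[capacityPow2];
        values = new int[capacityPow2];
        used = new boolean[capacityPow2];
        mask = capacityPow2 - 1;
    }

    private int slot(long key) {
        // Cheap multiplicative mix so sequential keys spread across slots.
        long h = key * 0x9E3779B97F4A7C15L;
        int i = (int) (h >>> 40) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask; // linear probe to the next slot
        }
        return i;
    }

    void put(long key, int value) {
        int i = slot(key);
        used[i] = true;
        keys[i] = key;
        values[i] = value;
    }

    /** Returns the value for key, or -1 if absent (illustrative convention). */
    int get(long key) {
        int i = slot(key);
        return used[i] ? values[i] : -1;
    }
}
```

Per entry this costs about 13 bytes plus load-factor slack, versus a java.util.HashMap entry, which needs a boxed Long key, a boxed Integer value, and an Entry object, each with its own object header, which is how "4 ints" balloons to several hundred bytes.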
[jira] [Created] (HIVE-6931) Windows unit test fixes
Jason Dere created HIVE-6931: Summary: Windows unit test fixes Key: HIVE-6931 URL: https://issues.apache.org/jira/browse/HIVE-6931 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Jason Dere Assignee: Jason Dere A few misc fixes for some of the unit tests on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Minimum supported versions for DB backing Metastore
+1 Sounds good to me. On Thu, Apr 17, 2014 at 2:58 PM, Ashutosh Chauhan hashut...@apache.org wrote: I don't think we have documented anywhere what versions of mysql / postgres / oracle / ms-sql we are supporting. It will be good to document those. I propose the following versions:
* Derby - 10.10.1.1 - defined in hive's pom, so all unit tests run with it
* MySQL - 5.6.17 - minimum version supported by the MySQL community
* Postgres - 9.1.13 - has support for "create table if not exists", which is good to have
* Oracle - 11g - oldest Oracle version available to download from their site
* MSSQL Server - 2008 R2 - the one currently tested against
Thoughts? Ashutosh -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (HIVE-6931) Windows unit test fixes
[ https://issues.apache.org/jira/browse/HIVE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6931: - Attachment: HIVE-6931.1.patch Patch v1: - Remove setAuxJars() call which was breaking Minimr tests - Refactor common code between QTestUtil/WindowsPathUtil - TestExecDriver should initialize tmpdir after converting Windows paths - Fix a couple of q file tests Windows unit test fixes --- Key: HIVE-6931 URL: https://issues.apache.org/jira/browse/HIVE-6931 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6931.1.patch A few misc fixes for some of the unit tests on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 20472: HIVE-6931 Windows unit test fixes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20472/ --- Review request for hive and Thejas Nair. Bugs: HIVE-6931 https://issues.apache.org/jira/browse/HIVE-6931 Repository: hive-git Description --- Remove setAuxJars() call which was breaking Minimr tests Refactor common code between QTestUtil/WindowsPathUtil TestExecDriver should initialize tmpdir after converting Windows paths Fix a couple of q file tests Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java d6e33f8 pom.xml 426dca8 ql/src/test/org/apache/hadoop/hive/ql/WindowsPathUtil.java 131260b ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java b548672 ql/src/test/queries/clientpositive/scriptfile1_win.q 0008ae5 ql/src/test/queries/clientpositive/tez_insert_overwrite_local_directory_1.q d7a652f ql/src/test/results/clientpositive/scriptfile1_win.q.out dfaa057 Diff: https://reviews.apache.org/r/20472/diff/ Testing --- Thanks, Jason Dere
[jira] [Updated] (HIVE-6931) Windows unit test fixes
[ https://issues.apache.org/jira/browse/HIVE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6931: - Status: Patch Available (was: Open) Windows unit test fixes --- Key: HIVE-6931 URL: https://issues.apache.org/jira/browse/HIVE-6931 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6931.1.patch A few misc fixes for some of the unit tests on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF
[ https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973698#comment-13973698 ] Sun Rui commented on HIVE-6922: --- [~jdere] I thought the bug was trivial (just an inadvertently missing null check), so a testcase for it would be trivial. However, if you still prefer a testcase, I can add it.

NullPointerException in collect_set() UDAF -- Key: HIVE-6922 URL: https://issues.apache.org/jira/browse/HIVE-6922 Project: Hive Issue Type: Bug Components: UDF Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-6922.patch

Steps to reproduce the bug:
{noformat}
create table temp(key int, value string);
-- leave the table empty
select collect_set(key) from temp where key=0;

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318)
	... 7 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
	... 9 more
{noformat}
The root cause is that in GenericUDAFMkCollectionEvaluator.merge(), partialResult can be null but is not validated before it is used.
{code}
List<Object> partialResult = (ArrayList<Object>) internalMergeOI.getList(partial);
for (Object i : partialResult) {
  putIntoCollection(i, myagg);
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6932) hive README needs update
[ https://issues.apache.org/jira/browse/HIVE-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973707#comment-13973707 ] Thejas M Nair commented on HIVE-6932: - Also needing update is the requirements section. We should include Java 1.7. hive README needs update Key: HIVE-6932 URL: https://issues.apache.org/jira/browse/HIVE-6932 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair It needs to be updated to include Tez as a runtime. Also, it talks about average latency being in minutes, which is very misleading. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6932) hive README needs update
Thejas M Nair created HIVE-6932: --- Summary: hive README needs update Key: HIVE-6932 URL: https://issues.apache.org/jira/browse/HIVE-6932 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair It needs to be updated to include Tez as a runtime. Also, it talks about average latency being in minutes, which is very misleading. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6932) hive README needs update
[ https://issues.apache.org/jira/browse/HIVE-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973708#comment-13973708 ] Thejas M Nair commented on HIVE-6932: - Also add MS SQL in databases supported (for 0.14) release. hive README needs update Key: HIVE-6932 URL: https://issues.apache.org/jira/browse/HIVE-6932 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair It needs to be updated to include Tez as a runtime. Also, it talks about average latency being in minutes, which is very misleading. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HIVE-6932) hive README needs update
[ https://issues.apache.org/jira/browse/HIVE-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973708#comment-13973708 ] Thejas M Nair edited comment on HIVE-6932 at 4/18/14 2:09 AM: -- Also add Microsoft SQL Server in databases supported (for 0.14) release. was (Author: thejas): Also add MS SQL in databases supported (for 0.14) release. hive README needs update Key: HIVE-6932 URL: https://issues.apache.org/jira/browse/HIVE-6932 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair It needs to be updated to include Tez as a runtime. Also, it talks about average latency being in minutes, which is very misleading. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF
[ https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973710#comment-13973710 ] Xuefu Zhang commented on HIVE-6922: --- Yes, adding the null check is trivial, but I guess it's more important to know why the variable might be null. Otherwise, the null check might just hide another bug.

NullPointerException in collect_set() UDAF -- Key: HIVE-6922 URL: https://issues.apache.org/jira/browse/HIVE-6922 Project: Hive Issue Type: Bug Components: UDF Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-6922.patch

Steps to reproduce the bug:
{noformat}
create table temp(key int, value string);
-- leave the table empty
select collect_set(key) from temp where key=0;

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318)
	... 7 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
	... 9 more
{noformat}
The root cause is that in GenericUDAFMkCollectionEvaluator.merge(), partialResult can be null but is not validated before it is used.
{code}
List<Object> partialResult = (ArrayList<Object>) internalMergeOI.getList(partial);
for (Object i : partialResult) {
  putIntoCollection(i, myagg);
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [VOTE] Apache Hive 0.13.0 Release Candidate 2
+1 - Verified the md5 checksums and gpg keys - Checked LICENSE, README.txt, NOTICE, RELEASE_NOTES.txt files - Built src tar.gz - Ran local mode queries with new build. I had run the unit test suite with rc1 and they looked good. On Tue, Apr 15, 2014 at 2:06 PM, Harish Butani rhbut...@apache.org wrote: Apache Hive 0.13.0 Release Candidate 2 is available here: http://people.apache.org/~rhbutani/hive-0.13.0-candidate-2 Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1011 Source tag for RCN is at: https://svn.apache.org/repos/asf/hive/tags/release-0.13.0-rc2/ Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks.
[jira] [Updated] (HIVE-6913) Hive unable to find the hashtable file during complex multi-staged map join
[ https://issues.apache.org/jira/browse/HIVE-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6913: -- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks Brock for the fix. Hive unable to find the hashtable file during complex multi-staged map join --- Key: HIVE-6913 URL: https://issues.apache.org/jira/browse/HIVE-6913 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-6913.patch, HIVE-6913.patch If a query has multiple mapjoins and one of the tables to be mapjoined is empty, the query can fail with a "no such file or directory" error when looking for the hashtable. This is because when we generate a dummy hash table, we do not close the TableScan (TS) operator for that table. Additionally, HashTableSinkOperator (HTSO) outputs its hash tables in the closeOp method. However, when close is called on HTSO we check to ensure that all parents are closed: https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L333 which is not true in this case, because the TS operator for the empty table was never closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
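The close-ordering invariant described in HIVE-6913, closeOp() only runs once every parent operator has closed, can be sketched with a toy operator DAG. Class, field, and method names here are illustrative, not Hive's actual Operator API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the invariant described above: an operator's closeOp() runs
// only once ALL of its parents have closed. Illustrative names only.
class OpSketch {
    final String name;
    final List<OpSketch> parents = new ArrayList<>();
    final List<OpSketch> children = new ArrayList<>();
    boolean closed = false;
    boolean closeOpRan = false;

    OpSketch(String name) { this.name = name; }

    static void link(OpSketch parent, OpSketch child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    void close() {
        if (closed) {
            return;
        }
        for (OpSketch p : parents) {
            if (!p.closed) {
                return; // a parent is still open: do NOT run closeOp() yet
            }
        }
        closed = true;
        closeOpRan = true; // closeOp(): e.g. an HTSO would flush its hash table here
        for (OpSketch c : children) {
            c.close();
        }
    }
}
```

In this model, if the dummy-table TS is never closed (as in the bug), the sink's close() returns early forever, its closeOp() never flushes the hash table to disk, and the downstream map join then fails looking for a file that was never written.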
[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema
[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973722#comment-13973722 ] Xuefu Zhang commented on HIVE-6835: --- [~erwaman] Thanks for the explanation. Now I see where the problem is. SERDEPROPERTIES and TBLPROPERTIES serve different purposes. I'm curious why a user would put avro.schema.literal in the serde properties, as this is table-specific and should be put in TBLPROPERTIES. SERDEPROPERTIES, on the other hand, are used to control serde behavior (at the plugin level rather than the table level), such as the field delimiter, which doesn't necessarily vary from table to table. If you check the AvroSerde documentation, the schema is specified in TBLPROPERTIES: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe. Thus, it seems that this fix is for an invalid use case. What's your thought on this?

Reading of partitioned Avro data fails if partition schema does not match table schema -- Key: HIVE-6835 URL: https://issues.apache.org/jira/browse/HIVE-6835 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch

To reproduce:
{code}
create table testarray (a array<string>);
load data local inpath '/home/ahsu/test/array.txt' into table testarray;
# create partitioned Avro table with one array column
create table avroarray partitioned by (y string)
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}')
stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
insert into table avroarray partition(y=1) select * from testarray;
# add an int column with a default value of 0
alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}');
# fails with ClassCastException
select * from avroarray;
{code}
The select * fails with:
{code}
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6922) NullPointerException in collect_set() UDAF
[ https://issues.apache.org/jira/browse/HIVE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973727#comment-13973727 ] Sun Rui commented on HIVE-6922: --- [~xuefuz] The reason the variable can be null is that the table is empty, so there is no input data.
{code}
/**
 * Merge with partial aggregation result. NOTE: null might be passed in case
 * there is no input data.
 *
 * @param partial
 *          The partial aggregation result.
 */
public abstract void merge(AggregationBuffer agg, Object partial) throws HiveException;
{code}
"Null might be passed in case there is no input data" is stated in the description of merge() in GenericUDAFEvaluator. I found existing examples of checking whether partial is null. GenericUDAFComputeStats is one:
{code}
@Override
public void merge(AggregationBuffer agg, Object partial) throws HiveException {
  if (partial != null) {
    ...
  }
}
{code}

NullPointerException in collect_set() UDAF -- Key: HIVE-6922 URL: https://issues.apache.org/jira/browse/HIVE-6922 Project: Hive Issue Type: Bug Components: UDF Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-6922.patch

Steps to reproduce the bug:
{noformat}
create table temp(key int, value string);
-- leave the table empty
select collect_set(key) from temp where key=0;

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:326)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:471)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:318)
	... 7 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator.merge(GenericUDAFMkCollectionEvaluator.java:140)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:186)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
	... 9 more
{noformat}
The root cause is that in GenericUDAFMkCollectionEvaluator.merge(), partialResult can be null but is not validated before it is used.
{code}
List<Object> partialResult = (ArrayList<Object>) internalMergeOI.getList(partial);
for (Object i : partialResult) {
  putIntoCollection(i, myagg);
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
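The fix under discussion, a null guard at the top of merge(), can be sketched with a simplified collect_set-style evaluator. The class and method names below are illustrative stand-ins, not the actual GenericUDAFMkCollectionEvaluator API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for the aggregation buffer; illustrative names only,
// not Hive's actual AggregationBuffer type.
class CollectSetBuffer {
    final Set<Object> container = new HashSet<>();
}

class CollectSetSketch {
    // Per the GenericUDAFEvaluator contract quoted above, `partial` may be
    // null when a task produced no input rows, so guard before iterating.
    static void merge(CollectSetBuffer agg, List<Object> partial) {
        if (partial == null) {
            return; // no input data on this side of the merge: nothing to add
        }
        for (Object o : partial) {
            agg.container.add(o);
        }
    }
}
```

This is exactly the pattern GenericUDAFComputeStats already follows: merging a null partial is a no-op rather than an NPE, so an empty table produces an empty set instead of a failed reducer.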
[jira] [Commented] (HIVE-6843) INSTR for UTF-8 returns incorrect position
[ https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973730#comment-13973730 ] Szehon Ho commented on HIVE-6843: --- Thanks for the review. As I understand it, you are passing a string literal to the Text constructor, so it is not interpreting \uD801 as one char; there are actually 6 chars there: '\', 'u', 'D', '8', '0', '1'. I tried the following test and it seemed to work:
{code}
char[] chararray = new char[] {'1', '2', '3', '\uD801', '\uDC00', '4', '5', '6'};
String str = new String(chararray);
Assert.assertEquals(5, GenericUDFUtils.findText(new Text(str), new Text("4"), 0));
{code}
I guess the second check was supposed to be 5, not 4. INSTR for UTF-8 returns incorrect position -- Key: HIVE-6843 URL: https://issues.apache.org/jira/browse/HIVE-6843 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.11.0, 0.12.0 Reporter: Clif Kranish Assignee: Szehon Ho Priority: Minor Attachments: HIVE-6843.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
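The surrogate-pair pitfall behind HIVE-6843 can be demonstrated with plain java.lang.String: '\uD801\uDC00' is one code point (U+10400) but two UTF-16 code units, so a position differs depending on which unit is counted. This is a self-contained illustration of the index arithmetic, not the GenericUDFUtils.findText code itself; the helper name is hypothetical.

```java
// Demonstrates why positions differ for strings containing surrogate pairs:
// String.indexOf counts UTF-16 code units, while users typically expect
// character (code point) positions, which is what INSTR should report.
class SurrogateDemo {
    static int codePointIndexOf(String haystack, String needle) {
        int unitIndex = haystack.indexOf(needle);
        if (unitIndex < 0) {
            return -1;
        }
        // Convert the code-unit index to a code-point index: a surrogate
        // pair counts as two code units but only one code point.
        return haystack.codePointCount(0, unitIndex);
    }
}
```

For the string from the test above, "123" + U+10400 + "456", the '4' sits at code-unit index 5 (the pair occupies indices 3 and 4) but at code-point index 4, which is exactly the off-by-one gap INSTR was exhibiting for UTF-8 supplementary characters.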