[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliot West updated HIVE-10165:
---
Attachment: HIVE-10165.9.patch

Improve hive-hcatalog-streaming extensibility and support updates and deletes.
--
Key: HIVE-10165
URL: https://issues.apache.org/jira/browse/HIVE-10165
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
Labels: streaming_api
Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png

h3. Overview

I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.

h3. Motivation

We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small compared with the number of records contained in a partition. This approach results in a number of operational issues:
* An excessive amount of write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions), the scope for contention is high.

I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.

h3. Benefits

* Enables the creation of large-scale dataset merge processes.
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
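The classification step described above (group by key, then decide whether each row is an insert, update, or delete) can be sketched as follows. All class and method names here are illustrative only; this is not part of any Hive or HCatalog API, and real change-data merges would also carry the retained {{RecordIdentifier}} alongside each key:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MergeClassifier {
    enum Op { INSERT, UPDATE, DELETE }

    // Keys present only in the modified set are INSERTs, keys in both sets are
    // UPDATEs, and ground-truth keys absent from the modified set are DELETEs.
    static Map<String, Op> classify(Set<String> groundTruthKeys, Set<String> modifiedKeys) {
        Map<String, Op> ops = new TreeMap<>();
        for (String key : modifiedKeys) {
            ops.put(key, groundTruthKeys.contains(key) ? Op.UPDATE : Op.INSERT);
        }
        for (String key : groundTruthKeys) {
            if (!modifiedKeys.contains(key)) {
                ops.put(key, Op.DELETE);
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        // "a" only in ground truth, "b" in both, "c" only in the modified set.
        System.out.println(classify(Set.of("a", "b"), Set.of("b", "c")));
        // prints {a=DELETE, b=UPDATE, c=INSERT}
    }
}
```

Only the rows classified here would then be written to the transactional table, instead of rewriting every affected partition.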
[jira] [Commented] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.
[ https://issues.apache.org/jira/browse/HIVE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596300#comment-14596300 ]

Hive QA commented on HIVE-11062:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741052/HIVE-11062.1.patch

{color:green}SUCCESS:{color} +1 9013 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4335/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4335/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4335/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741052 - PreCommit-HIVE-TRUNK-Build

Remove Exception stacktrace from Log.info when ACL is not supported.
--
Key: HIVE-11062
URL: https://issues.apache.org/jira/browse/HIVE-11062
Project: Hive
Issue Type: Bug
Components: Logging
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
Attachments: HIVE-11062.1.patch

When logging is set to INFO, Extended ACL is enabled, and the file system does not support ACLs, there are a lot of exception stack traces in the log file. Although the condition is benign, it can easily frustrate users. We should log the exception at DEBUG level instead.
Currently, the exception in the log looks like:
{noformat}
2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path s3a://yibing/hive does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
    at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
    at org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
    at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
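A minimal sketch of the change being proposed, written here with java.util.logging rather than Hive's actual logging facade (class and method names are illustrative assumptions): the one-line summary stays at INFO, while the full stack trace is demoted to the debug level so it no longer floods INFO-level logs.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class AclLogging {
    private static final Logger LOG = Logger.getLogger("shims");

    // Builds the single INFO line; the stack trace is no longer part of it.
    static String infoLine(Exception e, String path) {
        return "Skipping ACL inheritance: file system for path " + path
                + " does not support ACLs: " + e.getMessage();
    }

    static void logAclFailure(Exception e, String path) {
        LOG.info(infoLine(e, path));
        // The full stack trace only appears when debug-level logging is on.
        LOG.log(Level.FINE, "getAclStatus failed", e);
    }

    public static void main(String[] args) {
        logAclFailure(new UnsupportedOperationException(
                "S3AFileSystem doesn't support getAclStatus"), "s3a://bucket/warehouse");
    }
}
```

With this shape, an operator running at INFO sees one line per skipped path instead of a 26-frame trace.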
[jira] [Commented] (HIVE-10594) Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596295#comment-14596295 ]

Chao Sun commented on HIVE-10594:
-
+1

Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]
--
Key: HIVE-10594
URL: https://issues.apache.org/jira/browse/HIVE-10594
Project: Hive
Issue Type: Sub-task
Components: Spark
Affects Versions: 1.1.0
Reporter: Chao Sun
Assignee: Xuefu Zhang
Attachments: HIVE-10594.1-spark.patch

Reporting a problem found by one of the HoS users. Currently, if a user is running Beeline on a different host than HS2 and didn't do kinit on the HS2 host, then they may get the following error:
{code}
2015-04-29 15:49:34,614 INFO org.apache.hive.spark.client.SparkClientImpl: 15/04/29 15:49:34 WARN UserGroupInformation: PriviledgedActionException as:hive (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2015-04-29 15:49:34,652 INFO org.apache.hive.spark.client.SparkClientImpl: Exception in thread main java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: secure-hos-1.ent.cloudera.com/10.20.77.79; destination host is: secure-hos-1.ent.cloudera.com:8032;
2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.Client.call(Client.java:1472)
2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.Client.call(Client.java:1399)
2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl: at com.sun.proxy.$Proxy11.getClusterMetrics(Unknown Source)
2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at java.lang.reflect.Method.invoke(Method.java:606)
2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at com.sun.proxy.$Proxy12.getClusterMetrics(Unknown Source)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:461)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:49)
2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl: at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:90)
2015-04-29 15:49:34,658 INFO org.apache.hive.spark.client.SparkClientImpl: at
[jira] [Updated] (HIVE-11073) ORC FileDump utility ignores errors when writing output
[ https://issues.apache.org/jira/browse/HIVE-11073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliot West updated HIVE-11073:
---
Attachment: HIVE-11073.1.patch

ORC FileDump utility ignores errors when writing output
---
Key: HIVE-11073
URL: https://issues.apache.org/jira/browse/HIVE-11073
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
Priority: Minor
Labels: cli, orc
Attachments: HIVE-11073.1.patch

The Hive command line provides the {{--orcfiledump}} utility for dumping data contained within ORC files, specifically when using the {{-d}} option. Generally, it is useful to be able to pipe the extracted data into other commands and utilities to transform and control it so that it is more manageable for the CLI user. A classic example is {{less}}. When such command pipelines are constructed today, the underlying implementation in {{org.apache.hadoop.hive.ql.io.orc.FileDump#printJsonData}} is oblivious to errors occurring when writing to its output stream. Such errors are commonplace when a user issues {{Ctrl+C}} to kill the leaf process. In this event the leaf process terminates immediately, but the Hive CLI process continues to execute until the full contents of the ORC file have been read. By making {{FileDump}} considerate of output stream errors, the process will terminate as soon as the destination process exits (i.e. when the user kills {{less}}) and control will be returned to the user as expected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
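One possible shape for the fix, sketched with hypothetical names (the actual patch may differ): {{java.io.PrintStream}} swallows write failures and records them internally, so a dump loop can poll {{checkError()}} after each row and stop as soon as the pipe breaks, instead of scanning the whole file.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.List;

public class DumpLoop {
    // Prints rows until the sink reports an error; returns the number of rows
    // successfully written. PrintStream.checkError() returns true once any
    // IOException has occurred on the underlying stream (e.g. a broken pipe
    // after the user quits `less`), which is our cue to stop reading.
    static int printAll(Iterable<String> rows, PrintStream out) {
        int written = 0;
        for (String row : rows) {
            out.println(row);
            if (out.checkError()) {
                break;
            }
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        // An in-memory sink never fails, so all rows are written.
        PrintStream sink = new PrintStream(new ByteArrayOutputStream());
        System.out.println(printAll(List.of("a", "b", "c"), sink)); // prints 3
    }
}
```

The same polling pattern lets control return to the user immediately after the destination process exits, which is exactly the behaviour the issue asks for.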
[jira] [Commented] (HIVE-11073) ORC FileDump utility ignores errors when writing output
[ https://issues.apache.org/jira/browse/HIVE-11073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596292#comment-14596292 ]

Alan Gates commented on HIVE-11073:
---
+1

ORC FileDump utility ignores errors when writing output
---
Key: HIVE-11073
URL: https://issues.apache.org/jira/browse/HIVE-11073
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
Priority: Minor
Labels: cli, orc
Attachments: HIVE-11073.1.patch

The Hive command line provides the {{--orcfiledump}} utility for dumping data contained within ORC files, specifically when using the {{-d}} option. Generally, it is useful to be able to pipe the extracted data into other commands and utilities to transform and control it so that it is more manageable for the CLI user. A classic example is {{less}}. When such command pipelines are constructed today, the underlying implementation in {{org.apache.hadoop.hive.ql.io.orc.FileDump#printJsonData}} is oblivious to errors occurring when writing to its output stream. Such errors are commonplace when a user issues {{Ctrl+C}} to kill the leaf process. In this event the leaf process terminates immediately, but the Hive CLI process continues to execute until the full contents of the ORC file have been read. By making {{FileDump}} considerate of output stream errors, the process will terminate as soon as the destination process exits (i.e. when the user kills {{less}}) and control will be returned to the user as expected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.
[ https://issues.apache.org/jira/browse/HIVE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongzhi Chen updated HIVE-11062:
Attachment: HIVE-11062.1.patch

Remove Exception stacktrace from Log.info when ACL is not supported.
--
Key: HIVE-11062
URL: https://issues.apache.org/jira/browse/HIVE-11062
Project: Hive
Issue Type: Bug
Components: Logging
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
Attachments: HIVE-11062.1.patch

When logging is set to INFO, Extended ACL is enabled, and the file system does not support ACLs, there are a lot of exception stack traces in the log file. Although the condition is benign, it can easily frustrate users. We should log the exception at DEBUG level instead. Currently, the exception in the log looks like:
{noformat}
2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path s3a://yibing/hive does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
    at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
    at org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
    at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11073) ORC FileDump utility ignores errors when writing output
[ https://issues.apache.org/jira/browse/HIVE-11073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliot West updated HIVE-11073:
---
Summary: ORC FileDump utility ignores errors when writing output (was: ORC FileDump utility ignore errors when writing output)

ORC FileDump utility ignores errors when writing output
---
Key: HIVE-11073
URL: https://issues.apache.org/jira/browse/HIVE-11073
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
Priority: Minor
Labels: cli, orc

The Hive command line provides the {{--orcfiledump}} utility for dumping data contained within ORC files, specifically when using the {{-d}} option. Generally, it is useful to be able to pipe the extracted data into other commands and utilities to transform and control it so that it is more manageable for the CLI user. A classic example is {{less}}. When such command pipelines are constructed today, the underlying implementation in {{org.apache.hadoop.hive.ql.io.orc.FileDump#printJsonData}} is oblivious to errors occurring when writing to its output stream. Such errors are commonplace when a user issues {{Ctrl+C}} to kill the leaf process. In this event the leaf process terminates immediately, but the Hive CLI process continues to execute until the full contents of the ORC file have been read. By making {{FileDump}} considerate of output stream errors, the process will terminate as soon as the destination process exits (i.e. when the user kills {{less}}) and control will be returned to the user as expected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11060) Make test windowing.q robust
[ https://issues.apache.org/jira/browse/HIVE-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596195#comment-14596195 ]

Jesus Camacho Rodriguez commented on HIVE-11060:
[~ashutoshc], the test failures are not related to this patch. Thanks

Make test windowing.q robust
--
Key: HIVE-11060
URL: https://issues.apache.org/jira/browse/HIVE-11060
Project: Hive
Issue Type: Bug
Components: Tests
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Attachments: HIVE-11060.01.patch, HIVE-11060.patch

Add partition / order by clauses in the over clause to make the result set deterministic.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naveen Gangam updated HIVE-7193:
Attachment: HIVE-7193.6.patch

Attaching a new patch with a doc change (description of a parameter in HiveConf), per Lefty's suggestion. No real code changes.

Hive should support additional LDAP authentication parameters
-
Key: HIVE-7193
URL: https://issues.apache.org/jira/browse/HIVE-7193
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.6.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx

Currently Hive has only the following parameters for LDAP authentication in HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of Hive LDAP authentication, like the following:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
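A hive-site.xml sketch of what such a configuration might look like. The first two properties are the existing ones quoted above; the remaining property names are assumptions illustrating the requested parameters, not confirmed names from the patch:

```xml
<!-- Existing LDAP authentication settings. -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>

<!-- Illustrative names for the requested additions. -->
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userFilter</name>
  <value>user1,user2</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupFilter</name>
  <value>group1,group2,group3</value>
</property>
```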
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596013#comment-14596013 ]

Vaibhav Gumashta commented on HIVE-10895:
-
[~aihuaxu] I'm OOO for the next few days. Please feel free to take over if you have a potential solution. If you can wait, I plan to work on this after 07/05. Thanks for checking back.

ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
---
Key: HIVE-10895
URL: https://issues.apache.org/jira/browse/HIVE-10895
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
Reporter: Takahiko Saito
Assignee: Vaibhav Gumashta
Attachments: HIVE-10895.1.patch

During testing, we've noticed the Oracle db running out of cursors. Might be related to this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.
[ https://issues.apache.org/jira/browse/HIVE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596382#comment-14596382 ] Yongzhi Chen commented on HIVE-11062: - [~spena], could you review the change? Thanks Remove Exception stacktrace from Log.info when ACL is not supported. Key: HIVE-11062 URL: https://issues.apache.org/jira/browse/HIVE-11062 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-11062.1.patch When logging set to info, Extended ACL Enabled and the file system does not support ACL, there are a lot of Exception stack trace in the log file. Although it is benign, it can easily make users frustrated. We should set the level to show the Exception in debug. Current, the Exception in the log looks like: {noformat} 2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path s3a://yibing/hive does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429) at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729) at org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786) at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694) at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638) at 
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145) at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70) at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoyu Tang updated HIVE-7193:
--
Labels: TODOC1.3 TODOC2.0 (was: )

Hive should support additional LDAP authentication parameters
-
Key: HIVE-7193
URL: https://issues.apache.org/jira/browse/HIVE-7193
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
Labels: TODOC1.3, TODOC2.0
Fix For: 1.3.0, 2.0.0
Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.6.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx

Currently Hive has only the following parameters for LDAP authentication in HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of Hive LDAP authentication, like the following:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11074) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-11074:
---
Component/s: Tests

Update tests for HIVE-9302 after removing binaries
--
Key: HIVE-11074
URL: https://issues.apache.org/jira/browse/HIVE-11074
Project: Hive
Issue Type: Bug
Components: Tests
Affects Versions: 1.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596613#comment-14596613 ]

Hive QA commented on HIVE-10165:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741056/HIVE-10165.9.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9104 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4337/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4337/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4337/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741056 - PreCommit-HIVE-TRUNK-Build

Improve hive-hcatalog-streaming extensibility and support updates and deletes.
--
Key: HIVE-10165
URL: https://issues.apache.org/jira/browse/HIVE-10165
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
Labels: streaming_api
Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png

h3. Overview

I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.

h3. Motivation

We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small compared with the number of records contained in a partition. This approach results in a number of operational issues:
* An excessive amount of write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions), the scope for contention is high.

I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.

h3. Benefits

* Enables the creation of large-scale dataset merge processes.
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (HIVE-8190) LDAP user match for authentication on hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-8190. - Resolution: Fixed Fix Version/s: 2.0.0 1.3.0 Hadoop Flags: Reviewed A more general fix for this issue has been included in HIVE-7193, which adds filter support for LDAP users and groups. Users can configure the following properties with multiple colon-separated patterns for the DNs under which users/groups can be located in LDAP. hive.server2.authentication.ldap.groupDNPattern hive.server2.authentication.ldap.userDNPattern ex: uid=%s,ou=Users,DC=domain,DC=com:CN=%s,CN=Users,DC=domain,DC=com uid=%s,ou=Groups,DC=domain,DC=com:CN=%s,CN=Groups,DC=domain,DC=com Please provide any feedback you have on the new features. Thanks. LDAP user match for authentication on hiveserver2 - Key: HIVE-8190 URL: https://issues.apache.org/jira/browse/HIVE-8190 Project: Hive Issue Type: Improvement Components: Authorization, Clients Affects Versions: 0.13.1 Environment: Centos 6.5 Reporter: LINTE Assignee: Naveen Gangam Fix For: 1.3.0, 2.0.0 Some LDAP directories use CN rather than UID as the user component. So when you try to authenticate, the LDAP authentication module of Hive attempts to bind with the following string: uid=$login,basedn. Some AD deployments have user objects identified by cn rather than uid, so it is important to be able to customize the kind of objects that the authentication module looks for in LDAP. As an example, in the Knox LDAP module configuration the parameter main.ldapRealm.userDnTemplate can be configured to look for uid ('uid={0}, basedn') or cn ('cn={0}, basedn'). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
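The colon-separated DN pattern mechanism described above amounts to substituting the login name into each {{%s}} placeholder in turn. A minimal sketch of that expansion (illustrative only, not Hive's actual implementation class):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of expanding colon-separated DN patterns with a login name. */
public class DnPatternExpander {

    /**
     * Substitutes the given user name for the %s placeholder in each
     * colon-separated DN pattern, producing the candidate bind DNs to try
     * in order. DNs themselves contain commas, which is why the pattern
     * list separator is a colon rather than a comma.
     */
    public static List<String> candidateDns(String patterns, String user) {
        List<String> dns = new ArrayList<>();
        for (String pattern : patterns.split(":")) {
            dns.add(String.format(pattern, user));
        }
        return dns;
    }
}
```

For example, expanding the userDNPattern value shown above with login {{bob}} would yield {{uid=bob,ou=Users,DC=domain,DC=com}} and {{CN=bob,CN=Users,DC=domain,DC=com}} as the bind candidates.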
[jira] [Updated] (HIVE-11074) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11074: --- Attachment: HIVE-11074.patch Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11074 URL: https://issues.apache.org/jira/browse/HIVE-11074 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11074.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596679#comment-14596679 ] Aihua Xu commented on HIVE-10895: - [~vgumashta] I was looking into two solutions. 1. A wrapper class for the JDO query result which would dispose of the resources automatically when the wrapper is garbage collected. However, this approach could still cause issues because resources such as cursors may not be released soon enough. 2. Copy the query result into a new list (which forces iteration of the result), return that new list, and then call query.closeAll() immediately. I will work on the second approach. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
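The second approach (materialise the lazily-iterated result, then release the query eagerly) can be sketched generically. {{CloseableQuery}} below is a stand-in interface invented for the example, not the actual JDO {{Query}} class; the point is the copy-then-close shape, which releases database cursors immediately instead of waiting for garbage collection.

```java
import java.util.ArrayList;
import java.util.List;

/** Generic sketch of approach 2: copy results, then release the resource eagerly. */
public class EagerClose {

    /** Stand-in for a JDO Query holding database resources such as cursors. */
    public interface CloseableQuery<T> {
        Iterable<T> execute();
        void closeAll();   // releases cursors and other db resources
    }

    /**
     * Copies the lazily-iterated result into a plain list so that closeAll()
     * can be called immediately. The finally block guarantees release even
     * if iteration fails part-way through.
     */
    public static <T> List<T> executeAndClose(CloseableQuery<T> query) {
        List<T> copy = new ArrayList<>();
        try {
            for (T row : query.execute()) {
                copy.add(row);
            }
        } finally {
            query.closeAll();
        }
        return copy;
    }
}
```

The returned list is detached from the query, so callers can keep using it after the underlying cursors are gone.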
[jira] [Commented] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596692#comment-14596692 ] Laljo John Pullokkaran commented on HIVE-11037: --- Add a note explaining why the vertex comparator uses string comparison. Otherwise looks good. +1 conditional on QA run. HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, HIVE-11037.06.patch In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would like to make it run by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11074) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11074: --- Attachment: HIVE-11074.patch [~hsubramaniyan], could you check the patch? It seems I added two test files in the wrong place in HIVE-11041, which is causing TestSessionState to fail. Thanks Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11074 URL: https://issues.apache.org/jira/browse/HIVE-11074 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11074) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11074: --- Attachment: (was: HIVE-11074.patch) Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11074 URL: https://issues.apache.org/jira/browse/HIVE-11074 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11037: --- Attachment: HIVE-11037.06.patch Addresses the vertex ordering issue. [~jpullokkaran], could you please take a look? HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, HIVE-11037.06.patch In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would like to make it run by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9880) Support configurable username attribute for HiveServer2 LDAP authentication
[ https://issues.apache.org/jira/browse/HIVE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-9880. - Resolution: Fixed Fix Version/s: 2.0.0 1.3.0 Hadoop Flags: Reviewed A more general fix for this issue has been included in HIVE-7193, which adds filter support for LDAP users and groups. Users can configure the following properties with multiple colon-separated patterns for the DNs under which users/groups can be located in LDAP. hive.server2.authentication.ldap.groupDNPattern hive.server2.authentication.ldap.userDNPattern ex: uid=%s,ou=Users,DC=domain,DC=com:CN=%s,CN=Users,DC=domain,DC=com uid=%s,ou=Groups,DC=domain,DC=com:CN=%s,CN=Groups,DC=domain,DC=com Please provide any feedback you have on the new features. Thanks. Support configurable username attribute for HiveServer2 LDAP authentication --- Key: HIVE-9880 URL: https://issues.apache.org/jira/browse/HIVE-9880 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Jaime Murillo Assignee: Naveen Gangam Fix For: 1.3.0, 2.0.0 Attachments: HIVE-9880-1.patch OpenLDAP requires that, when bind authenticating, the DN supplied must be the DN with which the account was created. Since OpenLDAP allows any attribute to be used when creating a DN for an account, organizations that don't use the hardcoded *uid* attribute won't be able to use HiveServer2 LDAP authentication. HiveServer2 should support a configurable username attribute when constructing the bind DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11037: --- Attachment: HIVE-11037.07.patch HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, HIVE-11037.06.patch, HIVE-11037.07.patch In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would like to make it run by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11071) Fix the output of beeline dbinfo command
[ https://issues.apache.org/jira/browse/HIVE-11071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596834#comment-14596834 ] Xuefu Zhang commented on HIVE-11071: +1 Fix the output of beeline dbinfo command Key: HIVE-11071 URL: https://issues.apache.org/jira/browse/HIVE-11071 Project: Hive Issue Type: Bug Components: Beeline Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11071-001-output.txt, HIVE-11071-001.patch When !dbinfo is executed in Beeline, its output is displayed as follows:
{code}
0: jdbc:hive2://localhost:10001/> !dbinfo
Error: Method not supported (state=,code=0)
allTablesAreSelectable          true
Error: Method not supported (state=,code=0)
Error: Method not supported (state=,code=0)
Error: Method not supported (state=,code=0)
getCatalogSeparator             .
getCatalogTerm                  instance
getDatabaseProductName          Apache Hive
getDatabaseProductVersion       2.0.0-SNAPSHOT
getDefaultTransactionIsolation  0
getDriverMajorVersion           1
getDriverMinorVersion           1
getDriverName                   Hive JDBC
...
{code}
The name of the method that produced each Error line is not shown, so the output is hard to interpret. I will fix this output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596729#comment-14596729 ] Laljo John Pullokkaran commented on HIVE-10996: --- +1 Conditional on QA run. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which looks like a regression. The following query (Q1) produces no results:
{code}
select s from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max(mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
    on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;
{code}
While this one (Q2) does produce results:
{code}
select * from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max(mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
    on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;

1  21  20  Bob   1234
1  31  30  Bob   1234
3  51  50  Jeff  1234
{code}
The setup to test this is:
{code}
create table purchase_history (s string, product string, price double, timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);

create table cart_history (s string, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);

create table events (s string, st2 string, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);
{code}
I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that reproduce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11073) ORC FileDump utility ignores errors when writing output
[ https://issues.apache.org/jira/browse/HIVE-11073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596752#comment-14596752 ] Hive QA commented on HIVE-11073: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741071/HIVE-11073.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9014 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4338/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4338/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4338/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12741071 - PreCommit-HIVE-TRUNK-Build ORC FileDump utility ignores errors when writing output --- Key: HIVE-11073 URL: https://issues.apache.org/jira/browse/HIVE-11073 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Priority: Minor Labels: cli, orc Attachments: HIVE-11073.1.patch The Hive command line provides the {{--orcfiledump}} utility for dumping data contained within ORC files, specifically when using the {{-d}} option. Generally, it is useful to be able to pipe the data extracted into other commands and utilities to transform and control the data so that it is more manageable by the CLI user. A classic example is {{less}}. 
When such command pipelines are currently constructed, the underlying implementation in {{org.apache.hadoop.hive.ql.io.orc.FileDump#printJsonData}} is oblivious to errors occurring when writing to its output stream. Such errors are commonplace when a user issues {{Ctrl+C}} to kill the leaf process. In this event the leaf process terminates immediately, but the Hive CLI process continues to execute until the full contents of the ORC file have been read. By making {{FileDump}} considerate of output stream errors, the process will terminate as soon as the destination process exits (i.e. when the user kills {{less}}) and control will be returned to the user as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
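One way to make such a dump loop considerate of output errors is to poll {{java.io.PrintStream#checkError()}}, which returns true once any underlying write has failed (PrintStream swallows IOExceptions and only records an internal flag). The sketch below shows the loop shape only; it is not FileDump's actual code, and the class name is invented for the example.

```java
import java.io.PrintStream;
import java.util.Iterator;

/** Sketch: stop writing as soon as the output stream reports an error. */
public class PipeAwareDump {

    /**
     * Writes rows until either the input is exhausted or the PrintStream has
     * seen a write failure (e.g. EPIPE after the user kills the pager).
     * Returns the number of rows for which println was attempted.
     */
    public static int dump(Iterator<String> rows, PrintStream out) {
        int written = 0;
        // checkError() flushes and reports whether any previous write failed,
        // so the loop exits on the first iteration after the pipe breaks.
        while (rows.hasNext() && !out.checkError()) {
            out.println(rows.next());
            written++;
        }
        return written;
    }
}
```

With this shape, killing the downstream {{less}} causes the next {{checkError()}} call to return true and the dump stops instead of reading the ORC file to the end.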
[jira] [Commented] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596769#comment-14596769 ] Pengcheng Xiong commented on HIVE-11076: [~ashutoshc], could you please take a look? Thanks. Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11076.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11076: --- Attachment: HIVE-11076.01.patch Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11076.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.
[ https://issues.apache.org/jira/browse/HIVE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596579#comment-14596579 ] Sergio Peña commented on HIVE-11062: Thanks [~ychena] +1 Remove Exception stacktrace from Log.info when ACL is not supported. Key: HIVE-11062 URL: https://issues.apache.org/jira/browse/HIVE-11062 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-11062.1.patch When logging is set to INFO, extended ACLs are enabled, and the file system does not support ACLs, there are a lot of exception stack traces in the log file. Although they are benign, they can easily frustrate users. We should change the level so that the exception is only shown at DEBUG. Currently, the exception in the log looks like:
{noformat}
2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path s3a://yibing/hive does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
	at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
	at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
	at org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
	at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
	at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
	at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
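The fix described above amounts to keeping a one-line message at INFO and attaching the stack trace only at DEBUG. A minimal sketch using {{java.util.logging}} (Hive itself uses a different logging facade, and the class and method names here are invented for illustration):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch: keep INFO output to one line; attach the stack trace only at debug level. */
public class QuietAclLogger {

    private static final Logger LOG = Logger.getLogger(QuietAclLogger.class.getName());

    /** Logs a benign, expected failure without flooding INFO logs with stack traces. */
    public static void logUnsupportedAcl(String path, Exception e) {
        // One-line summary at INFO: message only, no throwable attached.
        LOG.info("Skipping ACL inheritance: file system for path " + path
            + " does not support ACLs: " + e.getMessage());
        // Full stack trace remains available, but only when debug logging is on
        // (FINE is java.util.logging's rough equivalent of DEBUG).
        LOG.log(Level.FINE, "Full stack trace for unsupported ACL operation", e);
    }
}
```

At the default INFO level only the summary line appears; raising the logger to FINE restores the complete trace for troubleshooting.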
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596472#comment-14596472 ] Hive QA commented on HIVE-7193: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741055/HIVE-7193.6.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9013 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4336/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4336/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4336/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12741055 - PreCommit-HIVE-TRUNK-Build Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.6.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authentication parameters for LDAP authentication for HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of Hive LDAP authentication, such as:
{noformat}
a group search base         - dc=domain,dc=com
a group search filter       - member={0}
a user search base          - dc=domain,dc=com
a user search filter        - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596489#comment-14596489 ] Naveen Gangam commented on HIVE-7193: - This test seems flaky. It failed with patch v4, passed with patch v5, and failed again with patch v6, although there are no code changes between v4, v5, and v6 of the patch, just doc changes (javadocs and HiveConf parameter descriptions). The failure does not appear to be related to my fix. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.6.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authentication parameters for LDAP authentication for HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of Hive LDAP authentication, such as:
{noformat}
a group search base         - dc=domain,dc=com
a group search filter       - member={0}
a user search base          - dc=domain,dc=com
a user search filter        - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10673: -- Attachment: HIVE-10673.5.patch Patch v5 - rebasing with trunk Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596562#comment-14596562 ] Laljo John Pullokkaran commented on HIVE-10996: --- Some more comments: 1. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
{code}
for (Operator<?> child : op.getChildOperators()) {
+  if (child instanceof SelectOperator || child instanceof ReduceSinkOperator) {
+    continue;
+  }
+  List<String> neededCols = cppCtx.genColLists(op, child);
+  if (neededCols.size() < op.getSchema().getSignature().size()) {
+    ArrayList<ExprNodeDesc> exprs = new ArrayList<ExprNodeDesc>();
+    ArrayList<String> outputs = new ArrayList<String>();
+    Map<String, ExprNodeDesc> colExprMap = new HashMap<String, ExprNodeDesc>();
+    ArrayList<ColumnInfo> outputRS = new ArrayList<ColumnInfo>();
+    for (String internalName : neededCols) {
+      ColumnInfo colInfo = op.getSchema().getColumnInfo(
{code}
Should preserve the order of cols as they appear in the GB. 2. Nit pick: change the name of OP to GBOP. 3. Nit pick: change the name of output to outputColNames. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which looks like a regression. The following query (Q1) produces no results:
{code}
select s from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max(mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
    on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;
{code}
While this one (Q2) does produce results:
{code}
select * from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max(mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
    on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;

1  21  20  Bob   1234
1  31  30  Bob   1234
3  51  50  Jeff  1234
{code}
The setup to test this is:
{code}
create table purchase_history (s string, product string, price double, timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);

create table cart_history (s string, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);

create table events (s string, st2 string, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);
{code}
I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that reproduce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597097#comment-14597097 ] Gunther Hagleitner commented on HIVE-10233: --- .11 is a simplified version. I threw out all the memory computation with respect to edges and basically just kept the adjustment for grace hash joins. [~vikram.dixit]/[~wzheng] could you take a look? Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.11.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5457) Concurrent calls to getTable() result in: MetaException: org.datanucleus.exceptions.NucleusException: Invalid index 1 for DataStoreMapping. NucleusException: Invalid in
[ https://issues.apache.org/jira/browse/HIVE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597085#comment-14597085 ] 陈典贵 commented on HIVE-5457: --- Hi! Did you resolve this problem? It happens when we use Hue to execute HQL. We set datanucleus.autoStartMechanism=SchemaTable, as referenced in the fix for 4762, but it did not help. Concurrent calls to getTable() result in: MetaException: org.datanucleus.exceptions.NucleusException: Invalid index 1 for DataStoreMapping. NucleusException: Invalid index 1 for DataStoreMapping --- Key: HIVE-5457 URL: https://issues.apache.org/jira/browse/HIVE-5457 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Reporter: Lenni Kuff Priority: Critical Concurrent calls to getTable() result in: MetaException: org.datanucleus.exceptions.NucleusException: Invalid index 1 for DataStoreMapping. NucleusException: Invalid index 1 for DataStoreMapping This happens when using a Hive Metastore Service directly connecting to the backend metastore db. I have been able to hit this with as few as 2 concurrent calls. When I update my app to serialize all calls to getTable() this problem is resolved. Stack Trace:
{code}
Caused by: org.datanucleus.exceptions.NucleusException: Invalid index 1 for DataStoreMapping.
	at org.datanucleus.store.mapped.mapping.PersistableMapping.getDatastoreMapping(PersistableMapping.java:307)
	at org.datanucleus.store.rdbms.scostore.RDBMSElementContainerStoreSpecialization.getSizeStmt(RDBMSElementContainerStoreSpecialization.java:407)
	at org.datanucleus.store.rdbms.scostore.RDBMSElementContainerStoreSpecialization.getSize(RDBMSElementContainerStoreSpecialization.java:257)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinListStoreSpecialization.getSize(RDBMSJoinListStoreSpecialization.java:46)
	at org.datanucleus.store.mapped.scostore.ElementContainerStore.size(ElementContainerStore.java:440)
	at org.datanucleus.sco.backed.List.size(List.java:557)
	at org.apache.hadoop.hive.metastore.ObjectStore.convertToSkewedValues(ObjectStore.java:1029)
	at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1007)
	at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1017)
	at org.apache.hadoop.hive.metastore.ObjectStore.convertToTable(ObjectStore.java:872)
	at org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:743)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
	at $Proxy6.getTable(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1349)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597126#comment-14597126 ] Mostafa Mokhtar commented on HIVE-10233: [~hagleitn] [~vikram.dixit] [~wzheng] It would make sense to annotate the explain plan with the memory assigned to each Hash table, as in:
{code}
DagName: jenkins_20150622122318_f770d9ab-0ddd-43cf-b950-32f38e2f17e1:1
Vertices:
  Map 1
    Map Operator Tree:
        TableScan
          alias: store_sales
          filterExpr: (ss_item_sk is not null and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
          Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
          Filter Operator
            predicate: ss_item_sk is not null (type: boolean)
            Statistics: Num rows: 28878719387 Data size: 231029755096 Basic stats: COMPLETE Column stats: COMPLETE
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              keys:
                0 ss_item_sk (type: int)
                1 i_item_sk (type: int)
              outputColumnNames: _col1, _col22, _col26
              input vertices:
                1 Map 3
              Statistics: Num rows: 28878719387 Data size: 346544632644 Basic stats: COMPLETE Column stats: COMPLETE
              HybridGraceHashJoin: true
              Hash table memory : 1848000 Bytes
              Filter Operator
                predicate: ((_col26 = _col1) and _col22 BETWEEN 2450816 AND 2451500) (type: boolean)
                Statistics: Num rows: 7219679846 Data size: 86636158152 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  Statistics: Num rows: 7219679846 Data size: 86636158152 Basic stats: COMPLETE Column stats: COMPLETE
                  Group By Operator
                    aggregations: count()
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: bigint)
    Execution mode: vectorized
  Map 3
    Map Operator Tree:
        TableScan
          alias: item
          filterExpr: i_item_sk is not null (type: boolean)
          Statistics: Num rows: 462000 Data size: 663560457 Basic stats: COMPLETE Column stats: COMPLETE
          Filter Operator
            predicate: i_item_sk is not null (type: boolean)
            Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
            Reduce Output Operator
              key expressions: i_item_sk (type: int)
              sort order: +
              Map-reduce partition columns: i_item_sk (type: int)
              Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
    Execution mode: vectorized
{code}
Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597141#comment-14597141 ] Mostafa Mokhtar commented on HIVE-10233: [~hagleitn] Please check comment above. Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597149#comment-14597149 ] Hive QA commented on HIVE-11076: {color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741152/HIVE-11076.01.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9013 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_merge_multi_expressions
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4343/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4343/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4343/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated. ATTACHMENT ID: 12741152 - PreCommit-HIVE-TRUNK-Build

Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11076.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597139#comment-14597139 ] Mostafa Mokhtar commented on HIVE-10233: [~gunther] Should totalAvailableMemory be HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD? resourceAvailable.getMemory() would basically return the container size, which will result in a lot of GC if all memory gets used up. Change LOG.debug to LOG.info and print the input size. {code} for (MapJoinOperator mj : mapJoins) { mj.getConf().setMemoryNeeded(minMemory); LOG.info("Setting " + minMemory + " bytes needed for " + mj); } {code} Also I am not following the logic here; shouldn't the memory needed per operator be something like (estimated size) / (total input sizes) x memoryAvailable? {code} int numJoins = mapJoins.size(); long minMemory = totalAvailableMemory / ((numJoins > 0) ? numJoins : 1); minMemory = Math.min(minMemory, onePercentMemory); for (MapJoinOperator mj : mapJoins) { mj.getConf().setMemoryNeeded(minMemory); if (LOG.isDebugEnabled()) { LOG.debug("Setting " + minMemory + " bytes needed for " + mj); } } {code} Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
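The size-proportional allocation the comment suggests ((estimated size) / (total input sizes) x memoryAvailable) can be sketched as below. `ProportionalMemory` and its method are illustrative names, not part of the HIVE-10233 patch:

```java
// Hedged sketch: give each map join a share of the available memory
// proportional to its estimated input size, instead of an even
// totalAvailableMemory / numJoins split.
public class ProportionalMemory {
    public static long share(long estimatedSize, long totalInputSize, long availableMemory) {
        if (totalInputSize <= 0) {
            // Nothing to apportion; fall back to the full budget.
            return availableMemory;
        }
        return (long) ((double) estimatedSize / totalInputSize * availableMemory);
    }
}
```

A join expected to consume a quarter of the total input would then get a quarter of the memory budget, rather than an equal slice.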
[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11076: --- Attachment: HIVE-11076.02.patch Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7193: - Labels: TODOC1.3 (was: TODOC1.3 TODOC2.0) Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.6.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authentication parameters for LDAP authentication for HiveServer2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of Hive LDAP authentication, like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
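A hedged hive-site.xml sketch of what the requested configuration might look like once the new parameters exist. The DN pattern, filter, and group values below are illustrative assumptions, not tested settings:

```xml
<!-- Existing LDAP settings -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
<!-- Requested additions; names/values illustrative -->
<property>
  <name>hive.server2.authentication.ldap.userDNPattern</name>
  <value>uid=%s,ou=People,dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupFilter</name>
  <value>group1,group2,group3</value>
</property>
```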
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597186#comment-14597186 ] Lefty Leverenz commented on HIVE-7193: -- Doc note: (Removed TODOC2.0 because we only need to document the initial version, which is 1.3.) This adds five configuration parameters, which need to be documented in the HiveServer2 section of Configuration Properties. * hive.server2.authentication.ldap.groupDNPattern * hive.server2.authentication.ldap.groupFilter * hive.server2.authentication.ldap.userDNPattern * hive.server2.authentication.ldap.userFilter * hive.server2.authentication.ldap.customLDAPQuery * [Configuration Properties -- HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2] Ben Tse wrote up the general documentation here (thanks, Ben): * [User and Group Filter Support with LDAP Atn Provider in HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2] Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.6.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authentication parameters for LDAP authentication for HiveServer2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of Hive LDAP authentication, like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595962#comment-14595962 ] Aihua Xu commented on HIVE-10895: - Hi [~vgumashta] Any updates on that? Sorry to push you on that. Our customer is waiting on the fix. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Vaibhav Gumashta Attachments: HIVE-10895.1.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
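The leak pattern this issue targets, and its usual fix, can be sketched with a stand-in for javax.jdo.Query: a query that is executed but never closed keeps a server-side cursor open, which is consistent with the Oracle cursor exhaustion reported above. `MockQuery` is illustrative only, not the Hive or JDO API:

```java
// Sketch of the fix pattern: always release query resources in a finally
// block so cursors are freed even when execute() throws.
public class QueryCleanup {
    // Stand-in for javax.jdo.Query, for illustration only.
    static class MockQuery {
        boolean closed = false;
        Object execute() { return "row"; }
        void closeAll() { closed = true; } // releases db cursors/result sets
    }

    public static MockQuery runAndClose() {
        MockQuery query = new MockQuery();
        try {
            query.execute();
        } finally {
            // Without this, each call leaks a cursor and Oracle
            // eventually hits its open-cursor limit.
            query.closeAll();
        }
        return query;
    }
}
```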
[jira] [Updated] (HIVE-11071) Fix the output of beeline dbinfo command
[ https://issues.apache.org/jira/browse/HIVE-11071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HIVE-11071: -- Attachment: HIVE-11071-001.patch I attach a patch file. Fix the output of beeline dbinfo command Key: HIVE-11071 URL: https://issues.apache.org/jira/browse/HIVE-11071 Project: Hive Issue Type: Bug Components: Beeline Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11071-001.patch When dbinfo is executed by beeline, it is displayed as follows. {code} 0: jdbc:hive2://localhost:10001/> !dbinfo Error: Method not supported (state=,code=0) allTablesAreSelectable true Error: Method not supported (state=,code=0) Error: Method not supported (state=,code=0) Error: Method not supported (state=,code=0) getCatalogSeparator . getCatalogTerm instance getDatabaseProductName Apache Hive getDatabaseProductVersion 2.0.0-SNAPSHOT getDefaultTransactionIsolation 0 getDriverMajorVersion 1 getDriverMinorVersion 1 getDriverName Hive JDBC ... {code} The name of the method that produced each Error is not shown. I will fix this output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
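The fix described above amounts to printing the metadata method name next to every result, so an "Error: Method not supported" line can be attributed to a specific call. A hedged sketch follows; the proxy fakes a driver where most methods are unsupported and is not the Hive JDBC implementation:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;

public class DbInfo {
    // Render one dbinfo row: always lead with the method name, even on error.
    public static String describe(DatabaseMetaData md, String methodName) {
        try {
            Method m = DatabaseMetaData.class.getMethod(methodName);
            return methodName + "\t" + m.invoke(md);
        } catch (Exception e) {
            return methodName + "\tError: Method not supported";
        }
    }

    // Fake DatabaseMetaData for illustration: one supported method,
    // everything else throws like an unsupported driver call.
    public static DatabaseMetaData fakeMetaData() {
        InvocationHandler h = (proxy, method, args) -> {
            if (method.getName().equals("getDatabaseProductName")) {
                return "Apache Hive";
            }
            throw new SQLException("Method not supported");
        };
        return (DatabaseMetaData) Proxy.newProxyInstance(
                DbInfo.class.getClassLoader(),
                new Class<?>[]{DatabaseMetaData.class}, h);
    }
}
```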
[jira] [Commented] (HIVE-10970) Investigate HIVE-10453: HS2 leaking open file descriptors when using UDFs
[ https://issues.apache.org/jira/browse/HIVE-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595968#comment-14595968 ] Yongzhi Chen commented on HIVE-10970: - [~vgumashta], HIVE-10453 may not be the root cause of your test failures. Hive seems to have issues with how it handles threads that cross different sessions; HIVE-10453 may just accelerate exposing those issues. Investigate HIVE-10453: HS2 leaking open file descriptors when using UDFs - Key: HIVE-10970 URL: https://issues.apache.org/jira/browse/HIVE-10970 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11071) Fix the output of beeline dbinfo command
[ https://issues.apache.org/jira/browse/HIVE-11071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595974#comment-14595974 ] Xuefu Zhang commented on HIVE-11071: [~yamashitasni], could you please post the output with your patch included? Thanks. Fix the output of beeline dbinfo command Key: HIVE-11071 URL: https://issues.apache.org/jira/browse/HIVE-11071 Project: Hive Issue Type: Bug Components: Beeline Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11071-001.patch When dbinfo is executed by beeline, it is displayed as follows. {code} 0: jdbc:hive2://localhost:10001/> !dbinfo Error: Method not supported (state=,code=0) allTablesAreSelectable true Error: Method not supported (state=,code=0) Error: Method not supported (state=,code=0) Error: Method not supported (state=,code=0) getCatalogSeparator . getCatalogTerm instance getDatabaseProductName Apache Hive getDatabaseProductVersion 2.0.0-SNAPSHOT getDefaultTransactionIsolation 0 getDriverMajorVersion 1 getDriverMinorVersion 1 getDriverName Hive JDBC ... {code} The name of the method that produced each Error is not shown. I will fix this output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function
[ https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595951#comment-14595951 ] Aihua Xu commented on HIVE-10142: - Can you share some sample tables and queries for this type of model so I can try to see whether they are supported? Calculating formula based on difference between each row's value and current row's in Windowing function Key: HIVE-10142 URL: https://issues.apache.org/jira/browse/HIVE-10142 Project: Hive Issue Type: New Feature Components: PTF-Windowing Affects Versions: 1.0.0 Reporter: Yi Zhang Assignee: Aihua Xu For analytics with windowing functions, the calculation formula sometimes needs to be evaluated over each row's value against the current row's value. The decay value is a good example, such as a sum of values with a decay function based on the difference of timestamp between each row and the current row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11025) In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595954#comment-14595954 ] Aihua Xu commented on HIVE-11025: - [~ashutoshc] Can you help submit the patch? Thanks. In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Key: HIVE-11025 URL: https://issues.apache.org/jira/browse/HIVE-11025 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11025.patch Given the following data and query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} it produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)