[jira] [Created] (HIVE-26383) OOM during join query
Pravin Sinha created HIVE-26383: --- Summary: OOM during join query Key: HIVE-26383 URL: https://issues.apache.org/jira/browse/HIVE-26383 Project: Hive Issue Type: Bug Reporter: Pravin Sinha
{code:java}
[ERROR] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[innerjoin_cal_with_insert] Time elapsed: 100.73 s <<< ERROR!
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newTreeNode(HashMap.java:1784)
    at java.util.HashMap$TreeNode.putTreeVal(HashMap.java:2029)
    at java.util.HashMap.putVal(HashMap.java:639)
    at java.util.HashMap.put(HashMap.java:613)
    at java.util.HashSet.add(HashSet.java:220)
    at org.apache.hadoop.hive.ql.optimizer.calcite.stats.EstimateUniqueKeys.getUniqueKeys(EstimateUniqueKeys.java:229)
    at org.apache.hadoop.hive.ql.optimizer.calcite.stats.EstimateUniqueKeys.getUniqueKeys(EstimateUniqueKeys.java:304)
    at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.isKey(HiveRelMdRowCount.java:501)
    at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.analyzeJoinForPKFK(HiveRelMdRowCount.java:302)
    at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:102)
    at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
    at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
    at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212)
    at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1882)
    at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1756)
    at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addToTop(LoptOptimizeJoinRule.java:1233)
    at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addFactorToTree(LoptOptimizeJoinRule.java:927)
    at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createOrdering(LoptOptimizeJoinRule.java:728)
    at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.findBestOrderings(LoptOptimizeJoinRule.java:459)
    at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.onMatch(LoptOptimizeJoinRule.java:128)
    at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
    at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
    at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
    at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
    at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
    at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
    at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2468)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2427)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyJoinOrderingTransform(CalcitePlanner.java:2193)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1605)
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-25439) Make the DistCp stat csv content parse-able
Pravin Sinha created HIVE-25439: --- Summary: Make the DistCp stat csv content parse-able Key: HIVE-25439 URL: https://issues.apache.org/jira/browse/HIVE-25439 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha The CSV file generated by the script replstats.sh isn't parse-able when the number of bytes copied is huge. The 'Bytes Copied' field itself can contain a comma. For example:
{code:java}
#cat Repl#repl_testing20210802T153039308427#14711values.csv
job_1624306668424_194169,2-Aug-2021 20:20:41,2-Aug-2021 20:22:08,1mins, 27sec,2-Aug-2021 20:22:29,21sec,1,0,112,527,514,1,SUCCEEDED
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
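One common way to keep such a row parse-able is to quote any field that contains a comma, in the style of RFC 4180. The sketch below is an illustration only: `CsvFields` and its methods are hypothetical names, it is not part of Hive or replstats.sh (which is a shell script), and it is not necessarily the fix adopted for HIVE-25439. The sample row above already shows the problem in the field `1mins, 27sec`, which a naive comma split would break into two columns.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hive or replstats.sh. */
final class CsvFields {

  /** Wraps a field in double quotes if it contains a comma, a quote, or a line break. */
  static String quote(String field) {
    boolean needsQuoting = field.contains(",") || field.contains("\"")
        || field.contains("\n") || field.contains("\r");
    if (!needsQuoting) {
      return field;
    }
    // Embedded quotes are doubled, as in RFC 4180.
    return "\"" + field.replace("\"", "\"\"") + "\"";
  }

  /** Joins fields into one CSV line, quoting where needed. */
  static String join(List<String> fields) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(quote(fields.get(i)));
    }
    return sb.toString();
  }

  /** Splits one CSV line back into fields, honoring quoted sections. */
  static List<String> parse(String line) {
    List<String> out = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (inQuotes) {
        if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
          cur.append('"'); // doubled quote inside a quoted field
          i++;
        } else if (c == '"') {
          inQuotes = false; // closing quote
        } else {
          cur.append(c);
        }
      } else if (c == '"') {
        inQuotes = true; // opening quote
      } else if (c == ',') {
        out.add(cur.toString()); // field separator outside quotes
        cur.setLength(0);
      } else {
        cur.append(c);
      }
    }
    out.add(cur.toString());
    return out;
  }
}
```

With this approach, a field such as `1mins, 27sec` is written as `"1mins, 27sec"` and read back as a single column. An alternative design would be to emit raw numbers without thousands separators, which avoids quoting altogether.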
[jira] [Created] (HIVE-25355) EXPLAIN statement for write transactions with hive.txn.readonly.enabled fails
Pravin Sinha created HIVE-25355: --- Summary: EXPLAIN statement for write transactions with hive.txn.readonly.enabled fails Key: HIVE-25355 URL: https://issues.apache.org/jira/browse/HIVE-25355 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25330) Make FS calls in CopyUtils retryable
Pravin Sinha created HIVE-25330: --- Summary: Make FS calls in CopyUtils retryable Key: HIVE-25330 URL: https://issues.apache.org/jira/browse/HIVE-25330 Project: Hive Issue Type: Improvement Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25305) Replayed transactions are not cleaned up properly on open txn timeout
Pravin Sinha created HIVE-25305: --- Summary: Replayed transactions are not cleaned up properly on open txn timeout Key: HIVE-25305 URL: https://issues.apache.org/jira/browse/HIVE-25305 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG
Pravin Sinha created HIVE-25272: --- Summary: READ transactions are getting logged in NOTIFICATION LOG Key: HIVE-25272 URL: https://issues.apache.org/jira/browse/HIVE-25272 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha Although READ transactions are already skipped when logging to the NOTIFICATION LOG, a few are still getting logged. Those transactions need to be skipped as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel
Pravin Sinha created HIVE-25164: --- Summary: Execute Bootstrap REPL load DDL tasks in parallel Key: HIVE-25164 URL: https://issues.apache.org/jira/browse/HIVE-25164 Project: Hive Issue Type: Improvement Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24896) External table having same name as dropped managed table fails to replicate
Pravin Sinha created HIVE-24896: --- Summary: External table having same name as dropped managed table fails to replicate Key: HIVE-24896 URL: https://issues.apache.org/jira/browse/HIVE-24896 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24884) Move top level dump metadata content to be in JSON format
Pravin Sinha created HIVE-24884: --- Summary: Move top level dump metadata content to be in JSON format Key: HIVE-24884 URL: https://issues.apache.org/jira/browse/HIVE-24884 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha The current content of the _dumpmetadata file is TAB-separated, which is not very flexible to extend. A more flexible format, such as JSON, would make it easier to extend the content. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source
Pravin Sinha created HIVE-24733: --- Summary: Handle replication when db location and managed location is set to custom location on source Key: HIVE-24733 URL: https://issues.apache.org/jira/browse/HIVE-24733 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24697) DbNotificationListener Cleaner thread dies with NoSuchMethodError
Pravin Sinha created HIVE-24697: --- Summary: DbNotificationListener Cleaner thread dies with NoSuchMethodError Key: HIVE-24697 URL: https://issues.apache.org/jira/browse/HIVE-24697 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha
{code:java}
java.lang.NoSuchMethodError: javax.jdo.Query.close()V
    at org.apache.hadoop.hive.metastore.ObjectStore.doCleanNotificationEvents(ObjectStore.java:11025)
    at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:10965)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24654) Table level replication support for Atlas metadata
Pravin Sinha created HIVE-24654: --- Summary: Table level replication support for Atlas metadata Key: HIVE-24654 URL: https://issues.apache.org/jira/browse/HIVE-24654 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha Mainly covers the Atlas export API payload change required to support table-level replication. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24417) Add config options for Atlas client timeouts
Pravin Sinha created HIVE-24417: --- Summary: Add config options for Atlas client timeouts Key: HIVE-24417 URL: https://issues.apache.org/jira/browse/HIVE-24417 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24363) Current order of transactional event listeners is prone to deadlock in backend DB connections
Pravin Sinha created HIVE-24363: --- Summary: Current order of transactional event listeners is prone to deadlock in backend DB connections Key: HIVE-24363 URL: https://issues.apache.org/jira/browse/HIVE-24363 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha Currently, the AcidEventListener is added to the end of the list of transactional event listeners. When DbNotificationListener is configured in 'hive.metastore.transactional.event.listeners', the list is formed as: {"DbNotificationListener", "AcidEventListener"}. This results in backend DB locks being acquired in this order:
{code:java}
lock(a) {
  // perform some op on a
  lock(b) {
    // perform some op on b
  }
}
{code}
On the other hand, some HMS APIs, for example commit_txn(), call the TxnHandler method directly, followed by DbNotificationListener processing. This results in the locks being acquired in the reverse order:
{code:java}
lock(b) {
  // perform some op on b
  lock(a) {
    // perform some op on a
  }
}
{code}
Note: 'a' and 'b' above are backend DB locks, not JVM locks.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
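The usual remedy for this kind of deadlock is to make every code path acquire the locks in one fixed order. The sketch below illustrates only that principle: it uses JVM `ReentrantLock` objects as stand-ins for the backend DB locks 'a' and 'b', and the class and method names (`LockOrderSketch`, `listenerPath`, `commitTxnPath`) are hypothetical. It is not the change made in Hive for HIVE-24363.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Illustration only: JVM locks stand in for the backend DB locks 'a' and 'b'. */
final class LockOrderSketch {
  private final ReentrantLock a = new ReentrantLock();
  private final ReentrantLock b = new ReentrantLock();
  private int ops = 0; // only touched while both locks are held

  /** Always takes 'a' first, then 'b', so no cycle of waiting threads can form. */
  private void withBothLocks(Runnable op) {
    a.lock();
    try {
      b.lock();
      try {
        op.run();
      } finally {
        b.unlock();
      }
    } finally {
      a.unlock();
    }
  }

  /** Hypothetical stand-in for the listener-driven path. */
  void listenerPath() {
    withBothLocks(() -> ops++);
  }

  /** Hypothetical stand-in for the commit_txn()-style path; same lock order. */
  void commitTxnPath() {
    withBothLocks(() -> ops++);
  }

  /** Runs both paths concurrently and returns the total number of operations. */
  static int run(int iterations) {
    LockOrderSketch s = new LockOrderSketch();
    Thread t1 = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        s.listenerPath();
      }
    });
    Thread t2 = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        s.commitTxnPath();
      }
    });
    t1.start();
    t2.start();
    try {
      t1.join();
      t2.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return s.ops;
  }
}
```

If `commitTxnPath` instead took 'b' before 'a', as in the second snippet of the report, the two threads could each hold one lock while waiting for the other, and `run` might never return.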
[jira] [Created] (HIVE-24327) During Atlas metadata replication handle a case when AtlasServer entity not present
Pravin Sinha created HIVE-24327: --- Summary: During Atlas metadata replication handle a case when AtlasServer entity not present Key: HIVE-24327 URL: https://issues.apache.org/jira/browse/HIVE-24327 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24267) RetryingClientTimeBased should perform first invocation immediately
Pravin Sinha created HIVE-24267: --- Summary: RetryingClientTimeBased should perform first invocation immediately Key: HIVE-24267 URL: https://issues.apache.org/jira/browse/HIVE-24267 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24244) NPE during Atlas metadata replication
Pravin Sinha created HIVE-24244: --- Summary: NPE during Atlas metadata replication Key: HIVE-24244 URL: https://issues.apache.org/jira/browse/HIVE-24244 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24187) Handle _files creation for HA config with same nameservice on source and destination
Pravin Sinha created HIVE-24187: --- Summary: Handle _files creation for HA config with same nameservice on source and destination Key: HIVE-24187 URL: https://issues.apache.org/jira/browse/HIVE-24187 Project: Hive Issue Type: Improvement Reporter: Pravin Sinha Assignee: Pravin Sinha Currently, HA is supported only for different nameservices on source and destination. We need to add support for the same nameservice on source and destination. The local nameservice will be passed correctly to the repl command. The remote nameservice will be given an arbitrary name, along with the corresponding configs for it.
Example: Clusters originally configured with ns for hdfs: src: ns1, target: ns1. We can denote the remote nameservice with some arbitrary name, for example nsRemote. This is how the command will see the ns w.r.t. source and target:
Repl Dump: src: ns1, target: nsRemote
Repl Load: src: nsRemote, target: ns1
Entries in the _files (for managed table data loc) will be made with nsRemote instead of ns1 (for src). Example: hdfs://nsRemote/whLoc/dbName.db/table1:checksum:subDir:hdfs://nsRemote/cmroot
In the same way, the list of external table data locations will also be modified using nsRemote instead of ns1 (for src).
New configs can control the behavior:
*hive.repl.ha.datapath.replace.remote.nameservice = *
*hive.repl.ha.datapath.replace.remote.nameservice.name = *
The nameservice replacement is done based on the above configs. This will also require that 'hive.repl.rootdir' is passed accordingly during dump and load:
||Repl Operation||Repl Command||
|*Staging on source cluster*|
|Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
|Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
|*Staging on target cluster*|
|Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
|Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
-- This message was sent by Atlassian Jira (v8.3.4#803005)
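The nameservice substitution described in HIVE-24187 amounts to rewriting the authority part of an HDFS location. The sketch below shows one way that could look using `java.net.URI`. The class and method names (`NameserviceRewriteSketch`, `replaceNameservice`) are hypothetical, the sketch does not read the two `hive.repl.ha.datapath.replace.remote.nameservice*` configs, and it should not be taken as the implementation used in Hive.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustration only; the names here are hypothetical and not Hive APIs. */
final class NameserviceRewriteSketch {

  /**
   * Returns the location with its nameservice (URI authority) replaced by remoteNs,
   * but only if the current authority equals localNs. Other locations are returned unchanged.
   */
  static String replaceNameservice(String location, String localNs, String remoteNs) {
    try {
      URI uri = new URI(location);
      if (!localNs.equals(uri.getAuthority())) {
        return location; // different nameservice or no authority: leave as-is
      }
      return new URI(uri.getScheme(), remoteNs, uri.getPath(),
          uri.getQuery(), uri.getFragment()).toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Not a valid URI: " + location, e);
    }
  }
}
```

For the example in the report, `replaceNameservice("hdfs://ns1/whLoc/dbName.db/table1", "ns1", "nsRemote")` yields a location under `hdfs://nsRemote`, which is the form the _files entries are expected to take on the source side.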
[jira] [Created] (HIVE-24170) Add the UDF jar explicitly to the classpath while handling drop function event during repl load.
Pravin Sinha created HIVE-24170: --- Summary: Add the UDF jar explicitly to the classpath while handling drop function event during repl load. Key: HIVE-24170 URL: https://issues.apache.org/jira/browse/HIVE-24170 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24148) TestHiveStrictManagedMigration.testExternalMove failing for all new PR.
Pravin Sinha created HIVE-24148: --- Summary: TestHiveStrictManagedMigration.testExternalMove failing for all new PR. Key: HIVE-24148 URL: https://issues.apache.org/jira/browse/HIVE-24148 Project: Hive Issue Type: Bug Reporter: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24131) Use original src location always when data copy runs on target
Pravin Sinha created HIVE-24131: --- Summary: Use original src location always when data copy runs on target Key: HIVE-24131 URL: https://issues.apache.org/jira/browse/HIVE-24131 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24129) Deleting the previous successful dump directory should be based on config
Pravin Sinha created HIVE-24129: --- Summary: Deleting the previous successful dump directory should be based on config Key: HIVE-24129 URL: https://issues.apache.org/jira/browse/HIVE-24129 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Arko Sharma Description: Provide a policy-level config that defaults to true. This can help debug issues in production. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24114) Load is not working with both staging and data copy on target
Pravin Sinha created HIVE-24114: --- Summary: Load is not working with both staging and data copy on target Key: HIVE-24114 URL: https://issues.apache.org/jira/browse/HIVE-24114 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
Pravin Sinha created HIVE-24067: --- Summary: TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop Key: HIVE-24067 URL: https://issues.apache.org/jira/browse/HIVE-24067 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha In TestReplicationScenariosExclusiveReplica, the drop database operation for the primary db leads to a wrong FS error because the ReplChangeManager is associated with the replica FS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24011) Flaky test AsyncResponseHandlerTest
Pravin Sinha created HIVE-24011: --- Summary: Flaky test AsyncResponseHandlerTest Key: HIVE-24011 URL: https://issues.apache.org/jira/browse/HIVE-24011 Project: Hive Issue Type: Task Reporter: Pravin Sinha [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1352/2/tests/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23960) Partition with no column statistics leads to unbalanced calls to openTransaction/commitTransaction error during get_partitions_by_names
Pravin Sinha created HIVE-23960: --- Summary: Partition with no column statistics leads to unbalanced calls to openTransaction/commitTransaction error during get_partitions_by_names Key: HIVE-23960 URL: https://issues.apache.org/jira/browse/HIVE-23960 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha Creating a partition with data and then adding another partition leads to unbalanced calls to open/commit transaction during the get_partitions_by_names call. The issue was discovered during a REPL DUMP operation, which uses this HMS call to get the partition metadata. The error occurs when there is a partition with no column statistics. To reproduce:
{code:java}
CREATE TABLE student_part_acid(name string, age int, gpa double) PARTITIONED BY (ds string) STORED AS orc;
LOAD DATA INPATH '/user/hive/partDir/student_part_acid/ds=20110924' INTO TABLE student_part_acid partition(ds=20110924);
ALTER TABLE student_part_acid ADD PARTITION (ds=20110925);
{code}
A subsequent REPL DUMP then fails with the error "Unbalanced calls to open/commit transaction" on the HS2 side.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23916) Fix Atlas client dependencies version
Pravin Sinha created HIVE-23916: --- Summary: Fix Atlas client dependencies version Key: HIVE-23916 URL: https://issues.apache.org/jira/browse/HIVE-23916 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23835) Repl Dump should dump function binaries to staging directory
Pravin Sinha created HIVE-23835: --- Summary: Repl Dump should dump function binaries to staging directory Key: HIVE-23835 URL: https://issues.apache.org/jira/browse/HIVE-23835 Project: Hive Issue Type: Task Reporter: Pravin Sinha Assignee: Pravin Sinha When a Hive function's binaries are on the source HDFS, repl dump should dump them to the staging location in order to remove the cross-cluster visibility requirement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation
Pravin Sinha created HIVE-23611: --- Summary: Mandate fully qualified absolute path for external table base dir during REPL operation Key: HIVE-23611 URL: https://issues.apache.org/jira/browse/HIVE-23611 Project: Hive Issue Type: Improvement Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23605) Wrong FS error during _external_tables_info creation when staging location is remote
Pravin Sinha created HIVE-23605: --- Summary: Wrong FS error during _external_tables_info creation when staging location is remote Key: HIVE-23605 URL: https://issues.apache.org/jira/browse/HIVE-23605 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: Pravin Sinha When the staging location is on the target cluster, Repl Dump fails to create the _external_tables_info file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location
Pravin Sinha created HIVE-23539: --- Summary: Optimize data copy during repl load operation for HDFS based staging location Key: HIVE-23539 URL: https://issues.apache.org/jira/browse/HIVE-23539 Project: Hive Issue Type: Improvement Reporter: Pravin Sinha Assignee: Pravin Sinha -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23538) Cannot run setBugDatabaseInfo from findbugs during preCommit
Pravin Sinha created HIVE-23538: --- Summary: Cannot run setBugDatabaseInfo from findbugs during preCommit Key: HIVE-23538 URL: https://issues.apache.org/jira/browse/HIVE-23538 Project: Hive Issue Type: Bug Reporter: Pravin Sinha Assignee: David Mollitor The following is seen during the preCommit run for the patch HIVE-23353:
-1 findbugs 1m 5s patch/common cannot run setBugDatabaseInfo from findbugs
-1 findbugs 10m 27s patch/ql cannot run setBugDatabaseInfo from findbugs
-1 findbugs 1m 51s patch/itests/hive-unit cannot run setBugDatabaseInfo from findbugs
-- This message was sent by Atlassian Jira (v8.3.4#803005)