[jira] [Commented] (HIVE-16738) Notification ID generation in DBNotification might not be unique across HS2 instances.
[ https://issues.apache.org/jira/browse/HIVE-16738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050034#comment-16050034 ] anishek commented on HIVE-16738: No, I haven't started on this one, please go ahead. > Notification ID generation in DBNotification might not be unique across HS2 > instances. > -- > > Key: HIVE-16738 > URL: https://issues.apache.org/jira/browse/HIVE-16738 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > Going to explain the problem in scope of the "replication" feature for hive 2 > that is being built, as it is easier to explain: > To allow replication to work we need to set > "hive.metastore.transactional.event.listeners" to DBNotificationListener. > For use cases where there are multiple HiveServer2 instances running: > {code} > private void process(NotificationEvent event, ListenerEvent listenerEvent) > throws MetaException { > event.setMessageFormat(msgFactory.getMessageFormat()); > synchronized (NOTIFICATION_TBL_LOCK) { > LOG.debug("DbNotificationListener: Processing : {}:{}", > event.getEventId(), > event.getMessage()); > HMSHandler.getMSForConf(hiveConf).addNotificationEvent(event); > } > // Set the DB_NOTIFICATION_EVENT_ID for future reference by other > listeners. > if (event.isSetEventId()) { > listenerEvent.putParameter( > MetaStoreEventListenerConstants.DB_NOTIFICATION_EVENT_ID_KEY_NAME, > Long.toString(event.getEventId())); > } > } > {code} > the above code in DbNotificationListener holding the object lock won't be > enough to guarantee that all events get a unique id. The > transaction isolation level at the db, "read-committed" or "repeatable-read", > would also not guarantee the same, unless a lock is taken at the db level, > preferably on table {{NOTIFICATION_SEQUENCE}} which only has one row. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
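The issue quoted above boils down to this: a {{synchronized}} block serializes the read-increment-write on the sequence only within one JVM, so two HS2/metastore instances can still interleave and hand out the same event id. A minimal in-JVM sketch of the correct serialization (an analogy only, not the actual DbNotificationListener code; the class and names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Models the read-increment-write on the one-row NOTIFICATION_SEQUENCE table.
// When every allocator goes through one shared lock (the in-JVM analogue of a
// db-level row lock), the IDs come out unique. Two HS2 JVMs each synchronizing
// on their own NOTIFICATION_TBL_LOCK have no such shared lock, which is the bug.
public class SequenceRace {
    static long nextId = 0;                 // stands in for NEXT_EVENT_ID

    static long allocate(Object lock) {     // read-increment-write under the given lock
        synchronized (lock) {
            long id = nextId + 1;           // "SELECT NEXT_EVENT_ID ..."
            nextId = id;                    // "UPDATE NOTIFICATION_SEQUENCE ..."
            return id;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final Object sharedLock = new Object();   // ONE lock for all allocators
        final List<Long> ids = new ArrayList<>();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    long id = allocate(sharedLock);
                    synchronized (ids) { ids.add(id); }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        Set<Long> unique = new HashSet<>(ids);
        System.out.println(ids.size() == unique.size());  // true: all 4000 IDs distinct
    }
}
```

In the real fix the shared lock would presumably be taken in the database itself, e.g. a locking read (such as {{SELECT ... FOR UPDATE}}) on the single {{NOTIFICATION_SEQUENCE}} row, so that all metastore instances serialize through it.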
[jira] [Updated] (HIVE-16905) Add zookeeper ACL for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saijin Huang updated HIVE-16905: Description: Add zookeeper ACL for hiveserver2 > Add zookeeper ACL for hiveserver2 > - > > Key: HIVE-16905 > URL: https://issues.apache.org/jira/browse/HIVE-16905 > Project: Hive > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Saijin Huang >Assignee: Saijin Huang > > Add zookeeper ACL for hiveserver2 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16905) Add zookeeper ACL for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saijin Huang reassigned HIVE-16905: --- > Add zookeeper ACL for hiveserver2 > - > > Key: HIVE-16905 > URL: https://issues.apache.org/jira/browse/HIVE-16905 > Project: Hive > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Saijin Huang >Assignee: Saijin Huang > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16904) during repl load for large number of partitions the metadata file can be huge and can lead to out of memory
[ https://issues.apache.org/jira/browse/HIVE-16904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16904: -- > during repl load for large number of partitions the metadata file can be huge > and can lead to out of memory > > > Key: HIVE-16904 > URL: https://issues.apache.org/jira/browse/HIVE-16904 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > The metadata pertaining to a table + its partitions is stored in a single > file. During repl load, all the data is loaded into memory in one shot and then > the individual partitions are processed. This can lead to a huge memory overhead as > the entire file is read into memory. Try to deserialize the partition objects with > some sort of streaming JSON deserializer. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
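The shape of the change described above is: consume one partition record at a time from a reader instead of materializing the whole metadata file before processing. A stdlib-only sketch of that shape (illustrative assumption: one serialized partition object per line; the real file is a single JSON document, so the actual fix would use a streaming JSON parser such as Jackson's JsonParser):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper, not Hive code: processes partition records one at a
// time, holding O(1) records in memory rather than the entire file.
public class StreamingLoad {
    interface PartitionHandler { void handle(String partitionJson); }

    static int loadPartitions(BufferedReader in, PartitionHandler h) throws IOException {
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {   // one record in memory at a time
            h.handle(line);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a large metadata file with one partition object per line.
        String metadata = "{\"part\":\"day=20170612\"}\n{\"part\":\"day=20170613\"}\n";
        int n = loadPartitions(new BufferedReader(new StringReader(metadata)),
                               p -> System.out.println("processed " + p));
        System.out.println(n);
    }
}
```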
[jira] [Commented] (HIVE-16876) RpcServer should be re-created when Rpc configs change
[ https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049986#comment-16049986 ] Xuefu Zhang commented on HIVE-16876: +1 > RpcServer should be re-created when Rpc configs change > -- > > Key: HIVE-16876 > URL: https://issues.apache.org/jira/browse/HIVE-16876 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-16876.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
[ https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-16903: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks [~gopalv]. Committed to master. > LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE > -- > > Key: HIVE-16903 > URL: https://issues.apache.org/jira/browse/HIVE-16903 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-16903.1.patch > > > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049923#comment-16049923 ] Chao Sun commented on HIVE-11297: - Sure. Added comments in RB. Regarding the output file, you can just use {{-Dtest.output.overwrite=true}} to generate the new file. > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, > HIVE-11297.3.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition column, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees so we > don't have to do the table scan multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049902#comment-16049902 ] liyunzhang_intel commented on HIVE-11297: - [~csun]: can you help review HIVE-11297.3.patch, which changes {{SplitOpTreeForDPP.java}} and {{spark.dynamic.partition.pruning.q.out}}? thanks > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, > HIVE-11297.3.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition column, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees so we > don't have to do the table scan multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.
[ https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhizhen Hou updated HIVE-16332: --- Summary: When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error. (was: We create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.) > When create a partitioned text format table with one partition, after we > change the format of table to orc, then the array type field may output error. > --- > > Key: HIVE-16332 > URL: https://issues.apache.org/jira/browse/HIVE-16332 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.1 >Reporter: Zhizhen Hou >Priority: Critical > > ## Steps to reproduce the result. > 1. First create a text format table with an array type field in hive. > ``` > create table test_text_orc ( > col_int bigint, > col_text string, > col_array array<string>, > col_map map<string,string> ) > PARTITIONED BY ( >day string >) >ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > collection items TERMINATED BY ']' > map keys TERMINATED BY ':' > ; > > ``` > 2. Create a new text file hive-orc-text-file-array-error-test.txt. > ``` > 1,text_value1,array_value1]array_value2]array_value3, > map_key1:map_value1,map_key2:map_value2 > 2,text_value2,array_value4, map_key1:map_value3 > ,text_value3,, map_key1:]map_key3:map_value3 > ``` > 3. Load the data into one partition. > ``` > LOAD DATA local INPATH '.hive-orc-text-file-array-error-test.txt' overwrite > into table test_text_orc partition(day=20170329) > ``` > 4. Select the data to verify the result. 
> ``` > hive> select * from test.test_text_orc; > OK > 1 text_value1 ["array_value1","array_value2","array_value3"] {" > map_key1":"map_value1","map_key2":"map_value2"} 20170329 > 2 text_value2 ["array_value4"]{"map_key1":"map_value3"} > 20170329 > NULL text_value3 [] {" map_key1":"","map_key3":"map_value3"} > 20170329 > ``` > 5. Alter table format of table to orc; > ``` > alter table test_text_orc set fileformat orc; > ``` > 6. Check the result again, and you can see the error result. > ``` > hive> select * from test.test_text_orc; > OK > 1 text_value1 ["array_value1","array_value2","array_value3"] {" > map_key1":"map_value1","map_key2":"map_value2"} 20170329 > 2 text_value2 ["array_value4","array_value2","array_value3"] > {"map_key1":"map_value3"} 20170329 > NULL text_value3 ["array_value4","array_value2","array_value3"] > {"map_key3":"map_value3"," map_key1":""}20170329 > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
[ https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049847#comment-16049847 ] Gopal V commented on HIVE-16903: +1 > LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE > -- > > Key: HIVE-16903 > URL: https://issues.apache.org/jira/browse/HIVE-16903 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-16903.1.patch > > > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
[ https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049828#comment-16049828 ] Hive QA commented on HIVE-16903: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873045/HIVE-16903.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10831 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5647/testReport Console 
output: https://builds.apache.org/job/PreCommit-HIVE-Build/5647/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5647/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873045 - PreCommit-HIVE-Build > LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE > -- > > Key: HIVE-16903 > URL: https://issues.apache.org/jira/browse/HIVE-16903 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-16903.1.patch > > > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16902) investigate "failed to remove operation log" errors
[ https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049775#comment-16049775 ] Hive QA commented on HIVE-16902: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873026/HIVE-16902.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10830 tests executed *Failed tests:* {noformat} TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file (likely timed out) (batchId=238) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5646/testReport Console output: 
https://builds.apache.org/job/PreCommit-HIVE-Build/5646/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5646/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873026 - PreCommit-HIVE-Build > investigate "failed to remove operation log" errors > --- > > Key: HIVE-16902 > URL: https://issues.apache.org/jira/browse/HIVE-16902 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-16902.1.patch > > > When we call {{set a=3;}} from beeline, the following exception is thrown. > {noformat} > [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log > file of operation: OperationHandle [opType=GET_TABLES, > getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658] > java.io.FileNotFoundException: File does not exist: > /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915 > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275) > at > org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122) > at > org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90) > at > org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287) > at > org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58) > at > org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273) > at > org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at >
[jira] [Assigned] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
[ https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned HIVE-16903: --- Assignee: Rajesh Balamohan > LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE > -- > > Key: HIVE-16903 > URL: https://issues.apache.org/jira/browse/HIVE-16903 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-16903.1.patch > > > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
[ https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-16903: Attachment: HIVE-16903.1.patch \cc [~sseth]. > LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE > -- > > Key: HIVE-16903 > URL: https://issues.apache.org/jira/browse/HIVE-16903 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-16903.1.patch > > > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
[ https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-16903: Status: Patch Available (was: Open) > LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE > -- > > Key: HIVE-16903 > URL: https://issues.apache.org/jira/browse/HIVE-16903 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-16903.1.patch > > > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
[ https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-16903: Component/s: llap > LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE > -- > > Key: HIVE-16903 > URL: https://issues.apache.org/jira/browse/HIVE-16903 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Priority: Trivial > > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8434) Vectorization logic using wrong values for DATE and TIMESTAMP partitioning columns in vectorized row batches...
[ https://issues.apache.org/jira/browse/HIVE-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049759#comment-16049759 ] Charles Pritchard commented on HIVE-8434: - Hit this in 1.2.1 when using MONTH(CAST(datestr_partitioncol as date)) on select and group by -- gives unstable results. Seeing a lot of 7 and 31. > Vectorization logic using wrong values for DATE and TIMESTAMP partitioning > columns in vectorized row batches... > --- > > Key: HIVE-8434 > URL: https://issues.apache.org/jira/browse/HIVE-8434 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 0.14.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 0.14.0 > > Attachments: HIVE-8434.01.patch, HIVE-8434.02.patch > > > VectorizedRowBatchCtx.addPartitionColsToBatch uses wrong values to populate > DATE and TIMESTAMP data types. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16902) investigate "failed to remove operation log" errors
[ https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16902: Status: Patch Available (was: Open) patch-1: in some cases no log file is actually created, since there is no log output to print to the client console. So only try to remove the log file if it exists. > investigate "failed to remove operation log" errors > --- > > Key: HIVE-16902 > URL: https://issues.apache.org/jira/browse/HIVE-16902 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-16902.1.patch > > > When we call {{set a=3;}} from beeline, the following exception is thrown. > {noformat} > [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log > file of operation: OperationHandle [opType=GET_TABLES, > getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658] > java.io.FileNotFoundException: File does not exist: > /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915 > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275) > at > org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122) > at > org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90) > at > org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287) > at > org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58) > at > org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273) > at > org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy38.closeOperation(Unknown Source) > at > org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:475) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:671) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
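The fix described in the patch comment above ("only try to remove the log file if it exists") can be sketched as follows. This is a hypothetical helper, not the actual HIVE-16902 patch; the commons-io {{FileUtils.forceDelete}} seen in the stack trace is replaced here with plain {{java.io.File}} so the sketch is self-contained:

```java
import java.io.File;
import java.io.IOException;

// Sketch of guarded cleanup: when no log output was ever written, the
// operation log file was never created, so skip the delete instead of
// letting it throw FileNotFoundException.
public class OperationLogCleanup {
    static boolean removeIfExists(File logFile) throws IOException {
        if (!logFile.exists()) {
            return false;            // nothing was written for this operation
        }
        if (!logFile.delete()) {
            throw new IOException("Failed to delete " + logFile);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("operation_log", ".log");
        System.out.println(removeIfExists(f));   // file existed and was removed
        System.out.println(removeIfExists(f));   // already gone: no exception, just false
    }
}
```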
[jira] [Updated] (HIVE-16902) investigate "failed to remove operation log" errors
[ https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16902: Attachment: HIVE-16902.1.patch > investigate "failed to remove operation log" errors > --- > > Key: HIVE-16902 > URL: https://issues.apache.org/jira/browse/HIVE-16902 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-16902.1.patch > > > When we call {{set a=3;}} from beeline, the following exception is thrown. > {noformat} > [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log > file of operation: OperationHandle [opType=GET_TABLES, > getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658] > java.io.FileNotFoundException: File does not exist: > /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915 > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275) > at > org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122) > at > org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90) > at > org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287) > at > org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58) > at > org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273) > at > org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > at > 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy38.closeOperation(Unknown Source) > at > org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:475) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:671) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16902) investigate "failed to remove operation log" errors
[ https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-16902: --- > investigate "failed to remove operation log" errors > --- > > Key: HIVE-16902 > URL: https://issues.apache.org/jira/browse/HIVE-16902 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > > When we call {{set a=3;}} from beeline, the following exception is thrown. > {noformat} > [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log > file of operation: OperationHandle [opType=GET_TABLES, > getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658] > java.io.FileNotFoundException: File does not exist: > /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915 > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275) > at > org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122) > at > org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90) > at > org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287) > at > org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58) > at > org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273) > at > org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > at > 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy38.closeOperation(Unknown Source) > at > org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:475) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:671) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
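The FileNotFoundException above is raised by FileUtils.forceDelete, which treats an already-missing operation log file as an error, so a second or racing close() of the same operation fails noisily. A minimal sketch of idempotent removal, using plain java.nio rather than Hive's actual OperationLog code (the class and method names here are illustrative, not the real fix):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OperationLogCleanup {

    // Remove an operation log file, treating an already-missing file as
    // success. Unlike FileUtils.forceDelete, Files.deleteIfExists does not
    // throw when the target is absent, so calling cleanup twice is harmless.
    public static boolean removeQuietly(Path logFile) throws IOException {
        return Files.deleteIfExists(logFile);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("operation_log", ".log");
        System.out.println(removeQuietly(log)); // true: file existed and was removed
        System.out.println(removeQuietly(log)); // false: already gone, no exception
    }
}
```

The return value still lets the caller distinguish "deleted now" from "was already gone", so a genuine I/O failure (permissions, device error) remains a thrown exception.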
[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats
[ https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049527#comment-16049527 ] Thomas Poepping commented on HIVE-16288: I hate to bump this again, but it looks like branch-2 is still wrong. Can you help again, [~ashutoshc]? Thank you! > Add blobstore tests for ORC and RCFILE file formats > --- > > Key: HIVE-16288 > URL: https://issues.apache.org/jira/browse/HIVE-16288 > Project: Hive > Issue Type: Test > Components: Tests >Affects Versions: 2.1.1 >Reporter: Thomas Poepping >Assignee: Thomas Poepping > Fix For: 2.3.0, 3.0.0 > > Attachments: HIVE-16288.patch > > > This patch adds four tests each for ORC and RCFILE when running against > blobstore filesystems: > * Test for bucketed tables > * Test for nonpartitioned tables > * Test for partitioned tables > * Test for partitioned tables with nonstandard partition locations -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16901) Distcp optimization - One distcp per CopyTask
[ https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16901: Description: Currently, if a CopyTask is created to copy a list of files, distcp is invoked for each and every file. Instead, the list of source files should be passed to the distcp tool in a single invocation; distcp copies the files in parallel, which yields a significant performance gain. If the batched copy fails, traverse the destination directory to find which files are missing or have checksum mismatches, then copy those files one by one. was: Currently, if a ReplCopyTask is created to copy a list of files, distcp is invoked for each and every file. Instead, the list of source files should be passed to the distcp tool in a single invocation; distcp copies the files in parallel, which yields a significant performance gain. If the batched copy fails, traverse the destination directory to find which files are missing or have checksum mismatches, then copy those files one by one. > Distcp optimization - One distcp per CopyTask > -- > > Key: HIVE-16901 > URL: https://issues.apache.org/jira/browse/HIVE-16901 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > > Currently, if a CopyTask is created to copy a list of files, distcp is > invoked for each and every file. Instead, the list of source files should be > passed to the distcp tool in a single invocation; distcp copies the files in > parallel, which yields a significant performance gain. > If the batched copy fails, traverse the destination directory to find which > files are missing or have checksum mismatches, then copy those files one by > one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
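The proposal above has two parts: one batched copy over the whole file list instead of one distcp per file, and a per-file fallback driven by presence and checksum comparison at the destination. A self-contained sketch of that fallback logic, with plain java.nio copies and CRC32 standing in for distcp and FileSystem.getFileChecksum (all names here are hypothetical, not Hive's CopyTask API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

public class BatchCopyWithFallback {

    // One pass over the whole file list, standing in for a single distcp
    // invocation instead of one distcp per file.
    public static void copyAll(List<Path> sources, Path dstDir) throws IOException {
        for (Path src : sources) {
            Files.copy(src, dstDir.resolve(src.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // CRC32 over the file contents, standing in for a filesystem checksum.
    public static long checksum(Path p) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(p));
        return crc.getValue();
    }

    // Fallback after a failed batch: recopy only the files that are missing
    // at the destination or whose checksums do not match the source.
    public static List<Path> verifyAndRecopy(List<Path> sources, Path dstDir)
            throws IOException {
        List<Path> recopied = new ArrayList<>();
        for (Path src : sources) {
            Path dst = dstDir.resolve(src.getFileName());
            if (!Files.exists(dst) || checksum(src) != checksum(dst)) {
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                recopied.add(src);
            }
        }
        return recopied;
    }

    public static void main(String[] args) throws IOException {
        Path srcDir = Files.createTempDirectory("src");
        Path dstDir = Files.createTempDirectory("dst");
        Path a = Files.write(srcDir.resolve("a.txt"), "alpha".getBytes());
        Path b = Files.write(srcDir.resolve("b.txt"), "beta".getBytes());
        List<Path> sources = Arrays.asList(a, b);

        copyAll(sources, dstDir);
        Files.delete(dstDir.resolve("b.txt"));                 // simulate a partial failure
        System.out.println(verifyAndRecopy(sources, dstDir));  // recopies only b.txt
    }
}
```

The key design point is that the verification pass is idempotent: running it again after all files match returns an empty list, so retries are cheap.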
[jira] [Updated] (HIVE-16901) Distcp optimization - One distcp per CopyTask
[ https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16901: Summary: Distcp optimization - One distcp per CopyTask (was: Distcp optimization - One distcp per ReplCopyTask ) > Distcp optimization - One distcp per CopyTask > -- > > Key: HIVE-16901 > URL: https://issues.apache.org/jira/browse/HIVE-16901 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > > Currently, if a ReplCopyTask is created to copy a list of files, distcp is > invoked for each and every file. Instead, the list of source files should be > passed to the distcp tool in a single invocation; distcp copies the files in > parallel, which yields a significant performance gain. > If the batched copy fails, traverse the destination directory to find which > files are missing or have checksum mismatches, then copy those files one by > one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask
[ https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan reassigned HIVE-16901: --- > Distcp optimization - One distcp per ReplCopyTask > -- > > Key: HIVE-16901 > URL: https://issues.apache.org/jira/browse/HIVE-16901 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > > Currently, if a ReplCopyTask is created to copy a list of files, distcp is > invoked for each and every file. Instead, the list of source files should be > passed to the distcp tool in a single invocation; distcp copies the files in > parallel, which yields a significant performance gain. > If the batched copy fails, traverse the destination directory to find which > files are missing or have checksum mismatches, then copy those files one by > one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause
[ https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049458#comment-16049458 ] Hive QA commented on HIVE-16885: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12872992/HIVE-16885.01.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10834 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5645/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5645/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-5645/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12872992 - PreCommit-HIVE-Build > Non-equi Joins: Filter clauses should be pushed into the ON clause > -- > > Key: HIVE-16885 > URL: https://issues.apache.org/jira/browse/HIVE-16885 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16885.01.patch, HIVE-16885.patch > > > FIL_24 -> MAPJOIN_23 > {code} > hive> explain select * from part where p_size > (select max(p_size) from > part group by p_type); > Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Reducer 3 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_26] > Select Operator [SEL_25] (rows=110 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_24] (rows=110 width=625) > predicate:(_col5 > _col9) > Map Join Operator [MAPJOIN_23] (rows=330 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > <-Reducer 3 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_21] > Select Operator [SEL_20] (rows=165 width=4) > Output:["_col0"] > Group By Operator [GBY_19] (rows=165 width=109) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_18] > PartitionCols:_col0 > Group By Operator [GBY_17] (rows=14190 width=109) > > Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type > Select Operator [SEL_16] (rows=2 width=109) > Output:["p_type","p_size"] > TableScan [TS_2] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Select Operator
[jira] [Commented] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause
[ https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049384#comment-16049384 ] Hive QA commented on HIVE-16885: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12872992/HIVE-16885.01.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10834 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5644/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5644/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-5644/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12872992 - PreCommit-HIVE-Build > Non-equi Joins: Filter clauses should be pushed into the ON clause > -- > > Key: HIVE-16885 > URL: https://issues.apache.org/jira/browse/HIVE-16885 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16885.01.patch, HIVE-16885.patch > > > FIL_24 -> MAPJOIN_23 > {code} > hive> explain select * from part where p_size > (select max(p_size) from > part group by p_type); > Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Reducer 3 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_26] > Select Operator [SEL_25] (rows=110 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_24] (rows=110 width=625) > predicate:(_col5 > _col9) > Map Join Operator [MAPJOIN_23] (rows=330 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > <-Reducer 3 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_21] > Select Operator [SEL_20] (rows=165 width=4) > Output:["_col0"] > Group By Operator [GBY_19] (rows=165 width=109) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_18] > PartitionCols:_col0 > Group By Operator [GBY_17] (rows=14190 width=109) > > Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type > Select Operator [SEL_16] (rows=2 width=109) > Output:["p_type","p_size"] > TableScan [TS_2] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Select Operator
[jira] [Updated] (HIVE-16835) Addendum to HIVE-16745
[ https://issues.apache.org/jira/browse/HIVE-16835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-16835: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > Addendum to HIVE-16745 > -- > > Key: HIVE-16835 > URL: https://issues.apache.org/jira/browse/HIVE-16835 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Fix For: 3.0.0 > > Attachments: HIVE-16835.01.patch > > > HIVE-16745 missed fixing the syntax error in hive-schema-1.1.0.mysql.sql -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16835) Addendum to HIVE-16745
[ https://issues.apache.org/jira/browse/HIVE-16835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049347#comment-16049347 ] Sergio Peña commented on HIVE-16835: Looks simple +1 > Addendum to HIVE-16745 > -- > > Key: HIVE-16835 > URL: https://issues.apache.org/jira/browse/HIVE-16835 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-16835.01.patch > > > HIVE-16745 missed fixing the syntax error in hive-schema-1.1.0.mysql.sql -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049337#comment-16049337 ] Hive QA commented on HIVE-14747: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12872986/HIVE-14747.02.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10831 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5643/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5643/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5643/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12872986 - PreCommit-HIVE-Build > Remove JAVA paths from profiles by sending them from ptest-client > - > > Key: HIVE-14747 > URL: https://issues.apache.org/jira/browse/HIVE-14747 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes JAVA paths to build and execute the tests. We should > get rid of these by passing such information from Jenkins to the > ptest-server. In case a profile needs a different java version, then we can > create a specific Jenkins job for that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause
[ https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16885: --- Attachment: (was: HIVE-16885.01.patch) > Non-equi Joins: Filter clauses should be pushed into the ON clause > -- > > Key: HIVE-16885 > URL: https://issues.apache.org/jira/browse/HIVE-16885 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16885.01.patch, HIVE-16885.patch > > > FIL_24 -> MAPJOIN_23 > {code} > hive> explain select * from part where p_size > (select max(p_size) from > part group by p_type); > Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 3 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_26] > Select Operator [SEL_25] (rows=110 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_24] (rows=110 width=625) > predicate:(_col5 > _col9) > Map Join Operator [MAPJOIN_23] (rows=330 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > <-Reducer 3 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_21] > Select Operator [SEL_20] (rows=165 width=4) > Output:["_col0"] > Group By Operator [GBY_19] (rows=165 width=109) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_18] > PartitionCols:_col0 > Group By Operator [GBY_17] (rows=14190 width=109) > > Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type > Select Operator [SEL_16] (rows=2 width=109) > Output:["p_type","p_size"] > TableScan [TS_2] (rows=2 width=109) > > 
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Select Operator [SEL_22] (rows=2 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > TableScan [TS_0] (rows=2 width=621) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause
[ https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16885: --- Attachment: HIVE-16885.01.patch > Non-equi Joins: Filter clauses should be pushed into the ON clause > -- > > Key: HIVE-16885 > URL: https://issues.apache.org/jira/browse/HIVE-16885 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16885.01.patch, HIVE-16885.patch > > > FIL_24 -> MAPJOIN_23 > {code} > hive> explain select * from part where p_size > (select max(p_size) from > part group by p_type); > Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 3 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_26] > Select Operator [SEL_25] (rows=110 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_24] (rows=110 width=625) > predicate:(_col5 > _col9) > Map Join Operator [MAPJOIN_23] (rows=330 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > <-Reducer 3 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_21] > Select Operator [SEL_20] (rows=165 width=4) > Output:["_col0"] > Group By Operator [GBY_19] (rows=165 width=109) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_18] > PartitionCols:_col0 > Group By Operator [GBY_17] (rows=14190 width=109) > > Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type > Select Operator [SEL_16] (rows=2 width=109) > Output:["p_type","p_size"] > TableScan [TS_2] (rows=2 width=109) > > 
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Select Operator [SEL_22] (rows=2 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > TableScan [TS_0] (rows=2 width=621) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause
[ https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16885: --- Attachment: HIVE-16885.01.patch > Non-equi Joins: Filter clauses should be pushed into the ON clause > -- > > Key: HIVE-16885 > URL: https://issues.apache.org/jira/browse/HIVE-16885 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16885.01.patch, HIVE-16885.patch > > > FIL_24 -> MAPJOIN_23 > {code} > hive> explain select * from part where p_size > (select max(p_size) from > part group by p_type); > Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 3 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_26] > Select Operator [SEL_25] (rows=110 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_24] (rows=110 width=625) > predicate:(_col5 > _col9) > Map Join Operator [MAPJOIN_23] (rows=330 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > <-Reducer 3 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_21] > Select Operator [SEL_20] (rows=165 width=4) > Output:["_col0"] > Group By Operator [GBY_19] (rows=165 width=109) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_18] > PartitionCols:_col0 > Group By Operator [GBY_17] (rows=14190 width=109) > > Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type > Select Operator [SEL_16] (rows=2 width=109) > Output:["p_type","p_size"] > TableScan [TS_2] (rows=2 width=109) > > 
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Select Operator [SEL_22] (rows=2 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > TableScan [TS_0] (rows=2 width=621) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049276#comment-16049276 ] Hive QA commented on HIVE-14747: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12872982/HIVE-14747.01.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10817 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=103) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/5642/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5642/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5642/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12872982 - PreCommit-HIVE-Build > Remove JAVA paths from profiles by sending them from ptest-client > - > > Key: HIVE-14747 > URL: https://issues.apache.org/jira/browse/HIVE-14747 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes JAVA paths to build and execute the tests. We should > get rid of these by passing such information from Jenkins to the > ptest-server. In case a profile needs a different java version, then we can > create a specific Jenkins job for that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly
[ https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16357 started by Barna Zsombor Klara. -- > Failed folder creation when creating a new table is reported incorrectly > > > Key: HIVE-16357 > URL: https://issues.apache.org/jira/browse/HIVE-16357 > Project: Hive > Issue Type: Bug >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > If the directory for a Hive table could not be created, then the HMS will > throw a MetaException: > {code} > if (tblPath != null) { > if (!wh.isDir(tblPath)) { > if (!wh.mkdirs(tblPath, true)) { > throw new MetaException(tblPath > + " is not a directory or unable to create one"); > } > madeDir = true; > } > } > {code} > However, in the finally block we always try to call the > DbNotificationListener, which in turn will also throw an exception because > the directory is missing, overwriting the initial exception with a > FileNotFoundException. > Actual stack trace seen by the caller: > {code} > 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: > MetaException(message:java.lang.RuntimeException: > java.io.FileNotFoundException: File file:/.../0 does not exist) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at 
com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File > file:/.../0 does not exist > at > org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.(DbNotificationListener.java:203) > at > org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482) > ... 
20 more > Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555) > at > org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574) > at > org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:243) > at > org.apache.hadoop.fs.ProxyFileSystem.listStatus(ProxyFileSystem.java:195) > at > org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:243) > at
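The masking described in HIVE-16357 is a generic try/finally hazard and can be reproduced in plain Java. The sketch below uses hypothetical method names (`failingCreate`, `preservingCreate`, `notifyListeners`), not Hive's actual code; it contrasts the current behavior with one possible fix, attaching the listener failure via `Throwable.addSuppressed` so the original error survives.

```java
// Sketch of the exception-masking bug: an exception thrown while the finally
// block runs the listener replaces the original error. All names here are
// hypothetical stand-ins for the HMS code paths.
public class MaskingDemo {
    static void notifyListeners() {
        // The listener fails because the table directory was never created.
        throw new RuntimeException("masking: File file:/.../0 does not exist");
    }

    // Current shape: listener call in finally, so its failure wins.
    static String failingCreate() {
        try {
            try {
                throw new RuntimeException("original: unable to create directory");
            } finally {
                notifyListeners();          // throws, discarding the original
            }
        } catch (RuntimeException e) {
            return e.getMessage();          // caller sees only the masking error
        }
    }

    // Possible fix: catch the listener failure and suppress it on the original.
    static String preservingCreate() {
        try {
            throw new RuntimeException("original: unable to create directory");
        } catch (RuntimeException e) {
            try {
                notifyListeners();
            } catch (RuntimeException listenerFailure) {
                e.addSuppressed(listenerFailure);   // keep both errors
            }
            return e.getMessage();          // caller sees the original error
        }
    }

    public static void main(String[] args) {
        System.out.println(failingCreate());
        System.out.println(preservingCreate());
    }
}
```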
[jira] [Assigned] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly
[ https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara reassigned HIVE-16357: -- Assignee: Barna Zsombor Klara
[jira] [Commented] (HIVE-16738) Notification ID generation in DBNotification might not be unique across HS2 instances.
[ https://issues.apache.org/jira/browse/HIVE-16738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049242#comment-16049242 ] Sergio Peña commented on HIVE-16738: Ah, thanks, I forgot about the embedded metastore. So this issue is the same as HIVE-16886 then. Btw, are you working on this patch? I was thinking about providing a patch for HIVE-16886, but if you have some work in progress, then I'll wait. > Notification ID generation in DBNotification might not be unique across HS2 > instances. > -- > > Key: HIVE-16738 > URL: https://issues.apache.org/jira/browse/HIVE-16738 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > Going to explain the problem in the scope of the "replication" feature for hive 2 > that is being built, as it is easier to explain: > To allow replication to work we need to set > "hive.metastore.transactional.event.listeners" to DBNotificationListener. > For use cases where there are multiple HiveServer2 instances running: > {code} > private void process(NotificationEvent event, ListenerEvent listenerEvent) > throws MetaException { > event.setMessageFormat(msgFactory.getMessageFormat()); > synchronized (NOTIFICATION_TBL_LOCK) { > LOG.debug("DbNotificationListener: Processing : {}:{}", > event.getEventId(), > event.getMessage()); > HMSHandler.getMSForConf(hiveConf).addNotificationEvent(event); > } > // Set the DB_NOTIFICATION_EVENT_ID for future reference by other > listeners. > if (event.isSetEventId()) { > listenerEvent.putParameter( > MetaStoreEventListenerConstants.DB_NOTIFICATION_EVENT_ID_KEY_NAME, > Long.toString(event.getEventId())); > } > } > {code} > the above code in DbNotificationListener holding the object lock won't be > enough to guarantee that all events get a unique id.
> The transaction isolation level at the db ("read-committed" or "repeatable-read") > would also not guarantee this, unless a lock is taken at the db level, > preferably on the table {{NOTIFICATION_SEQUENCE}}, which only has one row. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
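Why the JVM-level synchronized block is insufficient can be shown with a minimal sketch: each metastore instance serializes only its own threads, so an interleaved read-increment-write of the single NOTIFICATION_SEQUENCE row still hands out a duplicate id. The "database" below is just a long field and the forced interleaving is illustrative; the fix suggested in the description is a db-level lock (e.g. a SELECT ... FOR UPDATE style row lock) on that one row.

```java
import java.util.ArrayList;
import java.util.List;

// Why a per-process object lock cannot make notification ids unique across
// HS2/metastore instances: the shared sequence row is read and written
// non-atomically. The "database" is simulated by a plain long field, and the
// interleaving of two instances is forced deterministically.
public class SequenceRaceDemo {
    static long sequenceRow = 1;   // stands in for the one NOTIFICATION_SEQUENCE row

    static long read()           { return sequenceRow; }
    static void write(long next) { sequenceRow = next; }

    // Instance A and instance B each do read -> use id -> write(id + 1),
    // each inside its *own* synchronized block, so nothing prevents this order:
    static List<Long> interleavedAllocation() {
        List<Long> ids = new ArrayList<>();
        long a = read();          // A reads 1
        long b = read();          // B reads 1 before A writes back
        write(a + 1);
        ids.add(a);               // A's event id: 1
        write(b + 1);
        ids.add(b);               // B's event id: 1 -- duplicate
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(interleavedAllocation());  // [1, 1]
    }
}
```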
[jira] [Updated] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-14747: --- Attachment: HIVE-14747.02.patch Added comments in the second version of the patch. > Remove JAVA paths from profiles by sending them from ptest-client > - > > Key: HIVE-14747 > URL: https://issues.apache.org/jira/browse/HIVE-14747 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes JAVA paths to build and execute the tests. We should > get rid of these by passing such information from Jenkins to the > ptest-server. In case a profile needs a different Java version, we can > create a specific Jenkins job for it.
[jira] [Updated] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-14747: --- Status: Patch Available (was: In Progress)
[jira] [Updated] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-14747: --- Attachment: HIVE-14747.01.patch
[jira] [Work started] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-14747 started by Barna Zsombor Klara. --
[jira] [Assigned] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara reassigned HIVE-14747: -- Assignee: Barna Zsombor Klara
[jira] [Comment Edited] (HIVE-16865) Handle replication bootstrap of large databases
[ https://issues.apache.org/jira/browse/HIVE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048991#comment-16048991 ] anishek edited comment on HIVE-16865 at 6/14/17 9:59 AM: - h4. Bootstrap Replication Dump * db metadata one at a time * function metadata one at a time * get all tableNames — which should not be overwhelming * tables are also requested one at a time — all partition definitions are loaded in one go for a table, and we don't expect a table with more than 5-10 partition definition columns * the partition objects themselves will be fetched in batches via the PartitionIteratable * the only problem seems to be when writing data to _files, wherein we load all the file status objects per partition (for partitioned tables), and per table otherwise, in memory. This might lead to OOM cases. Decision: this is not a problem, as we do the same for split computation, where we have not faced this issue. * we create a replCopyTask that will create the _files for all tables / partitions etc. during analysis time and then go to the execution engine; this will lead to a lot of objects stored in memory given the above scale targets. *possibly* ** move the dump into a task itself which manages its own thread pools to subsequently analyze/dump tables in the execution phase; this will blur the demarcation between the execution and analysis phases within hive. ** Another mode might be to provide lazy incremental tasks from the analysis to the execution phase, such that both phases run simultaneously rather than one completing before the other starts; this will lead to significant code changes and currently seems to be required only for replication.
** we might have to do the same for _incremental replication dump_ too, as the _*from*_ and _*to*_ event ids might span millions of events, with all of them being inserts. Though the creation of _files is handled differently there (we write the files along with the metadata), we should be able to do the same for bootstrap replication as well, rather than creating a replCopyTask; this would mean the replCopyTask is effectively only used during load time. The only problem with this approach is that since the process is single threaded we are going to dump data sequentially, which might take a long time, unless we do some threading in ReplicationSemanticAnalyzer to dump tables with some parallelism, since there is no dependency between tables when dumping them; a similar approach might be required for partitions within tables. h4. Bootstrap Replication Load * list all the table metadata files per db. For massive databases we will, per the above, load on the order of a million FileStatus objects in memory. This seems to be a significantly higher order of objects loaded than during split computation, and hence we might need to look at it; most probably move to {code}org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive){code} * a task will be created for each type of operation; in the case of bootstrap, one task per table / partition / function / database, hence we will encounter the last problem in _*Bootstrap Replication Dump*_ h4. Additional thoughts * Since there can be multiple instances of metastores, from an integration w.r.t. beacon for replication, would it be better to have a dedicated metastore instance for replication-related workload (at least for bootstrap)? Since the execution of tasks will take place on the metastore instance, it might be better for the customer to have one metastore for replication and others to handle normal workloads.
This can be achieved, I think, based on how the URLs are configured on the HS2 client/orchestration engine of replication. * On calling distcp in ReplCopyTask, can we log the source path to dest path? Otherwise, if there are problems during copying, we won't know the actual paths. * On the replica warehouse, since replication tasks will run alongside normal execution of other hive tasks (assuming there are multiple dbs on the replica), how do we constrain resource allocation for replication vs normal tasks? How do we manage this such that replication doesn't lag behind significantly?
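The listFiles suggestion in the load section can be illustrated with a pure-Java analogue: java.nio's DirectoryStream plays the role of Hadoop's RemoteIterator, yielding entries incrementally instead of materializing a million-element FileStatus[] up front. The directory layout below is a made-up stand-in for a dump location, not Hive's actual layout.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Lazy listing in the spirit of FileSystem#listFiles returning a
// RemoteIterator: entries are produced incrementally, so each one can be
// processed and dropped instead of holding the whole listing in memory.
public class LazyListingDemo {
    static List<String> listLazily(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {              // fetched one entry at a time
                names.add(p.getFileName().toString());
            }
        }
        names.sort(String::compareTo);           // directory order is unspecified
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("repl-load-demo");
        for (int i = 0; i < 3; i++) {
            Files.createFile(dir.resolve("part-" + i));
        }
        System.out.println(listLazily(dir));     // [part-0, part-1, part-2]
    }
}
```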
[jira] [Commented] (HIVE-16865) Handle replication bootstrap of large databases
[ https://issues.apache.org/jira/browse/HIVE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048991#comment-16048991 ] anishek commented on HIVE-16865: > Handle replication bootstrap of large databases > --- > > Key: HIVE-16865 > URL: https://issues.apache.org/jira/browse/HIVE-16865 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > for larger databases make sure that we can handle replication bootstrap. > * Assuming a large database can have close to a million tables or a few tables > with a few hundred thousand partitions. > * for function replication if a primary warehouse has a large number of custom > functions defined such that the same binary file in
[jira] [Updated] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory during repl load
[ https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16900: --- Description: During repl copy we currently only allow one operation per file, as against the list of files supported by distcp. During bootstrap table/partition load it would be great to load all files listed in {noformat}_files{noformat} in a single distcp job to make it more efficient; this would require changes to the _shims_ sub-project in hive to additionally expose APIs which take multiple source files. > optimization to give distcp a list of input files to copy to a destination > target directory during repl load > > > Key: HIVE-16900 > URL: https://issues.apache.org/jira/browse/HIVE-16900 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > During repl copy we currently only allow one operation per file, as against the list > of files supported by distcp. During bootstrap table/partition load it would > be great to load all files listed in {noformat}_files{noformat} in a single > distcp job to make it more efficient; this would require changes to the > _shims_ sub-project in hive to additionally expose APIs which take multiple > source files.
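The grouping this description asks for can be sketched without Hadoop: collect the paths listed in _files and cut them into batches, each batch becoming one distcp invocation against the common target directory instead of one invocation per file. The batch size and the List-of-lists "job" shape below are illustrative choices, not Hive's shim API.

```java
import java.util.ArrayList;
import java.util.List;

// Batch source files so each group becomes a single copy job (one distcp
// invocation with multiple sources) rather than one job per file. The batch
// size and job representation are hypothetical, for illustration only.
public class CopyBatcher {
    static List<List<String>> batch(List<String> sources, int batchSize) {
        List<List<String>> jobs = new ArrayList<>();
        for (int i = 0; i < sources.size(); i += batchSize) {
            jobs.add(new ArrayList<>(
                sources.subList(i, Math.min(i + batchSize, sources.size()))));
        }
        return jobs;   // each inner list -> one distcp job to the target dir
    }

    public static void main(String[] args) {
        List<String> files = List.of("f1", "f2", "f3", "f4", "f5");
        System.out.println(batch(files, 2).size());   // 3 jobs instead of 5
    }
}
```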
[jira] [Updated] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory during repl load
[ https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16900: --- Summary: optimization to give distcp a list of input files to copy to a destination target directory during repl load (was: optimization to give distcp a list of input files to copy to a destination target directory)
[jira] [Assigned] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory
[ https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16900: --
[jira] [Updated] (HIVE-16895) Multi-threaded execution of bootstrap dump of partitions / functions
[ https://issues.apache.org/jira/browse/HIVE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16895: --- Issue Type: Sub-task (was: Improvement) Parent: HIVE-16865 > Multi-threaded execution of bootstrap dump of partitions / functions > - > > Key: HIVE-16895 > URL: https://issues.apache.org/jira/browse/HIVE-16895 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > to allow faster execution of the bootstrap dump phase we dump multiple partitions > from the same table simultaneously. > even though dumping functions is not going to be a blocker, moving to > similar execution modes for all metastore objects will make the code more > coherent. > Bootstrap dump at db level does: > * bootstrap of all tables > ** bootstrap of all partitions in a table. (scope of current jira) > * bootstrap of all functions (scope of current jira)
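The parallel partition dump proposed in HIVE-16895 can be sketched with a plain ExecutorService; dumpPartition below is a hypothetical placeholder for the real per-partition dump work, and the fixed pool size is an illustrative choice. Because partitions of a table have no dependency on each other, submitting them all and then collecting the futures preserves result order while letting the work run concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Dump partitions of one table from a thread pool. dumpPartition is a
// hypothetical stand-in for the real per-partition dump.
public class ParallelPartitionDump {
    static String dumpPartition(String partition) {
        return "dumped:" + partition;
    }

    static List<String> dumpAll(List<String> partitions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String p : partitions) {
                futures.add(pool.submit(() -> dumpPartition(p)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());   // blocks; also propagates dump failures
            }
            return results;             // same order as the input partitions
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dumpAll(List.of("p=1", "p=2", "p=3"), 2));
    }
}
```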
[jira] [Updated] (HIVE-16897) repl load does not lead to excessive memory consumption for multiple functions from same binary jar
[ https://issues.apache.org/jira/browse/HIVE-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16897: --- Issue Type: Sub-task (was: Improvement) Parent: HIVE-16865 > repl load does not lead to excessive memory consumption for multiple > functions from same binary jar > > > Key: HIVE-16897 > URL: https://issues.apache.org/jira/browse/HIVE-16897 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > as part of function replication we currently keep a separate copy of the > binary jar associated with the function (this should be the same on the primary > warehouse also, since each hdfs jar location given during creation of a function > will download the resource into a separate resource location, thus leading to > the same jar being included in the class path multiple times) > this will lead to excessive space used to keep all jars in the classpath; solve > this by identifying the common binary jar (using the checksum from the primary on the > replica) and not creating multiple copies, thus preventing excessive memory > usage.
[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16896: --- Issue Type: Sub-task (was: Improvement) Parent: HIVE-16865 > move replication load related work in semantic analysis phase to execution > phase using a task > - > > Key: HIVE-16896 > URL: https://issues.apache.org/jira/browse/HIVE-16896 > Project: Hive > Issue Type: Sub-task >Reporter: anishek >Assignee: anishek > > we don't want to create too many tasks in memory in the analysis phase while > loading data. Currently we load all the files in the bootstrap dump location > as {{FileStatus[]}} and then iterate over it to load objects; we should > rather move to > {code} > org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive) > {code} > which would internally batch and return values. > additionally, since we can't hand off partial tasks from the analysis phase to the > execution phase, we are going to move the whole repl load functionality to the > execution phase so we can better control creation/execution of tasks (not > related to hive {{Task}}; we may get rid of ReplCopyTask) > An additional consideration at the end of this jira is to > see if we want to specifically do a multi-threaded load of the bootstrap dump.
[jira] [Updated] (HIVE-16893) move replication dump related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16893: --- Issue Type: Sub-task (was: Improvement) Parent: HIVE-16865 > move replication dump related work in semantic analysis phase to execution > phase using a task > - > > Key: HIVE-16893 > URL: https://issues.apache.org/jira/browse/HIVE-16893 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > Since we run into the possibility of creating a large number of tasks during > replication bootstrap dump: > * we may not be able to hold all of them in memory for really large > databases, which might not hold true once we complete HIVE-16892 > * Also, a compile-time lock is taken such that only one query runs in this > phase; in the replication bootstrap scenario this is going to be a very long-running > task, so moving it to the execution phase will limit the lock > period in the compile phase. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16894) Multi-threaded execution of bootstrap dump of tables.
[ https://issues.apache.org/jira/browse/HIVE-16894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16894: --- Issue Type: Sub-task (was: Improvement) Parent: HIVE-16865 > Multi-threaded execution of bootstrap dump of tables. > -- > > Key: HIVE-16894 > URL: https://issues.apache.org/jira/browse/HIVE-16894 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > after completing HIVE-16893 the bootstrap process will dump a single table at a > time and hence will be very time-consuming while not optimally utilizing the > available resources. Since there is no dependency between dumps of the various > tables, we should be able to do this in parallel. > Bootstrap dump at db level does: > * bootstrap of all tables (scope of current jira) > ** bootstrap of all partitions in a table. > * bootstrap of all functions -- This message was sent by Atlassian JIRA (v6.4.14#64029)
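Since the table dumps are independent, the parallel version described above can be sketched with a plain `ExecutorService`. This is a minimal sketch, assuming a hypothetical `dumpTable` placeholder for Hive's per-table dump work, not the real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Each table dump is independent, so submit them all to a fixed-size pool
// instead of dumping one table at a time.
public class ParallelBootstrapDump {
    // Hypothetical stand-in for dumping one table's metadata and data.
    static String dumpTable(String table) {
        return table + ":dumped";
    }

    static List<String> dumpAll(List<String> tables, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String t : tables) {
                futures.add(pool.submit(() -> dumpTable(t)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());   // propagates a failure from any table dump
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dumpAll(List.of("t1", "t2", "t3"), 2));
    }
}
```

Collecting the futures in submission order keeps the results deterministic even though the dumps themselves run concurrently.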
[jira] [Updated] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication
[ https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16892: --- Issue Type: Sub-task (was: Improvement) Parent: HIVE-16865 > Move creation of _files from ReplCopyTask to analysis phase for bootstrap > replication > - > > Key: HIVE-16892 > URL: https://issues.apache.org/jira/browse/HIVE-16892 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > During replication bootstrap we create the _files via ReplCopyTask for > partitions and tables; this can be done inline as part of the analysis phase > rather than creating the ReplCopyTask. > This is done to prevent the creation of a huge number of these tasks in memory > before handing them to the execution engine. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16898) Validation of file after distcp in repl load
[ https://issues.apache.org/jira/browse/HIVE-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16898: -- > Validation of file after distcp in repl load > - > > Key: HIVE-16898 > URL: https://issues.apache.org/jira/browse/HIVE-16898 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > The source file can change in the time between deciding the source and destination > paths for distcp and invoking distcp, hence distcp might copy the wrong file to the > destination. We should therefore add a check on the checksum of the source file path > after distcp finishes, to make sure the file did not change during the copy process. > If it did, take additional steps: delete the previous file on the destination, copy > the new source, and repeat the same process as above till we copy the correct file. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
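The copy-then-verify-then-retry loop described above can be sketched with JDK-only stand-ins. The assumptions are labeled: byte arrays play the role of HDFS files, MD5 plays the role of the HDFS file checksum, an in-memory array copy plays the role of distcp, and the retry bound is illustrative.

```java
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.function.Supplier;

// Sketch: after the "distcp" finishes, re-read the source checksum; if the
// source changed mid-copy, discard the stale destination and copy again.
public class CopyWithValidation {
    static byte[] checksum(byte[] data) throws Exception {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    static byte[] copyValidated(Supplier<byte[]> source, int maxRetries) throws Exception {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            byte[] snapshot = source.get();
            byte[] expected = checksum(snapshot);
            byte[] dest = Arrays.copyOf(snapshot, snapshot.length); // the "distcp"
            byte[] actual = checksum(source.get()); // source checksum after the copy
            if (Arrays.equals(expected, actual)) {
                return dest;  // source did not change during the copy: done
            }
            // otherwise the destination is stale; loop and copy the new source
        }
        throw new IllegalStateException("source kept changing; giving up");
    }

    public static void main(String[] args) throws Exception {
        byte[] copied = copyValidated(() -> "stable-file".getBytes(), 3);
        System.out.println(new String(copied));
    }
}
```

In real code the checksum comparison would use `FileSystem.getFileChecksum` on the source path before and after the distcp run.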
[jira] [Updated] (HIVE-16898) Validation of source file after distcp in repl load
[ https://issues.apache.org/jira/browse/HIVE-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16898: --- Summary: Validation of source file after distcp in repl load (was: Validation of file after distcp in repl load ) > Validation of source file after distcp in repl load > > > Key: HIVE-16898 > URL: https://issues.apache.org/jira/browse/HIVE-16898 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > The source file can change in the time between deciding the source and destination > paths for distcp and invoking distcp, hence distcp might copy the wrong file to the > destination. We should therefore add a check on the checksum of the source file path > after distcp finishes, to make sure the file did not change during the copy process. > If it did, take additional steps: delete the previous file on the destination, copy > the new source, and repeat the same process as above till we copy the correct file. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16897) repl load does not lead to excessive memory consumption for multiple functions from same binary jar
[ https://issues.apache.org/jira/browse/HIVE-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16897: -- > repl load does not lead to excessive memory consumption for multiple > functions from same binary jar > > > Key: HIVE-16897 > URL: https://issues.apache.org/jira/browse/HIVE-16897 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > As part of function replication we currently keep a separate copy of the > binary jar associated with each function (this should be the same on the primary > warehouse too, since each HDFS jar location given during creation of a function > downloads the resource to a separate resource location, thus leading to > the same jar being included in the classpath multiple times). > This leads to excessive space used to keep all jars in the classpath; solve > this by identifying the common binary jar (using the checksum from the primary on the > replica) and not creating multiple copies, thus preventing excessive memory > usage. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16896: -- > move replication load related work in semantic analysis phase to execution > phase using a task > - > > Key: HIVE-16896 > URL: https://issues.apache.org/jira/browse/HIVE-16896 > Project: Hive > Issue Type: Improvement >Reporter: anishek >Assignee: anishek > > We want to avoid creating too many tasks in memory in the analysis phase while > loading data. Currently we load all the files in the bootstrap dump location > as {{FileStatus[]}} and then iterate over the array to load objects; we should > rather move to > {code} > org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path > f, boolean recursive) > {code} > which would internally batch and return values. > Additionally, since we cannot hand off partial tasks from the analysis phase => > execution phase, we are going to move the whole repl load functionality to the > execution phase so we can better control creation/execution of tasks (not > related to hive {{Task}}; we may get rid of ReplCopyTask). > An additional consideration at the end of this jira is to > see if we want to specifically do a multi-threaded load of the bootstrap dump. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16876) RpcServer should be re-created when Rpc configs change
[ https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048890#comment-16048890 ] Hive QA commented on HIVE-16876: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12872937/HIVE-16876.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10831 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5641/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5641/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5641/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12872937 - PreCommit-HIVE-Build > RpcServer should be re-created when Rpc configs change > -- > > Key: HIVE-16876 > URL: https://issues.apache.org/jira/browse/HIVE-16876 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-16876.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16893) move replication dump related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16893: --- Summary: move replication dump related work in semantic analysis phase to execution phase using a task (was: move replication related work in semantic analysis phase to execution phase using a task) > move replication dump related work in semantic analysis phase to execution > phase using a task > - > > Key: HIVE-16893 > URL: https://issues.apache.org/jira/browse/HIVE-16893 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > Since we run into the possibility of creating a large number of tasks during > replication bootstrap dump: > * we may not be able to hold all of them in memory for really large > databases, which might not hold true once we complete HIVE-16892 > * Also, a compile-time lock is taken such that only one query runs in this > phase; in the replication bootstrap scenario this is going to be a very long-running > task, so moving it to the execution phase will limit the lock > period in the compile phase. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16894) Multi-threaded execution of bootstrap dump of tables.
[ https://issues.apache.org/jira/browse/HIVE-16894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16894: --- Description: after completing HIVE-16893 the bootstrap process will dump a single table at a time and hence will be very time-consuming while not optimally utilizing the available resources. Since there is no dependency between dumps of the various tables, we should be able to do this in parallel. Bootstrap dump at db level does: * bootstrap of all tables (scope of current jira) ** bootstrap of all partitions in a table. * bootstrap of all functions was: after completing HIVE-16893 the bootstrap process will dump a single table at a time and hence will be very time-consuming while not optimally utilizing the available resources. Bootstrap dump at db level does: * bootstrap of all tables (scope of current jira) ** bootstrap of all partitions in a table. * bootstrap of all functions > Multi-threaded execution of bootstrap dump of tables. > -- > > Key: HIVE-16894 > URL: https://issues.apache.org/jira/browse/HIVE-16894 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > after completing HIVE-16893 the bootstrap process will dump a single table at a > time and hence will be very time-consuming while not optimally utilizing the > available resources. Since there is no dependency between dumps of the various > tables, we should be able to do this in parallel. > Bootstrap dump at db level does: > * bootstrap of all tables (scope of current jira) > ** bootstrap of all partitions in a table. > * bootstrap of all functions -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16895) Multi-threaded execution of bootstrap dump of partitions / functions
[ https://issues.apache.org/jira/browse/HIVE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16895: -- > Multi-threaded execution of bootstrap dump of partitions / functions > - > > Key: HIVE-16895 > URL: https://issues.apache.org/jira/browse/HIVE-16895 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > To allow faster execution of the bootstrap dump phase we dump multiple partitions > from the same table simultaneously. > Even though dumping functions is not going to be a blocker, moving to > similar execution modes for all metastore objects will make the code more > coherent. > Bootstrap dump at db level does: > * bootstrap of all tables > ** bootstrap of all partitions in a table. (scope of current jira) > * bootstrap of all functions (scope of current jira) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases
[ https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-16600: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks [~kellyzly] for the contribution! > Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel > order by in multi_insert cases > > > Key: HIVE-16600 > URL: https://issues.apache.org/jira/browse/HIVE-16600 > Project: Hive > Issue Type: Sub-task >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Fix For: 3.0.0 > > Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, > HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, > HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, > HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, > HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, > mr.explain.log.HIVE-16600, Node.java, > TestSetSparkReduceParallelism_MultiInsertCase.java > > > multi_insert_gby.case.q > {code} > set hive.exec.reducers.bytes.per.reducer=256; > set hive.optimize.sampling.orderby=true; > drop table if exists e1; > drop table if exists e2; > create table e1 (key string, value string); > create table e2 (key string); > FROM (select key, cast(key as double) as keyD, value from src order by key) a > INSERT OVERWRITE TABLE e1 > SELECT key, value > INSERT OVERWRITE TABLE e2 > SELECT key; > select * from e1; > select * from e2; > {code} > the parallelism of Sort is 1 even we enable parallel order > by("hive.optimize.sampling.orderby" is set as "true"). 
This is not > reasonable because the parallelism should be calculated by > [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170]. > This is because SetSparkReducerParallelism#needSetParallelism returns false > when [children size of > RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] > is greater than 1. > In this case, the children size of {{RS[2]}} is two. > The logical plan of the case: > {code} >TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5] > -SEL[6]-FS[7] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16894) Multi-threaded execution of bootstrap dump of tables.
[ https://issues.apache.org/jira/browse/HIVE-16894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16894: -- > Multi-threaded execution of bootstrap dump of tables. > -- > > Key: HIVE-16894 > URL: https://issues.apache.org/jira/browse/HIVE-16894 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > after completing HIVE-16893 the bootstrap process will dump a single table at a > time and hence will be very time-consuming while not optimally utilizing the > available resources. > Bootstrap dump at db level does: > * bootstrap of all tables (scope of current jira) > ** bootstrap of all partitions in a table. > * bootstrap of all functions -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16893) move replication related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16893: -- > move replication related work in semantic analysis phase to execution phase > using a task > > > Key: HIVE-16893 > URL: https://issues.apache.org/jira/browse/HIVE-16893 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > Since we run into the possibility of creating a large number of tasks during > replication bootstrap dump: > * we may not be able to hold all of them in memory for really large > databases, which might not hold true once we complete HIVE-16892 > * Also, a compile-time lock is taken such that only one query runs in this > phase; in the replication bootstrap scenario this is going to be a very long-running > task, so moving it to the execution phase will limit the lock > period in the compile phase. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication
[ https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-16892: -- > Move creation of _files from ReplCopyTask to analysis phase for bootstrap > replication > - > > Key: HIVE-16892 > URL: https://issues.apache.org/jira/browse/HIVE-16892 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > During replication bootstrap we create the _files via ReplCopyTask for > partitions and tables; this can be done inline as part of the analysis phase > rather than creating the ReplCopyTask. > This is done to prevent the creation of a huge number of these tasks in memory > before handing them to the execution engine. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases
[ https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048850#comment-16048850 ] liyunzhang_intel commented on HIVE-16600: - [~lirui]: really thanks for review. > Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel > order by in multi_insert cases > > > Key: HIVE-16600 > URL: https://issues.apache.org/jira/browse/HIVE-16600 > Project: Hive > Issue Type: Sub-task >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, > HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, > HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, > HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, > HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, > mr.explain.log.HIVE-16600, Node.java, > TestSetSparkReduceParallelism_MultiInsertCase.java > > > multi_insert_gby.case.q > {code} > set hive.exec.reducers.bytes.per.reducer=256; > set hive.optimize.sampling.orderby=true; > drop table if exists e1; > drop table if exists e2; > create table e1 (key string, value string); > create table e2 (key string); > FROM (select key, cast(key as double) as keyD, value from src order by key) a > INSERT OVERWRITE TABLE e1 > SELECT key, value > INSERT OVERWRITE TABLE e2 > SELECT key; > select * from e1; > select * from e2; > {code} > the parallelism of Sort is 1 even we enable parallel order > by("hive.optimize.sampling.orderby" is set as "true"). 
This is not > reasonable because the parallelism should be calculated by > [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170]. > This is because SetSparkReducerParallelism#needSetParallelism returns false > when [children size of > RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] > is greater than 1. > In this case, the children size of {{RS[2]}} is two. > The logical plan of the case: > {code} >TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5] > -SEL[6]-FS[7] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048842#comment-16048842 ] Alexander Kolbasov commented on HIVE-16886: --- A reasonable way to make it unique is: * Declare a unique constraint * Add each new value as a separate row It would be good to have some retry transaction logic, but it may be a bigger change. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node, due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is neither unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS]; > for (int i = 0; i < NUM_THREADS; i++) { > final int n = i; > tasks[i] = new FutureTask<Void>(new Callable<Void>() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the second notification also has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails: it finds event ID 1 where it expects 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
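The read-increment-write race that the test above exercises can also be shown with a deterministic, JDK-only model. This is a sketch of the failure mode only: the `long[]` array stands in for the NOTIFICATION_SEQUENCE row, and `AtomicLong` stands in for an increment made atomic at the datastore (a row lock or a unique constraint with retry), not for any actual HMS code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Model of the race in ObjectStore#addNotificationEvent: the event ID is read
// from the datastore, incremented in the server, then written back. Two HMS
// instances interleaved as below mint the same ID.
public class EventIdRace {
    public static void main(String[] args) {
        // Broken scheme: read-modify-write with the bad interleaving made explicit.
        long[] dbSeq = {0};
        long serverA = dbSeq[0] + 1;  // server A reads 0, computes next ID 1
        long serverB = dbSeq[0] + 1;  // server B reads before A commits: also 1
        dbSeq[0] = serverA;
        dbSeq[0] = serverB;           // lost update
        System.out.println(serverA == serverB);  // duplicate event ID

        // Fixed scheme: the increment happens atomically at the "datastore",
        // so no two callers can observe the same pre-increment value.
        AtomicLong seq = new AtomicLong(0);
        long a = seq.incrementAndGet();
        long b = seq.incrementAndGet();
        System.out.println(a == b);   // IDs are distinct
    }
}
```

The per-JVM `synchronized` block in DbNotificationListener only rules out the first interleaving within one HMS process; across processes, only the datastore can enforce atomicity.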
[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases
[ https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048841#comment-16048841 ] Rui Li commented on HIVE-16600: --- +1 > Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel > order by in multi_insert cases > > > Key: HIVE-16600 > URL: https://issues.apache.org/jira/browse/HIVE-16600 > Project: Hive > Issue Type: Sub-task >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, > HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, > HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, > HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, > HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, > mr.explain.log.HIVE-16600, Node.java, > TestSetSparkReduceParallelism_MultiInsertCase.java > > > multi_insert_gby.case.q > {code} > set hive.exec.reducers.bytes.per.reducer=256; > set hive.optimize.sampling.orderby=true; > drop table if exists e1; > drop table if exists e2; > create table e1 (key string, value string); > create table e2 (key string); > FROM (select key, cast(key as double) as keyD, value from src order by key) a > INSERT OVERWRITE TABLE e1 > SELECT key, value > INSERT OVERWRITE TABLE e2 > SELECT key; > select * from e1; > select * from e2; > {code} > the parallelism of Sort is 1 even when we enable parallel order > by ("hive.optimize.sampling.orderby" is set to "true"). This is not > reasonable because the parallelism should be > calculated by > [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170]. > This is because SetSparkReducerParallelism#needSetParallelism returns false > when [children size of > RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] > is greater than 1. 
> In this case, the children size of {{RS[2]}} is two. > The logical plan of the case: > {code} >TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5] > -SEL[6]-FS[7] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases
[ https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048827#comment-16048827 ]

Hive QA commented on HIVE-16600:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872933/HIVE-16600.13.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10817 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=103)
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5640/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5640/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5640/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872933 - PreCommit-HIVE-Build

> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel
> order by in multi_insert cases
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-16600
>                 URL: https://issues.apache.org/jira/browse/HIVE-16600
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch,
> HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch,
> HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch,
> HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch,
> HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain,
> mr.explain.log.HIVE-16600, Node.java,
> TestSetSparkReduceParallelism_MultiInsertCase.java
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code}
> The parallelism of Sort is 1 even when we enable parallel order by
> ("hive.optimize.sampling.orderby" is set to "true"). This is not reasonable
> because the parallelism should be calculated by
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170].
> This happens because SetSparkReducerParallelism#needSetParallelism returns false
> when the [children size of
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
> is greater than 1. In this case, the children size of {{RS[2]}} is two.
> The logical plan of the case:
> {code}
>    TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                      -SEL[6]-FS[7]
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
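The gating logic described in the comment above can be illustrated with a small sketch. This is a simplified, hypothetical stand-in for {{SetSparkReducerParallelism#needSetParallelism}} (the {{Op}} class and method shape are invented for illustration, not Hive's actual types): a reduce sink with more than one child makes the check return false, so the sampled parallel order-by path is never taken for multi-insert plans like the {{RS[2]}} case quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelismCheckSketch {
    // Minimal operator-tree node, standing in for Hive's Operator hierarchy.
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
        void addChild(Op c) { children.add(c); }
    }

    // Hypothetical stand-in for the check: bail out (return false) as soon as
    // the reduce sink has multiple children, which is exactly the behaviour
    // HIVE-16600 relaxes so multi-insert queries can still get parallel sort.
    static boolean needSetParallelism(Op reduceSink) {
        return reduceSink.children.size() <= 1;
    }

    public static void main(String[] args) {
        // Mirror the plan TS[0]-SEL[1]-RS[2]-SEL[3]... where RS[2] later
        // gains a second child SEL[6] for the second INSERT branch.
        Op rs = new Op("RS[2]");
        rs.addChild(new Op("SEL[3]"));
        System.out.println(needSetParallelism(rs)); // single child: true
        rs.addChild(new Op("SEL[6]"));
        System.out.println(needSetParallelism(rs)); // two children: false
    }
}
```

With one child the sampling-based reducer estimate would apply; adding the second INSERT branch flips the check and the sort parallelism silently stays at 1.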
[jira] [Updated] (HIVE-16876) RpcServer should be re-created when Rpc configs change
[ https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-16876:
--------------------------
    Status: Patch Available  (was: Open)

> RpcServer should be re-created when Rpc configs change
> ------------------------------------------------------
>
>                 Key: HIVE-16876
>                 URL: https://issues.apache.org/jira/browse/HIVE-16876
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-16876.1.patch
>
[jira] [Updated] (HIVE-16876) RpcServer should be re-created when Rpc configs change
[ https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-16876:
--------------------------
    Attachment: HIVE-16876.1.patch

I've made all Rpc configs immutable at runtime.

> RpcServer should be re-created when Rpc configs change
> ------------------------------------------------------
>
>                 Key: HIVE-16876
>                 URL: https://issues.apache.org/jira/browse/HIVE-16876
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-16876.1.patch
>
[jira] [Updated] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases
[ https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated HIVE-16600:
------------------------------------
    Attachment: HIVE-16600.13.patch

[~lirui]: thanks for the review. Updated HIVE-16600.13.patch here and on RB according to the last round of review.

> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel
> order by in multi_insert cases
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-16600
>                 URL: https://issues.apache.org/jira/browse/HIVE-16600
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch,
> HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch,
> HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch,
> HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch,
> HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain,
> mr.explain.log.HIVE-16600, Node.java,
> TestSetSparkReduceParallelism_MultiInsertCase.java
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code}
> The parallelism of Sort is 1 even when we enable parallel order by
> ("hive.optimize.sampling.orderby" is set to "true"). This is not reasonable
> because the parallelism should be calculated by
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170].
> This happens because SetSparkReducerParallelism#needSetParallelism returns false
> when the [children size of
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
> is greater than 1. In this case, the children size of {{RS[2]}} is two.
> The logical plan of the case:
> {code}
>    TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                      -SEL[6]-FS[7]
> {code}
[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048772#comment-16048772 ]

Hive QA commented on HIVE-16821:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872928/HIVE-16821.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10831 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[setop_no_distinct] (batchId=76)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5639/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5639/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5639/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872928 - PreCommit-HIVE-Build

> Vectorization: support Explain Analyze in vectorized mode
> ---------------------------------------------------------
>
>                 Key: HIVE-16821
>                 URL: https://issues.apache.org/jira/browse/HIVE-16821
>             Project: Hive
>          Issue Type: Bug
>          Components: Diagnosability, Vectorization
>    Affects Versions: 2.1.1, 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch,
> HIVE-16821.2.patch, HIVE-16821.3.patch
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats
> are only available in non-vectorized mode.
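The trade-off mentioned in the HIVE-16821 description can be illustrated with a small sketch. This is an illustrative assumption, not Hive's actual operator code: in row mode, collecting runtime stats costs a branch (or at least an update) per row in the hot loop, whereas a vectorized operator can account for a whole batch with a single update per batch, keeping the inner loop branch-free.

```java
public class BatchStatsSketch {
    // Row-mode style: a per-row check/update inside the hot loop.
    static long rowModeCount(int[] rows, boolean statsEnabled) {
        long count = 0;
        for (int r : rows) {
            if (statsEnabled) {   // branch evaluated once per row
                count++;
            }
        }
        return count;
    }

    // Vectorized style: one update per batch, no per-row branch at all.
    static long vectorModeCount(int[][] batches) {
        long count = 0;
        for (int[] batch : batches) {
            count += batch.length; // whole batch accounted in one step
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] batches = { new int[1024], new int[1024], new int[512] };
        System.out.println(vectorModeCount(batches)); // 2560
    }
}
```

Per-batch accounting is why supporting Explain Analyze in vectorized mode need not reintroduce a per-row branch.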
[jira] [Commented] (HIVE-16864) add validation to stream position search in LLAP IO
[ https://issues.apache.org/jira/browse/HIVE-16864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048764#comment-16048764 ]

Prasanth Jayachandran commented on HIVE-16864:
----------------------------------------------

Is the orc_ppd_basic test failure related?

> add validation to stream position search in LLAP IO
> ---------------------------------------------------
>
>                 Key: HIVE-16864
>                 URL: https://issues.apache.org/jira/browse/HIVE-16864
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Prasanth Jayachandran
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16864.01.patch, HIVE-16864.patch
>
> There's a TODO there to add the checks. We've seen some issues before where
> incorrect ranges lead to obscure errors after this method returns a bad
> result due to the absence of validity checks; we are also seeing one now.
> Adding the checks.
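The HIVE-16864 description calls for validity checks on a stream position search. A minimal sketch of that kind of check, assuming a hypothetical {{seekChecked}} helper (the names and signature are invented for illustration, not the actual LLAP IO code): verify that a requested offset falls inside the known buffered range before computing a relative position, and fail loudly instead of returning a bad result that surfaces as an obscure downstream error.

```java
public class StreamPositionCheckSketch {
    // Hypothetical helper: translate an absolute stream offset into a position
    // relative to a buffered [rangeStart, rangeEnd) range, rejecting offsets
    // outside the range instead of silently producing a bad result.
    static long seekChecked(long rangeStart, long rangeEnd, long offset) {
        if (offset < rangeStart || offset >= rangeEnd) {
            throw new IllegalArgumentException(
                "Offset " + offset + " outside range [" + rangeStart + ", " + rangeEnd + ")");
        }
        return offset - rangeStart;
    }

    public static void main(String[] args) {
        System.out.println(seekChecked(100, 200, 150)); // 50
        try {
            seekChecked(100, 200, 250);                 // out of range
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at the seek, rather than after it, is what turns the "obscure errors after this method returns a bad result" into an immediate, diagnosable exception.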