[jira] [Created] (HIVE-25777) ACID: Pick the compactor transaction over insert dir
Gopal Vijayaraghavan created HIVE-25777: --- Summary: ACID: Pick the compactor transaction over insert dir Key: HIVE-25777 URL: https://issues.apache.org/jira/browse/HIVE-25777 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.2, 4.0.0 Reporter: Gopal Vijayaraghavan If there are two competing versions of a particular write-id, one from the compactor and another from the original insert, always pick the compactor one once it is committed. If the directory structure looks like {code} base_11/ base_11_v192/ {code} Then always pick the v192 transaction if txnid=192 is committed. This is required to ensure that the raw base_ dir can be deleted safely on non-atomic directory deletions (like s3), without a race condition between getSplits and the actual file-reader. -- This message was sent by Atlassian Jira (v8.20.1#820001)
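The selection rule described above can be sketched roughly as follows. This is only an illustration, not Hive's actual directory-resolution code: the class name `BaseDirPicker`, the `pick` method, and the idea of passing in a set of committed transaction ids are all assumptions made for the example, as is the tie-break of preferring the highest committed txn id when several compacted directories exist.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: chooses between base_N and base_N_vTXN.
final class BaseDirPicker {
    // base_<writeId>, optionally followed by _v<txnId>, with an optional trailing slash.
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v(\\d+))?/?");

    /** Returns the directory to read for writeId, preferring a committed compactor directory. */
    static Optional<String> pick(List<String> dirs, long writeId, Set<Long> committedTxns) {
        String raw = null;        // the original insert directory, e.g. base_11/
        String compacted = null;  // a compactor directory, e.g. base_11_v192/
        long bestTxn = -1;
        for (String dir : dirs) {
            Matcher m = BASE.matcher(dir);
            if (!m.matches() || Long.parseLong(m.group(1)) != writeId) {
                continue; // not a base directory for this write-id
            }
            if (m.group(2) == null) {
                raw = dir;
            } else {
                long txn = Long.parseLong(m.group(2));
                // Assumption: only committed compactions count, highest txn id wins.
                if (committedTxns.contains(txn) && txn > bestTxn) {
                    bestTxn = txn;
                    compacted = dir;
                }
            }
        }
        return Optional.ofNullable(compacted != null ? compacted : raw);
    }
}
```

With the listing from the report and txnid=192 committed, `pick` returns `base_11_v192/`; if 192 has not committed, it falls back to `base_11/`.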
[jira] [Created] (HIVE-25589) SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1
Gopal Vijayaraghavan created HIVE-25589: --- Summary: SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1 Key: HIVE-25589 URL: https://issues.apache.org/jira/browse/HIVE-25589 Project: Hive Issue Type: Improvement Components: CBO, SQL Affects Versions: 4.0.0 Reporter: Gopal Vijayaraghavan
Insert queries which use a ROW_NUMBER()=1 filter are inconvenient to write or port from an existing workload, because there is no easy way to ignore a column in this pattern.
{code}
INSERT INTO main_table
SELECT * from duplicated_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
{code}
needs to be rewritten into
{code}
INSERT INTO main_table
select event_id, event_ts, event_attribute, event_metric1, event_metric2, event_metric3, event_metric4, .., event_metric43
from (select *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum from duplicated_table)
where rnum=1;
{code}
This is a time-consuming and error-prone rewrite (dealing with a messed up order of columns between the source and dest tables). An alternate rewrite would be to support the same or similar syntax using HAVING.
{code}
INSERT INTO main_table
SELECT * from duplicated_table
HAVING ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
Gopal Vijayaraghavan created HIVE-25011: --- Summary: Concurrency: Do not acquire locks for EXPLAIN Key: HIVE-25011 URL: https://issues.apache.org/jira/browse/HIVE-25011 Project: Hive Issue Type: Improvement Components: Locking, Transactions Affects Versions: 4.0.0 Reporter: Gopal Vijayaraghavan {code} EXPLAIN UPDATE ... {code} should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24976) CBO: count(distinct) in a window function fails CBO
Gopal Vijayaraghavan created HIVE-24976: --- Summary: CBO: count(distinct) in a window function fails CBO Key: HIVE-24976 URL: https://issues.apache.org/jira/browse/HIVE-24976 Project: Hive Issue Type: Bug Components: CBO Reporter: Gopal Vijayaraghavan {code} create temporary table tmp_tbl( `rule_id` string, `severity` string, `alert_id` string, `alert_type` string); explain cbo select `k`.`rule_id`, count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt` from tmp_tbl k ; explain select `k`.`rule_id`, count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt` from tmp_tbl k ; {code} Fails CBO, because the count(distinct) is not being recognized as belonging to a windowing operation. So it throws the following exception {code} throw new CalciteSemanticException("Distinct without an aggregation.", UnsupportedFeature.Distinct_without_an_aggreggation); {code} https://github.com/apache/hive/blob/73c3770d858b063c69dea6c64a759f8fdacad460/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L4914 This prevents a query like this from using a materialized view which already exists in the system (the MV obviously does not contain this expression, but represents a complex transform from a JSON structure into a columnar layout). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24533) Metastore: Allow miniHMS to startup standalone
Gopal Vijayaraghavan created HIVE-24533: --- Summary: Metastore: Allow miniHMS to startup standalone Key: HIVE-24533 URL: https://issues.apache.org/jira/browse/HIVE-24533 Project: Hive Issue Type: Improvement Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Gopal Vijayaraghavan Similar to how StartMiniHS2Cluster works. https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/StartMiniHS2Cluster.java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24462) JDBC: Support bearer token authentication
Gopal Vijayaraghavan created HIVE-24462: --- Summary: JDBC: Support bearer token authentication Key: HIVE-24462 URL: https://issues.apache.org/jira/browse/HIVE-24462 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 4.0.0 Reporter: Gopal Vijayaraghavan SPNEGO (Negotiate) authentication is the only way to authenticate a user without providing a password in JDBC. The SPN model for that fails when load-balancing is used (see HIVE-20583). Add a native JDBC equivalent for the Knox flow, but for POST requests with appropriate Authorization bearer tokens. https://knox.apache.org/books/knox-1-1-0/knoxsso_integration.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23609) SemiJoin: Relax big table size check for self-joins
Gopal Vijayaraghavan created HIVE-23609: --- Summary: SemiJoin: Relax big table size check for self-joins Key: HIVE-23609 URL: https://issues.apache.org/jira/browse/HIVE-23609 Project: Hive Issue Type: Improvement Reporter: Gopal Vijayaraghavan For self-joins, several other heuristics applied to Semijoins don't apply as the difference between rows on either side is likely to result in an actual reduction of rows scanned. This change results in slightly different Tez priorities for self-joins which are heavily filtered on one side over the other, which helps ensure the smaller table is completed before the bigger table consumes resources. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23541) Vectorization: Unbounded following window function start producing results too early
Gopal Vijayaraghavan created HIVE-23541: --- Summary: Vectorization: Unbounded following window function start producing results too early Key: HIVE-23541 URL: https://issues.apache.org/jira/browse/HIVE-23541 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan
ReduceRecordSource indicates the end of group for a reducer input, whenever the entire key changes. ReduceRecordSource::processVectorGroup calls reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ true); when the last group is being processed. However for PTF window functions with unbounded following, this is triggered by the key changing and not the partition changing. This results in the VectorPTFOperator detecting a change in the sort key as a switch of the partition key and starting to produce results too early.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFOperator.java#L399
{code}
create temporary table test2(id STRING,name STRING,event_dt date) stored as orc;
insert into test2 values ('100','A','2019-08-15'), ('100','A','2019-10-12');
SELECT name, event_dt, first_value(event_dt) over (PARTITION BY name ORDER BY event_dt desc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) last_event_dt FROM test2; -- streaming FIRST_VALUE with DESCENDING
SELECT name, event_dt, last_value(event_dt) over (PARTITION BY name ORDER BY event_dt asc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) last_event_dt FROM test2; -- non-streaming LAST_VALUE with ASCENDING
{code}
These two queries should return identical results, with the streaming version being significantly faster than the non-streaming one, due to the lack of buffered/spilled rows with streaming.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23087) UDF: bloom_filter() should use 2nd argument as expectedEntries value
Gopal Vijayaraghavan created HIVE-23087: --- Summary: UDF: bloom_filter() should use 2nd argument as expectedEntries value Key: HIVE-23087 URL: https://issues.apache.org/jira/browse/HIVE-23087 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan
{code}
explain select bloom_filter(sr_ticket_number, 31299272) from store_returns, date_dim where sr_returned_date_sk = d_date_sk and d_year = 2000 and d_moy = 9;
| PARTITION_ONLY_SHUFFLE [RS_12] |
| Group By Operator [GBY_11] (rows=1 width=144) |
| Output:["_col0"],aggregations:["bloom_filter(_col0, 31299272, expectedEntries=0)"] |
| Select Operator [SEL_9] (rows=2964832193 width=15) |
{code}
fails with
{code}
Caused by: java.lang.IllegalArgumentException: expectedEntries should be > 0
  at org.apache.hive.common.util.BloomKFilter.checkArgument(BloomKFilter.java:54)
  at org.apache.hive.common.util.BloomKFilter.<init>(BloomKFilter.java:59)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22948) QueryCache: Treat query cache locations as temporary storage
Gopal Vijayaraghavan created HIVE-22948: --- Summary: QueryCache: Treat query cache locations as temporary storage Key: HIVE-22948 URL: https://issues.apache.org/jira/browse/HIVE-22948 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 3.1.2, 4.0.0 Reporter: Gopal Vijayaraghavan
The WriteEntity with a query cache query is considered for user authorization without having direct access for users.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/command/CommandAuthorizerV2.java#L111
{code}
if (privObject instanceof WriteEntity && ((WriteEntity)privObject).isTempURI()) {
  // do not authorize temporary uris
  continue;
}
{code}
is not satisfied by the queries qualifying for the query cache.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22943) Metastore pushdown for DATE constants
Gopal Vijayaraghavan created HIVE-22943: --- Summary: Metastore pushdown for DATE constants Key: HIVE-22943 URL: https://issues.apache.org/jira/browse/HIVE-22943 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Gopal Vijayaraghavan https://github.com/apache/hive/blame/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/Filter.g#L461 {code} /* When I figure out how to make lexer backtrack after validating predicate, dates would be able to support single quotes [( '\'' DateString '\'' ) |]. For now, what we do instead is have a hack to parse the string in metastore code from StringLiteral. */ DateLiteral : KW_DATE? DateString { ExtractDate(getText()) != null }? ; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22930) Performance: ASTNode::getName() allocates within the walker loops
Gopal Vijayaraghavan created HIVE-22930: --- Summary: Performance: ASTNode::getName() allocates within the walker loops Key: HIVE-22930 URL: https://issues.apache.org/jira/browse/HIVE-22930 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan Attachments: ASTNode-name.png {code} /* * (non-Javadoc) * * @see org.apache.hadoop.hive.ql.lib.Node#getName() */ @Override public String getName() { return String.valueOf(super.getToken().getType()); } {code} !ASTNode-name.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
Gopal Vijayaraghavan created HIVE-22929: --- Summary: Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll() Key: HIVE-22929 URL: https://issues.apache.org/jira/browse/HIVE-22929 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan Attachments: String.replaceAll.png !String.replaceAll.png! https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 {code} '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, getText().length() -1 ).replaceAll("``", "`")); } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
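One way to avoid compiling a throwaway regex on every quoted identifier is a single pass over the characters. The sketch below is only an illustration of that idea and is not the fix that was applied in Hive; the class and method names (`QuotedIdentifiers`, `unquote`) are made up for the example, and it assumes the input is a well-formed token that starts and ends with a backtick.

```java
// Hypothetical helper, not part of Hive: regex-free equivalent of
// text.substring(1, text.length() - 1).replaceAll("``", "`").
final class QuotedIdentifiers {
    /** Strips the surrounding backticks and collapses each doubled backtick into one. */
    static String unquote(String text) {
        int end = text.length() - 1; // index of the closing backtick
        StringBuilder sb = new StringBuilder(end);
        for (int i = 1; i < end; i++) {
            char c = text.charAt(i);
            sb.append(c);
            if (c == '`' && i + 1 < end && text.charAt(i + 1) == '`') {
                i++; // skip the second backtick of an escaped pair
            }
        }
        return sb.toString();
    }
}
```

The loop allocates one `StringBuilder` per call and no `Pattern` or `Matcher` objects, which is the allocation the report points at.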
[jira] [Created] (HIVE-22880) ACID: All delete event readers should ignore ORC SARGs
Gopal Vijayaraghavan created HIVE-22880: --- Summary: ACID: All delete event readers should ignore ORC SARGs Key: HIVE-22880 URL: https://issues.apache.org/jira/browse/HIVE-22880 Project: Hive Issue Type: Bug Components: Transactions, Vectorization Affects Versions: 4.0.0 Reporter: Gopal Vijayaraghavan Delete delta readers should not apply any SARGs other than the ones related to the transaction id ranges within the inserts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22834) Consistency: Expose unique table-identifiers in org.apache.hadoop.hive.ql.metadata.Table
Gopal Vijayaraghavan created HIVE-22834: --- Summary: Consistency: Expose unique table-identifiers in org.apache.hadoop.hive.ql.metadata.Table Key: HIVE-22834 URL: https://issues.apache.org/jira/browse/HIVE-22834 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan Distinguish between the two tables in {code} create table foo as select 1 as x; drop table foo; create table foo as select 2 as x; {code} in caching subsystems. -- This message was sent by Atlassian Jira (v8.3.4#803005)
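To illustrate why a caching subsystem needs more than the table name, here is a minimal sketch of a cache key that combines the name with a unique identifier. Everything in it is an assumption for the example: `TableCacheKey` is not a Hive class, and the `tableId` field stands in for whatever unique identifier the metadata `Table` object would expose (assumed here to be a number that changes on every CREATE).

```java
import java.util.Objects;

// Hypothetical cache key, not part of Hive: two tables with the same name but
// different ids (drop + re-create) must not share cache entries.
final class TableCacheKey {
    private final String qualifiedName;
    private final long tableId; // assumed unique per CREATE and never reused

    TableCacheKey(String qualifiedName, long tableId) {
        this.qualifiedName = Objects.requireNonNull(qualifiedName);
        this.tableId = tableId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TableCacheKey)) return false;
        TableCacheKey other = (TableCacheKey) o;
        return tableId == other.tableId && qualifiedName.equals(other.qualifiedName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(qualifiedName, tableId);
    }
}
```

With such a key, a result cached for the first `foo` (x = 1) is simply not found when the second `foo` (x = 2) is looked up, even though the name is identical.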
[jira] [Created] (HIVE-22829) Decimal64: NVL in vectorization miss NPE with CBO on
Gopal Vijayaraghavan created HIVE-22829: --- Summary: Decimal64: NVL in vectorization miss NPE with CBO on Key: HIVE-22829 URL: https://issues.apache.org/jira/browse/HIVE-22829 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Gopal Vijayaraghavan
{code}
select sum(NVL(ss_sales_price, 1.0BD)) from store_sales where ss_sold_date_sk % = 1;
{code}
{code}
| notVectorizedReason: exception: java.lang.NullPointerException stack trace:
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4754),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4687),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4669),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5269),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:977),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:864),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:834),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2500(Vectorizer.java:245),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2103),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2055),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:2030),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:1185),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:1017),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180), ... |
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22828) Decimal64: NVL & CASE statements implicitly convert decimal64 to 128
Gopal Vijayaraghavan created HIVE-22828: --- Summary: Decimal64: NVL & CASE statements implicitly convert decimal64 to 128 Key: HIVE-22828 URL: https://issues.apache.org/jira/browse/HIVE-22828 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan {code} select sum(case when (ss_item_sk=1) then ss_sales_price else null end), sum(case when (ss_item_sk=2) then ss_sales_price else ss_sales_price+1 end), sum(case when (ss_item_sk=2) then 1.0BD+ss_sales_price else null end) from store_sales where ss_sold_date_sk % = 1; {code} {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector at org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector.setElement(DecimalColumnVector.java:130) at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnNull.evaluate(IfExprColumnNull.java:125) at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFSumDecimal.aggregateInputSelection(VectorUDAFSumDecimal.java:113) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:221) at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.doProcessBatch(VectorGroupByOperator.java:414) {code} https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L3950 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22816) QueryCache: Queries using views can have them cached after CTE expansion
Gopal Vijayaraghavan created HIVE-22816: --- Summary: QueryCache: Queries using views can have them cached after CTE expansion Key: HIVE-22816 URL: https://issues.apache.org/jira/browse/HIVE-22816 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gopal Vijayaraghavan {code} create view ss_null as select * from store_Sales where ss_Sold_date_sk is null; select count(ss_ticket_number) from ss_null; with ss_null_cte as (select * from store_Sales where ss_Sold_date_sk is null) select count(ss_ticket_number) from ss_null_cte; {code} Are treated differently by the query cache, however their execution is identical. CBO rewrites the view query into AST form as follows {code} SELECT COUNT(`ss_ticket_number`) AS `$f0` FROM `tpcds_bin_partitioned_acid_orc_1`.`store_sales` WHERE `ss_sold_date_sk` IS NULL {code} But retains the write-entity for the VIRTUAL_VIEW for Ranger authorization {code} 0: jdbc:hive2://localhost:10013> explain dependency select count(distinct ss_ticket_number) from ss_null; ++ | Explain | ++ | {"input_tables":[{"tablename":"tpcds_bin_partitioned_acid_orc_1@ss_null","tabletype":"VIRTUAL_VIEW"},{"tablename":"tpcds_bin_partitioned_acid_orc_1@store_sales","tabletype":"MANAGED_TABLE","tableParents":"[tpcds_bin_partitioned_acid_orc_1@ss_null]"}],"input_partitions":[{"partitionName":"tpcds_bin_partitioned_acid_orc_1@store_sales@ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__"}]} | ++ {code} Causing Query cache to print out {code} parse.CalcitePlanner: Not eligible for results caching - query contains non-transactional tables [ss_null] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22797) ACID: Remove the map-side GBY for the Merge cardinality check
Gopal Vijayaraghavan created HIVE-22797: --- Summary: ACID: Remove the map-side GBY for the Merge cardinality check Key: HIVE-22797 URL: https://issues.apache.org/jira/browse/HIVE-22797 Project: Hive Issue Type: Bug Components: Physical Optimizer, Transactions Reporter: Gopal Vijayaraghavan The hash-aggregate of the cardinality check on the mapper is entirely useless as the ideal scenario is that we don't have any duplicates at all. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22796) ACID: Update/Delete operations are implicitly bucketed by 2^12 buckets
Gopal Vijayaraghavan created HIVE-22796: --- Summary: ACID: Update/Delete operations are implicitly bucketed by 2^12 buckets Key: HIVE-22796 URL: https://issues.apache.org/jira/browse/HIVE-22796 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22737) Concurrency: FunctionRegistry::getFunctionInfo is static object locked
Gopal Vijayaraghavan created HIVE-22737: --- Summary: Concurrency: FunctionRegistry::getFunctionInfo is static object locked Key: HIVE-22737 URL: https://issues.apache.org/jira/browse/HIVE-22737 Project: Hive Issue Type: Bug Components: Logical Optimizer, UDF Reporter: Gopal Vijayaraghavan Attachments: FunctionRegistry-lock.png
The lock is inside a HS2-wide static object
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L191
{code}
// registry for system functions
private static final Registry system = new Registry(true);
{code}
And this is the lock itself
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java#L332
{code}
public FunctionInfo getFunctionInfo(String functionName) throws SemanticException {
  lock.lock();
{code}
!FunctionRegistry-lock.png!
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22654) ACID: Allow TxnHandler::checkLock to chunk partitions by 1000
Gopal Vijayaraghavan created HIVE-22654: --- Summary: ACID: Allow TxnHandler::checkLock to chunk partitions by 1000 Key: HIVE-22654 URL: https://issues.apache.org/jira/browse/HIVE-22654 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan
The following loop can end up with too many entries within the IN clause throwing
{code:java}
// If any of the partition requests are null, then I need to pull all
// partition locks for this table.
sawNull = false;
strings.clear();
for (LockInfo info : locksBeingChecked) {
  if (info.partition == null) {
    sawNull = true;
    break;
  } else {
    strings.add(info.partition);
  }
}
{code}
{code}
2019-12-17T04:28:57,991 ERROR [pool-8-thread-143]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to update transaction database java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a list is 1000
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
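The chunking asked for in the summary could look roughly like the sketch below, which splits the collected partition names into batches no larger than the database limit and builds one parameterized IN clause per batch. It is an illustration only, not the TxnHandler change: `InClauseChunker`, `chunk`, and `inClause` are names invented for the example, and the column name passed in by the caller is likewise just a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: keeps each IN list at or under a fixed size.
final class InClauseChunker {
    /** Splits items into consecutive chunks of at most maxPerChunk elements. */
    static <T> List<List<T>> chunk(List<T> items, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be > 0");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerChunk) {
            int end = Math.min(start + maxPerChunk, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    /** Builds "column IN (?,?,...)" with one bind placeholder per value. */
    static String inClause(String column, int valueCount) {
        StringBuilder sb = new StringBuilder(column).append(" IN (");
        for (int i = 0; i < valueCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('?');
        }
        return sb.append(')').toString();
    }
}
```

A caller would run one query per chunk (or OR the clauses together) with `maxPerChunk` set to 1000, so that no single list exceeds the limit reported by ORA-01795.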
[jira] [Created] (HIVE-22558) Metastore: Passwords jceks should be read lazily, in case of connection pools
Gopal Vijayaraghavan created HIVE-22558: --- Summary: Metastore: Passwords jceks should be read lazily, in case of connection pools Key: HIVE-22558 URL: https://issues.apache.org/jira/browse/HIVE-22558 Project: Hive Issue Type: Bug Components: Metastore, Standalone Metastore Reporter: Gopal Vijayaraghavan Attachments: getDatabase-password-md5-hotpath.png The jceks file is parsed for every instance of the metastore conf to populate the password in plain-text, which is irrelevant for the scenario where the DB connection pool is already active. !getDatabase-password-md5-hotpath.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
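Reading the password lazily can be sketched with a memoizing supplier: the expensive load runs only on first use, so code paths that reuse an already-active connection pool never trigger it. This is a generic illustration rather than the metastore patch; `LazyValue` is a made-up class, and the loader passed to it stands in for whatever actually parses the jceks file.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Hive: defers and caches an expensive lookup.
final class LazyValue<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private volatile T value; // null until the first successful load

    LazyValue(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // Runs at most once as long as the loader returns non-null.
                    v = loader.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

Constructing a `LazyValue` costs nothing; only the first `get()` pays for the load, and later calls return the cached value. A loader that returns null would be retried on each call, which this sketch does not try to handle.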
[jira] [Created] (HIVE-22540) Vectorization: Decimal64 columns don't work with VectorizedBatchUtil.makeLikeColumnVector(ColumnVector)
Gopal Vijayaraghavan created HIVE-22540: --- Summary: Vectorization: Decimal64 columns don't work with VectorizedBatchUtil.makeLikeColumnVector(ColumnVector) Key: HIVE-22540 URL: https://issues.apache.org/jira/browse/HIVE-22540 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Gopal Vijayaraghavan {code} if (source instanceof Decimal64ColumnVector) { Decimal64ColumnVector dec64ColVector = (Decimal64ColumnVector) source; return new DecimalColumnVector(dec64ColVector.vector.length, dec64ColVector.precision, dec64ColVector.scale); } {code} This means that the operators need to change between the original and copy of the vector shapes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22499) LLAP: Add an EncodedReaderOptions to extend ORC impl for options
Gopal Vijayaraghavan created HIVE-22499: --- Summary: LLAP: Add an EncodedReaderOptions to extend ORC impl for options Key: HIVE-22499 URL: https://issues.apache.org/jira/browse/HIVE-22499 Project: Hive Issue Type: Bug Components: llap, ORC Reporter: Gopal Vijayaraghavan ORC-570 is an ABI change to getFileSystem(), which adds another exception to the implementation. Accepting and using that change requires waiting for an ORC release; this patch serves the same purpose, but falls back to retrying FileSystem.get() in case the supplier fails at runtime. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22448) CBO: Expand the multiple count distinct with a group-by key
Gopal Vijayaraghavan created HIVE-22448: --- Summary: CBO: Expand the multiple count distinct with a group-by key Key: HIVE-22448 URL: https://issues.apache.org/jira/browse/HIVE-22448 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan {code} create temporary table mytable1 (x integer, y integer, z integer, a integer); explain cbo select z, x, count(distinct y), count(distinct a) from mytable1 group by z, x; explain cbo select z, x, count(distinct y), count(distinct a) from mytable1 {code} The first is not vectorized, the second one is because of the ROLLUP based rewrite for count distinct. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22359) LLAP: when a node restarts with the exact same host/port in kubernetes it is not detected as a task failure
Gopal Vijayaraghavan created HIVE-22359: --- Summary: LLAP: when a node restarts with the exact same host/port in kubernetes it is not detected as a task failure Key: HIVE-22359 URL: https://issues.apache.org/jira/browse/HIVE-22359 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan {code} │ <14>1 2019-10-16T22:16:39.233Z query-coordinator-0-5.query-coordinator-0-service.compute-1569601454-l2x9.svc.cluster.local query-coordinator 1 461e5ad9-f05f-11e9-85f7-06e84765763e [mdc@18060 class="te │ │ zplugins.LlapTaskCommunicator" level="INFO" thread="IPC Server handler 4 on 3"] The tasks we expected to be on the node are not there: attempt_1569601631911__1_04_34_0, attempt_15696016319 │ │ 11__1_04_71_0, attempt_1569601631911__1_04_000191_0, attempt_1569601631911__1_04_000211_0, attempt_1569601631911__1_04_000229_0, attempt_1569601631911__1_04_000231_0, attempt_1 │ │ 569601631911__1_04_000235_0, attempt_1569601631911__1_04_000242_0, attempt_1569601631911__1_04_000160_1, attempt_1569601631911__1_04_12_2, attempt_1569601631911__1_04_03_2, │ │ attempt_1569601631911__1_04_56_2, {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22350) ACID: disable query re-execution when doing transactional inserts
Gopal Vijayaraghavan created HIVE-22350: --- Summary: ACID: disable query re-execution when doing transactional inserts Key: HIVE-22350 URL: https://issues.apache.org/jira/browse/HIVE-22350 Project: Hive Issue Type: Bug Components: Physical Optimizer, Transactions Reporter: Gopal Vijayaraghavan Reusing the same transaction id for a 2nd attempt causes issues with data cleanup via transactions, with identical filenames being repeated for inserts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22327) Repl: Ignore read-only transactions in notification log
Gopal Vijayaraghavan created HIVE-22327: --- Summary: Repl: Ignore read-only transactions in notification log Key: HIVE-22327 URL: https://issues.apache.org/jira/browse/HIVE-22327 Project: Hive Issue Type: Improvement Components: repl Reporter: Gopal Vijayaraghavan Read txns need not be replicated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22326) StreamingV2: Fail streaming ingests if columns with default constraints are not provided
Gopal Vijayaraghavan created HIVE-22326: --- Summary: StreamingV2: Fail streaming ingests if columns with default constraints are not provided Key: HIVE-22326 URL: https://issues.apache.org/jira/browse/HIVE-22326 Project: Hive Issue Type: Bug Reporter: Gopal Vijayaraghavan If a column has a default constraint, StreamingV2 does not run the corresponding UDF (& in some cases cannot run one, like SURROGATE_KEY). Fail visibly for that scenario, rather than allowing DEFAULT to be ignored. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22287) Metastore logging needs to add the partition info
Gopal Vijayaraghavan created HIVE-22287: --- Summary: Metastore logging needs to add the partition info Key: HIVE-22287 URL: https://issues.apache.org/jira/browse/HIVE-22287 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Gopal Vijayaraghavan {code} [mdc@18060 class="utils.MetaStoreUtils" level="WARN" thread="HMSHandler #4"] Updating partition stats fast for: web_returns [mdc@18060 class="utils.MetaStoreUtils" level="WARN" thread="HMSHandler #9"] Updated size to 21890 {code} The HMS logs are at WARN level, but are missing the partition name in both log lines. -- This message was sent by Atlassian Jira (v8.3.4#803005)