[jira] [Created] (HIVE-25777) ACID: Pick the compactor transaction over insert dir

2021-12-06 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-25777:
---

 Summary: ACID: Pick the compactor transaction over insert dir
 Key: HIVE-25777
 URL: https://issues.apache.org/jira/browse/HIVE-25777
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.1.2, 4.0.0
Reporter: Gopal Vijayaraghavan


If there are two competing versions of a particular write-id, one from the 
compactor and another from the original insert, always pick the compactor one 
once it is committed.

If the directory structure looks like 

{code}
base_11/
base_11_v192/
{code}

Then always pick base_11_v192/ if txnid=192 is committed.

This is required so that the raw base_ dir can be deleted safely on filesystems 
where directory deletion is not atomic (like S3), without a race condition 
between getSplits and the actual file reader.
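A minimal sketch of the selection rule described above, assuming directory names of the form base_<writeId>/ and base_<writeId>_v<txnId>/ as in the example, and a set of committed transaction ids supplied by the caller. `BaseDirPicker` and `pick` are hypothetical names; Hive's real directory resolution handles far more cases (deltas, aborted transactions, open-transaction watermarks).

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: chooses which base_ directory to read for one write-id when
// both the compactor and the original insert produced one.
final class BaseDirPicker {
    // "base_<writeId>" with an optional "_v<compactorTxnId>" suffix and trailing slash.
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v(\\d+))?/?");

    static Optional<String> pick(List<String> dirs, long writeId, Set<Long> committedTxnIds) {
        String raw = null;        // base_N written by the original insert
        String compacted = null;  // base_N_vT written by compactor txn T
        long bestTxn = -1;
        for (String dir : dirs) {
            Matcher m = BASE.matcher(dir);
            if (!m.matches() || Long.parseLong(m.group(1)) != writeId) {
                continue;
            }
            if (m.group(2) == null) {
                raw = dir;
                continue;
            }
            long txn = Long.parseLong(m.group(2));
            // Only a committed compactor transaction may shadow the raw directory.
            if (committedTxnIds.contains(txn) && txn > bestTxn) {
                bestTxn = txn;
                compacted = dir;
            }
        }
        return Optional.ofNullable(compacted != null ? compacted : raw);
    }
}
```

With base_11/ and base_11_v192/ present, pick returns base_11_v192/ once 192 is committed, and falls back to base_11/ while it is not.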



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25589) SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1

2021-10-01 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-25589:
---

 Summary: SQL: Implement HAVING/QUALIFY predicates for 
ROW_NUMBER()=1
 Key: HIVE-25589
 URL: https://issues.apache.org/jira/browse/HIVE-25589
 Project: Hive
  Issue Type: Improvement
  Components: CBO, SQL
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


Insert queries that use a ROW_NUMBER() = 1 filter are inconvenient to write, or 
to port from an existing workload, because there is no easy way to leave the 
row-number column out of the output in this pattern.

{code}
INSERT INTO main_table 
SELECT * from duplicated_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
{code}

needs to be rewritten into

{code}
INSERT INTO main_table
select event_id, event_ts, event_attribute, event_metric1, event_metric2, 
event_metric3, event_metric4, .., event_metric43 from 
(select *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum from 
duplicated_table) t
where rnum=1;
{code}

This rewrite is time-consuming and error-prone (for example, when the column 
order differs between the source and destination tables).

An alternative would be to support the same, or similar, syntax using HAVING:

{code}
INSERT INTO main_table 
SELECT * from duplicated_table
HAVING ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
{code}
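For reference, the filter both proposed forms express keeps exactly one whole row per event_id, without adding a row-number column to the output. A hypothetical in-memory model (`FirstPerPartition.keepFirst` is an illustrative name); with no ORDER BY inside OVER, which row receives number 1 is not defined, and this sketch simply keeps the first row it sees.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of "QUALIFY ROW_NUMBER() OVER (PARTITION BY key) = 1":
// one row survives per partition key, and the row itself is returned unchanged.
final class FirstPerPartition {
    static <R, K> List<R> keepFirst(List<R> rows, Function<R, K> partitionKey) {
        Map<K, R> firstByKey = new LinkedHashMap<>(); // preserves first-seen key order
        for (R row : rows) {
            firstByKey.putIfAbsent(partitionKey.apply(row), row);
        }
        return new ArrayList<>(firstByKey.values());
    }
}
```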






[jira] [Created] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-25011:
---

 Summary: Concurrency: Do not acquire locks for EXPLAIN
 Key: HIVE-25011
 URL: https://issues.apache.org/jira/browse/HIVE-25011
 Project: Hive
  Issue Type: Improvement
  Components: Locking, Transactions
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


{code}
EXPLAIN UPDATE ...
{code}

should not conflict with another active, ongoing UPDATE operation.
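A sketch of the intended behavior, using a hypothetical single-writer lock table rather than Hive's lock manager: a plain EXPLAIN only compiles the statement, so it neither takes nor waits for locks. It assumes that EXPLAIN ANALYZE, which actually executes the query, should keep taking locks. `ExplainAwareLocker` and its methods are illustrative names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model: one exclusive write lock per table.
final class ExplainAwareLocker {
    private final Map<String, String> writeLockHolder = new HashMap<>(); // table -> query id

    // Plain EXPLAIN only compiles; EXPLAIN ANALYZE runs the query, so it is not plan-only.
    static boolean isPlanOnly(String statement) {
        String[] words = statement.trim().toUpperCase(Locale.ROOT).split("\\s+");
        return words[0].equals("EXPLAIN") && !(words.length > 1 && words[1].equals("ANALYZE"));
    }

    // Returns true if the statement may proceed.
    boolean tryLock(String queryId, String statement, String table) {
        if (isPlanOnly(statement)) {
            return true; // nothing is read or written, so no lock is taken or checked
        }
        return writeLockHolder.putIfAbsent(table, queryId) == null;
    }
}
```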





[jira] [Created] (HIVE-24976) CBO: count(distinct) in a window function fails CBO

2021-04-05 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-24976:
---

 Summary: CBO: count(distinct) in a window function fails CBO
 Key: HIVE-24976
 URL: https://issues.apache.org/jira/browse/HIVE-24976
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Gopal Vijayaraghavan


{code}
create temporary table tmp_tbl(
`rule_id` string,
`severity` string,
`alert_id` string,
`alert_type` string);

explain cbo
select `k`.`rule_id`,
count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt`
from tmp_tbl k
;


explain
select `k`.`rule_id`,
count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt`
from tmp_tbl k
;
{code}

Both queries fail CBO, because the count(distinct) is not recognized as belonging 
to a windowing operation.

So CalcitePlanner throws the following exception:

{code}
throw new CalciteSemanticException("Distinct without an 
aggregation.",
UnsupportedFeature.Distinct_without_an_aggreggation);
{code}

https://github.com/apache/hive/blob/73c3770d858b063c69dea6c64a759f8fdacad460/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L4914

This prevents such a query from using a materialized view that already exists in 
the system (the MV does not contain this expression, but it captures a complex 
transform from a JSON structure into a columnar layout).
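For reference, the windowed aggregate in these queries gives every row the number of distinct non-null alert_id values in its rule_id partition (without ORDER BY, the window frame is the whole partition). A hypothetical reference implementation of that result, useful for checking expected output; it is not a model of how Hive's PTF operator evaluates it, and `WindowedDistinctCount` is an illustrative name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Reference semantics for COUNT(DISTINCT v) OVER (PARTITION BY k) without ORDER BY.
final class WindowedDistinctCount {
    static long[] countDistinctOver(String[] keys, String[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<String, Set<String>> distinctByKey = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            Set<String> seen = distinctByKey.computeIfAbsent(keys[i], k -> new HashSet<>());
            if (values[i] != null) { // COUNT ignores NULLs
                seen.add(values[i]);
            }
        }
        long[] out = new long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = distinctByKey.get(keys[i]).size(); // same value for every row of a partition
        }
        return out;
    }
}
```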






[jira] [Created] (HIVE-24533) Metastore: Allow miniHMS to startup standalone

2020-12-14 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-24533:
---

 Summary: Metastore: Allow miniHMS to startup standalone
 Key: HIVE-24533
 URL: https://issues.apache.org/jira/browse/HIVE-24533
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


This should work similarly to StartMiniHS2Cluster:

https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/StartMiniHS2Cluster.java





[jira] [Created] (HIVE-24462) JDBC: Support bearer token authentication

2020-12-01 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-24462:
---

 Summary: JDBC: Support bearer token authentication 
 Key: HIVE-24462
 URL: https://issues.apache.org/jira/browse/HIVE-24462
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


SPNEGO (Negotiate) authentication is currently the only way to authenticate a 
user in JDBC without providing a password.

The SPN model for that fails when load-balancing is used (see HIVE-20583).

Add a native JDBC equivalent of the Knox flow, using POST requests that carry an 
appropriate Authorization bearer token.

https://knox.apache.org/books/knox-1-1-0/knoxsso_integration.html
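A minimal sketch of the client side of such a flow, using the JDK 11 `java.net.http` API to attach an RFC 6750 bearer token to a POST. This is not Hive JDBC driver code: `BearerRequests` is a hypothetical name, the `/cliservice` endpoint in the example is an assumption about an HTTP-mode HiveServer2, and obtaining the token (e.g. through Knox SSO) is out of scope.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: builds, but does not send, a POST carrying
// "Authorization: Bearer <token>".
final class BearerRequests {
    static HttpRequest bearerPost(URI endpoint, String token, byte[] body) {
        if (token == null || token.isBlank()) {
            throw new IllegalArgumentException("bearer token must be non-empty");
        }
        return HttpRequest.newBuilder(endpoint)
                .header("Authorization", "Bearer " + token)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

Sending the request would go through `java.net.http.HttpClient`; keeping construction separate makes the header logic easy to test without a server.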





[jira] [Created] (HIVE-23609) SemiJoin: Relax big table size check for self-joins

2020-06-04 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-23609:
---

 Summary: SemiJoin: Relax big table size check for self-joins
 Key: HIVE-23609
 URL: https://issues.apache.org/jira/browse/HIVE-23609
 Project: Hive
  Issue Type: Improvement
Reporter: Gopal Vijayaraghavan


For self-joins, several of the other heuristics applied to semijoins don't apply: 
since both sides read the same table, a difference in row counts between the two 
sides is likely to result in an actual reduction of rows scanned.

This change results in slightly different Tez priorities for self-joins that are 
heavily filtered on one side, which helps ensure the smaller side completes 
before the bigger side consumes resources.





[jira] [Created] (HIVE-23541) Vectorization: Unbounded following window function start producing results too early

2020-05-23 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-23541:
---

 Summary: Vectorization: Unbounded following window function start 
producing results too early
 Key: HIVE-23541
 URL: https://issues.apache.org/jira/browse/HIVE-23541
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


ReduceRecordSource signals the end of a group for a reducer input whenever the 
entire key changes.

ReduceRecordSource::processVectorGroup calls 
reducer.setNextVectorBatchGroupStatus(/* isLastGroupBatch */ true); when the 
last group is being processed.

However, for PTF window functions with UNBOUNDED FOLLOWING, this signal is then 
triggered by the key changing, not by the partition changing.

As a result, VectorPTFOperator treats a change in the sort key as a switch of 
the partition key, and starts producing results too early.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFOperator.java#L399

{code}
create temporary table test2(id STRING,name STRING,event_dt date) stored as orc;

insert into test2 values ('100','A','2019-08-15'), ('100','A','2019-10-12');


SELECT name, event_dt, first_value(event_dt) over (PARTITION BY name ORDER BY 
event_dt desc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) last_event_dt 
FROM test2; -- streaming FIRST_VALUE with DESCENDING

SELECT name, event_dt, last_value(event_dt) over (PARTITION BY name ORDER BY 
event_dt asc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) 
last_event_dt FROM test2; -- non-streaming LAST_VALUE with ASCENDING
{code}

These two queries should return identical results, with the streaming version 
being significantly faster than the non-streaming one, due to the lack of 
buffered/spilled rows with streaming.
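A sketch of the boundary rule the description implies, assuming rows arrive sorted by (partition columns, order-by columns). `PartitionBoundary.partitionStarts` is a hypothetical helper, not VectorPTFOperator code. Passing the full key width as `numPartitionCols` reproduces the reported behavior, where every change of event_dt looks like a new partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A new window partition starts only when the partition-column prefix of the sort
// key changes; a change in the order-by columns alone must not end the partition.
final class PartitionBoundary {
    static List<Integer> partitionStarts(List<String[]> sortedKeys, int numPartitionCols) {
        List<Integer> starts = new ArrayList<>();
        String[] prev = null;
        for (int i = 0; i < sortedKeys.size(); i++) {
            String[] cur = sortedKeys.get(i);
            if (prev == null
                    || !Arrays.equals(prev, 0, numPartitionCols, cur, 0, numPartitionCols)) {
                starts.add(i);
            }
            prev = cur;
        }
        return starts;
    }
}
```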





[jira] [Created] (HIVE-23087) UDF: bloom_filter() should use 2nd argument as expectedEntries value

2020-03-26 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-23087:
---

 Summary: UDF: bloom_filter() should use 2nd argument as 
expectedEntries value
 Key: HIVE-23087
 URL: https://issues.apache.org/jira/browse/HIVE-23087
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


{code}
 explain select bloom_filter(sr_ticket_number, 31299272) from store_returns, 
date_dim where sr_returned_date_sk   = d_date_sk  and d_year = 2000 and d_moy  
= 9;

|   PARTITION_ONLY_SHUFFLE [RS_12]   |
| Group By Operator [GBY_11] (rows=1 width=144) |
|   Output:["_col0"],aggregations:["bloom_filter(_col0, 31299272, 
expectedEntries=0)"] |
|   Select Operator [SEL_9] (rows=2964832193 width=15) |
{code}

fails with

{code}
Caused by: java.lang.IllegalArgumentException: expectedEntries should be > 0
at org.apache.hive.common.util.BloomKFilter.checkArgument(BloomKFilter.java:54)
at org.apache.hive.common.util.BloomKFilter.<init>(BloomKFilter.java:59)
{code}
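The failure is the Bloom filter constructor rejecting n = 0. A sketch of why n must be positive, using the textbook sizing formula m = ceil(-n * ln p / (ln 2)^2) (not copied from BloomKFilter), plus a hypothetical rule that prefers an explicit second argument over the planner's estimate. `BloomSizing` and both methods are illustrative names, and 0.05 in the example is just a sample false-positive rate.

```java
// Standard Bloom filter sizing for n expected entries at false-positive probability p.
final class BloomSizing {
    static long optimalNumBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0) {
            throw new IllegalArgumentException("expectedEntries should be > 0");
        }
        if (!(fpp > 0.0 && fpp < 1.0)) {
            throw new IllegalArgumentException("fpp should be in (0, 1)");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    // Hypothetical resolution rule: an explicit second argument wins over the
    // planner's estimate (which was 0 in the plan above).
    static long resolveExpectedEntries(Long secondArgument, long plannerEstimate) {
        return secondArgument != null ? secondArgument : plannerEstimate;
    }
}
```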





[jira] [Created] (HIVE-22948) QueryCache: Treat query cache locations as temporary storage

2020-02-28 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22948:
---

 Summary: QueryCache: Treat query cache locations as temporary 
storage
 Key: HIVE-22948
 URL: https://issues.apache.org/jira/browse/HIVE-22948
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 3.1.2, 4.0.0
Reporter: Gopal Vijayaraghavan


The WriteEntity for a query-cache result location is checked during user 
authorization, even though users never access that location directly.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/command/CommandAuthorizerV2.java#L111

{code}
  if (privObject instanceof WriteEntity && ((WriteEntity) privObject).isTempURI()) {
    // do not authorize temporary uris
    continue;
  }
{code}

is not satisfied by queries that qualify for the query cache, so their cache 
locations still go through user authorization.
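A sketch of the proposed treatment, assuming query-cache result locations live under a known root (the `/tmp/hive/_resultscache_` path in the example is only a sample value). `TempUriFilter` and its methods are hypothetical; the prefix check respects path boundaries so that a sibling such as `_resultscache_x` is not treated as temporary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: any location under a configured temporary root (scratch dir,
// query-results cache dir, ...) counts as a temp URI and is skipped.
final class TempUriFilter {
    static boolean isTempUri(String location, List<String> tempRoots) {
        for (String root : tempRoots) {
            String prefix = root.endsWith("/") ? root : root + "/";
            if (location.equals(root) || location.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    static List<String> locationsToAuthorize(List<String> writeLocations, List<String> tempRoots) {
        List<String> out = new ArrayList<>();
        for (String location : writeLocations) {
            if (isTempUri(location, tempRoots)) {
                continue; // do not authorize temporary uris
            }
            out.add(location);
        }
        return out;
    }
}
```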





[jira] [Created] (HIVE-22943) Metastore pushdown for DATE constants

2020-02-27 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22943:
---

 Summary: Metastore pushdown for DATE constants
 Key: HIVE-22943
 URL: https://issues.apache.org/jira/browse/HIVE-22943
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


https://github.com/apache/hive/blame/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/Filter.g#L461

{code}
/* When I figure out how to make lexer backtrack after validating predicate, 
dates would be able 
to support single quotes [( '\'' DateString '\'' ) |]. For now, what we do 
instead is have a hack
to parse the string in metastore code from StringLiteral. */
DateLiteral
:
KW_DATE? DateString { ExtractDate(getText()) != null }?
;
{code}
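A sketch of the metastore-side fallback the grammar comment describes: parse a date constant out of the literal text, returning null when it is not a date so that a validating predicate can reject it. `DateConstantParser.extractDate` is a hypothetical stand-in for ExtractDate, and the accepted forms (bare, quoted, and DATE-prefixed ISO dates) are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical helper: accepts 2020-02-27, '2020-02-27' or DATE '2020-02-27'.
final class DateConstantParser {
    static LocalDate extractDate(String text) {
        String s = text.trim();
        if (s.regionMatches(true, 0, "DATE", 0, 4)) {
            s = s.substring(4).trim(); // optional DATE keyword, any case
        }
        if (s.length() >= 2 && s.charAt(0) == '\'' && s.charAt(s.length() - 1) == '\'') {
            s = s.substring(1, s.length() - 1); // strip single quotes
        }
        try {
            return LocalDate.parse(s); // ISO yyyy-MM-dd
        } catch (DateTimeParseException e) {
            return null; // not a date constant
        }
    }
}
```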





[jira] [Created] (HIVE-22930) Performance: ASTNode::getName() allocates within the walker loops

2020-02-25 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22930:
---

 Summary: Performance: ASTNode::getName() allocates within the 
walker loops
 Key: HIVE-22930
 URL: https://issues.apache.org/jira/browse/HIVE-22930
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan
 Attachments: ASTNode-name.png

{code}
  /*
   * (non-Javadoc)
   *
   * @see org.apache.hadoop.hive.ql.lib.Node#getName()
   */
  @Override
  public String getName() {
    return String.valueOf(super.getToken().getType());
  }
{code}

 !ASTNode-name.png! 
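`String.valueOf(int)` builds a new String on every call, so each `getName()` inside a walker loop allocates. A sketch of one way to avoid that, assuming token types are small non-negative ints: precompute the names once and fall back to `String.valueOf` outside the cached range (ANTLR's EOF type is -1). `TokenTypeNames` and the 4096 bound are hypothetical.

```java
// Hypothetical cache of decimal names for token types, built once at class load.
final class TokenTypeNames {
    private static final int CACHED = 4096; // assumed upper bound on token type ids
    private static final String[] NAMES = new String[CACHED];

    static {
        for (int i = 0; i < CACHED; i++) {
            NAMES[i] = String.valueOf(i);
        }
    }

    // Same text as String.valueOf(tokenType), but shared instances for cached ids.
    static String nameOf(int tokenType) {
        return (tokenType >= 0 && tokenType < CACHED) ? NAMES[tokenType] : String.valueOf(tokenType);
    }
}
```

`getName()` could then return `TokenTypeNames.nameOf(super.getToken().getType())`.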






[jira] [Created] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()

2020-02-25 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22929:
---

 Summary: Performance: quoted identifier parsing uses throwaway 
Regex via String.replaceAll()
 Key: HIVE-22929
 URL: https://issues.apache.org/jira/browse/HIVE-22929
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan
 Attachments: String.replaceAll.png

 !String.replaceAll.png! 

https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530

{code}
'`'  ( '``' | ~('`') )* '`' { setText(getText().substring(1, 
getText().length() -1 ).replaceAll("``", "`")); }
{code}
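`String.replaceAll` compiles its first argument as a regular expression on every call. A regex-free sketch of the same unescaping (strip the surrounding backticks, collapse each doubled backtick), with a fast path when the identifier contains no escapes; `QuotedIdentifiers.unquote` is a hypothetical name.

```java
// Hypothetical replacement for substring(1, len - 1).replaceAll("``", "`").
final class QuotedIdentifiers {
    static String unquote(String token) {
        if (token.length() < 2 || token.charAt(0) != '`' || token.charAt(token.length() - 1) != '`') {
            throw new IllegalArgumentException("not a backtick-quoted identifier: " + token);
        }
        String body = token.substring(1, token.length() - 1);
        if (body.indexOf('`') < 0) {
            return body; // fast path: nothing to unescape
        }
        StringBuilder sb = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            sb.append(c);
            if (c == '`' && i + 1 < body.length() && body.charAt(i + 1) == '`') {
                i++; // skip the second backtick of an escaped pair
            }
        }
        return sb.toString();
    }
}
```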





[jira] [Created] (HIVE-22880) ACID: All delete event readers should ignore ORC SARGs

2020-02-12 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22880:
---

 Summary: ACID: All delete event readers should ignore ORC SARGs
 Key: HIVE-22880
 URL: https://issues.apache.org/jira/browse/HIVE-22880
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


Delete delta readers should not apply any SARGs other than the ones related to 
the transaction id ranges within the inserts.
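A sketch of the filtering rule, assuming the columns referenced by a SARG are available by name. The allowed names are the top-level fields of the ORC ACID event struct; user columns sit under the nested `row` field, which delete events are assumed to leave empty, so a predicate on them could wrongly skip delete events. `DeleteDeltaSargs` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical: keep only the SARG columns a delete-delta reader may push down.
final class DeleteDeltaSargs {
    private static final Set<String> ACID_METADATA_COLUMNS = Set.of(
            "operation", "originalTransaction", "bucket", "rowId", "currentTransaction");

    static List<String> pushableColumns(List<String> sargColumns) {
        List<String> out = new ArrayList<>();
        for (String column : sargColumns) {
            if (ACID_METADATA_COLUMNS.contains(column)) {
                out.add(column);
            }
        }
        return out;
    }
}
```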





[jira] [Created] (HIVE-22834) Consistency: Expose unique table-identifiers in org.apache.hadoop.hive.ql.metadata.Table

2020-02-05 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22834:
---

 Summary: Consistency: Expose unique table-identifiers in 
org.apache.hadoop.hive.ql.metadata.Table
 Key: HIVE-22834
 URL: https://issues.apache.org/jira/browse/HIVE-22834
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


Caching subsystems need to distinguish between the two tables in 

{code}
create table foo as select 1 as x;
drop table foo;
create table foo as select 2 as x;
{code}

which share the same database and table name but hold different data.
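A sketch of a cache key that can tell the two `foo` tables apart, assuming each table incarnation carries an identifier that is assigned at CREATE time and never reused after DROP. `TableCacheKey` and its `tableId` field are hypothetical; names are lower-cased because Hive identifiers are case-insensitive.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical cache key: (db, table) alone is ambiguous across DROP/CREATE.
final class TableCacheKey {
    final String db;
    final String table;
    final long tableId; // unique per CREATE, never reused

    TableCacheKey(String db, String table, long tableId) {
        this.db = db.toLowerCase(Locale.ROOT);
        this.table = table.toLowerCase(Locale.ROOT);
        this.tableId = tableId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TableCacheKey)) {
            return false;
        }
        TableCacheKey k = (TableCacheKey) o;
        return tableId == k.tableId && db.equals(k.db) && table.equals(k.table);
    }

    @Override
    public int hashCode() {
        return Objects.hash(db, table, tableId);
    }
}
```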





[jira] [Created] (HIVE-22829) Decimal64: NVL in vectorization miss NPE with CBO on

2020-02-04 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22829:
---

 Summary: Decimal64: NVL in vectorization miss NPE with CBO on
 Key: HIVE-22829
 URL: https://issues.apache.org/jira/browse/HIVE-22829
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Gopal Vijayaraghavan


{code}
select  
sum(NVL(ss_sales_price, 1.0BD))
from store_sales where ss_sold_date_sk %  = 1;
{code}

{code}
| notVectorizedReason: exception: java.lang.NullPointerException stack trace:
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4754),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4687),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4669),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5269),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:977),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:864),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:834),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2500(Vectorizer.java:245),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2103),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2055),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:2030),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:1185),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:1017),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180),
... |
{code}





[jira] [Created] (HIVE-22828) Decimal64: NVL & CASE statements implicitly convert decimal64 to 128

2020-02-04 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22828:
---

 Summary: Decimal64: NVL & CASE statements implicitly convert 
decimal64 to 128 
 Key: HIVE-22828
 URL: https://issues.apache.org/jira/browse/HIVE-22828
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


{code}
select  
sum(case when (ss_item_sk=1) then ss_sales_price else null end),
sum(case when (ss_item_sk=2) then ss_sales_price else ss_sales_price+1 end),
sum(case when (ss_item_sk=2) then 1.0BD+ss_sales_price else null end)
from store_sales where ss_sold_date_sk %  = 1;
{code}

{code}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
at org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector.setElement(DecimalColumnVector.java:130)
at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnNull.evaluate(IfExprColumnNull.java:125)
at org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorUDAFSumDecimal.aggregateInputSelection(VectorUDAFSumDecimal.java:113)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:221)
at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.doProcessBatch(VectorGroupByOperator.java:414)

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L3950
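Decimal64 columns keep each value as a long holding value * 10^scale, which is what makes the conversion to the 128-bit DecimalColumnVector avoidable. A sketch, assuming both NVL branches are already at the same scale (ss_sales_price is decimal(7,2) in TPC-DS, so a constant 1.0 becomes 100 at scale 2): the selection can stay on scaled longs, widening only at the boundary. `Decimal64Select` and its methods are hypothetical names.

```java
import java.math.BigDecimal;

// Hypothetical helpers over Decimal64-style scaled longs.
final class Decimal64Select {
    // NVL(col, constant) on scaled longs; the constant must already be at the column's scale.
    static long[] nvl(long[] unscaled, boolean[] isNull, long defaultUnscaled) {
        long[] out = new long[unscaled.length];
        for (int i = 0; i < unscaled.length; i++) {
            out[i] = isNull[i] ? defaultUnscaled : unscaled[i];
        }
        return out;
    }

    // Widening to an arbitrary-precision decimal, needed only at the boundary.
    static BigDecimal toDecimal(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }
}
```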





[jira] [Created] (HIVE-22816) QueryCache: Queries using views can have them cached after CTE expansion

2020-01-31 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22816:
---

 Summary: QueryCache: Queries using views can have them cached 
after CTE expansion
 Key: HIVE-22816
 URL: https://issues.apache.org/jira/browse/HIVE-22816
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gopal Vijayaraghavan


{code}
create view ss_null as select * from store_Sales where ss_Sold_date_sk is null;

select count(ss_ticket_number) from ss_null;

with ss_null_cte as 
(select * from store_Sales where ss_Sold_date_sk is null)
select count(ss_ticket_number) from ss_null_cte;
{code}

These two queries are treated differently by the query cache, even though their 
execution is identical.

CBO rewrites the view query into AST form as follows

{code}
SELECT COUNT(`ss_ticket_number`) AS `$f0`
FROM `tpcds_bin_partitioned_acid_orc_1`.`store_sales`
WHERE `ss_sold_date_sk` IS NULL
{code}

but retains the VIRTUAL_VIEW as an input entity for Ranger authorization:

{code}
0: jdbc:hive2://localhost:10013> explain dependency select count(distinct ss_ticket_number) from ss_null;

+------------+
|  Explain   |
+------------+
| {"input_tables":[{"tablename":"tpcds_bin_partitioned_acid_orc_1@ss_null","tabletype":"VIRTUAL_VIEW"},{"tablename":"tpcds_bin_partitioned_acid_orc_1@store_sales","tabletype":"MANAGED_TABLE","tableParents":"[tpcds_bin_partitioned_acid_orc_1@ss_null]"}],"input_partitions":[{"partitionName":"tpcds_bin_partitioned_acid_orc_1@store_sales@ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__"}]} |
+------------+
{code}

This causes the query cache to log:

{code}
parse.CalcitePlanner: Not eligible for results caching - query contains 
non-transactional tables [ss_null]
{code}
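A sketch of one possible eligibility rule, assuming a view's base tables always appear as separate inputs (as store_sales does in the EXPLAIN DEPENDENCY output above): skip VIRTUAL_VIEW entries when checking for non-transactional tables, while still keeping them for authorization. `ResultsCacheEligibility` and `Input` are hypothetical types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical eligibility check for the results cache.
final class ResultsCacheEligibility {
    static final class Input {
        final String name;
        final String tableType;       // e.g. MANAGED_TABLE or VIRTUAL_VIEW
        final boolean transactional;

        Input(String name, String tableType, boolean transactional) {
            this.name = name;
            this.tableType = tableType;
            this.transactional = transactional;
        }
    }

    // Returns the inputs that block caching; an empty list means the query is eligible.
    static List<String> blockingInputs(List<Input> inputs) {
        List<String> blocking = new ArrayList<>();
        for (Input in : inputs) {
            if ("VIRTUAL_VIEW".equals(in.tableType)) {
                continue; // the view's base tables are listed as their own inputs
            }
            if (!in.transactional) {
                blocking.add(in.name);
            }
        }
        return blocking;
    }
}
```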





[jira] [Created] (HIVE-22797) ACID: Remove the map-side GBY for the Merge cardinality check

2020-01-30 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22797:
---

 Summary: ACID: Remove the map-side GBY for the Merge cardinality 
check
 Key: HIVE-22797
 URL: https://issues.apache.org/jira/browse/HIVE-22797
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Transactions
Reporter: Gopal Vijayaraghavan


The map-side hash aggregate for the cardinality check is entirely useless: in the 
ideal scenario there are no duplicates at all, so it cannot reduce the number of 
rows it emits.
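The check itself only needs to know whether any target row is matched more than once. A hypothetical single-pass sketch (`MergeCardinalityCheck` is an illustrative name, with target rows identified by ROW__ID-like strings): when there are no duplicates, every row forms its own group, so a map-side pre-aggregation would emit as many rows as it reads and only add hashing cost.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical MERGE cardinality check: each target row may match at most one source row.
final class MergeCardinalityCheck {
    // Returns the first target row id matched more than once, or null if all matches are unique.
    static String firstDuplicate(List<String> matchedTargetRowIds) {
        Set<String> seen = new HashSet<>();
        for (String rowId : matchedTargetRowIds) {
            if (!seen.add(rowId)) {
                return rowId;
            }
        }
        return null;
    }
}
```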





[jira] [Created] (HIVE-22796) ACID: Update/Delete operations are implicitly bucketed by 2^12 buckets

2020-01-30 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22796:
---

 Summary: ACID: Update/Delete operations are implicitly bucketed by 
2^12 buckets
 Key: HIVE-22796
 URL: https://issues.apache.org/jira/browse/HIVE-22796
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan








[jira] [Created] (HIVE-22737) Concurrency: FunctionRegistry::getFunctionInfo is static object locked

2020-01-16 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22737:
---

 Summary: Concurrency: FunctionRegistry::getFunctionInfo is static 
object locked
 Key: HIVE-22737
 URL: https://issues.apache.org/jira/browse/HIVE-22737
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, UDF
Reporter: Gopal Vijayaraghavan
 Attachments: FunctionRegistry-lock.png

The lock lives inside an HS2-wide static object:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L191

{code}
  // registry for system functions
  private static final Registry system = new Registry(true);
{code}

And this is the lock itself

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java#L332

{code}
  public FunctionInfo getFunctionInfo(String functionName) throws SemanticException {
    lock.lock();
{code}

 !FunctionRegistry-lock.png! 
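One common way to remove read-side contention, sketched here as an alternative rather than the fix Hive adopted: guard the map with a `ReentrantReadWriteLock`, so concurrent `getFunctionInfo`-style lookups share the read lock and only registrations take the write lock. `SharedRegistry` is a hypothetical class; names are lower-cased on the assumption that lookups are case-insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical registry: lookups share a read lock; registrations take the write lock.
final class SharedRegistry<V> {
    private final Map<String, V> functions = new HashMap<>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    V lookup(String name) {
        rw.readLock().lock();
        try {
            return functions.get(name.toLowerCase(Locale.ROOT));
        } finally {
            rw.readLock().unlock();
        }
    }

    void register(String name, V info) {
        rw.writeLock().lock();
        try {
            functions.put(name.toLowerCase(Locale.ROOT), info);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

A `ConcurrentHashMap` would avoid explicit locking for lookups altogether; the read-write lock keeps compound updates simple.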





[jira] [Created] (HIVE-22654) ACID: Allow TxnHandler::checkLock to chunk partitions by 1000

2019-12-17 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22654:
---

 Summary: ACID: Allow TxnHandler::checkLock to chunk partitions by 
1000 
 Key: HIVE-22654
 URL: https://issues.apache.org/jira/browse/HIVE-22654
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


The following loop can put too many entries into a single IN clause, which then 
fails with:

{code:java}
        // If any of the partition requests are null, then I need to pull all
        // partition locks for this table.
        sawNull = false;
        strings.clear();
        for (LockInfo info : locksBeingChecked) {
          if (info.partition == null) {
            sawNull = true;
            break;
          } else {
            strings.add(info.partition);
          }
        } 
{code}

{code}
2019-12-17T04:28:57,991 ERROR [pool-8-thread-143]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to 
update transaction database java.sql.SQLSyntaxErrorException: ORA-01795: 
maximum number of expressions in a list is 1000
{code}
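A sketch of the chunking the summary asks for: split the partition list into groups of at most 1000 and OR together one IN list per group, with one bind placeholder per value. `InClauseChunker` and the `hl_partition` column name in the example are illustrative assumptions, not TxnHandler code.

```java
import java.util.ArrayList;
import java.util.List;

// Keeps every IN list at or below Oracle's 1000-expression limit (ORA-01795).
final class InClauseChunker {
    static final int MAX_IN_LIST = 1000;

    // Splits values into consecutive chunks (subList views) of at most maxPerChunk elements.
    static <T> List<List<T>> chunk(List<T> values, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < values.size(); start += maxPerChunk) {
            chunks.add(values.subList(start, Math.min(start + maxPerChunk, values.size())));
        }
        return chunks;
    }

    // Builds "(col IN (?,?) OR col IN (?))" for count bind values.
    static String inPredicate(String column, int count, int maxPerChunk) {
        if (count <= 0 || maxPerChunk <= 0) {
            throw new IllegalArgumentException("count and maxPerChunk must be positive");
        }
        StringBuilder sql = new StringBuilder("(");
        for (int start = 0; start < count; start += maxPerChunk) {
            if (start > 0) {
                sql.append(" OR ");
            }
            int n = Math.min(maxPerChunk, count - start);
            sql.append(column).append(" IN (");
            for (int j = 0; j < n; j++) {
                sql.append(j == 0 ? "?" : ",?");
            }
            sql.append(')');
        }
        return sql.append(')').toString();
    }
}
```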





[jira] [Created] (HIVE-22558) Metastore: Passwords jceks should be read lazily, in case of connection pools

2019-11-27 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22558:
---

 Summary: Metastore: Passwords jceks should be read lazily, in case 
of connection pools
 Key: HIVE-22558
 URL: https://issues.apache.org/jira/browse/HIVE-22558
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Standalone Metastore
Reporter: Gopal Vijayaraghavan
 Attachments: getDatabase-password-md5-hotpath.png

The jceks file is parsed for every new instance of the metastore conf, to 
populate the password in plain text, even though the password is not needed 
when the DB connection pool is already active.

 !getDatabase-password-md5-hotpath.png! 
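A sketch of reading the password lazily and at most once, assuming the loader (a hypothetical stand-in for the jceks credential-store lookup) is safe to call from any thread. `LazyPassword` is an illustrative name; it uses double-checked locking on a volatile field and returns a copy so callers can wipe their array.

```java
import java.util.function.Supplier;

// Hypothetical lazy holder: nothing is read until a connection actually needs the password.
final class LazyPassword implements Supplier<char[]> {
    private final Supplier<char[]> loader;
    private volatile char[] cached;

    LazyPassword(Supplier<char[]> loader) {
        this.loader = loader;
    }

    @Override
    public char[] get() {
        char[] value = cached;
        if (value == null) {
            synchronized (this) {
                value = cached;
                if (value == null) {
                    value = loader.get(); // first and only read of the credential store
                    cached = value;
                }
            }
        }
        return value.clone(); // callers may wipe their copy without affecting the cache
    }
}
```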





[jira] [Created] (HIVE-22540) Vectorization: Decimal64 columns don't work with VectorizedBatchUtil.makeLikeColumnVector(ColumnVector)

2019-11-25 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22540:
---

 Summary: Vectorization: Decimal64 columns don't work with 
VectorizedBatchUtil.makeLikeColumnVector(ColumnVector)
 Key: HIVE-22540
 URL: https://issues.apache.org/jira/browse/HIVE-22540
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Gopal Vijayaraghavan


{code}
if (source instanceof Decimal64ColumnVector) {
  Decimal64ColumnVector dec64ColVector = (Decimal64ColumnVector) source;
  return new DecimalColumnVector(dec64ColVector.vector.length,
  dec64ColVector.precision,
  dec64ColVector.scale);
}
{code}

This means the copy (a DecimalColumnVector) does not have the same vector shape 
as the original (a Decimal64ColumnVector), so operators need to switch between 
the two shapes.
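A sketch of a "make like" that preserves the Decimal64 shape, using simplified hypothetical stand-ins (`Vec`, `Decimal64Vec`) instead of Hive's ColumnVector classes; the 18-digit check reflects the assumption that Decimal64 stores the unscaled value in a long.

```java
// Simplified, hypothetical stand-ins for column vector classes.
abstract class Vec {
    abstract Vec makeLike();
}

final class Decimal64Vec extends Vec {
    final long[] vector; // value * 10^scale
    final int precision;
    final int scale;

    Decimal64Vec(int size, int precision, int scale) {
        if (precision > 18) {
            throw new IllegalArgumentException("Decimal64 holds at most 18 digits");
        }
        this.vector = new long[size];
        this.precision = precision;
        this.scale = scale;
    }

    // The copy keeps the same class, precision and scale instead of widening.
    @Override
    Vec makeLike() {
        return new Decimal64Vec(vector.length, precision, scale);
    }
}
```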





[jira] [Created] (HIVE-22499) LLAP: Add an EncodedReaderOptions to extend ORC impl for options

2019-11-14 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22499:
---

 Summary: LLAP: Add an EncodedReaderOptions to extend ORC impl for 
options
 Key: HIVE-22499
 URL: https://issues.apache.org/jira/browse/HIVE-22499
 Project: Hive
  Issue Type: Bug
  Components: llap, ORC
Reporter: Gopal Vijayaraghavan


ORC-570 is an ABI change to getFileSystem(), adding another exception to the 
implementation.

Adopting that change requires waiting for an ORC release; this patch serves the 
same purpose in the meantime, and falls back to retrying FileSystem.get() if the 
supplier fails at runtime.
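A sketch of the fallback behavior, generic over the returned type so it runs without Hadoop on the classpath. `FallbackSupplier` and `IOSupplier` are hypothetical; in the patch's setting the fallback would be something like a call to Hadoop's `FileSystem.get(conf)`.

```java
import java.io.IOException;

// Hypothetical: try the supplied value first, then retry through a second path.
final class FallbackSupplier {
    @FunctionalInterface
    interface IOSupplier<T> {
        T get() throws IOException;
    }

    static <T> T getWithFallback(IOSupplier<T> primary, IOSupplier<T> fallback) throws IOException {
        try {
            return primary.get();
        } catch (IOException | RuntimeException primaryFailure) {
            try {
                return fallback.get();
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(primaryFailure); // keep both causes
                throw fallbackFailure;
            }
        }
    }
}
```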





[jira] [Created] (HIVE-22448) CBO: Expand the multiple count distinct with a group-by key

2019-11-01 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22448:
---

 Summary: CBO: Expand the multiple count distinct with a group-by 
key
 Key: HIVE-22448
 URL: https://issues.apache.org/jira/browse/HIVE-22448
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


{code}
create temporary table mytable1 (x integer, y integer, z integer, a integer);

explain cbo
select z, x, count(distinct y), count(distinct a)
from mytable1
group by z, x;


explain cbo
select count(distinct y), count(distinct a)
from mytable1;
{code}

The first query is not vectorized; the second one is, because of the 
ROLLUP-based rewrite for count distinct.
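A sketch of the expansion the summary asks for, extended with the group-by key (z, x): each row is expanded into one tagged row per distinct argument, duplicates are removed, and tagged rows are counted per group. This is a pure-Java model of the idea behind the grouping-sets rewrite, not Hive's or Calcite's plan; `MultiCountDistinct` is a hypothetical name and NULLs are not modelled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of count(distinct y), count(distinct a) GROUP BY z, x.
// Rows are int[]{x, y, z, a}, the column order of mytable1.
final class MultiCountDistinct {
    static Map<List<Integer>, long[]> compute(List<int[]> rows) {
        // Step 1: one tagged row per distinct argument, (z, x, tag, value), deduplicated.
        Set<List<Integer>> expanded = new HashSet<>();
        for (int[] r : rows) {
            int x = r[0], y = r[1], z = r[2], a = r[3];
            expanded.add(List.of(z, x, 0, y)); // tag 0 -> count(distinct y)
            expanded.add(List.of(z, x, 1, a)); // tag 1 -> count(distinct a)
        }
        // Step 2: count surviving tagged rows per (z, x) group and tag.
        Map<List<Integer>, long[]> counts = new HashMap<>();
        for (List<Integer> e : expanded) {
            counts.computeIfAbsent(List.of(e.get(0), e.get(1)), k -> new long[2])[e.get(2)]++;
        }
        return counts;
    }
}
```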





[jira] [Created] (HIVE-22359) LLAP: when a node restarts with the exact same host/port in kubernetes it is not detected as a task failure

2019-10-16 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22359:
---

 Summary: LLAP: when a node restarts with the exact same host/port 
in kubernetes it is not detected as a task failure
 Key: HIVE-22359
 URL: https://issues.apache.org/jira/browse/HIVE-22359
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


{code}
<14>1 2019-10-16T22:16:39.233Z query-coordinator-0-5.query-coordinator-0-service.compute-1569601454-l2x9.svc.cluster.local query-coordinator 1 461e5ad9-f05f-11e9-85f7-06e84765763e [mdc@18060 class="tezplugins.LlapTaskCommunicator" level="INFO" thread="IPC Server handler 4 on 3"] The tasks we expected to be on the node are not there:
attempt_1569601631911__1_04_34_0, attempt_1569601631911__1_04_71_0,
attempt_1569601631911__1_04_000191_0, attempt_1569601631911__1_04_000211_0,
attempt_1569601631911__1_04_000229_0, attempt_1569601631911__1_04_000231_0,
attempt_1569601631911__1_04_000235_0, attempt_1569601631911__1_04_000242_0,
attempt_1569601631911__1_04_000160_1, attempt_1569601631911__1_04_12_2,
attempt_1569601631911__1_04_03_2, attempt_1569601631911__1_04_56_2,
{code}
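The summary implies that node identity is effectively host/port, so a pod restarted at the same address looks like the same node. A sketch that additionally tracks a per-process instance id (hypothetical, e.g. a UUID generated at daemon start) and reports all attempts assigned to an address as lost when that id changes. `NodeRestartDetector` is an illustrative name, not LlapTaskCommunicator code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical: identify a daemon by host:port plus a per-process instance id.
final class NodeRestartDetector {
    private final Map<String, String> instanceByAddress = new HashMap<>(); // "host:port" -> instance id
    private final Map<String, Set<String>> attemptsByAddress = new HashMap<>();

    void assign(String address, String instanceId, String attemptId) {
        instanceByAddress.putIfAbsent(address, instanceId);
        attemptsByAddress.computeIfAbsent(address, a -> new HashSet<>()).add(attemptId);
    }

    // Records a heartbeat; returns the attempts lost if the daemon at this address restarted.
    Set<String> heartbeat(String address, String instanceId) {
        String previous = instanceByAddress.put(address, instanceId);
        if (previous != null && !previous.equals(instanceId)) {
            Set<String> lost = attemptsByAddress.remove(address);
            return lost == null ? Set.<String>of() : lost;
        }
        return Set.of();
    }
}
```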





[jira] [Created] (HIVE-22350) ACID: disable query re-execution when doing transactional inserts

2019-10-15 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22350:
---

 Summary: ACID: disable query re-execution when doing transactional 
inserts
 Key: HIVE-22350
 URL: https://issues.apache.org/jira/browse/HIVE-22350
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Transactions
Reporter: Gopal Vijayaraghavan


Reusing the same transaction id for a second attempt breaks data cleanup via 
transactions, because the retried insert repeats identical filenames.
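A sketch of why a retry that reuses the transaction collides, assuming the usual zero-padded delta_<minWriteId>_<maxWriteId>_<statementId> directory naming: the path depends only on ids, so the second attempt targets the first attempt's directory. `AcidPaths` is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical helper; the delta naming pattern is an assumption.
final class AcidPaths {
    static String deltaDir(long minWriteId, long maxWriteId, int statementId) {
        return String.format(Locale.ROOT, "delta_%07d_%07d_%04d", minWriteId, maxWriteId, statementId);
    }

    // True when a retry would write into the same directory as the first attempt.
    static boolean attemptsCollide(long firstAttemptWriteId, long retryWriteId, int statementId) {
        return deltaDir(firstAttemptWriteId, firstAttemptWriteId, statementId)
                .equals(deltaDir(retryWriteId, retryWriteId, statementId));
    }
}
```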





[jira] [Created] (HIVE-22327) Repl: Ignore read-only transactions in notification log

2019-10-10 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22327:
---

 Summary: Repl: Ignore read-only transactions in notification log
 Key: HIVE-22327
 URL: https://issues.apache.org/jira/browse/HIVE-22327
 Project: Hive
  Issue Type: Improvement
  Components: repl
Reporter: Gopal Vijayaraghavan


Read-only transactions need not be replicated.





[jira] [Created] (HIVE-22326) StreamingV2: Fail streaming ingests if columns with default constraints are not provided

2019-10-10 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22326:
---

 Summary: StreamingV2: Fail streaming ingests if columns with 
default constraints are not provided
 Key: HIVE-22326
 URL: https://issues.apache.org/jira/browse/HIVE-22326
 Project: Hive
  Issue Type: Bug
Reporter: Gopal Vijayaraghavan


If a column has a default constraint, StreamingV2 does not run the corresponding 
UDF (and in some cases cannot run one, like SURROGATE_KEY).

Fail visibly in that scenario, rather than allowing DEFAULT to be silently 
ignored.
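A sketch of the proposed fail-fast check, assuming the table's defaulted columns and the columns the streaming writer supplies are both known before ingest starts. `StreamingDefaultsCheck` is a hypothetical helper, not part of the StreamingV2 API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical validation: refuse to start when a defaulted column is not supplied,
// since the streaming path cannot evaluate DEFAULT expressions.
final class StreamingDefaultsCheck {
    static void requireDefaultedColumns(Set<String> columnsWithDefault, Set<String> suppliedColumns) {
        List<String> missing = new ArrayList<>();
        for (String column : columnsWithDefault) {
            if (!suppliedColumns.contains(column)) {
                missing.add(column);
            }
        }
        if (!missing.isEmpty()) {
            Collections.sort(missing); // deterministic error message
            throw new IllegalStateException(
                    "Streaming ingest cannot apply DEFAULT constraints; supply columns " + missing);
        }
    }
}
```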





[jira] [Created] (HIVE-22287) Metastore logging needs to add the partition info

2019-10-02 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-22287:
---

 Summary: Metastore logging needs to add the partition info
 Key: HIVE-22287
 URL: https://issues.apache.org/jira/browse/HIVE-22287
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


{code}
[mdc@18060 class="utils.MetaStoreUtils" level="WARN" thread="HMSHandler #4"] 
Updating partition stats fast for: web_returns
[mdc@18060 class="utils.MetaStoreUtils" level="WARN" thread="HMSHandler #9"] 
Updated size to 21890 
{code}

The HMS logs are at WARN level, but are missing the partition name in both log 
lines.


