[jira] [Created] (HIVE-24731) NullPointerException with parameterized query

2021-02-03 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24731:
--

 Summary: NullPointerException with parameterized query
 Key: HIVE-24731
 URL: https://issues.apache.org/jira/browse/HIVE-24731
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg


*Repro*
{code:sql}
explain prepare pint2 from select t2.ctinyint as ag from alltypesorc t1 join 
alltypesorc t2 on t1.cint=t2.cint where t1.cint <= ? ;

explain execute pint2 using 100;
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24595) Vectorization causing incorrect results for scalar subquery

2021-01-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24595:
--

 Summary: Vectorization causing incorrect results for scalar subquery
 Key: HIVE-24595
 URL: https://issues.apache.org/jira/browse/HIVE-24595
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 3.0.0
Reporter: Vineet Garg


*Repro*
{code:sql}
 CREATE EXTERNAL TABLE `alltypessmall`( 
   `id` int,
   `bool_col` boolean,  
   `tinyint_col` tinyint,   
   `smallint_col` smallint, 
   `int_col` int,   
   `bigint_col` bigint, 
   `float_col` float,   
   `double_col` double, 
   `date_string_col` string,
   `string_col` string, 
   `timestamp_col` timestamp)   
 PARTITIONED BY (   
   `year` int,  
   `month` int) 
 ROW FORMAT SERDE   
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
 WITH SERDEPROPERTIES ( 
   'escape.delim'='\\', 
   'field.delim'=',',   
   'serialization.format'=',')  
 STORED AS INPUTFORMAT  
   'org.apache.hadoop.mapred.TextInputFormat'   
 OUTPUTFORMAT   
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
 TBLPROPERTIES (
   'DO_NOT_UPDATE_STATS'='true',
   'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
   'STATS_GENERATED'='TASK',
   'impala.lastComputeStatsTime'='1608312793',  
   'transient_lastDdlTime'='1608310442');

insert into alltypessmall partition(year=2002,month=1) values(1, true, 
3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
insert into alltypessmall partition(year=2002,month=1) values(1, true, 
3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
insert into alltypessmall partition(year=2002,month=1) values(1, true, 
3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001');
{code}
The following query should fail, because the scalar subquery returns more than one row, but it succeeds:
{code:sql}
SELECT id FROM alltypessmall
WHERE int_col =
  (SELECT int_col
   FROM alltypessmall)
ORDER BY id;
{code}
*Explain plan*
{code:java}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
  Edges:
Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
  DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: alltypessmall
  filterExpr: int_col is not null (type: boolean)
  Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE 
Column stats: COMPLETE
  Filter Operator
predicate: int_col is not null (type: boolean)
Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: id (type: int), int_col (type: int)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 3 Data size: 24 Basic stats: 
COMPLETE Column stats: COMPLETE
  Map Join Operator
condition map:
 Inner Join 0 to 1
keys:
  0
  1
outputColumnNames: _col0, _col1
input vertices:
  1 Reducer 4
Statistics: Num rows: 3 Data size: 24 Basic stats: 
COMPLETE Column stats: COMPLETE
Map Join Operator
  condition map:
   Inner Join 0 to 1
  keys:
0 _col1 (type: int)
1 _col0 (type: int)
  outputColumnNames: _col0
  input vertices:
1 Map 3
  Statistics: Num rows: 9 Data size: 36 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: int)
null sort

[jira] [Created] (HIVE-24304) Query containing UNION fails with OOM

2020-10-22 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24304:
--

 Summary: Query containing UNION fails with OOM
 Key: HIVE-24304
 URL: https://issues.apache.org/jira/browse/HIVE-24304
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-24164) Throw error for parameterized query containing parameters in group by

2020-09-14 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24164:
--

 Summary: Throw error for parameterized query containing parameters in group by
 Key: HIVE-24164
 URL: https://issues.apache.org/jira/browse/HIVE-24164
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


For example, the following query should throw a useful error message, since 
parameters aren't supported in GROUP BY:
{code:sql}
prepare query1 from select count(*) from src where key > ? and value < ? group 
by ?;
 execute query1 using 1,100,1;
{code}
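
A minimal sketch of what such a check could look like. The class and method names (GroupByParamCheck, validate) are hypothetical and not part of Hive, and group-by keys are modelled as plain strings rather than AST nodes:

```java
import java.util.List;

// Hypothetical sketch, not Hive code: group-by keys are modelled as strings.
final class GroupByParamCheck {
    // Throws with a descriptive message if any group-by key is a bare "?" marker.
    static void validate(List<String> groupByKeys) {
        for (int i = 0; i < groupByKeys.size(); i++) {
            if ("?".equals(groupByKeys.get(i).trim())) {
                throw new IllegalArgumentException(
                    "Parameter marker '?' is not supported in GROUP BY (key position "
                        + (i + 1) + ")");
            }
        }
    }
}
```

The point of the sketch is only that the failure names the clause and the position of the offending marker, instead of surfacing as an unrelated internal error.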






[jira] [Created] (HIVE-24140) Improve materialized view authorization check

2020-09-09 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24140:
--

 Summary: Improve materialized view authorization check
 Key: HIVE-24140
 URL: https://issues.apache.org/jira/browse/HIVE-24140
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg


Currently (with HIVE-23454), an authorization check is done on each materialized 
view (MV) after MV rewriting, and the rewrite is rejected if the check fails for 
any MV.
This is inefficient because the authorization check happens after rewriting. 
Ideally, this check should be done prior to the rewrite (and the rewrite skipped 
accordingly).

One approach is to check all MVs for the tables involved in the query (this may 
cause the rewrite to be skipped even though the MV may not be selected for the 
rewrite).
Another approach is to cache MV privileges in the registry and refresh them 
periodically.
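
A minimal sketch of the caching approach, under stated assumptions: MvPrivilegeCache and its methods are hypothetical names, the authorizer is an arbitrary callback rather than Hive's authorization API, and "refresh periodically" is modelled as lazy expiry on lookup rather than a background task:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

// Hypothetical sketch: caches "may this user read this MV" answers and
// re-checks an entry once it is at least ttlMillis old.
final class MvPrivilegeCache {
    private static final class Entry {
        final boolean allowed;
        final long loadedAtMillis;

        Entry(boolean allowed, long loadedAtMillis) {
            this.allowed = allowed;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final BiPredicate<String, String> authorizer; // (user, mvName) -> allowed
    private final long ttlMillis;

    MvPrivilegeCache(BiPredicate<String, String> authorizer, long ttlMillis) {
        this.authorizer = authorizer;
        this.ttlMillis = ttlMillis;
    }

    // The clock is passed in explicitly so the expiry logic is easy to test.
    boolean isAllowed(String user, String mvName, long nowMillis) {
        String key = user + "\u0000" + mvName;
        Entry e = entries.get(key);
        if (e == null || nowMillis - e.loadedAtMillis >= ttlMillis) {
            e = new Entry(authorizer.test(user, mvName), nowMillis);
            entries.put(key, e);
        }
        return e.allowed;
    }
}
```

With such a cache, the planner could filter out unauthorized MVs before attempting a rewrite, at the cost of privileges being stale for up to one TTL.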





[jira] [Created] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-08-27 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24087:
--

 Summary: FK side join elimination in presence of PK-FK constraint
 Key: HIVE-24087
 URL: https://issues.apache.org/jira/browse/HIVE-24087
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


In a PK-FK join, the join can be eliminated by removing the FK side if the 
following conditions are met:
* There is no row filtering on the FK side.
* No columns from the FK side are required after the JOIN.
* FK join columns are guaranteed to be unique (have a GROUP BY).
* FK join columns are guaranteed to be NOT NULL (either an IS NOT NULL filter 
or a constraint).

*Example*
{code:sql}
EXPLAIN 
SELECT customer_removal_n0.*
FROM customer_removal_n0
JOIN
(SELECT lo_custkey
FROM lineorder_removal_n0
WHERE lo_custkey IS NOT NULL
GROUP BY lo_custkey) fkSide ON fkSide.lo_custkey = 
customer_removal_n0.c_custkey;
{code}
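
The four conditions above can be sketched as a single predicate. This is an illustration only: FkSideFacts is a hypothetical class, and the flags stand in for facts a planner rule would have to derive from the plan. Treating the IS NOT NULL filter on the join key as not counting as row filtering is an assumption read off the example:

```java
// Hypothetical sketch, not Hive code: the conditions as precomputed flags.
final class FkSideFacts {
    final boolean hasRowFilter;         // filters other than IS NOT NULL on the join keys (assumption)
    final boolean columnsUsedAfterJoin; // any FK-side column referenced above the join
    final boolean joinKeysUnique;       // e.g. FK side is grouped by the join keys
    final boolean joinKeysNotNull;      // IS NOT NULL filter or NOT NULL constraint

    FkSideFacts(boolean hasRowFilter, boolean columnsUsedAfterJoin,
                boolean joinKeysUnique, boolean joinKeysNotNull) {
        this.hasRowFilter = hasRowFilter;
        this.columnsUsedAfterJoin = columnsUsedAfterJoin;
        this.joinKeysUnique = joinKeysUnique;
        this.joinKeysNotNull = joinKeysNotNull;
    }

    // All four conditions must hold for the FK side to be removable.
    boolean canRemoveFkSide() {
        return !hasRowFilter && !columnsUsedAfterJoin && joinKeysUnique && joinKeysNotNull;
    }
}
```

For the example query, all four conditions would hold: the FK side is grouped by lo_custkey, filtered with IS NOT NULL, and only customer_removal_n0 columns are selected.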





[jira] [Created] (HIVE-24010) Support physical transformations for EXECUTE statement

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24010:
--

 Summary: Support physical transformations for EXECUTE statement
 Key: HIVE-24010
 URL: https://issues.apache.org/jira/browse/HIVE-24010
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently, these physical transformations are run only during the PREPARE 
statement, which could miss transformations due to a lack of stats or of actual 
literals/constants.
Ideally, once the parameters are bound, these transformations should be run again.





[jira] [Created] (HIVE-24009) Support partition pruning for EXECUTE statement

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24009:
--

 Summary: Support partition pruning for EXECUTE statement
 Key: HIVE-24009
 URL: https://issues.apache.org/jira/browse/HIVE-24009
 Project: Hive
  Issue Type: Sub-task
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently, compile-time partition pruning does not kick in for EXECUTE statements.





[jira] [Created] (HIVE-24008) Improve Explain plan output for EXECUTE queries

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24008:
--

 Summary: Improve Explain plan output for EXECUTE queries
 Key: HIVE-24008
 URL: https://issues.apache.org/jira/browse/HIVE-24008
 Project: Hive
  Issue Type: Sub-task
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently, EXPLAIN on EXECUTE shows the PREPARE query text instead of the EXECUTE query.





[jira] [Created] (HIVE-24007) Support for EXECUTE/PREPARE operation authorization

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24007:
--

 Summary: Support for EXECUTE/PREPARE operation authorization
 Key: HIVE-24007
 URL: https://issues.apache.org/jira/browse/HIVE-24007
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently both of these operations are created with null input and output 
privileges. This needs to be investigated and fixed/improved to support proper 
authorization.





[jira] [Created] (HIVE-24006) Support dynamic expressions/params in SELECT clause

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24006:
--

 Summary: Support dynamic expressions/params in SELECT clause
 Key: HIVE-24006
 URL: https://issues.apache.org/jira/browse/HIVE-24006
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently, dynamic expressions/parameters are only supported in the WHERE/HAVING clause.

One of the challenges here is type inference.





[jira] [Created] (HIVE-24005) Reduce memory footprint of cached plan

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24005:
--

 Summary: Reduce memory footprint of cached plan
 Key: HIVE-24005
 URL: https://issues.apache.org/jira/browse/HIVE-24005
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently, the whole {{SemanticAnalyzer}} object is cached. It is not necessary to 
cache the whole object (caching requires only the set of tasks, the query config 
and some other info).

Maybe create a CachedPlan object.
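
A minimal sketch of what such a holder could look like. Everything here is an assumption for illustration: the fields, their types (tasks as strings, config as a string map) and the accessor names are hypothetical, and the "some other info" mentioned above is left out because the ticket does not specify it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: holds only what a later EXECUTE would need,
// instead of retaining the whole analyzer object.
final class CachedPlan {
    private final List<String> rootTasks;          // stand-in for the set of tasks
    private final Map<String, String> queryConfig; // stand-in for the query config

    CachedPlan(List<String> rootTasks, Map<String, String> queryConfig) {
        // Defensive copies so later changes by the caller do not leak into the cache.
        this.rootTasks = Collections.unmodifiableList(new ArrayList<>(rootTasks));
        this.queryConfig = Collections.unmodifiableMap(new HashMap<>(queryConfig));
    }

    List<String> getRootTasks() {
        return rootTasks;
    }

    Map<String, String> getQueryConfig() {
        return queryConfig;
    }
}
```

Keeping the holder immutable makes it safe to share one cached entry across repeated EXECUTE calls.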





[jira] [Created] (HIVE-24003) Stats annotation/collection for EXECUTE statements

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24003:
--

 Summary: Stats annotation/collection for EXECUTE statements
 Key: HIVE-24003
 URL: https://issues.apache.org/jira/browse/HIVE-24003
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Once the parameters are bound, we should re-run stats annotation so that all 
the physical, statistics-based optimizations can be run.

Currently, stats are collected only during PREPARE (and skipped for dynamic 
parameter expressions).





[jira] [Created] (HIVE-24002) Allow 'expressions' instead of 'constant literal' for EXECUTE

2020-08-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-24002:
--

 Summary: Allow 'expressions' instead of 'constant literal' for EXECUTE
 Key: HIVE-24002
 URL: https://issues.apache.org/jira/browse/HIVE-24002
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently, the EXECUTE statement grammar allows constant literals only. This 
prevents negative literals or constant expressions like {{1+2}}. Changing it to 
the {{expression}} type will allow a broader range of literals. Note that this 
will require additional error checking after parsing to allow only "acceptable" 
literals.
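
The post-parse check could be sketched as a walk over the parsed expression. This is an illustration under assumptions: ExecuteArgCheck, Node and the node kinds are hypothetical and do not correspond to Hive's AST classes, and which operators count as "acceptable" (here: literals, unary minus and addition) is chosen only for the example:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: accept an EXECUTE argument only if it is built
// entirely from literals and a small set of constant operators.
final class ExecuteArgCheck {
    static final class Node {
        final String kind; // "LITERAL", "COLUMN", "NEGATE" or "PLUS"
        final List<Node> children;

        Node(String kind, Node... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    static boolean isAcceptable(Node n) {
        switch (n.kind) {
            case "LITERAL":
                return true;
            case "NEGATE":
            case "PLUS":
                if (n.children.isEmpty()) {
                    return false;
                }
                for (Node child : n.children) {
                    if (!isAcceptable(child)) {
                        return false;
                    }
                }
                return true;
            default:
                // Column references, function calls, subqueries, etc. are rejected.
                return false;
        }
    }
}
```

Under this sketch, {{1+2}} and a negated literal pass, while anything that references a column is rejected after parsing.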





[jira] [Created] (HIVE-23951) Support parameterized queries in WHERE/HAVING clause

2020-07-29 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23951:
--

 Summary: Support parameterized queries in WHERE/HAVING clause
 Key: HIVE-23951
 URL: https://issues.apache.org/jira/browse/HIVE-23951
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-23950) Support PREPARE and EXECUTE statements

2020-07-29 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23950:
--

 Summary: Support PREPARE and EXECUTE statements
 Key: HIVE-23950
 URL: https://issues.apache.org/jira/browse/HIVE-23950
 Project: Hive
  Issue Type: New Feature
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


PREPARE and EXECUTE statements provide the ability to create a parameterized 
query and reuse it with different parameters.





[jira] [Created] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-08 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23822:
--

 Summary: Sorted dynamic partition optimization could remove auto stat task
 Key: HIVE-23822
 URL: https://issues.apache.org/jira/browse/HIVE-23822
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


{{mm_dp}} has a reproducer where an INSERT query is missing the auto stats task.





[jira] [Created] (HIVE-23807) Wrong results with vectorization enabled

2020-07-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23807:
--

 Summary: Wrong results with vectorization enabled
 Key: HIVE-23807
 URL: https://issues.apache.org/jira/browse/HIVE-23807
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 2.3.0
Reporter: Vineet Garg
Assignee: Vineet Garg


*Repro*
{code:sql}
CREATE TABLE `test13`(
  `portfolio_valuation_date` string,
  `price_cut_off_datetime` string,
  `portfolio_id_valuation_source` string,
  `contributor_full_path` string,
  `position_market_value` double,
  `mandate_name` string)
STORED AS ORC;

INSERT INTO test13 values (
"2020-01-31",   "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.26,  "foo");

INSERT INTO test13 values (
"2020-01-31",   "2020-02-07T03:14:48.007Z", "37",   NULL,   0.33,   "foo");

INSERT INTO test13 values (
"2020-01-31",   "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.03,  "foo");

INSERT INTO test13 values (
"2020-01-31",   "2020-02-07T03:14:48.007Z", "37",   NULL,   0.16,   "foo");

INSERT INTO test13 values (
"2020-01-31",   "2020-02-07T03:14:48.007Z", "37",   NULL,   0.08,   "foo");

set hive.fetch.task.conversion=none;
set hive.explain.user=false;

set hive.vectorized.execution.enabled=false;
select Cast(`test13`.`price_cut_off_datetime` AS date) from test13;


set hive.vectorized.execution.enabled=true;
select Cast(`test13`.`price_cut_off_datetime` AS date) from test13;
{code}





Re: Review Request 72491: Move q tests to TestMiniLlapLocal from TestCliDriver where the output is different, batch 4

2020-05-11 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72491/#review220708
---




ql/src/test/results/clientpositive/llap/udf_mask_show_last_n.q.out
Line 80 (original), 78 (patched)
<https://reviews.apache.org/r/72491/#comment309339>

Result mismatch


- Vineet Garg


On May 11, 2020, 7:05 p.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72491/
> ---
> 
> (Updated May 11, 2020, 7:05 p.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, John Sherman, Zoltan 
> Haindrich, Krisztian Kasa, Steve Carlin, and Vineet Garg.
> 
> 
> Bugs: HIVE-23440
> https://issues.apache.org/jira/browse/HIVE-23440
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Move q tests to TestMiniLlapLocal from TestCliDriver where the output is 
> different, batch 4
> 
> 
> Diffs
> -
> 
>   ql/src/test/queries/clientpositive/union.q 3f40a25d49 
>   
> ql/src/test/results/clientpositive/llap/temp_table_merge_dynamic_partition.q.out
>  8b1cfad5ab 
>   
> ql/src/test/results/clientpositive/llap/temp_table_merge_dynamic_partition2.q.out
>  413a3f2a63 
>   
> ql/src/test/results/clientpositive/llap/temp_table_merge_dynamic_partition3.q.out
>  12d5d59fb1 
>   
> ql/src/test/results/clientpositive/llap/temp_table_merge_dynamic_partition4.q.out
>  8ddbb96fca 
>   
> ql/src/test/results/clientpositive/llap/temp_table_merge_dynamic_partition5.q.out
>  7dbf56cf39 
>   ql/src/test/results/clientpositive/llap/temp_table_options1.q.out 
> be31a5a289 
>   
> ql/src/test/results/clientpositive/llap/temp_table_parquet_mixed_partition_formats2.q.out
>  23bb41edfe 
>   ql/src/test/results/clientpositive/llap/temp_table_partition_boolexpr.q.out 
> d4af83b320 
>   
> ql/src/test/results/clientpositive/llap/temp_table_partition_condition_remover.q.out
>  18f5348f0f 
>   ql/src/test/results/clientpositive/llap/temp_table_partition_ctas.q.out 
> bd3574f03f 
>   
> ql/src/test/results/clientpositive/llap/temp_table_partition_multilevels.q.out
>  2ea8bf8631 
>   ql/src/test/results/clientpositive/llap/temp_table_partition_pruning.q.out 
> f6fdd61928 
>   
> ql/src/test/results/clientpositive/llap/temp_table_windowing_expressions.q.out
>  c45f36e988 
>   ql/src/test/results/clientpositive/llap/test_teradatabinaryfile.q.out 
> 75584e9ba2 
>   ql/src/test/results/clientpositive/llap/timestamp.q.out 90a46f58f4 
>   ql/src/test/results/clientpositive/llap/timestamp_comparison3.q.out 
> 3977be77f7 
>   ql/src/test/results/clientpositive/llap/timestamp_ints_casts.q.out 
> 572c49ea72 
>   ql/src/test/results/clientpositive/llap/timestamp_literal.q.out cfcd06f907 
>   ql/src/test/results/clientpositive/llap/timestamptz.q.out 09c50ddf10 
>   ql/src/test/results/clientpositive/llap/truncate_column_list_bucket.q.out 
> c8e40bd447 
>   ql/src/test/results/clientpositive/llap/type_cast_1.q.out 22dad1a0f2 
>   ql/src/test/results/clientpositive/llap/type_widening.q.out f295e66ee9 
>   ql/src/test/results/clientpositive/llap/udaf_binarysetfunctions.q.out 
> 86dbcf6f57 
>   
> ql/src/test/results/clientpositive/llap/udaf_binarysetfunctions_no_cbo.q.out 
> 6857ca9739 
>   ql/src/test/results/clientpositive/llap/udaf_number_format.q.out 822ea784ba 
>   ql/src/test/results/clientpositive/llap/udaf_percentile_approx_23.q.out 
> c200ecf75a 
>   ql/src/test/results/clientpositive/llap/udaf_percentile_cont.q.out 
> 509ae7bfe6 
>   ql/src/test/results/clientpositive/llap/udaf_percentile_disc.q.out 
> e7efcf9302 
>   ql/src/test/results/clientpositive/llap/udf1.q.out 9647770bcd 
>   ql/src/test/results/clientpositive/llap/udf2.q.out bcc2faa16a 
>   ql/src/test/results/clientpositive/llap/udf3.q.out 18abd9560c 
>   ql/src/test/results/clientpositive/llap/udf4.q.out d9b841aab9 
>   ql/src/test/results/clientpositive/llap/udf5.q.out 58a1dab60b 
>   ql/src/test/results/clientpositive/llap/udf6.q.out e6d58324c7 
>   ql/src/test/results/clientpositive/llap/udf7.q.out 44b282f82e 
>   ql/src/test/results/clientpositive/llap/udf8.q.out 8e8ca424b4 
>   ql/src/test/results/clientpositive/llap/udf9.q.out a55b3cdb34 
>   ql/src/test/results/clientpositive/llap/udf_10_trims.q.out 41eefa3e8c 
>   ql/src/test/results/clientpositive/llap/udf_E.q.out 469f396a85 
>   ql/src/test/results/clientpositive/llap/udf_PI.q.out a9ec8c1e06 
>   ql/src/test/results/clientpositive/llap/udf_abs.q.out fee7592ec9 
>   ql/

Re: Review Request 72485: Move q tests to TestMiniLlapLocal from TestCliDriver where the output is different, batch 3

2020-05-07 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72485/#review220686
---




ql/src/test/results/clientpositive/llap/parquet_vectorization_limit.q.out
Line 435 (original), 541 (patched)
<https://reviews.apache.org/r/72485/#comment309282>

Result is different



ql/src/test/results/clientpositive/llap/partition_condition_remover.q.out
Line 52 (original)
<https://reviews.apache.org/r/72485/#comment309283>

statistics missing



ql/src/test/results/clientpositive/llap/partition_discovery.q.out
Line 317 (original)
<https://reviews.apache.org/r/72485/#comment309284>

msck not working


- Vineet Garg


On May 7, 2020, 5:04 p.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72485/
> ---
> 
> (Updated May 7, 2020, 5:04 p.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, John Sherman, Zoltan 
> Haindrich, Krisztian Kasa, Steve Carlin, and Vineet Garg.
> 
> 
> Bugs: HIVE-23403
> https://issues.apache.org/jira/browse/HIVE-23403
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Move q tests to TestMiniLlapLocal from TestCliDriver where the output is 
> different, batch 3
> 
> 
> Diffs
> -
> 
>   ql/src/test/results/clientpositive/llap/mergejoins_mixed.q.out 04bb90c370 
>   ql/src/test/results/clientpositive/llap/metadataOnlyOptimizer.q.out 
> 1671c6b8b4 
>   ql/src/test/results/clientpositive/llap/mm_buckets.q.out e2c31637fa 
>   ql/src/test/results/clientpositive/llap/msck_repair_0.q.out 94da7c3aaf 
>   ql/src/test/results/clientpositive/llap/msck_repair_1.q.out 5f94246e67 
>   ql/src/test/results/clientpositive/llap/msck_repair_2.q.out 90f77b7cde 
>   ql/src/test/results/clientpositive/llap/msck_repair_3.q.out c18da6f437 
>   ql/src/test/results/clientpositive/llap/msck_repair_acid.q.out 902a4b7d80 
>   ql/src/test/results/clientpositive/llap/msck_repair_batchsize.q.out 
> bedfac7d28 
>   ql/src/test/results/clientpositive/llap/msck_repair_drop.q.out 04179f3304 
>   ql/src/test/results/clientpositive/llap/multi_insert_distinct.q.out 
> eefa1e1197 
>   ql/src/test/results/clientpositive/llap/multi_insert_gby.q.out d36dc8de00 
>   ql/src/test/results/clientpositive/llap/multi_insert_gby2.q.out c3db38642b 
>   ql/src/test/results/clientpositive/llap/multi_insert_gby3.q.out 23518f7ac2 
>   ql/src/test/results/clientpositive/llap/multi_insert_gby4.q.out abb749b78b 
>   ql/src/test/results/clientpositive/llap/multi_insert_mixed.q.out b7b721e500 
>   
> ql/src/test/results/clientpositive/llap/multi_insert_move_tasks_share_dependencies.q.out
>  4e34c11af0 
>   ql/src/test/results/clientpositive/llap/multi_insert_union_src.q.out 
> 90597f37d9 
>   ql/src/test/results/clientpositive/llap/multi_insert_with_join2.q.out 
> bdb876e618 
>   ql/src/test/results/clientpositive/llap/multi_join_union.q.out ac3fd77714 
>   ql/src/test/results/clientpositive/llap/multigroupby_singlemr.q.out 
> 3ae1152645 
>   ql/src/test/results/clientpositive/llap/named_column_join.q.out 9c0250e5e5 
>   ql/src/test/results/clientpositive/llap/nested_column_pruning.q.out 
> 233995910c 
>   ql/src/test/results/clientpositive/llap/no_hooks.q.out 7583863800 
>   ql/src/test/results/clientpositive/llap/noalias_subq1.q.out 7cbb6ac993 
>   ql/src/test/results/clientpositive/llap/nonblock_op_deduplicate.q.out 
> 96b7c0a7d2 
>   ql/src/test/results/clientpositive/llap/notable_alias1.q.out 258840619b 
>   ql/src/test/results/clientpositive/llap/notable_alias2.q.out 919c86243b 
>   ql/src/test/results/clientpositive/llap/null_cast.q.out 280a5a1267 
>   
> ql/src/test/results/clientpositive/llap/nullability_transitive_inference.q.out
>  fe3f33bf3e 
>   ql/src/test/results/clientpositive/llap/nullgroup.q.out 4d1517e039 
>   ql/src/test/results/clientpositive/llap/nullgroup2.q.out ea8cc08ab0 
>   ql/src/test/results/clientpositive/llap/nullgroup3.q.out 57d87feb92 
>   ql/src/test/results/clientpositive/llap/nullgroup4.q.out 8797fa0ee2 
>   ql/src/test/results/clientpositive/llap/nullgroup4_multi_distinct.q.out 
> 09c59d395e 
>   ql/src/test/results/clientpositive/llap/nullgroup5.q.out 77eeafcfb9 
>   ql/src/test/results/clientpositive/llap/num_op_type_conv.q.out a0884140f0 
>   ql/src/test/results/clientpositive/llap/offset_limit_global_optimizer.q.out 
> 5b20c2b1a9 
>   ql/src/test/results/clientpositive/llap/optimize_filter_literal.q.out 
> 7542d76305 
>   ql/src/test/results/clien

Re: Review Request 72466: Move q tests to TestMiniLlapLocal from TestCliDriver where the output is different, batch 2

2020-05-04 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72466/#review220610
---




ql/src/test/results/clientpositive/llap/input_part3.q.out
Line 24 (original)
<https://reviews.apache.org/r/72466/#comment309115>

Statistics are missing



ql/src/test/results/clientpositive/llap/input_part4.q.out
Line 22 (original)
<https://reviews.apache.org/r/72466/#comment309116>

Stats missing



ql/src/test/results/clientpositive/llap/input_part6.q.out
Line 26 (original)
<https://reviews.apache.org/r/72466/#comment309117>

Stats missing



ql/src/test/results/clientpositive/llap/input_part8.q.out
Line 26 (original)
<https://reviews.apache.org/r/72466/#comment309118>

Stats missing



ql/src/test/results/clientpositive/llap/insert0.q.out
Line 126 (original), 122 (patched)
<https://reviews.apache.org/r/72466/#comment309119>

result changed


- Vineet Garg


On May 4, 2020, 10:40 a.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72466/
> ---
> 
> (Updated May 4, 2020, 10:40 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, John Sherman, Zoltan 
> Haindrich, Krisztian Kasa, Steve Carlin, and Vineet Garg.
> 
> 
> Bugs: HIVE-23337
> https://issues.apache.org/jira/browse/HIVE-23337
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Move q tests to TestMiniLlapLocal from TestCliDriver where the output is 
> different, batch 2
> 
> 
> Diffs
> -
> 
>   ql/src/test/results/clientpositive/llap/groupby9.q.out 8eaa2e9d1f 
>   ql/src/test/results/clientpositive/llap/groupby_complex_types.q.out 
> e784a5e04a 
>   
> ql/src/test/results/clientpositive/llap/groupby_complex_types_multi_single_reducer.q.out
>  dd2ea4a357 
>   ql/src/test/results/clientpositive/llap/groupby_cube1.q.out 0ac1490e34 
>   ql/src/test/results/clientpositive/llap/groupby_cube_multi_gby.q.out 
> af37eaca1a 
>   ql/src/test/results/clientpositive/llap/groupby_distinct_samekey.q.out 
> 901d6378ff 
>   ql/src/test/results/clientpositive/llap/groupby_duplicate_key.q.out 
> 44e8ef6952 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_id3.q.out 
> cdc063b370 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets1.q.out 
> 43ab99b9f1 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets2.q.out 
> 7831a49e95 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets3.q.out 
> a08dd02490 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets4.q.out 
> b61aba926d 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets5.q.out 
> b6b4dcb339 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets6.q.out 
> f6571b4645 
>   
> ql/src/test/results/clientpositive/llap/groupby_grouping_sets_grouping.q.out 
> 93e081b729 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets_limit.q.out 
> b4aa6d1dd0 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_sets_view.q.out 
> 582b78092c 
>   ql/src/test/results/clientpositive/llap/groupby_grouping_window.q.out 
> 21d92567d5 
>   ql/src/test/results/clientpositive/llap/groupby_join_pushdown.q.out 
> 2138eae171 
>   ql/src/test/results/clientpositive/llap/groupby_map_ppr.q.out 952f310071 
>   
> ql/src/test/results/clientpositive/llap/groupby_map_ppr_multi_distinct.q.out 
> bd43f546dd 
>   
> ql/src/test/results/clientpositive/llap/groupby_multi_insert_common_distinct.q.out
>  991f343394 
>   ql/src/test/results/clientpositive/llap/groupby_multi_single_reducer.q.out 
> 756c179e8b 
>   ql/src/test/results/clientpositive/llap/groupby_multi_single_reducer2.q.out 
> d151470d6c 
>   ql/src/test/results/clientpositive/llap/groupby_multi_single_reducer3.q.out 
> 4b4d57f2a0 
>   ql/src/test/results/clientpositive/llap/groupby_multialias.q.out 1a42ff23a7 
>   ql/src/test/results/clientpositive/llap/groupby_nocolumnalign.q.out 
> 3a92e71a75 
>   ql/src/test/results/clientpositive/llap/groupby_position.q.out 17f02c9089 
>   ql/src/test/results/clientpositive/llap/groupby_ppd.q.out 5731e9d5c2 
>   ql/src/test/results/clientpositive/llap/groupby_ppr.q.out d7549d9536 
>   ql/src/test/results/clientpositive/llap/groupby_ppr_multi_distinct.q.out 
> 95f95b0613 
>   ql/src/test/results/clientpositive/llap/groupby_rollup1.q.out e7b61b4a33 
>   ql/src/test/results/clientpositive/llap/groupby_sort_10.q.out 570d3eeeaf 
>   ql/src/test/results/clientpositive/llap/group

[jira] [Created] (HIVE-23366) Investigate why bucket_map_join*, bucketcontext_* and bucketmapjoin* tests are missing bucket map join when run under TestMiniLlapLocalCliDriver

2020-05-04 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23366:
--

 Summary: Investigate why bucket_map_join*, bucketcontext_* and bucketmapjoin* tests are missing bucket map join when run under TestMiniLlapLocalCliDriver
 Key: HIVE-23366
 URL: https://issues.apache.org/jira/browse/HIVE-23366
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Reporter: Vineet Garg








Re: Review Request 72431: HIVE-23206

2020-04-27 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72431/#review220503
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinProjectTransposeRule.java
Lines 34 (patched)
<https://reviews.apache.org/r/72431/#comment308966>

You can get away with having just one class, 
HiveJoinProjectBtwJoinTransposeRule, with a boolean flag indicating whether it 
is LEFT or RIGHT. Based on that flag, hasLeftChild and hasRightChild would 
return accordingly.
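
A dependency-free sketch of this suggestion, for illustration only. The class name JoinProjectBtwJoinTransposeSketch is hypothetical, it does not extend any Calcite or Hive rule class, and the method signatures are simplified to take no arguments:

```java
// Hypothetical sketch: one class parameterized by a flag, instead of
// separate LEFT and RIGHT rule classes.
final class JoinProjectBtwJoinTransposeSketch {
    private final boolean leftSide; // true: handle the LEFT input; false: the RIGHT input

    JoinProjectBtwJoinTransposeSketch(boolean leftSide) {
        this.leftSide = leftSide;
    }

    boolean hasLeftChild() {
        return leftSide;
    }

    boolean hasRightChild() {
        return !leftSide;
    }
}
```

Two instances, one per side, would then replace the two separate classes.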



ql/src/test/results/clientpositive/llap/keep_uniform.q.out
Lines 946 (patched)
<https://reviews.apache.org/r/72431/#comment308967>

Why is there an extra join in the plan now?



ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query14.q.out
Lines 240 (patched)
<https://reviews.apache.org/r/72431/#comment308968>

This looks like an extra join compared to the earlier plan (there are a few more 
in this plan as well). Any idea why?


- Vineet Garg


On April 27, 2020, 6:11 a.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72431/
> ---
> 
> (Updated April 27, 2020, 6:11 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Steve Carlin, and Vineet 
> Garg.
> 
> 
> Bugs: HIVE-23206
> https://issues.apache.org/jira/browse/HIVE-23206
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Project not defined correctly after reordering a join
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties c55f8db61a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinProjectTransposeRule.java
>  492c55e050 
>   ql/src/test/queries/clientpositive/join_reorder5.q PRE-CREATION 
>   ql/src/test/results/clientpositive/auto_join22.q.out 5a98716fed 
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out 2e9e6027ae 
>   ql/src/test/results/clientpositive/filter_cond_pushdown.q.out 74a7aa89e7 
>   ql/src/test/results/clientpositive/join22.q.out ad34bc4310 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out 
> f063766a1f 
>   ql/src/test/results/clientpositive/llap/join_reorder5.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/keep_uniform.q.out 54d0b5fab6 
>   ql/src/test/results/clientpositive/llap/sharedwork.q.out f8d3b4b2f5 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 311cee743d 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query2.q.out 26a98ffcec 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query59.q.out abc5d999b5 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query95.q.out 218ca7d8b6 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query14.q.out 
> eaa1defa81 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query2.q.out 
> 4c90da4476 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query59.q.out 
> 8d17cc79d1 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query95.q.out 
> ace074316b 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out 
> 8204245245 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query2.q.out 
> 6669e6 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query59.q.out 
> f7c7260077 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query95.q.out 
> 39d35ec330 
>   ql/src/test/results/clientpositive/perf/tez/query2.q.out 0e67e97c02 
>   ql/src/test/results/clientpositive/perf/tez/query59.q.out 1a2ba964f4 
>   ql/src/test/results/clientpositive/perf/tez/query95.q.out f15afbed4b 
>   ql/src/test/results/clientpositive/runtime_skewjoin_mapjoin_spark.q.out 
> 9547e4fa7c 
>   ql/src/test/results/clientpositive/smb_mapjoin_25.q.out 8fb82e1659 
> 
> 
> Diff: https://reviews.apache.org/r/72431/diff/2/
> 
> 
> Testing
> ---
> 
> mvn test -Dtest.output.overwrite -DskipSparkTests 
> -Dtest=TestMiniLlapLocalCliDriver -Dqfile=join_reorder5.q -pl itests/qtest 
> -Pitests
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>



[jira] [Created] (HIVE-23216) Add new api as replacement of get_partitions_by_expr to return PartitionSpec instead of Partitions

2020-04-15 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23216:
--

 Summary: Add new api as replacement of get_partitions_by_expr to 
return PartitionSpec instead of Partitions
 Key: HIVE-23216
 URL: https://issues.apache.org/jira/browse/HIVE-23216
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-23160) get_partitions_with_specs fail to close the query

2020-04-08 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23160:
--

 Summary: get_partitions_with_specs fail to close the query
 Key: HIVE-23160
 URL: https://issues.apache.org/jira/browse/HIVE-23160
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Standalone Metastore
Affects Versions: 4.0.0
Reporter: Vineet Garg


The API relies on the try statement to close the resource (the query), but this 
fails, likely because try calls close() whereas closeAll() needs to be called.
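
Assuming the "try" here is a try-with-resources statement, the behaviour can be sketched as follows. FakeQuery is a hypothetical stand-in with both close() and closeAll(); it is not the query class the metastore actually uses:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query object that exposes both close() and
// closeAll(). It only records which method was called.
final class FakeQuery implements AutoCloseable {
  final List<String> calls = new ArrayList<>();

  @Override
  public void close() {
    calls.add("close");
  }

  void closeAll() {
    calls.add("closeAll");
  }
}

final class TryCloseDemo {
  // try-with-resources only ever invokes close() on the resource.
  static List<String> implicitClose() {
    FakeQuery q = new FakeQuery();
    try (FakeQuery resource = q) {
      // the query would be used here
    }
    return q.calls;
  }

  // If closeAll() is what releases the results, it has to be called explicitly.
  static List<String> explicitCloseAll() {
    FakeQuery q = new FakeQuery();
    try {
      // the query would be used here
    } finally {
      q.closeAll();
    }
    return q.calls;
  }
}
```

In this sketch implicitClose() records only "close", while explicitCloseAll() records only "closeAll".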





[jira] [Created] (HIVE-23152) Support CachedStore with get_partitions_with_specs

2020-04-07 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23152:
--

 Summary: Support CachedStore with get_partitions_with_specs
 Key: HIVE-23152
 URL: https://issues.apache.org/jira/browse/HIVE-23152
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Vineet Garg








[jira] [Created] (HIVE-23148) get_partitions_with_specs fail for postgres with argument type mismatch exception

2020-04-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23148:
--

 Summary: get_partitions_with_specs fail for postgres with argument 
type mismatch exception 
 Key: HIVE-23148
 URL: https://issues.apache.org/jira/browse/HIVE-23148
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 4.0.0
Reporter: Vineet Garg


{code}
MetaException(message:java.lang.IllegalArgumentException: Cannot invoke org.apache.hadoop.hive.metastore.api.StorageDescriptor.setNumBuckets on bean class 'class org.apache.hadoop.hive.metastore.api.StorageDescriptor' - argument type mismatch - had objects of type "java.lang.Long" but expected signature "int"
    at org.apache.commons.beanutils.PropertyUtilsBean.invokeMethod(PropertyUtilsBean.java:2196)
    at org.apache.commons.beanutils.PropertyUtilsBean.setSimpleProperty(PropertyUtilsBean.java:2109)
    at org.apache.commons.beanutils.PropertyUtilsBean.setNestedProperty(PropertyUtilsBean.java:1915)
    at org.apache.commons.beanutils.PropertyUtils.setNestedProperty(PropertyUtils.java:866)
    at org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.setNestedProperty(MetaStoreUtils.java:1938)
    at org.apache.hadoop.hive.metastore.PartitionProjectionEvaluator$1.setValue(PartitionProjectionEvaluator.java:412)
    at org.apache.hadoop.hive.metastore.PartitionProjectionEvaluator.traverseAndSetValues(PartitionProjectionEvaluator.java:501)
    at org.apache.hadoop.hive.metastore.PartitionProjectionEvaluator.traverseAndSetValues(PartitionProjectionEvaluator.java:505)
    at org.apache.hadoop.hive.metastore.PartitionProjectionEvaluator.setSingleValuedFields(PartitionProjectionEvaluator.java:392)
    at org.apache.hadoop.hive.metastore.PartitionProjectionEvaluator.getPartitionsUsingProjectionList(PartitionProjectionEvaluator.java:358)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql$3.run(MetaStoreDirectSql.java:642)
    at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsUsingProjectionAndFilterSpec(MetaStoreDirectSql.java:639)
    at org.apache.hadoop.hive.metastore.ObjectStore$15.getSqlResult(ObjectStore.java:4344)
    at org.apache.hadoop.hive.metastore.ObjectStore$15.getSqlResult(ObjectStore.java:4315)
    at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3989)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionSpecsByFilterAndProjection(ObjectStore.java:4410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
    at com.sun.proxy.$Proxy26.getPartitionSpecsByFilterAndProjection(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_with_specs(HiveMetaStore.java:5356)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
    at com.sun.proxy.$Proxy27.get_partitions_with_specs(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_with_specs.getResult(ThriftHiveMetastore.java:21620)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_with_specs.getResult(ThriftHiveMetastore.java:21604)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThrif
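
The same kind of "argument type mismatch" can be reproduced with plain reflection: invoking an int setter with a java.lang.Long fails, while narrowing the value to an Integer first succeeds. The sketch below is only an illustration under that assumption. BucketBean is a hypothetical stand-in for StorageDescriptor, and toIntIfNumber is one possible workaround, not necessarily the fix that was applied:

```java
import java.lang.reflect.Method;

// Hypothetical bean with an int setter, standing in for StorageDescriptor.
final class BucketBean {
  private int numBuckets;

  public void setNumBuckets(int numBuckets) {
    this.numBuckets = numBuckets;
  }

  public int getNumBuckets() {
    return numBuckets;
  }
}

final class SetterMismatchDemo {
  // Invokes setNumBuckets reflectively; returns true if the call succeeded.
  static boolean trySet(BucketBean bean, Object value) {
    try {
      Method setter = BucketBean.class.getMethod("setNumBuckets", int.class);
      setter.invoke(bean, value);
      return true;
    } catch (IllegalArgumentException e) {
      // Thrown when the argument cannot be converted to the parameter type,
      // for example a Long passed where an int is expected.
      return false;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Narrows any numeric value to an Integer before it reaches the setter.
  static Object toIntIfNumber(Object value) {
    return (value instanceof Number)
        ? Integer.valueOf(((Number) value).intValue())
        : value;
  }
}
```

Here trySet(bean, Long.valueOf(4)) fails, whereas trySet(bean, toIntIfNumber(Long.valueOf(4))) sets the field.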

[jira] [Created] (HIVE-23147) get_partitions_with_sepcs fail for POSTGRES

2020-04-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23147:
--

 Summary: get_partitions_with_sepcs fail for POSTGRES
 Key: HIVE-23147
 URL: https://issues.apache.org/jira/browse/HIVE-23147
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 4.0.0
Reporter: Vineet Garg


The query to fetch partition-related data looks like the following:
{code:sql}
SELECT "sds"."output_format", 
   "serdes"."name", 
   "serdes"."slib", 
   "sds"."location", 
   "sds"."input_format", 
   "sds"."num_buckets", 
   "sds"."is_compressed", 
   "serdes"."serde_id", 
   "sds"."cd_id", 
   "sds"."sd_id", 
   "partitions"."part_id" 
FROM   partitions 
   LEFT OUTER JOIN sds 
ON partitions."sd_id" = sds."sd_id" 
   LEFT OUTER JOIN serdes 
ON sds."serde_id" = serdes."serde_id" 
WHERE
{code}

This fails on Postgres because the table references are not quoted.
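
PostgreSQL folds unquoted identifiers to lower case, so a table created under a quoted upper-case name can only be reached through a quoted reference. A minimal sketch of quoting both the table and the column when building such a query; the class and method names are hypothetical and this is not the metastore's own code:

```java
// Hypothetical helper for building quoted identifier references.
final class PgIdentifiers {
  // Wraps an identifier in double quotes and doubles any embedded quote,
  // which is the standard SQL rule for delimited identifiers.
  static String quote(String identifier) {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  // Builds a qualified column reference with both parts quoted.
  static String column(String table, String column) {
    return quote(table) + "." + quote(column);
  }
}
```

For example, PgIdentifiers.column("SDS", "SD_ID") yields "SDS"."SD_ID" (the upper-case names are assumed here purely for illustration).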





[jira] [Created] (HIVE-23146) get_partitions_with_specs fail in JDO path if only parent field is provided for a nested field

2020-04-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23146:
--

 Summary: get_partitions_with_specs fail in JDO path if only parent 
field is provided for a nested field
 Key: HIVE-23146
 URL: https://issues.apache.org/jira/browse/HIVE-23146
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Vineet Garg


*PartitionProjectionEvaluator.validate(partitionFields)* is used to validate 
the fields, and it requires fully qualified names. The Direct SQL path works in 
such cases.





[jira] [Created] (HIVE-23145) get_partitions_with_specs fails if filter expression is not parsable

2020-04-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23145:
--

 Summary: get_partitions_with_specs fails if filter expression is 
not parsable
 Key: HIVE-23145
 URL: https://issues.apache.org/jira/browse/HIVE-23145
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Vineet Garg


The expression is not parsable in most cases. The current API 
*get_partitions_by_expr* anticipates this and provides a fallback mechanism: 
it deserializes the provided expression, fetches all partition names for the 
table, prunes the partition names using the expression, and then uses the 
remaining names to fetch the required partition data.
Note that this expects a serialized expression instead of a string.

This needs to be done for both the Direct SQL and JDO paths.
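
The fallback flow described above can be sketched roughly as follows. Every parameter is a hypothetical stand-in for metastore internals (the deserializer, the name listing and the fetch-by-names call), so this shows only the order of the steps, not the real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class ExprFallbackSketch {
  // Hypothetical fallback for an expression that cannot be parsed as a filter.
  static <P> List<P> fetchByExpr(
      byte[] serializedExpr,
      Function<byte[], Predicate<String>> deserializer,
      List<String> allPartitionNames,
      Function<List<String>, List<P>> fetchByNames) {
    // 1. Deserialize the expression into something that can test a partition name.
    Predicate<String> expr = deserializer.apply(serializedExpr);
    // 2. Prune the full list of partition names with it.
    List<String> kept = new ArrayList<>();
    for (String name : allPartitionNames) {
      if (expr.test(name)) {
        kept.add(name);
      }
    }
    // 3. Fetch partition data only for the surviving names.
    return fetchByNames.apply(kept);
  }
}
```

Both the Direct SQL and JDO paths could share such a helper and differ only in the fetchByNames step.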





[jira] [Created] (HIVE-23127) Replace listPartitionsByExpr with GetPartitionsWithSpecs in Partition pruner

2020-04-01 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23127:
--

 Summary: Replace listPartitionsByExpr with GetPartitionsWithSpecs 
in Partition pruner
 Key: HIVE-23127
 URL: https://issues.apache.org/jira/browse/HIVE-23127
 Project: Hive
  Issue Type: Task
  Components: HiveServer2
Reporter: Vineet Garg
Assignee: Vineet Garg


GetPartitionsWithSpecs reduces data transfer by deduplicating storage descriptors.





[jira] [Created] (HIVE-23071) Remove hive.optimize.sort.dynamic.partition config

2020-03-24 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23071:
--

 Summary: Remove hive.optimize.sort.dynamic.partition config
 Key: HIVE-23071
 URL: https://issues.apache.org/jira/browse/HIVE-23071
 Project: Hive
  Issue Type: Task
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


{{hive.optimize.sort.dynamic.partition.threshold}} has replaced this config, so 
we should remove the original config.





[jira] [Created] (HIVE-23066) [Subqueries] Throw an error if COALESCE/NVL is used in correlated condition

2020-03-20 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23066:
--

 Summary: [Subqueries] Throw an error if COALESCE/NVL is used in 
correlated condition
 Key: HIVE-23066
 URL: https://issues.apache.org/jira/browse/HIVE-23066
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


NVL could lead to wrong results
{code:sql}
create table TABLEA (id int, lib string);
insert into TABLEA values (1, 'a'),(2, 'b'),(3, null),(5,'zx');
create table TABLEB (id int, lib string);
insert into TABLEB values (1, 'a'),(4, 'c'),(3, null),(5,'zy');

select *
from TABLEA a
where exists (
   select 1
   from TABLEB b
   where nvl(a.lib,0) = nvl(b.lib,0)
);
{code}

***OUTPUT***
{noformat}
+-------+--------+
| a.id  | a.lib  |
+-------+--------+
| 1     | a      |
+-------+--------+
{noformat}

***EXPECTED***
{noformat}
+-------+--------+
| a.id  | a.lib  |
+-------+--------+
| 1     | a      |
| 3     | NULL   |
+-------+--------+
{noformat}
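
The expected rows can be checked by evaluating the EXISTS condition by hand. The sketch below does so in plain Java, assuming the literal 0 is compared as the string '0'; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

final class NvlExistsSketch {
  // nvl(x, d): returns d when x is NULL (null here), otherwise x.
  static String nvl(String value, String dflt) {
    return value == null ? dflt : value;
  }

  // Ids of TABLEA rows for which some TABLEB row satisfies
  // nvl(a.lib, '0') = nvl(b.lib, '0').
  static List<Integer> expectedIds() {
    int[] aIds = {1, 2, 3, 5};
    String[] aLibs = {"a", "b", null, "zx"};
    String[] bLibs = {"a", "c", null, "zy"};
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < aIds.length; i++) {
      for (String bLib : bLibs) {
        if (nvl(aLibs[i], "0").equals(nvl(bLib, "0"))) {
          ids.add(aIds[i]);
          break; // EXISTS only needs one matching row
        }
      }
    }
    return ids;
  }
}
```

Row 3 qualifies because both NULL values are replaced by the same default before the comparison.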






[jira] [Created] (HIVE-23050) Partition pruning cache miss during compilation

2020-03-18 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-23050:
--

 Summary: Partition pruning cache miss during compilation
 Key: HIVE-23050
 URL: https://issues.apache.org/jira/browse/HIVE-23050
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


{code:sql}
create table pcr_t1 (key int, value string) partitioned by (ds string);

insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
where key < 20 order by key;

explain extended select key, value, ds from pcr_t1 where (ds='2000-04-08' and 
key=1) or (ds='2000-04-09' and key=2) order by key, value, ds
{code}

During query compilation, HivePartitionPruner fetches the list of partitions and 
caches it. Later, PCR (partition condition removal) tries to get the pruned 
partitions, but due to a cache miss the request goes to the metastore server to 
retrieve the pruned partitions using listPartitions.

An improvement here would be to use the already cached list of partitions to do 
the partition pruning for PCR, or for pruning in general.
(I am not sure why HivePartitionPruner isn't able to do partition pruning in 
the first place.)





[jira] [Created] (HIVE-22885) HiveMetaStore should log end time for operation requests

2020-02-12 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22885:
--

 Summary: HiveMetaStore should log end time for operation requests
 Key: HIVE-22885
 URL: https://issues.apache.org/jira/browse/HIVE-22885
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Vineet Garg
Assignee: Vineet Garg








Re: Review Request 72063: HIVE-10362: Support Type check/conversion in dynamic partition column

2020-02-06 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72063/#review219519
---




ql/src/test/queries/clientpositive/dynpart_cast.q
Lines 2 (patched)
<https://reviews.apache.org/r/72063/#comment307717>

Thanks for addressing the comments. Just one more minor comment: 
can we add this test under 'minillap.query.files'? Right now this is an MR 
test, but since we use Tez/LLAP it would be better there.
You'll need to run 'TestTezMiniLlapCliDriver' to generate the q file.


- Vineet Garg


On Feb. 6, 2020, 2:22 p.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72063/
> ---
> 
> (Updated Feb. 6, 2020, 2:22 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Bugs: HIVE-10362
> https://issues.apache.org/jira/browse/HIVE-10362
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Example:
> create table dynparttypechecknum (key int, value string) partitioned by (part 
> int);
> insert into dynparttypechecknum partition (part) select key, value, '1' 
> from src limit 1;
> show partitions dynparttypechecknum;
> 
> Partition created will be named:
> part=1
> even though the type of `part` is int.
> 
> Solution is to cast the inserted DP columns in the SelectOperator before 
> FileSinkOperator which creates the partition dir, not after.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 12a022c590 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> c2514eedb1 
>   ql/src/test/queries/clientpositive/dynpart_cast.q PRE-CREATION 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out da3be3e5bb 
>   ql/src/test/results/clientpositive/dynpart_cast.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out 
> 43bb789840 
>   ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out 
> f745b46899 
>   ql/src/test/results/clientpositive/llap/dynpart_sort_opt_bucketing.q.out 
> 453d2451df 
>   ql/src/test/results/clientpositive/llap/orc_merge1.q.out 9da73e65ac 
>   ql/src/test/results/clientpositive/llap/orc_merge10.q.out a6ea33493f 
>   ql/src/test/results/clientpositive/llap/orc_merge2.q.out 9b0d3b4234 
>   ql/src/test/results/clientpositive/llap/orc_merge_diff_fs.q.out d35f44b10a 
>   ql/src/test/results/clientpositive/llap/rcfile_merge2.q.out fcff20a68e 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 4ad78d8582 
>   ql/src/test/results/clientpositive/orc_merge1.q.out 9c07816340 
>   ql/src/test/results/clientpositive/orc_merge10.q.out 4a5f03c82f 
>   ql/src/test/results/clientpositive/orc_merge2.q.out d132d62b18 
>   ql/src/test/results/clientpositive/orc_merge_diff_fs.q.out 7f9a04b09f 
>   ql/src/test/results/clientpositive/smb_join_partition_key.q.out c18d01d26a 
>   
> ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out 
> 56d5ed945b 
>   ql/src/test/results/clientpositive/spark/orc_merge1.q.out 977c4cbfc1 
>   ql/src/test/results/clientpositive/spark/orc_merge2.q.out 4647b86ea3 
>   ql/src/test/results/clientpositive/spark/orc_merge_diff_fs.q.out b7d3dd725d 
> 
> 
> Diff: https://reviews.apache.org/r/72063/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



Re: Review Request 72063: HIVE-10362: Support Type check/conversion in dynamic partition column

2020-02-05 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72063/#review219513
---




ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPartitionCtx.java
Lines 88 (patched)
<https://reviews.apache.org/r/72063/#comment307708>

This will only be used if DYNAMICPARTITIONCONVERT is set to true, but right 
now the map is always populated. It would be good to populate the map explicitly 
(maybe by a call from genConversionSelectOperator before adding the cast).



ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPartitionCtx.java
Lines 222 (patched)
<https://reviews.apache.org/r/72063/#comment307709>

A comment would be helpful here to explain why we need to keep it



ql/src/test/queries/clientpositive/dynpart_cast.q
Lines 7 (patched)
<https://reviews.apache.org/r/72063/#comment307710>

Can we add explain plan?



ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out
Lines 36 (patched)
<https://reviews.apache.org/r/72063/#comment307711>

This test is meant to test sorted dynamic bucket partitioning. With the change, 
this optimization no longer kicks in (the PARTITION_BUCKET_SORTED keyword is 
missing in the File sink). We should understand why this is happening and then 
either fix it or log a Jira.


- Vineet Garg


On Jan. 30, 2020, 3:30 p.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72063/
> ---
> 
> (Updated Jan. 30, 2020, 3:30 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Bugs: HIVE-10362
> https://issues.apache.org/jira/browse/HIVE-10362
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Example:
> create table dynparttypechecknum (key int, value string) partitioned by (part 
> int);
> insert into dynparttypechecknum partition (part) select key, value, '1' 
> from src limit 1;
> show partitions dynparttypechecknum;
> 
> Partition created will be named:
> part=1
> even though the type of `part` is int.
> 
> Solution is to cast the inserted DP columns in the SelectOperator before 
> FileSinkOperator which creates the partition dir, not after.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 12a022c590 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> c2514eedb1 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPartitionCtx.java 
> c1aeb8f136 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java 
> 2c4b69b2fe 
>   ql/src/test/queries/clientpositive/dynpart_cast.q PRE-CREATION 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out da3be3e5bb 
>   ql/src/test/results/clientpositive/dynpart_cast.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out 
> 43bb789840 
>   ql/src/test/results/clientpositive/infer_bucket_sort_num_buckets.q.out 
> f745b46899 
>   ql/src/test/results/clientpositive/llap/auto_sortmerge_join_16.q.out 
> fc9050b2c3 
>   
> ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out 
> 95aae7286f 
>   ql/src/test/results/clientpositive/llap/llap_smb.q.out 24026d0bab 
>   ql/src/test/results/clientpositive/llap/orc_merge1.q.out 9da73e65ac 
>   ql/src/test/results/clientpositive/llap/orc_merge10.q.out a6ea33493f 
>   ql/src/test/results/clientpositive/llap/orc_merge2.q.out 9b0d3b4234 
>   ql/src/test/results/clientpositive/llap/orc_merge_diff_fs.q.out d35f44b10a 
>   ql/src/test/results/clientpositive/llap/rcfile_merge2.q.out fcff20a68e 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 4ad78d8582 
>   ql/src/test/results/clientpositive/orc_merge1.q.out 9c07816340 
>   ql/src/test/results/clientpositive/orc_merge10.q.out 4a5f03c82f 
>   ql/src/test/results/clientpositive/orc_merge2.q.out d132d62b18 
>   ql/src/test/results/clientpositive/orc_merge_diff_fs.q.out 7f9a04b09f 
>   ql/src/test/results/clientpositive/smb_join_partition_key.q.out c18d01d26a 
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out 
> bc6c3add54 
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out_spark 
> 67b62c1265 
>   
> ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out 
> 56d5ed945b 
>   ql/src/test/results/clientpositive/spark/orc_merge1.q.out 977c4cbfc1 
>   ql/src/test/results/clientpositive/spark/orc_merge2.q.out 4647b86ea3 
>   ql/src/test/results/clientpositive/spark/orc_merge_diff_fs.q.out b7d3dd725d 
> 
> 
> Diff: https://reviews.apache.org/r/72063/diff/2/
> 
> 
> Testing
> ---
> 
> There were changes in query output in two spark auto_sortmerge_join_16.q.out 
> files. They now match the query output of llap/auto_sortmerge_join_16.q.out.
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



Re: Review Request 72073: HIVE-22808

2020-02-04 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72073/#review219501
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableFunctionScan.java
Line 33 (original), 34 (patched)
<https://reviews.apache.org/r/72073/#comment307699>

It is not a good idea to extend Logical nodes in Hive. The fix should instead be 
in RelFieldTrimmer, to handle the TableFunctionScan type.
Ideally it should be fixed in Calcite, but for now we can copy the 
implementation of LogicalTableFunctionScan from Calcite and reuse it for 
HiveTableFunctionScan.


- Vineet Garg


On Feb. 4, 2020, 5:26 a.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72073/
> ---
> 
> (Updated Feb. 4, 2020, 5:26 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Vineet Garg.
> 
> 
> Bugs: HIVE-22808
> https://issues.apache.org/jira/browse/HIVE-22808
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveRelFieldTrimmer does not handle HiveTableFunctionScan
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableFunctionScan.java
>  ffa2a1f262 
>   ql/src/test/results/clientpositive/except_all.q.out 020cba4287 
>   ql/src/test/results/clientpositive/intersect_all_rj.q.out b8ff98ae79 
>   ql/src/test/results/clientpositive/llap/intersect_all_rj.q.out cdfbc2239e 
> 
> 
> Diff: https://reviews.apache.org/r/72073/diff/2/
> 
> 
> Testing
> ---
> 
> mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestCliDriver 
> -Dqfile=intersect_all_rj.q -pl itests/qtest -Pitests
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>



[jira] [Created] (HIVE-22824) JoinProjectTranspose rule should skip Projects containing windowing expression

2020-02-03 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22824:
--

 Summary: JoinProjectTranspose rule should skip Projects containing 
windowing expression
 Key: HIVE-22824
 URL: https://issues.apache.org/jira/browse/HIVE-22824
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


Otherwise this rule could end up creating a plan with a windowing expression 
within the join condition, which Hive doesn't know how to process.





[jira] [Created] (HIVE-22788) Query cause NPE due to implicit cast on ROW__ID

2020-01-28 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22788:
--

 Summary: Query cause NPE due to implicit cast on ROW__ID
 Key: HIVE-22788
 URL: https://issues.apache.org/jira/browse/HIVE-22788
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


*Repro*
{code:sql}
CREATE TABLE table_16 (
  timestamp_col_19    timestamp,
  timestamp_col_29    timestamp,
  int_col_27          int,
  int_col_39          int,
  boolean_col_18      boolean,
  varchar0045_col_23  varchar(45)
);


CREATE TABLE table_7 (
  int_col_10    int,
  bigint_col_3  bigint
);

CREATE TABLE table_10 (
  boolean_col_8       boolean,
  boolean_col_16      boolean,
  timestamp_col_5     timestamp,
  timestamp_col_15    timestamp,
  timestamp_col_30    timestamp,
  decimal3825_col_26  decimal(38, 25),
  smallint_col_9      smallint,
  int_col_18          int
);

explain cbo 
SELECT
DISTINCT COALESCE(a4.timestamp_col_15, IF(a4.boolean_col_16, 
a4.timestamp_col_30, a4.timestamp_col_5)) AS timestamp_col
FROM table_7 a3
RIGHT JOIN table_10 a4 
WHERE (a3.bigint_col_3) >= (a4.int_col_18)
INTERSECT ALL
SELECT
COALESCE(LEAST(
COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
),
GREATEST(COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS 
TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
) AS timestamp_col
FROM table_16 a1
GROUP BY COALESCE(LEAST(
COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
),
GREATEST(
COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
);
{code}





[jira] [Created] (HIVE-22783) Add test for HIVE-22366

2020-01-27 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22783:
--

 Summary: Add test for HIVE-22366
 Key: HIVE-22783
 URL: https://issues.apache.org/jira/browse/HIVE-22783
 Project: Hive
  Issue Type: Sub-task
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-01-27 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22782:
--

 Summary: Consolidate metastore call to fetch constraints
 Key: HIVE-22782
 URL: https://issues.apache.org/jira/browse/HIVE-22782
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently, separate calls are made to the metastore to fetch constraints such as 
PK, FK, NOT NULL, etc. Since the planner always retrieves these constraints, we 
should retrieve all of them in one call.





[jira] [Created] (HIVE-22777) Sorted dynamic partition optimization doesn't work if plan require implicit cast

2020-01-24 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22777:
--

 Summary: Sorted dynamic partition optimization doesn't work if 
plan require implicit cast
 Key: HIVE-22777
 URL: https://issues.apache.org/jira/browse/HIVE-22777
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


*Repro*
set hive.stats.autogather=false;
set hive.optimize.sort.dynamic.partition.threshold=1;
set hive.optimize.bucketingsorting = true;

{code:sql}
drop table if exists t1_staging;
create table t1_staging(
   a string,
   b int,
   c int,
   d string)
partitioned by (e  decimal(18,0))
clustered by(a)
into 256 buckets STORED AS TEXTFILE;
load data local inpath '../../data/files/sortdp/00_0' overwrite into table 
t1_staging partition (e=100);

drop table t1_n147;
create table t1_n147(
a string,
b decimal(6,0),
c int,
d string)
partitioned by (e decimal(3,0))
clustered by(a,b)
into 10 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');

set hive.stats.autogather=false;
set hive.optimize.bucketingsorting = true;
explain insert overwrite table t1_n147 partition(e) select a,b,c,d,e  from 
t1_staging;
{code}





[jira] [Created] (HIVE-22637) Avoid cost based rules during generating expressions from AST

2019-12-12 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22637:
--

 Summary: Avoid cost based rules during generating expressions from 
AST
 Key: HIVE-22637
 URL: https://issues.apache.org/jira/browse/HIVE-22637
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


genExprNode uses the default dispatcher, which fires rules based on cost. 
Computing the cost is expensive and looks unnecessary.





[jira] [Created] (HIVE-22632) Improve estimateRowSizeFromSchema

2019-12-11 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22632:
--

 Summary: Improve estimateRowSizeFromSchema
 Key: HIVE-22632
 URL: https://issues.apache.org/jira/browse/HIVE-22632
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Vineet Garg
Assignee: Vineet Garg


estimateRowSizeFromSchema unnecessarily iterates and does look-ups. This could be 
avoided.





[jira] [Created] (HIVE-22631) Avoid deep copying partition list in listPartitionsByExpr

2019-12-11 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22631:
--

 Summary: Avoid deep copying partition list in listPartitionsByExpr
 Key: HIVE-22631
 URL: https://issues.apache.org/jira/browse/HIVE-22631
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Vineet Garg
Assignee: Vineet Garg


This is an expensive call, and I am not sure why deepCopy is required.





[jira] [Created] (HIVE-22591) Make single metastore call to fetch all column stats instead of separate call for each column

2019-12-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22591:
--

 Summary: Make single metastore call to fetch all column stats 
instead of separate call for each column
 Key: HIVE-22591
 URL: https://issues.apache.org/jira/browse/HIVE-22591
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently HiveRelFieldTrimmer can fetch (and trigger caching of) column stats 
in a single call. 
HiveReduceExpressionsWithStatsRule, on the other hand, uses column stats on 
demand; as a result it makes a separate call for each column.
This rule should be moved after RelFieldTrimmer so that RelFieldTrimmer has 
already cached all the column stats.
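
The difference between on-demand and batched stats fetching can be sketched 
as below. This is only an illustration, not Hive code: FakeStatsClient, 
ColumnStatsCache and their methods are hypothetical names, and the returned 
values are dummies. The client simply counts round trips so the two access 
patterns can be compared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metastore client; it only counts round trips.
final class FakeStatsClient {
    int calls = 0;

    // One round trip returns (dummy) stats for every requested column.
    Map<String, Long> fetchDistinctCounts(List<String> columns) {
        calls++;
        Map<String, Long> stats = new HashMap<>();
        for (String column : columns) {
            stats.put(column, (long) column.length()); // dummy value
        }
        return stats;
    }
}

// Hypothetical cache sitting between planner rules and the client.
final class ColumnStatsCache {
    private final FakeStatsClient client;
    private final Map<String, Long> cache = new HashMap<>();

    ColumnStatsCache(FakeStatsClient client) {
        this.client = client;
    }

    // Fetches all columns that are not cached yet in a single call.
    void prefetch(List<String> columns) {
        List<String> missing = new ArrayList<>();
        for (String column : columns) {
            if (!cache.containsKey(column)) {
                missing.add(column);
            }
        }
        if (!missing.isEmpty()) {
            cache.putAll(client.fetchDistinctCounts(missing));
        }
    }

    // On-demand lookup: triggers a call only when the column is not cached.
    long distinctCount(String column) {
        prefetch(Collections.singletonList(column));
        return cache.get(column);
    }
}
```

With a cold cache, three on-demand lookups cost three calls; after a single 
prefetch of the same three columns, the lookups cost no further calls.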





[jira] [Created] (HIVE-22380) HiveStrictManagedMigration - migration isn't triggered if database location is updated

2019-10-21 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22380:
--

 Summary: HiveStrictManagedMigration - migration isn't triggered if 
database location is updated
 Key: HIVE-22380
 URL: https://issues.apache.org/jira/browse/HIVE-22380
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: Vineet Garg


Migration can be triggered for a single table, which updates the DB location. 
If the user then wants to trigger migration for another table in the database, 
the whole migration is silently skipped since the DB location has already been 
updated.

There should be an option to force migration to go ahead even if the DB 
location is updated; otherwise the user is left with no option to migrate the 
rest of the tables.

 

cc [~jdere]





[jira] [Created] (HIVE-22366) Multiple metastore calls for same table and constraints during planning

2019-10-17 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22366:
--

 Summary: Multiple metastore calls for same table and constraints 
during planning
 Key: HIVE-22366
 URL: https://issues.apache.org/jira/browse/HIVE-22366
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-22364) RelFieldTrimmer is being done twice during compilation

2019-10-17 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22364:
--

 Summary: RelFieldTrimmer is being done twice during compilation
 Key: HIVE-22364
 URL: https://issues.apache.org/jira/browse/HIVE-22364
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-22227) Tez bucket pruning produces wrong result with shared work optimization

2019-09-20 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22227:
--

 Summary: Tez bucket pruning produces wrong result with shared work 
optimization
 Key: HIVE-22227
 URL: https://issues.apache.org/jira/browse/HIVE-22227
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-22213) TxnHander cleanupRecords should only clean records belonging to default catalog

2019-09-17 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22213:
--

 Summary: TxnHander cleanupRecords should only clean records 
belonging to default catalog
 Key: HIVE-22213
 URL: https://issues.apache.org/jira/browse/HIVE-22213
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently it removes records for the given database and table without checking 
the catalog; as a result, it can end up removing records when it shouldn't. 





[jira] [Created] (HIVE-22145) Avoid optimizations for analyze compute statistics

2019-08-26 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22145:
--

 Summary: Avoid optimizations for analyze compute statistics
 Key: HIVE-22145
 URL: https://issues.apache.org/jira/browse/HIVE-22145
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HIVE-22136) Turn on tez.bucket.pruning

2019-08-21 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22136:
--

 Summary: Turn on tez.bucket.pruning 
 Key: HIVE-22136
 URL: https://issues.apache.org/jira/browse/HIVE-22136
 Project: Hive
  Issue Type: Task
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-22135) Remove Hive.get().getConf() call and instead pass the configuration

2019-08-21 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22135:
--

 Summary: Remove Hive.get().getConf() call and instead pass the 
configuration
 Key: HIVE-22135
 URL: https://issues.apache.org/jira/browse/HIVE-22135
 Project: Hive
  Issue Type: Task
Affects Versions: 4.0.0
Reporter: Vineet Garg


There are multiple places where {{Hive.get().getConf()}} is used to get the 
Hive configuration. This static call could be expensive and should be avoided; 
instead, the configuration should be passed to wherever it is required.
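
A minimal sketch of the suggested pattern follows. Conf and BucketPruner are 
hypothetical stand-ins, not Hive classes; the point is only that the component 
receives its configuration from the caller rather than looking it up through a 
static accessor each time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; this is not HiveConf.
final class Conf {
    private final Map<String, String> values = new HashMap<>();

    Conf set(String key, String value) {
        values.put(key, value);
        return this;
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Hypothetical component: the configuration is handed in once by the caller
// and reused, so no static lookup is needed where the setting is read.
final class BucketPruner {
    private final Conf conf;

    BucketPruner(Conf conf) {
        this.conf = conf;
    }

    boolean isEnabled() {
        return Boolean.parseBoolean(conf.get("hive.tez.bucket.pruning", "false"));
    }
}
```

Passing the object explicitly also makes the dependency visible in the 
constructor signature, which tends to simplify testing.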





[jira] [Created] (HIVE-22121) Turning on hive.tez.bucket.pruning produce wrong results

2019-08-15 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-22121:
--

 Summary: Turning on hive.tez.bucket.pruning produce wrong results
 Key: HIVE-22121
 URL: https://issues.apache.org/jira/browse/HIVE-22121
 Project: Hive
  Issue Type: Bug
Reporter: Vineet Garg
Assignee: Vineet Garg


*Reproducer*

{code:sql}
set hive.query.results.cache.enabled=false;
set hive.optimize.ppd.storage=true;
set hive.optimize.index.filter=true;

set hive.tez.bucket.pruning=true; 


CREATE TABLE `test_table`( 
   `col_1` int, 
   `col_2` string,  
   `col_3` string)  
 CLUSTERED BY ( 
   col_1)   
 INTO 4 BUCKETS; 

insert into test_table values(1, 'one', 'ONE'), (2, 'two', 'TWO'), 
(3,'three','THREE'),(4,'four','FOUR');

select * from test_table;

explain select col_1, col_2, col_3 from test_table where col_1 <> 2 order by 
col_2;
select col_1, col_2, col_3 from test_table where col_1 <> 2 order by col_2;

{code}

The above SQL query produces zero rows.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22107) Correlated subquery producing wrong schema

2019-08-13 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-22107:
--

 Summary: Correlated subquery producing wrong schema
 Key: HIVE-22107
 URL: https://issues.apache.org/jira/browse/HIVE-22107
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


*Repro*
{code:sql}
create table test(id int, name string,dept string);
insert into test values(1,'a','it'),(2,'b','eee'),(NULL, 'c', 'cse');
select distinct 'empno' as eid, a.id from test a where NOT EXISTS (select c.id 
from test c where a.id=c.id);
{code}

{code}
+-------+--------+
|  eid  |  a.id  |
+-------+--------+
| NULL  | empno  |
+-------+--------+
{code}





[jira] [Created] (HIVE-22079) Post order walker for iterating over expression tree

2019-08-02 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-22079:
--

 Summary: Post order walker for iterating over expression tree
 Key: HIVE-22079
 URL: https://issues.apache.org/jira/browse/HIVE-22079
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently {{DefaultGraphWalker}} is used to iterate over an expression tree. 
This walker uses a hash map to keep track of visited/processed nodes. If an 
expression tree is large, this adds significant overhead due to map lookups.
For an expression tree we can instead use post-order traversal and avoid using 
a map.
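
As an illustration of the idea, the sketch below walks a tree in post order 
using two stacks and no visited map. ExprNode and PostOrderWalker are 
hypothetical names and do not correspond to Hive's node or walker classes. It 
relies on the input being a tree (no shared sub-nodes), which is why a visited 
lookup is unnecessary.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical expression node; Hive's real node types are not modeled here.
final class ExprNode {
    final String name;
    final List<ExprNode> children;

    ExprNode(String name, ExprNode... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }
}

final class PostOrderWalker {
    // Returns node names so that every node appears after all of its children.
    static List<String> walk(ExprNode root) {
        Deque<ExprNode> pending = new ArrayDeque<>();
        Deque<ExprNode> reversed = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            ExprNode node = pending.pop();
            reversed.push(node); // a parent is pushed before its descendants
            for (ExprNode child : node.children) {
                pending.push(child);
            }
        }
        // Popping reverses the order, so children come out before parents.
        List<String> order = new ArrayList<>();
        while (!reversed.isEmpty()) {
            order.add(reversed.pop().name);
        }
        return order;
    }
}
```

For the expression AND(=(a, 1), >(b, 2)) this yields the leaves of each 
comparison first, then the comparison, and AND last.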





[jira] [Created] (HIVE-22074) Slow compilation due to IN to OR transformation

2019-08-01 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-22074:
--

 Summary: Slow compilation due to IN to OR transformation
 Key: HIVE-22074
 URL: https://issues.apache.org/jira/browse/HIVE-22074
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently Hive transforms IN expressions to OR to apply various CBO rules. 
This incurs a significant performance hit if the IN consists of a large number 
of expressions. 
It is better not to transform IN expressions to OR in such cases because the 
overall benefit of the various optimizations/transformations is not realized 
due to the compilation overhead.
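
One way to picture the proposal is a size cutoff on the rewrite, sketched 
below on plain strings. InToOrRewriter and MAX_IN_OPERANDS are hypothetical; 
the issue does not name a threshold or say how the check would be implemented.

```java
import java.util.List;
import java.util.stream.Collectors;

final class InToOrRewriter {
    // Hypothetical cutoff chosen for illustration; the issue gives no value.
    static final int MAX_IN_OPERANDS = 3;

    // Expands "col IN (v1, v2, ...)" into ORs only when the list is small.
    static String rewrite(String column, List<String> values) {
        if (values.size() > MAX_IN_OPERANDS) {
            // Large list: keep the compact IN form and skip the expansion.
            return column + " IN (" + String.join(", ", values) + ")";
        }
        return values.stream()
                .map(value -> column + " = " + value)
                .collect(Collectors.joining(" OR "));
    }
}
```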





[jira] [Created] (HIVE-22045) HIVE-21711 introduced regression in data load

2019-07-24 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-22045:
--

 Summary: HIVE-21711 introduced regression in data load
 Key: HIVE-22045
 URL: https://issues.apache.org/jira/browse/HIVE-22045
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


A better fix for HIVE-21711 is to specialize the handling of CTAS/Create MV 
statements to avoid the intermediate rename operation, but keep the 
intermediate rename for INSERT etc. statements, since otherwise the final file 
move operation is significantly slow for such statements.





Re: Review Request 71091: ACID: getAcidState() should cache a recursive dir listing locally

2019-07-23 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71091/#review216796
---




itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Line 348 (original), 348 (patched)
<https://reviews.apache.org/r/71091/#comment304062>

unnecessary white space



ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 195 (patched)
<https://reviews.apache.org/r/71091/#comment304017>

Can you add a comment to explain why METADATA_FILE and ACID_FORMAT are not 
filtered out?



ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 211 (patched)
<https://reviews.apache.org/r/71091/#comment304018>

A comment here explaining why directories containing TMP_PREFIX are not 
filtered out would be nice.



ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
Lines 29 (patched)
<https://reviews.apache.org/r/71091/#comment304061>

Unused import statements


- Vineet Garg


On July 22, 2019, 5:35 p.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71091/
> ---
> 
> (Updated July 22, 2019, 5:35 p.m.)
> 
> 
> Review request for hive, Gopal V and Vineet Garg.
> 
> 
> Bugs: HIVE-21225
> https://issues.apache.org/jira/browse/HIVE-21225
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-21225
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  4dc04f46fd 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java
>  78cae7263b 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  d59cfe51e9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 295fe7cbd0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java 9d5ba3d310 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cff7e04b9a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 707e38c321 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> b1ede0556f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  15f1f945ce 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9d631ed43d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 57eb506996 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 67a5e6de46 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 6168fc0f79 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d4abf4277b 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java ea31557741 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
> b5958fa9cc 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java 
> 8451462023 
>   streaming/src/test/org/apache/hive/streaming/TestStreaming.java c6d7e7f27c 
> 
> 
> Diff: https://reviews.apache.org/r/71091/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[jira] [Created] (HIVE-21991) Upgrade ORC version to 1.5.6

2019-07-13 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21991:
--

 Summary: Upgrade ORC version to 1.5.6
 Key: HIVE-21991
 URL: https://issues.apache.org/jira/browse/HIVE-21991
 Project: Hive
  Issue Type: Task
  Components: ORC
Reporter: Vineet Garg
Assignee: Vineet Garg








[jira] [Created] (HIVE-21921) Support for correlated quantified predicates

2019-06-24 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21921:
--

 Summary: Support for correlated quantified predicates
 Key: HIVE-21921
 URL: https://issues.apache.org/jira/browse/HIVE-21921
 Project: Hive
  Issue Type: New Feature
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg
 Attachments: HIVE-21921.1.patch





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21898) Wrong result with IN correlated subquery with aggregate in SELECT

2019-06-19 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21898:
--

 Summary: Wrong result with IN correlated subquery with aggregate 
in SELECT
 Key: HIVE-21898
 URL: https://issues.apache.org/jira/browse/HIVE-21898
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


{code:sql}
select 
p_size in
(select min(p_size)
 from (select p_mfgr, p_size from part) a
 where a.p_mfgr = b.p_name
) from part b limit 1
{code}

Expected result: null
Actual result: false





[jira] [Created] (HIVE-21863) Fix HIVE-21742 for non-cbo path

2019-06-11 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21863:
--

 Summary: Fix HIVE-21742 for non-cbo path
 Key: HIVE-21863
 URL: https://issues.apache.org/jira/browse/HIVE-21863
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Affects Versions: 4.0.0
Reporter: Vineet Garg


The fix done in HIVE-21742 applies to the CBO path only (i.e. when 
hive.cbo.enable = true). This jira is to fix the issue in the non-CBO path as 
well.





[jira] [Created] (HIVE-21862) ORC ppd produces wrong result with timestamp

2019-06-11 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21862:
--

 Summary: ORC ppd produces wrong result with timestamp
 Key: HIVE-21862
 URL: https://issues.apache.org/jira/browse/HIVE-21862
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


Note that ORC-491 and ORC-477 are required to reproduce this.

{code:sql}
set hive.vectorized.execution.enabled=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.tez.bucket.pruning=true;
set hive.optimize.index.filter=true;
set hive.metastore.disallow.incompatible.col.type.changes=false;

create table change_allowincompatible_vectorization_false_date (ts date) 
partitioned by (s string) clustered by (ts) into 32 buckets stored as orc 
tblproperties ('transactional'='true');

alter table change_allowincompatible_vectorization_false_date add 
partition(s='aaa');

alter table change_allowincompatible_vectorization_false_date add 
partition(s='bbb');

insert into table change_allowincompatible_vectorization_false_date partition 
(s='aaa') select ctimestamp1 from alltypesorc where ctimestamp1 > '2000-01-01' 
limit 50;

insert into table change_allowincompatible_vectorization_false_date partition 
(s='bbb') select ctimestamp1 from alltypesorc where ctimestamp1 < '2000-01-01' 
limit 50;

select count(*) from change_allowincompatible_vectorization_false_date;

alter table change_allowincompatible_vectorization_false_date change column ts 
ts timestamp;

select count(*) from change_allowincompatible_vectorization_false_date;

insert into table change_allowincompatible_vectorization_false_date partition 
(s='aaa') values ('2038-03-22 07:26:48.0');

select ts from change_allowincompatible_vectorization_false_date where 
ts='2038-03-22 07:26:48.0' and s='aaa';
{code}

Expected: one row. Actual: zero rows.
 





[jira] [Created] (HIVE-21767) Support SMB Join with shared work optimization

2019-05-21 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21767:
--

 Summary: Support SMB Join with shared work optimization
 Key: HIVE-21767
 URL: https://issues.apache.org/jira/browse/HIVE-21767
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg


SMB join contains a Dummy operator, for which shared work optimization is 
disabled (HIVE-21760). It would be good to support shared work optimization 
for such plans.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21760) Sharedwork optimization should be bypassed for SMB joins

2019-05-20 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21760:
--

 Summary: Sharedwork optimization should be bypassed for SMB joins
 Key: HIVE-21760
 URL: https://issues.apache.org/jira/browse/HIVE-21760
 Project: Hive
  Issue Type: Bug
Reporter: Vineet Garg
Assignee: Vineet Garg


SMB join introduces a DUMMY OPERATOR; if the shared work optimizer merges a 
plan containing a dummy operator, task generation fails.
I am not sure what the root cause of the failure in task generation is, but 
presumably it makes some assumption regarding plans containing a dummy 
operator.





[jira] [Created] (HIVE-21711) Regression caused by HIVE-21279 for blobstorage fs

2019-05-08 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21711:
--

 Summary: Regression caused by HIVE-21279 for blobstorage fs
 Key: HIVE-21711
 URL: https://issues.apache.org/jira/browse/HIVE-21711
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


HIVE-21279 caused a regression wherein CTAS/create materialized view 
statements for blobstorage now always rename files.





[jira] [Created] (HIVE-21708) Filter not pushed within join

2019-05-08 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21708:
--

 Summary: Filter not pushed within join
 Key: HIVE-21708
 URL: https://issues.apache.org/jira/browse/HIVE-21708
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


{code:sql}
explain cbo select count(*) from part where p_partkey <> ALL (select p_partkey 
from part)
{code}

{noformat}
HiveAggregate(group=[{}], agg#0=[count()])
  HiveFilter(condition=[AND(OR(IS NULL($4), =($1, 0)), OR(IS NOT NULL($0), =($1, 0), IS NOT NULL($4)), OR(>=($2, $1), =($1, 0), IS NOT NULL($4), IS NULL($0)))])
    HiveProject(p_partkey=[$0], c=[$3], ck=[$4], p_partkey0=[$1], i826=[$2])
      HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[{27.0 rows, 0.0 cpu, 0.0 io}])
        HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[{51.0 rows, 0.0 cpu, 0.0 io}])
          HiveProject(p_partkey=[$0])
            HiveTableScan(table=[[qtest, part]], table:alias=[part])
          HiveProject(p_partkey=[$0], i826=[true])
            HiveAggregate(group=[{0}])
              HiveFilter(condition=[IS NOT NULL($0)])
                HiveTableScan(table=[[qtest, part]], table:alias=[part])
        HiveProject(c=[$0], ck=[$1])
          HiveAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
            HiveTableScan(table=[[qtest, part]], table:alias=[part])
{noformat}





[jira] [Created] (HIVE-21691) Support <>ANY and =ALL subqueries

2019-05-04 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21691:
--

 Summary: Support <>ANY and =ALL subqueries
 Key: HIVE-21691
 URL: https://issues.apache.org/jira/browse/HIVE-21691
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg


HIVE-13582 is adding quantified predicate support, but {{<>ANY}} and {{=ALL}} 
are not supported since it is not clear what a semantically equivalent 
transformation for them would be.





[jira] [Created] (HIVE-21690) Support outer joins with HiveAggregateJoinTransposeRule and turn it on by default

2019-05-04 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21690:
--

 Summary: Support outer joins with HiveAggregateJoinTransposeRule 
and turn it on by default
 Key: HIVE-21690
 URL: https://issues.apache.org/jira/browse/HIVE-21690
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


1) This optimization is off by default. We would like to turn on this 
optimization, wherein group by is pushed down to the join. In some cases the 
top aggregate is removed, but in most cases this optimization adds extra 
aggregate nodes. To measure whether those extra aggregates are beneficial 
(they might add overhead without reducing rows), cost is computed and compared 
between the previous plan and the new plan.

Since Hive's cost model only considers the JOIN's cost and discards the cost 
of the rest of the nodes, this comparison always favors the new plan (since 
adding an aggregate beneath the join reduces the total number of rows 
processed by the join and therefore reduces the join cost). Therefore turning 
on this optimization with the existing cost model is not a good idea.

One approach to fix this is to localize the cost computation to the rule 
itself, i.e. compute the non-cumulative cost of the existing aggregate and 
join and compare it with the cost of the new aggregates, join and top 
aggregate. 

A better approach, in my opinion, would be to fix the cost model and take 
aggregate cost into account (along with the join). This could affect other 
queries and cause performance regressions, but those will most likely be 
issues with the planning and should be investigated and fixed.


2) This optimization currently only supports INNER JOIN. It can be extended to 
support OUTER joins.





[jira] [Created] (HIVE-21673) Casting output of mask udf to string produce different result for date type

2019-04-30 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21673:
--

 Summary: Casting output of mask udf to string produce different 
result for date type
 Key: HIVE-21673
 URL: https://issues.apache.org/jira/browse/HIVE-21673
 Project: Hive
  Issue Type: Bug
Reporter: Vineet Garg


*Reproducer*

{code:sql}
create table t1(d date);
insert into t1 values('2019-09-08');
select cast(mask(d) as string) from t1;
{code}

*Expected result*
{noformat}
0001-01-01
{noformat}

*Actual Result*
{noformat}
0001-01-03
{noformat}

 





[jira] [Created] (HIVE-21659) Support FULL OUTER JOIN with AggregateJoinTransposeRule

2019-04-27 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21659:
--

 Summary: Support FULL OUTER JOIN with AggregateJoinTransposeRule
 Key: HIVE-21659
 URL: https://issues.apache.org/jira/browse/HIVE-21659
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
Assignee: Vineet Garg


This is a continuation of CALCITE-3011, which supported LEFT OUTER and RIGHT 
OUTER joins without aggregate functions.

FULL OUTER JOIN was not supported at the time due to CALCITE-3012.





[jira] [Created] (HIVE-21656) Vectorize UDF mask

2019-04-26 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21656:
--

 Summary: Vectorize UDF mask
 Key: HIVE-21656
 URL: https://issues.apache.org/jira/browse/HIVE-21656
 Project: Hive
  Issue Type: Improvement
  Components: Vectorization
Reporter: Vineet Garg
Assignee: Vineet Garg
 Attachments: HIVE-21656.1.patch







[jira] [Created] (HIVE-21599) Remove predicate on partition columns from Table Scan operator

2019-04-10 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21599:
--

 Summary: Remove predicate on partition columns from Table Scan 
operator
 Key: HIVE-21599
 URL: https://issues.apache.org/jira/browse/HIVE-21599
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Filter predicates are pushed to the Table Scan (to be pushed to and used by 
the storage handler/input format). Such predicates could include partition 
columns, which are of no use to storage handlers or input formats. Therefore 
they should be removed from the TS filter expression.
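
The intended pruning can be sketched roughly as follows, assuming the filter 
has already been split into conjuncts and each conjunct references a single 
column. Conjunct and TableScanFilter are hypothetical names used only for this 
illustration; they are not Hive classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical conjunct: the predicate text plus the one column it references.
final class Conjunct {
    final String column;
    final String text;

    Conjunct(String column, String text) {
        this.column = column;
        this.text = text;
    }
}

final class TableScanFilter {
    // Keeps only the conjuncts that do not reference a partition column.
    static List<String> withoutPartitionPredicates(List<Conjunct> conjuncts,
                                                   Set<String> partitionColumns) {
        List<String> kept = new ArrayList<>();
        for (Conjunct conjunct : conjuncts) {
            if (!partitionColumns.contains(conjunct.column)) {
                kept.add(conjunct.text);
            }
        }
        return kept;
    }
}
```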





[jira] [Created] (HIVE-21572) HiveRemoveSqCountCheck rule could be enhanced to capture more patterns

2019-04-03 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21572:
--

 Summary: HiveRemoveSqCountCheck rule could be enhanced to capture 
more patterns 
 Key: HIVE-21572
 URL: https://issues.apache.org/jira/browse/HIVE-21572
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg
 Attachments: HIVE-21572.1.patch







[jira] [Created] (HIVE-21537) Scalar query rewrite could be improved to not generate an extra join if subquery is guaranteed to produce atmost one row

2019-03-28 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21537:
--

 Summary: Scalar query rewrite could be improved to not generate an 
extra join if subquery is guaranteed to produce atmost one row
 Key: HIVE-21537
 URL: https://issues.apache.org/jira/browse/HIVE-21537
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently the Hive planner introduces this branch and later executes a rule to 
remove it if it can. 
The subquery remove rule itself could check whether the subquery will produce 
at most one row (using RelMetadataQuery's getMaxRowCount) and avoid 
introducing this branch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 70326: HIVE-21230: LEFT OUTER JOIN does not generate transitive IS NOT NULL filter on right side (HiveJoinAddNotNullRule bails out for outer joins)

2019-03-27 Thread Vineet Garg
/spark/spark_constprog_dpp.q.out c1842b839d 
  
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out 
89da63134c 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out b8403f4e39 
  
ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out
 e90f9d17ac 
  ql/src/test/results/clientpositive/spark/subquery_multi.q.out 17240f9dc5 
  ql/src/test/results/clientpositive/spark/subquery_notin.q.out 2d93874450 
  ql/src/test/results/clientpositive/spark/subquery_scalar.q.out 4e31c3fd20 
  ql/src/test/results/clientpositive/spark/subquery_select.q.out 3c6f6af020 
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 2d7b16f281 
  ql/src/test/results/clientpositive/spark/tez_joins_explain.q.out 114e810a54 
  ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out 
cb50bd6eef 
  ql/src/test/results/clientpositive/spark/vector_outer_join0.q.out e6964e1331 
  ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out 71bcdef0d3 
  ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out e09b940a23 
  ql/src/test/results/clientpositive/subquery_notin_having.q.out 68a65df3a9 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 
e0563eb550 
  ql/src/test/results/clientpositive/vector_coalesce_3.q.out 39fd5e898a 
  ql/src/test/results/clientpositive/vector_groupby_mapjoin.q.out 613a30701a 
  ql/src/test/results/clientpositive/vector_left_outer_join.q.out 1aa237f65d 
  ql/src/test/results/clientpositive/vector_left_outer_join2.q.out 568fb2a589 
  ql/src/test/results/clientpositive/vector_outer_join0.q.out dc5889c787 
  ql/src/test/results/clientpositive/vector_outer_join1.q.out aaf84bab73 
  ql/src/test/results/clientpositive/vector_outer_join2.q.out a5567d9e99 
  ql/src/test/results/clientpositive/vector_outer_join3.q.out 3df003d2d2 
  ql/src/test/results/clientpositive/vector_outer_join4.q.out 826a838d36 
  ql/src/test/results/clientpositive/vector_outer_join6.q.out bd938f631a 
  ql/src/test/results/clientpositive/vectorized_join46_mr.q.out 52107e91f8 


Diff: https://reviews.apache.org/r/70326/diff/4/

Changes: https://reviews.apache.org/r/70326/diff/3-4/


Testing
---


Thanks,

Vineet Garg



Re: Review Request 70326: HIVE-21230: LEFT OUTER JOIN does not generate transitive IS NOT NULL filter on right side (HiveJoinAddNotNullRule bails out for outer joins)

2019-03-27 Thread Vineet Garg
/spark/spark_constprog_dpp.q.out c1842b839d 
  
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out 
89da63134c 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out b8403f4e39 
  
ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out
 e90f9d17ac 
  ql/src/test/results/clientpositive/spark/subquery_multi.q.out 17240f9dc5 
  ql/src/test/results/clientpositive/spark/subquery_notin.q.out 2d93874450 
  ql/src/test/results/clientpositive/spark/subquery_scalar.q.out 4e31c3fd20 
  ql/src/test/results/clientpositive/spark/subquery_select.q.out 3c6f6af020 
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 2d7b16f281 
  ql/src/test/results/clientpositive/spark/tez_joins_explain.q.out 114e810a54 
  ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out 
cb50bd6eef 
  ql/src/test/results/clientpositive/spark/vector_outer_join0.q.out e6964e1331 
  ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out 71bcdef0d3 
  ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out e09b940a23 
  ql/src/test/results/clientpositive/subquery_notin_having.q.out 68a65df3a9 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 
e0563eb550 
  ql/src/test/results/clientpositive/vector_coalesce_3.q.out 39fd5e898a 
  ql/src/test/results/clientpositive/vector_groupby_mapjoin.q.out 613a30701a 
  ql/src/test/results/clientpositive/vector_left_outer_join.q.out 1aa237f65d 
  ql/src/test/results/clientpositive/vector_left_outer_join2.q.out 568fb2a589 
  ql/src/test/results/clientpositive/vector_outer_join0.q.out dc5889c787 
  ql/src/test/results/clientpositive/vector_outer_join1.q.out aaf84bab73 
  ql/src/test/results/clientpositive/vector_outer_join2.q.out a5567d9e99 
  ql/src/test/results/clientpositive/vector_outer_join3.q.out 3df003d2d2 
  ql/src/test/results/clientpositive/vector_outer_join4.q.out 826a838d36 
  ql/src/test/results/clientpositive/vector_outer_join6.q.out bd938f631a 
  ql/src/test/results/clientpositive/vectorized_join46_mr.q.out 52107e91f8 


Diff: https://reviews.apache.org/r/70326/diff/3/

Changes: https://reviews.apache.org/r/70326/diff/2-3/


Testing
---


Thanks,

Vineet Garg



Re: Review Request 70326: HIVE-21230: LEFT OUTER JOIN does not generate transitive IS NOT NULL filter on right side (HiveJoinAddNotNullRule bails out for outer joins)

2019-03-27 Thread Vineet Garg
/spark/spark_constprog_dpp.q.out c1842b839d 
  
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out 
89da63134c 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out b8403f4e39 
  
ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out
 e90f9d17ac 
  ql/src/test/results/clientpositive/spark/subquery_multi.q.out 17240f9dc5 
  ql/src/test/results/clientpositive/spark/subquery_notin.q.out 2d93874450 
  ql/src/test/results/clientpositive/spark/subquery_scalar.q.out 4e31c3fd20 
  ql/src/test/results/clientpositive/spark/subquery_select.q.out 3c6f6af020 
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 2d7b16f281 
  ql/src/test/results/clientpositive/spark/tez_joins_explain.q.out 114e810a54 
  ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out 
cb50bd6eef 
  ql/src/test/results/clientpositive/spark/vector_outer_join0.q.out e6964e1331 
  ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out 71bcdef0d3 
  ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out e09b940a23 
  ql/src/test/results/clientpositive/subquery_notin_having.q.out 68a65df3a9 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 
e0563eb550 
  ql/src/test/results/clientpositive/vector_coalesce_3.q.out 39fd5e898a 
  ql/src/test/results/clientpositive/vector_groupby_mapjoin.q.out 613a30701a 
  ql/src/test/results/clientpositive/vector_left_outer_join.q.out 1aa237f65d 
  ql/src/test/results/clientpositive/vector_left_outer_join2.q.out 568fb2a589 
  ql/src/test/results/clientpositive/vector_outer_join0.q.out dc5889c787 
  ql/src/test/results/clientpositive/vector_outer_join1.q.out aaf84bab73 
  ql/src/test/results/clientpositive/vector_outer_join2.q.out a5567d9e99 
  ql/src/test/results/clientpositive/vector_outer_join3.q.out 3df003d2d2 
  ql/src/test/results/clientpositive/vector_outer_join4.q.out 826a838d36 
  ql/src/test/results/clientpositive/vector_outer_join6.q.out bd938f631a 
  ql/src/test/results/clientpositive/vectorized_join46_mr.q.out 52107e91f8 


Diff: https://reviews.apache.org/r/70326/diff/2/

Changes: https://reviews.apache.org/r/70326/diff/1-2/


Testing
---


Thanks,

Vineet Garg



Re: Review Request 70326: HIVE-21230: LEFT OUTER JOIN does not generate transitive IS NOT NULL filter on right side (HiveJoinAddNotNullRule bails out for outer joins)

2019-03-27 Thread Vineet Garg


> On March 27, 2019, 6:12 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
> > Lines 121 (patched)
> > <https://reviews.apache.org/r/70326/diff/1/?file=2135161#file2135161line122>
> >
> > I think you should initialize these preds as: 
> > RexNode newLeftPredicate = rexBuilder.makeLiteral(true);
> > Then you can remove all the null checks below.

Good point. Let me update the code.


- Vineet


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70326/#review214126
-------


On March 27, 2019, 5:39 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70326/
> ---
> 
> (Updated March 27, 2019, 5:39 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-21230
> https://issues.apache.org/jira/browse/HIVE-21230
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21230: LEFT OUTER JOIN does not generate transitive IS NOT NULL filter 
> on right side (HiveJoinAddNotNullRule bails out for outer joins)
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 3a2807f302 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
>  9711625016 
>   ql/src/test/queries/clientpositive/transitive_not_null.q PRE-CREATION 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out 6c73d6398c 
>   ql/src/test/results/clientpositive/cbo_SortUnionTransposeRule.q.out 
> 1beaa648de 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out 76eaa52855 
>   ql/src/test/results/clientpositive/constant_prop_3.q.out db73902af1 
>   ql/src/test/results/clientpositive/correlationoptimizer8.q.out 69a6670f44 
>   ql/src/test/results/clientpositive/infer_join_preds.q.out 0afdd029be 
>   ql/src/test/results/clientpositive/innerjoin.q.out 709dbdb539 
>   ql/src/test/results/clientpositive/join45.q.out fd639b9d51 
>   ql/src/test/results/clientpositive/join46.q.out 02cb625d0f 
>   ql/src/test/results/clientpositive/join47.q.out e9b6be4f3a 
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual5.q.out 
> a966d8caf3 
>   ql/src/test/results/clientpositive/join_emit_interval.q.out 9f3f01f57e 
>   ql/src/test/results/clientpositive/join_filters_overlap.q.out 6cd17d1ddb 
>   ql/src/test/results/clientpositive/join_merging.q.out 5b9c0630e6 
>   ql/src/test/results/clientpositive/join_star.q.out 9caf12da49 
>   ql/src/test/results/clientpositive/lineage1.q.out 4a2ca453ac 
>   ql/src/test/results/clientpositive/llap/auto_sortmerge_join_14.q.out 
> cf7252f753 
>   ql/src/test/results/clientpositive/llap/auto_sortmerge_join_15.q.out 
> b6e0ebf30a 
>   ql/src/test/results/clientpositive/llap/auto_sortmerge_join_16.q.out 
> 2c6d7cafa7 
>   ql/src/test/results/clientpositive/llap/check_constraint.q.out 7b794ba34b 
>   ql/src/test/results/clientpositive/llap/constprog_dpp.q.out eef365b9b1 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 
> 4ba041d992 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 
> c0909fe1ad 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer4.q.out 
> 66e5bff966 
>   ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out 
> 1679d577e6 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 1ea8fdcbb2 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 
> 63c455b3e4 
>   ql/src/test/results/clientpositive/llap/join32_lessSize.q.out 338f840938 
>   ql/src/test/results/clientpositive/llap/join46.q.out ec584299c5 
>   ql/src/test/results/clientpositive/llap/join_emit_interval.q.out 05424ad04d 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 
> b0e8aeaf08 
>   ql/src/test/results/clientpositive/llap/lineage2.q.out 9543864b2e 
>   ql/src/test/results/clientpositive/llap/lineage3.q.out 11e6904a12 
>   ql/src/test/results/clientpositive/llap/mapjoin3.q.out ac36e4ff44 
>   ql/src/test/results/clientpositive/llap/mapjoin46.q.out d9d239611b 
>   ql/src/test/results/clientpositive/llap/mapjoin_emit_interval.q.out 
> 8c9008a3d8 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out f1153e76dd 
>   ql/src/test/results/clientpositive/llap/sharedwork.q.out 9bd73f98b6 
>   ql/src/test/results/clientpositive/llap/skewjoinopt15.q.out 35f7051ebe 
>   ql/src/

Re: How to disable notification

2019-03-27 Thread Vineet Garg
Hi,

You must be subscribed to iss...@hive.apache.org. Try unsubscribing from that
mailing list.

Vineet

> On Mar 18, 2019, at 11:27 PM, Sandeep Katta wrote:
> 
> From the Hive dev mailing list I am getting mails for the following actions:
> 
> 1. Newly created jiras
> 2. Any comments on any jira in github
> 
> How do I disable the notifications for these actions? I know this question
> sounds very basic but I could not find any way to solve this.



Review Request 70326: HIVE-21230: LEFT OUTER JOIN does not generate transitive IS NOT NULL filter on right side (HiveJoinAddNotNullRule bails out for outer joins)

2019-03-27 Thread Vineet Garg
/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out 
89da63134c 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out b8403f4e39 
  
ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out
 e90f9d17ac 
  ql/src/test/results/clientpositive/spark/subquery_multi.q.out 17240f9dc5 
  ql/src/test/results/clientpositive/spark/subquery_notin.q.out 2d93874450 
  ql/src/test/results/clientpositive/spark/subquery_scalar.q.out 4e31c3fd20 
  ql/src/test/results/clientpositive/spark/subquery_select.q.out 3c6f6af020 
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 2d7b16f281 
  ql/src/test/results/clientpositive/spark/tez_joins_explain.q.out 114e810a54 
  ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out 
cb50bd6eef 
  ql/src/test/results/clientpositive/spark/vector_outer_join0.q.out e6964e1331 
  ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out 71bcdef0d3 
  ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out e09b940a23 
  ql/src/test/results/clientpositive/subquery_notin_having.q.out 68a65df3a9 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 
e0563eb550 
  ql/src/test/results/clientpositive/vector_coalesce_3.q.out 39fd5e898a 
  ql/src/test/results/clientpositive/vector_groupby_mapjoin.q.out 613a30701a 
  ql/src/test/results/clientpositive/vector_left_outer_join.q.out 1aa237f65d 
  ql/src/test/results/clientpositive/vector_left_outer_join2.q.out 568fb2a589 
  ql/src/test/results/clientpositive/vector_outer_join0.q.out dc5889c787 
  ql/src/test/results/clientpositive/vector_outer_join1.q.out aaf84bab73 
  ql/src/test/results/clientpositive/vector_outer_join2.q.out a5567d9e99 
  ql/src/test/results/clientpositive/vector_outer_join3.q.out 3df003d2d2 
  ql/src/test/results/clientpositive/vector_outer_join4.q.out 826a838d36 
  ql/src/test/results/clientpositive/vector_outer_join6.q.out bd938f631a 
  ql/src/test/results/clientpositive/vectorized_join46_mr.q.out 52107e91f8 


Diff: https://reviews.apache.org/r/70326/diff/1/


Testing
---


Thanks,

Vineet Garg



Re: Review Request 70190: HIVE-21316 Varchar cmp

2019-03-25 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70190/#review214003
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java
Line 816 (original), 836 (patched)
<https://reviews.apache.org/r/70190/#comment300169>

Adding an enum interpretation for CHAR and going through the same path of 
makeHiveUnicodeString for CHAR will make the code more readable and clear.

ExprNodeDesc visitLiteral(RexLiteral literal) will need to be updated 
accordingly.



ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
Lines 1442 (patched)
<https://reviews.apache.org/r/70190/#comment300170>

Add a comment explaining why



ql/src/test/results/clientpositive/in_typecheck_varchar.q.out
Line 125 (original)
<https://reviews.apache.org/r/70190/#comment300171>

Strange that this is not being vectorized anymore.


- Vineet Garg


On March 12, 2019, 11:55 a.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70190/
> ---
> 
> (Updated March 12, 2019, 11:55 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-21316
> https://issues.apache.org/jira/browse/HIVE-21316
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> preserve varchar type during/after constant folding
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 
> a237745487785bc259ee10ee0989f215ee854572 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ExprNodeConverter.java
>  6dd00189d60e5c01495a5ebd8b64ac339ea59525 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java
>  d15c710c5e12ec7c6af0256afab3ba1dd4d6a92e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> a2dd554b6ea00a3769d2d1593e36835433a96d57 
>   ql/src/test/queries/clientpositive/fold_varchar.q PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_varchar.q.out 
> f51ff29dbac90b8d35ab8cb4007bf17efbf34543 
>   ql/src/test/results/clientpositive/llap/fold_varchar.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/materialized_view_rewrite_6.q.out 
> ee5dfd1e1cfc1b2204f7260626f6c43f1ab2ba88 
>   
> ql/src/test/results/clientpositive/llap/materialized_view_rewrite_no_join_opt.q.out
>  b022ee8ff57c34cc4244bb790e6971a8127fd1b8 
>   ql/src/test/results/clientpositive/llap/vector_case_when_1.q.out 
> 6529758a3a9977154276ba6d8b2dafce922c4d64 
>   ql/src/test/results/clientpositive/vector_case_when_1.q.out 
> 61062e1f8e08306061034c1923d0178a428c6475 
> 
> 
> Diff: https://reviews.apache.org/r/70190/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>



[jira] [Created] (HIVE-21481) MERGE correctness issues with null safe equality

2019-03-19 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21481:
--

 Summary: MERGE correctness issues with null safe equality
 Key: HIVE-21481
 URL: https://issues.apache.org/jira/browse/HIVE-21481
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg


The way Hive currently generates the plan for a MERGE statement can lead to wrong 
results with null-safe equality.

To illustrate, consider the following reproducer:
{code:sql}
create table ttarget(s string, j int, flag string) stored as orc 
tblproperties("transactional"="true");
truncate table ttarget;
insert into ttarget values('not_null', 1, 'dont udpate'), (null,2, 'update');

create table tsource (i int);
insert into tsource values(null),(2);
{code}

Let's say you have the following MERGE statement
{code:sql}
explain merge into ttarget using tsource on i<=>j
 when matched THEN
UPDATE set flag='updated'
 when not matched THEN
INSERT VALUES('new', 1999, 'true');
{code}

With this MERGE, {{*ONLY ONE*}} row in the target should match and be updated. 
But currently, due to the plan Hive generates, both rows end up matching.

This is because the MERGE statement is rewritten into a RIGHT OUTER JOIN followed 
by a FILTER for each branch.

The relevant part of the plan generated by Hive for this statement consists of:
{noformat}
Map 2
Map Operator Tree:
TableScan
  alias: tsource
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: NONE
  Map Join Operator
condition map:
 Right Outer Join 0 to 1
keys:
  0 j (type: int)
  1 i (type: int)
nullSafes: [true]
outputColumnNames: _col0, _col1, _col5, _col6
input vertices:
  0 Map 1
Statistics: Num rows: 1 Data size: 206 Basic stats: 
COMPLETE Column stats: NONE
HybridGraceHashJoin: true
Filter Operator
  predicate: (_col6 IS NOT DISTINCT FROM _col1) (type: 
boolean)
  Statistics: Num rows: 1 Data size: 206 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: _col5 (type: 
struct), _col0 (type: string), _col1 
(type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 206 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: 
struct)
  sort order: +
  Map-reduce partition columns: UDFToInteger(_col0) 
(type: int)
  Statistics: Num rows: 1 Data size: 206 Basic stats: 
COMPLETE Column stats: NONE
  value expressions: _col1 (type: string), _col2 (type: 
int)
{noformat}

The result after the JOIN will be:
{code:sql}
select s,j,i from ttarget right outer join tsource on i<=>j ;
NULL    NULL    NULL
NULL    NULL    2
{code}

On this result set, the predicate {{(_col6 IS NOT DISTINCT FROM _col1)}} is true 
for both rows, so both are treated as matching.
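
The effect can be modeled outside Hive. The following is a minimal pure-Python sketch, not Hive's implementation: {{None}} stands in for NULL, the {{matched}} flag is a hypothetical marker added for illustration, and mapping {{_col6}} to {{i}} and {{_col1}} to {{j}} is an assumption based on the plan above. It applies standard right-outer-join semantics to the reproducer's rows and then the null-safe filter.

```python
# Rows from the reproducer; None stands in for SQL NULL.
ttarget = [("not_null", 1), (None, 2)]   # (s, j)
tsource = [None, 2]                      # i


def null_safe_eq(a, b):
    """Model of <=> / IS NOT DISTINCT FROM: NULL equals NULL, never unknown."""
    return a == b  # None == None is True in Python, like null-safe equality


def right_outer_join(target, source):
    """ttarget RIGHT OUTER JOIN tsource ON i <=> j -> (s, j, i, matched)."""
    out = []
    for i in source:
        hits = [(s, j) for (s, j) in target if null_safe_eq(i, j)]
        if hits:
            out.extend((s, j, i, True) for (s, j) in hits)
        else:
            # No target row joined: the target columns are padded with NULL.
            out.append((None, None, i, False))
    return out


joined = right_outer_join(ttarget, tsource)

# Filter from the plan, read here as: i IS NOT DISTINCT FROM j.
passed = [row for row in joined if null_safe_eq(row[2], row[1])]

# Rows that really joined with a target row.
truly_matched = [row for row in joined if row[3]]

print(len(passed), len(truly_matched))
```

In this model the null-padded row also passes the filter, because NULL is not distinct from NULL, so the filter alone cannot separate it from the row that really joined.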




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21395) Refactor HiveSemiJoinRule

2019-03-05 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21395:
--

 Summary: Refactor HiveSemiJoinRule
 Key: HIVE-21395
 URL: https://issues.apache.org/jira/browse/HIVE-21395
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


The following refactoring needs to be done:
* Update the rule matching pattern to avoid using HepVertex.
* HIVE-21338 adds logic to determine whether a rel plan will produce at most one 
row. Use this in HiveSemiJoinRule.





[jira] [Created] (HIVE-21387) Wrong result for UNION query with GROUP BY consisting of PK columns

2019-03-04 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21387:
--

 Summary: Wrong result for UNION query with GROUP BY consisting of 
PK columns
 Key: HIVE-21387
 URL: https://issues.apache.org/jira/browse/HIVE-21387
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg


*Reproducer*
{code:sql}
create table t1(i int primary key disable rely, j int);
insert into t1 values(1,100),(2,200);
create table t2(i int primary key disable rely, j int);
insert into t2 values(2,1000),(4,500);

select i from (select i, j from t1 union all select i,j from t2) subq group by 
i,j;
{code}

*Expected Result*
{noformat}
2
2
4
1
{noformat}

*Actual Result*
{noformat}
1
2
4
{noformat}

*CBO Plan*
{code:sql}
HiveAggregate(group=[{0}])
  HiveProject(i=[$0], j=[$1])
HiveUnion(all=[true])
  HiveProject(i=[$0], j=[$1])
HiveTableScan(table=[[default, t1]], table:alias=[t1])
  HiveProject(i=[$0], j=[$1])
HiveTableScan(table=[[default, t2]], table:alias=[t2])
{code}

This is due to the group-by reduction logic reducing keys when it shouldn't. 
Because of the UNION, the relative cardinality of the group-by keys changes (they 
are no longer PK/UNIQUE). Therefore we shouldn't try to reduce the group-by 
keys at all.
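
The expected row count can be cross-checked on another engine. The sketch below runs the reproducer on an in-memory SQLite database via Python's {{sqlite3}} module; SQLite is only a stand-in here, and the {{primary key disable rely}} clauses are dropped because they are Hive-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1(i int, j int);
    insert into t1 values (1, 100), (2, 200);
    create table t2(i int, j int);
    insert into t2 values (2, 1000), (4, 500);
""")

union_sql = "(select i, j from t1 union all select i, j from t2) subq"

# Query from the report: both keys are needed once the inputs are unioned.
full_keys = sorted(r[0] for r in conn.execute(
    f"select i from {union_sql} group by i, j"))

# What the result looks like if j is dropped from the group-by keys.
reduced_keys = sorted(r[0] for r in conn.execute(
    f"select i from {union_sql} group by i"))

print(full_keys, reduced_keys)
```

Grouping by {{i, j}} keeps both rows with {{i = 2}}, while grouping by {{i}} alone collapses them, which is the difference between the expected and actual results above.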





[jira] [Created] (HIVE-21386) Extend the fetch task enhancement done in HIVE-21279 to make it work with query result cache

2019-03-04 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21386:
--

 Summary: Extend the fetch task enhancement done in HIVE-21279 to 
make it work with query result cache
 Key: HIVE-21386
 URL: https://issues.apache.org/jira/browse/HIVE-21386
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
Assignee: Vineet Garg


The improvement done in HIVE-21279 is disabled for query cache. 





[jira] [Created] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-04 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21382:
--

 Summary: Group by keys reduction optimization - keys are not 
reduced in query23
 Key: HIVE-21382
 URL: https://issues.apache.org/jira/browse/HIVE-21382
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
Assignee: Vineet Garg


{code:sql}
explain cbo with frequent_ss_items as 
 (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
solddate,count(*) cnt
  from store_sales
  ,date_dim 
  ,item
  where ss_sold_date_sk = d_date_sk
and ss_item_sk = i_item_sk 
and d_year in (1999,1999+1,1999+2,1999+3)
  group by substr(i_item_desc,1,30),i_item_sk,d_date
  having count(*) >4)
select  sum(sales)
 from ((select cs_quantity*cs_list_price sales
   from catalog_sales
   ,date_dim 
   where d_year = 1999 
 and d_moy = 1 
 and cs_sold_date_sk = d_date_sk 
 and cs_item_sk in (select item_sk from frequent_ss_items))) subq limit 
100;
{code}

{code:sql}
HiveSortLimit(fetch=[100])
  HiveProject($f0=[$0])
HiveAggregate(group=[{}], agg#0=[sum($0)])
  HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
  HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
cs_quantity=[$18], cs_list_price=[$20])
  HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[perf_constraints, catalog_sales]], 
table:alias=[catalog_sales])
HiveProject(d_date_sk=[$0])
  HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
HiveTableScan(table=[[perf_constraints, date_dim]], 
table:alias=[date_dim])
  HiveProject(i_item_sk=[$1])
HiveFilter(condition=[>($3, 4)])
  HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
  HiveJoin(condition=[=($1, $4)], joinType=[inner], 
algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
HiveJoin(condition=[=($0, $2)], joinType=[inner], 
algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
  HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
HiveFilter(condition=[IS NOT NULL($0)])
  HiveTableScan(table=[[perf_constraints, 
store_sales]], table:alias=[store_sales])
  HiveProject(d_date_sk=[$0], d_date=[$2])
HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
  HiveTableScan(table=[[perf_constraints, date_dim]], 
table:alias=[date_dim])
HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
  HiveTableScan(table=[[perf_constraints, item]], 
table:alias=[item])
{code}

The right side of the HiveSemiJoin has an aggregate which could be reduced to have 
only {{i_item_sk}} as the group by key, since {{i_item_sk}} is the primary key.
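
As a rough illustration of the general idea only: when a group-by key is a primary key, other keys derived from the same table's columns add nothing to the grouping. The sketch below uses an in-memory SQLite database with made-up rows and a cut-down schema (both are assumptions, not the TPC-DS data), and it covers only the {{substr(i_item_desc,1,30)}} key; it does not model {{d_date}}, which comes from a different table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item(i_item_sk int primary key, i_item_desc text);
    insert into item values (1, 'red widget'), (2, 'blue widget');
    create table store_sales(ss_item_sk int);
    insert into store_sales values (1), (1), (2), (2), (2);
""")

join_sql = "store_sales join item on ss_item_sk = i_item_sk"

# Keys as written: the derived description plus the primary key.
wide = sorted(conn.execute(
    f"select i_item_sk, count(*) from {join_sql} "
    "group by substr(i_item_desc, 1, 30), i_item_sk"))

# Reduced keys: the primary key alone.
narrow = sorted(conn.execute(
    f"select i_item_sk, count(*) from {join_sql} group by i_item_sk"))

print(wide == narrow)
```

Both groupings produce the same groups and counts here because each {{i_item_sk}} value determines exactly one description.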






[jira] [Created] (HIVE-21381) Improve column pruning

2019-03-04 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21381:
--

 Summary: Improve column pruning
 Key: HIVE-21381
 URL: https://issues.apache.org/jira/browse/HIVE-21381
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
Assignee: Vineet Garg


The following query generates a plan where the right side of HiveSemiJoin contains 
HiveProject->HiveFilter->HiveProject, and the bottom HiveProject contains extra 
columns which can be pruned.

{code:sql}
explain cbo with frequent_ss_items as 
 (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
solddate,count(*) cnt
  from store_sales
  ,date_dim 
  ,item
  where ss_sold_date_sk = d_date_sk
and ss_item_sk = i_item_sk 
and d_year in (1999,1999+1,1999+2,1999+3)
  group by substr(i_item_desc,1,30),i_item_sk,d_date
  having count(*) >4)
select  sum(sales)
 from ((select cs_quantity*cs_list_price sales
   from catalog_sales
   ,date_dim 
   where d_year = 1999 
 and d_moy = 1 
 and cs_sold_date_sk = d_date_sk 
 and cs_item_sk in (select item_sk from frequent_ss_items))) subq limit 
100;
{code}

CBO Plan:
{code:sql}
HiveSortLimit(fetch=[100])
  HiveProject($f0=[$0])
HiveAggregate(group=[{}], agg#0=[sum($0)])
  HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
  HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
cs_quantity=[$18], cs_list_price=[$20])
  HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[perf_constraints, catalog_sales]], 
table:alias=[catalog_sales])
HiveProject(d_date_sk=[$0])
  HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
HiveTableScan(table=[[perf_constraints, date_dim]], 
table:alias=[date_dim])
  HiveProject(i_item_sk=[$1])
HiveFilter(condition=[>($3, 4)])
  HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
  HiveJoin(condition=[=($1, $4)], joinType=[inner], 
algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
HiveJoin(condition=[=($0, $2)], joinType=[inner], 
algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
  HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
HiveFilter(condition=[IS NOT NULL($0)])
  HiveTableScan(table=[[perf_constraints, 
store_sales]], table:alias=[store_sales])
  HiveProject(d_date_sk=[$0], d_date=[$2])
HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
  HiveTableScan(table=[[perf_constraints, date_dim]], 
table:alias=[date_dim])
HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
  HiveTableScan(table=[[perf_constraints, item]], 
table:alias=[item])
{code}

Only {{i_item_sk}} and {{$f3/count}} are used further up in the plan, therefore 
the columns {{substr}} and {{d_date}} can be removed.

Note that the above plan was generated with the HIVE-21340 patch applied.





Re: [DISCUSS] Move gitbox notification emails to another list?

2019-02-28 Thread Vineet Garg
FYI I have opened an INFRA jira for RB issue:
https://issues.apache.org/jira/browse/INFRA-17926. I personally prefer it
over github pull request for reviews.


On Thu, Feb 28, 2019 at 10:50 AM Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:

> I think it is a good idea, I can ask that the 'opened a new pull request'
> messages continue being sent to @dev, while the rest are sent to gitbox.
>
> -Jesús
>
>
> On 2/28/19, 9:55 AM, "Alan Gates"  wrote:
>
> +1 to sending the github mail to a separate list.
>
> I agree with Peter that seeing PR reviews is good.  Wouldn't it be
> possible
> to craft a filter that only allowed through these mails?
>
> Alan.
>
>
>
> On Thu, Feb 28, 2019 at 12:11 AM Peter Vary  >
> wrote:
>
> > The github mails in this form are just white noise, so I have added
> > filters to my mailbox to drop every github mail to another folder,
> just as
> > proposed by Jesús. So if we can do it by the infra it would be
> better. +1
> > from me.
> >
> > On the other hand, I miss the "review request created" messages of
> the
> > review board. These type of messages are lost in the shower of the
> github
> > letters. If anyone has an idea how to detect these that would be
> awesome.
> >
> > Thanks,
> > Peter
> >
> > > On Feb 28, 2019, at 08:59, Mani M  wrote:
> > >
> > > Good.
> > >
> > > With Regards
> > > M.Mani
> > > +61 432 461 087
> > >
> > > On Thu, 28 Feb 2019, 17:39 Jesus Camacho Rodriguez, <
> > > jcamachorodrig...@hortonworks.com> wrote:
> > >
> > >> We have had a similar discussion in the Calcite project too.
> > >>
> > >> Gitbox emails are being sent to the dev@ list. These emails are
> produced
> > >> every time there is activity in the Hive Github repository and are
> > creating
> > >> quite a lot of noise. The result is that it is really difficult to
> > follow
> > >> any activity in the list.
> > >>
> > >> A possible alternative would be to send them to another list, e.g.
> > >> gitbox@h.a.o. The idea is that having them in a list may still be
> > useful
> > >> as they may serve as a searchable archive of activity in the repo.
> > >>
> > >> What do you think? Should we open an INFRA ticket to request this?
> > >>
> > >> Thanks,
> > >> Jesús
> > >>
> > >>
> >
> >
>
>
>


[jira] [Created] (HIVE-21338) Remove order by and limit for aggregates

2019-02-27 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21338:
--

 Summary: Remove order by and limit for aggregates
 Key: HIVE-21338
 URL: https://issues.apache.org/jira/browse/HIVE-21338
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can be 
removed. This saves an unnecessary vertex for LIMIT/ORDER BY.

{code:sql}
explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
order by cs limit 100
{code}

{code}
STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
  Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
  DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: (ss_ext_sales_price > 100) (type: boolean)
  Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE 
Column stats: NONE
  Filter Operator
predicate: (ss_ext_sales_price > 100) (type: boolean)
Statistics: Num rows: 1 Data size: 112 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  Statistics: Num rows: 1 Data size: 112 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 120 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  sort order:
  Statistics: Num rows: 1 Data size: 120 Basic stats: 
COMPLETE Column stats: NONE
  value expressions: _col0 (type: bigint)
Execution mode: vectorized
Reducer 2
Execution mode: vectorized
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: bigint)
  sort order: +
  Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
Column stats: NONE
  TopN Hash Memory Usage: 0.1
Reducer 3
Execution mode: vectorized
Reduce Operator Tree:
  Select Operator
expressions: KEY.reducesinkkey0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
Column stats: NONE
Limit
  Number of rows: 100
  Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 120 Basic stats: 
COMPLETE Column stats: NONE
table:
input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: 100
  Processor Tree:
ListSink
{code}
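
The equivalence this optimization relies on can be checked on any SQL engine. Below is a small sketch against an in-memory SQLite database (a stand-in for Hive, with made-up rows and a one-column {{store_sales}} table as assumptions): a global aggregate without GROUP BY returns a single row, so sorting and limiting it cannot change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table store_sales(ss_ext_sales_price real);
    insert into store_sales values (50.0), (150.0), (250.0);
""")

base = "select count(*) cs from store_sales where ss_ext_sales_price > 100.00"

# Query as written in the report.
with_sort_limit = conn.execute(base + " order by cs limit 100").fetchall()

# Same query with ORDER BY and LIMIT removed.
without = conn.execute(base).fetchall()

print(with_sort_limit == without)
```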





[jira] [Created] (HIVE-21330) Bucketing id varies b/w data loaded through streaming apis and regular query

2019-02-26 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21330:
--

 Summary: Bucketing id varies b/w data loaded through streaming 
apis and regular query
 Key: HIVE-21330
 URL: https://issues.apache.org/jira/browse/HIVE-21330
 Project: Hive
  Issue Type: Bug
Reporter: Vineet Garg


The test at 
[https://github.com/apache/hive/blob/master/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java#L439]
 covers this case. It currently passes, but for the wrong reason. The test 
checks for an empty result set. The result sets are empty because the prior INSERT 
fails to load data, not because the bucketing scheme is different.

This error with INSERT is fixed in https://github.com/apache/hive/pull/552. 
With that patch the test fails because the underlying bucketing ids generated are 
different.

These tests are run on MR instead of Tez, which could explain the different 
bucketing ids.
I don't really know what the repercussions of having different bucketing ids are, 
or why they are expected to be the same, but since there is a test for this 
logic it is worth investigating.





[jira] [Created] (HIVE-21323) LEFT OUTER JOIN does not generate transitive IS NOT NULL filter on right side

2019-02-25 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21323:
--

 Summary: LEFT OUTER JOIN does not generate transitive IS NOT NULL 
filter on right side
 Key: HIVE-21323
 URL: https://issues.apache.org/jira/browse/HIVE-21323
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
 Fix For: 4.0.0


{code:sql}
select a.id from a  left outer join c on a.id = c.id
{code}

CBO plan:
{code:sql}
HiveProject(id=[$0])
  HiveJoin(condition=[=($0, $1)], joinType=[left], algorithm=[none], cost=[{6.0 
rows, 0.0 cpu, 0.0 io}])
HiveProject(id=[$0])
  HiveTableScan(table=[[hive_21322, a]], table:alias=[a])
HiveProject(id=[$0])
  HiveTableScan(table=[[hive_21322, c]], table:alias=[c])
{code}

Explain Plan:
{code:sql}
  Stage: Stage-1
    Tez
      DagId: vgarg_20190225222008_083d8041-b5dc-4af1-9dac-4ff5305ab864:10
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      DagName: vgarg_20190225222008_083d8041-b5dc-4af1-9dac-4ff5305ab864:10
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 3 Data size: 255 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: id (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 3 Data size: 255 Basic stats: COMPLETE Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Left Outer Join 0 to 1
                      keys:
                        0 _col0 (type: string)
                        1 _col0 (type: string)
                      outputColumnNames: _col0
                      input vertices:
                        1 Map 2
                      Statistics: Num rows: 3 Data size: 255 Basic stats: COMPLETE Column stats: COMPLETE
                      HybridGraceHashJoin: true
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 3 Data size: 255 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: c
                  Statistics: Num rows: 3 Data size: 258 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: id (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 3 Data size: 258 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 3 Data size: 258 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

There is no IS NOT NULL filter on {{c.id}}
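
A minimal sketch, in plain Java rather than Hive code, of why such a filter would be safe: under SQL equality a NULL key on the right side can never match, so dropping those rows before the join leaves the left outer join result unchanged. The class and method names are hypothetical and the tables are modelled as lists of nullable strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the query above; not Hive code.
class LeftOuterJoinSketch {
    // Left outer join on a single nullable key with SQL semantics:
    // a NULL key never compares equal, so it never matches.
    static List<String> leftOuterJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            boolean matched = false;
            for (String r : right) {
                if (l != null && r != null && l.equals(r)) {
                    out.add(l + "|" + r);
                    matched = true;
                }
            }
            if (!matched) {
                // Left row is preserved; the right side is padded with NULL.
                out.add(l + "|null");
            }
        }
        return out;
    }

    // The filter the issue asks for: drop right-side rows whose key is NULL.
    static List<String> filterNotNull(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String r : rows) {
            if (r != null) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("1", "2", null);
        List<String> c = Arrays.asList("2", null, "3");
        // Both calls print the same rows: the filter only removes
        // right-side rows that could never have joined.
        System.out.println(leftOuterJoin(a, c));
        System.out.println(leftOuterJoin(a, filterNotNull(c)));
    }
}
```

The same reasoning does not apply to the left side of a left outer join, where NULL-keyed rows must still appear in the output.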







Re: Review Request 70031: HIVE-21167

2019-02-22 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70031/#review213097
---




ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1850 (patched)
<https://reviews.apache.org/r/70031/#comment298963>

Add a NULL check for the parent. If a plan doesn't have a reduce sink operator 
and you hit a table scan, its parent will be NULL.
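
A rough sketch of the kind of check meant here, using a hypothetical minimal operator node rather than Hive's actual operator classes. It assumes, as described above, that a table scan has a NULL parent list, and it only follows the first parent for brevity.

```java
import java.util.List;

// Hypothetical minimal operator node; Hive's real operator classes are richer.
class OpNode {
    final String name;
    final List<OpNode> parents; // assumed to be null for a root such as a table scan

    OpNode(String name, List<OpNode> parents) {
        this.name = name;
        this.parents = parents;
    }
}

class ParentWalkSketch {
    // Walk up the first-parent chain looking for a reduce sink ("RS").
    // Returns null when the chain ends at a root without finding one.
    static OpNode findReduceSinkAncestor(OpNode start) {
        OpNode current = start;
        while (current != null) {
            if ("RS".equals(current.name)) {
                return current;
            }
            // The NULL check: a table scan has no parents, so stop here
            // instead of dereferencing a null list.
            if (current.parents == null || current.parents.isEmpty()) {
                return null;
            }
            current = current.parents.get(0);
        }
        return null;
    }
}
```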


- Vineet Garg


On Feb. 22, 2019, 7:19 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70031/
> ---
> 
> (Updated Feb. 22, 2019, 7:19 a.m.)
> 
> 
> Review request for hive, Jason Dere and Vaibhav Gumashta.
> 
> 
> Bugs: HIVE-21167
> https://issues.apache.org/jira/browse/HIVE-21167
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Bucketing: Bucketing version 1 is incorrectly partitioning data
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 4b10e8974e 
>   ql/src/test/queries/clientpositive/murmur_hash_migration.q 2b8da9f683 
>   
> ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 
> 5a2cd47381 
>   ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out 
> 5343628252 
> 
> 
> Diff: https://reviews.apache.org/r/70031/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 70031: HIVE-21167

2019-02-21 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70031/#review213034
---




ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 230 (patched)
<https://reviews.apache.org/r/70031/#comment298893>

Can you also add a comment explaining why this should be the last 
transformation?



ql/src/test/queries/clientpositive/murmur_hash_migration.q
Lines 71 (patched)
<https://reviews.apache.org/r/70031/#comment298896>

There doesn't seem to be any way currently to see the bucketing version 
used by the reduce sink op. It would be really useful to print this information 
in explain extended. It would help uncover bugs like this.



ql/src/test/queries/clientpositive/murmur_hash_migration.q
Lines 77 (patched)
<https://reviews.apache.org/r/70031/#comment298894>

Can you also add a test for insert-select with a union? Something like:

insert into table acid_ptn_bucket1  select key, count(value), key from 
(select key, value from src where value > 2 group by key, value union all 
select key, '45' from src s2 where key > 1 group by key) sub group by key;



ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out
Line 1332 (original), 1332 (patched)
<https://reviews.apache.org/r/70031/#comment298897>

Do you know the reason this size changed? This seems strange.


- Vineet Garg


On Feb. 21, 2019, 8:59 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70031/
> ---
> 
> (Updated Feb. 21, 2019, 8:59 a.m.)
> 
> 
> Review request for hive, Jason Dere and Vaibhav Gumashta.
> 
> 
> Bugs: HIVE-21167
> https://issues.apache.org/jira/browse/HIVE-21167
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Bucketing: Bucketing version 1 is incorrectly partitioning data
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 4b10e8974e 
>   ql/src/test/queries/clientpositive/murmur_hash_migration.q 2b8da9f683 
>   
> ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 
> 5a2cd47381 
>   ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out 
> 5343628252 
> 
> 
> Diff: https://reviews.apache.org/r/70031/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



[jira] [Created] (HIVE-21279) Avoid moving/rename operation in FileSink op for SELECT queries

2019-02-15 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21279:
--

 Summary: Avoid moving/rename operation in FileSink op for SELECT 
queries
 Key: HIVE-21279
 URL: https://issues.apache.org/jira/browse/HIVE-21279
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg
 Fix For: 4.0.0
 Attachments: HIVE-21279.1.patch

Currently, at the end of a job, the FileSink operator moves/renames the temp 
directory to another directory from which FetchTask fetches results. This is 
done to avoid fetching potentially partial/invalid files written by 
failed/runaway tasks. This operation is expensive on cloud storage. It could be 
avoided if FetchTask were passed the set of files to read instead of the whole 
directory.





[jira] [Created] (HIVE-21182) Skip setting up hive scratch dir during planning

2019-01-29 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21182:
--

 Summary: Skip setting up hive scratch dir during planning
 Key: HIVE-21182
 URL: https://issues.apache.org/jira/browse/HIVE-21182
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
Assignee: Vineet Garg


During the metadata gathering phase Hive creates a staging/scratch dir which is 
later used by the FS op (the FS op sets up a staging dir within this dir for 
tasks to write to).
Since the FS op does mkdirs to set up the staging dir, we can skip creating the 
scratch dir during the metadata gathering phase. The FS op will take care of 
setting up all the dirs.
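
A small illustration of the point about mkdirs, using `java.io.File` as a stand-in for the file system API the FS op actually uses (an assumption for this sketch; the class and method names are hypothetical). Because mkdirs creates any missing parent directories, the scratch dir does not need to exist before the staging dir is set up.

```java
import java.io.File;

class ScratchDirSketch {
    // Stand-in for the FS op's setup step: creates the staging dir and,
    // as a side effect of mkdirs, any missing parents such as the scratch dir.
    static File setUpStagingDir(File scratchDir) {
        File staging = new File(scratchDir, "staging");
        if (!staging.mkdirs() && !staging.isDirectory()) {
            throw new IllegalStateException("could not create " + staging);
        }
        return staging;
    }

    public static void main(String[] args) {
        // The scratch dir is deliberately not created up front.
        File scratch = new File(System.getProperty("java.io.tmpdir"),
                "scratch-sketch-" + System.nanoTime());
        File staging = setUpStagingDir(scratch);
        System.out.println(scratch.isDirectory() + " " + staging.isDirectory());
    }
}
```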





[jira] [Created] (HIVE-21171) Skip creating scratch dirs for tez if RPC is on

2019-01-25 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21171:
--

 Summary: Skip creating scratch dirs for tez if RPC is on
 Key: HIVE-21171
 URL: https://issues.apache.org/jira/browse/HIVE-21171
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Reporter: Vineet Garg
Assignee: Vineet Garg


There are a few places, e.g. while creating the DAG/vertices, where scratch 
directories are created for each vertex even if the plan is being sent using 
RPC. This adds unnecessary overhead on cloud file systems, e.g. S3A.




