[jira] [Created] (HIVE-20284) In strict mode, if constant propagation is enable, the partition filter is folded before partition pruner lead to error "No partition predicate for Alias"

2018-07-31 Thread Hui Huang (JIRA)
Hui Huang created HIVE-20284:


 Summary: In strict mode, if constant propagation is enable, the 
partition filter is folded before partition pruner lead to error "No partition 
predicate for Alias"  
 Key: HIVE-20284
 URL: https://issues.apache.org/jira/browse/HIVE-20284
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.2.1, 2.3.3
Reporter: Hui Huang
Assignee: Hui Huang
 Fix For: 2.3.3


In strict mode, with hive.optimize.constant.propagation set to true, the 
following SQL fails:

{code:java}
hive> desc employee_part;
OK
col_name            data_type           comment
eid                 int
name                string
dept                string
year                string
month               string

# Partition Information
# col_name          data_type           comment

year                string
month               string
Time taken: 0.564 seconds, Fetched: 11 row(s)
hive> set hive.mapred.mode=strict;
hive> select * from employee_part where false and concat(year,month)='201807';
FAILED: SemanticException Queries against partitioned tables without a 
partition filter are disabled for safety reasons. If you know what you are 
doing, please sethive.strict.checks.large.query to false and that 
hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get 
errors or incorrect results if you make a mistake while using some of the 
unsafe features. No partition predicate for Alias "employee_part" Table 
"employee_part"
{code}

The above error message is confusing, because concat(year,month)='201807' is 
a partition filter.

The reason is that, during logical optimization, the ConstantPropagate 
optimizer runs before the PartitionPruner optimizer. When it finds an 
expression like 'false and concat(year,month)=', it replaces the expression 
with 'false' and the partition filter is dropped, so the PartitionPruner 
cannot get the partition filter.
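
The folding described above can be sketched with a toy expression tree. This 
is only an illustration under stated assumptions: the class FoldingSketch and 
its Expr node are hypothetical and are not Hive classes, and the partition 
check is a plain substring match rather than what PartitionPruner really does.

```java
/** Toy model of constant folding dropping a partition predicate (not Hive code). */
final class FoldingSketch {

    /** Hypothetical expression node: either a textual leaf or an AND of two children. */
    static final class Expr {
        final String text;   // non-null for a leaf, e.g. "false"
        final Expr left;     // non-null together with right for an AND node
        final Expr right;

        Expr(String text) { this.text = text; this.left = null; this.right = null; }
        Expr(Expr left, Expr right) { this.text = null; this.left = left; this.right = right; }

        boolean isAnd() { return left != null; }
        boolean isFalse() { return !isAnd() && text.equals("false"); }
    }

    /** Folds "false AND x" (or "x AND false") into the literal false. */
    static Expr fold(Expr e) {
        if (!e.isAnd()) {
            return e;
        }
        Expr l = fold(e.left);
        Expr r = fold(e.right);
        if (l.isFalse() || r.isFalse()) {
            return new Expr("false");   // the other operand, and any filter in it, is lost
        }
        return new Expr(l, r);
    }

    /** True if any leaf mentions one of the partition columns (naive substring check). */
    static boolean hasPartitionPredicate(Expr e, String... partitionColumns) {
        if (e.isAnd()) {
            return hasPartitionPredicate(e.left, partitionColumns)
                || hasPartitionPredicate(e.right, partitionColumns);
        }
        for (String column : partitionColumns) {
            if (e.text.contains(column)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Expr where = new Expr(new Expr("false"), new Expr("concat(year,month)='201807'"));
        System.out.println(hasPartitionPredicate(where, "year", "month"));       // before folding
        System.out.println(hasPartitionPredicate(fold(where), "year", "month")); // after folding
    }
}
```

In this sketch the partition predicate is visible before folding and gone 
afterwards, which matches the order-of-optimizers problem reported here.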

As a workaround, users can remove the constant expression that always 
evaluates to true or false.

When views are used and some columns are constant values, users will be 
confused.

So we should add more detail to the returned error message.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68139: HIVE-14493

2018-07-31 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68139/#review206721
---




ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
Line 1951 (original), 1965 (patched)


Can we reuse the viewPartition rule for this? The only difference is the ON 
vs. BY keyword. I think it's better to be consistent with views (which use 
ON) than with tables (BY) in this case.



ql/src/test/queries/clientpositive/materialized_view_partitioned.q
Lines 56 (patched)


I see there is no select from src_txn in the tests. I thought rewriting would 
work on partitioned MVs without any changes. Selecting an MV among multiple 
views needs some work.


- Ashutosh Chauhan


On Aug. 1, 2018, 3:07 a.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68139/
> ---
> 
> (Updated Aug. 1, 2018, 3:07 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-14493
> https://issues.apache.org/jira/browse/HIVE-14493
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14493
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 61396e76abc8ccb8c4a41f1b9f498736c114eb0b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 
> 49f5487f400a9278248d5ab279dcc9c6a551c416 
>   ql/src/test/queries/clientpositive/materialized_view_partitioned.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68139/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>



Re: Review Request 68139: HIVE-14493

2018-07-31 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68139/
---

(Updated Aug. 1, 2018, 3:07 a.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-14493
https://issues.apache.org/jira/browse/HIVE-14493


Repository: hive-git


Description
---

HIVE-14493


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
61396e76abc8ccb8c4a41f1b9f498736c114eb0b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 
49f5487f400a9278248d5ab279dcc9c6a551c416 
  ql/src/test/queries/clientpositive/materialized_view_partitioned.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/68139/diff/2/

Changes: https://reviews.apache.org/r/68139/diff/1-2/


Testing
---


Thanks,

Jesús Camacho Rodríguez



Review Request 68139: HIVE-14493

2018-07-31 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68139/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-14493
https://issues.apache.org/jira/browse/HIVE-14493


Repository: hive-git


Description
---

HIVE-14493


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
61396e76abc8ccb8c4a41f1b9f498736c114eb0b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 
49f5487f400a9278248d5ab279dcc9c6a551c416 
  ql/src/test/queries/clientpositive/materialized_view_partitioned.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/68139/diff/1/


Testing
---


Thanks,

Jesús Camacho Rodríguez



Re: Review Request 68124: HIVE-20252

2018-07-31 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68124/#review206717
---




ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
Lines 451 (patched)


We can remove this first block; it does not buy us much in terms of 
algorithm performance, and the method would have no restriction on the start 
operator (plus it is more readable).



ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java
Lines 462 (patched)


Probably more useful to do the inverse: make the private method void and 
have the public method return the operators in the work?


- Jesús Camacho Rodríguez


On Aug. 1, 2018, 12:27 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68124/
> ---
> 
> (Updated Aug. 1, 2018, 12:27 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Jason Dere.
> 
> 
> Bugs: HIVE-20252
> https://issues.apache.org/jira/browse/HIVE-20252
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira.
> 
> removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. 
> I will eventually remove it, so it can be ignored.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 7b2ae40107 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 
> 
> 
> Diff: https://reviews.apache.org/r/68124/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 68124: HIVE-20252

2018-07-31 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68124/
---

(Updated Aug. 1, 2018, 12:27 a.m.)


Review request for hive, Jesús Camacho Rodríguez and Jason Dere.


Changes
---

Implemented review comments.


Bugs: HIVE-20252
https://issues.apache.org/jira/browse/HIVE-20252


Repository: hive-git


Description
---

See Jira.

removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. I 
will eventually remove it, so it can be ignored.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 7b2ae40107 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 


Diff: https://reviews.apache.org/r/68124/diff/3/

Changes: https://reviews.apache.org/r/68124/diff/2-3/


Testing
---


Thanks,

Deepak Jaiswal



Re: Review Request 68124: HIVE-20252

2018-07-31 Thread Deepak Jaiswal


> On July 31, 2018, 11:38 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Line 914 (original), 917 (patched)
> > 
> >
> > Can be collapsed into single line in if condition.

I have been asked not to do that in other reviews before, so I kept it that 
way. Let's keep it this way.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68124/#review206707
---


On July 31, 2018, 11:07 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68124/
> ---
> 
> (Updated July 31, 2018, 11:07 p.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Jason Dere.
> 
> 
> Bugs: HIVE-20252
> https://issues.apache.org/jira/browse/HIVE-20252
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira.
> 
> removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. 
> I will eventually remove it, so it can be ignored.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 
> 
> 
> Diff: https://reviews.apache.org/r/68124/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 68124: HIVE-20252

2018-07-31 Thread Deepak Jaiswal


> On July 31, 2018, 11:38 p.m., Jesús Camacho Rodríguez wrote:
> >

Thanks, I will work on the comments.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68124/#review206707
---


On July 31, 2018, 11:07 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68124/
> ---
> 
> (Updated July 31, 2018, 11:07 p.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Jason Dere.
> 
> 
> Bugs: HIVE-20252
> https://issues.apache.org/jira/browse/HIVE-20252
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira.
> 
> removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. 
> I will eventually remove it, so it can be ignored.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 
> 
> 
> Diff: https://reviews.apache.org/r/68124/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 68124: HIVE-20252

2018-07-31 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68124/#review206707
---




ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 846 (original), 841 (patched)


Can we 1) move this method to OperatorUtils, and 2) keep this method 
private, and 3) create a void public entry method for the recursion, where we 
give the _found_ set to the private one, then return the _found_ set?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 865 (original)


Cool!



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 914 (original), 917 (patched)


Can be collapsed into single line in if condition.



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 918 (original), 922 (patched)


_workRSOps_ and _workTerminalOps_



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 920 (original), 924 (patched)


No need to pass empty set, see comment above.



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 934 (original)


Loop on _candidate_ here to populate _terminalOpToRSMap_.
Then you can get rid of the 3-level nested loops below.


- Jesús Camacho Rodríguez


On July 31, 2018, 11:07 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68124/
> ---
> 
> (Updated July 31, 2018, 11:07 p.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Jason Dere.
> 
> 
> Bugs: HIVE-20252
> https://issues.apache.org/jira/browse/HIVE-20252
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira.
> 
> removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. 
> I will eventually remove it, so it can be ignored.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 
> 
> 
> Diff: https://reviews.apache.org/r/68124/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



[jira] [Created] (HIVE-20283) Logs may be directed to 2 files if --hiveconf hive.log.file is used (metastore)

2018-07-31 Thread Jaume M (JIRA)
Jaume M created HIVE-20283:
--

 Summary: Logs may be directed to 2 files if --hiveconf 
hive.log.file is used (metastore)
 Key: HIVE-20283
 URL: https://issues.apache.org/jira/browse/HIVE-20283
 Project: Hive
  Issue Type: Bug
Reporter: Jaume M
Assignee: Jaume M


Unfortunately, when doing 
https://issues.apache.org/jira/browse/HIVE-19886, I forgot to do the same for 
the metastore.





Re: Review Request 68124: HIVE-20252

2018-07-31 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68124/
---

(Updated July 31, 2018, 11:07 p.m.)


Review request for hive, Jesús Camacho Rodríguez and Jason Dere.


Changes
---

New approach where a virtual edge is created from non-semijoin terminal 
operators in a task to semijoin terminal operators within the task.
This creates a cycle if there exists a task level cycle.
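
The effect of the virtual edge can be sketched on a small directed graph. 
This is a minimal illustration, not the patch itself: the class 
VirtualEdgeCycleSketch, the operator names, and the shape of the example plan 
are all hypothetical. Task A holds TS_a, SEL_a, MAPJOIN_a, RS_a and RS_sj; 
task B holds TS_b and RS_b.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Shows how a virtual edge turns a task-level cycle into an operator-level cycle. */
final class VirtualEdgeCycleSketch {

    /** Builds a made-up operator graph; names are for illustration only. */
    static Map<String, List<String>> example(boolean withVirtualEdge) {
        Map<String, List<String>> g = new HashMap<>();
        addEdge(g, "TS_a", "SEL_a");
        addEdge(g, "SEL_a", "MAPJOIN_a");
        addEdge(g, "MAPJOIN_a", "RS_a");   // non-semijoin terminal operator of task A
        addEdge(g, "SEL_a", "RS_sj");      // semijoin terminal operator of task A
        addEdge(g, "RS_sj", "TS_b");       // semijoin branch feeds the scan in task B
        addEdge(g, "TS_b", "RS_b");
        addEdge(g, "RS_b", "MAPJOIN_a");   // task B feeds the map join back in task A
        if (withVirtualEdge) {
            addEdge(g, "RS_a", "RS_sj");   // virtual edge between terminal ops of task A
        }
        return g;
    }

    private static void addEdge(Map<String, List<String>> g, String from, String to) {
        g.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Standard depth-first search for a cycle in a directed graph. */
    static boolean hasCycle(Map<String, List<String>> graph) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : graph.keySet()) {
            if (dfs(node, graph, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean dfs(String node, Map<String, List<String>> graph,
                               Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true;                   // reached a node already on the current path
        }
        if (!done.add(node)) {
            return false;                  // already fully explored
        }
        onPath.add(node);
        for (String next : graph.getOrDefault(node, Collections.emptyList())) {
            if (dfs(next, graph, done, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        return false;
    }
}
```

Without the virtual edge the operator graph in this sketch is acyclic even 
though task A and task B depend on each other; with it, an ordinary cycle 
check on operators finds the loop.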


Bugs: HIVE-20252
https://issues.apache.org/jira/browse/HIVE-20252


Repository: hive-git


Description
---

See Jira.

removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. I 
will eventually remove it, so it can be ignored.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 


Diff: https://reviews.apache.org/r/68124/diff/2/

Changes: https://reviews.apache.org/r/68124/diff/1-2/


Testing
---


Thanks,

Deepak Jaiswal



[jira] [Created] (HIVE-20282) HiveServer2 incorrect queue name when using Tez instead of MR

2018-07-31 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-20282:
-

 Summary: HiveServer2 incorrect queue name when using Tez instead 
of MR
 Key: HIVE-20282
 URL: https://issues.apache.org/jira/browse/HIVE-20282
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Steve Yeom
Assignee: Steve Yeom
 Fix For: 4.0.0








Re: Review Request 68109: HIVE-20260 NDV of a column shouldn't be scaled when row count is changed by filter on another column

2018-07-31 Thread Zoltan Haindrich

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68109/
---

(Updated July 31, 2018, 6:18 p.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

01wip03


Bugs: HIVE-20260
https://issues.apache.org/jira/browse/HIVE-20260


Repository: hive-git


Description
---

* keep track of used column; and only rescale affected columns
* much more conservative than old logic - possibly too much...
* wip patch


Diffs (updated)
-

  
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/AnnotateStatsProcCtx.java 47ee949fbc 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 3c2b085426 
  ql/src/test/queries/clientpositive/stat_estimate_drill.q PRE-CREATION 
  ql/src/test/queries/clientpositive/stat_estimate_related_col.q 52da2f759a 
  ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 
83bb65ede4 
  ql/src/test/results/clientpositive/stat_estimate_drill.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/stat_estimate_related_col.q.out 669adafda3 


Diff: https://reviews.apache.org/r/68109/diff/2/

Changes: https://reviews.apache.org/r/68109/diff/1-2/


Testing
---


Thanks,

Zoltan Haindrich



Re: Review Request 68109: HIVE-20260 NDV of a column shouldn't be scaled when row count is changed by filter on another column

2018-07-31 Thread Zoltan Haindrich


> On July 30, 2018, 6:38 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
> > Line 354 (original), 355 (patched)
> > 
> >
> > add a comment here:
> > We assume columns are uncorrelated. That is, filters on different 
> > columns will filter out different rows. So, we scale down the NDV of a 
> > column only when the row count is decreased by its own filter. Under the 
> > correlated assumption, we would have scaled down the NDV of every column 
> > for every filter condition. We don't do that. 
> > This makes our estimate more conservative than it needs to be, which is 
> > good since it results in overestimates when we are wrong but avoids the 
> > OOM we could hit had we chosen the other assumption. In the future, we 
> > need to capture the correlation of columns in metadata so that we can 
> > account for it.

Added a comment about it.
Yes, I agree that capturing correlations between different columns would be 
good, but there are around `|columns|**2` of them... I think Calcite has some 
tools for this.
But I feel that the current calculation is too numRows-centric, which makes 
it a little hard to keep track of and provide correct estimation logic for 
columns...


> On July 30, 2018, 6:38 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
> > Line 2633 (original), 2647 (patched)
> > 
> >
> > Add assert newNDV <= newNumRows.

Actually, that would fail; I added code to clamp NDV to maxRows.
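
A minimal sketch of the two ideas discussed in this thread, assuming a simple 
proportional rescale: only a column that was itself filtered has its NDV 
scaled, and every NDV is clamped to the new row count. The class 
NdvClampSketch and the method newNdv are hypothetical names; the actual logic 
in StatsRulesProcFactory may differ.

```java
/** Illustrative NDV adjustment after a filter; not the code from the patch. */
final class NdvClampSketch {

    /**
     * Returns the NDV estimate after a filter reduced the row count.
     *
     * @param ndv                  NDV of the column before the filter
     * @param oldRows              row count before the filter
     * @param newRows              row count after the filter
     * @param filteredOnThisColumn whether the filter condition used this column
     */
    static long newNdv(long ndv, long oldRows, long newRows, boolean filteredOnThisColumn) {
        long scaled = ndv;
        if (filteredOnThisColumn && oldRows > 0) {
            // Assumed proportional scaling, applied only to the filtered column.
            scaled = (long) Math.ceil(ndv * ((double) newRows / oldRows));
        }
        // A column cannot have more distinct values than there are rows.
        return Math.min(scaled, newRows);
    }

    public static void main(String[] args) {
        System.out.println(newNdv(100, 1000, 500, true));   // filtered column is scaled
        System.out.println(newNdv(100, 1000, 500, false));  // unrelated column is kept
        System.out.println(newNdv(100, 1000, 40, false));   // kept value is still clamped
    }
}
```

The third call shows why an assert on newNDV <= newNumRows would trip without 
the clamp: an unscaled NDV can exceed the reduced row count.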


- Zoltan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68109/#review206607
---


On July 30, 2018, 4:17 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68109/
> ---
> 
> (Updated July 30, 2018, 4:17 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20260
> https://issues.apache.org/jira/browse/HIVE-20260
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> * keep track of used column; and only rescale affected columns
> * much more conservative than old logic - possibly too much...
> * wip patch
> 
> 
> Diffs
> -
> 
>   
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/AnnotateStatsProcCtx.java 47ee949fbcfa9391c640719a57fab39279c009db 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 3c2b0854269d5426153958096a8b5b5ad3612c0f 
>   ql/src/test/queries/clientpositive/stat_estimate_drill.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/stat_estimate_related_col.q 
> 52da2f759a009daa372a53446e2f0fd4a88152be 
>   ql/src/test/results/clientpositive/stat_estimate_drill.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/stat_estimate_related_col.q.out 
> 669adafda3a45f7846face3d99817cd1b9cb3664 
> 
> 
> Diff: https://reviews.apache.org/r/68109/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>



[jira] [Created] (HIVE-20281) SharedWorkOptimizer fails with 'operator cache contents and actual plan differ'

2018-07-31 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-20281:
--

 Summary: SharedWorkOptimizer fails with 'operator cache contents 
and actual plan differ'
 Key: HIVE-20281
 URL: https://issues.apache.org/jira/browse/HIVE-20281
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 4.0.0, 3.2.0
Reporter: Ashutosh Chauhan
Assignee: Jesus Camacho Rodriguez


HIVE-18201 seems to trigger a latent bug in SW optimizer. Test 
{{subquery_in_having}} fails with:
{code}
2018-07-31T08:42:57,328 DEBUG [b68f20cc-54d5-466d-b512-1540b3a43396 main] 
optimizer.SharedWorkOptimizer: After SharedWorkExtendedOptimizer:
TS[0]-SEL[1]-MAPJOIN[131]-FIL[12]-SEL[13]-GBY[14]-RS[15]-GBY[16]-SEL[17]-MAPJOIN[136]-MAPJOIN[137]-FIL[103]-SEL[104]-FS[105]
 
-FIL[113]-SEL[20]-RS[44]-MAPJOIN[133]-SEL[47]-GBY[48]-RS[49]-GBY[50]-SEL[51]-GBY[55]-RS[98]-MAPJOIN[136]
  
-RS[88]-GBY[89]-SEL[120]-FIL[116]-SEL[91]-GBY[93]-RS[94]-GBY[95]-SEL[96]-RS[101]-MAPJOIN[137]
TS[2]-FIL[112]-GBY[5]-RS[6]-GBY[7]-SEL[8]-RS[10]-MAPJOIN[131]
 
-RS[31]-MAPJOIN[132]-FIL[33]-SEL[34]-GBY[35]-RS[36]-GBY[37]-SEL[38]-GBY[42]-MAPJOIN[133]
TS[21]-FIL[114]-SEL[22]-MAPJOIN[132]
2018-07-31T08:42:57,329 ERROR [b68f20cc-54d5-466d-b512-1540b3a43396 main] 
ql.Driver: FAILED: SemanticException Error in shared work optimizer: operator 
cache contentsand actual plan differ
org.apache.hadoop.hive.ql.parse.SemanticException: Error in shared work 
optimizer: operator cache contentsand actual plan differ
at 
org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer.transform(SharedWorkOptimizer.java:524)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:146)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12361)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:356)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:165)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:663)
{code}







[jira] [Created] (HIVE-20280) JobResultSerializer uses wrong registration id in KyroMessageCodec

2018-07-31 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20280:
---

 Summary: JobResultSerializer uses wrong registration id in 
KyroMessageCodec
 Key: HIVE-20280
 URL: https://issues.apache.org/jira/browse/HIVE-20280
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Inside {{KryoMessageCodec}} the code:

{code}
  Kryo kryo = new Kryo();
  int count = 0;
  for (Class klass : messages) {
    kryo.register(klass, REG_ID_BASE + count);
    count++;
  }
  kryo.register(BaseProtocol.JobResult.class, new JobResultSerializer(), count);
{code}

uses the wrong registration id for the {{JobResultSerializer}}; it should be 
{{REG_ID_BASE + count}}, not {{count}}.
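
The difference between the two ids can be shown without Kryo at all. The 
sketch below is only arithmetic on the ids: the class RegistrationIdSketch is 
hypothetical, and the value 16 for REG_ID_BASE is an assumption made for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Compares the id currently used for JobResult with the intended one. */
final class RegistrationIdSketch {

    static final int REG_ID_BASE = 16;   // assumed value, for illustration only

    /** Ids given to the message classes: REG_ID_BASE, REG_ID_BASE + 1, ... */
    static List<Integer> messageIds(int messageCount) {
        List<Integer> ids = new ArrayList<>();
        for (int count = 0; count < messageCount; count++) {
            ids.add(REG_ID_BASE + count);
        }
        return ids;
    }

    /** Id used by the code quoted above: the bare counter. */
    static int currentJobResultId(int messageCount) {
        return messageCount;
    }

    /** Id proposed in this issue: continues the sequence after the messages. */
    static int proposedJobResultId(int messageCount) {
        return REG_ID_BASE + messageCount;
    }

    public static void main(String[] args) {
        int messages = 3;
        System.out.println(messageIds(messages));
        System.out.println(currentJobResultId(messages));
        System.out.println(proposedJobResultId(messages));
    }
}
```

With three message classes, the messages take ids 16 to 18 under this 
assumption; the proposed id continues the sequence at 19, while the current 
code lands on 3, outside the range reserved above REG_ID_BASE.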





Re: Review Request 68121: HIVE-20220 : Incorrect result when hive.groupby.skewindata is enabled

2018-07-31 Thread Ganesha Shreedhara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68121/
---

(Updated July 31, 2018, 2:06 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

hive.groupby.skewindata makes use of the rand UDF to randomly distribute 
group-by keys to the reducers and hence avoids overloading a single reducer 
when there is skew in the data. 

This random distribution of keys is buggy when a reducer fails to fetch the 
mapper output due to a faulty datanode or any other reason. When a reducer 
finds that it can't fetch the mapper output, it sends a signal to the 
Application Master to reattempt the corresponding map task. The reattempted 
map task now gets a different random value from the rand function, and hence 
the keys that get distributed to the reducer will not be the same as in the 
previous run. 

 

Steps to reproduce:

create table test(id int);

insert into test values 
(1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);

SET hive.groupby.skewindata=true;

SET mapreduce.job.reduces=2;

//Add a debug port for reducer

select count(1) from test group by id;

//Remove the mapper's intermediate output file once the map stage and one of 
the 2 reduce tasks have completed, then continue the run. This causes the 2nd 
reducer to send an event to the Application Master to rerun the map task. 

The following is the expected result. 

1
2
3
4
5
6
8
8
9 

 

But you may get a different result, because the rand function returns a 
different value in the second run, causing a different distribution of keys.

This needs to be fixed so that the mapper distributes the same keys even if 
it is reattempted multiple times.
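
One way to get a repeatable distribution is to draw the random values from a 
generator with a fixed seed. The sketch below only illustrates that property 
with java.util.Random; the class SeededDistributionSketch and the method 
assign are hypothetical, and this is not a description of how the patch 
changes PlanUtils.

```java
import java.util.Arrays;
import java.util.Random;

/** Shows that a seeded generator repeats the same key-to-reducer assignment. */
final class SeededDistributionSketch {

    /** Assigns each key to a reducer using a Random created from the given seed. */
    static int[] assign(int[] keys, int reducers, long seed) {
        Random rand = new Random(seed);
        int[] target = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            target[i] = rand.nextInt(reducers);   // value in [0, reducers)
        }
        return target;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 2, 3, 3, 3};
        int[] firstAttempt = assign(keys, 2, 42L);
        int[] reattempt = assign(keys, 2, 42L);
        // Same seed, same sequence: the reattempt sends every row to the same reducer.
        System.out.println(Arrays.equals(firstAttempt, reattempt));
    }
}
```

Two Random instances created with the same seed produce the same sequence, so 
a reattempted map task would route rows exactly as the first attempt did.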


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 39c77b3fe5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 250a085084 
  ql/src/test/queries/clientpositive/groupby_skew_rand_seed.q PRE-CREATION 
  ql/src/test/results/clientpositive/groupby_skew_rand_seed.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/68121/diff/2/

Changes: https://reviews.apache.org/r/68121/diff/1-2/


Testing
---

Qtests added


Thanks,

Ganesha Shreedhara



[jira] [Created] (HIVE-20279) HiveContextAwareRecordReader slows down Druid Scan queries.

2018-07-31 Thread Nishant Bangarwa (JIRA)
Nishant Bangarwa created HIVE-20279:
---

 Summary: HiveContextAwareRecordReader slows down Druid Scan 
queries. 
 Key: HIVE-20279
 URL: https://issues.apache.org/jira/browse/HIVE-20279
 Project: Hive
  Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa
 Attachments: scan2.svg

HiveContextAwareRecordReader adds a lot of overhead for Druid Scan queries. 
See the attached flame graph. 
It looks like the operations that check for the existence of a footer/header 
buffer take most of the time. For Druid and other storage handlers that do 
not have a footer buffer, we should at least skip the existence check. 






[jira] [Created] (HIVE-20278) Druid Scan Query avoid copying from List -> Map -> List

2018-07-31 Thread Nishant Bangarwa (JIRA)
Nishant Bangarwa created HIVE-20278:
---

 Summary: Druid Scan Query avoid copying from List -> Map -> List
 Key: HIVE-20278
 URL: https://issues.apache.org/jira/browse/HIVE-20278
 Project: Hive
  Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa


DruidScanQueryRecordReader gets a compacted List from Druid. It then 
converts that list into a Map as a DruidWritable, where the key is the 
column name. 
In the second stage, DruidSerde takes this DruidWritable and creates a List 
out of the map again. We can avoid the map creation by reading the list sent 
by Druid directly in the DruidSerde.deserialize() method.
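
The round trip can be sketched in plain Java. This is an illustration only: 
the class RowCopySketch and its methods are hypothetical, they do not use the 
real DruidWritable or DruidSerde types, and they assume the compacted list is 
already in column order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Contrasts List -> Map -> List copying with using the list directly. */
final class RowCopySketch {

    /** Current shape of the flow: build a map keyed by column name, then a list again. */
    static List<Object> viaMap(List<String> columns, List<Object> compactedRow) {
        Map<String, Object> byName = new HashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            byName.put(columns.get(i), compactedRow.get(i));
        }
        List<Object> row = new ArrayList<>(columns.size());
        for (String column : columns) {
            row.add(byName.get(column));
        }
        return row;
    }

    /** Proposed shape: no intermediate map, assuming values arrive in column order. */
    static List<Object> direct(List<Object> compactedRow) {
        return compactedRow;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("__time", "page", "added");
        List<Object> compacted = Arrays.<Object>asList(1533081600000L, "Main_Page", 57);
        System.out.println(viaMap(columns, compacted).equals(direct(compacted)));
    }
}
```

Both paths yield the same row here; the direct one skips one map allocation 
and two copies per row.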





Re: [ANNOUNCE] New PMC Member : Sahil Takiar

2018-07-31 Thread Rajesh Balamohan
Congratulations Sahil!

~Rajesh.B


On Tue, Jul 31, 2018 at 3:57 PM Marta Kuczora 
wrote:

> Congratulations Sahil!
>
> On Mon, Jul 30, 2018 at 9:44 AM Peter Vary 
> wrote:
>
> > Congratulations Sahil!
> >
> > > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> > >
> > > Congratulations Sahil!
> > >
> > >> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan 
> > wrote:
> > >>
> > >> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is
> > >> joining Hive PMC.
> > >> Thanks Sahil for all your contributions till now. Looking forward to
> > many
> > >> more.
> > >>
> > >> Welcome, Sahil!
> > >>
> > >> Thanks,
> > >> Ashutosh
> > >
> >
> >


Re: [ANNOUNCE] New PMC Member : Peter Vary

2018-07-31 Thread Rajesh Balamohan
Congratulations Peter!

~Rajesh.B


On Tue, Jul 31, 2018 at 3:58 PM Marta Kuczora 
wrote:

> Congratulations Peter!
>
> On Mon, Jul 30, 2018 at 7:53 PM Andrew Sherman
>  wrote:
>
> > Congratulations Peter!
> >
> > On Sun, Jul 29, 2018 at 1:32 PM Vineet Garg 
> wrote:
> >
> > > Congratulations Peter!
> > >
> > > > On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan
> > > wrote:
> > > >
> > > > On behalf of the Hive PMC I am delighted to announce Peter Vary is
> > > joining
> > > > Hive PMC.
> > > > Thanks Peter for all your contributions till now. Looking forward to
> > many
> > > > more.
> > > >
> > > > Welcome, Peter!
> > > >
> > > > Thanks,
> > > > Ashutosh
> > >
> > >
> >


Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar

2018-07-31 Thread Rajesh Balamohan
Congratulations Vihang!

~Rajesh.B


On Tue, Jul 31, 2018 at 3:35 PM Marta Kuczora 
wrote:

> Congratulations Vihang!
>
> On Mon, Jul 30, 2018 at 9:44 AM Peter Vary 
> wrote:
>
> > Congratulations Vihang!
> >
> > > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> > >
> > > Congratulations Vihang!
> > >
> > >> On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan 
> > wrote:
> > >>
> > >> On behalf of the Hive PMC I am delighted to announce Vihang
> > Karajgaonkar
> > >> is joining Hive PMC.
> > >> Thanks Vihang for all your contributions till now. Looking forward to
> > many
> > >> more.
> > >>
> > >> Welcome, Vihang!
> > >>
> > >> Thanks,
> > >> Ashutosh
> > >
> >
> >


Re: [ANNOUNCE] New committer: Slim Bouguerra

2018-07-31 Thread Rajesh Balamohan
Congratulations Slim!

~Rajesh.B

On Tue, Jul 31, 2018 at 3:34 PM Marta Kuczora 
wrote:

> Congratulations Slim!
>
> On Mon, Jul 30, 2018 at 2:01 AM Ashutosh Chauhan 
> wrote:
>
> > Apache Hive's Project Management Committee (PMC) has invited Slim
> Bouguerra
> > to become a committer, and we are pleased to announce that he has
> accepted.
> >
> > Slim, welcome, thank you for your contributions, and we look forward to
> > your further interactions with the community!
> >
> > Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> >


Re: [ANNOUNCE] New PMC Member : Vineet Garg

2018-07-31 Thread Rajesh Balamohan
Congratulations Vineet!

~Rajesh.B


On Tue, Jul 31, 2018 at 3:34 PM Marta Kuczora 
wrote:

> Congratulations Vineet!
>
> On Mon, Jul 30, 2018 at 9:45 AM Peter Vary 
> wrote:
>
> > Congratulations Vineet!
> >
> > > On Jul 30, 2018, at 01:59, Ashutosh Chauhan 
> > wrote:
> > >
> > > On behalf of the Hive PMC I am delighted to announce Vineet Garg is
> > joining
> > > Hive PMC.
> > > Thanks Vineet for all your contributions till now. Looking forward to
> > many
> > > more.
> > >
> > > Welcome, Vineet!
> > >
> > > Thanks,
> > > Ashutosh
> >
> >
>


-- 
~Rajesh.B


Re: [ANNOUNCE] New PMC Member : Peter Vary

2018-07-31 Thread Marta Kuczora
Congratulations Peter!

On Mon, Jul 30, 2018 at 7:53 PM Andrew Sherman
 wrote:

> Congratulations Peter!
>
> On Sun, Jul 29, 2018 at 1:32 PM Vineet Garg  wrote:
>
> > Congratulations Peter!
> >
> > > On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan 
> > wrote:
> > >
> > > On behalf of the Hive PMC I am delighted to announce Peter Vary is
> > joining
> > > Hive PMC.
> > > Thanks Peter for all your contributions till now. Looking forward to
> many
> > > more.
> > >
> > > Welcome, Peter!
> > >
> > > Thanks,
> > > Ashutosh
> >
> >
>


Re: [ANNOUNCE] New PMC Member : Sahil Takiar

2018-07-31 Thread Marta Kuczora
Congratulations Sahil!

On Mon, Jul 30, 2018 at 9:44 AM Peter Vary 
wrote:

> Congratulations Sahil!
>
> > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> >
> > Congratulations Sahil!
> >
> >> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan 
> wrote:
> >>
> >> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is
> >> joining Hive PMC.
> >> Thanks Sahil for all your contributions till now. Looking forward to
> >> many more.
> >>
> >> Welcome, Sahil!
> >>
> >> Thanks,
> >> Ashutosh
> >
>
>


Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar

2018-07-31 Thread Marta Kuczora
Congratulations Vihang!

On Mon, Jul 30, 2018 at 9:44 AM Peter Vary 
wrote:

> Congratulations Vihang!
>
> > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> >
> > Congratulations Vihang!
> >
> >> On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan 
> wrote:
> >>
> >> On behalf of the Hive PMC I am delighted to announce Vihang Karajgaonkar
> >> is joining Hive PMC.
> >> Thanks Vihang for all your contributions till now. Looking forward to
> >> many more.
> >>
> >> Welcome, Vihang!
> >>
> >> Thanks,
> >> Ashutosh
> >
>
>


Re: [ANNOUNCE] New PMC Member : Vineet Garg

2018-07-31 Thread Marta Kuczora
Congratulations Vineet!

On Mon, Jul 30, 2018 at 9:45 AM Peter Vary 
wrote:

> Congratulations Vineet!
>
> > On Jul 30, 2018, at 01:59, Ashutosh Chauhan 
> wrote:
> >
> > On behalf of the Hive PMC I am delighted to announce Vineet Garg is
> > joining Hive PMC.
> > Thanks Vineet for all your contributions till now. Looking forward to
> > many more.
> >
> > Welcome, Vineet!
> >
> > Thanks,
> > Ashutosh
>
>


Re: [ANNOUNCE] New committer: Slim Bouguerra

2018-07-31 Thread Marta Kuczora
Congratulations Slim!

On Mon, Jul 30, 2018 at 2:01 AM Ashutosh Chauhan 
wrote:

> Apache Hive's Project Management Committee (PMC) has invited Slim Bouguerra
> to become a committer, and we are pleased to announce that he has accepted.
>
> Slim, welcome, thank you for your contributions, and we look forward to your
> further interactions with the community!
>
> Ashutosh Chauhan (on behalf of the Apache Hive PMC)
>


Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar

2018-07-31 Thread Zoltan Haindrich

Congratulations!

On 07/31/2018 07:27 AM, Anishek Agarwal wrote:

Congratulations Vihang!

On Tue, Jul 31, 2018 at 6:39 AM Vihang Karajgaonkar
 wrote:


Thanks Everyone!

On Mon, Jul 30, 2018 at 4:05 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:


Congratulations Vihang!

Thanks
Prasanth



On Mon, Jul 30, 2018 at 12:34 PM -0700, "Xuefu Zhang"
mailto:xu...@uber.com.INVALID>> wrote:


Congratulations, Vihang!

On Mon, Jul 30, 2018 at 10:53 AM, Andrew Sherman <
asher...@cloudera.com.invalid> wrote:


Congratulations Vihang!



On Mon, Jul 30, 2018 at 12:44 AM Peter Vary
wrote:


Congratulations Vihang!


On Jul 29, 2018, at 22:32, Vineet Garg  wrote:

Congratulations Vihang!


On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan

wrote:


On behalf of the Hive PMC I am delighted to announce Vihang Karajgaonkar
is joining Hive PMC.
Thanks Vihang for all your contributions till now. Looking forward to many
more.

Welcome, Vihang!

Thanks,
Ashutosh
















Re: [ANNOUNCE] New PMC Member : Vineet Garg

2018-07-31 Thread Zoltan Haindrich

Congratulations!

On 07/31/2018 07:38 AM, Anshuman Dwivedi wrote:

Congrats to Vineet, Vihang, Peter, Slim.



Rgds,
Anshuman Dwivedi




-Anishek Agarwal  wrote: -

To: dev@hive.apache.org
From: Anishek Agarwal 
Date: 07/31/2018 10:51AM
Subject: Re: [ANNOUNCE] New PMC Member : Vineet Garg


Congrats Vineet!

On Tue, Jul 31, 2018 at 10:30 AM Vineet Garg  wrote:


Thanks all!


On Jul 30, 2018, at 6:07 PM, Vihang Karajgaonkar

 wrote:


Congrats Vineet!

On Mon, Jul 30, 2018 at 4:04 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:


Congratulations Vineet!

Thanks
Prasanth



On Mon, Jul 30, 2018 at 12:11 PM -0700, "Xuefu Zhang"
mailto:xu...@uber.com.INVALID>> wrote:


Congratulations!

On Mon, Jul 30, 2018 at 12:10 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:


Congrats Vineet!

On 7/30/18, 10:53 AM, "Andrew Sherman"
wrote:

Congratulations Vineet!

On Mon, Jul 30, 2018 at 12:52 AM Deepak Jaiswal <
djais...@hortonworks.com>
wrote:


Congratulations Vineet!

On 7/30/18, 12:45 AM, "Peter Vary"

wrote:


Congratulations Vineet!


On Jul 30, 2018, at 01:59, Ashutosh Chauhan <

hashut...@apache.org>

wrote:


On behalf of the Hive PMC I am delighted to announce Vineet Garg is joining
Hive PMC.
Thanks Vineet for all your contributions till now. Looking forward to many
more.

Welcome, Vineet!

Thanks,
Ashutosh





















Re: [ANNOUNCE] New PMC Member : Peter Vary

2018-07-31 Thread Zoltan Haindrich

Congratulations!

On 07/31/2018 07:27 AM, Anishek Agarwal wrote:

Congratulations Peter!

On Tue, Jul 31, 2018 at 6:38 AM Vihang Karajgaonkar
 wrote:


Congrats Peter!

On Mon, Jul 30, 2018 at 4:04 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:


Congratulations Peter!

Thanks
Prasanth



On Mon, Jul 30, 2018 at 12:21 PM -0700, "Xuefu Zhang"
mailto:xu...@uber.com.INVALID>> wrote:


Congratulations, Peter!

On Mon, Jul 30, 2018 at 12:11 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:


Congrats Peter!

On 7/30/18, 10:53 AM, "Andrew Sherman"
wrote:

 Congratulations Peter!

 On Sun, Jul 29, 2018 at 1:32 PM Vineet Garg
wrote:

 > Congratulations Peter!
 >
 > > On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan <
hashut...@apache.org>
 > wrote:
 > >
 > > On behalf of the Hive PMC I am delighted to announce Peter Vary is joining
 > > Hive PMC.
 > > Thanks Peter for all your contributions till now. Looking forward to many
 > > more.
 > >
 > > Welcome, Peter!
 > >
 > > Thanks,
 > > Ashutosh
 >
 >












Re: [ANNOUNCE] New committer: Slim Bouguerra

2018-07-31 Thread Zoltan Haindrich

Congratulations!

On 07/31/2018 07:21 AM, Anishek Agarwal wrote:

Congratulations Slim!

On Tue, Jul 31, 2018 at 10:29 AM Vineet Garg  wrote:


Congrats Slim!


On Jul 30, 2018, at 6:08 PM, Vihang Karajgaonkar

 wrote:


Congrats Slim!

On Mon, Jul 30, 2018 at 4:35 PM, Deepak Jaiswal <

djais...@hortonworks.com>

wrote:


Congrats Slim!

On 7/30/18, 4:03 PM, "Prasanth Jayachandran" <
pjayachand...@hortonworks.com> wrote:

Congratulations Slim!


On Jul 30, 2018, at 4:00 PM, Sergey Shelukhin <

ser...@hortonworks.com> wrote:


Congrats!

On 18/7/30, 12:53, "Gunther Hagleitner" 
Congratulations!

Thanks,
Gunther.

From: Xuefu Zhang 
Sent: Monday, July 30, 2018 12:11 PM
To: dev@hive.apache.org
Subject: Re: [ANNOUNCE] New committer: Slim Bouguerra

congratulations!!!

On Mon, Jul 30, 2018 at 12:10 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:


Congrats Slim!

On 7/30/18, 10:53 AM, "Andrew Sherman"



wrote:

   Congratulations Slim!

   On Mon, Jul 30, 2018 at 12:46 AM Peter Vary



   wrote:


Congratulations Slim!


On Jul 30, 2018, at 02:00, Ashutosh Chauhan



wrote:


Apache Hive's Project Management Committee (PMC) has invited Slim Bouguerra
to become a committer, and we are pleased to announce that he has accepted.

Slim, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)





















[GitHub] hive pull request #405: There a negative number of splits need be avoided

2018-07-31 Thread Seandity
GitHub user Seandity opened a pull request:

https://github.com/apache/hive/pull/405

There  a negative number of splits need be avoided

I am facing an issue when executing an INSERT on hive2.2-tez0.84.
I followed the Tez API source code, but I still don't know why a negative
number is produced here:
java.lang.IllegalArgumentException: Illegal Capacity: -1
at java.util.ArrayList.<init>(ArrayList.java:157)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:339)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:519)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:768)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:211)
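
For context, the exception at the top of this trace comes from the java.util.ArrayList(int) constructor, which rejects any negative initial capacity. The sketch below reproduces that message in isolation and shows one possible defensive option, clamping the requested count to zero before allocating. The names SplitCapacitySketch, safeCapacity, and tryAllocate are hypothetical and are not part of Hadoop, Hive, or this pull request; the sketch does not explain how the negative value arises inside FileInputFormat.getSplits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the code from the pull request.
final class SplitCapacitySketch {

    // Clamp a requested split count so it is safe to use as an initial capacity.
    static int safeCapacity(int requestedSplits) {
        return Math.max(0, requestedSplits);
    }

    // Returns the exception message produced by new ArrayList<>(capacity),
    // or null when the constructor succeeds.
    static String tryAllocate(int capacity) {
        try {
            List<String> splits = new ArrayList<>(capacity);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A negative capacity reproduces the message seen in the stack trace.
        System.out.println(tryAllocate(-1));
        // After clamping, allocation succeeds and no message is returned.
        System.out.println(tryAllocate(safeCapacity(-1)));
    }
}
```

Clamping only hides the symptom; whether a negative split count should instead be treated as a configuration error is a separate design question for the patch.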

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Seandity/hive patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #405


commit 842ced5337e837413761e13754b5a3f23ab3ee77
Author: jiahuilliu 
Date:   2018-07-31T08:39:54Z

There  a negative number of splits need be avoided

I am facing an issue when executing an INSERT on hive2.2-tez0.84.
I followed the Tez API source code, but I still don't know why a negative
number is produced here:
java.lang.IllegalArgumentException: Illegal Capacity: -1
at java.util.ArrayList.<init>(ArrayList.java:157)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:339)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:519)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:768)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:211)




---


[jira] [Created] (HIVE-20277) Vectorization: Case expressions that return NULL in FILTER

2018-07-31 Thread Gopal V (JIRA)
Gopal V created HIVE-20277:
--

 Summary: Vectorization: Case expressions that return NULL in FILTER
 Key: HIVE-20277
 URL: https://issues.apache.org/jira/browse/HIVE-20277
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


In cases like Query89, the vertex with the filter is not vectorized.

{code}
   Filter Operator
     predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END (type: boolean)
{code}

{code}
Reducer 3
    Execution mode: llap
    Reduce Vectorization:
        enabled: true
        enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
        notVectorizedReason: FILTER operator: Unexpected hive type name void
        vectorized: false
{code}

The query specifically has 

{code}
where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
{code}

while rewriting it to 

{code}
where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
{code}

does vectorize into 

{code}
Filter Operator
  Filter Vectorization:
      className: VectorFilterOperator
      native: true
      predicateExpression: SelectColumnIsTrue(col 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: FuncAbsDoubleToDouble(col 9:double)(children: DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
  predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END (type: boolean)
  Statistics: Num rows: 11 Data size: 5291 Basic stats: COMPLETE Column stats: COMPLETE
{code}
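
The two WHERE clauses quoted above should select the same rows, because a filter keeps a row only when its predicate evaluates to true: in the original form the ELSE branch yields NULL, NULL > 0.1 is also NULL, and the filter drops NULL just as it drops false. A minimal sketch of that three-valued logic in plain Java follows, using a nullable Double/Boolean to stand in for SQL NULL. The class and method names are hypothetical, the inputs are assumed to be non-null, and this is not Hive code.

```java
// Hypothetical illustration of SQL three-valued logic; not Hive code.
final class CaseFilterSketch {

    // Original form:
    //   CASE WHEN avg <> 0 THEN abs(sum - avg) / avg ELSE NULL END > 0.1
    // Returns null (SQL NULL) when the CASE falls through to ELSE.
    static Boolean originalPredicate(double sumSales, double avgMonthlySales) {
        Double ratio = (avgMonthlySales != 0)
                ? Double.valueOf(Math.abs(sumSales - avgMonthlySales) / avgMonthlySales)
                : null;
        return ratio == null ? null : Boolean.valueOf(ratio > 0.1);
    }

    // Rewritten form:
    //   CASE WHEN avg <> 0 THEN abs(sum - avg) / avg > 0.1 ELSE false END
    static Boolean rewrittenPredicate(double sumSales, double avgMonthlySales) {
        if (avgMonthlySales != 0) {
            return Math.abs(sumSales - avgMonthlySales) / avgMonthlySales > 0.1;
        }
        return false;
    }

    // A filter passes a row only when the predicate is TRUE;
    // both NULL and FALSE drop the row.
    static boolean passesFilter(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        // Each row is {sum_sales, avg_monthly_sales}; the last row hits ELSE.
        double[][] rows = { {120.0, 100.0}, {105.0, 100.0}, {50.0, 0.0} };
        for (double[] row : rows) {
            System.out.println(passesFilter(originalPredicate(row[0], row[1]))
                    + " " + passesFilter(rewrittenPredicate(row[0], row[1])));
        }
    }
}
```

The predicates differ only in the ELSE case (NULL versus false), which is the branch whose void type the vectorizer rejects in the report above.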



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20276) Hive UDF class getting Instantiated for each call of function

2018-07-31 Thread Hardik Trivedi (JIRA)
Hardik Trivedi created HIVE-20276:
-

 Summary: Hive UDF class getting Instantiated for each call of function
 Key: HIVE-20276
 URL: https://issues.apache.org/jira/browse/HIVE-20276
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Hardik Trivedi


* I have created a Hive UDF class and registered its function in Spark.
 * I call this function in a Hive query run through the Spark session object.
 * When I run my code, I observe that a new instance of the UDF class is created 
each time the function is called.
 * Is this normal behavior? Should a new instance be created on each call?
 * Is this a version-specific issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)