Re: Review Request 39836: HIVE-12309

2015-11-10 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39836/#review105945
---



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java (line 192)


Would it be better to do this only when we have complete column stats? 
Incomplete or missing column stats can lead to underestimation. Overestimation 
is sometimes fine (thanks to auto-reducer parallelism), but underestimation 
will hurt performance.
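
As a rough illustration of the suggestion above, the sketch below only trusts 
column stats when every needed column has them, and otherwise falls back to the 
raw data size. The class, the method name, and the map of average column 
lengths are all hypothetical; this is not the code in StatsUtils.java.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompleteStatsGuardSketch {

    /**
     * Hypothetical helper: estimate data size from column stats only when
     * stats exist for every needed column; otherwise keep the raw data size.
     */
    static long estimateDataSize(long numRows, long rawDataSize,
                                 List<String> neededCols,
                                 Map<String, Double> avgColLen) {
        double rowWidth = 0;
        for (String col : neededCols) {
            Double len = avgColLen.get(col);
            if (len == null) {
                // Incomplete stats: summing only the known columns would
                // underestimate, so fall back to the raw size instead.
                return rawDataSize;
            }
            rowWidth += len;
        }
        return (long) Math.ceil(numRows * rowWidth);
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only.
        Map<String, Double> stats = new HashMap<>();
        stats.put("id", 4.0);

        // Stats are complete for the single needed column.
        System.out.println(estimateDataSize(1000, 5000, Arrays.asList("id"), stats));          // 4000
        // "name" has no stats, so the raw size is returned.
        System.out.println(estimateDataSize(1000, 5000, Arrays.asList("id", "name"), stats));  // 5000
    }
}
```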


- Prasanth_J


On Oct. 31, 2015, 10:11 p.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39836/
> ---
> 
> (Updated Oct. 31, 2015, 10:11 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Bugs: HIVE-12309
> https://issues.apache.org/jira/browse/HIVE-12309
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TableScan should use column stats when available for better data size estimate
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java e1f8ebc 
>   ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out fc4f294 
>   ql/src/test/results/clientpositive/annotate_stats_filter.q.out 054b573 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 1b9ec68 
>   ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out be3fa1d 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out bc44cc3 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out c864c04 
>   ql/src/test/results/clientpositive/annotate_stats_limit.q.out 7300ea0 
>   ql/src/test/results/clientpositive/annotate_stats_part.q.out cf523cb 
>   ql/src/test/results/clientpositive/annotate_stats_select.q.out 877037d 
>   ql/src/test/results/clientpositive/annotate_stats_table.q.out ebc6c5b 
>   ql/src/test/results/clientpositive/annotate_stats_union.q.out e09dde3 
>   ql/src/test/results/clientpositive/cbo_rp_auto_join0.q.out d1bc6d4 
>   ql/src/test/results/clientpositive/cbo_rp_auto_join1.q.out 3b053fe 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out a8bcc90 
>   ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out f87a539 
>   ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 5903cd1 
>   ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out 2ea1e6e 
>   ql/src/test/results/clientpositive/llap/llapdecider.q.out 676a0e4 
>   ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 8955a61 
>   ql/src/test/results/clientpositive/stats_ppr_all.q.out 7627f7a 
>   ql/src/test/results/clientpositive/tez/explainuser_1.q.out ec434f0 
>   ql/src/test/results/clientpositive/tez/llapdecider.q.out 676a0e4 
> 
> Diff: https://reviews.apache.org/r/39836/diff/
> 
> 
> Testing
> ---
> 
> Existing tests
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>



Re: Review Request 39836: HIVE-12309

2015-11-10 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39836/#review105972
---



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java (line 192)


This issue is mostly a cosmetic one. 

Whoever uses the stats (e.g., auto-reducer parallelism) looks at the leaves 
of the operator tree, not at TS. Every operator after TS uses the 
getDSFromCS() function to compute its own DS from its parent's numrows and the 
col stats; the parent's DS is not used. So the value of DS in TS is irrelevant 
for planning. Because of this disconnect, in explain output you can see DS 
increase after FIL is applied on TS (see the patch attached on HIVE-12181). 
This patch aims to fix that by using uniform logic for DS estimation so that 
explain output doesn't look stupid. Planning logic will not be affected by 
this. 

Further, the estimate is made using the existing function getDSFromCS(), which 
all other operators use, and no change is made to it w.r.t. incomplete/missing 
stats.
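
To make the disconnect concrete, here is a minimal sketch of the idea. It is 
not the actual Hive code: the class, the ColStat type, and the 
dataSizeFromColStats() method are hypothetical stand-ins, the per-row width is 
simplified to a sum of average column lengths, and all numbers are made up.

```java
import java.util.Arrays;
import java.util.List;

public class DataSizeSketch {

    /** Hypothetical per-column statistic: average width of a value in bytes. */
    static final class ColStat {
        final String name;
        final double avgColLen;

        ColStat(String name, double avgColLen) {
            this.name = name;
            this.avgColLen = avgColLen;
        }
    }

    /**
     * Simplified stand-in for a column-stats based estimate:
     * data size = number of rows * sum of average column widths.
     */
    static long dataSizeFromColStats(long numRows, List<ColStat> colStats) {
        double rowWidth = 0;
        for (ColStat cs : colStats) {
            rowWidth += cs.avgColLen;
        }
        return (long) Math.ceil(numRows * rowWidth);
    }

    public static void main(String[] args) {
        List<ColStat> cols = Arrays.asList(
                new ColStat("id", 4.0),
                new ColStat("name", 12.5));

        long tsRows = 1000;
        long tsRawDataSize = 5000;   // assumed on-disk size, not from col stats
        long filRows = tsRows / 2;   // assume the filter keeps half of the rows

        // FIL derives its DS from the parent's row count and col stats only,
        // so it can exceed a TS value that came from the raw size.
        long filDataSize = dataSizeFromColStats(filRows, cols);
        System.out.println(tsRawDataSize + " -> " + filDataSize);   // 5000 -> 8250

        // With the same logic applied at TS, DS shrinks after the filter.
        long tsDataSize = dataSizeFromColStats(tsRows, cols);
        System.out.println(tsDataSize + " -> " + filDataSize);      // 16500 -> 8250
    }
}
```

In both cases the filter's own estimate is identical, which is why only the 
explain output changes and not the planning decisions made further down the 
tree.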


- Ashutosh Chauhan





Re: Review Request 39836: HIVE-12309

2015-11-10 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39836/#review105976
---

Ship it!

- Prasanth_J





Review Request 39836: HIVE-12309

2015-10-31 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39836/
---

Review request for hive and Prasanth_J.


Bugs: HIVE-12309
https://issues.apache.org/jira/browse/HIVE-12309


Repository: hive-git


Description
---

TableScan should use column stats when available for better data size estimate


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java e1f8ebc 
  ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out fc4f294 
  ql/src/test/results/clientpositive/annotate_stats_filter.q.out 054b573 
  ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 1b9ec68 
  ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out be3fa1d 
  ql/src/test/results/clientpositive/annotate_stats_join.q.out bc44cc3 
  ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out c864c04 
  ql/src/test/results/clientpositive/annotate_stats_limit.q.out 7300ea0 
  ql/src/test/results/clientpositive/annotate_stats_part.q.out cf523cb 
  ql/src/test/results/clientpositive/annotate_stats_select.q.out 877037d 
  ql/src/test/results/clientpositive/annotate_stats_table.q.out ebc6c5b 
  ql/src/test/results/clientpositive/annotate_stats_union.q.out e09dde3 
  ql/src/test/results/clientpositive/cbo_rp_auto_join0.q.out d1bc6d4 
  ql/src/test/results/clientpositive/cbo_rp_auto_join1.q.out 3b053fe 
  ql/src/test/results/clientpositive/cbo_rp_join0.q.out a8bcc90 
  ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out f87a539 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 5903cd1 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out 2ea1e6e 
  ql/src/test/results/clientpositive/llap/llapdecider.q.out 676a0e4 
  ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 8955a61 
  ql/src/test/results/clientpositive/stats_ppr_all.q.out 7627f7a 
  ql/src/test/results/clientpositive/tez/explainuser_1.q.out ec434f0 
  ql/src/test/results/clientpositive/tez/llapdecider.q.out 676a0e4 

Diff: https://reviews.apache.org/r/39836/diff/


Testing
---

Existing tests


Thanks,

Ashutosh Chauhan