Re: Updates on Benchmarking and Optimization Research for Calcite

2018-03-07 Thread AshwinKumar AshwinKumar
Hi Jesus, Many thanks for your inputs. Best, Ashwin On Mon, Mar 5, 2018 at 12:41 PM, Jesus Camacho Rodriguez < jcama...@apache.org> wrote: > Hi Ashwin, > > 1) It is important that table/column stats are available, so Calcite can > trigger correctly its cost-based optimizations. You can do that

Re: Updates on Benchmarking and Optimization Research for Calcite

2018-03-05 Thread Jesus Camacho Rodriguez
Hi Ashwin, 1) It is important that table/column stats are available, so that Calcite can correctly trigger its cost-based optimizations. You can do that either manually, by running an ANALYZE... COMPUTE STATISTICS FOR COLUMNS statement, or indeed by enabling hive.stats.autogather. 2) Calcite-based
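
For anyone reproducing the setup described in this reply, here is a minimal sketch that generates the statistics-gathering statements for a list of tables. The helper name `stats_statements` and the table names in the usage note are hypothetical and not from the thread; only the `hive.stats.autogather` property and the `ANALYZE TABLE ... COMPUTE STATISTICS [FOR COLUMNS]` statements are Hive's own. It only builds the HiveQL text and assumes you run it yourself through your usual Hive client.

```python
def stats_statements(tables, autogather=True):
    """Build HiveQL statements that make table and column stats available.

    `tables` is a list of table names (hypothetical here); nothing is
    executed, the statements are only returned as strings.
    """
    flag = "true" if autogather else "false"
    # Session-level switch for automatic stats gathering on writes.
    stmts = [f"SET hive.stats.autogather={flag};"]
    for table in tables:
        # Table-level stats (row counts, sizes).
        stmts.append(f"ANALYZE TABLE {table} COMPUTE STATISTICS;")
        # Column-level stats, which the cost-based optimizer relies on.
        stmts.append(f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS;")
    return stmts
```

For example, `"\n".join(stats_statements(["store_sales", "item"]))` yields a script that can be saved to a file and run before the benchmark queries.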

Re: Updates on Benchmarking and Optimization Research for Calcite

2018-03-03 Thread Edmon Begoli
and to clarify the instrumentation aspect -- are there any Calcite logs that are typically turned off that would be good to turn on to get insights, and are there any JVM settings, JMX hooks, HDFS parameters, etc. that would be good to tap into to get a better picture of how the Java memory,

Re: Updates on Benchmarking and Optimization Research for Calcite

2018-03-03 Thread AshwinKumar AshwinKumar
Hello Dev Team, I am trying to run queries on Apache Hive with the flag *hive.cbo.enabled* set to true and then to false, and then compare the metrics. I have a few questions about this: 1. Do I need to set *hive.stats.autogather* (to gather the table statistics) to true as well before

Re: Updates on Benchmarking and Optimization Research for Calcite

2018-02-21 Thread Michael Mior
1. TPC-DS seems like a great starting point to me. SSB would also be a good addition. 2. You can set hive.cbo.enabled in your Hive config to false to turn off the optimizer. 3. Count me interested in general although I have limited time available in the immediate future. I'd be interested in
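
As a rough illustration of point 2, the sketch below wraps one query in two script variants, one with the optimizer on and one with it off, so that both can be run and their metrics compared. The helper name `cbo_variants` and the sample query are hypothetical; the only Hive-specific piece is the `hive.cbo.enabled` property named above, set here per session with `SET` rather than in the config file, which is an assumption about how you prefer to toggle it.

```python
def cbo_variants(query):
    """Return {"true": script, "false": script} for one benchmark query.

    Each script sets hive.cbo.enabled for the session and then runs the
    query. Nothing is executed here; the scripts are plain strings.
    """
    # Normalize the query so each script ends with exactly one semicolon.
    body = query.strip().rstrip(";")
    return {
        flag: f"SET hive.cbo.enabled={flag};\n{body};"
        for flag in ("true", "false")
    }
```

Calling `cbo_variants("SELECT COUNT(*) FROM store_sales;")` gives two scripts that differ only in the flag, which keeps the comparison limited to the optimizer setting.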