RE: statistics collection and propagation for cost-based optimizer

2016-11-14 Thread assaf.mendelson
They are not yet complete. The benchmark was done with an implementation of a cost-based optimizer Huawei had internally for Spark 1.5 (or some even older version). On Mon, Nov 14, 2016 at 10:46 PM

Re: statistics collection and propagation for cost-based optimizer

2016-11-14 Thread Reynold Xin
They are not yet complete. The benchmark was done with an implementation of a cost-based optimizer Huawei had internally for Spark 1.5 (or some even older version). On Mon, Nov 14, 2016 at 10:46 PM, Yogesh Mahajan wrote: > It looks like Huawei team have run TPC-H benchmark

Re: statistics collection and propagation for cost-based optimizer

2016-11-14 Thread Yogesh Mahajan
Thanks, Reynold, for the detailed proposals. A few questions/clarifications - 1) How do the existing rule-based operators co-exist with CBO? The existing rules are heuristic/empirically based; I am assuming rules like predicate pushdown or project pruning will co-exist with CBO and we just want to

Re: statistics collection and propagation for cost-based optimizer

2016-11-14 Thread Reynold Xin
Historically, TPC-DS and TPC-H. There is certainly a chance of overfitting one or two benchmarks. Note that those will probably be impacted more by the way we set the parameters for CBO than by whether we use x or y for summary statistics. On Monday, November 14, 2016, Shivaram Venkataraman <

Re: statistics collection and propagation for cost-based optimizer

2016-11-14 Thread Shivaram Venkataraman
Do we have any query workloads against which we can benchmark the performance of these proposals? Thanks Shivaram On Sun, Nov 13, 2016 at 5:53 PM, Reynold Xin wrote: > One additional note: in terms of size, the size of a count-min sketch with > eps = 0.1% and confidence

Re: statistics collection and propagation for cost-based optimizer

2016-11-13 Thread Reynold Xin
One additional note: in terms of size, the size of a count-min sketch with eps = 0.1% and confidence 0.87, uncompressed, is 48k bytes. To look up what that means, see http://spark.apache.org/docs/latest/api/java/org/apache/spark/util/sketch/CountMinSketch.html On Sun, Nov 13, 2016 at 5:30
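For readers who want to see where the 48k figure may come from, here is a rough back-of-the-envelope sketch. It assumes the usual count-min sizing rules (width = ceil(2 / eps), depth = ceil(-ln(1 - confidence) / ln 2)) and 8-byte counters. Those formulas and the counter size are assumptions about how the sketch is laid out, not something stated in the message, and the helper name `count_min_sketch_bytes` is made up for illustration.

```python
import math


def count_min_sketch_bytes(eps, confidence, bytes_per_counter=8):
    """Estimate the uncompressed size of a count-min sketch.

    Assumed sizing rules (not taken from the thread):
      width = ceil(2 / eps)                        counters per row
      depth = ceil(-ln(1 - confidence) / ln 2)     number of rows
    Each counter is assumed to be a 64-bit integer (8 bytes).
    """
    width = math.ceil(2 / eps)
    depth = math.ceil(-math.log(1 - confidence) / math.log(2))
    return width, depth, width * depth * bytes_per_counter


# eps = 0.1% and confidence = 0.87, the values quoted in the message
width, depth, size = count_min_sketch_bytes(eps=0.001, confidence=0.87)
print(width, depth, size)  # 2000 3 48000
```

Under these assumptions the table has 2000 x 3 counters, which at 8 bytes each comes to 48,000 bytes, consistent with the "48k bytes" quoted above. A confidence of 0.87 sits just under the point where a fourth row would be needed (1 - 2^-3 = 0.875), which may be why that value was picked.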