To: Mendelson, Assaf
Subject: Re: statistics collection and propagation for cost-based optimizer
They are not yet complete. The benchmark was done with an implementation of
cost-based optimizer Huawei had internally for Spark 1.5 (or some even older
version).
On Mon, Nov 14, 2016 at 10:46 PM, Yogesh Mahajan
wrote:
> It looks like Huawei team have run TPC-H benchmark
Thanks Reynold for the detailed proposals. A few questions/clarifications -
1) How will the existing rule-based optimizations co-exist with CBO? The
existing rules are heuristic/empirical; I am assuming rules like predicate
pushdown or project pruning will co-exist with CBO and we just want to
Historically TPC-DS and TPC-H. There is certainly a chance of overfitting to
one or two benchmarks. Note that those will probably be impacted more by the
way we set the parameters for CBO than by whether we use x or y for summary
statistics.
On Monday, November 14, 2016, Shivaram Venkataraman wrote:
Do we have any query workloads for which we can benchmark these
proposals in terms of performance ?
Thanks
Shivaram
On Sun, Nov 13, 2016 at 5:53 PM, Reynold Xin wrote:
> One additional note: in terms of size, the size of a count-min sketch with
> eps = 0.1% and confidence
One additional note: in terms of size, the size of a count-min sketch with
eps = 0.1% and confidence 0.87, uncompressed, is 48k bytes.
To look up what that means, see
http://spark.apache.org/docs/latest/api/java/org/apache/spark/util/sketch/CountMinSketch.html
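For what it's worth, the 48k figure falls out of the usual count-min sketch
sizing arithmetic. A minimal sketch in Python, assuming the parameterization
Spark's CountMinSketch appears to use (width = ceil(2 / eps),
depth = ceil(-ln(1 - confidence) / ln 2), one 8-byte long per counter) -
treat the formulas as an assumption, not a reading of the implementation:

```python
import math

def cms_size_bytes(eps, confidence):
    """Uncompressed size of a count-min sketch, assuming:
    width = ceil(2 / eps), depth = ceil(-ln(1 - confidence) / ln 2),
    and 8-byte long counters."""
    width = math.ceil(2 / eps)
    depth = math.ceil(-math.log(1 - confidence) / math.log(2))
    return width * depth * 8

# eps = 0.1% -> width 2000; confidence 0.87 -> depth 3
# 2000 * 3 * 8 = 48,000 bytes, matching the 48k quoted above.
print(cms_size_bytes(0.001, 0.87))  # 48000
```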
On Sun, Nov 13, 2016 at 5:30