Re: Why some queries use logical.stats while others analyzed.stats?

2018-01-06 Thread Wenchen Fan
yea sure, thanks for doing it!
On Sat, Jan 6, 2018 at 9:25 PM, Jacek Laskowski wrote:
> Hi Wenchen,
>
> That's just now when I stumbled across this comment in
> `LeafNode.computeStats` [1]:
>
> > Leaf nodes that can survive analysis must define their own statistics.
>
> And the other in scaladoc

Re: Why some queries use logical.stats while others analyzed.stats?

2018-01-06 Thread Jacek Laskowski
Hi Wenchen,
That's just now when I stumbled across this comment in `LeafNode.computeStats` [1]:
> Leaf nodes that can survive analysis must define their own statistics.
And the other in scaladoc of AnalysisBarrier [2]
> This analysis barrier will be removed at the end of analysis stage.
That m

Re: Why some queries use logical.stats while others analyzed.stats?

2018-01-04 Thread Jacek Laskowski
Thanks Wenchen. That makes a lot of sense now, after you made the point about AnalysisBarrier. I've been seeing it here and there and haven't spent much time exploring it yet, but it turned out to be important.
Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bi

Re: Why some queries use logical.stats while others analyzed.stats?

2018-01-04 Thread Wenchen Fan
First of all, I think you know that `QueryExecution` is a developer API, right? By definition, `QueryExecution.logical` is the input plan, which can even be unresolved. Developers should be aware of this and should not apply operations that need the plan to be resolved. Obviously `LogicalPlan.stats` needs

Why some queries use logical.stats while others analyzed.stats?

2018-01-04 Thread Jacek Laskowski
Hi, I'm using Spark built from master today.
$ ./bin/spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/
Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0