Re: How to cache SparkPlan.execute for reusing?

2017-03-03 Thread Liang-Chi Hsieh
Not sure what you mean by "its parents have to reuse it by creating new RDDs". Since SparkPlan.execute returns a new RDD every time it is called, you can't expect a cached RDD to be reused automatically, even if you reuse the same SparkPlan across several queries. Btw, are there any existing ways to reuse a SparkPlan?
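
For illustration, a minimal sketch (Scala, assuming Spark 2.x and a local SparkSession; the variable names are mine) of the point above: each call to execute() on the physical plan hands back a distinct RDD[InternalRow], so caching only helps if you hold on to that particular RDD yourself.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("plan-reuse-sketch")
    .getOrCreate()

  val df = spark.range(0, 10).toDF("id")

  // executedPlan is the physical SparkPlan; execute() is an internal API,
  // but it is reachable for experimentation via queryExecution.
  val plan = df.queryExecution.executedPlan

  // Two calls, two different RDD objects: caching rdd1 does nothing for rdd2.
  val rdd1 = plan.execute()
  val rdd2 = plan.execute()
  println(rdd1 eq rdd2)  // expected: false, since execute() builds a new RDD per call

  // The supported way to reuse query results across queries:
  df.cache()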

Re: How to cache SparkPlan.execute for reusing?

2017-03-02 Thread Liang-Chi Hsieh
Internally, in each partition of the resulting RDD[InternalRow], you get back the same UnsafeRow object while iterating over the rows. A typical RDD.cache doesn't work here: the cached output ends up with every row showing the same content. Not sure why you get empty output, though. Dataset.cache() is the intended way to cache SQL query results.
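
A minimal sketch (Scala, Spark 2.x assumed; names are mine) of what goes wrong when caching the internal rows directly, and two ways around it: copying each row, or caching at the Dataset level.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("unsaferow-reuse-sketch")
    .getOrCreate()

  val df = spark.range(0, 5).toDF("id")

  // Internal rows: the iterator reuses one UnsafeRow per partition.
  val internalRdd = df.queryExecution.executedPlan.execute()

  // Broken: the cache holds references to the same mutated UnsafeRow,
  // so all cached rows end up with the same content.
  val badCache = internalRdd.cache()

  // Workaround at this level: copy each row before caching.
  val okCache = internalRdd.map(_.copy()).cache()

  // Supported approach: cache at the Dataset level instead.
  df.cache()
  df.count()  // materializes the cache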