To address one specific question: > Docs says it usues sun.misc.unsafe to convert physical rdd structure into byte array at some point for optimized GC and memory. My question is why is it only applicable to SQL/Dataframe and not RDD? RDD has types too!
A principal difference between RDDs and DataFrames/Datasets is that the latter have a schema associated to them. This means that they support only certain types (primitives, case classes and more) and that they are uniform, whereas RDDs can contain any serializable object and must not necessarily be uniform. These properties make it possible to generate very efficient serialization and other optimizations that cannot be achieved with plain RDDs.