Hi, I do not profess at all that this reply comes from one of the "more advanced people" :)

However, in general a DataFrame adds a two-dimensional (tabular) structure, in other words a schema, to an RDD. An RDD on its own is a construct that cannot be optimised much, because it carries no schema. Converting an RDD to a DF therefore adds the cost of creating that metadata, so there is a cost associated with it.

However, IMO that cost is inevitable, as the later stages of the app will be much better optimised and will compensate for the initial cost. In other words, this is a necessary step and a feature of the tool, much like creating an index on a relational table: there is a cost in creating and maintaining a DF, but the benefits outweigh the cost of adding the metadata.

So in the overall scheme of things it is really an academic question. Can we get away without having a DF and its offshoots such as temporary tables?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 24 June 2016 at 07:58, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I've been asking a similar question myself too! Thanks for sending it to
> the mailing list!
>
> Going from an RDD to a Dataset triggers a job to calculate a schema (unless
> the RDD is RDD[Row]).
>
> I *think* that transitioning from a Dataset to an RDD is almost a no-op,
> since a Dataset requires more to generate underlying data structures and
> optimizations.
>
> Can't wait to hear what more advanced people say.
>
> Jacek
> On 24 Jun 2016 8:00 a.m., "pan" <pranav.na...@gmail.com> wrote:
>
> Hello,
> I am trying to understand the cost of converting an RDD to a DataFrame and
> back. Would converting back and forth very frequently cost performance?
>
> I do observe that some operations like join are implemented very
> differently for RDD (pair) and DataFrame, so I am trying to figure out
> the cost of converting one to the other.
>
> Regards,
> Pranav