Wouldn't toDS() do this without conversion?

On Mon, Jul 13, 2020 at 5:25 PM Ivan Petrov <capacyt...@gmail.com> wrote:
>
> Hi!
> I'm trying to understand the cost of RDD to Dataset conversion.
> It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000
> records.
> It takes around 15 minutes to convert them to Dataset[MyCaseClass].
> The schema of MyCaseClass is:
> str01: String,
> str02: String,
> str03: String,
> str04: String,
> long01: Long,
> long02: Long,
> double01: Double,
> map: Map[String, Double]
>
> What can I do in order to run it faster?
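
For reference, here is a minimal sketch of the two conversion paths being discussed, rdd.toDS() and spark.createDataset(rdd). The case class mirrors the schema from the quoted message. The object name, the local master setting, and the single sample record are assumptions made only for illustration. As far as I know, both calls go through the same implicit Encoder, so comparing the physical plans is one way to check whether a serialization step is present in either path.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Schema copied from the quoted message.
case class MyCaseClass(
  str01: String,
  str02: String,
  str03: String,
  str04: String,
  long01: Long,
  long02: Long,
  double01: Double,
  map: Map[String, Double]
)

// Hypothetical driver object; name and local master are illustrative only.
object RddToDatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings the Encoder[MyCaseClass] and toDS() into scope

    // Tiny stand-in for the real RDD[MyCaseClass].
    val rdd = spark.sparkContext.parallelize(Seq(
      MyCaseClass("a", "b", "c", "d", 1L, 2L, 3.0, Map("k" -> 1.0))
    ))

    // Path 1: implicit conversion on the RDD.
    val viaToDS: Dataset[MyCaseClass] = rdd.toDS()

    // Path 2: explicit call on the session.
    val viaCreate: Dataset[MyCaseClass] = spark.createDataset(rdd)

    // Print both physical plans to compare what each path actually does.
    viaToDS.explain()
    viaCreate.explain()

    spark.stop()
  }
}
```

If both plans show the same object-serialization step on top of the RDD scan, the 15 minutes would be the cost of encoding each record into Spark's internal row format rather than something specific to one of the two calls.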