Hi! I'm trying to understand the cost of converting an RDD to a Dataset. It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000 records, and around 15 minutes to convert it to a Dataset[MyCaseClass]. The schema of MyCaseClass is: str01: String, str02: String, str03: String, str04: String, long01: Long, long02: Long, double01: Double, map: Map[String, Double].
What can I do to make it run faster?