scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance

Ivan Petrov Mon, 13 Jul 2020 15:26:11 -0700

Hi!
I'm trying to understand the cost of converting an RDD to a Dataset.
It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000
records.
Converting it to a Dataset[MyCaseClass] then takes around 15 minutes.
The schema of MyCaseClass is:
str01: String,
str02: String,
str03: String,
str04: String,
long01: Long,
long02: Long,
double01: Double,
map: Map[String, Double]
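
For concreteness, the conversion in question can be sketched roughly as below. This is only a minimal illustration under assumptions, not the actual job: the object name RddToDatasetSketch, the helper toDataset, the local master setting, and the single sample record are all made up for the example. Only the field names and types come from the schema above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Field names and types follow the schema listed above.
case class MyCaseClass(
  str01: String,
  str02: String,
  str03: String,
  str04: String,
  long01: Long,
  long02: Long,
  double01: Double,
  map: Map[String, Double]
)

// Hypothetical wrapper object, used only for this sketch.
object RddToDatasetSketch {

  // Converts the RDD to a Dataset using the implicit case-class encoder
  // brought into scope by spark.implicits._
  def toDataset(spark: SparkSession, rdd: RDD[MyCaseClass]): Dataset[MyCaseClass] = {
    import spark.implicits._
    spark.createDataset(rdd)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just to make the sketch runnable.
    val spark = SparkSession.builder()
      .appName("rdd-to-dataset-sketch")
      .master("local[*]")
      .getOrCreate()

    // Assumption: one made-up record standing in for the real data.
    val rdd = spark.sparkContext.parallelize(Seq(
      MyCaseClass("a", "b", "c", "d", 1L, 2L, 3.0, Map("k" -> 1.0))
    ))

    val ds = toDataset(spark, rdd)
    ds.printSchema()

    spark.stop()
  }
}
```

With spark.implicits._ in scope, rdd.toDS() is an equivalent way to write the same conversion.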


What can I do to make the conversion run faster?
