Hi!
I'm trying to understand the cost of RDD to Dataset conversion.
It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000
records.
It then takes around 15 minutes to convert them to a Dataset[MyCaseClass].
The schema of MyCaseClass is:
str01: String,
str02: String,
str03: String,
str04: String,
long01: Long,
long02: Long,
double01: Double,
map: Map[String, Double]
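
A minimal sketch of the kind of conversion I mean, assuming the standard toDS() route via spark.implicits._. The object name RddToDataset, the helper toDataset, the local master and the single sample record are placeholders for illustration only, not the real job:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Case class with the schema listed above. It is declared at top level
// so that Spark can derive an encoder for it.
case class MyCaseClass(
  str01: String,
  str02: String,
  str03: String,
  str04: String,
  long01: Long,
  long02: Long,
  double01: Double,
  map: Map[String, Double]
)

object RddToDataset {
  // Hypothetical helper: converts an RDD of case-class instances to a
  // typed Dataset using the implicit product encoder.
  def toDataset(spark: SparkSession, rdd: RDD[MyCaseClass]): Dataset[MyCaseClass] = {
    import spark.implicits._
    rdd.toDS()
  }

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration; the real job runs on a cluster.
    val spark = SparkSession.builder()
      .appName("rdd-to-dataset")
      .master("local[*]")
      .getOrCreate()

    // One made-up record standing in for the real input.
    val rdd = spark.sparkContext.parallelize(Seq(
      MyCaseClass("a", "b", "c", "d", 1L, 2L, 3.0, Map("k" -> 1.0))
    ))

    val ds = toDataset(spark, rdd)
    ds.printSchema()
    println(ds.count())

    spark.stop()
  }
}
```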

What can I do to make it run faster?
