Wouldn't toDS() do this without conversion?
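
For reference, a minimal sketch of the toDS() route, assuming a local
SparkSession and a single made-up sample record (the object name, app name,
master setting, and sample values are illustrative and not from the original
message; the case class fields follow the schema quoted below). As far as I
know, toDS() on an RDD goes through spark.createDataset(rdd), so each object
is still run through the case-class encoder; it may be worth timing both
paths rather than assuming the cost disappears.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Fields mirror the schema in the quoted message.
// Defined at top level so Spark can derive an implicit Encoder for it.
case class MyCaseClass(
  str01: String,
  str02: String,
  str03: String,
  str04: String,
  long01: Long,
  long02: Long,
  double01: Double,
  map: Map[String, Double]
)

object ToDsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .appName("toDS-sketch")
      .master("local[*]")
      .getOrCreate()

    // Brings the RDD-to-DatasetHolder conversion and product encoders into scope.
    import spark.implicits._

    // Hypothetical one-record RDD standing in for the real data.
    val rdd: RDD[MyCaseClass] = spark.sparkContext.parallelize(Seq(
      MyCaseClass("a", "b", "c", "d", 1L, 2L, 3.0, Map("k" -> 1.0))
    ))

    // Typed conversion; the encoder is resolved implicitly from the case class.
    val ds: Dataset[MyCaseClass] = rdd.toDS()
    ds.printSchema()

    spark.stop()
  }
}
```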

On Mon, Jul 13, 2020 at 5:25 PM Ivan Petrov <capacyt...@gmail.com> wrote:
>
> Hi!
> I'm trying to understand the cost of RDD to Dataset conversion.
> It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000
> records.
> It takes around 15 minutes to convert them to Dataset[MyCaseClass].
> The schema of MyCaseClass is:
> str01: String,
> str02: String,
> str03: String,
> str04: String,
> long01: Long,
> long02: Long,
> double01: Double,
> map: Map[String, Double]
>
> What can I do to make it run faster?
