Current state of dataset api

Magnus Nilsson Mon, 04 Oct 2021 03:55:48 -0700

Hi,

I tried using the (typed) Dataset API about three years ago. At the
time there were limitations with predicate pushdown, serialization
overhead, and maybe other things I've forgotten. Ultimately we chose
the DataFrame API as the sweet spot.


Does anyone know of a good overview of the current state of the
Dataset API, pros/cons as of Spark 3?

Is it fully usable? Do you get the advantages of a strongly typed
DataFrame? Are there any known limitations or drawbacks to take into
account?

br,

Magnus

