Hi,

I tried using the (typed) Dataset API about three years ago. Back then
there were limitations with predicate pushdown, serialization overhead,
and maybe other things I've forgotten. Ultimately we chose the
DataFrame API as the sweet spot.

Does anyone know of a good overview of the current state of the
Dataset API, including its pros and cons as of Spark 3?

Is it fully usable, and do you get the advantages of a strongly typed
DataFrame? Are there any known limitations or drawbacks to take into
account?

br,

Magnus
