Hi Spark gurus,

I was surprised to read here:
https://stackoverflow.com/questions/50129411/why-is-predicate-pushdown-not-used-in-typed-dataset-api-vs-untyped-dataframe-ap

that filters are not pushed down in typed Datasets, and that one should
stick to DataFrames instead.
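
For concreteness, this is how I read the problem (a minimal sketch; the
Event class, the path, and the session setup are all made up for
illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("pushdown-demo").getOrCreate()
    import spark.implicits._

    case class Event(id: Long, kind: String)

    val ds = spark.read.parquet("/data/events").as[Event]

    // Typed filter: the lambda is a black box to Catalyst, so the
    // predicate is not pushed down into the Parquet scan.
    ds.filter(_.kind == "click").explain()

    // Column-based filter: Catalyst sees the expression, and the plan
    // shows it under PushedFilters in the scan node.
    ds.filter($"kind" === "click").explain()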

But writing groupByKey + mapGroups code with DataFrames is a headache
compared to the typed Dataset API. The latter mostly doesn't force you to
write any Encoders (unless you write generic transformations on a
parametrized Dataset[T]), nor does it force you into ugly Row parsing with
getInt, getString, etc., the way DataFrames do.
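
Roughly, the contrast I have in mind (continuing the made-up Event example
from above; "max id per kind" is just a placeholder aggregation):

    // Typed: fields accessed by name, encoders derived implicitly
    ds.groupByKey(_.kind)
      .mapGroups((kind, events) => (kind, events.map(_.id).max))

    // Untyped: the same logic over Rows means positional parsing,
    // which silently breaks if the column order ever changes
    ds.toDF().groupByKey(_.getString(1))
      .mapGroups((kind, rows) => (kind, rows.map(_.getLong(0)).max))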

So, what should the poor Spark user rely on by default, if the goal is to
deliver a library of data transformations -- Dataset or DataFrame?

best regards
--
Valery
