Hi Spark gurus, I was surprised to read here: https://stackoverflow.com/questions/50129411/why-is-predicate-pushdown-not-used-in-typed-dataset-api-vs-untyped-dataframe-ap
that filter predicates are not pushed down through the typed Dataset API, and that one should therefore stick to DataFrames.

But writing groupByKey + mapGroups code with DataFrames is a headache compared to the typed Dataset. The latter mostly doesn't force you to write any Encoders (unless you try to write generic transformations over a parametrized Dataset[T]), and it doesn't force the ugly Row parsing with getInt, getString, etc. that DataFrames require.

So, what should the poor Spark user rely on by default, if the goal is to deliver a library of data transformations -- Dataset or DataFrame? (Two small sketches of what I mean are below the signature.)

best regards -- Valery
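P.S. To make the pushdown point concrete, here is a minimal sketch of the two filter styles. The Event case class, its field names, and the /data/events path are made up purely for illustration:

  import org.apache.spark.sql.SparkSession

  case class Event(userId: Int, country: String, amount: Double)

  val spark = SparkSession.builder.appName("pushdown-demo").getOrCreate()
  import spark.implicits._

  val ds = spark.read.parquet("/data/events").as[Event]

  // Column-based predicate: Catalyst sees it and can push it down into
  // the Parquet scan (explain() shows it under PushedFilters).
  ds.filter($"country" === "DE").explain()

  // Typed predicate: the lambda is opaque to Catalyst, so every row is
  // deserialized into an Event before the function runs -- no pushdown.
  ds.filter(_.country == "DE").explain()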
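P.P.S. And the ergonomics point, reusing the hypothetical Event and ds from the sketch above -- the same per-key aggregation written both ways (note that both variants still pick up their Encoders implicitly from spark.implicits._):

  // Typed Dataset: fields are plain case-class accessors.
  val typed = ds
    .groupByKey(_.userId)
    .mapGroups((id, events) => (id, events.map(_.amount).sum))

  // DataFrame: the same logic forces manual Row parsing.
  import org.apache.spark.sql.Row
  val untyped = ds.toDF()
    .groupByKey((row: Row) => row.getInt(row.fieldIndex("userId")))
    .mapGroups((id, rows) => (id, rows.map(_.getAs[Double]("amount")).sum))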