>
> > In many cases, the current implementation of the Dataset API does not
> > yet leverage the additional information it has and can be slower than RDDs.
>
>
> Are the characteristics of the cases above known, so that users can decide
> which API to use?
>

Long chains of back-to-back operations aren't efficient yet, because we
serialize and deserialize unnecessarily between the steps. For example:
https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/DatasetPerformance.scala#L37
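A minimal sketch of the kind of chain meant here, for anyone who wants to inspect it locally. It assumes Spark 2.x or later (it uses the SparkSession entry point, not the SQLContext of this thread's era), and the object name `BackToBackOps` and the method `run` are hypothetical. It runs the same three typed `map` steps through a Dataset and through an RDD and prints the Dataset's physical plan. Whether a serialize/deserialize round-trip still appears between every step depends on the Spark version, since later optimizer work may remove some of them, so check the plan rather than assume.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object BackToBackOps {
  // Hypothetical helper: runs the same three-step chain through the typed
  // Dataset API and the RDD API, returning both results sorted.
  def run(spark: SparkSession, n: Long): (Array[Long], Array[Long]) = {
    import spark.implicits._

    val ds: Dataset[Long] = spark.range(0, n).as[Long]

    // Each typed lambda is opaque to the optimizer: rows have to be decoded
    // into JVM Long values before the lambda runs and encoded back into
    // Spark's internal row format for the result.
    val typed = ds.map(_ + 1).map(_ * 2).map(_ - 3)

    // The RDD chain works on JVM objects throughout, with no encoder involved.
    val viaRdd = ds.rdd.map(_ + 1).map(_ * 2).map(_ - 3)

    // Print the physical plan to see where serialization and deserialization
    // operators sit around the MapElements steps in this Spark version.
    typed.explain()

    (typed.collect().sorted, viaRdd.collect().sorted)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("back-to-back-ops")
      .getOrCreate()
    try {
      val (typed, viaRdd) = run(spark, 10L)
      println(typed.mkString(", "))
      println(viaRdd.mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

Both chains should produce the same values. Only the execution path differs, and that path is what the linked benchmark measures.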


>
> For custom encoders, I did a quick search but couldn't find the JIRA.
> Can you share the JIRA number?
>

This is probably the closest thing:
https://issues.apache.org/jira/browse/SPARK-7768
