Hi,
There are functions like map, flatMap, and reduce that form the basic data processing operations in big data (and in Apache Spark). But in recent versions Spark introduces the high-level DataFrame API and recommends using it, even though the DataFrame API has no such functions; it only offers many built-in functions plus UDFs. That feels very inflexible (at least to me), and at many points I have to convert DataFrames to RDDs and vice versa.
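
Roughly, the kind of thing I end up writing looks like this (just an illustrative word-count sketch; the file path and column names are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("example").getOrCreate()
    import spark.implicits._

    val lines = spark.read.textFile("input.txt")        // Dataset[String]
    val counts = lines.rdd                               // drop down to the RDD API
      .flatMap(_.split("\\s+"))                          // split each line into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)                                // classic map/reduce style
    val countsDF = counts.toDF("word", "count")          // back to a DataFrame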
My questions are:

1. Is the RDD API going to become outdated? If so, what is the recommended way to do this kind of processing with Apache Spark, given that the DataFrame API doesn't support functions like map and reduce?
2. How do UDFs process the data? Are they applied to every row, like a map function?
3. Does converting a DataFrame to an RDD (and back) come with a significant cost?
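
For context on the UDF question, this is the style I am comparing against (again only a sketch; the column and function names are placeholders):

    import org.apache.spark.sql.functions.udf

    val df = spark.read.text("input.txt")                // single column "value"
    val wordCount = udf((line: String) => line.split("\\s+").length)
    val withCounts = df.withColumn("n_words", wordCount(df("value")))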
