We are running Spark 2.2.0 on a Hadoop cluster, and I worked on a proof of
concept that reads event-based data into Spark Datasets and operates over
those Datasets to calculate differences between events.

More specifically, we have ordered position data with odometer values, and
we want to calculate the number of miles traveled within certain
jurisdictions, per vehicle.

My prototype uses some Dataset interfaces (such as map, which requires
Encoders, and groupByKey) that are marked experimental (even in the 2.4.0
release).  While I understand that "experimental" means changes may occur
in future releases, I would like to know: would others avoid using these
experimental interfaces in production code at all costs?  We control when
we upgrade to newer versions of Spark, so we can test for compatibility as
new releases come out, but I'm still a bit hesitant to rely on these
interfaces going forward.
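For context, here is a minimal sketch of the kind of pipeline I mean.  The
case classes, field names, and object name are hypothetical stand-ins for
our actual schema; the point is the typed groupByKey/mapGroups pattern,
which is what carries the experimental annotation in 2.2.0:

```scala
import org.apache.spark.sql.Dataset

// Hypothetical event schema: one odometer reading per position event.
case class PositionEvent(vehicleId: String, jurisdiction: String,
                         timestamp: Long, odometerMiles: Double)
case class MilesByJurisdiction(vehicleId: String, jurisdiction: String,
                               miles: Double)

object MilesPoc {
  def milesPerJurisdiction(events: Dataset[PositionEvent])
      : Dataset[MilesByJurisdiction] = {
    val spark = events.sparkSession
    import spark.implicits._  // supplies the Encoders the typed API needs

    events
      .groupByKey(e => (e.vehicleId, e.jurisdiction))  // @Experimental in 2.2.0
      .mapGroups { case ((vehicle, jur), iter) =>
        // Order the group's events, then sum consecutive odometer deltas.
        val readings = iter.toSeq.sortBy(_.timestamp).map(_.odometerMiles)
        val miles = readings.sliding(2).collect {
          case Seq(a, b) => b - a
        }.sum
        MilesByJurisdiction(vehicle, jur, miles)
      }
  }
}
```

The non-experimental alternative I'd be weighing against this is the
untyped DataFrame route (groupBy on columns plus aggregate expressions),
which is stable but gives up the compile-time typing that made Datasets
attractive in the first place.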

Since our prototype is showing success, we are considering using it for a
new application, and I would like feedback on whether I should try to
rework it using non-experimental interfaces while I still have the time.
So far I have found Datasets great for processing data and would like to
keep using them.

Thanks.
