I have a fairly large data set that I need to perform a GroupByKey on.
This is by far the most time-consuming part of my pipeline, and I'm looking
for ways to optimize it.  The data is largely static and only changes
periodically, so it pains me to wait on the GBK every time I run the
pipeline.  Is there any way to cache the result of the operation and load
the data already grouped on each run?
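
To make the idea concrete, here's roughly what I have in mind, as a plain-Python
sketch outside the pipeline framework (the cache filename and helper names are
just made up for illustration): group once, pickle the grouped result to disk,
and on later runs load the pre-grouped data instead of regrouping.

```python
import pickle
from collections import defaultdict
from pathlib import Path

CACHE = Path("grouped_cache.pkl")  # hypothetical cache location

def group_by_key(pairs):
    """Group an iterable of (key, value) pairs into {key: [values]}."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def load_or_group(pairs):
    """Return cached grouped data if present; otherwise group and cache it."""
    if CACHE.exists():
        return pickle.loads(CACHE.read_bytes())
    groups = group_by_key(pairs)
    CACHE.write_bytes(pickle.dumps(groups))
    return groups
```

Something equivalent inside the pipeline (write the grouped output to files,
then read it back in later runs) would work just as well for my purposes.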

thanks
--Cory