Hi team,

We have a specific use case where we are trying to save off a map from the train function and reuse it in the predict function to reduce our predict function's response time. I know that collect() forces everything to the driver, but we are collecting the RDD to a map because we don't have a SparkContext in the predict function.
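For context, here is roughly the shape of what we have today (a minimal sketch: the case classes, field names, and scoring data are simplified stand-ins for our actual code, and the package names assume the incubator releases):

    import org.apache.predictionio.controller.{P2LAlgorithm, Params}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    case class AlgorithmParams(n: Int) extends Params
    case class Query(item: String)
    case class PredictedResult(score: Double)
    case class PreparedData(scores: RDD[(String, Double)])

    // The collected map lives in the model object on the driver.
    class Model(val scores: Map[String, Double]) extends Serializable

    class Algorithm(ap: AlgorithmParams)
      extends P2LAlgorithm[PreparedData, Model, Query, PredictedResult] {

      def train(sc: SparkContext, data: PreparedData): Model = {
        // collectAsMap() pulls every (item, score) pair onto the driver;
        // this is the step that trips spark.driver.maxResultSize.
        new Model(data.scores.collectAsMap().toMap)
      }

      def predict(model: Model, query: Query): PredictedResult = {
        // No SparkContext here; just an in-memory map lookup.
        PredictedResult(model.scores.getOrElse(query.item, 0.0))
      }
    }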
I am getting the error below and am looking for a way to raise the limit from 1G to 4G+. I can see a way to do it in Spark 1.6, but we are using Spark 2.1.1 and I have not found the equivalent setting. *Has anyone been able to adjust maxResultSize to something more than 1G?*

    Exception in thread "main" org.apache.spark.SparkException: Job aborted
    due to stage failure: Total size of serialized results of 7 tasks
    (1156.3 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

I have tried to set this parameter, but with Spark 2.1.1 I get:

    Error: Unrecognized option: --driver-maxResultSize

Our other option is to do the work to obtain a SparkContext in the predict function so we can pass the RDD through from the train function to the predict function. The PredictionIO documentation was a little unclear to me. *Is this the right place to learn how to get a SparkContext in the predict function?*
https://predictionio.incubator.apache.org/templates/vanilla/dase/

Also, I am not seeing in that documentation how to get the SparkContext into the predict function; it looks like it is only used in the train function.

Thanks in advance for your expertise.

*Shane Johnson | 801.360.3350*
LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook <https://www.facebook.com/shane.johnson.71653>
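P.S. To clarify what we are after: in a plain Spark app we would set this as a generic Spark property rather than a dedicated spark-submit flag, either with --conf spark.driver.maxResultSize=4g on spark-submit or on the SparkConf before the context is created. A minimal standalone sketch (the app name and job body are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object MaxResultSizeExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("max-result-size-example") // placeholder app name
          .set("spark.driver.maxResultSize", "4g") // raise the 1g default
        val sc = new SparkContext(conf)
        // ... job that collects a large result back to the driver ...
        sc.stop()
      }
    }

Since PredictionIO creates the SparkContext for us, what we really need is for the equivalent of pio train -- --conf spark.driver.maxResultSize=4g (assuming arguments after -- are forwarded to spark-submit) to take effect.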
