Thanks Donald, I saw that as well and am re-running after removing the extra double dash in the train command. I think I am going to use PAlgorithm, as it will open up more options. Thanks for the support.
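
For reference, this is roughly the shape I have in mind based on your pointers below. Just a sketch; the class and field names (ScoreModel, ScoreAlgorithm, data.records, and so on) are placeholders rather than anything from a real template:

import org.apache.predictionio.controller.PAlgorithm
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Placeholder model: keep the scores as an RDD instead of
// collecting them to the driver during train.
class ScoreModel(val scores: RDD[(String, Double)]) extends Serializable

class ScoreAlgorithm
  extends PAlgorithm[PreparedData, ScoreModel, Query, PredictedResult] {

  def train(sc: SparkContext, data: PreparedData): ScoreModel = {
    // build (id -> score) pairs and return them still distributed
    new ScoreModel(data.records.map(r => (r.id, r.score)))
  }

  def predict(model: ScoreModel, query: Query): PredictedResult = {
    // lookup() runs as a Spark job against the model's RDD, so no
    // oversized collect has to happen on the driver at train time
    PredictedResult(model.scores.lookup(query.id).headOption.getOrElse(0.0))
  }
}

If I am reading the docs right, I may also need to implement PersistentModel so that pio deploy can reload the RDD-backed model, but I will confirm that as I go.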
*Shane Johnson | 801.360.3350*
LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
<https://www.facebook.com/shane.johnson.71653>

2018-02-23 8:03 GMT-10:00 Donald Szeto <[email protected]>:

> Hey Shane,
>
> Quick correction: there was an extra double dash in your train command.
> Please try
>
> bin/pio train -- --driver-memory 14G --conf spark.driver.maxResultSize=4g
>
> Regards,
> Donald
>
> On Fri, Feb 23, 2018 at 8:10 AM Shane Johnson <
> [email protected]> wrote:
>
>> Thanks Donald. I used this command but am still getting the error
>> below; it doesn't seem to be adjusting the configuration. Do you see a
>> problem in how I used the spark-submit options? The train function ran,
>> but the error makes me think spark.driver.maxResultSize was not
>> adjusted.
>>
>> command:
>>
>> bin/pio build --verbose; bin/pio train -- --driver-memory 14G -- --conf
>> spark.driver.maxResultSize=4g; bin/pio deploy
>>
>> error:
>>
>> Job aborted due to stage failure: Total size of serialized results of
>> 8 tasks (1236.7 MB) is bigger than spark.driver.maxResultSize (1024.0
>> MB)
>>
>> Regarding PAlgorithm: what I am trying to do is save a Map in the
>> train method and reuse it in the predict method. Because of the error
>> above, I am not able to convert my RDD to a map, since collectAsMap
>> tries to bring it to the driver. If I use PAlgorithm, I should be able
>> to save the RDD in the Model class and then use it in the predict
>> method. I am going down that path now. *Do you know of any templates
>> that use PAlgorithm?* The docs say that "Similar Product" uses it, but
>> it looks like it actually uses P2LAlgorithm.
>>
>> Thank you for your help.
>>
>> *Shane Johnson | 801.360.3350*
>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>> <https://www.facebook.com/shane.johnson.71653>
>>
>> 2018-02-22 8:16 GMT-10:00 Donald Szeto <[email protected]>:
>>
>>> Hi Shane,
>>>
>>> I think what you are looking for, to set the max result size on the
>>> driver, is to pass in a spark-submit argument that looks something
>>> like this:
>>>
>>> pio train ... -- --conf spark.driver.maxResultSize=4g ...
>>>
>>> Regarding PAlgorithm, the predict() method does not actually have the
>>> SparkContext in it (http://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PAlgorithm).
>>> The "model" argument, unlike P2LAlgorithm's, can contain RDDs. In
>>> PAlgorithm.predict(), you are able to perform RDD operations directly
>>> on the model argument. If the SparkContext is needed, the context()
>>> method can be used on the model RDD.
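>>>
>>> For illustration, predict() could look roughly like this (the model
>>> and field names are made up, not from a specific template):
>>>
>>> def predict(model: MyModel, query: Query): PredictedResult = {
>>>   // any RDD operation works directly on the model's member RDD
>>>   val top = model.scores.filter(_._1 == query.id).values.collect()
>>>   // and the SparkContext, when needed, comes from the RDD itself:
>>>   // val sc = model.scores.context
>>>   PredictedResult(top.headOption.getOrElse(0.0))
>>> }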
>>>
>>> Hope these help.
>>>
>>> Regards,
>>> Donald
>>>
>>> On Wed, Feb 21, 2018 at 12:08 PM Shane Johnson <
>>> [email protected]> wrote:
>>>
>>>> Hi team,
>>>>
>>>> We have a specific use case where we are trying to save off a map
>>>> from the train function and reuse it in the predict function, to
>>>> improve our predict function's response time. I know that collect()
>>>> forces everything to the driver. We are collecting the RDD to a map
>>>> because we don't have a SparkContext in the predict function.
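>>>>
>>>> Concretely, the step that forces everything onto the driver looks
>>>> roughly like this (names are placeholders):
>>>>
>>>> // in train(): turn the distributed scores into a plain Map so that
>>>> // predict(), which has no SparkContext, can use it
>>>> val lookup: Map[String, Double] = scoresRDD.collectAsMap().toMap
>>>>
>>>> It is the serialized results of that collect that blow past the
>>>> 1 GB spark.driver.maxResultSize default.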
>>>>
>>>> I am getting the error below and am looking for a way to adjust the
>>>> parameter from 1G to 4G+. I can see a way to do it in Spark 1.6, but
>>>> we are using Spark 2.1.1 and I have not seen a way to set it there.
>>>> *Has anyone been able to adjust maxResultSize to something more than
>>>> 1G?*
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Job
>>>> aborted due to stage failure: Total size of serialized results of 7
>>>> tasks (1156.3 MB) is bigger than spark.driver.maxResultSize (1024.0
>>>> MB)
>>>>
>>>> I have tried to set the parameter directly, but with Spark 2.1.1 I
>>>> get:
>>>>
>>>> Error: Unrecognized option: --driver-maxResultSize
>>>>
>>>> Our other option is to do the work to obtain a SparkContext in the
>>>> predict function so we can pass the RDD through from the train
>>>> function to the predict function. The PredictionIO documentation was
>>>> a little unclear to me. *Is this the right place to learn how to get
>>>> a SparkContext in the predict function?*
>>>> https://predictionio.incubator.apache.org/templates/vanilla/dase/
>>>>
>>>> Also, I am not seeing in that documentation how to get the Spark
>>>> context into the predict function; it looks like it is only used in
>>>> the train function.
>>>>
>>>> Thanks in advance for your expertise.
>>>>
>>>> *Shane Johnson | 801.360.3350*
>>>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>>>> <https://www.facebook.com/shane.johnson.71653>