Thanks Donald, I saw that as well and am re-running after removing the extra double dash in the train command. I think I am going to use PAlgorithm, as it will open up more options. Thanks for the support.
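
For reference, this is roughly the shape I have in mind based on your pointers below. Just a sketch; the class and field names (ScoreModel, ScoreAlgorithm, data.records, and so on) are placeholders rather than anything from a real template:

import org.apache.predictionio.controller.PAlgorithm
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Placeholder model: keep the scores as an RDD instead of
// collecting them to the driver during train.
class ScoreModel(val scores: RDD[(String, Double)]) extends Serializable

class ScoreAlgorithm
  extends PAlgorithm[PreparedData, ScoreModel, Query, PredictedResult] {

  def train(sc: SparkContext, data: PreparedData): ScoreModel = {
    // build (id -> score) pairs and return them still distributed
    new ScoreModel(data.records.map(r => (r.id, r.score)))
  }

  def predict(model: ScoreModel, query: Query): PredictedResult = {
    // lookup() runs as a Spark job against the model's RDD, so no
    // oversized collect has to happen on the driver at train time
    PredictedResult(model.scores.lookup(query.id).headOption.getOrElse(0.0))
  }
}

If I am reading the docs right, I may also need to implement PersistentModel so that pio deploy can reload the RDD-backed model, but I will confirm that as I go.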
*Shane Johnson | 801.360.3350*
LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
<https://www.facebook.com/shane.johnson.71653>

2018-02-23 8:03 GMT-10:00 Donald Szeto <[email protected]>:

> Hey Shane,
>
> Quick correction: there was an extra double dash in your train command.
> Please try
>
> bin/pio train -- --driver-memory 14G --conf spark.driver.maxResultSize=4g
>
> Regards,
> Donald
>
> On Fri, Feb 23, 2018 at 8:10 AM Shane Johnson <
> [email protected]> wrote:
>
>> Thanks Donald. I used this command but am still getting the error
>> below; it doesn't seem to be adjusting the configuration. Do you see a
>> problem in how I used the spark-submit options? The train function ran,
>> but the error makes me think spark.driver.maxResultSize was not
>> adjusted.
>>
>> command:
>>
>> bin/pio build --verbose; bin/pio train -- --driver-memory 14G -- --conf
>> spark.driver.maxResultSize=4g; bin/pio deploy
>>
>> error:
>>
>> Job aborted due to stage failure: Total size of serialized results of
>> 8 tasks (1236.7 MB) is bigger than spark.driver.maxResultSize (1024.0
>> MB)
>>
>> Regarding PAlgorithm: what I am trying to do is save a Map in the
>> train method and reuse it in the predict method. Because of the error
>> above, I am not able to convert my RDD to a map, since collectAsMap
>> tries to bring it to the driver. If I use PAlgorithm, I should be able
>> to save the RDD in the Model class and then use it in the predict
>> method. I am going down that path now. *Do you know of any templates
>> that use PAlgorithm?* The docs say that "Similar Product" uses it, but
>> it looks like it actually uses P2LAlgorithm.
>>
>> Thank you for your help.
>>
>> *Shane Johnson | 801.360.3350*
>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>> <https://www.facebook.com/shane.johnson.71653>
>>
>> 2018-02-22 8:16 GMT-10:00 Donald Szeto <[email protected]>:
>>
>>> Hi Shane,
>>>
>>> I think what you are looking for, to set the max result size on the
>>> driver, is to pass in a spark-submit argument that looks something
>>> like this:
>>>
>>> pio train ... -- --conf spark.driver.maxResultSize=4g ...
>>>
>>> Regarding PAlgorithm, the predict() method does not actually have the
>>> SparkContext in it (http://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PAlgorithm).
>>> The "model" argument, unlike P2LAlgorithm's, can contain RDDs. In
>>> PAlgorithm.predict(), you are able to perform RDD operations directly
>>> on the model argument. If the SparkContext is needed, the context()
>>> method can be used on the model RDD.
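>>>
>>> For illustration, predict() could look roughly like this (the model
>>> and field names are made up, not from a specific template):
>>>
>>> def predict(model: MyModel, query: Query): PredictedResult = {
>>>   // any RDD operation works directly on the model's member RDD
>>>   val top = model.scores.filter(_._1 == query.id).values.collect()
>>>   // and the SparkContext, when needed, comes from the RDD itself:
>>>   // val sc = model.scores.context
>>>   PredictedResult(top.headOption.getOrElse(0.0))
>>> }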
>>>
>>> Hope these help.
>>>
>>> Regards,
>>> Donald
>>>
>>> On Wed, Feb 21, 2018 at 12:08 PM Shane Johnson <
>>> [email protected]> wrote:
>>>
>>>> Hi team,
>>>>
>>>> We have a specific use case where we are trying to save off a map
>>>> from the train function and reuse it in the predict function, to
>>>> improve our predict function's response time. I know that collect()
>>>> forces everything to the driver. We are collecting the RDD to a map
>>>> because we don't have a SparkContext in the predict function.
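>>>>
>>>> Concretely, the step that forces everything onto the driver looks
>>>> roughly like this (names are placeholders):
>>>>
>>>> // in train(): turn the distributed scores into a plain Map so that
>>>> // predict(), which has no SparkContext, can use it
>>>> val lookup: Map[String, Double] = scoresRDD.collectAsMap().toMap
>>>>
>>>> It is the serialized results of that collect that blow past the
>>>> 1 GB spark.driver.maxResultSize default.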
>>>>
>>>> I am getting the error below and am looking for a way to adjust the
>>>> parameter from 1G to 4G+. I can see a way to do it in Spark 1.6, but
>>>> we are using Spark 2.1.1 and I have not seen a way to set it there.
>>>> *Has anyone been able to adjust maxResultSize to something more than
>>>> 1G?*
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Job
>>>> aborted due to stage failure: Total size of serialized results of 7
>>>> tasks (1156.3 MB) is bigger than spark.driver.maxResultSize (1024.0
>>>> MB)
>>>>
>>>> I have tried to set the parameter directly, but with Spark 2.1.1 I
>>>> get:
>>>>
>>>> Error: Unrecognized option: --driver-maxResultSize
>>>>
>>>> Our other option is to do the work to obtain a SparkContext in the
>>>> predict function so we can pass the RDD through from the train
>>>> function to the predict function. The PredictionIO documentation was
>>>> a little unclear to me. *Is this the right place to learn how to get
>>>> a SparkContext in the predict function?*
>>>> https://predictionio.incubator.apache.org/templates/vanilla/dase/
>>>>
>>>> Also, I am not seeing in that documentation how to get the Spark
>>>> context into the predict function; it looks like it is only used in
>>>> the train function.
>>>>
>>>> Thanks in advance for your expertise.
>>>>
>>>> *Shane Johnson | 801.360.3350*
>>>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>>>> <https://www.facebook.com/shane.johnson.71653>