Hey Shane, Quick correction: there was an extra double dash in your train command. Please try
bin/pio train -- --driver-memory 14G --conf spark.driver.maxResultSize=4g Regards, Donald On Fri, Feb 23, 2018 at 8:10 AM Shane Johnson <shanewaldenjohn...@gmail.com> wrote: > Thanks Donald. I used this command but am still getting this error. It > doesn't seem to be adjusting the configuration. Do you see a problem in how > I used the spark-submit options. The train function ran but the error makes > me think the sparkResultSize was not adjusted. > > command: > > bin/pio build --verbose; bin/pio train -- --driver-memory 14G -- --conf > spark.driver.max > ResultSize=4g; bin/pio deploy > > error: > > Job aborted due to stage failure: Total > size of serialized results of 8 tasks (1236.7 MB) is bigger than > spark.driver.maxResultSize (1024.0 > MB) > > Regarding the PAlgorithm, what I am trying to do is save a Map in the > train method to reuse in the predict method. Because of the error above I > am not able to convert my RDD to a map as the collectAsMap tries to bring > it to the driver. If I use the PAlgorithm, I should be able to just save > the RDD in the Model class and then use it in the predict method. I am > going down that path now. *Do you know of any templates that are using > the PAlgorithm?* The docs say that "Similar Product" uses it but it looks > like it uses the P2LAlgorithm. > > Thank you for your help. > > *Shane Johnson | 801.360.3350* > LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook > <https://www.facebook.com/shane.johnson.71653> > > 2018-02-22 8:16 GMT-10:00 Donald Szeto <don...@apache.org>: > >> Hi Shane, >> >> I think what you are looking for to set max result size on the driver is >> by passing in a spark-submit argument that looks something like this: >> >> pio train ... -- --conf spark.driver.maxResultSize=4g ... >> >> Regarding PAlgorithm, the predict() method does not actually have the >> SparkContext in it ( >> http://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PAlgorithm). >> The "model" argument, unlike P2LAlgorithm, can contain RDDs. In >> PAlgorithm.predict(), you would be able to perform RDD operations directly >> on the model argument. If the SparkContext is needed, the context() method >> can be used on the model RDD. >> >> Hope these help. >> >> Regards, >> Donald >> >> On Wed, Feb 21, 2018 at 12:08 PM Shane Johnson < >> shanewaldenjohn...@gmail.com> wrote: >> >>> Hi team, >>> >>> We have a specific use case where we are trying to save off a map from >>> the train function and reuse it in the predict function to increase our >>> predict function response time. I know the collect() forces everything to >>> the driver. We are collecting the RDD to a map as we don't have a spark >>> context in the predict function. >>> >>> I am getting this error and am looking for a way to adjust the parameter >>> from 1G to 4G+. I can see a way to do it in Spark 1.6 but we are using >>> Spark 2.1.1 and I have not seen the ability to set this. *Has anyone >>> been able to adjust the maxResultSize to something more than 1G?* >>> >>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due >>> to stage failure: Total size of serialized results of 7 tasks (1156.3 MB) >>> is bigger than spark.driver.maxResultSize (1024.0 MB) >>> >>> >>> I have tried to set this parameter but get this as a result with Spark >>> 2.1.1 >>> >>> Error: Unrecognized option: --driver-maxResultSize >>> >>> Our other option is to do the work to obtain a spark context in the >>> predict function so we can pass the RDD through from the train to predict >>> function. The documentation was a little unclear to me on PredictionIO. *Is >>> this the right place to learn how to get a spark context in the predict >>> function?* >>> https://predictionio.incubator.apache.org/templates/vanilla/dase/ >>> >>> Also I am not seeing in this documentation how to get the spark context >>> into the predict function, it looks like it is only used in the train >>> function. >>> >>> Thanks in advance for your expertise. >>> >>> *Shane Johnson | 801.360.3350 <(801)%20360-3350>* >>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook >>> <https://www.facebook.com/shane.johnson.71653> >>> >> >