The need for Spark at query time depends on the engine. Which one are you using? The Universal Recommender, which I maintain, does not require Spark for queries even though it runs on PIO; we simply don't use the Spark context, so it is ignored. To make PIO work you need the Spark code on the classpath, but that does not mean there must be a Spark cluster: you can set the Spark master to "local", and no Spark resources are used by the deployed PIO PredictionServer.
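For example, something like the following is a minimal sketch (the master URL, host name, and memory setting are placeholders, and the exact flags you pass through to spark-submit will depend on your PIO version and engine template):

    # deploy once; the engine server itself needs no cluster resources
    pio deploy -- --master local[2]

    # later, (re)train against a transient Spark cluster; the deployed
    # server picks up the new model without being re-deployed
    pio train -- --master spark://spark-master:7077 --executor-memory 8g

Anything after "--" is handed to spark-submit, so the training run can point at a real cluster while the deployed server stays on a local master.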
We have infra code to spin up a Spark cluster for training and bring it back down afterward, and this all works just fine. The UR PredictionServer also never needs to be re-deployed, since the model is hot-swapped after training: deploy once, run forever. And there is no real requirement for Spark to answer queries. So, depending on the engine, the requirement for Spark is at the code level, not the system level.

From: Donald Szeto <[email protected]>
Reply: [email protected] <[email protected]>
Date: April 13, 2018 at 4:48:15 PM
To: [email protected] <[email protected]>
Subject: Re: pio deploy without spark context

Hi George,

This is unfortunately not possible now without modifying the source code, but we are planning to refactor PredictionIO to be runtime-agnostic, meaning the engine server would be independent and SparkContext would not be created if not necessary.

We will start a discussion on the refactoring soon. You are very welcome to add your input then, and any subsequent contribution would be highly appreciated.

Regards,
Donald

On Fri, Apr 13, 2018 at 3:51 PM George Yarish <[email protected]> wrote:
> Hi all,
>
> We use a pio engine which doesn't require Apache Spark at serving time, but
> from my understanding a SparkContext will be created by the "pio deploy"
> process by default.
> My question: is there any way to deploy an engine while avoiding creation of a
> Spark application if I don't need it?
>
> Thanks,
> George
