The need for Spark at query time depends on the engine. Which are you
using? The Universal Recommender, which I maintain, does not require Spark
for queries but does use PIO; we simply don't use the Spark context, so it
is ignored. To make PIO work you need to have the Spark code accessible,
but that doesn't mean there must be a Spark cluster: you can set the Spark
master to "local", and no Spark resources are used by the deployed PIO
process.
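For example (a sketch, not an exact recipe: in PIO, arguments after `--` on the command line are forwarded to `spark-submit`, so the precise flags depend on your setup):

```shell
# Deploy with an in-process "local" Spark master -- no cluster required.
# Everything after `--` is passed through to spark-submit.
pio deploy -- --master local
```

With `--master local`, Spark runs entirely inside the driver JVM, so the deployed server holds no cluster resources.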

We have infra code to spin up a Spark cluster for training and bring it
back down afterward. This all works just fine. The UR PredictionServer
also has no need to be re-deployed, since the model is hot-swapped after
training: deploy once, run forever. And there is no real requirement for
Spark at serving time.
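The train-on-a-temporary-cluster workflow above can be sketched roughly like this (the cluster-management steps are placeholders for whatever infra tooling you use; only the `pio train` pass-through to `spark-submit` is a PIO convention, and the master address shown is hypothetical):

```shell
#!/bin/sh
set -e

# 1. Spin up a Spark cluster with your own infra tooling (placeholder).
SPARK_MASTER="spark://my-spark-master:7077"   # hypothetical address

# 2. Train, pointing spark-submit at the temporary cluster.
#    Arguments after `--` are forwarded to spark-submit.
pio train -- --master "$SPARK_MASTER" --driver-memory 4g

# 3. Tear the cluster back down; the trained model is already persisted.
#    (placeholder: terminate the cluster instances here)

# The already-running PredictionServer hot-swaps the new model,
# so no `pio deploy` is needed after each training run.
```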

So, depending on the engine, the requirement for Spark is at the code
level, not the system level.

From: Donald Szeto <>
Reply: <>
Date: April 13, 2018 at 4:48:15 PM
To: <>
Subject: Re: pio deploy without spark context

Hi George,

This is unfortunately not possible now without modifying the source code,
but we are planning to refactor PredictionIO to be runtime-agnostic,
meaning the engine server would be independent and a SparkContext would
not be created if not necessary.

We will start a discussion on the refactoring soon. You are very welcome
to add your input then, and any subsequent contribution would be highly
appreciated.


On Fri, Apr 13, 2018 at 3:51 PM George Yarish <> wrote:

> Hi all,
> We use a PIO engine which doesn't require Apache Spark at serving time,
> but from my understanding a SparkContext will be created by the "pio
> deploy" process by default anyway.
> My question: is there any way to deploy an engine while avoiding the
> creation of a Spark application if I don't need it?
> Thanks,
> George
