The key motivation behind this idea/request is to:
Simplify baseline PredictionIO deployment, both conceptually & technically.
My vision with this thread is to:
Enable single-process, single network-listener PredictionIO app deployment
(i.e. Queries & Events APIs in the same process.)
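To make the single-listener idea concrete, here is a minimal sketch in plain Python (standard library only), not PredictionIO code. It assumes the two APIs keep their usual paths, /events.json and /queries.json, and that the platform supplies the port through a PORT environment variable, as Heroku does. The in-memory EVENTS list and the handle_event, handle_query, route, and serve names are hypothetical stand-ins for real event storage and a trained engine.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for the event store; a real deployment
# would write to the configured PredictionIO storage backend instead.
EVENTS = []


def handle_event(event):
    """Record one event and return a (status, body) pair."""
    EVENTS.append(event)
    return 201, {"eventId": str(len(EVENTS))}


def handle_query(query):
    """Answer a query; this stub only reports how many events were seen."""
    return 200, {"eventsSeen": len(EVENTS), "query": query}


# Both APIs are registered in one routing table, served by one process.
ROUTES = {
    ("POST", "/events.json"): handle_event,
    ("POST", "/queries.json"): handle_query,
}


def route(method, path, payload):
    """Dispatch on method and path, ignoring any query string."""
    handler = ROUTES.get((method, path.split("?", 1)[0]))
    if handler is None:
        return 404, {"message": "not found"}
    return handler(payload)


class SingleListener(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # malformed JSON or invalid encoding
            status, body = 400, {"message": "invalid JSON"}
        else:
            status, body = route("POST", self.path, payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve():
    """Bind the single routable port and serve both APIs from it."""
    port = int(os.environ.get("PORT", "8000"))
    HTTPServer(("0.0.0.0", port), SingleListener).serve_forever()
```

Calling serve() starts the one listener; events and queries then differ only by request path, so the app needs just one routable port.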
Attempting to address some previous questions & statements…
From Pat Ferrel on Tue, 11 Jul 2017 10:53:48 -0700 (PDT):
> how much of your problem is workflow vs installation vs bundling of APIs? Can
> you explain it more?
I am focused on deploying PredictionIO on Heroku via this buildpack:
https://github.com/heroku/predictionio-buildpack
Heroku is an app-centric platform, where each app gets a single routable
network port. By default apps get a URL like:
https://tdx-classi.herokuapp.com (an example PIO Classification engine)
Deploying a separate Eventserver app that must be configured to share storage
config & backends leads to all kinds of complexity, especially when an
unsuspecting developer deploys a new engine with a different storage config
without realizing that the Eventserver is not simply shareable.
Despite a lot of docs & discussion suggesting its share-ability, there is
precious little documentation that explains how the multi-backend Storage
really works in PIO. (I didn't understand it until I read a bunch of Storage
source code.)
From Kenneth Chan on Tue, 11 Jul 2017 12:49:58 -0700 (PDT):
> For example, one can modify the classification to train a classifier on the
> same set of data used by recommendation.
…and later on Wed, 12 Jul 2017 13:44:01 -0700:
> My concern of embedding event server in engine is
> - what problem are we solving by providing an illusion that events are only
> limited for one engine?
This is a great ideal to aim for, but in reality it takes significant design &
engineering to reach that level of data share-ability. I'm not suggesting that
we do anything to undercut the possibilities of such a distributed
architecture. I suggest that we streamline PIO for everyone who is not at that
level of distributed architecture: make PIO not *require* it.
The best example I have is that you can run Spark in local mode, without
worrying about any aspect of its ideal distributed purpose. (In fact
PredictionIO is built on this feature of Spark!) I don't know the history
there, but I would imagine Spark was not always so friendly for small or
embedded tasks like this.
A huge part of my reality is seeing how many newcomers fumble around and get
frustrated. I'm looking at PredictionIO from a very Heroku-style perspective of
"how do we help [new] developers be successful", which is probably going to
seem like I want to take away capabilities. I just want to make the onramp more
graceful!
*Mars