This sounds like a good case for Donald’s suggestion. 

What I was trying to add to the discussion is a way to make all commands rely 
on state in the megastore, rather than on any file on any machine in a cluster, 
on the order of execution, or on being run from a particular location in a 
directory structure. All commands would then be stateless.

This enables real use cases like provisioning PIO machines and running 
`pio deploy <resource-id>` to get a new PredictionServer. Provisioning can then 
be based rather cleanly on containers and service discovery.


On Sep 17, 2016, at 5:26 PM, Mars Hall <m...@heroku.com> wrote:

Hello folks,

Great to hear about this possibility. I've been working on running PredictionIO 
on Heroku (https://www.heroku.com).

Heroku's 12-factor architecture (https://12factor.net) prefers "stateless builds" 
to ensure that compiled artifacts result in processes which may be cheaply 
restarted, replaced, and scaled via process count & size. I imagine this 
stateless property would be valuable for others as well.

The fact that `pio build` inserts stateful metadata into a database causes 
ripples throughout the lifecycle of PIO engines on Heroku:

* An engine cannot be built for production without the production database 
available. When a production database contains PII (personally identifiable 
information) which has security compliance requirements, the build system may 
not be privileged to access that PII data. This also affects CI (continuous 
integration/testing), where engines would need to be rebuilt in production, 
defeating the assurances CI is supposed to provide.

* The build artifacts cannot be reliably reused. "Slugs" at Heroku are intended 
to be stateless, so that you can roll back to a previous version during the 
lifetime of an app. With `pio build` causing database side-effects, there's a 
greater-than-zero probability of slug-to-metadata inconsistencies eventually 
surfacing in a long-running system.


From my user perspective, a few changes to the CLI would fix it:

1. add a "skip registration" option, `pio build --without-engine-registration`
2. a new command `pio app register` that could be run separately in the built 
engine (before training)

Alas, I do not know PredictionIO internals, so I can only offer a suggestion 
for how this might be solved.


Donald, one specific note:

Regarding "No automatic version matching of PIO binary distribution and 
artifacts version used in the engine template":

The Heroku slug contains the PredictionIO binary distribution used to build the 
engine, so there's never a version matching issue. I guess some systems might 
deploy only the engine artifacts to production where a pre-existing PIO binary 
is available, but that seems like a risky practice for long-running systems.


Thanks for listening,

Mars Hall
Customer Facing Architect
Salesforce App Cloud / Heroku
San Francisco, California

> On Sep 16, 2016, at 10:42, Donald Szeto <don...@apache.org> wrote:
> 
> Hi all,
> 
> I want to start the discussion of removing engine registration. How many 
> people actually take advantage of being able to run pio commands anywhere 
> outside of an engine template directory? This will be a nontrivial change on 
> the operational side so I want to gauge the potential impact to existing 
> users.
> 
> Pros:
> - Stateless build. This would work well with many PaaS.
> - Eliminate the "pio build" command once and for all.
> - Ability to use your own build system, i.e. Maven, Ant, Gradle, etc.
> - Potentially better experience with IDEs since engine templates no longer 
> depend on an SBT plugin.
> 
> Cons:
> - Inability to run pio engine training and deployment commands outside of 
> engine template directory.
> - No automatic version matching of PIO binary distribution and artifacts 
> version used in the engine template.
> - A less unified user experience: from pio-build-train-deploy to build, then 
> pio-train-deploy.
> 
> Regards,
> Donald

