This is how the Flink folks do it:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673

They call them FLIPs (Flink Improvement Proposals), similar to KIPs (Kafka
Improvement Proposals) in Kafka.

You could use a shared Google Doc alongside the wiki and the mailing lists.
Just make sure that the wiki is the final repository for everything.



On Thu, Sep 22, 2016 at 7:52 PM, Donald Szeto <don...@apache.org> wrote:

> (Dropping user list for dev activities.)
>
> Sounds good. Let's start this collaboration. We should establish a common
> place for collaborative design documents. Do you guys feel like using a
> shared Google Drive, or the Apache wiki?
>
> On Wed, Sep 21, 2016 at 3:09 PM, Marcin Ziemiński <ziem...@gmail.com>
> wrote:
>
> > General purpose registry for service discovery is a much bigger thing. We
> > should first think about how we could make PIO more modular and divide it
> > into some logical parts, which could be abstracted and then turned to
> > services, before deciding to create some kind of general registry. There
> > was an issue brought up of creating an admin server as an alternative to
> > Console. The same for the eventserver, which could be treated as a very
> > special case of service responsible for providing eventdata. The serving
> > part of PIO is also another example.
> >
> > As Donald mentioned, it would be sensible to create a shared doc where we
> > could try to come up with new design decisions and outline the steps to
> > get there. I suppose that discussing one thing, such as refactoring the
> > manifest, might lead to other changes and proposals in different areas.
> > I'd be willing to help with that.
> >
> > Wed, 21.09.2016 at 22:44, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> >> What do you think about using a general purpose registry, that can also
> >> be used to discover cluster machines, or microservices?
> >> Something like consul.io or docker swarm with an ASF-compatible
> >> license? This would be a real step into the future and since some work
> >> is needed anyway…
> >>
> >> I think Donald is right that much of this can be made optional, with a
> >> mind towards making a single-machine install easy and a cluster install
> >> almost as easy.
> >>
> >>
> >> On Sep 21, 2016, at 1:18 PM, Donald Szeto <don...@apache.org> wrote:
> >>
> >> I second removing engine manifests and adding a separate registry for
> >> other metadata (such as where to push engine code, models, and misc.
> >> discovery).
> >>
> >> The current design came from realizing that producing predictions from
> >> the model requires custom code (a scoring function) as well. We have
> >> bundled training code and predicting (scoring) code together as an
> >> engine, different input parameters as different engine variants, and
> >> engine instances as an immutable list of metadata that points to an
> >> engine, engine variant, and trained models. We can definitely draw
> >> clearer boundaries and names. We should start a design doc somewhere.
> >> Any suggestions?
> >>
> >> I propose to start by making registration optional, then start to
> >> refactor the manifest and build a proper engine registry.
> >>
> >> Regards,
> >> Donald
> >>
> >> On Wed, Sep 21, 2016 at 12:29 PM, Marcin Ziemiński <ziem...@gmail.com>
> >> wrote:
> >>
> >> > I think that getting rid of the manifest.json and introducing a new
> >> > kind of resource-id for an engine to be registered is a good idea.
> >> >
> >> > Currently in the repository there are three important keys:
> >> > * engine id
> >> > * engine version - depends only on the path the engine was built at,
> >> > to distinguish copies
> >> > * engine instance id - the name suggests it belongs to the engine
> >> > itself, but it is in fact the identifier of the trained models for an
> >> > engine.
> >> > When running deploy, you either get the latest trained model for the
> >> > engine-id and engine-version, which strictly ties it to the location
> >> > it was compiled at, or you specify an engine instance id. I am not
> >> > sure, but I think that in the latter case you could get a model for a
> >> > completely different engine, which could potentially fail because of
> >> > initialization with improper parameters.
> >> > What is more, the engine object creation relies only on the full name
> >> > of the EngineFactory, so the actual engine that gets loaded is
> >> > determined by the current CLASSPATH. I guess that this is probably the
> >> > place that should be modified if we want a multi-tenant architecture.
> >> > I have to admit that these things hadn't been completely clear to me
> >> > until I went through the code.
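
To make the lookup Marcin describes concrete, here is a minimal sketch of the
two deploy paths. It is not PIO's actual code: the record fields, the
`trained_at` ordering key, and `resolve_instance` are all hypothetical names.
The guard in the second path is the check he suspects is missing, shown here
only as one possible way to reject a model trained by a different engine.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EngineInstance:
    """Hypothetical metadata record for one training run."""
    instance_id: str     # identifies the trained models, not the engine
    engine_id: str
    engine_version: str  # in the thread: derived from the build path
    trained_at: int      # assumed ordering key, e.g. a Unix timestamp


def resolve_instance(
    instances: Iterable[EngineInstance],
    engine_id: str,
    engine_version: str,
    instance_id: Optional[str] = None,
) -> EngineInstance:
    """Pick the engine instance that a deploy should use."""
    instances = list(instances)
    if instance_id is None:
        # Path 1: latest trained model for this engine id and version.
        candidates = [
            i for i in instances
            if i.engine_id == engine_id and i.engine_version == engine_version
        ]
        if not candidates:
            raise LookupError("no trained instance for this engine")
        return max(candidates, key=lambda i: i.trained_at)
    # Path 2: explicit instance id, with an assumed consistency guard.
    for inst in instances:
        if inst.instance_id == instance_id:
            if (inst.engine_id, inst.engine_version) != (engine_id, engine_version):
                raise ValueError("instance was trained by a different engine")
            return inst
    raise LookupError("unknown engine instance id")
```

Without the guard, the second path would return whatever record matches the
instance id, which is the mismatch scenario described above.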
> >> >
> >> > We could introduce a new type of service for engine and model
> >> > management. I like the idea of a repository to which built engines
> >> > are pushed under chosen ids. We could also add some versioning of them
> >> > if necessary. I treat this approach purely as a kind of package
> >> > management system.
> >> >
> >> > As Pat said, a similar approach would let us rely only on the
> >> > repository and, thanks to that, run pio commands regardless of the
> >> > machine and location.
> >> >
> >> > Separating the engine part from the rest of PIO could potentially
> >> > enable us to come up with different architectures in the future and
> >> > push us towards a micro-services ecosystem.
> >> >
> >> > What do you think of separating models from engines in a more visible
> >> > way? I mean that engine variants, in terms of algorithm parameters,
> >> > are more like model variants. I see an engine only as code that is a
> >> > dependency for application-related models/algorithms. So you would
> >> > register an engine, as code, once, and run training for some
> >> > domain-specific data (app) and algorithm parameters, which would
> >> > result in a different identifier that would later be used for
> >> > deployment.
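
A rough sketch of that split, assuming an in-memory registry: the engine is
registered once as code, and each training run gets its own identifier that
deployment later resolves. `Registry`, `TrainedModel`, the method names, and
the artifact URI are all made up for illustration; nothing here is an
existing PIO API.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TrainedModel:
    """Hypothetical record for one training run."""
    model_id: str
    engine_id: str
    app: str
    params: dict


class Registry:
    """Toy registry that keeps engine code and trained models apart."""

    def __init__(self) -> None:
        self._engines: Dict[str, str] = {}          # engine id -> artifact location
        self._models: Dict[str, TrainedModel] = {}  # model id -> training record

    def register_engine(self, engine_id: str, artifact_uri: str) -> None:
        # Done once per engine: the engine is only code.
        self._engines[engine_id] = artifact_uri

    def record_training(self, engine_id: str, app: str, params: dict) -> str:
        # Every training run yields a new identifier, separate from the engine id.
        if engine_id not in self._engines:
            raise KeyError(engine_id)
        model_id = uuid.uuid4().hex
        self._models[model_id] = TrainedModel(model_id, engine_id, app, dict(params))
        return model_id

    def resolve_deployment(self, model_id: str) -> Tuple[str, TrainedModel]:
        # Deployment needs only the model id; the engine code follows from it.
        model = self._models[model_id]
        return self._engines[model.engine_id], model
```

With something like this, a deploy command would take only the model
identifier, so it could run on any machine that can reach the registry.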
> >> >
> >> > Regards,
> >> > Marcin
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Sun, 18.09.2016 at 20:02, Pat Ferrel <p...@occamsmachete.com> wrote:
> >> >
> >> >> This sounds like a good case for Donald’s suggestion.
> >> >>
> >> >> What I was trying to add to the discussion is a way to make all
> >> >> commands rely on state in the metadata store, rather than on any file
> >> >> on any machine in a cluster, on ordering of execution, or on execution
> >> >> from a location in a directory structure. All commands would then be
> >> >> stateless.
> >> >>
> >> >> This enables real use cases like provisioning PIO machines and
> >> >> running `pio deploy <resource-id>` to get a new PredictionServer.
> >> >> Provisioning can be container and discovery based rather cleanly.
> >> >>
> >> >>
> >> >> On Sep 17, 2016, at 5:26 PM, Mars Hall <m...@heroku.com> wrote:
> >> >>
> >> >> Hello folks,
> >> >>
> >> >> Great to hear about this possibility. I've been working on running
> >> >> PredictionIO on Heroku https://www.heroku.com
> >> >>
> >> >> Heroku's 12-factor architecture https://12factor.net prefers
> >> >> "stateless builds" to ensure that compiled artifacts result in
> >> >> processes which may be cheaply restarted, replaced, and scaled via
> >> >> process count & size. I imagine this stateless property would be
> >> >> valuable for others as well.
> >> >>
> >> >> The fact that `pio build` inserts stateful metadata into a database
> >> >> causes ripples throughout the lifecycle of PIO engines on Heroku:
> >> >>
> >> >> * An engine cannot be built for production without the production
> >> >> database available. When a production database contains PII
> >> >> (personally identifiable information) which has security compliance
> >> >> requirements, the build system may not be privileged to access that
> >> >> PII. This also affects CI (continuous integration/testing), where
> >> >> engines would need to be rebuilt in production, defeating the
> >> >> assurances CI is supposed to provide.
> >> >>
> >> >> * The build artifacts cannot be reliably reused. "Slugs" at Heroku
> >> >> are intended to be stateless, so that you can roll back to a previous
> >> >> version during the lifetime of an app. With `pio build` causing
> >> >> database side-effects, there's a greater-than-zero probability of
> >> >> slug-to-metadata inconsistencies eventually surfacing in a
> >> >> long-running system.
> >> >>
> >> >>
> >> >> From my user-perspective, a few changes to the CLI would fix it:
> >> >>
> >> >> 1. add a "skip registration" option,
> >> >> `pio build --without-engine-registration`
> >> >> 2. add a new command, `pio app register`, that could be run
> >> >> separately in the built engine (before training)
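
The separation Mars is asking for can be pictured as two functions: a build
that depends only on its inputs, and a registration step that carries the
database side effect. This is a sketch under assumptions, not PIO internals:
`Artifact`, `build`, `register`, and the dict standing in for the metadata
database are all invented for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Artifact:
    """Hypothetical build output: no database state involved."""
    engine_factory: str
    digest: str


def build(engine_factory: str, sources: Dict[str, bytes]) -> Artifact:
    """Stateless build: the result depends only on the inputs."""
    h = hashlib.sha256()
    for path in sorted(sources):  # fixed order keeps the digest reproducible
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(sources[path])
        h.update(b"\0")
    return Artifact(engine_factory, h.hexdigest())


def register(artifact: Artifact, metadata_store: Dict[str, str]) -> None:
    """Separate, explicit step that performs the stateful write."""
    metadata_store[artifact.digest] = artifact.engine_factory
```

In this shape the build can run in CI without access to the production
database, and the same artifact can be registered later against whichever
store the target environment uses.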
> >> >>
> >> >> Alas, I do not know PredictionIO internals, so I can only offer a
> >> >> suggestion for how this might be solved.
> >> >>
> >> >>
> >> >> Donald, one specific note,
> >> >>
> >> >> Regarding "No automatic version matching of PIO binary distribution
> >> >> and artifacts version used in the engine template":
> >> >>
> >> >> The Heroku slug contains the PredictionIO binary distribution used to
> >> >> build the engine, so there's never a version matching issue. I guess
> >> >> some systems might deploy only the engine artifacts to production
> >> >> where a pre-existing PIO binary is available, but that seems like a
> >> >> risky practice for long-running systems.
> >> >>
> >> >>
> >> >> Thanks for listening,
> >> >>
> >> >> Mars Hall
> >> >> Customer Facing Architect
> >> >> Salesforce App Cloud / Heroku
> >> >> San Francisco, California
> >> >>
> >> >>> On Sep 16, 2016, at 10:42, Donald Szeto <don...@apache.org> wrote:
> >> >>>
> >> >>> Hi all,
> >> >>>
> >> >>> I want to start the discussion of removing engine registration. How
> >> >>> many people actually take advantage of being able to run pio
> >> >>> commands everywhere outside of an engine template directory? This
> >> >>> will be a nontrivial change on the operational side so I want to
> >> >>> gauge the potential impact to existing users.
> >> >>>
> >> >>> Pros:
> >> >>> - Stateless build. This would work well with many PaaS.
> >> >>> - Eliminate the "pio build" command once and for all.
> >> >>> - Ability to use your own build system, e.g. Maven, Ant, Gradle,
> >> >>> etc.
> >> >>> - Potentially better experience with IDE since engine templates no
> >> >>> longer depend on an SBT plugin.
> >> >>>
> >> >>> Cons:
> >> >>> - Inability to run pio engine training and deployment commands
> >> >>> outside of engine template directory.
> >> >>> - No automatic version matching of PIO binary distribution and
> >> >>> artifacts version used in the engine template.
> >> >>> - A less unified user experience: from pio-build-train-deploy to
> >> >>> build, then pio-train-deploy.
> >> >>>
> >> >>> Regards,
> >> >>> Donald
> >> >>
> >> >>
> >> >>
> >>
> >>
>
