Donald, Pat, great to hear that this is a well-pondered design challenge for PIO 😄 The prototype of a composable, all-in-one server sounds promising.

I'm wondering if there's a more immediate possibility to address adding the `/events` REST API to Engine? Would it make sense to try invoking an `EventServiceActor` in the tools.commands.Engine#deploy method? If that would be a distasteful hack, just say so. I'm trying to understand the possibility of solving this in the current codebase vs a visionary new version of PIO.

*Mars ( <> .. <> )

> On Jun 28, 2017, at 18:01, Pat Ferrel <[email protected]> wrote:
>
> Ah, one of my favorite subjects.
>
> I’m working on a prototype server that handles online learning as well as
> Lambda style. There is only one server with everything going through REST.
> There are 2 resource types, Engines and Commands. Engines have REST APIs with
> endpoints for Events and Queries. So something like POST
> /engines/resource-id/events would send an event to what is like a PIO app and
> POST /engine/resource-id/queries does the PIO query equivalent. Note that
> this is fully multi-tenant and has only one important id. It’s based on
> akka-http in a fully microservice type architecture. While the Server is
> running you can add completely new Templates for any algorithm, thereby
> adding new endpoints for Events and Queries. Each “tenant” is super
> lightweight since it’s just an Actor not a new JVM. The CLI is actually
> Python that hits the REST API with a Python SDK, and there is a Java SDK too.
> We support SSL and OAuth2 so having those baked into an SDK is really
> important. Though a prototype it can support multi-tenant SaaS.
>
> We have a prototype online learner Template which does not save events at all
> though it ingests events exactly like PIO in the same format in fact we have
> the same template for both servers taking identical input. Instead of an
> EventServer it mirrors received events before validation (yes we have
> full event validation that is template specific.) This allows some events to
> affect mutable data in a database and some to just be an immutable stream or
> even be thrown away for Kappa learners. For an online learner, each event
> updates the model, which is stored periodically as a watermark. If you want
> to change algo params you destroy the engine instance and replay the mirrored
> events. For a Lambda learner the Events may be stored like PIO.
>
> This is very much along the lines of the proposal I put up for future PIO but
> the philosophy internally is so different that I’m now not sure how it would
> fit. I’d love to talk about it sometime and once we do a Lambda Template
> we’ll at least have some nice comparisons to make. We migrated the Kappa
> style Template to it so we have a good idea that it’s not that hard. I’d love
> to donate it to PIO but only if it makes sense.
>
>
> On Jun 28, 2017, at 4:27 PM, Donald Szeto <[email protected]> wrote:
>
> Hey Mars,
>
> Thanks for the suggestion and I agree with your point on the metadata part.
> Essentially I think the app and channel concept should be instead logically
> grouped together with event, not metadata.
>
> I think in some advanced use cases, event storage should not even be a hard
> requirement as engine templates can source data differently. In the long run,
> it might be cleaner to have event server (and all relevant concepts such as
> its API, access keys, apps, etc) as a separable package, that is by default
> turned on, embedded to engine server. Advanced users can either make it
> standalone or even turn it off completely.
>
> I imagine this kind of refactoring would echo Pat's proposal on making a
> clean and separate engine and metadata management system down the road.
>
> Regards,
> Donald
>
> On Wed, Jun 28, 2017 at 3:29 PM Mars Hall <[email protected]> wrote:
> One of the ongoing challenges we face with PredictionIO is the separation of
> Engine & Eventserver APIs. This separation leads to several problems:
>
> 1. Deploying a complete PredictionIO app requires multiple processes, each
> with its own network listener
> 2. Eventserver & Engine must be configured to share exactly the same storage
> backends (same `pio-env.sh`)
> 3. Confusion between "Eventserver" (an optional REST API) & "event storage"
> (a required database)
>
> These challenges are exacerbated by the fact that PredictionIO's docs & `pio
> app` CLI make it appear that sharing an Eventserver between Engines is a good
> idea. I recently filed a JIRA issue about this topic. TL;DR sharing an
> eventserver between engines with different Meta Storage config will cause
> data corruption:
> https://issues.apache.org/jira/browse/PIO-96
>
>
> I believe a lot of these issues could be alleviated with one change to
> PredictionIO core:
>
> By default, expose the Eventserver API from the `pio deploy` Engine process,
> so that it is not necessary to deploy a second Eventserver-only process.
> Separate `pio eventserver` could still be optional if you need the separation
> of concerns for scalability.
>
>
> I'd love to hear what you folks think. I will file a JIRA enhancement issue
> if this seems like an acceptable approach.
>
> *Mars Hall
> Customer Facing Architect
> Salesforce Platform / Heroku
> San Francisco, California
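
P.S. To make the "one process, one listener" idea concrete, here is a rough sketch in plain Python rather than PIO's Scala. It is not PIO code and does not use `EventServiceActor`; the names `route`, `CombinedHandler`, `serve`, and the in-memory `EVENTS` list are all hypothetical stand-ins. The `/events.json` and `/queries.json` paths mirror the existing Eventserver and Engine endpoints, and the "prediction" is just a placeholder.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for event storage. Real PIO would use the
# storage backends configured in pio-env.sh.
EVENTS = []


def route(method, path, body):
    """Dispatch one request to the event or query handler.

    Returns a (status_code, payload_dict) tuple.
    """
    path = path.split("?", 1)[0]  # ignore a query string such as ?accessKey=...
    if method == "POST" and path == "/events.json":
        EVENTS.append(body)
        return 201, {"eventId": str(len(EVENTS))}
    if method == "POST" and path == "/queries.json":
        # Placeholder "prediction": report how many events have been stored.
        return 200, {"query": body, "eventsSeen": len(EVENTS)}
    return 404, {"message": "not found"}


class CombinedHandler(BaseHTTPRequestHandler):
    """Serves both APIs from a single network listener."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            status, payload = 400, {"message": "invalid JSON"}
        else:
            status, payload = route("POST", self.path, body)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the combined server (blocks until interrupted); port is arbitrary."""
    HTTPServer(("localhost", port), CombinedHandler).serve_forever()
```

Calling `serve()` would start the single listener. Both endpoints share the same process and the same storage handle, which is what would remove the need to keep two processes pointed at identical storage config.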
