Donald, Pat, great to hear that this is a well-pondered design challenge of PIO 
😄 The prototype, composable, all-in-one server sounds promising.

I'm wondering if there's a more immediate possibility to address adding the 
`/events` REST API to Engine? Would it make sense to try invoking an 
`EventServiceActor` in the tools.commands.Engine#deploy method? If that would 
be a distasteful hack, just say so. I'm trying to understand the possibility of 
solving this in the current codebase vs a visionary new version of PIO.

*Mars

( <> .. <> )

> On Jun 28, 2017, at 18:01, Pat Ferrel <[email protected]> wrote:
> 
> Ah, one of my favorite subjects.
> 
> I’m working on a prototype server that handles online learning as well as 
> Lambda style. There is only one server with everything going through REST. 
> There are 2 resource types, Engines and Commands. Engines have REST APIs with 
> endpoints for Events and Queries. So something like POST 
> /engines/resource-id/events would send an event to what is like a PIO app and 
> POST /engines/resource-id/queries does the PIO query equivalent. Note that 
> this is fully multi-tenant and has only one important id. It’s based on 
> akka-http in a fully microservice type architecture. While the Server is 
> running you can add completely new Templates for any algorithm, thereby 
> adding new endpoints for Events and Queries. Each “tenant” is super 
> lightweight since it’s just an Actor not a new JVM. The CLI is actually 
> Python that hits the REST API with a Python SDK, and there is a Java SDK too. 
> We support SSL and OAuth2 so having those baked into an SDK is really 
> important. Though a prototype, it can support multi-tenant SaaS.
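
To make the endpoint shape Pat describes concrete, here is a minimal Python sketch that builds (but does not send) the two kinds of POST request. It is not the prototype's SDK: the host, port, helper name `engine_request`, and the JSON payloads are all assumptions for illustration, and it assumes the plural `/engines/` prefix for both resources. Only the `events` and `queries` endpoints come from the description above.

```python
import json
from urllib.request import Request

# Hypothetical host and port; the thread does not say where the server listens.
BASE_URL = "http://localhost:9090"


def engine_request(engine_id: str, resource: str, payload: dict) -> Request:
    """Build a POST to /engines/<engine_id>/<resource> without sending it."""
    if resource not in ("events", "queries"):
        raise ValueError(f"unknown resource: {resource}")
    return Request(
        url=f"{BASE_URL}/engines/{engine_id}/{resource}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative payloads only; the real formats are template specific.
event_req = engine_request(
    "resource-id",
    "events",
    {"event": "buy", "entityType": "user", "entityId": "u1"},
)
query_req = engine_request("resource-id", "queries", {"user": "u1"})

print(event_req.get_method(), event_req.full_url)
print(query_req.get_method(), query_req.full_url)
```

The single `engine_id` in the path is the "only one important id" mentioned above; SSL and OAuth2 are left out of this sketch.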
> 
> We have a prototype online learner Template which does not save events at all, 
> though it ingests events exactly like PIO, in the same format. In fact, we have 
> the same template for both servers taking identical input. Instead of an 
> EventServer it mirrors received events before validation (yes, we have 
> full event validation that is template specific). This allows some events to 
> affect mutable data in a database and some to just be an immutable stream or 
> even be thrown away for Kappa learners. For an online learner, each event 
> updates the model, which is stored periodically as a watermark. If you want 
> to change algo params you destroy the engine instance and replay the mirrored 
> events. For a Lambda learner the Events may be stored like PIO. 
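
The mirror-then-validate flow and the replay step can be sketched in a few lines of Python. This is a toy under stated assumptions, not the prototype's code: the class `OnlineEngine`, the function `replay`, the in-memory list standing in for the mirror, and the "model" (a simple count per event name) are all hypothetical, and the periodic watermark is not modeled.

```python
from collections import Counter


class OnlineEngine:
    """Toy online learner whose model is a count of accepted events by name."""

    def __init__(self, mirror, allowed_events):
        self.mirror = mirror                # append-only list standing in for the mirror
        self.allowed = set(allowed_events)  # stand-in for template-specific validation
        self.model = Counter()

    def ingest(self, event, record=True):
        if record:
            self.mirror.append(event)       # mirrored before validation
        if event.get("event") not in self.allowed:
            return False                    # invalid for this template: model unchanged
        self.model[event["event"]] += 1     # each valid event updates the model
        return True


def replay(mirror, allowed_events):
    """Build a fresh engine with new params and feed it the mirrored events."""
    engine = OnlineEngine(mirror, allowed_events)
    for event in list(mirror):
        engine.ingest(event, record=False)  # do not mirror the events a second time
    return engine


mirror = []
engine = OnlineEngine(mirror, allowed_events={"buy"})
for name in ("buy", "view", "buy"):
    engine.ingest({"event": name})

# Changing params: discard the old engine and replay everything that was mirrored.
rebuilt = replay(mirror, allowed_events={"buy", "view"})
print(dict(engine.model), dict(rebuilt.model))
```

Because the mirror captures events before validation, the rebuilt engine can accept events that the original engine rejected.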
> 
> This is very much along the lines of the proposal I put up for future PIO but 
> the philosophy internally is so different that I’m now not sure how it would 
> fit. I’d love to talk about it sometime and once we do a Lambda Template 
> we’ll at least have some nice comparisons to make. We migrated the Kappa 
> style Template to it so we have a good idea that it’s not that hard. I’d love 
> to donate it to PIO but only if it makes sense.
> 
> 
> On Jun 28, 2017, at 4:27 PM, Donald Szeto <[email protected]> wrote:
> 
> Hey Mars,
> 
> Thanks for the suggestion and I agree with your point on the metadata part. 
> Essentially, I think the app and channel concept should instead be logically 
> grouped together with event, not metadata.
> 
> I think in some advanced use cases, event storage should not even be a hard 
> requirement as engine templates can source data differently. In the long run, 
> it might be cleaner to have the event server (and all relevant concepts such as 
> its API, access keys, apps, etc.) as a separable package that is turned on by 
> default and embedded in the engine server. Advanced users can either make it 
> standalone or even turn it off completely.
> 
> I imagine this kind of refactoring would echo Pat's proposal on making a 
> clean and separate engine and metadata management system down the road.
> 
> Regards,
> Donald
> 
> On Wed, Jun 28, 2017 at 3:29 PM Mars Hall <[email protected]> wrote:
> One of the ongoing challenges we face with PredictionIO is the separation of 
> Engine & Eventserver APIs. This separation leads to several problems:
> 
> 1. Deploying a complete PredictionIO app requires multiple processes, each 
> with its own network listener
> 2. Eventserver & Engine must be configured to share exactly the same storage 
> backends (same `pio-env.sh`)
> 3. Confusion between "Eventserver" (an optional REST API) & "event storage" 
> (a required database)
> 
> These challenges are exacerbated by the fact that PredictionIO's docs & `pio 
> app` CLI make it appear that sharing an Eventserver between Engines is a good 
> idea. I recently filed a JIRA issue about this topic. TL;DR sharing an 
> eventserver between engines with different Meta Storage config will cause 
> data corruption:
>   https://issues.apache.org/jira/browse/PIO-96
> 
> 
> I believe a lot of these issues could be alleviated with one change to 
> PredictionIO core:
> 
> By default, expose the Eventserver API from the `pio deploy` Engine process, 
> so that it is not necessary to deploy a second Eventserver-only process. 
> A separate `pio eventserver` could still be optional if you need the separation 
> of concerns for scalability.
> 
> 
> I'd love to hear what you folks think. I will file a JIRA enhancement issue 
> if this seems like an acceptable approach.
> 
> *Mars Hall
> Customer Facing Architect
> Salesforce Platform / Heroku
> San Francisco, California
> 
> 
