Could you please provide links to resources on the version of PIO that supports multi-tenancy with lightweight Actors, one per tenant?
On Thu, Sep 8, 2016 at 7:52 PM, Pat Ferrel <[email protected]> wrote:

> I’m the maintainer of the Universal Recommender. We have OSS support at
> https://groups.google.com/forum/#!forum/actionml-user
>
> Do you wish to take advantage of the same user being in multiple
> datasets/tenants? The answer below assumes no.
>
> There are several ways to do this. First, the PIO EventServer is
> multi-tenant: just keep data in separate “apps”, which really should be
> named “datasets”. They are identified by keys generated when you run
> `pio app new <your-app-name>`.
>
> The PredictionServer is not multi-tenant, but you can run a separate
> process per tenant, each on a different port. You would train each tenant
> from a different directory containing the UR and the correct engine.json
> for that tenant/dataset, then deploy it on a port that is specific to the
> tenant/model. This creates a somewhat heavyweight process for each port.
>
> We have a version of PIO that supports multi-tenancy with lightweight
> Actors, one per tenant. You deploy with a resource-id, and when you make
> queries you include the REST resource id in the URI. All engines are on the
> same port running in the same process, so it’s very lightweight and
> performant. Otherwise the query works the same. Private message me to hear
> more.
>
> I would not advise the item property method. Unless you know there is no
> overlap in user-ids, it may produce undesired results in the model, and
> these may leak into recommendations. You can solve that with a filter
> (instead of the boost below), but there are better ways to solve this.
>
>
> On Sep 8, 2016, at 4:08 PM, David Jones <[email protected]> wrote:
>
> Hi All,
>
> I have a use case where I have events coming in from many separate tenants
> and I want to use the Universal Product Recommender engine. The challenge
> is separating data from each tenant throughout the PIO process.
>
> I can think of three possible ways to solve this issue, but they all have
> tradeoffs:
>
> *1) Create Multiple Apps*
>
> You have one app per tenant. When you create events, you use the access
> key specific to that tenant. Then you query for recommendations using that
> same access key to get recommendations for just that app.
>
> Issue: each engine has to specify an “appName” in engine.json. So now you
> have to have an engine per tenant (AKA app) with all the same source
> code, except that the “appName” is different.
>
> This’ll result in a bunch of duplicated code, and you’ll have to train and
> deploy each one individually.
>
> There is also no API for creating apps, so something will need to be
> created to bridge that and allow a new tenant to be onboarded.
>
> *2) Use Channels*
>
> You create one app, but create a channel per tenant. When you create an
> event you specify the channel.
>
> Issue: the Universal Recommender engine can be modified to look at data
> for a single channel name, but that name cannot be dynamically queried;
> it’ll be hardcoded into DataSource.scala. So now you’re in the same
> situation where you’ll need to create one engine per tenant, where each
> engine has the exact same source code except for a one-line change in the
> DataSource.scala file.
>
> *3) Use Product Properties*
>
> Provided your user ids are unique over all tenants, you could set a
> property on each product with a tenant id.
>
> This way you can use one app and one engine, and simply query for
> recommendations while supplying a significant bias to products that
> contain the tenant id property.
>
> Example: give me the top recommendations for user xyz who is on tenant_id
> 12.
>
> {
>   "user": "xyz",
>   "fields": [
>     {
>       "name": "tenant_id",
>       "values": ["12"],
>       "bias": 10
>     }
>   ]
> }
>
> Issues: since all the data for all tenants is in one place, you’re going
> to have to train over all tenants’ data each time. There are also issues
> around the risk of deleting data from the wrong tenant should a tenant
> leave.
>
> I was wondering if anyone has done something like any of these options?
> Perhaps there are other options? Are there any better ones? I’m thinking
> option 3) might be the best for our needs.
>
> Thanks,
> David.
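
For reference, here is a rough Python sketch of two setups discussed in the quoted thread: one PredictionServer process per tenant on its own port, and the tenant_id item property from option 3 used either as a boost or as a filter. It is only an illustration under assumptions. The tenant names, port numbers, and helper names are made up. The /queries.json path is the standard query endpoint of a deployed PIO engine. Using a negative bias as a filter reflects my reading of the Universal Recommender query documentation, so please check it against the UR version you run.

```python
import json

# Hypothetical tenant-to-port table for the "one PredictionServer process
# per tenant" setup. Tenant names and port numbers are invented.
TENANT_PORTS = {"tenant_12": 8012, "tenant_13": 8013}


def query_url(tenant, host="localhost"):
    """Return the query URL of the engine deployed for one tenant."""
    return "http://{}:{}/queries.json".format(host, TENANT_PORTS[tenant])


def tenant_query(user, tenant_id, as_filter=False, boost=10):
    """Build a UR query restricted (or biased) to one tenant's items.

    as_filter=False reproduces the boost from option 3 in the thread.
    as_filter=True uses a negative bias, which (assumption, per the UR
    docs as I understand them) makes the field act as a filter.
    """
    bias = -1 if as_filter else boost
    return {
        "user": user,
        "fields": [
            {"name": "tenant_id", "values": [str(tenant_id)], "bias": bias}
        ],
    }


if __name__ == "__main__":
    # Where the query for tenant_12 would be POSTed in the per-port setup.
    print(query_url("tenant_12"))
    # The option 3 query, switched from a boost to a filter.
    print(json.dumps(tenant_query("xyz", 12, as_filter=True), indent=2))
```

With the per-port setup the tenant is selected by the URL and the query body needs no tenant field. With option 3 the tenant is selected only by the "fields" entry, which is why a filter rather than a boost matters if items from other tenants must never appear.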
