Think of the recommender as a single app. It scales to whatever your data size is through the services it is built on. We often see that using a recommender is people’s first experience with really big data. The other tools and services you use outside of it are fine because they do not deal with such large data. Recommenders force you to process every interaction that all your users have made over perhaps a year, and to do it often. Few other apps require this. Welcome to big data.
MySQL is fine for running your app, as you no doubt know. The “model” built by a recommender is generally not human readable, but in the case of the UR you can understand it with some experience. It lives in Elasticsearch, while the user interactions live in HBase. The user events can be looked at, but I’m not sure why you’d want to; they are condensed snippets of server logs.

In any case it may help to think of the model in Elasticsearch as a product catalog. It defines which items can be recommended and has an entry for each item, with machine-learning-calculated attributes attached that indicate the type of user that prefers each item. The model also contains item properties/attributes that you may want to use for business rules.

The recommender is easily accessed from your app through the input and query APIs. You can change attributes of items by sending special input events. Queries are defined to match the kinds of things recommenders with business rules do. The model can be seen through the Elasticsearch APIs, but direct manipulation is discouraged since its meaning or format may change with any update.

Plan to use the PIO query API. It responds in real time, with latency on the order of 25 ms, and handles multiple simultaneous connections/queries. There is no reason to pull data out of the UR and put it in a database; if you did, you would lose the ability to react to the user’s real-time behavior, which is used to make recommendations. Stick to the input/query APIs, feed data into the UR in real time, and you’ll get the most benefit.

On Mar 23, 2017, at 12:25 PM, Vaghawan Ojha wrote:

Hi Pat,

Thank you very much. Yes, I will be following the ActionML instructions since I'm going to use the UR. I think I should direct myself to HBase rather than spending time setting up MySQL. Part of my need is that once we train on the dataset, the result should be easily available to the applications, which run on MySQL.
I'm fairly new to the concept itself. So basically I would always have a large JSON file coming from the application, which uses MySQL (this shouldn't be a problem). Then I would use PIO and the UR to do the hard work, and get back the result either through an API, which I think already works in PIO, or saved somewhere in a database like MySQL.

Thanks

On Fri, Mar 24, 2017 at 1:03 AM, Pat Ferrel wrote:

The UR uses Elasticsearch for part of the recommender algorithm, therefore it must be configured as a storage backend. It is possible to use Postgres or MySQL for the other stores but we have very little experience with this. HBase is indefinitely scalable so we always use that. Single-machine deployments are rare with reasonably sized data, so Elasticsearch + HBase running separately or in clusters will always meet the data needs. The RDBs will not and anyway, like I said, you have to use Elasticsearch.

Therefore for the UR follow the instructions on the ActionML site, since they are specific to the UR. For other templates you may use other configurations of PIO, but if you use the UR config you can use every other template too.

On Mar 23, 2017, at 9:07 AM, Vaghawan Ojha wrote:

Hi,

Thank you! I ran into further confusion here. I installed PredictionIO version 0.10.0 from http://predictionio.incubator.apache.org/install/install-sourcecode/ and have been fighting to configure MySQL as storage on my local Linux machine. But I see there is different installation documentation on the ActionML website, and I'm not sure which one I should follow.

Currently there is no "pio-env.sh" file inside the conf folder; however, there is a pio-env.sh.template file. I commented out the pgsql section and uncommented the mysql section with the username and password, but whenever I do
sudo PredictionIO-0.10.0-incubating/bin/pio eventserver

there seems to be an error saying that authentication failed with pgsql; however, I don't want to use pgsql.

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
#PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
#PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
#PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
#PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# MySQL Example
PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
PIO_STORAGE_SOURCES_MYSQL_USERNAME=root
PIO_STORAGE_SOURCES_MYSQL_PASSWORD=root

This is what the pio-env.sh.template looks like.

And again, when I visited the ActionML site, it says that I have to have Elasticsearch, but the prediction.io site doesn't tell us the same. Which one should I follow, and where would I find the current working version of the installation guide? I actually want to use prediction.io in production shortly after I get it working locally.

Please help me. Thank you very much for your help, I appreciate it so much.

Vaghawan

On Thu, Mar 23, 2017 at 9:27 PM, Pat Ferrel wrote:

Since PIO has moved to Apache, the namespace of PIO code changed and so all templates need to be updated.
None of the ones in https://github.com/PredictionIO/ will work with Apache PIO. For the upgraded UR see: https://github.com/actionml/universal-recommender Docs for the UR are here: http://actionml.com/docs/ur

Also look at the Template Gallery page here for a description of template status. Some have not been moved to the new namespace and converted to run with PIO, but this is pretty easy to do yourself. http://predictionio.incubator.apache.org/gallery/template-gallery/

user_id, product_id and purchase_date are all you need to use any recommender. If you plan to gather other events in the future, use the UR.

As far as item- or user-based recommendations, the UR will give either, depending on the query, with the same data and model, as some others will do. The UR allows you to mix both types in a single query, which may be useful with small amounts of individual user data. Also, the accepted wisdom about this is to put item-based recs on item detail pages, and user-based recs elsewhere, when you don’t have an item to base recs on, or in another placement on any page. You can have many different placements of recs on any page by changing the queries. This is how Netflix gets rows and rows of specialized recs for different things, all based on the same data. The UR queries are quite flexible.

On Mar 23, 2017, at 7:08 AM, Vaghawan Ojha wrote:

Hi,

I've been trying to deploy a recommendation system using https://github.com/PredictionIO/template-scala-parallel-universal-recommendation.
I have users' purchase history, something like this: user_id, product_id and purchase_date, so I will be using user_id and product_id to determine the recommendations. I'm not sure if I would be able to customize the default event parameter. Do you have any suggestions as to which template would be more suitable for my problem? I don't have data like ratings or view state; I only have data about users and the products they purchased. I need something like item-based similarity as well as user-based item similarity.

Any help would be great.

Thank you
Vaghawan
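
For readers following the thread, here is a minimal sketch of how the purchase history described above (user_id, product_id, purchase_date) might be shaped into input events, and how item-based, user-based, or mixed queries might be built. The field names follow the PIO event format and the UR query format as commonly documented, but treat them as assumptions and check the ActionML docs. The event name "purchase", the "YYYY-MM-DD" date format, and the helper names purchase_to_event and build_query are hypothetical, chosen only for illustration. Sending the payloads to the event server and the deployed engine is deliberately left out.

```python
from datetime import datetime, timezone


def purchase_to_event(user_id, product_id, purchase_date):
    """Build one input event from a (user_id, product_id, purchase_date) row."""
    # Assumption: purchase_date looks like "2017-03-23" and is treated as UTC.
    when = datetime.strptime(purchase_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {
        "event": "purchase",            # assumed event name, not from the thread
        "entityType": "user",
        "entityId": str(user_id),
        "targetEntityType": "item",
        "targetEntityId": str(product_id),
        "eventTime": when.isoformat(),  # ISO 8601 timestamp
    }


def build_query(user_id=None, product_id=None, num=10):
    """Build a query payload: user-based, item-based, or both mixed."""
    query = {"num": num}
    if user_id is not None:
        query["user"] = str(user_id)     # user-based recommendations
    if product_id is not None:
        query["item"] = str(product_id)  # item-based (similar items)
    return query


if __name__ == "__main__":
    print(purchase_to_event(42, "sku-7", "2017-03-23"))
    print(build_query(user_id=42))                      # e.g. a home page placement
    print(build_query(product_id="sku-7"))              # e.g. an item detail page
    print(build_query(user_id=42, product_id="sku-7"))  # mixed query
```

Passing both a user and an item corresponds to the mixed query Pat mentions, while passing only one gives the user-based or item-based placement.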
