Re: Need a Suggessations

Pat Ferrel Thu, 23 Mar 2017 14:39:22 -0700

Think of the recommender as a single app. It scales to whatever your data 
size is via the services it is built on. We often see that using a recommender 
is people’s first experience with really big data. Other tools and services you 
use outside of it are fine because they do not deal with such large data. 
Recommenders force you to process every interaction that all your users have 
made over perhaps a year, and to do it often. There are few other apps that 
require this. Welcome to Big-Data.


MySQL is fine for running your app, as you no doubt know. The “model” built by 
a recommender is generally not human readable, but in the case of the UR you 
can understand it with some experience. It lives in Elasticsearch, while the 
user interactions live in HBase. The user events can be looked at, but I’m not 
sure why you’d want to; they are condensed snippets of server logs.

In any case it may help to think of the model in Elasticsearch as a product 
catalog. It will define what items can be recommended and have an entry for 
each item with Machine Learning calculated attributes attached that indicate 
the type of user that prefers each item. But the model also contains item 
properties/attributes that you may want to include for business rules.

The Recommender is easily accessed from your app through the input and query 
APIs. You can change attributes of items by sending special input events. 
Queries are defined to match the kinds of things recommenders with business 
rules do. The model can be seen through the Elasticsearch APIs, but direct 
manipulation of it is discouraged since its meaning or format may change with 
any update.
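
As a rough illustration of the "special input events" mentioned above: the PIO 
event API reserves a `$set` event for attaching properties to an entity. The 
Python sketch below only builds such an event body. The item id and the 
`category` property are hypothetical, and the property conventions the UR 
expects should be checked against the UR docs.

```python
import json


def item_set_event(item_id, properties):
    """Build a PIO `$set` event that attaches properties to an item.

    `item_id` and the property names are hypothetical examples.
    """
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": str(item_id),
        "properties": properties,
    }


# Hypothetical item and property, for illustration only.
event = item_set_event("product-42", {"category": ["electronics"]})
body = json.dumps(event)  # JSON text to send to the event server
```

An event like this goes through the same input API as ordinary user events, 
which is why nothing needs to be edited in Elasticsearch directly.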

Plan to use the PIO query API. It responds in real time, with latency on the 
order of 25ms, and handles multiple simultaneous connections/queries. There is 
no reason to pull data out of the UR and put it in a database; you would lose 
the ability to react to users’ real-time behavior, which is used to make 
recommendations. Stick to the input/query APIs and feed data into the UR in 
real time and you’ll get the most benefit.
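
To make the "use the query API" advice concrete, here is a minimal Python 
sketch of preparing a query for a deployed engine. It assumes the engine 
listens on `http://localhost:8000` (the usual default for a local deploy) and 
that a user id `user-1` exists; both are placeholders. The request is built 
but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_query_request(user_id, num=4, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the PIO query API.

    `base_url` is an assumption: point it at wherever your engine is deployed.
    """
    payload = json.dumps({"user": str(user_id), "num": num}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/queries.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("user-1")
# To send it against a running, deployed engine:
# with urllib.request.urlopen(request) as response:
#     recommendations = json.load(response)
```

The application, whatever database it runs on, would make a call like this at 
page-render time instead of reading stored results from its own tables.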


On Mar 23, 2017, at 12:25 PM, Vaghawan Ojha <[email protected]> wrote:

Hi Pat, 

Thank you very much. Yes, I will be following the actionml instructions since 
I'm going to use the UR. I think I should direct myself to HBase rather than 
spending time setting up MySQL. Part of my need is that once we train on the 
dataset, the results should be easily available to the applications that run 
on MySQL. 

I'm fairly new to the concept itself. So basically I would always have a large 
JSON file coming from the application that uses MySQL (this shouldn't be a 
problem). Then I would use PIO and the UR to do the hard work, and get the 
results back either through an API, which I think already works in PIO, or 
saved somewhere in a database like MySQL. 

Thanks 

On Fri, Mar 24, 2017 at 1:03 AM, Pat Ferrel <[email protected]> wrote:
The UR uses Elasticsearch for part of the Recommender algorithm, therefore it 
must be configured as a storage backend. It is possible to use Postgres or 
MySQL for the other stores but we have very little experience with this. HBase 
is indefinitely scalable so we always use that. Single machine deployments are 
rare with reasonably sized data, so Elasticsearch + HBase running separately 
or in clusters will always meet the data needs. The RDBs will not and anyway, 
like I said, you have to use Elasticsearch.

Therefore, for the UR follow the instructions on the ActionML site since they 
are specific to the UR. For other templates you may use other configurations of 
PIO, but if you use the UR config you can use every other template too.



On Mar 23, 2017, at 9:07 AM, Vaghawan Ojha <[email protected]> wrote:

Hi, Thank you! 

I've run into more confusion here. I installed PredictionIO version 0.10.0 
from here 
http://predictionio.incubator.apache.org/install/install-sourcecode/ and have 
been fighting to configure MySQL as storage on my local Linux machine. 

But I see there is different installation documentation on the actionml 
website, and I'm not sure which one I should follow. Currently there is no 
"pio-env.sh" file inside the conf folder; however, there is a 
pio-env.sh.template file. I commented out the pgsql section and uncommented the 
mysql section with the username and password, but whenever I run 
sudo PredictionIO-0.10.0-incubating/bin/pio eventserver there is an error 
saying that authentication failed with pgsql, even though I don't want to use 
pgsql. 

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
#PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
#PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
#PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
#PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# MySQL Example
 PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
 PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
 PIO_STORAGE_SOURCES_MYSQL_USERNAME=root
 PIO_STORAGE_SOURCES_MYSQL_PASSWORD=root


This is what the pio-env.sh.template looks like. And again, when I visited the 
actionml site, it suggests that I have to have Elasticsearch, but the 
prediction.io site doesn't tell us the same. Which one should I follow, and 
where would I find the current working version of the installation guide? I 
actually want to use prediction.io in production shortly after I get it 
working locally. 

Please help me, thank you very much for your help, I appreciate it so much.
Vaghawan


On Thu, Mar 23, 2017 at 9:27 PM, Pat Ferrel <[email protected]> wrote:
Since PIO has moved to Apache, the namespace of PIO code changed and so all 
templates need to be updated. None of the ones in 
https://github.com/PredictionIO/ will work with Apache PIO. For the upgraded UR 
see: https://github.com/actionml/universal-recommender 
Docs for the UR are here: http://actionml.com/docs/ur 

Also look on the Template gallery page here for a description of template 
status. Some have not been moved to the new namespace and converted to run with 
PIO but this is pretty easy to do yourself. 
http://predictionio.incubator.apache.org/gallery/template-gallery/

user_id, product_id and purchase_date are all you need to use any recommender. 
If you plan to gather other events in the future, use the UR. As for item-based 
or user-based recommendations, the UR will give either, depending on the query, 
from the same data and model, as some other templates will. The UR also allows 
you to mix both types in a single query, which may be useful with small amounts 
of individual user data.
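
To show how little is needed, here is a hedged Python sketch that maps 
purchase-history rows onto the PIO event format. The event name `purchase`, 
the sample rows, and the assumption that dates are "YYYY-MM-DD" strings in UTC 
are all illustrative; the event name has to match whatever the engine is 
configured to read.

```python
from datetime import datetime, timezone


def purchase_to_event(user_id, product_id, purchase_date):
    """Map one purchase-history row to a PIO event dictionary.

    `purchase_date` is assumed to be a "YYYY-MM-DD" string in UTC.
    """
    when = datetime.strptime(purchase_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {
        "event": "purchase",
        "entityType": "user",
        "entityId": str(user_id),
        "targetEntityType": "item",
        "targetEntityId": str(product_id),
        "eventTime": when.isoformat(),
    }


# Hypothetical rows standing in for an export from the application's database.
rows = [(1, 101, "2017-03-01"), (2, 101, "2017-03-02")]
events = [purchase_to_event(*row) for row in rows]
```

Each dictionary would then be sent as JSON to the event server's `events.json` 
endpoint together with the app's access key.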

Also, the accepted wisdom about this is to put item-based recs on item detail 
pages, and user-based recs elsewhere, when you don’t have an item to base recs 
on, or in another placement on any page.

You can have many different placements of recs in any page by changing the 
queries. This is how Netflix gets rows and rows of specialized recs for 
different things all based on the same data. The UR queries are quite flexible.
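
A small Python sketch of what "changing the queries" can look like for 
different placements. It assumes the UR query fields `user`, `item` and `num`; 
the ids are placeholders, and the helper names are made up for this example.

```python
def item_detail_query(item_id, num=4):
    """Item-based query: items similar to the one being viewed."""
    return {"item": str(item_id), "num": num}


def personalized_query(user_id, num=4):
    """User-based query: recommendations from the user's own history."""
    return {"user": str(user_id), "num": num}


def mixed_query(user_id, item_id, num=4):
    """Query carrying both a user and an item in one request."""
    return {"user": str(user_id), "item": str(item_id), "num": num}


# Hypothetical placements on one page, all served by the same model.
placements = {
    "similar_items": item_detail_query("product-42"),
    "for_you": personalized_query("user-1", num=8),
    "for_you_here": mixed_query("user-1", "product-42"),
}
```

Each entry is just a different JSON body sent to the same query endpoint, so 
adding a placement does not require retraining or extra data.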


On Mar 23, 2017, at 7:08 AM, Vaghawan Ojha <[email protected]> wrote:

Hi, 

I've been trying to deploy a recommendation system using 
https://github.com/PredictionIO/template-scala-parallel-universal-recommendation

I have users' purchase history, something like this: 
user_id, product_id and purchase_date, so I will be using user_id and 
product_id to determine the recommendations. I'm not sure if I would be able to 
customize the default event parameter. 

Do you have any suggestions as to which template would be more suitable for my 
problem? I don't have data like ratings or view state; I only have data about 
users and the products they purchased. I need something like item-based 
similarity as well as user-based item similarity. 

Any help would be great

Thank you
Vaghawan
