Hi Pat,
I know that in a real deployment I would use the "id" instead of the "item".
For now I want to work out how to write the SQL, and then work with our Solr
engineers to decide how to index.
Before every recommendation, do I need to query the user's full history for
each action?
For example, a user's view history is "nexus" and "ipad", the purchase
history is "ipad", and the addToCart history is "ipad", and the user is now
viewing "iphone". How should I write the SQL for the recommendations?
select item from test.rec where purchase like '%ipad%' and addToCart
like '%ipad%'
and view like '%nexus%' and view like '%ipad%' and view like '%iphone%'
I feel this SQL is not correct. How should it be written? I need a concrete
example.
Thank you.
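
For illustration, here is a rough sketch of how a ranked match differs from
the LIKE ... AND query above. It uses the test table quoted below and scores
each row by how many of the user's history IDs appear in the matching column.
The plain overlap count is only a stand-in: a real search engine such as Solr
or Elasticsearch applies its own relevance scoring, so the ordering there may
differ. The names ROWS and recommend are hypothetical, chosen for this example.

```python
# Rows of the test table: item -> indicator columns (space-delimited IDs).
ROWS = {
    "galaxy":  {"purchase": "nexus",  "view": "galaxy iphone nexus ipad", "addToCart": "iphone"},
    "surface": {"purchase": "",       "view": "surface nexus",            "addToCart": ""},
    "iphone":  {"purchase": "ipad",   "view": "galaxy iphone nexus ipad", "addToCart": "galaxy nexus ipad"},
    "nexus":   {"purchase": "galaxy", "view": "galaxy iphone nexus ipad", "addToCart": "iphone ipad"},
    "ipad":    {"purchase": "iphone", "view": "galaxy iphone nexus ipad", "addToCart": "nexus iphone"},
}


def recommend(history, rows=ROWS):
    """Rank items by term overlap between the user's history and each column.

    `history` maps a column name to the list of item IDs the user has for
    that action. Items already in the user's history are filtered out.
    """
    scores = {}
    for item, fields in rows.items():
        # Count history IDs that appear in the column of the same name.
        score = sum(
            len(set(fields[field].split()) & set(ids))
            for field, ids in history.items()
        )
        if score > 0:  # any match counts (OR semantics), unlike LIKE ... AND
            scores[item] = score
    seen = {item_id for ids in history.values() for item_id in ids}
    ranked = sorted(scores, key=lambda item: -scores[item])  # best match first
    return [item for item in ranked if item not in seen]


history = {
    "purchase": ["ipad"],
    "view": ["nexus", "ipad", "iphone"],  # "iphone" is the item being viewed now
    "addToCart": ["ipad"],
}
print(recommend(history))  # ['galaxy', 'surface']
```

In this sketch the rows for iphone, nexus and ipad score highest but are
dropped because the user already has them in the history, which leaves galaxy
and surface.
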
On Sep 20, 2014, at 23:29, Pat Ferrel <[email protected]> wrote:
> Looks like you have the correct indicators in your DB, now you must integrate
> with a search engine like Solr or Elasticsearch to index the indicator and
> cross-indicator columns. You must decide how you want to do this. The
> indicators should be space delimited OR arrays of strings and your query will
> be to the search engine not a select statement. You will have to check the
> docs for the search engine you use and set up configuration to index the
> right columns. Make sure to set up the indexing (I use auto-indexing so it is
> always up-to-date).
>
> In the query take the user’s history of each action and map purchases to
> “purchase” column, views to the “view” column, and adds to cart to the
> “addToCart” column. This is a single 3 field query for Solr or Elasticsearch.
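
As a rough sketch, the three-field query described above could be assembled
like this. It only builds the query string and does not contact a search
engine. It assumes Lucene-style `field:(a b)` grouping with OR as the default
operator, and that the indexed fields are named after the table columns
(purchase, view, addToCart). The helper name build_query is hypothetical;
check the query syntax of the engine you actually use.

```python
def build_query(history):
    """Build one Lucene-style query string from a user's per-action history.

    `history` maps a field (column) name to the list of item IDs the user
    has for that action. Fields with no history are skipped.
    """
    clauses = []
    for field, item_ids in history.items():
        if item_ids:
            # field:(a b c) matches documents whose field contains any of
            # the terms, assuming the engine's default operator is OR.
            clauses.append(f"{field}:({' '.join(item_ids)})")
    return " ".join(clauses)


history = {
    "purchase": ["ipad"],
    "view": ["nexus", "ipad", "iphone"],
    "addToCart": ["ipad"],
}
print(build_query(history))
# purchase:(ipad) view:(nexus ipad iphone) addToCart:(ipad)
```
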
>
> Results will be an ordered list of row IDs/ db IDs. You will need to fetch
> the items from the catalog using either the row ID or the “item” as a foreign
> key.
>
>
> On Sep 20, 2014, at 8:03 AM, pol <[email protected]> wrote:
>
> Hi Pat,
> I have a problem with making recommendations in practice. I look forward
> to your reply, and thank you.
>
> /rec/si/input/data.txt:
> --------------------------
> u1,purchase,iphone
> u1,purchase,ipad
> u2,purchase,nexus
> u2,purchase,galaxy
> u3,purchase,surface
> u4,purchase,iphone
> u4,purchase,galaxy
> u1,view,iphone
> u1,view,ipad
> u1,view,nexus
> u1,view,galaxy
> u2,view,iphone
> u2,view,ipad
> u2,view,nexus
> u2,view,galaxy
> u3,view,surface
> u3,view,nexus
> u4,view,iphone
> u4,view,ipad
> u4,view,galaxy
> u1,addToCart,iphone
> u1,addToCart,ipad
> u1,addToCart,nexus
> u2,addToCart,iphone
> u2,addToCart,nexus
> u2,addToCart,galaxy
> u3,addToCart,surface
> u4,addToCart,iphone
> u4,addToCart,galaxy
>
> with the command line:
> mahout spark-itemsimilarity -i /rec/si/input/data.txt -o /rec/si/output
> -f1 purchase -f2 view -os -ic 2 -fc 1 -td ,
>
> This created two directories, /rec/si/output/indicator-matrix and
> /rec/si/output/cross-indicator-matrix, with the following contents:
>
> /rec/si/output/indicator-matrix/part-00000
> --------------------------
> galaxy nexus
> surface
> iphone ipad
> nexus galaxy
> ipad iphone
>
> /rec/si/output/cross-indicator-matrix/part-00000
> --------------------------
> galaxy galaxy,iphone,nexus,ipad
> surface surface,nexus
> iphone galaxy,iphone,nexus,ipad
> nexus galaxy,iphone,nexus,ipad
> ipad galaxy,iphone,nexus,ipad
>
> the second command:
> mahout spark-itemsimilarity -i /rec/si/input/data.txt -o /rec/si/output2
> -f1 addToCart -os -ic 2 -fc 1 -td ,
>
> /rec/si/output2/indicator-matrix/part-00000
> --------------------------
> galaxy iphone
> surface
> iphone galaxy,nexus,ipad
> nexus iphone,ipad
> ipad nexus,iphone
>
> From the outputs above I created an index or table. Here is the table I
> used for testing:
> +----+---------+----------+--------------------------+-------------------+
> | id | item | purchase | view | addToCart |
> +----+---------+----------+--------------------------+-------------------+
> | 1 | galaxy | nexus | galaxy,iphone,nexus,ipad | iphone |
> | 2 | surface | | surface,nexus | |
> | 3 | iphone | ipad | galaxy,iphone,nexus,ipad | galaxy,nexus,ipad |
> | 4 | nexus | galaxy | galaxy,iphone,nexus,ipad | iphone,ipad |
> | 5 | ipad | iphone | galaxy,iphone,nexus,ipad | nexus,iphone |
> +----+---------+----------+--------------------------+-------------------+
>
> Now a user has viewed "nexus" and "ipad", and I want to show him "other
> people also viewed" recommendations. Which of these queries is right? Or is
> it something else?
>
> 1. select item from test.rec where view like '%nexus%' or view like '%ipad%'
> order by id;
> +---------+
> | item |
> +---------+
> | galaxy |
> | surface |
> | iphone |
> | nexus |
> | ipad |
> +---------+
>
> 2. select item from test.rec where view like '%nexus%' and view like '%ipad%'
> order by id;
> +--------+
> | item |
> +--------+
> | galaxy |
> | iphone |
> | nexus |
> | ipad |
> +--------+
>
> 3. select distinct view from test.rec where view like '%nexus%' or view like
> '%ipad%' order by id;
> +--------------------------+
> | view |
> +--------------------------+
> | galaxy,iphone,nexus,ipad |
> | surface,nexus |
> +--------------------------+
>
> 4. select distinct view from test.rec where view like '%nexus%' and view like
> '%ipad%' order by id;
> +--------------------------+
> | view |
> +--------------------------+
> | galaxy,iphone,nexus,ipad |
> +--------------------------+
>
> 5. select distinct view from test.rec where item = 'nexus' or item = 'ipad'
> order by id;
> +--------------------------+
> | view |
> +--------------------------+
> | galaxy,iphone,nexus,ipad |
> +--------------------------+
>
> I have only been learning Mahout for a short time and do not understand it
> very well yet.
>
> Thanks.
>
>
> On Sep 19, 2014, at 22:41, pol <[email protected]> wrote:
>
>> Hi Pat,
>> I made a spelling mistake, as you said. I was referring to this
>> example:
>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>>
>> So I now know that my understanding is right. Thank you again.
>>
>>
>> On Sep 19, 2014, at 22:14, Pat Ferrel <[email protected]> wrote:
>>
>>> First it looks like some misspelled IDs
>>>
>>> ipad != iPad
>>> iphone != iPhone
>>>
>>> Second, you have to treat purchase as the primary action and view as the
>>> secondary action. This will create two indicator matrices in two different
>>> directories, as the docs say. Use the command line in the docs for two
>>> actions.
>>>
>>> Notice:
>>> --filter1 purchase \ # word that flags input for the primary action
>>> --filter2 view \ # word that flags input for the secondary action
>>>
>>> This tells the job to create an indicator matrix from lines with “purchase”
>>> and a cross-indicator from lines with “view”
>>>
>>> Read the “More Complex Input” section.
>>>
>>>
>>> On Sep 19, 2014, at 1:27 AM, pol <[email protected]> wrote:
>>>
>>> Hi Pat,
>>>
>>> Thank you very much! I understand a little better now. In this example:
>>>
>>> item     purchase  view
>>> --------------------------------------------------------
>>> galaxy   nexus     galaxy iphone nexus iPad
>>> surface            surface nexus
>>> iPhone   ipad      galaxy iphone nexus ipad
>>> nexus    galaxy    galaxy iphone nexus ipad
>>> iPad     iphone    galaxy iphone nexus iPad
>>>
>>> When a user views "surface", "surface" is recommended for him to view.
>>> When a user purchases "nexus" and "iPad", "galaxy" and "iPhone" are
>>> recommended for him to purchase.
>>> Of course, the recommendation results are not filtered here. Is my
>>> understanding right?
>>>
>>> Thanks.
>>>
>>> On Sep 19, 2014, at 04:40, Pat Ferrel <[email protected]> wrote:
>>>
>>>> You create the indicator and cross-indicator matrices with --omitStrength.
>>>> Then, if you are using a database with Solr or Elasticsearch, you will
>>>> create a table:
>>>>
>>>> item ID, list of indicator Item IDs, list of cross-indicator item IDs
>>>>
>>>> 3 columns. All IDs will be like “nexus” in the example—they are your
>>>> application’s item IDs. The second and third column contain lists of item
>>>> IDs. There are several ways you can do this either by using a multi-valued
>>>> field (array of IDs) or a space delimited string depending on how you want
>>>> to integrate with your search engine and database. Check the instructions
>>>> for your particular search engine.
>>>>
>>>> At the time you want to recommend, take the user’s history of the primary
>>>> action (purchase in the example) and map it to the "list of indicator Item
>>>> IDs” field. Take the user’s history of the secondary action (view in the
>>>> example) and map it to the "list of cross-indicator Item IDs” field. Then
>>>> perform the search engine query and you’ll get back a list of item IDs to
>>>> recommend. Filter out any items that the user has in their history (if you
>>>> wish) and recommend the items in the order they were returned.
>>>>
>>>> This blog explains more:
>>>> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>>>>
>>>> Ted’s book gives an example architecture:
>>>> https://www.mapr.com/practical-machine-learning
>>>>
>>>> On Sep 18, 2014, at 10:00 AM, pol <[email protected]> wrote:
>>>>
>>>> Hi, All
>>>> I read the spark-itemsimilarity doc at
>>>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html,
>>>> but I don’t understand how to create a recommender with
>>>> spark-itemsimilarity. In particular, I don’t understand the "3 Creating a
>>>> Recommender" chapter.
>>>> For input of the form:
>>>> u1,purchase,iphone
>>>> u1,purchase,ipad
>>>> u2,purchase,nexus
>>>> u2,purchase,galaxy
>>>> u3,purchase,surface
>>>> u4,purchase,iphone
>>>> u4,purchase,galaxy
>>>> u1,view,iphone
>>>> u1,view,ipad
>>>> u1,view,nexus
>>>> u1,view,galaxy
>>>> u2,view,iphone
>>>> u2,view,ipad
>>>> u2,view,nexus
>>>> u2,view,galaxy
>>>> u3,view,surface
>>>> u3,view,nexus
>>>> u4,view,iphone
>>>> u4,view,ipad
>>>> u4,view,galaxy
>>>> output
>>>> out-path
>>>> |-- indicator-matrix - TDF part files
>>>> \-- cross-indicator-matrix - TDF part-files
>>>> The indicator matrix will contain the lines:
>>>> galaxy\tnexus:1.7260924347106847
>>>> ipad\tiphone:1.7260924347106847
>>>> nexus\tgalaxy:1.7260924347106847
>>>> iphone\tipad:1.7260924347106847
>>>> surface
>>>> The cross-indicator matrix will contain:
>>>> iphone\tnexus:1.7260924347106847 iphone:1.7260924347106847
>>>> ipad:1.7260924347106847 galaxy:1.7260924347106847
>>>> ipad\tnexus:0.6795961471815897 iphone:0.6795961471815897
>>>> ipad:0.6795961471815897 galaxy:0.6795961471815897
>>>> nexus\tnexus:0.6795961471815897 iphone:0.6795961471815897
>>>> ipad:0.6795961471815897 galaxy:0.6795961471815897
>>>> galaxy\tnexus:1.7260924347106847 iphone:1.7260924347106847
>>>> ipad:1.7260924347106847 galaxy:1.7260924347106847
>>>> surface\tsurface:4.498681156950466 nexus:0.6795961471815897
>>>> ————----
>>>> Now, if u4 views nexus, how do I make recommendations for u4 from the
>>>> output above?
>>>>
>>>> Thanks.
>>>>