If user 1 has the following history:

purchase: 1001, 1002, 1003, 1004, 1005
view: 1002, 1006
addToCart: 1008

then you want a full-text field-based query, not MoreLikeThis. Use an "OR" type query so that matching any item ID counts as a search "hit" and adds weight to the result. This is set up in the Solr config. I assume you have created an indicator for "purchase" and cross-indicators for "view" and "addToCart", and that these are stored in the Solr fields "purchase", "view", and "addToCart" for all items in your catalog. For user 1 the query that returns recommendations would be:

for "purchase" field: "1001 1002 1003 1004 1005"
for "view" field: "1002 1006"
for "addToCart" field: "1008"

Field query parser docs here: http://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html

For more help constructing the query I suggest the Solr mailing list.

This will return items with similar indicator and cross-indicator fields; these are the recommendations. You may want to filter out any that the user has already seen. In this case filter out 1001 1002 1003 1004 1005 1006 and 1008 from the query results.

You can decide how much user history to include in your query, and it doesn't have to include all fields. For instance purchase: "1001, 1002, 1003, 1004, 1005" or view: "1002 1006" are both valid queries, but the more recent the user history the better.

If you have real data for multiple actions, are you able to share it?

On Sep 21, 2014, at 8:54 AM, pol wrote:

Hi Pat,
Thank you again for your reply. I'm still not sure how to query recommendations from Solr.
I read your two blog posts but still do not fully understand (worthy of a Mahout committer, the blog is very good):
http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-1/
http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/

Part of the website logs below:

userId,action,itemId
------------------------
1,view,1001
1,view,1002
1,view,1003
1,view,1004
1,view,1005
2,view,1001
2,view,1002
2,view,1006
3,view,1003
3,view,1004
3,view,1007
1,purchase,1001
1,purchase,1003
1,purchase,1004
2,purchase,1001
2,purchase,1006
3,purchase,1004

mahout spark-itemsimilarity -i /rec/si/test/input/logs.txt -o /rec/si/test/output -os -f1 purchase -f2 view -ic 2 -fc 1

The plan is: run spark-itemsimilarity every night (not real-time) and write the index to Solr with MR. The index looks like this:

itemId  purchase        view
------------------------------------------------------
1001    1004 1006 1003  1002 1001 1005 1003 1006 1004
1004    1001 1003       1003 1004 1005 1007 1002 1001
1006    1001            1006 1002 1001
1003    1001 1004       1005 1003 1002 1001 1004

Now user1 (userId=1) is currently viewing items 1008 and 1009, and has added 1008 to the shopping cart.

1. Recommend items he may view, based on his 5 most recent browsing-history items.
Query: q=view:1003 1004 1005 1008 1009&mlt.true&mlt.fl=view&mlt.mintf=1&mlt.mindf=1&fl=itemId
Questions:
  1. Should fl be "itemId" and not the "view" field?
  2. Of 1003 1004 1005 1008 1009, items 1008 and 1009 are not indexed in Solr because indexing is not real-time. Is that OK?
  3. Here only the view field is used for recommending; can the purchase action history not be used?

2. Recommend items he may purchase, based on the items currently in his shopping cart.
Query: q=purchase:1009&mlt.true&mlt.fl=purchase&mlt.mintf=1&mlt.mindf=1&fl=itemId
Questions: same as above.

Here I used Solr's MoreLikeThis: http://wiki.apache.org/solr/MoreLikeThis

Please give me the correct code or ideas. Your reply is very important to me; I am looking forward to it.
Thank you

On Sep 21, 2014, at 00:53, Pat Ferrel wrote:

> Can't use SQL; a search engine performs a "similarity" query, not a relational query. Similarity uses a cosine measure of distance between the query vector and the indexed vectors.
>
> Sorry, Solr or Elasticsearch is required!
>
> On Sep 20, 2014, at 9:05 AM, pol wrote:
>
> Hi Pat,
> I know that in a real deployment we would use "id" instead of "item". For now I want to determine how to write the SQL, and then work out with the Solr engineers how to index.
>
> Before every recommendation, do I need to query all the history for each action?
>
> For example, a user's view history is "nexus" and "ipad", the purchase history is "ipad", and the addToCart history is "ipad", but now the user is viewing "iphone". How do I write the SQL for recommendations?
>
> select item from test.rec where purchase like '%ipad%' and addToCart like '%ipad%'
> and view like '% nexus%' and view like '% ipad%' and like '% iphone%'
>
> I feel this SQL is not correct. How should it be written? What I need is an example.
>
> Thank you.
>
>
> On Sep 20, 2014, at 23:29, Pat Ferrel wrote:
>
>> Looks like you have the correct indicators in your DB; now you must integrate with a search engine like Solr or Elasticsearch to index the indicator and cross-indicator columns. You must decide how you want to do this. The indicators should be space delimited OR arrays of strings, and your query will be to the search engine, not a select statement. You will have to check the docs for the search engine you use and set up configuration to index the right columns. Make sure to set up the indexing (I use auto-indexing so it is always up-to-date).
>>
>> In the query take the user's history of each action and map purchases to the "purchase" column, views to the "view" column, and adds to cart to the "addToCart" column. This is a single 3 field query for Solr or Elasticsearch.
>>
>> Results will be an ordered list of row IDs/ db IDs. You will need to fetch the items from the catalog using either the row ID or the "item" as a foreign key.
>>
>>
>> On Sep 20, 2014, at 8:03 AM, pol wrote:
>>
>> Hi Pat,
>> I have a problem in practical recommendations, looking forward to your reply and thank you.
>>
>> /rec/si/input/data.txt:
>> --------------------------
>> u1,purchase,iphone
>> u1,purchase,ipad
>> u2,purchase,nexus
>> u2,purchase,galaxy
>> u3,purchase,surface
>> u4,purchase,iphone
>> u4,purchase,galaxy
>> u1,view,iphone
>> u1,view,ipad
>> u1,view,nexus
>> u1,view,galaxy
>> u2,view,iphone
>> u2,view,ipad
>> u2,view,nexus
>> u2,view,galaxy
>> u3,view,surface
>> u3,view,nexus
>> u4,view,iphone
>> u4,view,ipad
>> u4,view,galaxy
>> u1,addToCart,iphone
>> u1,addToCart,ipad
>> u1,addToCart,nexus
>> u2,addToCart,iphone
>> u2,addToCart,nexus
>> u2,addToCart,galaxy
>> u3,addToCart,surface
>> u4,addToCart,iphone
>> u4,addToCart,galaxy
>>
>> with the command line:
>> mahout spark-itemsimilarity -i /rec/si/input/data.txt -o /rec/si/output -f1 purchase -f2 view -os -ic 2 -fc 1 -td ,
>>
>> and created two directories ---- /rec/si/output/indicator-matrix and /rec/si/output/cross-indicator-matrix, contents as follows:
>>
>> /rec/si/output/indicator-matrix/part-00000
>> --------------------------
>> galaxy nexus
>> surface
>> iphone ipad
>> nexus galaxy
>> ipad iphone
>>
>> /rec/si/output/cross-indicator-matrix/part-00000
>> --------------------------
>> galaxy galaxy,iphone,nexus,ipad
>> surface surface,nexus
>> iphone galaxy,iphone,nexus,ipad
>> nexus galaxy,iphone,nexus,ipad
>> ipad galaxy,iphone,nexus,ipad
>>
>> the second command:
>> mahout spark-itemsimilarity -i /rec/si/input/data.txt -o /rec/si/output2 -f1 addToCart -os -ic 2 -fc 1 -td ,
>>
>> /rec/si/output2/indicator-matrix/part-00000
>> --------------------------
>> galaxy iphone
>> surface
>> iphone galaxy,nexus,ipad
>> nexus iphone,ipad
>> ipad nexus,iphone
>>
>> From the above outputs I create an index or table; here is the table used for testing:
>> +----+---------+----------+--------------------------+-------------------+
>> | id | item    | purchase | view                     | addToCart         |
>> +----+---------+----------+--------------------------+-------------------+
>> |  1 | galaxy  | nexus    | galaxy,iphone,nexus,ipad | iphone            |
>> |  2 | surface |          | surface,nexus            |                   |
>> |  3 | iphone  | ipad     | galaxy,iphone,nexus,ipad | galaxy,nexus,ipad |
>> |  4 | nexus   | galaxy   | galaxy,iphone,nexus,ipad | iphone,ipad       |
>> |  5 | ipad    | iphone   | galaxy,iphone,nexus,ipad | nexus,iphone      |
>> +----+---------+----------+--------------------------+-------------------+
>>
>> Now a user has viewed "nexus" and "ipad", and I want to show him "other people also viewed" recommendations. Which of these queries is right? Or is it something else?
>>
>> 1. select item from test.rec where view like '%nexus%' or view like '%ipad%' order by id;
>> +---------+
>> | item    |
>> +---------+
>> | galaxy  |
>> | surface |
>> | iphone  |
>> | nexus   |
>> | ipad    |
>> +---------+
>>
>> 2. select item from test.rec where view like '%nexus%' and view like '%ipad%' order by id;
>> +--------+
>> | item   |
>> +--------+
>> | galaxy |
>> | iphone |
>> | nexus  |
>> | ipad   |
>> +--------+
>>
>> 3. select distinct view from test.rec where view like '%nexus%' or view like '%ipad%' order by id;
>> +--------------------------+
>> | view                     |
>> +--------------------------+
>> | galaxy,iphone,nexus,ipad |
>> | surface,nexus            |
>> +--------------------------+
>>
>> 4. select distinct view from test.rec where view like '%nexus%' and view like '%ipad%' order by id;
>> +--------------------------+
>> | view                     |
>> +--------------------------+
>> | galaxy,iphone,nexus,ipad |
>> +--------------------------+
>>
>> 5. select distinct view from test.rec where item = 'nexus' or item = 'ipad' order by id;
>> +--------------------------+
>> | view                     |
>> +--------------------------+
>> | galaxy,iphone,nexus,ipad |
>> +--------------------------+
>>
>> I have only been learning Mahout for a short time and do not understand it very well.
>>
>> Thanks.
>>
>>
>> On Sep 19, 2014, at 22:41, pol wrote:
>>
>>> Hi Pat,
>>> I made a spelling mistake, as you said. I was referring to this example:
>>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>>>
>>> I see, so my understanding is right. Thank you again.
>>>
>>>
>>> On Sep 19, 2014, at 22:14, Pat Ferrel wrote:
>>>
>>>> First, it looks like there are some misspelled IDs:
>>>>
>>>> ipad != iPad
>>>> iphone != iPhone
>>>>
>>>> Second, you have to treat purchase as the primary action and view as the secondary action. This will create two indicator matrices in two different directories, as the docs say. Use the command line in the docs for two actions.
>>>>
>>>> Notice:
>>>> --filter1 purchase \ # word that flags input for the primary action
>>>> --filter2 view \ # word that flags input for the secondary action
>>>>
>>>> This tells the job to create an indicator matrix from lines with "purchase" and a cross-indicator from lines with "view".
>>>>
>>>> Read the "More Complex Input" section.
>>>>
>>>>
>>>> On Sep 19, 2014, at 1:27 AM, pol wrote:
>>>>
>>>> Hi Pat,
>>>>
>>>> Thank you very much! I understand a little now.
>>>> In this example:
>>>>
>>>> item     purchase  view
>>>> --------------------------------------------------------
>>>> galaxy   nexus     galaxy iphone nexus iPad
>>>> surface            surface nexus
>>>> iPhone   ipad      galaxy iphone nexus ipad
>>>> nexus    galaxy    galaxy iphone nexus ipad
>>>> iPad     iphone    galaxy iphone nexus iPad
>>>>
>>>> When a user views "surface", "surface" is recommended for him to view;
>>>> When a user purchases "nexus" and "iPad", "galaxy" and "iPhone" are recommended for him to purchase;
>>>> Of course, there is no filtering of the recommendation results here. Is my understanding right?
>>>>
>>>> Thanks.
>>>>
>>>> On Sep 19, 2014, at 04:40, Pat Ferrel wrote:
>>>>
>>>>> You create the indicator and cross-indicator matrices with --omitStrength, then if you are using a database with Solr or Elasticsearch you will create a table:
>>>>>
>>>>> item ID, list of indicator Item IDs, list of cross-indicator item IDs
>>>>>
>>>>> 3 columns. All IDs will be like "nexus" in the example; they are your application's item IDs. The second and third columns contain lists of item IDs. There are several ways you can do this, either by using a multi-valued field (array of IDs) or a space delimited string, depending on how you want to integrate with your search engine and database. Check the instructions for your particular search engine.
>>>>>
>>>>> At the time you want to recommend, take the user's history of the primary action (purchase in the example) and map it to the "list of indicator Item IDs" field. Take the user's history of the secondary action (view in the example) and map it to the "list of cross-indicator Item IDs" field. Then perform the search engine query and you'll get back a list of item IDs to recommend. Filter out any items that the user has in their history (if you wish) and recommend the items in the order they were returned.
>>>>>
>>>>> This blog explains more:
>>>>> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>>>>>
>>>>> Ted's book gives an example architecture:
>>>>> https://www.mapr.com/practical-machine-learning
>>>>>
>>>>> On Sep 18, 2014, at 10:00 AM, pol wrote:
>>>>>
>>>>> Hi, All
>>>>> I saw the spark-itemsimilarity doc at http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html, but I don't understand how to create a recommender with spark-itemsimilarity. I don't understand the "3 Creating a Recommender" chapter.
>>>>> For input of the form:
>>>>> u1,purchase,iphone
>>>>> u1,purchase,ipad
>>>>> u2,purchase,nexus
>>>>> u2,purchase,galaxy
>>>>> u3,purchase,surface
>>>>> u4,purchase,iphone
>>>>> u4,purchase,galaxy
>>>>> u1,view,iphone
>>>>> u1,view,ipad
>>>>> u1,view,nexus
>>>>> u1,view,galaxy
>>>>> u2,view,iphone
>>>>> u2,view,ipad
>>>>> u2,view,nexus
>>>>> u2,view,galaxy
>>>>> u3,view,surface
>>>>> u3,view,nexus
>>>>> u4,view,iphone
>>>>> u4,view,ipad
>>>>> u4,view,galaxy
>>>>> output
>>>>> out-path
>>>>> |-- indicator-matrix - TDF part files
>>>>> \-- cross-indicator-matrix - TDF part-files
>>>>> The indicator matrix will contain the lines:
>>>>> galaxy\tnexus:1.7260924347106847
>>>>> ipad\tiphone:1.7260924347106847
>>>>> nexus\tgalaxy:1.7260924347106847
>>>>> iphone\tipad:1.7260924347106847
>>>>> surface
>>>>> The cross-indicator matrix will contain:
>>>>> iphone\tnexus:1.7260924347106847 iphone:1.7260924347106847 ipad:1.7260924347106847 galaxy:1.7260924347106847
>>>>> ipad\tnexus:0.6795961471815897 iphone:0.6795961471815897 ipad:0.6795961471815897 galaxy:0.6795961471815897
>>>>> nexus\tnexus:0.6795961471815897 iphone:0.6795961471815897 ipad:0.6795961471815897 galaxy:0.6795961471815897
>>>>> galaxy\tnexus:1.7260924347106847 iphone:1.7260924347106847 ipad:1.7260924347106847 galaxy:1.7260924347106847
>>>>> surface\tsurface:4.498681156950466 nexus:0.6795961471815897
>>>>> --------
>>>>> Now, u4 views nexus; how do I recommend for u4 from the above output?
>>>>>
>>>>> Thanks.
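
For reference, the three-field "OR" query and the history filter described in the reply at the top of this thread could be sketched roughly as below. This is only an illustration under assumptions, not a tested Solr integration: the field names "purchase", "view", "addToCart" and "itemId" come from the thread, the `field:(id id ...)` clause form and the `q.op` parameter are standard Solr query syntax, while the helper names `build_query` and `filter_seen` and the sample result list are hypothetical. No request is actually sent to Solr.

```python
from urllib.parse import urlencode


def build_query(history, fields=("purchase", "view", "addToCart")):
    """Build Solr request parameters for a multi-field OR query.

    history maps an action name (also assumed to be the Solr field name)
    to the list of item IDs the user has for that action.
    """
    clauses = []
    for field in fields:
        ids = history.get(field, [])
        if ids:  # skip fields with no history; not all fields are required
            clauses.append("{}:({})".format(field, " ".join(str(i) for i in ids)))
    # q.op=OR makes any matching item ID a hit that adds to the score.
    return {"q": " ".join(clauses), "q.op": "OR", "fl": "itemId"}


def filter_seen(result_ids, history):
    """Drop items the user already has in any action history, keeping order."""
    seen = {str(i) for ids in history.values() for i in ids}
    return [r for r in result_ids if str(r) not in seen]


# User 1's history from the example at the top of the thread.
history = {
    "purchase": [1001, 1002, 1003, 1004, 1005],
    "view": [1002, 1006],
    "addToCart": [1008],
}

params = build_query(history)
query_string = urlencode(params)  # would be appended to the Solr select URL

# Hypothetical ranked item IDs, standing in for a real Solr response.
ranked = ["1003", "1007", "1006", "1009"]
recommendations = filter_seen(ranked, history)
print(params["q"])
print(recommendations)
```

Filtering on the client side keeps the query itself simple; an alternative would be a negative filter query in Solr, which is left out here because it depends on how the index is configured.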
