RE: algorithms Apriori, FPgrowth
Hi Jakub, fpg is in 0.9 but currently unsupported (no code maintainer(s)). I don't think we have any docs for it since it's slated for removal. I use it in 0.8 and it works for the _limited_ use cases I have. As an alternative, and in preparation for the eventuality that it'll be removed without maintainers, I've been using an R package to fill the gap (actually playing with getting it to execute on H2O via their R integration). Feel free to take a look at the algo and see if it's something you could maintain if you think it'd be useful for you - I'd certainly be happy about it! Best, Nick From: Jakub Stransky [stransky...@gmail.com] Sent: Tuesday, November 25, 2014 8:31 AM To: user@mahout.apache.org Subject: algorithms Apriori, FPgrowth Hello experienced mahout users, I am new to the mahout library and I am having a bit of trouble finding a starting point for "association rule mining", as I don't see either the Apriori or the FPgrowth algorithm on the list of implemented algorithms. On the other hand, I found several blog posts that refer to the mahout library for implementations of those algorithms. I am a bit confused about what the current state is and where to find appropriate docs. Any hint would be appreciated. Thanks Jakub
RE: How to deal with categorical and date data in mahout ?
Hi there, Which algorithm are you using? For instance, for recommendations you could create a mapping of your categorical data to integers before you pass the data into Mahout. Let us know a bit more about what you're trying to accomplish/algos you're looking to use. Best, Nick -Original Message- From: Lee S [mailto:sle...@gmail.com] Sent: Tuesday, November 18, 2014 10:13 PM To: user Subject: How to deal with categorical and date data in mahout ? Hi all: Do you have any good practices for dealing with categorical data? Does mahout provide a tool class which can do the conversion?
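To make the "mapping of your categorical data to integers" idea concrete, here is a minimal sketch in plain Python. It is not a Mahout tool class; the row layout, the sample values and the helper name build_id_mapping are all made up for illustration. It only shows one way to turn string categories into the integer IDs a userid,itemid,pref file expects.

```python
def build_id_mapping(values):
    """Assign a stable integer ID to each distinct categorical value,
    in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


# Hypothetical raw rows: (user name, product category, preference)
raw_rows = [
    ("alice", "shoes", 5.0),
    ("bob", "hats", 3.0),
    ("alice", "hats", 4.0),
]

user_ids = build_id_mapping(row[0] for row in raw_rows)
item_ids = build_id_mapping(row[1] for row in raw_rows)

# Emit userid,itemid,pref lines with the strings replaced by integer IDs.
csv_lines = [f"{user_ids[u]},{item_ids[i]},{p}" for u, i, p in raw_rows]
```

You would want to persist both mappings somewhere, since the recommender output comes back in terms of the integer IDs and has to be translated back to the original values.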
Re: Mahout Vs Spark
I know we lost the maintainer for fpgrowth somewhere along the line but it's definitely something I'd love to see carried forward, too. Sent from my iPhone
> On Oct 22, 2014, at 8:09 AM, "Brian Dolan" wrote:
>
> Sing it, brother! I miss FP Growth as well. Once the Scala bindings are in, I'm hoping to work up some time series methods.
>
>> On Oct 21, 2014, at 8:00 PM, Lee S wrote:
>>
>> As a developer who has to choose between mahout and mllib, I have some thoughts below.
>> Mahout has no decision tree algorithm, but MLLIB has the components for constructing one, such as gini index and information gain. I also think mahout could add an algorithm for frequent pattern mining, which is very important in feature selection and statistical analysis. MLLIB has no frequent pattern mining algorithms.
>> p.s. Why was the fpgrowth algorithm removed in version 0.9?
>>
>> 2014-10-22 9:12 GMT+08:00 Vibhanshu Prasad :
>>
>>> Actually spark is available in python also, so users of spark have the upper hand over traditional users of mahout. This is applicable to all the libraries of python (including numpy).
>>>
>>> On Wed, Oct 22, 2014 at 3:54 AM, Ted Dunning wrote:
>>>
>>>> On Tue, Oct 21, 2014 at 3:04 PM, Mahesh Balija <balijamahesh@gmail.com> wrote:
>>>>
>>>>> I am trying to differentiate between Mahout and Spark, here is the small list,
>>>>>
>>>>> Features                  Mahout  Spark
>>>>> Clustering                Y       Y
>>>>> Classification            Y       Y
>>>>> Regression                Y       Y
>>>>> Dimensionality Reduction  Y       Y
>>>>> Java                      Y       Y
>>>>> Scala                     N       Y
>>>>> Python                    N       Y
>>>>> Numpy                     N       Y
>>>>> Hadoop                    Y       Y
>>>>> Text Mining               Y       N
>>>>> Scala/Spark Bindings      Y       N/A
>>>>> scalability               Y       Y
>>>>
>>>> Mahout doesn't actually have strong features for clustering, classification and regression. Mahout is very strong in recommendations (which you don't mention) and dimensionality reduction. Mahout does support scala in the development version. What do you mean by support for Numpy?
RE: New Mahout Recommender Service
Would absolutely love an ES integration. -Original Message- From: Pat Ferrel [mailto:p...@occamsmachete.com] Sent: Tuesday, September 09, 2014 10:29 AM To: user@mahout.apache.org Subject: New Mahout Recommender Service Now that we have the basis of several significant improvements to Mahout's recommender it seems like we need to go the last step and provide a service. Without this it is left to the user to do a lot of integration making the current next gen somewhat incomplete. Using the Hadoop mapreduce code you can get all recs for all people using collaborative filtering data or you can use the in-memory single machine recommender if you have a small dataset. The next generation would require Solr or Elasticsearch so why not go the extra step and provide a recommender API on top? At very least it would give users a single machine API they can call, analogous to the in-memory recommender of Mahout 0.9. But it would also be indefinitely scalable. Is anyone interested in discussing this here?
Re: Fpgrowth
In the spirit of "there are no dumb questions"... What would it take to support this algo? Does that mean one volunteers for user list help/wiki doc maintenance and of course the code management? That cover it? Sent from my iPhone On Jul 24, 2014, at 1:40 AM, "Suneel Marthi" wrote: > fpgrowth was initially removed and added again for 0.9 because one specific > user stepped up to support it (and was never heard from again). Mahout 0.9 > should have fpgrowth IIRC. > > > On Thu, Jul 24, 2014 at 1:27 AM, Martin, Nick wrote: > >> So I know fpgrowth was sent out to pasture a few months ago. As luck would >> have it I need to do this kind of thing now. >> >> Would my only option now be to pull the source (per Sebastian's note in >> the JIRA)? Could I roll back from 0.9 to a prev version to pick it back up? >> >> Any other options? I don't *think* we'd be able to bite off algo >> maintenance so that probably rules out getting it dropped back into the >> distro I'm guessing. >> >> Sent from my iPhone
Fpgrowth
So I know fpgrowth was sent out to pasture a few months ago. As luck would have it I need to do this kind of thing now. Would my only option now be to pull the source (per Sebastian's note in the JIRA)? Could I roll back from 0.9 to a prev version to pick it back up? Any other options? I don't *think* we'd be able to bite off algo maintenance so that probably rules out getting it dropped back into the distro I'm guessing. Sent from my iPhone
Re: Recommend to a cluster of users
Couple thoughts/comments: - How much anonymity are we talking about here? you have an IP which gives you (ostensibly) geography. That's not entirely trivial...think about looking at purchasing characteristics by geolocation. You can make some common sense decisions about what you recommend (ie maybe dont pop a recommendation for flip flops to someone hitting you from Montreal in January). - I can't speak to whether somebody's solved the cold start problem but I'd recommend taking a look at how your customers acquire product categories/items/widgets in an early period of their lifetime with you. Think looking at cohorts and comparing them to tease out if there's a pattern of purchasing in the first n days of them being a customer. Absent that, I'd pitch popular stuff with good margins :) Hope that gets the wheels turning a bit. I don't think cold start is a "one size fits all" kind of thing. Tough nut to crack. Sent from my iPhone On Jul 11, 2014, at 6:58 PM, "Rashi Jain" wrote: > Hi, > > I want to build a recommendation for anonymous/first time users on an > e-commerce website. I was thinking of recommending products to a > cluster/segment of users , something like TreeClusteringRecommender does > but I believe this has been deprecated. > > I have used item based collaborative filtering based on boolean preferences > for registered users but am looking for ideas to achieve some sort of > recommendation for anonymous/first-time users. > > Any feedback will be highly appreciated. > > Thank you. > > Regards, > Rashi
Re: Welcome Pat Ferrel as new committer on Mahout
Awesome Pat congrats!!! Very well deserved. Sent from my iPhone On Apr 24, 2014, at 6:20 AM, "Sebastian Schelter" wrote: > Hi, > > this is to announce that the Project Management Committee (PMC) for Apache > Mahout has asked Pat Ferrel to become committer and we are pleased to > announce that he has accepted. > > Being a committer enables easier contribution to the project since in > addition to posting patches on JIRA it also gives write access to the code > repository. That also means that now we have yet another person who can > commit patches submitted by others to our repo *wink* > > Pat, we look forward to working with you in the future. Welcome! It would be > great if you could introduce yourself with a few words. > > -s
RE: Documentation, Documentation, Documentation
Drafted a little intro to the item based rec and dropped it in the comments for 1445. Aimed to include some examples of the variety of things one can do with the algo and hopefully enough info that someone hitting the page could get a feel for what they can potentially accomplish before diving directly into the 'guts' of the workflow/config options, etc. Happy to take edits, saw there was another submission a bit ahead of mine this morning so not sure how that gets resolved. Anyways, maybe this can get us closer on cleanup! -Original Message- From: Sebastian Schelter [mailto:s...@apache.org] Sent: Sunday, April 13, 2014 7:49 AM To: user@mahout.apache.org; d...@mahout.apache.org Subject: Documentation, Documentation, Documentation Hi, this is another reminder that we still have to finish our documentation improvements! The website looks shiny now and there have been lots of discussions about new directions but we still have some work todo in cleaning up webpages. We should especially make sure that the examples work. Please help with that, anyone who is willing to sacrifice some time, go through a website and try out the steps described is of great help to the project. It would also be awesome to get some help in creating a few new pages, especially for the recommenders. Here's the list of documentation related jira's for 1.0: https://issues.apache.org/jira/browse/MAHOUT-1441?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Documentation%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC Best, Sebastian
RE: market basket analysis of low sales volume products
I can tell you my experience is that it's absolutely informative to take a look at running the recommendation stuff on things other than items (brands, categories, sub-categories, etc.). If you're in a multi-brand environment it can give you a great view into brand pen by customer groups pretty quickly. Instead of items just assign your categories (or brands, or types, etc.) an ID and pass them through the recommendation algos. And/or, if you'd like (and you have the metadata available), you can do the same with customer segments/groups/etc. If you start to see deficiencies in brand spread for customers you don't expect (or, even, don't like) you can inject that feedback into your process. A good place to control that kind of thing is in the filter file and items file - here you can control what items (or categories, or sub-categories, or brands) make it into your output. You could even go so far as to exclude low-margin items, only generate recs for categories in a specific brand for which you're currently trying to increase penetration, etc. Long answer, but I strongly suggest it's a "yes" and based on experience dealing with this stuff day-to-day. Come to think of it, I think I owe a write-up on this whole kind of thing... -Original Message- From: Si Chen [mailto:sic...@opensourcestrategies.com] Sent: Thursday, March 20, 2014 8:15 PM To: user@mahout.apache.org Subject: market basket analysis of low sales volume products Hi everybody, I'd like to do some market basket analysis to suggest cross-sells, but many of the products are very low sales volume items, so in the past the results weren't that useful. Do you think it would make sense to do market basket analysis at more aggregate levels, for example by brand, product keywords, and product categories, to develop a set of heuristic rules? Then we can use those rules to say that even if we haven't sold product X, because it has brand A, category B, or type C, then it should be cross-sold with some other products. 
Does that sound like a reasonable strategy? Has anybody ever tried this? -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps
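As a rough sketch of the "assign your categories (or brands) an ID and pass them through the recommendation algos" suggestion: the snippet below rolls item-level purchases up to brand level before writing the userid,itemid,pref style input. The item-to-brand lookup, the purchase rows and the function name aggregate_to_brand are hypothetical, and using summed quantity as the preference value is just one possible choice.

```python
from collections import defaultdict

# Hypothetical lookup from item ID to brand ID, and purchase rows
# of the form (user_id, item_id, quantity).
item_to_brand = {101: 1, 102: 1, 201: 2}
purchases = [(7, 101, 1), (7, 102, 2), (7, 201, 1), (8, 201, 3)]


def aggregate_to_brand(purchases, item_to_brand):
    """Sum purchase quantities per (user, brand) so that sparse,
    low-volume items are pooled at the brand level."""
    totals = defaultdict(float)
    for user_id, item_id, quantity in purchases:
        totals[(user_id, item_to_brand[item_id])] += quantity
    return dict(totals)


brand_prefs = aggregate_to_brand(purchases, item_to_brand)

# userid,brandid,pref lines, using the same layout as item-level input.
lines = [f"{u},{b},{q}" for (u, b), q in sorted(brand_prefs.items())]
```

The same roll-up works for categories or sub-categories by swapping the lookup table.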
Re: Newbie question
+ Mahout user Sent from my iPhone On Mar 8, 2014, at 10:42 AM, "Mahmood Naderan" <nt_mahm...@yahoo.com> wrote: Hi Maybe this is a newbie question but I want to know does Hadoop/Mahout use pthread models? Regards, Mahmood
RE: Welcome Andrew Musselman as new committer
Awesome! Congrats Andrew very well-deserved. -Original Message- From: Sebastian Schelter [mailto:s...@apache.org] Sent: Friday, March 07, 2014 12:13 PM To: user@mahout.apache.org; d...@mahout.apache.org Subject: Welcome Andrew Musselman as new committer Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Andrew Musselman to become committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA it also gives write access to the code repository. That also means that now we have yet another person who can commit patches submitted by others to our repo *wink* Andrew, we look forward to working with you in the future. Welcome! It would be great if you could introduce yourself with a few words :) Sebastian
RE: get similar items
The data source system (e.g. MySQL) won't really matter since you'll be looking to output a file with a specific format for the clustering algorithm to pick up. As long as you can manage to get the data out of your source system into the acceptable input format you'll be fine. I very strongly suggest walking through that Reuters example step-by-step to get a feel for how your data needs to be structured as an input, how the sequence file conversion works, etc. There are plenty of great resources out there re: clustering text (or, product descriptions in your case) that are straightforward and informative (e.g. https://eastagile.com/blogs/text-mining-in-apache-mahout, http://ashokharnal.wordpress.com/2014/02/09/text-clustering-using-mahout-command-line-step-by-step/ , http://blog.trifork.com/2011/04/04/how-to-cluster-seinfeld-episodes-with-mahout/ (fun one) ) and certainly the Mahout In Action book would be a great place to learn as well. Happy clustering! -Original Message- From: N! [mailto:12481...@qq.com] Sent: Friday, February 14, 2014 2:33 AM To: user Subject: Re: get similar items Thank you Sebastian&Martin&Scott. I checked 'https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line'. It looks like the case I described. But I am using Java with a MySQL database; is there an example related to this? thanks. -- Original -- From: "Scott C. Cote"; Date: Wed, Feb 12, 2014 11:47 PM To: "user@mahout.apache.org"; Subject: Re: get similar items Since you are relying on unguided data - switch from recommenders/classifier to clustering. Anyone else agree with me on this??? SCott On 2/12/14 9:04 AM, "Martin, Nick" wrote: >Yeah, since it would appear you're lacking requisite data for >recommenders the only other thing I can think of in this case is >potentially treating the movie records as documents and clustering them >(via whatever might be in the 'description' field). > >Have a look here >https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line and see if you can support something >like this with your dataset. > >-Original Message- >From: Sebastian Schelter [mailto:ssc.o...@googlemail.com] >Sent: Wednesday, February 12, 2014 6:28 AM >To: user@mahout.apache.org >Subject: Re: get similar items > >Hi, > >Mahout's recommenders are based on analyzing interactions between users >and items/movies, e.g. ratings or counts how often the movie was watched. > > >On 02/12/2014 11:34 AM, N! wrote: >> Hi all: >> Does anyone have any suggestions for the questions below? >> >> >> thanks a lot. >> >> >> -- Original -- >> Sender: "N!"<12481...@qq.com>; >> Send time: Wednesday, Feb 12, 2014 6:17 PM >> To: "user"; >> >> Subject: Re: get similar items >> >> >> >> Hi Sean: >> Thanks for the reply. >> Assume I have only one table named 'movie' with 1000+ >>records, this table have three >>columns:'id','movieName','movieDescription'. >> Can Mahout calculate the most similar movies for a >>movie.(based on only the 'movie' table)? >> code like: List mostSimilarMovieList = >>recommender.mostSimilar(int movieId). >> if not, do you have any suggestions for this scenario? >>
RE: get similar items
Yeah, since it would appear you're lacking requisite data for recommenders the only other thing I can think of in this case is potentially treating the movie records as documents and clustering them (via whatever might be in the 'description' field). Have a look here https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line and see if you can support something like this with your dataset. -Original Message- From: Sebastian Schelter [mailto:ssc.o...@googlemail.com] Sent: Wednesday, February 12, 2014 6:28 AM To: user@mahout.apache.org Subject: Re: get similar items Hi, Mahout's recommenders are based on analyzing interactions between users and items/movies, e.g. ratings or counts how often the movie was watched. On 02/12/2014 11:34 AM, N! wrote: > Hi all: > Does anyone have any suggestions for the questions below? > > > thanks a lot. > > > -- Original -- > Sender: "N!"<12481...@qq.com>; > Send time: Wednesday, Feb 12, 2014 6:17 PM > To: "user"; > > Subject: Re: get similar items > > > > Hi Sean: > Thanks for the reply. > Assume I have only one table named 'movie' with 1000+ records, > this table have three columns:'id','movieName','movieDescription'. > Can Mahout calculate the most similar movies for a movie.(based > on only the 'movie' table)? > code like: List mostSimilarMovieList = > recommender.mostSimilar(int movieId). > if not, do you have any suggestions for this scenario? >
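For a feel of what "treating the movie records as documents" buys you, here is a small stand-alone sketch. Note that it swaps in plain bag-of-words cosine similarity over the description text rather than Mahout's clustering pipeline, and the descriptions, the whitespace tokenisation and the most_similar helper are invented for illustration; it is not the recommender.mostSimilar API from the question.

```python
import math
from collections import Counter


def term_counts(text):
    """Very naive tokenisation: lowercase and split on whitespace."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(movie_id, descriptions, how_many=1):
    """Rank the other movies by description similarity to movie_id."""
    target = term_counts(descriptions[movie_id])
    scored = [
        (cosine(target, term_counts(text)), other_id)
        for other_id, text in descriptions.items()
        if other_id != movie_id
    ]
    # Highest similarity first; ties broken by the smaller ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _score, other_id in scored[:how_many]]


# Hypothetical rows from the 'movie' table: id -> movieDescription
descriptions = {
    1: "space crew fights alien on a ship",
    2: "alien hunts space crew on distant ship",
    3: "romantic comedy set in paris",
}

result = most_similar(1, descriptions)
```

With 1000+ records an all-pairs comparison like this is still cheap; the Mahout text workflow in the linked wiki page adds proper tokenisation, TF-IDF weighting and clustering on top of the same basic idea.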
RE: Item recommendation w/o users or preferences
I think the key question is what is the desired outcome? If you don't have users (customers) for which you'd like to generate recommendations that really handcuffs you from a recommendation standpoint. I'd recommend starting with a read through this: http://mahout.apache.org/users/recommender/recommender-first-timer-faq.html to get a feel for what Mahout does in the recommendation space. -Original Message- From: Tim Smith [mailto:timsmit...@hotmail.com] Sent: Friday, January 10, 2014 8:27 PM To: user@mahout.apache.org Subject: Item recommendation w/o users or preferences Say I have a retail organization that doesn't sell a diverse set of products, eg 2000, but has many small transactions. Also say that I don't have any user or preference information. Is it reasonable to use pattern mining (market baskets) and recommend items based on a set of thresholds for support, confidence, and lift? If not, what are my options?
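Since the question mentions thresholds on support, confidence, and lift, here is a minimal worked sketch of those three measures for a single rule, using the standard definitions. The baskets and the rule_metrics function are made up for illustration, and this is not the Mahout fpg implementation.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    lift       = confidence / P(consequent)
    """
    n = len(baskets)
    a = set(antecedent)
    c = set(consequent)
    count_a = sum(1 for basket in baskets if a <= basket)
    count_c = sum(1 for basket in baskets if c <= basket)
    count_ac = sum(1 for basket in baskets if (a | c) <= basket)

    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical transactions, each one a set of item names.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
```

In this toy data the rule bread -> milk has support 2/4, confidence 2/3 and lift 8/9; a lift below 1 means the two items co-occur slightly less often than independence would predict, so this rule would not pass a lift threshold of 1.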
RE: Seeing already purchased items in recommender output (running 0.7)
Update...--filterFile remedied it but I was operating under the impression a filterFile wasn't exactly required. -Original Message- From: Martin, Nick [mailto:nimar...@pssd.com] Sent: Wednesday, November 13, 2013 4:27 PM To: user@mahout.apache.org Subject: RE: Seeing already purchased items in recommender output (running 0.7) https://drive.google.com/folderview?id=0B7c8ZMblZvRVUmFPeGlIdDJUV28&usp=sharing Figured it might help if I attach the input/output in case anyone wants to have a look/run a test. If you look at UserID 16240507 they purchased ItemID 1521 (rec_input.csv) and the recommendation output gives ItemID 1521 as a recommendation (rec_out.txt) -Original Message----- From: Martin, Nick [mailto:nimar...@pssd.com] Sent: Wednesday, November 13, 2013 1:43 PM To: user@mahout.apache.org Subject: Seeing already purchased items in recommender output (running 0.7) Hi all, I'm running > mahout recommenditembased -s SIMILARITY_EUCLIDEAN_DISTANCE -i /user/myname/somedir/Input/minm.csv -o /user/nyname/somedir/Output/ My input is the standard format: userid,itemid,pref but I found a customer item recommendation for something a customer already purchased. Saw this: http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items but it looked fairly old. Am I hitting a known bug I just haven't stumbled across yet?
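For anyone hitting the same thing, here is a small sketch of deriving the exclusion pairs from the preference input itself. It assumes the filter file is a list of comma-separated userID,itemID pairs to exclude, which is my understanding of the option and worth checking against your Mahout version; the sample rows (apart from the user/item pair mentioned in this thread) and the filter_pairs helper are hypothetical.

```python
def filter_pairs(pref_lines):
    """Turn userid,itemid,pref lines into unique userid,itemid pairs,
    preserving first-seen order and skipping blank lines."""
    pairs = []
    seen = set()
    for line in pref_lines:
        line = line.strip()
        if not line:
            continue
        user_id, item_id = line.split(",")[:2]
        if (user_id, item_id) not in seen:
            seen.add((user_id, item_id))
            pairs.append(f"{user_id},{item_id}")
    return pairs


# Hypothetical input rows in the userid,itemid,pref layout.
pref_lines = [
    "16240507,1521,1.0",
    "16240507,1521,1.0",
    "16240507,88,1.0",
    "",
]

pairs = filter_pairs(pref_lines)
```

Writing these pairs to a file and passing its path via --filterFile should keep already purchased items out of the output, under the format assumption above.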
RE: Seeing already purchased items in recommender output (running 0.7)
https://drive.google.com/folderview?id=0B7c8ZMblZvRVUmFPeGlIdDJUV28&usp=sharing Figured it might help if I attach the input/output in case anyone wants to have a look/run a test. If you look at UserID 16240507 they purchased ItemID 1521 (rec_input.csv) and the recommendation output gives ItemID 1521 as a recommendation (rec_out.txt) -Original Message- From: Martin, Nick [mailto:nimar...@pssd.com] Sent: Wednesday, November 13, 2013 1:43 PM To: user@mahout.apache.org Subject: Seeing already purchased items in recommender output (running 0.7) Hi all, I'm running > mahout recommenditembased -s SIMILARITY_EUCLIDEAN_DISTANCE -i /user/myname/somedir/Input/minm.csv -o /user/nyname/somedir/Output/ My input is the standard format: userid,itemid,pref but I found a customer item recommendation for something a customer already purchased. Saw this: http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items but it looked fairly old. Am I hitting a known bug I just haven't stumbled across yet?
Seeing already purchased items in recommender output (running 0.7)
Hi all, I'm running > mahout recommenditembased -s SIMILARITY_EUCLIDEAN_DISTANCE -i /user/myname/somedir/Input/minm.csv -o /user/nyname/somedir/Output/ My input is the standard format: userid,itemid,pref but I found a customer item recommendation for something a customer already purchased. Saw this: http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items but it looked fairly old. Am I hitting a known bug I just haven't stumbled across yet?
RE: Scheduled tasks in Mahout
+1 Oozie -Original Message- From: kelvin@gmail.com [mailto:kelvin@gmail.com] On Behalf Of Shengjie Min Sent: Wednesday, October 30, 2013 9:03 PM To: user@mahout.apache.org Subject: Re: Scheduled tasks in Mahout Oozie. On 31 October 2013 04:42, j.barrett Strausser wrote: > You can look at : Flume, Oozie, Mesos, Chronos, Luigi. > > > On Wed, Oct 30, 2013 at 4:19 PM, Ted Dunning > wrote: > > > No. Scheduling is outside of Mahout's scope. > > > > > > > > > > On Wed, Oct 30, 2013 at 12:55 PM, Cassio Melo > > > > wrote: > > > > > I wonder if Mahout (more precisely org.apache.mahout.cf.taste > > > package) has any helper class to execute scheduled tasks like > > > fetch data, compute similarity, etc. > > > > > > Thank you > > > > > > Cassio > > > > > > > > > -- > > > https://github.com/bearrito > @deepbearrito >
RE: Getting rating for all the files
Hi all, I have the same question as Deepak does below...where can I find the User based recommender via Mahout command line? I don't see it listed in the valid program names:
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
dirichlet: : Dirichlet Clustering
eigencuts: : Eigencuts spectral clustering
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
fpg: : Frequent Pattern Growth
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
meanshift: : Mean Shift clustering
minhash: : Run Minhash clustering
parallelALS: : ALS-WR factorization of a rating matrix
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
rowid: : Map SequenceFile to {SequenceFile, SequenceFile}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
-Original Message- From: Deepak Subhramanian [mailto:deepak.subhraman...@gmail.com] Sent: Sunday, September 29, 2013 4:06 PM To: user@mahout.apache.org Subject: Re: Getting rating for all the files I tried writing a UserRecommendation program in java. But it give me less results than the ItemBasedRecommendation. Anyone else have any thoughts on my previous question ? On Sun, Sep 29, 2013 at 7:24 PM, Deepak Subhramanian < deepak.subhraman...@gmail.com> wrote: > Thanks Nick. I am planning to give a try with userbasedrecommendation > since there are low no of users. I dont see recommenduserbased option > in the commandline utility for Mahout.
Does that mean I have to write > a Java Program to use the UserBasedRecommender ? > > > On Sun, Sep 29, 2013 at 7:22 PM, Martin, Nick wrote: > >> I'l need to defer to one of the other math whizzes on the potential >> reasons for recommendations for certain users not appearing. My >> suspicion is that you would either not have sufficient co-occurrence >> of specific users/items to support a recommendation or you may need >> to experiment with a different similarity measure. >> >> Anyone else want to weigh in? >> >> >> >> Sent from my iPhone >> >> On Sep 29, 2013, at 1:14 PM, "Deepak Subhramanian" < >> deepak.subhraman...@gmail.com> wrote: >> >> > Sorry . My mistake . I am getting the lower ratings for some of the >> users >> &
Re: Getting rating for all the files
I'l need to defer to one of the other math whizzes on the potential reasons for recommendations for certain users not appearing. My suspicion is that you would either not have sufficient co-occurrence of specific users/items to support a recommendation or you may need to experiment with a different similarity measure. Anyone else want to weigh in? Sent from my iPhone On Sep 29, 2013, at 1:14 PM, "Deepak Subhramanian" wrote: > Sorry . My mistake . I am getting the lower ratings for some of the users > and items. But my issue is not solved . I am not getting ratings for some > of the users and some of the ratings. > > My userFile has 8000 users and my itemsFile has 4000 Items . But I get > recommendations for only 5000 users and 1500 items. And the maximum no of > recommendations given is 258. What can be the reasons that there is no > items recommendations for 3000 users and 2500 items. Is it because there is > no similarities exist between those users and items ? > > > On Sun, Sep 29, 2013 at 4:46 PM, Deepak Subhramanian < > deepak.subhraman...@gmail.com> wrote: > >> Thanks Nick. As I mentioned earleir I am getting ratings only for the top >> recommended products instead of ratings for 4000 products I am giving >> numRecommendations parameter to 4000 and maxPrefsPerUser to 4000. Should >> it give 4000 items in the list for each user ? For some reasons the >> output for items which are having lower ratings is not displayed. I see >> the default limit is 10. >> >> I am not sure if I am not getting ratings for 4000 items because I am >> passing the wrong options for the mahout version or is it an issue with >> mahout ver 0.7. I am using 0.7 -mahout-examples-0.7-cdh4.3.1.jar . >> >> I see the parameter name changed in the latest version I checked from git >> - 0.9-SNAPSHOT >> >> maxPrefsPerUserConsidered = jobConf.getInt(MAX_PREFS_PER_USER_CONSIDERED, >> DEFAULT_MAX_PREFS_PER_USER_CONSIDERED); >> >> Will using a latest version help ? 
>> >> >> >> >> >> On Sun, Sep 29, 2013 at 12:29 PM, Martin, Nick wrote: >> >>> There should be a score after each recommended item (i.e. 123456:2.6) in >>> your output. Lower scores would be the ones you're interested in. >>> >>> Sent from my iPhone >>> >>> On Sep 28, 2013, at 8:25 AM, "Deepak Subhramanian" < >>> deepak.subhraman...@gmail.com> wrote: >>> >>>> Hi >>>> >>>> I am trying to predict the ratings for some items for some users using >>> item >>>> based collaborative filtering. I tried using the mahout >>> recommenditembased >>>> , but it shows only the top 10 items or I can increase it by passing the >>>> --numRecommendations parameter. But it doesnt shows items which has >>> lower >>>> predicted rating . What is the best approach to get ratings for items >>> which >>>> has low predicted rating ? >>>> >>>> >>>> I tried this command. >>>> >>>> mahout recommenditembased --input mahoutrecoinput --usersFile >>>> recouserlist --itemsFile recoitemlist --output >>>> /mahoutrecooutputpearsonnew -s SIMILARITY_PEARSON_CORRELATION >>>> --numRecommendations 4000 --maxPrefsPerUser 4000 >>>> >>>> Also I tried using the estimatePreference method on the recommender. >>>> >>>> Please help . >> >> >> >> -- >> Deepak Subhramanian > > > > -- > Deepak Subhramanian
Re: Getting rating for all the files
There should be a score after each recommended item (e.g. 123456:2.6) in your output. Lower scores would be the ones you're interested in. Sent from my iPhone On Sep 28, 2013, at 8:25 AM, "Deepak Subhramanian" wrote: > Hi > > I am trying to predict the ratings for some items for some users using item > based collaborative filtering. I tried using the mahout recommenditembased > , but it shows only the top 10 items or I can increase it by passing the > --numRecommendations parameter. But it doesn't show items which have a lower > predicted rating . What is the best approach to get ratings for items which > have a low predicted rating ? > > > I tried this command. > > mahout recommenditembased --input mahoutrecoinput --usersFile > recouserlist --itemsFile recoitemlist --output > /mahoutrecooutputpearsonnew -s SIMILARITY_PEARSON_CORRELATION > --numRecommendations 4000 --maxPrefsPerUser 4000 > > Also I tried using the estimatePreference method on the recommender. > > Please help .
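A quick sketch of pulling those item:score pairs out of an output line and listing the lowest scores first. The exact line layout used here (user ID, a tab, then a bracketed comma-separated list of item:score entries) is an assumption based on the 123456:2.6 style mentioned above, so adjust the parsing to whatever your output files actually contain; the sample line and the parse_recommendations helper are hypothetical.

```python
def parse_recommendations(line):
    """Parse one output line of the assumed form
    'userID<TAB>[item:score,item:score,...]' into (user_id, [(item, score)])."""
    user_id, _, items = line.strip().partition("\t")
    items = items.strip("[]")
    recs = []
    if items:
        for entry in items.split(","):
            item_id, score = entry.split(":")
            recs.append((int(item_id), float(score)))
    return int(user_id), recs


# Hypothetical output line for user 42.
line = "42\t[123456:2.6,789:4.1,55:1.2]"

user_id, recs = parse_recommendations(line)

# Sort ascending by score to surface the low predicted ratings.
lowest_first = sorted(recs, key=lambda rec: rec[1])
```

This only reorders what the job already emitted; items that never made it into the output (for example because of missing co-occurrence data) still will not appear.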
Preference to vectors for clustering
Hi all, I'm looking for the best way to get user clusters from my recommendation output. Idea being I have my recommended items for users (user, item, score) based on their preferences but I want to see how the users were clustered together (and their similarity) so I can run some other analytics on those clusters. I found some discussion on this here (http://lucene.472066.n3.nabble.com/Turning-Preference-Files-Into-Vectors-td640035.html) but I'm not sure if any updates have been made since this thread that would make this a bit easier? If not, is what's discussed in the thread my best approach? Hope that makes sense... Thanks, Nick
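To illustrate what "turning preferences into vectors" means here, a minimal stand-alone sketch: each user becomes a sparse vector keyed by item ID, and users can then be compared (or fed to a clustering algorithm) directly. The preference triples and helper names are hypothetical, and cosine similarity is just one reasonable choice of measure, not necessarily the one a Mahout job would use.

```python
import math
from collections import defaultdict


def user_vectors(prefs):
    """Build one sparse vector (item_id -> score) per user from
    (user_id, item_id, score) triples."""
    vectors = defaultdict(dict)
    for user_id, item_id, score in prefs:
        vectors[user_id][item_id] = score
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical (user, item, score) rows.
prefs = [
    (1, 10, 1.0), (1, 11, 1.0),
    (2, 10, 1.0), (2, 11, 1.0),
    (3, 12, 1.0),
]

vectors = user_vectors(prefs)
sim_12 = cosine(vectors[1], vectors[2])  # identical tastes
sim_13 = cosine(vectors[1], vectors[3])  # no items in common
```

Users 1 and 2 come out as near-identical and user 3 as unrelated, which is the kind of grouping a clustering pass over these vectors would pick up.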
RE: Clustering for customer segmentation
Great info, thanks for the help. I pulled the paper and will start looking at some options. I'd love to contribute so I'll get on JIRA and sign up for the dev@ mailing list to start getting a feel for that process. Thanks, Nick -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Monday, August 12, 2013 12:00 PM To: user@mahout.apache.org Subject: Re: Clustering for customer segmentation The tasks that you need to do include: a) group your history by user id b) extract the features you want to use from each user history c) repeat clustering and adjusting the scaling of your features until you are happy If you have a few hundred examples of customers broken down by the segmentation that you want, then one thing that you might look at is this paper: http://www.cs.cmu.edu/~epxing/papers/Old_papers/xing_nips02_metric.pdf It shows a method for learning a metric that optimizes clustering of labeled and unlabeled points. Mahout currently does not have support for this kind of metric learning, but it would make an excellent addition. On Sat, Aug 10, 2013 at 11:54 AM, Martin, Nick wrote: > Hi all, > > I'm new to Mahout and wondering if anyone could point me in the right > direction for doing customer purchase behavior clustering in Mahout. > Seems most of what I encounter in online and book examples for > clustering is text/document based. > > Basically, I'd like to be able to explore passing n years of customer > transaction data into one of the clustering algorithms and have my > customer population be segmented into similar groups. Key determinants > of similarity would be things like sales volume, purchase frequency, > sales channel, profitability, tenure, category mix, etc. > > Anywhere I can see examples of this kind of thing? > > Thanks!! > Nick > > > > Sent from my iPhone
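Steps (a) and (b) from Ted's list, plus the feature scaling part of (c), can be sketched roughly as below. The transaction rows, the two chosen features (total sales volume and purchase count) and the z-score scaling are illustrative assumptions; the clustering itself is left to whichever algorithm you end up running on the scaled vectors.

```python
import statistics
from collections import defaultdict

# Hypothetical transaction history: (user_id, order_id, amount)
transactions = [
    (1, "a", 10.0), (1, "b", 30.0),
    (2, "c", 100.0),
    (3, "d", 20.0), (3, "e", 20.0), (3, "f", 20.0),
]


def features_by_user(transactions):
    """(a) group history by user id, (b) extract per-user features:
    [total sales volume, number of purchases]."""
    grouped = defaultdict(list)
    for user_id, _order_id, amount in transactions:
        grouped[user_id].append(amount)
    return {user: [sum(amounts), float(len(amounts))]
            for user, amounts in grouped.items()}


def standardize(features):
    """(c, scaling part) z-score each feature column so that no single
    feature dominates the distance computation during clustering."""
    users = sorted(features)
    columns = list(zip(*(features[user] for user in users)))
    scaled = {user: [] for user in users}
    for column in columns:
        mean = statistics.mean(column)
        stdev = statistics.pstdev(column) or 1.0  # guard against zero spread
        for user, value in zip(users, column):
            scaled[user].append((value - mean) / stdev)
    return scaled


features = features_by_user(transactions)
scaled = standardize(features)
```

From here the loop Ted describes is: cluster the scaled vectors, look at the segments, then reweight or rescale features and repeat until the groups make business sense.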
Clustering for customer segmentation
Hi all, I'm new to Mahout and wondering if anyone could point me in the right direction for doing customer purchase behavior clustering in Mahout. Seems most of what I encounter in online and book examples for clustering is text/document based. Basically, I'd like to be able to explore passing n years of customer transaction data into one of the clustering algorithms and have my customer population be segmented into similar groups. Key determinants of similarity would be things like sales volume, purchase frequency, sales channel, profitability, tenure, category mix, etc. Anywhere I can see examples of this kind of thing? Thanks!! Nick Sent from my iPhone