RE: algorithms Apriori, FPgrowth

2014-11-25 Thread Martin, Nick
Hi Jakub,

fpg is in 0.9 but currently unsupported (no code maintainer(s)). I don't think 
we have any docs for it since it's slated for removal. I use it in 0.8 and it 
works for the _limited_ use cases I have.

As an alternative, and in preparation for the eventuality that it'll be removed 
without maintainers, I've been using an R package to fill the gap (actually 
playing with getting it to execute on H2O via their R integration).

Feel free to take a look at the algo and see if it's something you could 
maintain if you think it'd be useful for you - I'd certainly be happy about it!

Best,
Nick

From: Jakub Stransky [stransky...@gmail.com]
Sent: Tuesday, November 25, 2014 8:31 AM
To: user@mahout.apache.org
Subject: algorithms Apriori, FPgrowth

Hello experienced mahout users,

I am new to the Mahout library and I am having a bit of trouble finding a
starting point for "association rule mining", as I see neither the Apriori nor
the FPgrowth algorithm on the list of implemented algorithms. On the other
hand, I found several blog posts referring to the Mahout library for
implementations of those algorithms.
I am a bit confused about what the current state is and where to find the
appropriate docs.

Any hint would be appreciated.

Thanks
Jakub

RE: How to deal with categorical and date data in Mahout?

2014-11-18 Thread Martin, Nick
Hi there,

Which algorithm are you using? For instance, for recommendations you could 
create a mapping of your categorical data to integers before you pass the data 
into Mahout.
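
A minimal sketch of the integer mapping described above, in plain Python rather than any Mahout API. The function name build_id_map and the sample rows are hypothetical; the only idea taken from the thread is that each distinct categorical value gets an integer ID before the data is handed to Mahout.

```python
def build_id_map(values):
    """Assign each distinct value an integer ID in first-seen order."""
    ids = {}
    for value in values:
        if value not in ids:
            ids[value] = len(ids)
    return ids


# Hypothetical (user, category) rows with string values.
rows = [("alice", "red"), ("bob", "blue"), ("alice", "blue")]

user_ids = build_id_map(user for user, _ in rows)
color_ids = build_id_map(color for _, color in rows)

# Replace every string with its integer ID.
encoded = [(user_ids[user], color_ids[color]) for user, color in rows]
print(encoded)  # [(0, 0), (1, 1), (0, 1)]
```

The two dictionaries need to be kept around so the integer IDs in the output can be translated back to the original values.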

Let us know a bit more about what you're trying to accomplish/algos you're 
looking to use.

Best,
Nick 

-Original Message-
From: Lee S [mailto:sle...@gmail.com] 
Sent: Tuesday, November 18, 2014 10:13 PM
To: user
Subject: How to deal with categorical and date data in Mahout?

Hi all:
 Do you have any good practices for dealing with categorical data?
 Does Mahout provide a utility class that can do the conversion?


Re: Mahout Vs Spark

2014-10-22 Thread Martin, Nick
I know we lost the maintainer for fpgrowth somewhere along the line but it's 
definitely something I'd love to see carried forward, too.

Sent from my iPhone

> On Oct 22, 2014, at 8:09 AM, "Brian Dolan"  wrote:
> 
> Sing it, brother!  I miss FP Growth as well.  Once the Scala bindings are in, 
> I'm hoping to work up some time series methods.
> 
>> On Oct 21, 2014, at 8:00 PM, Lee S  wrote:
>> 
>> As a developer facing the choice between Mahout and MLlib, I have some
>> thoughts below.
>> Mahout has no decision tree algorithm, but MLlib has the components for
>> constructing one, such as Gini index and information gain. I also think
>> Mahout could add algorithms for frequent pattern mining, which is very
>> important in feature selection and statistical analysis.
>> MLlib has no frequent pattern mining algorithms.
>> P.S. Why was the fpgrowth algorithm removed in version 0.9?
>> 
>> 2014-10-22 9:12 GMT+08:00 Vibhanshu Prasad :
>> 
>>> Actually, Spark is available in Python as well, so Spark users have the
>>> upper hand over users of traditional Mahout. This applies to all the
>>> Python libraries (including numpy).
>>> 
>>> On Wed, Oct 22, 2014 at 3:54 AM, Ted Dunning 
>>> wrote:
>>> 
>>>> On Tue, Oct 21, 2014 at 3:04 PM, Mahesh Balija <balijamahesh@gmail.com>
>>>> wrote:
>>>>
>>>>> I am trying to differentiate between Mahout and Spark, here is the
>>>>> small list,
>>>>>
>>>>> Features                   Mahout   Spark
>>>>> Clustering                 Y        Y
>>>>> Classification             Y        Y
>>>>> Regression                 Y        Y
>>>>> Dimensionality Reduction   Y        Y
>>>>> Java                       Y        Y
>>>>> Scala                      N        Y
>>>>> Python                     N        Y
>>>>> Numpy                      N        Y
>>>>> Hadoop                     Y        Y
>>>>> Text Mining                Y        N
>>>>> Scala/Spark Bindings       Y        N/A
>>>>> scalability                Y        Y
>>>>
>>>> Mahout doesn't actually have strong features for clustering,
>>>> classification and regression. Mahout is very strong in recommendations
>>>> (which you don't mention) and dimensionality reduction.
>>>>
>>>> Mahout does support scala in the development version.
>>>>
>>>> What do you mean by support for Numpy?


RE: New Mahout Recommender Service

2014-09-09 Thread Martin, Nick
Would absolutely love an ES integration.

-Original Message-
From: Pat Ferrel [mailto:p...@occamsmachete.com] 
Sent: Tuesday, September 09, 2014 10:29 AM
To: user@mahout.apache.org
Subject: New Mahout Recommender Service

Now that we have the basis of several significant improvements to Mahout's 
recommender it seems like we need to go the last step and provide a service. 
Without this it is left to the user to do a lot of integration making the 
current next gen somewhat incomplete.

Using the Hadoop mapreduce code you can get all recs for all people using 
collaborative filtering data or you can use the in-memory single machine 
recommender if you have a small dataset. 

The next generation would require Solr or Elasticsearch so why not go the extra 
step and provide a recommender API on top? At very least it would give users a 
single machine API they can call, analogous to the in-memory recommender of 
Mahout 0.9. But it would also be indefinitely scalable.

Is anyone interested in discussing this here?


Re: Fpgrowth

2014-07-24 Thread Martin, Nick
In the spirit of "there are no dumb questions"...

What would it take to support this algo? Does that mean one volunteers for user 
list help/wiki doc maintenance and of course the code management? That cover it?

Sent from my iPhone

On Jul 24, 2014, at 1:40 AM, "Suneel Marthi"  wrote:

> fpgrowth was initially removed and added again for 0.9 because one specific
> user stepped up to support it (and was never heard from again).  Mahout 0.9
> should have fpgrowth IIRC.
> 
> 
> On Thu, Jul 24, 2014 at 1:27 AM, Martin, Nick  wrote:
> 
>> So I know fpgrowth was sent out to pasture a few months ago. As luck would
>> have it I need to do this kind of thing now.
>> 
>> Would my only option now be to pull the source (per Sebastian's note in
>> the JIRA)? Could I roll back from 0.9 to a prev version to pick it back up?
>> 
>> Any other options? I don't *think* we'd be able to bite off algo
>> maintenance so that probably rules out getting it dropped back into the
>> distro I'm guessing.
>> 
>> Sent from my iPhone


Fpgrowth

2014-07-23 Thread Martin, Nick
So I know fpgrowth was sent out to pasture a few months ago. As luck would have 
it I need to do this kind of thing now.

Would my only option now be to pull the source (per Sebastian's note in the 
JIRA)? Could I roll back from 0.9 to a prev version to pick it back up?

Any other options? I don't *think* we'd be able to bite off algo maintenance so 
that probably rules out getting it dropped back into the distro I'm guessing. 

Sent from my iPhone

Re: Recommend to a cluster of users

2014-07-11 Thread Martin, Nick
Couple thoughts/comments:

- How much anonymity are we talking about here? You have an IP which gives you 
(ostensibly) geography. That's not entirely trivial...think about looking at 
purchasing characteristics by geolocation. You can make some common sense 
decisions about what you recommend (i.e. maybe don't pop a recommendation for 
flip flops to someone hitting you from Montreal in January). 

- I can't speak to whether somebody's solved the cold start problem but I'd 
recommend taking a look at how your customers acquire product 
categories/items/widgets in an early period of their lifetime with you. Think 
looking at cohorts and comparing them to tease out if there's a pattern of 
purchasing in the first n days of them being a customer. Absent that, I'd pitch 
popular stuff with good margins :)

Hope that gets the wheels turning a bit. I don't think cold start is a "one 
size fits all" kind of thing. Tough nut to crack.
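
A rough sketch of the "popular stuff with good margins" fallback mentioned above, in plain Python. The function name cold_start_ranking, the scoring rule (purchase count times margin) and the sample data are all assumptions made for illustration; the thread does not prescribe a formula.

```python
from collections import Counter


def cold_start_ranking(purchases, margins, top_n=3):
    """Rank items by purchase count weighted by margin (illustrative rule)."""
    counts = Counter(item for _, item in purchases)
    scored = {item: counts[item] * margins.get(item, 0.0) for item in counts}
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scored, key=lambda item: (-scored[item], item))[:top_n]


# Hypothetical (user, item) purchase history and per-item margins.
purchases = [
    ("u1", "socks"), ("u2", "socks"),
    ("u3", "boots"),
    ("u1", "hat"), ("u2", "hat"), ("u3", "hat"),
]
margins = {"socks": 0.10, "boots": 0.50, "hat": 0.30}

print(cold_start_ranking(purchases, margins))  # ['hat', 'boots', 'socks']
```

The same ranking could be computed per geography or per early-lifetime cohort by restricting the purchases list before calling the function.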

Sent from my iPhone

On Jul 11, 2014, at 6:58 PM, "Rashi Jain"  wrote:

> Hi,
> 
> I want to build a recommendation for anonymous/first time users on an
> e-commerce website. I was thinking of recommending products to a
> cluster/segment of users , something like TreeClusteringRecommender does
> but I believe this has been deprecated.
> 
> I have used item based collaborative filtering based on boolean preferences
> for registered users but am looking for ideas to achieve some sort of
> recommendation for anonymous/first-time users.
> 
> Any feedback will be highly appreciated.
> 
> Thank you.
> 
> Regards,
> Rashi


Re: Welcome Pat Ferrel as new committer on Mahout

2014-04-24 Thread Martin, Nick
Awesome Pat congrats!!! Very well deserved.

Sent from my iPhone

On Apr 24, 2014, at 6:20 AM, "Sebastian Schelter"  wrote:

> Hi,
> 
> this is to announce that the Project Management Committee (PMC) for Apache 
> Mahout has asked Pat Ferrel to become committer and we are pleased to 
> announce that he has accepted.
> 
> Being a committer enables easier contribution to the project since in 
> addition to posting patches on JIRA it also gives write access to the code 
> repository. That also means that now we have yet another person who can 
> commit patches submitted by others to our repo *wink*
> 
> Pat, we look forward to working with you in the future. Welcome! It would be 
> great if you could introduce yourself with a few words.
> 
> -s


RE: Documentation, Documentation, Documentation

2014-04-14 Thread Martin, Nick
Drafted a little intro to the item based rec and dropped it in the comments for 
1445. Aimed to include some examples of the variety of things one can do with 
the algo and hopefully enough info that someone hitting the page could get a 
feel for what they can potentially accomplish before diving directly into the 
'guts' of the workflow/config options, etc. 

Happy to take edits, saw there was another submission a bit ahead of mine this 
morning so not sure how that gets resolved. 

Anyways, maybe this can get us closer on cleanup!

-Original Message-
From: Sebastian Schelter [mailto:s...@apache.org] 
Sent: Sunday, April 13, 2014 7:49 AM
To: user@mahout.apache.org; d...@mahout.apache.org
Subject: Documentation, Documentation, Documentation

Hi,

this is another reminder that we still have to finish our documentation 
improvements! The website looks shiny now and there have been lots of 
discussions about new directions but we still have some work to do in cleaning 
up webpages. We should especially make sure that the examples work.

Please help with that, anyone who is willing to sacrifice some time, go through 
a website and try out the steps described is of great help to the project. It 
would also be awesome to get some help in creating a few new pages, especially 
for the recommenders.

Here's the list of documentation related jira's for 1.0:

https://issues.apache.org/jira/browse/MAHOUT-1441?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Documentation%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

Best,
Sebastian


RE: market basket analysis of low sales volume products

2014-03-20 Thread Martin, Nick
I can tell you my experience is that it's absolutely informative to take a look 
at running the recommendation stuff on things other than items (brands, 
categories, sub-categories, etc.). If you're in a multi-brand environment it 
can give you a great view into brand penetration by customer groups pretty quickly. 
Instead of items just assign your categories (or brands, or types, etc.) an ID 
and pass them through the recommendation algos. And/or, if you'd like (and you 
have the metadata available), you can do the same with customer 
segments/groups/etc. 

If you start to see deficiencies in brand spread for customers you don't expect 
(or, even, don't like) you can inject that feedback into your process. A good 
place to control that kind of thing is in the filter file and items file - here 
you can control what items (or categories, or sub-categories, or brands) make 
it into your output. You could even go so far as to exclude low-margin items, 
only generate recs for categories in a specific brand for which you're 
currently trying to increase penetration, etc.
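
A minimal sketch of the roll-up described above, assuming the usual userid,itemid,pref input and a lookup from item to brand or category. The function name roll_up, the summing of preferences per category and the sample IDs are hypothetical choices for illustration, not something the thread specifies.

```python
from collections import defaultdict


def roll_up(prefs, item_to_category):
    """Turn (user, item, pref) rows into 'user,categoryId,pref' lines."""
    category_ids = {}
    totals = defaultdict(float)
    for user_id, item_id, pref in prefs:
        category = item_to_category[item_id]
        # Assign the next integer ID the first time a category is seen.
        cat_id = category_ids.setdefault(category, len(category_ids) + 1)
        totals[(user_id, cat_id)] += pref
    rows = [f"{u},{c},{p}" for (u, c), p in sorted(totals.items())]
    return rows, category_ids


# Hypothetical preferences and item-to-brand lookup.
prefs = [(1, 101, 1.0), (1, 102, 1.0), (2, 103, 1.0)]
item_to_category = {101: "BrandA", 102: "BrandA", 103: "BrandB"}

rows, category_ids = roll_up(prefs, item_to_category)
print(rows)          # ['1,1,2.0', '2,2,1.0']
print(category_ids)  # {'BrandA': 1, 'BrandB': 2}
```

The resulting lines have the same shape as item-level input, so they can be passed through the recommendation algos in place of the item file.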

Long answer, but I strongly suggest it's a "yes" and based on experience 
dealing with this stuff day-to-day. 

Come to think of it, I think I owe a write-up on this whole kind of thing...

-Original Message-
From: Si Chen [mailto:sic...@opensourcestrategies.com] 
Sent: Thursday, March 20, 2014 8:15 PM
To: user@mahout.apache.org
Subject: market basket analysis of low sales volume products

Hi everybody,

I'd like to do some market basket analysis to suggest cross-sells, but many of 
the products are very low sales volume items, so in the past the results 
weren't that useful.

Do you think it would make sense to do market basket analysis at more aggregate 
levels, for example by brand, product keywords, and product categories, to 
develop a set of heuristic rules?  Then we can use those rules to say that even 
if we haven't sold product X, because it has brand A, category B, or type C, 
then it should be cross-sold with some other products.

Does that sound like a reasonable strategy?  Has anybody ever tried this?

--
Si Chen
Open Source Strategies, Inc.
sic...@opensourcestrategies.com
http://www.OpenSourceStrategies.com
LinkedIn: http://www.linkedin.com/in/opentaps
Twitter: http://twitter.com/opentaps


Re: Newbie question

2014-03-08 Thread Martin, Nick
+ Mahout user

Sent from my iPhone

On Mar 8, 2014, at 10:42 AM, "Mahmood Naderan" <nt_mahm...@yahoo.com> wrote:

Hi
Maybe this is a newbie question, but I want to know whether Hadoop/Mahout uses 
pthread models.

Regards,
Mahmood


RE: Welcome Andrew Musselman as new committer

2014-03-07 Thread Martin, Nick
Awesome! Congrats Andrew very well-deserved.

-Original Message-
From: Sebastian Schelter [mailto:s...@apache.org] 
Sent: Friday, March 07, 2014 12:13 PM
To: user@mahout.apache.org; d...@mahout.apache.org
Subject: Welcome Andrew Musselman as new committer

Hi,

this is to announce that the Project Management Committee (PMC) for Apache 
Mahout has asked Andrew Musselman to become committer and we are pleased to 
announce that he has accepted.

Being a committer enables easier contribution to the project since in addition 
to posting patches on JIRA it also gives write access to the code repository. 
That also means that now we have yet another person who can commit patches 
submitted by others to our repo *wink*

Andrew, we look forward to working with you in the future. Welcome! It would be 
great if you could introduce yourself with a few words :)

Sebastian


RE: get similar items

2014-02-14 Thread Martin, Nick
The data source system (i.e. MySQL) won't really matter since you'll be looking 
to output a file with a specific format for the clustering algorithm to pick 
up. As long as you can manage to get the data out of your source system into 
the acceptable input format you'll be fine. 

I very strongly suggest walking through that Reuters example step-by-step to 
get a feel for how your data needs to be structured as an input, how the 
sequence file conversion works, etc. There are plenty of great resources out 
there re: clustering text (or, product descriptions in your case) that are 
straightforward and informative, e.g.:

https://eastagile.com/blogs/text-mining-in-apache-mahout
http://ashokharnal.wordpress.com/2014/02/09/text-clustering-using-mahout-command-line-step-by-step/
http://blog.trifork.com/2011/04/04/how-to-cluster-seinfeld-episodes-with-mahout/ (fun one)

Certainly the Mahout In Action book would be a great place to learn as well. 

Happy clustering!

-Original Message-
From: N! [mailto:12481...@qq.com] 
Sent: Friday, February 14, 2014 2:33 AM
To: user
Subject: Re: get similar items

Thank you Sebastian, Martin and Scott.
I checked 
'https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line'.
It looks like the case I described. But I am using Java with a MySQL database; 
is there an example related to this?


thanks.
-- Original --
From:  "Scott C. Cote"
Date:  Wed, Feb 12, 2014 11:47 PM
To:  "user@mahout.apache.org"

Subject:  Re: get similar items



Since you are relying on unguided data - switch from recommenders/classifier to 
clustering.

Anyone else agree with me on this???

Scott

On 2/12/14 9:04 AM, "Martin, Nick"  wrote:

>Yeah, since it would appear you're lacking requisite data for 
>recommenders the only other thing I can think of in this case is 
>potentially treating the movie records as documents and clustering them 
>(via whatever might be in the 'description' field).
>
>Have a look here
>https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
>and see if you can support something like this with your dataset.
>
>-Original Message-
>From: Sebastian Schelter [mailto:ssc.o...@googlemail.com]
>Sent: Wednesday, February 12, 2014 6:28 AM
>To: user@mahout.apache.org
>Subject: Re: get similar items
>
>Hi,
>
>Mahout's recommenders are based on analyzing interactions between users 
>and items/movies, e.g. ratings or counts how often the movie was watched.
>
>
>On 02/12/2014 11:34 AM, N! wrote:
>> Hi all:
>>   Does anyone have any suggestions for the questions below?
>>
>>
>>   thanks a lot.
>>
>>
>> -- Original --
>> Sender: "N!"<12481...@qq.com>;
>> Send time: Wednesday, Feb 12, 2014 6:17 PM
>> To: "user";
>>
>> Subject: Re: get similar items
>>
>>
>>
>> Hi Sean:
>>  Thanks for the reply.
>>  Assume I have only one table named 'movie' with 1000+ 
>>records, this table have three 
>>columns:'id','movieName','movieDescription'.
>>  Can Mahout calculate the most similar movies for a 
>>movie.(based on only the 'movie' table)?
>>  code like: List mostSimilarMovieList = 
>>recommender.mostSimilar(int movieId).
>>  if not, do you have any suggestions for this scenario?
>>
>




RE: get similar items

2014-02-12 Thread Martin, Nick
Yeah, since it would appear you're lacking requisite data for recommenders the 
only other thing I can think of in this case is potentially treating the movie 
records as documents and clustering them (via whatever might be in the 
'description' field).

Have a look here 
https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
 and see if you can support something like this with your dataset.
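
To illustrate what "treating the movie records as documents" can look like, here is a small sketch in plain Python that builds TF-IDF vectors from description text and picks the most similar record by cosine similarity. This is not Mahout's seq2sparse or any of its clustering jobs; the function names and the three sample descriptions are hypothetical, and the tokenization is a bare whitespace split.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(index, vectors):
    """Return the index of the document closest to vectors[index]."""
    scores = [(cosine(vectors[index], v), i)
              for i, v in enumerate(vectors) if i != index]
    return max(scores)[1]


# Hypothetical movie descriptions.
descriptions = [
    "space pilots fight an alien fleet",
    "alien fleet attacks a space station",
    "a chef opens a small restaurant",
]
vectors = tfidf_vectors(descriptions)
print(most_similar(0, vectors))  # 1
```

Vectors of this kind are the usual input to a text clustering step; the linked quick tour covers how Mahout produces and clusters its own.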

-Original Message-
From: Sebastian Schelter [mailto:ssc.o...@googlemail.com] 
Sent: Wednesday, February 12, 2014 6:28 AM
To: user@mahout.apache.org
Subject: Re: get similar items

Hi,

Mahout's recommenders are based on analyzing interactions between users and 
items/movies, e.g. ratings or counts how often the movie was watched.


On 02/12/2014 11:34 AM, N! wrote:
> Hi all:
>   Does anyone have any suggestions for the questions below?
>
>
>   thanks a lot.
>
>
> -- Original --
> Sender: "N!"<12481...@qq.com>;
> Send time: Wednesday, Feb 12, 2014 6:17 PM
> To: "user";
>
> Subject: Re: get similar items
>
>
>
> Hi Sean:
>  Thanks for the reply.
>  Assume I have only one table named 'movie' with 1000+ records, 
> this table have three columns:'id','movieName','movieDescription'.
>  Can Mahout calculate the most similar movies for a movie.(based 
> on only the 'movie' table)?
>  code like: List mostSimilarMovieList = 
> recommender.mostSimilar(int movieId).
>  if not, do you have any suggestions for this scenario?
>



RE: Item recommendation w/o users or preferences

2014-01-10 Thread Martin, Nick
I think the key question is what is the desired outcome? If you don't have 
users (customers) for which you'd like to generate recommendations that really 
handcuffs you from a recommendation standpoint.

I'd recommend starting with a read through this: 
http://mahout.apache.org/users/recommender/recommender-first-timer-faq.html to 
get a feel for what Mahout does in the recommendation space. 

-Original Message-
From: Tim Smith [mailto:timsmit...@hotmail.com] 
Sent: Friday, January 10, 2014 8:27 PM
To: user@mahout.apache.org
Subject: Item recommendation w/o users or preferences

Say I have a retail organization that doesn't sell a diverse set of products 
(e.g. 2000), but has many small transactions.  Also say that I don't have any user 
or preference information.  Is it reasonable to use pattern mining (market 
baskets) and recommend items based on a set of thresholds for support, 
confidence, and lift?  If not, what are my options?
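
For reference, a small worked sketch of the three thresholds named in the question, restricted to item pairs and written in plain Python (it is not fpgrowth and uses no Mahout code). The function name pair_rules, the baskets and the threshold values are hypothetical. For a rule A -> B: support is the share of baskets containing both, confidence is support(A and B) / support(A), and lift is confidence / support(B).

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0, min_confidence=0.0, min_lift=0.0):
    """Return (lhs, rhs, support, confidence, lift) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        # Each pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            if (support >= min_support and confidence >= min_confidence
                    and lift >= min_lift):
                rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Hypothetical transactions.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["milk"],
]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6, min_lift=1.0)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
# bread -> butter: support=0.50 confidence=0.67 lift=1.33
# butter -> bread: support=0.50 confidence=1.00 lift=1.33
```

Counting every pair this way only works for small catalogs and short baskets; larger itemsets are what algorithms such as Apriori and FP-growth are designed for.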
  


RE: Seeing already purchased items in recommender output (running 0.7)

2013-11-14 Thread Martin, Nick
Update...--filterFile remedied it but I was operating under the impression a 
filterFile wasn't exactly required. 
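
A sketch of one way to produce such a filter file from the preference input, assuming the filter file holds one userID,itemID pair per line for items to exclude for that user. That format is an assumption to verify against the recommenditembased help output for your version; the function name filter_pairs, item 1600 and the preference values below are hypothetical.

```python
def filter_pairs(pref_lines):
    """Yield unique 'userID,itemID' strings from 'userID,itemID,pref' lines."""
    seen = set()
    for line in pref_lines:
        line = line.strip()
        if not line:
            continue
        user_id, item_id = line.split(",")[:2]
        if (user_id, item_id) not in seen:
            seen.add((user_id, item_id))
            yield f"{user_id},{item_id}"


# Sample input lines in the userid,itemid,pref format from the thread.
prefs = ["16240507,1521,1.0", "16240507,1600,1.0", "16240507,1521,1.0"]
pairs = list(filter_pairs(prefs))
print(pairs)  # ['16240507,1521', '16240507,1600']
```

Writing these lines to a file and passing its path via --filterFile would then exclude each user's purchased items from that user's recommendations.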

-Original Message-
From: Martin, Nick [mailto:nimar...@pssd.com] 
Sent: Wednesday, November 13, 2013 4:27 PM
To: user@mahout.apache.org
Subject: RE: Seeing already purchased items in recommender output (running 0.7)

https://drive.google.com/folderview?id=0B7c8ZMblZvRVUmFPeGlIdDJUV28&usp=sharing

Figured it might help if I attach the input/output in case anyone wants to have 
a look/run a test. 

If you look at UserID 16240507 they purchased ItemID 1521 (rec_input.csv) and 
the recommendation output gives ItemID 1521 as a recommendation (rec_out.txt)


-Original Message-----
From: Martin, Nick [mailto:nimar...@pssd.com] 
Sent: Wednesday, November 13, 2013 1:43 PM
To: user@mahout.apache.org
Subject: Seeing already purchased items in recommender output (running 0.7)

Hi all,

I'm running > mahout recommenditembased -s SIMILARITY_EUCLIDEAN_DISTANCE -i 
/user/myname/somedir/Input/minm.csv -o /user/nyname/somedir/Output/

My input is the standard format: userid,itemid,pref but I found a customer item 
recommendation for something a customer already purchased.

Saw this: 
http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
 but it looked fairly old.

Am I hitting a known bug I just haven't stumbled across yet?




RE: Seeing already purchased items in recommender output (running 0.7)

2013-11-13 Thread Martin, Nick
https://drive.google.com/folderview?id=0B7c8ZMblZvRVUmFPeGlIdDJUV28&usp=sharing

Figured it might help if I attach the input/output in case anyone wants to have 
a look/run a test. 

If you look at UserID 16240507 they purchased ItemID 1521 (rec_input.csv) and 
the recommendation output gives ItemID 1521 as a recommendation (rec_out.txt)


-Original Message-
From: Martin, Nick [mailto:nimar...@pssd.com] 
Sent: Wednesday, November 13, 2013 1:43 PM
To: user@mahout.apache.org
Subject: Seeing already purchased items in recommender output (running 0.7)

Hi all,

I'm running > mahout recommenditembased -s SIMILARITY_EUCLIDEAN_DISTANCE -i 
/user/myname/somedir/Input/minm.csv -o /user/nyname/somedir/Output/

My input is the standard format: userid,itemid,pref but I found a customer item 
recommendation for something a customer already purchased.

Saw this: 
http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
 but it looked fairly old.

Am I hitting a known bug I just haven't stumbled across yet?




Seeing already purchased items in recommender output (running 0.7)

2013-11-13 Thread Martin, Nick
Hi all,

I'm running > mahout recommenditembased -s SIMILARITY_EUCLIDEAN_DISTANCE -i 
/user/myname/somedir/Input/minm.csv -o /user/nyname/somedir/Output/

My input is the standard format: userid,itemid,pref but I found a customer item 
recommendation for something a customer already purchased.

Saw this: 
http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
 but it looked fairly old.

Am I hitting a known bug I just haven't stumbled across yet?




RE: Scheduled tasks in Mahout

2013-10-30 Thread Martin, Nick
+1 Oozie

-Original Message-
From: kelvin@gmail.com [mailto:kelvin@gmail.com] On Behalf Of Shengjie 
Min
Sent: Wednesday, October 30, 2013 9:03 PM
To: user@mahout.apache.org
Subject: Re: Scheduled tasks in Mahout

Oozie.


On 31 October 2013 04:42, j.barrett Strausser  wrote:

> You can look at : Flume, Oozie, Mesos, Chronos, Luigi.
>
>
> On Wed, Oct 30, 2013 at 4:19 PM, Ted Dunning 
> wrote:
>
> > No.  Scheduling is outside of Mahout's scope.
> >
> >
> >
> >
> > On Wed, Oct 30, 2013 at 12:55 PM, Cassio Melo 
> > 
> > wrote:
> >
> > > I wonder if Mahout (more precisely org.apache.mahout.cf.taste 
> > > package) has any helper class to execute scheduled tasks like 
> > > fetch data, compute similarity, etc.
> > >
> > > Thank you
> > >
> > > Cassio
> > >
> >
>
>
>
> --
>
>
> https://github.com/bearrito
> @deepbearrito
>


RE: Getting rating for all the files

2013-09-30 Thread Martin, Nick
Hi all, 

I have the same question as Deepak does below...where can I find the User based 
recommender via Mahout command line?

I don't see it listed in the valid program names:

Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  dirichlet: : Dirichlet Clustering
  eigencuts: : Eigencuts spectral clustering
  evaluateFactorization: : compute RMSE and MAE of a rating matrix 
factorization against probes
  fkmeans: : Fuzzy K-means clustering
  fpg: : Frequent Pattern Growth
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based 
collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  meanshift: : Mean Shift clustering
  minhash: : Run Minhash clustering
  parallelALS: : ALS-WR factorization of a rating matrix
  recommendfactorized: : Compute recommendations using the factorization of a 
rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative 
filtering
  regexconverter: : Convert text files on a per line basis based on regular 
expressions
  rowid: : Map SequenceFile<Text,VectorWritable> to 
{SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and 
validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped 
mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model 
against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or 
Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence

-Original Message-
From: Deepak Subhramanian [mailto:deepak.subhraman...@gmail.com] 
Sent: Sunday, September 29, 2013 4:06 PM
To: user@mahout.apache.org
Subject: Re: Getting rating for all the files

I tried writing a UserRecommendation program in Java, but it gives me fewer 
results than the ItemBasedRecommendation. Does anyone else have any thoughts on 
my previous question?


On Sun, Sep 29, 2013 at 7:24 PM, Deepak Subhramanian < 
deepak.subhraman...@gmail.com> wrote:

> Thanks Nick. I am planning to give userbasedrecommendation a try 
> since there is a low number of users. I don't see a recommenduserbased option 
> in the command-line utility for Mahout. Does that mean I have to write 
> a Java program to use the UserBasedRecommender?
>
>
> On Sun, Sep 29, 2013 at 7:22 PM, Martin, Nick  wrote:
>
>> I'll need to defer to one of the other math whizzes on the potential 
>> reasons for recommendations for certain users not appearing. My 
>> suspicion is that you would either not have sufficient co-occurrence 
>> of specific users/items to support a recommendation or you may need 
>> to experiment with a different similarity measure.
>>
>> Anyone else want to weigh in?
>>
>>
>>
>> Sent from my iPhone
>>
>> On Sep 29, 2013, at 1:14 PM, "Deepak Subhramanian" < 
>> deepak.subhraman...@gmail.com> wrote:
>>
>> > Sorry . My mistake . I am getting the lower ratings for some of the
>> users
>> &

Re: Getting rating for all the files

2013-09-29 Thread Martin, Nick
I'll need to defer to one of the other math whizzes on the potential reasons for 
recommendations for certain users not appearing. My suspicion is that you would 
either not have sufficient co-occurrence of specific users/items to support a 
recommendation or you may need to experiment with a different similarity 
measure. 

Anyone else want to weigh in?



Sent from my iPhone

On Sep 29, 2013, at 1:14 PM, "Deepak Subhramanian" wrote:

> Sorry, my mistake. I am getting the lower ratings for some of the users
> and items. But my issue is not solved: I am not getting ratings for some
> of the users and some of the items.
> 
> My userFile has 8000 users and my itemsFile has 4000 items, but I get
> recommendations for only 5000 users and 1500 items. And the maximum number of
> recommendations given is 258. What could be the reason that there are no
> item recommendations for 3000 users and 2500 items? Is it because no
> similarities exist between those users and items?
> 
> 
> On Sun, Sep 29, 2013 at 4:46 PM, Deepak Subhramanian <
> deepak.subhraman...@gmail.com> wrote:
> 
>> Thanks Nick. As I mentioned earlier, I am getting ratings only for the top
>> recommended products instead of ratings for all 4000 products. I am setting
>> the numRecommendations parameter to 4000 and maxPrefsPerUser to 4000. Should
>> it give 4000 items in the list for each user? For some reason the
>> output for items which have lower ratings is not displayed. I see
>> the default limit is 10.
>> 
>> I am not sure if I am not getting ratings for 4000 items because I am
>> passing the wrong options for the  mahout version or is it an issue with
>> mahout ver 0.7. I am using 0.7 -mahout-examples-0.7-cdh4.3.1.jar .
>> 
>> I see the parameter name changed in the latest version I checked from git
>> - 0.9-SNAPSHOT
>> 
>> maxPrefsPerUserConsidered = jobConf.getInt(MAX_PREFS_PER_USER_CONSIDERED,
>> DEFAULT_MAX_PREFS_PER_USER_CONSIDERED);
>> 
>> Will using the latest version help?
>> 
>> 
>> 
>> 
>> 
>> On Sun, Sep 29, 2013 at 12:29 PM, Martin, Nick  wrote:
>> 
>>> There should be a score after each recommended item (i.e. 123456:2.6) in
>>> your output. Lower scores would be the ones you're interested in.
>>> 
>>> Sent from my iPhone
>>> 
>>> On Sep 28, 2013, at 8:25 AM, "Deepak Subhramanian" <
>>> deepak.subhraman...@gmail.com> wrote:
>>> 
>>>> Hi
>>>> 
>>>> I am trying to predict the ratings for some items for some users using
>>>> item based collaborative filtering. I tried using mahout
>>>> recommenditembased, but it shows only the top 10 items, or more if I pass
>>>> the --numRecommendations parameter. But it doesn't show items which have a
>>>> lower predicted rating. What is the best approach to get ratings for items
>>>> which have a low predicted rating?
>>>> 
>>>> 
>>>> I tried this command.
>>>> 
>>>> mahout recommenditembased --input mahoutrecoinput --usersFile
>>>> recouserlist  --itemsFile  recoitemlist --output
>>>> /mahoutrecooutputpearsonnew -s SIMILARITY_PEARSON_CORRELATION
>>>> --numRecommendations 4000  --maxPrefsPerUser 4000
>>>> 
>>>> Also I tried using the estimatePreference method on the recommender.
>>>> 
>>>> Please help.
>> 
>> 
>> 
>> --
>> Deepak Subhramanian
> 
> 
> 
> -- 
> Deepak Subhramanian
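
One possible explanation for the missing users and items, offered as a general property of item-based collaborative filtering rather than a confirmed diagnosis of this job: a preference can only be estimated for an item when the user has rated at least one other item that has a known similarity to it. The sketch below uses a plain similarity-weighted average; the function name, the data layout, and the sample values are all hypothetical, and it is not Mahout's own code.

```python
import math


def estimate_preference(user_ratings, similarities, target_item):
    """Similarity-weighted average of the user's ratings.

    user_ratings: {item: rating} for one user (hypothetical layout).
    similarities: {frozenset((item_a, item_b)): similarity}.
    Returns NaN when none of the user's rated items has a known
    similarity to target_item, so nothing can be recommended.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        if item == target_item:
            continue
        sim = similarities.get(frozenset((item, target_item)))
        if sim is None:
            continue  # no similarity known for this pair
        weighted_sum += sim * rating
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total > 0 else float("nan")


# Illustrative data only.
sims = {frozenset(("A", "B")): 0.8, frozenset(("A", "C")): 0.4}
covered = estimate_preference({"B": 4.0, "C": 2.0}, sims, "A")
uncovered = estimate_preference({"D": 5.0}, sims, "A")
print(covered, uncovered)
```

Under this model, a user whose rated items share no similarity with any candidate item gets no estimates at all, and an item that is similar to nothing never appears in any list.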


Re: Getting rating for all the files

2013-09-29 Thread Martin, Nick
There should be a score after each recommended item (e.g. 123456:2.6) in your 
output. Lower scores would be the ones you're interested in. 

Sent from my iPhone

On Sep 28, 2013, at 8:25 AM, "Deepak Subhramanian" wrote:

> Hi
> 
> I am trying to predict the ratings for some items for some users using
> item-based collaborative filtering. I tried using mahout recommenditembased,
> but it shows only the top 10 items unless I increase that by passing the
> --numRecommendations parameter. It still doesn't show items which have a lower
> predicted rating. What is the best approach to get ratings for items which
> have a low predicted rating?
> 
> 
> I tried this command.
> 
> mahout recommenditembased --input mahoutrecoinput --usersFile
> recouserlist  --itemsFile  recoitemlist --output
> /mahoutrecooutputpearsonnew -s SIMILARITY_PEARSON_CORRELATION
> --numRecommendations 4000  --maxPrefsPerUser 4000
> 
> Also I tried using the estimatePreference method on the recommender.
> 
> Please help.
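
To pick out the lower-scored items Nick mentions, the job output can be post-processed. The sketch below assumes each output line looks like `userID<TAB>[itemID:score,itemID:score,...]`; only the `itemID:score` part is confirmed in this thread, so the tab separator and the brackets are assumptions, and the function names are made up for illustration.

```python
def parse_recommendation_line(line):
    """Parse one assumed output line into (user_id, [(item_id, score), ...])."""
    user_id, _, payload = line.strip().partition("\t")
    payload = payload.strip().strip("[]")
    pairs = []
    if payload:
        for entry in payload.split(","):
            item_id, _, score = entry.partition(":")
            pairs.append((int(item_id), float(score)))
    return int(user_id), pairs


def lowest_scored(pairs, n):
    """Return the n (item_id, score) pairs with the smallest score."""
    return sorted(pairs, key=lambda pair: pair[1])[:n]


# Illustrative line, not real job output.
sample = "42\t[123456:2.6,789:4.1,555:1.3]"
user, recs = parse_recommendation_line(sample)
print(user, lowest_scored(recs, 2))
```

This only reorders what the job actually wrote; items that never appear in the output cannot be recovered this way.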


Preference to vectors for clustering

2013-09-17 Thread Martin, Nick
Hi all,

I'm looking for the best way to get user clusters from my recommendation 
output. The idea is that I have my recommended items for users (user, item, 
score) based on their preferences, but I want to see how the users were 
clustered together (and how similar they are) so I can run some other analytics 
on those clusters. I found some discussion on this here 
(http://lucene.472066.n3.nabble.com/Turning-Preference-Files-Into-Vectors-td640035.html)
but I'm not sure whether any updates since that thread have made this a bit 
easier. If not, is what's discussed in the thread my best approach?

Hope that makes sense...

Thanks,
Nick
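
As a rough illustration of the mapping being asked about, (user, item, score) triples can be turned into one sparse vector per user, after which any user-to-user similarity or clustering step can run on the vectors. This is only the logical transformation in plain Python with hypothetical names and sample data; it does not produce Mahout's own vector files.

```python
import math
from collections import defaultdict


def preferences_to_vectors(triples):
    """Build {user: {column: score}} plus the item -> column index used."""
    item_index = {}
    vectors = defaultdict(dict)
    for user, item, score in triples:
        # Assign each distinct item the next free column number.
        column = item_index.setdefault(item, len(item_index))
        vectors[user][column] = float(score)
    return dict(vectors), item_index


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[col] for col, value in a.items() if col in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Illustrative triples only.
triples = [(1, "A", 5.0), (1, "B", 3.0), (2, "A", 4.0), (3, "C", 2.0)]
vectors, item_index = preferences_to_vectors(triples)
print(cosine(vectors[1], vectors[2]), cosine(vectors[1], vectors[3]))
```

Users 1 and 2 share item A and so get a positive similarity, while users 1 and 3 share nothing and score 0.0.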


RE: Clustering for customer segmentation

2013-08-12 Thread Martin, Nick
Great info, thanks for the help. I pulled the paper and will start looking at 
some options.

I'd love to contribute so I'll get on JIRA and sign up for the dev@ mailing 
list to start getting a feel for that process.

Thanks,
Nick

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
Sent: Monday, August 12, 2013 12:00 PM
To: user@mahout.apache.org
Subject: Re: Clustering for customer segmentation

The tasks that you need to do include:

a) group your history by user id
b) extract the features you want to use from each user history
c) repeat clustering and adjusting the scaling of your features until you are 
happy

If you have a few hundred examples of customers broken down by the segmentation 
that you want, then one thing that you might look at is this
paper:

http://www.cs.cmu.edu/~epxing/papers/Old_papers/xing_nips02_metric.pdf

It shows a method for learning a metric that optimizes clustering of labeled 
and unlabeled points.

Mahout currently does not have support for this kind of metric learning, but it 
would make an excellent addition.
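
Steps (a) to (c) in the reply above can be sketched in a few lines. This is a minimal illustration that assumes transactions arrive as (user_id, amount) pairs and uses only two features, total sales volume and purchase count; the function names, the min-max scaling, and the per-feature weights are choices made for the example, not anything prescribed in the thread. The clustering call itself is left out, since any algorithm can consume the scaled rows.

```python
from collections import defaultdict


def group_by_user(transactions):
    """(a) Collect each user's purchase amounts into one history list."""
    histories = defaultdict(list)
    for user_id, amount in transactions:
        histories[user_id].append(amount)
    return dict(histories)


def extract_features(history):
    """(b) Two illustrative features: sales volume and purchase count."""
    return [sum(history), float(len(history))]


def scale(rows, weights):
    """(c) Min-max scale each column to [0, 1], then apply a weight.

    The weights are the knob to adjust between clustering runs.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            w * ((x - lo) / (hi - lo) if hi > lo else 0.0)
            for x, lo, hi, w in zip(row, lows, highs, weights)
        ]
        for row in rows
    ]


# Illustrative transactions only.
transactions = [
    ("u1", 20.0), ("u2", 5.0), ("u1", 30.0),
    ("u3", 100.0), ("u2", 15.0), ("u2", 10.0),
]
histories = group_by_user(transactions)
users = sorted(histories)
features = [extract_features(histories[u]) for u in users]
scaled = scale(features, weights=[1.0, 0.5])
print(users, scaled)
```

Halving the weight on purchase count, as done here, makes sales volume dominate any distance computed on the rows; rerunning the clustering with different weights is the "repeat and adjust" loop from step (c).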



On Sat, Aug 10, 2013 at 11:54 AM, Martin, Nick wrote:

> Hi all,
>
> I'm new to Mahout and wondering if anyone could point me in the right 
> direction for doing customer purchase behavior clustering in Mahout. 
> Seems most of what I encounter in online and book examples for 
> clustering is text/document based.
>
> Basically, I'd like to be able to explore passing n years of customer 
> transaction data into one of the clustering algorithms and have my 
> customer population be segmented into similar groups. Key determinants 
> of similarity would be things like sales volume, purchase frequency, 
> sales channel, profitability, tenure, category mix, etc.
>
> Anywhere I can see examples of this kind of thing?
>
> Thanks!!
> Nick
>
>
>
> Sent from my iPhone


Clustering for customer segmentation

2013-08-10 Thread Martin, Nick
Hi all,

I'm new to Mahout and wondering if anyone could point me in the right direction 
for doing customer purchase behavior clustering in Mahout. Seems most of what I 
encounter in online and book examples for clustering is text/document based.

Basically, I'd like to be able to explore passing n years of customer 
transaction data into one of the clustering algorithms and have my customer 
population be segmented into similar groups. Key determinants of similarity 
would be things like sales volume, purchase frequency, sales channel, 
profitability, tenure, category mix, etc.

Anywhere I can see examples of this kind of thing?

Thanks!!
Nick



Sent from my iPhone