Re: JobConf and ClassPath

2013-04-11 Thread Cyril Bogus
Hi, I am trying to use the Mahout jar instead of compiling it with my code. On Tue, Apr 9, 2013 at 6:01 PM, Dominik Hübner cont...@dhuebner.com wrote: Try adding this to your pom file: <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId>

Re: cross recommender

2013-04-11 Thread Pat Ferrel
Getting this running with co-occurrence rather than using a similarity calc on user rows finally forced me to understand what is going on in the base recommender. And the answer implies further work. [B'B] is usually not calculated in the usual item based recommender. The matrix that comes out

Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Billy
I am very new to Mahout and have only read up to chapter 5 of 'MIA', but after reading about the various user-centric and item-centric recommenders, they all still seem to need a userId, so I am still unsure whether Mahout can help with a fairly common recommendation. My requirement is to produce 'n'

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sean Owen
This sounds like just a most-similar-items problem. That's good news because that's simpler. The only question is how you want to compute item-item similarities. That could be based on user-item interactions. If you're on Hadoop, try the RowSimilarityJob (where you will need rows to be items,

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Pat Ferrel
Or you may want to look at recording purchases by user ID. Then use the standard recommender to train on (userID, itemsID, boolean). Then query the trained recommender thus: recommender.mostSimilarItems(long itemID, int howMany) This does what you want but uses more data than just what items

Re: cross recommender

2013-04-11 Thread Sebastian Schelter
Do I have to create a SimilarityJob( matrixB, matrixA, similarityType ) to get this or have I missed something already in Mahout? It could be worth investigating whether MatrixMultiplicationJob could be extended to compute similarities instead of dot products. Best, Sebastian

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sebastian Schelter
Use ItemSimilarityJob instead of RowSimilarityJob, it's the easy-to-use wrapper around that :) On 11.04.2013 19:28, Sean Owen wrote: This sounds like just a most-similar-items problem. That's good news because that's simpler. The only question is how you want to compute item-item similarities.

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sean Owen
You can try treating your orders as the 'users'. Then just compute item-item similarities as usual. On Thu, Apr 11, 2013 at 7:59 PM, Billy b...@ntlworld.com wrote: Thanks for replying, I don't have users, well I do :-) but in this case it should not influence the recommendations, these
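To make the "orders as users" idea concrete, here is a minimal sketch in plain Python. It is not Mahout code and does not use any Mahout API: the helper names (`cooccurrence_counts`, `most_similar_items`) and the order data are hypothetical, and raw co-occurrence counts stand in for whatever similarity measure you would actually choose.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(orders):
    """Count, for every unordered item pair, how many orders contain both."""
    counts = defaultdict(int)
    for items in orders.values():
        # De-duplicate within an order so repeated lines count once.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


def most_similar_items(orders, item_id, how_many):
    """Return up to how_many items ranked by co-occurrence with item_id."""
    scores = defaultdict(int)
    for (a, b), n in cooccurrence_counts(orders).items():
        if a == item_id:
            scores[b] += n
        elif b == item_id:
            scores[a] += n
    # Highest count first; ties broken by item ID for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


# Hypothetical data: order ID -> item IDs in that order.
orders = {
    1: [101, 102, 103],
    2: [101, 102],
    3: [102, 104],
    4: [101, 103],
}

print(most_similar_items(orders, 101, 2))
```

Raw counts favour popular items, which is one reason the other threads in this listing discuss the log-likelihood ratio as a similarity measure.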

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Ted Dunning
Actually, making this user based is a really good thing because you get recommendations from one session to the next. These may be much more valuable for cross-sell than things in the same order. On Thu, Apr 11, 2013 at 12:50 PM, Sean Owen sro...@gmail.com wrote: You can try treating your

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Billy
The example data 'intro.csv' in MIA has users 1-5, so if I ask for recommendations for user 1 then this works, but if I ask for recommendations for user 6 (a new user yet to be added to the data model) then I get no recommendations ... so if I substitute users for orders then again I

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sean Owen
You can actually create a user #6 for your new order. Or you can use the anonymous user function of the library, although it's hacky. We may be mixing up terms here. DataModel is a class that has nothing to do with Hadoop. Hadoop in turn has no part in real-time anything, like recommending to a
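One way to picture recommending to an order that is not yet in the model is to score candidates purely from the items already in the basket. The sketch below is plain Python under that assumption; it is not Mahout's anonymous-user mechanism, and `recommend_for_basket` and the order data are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from itertools import permutations


def recommend_for_basket(orders, basket, how_many):
    """Rank items outside the basket by how often they were bought
    together with any item that is in the basket."""
    basket = set(basket)
    scores = defaultdict(int)
    for items in orders.values():
        # Every ordered pair (a, b) of distinct items in the same past order.
        for a, b in permutations(set(items), 2):
            if a in basket and b not in basket:
                scores[b] += 1
    # Highest score first; ties broken by item ID for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


# Hypothetical history: order ID -> item IDs in that order.
orders = {
    1: [101, 102, 103],
    2: [101, 102],
    3: [102, 104],
    4: [101, 103],
}

# A brand-new order that appears nowhere in the history above.
print(recommend_for_basket(orders, [102, 103], 2))
```

Because the new basket is never looked up by ID, it does not matter that it is missing from the stored data.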

Re: log-likelihood ratio value in item similarity calculation

2013-04-11 Thread Ted Dunning
These numbers don't match what I get. I get LLR = 117. This is wildly anomalous so this pair should definitely be connected. Both items are quite rare (15/300,000 or 20/300,000 rates) but they occur together most of the time that they appear. On Wed, Apr 10, 2013 at 2:15 AM, Phoenix Bai

Re: log-likelihood ratio value in item similarity calculation

2013-04-11 Thread Ted Dunning
Counts are critical here. Suppose that two rare events occur together the first time you ever see them. How exciting is this? Not very in my mind, but not necessarily trivial. Now suppose that they occur together 20 times and never occur alone after you have collected 20 times more data. This

Re: log-likelihood ratio value in item similarity calculation

2013-04-11 Thread Sean Owen
Yes I also get (er, Mahout gets) 117 (116.69), FWIW. I think the second question concerned counts vs relative frequencies -- normalized, or not. Like whether you divide all the counts by their sum or not. For a fixed set of observations that does change the LLR because it is unnormalized, not
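The point about counts versus relative frequencies can be checked numerically. This is a minimal sketch, assuming the entropy-based form of the log-likelihood ratio (G²) for a 2x2 contingency table. The function names and the example counts are hypothetical; they are not the counts from this thread, so the sketch does not try to reproduce the 117 figure.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: both events together, k12: first only,
    k21: second only,          k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


# Hypothetical counts: two events seen together once in 1,000 observations...
once = llr(1, 0, 0, 999)
# ...versus together 20 times, never apart, in 20 times more data.
twenty = llr(20, 0, 0, 19980)
print(once, twenty)

# Dividing every count by the total changes the score.
total = 20000
normalized = llr(20 / total, 0, 0, 19980 / total)
print(normalized)
```

In this formulation, multiplying every count by a constant multiplies the score by the same constant, which is why the same proportions become far more convincing with more data, and why normalizing the counts to frequencies gives a different number.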

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Pat Ferrel
Do you not have a user ID? No matter (though if you do I'd use it) you can use the item ID as a surrogate for a user ID in the recommender. And there will be no filtering if you ask for recommender.mostSimilarItems(long itemID, int howMany), which has no user ID in the call and so will not

trainclassifier -type cbayes dumps text

2013-04-11 Thread Ryan Compton
I'm trying to train a simple text classifier using cbayes. I've got formatted Text,Text sequence files created with com.twitter.elephantbird.pig.store.SequenceFileStorage(), eg: JOY actually turning decent new year ☺ JOY best New Years tonight! ready 2013. U+1F609 U+1F38AU+1F389 JOY

Re: trainclassifier -type cbayes dumps text

2013-04-11 Thread Ryan Compton
Also, right before the screen dump I see: 13/04/11 15:46:40 INFO mapred.JobClient: Combine output records=462236 13/04/11 15:46:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=1618497536 13/04/11 15:46:40 INFO mapred.JobClient: Reduce output records=419058 13/04/11 15:46:40

Re: trainclassifier -type cbayes dumps text

2013-04-11 Thread Ryan Compton
Ok I think I got it. The problem was that I wasn't naming the files properly. If I'm not mistaken I'll need to organize my training data like: -bash-3.2$ hadoop dfs -lsr /user/rfcompton/emotion-training-labeled/ -rw-r--r-- 3 rfcompton hadoop 2896850 2013-04-11 16:23

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sebastian Schelter
You can also use the new MultithreadedBatchItemSimilarities class to efficiently precompute item similarities on a single machine without having to go to MapReduce. On 12.04.2013 00:54, Pat Ferrel wrote: Do you not have a user ID? No matter (though if you do I'd use it) you can use the item ID