Hi, I am trying to use the Mahout jar instead of compiling it with my code.
On Tue, Apr 9, 2013 at 6:01 PM, Dominik Hübner cont...@dhuebner.com wrote:
Try adding this to your pom file
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
Getting this running with co-occurrence rather than using a similarity calc on
user rows finally forced me to understand what is going on in the base
recommender. And the answer implies further work.
[B'B] is usually not calculated in the usual item based recommender. The matrix
that comes out
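To make the [B'B] notation concrete, here is a minimal sketch in plain Python. It assumes B is a binary user-by-item interaction matrix (rows are users, columns are items); the message above does not define B, and the data and helper names are made up for illustration. Under that assumption, B'B is the item-item co-occurrence matrix.

```python
# Assumption: B is a binary user-by-item matrix (rows = users, columns = items).
# The values are invented purely for illustration.
B = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def cooccurrence(b):
    """Compute B'B: entry [i][j] counts users who interacted with both item i and item j."""
    return matmul(transpose(b), b)


C = cooccurrence(B)
for row in C:
    print(row)
```

The diagonal holds each item's total interaction count, and the off-diagonal entries are the co-occurrence counts that a similarity measure such as LLR would then be applied to.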
I am very new to Mahout and have only read up to chapter 5 of 'MIA',
but after reading about the various user-centric and item-centric
recommenders, they all still seem to need a userId, so I am still unsure
whether Mahout can help with a fairly common recommendation.
My requirement is to produce 'n'
This sounds like just a most-similar-items problem. That's good news
because that's simpler. The only question is how you want to compute
item-item similarities. That could be based on user-item interactions.
If you're on Hadoop, try the RowSimilarityJob (where you will need
rows to be items,
Or you may want to look at recording purchases by user ID. Then use the
standard recommender to train on (userID, itemID, boolean). Then query the
trained recommender thus: recommender.mostSimilarItems(long itemID, int
howMany). This does what you want but uses more data than just what items
Do I have to create a SimilarityJob( matrixB, matrixA, similarityType
) to get this or have I missed something already in Mahout?
It could be worth investigating whether MatrixMultiplicationJob could
be extended to compute similarities instead of dot products.
Best,
Sebastian
Use ItemSimilarityJob instead of RowSimilarityJob; it's the easy-to-use
wrapper around that :)
On 11.04.2013 19:28, Sean Owen wrote:
This sounds like just a most-similar-items problem. That's good news
because that's simpler. The only question is how you want to compute
item-item similarities.
You can try treating your orders as the 'users'. Then just compute
item-item similarities per usual.
On Thu, Apr 11, 2013 at 7:59 PM, Billy b...@ntlworld.com wrote:
Thanks for replying,
I don't have users, well I do :-) but in this case it should not influence
the recommendations, these
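The "treat orders as users" idea can be sketched without Mahout. The following is a rough illustration in plain Python, not Mahout's implementation: it uses Tanimoto (Jaccard) similarity as an assumed similarity measure, and the order data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_similarities(orders):
    """Tanimoto similarity between items, with each order playing the role of a 'user'."""
    item_orders = defaultdict(set)
    for order_id, items in orders.items():
        for item in items:
            item_orders[item].add(order_id)

    sims = defaultdict(dict)
    for a, b in combinations(sorted(item_orders), 2):
        both = len(item_orders[a] & item_orders[b])
        if both:
            either = len(item_orders[a] | item_orders[b])
            sims[a][b] = sims[b][a] = both / either
    return sims


def most_similar_items(sims, item, how_many):
    """Return up to how_many (item, similarity) pairs, most similar first."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:how_many]


# Hypothetical orders: order ID -> items bought together in that order.
orders = {
    "order-1": ["bread", "butter", "jam"],
    "order-2": ["bread", "butter"],
    "order-3": ["bread", "milk"],
    "order-4": ["milk", "cereal"],
}

sims = item_similarities(orders)
top = most_similar_items(sims, "bread", 2)
print(top)
```

No user ID appears anywhere in the query, which is the point: the lookup is keyed on the item alone.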
Actually, making this user based is a really good thing because you get
recommendations from one session to the next. These may be much more
valuable for cross-sell than things in the same order.
On Thu, Apr 11, 2013 at 12:50 PM, Sean Owen sro...@gmail.com wrote:
You can try treating your
The example data 'intro.csv' in MIA has users 1-5, so if I ask for
recommendations for user 1 then this works, but if I ask for
recommendations for user 6 (a new user yet to be added to the data model)
then I get no recommendations ... so if I substitute users for orders then
again I
You can actually create a user #6 for your new order. Or you can use
the anonymous user function of the library, although it's hacky.
We may be mixing up terms here. DataModel is a class that has
nothing to do with Hadoop. Hadoop in turn has no part in real-time
anything, like recommending to a
These numbers don't match what I get.
I get LLR = 117.
This is wildly anomalous so this pair should definitely be connected. Both
items are quite rare (15/300,000 or 20/300,000 rates) but they occur
together most of the time that they appear.
On Wed, Apr 10, 2013 at 2:15 AM, Phoenix Bai
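For readers who want to reproduce a number like this, here is a minimal sketch of the log-likelihood ratio for a 2x2 contingency table, written in the entropy form. The item totals (15 and 20 out of 300,000) come from the message above; the co-occurrence count of 7 is an assumption chosen for illustration and is not stated in the message.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of raw counts: N*ln(N) - sum(k*ln(k))."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR (G^2) for a 2x2 table: k11 = both, k12 = A only, k21 = B only, k22 = neither."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb tiny negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))


total = 300_000
a_total = 15   # occurrences of item A (from the message)
b_total = 20   # occurrences of item B (from the message)
together = 7   # ASSUMED co-occurrence count, not given in the message

llr = log_likelihood_ratio(
    together,
    a_total - together,
    b_total - together,
    total - a_total - b_total + together,
)
print(round(llr, 2))
```

With this assumed co-occurrence count the result comes out close to 117.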
Counts are critical here.
Suppose that two rare events occur together the first time you ever see
them. How exciting is this? Not very in my mind, but not necessarily
trivial.
Now suppose that they occur together 20 times and never occur alone after
you have collected 20 times more data. This
Yes I also get (er, Mahout gets) 117 (116.69), FWIW.
I think the second question concerned counts vs relative frequencies
-- normalized, or not. Like whether you divide all the counts by their
sum or not. For a fixed set of observations that does change the LLR
because it is unnormalized, not
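The counts-versus-frequencies point can be checked numerically. The sketch below uses a made-up 2x2 table and computes the same G^2 statistic in its direct sum form. Scaling every count by a constant scales the LLR by that constant, so dividing the counts by their sum changes the score.

```python
import math


def g2(table):
    """G^2 (LLR) for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, k in enumerate(row):
            if k > 0:
                total += k * math.log(k * n / (row_sums[i] * col_sums[j]))
    return 2.0 * total


# Made-up counts, for illustration only.
counts = [[10, 5], [5, 80]]
n = sum(sum(row) for row in counts)

scaled = [[20 * k for k in row] for row in counts]   # 20 times more data, same proportions
freqs = [[k / n for k in row] for row in counts]     # counts divided by their sum

print(g2(counts), g2(scaled), g2(freqs))
```

The same proportions observed over 20 times as much data give 20 times the LLR, while the normalized table gives the raw score divided by the total count.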
Do you not have a user ID? No matter (though if you do I'd use it) you can use
the item ID as a surrogate for a user ID in the recommender. And there will be
no filtering if you ask for recommender.mostSimilarItems(long itemID, int
howMany), which has no user ID in the call and so will not
I'm trying to train a simple text classifier using cbayes. I've got
formatted Text,Text sequence files created with
com.twitter.elephantbird.pig.store.SequenceFileStorage(), eg:
JOY actually turning decent new year ☺
JOY best New Years tonight! ready 2013. 😉 🎊🎉
JOY
Also, right before the screen dump I see:
13/04/11 15:46:40 INFO mapred.JobClient: Combine output records=462236
13/04/11 15:46:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=1618497536
13/04/11 15:46:40 INFO mapred.JobClient: Reduce output records=419058
13/04/11 15:46:40
Ok I think I got it.
The problem was that I wasn't naming the files properly. If I'm not
mistaken I'll need to organize my training data like:
-bash-3.2$ hadoop dfs -lsr /user/rfcompton/emotion-training-labeled/
-rw-r--r-- 3 rfcompton hadoop 2896850 2013-04-11 16:23
You can also use the new MultithreadedBatchItemSimilarities class to
efficiently precompute item similarities on a single machine without
having to go to MapReduce.
On 12.04.2013 00:54, Pat Ferrel wrote:
Do you not have a user ID? No matter (though if you do I'd use it) you can
use the item ID