Improving quality of item similarities?

2013-02-14 Thread Julian Ortega
Hi everyone. I have a data set that looks like this: Number of users: 198651 Number of items: 9972 Statistics of purchases from users mean number of purchases 3.3 stdDev number of purchases 3.5 min number of purchases 1 max number of purchases 176 median number

Re: Improving quality of item similarities?

2013-02-14 Thread Sean Owen
Yes, I don't know if removing that data would improve results. It might mean you can compute things faster, at little or no observable loss in quality of the results. I'm not sure, but you probably have repeat purchases of the same item, and items of different value. Working in that data may help

Re: how to use a custom distance measure with kmeans?

2013-02-14 Thread Dan Filimon
I can think of only 2 possibilities: - in the script, I think it goes through the if statements to line 251 where the HADOOP_CLASSPATH is being set; that line differs from line 243 where the CLASSPATH you set also gets added. So, it seems that the CLASSPATH you set isn't being passed to hadoop.

RE: how to use a custom distance measure with kmeans?

2013-02-14 Thread Mihai Josan
I modified line 251 like this: export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}:$CLASSPATH Now I don't have the Class not found exception but I get: Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector I found a big discussion regarding this error at

How best to characterize data as a vector

2013-02-14 Thread misterblinky
I'm clustering (non-textual) data. Some of the features in my vectors represent discrete values or types such that, for example, one feature may have the range of values 0=red, 1=blue, 2=green, 3=yellow. I could also have characterized the same data as 4 features where the value of the feature
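
A minimal sketch of the two encodings being compared, in case it helps frame the question. It assumes the four-feature alternative means one 0/1 indicator per value (the message is cut off before it says so), and the names `COLOURS`, `as_single_feature`, and `as_indicator_features` are made up for illustration. This is plain Python, not Mahout code.

```python
# Hypothetical colour feature; the codes mirror the example in the message:
# 0=red, 1=blue, 2=green, 3=yellow.
COLOURS = ["red", "blue", "green", "yellow"]


def as_single_feature(colour):
    """Encode the colour as one numeric feature holding its integer code."""
    return [float(COLOURS.index(colour))]


def as_indicator_features(colour):
    """Encode the colour as four 0/1 features, one per possible value (assumed)."""
    return [1.0 if c == colour else 0.0 for c in COLOURS]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# With integer codes, the distance depends on the arbitrary numbering.
single_red_blue = squared_euclidean(as_single_feature("red"), as_single_feature("blue"))
single_red_yellow = squared_euclidean(as_single_feature("red"), as_single_feature("yellow"))

# With indicator features, every pair of distinct colours is equally far apart.
indicator_red_blue = squared_euclidean(as_indicator_features("red"), as_indicator_features("blue"))
indicator_red_yellow = squared_euclidean(as_indicator_features("red"), as_indicator_features("yellow"))
```

Under a Euclidean-style measure the single-feature form implies an ordering among the colours that the data may not have; the indicator form avoids that at the cost of more dimensions.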

Shopping cart

2013-02-14 Thread Pat Ferrel
There are several methods for recommending things given a shopping cart contents. At the risk of using the same tool for every problem I was thinking about a recommender's use here. I'd do something like train on shopping cart purchases so row = cartID, column = itemID. Given cart contents I

RE: Reg Classification Problem..

2013-02-14 Thread Saikat Kanjilal
Hey Vignesh, are there specific things you need? I've built a classification implementation in the past with naive Bayes and a real-time service to serve up the results of this data. Let me know if you have specific questions. Regards Date: Thu, 14 Feb 2013 10:18:31 +0530 Subject: Reg

Re: Shopping cart

2013-02-14 Thread Ted Dunning
I think that this is an excellent use case for cross recommendation from cart contents (items) to cart purchases (items). The cross aspect is that the recommendation is from two different kinds of actions, not two kinds of things. The first action is insertion into a cart and the second is

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
I thought you might say that but we don't have the add-to-cart action. We have to calculate cart purchases by matching cart IDs or session IDs. So we only have cart purchases with items. If we had the add-to-cart and the purchase we could use your cross-action method for getting recs by

Re: Shopping cart

2013-02-14 Thread Sean Owen
This sounds like a job for frequent item set mining, which is kind of a special case of the ideas you've mentioned here. Given N items in a cart, which next item most frequently occurs in a purchased cart? On Thu, Feb 14, 2013 at 6:30 PM, Pat Ferrel pat.fer...@gmail.com wrote: I thought you
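
The question posed here ("given N items in a cart, which next item most frequently occurs in a purchased cart?") can be sketched as a brute-force count. This is only an illustration of the idea, not a frequent item set miner and not Mahout's implementation; `next_item_counts` and the sample carts are invented for the example.

```python
from collections import Counter


def next_item_counts(purchased_carts, current_items):
    """Count, over purchased carts containing all current items,
    how often each additional item appears."""
    current = set(current_items)
    counts = Counter()
    for cart in purchased_carts:
        items = set(cart)
        if current <= items:  # this cart contains every item already chosen
            counts.update(items - current)
    return counts


# Made-up purchase history, one list of item IDs per purchased cart.
carts = [
    ["bread", "butter", "jam"],
    ["bread", "butter", "jam", "milk"],
    ["bread", "butter", "jam"],
    ["bread", "eggs"],
]

counts = next_item_counts(carts, ["bread", "butter"])
best = counts.most_common(1)
```

A real miner would prune by minimum support instead of scanning every cart per query, but the quantity being ranked is the same.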

Re: Problems Running Mahout SSVD

2013-02-14 Thread K.D.P. Ross
Appreciate the replies! Yes this problem has been pretty much beaten to shreds. In fact so much so I wrote it into troubleshooting in section 5 of the manual (https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17modificationDate=134085000). Aha, it

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
Yes, one time-tested way to do this is the Apriori algo, which looks at frequent item sets and creates rules. I was looking for a shortcut using a recommender, which would be super easy to try. The rule builder is a little harder to implement but we can also test precision on that and compare

Re: Shopping cart

2013-02-14 Thread Sean Owen
I don't think it's necessarily slow; this is how item-based recommenders work. The only thing stopping you from using Mahout directly is that I don't think there's an easy way to say "recommend to this collection of items". But that's what is happening inside when you recommend for a user. You can
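
A rough sketch of what "recommend to this collection of items" could look like in an item-based scheme. It assumes raw co-occurrence counts stand in for item similarity, which is a simplification; Mahout's recommenders use pluggable similarity measures, and nothing below is Mahout API. The function names and sample carts are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(carts):
    """Count how often each ordered pair of distinct items shares a cart."""
    counts = defaultdict(lambda: defaultdict(int))
    for cart in carts:
        for a, b in permutations(set(cart), 2):
            counts[a][b] += 1
    return counts


def recommend_for_items(counts, items, how_many=3):
    """Score each candidate by summing its co-occurrence with the given items."""
    have = set(items)
    scores = defaultdict(int)
    for item in have:
        for other, n in counts.get(item, {}).items():
            if other not in have:  # never recommend what is already in the cart
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:how_many]


# Made-up purchased carts.
carts = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "lantern", "map"],
    ["stove", "fuel"],
    ["stove", "fuel", "lantern"],
]

recs = recommend_for_items(cooccurrence(carts), ["tent", "stove"])
```

The scoring step is the same whether the item collection comes from a stored user or from an anonymous cart, which is the point being made above.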

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
I'm creating a matrix of cart IDs and item IDs, so cart x items in cart. The 'preference' then is cartID, itemID. This will create the correct matrix I think. For any cart ID I would get a ranked list of recommended items that was calculated from other carts. This seems like what is needed in

Re: Shopping cart

2013-02-14 Thread Sean Owen
Yes, your only issue there, which I think you had touched on, was that you have to put your current cart (which hasn't been purchased) into the model in order to get an answer out of a recommender. I think we've talked about the recommend-to-anonymous function in the context of another system,

Re: Help with Classifier

2013-02-14 Thread Brian McCallister
So to answer my own question, the order of training matters. I had been doing all category 1 then all category 0. Apparently this breaks things badly. On Wed, Feb 13, 2013 at 4:29 PM, Brian McCallister bri...@skife.org wrote: I'm trying to do a basic two category classifier on textual data, I
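
One common way to avoid the ordering problem described here is to shuffle the labelled examples before training. The sketch below assumes a learner that consumes examples one at a time (the visible text does not say which classifier was used); `shuffled_training_order` and the sample data are hypothetical.

```python
import random


def shuffled_training_order(examples, seed=42):
    """Return a shuffled copy of the labelled examples so the categories are interleaved."""
    rng = random.Random(seed)  # fixed seed keeps the order reproducible
    mixed = list(examples)     # copy, so the caller's list is left untouched
    rng.shuffle(mixed)
    return mixed


# Made-up (document, category) pairs, originally grouped by category.
examples = [
    ("doc-a", 1), ("doc-b", 1), ("doc-c", 1),
    ("doc-d", 0), ("doc-e", 0), ("doc-f", 0),
]

mixed = shuffled_training_order(examples)
```

The shuffled list holds exactly the same examples; only the order in which the learner sees them changes.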

Re: Shopping cart

2013-02-14 Thread Ted Dunning
Do you see the contents of the cart? Is the cart ID opaque? Does it persist as a surrogate for a user? On Thu, Feb 14, 2013 at 10:30 AM, Pat Ferrel pat.fer...@gmail.com wrote: I thought you might say that but we don't have the add-to-cart action. We have to calculate cart purchases by

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
Sure, we have cart/session IDs, item IDs, and user IDs when purchases are made or when asked for a recommendation from the cart page. We currently don't get the add-to or remove-from cart actions. We could get them. Are you thinking that we can use the add-to-cart user x item matrix and