Hello list,

I wonder if anyone might be able to help me troubleshoot an attempt at
porting some simple Python code to R.

The function below is supposed to take a matrix containing item ratings from
various users and, given a vector containing at least 1 rating and 1 missing
value, employ a 'weighted slope one' algorithm to predict the missing
values.

The algorithm itself is fairly simple; it is described informally in [1]
and more formally in [3]. However, my results differ from those generated
by the Python implementation in [2].
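
For reference, this is the weighted scheme as I read it in [3], so treat the notation as my transcription rather than a quotation. Here u_i is the user's rating of item i, S(u) is the set of items the user has rated, dev_{j,i} is the mean difference (item j minus item i) over training users who rated both, and c_{j,i} is the number of such users:

```latex
P^{wS1}(u)_j =
  \frac{\sum_{i \in S(u) \setminus \{j\}} \left( \mathrm{dev}_{j,i} + u_i \right) c_{j,i}}
       {\sum_{i \in S(u) \setminus \{j\}} c_{j,i}}
```

Both sums run only over the items the user has actually rated.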

Unfortunately I don't know enough Python to tell where I might be going
wrong. Any suggestions?

[1] http://en.wikipedia.org/wiki/Slope_One#Slope_one_collaborative_filtering_for_rated_resources
[2] http://www.serpentine.com/wordpress/wp-content/uploads/2006/12/slope_one.py.txt
[3] http://www.daniel-lemire.com/fr/documents/publications/lemiremaclachlan_sdm05.pdf


# take a 'training' set, tr.set and a vector with some missing ratings, d

pred=function(tr.set,d) {

tr.set=rbind(tr.set,d)
n.items=ncol(tr.set)

# tally frequencies to use as weights
freqs=sapply(1:n.items, function(i) {
unlist(lapply(1:n.items, function(j) {
sum(!(i==j)&!is.na(tr.set[,i])&!is.na(tr.set[,j])) })) })

# estimate product-by-product mean differences in ratings;
# diffs[j,i] is the mean of (rating of item i - rating of item j)
diffs=sapply(1:n.items, function(i) {
unlist(lapply(1:n.items, function(j) {
mean(tr.set[,i]-tr.set[,j],na.rm=TRUE) })) })

# create an output vector with NAs for all the items the user has already rated
pred.out=as.numeric(is.na(d))
pred.out[!is.na(d)]=NA
a=which(!is.na(pred.out))
b=which(is.na(pred.out))

# calculate the weighted slope one estimate
pred.out[a]=sapply(a, function(i) {
sum(unlist(lapply(b,function (j) {
sum((d[j]+diffs[j,i])*freqs[j,i])/rowSums(freqs)[i] }))) })

names(pred.out)=colnames(tr.set)
return(pred.out) }
# end function

# test 1, using example from [1]
john=c(item1=5, item2=3, item3=2)
mark=c(item1=3, item2=4, item3=NA)
tr.set1=rbind(john,mark)
lucy1=c(item1=NA, item2=2, item3=5)
pred(tr.set1,lucy1)   ##  item1=4.33 --> correct

# test 2, using example from [2]
alice=c(squid=1.0, octopus=0.2, cuttlefish=0.5, nautilus=NA)
bob=c(squid=1.0, octopus=0.5, cuttlefish=NA, nautilus=0.2)
carole=c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4)
dave=c(squid=NA, octopus=0.4, cuttlefish=0.9, nautilus=0.5)
tr.set2=rbind(alice,bob,carole,dave)
lucy2=c(squid=0.4, octopus=NA, cuttlefish=NA, nautilus=NA)
pred(tr.set2,lucy2)  ## not correct
# correct(?): {'nautilus': 0.10, 'octopus': 0.23, 'cuttlefish': 0.25}
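
As a sanity check on the expected values above, here is a minimal sketch that works test 2 by hand. It assumes that, because lucy2 has rated only squid, each prediction reduces to her squid rating plus the mean (item minus squid) difference over the users who rated both items. The names ratings, lucy.squid, targets and hand.check are mine and appear in neither [2] nor the function above.

```r
# Same ratings as tr.set2, rebuilt here so the check stands alone
ratings <- rbind(
  alice  = c(squid = 1.0, octopus = 0.2, cuttlefish = 0.5, nautilus = NA),
  bob    = c(squid = 1.0, octopus = 0.5, cuttlefish = NA,  nautilus = 0.2),
  carole = c(squid = 0.2, octopus = 1.0, cuttlefish = 0.4, nautilus = 0.4),
  dave   = c(squid = NA,  octopus = 0.4, cuttlefish = 0.9, nautilus = 0.5)
)

# lucy2's only known rating
lucy.squid <- 0.4

# items to predict
targets <- c("octopus", "cuttlefish", "nautilus")

# With a single rated item the weights cancel, leaving
# prediction = rating + mean difference over users who rated both items
hand.check <- sapply(targets, function(item) {
  lucy.squid + mean(ratings[, item] - ratings[, "squid"], na.rm = TRUE)
})

round(hand.check, 2)
#    octopus cuttlefish   nautilus
#       0.23       0.25       0.10
```

This agrees with the Python output, so the expected values look right and the discrepancy appears to lie in pred().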
