Dear all,

First, I should say that I've been using R for only two days (so you can guess my knowledge of the language!).
Still, I managed to implement some things related to the Long-Tail model [1]. I ran some tests with the data in Table 1 (from [1]) and plotted Figure 2 (from [1]). (See the R code and CSV file at the end of this email.)

Now I'm stuck on the nonlinear regression model of F(x). I get this error:

  Error in nls(~F(r, N50, beta, alfa), data = dataset, start = list(N50 = N50, :
    singular gradient

And yes, I've been looking for a way to solve this (via this mailing list and some googling), but I could not find a proper solution. That's why I am asking the experts for help! :-)

So, any help would be much appreciated...

Cheers,

Oscar

[1] http://www.firstmonday.org/issues/issue12_5/kilkki/

PS: R code and CSV file

FILE: "data.R" (data taken from [1], Table 1, columns 1 and 2)
--8=<-------------------
"rank","cum_value"
10, 17396510
32, 31194809
96, 53447300
420, 100379331
1187, 152238166
24234, 432238757
91242, 581332371
294180, 650880870
1242185,665227287
-->=8-------------------

R CODE:

#
# F(x). The long-tail model
# Reference: http://www.firstmonday.org/issues/issue12_5/kilkki/
# Params:
#   x   : rank (a single integer or a vector)
#   N50 : the number of objects that cover half of the whole volume
#   beta: total volume
#   alfa: the factor that defines the form of the function
F <- function(x, N50, beta = 1.0, alfa = 0.49)
{
  xx <- as.numeric(x)  # as.numeric() prevents integer overflow
  Fx <- beta / ((N50 / xx)^alfa + 1)
  Fx
}

# Read CSV file (rank, cum_value)
lt <- read.csv(file = "data.R", header = TRUE, sep = ",")
r <- lt$rank
v <- lt$cum_value
pcnt <- v / v[length(v)] * 100  # get cumulative percentage

plot(r, pcnt, log = "x", type = 'l', xlab = 'Ranking',
     ylab = 'Cumulative percentage of sales', main = "Books Popularity",
     sub = "The long-tail effect", col = 'blue')

# Set some default values to be used by F(x)...
alfa <- 0.49
beta <- 1.38
N50 <- 30714

# Start using F(x). Results are in 'f' ...
f <- c(0)  # oops! is this the best initialization for 'f'?
for (i in 1:24234) f[i] <- F(i, N50, beta, alfa) * 100

# Plot some estimated values from F(x)
# (N50, beta, and alfa values come from the paper. See ref. [1])
plot(f, log = "x", type = 'l', xlab = 'Ranking',
     ylab = 'Cumulative percentage of sales', main = "Books Popularity",
     sub = "Plotting first values of F(x) and some real points")
points(r, pcnt, col = "blue")  # adding the "real" points

# Create a dataset to be used by nls()
dataset <- data.frame(r, pcnt)

# Verifying that F(x) works fine...
# (comparing with the "real" values contained in the dataset)
dataset
F(10, N50, beta, alfa) * 100
F(32, N50, beta, alfa) * 100
F(96, N50, beta, alfa) * 100
F(420, N50, beta, alfa) * 100
F(1187, N50, beta, alfa) * 100
F(24234, N50, beta, alfa) * 100
F(91242, N50, beta, alfa) * 100
F(294180, N50, beta, alfa) * 100
F(1242185, N50, beta, alfa) * 100

#dataset <- data.frame(pcnt)  # which dataset should I use? Should I include the ranks in it?
nls( ~ F(r, N50, beta, alfa), data = dataset,
     start = list(N50 = N50, beta = beta, alfa = alfa), trace = TRUE )

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
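
A possible direction, offered as a sketch rather than a confirmed fix: the formula passed to nls() in the post has no left-hand side, so the observed percentages never enter the fit. My understanding is that nls() then treats the response as zero and tries to shrink F itself toward zero, which could plausibly lead to the "singular gradient" error. The sketch below assumes the intended response is the cumulative percentage pcnt and that the model should be scaled by 100, as in the plots. The data are the nine points from the post, entered inline so the example is self-contained. The try() wrapper and the scaling are my own assumptions, and convergence from the paper's starting values is not guaranteed.

```r
# Long-tail model from the post: F(x) = beta / ((N50 / x)^alfa + 1)
F <- function(x, N50, beta = 1.0, alfa = 0.49) {
  beta / ((N50 / as.numeric(x))^alfa + 1)
}

# The nine (rank, cum_value) points listed in the post
lt <- data.frame(
  rank = c(10, 32, 96, 420, 1187, 24234, 91242, 294180, 1242185),
  cum_value = c(17396510, 31194809, 53447300, 100379331, 152238166,
                432238757, 581332371, 650880870, 665227287)
)

# Include the ranks: the model needs both r and pcnt
dataset <- data.frame(r = lt$rank,
                      pcnt = lt$cum_value / lt$cum_value[nrow(lt)] * 100)

# Starting values quoted in the post
start <- list(N50 = 30714, beta = 1.38, alfa = 0.49)

# Two-sided formula: observed percentage on the left, model (in percent)
# on the right. try() is an assumption of this sketch, used because
# convergence from these starting values is not guaranteed.
fit <- try(nls(pcnt ~ 100 * F(r, N50, beta, alfa),
               data = dataset, start = start, trace = TRUE),
           silent = TRUE)

if (inherits(fit, "try-error")) {
  message("nls() did not converge from these starting values")
} else {
  print(coef(fit))
}
```

Since F() only uses vectorized arithmetic, a call such as F(1:24234, N50, beta, alfa) * 100 should also return all values at once, which would make the loop and the initialization of 'f' unnecessary.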