[R] prediction error for test set-cross validation
Hi, I have a database of 2211 rows with 31 entries each and I manually split my data into 10 folds for cross validation. I build logistic regression model as: model - glm(qual ~ AgGr + FaHx + PrHx + PrSr + PaLp + SvD + IndExam + Rad +BrDn + BRDS + PrinFin+ SkRtr + NpRtr + SkThck +TrThkc + SkLes + AxAdnp + ArcDst + MaDen + CaDt + MaMG + MaMrp + MaSh + SCTub + SCFoc + MaSz, family=binomial(link=logit)); Where the variables are taken from the trainSet of size 1989x31. The test set is sized 222x31. Now my question is when I try to predict on the test set it gives me the error: predict.glm(model, testSet, type =response) Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) : subscript out of bounds It does fine on trainSet. so it is something about the testSet. On the other hand, I realized that some independent variables say MaSz takes 3 different values in the trainset vs. 4 different ones in the testSet. I am not sure if this is the cause.If so, what would be the remedy? Since I can retrieve the coefficients of the logistic regression, I could manually calculate response for each entry in the testSet. This could solve my problem although burdensome. But, I don't know an easy way of doing it as my logistic regression have 80+ coefficients. Could somebody advise? Thanks, M [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Converting a dataframe to a matrix
xtabs is your friend: xtabs(likes ~ color + name, data=dat) color nameblue green red jake 1 1 0 sally1 1 0 tom 0 0 1 See ?xtabs for more info. Note that I changed the likes? column to just likes. It is a bad idea to have question marks in variable names. Simon. On Wed, 2009-03-11 at 01:42 -0400, Jennifer Brea wrote: If I have a dataframe which is organized like this: name color likes? 1 sally red0 2 sally blue1 3 sally green1 4 jake red0 5 jake blue1 6 jake green1 7 tom red1 8 tom blue0 9 tom green0 And I want to create a matrix in the form: red blue green sally 01 1 jake01 1 tom 10 0 Are there any built-in commands that might help me do this? Also, I can't assume that there is an observation for every person-color. In other words, in the original dataset, there might be some colors for which sally offered no opinion. In some cases, this may be represented by NA, in others, it may mean that no row exists for sally for that color. Thank you! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Simon Blomberg, BSc (Hons), PhD, MAppStat. Lecturer and Consultant Statistician School of Biological Sciences The University of Queensland St. Lucia Queensland 4072 Australia Room 320 Goddard Building (8) T: +61 7 3365 2506 http://www.uq.edu.au/~uqsblomb email: S.Blomberg1_at_uq.edu.au Policies: 1. I will NOT analyse your data for you. 2. Your deadline is your problem. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Converting a dataframe to a matrix
Hi: Below works but it's extremely ugly and overly complicated. i'm sure someone else will send you something better and I'll be waiting also. Also, the way I named the rows and columns works for below but it won't hold in the general case if you don't have nice ordered names like you do below. So, use it until we something better. DF - read.table(textConnection(name color likes 1 sally red 0 2 sally blue 1 3 sally green 1 4 jake red 0 5 jake blue 1 6 jake green 1 7 tom red 1 8 tom blue 0 9 tom green 0),header=TRUE,stringsAsFactors=FALSE) mat - matrix(sapply(1:nrow(DF),function(.row) { DF[.row,3] }),nrow=length(unique(DF$name)),byrow=TRUE,dimnames=list(c(unique(DF$name)),c(unique(DF$color print(mat) On Wed, Mar 11, 2009 at 12:42 AM, Jennifer Brea wrote: If I have a dataframe which is organized like this: name color likes? 1 sally red0 2 sally blue1 3 sally green1 4 jake red0 5 jake blue1 6 jake green1 7 tom red1 8 tom blue0 9 tom green0 And I want to create a matrix in the form: red blue green sally 01 1 jake01 1 tom 10 0 Are there any built-in commands that might help me do this? Also, I can't assume that there is an observation for every person-color. In other words, in the original dataset, there might be some colors for which sally offered no opinion. In some cases, this may be represented by NA, in others, it may mean that no row exists for sally for that color. Thank you! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lsmeans in R
I need help with calculating lsmeans (adjusted means) of different terms in a linear model including the main effect and the interaction effect terms. I use lm to run the linear models...I previously noted from literature that that effects package can be used to generate lsmeans. But I tried to use it but could not figure out which option to use to get means. If anyone can give an example of how to get lsmeans using lm object, that will very helpful. Thanks, SUman [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Barplot -density and multhist x-axis problem
Thanks in advance for any help I would like to plot a barplot with probability density instead of count frequencies, as is possible in histograms (freq=FALSE). Is this possible? I have tried the following code: with(center,table(light))-light.table barplot(light.table ,col=brewer, freq=FALSE, beside=T) But freq=FALSE doesnt seem to work in the command of barplot. The reason why I want to use a barplot is that I have two variables. I have tried to use multhist instead, but when I plot the histogram the x-axis labels are unevenly spaced. I would like to create a bar with 7 ticks on the x-axis (1,2,3..7), however the following tics are yielded in the x-axis (1, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5). How can I regulate this? I have used the following code: l-list(m1=(center$light[center$site==1]), m2=(center$light[center$site==2])) multhist(l,breaks=c(1:7, by=1), freq=FALSE, main=Illumination in Plot Center, ylab=Density, col =brewer) I have tried names.arg= but this doesnt work in multihist either. Is there any way around this problem? Either by using barplot or multihist? I am very grateful for any help Honorata __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Re : Converting a dataframe to a matrix
Something like this will work m-matrix(df1$likes, nr=3,nc=3,byrow=T) colnames(m)-unique(df1$color) rowlnames(m)-unique(df1$name) Sincerly. Justin BEM BP 1917 Yaoundé Cameroun Tél (237) 76043774 De : Jennifer Brea b...@fas.harvard.edu À : r-help@r-project.org Envoyé le : Mercredi, 11 Mars 2009, 6h42mn 29s Objet : [R] Converting a dataframe to a matrix If I have a dataframe which is organized like this: name color likes? 1 sally red 0 2 sally blue 1 3 sally green 1 4 jake red 0 5 jake blue 1 6 jake green 1 7 tom red 1 8 tom blue 0 9 tom green 0 And I want to create a matrix in the form: red blue green sally 0 1 1 jake 0 1 1 tom 1 0 0 Are there any built-in commands that might help me do this? Also, I can't assume that there is an observation for every person-color. In other words, in the original dataset, there might be some colors for which sally offered no opinion. In some cases, this may be represented by NA, in others, it may mean that no row exists for sally for that color. Thank you! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] t-test, survey data
*How can I do a t-test with survey data?* Thanks. Marie -- - Le Bureau fédéral du Plan fête ses 50 ans en 2009. Het Federaal Planbureau viert zijn 50-ste verjaardag in 2009. The Federal Planning Bureau celebrates its 50-th anniversary in 2009. - Disclaimer: please see www.plan.be/disclaimer.html Please, think before you print... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Re : t-test, survey data
For survey data, the survey package provides methods for variance estimation. Justin BEM BP 1917 Yaoundé Tél (237) 99597295 (237) 22040246 De : Marie Vandresse v...@plan.be À : r-help@r-project.org Envoyé le : Mercredi, 11 Mars 2009, 9h25mn 11s Objet : [R] t-test, survey data *How can I do a t-test with survey data?* Thanks. Marie -- - Le Bureau fédéral du Plan fête ses 50 ans en 2009. Het Federaal Planbureau viert zijn 50-ste verjaardag in 2009. The Federal Planning Bureau celebrates its 50-th anniversary in 2009. - Disclaimer: please see www.plan.be/disclaimer.html Please, think before you print... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lsmeans in R
On 3/11/2009 2:45 AM, suman Duvvuru wrote: I need help with calculating lsmeans (adjusted means) of different terms in a linear model including the main effect and the interaction effect terms. I use lm to run the linear models...I previously noted from literature that that effects package can be used to generate lsmeans. But I tried to use it but could not figure out which option to use to get means. If anyone can give an example of how to get lsmeans using lm object, that will very helpful. This R-help thread from March 2007 should help: http://finzi.psych.upenn.edu/R/Rhelp02/archive/95809.html Thanks, SUman [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] removing daylight savings in R
On Wed, 11 Mar 2009, Rhiannon Marchant wrote: Hello, Thanks the Prof. Brian Ripley for his earlier response, however I still have a few issues with the times in R. I'd like to somehow set up R so it automatically has the timezone set to Western Australian standard time when I open up a workspace instead of Western Australian daylight savings time. Is this possible? I haven't had much luck in trying to find this. Yes, see ?Sys.timezone (as described in my earlier reply). According to Wikipedia that is not an actual timezone, but it seems you want TZ='GMT-8'. As that is not a known timezone, you will get some messages when trying to print the time and (spuriously) GMT as the timezone names, but input conversion did work. The best thing to do would be to create a timezone with the names you want and recreate the zoneinfo database: there are instructions on how to do so in the R sources. Life would be a lot easier if you keep times in an official timezone or UTC. I would suggest reading them as if in UTC (again, I explained that in my first reply) and shifting by 8 hours (subtract 8*3600): you can then print them in UTC or a real-world time. [...] -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Repeated values
Hi,All. How to make a program to delete repeated value? a example c [1] 4 3 0 3 4 1 0 1 4 4 3 4 3 4 I want to get : 4 3 0 3 4 1 0 1 4 3 4 3 4 two 4 is being represented by one 4. Thanks. Tammy _ More than messagescheck out the rest of the Windows Live. http://www.microsoft.com/windows/windowslive/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mean +/- SEM
Stefo Ratino wrote: Hi all, I am looking for a R function which unables me to plot mean +/- SEM. Is there such a function in R? Hi Stefo, The dispersion function in the plotrix package (among many, many others in other packages) allows you to stick error bars onto almost anything. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] popular R packages
Christos Hatzis wrote: Bioconductor already provides download stats for all packages... http://bioconductor.org/packages/stats/bioc/affy.html Maybe if we asked the Bioconductor people _really_ nicely Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Repeated values
On 3/11/2009 5:08 AM, Tammy Ma wrote: Hi,All. How to make a program to delete repeated value? a example c [1] 4 3 0 3 4 1 0 1 4 4 3 4 3 4 I want to get : 4 3 0 3 4 1 0 1 4 3 4 3 4 two 4 is being represented by one 4. x - c(4, 3, 0, 3, 4, 1, 0, 1, 4, 4, 3, 4, 3, 4) rle(x)$values [1] 4 3 0 3 4 1 0 1 4 3 4 3 4 ?rle Thanks. Tammy _ More than messages–check out the rest of the Windows Live™. http://www.microsoft.com/windows/windowslive/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Barplot -density and multhist x-axis problem
Honorata Kaja Gajda wrote: Thanks in advance for any help I would like to plot a barplot with probability density instead of count frequencies, as is possible in histograms (freq=FALSE). Is this possible? I have tried the following code: with(center,table(light))-light.table barplot(light.table ,col=brewer, freq=FALSE, beside=T) But freq=FALSE doesn’t seem to work in the command of barplot. The reason why I want to use a barplot is that I have two variables. I have tried to use multhist instead, but when I plot the histogram the x-axis labels are unevenly spaced. I would like to create a bar with 7 ticks on the x-axis (1,2,3..7), however the following tics are yielded in the x-axis (1, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5). How can I regulate this? I have used the following code: l-list(m1=(center$light[center$site==1]), m2=(center$light[center$site==2])) multhist(l,breaks=c(1:7, by=1), freq=FALSE, main=Illumination in Plot Center, ylab=Density, col =brewer) I have tried names.arg= but this doesn’t work in multihist either. Is there any way around this problem? Either by using barplot or multihist? Hi Honorata, Have a look at the barp function in plotrix. The bars or groups of bars are centered on integer values. You can also save the return value of barplot as this gives you the positions of the bars. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prediction error for test set-cross validation
Mehmet U Ayvaci wrote: Hi, I have a database of 2211 rows with 31 entries each and I manually split my data into 10 folds for cross validation. I build logistic regression model as: model - glm(qual ~ AgGr + FaHx + PrHx + PrSr + PaLp + SvD + IndExam + Rad +BrDn + BRDS + PrinFin+ SkRtr + NpRtr + SkThck +TrThkc + SkLes + AxAdnp + ArcDst + MaDen + CaDt + MaMG + MaMrp + MaSh + SCTub + SCFoc + MaSz, family=binomial(link=logit)); Where the variables are taken from the trainSet of size 1989x31. The test set is sized 222x31. Now my question is when I try to predict on the test set it gives me the error: predict.glm(model, testSet, type =response) Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) : subscript out of bounds It does fine on trainSet. so it is something about the testSet. On the other hand, I realized that some independent variables say MaSz takes 3 different values in the trainset vs. 4 different ones in the testSet. I am not sure if this is the cause.If so, what would be the remedy? Since I can retrieve the coefficients of the logistic regression, I could manually calculate response for each entry in the testSet. This could solve my problem although burdensome. But, I don't know an easy way of doing it as my logistic regression have 80+ coefficients. Well, if MaSz takes 3 different values in the trainset vs. 4 different ones in the testSet, then you won't even be able to calculate it by hand, because you got no coefficients for the 4th level of that factor. Either you need the data to estimate coefficients from or you cannot predict. Uwe Ligges Could somebody advise? Thanks, M [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Creating a directory for my data
Hmmm, in fact I do not understand what you are going to do. Can you tell us what the result of your code should look like? Uwe Ligges miya wrote: Hi everyone, I am currently working with a very large data set. It is data collected a few times a day, so there are repeated titles in the data set. I want to assign an id number to each different title and enter this information in a directory that I can access whenever I am working with the data. An example of a data I am workin with follows: 1 title A 2 title C 3 title B 1 title A 2 title B 3 title C 1 title D 2 title A 3 title C 1 title B 2 title A 3 title C What I have tried thus far is for loops. r-matrix(x[,1]) t-matrix(x[,2]) for(i in 1:length(t)) { for(k in 1:length(r)) { z = matrix(c(i,t)) A = Article A for(j in 1:length(z)){ if(B==Article A) else{enter article in z} } } This is of course giving me a lot of errors. I just have no idea where to go with this. Does anyone have any ideas on where I can go from here to create my directory? I appreciate all your help. Thank you in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] trying to run odfWeave()
Hi there ! I'm working with windows and R GUI and I'm trying to generate an automatic repport using odfWeave. I have taken the basic template available on line at : http://www.biostat.uzh.ch/services/templates.html which is SampleOdf.odt I've imported the package odfWeave and the corresponding library. And then I run : file.in=E:/Tex/SampleOdf.odt file.out=E:/Tex/RepportOdf.odt odfWeave(file.in, file.out) Copying E:/Tex/SampleOdf.odt Setting wd to C:\DOCUME~1\Lore\LOCALS~1\Temp\RtmpSjnx0I/odfWeave11100313419 Unzipping ODF file using unzip -o SampleOdf.odt Erreur dans odfWeave(file.in, file.out) : Error unzipping file De plus : Warning message: In system(zipCmd[2], invisible = TRUE) : unzip introuvable Is there someone who has an idea of what's the problem here ? Then, I have a second question, which totally independent from the one above (I have an error in both cases...) : is RepportOdf.odt automatically generated/created or should I create the document before ? Thanks a lot everyone for your help ! Lo. _ [[elided Hotmail spam]] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RES: North Arrow (.png file) on a Map
Thanks a lot Yihui, It's perfect, in fact exactly identical the I have as a png file in my computer. Thank you so much. Regards Rodrigo -Mensagem original- De: Yihui Xie [mailto:xieyi...@gmail.com] Enviada em: quarta-feira, 11 de março de 2009 02:48 Para: Rodrigo Aluizio Cc: R Help Assunto: Re: [R] North Arrow (.png file) on a Map Is this arrow satisfactory for you? north.arrow = function(x, y, h) { polygon(c(x, x, x + h/2), c(y - h, y, y - (1 + sqrt(3)/2) * h), col = black, border = NA) polygon(c(x, x + h/2, x, x - h/2), c(y - h, y - (1 + sqrt(3)/2) * h, y, y - (1 + sqrt(3)/2) * h)) text(x, y, N, adj = c(0.5, 0), cex = 4) } plot(1, type = n, ylim = c(0, 1)) north.arrow(1, 0.8, 0.3) Regards, Yihui -- Yihui Xie xieyi...@gmail.com Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086 Mobile: +86-15810805877 Homepage: http://www.yihui.name School of Statistics, Room 1037, Mingde Main Building, Renmin University of China, Beijing, 100872, China On Tue, Mar 10, 2009 at 7:21 PM, Rodrigo Aluizio r.alui...@gmail.com wrote: Hi list. I would like to know how do I insert a North arrow, stored as a png file in my computer, in a map? I found lots of post asking similar things, one of them mentioned the pixmap package. The map was done using map() and shapefiles (the code is below). Im using the pixmap () and addlogo() functions. Well I can import the png with pixmap() function (I guess, once theres no error message), but I cant put It on the map, I got an error message telling me that: Error at t(x...@index[nrow(x...@index):1, , drop = FALSE]) : index out of limits Well I tried changing coordinates but I always got the same result. How do I do this correctly? Is there a better way? Thanks for the help and attention. Here is the complete map script: library(RODBC) library(maps) library(mapdata) library(maptools) library(pixmap) #Carregar Coordenadas e dados dos Pontos Amostrais Dados-odbcConnectExcel('Campos.xls',readOnly=T) Coord-sqlFetch(Dados,'CoordMed',colnames=F,rownames='Ponto') odbcClose(Dados) N-pixmap('Norte.png',nrow=166,ncol=113) # Carregar pontos e shapes Batimetria-readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontol ogia/Campos/ShapeFiles/Batimetria_BC.shp') Estados-readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontologi a/Campos/ShapeFiles/Estados_Sudeste.shp') Faciologia-readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontol ogia/Campos/ShapeFiles/Faciologia_BC.shp') # Mapa com os Pontos da Bacia postscript('MapaCampos.eps',paper='special',onefile=F,horizontal=F,width=3.5 ,height=4.5,bg='white',pointsize=3) par(mar=c(3,2,2,0)) map('worldHires','brazil',ylim=c(23.9,20.3),xlim=c(42.1,39.2),type='n') plot(Faciologia,ylab='',xlab='',col=c('lightgreen','lightgreen','lightgreen' ,'lightgreen','lightgreen','lightgray','lightgray','lightgray','lightgray',' lightgray','lightgray','lightgray','lightgray','lightgray','lightgray','ligh tgray','lightgray','lightgray','lightgray','lightgray','lightgray','lightyel low','lightyellow','lightyellow'),add=T,lwd=0.5,border=0) plot(Batimetria,ylab='',xlab='',col='darkgray',lty='solid',lwd=0.2,add=T) plot(Estados,ylab='',xlab='',lty='solid',add=T,lwd=0.8) text(Coord$Longitude[Coord$Réplicas=='1'],Coord$Latitude[Coord$Réplicas=='1' ],rownames(Coord)[Coord$Réplicas=='1'],col='red',cex=0.5,font=2) text(Coord$Longitude[Coord$Réplicas=='2'],Coord$Latitude[Coord$Réplicas=='2' ],rownames(Coord)[Coord$Réplicas=='2'],col='yellow',cex=0.5,font=2) text(Coord$Longitude[Coord$Réplicas=='3'],Coord$Latitude[Coord$Réplicas=='3' ],rownames(Coord)[Coord$Réplicas=='3'],col='blue',cex=0.5,font=2) points(Coord$Longitude,Coord$Latitude-0.045,pch=20,cex=0.7) text(c(41.5,41.3),c(21.7,20.6),c('RJ','ES')) axis(1,xaxp=c(42.1,39.2,2),cex.axis=1) axis(2,yaxp=c(23.9,20.3,4),cex.axis=1) title(main='Bacia') legend(40.2,23.5,c('Uma','Duas','Três'),pch=21,cex=1,pt.bg=c('red','yellow', 'blue'),bty='n',pt.cex=2,pt.lwd=0.6,title='Réplicas') legend(39.8,23.5,c('Areia','Calcário','Lama'),pch=21,cex=1,pt.bg=c('lightyel low','lightgray','lightgreen'),bty='n',pt.cex=2,pt.lwd=0.6,title='Faciologia ') addlogo(N,px=c(40,39.8),py=c(21,20.8)) dev.off() q('no') - MSc. Rodrigo Aluizio mailto:r.alui...@gmail.com Centro de Estudos do Mar/UFPR Laboratório de Micropaleontologia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
[R] Date frame
Hi, All, How to make a data frame, each row of data frame store the different length of vector? Thanks. Tammy _ Show them the way! Add maps and directions to your party invites. http://www.microsoft.com/windows/windowslive/products/events.aspx [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] chisq.test: decreasing p-value
A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test shall reveal the actual equality in my sample. When applying a chi square test with increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (looks as if converge to zero). (1) Why? (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same? (3) By the way, how to deal with low frequency cells? r - c(10, 100, 500, 1000, 2000, 5000) v - c(35, 40, 45, 45, 40, 35) sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value }) Thank you, Sören -- Sören Vogel, PhD-Student, Eawag, Dept. SIAM http://www.eawag.ch, http://sozmod.eawag.ch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Confidence/prediction interval for choice probabilities mlogit
Hi, I have estimated a multinomial logit model with use of the mlogit-package for my choice data. I predicted the probabilities for each choice, using the formula: pred1 - exp(B1*X1.1 + B2*X2.1 + B3*X3.1) pred2 - exp(B1*X1.2 + B2*X2.2 + B3*X3.2) pred3 - exp(B1*X1.3 + B2*X2.3 + B3*X3.3) sumpred - pred1+pred2+pred3 pred1 - pred1/sumpred pred2 - pred2/sumpred pred3 - pred3/sumpred Then for each choice I have a prediction of the choice probabilities. Now I want to calculate a confidence interval of the choice probablities. I'm not sure if it is called a confidence interval or a prediction interval. At the internet I haven't found anything useful. Most examples are about the confidence interval of a certain regression coefficient, for example the intercept. But I want to know the interval around my choice probabilities. The predict() function does not work with mlogit, I think. Can anybody help me? Kind regards, Tryntsje __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pseudo-random numbers between two numbers
g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. It depends on what Min and Max are. If you get far out in the tails, rounding error will kill you. For example, pnorm(x) is exactly 1 for x bigger than 10 or so, so this approach would fail if Min and Max were both bigger than 10. The solution is to switch to lower=FALSE in the upper tail, and possibly switch to a log scale if you want to be really extreme. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] chisq.test: decreasing p-value
soeren.vo...@eawag.ch wrote: A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test shall reveal the actual equality in my sample. When applying a chi square test with increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (looks as if converge to zero). (1) Why? (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same? (3) By the way, how to deal with low frequency cells? r - c(10, 100, 500, 1000, 2000, 5000) v - c(35, 40, 45, 45, 40, 35) sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value }) This is a combination of user error and an infelicity in chisq.test. You are sapply'ing over a list with one element, so essentially you are doing chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value Now B is supposed to be a single integer, so the above cannot be expected to do anything sensible, but you might have hoped for an error message. Instead, it seems that you get the result of r[1] replications divided by r+1: chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value [1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720 7/(r+1) [1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720 What you really wanted was sapply(r,function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value }) [1] 0.9090909 0.8118812 0.7964072 0.7672328 0.8025987 0.7932414 Thank you, Sören --Sören Vogel, PhD-Student, Eawag, Dept. SIAM http://www.eawag.ch, http://sozmod.eawag.ch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] chisq.test: decreasing p-value
On Mar 11, 2009, at 6:36 AM, soeren.vo...@eawag.ch wrote: A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test shall reveal the actual equality in my sample. When applying a chi square test with increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (looks as if converge to zero). (1) Why? With low numbers of repetitions the test has low power, i.e, it may give you the wrong answer to the question: are those two vectors from the same distribution? As you increase in number, the simulated value approaches the truth. (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same? In principle they are not the same. Do you want a test that tells you they are? (3) By the way, how to deal with low frequency cells? r - c(10, 100, 500, 1000, 2000, 5000) v - c(35, 40, 45, 45, 40, 35) sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value }) Thank you, Sören -- Sören Vogel, PhD-Student, Eawag, Dept. SIAM http://www.eawag.ch, http://sozmod.eawag.ch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Making a prediction model from a plot
Hello I have plot looking like this: http://www.nabble.com/file/p22453304/Index%2Bgraf.jpeg Where each continues line corresponds to a period og days, and each break on the line corresponds to a shift in the index level. So what i thought about was, is it possible to make a model predicting the index level depending on the length of period? Here is a bit of my dataset: b[b$a.CHR_NR==97592,] a.CHR_NR a.SALMO_INDEX a.x a.diffdato1 a.akkusum 149297592 20.6 62 3434 149397592 18.7 62 1448 149497592 18.1 62 1462 149597592 19.8 16 1616 149697592 18.1 139 7 7 149797592 18.1 139 1017 149897592 22.2 139 320 149997592 22.2 139 1131 150097592 23.2 139 738 150197592 23.2 139 2361 150297592 26.2 139 768 150397592 26.2 139 3098 150497592 15.2 139 10 108 150597592 15.2 139 7 115 150697592 10.8 139 4 119 150797592 10.8 139 20 139 150897592 15.2 62 6 6 150997592 15.2 62 2834 151097592 12.3 62 438 151197592 12.3 62 2462 151297592 14.222 1010 151397592 14.222 1222 151497592 31.6 140 3232 151597592 35.1 140 2456 151697592 30.0 140 3995 151797592 10.9140 11 106 151897592 11.8140 34 140 151997592 10.1329 2828 152097592 10.6329 3058 152197592 57.0 329 3088 152297592 47.3 329 27 115 152397592 30.4 329 28 143 152497592 19.3 329 21 164 152597592 28.8 329 39 203 152697592 20.4 329 23 226 152797592 35.1 329 26 252 152897592 21.3 329 32 284 152997592 16.6 329 45 329 Where a.CHR_NR is a particular herd (i have a lot of herds), a.SALMO_INDEX is the index level (the responds variable), a.x is the length of period (the lines on the plot), a.diffdato1 is the days in the period and a.akkusum is the cumulated sum of a.diffdato1 making the breaks on the lines. Hope you can help me -- View this message in context: http://www.nabble.com/Making-a-prediction-model-from-a-plot-tp22453304p22453304.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Date frame
It is called a list. On Wed, Mar 11, 2009 at 6:32 AM, Tammy Ma metal_lical...@live.com wrote: Hi, All, How to make a data frame, each row of data frame store the different length of vector? Thanks. Tammy _ Show them the way! Add maps and directions to your party invites. http://www.microsoft.com/windows/windowslive/products/events.aspx [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] odfWeave xtable still best packages to produce tables?
Just a quick question - I did a quick search of the R-search (http://search.r-project.org/nmz.html) [thanks to David Winsemius for recommending this neat R search engine] and R-seek (http://www.rseek.org/) trying to find a package(s) in order to produce publication quality tables. I found several references to odfWeave and xtable. My question is, are those still best R packages to produce publication quality tables tables from R? Thank you for any feedback and insights. P.S. I know I should be trying to Using Graphs Instead of Tables, but some customers (bosses) still like seeing tables. Thanks again:) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] puzzled by math on date-time objects
Hi Phil, Well thank you very much for this detailed explanation. It will help me when summarizing information over periods of time using either summarize (Hmisc) or summaryBy (doBy). Until now, doing so resulted in mean time for each group being transformed as a number of seconds, as you explain below. But both these functions do not put it back in a POSIX date-time object. I tried to do so by using as.POSIXct() but this failed because I did not provide a reference. From now on I'll try the structure command you used below. Denis Le 09-03-10 à 19:04, Phil Spector a écrit : Denis - If you look inside of summary.POSIXct, you'll see the following: x - summary.default(unclass(object), digits = digits, ...)[1:6] In other words, summary accepts the POSIX object, unclasses it (resulting in a numeric value representing the number of seconds since January 1, 1960), performs the operation, and then reassigns the class. You can do this basic trick yourself. Suppose we have a vector of dates and want the median: dates = as.POSIXct(c('2009-3-15','2009-2-19','2009-3-20','2009-2-18')) median(dates) Error in Summary.POSIXct(c(1235030400, 1237100400), na.rm = FALSE) : 'sum' not defined for POSIXt objects res = median(as.numeric(dates)) structure(res,class='POSIXct') [1] 2009-03-02 23:30:00 PST I think it's clear that you can do any arithmetic operation on dates this way, even if it doesn't make sense: sum(dates) Error in Summary.POSIXct(c(1237100400, 1235030400, 1237532400, 1234944000 : 'sum' not defined for POSIXt objects res = sum(as.numeric(dates)) structure(res,class='POSIXct') [1] 2126-09-08 23:00:00 PDT I'm quite certain that median.POSIXct will be fixed pretty quickly, but you can always unclass and reclass to do what you need. - Phil On Tue, 10 Mar 2009, Denis Chabot wrote: Thanks Phil, but how does summary() finds the median of the same type of object? I would have thought the algorithm used when the vector is even would also require the SUM of the POSIX vector. I am glad of the solution you propose, but still puzzled a bit! Denis Le 09-03-10 à 12:39, Phil Spector a écrit : Denis - There is no median method for POSIX objects, although there is a summary object. Thus, when you pass a POSIX object to median, it uses median.default, which contains the following code: if (n%%2L == 1L) sort(x, partial = half)[half] else sum(sort(x, partial = half + 0L:1L)[half + 0L:1L])/2 So when the length of your POSIX vector is odd, it works, but if it's even, it would need to take the sum of a POSIX object. Of course, there is no sum method for POSIX objects, since it doesn't make sense. Right now, it looks like your best bet for a summary of POSIX objects is summary(a)['Median'] - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Tue, 10 Mar 2009, Denis Chabot wrote: Hi, I don't understand the following. When I create a small artificial set of date information in class POSIXct, I can calculate the mean and the median: a = as.POSIXct(Sys.time()) a = a + 60*0:10; a [1] 2009-03-10 11:30:16 EDT 2009-03-10 11:31:16 EDT 2009-03-10 11:32:16 EDT [4] 2009-03-10 11:33:16 EDT 2009-03-10 11:34:16 EDT 2009-03-10 11:35:16 EDT [7] 2009-03-10 11:36:16 EDT 2009-03-10 11:37:16 EDT 2009-03-10 11:38:16 EDT [10] 2009-03-10 11:39:16 EDT 2009-03-10 11:40:16 EDT median(a) [1] 2009-03-10 11:35:16 EDT mean(a) [1] 2009-03-10 11:35:16 EDT But for real data (for this post, a short subset is in object c) that I have converted into a POSIXct object, I cannot calculate the median with median(), though I do get it with summary(): c [1] 2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:19 EST [4] 2009-02-24 14:51:20 EST 2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST [7] 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST 2009-02-24 14:51:22 EST [10] 2009-02-24 14:51:22 EST class(c) [1] POSIXt POSIXct median(c) Erreur dans Summary.POSIXct(c(1235505080.6, 1235505081.1), na.rm = FALSE) : 'sum' not defined for POSIXt objects One difference is that in my own date-time series, some events are repeated (the original data contained fractions of seconds). But then, why can I get a median through summary()? summary(c) Min. 1st Qu. Median 2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:20 EST Mean 3rd Qu. Max. 2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST Thanks in advance, Denis Chabot sessionInfo() R version 2.8.1 Patched (2009-01-19 r47650) i386-apple-darwin9.6.0 locale:
Re: [R] (no subject)
Shuying Yang wrote: Dear Members, I have a question about using R2WinBUGS to obtain the WinBUGS results. By default, when R2WinBUGS returns summary stats, I got mean, sd, 2.5%, 25%, median, 75% and 97.5%. Could anyone tell me how to modify the code to obtain 5% and 95% summary results? Hi, please read the posting guide and use a sensible subject line (I almost missed your posting). The used quantiles in the printed output are hardcoded, but you can calculate arbitrary quantiles from the data. Ad an example, please consider the schools example from ?bugs, running the example results in an object called schools.sim. Now you can calculate arbitrary quantiles as e.g. in: apply(schools.sim$sims.matrix, 2, quantile, c(0.05, 0.95)) Uwe Ligges Many thanks Alice _ [[elided Hotmail spam]] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lsmeans in R
Dear Suman, Chuck Cleland has already pointed you toward a reasonably complete discussion of the topic (thank you Chuck). To update my contribution to that discussion, the effects package now uses t-intervals for models with an estimated dispersion parameter (such as linear models) and will create displays for multinomial and proportional-odds logit models. (The latter isn't relevant to your intended application, of course.) Beyond that, I don't quite understand your question. Adjusted means are fitted values, and this is what the effects package gives you for linear models. Maybe you could explain in more detail what it is that you want and why you think that you're not getting it. Regards, John -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of suman Duvvuru Sent: March-11-09 2:45 AM To: r-help@r-project.org Subject: [R] lsmeans in R I need help with calculating lsmeans (adjusted means) of different terms in a linear model including the main effect and the interaction effect terms. I use lm to run the linear models...I previously noted from literature that that effects package can be used to generate lsmeans. But I tried to use it but could not figure out which option to use to get means. If anyone can give an example of how to get lsmeans using lm object, that will very helpful. Thanks, SUman [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Date frame
I have produced the data frame(1) as: starting_timeEnding_time session_1: 14:36:1014:40:44 session_2: 14:40:47 14:41:47 for each session, I produced different sequences as: seq_1-c(4 ,3, 0 ,3 ,4 ,1, 0 ,1 ,4 ,3 ,4 ,3 ,4) seq_2-c(3,4,1,2) seq_3-Null I want to put those to the 3rd column of data frame (1) corresponding the individual session. How could I do so? Originally, I want to create a data frame with each row contains different sequence and then put to 3rd column, which seems couldn't be done. thanks. Tammy Date: Wed, 11 Mar 2009 07:50:37 -0400 Subject: Re: [R] Date frame From: jholt...@gmail.com To: metal_lical...@live.com CC: r-help@r-project.org It is called a list. On Wed, Mar 11, 2009 at 6:32 AM, Tammy Ma metal_lical...@live.com wrote: Hi, All, How to make a data frame, each row of data frame store the different length of vector? Thanks. Tammy _ Show them the way! Add maps and directions to your party invites. http://www.microsoft.com/windows/windowslive/products/events.aspx [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? _ Show them the way! Add maps and directions to your party invites. http://www.microsoft.com/windows/windowslive/products/events.aspx [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prediction error for test set-cross validation
Uwe Ligges wrote: Mehmet U Ayvaci wrote: Hi, I have a database of 2211 rows with 31 entries each and I manually split my data into 10 folds for cross validation. I build logistic regression model as: model - glm(qual ~ AgGr + FaHx + PrHx + PrSr + PaLp + SvD + IndExam + Rad +BrDn + BRDS + PrinFin+ SkRtr + NpRtr + SkThck +TrThkc + SkLes + AxAdnp + ArcDst + MaDen + CaDt + MaMG + MaMrp + MaSh + SCTub + SCFoc + MaSz, family=binomial(link=logit)); Where the variables are taken from the trainSet of size 1989x31. The test set is sized 222x31. Now my question is when I try to predict on the test set it gives me the error: predict.glm(model, testSet, type =response) Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) : subscript out of bounds It does fine on trainSet. so it is something about the testSet. On the other hand, I realized that some independent variables say MaSz takes 3 different values in the trainset vs. 4 different ones in the testSet. I am not sure if this is the cause.If so, what would be the remedy? Since I can retrieve the coefficients of the logistic regression, I could manually calculate response for each entry in the testSet. This could solve my problem although burdensome. But, I don't know an easy way of doing it as my logistic regression have 80+ coefficients. Well, if MaSz takes 3 different values in the trainset vs. 4 different ones in the testSet, then you won't even be able to calculate it by hand, because you got no coefficients for the 4th level of that factor. Either you need the data to estimate coefficients from or you cannot predict. Uwe Ligges And note that your test sample is far too small to yield reliable results. You need to use resampling (e.g., bootstrap or 50-fold repeats of 10-fold cross-validation). See the validate function in the Design package. Note that validate does not implement the proportion classified correctly because this is an improper scoring rule with minimum information/lowest precision/lowest power. Frank Harrell Could somebody advise? Thanks, M [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] t-test, survey data
On Wed, 11 Mar 2009, Marie Vandresse wrote: *How can I do a t-test with survey data?* The 'survey' package does analysis of survey data. There isn't a t-test function; the easiest way to test for differences in means is with a linear regression model, using svyglm() -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mean +/- SEM
Hi Stefo, Perhaps this post might be useful: http://www.nabble.com/error-bars-to22092367.html#a22092367 HTH, Jorge On Tue, Mar 10, 2009 at 6:39 PM, Stefo Ratino srat...@yahoo.com wrote: Hi all, I am looking for a R function which unables me to plot mean +/- SEM. Is there such a function in R? Many thanks, Stefo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using napredict in prcomp
Thanks for the answer. Yes indeed, napredict only puts NAs in the right places (ie where na.action was used). Anyway I realize that I'm much better off dealing with the missing values prior to the PCA, and R offers many options for that. Thanks Alain Prof Brian Ripley a écrit : On Tue, 10 Mar 2009, Alain Paquette wrote: Hello all, I wish to compute site scores using PCA (prcomp) on a matrix with missing values, for example: DrainSlopeOrgL a41NA b2.5396 c6845 d3912 e3164 ... Where a,b... are sites. The command pca-prcomp(~ Drain + Slope + OrgL, data = t, center = TRUE, scale = TRUE, na.action=na.exclude) works great, and from pca$x I can get site scores, e.g. PC1 PC2 PC3 a NA NA NA b -2.10475208 -2.315128625 -0.885197753 c 5.01177388 -1.778786252 -0.193285051 d 0.28638602 0.298315086 0.386113799 e -0.58861254 0.089498632 -0.434951813 ... Easy enough... But how do I use the napredict argument? Is it intended as an argument to be used in the prcomp line (as suggested in ?prcomp), or is it to be used by itself, to replace NAs in the above site score matrix (which is what I really want to do). I don't see an 'napredict' argument. The help says x: if 'retx' is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the 'rotation' matrix) is returned. Hence, 'cov(x)' is the diagonal matrix 'diag(sdev^2)'. For the formula method, 'napredict' is applied to handle the treatment of values omitted by the 'na.action'. 'napredict' is a function (so please look up its help), and it says R applies it, not that you need to. It is napredict which gives you the NAs in the correct places. Thank you, Alain -- Alain Paquette, Ph.D. alain.paque...@gmail.com Centre d'étude de la forêt (CEF) Université du Québec à Montréal www.cef-cfr.ca Projet TRIADE www.projettriade.ca alain.paque...@projettriade.ca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Alain Paquette, Ph.D. alain.paque...@gmail.com Centre d'étude de la forêt (CEF) Université du Québec à Montréal www.cef-cfr.ca Projet TRIADE www.projettriade.ca alain.paque...@projettriade.ca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to monthly,daily,yearly average
Sorry, this is my first time to post. I have a big data set: first colume is date (ex: 2008-2-150, the second is time (10:30:00), and the following columes are variaty measurement data. Every 30 min, I have one data. I want to find an effecient way to calculate the hourly, daily, monthly and yearly average, and plot them, and eventually use these average data to do further analysis. Thanks! Jeff [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cholesky Decomposition in R
You got an A+ on the homework, Doug! I got a C- for suggesting svd(), which clearly doesn't yield a lower (or upper) triangular factorization. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Douglas Bates Sent: Tuesday, March 10, 2009 10:14 PM To: Manli Yan Cc: r-help@r-project.org Subject: Re: [R] Cholesky Decomposition in R On Tue, Mar 10, 2009 at 4:33 PM, Manli Yan manliyanrh...@gmail.com wrote: Hi everyone: I try to use r to do the Cholesky Decomposition,which is A=LDL',so far I only found how to decomposite A in to LL' by using chol(A),the function Cholesky(A) doesnt work,any one know other command to decomposte A in to LDL' My r code is: library(Matrix) A=matrix(c(1,1,1,1,5,5,1,5,14),nrow=3) chol(A) [,1] [,2] [,3] [1,] 1 1 1 [2,] 0 2 2 [3,] 0 0 3 Cholesky(A) Error in function (classes, fdef, mtable) : unable to find an inherited method for function Cholesky, for signature matrix whatz wrong??? The answer, surprisingly, is in the documentation, accessible as ?Cholesky which says that the first argument has to be a sparse, symmetric matrix. Because the Cholesky function is intended for sparse matrices it is not the best approach. The object returned is rather obscure Cholesky(as(A, dsCMatrix), LDL = TRUE) 'MatrixFactorization' of Formal class 'dCHMsimpl' [package Matrix] with 10 slots ..@ x : num [1:6] 1 1 1 4 1 9 ..@ p : int [1:4] 0 3 5 6 ..@ i : int [1:6] 0 1 2 1 2 2 ..@ nz : int [1:3] 3 2 1 ..@ nxt : int [1:5] 1 2 3 -1 0 ..@ prv : int [1:5] 4 0 1 2 -1 ..@ colcount: int [1:3] 3 2 1 ..@ perm: int [1:3] 0 1 2 ..@ type: int [1:4] 2 0 0 1 ..@ Dim : int [1:2] 3 3 It turns out that the factorization you want is encoded in the 'x' slot but not in an obvious way. Even if you ask for an expansion expand(Cholesky(as(A, dsCMatrix), LDL = TRUE)) $P [1,] | . . [2,] . | . [3,] . . | $L [1,] 1 . . [2,] 1 2 . [3,] 1 2 3 the result is converted from the LDL' factor to the LL' factor. A better approach is to consider how the LDL' factorization is related to the R'R form of the factorization returned by chol() ch - chol(A) dd - diag(ch) L - t(ch/dd) DD - dd^2 L [,1] [,2] [,3] [1,]100 [2,]110 [3,]111 DD [1] 1 4 9 This is all rather backwards in that the whole purpose of the LDL' form of the factorization is to avoid taking square roots to get the diagonal elements and to contend with positive semidefinite matrices. In other words, the LDL' form avoids some of the possible problems of the LL' form but not if you go through the LL' form to get to it. I think the underlying reason that an LDL' form is not directly available in R is because there is no Lapack subroutine for it. Let me know what our grade on the homework is. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to monthly,daily,yearly average
Check as.Date Date: Wed, 11 Mar 2009 06:25:44 -0700 From: qflic...@yahoo.com To: r-help@r-project.org Subject: [R] How to monthly,daily,yearly average Sorry, this is my first time to post. I have a big data set: first colume is date (ex: 2008-2-150, the second is time (10:30:00), and the following columes are variaty measurement data. Every 30 min, I have one data. I want to find an effecient way to calculate the hourly, daily, monthly and yearly average, and plot them, and eventually use these average data to do further analysis. Thanks! Jeff [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. _ News, entertainment and everything you care about at Live.com. Get it now! http://www.live.com/getstarted.aspx [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Ggplot2: saving a grid with multiple plots
Hi all, I have managed to create a figure on the screen with multiple plots in it. Something like the example below. When I save that with ggsave(), only the last plot gets saved (pPath in the example) instead of the entire figure. Any suggestions how I can save this kind of figures automated? Thanks, Thierry library(ggplot2) pPoint - qplot(unemploy/pop, psavert, data=economics) pPath - qplot(unemploy/pop, psavert, data=economics, geom=path) grid.newpage() pushViewport(viewport(layout = grid.layout(2, 1))) print(pPoint, vp = viewport(layout.pos.row = 1, layout.pos.col = 1)) print(pPath, vp = viewport(layout.pos.row = 2, layout.pos.col = 1)) ggsave(filename = test.pdf) ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] chisq.test: decreasing p-value
Thanks to Peter Dalgaard for the correct answer. I misinterpreted what R was returning. On Mar 11, 2009, at 7:32 AM, David Winsemius wrote: On Mar 11, 2009, at 6:36 AM, soeren.vo...@eawag.ch wrote: A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test shall reveal the actual equality in my sample. When applying a chi square test with increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (looks as if converge to zero). (1) Why? With low numbers of repetitions the test has low power, i.e, it may give you the wrong answer to the question: are those two vectors from the same distribution? As you increase in number, the simulated value approaches the truth. (2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same? In principle they are not the same. Do you want a test that tells you they are? (3) By the way, how to deal with low frequency cells? r - c(10, 100, 500, 1000, 2000, 5000) v - c(35, 40, 45, 45, 40, 35) sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value }) David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Forecasting with dlm
Hi All, I have a problem trying to forecast using the dlm package, can anyone offer any advise? I setup my problem as follows, (following the manual as much as possible) data for example to run code CostUSD - c(27.24031,32.97051, 38.72474, 22.78394, 28.58938, 49.85973, 42.93949, 35.92468) library(dlm) buildFun - function(x) { dlmModPoly(1, dV = exp(x[1]), dW = exp(x[2])) } fit - dlmMLE(CostUSD, parm = c(0,0), build = buildFun) fit$conv dlmCostUSD - buildFun(fit$par) V(dlmCostUSD) W(dlmCostUSD) #For comparison StructTS(CostUSD, level) CostUSDFilt - dlmFilter(CostUSD, dlmCostUSD) CostUSDFore - dlmForecast(CostUSDFilt, nAhead = 1) after which i return the error message: Error in mod$m[lastObsIndex, ] : incorrect number of dimensions Can anyone offer any insight to this problem? Thanks in advance Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] adding text and other elements to ggplot2 plots
Hello, I really like the interface and flexibility of the ggplots package. However, I cannot find how to add text to a plot (like the 'text' and 'rect' functions in the graphics package). I will appreciate any suggestions. Thanks, Ofir. -- View this message in context: http://www.nabble.com/adding-text-and-other-elements-to-ggplot2-plots-tp22455395p22455395.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Is this a documentation bug? Spss dates import
Hello R-user bug seekers are needed! In order to perform these simple tasks you have to use a copy of SPSS and obviously R. The problem is that date conversion of data coming from SPSS gives wrong results, if we follow ?as.POSIXct ## SPSS dates (R-help 2006-02-17) z - c(10485849600, 10477641600, 10561104000, 10562745600) as.Date(as.POSIXct(z, origin=1582-10-14, tz=GMT)) Note that ?as.POSIXct is coherent with the SPSS 'Programming and Data Management' guide (pag. 116): 'Internally, dates and date/times are stored as the number of seconds from October 14, 1582, and times are stored as the number of seconds from midnight.' I think the SPSS 'Programming and Data Management' is not very clear: times are stored as the number of seconds from midnight, but which midnight? 13th or 14th of October one? I think about 14th , so as.Date(as.POSIXct(z, origin=1582-10-14, tz=GMT)) has to be changed to as.Date(as.POSIXct(z, origin=1582-10-15, tz=GMT)) Test: - Let's create a vector of dates in SPSS and save it in C:\\date.sav DATA LIST / mydate (date). BEGIN DATA. 01/01/1960 11/07/1955 25/11/1962 08/06/1959 28-01-2003 15,03,03 1/1/1997 01-JAN-1998 END DATA. save outfile = C:\\date.sav /compressed. Now we use R: library(foreign) test.df - read.spss(C://date.sav, to.data.frame=T) now following ?as.POSIXct test.df$newdate - as.Date(as.POSIXct(test.df$MYDATE , origin=1582-10-14)) But if you take a look at the vector test.df$newdate you got: R.date = SPSS.date - 1 day Please confirm this! I would like to be sure that i didn't made mistakes (I've used SPSS 11 + R 2.8) If you come back to data, changing 14 to 15 test.df$newdate2 - as.Date(as.POSIXct(test.df$MYDATE, origin=1582-10-15)) assures that R.date = SPSS.date test.df MYDATEnewdate newdate2 1 1190376 1959-12-31 1960-01-01 2 11762496000 1955-07-10 1955-07-11 3 11995257600 1962-11-24 1962-11-25 4 11885875200 1959-06-07 1959-06-08 5 13263091200 2003-01-27 2003-01-28 6 13267065600 2003-03-14 2003-03-15 7 13071456000 1996-12-31 1997-01-01 8 13102992000 1997-12-31 1998-01-01 You got that, do you? It is needed to (eventually) ask clarifications about SPSS to Raynald Levesque (the author of SPSS 'Programming and Data Management' guide) and to submit a bug for ?as.POSIXct thank you Luca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sparse PCA in R
Also look at the pcaMethods package (on Bioconductor). Kevin Wright On Tue, Mar 10, 2009 at 2:59 PM, Christos Hatzis christos.hat...@nuverabio.com wrote: Take a look at the elasticnet package. -Christos -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of joris meys Sent: Tuesday, March 10, 2009 3:43 PM To: R-help Mailing List Subject: [R] Sparse PCA in R Dear all, I would like to perform a sparse PCA, but I didn't find any library offering me this in R. Is there one available, or do I have to write the functions myself? Kind regards Joris Meys [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cholesky Decomposition in R
A - matrix(runif(25),5,5); A - A%*%t(A) ## Example +ve def matrix U - chol(A) ## start from factor given by chol D - diag(U) ## extract sqrt(D) L - t(U/D) ## get unit lower triangular factor D - diag(D^2) ## and diagonal ## So now A = LDL' range(A-L%*%D%*%t(L)) #best, #Simon On Tuesday 10 March 2009 21:33, Manli Yan wrote: Hi everyone: I try to use r to do the Cholesky Decomposition,which is A=LDL',so far I only found how to decomposite A in to LL' by using chol(A),the function Cholesky(A) doesnt work,any one know other command to decomposte A in to LDL' My r code is: library(Matrix) A=matrix(c(1,1,1,1,5,5,1,5,14),nrow=3) chol(A) [,1] [,2] [,3] [1,]111 [2,]022 [3,]003 Cholesky(A) Error in function (classes, fdef, mtable) : unable to find an inherited method for function Cholesky, for signature matrix whatz wrong??? thanks~ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK +44 1225 386603 www.maths.bath.ac.uk/~sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] odfWeave xtable still best packages to produce tables
Jason, I found several references to odfWeave and xtable. My question is, are those still best R packages to produce publication quality tables tables from R? I think the file type you want (and how you will use the output) should probably drive your choice. If you are going to pdf, then Sweave/xtable/Hmisc:::latex are the best choices. If you are going to a more editable file format (doc or odt), then consider odfWeave or pdf/odf converter (which I think what Dr. Harrell does). HTML is always an option, but you'll have to covert the document if that is not your target format. -- Max __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Creating an Excel file with multiple spreadsheets
2009/3/9 Phil Spector spec...@stat.berkeley.edu: The xlsReadWrite package provides write.xls for Windows, but it cannot write _multiple_ spreadsheets -- Regards, Hans-Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] LDL' Cholesky decomposition
The gchol function in library(kinship) does an LDL decomposition. An updated version has just recently been posted on Rforge, in the bdsmatrix library which is part of survival. temp - matrix(c(1,1,1,1,5,8,1,8,14), 3) gt - gchol(temp) as.matrix(gt) # L [,1] [,2] [,3] [1,]1 0.000 [2,]1 1.000 [3,]1 1.751 diag(gt) # D [1] 1.00 4.00 0.75 Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to monthly,daily,yearly average
Here is an example. See the three vignettes in zoo: vignette(zoo) and R News 4/1 for info on dates. # assume this input Lines - Date Time Value 01/01/08 00:00:00 1 01/01/08 00:30:00 2 01/01/08 01:00:00 3 01/01/08 01:30:00 4 01/01/08 02:00:00 5 01/01/08 02:30:00 6 01/01/08 03:00:00 7 01/01/08 03:30:00 8 01/01/08 04:00:00 9 # read in the data, convert it to zoo, aggregate it and plot library(zoo) library(chron) # DF - read.table(myfile.dat, header = TRUE, as.is = TRUE) DF - read.table(textConnection(Lines), header = TRUE, as.is = TRUE) # next line may need to specify formats, if different z - zoo(DF$Value, chron(DF$Date, DF$Time)) # hourly aggregation z.hr - aggregate(z, trunc(time(z), 01:00:00), mean) plot(z.hr, type = o) # daily aggregation z.day - aggregate(z, trunc, mean) plot(z.day, type = o) # monthly aggregation z.mo - aggregate(z, as.yearmon, mean) plot(z.mo, type = o) # yearly aggregation z.yr - aggregate(z.mo, floor, mean) plot(z.yr, type = o) On Wed, Mar 11, 2009 at 9:25 AM, Qianfeng Li qflic...@yahoo.com wrote: Sorry, this is my first time to post. I have a big data set: first colume is date (ex: 2008-2-150, the second is time (10:30:00), and the following columes are variaty measurement data. Every 30 min, I have one data. I want to find an effecient way to calculate the hourly, daily, monthly and yearly average, and plot them, and eventually use these average data to do further analysis. Thanks! Jeff [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] trying to run odfWeave()
Unzipping ODF file using unzip -o SampleOdf.odt Erreur dans odfWeave(file.in, file.out) : Error unzipping file De plus : Warning message: In system(zipCmd[2], invisible = TRUE) : unzip introuvable I get this question about twice a week. Look at the help file. ?odfWeave says: Since ODF files are compressed archives of files and directories, R will need to zip and unzip the source file. While R has an unzip utility, it does not have one for re-zipping files, so an external application is needed. unzip and zip are free utilities located at http://www.info-zip.org/; Then, I have a second question, which totally independent from the one above (I have an error in both cases...) : is RepportOdf.odt automatically generated/created or should I create the document before ? You will need to create an odt document with the R code in it, then odfWeave(inFile, outFile) will take the input odt file and produce an output odt file. Please see: http://www.r-project.org/conferences/useR-2007/program/presentations/kuhn.pdf for an idea of how it works. And read the posting guide. Max __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bias correction for random forests?
Hi, Way back in 2004, an update to randomForest added an option 'corr.bias'. The explanation was a bit vague, but it turns out it improves RF's predictive fit with my data substantially. But I am having trouble understanding it. Does anyone know what this 'bias correction' actually does? Or what the justification for it is? Or when it would be necessary? Is there a paper I can look at? And is the feature likely to emerge from 'experimental' any time soon? Zhou Fang __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Easy Recall to get ls(..., all.names=TRUE)?
Dear useRs, I have a utility function which is meant to be a clone of ls(), except with the option all.names=TRUE. Currently however, the function merely consists of a copy of the source code of ls(), except the default value of all.names is different. That approach has the drawback of future inconsistency if the code for ls() ever changes. No comment on whether that is likely or not; I would and do object to the current construction in principle. What I would like to do is rewrite the new function so that it does very minimal processing of its arguments, and then calls ls() with the new arguments, somewhat in the same spirit of the Recall() function. A challenge to me has been twofold, that this new ls() call has a different search path, and that ls() itself has a good deal of lazy evaluation (including possibly twice in one line, around line 11) in it. Getting my new function to work with all permutations of arguments, without merely copying the code from ls(), has been futile. I am sure I could manually enumerate the behavior for all permutations of all the arguments to ls(), and code each case individually. I am also sure I will not waste my time doing that. I was hoping, however, that there was some simple trick to allow this easily, one that I have missed. Even an explanation of why this might be a fool's errand would be quite helpful to my understanding of the intricacies of R evaluation. Many thanks, John John Szumiloski, Ph.D. Senior Biometrician Biometrics Research WP53B-120 Merck Research Laboratories P.O. Box 0004 West Point, PA 19486-0004 (215) 652-7346 (PH) (215) 993-1835 (FAX) Notice: This e-mail message, together with any attachme...{{dropped:15}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] About the merge
hi, All. DateDtime Hour Min Second Rep App_dur 9 2006-02-22 14:36:11 14 36 11 4 1 10 2006-02-22 14:36:12 14 36 12 3 86 11 2006-02-22 14:37:38 14 37 38 0 58 14 2006-02-22 14:38:36 14 38 36 3 1 15 2006-02-22 14:38:37 14 38 37 4 1 16 2006-02-22 14:38:38 14 38 38 1 9 18 2006-02-22 14:38:47 14 38 47 0 3 20 2006-02-22 14:38:50 14 38 50 1 1 21 2006-02-22 14:38:51 14 38 51 4 2 *** 23 2006-02-22 14:38:53 14 38 53 4 39 *** 25 2006-02-22 14:39:32 14 39 32 3 1 26 2006-02-22 14:39:33 14 39 33 4 1 27 2006-02-22 14:39:34 14 39 34 3 8 28 2006-02-22 14:39:42 14 39 42 4 62 How to write a program to merge Rep with consecutive equal numbers to one of that number and sum up App_dur?? on the above data frame, i want to merge 21 2006-02-22 14:38:51 14 38 51 4 2 *** 23 2006-02-22 14:38:53 14 38 53 4 39 *** to 2006-02-22 14:38:51 14 38 51 4 41 and represent original two *** by the new one like: 9 2006-02-22 14:36:11 14 36 11 4 1 10 2006-02-22 14:36:12 14 36 12 3 86 11 2006-02-22 14:37:38 14 37 38 0 58 14 2006-02-22 14:38:36 14 38 36 3 1 15 2006-02-22 14:38:37 14 38 37 4 1 16 2006-02-22 14:38:38 14 38 38 1 9 18 2006-02-22 14:38:47 14 38 47 0 3 20 2006-02-22 14:38:50 14 38 50 1 1 2006-02-22 14:38:51 14 38 51 4 41 *** 25 2006-02-22 14:39:32 14 39 32 3 1 26 2006-02-22 14:39:33 14 39 33 4 1 27 2006-02-22 14:39:34 14 39 34 3 8 28 2006-02-22 14:39:42 14 39 42 4 62 ?? Thanks. Tammy _ Drag n dropGet easy photo sharing with Windows Live Photos. http://www.microsoft.com/windows/windowslive/products/photos.aspx [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Is there any difference between - and =
Dear R-helpers: I have a question related to - and =. I saw very experienced R programmers use = rather than - quite consistently. However, I heard from others that do not use = but always stick to - when assigning valuese. I personally like = because I was using Matabl, But, would like to receive expert opinion to avoid potential trouble. Many thanks in advance. -Sean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] chisq.test: decreasing p-value
Thanks to Peter, David, and Michael! After having corrected the coding error, the p values converge to particular value, not necessarily zero. The whole story is, 634 respondents in 6 different areas marked their answer on a 7-step Likert scale (very bad, bad, ..., very good -- later recoded to 5 scale levels). The statistical question now is, do the answer's distributions (amount of goods, bads etc.) in either area differ from the mean answer-distribution calculated with summing up all goods, bads, etc. Anyway an omnibus chi square would not answer my question, and due to spurious significances I'd rather go back to my chi square book ;-) (for the interested, see http://sozmod.eawag.ch/files/file.Robj for the entire table). Thanks for your help Sören __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mixed models fixed effects
Dear All, This may sound like a dumb question but I am trying to use a mixed model to determine the predictors of bat activity along hedges within 8 sites. So my response is continuous (bat passes) my predictors fixed effects are continuous (height metres), width (metres) etc and the random effect is site - can you tell me if the fixed effects can be continuous as all the examples I have read show them as categorical, but this is not covered in any documents I can find. Help! Emma __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mixed models fixed effects
Hi Emma, Continuous predictors are no problem at all. You can mix both continuous and categorial predictors if needed. I suppose your response are counts (the number of bats that passes)? In that case a generalised linear mixed model is more appropriate. With the lme4 package you could try something like this: library(lme4) Model - glmer(BatPasses ~ Width + Height + (1|Site), family = poisson) HTH, Thierry PS There is a mailing list dedicated to mixed models: R-Sig-MixedModels ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Emma Stone Verzonden: woensdag 11 maart 2009 15:29 Aan: r-help@r-project.org Onderwerp: Re: [R] Mixed models fixed effects Dear All, This may sound like a dumb question but I am trying to use a mixed model to determine the predictors of bat activity along hedges within 8 sites. So my response is continuous (bat passes) my predictors fixed effects are continuous (height metres), width (metres) etc and the random effect is site - can you tell me if the fixed effects can be continuous as all the examples I have read show them as categorical, but this is not covered in any documents I can find. Help! Emma __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there any difference between - and =
On 3/11/2009 10:18 AM, Sean Zhang wrote: Dear R-helpers: I have a question related to - and =. I saw very experienced R programmers use = rather than - quite consistently. However, I heard from others that do not use = but always stick to - when assigning valuese. I personally like = because I was using Matabl, But, would like to receive expert opinion to avoid potential trouble. Use - for assignment, and = for function arguments. Then the difference between f( a = 3 ) f( a - 3 ) is clear, and you won't be surprised that a gets changed in the second case. If you use = for assignment, the two lines above will be written as f( a = 3 ) f( ( a = 3 ) ) and it is very easy to miss the crucial difference between them. Duncan Murdoch Many thanks in advance. -Sean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem with rfImpute (package randomForest)
Hello everybody, this is my first request about R so I am sorry if I send it to a bad mail or if I am not very clear. So my problem is about the use of rfImpute from randomForest package. I am interested in imputations of missing values and I read that randomForest can make it. So i write the following code : set.seed(100); library(mlbench) library(randomForest) data(BreastCancer) summary(BreastCancer) data=BreastCancer[,-1] data=data[!is.na(data[,Bare.nuclei]),] summary(data) is.factor(data$Cl.thickness)# OK ##selection of missing values## x=1:nrow(data) sample1=sample(x,70) sample3=sample(x,70) sample5=sample(x,70) ##replace by missing values# data_missing=data data_missing[sample1,1]=NA data_missing[sample3,3]=NA data_missing[sample5,5]=NA summary(data_missing) is.factor(data_missing$Cl.thickness)# OK imputation by random forest data_imputed - rfImpute(Class ~ .,data_missing,iter=5,ntree=1000) is.factor(data_imputed$Cl.thickness)# Not OK And as you can see, rfImpute change the type of one explanatory variable. Before imputation, it was a factor. After it becomes a quantitative variable. So I don't understand what it happens. Maybe I should add an option in rfImpute... If someone could help me to understand. Thank you very much _ ? Lancez-vous ! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to rotate the histogram
Dear all I want to combine two figures on one plot. First figure display the data distribution. The x-axis of first figure is individual's ranking and the y-axis of first figure is probability. However, the second figure is a histogram about probability. In order to make the figure more clearly, i want to put those two figure together. But i have a problem that i can not rotate the histogram.Because i want put the x- axis of histogram on the left side of first figure. Can anyone provide some way to solve this problem. Sinerely __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RServe
Dear all, I'm trying to use R-functions from Java. I want to use the Package Rserve. Everthing is installed and I can run my Java-testprogramm. ... Rconnection r = new Rconnection(); double[] d = r.eval(rnorm(10)).asDoubleArray(); ... But now I get always arrays with ten 0.0, which is not correct I suppose, that is is a problem from R, because at the fist time (last week) I started RServe I got an icon in my taskbar which was something about a R server. But I never got it since then... I tried to reinstall R, but it didn't help. Where is teh problem and what can I do? Thanks a lot, Max -- View this message in context: http://www.nabble.com/RServe-tp22452051p22452051.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Combining math and variables in expression
I am trying to get the following line in a plot margin using mtext: 100% Area = 120.000 km^2 Where I intend that 100% Area = is text, 120.000 is a number that varies according to different data, and km^2 should be a neat km-with-superscript-2. The expression function fails me, since it apparently cannot coerce an expression when a variable is involved. In the R help file there is an example using either bquote or substitute to accomplish like things, but these are sufficiently different from my objective and I am becoming more and more confused by every further example I stumble on. This should be easy. Can anyone help me prevent spending anoter 2.5 hours on the matter? Thanks. -- View this message in context: http://www.nabble.com/Combining-math-and-variables-in-expression-tp22455877p22455877.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Creating a directory for my data
Sorry, I did forget to add what it is I want to get. I wish to get a result like the following: Legend: ID Title 1 Article A 2 Article B 3 Article C 4 Article D ... i Article N -- View this message in context: http://www.nabble.com/Creating-a-directory-for-my-data-tp22448335p22455535.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding text and other elements to ggplot2 plots
Have a look at the annotations section of http://had.co.nz/ggplot2/book/toolbox.pdf Hadley On Wed, Mar 11, 2009 at 8:44 AM, levyofi levy...@post.tau.ac.il wrote: Hello, I really like the interface and flexibility of the ggplots package. However, I cannot find how to add text to a plot (like the 'text' and 'rect' functions in the graphics package). I will appreciate any suggestions. Thanks, Ofir. -- View this message in context: http://www.nabble.com/adding-text-and-other-elements-to-ggplot2-plots-tp22455395p22455395.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mixed models fixed effects
Also check out these pdfs http://cran.r-project.org/other-docs.html and try to get your hands on the bible http://www.amazon.co.uk/R-Book-Michael-J-Crawley/dp/0470510242 Simon. Hi Emma, Continuous predictors are no problem at all. You can mix both continuous and categorial predictors if needed. I suppose your response are counts (the number of bats that passes)? In that case a generalised linear mixed model is more appropriate. With the lme4 package you could try something like this: library(lme4) Model - glmer(BatPasses ~ Width + Height + (1|Site), family = poisson) HTH, Thierry PS There is a mailing list dedicated to mixed models: R-Sig-MixedModels ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Emma Stone Verzonden: woensdag 11 maart 2009 15:29 Aan: r-help@r-project.org Onderwerp: Re: [R] Mixed models fixed effects Dear All, This may sound like a dumb question but I am trying to use a mixed model to determine the predictors of bat activity along hedges within 8 sites. So my response is continuous (bat passes) my predictors fixed effects are continuous (height metres), width (metres) etc and the random effect is site - can you tell me if the fixed effects can be continuous as all the examples I have read show them as categorical, but this is not covered in any documents I can find. Help! Emma __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RES: North Arrow (.png file) on a Map
Another possibility is the my.symbols function in the TeachingDemos package. You can define a polygon of your arrow (or other symbol), then use my.symbols to add it to an existing graph, the advantage of my.symbols is that the size of the arrow is absolute, not in terms of the scale of the original plot. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Rodrigo Aluizio Sent: Wednesday, March 11, 2009 4:22 AM To: 'Yihui Xie' Cc: R Help Subject: [R] RES: North Arrow (.png file) on a Map Thanks a lot Yihui, It's perfect, in fact exactly identical the I have as a png file in my computer. Thank you so much. Regards Rodrigo -Mensagem original- De: Yihui Xie [mailto:xieyi...@gmail.com] Enviada em: quarta-feira, 11 de março de 2009 02:48 Para: Rodrigo Aluizio Cc: R Help Assunto: Re: [R] North Arrow (.png file) on a Map Is this arrow satisfactory for you? north.arrow = function(x, y, h) { polygon(c(x, x, x + h/2), c(y - h, y, y - (1 + sqrt(3)/2) * h), col = black, border = NA) polygon(c(x, x + h/2, x, x - h/2), c(y - h, y - (1 + sqrt(3)/2) * h, y, y - (1 + sqrt(3)/2) * h)) text(x, y, N, adj = c(0.5, 0), cex = 4) } plot(1, type = n, ylim = c(0, 1)) north.arrow(1, 0.8, 0.3) Regards, Yihui -- Yihui Xie xieyi...@gmail.com Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086 Mobile: +86-15810805877 Homepage: http://www.yihui.name School of Statistics, Room 1037, Mingde Main Building, Renmin University of China, Beijing, 100872, China On Tue, Mar 10, 2009 at 7:21 PM, Rodrigo Aluizio r.alui...@gmail.com wrote: Hi list. I would like to know how do I insert a North arrow, stored as a png file in my computer, in a map? I found lots of post asking similar things, one of them mentioned the pixmap package. The map was done using map() and shapefiles (the code is below). I'm using the pixmap () and addlogo() functions. Well I can import the png with pixmap() function (I guess, once there's no error message), but I can't put It on the map, I got an error message telling me that: Error at t(x...@index[nrow(x...@index):1, , drop = FALSE]) : index out of limits Well I tried changing coordinates but I always got the same result. How do I do this correctly? Is there a better way? Thanks for the help and attention. Here is the complete map script: library(RODBC) library(maps) library(mapdata) library(maptools) library(pixmap) #Carregar Coordenadas e dados dos Pontos Amostrais Dados-odbcConnectExcel('Campos.xls',readOnly=T) Coord-sqlFetch(Dados,'CoordMed',colnames=F,rownames='Ponto') odbcClose(Dados) N-pixmap('Norte.png',nrow=166,ncol=113) # Carregar pontos e shapes Batimetria- readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontol ogia/Campos/ShapeFiles/Batimetria_BC.shp') Estados- readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontologi a/Campos/ShapeFiles/Estados_Sudeste.shp') Faciologia- readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontol ogia/Campos/ShapeFiles/Faciologia_BC.shp') # Mapa com os Pontos da Bacia postscript('MapaCampos.eps',paper='special',onefile=F,horizontal=F,widt h=3.5 ,height=4.5,bg='white',pointsize=3) par(mar=c(3,2,2,0)) map('worldHires','brazil',ylim=c(23.9,20.3),xlim=c(42.1,39.2),type='n') plot(Faciologia,ylab='',xlab='',col=c('lightgreen','lightgreen','lightg reen' ,'lightgreen','lightgreen','lightgray','lightgray','lightgray','lightgr ay',' lightgray','lightgray','lightgray','lightgray','lightgray','lightgray', 'ligh tgray','lightgray','lightgray','lightgray','lightgray','lightgray','lig htyel low','lightyellow','lightyellow'),add=T,lwd=0.5,border=0) plot(Batimetria,ylab='',xlab='',col='darkgray',lty='solid',lwd=0.2,add= T) plot(Estados,ylab='',xlab='',lty='solid',add=T,lwd=0.8) text(Coord$Longitude[Coord$Réplicas=='1'],Coord$Latitude[Coord$Réplicas =='1' ],rownames(Coord)[Coord$Réplicas=='1'],col='red',cex=0.5,font=2) text(Coord$Longitude[Coord$Réplicas=='2'],Coord$Latitude[Coord$Réplicas =='2' ],rownames(Coord)[Coord$Réplicas=='2'],col='yellow',cex=0.5,font=2) text(Coord$Longitude[Coord$Réplicas=='3'],Coord$Latitude[Coord$Réplicas =='3' ],rownames(Coord)[Coord$Réplicas=='3'],col='blue',cex=0.5,font=2) points(Coord$Longitude,Coord$Latitude-0.045,pch=20,cex=0.7) text(c(41.5,41.3),c(21.7,20.6),c('RJ','ES')) axis(1,xaxp=c(42.1,39.2,2),cex.axis=1) axis(2,yaxp=c(23.9,20.3,4),cex.axis=1) title(main='Bacia') legend(40.2,23.5,c('Uma','Duas','Três'),pch=21,cex=1,pt.bg=c('red','yel low', 'blue'),bty='n',pt.cex=2,pt.lwd=0.6,title='Réplicas')
[R] OT: Likelihood ratio for the randomization/permutation test?
Hi guRus, My discipline (experimental psychology) is gradually moving away from Null Hypothesis Testing and towards measures of evidence. One measure of evidence that has been popular of late is the likelihood ratio. Glover Dixon (2005) demonstrate the calculation of the likelihood ratio from ANOVA tables, but I'm also interested in non-parametric statistics and wonder if anyone has any ideas on how to compute a likelihood ratio from a randomization test (aka. permutation test)? Say one had two groups and were interested in whether the mean scores of the two groups differ in a manner consistent with random chance or in a manner consistent with a non-null effect of some manipulation applied to the two groups. The randomization test addresses this by randomly re-assigning the participants to the groups, re-computing the difference between means, and repeating many times, yielding a distribution of simulated difference scores that represents the distribution expected by chance. Within a Null Hypothesis Testing framework you then estimate the probability of the null by observing the proportion of simulated difference scores that are greater in magnitude than the observed difference score. Any guesses on how to translate this into a quantification of evidence? Mike -- Mike Lawrence Graduate Student Department of Psychology Dalhousie University Looking to arrange a meeting? Check my public calendar: http://tinyurl.com/mikes-public-calendar ~ Certainty is folly... I think. ~ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there any difference between - and =
On Wed, Mar 11, 2009 at 7:18 AM, Sean Zhang seane...@gmail.com wrote: Dear R-helpers: I have a question related to - and =. I saw very experienced R programmers use = rather than - quite consistently. However, I heard from others that do not use = but always stick to - when assigning valuese. I personally like = because I was using Matabl, But, would like to receive expert opinion to avoid potential trouble. Many thanks in advance. -Sean The short answer is that - is used for assignment, and = is used to associate function arguments with their values. You can use = instead of - for assignment too (in most contexts), but the converse isn't true. I've provided more detail about when you can and can't exchange the two operators (and some of the history about the operators themselves) in this blog post: http://blog.revolution-computing.com/2008/12/use-equals-or-arrow-for-assignment.html # David Smith -- David M Smith da...@revolution-computing.com Director of Community, REvolution Computing www.revolution-computing.com Tel: +1 (206) 577-4778 x3203 (Seattle, USA) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help on MLInterfaces
Hi Tulgan -- MLInterfaces is a Bioconductor package, so please ask on the Bioc mailing list (the package maintainer is more likely to read that list) https://stat.ethz.ch/mailman/listinfo/bioconductor Martin Tul Gan tul...@ymail.com writes: Hi, I am trying to use MLearn in MLInterfaces package to do randomforest, clustering, knn etc. How do I predict on a test set for which I do not know the classes? My training set has two classes. Thanks, Tulgan __ Yahoo! Canada Toolbar: Search from anywhere on the web, and bookmark your favourite sites. Download it now at http://ca.toolbar.yahoo.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] compile under Ubuntu 8.10 (Ibex)
I fired up a new machine last night and loaded Ubuntu 8.10 on it. I then had to add in the usual compiler stuff which is not loaded by default: make, emacs, fortran, c++, etc. I'm having trouble compiling R 2.8 however. I get a message that it cannot figure out how to link f77 and C. I've pulled in gfortran. A google search on fortran ubuntu 8.10 shows various problems with the gcc switch from fortran77 to fortran95 --- also true for R? Any hints are welcome, including don't use 8.10 (thought that will be an evening of work). Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using a NAMESPACE or the Imports field in DESCRIPTION?
Hello list, I am writing a package which builds on a function (foo) of another package. I only need that particular function. What is the state of the art to do this? Do I need a NAMESPACE file in my package? Currently I still don't have one. Or can I do this with the Imports field in my DESCRIPTION file? Reading chapter 1 of the Writing R Extensions manual I do not completely understand for what situations these two alternatives are intended for. The manual says that with the Imports field of DESCRIPTION I can import the name space of another package. What does that mean? What are the differences between the two alternatives? Are there consequences for how I can access function foo within the code of my package? Thanks for some clarifications and recommendations! Wolfgang Koller __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Two new R courses in April (US)
REvolution Computing is hosting two new R courses in New York City and Chicago in April. Short descriptions are below, but further information and registration details can be found in the links at: http://blog.revolution-computing.com/2009/03/r-courses-from-revolution-april.html April 1 (New York City): What's All The Fuss About R? An Introduction to R Using Real-world Examples. This one-day, hands-on basic R training provides an introduction to R through the quantitative exploration of real-world problems. We’ll begin by using R for exploratory data analysis and statistical inference; specific topics include graphical methods (both base and grid graphics) and hypothesis testing via parametric and non-parametric methods. This introduction motivates a more formal presentation of the R language, including types of objects, reading/writing data, functions, loops, and control statements. Other strengths of R will be addressed, including statistical modeling, classes and methods, the package management system, and the interface to C/C++ and Fortran. April 23 (Chicago): High-Performance Computing in R. High-performance computing (HPC) technology is more available and affordable than ever before. Productively using HPC computing has remained surprisingly difficult, particularly with high-level scripted languages like R. Fortunately, that is changing. The last few years have seen significant gains in high-quality HPC computing-related packages for R including RMPI, SNOW, ParallelR and many more. This one-day course will present an overview of available HPC technologies for the R language, and will demonstrate each technology with simple examples that can be used as starting points for more sophisticated work. (Ideal for attendees of R/Finance 2009.) # David Smith -- David M Smith da...@revolution-computing.com Director of Community, REvolution Computing www.revolution-computing.com Tel: +1 (206) 577-4778 x3203 (Seattle, USA) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combining math and variables in expression
tmp - '120.000' mtext(bquote( '100% Area' == .(tmp)~km^2 )) Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of socrates Sent: Wednesday, March 11, 2009 8:08 AM To: r-help@r-project.org Subject: [R] Combining math and variables in expression I am trying to get the following line in a plot margin using mtext: 100% Area = 120.000 km^2 Where I intend that 100% Area = is text, 120.000 is a number that varies according to different data, and km^2 should be a neat km-with-superscript-2. The expression function fails me, since it apparently cannot coerce an expression when a variable is involved. In the R help file there is an example using either bquote or substitute to accomplish like things, but these are sufficiently different from my objective and I am becoming more and more confused by every further example I stumble on. This should be easy. Can anyone help me prevent spending anoter 2.5 hours on the matter? Thanks. -- View this message in context: http://www.nabble.com/Combining-math-and- variables-in-expression-tp22455877p22455877.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using a NAMESPACE or the Imports field in DESCRIPTION?
Hi Wolfgang -- Wolfgang Koller wrote: Hello list, I am writing a package which builds on a function (foo) of another package. I only need that particular function. What is the state of the art to do this? Do I need a NAMESPACE file in my package? Currently I still don't have one. Or can I do this with the Imports field in my DESCRIPTION file? Reading chapter 1 of the Writing R Extensions manual I do not completely understand for what situations these two alternatives are intended for. The manual says that with the Imports field of DESCRIPTION I can import the name space of another package. What does that mean? What are the differences between the two alternatives? Are there consequences for how I can access function foo within the code of my package? It sounds like you want to add Imports: OtherPackage to DESCRIPTION, and importFrom(OtherPackage, foo) in you NAMESPACE file, in addition perhaps to export(foo) if you'd like users of your package to access foo. Using a NAMESPACE can be very beneficial, especially as your software grows in complexity -- it ensures ready access to the symbols you want (foo, in this case) without relying on the structure of the user search path. I don't think adding Imports to DESCRIPTION and adding a NAMESPACE should really be viewed as 'alternatives' -- add a NAMESPACE, and many of the packages that you had previously listed in Depends: likely belong more appropriately in Imports. Martin Thanks for some clarifications and recommendations! Wolfgang Koller __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Combining math and variables in expression
Dear socrates; It did not take me 2.5 hours but it did take longer than I thought it would. I worked off the example given by Henrique Dallazuanna seen at: http://finzi.psych.upenn.edu/R/Rhelp08/archive/120065.html This seems to work: mtext(bquote(100~% ~ Area == ~ .(text.val)~ Km^2), side=3) I cannot say the the syntactic rules are entirely clear. The proper placement of the tildes is a bit mysterious to me. I am wondering if a solution using substitute(expression(.)) might be more clear but my efforts in that direction were not successful. -- David On Mar 11, 2009, at 10:08 AM, socrates wrote: I am trying to get the following line in a plot margin using mtext: 100% Area = 120.000 km^2 Where I intend that 100% Area = is text, 120.000 is a number that varies according to different data, and km^2 should be a neat km-with-superscript-2. The expression function fails me, since it apparently cannot coerce an expression when a variable is involved. In the R help file there is an example using either bquote or substitute to accomplish like things, but these are sufficiently different from my objective and I am becoming more and more confused by every further example I stumble on. This should be easy. Can anyone help me prevent spending anoter 2.5 hours on the matter? Thanks. -- View this message in context: http://www.nabble.com/Combining-math-and-variables-in-expression-tp22455877p22455877.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to check the model adequacy for a cox proportional hazard model fit?
Anybody has any suggestion? Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about datatypes/plotting issue
David, I struggled with this for a while. I think the problem with the dates I have is that they are not specific dates, they are partial dates. A workaround for that that I got from someone else in the list was: as.Date(paste(x$Date, '1'), '%B %Y %d') to make them specific dates (the first of the month). Cheers, -Oscar On Mar 10, 2009, at 7:58 PM, David Winsemius wrote: You need to convert W$Date into a real date variable. At the moment it is just a character variable. str(W) 'data.frame': 265 obs. of 23 variables: $ Date: Factor w/ 265 levels ,April 1987,..: 1 90 68 156 2 178 134 ... $ AZ.Phoenix : Factor w/ 236 levels ,100.00,100.43,..: 236 1 1 1 1 1 1 1 1 1 ... $ CA.Los.Angeles : Factor w/ 260 levels 100.00,100.02,..: 260 113 114 115 116 ... $ CA.San.Diego: Factor w/ 261 levels 100.00,101.07,..: 261 109 110 111 112 ... $ CA.San.Francisco: Factor w/ 256 levels 100.00,102.70,..: 256 108 109 110 111 ... .(output trimmed) . . ?Date # not the variable name, the R class name ?format.Date ?strptime Unfortunately I seem to be at one of the many limits to my knowledge: This code behaves in the manner I expected: format(Sys.time(), %a %b %d %X %Y %Z) [1] Tue Mar 10 22:19:28 2009 EDT strptime(format(Sys.time(), %a %b %d %X %Y %Z), format=%a %b %d %X %Y %Z) [1] 2009-03-10 22:20:04 Whereas this code does not: format(Sys.Date(), %B %Y) [1] March 2009 as.Date(format(Sys.Date(), %B %Y), %B %Y) # would have assumed one was the inverse of the other, but ... [1] NA For some reason I cannot get the space delimited Month- combo to convert. I can getother space delimited formats to work for input or output: as.Date(03 1998, %M %Y) [1] 1998-03-10 format(Sys.Date(), %B %Y) [1] March 2009 Puzzled; -- David Winsemius On Mar 10, 2009, at 9:15 PM, Oscar Bonilla wrote: Hi, I am trying to plot the Case-Shiller index found at: http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls The way I'm importing it into R is as follows: library(gdata) W - read.xls(http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls , header=TRUE) attach(W) To give you and idea of what the data looks like: head(W) Date AZ.Phoenix CA.Los.Angeles CA.San.Diego CA.San.Francisco 1 PHXR LXXR SDXR SFXR 2 January 1987 59.3354.67 46.61 3 February 1987 59.6554.89 46.87 4March 1987 59.9955.16 47.32 5April 1987 60.8155.85 47.69 6 May 1987 61.6756.35 48.31 CO.Denver DC.Washington FL.Miami FL.Tampa GA.Atlanta IL.Chicago MA.Boston 1 DNXR WDXR MIXR TPXR ATXR CHXR BOXR 2 50.20 64.1168.5077.33 53.55 70.04 3 49.96 64.7768.7677.93 54.64 70.08 4 50.15 65.7169.2377.76 54.80 70.00 5 50.55 66.4069.2077.56 54.88 70.70 6 50.63 67.2769.4677.85 55.43 71.51 MI.Detroit MN.Minneapolis NC.Charlotte NV.Las.Vegas NY.New.York OH.Cleveland 1 DEXR MNXR CRXR LVXR NYXR CEXR 2 63.3966.36 74.4253.53 3 63.9467.03 75.4353.50 4 64.1767.34 76.2553.68 5 64.8167.88 77.3453.75 6 65.1867.90 79.1654.71 OR.Portland TX.Dallas WA.Seattle Composite.10 Composite.20 1POXR DAXR SEXR CSXR SPCS20R 2 41.05 62.82 3 41.28 63.39 4 41.06 63.87 5 40.96 64.57 6 41.24 65.56 Now on to the problem... if I just run plot(CA.San.Francisco ~ Date) I get: pastedGraphic.png Which I suspect is a problem because the Date column is not really a Date, it is a factor class(Date) [1] factor If I run: plot(as.numeric(CA.San.Francisco), type=l) I get: pastedGraphic.png which is wrong, as CA.San.Francisco has no such discontinuity. CA.San.Francisco [1] SFXR 46.61 46.87 47.32 47.69 48.31 48.83 49.49 49.94 50.69 [11] 51.33 51.80 52.03 52.24 52.64 53.19 54.19 56.09 58.22 58.70 [21] 59.00 59.50 60.37 61.31 62.20 62.66 63.32 64.64 66.27 67.77 [31] 69.26 70.27 71.36 72.31 72.95 73.25 73.02 72.87 72.95 73.50 [41] 74.57 75.12 75.15 74.81 74.45
Re: [R] Question about datatypes/plotting issue
Another approach is to use yearmon. e.g. library(zoo) as.yearmon(January 1987, %B %Y) [1] Jan 1987 Thus we can replace the w - line in my example code with: w - zoo(as.matrix(W[-1]), as.yearmon(W[,1], %B %Y)) On Wed, Mar 11, 2009 at 12:48 PM, Oscar Bonilla oboni...@galileo.edu wrote: David, I struggled with this for a while. I think the problem with the dates I have is that they are not specific dates, they are partial dates. A workaround for that that I got from someone else in the list was: as.Date(paste(x$Date, '1'), '%B %Y %d') to make them specific dates (the first of the month). Cheers, -Oscar On Mar 10, 2009, at 7:58 PM, David Winsemius wrote: You need to convert W$Date into a real date variable. At the moment it is just a character variable. str(W) 'data.frame': 265 obs. of 23 variables: $ Date : Factor w/ 265 levels ,April 1987,..: 1 90 68 156 2 178 134 ... $ AZ.Phoenix : Factor w/ 236 levels ,100.00,100.43,..: 236 1 1 1 1 1 1 1 1 1 ... $ CA.Los.Angeles : Factor w/ 260 levels 100.00,100.02,..: 260 113 114 115 116 ... $ CA.San.Diego : Factor w/ 261 levels 100.00,101.07,..: 261 109 110 111 112 ... $ CA.San.Francisco: Factor w/ 256 levels 100.00,102.70,..: 256 108 109 110 111 ... .(output trimmed) . . ?Date # not the variable name, the R class name ?format.Date ?strptime Unfortunately I seem to be at one of the many limits to my knowledge: This code behaves in the manner I expected: format(Sys.time(), %a %b %d %X %Y %Z) [1] Tue Mar 10 22:19:28 2009 EDT strptime(format(Sys.time(), %a %b %d %X %Y %Z), format=%a %b %d %X %Y %Z) [1] 2009-03-10 22:20:04 Whereas this code does not: format(Sys.Date(), %B %Y) [1] March 2009 as.Date(format(Sys.Date(), %B %Y), %B %Y) # would have assumed one was the inverse of the other, but ... [1] NA For some reason I cannot get the space delimited Month- combo to convert. I can getother space delimited formats to work for input or output: as.Date(03 1998, %M %Y) [1] 1998-03-10 format(Sys.Date(), %B %Y) [1] March 2009 Puzzled; -- David Winsemius On Mar 10, 2009, at 9:15 PM, Oscar Bonilla wrote: Hi, I am trying to plot the Case-Shiller index found at: http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls The way I'm importing it into R is as follows: library(gdata) W - read.xls(http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls;, header=TRUE) attach(W) To give you and idea of what the data looks like: head(W) Date AZ.Phoenix CA.Los.Angeles CA.San.Diego CA.San.Francisco 1 PHXR LXXR SDXR SFXR 2 January 1987 59.33 54.67 46.61 3 February 1987 59.65 54.89 46.87 4 March 1987 59.99 55.16 47.32 5 April 1987 60.81 55.85 47.69 6 May 1987 61.67 56.35 48.31 CO.Denver DC.Washington FL.Miami FL.Tampa GA.Atlanta IL.Chicago MA.Boston 1 DNXR WDXR MIXR TPXR ATXR CHXR BOXR 2 50.20 64.11 68.50 77.33 53.55 70.04 3 49.96 64.77 68.76 77.93 54.64 70.08 4 50.15 65.71 69.23 77.76 54.80 70.00 5 50.55 66.40 69.20 77.56 54.88 70.70 6 50.63 67.27 69.46 77.85 55.43 71.51 MI.Detroit MN.Minneapolis NC.Charlotte NV.Las.Vegas NY.New.York OH.Cleveland 1 DEXR MNXR CRXR LVXR NYXR CEXR 2 63.39 66.36 74.42 53.53 3 63.94 67.03 75.43 53.50 4 64.17 67.34 76.25 53.68 5 64.81 67.88 77.34 53.75 6 65.18 67.90 79.16 54.71 OR.Portland TX.Dallas WA.Seattle Composite.10 Composite.20 1 POXR DAXR SEXR CSXR SPCS20R 2 41.05 62.82 3 41.28 63.39 4 41.06 63.87 5 40.96 64.57 6 41.24 65.56 Now on to the problem... if I just run plot(CA.San.Francisco ~ Date) I get: pastedGraphic.png Which I suspect is a problem because the Date column is not really a Date, it is a factor class(Date) [1] factor If I run: plot(as.numeric(CA.San.Francisco), type=l) I get: pastedGraphic.png which is wrong, as CA.San.Francisco has no such discontinuity. CA.San.Francisco [1] SFXR 46.61 46.87 47.32 47.69 48.31 48.83 49.49 49.94
Re: [R] Mixed models fixed effects
Hi Simon, Carefull, someone is likely to tell you that the bible is Pinheiro, J.C., and Bates, D.M. (2000) Mixed-Effects Models in S and S-PLUS, Springer, and that would be much closer to being correct. Others might mention something by Searle. Nothing against Crawley, of course. But it usually is better to get close to the source, and to the active researchers in the field. One nice thing about the first reference (there are many others) is that Prof. Bates is an active contributor to this list and to the SIG mixed-models list (which he maintains): https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models Check it out. Regards, Mark. Simon Pickett-4 wrote: Also check out these pdfs http://cran.r-project.org/other-docs.html and try to get your hands on the bible http://www.amazon.co.uk/R-Book-Michael-J-Crawley/dp/0470510242 Simon. Hi Emma, Continuous predictors are no problem at all. You can mix both continuous and categorial predictors if needed. I suppose your response are counts (the number of bats that passes)? In that case a generalised linear mixed model is more appropriate. With the lme4 package you could try something like this: library(lme4) Model - glmer(BatPasses ~ Width + Height + (1|Site), family = poisson) HTH, Thierry PS There is a mailing list dedicated to mixed models: R-Sig-MixedModels ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Emma Stone Verzonden: woensdag 11 maart 2009 15:29 Aan: r-help@r-project.org Onderwerp: Re: [R] Mixed models fixed effects Dear All, This may sound like a dumb question but I am trying to use a mixed model to determine the predictors of bat activity along hedges within 8 sites. So my response is continuous (bat passes) my predictors fixed effects are continuous (height metres), width (metres) etc and the random effect is site - can you tell me if the fixed effects can be continuous as all the examples I have read show them as categorical, but this is not covered in any documents I can find. Help! Emma __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Re%3A-Mixed-models-fixed-effects-tp22456368p22460248.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Comparing data frames and keeping non-matches
Hi R users, I am trying to compare 2 data frames by subject and match and save the no matches to an object called nomatch, but I am getting unexpected results... Can anyone tell me how to correct the code to get the expected results shown in the last table? Many thanks in advance for your any help! library(reshape) year - c(2100:2110) x1 - c(F,T,T,F,F,F,T,F,T,T,F) df1 - data.frame(cbind(year, x1)) df1$subject - c(1,1,1,2,2,3,3,3,3,4,4) df1$match - 1; df1 df2 - data.frame(cbind(year, x1)) df2$subject - c(1:11) df2$match - 1; df2 key - c(subject, match) nomatch - subset(df2, is.element(df2[,key], df1[,key])==FALSE); nomatch rm(list=ls()) Unexpected Results year x1 subject match 1 2100 0 1 1 3 2102 1 3 1 5 2104 0 5 1 7 2106 1 7 1 9 2108 1 9 1 11 2110 0 11 1 Results I expected year x1 subject match 5 2104 0 5 1 6 2105 0 6 1 7 2106 1 7 1 8 2107 0 8 1 9 2108 1 9 1 10 2109 1 10 1 11 2110 0 11 1 -- View this message in context: http://www.nabble.com/Comparing-data-frames-and-keeping-non-matches-tp22460451p22460451.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] OT: Likelihood ratio for the randomization/permutation test?
On Wed, 11 Mar 2009, Mike Lawrence wrote: Hi guRus, My discipline (experimental psychology) is gradually moving away from Null Hypothesis Testing and towards measures of evidence. One measure of evidence that has been popular of late is the likelihood ratio. Glover Dixon (2005) demonstrate the calculation of the likelihood ratio from ANOVA tables, but I'm also interested in non-parametric statistics and wonder if anyone has any ideas on how to compute a likelihood ratio from a randomization test (aka. permutation test)? You cannot get the likelihood ratio from just the null, you need an alternative. The alternative would have to provide different probabilities to the individual permutations than under the null I guess, so if you have a framework where this makes sense you are in business. I suspect you might be aiming in the direction of empirical likelihood for which there is a literature - Google 'empirical likelihood'. Also to turn this back to R, check out 'emplik' on CRAN. HTH, Chuck Say one had two groups and were interested in whether the mean scores of the two groups differ in a manner consistent with random chance or in a manner consistent with a non-null effect of some manipulation applied to the two groups. The randomization test addresses this by randomly re-assigning the participants to the groups, re-computing the difference between means, and repeating many times, yielding a distribution of simulated difference scores that represents the distribution expected by chance. Within a Null Hypothesis Testing framework you then estimate the probability of the null by observing the proportion of simulated difference scores that are greater in magnitude than the observed difference score. Any guesses on how to translate this into a quantification of evidence? Mike -- Mike Lawrence Graduate Student Department of Psychology Dalhousie University Looking to arrange a meeting? Check my public calendar: http://tinyurl.com/mikes-public-calendar ~ Certainty is folly... I think. ~ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] compile under Ubuntu 8.10 (Ibex)
On Wed, Mar 11, 2009 at 11:11 AM, Terry Therneau thern...@mayo.edu wrote: I fired up a new machine last night and loaded Ubuntu 8.10 on it. I then had to add in the usual compiler stuff which is not loaded by default: make, emacs, fortran, c++, etc. I'm having trouble compiling R 2.8 however. I get a message that it cannot figure out how to link f77 and C. I've pulled in gfortran. A google search on fortran ubuntu 8.10 shows various problems with the gcc switch from fortran77 to fortran95 --- also true for R? Any hints are welcome, including don't use 8.10 (thought that will be an evening of work). Read http://cran.us.r-project.org/bin/linux/ubuntu to learn how to add the CRAN repository to your package search path then install the r-base and r-base-dev packages. If you want all the development packages to allow you to compile and install development versions of R then run sudo apt-get build-dep r-base and all the required packages will be magically added. If you want to see exactly how Dirk has set up the compilation of R under Debian then install the wajig package, run cd /tmp wajig build r-base and sit back and enjoy. Finally note that the mailing list R-SIG-Debian is more suitable for questions related to Debian and Debian-derived distributions like Ubuntu. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RES: North Arrow (.png file) on a Map
Also, see # install.packages('sp') example('spplot', package='sp') Kingsford Jones On Wed, Mar 11, 2009 at 9:38 AM, Greg Snow greg.s...@imail.org wrote: Another possibility is the my.symbols function in the TeachingDemos package. You can define a polygon of your arrow (or other symbol), then use my.symbols to add it to an existing graph, the advantage of my.symbols is that the size of the arrow is absolute, not in terms of the scale of the original plot. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Rodrigo Aluizio Sent: Wednesday, March 11, 2009 4:22 AM To: 'Yihui Xie' Cc: R Help Subject: [R] RES: North Arrow (.png file) on a Map Thanks a lot Yihui, It's perfect, in fact exactly identical the I have as a png file in my computer. Thank you so much. Regards Rodrigo -Mensagem original- De: Yihui Xie [mailto:xieyi...@gmail.com] Enviada em: quarta-feira, 11 de março de 2009 02:48 Para: Rodrigo Aluizio Cc: R Help Assunto: Re: [R] North Arrow (.png file) on a Map Is this arrow satisfactory for you? north.arrow = function(x, y, h) { polygon(c(x, x, x + h/2), c(y - h, y, y - (1 + sqrt(3)/2) * h), col = black, border = NA) polygon(c(x, x + h/2, x, x - h/2), c(y - h, y - (1 + sqrt(3)/2) * h, y, y - (1 + sqrt(3)/2) * h)) text(x, y, N, adj = c(0.5, 0), cex = 4) } plot(1, type = n, ylim = c(0, 1)) north.arrow(1, 0.8, 0.3) Regards, Yihui -- Yihui Xie xieyi...@gmail.com Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086 Mobile: +86-15810805877 Homepage: http://www.yihui.name School of Statistics, Room 1037, Mingde Main Building, Renmin University of China, Beijing, 100872, China On Tue, Mar 10, 2009 at 7:21 PM, Rodrigo Aluizio r.alui...@gmail.com wrote: Hi list. I would like to know how do I insert a North arrow, stored as a png file in my computer, in a map? I found lots of post asking similar things, one of them mentioned the pixmap package. The map was done using map() and shapefiles (the code is below). I'm using the pixmap () and addlogo() functions. Well I can import the png with pixmap() function (I guess, once there's no error message), but I can't put It on the map, I got an error message telling me that: Error at t(x...@index[nrow(x...@index):1, , drop = FALSE]) : index out of limits Well I tried changing coordinates but I always got the same result. How do I do this correctly? Is there a better way? Thanks for the help and attention. Here is the complete map script: library(RODBC) library(maps) library(mapdata) library(maptools) library(pixmap) #Carregar Coordenadas e dados dos Pontos Amostrais Dados-odbcConnectExcel('Campos.xls',readOnly=T) Coord-sqlFetch(Dados,'CoordMed',colnames=F,rownames='Ponto') odbcClose(Dados) N-pixmap('Norte.png',nrow=166,ncol=113) # Carregar pontos e shapes Batimetria- readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontol ogia/Campos/ShapeFiles/Batimetria_BC.shp') Estados- readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontologi a/Campos/ShapeFiles/Estados_Sudeste.shp') Faciologia- readShapeSpatial('C:/Users/Rodrigo/Documents/UFPR/Micropaleontol ogia/Campos/ShapeFiles/Faciologia_BC.shp') # Mapa com os Pontos da Bacia postscript('MapaCampos.eps',paper='special',onefile=F,horizontal=F,widt h=3.5 ,height=4.5,bg='white',pointsize=3) par(mar=c(3,2,2,0)) map('worldHires','brazil',ylim=c(23.9,20.3),xlim=c(42.1,39.2),type='n') plot(Faciologia,ylab='',xlab='',col=c('lightgreen','lightgreen','lightg reen' ,'lightgreen','lightgreen','lightgray','lightgray','lightgray','lightgr ay',' lightgray','lightgray','lightgray','lightgray','lightgray','lightgray', 'ligh tgray','lightgray','lightgray','lightgray','lightgray','lightgray','lig htyel low','lightyellow','lightyellow'),add=T,lwd=0.5,border=0) plot(Batimetria,ylab='',xlab='',col='darkgray',lty='solid',lwd=0.2,add= T) plot(Estados,ylab='',xlab='',lty='solid',add=T,lwd=0.8) text(Coord$Longitude[Coord$Réplicas=='1'],Coord$Latitude[Coord$Réplicas =='1' ],rownames(Coord)[Coord$Réplicas=='1'],col='red',cex=0.5,font=2) text(Coord$Longitude[Coord$Réplicas=='2'],Coord$Latitude[Coord$Réplicas =='2' ],rownames(Coord)[Coord$Réplicas=='2'],col='yellow',cex=0.5,font=2) text(Coord$Longitude[Coord$Réplicas=='3'],Coord$Latitude[Coord$Réplicas =='3' ],rownames(Coord)[Coord$Réplicas=='3'],col='blue',cex=0.5,font=2) points(Coord$Longitude,Coord$Latitude-0.045,pch=20,cex=0.7) text(c(41.5,41.3),c(21.7,20.6),c('RJ','ES')) axis(1,xaxp=c(42.1,39.2,2),cex.axis=1) axis(2,yaxp=c(23.9,20.3,4),cex.axis=1) title(main='Bacia')
Re: [R] compile under Ubuntu 8.10 (Ibex) - Thanks
Thanks for the good replies. 1. build-essential -- this is likely the essential thing that I forgot. I tend to leave running machines alone (for years at a time) so forget things like this. My laptop is Fedore core 6 I think, when the prior laptop died. I did remember bin-utils though. 2. See the R Admin manual. I read that pretty closely looking for hints. Perhaps the build-essential hint would be a good addition to the manual. 3. 'r-sig-debian' for Ubuntu questions: I never make that connection. 4. Current binaries for R can be found at... Also a good plan. But I may need to build R-devel (for the ShortRead package in bioconductor, to deal with Solexa sequencer data) so I wanted to work the bugs out. 5. It worked for me Useful too, since it told me it had to be something SIMPLE that I had overlooked. Thanks to Frank Harrell, Berwin Turlach, Chuck Taylor, Dirk Eddelbuettel, Roland Rau, Kjetil Halvorsen. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multilevel Modeling using glmmPQL
Hi, I'm trying to perform a power simulation for a simple multilevel model, using the function glmmPQL in R version 2.8.1. I want to extract the p-value for the fixed-effects portion of the regression, but I'm having trouble doing that. I can extract the coefficients (summary(fit)$coeff), and the covariance matrix (summary(fit)$varFix), but I can't grab the p-value (or the t-statistic.) Could someone explain how to do this? Please send responses to hal...@health.nyc.gov. Thanks in advance, Howard DISCLAIMER:\ Sample Disclaimer added in a VBScript.\ \ ...{{dropped:6}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ets() initialization
Hi, I have been working on ets() for exponential smoothing. R initialize the forecast itself to minimize the error, I want to set an initialization value for the first period(such as first period's acutal value). However, I could not find how to do it. Would you please help me? Regards, Kezban YAgci -- Kezban Yagci Masters Student Industrial and Systems Engineering Georgia Institute of Technology __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multilevel Modeling using glmmPQL
On 3/11/2009 11:29 AM, Howard Alper wrote: Hi, I'm trying to perform a power simulation for a simple multilevel model, using the function glmmPQL in R version 2.8.1. I want to extract the p-value for the fixed-effects portion of the regression, but I'm having trouble doing that. I can extract the coefficients (summary(fit)$coeff), and the covariance matrix (summary(fit)$varFix), but I can't grab the p-value (or the t-statistic.) Could someone explain how to do this? Please send responses to hal...@health.nyc.gov. library(MASS) fm1 - glmmPQL(y ~ trt + I(week 2), random = ~ 1 | ID, family = binomial, data = bacteria) summary(fm1)$tTable[,5] Look at str(summary(fm1)) . Thanks in advance, Howard DISCLAIMER:\ Sample Disclaimer added in a VBScript.\ \...{{dropped:6}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is this a documentation bug? Spss dates import
Luca, I ran your code using SPSS 17.0 and R 2.8.1. I found that the correct date was returned in R using this code: test.df$newdate - as.Date(as.POSIXct(test.df$mydate , origin=1582-10-14)) Also note that SPSS 17.0 is case sensitive in storing variable names, so the R code in your post needs to be changed from test.df$MYDATE to test.df$mydate to match your use of lowercase in the SPSS syntax. Harold Luca Braglia wrote: Hello R-user bug seekers are needed! In order to perform these simple tasks you have to use a copy of SPSS and obviously R -- View this message in context: http://www.nabble.com/Is-this-a-documentation-bug--Spss-dates-import-tp22455442p22461929.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Iterations of random sampling
I have a univariate binary (1,0) 230,000 records, I need to make 20,000 iterations of random sampling of a fixed size. Where I put the result of the sum of selected records for each repetition Thank's _ of your life [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Forecasting with dlm
Hi Michael, There was a bug in dlmForecast that I have fixed. While the new version of the package makes its way to CRAN, you can source the attached file, which contains the amended version of dlmForecast. Best, Giovanni Petris Date: Wed, 11 Mar 2009 13:39:36 + From: Michael Pearmain mpearm...@google.com Sender: r-help-boun...@r-project.org Precedence: list DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=google.com; s=beta; t=1236778779; bh=FXcHfXd155+Rn9gFIVOikSMaQWI=; DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; Hi All, I have a problem trying to forecast using the dlm package, can anyone offer any advise? I setup my problem as follows, (following the manual as much as possible) data for example to run code CostUSD - c(27.24031,32.97051, 38.72474, 22.78394, 28.58938, 49.85973, 42.93949, 35.92468) library(dlm) buildFun - function(x) { dlmModPoly(1, dV = exp(x[1]), dW = exp(x[2])) } fit - dlmMLE(CostUSD, parm = c(0,0), build = buildFun) fit$conv dlmCostUSD - buildFun(fit$par) V(dlmCostUSD) W(dlmCostUSD) #For comparison StructTS(CostUSD, level) CostUSDFilt - dlmFilter(CostUSD, dlmCostUSD) CostUSDFore - dlmForecast(CostUSDFilt, nAhead = 1) after which i return the error message: Error in mod$m[lastObsIndex, ] : incorrect number of dimensions Can anyone offer any insight to this problem? Thanks in advance Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Giovanni Petris gpet...@uark.edu Associate Professor Department of Mathematical Sciences University of Arkansas - Fayetteville, AR 72701 Ph: (479) 575-6324, 575-8630 (fax) http://definetti.uark.edu/~gpetris/__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there any difference between - and =
Duncan Murdoch wrote: Use - for assignment, and = for function arguments. Then the difference between f( a = 3 ) f( a - 3 ) is clear, and you won't be surprised that a gets changed in the second case. If you use = for assignment, the two lines above will be written as f( a = 3 ) f( ( a = 3 ) ) and it is very easy to miss the crucial difference between them. in fact, some recent posts show that things can go the other way round: people try to use - for function arguments. i think the following is the most secure way if one really really has to do assignment in a function call: f({a=3}) and if one keeps this convention, - can be dropped altogether. vQ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Couple of Questions about Classification trees
So I have 2 sets of data - a training data set and a test data set. I've been doing the analysis on the training data set and then using predict and feeding the test data through that. There are 114 rows in the training data and 117 in the test data and 1024 columns in both. It's actually the same set of data split into two. The rows are made of 5 different numbers. They do represent something but it would take too long to explain. I want to try and find a classification rule for the 5 numbers in the rows based on the columns so I created a classification tree and plotted that and then pruned it. My question is how do you print the misclassification rate at each node on the actual diagram of the classification tree. I can't seem to get it up there. In my notes it uses gmistext() but I have a feeling that it's for Splus rather than R as gmistext() doesn.t work for me either. Second question is when I try using the predict.tree to put the test data into the tree and then plot it it comes up with a really weird and wrong looking plot. Here is the code I'm using: tree1 - tree(row~.,data=train) pruned.tree - prune.tree(tree1, best = 5, method = misclass) predict.tree1 - predict.tree(prune.tree, data = main) plot(predict.tree);text(predict.tree) I sort of don't get a classification tree, I get the x axis labelled 1, the y axis labelled 2 and then about 4 small black rectangles scattered across the plot. Thanks in Advance. -- View this message in context: http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22461673.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Iterations of random sampling
On 3/11/2009 3:15 PM, René Pineda wrote: I have a univariate binary (1,0) 230,000 records, I need to make 20,000 iterations of random sampling of a fixed size. Where I put the result of the sum of selected records for each repetition X - rbinom(23, 1, .5) sample.sums - replicate(2, sum(sample(X, 10))) Thank's _ of your life [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] tracing SV3 methods
I would like to put traces on the methods for an SV3 generic and trace() doesn't seem to work. E.g., median(1:10) [1] 5.5 trace(median.default) median(1:10) # expect Tracing median.default ... [1] 5.5 I can tell that median.default is getting called, since a trace on sort() shows it in the traceback: trace(sort,Quote(cat(deparse(sys.calls()),sep=\n))) Tracing function sort in package base [1] sort median(1:10) Tracing sort(x, partial = half + 0L:1L) on entry list(median(1:10), median.default(1:10), sort(x, partial = half + 0L:1L), .doTrace(cat(deparse( sys.calls()), sep = \n), on entry), eval.parent(exprObj), eval(expr,p), eval(expr, envir, enclos)) [1] 5.5 Is there a recommended way to do this? (I'd like to put a trace on all methods of a SV3 generic to see which get called in various situations.) This was in R 2.8.1 on Windows but I see the same results in several versions of R. Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reshape question.
This hopefully is trivial. I am trying to reshape the data using the reshape package. First I read in the data: a2009 - read.csv(Total2009.dat, header = TRUE) Then I trim it so that it only contains the columns that I have interested in: m2009 - melt(a2009, id.var=c(DayOfYear,Category,SubCategory,Sku), measure.var=c(Quantity), na.rm=TRUE) Then I start to formulate the data that I will process: c2009 - cast(m2009, DayOfYear ~ variable | Category, sum) Finally I aggregate the data: t2009 - cast(m2009, DayOfYear ~ variable, sum) My question is on the third step above (repeated here) c2009 - cast(m2009, DayOfYear ~ variable | Category, sum) This gets the data assocated with a unique 'Category' name. I want to get the data grouped by 'Category' and 'SubCategory'. The 'SubCategory' is not unique but the combination 'Category' and 'SubCategory' form a unique pair. What would be the formula that would give me the data grouped by Category AND SubCategory? Would it be as simple as: c2009 - cast(m2009, DayOfYear ~ variable | Category SubCategory, sum) ? Thank you for your suggestions. Kevin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pseudo-random numbers between two numbers
Because of a mistake I made in copying code into email, there has been confusion (expressed in some private emails). Hence the corrected version is appended at the end. On 11-Mar-09 00:05:26, Ted Harding wrote: I have modified my example to make it more convincing! See at end. On 10-Mar-09 23:39:17, Ted Harding wrote: On 10-Mar-09 23:01:45, g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. Any idea? Thanks. -james It would certainly work as intended! For instance: truncnorm-function(HowMany,Min,Max,mean=0,sd=1){ qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd)} Sample - truncnorm(1000,-1,2.5) hist(Sample,breaks=100) Hoping this helps, Ted. truncnorm-function(HowMany,Min,Max,mean=0,sd=1){ qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd)} Sample - truncnorm(10,-1,2.5) hist(Sample,breaks=100) u0-(-0.975)+0.05*(0:69) yy-dnorm(u0) du-min(diff(H$breaks)) Y - 10*yy*du/(pnorm(2.5)-pnorm(-1.0)) lines(u0,Y) Ted. Corrected version: -- truncnorm-function(HowMany,Min,Max,mean=0,sd=1){ qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd)} Sample - truncnorm(10,-1,2.5) H - hist(Sample,breaks=100) u0-(-0.975)+0.05*(0:69) yy-dnorm(u0) du-min(diff(H$breaks)) Y - 10*yy*du/(pnorm(2.5)-pnorm(-1.0)) lines(u0,Y) Apologies for the error. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Mar-09 Time: 19:48:12 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] error.bars
Hi, I'm trying to use the function error.bars, but the program don't find it, and I dont't found any package with this function. Is there some another functin to draw barplots with error bars? Sueli Rodrigues Eng. Agrônoma - UNESP Mestranda - USP/ESALQ PPG-Solos e Nutrição de Plantas Fones (19)93442981 (19)33719762 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] About the merge
You can assign a group number to each run of identical values of 'Rep' with firstInRun - with(data, c(TRUE, Rep[-1]!=Rep[-length(Rep)])) group - cumsum(firstInRun) where 'data' is your data.frame's name. Once you've assigned the runs to groups then you can use, e.g., sapply(split()) to compute the group sums summed_App_dur - with(data, sapply(split(App_dur,group), sum)) and put them into the shortened data.frame with newdata - data[firstInRun,] newdata$App_dur - summed_App_dur There are various tricks for speeding up the sapply(split(),sum), if that is a problem. Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com - DateDtime Hour Min Second Rep App_dur 9 2006-02-22 14:36:11 14 36 11 4 1 10 2006-02-22 14:36:12 14 36 12 3 86 11 2006-02-22 14:37:38 14 37 38 0 58 14 2006-02-22 14:38:36 14 38 36 3 1 15 2006-02-22 14:38:37 14 38 37 4 1 16 2006-02-22 14:38:38 14 38 38 1 9 18 2006-02-22 14:38:47 14 38 47 0 3 20 2006-02-22 14:38:50 14 38 50 1 1 21 2006-02-22 14:38:51 14 38 51 4 2 *** 23 2006-02-22 14:38:53 14 38 53 4 39 *** 25 2006-02-22 14:39:32 14 39 32 3 1 26 2006-02-22 14:39:33 14 39 33 4 1 27 2006-02-22 14:39:34 14 39 34 3 8 28 2006-02-22 14:39:42 14 39 42 4 62 How to write a program to merge Rep with consecutive equal numbers to one of that number and sum up App_dur?? on the above data frame, i want to merge 21 2006-02-22 14:38:51 14 38 51 4 2 *** 23 2006-02-22 14:38:53 14 38 53 4 39 *** to 2006-02-22 14:38:51 14 38 51 4 41 ... Thanks. Tammy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Couple of Questions about Classification trees
Jen_mp3 wrote: So I have 2 sets of data - a training data set and a test data set. I've been doing the analysis on the training data set and then using predict and feeding the test data through that. There are 114 rows in the training data and 117 in the test data and 1024 columns in both. It's actually the same set of data split into two. The rows are made of 5 different numbers. They do represent something but it would take too long to explain. Your sample size is too small by a factor of perhaps 100 for simple data splitting to provide stable results. Then you have the problem of an improper scoring rule, i.e., one that when optimized gives the wrong answer. Frank Harrell I want to try and find a classification rule for the 5 numbers in the rows based on the columns so I created a classification tree and plotted that and then pruned it. My question is how do you print the misclassification rate at each node on the actual diagram of the classification tree. I can't seem to get it up there. In my notes it uses gmistext() but I have a feeling that it's for Splus rather than R as gmistext() doesn.t work for me either. Second question is when I try using the predict.tree to put the test data into the tree and then plot it it comes up with a really weird and wrong looking plot. Here is the code I'm using: tree1 - tree(row~.,data=train) pruned.tree - prune.tree(tree1, best = 5, method = misclass) predict.tree1 - predict.tree(prune.tree, data = main) plot(predict.tree);text(predict.tree) I sort of don't get a classification tree, I get the x axis labelled 1, the y axis labelled 2 and then about 4 small black rectangles scattered across the plot. Thanks in Advance. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.