Re: [R] S curve via R
Hello sir: How can I get an S curve function via R? For SPSS, the function is: y = exp(b0 + b1/x). I am not sure if this is the answer you want, but

Scurve <- function(x, b0 = 0, b1 = 1) {
  exp(b0 + b1/x)
}

should do what you request. Greetings Johannes __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
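A minimal usage sketch (not part of the original reply; the parameter values are illustrative assumptions) showing that the function vectorizes over x, and that b1 < 0 gives the increasing S shape:

```r
Scurve <- function(x, b0 = 0, b1 = 1) {
  exp(b0 + b1/x)
}

Scurve(2)                          # exp(0 + 1/2) = exp(0.5), about 1.6487
x <- seq(0.5, 10, by = 0.5)
y <- Scurve(x, b0 = 0, b1 = -1)    # illustrative: b1 < 0 rises toward exp(b0) = 1
round(head(y), 3)
```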
Re: [R] [Way OT] New hardware
Sean == Sean Davis [EMAIL PROTECTED] on Tue, 25 Jul 2006 17:16:02 -0400 writes: Sean Can anyone share experience with opteron versus the Sean xeon (woodcrest) for R under linux? I am looking at Sean using 16-32Gb of RAM in a workstation (as opposed to a Sean server). Hmm, not that I'd be an expert... If you want to use that much RAM, you want a 64-bit architecture and software (OS, libraries, compilers, ...), right? AFAIK, that's been known to work well with Opterons and different flavours of Linux (e.g. we have dual Opterons, one with Redhat Enterprise and two with Ubuntu 6.06). Now I read that there are 64-bit Xeons with EM64T (which is said to be Intel's implementation of AMD64), so in principle the same versions of Linux and R should run there as well. Since I haven't heard of any success stories, I'm interested as well in reports from R users. Martin Maechler, ETH Zurich
Re: [R] command completion in R-WinEdt
Franco Mendolia wrote: Hello! Is there any possibility to use command completion in R-WinEdt? You can make use of the Command Completion Wizard for WinEdt, available at http://www.winedt.org/Plugins/. A list of function names ships with the RWinEdt package (file R.lst). Simply make it known to the Command Completion Wizard. Uwe Ligges Thanks Franco
Re: [R] [Way OT] New hardware
On Wed, 26 Jul 2006, Martin Maechler wrote: Sean == Sean Davis [EMAIL PROTECTED] on Tue, 25 Jul 2006 17:16:02 -0400 writes: Sean Can anyone share experience with opteron versus the Sean xeon (woodcrest) for R under linux? I am looking at Sean using 16-32Gb of RAM in a workstation (as opposed to a Sean server). Hmm, not that I'd be an expert... If you want to use that much RAM, you want a 64-bit architecture and software (OS, libraries, compilers, ...), right? AFAIK, that's been known to work well with Opterons and different flavours of Linux (e.g. we have dual Opterons, one with Redhat Enterprise and two with Ubuntu 6.06). Now I read that there are 64-bit Xeons with EM64T (which is said to be Intel's implementation of AMD64), so in principle the same versions of Linux and R should run there as well. Since I haven't heard of any success stories, I'm interested as well in reports from R users. There have been several posted here or on R-devel. Things do change, but every time we have had a formal test, Opterons were considerably better than Xeons on performance/£. [BTW, I don't think you will get a workstation motherboard that takes 32Gb of RAM (although things change fast): my own machine has a server motherboard in a small tower case.] -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Overplotting: plot() invocation looks ugly ... suggestions?
And if lattice is ok then try this:

library(lattice)
xyplot(Consumption ~ Quarter, group = Year, data, type = "o")

Or you can use ggplot:

install.packages("ggplot")
library(ggplot)
qplot(Quarter, Consumption, data = data, type = c("point", "line"), id = data$Year)

Unfortunately this has uncovered a couple of small bugs for me to fix (no automatic legend, and having to specify the data frame explicitly). The slightly more verbose example below shows you what it should look like.

data$Year <- factor(data$Year)
p <- ggplot(data, aes = list(x = Quarter, y = Consumption, id = Year, colour = Year))
ggline(ggpoint(p), size = 2)

Regards, Hadley
Re: [R] convert decimals to fractions - sorted
Hi Muhammad, How about this?

library(MASS)  # for as.fractions()
at <- read.table(textConnection(a))
at2 <- cbind(at, jeebee = as.character(as.fractions(as.numeric(at[, 2]))))
sort.order <- order(at2$V2)
at2[sort.order, ]
at2[sort.order, c(1, 3)]

JeeBee.
Re: [R] Overplotting: plot() invocation looks ugly ... suggestions?
Hello, I would like to ask a question regarding the use of a grey background (by ggplot in this case, but also in other settings - I seem to remember a relevant lattice discussion). It seems that it is generally discouraged by journals. I guess one practical reason is that it makes photocopying difficult (in the sense that it may lead to low-contrast situations). It might also have to do with printing costs, as it leads to higher coverage of the page, but I do not know about that. [Disclaimer: it does look nice, though.] Any comments? Thanks, Costas On 7/26/06, hadley wickham [EMAIL PROTECTED] wrote: And if lattice is ok then try this: library(lattice); xyplot(Consumption ~ Quarter, group = Year, data, type = "o") Or you can use ggplot: install.packages("ggplot"); library(ggplot); qplot(Quarter, Consumption, data = data, type = c("point", "line"), id = data$Year) Unfortunately this has uncovered a couple of small bugs for me to fix (no automatic legend, and having to specify the data frame explicitly). The slightly more verbose example below shows you what it should look like. data$Year <- factor(data$Year); p <- ggplot(data, aes = list(x = Quarter, y = Consumption, id = Year, colour = Year)); ggline(ggpoint(p), size = 2) Regards, Hadley
[R] Power tests for ROC analysis
Dear List, please forgive this question from a hobby statistician, but I was wondering if there is any way of doing power calculations to estimate how much data is needed so that the sensitivity/specificity values along a ROC curve will be within a certain confidence interval? I am not aware of any such method, but was recently asked how much data would be needed to perform ROC analysis for a study. Thanks a lot, Peter Dr. med. Peter Robinson, MSc. Institut für Medizinische Genetik Universitätsklinikum Charité Humboldt-Universität Augustenburger Platz 1 13353 Berlin Phone: ++49-30-450 569124 Fax: ++49-30-450 569915 [EMAIL PROTECTED] http://www.charite.de/ch/medgen/robinson
Re: [R] Overplotting: plot() invocation looks ugly ... suggestions?
Constantinos Antoniou wrote: I would like to ask a question regarding the use of a grey background (by ggplot in this case, but also in other settings - I seem to remember a relevant lattice discussion). It seems that it is generally discouraged by journals. I guess one practical reason is that it makes photocopying difficult (in the sense that it may lead to low-contrast situations). It might also have to do with printing costs, as it leads to higher coverage of the page, but I do not know about that. [Disclaimer: it does look nice, though.] Any comments? Just a small one: The grey background used by ggplot does look nice; the one used by earlier versions of lattice did not. All IMHO, of course. -- Karl Ove Hufthammer E-mail and Jabber: [EMAIL PROTECTED]
Re: [R] Overplotting: plot() invocation looks ugly ... suggestions?
I would like to ask a question regarding the use of a grey background (by ggplot in this case, but also in other settings - I seem to remember a relevant lattice discussion). It seems that it is generally discouraged by journals. I guess one practical reason is that it makes photocopying difficult (in the sense that it may lead to low-contrast situations). It might also have to do with printing costs, as it leads to higher coverage of the page, but I do not know about that. [Disclaimer: it does look nice, though.] Any comments? It is very easy to change to the usual black-on-white grid lines (see ?ggopt and ?ggsave), so if your journal does require it, it's easy to turn off. Here are a few reasons I like the gray background (in no particular order):

* you can then use white gridlines, which minimally impinge on the plot, but still aid lookup to the relevant axis
* the color of the plot more closely matches the color (in the typographic sense) of the text, so that the plot fits into a printed document without drawing so much attention to itself
* the contrast between the plot surface and the points is a little lower, which makes it a bit more pleasant to read

Of course the big disadvantage is if you don't have a high-quality printer, or are looking at a photocopy of a photocopy, etc. This disadvantage should go away with time as the quality of printed output steadily improves. Regards, Hadley
Re: [R] Axis Title in persp() Overlaps with Axis Labels
Dear Kilian, Also have a look at: http://wiki.r-project.org/rwiki/doku.php?id=graph_gallery:new-graphics You will see a new and very flexible function for 3D plots. Regards, __ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] Paul Murrell p.murrell at auckland.ac.nz writes: Hi Kilian Plank wrote: Good morning, in a 3D plot based on persp() the axis title (of dimension z) overlaps with the axis labels. How can the distance (between axis labels and axis title) be increased? Paul Another way to do it: get the perspective matrix back from persp() and use trans3d() to redo essentially the same calculations that persp() does to decide where to put the label:

x <- seq(-10, 10, length = 30)
y <- x
f <- function(x, y) { r <- sqrt(x^2 + y^2); 10 * sin(r)/r }
z <- outer(x, y, f)
z[is.na(z)] <- 1
par(mfrow = c(2, 2))
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue",
      ticktype = "detailed")
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue",
      ticktype = "detailed", zlab = "\n\n\n\nz")
p1 <- persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue",
            ticktype = "detailed", zlab = "")
ranges <- t(sapply(list(x, y, z), range))
means <- rowMeans(ranges)
## label offset distance, as a fraction of the plot width
labelspace <- 0.12  ## tweak this until you like the result
xpos <- min(x) - diff(range(x)) * labelspace
ypos <- min(y) - diff(range(y)) * labelspace
labelbot3d <- c(xpos, ypos, min(z))
labeltop3d <- c(xpos, ypos, max(z))
labelmid3d <- c(xpos, ypos, mean(range(z)))
trans3dfun <- function(v) { trans3d(v[1], v[2], v[3], p1) }
labelbot2d <- trans3dfun(labelbot3d)
labelmid2d <- trans3dfun(labelmid3d)
labeltop2d <- trans3dfun(labeltop3d)
labelang <- 180/pi * atan2(labeltop2d$y - labelbot2d$y, labeltop2d$x - labelbot2d$x)
par(xpd = NA, srt = labelang)  ## disable clipping and set string rotation
text(labelmid2d$x, labelmid2d$y, "z label")
[R] Main title of plot
I am a newbie, and I am afraid this may be a rather trivial question. However I could not find the answer anywhere. I am plotting a series of plots with different values for p. In the main title of a plot I have used the following code:

plot(a, b, type = "l", ylim = c(0, 1), xlab = "freq", ylab = "power",
     main = c("maximum gain=", p))

That works fine. However the value of p is plotted on a new line, instead of just after the "=". Is there any way to print the value of p on the same line? Thanks Marco
Re: [R] Main title of plot
This was just discussed yesterday. See the thread: https://www.stat.math.ethz.ch/pipermail/r-help/2006-July/109931.html On 7/26/06, Marco Boks [EMAIL PROTECTED] wrote: I am a newbie, and I am afraid this may be a rather trivial question. However I could not find the answer anywhere. I am plotting a series of plots with different values for p. In the main title of a plot I have used the following code: plot(a, b, type = "l", ylim = c(0, 1), xlab = "freq", ylab = "power", main = c("maximum gain=", p)) That works fine. However the value of p is plotted on a new line, instead of just after the "=". Is there any way to print the value of p on the same line? Thanks Marco
[R] randomForest question
Hello, I have a question regarding randomForest (from the package of the same name). I have 16 features (nominal), 159 positive and 318 negative cases that I'd like to classify (binary classification). Using the tuning from the e1071 package, it turns out that the best performance is reached when using all 16 features per tree (mtry=16). However, the documentation of randomForest suggests taking sqrt(#features), i.e. 4. How can I explain this difference? When using all features, this is the same as a classical decision tree, with the difference that the tree is built and tested with different data sets, right? Example (I've tried different configurations, incl. changing ntree):

param <- try(tune(randomForest, class ~ ., data = d.all318,
                  ranges = list(mtry = c(4, 8, 16), ntree = c(1000))))
summary(param)

Parameter tuning of `randomForest':
- sampling method: 10-fold cross validation
- best parameters: mtry = 16, ntree = 1000
- best performance: 0.1571809
- Detailed performance results:
  mtry ntree     error
1    4  1000 0.1928635
2    8  1000 0.1634752
3   16  1000 0.1571809

thanks a lot for your help, kind regards,
Re: [R] convert decimals to fractions - sorted
Dear all, Thanks for your help. I played with your suggestion and still didn't get the sorted summary which I need.

t(table(at2[sort.order, c(1, 3)]))
        V1
jeebee   -1  1
  0       0  4
  11/21   0  1
  1/2     1  0
  1/21    1  1
  13/42   1  1
  17/42   0  2
  2/21    0  3
  3/14    1  2
  5/42    0  1
  8/21    0  1

I need the result summary (ordered) like:

       -1  1
0/42    0  4
2/42    1  1
4/42    0  3
5/42    0  1
9/42    1  2
13/42   1  1
16/42   0  1
17/42   0  2
21/42   1  0
22/42   0  1

Thanks very much for any suggestions. Groeten Regards, Muhammad Subianto On 7/26/06, JeeBee [EMAIL PROTECTED] wrote: Hi Muhammad, How about this? at <- read.table(textConnection(a)); at2 <- cbind(at, jeebee = as.character(as.fractions(as.numeric(at[, 2])))); sort.order <- order(at2$V2); at2[sort.order, ]; at2[sort.order, c(1, 3)] JeeBee. On 7/25/06, Muhammad Subianto [EMAIL PROTECTED] wrote: Dear all, Based on my question a few months ago https://stat.ethz.ch/pipermail/r-help/2006-January/086952.html and solved with https://stat.ethz.ch/pipermail/r-help/2006-January/086955.html https://stat.ethz.ch/pipermail/r-help/2006-January/086956.html and from https://stat.ethz.ch/pipermail/r-help/2006-January/086958.html

frac.fun <- function(x, den) {
  dec <- seq(0, den) / den
  nams <- paste(seq(0, den), den, sep = "/")
  sapply(x, function(y) nams[which.min(abs(y - dec))])
}
### frac.fun(c(0, 1, 0.827, .06, 0.266), 75)

Now, I have a dataset something like this:

a
-1 0
 1 0.095238095238095
 1 0.214285714285714
-1 0.5
 1 0.309523809523810
-1 0.0476190476190476
 1 0.404761904761905
 1 0.119047619047619
-1 0.214285714285714
-1 0.309523809523810
 1 0
 1 0
 1 0.404761904761905
 1 0.095238095238095
 1 0.047619047619047
 1 0.380952380952381
 1 0.214285714285714
 1 0.523809523809524
 1 0
 1 0.095238095238095

First, I make it as fractions and then sort. I have played around to make it sort, but it didn't succeed.
df <- read.table(textConnection(a))
library(MASS)
as.fractions(as.numeric(df[, 2]))
cbind(table(df[, 2], df[, 1]), summary(as.factor(df[, 2])))
table(frac.fun(as.numeric(df[, 2]), 42), df[, 1])

table(frac.fun(as.numeric(df[, 2]), 42), df[, 1])
       -1  1
0/42    0  4
13/42   1  1
16/42   0  1
17/42   0  2
21/42   1  0
22/42   0  1
2/42    1  1
4/42    0  3
5/42    0  1
9/42    1  2

How to make the result sorted (increasing) like this:

       -1  1
0/42    0  4
2/42    1  1
4/42    0  3
5/42    0  1
9/42    1  2
13/42   1  1
16/42   0  1
17/42   0  2
21/42   1  0
22/42   0  1

Thanks for any help. Best, Muhammad Subianto
[R] Sweave and tth
Dr. Harrell, I tried odfWeave to create an OpenOffice file and found that it exhausted the memory of my large linux machine and took a long time to run. Do you have any details about the problem that you encountered? A bug that someone else had pointed out might be the culprit. I have the default image format as png, but since a lot of linux systems don't have that device automatically available to them, I have a switch for the device in odfWeaveControl:

plotDevice = ifelse(.Platform$OS.type == "windows", "png", "bitmap"),

The bitmap device units are in inches and the bmp device is in pixels. The bug is that the default image size is 480 inches (whoops). Can you try using:

odfWeaveControl(plotHeight = 5, plotWidth = 5, dispHeight = 5, dispWidth = 5)

in your odfWeave call and let me know if this was the issue? I was able to reproduce the error on our linux systems and this fix worked (strange that the package passes R CMD check though). If this works, Section 7 of the odfWeave manual lists two command-line tools (not included in my package) that can do the conversion from odt to Word (or other formats). I really appreciate Max Kuhn's efforts with odfWeave and hope to keep up with its development. No problem. I'll be releasing a bug-fix version to solve the device units issue. Also, others have reported problems with locales. I believe I have a fix for this issue too. Thanks. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University Max -- LEGAL NOTICE\ Unless expressly stated otherwise, this messag...{{dropped}}
Re: [R] Main title of plot
Gabor, I think that this is actually different, since it does not involve plotmath. The issue here is the use of c() in:

main = c("maximum gain=", p)

rather than:

main = paste("maximum gain =", p)

Marco, try this:

plot(a, b, type = "l", ylim = c(0, 1), xlab = "freq", ylab = "power",
     main = paste("maximum gain =", p))

See ?paste for concatenating multiple vectors into a single character vector (string). HTH, Marc Schwartz On Wed, 2006-07-26 at 07:29 -0400, Gabor Grothendieck wrote: This was just discussed yesterday. See the thread: https://www.stat.math.ethz.ch/pipermail/r-help/2006-July/109931.html On 7/26/06, Marco Boks [EMAIL PROTECTED] wrote: I am a newbie, and I am afraid this may be a rather trivial question. However I could not find the answer anywhere. I am plotting a series of plots with different values for p. In the main title of a plot I have used the following code: plot(a, b, type = "l", ylim = c(0, 1), xlab = "freq", ylab = "power", main = c("maximum gain=", p)) That works fine. However the value of p is plotted on a new line, instead of just after the "=". Is there any way to print the value of p on the same line? Thanks Marco
[R] the first and last case
Hi all Some time ago I asked for a solution about how to aggregate data and the help was wonderful. Now, I'd like to know how to extract, for each individual in the example below, the first and the last observation, to obtain this:

ind  y
1    8
1    9
2    7
2   11
3    9
3   10
4    8
4    5

# Below the example:
ind <- c(1,1,1,2,2,3,3,3,4,4,4,4)
y <- c(8,10,9,7,11,9,9,10,8,7,6,5)
dat <- as.data.frame(cbind(ind, y))
dat
attach(dat)
mean.ind <- aggregate(dat$y, by = list(dat$ind), mean)
mean.ind

Thanks Mauricio
[R] Faster alternative to by?
Hi I have a data.frame, two columns, 12304 rows. Both columns are factors. I want to do the equivalent of an SQL GROUP BY statement, and count the number of rows in the data frame for each unique value of the second column. I have:

countl <- by(mapped, mapped$col2, nrow)

Now, mapped$col2 has 10588 levels, so this statement takes a really long time to run. Is there a more efficient way of doing this in R? Thanks Mick
Re: [R] Faster alternative to by?
table(mapped$col2)

--- Jacques VESLOT CNRS UMR 8090 I.B.L (2ème étage) 1 rue du Professeur Calmette B.P. 245 59019 Lille Cedex Tel : 33 (0)3.20.87.10.44 Fax : 33 (0)3.20.87.10.31 http://www-good.ibl.fr --- michael watson (IAH-C) wrote: Hi I have a data.frame, two columns, 12304 rows. Both columns are factors. I want to do the equivalent of an SQL GROUP BY statement, and count the number of rows in the data frame for each unique value of the second column. I have: countl <- by(mapped, mapped$col2, nrow) Now, mapped$col2 has 10588 levels, so this statement takes a really long time to run. Is there a more efficient way of doing this in R? Thanks Mick
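A small sketch (not from the thread; the toy data frame below is an assumption standing in for `mapped`) showing that table() returns the same per-level counts as the by() call, in a single vectorized pass:

```r
# Toy stand-in for the 'mapped' data frame from the question (assumed shape).
mapped <- data.frame(col1 = factor(letters[1:6]),
                     col2 = factor(c("x", "y", "x", "z", "x", "y")))

counts <- table(mapped$col2)   # one vectorized count per factor level
counts                         # x: 3, y: 2, z: 1

# by() computes the same numbers, but loops over groups, which is what
# makes it slow with ~10^4 levels:
countl <- by(mapped, mapped$col2, nrow)
stopifnot(all(counts == unlist(countl[names(counts)])))
```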
Re: [R] the first and last case
do.call(rbind, lapply(split(dat, dat$ind), function(x) x[c(1, nrow(x)), ]))

--- Jacques VESLOT CNRS UMR 8090 I.B.L (2ème étage) 1 rue du Professeur Calmette B.P. 245 59019 Lille Cedex Tel : 33 (0)3.20.87.10.44 Fax : 33 (0)3.20.87.10.31 http://www-good.ibl.fr --- Mauricio Cardeal wrote: Hi all Some time ago I asked for a solution about how to aggregate data and the help was wonderful. Now, I'd like to know how to extract, for each individual in the example below, the first and the last observation, to obtain this: ind y / 1 8 / 1 9 / 2 7 / 2 11 / 3 9 / 3 10 / 4 8 / 4 5 # Below the example: ind <- c(1,1,1,2,2,3,3,3,4,4,4,4); y <- c(8,10,9,7,11,9,9,10,8,7,6,5); dat <- as.data.frame(cbind(ind, y)); dat; attach(dat); mean.ind <- aggregate(dat$y, by = list(dat$ind), mean); mean.ind Thanks Mauricio
[R] R: the first and last case
could it be

dat[unlist(tapply(1:nrow(dat), ind, range)), ]

? stefano -----Original message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] on behalf of Mauricio Cardeal Sent: 26 July, 2006 14:22 To: r-help@stat.math.ethz.ch Subject: [R] the first and last case Hi all Some time ago I asked for a solution about how to aggregate data and the help was wonderful. Now, I'd like to know how to extract, for each individual in the example below, the first and the last observation, to obtain this: ind y / 1 8 / 1 9 / 2 7 / 2 11 / 3 9 / 3 10 / 4 8 / 4 5 # Below the example: ind <- c(1,1,1,2,2,3,3,3,4,4,4,4); y <- c(8,10,9,7,11,9,9,10,8,7,6,5); dat <- as.data.frame(cbind(ind, y)); dat; attach(dat); mean.ind <- aggregate(dat$y, by = list(dat$ind), mean); mean.ind Thanks Mauricio
Re: [R] the first and last case
Dear Jacques, I believe you need dat ordered by ind and y before you apply your solution, right? Sincerely, Carlos J. Gil Bellosta http://www.datanalytics.com http://www.data-mining-blog.com Quoting Jacques VESLOT [EMAIL PROTECTED]: do.call(rbind, lapply(split(dat, dat$ind), function(x) x[c(1, nrow(x)), ])) --- Jacques VESLOT CNRS UMR 8090 I.B.L (2ème étage) 1 rue du Professeur Calmette B.P. 245 59019 Lille Cedex Tel : 33 (0)3.20.87.10.44 Fax : 33 (0)3.20.87.10.31 http://www-good.ibl.fr --- Mauricio Cardeal wrote: Hi all Some time ago I asked for a solution about how to aggregate data and the help was wonderful. Now, I'd like to know how to extract, for each individual in the example below, the first and the last observation, to obtain this: ind y / 1 8 / 1 9 / 2 7 / 2 11 / 3 9 / 3 10 / 4 8 / 4 5 # Below the example: ind <- c(1,1,1,2,2,3,3,3,4,4,4,4); y <- c(8,10,9,7,11,9,9,10,8,7,6,5); dat <- as.data.frame(cbind(ind, y)); dat; attach(dat); mean.ind <- aggregate(dat$y, by = list(dat$ind), mean); mean.ind Thanks Mauricio
Re: [R] [RODBC] ERROR: Could not SQLExecDirect
Peter Eiger Peter.Eiger at gmx.net writes: I've got a problem with RODBC and saving (sqlSave) of a dataframe in Access. R 2.0.1 is running on Windows XP. When executing the examples in the R help for the USArrests data set, sqlSave works fine, but running sqlSave() for a dataframe Adat

str(Adat)
`data.frame': 1202 obs. of 18 variables:

containing 18 columns and ca. 1200 rows fails. I get the following error message:

sqlSave(channel, Adat)
Error in sqlSave(channel, Adat) : [RODBC] ERROR: Could not SQLExecDirect

The data was fetched from the same Access database before and was not manipulated before the attempt to save. Try setting rownames = FALSE in sqlSave; it's TRUE by default, which I believe is a bit unfortunate. And probably append = TRUE. It's also good to try with fast = FALSE first. When I get an error of that type, I first save to a non-existing table, and compare what comes out with the original table. Dieter
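A minimal sketch combining the suggestions above (the database filename and table name are placeholders, and this assumes the RODBC package plus a working Access ODBC driver, so it is Windows-only and not runnable as-is):

```r
library(RODBC)

channel <- odbcConnectAccess("mydb.mdb")   # placeholder database name

# Save to a fresh, non-existing table first, with the safer settings
# suggested in the reply:
sqlSave(channel, Adat, tablename = "Adat_check",
        rownames = FALSE,   # avoid the troublesome row-names column
        fast = FALSE)       # row-by-row INSERTs give clearer error messages

# Then read the table back and compare it with the original:
check <- sqlFetch(channel, "Adat_check")
all.equal(Adat, check)

close(channel)
```

Once the fresh-table round trip works, append = TRUE can be used to add rows to the existing table instead.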
Re: [R] the first and last case
Try these:

# 1
library(Hmisc)
summary(y ~ ind, dat, fun = range, overall = FALSE)

# 2
# or with specified column names
f <- function(x) c(head = head(x, 1), tail = tail(x, 1))
summary(y ~ ind, dat, fun = f, overall = FALSE)

# 3
# another approach using by - same f as above
do.call(rbind, by(dat$y, dat$ind, f))

# 4
# same but with an ind column
g <- function(x) c(ind = x$ind[1], head = head(x$y, 1), tail = tail(x$y, 1))
do.call(rbind, by(dat, dat$ind, g))

On 7/26/06, Mauricio Cardeal [EMAIL PROTECTED] wrote: Hi all Some time ago I asked for a solution about how to aggregate data and the help was wonderful. Now, I'd like to know how to extract, for each individual in the example below, the first and the last observation, to obtain this: ind y / 1 8 / 1 9 / 2 7 / 2 11 / 3 9 / 3 10 / 4 8 / 4 5 # Below the example: ind <- c(1,1,1,2,2,3,3,3,4,4,4,4); y <- c(8,10,9,7,11,9,9,10,8,7,6,5); dat <- as.data.frame(cbind(ind, y)); dat; attach(dat); mean.ind <- aggregate(dat$y, by = list(dat$ind), mean); mean.ind Thanks Mauricio
[R] SURVEY PREDICTED SEs: Problem
Hello R-list,

I'm attempting to migrate from Stata to R for my complex survey work. It has been straightforward so far except for the following problem. I have some code below, but first I'll describe the problem. When I compute predicted logits from a logistic regression, the standard errors of the predicted logits are way off (but the predicted logits are fine). Furthermore, the model logit coefficients have appropriate SEs. As a comparison, I ran the same model without the survey design; those predicted SEs come out fine. Here is example code (first the no-survey-design model and predictions, then the survey-design model and predictions):

# MODEL COEF. ESTIMATES (NO SURVEY DESIGN)
model.l.nosvy <- glm(qn58 ~ t8l, data = all.stratum, family = binomial)
summary(model.l.nosvy)

Call:
glm(formula = qn58 ~ t8l, family = binomial, data = all.stratum)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.310  -1.245   1.050   1.111   1.158

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.175890   0.006176   28.48   <2e-16 ***
t8l         -0.018643   0.001376  -13.55   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 145934 on 105857 degrees of freedom
Residual deviance: 145750 on 105856 degrees of freedom
AIC: 145754

Number of Fisher Scoring iterations: 3

# PREDICTED SEs
phat.l.se.logit.nosvy <- predict(model.l.nosvy, se = TRUE)
as.matrix(table(phat.l.se.logit.nosvy$se.fit))
                     [,1]
0.00632408017609573 14456
0.00633130215261306 15188
0.00741988836010757 12896
0.00743834214717549 10392
0.00923404822144662 13207
0.00925875968615561 15864
0.0114294663004145  12235
0.0114574202170594  11620

# MODEL COEF. ESTIMATES (SURVEY DESIGN)
model.l <- svyglm(qn58 ~ t8l, design = all.svy, family = binomial)
summary(model.l)

Call:
svyglm(qn58 ~ t8l, design = all.svy, family = binomial)

Survey design:
svydesign(id = ~psu, strata = ~stratum, weights = ~weight,
          data = all.stratum, nest = TRUE)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.016004   0.023267  -0.688    0.492
t8l         -0.024496   0.004941  -4.958 1.13e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 0.934964)

Number of Fisher Scoring iterations: 2

# PREDICTED SEs
phat.l.logit.se <- predict(model.l, se = TRUE)
as.matrix(table(phat.l.logit.se$se.fit))
                  [,1]
2.04867522818685 15188
2.05533753780321 14456
2.39885304369985 10392
2.41588959524594 12896
2.98273190185571 15864
3.00556161422958 13207
3.69102305734136 11620
3.71685978156846 12235
# THESE SEs are too large.
Re: [R] randomForest question [Broadcast]
When mtry is equal to the total number of features, you just get regular bagging (in the R package -- Breiman and Cutler's Fortran code samples variables with replacement, so you can't do bagging with that). There are cases when bagging will do better than random feature selection (i.e., RF), even in simulated data, but I'd say not very often.

HTH,
Andy

From: [EMAIL PROTECTED]

> Hello,
>
> I have a question regarding randomForest (from the package with the same name). I have 16 features (nominal), 159 positive and 318 negative cases that I'd like to classify (binary classification). Using the tuning from the e1071 package, it turns out that the best performance is reached when using all 16 features per tree (mtry = 16). However, the documentation of randomForest suggests taking sqrt(#features), i.e. 4. How can I explain this difference? When using all features, this is the same as a classical decision tree, with the difference that the tree is built and tested with different data sets, right?
>
> Example (I've tried different configurations, incl. changing ntree):
>
> param <- try(tune(randomForest, class ~ ., data = d.all318,
>                   ranges = list(mtry = c(4, 8, 16), ntree = c(1000))))
> summary(param)
>
> Parameter tuning of `randomForest':
> - sampling method: 10-fold cross validation
> - best parameters:
>   mtry ntree
>     16  1000
> - best performance: 0.1571809
> - Detailed performance results:
>   mtry ntree     error
> 1    4  1000 0.1928635
> 2    8  1000 0.1634752
> 3   16  1000 0.1571809
>
> Thanks a lot for your help, kind regards,
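Andy's point that mtry equal to the number of predictors reduces a random forest to bagging can be checked directly. A sketch on invented data (pure noise here, so the two out-of-bag error rates will just hover around chance; with real structure the comparison becomes meaningful):

```r
library(randomForest)

set.seed(1)
# Hypothetical two-class problem with 16 predictors
d <- data.frame(matrix(rnorm(500 * 16), ncol = 16),
                class = factor(sample(c("pos", "neg"), 500, replace = TRUE)))

# Random forest default for classification: mtry = floor(sqrt(16)) = 4
rf <- randomForest(class ~ ., data = d, mtry = 4, ntree = 500)

# Bagging: every split considers all 16 predictors
bag <- randomForest(class ~ ., data = d, mtry = 16, ntree = 500)

# Compare the out-of-bag error estimates
rf$err.rate[500, "OOB"]
bag$err.rate[500, "OOB"]
```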
[R] Moving Average
Dear R-Users,

How can I compute simple moving averages from a time series in R? Note that I do not want to estimate an MA model, just compute the moving averages for a given window length (as Excel does).

Thanks

Ricardo Gonçalves Silva, M. Sc.
Apoio aos Processos de Modelagem Matemática Econometria Inadimplência
Serasa S.A.
(11) - 6847-8889
[EMAIL PROTECTED]

**
The information contained in this message and its attached file(s) is addressed exclusively to the person(s) and/or institution(s) indicated above and may contain confidential data, which may not, in any form or under any pretext, be used, disclosed, altered, printed or copied, in whole or in part, by unauthorized persons. If you are not the addressee, please delete it and notify the sender immediately. Improper use will be handled according to company policy and applicable law. This message expresses the personal position of its signer and does not necessarily reflect the opinion of Serasa.
**
[R] String frequencies in rows
Hi All,

I'm trying to evaluate the frequency of different strings in each row of a data.frame:

INPUT:
ID G1 G2 G3 G4 ... GN
1  AA BB AB AB ...
2  BB AB AB AA ...
3  AC CC AC AA ...
4  BB BB BB BB ...

The number of different strings can vary in each row. My solution has been:

for (i in 1:nrow(INPUT)) {
  b <- as.data.frame(table(t(INPUT[i, 2:5])))
  # some operations using the string values and frequencies
  # (e.g. b for i == 1 is: AA 1, BB 1, AB 2)
}

However, my data frame contains thousands of rows and this script takes a lot of time. Could someone suggest a faster way?

Thank you very much,
Mario Falchi
Re: [R] Moving Average
See:

?filter   - simple and exponential moving averages are special cases
?runmean  - in package caTools (the fastest)
?rollmean - in the zoo package
?embed    - can write your own using embed as a basis
?sma      - in package fSeries; also see ewma in the same package

Probably other functions in other packages too.

On 7/26/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> Dear R-Users,
> How can I compute simple moving averages from a time series in R? Note that I do not want to estimate an MA model, just compute the moving averages for a given window length (as Excel does).
> Thanks
> Ricardo Gonçalves Silva, M. Sc.
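To make the first suggestion concrete, a centered equal-weight moving average with filter() from the stats package might look like this (the series x is made up):

```r
x <- c(1, 2, 3, 4, 5, 6)   # hypothetical series

k <- 3                     # window length
# Each weight is 1/k; sides = 2 centers the window on each point
ma <- filter(x, rep(1/k, k), sides = 2)
ma
# NA at each end, where the full window does not fit
```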
Re: [R] Moving Average
From: [EMAIL PROTECTED]
Date: 2006/07/26 Wed AM 09:29:27 CDT
To: r-help@stat.math.ethz.ch
Subject: [R] Moving Average

I think it was mave() in S-PLUS. Probably something similar in R. Do

RSiteSearch("moving average")

at an R prompt.

> Dear R-Users,
> How can I compute simple moving averages from a time series in R? Note that I do not want to estimate an MA model, just compute the moving averages for a given window length (as Excel does).
> Thanks
> Ricardo Gonçalves Silva, M. Sc.
Re: [R] String frequencies in rows
Mario Falchi <mariofalchi at yahoo.com> writes:

> I'm trying to evaluate the frequency of different strings in each row of a data.frame:
> INPUT:
> ID G1 G2 G3 G4 ... GN
> 1  AA BB AB AB ...

Something like

z <- data[, -1]
table(z, row(z))

?

Ben Bolker
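Spelling Ben's idea out on the example data; note the columns are flattened with unlist() first, since table() treats a bare data frame as a set of cross-classifying factors (this is a sketch, not tested against the poster's real data):

```r
# Example data from the question
INPUT <- data.frame(ID = 1:4,
                    G1 = c("AA", "BB", "AC", "BB"),
                    G2 = c("BB", "AB", "CC", "BB"),
                    G3 = c("AB", "AB", "AC", "BB"),
                    G4 = c("AB", "AA", "AA", "BB"))

z <- INPUT[, -1]
# Rows of the table: string values; columns: row number of INPUT.
# unlist() and row() both run column-major, so the two vectors align.
tab <- table(value = unlist(z), row = as.vector(row(as.matrix(z))))
tab
```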
Re: [R] Sweave and tth
Kuhn, Max wrote:

>> Dr. Harrell,
>>
>>> I tried odfWeave to create an OpenOffice file and found that it exhausted the memory of my large Linux machine and took a long time to run.
>>
>> Do you have any details about the problem that you encountered? A bug that someone else had pointed out might be the culprit. I have the default image format as png, but since a lot of Linux systems don't have that device automatically available to them, I have a switch for the device in odfWeaveControl:
>>
>>   plotDevice = ifelse(.Platform$OS.type == "windows", "png", "bitmap"),
>>
>> The bitmap device units are in inches and the bmp device is in pixels. The bug is that the default image size is 480 inches (whoops). Can you try using
>>
>>   odfWeaveControl(plotHeight = 5, plotWidth = 5,
>>                   dispHeight = 5, dispWidth = 5)
>>
>> in your odfWeave call and let me know if this was the issue? I was able to reproduce the error on our Linux systems and this fix worked (strange that the package passes R CMD check, though).

Max, I should have contacted you first - sorry about that. png is working fine on my Debian Linux system, so I just ran

library(odfWeave)
odfWeave('/usr/local/lib/R/site-library/odfWeave/examples/examples.odt',
         '/tmp/out.odt',
         control = odfWeaveControl(plotHeight = 5, plotWidth = 5,
                                   dispHeight = 5, dispWidth = 5))

and it ran extremely fast, creating out.odt, which loaded extremely fast into OpenOffice Writer, unlike the first out.odt I had tried. If you develop a way to include high-resolution graphics, that will be even better. I have updated http://biostat.mc.vanderbilt.edu/SweaveConvert accordingly.

>> If this works, Section 7 of the odfWeave manual lists two command-line tools (not included in my package) that can do the conversion from odt to Word (or other formats).

Excellent! Thanks!

>>> I really appreciate Max Kuhn's efforts with odfWeave and hope to keep up with its development.
>>
>> No problem. I'll be releasing a bug-fix version to solve the device units issue. Also, others have reported problems with locales. I believe I have a fix for this issue too.

Thanks.

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
[R] Branching on 'grep' returns...
Greetings, all.

I'm fiddling with some text manipulation in R, and I've found something which feels counterintuitive to my Perl-trained senses; I'm hoping that I can glean new R intuition about the situation. Here's an example, as concise as I could make it.

trg <- c("this", "that")

# these two work as I'd expected.
if (grep("this", trg)) { cat("Y\n") } else { cat("N\n") }
if (grep("that", trg)) { cat("Y\n") } else { cat("N\n") }

# These all fail with error 'argument is of length zero'
# if (grep("other", trg)) { cat("Y\n") } else { cat("N\n") }
# if (grep("other", trg) == TRUE) { cat("Y\n") } else { cat("N\n") }
# if (grep("other", trg) == 1) { cat("Y\n") } else { cat("N\n") }

# This says that the result is a numeric zero. Shouldn't I be able
# to if on that, or at least compare it with a number?
grep("other", trg)

# I eventually decided this worked, but felt odd to me.
if (any(grep("other", trg))) { cat("Y\n") } else { cat("N\n") }

So, is 'wrap it in an any()' just normal R practice, and I'm too new to know it? Is there a more fundamental dumb move I'm making?

- Allen S. Rout
Re: [R] String frequencies in rows
It's usually faster to operate on columns of data frames rather than on rows, so the following might help:

R> x
  G1 G2 G3 G4
1 AA BB AB AB
2 BB AB AB AA
3 AC CC AC AA
4 BB BB BB BB
R> xt <- as.data.frame(t(x))
R> sapply(xt, table)
$`1`
AA AB BB
 1  2  1

$`2`
AA AB BB
 1  2  1

$`3`
AA AC CC
 1  2  1

$`4`
BB
 4

Andy

From: Mario Falchi

> Hi All, I'm trying to evaluate the frequency of different strings in each row of a data.frame:
>
> INPUT:
> ID G1 G2 G3 G4 ... GN
> 1  AA BB AB AB ...
> 2  BB AB AB AA ...
> 3  AC CC AC AA ...
> 4  BB BB BB BB ...
>
> The number of different strings can vary in each row. However my data frame contains thousands of rows and this script takes a lot of time. Could someone suggest a faster way?
>
> Thank you very much,
> Mario Falchi
Re: [R] Branching on 'grep' returns...
If you are using grep then I think you have it right. Note that

"this" %in% trg

is also available.

On 26 Jul 2006 11:16:25 -0400, Allen S. Rout [EMAIL PROTECTED] wrote:

> Greetings, all. I'm fiddling with some text manipulation in R, and I've found something which feels counterintuitive to my Perl-trained senses; I'm hoping that I can glean new R intuition about the situation. [...]
>
> So, is 'wrap it in an any()' just normal R practice, and I'm too new to know it? Is there a more fundamental dumb move I'm making?
>
> - Allen S. Rout
Re: [R] PCA with not non-negative definite covariance
Thanks. I suppose that another option could be just to use classical multidimensional scaling. By my understanding this is (if based on a Euclidean measure) completely analogous to PCA, and because it's based explicitly on distances, I could easily exclude the variables with NAs on a pairwise basis when calculating the distances.

Quin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: 25 July 2006 09:24 AM
To: Quin Wills
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] PCA with not non-negative definite covariance

Hi all,

> Am I correct to understand from the previous discussions on this topic (a few years back) that my PCA options seem dismal if I have a matrix with missing values and:
> (1) I don't want to impute the missing values.
> (2) I don't want to completely remove cases with missing values.
> (3) I do cov() with use = "pairwise.complete.obs", as this produces negative eigenvalues (which it has in my case!).

(4) Maybe you can use the Non-linear Iterative Partial Least Squares (NIPALS) algorithm (intensively used in chemometrics). S. Dray proposes a version of this procedure at http://pbil.univ-lyon1.fr/R/additifs.html.

Hope this helps :)

Pierre
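The MDS route Quin describes can be sketched as below; note that dist() in the stats package already excludes NA pairs and rescales the remaining terms proportionally, which is the pairwise exclusion he has in mind (the data matrix here is invented):

```r
set.seed(1)
# Hypothetical data: 20 observations, 5 variables, a few NAs
X <- matrix(rnorm(100), nrow = 20)
X[sample(100, 8)] <- NA

d <- dist(X)  # Euclidean; NA pairs dropped, sums scaled up proportionally

# Classical (metric) MDS; with complete data and Euclidean distances
# this reproduces the PCA scores up to sign
mds <- cmdscale(d, k = 2, eig = TRUE)
head(mds$points)
mds$eig       # negative eigenvalues flag non-Euclidean distance sets
```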
Re: [R] implementing user defined covariance matrix
Maybe the help for one of the corStruct classes is of interest to you:

?corCompSymm
?corSymm
?corAR1
?corCAR1
?corARMA
?corExp
?corGaus
?corLin
?corRatio
?corSpher

Good luck,
Thilo

On Wednesday 26 July 2006 17:34, Jonathan Smith wrote:

> I am trying to implement my own covariance matrix in R, and then be able to use it in gls, lme or nlme to analyze some data. I simply want to use corr = mymodel(form = ~ tim/peep) the same way one can use corr = corAR1 and many others. I am having a terrible time trying to figure out how to do this. I have found documentation saying that you can do this but can't find out how. Any suggestions would be greatly appreciated. My covariance matrix is:
>
> | 1     s1^g  s2^g  s3^g  ... |
> | s1^g  1     s1^g  s2^g  ... |
> | s2^g  s1^g  1     s1^g  ... |
> | s3^g  s2^g  s1^g  1     ... |
>
> Thank you kindly for your time,
> Jonathan Smith

--
Thilo Kellermann
Department of Psychiatry and Psychotherapy
RWTH Aachen University
Pauwelstr. 30
52074 Aachen
Tel.: +49 (0)241 / 8089977
Fax.: +49 (0)241 / 8082401
E-Mail: [EMAIL PROTECTED]
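For orientation, here is how one of the built-in structures plugs into gls(); a user-defined corStruct would go in the same correlation argument (the data set and variable names here are invented):

```r
library(nlme)

# Hypothetical longitudinal data: response y measured at integer
# times within each subject
fit <- gls(y ~ time, data = dat,
           correlation = corAR1(form = ~ time | subject))
summary(fit)
```

Defining a new structure essentially means writing a corStruct subclass with the methods the fitting code expects (e.g. corMatrix and coef); the corAR1 source in nlme is a reasonable template to start from.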
Re: [R] PCA with not non-negative definite covariance
> I suppose that another option could be just to use classical multidimensional scaling. By my understanding this is (if based on a Euclidean measure) completely analogous to PCA, and because it's based explicitly on distances, I could easily exclude the variables with NAs on a pairwise basis when calculating the distances.

I don't think it is as straightforward as that, because distances calculated on observations with missing values will be smaller than other distances. I suspect adjusting for this would be in some way equivalent to imputation. Exactly what do you want a low-dimensional representation of your data set for? (And why are you concerned about negative eigenvalues?)

Hadley
Re: [R] Multcomp
Let me clarify with a simpler example what I want to accomplish:

library(multcomp)
data(recovery)
Dcirec <- simint(minutes ~ blanket, data = recovery, conf.level = 0.9,
                 alternative = "less")
out.data.mat <- with(Dcirec, data.frame(estimate, conf.int,
                     p.value.raw = c(p.value.raw),
                     p.value.bon, p.value.adj))

I want to generate the same type of plot using out.data.mat that I get by plot(Dcirec). How do I tell the plot method how the data in out.data.mat is to be plotted? I am interested in doing this because I am running about 1500 different comparisons, which creates 1500 different objects. I need to analyze them and combine the significant ones into one plot.

-----Original Message-----
From: Greg Snow [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 25, 2006 12:12 PM
To: Nair, Murlidharan T
Subject: RE: [R] Multcomp

Doing

str(fungus.cirec)

suggests that fungus.cirec$conf.int contains the confidence intervals; you can manually plot the subset that you are interested in (and label them whatever you want).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Nair, Murlidharan T
Sent: Saturday, July 22, 2006 11:00 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Multcomp

Here it is again; hope this is more clear. I am using the following data (only a small subset is given):

Habitat  Fungus.yield
Birch    20.83829053
Birch    22.9718181
Birch    22.28216829
Birch    24.23136797
Birch    22.32147961
Birch    20.30783598
Oak      27.24047258
Oak      29.7730014
Oak      30.12608508
Oak      25.76088669
Oak      30.14750974
Hornbeam 17.05307949
Hornbeam 15.32805111
Hornbeam 18.26920177
Hornbeam 21.30987049
Hornbeam 21.7173223

I am using the multcomp package to do multiple comparisons as follows:

library(multcomp)  # load the package
fungus <- read.table("fungi.txt", header = TRUE)  # read the data saved as fungi.txt
fungus.cirec <- simint(Fungus.yield ~ Habitat, data = fungus,
                       conf.level = 0.95, type = c("Tukey"))
                   # simultaneous intervals using Tukey's method
plot(fungus.cirec)  # plot the data

The plot function plots all the comparisons; I want to plot only part of the data since it clutters the graph. How do I plot only part of the data? How do I tell it to mark the significant comparisons? How do I get rid of the field names in the plot? E.g., the plot labels are HabitatBirch-HabitatOak; I want them labeled Birch-Oak. Hope I have posted this according to the guidelines; let me know otherwise.

Cheers .../Murli
Re: [R] PCA with not non-negative definite covariance
Not sure what "completely analogous" means; MDS is nonlinear, PCA is linear. In any case, the bottom line is that if you have high-dimensional data with many missing values, you cannot know what the multivariate distribution looks like -- and you need a **lot** of data with many variables to usefully characterize it anyway. So you must either make some assumptions about what the distribution could be (including imputation methodology) or use any of the many exploratory techniques available to learn what you can. Thermodynamics holds -- you can't get something for nothing (you can't fool Mother Nature).

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Quin Wills
Sent: Wednesday, July 26, 2006 8:44 AM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] PCA with not non-negative definite covariance

> Thanks. I suppose that another option could be just to use classical multidimensional scaling. By my understanding this is (if based on a Euclidean measure) completely analogous to PCA, and because it's based explicitly on distances, I could easily exclude the variables with NAs on a pairwise basis when calculating the distances.
>
> Quin
[R] odesolve loading problem
Hi,

I get the following error message when loading the package odesolve (R 2.2.1, odesolve 0.5.14, AMD64, Linux Fedora Core 4):

library(odesolve)
Error in library.dynam(lib, package, package.lib) :
  shared library 'TRUE' not found
Error: package/namespace load failed for 'odesolve'

Any help would be greatly appreciated.

Ariel./
Re: [R] Branching on 'grep' returns...
On Wed, 26 Jul 2006, Allen S. Rout wrote:

> # These all fail with error 'argument is of length zero'
> # if (grep("other", trg)) { cat("Y\n") } else { cat("N\n") }
> # if (grep("other", trg) == TRUE) { cat("Y\n") } else { cat("N\n") }
> # if (grep("other", trg) == 1) { cat("Y\n") } else { cat("N\n") }
>
> # This says that the result is a numeric zero. Shouldn't I be able
> # to if on that, or at least compare it with a number?
> grep("other", trg)

It is numeric(0), that is, a zero-length vector of numbers. If you compare it with a number you get a zero-length logical vector. You can't get TRUE or FALSE, because a zero-length vector of 1s looks just like a zero-length vector of 0s (or a zero-length vector of any other number).

In handling zero-length vectors (and in other vectorization contexts) it is useful to distinguish between vectorized functions, which return a vector of the same length as the input, and reducing functions, which return a vector of length 1. The == operator is vectorized, but if() requires a condition of length 1, so they don't match. The solution is to apply some reducing function. Two possible options are length() and (as you found) any().

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
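Thomas's vectorized-vs-reducing distinction points at the idiomatic test; a short sketch:

```r
trg <- c("this", "that")

# grep() returns the matching indices, so "no match" is a
# zero-length vector. Reduce it to a single logical with length():
length(grep("other", trg)) > 0   # FALSE
length(grep("this",  trg)) > 0   # TRUE

# any() on a zero-length logical is FALSE, so this also works:
any(grep("other", trg) == 1)     # FALSE
```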
Re: [R] Multcomp
Look through multcomp:::plot.hmtest to find out which components of an hmtest object are actually used. Now look at what an hmtest object looks like by doing dput(Dcirec), or by reading the source of the function that produces hmtest objects. With this information in hand we can construct one from out.data.mat:

my.hmtest <- structure(list(
    estimate = t(t(structure(out.data.mat[, "estimate"],
                             .Names = rownames(out.data.mat)))),
    conf.int = out.data.mat[, 2:3],
    ctype = "Dunnett"),
  class = "hmtest")
plot(my.hmtest)

Note that this is a bit fragile, since changes to the internal representation of hmtest objects could cause your object to stop working, although as long as those changes do not affect the three components we are using it should be OK. By the way, I hard-coded "Dunnett" above since ctype is not available in out.data.mat.

On 7/26/06, Nair, Murlidharan T [EMAIL PROTECTED] wrote:

Let me clarify with a simpler example what I want to accomplish:

library(multcomp)
data(recovery)
Dcirec <- simint(minutes ~ blanket, data = recovery, conf.level = 0.9, alternative = "less")
out.data.mat <- with(Dcirec, data.frame(estimate, conf.int, p.value.raw = c(p.value.raw), p.value.bon, p.value.adj))

I want to generate the same type of plot using out.data.mat that I get by plot(Dcirec). How do I tell the plot method how the data in out.data.mat are to be plotted? I am interested in doing this because I am running about 1500 different comparisons, which creates 1500 different objects. I need to analyze them and combine the significant ones into one plot.

-----Original Message-----
From: Greg Snow [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 25, 2006 12:12 PM
To: Nair, Murlidharan T
Subject: RE: [R] Multcomp

Doing str(fungus.cirec) suggests that fungus.cirec$conf.int contains the confidence intervals; you can manually plot the subset that you are interested in (and label it whatever you want).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nair, Murlidharan T
Sent: Saturday, July 22, 2006 11:00 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Multcomp

Here it is again; hope this is clearer. I am using the following data (only a small subset is given):

Habitat Fungus.yield
Birch 20.83829053
Birch 22.9718181
Birch 22.28216829
Birch 24.23136797
Birch 22.32147961
Birch 20.30783598
Oak 27.24047258
Oak 29.7730014
Oak 30.12608508
Oak 25.76088669
Oak 30.14750974
Hornbeam 17.05307949
Hornbeam 15.32805111
Hornbeam 18.26920177
Hornbeam 21.30987049
Hornbeam 21.7173223

I am using the multcomp package to do multiple comparisons as follows:

library(multcomp)  # loads the package
fungus <- read.table("fungi.txt", header = TRUE)  # reads the data from the file saved as fungi.txt
fungus.cirec <- simint(Fungus.yield ~ Habitat, data = fungus, conf.level = 0.95, type = "Tukey")  # computes simultaneous intervals using Tukey's method
plot(fungus.cirec)  # plots the data

The plot function plots all the comparisons; I want to plot only part of the data since it clutters the graph. How do I plot only part of the data? How do I tell it to mark the significant comparisons? How do I get rid of the field names in the plot? For example, the plot labels are HabitatBirch-HabitatOak; I want them labeled as Birch-Oak. Hope I have posted this according to the guidelines; let me know otherwise.

Cheers .../Murli

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] odesolve loading problem
Hi Ariel, The problem is that I specified the wrong dependency in the Depends field of the package's DESCRIPTION file: I specified an R version of at least 2.2.1, but that should have been 2.3.1. You have two choices -- upgrade your R to 2.3.1, or install odesolve 0.5.13. I will send an updated version of the package to CRAN, with a note to the CRAN maintainers about the problem, but that won't help you if you need to use R version 2.2.1. There is an archive of older R packages on CRAN, linked at the bottom of the Contributed Packages page. Please accept my apology for the inconvenience -- I rushed through a change requested by a user and did not take time to fully appreciate the consequences.

Woody

R. Woodrow Setzer, Ph.D.
National Center for Computational Toxicology
US Environmental Protection Agency
Mail Drop B205-01/US EPA/RTP, NC 27711
Ph: (919) 541-0128  Fax: (919) 541-1194

Ariel Chernomoretz [EMAIL PROTECTED] wrote on 07/28/2006 01:30 AM, Subject: [R] odesolve loading problem:

Hi, I get the following error message when loading the package odesolve (R 2.2.1 - odesolve 0.5.14 - AMD64 - Linux Fedora Core 4):

library(odesolve)
Error in library.dynam(lib, package, package.lib) :
        shared library 'TRUE' not found
Error: package/namespace load failed for 'odesolve'

Any help would be greatly appreciated. Ariel./

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
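[Not part of the original exchange -- a quick way to check, from within R, which dependency an installed package declares and which R version is running; the package name follows the thread:]

```r
# inspect the declared dependencies of an installed package
packageDescription("odesolve")$Depends

# the running R version, for comparison
getRversion()
```

If the Depends field demands a newer R than getRversion() reports, installation of the package will be refused, which matches the error-on-load behavior described above.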
Re: [R] Multcomp
Here is a minor simplification:

my.hmtest <- structure(list(
    estimate = t(t(out.data.mat[, "estimate", drop = FALSE])),
    conf.int = out.data.mat[, 2:3],
    ctype = "Dunnett"),
  class = "hmtest")
plot(my.hmtest)

On 7/26/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:

Look through multcomp:::plot.hmtest to find out which components of an hmtest object are actually used. Now look at what an hmtest object looks like by doing dput(Dcirec), or by reading the source of the function that produces hmtest objects. With this information in hand we can construct one from out.data.mat:

my.hmtest <- structure(list(
    estimate = t(t(structure(out.data.mat[, "estimate"],
                             .Names = rownames(out.data.mat)))),
    conf.int = out.data.mat[, 2:3],
    ctype = "Dunnett"),
  class = "hmtest")
plot(my.hmtest)

Note that this is a bit fragile, since changes to the internal representation of hmtest objects could cause your object to stop working, although as long as those changes do not affect the three components we are using it should be OK. By the way, I hard-coded "Dunnett" above since ctype is not available in out.data.mat.

On 7/26/06, Nair, Murlidharan T [EMAIL PROTECTED] wrote:

Let me clarify with a simpler example what I want to accomplish:

library(multcomp)
data(recovery)
Dcirec <- simint(minutes ~ blanket, data = recovery, conf.level = 0.9, alternative = "less")
out.data.mat <- with(Dcirec, data.frame(estimate, conf.int, p.value.raw = c(p.value.raw), p.value.bon, p.value.adj))

I want to generate the same type of plot using out.data.mat that I get by plot(Dcirec). How do I tell the plot method how the data in out.data.mat are to be plotted? I am interested in doing this because I am running about 1500 different comparisons, which creates 1500 different objects. I need to analyze them and combine the significant ones into one plot.
-----Original Message-----
From: Greg Snow [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 25, 2006 12:12 PM
To: Nair, Murlidharan T
Subject: RE: [R] Multcomp

Doing str(fungus.cirec) suggests that fungus.cirec$conf.int contains the confidence intervals; you can manually plot the subset that you are interested in (and label it whatever you want).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nair, Murlidharan T
Sent: Saturday, July 22, 2006 11:00 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Multcomp

Here it is again; hope this is clearer. I am using the following data (only a small subset is given):

Habitat Fungus.yield
Birch 20.83829053
Birch 22.9718181
Birch 22.28216829
Birch 24.23136797
Birch 22.32147961
Birch 20.30783598
Oak 27.24047258
Oak 29.7730014
Oak 30.12608508
Oak 25.76088669
Oak 30.14750974
Hornbeam 17.05307949
Hornbeam 15.32805111
Hornbeam 18.26920177
Hornbeam 21.30987049
Hornbeam 21.7173223

I am using the multcomp package to do multiple comparisons as follows:

library(multcomp)  # loads the package
fungus <- read.table("fungi.txt", header = TRUE)  # reads the data from the file saved as fungi.txt
fungus.cirec <- simint(Fungus.yield ~ Habitat, data = fungus, conf.level = 0.95, type = "Tukey")  # computes simultaneous intervals using Tukey's method
plot(fungus.cirec)  # plots the data

The plot function plots all the comparisons; I want to plot only part of the data since it clutters the graph. How do I plot only part of the data? How do I tell it to mark the significant comparisons? How do I get rid of the field names in the plot? For example, the plot labels are HabitatBirch-HabitatOak; I want them labeled as Birch-Oak. Hope I have posted this according to the guidelines; let me know otherwise.
Cheers .../Murli

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bootstrap within litter
Hello everyone. I have 6 to 10 strata with 6 to 12 subjects within each stratum. I would like to use the bootstrap to compute a confidence interval for an estimator which is a function of the Wilcoxon rank-sum test. Is there any function in R to do this? Any reference would be helpful. Thank you, Tony.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
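[Not from the original thread -- a minimal sketch of one possible approach using the boot package; the data frame, the two-group layout, and the choice of the Wilcoxon statistic as the bootstrapped quantity are all made up for illustration. The key point is boot()'s strata argument, which resamples within each stratum:]

```r
library(boot)

# hypothetical data: 8 strata, 10 subjects each, two treatment groups
set.seed(42)
d <- data.frame(stratum = rep(1:8, each = 10),
                group   = rep(c("A", "B"), times = 40),
                y       = rnorm(80))

# statistic to bootstrap: here simply the Wilcoxon rank-sum statistic
# (substitute your own function of the test here)
wstat <- function(data, idx) {
  data <- data[idx, ]
  unname(wilcox.test(y ~ group, data = data)$statistic)
}

# strata = d$stratum keeps resampling within each stratum
b <- boot(d, wstat, R = 999, strata = d$stratum)
boot.ci(b, type = "perc")  # percentile confidence interval
```

The percentile interval is only one of the types boot.ci() offers; see ?boot.ci for BCa and others.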
[R] memory problems when combining randomForests
Dear all, I am trying to train a randomForest using all my control data (12,000 cases, ~ 20 explanatory variables, 2 classes). Because of memory constraints, I have split my data into 7 subsets and trained a randomForest for each, hoping that using combine() afterwards would solve the memory issue. Unfortunately, combine() still runs out of memory. Is there anything else I can do? (I am not using the formula version) Many Thanks Eleni Rapsomaniki __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
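[Not part of the original message -- a hedged sketch of one workaround: grow each sub-forest with fewer trees and without optional components such as the proximity matrix (which is n-by-n and often dominates memory), then combine. The data, subset count, and ntree value below are illustrative only:]

```r
library(randomForest)

# hypothetical stand-in for the real data: predictor matrix x, 2-class factor y
set.seed(1)
x <- matrix(rnorm(2000 * 20), ncol = 20)
y <- factor(sample(c("case", "control"), 2000, replace = TRUE))

# split the rows into subsets and train a small forest on each
idx <- split(seq_len(nrow(x)), rep(1:4, length.out = nrow(x)))
forests <- lapply(idx, function(i)
  randomForest(x[i, ], y[i], ntree = 100, proximity = FALSE))

# merge the sub-forests into one forest of 400 trees
rf.all <- do.call(combine, forests)
rf.all$ntree
```

If combine() itself still exhausts memory, reducing ntree per sub-forest, or calling gc() and rm() on each sub-forest after saving it to disk and combining pairwise, may help.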
[R] bug?
Dear All, if you generate a lattice with small spacing like:

x <- seq(0, 1, 0.005)

and you ask, for each point of this lattice, how many points lie in a neighbourhood of radius 0.01 around it:

v <- rep(0, length(x))
for (i in 1:length(x)) { v[i] <- length(x[abs(x - x[i]) < 0.01]) }

then the answer should be v = (2, 3, 3, 3, 3, ..., 3, 3, 3, 3, 2), because every point except the two at the borders has 3 points in its 0.01-neighbourhood. But v also contains many 4s and even 5s:

v
  [1] 2 4 3 4 4 3 4 4 3 4 4 3 4 4 4 4 5 4 4 5 4 4 5 4 4 4 3 4 4 4 4 3 3 3 4 4 4
 [38] 4 3 3 4 4 4 4 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 4 4 4 4 3
 [75] 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3
[112] 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3
[149] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[186] 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 2

Could anyone explain this and help me to compute it exactly on general data? Thank you very much, Patrick Jahn

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R vs. Stata
I have read some very good reviews comparing R (or S-PLUS) to SAS. Does anyone know of any reviews comparing R (or S-PLUS) to Stata? I am trying to get others in my department to try R, and I have never used Stata. Regards, -Cody

Cody Hamilton, Ph.D
Institute for Health Care Research and Improvement
Baylor Health Care System
(214) 265-3618

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] bug?
On Wed, 26 Jul 2006, Patrick Jahn wrote:

Dear All, if you generate a lattice with small spacing like:

x <- seq(0, 1, 0.005)

and you ask, for each point of this lattice, how many points lie in a neighbourhood of radius 0.01 around it:

v <- rep(0, length(x))
for (i in 1:length(x)) { v[i] <- length(x[abs(x - x[i]) < 0.01]) }

then the answer should be v = (2, 3, 3, 3, 3, ..., 3, 3, 3, 3, 2), because every point except the two at the borders has 3 points in its 0.01-neighbourhood. But v also contains many 4s and even 5s:

v
  [1] 2 4 3 4 4 3 4 4 3 4 4 3 4 4 4 4 5 4 4 5 4 4 5 4 4 4 3 4 4 4 4 3 3 3 4 4 4
 [38] 4 3 3 4 4 4 4 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 4 4 4 4 3
 [75] 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3
[112] 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3
[149] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[186] 3 3 3 3 3 4 4 4 4 3 3 3 3 3 3 2

Could anyone explain this and help me to compute it exactly on general data?

Yes and no. The fact is easily explained: 0.005 and 0.01 are not exactly representable in floating point, and so it will not be true for all x that x + 0.005 + 0.005 == x + 0.01. This is a FAQ. For this problem an easy solution is to multiply by 200 (or 1000) and work with integers, which can be represented exactly. There is no solution for general data, although software for arbitrary-precision floating point may come close (there was a message yesterday from someone trying to interface pari/gp, which does this, with R).

-thomas

Thomas Lumley, Assoc. Professor, Biostatistics, [EMAIL PROTECTED], University of Washington, Seattle

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
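[A sketch of the integer-scaling workaround Thomas describes, not from the original message: multiplying by 200 turns every lattice point into an exactly representable integer, so the neighbourhood counts come out exact.]

```r
x <- seq(0, 1, 0.005)

# floating-point version: spurious 4s and 5s appear
v <- sapply(seq_along(x), function(i) sum(abs(x - x[i]) < 0.01))

# integer-scaled version: x * 200 gives the exact integers 0, 1, ..., 200,
# and a radius of 0.01 corresponds to a distance of strictly less than 2
xi <- round(x * 200)
vi <- sapply(seq_along(xi), function(i) sum(abs(xi - xi[i]) < 2))

range(vi)  # only 2 (at the two borders) and 3 (everywhere else)
```

The same trick generalizes whenever the data are known to lie on a grid: rescale to integers, count, and rescale back.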
Re: [R] (robust) mixed-effects model with covariate
Dear Thilo, many thanks for your reply. I realized that there was an error in my formula, which should have been:

aov(y ~ Group * (Time + Age) + Error(Subj/Time), data = df1)

or alternatively:

lme(RVP.A ~ Group * (Time + Age), random = ~ 1 | Subj/Time, data = df1)

but I get different results in each case, and different still from the results of another stat program (JMP). The problem is that I am not sure which one (if any) is correct! Also, in the model you proposed:

lme(y ~ Group * Time, random = ~ Age | Subj, data = df1)

it appears that Age is not among the effects of interest, so I do not get an estimate of the significance of the Age or the Age*Group effect. I have Pinheiro & Bates, and I read the first chapter, but it didn't seem to provide an example analogous to my case. Also, it looks like it would take me some months to study the book thoroughly, and frankly that seems a bit excessive for such an (apparently?) simple problem. I was hoping somebody would magically provide the correct syntax :-)! Thanks again anyway for your help. Best regards, giuseppe

Thilo Kellermann wrote: On Monday 24 July 2006 20:16, Giuseppe Pagnoni wrote: Dear all, first of all I apologize if you received this twice: I was checking the archive and noticed that the text was scrubbed from the message, probably due to some setting in my e-mail program. I am unsure about how to specify a model in R and I thought of asking the list for advice. I have two groups (Group = A, B) of subjects, with each subject undertaking a test before and after a certain treatment (Time = pre, post). Additionally, I want to enter the age of the subject as a covariate (performance on the test is affected by age), and I also want to allow different slopes for the effect of age in the two groups of subjects (age might affect the performance of the two groups differentially). Is the right model something like the following?
aov(y ~ Group*Time + Group*Age + Error(Subj/Group), data = df1)

(If I enter that command, within summary I get the following: Error() model is singular in: aov(y ~ Group * Time + Group * Age + Error(Subj/Group), data = df1))

Try:

aov(y ~ Group*Time*Age + Error(Subj*Time*Age), data = df1)

which specifies an ANOVA (but not with mixed effects) with three main effects and all interaction terms, plus an error term that is independent between groups (!) and relates to within-subject variability. For a real mixed-effects analysis you should use the (n)lme function from the nlme package, and one possible model could look like this:

lme(y ~ Group*Time, random = ~ Age | Subj, data = df1)

but the exact specification depends on your assumptions; it is possible to specify two or three models and compare their fits with anova(). For more information on mixed effects you should consult: José C. Pinheiro & Douglas M. Bates (2000), Mixed-Effects Models in S and S-PLUS. Springer, New York. Good luck, Thilo

As a second question: I have an outlier in one of the two groups. The outlier is not due to a measurement error but simply to the performance of the subject (possibly related to his medical history, but I have no way to determine that with certainty). This subject is signaled as an outlier within its group: averaging the pre and post performance values of the subjects in his group, the Grubbs test yields a probability of 0.002 for the subject to be an outlier (the subject is marked as a significant outlier also if I perform the test separately on the pre and the post data). If I remove this subject from its group, I get significant effects of Group and Group x Age (not using the R formula above, but another stat software), but if I leave the subject in, those effects disappear.
Since I understand that removing outliers is always worrisome, I would like to know whether it is possible in R to estimate a model similar to the one outlined above in a resistant/robust fashion, and what the actual syntax would be. I will very much appreciate any help or suggestions. Thanks in advance and best regards, giuseppe

--
Giuseppe Pagnoni
Psychiatry and Behavioral Sciences
Emory University School of Medicine
1639 Pierce Drive, Suite 4000
Atlanta, GA, 30322
tel: 404.712.8431  fax: 404.727.3233

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R vs. Stata
There is some discussion in: http://www.burns-stat.com/pages/Tutor/R_relative_statpack.pdf which can also be found at the UCLA website. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Hamilton, Cody wrote: I have read some very good reviews comparing R (or Splus) to SAS. Does anyone know if there are any reviews comparing R (or Splus) to Stata? I am trying to get others to try R in my department, and I have never used Stata. Regards, -Cody Cody Hamilton, Ph.D Institute for Health Care Research and Improvement Baylor Health Care System (214) 265-3618 This e-mail, facsimile, or letter and any files or attachme...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] residual df in lmer and simulation results
Hello. Douglas Bates has explained in a previous posting to R-help why he does not output residual degrees of freedom, F values and probabilities in the mixed-model (lmer) function: because the usual degrees of freedom (obs - fixed df - 1) are not exact and are really only upper bounds. I am interpreting what he said, and I am not a professional statistician, so I might be getting this wrong... Does anyone know of any more recent results, perhaps from simulations, that quantify the degree of bias that using such upper bounds for the denominator degrees of freedom produces? Is it possible to calculate a lower bound for such degrees of freedom? Thanks for any help.

Bill Shipley
North American Editor, Annals of Botany
Editor, Population and Community Biology series, Springer Publishing
Département de biologie, Université de Sherbrooke, Sherbrooke (Québec) J1K 2R1 CANADA
[EMAIL PROTECTED] http://pages.usherbrooke.ca/jshipley/recherche/

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to split the left and right hand terms of a formula
Hello All, I've sent a few messages to the list regarding splitting a formula into its right- and left-hand terms. Thanks to everyone who has responded. I believe the best way to extract the left- and right-hand terms as character vectors is the following:

library(nlme)
formula <- y ~ x + z
left.term  <- all.vars(getResponseFormula(formula))
covariates <- all.vars(getCovariateFormula(formula))

Thanks! Dan Gerlanc, Williams College

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
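[An aside not in the original message: the same split can be done without loading nlme, using the formula's parse tree. A two-sided formula is a call whose second element is the response and whose third is the right-hand side:]

```r
f <- y ~ x + z
left.term  <- all.vars(f[[2]])  # "y"
covariates <- all.vars(f[[3]])  # "x" "z"
```

This relies only on base R, so it avoids a package dependency when that is the only nlme functionality needed.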
[R] arima() function - issues
Hi, My query is related to the arima function in the stats package. While looking through the time-series literature I found the following link, which highlights a discrepancy in the arima function when dealing with differenced time series. Is there a substitute function implemented in R, similar to the sarima function mentioned on that website? Any pointers would be of great help.

http://lib.stat.cmu.edu/general/stoffer/tsa2/Rissues.htm

Thanx in advance. Sachin

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
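[Not from the original message -- a sketch reproducing one issue that page documents: for d > 0, arima() fits no mean/drift term, whereas fitting the explicitly differenced series does estimate one. The simulated random walk below is illustrative only:]

```r
set.seed(1)
x <- cumsum(rnorm(200, mean = 0.5))  # random walk with drift 0.5

arima(x, order = c(0, 1, 0))         # no drift/intercept term is reported
arima(diff(x), order = c(0, 0, 0))   # the "intercept" here estimates the drift
```

One workaround, when a drift term is wanted, is therefore to difference the series yourself and model the differenced data (or to supply the trend via xreg).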
[R] ks.test exact p-value
R enthusiasts! I have been simulating daily in-stream bacteria concentrations under a variety of scenarios and am using ks.test (two-sample, two-sided) for analysis. My data sets are of equal size (n = 64). My question is this: for the two-sample, two-sided ks.test, how is the exact p-value calculated? I have not been able to find an explicit citation for how the p-value is calculated. I have read the help file, which cites three publications and assigns those publications to one version or another of the test (e.g. one-sample, one-sided, etc.). I have also read the Conover book (referenced but not cited). Conover's tables are adapted from a Birnbaum and Hall (1960) paper, and I have also found tables by Kim and Jennrich, but I have found some disagreement between these sources (with regard to critical D values and p-values). I would like to know the method used by R version 2.3.0 on Windows XP. Can anybody tell me the exact method for calculating the p-value for a two-sample, two-sided ks.test? Any help would be greatly appreciated!

Kyle Hall
Graduate Research Assistant
Biological Systems Engineering
Virginia Tech
(540) 231-2083

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
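[Not part of the original message -- one reliable way to see what R actually computes is to read the function's own source, and to check the statistic by hand against the two empirical CDFs. The data below are simulated stand-ins; the note about the exact branch living in compiled code is an assumption based on the structure of older stats versions:]

```r
ks.test  # printing the function shows the R-level source, including
         # where the exact two-sample branch hands off to compiled code

set.seed(1)
a <- rnorm(64); b <- rnorm(64)
ks.test(a, b)  # two-sample, two-sided, as in the question

# the D statistic by hand: maximum distance between the two ECDFs
pts <- sort(c(a, b))
D <- max(abs(ecdf(a)(pts) - ecdf(b)(pts)))
D
```

Comparing the hand-computed D against the reported one at least confirms which statistic the reported p-value refers to, before tracing its distribution in the source.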
[R] configure fails for R 2.3.1 on SunOS 5.8
Does this mean I need to use '--with-readline=no'? configure says: checking build system type... sparc-sun-solaris2.8 checking host system type... sparc-sun-solaris2.8 loading site script './config.site' loading build specific script './config.site' checking for pwd... /usr/bin/pwd checking whether builddir is srcdir... yes checking for working aclocal... found checking for working autoconf... found checking for working automake... found checking for working autoheader... found checking for working makeinfo... found checking for gawk... gawk checking for egrep... egrep checking whether ln -s works... yes checking for ranlib... ranlib checking for bison... bison -y checking for ar... ar checking for a BSD-compatible install... tools/install-sh -c checking for sed... /usr/xpg4/bin/sed checking for more... /usr/bin/more checking for perl... no checking for false... /usr/bin/false configure: WARNING: you cannot build the object documentation system checking for dvips... no checking for tex... no checking for latex... no configure: WARNING: you cannot build DVI versions of the R manuals checking for makeindex... no checking for pdftex... no checking for pdflatex... no configure: WARNING: you cannot build PDF versions of the R manuals checking for makeinfo... /p/gnu/makeinfo checking for install-info... /p/gnu/install-info checking for unzip... /usr/bin/unzip checking for zip... /usr/bin/zip checking for gzip... /usr/bin/gzip checking for firefox... /p/firefox/firefox using default browser ... /p/firefox/firefox checking for acroread... no checking for acroread4... no checking for xpdf... no checking for gv... /p/X11/gv checking for gcc... gcc checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... 
yes checking for gcc option to accept ANSI C... none needed checking how to run the C preprocessor... gcc -E checking whether gcc needs -traditional... no checking how to run the C preprocessor... gcc -E checking for f95... f95 checking whether we are using the GNU Fortran 77 compiler... no checking whether f95 accepts -g... yes checking for g++... g++ checking whether we are using the GNU C++ compiler... yes checking whether g++ accepts -g... yes checking how to run the C++ preprocessor... g++ -E checking whether __attribute__((visibility())) is supported... no checking whether gcc accepts -fvisibility... no checking whether f95 accepts -fvisibility... yes checking for a sed that does not truncate output... /usr/xpg4/bin/sed checking for ld used by gcc... /usr/ccs/bin/ld checking if the linker (/usr/ccs/bin/ld) is GNU ld... no checking for /usr/ccs/bin/ld option to reload object files... -r checking for BSD-compatible nm... /usr/ccs/bin/nm -p checking how to recognise dependent libraries... pass_all checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... no checking for unistd.h... yes checking dlfcn.h usability... yes checking dlfcn.h presence... yes checking for dlfcn.h... yes checking the maximum length of command line arguments... 262144 checking command to parse /usr/ccs/bin/nm -p output from gcc object... ok checking for objdir... .libs checking for ranlib... (cached) ranlib checking for strip... strip checking if gcc static flag works... yes checking if gcc supports -fno-rtti -fno-exceptions... yes checking for gcc option to produce PIC... -fPIC checking if gcc PIC flag -fPIC works... yes checking if gcc supports -c -o file.o... yes checking whether the gcc linker (/usr/ccs/bin/ld) supports shared libraries... 
yes checking whether -lc should be explicitly linked in... yes checking dynamic linker characteristics... solaris2.8 ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is possible... no checking if libtool supports shared libraries... yes checking whether to build shared libraries... yes checking whether to build static libraries... no configure: creating libtool appending configuration tag CXX to libtool checking for ld used by g++... /usr/ccs/bin/ld checking if the linker (/usr/ccs/bin/ld) is GNU ld... no checking whether the g++ linker (/usr/ccs/bin/ld) supports shared libraries... yes checking for g++ option to produce PIC... -fPIC checking if g++ PIC flag -fPIC works... yes checking if g++ supports -c -o file.o... yes checking whether the g++ linker (/usr/ccs/bin/ld) supports shared libraries... yes checking dynamic linker characteristics... solaris2.8 ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is
[R] creating a color display matrix
Hello all, I am trying to use R to create a colored data matrix. My data (after certain normalization and log-transformation steps) give me an x by y matrix of numbers between 0 and -5. I want to create from this matrix an x by y image containing x*y squares, where the value in the original matrix determines the color of each square, so each square is colored on a scale between 0 and -5. For example, my data could look like this:

X1.FcH X2.FcH X3.FcH X4.FcH X5.FcH X6.FcH X7.FcH
1-AP  0.09667593 -4.66298640 -1.28299697 -4.8739017 -4.95862831 -5.178603 -4.878524750
2-AP -4.69186869 -0.08547776 -4.56495440 -4.8348255 -4.80256152 -5.121531 -4.894347108
3-AP -1.71380667 -4.52626124 -0.06810053 -4.8703810 -4.65657593 -5.024595 -4.824712621
4-AP -4.47968850 -4.48604718 -4.44314403 -0.1569536 -4.86436977 -4.988196 -4.550416356
5-AP -4.64616469 -4.5307 -4.78163386 -4.9162949 -0.01729274 -5.061663 -0.769960777
6-AP -4.61047573 -4.60917414 -4.72514817 -5.0084772 -4.87797740 -0.284934 -1.782745357
7-AP -4.48157167 -4.61850313 -4.72241281 -4.8694868 -1.66122821 -3.887898 -0.002522157

How do I make a 7 x 7 box that has 49 squares, where each square has a color in the RGB spectrum corresponding to its value? So, for example, the biggest number, 0.0967 (at position 1, 1), could be set to RED and the smallest, -5.0084 (at position 6, 4), to BLUE, with all the other numbers as shades on a scale from red to blue. I hope this problem makes sense. I am rather new to R and was wondering if there is a function or solution to this problem out there. Thanks, Kartik

--
IMPORTANT WARNING: This email (and any attachments) is only intended for the use of the person or entity to which it is addressed, and may contain information that is privileged and confidential. You, the recipient, are obligated to maintain it in a safe, secure and confidential manner.
Unauthorized redisclosure or failure to maintain confidentiality may subject you to federal and state penalties. If you are not the recipient, please immediately notify us by return email, and delete this message from your computer. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating a color display matrix
?image has worked for me.

David L. Reiner
Rho Trading Securities, LLC
Chicago IL 60605
312-362-4963

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kartik Pappu Sent: Wednesday, July 26, 2006 3:30 PM To: r-help@stat.math.ethz.ch Subject: [R] creating a color display matrix [...]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
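As a concrete illustration of the image() suggestion, a minimal sketch might look like this (the matrix is random stand-in data, and the red-to-blue palette is just one of many possible choices):

```r
# Stand-in for the poster's 7 x 7 matrix of values between -5 and 0
set.seed(1)
m <- matrix(runif(49, min = -5, max = 0), nrow = 7)

# 100 colors running from blue (lowest value) to red (highest)
pal <- colorRampPalette(c("blue", "red"))(100)

# image() draws row 1 at the bottom and transposes the matrix,
# so flip the row order to keep the usual matrix orientation
image(t(m[nrow(m):1, ]), col = pal, axes = FALSE)
```

The heatmap() function (with Rowv = NA, Colv = NA to suppress the dendrograms) gives a similar display with row and column labels.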
Re: [R] configure fails for R 2.3.1 on SunOS 5.8
Benjamin Tyner [EMAIL PROTECTED] writes: Does this mean I need to use '--with-readline=no'? configure says: configure: error: --with-readline=yes (default) and headers/libs are not available Yes, or that you need to install readline headers/libs (in a sufficiently recent version), or that you have installed them, but not where R looks for them, so that you need to specify the location. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] configure fails for R 2.3.1 on SunOS 5.8
Thanks; configure completes successfully using --with-readline=no. However, toward the end of running make, it reports:

mkdir ../share/locale/[EMAIL PROTECTED]
mkdir ../share/locale/[EMAIL PROTECTED]/LC_MESSAGES
[EMAIL PROTECTED] you should 'make docs' now ...
*** Error code 255
make: Fatal error: Command failed for target `R.1'
Current working directory ~/btyner/R-2.3.1/doc
*** Error code 1 (ignored)
building all R object docs (text, HTML, LaTeX, examples)
you need Perl version 5 to build the R object docs
*** Error code 1
make: Fatal error: Command failed for target `help-indices'
Current working directory ~/btyner/R-2.3.1/src/library
*** Error code 1
make: Fatal error: Command failed for target `docs'
Current working directory ~/btyner/R-2.3.1/src/library
*** Error code 1 (ignored)
begin installing recommended package VR

I do not have Perl installed, as I thought one could install sans documentation. Ben

Peter Dalgaard wrote: [...]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] PCA with not non-negative definite covariance
My apologies (in response to the last 2 replies). I should write sensibly - including subject titles that make grammatical sense. (1) By analogous, I mean that using classical MDS with Euclidean distance is equivalent to plotting the first k principal components. (2) Agreed re. distribution assumptions. (3) Agreed re. the need to use some kind of imputation for calculating distances. I'm thinking pairwise exclusion for correlation. Re. why I want to do this: it is simply for graphically representing my data. Quin

-Original Message- From: Berton Gunter [mailto:[EMAIL PROTECTED] Sent: 26 July 2006 05:10 PM To: 'Quin Wills'; [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: RE: [R] PCA with not non-negative definite covariance

Not sure what "completely analogous" means; MDS is nonlinear, PCA is linear. In any case, the bottom line is that if you have high-dimensional data with many missing values, you cannot know what the multivariate distribution looks like -- and you need a **lot** of data with many variables to usefully characterize it anyway. So you must either make some assumptions about what the distribution could be (including imputation methodology) or use any of the many exploratory techniques available to learn what you can. Thermodynamics holds -- you can't get something for nothing (you can't fool Mother Nature). -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Quin Wills Sent: Wednesday, July 26, 2006 8:44 AM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] PCA with not non-negative definite covariance

Thanks. I suppose that another option could be just to use classical multi-dimensional scaling. By my understanding this is (if based on a Euclidean measure) completely analogous to PCA, and because it's based explicitly on distances, I could easily exclude the variables with NAs on a pairwise basis when calculating the distances. Quin

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: 25 July 2006 09:24 AM To: Quin Wills Cc: r-help@stat.math.ethz.ch Subject: Re: [R] PCA with not non-negative definite covariance

Hi all, Am I correct to understand from the previous discussions on this topic (a few years back) that if I have a matrix with missing values my PCA options seem dismal if: (1) I don't want to impute the missing values. (2) I don't want to completely remove cases with missing values. (3) I do cov() with use="pairwise.complete.obs", as this produces negative eigenvalues (which it has in my case!). (4) Maybe you can use the Non-linear Iterative Partial Least Squares (NIPALS) algorithm (intensively used in chemometrics). S. Dray proposes a version of this procedure at http://pbil.univ-lyon1.fr/R/additifs.html. Hope this helps :) Pierre -- This message was sent from the IMP (Internet Messaging Program) webmail.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
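For what it's worth, a minimal sketch of the classical-MDS route discussed above (random stand-in data; note that dist() already handles NAs on a pairwise basis, scaling each distance up to compensate for the excluded coordinates):

```r
set.seed(1)
X <- matrix(rnorm(200), nrow = 20)
X[sample(length(X), 10)] <- NA   # knock out some values

d   <- dist(X)                   # Euclidean; NA pairs excluded per distance
mds <- cmdscale(d, k = 2)        # first two principal coordinates
plot(mds[, 1], mds[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2")
```

With complete data and Euclidean distances this reproduces the first two principal-component scores (up to sign and rotation), which is the equivalence Quin refers to.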
[R] RODBC on linux
Anyone out there using Linux RODBC and unixODBC to connect to a Microsoft SQL server? If possible can someone post a sample .odbc.ini file? I saw a few discussions on the archives a few years ago, but no config file details were available. Thanks, Whit This e-mail message is intended only for the named recipient(s) above. It may contain confidential information. If you are not the intended recipient you are hereby notified that any dissemination, distribution or copying of this e-mail and any attachment(s) is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender by replying to this e-mail and delete the message and any attachment(s) from your system. Thank you. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Codes; White's heteroscedasticity test and GARCH models
Hello, I have just recently started using R and was wondering whether anybody has code for White's heteroscedasticity correction of standard errors. Also, can anybody share code for the GARCH(1,1) and GARCH-in-mean models for modelling regression residuals? Thanks a lot in advance, Spyros - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Codes; White's heteroscedasticity test and GARCH models
Check the tseries and fSeries packages for GARCH.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Spiros Mesomeris Sent: Wednesday, July 26, 2006 5:00 PM To: r-help@stat.math.ethz.ch Subject: [R] Codes; White's heteroscedasticity test and GARCH models [...]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Codes; White's heteroscedasticity test and GARCH models
Spyros: I have just recently started using R and was wondering whether anybody had a code written for White's heteroscedasticity correction for standard errors. See package sandwich, particularly functions vcovHC() and sandwich(). Also, can anybody share a code for the GARCH(1,1) and GARCH-in-mean models for modelling regression residuals? See function garch() in package tseries. Furthermore, the econometrics and finance task views might be helpful for you: http://CRAN.R-project.org/src/contrib/Views/Econometrics.html http://CRAN.R-project.org/src/contrib/Views/Finance.html hth, Z __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
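A short sketch combining both suggestions (a built-in dataset stands in for the poster's regression, and the lmtest package's coeftest() is assumed to be installed for reporting the corrected standard errors):

```r
library(sandwich)  # vcovHC() / sandwich() for White-corrected covariances
library(lmtest)    # coeftest(), assumed available, applies them to a fit
library(tseries)   # garch()

# White heteroscedasticity-consistent standard errors for a linear model
fit <- lm(dist ~ speed, data = cars)
coeftest(fit, vcov = vcovHC(fit, type = "HC0"))

# GARCH(1,1) fitted to the regression residuals
g <- garch(residuals(fit), order = c(1, 1))
summary(g)
```

garch() in tseries fits plain GARCH only; for GARCH-in-mean the fSeries family mentioned above is the more likely home.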
Re: [R] Overplotting: plot() invocation looks ugly ... suggestions?
Hi Hadley, Thanks for your suggestion. The description of ggplot states: Description: ... It combines the advantages of both base and lattice graphics ... and you can still build up a plot step by step from multiple data sources. So I thought I'd try to enhance the plot by adding in the means from each quarter (this is snagged directly from ESS):

qplot(Quarter, Consumption, data=data, type=c("point","line"), id=data$Year)
( mean.per.quarter <- with(data, tapply(Consumption, Quarter, mean)) )
    1     2     3     4
888.2 709.2 616.4 832.8
points(mean.per.quarter, pch="+", cex=2.0)
Error in plot.xy(xy.coords(x, y), type = type, ...) :
        plot.new has not been called yet

Now I'm wet behind the ears when it comes to R, so I'm guessing that there is some major conflict between base graphics and lattice graphics, which I thought ggplot avoided, given the library help blurb. I'm assuming that there must be a way to add points / lines to lattice / ggplot graphics (in the latter case it seems to be via ggpoint, or some such)? But is there a way that allows me to add via points(mean.per.quarter, pch="+", cex=2.0) and similar, or do I have to learn the lingo for lattice / ggplot? Thanks, Jack.

hadley wickham [EMAIL PROTECTED] wrote: And if lattice is ok then try this:

library(lattice)
xyplot(Consumption ~ Quarter, group = Year, data, type = "o")

Or you can use ggplot:

install.packages("ggplot")
library(ggplot)
qplot(Quarter, Consumption, data=data, type=c("point","line"), id=data$Year)

Unfortunately this has uncovered a couple of small bugs for me to fix (no automatic legend, and having to specify the data frame explicitly). The slightly more verbose example below shows you what it should look like.

data$Year <- factor(data$Year)
p <- ggplot(data, aes=list(x=Quarter, y=Consumption, id=Year, colour=Year))
ggline(ggpoint(p), size=2)

Regards, Hadley - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R vs. Stata
Thanks Patrick! -Cody -Original Message- From: Patrick Burns [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 26, 2006 13:35 PM To: Hamilton, Cody Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R vs. Stata There is some discussion in: http://www.burns-stat.com/pages/Tutor/R_relative_statpack.pdf which can also be found at the UCLA website. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Hamilton, Cody wrote: I have read some very good reviews comparing R (or Splus) to SAS. Does anyone know if there are any reviews comparing R (or Splus) to Stata? I am trying to get others to try R in my department, and I have never used Stata. Regards, -Cody Cody Hamilton, Ph.D Institute for Health Care Research and Improvement Baylor Health Care System (214) 265-3618 This e-mail, facsimile, or letter and any files or attachments transmitted with it contains information that is confidential and privileged. This information is intended only for the use of the individual(s) and entity(ies) to whom it is addressed. If you are the intended recipient, further disclosures are prohibited without proper authorization. If you are not the intended recipient, any disclosure, copying, printing, or use of this information is strictly prohibited and possibly a violation of federal or state law and regulations. If you have received this information in error, please notify Baylor Health Care System immediately at 1-866-402-1661 or via e-mail at [EMAIL PROTECTED] Baylor Health Care System, its subsidiaries, and affiliates hereby claim all applicable privileges related to this information. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
This e-mail, facsimile, or letter and any files or attachmen...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory problems when combining randomForests [Broadcast]
You need to give us more details, like how you call randomForest, versions of the package and R itself, etc. Also, see if this helps you: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html Andy

From: Eleni Rapsomaniki Dear all, I am trying to train a randomForest using all my control data (12,000 cases, ~ 20 explanatory variables, 2 classes). Because of memory constraints, I have split my data into 7 subsets and trained a randomForest for each, hoping that using combine() afterwards would solve the memory issue. Unfortunately, combine() still runs out of memory. Is there anything else I can do? (I am not using the formula version) Many Thanks Eleni Rapsomaniki

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
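For reference, the subset-then-combine() pattern itself works along these lines (toy data; whether it actually relieves the memory pressure depends on the size of the stored forests, as the linked post discusses):

```r
library(randomForest)
set.seed(1)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300),
                y  = factor(sample(c("ctrl", "case"), 300, replace = TRUE)))

# Train a small forest on each chunk (x/y interface, as in the original post)
parts   <- split(d, rep(1:3, each = 100))
forests <- lapply(parts, function(p)
  randomForest(x = p[, c("x1", "x2")], y = p$y, ntree = 50))

# Merge them into one 150-tree forest
rf <- do.call(combine, forests)
rf$ntree
```

Since each component forest must be held in memory during combine(), reducing ntree or nodesize per chunk is usually the first lever to try.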
Re: [R] Overplotting: plot() invocation looks ugly ... suggestions?
With the lattice package it would be done like this (where the panel.points call places large red pluses on the plot):

xyplot(Consumption ~ Quarter, group = Year, data, type = "o")
trellis.focus("panel", 1, 1)
panel.points(1:4, mean.per.quarter, pch = "+", cex = 2, col = "red")
trellis.unfocus()

On 7/26/06, John McHenry [EMAIL PROTECTED] wrote: [...]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] seq unexpected behavior
seq(0.1, 0.9 - 0.8, by = 0.1) gives the following error message: Error in seq.default(0.1, 0.9 - 0.8, by = 0.1) : wrong sign in 'by' argument but seq(0.1, 0.8 - 0.7, by = 0.1) gives [1] 0.1 (no error message) Why do I get an error message in the first case? Han sessionInfo() R version 2.2.1, 2005-12-20, i386-pc-mingw32 attached base packages: [1] methods stats graphics grDevices utils datasets [7] base (NB I also tried version 2.3.1 and got the same result - both versions are precompiled) Sys.getlocale() [1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 This email message is for the sole use of the intended recip...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] seq unexpected behavior
On Wed, 2006-07-26 at 18:35 -0700, Vries, Han de wrote: seq(0.1, 0.9 - 0.8, by = 0.1) gives the following error message: Error in seq.default(0.1, 0.9 - 0.8, by = 0.1) : wrong sign in 'by' argument but seq(0.1, 0.8 - 0.7, by = 0.1) gives [1] 0.1 (no error message) Why do I get an error message in the first case? Han

See R FAQ 7.31, "Why doesn't R think these numbers are equal?"

print(0.9 - 0.8, 20)
[1] 0.099999999999999977796
print(0.8 - 0.7, 20)
[1] 0.10000000000000008882

In the first case, the result of the subtraction is slightly less than 0.1, resulting in a negative interval. In the second case, it is slightly greater than 0.1, which is OK. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
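Two standard idioms follow from this (general R practice, not from the thread): compare with a tolerance rather than exactly, or pad the endpoint when building such sequences:

```r
identical(0.9 - 0.8, 0.1)   # FALSE: the doubles differ in the last bits
all.equal(0.9 - 0.8, 0.1)   # TRUE: equal within the default tolerance

# One workaround for seq(): nudge the endpoint by a tiny fuzz factor
seq(0.1, 0.9 - 0.8 + 1e-10, by = 0.1)   # [1] 0.1
```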
[R] Non-parametric four-way interactions?
Dear All, I am trying to study four-way interactions in an ANOVA problem. However, the qqnorm+qqline result (at http://phhs80.googlepages.com/qqnorm.png) is not promising regarding the normality of the data (960 observations). The result of the Shapiro-Wilk test is also not encouraging: W = 0.9174, p-value < 2.2e-16 (I am aware of the fact that normality tests tend to reject normality for large samples.) By the way, the histogram is at: http://phhs80.googlepages.com/hist.png To circumvent the problem, I looked for non-parametric tests, but found nothing except the article: http://www.pgia.ac.lk/socs/asasl/journal_papers/PDFformat/g.bakeerathanpaper-2.pdf Finally, my question: does R implement non-parametric tests that avoid the normality assumption required to study four-way interactions? Thanks in advance, Paul __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC on linux
On Wed, 2006-07-26 at 17:52 -0400, Armstrong, Whit wrote: Anyone out there using Linux RODBC and unixODBC to connect to a Microsoft SQL server? If possible can someone post a sample .odbc.ini file? I saw a few discussions on the archives a few years ago, but no config file details were available. Thanks, Whit Whit, Do you have a Linux ODBC driver for SQL Server? unixODBC is simply the driver manager, not the driver itself. MS does not offer (not surprisingly) an ODBC driver for Unix/Linux. There are resources available however and these might be helpful: http://www.sommarskog.se/mssql/unix.html Note that Easysoft provides (at a cost) an ODBC-ODBC bridge for Unix/Linux platforms which supports ODBC connections to SQL Server: http://www.easysoft.com/products/data_access/odbc_odbc_bridge/index.html I am using RODBC to connect from a FC5 system to an Oracle 10g server running on RHEL, however Oracle provides the ODBC driver for Linux that can work with the unixODBC facilities. Also, note that there is a R-sig-DB e-mail list: https://stat.ethz.ch/mailman/listinfo/r-sig-db HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC on linux
On 26 July 2006 at 20:56, Marc Schwartz wrote: | On Wed, 2006-07-26 at 17:52 -0400, Armstrong, Whit wrote: | Anyone out there using Linux RODBC and unixODBC to connect to a | Microsoft SQL server? [...] | Do you have a Linux ODBC driver for SQL Server? unixODBC is simply the | driver manager, not the driver itself. | | MS does not offer (not surprisingly) an ODBC driver for Unix/Linux. But there is the FreeTDS project (package freetds-dev in Debian) with its associated ODBC driver (package tdsodbc, from the FreeTDS sources). At some point a few years ago, a colleague and I were trying to coax that and unixODBC to let R (on Solaris) talk to Sybase (on Solaris), and got it to work. (MS-SQL is, AFAIK, a descendant of Sybase code originally licensed by MS, hence the common FreeTDS code lineage.) So it should be doable. Luckily I haven't needed to talk to MS SQL myself, so the usual grain-of-salt alert... And sorry, hence no working .odbc.ini to share. Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC on linux
On Wed, 2006-07-26 at 21:38 -0500, Dirk Eddelbuettel wrote: On 26 July 2006 at 20:56, Marc Schwartz wrote: | On Wed, 2006-07-26 at 17:52 -0400, Armstrong, Whit wrote: | Anyone out there using Linux RODBC and unixODBC to connect to a | Microsoft SQL server? [...] | Do you have a Linux ODBC driver for SQL Server? unixODBC is simply the | driver manager, not the driver itself. | | MS does not offer (not surprisingly) an ODBC driver for Unix/Linux. But there is the FreeTDS project (package freetds-dev in Debian) with its associated ODBC drive (package tdsodbc, from the FreeTDS sources). At some point a few years ago, a colleague and I were trying to coax that and unixOBDC to let R (on Solaris) talk to Sybase (on Solaris) and got it to work. MS-SQL is (AFAIK) a descendant of Sybase code originally licensed by MS, hence the common FreeTDS code lineage). So it should be doable. Luckily I haven't needed to talk to MS SQL myself so the usual grain of salt alert... And sorry, hence no working .odbc.ini to share. FreeTDS was one of the options listed on the first URL that I had included. :-) Here is the direct link: http://www.freetds.org/ Regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
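To give the original question something concrete, a FreeTDS-based setup might look like the following. Every name, path, and host below is illustrative and distribution-dependent, so treat it as a template rather than a known-good configuration:

```
# /etc/odbcinst.ini -- register the FreeTDS driver with unixODBC
# (driver path varies by distribution)
[FreeTDS]
Description = FreeTDS driver (MS SQL Server / Sybase)
Driver      = /usr/lib/odbc/libtdsodbc.so

# ~/.odbc.ini -- a DSN that uses it (all values illustrative)
[mssql]
Driver      = FreeTDS
Server      = sqlhost.example.com
Port        = 1433
TDS_Version = 8.0
Database    = mydb
```

From R, one would then connect with library(RODBC); odbcConnect("mssql", uid = "user", pwd = "pass") -- untested here, per the grain-of-salt alert above.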
Re: [R] Non-parametric four-way interactions?
Paul Smith wrote: [...]

Yes, although I seldom want to look at 4th-order interactions. You can fit a proportional odds model for an ordinal response, which is a generalization of the Wilcoxon/Kruskal-Wallis approach and allows one to have N-1 intercepts in the model when there are N data points (i.e., it works even with no ties in the data). However, if N is large the matrix operations will be prohibitive, and you might reduce Y to 100-tile groups. The PO model uses only the ranks of Y, so it is invariant to monotonic transformations.

library(Design)  # also requires library(Hmisc)
f <- lrm(y ~ a*b*c*d)
f
anova(f)

Also see the polr function in VR.

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC on linux
On Wed, 26 Jul 2006, Marc Schwartz wrote: On Wed, 2006-07-26 at 17:52 -0400, Armstrong, Whit wrote: Anyone out there using Linux RODBC and unixODBC to connect to a Microsoft SQL server? If possible can someone post a sample .odbc.ini file? I saw a few discussions on the archives a few years ago, but no config file details were available. Thanks, Whit Whit, Do you have a Linux ODBC driver for SQL Server? unixODBC is simply the driver manager, not the driver itself. MS does not offer (not surprisingly) an ODBC driver for Unix/Linux. There are resources available however and these might be helpful: http://www.sommarskog.se/mssql/unix.html Note that Easysoft provides (at a cost) an ODBC-ODBC bridge for Unix/Linux platforms which supports ODBC connections to SQL Server: http://www.easysoft.com/products/data_access/odbc_odbc_bridge/index.html Several people have successfully used that, from the earliest days of RODBC: I believe it was part of Michael Lapsley's motivation to write RODBC. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.