Re: [R] normality testing with nortest
Hi Rolf, On Sun, 21 May 2006, Rolf Turner wrote: I don't know from the nortest package, but it should ***always*** be the case that you test hypotheses ... If the nortest package does it differently (and I don't really see how it possibly could!) then it is confusingly designed. I rather suspect that its design is just fine, and that it does what it should do. Thank you for your reply. As I had expected, it was my knowledge of the test which was inadequate and it had nothing to do with the package. No doubt it works fine! Your explanation sounds very reasonable to me...thanks again! Ray __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
Gabor == Gabor Grothendieck [EMAIL PROTECTED] on Sun, 21 May 2006 09:47:07 -0400 writes: Gabor: If you know that test is a scalar,

  result <- if (test) a else b

will do it. Yes, indeed! IMO, ifelse(test, a, b) is much overused, whereas if(test) a else b is much UNDER used. From some e-mail postings, and even some documents (even printed books?), I get the impression that too many people think that ifelse(.,.,.) is to be used as an expression / function and if(.) . else . only for program flow control. This leads to quite suboptimal code, and I personally use if(.) . else . __as an expression__ much more frequently than ifelse(.,.,.). Martin Maechler, ETH Zurich. Gabor: Here is another approach:

  as.vector(test * ts(a) + (!test) * ts(b))

Gabor: On 5/21/06, ivo welch [EMAIL PROTECTED] wrote: Dear R wizards: I just got stung by the ifelse() feature.

  a <- 10:15
  b <- 20:300
  test <- 1
  ifelse(test, a, b)
  [1] 10

I had not realized that this was the default behavior -- I had expected 10:15. Mea culpa. However, I wonder whether it would make sense to replace ifelse with a different semantic, where if test is a single scalar, it means what a stupid user like me would imagine. Aside: I like the flexibility of R, but I am not thrilled by all the recycling rules. I either mean I want a scalar or a vector of equal/appropriate dimension. I never want a recycle of a smaller vector. (I do often use a recycle of a scalar.) regards, /iaw
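Martin's distinction can be made concrete with a short snippet (the values are chosen purely for illustration):

```r
a <- 10:15
b <- 20:25
test <- TRUE

# ifelse() shapes its result like 'test', so a length-1 test
# yields a length-1 result -- the surprise described above:
r1 <- ifelse(test, a, b)   # r1 is just 10

# if () ... else ... evaluates one branch and returns it whole,
# which is what you want when 'test' is a single logical:
r2 <- if (test) a else b   # r2 is 10:15

# ifelse() is meant for element-wise selection with a logical vector:
r3 <- ifelse(a > 12, a, b) # 20 21 22 13 14 15
```

The rule of thumb: `if () ... else ...` for a single condition, `ifelse()` only when the condition is itself a vector.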
Re: [R] normality testing with nortest
Rolf Turner wrote: I don't know from the nortest package, but it should ***always*** be the case that you test the hypotheses

  H_0: The data have a normal distribution.
vs.
  H_a: The data do not have a normal distribution.

So if you get a p-value < 0.05 you can say that ***there is evidence*** (at the 0.05 significance level) that the data are not from a normal distribution. If the nortest package does it differently (and I don't really see how it possibly could!) then it is confusingly designed. I rather suspect that its design is just fine, and that it does what it should do. I suspect so as well. If you think something is wrong, please contact the package maintainer (CCing; he's not reading R-help posts). Uwe Ligges. cheers, Rolf Turner [EMAIL PROTECTED] Original message: I have a question regarding normality testing with the nortest package. I have to admit, my question is so general that it might be better suited to a newsgroup such as sci.math. However, just in case it is implemented differently, I was hoping someone who has used it can help me out. I just want to double-check that for all of the tests, the null hypothesis is that the input distribution does *not* differ from the normal distribution. So, if you get a p-value less than (say) 0.05, you can reject the null hypothesis at the 0.05 significance level and say that the input distribution is *not* normal. So these tests check/confirm whether a distribution is not normal... as opposed to confirming that it is normal. Does this sound about right? Ray
Re: [R] How can you buy R?
G'day Deepayan, DS == Deepayan Sarkar [EMAIL PROTECTED] writes: DS: let me first summarize this sub-discussion so far: [...] Sounds like a perfect summary. :) DS: As far as I can tell (and please correct me if I'm wrong), your contention is that by linking a GPL component P with a non-GPL component Q, a user may lose the rights granted to him by the GPL to the GPL'd part P. I don't think that I said this explicitly, but I can see how what I said can be interpreted in such a way. The point is rather that at the moment components P and Q are linked (and I perhaps carelessly assumed that the user was doing this) a product is produced that should be completely under the GPL. Obviously it is not. Hence, the status of this linked product, and whether it can be used by anybody, is an open question. And the answer is probably given by the copyright laws (and others?) of the country in which the linking happens. DS: Let's assume this is true. All that means is that the user has lost his rights to copy, modify and redistribute P. He does NOT lose the rights to use P. I agree with you on this. Probably I was too terse in my writing and produced misunderstandings. I never intended to say something about the rights that the user has with regard to P alone. My comments were directed towards the linked product P+Q. In particular, it is not clear to me whether one can execute such a product without violating copyright laws. Thus, the last sentence of mine that you quoted: "My understanding was that in that moment a product was created that would have to be wholly under the GPL, so the user was violating the GPL and lost the right to use your package." should perhaps better be formulated as: "My understanding was that in that moment a product was created that would have to be wholly under the GPL, so the person who did the linking was violating the GPL and it is not clear whether anyone is allowed to use the linked product."
A simple google search would have confirmed to you that the linux kernel is developed under the GPL. [...] DS: Linux is under GPL2, and not "GPL2 or later". [...] Oh, I wasn't aware that they did not use the typical(?) "or later" phrase. Thanks for pointing this out, and I note that we both agree that the linux kernel is definitely not under the LGPL. DS: In any case, this is the complete opposite of the situation we were originally discussing: [...] [...] So I have to wonder what you are referring to as "the situation we were originally discussing". DS: I was referring to your question (quoted above) about use of GPL'd code in S-PLUS, which is what I was replying to. As I was saying, that situation is the opposite of the one in your example. O.k., sorry, I used a different time scale, with the point of origin at Spencer's e-mail and my answer to that mail. Now I am with you. Agreed, the situation is the opposite, but that was the example discussed in gnu.misc.discuss. From an abstract point of view the situations are the same: you make someone else link a GPL product with a non-GPL product, creating a derived work; the derived work would have to be under the GPL but is not. Hence, the derived work has a legal status that is in limbo, and it is not clear whether anyone has the right to use it. The discussions on gnu.misc.discuss were centred on cases where people provided non-GPL binaries, asked their users to download GPL software from elsewhere, compile and link everything together, and then use the combined product. As you say, it is the exact opposite (and hence mirror image) of the situation that I was worried about, where I provide GPL software and ask others to compile and link it with non-GPL binaries and then use the combined product. If one scenario is not on, I don't see how the other one could be acceptable either. Except that in the first scenario there is a clear intent of circumventing the GPL.
But I was not sure whether such intent makes any difference. Thus, to avoid all these problems I decided to use the LGPL instead, since that licence definitely seemed to allow both. Hope this clarifies some of my comments. Cheers, Berwin
Re: [R] nls fitting
Lorenzo Isella wrote: Dear All, I may look ridiculous, but I am puzzled by the behavior of nls with a fit I am currently dealing with. My data are:

           x          N
 1  346.4102 145.428256
 2  447.2136 169.530634
 3  570.0877 144.081627
 4  721.1103 106.363316
 5  894.4272 130.390552
 6 1264.9111  36.727069
 7 1788.8544  52.848587
 8 2449.4897  25.128742
 9 3464.1016   7.531766
10 4472.1360   8.827367
11 6123.7244   6.600603
12 8660.2540   4.083339

I would like to fit N as a function of x according to a function depending on 9 parameters (A1, A2, A3, mu1, mu2, mu3, myvar1, myvar2, myvar3), namely

N ~ log(10)*A1/sqrt(2*pi)/log(myvar1)*exp(-((log(x/mu1))^2)/2/log(myvar1)/log(myvar1)) +
    log(10)*A2/sqrt(2*pi)/log(myvar2)*exp(-((log(x/mu2))^2)/2/log(myvar2)/log(myvar2)) +
    log(10)*A3/sqrt(2*pi)/log(myvar3)*exp(-((log(x/mu3))^2)/2/log(myvar3)/log(myvar3))

(i.e. N is to be seen as a sum of three bells whose parameters I need to determine). So I tried:

out <- nls(N ~ (log(10)*A1/sqrt(2*pi)/log(myvar1)*exp(-((log(x/mu1))^2)/2/log(myvar1)/log(myvar1))
               + log(10)*A2/sqrt(2*pi)/log(myvar2)*exp(-((log(x/mu2))^2)/2/log(myvar2)/log(myvar2))
               + log(10)*A3/sqrt(2*pi)/log(myvar3)*exp(-((log(x/mu3))^2)/2/log(myvar3)/log(myvar3))),
           start = list(A1 = 85, A2 = 23, A3 = 4, mu1 = 430, mu2 = 1670, mu3 = 4900,
                        myvar1 = 1.59, myvar2 = 1.5, myvar3 = 1.5),
           algorithm = "port",
           control = list(maxiter = 2, tol = 1),
           lower = c(A1 = 0.1, A2 = 0.1, A3 = 0.1, mu1 = 0.1, mu2 = 0.1, mu3 = 0.1,
                     myvar1 = 0.1, myvar2 = 0.1, myvar3 = 0.1))

getting the error message:

Error in nls(N ~ (log(10) * A1/sqrt(2 * pi)/log(myvar1) * exp(-((log(x/mu1))^2)/2/log(myvar1)/log(myvar1)) + :
  Convergence failure: singular convergence (7)

I tried adjusting tol and maxiter, but unsuccessfully.
If I try fitting N with only two bells, then nls works:

out <- nls(N ~ (log(10)*A1/sqrt(2*pi)/log(myvar1)*exp(-((log(x/mu1))^2)/2/log(myvar1)/log(myvar1))
               + log(10)*A2/sqrt(2*pi)/log(myvar2)*exp(-((log(x/mu2))^2)/2/log(myvar2)/log(myvar2))),
           start = list(A1 = 85, A2 = 23, mu1 = 430, mu2 = 1670, myvar1 = 1.59, myvar2 = 1.5),
           algorithm = "port",
           control = list(maxiter = 2, tol = 1),
           lower = c(A1 = 0.1, A2 = 0.1, mu1 = 0.1, mu2 = 0.1, myvar1 = 0.1, myvar2 = 0.1))
out
Nonlinear regression model
  model: N ~ (log(10) * A1/sqrt(2 * pi)/log(myvar1) * exp(-((log(x/mu1))^2)/2/log(myvar1)/log(myvar1)) + log(10) * A2/sqrt(2 * pi)/log(myvar2) * exp(-((log(x/mu2))^2)/2/log(myvar2)/log(myvar2)))
  data: parent.frame()
         A1         A2        mu1        mu2     myvar1     myvar2
  84.920085  40.889968 409.656404 933.081936   1.811560   2.389215
 residual sum-of-squares: 2394.876

Any idea how to get nls working with the whole model? I had better luck with the nls.lm package, but it does not allow me to introduce any constraints on my fitting parameters. I was also advised to try other tools such as optim to do the same fitting, but I am a bit unsure about how to set up the problem. Any suggestions? BTW, I am working with R version 2.2.1. Lorenzo

Apart from the fact that fitting 9 parameters to 12 points will quite generally not yield satisfactory results (at the least, the estimates will have huge uncertainties), total failure in your case seems obvious if you plot your data: it's not even obvious where the three peaks (means of the Gaussians) should be. All below x = 2000, or is there one peak at about x = 4500 and one of the 'peaks' below x = 2000 is spurious? If you can't decide, nls has problems. Moreover: how should reliable estimates of the standard deviations (widths) of the Gaussians result if the peaks essentially consist of exactly one point?
In short: I believe you either need many more data points or you must put in substantial a priori information (e.g. either the means or the standard deviations of the Gaussians).
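Regarding the poster's question about setting this up with optim: below is a minimal sketch of the same least-squares fit with box constraints via optim(method = "L-BFGS-B"). The data and starting values are taken from the post; the lower bound of 1.01 on the myvar parameters is my own assumption (it keeps log(myvar) positive, as makes sense for a geometric standard deviation), and, per the discussion above, nothing here guarantees a sensible fit of 9 parameters to 12 points:

```r
# Data from the original post
x <- c(346.4102, 447.2136, 570.0877, 721.1103, 894.4272, 1264.9111,
       1788.8544, 2449.4897, 3464.1016, 4472.1360, 6123.7244, 8660.2540)
N <- c(145.428256, 169.530634, 144.081627, 106.363316, 130.390552, 36.727069,
       52.848587, 25.128742, 7.531766, 8.827367, 6.600603, 4.083339)

# One log-normal "bell" from the poster's model
bell <- function(x, A, mu, v)
  log(10) * A / sqrt(2 * pi) / log(v) * exp(-(log(x / mu))^2 / (2 * log(v)^2))

# Residual sum of squares over all three bells; p = (A1..A3, mu1..mu3, v1..v3)
rss <- function(p) {
  fit <- bell(x, p[1], p[4], p[7]) + bell(x, p[2], p[5], p[8]) +
         bell(x, p[3], p[6], p[9])
  sum((N - fit)^2)
}

start <- c(85, 23, 4, 430, 1670, 4900, 1.59, 1.5, 1.5)
out <- optim(start, rss, method = "L-BFGS-B",
             lower = c(rep(0.1, 6), rep(1.01, 3)))  # myvar kept above 1 (assumption)
```

Inspect out$par and out$convergence afterwards; with this few points, several local minima are likely.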
Re: [R] normality testing with nortest
On Mon, 22 May 2006, Uwe Ligges wrote: Rolf Turner wrote: If the nortest package does it differently (and I don't really see how it possibly could!) then it is confusingly designed. I rather suspect that its design is just fine, and that it does what it should do. I suspect so as well. If you think something is wrong, please contact the package maintainer (CCing; he's not reading R-help posts). Ah, OK -- but in this case, it was clearly my misunderstanding, which is one reason why I never thought of writing to the package maintainer. I have one of the books that the nortest documentation cites, but I was clearly reading it backwards or upside-down or something, as I missed several crucial points. One thing that threw me off (and this is not really specific to nortest, which seems to be correct, but just my understanding) is that the p-value seems quite unstable. For example:

ad.test(rnorm(100, mean = 5, sd = 3)) ... A = 0.2382, p-value = 0.7767
ad.test(rnorm(100, mean = 5, sd = 3)) ... A = 0.1846, p-value = 0.9059
ad.test(rnorm(100, mean = 5, sd = 3)) ... A = 0.5138, p-value = 0.1887

I mistakenly thought the p-values would be more stable since I am artificially creating random normal data. Is this expected for a normality test, or is this an issue with how rnorm produces random numbers? I guess if I run it many times, I will find that I get many large values for the p-value? Ray
Re: [R] normality testing with nortest
Raymond Wan wrote: On Mon, 22 May 2006, Uwe Ligges wrote: Rolf Turner wrote: If the nortest package does it differently (and I don't really see how it possibly could!) then it is confusingly designed. I rather suspect that its design is just fine, and that it does what it should do. I suspect so as well. If you think something is wrong, please contact the package maintainer (CCing; he's not reading R-help posts). Ah, OK -- but in this case, it was clearly my misunderstanding, which is one reason why I never thought of writing to the package maintainer. I have one of the books that the nortest documentation cites, but I was clearly reading it backwards or upside-down or something, as I missed several crucial points. One thing that threw me off (and this is not really specific to nortest, which seems to be correct, but just my understanding) is that the p-value seems quite unstable. For example:

ad.test(rnorm(100, mean = 5, sd = 3)) ... A = 0.2382, p-value = 0.7767
ad.test(rnorm(100, mean = 5, sd = 3)) ... A = 0.1846, p-value = 0.9059
ad.test(rnorm(100, mean = 5, sd = 3)) ... A = 0.5138, p-value = 0.1887

I mistakenly thought the p-values would be more stable since I am artificially creating random normal data. Is this expected for a normality test, or is this an issue with how rnorm produces random numbers? I guess if I run it many times, I will find that I get many large values for the p-value? Ray

Well, you should see as many large values as small ones, with about 5% of the replications significant at the 5% level. The following looks alright:

hist(replicate(1000, ad.test(rnorm(100, mean = 5, sd = 3))$p.value))
[R] How to reference R in papers?
Dear List, How do I reference R 2.3.0 in research papers? JJ
--
Lecturer J. Joshua Thomas, KDU College Penang Campus
Research Student, Universiti Sains Malaysia
Re: [R] How to reference R in papers?
Hi JJ, try the following function in R: citation() Christian
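For completeness: citation() also takes a package name, and toBibtex() (in the utils package) renders the entry for LaTeX users. A small sketch; the "nortest" line is just an example and assumes that package is installed:

```r
# Citation for base R itself
citation()

# Citation for a contributed package, e.g.:
# citation("nortest")

# BibTeX entry, handy when writing in LaTeX
toBibtex(citation())
```

The printed entry includes the version-specific year and URL, so the output depends on the R version you run it under.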
Re: [R] Unreadable labels -- in x11() and derivative
Maciej == Maciej Bliziński [EMAIL PROTECTED] on Sun, 21 May 2006 14:48:13 +0200 writes: Maciej: Playing around with examples from MASS4, I found a font problem in the mosaicplot in R-2.3.0. It doesn't happen in other plots. Running this example from MASS4, page 326...

library(MASS)
caith1 <- as.matrix(caith)
names(dimnames(caith1)) <- c("eyes", "hair")
mosaicplot(caith1, color = TRUE)

...I get an image as attached. The column and row labels are unreadable. It is true for both the x11() and png() devices. Looking at demo(graphics), I can also recognize some unreadable labels. Do you think it's only a matter of the X Window System? Installed fonts? Any hints appreciated.

Yes, it's a matter of the font server of your X Window System. As help(png) clearly notes, png() itself also uses the x11() internals. For me, the labels are all perfectly readable for the above example, even here (at work) where the font server fails to provide correct font sizes too often for my taste -- e.g. when drawing graphs with the Rgraphviz package {from Bioconductor}. Martin Maechler, ETH Zurich
[R] How to contribute improved documentation {was .. lm.predict ..}
Larry == Larry Howe [EMAIL PROTECTED] on Sat, 20 May 2006 15:10:09 -0400 writes: "An optional data frame in which to look for variables with which to predict." means almost nothing to me. Specifically, it doesn't convey to me, as an engineer and software developer but not a statistician, that the names have to match, and furthermore that if they don't match, R fails silently. Larry: I have had offline conversations, not related to this thread, with others who were also confused by this, so I agree it would be worthwhile to clarify it. Larry: Thank you! If there is anything I can do to help get this documented / clarified, please let me know. BTW, this has now become a topic that belongs a bit more to the R-devel than the R-help mailing list, but I keep it here for reference and in the hope it will be closed for now. Maybe because of your question (or completely incidentally?), yesterday we had the topic "Suggesting changes to HELP files?" on R-devel, see https://stat.ethz.ch/pipermail/r-devel/2006-May/037785.html and its follow-up postings, notably Duncan Murdoch's https://stat.ethz.ch/pipermail/r-devel/2006-May/037787.html I had written part of the following independently, and I would recommend using it in conjunction with Duncan's. Yes, you can help to get this documentation improved -- ideally, and with the highest probability of success, if you provide a patch *) to the current documentation *source*, which for help pages always means a file named *.Rd from the R sources. These are available online, as a whole (see CRAN -> sources) or, in this case more conveniently, as the complete file tree at https://svn.R-project.org/R/trunk/ and the file in question is https://svn.r-project.org/R/trunk/src/library/stats/man/predict.lm.Rd Since providing a patch *) is hard for average R users, for documentation changes we would also be content to just receive the new version of the text you changed from the corresponding *.Rd file.
What we like much less is an edited version of some *representation* of the help, e.g., any version you get by using help() or browsing any form of the manuals. Martin Maechler, ETH Zurich

*) Providing a patch means:
- edit the file to be changed and keep the original;
- run 'diff -u oldfile newfile' in your shell (which must have 'diff' in its path);
- e-mail (with a mail program that does NOT mangle lines and does not send HTML ..) the output of the above 'diff ..' to R-devel as a proposal for improving R.
[R] Matrix in 3D
Dear R Users, Is it possible to add another (third) index to a matrix (as in MATLAB)? It is necessary for some analyses, e.g. finite mixture models. A simple example:

i <- 3
matrix[, , i] <- matrixA[, , i] %*% matrixB[, , i]

I would appreciate any help. Rob
[R] random generation of a factor
Hi everybody, does anybody know a function that generates a factor of length N with K levels, given the proportions c(p1, p2, ..., pk)? It would not be too hard to write it myself, but if there's a preexisting one... I searched the documentation and the help archive, but could not find anything helpful.

> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    2.0
year     2005
month    10
day      06
svn rev  35749
language R

Thanks a lot. Christian
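No direct answer appears in this part of the thread, but a one-liner along these lines would do it. Two variants are sketched below, depending on whether the proportions should hold in expectation or (up to rounding) exactly; the function name rfactor is made up for the example:

```r
# Expected proportions: each element drawn independently with sample()
rfactor <- function(n, levels, prob) {
  factor(sample(levels, n, replace = TRUE, prob = prob), levels = levels)
}
f <- rfactor(100, c("a", "b", "c"), c(0.5, 0.3, 0.2))

# Exact proportions (up to rounding): build the counts, then shuffle
g <- factor(sample(rep(c("a", "b", "c"), times = round(100 * c(0.5, 0.3, 0.2)))))
table(g)  # a: 50, b: 30, c: 20
```

The first variant gives proportions that are only correct on average; the second fixes the counts exactly, which matters for small N.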
Re: [R] Matrix in 3D
[EMAIL PROTECTED] wrote: Dear R Users, Is it possible to add another (third) index to a matrix (as in MATLAB)? It is necessary for some analyses, e.g. finite mixture models. A simple example:

i <- 3
matrix[, , i] <- matrixA[, , i] %*% matrixB[, , i]

See ?array

Uwe Ligges

I would appreciate any help. Rob
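A minimal sketch of what ?array describes, using the poster's slice-wise product (the dimensions here are chosen purely for illustration):

```r
# A 3-D array is a matrix with a third index: dim = c(rows, cols, slices)
matrixA <- array(rnorm(2 * 2 * 3), dim = c(2, 2, 3))
matrixB <- array(rnorm(2 * 2 * 3), dim = c(2, 2, 3))
result  <- array(0, dim = c(2, 2, 3))

# Slice-wise matrix products, as in the MATLAB-style example above
for (i in 1:3) {
  result[, , i] <- matrixA[, , i] %*% matrixB[, , i]
}

dim(result[, , 1])  # each slice is an ordinary 2 x 2 matrix
```

Indexing with `[, , i]` drops to an ordinary matrix, so `%*%` works on each slice directly.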
[R] how to vectorize a matrix
The help documentation suggests using the command stack. Either it does not work or I did not understand how to manage it. I need to vectorize the columns of a matrix in a certain order; any suggestion (or explanation of the command stack)? Thanks in advance. L

* I believe in human reason, and in the liberty and justice that spring from reason. (Sciascia) *
[R] SUSE 10.1 64 bit binary
Hi, I was just wondering if anyone had created a 64-bit binary for SUSE 10.1 x86_64 yet. Unfortunately, I upgraded before checking whether a package was available, and now cannot find a suitable package on CRAN. If no one has produced one yet, does anyone have any idea when one will be available? Many thanks, Daniel Brewer
Re: [R] Matrix in 3D
Hi, there are dozens of examples of this kind of thing in the R-and-octave.txt file, in the contributed docs section of CRAN. See the multidimensional arrays section, near the bottom, for a reproduction of all of matlab/octave's array handling capabilities. best wishes, Robin

On 22 May 2006, at 09:56, Uwe Ligges wrote: [EMAIL PROTECTED] wrote: Dear R Users, Is it possible to add another (third) index to a matrix (as in MATLAB)? It is necessary for some analyses, e.g. finite mixture models. A simple example:

i <- 3
matrix[, , i] <- matrixA[, , i] %*% matrixB[, , i]

See ?array

Uwe Ligges

I would appreciate any help. Rob

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
Re: [R] how to vectorize a matrix
Hi Lorenzo, maybe the following example is of use?

a <- matrix(1:25, 5, 5)
stack(as.data.frame(a[, c(1, 3, 5, 2, 4)]))

Note that 'stack' takes a data frame or list as its first argument (not a matrix). Therefore the matrix is first converted to a data frame using 'as.data.frame'. Christian
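If only the reordered values are needed, rather than the values/ind data frame that stack returns, plain subsetting plus as.vector gives the same ordering without the data-frame round-trip:

```r
a <- matrix(1:25, 5, 5)

# Columns in the order 1, 3, 5, 2, 4, flattened column by column
v <- as.vector(a[, c(1, 3, 5, 2, 4)])
head(v)  # 1 2 3 4 5 11 -- column 1 first, then column 3
```

This works because a matrix is stored column-major, so subsetting the columns into the desired order and dropping the dim attribute is exactly "vectorize the columns in a certain order".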
Re: [R] normality testing with nortest
Uwe Ligges [EMAIL PROTECTED] writes: A = 0.1846, p-value = 0.9059 ad.test(rnorm(100,mean=5,sd=3)) ... A = 0.5138, p-value = 0.1887 I mistakenly thought the p-values would be more stable since I am artificially creating random normal data. Is this expected for a normality test, or is this an issue with how rnorm produces random numbers? I guess if I run it many times, I will find that I get many large values for the p-value? Well, you should see as many large values as small ones, with about 5% of the replications significant at the 5% level. The following looks alright: hist(replicate(1000, ad.test(rnorm(100,mean=5,sd=3))$p.value)) We see this misunderstanding worryingly often. Worrying because it reveals that a fundamental aspect of statistical inference has not been grasped: p-values are designed to be (approximately) uniformly distributed, falling below any given level with the stated probability, when the null hypothesis is true. There is no mechanism to give you fewer significant or more stable p-values, and a p-value close to 1 is no better an indication of a true null hypothesis than one of 0.5 or 0.25. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
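Peter's point, that p-values are (approximately) uniform when the null hypothesis holds, is easy to check by simulation. A sketch, extending Uwe's one-liner; it requires the nortest package, and the seed is arbitrary:

```r
library(nortest)  # for ad.test; assumes the package is installed

set.seed(1)
p <- replicate(2000, ad.test(rnorm(100, mean = 5, sd = 3))$p.value)

# Under H0 the p-values should be roughly uniform on (0, 1):
mean(p < 0.05)   # close to 0.05 -- the test's false-positive rate
mean(p > 0.95)   # also close to 0.05 -- large p-values are not "extra" evidence
hist(p, breaks = 20)
```

A flat histogram is exactly what a well-calibrated test should produce on data that really are normal.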
Re: [R] determination of number of entries in list elements
Hi, thank you all a lot for the quick replies. regards, Benjamin
Re: [R] SUSE 10.1 64 bit binary
Daniel Brewer [EMAIL PROTECTED] writes: Hi, I was just wondering if anyone had created a 64-bit binary for SUSE 10.1 x86_64 yet. Unfortunately, I upgraded before checking whether a package was available, and now cannot find a suitable package on CRAN. If no one has produced one yet, does anyone have any idea when one will be available? Many thanks, Daniel Brewer I would suspect that the one for 10.0 works for 10.1 as well. With 2.3.1 scheduled for June 1, there's little point in asking for an RPM now. Otherwise, you can fairly easily build one from the sources + Detlef's .spec file. You may need a handful of build tools, though. (Incidentally, Detlef: the Source: entry in the SPEC is wrong. cvs.r-project.org is not accessible and hasn't been for quite a while. ftp://cran.r-project.org/pub/R/src/base/R-2/R-%version.tar.gz works, and is the same one used for RedHat/Fedora.) -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
Re: [R] Matrix in 3D
Thank you very much, Robin and Uwe.

From: Robin Hankin [EMAIL PROTECTED]
To: Uwe Ligges [EMAIL PROTECTED]
Date: 22 May 2006 11:04
Subject: Re: [R] Matrix in 3D
Re: [R] Can lmer() fit a multilevel model embedded in a regression?
So, in the hierarchical notation, does the model look like this (for the linear predictor):

DV = constant + food_1(B_1) + food_2(B_2) + ... + food_82(B_82) + sex(B_83) + age(B_84)

food_1  = gamma_0,0  + gamma_0,1(folic)  + r_1
food_2  = gamma_1,0  + gamma_1,1(folic)  + r_2
...
food_82 = gamma_81,0 + gamma_81,1(folic) + r_82

where (r_1, ..., r_82) ~ N(0, Psi) and Psi is an 82-dimensional covariance matrix. I usually need to see this in model form, as it helps me translate it into lmer syntax if it can be estimated. From what I see, this would be estimating 82(82+1)/2 = 3403 parameters in the covariance matrix. What I'm stuck on is that below you say it would be hopeless to estimate the 82 predictors using ML. But if I understand the model correctly, the multilevel regression still resolves the predictors (fixed effects) using ML once estimates of the variances are obtained. So I feel I might still be missing something. -----Original Message----- From: Andrew Gelman [mailto:[EMAIL PROTECTED]] Sent: Sun 5/21/2006 7:35 PM To: Doran, Harold Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] Subject: Re: [R] Can lmer() fit a multilevel model embedded in a regression? Harold, I get confused by the terms "fixed" and "random". Our first-level model (in the simplified version we're discussing here) has 800 data points (the persons in the study) and 84 predictors: sex, age, and 82 coefficients for foods. The second-level model has 82 data points (the foods) and two predictors: a constant term and folic acid concentration. It would be hopeless to estimate the 82 food coefficients via maximum likelihood, so the idea is to do a multilevel model, with a regression of these coefficients on the constant term and folic acid. The group-level model has a residual variance. If the group-level residual variance is 0, it's equivalent to ignoring food, and just using total folic acid as an individual predictor.
If the group-level residual variance is infinity, it's equivalent to estimating the original regression (with 84 predictors) using least squares. The difficulty is that the foods aren't groups in the usual sense, since persons are not nested within foods; rather, each person eats many foods, and this is reflected in the X matrix. Andrew Doran, Harold wrote: OK, I'm piecing this together a bit, sorry I'm not familiar with the article you cite. Let me try and fully understand the issue if you don't mind. Are you estimating each of the 82 foods as fixed effects? If so, in the example below this implies 84 total fixed effects (1 for each food type in the X matrix and then sex and age). I'm assuming that food type is nested within one of the 82 folic acid concentrations and then folic acid is treated as a random effect. Is this accurate? -Original Message- From: Andrew Gelman [mailto:[EMAIL PROTECTED] Sent: Sun 5/21/2006 9:17 AM To: Doran, Harold Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] Subject: Re: [R] Can lmer() fit a multilevel model embedded in a regression? Harold, I'm confused now. Just for concreteness, suppose we have 800 people, 82 food items, and one predictor (folic, the folic acid concentration) at the food-item level. Then DV will be a vector of length 800, foods is an 800 x 82 matrix, sex is a vector of length 800, age is a vector of length 800, and folic is a vector of length 82. The vector of folic acid concentrations in individual diets is then just foods %*% folic, which I can call folic_indiv. How would I fit the model in lmer(), then? There's some bit of understanding that I'm still missing. Thanks. Andrew Doran, Harold wrote: Prof Gelman: I believe the answer is yes. It sounds as though persons are partially crossed within food items? 
Assuming a logit link, the syntax might follow along the lines of fm1 <- lmer(DV ~ foods + sex + age + (1|food_item), data, family = binomial(link = 'logit'), method = "Laplace", control = list(usePQL = FALSE)) Maybe this gets you partly there. Harold -Original Message- From: [EMAIL PROTECTED] on behalf of Andrew Gelman Sent: Sat 5/20/2006 5:49 AM To: r-help@stat.math.ethz.ch Cc: [EMAIL PROTECTED] Subject: [R] Can lmer() fit a multilevel model embedded in a regression? I would like to fit a hierarchical regression model from Witte et al. (1994; see reference below). It's a logistic regression of a health outcome on quantities of food intake; the linear predictor has the form, X*beta + W*gamma, where X is a matrix of consumption of 82 foods (i.e., the rows of X represent people in the study, the columns represent different foods, and X_ij is the amount of food j eaten by person i); and W is a matrix of some other predictors (sex, age, ...). The second stage of the model is a regression of X on some food-level predictors. Is it possible to fit this model in (the current
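Gelman's two limiting cases can be made concrete with ordinary glm() fits on simulated data. This is only a sketch of the structure he describes, with all data and object names invented; neither fit is the multilevel model itself, they are the variance-zero and variance-infinity endpoints between which the multilevel estimate sits.

```r
set.seed(1)
n <- 800; J <- 82
foods <- matrix(rpois(n * J, 2), n, J)   # amount of each food eaten per person
folic <- runif(J)                        # folic acid concentration per food
sex <- rbinom(n, 1, 0.5)
age <- rnorm(n, 50, 10)
folic_indiv <- drop(foods %*% folic)     # total folic acid in each diet
DV <- rbinom(n, 1, plogis(-2 + 0.1 * folic_indiv))

## group-level variance -> 0: food coefficients proportional to folic acid,
## so only the total folic acid enters as an individual-level predictor
fit0 <- glm(DV ~ folic_indiv + sex + age, family = binomial)

## group-level variance -> infinity: all 82 food coefficients estimated
## freely by maximum likelihood (the "hopeless" unpenalized regression)
fitInf <- glm(DV ~ foods + sex + age, family = binomial)
length(coef(fitInf))  # 85 coefficients including the intercept
```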
Re: [R] how to vectorize a matrix
Christian == Christian Ritz [EMAIL PROTECTED] on Mon, 22 May 2006 11:17:32 +0200 writes: Christian Hi Lorenzo, Christian maybe the following example is of use? Christian a <- matrix(1:25, 5, 5) Christian stack(as.data.frame(a[, c(1,3,5,2,4)])) Christian Note that 'stack' takes a data frame or list as first argument (not a Christian matrix). Therefore the matrix is first converted to a data frame using Christian 'as.data.frame'. I do wonder though if not simply c(a[, c(1,3,5,2,4)]) [1] 1 2 3 4 5 11 12 13 14 15 21 22 23 24 25 6 7 8 9 10 16 17 18 19 20 is enough for Lorenzo, in this case? Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
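A quick check along the lines of the thread, showing that Martin's c() call yields the same values as Christian's stack() approach (stack() additionally returns an index column):

```r
a <- matrix(1:25, 5, 5)
sel <- c(1, 3, 5, 2, 4)
v1 <- c(a[, sel])                            # drop dims: plain vector, column by column
v2 <- stack(as.data.frame(a[, sel]))$values  # same values, via a data frame
identical(v1, v2)  # TRUE
```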
Re: [R] normality testing with nortest
[EMAIL PROTECTED] wrote: One thing that threw me off (and this is not really specific to Nortest as it seems to be correct, but just my understanding), but the p-value seems quite unstable. For example: It is worth noting that if the null hypothesis is true, then the p-value is uniformly distributed on [0,1]. This should be kept in mind when assessing the ``instability'' of p-values. cheers, Rolf Turner [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
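Rolf's point is easy to check by simulation. The sketch below uses base R's shapiro.test() rather than the nortest functions so that no extra package is needed; under the null hypothesis the p-values should be roughly uniform on [0,1], not clustered near 1:

```r
set.seed(42)
## 1000 samples that truly are normal, so H0 holds every time
p <- replicate(1000, shapiro.test(rnorm(100, mean = 5, sd = 3))$p.value)
mean(p < 0.05)  # close to the nominal 0.05
hist(p)         # roughly flat across [0, 1]
```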
Re: [R] random generation of a factor
Hi, Does anybody know a function that generates a factor of length N with K levels given the proportions c(p1, p2, ..., pk)? It would not ## does this code piece help you? mydata <- c("yesterday", "today", "tomorrow") myproportions <- c(0.3, 0.5, 0.2) n <- 20 sample(x = mydata, size = n, replace = TRUE, prob = myproportions) cat("Best,\nRoland\n") -- This mail has been sent through the MPI for Demographic Rese...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
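Since the question asked for a factor rather than a character vector, Roland's suggestion can be wrapped in a small helper. This is a hypothetical convenience function, not part of any package; the name rfactor is invented:

```r
## Generate a factor of length n with the given levels, drawn with the
## given (approximate) proportions. Passing levels= to factor() keeps
## all K levels even if one happens not to be drawn.
rfactor <- function(n, levels, prob) {
  factor(sample(levels, size = n, replace = TRUE, prob = prob),
         levels = levels)
}
table(rfactor(1000, c("A", "B", "C"), c(0.3, 0.5, 0.2)))
```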
Re: [R] SUSE 10.1 64 bit binary
Hi, I just uploaded one to CRAN. You can already use my repository http://fawn.hsu-hh.de/~steuer/SL-10.1 Detlef On Mon, 22 May 2006 09:46:53 +0100 Daniel Brewer [EMAIL PROTECTED] wrote: Hi, I was just wondering if anyone had created a 64 bit binary for SUSE 10.1 x86_64 yet. Unfortunately, I upgraded before looking whether a package was available and now cannot find a suitable package on CRAN. If no one has produced one yet, does anyone have any idea when one will be available? Many Thanks Daniel Brewer __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] SUSE 10.1 64 bit binary
Thx, I'll fix it! Detlef On 22 May 2006 11:43:33 +0200 Peter Dalgaard [EMAIL PROTECTED] wrote: (Incidentally, Detlef: The Source: entry in the SPEC is wrong. cvs.r-project.org is not accessible and hasn't been so for quite a while.) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
On 5/22/2006 3:26 AM, Martin Maechler wrote: Gabor == Gabor Grothendieck [EMAIL PROTECTED] on Sun, 21 May 2006 09:47:07 -0400 writes: Gabor If you know that test is a scalar Gabor result <- if (test) a else b Gabor will do it. Yes, indeed! IMO, ifelse(test, a, b) is much overused whereas if(test) a else b is much UNDER used. From some e-mail postings, and even some documents (even printed books?), I get the impression that too many people think that ifelse(.,.,.) is to be used as expression / function and if(.) . else . only for program flow control. This leads to quite suboptimal code, and I personally use if(.) . else . __as expression__ much more frequently than ifelse(.,.,.) For overuse of ifelse(), do you mean cases where test is length 1, so if() would work? Or are you thinking of something else? I'd also be interested in what you mean by quite suboptimal code. Are you thinking of things like if (test) temp <- a else temp <- b; result <- f(temp) versus result <- f( if (test) a else b ) ? I would generally use the former, because it's easier to get the formatting right, and I find it easier to read. It's suboptimal in speed and memory use because of creating the temp variable, but in most cases I think that would be such a small difference that the small increase in readability is worthwhile. Duncan Murdoch Martin Maechler, ETH Zurich. Gabor Here is another approach: Gabor as.vector(test * ts(a) + (!test) * ts(b)) Gabor On 5/21/06, ivo welch [EMAIL PROTECTED] wrote: Dear R wizards: I just got stung by the ifelse() feature. a <- 10:15 b <- 20:300 test <- 1 ifelse(test, a, b) [1] 10 I had not realized that this was the default behavior---I had expected 10:15. mea culpa. however, I wonder whether it would make sense to replace ifelse with a different semantic, where if test is a single scalar, it means what a stupid user like me would imagine. Aside, I like the flexibility of R, but I am not thrilled by all the recycling rules. 
I either mean I want a scalar or a vector of equal/appropriate dimension. I never want a recycle of a smaller vector. (I do often use a recycle of a scalar.) regards, /iaw __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Gabor __ Gabor R-help@stat.math.ethz.ch mailing list Gabor https://stat.ethz.ch/mailman/listinfo/r-help Gabor PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
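The gotcha in the thread, in miniature: ifelse() shapes its result like `test`, while `if () else` returns whichever branch is chosen, whole.

```r
a <- 10:15
b <- 20:30
test <- TRUE          # length-1 test, as in ivo's example

ifelse(test, a, b)    # length 1: only a[1] is used, because test has length 1
if (test) a else b    # the full vector 10:15
```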
Re: [R] [Rd] Citation for R (was How to reference R in papers?)
The following link will take you to a long-running discussion on your topic: http://tolstoy.newcastle.edu.au/R/help/06/02/21659.html Chuck Charles E. White, MS, Senior Biostatistician Walter Reed Army Institute of Research 503 Robert Grant Ave., Room 1w102 Silver Spring, MD 20910-1557 301 319-9781 Personal/Professional Site: http://users.starpower.net/cwhite571/professional/ -Original Message- Date: Mon, 22 May 2006 16:17:28 +0800 From: j.joshua thomas [EMAIL PROTECTED] Subject: [R] How to reference R in papers? Dear List, How do I reference R 2.3.0 in research papers? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How can you buy R?
On 5/22/06, Berwin A Turlach [EMAIL PROTECTED] wrote: [...] Thus, the last sentence of mine that you quoted: My understanding was that in that moment a product was created that would have to be wholly under the GPL, so the user was violating the GPL and lost the right to use your package. Should perhaps better be formulated as: My understanding was that in that moment a product was created that would have to be wholly under the GPL, so the person who did the linking was violating the GPL and it is not clear whether anyone is allowed to use the linked product. I think you are still missing the point. The act of creating a derivative work is NOT governed by the GPL, so it cannot possibly by itself violate the GPL. The question of violation only applies when the creator of this derivative work wishes to _distribute_ it. This is like me writing a book that no one else ever reads; it doesn't matter if I have plagiarized huge parts of it. This point is not as academic as you might think. It is well known that Google uses a customized version of Linux for their servers; however, they do not distribute this customized version, and hence are under no obligation to provide the changes (and they do not, in fact). This is NOT a violation of the GPL. DS I was referring to your question (quoted above) about use of DS GPL'd code in S-PLUS, which is what I was replying to. As I DS was saying, that situation is the opposite of the one in your DS example. O.k., sorry, I used a different scale with the time point of origin at Spencer's e-mail and my answer to that mail. Now I am with you. Agreed, the situation is the opposite, but that was the example discussed in gnu.misc.discuss. From an abstract point of view the situations are the same. You make someone else link a GPL product with a non-GPL product creating a derived work, the derived work would have to be under the GPL but is not. 
Hence, the derived work has a legal status that is in limbo and it is not clear whether anyone has the right to use it. The discussions on gnu.misc.discuss were centred on cases where people provided non-GPL binaries, asked their users to download GPL software from elsewhere, compile and link everything together and then use the combined product. As you say, it is the exact opposite (and hence mirror image) of the situation that I was worried about, where I provide GPL software and ask others to compile and link it with non-GPL binaries and then use the combined product. If one scenario is not on, I don't see how the other one could be acceptable either. Except that in the first scenario there is a clear intent of circumventing the GPL. But I was not sure whether that kind of intent makes any difference. Thus, to avoid all these problems I decided to rather use the LGPL since that licence definitely seemed to allow both. That's your choice, but the situations are not symmetric, and quite deliberately so. The FSF's plan was not to produce a completely independent and fully functional 'GNU system' at once (which would be unrealistic), but rather to produce replacements of UNIX tools one by one. It was entirely necessary to allow these new versions to operate within the older, proprietary system. In fact, GCC was not the first piece of software released under the GPL, and until then the only way to use GPL software was to compile it using a non-free compiler. This is enabled by means of exceptions to the GPL, as described in http://www.gnu.org/licenses/gpl-faq.html#GPLIncompatibleLibs (which I have already referred to once before). Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How can you buy R?
On 5/22/2006 3:55 AM, Berwin A Turlach wrote: G'day Deepayan, DS == Deepayan Sarkar [EMAIL PROTECTED] writes: DS let me first summarize this sub-discussion so far: [...] Sounds like a perfect summary. :) DS As far as I can tell (and please correct me if I'm wrong), DS your contention is that by linking a GPL component P with a DS non-GPL component Q, a user may lose the rights granted to him DS by the GPL to the GPL-d part P. I don't think that I said this explicitly, but I can see how what I said can be interpreted in such a way. The point is rather that at the moment component P and Q are linked (and I perhaps carelessly assumed that the user was doing this) a product is produced that should be completely under the GPL. Obviously it is not. Hence, the status of this linked product, and whether it can be used by anybody, is an open question. And the answer is probably given by the copyright laws (and others?) of the country in which the linking happens. DS Let's assume this is true. All that means is that the user has DS lost his rights to copy, modify and redistribute P. He does DS NOT lose the rights to use P. I agree with you on this. Probably I was too terse in my writing and produced misunderstandings. I never intended to say anything about the rights that the user has with regard to P alone. My comments were directed towards the linked product P+Q. In particular, it is not clear to me whether one can execute such a product without violating copyright laws. The GPL is quite explicit on this: as Deepayan said, it confers rights to copy, modify and redistribute P. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. This probably varies from country to country, but I think the assumption is that if you have a legally acquired copy of a program, you have a right to execute it as you like. 
(The American DMCA and laws in other countries that implement the WIPO anti-circumvention rules limit you in specific ways, but they probably don't apply to the situation we're talking about.) Now, I suppose you might argue that executing P+Q makes a copy of it in memory, but I think countries that have modernized their copyright laws recognize that this is something you have a right to do with a legally acquired copy. You don't need the GPL to give you permission to do this. That's certainly true in the US and Canada. Your country may vary. Duncan Murdoch Thus, the last sentence of mine that you quoted: My understanding was that in that moment a product was created that would have to be wholly under the GPL, so the user was violating the GPL and lost the right to use your package. Should perhaps better be formulated as: My understanding was that in that moment a product was created that would have to be wholly under the GPL, so the person who did the linking was violating the GPL and it is not clear whether anyone is allowed to use the linked product. A simple google search would have confirmed to you that the linux kernel is developed under the GPL. [...] DS Linux is under GPL2, and not GPL2 or later. [...] Oh, I wasn't aware that they did not use the typical(?) or later phrase. Thanks for pointing this out and I note that we both agree that the linux kernel is definitely not under the LGPL. DS In any case, this is the complete opposite of the situation we DS were originally discussing: [...] [...] So I have to wonder what you are referring to as the situation we were originally discussing. DS I was referring to your question (quoted above) about use of DS GPL'd code in S-PLUS, which is what I was replying to. As I DS was saying, that situation is the opposite of the one in your DS example. O.k., sorry, I used a different scale with the time point of origin at Spencer's e-mail and my answer to that mail. Now I am with you. 
Agreed, the situation is the opposite, but that was the example discussed in gnu.misc.discuss. From an abstract point of view the situations are the same. You make someone else link a GPL product with a non-GPL product creating a derived work, the derived work would have to be under the GPL but is not. Hence, the derived work has a legal status that is in limbo and it is not clear whether anyone has the right to use it. The discussions on gnu.misc.discuss were centred on cases where people provided non-GPL binaries, asked their users to download GPL software from elsewhere, compile and link everything together and then use the combined product. As you say, it is the exact opposite (and hence mirror image) of the situation that I was worried about, where I provide GPL software and ask others to compile and link it with non-GPL binaries and then
[R] what are supressors doing in hierarchical partitioning??
Hello, After fitting a model with glm (glm(A~a+b+c+d,binomial)) I want to have a closer look at the goodness of fit for each of the 4 independent variables by using hierarchical partitioning. The results (see below) indicate that three variables (MEDIANH_HE+CLOJUL+K95) act as suppressors because their dependent contribution is negative. For two variables the total contribution is near 0 (CLOJUL+N5_97). Surprisingly all variables are highly significant in the glm model. How can that be? And do the results from hierarchical partitioning mean that I better not use this model for prediction??? (PS: I went to a lot of effort fitting this model) many thanks in advance Christian hier.part(NACHTIGALL.x, A2HP_N5, family = "binomial", gof = "logLik", barplot = T) $gfs [1] -131.61028 -106.38458 -131.60991 -124.60603 -131.51891 -101.70663 [7] -102.38016 -105.11808 -124.55582 -131.51761 -124.31356 -98.96125 [13] -98.59341 -100.65798 -124.29683 -95.51063 $IJ I J Total MEDIANH_HE 26.895004 -1.669303 2.522570e+01 CLOJUL 2.511079 -2.510707 3.721918e-04 K95 5.245935 1.758320 7.004255e+00 N5_97 1.447637 -1.356262 9.137552e-02 $I.perc I MEDIANH_HE 74.502107 CLOJUL 6.955965 K95 14.531814 N5_97 4.010114 -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] normality testing with nortest
On 5/22/2006 5:25 AM, Peter Dalgaard wrote: Uwe Ligges [EMAIL PROTECTED] writes: A = 0.1846, p-value = 0.9059 ad.test(rnorm(100,mean=5,sd=3)) ... A = 0.5138, p-value = 0.1887 I mistakenly had thought the p-values would be more stable since I am artificially creating a random normal distribution. Is this expected for a normality test or is this an issue with how rnorm is producing random numbers? I guess if I run it many times, I would find that I would get many large values for the p-value? Well, as many large values as small values, 5% significant differences for the 5% level The following looks alright: hist(replicate(1000, ad.test(rnorm(100,mean=5,sd=3))$p.value)) We see this misunderstanding worryingly often. Worrying because it reveals that a fundamental aspect of statistical inference has not been grasped: that p-values are designed to be (approximately) uniformly distributed and fall below any given level with the stated probability, when the null hypothesis is true. I think it's the fallacious belief that the p-value measures the probability that the null hypothesis is true. This is currently misunderstanding #1 in the Wikipedia entry for P-values. (Google had me worried: I searched for probability that the null hypothesis is true and found P-value - Wikipedia, the free encyclopedia The p-value is the probability that the null hypothesis is true, justifying the rule of considering as significant p-values closer to 0 (zero). ... en.wikipedia.org/wiki/P-value - 17k - Cached - Similar pages This quote is preceded by: All of the following [...] statements are false: Context is important! :-) The vast majority of hits to that search also pointed out that this interpretation was incorrect. A couple of counterexamples were a research methods page at a department of psychology, and another at a medical school. I'll send a copy of this note to people there. 
Duncan Murdoch There is no mechanism to give you fewer significant or more stable p-values, and a p-value close to one is no better an indication of a true null hypothesis than one of 0.5 or 0.25. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
Duncan Murdoch wrote: On 5/22/2006 3:26 AM, Martin Maechler wrote: Gabor == Gabor Grothendieck [EMAIL PROTECTED] on Sun, 21 May 2006 09:47:07 -0400 writes: Gabor If you know that test is a scalar Gabor result <- if (test) a else b Gabor will do it. Yes, indeed! IMO, ifelse(test, a, b) is much overused whereas if(test) a else b is much UNDER used. From some e-mail postings, and even some documents (even printed books?), I get the impression that too many people think that ifelse(.,.,.) is to be used as expression / function and if(.) . else . only for program flow control. This leads to quite suboptimal code, and I personally use if(.) . else . __as expression__ much more frequently than ifelse(.,.,.) For overuse of ifelse(), do you mean cases where test is length 1, so if() would work? Or are you thinking of something else? I'd also be interested in what you mean by quite suboptimal code. Are you thinking of things like if (test) temp <- a else temp <- b; result <- f(temp) versus result <- f( if (test) a else b ) ? I would generally use the former, because it's easier to get the formatting right, and I find it easier to read. It's suboptimal in speed and memory use because of creating the temp variable, but in most cases I think that would be such a small difference that the small increase in readability is worthwhile. IMHO that approach is too verbose and not more readable. Frank Duncan Murdoch Martin Maechler, ETH Zurich. Gabor Here is another approach: Gabor as.vector(test * ts(a) + (!test) * ts(b)) Gabor On 5/21/06, ivo welch [EMAIL PROTECTED] wrote: Dear R wizards: I just got stung by the ifelse() feature. a <- 10:15 b <- 20:300 test <- 1 ifelse(test, a, b) [1] 10 I had not realized that this was the default behavior---I had expected 10:15. mea culpa. however, I wonder whether it would make sense to replace ifelse with a different semantic, where if test is a single scalar, it means what a stupid user like me would imagine. 
Aside, I like the flexibility of R, but I am not thrilled by all the recycling rules. I either mean I want a scalar or a vector of equal/appropriate dimension. I never want a recycle of a smaller vector. (I do often use a recycle of a scalar.) regards, /iaw __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
If you don't like f(if (temp) a else b) then what about temp <- if (test) a else b f(temp) or temp <- if (test) a else b f(temp) I think it's easier to understand if you factor the temp <- out since one immediately then knows the purpose of the statement is to set temp. On 5/22/06, Duncan Murdoch [EMAIL PROTECTED] wrote: On 5/22/2006 3:26 AM, Martin Maechler wrote: Gabor == Gabor Grothendieck [EMAIL PROTECTED] on Sun, 21 May 2006 09:47:07 -0400 writes: Gabor If you know that test is a scalar Gabor result <- if (test) a else b Gabor will do it. Yes, indeed! IMO, ifelse(test, a, b) is much overused whereas if(test) a else b is much UNDER used. From some e-mail postings, and even some documents (even printed books?), I get the impression that too many people think that ifelse(.,.,.) is to be used as expression / function and if(.) . else . only for program flow control. This leads to quite suboptimal code, and I personally use if(.) . else . __as expression__ much more frequently than ifelse(.,.,.) For overuse of ifelse(), do you mean cases where test is length 1, so if() would work? Or are you thinking of something else? I'd also be interested in what you mean by quite suboptimal code. Are you thinking of things like if (test) temp <- a else temp <- b; result <- f(temp) versus result <- f( if (test) a else b ) ? I would generally use the former, because it's easier to get the formatting right, and I find it easier to read. It's suboptimal in speed and memory use because of creating the temp variable, but in most cases I think that would be such a small difference that the small increase in readability is worthwhile. Duncan Murdoch Martin Maechler, ETH Zurich. Gabor Here is another approach: Gabor as.vector(test * ts(a) + (!test) * ts(b)) Gabor On 5/21/06, ivo welch [EMAIL PROTECTED] wrote: Dear R wizards: I just got stung by the ifelse() feature. a <- 10:15 b <- 20:300 test <- 1 ifelse(test, a, b) [1] 10 I had not realized that this was the default behavior---I had expected 10:15. 
mea culpa. however, I wonder whether it would make sense to replace ifelse with a different semantic, where if test is a single scalar, it means what a stupid user like me would imagine. Aside, I like the flexibility of R, but I am not thrilled by all the recycling rules. I either mean I want a scalar or a vector of equal/appropriate dimension. I never want a recycle of a smaller vector. (I do often use a recycle of a scalar.) regards, /iaw __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Gabor __ Gabor R-help@stat.math.ethz.ch mailing list Gabor https://stat.ethz.ch/mailman/listinfo/r-help Gabor PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
On 5/22/2006 9:38 AM, Frank E Harrell Jr wrote: Duncan Murdoch wrote: On 5/22/2006 3:26 AM, Martin Maechler wrote: Gabor == Gabor Grothendieck [EMAIL PROTECTED] on Sun, 21 May 2006 09:47:07 -0400 writes: Gabor If you know that test is a scalar Gabor result <- if (test) a else b Gabor will do it. Yes, indeed! IMO, ifelse(test, a, b) is much overused whereas if(test) a else b is much UNDER used. From some e-mail postings, and even some documents (even printed books?), I get the impression that too many people think that ifelse(.,.,.) is to be used as expression / function and if(.) . else . only for program flow control. This leads to quite suboptimal code, and I personally use if(.) . else . __as expression__ much more frequently than ifelse(.,.,.) For overuse of ifelse(), do you mean cases where test is length 1, so if() would work? Or are you thinking of something else? I'd also be interested in what you mean by quite suboptimal code. Are you thinking of things like if (test) temp <- a else temp <- b; result <- f(temp) versus result <- f( if (test) a else b ) ? I would generally use the former, because it's easier to get the formatting right, and I find it easier to read. It's suboptimal in speed and memory use because of creating the temp variable, but in most cases I think that would be such a small difference that the small increase in readability is worthwhile. IMHO that approach is too verbose and not more readable. IMO terse unreadable :-) Duncan Murdoch Frank Duncan Murdoch Martin Maechler, ETH Zurich. Gabor Here is another approach: Gabor as.vector(test * ts(a) + (!test) * ts(b)) Gabor On 5/21/06, ivo welch [EMAIL PROTECTED] wrote: Dear R wizards: I just got stung by the ifelse() feature. a <- 10:15 b <- 20:300 test <- 1 ifelse(test, a, b) [1] 10 I had not realized that this was the default behavior---I had expected 10:15. mea culpa. 
however, I wonder whether it would make sense to replace ifelse with a different semantic, where if test is a single scalar, it means what a stupid user like me would imagine. Aside, I like the flexibility of R, but I am not thrilled by all the recycling rules. I either mean I want a scalar or a vector of equal/appropriate dimension. I never want a recycle of a smaller vector. (I do often use a recycle of a scalar.) regards, /iaw __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
Gabor == Gabor Grothendieck [EMAIL PROTECTED] on Mon, 22 May 2006 09:31:14 -0400 writes: Gabor If you don't like f(if (temp) a else b) Gabor then what about Gabor temp <- if (test) a else b Gabor f(temp) Gabor or Gabor temp <- if (test) Gabor a Gabor else Gabor b Gabor f(temp) Gabor I think it's easier to understand if you factor the temp <- out since Gabor one immediately then knows the purpose of the statement is Gabor to set temp. I strongly agree with Gabor on the above. But, to Duncan's question: Yes, indeed, my main point was that people use ifelse(test, a, b) also in cases where test is known to be of length one. BTW, the 2nd point about why I don't ``like'' ifelse() so much is on its help page: Both 'a' and 'b' are fully evaluated even though only one of the two values of a[i], b[i] are used in the result. Martin Gabor On 5/22/06, Duncan Murdoch [EMAIL PROTECTED] wrote: On 5/22/2006 3:26 AM, Martin Maechler wrote: Gabor == Gabor Grothendieck [EMAIL PROTECTED] on Sun, 21 May 2006 09:47:07 -0400 writes: Gabor If you know that test is a scalar Gabor result <- if (test) a else b Gabor will do it. Yes, indeed! IMO, ifelse(test, a, b) is much overused whereas if(test) a else b is much UNDER used. From some e-mail postings, and even some documents (even printed books?), I get the impression that too many people think that ifelse(.,.,.) is to be used as expression / function and if(.) . else . only for program flow control. This leads to quite suboptimal code, and I personally use if(.) . else . __as expression__ much more frequently than ifelse(.,.,.) For overuse of ifelse(), do you mean cases where test is length 1, so if() would work? Or are you thinking of something else? I'd also be interested in what you mean by quite suboptimal code. Are you thinking of things like if (test) temp <- a else temp <- b; result <- f(temp) versus result <- f( if (test) a else b ) ? 
I would generally use the former, because it's easier to get the formatting right, and I find it easier to read. It's suboptimal in speed and memory use because of creating the temp variable, but in most cases I think that would be such a small difference that the small increase in readability is worthwhile.

Duncan Murdoch

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
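The behaviour ivo ran into, and the if () ... else ... alternative Gabor and Martin recommend, can be seen side by side at the prompt (a small sketch re-using the values from the original post):

```r
a <- 10:15
b <- 20:300
test <- 1

# ifelse() shapes its result on 'test'; a length-1 test gives a length-1 result:
ifelse(test, a, b)
# [1] 10

# if () ... else ... used as an *expression* returns the chosen vector whole:
if (test) a else b
# [1] 10 11 12 13 14 15
```

This is exactly why, when the condition is known to be scalar, if(.) . else . as an expression is the right tool and ifelse() is not.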
Re: [R] odd feature
Due to lazy evaluation, I don't think a and b are fully evaluated:

> ifelse(1, a <- 1, b <- 2)
[1] 1
> a
[1] 1
> b
Error: object "b" not found

On 5/22/06, Martin Maechler <[EMAIL PROTECTED]> wrote:
> BTW, the 2nd point about why I don't ``like'' ifelse() so much is on its
> help page: "Both 'a' and 'b' are fully evaluated even though only one of
> the two values of a[i], b[i] are used in the result."

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
On Mon, 22 May 2006, Gabor Grothendieck wrote:
> Due to lazy evaluation, I don't think a and b are fully evaluated:
>
> > ifelse(1, a <- 1, b <- 2)
> [1] 1
> > a
> [1] 1
> > b
> Error: object "b" not found

yes. If you look at the code for ifelse() it evaluates the second argument if any test values are TRUE and the third argument if any test values are FALSE, so in the scalar case it does not evaluate both arguments.

-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
[EMAIL PROTECTED]		University of Washington, Seattle

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] win2k memory problem with merge()'ing repeatedly (long email)
Good afternoon,

I have 63 small .csv files which I process daily; until two weeks ago they processed just fine, only took a matter of moments, and showed no noticeable memory problem. Two weeks ago they reached 318 lines and my script broke. There are some missing values in some of the files. I have tried hard many times over the last two weeks to create a small repeatable example to give you, but I've failed - unless I use my data it works fine... :-( Am I missing something obvious? (again)

A typical file has lines which look like:
01/06/2005,1372

though there are three files which have two values (files 3,32,33) and these have lines which look like...
01/06/2005,1766,
or
15/05/2006,289,114

a1 <- read.csv("file1.csv", header=F)
etc...
a63 <- read.csv("file63.csv", header=F)
names(a1) <- c("mdate", "file1.column.description")

atot <- merge(a1, a2, all=T)

followed by repeatedly doing...

atot <- merge(atot, a3, all=T)
atot <- merge(atot, a4, all=T)
etc...

I normally start R with --vanilla. What appears to happen is that atot doubles in size each iteration and just falls over due to lack of memory at about i=17... even though the total memory required for all of these individual a1...a63 is only 1001384 bytes (doing an object.size() on a1..a63). At this point I've been trying to pin down this problem for two weeks and I just gave up...

The following works fine, as I'd expect, with minimal memory usage...

for (i in 3:67) {
  datelist <- as.Date(start.date) + 0:(count-1)
  # remove a couple of elements...
  datelist <- datelist[-(floor(runif(nacount)*count))]
  a2 <- as.data.frame(datelist)
  names(a2) <- "mdate"
  vname <- paste("value", i, sep="")
  a2[vname] <- runif(length(datelist))
  # a2[floor(runif(nacount)*count), vname] <- NA
  # atot <- merge(atot, a2, all=T); i <- 2
  a.eval.text <- paste("merge(atot, a", i, ", all=T)", sep="")
  cat("a.eval.text is: -", a.eval.text, "-\n", sep="")
  atot <- eval(parse(text=a.eval.text))
  cat("i:", i, " ", gc(), "\n")
}

this works fine...
but on my files (as per the attached 'lastsave.txt' file) it just gobbles memory. Am I doing something wrong? I (wrongly?) expected that repeatedly doing merge(atot, aN) would only increase the memory requirement linearly (with jumps perhaps as we go through a 2^n boundary)... which is what happens when merging simulated data.frames as above... no problem at all and it's really fast... The attached text file shows a (slightly edited) session where the memory required by the merge() operation just doubles with each use... and I can only allow it to run until i=17!!! I've even run it with gctorture() set on... with similar, but excruciatingly slow results... Is there any relevant info that I'm missing? Unfortunately I am not able to post the contents of the files to a public list like this... As per a previous thread, I know that I can use a list to handle these dataframes - but I had difficulty with the syntax of a list of dataframes... I'd like to know why the memory requirements for this merge just explode...

cheers, (and thanks in advance!)
Sean O'Riordain

> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          3.0
year           2006
month          05
day            09
svn rev        38014
language       R
version.string Version 2.3.0 Patched (2006-05-09 r38014)

Running on Win2k with 1Gb RAM. I also tried it (with the same results) on 2.2.1 and 2.3.0.

R : Copyright 2006, The R Foundation for Statistical Computing
Version 2.3.0 Patched (2006-05-09 r38014)
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 178186  4.8     407500 10.9       35  9.4
Vcells  73112  0.6     786432  6.0   333585  2.6

# take the information in the .csv files created from the emails
setwd("C:/Documents and Settings/c_oriordain_s/My Documents/pasip/mms/mms_emails")

# the input file from Amdocs (as supplied by revenue assurance)
amdocs_csv_filename <- "amdocs_volumes_revised4.csv"

# where shall we put the output plot file
copypath <- "ient1dfs001\\general\\Process Improvement Projects\\Process Improvement Projects Repository\\Active Projects\\MMS\\01 Measure\\"

# set to F (false) instead of T (true) if you're just tricking around and you don't
# want to be copying over
Re: [R] odd feature
On Sun, 21 May 2006, ivo welch wrote:
> Aside, I like the flexibility of R, but I am not thrilled by all the
> recycling rules. I either mean I want a scalar or a vector of
> equal/appropriate dimension. I never want a recycle of a smaller vector.
> (I do often use a recycle of a scalar.)

One case where the vector-vector recycling rules are used is in vector-matrix operations:

a <- 1:4
b <- diag(4)
a + b

-thomas

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
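For readers who want to see what Thomas's snippet does: the length-4 vector a is recycled down each column of the 4 x 4 matrix b, so entry (i, j) of the sum is a[i] + b[i, j]. A quick check at the prompt:

```r
a <- 1:4
b <- diag(4)   # 4 x 4 identity matrix

a + b
#      [,1] [,2] [,3] [,4]
# [1,]    2    1    1    1
# [2,]    2    3    2    2
# [3,]    3    3    4    3
# [4,]    4    4    4    5
```

Here the recycling is benign because length(a) equals nrow(b); with a vector whose length does not divide the matrix size, R would recycle with a warning.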
Re: [R] odd feature
>>>>> "TL" == Thomas Lumley <[EMAIL PROTECTED]>
>>>>>     on Mon, 22 May 2006 07:09:09 -0700 (PDT) writes:

    TL> On Mon, 22 May 2006, Gabor Grothendieck wrote:
    TL> > Due to lazy evaluation, I don't think a and b are fully evaluated:
    TL> > > ifelse(1, a <- 1, b <- 2)
    TL> > [1] 1
    TL> > > a
    TL> > [1] 1
    TL> > > b
    TL> > Error: object "b" not found

    TL> yes. If you look at the code for ifelse() it evaluates the second
    TL> argument if any test values are TRUE and the third argument if any
    TL> test values are FALSE, so in the scalar case it does not evaluate
    TL> both arguments.

yes. And (Gabor), if you recall what I said earlier, it's pretty clear that I would not even *consider* using ifelse() in the scalar case. So my original statement was about the case ifelse() is designed for: the non-trivial vector case, and there my statement of "fully evaluated" does apply.

Martin

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
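Martin's "non-trivial vector case" is one where the test vector contains both TRUE and FALSE values; there, both branch expressions really are evaluated in full, as their side effects show (a small sketch):

```r
# test contains both TRUE and FALSE, so ifelse() evaluates *both* branches:
ifelse(c(TRUE, FALSE), a <- 1:2, b <- 3:4)
# [1] 1 4

a   # assigned, even though only a[1] was used in the result
# [1] 1 2
b   # assigned, even though only b[2] was used in the result
# [1] 3 4
```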
[R] JSS Code Snippets
JSS is trying to develop its Code Snippets section. We have some snippets lined up and one published in the latest volume:

http://www.jstatsoft.org/index.php?vol=16

If you have small chunks of code of obvious relevance to statistical computing (need not be in R), consider submitting them -- why keep them to yourself? It is just a matter of pasting a minimal amount of TeX into our templates and diving into our breathtakingly efficient review process. We also welcome regular submissions and suggestions for special issues, book reviews, and software reviews. Of course, if you suggest something you may very well be asked also to actually do something about that suggestion.

===
Jan de Leeuw; Distinguished Professor and Chair, UCLA Department of Statistics;
Editor: Journal of Multivariate Analysis, Journal of Statistical Software
US mail: 8125 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550; fax (310)-206-5658; email: [EMAIL PROTECTED]
.mac: jdeleeuw ++ aim: deleeuwjan ++ skype: j_deleeuw
homepages: http://gifi.stat.ucla.edu ++ http://www.cuddyvalley.org
- No matter where you go, there you are. --- Buckaroo Banzai
http://gifi.stat.ucla.edu/sounds/nomatter.au

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Can lmer() fit a multilevel model embedded in a regression?
Harold, I think we use slightly different notation (I like to use variance parameters rather than covariance matrices). Let me try to write it in model form: Data points y_i, i=1,...,800 800 x 84 matrix of predictors, X: for columns j=1,...,82, X_{i,j} is the amount of food j consumed by person i. X_{i,83} is an indicator (1 if male, 0 if female), and X_{i,84} is the age of person i. Data-level model: Pr (y_i=1) = inverse.logit (X_i*beta), for i=1,...,800, with independent outcomes. beta is a (column) vector of length 84. Group-level model: for j=1,...,82: beta_j ~ Normal (gamma_0 + gamma_1 * u_j, sigma^2_{beta}). u is a vector of length 82, where u_j = folate concentration in food j gamma_0 and gamma_1 are scalar coefficients (for the group-level model), and sigma_{beta} is the sd of the group-level errors. It would be hopeless to estimate all the betas using maximum likelihood: that's 800 data points and 84 predictors, the results will just be too noisy. But it should be ok using the 2-level model above. The question is: can I fit in lmer()? Thanks again. Andrew Doran, Harold wrote: So, in the hierarchical notation, does the model look like this (for the linear predictor): DV = constant + food_1(B_1) + food_2(B_2) + ... + food_82(B_82) + sex(B_83) + age(B_84) food_1 = gamma_00 + gamma_01(folic) + r_01 food_2 = gamma_10 + gamma_11(folic) + r_02 ... food_82 = gamma_20 + gamma_21(folic) + r_82 where r_qq ~ N(0, Psi) and Psi is an 82-dimensional covariance matrix. I usually need to see this in model form as it helps me translate this into lmer syntax if it can be estimated. From what I see, this would be estimating 82(82+1)/2 = 3403 parameters in the covariance matrix. What I'm stuck on is below you say it would be hopeless to estimate the 82 predictors using ML. But, if I understand the model correctly, the multilevel regression still resolves the predictors (fixed effects) using ML once estimates of the variances are obtained. 
So, I feel I might still be missing something. -Original Message- From: Andrew Gelman [mailto:[EMAIL PROTECTED] Sent: Sun 5/21/2006 7:35 PM To: Doran, Harold Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] Subject:Re: [R] Can lmer() fit a multilevel model embedded in a regression? Harold, I get confused by the terms fixed and random. Our first-level model (in the simplified version we're discussing here) has 800 data points (the persons in the study) and 84 predictors: sex, age, and 82 coefficients for foods. The second-level model has 82 data points (the foods) and two predictors: a constant term and folic acid concentration. It would be hopeless to estimate the 82 food coefficients via maximum likelihood, so the idea is to do a multilevel model, with a regression of these coefficients on the constant term and folic acid. The group-level model has a residual variance. If the group-level residual variance is 0, it's equivalent to ignoring food, and just using total folic acid as an individual predictor. If the group-level residual variance is infinity, it's equivalent to estimating the original regression (with 84 predictors) using least squares. The difficulty is that the foods aren't groups in the usual sense, since persons are not nested within foods; rather, each person eats many foods, and this is reflected in the X matrix. Andrew Doran, Harold wrote: OK, I'm piecing this together a bit, sorry I'm not familiar with the article you cite. Let me try and fully understand the issue if you don't mind. Are you estimating each of the 82 foods as fixed effects? If so, in the example below this implies 84 total fixed effects (1 for each food type in the X matrix and then sex and age). I'm assuming that food type is nested within one of the 82 folic acid concentrations and then folic acid is treated as a random effect. Is this accurate? 
-----Original Message-----
From: Andrew Gelman [mailto:[EMAIL PROTECTED]]
Sent: Sun 5/21/2006 9:17 AM
To: Doran, Harold
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Can lmer() fit a multilevel model embedded in a regression?

Harold,

I'm confused now. Just for concreteness, suppose we have 800 people, 82 food items, and one predictor (folic, the folic acid concentration) at the food-item level. Then DV will be a vector of length 800, foods is an 800 x 82 matrix, sex is a vector of length 800, age is a vector of length 800, and folic is a vector of length 82. The vector of folic acid concentrations in individual diets is then just foods %*% folic, which I can call folic_indiv. How would I fit the model in lmer(), then? There's some bit of understanding that I'm still missing. Thanks.

Andrew

Doran, Harold wrote:
> Prof Gelman:
> I believe the answer is yes. It sounds as though persons are partially crossed
[R] writing 100 files
Hi All,

I need to write out ~1000 variations of the same data frame as text files, each generated after permuting a row. I would like to use the function write.table() to write the files, and use a loop to do it:

for (i in 1:1000){
  bb8[2,] = sample(bb8[2,])
  write.table(bb8, quote = F, sep = '\t', row.names = F, col.names = F,
              file = 'whatever?.txt')
}

so that all the files are called whatever1 ... whatever1000. Any idea?

Cheers,

Federico

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
On 5/22/06, Martin Maechler <[EMAIL PROTECTED]> wrote:
> So my original statement was about the case ifelse() is designed for:
> the non-trivial vector case, and there my statement of "fully evaluated"
> does apply.

I don't think they are fully evaluated in the vector case either:

> ifelse(rep(TRUE,4), a <- 1:4, b <- 1:4)
[1] 1 2 3 4
> a
[1] 1 2 3 4
> b
Error: object "b" not found

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] writing 100 files
On 5/22/2006 11:24 AM, Federico Calboli wrote:
> so that all the files are called whatever1 ... whatever1000. Any idea?

Use the paste() function to construct the name, e.g.

file = paste("whatever", i, ".txt", sep="")

Duncan Murdoch

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
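Putting Duncan's paste() call back into the original loop gives, for example (a sketch; a small random data frame stands in for the poster's bb8):

```r
bb8 <- as.data.frame(matrix(runif(30), nrow = 3))  # stand-in for the real data

for (i in 1:1000) {
  bb8[2, ] <- sample(bb8[2, ])
  write.table(bb8, quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE,
              file = paste("whatever", i, ".txt", sep = ""))
}
```

This writes whatever1.txt through whatever1000.txt into the current working directory.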
Re: [R] win2k memory problem with merge()'ing repeatedly (long email)
Repeated merge()-ing does not always increase the space requirements linearly. Keep in mind that a join between two tables where the same key value appears M and N times respectively will produce M*N rows for that particular value. My guess is that the number of rows in atot explodes because you have some duplicate values in your files (having the same duplicate date in each data frame would cause atot to contain 4, then 8, 16, 32, 64, ... rows for that date).

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Sean O'Riordain
Sent: Monday, May 22, 2006 10:12 AM
To: r-help
Subject: [R] win2k memory problem with merge()'ing repeatedly (long email)
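The M*N row explosion described above is easy to reproduce (a minimal sketch with made-up data; the duplicate-date diagnosis is a guess about the poster's files, but the mechanism itself is general):

```r
# Three small frames, each with the same date appearing twice:
a1 <- data.frame(mdate = c("01/06/2005", "01/06/2005"), v1 = 1:2)
a2 <- data.frame(mdate = c("01/06/2005", "01/06/2005"), v2 = 1:2)
a3 <- data.frame(mdate = c("01/06/2005", "01/06/2005"), v3 = 1:2)

atot <- merge(a1, a2, all = TRUE)    # 2 * 2 = 4 rows for that date
nrow(atot)
# [1] 4
atot <- merge(atot, a3, all = TRUE)  # 4 * 2 = 8 rows, doubling each merge
nrow(atot)
# [1] 8
```

Removing the duplicate keys first, e.g. with a1[!duplicated(a1$mdate), ], keeps the repeated merges growing linearly instead.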
Re: [R] odd feature
Gabor Grothendieck wrote:
> I don't think they are fully evaluated in the vector case either:
>
> > ifelse(rep(TRUE,4), a <- 1:4, b <- 1:4)
> [1] 1 2 3 4
> > a
> [1] 1 2 3 4
> > b
> Error: object "b" not found

Martin's "non-trivial" means that neither all conditions are TRUE nor all are FALSE; in that case both statements ARE fully evaluated, and he is right. See the ifelse() code, which is not that hard to read...

Uwe

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] writing 100 files
Federico Calboli wrote:
> so that all the files are called whatever1 ... whatever1000. Any idea?

This is the FAQ "How can I save the result of each iteration in a loop into a separate file?" Please read the FAQs prior to posting!

Uwe Ligges

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] writing 100 files
On Mon, 2006-05-22 at 16:24 +0100, Federico Calboli wrote:
> so that all the files are called whatever1 ... whatever1000. Any idea?

The same process used in R FAQ 7.34, "How can I save the result of each iteration in a loop into a separate file?", can be used here. Instead of using save(), use write.table(), adjusting the arguments accordingly.

HTH,

Marc Schwartz

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] RQuantlib Array processing applying EuropeanOptionExampleArray
I am trying to replicate part of the EuropeanOptionExample using my data. I have a data.frame containing all my inputs.

atm.vols <- subset(data.vols, moneyness == min)
# Some days have the abs(moneyness) exactly between two strike prices.
# Traders will always price an option at the higher vol when selling it, so we will too.
atm.vols.max <- data.frame(vol.max = tapply(atm.vols$sigma, INDEX = factor(atm.vols$date), max))
# y <- date.mmddyy(as.numeric(row.names(atm.vols)))
# row.names(atm.vols.max) <- date.mmddyy(as.numeric(row.names(atm.vols.max)))
atm.work <- merge(atm.vols, atm.vols.max, by.x = "date", by.y = "row.names", all = TRUE)
# ok get only the vols we need
atm.work <- subset(atm.work, sigma == vol.max)

atm.work has 749 rows. I try to run the EuropeanOption example using atm.work$For_Price as my array of underlying prices and the other inputs from row 9 of atm.work.

i <- 9
x <- EuropeanOption(type = "put", underlying = atm.work$For_Price,
                    strike = atm.work$K[i], dividendYield = atm.work$BEY[i],
                    riskFreeRate = atm.work$BEY[i], maturity = atm.work$t_exp[i],
                    volatility = atm.work$sigma[i])

x$parameters has the array of underlying prices, but the result is only a single vector using the first row of atm.work$For_Price. Is this because I am pulling the inputs from a data.frame, not arrays? Any help is greatly appreciated.

Thank you,
Joe

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] odd feature
>>>>> "Gabor" == Gabor Grothendieck <[EMAIL PROTECTED]>
>>>>>     on Mon, 22 May 2006 11:34:09 -0400 writes:

    Gabor> I don't think they are fully evaluated in the vector case either:
    Gabor> > ifelse(rep(TRUE,4), a <- 1:4, b <- 1:4)
    Gabor> [1] 1 2 3 4
    Gabor> > a
    Gabor> [1] 1 2 3 4
    Gabor> > b
    Gabor> Error: object "b" not found

PLEASE -- I said "non-trivial" above

Martin

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] writing 100 files
Try this:

x <- 1:12
for (i in 1:2) {
  bb8 <- sample(x)
  a <- sprintf("whatever%d.txt", i)
  write.table(bb8, quote = FALSE, sep = '\t', row.names = FALSE, col.names = FALSE, file = a)
}

HTH Sachin

Duncan Murdoch [EMAIL PROTECTED] wrote: On 5/22/2006 11:24 AM, Federico Calboli wrote: Hi All, I need to write as text files 1000-ish variations of the same data frame, once I permute a row. I would like to use the function write.table() to write the files, and use a loop to do it:

for (i in 1:1000) {
  bb8[2, ] <- sample(bb8[2, ])
  write.table(bb8, quote = FALSE, sep = '\t', row.names = FALSE, col.names = FALSE, file = 'whatever?.txt')
}

so that all the files are called whatever1 ... whatever1000. Any idea? Use the paste() function to construct the name, e.g.

file = paste("whatever", i, ".txt", sep = "")

Duncan Murdoch
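One refinement of the sprintf() approach above: zero-padding the counter makes the 1000 file names sort correctly in a directory listing. The names below are just illustrative:

```r
# %d gives whatever1.txt; %04d pads the number to four digits so that
# whatever0001.txt ... whatever1000.txt sort in numeric order.
fnames <- sprintf("whatever%04d.txt", c(1, 42, 1000))
fnames
```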
Re: [R] intervals from cut() as numerics?
On Sat, 2006-05-20 at 15:44 +0200, Dimitrios Rizopoulos wrote: as an alternative, you can have a look inside cut.default and use the part that produces the breaks, i.e.,

breaks <- 10
groups <- cut(x, breaks = breaks)
max.bias <- as.vector(tapply(error, groups, mean))
# from cut.default()
nb <- as.integer(breaks + 1)
dx <- diff(rx <- range(x, na.rm = TRUE))
breaks <- round(seq(rx[1] - dx/1000, rx[2] + dx/1000, len = nb), 2)
mat <- cbind(breaks[1:(nb - 1)], breaks[2:nb])
plot(x, error, type = "n")
abline(h = 0, col = "grey")
panel.smooth(x, error)
arrows(mat[, 1], max.bias, mat[, 2], max.bias, length = 0.05, angle = 90, code = 3)

<snip /> Thanks also to Dimitrios and Gabor for your suggestions. This suggestion from Dimitrios is probably the one I'll use, even though all three suggestions give the same end result. I like this one because it seems (to me) better to do the calculations than to process the text labels assigned to the list names, and because this will end up in a package to go on CRAN I'd prefer not to add dependencies if I can avoid it - especially if an alternative is available. All the best, Gav -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% * Note new Address, Telephone Fax numbers from 6th April 2006 * %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson ECRC ENSIS [t] +44 (0)20 7679 0522 UCL Department of Geography [f] +44 (0)20 7679 0565 Pearson Building [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street [w] http://www.ucl.ac.uk/~ucfagls/cv/ London, UK. [w] http://www.ucl.ac.uk/~ucfagls/ WC1E 6BT. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
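For comparison, the label-parsing alternative the poster decided against can be sketched as follows: the numeric endpoints are recovered by parsing the "(lo,hi]" labels cut() produces. The regular expressions here are an assumption about the default label format:

```r
# Recover numeric interval endpoints from cut()'s default labels,
# e.g. "(0.971,10.7]" -> lower 0.971, upper 10.7.
x <- c(1, 2, 3, 10, 20, 30)
groups <- cut(x, breaks = 3)
labs <- levels(groups)
lower <- as.numeric(sub("\\((.+),.*", "\\1", labs))
upper <- as.numeric(sub(".*,(.+)\\]", "\\1", labs))
cbind(lower, upper)
```

The breaks-recomputation approach avoids any dependence on the label format, which is why it is preferable for package code.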
Re: [R] win2k memory problem with merge()'ing repeatedly (long email)
Thank you very much indeed Bogdan!

a2[duplicated(a2$mdate), ]
value2 mdate
3180 2006-05-10
3220 2006-05-13
3240 2006-05-14
3260 2006-05-15
3280 2006-05-16

What a relief to know what is causing this problem... now to sort out the root cause! cheers and thanks again! Sean

On 22/05/06, bogdan romocea [EMAIL PROTECTED] wrote: Repeated merge()-ing does not always increase the space requirements linearly. Keep in mind that a join between two tables where the same value appears M and N times will produce M*N rows for that particular value. My guess is that the number of rows in atot explodes because you have some duplicate values in your files (having the same duplicate date in each data frame would cause atot to contain 4, then 8, 16, 32, 64... rows for that date).

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sean O'Riordain Sent: Monday, May 22, 2006 10:12 AM To: r-help Subject: [R] win2k memory problem with merge()'ing repeatedly (long email)

Good afternoon, I have 63 small .csv files which I process daily; until two weeks ago they processed just fine, only took a matter of moments, and had no noticeable memory problem. Two weeks ago they reached 318 lines and my script broke. There are some missing values in some of the files. I have tried hard many times over the last two weeks to create a small repeatable example to give you, but I've failed - unless I use my data it works fine... :-( Am I missing something obvious? (again) A typical file has lines which look like:

01/06/2005,1372

though there are three files which have two values (files 3, 32, 33) and these have lines which look like

01/06/2005,1766, or 15/05/2006,289,114

a1 <- read.csv("file1.csv", header = FALSE)
etc...
a63 <- read.csv("file63.csv", header = FALSE)
names(a1) <- c("mdate", "file1.column.description")
atot <- merge(a1, a2, all = TRUE)

followed by repeatedly doing...

atot <- merge(atot, a3, all = TRUE)
atot <- merge(atot, a4, all = TRUE)
etc...

I normally start R with --vanilla.
What appears to happen is that atot doubles in size each iteration and just falls over due to lack of memory at about i=17... even though the total memory required for all of these individual a1...a63 is only 1001384 bytes (doing object.size() on a1...a63) at this point. I've been trying to pin down this problem for two weeks and I just gave up... The following works fine, as I'd expect, with minimal memory usage:

for (i in 3:67) {
  datelist <- as.Date(start.date) + 0:(count - 1)
  # remove a couple of elements...
  datelist <- datelist[-(floor(runif(nacount) * count))]
  a2 <- as.data.frame(datelist)
  names(a2) <- "mdate"
  vname <- paste("value", i, sep = "")
  a2[vname] <- runif(length(datelist))
  # a2[floor(runif(nacount) * count), vname] <- NA
  # atot <- merge(atot, a2, all = T)
  i <- 2
  a.eval.text <- paste("merge(atot, a", i, ", all=T)", sep = "")
  cat("a.eval.text is: -", a.eval.text, "-\n", sep = "")
  atot <- eval(parse(text = a.eval.text))
  cat("i:", i, " ", gc(), "\n")
}

this works fine... but on my files (as per the attached 'lastsave.txt' file) it just gobbles memory. Am I doing something wrong? I (wrongly?) expected that repeatedly doing merge(atot, aN) would only increase the memory requirement linearly (with jumps perhaps as we go through a 2^n boundary)... which is what happens when merging simulated data.frames as above... no problem at all and it's really fast... The attached text file shows a (slightly edited) session where the memory required by the merge() operation just doubles with each use... and I can only allow it to run until i=17!!! I've even run it with gctorture() set on... with similar, but excruciatingly slow, results... Is there any relevant info that I'm missing? Unfortunately I am not able to post the contents of the files to a public list like this... As per a previous thread, I know that I can use a list to handle these data frames - but I had difficulty with the syntax of a list of data frames... I'd like to know why the memory requirements for this merge just explode... cheers, (and thanks in advance!)
Sean O'Riordain

version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          3.0
year           2006
month          05
day            09
svn rev        38014
language       R
version.string Version 2.3.0 Patched (2006-05-09 r38014)

Running on Win2k with 1Gb RAM. I also tried it (with the same results) on 2.2.1 and 2.3.0.
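Bogdan's M*N explanation above can be demonstrated on a tiny example (the data frames here are hypothetical, not the poster's files): a date that appears twice in each frame doubles its row count with every merge.

```r
# A key appearing M times in one frame and N times in the other
# produces M*N rows for that key after merge().
a  <- data.frame(mdate = c("2006-05-10", "2006-05-10"), v1 = 1:2)
b  <- data.frame(mdate = c("2006-05-10", "2006-05-10"), v2 = 3:4)
b2 <- data.frame(mdate = c("2006-05-10", "2006-05-10"), v3 = 5:6)

m1 <- merge(a, b, all = TRUE)    # 2 * 2 = 4 rows
m2 <- merge(m1, b2, all = TRUE)  # 4 * 2 = 8 rows: doubling with each merge
c(nrow(m1), nrow(m2))
```

Deduplicating the key column before merging (e.g. with duplicated()) stops the doubling.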
[R] How can you buy R?
About glmmADMB and GPL: We were not very cautious when we put in the GPL statement. What we wanted to say was that the use of glmmADMB is free, and does not require a license for AD Model Builder. Am I correct in interpreting this discussion so that all we have to do is to remove the License: GPL statement from the DESCRIPTION file (and everywhere else it may occur), and there will be no conflict between glmmADMB and the rules of the R community? We have temporarily withdrawn glmmADMB until this question has been settled. hans Brian Ripley wrote: The issue in the glmmADMB example is not if they were required to release it under GPL (my reading from the GPL FAQ is that they probably were not, given that communication is between processes and the R code is interpreted). Rather, it is stated to be under GPL _but_ there is no source code offer for the executables (and the GPL FAQ says that for anonymous FTP it should be downloadable via the same site, and the principles apply equally to HTTP sites). As the executables are not for my normal OS and I would like to exercise my freedom to try the GPLed code, I have requested the sources from the package maintainer. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] choosing a mirror
Dear R People: Is there a way to select a mirror from the command line instead of via a menu, please? I tried chooseCRANmirror but to no avail. Thanks in advance. R version 2.3.0 Windows. Sincerely, Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] choosing a mirror
I guess you are using a function like install.packages() or similar. You can, for instance, use the argument repos, e.g.

install.packages(..., repos = "http://cran.at.r-project.br")

On Mon, 22 May 2006, Erin Hodgess wrote: Date: Mon, 22 May 2006 12:57:49 -0500 From: Erin Hodgess [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Subject: [R] choosing a mirror Dear R People: Is there a way to select a mirror from the command line instead of via a menu, please? I tried chooseCRANmirror but to no avail. Thanks in advance. R version 2.3.0 Windows. Sincerely, Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED]

Paulo Justiniano Ribeiro Jr LEG (Laboratório de Estatística e Geoinformação) Departamento de Estatística Universidade Federal do Paraná Caixa Postal 19.081 CEP 81.531-990 Curitiba, PR - Brasil Tel: (+55) 41 3361 3573 Fax: (+55) 41 3361 3141 e-mail: [EMAIL PROTECTED] http://www.est.ufpr.br/~paulojus
[R] Problem installing rgdal from source -- Makevars issue
List: I have installed R, contributed packages, and supporting libraries in a non-standard location (but under the same sub-directory tree). This was necessary for systems reasons beyond my control. When I attempt to use Makevars (as shown below) to install rgdal from source, I get the following error:

R CMD INSTALL rgdal_0.4-3.tar.gz
...
checking proj_api.h usability... no
checking proj_api.h presence... no
checking for proj_api.h... no
libproj.a and proj_api.h not found in standard search locations, edit src/Makevars manually, adding -I<directory with proj_api.h> to PKG_CPPFLAGS =, and -L<directory with libproj.a> to PKG_LIBS =
ERROR: configuration failed for package 'rgdal'
** Removing '/awips/rep/lx/local_apps/R/lib/R/library/rgdal'
** Restoring previous '/awips/rep/lx/local_apps/R/lib/R/library/rgdal'

I get this error regardless of where I put the Makevars file before running ./configure: $HOME/.R/Makevars, in the local build directory, or in the rgdal source directory (src/Makevars). My Makevars contains this:

PKG_CPPFLAGS = -I/awips/rep/lx/local_apps/proj4/include
PKG_LIBS = -L/awips/rep/lx/local_apps/proj4/lib

I have also tried using the file names Makevars and Makevars-i686-pc-linux-gnu. I'm sure I am doing *something* wrong; any suggestions? Regards, Tom -- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: [EMAIL PROTECTED] VOICE: 937-383-0528 FAX: 937-383-0033
[R] (no subject)
C/G GG CC CG
G/T GG TT GT
C/T CC TT CT
A/G AA GG AG
A/C AA CC AC
A/T AA TT AT
[R] editing a big file
I have a file that has 90 columns and 20,000 rows and looks like:

C/G CC GG CG
G/T GG TT GT
C/T CC TT CT
A/G AA GG AG
A/C AA CC AC
A/T AA TT AT

I want to write code that reads through each row, looks at the first column (e.g. "C/G"), and then replaces each of the three following columns with: 11 if it repeats the first allele (e.g. "CC"), 22 if it repeats the second allele (e.g. "GG"), and 12 otherwise (e.g. "CG"). Can you please give some hint how I can start coding this, like command names? I am not a programmer and an R beginner. My output should look like this:

11 22 12
11 22 12
11 22 12
11 22 12
11 22 12
11 22 12

Thank you in advance.
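A minimal sketch of the recoding idea for a single row, assuming the layout described above (first field gives the two alleles, the remaining fields are genotypes); the function name and hard-coded example are mine, not from the thread:

```r
# Split the first field ("C/G") into its two alleles, then map each
# genotype: homozygous first allele -> "11", homozygous second -> "22",
# anything else (heterozygous) -> "12".
recode_row <- function(row) {
  alleles <- strsplit(row[1], "/")[[1]]            # e.g. c("C", "G")
  hom1 <- paste(alleles[1], alleles[1], sep = "")  # "CC"
  hom2 <- paste(alleles[2], alleles[2], sep = "")  # "GG"
  out <- character(length(row) - 1)
  for (j in seq_along(out)) {
    g <- row[j + 1]
    out[j] <- if (g == hom1) "11" else if (g == hom2) "22" else "12"
  }
  out
}

recode_row(c("C/G", "CC", "GG", "CG"))
```

Applying this over all rows could be done with apply(mat, 1, recode_row) on a character matrix read in with read.table().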
Re: [R] choosing a mirror
On Mon, 22 May 2006, Paulo Justiniano Ribeiro Jr wrote: I guess you are using a function like install.packages() or similar. You can, for instance, use the argument repos, e.g. install.packages(..., repos = "http://cran.at.r-project.br")

You can also set options(repos)

-thomas

On Mon, 22 May 2006, Erin Hodgess wrote: Date: Mon, 22 May 2006 12:57:49 -0500 From: Erin Hodgess [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Subject: [R] choosing a mirror Dear R People: Is there a way to select a mirror from the command line instead of via a menu, please? I tried chooseCRANmirror but to no avail. Thanks in advance. R version 2.3.0 Windows. Sincerely, Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED]

Paulo Justiniano Ribeiro Jr LEG (Laboratório de Estatística e Geoinformação) Departamento de Estatística Universidade Federal do Paraná Caixa Postal 19.081 CEP 81.531-990 Curitiba, PR - Brasil Tel: (+55) 41 3361 3573 Fax: (+55) 41 3361 3141 e-mail: [EMAIL PROTECTED] http://www.est.ufpr.br/~paulojus

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
Re: [R] choosing a mirror
On 22.05.2006 19:57, Erin Hodgess wrote: Dear R People: Is there a way to select a mirror from the command line instead of via a menu, please? I tried chooseCRANmirror but to no avail. Thanks in advance. R version 2.3.0 Windows.

See repos in ?options

-- visit the R Graph Gallery: http://addictedtor.free.fr/graphiques mixmod 1.7 is released: http://www-math.univ-fcomte.fr/mixmod/index.php
+---+
| Romain FRANCOIS - http://francoisromain.free.fr |
| Doctorant INRIA Futurs / EDF |
+---+
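Putting the repos suggestions in this thread together, the mirror can be set non-interactively like this (the mirror URL below is just an example, not one from the thread):

```r
# Set the CRAN mirror from the command line instead of via the menu.
repos <- getOption("repos")
repos["CRAN"] <- "https://cloud.r-project.org"   # example mirror URL
options(repos = repos)
getOption("repos")["CRAN"]
```

After this, install.packages() no longer prompts for a mirror; equivalently, repos can be passed per call as Paulo suggested.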
Re: [R] random generation of a factor
Dear Dave and Roland, Thanks for your answers! I think sample() does the job I'm looking for. I also came up with rmultinom(), but could not make it work, because I don't want several multinomially distributed vectors, but one vector with K levels in predefined proportions. Probably there is a way to do that with rmultinom() - Dave, could you make up a short example? Best wishes Christian

Rau, Roland schrieb: Hi, Does anybody know a function that generates a factor of length N with K levels given the proportions c(p1, p2, ..., pk)? It would not

## does this code piece help you?
mydata <- c("yesterday", "today", "tomorrow")
myproportions <- c(0.3, 0.5, 0.2)
n <- 20
sample(x = mydata, size = n, replace = TRUE, prob = myproportions)
cat("Best,\nRoland\n")

-- This mail has been sent through the MPI for Demographic Research. Should you receive a mail that is apparently from a MPI user without this text displayed, then the address has most likely been faked. If you are uncertain about the validity of this message, please check the mail header or ask your system administrator for assistance.
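Roland's sample() approach, wrapped up as a factor and made reproducible with set.seed(); note that the realized proportions only approximate c(p1, ..., pk) for finite N (the level names here are placeholders):

```r
# Generate a factor of length N with K levels in (approximately) the
# given proportions.
set.seed(1)
levs  <- c("A", "B", "C")
probs <- c(0.3, 0.5, 0.2)
f <- factor(sample(levs, size = 1000, replace = TRUE, prob = probs),
            levels = levs)
prop.table(table(f))   # close to 0.3, 0.5, 0.2
```

For exact counts rather than expected proportions, one could instead sample(rep(levs, times = round(1000 * probs))).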
Re: [R] odd feature
Hi Thomas: One case where the vector-vector recycling rules are used is in vector-matrix operations:

a <- 1:4
b <- diag(4)
a + b

is this last expression intended to be intuitive and thus desirable? If anything, I would end up writing something like this more as an error than by intent. I would suggest that recycling might be better to be explicitly asked for by the user---though I am not sure what the syntax should be: allow.recycle(a+b)? In the end, I think implicit recycling is a tradeoff between convenience and unintended errors. I would judge the convenience of implicit recycling to be fairly low, and the unintended error rate [with difficulty finding it] to be fairly high. I guess the R language has few options() that control its behavior---e.g., a la "use strict;" in perl. If it did, I would love to turn off implicit recycling, provided there is an explicit recycle possibility. I would not mind having the ability to be forced to define variables first, either, though this is not a big deal. R is pretty good in telling me when I use variables that have not been defined. regards, /ivo -thomas
Re: [R] odd feature
On 5/22/2006 5:01 PM, ivo welch wrote: Hi Thomas: One case where the vector-vector recycling rules are used is in vector-matrix operations:

a <- 1:4
b <- diag(4)
a + b

is this last expression intended to be intuitive and thus desirable? If anything, I would end up writing something like this more as an error than by intent. I would suggest that recycling might be better to be explicitly asked for by the user---though I am not sure what the syntax should be: allow.recycle(a+b)? In the end, I think implicit recycling is a tradeoff between convenience and unintended errors. I would judge the convenience of implicit recycling to be fairly low, and the unintended error rate [with difficulty finding it] to be fairly high. I guess the R language has few options() that control its behavior---e.g., a la "use strict;" in perl. If it did, I would love to turn off implicit recycling, provided there is an explicit recycle possibility. While I agree with you about these things in principle, this is a very old S design decision, and I think it can't really change now. It would have to be a local change (i.e. in my namespace, do things strictly), and that's very unlikely to happen. I would not mind having the ability to be forced to define variables first, either, though this is not a big deal. R is pretty good in telling me when I use variables that have not been defined. There are some code tools around that help to detect such things. Duncan Murdoch
[R] Help with lagSeries (Rmetrics, fCalendar)
The function lagSeries calls the function timeSeries, which in turn calls timeDate. (All three functions are in the package fCalendar.) During the call to timeDate the positions vector is altered. From that point on one has problems doing arithmetic with the series. For example,

ip.sa@positions
[1] 1954-02-01 01:00:00 1954-03-01 01:00:00 1954-04-01 01:00:00
[4] 1954-05-01 01:00:00 1954-06-01 01:00:00 1954-07-01 01:00:00
[7] 1954-08-01 01:00:00 1954-09-01 01:00:00 1954-10-01 01:00:00
[10] 1954-11-01 01:00:00

ip.sa.lag <- lagSeries(ip.sa)
ip.sa.lag@positions
[1] 1954-02-01 02:00:00 1954-03-01 02:00:00 1954-04-01 02:00:00
[4] 1954-05-01 02:00:00 1954-06-01 02:00:00 1954-07-01 02:00:00
[7] 1954-08-01 02:00:00 1954-09-01 02:00:00 1954-10-01 02:00:00
[10] 1954-11-01 02:00:00

ip.sa + ip.sa.lag
Error in Ops.timeSeries(ip.sa, ip.sa.lag) : positions slot must match

My best guess for why this is happening is that timeDate thinks that it is getting a date from the GMT time zone and converts it to the Zurich time zone. I am working with monthly data and I would rather avoid altogether dealing with hours and minutes. Can this be done? I tried to build my series with format = "%Y-%m-%d" but that did not help:

z <- timeSeries(matrix(c(1, 2, 3)),
                charvec = c("1954-02-01", "1954-03-01", "1954-04-01"),
                format = "%Y-%m-%d")
z
                    TS.1
1954-02-01 01:00:00    1
1954-03-01 01:00:00    2
1954-04-01 01:00:00    3

If I really have to put up with hours, minutes, and seconds, is there a way to avoid having lagSeries change the hours as it did in the first example above? I believe the arguments for lagSeries do not include anything that would change the behavior of timeDate. So, I guess, if there is a solution, it might be changing an environment value. I set Sys.putenv(TZ = "GMT") as recommended, and of course that did not help. Thanks for any advice. FS
[R] off-topic; iraq statistics again
Gabor Grothendieck wrote: I came across this one: http://www.nysun.com/article/32787 which says that the violent death rate in Iraq (which presumably includes violent deaths from the war) is lower than the violent death rate in major American cities. Does anyone have any insights from statistics on how to interpret this? I finally had time to follow up on this. The NY Sun article compares apples and oranges, i.e. US cities to all of Iraq. For data on Baghdad, see http://www.iraqbodycount.org/press/pr13.php Iraq Body Count Press Release 13, 9th March 2006. Figures released by IBC today, updated by statistics for the year 2005 from the main Baghdad morgue, show that the total number of civilians reported killed has risen year-on-year since May 1st 2003 (the date that President Bush announced "major combat operations have ended"): * 6,331 from 1st May 2003 to the first anniversary of the invasion, 19th March 2004 (324 days: Year 1) * 11,312 from 20th March 2004 to 19th March 2005 (365 days: Year 2) * 12,617 from 20th March 2005 to 1st March 2006 (346 days: Year 3). According to several websites, the population of Baghdad is about 5 million. According to R the violent death rate per 100,000 based on the year 3 data is (12617/500)*10 [1] 252.34 For comparison, see http://www.infoplease.com/ipa/A0004902.html, Crime Rates for Selected Large Cities, 2002: homicides per 100,000 per year.

New York, N.Y. 7.3
Los Angeles, Calif. 17.1
Chicago, Ill. 22.1
Houston, Tex. 12.5
Philadelphia, Pa. 18.9
Phoenix, Ariz. 12.6
San Diego, Calif. 3.7
Dallas, Tex. 15.8
San Antonio, Tex. 8.4
Las Vegas, Nev. 11.9
Detroit, Mich. 41.8
San Jose, Calif. 2.8
Honolulu, Hawaii 2.0
Indianapolis, Ind. 13.9
San Francisco, Calif. 8.4

etc... regards albyn
Re: [R] Problem installing rgdal from source -- Makevars issue
On Mon, 22 May 2006, Thomas Adams wrote: List: I have installed R, contributed packages, and supporting libraries in a non-standard location (but under the same sub-directory tree). This was necessary for systems reasons beyond my control. When I attempt to use Makevars (as shown below) to install rgdal from source, I get the following error: R CMD INSTALL rgdal_0.4-3.tar.gz This is a rather specific question, I'll reply off-list tomorrow (my time), because the advice given in the output you can see doesn't help. The ./configure file cannot know that the rgdal/src/Makevars file has been set manually, and carries on stopping you. Questions like this can preferably be sent directly to the package maintainer. Roger . . . checking proj_api.h usability... no checking proj_api.h presence... no checking for proj_api.h... no libproj.a and proj_api.h not found in standard search locations, edit src/Makevars manually, adding -Idirectory with proj_api.h to PKG_CPPFLAGS = , and -Ldirectory with libproj.a to PKG_LIBS = ERROR: configuration failed for package 'rgdal' ** Removing '/awips/rep/lx/local_apps/R/lib/R/library/rgdal' ** Restoring previous '/awips/rep/lx/local_apps/R/lib/R/library/rgdal' I get this error regardless of where I put the Makevars file: $HOME/.R/Makevars, in the local build directory, or in the rgdal source directory src/Makevars runing ./configure My Makevars contains this: PKG_CPPFLAGS = -I/awips/rep/lx/local_apps/proj4/include PKG_LIBS = -L/awips/rep/lx/local_apps/proj4/lib I have also tried using the file name: Makevars and Makevars-i686-pc-linux-gnu I'm sure I am doing *something* wrong; any suggestions? Regards, Tom -- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. 
voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Subset a list
I have a data frame of ~200 columns and ~20,000 rows where each column consists of binary responses (0, 1) and a 9 for missing data. I am interested in finding the columns for which there are fewer than 100 individuals with responses of 0. I can use an apply function to generate a table for each column, but I'm not certain whether I can subset a list based on some criterion, as subset() is designed for vectors, matrices, and data frames. For example, I can use the following:

tt <- apply(data, 2, table)

which returns an object of class list. Here is some sample output from tt:

$R0235940b
    0     1     9
 2004  1076 15361

$R710a
    0     9
    2 18439

$R710b
    0     1     9
 3941 11167

tt$R710a meets my criteria and I would want to be able to easily find this instead of rolling through the entire output. Is there a way to subset this list to identify the columns which meet the criteria I note above? Thanks, Harold
Re: [R] Subset a list
On Mon, 2006-05-22 at 17:55 -0400, Doran, Harold wrote: I have a data frame of ~200 columns and ~20,000 rows where each column consists of binary responses (0, 1) and a 9 for missing data. I am interested in finding the columns for which there are fewer than 100 individuals with responses of 0. I can use an apply function to generate a table for each column, but I'm not certain whether I can subset a list based on some criterion, as subset() is designed for vectors, matrices, and data frames. Is there a way to subset this list to identify the columns which meet the criteria I note above? Thanks, Harold

Harold, How about this:

DF
   V1 V2 V3 V4 V5
1   0  1  0  1  0
2   0  0  1  0  1
3   0  0  1  1  0
4   1  1  0  0  1
5   1  1  1  1  0
6   0  1  0  1  1
7   0  1  1  1  0
8   0  1  0  0  0
9   0  0  1  1  0
10  1  0  0  1  1

# Find the columns with fewer than 5 0's
which(sapply(DF, function(x) sum(x == 0)) < 5)
V2 V4
 2  4

So in your case, just replace the DF with your data frame name and the 5 with 100. HTH, Marc Schwartz
[R] Ordinal Independent Variables
When I run lrm from the Design package, I get a warning about contrasts when I include an ordinal variable:

Warning message:
Variable ordfac is an ordered factor. You should set
options(contrasts = c("contr.treatment", "contr.treatment"))
or Design will not work properly. in: Design(eval(m, sys.parent()))

I don't get this message if I use glm with family = binomial. It produces linear and quadratic contrasts. If it's improper to do this for an ordinal variable, why does glm not balk? Rick B.
[R] sort the values of members and obtain their ranks within groups
Dear all, I would like to sort the values of member within each group, and obtain a variable indicating each value's rank within its group. For example, we have this original dataset:

df <- data.frame(group = c(rep(1, 3), rep(5, 3)),
                 member = c(30, 10, 22, 21, 44, 15))

  group member
1     1     30
2     1     10
3     1     22
4     5     21
5     5     44
6     5     15

After ranking the variable member, we want the dataset to look like this:

  group member order
1     1     30     3
2     1     10     1
3     1     22     2
4     5     21     2
5     5     44     3
6     5     15     1

I already searched the R-help archives, but failed to find the proper answer. Could you please kindly give me some suggestions, avoiding a loop because of the large sample size? Thanks a lot. Best regards, Zhen Zhang
Re: [R] Subset a list
Doran, Harold wrote: I have a data frame of ~200 columns and ~20,000 rows where each column consists of binary responses (0, 1) and a 9 for missing data. I am interested in finding the columns for which there are fewer than 100 individuals with responses of 0. Is there a way to subset this list to identify the columns which meet the criteria I note above? How about this?

newdf <- mydf[, colSums(mydf == 0) < 100]

Thanks, Harold -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
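Chuck's colSums() one-liner can be checked on a toy frame (hypothetical data, with a threshold of 2 standing in for the "< 100" criterion):

```r
# Keep only the columns with fewer than 2 zeros.
mydf <- data.frame(a = c(0, 0, 1),   # 2 zeros -> dropped
                   b = c(1, 1, 9),   # 0 zeros -> kept
                   c = c(0, 1, 9))   # 1 zero  -> kept
keep  <- colSums(mydf == 0) < 2
newdf <- mydf[, keep, drop = FALSE]  # drop = FALSE keeps a data.frame
names(newdf)
```

The drop = FALSE guard matters if only one column survives; otherwise `[` would return a vector.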
Re: [R] Ordinal Independent Variables
Rick Bilonick wrote: When I run lrm from the Design package, I get a warning about contrasts when I include an ordinal variable: Warning message: Variable ordfac is an ordered factor. You should set options(contrasts=c(contr.treatment,contr.treatment)) or Design will not work properly. in: Design(eval(m, sys.parent())) I don't get this message if I use glm with family=binomial. It produces linear and quadratic contrasts. If it's improper to do this for an ordinal variable, why does glm not balk? Rick B. Standard regression methods don't make good use of ordinal predictors and just have to treat them as categorical. Design is a bit picky about this. If the predictor has numeric scores for the categories, you can get a test of adequacy of the scores (with k-2 d.f. for k categories) by using scored(predictor) in the formula. Or just create a factor( ) variable to hand to Design. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] sort the values of members and obtain their ranks within groups
x
  group member
1     1     30
2     1     10
3     1     22
4     5     21
5     5     44
6     5     15

x$order <- NA
invisible(tapply(seq(nrow(x)), x$group, function(y) x$order[y] <<- rank(x$member[y])))

x
  group member order
1     1     30     3
2     1     10     1
3     1     22     2
4     5     21     2
5     5     44     3
6     5     15     1

On 5/22/06, Zhen Zhang [EMAIL PROTECTED] wrote: Dear all, I would like to sort the values of member for each group, and obtain a variable to indicate its rank within the group. For example, we have the original dataset as follows:

df <- data.frame(group = c(rep(1, 3), rep(5, 3)), member = c(30, 10, 22, 21, 44, 15))

  group member
1     1     30
2     1     10
3     1     22
4     5     21
5     5     44
6     5     15

After sorting the variable member, we want the dataset like this:

  group member order
1     1     30     3
2     1     10     1
3     1     22     2
4     5     21     2
5     5     44     3
6     5     15     1

I already searched the R-help archives, but failed to find the proper answer. Could you please kindly give me some suggestions, other than using a loop (the sample size is large)? Thanks a lot. Best regards, Zhen Zhang

-- Jim Holtman Cincinnati, OH +1 513 646 9390 (Cell) +1 513 247 0281 (Home) What is the problem you are trying to solve?
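Jim's approach, restated as a runnable script on the poster's data (note the <<-, which writes the ranks back into df from inside the tapply callback):

```r
df <- data.frame(group  = c(rep(1, 3), rep(5, 3)),
                 member = c(30, 10, 22, 21, 44, 15))

# Rank member within each group, writing the ranks back into df;
# <<- reaches out of the anonymous function to modify df itself
df$order <- NA
invisible(tapply(seq_len(nrow(df)), df$group,
                 function(i) df$order[i] <<- rank(df$member[i])))

df$order  # 3 1 2 2 3 1
```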
[R] How to call a value labels attribute?
Dear All, after searching on CRAN I got the impression that there is no standard way in R to label the values of a numerical variable. Since this would be useful for me, I intend to create such an attribute, at the moment for my personal use. Still, I would like to choose a name which does not conflict with the names of commonly used attributes. Would "value.labels" or "vallabs" create conflicts? The attribute would be structured as a data.frame with two columns, levels (numeric) and labels (character). These could then also be used to transform from numeric to factor. If the attribute is copied to the factor variable, it could also serve to retransform the factor to the original numerical variable. Comments? Ideas? Thanks Heinz Tüchler
[R] standardization of values before call to pam() or clara()
Greetings, I am experimenting with the cluster package, and am starting to scratch my head about the *best* way to standardize my data. Both functions can pre-standardize the columns in a data frame; according to the manual: "Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation." This works well when the input variables are all in the same units. When I include new variables with a different intrinsic range, the ones with the largest relative values tend to dominate. This is certainly not surprising, but it complicates things. Does there exist a robust technique to effectively re-scale each of the variables, regardless of their intrinsic range, to some set range, say [0, 1]? I have tried dividing each variable by its maximum value, but I am not sure this is statistically sound. Any ideas or thoughts would be greatly appreciated. Cheers, -- Dylan Beaudette Soils and Biogeochemistry Graduate Group University of California at Davis 530.754.7341
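One common answer is simple min-max (range) standardization; a minimal sketch, where rescale01 is a made-up helper name. Whether it is statistically "correct" depends on the application, but range scaling is, if I recall correctly, essentially what the Gower dissimilarity in cluster's daisy() uses for mixed-type data, so it is a defensible choice:

```r
# Min-max rescale a vector to [0, 1]; constant vectors map to all zeros
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(0, length(x)))
  (x - rng[1]) / diff(rng)
}

dat    <- data.frame(a = c(2, 4, 10), b = c(100, 500, 300))
scaled <- as.data.frame(lapply(dat, rescale01))
scaled$a  # 0.00 0.25 1.00; every column now spans [0, 1]
```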
Re: [R] Cross correlation/ bivariate/ mantel
RSiteSearch("mantel statistic") produced 27 hits for me, but the first few suggested I might not find enlightenment there. RSiteSearch("spatial statistics") returned 1064 hits, so I restricted that to RSiteSearch("spatial statistics", "functions"). The first of those pointed to the 'spdep' package. A search for "Mantel" on that page identified plot.mc.sim: Mantel-Hubert spatial general cross product statistic. I haven't tried it; I know nothing more about it than what I've already told you. I hope you'll forgive the terrible pun if I say that I hope I not only gave you a fish, I showed you how I caught it. Best Wishes, Spencer Graves

McClatchie, Sam (PIRSA-SARDI) wrote: Background: OS: Linux Ubuntu Dapper; release: R 2.3.0; editor: GNU Emacs 21.4.1; front-end: ESS 5.2.3. Colleagues, I have two spatial datasets (latitude, longitude, fish eggs) and (latitude, longitude, fish larvae) at the same 280 stations (i.e. 280 cases). I want to determine if the 2 datasets are spatially correlated. In other words, do high numbers of larvae occur where there are high numbers of eggs? I would like to calculate the cross correlation for these bivariate data and calculate a Mantel statistic as described on pg. 147 of Fortin and Dale 2005, Spatial Analysis. My search of R packages came up with the acf and ccf functions, but I don't think these are what I want. Does anyone know in which spatial package I might find the appropriate test, please? Best fishes Sam Sam McClatchie, Biological oceanography South Australian Aquatic Sciences Centre PO Box 120, Henley Beach 5022 Adelaide, South Australia email [EMAIL PROTECTED] Cellular: 0431 304 497 Telephone: (61-8) 8207 5448 FAX: (61-8) 8207 5481 Research home page http://www.members.iinet.net.au/~s.mcclatchie/
Re: [R] Cross correlation/ bivariate/ mantel
Thanks Spencer. Give a wo/man a fish and s/he becomes aid-dependent; give a wo/man a rod and s/he becomes a fisherman. Sam

-----Original Message----- From: Spencer Graves [mailto:[EMAIL PROTECTED] Sent: Tuesday, 23 May 2006 9:58 AM To: McClatchie, Sam (PIRSA-SARDI) Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Cross correlation/ bivariate/ mantel [quoted message snipped; see the previous post in this thread]
Re: [R] RQuantlib
RQuantLib experts, I am trying to perform some analysis on a dataset of options. I got RQuantLib to work using a for loop over my data.frame, but I tried sapply, lapply, and the other apply functions to no avail. My first error occurs with the following. Note that type= is not specified; the function picks up my date column in the data.frame as the type= argument:

x <- sapply(X=atm.work, FUN=EuropeanOption, underlying = atm.work$For_Price, strike = atm.work$K,
            dividendYield = atm.work$BEY, riskFreeRate = atm.work$BEY, maturity = atm.work$t_exp,
            volatility = atm.work$sigma)
Error in EuropeanOption.default(X[[2]], ...) : Unexpected option type 2Jan97, aborting

When I include a column named type with "put" in all rows I get the following error:

x <- sapply(X=atm.work, FUN=EuropeanOption, underlying = atm.work$For_Price, strike = atm.work$K,
            dividendYield = atm.work$BEY, riskFreeRate = atm.work$BEY, maturity = atm.work$t_exp,
            volatility = atm.work$sigma, type=atm.work$type)
Error in FUN(X[[1]], ...) : unused argument(s) ( ...)

Using type="put" or type=rep("put", nrows) I get the same unused-argument error. I checked the function call to see what happens when I do not include the correct number of parameters, and I received error messages detailing the call order of the EuropeanOption parameters. Thank you Joe
[R] lattice package - question on plotting lines
Hi all, I was trying to plot a graph using the following data:

method percent accuracy group
    A1       4   0.8529   cns
    A1      10   0.8412   cns
    A1      15   0.8235   cns
    A2       4   0.9353   cns
    A2      10   0.9412   cns
    A2      15   0.9471   cns
    A1       4   0.8323   col
    A1      10   0.8452   col
    A1      15   0.8484   col
    A2       4   0.8839   col
    A2      10   0.8677   col
    A2      15   0.8678   col

The code I'm using to generate the graphs is:

xyplot(accuracy ~ percent | group, data = k, groups = method,
       allow.multiple = TRUE, scales = "same", type = "o",
       xlab = "% of genes", ylab = "Accuracy", main = "Accuracy",
       aspect = 2/3,
       panel = function(...) { panel.superpose(...) })

I have tried to use the 'get' and 'set' functions of par(), but can't figure it out. I have two questions: i) How can I set it so that the line plotted for A1 in all the plots is solid, and the one for A2 is dotted, for both groups (cns and col)? ii) How can I set the abline feature to show a horizontal line at a different point on the y-axis for each of the two graphs? Any help would be highly appreciated. Many thanks.
[R] nls: formula error?
Thanks for the help so far. I have a matrix (AB) whose first column holds my bin numbers, -4 to +4 in 0.1 bin units, and whose second column holds the frequency from some data. I have plotted them and they look roughly Gaussian, so I want to fit/optimize mu, sigma, and A. So I call the nls function:

nls_AB <- nls(x ~ (A/sig*sqrt(2*pi)) * exp(-1*((x-mu)^2/(2*sig^2))), data=temp,
              start=list(A=0.1, mu=0.01, sig=0.5), trace=TRUE)
Error in eval(expr, envir, enclos) : numeric 'envir' arg not of length one

temp looks like this:

      bin   x
 [1,] -4.0   0
 [2,] -3.9   0
 [3,] -3.8   0
 ... etc.
[41,]  0.0 241
[42,]  0.1 229
[43,]  0.2 258
[44,]  0.3 305
[45,]  0.4 370
[46,]  0.5 388

So I don't understand my error message. I looked at doing

fo <- x ~ (A/sig*sqrt(2*pi)) * exp(-1*((x-mu)^2/(2*sig^2)))
class(fo)
terms(fo)

and that seems to work. So if anyone has any ideas I would welcome them. Cheers,
Re: [R] editing a big file
Here is one way of doing it:

x <- c("C/G", "CC", "GG", "CG", "G/T", "GG", "TT", "GT", "C/T", "CC",
       "TT", "CT", "A/G", "AA", "GG", "AG", "A/C", "AA", "CC", "AC",
       "A/T", "AA", "TT", "AT")
# convert to a matrix with 4 columns
x <- matrix(x, ncol=4, byrow=TRUE)
x
     [,1]  [,2] [,3] [,4]
[1,] "C/G" "CC" "GG" "CG"
[2,] "G/T" "GG" "TT" "GT"
[3,] "C/T" "CC" "TT" "CT"
[4,] "A/G" "AA" "GG" "AG"
[5,] "A/C" "AA" "CC" "AC"
[6,] "A/T" "AA" "TT" "AT"
# now substitute for the last 3 columns
result <- apply(x, 1, function(a){
    # substitute for the first character
    d1 <- gsub(substr(a[1], 1, 1), '1', a[2:4])
    # substitute for the second
    gsub(substr(a[1], 3, 3), '2', d1)
})
result
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "11" "11" "11" "11" "11" "11"
[2,] "22" "22" "22" "22" "22" "22"
[3,] "12" "12" "12" "12" "12" "12"
# output as a vector
as.integer(as.vector(result))
 [1] 11 22 12 11 22 12 11 22 12 11 22 12 11 22 12 11 22 12

On 5/22/06, yohannes alazar [EMAIL PROTECTED] wrote: I have a file that has 90 columns and 20,000 rows and looks like

C/G CC GG CG
G/T GG TT GT
C/T CC TT CT
A/G AA GG AG
A/C AA CC AC
A/T AA TT AT

I want to write code that reads through each row, looks at the first column, and then replaces the remaining three columns: 12 if the entry contains both alphabets of the first column (e.g. the third column here), 11 if it is a repeat of the first alphabet (like the second column), and 22 if it is a repeat of the second alphabet. Can you please give some hint how I can start coding this, like command names? I am not a programmer and am an R beginner. My output should look like this:

11 22 12
11 22 12
11 22 12
11 22 12
11 22 12
11 22 12

Thank you in advance.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 (Cell) +1 513 247 0281 (Home) What is the problem you are trying to solve?
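Condensed, the heart of Jim's solution is the pair of nested gsub() calls; a minimal, runnable restatement on three of the example rows:

```r
x <- c("C/G", "CC", "GG", "CG",
       "G/T", "GG", "TT", "GT",
       "C/T", "CC", "TT", "CT")
m <- matrix(x, ncol = 4, byrow = TRUE)

recoded <- apply(m, 1, function(row) {
  a1 <- substr(row[1], 1, 1)  # first allele, e.g. "C" in "C/G"
  a2 <- substr(row[1], 3, 3)  # second allele, e.g. "G"
  # replace the first allele by "1", then the second by "2"
  gsub(a2, "2", gsub(a1, "1", row[2:4]))
})

as.vector(recoded)  # "11" "22" "12" for each input row in turn
```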
Re: [R] sort the values of members and obtain their ranks within groups
See if this does what you want:

cbind(df, rank=ave(df$member, df$group, FUN=rank))
  group member rank
1     1     30    3
2     1     10    1
3     1     22    2
4     5     21    2
5     5     44    3
6     5     15    1

Andy

From: [EMAIL PROTECTED] on behalf of Zhen Zhang Sent: Mon 5/22/2006 6:40 PM To: R-help@stat.math.ethz.ch Subject: [R] sort the values of members and obtain their ranks within groups [Broadcast] [original question snipped; see the earlier post in this thread]
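Andy's ave() one-liner, run end to end on the posted data (ave() applies FUN within each group and returns a vector aligned with the original rows, so no reordering is needed):

```r
df <- data.frame(group  = c(1, 1, 1, 5, 5, 5),
                 member = c(30, 10, 22, 21, 44, 15))

# rank() is applied separately within each level of group
out <- cbind(df, rank = ave(df$member, df$group, FUN = rank))

out$rank  # 3 1 2 2 3 1, matching the 'order' column the poster wanted
```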
Re: [R] lattice package - question on plotting lines
On Mon, 2006-05-22 at 18:36 -0700, Tim Smith wrote: [original question snipped; see the previous post in this thread]

Does this get you closer?

xyplot(accuracy ~ percent | group, data = k, groups = method,
       allow.multiple = TRUE, scales = "same", type = "o",
       par.settings = list(superpose.line = list(lty = 1:2)),  # added
       xlab = "% of genes", ylab = "Accuracy", main = "Accuracy",
       aspect = 2/3,
       panel = function(...) { panel.superpose(...) })

-- Tony [EMAIL PROTECTED]
Re: [R] RQuantlib Array processing applying EuropeanOptionExampleArray
On 22 May 2006 at 10:57, Joe Byers wrote:
| I try and run the EuropeanOption example using atm.work$For_Price as my
| array of underlying prices and the other inputs from row 9 of atm.work.
| i <- 9
| x <- EuropeanOption(type = "put", underlying = atm.work$For_Price,
|                     strike = atm.work$K[i], dividendYield = atm.work$BEY[i],
|                     riskFreeRate = atm.work$BEY[i], maturity = atm.work$t_exp[i],
|                     volatility = atm.work$sigma[i])
|
| x$parameters has the array of underlying prices but the result is only
| a single vector using the first row of atm.work$For_Price. Is this
| because I am pulling the inputs from a data.frame, not arrays?

If I understand correctly what you are trying to do, then there may simply be a misunderstanding on your part. EuropeanOption(), like all but one [more on that in the next paragraph] of the other RQuantLib functions, expects _scalars_ for all its inputs. But you seem to expect that it magically vectorises things automatically for you. It doesn't, and that's why you get errors or unexpected results. There is however one convenience function -- mostly as a proof of concept -- which unrolls vectors (or sequences) in any of its arguments. See help(EuropeanOptionArrays) and example(EuropeanOptionArrays), where the results are shown in terms of two sequences of values for the underlying and the volatility. As always, there will be many more clever ways of doing this. Suggestions and patches welcome. | Any help is greatly appreciated. Hope this helps. Regards, Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison
[R] RMySQL on Mac OS 10.4
I have been having trouble getting RMySQL to work on Mac OS 10.4. It was working fine for me about a year ago, but I didn't touch it for a while, and several upgrades of R, MySQL, and OS X later, it is not working. I first had the problem that others have been having, getting the error discussed in https://stat.ethz.ch/pipermail/r-sig-mac/2005-November/002435.html saying that RS-DBI driver warning: (MySQL mismatch between compiled version 4.0.24 and runtime version 4.1.14). I was able to get this error message to disappear by compiling MySQL with gcc 3.3, and now I can connect to my database and run queries, but when I try to use dbWriteTable, I get this error:

dbWriteTable(con, "test", rnorm(100))
Error in .class1(object) : no direct or inherited method for function 'dbWriteTable' for this call

For more info, I am running OS X 10.4.6 and R 2.2.1, I have compiled RMySQL from source (version 0.5-7), and I'm running MySQL 4.1.19. I have no idea what this error means and I have not found anything about it anywhere else in the help files. If anyone knows what I can do to fix this, I would really, really appreciate it.
Re: [R] RQuantlib Array processing applying EuropeanOptionExampleArray
Dirk, Thank you. Shortly after I sent this email off, I discovered that I had left "Array" off the function name. It worked fine; my ignorance and oversight. Thanks for the help. Joe

Dirk Eddelbuettel wrote: [quoted message snipped; see the previous post in this thread]
Re: [R] RMySQL on Mac OS 10.4
Ryan Hafen [EMAIL PROTECTED] writes:

dbWriteTable(con, "test", rnorm(100))
Error in .class1(object) : no direct or inherited method for function 'dbWriteTable' for this call

For more info, I am running OS X 10.4.6, R 2.2.1, I have compiled RMySQL from source version 0.5-7, and I'm running MySQL 4.1.19. I have no idea what this error means and I have not found anything about it anywhere else in the help files. If anyone knows what I can do to fix this, I would really really appreciate it.

I think this just means that dbWriteTable doesn't know how to write vectors; it knows how to write data frames (according to the man page). So you could test:

dbWriteTable(con, "test", data.frame(rng=rnorm(100)))

+ seth
Re: [R] nls: formula error?
The data= argument cannot be a matrix. See ?nls.

On 5/22/06, H. Paul Benton [EMAIL PROTECTED] wrote: [original question snipped; see the previous post in this thread]
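Beyond data= needing a data frame, the posted formula also has x on both sides (x ~ f(x)); presumably the counts should be modeled as a function of bin. A sketch on simulated data (the generating amplitude 400 and sigma 0.5 are arbitrary choices for the demo, not values from the post):

```r
set.seed(42)
# Histogram-like data stored in a data frame (not a matrix)
bin  <- seq(-4, 4, by = 0.1)
temp <- data.frame(bin = bin,
                   x = 400 * exp(-bin^2 / (2 * 0.5^2)) +
                       rnorm(length(bin), sd = 2))

# Gaussian fit: the response is the count x, the predictor is bin
fit <- nls(x ~ (A / (sig * sqrt(2 * pi))) * exp(-(bin - mu)^2 / (2 * sig^2)),
           data  = temp,
           start = list(A = 300, mu = 0, sig = 1))

round(coef(fit), 2)  # mu should land near 0 and sig near 0.5
```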
[R] Get list of ODBC data sources?
Hello R Helpers, before setting up a connection with RODBC, I would like to present my users with a pick list of the ODBC data sources available in their environment. I may be missing something, but I don't see anything in RODBC itself that returns the list of sources for use in select.list(). Any hints? I'm running 2.3.0 on Win XP SP2. -- TIA, Jim Porzak Loyalty Matrix Inc. San Francisco, CA
[R] Plotting problem-Dendrograms Help!!
Dear List, My dataset is as below: I am using library(cluster) hierarchical clustering on these data. If i try to plot these i couldn't see the partitions clearly, please find the link on the plot : http://roughjade.blogspot.com Could anyone give me some suggesstions. AAP326 29 AAW315 37 AAW321 24 AAW322 7 AAW331 22 ACE381 22 ACP112 21 ACP212 24 ACP251 26 ACP321 31 ACW102 39 ACW112 21 ACW121 17 ACW131 7 ACW212 24 ACW241 37 ACW251 26 ACW261 33 ACW311 7 ACW321 31 ACW342 39 ACW371 29 AFP365 24 AFU366 20 AFW360 2 AFW362 40 AGU610 8 AGW609 16 AGW615 1 AGW617 31 AGW619 6 AGW705 32 AGW707 2 AGW708 13 AKP202 16 AKW101 16 AKW102 40 AKW103 16 AKW104 40 AKW202 16 AKW301 33 AKW302 9 AMP342 20 AMP343 2 AMP344 22 AMP345 36 AMP347 40 AMU643 3 AMW340 25 AMW343 2 AMW344 22 AOP351 40 AOP352 32 AOP353 2 AOP354 2 AOP356 13 AOU655 21 AOW351 40 AOW353 2 APP301 22 APP373 18 APP374 13 APW302 26 ATP393 32 ATU396 8 ATW103 30 ATW108 39 ATW222 28 ATW223 28 ATW241 35 ATW251 14 ATW252 14 ATW392 5 BAT213 8 BAT301 18 BAT302 20 BBT213 38 BBT214 26 BBT301 40 BBT303 22 BET302 18 BET303 28 BGT213 24 BKT302 5 BMT202 26 BMT204 21 BMT302 17 BOE201 14 BOI101 28 BOI102 1 BOI103 23 BOI104 21 BOI109 37 BOM113 22 BOM114 31 BOT205 13 BPT302 6 BST203 5 BST204 24 BST301 39 BST304 34 BTT202 5 BTT302 32 BTT303 28 BVT211 32 BZT213 34 BZT302 22 BZT303 7 CAM101 38 CAP101 38 CAT102 6 CCS523 14 CCS542 21 CCS545 28 CIT511 7 CIT512 36 CIT521 33 CIT522 21 CIT531 28 CMM101 38 CMM102 36 CMM201 5 CMM202 7 CMM212 38 CMM301 2 CMP301 40 CMT201 38 CMT202 5 CMT311 2 CMT312 7 CPM101 38 CPM102 33 CPM301 2 CPM302 7 CPM303 26 CPP301 40 CPP302 38 CPP303 16 CPS301 18 CPS304 14 CPT101 38 CPT103 13 CPT104 6 CPT311 40 CPT312 38 CSA401 14 CSC535 14 CSI513 33 CSI533 1 CST102 33 CST311 36 CST312 31 DTM302 36 DTM324 3 DTM325 30 DTM346 16 DTM364 6 FCP552 23 FCP554 37 FCP557 25 FEK101 26 FEK202 22 FEK302 24 FEK403 28 FFK454 32 FFK455 23 FFK456 30 FIT141 15 FIT142 3 FIT143 8 FIT242 3 FIT343 26 FIT344 39 FKF112 13 FKF213 8 FKK171 5 FTF222 5 
FTF223 32 FTF323 6 FTF324 3 HBT104 28 HBT105 34 HBT207 30 HBT213 5 HEK212 26 HEK222 28 HET123 13 HET224 2 HET322 5 HET323 17 HET324 23 HET521 18 HET522 33 HGF121 38 HGF324 22 HGG353 40 HGM131 6 HGM236 32 HGM237 29 HGM341 26 HGT111 36 HGT219 5 HGT318 1 HGW371 3 HIA101 37 HIS213 36 HIS315 16 HIU225 24 HKB112 30 HKB315 26 HKB316 7 HKB501 29 HKB502 2 HKN101 8 HKN206 20 HKN207 13 HKN208 22 HKN307 28 HKT121 15 HMT124 32 HMT221 23 HMT222 34 HMT228 27 HMT322 13 HMT328 39 HMT504 32 HMT505 2 HMT507 16 HPW102 1 HSE251 37 HSI242 24 HSM111 2 HSM213 39 HSM215 28 HSM316 20 HSM317 18 HST121 30 HST325 26 HTU211 19 HTV201 14 HXE109 21 HXE110 32 HXE205 28 HXE208 24 HXE302 6 HXE303 20 HXE308 36 IEG100 21 IEK102 32 IEK103 27 IEK106 6 IEK205 29 IEK206 35 IEK208 37 IEK308 17 IEK316 23 IMG103 21 IMG208 15 IMG302 38 IMG304 15 IMK104 34 IMK208 17 IMK209 32 IMK210 22 IMK304 13 IPK101 7 IPK203 23 IPK205 13 IPK216 34 IPK218 39 IPK306 5 IPK314 25 IPK317 21 IUK105 14 IUK204 29 IUK302 40 IWK101 38 IWK102 6 IWK104 21 IWK202 5 IWK203 33 IWK301 33 IWK305 14 KAA502 1 KAA505 23 KAE246 21 KAE345 23 KAT241 33 KAT244 32 KAT341 4 KEN200 26 KEN300 36 KFT131 17 KFT232 6 KIE232 20 KIE356 29 KIE358 6 KIT252 26 KIT254 36 KIT256 34 KIT359 20 KOE321 39 KOT121 2 KTE211 36 KTE311 1 LAA100 20 LAA200 28 LAA300 36 LAC100 30 LAC200 32 LAE100 23 LAE200 23 LAE300 1 LAG100 24 LAG200 22 LAG300 28 LAJ100 8 LAJ200 8 LAJ300 36 LAJ400 35 LAP100 26 LAP200 13 LAS100 30 LAS200 13 LAS300 36 LHP451 37 LHP452 17 LHP453 1 LHP454 30 LKM100 18 LKM200 13 LKM300 9 LKM400 4 LLC200 6 LLC400 39 LLJ200 38 LLJ400 15 LLS307 32 LPT208 18 MAA101 30 MAA111 30 MAA161 13 MAT102 1 MAT111 35 MAT122 3 MAT161 30 MAT181 15 MAT202 21 MAT511 29 MSG162 8 MSG228 25 MSG252 29 MSG253 1 MSG262 22 MSG265 32 MSG283 31 MSG284 33 MSG355 24 MSG356 34 MSG367 39 MSG368 35 MSG388 23 MSG389 31 MSS211 33 MSS301 26 MSS302 30 MST565 26 PGA103 25 PGA104 40 PGM261 33 PGM322 9 PGM324 38 PGM333 9 PGM342 20 PGM343 34 PGM363 34 PGT302 5 PGT311 30 PGT313 33 PGT322 9 PGT324 38 
PGT325 9 PGT333 9 PGT342 20 PGT343 34 PGT363 34 PLG333 17 PLG516 8 PLG517 3 PLG518 37 PLG526 8 PLG532 29 PLG537 13 PLG542 16 PLG546 18 PLG548 22 PLG551 39 PLG553 16 PLG556 22 PLG558 16 PLG701 24 PLG712 16 PLG721 30 PLG722 27 PLG731 37 PLG732 30 PPG202 9 PPG203 40 PPG310 28 PPG311 24 PPG312 39 PPG313 36 PPG314 5 PPG315 7 PPG316 7 PPG317 32 PPG318 16 PPG319 30 PPG320 35 PPG331 26 PPG333 22 RAG265 23 RAG322 13 RAK343 34 RAK344 25 RAK552 21 RDB314 37 RDG266 21 RDG333 34 REG162 6 REG261 32 REG262 6 REG264 8 REG363 36 REG364 1 RET521 21 RET522 40 RET531 13 RET532
[R] hclust-Dendrograms-Help!
Dear List, my dataset is as below. I am using hierarchical clustering (library(cluster)) on these data. If I try to plot it I can't see the partitions clearly:

dd <- hclust(dist(DataSetS01022Full), "ave")
Warning message:
NAs introduced by coercion
(dn <- as.dendrogram(dd))
plot(dn)

Please find the plot at this link: http://roughjade.blogspot.com I can't figure out whether it is the *design* of the dataset. Could anyone give me some suggestions?

AAP326 29 AAW315 37 AAW321 24 AAW322 7 AAW331 22 ACE381 22 ACP112 21 ACP212 24 ACP251 26 ACP321 31 ACW102 39 ACW112 21 ACW121 17 ACW131 7 ACW212 24 ACW241 37 ACW251 26 ACW261 33 ACW311 7 ACW321 31 ACW342 39 ACW371 29 AFP365 24 AFU366 20 AFW360 2 AFW362 40 AGU610 8 AGW609 16 AGW615 1 AGW617 31 AGW619 6 AGW705 32 AGW707 2 AGW708 13 AKP202 16 AKW101 16 AKW102 40 AKW103 16 AKW104 40 AKW202 16 AKW301 33 AKW302 9 AMP342 20 AMP343 2 AMP344 22 AMP345 36 AMP347 40 AMU643 3 AMW340 25 AMW343 2 AMW344 22 AOP351 40 AOP352 32 AOP353 2 AOP354 2 AOP356 13 AOU655 21 AOW351 40 AOW353 2

Thanks in Advance, -- Lecturer J. Joshua Thomas KDU College Penang Campus Research Student, University Sains Malaysia
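The "NAs introduced by coercion" warning is a hint that dist() is being fed the character ID column and coercing it to numeric. A hedged sketch of the usual fix (the column names here are guesses at the poster's layout): cluster on the numeric column only, moving the IDs into rownames so they become dendrogram labels rather than data:

```r
# Two-column data as in the post: an ID label and a numeric value
ids    <- c("AAP326", "AAW315", "AAW321", "AAW322", "AAW331", "ACE381")
values <- c(29, 37, 24, 7, 22, 22)

# IDs go into the dimnames, NOT into the distance computation
m  <- matrix(values, ncol = 1, dimnames = list(ids, "value"))
hc <- hclust(dist(m), method = "average")

hc$labels  # the IDs, which plot(as.dendrogram(hc)) will show as leaf labels
```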
[R] Possible with lm() or glm()?
HelpeRs, I have what amounts to a stupid design---entirely my fault, but it seemed like a good idea at the time. Every participant in the experiment receives a series of two-alternative forced-choice (2afc) pairs generated at random from a pool of items that make up the pairs. The independent variable of interest is the difference between the members of each 2afc pair on a metric measure, used to predict 2afc accuracy. For each participant, the 2afc pairs are generated at random (without replacement) from a set of items. This approach means that for any participant there are 40 2afc trials in which the independent variable varies at random from differences of 0 to some large absolute difference. The problem is that, because of the random construction of the pairs, there is no consistent variation in the IV over participants (e.g., some participants contribute a high frequency of judgements on small-difference pairs, others on large-difference pairs, etc.). What I want is the relationship between the IV (pair differences on the metric measure) and 2afc accuracy, with the variation due to the random assignment of pair differences to participants removed. To do so, I want to compute the residuals relating IV differences to 2afc accuracy after participant variation (relating to the covariation with absolute difference) is removed. So, I don't want to remove participant variation per se, but only that arising from the covariation of participant with IV pair difference. Ultimately, I want to plot the psychometric function, complete with some variant of confidence intervals. I hope that is clear. So, is there some approach within lm() or glm() that will provide for this? Any ideas? To make it concrete, imagine I have a data frame consisting of participant as a factor, absolute pair difference within participant as the vector IV, and response (correct/incorrect) as the vector DV. [The actual design is more complex, but this simple structure captures my concern.] 
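[One glm()-based sketch of what is being asked for, under stated assumptions: 'dat' is a hypothetical data frame with columns participant (factor), absdiff (the numeric IV), and correct (0/1 DV). Entering participant as a covariate alongside the IV partials out between-participant variation when estimating the IV's effect, which is in the spirit of the residualizing described above:]

```
## Probit link gives a conventional psychometric function
fit <- glm(correct ~ participant + absdiff,
           family = binomial(link = "probit"), data = dat)
summary(fit)$coefficients["absdiff", ]   # IV effect, participant held fixed

## Psychometric function with pointwise ~95% confidence bands,
## predicted for one reference participant over a grid of differences
nd <- data.frame(participant = levels(dat$participant)[1],
                 absdiff = seq(0, max(dat$absdiff), length.out = 100))
pr <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)
plot(nd$absdiff, pnorm(pr$fit), type = "l", ylim = c(0, 1),
     xlab = "absolute pair difference", ylab = "P(correct)")
lines(nd$absdiff, pnorm(pr$fit + 1.96 * pr$se.fit), lty = 2)
lines(nd$absdiff, pnorm(pr$fit - 1.96 * pr$se.fit), lty = 2)
```

The bands are computed on the link scale and then transformed through pnorm(), which keeps them inside [0, 1]. Whether participant should be fixed (as here) or random is a design question; this sketch only shows the fixed-effect glm() route that was asked about.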
-- Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html -Dr. John R. Vokey __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html