Re: [R] creating 'all' sum contrasts
G'day Michael,

On Fri, 15 Oct 2010 12:09:07 +0100
Michael Hopkins <hopk...@upstreamsystems.com> wrote:

> OK, my last question didn't get any replies so I am going to try and
> ask a different way.  When I generate contrasts with contr.sum() for
> a 3 level categorical variable I get the 2 orthogonal contrasts:
>
> R> contr.sum( c(1,2,3) )
>   [,1] [,2]
> 1    1    0
> 2    0    1
> 3   -1   -1

These two contrasts are *not* orthogonal.

> This provides the contrasts 1-3 and 2-3 as expected.  But I also want
> it to create 1-2 (i.e. 1-3 - 2-3).  So in general I want all possible
> orthogonal contrasts - think of it as the contrasts for all pairwise
> comparisons between the levels.

You have to decide what you want: the contrasts for all pairwise
comparisons between the levels, or all possible orthogonal contrasts?
The latter is actually not well defined.  For a factor with p levels,
there would be p-1 orthogonal contrasts, which are only identifiable
up to rotation, hence there are infinitely many such sets.  But there
are p(p-1)/2 pairwise comparisons.  So unless p=2, you have to decide
what you want.

> Are there any options for contrast() or other functions/libraries
> that will allow me to do this automatically?

Look at package multcomp, in particular functions glht() and mcp(),
these might help.

Cheers,

        Berwin

========================== Full address ============================
Berwin A Turlach                     Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)           +61 (8) 6488 3383 (self)
The University of Western Australia  FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009               e-mail: ber...@maths.uwa.edu.au
Australia                     http://www.maths.uwa.edu.au/~berwin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
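[For the pairwise-comparison route, a minimal sketch of the multcomp approach suggested above; the data frame, factor name and model here are invented purely for illustration.]

```r
## All pairwise ("Tukey") comparisons of a 3-level factor via multcomp;
## the data and model are made up, only the glht()/mcp() idiom matters.
library(multcomp)
d   <- data.frame(y = rnorm(30), f = gl(3, 10))
fit <- aov(y ~ f, data = d)
summary(glht(fit, linfct = mcp(f = "Tukey")))  # tests 2-1, 3-1 and 3-2
```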
Re: [R] R-Fortran question (multiple subroutines)
G'day Remko,

On Mon, 25 Oct 2010 15:33:30 +1100
Remko Duursma <remkoduur...@gmail.com> wrote:

> apologies if this is somewhere in a manual, I have not been able to
> find anything relevant.

You probably have to set some appropriate flags, and the information
should be somewhere in the manual of your compiler.  Though "Writing R
Extensions" will tell you how to change/specify flags for compilers.

> I run Windows Vista.

My condolences. :)

> But is it possible to have more than one subroutine in my source
> file, one depending on the other?

Yes.

> I.e., my code looks something like this:
>
>       subroutine f(x,y,z)
>       call g(x,y,z)
>       end
>
>       subroutine g(x,y,z)
>       z = x*y
>       end
>
> calling this from R shows that subroutine g is not called.

What do you mean by "g is not called"?  How did you assert this?  If
the code functions correctly, then g has to be called, either
explicitly or implicitly (e.g. if it was inlined).  Do you mean that
when you compile the code and load the resulting DLL from R, you can
only call subroutine f via .Fortran() but not subroutine g?

> The code compiled as executable works fine.

What do you mean by "compiled as executable"?  The snippet that you
showed would not compile as an executable, as there is no main
program.  So what do you mean by "works fine"?  That you can call the
subroutine g from the main program that you use when creating an
executable?

My guess, based on this snippet, is that your compiler notices that
subroutine g is not really needed and replaces the call to g within f
by the body of g, and does not create a separate entry point for g in
the compiled object; a process known as in-lining (IIRC) and typically
used by compilers if high levels of optimisation are requested.  The
compiler does not know that you intend to call g later directly from R
via .Fortran(); all it sees is the code in the file and, if high
optimisation is requested, it may feel free to rearrange the code to
streamline it (e.g. by in-lining).
So possible solutions could be:

1) ask for a lower level of optimisation;
2) tell the compiler not to do in-lining (for gcc/gfortran the flag is
   -fno-inline, if I remember correctly);
3) put the routines into separate files, compile the files separately
   and then link all the resulting object files together into a DLL
   (AFAIK, optimising/inlining across separate files is a tricky
   issue, so few, if any, compilers would do so);
4) check whether there is an option to specify which exportable
   symbols have to be in the DLL in the end.

Cheers,

        Berwin
Re: [R] R-Fortran question (multiple subroutines)
G'day all,

On Mon, 25 Oct 2010 06:52:15 -0400
Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

> Remko Duursma wrote:
> [...]
>> I.e., my code looks something like this:
>>
>>       subroutine f(x,y,z)
>>       call g(x,y,z)
>>       end
>>
>>       subroutine g(x,y,z)
>>       z = x*y
>>       end
>>
>> calling this from R shows that subroutine g is not called.  The
>> code compiled as executable works fine.
>
> There are no such limitations imposed by R.  I'd suggest your
> diagnosis of the problem is wrong.  If you can't spot the problem,
> please post a real example (simplified if possible, but not as much
> as the one above).

Actually, it turns out that this example is simplified enough. :)

I put this snippet into a file, compiled it via R CMD SHLIB, loaded it
into R and then was very surprised about the result of

  .Fortran("f", x=1.1, y=2.2, z=0.0)

Eventually it dawned on me: Remko needs to read the table in Section
5.2 of "Writing R Extensions".  The simple fix in this case is to put a

      double precision x, y, z

at the beginning of each subroutine.  The default precision in FORTRAN
is not double precision; that is why the code seems to work when
compiled as a stand-alone program but not when called from R.

In general, I would recommend starting each subroutine with "implicit
none" (o.k., I know that this is not standard FORTRAN 77, but most
compilers support that construct; the alternative that would conform
with the FORTRAN 77 standard is to declare everything to be implicitly
of type character and then let the fun begin) and then to explicitly
declare all variables to be of the appropriate type.

HTH.

Cheers,

        Berwin
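[Putting the advice together, the corrected source file would look something like the following; a sketch only, untested.]

```fortran
c     Corrected version of the thread's snippet: explicit double
c     precision declarations so the argument types match what R's
c     .Fortran() interface passes.
      subroutine f(x,y,z)
      implicit none
      double precision x, y, z
      call g(x,y,z)
      end

      subroutine g(x,y,z)
      implicit none
      double precision x, y, z
      z = x*y
      end
```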
Re: [R] cube root of a negative number
G'day Gregory,

On Tue, 26 Oct 2010 19:05:03 -0400
Gregory Ryslik <rsa...@comcast.net> wrote:

> Hi,
>
> This might be me missing something painfully obvious, but why does
> the cube root of the following produce an NaN?
>
> R> (-4)^(1/3)
> [1] NaN

1/3 is not exactly representable as a binary number.  My guess is that
the number closest to 1/3 that is representable cannot be used as the
exponent for negative numbers, hence the NaN.  Essentially, don't
expect finite precision arithmetic to behave like infinite precision
arithmetic, it just doesn't.  The resources mentioned in FAQ 7.31 can
probably shed more light on this issue.

Cheers,

        Berwin
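[A common workaround, added here rather than taken from the reply: split off the sign to get the real cube root, or work in the complex domain.]

```r
## x^(1/3) is NaN for x < 0 because a negative base with a non-integer
## exponent is undefined over the reals; take the real cube root via
## sign() and abs() instead, or ask for the principal complex root.
x <- -4
sign(x) * abs(x)^(1/3)   # real cube root, approximately -1.5874
as.complex(x)^(1/3)      # principal complex cube root
```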
Re: [R] cube root of a negative number
G'day Bill,

On Wed, 27 Oct 2010 10:34:27 +1100
<bill.venab...@csiro.au> wrote:

[...]
> It is no surprise that this does not work when working in the real
> domain, except by fluke with something like
>
> R> -4^(1/3)
> [1] -1.587401
>
> where the precedence of the operators is not what you might expect.
> Now that could be considered a bug, since apparently
>
> R> -4^(1/2)
> [1] -2
>
> which comes as rather a surprise!
[...]

Mate, you must have been using Excel (or bc) too much recently if this
takes you by surprise. :)  This is exactly the behaviour that I would
expect, as I was always taught that exponentiation is the operator
with the highest priority.

The discussion at http://en.wikipedia.org/wiki/Order_of_operations
points out that there exist differing conventions concerning the unary
operator "-", but gives only Excel and bc as examples of
programs/languages that give the unary operator the higher precedence.
Moreover,
http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
points out that in Excel the precedence of unary minus is different
from that in many other languages (including VBA, which is often used
with Excel).

Cheers,

        Berwin
Re: [R] what's wrong with this code?
G'day Jose,

On Fri, 29 Oct 2010 11:25:05 +0200
José Manuel Gavilán Ruiz <g...@us.es> wrote:

> Hello, I want to maximize a likelihood function expressed as an
> integral that can not be symbolically evaluated.  I expose my
> problem in a reduced form.
>
> g <- function(x){
>   integrand <- function(y) {exp(-x^2)*y}
>   g <- integrate(integrand,0,1)
> }
> h <- function(x) log((g(x)))
>
> g is an object of the class function, but g(2) is an "integrate"
> object.  I can print(g(2)) and get a result, but if I try h(2), R
> says it is a non-numeric argument for a mathematical function.

R> print(g(2))
0.00915782 with absolute error < 1.0e-16

Indeed, print(g(2)) gives an output, but it is obviously not just a
number; it is something formatted, presumably based on the class of
the object returned by g(2) and the values that this object contains.
(You may want to read up on R's way(s) of object oriented
programming.)  So what does g() return?

R> str(g(2))
List of 5
 $ value       : num 0.00916
 $ abs.error   : num 1.02e-16
 $ subdivisions: int 1
 $ message     : chr "OK"
 $ call        : language integrate(f = integrand, lower = 0, upper = 1)
 - attr(*, "class")= chr "integrate"

The object is a list with several components, of class "integrate".

> My goal is to maximize h.  What's wrong?

I guess you want to pass only the component "value" to h():

R> g <- function(x){
+    integrand <- function(y) {exp(-x^2)*y}
+    integrate(integrand, 0, 1)$value}
R> g(2)
[1] 0.00915782
R> h(g(2))
[1] -0.693231

HTH.

Cheers,

        Berwin
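[To then actually maximise h numerically, one possibility is optimize(); the search interval below is an arbitrary choice for illustration.]

```r
## Maximise h() numerically once g() returns a plain number; the
## interval c(-5, 5) is an assumption made only for this sketch.
g <- function(x) integrate(function(y) exp(-x^2) * y, 0, 1)$value
h <- function(x) log(g(x))
optimize(h, interval = c(-5, 5), maximum = TRUE)
# maximum near x = 0, where h(0) = log(1/2)
```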
Re: [R] what's wrong with this code?
G'day Jose,

On Fri, 29 Oct 2010 11:25:05 +0200
José Manuel Gavilán Ruiz <g...@us.es> wrote:

> Hello, I want to maximize a likelihood function expressed as an
> integral that can not be symbolically evaluated.  I expose my
> problem in a reduced form.
>
> g <- function(x){
>   integrand <- function(y) {exp(-x^2)*y}
>   g <- integrate(integrand,0,1)
> }
> h <- function(x) log((g(x)))
[...]
> My goal is to maximize h.

Which can be done without R quite trivially.  The result of g(x) is
exp(-x^2)/2.  So h(x) is -x^2 - log(2), which is maximised at x=0.

Cheers,

        Berwin
Re: [R] R version 2-12.0 - running as 32 or as 64 bit?
G'day Dimitri,

On Fri, 29 Oct 2010 16:45:00 -0400
Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote:

> Question: I installed R version 2.12.0 on my Windows 7 (64 bit) PC.
> When I was installing it, it did not ask me anything about 32 vs. 64
> bit.  So, if I run R now - is it running as a 32-bit or a 64-bit?

Well, when I did the same, I got two shortcuts installed on my
desktop, one named "R 2.12.0" and the other named "R x64 2.12.0".  If
I had any doubts about which version of R these shortcuts would start,
then the fourth line of the start-up message would put them to rest.

Missing that message, you can always issue the command

R> .Machine$sizeof.pointer

If the answer is 4, you are running 32-bit R; if the answer is 8, then
you are running 64-bit R.

HTH.

Cheers,

        Berwin
Re: [R] DBLEPR?
G'day John,

On Tue, 16 Nov 2010 14:02:57 -0500
"Prof. John C Nash" <nas...@uottawa.ca> wrote:

> Are the xxxPR routines now deprecated (particularly for 64 bit
> systems) or still OK to use?

They are still OK to use, and I use them occasionally.

> If OK, can anyone point to documentation and examples?

Section 6.5.1 "Printing from Fortran" in the "Writing R Extensions"
manual has documentation (but no examples).  Luckily, Section 5.7
"Fortran I/O", the section that I always look at first, has a link to
Section 6.5.1. :)

Cheers,

        Berwin
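[Since the manual has no examples, here is a minimal sketch of DBLEPR usage; untested, and the routine/argument details are from memory, so check them against the manual.]

```fortran
c     Print a labelled double precision vector from Fortran via R's
c     DBLEPR; passing nchar = -1 lets R use the full label.
      subroutine show(x, n)
      implicit none
      integer n
      double precision x(n)
      call dblepr("x is", -1, x, n)
      end
```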
Re: [R] Namibia becoming NA
G'day Ted,

On Sun, 18 Jul 2010 09:25:09 +0100 (BST)
(Ted Harding) <ted.hard...@manchester.ac.uk> wrote:

> On 18-Jul-10 05:47:03, Suresh Singh wrote:
>> I have a data file in which one of the columns is country code and
>> "NA" is the code for Namibia.  When I read the data file using
>> read.csv, "NA" for Namibia is being treated as null or NA.  How can
>> I prevent this from happening?  I tried the following but it didn't
>> work:
>>
>> input <- read.csv("padded.csv", header = TRUE, as.is = c("code2"))
>>
>> thanks, Suresh
>
> I suppose this was bound to happen, and in my view it represents a
> bit of a mess!  With a test file temp.csv:
>
>   Code,Country
>   DE,Germany
>   IT,Italy
>   NA,Namibia
>   FR,France

Thanks for providing an example.

> [reading it in] leads to exactly the same result.  And I have tried
> every variation I can think of of as.is and colClasses, still with
> exactly the same result!

Did you think of trying some variations of na.strings? ;-)

> IMO, the simplest way of coding missing values in CSV files is to
> have two consecutive commas; not some code (whether NA, 99, 999, -1,
> ...) between them.
>
> Conclusion: If an entry in a data file is intended to become the
> character value "NA", there seems to be no way of reading it in
> directly.  This should not be so: it should be preventable!

It is, through simple use of the na.strings argument:

R> X <- read.csv("temp.csv", na.strings="")
R> X
  Code Country
1   DE Germany
2   IT   Italy
3   NA Namibia
4   FR  France
R> which(is.na(X))
integer(0)

HTH.

Cheers,

        Berwin
Re: [R] Call to rgamma using .C causes R to hang
G'day Steve,

On Tue, 20 Jul 2010 17:20:49 +0930
Steve Pederson <stephen.peder...@adelaide.edu.au> wrote:

I suggest to insert the following two lines (untested, as I usually
don't work on Windows):

  #include <R.h>
  #include <Rmath.h>

  void test1 (double *x, double *result)
  {
      GetRNGstate();
      result[0] = rgamma(*x, 2.0);
      PutRNGstate();
  }

[...]
> I'm running Windows XP.  I followed the instructions at
> http://cran.r-project.org/doc/manuals/R-admin.html#Windows-standalone
> after building R from source.

But you didn't follow the instructions of Section 6.3 in the "Writing
R Extensions" manual. :)
http://cran.r-project.org/doc/manuals/R-exts.html#Random-numbers

HTH.

Cheers,

        Berwin
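[For completeness, a hypothetical R session calling the routine after compilation; the file name "test1.c" and the .C() argument names are assumptions, not from the thread.]

```r
## Assumed workflow: R CMD SHLIB test1.c, then load and call the
## entry point "test1" from the C snippet above.
dyn.load(paste0("test1", .Platform$dynlib.ext))
set.seed(1)
.C("test1", x = 2.0, result = 0.0)$result  # one Gamma(shape = x, scale = 2) draw
```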
Re: [R] minor diagonal in R
On Tue, 7 Sep 2010 06:26:09 -0700 (PDT)
"Peng, C" <cpeng@gmail.com> wrote:

> Is this what you want?
>
> R> A = matrix(1:16, ncol=4)
> R> A
>      [,1] [,2] [,3] [,4]
> [1,]    1    5    9   13
> [2,]    2    6   10   14
> [3,]    3    7   11   15
> [4,]    4    8   12   16
> R> diag(A[1:4, 4:1])
> [1] 13 10  7  4

Or

R> A[cbind(1:4, 4:1)]
[1] 13 10  7  4

one character more to type, but could be more efficient for large A :)

Cheers,

        Berwin
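[The matrix-indexing trick generalises to any square matrix; a small helper of my own, not from the thread.]

```r
## Minor (anti-)diagonal of an n x n matrix via two-column matrix
## indexing: row i is paired with column n - i + 1.
minor.diag <- function(A) {
  n <- nrow(A)
  A[cbind(1:n, n:1)]
}
minor.diag(matrix(1:16, ncol = 4))   # 13 10 7 4
```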
Re: [R] OT: What distribution is this?
G'day Rainer,

On Sat, 25 Sep 2010 16:24:17 +0200
Rainer M Krug <r.m.k...@gmail.com> wrote:

> This is OT, but I need it for my simulation in R.  I have a special
> case for sampling with replacement: instead of sampling once and
> replacing it immediately, I sample n times, and then replace all n
> items.  So:
>
>   N entities
>   x samples with replacement
>   each sample consists of n sub-samples WITHOUT replacement, which
>   are all replaced before the next sample is drawn
>
> My question is: which distribution can I use to describe how often
> each entity of the N has been sampled?

Surely, unless I am missing something, any given entity would have
(marginally) a binomial distribution:

A sub-sample of size n either contains the entity or it does not.  The
probability that a sub-sample contains the entity is a function of N
and n alone, namely n/N.

x sub-samples are drawn (with replacement), so the number of times
that an entity has been sampled is the number of sub-samples in which
it appears.  This is given by the binomial distribution with
parameters x and p, where p is the probability determined in the
previous paragraph.

I guess the fun starts if you try to determine the joint distribution
of two (or more) entities.

HTH.

Cheers,

        Berwin
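[A quick simulation check of the binomial claim; N, n and x below are arbitrary illustrative values, not from the thread.]

```r
## Each of x sub-samples of size n (drawn without replacement from N)
## contains a fixed entity with probability p = n/N, so the count of
## appearances is Binomial(x, p).  Track entity "1" by simulation.
set.seed(123)
N <- 10; n <- 3; x <- 10000
p <- n / N
hits <- replicate(x, 1 %in% sample(N, n))
c(expected = x * p, observed = sum(hits))
```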
Re: [R] OT: What distribution is this?
G'day Rainer,

On Sun, 26 Sep 2010 10:29:08 +0200
Rainer M Krug <r.m.k...@gmail.com> wrote:

> I realized that I am simply interested in the pdf p(y) that y
> *number* of entities (which ones is irrelevant) in N are *not* drawn
> after the sampling process has been completed.  Even simpler (I
> guess), in a first step, I would only need the mean number of
> expected non-drawn entities in N (pMean).  But what about the range
> in between?

You can calculate those probabilities.  Your problem was sounding
vaguely familiar to me yesterday, but I could not put my finger on its
abstract counterpart.  Now, with this clarification, I can. :)

Your set-up is exactly the one of lottery draws.  In each draw, n of N
numbers are drawn without replacement.  So your question is "what is
the probability that I have not seen k of the N numbers after x
draws?".  This question can easily be answered by using some Markov
chain theory.

Let Y_l be the number of entities that you have not seen after the
l-th draw, taking possible values 0, 1, ..., N.  Obviously, Y_0 = N
with probability 1, and Y_1 = N-n with probability 1.

Now, the probability that Y_l, l >= 1, is equal to k is given by:

  0   if k = N-n+1, N-n+2, ..., N

  sum_{i=0,...,n} P[Y_l=k | Y_{l-1}=k+i] P[Y_{l-1}=k+i]   otherwise.

P[Y_l=k | Y_{l-1}=k+i] is the probability that the n entities sampled
in the l-th draw consist of i entities out of the k+i entities that
were not seen by draw l-1, and n-i entities out of the N-k-i entities
that were already seen by draw l-1.  This probability is

  choose(k+i, i) * choose(N-k-i, n-i) / choose(N, n)

Note that this probability is independent of l, i.e. Y_0, Y_1, Y_2,
... form a stationary Markov chain.
Thus, to answer your question, the strategy would be to calculate the
transition matrix once, start with the state vector of Y_0 or of Y_1
(both of which are rather trivial, as mentioned above), and obtain the
state vector of Y_x by repeatedly multiplying the chosen initial state
vector with the transition matrix the appropriate number of times.

The transition matrix is (N+1)x(N+1), so it can be rather large, but
the number of non-zero entries in the transition matrix is at most
(N+1)*(n+1); so, depending on the relative magnitude of n and N,
sparse matrix techniques might be helpful (or using a language such as
C, FORTRAN, ... for the calculations).

HTH.

Cheers,

        Berwin
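[A sketch of this calculation in R, with small illustrative N, n and x and a dense transition matrix for simplicity.]

```r
## Distribution of Y_x (entities never drawn after x draws) via the
## transition matrix described above; N, n and x chosen arbitrarily.
N <- 10; n <- 3; x <- 5
k <- 0:N
## Tm[a, b] = P[Y_l = k[a] | Y_{l-1} = k[b]]; with i = jold - knew this
## is choose(jold, i) * choose(N - jold, n - i) / choose(N, n), and
## choose() conveniently returns 0 when a required subset is impossible.
Tm <- outer(k, k, function(knew, jold) {
  i <- jold - knew
  ifelse(i >= 0 & i <= n,
         choose(jold, i) * choose(N - jold, n - i) / choose(N, n),
         0)
})
state <- as.numeric(k == N)                 # Y_0 = N with probability 1
for (l in seq_len(x)) state <- as.vector(Tm %*% state)
sum(k * state)   # E[Y_x]; sanity check: equals N * (1 - n/N)^x
```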
Re: [R] R Spracheinstellung
G'day Tobias,

On Wed, 29 Sep 2010 14:01:10 +0000
Keller Tobias (kelleto0) <kelle...@students.zhaw.ch> wrote:

> [Translated from German:] We need R for our statistics course.  I
> downloaded the program, selected German and installed it.  The
> problem is that, even after reinstalling three times, the program
> always comes up in English.  Can you help me?

R for Windows FAQ 3.2:
http://cran.ms.unimelb.edu.au/bin/windows/base/rw-FAQ.html#I-want-R-in-English_0021

HTH.

Cheers,

        Berwin
Re: [R] how to make list() return a list of *named* elements
On Mon, 4 Oct 2010 19:45:23 +0800
Berwin A Turlach <ber...@maths.uwa.edu.au> wrote:

> Mmh,
>
> R> my.return <- function (vector.of.variable.names) {
> +    sapply(vector.of.variable.names, function(x) list(get(x)))
> +  }

make that:

R> my.return <- function (vector.of.variable.names) {
+    sapply(vector.of.variable.names, function(x) get(x))
+  }

Cheers,

        Berwin
Re: [R] how to make list() return a list of *named* elements
G'day Hans,

On Mon, 4 Oct 2010 11:28:15 +0200
Hans Ekbrand <h...@sociologi.cjb.net> wrote:

> On Thu, Sep 30, 2010 at 09:34:26AM -0300, Henrique Dallazuanna
> wrote:
>> You should try:
>>
>>   eapply(.GlobalEnv, I)[c('b', 'my.c')]
>
> Great!
>
> b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
> my.c <- sample.int(round(2*mean(b)), 4)
> my.return <- function (vector.of.variable.names) {
>   eapply(.GlobalEnv, I)[vector.of.variable.names]
> }

Well, if you are willing to create a vector with the variable names,
then simpler solutions should be possible, i.e. solutions that only
operate on the objects of interest and not on all objects in the
global environment (which could be a lot, depending on your style).
For example:

R> my.return <- function (vector.of.variable.names) {
+    sapply(vector.of.variable.names, function(x) list(get(x)))
+  }
R> str(my.return(c("b", "my.c")))
List of 2
 $ b   : num [1:5] 22.4 12.2 10.9 8.5 9.2
 $ my.c: int [1:4] 7 5 23 4

Cheers,

        Berwin
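[Another alternative, not mentioned in the thread: mget() fetches several variables by name in one call.]

```r
## mget() returns a named list of the requested variables only, so it
## avoids touching the rest of the global environment.
b    <- c(22.4, 12.2, 10.9, 8.5, 9.2)
my.c <- c(7L, 5L, 23L, 4L)
str(mget(c("b", "my.c"), envir = environment()))
```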
Re: [R] exponentiate elements of a whole matrix
On Wed, 13 Oct 2010 11:51:39 +0100
Maas James Dr (MED) <j.m...@uea.ac.uk> wrote:

> I've tried hard to find a way to exponentiate each element of a
> whole matrix such that if I start with A
>
>   A = [ 2  3
>         2  4 ]
>
> I can get back B
>
>   B = [ 7.38  20.08
>         7.38  54.60 ]
>
> I've tried B <- exp(A) but no luck.

What have you tried exactly?  And with which version?  This should
work with all R versions that I am familiar with, e.g.:

R> A <- matrix(c(2,2,3,4), 2, 2)
R> A
     [,1] [,2]
[1,]    2    3
[2,]    2    4
R> B <- exp(A)
R> B
         [,1]     [,2]
[1,] 7.389056 20.08554
[2,] 7.389056 54.59815

Cheers,

        Berwin
Re: [R] declaring GPL license
G'day Marc,

On Thu, 14 Oct 2010 10:46:39 -0500
Marc Schwartz <marc_schwa...@me.com> wrote:

> If you want (and you should), create and include a file called
> COPYING in the 'inst' folder in the package, so that it gets copied
> to the main package directory upon installation. [...]

But that would be against the wishes of R core; from the "Writing R
Extensions" manual:

  Whereas you should feel free to include a license file in your
  source distribution, please do not arrange to @emph{install} yet
  another copy of the @acronym{GNU} @file{COPYING} or
  @file{COPYING.LIB} files but refer to the copies on
  @url{http://www.r-project.org/Licenses/} and included in the @R{}
  distribution (in directory @file{share/licenses}).

:)

Cheers,

        Berwin
Re: [R] declaring GPL license
G'day Stacey,

On Thu, 14 Oct 2010 12:36:20 -0400
Stacey Wood <sjw...@ncsu.edu> wrote:

> Thanks everyone.  Now I know that I should not include another copy
> of the license, but where should I refer to the copies on
> http://www.r-project.org/Licenses/?  In the DESCRIPTION file?

I used to mention that location in the "License:" line of the
DESCRIPTION file, after "GPL (>= 2)".  But some versions ago, that
line got standardised and now it is no longer possible to put
additional text onto that line.  I guess you could just add a line
"LicenceLocation:".  As far as I understand the documentation, you are
free to add additional lines, beyond those described in "Writing R
Extensions", to the DESCRIPTION file.  Some entries described in the
manual are optional, and for those that are not, some restrictions may
apply that have to be followed.

Personally, I decided not to put any such reference into the
DESCRIPTION file of my packages, as I guess that people who worry
about the licence will either look into the source package (where I
have a copy) or know how to find one. :)

HTH.

Cheers,

        Berwin
Re: [R] Constrained Optimization
On Wed, 4 Nov 2009 14:48:08 -0800 (PST)
s t <thamp...@yahoo.com> wrote:

> I'm trying to do the following constrained optimization example.
>
>   Maximize  x1*(1-x1) + x2*(1-x2) + x3*(1-x3)
>
>   s.t.  x1 + x2 + x3 = 1
>         0 <= x1 <= 1
>         0 <= x2 <= 1
>         0 <= x3 <= 1
>
> which are the constraints.  I'm expecting the answer x1=x2=x3 = 1/3.

This is a quadratic programming problem (minimising/maximising a
quadratic function under linear equality and inequality constraints).
There are several R packages that solve such problems, e.g.:

R> library(quadprog)
R> Dmat <- diag(3)
R> dvec <- rep(1,3)/3
R> Amat <- cbind(rep(1,3), diag(3), -diag(3))
R> bvec <- c(1, rep(0,3), rep(-1,3))
R> solve.QP(Dmat, dvec, Amat, bvec, meq=1)
$solution
[1] 0.333 0.333 0.333

$value
[1] -0.167

$unconstrained.solution
[1] 0.333 0.333 0.333

$iterations
[1] 2 0

$iact
[1] 1

You may want to consult:
http://cran.r-project.org/web/views/Optimization.html

HTH.

Cheers,

        Berwin
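[A side note on the encoding, my reading rather than something spelled out in the post: solve.QP() minimises (1/2) b'Db - d'b, so maximising sum(x*(1-x)) = 1'x - x'x maps exactly onto D = 2I and d = 1.]

```r
## Same QP with the objective mapped exactly:
## maximise 1'x - x'x  <=>  minimise (1/2) x'(2I)x - 1'x.
library(quadprog)
Dmat <- 2 * diag(3)
dvec <- rep(1, 3)
Amat <- cbind(rep(1, 3), diag(3), -diag(3))  # equality first, then bounds
bvec <- c(1, rep(0, 3), rep(-1, 3))
solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution   # 1/3 1/3 1/3
```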
Re: [R] Unexpected behaviour for as.date()
G'day Isabella,

On Tue, 10 Nov 2009 18:40:11 -0800
Isabella Ghement <isabe...@ghement.ca> wrote:

> I am trying to use the function as.date() from the dates package

As far as I can tell, there is no package called "dates"; did you mean
the package "date"?

> in R 2.10.0 to convert a character date to a Julian date, as
> follows:
>
> as.date("02-MAY-01", order="mdy")  # convert May 2, 2001
> [1] 2May1

Are you sure that you are doing what your comments imply?  Try:

R> library(date)
R> as.date("02-MAY-01", order="mdy")
[1] 2May1
R> as.date("02-MAY-2001", order="mdy")
[1] 2May2001
R> as.numeric(as.date("02-MAY-01", order="mdy"))
[1] -21428
R> as.numeric(as.date("02-MAY-2001", order="mdy"))
[1] 15097

> However, when trying to convert a character date from the year 2000
> to a Julian date, I get an NA instead of the desired Julian date:
>
> as.date("02-MAY-00", order="mdy")  # convert May 2, 2000
> [1] NA
>
> Not quite sure why R is unable to handle this type of date (perhaps
> it thinks it comes from the year 1900?!).

My guess: it thinks it comes from the year 0.  Not sure why it cannot
handle such dates.  But then, as far as I know, there is actually some
discussion about whether the year 0 exists or whether we went straight
from 1BC to 1AD.

> Is there a way to force R to convert character dates from the year
> 2000 into Julian dates?

Presumably you will need something like:

R> as.date(sub("-00", "-2000", "02-MAY-00"))
[1] 2May2000

HTH.

Cheers,

        Berwin
Re: [R] Unexpected behaviour for as.date()
G'day Isabella, On Tue, 10 Nov 2009 20:11:31 -0800 Isabella Ghement isabe...@ghement.ca wrote: I tried the solution you suggested and find that I am having problems getting R to extract the year from an object created by as.date():

Perhaps it would be best to first clarify what you really need. :) If you have to extract the year, why go via as.date()? Why not just:

R> strsplit("02-MAY-00", "-")[[1]][3]
[1] "00"
R> as.numeric(strsplit("02-MAY-00", "-")[[1]][3])
[1] 0

Or if you have a vector of character strings, each a date in that format:

R> x <- c("02-MAY-00", "02-MAY-01")
R> sapply(x, function(x) as.numeric(strsplit(x, "-")[[1]][3]))
02-MAY-00 02-MAY-01
        0         1

HTH.

Cheers,

        Berwin
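[Editorial note: since the year always occupies the last two characters of the fixed-width "DD-MON-YY" format used here, substr() avoids splitting altogether; a small sketch, assuming the format never varies.]

```r
## Sketch: extract the two-digit year from fixed-width "DD-MON-YY" strings
## with substr() instead of strsplit(); assumes every string has this shape.
x  <- c("02-MAY-00", "02-MAY-01")
yr <- as.numeric(substr(x, 8, 9))
yr    # 0 1
```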
Re: [R] R and HDF5 Question
G'day Scott, On Fri, 13 Nov 2009 09:52:43 -0700 Scott MacDonald scott.p.macdon...@gmail.com wrote: I am trying to load an hdf5 file into R and running into some problems.

It's a while since I used hdf5 files and that package in R, but:

This builds fine. The library seems to load without issue, but no data is returned when I try to load a file:

library(hdf5)
hdf5load("test.h5")
NULL

Is NULL the return value of the hdf5load() command or are you typing it on the command line? Anyway, .hdf5 files can contain several objects, just as R's .rda files. load() will load an .rda file and put all objects in that file into the workspace. Likewise, hdf5load() loads an hdf5 file and puts all objects in that file into the workspace.

Yet,

osx:data scott$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   DATASET "dset" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 31 ) / ( 31 ) }
      DATA {
      (0): 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
      (14): 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
      (22): 4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
      (28): 268435456, 536870912, 1073741824
      }
   }
}
}

Any thoughts?

Did you try an ls() after the hdf5load() command? If the hdf5load() command was successful, an ls() should show you that an object with name dset is now in your workspace, if I read the output above correctly.

HTH.

Cheers,

        Berwin
Re: [R] Replace positive with log and zero or negative with 0
G'day Kevin, On Sun, 15 Nov 2009 7:18:18 -0800 rkevinbur...@charter.net wrote: This is a very simple question but I couldn't form a site search query that would return a reasonable result set. Say I have a vector:

x <- c(0,2,3,4,5,-1,-2)

I want to replace all of the values in 'x' with the log of x. Naturally this runs into problems since some of the values are negative or zero. So how can I replace all of the positive elements of x with log(x) and the rest with zero?

If you do not mind a warning message:

R> x <- c(0,2,3,4,5,-1,-2)
R> x <- ifelse(x <= 0, 0, log(x))
Warning message:
In log(x) : NaNs produced
R> x
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 0.0000000 0.0000000

If you do mind, then:

R> x <- c(0,2,3,4,5,-1,-2)
R> ind <- x > 0
R> x[!ind] <- 0
R> x[ind] <- log(x[ind])
R> x
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 0.0000000 0.0000000

HTH.

Cheers,

        Berwin
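[Editorial note: if this operation is needed repeatedly, it can be wrapped in a small helper; suppressWarnings() hides the harmless NaN warning that log() produces on the negative entries. The helper name is made up for illustration.]

```r
## Sketch: helper mapping positive entries to log() and everything else to 0;
## suppressWarnings() silences the NaN warning from log() on negative values.
poslog <- function(x) suppressWarnings(ifelse(x > 0, log(x), 0))
poslog(c(0, 2, 3, 4, 5, -1, -2))
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 0.0000000 0.0000000
```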
Re: [R] Gelman 2006 half-Cauchy distribution
G'day Chris, On Fri, 28 May 2010 08:29:30 -0500 Christopher David Desjardins cddesjard...@gmail.com wrote: Hi, I am trying to recreate the right graph on page 524 of Gelman's 2006 paper "Prior distributions for variance parameters in hierarchical models" in Bayesian Analysis, 3, 515-533. I am only interested, however, in recreating the portion of the graph for the overlain prior density for the half-Cauchy with scale 25 and not the posterior distribution. However, when I try:

curve(dcauchy, from=0, to=200, location=0, scale=25)

Which version of R do you use? This command creates 12 warnings under R 2.11.0 on my linux machine. Reading up on the help page of curve() would make you realise that you cannot pass the location and scale parameters to dcauchy() in the manner you try. I guess you want:

R> prior <- function(x) 2*dcauchy(x, location=0, scale=25)
R> curve(prior, from=0, to=200)

or, more compactly,

R> curve(2*dcauchy(x, location=0, scale=25), from=0, to=200)

Cheers,

        Berwin
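[Editorial note: the factor 2 folds the Cauchy density onto [0, Inf), giving a proper half-Cauchy density; a quick numerical check with integrate(), added for illustration.]

```r
## Sketch: verify that the half-Cauchy density integrates to 1 on [0, Inf).
prior <- function(x) 2 * dcauchy(x, location = 0, scale = 25)
integrate(prior, lower = 0, upper = Inf)   # 1 with small absolute error
```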
Re: [R] rgl installation failure
G'day, On Sat, 5 Jun 2010 12:51:08 +0530 khush bioinfo.kh...@gmail.com wrote: I am trying to install rgl package under R and getting some errors which are below.

install.packages("rgl")
Warning in install.packages("rgl") :
  argument 'lib' is missing: using '/usr/lib/R/library'
trying URL 'http://cran.csie.ntu.edu.tw/src/contrib/rgl_0.91.tar.gz'
Content type 'application/x-gzip' length 1677498 bytes (1.6 Mb)
opened URL
==
downloaded 1.6 Mb

* installing *source* package 'rgl' ...
checking for gcc... gcc -m32 -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -m32 -std=gnu99 accepts -g... yes
checking for gcc -m32 -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -m32 -std=gnu99 -E
checking for gcc... (cached) gcc -m32 -std=gnu99
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc -m32 -std=gnu99 accepts -g... (cached) yes
checking for gcc -m32 -std=gnu99 option to accept ISO C89... (cached) none needed
checking for libpng-config... no
checking libpng...
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking png.h usability... no
checking png.h presence... no
checking for png.h... no
checking for png_read_update_info in -lpng... no

You should heed indications like the above, i.e. that items checked for are not found. [...]
In file included from pixmap.cpp:14:
pngpixmap.h:3:17: error: png.h: No such file or directory

And this compiler error seems to be directly linked to the issues identified by configure; all further errors come from the fact that png.h cannot be found. You seem to be using some flavour of Unix, presumably GNU/Linux, but from your posting it is impossible to say which one. On a Debian-based system (e.g. Kubuntu 9.10, which I am using), I would issue the command (assuming that apt-file is installed and initialised):

ber...@bossiaea:~$ apt-file find include/png.h
libpng12-dev: /usr/include/png.h

and then check whether the libpng12-dev package is installed. If not, I would install it and retry. Perhaps that would be all that is needed to compile rgl, but there might be other packages missing. If you have a system based on RedHat or SuSE, you will have to figure out how to use the package management tools on those systems to find out which packages have to be installed additionally so that you are able to compile rgl.

HTH.

Cheers,

        Berwin
Re: [R] forcing a zero level in contr.sum
G'day Stephen, On Wed, 7 Jul 2010 16:41:18 -0400 Bond, Stephen stephen.b...@cibc.com wrote: Please, do not post if you do not know the answer. People will see this has answers and skip. I tried with

mat1=contrasts(fixw$snconv)
mat1=mat1[,-2]
summary(frm2sum <- glm(resp.frm ~ C(snconv,contr=mat1)+mprime+mshape, data=fixw, family=quasibinomial))

the unwanted level is still there. Unbelievable.

model.matrix() instead of summary() would reveal that the estimate corresponding to the unwanted level is probably estimating something quite different than what you think it estimates. ?C and look at the how.many argument; presumably you want

C(snconv, contr=mat1, how.many=NCOL(mat1))

To see what is going on, consider:

R> ( tmp <- gl(4,3) )
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
Levels: 1 2 3 4
R> options(contrasts=c("contr.sum", "contr.poly"))
R> ( tt <- contrasts(tmp) )
  [,1] [,2] [,3]
1    1    0    0
2    0    1    0
3    0    0    1
4   -1   -1   -1
R> tt <- tt[,-2]
R> contrasts(tmp, 2) <- tt
R> tmp
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
attr(,"contrasts")
  [,1] [,2]
1    1    0
2    0    0
3    0    1
4   -1   -1
Levels: 1 2 3 4
R> contrasts(tmp) <- tt
R> tmp
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
attr(,"contrasts")
  [,1] [,2]       [,3]
1    1    0  0.2886751
2    0    0 -0.8660254
3    0    1  0.2886751
4   -1   -1  0.2886751
Levels: 1 2 3 4

HTH.

Cheers,

        Berwin
Re: [R] advice/opinion on <- vs = in teaching R
G'day all, On Fri, 15 Jan 2010 09:08:44 -0000 (GMT) (Ted Harding) ted.hard...@manchester.ac.uk wrote: On 15-Jan-10 08:14:04, Barry Rowlingson wrote: [...] I would regard modifying a variable within the parameters of a function call as pretty tasteless. What does: foo(x<-2,x) or foo(x,x<-3) do that couldn't be done clearer with two lines of code?

Depending on the definition of foo(), more than two lines might be necessary to clarify what this code is actually supposed to do. :)

Remember: 'eschew obfuscation'. Absolutely. Tasteless or not, the language allows it to be done; and therefore discussion of distinctions between ways of doing it is relevant to Erin's question!

And a full discussion of these examples would include a warning about the side effects of lazy evaluation:

R> rm(list=ls(all=TRUE))
R> foo <- function(x, y){
+     if(x > 0) x else x + y }
R> foo(2, x <- 10)
[1] 2
R> x
Error: object 'x' not found
R> foo(-2, x <- 10)
[1] 8
R> x
[1] 10

Having said this, I often use plot(fm <- lm(y ~ . , somedata)). And, yes, students have complained that the code I gave them was not working, that they got a strange error message and they promised that they had typed exactly what I had written on the lab sheet. On checking, they had, of course, typed plot(fm = lm(y ~ . , somedata)).

Cheers,

        Berwin
Re: [R] Is it solve.QP or is it me?
G'day Talbot, regarding the subject line: perhaps neither, it may be your OS, chip or maths library. :) On my Intel Core2 Duo machine running under linux all your examples work without error messages. What kind of machine are you using?

On Fri, 21 Sep 2007 12:38:05 -0400 Talbot Katz [EMAIL PROTECTED] wrote: [..] I was wondering whether anyone has any tricks to share for mitigating these kinds of problems and still generating feasible solutions.

I believe that one of the recommendations in the numerical literature is to try to keep the norms of the columns of the matrix A on similar scales. Try to divide the first two columns of A (and the first two entries in the vector b) by the square root of n.

Hope that helps.

Cheers,

        Berwin

========================= Full address ===========================
Berwin A Turlach                        Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability    +65 6515 6650 (self)
Faculty of Science                      FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7   e-mail: [EMAIL PROTECTED]
Singapore 117546            http://www.stat.nus.edu.sg/~statba
Re: [R] R-squared value for linear regression passing through origin using lm()
G'day Ralf, On Fri, 19 Oct 2007 09:51:37 +0200 Ralf Goertz [EMAIL PROTECTED] wrote: Thanks to Thomas Lumley there is another convincing example. But still I've got a problem with it:

x <- c(2,3,4); y <- c(2,3,3)

[...] That's okay, but neither [...] nor [...] give the result of summary(lm(y~x+0)), which is 0.9796.

Why should either of those formulae yield the output of summary(lm(y~x+0))? The R-squared output of that command is documented in help(summary.lm):

r.squared: R^2, the 'fraction of variance explained by the model',
           R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2),
           where y* is the mean of y[i] if there is an intercept and
           zero otherwise.

And, indeed:

R> 1 - sum(residuals(lm(y~x+0))^2) / sum((y-0)^2)
[1] 0.9796238

confirms this. Note: if you do not have an intercept in your model, the residuals do not have to add to zero; and, typically, they will not. Hence, var(residuals(lm(y~x+0))) does not give you the residual sum of squares.

In order to save the role of R^2 as a goodness-of-fit indicator

R^2 is no goodness-of-fit indicator, neither in models with intercept nor in models without intercept. So I do not see how you can save its role as a goodness-of-fit indicator. :) Since you are posting from a .de domain, I assume you will understand the following quote from Tutz (2000), Die Analyse kategorialer Daten, page 18 (in translation): "R^2 does *not* measure the goodness of fit of the linear model; it says nothing about whether the linear specification is true or false, but only about whether individual observations can be predicted from the linear specification. R^2 is largely determined by the design, i.e. by the values that x takes (cf. Kockelkorn (1998))." The latter reference is: Kockelkorn, U. (1998). Lineare Modelle. Skript, TU Berlin.

in zero intercept models one could use the same formula as in models with a constant. I mean, if R^2 is the proportion of variance explained by the model we should use the a priori variance of y[i].
R> 1 - var(residuals(lm(y~x+0))) / var(y)
[1] 0.3567182

But I assume that this has probably been discussed at length somewhere more appropriate than r-help.

I am sure about that, but it was also discussed here on r-help (long ago). The problem is that this compares two models that are not nested in each other, which is a quite controversial thing to do; some might even go so far as saying that it makes no sense at all. The other problem with this approach is illustrated by my example:

R> set.seed(20070807)
R> x <- runif(100)*2 + 10
R> y <- 4 + rnorm(x, sd=1)
R> 1 - var(residuals(lm(y~x+0))) / var(y)
[1] -0.04848273

How do you explain that a quantity that is called R-squared, implying that it is the square of something, hence always non-negative, can become negative?

Cheers,

        Berwin
Re: [R] indices of rows containing one or more elements > 0
G'day Stanley, On Wed, 13 Feb 2008 10:40:09 +0800 Ng Stanley [EMAIL PROTECTED] wrote: Hi, Given

test <- matrix(c(0,2,0,1,3,5), 3,2)

test[test>0]
[1] 2 1 3 5
These are values > 0

which(test>0)
[1] 2 4 5 6
These are array indices of those values > 0

which(apply(test>0, 1, all))
[1] 2
This gives the row whose elements are all > 0

I can't seem to get indices of rows containing one or more elements > 0

which(apply(test>0, 1, any)) instead of which(apply(test>0, 1, all)) ??

HTH.

Cheers,

        Berwin
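[Editorial note: for larger matrices, rowSums() on the logical matrix avoids the row-by-row apply() loop; a sketch with the same data.]

```r
## Sketch: rowSums() counts TRUEs per row, so a positive count means the row
## contains at least one element > 0 (equivalent to apply(test > 0, 1, any)).
test <- matrix(c(0, 2, 0, 1, 3, 5), 3, 2)
which(rowSums(test > 0) > 0)   # here every row has a positive entry: 1 2 3
```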
Re: [R] stumped by eval
G'day Peter, On Wed, 13 Feb 2008 08:03:07 +0100 Peter Dalgaard [EMAIL PROTECTED] wrote: Ross Boylan wrote: In the following example, the inner evaluation pulls in the global value of subset (a function) rather than the one I thought I was passing in (a vector). Can anyone help me understand what's going on, and what I need to do to fix the problem? [...]

The point is that subset (and offset) arguments are subject to the same evaluation rules as the terms inside the formula: First look in data, then in the environment of the formula, which in this case is the global environment.

Perhaps I have a [senior|blonde]-day today, but this does not seem to be the full explanation of what is going on to me. According to this explanation the following should not work:

R> lm(Reading~0+Spec+Reader, netto, subset=c(1))

Call:
lm(formula = Reading ~ 0 + Spec + Reader, data = netto, subset = c(1))

Coefficients:
  Spec  Reader
     1      NA

since the value passed to subset is not part of data and not in the global environment. But, obviously, it works. OTOH, if we change f0 to

f0 <- function(formula, data, subset, na.action) {
    lm(formula, data, subset=subset, na.action=na.action)
}

then we get the same behaviour as with Ross's use of f1 inside of f0:

R> t3 <- f0(Reading~0+Spec+Reader, netto, c(1))
Error in xj[i] : invalid subscript type 'closure'

Moreover, with the original definition of f0:

R> f0
function(formula, data, subset, na.action) {
    f1(formula, data, subset, na.action)
}
R> f1(Reading~0+Spec+Reader, netto, subset= Spec==1)
  Reading Spec Reader
1       1    1      1
R> f0(Reading~0+Spec+Reader, netto, subset= Spec==1)
Error in xj[i] : invalid subscript type 'closure'

Given your explanation, I would have expected this to work. Reading up on 'subset' in ?model.frame also does not seem to shed light on what is going on. Remaining confused.
Cheers,

        Berwin
Re: [R] Quadratic Programming
G'day Jorge, On Fri, 15 Feb 2008 17:51:16 -0500 Jorge Aseff [EMAIL PROTECTED] wrote: I am using solve.QP (from quadprog) to solve a standard quadratic programming problem: min_w -0.5*w'Qw st ... I would like solve.QP to do two things: 1) to start the optimization from a user-supplied initial condition, i.e., from a vector w_0 that satisfies the constraints,

Not possible. solve.QP is based on the Goldfarb-Idnani algorithm, which first calculates the unconstrained solution of the problem and then checks for violated constraints; if such constraints exist, they are iteratively enforced until all constraints are satisfied. If you want to specify starting values (or have warm starts), then you would need to use other kinds of algorithms.

and 2) to return the values of the lagrange multipliers associated with the constraints.

See: https://stat.ethz.ch/pipermail/r-devel/2007-December/047636.html

Hope this helps.

Best wishes,

        Berwin
Re: [R] Questions about EM algorithm
G'day Sean, On Fri, 15 Feb 2008 09:12:22 +0800 Hung-Hsuan Chen (Sean) [EMAIL PROTECTED] wrote: Assume I have 3 distributions, x1, x2, and x3.

x1 ~ normal(mu1, sd1)
x2 ~ normal(mu2, sd2)
x3 ~ normal(mu3, sd3)
y1 = x1 + x2
y2 = x1 + x3

Now the data I can observe is only y1 and y2. It is easy to estimate (mu1+mu2), (mu1+mu3), (sd1^2+sd2^2) and (sd1^2+sd3^2) by EM algorithm since

Isn't it a bit of an overkill to use an EM algorithm here? There are explicit formulae for the estimators (namely the sample average and the sample variance) of those quantities. O.k., these formulae may not yield the MLE, but it should be very easy to correct for that.

y1 ~ normal(mu1+mu2, sqrt(sd1^2+sd2^2)) and y2 ~ normal(mu1+mu3, sqrt(sd1^2+sd3^2))

However, I want to estimate mu1, mu2, mu3, sd1, sd2, and sd3. Is it possible to do so by EM algorithm (or any other estimation methods like gibbs sampling or MLE)?

EM algorithms are a way of calculating MLEs by framing the problem (explicitly or implicitly) in a missing data context. So EM algorithm and MLE are not different methods; the former is a way of calculating the latter. Of course, the latter can also be calculated by directly maximising the (log-)likelihood function.

You did not say so explicitly, but I guess you are assuming that x1, x2 and x3 are independent, are you? At least under this assumption it is easy to deduce that the distributions of y1 and y2 are as you stated. If you do not assume independence of x1, x2, x3, what other assumptions do you make to arrive at these distributions for y1 and y2?

Under the assumption of independence of x1, x2, x3, one would also have that Cov(y1,y2) = sd1^2. Together with the fact that Var(y1) = sd1^2+sd2^2 and Var(y2) = sd1^2+sd3^2, this makes the three standard deviations identifiable, and you can readily estimate them.
Actually, if x1, x2 and x3 are independent, then they are jointly normal, hence y1 and y2 are jointly normal, whence it would be easy to write down the likelihood of the parameters given y1 and y2 and find the MLEs for sd1, sd2 and sd3. For mu1, mu2 and mu3 you have an identifiability problem: the two triples (mu1, mu2, mu3) and (mu1+c, mu2-c, mu3-c) (where c is any fixed number) would yield exactly the same likelihood value. Hence, these three parameters are not identifiable. You would have to fix one of them arbitrarily, say mu1=0.

Best wishes,

        Berwin
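[Editorial note: the moment identities above can be checked by simulation; a sketch with arbitrarily chosen parameter values.]

```r
## Sketch: recover sd1, sd2, sd3 from the observable pairs (y1, y2) using
## Cov(y1,y2) = sd1^2, Var(y1) = sd1^2 + sd2^2, Var(y2) = sd1^2 + sd3^2.
set.seed(1)
n  <- 1e5
x1 <- rnorm(n, mean = 1, sd = 2)
x2 <- rnorm(n, mean = 2, sd = 3)
x3 <- rnorm(n, mean = 3, sd = 4)
y1 <- x1 + x2
y2 <- x1 + x3
sd1.hat <- sqrt(cov(y1, y2))
sd2.hat <- sqrt(var(y1) - cov(y1, y2))
sd3.hat <- sqrt(var(y2) - cov(y1, y2))
round(c(sd1.hat, sd2.hat, sd3.hat), 1)   # close to (2, 3, 4)
```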
Re: [R] debugging a try() statement
G'day Juliet, On Tue, 19 Feb 2008 21:35:11 -0500 Juliet Hannah [EMAIL PROTECTED] wrote: I implemented a try() statement that looks like:

function(index) {
  reduced_model <- try(glm.fit(X4, n, family=poisson(link="log")))
  full_model <- try(glm.fit(X5, n, family=poisson(link="log")))
  if (inherits(reduced_model,"try-error") || inherits(full_model,"try-error")) return(NA)
  else {
    p <- pchisq(reduced_model$deviance - full_model$deviance,
                reduced_model$df.residual - full_model$df.residual,
                lower.tail=FALSE)
    return(p)
  }
}

After an error occurs NA is returned. But after this occurs, all values returned after this remain NA even though this error should occur only 1/500 to 1/1000 times. [...] Is there anything obviously incorrect with my function above?

Yes, namely: 1) you do not assign it to any object, so how do you call it? 2) it has a formal argument which is not used anywhere; I guess that would be considered bad programming style in any programming language. 3) The way your code is formatted, the return(NA) is on the same line as the if-clause. Hence, at the end of that statement R's parser has read a complete statement, which is then executed. Whence, when the parser comes across the else, a syntax error should be produced.

None of these would explain the problem that you mention, but point 3) raises the question why this code actually works. From what is shown, there is (at least to me) no obvious explanation for the behaviour that you see.

If not, I will post a full example to illustrate my question.

Well, please not. :) It is pretty obvious that the code above is derived by cut & paste from a larger body of code and does not run on its own. So there is no guarantee that this piece of the code is the one where the problem lies, and you cannot seriously expect people to debug such code for you (and for that reason your previous posting probably did not get an answer). Likewise, you should not expect people to reverse-engineer and debug a large body of code for you.
You should do what the footer of e-mails to r-help requests, namely: PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Typically, while trying to produce such a commented, minimal, self-contained, reproducible example of the problematic code, one finds the bug.

Cheers,

        Berwin
Re: [R] Constrained regression
G'day Carlos, On Mon, Mar 3, 2008 at 11:52 AM Carlos Alzola [EMAIL PROTECTED] wrote: I am trying to get information on how to fit a linear regression with constrained parameters. Specifically, I have 8 predictors, their coefficients should all be non-negative and add up to 1. I understand it is a quadratic programming problem but I have no experience in the subject. I searched the archives but the results were inconclusive. Could someone provide suggestions and references to the literature, please?

A suggestion:

R> library(MASS)     ## to access the Boston data
R> designmat <- model.matrix(medv~., data=Boston)
R> Dmat <- crossprod(designmat, designmat)
R> dvec <- crossprod(designmat, Boston$medv)
R> Amat <- cbind(1, diag(NROW(Dmat)))
R> bvec <- c(1, rep(0, NROW(Dmat)))
R> meq <- 1
R> library(quadprog)
R> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)

The solution seems to contain values that are, for all practical purposes, actually zero:

R> res$solution
 [1]  4.535581e-16  2.661931e-18  1.016929e-01 -1.850699e-17
 [5]  1.458219e-16 -3.892418e-15  8.544939e-01  0.000000e+00
 [9]  2.410742e-16  2.905722e-17 -5.700600e-20 -4.227261e-17
[13]  4.381328e-02 -3.723065e-18

So perhaps better:

R> zapsmall(res$solution)
 [1] 0.0000000 0.0000000 0.1016929 0.0000000 0.0000000 0.0000000
 [7] 0.8544939 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[13] 0.0438133 0.0000000

So the estimates seem to follow the constraints. And the unconstrained solution is:

R> res$unconstrainted.solution
 [1]  3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02
 [5]  2.686734e+00 -1.776661e+01  3.809865e+00  6.922246e-04
 [9] -1.475567e+00  3.060495e-01 -1.233459e-02 -9.527472e-01
[13]  9.311683e-03 -5.247584e-01

which seems to coincide with what lm() thinks it should be:

R> coef(lm(medv~., Boston))
  (Intercept)          crim            zn         indus          chas
 3.645949e+01 -1.080114e-01  4.642046e-02  2.055863e-02  2.686734e+00
          nox            rm           age           dis           rad
-1.776661e+01  3.809865e+00  6.922246e-04 -1.475567e+00  3.060495e-01
          tax       ptratio         black         lstat
-1.233459e-02 -9.527472e-01  9.311683e-03 -5.247584e-01

So there seem to be no numeric problems.
Otherwise we could have done something else (e.g. calculate the QR factorization of the design matrix, say X, and give the R factor to solve.QP, instead of calculating X'X and giving that one to solve.QP). If the intercept is not supposed to be included in the set of constrained estimates, then something like the following can be done:

R> Amat[1,] <- 0
R> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
R> zapsmall(res$solution)
 [1] 6.073972 0.000000 0.109124 0.000000 0.000000 0.000000 0.863421
 [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.027455 0.000000

Of course, since after the first command in that last block the second column of Amat contains only zeros

R> Amat[,2]
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0

we might as well have removed it (and the corresponding entry in bvec)

R> Amat <- Amat[, -2]
R> bvec <- bvec[-2]

before calling solve.QP(). Note, the Boston data set was only used to illustrate how to fit such models; I do not want to imply that these models are sensible for these data. :-)

Hope this helps.

Cheers,

        Berwin
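[Editorial note: the QR alternative mentioned in passing above can be sketched as follows. This is my reading of the quadprog documentation, which says that with factorized=TRUE the Dmat argument should hold R^(-1), where D = R'R; treat it as a sketch rather than a verified session.]

```r
## Sketch: pass a factor of D = X'X to solve.QP() instead of D itself.
## With factorized=TRUE, quadprog expects Dmat to be R^{-1}, where
## D = t(R) %*% R; here R is the triangular factor from qr().
library(MASS)        # Boston data
library(quadprog)
X    <- model.matrix(medv ~ ., data = Boston)
Rf   <- qr.R(qr(X))                      # crossprod(X) equals t(Rf) %*% Rf
Rinv <- backsolve(Rf, diag(ncol(X)))     # triangular inverse of Rf
dvec <- crossprod(X, Boston$medv)
Amat <- cbind(1, diag(ncol(X)))
bvec <- c(1, rep(0, ncol(X)))
res  <- solve.QP(Rinv, dvec, Amat, bvec, meq = 1, factorized = TRUE)
zapsmall(res$solution)                   # should match the earlier solution
```

This avoids forming X'X explicitly, which is better conditioned numerically when the columns of X are close to collinear.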
Re: [R] Strange paste, string or package problem?
G'day Thomas, On Tue, 4 Mar 2008 14:40:35 +1300 Thomas Allen [EMAIL PROTECTED] wrote: I came across this strange bug the other day, I'm not sure how to solve it and I wonder if anyone can even replicate it. Step 1) Make an R package using the package.skeleton() command with only these two functions:

error <- function(){
  cmd <- paste(" -a ",1," -a ",1," -a ",1," -a ",1," -a ",1," -a ",1,
               " -a ",1," -a ",1," -a ",1," -a ",1," -a ",1," -a ",1,
               " –a ",1, sep="")
  cat(cmd,"\n")
}

Now why does that <e2 80 93> replace one of the "-" in the first command? With my mail tool, the "-" before the last "a" looks a bit longer than the others; it definitely seems to be a different character. Also, if I cut and paste this function into an emacs buffer on my Kubuntu machine, I see:

error <- function(){
  cmd <- paste(" -a ",1," -a ",1," -a ",1," -a ",1," -a ",1," -a ",1,
               " -a ",1," -a ",1," -a ",1," -a ",1," -a ",1," -a ",1,
               " \u2013a ",1, sep="")
  cat(cmd,"\n")
}

So somehow you must have entered not a "-" but some other symbol before that last "a". And I guess what you see is a result of the locale and the character encoding that you are working in. But others would know more about this than I do and can probably explain better what is going on.

Cheers,

        Berwin
Re: [R] Extracting formula from an lm object
G'day Murray, On Sun, 24 Aug 2008 18:34:39 +1200 Murray Jorgensen [EMAIL PROTECTED] wrote: I want to extract the part of the formula not including the response variable from an lm object. For example if the lm object ABx.lm was created by the call ABx.lm <- lm( y ~ A + B + x, ...) Then ABx.lm is saved as part of a workspace. I wish to extract ~ A + B + x. Later in my code I will fit another linear model of the form z ~ A + B + x for some other response variable z. I would be grateful for any suggestions of a nice way to do this. AFAIK, a formula is essentially a list of two or three components. The first component is `~`. The second is the LHS of the formula if there are three components; otherwise the RHS of the formula. The third component, if it exists, is the RHS of the formula. So storing ~ A + B + x and manipulating this part for different responses could turn out to be painful; you would have to insert the new LHS as the second component of the list. I would suggest that it is easier to store the complete formula and just manipulate the LHS; see:

R> library(MASS)
R> fm <- lm(time ~ dist + climb, hills)
R> formula(fm)
time ~ dist + climb
R> formula(fm)[[1]]
`~`
R> formula(fm)[[2]]
time
R> formula(fm)[[3]]
dist + climb
R> tt <- formula(fm)
R> tt[[2]] <- NULL
R> tt
~dist + climb
R> tt <- formula(fm)
R> class(tt[[2]])
[1] "name"
R> typeof(tt[[2]])
[1] "symbol"
R> tt[[2]] <- as.name("y")
R> tt
y ~ dist + climb
R> tt <- formula(fm)
R> tt[[2]] <- as.symbol("z")
R> tt
z ~ dist + climb

HTH.
Cheers, Berwin
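The recipe above is easily wrapped into a tiny helper. `swap_response` is a name made up for this sketch, not something from the thread, and the example uses the built-in cars data rather than MASS's hills.

```r
## Replace the LHS of a fitted model's formula, keeping the RHS intact.
swap_response <- function(fit, new_lhs) {
  f <- formula(fit)
  f[[2]] <- as.name(new_lhs)   # component 2 of a two-sided formula is the LHS
  f
}

fm <- lm(dist ~ speed, data = cars)
swap_response(fm, "logdist")   # logdist ~ speed
```

The returned formula can then be handed straight to another call of lm() with a data frame containing the new response.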
Re: [R] Using R to generate reports
G'day all, On Fri, 5 Sep 2008 18:10:37 -0400 Jorge Ivan Velez [EMAIL PROTECTED] wrote: Dear Bill, See http://www.statistik.lmu.de/~leisch/Sweave/ The following R News article might also be of interest: Sven Garbade and Peter Burgard. Using R/Sweave in everyday clinical practice. R News, 6(2):26-31, May 2006. If one is new to LaTeX, the following articles might be of interest too: Max Kuhn. Sweave and the open document format - the odfWeave package. R News, 6(4):2-8, October 2006. Gregor Gorjanc. Using Sweave with LyX. R News, 8(1):2-9, May 2008. Finally, check out: http://bioinformatics.oxfordjournals.org/cgi/content/full/24/2/276 Cheers, Berwin
Re: [R] problem with function rep
G'day Julien, On Thu, 12 Jun 2008 16:48:43 +0200 Julien Hunt [EMAIL PROTECTED] wrote: I am currently writing a program where I need to use function rep. The results I get are quite confusing. Given two vectors A and B, I want to replicate a[1] b[1] times, a[2] b[2] times and so on. All the entries of vector B are positive integers. My problem comes from the fact that if I sum up all the elements of B, [...] Others mentioned already the need for a reproducible example. But my guess is that the elements in B are calculated. Recently, I was sent the following code by a colleague of mine: --- Hi Berwin, Try this in R 2.7.0

pai = c(.4,.1,.1,.4)
s = .5
p = diag(1-s, 4) + s * t(matrix(pai, 4, 4))
f = diag(pai) %*% p
z = 200*f  ### bug???
z
sum(z)
length(rep(1:16, z))
length(rep(1:16, round(z)))

I tested the code and my answer was: --- Interesting variation on FAQ 7.31: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Look at z - round(z) and where the negative residuals are. My money is on you having the same problem and that using round(B) instead of B in the rep() command will solve your problem. HTH. Cheers, Berwin
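The colleague's snippet can be run as-is to see the effect; everything below is from the post above except the final checks. The underlying point is that rep() truncates non-integer times values towards zero, which is why round() fixes things.

```r
## Reproducing the FAQ 7.31 variation from the post: z is built by
## floating-point arithmetic, so entries that print as whole numbers
## need not be exact integers.
pai <- c(.4, .1, .1, .4)
s   <- .5
p <- diag(1 - s, 4) + s * t(matrix(pai, 4, 4))
f <- diag(pai) %*% p
z <- 200 * f

z - round(z)                 # the tell-tale tiny residuals
length(rep(1:16, z))         # may fall short of 200 where residuals are negative
length(rep(1:16, round(z)))  # the intended length
```

With round(z) the replication counts are exact integers and the total length matches sum(round(z)).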
Re: [R] bug in nls?
G'day Petr, On Thu, 26 Jun 2008 13:57:39 +0200 Petr PIKAL [EMAIL PROTECTED] wrote: I just encountered a strange problem with nls formula. I tried to use nls in cycle but I was not successful. I traced the problem to some parse command. [...] I am not sure if this behaviour is a bug or feature. [...] It is definitely a feature. It is an error to believe that all modelling functions that use modelling formulae use the same syntax for their modelling formulae. This is, perhaps, easiest realised by observing how lm() and nls() interpret * and / in model formulae. In nls(), [..] can be used to index parameters if the parameter is allowed to change between groups in the data. This seems to be a little-known feature, though there is an example that uses that feature in MASS. The contributed documentation An Introduction to R: Software for Statistical Modelling & Computing by Petra Kuhnert and Bill Venables, available from CRAN, also has such an example on pages 134 and 230. The fact that nls() allows you to use [..] to index parameters in the model formulae seems to conflict with the way you wanted to specify the observed values in the formula. I guess Gabor's solution is a fix for your problem. Cheers, Berwin
Re: [R] recursively divide a value to get a sequence
G'day all, On Wed, 09 Jul 2008 20:33:39 +1000 Jim Lemon [EMAIL PROTECTED] wrote: On Wed, 2008-07-09 at 11:40 +0200, Anne-Marie Ternes wrote: Hi, if given the value of, say, 15000, I would like to be able to divide that value recursively by, say, 5, and to get a vector of a determined length, say 9, the last value being (set to) zero - i.e. like this: 15000 3000 600 120 24 4.8 0.96 0.192 0 These are in fact concentration values from an experiment. For my script, I get only the starting value (here 15000), and the factor by which concentration is divided for each well, the last one having, by definition, no antagonist at all. I have tried to use seq, but it can only do positive or negative increments. I didn't find a way with rep, sweep etc. either. These functions normally start from an existing vector, which is not the case here; I have only got a single value to start with. I suppose I could do something loopy, but I'm sure there is a better way to do it. Well, if you really want to do it recursively (and maybe loopy as well)

recursivdiv <- function(x, denom, lendiv, firstpass=TRUE) {
  if(firstpass) lendiv <- lendiv - 1
  if(lendiv > 1) {
    divvec <- c(x/denom, recursivdiv(x/denom, denom, lendiv-1, FALSE))
    cat(divvec, "\n")
  }
  else divvec <- 0
  if(firstpass) divvec <- c(x, divvec)
  return(divvec)
}

Or, a bit more compactly:

recursivdiv <- function(x, denom, lendiv) {
  if(lendiv > 1) {
    divvec <- c(x, Recall(x/denom, denom, lendiv-1))
  }
  else divvec <- 0
  return(divvec)
}

which will continue to work if the function is renamed (say, rcd <- recursivdiv) due to the use of Recall(). Cheers, Berwin
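Since the sequence is just a geometric series with the last element forced to zero, recursion can be avoided altogether. `divseq` is a made-up name for this sketch, not a function from the thread.

```r
## Non-recursive version of the same sequence: x, x/denom, x/denom^2, ...,
## with the final element set to zero as in the original request.
divseq <- function(x, denom, len) c(x / denom^(0:(len - 2)), 0)

divseq(15000, 5, 9)   # 15000 3000 600 120 24 4.8 0.96 0.192 0
```

Vectorised exponentiation (denom^(0:(len-2))) does the repeated division in one step.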
Re: [R] shapiro wilk normality test
G'day all, On Sun, 13 Jul 2008 15:55:38 +0100 (BST) (Ted Harding) [EMAIL PROTECTED] wrote: On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote: [...] A large P-value means nothing more than needing more data. No conclusion is possible. I would have thought that we need more data would qualify as a conclusion. :) Please read the classic paper Absence of Evidence is not Evidence for Absence. Is that ironic, Frank, or is there really a classic paper with that title? If so, I'd be pleased to have a reference to it! Of course, I do not know for sure which paper Frank has in mind, but Google and Google Scholar readily come up with papers/editorials that have a nearly identical title: http://www.bmj.com/cgi/content/full/311/7003/485 http://bmj.bmjjournals.com/cgi/content/full/328/7438/476 (see also http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=351831) http://www.ncbi.nlm.nih.gov/pubmed/6829975 My money is on Frank having the first of these publications in mind. Cheers, Berwin
Re: [R] NAMESPACE vs internal.Rd
G'day Christophe, On Wed, 16 Jul 2008 15:10:05 +0200 [EMAIL PROTECTED] wrote: Hi the list, When we use package.skeleton, it creates some files in the man directory. - If we use package.skeleton with namespace=FALSE, it creates a file toto-internal.Rd - If we use package.skeleton with namespace=TRUE, it does not create the file toto-internal.Rd Why is that? My understanding from my reading of Writing R Extensions is: 1) All functions/objects of a package that a user can call/see have to be documented in order for the package to pass R CMD check without warnings/errors. 2) Users can see all functions/objects in packages without a namespace, hence everything has to be documented. For functions that you do not want users to call directly, since they are helper functions for the main functions of your package, you can just create entries in internal.Rd (or toto-internal.Rd) without writing a complete help page for these functions. R CMD check will accept this as documentation. 3) In packages with a namespace, you decide which functions/objects the user can see by exporting them. Everything not exported is supposed to be internal to the package and should not be accessed directly by users (though it can be via :::). Functions/objects that are not exported do not need to be documented, hence no need for the toto-internal.Rd stub. HTH (and HTIC). Cheers, Berwin
Re: [R] NAMESPACE vs internal.Rd
G'day Christophe, On Wed, 16 Jul 2008 16:11:15 +0200 [EMAIL PROTECTED] wrote: So the main idea is - With NAMESPACE, you do not document the not-for-user functions because they don't have to be documented - Without NAMESPACE, you document the not-for-user functions with a toto-internal.Rd that says not for user That's clear. Is it stupid to consider using both techniques at the same time? As far as I know, R will not complain if you write documentation for objects that you do not have to document. R CMD check will complain if an object accessible to a user of the package is not documented. A quick perusal of the packages installed on my machine shows that there are several packages that have a namespace and a -internal file, e.g. the MASS package (from the VR bundle) and the sm package. But both these packages existed before the namespace mechanism was introduced. So the MASS-internal.Rd and sm-internal.Rd may be remainders from the time before namespaces were introduced. You would have to ask the maintainer(s) of such packages why they use a namespace and a -internal.Rd file. I can see no reason but a historical one: that the -internal.Rd file stems from the time before namespaces and was not deleted after their introduction (I know that I wouldn't delete such a file if it weren't necessary). - some functions will be accessible (regular functions) - some functions will be hidden (functions starting with .) - some functions will be forbidden (functions not in the namespace) We are actually talking about objects (everything in R is an object), which could be anything: a function, a data frame or ... But if you want to keep the discussion restricted to functions, then I would want to point out that functions whose names start with a . are only hidden from the ls() function and that this has nothing to do with a namespace. According to my understanding, if your package has a namespace, then everything that you do not export explicitly is not visible to users of your package.
The Writing R Extensions manual has an example of an exportPattern() directive that exports all variables that do not start with a period, but that would export everything else. I guess a regular expression that says export everything that does not start with a dot but do not export foo and bar would be not trivial to write (at least not for me). And your distinction between hidden and forbidden functions is spurious, because either function could be accessed via the ::: operator if the user knows that it is in the package (though not exported). Thus, if you use a namespace, then all the objects that you export are visible to the users of the package; all other objects are not visible (but can be accessed via :::). Objects that are not visible do not need to be documented (for R CMD check to succeed), but it is no error to document them. Objects that are visible to the users of the package have to be documented. HTH (and HTIC). Cheers, Berwin
Re: [R] calculate differences - strange outcome
On Thu, 17 Jul 2008 11:47:42 +0200 Kunzler, Andreas [EMAIL PROTECTED] wrote: I ran into some trouble by calculating differences. For me it is important that differences are either 0 or not. If that is the case, restrict yourself to calculating with integers. :) So I don't understand the outcome of this calculation 865.56-(782.86+0+63.85+18.85+0) [1] -1.136868e-13 I run R version 2.7.1 on WinXP I could solve my problem by using round() but I would like to know the reason. Maybe someone can help me? FAQ 7.31 and the reference cited therein should shed light on your problem. Best wishes, Berwin
Re: [R] NAMESPACE vs internal.Rd
G'day Christophe, On Wed, 16 Jul 2008 18:22:49 +0200 [EMAIL PROTECTED] wrote: Thanks for your answer. My pleasure. I guess writing a regular expression that says export everything that does not start with a dot but do not export foo and bar would be not trivial to write (at least not for me). The NAMESPACE created by package.skeleton contains a single line: exportPattern("^[[:alpha:]]+") I guess that it is what you just said... Not really. :) The regular expression "^[[:alpha:]]+" matches, as far as I know, all objects that have one or more alphabetic characters at the beginning. The Writing R Extensions manual suggests the directive exportPattern("^[^\\.]") which exports all variables that do not start with a period. As far as I can tell, both these instructions have more or less the same effect (assuming that there are no objects with non-standard names in your package; and I am not enough of an R language lawyer to know what would happen in such a case). I was commenting on your classification as: - some functions will be accessible (regular functions) - some functions will be hidden (functions starting with .) - some functions will be forbidden (functions not in the namespace) Say, you have accessible functions called fubar, rabuf, main and something.incredible.useful, some hidden functions whose names start with a . and forbidden functions (i.e. not exported) with names foo and bar. (Though, as I commented earlier, it is practically impossible to implement forbidden functions; users can always access them if they want using :::.) Both of the directives above would export fubar, rabuf, main, something.incredible.useful, foo and bar. So my challenge was to come up with a regular expression for exportPattern that says export everything that does not start with a dot but do not export foo and bar, i.e. a regular expression that would only export the accessible functions. HTC.
Cheers, Berwin
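The equivalence of the two patterns can be checked directly with grepl() on the example names from the post; `.internal.helper` is a made-up hidden name added for illustration.

```r
## Both export patterns select the same objects: everything whose name
## does not start with a dot.
nms <- c("fubar", "rabuf", "main", "something.incredible.useful",
         ".internal.helper", "foo", "bar")
sel1 <- grepl("^[[:alpha:]]+", nms)   # package.skeleton's default pattern
sel2 <- grepl("^[^\\.]", nms)         # pattern from Writing R Extensions

nms[sel1]                 # everything except the dot-name -- foo and bar included
identical(sel1, sel2)
```

Neither pattern can express "everything except foo and bar"; the usual way out is explicit export() directives instead of a pattern.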
Re: [R] truncated normal
G'day all, On Wed, 23 Jul 2008 20:48:59 -0400 Duncan Murdoch [EMAIL PROTECTED] wrote: On 23/07/2008 8:17 PM, cindy Guo wrote: Yes, I know. I mean if I want to generate 100 numbers from N(0,1)I((0,1),(5,10)). There are two intervals (0,1) and (5,10). Then the function will give 50 numbers in the first interval and 50 in the other. I can think of two strategies: 1) Use rtnorm from the msm *package* to sample from the normal distribution truncated to the interval from your smallest lower bound to your largest upper bound, and then use rejection sampling. I.e. for your original example N(0,1)I((0,1),(2,4)) sample from a N(0,1)I(0,4) distribution and reject all observations that are between 1 and 2; sample sufficiently many points until you have kept the required number. But this could be rather inefficient for your second example N(0,1)I((0,1),(5,10)). 2) If I am not mistaken, truncating the normal distribution to more than one interval essentially creates a mixture distribution where the components of the mixture are normal distributions truncated to a single interval. The weights of the mixture are given by the relative probability with which an observation from a normal distribution falls into each of the intervals. Thus, an alternative strategy, exemplified using your second example, would be (code untested):

samp1 <- rtnorm(100, mean=0, sd=1, lower=0, upper=1)
samp2 <- rtnorm(100, mean=0, sd=1, lower=5, upper=10)
p1 <- pnorm(1) - pnorm(0)
p2 <- pnorm(10) - pnorm(5)
## Take heed of Duncan's remark on evaluating such quantities
p <- p1/(p1+p2)
ind <- runif(100) < p
finalsample <- samp1
finalsample[!ind] <- samp2[!ind]

HTH.
Cheers, Berwin
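Strategy 2 can be made self-contained without the msm package by sampling each truncated component via the inverse CDF. `rtnorm1` is a made-up helper name for this sketch; note that, as Duncan's remark warns, pnorm differences far out in the tails deserve numerical care.

```r
## Inverse-CDF sampler for a normal distribution truncated to (lo, hi).
rtnorm1 <- function(n, lo, hi, mean = 0, sd = 1) {
  u <- runif(n, pnorm(lo, mean, sd), pnorm(hi, mean, sd))
  qnorm(u, mean, sd)
}

set.seed(1)
n  <- 1000
s1 <- rtnorm1(n, 0, 1)             # N(0,1) truncated to (0, 1)
s2 <- rtnorm1(n, 5, 10)            # N(0,1) truncated to (5, 10)

p1 <- pnorm(1) - pnorm(0)          # mixture weights, as in the post
p2 <- pnorm(10) - pnorm(5)
ind  <- runif(n) < p1 / (p1 + p2)
samp <- ifelse(ind, s1, s2)        # final sample from the two-interval target
```

Because p2 is tiny here, almost all draws land in (0, 1), which is exactly what the mixture weights are meant to achieve.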
Re: [R] problem with which
G'day Melanie, On Wed, 23 Apr 2008 17:46:56 -1000 Melanie Abecassis [EMAIL PROTECTED] wrote: This doesn't seem to happen with integers. Am I missing something?? Yes, FAQ 7.31: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Is there a better function for non-integers?

which(sapply(tt, function(x) isTRUE(all.equal(x, 1.7))))
[1] 8

seems to work. HTH. Cheers, Berwin
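The effect is easy to reproduce. The vector tt below is invented for this sketch, since the original poster's data are not in the excerpt; its 8th element prints as 1.7 but is stored with floating-point error.

```r
## A computed vector in which 1.7 is "present" only up to rounding error.
tt <- seq(0.3, 2.4, by = 0.2)   # the 8th element prints as 1.7

which(tt == 1.7)                 # typically empty: the stored value differs
which(sapply(tt, function(x) isTRUE(all.equal(x, 1.7))))   # finds element 8
```

all.equal() compares with a small tolerance (about 1.5e-8 by default), which is what == lacks.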
Re: [R] Is my understanding of rlnorm correct?
G'day Phil, On Sun, 4 May 2008 14:05:09 +1000 phil colbourn [EMAIL PROTECTED] wrote: rlnorm takes two 'shaping' parameters: meanlog and sdlog. meanlog would appear from the documentation to be the log of the mean. eg if the desired mean is 1 then meanlog=0. These two parameters are the mean and the sd on the log scale of the variate, i.e. if you take the logarithm of the produced numbers then those values will have the given mean and sd. If X has an N(mu, sd^2) distribution, then Y=exp(X) has a log-normal distribution with parameters mu and sd.

R> set.seed(1)
R> y <- rlnorm(10000, mean=3, sd=2)
R> summary(log(y))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -4.343   1.653   2.968   2.987   4.355  10.620
R> mean(log(y))
[1] 2.986926
R> sd(log(y))
[1] 2.024713

I noticed on the Wikipedia lognormal page that the median is exp(mu) and that the mean is exp(mu + sigma^2/2) http://en.wikipedia.org/wiki/Log-normal_distribution Where mu and sigma are the mean and standard deviation of a normal variate which is exponentiated to obtain a log normal variate. And this holds for the above example (up to sampling variation):

R> mean(y)
[1] 143.1624
R> exp(3+2^2/2)
[1] 148.4132

So, does this mean that if i want a mean of 100 that the meanlog value needs to be log(100) - log(sd)^2/2? A mean of 100 for the log-normal variate? In this case any set of mu and sd for which exp(mu+sd^2/2)=100 (or mu+sd^2/2=log(100)) would do the trick:

R> mu <- 2
R> sd <- sqrt(2*(log(100)-mu))
R> summary(rlnorm(10000, mean=mu, sd=sd))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
4.010e-04 1.551e+00 7.075e+00 1.006e+02 3.344e+01 3.666e+04
R> mu <- 4
R> sd <- sqrt(2*(log(100)-mu))
R> summary(rlnorm(10000, mean=mu, sd=sd))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
  0.9965  25.9400  56.0200 101.2000 115.5000 3030.
R> mu <- 1
R> sd <- sqrt(2*(log(100)-mu))
R> summary(rlnorm(10000, mean=mu, sd=sd))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
9.408e-05 4.218e-01 2.797e+00 8.845e+01 1.591e+01 7.538e+04

Note that given the variation we would expect in the mean in the last example, the mean is actually close enough to the theoretical value of 100:

R> sqrt((exp(sd^2)-1)*exp(2*mu + sd^2)/10000)
[1] 36.77435

HTH. Cheers, Berwin
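The mean formula quoted above, E[Y] = exp(mu + sigma^2/2), is easy to verify numerically; the parameter values below are chosen for illustration, not taken from the thread.

```r
## Numerical check of the lognormal mean formula E[Y] = exp(mu + sigma^2/2).
set.seed(7)
mu    <- 3
sigma <- 0.5
y <- rlnorm(1e5, meanlog = mu, sdlog = sigma)

c(sample = mean(y), theory = exp(mu + sigma^2 / 2))   # should agree closely
```

With sdlog this small the sampling error of the mean is tiny, so sample and theory agree to a couple of decimal places.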
Re: [R] question about se of predicted glm values
G'day Jarrett, On Mon, 5 May 2008 20:21:26 -0700 Jarrett Byrnes [EMAIL PROTECTED] wrote: Hey, all. I had a quick question about fitting new glm values and then looking at the error around them. I'm working with a glm using a Gamma distribution and a log link with two types of treatments. However, when I then look at the predicted values for each category, I find for the one that is close to 0, And this does not surprise you, since with your data:

tapply(y, x, mean)
    a     b
 1.00 25.58

So wouldn't you expect one predicted value to be close to 1 instead of zero? the error (using se.fit=T with predicted) actually makes it overlap 0. This is not possible, as non-0 values have no meaning. Am I missing something in the interpretation? Yes. :) For GLMs, predict returns by default the predicted values on the linear predictor scale, not on the response scale. Negative values for the linear predictor are, of course, possible and may have meaning. Look closely at the pictures that you have created. In the first one, for x=b, the values are around 30; in the picture with the fitted value the prediction for x=b is around 3; clearly another scale (namely the scale of the linear predictor).

#get predicted values and their error
a.fit <- predict(my.glm, data.frame(x="a"), se.fit=T)
b.fit <- predict(my.glm, data.frame(x="b"), se.fit=T)

Try:

a.fit <- predict(my.glm, data.frame(x="a"), se.fit=T, type="response")
b.fit <- predict(my.glm, data.frame(x="b"), se.fit=T, type="response")

Hope this helps.
Best wishes, Berwin
Re: [R] apply function
G'day Shubha, On Thu, 15 May 2008 12:18:58 +0530 Shubha Vishwanath Karanth [EMAIL PROTECTED] wrote: Getting a strange result using ?apply. Please look into the below code:

d = data.frame(a=c(1,2,3), b=c("A","B","C"), c=c(TRUE,FALSE,FALSE), d=c(T,F,F))
class(d[,1])
[1] "numeric"
class(d[,2])
[1] "factor"
class(d[,3])
[1] "logical"
class(d[,4])
[1] "logical"
apply(d, 2, class)
          a           b           c           d
"character" "character" "character" "character"

[...] Why is this so? ?apply The first argument to apply is an *array*, not a data.frame. In an array, all elements have to be of the same type, so when your data.frame is coerced into an array the target type of the coercion depends on which components you select. How do I get the actual classes of the columns of my data frame d? Something like:

R> lapply(d, class)
$a
[1] "numeric"
$b
[1] "factor"
$c
[1] "logical"
$d
[1] "logical"

could be used. HTH. Best wishes, Berwin
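sapply() gives the same information as lapply() but as a named character vector. Note that whether column b comes out as "factor" or "character" depends on the stringsAsFactors default of your R version (it changed in R 4.0), so it is fixed explicitly in this sketch.

```r
## Column classes of a data frame, without the matrix coercion apply() does.
d <- data.frame(a = c(1, 2, 3), b = c("A", "B", "C"),
                c = c(TRUE, FALSE, FALSE), d = c(T, F, F),
                stringsAsFactors = TRUE)   # pin down the class of b

sapply(d, class)   # named vector: numeric, factor, logical, logical
```

Because a data frame is a list of columns, the *apply functions that iterate over list elements (lapply, sapply, vapply) see the columns unchanged.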
Re: [R] glm model syntax
G'day Harold, On Fri, 16 May 2008 11:43:32 -0400 Doran, Harold [EMAIL PROTECTED] wrote: N+M gives only the main effects, N:M gives only the interaction, and N*M gives the main effects and the interaction. I guess this begs the question what you mean with N:M gives only the interaction ;-) Consider:

R> (M <- gl(2, 1, length=12))
 [1] 1 2 1 2 1 2 1 2 1 2 1 2
Levels: 1 2
R> (N <- gl(2, 6))
 [1] 1 1 1 1 1 1 2 2 2 2 2 2
Levels: 1 2
R> dat <- data.frame(y = rnorm(12), N = N, M = M)
R> dim(model.matrix(y ~ N + M, dat))
[1] 12  3
R> dim(model.matrix(y ~ N:M, dat))
[1] 12  5
R> dim(model.matrix(y ~ N * M, dat))
[1] 12  4

Why has the model matrix of y~N:M more columns than the model matrix of y~N*M if the former contains the interactions only and the latter contains main terms and interactions? Of course, if we leave the dim() command away, we will see why. Moreover, it seems that the model matrix constructed from y~N:M has a redundant column. Furthermore:

R> D1 <- model.matrix(y ~ N * M, dat)
R> D2 <- model.matrix(y ~ N:M, dat)
R> resid(lm(D1 ~ D2 - 1))

shows that the column space created by the model matrix of y~N*M is completely contained within the column space created by the model matrix of y~N:M, and it is easy to check that the reverse is also true. So it seems to me that y~N:M and y~N*M actually fit the same models. To see how to construct one design matrix from the other, try:

R> lm(D1 ~ D2 - 1)

Thus, I guess the answer is that y~N+M fits a model with main terms only while y~N:M and y~N*M fit the same model, namely a model with main and interaction terms; these two formulations just create different design matrices, which has to be taken into account if one tries to interpret the estimates. Of course, all the above assumes that N and M are actually factors, something that Birgit did not specify. If N and M (or only one of them) is a numeric vector, then the constructed matrices might be different, but this is left as an exercise.
;-) (Apparently, if N and M are both numeric, then your summary is pretty much correct.) Cheers, Berwin
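The exercise left at the end is quickly worked: with numeric N and M the three formulas really do produce different column counts, matching Harold's original summary.

```r
## The numeric-variable case: N+M, N:M and N*M now give different designs.
set.seed(1)
dat <- data.frame(y = rnorm(12),
                  N = rep(1:2, each = 6),   # numeric, not factors
                  M = rep(1:2, times = 6))

c(main  = ncol(model.matrix(y ~ N + M, dat)),   # Intercept, N, M
  inter = ncol(model.matrix(y ~ N:M, dat)),     # Intercept, N:M product
  both  = ncol(model.matrix(y ~ N * M, dat)))   # Intercept, N, M, N:M
```

Here N:M is literally the elementwise product of the two columns, so "only the interaction" is an accurate description in the numeric case.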
Re: [R] Integer / floating point question
G'day Erik, On Fri, 16 May 2008 10:45:43 -0500 Erik Iverson [EMAIL PROTECTED] wrote: [...] The help page for '%%' addresses this a bit, but then caveats it with 'up to rounding error', which is really my question. Is there ever 'rounding error' with 2.0 %% 1 as opposed to 2 %% 1? I am not in a position to give an authoritative answer, but I think there should be no problem with rounding error in the situation that you describe. At least I hope there is no problem, otherwise I would consider this a serious issue. :) However, my question is related to R FAQ 7.31, Why doesn't R think these numbers are equal? The first sentence of that FAQ reads, The only numbers that can be represented exactly in R's numeric type are integers and fractions whose denominator is a power of 2. Again, I did not write this FAQ answer and cannot give an authoritative answer, but the word integer in that answer does not IMHO refer to variables in R that are of integer type; in particular since the answer discusses what kind of numbers can be represented exactly in R's numeric type. (Perhaps this should actually be plural since there are several numeric types?) My interpretation is that 2.0 and 2 are both *text constants* that represent the integer 2, and that number is representable in a floating point (and in an integer). The paper by Goldberg, referenced in FAQ 7.31, contains a discussion on whether it is possible (it is) to convert a floating point number from binary representation to decimal representation and then back, ending up with the same binary representation. These kinds of questions are important if you use commands like write.table() or write.csv(), which write out floating point numbers in decimal representation, readable to normal humans. When you read the data back in, you want to end up with the exact same binary representation of the numbers. Goldberg is indeed an interesting paper to read.
And the comments I made above are based on my understanding of Goldberg: 2 and 2.0 are both decimal representations of the integer 2, and this number has an exact representation (in integer type variables and in floating point type variables). Hence, both these decimal representations should lead to a binary representation that corresponds to that number exactly. Thus, I would expect R> x <- 2.0 R> x %% 1 == 0 always to work and to return TRUE. It is things like: R> x <- sqrt(2.0)^2 R> x %% 1 == 0 that FAQ 7.31 is about and what, IMHO, the comment in the help page of %% warns about; if the variable x contains a value that was created by some finite precision floating point calculations. But the conversion from a textual representation of an integer to a binary representation should not create problems. Cheers, Berwin
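The write-out-and-read-back point can be sketched as follows (a minimal sketch of my own; 17 significant digits is the standard round-trip precision for IEEE doubles, and the file name is just a temporary file):

```r
# Round-trip doubles through a decimal text file: 17 significant digits
# are enough to recover the exact same binary representation.
x <- c(2, sqrt(2)^2, 1/3)
f <- tempfile(fileext = ".csv")
write.csv(data.frame(x = format(x, digits = 17)), f, row.names = FALSE)
y <- as.numeric(read.csv(f)$x)
identical(x, y)   # exact binary equality, not just all.equal()
```

With fewer digits (e.g. the default print precision) the last value would typically come back only approximately equal.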
Re: [R] Integer / floating point question
G'day Brian, On Fri, 16 May 2008 19:28:59 +0100 (BST) Prof Brian Ripley [EMAIL PROTECTED] wrote: 'numeric' is a class but not a type -- so I think the FAQ is wrongly worded but the concept is well defined Though there may be multiple definitions. :-) Reading a bit in R Language Definition (yes, I am aware that it is a draft), 'numeric' seems to be also used as a mode. I guess this comes from S3, the mode (no pun intended) that I am still using and thinking in mostly, and 'numeric' being a class came in with S4. And I notice that mode(1L) and class(1L) give different results: the former returns numeric while the latter returns integer. Hence, when I read R's numeric type in the FAQ, I take this as referring to those basic types that are of mode numeric, i.e. integer and real. I am not sure whether changing this to R's object of mode 'numeric' or R's object of class 'numeric' would make the answer more readable/understandable. But it does not say that all such numbers can be represented exactly, and only some can. I am well aware that integers and reals can only hold integers and fractions whose denominator is a power of 2 from a limited range; in particular, the former will not hold any fractions. However, given the discussion in Goldberg (which FAQ 7.31 points to) on changing from binary to decimal representation and back, I would expect that any such number (within the appropriate range) given in decimal representation would be transformed into the correct, exact binary representation. However, I also know that if I stumble across an example where this is not the case, I shall only complain after ensuring that documented behaviour is violated; and the FAQ does not necessarily document expected behaviour. I guess Duncan's answer said it all: at the extreme ends of the appropriate ranges surprises might lurk, but it would be considered a serious bug if R did not transform small integers into their exact binary representation. 
Cheers, Berwin
Re: [R] sum of unknown number of matrices
G'day Shubha, On Wed, 4 Jun 2008 20:23:35 +0530 Shubha Vishwanath Karanth [EMAIL PROTECTED] wrote: Something like do.call("+", l) is not working... why is this? Well, as the error message says, + is either a unary or a binary operator, i.e. it takes either one or two arguments, but not more. I may not be knowing the number of matrices in the list... This is perhaps a bit complicated but it works: R> a <- b <- c <- d <- matrix(1:4, 2, 2) R> l <- list(a, b, c, d) R> library(abind) ## may have to install this package first R> apply(do.call(abind, list(l, along=3)), 1:2, sum) [,1] [,2] [1,] 4 12 [2,] 8 16 HTH. Cheers, Berwin
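A base-R alternative to the abind approach, assuming the same list of conformable matrices, is to fold the addition over the list with Reduce():

```r
# Sum an arbitrary-length list of conformable matrices, no extra packages.
a <- matrix(1:4, 2, 2)
l <- list(a, a, a, a)
total <- Reduce(`+`, l)
total
```

Reduce(`+`, l) works for any list length, which addresses the "unknown number of matrices" part of the question.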
Re: [R] The .tex version of the manual in foo.Rcheck
G'day Bendix, On Mon, 27 Apr 2009 22:52:02 +0200 BXC (Bendix Carstensen) b...@steno.dk wrote: In version 2.8.1, running Rcmd check on the package foo would leave the file foo-manual.tex in the folder foo.Rcheck. But as of 2.9.0 only foo-manual.pdf and foo-manual.log are there. Is this intentional? I do not know whether it is intentional, but it seems that parts of `check' were substantially re-written due to the new Rd parser. As far as I can understand the Perl code, the .tex file is only copied to the .Rcheck directory if an error occurs while the file is processed. But this can easily be changed by moving two lines in the script a bit higher, see the attached patch. Cheers, Berwin
Re: [R] The .tex version of the manual in foo.Rcheck
G'day Uwe, On Wed, 29 Apr 2009 11:03:43 +0200 Uwe Ligges lig...@statistik.tu-dortmund.de wrote: you can also take a source package and run R CMD Rd2dvi --no-clean packageName on it and you will get a temporary directory with the TeX sources in it. Which is fine for manual processing when done once or twice, but somewhat less helpful for automatic processing in scripts, since the name of the temporary directory is hard to predict. `R CMD check foo' already unconditionally copies the foo-manual.log file from the temporary directory to the foo.Rcheck directory; and the foo-manual.tex file is copied if an error occurs during processing. What is the problem with also copying the foo-manual.tex file unconditionally to foo.Rcheck? Cheers, Berwin
Re: [R] Splitting a vector into equal groups
G'day Utkarsh, On Mon, 04 May 2009 11:51:21 +0530 utkarshsinghal utkarsh.sing...@global-analytics.com wrote: I have a vector of length 52, say, x = sample(30, 52, replace=T). I want to sort x and split it into five *nearly equal groups*. What do you mean by *nearly equal groups*? That the sizes of the groups should be nearly equal? That the sums of the elements of the groups should be nearly equal? Note that the observations are repeated in x, so in case of a tie I want both observations to fall in the same group. Then it becomes even more important to define what you mean by nearly equal groups. As a start, you may consider: R> set.seed(1) R> x <- sample(30, 52, replace=T) R> xrle <- rle(sort(x)) R> xrle Run Length Encoding lengths: int [1:25] 2 1 2 2 3 1 1 1 5 1 ... values : int [1:25] 1 2 4 6 7 8 9 11 12 13 ... R> cumsum(xrle$lengths) [1] 2 3 5 7 10 11 12 13 18 19 24 25 26 28 29 32 35 38 [19] 43 45 46 48 49 51 52 and use this to determine the cut-offs. E.g., should the first group have 10, 11 or 12 elements in this case? The information in xrle should enable you to construct your five groups once you have decided on a grouping. HTH. Cheers, Berwin
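One possible concrete grouping (my own sketch, interpreting "nearly equal" as nearly equal group sizes): assign each distinct value to the fifth of the sorted data in which its last occurrence falls, so tied observations always stay together:

```r
set.seed(1)
x <- sample(30, 52, replace = TRUE)
xrle <- rle(sort(x))
ends <- cumsum(xrle$lengths)                   # last position of each distinct value
grp_of_value <- ceiling(5 * ends / length(x))  # which fifth that position lies in
groups <- rep(grp_of_value, xrle$lengths)      # one group label per sorted observation
table(groups)                                  # sizes are only *nearly* equal
```

Since each distinct value gets exactly one group label, ties can never be split across groups; the price is that group sizes deviate from 52/5 by up to the size of a run.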
Re: [R] Beyond double-precision?
G'day all, On Sat, 09 May 2009 08:01:40 -0700 spencerg spencer.gra...@prodsyse.com wrote: The harmonic mean is exp(mean(logs)). Therefore, log(harmonic mean) = mean(logs). Does this make sense? I think you are talking here about the geometric mean, not the harmonic mean. :) The harmonic mean is a bit more complicated. If the x_i are positive values, then the harmonic mean is H = n / (1/x_1 + 1/x_2 + ... + 1/x_n) so log(H) = log(n) - log(1/x_1 + 1/x_2 + ... + 1/x_n). Now log(1/x_i) = -log(x_i), so if log(x_i) is available, the logarithms of the individual terms are easily calculated. But we need to calculate the logarithm of a sum from the logarithms of the individual terms. At the C level, R's API has the function logspace_add for such tasks, so it would be easy to do this at the C level. But one could also implement the equivalent of the C routine using R commands. The way to calculate log(x+y) from lx=log(x) and ly=log(y), according to logspace_add, is: max(lx,ly) + log1p(exp(-abs(lx-ly))). So the following function may be helpful: logadd <- function(x){ logspace_add <- function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx-ly))) len_x <- length(x) if (len_x > 1) { res <- logspace_add(x[1], x[2]) if (len_x > 2) { for (i in 3:len_x) res <- logspace_add(res, x[i]) } } else { res <- x } res } R> set.seed(1) R> x <- runif(50) R> lx <- log(x) R> log(1/mean(1/x)) ## logarithm of harmonic mean [1] -1.600885 R> log(length(x)) - logadd(-lx) [1] -1.600885 Cheers, Berwin
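An equivalent vectorised version (my own sketch) uses the usual log-sum-exp trick instead of pairwise folding:

```r
# log(sum(exp(lx))) computed stably by factoring out the maximum
logsumexp <- function(lx) {
  m <- max(lx)
  m + log(sum(exp(lx - m)))
}
set.seed(1)
x <- runif(50)
lx <- log(x)
log_hmean <- log(length(x)) - logsumexp(-lx)  # log of the harmonic mean
all.equal(log_hmean, log(1/mean(1/x)))
```

Factoring out the maximum keeps every exp() argument at or below zero, so nothing overflows even when the inputs on the log scale are large.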
Re: [R] Unintended loading of package:datasets
G'day David, On Sun, 10 May 2009 17:17:38 -0400 David A Vavra dava...@verizon.net wrote: The dataset package is being loaded apparently by one of the packages that I am using. The loading of the datasets takes a long time and I would like to eliminate it. I thought the datasets were effectively examples so don't understand why they would be required at all. 1) How can I determine what is causing the datasets to be loaded? help(options) and then read the part on defaultPackages. 2) How can I stop them from doing so? Create an appropriate .Rprofile. R-DownUnder had a discussion long ago on what people put into their .Rprofile, and Bill Venables' .Rprofile seems to contain the following snippet: ### puts four more packages on to the default ### search path. I use them all the time local({ old <- getOption("defaultPackages") options(defaultPackages = c(old, "splines", "lattice", "mgcv", "MASS")) }) I am using the following: Rpart, grDevices, graphics, stats, utils, methods, base. There is also an environment named 'Autoloads'. Rpart?? Or rpart?? I know of a package with the latter name but not the former, and R is case sensitive. So you may try: local({ options(defaultPackages = c("rpart", "grDevices", "graphics", "stats", "utils", "methods")) }) FWIW, note that for a command XXX, help(XXX) will give you a help page that usually has some example code at the end; that code can be run by example(XXX). Typically, such example code uses data sets from the datasets package; so if you do not load it, the example() command might not work anymore for some functions. Don't complain if your R installation doesn't work anymore if you mess around with defaultPackages, in particular if you remove packages that are usually in the default of defaultPackages. :) And I agree with Rolf (Turner), it is hard to believe that the datasets package would produce a noticeable delay on start-up; for me it also loads instantaneously. I guess that your problem is more along the lines of David's (Winsemius) guess. 
Cheers, Berwin
Re: [R] Unintended loading of package:datasets
G'day David, On Sun, 10 May 2009 21:35:30 -0400 David A Vavra dava...@verizon.net wrote: Thanks. Perhaps something else is going on. There is a large time period (about 20 sec.) after the message about loading the package. More investigation, I suppose. What R version are you using? I do not remember ever getting a message that the package datasets is loaded, nor a message for any of the other packages that are loaded by default. (There once, long ago, may have been such a message, but if so, I do not remember this anymore.) Cheers, Berwin
Re: [R] Getting lm() to work with a matrix
G'day Luc, On Wed, 20 May 2009 09:58:41 -0400 Luc Villandre villa...@dms.umontreal.ca wrote: MikSmith wrote: [...] Indeed, functions like lm() require the object fed to the data argument to be either [...] But the data argument is optional and does not need to be specified. In your situation, I suggest you typecast your matrix into a data frame using as.data.frame(). [...] My guess is that he is already working with a data frame and not with matrices, otherwise he should not have encountered problems: R> response <- matrix(rnorm(120), ncol=4) R> spectra.spec <- matrix(rnorm(900), ncol=30) R> spectra.lm <- lm(response[,3] ~ spectra.spec[,2:20]) R> spectra.lm Call: lm(formula = response[, 3] ~ spectra.spec[, 2:20]) Coefficients: (Intercept) -0.48404; spectra.spec[, 2:20]1 0.42503; spectra.spec[, 2:20]2 -0.08955; spectra.spec[, 2:20]3 -0.27605; spectra.spec[, 2:20]4 -0.16832; spectra.spec[, 2:20]5 -0.14107; spectra.spec[, 2:20]6 -0.47009; spectra.spec[, 2:20]7 -0.23672; spectra.spec[, 2:20]8 0.12920; spectra.spec[, 2:20]9 0.23306; spectra.spec[, 2:20]10 -0.28586; spectra.spec[, 2:20]11 0.03579; spectra.spec[, 2:20]12 0.10676; spectra.spec[, 2:20]13 -0.34407; spectra.spec[, 2:20]14 0.20253; spectra.spec[, 2:20]15 -0.17259; spectra.spec[, 2:20]16 0.19765; spectra.spec[, 2:20]17 0.40705; spectra.spec[, 2:20]18 -0.12448; spectra.spec[, 2:20]19 -0.17149. Cheers, Berwin
Re: [R] Constrained fits: y~a+b*x-c*x^2, with a,b,c =0
G'day Alex, On Wed, 27 May 2009 11:51:39 +0200 Alex van der Spek am...@xs4all.nl wrote: I wonder whether R has methods for constrained fitting of linear models. I am trying fm <- lm(y ~ x + I(x^2), data=dat) which most of the time does indeed give the coefficients of an inverted parabola. I know in advance that it has to be an inverted parabola with the maximum constrained to positive (or zero) values of x. The help pages for lm do not contain any info on constrained fitting. Does anyone know how to? Look at the package nnls on CRAN. According to your subject line, you are trying to solve what is known as a quadratic program, and there are at least two quadratic programming solvers (ipop in kernlab and solve.QP in quadprog) available for R. HTH. Cheers, Berwin
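For this particular model there is also a base-R sketch: nls() with the "port" algorithm accepts box constraints, so y ~ a + b*x - c*x^2 with a, b, c >= 0 can be fitted directly. The data and starting values below are made up for illustration:

```r
set.seed(1)
x <- runif(50, 0, 2)
y <- 1 + 2*x - 1.5*x^2 + rnorm(50, sd = 0.1)   # true (a, b, c) = (1, 2, 1.5)
fit <- nls(y ~ a + b*x - c*x^2,
           start = list(a = 0.5, b = 1, c = 1),
           algorithm = "port", lower = c(0, 0, 0))
coef(fit)   # all three estimates respect the nonnegativity bounds
```

A dedicated quadratic programming solver such as quadprog's solve.QP would give the exact constrained least-squares solution; the nls route is simply convenient when you want to stay within the stats package.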
Re: [R] rbinom with computed probability
G'day Sigalit, On Fri, 23 Nov 2007 11:25:30 +0200 sigalit mangut-leiba [EMAIL PROTECTED] wrote: Hello, I have a loop with probability computed from a logistic model like this: for (i in 1:300){ p[i] <- exp(-0.834+0.002*x[i]+0.023*z[i])/(1+exp(-0.834+0.002*x[i]+0.023+z[i])) } x and z are generated from a normal distribution. I get 300 different probabilities and I want to generate variables from a Bernoulli distribution with p for every observation: T[i] <- rbinom(1,1,p[i]) But I get missing values for T. What am I doing wrong? I guess the problem is that in the numerator you have 0.023*z[i] but 0.023+z[i] in the denominator. Thus, some p[i] can be outside of [0,1], which would produce NAs in T. But why a for loop? This code is readily vectorised: p <- exp(-0.834+0.002*x+0.023*z)/(1+exp(-0.834+0.002*x+0.023*z)) HTH. Cheers, Berwin
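The vectorised computation can also use plogis(), the logistic distribution function in base R, which equals exp(eta)/(1 + exp(eta)) but is numerically stable and guaranteed to stay inside (0, 1):

```r
set.seed(1)
n <- 300
x <- rnorm(n)
z <- rnorm(n)
p <- plogis(-0.834 + 0.002*x + 0.023*z)   # logistic inverse link, always in (0, 1)
y <- rbinom(n, size = 1, prob = p)        # one Bernoulli draw per observation
```

rbinom is itself vectorised over prob, so the whole loop collapses to two lines.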
Re: [R] Quadratic programming
G'day Serge, On Wed, 5 Dec 2007 11:25:41 +0100 de Gosson de Varennes Serge (4100) [EMAIL PROTECTED] wrote: I am using the quadprog package and its solve.QP routine to solve a quadratic programming problem with inconsistent constraints, which obviously doesn't work since the constraint matrix doesn't have full rank. I guess it will help to fix some terminology first. In my book, inconsistent constraints are constraints that cannot be fulfilled simultaneously, e.g. something like x_1 <= 3 and x_1 >= 5 for an obvious example. Thus, a problem with inconsistent constraints cannot be solved, regardless of the rank of the constraint matrix. (Anyway, that matrix is typically not square, so would we be talking about full column rank or full row rank?) Of course, it can happen that the constraints are consistent but that there is some redundancy in the specified constraints; a simple case would be x_1 >= 0, x_2 >= 0 and x_1 + x_2 >= 0: if the first two constraints are fulfilled, then the last one is automatically fulfilled too. In my experience, it can happen that solve.QP comes to the conclusion that a constraint that ought to be fulfilled, given the constraints that have already been enforced, is deemed to be violated and to be inconsistent with the constraints already enforced. In that case solve.QP stops, rather misleadingly, with the message that the constraints are inconsistent. I guess the package should be worked over by someone with a better understanding of the kind of fudges that do not come back to bite and of finite precision arithmetic than the original author's appreciation of such issues when the code was written. ;-)) A way to solve this is to perturb the objective function and the constraints with a parameter that changes at each iteration (so you can dismiss it), but that's where it gets tricky! solve.QP doesn't seem to admit constant terms; it needs Dmat (a matrix) and dvec (a vector) as defined in the package description. 
Now, some might object that a constant is a vector, but the problem looks like this: Min f(x) = (1/2) x^t Q x + D^t x + d. It is a bit unclear to me what you call the constant term. Is it `d'? In that case, it does not perturb the constraints and it is irrelevant for the minimiser of f(x); also for the minimiser of f(x) under linear constraints. Regardless of d, the solution is always the same. I do not know of any quadratic programming solver that allows `d' as input, probably because it is irrelevant for determining the solution of the problem. Can anyone help me, PLEASEEE? In my experience, rescaling the problem might help, i.e. use Q* = Q/2 and D* = D/2 instead of the original Q and D; but do not forget to rescale the constraints accordingly. Or you might want to try another quadratic program solver in R, e.g. ipop() in package kernlab. Hope this helps. Best wishes, Berwin
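That the constant d shifts only the value of the objective, never its argmin, is easy to check numerically (a small made-up example):

```r
# f(x) = 0.5 * x'Qx + D'x + d : the unconstrained minimiser solves Qx = -D,
# so d never enters the solution.
Q <- matrix(c(2, 0, 0, 2), 2, 2)
D <- c(-2, -4)
f <- function(x, d = 0) 0.5 * sum(x * (Q %*% x)) + sum(D * x) + d
xmin <- solve(Q, -D)            # same argmin whatever d is
f(xmin, d = 0); f(xmin, d = 7)  # objective values differ by exactly 7
```

This is exactly why solve.QP and similar solvers take only Dmat and dvec as input: the constant is recoverable afterwards by adding d to the reported objective value.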
Re: [R] Building package - tab delimited example data issue
G'day Peter, On Thu, 06 Dec 2007 11:52:46 +0100 Peter Dalgaard [EMAIL PROTECTED] wrote: If you had looked at help(data), you would have found a list of which file formats it supports and how they are read. Hint: TAB-delimited files are not among them. [...] On the other hand, Writing R Extensions has stated for a long time (and still does): The @file{data} subdirectory is for additional data files the package makes available for loading using @code{data()}. Currently, data files can have one of three types as indicated by their extension: plain R code (@file{.R} or @file{.r}), tables (@file{.tab}, @file{.txt}, or @file{.csv}), or @code{save()} images (@file{.RData} or @file{.rda}). Now in my book, .csv files contain comma-separated values, .tab files contain values separated by TABs, and .txt files are pure text files, presumably values separated by any kind of white space. Thus, I think the expectation that TAB-delimited file formats should work is not unreasonable; I was bitten by this long ago too. Then I realised that the phrase 'one of three types' should probably be interpreted as implying that .tab, .txt and .csv files are all of the same type and, apparently, should contain values separated by whitespace. I admit that I never tested whether .csv files would lead to the same problems as TAB-delimited .tab files. Rather, I decided in the end that the safest option, i.e. to avoid misleading file extensions, would be to use .rda files in the future. 
Cheers, Berwin
Re: [R] lm/model.matrix confusion (? bug)
G'day Mark, On Wed, 12 Dec 2007 02:05:54 -0800 (PST) Mark Difford [EMAIL PROTECTED] wrote: In order to get the same coefficients as we get from the following [...] we need to do the following (if we use model.matrix to specify the model) But why would you want to do this? ## summary( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data = whiteside) ) That is, we need to take out two intercepts. Is this correct? Yes. Shouldn't lm check to see if an intercept has been requested as part of the model formula? No, it does not. In the Details section of lm's help page you will find the following: A formula has an implied intercept term. To remove this use either 'y ~ x - 1' or 'y ~ 0 + x'. See 'formula' for more details of allowed formulae. Thus, unless you explicitly ask for the constant term not to be included, lm will add a constant term (a column of ones) in addition to whatever you have specified on the right hand side of the formula. If I do ## summary(lm(as.formula(Gas ~ model.matrix(~ Insul/Temp - 1, data=whiteside)), data=whiteside)) we get a strange model. Well, you get a model in which not all parameters are identifiable, and a particular parameter that is not identifiable is estimated as NA. I am not sure what is strange about this. But the formula part of this model qualifies as a valid formula ## as.formula(Gas ~ model.matrix(~ Insul/Temp - 1, data=whiteside)) Debatable; the above command only shows that it can be coerced into a valid formula. :) just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside) But we know that the _correct_ formula is the following ## as.formula(Gas ~ model.matrix(~ Insul/Temp - 1, data=whiteside) - 1) Why is this formula any more correct than the other one? Both specify exactly the same model. It is just that one does it in an overparameterised way. 
(Sorry, this is getting really long) --- So, my question/confusion comes down to wanting to know why lm() doesn't check to see if an intercept has been specified when the model has been specified using model.matrix. Because lm() is documented not to check this. If you do not want to have an intercept in the model, you have to specifically ask for that. Also, comparing the output of summary( lm(Gas ~ Insul/Temp - 1, data = whiteside) ) and summary( lm(Gas ~ Insul/Temp, data = whiteside) ) you can see that lm() does not check whether there is an implicit intercept in the model. Compare the (adjusted) R-squared values returned: one case uses the formula for models with no intercept, the other the formula for models with an intercept. Similar story with the reported F-statistics. Cheers, Berwin
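The point about the two R-squared baselines can be seen directly with the whiteside data (assuming the recommended MASS package is installed): both formulas produce identical fitted values, yet summary() reports different R-squared values because the no-intercept formula measures against an uncentred total sum of squares:

```r
library(MASS)                                   # whiteside data used in the thread
fit0 <- lm(Gas ~ Insul/Temp - 1, data = whiteside)
fit1 <- lm(Gas ~ Insul/Temp,     data = whiteside)
all.equal(fitted(fit0), fitted(fit1))           # same model, same fit
c(no_intercept   = summary(fit0)$r.squared,     # baseline sum(y^2)
  with_intercept = summary(fit1)$r.squared)     # baseline centred TSS
```

Since the two design matrices span the same column space, the residuals agree exactly; only the reported summary statistics differ.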
Re: [R] Looking for simpler solution to probabilistic question
G'day Rainer, On Tue, 15 Jan 2008 14:24:08 +0200 Rainer M Krug [EMAIL PROTECTED] wrote: I have two processes which take, with a certain probability (p1 and p2), x number of years to complete (age1 and age2). As soon as the first process is completed, the second one begins. I want to calculate the time it takes for both processes to be completed. I have the following script which gives me the answer, but I think there must be a more elegant way of doing the calculations between the marked lines. The code between the markers, together with the first line after the second marker, could be shortened to: ager <- range(age1) + range(age2); ager <- ager[1]:ager[2]; pp1 <- c(cumsum(p1), rev(cumsum(rev(p1)))); pp2 <- c(cumsum(p2[-21]), rev(cumsum(rev(p2)))[-1]); pr <- pp1 + pp2; pr <- pr/sum(pr); all.equal(p, pr) [1] TRUE; all.equal(age, ager) [1] TRUE. Whether this is more elegant is probably in the eye of the beholder, but it should definitely use less memory. :) BTW, I am intrigued: in which situation does this problem arise? The time it takes the second process to finish seems to depend in a curious way on the time it took the first process to complete. Cheers, Berwin
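If the two stage lengths were independent, the distribution of the total would just be a discrete convolution, which convolve() computes directly (the probabilities below are made up for illustration):

```r
p1 <- c(0.2, 0.5, 0.3)   # hypothetical P(stage 1 takes 1, 2 or 3 years)
p2 <- c(0.6, 0.4)        # hypothetical P(stage 2 takes 1 or 2 years)
ptot <- convolve(p1, rev(p2), type = "open")   # P(total = 2, 3, 4, 5 years)
ptot
sum(ptot)                # a proper distribution again
```

convolve(p1, rev(p2), type = "open") is the standard idiom for an ordinary (non-circular) convolution of two probability vectors; the result has length length(p1) + length(p2) - 1.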
Re: [R] lapply without return values?
G'day Rainer, On Fri, 25 Jan 2008 10:34:32 +0200 Rainer M Krug [EMAIL PROTECTED] wrote: [...] p <- data.frame(runif(10), runif(10), runif(10)) lapply( p, function(ps) {x11(); plot(ps)} ) which results in three graphs and a printout: [...] How can I avoid this printout without using tmp <- lapply( p, function(ps) {x11(); plot(ps)} )? ?invisible as in invisible(lapply( p, function(ps) {x11(); plot(ps)} )) Note, your solution seems to involve fewer keystrokes but has the disadvantage of creating an object in your workspace. Of course, you could always do something like: ilapply <- function(...) invisible(lapply(...)) ## perhaps better: ## ilapply <- function(X, FUN, ...) invisible(lapply(X, FUN, ...)) ilapply(p, function(ps) {x11(); plot(ps)}) to save keystrokes in the long run. :) HTH. Cheers, Berwin
Re: [R] primitives again
G'day Edna, On Sat, 14 Mar 2009 22:52:38 -0500 Edna Bell edna.bel...@gmail.com wrote: Dear R Gurus: Well, I guess I can answer nevertheless. :) How do I find the functions which are primitives, please? ?is.primitive Thus, the following code would give you all the primitive functions in package base: R> pos <- "package:base" R> uu <- ls(pos=pos, all=TRUE) R> tt <- sapply(uu, function(x) is.primitive(get(x, pos=pos))) R> rr <- uu[tt] R> rr You may want to search other packages too, or modify the code so that you loop over your search path and concatenate the results from the various locations. HTH. Cheers, Berwin
Re: [R] builtin vs. closure
G'day Edna, On Sat, 14 Mar 2009 23:00:12 -0500 Edna Bell edna.bel...@gmail.com wrote: Anyway, my question now is: what determines if a function is a builtin vs. a closure, please? The help page of typeof states that: The possible values are listed in the structure 'TypeTable' in 'src/main/util.c'. Current values are [...] 'closure' (function) 'special' and 'builtin' (basic functions and operators). Which might raise the question what basic functions are. But since basic and primitive are synonyms, it is not hard to guess that primitive functions return special or builtin; and the help page of ?is.primitive confirms this. Note, a return value of special is possible for a primitive function: R> UseMethod function (generic, object) .Primitive("UseMethod") R> typeof(UseMethod) [1] "special" HTH. Cheers, Berwin
Re: [R] primitives again
On Sun, 15 Mar 2009 14:23:40 +0100 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

  Edna Bell wrote:
    How do I find the functions which are primitives, please?

  you can scan the whole search path for functions that are primitives:

    primitives = sapply(search(), function(path)
        with(as.environment(path),
             Filter(is.primitive, lapply(ls(), get))))

  primitives is a list of named lists of primitives, one sublist for
  each attached package (most sublists will be empty, i guess).

The code above will miss some primitives in package:base, namely those that start with a dot:

R> primitives <- sapply(search(), function(path)
+      with(as.environment(path),
+           Filter(is.primitive, lapply(ls(), get))))
R> sapply(primitives, length)
       .GlobalEnv     package:stats  package:graphics package:grDevices
                0                 0                 0                 0
    package:utils  package:datasets   package:methods         Autoloads
                0                 0                 2                 0
     package:base
              176
R> primitives <- sapply(search(), function(path)
+      with(as.environment(path),
+           Filter(is.primitive, lapply(ls(all=TRUE), get))))
R> sapply(primitives, length)
       .GlobalEnv     package:stats  package:graphics package:grDevices
                0                 0                 0                 0
    package:utils  package:datasets   package:methods         Autoloads
                0                 0                 2                 0
     package:base
              188

Also, but that is a matter of taste, it could be preferable to use sapply instead of lapply:

R> primitives$`package:methods`
[[1]]
function (expr) .Primitive("quote")
[[2]]
.Primitive("[[<-")
R> head(primitives$`package:base`)
[[1]]
function (x) .Primitive("!")
[[2]]
function (e1, e2) .Primitive("!=")
[[3]]
.Primitive("$")
[[4]]
.Primitive("$<-")
[[5]]
function (e1, e2) .Primitive("%%")
[[6]]
function (x, y) .Primitive("%*%")
R> primitives <- sapply(search(), function(path)
+      with(as.environment(path),
+           Filter(is.primitive, sapply(ls(all=TRUE), get))))
R> primitives$`package:methods`
$Quote
function (expr) .Primitive("quote")
$`el<-`
.Primitive("[[<-")
R> head(primitives$`package:base`)
$`!`
function (x) .Primitive("!")
$`!=`
function (e1, e2) .Primitive("!=")
$`$`
.Primitive("$")
$`$<-`
.Primitive("$<-")
$`%%`
function (e1, e2) .Primitive("%%")
$`%*%`
function (x, y) .Primitive("%*%")

Cheers, Berwin
Re: [R] .Internal
G'day Kevin, On Wed, 18 Mar 2009 21:46:51 -0700 rkevinbur...@charter.net wrote:

  I was trying to find source for optimize and I ran across

  function (f, interval, ..., lower = min(interval), upper = max(interval),
      maximum = FALSE, tol = .Machine$double.eps^0.25)
  {
      if (maximum) {
          val <- .Internal(fmin(function(arg) -f(arg, ...), lower, upper, tol))
          list(maximum = val, objective = f(val, ...))
      }
      else {
          val <- .Internal(fmin(function(arg) f(arg, ...), lower, upper, tol))
          list(minimum = val, objective = f(val, ...))
      }
  }

  Then I did a search for fmin and I came up with:

  /* fmin(f, xmin, xmax, tol) */
  SEXP attribute_hidden do_fmin(SEXP call, SEXP op, SEXP args, SEXP rho)

  So my question is where do I find the intermediary step between
  .Internal(fmin(function(arg) f(arg, ...), lower, upper, tol)) and
  SEXP attribute_hidden do_fmin(SEXP call, SEXP op, SEXP args, SEXP rho)?

@Article{Rnews:Ligges:2006,
  author  = {Uwe Ligges},
  title   = {{R} {H}elp {D}esk: {Accessing} the Sources},
  journal = {R News},
  year    = 2006,
  volume  = 6,
  number  = 4,
  pages   = {43--45},
  month   = {October},
  url     = {http://CRAN.R-project.org/doc/Rnews/Rnews_2006-4.pdf}
}

  The number of arguments doesn't match up. I am guessing that lower
  and upper somehow get merged into the args. And rho is 'tol'. Right?

Unlikely. In Writing R Extensions (and the functions I looked up), 'rho' usually denotes an environment that is used to evaluate expressions in. Typically (i.e. in cases that I had need to look at), all arguments are rolled into the SEXP args for internal functions.

HTH.

Cheers, Berwin
Re: [R] If statement generates two outputs
G'day Carl, On Mon, 23 Mar 2009 20:11:19 -0400 Carl Witthoft c...@witthoft.com wrote:

  From: Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk_at_idi.ntnu.no
  Date: Sun, 22 Mar 2009 22:58:49 +0100

    just for fun, you could do this with multiassignment, e.g., using
    the (highly experimental and premature!) rvalues:

    source('http://miscell.googlecode.com/svn/rvalues/rvalues.r')
    if (TRUE) c(df1, df2) := list(4:8, 9:13)
    dput(df1) # 4:8
    dput(df2) # 9:13

  --- Now THAT's what I call an overloaded operator! ^_^ But
  seriously: can someone explain to me what's going on in the
  rvalues.r code? I tried a simple experiment, replacing := with colec
  in the code, and of course the line c(df1, df2) colec list(4:8, 9:13)
  just gives me a syntax error response. Clearly I need a pointer to
  some documentation about how the colon and equals sign get special
  treatment somewhere inside R.

Not sure why := gets special treatment; perhaps because it is not a valid name and, hence, the parser deduces that it is an operator? IIRC, the traditional way to define your own operator is to bracket the name with percent signs, i.e. replacing := by %colec% and then issuing the command c(df1, df2) %colec% list(4:8, 9:13) will work.

Cheers, Berwin
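A minimal sketch of such a %-operator doing the same destructuring assignment (the name `%colec%` and this implementation are illustrative; this is not the rvalues.r code):

```r
## Hypothetical %colec% operator: assign each element of the right-hand
## list to the corresponding (unevaluated) name on the left-hand side.
`%colec%` <- function(targets, values) {
  vars <- as.list(substitute(targets))[-1L]   # drop the c() head
  for (i in seq_along(vars))
    assign(as.character(vars[[i]]), values[[i]], envir = parent.frame())
  invisible(values)
}

c(df1, df2) %colec% list(4:8, 9:13)
df1   # 4:8
df2   # 9:13
```

The trick is that `targets` is never evaluated: `substitute()` captures the call `c(df1, df2)` as a parse tree, so `df1` and `df2` need not exist beforehand.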
Re: [R] .Internal
G'day Kevin, On Mon, 23 Mar 2009 18:48:16 -0400 rkevinbur...@charter.net wrote:

  Sorry to be so dense but the article that you suggest does not give
  any information on how the arguments are packed up. I look at the call:

  val <- .Internal(fmin(function(arg) -f(arg, ...), lower, upper, tol))

  and then with the help of this article I find do_fmin in optimize.c:

  SEXP attribute_hidden do_fmin(SEXP call, SEXP op, SEXP args, SEXP rho)

Sorry for not being clear enough. Yes, the article tells you how to find out that do_fmin is (eventually) called when you call optimize on the command line. I thought that this was one of your questions.

  Again there doesn't seem to be any correspondence between lower,
  upper, tol and the arguments to do_fmin. So I am missing a step.

As far as I know, .Internal has the same interface as .External. So a study of Writing R Extensions should give insight regarding this step. In particular Chapter 5 (System and foreign language interfaces) - Section 5.10 (Interface functions .Call and .External) - Section 5.10.2 (Calling .External). Essentially, all arguments to a C function called via .Internal or .External are passed as a single SEXP; this allows an unlimited number of arguments to be passed to a C function, as all other interfaces to native routines (.C, .Call, .Fortran) have some limit, albeit a rather generous one, on the number of arguments that can be passed to the native routine. I believe you can think of that single SEXP as a list containing the individual arguments. Accessing those arguments one by one involves macros with names like CDR, CAR, CADR, CADDR, CADDDR, CAD4R and so forth. As I understand it, if you are fluent in Lisp (Scheme?) and how components of a list are referred to in that language, then you have no problems with understanding the names of those macros. Since I never became comfortable with those languages, I restrict myself to .C and .Call; YMMV.

HTH.
Cheers, Berwin
Re: [R] Converting a Matrix to a Vector
G'day Ken, On Wed, 25 Mar 2009 00:13:48 -0700 (PDT) Ken-JP kfmf...@gmail.com wrote:

  Say I have:

  set.seed( 1 )
  m <- matrix( runif(5^2), nrow=5,
               dimnames = list( c("A","B","C","D","E"), c("O","P","Q","R","S") ) )
  m
            O          P         Q         R         S
  A 0.2655087 0.89838968 0.2059746 0.4976992 0.9347052
  B 0.3721239 0.94467527 0.1765568 0.7176185 0.2121425
  C 0.5728534 0.66079779 0.6870228 0.9919061 0.6516738
  D 0.9082078 0.62911404 0.3841037 0.3800352 0.121
  E 0.2016819 0.06178627 0.7698414 0.7774452 0.2672207

  --- I want to create a vector v from matrix m that looks like this:
  A.O 0.2655087
  B.O 0.3721239
  ...
  v <- as.vector( m ) almost gives me what I want, but then I need to
  take combinations of colnames( m ) and rownames( m ) to get my
  labels and hope they match up in order: if not, manipulate the
  order. This approach feels kludgy... Is this the right approach or
  is there a better way?

R> tt <- reshape(data.frame(m), direction="long", varying=list(1:5),
+                ids=rownames(m), times=colnames(m))
R> ind <- names(tt) %in% c("time", "id")
R> uu <- tt[,!ind]
R> names(uu) <- rownames(tt)
R> uu
       A.O        B.O        C.O        D.O        E.O        A.P        B.P
0.26550866 0.37212390 0.57285336 0.90820779 0.20168193 0.89838968 0.94467527
       C.P        D.P        E.P        A.Q        B.Q        C.Q        D.Q
0.66079779 0.62911404 0.06178627 0.20597457 0.17655675 0.68702285 0.38410372
       E.Q        A.R        B.R        C.R        D.R        E.R        A.S
0.76984142 0.49769924 0.71761851 0.99190609 0.38003518 0.77744522 0.93470523
       B.S        C.S        D.S        E.S
0.21214252 0.65167377 0.1210     0.26722067

HTH.

Cheers, Berwin
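The as.vector() route the poster sketches can in fact be completed without any order-guessing: as.vector(m) flattens column-major, and outer() builds the labels in exactly that order, so the two line up automatically. A sketch (using a small deterministic matrix rather than the poster's runif data):

```r
## Label as.vector(m) via outer(): both run column-major, so the
## names and values line up without any manual reordering.
m <- matrix(1:25, nrow = 5,
            dimnames = list(c("A","B","C","D","E"), c("O","P","Q","R","S")))

v <- as.vector(m)
names(v) <- as.vector(outer(rownames(m), colnames(m), paste, sep = "."))

v["B.O"]   # same as m["B", "O"]
```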
Re: [R] Concerned about Interfacing R with Fortran
G'day Maura, On Thu, 26 Mar 2009 18:21:01 +0100 mau...@alice.it wrote:

  I am reading the manual sections illustrating how to call a Fortran
  subroutine from R. I feel uneasy at the explicit statement about the
  .Fortran interface working with Fortran 77. I would like to call a
  Fortran-90 subroutine from my R script. Is that supported at all?

Read the complete manual. :) It is pretty easy to use acroread, or other PDF readers, to search for Fortran in the PDF file; the HTML version should also be searchable from your browser. In chapter 1 (page 7 of the PDF 2.8.1 version) of the Writing R Extensions manual, you will find:

  [...] providing support for C, C++, FORTRAN 77, Fortran 95@footnote{Note
  that Ratfor is not supported. If you have Ratfor source code, you need
  to convert it to FORTRAN. Only FORTRAN-77 (which we write in upper case)
  is supported on all platforms, but most also support Fortran-95 (for
  which we use title case). If you want to ship Ratfor source files,
  please do so in a subdirectory of @file{src} and not in the main
  subdirectory.}, Objective C [...]

and later in chapter 1, there is a complete section (namely 1.2.3) on F95 code:

  @subsection Using F95 code
  @R{} currently does not distinguish between FORTRAN 77 and Fortran 90/95
  code, and assumes all FORTRAN comes in source files with extension
  @file{.f}. Commercial Unix systems typically use an F95 compiler, but
  only since the release of @code{gcc 4.0.0} in April 2005 have Linux and
  other non-commercial OSes had much support for F95. Only with @R{} 2.6.0
  did the Windows port adopt a Fortran 90 compiler. This means that
  portable packages need to be written in correct FORTRAN 77, which will
  also be valid Fortran 95. See
  @uref{http://developer.r-project.org/Portability.html} for reference
  resources. In particular, @emph{free source form} F95 code is not
  portable. On some systems an alternative F95 compiler is available:
  from the @code{gcc} family this might be @command{gfortran} or
  @command{g95}. Configuring @R{} will try to find a compiler which (from
  its name) appears to be a Fortran 90/95 compiler, and set it in macro
  @samp{FC}. Note that it does not check that such a compiler is fully
  (or even partially) compliant with Fortran 90/95. Packages making use
  of Fortran 90/95 features should use file extension @file{.f90} or
  @file{.f95} for the source files: the variable @code{PKG_FCFLAGS}
  specifies any special flags to be used. There is no guarantee that
  compiled Fortran 90/95 code can be mixed with any other type of code,
  nor that a build of @R{} will have support for such packages.

Section 5.5 (Creating shared objects) also mentions Fortran 9x:

  Shared objects for loading into @R{} can be created using
  @command{R CMD SHLIB}. This accepts as arguments a list of files which
  must be object files (with extension @file{.o}) or sources for C, C++,
  FORTRAN 77, Fortran 9x, Objective C or Objective C++ (with extensions
  @file{.c}, @file{.cc} or @file{.cpp} or @file{.C}, @file{.f},
  @file{.f90} or @file{.f95}, @file{.m}, and @file{.mm} or @file{.M},
  respectively), or commands to be passed to the linker. See
  @kbd{R CMD SHLIB --help} (or the @R{} help for @code{SHLIB}) for usage
  information.

Thus, it seems that calling Fortran 90 code from R is possible on some platforms and, presumably, on those where it is possible this is done via the .Fortran interface; although the Writing R Extensions manual does not seem to say so explicitly. OTOH, the help file for .Fortran states:

  Use '.Fortran' with care for compiled Fortran 9x code: it may not work
  if the Fortran 9x compiler used differs from the Fortran compiler used
  when configuring R, especially if the subroutine name is not lower-case
  or includes an underscore.

HTH.
Cheers, Berwin
Re: [R] Bug? FORTRAN help
G'day Kevin, On Thu, 26 Mar 2009 13:42:20 -0700 rkevinbur...@charter.net wrote:

  I was feeling masochistic the other day [...]

Welcome to the club. :)

  and we have been having some weird memory problems so I started
  digging into the source for L-BFGS-B. In the lbfgsb.c file I see the
  following code:

  /* Cholesky factorization of (2,2) block of wn. */
  F77_CALL(dpofa)(&wn[*col + 1 + (*col + 1) * wn_dim1], m2, col, info);
  if (*info != 0) {
      *info = -2;
      return;
  }

  If I am not mistaken this says that there is a m2 * col matrix that
  starts at wn[*col + 1 + (*col + 1) * wn_dim1], where wn_dim1 is 2 * m.

I think your interpretation is not quite correct. Note that it only makes sense to calculate a Cholesky factorization of a square matrix. The interface of dpofa (in the linpack library) is available at:

  http://www.netlib.org/linpack/dpofa.f

Thus, the call above says: calculate the Cholesky factorization of a col * col matrix whose (1,1) element is stored at &wn[*col + 1 + (*col + 1) * wn_dim1], and that matrix is stored within a matrix which was allocated such that it has m2 rows. Or, in other words, calculate the Cholesky factorization of a col * col matrix whose (1,1) element is stored at &wn[*col + 1 + (*col + 1) * wn_dim1], and to move from the (1,1) element to the (1,2) element you have to move to the memory location m2*sizeof(double) ahead/behind of (1,1). Fortran uses a column-major form to store arrays, i.e. element (1,1) is followed by element (2,1), (3,1) and so forth. To know where to find element (1,2) of the matrix, you have to tell Fortran with how many rows the big matrix that holds your matrix was allocated.

  I am worried that the optimizer will silently write into memory that
  it shouldn't [...]

If you are worried about such issues, you should read chapter 4 of Writing R Extensions, in particular Section 4.3 on gctorture and valgrind. Then run R on a platform that supports valgrind. It is very useful to catch problems such as accessing or writing into memory that you should not access or write to.

HTH.
Cheers, Berwin
Re: [R] [OT ?] rant (was: Re: Conversions From standard to metric units)
G'day Murray, On Fri, 3 Apr 2009 20:01:30 -0400 Murray Cooper myrm...@earthlink.net wrote:

  For science yes. For pleasure I'll still take a pint instead of 570ml!

And you might experience a disappointment then if you order it in the US, where the pint is apparently 450ml; at least, I won several bets on whether a pint is less or more than half a litre against American visitors. The stake being, of course, the next round of drinks. :) And, if memory serves correctly, here in Singapore I once ordered a pint of draught beer and was served half a litre... Confused me completely... That's when I thought that the metric system is so much safer, until I ordered my next beer: according to the menu a bottled beer of 400ml, it came in a 333ml bottle...

Cheers, Berwin
Re: [R] Eclipse and StatET Howto (also added Subversion, Rtools)
G'day Dirk, On Sat, 4 Apr 2009 20:27:22 -0500 Dirk Eddelbuettel e...@debian.org wrote:

  On 4 April 2009 at 14:37, Ken-JP wrote:
    Yes, I have x11-common installed, and dpkg -S /etc/X11/rgb.txt
    shows not found for me. This is on Ubuntu 8.10 amd64.

  Same 8.10 for both amd64 and i386 where I checked -- both have the
  file. It could be a leftover from an earlier install of mine, or an
  accidental deletion at your end.

My guess is that it is the former. :) Some time ago I wiped Debian off my laptop and installed Kubuntu 8.10 fresh (i386 flavour); so there are no leftovers from previous versions, and on my laptop the results are:

ber...@berwin5:~$ apt-file find /etc/X11/rgb.txt
ber...@berwin5:~$ dpkg -S /etc/X11/rgb.txt
dpkg: /etc/X11/rgb.txt not found.

Cheers, Berwin
Re: [R] Eclipse and StatET Howto (also added Subversion, Rtools)
G'day Peter, On Sun, 05 Apr 2009 11:26:40 +0200 Peter Dalgaard p.dalga...@biostat.ku.dk wrote:

  Berwin A Turlach wrote:
    G'day Dirk, On Sat, 4 Apr 2009 20:27:22 -0500 Dirk Eddelbuettel
    e...@debian.org wrote:
      On 4 April 2009 at 14:37, Ken-JP wrote:
        Yes, I have x11-common installed, and dpkg -S /etc/X11/rgb.txt
        shows not found for me. This is on Ubuntu 8.10 amd64.
      Same 8.10 for both amd64 and i386 where I checked -- both have
      the file. It could be a leftover from an earlier install of mine,
      or an accidental deletion at your end.
    My guess is that it is the former. :) Some time ago I wiped Debian
    off my laptop and installed Kubuntu 8.10 fresh (i386 flavour); so
    there are no leftovers from previous versions, and on my laptop the
    results are:
    ber...@berwin5:~$ apt-file find /etc/X11/rgb.txt
    ber...@berwin5:~$ dpkg -S /etc/X11/rgb.txt
    dpkg: /etc/X11/rgb.txt not found.

  Hum.. Fedora 9 doesn't have it either. It does have
  /usr/share/X11/rgb.txt though, so please check whether it has moved.

Apparently it has:

ber...@berwin5:~$ apt-file find rgb.txt
[...]
x11-common: /usr/share/X11/rgb.txt
[...]

However,

ber...@berwin5:~$ ls -l /usr/share/X11/rgb.txt
lrwxrwxrwx 1 root root 16 Jan 14 03:28 /usr/share/X11/rgb.txt -> /etc/X11/rgb.txt
ber...@berwin5:~$ more /usr/share/X11/rgb.txt
/usr/share/X11/rgb.txt: No such file or directory

Cheers, Berwin
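What the transcript shows is a dangling symlink: the link shipped by x11-common survives while its target /etc/X11/rgb.txt is gone, so ls -l still lists it but reading through it fails. The effect is easy to reproduce with throwaway paths (a sketch, unrelated to the actual package files):

```shell
# A symlink whose target is missing still shows up in ls -l,
# but any attempt to read through it fails -- the rgb.txt situation.
tmp=$(mktemp -d)
ln -s "$tmp/gone.txt" "$tmp/rgb.txt"
ls -l "$tmp/rgb.txt"                              # lrwxrwxrwx ... -> .../gone.txt
test -L "$tmp/rgb.txt" && echo "link exists"
test -e "$tmp/rgb.txt" || echo "target missing (dangling)"
rm -rf "$tmp"
```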
Re: [R] Code of sample() in C
G'day Ranjan, On Sun, 5 Apr 2009 17:00:56 -0500 Ranjan Maitra mai...@iastate.edu wrote:

  [...] I haven't tried this function out much myself, so YMMV.

There is an obvious problem in this code since len is not decreased but n is when an index is sampled. The last line in the last for loop should be x[j] = x[--len]; Otherwise this algorithm will produce samples in which an index can be repeated. And the probabilities with which an index is sampled would be non-trivial to determine; they would definitely not be uniform.

HTH.

Cheers, Berwin

#include <stdlib.h>
#ifndef USING_RLIB
#define MATHLIB_STANDALONE 1
/* It is essential to have this before the call to the Rmath's header
   file because this decides the definitions to be set. */
#endif
#include <Rmath.h>
/* Note that use of this function involves a prior call to the Rmath
   library to get the seeds in place. It is assumed that this is done
   in the calling function. */
/* Equal probability sampling; without-replacement case */
/* Adapted from the R function called SampleNoReplace */
/**
 * Stores k out of n indices sampled at random without replacement
 * in y.
 */
int srswor(int n, int k, int *y)
{
    if (k > n) {
        return 1;
    } else {
        const double len = (double) n;
        int i;
        int* x = malloc(n * sizeof(int));
        if (!x) {
            return 2;
        }
        for (i = 0; i < n; ++i)
            x[i] = i;
        for (i = 0; i < k; ++i) {
            const int j = (int)(len * runif(0.0, 1.0));
            y[i] = x[j];
            x[j] = x[--n];
        }
        free(x);
    }
    return 0;
}
Re: [R] Unable to re-import a table that was just exported to a file
On Sun, Apr 26, 2009 at 7:38 PM, Nigel Birney na...@cam.ac.uk wrote:

  Hi all, I am saving a program's output to a file to be read by
  another algorithm. But somehow such a simple operation (the reading)
  fails. I get:

  Error in read.table("a_corr_data.txt", sep = ",", col.names = T,
  row.names = F) : more columns than column names

  Here is the write statement:

  write.table(a_corr_data, "a_corr_data.txt", sep=",", col.names=T, row.names=F)

  Here is the read statement:

  a_corr_data <- read.table("a_corr_data.txt", sep=",", col.names=T, row.names=F)

  Nothing happens in-between (these actions are just 10-30 secs
  apart). I tried to export/import without col.names, also tried
  different delimiters (\t) but the same error pops up again and
  again. I am already quite unhappy with this.

?read.table

You might find that you are using col.names and row.names wrongly in the read.table command.

Best wishes, Berwin
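The hint in ?read.table: on the reading side, col.names expects a vector of column names and row.names a vector (or column index), so passing T/F there misfires; the counterpart of write.table's col.names=TRUE is header=TRUE. A minimal round trip (file name via tempfile() rather than the poster's path):

```r
## write.table(col.names=TRUE) pairs with read.table(header=TRUE)
a_corr_data <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5))
f <- tempfile(fileext = ".txt")
write.table(a_corr_data, f, sep = ",", col.names = TRUE, row.names = FALSE)

back <- read.table(f, sep = ",", header = TRUE)  # not col.names=T, row.names=F
```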
Re: [R] definition of a function
G'day George, On Wed, 29 Oct 2008 06:21:37 +0200 KARAVASILIS GEORGE [EMAIL PROTECTED] wrote:

  Hello, R subscribers. I would like to define a function like this
  one: Fun <- function(x) sin(x)/x with value 1 at x=0. How can I do
  that?

Fun <- function(x) ifelse(x==0, 1, sin(x)/x)

  Is there a way to plot it symbolically, without using x as a vector?
  e.g. x <- seq(from=-10, to=10, length=100) plot(x, Fun(x))

curve(Fun, from=-10, to=10, n=100)

HTH.

Cheers, Berwin
Re: [R] odd behaviour of identical
On Sat, 01 Nov 2008 22:57:38 +0100 Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

  Patrick Burns wrote:
    Wacek Kusnierczyk wrote:
      smells bad design.
    Nonsense.
  not really, i'm afraid. [...] to the point:

  is.integer(1)    # FALSE
  is.integer(1:1)  # TRUE

  is not particularly appealing as a design, though it can be defended
  along the line that : uses 1 as the increase step, thus if it starts
  at an integer, the vector contains integers (just like it says in
  help(":")). the problem here is that 1:1 is *obviously* an integer
  vector, while 1 is *obviously* a number vector, but *obviously* not
  an integer vector. do a poll on how obvious it is to r's users, i'd
  bet you lose.

Probably you are right, but the set of useRs would be the wrong reference base. Most useRs are using R for the purpose it was designed for; namely, statistical analyses. And I cannot remember any statistical analysis that I have ever done where it mattered whether a number was stored as an integer or not; or whether is.integer() returned FALSE or TRUE. I can see a use for is.integer() only when you use R for programming. Personally, the only use I can see for is.integer() is for checking that a user has not by accident passed a vector stored in integer mode to a routine in which I pass this vector down to C or FORTRAN code that expects double or DOUBLE PRECISION, respectively. But in such a situation, rather than testing with is.integer(), I typically just use storage.mode() to ensure the proper storage mode. In summary, if one uses R as a programming language then, as with any other programming language, one should become familiar with the language and its idiosyncrasies; and perhaps also with some features of binary computing and the constraints imposed by these features. In my experience, every language has idiosyncrasies that one has to get used to (and which one may or may not consider design flaws). As the saying goes, a good handyman does not blame her/his tools.
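The storage.mode() idiom mentioned above looks like this in practice (a small sketch of the pattern, not code from the thread):

```r
## Ensure a vector is stored as double before it goes down to
## C (.C) or Fortran (.Fortran) code expecting double precision.
x <- 1:5
storage.mode(x)               # "integer"
storage.mode(x) <- "double"   # coerce in place; the values are unchanged
storage.mode(x)               # "double"
```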
On Sun, Oct 26, 2008 at 6:39 PM, Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

  given what ?identical says, i find the following odd:

Given that the example section of the help page for identical starts with:

  Examples:
  identical(1, NULL)          ## FALSE -- don't try this with ==
  identical(1, 1.)            ## TRUE in R (both are stored as doubles)
  identical(1, as.integer(1)) ## FALSE, stored as different types

I find it odd that you find the following odd. :)

  x = 1:10
  y = 1:10
  all.equal(x,y)
  [1] TRUE
  identical(x,y)
  [1] TRUE
  y[11] = 11
  y = y[1:10]

Vectors can only hold objects of one type; and from the previous discussion and example you know that 11 has storage mode double. So you should not be surprised that all elements of y are promoted to storage mode double. And if you think that R's promotion rules are hard to understand, try to figure out the rules by which C promotes objects of different types in expressions; in particular if some objects are unsigned integers.

  all.equal(x,y)
  [1] TRUE
  identical(x,y)
  [1] FALSE
  y
   [1]  1  2  3  4  5  6  7  8  9 10
  length(y)
  [1] 10

  looks like a bug.

  platform       i686-pc-linux-gnu
  arch           i686
  os             linux-gnu
  system         i686, linux-gnu
  status
  major          2
  minor          7.0
  year           2008
  month          04
  day            22
  svn rev        45424
  language       R
  version.string R version 2.7.0 (2008-04-22)

Cheers, Berwin
Re: [R] chi square table
G'day Cruz, On Fri, 7 Nov 2008 09:47:47 +0800 cruz [EMAIL PROTECTED] wrote:

  Hi, How do we get the value of a chi square as we usually look up on
  the table in our text book? i.e. Chi-square(0.01, df=8), the text
  book table gives 20.090

  dchisq(0.01, df=8)
  [1] 1.036471e-08
  pchisq(0.01, df=8)
  [1] 2.593772e-11
  qchisq(0.01, df=8)
  [1] 1.646497

  None of them gives me 20.090.

The value that your textbook denotes, presumably, with chi^2_0.01 (or some similar notation) is in fact the 0.99 quantile of the chi-square distribution; which R readily calculates:

R> qchisq(0.99, df=8)
[1] 20.09024

rant on That's the problem with introductory textbooks whose authors think they do the students a favour by using notation such as z_alpha, z_0.01, z_(alpha/2) instead of z_(1-alpha), z_0.99, z_(1-alpha/2), respectively. In my opinion this produces in the long run only more confusion and does not help students at all. It just panders to the intellectual laziness of (some) students and shows a great deal of confusion on the side of the authors. I would look for another textbook. rant off

Cheers, Berwin
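The relationship between the textbook's upper-tail notation and R's quantile function can also be expressed directly with the lower.tail argument (a small illustration):

```r
## The textbook's "chi^2_0.01" is the upper-tail 0.01 point,
## i.e. the 0.99 quantile; lower.tail = FALSE says so directly.
qchisq(0.99, df = 8)                       # 20.09024
qchisq(0.01, df = 8, lower.tail = FALSE)   # the same value
```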
Re: [R] Variable passed to function not used in function in select=... in subset
G'day Duncan, On Tue, 11 Nov 2008 09:37:57 -0500 Duncan Murdoch [EMAIL PROTECTED] wrote:

  I think this tension is a fundamental part of the character of S and
  R. But it is also fundamental to R that there are QC tests that
  apply to code in packages: so writing new tests that detect
  dangerous usage (e.g. to disallow partial name matching) would be
  another way to improve reliability. [...]

Please, no. :) After years of using R, it is now second nature to me to type (yes, I always spell out from and to) seq(from=xx, to=yy, length=zz), and I never understood why the full name of that argument had to be length.out. I would hate to see lots of warning messages because I am using partial matching.

Cheers, Berwin
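The partial matching being defended here is easy to demonstrate: length= matches seq()'s length.out argument and yields the same result (a small illustration):

```r
## length= partially matches seq()'s length.out argument
identical(seq(from = 0, to = 1, length = 5),
          seq(from = 0, to = 1, length.out = 5))   # TRUE
```

(R can flag such calls on request via options(warnPartialMatchArgs = TRUE), which is roughly the kind of QC check being discussed.)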
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 09:49:31 +0100 Wacek Kusnierczyk [EMAIL PROTECTED] wrote: (for whatever reason a user may choose to have a column named '-b') For whatever reason, people also jump from bridges. Does that mean all bridges have an inherently flawed design and should be abolished? Wait, then we would only have level crossings, and some people, for whatever reason, think it is a good idea to race trains to level crossings. Gee, we had better abolish them too since they are such a bad design. Cheers, Berwin
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 11:27:30 +0100 Wacek Kusnierczyk [EMAIL PROTECTED] wrote: Berwin A Turlach wrote: Why is it worth asking this if nobody else asks it? Most notably a certain software company in Redmond, Washington, which is famous for carrying on with bad designs and bugs all in the name of backward compatibility. Apparently this company also sets industry standards so it must be o.k. to do that. ;-) sure. i have had this analogy in mind for a long time, but just didn't want to say it aloud. Mate, if you contemplate comparing R to anything coming out of Redmond, Washington, then you should first heed the old saying that it is better to remain silent and let people believe that one is a fool than to open one's mouth and remove any doubt. :) indeed, r carries on with bad design, but since there are more and more users, it's just fine. Whether R carries on with bad design is debatable. Clearly the changes that you would like to see would lead to big changes that might break a lot of existing code and programming idioms. Such changes could estrange a large part of the user base and, in a worst-case scenario, make R unusable for many tasks it is used for now. No wonder that nobody is eager to implement such design changes. Apparently Python is planning such wholesale changes when moving to version 3.x. Let's see what that does to the popularity of Python and the uptake of the new version. Didn't see any confused complaints yet. really. the discussion was motivated precisely by a user's complaint. We must have different definitions of what constitutes a complaint. I looked at the initial posting again. In my book there was no complaint. Just a user who asked how to achieve a certain aim because the way he tried to achieve it did not work. There were three or four constructive answers that pointed out how it can be done, and then all of a sudden complaints about alleged design flaws of R started.
just scan this list; a large part of the questions stems from confusion, which results directly from r's design. That's your opinion, to which you are of course entitled. In my opinion, a large part of the questions on r-help these days stem from the fact that in this age of instant gratification it seems to be easier to fire off an e-mail to a mailing list and try to pick the brains of thousands of subscribers instead of spending time on trying to read the documentation, learn about R and figure out the question on one's own. because r is likely to do everything but what the user expects. This is quite a strong statement, and I wonder what the basis for that statement is. Care to provide any evidence? i could think of organizing a (don't)useR conference, where submissions would provide such evidence. Please do so. Such a conference would probably turn out to be more hilarious and funnier than the Melbourne International Comedy Festival; it should be real fun to attend. :) whatever i say here, is mostly discarded as nonsense comments (while it certainly isn't), you say i make the problem up (while i just follow up user's complaints). seriously, i may have exaggerated in the immediately above, but lots of comments made here by the users convince me that r very often breaks expectations. Ever heard about biased sampling? On a list like this you, of course, hear questions from useRs who had the wrong expectations about how R should behave and got surprised. You do not hear of all the instances in which useRs had the correct expectations, which were promptly met by R. R is a tool; a very powerful one and hence also very sharp. It is easy to cut yourself with it, but when one knows how to use it, it gives the results that one expects. I guess the problem in this age of instant gratification is that people are not willing to put in the time and effort to learn about the tools they are using. but a good tool should be made with care for how users will use it.
But the group of users changes, and sometimes one cannot foresee all possible ways in which future users may use the software. As a programming paradigm says, you cannot make a piece of software idiot-proof; nature will always come up with a better idiot. r apparently fits the ideas of its developers, That's the prerogative of the developers, isn't it? But if it only fitted their ideas, then it would only be used by them. The fact that it is used by many others seems to indicate that it also fits the ideas of many others. while confuses naive users. Well, many jurisdictions have staged driver licences for motorcycles, initially allowing only low-powered machines for new riders, with increasingly powerful machines allowed for more experienced riders. Some people in Australia would like to introduce a similar system for car drivers since, apparently, too many P-platers kill themselves with high-powered V8 cars (though, I am not sure whether this is a problem
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 09:27:41 +0100 Wacek Kusnierczyk [EMAIL PROTECTED] wrote: but then it might be worth asking whether carrying on with misdesign for backward compatibility outbalances guaranteed crashes in future users' programs, [...] Why is it worth asking this if nobody else asks it? Most notably a certain software company in Redmond, Washington, which is famous for carrying on with bad designs and bugs all in the name of backward compatibility. Apparently this company also sets industry standards so it must be o.k. to do that. ;-) which result in confused complaints, Didn't see any confused complaints yet. Only polite requests for enlightenment after coming across behaviour that useRs found surprising given their knowledge of R. The confused complaints seem to be posted as responses to responses to such questions, by people who for whatever reason seem to have an axe to grind with R. the need for responses suggesting hacks to bypass the design, Not to bypass the design, but to achieve what the person wants. Like any general-purpose programming language, R is Turing complete and anything can be done with it; it is just a question of how. and possibly incorrect results published I guess such things cannot be avoided no matter what software you are using. I am more worried about all the analyses done in MS Excel, in particular in the financial maths/stats world. Also, to me it seems that getting incorrect results is a relatively small problem compared with the frequent misinterpretation of correct results or the use of inappropriate statistical techniques. because r is likely to do everything but what the user expects. This is quite a strong statement, and I wonder what the basis for that statement is. Care to provide any evidence? R is a tool; a very powerful one and hence also very sharp. It is easy to cut yourself with it, but when one knows how to use it, it gives the results that one expects.
I guess the problem in this age of instant gratification is that people are not willing to put in the time and effort to learn about the tools they are using. How about spending some time learning about R instead of continuously griping about it? Just imagine how much you could have learned in the time you spent writing all those e-mails. :) r suffers from early made poor decisions, but then this in itself is not a good reason to carry on. Radford Neal is also complaining on his blog (http://radfordneal.wordpress.com/) about what he thinks are design flaws in R. Why don't you two get together and design a good substitute without any flaws? Or is that too hard? ;-) Cheers, Berwin
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 12:53:31 +0100 Wacek Kusnierczyk [EMAIL PROTECTED] wrote: but seriously, when one buys a complicated device one typically reads a quick start guide, and makes intuitive assumptions about how the device will work, turning back to the reference when the expectations fail. good design should aim at reducing the need for checking why an intuitive assumption fails. And on what are these intuitive assumptions based if not on familiarity with similar devices? And people have different intuitions; why should yours be the correct one and the gold standard? I know that if I buy a complicated device and have never owned something similar, I read more than the quick start guide to get familiar with the device before breaking something due to using wrong assumptions. When I started to use S-PLUS, I had used GAUSS before. Still, I took the time off to work through the blue book and make myself familiar with S-PLUS before using it for serious work. Based on my experience with R, I found R very intuitive and easy to use; but I still try to keep up with the relevant documentation. It really seems that your problem is that you have an attitude of wanting instant gratification. If you do not care about how to use a machine-gun correctly you could easily harm yourself or others. indeed, and i'm scared to think that some of the published research can be harmful because the researcher declined to read the whole r reference before doing a stats analysis. Sorry, but this is absolute rubbish. There are plenty of statistical analyses that can be done without reading the complete R reference. However, one or two good books might help. My concern would rather be that everybody thinks that they can do statistics, and that software projects like R make such people really think they can do it. I am far more concerned about inappropriate analyses and wrong interpretations. How often is absence of evidence taken as evidence of absence?
you see, i'm not complaining about my own analyses failing because i have not read the appropriate section in the reference. if this were the problem, i'd just read more and keep silent. i'm complaining about the need to read, by anyone who starts up with r, in all gory details, about the intricacies of r before doing anything, because the behaviour is often so unexpected. I guess Frank Harrell had people like you in mind when he wrote: https://stat.ethz.ch/pipermail/r-help/2005-April/068625.html Would you also not expect to learn about surgery in all its gory details before attempting brain surgery, because brain surgery is so intuitive and doesn't need any study? Believe it or not, there are lots of useful things that you can do in R without knowing all the gory details. There are even people who got books on R published who obviously don't know all the gory details, and they still show useful applications of R. Cheers, Berwin
Re: [R] R design (was Variable passed to function not used in function in select)
G'day Peter, On Wed, 12 Nov 2008 00:42:36 +0100 Peter Dalgaard [EMAIL PROTECTED] wrote: On 12/11/2008, at 11:29 AM, Peter Dalgaard wrote: Not that one again! For at least one other value of one, the expectation is the opposite: Factor levels do not go away just because they happen not to be present in the data. fct <- sapply(dd, is.factor) dd[fct] <- lapply(dd[fct], "[", drop=TRUE) (Actually, the last line could have had lapply(dd[fct], factor), I just got confused about whether it would preserve the level order.) That was my first thought too, that lapply(dd[fct], factor) should be enough. But then I thought that ordered factors test TRUE for is.factor and that using factor() on an ordered factor will not only remove the factor levels that are missing but also remove the information on the ordering of the remaining levels. A short test showed that this is not the case; and after reading the help page of factor() I realised that this is due to the design of the function. So perhaps this example should be documented as a case in which the design decisions of the R developers save a naive user from accidentally doing the wrong thing (namely turning an ordinal variable into a nominal one). :) Cheers, Berwin
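The behaviour discussed above can be checked with a short test (a sketch; in later R versions, droplevels() packages the same idiom):

```r
## An ordered factor with an unused level "med"
o <- factor(c("low", "high", "high"),
            levels = c("low", "med", "high"), ordered = TRUE)

f <- factor(o)   # refit the factor: unused levels are dropped ...
levels(f)        # "low" "high"
is.ordered(f)    # TRUE ... but the ordering survives, because
                 # factor() defaults to ordered = is.ordered(x)
```

So refitting with factor() drops the empty level "med" while keeping the low < high ordering, which is exactly the design decision the post describes.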
Re: [R] licensing of R packages
Theory's side, they do not distribute precompiled CDROMs anymore. I wonder whether somebody has had a word with them about the price they were asking? Cheers, Berwin
Re: [R] licensing of R packages
G'day Brian, On Fri, 14 Nov 2008 11:47:46 + (GMT) Prof Brian Ripley [EMAIL PROTECTED] wrote: On Fri, 14 Nov 2008, Duncan Murdoch wrote: I think they are talking about cases where the GPL libraries are compiled into the new product. Packages generally don't include copies of anything from R, so our GPL doesn't apply to them. (Writers may have chosen to copy and modify base functions; if so, they are copying our code, and the GPL would apply.) That _has_ happened several times (usually without any credit and removing the copyright header from the R version), so package writers do need to be aware that base functions are not fair game. Often it comes to light when the forked version fails when e.g. .Internal calls are changed. Removing credits is definitely poor form but nothing that the GPL stops, IIRC. If you want credits to be carried on, you would need to use a license such as the original BSD license. IIRC, it was exactly this clause, that credits have to be kept, that made the original BSD license a GPL-incompatible Free Software License. As far as I understand, removing a copyright header is indeed a pretty big no-no. And copyright law seems to be a funny thing. Recently I was asked by the maintainers of AucTeX whether I was willing to sign over the copyright of contributions I once had provided. When I looked at one of the files for which they wanted the copyright signed over, not a single line looked familiar to me (I cannot write elisp code at the level that the code was at). The file attributed copyright to me and I vaguely remember that I once contributed it so that AucTeX would have support for some .sty file. But over time, presumably also to keep up with changes in AucTeX, the file was completely re-written by others; with none of the original code remaining (as far as I can tell). 
Still, it was suggested to me that I am the copyright holder just because I contributed the original file, even though by now not a single line of the code originated from me. Cheers, Berwin
Re: [R] licensing of R packages
G'day Duncan, On Fri, 14 Nov 2008 11:16:35 -0500 Duncan Murdoch [EMAIL PROTECTED] wrote: On 11/14/2008 11:01 AM, Berwin A Turlach wrote: But I remember that a situation as you describe was hotly debated on gnu.misc.discuss in the mid-90s; thus, I am obviously talking about GPL 2. Unfortunately I do not remember which software/company was involved and how the dispute was resolved. But they did something like you described: distributed a binary and asked the user to download additionally some GPL software and then run both together. If this were allowed, the GPL would have a hole in it that you could drive a truck through. :) If the binary being released had no GPL content in it, then there would be no basis to complain about anything. I'd guess that the particular case required GPL'd headers in order to compile. That would be enough to say that the binary includes GPL'd code. I am not so sure whether it is a matter of including GPL'd headers. After all, if I just want to call some functions from a library libXXX and I know what the arguments of those functions are, then I could write their prototypes into my own non-GPL'd headers. The question is really: does the software that I distribute work only if it is dynamically linked against libXXX, and are there several implementations of the functionality that I need from libXXX? If the only implementation of libXXX is a GPL'd one, then my software is a derivative work of it. When I compiled it, I must have had the GPL version of libXXX on my system. To be more specific, looking at my Debian system, I can see that /usr/lib/libXXX might link to /etc/alternatives/libXXX and /etc/alternatives/libXXX links to the actual version, which could be a commercial library, a GPL'd one or one under some other open-software licence.
If there are several alternatives for libXXX (for the functions that I need), then there is no way of telling what I have on my system and what I used when I created my binary, which is linked to /usr/lib/libXXX. So I can give my binary to others and tell them that they have to get a version of /usr/lib/libXXX, and that one option would be to install the GPL'd version obtainable from XYZ. But if the only library that implements the functionality that my program needs is a GPL'd version, then it is pretty clear that /etc/alternatives/libXXX on my machine (if the /etc/alternatives set-up is used at all) must point to the GPL'd libXXX. Thus, when I created the binary, I created a derivative work of libXXX, whether I used its GPL'd headers or not. But I guess I should continue to say that IANAL. :) Cheers, Berwin PS: Hope I managed again not to get the flame-thrower started. :)
Re: [R] licensing of R packages
G'day Duncan, On Fri, 14 Nov 2008 12:36:15 -0500 Duncan Murdoch [EMAIL PROTECTED] wrote: On 11/14/2008 11:57 AM, Carlos Ungil wrote: And the copyright owners have recourse to legal action if they think there is a license violation. Again, I don't know what a court would decide, but if you want to test the limits of the GPL license I would avoid challenging a GNU project :-) Actually, I think that's an ideal situation. There is a lot of fear and doubt about using GPL'd software, because of worries like yours. But the FSF has the resources to follow up in cases where there are actual violations, and they have won several. What do you mean by won several? As far as I remember from reading an interview with Eben Moglen, in most cases the violator backed down and it did not come to a lawsuit. From what I read, I think the FSF was only involved in one case that actually went to court, perhaps two; but I could be wrong. However, http://en.wikipedia.org/wiki/Harald_Welte credits Harald Welte with the creation of http://gpl-violations.org and with prosecuting violators of the GPL, which had been untested in court until then. And as I mentioned earlier, the FSF does not seem to be the copyright holder of the GSL, so it is not clear whether it can become active. But perhaps it can; it is not clear to me whether Harald Welte was the copyright holder in the cases that he pursued. The way to lose a GPL lawsuit is to incorporate GPL'd code into your own project, and then not follow the GPL when you redistribute. There's evidence of that. But I've never heard of anyone linking to but not distributing GPL'd code and being sued for it, let alone losing lawsuits over it. That's evidence enough for me that it is a safe thing to do. I rather think it is evidence that nobody who has done such a thing was willing to let it come to a lawsuit, but rather decided to settle the matter before it reached court.
Probably because the advice that they got from their own lawyers was that they were on a sure loser and should settle instead, with the settlement being much cheaper. As I understand it, the FSF never went for punitive damages and was just happy with the offender rectifying his/her ways in the future and adhering to his/her obligations under the GPL. In my opinion, but IANAL, it is pretty obvious that if your binary has to be linked against some GPL'd software to make it work and this is the only way of making your binary work, then your binary is a derivative work of that GPL'd software. Hence, if you want to distribute your binary, you are bound by the GPL. Thus, on the one hand I agree with you: I have never heard of anyone linking to but not distributing GPL'd code and being sued for it, let alone losing lawsuits over it. On the other hand, I do seem to remember hearing about people who distributed binary-only software, which needed to be linked against software under the GPL to work, being talked to by the FSF and agreeing to stop their behaviour. Does anybody know if there is an archive of gnu.misc.discuss for the early to mid '90s so that I can try to refresh my memory? Cheers, Berwin
Re: [R] licensing of R packages
G'day Duncan, On Fri, 14 Nov 2008 13:37:20 -0500 Duncan Murdoch [EMAIL PROTECTED] wrote: What is so special about binaries? First, with binaries it is (presumably) easier to define when one piece of software is part of another. Secondly, I presume that the software that started this thread is a binary, though I have not downloaded it. By your line of reasoning (which I don't agree with), if you write an R script, making use of features of R that are not present in other S dialects, then the only way to make it work would be to use R. So if you want to distribute it, you would need to GPL it. Not at all, see: http://www.fsf.org/licensing/licenses/gpl-faq.html#IfInterpreterIsGPL Last time I checked, R was a programming language interpreter. :) So lots of the R packages on CRAN that do not have GPL-compatible licenses would be in violation of the GPL, Not at all. As per the above-cited FAQ item, there would be no problem with the interpreted R code; there could be problems with compiled code linked into R. But as long as that code can also be compiled and run without R, there would be no problem. and since CRAN is distributing them, CRAN is in violation of the GPL. That's nonsense. You might find that many things that you think are nonsense are taken deadly seriously by lawyers. ;-) Moreover, if you write a C/C++ program that makes use of GNU extensions, you'd be in violation of the GPL if you were to distribute it without GPLing it. Even the FSF doesn't believe that: http://www.fsf.org/licensing/licenses/gpl-faq.html#CanIUseGPLToolsForNF Could you please explain how you came to this opinion? As far as I know, if you write a C/C++ program that makes use of GNU extensions, that program will still be linked to the same system libraries as a program that does not make use of such extensions. Or is a program that uses GNU extensions all of a sudden linked to /usr/lib/libgnuspecial ?
although they do (incorrectly, in my opinion) make the same claim as you about libraries: http://www.fsf.org/licensing/licenses/gpl-faq.html#IfLibraryIsGPL Given the track record of the FSF in defending the GPL, I am afraid I will rather go with their opinion than with yours. :) In particular since their opinions were formed with input from lawyers, as far as I can tell. The argument in the latter FAQ does seem to imply that R scripts must be GPL'd, and again: that's nonsense. Not at all; read the complete FAQ, especially the part I mention above about interpreted languages. Thus, on the one hand I agree with you, I have never heard that [...] On the other hand I do seem to remember hearing about people who distributed binary-only software, which needed to be linked against software under the GPL to work, being talked to by the FSF and agreeing to stop their behaviour. [...] And I've heard of people receiving letters from copyright holders asking them to stop making fair use of copyrighted materials, and they complied. So what? That just shows that intimidation works; it doesn't show that there is a legal basis for the intimidation. Well, from what I read it showed that the offenders didn't like their chances if they took the risk of going to court. So they must have come to the conclusion that there is some legal basis. I guess the only way of knowing for sure would be if you committed such a violation, refused to stop doing so, didn't settle out of court and decided to go to court. That would probably settle once and for all whether there is a legal basis for that argument. And you would be doing the free-software community a great favour by having this issue finally tested in a court of law, decided and laid to rest. But please don't be offended if my money is on you losing the case.
:) Cheers, Berwin
Re: [R] licensing of R packages
G'day Duncan, On Fri, 14 Nov 2008 15:24:03 -0500 Duncan Murdoch [EMAIL PROTECTED] wrote: Moreover, if you write a C/C++ program that makes use of GNU extensions, you'd be in violation of the GPL if you were to distribute it without GPLing it. Even the FSF doesn't believe that: http://www.fsf.org/licensing/licenses/gpl-faq.html#CanIUseGPLToolsForNF Could you please explain how you came to this opinion? I believe your argument was that if my program can't be used without a library that is currently only available under the GPL, then my program must be considered a derivative work, so I must license it under the GPL as well. It depends on what you mean by my program. I was making my comments specifically within the context of the original poster's questions about Numerit's behaviour and Barry's assertion that he could write a C program that relies on a GPL'd library, distribute the compiled binary of that C program without the GPL'd library and just ask the user to get the GPL'd library from somewhere else so that he or she could then run the program. You seem to extend the scope to a more general definition of what constitutes a program, including the distribution of source code for an interpreter. As I tried to point out, if you distribute source code to be interpreted by an interpreter that is GPL'd, you bring up all kinds of issues. Well, if my program uses GNU extensions, it can only be used when compiled by gcc, which I believe is GPL'd. So your argument would imply that it is a derivative work. Not at all. Please read what I said. Is the program compiled from C code that uses GNU extensions all of a sudden linked to a GPL'd /usr/lib/libgnuextension library, with that library being the only source of the needed functionality? The C code is input (data) to the GNU compiler. The license of the GNU compiler does not specify that the output (i.e. the binary) has to be GPL'd. Nor does the FSF claim copyright to that output.
(IIRC, when reading the EULAs of some commercial compilers in the '80s, some software houses seriously claimed the copyright of any program compiled with their compilers. I think this clause disappeared rather quickly when users realised its implications.) None of us (you, me or the FSF) think the latter argument is valid. Agreed; and in my case, and presumably also the FSF's case, that is because such an argument was never made and never claimed. To be, hopefully, completely precise: if you distribute a binary program which can only be run if linked against a GPL'd library, i.e. no alternative implementation of the library providing the functionality your binary needs exists, then that binary is a derivative work of the GPL'd library. Hence, in order to distribute it, you would have to adhere to the rules under which use of the GPL'd library was granted to you. Distributing your binary, and asking your users to get the GPL'd library separately from somewhere else, does not absolve you from your responsibilities under the GPL. I don't see a material difference from the former argument, so I conclude the former argument is invalid as well. My scenario: a binary program is distributed which can only be run if it is linked by the user to a GPL'd library. The source code of that binary program is not distributed and there is no offer of providing it. There is no way of using that program except with the GPL'd library, i.e. you need to have the GPL'd library libXXX; there is no other (whether commercial or not) provider of a libXXX that implements the same features. Thus, it is clear that when you produced the binary, you had the GPL'd library on your system, you linked the binary against that library and at that moment you created a derivative work of the GPL'd library. Your scenario (as I understand it): You use a GPL'd compiler to compile your program, which is then only linked to system libraries available on any similar OS.
The compiler offers extensions to the language that other compilers don't offer, but produces binaries that are only linked to said standard libraries. The GPL'd compiler makes no claims on the binary that it produces. Sorry, but if you do not see a material difference here, then I guess it is best that we just agree to disagree. :) The argument in the latter FAQ does seem to imply that R scripts must be GPL'd, and again: that's nonsense. Not at all; read the complete FAQ, especially the part I mention above about interpreted languages. I think the complete FAQ is inconsistent. I guess that comes from the fact that you do not see a material difference where others do. :) Some of it is accurate (e.g. the bits that say you can use GPL'd software without GPL'ing your own work) and some of it is not (the parts that imply dynamic linking makes something a derivative work.) As far as I understand, there have been people who shared that thought, namely that dynamic linking does not imply