Re: [R] Lisp-like primitives in R
[Peter Dalgaard] [François Pinard] I meant that R might have implemented a Scheme engine [...] with a surface language [...] which is purposely not Scheme, but could have been. [...] one could dare dreaming that the Scheme engine in R be completed, and Scheme offered as an alternate extension language. [...] there are excellent Scheme compilers. [...]

Well, depending on what you want, this is either trivial or impossible... I'm more leaning on the impossible side :-). The internal storage of R is still pretty much equivalent to scheme.

R needs a few supplementary data types, and this motivated the R authors to re-implement their own Scheme engine instead of relying on an existing implementation of a Scheme system.

r2scheme <- function(e) [...]

Nice exercise! :-)

a parser that parses a similar language to R internal format is not a very hard exercise (some care needed in places). However, replacing the front-end is not going to make anything faster,

Of course. The idea is nothing more than to please people starving to use Scheme instead of S as a surface language, here and there in scripts. I merely thought that if the gap is small enough (so as not to require an extraordinary effort), it would be worth the leap. One immediate difficulty to foresee is the name clashes between R and RnRS. There might also be missing things in R (like continuations, say).

To make anything faster, and this is a totally different idea, one might consider replacing the back-end, not the front-end. Writing good optimizing Scheme compilers is quite an undertaking, and if one only considers type inference (as a subproblem), this still is an active research area. The Scheme engine in R was written so as to quickly get a working S (lexical scoping and some library issues notwithstanding). My ramble was about switching this quick base of R to some solid Scheme implementation, rather than separately re-addressing compilation issues for R.
and the evaluation engine in R does a couple of tricks which are not done in Scheme, notably lazy evaluation, Promises? Aren't they already part of Scheme? The main difference I saw is their systematic use in R argument passing. All aspects of mere argument passing would require a lot of thought. As you wrote, variable scope is another difficulty. Offering a compatible C API, and library interface in general, might be a frightening but necessary challenge. It's all more of a dream than a thought, actually... :-) Look up the writings of Luke Tierney on the matter to learn more. Thanks for this interesting reference. -- François Pinard http://pinard.progiciels-bpi.ca __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
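The remark about promises being used systematically in R argument passing can be illustrated with a small sketch. It relies only on base R; the names `pick`, `state` and `probe` are invented for this illustration and do not come from the thread.

```r
# Arguments in R are passed as promises: an argument expression is only
# evaluated if (and when) the function body actually uses it.
pick <- function(use_first, first, second) {
  if (use_first) first else second
}

# The third argument would raise an error if evaluated, but it never is.
lazy_result <- pick(TRUE, "first branch", stop("never evaluated"))

# delayedAssign() exposes the same mechanism for ordinary variables.
state <- new.env()
state$evaluated <- FALSE
delayedAssign("probe", {
  state$evaluated <- TRUE
  42
})
before <- state$evaluated   # the promise has not been forced yet
value <- probe              # forcing the promise runs the expression
after <- state$evaluated
```

Scheme offers `delay` and `force` for explicit promises; the difference discussed above is that R applies the mechanism implicitly to every closure argument.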
Re: [R] Lisp-like primitives in R
[Duncan Murdoch] You could also look at Ross Ihaka's paper that is online here: http://cran.r-project.org/doc/html/interface98-paper/paper.html

Interesting read. Thanks for this reference!

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Lisp-like primitives in R
[Roland Rau] [François Pinard] I wonder what happened, for R to hide the underlying Scheme so fully, at least at the level of the surface language (although there are hints).

To further foster portability, we chose to write R in ANSI C

Yes, of course. Scheme is also (often) implemented in C. I meant that R might have implemented a Scheme engine (or part of a Scheme engine, extended with appropriate data types) with a surface language (nearly the S language) which is purposely not Scheme, but could have been. If the gap is not extreme, one could dare dreaming that the Scheme engine in R be completed, and Scheme offered as an alternate extension language.

If you allow me to continue dreaming awake -- they told me they will let me free as long as I do not get dangerous! :-) -- part of the interest lies in the fact there are excellent Scheme compilers. If we could only find or devise some kind of marriage between a mature Scheme and R, so as to speed up the non-vectorisable parts of R scripts...

If we are lucky and one of the original authors reads this thread they might explain the situation further and better [...].

In r-devel, maybe! We would be lucky if the authors really had time to read r-help. :-)

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Lisp-like primitives in R
[Chris Elsaesser] I mainly program in Common Lisp and use R for statistical analysis. While in R I miss the power and ease of use of Lisp, especially its many primitives such as find, member, cond, and (perhaps a bridge too far) loop. Has anyone created a package that includes R analogs to a subset of Lisp functions?

[Greg Snow] Not all of us are familiar with lisp [...] If you tell us what find, member, cond, and loop do, or what functionality you are looking for, then we will have a better chance of telling you how to do the same in R.

Hi, my fRiends :-). As far as I understand, R is built over what originally was a Scheme engine. Scheme may be seen as a flavour of LISP (yet I know people that would strongly object to seeing Scheme and Lisp in the same statement :-). But it makes it rather likely that most functions you want already exist in R, even if under different names or syntax.

I wonder what happened, for R to hide the underlying Scheme so fully, at least at the level of the surface language (although there are hints). Wouldn't it have been natural to have the underlying Scheme exposed as an extension language for R, so one might write Scheme functions just as well as C or FORTRAN functions? Is the engine so far from a real Scheme implementation, that such an idea was never reasonable?

About the idea of Lisp-inspired library functions... Many Lisp flavours, Common Lisp likely included, have a comprehensive (tremendous?) set of primitives and library functions. By comparison, Scheme is quite moderate, and does not go much beyond the essentials, something which much pleases me :-). There also are many important differences between Common Lisp and Scheme (like for example, global dynamic scoping versus textual scoping). If R were ever to offer Lisp-like interfaces, RnRS (Scheme standards) might be considered, both for being simpler, and more in the spirit of what R already is.
-- François Pinard http://pinard.progiciels-bpi.ca
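As a rough illustration of the claim that most of the requested Lisp primitives already exist in R under other names, here is a minimal sketch using base R only. The pairing of each Lisp primitive with an R construct is an approximation made for this example, not something stated in the thread, and `classify` is a made-up helper.

```r
x <- c(3, 8, 1, 12, 5)

# member: membership test and position of an item
is_member <- 8 %in% x
where <- match(12, x)

# find / position (predicate flavour): first element satisfying a test
first_big <- Find(function(v) v > 4, x)
first_big_at <- Position(function(v) v > 4, x)

# cond: a chain of if / else if, here wrapped in a function
classify <- function(v) {
  if (v < 0) "negative"
  else if (v == 0) "zero"
  else "positive"
}

# loop: higher-order helpers cover many everyday uses of loop
squares <- vapply(x, function(v) v^2, numeric(1))
total <- Reduce(`+`, x)
evens <- Filter(function(v) v %% 2 == 0, x)
```

`Find`, `Position`, `Filter`, `Map` and `Reduce` are documented together in base R; they may not have been available in the R version current when this thread was written.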
Re: [R] R 2.5.1 - Rscript through tee
[Dirk Eddelbuettel] [François Pinard]

    #!/usr/bin/Rscript
    options(echo=TRUE)
    a <- 1
    Sys.sleep(3)
    a <- 2

If I execute ./pp.R at the shell prompt, the output shows the timely progress of the script as expected. If I use ./pp.R | tee OUT instead, the output seems buffered and I see it all at once at the end. [...] So, is there a way to tell R (or Rscript) that standard output should be unbuffered, even if it is not directly connected to a terminal?

Use explicit print statements, e.g. print(a <- 1)

Yes, I noticed that print statements get written. But I wanted the mere echo trace of the execution of the script to be synchronous (as some statements take many seconds to compute, which I symbolically replaced by Sys.sleep above).

Littler actually won't show anything unless you explicitly call cat() or print(), but then it does [...]

It shares the limitation of Rscript, then.

Littler is an 'all-in' binary and starts and runs demonstrably faster than Rscript.

I'm not familiar with Littler. Speedwise, Rscript is OK for me so far, as most time is spent within R computations, not much in language compilation or script interpretation.

[...] the rather petty refusal of Rscript's main author to at least give a reference to littler in Rscript's documentation, let alone credit as 'we were there first', [...]

I've long been in academic circles (and elsewhere too), so I'm familiar with the need of recognizing authorship and people's works. However, perusing R mailing list archives, and following actual list contents, I'm sometimes surprised, and even a bit annoyed, by the recurrent hunger for credit I observe. Of course, maintainers and contributors much deserve our thanks and, without going into arguments about what is due to whom, I think contributors receive praise on average, if only through all the interest shown by the community. However, it gets a bit muddy when maintainers or contributors show bad temper when not receiving the systematic credit they would like to read.
Cicero's friends were telling him how upset they felt that there was still no statue of Cicero on the public place. Cicero replied that he much preferred to hear people saying "Why no Cicero statue yet?" than to hear people saying "Why the Cicero statue?". A wise attitude! :-)

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Synchronzing workspaces
[Paul August] I used to work on several computers and to use a flash drive to synchronize the workspace on each machine before starting to work on it. I found that .RData always caused some trouble: Often it is corrupted even though there is no error in copying process. Does anybody have the similar experience?

Not me. I use flash drives a lot to move .RData files around, without the slightest trouble. However, in my case, the involved machines are similar in their architecture and system, so I was not fearing trouble.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Excel (off-topic, sort of)
[Alberto Monteiro] Maybe I'll write a letter to Santa Claus [there are people who write to congressman; they must have more faith than me]. :-) :-) I wish a language where I can write a = b + 10 and then when I write a = 20 the language automatically assigns b = 10.

METAFONT does this (and consequently, Metapost as well). I still remember my surprise when I found out that Donald Knuth resorts to such sophisticated machinery for the sole purpose of designing font characters. Knuth surely did many wonderful things :-).

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Max vs summary inconsistency
[Adam D. I. Kramer] I'm having the following questionable behavior:

    summary(m)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          1   13000   26280   25890   38550   50910
    max(m)
    [1] 50912

...it seems to me like max() and summary(m)[6] ought to return the same number. Am I doing something wrong?

Some may say that you did not scrutinize the documentation enough, as summary artificially limits the number of significant digits. However, this question recurs often and regularly in these mailing lists, so at last, maybe something should be done about it, beyond documenting how it works. Overall, too many users got misled for one to so bluntly assert they are all wrong.

One option would be to resort to scientific notation whenever non-significant zero digits would otherwise have been printed. This should clarify a bit that the printing precision got artificially limited.

-- François Pinard http://pinard.progiciels-bpi.ca
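A minimal sketch of how to ask for more digits when the limited precision of summary() is a concern. The vector `m` below is invented (only its maximum matches the thread), and the exact rounding behaviour of summary() has changed across R versions, so treat this as an illustration under those assumptions rather than a description of R 2.5.1.

```r
# Made-up data whose maximum matches the value quoted in the thread.
m <- c(1, 13000, 26280, 38550, 50912)

s <- summary(m)

# Default printing uses only a few significant digits.
print(s)

# Asking for more digits at print time shows the unrounded values.
print(s, digits = 10)

# max() itself is never rounded.
largest <- max(m)
```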
Re: [R] subset using noncontiguous variables by name (not index)
[Muenchen, Robert A (Bob)] I'm using the subset function to select a list of variables, some of which are contiguous in the data frame, and others of which are not. It works fine when I use the form: subset(mydata,select=c(x1,x3:x5,x7)) In reality, my list is far more complex. So I would like to store it in a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to work. That use of the c function seems to violate R rules, so I'm not sure how it works at all. A small simulation of the problem is below.

    mydata <- data.frame(
      x1=c(1,2,3,4,5), x2=c(1,2,3,4,5), x3=c(1,2,3,4,5),
      x4=c(1,2,3,4,5), x5=c(1,2,3,4,5), x6=c(1,2,3,4,5),
      x7=c(1,2,3,4,5) )
    mydata
    # This does what I want.
    summary(subset(mydata, select=c(x1, x3:x5, x7)))

Maybe:

    variables <- expression(c(x1, x3:x5, x7))

and later:

    summary(subset(mydata, select=eval(variables)))

However, I do not know how one computes the expression piecemeal, that is, better than by building a string and parsing the result.

-- François Pinard http://pinard.progiciels-bpi.ca
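On the open question of computing the expression piecemeal without pasting and parsing strings, one possible approach is sketched below. It is not from the thread; it assumes base R's `quote()`, `as.call()` and `bquote()`, and the names `pieces`, `selection`, `picked` and `by_name` are invented for the example.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Build the selection piece by piece as unevaluated language objects.
pieces <- list(quote(x1), quote(x3:x5), quote(x7))
selection <- as.call(c(list(as.name("c")), pieces))   # the call c(x1, x3:x5, x7)

# Splice the stored selection into a subset() call, then evaluate that call.
picked <- eval(bquote(subset(mydata, select = .(selection))))

# An alternative that avoids non-standard evaluation: plain column names.
by_name <- mydata[c("x1", paste0("x", 3:5), "x7")]
```

The character-vector variant is usually easier to build programmatically, at the cost of spelling out ranges such as x3:x5 by hand.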
[R] R 2.5.1 - Rscript through tee
Hi, people. I met a little problem for which someone might have a solution. Let's say I have an executable file (named pp.R) with these contents:

    #!/usr/bin/Rscript
    options(echo=TRUE)
    a <- 1
    Sys.sleep(3)
    a <- 2

If I execute ./pp.R at the shell prompt, the output shows the timely progress of the script as expected. If I use ./pp.R | tee OUT instead, the output seems buffered and I see it all at once at the end. The problem does not come from the tee program, as if I use this command:

    (echo a; sleep 5; echo b) | tee OUT

the output is timely, not batched. So, is there a way to tell R (or Rscript) that standard output should be unbuffered, even if it is not directly connected to a terminal?

In case useful, here is local R information:

Version: platform = x86_64-unknown-linux-gnu arch = x86_64 os = linux-gnu system = x86_64, linux-gnu status = major = 2 minor = 5.1 year = 2007 month = 06 day = 27 svn rev = 42083 language = R version.string = R version 2.5.1 (2007-06-27)

Locale: LC_CTYPE=fr_CA.UTF-8;LC_NUMERIC=C;LC_TIME=fr_CA.UTF-8;LC_COLLATE=fr_CA.UTF-8;LC_MONETARY=fr_CA.UTF-8;LC_MESSAGES=fr_CA.UTF-8;LC_PAPER=fr_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_CA.UTF-8;LC_IDENTIFICATION=C

Search Path: .GlobalEnv, package:stats, package:utils, package:datasets, fp.etc, package:graphics, package:grDevices, package:methods, Autoloads, package:base

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Turning a logical vector into its indices without losing its length
[Leeds, Mark (IED)] I have the code below which gives me what I want for temp based on invec but I was wondering if there was a shorter way ( i.e : a one liner ) without having to initialize temp to zeros. This is purely for learning purposes. Thanks.

    invec <- c(TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
    temp <- numeric(length(invec))
    temp[invec] <- which(invec)
    temp
    [1] 1 0 0 4 0 0 7 0

A mere:

    invec * seq_along(invec)

would do it. To be honest, I dislike the multiplication trickery, and so prefer Gabor's solution, even if a bit longer:

    ifelse(invec, seq_along(invec), 0)

-- François Pinard http://pinard.progiciels-bpi.ca
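The two one-liners can be checked against each other with a short sketch. The third variant, based on `replace()`, is an addition for comparison and was not proposed in the thread.

```r
invec <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

# Multiplication: TRUE/FALSE are coerced to 1/0 before multiplying.
by_product <- invec * seq_along(invec)

# ifelse(): states the intent explicitly, position where TRUE, else 0.
by_ifelse <- ifelse(invec, seq_along(invec), 0)

# replace(): start from zeros and fill in the TRUE positions.
by_replace <- replace(numeric(length(invec)), invec, which(invec))
```

All three keep the length of `invec`; they differ mainly in readability and in the storage type of the result (integer for the product, double for the other two).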
Re: [R] Does anyone.... worth a warning?!? No warning at all
[Ted Harding] [...] a very important point. [...] There are a lot of idiosyncrasies in R, which in time we get used to; but learning about them is something of a sociological exercise, just as one learns that when one's friend A says X Y Z it may not mean the same as when one's friend B says it. [...] Another example is in the use of %*% for matrix multiplication when one or both of the factors is a vector. [...] Just a few thoughts. As I say we all get used to this stuff in the end, but it can be bewildering (and a trap) for beginners.

Using R is a bit akin to smoking. Beginnings are difficult, one may get headaches, and even gag on the first experiences. But in the long run, it becomes pleasurable, and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.

While I appreciate many of the virtues of R, as a language, it has a few flaws. Besides, as a library, and despite many commendable symmetries and beauties, it sometimes suffers from irregularities in its various specifications and offerings -- likely for historical reasons -- maybe lack of coordination while aging, or maybe needs of S compatibility. These irregularities are sometimes documented clearly, yet in many cases, exegesis is required.

Moreover, around documentation, there is a question of attitude. While some R maintainers are refreshingly open-minded, others are strongly reluctant to reconsider anything which has been written, as if the mere fact of documenting a detail was fixing it in the universe and eternity; they would then argue to death against the slightest changes.

In a word, because they are almost impossible to repair in practice, R idiosyncrasies are likely to stay. Accepting them (idiosyncrasies, irregularities) is part of the game. Correcting them a tiny bit at a time (like, for example, the mean behaviour at the origin of this thread) might overall take forever and shake myriads of electrons within tons of discussions.
I'm not sure it is a worthwhile undertaking. For one, I prefer learning to be productive with R as it stands, even knowing it could have been a bit better.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] how to collapse a list of 1 column matrix to a matrix?
[EMAIL PROTECTED] I encounter a situation where I have a list whose element is a column matrix. Says,

    $'1'
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3

    $'2'
         [,1]
    [1,]    4
    [2,]    5
    [3,]    6

Is there fast way to collapse the list into a matrix like a cbind operation in this case? Meaning, the result should be a matrix that looks like:

         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6

I can loop through all elements and do cbind manually. But I think there must be a simpler way that I don't know. Thank you.

The do.call function is the R equivalent of the apply from many other languages. I guess that, in R, apply was already taken :-) For example:

    a = list(x=matrix(1:3, 3, 1), y=matrix(4:6, 3, 1))
    a
    $x
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3

    $y
         [,1]
    [1,]    4
    [2,]    5
    [3,]    6

    do.call(cbind, a)
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6

-- François Pinard http://pinard.progiciels-bpi.ca
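A short sketch of why do.call fits here: it calls a function with the list elements as its arguments, so the call below behaves like cbind applied to the two matrices directly. The `Reduce()` variant is an addition for comparison and is not from the thread; `cols`, `combined` and `folded` are names made up for the example.

```r
cols <- list(`1` = matrix(1:3, 3, 1), `2` = matrix(4:6, 3, 1))

# One call to cbind() with every list element passed as an argument.
combined <- do.call(cbind, cols)

# Pairwise folding gives the same matrix, but makes one cbind() call
# per element, which tends to be slower on long lists.
folded <- Reduce(cbind, cols)
```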
Re: [R] how to collapse a list of 1 column matrix to a matrix?
[EMAIL PROTECTED] One more question. After I collapse everything into one matrix, I would like to find the index of column that holds minimum value for each row. I remember that there is a function like maxCols but I can't seem to find the same thing for minimum value. Any suggestion please?

Here is a possible avenue:

    z <- matrix(sample(1:25), 5)
    z
         [,1] [,2] [,3] [,4] [,5]
    [1,]   16    2    9   24    7
    [2,]   21   19   22   23   18
    [3,]   12    3    5   13   15
    [4,]   20    4   25   11   10
    [5,]   17    1    8   14    6
    apply(z, 1, which.min)
    [1] 2 5 2 2 2

I would presume (yet I did not recently check) that do.call, which.min, and a flurry of other useful functions, are introduced in various R tutorials. If you plan to use R seriously, it might be worth scrutinizing a few of those. Keep happy!

-- François Pinard http://pinard.progiciels-bpi.ca
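The question asks for the column index of the minimum in each row, which corresponds to MARGIN = 1 in apply(); MARGIN = 2 would instead give the row index of the minimum in each column. The sketch below uses the matrix printed in the message and also tries `max.col()` on the negated matrix, on the assumption that this is the function the poster remembered as "maxCols".

```r
z <- matrix(c(16,  2,  9, 24,  7,
              21, 19, 22, 23, 18,
              12,  3,  5, 13, 15,
              20,  4, 25, 11, 10,
              17,  1,  8, 14,  6), nrow = 5, byrow = TRUE)

# Column holding the minimum, for each row.
min_col_per_row <- apply(z, 1, which.min)

# Row holding the minimum, for each column.
min_row_per_col <- apply(z, 2, which.min)

# max.col() finds the column of the row maximum; negating the matrix
# turns that into the row minimum. ties.method = "first" makes the
# result deterministic and consistent with which.min().
via_max_col <- max.col(-z, ties.method = "first")
```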
Re: [R] Combine matrix
[Gianni Burgin] Let's say something like this:

    a=matrix(1:25, nrow=5)
    rownames(a)=letters[1:5]
    colnames(a)=rep("A", 5)
    a
      A  A  A  A  A
    a 1  6 11 16 21
    b 2  7 12 17 22
    c 3  8 13 18 23
    d 4  9 14 19 24
    e 5 10 15 20 25

    b=matrix(1:40, nrow=8)
    rownames(b)=c(rep("a",4),rep("b",4))
    colnames(b)=rep("B", 5)
    b
      B  B  B  B  B
    a 1  9 17 25 33
    a 2 10 18 26 34
    a 3 11 19 27 35
    a 4 12 20 28 36
    b 5 13 21 29 37
    b 6 14 22 30 38
    b 7 15 23 31 39
    b 8 16 24 32 40

As a result I would like something like:

      A A  A  A  A B  B  B  B  B
    a 1 6 11 16 21 1  9 17 25 33
    a 1 6 11 16 21 2 10 18 26 34
    a 1 6 11 16 21 3 11 19 27 35
    a 1 6 11 16 21 4 12 20 28 36
    b 2 7 12 17 22 5 13 21 29 37
    b 2 7 12 17 22 6 14 22 30 38
    b 2 7 12 17 22 7 15 23 31 39
    b 2 7 12 17 22 8 16 24 32 40

Is it clear? Is there a function that automates this operation?

Like, maybe:

    cbind(a[rownames(b),], b)

-- François Pinard http://pinard.progiciels-bpi.ca
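The suggestion works because indexing a matrix by a character vector of row names repeats a row once for every time its name appears. A minimal sketch with the matrices from the message, built here with `dimnames` for brevity:

```r
a <- matrix(1:25, nrow = 5,
            dimnames = list(letters[1:5], rep("A", 5)))
b <- matrix(1:40, nrow = 8,
            dimnames = list(rep(c("a", "b"), each = 4), rep("B", 5)))

# a[rownames(b), ] looks up each row name of b in a, duplicating rows
# of a as needed, so that the result lines up row by row with b.
expanded <- a[rownames(b), ]
combined <- cbind(expanded, b)
```

If `b` had a row name absent from `a`, the lookup would fail with a subscript error rather than fill in missing values, so a check such as `all(rownames(b) %in% rownames(a))` may be worthwhile beforehand.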
Re: [R] function to find coodinates in an array
[Ana Conesa] I am looking for a function/way to get the array coordinates of given elements in an array. What I mean is the following:
- Let X be a 3D array
- I find the ordering of the elements of X by ord <- order(X) (this returns me a vector)
- I now want to find the x,y,z coordinates of each element of ord

[Moshe Olshansky] If your array's dimensions were KxMxN and the linear index is i then

    n <- ceiling(i/(K*M))
    i1 <- i - (n-1)*(K*M)
    m <- ceiling(i1/K)
    k <- i1 - (m-1)*K

and your index is (k,m,n)

The reshape package might be helpful, here. If I understand the problem correctly, given this artificial example:

    X <- sample(1:24)
    dim(X) <- c(2, 3, 4)

you would want:

    library(reshape)
    melt(X)[order(X), -4]

so getting the indices in a three-column data frame.

-- François Pinard http://pinard.progiciels-bpi.ca
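The index arithmetic quoted above can be checked against base R. The sketch below uses `arrayInd()`, which exists in current versions of base R but was not mentioned in the thread and may postdate it; `coords` and `manual` are names chosen for this example.

```r
set.seed(1)
X <- array(sample(1:24), dim = c(2, 3, 4))
ord <- order(X)

# Base R: convert linear indices into one row of coordinates each.
coords <- arrayInd(ord, dim(X))

# The manual formula from the message, for a K x M x N array.
K <- dim(X)[1]
M <- dim(X)[2]
n <- ceiling(ord / (K * M))
i1 <- ord - (n - 1) * (K * M)
m <- ceiling(i1 / K)
k <- i1 - (m - 1) * K
manual <- cbind(k, m, n)

# Indexing with a coordinate matrix retrieves the elements in sorted order.
sorted_values <- X[coords]
```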
Re: [R] help with counting how many times each value occur in each column
[Gabor Grothendieck] table(col(mat), mat)

Clever, simple, and elegant! :-)

-- François Pinard http://pinard.progiciels-bpi.ca
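For readers wondering why this one-liner works: col(mat) is a matrix of the same shape holding each cell's column number, so cross-tabulating it against the values counts every value within every column. A small sketch with made-up data:

```r
mat <- matrix(c(-100, -100,    0,  -50,
                -100,    0,    0, -100), nrow = 4)

# Rows of the table are column numbers of mat, columns are distinct values.
tab <- table(col(mat), mat)
```

Values that never occur in a given column show up as explicit zero counts, which is what the original poster's desired result left blank.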
Re: [R] help with counting how many times each value occur in each column
[Tom Cohen] I have the following dataset and want to know how many times each value occur in each column.

    data
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,] -100 -100 -100    0    0    0    0    0    0  -100
     [2,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [3,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [4,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [5,] -100 -100 -100 -100 -100 -100 -100 -100 -100   -50
     [6,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [7,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [8,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
     [9,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [10,] -100 -100 -100  -50 -100 -100 -100 -100 -100  -100
    [11,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [12,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [13,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [14,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [15,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [16,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [17,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [18,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100
    [19,] -100 -100 -100    0    0    0    0    0    0  -100
    [20,] -100 -100 -100 -100 -100 -100 -100 -100 -100  -100

The result matrix should look like

         -100    0  -50
     [1]   20
     [2]   20
     [3]   20
     [4]   17
     [5]   18
     [6]   18
     [7]   18    and so on
     [8]
     [9]
    [10]

Presuming that data is a matrix, one could try a sequence like this:

    dataf <- factor(data)
    dim(dataf) <- dim(data)
    result <- t(apply(dataf, 2, tabulate, nlevels(dataf)))
    colnames(result) <- levels(dataf)
    result

If you want the columns sorted, you might decide the order of the levels on the factor() call, or explicitly reorder columns afterwards.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Bind together two vectors of different length...
[Andris Jankevics] I have two vectors:

    A <- c(1:10)
    B <- seq(1,10,2)

Now I want to make a table from vectors A and B as rows, and if a value of A isn't present in B, then I want to put a N/A symbol in it. Output should look like this:

    1 2 3 4 5 6 7 8 9 10
    1 0 3 0 5 0 7 0 9 0

How can I do this in R?

Either of:

    A[!A %in% B] <- NA
    A[!A %in% B] <- 0

depending on what you want your N/A symbol to be.

-- François Pinard http://pinard.progiciels-bpi.ca
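The assignments above overwrite A in place. A minimal sketch that keeps A intact and builds the two-row table the question asks for; `second_row`, `tab` and `zero_filled` are names invented for the example.

```r
A <- 1:10
B <- seq(1, 10, 2)

# Keep the value of A where it also appears in B, mark the rest as missing.
second_row <- ifelse(A %in% B, A, NA)

# Two-row table: the full sequence on top, the matched values below.
tab <- rbind(A, B = second_row)

# Use 0 instead of NA as the filler if that is preferred.
zero_filled <- replace(second_row, is.na(second_row), 0)
```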
Re: [R] avoiding timconsuming for loop renaming identifiers
[EMAIL PROTECTED] I was wondering if I can avoid a time-consuming for loop on my 60 obs dataset.

    school_id     y
            8  9.87
            8  8.89
            8  7.89
            8  8.88
           20  6.78
           20  9.99
           20  8.79
           31  10.1
           31    11

There are, say, 143 different schools in this 60 obs dataset. I need to have sequential identifiers, 1,2,3,4,5,...,143.

Hello, Toby. Maybe:

    dta$id <- cumsum(c(1, diff(dta$school_id) != 0))

-- François Pinard http://pinard.progiciels-bpi.ca
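The cumsum/diff idiom numbers runs of equal values, so it relies on all rows of a school being contiguous, as in the excerpt. The sketch below shows it next to a `match()`-based variant, added here for comparison (not from the thread), which does not need that assumption.

```r
school_id <- c(8, 8, 8, 8, 20, 20, 20, 31, 31)

# Start at 1 and add 1 each time the id differs from the previous row.
run_id <- cumsum(c(1, diff(school_id) != 0))

# Position of each id among the distinct ids, in order of first appearance.
match_id <- match(school_id, unique(school_id))

# With interleaved schools the two approaches disagree.
shuffled <- c(8, 20, 8, 31, 20)
run_shuffled <- cumsum(c(1, diff(shuffled) != 0))
match_shuffled <- match(shuffled, unique(shuffled))
```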
Re: [R] A More efficient method?
[Keith Alan Chamberlain] Is there a faster way than below to set a vector based on values from another vector? I'd like to call a pre-existing function for this, but one which can also handle an arbitrarily large number of categories. Any ideas?

    Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
    C1=vector(length=length(Cat)) # New vector for numeric values
    # Cycle through each column and set C1 to corresponding value of Cat.
    for(i in 1:length(C1)){
      if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
    }
    C1
    [1] -1 -1 -1 1 1 1 -1 -1 1
    Cat
    [1] a a a b b b a a b

For handling an arbitrarily large number of categories, one may go through a recoding vector, like this for the example above:

    Cat <- c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'b')
    C1 <- c(a=-1, b=1)[Cat]
    C1
     a  a  a  b  b  b  a  a  b
    -1 -1 -1  1  1  1 -1 -1  1

-- François Pinard http://pinard.progiciels-bpi.ca
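A short sketch of the recoding-vector idea with more than two categories. The third category `c`, the unknown label `z` and the variable names are invented for this illustration.

```r
Cat <- c("a", "a", "a", "b", "b", "b", "a", "a", "b")

# Named lookup vector: names are the categories, values are the codes.
codes <- c(a = -1, b = 1, c = 0)

# Indexing by name recodes the whole vector at once; unname() drops
# the category labels that the lookup carries along.
C1 <- unname(codes[Cat])

# A category missing from the lookup yields NA rather than an error.
with_unknown <- unname(codes[c("a", "z")])
```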
Re: [R] Off topic:Spam on R-help increase?
[Marc Schwartz] The Human Spam Filter (aka Martin) [...]

The R mailing list has, indeed, been remarkably spam-free, and well-managed as far as I can see. I do hope, however, that Martin does not have to do the filtering himself -- it would be just daunting! In any case, Martin, a lot of thanks from me!

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Conflict in .Rprofile documentation FAQ vs. Help?
[Brian Ripley] No one actually said it was a *working* example [...]

Do you mean that, whenever we see something presented as an example within or around the R system, we should not take it as dependable unless it is explicitly said to be working?

(and it is enclosed in \dontrun{})

Within the R online help system, many examples are marked so they are not run. I naively thought they were not run for friendly reasons, like for example, not inordinately impacting the user's environment. Should I read you as saying that those examples are not to be believed?

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] A question about R environment
[Philippe Grosjean] Please, don't reinvent the wheel: putting functions in a dedicated environment is one of the things done by R packages (together with a good documentation of the function, and making them easily installable on any R implementation). [...] this is probably the time for you to read the Writing R extensions manual, and to start implementing your own R package!

Hi, Philippe, and gang! I read this manual long ago and used it to create packages already. You really got the impression I did not read it? :-)

You know, there are small wheels, and huge wheels. I do not see why I would use complex devices for tiny problems, merely because those complex devices exist. R packages undoubtedly have their virtues, of course. But just like many statistical tests, they do not always apply.

Why go at length organising package directories populated with many files, resorting to initialisation scripts, using package validators, creating documentation files and processing them, go through the cycle of creating a package and installing it, all that merely for a few small quickies that fit very well in the ubiquitous .Rprofile file? Why worry about installation on any R implementation, for little things only meant for myself, and too simple to warrant publication anyway?

Keep happy, all.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] scripts with littler
[John Lawrence Aspden] I'm trying to write R scripts using littler (under Debian), and was originally using the shebang line:

    #!/usr/bin/env r

However this picks up any .RData file that happens to be lying around, which I find a little disturbing, because it means that the script may not behave the same way on successive invocations. If you drop the /usr/bin/env trick then

    #!/usr/bin/r --vanilla

seems to work, but it also prevents the loading of the libraries in my home directory, some of which I'd like to use.

    #!/usr/bin/r --no-restore

doesn't work at all. Ideally I'd like

    #!/usr/bin/env r --no-restore

Has anyone else been round this loop and can offer advice?

I usually do something like:

    #!/bin/sh
    R --slave --vanilla <<EOF
    R script goes here...
    EOF
    # vim: ft=r

If you need to search special places for packages, you may tweak exported environment variables between the first and second line.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] scripts with littler / subroutines
[John Lawrence Aspden]

> Another difficulty I'm having is creating a common function (foo, say) to share between two scripts.

In your previous message, you were telling us that you want to load from your home directory. You might put the common functions there, maybe?

--
François Pinard   http://pinard.progiciels-bpi.ca
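One way to read the suggestion above, as a minimal sketch only: keep the shared function in a single file and let each script source() it. The file name common.R and the body of foo are hypothetical, and a temporary directory stands in for the home directory here.

```r
# Hypothetical shared file; tempdir() stands in for the home directory.
common <- file.path(tempdir(), "common.R")
writeLines("foo <- function(x) x + 1", common)

# Each script would begin by sourcing the shared definitions.
source(common)
result <- foo(41)
print(result)
```

In a real script, the path would more likely be built with something like file.path(Sys.getenv("HOME"), "common.R").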
Re: [R] A question about R environment
[Tong Wang]

> I created environment mytoolbox by: mytoolbox <- new.env(parent=baseenv()). Is there any way I can put it in the search path?
> In a project, I often write some small functions, and load them into my workspace directly, so when I list the objects with ls(), it looks pretty messy. So I am wondering if it is possible to create an environment, and put these tools into this environment. For example, I have functions fun1(), fun2() ... and create an environment mytoolbox which contains all these functions. And it should be somewhere in the search path: ".GlobalEnv" "mytoolbox" "package:methods".

Here is a trick, shown as a fairly simplified copy of my ~/.Rprofile. It allows for a few simple functions to be always available, yet without having to create a package, and leaving ls() and any later .RData file unencumbered.

The idea is to use local() to prevent any unwanted clutter from leaking out (my real ~/.Rprofile holds more than shown below and uses temporary variables), to initialise a list meant to hold a bunch of functions or other R things, and to save that list on the search path.

This example also demonstrates a few useful functions for when I read the R mailing list. I often need to transfer parts of emails containing code excerpts into the window where R executes, while removing quotation marks, white lines and other noise. I merely highlight-select part of the message with the mouse, and then, within R, do things like:

    xs()    source the highlighted region
    xd()    read in a data.frame
    xm()    read in a matrix
    xe()    evaluate and print an expression
    xv()    read a list of values as a vector

The list above is in decreasing order of usefulness (for me). Except for xs(), which has no automatic printout, you may either let the others print what they got, or assign their value to some variable. Arguments are also possible, for example like this:

    xd(T)   read in a data.frame when the first line holds column names

    if (interactive()) {
        local({

            fp.etc <- list()

            fp.etc$xsel.vector <- function (...) {
                connexion <- textConnection(xselection())
                on.exit(close(connexion))
                scan(connexion, ...)
            }
            fp.etc$xsel.dataframe <- function (...) {
                connexion <- textConnection(xselection())
                on.exit(close(connexion))
                read.table(connexion, ...)
            }
            fp.etc$xsel.matrix <- function (...) {
                connexion <- textConnection(xselection())
                on.exit(close(connexion))
                data.matrix(read.table(connexion, ...))
            }
            fp.etc$xsel.eval <- function (...) {
                connexion <- textConnection(xselection())
                on.exit(close(connexion))
                eval(parse(connexion, ...))
            }
            fp.etc$xsel.source <- function (...) {
                connexion <- textConnection(xselection())
                on.exit(close(connexion))
                source(connexion, ...)
            }
            fp.etc$xselection <- function () {
                lignes <- suppressWarnings(readLines('clipboard'))
                lignes <- lignes[lignes != '']
                stopifnot(length(lignes) != 0)
                marge <- substr(lignes, 1, 1)
                while (all(marge %in% c('>', '+', ':', '|'))
                       || all(marge == ' ')) {
                    lignes <- substring(lignes, 2)
                    marge <- substr(lignes, 1, 1)
                }
                lignes
            }

            fp.etc$xv <- fp.etc$xsel.vector
            fp.etc$xd <- fp.etc$xsel.dataframe
            fp.etc$xm <- fp.etc$xsel.matrix
            fp.etc$xe <- fp.etc$xsel.eval
            fp.etc$xs <- fp.etc$xsel.source

            attach(fp.etc, warn=FALSE)
        })
    }

    # vim: ft=r

--
François Pinard   http://pinard.progiciels-bpi.ca
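Stripped of the clipboard helpers, the core of the trick above can be sketched as follows. The names mytoolbox, fun1 and fun2 come from the question; the function bodies are made up for illustration, and the code runs at top level rather than inside local().

```r
# Hypothetical toolbox: a plain list holding two small functions.
mytoolbox <- list(
    fun1 = function(x) x + 1,
    fun2 = function(x) x * 2
)

# attach() copies the list into a new environment on the search path,
# by default at position 2, right after .GlobalEnv.
attach(mytoolbox, name = "mytoolbox", warn.conflicts = FALSE)
rm(mytoolbox)                   # the attached copy remains usable

position <- match("mytoolbox", search())
visible <- ls()                 # fun1 and fun2 are not listed here
answer <- fun2(fun1(20))
print(answer)
```

Calling detach("mytoolbox") would remove the toolbox from the search path again.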
Re: [R] read.fwf and header
[Martin Maechler]

> In my (and probably R-core's) view, read.fwf() should only have to be used for ``legacy data files'' (those times when people used *no* separators in order to save disk space), since nowadays, such data files should automatically have correct separators.

In my day-to-day experience, the main virtue of fixed width format files is basic, humble legibility, much more than disk space savings. The FWF files I see have delimiters between fields, but also embedded space within fields, or at end of fields, without extraneous quotes.

XML markup, CSVs, quoted fields, etc. are devices meant for helping machines much more than for helping humans. They significantly decrease legibility. Humans not only know better, they decipher fixed width format easily enough for not really needing hairier devices in general.

FWF files may be archaic, but they are not obsolescent. They will resist the fashion of the day for complexity, and survive in the long run.

--
François Pinard   http://pinard.progiciels-bpi.ca
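As an illustration of the kind of file described above, with embedded spaces inside fields and no quotes, here is a small sketch using read.fwf(). The data, the column names and the widths (10, 12 and 3 characters) are all invented for the example.

```r
# Hypothetical fixed-width records: name (10), city (12), age (3).
records <- c("Jean Roy  Mont Royal   42",
             "Ann Lee   New York     37")

connexion <- textConnection(records)
people <- read.fwf(connexion, widths = c(10, 12, 3),
                   col.names = c("name", "city", "age"),
                   strip.white = TRUE, stringsAsFactors = FALSE)
close(connexion)

print(people)
```

The fields are cut by position only, so "Mont Royal" and "New York" survive as single values without any quoting.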
Re: [R] Suitability of R for Algorithm simulations
Hi, people.

A correspondent puts me in front of a reply I sent to r-help, a few weeks ago, and quoted below. I must have been tired when I sent it. Please replace Eiffel by Erlang all over. Sorry for this error.

> Date: 2006-10-05 00:43:36
> Message-ID: [EMAIL PROTECTED]
>
> [Ethan B. Fini]
>
> > I would like to be able to instantiate an object for each node in my simulated (stand alone, one computer) distributed environment and then proceed by (a) adding message exchange functionality and (b) algorithm behavior to each node.
>
> Not so long ago, I quickly glanced at Eiffel after an enthusiastic friend told me about it, and while I do not think I will soon use it for myself, Eiffel might be the right choice for you, being strong on light-weight processes and message passing, from what I've read...
>
> If I had a simulation problem to tackle nowadays, I'd likely consider Python supplemented with greenlets from the pylib library, mainly because I'm fond of Python legibility, and have reasonably good confidence in the people who implemented greenlets.
>
> > The simulation results are represented on a GUI [...]
>
> The GUI aspects of Eiffel are unknown to me, I did not dive deep enough to touch them. For Python, I'd use pygtk, but there are many toolkits to choose from.
>
> > Is R suitable for what I am trying to do? I looked around but have not been able to determine if R is the appropriate platform.
>
> R libraries are especially good at statistics and graphics. The language in itself is much oriented towards vectorisation, among other things, and this might be convenient for a speedy implementation of some simulation problems.
>
> If vectorisation could not be turned into an advantage for you with R, it is likely that R might be slow for such problems, and also not so well adapted to quasi-parallelism between interacting processes having each their own behaviour.
>
> Of course, seasoned R users might have much more sound opinions than mine on this topic! :-)

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] Suitability of R for Algorithm simulations
[Ethan B. Fini]

> I would like to be able to instantiate an object for each node in my simulated (stand alone, one computer) distributed environment and then proceed by (a) adding message exchange functionality and (b) algorithm behavior to each node.

Not so long ago, I quickly glanced at Eiffel after an enthusiastic friend told me about it, and while I do not think I will soon use it for myself, Eiffel might be the right choice for you, being strong on light-weight processes and message passing, from what I've read...

If I had a simulation problem to tackle nowadays, I'd likely consider Python supplemented with greenlets from the pylib library, mainly because I'm fond of Python legibility, and have reasonably good confidence in the people who implemented greenlets.

> The simulation results are represented on a GUI [...]

The GUI aspects of Eiffel are unknown to me, I did not dive deep enough to touch them. For Python, I'd use pygtk, but there are many toolkits to choose from.

> Is R suitable for what I am trying to do? I looked around but have not been able to determine if R is the appropriate platform.

R libraries are especially good at statistics and graphics. The language in itself is much oriented towards vectorisation, among other things, and this might be convenient for a speedy implementation of some simulation problems.

If vectorisation could not be turned into an advantage for you with R, it is likely that R might be slow for such problems, and also not so well adapted to quasi-parallelism between interacting processes having each their own behaviour.

Of course, seasoned R users might have much more sound opinions than mine on this topic! :-)

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] FW: Large datasets in R
[Thomas Lumley]

> People have used R in this way, storing data in a database and reading it as required. There are also some efforts to provide facilities to support this sort of programming (such as the current project funded by Google Summer of Code: http://tolstoy.newcastle.edu.au/R/devel/06/05/5525.html).

Interesting project indeed! However, if R uses more swapping because arrays do not all fit in physical memory, crudely replacing swapping with database accesses is not necessarily going to buy a drastic speed improvement: the paging gets done in user space instead of being done in the kernel.

Long ago, while I was working on CDC mainframes, astonishing at the time but tiny by today's standards, there was a program able to invert very big matrices or solve simplex problems on them. I do not remember the name of the program, and never studied it more than superficially (I was in computer support for researchers, but not a researcher myself). The program was documented as being extremely careful at organising accesses to rows and columns (or parts thereof) in such a way that real memory was best used. In other words, at the core of this program was a paging system very specialised and cooperative with the problems meant to be solved.

However, the source of this program was just plain huge (let's say from memory, about three or four times the size of the optimizing FORTRAN compiler, which I already knew better as an impressive algorithmic undertaking). So, right or wrong, the prejudice stuck solidly in me at the time, if nothing else, that handling big arrays the right way, speed-wise, ought to be very difficult.

> One reason there isn't more of this is that relying on Moore's Law has worked very well over the years.

On the other hand, the computational needs for scientific problems grow fairly quickly to the size of our ability to solve them. Let me take weather forecasting for example. 3-D geographical grids are never fine enough for the resolution meteorologists would like to get, and the time required for each prediction step grows very rapidly, for not so much of an increase in precision. By merely tuning a few parameters, these people may easily pump nearly all the available cycles out of the supercomputers given to them, and they do so without hesitation. Moore's Law will never succeed at calming their starving hunger! :-)

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] [Rd] R as shell script
[Juha Vierinen]

> Hi,

Hello, Juha. Your request, quoted below, is likely more appropriate for R help than for R devel, so I'm redirecting this reply there.

> I am considering if I should invest in learning R. Based on the language definition and introductory documents, it seems nice. But now I am faced with a problem: I want to be able to run R programs easily from the unix shell, and write scripts that can automatically select R as the interpreter:
>
>     #!/usr/bin/R
>     cat("Hello world.\n")
>
> This of course doesn't work, because /usr/bin/R is a shell script.
>
> I have been able to create a binary wrapper that calls R with the correct arguments, which is documented here:
>
>     http://kavaro.fi/mediawiki/index.php/Using_R_from_the_shell
>
> This still lacks eg. standard input (but I have no idea how I can implement it in R) and full command line argument passing (can be done), but am I on the right track, or is there already something that does what I need?

I'm often using something like:

    #!/bin/sh
    R --slave --vanilla <<EOF

    # Your R source code goes here!

    EOF

Within your script, shell substitution for $1, etc., will occur. So with a bit of imagination, you can do about anything :-). Simple enough!

Make sure you `cat' or `print' explicitly whatever has to be written on standard output: for one, I usually prefer full control in scripts over automatic printing of given expressions.

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] incomplete final line found by readLines on ...
[Taka Matzmoto]

> Is there any way to prevent [this] warning message.

Hi, Taka. The easiest might be using the suppressWarnings wrapper. See ?suppressWarnings for more information.

--
François Pinard   http://pinard.progiciels-bpi.ca
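A minimal sketch of the wrapper mentioned above, applied to the warning from the subject line. The temporary file and its contents are invented for the example.

```r
# Hypothetical file whose last line lacks the final newline.
path <- tempfile()
cat("first line\nlast line", file = path)

# A bare readLines(path) would warn about the incomplete final line;
# the wrapper silences the warning and leaves the result unchanged.
txt <- suppressWarnings(readLines(path))
print(txt)
```

Current versions of R also accept readLines(path, warn = FALSE), which turns off this particular warning without hiding any others.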
Re: [R] how to rotate a triangle image(ZMAT) ?
[Cleber N.Borges]

> how to align this Zmat (triangle image) in X axis? I would like that the triangle's base become in the X axis and the triangle's height become in the Y axis. Is there some trick for make this?

I'm not fully sure of what is the base and the height of the triangle, but if I guess correctly, you may peek at ?image, the last paragraph of the "Details:" section, and also in the "Examples:" section, where it says "Need to transpose and flip matrix horizontally.". Maybe you'll find some explanations or ideas in there.

>     f <- function(x, y) {
>         z = 1-x-y
>         z[z < (-1e-15)] <- NA
>         return(-100*x+0*y+100*z)
>     }
>     x = seq(1, 0, by = -0.01)
>     y = seq(1, 0, by = -0.01)
>     zmat = outer(x, y, f)
>     image(zmat, col=terrain.colors(10))
>     contour(zmat, add=T)

Another idea is to exchange x, y in the outer call, and maybe also use rev() on one of them.

--
François Pinard   http://pinard.progiciels-bpi.ca
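To make the transposing and flipping hinted at above concrete, here is a sketch on a tiny made-up matrix rather than on zmat. Which orientation is really wanted for the triangle is a guess; this variant makes image() show a matrix the way it prints.

```r
# Small hypothetical matrix standing in for zmat.
m <- matrix(1:6, nrow = 2)      # prints as 2 rows, 3 columns

# image() lays rows of its argument along the x axis and columns
# along the y axis, starting from the bottom left.  Transposing and
# then reversing the columns makes the picture match the printout.
turned <- t(m)[, nrow(m):1]

if (interactive()) {
    image(turned, col = terrain.colors(10))
}
```

The example in ?image uses the other flip, t(m)[ncol(m):1, ], so it may be worth trying both on zmat.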
Re: [R] Install R 2.3.1 on SUSE Linux
[EMAIL PROTECTED]

> I am new to Linux, and I am trying to install R 2.3.1 on SUSE Linux 10.0. The RPM installer, YAST, states that I need libgfortran.so.0.

There is a SuSE 10.0 machine somewhere. Yes: I installed R on it, and it works well there:

    $ rpm -qf /usr/lib/libgfortran.so
    gcc-fortran-4.0.2_20050901-3

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] flipping a plot vertically?
[Tim Brown]

> This seems like an obvious question but I can't find the answer in the par help document --- I'd like to make a plot where the 0,0 point is in the top left of the screen rather than bottom left... [...] Any suggestions?

You might retry your plot, adding a ylim=c(HIGHEST, LOWEST) argument, that is, listing the maximum before the minimum. For example:

    plot(1:10, ylim=c(10, 1))

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] Re-binning histogram data
[Berton Gunter]

> I would argue that histograms are outdated relics and that density plots (whatever your favorite flavor is) should **always** be used instead these days.

When a now retired researcher paid us a visit, I showed him a density plot produced by R over some data he had worked on a lot before he left. I, too, find them rather sexy, and I wanted to impress him with some of the pleasures of R, especially knowing he had been a dedicated user of SAS in his time.

Yet, this old and wise man _immediately_ caught that the density curve was leaking a tiny bit through the extrema. Not a big deal of course -- and he did like what he saw. Nevertheless, this reminded me that we should be careful not to dismiss too lightly years of accumulated knowledge, experience and know-how, merely because we give in to joyful enthusiasm for more recent things.

Let me make a comparison, looking at the R mailing lists themselves. Some would much like sending HTML email in here: they would get colours, use various fonts, offer links, and have indentation which dynamically adapts on the receiving end to the window size of the reading guy. But the collective wisdom is to stick to non-HTML email, which is quite proven and still very functional, after all.

Some impatient people or dubious tools use other things than fixed-width fonts while presenting text/plain email, or merely ignore the usual 79-column limit and other oldish etiquette issues while sending it: in the last analysis, they kibitz the community more than they help it, and deep down, are a bit selfish. There is a long way to go before HTML email is really ubiquitous and correctly supported. Consider the long time MIME took to establish itself: even now, email readers correctly supporting MIME are hard to find -- most are fond of gadgets much more than they know standards.

Another comparison which pops to my mind is how some people fanatically try to impose UTF-8 all around, saying that ASCII or ISO-8859-1 (and many others) are part of the prehistory of computers. When mere users, they can always talk without doing too much damage. But I've seen a few maintainers going overboard on such matters, consciously breaking software to force their convictions forward: "Crois ou meurs!" as we say in French (approximately: "Believe or perish!").

Here, just like for HTML mail or nicer bitmapped R graphics, Unicode does have technical merit; the truth is that we are _far_ from mastering everything about it, and there are lots of open issues that are not strictly technical. Many proponents of these various things are tempted to say that they want to clean the planet of "outdated relics" (I liked your expression!) and have the honest feeling they do trigger overall progress.

Moreover, new good things do not necessarily make older things wrong. In a word, we should rather wait for progress with calm, and with respectful care of what already exists. Progress will impose itself slowly over time, and is not so much in need of forceful evangelists. :-)

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] Edit function
[Pikounis, Bill [CNTUS]]

>     view <- function(x) {
>         warnopt <- options()$warn
>         options(warn=-1)
>         on.exit({sink(); options(warn=warnopt)})
>         edit(x)
>         invisible()
>     }

I'm surprised by the necessity of sink(). Presuming it is necessary indeed, the above could be simplified a bit like this (untested) code:

    view <- function(x) {
        on.exit(sink())
        invisible(suppressWarnings(edit(x)))
    }

The documentation for suppressWarnings is not overly clear about whether the warn option is restored or not in case of error. It says:

    'suppressWarnings' evaluates its expression in a context that
    ignores all warnings.

My exegesis :-) for that sentence would be that the context does not survive the error, and so, the warn option is not changed.

--
François Pinard   http://pinard.progiciels-bpi.ca
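The reading of the documentation given above can be checked with a short experiment. This is only a sketch: it raises an error inside suppressWarnings(), catches it with try(), and compares the warn option before and after.

```r
before <- getOption("warn")

# An error raised inside suppressWarnings(), caught by try() so that
# the script keeps running.
attempt <- try(suppressWarnings({
    warning("ignored")
    stop("boom")
}), silent = TRUE)

after <- getOption("warn")
unchanged <- identical(before, after)
print(unchanged)
```

As far as I can tell, suppressWarnings() works through condition handlers and never touches the warn option, which would explain why nothing needs restoring.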
Re: [R] How can you buy R?
[Damien Joly]

> The entity has a policy of only using software that has been purchased and properly licensed (whatever that means). [...] Any ideas?

[Rogerio Porto]

> I think there isn't such a vendor.

A while ago, the Cygnus organisation was created to address this kind of need, betting on the fact that they could live well by support contracts on free software, mainly GPL'ed software, which R is. Since then, Cygnus has been bought by Red Hat, and I do not know if the original vocation survived, or has been plain lost. With enough luck, it could be useful to check on this side, who knows... :-)

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] text plots?
[Robert Citek]

> Is there a way to do text plots in R? I'd like to do some simple XY plots in R with the output in text (ascii). Since I connect to a remote Linux machine using SSH, being able to generate a rough idea of what a plot will look like in text would be of benefit.

Note that it is easy with SSH to open a graphics connection, you may use "ssh -X" to force it. Then, R will show you nice graphics even if run remotely.

> For example, with gnuplot I can do the following:
>
>     echo 'set terminal dumb ; plot sin(x)' | gnuplot
>
> to generate a simple sin wave. Regards,

Amusing for me that you mention this: I wrote that code, many years ago. Although gnuplot was aiming at higher graphic output quality on average, my contribution was readily accepted, and considered useful. While it is possible to attach images within an email, rough graphics are sometimes simpler and sufficient.

I do not know how easy (or not) it would be to write a "dumb" device for R, but I wish that if someone ever contributes it, it will be accepted by the core team.

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] command completion?
[Duncan Murdoch]

> [Robert Citek]
>
> > Does R have command or object name completion?
>
> [...] I don't think it would be a welcome change to the console versions; some of them use readline's filename completion which would almost certainly be broken by this.

We have to put things in perspective, here. In my opinion, object name completion would be a lot more useful than filename completion, because in R, we name R objects much more often than we name files.

Others need to run under ESS. While this is a good thing for Emacs lovers, the requirement is rather unwelcome for pagans! :-)

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] On the speed of apply and alternatives?
[Monty B.]

> I have to handle a large matrix (1000 x 10001) where in the last column i have a value that all the preceding values in the same row has to be compared to. I have made the following code:
>
>     # generate a (1000 x 10001) matrix, testm
>     # generate statistics matrix 1000 x 4:
>     qnt <- c(0.01, 0.05)
>     cmp_fun <- function(x) {
>         LAST <- length(x)
>         smpls <- x[1:(LAST-1)]
>         real <- x[LAST]
>         ret <- vector(length=length(qnt)*2)
>         for (i in 1:length(qnt)) {
>             q_i <- quantile(smpls, qnt[i])     # the quantile i
>             m_i <- mean(smpls[smpls < q_i])    # mean of obs less than q_i
>             ret[i] <- ifelse(real < q_i, 1, 0)
>             ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
>         }
>         ret
>     }
>     hcvx <- apply(testm, 1, cmp_fun)
>
> Can anyone advise as to how I can optimize the runtime of this problem? All suggestions are welcome!

You may speed it up a bit, not so much, with the following:

    stats.testm <- function (testm, qnt=c(0.01, 0.05)) {
        quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt)
        smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)]
        reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)]
        keeps <- smpls < rep(quants, ncol(smpls))
        means <- rowSums(smpls * keeps) / rowSums(keeps)
        matrix(rbind((reals < quants) + 0,
                     (reals < quants) * (reals - means)),
               length(qnt) * 2)
    }

Try it with something like:

    gen.testm <- function (n, m) {
        matrix(sample(0:99, n * (m + 1), TRUE), n)
    }
    testm <- gen.testm(100, 100)
    stats.testm(testm)

Without checking, I would suspect that quantile is the big consumer. If you could do without quantile interpolation, maybe some more vectorisation could be possible, but in any case, I do not think you can avoid sorting each row separately, in one way or another (currently done within quantile).

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] screen wrapping
[Robert Citek]

> How can I increase/decrease the line length for screen wrapping?

Check ?options, and within it, "width", it might be what you want.

--
François Pinard   http://pinard.progiciels-bpi.ca
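For the width option mentioned above, a short sketch; the value 60 is arbitrary.

```r
old <- options(width = 60)      # options() returns the previous setting
print(1:40)                     # printed lines now wrap near 60 columns
current <- getOption("width")
options(old)                    # restore the earlier width
```

Saving the value returned by options() makes it easy to go back to the previous width afterwards.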
Re: [R] install R under suse: packages dependency
[zhihua li]

> I'm trying to install R 2.3.0 under Suse 10.0. As I'm using SSH to login into the SUSE server, I can't use YAST2,

I presume this is because you cannot remotely mount the CD's or DVD's? The next time you visit your server, if possible, copy your distribution media to your hard disks, you'll find out that this is really a useful thing to do. You can later use YaST2 to install from the copies you made, even remotely. There is no problem using YaST2 over SSH, either in graphical mode (if you used `ssh -X') or in text mode.

In my experience, R 2.3.0 installs painlessly under SuSE 10.0, and needs nothing which is not already available on the distribution media. Should I say, I'm still impressed (even astonished) that R installation succeeds so easily, given the size and complexity of the distribution.

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] plot cdf
[Romain Francois]

> [...] it would be useful to add an option 'ask' in 'example', maybe with a default to TRUE in interactive mode

Seconded. `example(...)' would be more friendly for the average user.

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] deleting rows with the same ID if any meet a condition
[gblevins]

> If x2 equal 2 then I want to delete all the rows for that person from the dataframe--see Before and After below.
>
> Before
>
>     x1 <- c(1,1,1,2,2,3,3,3)
>     x2 <- c(2,3,3,1,1,4,4,2)
>     x3 <- data.frame(x1,x2)
>     x3
>       x1 x2
>     1  1  2
>     2  1  3
>     3  1  3
>     4  2  1
>     5  2  1
>     6  3  4
>     7  3  4
>     8  3  2
>
> After
>
>       x1 x2
>     1  2  1
>     2  2  1

You might try:

    subset(x3, !x1 %in% x1[which(x2==2)])

--
François Pinard   http://pinard.progiciels-bpi.ca
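The one-liner above can also be unfolded into two steps, which may make the logic easier to follow. This sketch reuses the data from the question; the names excluded and kept are mine, and plain indexing replaces subset().

```r
x3 <- data.frame(x1 = c(1, 1, 1, 2, 2, 3, 3, 3),
                 x2 = c(2, 3, 3, 1, 1, 4, 4, 2))

# IDs having at least one row where x2 equals 2.
excluded <- unique(x3$x1[x3$x2 == 2])

# Keep only the rows whose ID is not in that set.
kept <- x3[!(x3$x1 %in% excluded), ]
print(kept)
```

The surviving rows keep their original row names (4 and 5); rownames(kept) <- NULL renumbers them from 1 if that matters.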
Re: [R] Is there a way....
[Levent TERLEMEZ]

> I would like to get rid of counting lines in fix() when i made a mistake in coding? Is there an easy way to add line numbers to editor?

You may configure R to use the editor you want, the way you want. For example, if you want fix() to start Vim in graphical mode, highlighting file contents using R syntax, and numbering source lines, you might use the following R command:

    options(editor = "gvim -c \"set nu\" -c \"set ft=r\"")

Of course, how to do this much depends on each editor. Some simpler editors may not have options for displaying the line number on each line. Yet, most keep the line counter for the cursor position updated in some mode or status bar, so if nothing better, you can learn to position the cursor while keeping an eye on that. :-)

Once you find the proper incantation for your editor, and if you always want it activated, save the R command within your ~/.Rprofile file.

P.S. - By the way, many congratulations and thanks to the R Core team for the recent publication of R 2.3.0.

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] how to draw a circle
[Jian Zhang]

> how to draw a circle (e.g. radius=10cm) of one point? And how to choose these points in the circle?

There also are ellipse functions in both packages car and ellipse.

--
François Pinard   http://pinard.progiciels-bpi.ca
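Besides the packages named above, both parts of the question can be sketched in base R. The centre, the radius and the sample points below are invented, and all coordinates are assumed to share one unit.

```r
# Hypothetical centre, radius and points, all in the same units.
centre <- c(0, 0)
radius <- 10
x <- c(1, 9, -12, 7)
y <- c(2, 9, 0, -7)

# A point lies in the circle when its distance to the centre does
# not exceed the radius.
inside <- (x - centre[1])^2 + (y - centre[2])^2 <= radius^2

# Outline of the circle, as 100 points along the circumference.
angle <- seq(0, 2 * pi, length.out = 100)
cx <- centre[1] + radius * cos(angle)
cy <- centre[2] + radius * sin(angle)

if (interactive()) {
    plot(x, y, asp = 1, xlim = c(-15, 15), ylim = c(-15, 15),
         col = ifelse(inside, "red", "black"))
    lines(cx, cy)
}
```

The asp = 1 argument keeps both axes on the same scale, so that the outline actually looks like a circle on screen.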
Re: [R] I am surprised (and a little irritated)
[Tom Backer Johnsen]

> [...] I've understood that RPM's are somewhat like installing programs on Windows, so that was downloaded and started with YAST. [...] Then I discover to my big surprise that the readme file says that I need to have eight installed packages. Then it says "Most of them are included in a standard install". [...] someone should get the OpenSuse people to include R in the installation.

[Gabor Csardi]

> I'm irritated as well. Your email should go to some suse mailing list, this is a suse problem, it has (almost) nothing to do with R.

We are running regular (Pro?) SuSE systems at various distribution levels on a flurry of machines, but have no experience with OpenSuse, however, and install R from sources on these machines wherever needed. My notes say that *I* should pay attention to have the following packages pre-installed, besides those which are already usual for us:

    gcc-fortran, libjpeg-devel, readline-devel, tcl-devel, tk-devel

I'm not sure about tk-devel. But these are all available on the CDs.

R installation from sources goes surprisingly well for us, using SuSE. "surprisingly" is a euphemism here, "astonishingly" is more proper, given the size and complexity of R sources, components, and all release engineering. I'm always quite impressed that such software works! There is a tremendous amount of work behind a successful distribution, which many of us do not suspect enough! :-) It commands admiration.

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] R and ViM
[Michael Graber]

> [...] I'd like to be able to use R together with ViM. [...] My question now is, whether there are already people out there knowing how to do this in a similar easy way as with Emacs [...]

I've been an Emacs user for a very long time, and then, switched to Vim. See http://pinard.progiciels-bpi.ca/opinions/editors.html, if you feel curious, for a few personal thoughts on Emacs.

For R, I tried sticking to a mere interactive shell, taking advantage of the GNU readline interface built into R, with Vim as an external editor. For sending R code from Vim to R, one merely selects the code to send within Vim using the mouse, and pastes it directly with the mouse in the interactive shell window running R. Simple and comfortable! :-)

Emacs offers ESS, which has many interesting features. However, although quite attractive, it did not fully seduce me: a bit because I try to avoid returning to Emacs keystroke habits, a bit because ESS is heavyweight compared to the Vim + R-in-a-shell solution, a bit because ESS adds distracting idiosyncrasies, like scrolling differently or opening extra windows at times. R already offers enough options I could customize if I want to read help in a browser or a pager, and at good speed. (Of course, if you use a heavy browser, you feel it; but "links -g" is OK!)

An ESS nicety that my current setup does not really replace is the automatic highlighting of R output. One of the advantages of this output highlighting is visually spotting R requests and replies. As a compromise, I'm using this bit of a kludge in my .Rprofile file:

    if (interactive()) {
        local({
            options(editor='vim -c "set ft=r"')
            if (Sys.getenv('TERM') %in% c('rxvt', 'xterm')) {
                onglet = 2
                options(prompt=paste(sep='',
                                     formatC('', width=80-onglet),
                                     '\033[;30;45m',
                                     formatC('', width=onglet),
                                     '\033[0m\n',
                                     options('prompt')))
            }
        })
    }

The "set ft=r" bit ensures proper highlighting and coloration within Vim, whenever edit() or fix() are used. Here "vim" could be replaced by "gvim" or "gvim -f", say. (In my Vim configuration, "vim" uses the GUI automatically if started within X; or uses the console mode otherwise.)

Then, the R prompt is modified to visually mark each request-reply interaction with a white separating line holding a small violet marker at the right. It works nicely for me in almost all circumstances (there are a few, uncommon exceptions). Usual scrolling of the shell window allows me to quickly find R commands and replies, even if much less colourful than with ESS. I'm ready to pay that price for simplicity.

A last trick which is convenient in my case. My X window manager allows customization of keystrokes. (I'm using Openbox, but surely many other window managers offer that possibility too.) For all 26 of Ctrl-Alt-Letter, the same small "openbox-helper" (Python) script of mine is called with the Letter given as an option, which may launch applications in turn. This is how Ctrl-Alt-R opens a shell window running R, and Ctrl-Alt-M opens a shell window running Maxima. In both these shells, Ctrl-D closes the application and the window. This is convenient for quick mathematical jobs, and quite in the spirit of Vim (fast and easy start/exit, instead of long running like Emacs).

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [R] What does rbind(iris[,,1], iris[,,2], iris[,,3]) do?
[Gabor Grothendieck]

> What you are referring to as iris is called iris3 in R, so just replace iris with iris3. iris3 is a 3d array in R whereas iris is a data frame.

Thanks for this calm and simple reply. Some could learn from you! :-)

-- François Pinard http://pinard.progiciels-bpi.ca
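A short sketch of the distinction made in the reply above, using only the iris and iris3 objects from R's standard datasets package. The variable name `stacked` is introduced here for illustration.

```r
# iris3 is a 50 x 4 x 3 array: 50 flowers, 4 measurements, 3 species.
dim(iris3)

# Stack the three species slices into one 150 x 4 matrix.
stacked <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])
dim(stacked)

# iris holds similar measurements, but as a data frame, not an array.
is.data.frame(iris)
is.array(iris3)
```

Three-index subsetting such as `x[, , 1]` is meant for 3d arrays, which is why the expression in the subject line only makes sense with iris3.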
Re: [R] strange matrix behaviour: is there a matrix with one row?
[EMAIL PROTECTED]

>     y <- matrix(1:8, ncol=2)
>     is.matrix(y[-c(1,2),])
>     [1] TRUE
>     is.matrix(y[-c(1,2,3),])
>     [1] FALSE
>     is.matrix(y[-c(1,2,3,4),])
>     [1] TRUE
>
> It seems like an inconsistent behaviour:
> - with 2 or more rows we have a matrix
> - with 1 row we do not have a matrix and
> - with 0 rows we have a matrix again

?'[' explains it. Using your example:

    is.matrix(y[-c(1, 2), , drop=FALSE])
    [1] TRUE
    is.matrix(y[-c(1, 2, 3), , drop=FALSE])
    [1] TRUE
    is.matrix(y[-c(1, 2, 3, 4), , drop=FALSE])
    [1] TRUE

-- François Pinard http://pinard.progiciels-bpi.ca
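To make the mechanism behind the reply above visible, here is a small sketch that looks at the dim attribute rather than only at is.matrix(). The variable names are illustrative only.

```r
y <- matrix(1:8, ncol = 2)

# Default drop = TRUE: a single remaining row loses its dim attribute
# and collapses to a plain vector.
one_row_dropped <- y[-c(1, 2, 3), ]
is.null(dim(one_row_dropped))

# drop = FALSE keeps the dim attribute, so the result is a 1 x 2 matrix.
one_row_kept <- y[-c(1, 2, 3), , drop = FALSE]
dim(one_row_kept)

# With zero rows left, the result is a 0 x 2 matrix.
zero_rows <- y[-c(1, 2, 3, 4), , drop = FALSE]
dim(zero_rows)
```

Dropping only removes dimensions of extent one, which would explain why the zero-row case stays a matrix even without drop=FALSE.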
Re: [R] argv[0] --- again
[ivo welch]

> how about people on [...] linux or unix [...]

See ?commandArgs.

-- François Pinard http://pinard.progiciels-bpi.ca
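A minimal sketch of what ?commandArgs offers, for readers who want something close to argv[0]. The variable names are illustrative, and the trailingOnly argument is assumed to be available in the R version at hand.

```r
# All arguments as seen by the R executable. The first element is the
# name by which R was invoked, the closest analogue of argv[0].
all_args <- commandArgs()
program <- all_args[1]

# Only the arguments given after --args on the command line.
user_args <- commandArgs(trailingOnly = TRUE)

cat("invoked as:", program, "\n")
cat("user arguments:", user_args, "\n")
```

Note that `program` names the R executable, not the script being run, so this may or may not be what the original poster was after.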
Re: [R] Remove [1] ... from output
[Gregor Gorjanc]

> I am writing some numbers and character vectors to an ascii file and would like to get rid of [1] ... as shown below (a dummy example)
>
>     R> runif(20)
>      [1] 0.653574 0.164053 0.036031 0.127208 0.134274 0.103252 0.506480 0.547759
>      [9] 0.912421 0.584382 0.987208 0.996846 0.666760 0.053637 0.327590 0.370737
>     [17] 0.505706 0.412316 0.887421 0.812151
>
> I have managed to work up to remove quotes and all [*] except [1] as shown below.
>
>     R> print(paste(runif(20), collapse = " "), quote = FALSE)
>     [1] 0.790620362851769 0.45603066496551 0.563822037540376 0.812907998682931 0.726162418723106 0.37031230609864 0.681147597497329 0.29929908295162 0.209858040558174 0.304300333140418 0.105796672869474 0.743657597573474 0.409294542623684 0.825012607965618 0.282235795632005 0.21159387845546 0.620056127430871 0.337449935730547 0.754527133889496 0.280175548279658
>
> Any hints how to solve my task?

You may use cat instead of print. No need to paste then.

-- François Pinard http://pinard.progiciels-bpi.ca
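A minimal sketch of the cat() suggestion above. The temporary file and the variable names are assumptions made for the example.

```r
x <- runif(20)

# cat() writes the bare values separated by spaces:
# no [1] index prefix and no quotes.
cat(x, "\n")

# The same call can write straight to a file.
out <- tempfile(fileext = ".txt")
cat(x, "\n", file = out)

# Read the file back to confirm that it holds 20 plain numbers.
written <- scan(out, quiet = TRUE)
length(written)
```

cat() prints numbers with up to 7 significant digits, so if full precision matters, something like format(x, digits = 15) could be passed instead of x.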
[R] Replies on this list [was: removing NA from a data frame]
[Berton Gunter] [Sam Steingold]

> > PPS. how do I figure out the number of rows in a data.frame? is length(attr(X,"row.names")) the right way?

> help.search("number of rows") immediately gets you your answer!

Hi, people. Here, I get:

    Help files with alias or concept or title matching ‘number of rows’
    using fuzzy matching:

    nrow(base)      The Number of Rows/Columns of an Array

and '?nrow' says that it is meant for arrays: nothing about data.frame, and not a generic method either. Even if it was a class method, we should not expect a new user to be very familiar with R (both!) class systems from the start.

What might a new user think, reading the documentation? Sam Steingold is surely an experienced and competent computer guy. He might guess, who knows, that some automatic array to data.frame conversion occurs (however inefficient that could be). Yet this would not match other knowledge nor experimentation, as a data.frame is hardly an array:

    x = data.frame(a=1:3, b=c(TRUE, TRUE, FALSE), c=letters[1:3])
    as.array(x)
    Error in dimnames<-.data.frame(`*tmp*`, value = list(c(a, b, c :
            invalid 'dimnames' given for data frame

Although help.search("number of rows") provides an answer that happens to be right, it might not be recognised as such by an intelligent reader, and so, it is not really satisfactory. The documentation for nrow could be improved by saying that it applies to any kind of structure for which dim() is meaningful. And even then, ?dim is silent about data frames. One clue (yet a pretty weak one) that nrow may be applied to a data.frame comes from the fact that ?dim.data.frame lists the same documentation as ?dim.

Why do I say all this? Because it happens, not necessarily in this case, yet a bit too often nevertheless, that answers given to users are uselessly harsh or haughty. Especially when they imply that the documentation is perfect.

One problem is that some people enjoy reading such replies. As an example of this strange kind of pleasure, here is an excerpt from the R Archives, which I find especially enlightening on the mentality of a few members:

    From: [EMAIL PROTECTED] (Steve Wisdom)
    Date: 2003-12-26 17:04
    Subject: [R] re| Dr Ward on List protocol

    Andrew C. Ward [EMAIL PROTECTED] :

    > With respect to 'tone' and 'friendliness', perhaps all that is
    > meant or needed is that people be polite and respectful. I shake
    > my head as often at rude answers

    Oh, by gosh, by golly. I don't think an occasional dose of 'real life', via a jab from the Professor, will cause any lasting harm to the cosseted emolumated students and academics on the List.

    On a Wall St trading desk, for example, every day one is kicked in the head more brutally by clients, superiors, counterparts, the markets etc, than ever one would be by the Professor.

    Plus, the Professor's jabs are good Schadenfreudic fun for the rest of us.

    Regards, Steve Wisdom Westport CT US

The truth is that not everybody around here is a "cosseted emolumated" student or academic. Moreover, behaviour at trading desks is fully irrelevant, and for most of us, this is not the kind of life we chose to live. Wrong behaviour elsewhere is hardly an excuse for not behaving properly here. Moreover, what is mere "good fun" for some may be perceived as highly inelegant by others.

While some competent members may inspire admiration and charisma by their knowledge and dedication, they sometimes damage beyond repair what they inspire, when showing poor humanity.

I'm aware of the constant fear some have of seeing this list abused. There are ways for not being abused, which do not require becoming abusive ourselves. We should deepen such ways in our own habits.

-- François Pinard http://pinard.progiciels-bpi.ca
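On the technical question that opened this message, a short sketch showing that nrow() does work on a data frame because dim() does. The data frame is the one from the message; nothing else is assumed.

```r
x <- data.frame(a = 1:3, b = c(TRUE, TRUE, FALSE), c = letters[1:3])

# dim() has a method for data frames, and nrow()/ncol() simply pick
# the first and second elements of dim().
dim(x)
nrow(x)
ncol(x)

# length() is a different thing: a data frame is a list of columns,
# so length() counts the columns.
length(x)
```

This avoids reaching into attr(X, "row.names") directly, which depends on how row names happen to be stored internally.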
Re: [R] error function
[Kjetil Brinchmann Halvorsen]

> erf [in] package (CRAN) NORMT3, as help.search("error function") could have told [you]

It does not for me. I would presume one needs NORMT3 installed first, and NORMT3 is seemingly not part of the standard base R installation.

-- François Pinard http://pinard.progiciels-bpi.ca
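For readers without NORMT3 installed, the real-valued error function can be written in terms of pnorm() from base R, using the identity erf(x) = 2 * pnorm(x * sqrt(2)) - 1. This is a sketch for real arguments only; the function names are defined here and are not taken from any package.

```r
# Error function for real x, via the standard normal distribution function.
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1

# Complementary error function, computed from the upper tail to avoid
# cancellation for large x.
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)

erf(c(0, 0.5, 1))
erf(1) + erfc(1)
```

This does not cover complex arguments, which may be a reason to prefer the NORMT3 package after all.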
Re: [R] Remove gray grid from levelplot
[Brian Ripley]

> On Mon, 6 Mar 2006, Martin Sandiford wrote:
>
> > [...] P.S. To me, the png() device does not appear to do sub-pixel rendering. The postscript() and pdf() devices do.
>
> What could you possibly mean by that?

I would think the original poster refers to aliasing issues.

> The png device writes on a bitmap. It outputs a rectangular grid of either pre-defined colour indices or RGB values. There is nothing in the PNG standard to allow anything finer.

Granted. Yet, there are nuances. Anti-aliasing techniques may be applied to bit-mapped images like PNGs, and a carefully computed alpha channel could be included in the PNG as a way to acknowledge sub-pixel rendering matters. If the background of the generated image is opaque instead of transparent, the graphics and the background might be combined at PNG generation, swallowing what would have been an alpha channel and so, sparing the need of including any in the generated PNG.

However, on this Linux system, if I understood correctly, R goes through X11 for generating PNGs, and so, does no better than X11 itself (at least as currently driven by R) in the area of anti-aliasing. Anti-aliasing libraries exist (which I never really studied or used myself) that could likely provide better PNG quality.

Has some decision been reached among developers on this topic? I would guess, without really knowing, that developers favor vector-to-raster rendering to be done outside R, whenever quality is required. Using an anti-aliasing library for higher output quality within R would mean, besides the obvious trouble of selecting one of those libraries and programming the interface, adding yet another dependency at R build-time (likely autoconfigured, of course), and an observable slowdown for graphics which are more heavily loaded, especially in interactive mode. For one, I do not need more than draft quality so far when using R interactively for plots.

Maybe some "draft", "quality" or "aa" flag could be added to control anti-aliasing behaviour? (I know that "quality" is already used to mean something else for JPEG images.)

Just a few thoughts. Keep happy, all!

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Newton-Raphson with analytical derivatives
[m p]

> Can someone point to a package with Newton-Raphson method using analytic derivatives. This is just for a 1D problem. Could not find easily anything suitable.

You may check:

    http://pinard.progiciels-bpi.ca/plaisirs/animations/NRart/R/nr.image.R

If you remove the graphics-related lines, the few remaining lines are Newton-Raphson over an expression. Don't take this too seriously, I was merely toying with this to get an initial feel of the R language. :-)

-- François Pinard http://pinard.progiciels-bpi.ca
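Independently of the script linked above (which is not reproduced here), a minimal 1D Newton-Raphson sketch using R's symbolic differentiation D() for the analytic derivative might look as follows. The function name `newton_raphson`, its arguments and the tolerance are all assumptions made for this example.

```r
# Newton-Raphson on an unevaluated expression in the variable x.
# expr: an expression such as quote(x^2 - 2); x0: starting value.
newton_raphson <- function(expr, x0, tol = 1e-10, max_iter = 100) {
  dexpr <- D(expr, "x")                  # analytic derivative with respect to x
  x <- x0
  for (i in seq_len(max_iter)) {
    fx <- eval(expr, list(x = x))
    dfx <- eval(dexpr, list(x = x))
    if (dfx == 0) stop("zero derivative at x = ", x)
    x_new <- x - fx / dfx                # Newton step
    if (abs(x_new - x) < tol) return(x_new)
    x <- x_new
  }
  warning("no convergence within max_iter iterations")
  x
}

# Example: the positive root of x^2 - 2, starting from 1.
root <- newton_raphson(quote(x^2 - 2), x0 = 1)
root
```

D() only knows a fixed table of functions, so this approach is limited to expressions it can differentiate. For anything else, a numerical root finder such as uniroot() may be the more practical choice.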
Re: [R] prehistoric versions of R -- 1995!
[Martin Maechler]

> 2) The oldest stuff that I have is all from 1995;

Mailing lists seem to go back into 1995 too. I found a few messages from around 1994 on topics to be later found within R, but I'm not sure where I got these old messages from. I did find a message really related to R-pre-alpha, which itself quotes a message written in 1994.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] qqplot
[Vincent Negre]

> [...] I do not understand how qqplot() computes quantiles.

Just type ``qqplot`` (without the parentheses) at the R prompt, to see the source code. ``qqplot`` does not really compute quantiles: they are obtained directly by sorting its arguments.

-- François Pinard http://pinard.progiciels-bpi.ca
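The remark above can be checked without reading the source. A small sketch, assuming two samples of equal length (with unequal lengths, qqplot() interpolates the longer sample, which is not shown here):

```r
set.seed(1)
x <- rnorm(50)
y <- rexp(50)

# plot.it = FALSE returns the plotted coordinates without drawing anything.
qq <- qqplot(x, y, plot.it = FALSE)

# For equal-length samples, the coordinates are just the sorted inputs.
same_x <- identical(qq$x, sort(x))
same_y <- identical(qq$y, sort(y))
c(same_x, same_y)
```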
Re: [R] Difficulty with qqline in logarithmic context
[Brian Ripley]

> Is there a good reason to use qqnorm in a single-log context?

Yes. Googling around reveals this is not so uncommon.

> Should one not rather use
>
>     qqnorm(log(freq))
>     qqline(log(freq))

In the display produced by qqnorm, the y-axis would then show log(value) labels, while the user (me!) expects value labels.

> since you are (I guess) looking at log-normality of freq?

Once again, I was merely toying with qqplot. I found it intriguing that, while shuffling messages around between folders, for a good while, the distribution of log(number of messages) per folder appears vaguely normal, as I do not quickly see a reasonable justification for this.

> Another way to look at that is
>
>     qqplot(qlnorm(ppoints(length(freq))), freq, log="xy")
>
> the same plot, different scales.

Interesting, thanks for teaching me about ppoints. Yet, I stay more happy with the abscissa scale produced by qqnorm. Besides, how would one use qqline with the above?

> (I believe a QQ plot should always have comparable scales on the two axes.)

While comparable scales are somewhat simpler to compare, this is not necessarily what is most adequate for the user. Proof is that while quantiles are being compared here, scales do not show quantiles, but units as meaningful to the user. One might want to compare variables scaled very differently, maybe because of different units from the same distribution, or from different but similar distributions using different scales and shifted to different means. Or even, why not, if this is what is meaningful for users, a log scale.

> The point is that qqline is tied to normality, not to log-normality.

As it stands, yes. As a convenience, it could be extended (probably easily) to log-normality. qqnorm already does something sensible in log-context, so a user might expect qqline to do equally well.

The real point might be that qqline is tied to abline a bit too blindly. What is the meaning of intercept and slope of a straight line on a graphic in log context? First, the intercept might not even exist. Second, abline interpretation depends on the clipping, and possibly on the extrema of the pretty breakpoints chosen for scales, so making it hard to predict in average use. There ought to be some reason for the log-aware code in abline, yet I did not find documentation for it.

The wisest for abline, in my very humble opinion, would be for it to complain if ever called in log context. Then, qqline would indirectly complain through abline, if qqline is not modified to do something more proper. Moreover, if it is definitely out of the question that qqline be ever meaningfully called in log context, then so is qqnorm, which should then complain as well.

Currently, qqline misbehaves, in that it silently produces a meaningless result, while it could either diagnose that the result is meaningless, or produce a meaningful result.

[Remainder of the reply top-quoted, as usual on r-help.]

> On Wed, 1 Feb 2006, François Pinard wrote:
>
> [...]
[R] Difficulty with qqline in logarithmic context
Hi, R friends. I had some difficulty with the following code:

    qqnorm(freq, log='y')
    qqline(freq)

as the line drawn was seemingly random. The exact data I used appears below. After wandering a bit within the source code for abline, I figured out I should rather write:

    qqnorm(freq, log='y')
    par(ylog=FALSE)
    qqline(log10(freq))
    par(ylog=TRUE)

I'm proposing that this little stunt rather be hidden and automatically effected within qqline proper, whenever par('ylog') is TRUE. I thought about providing a patch, as qqline is so small. Yet it would be more noise than useful, as I'm not familiar with the datax argument usage, which should probably be addressed as well.

Here is the data, in case useful:

    freq <- as.integer(c(33, 79, 21, 436, 58, 18, 1106, 498, 1567, 393, 2, 104, 50, 67, 113, 76, 327, 331, 196, 145, 86, 59, 12, 215, 293, 154, 500, 314, 246, 587, 85, 23, 323, 3, 13, 576, 29, 37, 24, 21, 1230, 137, 13, 93, 3, 101, 72, 218, 59, 17, 2, 8, 86, 143, 150, 22, 19, 234, 119, 157, 4, 255, 146, 126, 76, 15, 271, 170, 4, 6, 16, 3048, 2175, 3350, 5017, 5706, 1610, 665, 322, 1, 16, 47, 51, 168, 94, 66, 154, 99, 11, 547, 953, 1, 1071, 80, 184, 168, 52, 187, 103, 187, 361, 46, 85, 135, 597, 121, 283, 26, 12, 20, 169, 9, 79, 15, 114, 75, 30, 111, 556, 173, 32, 99, 438, 2, 2, 1, 117, 5, 3, 51, 8, 41, 12, 23, 2, 13, 5, 1, 9, 4, 1, 7, 15, 5, 48, 16, 112, 6, 1, 39, 60, 5, 23, 5, 19, 1, 8, 32, 4, 13, 1, 14, 71, 5, 1, 35, 30, 100, 389, 22, 8, 1, 192, 40, 6, 3, 17, 2, 14, 71, 14, 1, 5, 4, 32, 21, 18, 13, 2, 2, 45, 342, 46, 144, 18, 131, 188, 112, 37, 85, 90, 8, 195, 173, 5, 53, 96, 37, 16, 16, 281, 64, 50, 92, 336, 31, 744, 4, 134, 74, 1, 227, 6, 48, 418, 64, 66, 59, 20, 45, 20, 370, 148, 22, 7, 30, 601, 29, 82, 113, 938, 252, 65, 137, 72, 22, 98, 12, 152, 212, 13, 8, 35, 3, 77))

Yet this really is the value of courriel$freq after data(courriel), with a file .../R/data/courriel.R here, holding:

    courriel <- read.table(pipe('grep -c \'^From \' ../courriel/*'),
                           sep=':', as.is=TRUE, row.names=1,
                           col.names=c('fichier', 'freq'))

My goal, which is nothing serious, was merely to toy with the number of messages per folder, for folders massaged out of R archives.

Version:
    platform = i686-pc-linux-gnu
    arch = i686
    os = linux-gnu
    system = i686, linux-gnu
    status =
    major = 2
    minor = 2.1
    year = 2005
    month = 12
    day = 20
    svn rev = 36812
    language = R

Locale:
    LC_CTYPE=fr_CA.UTF-8;LC_NUMERIC=C;LC_TIME=fr_CA.UTF-8;LC_COLLATE=fr_CA.UTF-8;LC_MONETARY=fr_CA.UTF-8;LC_MESSAGES=fr_CA.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

Search Path:
    .GlobalEnv, package:methods, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, fp.etc, Autoloads, package:base

-- François Pinard http://pinard.progiciels-bpi.ca
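As a complement to the par(ylog=...) workaround in the message above, here is a sketch that computes the reference line on the log10 scale by hand and draws it with lines(), which takes y values in data units even on a log axis. It assumes the usual qqline construction (a line through the first and third quartiles). The helper names `qqline_log10_coef` and `draw_qq_log10` are hypothetical, and only the first ten values of the posted data are used.

```r
# Hypothetical helper: intercept and slope, on the log10(y) scale, of the
# line through the quartiles of the sample and of the standard normal.
qqline_log10_coef <- function(y, probs = c(0.25, 0.75)) {
  ly <- quantile(log10(y), probs, names = FALSE)
  x <- qnorm(probs)
  slope <- diff(ly) / diff(x)
  c(intercept = ly[1] - slope * x[1], slope = slope)
}

# Hypothetical helper: normal QQ plot with log y axis plus reference line.
draw_qq_log10 <- function(y) {
  coefs <- qqline_log10_coef(y)
  qqnorm(y, log = "y")
  xs <- range(qnorm(ppoints(length(y))))
  # Back-transform the line to data units before drawing.
  lines(xs, 10^(coefs["intercept"] + coefs["slope"] * xs))
  invisible(coefs)
}

freq <- c(33, 79, 21, 436, 58, 18, 1106, 498, 1567, 393)
coefs <- qqline_log10_coef(freq)
coefs

# Only draw when a user is present to look at the plot.
if (interactive()) draw_qq_log10(freq)
```

Drawing through lines() rather than abline() sidesteps the question of what an intercept and slope mean on a log axis, at the cost of the line stopping at the range of the plotted points.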
Re: [R] R-help Digest, Vol 35, Issue 24
[Gabor Grothendieck]

> [...] this list is inhabited by some rather rude participants but everyone puts up with them in the hope that they do have some useful remarks.

I've been witnessing this list for about one year, and also read *lots* of archived messages. While it is true that a few members do not use white gloves, are rather fond of concise replies, and do express strong opinions at times, they never went overboard insulting people and always kept a reasonable measure, at least so far as I could see (yet who knows, outliers might happen! :-).

(*) Our whole society is a bit shy and shivers easily when opinions are expressed nowadays. I often observed that people quickly get insecure, feel attacked, and overreact (by running away or starting a fight).

> There is even a group of thought that feels it is a justifiable way to keep the list volume under control.

This may work because of the starred paragraph above, that is, for wrong reasons. Best is, and this often occurs on the R list, when everything (facts, opinions) is being shared efficiently, without useless arguing. Then, threads quickly fade out.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] R-help Digest, Vol 35, Issue 24
[EMAIL PROTECTED], addressing Brian Ripley]

> First of all, unless you are an english professor, then I do not think you have any business policing language.

We all make mistakes (English or otherwise). I'm very grateful that people forgive my own errors, and I try to be tolerant of others. (Yet, it happens that people lacking good will ask for stronger reactions.)

This is the business of everybody, really, building a better community in every possible aspect, and the means for this go through interaction and collaboration. Let's all be humble enough to ponder the criticism of others, improve ourselves, and so increase the value of our share.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] [Rd] Display an Image on a Plane
[Barry Rowlingson] [Ben Bolker] [Labbe, Vincent]

> > > I am new to R and I would like to display an image on a plane in a 3D plot, i.e. I would like to be able to specify a theta and a phi parameters like in the function persp to display a 2D image on an inclined plane.

> > what do you mean by image exactly?

> I think once you get into doing fancy visualisations like this then you may find a solution outside of R. [good references deleted]

Bonjour, Vincent. I'm not fully sure I understand your request. What I get is that you want to transform an image on a plane as if one was looking at it in space, from an angle.

If I had this problem, I would probably produce the image using the regular R machinery for this, like png() or postscript(), then interactively process the result within Gimp, using trapezoidal deformations (I think they call it "Perspective transformation"). For example, I used this simple trick in the following picture:

    http://pinard.progiciels-bpi.ca/plaisirs/dessins/cd-back.jpg

for the KWIC listing being part of the composition.

However, if I needed a precise phi and theta for transformations beyond what trans3d() can offer, I would likely use Python or R for computing the projection of the rectangle enclosing the image, then PIL (Python Imaging Library) for producing that precise trapezoidal deformation.

Just sharing ideas, of course. Most likely, if I knew R better, I would use it more fully -- but that's a tautology! :-)

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Dynamic Programming in R
[Arnab mukherji]

> A concern that has been cited that may discourage R use for solving dynamic programs is its memory handling abilities.

For a dynamic programming problem defined over N steps, one usually needs an N*N matrix, so problems should be tractable for N being not too big. In those I studied, CPU time usually was the scarce resource. As extreme paths were known to be very unlikely, this (and memory as well) could be alleviated somehow by limiting the solution search to bands (more or less wide) following the diagonal of the solution matrix. I also had some success in splitting big problems into a sequence of smaller subproblems, and recursively: such approximations are likely not acceptable in the general case.

I would guess that most dynamic programming problems have their own specific artifacts and speed-up techniques, so a universal solution might not be easy. Who knows (I'm not sure): R might well offer a powerful environment for building a dynamic programming framework.

-- François Pinard http://pinard.progiciels-bpi.ca
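To illustrate the banding idea mentioned above, here is a sketch on a stand-in problem: edit distance between two strings, restricted to cells within `band` of the diagonal. The choice of problem, the function name `banded_edit_distance` and its `band` parameter are assumptions made for this example, since the message does not say which problems were studied.

```r
# Edit distance computed only for cells (i, j) with |i - j| <= band.
# Cells outside the band stay at Inf and are never considered, so the
# result is exact only if an optimal alignment stays inside the band.
banded_edit_distance <- function(a, b, band = 2L) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  n <- length(a)
  m <- length(b)
  d <- matrix(Inf, n + 1L, m + 1L)   # d[i + 1, j + 1]: cost for prefixes i, j
  d[1L, 1L] <- 0
  for (i in 0:n) {
    lo <- max(0L, i - band)
    hi <- min(m, i + band)
    if (lo > hi) next                # row i has no cell inside the band
    for (j in lo:hi) {
      if (i == 0L && j == 0L) next
      best <- Inf
      if (i > 0L) best <- min(best, d[i, j + 1L] + 1)       # deletion
      if (j > 0L) best <- min(best, d[i + 1L, j] + 1)       # insertion
      if (i > 0L && j > 0L)
        best <- min(best, d[i, j] + (a[i] != b[j]))         # match or substitution
      d[i + 1L, j + 1L] <- best
    }
  }
  d[n + 1L, m + 1L]
}

banded_edit_distance("kitten", "sitting", band = 3L)
```

This sketch only saves CPU time: it still allocates the full matrix. Saving memory as well would require storing just the band, which is left out here for readability.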
Re: [R] Suggestion for big files [was: Re: A comment about R:]
[hadley wickham] [François Pinard]

> > Selecting a sample is easy. Yet, I'm not aware of any SQL device for easily selecting a _random_ sample of the records of a given table. On the other hand, I'm no SQL specialist, others might know better.

> There are a number of such devices, which tend to be rather SQL variant specific. Try googling for "select random rows mysql", "select random rows pgsql", etc.

Thanks as well for these hints. Googling around as you suggested (yet keeping my eyes in the MySQL direction, because this is what we use), getting MySQL itself to do the selection is a bit discouraging: according to the comments I've read, MySQL does not seem to scale well with the database size, especially when records have to be decorated with random numbers and later sorted.

Yet, I did not run any benchmark myself, and would not blindly take everything I read for granted, given that MySQL developers have speed in mind, and there are ways to interrupt a sort before running it to full completion, when only a few sorted records are wanted.

> Another possibility is to generate a large table of randomly distributed ids and then use that (with randomly generated limits) to select the appropriate number of records.

I'm not sure I understand your idea (what confuses me is the "randomly generated limits" part). If the "large table" is much larger than the size of the wanted sample, we might not be gaining much.

Just for fun: here, sample(1, 10) in R is slowish already :-).

All in all, if I ever have such a problem, a practical solution probably has to be outside of R, and maybe outside SQL as well.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Suggestion for big files [was: Re: A comment about R:]
[Brian Ripley] [François Pinard] [Brian Ripley]

> > > One problem [...] is that R's I/O is not line-oriented but stream-oriented. So selecting lines is not particularly easy in R.

> > I understand that you mean random access to lines, instead of random selection of lines.

> That was not my point. [...] Skipping lines you do not need will take longer than you might guess (based on some limited experience).

Thanks for telling (and also for the expression "reservoir sampling"). OK, then. All summarized, if I ever need this for bigger datasets, selection might better be done outside of R.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Suggestion for big files [was: Re: A comment about R:]
[Martin Maechler]

> FrPi> Suppose the file (or tape) holds N records (N is not known in advance), from which we want a sample of M records at most. [...] If the algorithm is carefully designed, when the last (N'th) record of the file will have been processed this way, we may then have M records randomly selected from N records, in such a way that each of the N records had an equal probability to end up in the selection of M records. I may seek out for details if needed.

> [...] I'm also intrigued about the details of the algorithm you outline above.

I went into my old SPSS books and related references to find it for you, to no avail (yet I confess I did not try very hard). I vaguely remember it was related to Spearman's correlation computation: I did find notes about the severe memory limitation of this computation, but nothing about the implemented workaround. I did find other sampling devices, but not the very one I remember having read about, many years ago.

On the other hand, Googling tells that this topic has been much studied, and that Vitter's algorithm Z seems to be popular nowadays (even if not the simplest) because it is more efficient than others. Google found a copy of the paper:

    http://www.cs.duke.edu/~jsv/Papers/Vit85.Reservoir.pdf

Here is an implementation for Postgres:

    http://svr5.postgresql.org/pgsql-patches/2004-05/msg00319.php

yet I do not find it very readable -- but this is only an opinion: I'm rather demanding in the area of legibility, while many or most people are more courageous than me! :-).

-- François Pinard http://pinard.progiciels-bpi.ca
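The property described in the quoted paragraph (M records out of an unknown N, each with equal probability) is what the simplest reservoir scheme, often called Algorithm R, provides. The sketch below is that simple scheme, not Vitter's algorithm Z and not the SPSS device the message tries to recall. The function name `reservoir_sample` is made up for the example, and a text connection stands in for a big file.

```r
# Read lines one at a time from an open connection and keep a uniform
# random sample of at most m of them, without knowing the total in advance.
reservoir_sample <- function(con, m) {
  reservoir <- character(0)
  n <- 0L                              # number of lines seen so far
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) break      # end of input
    n <- n + 1L
    if (n <= m) {
      reservoir[n] <- line             # the first m lines fill the reservoir
    } else {
      j <- sample.int(n, 1L)           # uniform on 1..n
      if (j <= m) reservoir[j] <- line # keep the new line with probability m / n
    }
  }
  reservoir
}

con <- textConnection(as.character(1:1000))
picked <- reservoir_sample(con, 10)
close(con)
picked
```

Reading one line per call is slow in R, in line with the caution about R's textual I/O voiced elsewhere in this thread, so for really big files this is more a statement of the algorithm than a practical tool.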
Re: [R] Suggestion for big files [was: Re: A comment about R:]
[hadley wickham] [...] according to comments I've read, MySQL does not seem to scale well with the database size according to the comments I've read, especially when records have to be decorated with random numbers and later sorted. With SQL there is always a way to do what you want quickly, but you need to think carefully about what operations are most common in your database. For example, the problem is much easier if you can assume that the rows are numbered sequentially from 1 to n. This could be enfored using a trigger whenever a record is added/deleted. This would slow insertions/deletions but speed selects. Sure (for a caricature example) that if database records are already decorated with random numbers, and an index is built over the decoration, random sampling may indeed be done quicker :-). The fact is that (at least our) databases are not especially designed for random sampling, and people in charge would resist redesigning them merely because there would be a few needs for random sampling. What would be ideal is being able to build random samples out of any big database or file, with equal ease. The fact is that it's doable. (Brian Ripley points out that R textual I/O has too much overhead for being usable, so one should rather say, sadly: It's doable outside R.) Just for fun: here, sample(1, 10) in R is slowish already :-). This is another example where greater knowledge of problem can yield speed increases. Here (where the number of selections is much smaller than the total number of objects) you are better off generating 10 numbers with runif(10, 0, 100) and then checking that they are unique Of course, my remark about sample() is related to the previous discussion. If sample(N, M) was more on the O(M) side than being on the O(N) side (both memory-wise and cpu-wise), it could be used for preselecting which rows of a big database to include in a random sample, so building on your idea of using a set of IDs. 
As the sample of M records will have to be processed in-memory by R anyway, computing a vector of M indices does not (or should not) increase complexity. However, sample(N, M) is likely less usable for randomly sampling a database if it is O(N) to start with.

About your suggestion of using runif and later checking uniqueness, sample() could well be implemented this way, when the arguments are proper. The greater knowledge of the problem could be built right into the routine meant to solve it. sample(N, M) could even know how to take advantage of some simplified case of a reservoir sampling technique :-).

[...] a large table of randomly distributed ids [...] (with randomly generated limits) to select the appropriate number of records. [...] a table of random numbers [...] pregenerated for you, you just choose a starting and ending index. It will be slow to generate the table the first time, but then it will be fast. It will also take up quite a bit of space, but space is cheap (and time is not!)

Thanks for the explanation. In the case under consideration here (random sampling of a big file or database), I would be tempted to guess that the time required for generating pseudo-random numbers is negligible when compared to the overall input/output time, so it might be that pregenerating randomized IDs is not worth the trouble, all the more so since the list of pregenerated IDs is no longer valid whenever the database size changes.

-- François Pinard http://pinard.progiciels-bpi.ca
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
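The "runif, then check uniqueness" idea discussed in this message can be sketched as follows. This is only an illustration, not how sample() is implemented; the function name `sample_ids` is hypothetical. It assumes M is much smaller than N, in which case the work and memory stay close to O(M).

```r
# Hypothetical helper: draw m distinct integers from 1..n without
# ever building a vector of length n.
sample_ids <- function(n, m) {
  stopifnot(m <= n)
  ids <- numeric(0)
  while (length(ids) < m) {
    # Draw only as many candidates as are still missing
    draw <- floor(runif(m - length(ids), 0, n)) + 1
    # Drop duplicates, keeping the order of first appearance
    ids <- unique(c(ids, draw))
  }
  ids
}

set.seed(1)
ids <- sample_ids(1e9, 10)
print(ids)
```

The loop only repeats when a duplicate was drawn, which is rare when M is small relative to N; as M approaches N the rejection step becomes wasteful and an O(N) method is the better choice.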
Re: [R] A comment about R:
[Uwe Ligges]

François Pinard wrote:

[David Forrest] [...] A few end-to-end tutorials on some interesting analyses would be helpful.

I'm in the process of learning R. While tutorials are undoubtedly very useful, and understanding that working and studying methods vary between individuals, what I (for one) would like to have is a fairly complete reference manual to the library [...] organised by topics.

Have a look at help.start() -- Search Engine Keywords -- Section Keywords by Topic.

Yes, thanks. This is quite in the spirit, or direction, of what I was proposing. Is that resource exhaustive? (I'm asking out of laziness, as it might take me several months to really check.)

One serious drawback (for me) is that it requires a heavyweight browser, with Javascript enabled. I do not find this very practical. Another point is that the presentation, while useful, is rather dry.

In another message, I suggested the Emacs Lisp Reference Manual as a good example of a fluid presentation of a voluminous library. There might be some workable compromise between the current situation with R, even through the Keywords by Topic, and that fluidity. (Wikis also have the drawback of requiring heavy machinery, and the editor they force us into is usually unbearable.)

I may be back with this subject, but only in a good while. I'm slowly building a kind of documentation plan I want (still in French), as I learn R, and guess I may complete my basic learning in one or two years from now (hoping I'll stay courageous enough). If I then get something usable or shareable enough, I'll offer it -- because I like returning a little something for the nice tools given to me! :-)

In any case, thanks for listening!

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] A comment about R:
[Jonathan Baron]

[the current reference manual] is organised by library and, within each library, by function name: this organisation means that the manual is mainly used as a reference, or else, that it ought to be studied from cover to cover, dauntingly.

I think that many search facilities are helpful here: [...] help.search() [...] 2. RSiteSearch() [...]

Sure they are! Yet, we do not all learn or work the same way. Given full choice, I prefer reading a reference to fishing for information, as this tends to build stronger information nets within my brain :-).

I doubt that the sort of manual you describe is possible given the very rapid growth of CRAN, and it would be really inadequate if it did not include those packages.

The current reference manual does not cover CRAN, and even so, I would not be tempted to call it inadequate (at least for the novice I am). There seems to be a lot to know about R, initially as a language, and then, for learning to shuffle and organise data in preparation for later processing. I would guess every new R user has to learn his way in there. The current reference says a lot, but is a lot to grasp as it stands; its organisation is not as helpful as it could be for learning and retaining.

The kind of manual I described seems possible to me, because it could be mechanically derived out of a plan, and the derivation mechanics could diagnose what is being forgotten (this could even yield some "Unsorted functions" chapter or appendix). The mechanics could be made general enough to accept glue text at appropriate places. [Not completely dissimilar to, for those who happen to remember it, the way C code was mechanically derived out of Pascal, initially, for Knuth's TeX.]

Many of [CRAN packages] are designed for people in particular fields and turn out to be extremely useful.

Undoubtedly! I envy you all, who know already! :-)

-- François Pinard http://pinard.progiciels-bpi.ca
[R] Suggestion for big files [was: Re: A comment about R:]
[ronggui]

R is weak when handling large data files. I have a data file: 807 vars, 118519 obs., in CSV format. Stata can read it in 2 minutes, but on my PC R can hardly handle it. My PC's CPU is 1.7G; RAM 512M.

Just (another) thought. I used to use SPSS, many, many years ago, on CDC machines, where the CPU had limited memory and no kind of paging architecture. Files did not need to be very large to be too large.

SPSS had a feature that was then useful: the capability of sampling a big dataset directly at file read time, well before processing starts. Maybe something similar could help in R (that is, instead of reading the whole data in memory, _then_ sampling it).

One can read records from a file, up to a preset number of them. If the file happens to contain more records than that preset number (the number of records in the whole file is not known beforehand), already read records may be dropped at random and replaced by other records coming from the file being read. If the random selection algorithm is properly chosen, it can be made so that all records in the original file have equal probability of being kept in the final subset.

If such a sampling facility were built right into the usual R reading routines (triggered by an extra argument, say), it could offer a compromise for processing large files, and also sometimes accelerate computations for big problems, even when memory is not at stake.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] Suggestion for big files [was: Re: A comment about R:]
[Brian Ripley]

I rather thought that using a DBMS was standard practice in the R community for those using large datasets: it gets discussed rather often.

Indeed. (I tried RMySQL even before speaking of R to my co-workers.)

Another possibility is to make use of the several DBMS interfaces already available for R. It is very easy to pull in a sample from one of those, and surely keeping such large data files as ASCII is not good practice.

Selecting a sample is easy. Yet, I'm not aware of any SQL device for easily selecting a _random_ sample of the records of a given table. On the other hand, I'm no SQL specialist, others might know better.

We do not have a need yet for samples where I work, but if we ever need such, they will have to be random, or else, I will always fear biases.

One problem with Francois Pinard's suggestion (the credit has got lost) is that R's I/O is not line-oriented but stream-oriented. So selecting lines is not particularly easy in R.

I understand that you mean random access to lines, instead of random selection of lines. Once again, this chat comes out of reading someone else's problem; this is not a problem I actually have. SPSS was not randomly accessing lines, as data files could well be held on magnetic tapes, where random access is not possible in ordinary practice. SPSS reads (or was reading) lines sequentially from beginning to end, and the _random_ sample is built as the reading goes.

Suppose the file (or tape) holds N records (N is not known in advance), from which we want a sample of M records at most. If N <= M, then we use the whole file, no sampling is possible nor necessary. Otherwise, we first initialise M records with the first M records of the file. Then, for each record in the file after the M'th, the algorithm has to decide if the record just read will be discarded or if it will replace one of the M records already saved, and in the latter case, which of those records will be replaced. If the algorithm is carefully designed, once the last (N'th) record of the file has been processed this way, we have M records randomly selected from N records, in such a way that each of the N records had an equal probability of ending up in the selection of M records. I may seek out the details if needed.

This is my suggestion, or in fact, more a thought than a suggestion. It might represent something useful either for flat ASCII files or even for a stream of records coming out of a database, if those effectively do not offer ready random sampling devices.

P.S. - In the (rather unlikely, I admit) case the gang I'm part of would have the need described above, and if I then dared implementing it myself, would it be welcome?

-- François Pinard http://pinard.progiciels-bpi.ca
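The sequential scheme described above matches what is commonly called reservoir sampling. A minimal sketch in R follows, assuming records are text lines read one at a time from a connection; the function name `reservoir_sample` is hypothetical, and reading line by line this way is likely to be slow for really big files, in line with the remark about R's text I/O.

```r
# Hypothetical helper: keep a uniform random sample of at most m lines
# from a connection whose total number of lines is not known in advance.
reservoir_sample <- function(con, m) {
  if (!isOpen(con)) {
    open(con, "r")
    on.exit(close(con))
  }
  reservoir <- character(0)
  n <- 0                          # number of records read so far
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break  # end of input
    n <- n + 1
    if (n <= m) {
      reservoir[n] <- line        # the first m records fill the reservoir
    } else {
      j <- sample.int(n, 1)       # uniform position in 1..n
      if (j <= m) reservoir[j] <- line  # kept with probability m/n
    }
  }
  reservoir
}

con <- textConnection(as.character(1:1000))
set.seed(42)
s <- reservoir_sample(con, 10)
close(con)
print(s)
```

When the n'th record arrives (n > M), it is kept with probability M/n and, if kept, evicts one of the saved records chosen uniformly; by induction each record seen so far remains in the reservoir with probability M/n, which gives the equal-probability property once the last record has been read.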
Re: [R] A comment about R:
[David Forrest] [...] A few end-to-end tutorials on some interesting analyses would be helpful. I'm in the process of learning R. While tutorials are undoubtedly very useful, and understanding that working and studying methods vary between individuals, what I (for one) would like to have is a fairly complete reference manual to the library. Of course, we already have one, and that's marvellous already. Yet, it is organised by library and, within each library, by function name: this organisation means that the manual is mainly used as a reference, or else, that it ought to be studied from cover to cover, dauntingly. The very same material could be organised by topics. Chapters could be named like General Help, Language features, Data types, Data Handling, Input/Output, Graphics, Statistics, and such. The chapter Language features, to take one example, could hold sections like Expressions, Statements, Functions, Environments, Packages, Execution and Debugging. Sections could then hold current reference pages. References by library and/or by function name could be stated either in appendices or as a general index at the end. For those who happen to know it, I find the Emacs Lisp Reference Manual to be a good example for organising, in a very usable way, a comprehensive reference to a flurry of library functions. When one needs string handling functions, they are likely grouped together in the manual, and are likely all present. A tutorial, by comparison, usually presents a subset, or even a tiny subset, of what is available. Any volunteers? Not me, or at least, not before quite a long while. The overall organisation of a reference should not be handled by beginners. On the contrary, it rather requires someone who has comprehensive knowledge of all the material to be considered. Just an idea. 
A good work plan would be to establish a new structure for a reference manual, and once competent people (or this community as a whole) agree on a structure, to develop mechanical means for generating a reference manual out of the current material. The mechanism should likely allow for added glue text, almost everywhere reasonable, and for diagnosing any lone, unreachable page in the current reference.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] use of tapply?
[tom wright]

I'm still learning how to program with R and I was hoping someone could take the time to show me how I can rewrite this code?

I'll try! :-)

    data.intersects <- data.frame(
        x=c(0.230,0.411,0.477,0.241,0.552,0.230),
        y=c(0.119,0.515,0.261,0.431,0.304,0.389),
        angle=vector(length=6),
        length=vector(length=6),
        row.names=c('tbr','trg','dbr','dbg','pbr','pbg'))

    calcDist <- function(x,y){
        #calculates distance from origin (C)
        origin <- data.frame(x=0.34,y=0.36)
        dx <- origin$x-x
        dy <- origin$y-y
        length <- sqrt(dx^2+dy^2)
        angle <- asin(dy/length)
        return(list('length'=length,'angle'=angle))
    }

    for(iLoc in 1:length(data.intersects[,1])){
        result <- calcDist(data.intersects[iLoc,]$x,data.intersects[iLoc,]$y)
        data.intersects[iLoc,]$angle <- result$angle
        data.intersects[iLoc,]$length <- result$length
    }

Using `di' instead of `data.intersects' for short:

    di <- data.frame(x=c(0.230, 0.411, 0.477, 0.241, 0.552, 0.230),
                     y=c(0.119, 0.515, 0.261, 0.431, 0.304, 0.389),
                     row.names=c('tbr', 'trg', 'dbr', 'dbg', 'pbr', 'pbg'))
    di.c <- with(di, data.frame(x=x-0.34, y=y-0.36))
    di$length <- with(di.c, sqrt(x^2 + y^2))
    di$angle <- with(di.c, atan2(y, x))

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] HOW to Create Movies with R with repeated plot()?
[Jan Verbesselt] Is it possible to create a type of 'movie' in R based on the output of several figures (e.g., jpegs) via the plot() function. I obtained dynamic results with the plotting function and would like to save these as a movie (e.g., avi or other formats)? You may also peek at an actual example of using R for mini-movies: http://pinard.progiciels-bpi.ca/plaisirs/animations/index.html I wrote this toy about the same week I started to learn R, and it was a hell of a good exercise for the poor little me! :-) -- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] R: to the power
[Thomas Lumley]

It would be nice if R could realize that you meant the cube root of -8, but that requires either magical powers or complicated and unreliable heuristics. The real solution might be a function like root(x,a,b) to compute x^(a/b), where a and b could then be exactly representable integers. If someone wants to write one...

While this could be done with moderate difficulty for the simpler cases, one cannot reasonably ask R to be and do everything. :-)

So far, I see R more on the numerical side of things. If you want precise, exact solutions to various mathematical problems, you might consider installing a Computer Algebra System on your machine, next to R, for handling the symbolic side of things.

One such system which is both free and very capable might be Maxima. Its convoluted story is rooted 40 years in the past. Some may say it lacks some chrome and be misled; don't be, the engine is pretty solid. Peek at http://maxima.sourceforge.net if you think you need such a beast. Beware: to use it, you need either GCL or Clisp pre-installed.

-- François Pinard http://pinard.progiciels-bpi.ca
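For the simpler cases, the root(x, a, b) function suggested in the quoted text could be sketched as below. This is only one possible reading of the suggestion, not an existing R function: it assumes a and b are single integer-valued numbers, reduces a/b to lowest terms, and returns the real root for negative x whenever the reduced denominator is odd.

```r
# Hypothetical sketch of root(x, a, b): real-valued x^(a/b) for integers a, b.
root <- function(x, a, b) {
  stopifnot(length(a) == 1, length(b) == 1,
            a == round(a), b == round(b), b != 0)
  # Reduce a/b to lowest terms (Euclid's algorithm)
  g <- abs(a)
  h <- abs(b)
  while (h != 0) {
    tmp <- h
    h <- g %% h
    g <- tmp
  }
  a <- a / g
  b <- b / g
  if (b < 0) {
    a <- -a
    b <- -b
  }
  if (b %% 2 == 1) {
    # Odd denominator: a real root exists for negative x as well
    sign(x)^a * abs(x)^(a / b)
  } else {
    # Even denominator: no real root for negative x, so this yields NaN there
    x^(a / b)
  }
}

print(root(-8, 1, 3))
print(root(c(-27, 8, 64), 2, 3))
```

Reducing the fraction first means that root(-8, 2, 6) is treated like root(-8, 1, 3); whether that is the desired convention is a design choice the thread does not settle.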
Re: [R] Rpy and RSPython
[Weiwei Shi]

I am thinking to use one of them but not sure which one is better. I think Rpy cannot call python from R while the PRPython can in two-directional calling. Am I right?

s/PRPython/RSPython/ ? :-)

This is also what I understood. Yet, despite the uni-directionality of RPy, this is what I chose for my personal usage (probably more handy to use, or easier to install -- but the main point was that RPy was guaranteed to be more stable and never crash!).

I think I recently read somewhere that there were plans for dusting off RSPython, and then said to myself: "Should re-evaluate once done." For now, RPy is surely quite sufficient for my simple needs.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] R annoyances
[Barry Rowlingson]

Even my great dream that R and Python eventually merge into the same language? R gets Python's syntax and Object-oriented functions and Python gets access to all R's statistical functions?

R is more than a statistical library. I'm coming to R with a strong Python background, and first thought I would mainly use R through Python. But soon, the R language revealed a few interesting features that Python does not offer, and which are very appropriate in the R context.

For example, vectorisation is built-in (yet available on the Python side through the Numeric or Numarray extensions). R also holds interesting (useful and flexible) ideas about argument passing and matching, lazy evaluation, and environments. And surely other things as well.

-- François Pinard http://pinard.progiciels-bpi.ca
Re: [R] R_LIBS difficulty ?
[Prof Brian Ripley]

[François Pinard] Now using this line within `~/.Renviron':

    R_LIBS=/home/pinard/etc/R

my tiny package is correctly found by R. However, R does not seem to see any library within that directory if I rather use either of:

    R_LIBS=$HOME/etc/R
    R_LIBS=$HOME/etc/R

Correct, and as documented. See the description in ?Startup, which says things like ${foo-bar} are allowed but not $HOME, and not ${HOME}/bah or even ${HOME}. But R_LIBS=~/etc/R will work in .Renviron since ~ is interpreted by R in paths.

Hello, Brian (or should I rather write Prof Ripley?). Thanks for having replied.

I was not sure how to read "but not", which could be associated either with "which says" or with "are allowed". My English is not fully solid, and I initially read you as meaning the latter, but it seems the former association is probably the correct one.

The fact is the documentation never says that `$HOME' or `${HOME}' are forbidden. It is rather silent on the subject, except maybe for this sentence: "value is processed in a similar way to a Unix shell" in the Details section, which vaguely but undoubtedly suggests that `$HOME' and `${HOME}' might be allowed. Using `~/' is not especially documented either, except in the Examples section, where it is used. I probably thought it was an example of how shell-alike R processes `~/.Renviron'.

The last writing (I mean, something similar) is suggested somewhere in the R manuals (but I do not have the manual with me right now to give the exact reference, I'm in another town).

It is not mentioned in an R manual, but it is mentioned in the FAQ.

I tried checking in the FAQ. By the way, http://www.r-project.org presents a menu on the left, and there is a group of items under the title `Documentation'. `FAQs' is shown under that title, but is not clickable. I would presume it was meant to be? However, the `Other' item is itself clickable, and offers a link to what appears to be an FAQs page.
The only thing I saw, in item 5.2 of the FAQ (How can add-on packages be installed?), says that one may use `$HOME/' while defining `R_LIBS' in a Bourne shell profile, or _preferably_ use `~/' while defining `R_LIBS' within file `~/.Renviron'. The FAQ does not really say that `$HOME' is forbidden. The FAQ then refers to `?Startup' for more information, and `?Startup' is not clear on this point, in my opinion at least.

R_LIBS=$HOME/etc/R will work in a shell (and R_LIBS=~/etc/R may not).

Another hint that it could be expected to work is that the same `~/.Renviron' once contained the line: R_BROWSER=$HOME/bin/links which apparently worked as expected. (This `links' script launches the real program with `-g' appended whenever `DISPLAY' is defined.)

Yes, but that was not interpreted by R, rather a shell script called by R.

Granted, thanks for pointing this out. The documentation does not really say either (or else I missed it) whether the value of R_BROWSER is given to exec, or given to an exec'ed shell. If a shell is called, it means in particular that we can use options, and this is a useful feature, worth being known I guess.

Once again, thanks for having replied, and for caring.

-- François Pinard http://pinard.progiciels-bpi.ca
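One way to see what R actually ended up with, whatever was written in `~/.Renviron', is to inspect the values from inside a running session. The short sketch below uses only base R functions; the directory `~/etc/R' is the one from this thread and is merely an illustration.

```r
# Directory name taken from the thread; purely illustrative.
lib <- "~/etc/R"

# path.expand() replaces a leading tilde with the user's home directory.
expanded <- path.expand(lib)
print(expanded)

# The raw value R received for R_LIBS (empty string if it is unset).
print(Sys.getenv("R_LIBS"))

# The library trees R searches in this session.
print(.libPaths())
```

If the directory is missing from the output of .libPaths(), the setting did not take effect, which is the symptom reported at the start of the thread.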
[R] R_LIBS difficulty ?
Hi, R people.

I'm shy about reporting this, as a bug in this area sounds very unlikely. Did I make my tests wrongly? I'm still flaky at all this. Let me dare nevertheless, who knows, just in case... Please don't kill me! :-)

Not so long ago, I wrote to this list:

(For now, [the library code] works only for me when I do _not_ use `-l MY/OWN/LIBDIR' at `R CMD INSTALL' time, I surely made a simple blunder somewhere. Hopefully, I'll figure it out.)

Now using this line within `~/.Renviron':

    R_LIBS=/home/pinard/etc/R

my tiny package is correctly found by R. However, R does not seem to see any library within that directory if I rather use either of:

    R_LIBS=$HOME/etc/R
    R_LIBS=$HOME/etc/R

The last writing (I mean, something similar) is suggested somewhere in the R manuals (but I do not have the manual with me right now to give the exact reference, I'm in another town).

Another hint that it could be expected to work is that the same `~/.Renviron' once contained the line:

    R_BROWSER=$HOME/bin/links

which apparently worked as expected. (This `links' script launches the real program with `-g' appended whenever `DISPLAY' is defined.)

This is R 2.0.1, installed on SuSE 9.2.

-- François Pinard http://pinard.progiciels-bpi.ca
[R] R-generated animation of a polynomiograph
Hi, people. Two days ago, I sent to this list a little toy for exploring polynomiographs (yet, the mathematical formulas were not polynomials anymore, so the name is not really appropriate). After studying R calls, expressions and functions a bit more, I gave myself the homework of producing an animation out of my recent toy. The resulting animation, and also the sources, are available at: http://pinard.progiciels-bpi.ca/plaisirs/nr-anim-01.html P.S. - My intent was studying R, much more than producing art :-). I'm sure anyone could do nicer, playing a few hours with this! -- François Pinard http://pinard.progiciels-bpi.ca
[R] Polynomiographic function in R :-)
Hi, people. Nothing too serious in this message. Nevertheless, all criticism or advice is welcome :-).

Yesterday, I went to a talk by Bahman Kalantari (Rutgers University) about Polynomiography (the Fine Art and Science of Visualizing Polynomials). Since I'm starting my R learning, I decided to try using it for computing some (any!) polynomiograph. I was surprised at how easy and quick it was to get some results. Then I thought it would be interesting to extend the drawing to any function, not necessarily a polynomial, yielding the function below:

    polygraph <- function(expression, xrange=c(-1, 1), yrange=c(-1, 1),
                          points=200, steps=20, display=image)
    {
        expression <- substitute(expression)
        variable <- all.vars(expression)
        stopifnot(length(variable) == 1)
        derivative <- D(expression, variable)
        name <- as.name(variable)
        expression <- substitute(name - expression / derivative)
        assign(variable,
               outer(seq(xrange[1], xrange[2], length=points),
                     seq(yrange[1], yrange[2], length=points) * 1i,
                     '+'))
        for (step in 1:steps) {
            display(Arg(eval(name)))
            assign(variable, eval(expression))
        }
    }

which can be used this way, picking a function almost at random, say:

    polygraph(x^3 - sqrt(x) - 1, points=300)

Here are a few random thoughts or remarks:

* Once fully converged, there should be only one colour per root. Each pixel colour shows towards which root the chosen root finding algorithm would converge, starting at this particular point, or complex number.

* Another nice choice for `display' could be `filled.contour', yet it computes more slowly.

* The successive plots (20 by default) show the progressive refinement while finding equation roots, making a kind of animation. One might prefer moving the `display' call out of the loop, and showing only the last refinement.

* I did not know that root finding through Newton-Raphson could be simply extended to complex numbers, fun to see that it works! :-)

* The speaker told us that there are a _lot_ of root finding algorithms, and they may yield different styles of art. I only picked the simplest one to play with. But you might do better! (There are also many other approaches than root finding for producing graphs out of polynomials.)

* Really, the one thing that most amused me in this experiment is how I could use R for symbolically preparing the computation to do, without resorting to parsing and deparsing (which I'm instinctively tempted to avoid). I'm quite far from understanding all I should about functions, expressions, calls and parse trees, but even knowing very little, it was satisfying to be able to debug the above function rather quickly.

* There are likely better ways than those I used. For example, even if unlikely, there might be clashes between the variables making up the expression given, and local variables of the function. I wonder if the expression variable could have been more fully abstracted.

* Vectorisation worked surprisingly well on that problem, speed-wise. However, because some regions of the plane converge faster than others (use `display=plot' and such while calling `polygraph' to study this), there may be ways towards significant speed-ups. But since it is likely that one would lose a good part of vectorisability by doing so, and add a lot of complexity (with unavoidable bugs in the process), I wonder how worthwhile it would be in practice.

* Given a matrix of complex results, they should ideally be turned into N groups, each group being related to one of the N roots of the equation. I tried producing factors out of these results, but numerical approximation made that impractical. I would guess that clustering, which I do not know, may be seen as a way to produce factors fuzzily.

* As a counter-measure to the above difficulty, I used `Arg()' as a way to produce levels out of the results. I could have used `Im()' instead. It seems that `Mod()' and `Re()' are less productive. `image' is kind enough to turn those levels into colours without any effort from me!

All in all, it is a fun way to explore R capabilities, and it also opens up all kinds of ideas to toy with! :-)

-- François Pinard http://pinard.progiciels-bpi.ca
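On the remark about turning the complex results into one group per root: a crude alternative to clustering is to round the converged values before building the factor, so that numerical noise no longer splits a root into many levels. The sketch below is an assumption-laden illustration, not part of the original function: it hard-codes f(z) = z^3 - 1 and its derivative instead of using D(), and the names `f`, `df`, `key` and `basin` are hypothetical.

```r
# Newton-Raphson on a small complex grid for the fixed example f(z) = z^3 - 1.
f  <- function(z) z^3 - 1
df <- function(z) 3 * z^2

# 50 points per axis, so the grid does not contain z = 0 (where df is zero)
z <- outer(seq(-1, 1, length.out = 50),
           seq(-1, 1, length.out = 50) * 1i,
           "+")

for (step in 1:40) {
  z <- z - f(z) / df(z)
}

# Round to absorb numerical noise, then label each distinct limit value
key <- paste(round(Re(z), 6), round(Im(z), 6))
basin <- matrix(as.integer(factor(key)), nrow = nrow(z))

print(sort(table(key), decreasing = TRUE)[1:3])
```

Calling image(basin) then draws one flat colour per root; the few grid points that have not converged after 40 steps show up as extra levels, which is where a real clustering method would do better than rounding.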
[R] R mailing list archive difficulty
Hi, people! This is my first babble on this list, please be kind! :-)

Last Tuesday, I wrote to the (likely) Webmaster of the R site to report a little problem, but also to ask for advice about how to get a bulk copy of the mailing list archives, from 2002 to now.

While I quite understand that from Tuesday to now, there has been little time, and it is only normal that I did not receive a reply yet, I dare re-submitting the same question (message appended below) to this list, in the hope that someone will reply sooner. I see some bits of free time this week-end, and would like to use them, if possible, for the tedious work of getting those archives, so that it sooner gets behind me, instead of ahead...

Thanks to all. Enjoy the spring! :-)

-- François Pinard http://pinard.progiciels-bpi.ca

---BeginMessage---

Robert King, hello! The page `http://www.r-project.org' gives your name as a contact.

The link `http://www.r-project.org/doc/FAQ/R-FAQ.html', near the end of the page, labelled `Frequently Asked Questions', does not resolve, giving:

    Not Found
    The requested URL /doc/FAQ/R-FAQ.html was not found on this server.
    Apache/1.3.26 Server at www.r-project.org Port 80

I would like to get hold of a copy of the R mailing list archives, for local, off-line, progressive perusal (I find Web-based browsing of email extremely inefficient). So, I recursively got archives from `ftp://ftp.stat.math.ethz.ch/Mail-archives/'. The format used in these files is quite usable locally. However, the problem is that these archives do not go beyond 2002. (Maybe the `http://www.r-project.org' Web page should mention this.)

Would you be kind enough to advise me of a (simple) way by which I could get in bulk all R archives from 2003 up to now? The simpler the format, the better, of course, yet I feel ready to locally reformat HTML if this is the only format you have at your end.

I'm discovering R with a lot of pleasure, and some fear as well :-). There is in there a wholly impressive amount of work and knowledge. Thanks, and keep happy!

-- François Pinard http://pinard.progiciels-bpi.ca

---End Message---