Re: [R] Suggestion for big files [was: Re: A comment about R:]
[Just one point extracted: Hadley Wickham has answered the random-sample one.]

On Thu, 5 Jan 2006, François Pinard wrote:

> [Brian Ripley] One problem with François Pinard's suggestion (the credit
> has got lost) is that R's I/O is not line-oriented but stream-oriented,
> so selecting lines is not particularly easy in R.
>
> I understand that you mean random access to lines, rather than random
> selection of lines. Once again, this chat comes out of reading someone
> else's problem; it is not a problem I actually have. SPSS was not
> randomly accessing lines, as data files could well be held on magnetic
> tape, where random access is not possible in practice. SPSS reads (or
> was reading) lines sequentially from beginning to end, and the _random_
> sample is built as the reading goes.

That was not my point. R's standard I/O is through connections, which allow for pushbacks, changing line endings and re-encoding character sets. That does add overhead compared to C/Fortran line-buffered reading of a file. Skipping lines you do not need will take longer than you might guess (based on some limited experience).

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
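To make the connection overhead concrete, here is a rough sketch of selecting every 100th line through a connection (the file name and the 1-in-100 scheme are hypothetical; this illustrates the mechanism, it is not a recommendation):

```r
## Reading via a connection: even the skipped lines pass through the
## connection layer (pushback, line-ending and encoding handling).
con <- file("big.csv", "r")
header <- readLines(con, n = 1)
kept <- character(0)
repeat {
  invisible(readLines(con, n = 99))  # discard 99 lines
  line <- readLines(con, n = 1)      # keep the 100th
  if (length(line) == 0) break
  kept <- c(kept, line)
}
close(con)
## kept can then be parsed, e.g. read.table(textConnection(kept), sep = ",")
```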
Re: [R] Ordering boxplot factors
On Thu, 5 Jan 2006, Marc Schwartz wrote:

> On Thu, 2006-01-05 at 20:27 -0600, Joseph LeBouton wrote:
>> Hi all, what a great help list! I hope someone can help me with this
>> puzzle. I'm trying to find a simple way to do boxplot(obs ~ factor) so
>> that the factors are ordered left-to-right along the x-axis by median,
>> not alphabetically by factor name. Complicated ways abound, but I'm
>> hoping for a magical one-liner that'll do the trick. Any suggestions
>> would be treasured. Thanks, -jlb

The thing to realize is that they are not alphabetic, but ordered by factor levels. So the key is to set the levels. (The help page for boxplot does say that, as I was relieved to find.)

> Using the first example in ?boxplot, which is:
>
>   boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
>
> Get the medians for 'count by spray' using tapply() and then sort the
> results in increasing order, by median:
>
>   med <- sort(with(InsectSprays, tapply(count, spray, median)))
>   med
>      C    E    D    A    F    B
>    1.5  3.0  5.0 14.0 15.0 16.5
>
> Now do the boxplot, setting the factor levels in order by median:
>
>   boxplot(count ~ factor(spray, levels = names(med)),
>           data = InsectSprays, col = "lightgray")
>
> So... technically two lines of code.

This was answered yesterday in terms of bwplot. See ?reorder.factor for the same example done using reorder.factor. That will give you the single line asked for, and be self-explanatory.

--
Brian D. Ripley, University of Oxford
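The one-line version Prof. Ripley alludes to could look like this (a sketch; in recent versions of R the generic is reorder(), of which reorder.factor is the factor method):

```r
## Reorder the factor levels by the median of count within each level,
## then draw the boxplot, all in a single call:
boxplot(count ~ reorder(spray, count, FUN = median),
        data = InsectSprays, col = "lightgray")
```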
Re: [R] Suggestion for big files [was: Re: A comment about R:]
>>>>> "FrPi" == François Pinard [EMAIL PROTECTED]
>>>>>     on Thu, 5 Jan 2006 22:41:21 -0500 writes:

[Brian Ripley had written:] I rather thought that using a DBMS was standard practice in the R community for those using large datasets: it gets discussed rather often. Another possibility is to make use of the several DBMS interfaces already available for R. It is very easy to pull in a sample from one of those, and surely keeping such large data files as ASCII is not good practice.

  FrPi> Indeed. (I tried RMySQL even before speaking of R to my
  FrPi> co-workers.)

  FrPi> Selecting a sample is easy. Yet, I'm not aware of any SQL device
  FrPi> for easily selecting a _random_ sample of the records of a given
  FrPi> table. On the other hand, I'm no SQL specialist; others might know
  FrPi> better.

  FrPi> We do not have a need yet for samples where I work, but if we ever
  FrPi> need such, they will have to be random, or else I will always fear
  FrPi> biases.

[Brian Ripley:] One problem with François Pinard's suggestion (the credit has got lost) is that R's I/O is not line-oriented but stream-oriented, so selecting lines is not particularly easy in R.

  FrPi> I understand that you mean random access to lines, rather than
  FrPi> random selection of lines. Once again, this chat comes out of
  FrPi> reading someone else's problem; it is not a problem I actually
  FrPi> have. SPSS was not randomly accessing lines, as data files could
  FrPi> well be held on magnetic tape, where random access is not possible
  FrPi> in practice. SPSS reads (or was reading) lines sequentially from
  FrPi> beginning to end, and the _random_ sample is built as the reading
  FrPi> goes.

  FrPi> Suppose the file (or tape) holds N records (N is not known in
  FrPi> advance), from which we want a sample of at most M records. If
  FrPi> N <= M, then we use the whole file; no sampling is possible nor
  FrPi> necessary. Otherwise, we first initialise the sample with the
  FrPi> first M records of the file. Then, for each record in the file
  FrPi> after the M'th, the algorithm has to decide whether the record
  FrPi> just read will be discarded or will replace one of the M records
  FrPi> already saved, and in the latter case, which of those records will
  FrPi> be replaced. If the algorithm is carefully designed, when the
  FrPi> last (N'th) record of the file has been processed this way, we
  FrPi> have M records randomly selected from the N records, in such a way
  FrPi> that each of the N records had an equal probability to end up in
  FrPi> the selection of M records. I can seek out the details if needed.

  FrPi> This is my suggestion, or in fact, more a thought than a
  FrPi> suggestion. It might represent something useful either for flat
  FrPi> ASCII files or even for a stream of records coming out of a
  FrPi> database, if those effectively do not offer ready random sampling
  FrPi> devices.

  FrPi> P.S. - In the (rather unlikely, I admit) case the gang I'm part of
  FrPi> would have the need described above, and if I then dared
  FrPi> implementing it myself, would it be welcome?

I think this would be a very interesting tool, and I'm also intrigued by the details of the algorithm you outline above. If it could be made to work on all kinds of read.table()-readable files (i.e. of course including *.csv), it might be a valuable tool for all those -- and there are many -- for whom working with DBMSs is initially too daunting.

Martin Maechler, ETH Zurich
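The one-pass selection François describes can be sketched in R as follows (a sketch under my own naming; the point is that each of the N records survives with probability M/N even though N is not known in advance):

```r
## Draw at most M lines uniformly at random from a connection in one pass.
reservoir_sample <- function(con, M) {
  reservoir <- character(0)
  n <- 0
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break      # end of input
    n <- n + 1
    if (n <= M) {
      reservoir[n] <- line            # the first M records fill the sample
    } else if (runif(1) < M / n) {
      # accept the n'th record with probability M/n and overwrite a
      # uniformly chosen saved record; a short induction shows every
      # record ends up in the final sample with probability M/N
      reservoir[sample.int(M, 1)] <- line
    }
  }
  reservoir
}

## e.g.: sampled <- reservoir_sample(file("big.csv", "r"), 1000)
```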
Re: [R] A comment about R - Link to a technical report from ATS, UCLA
Hi all, the UCLA ATS Statistical Consulting Group has just released a very interesting paper comparing SPSS, SAS and Stata as statistical packages. "Perhaps the most notable exception to this discussion is R."

http://www.ats.ucla.edu/stat/technicalreports/

It makes interesting reading for this thread.

Best regards,
Naji
Re: [R] Suggestion for big files [was: Re: A comment about R:]
On Fri, 6 Jan 2006, Martin Maechler wrote:

> >>>>> "FrPi" == François Pinard [EMAIL PROTECTED]
> >>>>>     on Thu, 5 Jan 2006 22:41:21 -0500 writes:
>
>   FrPi> Selecting a sample is easy. Yet, I'm not aware of any SQL device
>   FrPi> for easily selecting a _random_ sample of the records of a given
>   FrPi> table. [...]
>   FrPi> Suppose the file (or tape) holds N records (N is not known in
>   FrPi> advance), from which we want a sample of at most M records. [...]
>   FrPi> If the algorithm is carefully designed, when the last (N'th)
>   FrPi> record of the file has been processed, we have M records randomly
>   FrPi> selected from the N records, such that each of the N records had
>   FrPi> an equal probability to end up in the selection. [...]
>
> I think this would be a very interesting tool and I'm also intrigued by
> the details of the algorithm you outline above.

It's called `reservoir sampling' and is described in my simulation book, in Knuth, and elsewhere.

> If it could be made to work on all kinds of read.table()-readable files
> (i.e. of course including *.csv), it might be a valuable tool for all
> those -- and there are many -- for whom working with DBMSs is initially
> too daunting.

It would be better (for the reasons I gave) to do this in a separate file preprocessor: read.table reads from a connection, not a file, of course.

--
Brian D. Ripley, University of Oxford
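If a separate preprocessing step is acceptable and a pass over the file is affordable, sampling can also be done before read.table ever sees the data (a sketch; the function name is my own, and for a truly huge file the line-keeping pass should stream in chunks rather than hold all lines in memory as done here):

```r
## Sample M data lines from a text file, preserving an optional header:
## draw the line numbers up front, then keep only those lines.
sample_lines <- function(path, M, header = TRUE) {
  all_lines <- readLines(path)
  body <- if (header) all_lines[-1] else all_lines
  keep <- sort(sample.int(length(body), min(M, length(body))))
  c(if (header) all_lines[1], body[keep])
}

## dat <- read.table(textConnection(sample_lines("big.csv", 1000)),
##                   header = TRUE, sep = ",")
```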
Re: [R] ylim problem in barplot
>>>>> "Ben" == Ben Bolker [EMAIL PROTECTED]
>>>>>     on Thu, 5 Jan 2006 19:21:48 +0000 (UTC) writes:

  Ben> Robert Baer (rbaer at atsu.edu) writes:
  Ben>> Well, consider this example:
  Ben>>   barplot(c(-200, 300, -250, 350), ylim = c(-99, 400))
  Ben>> It seems that barplot uses ylim and pretty to decide things about
  Ben>> the axis but does some slightly unexpected things with the bars
  Ben>> themselves that are not just at the 'zero' end of the bar. Rob

No, there's no pretty() involved. Maybe it helps to just type box() after the plot. Simply, the usual par("mar") margins are set.

I think -- in conclusion -- that Marc Schwartz's solution has been right on target all along: use 'xpd = FALSE' if you set 'ylim', because otherwise the result may be confusing.

The real problem of barplot.default() is the fact that 'xpd = TRUE' is the default, and AFAIK that's not the case for other high-level plot functions. One could debate whether the default setting should not be changed to

  xpd = (is.null(ylim) && !horiz) || (is.null(xlim) && horiz)

Now this has definitely become a topic for R-devel, not R-help anymore.

  Ben> In previous cases I think there was room for debate about the
  Ben> appropriate behavior. What do you think should happen in this
  Ben> case? Cutting off the bars seems like the right thing to do; is
  Ben> your point that the axis being confined to positive values (a side
  Ben> effect of setting ylim) is weird?
  Ben> Ben
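Marc Schwartz's 'xpd = FALSE' fix can be tried directly on Robert Baer's example (a minimal sketch):

```r
## The default xpd = TRUE lets bars spill outside the plot region when
## ylim cuts into them; clipping makes the picture less surprising.
barplot(c(-200, 300, -250, 350), ylim = c(-99, 400), xpd = FALSE)
box()  # drawing the frame shows where the clipping region ends
```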
Re: [R] A comment about R - Link to a technical report from ATS, UCLA
Naji [EMAIL PROTECTED] writes:

> Hi all, the UCLA ATS Statistical Consulting Group has just released a
> very interesting paper comparing SPSS, SAS and Stata as statistical
> packages. "Perhaps the most notable exception to this discussion is R."
> http://www.ats.ucla.edu/stat/technicalreports/

In fact, if you trace the thread back to its root, this is what started it...

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918  Fax: (+45) 35327907
[R] RMySQL/DBI
Hello, does anybody run RMySQL/DBI successfully on SunOS 5.8 and MySQL 3.23.53? I get a segmentation fault when trying to call dbConnect. We'll soon switch to MySQL 4; however, I was wondering whether the very ancient MySQL version really is the problem...

  RMySQL 0.5-5
  DBI 0.1-9
  R 2.2.0
  SunOS 5.8

Kind regards, and thanks a lot for your help,
Arne
Re: [R] Wikis etc.
I agree. In desperation at my inbox being swamped by messages, I contacted the R-core team to ask about other solutions. They recommended gmane.org, who compile a web-viewable archive of thousands of email lists -- it even provides RSS feeds for new topics.

Going back to the wiki issue, it might be wise to think about using Trac (http://projects.edgewall.com/trac/), an open-source project that integrates a wiki with the SVN code-versioning system (used by the R project) and a replacement for Bugzilla's ticketing system. We use it to document our own code.

Trac would have the advantage of pushing questions on the R list back towards the actual source code and allowing all users to participate in the future development of the software.

John Marsland
Re: [R] Wikis etc.
John Marsland writes:

> Trac would have the advantage of pushing questions on the R list back
> towards the actual source code and allowing all users to participate in
> the future development of the software.

I see that this could be useful for R-devel, but considering the volume of traffic and the kind of content on R-help, I don't think such tying to the actual source code would be so useful. Perhaps Trac could be used as an integrated interface for R-devel/SVN and the bug-tracking system, and another wiki solution used exclusively for the R-help community (which includes many people not directly interested in coding or development issues).

--
"Though this be randomness, yet there is structure in't."
Fernando Henrique Ferraz P. da Rosa
Instituto de Matemática e Estatística, Universidade de São Paulo
http://www.feferraz.net
Re: [R] Use Of makeARIMA
I have not seen a reply to this post, so I will attempt a feeble response. I've been wanting to learn more about these commands and have suffered, like you, from the paucity of examples to follow.

To get started, after reading the help pages for all the commands you mentioned, I tried to think of the simplest example that might help me learn something about this. Your question led to the following:

  set.seed(3)
  y3 <- rep(0:2, 10) + 0.1 * rnorm(30)
  acf(y3)   # ACF suggests a period-3 seasonal
  pacf(y3)  # PACF suggests a pure AR of order at most 3
  (fit3 <- arima(y3, seasonal = list(order = c(1, 0, 0), period = 3)))
  attributes(fit3)
  fit3$model  # compare with the documentation for 'makeARIMA'
  KalmanForecast(mod = fit3$model)

To understand makeARIMA in particular, I listed arima and did a search for makeARIMA: arima clearly uses makeARIMA. If you run 'debug(arima)' and then the above 'arima' command, you can step through the 'arima' function line by line and look at (and modify) any of the objects that function creates and uses. In particular, you will be able to see exactly how the arima command uses the makeARIMA function.

Hope this helps.

Spencer Graves

p.s. If you'd like more help from this group, please submit another question. Before you do, however, I suggest you first read the posting guide, www.R-project.org/posting-guide.html. Anecdotal evidence suggests that posts more consistent with that guide are more likely to receive more useful replies, more quickly.

Sumanta Basak wrote:

> Hi R-Experts, I'm currently working with a univariate time series to
> which I'm going to apply KalmanLike(), KalmanForecast(), KalmanSmooth()
> and KalmanRun(). I want to use makeARIMA() before these, but I don't
> understand it, and I don't know how to include the seasonal
> coefficients. Can anyone help by citing a suitable example? Thanks in
> advance.
>
> -- SUMANTA BASAK, http://www.drsb24.blogspot.com/
Re: [R] Wikis etc.
On 01/06/06 13:40, John Marsland wrote:

> Going back to the wiki issue, it might be wise to think about using Trac
> (http://projects.edgewall.com/trac/), an open-source project that
> integrates a wiki with the SVN code-versioning system (used by the
> R project) and a replacement for Bugzilla's ticketing system. We use it
> to document our own code. Trac would have the advantage of pushing
> questions on the R list back towards the actual source code and allowing
> all users to participate in the future development of the software.

It isn't clear to me what this would be for. I'm not sure that I trust users to modify code.

I was thinking myself that user input might be most useful for the documentation of functions. Not that the documentation is so bad, but rather it might be possible to have an extended system of documentation on the web, with FAQ-type questions answered as part of the documentation itself, so that people would not have to rely on R-help so much (even in its archived forms). And I was thinking of setting up a wiki with one page per function. (Given that there are now hundreds or thousands of functions, setting this up would have to be automated.) I've just installed (for another purpose) TWiki, which seems to have some nice features for this sort of thing (in particular, data stored as text files, hence easily manipulated by other programs), but I will not have time to think through how to do this for some time. Just another idea to throw into the hopper.

In principle, another possibility is to do something like the PHP manual at http://www.php.net/manual/en/, which is not a wiki but more like a bulletin board, with discussion of each command. But I think a wiki is better: I found it time-consuming to read through all those comments, almost as bad as reading through R-help postings. :)

Jon

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Re: [R] Wikis etc.
On 1/6/06, John Marsland [EMAIL PROTECTED] wrote:

I see your point. Maybe the answer is to use the list for R-help-style questions, but encourage people who answer questions to point to the answers in the wiki -- which they might have enhanced if necessary.

On 1/6/06, Fernando Henrique Ferraz P. da Rosa [EMAIL PROTECTED] wrote:

> I see that this could be useful for R-devel, but considering the volume
> of traffic and the kind of content on R-help, I don't think such tying
> to the actual source code would be so useful. Perhaps Trac could be used
> as an integrated interface for R-devel/SVN and the bug-tracking system,
> and another wiki solution used exclusively for the R-help community
> (which includes many people not directly interested in coding or
> development issues).
Re: [R] Wikis etc.
It isn't so much that users would modify the code -- they would have to do that in the usual way, by checking out the project from SVN. Rather, extended documentation, features, enhancements etc. can easily locate and quote from the code base, and from the differencing engine as applied to the code base between versions.

On 1/6/06, Jonathan Baron [EMAIL PROTECTED] wrote:

> It isn't clear to me what this would be for. I'm not sure that I trust
> users to modify code.
Re: [R] Wikis etc.
Jonathan Baron wrote:

> And I was thinking of setting up a Wiki with one page per function.
> (Given that there are now hundreds or thousands of functions, setting
> this up would have to be automated.)

One page per R manual-page file would probably suffice.

You could do something along the lines of the Zope book, where users can add comments but you can browse with comments off: http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition/AdvDTML.stx -- then toggle the 'Com On' button. This is less of a wiki and more of an annotation service, but I think you'd run into the problem of losing all the annotations when a new R version comes out.

Barry
[R] A comment about R:
I just got into R over most of the Xmas vacation and was about to ask for pointers on how to get a hold of R when I came across this thread. I've read through most of it and would like to comment from a novice user's point of view: I have a strong programming background but limited statistical experience and no knowledge of competing packages. I work as a senior engineer in electronics.

Yes, the learning curve is steep. Most of the documentation is extremely terse. Learning is mostly from examples (a wiki was proposed in another mail...), and the documentation uses no graphical elements at all. So, when it comes to things like xyplot in lattice: where would I get the concepts behind panels, superpanels, and the like? OK, this is steep and terse, but after a while I'll get over it... That's life.

The general concept is great; things can be expressed very densely: the potential is here. I quickly had 200 lines of my own code together, doing what it should -- or so I believed. Next I did:

  matrix <- matrix(1:100, 10, 10)
  image(matrix)
  locator()

Great: I can work with my graphs interactively... But then:

  filled.contour(matrix)
  locator()

Oops -- wrong coordinates returned. Bug. Apparently, locator() doesn't realize that filled.contour() has a color bar to the right, and scales x wrongly...

Here is what really shocked me:

  > str(bar)
  `data.frame':   206858 obs. of  12 variables: ...
  > str(mean(bar[, 6:12]))
   Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
  > str(sd(bar[, 6:12]))
   Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
  > foo <- prcomp(bar[, 6:12])
  > str(foo$x)
   num [1:206858, 1:7] -0.4187 -0.4015 0.0218 -0.4438 -0.3650 ...
  > str(mean(foo$x))
   num -1.07e-13
  > str(sd(foo$x))
   Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...

So, sd returns a vector regardless of whether the argument is a matrix or a data frame, but mean reacts differently and returns a vector only for a data frame?

The problem here is not that this is difficult to learn -- the problem is the complete absence of a concept. Is a data.frame an 'extended' matrix with columns of different types, or something different? Since the numeric mean (I expected a vector) is recycled nicely when used in a vector context, this makes debugging code close to impossible. Since sd returns a vector, things like mean + 4*sd vary sufficiently across the data elements that I assume working code... I get no warning signal that something is wrong here. The case in point is the behavior of locator() on a filled.contour() plot: things apparently have been programmed and debugged from example rather than from concept.

Now, in another posting I read that all this is a feature to discourage inexperienced users from statistics and force you to think before you do things. While I support this concept of thinking: did I miss something in statistics? I was under the impression that mean and sd were relatively close to each other conceptually... (here, they are even in different packages...)

I will continue using R for the time being. But whether I can recommend it to my work colleagues remains to be seen: how could I ever trust the results returned? I'm still impressed by some of the efficiency, but my trust is deeply shaken...

Stefan Eichenberger
mailto:[EMAIL PROTECTED]
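For what it's worth, per-column statistics can also be requested explicitly, which behaves the same for matrices and data frames and sidesteps the mean()/sd() asymmetry described above (a hedged aside with made-up data, not part of the original message):

```r
## Explicit column-wise statistics, identical for matrix and data frame.
m <- matrix(1:12, nrow = 3)
d <- as.data.frame(m)

colMeans(m)        # per-column means of the matrix
colMeans(d)        # the same for the data frame
apply(m, 2, sd)    # per-column standard deviations
sapply(d, sd)      # the same, column by column, for the data frame
```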
Re: [R] Wikis etc.
On 1/6/2006 9:15 AM, Jonathan Baron wrote:

> I was thinking myself that user input might be most useful for the
> documentation of functions. [...] It might be possible to have an
> extended system of documentation on the web, with FAQ-type questions
> answered as part of the documentation itself, so that people would not
> have to rely on R-help so much (even in its archived forms). And I was
> thinking of setting up a Wiki with one page per function. (Given that
> there are now hundreds or thousands of functions, setting this up would
> have to be automated.) [...] Just another idea to throw into the hopper.

I think this sounds like a great idea. I would like to see two-way connections between this and the existing man pages, e.g. in the HTML or PDF versions, links that go directly to the wiki, and links from the wiki to an online copy of the man pages. If your automatic setup permitted it, then showing the output of the examples on the man pages would be nice.

One issue that you'll need to think about is whether there is one page per function, or one page per .Rd file, or some other organization; and you'll need to be prepared for changes in the organization of the documentation with new R releases (and changes in function names, and changes in the examples...).

Duncan Murdoch
Re: [R] Wikis etc.
Regarding systems for presenting documentation and allowing user comments, I recently came across Commentary (see its homepage, http://pythonpaste.org/commentary/). I haven't used it, but my impression is that the comments and the main doc are both stored in svn (with comment changes auto-committed). This might help solve the problem of updating the doc upon a new R release, because you could take advantage of svn merge. Of course, svn merge won't know whether the comments are still appropriate or not :-( Never mind.

+ seth
Re: [R] Suggestion for big files [was: Re: A comment about R:]
RG, Actually, SQLite provides a way to read a *.csv file directly into the db. Just for your consideration. On 1/5/06, ronggui [EMAIL PROTECTED] wrote: 2006/1/6, jim holtman [EMAIL PROTECTED]: If what you are reading in is numeric data, then it would require (807 * 118519 * 8 bytes) about 760MB just to store a single copy of the object -- more memory than you have on your computer. If you were reading it in, then the problem is the paging that was occurring. In fact, if I read it in 3 pieces, each is about 170M. You have to look at storing this in a database and working on a subset of the data. Do you really need to have all 807 variables in memory at the same time? Yip, I don't need all the variables. But I don't know how to get only the necessary variables into R. In the end I read the data in pieces and used the RSQLite package to write them to a database, and then did the analysis. If I were familiar with database software, using a database (and R) would be the best choice, but converting the file into database format is not an easy job for me. I asked for help on the SQLite list, but the solution was not satisfying, as it required knowledge of a third scripting language. After searching the internet, I got this solution:

#begin
rm(list = ls())
f <- file("D:/wvsevs_sb_v4.csv", "r")
i <- 0
done <- FALSE
library(RSQLite)
con <- dbConnect(SQLite(), "c:/sqlite/database.db3")
tim1 <- Sys.time()
while (!done) {
    i <- i + 1
    tt <- readLines(f, 2500)
    if (length(tt) < 2500) done <- TRUE
    tt <- textConnection(tt)
    if (i == 1) {
        assign("dat", read.table(tt, head = TRUE, sep = ",", quote = ""))
    } else assign("dat", read.table(tt, head = FALSE, sep = ",", quote = ""))
    close(tt)
    ifelse(dbExistsTable(con, "wvs"),
           dbWriteTable(con, "wvs", dat, append = TRUE),
           dbWriteTable(con, "wvs", dat))
}
close(f)
#end

It's not the best solution, but it works. If you use 'scan', you could specify that you do not want some of the variables read in, so it might make a more reasonably sized object. On 1/5/06, François Pinard [EMAIL PROTECTED] wrote: [ronggui] R's weak point is handling large data files.
I have a data file: 807 vars, 118519 obs., in CSV format. Stata can read it in 2 minutes, but on my PC R almost cannot handle it. My PC's CPU is 1.7G; RAM 512M. Just (another) thought. I used to use SPSS, many, many years ago, on CDC machines, where the CPU had limited memory and no kind of paging architecture. Files did not need to be very large to be too large. SPSS had a feature that was then useful: the capability of sampling a big dataset directly at file read time, well before processing starts. Maybe something similar could help in R (that is, instead of reading the whole data into memory and _then_ sampling it). One can read records from a file, up to a preset number of them. If the file happens to contain more records than that preset number (the number of records in the whole file is not known beforehand), already-read records may be dropped at random and replaced by other records coming from the file being read. If the random selection algorithm is properly chosen, it can be arranged so that all records in the original file have equal probability of being kept in the final subset. If such a sampling facility were built right into the usual R reading routines (triggered by an extra argument, say), it could offer a compromise for processing large files, and also sometimes accelerate computations for big problems, even when memory is not at stake. -- François Pinard http://pinard.progiciels-bpi.ca __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Jim Holtman Cincinnati, OH +1 513 247 0281 What is the problem you are trying to solve? -- 黄荣贵 Department of Sociology Fudan University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html -- WenSui Liu (http://statcompute.blogspot.com) Senior Decision Support Analyst Health Policy and Clinical Effectiveness Cincinnati Children's Hospital Medical Center [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
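The drop-and-replace scheme Pinard describes is classical reservoir sampling. A minimal sketch in R (the helper name `sample_lines` is illustrative, not an existing base-R function):

```r
# Reservoir sampling: keep a uniform random sample of n lines from a
# connection without knowing the total number of lines in advance.
# Hypothetical helper, not part of base R.
sample_lines <- function(con, n) {
  reservoir <- readLines(con, n = n)   # fill the reservoir first
  i <- length(reservoir)
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break       # end of file
    i <- i + 1
    j <- sample.int(i, 1)              # record i survives with prob n/i
    if (j <= n) reservoir[j] <- line
  }
  reservoir
}
# usage: txt <- sample_lines(file("big.csv", "r"), 1000)
#        dat <- read.table(textConnection(txt), sep = ",")
```

Each incoming record replaces a kept one with probability n/i, which is what makes every record in the file equally likely to end up in the final subset.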
[R] lmer p-values are sometimes too small
This concerns whether p-values from lmer can be trusted. From simulations, it seems that lmer can produce very small, and probably spurious, p-values. I realize that lmer is not yet a finished product. Is it likely that the problem will be fixed in a future release of the lme4 package? Using simulated data for a quite standard mixed-model anova (a balanced two-way design; see the code for the function SimMixed pasted below), I compared the output of lmer, for three slightly different models, with the output of aov. For an example where there is no fixed treatment effect (the null hypothesis is true), with 4 blocks, 2 treatments, and 40 observations per treatment-block combination, I find that lmer gives more statistical significances than it should, whereas aov does not have this problem. An example of output I generated by calling SimMixed(1000) is the following:

Proportion significances at the 0.05 level
  aov: 0.05    lmer.1: 0.148   lmer.2: 0.148   lmer.3: 0.151
Proportion significances at the 0.01 level
  aov: 0.006   lmer.1: 0.076   lmer.2: 0.076   lmer.3: 0.077
Proportion significances at the 0.001 level
  aov: 0.001   lmer.1: 0.047   lmer.2: 0.047   lmer.3: 0.047

which is based on 1000 simulations (and takes about 5 min on my PowerMac G5). The different models fitted are:

fm.aov <- aov(y ~ Treat + Error(Block/Treat), data = dat)
fm.lmer.1 <- lmer(y ~ Treat + (Treat|Block), data = dat)
fm.lmer.2 <- lmer(y ~ Treat + (Treat-1|Block), data = dat)
fm.lmer.3 <- lmer(y ~ Treat + (1|Block) + (Treat-1|Block), data = dat)

It seems that, depending on the level of the test, lmer gives between a factor of 3 and a factor of around 50 times too many significances. The first two lmer models seem to give identical results, whereas the third (which I think is perhaps the one that best represents the data generated by the simulation) differs slightly.
In running the simulations, warnings like this are occasionally generated:

Warning message:
optim or nlminb returned message false convergence (8)
 in: LMEoptimize<-(`*tmp*`, value = list(maxIter = 200,
     tolerance = 1.49011611938477e-08, ...

They seem to derive from the third of the lmer models. Perhaps there is some numerical issue in the lmer function? From running SimMixed() several times, I have noticed that large p-values (say, larger than 0.5) agree very well between lmer and aov, but there seems to be a systematic discrepancy for smaller p-values, where lmer gives smaller values than aov. The F-values agree between all analyses (except for fm.lmer.3 when there is a warning), so there is a systematic difference between lmer and aov in how a p-value is obtained from the F-value, which becomes severe for small p-values. My output from sessionInfo():

R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0

attached base packages:
[1] methods stats graphics grDevices utils datasets base

other attached packages:
    lme4  lattice   Matrix
  0.98-1  0.12-11   0.99-3

Pasted code for the SimMixed function (some lines might wrap):

# This function generates n.sims random data sets for a design with 4
# blocks, 2 treatments applied to each block, and 40 replicate
# observations for each block-treatment combination. There is no true
# fixed treatment effect, so a statistical significance of a test for
# a fixed treatment effect ought to occur with a probability equal to
# the nominal level of the test. Four tests are applied to each
# simulated data set: the classical aov and three versions of lmer,
# corresponding to different model formulations. The proportion of
# tests for a fixed treatment effect that become significant at the
# 0.05, 0.01 and 0.001 levels are printed, as well as the p-values for
# the last of the simulations. In my runs, lmer gives significance
# more often than indicated by the nominal level, for each of the
# three models, whereas aov is OK.
# The package lme4 needs to be loaded to run the code.

SimMixed <- function(n.sims = 1) {
    k <- 4    # number of blocks
    n <- 40   # num obs per block X treatment combination
    m1 <- 1.0 # fixed effect of level 1 of treatment
    m2 <- m1  # fixed effect of level 2 of treatment
    sd.block <- 0.5     # SD of block random effect
    sd.block.trt <- 1.0 # SD of random effect for block X treatment
    sd.res <- 0.1       # residual SD
    Block <- factor(rep(1:k, each = 2*n))
    Treat <- factor(rep(rep(c("Tr1", "Tr2"), k), each = n))
    m <- rep(rep(c(m1, m2), k), each = n)  # fixed effects
    # storage for p-values
    p.aov <- rep(0, n.sims)
    p.lmer.1 <- rep(0, n.sims)
    p.lmer.2 <- rep(0, n.sims)
    p.lmer.3 <- rep(0, n.sims)
    for (i in 1:n.sims) {
        # first get block and treatment random deviations
        b <- rep(rep(rnorm(k, 0, sd.block), each = 2) +
                 rnorm(2*k, 0, sd.block.trt), each = n)
        # then get response
        y <- m + b + rnorm(2*k*n, 0, sd.res)
        dat <- data.frame(Block, Treat, y)
        # perform the tests
        fm.aov <-
[R] inverse prediction intervals for nonlinear least squares
I'm trying to help several of our scientists with constructing inverse prediction intervals for models estimated with nonlinear least squares. So, for example, we might estimate the mean of y from a 4-parameter logistic function of x [e.g., using SSfpl in nls()], but then want to estimate a prediction interval for x estimated from y (a calibration problem, inverse prediction). I've done some searching of the R archives and found the nlscal() function in the package quantchem, but this only seems to provide inverse estimates, not intervals (although quantchem does have a function for inverse prediction intervals of linear models). Is anyone aware of another function or package in R that will provide inverse prediction intervals for nonlinear least squares? I will confess that I'm not cognizant of whether there is well-developed, accessible theory for inverse prediction intervals in the nonlinear model. Brian Brian S. Cade U. S. Geological Survey Fort Collins Science Center 2150 Centre Ave., Bldg. C Fort Collins, CO 80526-8818 email: [EMAIL PROTECTED] tel: 970 226-9326 [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
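For what it's worth, a point estimate for x given y can be had by inverting the fitted curve numerically with uniroot(). This gives no interval, which is the open question above; the data here are simulated purely for illustration, and the helper `invert` is hypothetical:

```r
# Invert a fitted 4-parameter logistic (SSfpl) for x given y.
# Simulated data; no prediction interval is attempted here.
set.seed(1)
x <- seq(0, 10, length = 50)
y <- 1 + 3/(1 + exp((5 - x)/1)) + rnorm(50, sd = 0.05)
fit <- nls(y ~ SSfpl(x, A, B, xmid, scal))

invert <- function(fit, y0, interval) {
  # find x where the fitted mean response equals y0
  uniroot(function(z) predict(fit, newdata = data.frame(x = z)) - y0,
          interval = interval)$root
}
invert(fit, 2.5, interval = c(0, 10))  # for this curve, roughly xmid
```

An interval could in principle be built by inverting the endpoints of a prediction interval for y, but whether that is statistically defensible for the nonlinear case is exactly the question being asked.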
Re: [R] Wikis etc.
I second Frank's comment! I wonder if questioners who receive a bunch of useful replies could be encouraged to enter a summary of those on a wiki, in much the same way as users of S-news were expected to post a summary of their answers as a way of giving something back. An existing R wiki is located at http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?RwikiHome However, there's currently not much on it. Recently on R-help there was a summary of using databases with R, which looked very useful, so I put that on the wiki. Maybe if others just start putting things there it can gather momentum? -- Tony Plate Frank E Harrell Jr wrote: I feel that as long as people continue to provide help on r-help, wikis will not be successful. I think we need to move to a central wiki or discussion board and to move away from e-mail. People are extremely helpful, but e-mail seems to always be memory-less, and messages get too long without factorization of old text. R-help is now too active, and too many new users are asking questions asked dozens of times, for e-mail to be effective. The wiki also needs to collect and organize example code, especially for data manipulation. I think that new users would profit immensely from a compendium of examples. Just my .02 Euros Frank __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] help with strip.default
Hi, I am creating a multi-conditioned trellis plot. My data look something like this:

Factor A   Factor B   IV   DV
X          1
X          2
X          3
X          4
Y          1
Y          2
Y          3
Y          5
Z          1
Z          2
Z          3
Z          4

In one sense these data are suitable for trellis because for every level of factor A there are four levels of factor B. However, the names of the factor B levels depend on the level of factor A. How would I create a 3 x 4 trellis plot where each panel is a combination of factor A and factor B, where the names of factor B are preserved, and the strip has two levels, one for factor A and another for factor B? This was more difficult than I thought because trellis wants to generate 15 panels, as there are 3 levels of factor A and 5 levels of factor B. But these 5 levels of factor B are in name only; there are only 4 different levels of factor B for each level of factor A. As a workaround I am considering renaming the levels of factor B to 1 through 4 within each level of factor A. Then, write a custom strip.default to specify the names. However, I am not sure how to write this function. Would someone help me get started? Thanks, Steve [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] [Rd] Multiplication
[crossed over to r-help since it's not a bug and not a devel thing any more] Thomas Lumley wrote: So is -2^2. The precedence of ^ is higher than that of unary minus. It may be surprising, but it *is* documented and has been in S for a long time. And just about every other programming language:

Matlab:
  >> -2^2
  ans = -4

Maxima:
  (C1) -2^2;
  (D1) - 4

Fortran:
  print *,-2**2
  -4

Perl:
  $ perl -e 'print -2^2'
  4294967292

Oops. I mean:

  $ perl -e 'print -2**2'
  -4

The precedence of operators is remarkably consistent across programming languages and over time. It seems natural to me now that ^ is done before unary minus, but I don't know if that's because I've been doing it that way for 25 years or because it's really more natural. Anyone got a counterexample where unary minus is higher than power? Barry __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
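The R case itself is easy to check at the prompt (see ?Syntax for the full operator precedence table):

```r
-2^2     # parsed as -(2^2), i.e. -4
(-2)^2   # parentheses force the negation first: 4
2^-2     # unary minus in the exponent position works directly: 0.25
```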
Re: [R] Wikis etc.
I don't have any significant experience with wikis, but I have yet to use any discussion board that was anywhere near as useful to me, or as easy to use, as an email list. Discussion boards have a web browser interface. Typically, they display at most a dozen topics at a time. Scrolling to get the next dozen is slow, as it requires a download from some web server. There is a huge amount of wasted screen space. When there is a topic that generates many messages, scrolling through them is slow, as some discussion board interfaces show only 6 or 7 at a time. Search engines provided by the discussion board software are limited and slow. In contrast, in my email client I can show about three dozen subject lines at a time, I can quickly scroll up and down through the list, and I can group all the messages with the same subject line with a single click of the mouse. I can easily and quickly store selected messages of particular interest to a place where I can easily find them again. My email software searches very quickly through a huge number of messages. Then there's the question of administration and maintenance. Who is going to set up the wiki or discussion board categories? As far as I can tell (and that's actually not very far), either of them would require a lot more time and effort to set up and maintain than the present email list. Yes, r-help has a huge volume -- right now, my R-help mailbox has almost 22,000 messages in it, 2004-01-02 to the present; its size is about 124 MB. Yes, there is a lot of duplication. Nonetheless, I find it easier and quicker to scan the subject lines a few times a day for interesting-looking topics than it would be to go to a browser and navigate up and down through various categories, looking for interesting-looking topics.
As far as I can tell, the wiki concept is more along the lines of a reference library, whereas mailing lists and discussion boards are meant for people to ask each other questions and give each other answers. If that perception is at all accurate, I would have to say that a wiki is by no means a suitable replacement for an email list. And when it comes to a choice between an email list and a discussion board, I have a strong preference for the email list. -Don At 7:04 PM -0600 1/5/06, Frank E Harrell Jr wrote: I feel that as long as people continue to provide help on r-help, wikis will not be successful. I think we need to move to a central wiki or discussion board and to move away from e-mail. People are extremely helpful, but e-mail seems to always be memory-less, and messages get too long without factorization of old text. R-help is now too active, and too many new users are asking questions asked dozens of times, for e-mail to be effective. The wiki also needs to collect and organize example code, especially for data manipulation. I think that new users would profit immensely from a compendium of examples. Just my .02 Euros Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Don MacQueen Environmental Protection Department Lawrence Livermore National Laboratory Livermore, CA, USA __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] [R-pkgs] sudoku
Any doubts about R's big-league status should be put to rest, now that we have a Sudoku Puzzle Solver. Take that, SAS! See package sudoku on CRAN. The package could really use a puzzle generator -- contributors are welcome! -- David Brahm ([EMAIL PROTECTED]) [[alternative HTML version deleted]] ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] How to visualise spatial raster data?
Dear R help, We are trying to visualise spatial raster data. We have, per line, X and Y coordinates and Z (data). How could we visualise this type of data? We would also like to add extra data points to this plot based on new X, Y and Z data. We used the following function, but would like to use only the graph in the upper right corner (the spatial one), similar to the graph at http://www.est.ufpr.br/geoR/geoRdoc/vignette/geoRintro/geoRintrose3.html#x4-60003.1

geo_iRVI <- as.geodata(pixels_blok, coords.col = 2:3, data.col = 4)
plot(geo_iRVI)

How can this plot be optimized? And can we add other points to it? Another solution could be:

filled.contour(AVG, color = terrain.colors,
               xlab = "Longitude (°)", ylab = "Latitude (°)")

but for that the data need to be organised differently: not per line of X, Y coordinates, but in raster form. Can anyone advise on functions to visualise spatial raster data optimally? thanks, Jan

windows R 2.2
library(geoR)
library(akima)

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
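One possible route, assuming the data really are one X, Y, Z triple per line: grid the points with akima::interp() (the akima library is already loaded above) and plot with image(), then overlay extra points with points(). The object names xyz and new_pts are placeholders standing in for the poster's data:

```r
library(akima)
# xyz: data frame with one point per line, columns X, Y, Z (assumed layout)
grd <- with(xyz, interp(X, Y, Z))          # interpolate onto a regular grid
image(grd, col = terrain.colors(64),
      xlab = "Longitude (°)", ylab = "Latitude (°)")
contour(grd, add = TRUE)                   # optional isolines
# overlay additional observations on the same coordinate system
points(new_pts$X, new_pts$Y, pch = 21, bg = "red")
```

Because image() and points() share the same user coordinates, new observations can be added freely after the raster is drawn, which filled.contour() makes awkward (its color key shifts the plot region).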
[R] distribution maps
Dears, I would like to know if there is an R package on CRAN that can generate distribution maps of species. I think that this issue has not been discussed before, but I did not search extensively on CRAN or in the help archives. Best regards Rogério __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] A comment about R:
Hi, just to show the difference between a matrix and a data.frame:

> str(data.frame(mat))
'data.frame': 4 obs. of 5 variables:
 $ X1: num -0.1940 -0.7629 0.0446 -0.5408
 $ X2: num -1.092 -0.040 1.070 0.868
 $ X3: num 0.634 0.823 0.693 1.152
 $ X4: num 0.0258 -1.6507 1.2052 0.9714
 $ X5: num 0.673 0.380 -1.531 -0.426
> str(mat)
 num [1:4, 1:5] -0.1940 -0.7629 0.0446 -0.5408 -1.0925 ...

A matrix is a numeric vector with dim attributes; a data frame is a matrix-like structure which can hold different types of variables (columns). sd is a function based on var:

> sd
function (x, na.rm = FALSE)
{
    if (is.matrix(x)) apply(x, 2, sd, na.rm = na.rm)
    else if (is.vector(x)) sqrt(var(x, na.rm = na.rm))
    else if (is.data.frame(x)) sapply(x, sd, na.rm = na.rm)
    else sqrt(var(as.vector(x), na.rm = na.rm))
}
<environment: namespace:stats>

and therefore behaves in a similar manner for data.frames and matrices, but mean accepts only data.frames, numeric vectors and dates:

Arguments:
  x: An R object. Currently there are methods for numeric data
     frames, numeric vectors and dates. A complex vector is allowed
     for 'trim = 0', only.

So a matrix is treated as a numeric vector by mean, but as a set of column vectors by sd. I don't know why; I believe it is because with var(matrix) you expect the output to be a variance matrix. Maybe somebody can explain it better. If you want mean to behave on matrices the way sd does, you can try:

mymean <- function(x, na.rm = FALSE) {
    if (is.matrix(x)) colMeans(x, na.rm = na.rm)
    else mean(x, na.rm = na.rm)
}

> mymean(mat)
[1] -0.3632682 0.2013843 0.8251625 0.1379205 -0.2259909

HTH Petr On 6 Jan 2006 at 16:18, Stefan Eichenberger wrote: From: Stefan Eichenberger [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Date sent: Fri, 6 Jan 2006 16:18:16 +0100 Subject: [R] A comment about R: ~~~ ... blame me for not having sent below message initially in plain text format. Sorry!
~~~ [remainder of the quoted message trimmed -- it is identical to Stefan Eichenberger's posting above]
[R] Can R plot multicolor lines?
I have a number of continuous data series I'd like to plot, with the first 2/3 or so of each plotted in one color and the last 1/3 plotted in another color. I've thought of plotting 2 lines that abut each other by determining where the first portion ends and attaching the second portion. Is there a simpler way that I have not thought of or discovered through the mailing list, the Intro to R, or the Lattice PDF? Thanks Paul __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] installation question/problem
Hello, Can anybody tell me why I am getting the error below when I run make check, and whether it has any consequences I may regret later? I run:

# ./configure --enable-R-shlib
# make
# make check
# make install

configure, make and make install all work without errors, and it seems to install OK; I even test the R binary after install, so I guess it's working. But I want to make sure. I'm not going to be using R myself, but I'm the net admin who has been tasked with installing it on our servers, so I don't want any nasty surprises. I wonder if it's possible that I'm missing libraries because I'm not running X on the servers? This is: FreeBSD 5.4 p8, R-2.2.1. make check output:

(snip)
running code in 'grDevices-Ex.R' ... OK
comparing 'grDevices-Ex.Rout' to 'grDevices-Ex.Rout.prev' ... OK
running code in 'graphics-Ex.R' ... OK
comparing 'graphics-Ex.Rout' to 'graphics-Ex.Rout.prev' ... OK
running code in 'stats-Ex.R' ... *** Error code 1
Stop in /usr/home/dwinner/tmp/R-2.2.1/tests/Examples.
*** Error code 1
Stop in /usr/home/dwinner/tmp/R-2.2.1/tests/Examples.
*** Error code 1
Stop in /usr/home/dwinner/tmp/R-2.2.1/tests.
*** Error code 1
Stop in /usr/home/dwinner/tmp/R-2.2.1/tests.
*** Error code 1
Stop in /usr/home/dwinner/tmp/R-2.2.1.

Thanks for any info, DW __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Can R plot multicolor lines?
Hi, one way is to use segments():

x <- rnorm(200)
plot(1:200, x, type = "n")
segments(1:199, x[1:199], 2:200, x[2:200],
         col = c(rep(1, 150), rep(2, 50)))

HTH Petr On 6 Jan 2006 at 12:28, Paul DeBruicker wrote: Date sent: Fri, 6 Jan 2006 12:28:36 -0500 From: Paul DeBruicker [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Subject: [R] Can R plot multicolor lines? I have a number of continuous data series I'd like to plot, with the first 2/3 or so of each plotted in one color and the last 1/3 plotted in another color. I've thought of plotting 2 lines that abut each other by determining where the first portion ends and attaching the second portion. Is there a simpler way that I have not thought of or discovered through the mailing list, the Intro to R, or the Lattice PDF? Thanks Paul __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Petr Pikal [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] installation question/problem
DW [EMAIL PROTECTED] writes: Hello, Can anybody tell me why I am getting the error below when I run make check and if it has any consequences I may regret later? I run: # ./configure --enable-R-shlib # make # make check # make install configure, make and make install all work without errors, and it seems to install ok, and I even test the R binary after install, so I guess it's working. But I want to make sure. I'm not going to be using R, but I'm the net admin who has been tasked with installing it on our servers, so I don't want any nasty surprises. I wonder if it's possible that I'm missing libraries because I'm not running X on the servers? This is: FreeBSD 5.4 p8 R-2.2.1 make check output: (snip) . running code in 'grDevices-Ex.R' ... OK comparing 'grDevices-Ex.Rout' to 'grDevices-Ex.Rout.prev' ... OK running code in 'graphics-Ex.R' ... OK comparing 'graphics-Ex.Rout' to 'graphics-Ex.Rout.prev' ... OK running code in 'stats-Ex.R' ... *** Error code 1 Ouch. Please look for stats-Ex.Rout.fail and tell us what is in it (you should find it in tests/Examples in your builddir; interesting stuff should be towards the end of the file). -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Problem with Integral of Indicator Function
Hi.. I was trying to integrate an indicator function but had problems when the limits were negative or equal to the indicator condition. My function is

fun1 <- function(x) { as.numeric(x >= 2) }

which should be Ind(x >= 2). It seems to work for the following two cases:

> integrate(fun1, 3, 5)
2 with absolute error < 2.2e-14
> integrate(fun1, 5, 100)
95 with absolute error < 1.1e-12

but does not work for the following:

> integrate(fun1, 0, 2)
0 with absolute error < 0         (I was expecting 2)
> integrate(fun1, -1, 5)
3 with absolute error < 3.3e-14   (I was expecting 5)
> integrate(fun1, -2, 5)
3 with absolute error < 5.3e-15   (I was expecting 5)

Any suggestions? Thanks. Harsh [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
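One thing worth knowing here: integrate() is adaptive quadrature, and a jump inside the interval can be missed or mis-weighted depending on where the sample points land. A common workaround is to split the range at the known discontinuity (using the indicator function from the question):

```r
fun1 <- function(x) as.numeric(x >= 2)  # indicator of x >= 2

# Integrating across the jump in one call relies on where the quadrature
# happens to place its sample points; splitting at the discontinuity
# makes each piece smooth and the result reliable:
integrate(fun1, -1, 2)$value + integrate(fun1, 2, 5)$value  # 0 + 3 = 3
```

Note also that for this indicator the exact value of the integral over [-1, 5] is 3 (the length of [2, 5]), not 5.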
Re: [R] help with strip.default
On 1/6/06, Berton Gunter [EMAIL PROTECTED] wrote: Steve: This is a question for **super Deepayan**, and hopefully he'll respond. However, in the interim, let me give it a shot. Basically, I think what you've asked for falls outside the bounds of what lattice is designed to do. But I think there's a simple way to fool it. Basically, what you need to do is to combine your two factors into one, with level names and ordering as you want. See ?factor (?ordered may also be useful, but you don't need it). For example:

comb.factor <- factor(paste(A, B, sep = "."))

That's what I would have suggested. I would recommend using interaction() instead of paste(), since it is designed for this and is presumably more efficient (not that it matters in this small example). For the record, the 'layout' and 'skip' arguments (of xyplot etc.) are often useful in conjunction with this sort of use. Deepayan As I said, you may have to reorder the levels from the default that factor() gives you to get your panels to display the way you want. Also see the perm.cond and index.cond arguments of xyplot, which might also suffice for that purpose. Again, Deepayan will hopefully suggest a cleverer way that I missed. But I think this approach will get you what you want. Cheers, Bert -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Steven Lacey Sent: Friday, January 06, 2006 8:20 AM To: r-help@stat.math.ethz.ch Subject: [R] help with strip.default Hi, I am creating a multi-conditioned trellis plot. My data look something like this:

Factor A   Factor B   IV   DV
X          1
X          2
X          3
X          4
Y          1
Y          2
Y          3
Y          5
Z          1
Z          2
Z          3
Z          4

In one sense these data are suitable for trellis because for every level of factor A there are four levels of factor B. However, the names of the factor B levels depend on the level of factor A.
However, the names of the factor B levels depend on the level of factor A. How would I create a 3 x 4 trellis plot where each panel is a combination of factor A and factor B, where the names of factor B are preserved and the strip has two levels, one for factor A and another for factor B? This was more difficult than I thought, because trellis wants to generate 15 panels, as there are 3 levels of factor A and 5 levels of factor B. But these 5 levels of factor B differ in name only: there are only 4 distinct levels of factor B within each level of factor A. As a work-around I am considering renaming the levels of factor B from 1 to 4 within each level of factor A, and then writing a custom strip.default to specify the names. However, I am not sure how to write this function. Would someone help me get started? Thanks, Steve __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
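A minimal sketch of the combined-factor work-around Bert describes, using interaction() as Deepayan suggests. The data frame d and the columns A, B, IV and DV are illustrative stand-ins for the poster's data, not code from the thread:

```r
library(lattice)

## Toy data in the shape of the poster's table: 3 levels of A, and
## four B names within each level of A (Y uses 5 where X and Z use 4).
d <- data.frame(A  = rep(c("X", "Y", "Z"), each = 4),
                B  = c(1, 2, 3, 4,  1, 2, 3, 5,  1, 2, 3, 4),
                IV = rep(1:4, 3),
                DV = rnorm(12))

## One conditioning factor holding only the 12 combinations that
## actually occur (drop = TRUE removes the 3 empty A/B pairs).
d$AB <- interaction(d$A, d$B, drop = TRUE)

## 3 x 4 layout, one panel per A/B combination; each strip label
## shows the combined "A.B" name, so B's original names survive.
xyplot(DV ~ IV | AB, data = d, layout = c(4, 3))
```

With paste() instead of interaction() the levels would sort alphabetically as plain strings; either way, reordering levels() of d$AB controls the panel order.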
Re: [R] installation question/problem
Peter Dalgaard wrote: DW [EMAIL PROTECTED] writes: Hello, Can anybody tell me why I am getting the error below when I run make check, and whether it has any consequences I may regret later? I run: # ./configure --enable-R-shlib # make # make check # make install. configure, make and make install all work without errors, and it seems to install OK; I even tested the R binary after install, so I guess it's working. But I want to make sure. I'm not going to be using R myself; I'm the net admin who has been tasked with installing it on our servers, so I don't want any nasty surprises. I wonder if it's possible that I'm missing libraries because I'm not running X on the servers? This is: FreeBSD 5.4 p8, R-2.2.1. make check output: (snip) running code in 'grDevices-Ex.R' ... OK comparing 'grDevices-Ex.Rout' to 'grDevices-Ex.Rout.prev' ... OK running code in 'graphics-Ex.R' ... OK comparing 'graphics-Ex.Rout' to 'graphics-Ex.Rout.prev' ... OK running code in 'stats-Ex.R' ... *** Error code 1

Ouch. Please look for stats-Ex.Rout.fail and tell us what is in it (you should find it in tests/Examples in your build directory; the interesting stuff should be towards the end of the file).

Here is what I found:

## using the nl2sol algorithm
fm4DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
+                data = DNase1,
+                start = list(Asym = 3, xmid = 0, scal = 1),
+                trace = TRUE, algorithm = "port")
  0     0.0:  3.0  0.0  1.0
Error in nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)), data = DNase1, :
  Convergence failure: See PORT documentation. Code (27)
Execution halted

Thanks, DW
Re: [R] distribution maps
On Fri, 6 Jan 2006, Rogério Rosa da Silva wrote: Dear all, I would like to know if there are R package(s) on CRAN that can generate distribution maps of species. I think that this issue has not been discussed before, but I did not search extensively on CRAN or the help archives. Could I suggest the Spatial and Environmetrics Task Views, reached from the Task View item in the navigation bar on CRAN? You may also find the R-sig-geo mailing list a useful place to make your question a little more detailed - you do not say anything about your data, and a helpful reply would depend on knowing that. Best regards, Rogério -- Roger Bivand, Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax: +47 55 95 95 43; e-mail: [EMAIL PROTECTED]
[R] Got it--Re:A question on summation of functions
Dear Rers, It seems the usual sum function can work. Anyway, I appreciate your time on this. Best wishes, Liqiu

Dear Rers, I am trying to do a 2-dimensional integration of a function. The function is a summation of another function evaluated at a series of vector values, and I am having difficulty coding it. For example, I have a function f which is a bivariate normal density:

# define some constants
err    <- 0.5
m      <- 5
times  <- seq(0, m - 1)
rou    <- sum(times)/sqrt(m * sum(times^2))
sig.w  <- sqrt(m * err)
sig.wt <- sqrt(sum(times^2) * err)

# bivariate normal density
f <- function(x, y, u.x, u.y)
    exp(-((x - u.x)^2/sig.w^2 + (y - u.y)^2/sig.wt^2 -
          2*rou*(x - u.x)*(y - u.y)/(sig.w*sig.wt))/(2*(1 - rou^2))) /
    (2*pi*sig.w*sig.wt*sqrt(1 - rou^2))

I would like to have a function g which is defined as:

# uw  = 1:n
# uwt = (n+1):(2*n)
g <- function(x, y) f(x, y, uw[1], uwt[1]) + f(x, y, uw[2], uwt[2]) + ... + f(x, y, uw[n], uwt[n])

If n is very large, I am not able to write all the terms down. How can I code the function g? Thank you for your consideration. Best wishes, Liqiu
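For the record, the reason "the usual sum function can work": f() is built entirely from vectorized arithmetic, so it accepts whole vectors for u.x and u.y, and wrapping the call in sum() evaluates the n-term mixture in one go. A minimal sketch (n = 10, and the uw/uwt values, are illustrative placeholders; the constants and density are from the original post with quoting restored):

```r
# Constants from the original post.
err    <- 0.5
m      <- 5
times  <- seq(0, m - 1)
rou    <- sum(times) / sqrt(m * sum(times^2))
sig.w  <- sqrt(m * err)
sig.wt <- sqrt(sum(times^2) * err)

# Bivariate normal density; vectorized in u.x and u.y.
f <- function(x, y, u.x, u.y)
    exp(-((x - u.x)^2 / sig.w^2 + (y - u.y)^2 / sig.wt^2 -
          2 * rou * (x - u.x) * (y - u.y) / (sig.w * sig.wt)) /
        (2 * (1 - rou^2))) / (2 * pi * sig.w * sig.wt * sqrt(1 - rou^2))

n   <- 10            # works unchanged for any n
uw  <- 1:n
uwt <- (n + 1):(2 * n)

# g(x, y) = f(x, y, uw[1], uwt[1]) + ... + f(x, y, uw[n], uwt[n])
g <- function(x, y) sum(f(x, y, uw, uwt))
```

For scalar (x, y), which is what a 2-D integration routine typically passes, this is equivalent to writing out the n terms by hand.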
[R] Daylight Savings Time unknown in R-2.2.1
Under R-2.2.1, a POSIXlt date created with strptime has an unknown Daylight Savings Time flag:

strptime("20051208", "%Y%m%d")$isdst
[1] -1

This is true on both Linux (details below) and Windows. It did not occur under R-2.1.0. Any ideas? TIA!

Sys.getenv("TZ")
TZ
""

Version: platform = i686-pc-linux-gnu, arch = i686, os = linux-gnu, system = i686, linux-gnu, status = , major = 2, minor = 2.1, year = 2005, month = 12, day = 20, svn rev = 36812, language = R. Locale: C. Search Path: .GlobalEnv, package:methods, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, Autoloads, package:base -- David Brahm ([EMAIL PROTECTED])
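A sketch of one way to probe what David reports. strptime() has only a date to go on, so it leaves the DST flag undetermined (-1 means "unknown", per ?DateTimeClasses); a round trip through POSIXct forces the C library to resolve it from the time-zone tables. This is a general observation about the POSIXlt/POSIXct classes, not something verified on R-2.2.1 specifically:

```r
x <- strptime("20051208", "%Y%m%d")
x$isdst   # -1: DST status left undetermined, as in the report

# as.POSIXct() converts to seconds-since-epoch; converting back to
# POSIXlt recomputes every field, including isdst, from the system
# time-zone database, so the flag becomes 0 (or 1 inside a DST period).
y <- as.POSIXlt(as.POSIXct(x))
y$isdst
```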
[R] Installing Task Views
Hello, I am just beginning to use R, after several years of using S-Plus (with mixed success). I saw a recommendation on another mailing list for the Environmetrics and Spatial Task Views as a good way for a new user to get started actually using R. The Task Views page at CRAN says: "To automatically install these views, the ctv package needs to be installed, e.g., via install.packages("ctv"), and then the views can be installed via install.views (after loading ctv), e.g., install.views("Econometrics")". I have installed ctv, and I'm assuming that "loading ctv" means entering library(ctv). If I then enter install.views("Environmetrics") I get the error message: Warning message: CRAN task view "Environmetrics" not available in: install.views("Environmetrics"). If I then go up to the Packages menu, Set CRAN Mirror to, for example, USA (CA 2), and again enter install.views("Environmetrics"), I now get the error message: Error in install.packages(pkgs, CRAN = views[[i]]$repository, dependencies = dependencies, : unused argument(s) (CRAN ...). Entering CRAN.views() in the R GUI does indeed give a complete list of task view names, topics, maintainers, and repositories. Selecting different CRAN mirrors produces the same error messages as above, as does attempting to install a different task view. I have searched the R-help mailing list archive and found several postings announcing the availability of task views, but nothing on how to install them. I have searched the PDF manuals and found no instances of "task view". I have also searched the FAQs. The help for install.views basically repeats the information on the CRAN site, providing information on using the function, but not on actually installing task views. The example in the article on task views in the May 2005 issue of R News uses the lib = argument, which is not mentioned in the help for install.views. It appears that the instructions at CRAN for installing task views are missing at least one step.
Can anyone point me to a reliable set of instructions for installing (not to mention actually using) a task view? Many thanks in advance. Regards, Mark C. Andersen -- Dr. Mark C. Andersen, Associate Professor, Department of Fishery and Wildlife Sciences, New Mexico State University, Las Cruces NM 88003-0003; phone: 505-646-8034; fax: 505-646-1281
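For what it's worth, the sequence the CRAN page intends looks like this once the quoting is shown explicitly; note that a CRAN mirror must be selected before install.views() is called. The "unused argument(s) (CRAN ...)" error suggests a version mismatch (an older ctv calling install.packages() with a CRAN= argument that later R versions replaced with repos=), so my guess, and it is only a guess, is that updating ctv itself would be the first thing to try:

```r
# Pick a mirror first (or use the Packages menu, as in the post):
chooseCRANmirror()

install.packages("ctv")           # install the ctv package
library("ctv")                    # "loading ctv"
install.views("Environmetrics")   # install all packages in the view

# If install.views() still complains about install.packages()
# arguments, try refreshing ctv to match the running R version:
update.packages(oldPkgs = "ctv")
```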
Re: [R] Wikis etc.
I am a fan of wikis, and I reckon one would really help with making R more accessible. On one extreme you have this email list, and on the other you have R News and the PDFs on CRAN. A wiki might hit the spot between them and reduce the traffic on the email list.

Frank E Harrell Jr wrote: I feel that as long as people continue to provide help on r-help, wikis will not be successful. I think we need to move to a central wiki or discussion board and move away from e-mail. People are extremely helpful, but e-mail always seems to be memory-less, and messages get too long without factorization of old text. R-help is now too active, with too many new users asking questions already asked dozens of times, for e-mail to be effective. The wiki also needs to collect and organize example code, especially for data manipulation. I think that new users would profit immensely from a compendium of examples. Just my .02 Euros, Frank
Re: [R] Suggestion for big files [was: Re: A comment about R:]
RG, I think the .import command in SQLite should work. Plus, SQLite Browser (http://sqlitebrowser.sourceforge.net) might do the job as well. On 1/6/06, ronggui [EMAIL PROTECTED] wrote: Can you give me some hints, or let me know how to do it? Thank you! 2006/1/6, Wensui Liu [EMAIL PROTECTED]: RG, Actually, SQLite provides a way to read a *.csv file directly into a db. Just for your consideration. On 1/5/06, ronggui [EMAIL PROTECTED] wrote: 2006/1/6, jim holtman [EMAIL PROTECTED]: If what you are reading in is numeric data, then it would require (807 * 118519 * 8) = 760MB just to store a single copy of the object -- more memory than you have on your computer. If you were reading it in, then the problem is the paging that was occurring. In fact, if I read it in 3 pieces, each is about 170M. You have to look at storing this in a database and working on a subset of the data. Do you really need to have all 807 variables in memory at the same time? Yep, I don't need all the variables, but I don't know how to get just the necessary variables into R. In the end I read the data in pieces, used the RSQLite package to write it to a database, and then did the analysis.
If one is familiar with database software, using a database (with R) is the best choice, but converting the file into database format is not an easy job for me. I asked for help on the SQLite list, but the solution offered was not satisfying, as it required knowledge of a third scripting language. After searching the internet, I came up with this solution:

# begin
rm(list = ls())
f <- file("D:\\wvsevs_sb_v4.csv", "r")
i <- 0
done <- FALSE
library(RSQLite)
con <- dbConnect(SQLite(), "c:\\sqlite\\database.db3")
tim1 <- Sys.time()
while (!done) {
    i <- i + 1
    tt <- readLines(f, 2500)
    if (length(tt) < 2500) done <- TRUE
    tt <- textConnection(tt)
    if (i == 1) {
        dat <- read.table(tt, head = TRUE, sep = ",", quote = "")
    } else {
        dat <- read.table(tt, head = FALSE, sep = ",", quote = "")
    }
    close(tt)
    if (dbExistsTable(con, "wvs")) {
        dbWriteTable(con, "wvs", dat, append = TRUE)
    } else {
        dbWriteTable(con, "wvs", dat)
    }
}
close(f)
# end

It's not the best solution, but it works. If you use 'scan', you could specify that you do not want some of the variables read in, so it might make more reasonably sized objects. On 1/5/06, François Pinard [EMAIL PROTECTED] wrote: [ronggui] R is weak when handling large data files. I have a data file: 807 vars, 118519 obs., in CSV format. Stata can read it in in 2 minutes, but on my PC R can hardly handle it. My PC: CPU 1.7G; RAM 512M. Just (another) thought. I used to use SPSS, many, many years ago, on CDC machines, where the CPU had limited memory and no kind of paging architecture. Files did not need to be very large to be too large. SPSS had a feature that was then useful: the capability of sampling a big dataset directly at file-read time, before processing starts. Maybe something similar could help in R (that is, instead of reading the whole data into memory and _then_ sampling it). One can read records from a file, up to a preset number of them.
If the file happens to contain more records than that preset number (the number of records in the whole file is not known beforehand), already-read records may be dropped at random and replaced by other records coming from the file being read. If the random selection algorithm is properly chosen, it can be made so that all records in the original file have equal probability of being kept in the final subset. If such a sampling facility were built right into the usual R reading routines (triggered by an extra argument, say), it could offer a compromise for processing large files, and also sometimes accelerate computations for big problems, even when memory is not at stake. -- François Pinard http://pinard.progiciels-bpi.ca -- Jim Holtman, Cincinnati, OH, +1 513 247 0281. "What is the problem you are trying to solve?" -- Ronggui Huang (黄荣贵), Department of Sociology, Fudan University -- WenSui Liu (http://statcompute.blogspot.com), Senior Decision Support Analyst, Health Policy and Clinical Effectiveness, Cincinnati Children's Hospital Medical Center -- Ronggui Huang (黄荣贵), Department of Sociology, Fudan University
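The scheme François describes is classic reservoir sampling. A sketch of how it could look in user-level R, reading line by line from any connection; the function name and interface are my invention, not an existing R facility:

```r
# Keep a k-line simple random sample from a connection of unknown
# length (reservoir sampling): after n lines have been read, every
# line seen so far sits in the reservoir with probability k/n.
sample_lines <- function(con, k) {
    reservoir <- character(k)
    n <- 0
    repeat {
        line <- readLines(con, n = 1)
        if (length(line) == 0) break              # end of input
        n <- n + 1
        if (n <= k) {
            reservoir[n] <- line                  # fill the reservoir
        } else if (runif(1) < k / n) {
            reservoir[sample.int(k, 1)] <- line   # evict one at random
        }
    }
    reservoir[seq_len(min(n, k))]                 # trim if file < k lines
}

# Example: a 10-line sample from a 1000-line text "file".
con <- textConnection(sprintf("record %d", 1:1000))
smp <- sample_lines(con, 10)
close(con)
```

Done at read time like this, memory use stays at k lines regardless of file size, which is exactly the compromise the post asks for.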
Re: [R] LOCFIT help
Hi, I have started to learn local regression models so as to identify statistically significant peaks in urban areas, such as population densities and congestion. I successfully ran locfit and got a lot of information on the fit. Now I am stuck. This is a very silly question, but isn't the first derivative of the fitted curve zero, or close to zero, at a peak? I got some very high numbers for it. The commands I used are:

x <- Longitude
y <- Latitude
model.local <- locfit(log(POPDENSITY) ~ lp(x, y, nn = 0.55))

The span was determined by the spgwr adaptive bandwidth. Any help appreciated. Thank you very much. Taka, PhD student, Indiana University, Geography