Re: [R] Likelihood returning inf values to optim(L-BFGS-B) other options?
Thank you very much for your postings. Rewriting the likelihood with
lgamma helps a lot, and the mistake with "fnscale" was quite stupid
(sorry for that!).

The model now works for most parameter sets, but my negative
log-likelihood (rewritten with lgamma) still returns Inf for some
extreme parameter values, e.g. u-shaped beta densities in the simulation
that generates x, t_x and T. These are exactly the cases that are
interesting for my project.

So, is there another, similar optimization algorithm that can deal with
Inf returns?

Thanks a lot!

Best regards,
Michael

Quoting Prof Brian Ripley <[EMAIL PROTECTED]>:

> On Thu, 5 Apr 2007, Ravi Varadhan wrote:
>
>> In your code, the variables x (which I assume is the observed data), Tvec,
>> and flag are not passed to the function as arguments. This could be a
>> potential problem.
>
> I think scoping will probably find them.
>
>> Another problem could be that you have to use "negative"
>> log-likelihood function as input to optim, since by default it "minimizes"
>> the function, whereas you are interested in finding the argmax of
>> log-likelihood. So, in your function you should return (-ll) instead of ll.
>
> OR set fnscale. This is the most serious problem.
>
>> If the above strategies don't work, I would try different initial values (it
>> would be best if you have a data-driven strategy for picking a starting
>> value) and different optimization methods (e.g. conjugate gradient with
>> "Polak-Ribiere" steplength option, Nelder-Mead, etc.).
>
> It looks to me as if the calculations are very vulnerable to
> overflow/underflow, as they use gamma and not lgamma. They could be
> rearranged to be much stabler by computing the sum of logs for each
> sub-expression.
>
> There were over 50 warnings, which we were not shown. They probably
> explained the problem.
>
> Beyond that, the feasible region seems to be the interior of the
> positive orthant, in which case transforming the parameters (e.g.
> working with their logs) would be a good idea.
>
> Finally, always supply analytical gradients when you can (as would be
> easy here).
>
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
>> Sent: Thursday, April 05, 2007 6:12 AM
>> To: r-help@stat.math.ethz.ch
>> Subject: [R] Likelihood returning inf values to optim(L-BFGS-B) other
>> options?
>>
>> Dear R-help list,
>>
>> I am working on an optimization with R by evaluating a likelihood
>> function that contains lots of Gamma calculations (BGNBD: Hardie Fader
>> Lee 2005 Management Science). Since I am forced to implement lower
>> bounds for the four parameters included in the model, I chose the
>> optim() function with L-BFGS-B as method. But the likelihood often
>> returns inf-values which L-BFGS-B can't deal with.
>>
>> Are there any other options to implement an optimization algorithm
>> with R accounting for lower bounds and a four parameter-space?
>>
>> Here is the error message I receive (translated from German):
>> --
>> > out=optim(c(.1,.1,.1,.1),fn,method="L-BFGS-B",lower=c(.0001,.0001,.0001,.0001,.0001))
>> Error in optim(c(0.1, 0.1, 0.1, 0.1), fn, method = "L-BFGS-B", lower
>> = c(1e-04, :
>>   L-BFGS-B needs finite values of 'fn'
>> In addition: There were 50 or more warnings (use warnings() to see
>> the first 50)
>> --
>> And this is the likelihood function:
>> --
>> fn<-function(p) {
>>   A1=(gamma(p[1]+x)*p[2]^p[1])/(gamma(p[1]))
>>   A2=(gamma(p[3]+p[4])*gamma(p[4]+x))/(gamma(p[4])*gamma(p[3]+p[4]+x))
>>   A3=(1/(p[2]+Tvec))^(p[1]+x)
>>   A4=(p[3]/(p[4]+x-1))*((1/(p[2]+t_x))^(p[1]+x))
>>   ll=sum(log(A1*A2*(A3+flag*A4)))
>>   return(ll)
>> }
>>
>> Thank you very much for your help in advance!
>>
>> Best regards,
>>
>> Michael
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

--
Michael Jungbluth
Research Associate
Department of Marketing
Ingolstadt School of Management
CU-Eichstaett-Ingolstadt
Germany

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.
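The advice quoted in this thread (sum logs with lgamma, work with the logs of the parameters, return the negative log-likelihood) can be sketched roughly as follows. This is an illustration under assumptions, not the poster's final code: the names negll and naive_ll and the toy data are made up, flag is assumed to be the indicator x > 0, and p[1..4] are assumed to be (r, alpha, a, b). According to ?optim, fn may return NA or Inf away from the starting value for every method except L-BFGS-B, so the default Nelder-Mead is used here; the log transform keeps the parameters positive without box constraints.

```r
# Posted likelihood, with the data passed as arguments (flag assumed to be x > 0).
naive_ll <- function(p, x, t_x, Tvec) {
  flag <- as.numeric(x > 0)
  A1 <- gamma(p[1] + x) * p[2]^p[1] / gamma(p[1])
  A2 <- gamma(p[3] + p[4]) * gamma(p[4] + x) /
    (gamma(p[4]) * gamma(p[3] + p[4] + x))
  A3 <- (1 / (p[2] + Tvec))^(p[1] + x)
  A4 <- (p[3] / (p[4] + x - 1)) * (1 / (p[2] + t_x))^(p[1] + x)
  sum(log(A1 * A2 * (A3 + flag * A4)))
}

# Negative log-likelihood computed on the log scale; theta = log(c(r, alpha, a, b)).
negll <- function(theta, x, t_x, Tvec) {
  p <- exp(theta)                      # maps R^4 onto the positive orthant
  r <- p[1]; alpha <- p[2]; a <- p[3]; b <- p[4]
  lA1 <- lgamma(r + x) - lgamma(r) + r * log(alpha)
  lA2 <- lgamma(a + b) + lgamma(b + x) - lgamma(b) - lgamma(a + b + x)
  lA3 <- -(r + x) * log(alpha + Tvec)
  lA4 <- rep(-Inf, length(x))          # log(0): the A4 term is absent when x == 0
  pos <- x > 0
  lA4[pos] <- log(a) - log(b + x[pos] - 1) -
    (r + x[pos]) * log(alpha + t_x[pos])
  m <- pmax(lA3, lA4)                  # log-sum-exp for log(A3 + flag * A4)
  -sum(lA1 + lA2 + m + log(exp(lA3 - m) + exp(lA4 - m)))
}

# Toy data, NOT simulated from the BG/NBD model; for illustration only.
set.seed(1)
n <- 50
Tvec <- runif(n, 20, 40)
x <- rpois(n, 2)
t_x <- ifelse(x > 0, runif(n) * Tvec, 0)
p0 <- c(0.5, 2, 0.8, 1.5)

# Both versions agree where the naive one does not overflow ...
all.equal(naive_ll(p0, x, t_x, Tvec), -negll(log(p0), x, t_x, Tvec))

# ... but a single record with x = 200 overflows gamma() in the naive version.
suppressWarnings(naive_ll(p0, 200, 10, 30))
negll(log(p0), 200, 10, 30)

# Unconstrained fit on the log scale (Nelder-Mead is optim's default method).
fit <- optim(log(p0), negll, x = x, t_x = t_x, Tvec = Tvec)
exp(fit$par)                           # back-transformed parameter estimates
```

Supplying analytical gradients, as suggested in the quoted reply, would be a further step that this sketch does not attempt.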
Re: [R] Setting where the x-axis crosses the y-axis
Hi Stephane,

Drawing a bar plot with log axes is a really bad idea. The whole point
of a bar is that you are judging the area between the top of the bar and
the axis at y = 0. On a log-scaled axis the distance to y = 0 is
infinite, so your plot isn't really meaningful.

You might want to consider using a dot plot instead. See
http://www.b-eye-network.com/view/index.php?cid=2468&fc=0&frss=1&ua=
for a good discussion of the issues.

Hadley

On 4/6/07, stephane helleringer <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> my apologies for a probably very obvious question but i can't figure
> out if, on a bar plot, there is a simple way to have the x-axis cross
> the y-axis at 1, when the y-axis is on a log-scale?
> I want to draw a bar plot, and have some of the bars "drop" below 1 while
> starting from 1. Is this possible?
> I have been trying various things using barplot, barplot2 etc... without
> success.
> Thanks a lot for your help,
>
> stephane
Re: [R] Setting where the x-axis crosses the y-axis
I think you are looking for this:

    tmp <- abs(rnorm(100, sd = 8))
    ltmp <- log(tmp)

    # plot the logged values on a linear axis, then relabel the axis
    plot(ltmp, type = "h", yaxt = "n", main = "what you want")
    exp(par("usr")[3:4])   # plotted y-range, back on the original scale
    par("yaxp")[1:2]       # extreme tick positions, in log units
    logticks <- axTicks(2, axp = c(10^c(-1, 3), 3), log = TRUE)
    axis(2, at = log(logticks), labels = logticks)

    # for comparison: the built-in log axis
    plot(tmp, type = "h", log = "y", main = "standard")
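The same transform-and-relabel idea carries over to bars that start at 1. The sketch below is only an illustration with made-up values (vals and ticks are hypothetical): the bars are drawn for log10 of the data, so the baseline at 0 corresponds to 1 on the original scale, and the axis is then labelled in original units. Whether bars are a sensible display on a log scale at all is a separate question.

```r
vals <- c(a = 0.2, b = 0.5, c = 3, d = 20)   # made-up data, some below 1
lv <- log10(vals)                            # log10(1) = 0 becomes the baseline
ticks <- c(0.1, 1, 10, 100)                  # tick labels on the original scale

# Bars drop below or rise above the baseline depending on the sign of lv.
mids <- barplot(lv, ylim = log10(range(ticks)), yaxt = "n",
                ylab = "value (log scale)")
axis(2, at = log10(ticks), labels = ticks)
abline(h = 0)                                # the x-axis, crossing at y = 1
```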
[R] Repeated-measures: aov(), lme() and lmer()
Hello,

Can anyone confirm whether these four function calls are equivalent? I
would like to test for treatment effects in a repeated-measures design.
The design is balanced:

# 1: aov(y~subject+treatment)
# 2: aov(y~treatment+Error(subject))
# 3: lme(y~treatment, random= ~1|subject)
# 4: lmer(y~treatment+(1|subject))

Thanks in advance,

--
Francisco Torreira
PhD Candidate in Romance Linguistics
University of Illinois at Urbana-Champaign
https://netfiles.uiuc.edu/ftorrei2/www/index.html
tel: (+1) 217 - 778 8510
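For calls 1 and 2, the question can at least be checked numerically on made-up balanced data. The sketch below is an illustration only: the data frame d and its effect sizes are invented, and it compares just the treatment F statistics of the two aov() fits. The lme() and lmer() fits are not run here.

```r
# Invented balanced design: 8 subjects, each measured under 3 treatments.
set.seed(42)
d <- expand.grid(subject = factor(1:8), treatment = factor(c("A", "B", "C")))
d$y <- rnorm(8)[d$subject] +            # subject-level shifts
  c(0, 0.5, 1)[d$treatment] +           # treatment effects
  rnorm(nrow(d), sd = 0.5)              # residual noise

fit1 <- aov(y ~ subject + treatment, data = d)
fit2 <- aov(y ~ treatment + Error(subject), data = d)

# Treatment F statistic: second row of the single table for fit1,
# first row of the within-subject stratum for fit2.
F1 <- summary(fit1)[[1]][2, "F value"]
F2 <- summary(fit2)[["Error: Within"]][[1]][1, "F value"]
all.equal(F1, F2)
```

In a balanced, complete design both fits test treatment against the same within-subject residual, which is why the two statistics coincide here.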
[R] Setting where the x-axis crosses the y-axis
Hi all,

my apologies for a probably very obvious question, but I can't figure out
if, on a bar plot, there is a simple way to have the x-axis cross the
y-axis at 1 when the y-axis is on a log scale?
I want to draw a bar plot and have some of the bars "drop" below 1 while
starting from 1. Is this possible?
I have been trying various things using barplot, barplot2 etc... without
success.
Thanks a lot for your help,

stephane
Re: [R] read.spss (package foreign) and SPSS 15.0 files
--- Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:

> Charilaos Skiadas wrote:
> > On Apr 6, 2007, at 12:32 PM, John Kane wrote:
> >
> >> I have simply moved to exporting the SPSS file to a
> >> delimited file and loading it. Unfortunately I'm
> >> losing all the labelling which can be time-consuming
> >> to redo. Some of the data has something like 10
> >> categories for a variable.
> >
> > I save as csv format all the time, and it offers me a choice to use
> > the labels instead of the corresponding numbers. So you shouldn't
> > have to lose that labelling.
> >
> > Haris Skiadas
> > Department of Mathematics and Computer Science
> > Hanover College
>
> That's a different point. The great advantage of read.spss (and the
> spss.get function in Hmisc that uses it) is that long variable labels
> are supported in addition to variable names. That's why I like getting
> SPSS or Stata files instead of csv files. I'm going to enhance csv.get
> in Hmisc to allow a row number to be specified, to contain long variable
> labels.
>
> Frank

Ah, I missed that point. I think it is a "little" bit less important to
me, but I do notice that I 'label' just about everything. Trying to
remember what "sexy" meant 3 months ago is not always easy, even when I'm
reading the code. :)

The enhancement will definitely be appreciated.
Re: [R] Reasons to Use R
Regarding (2), I wonder if this information is too outdated or not
relevant when scaled up to larger problems...

http://www.sciviews.org/benchmark/index.html

--- Ramon Diaz-Uriarte <[EMAIL PROTECTED]> wrote:

> On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> > (2)Hardware requirements, possibly benchmarks
> > (3)R & clusters, R & multiple CPU machines, R performance on different
> > hardware.
Re: [R] regular expression
Uwe Ligges wrote:

> Laurent Rhelp wrote:
>
>> Dear R-List,
>>
>> I have a great many files in a directory and I would like to
>> replace in every file the character " by the character ' and in the
>> same time, I have to change ' by '' (i.e. the character ' twice and
>> not the unique character ") when the character ' is embodied in "."
>> So, "." becomes '.' and ".'.." becomes '.''..'
>> Certainly, regular expression could help me but I am not able to use it.
>>
>> How can I do that with R ?
>
> In fact, you do not need to know anything about regular expressions in
> this case, since you are simply going to replace certain characters by
> others without any fuzzy restrictions:
>
> x <- "\".'..\""
> cat(x, "\n")
> xn <- gsub('"', "'", gsub("'", "''", x))
> cat(xn, "\n")
>
> Uwe Ligges
>
>> Thank you very much

Yes, you are right. So I wrote the code below (I find it a little
awkward, but it works).

    dirdata <- getwd()
    fichnames <- list.files(path = file.path(dirdata, "initial"))

    for (i in seq_along(fichnames)) {
      filein  <- file.path(dirdata, "initial", fichnames[i])
      fileout <- file.path(dirdata, "result", fichnames[i])

      # count the lines of the input file
      conin <- file(filein)
      open(conin)
      nbrows <- length(readLines(conin, n = -1))
      close(conin)

      # copy line by line: double every ' first, then replace " by '
      conout <- file(fileout, "w")
      conin <- file(filein)
      open(conin)
      for (l in seq_len(nbrows)) {
        text <- gsub('"', "'", gsub("'", "''", readLines(conin, n = 1)))
        writeLines(con = conout, text = text)
      }
      close(conin)
      close(conout)
    }
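Since gsub(), readLines() and writeLines() all work on whole character vectors, the per-line loop is not strictly needed. The following is a minimal sketch of that idea, with hypothetical helper names (convert_quotes, convert_dir) and a throwaway demo directory under tempdir(). Like the nested gsub() above, it doubles every single quote, not only those enclosed in double quotes, and it assumes the input directory contains only plain text files.

```r
# Double every single quote first, then turn double quotes into single ones.
convert_quotes <- function(lines) {
  gsub('"', "'", gsub("'", "''", lines, fixed = TRUE), fixed = TRUE)
}

# Apply the conversion to every file in in_dir, writing results to out_dir.
convert_dir <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (f in list.files(in_dir)) {
    lines <- readLines(file.path(in_dir, f))        # whole file at once
    writeLines(convert_quotes(lines), file.path(out_dir, f))
  }
  invisible(list.files(out_dir))
}

# Demo on a temporary directory with one small file.
in_dir  <- file.path(tempdir(), "initial")
out_dir <- file.path(tempdir(), "result")
dir.create(in_dir, showWarnings = FALSE)
writeLines(c("\"abc\"", "\"it's\""), file.path(in_dir, "demo.txt"))
convert_dir(in_dir, out_dir)
readLines(file.path(out_dir, "demo.txt"))
```

Using fixed = TRUE treats the patterns as literal characters, which matches the point made in the reply that no regular expression is involved.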
Re: [R] regular expression
Laurent Rhelp wrote:
> Dear R-List,
>
> I have a great many files in a directory and I would like to
> replace in every file the character " by the character ' and in the same
> time, I have to change ' by '' (i.e. the character ' twice and not the
> unique character ") when the character ' is embodied in "."
> So, "." becomes '.' and ".'.." becomes '.''..'
> Certainly, regular expression could help me but I am not able to use it.
>
> How can I do that with R ?

In fact, you do not need to know anything about regular expressions in
this case, since you are simply going to replace certain characters by
others without any fuzzy restrictions:

    x <- "\".'..\""
    cat(x, "\n")
    xn <- gsub('"', "'", gsub("'", "''", x))
    cat(xn, "\n")

Uwe Ligges

> Thank you very much
Re: [R] plotting multilevel / lme lines
Yesterday, I tried to do exactly this, too. Below is my approach.
Unfortunately, I did not find a textbook example against which I could
verify my code. Hints, simplifications, and verification are highly
appreciated!

    library(nlme)

    ### random-slope model (dta: the data downloaded from the URL below)
    v <- lme(POPULAR ~ SEX + TEXP, data = dta, random = ~ SEX | SCHOOL)

    ### set up the points to plot: for each school, (0, intercept) and
    ### (1, intercept + SEX slope), i.e. predictions with TEXP held at 0
    cf <- coef(v)
    a <- matrix(t(cbind(0, cf[, 1], 1, cf[, 1] + cf[, 2])),
                ncol = 2, byrow = TRUE)
    plot(a)

    ### connect only the two points of each school, not all dots
    for (i in seq(1, nrow(a), by = 2)) lines(a[i:(i + 1), ])

Thanks
Toby

Data downloaded from
http://www.ats.ucla.edu/stat/examples/ma_hox/default.htm

Rense Nieuwenhuis wrote:
> Dear expeRts,
>
> I am trying to plot a lme-object (package nlme) in such a way, that
> on a selected level the x-axis represents the value on a selected
> predictor and the y-axis represents the predicted-outcome variable.
> The graphs would then consist of several lines that each represent
> one group. I can't find such a plotting function.
>
> I could write such a function myself, based on ranef() and fixef(),
> but it would be a waste of time if such a function would already exist.
>
> Does any of you know of such a function?
>
> Regards,
>
> Rense Nieuwenhuis
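A self-contained variant of this idea can be tried on the Orthodont data that ships with nlme. This is a sketch under assumptions, not a verified answer to the original question: it swaps in a different data set and model (distance ~ age with a random intercept and slope per Subject) and draws one predicted line per group from coef(), which combines the fixed and random effects.

```r
library(nlme)

# Random intercept and random age slope for each Subject.
fm <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

cf <- coef(fm)                    # one row per Subject: fixed + random effects
od <- as.data.frame(Orthodont)    # plain data frame for base graphics
ages <- range(od$age)

# Raw data in grey, then one fitted line per Subject.
plot(od$age, od$distance, col = "grey", xlab = "age", ylab = "distance",
     main = "Group-specific lines from an lme fit")
for (j in seq_len(nrow(cf))) {
  lines(ages, cf[j, "(Intercept)"] + cf[j, "age"] * ages)
}

# Population-level (fixed-effects) line on top.
abline(coef = fixef(fm), lwd = 3)
```

With more than one predictor, as in the POPULAR ~ SEX + TEXP model above, the remaining predictors have to be held at some chosen value when computing each line.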
Re: [R] Reasons to Use R
Dear Lorenzo and Steven,

I'm not a programmer, but in my experience R is good for processing
large datasets, especially in combination with specialised statistics.
There are some limits to that, but R handles large datasets and
complicated computations a lot better than SPSS, for example.

I cannot speak about Fortran, but I do have experience with Pascal. I
prefer R, because in Pascal you easily get lost in an endless
programming effort that has nothing to do with the problem. I do like
Pascal, it's the only programming language I actually learned, but it
isn't an adequate replacement for R.

My experience is that the SPSS language, and the menu-driven package, is
far easier to handle than R, but when it comes to specific computations
SPSS loses by far. Non-parametrics, for example, is good in R. Dataset
handling is adequate (my SPSS ports can be read). I noticed that R has
good numerical routines such as optimisation (even mixed integer
programming) and good procedures for regression (GLM, which is not
standard in SPSS). Try to compute a Kendall-W statistic in SPSS; it's
relatively easy in R.

The only thing that I DON'T like about R is dataset computations and its
syntax. When I have a dataset with only non-parametric content that is
also "dirty" (incomplete, or containing wrong values), I almost have to
call in a technician to find out how to handle it. To be honest, I use a
spreadsheet for these dataset computations and then export the result to
R. But I noted that there are several solutions for that in R. With
SciViews I could get a basic feeling for it.

Pascal is basically the only programming language whose syntax I
understood. It has a kind of logical, mathematical structure to it. The
logic of Fortran (and to some extent R) escapes me completely.

Statistically, R is my choice, and luckily most procedures in R are
easily accessible. And my experience with computations in R is... good.

In the past I have done simulations, especially with time series, but I
cannot recommend R for that (arima.sim is not sufficient for these types
of simulations). I would still prefer Pascal for it. There is also an
excellent open-source package for Pascal, Free Pascal, but I hardly use
it. I have had some good experiences with computations in C, although my
experience with it is limited. I believe I would prefer R to C.

Cheers,
Wilfred
Re: [R] Reasons to Use R
Dear Lorenzo,

I'll try not to repeat what others have answered before.

On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
(...)

> (1)Institutions (not only academia) using R

You can count my institution too. Several groups. (I can provide more
details off-list if you want.)

> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.

We do use R in commodity off-the-shelf clusters; our two clusters are
running Debian GNU/Linux; both 32-bit machines ---Xeons--- and 64-bit
machines ---dual-core AMD Opterons. We use parallelization quite a
bit, with MPI (via Rmpi and papply packages mainly). One convenient
feature is that (once the lam universe is up and running) whether we
are using the 4 cores in a single box, or the max available 120, is
completely transparent. Using R and MPI is, really, a piece of cake.
That said, there are things that I miss; in particular, oftentimes I
wish R were Erlang or Oz because of the straightforward fault-tolerant
distributed computing and the built-in abstractions for distribution
and concurrency. The issue of multithreading has come up several times
in this list and is something that some people miss.

I am not sure how much R is used in the usual HPC realms. It is my
understanding that the "traditional HPC" is still dominated by things
such as HPF, and C with MPI, OpenMP, or UPC or Cilk. The usual answer
to "but R is too slow" is "but you can write Fortran or C code for the
bottlenecks and call it from R". I guess you could use, say, UPC in
that C that is linked to R, but I have no experience. And I think this
code can become a pain to write and maintain (especially if you want to
play around with what you try to parallelize, etc). My feeling (based
on no information or documentation whatsoever) is that how far R can
be stretched or extended into HPC is still an open question.

> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.

(In addition to all the already mentioned answers)
Complete source code availability. Being able to look at the C source
code for a few things has been invaluable for me.
And, of course, an extremely active, responsive, and vibrant
community that, among other things, has contributed packages and code
for an incredible range of problems.

Best,

R.

P.S. I'd be interested in hearing about the responses you get to your
presentation.

> Kind Regards
>
> Lorenzo Isella

--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Re: [R] read.spss (package foreign) and SPSS 15.0 files
Charilaos Skiadas wrote:
> On Apr 6, 2007, at 12:32 PM, John Kane wrote:
>
>> I have simply moved to exporting the SPSS file to a
>> delimited file and loading it. Unfortunately I'm
>> losing all the labelling which can be time-consuming
>> to redo. Some of the data has something like 10
>> categories for a variable.
>
> I save as csv format all the time, and it offers me a choice to use
> the labels instead of the corresponding numbers. So you shouldn't
> have to lose that labelling.
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College

That's a different point. The great advantage of read.spss (and the
spss.get function in Hmisc that uses it) is that long variable labels
are supported in addition to variable names. That's why I like getting
SPSS or Stata files instead of csv files. I'm going to enhance csv.get
in Hmisc to allow a row number to be specified, to contain long variable
labels.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
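The distinction between value labels and long variable labels can be seen with read.spss() itself. The sketch below is an illustration under assumptions: it reads the example file electric.sav that, as far as I know, is shipped with the foreign package and used in its help page, and it does not address the SPSS 15.0 format problem discussed in this thread.

```r
library(foreign)

# Example SPSS file assumed to be installed with the foreign package.
sav <- system.file("files", "electric.sav", package = "foreign")

# use.value.labels = TRUE turns labelled codes into factors;
# to.data.frame = TRUE returns a data frame instead of a list.
dat <- read.spss(sav, use.value.labels = TRUE, to.data.frame = TRUE)

# Long variable labels are kept as an attribute, keyed by variable name.
var_labels <- attr(dat, "variable.labels")
head(var_labels)
str(dat[, 1:3])
```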
[R] regular expression
Dear R-List,

I have a great many files in a directory and I would like to replace in
every file the character " by the character ' and, at the same time, I
have to change ' to '' (i.e. the character ' twice and not the single
character ") when the character ' is embedded in "."
So, "." becomes '.' and ".'.." becomes '.''..'
Certainly, a regular expression could help me, but I am not able to use
one.

How can I do that with R?

Thank you very much
Re: [R] Labelling boxplot with fivenumber summary
Here is one way of labeling the values:

    # capture data returned by boxplot
    x <- boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
    # label the five summary values of each group
    for (i in seq(ncol(x$stats))) {
      text(i, x$stats[, i], labels = x$stats[, i])
    }
    # if there are outliers, label them
    if (length(x$out) > 0) {
      # split the outliers by group so you can get max/min
      maxmin <- split(x$out, x$group)
      # go through each group getting min/max
      lapply(names(maxmin), function(.grp) {
        .range <- range(maxmin[[.grp]])
        text(as.numeric(.grp), .range, labels = .range)
      })
    }

On 4/6/07, Daniel Siddle <[EMAIL PROTECTED]> wrote:
>
> I am very new to R so forgive me if this seems basic but I have done
> extensive searching and failed to come up with the answer for myself.
>
> I am trying to label a boxplot I have created with the values for the median,
> upper and lower quartiles and max and min values. I have been unable to do
> this or find anything on the net to say how it might be done. Is this
> possible and if so how? Regards,
>
> Daniel Siddle

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] read.spss (package foreign) and SPSS 15.0 files
Thanks, that's excellent news. This is a relatively new problem for me,
and we don't have SPSS on the local machines, so I have not been
experimenting. Given that I used SPSS for about 3 minutes in the last
5-6 months, I was too cheap to have us get a licence. And it's not that
far to walk to the nearest lab with SPSS. :)

Next time I'll take a coffee and experiment.

--- Charilaos Skiadas <[EMAIL PROTECTED]> wrote:

> On Apr 6, 2007, at 12:32 PM, John Kane wrote:
>
> > I have simply moved to exporting the SPSS file to a
> > delimited file and loading it. Unfortunately I'm
> > losing all the labelling which can be time-consuming
> > to redo. Some of the data has something like 10
> > categories for a variable.
>
> I save as csv format all the time, and it offers me a choice to use
> the labels instead of the corresponding numbers. So you shouldn't
> have to lose that labelling.
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College
Re: [R] Labelling boxplot with fivenumber summary
Daniel Siddle wrote:
> I am very new to R so forgive me if this seems basic but I have done
> extensive searching and failed to come up with the answer for myself.
>
> I am trying to label a boxplot I have created with the values for the median,
> upper and lower quartiles and max and min values. I have been unable to do
> this or find anything on the net to say how it might be done. Is this
> possible and if so how? Regards,

Here is another idea:

    fn <- boxplot(ToothGrowth$len, plot = FALSE)$stats

    par(mar = c(4, 6, 4, 2))
    boxplot(ToothGrowth$len, ylab = "Length", at = .80)

    text(1.15, fn[1], paste("Minimum Value =", fn[1]), adj = 0, cex = .7)
    text(1.15, fn[2], paste("Lower Quartile =", fn[2]), adj = 0, cex = .7)
    text(1.15, fn[3], paste("Median =", fn[3]), adj = 0, cex = .7)
    text(1.15, fn[4], paste("Upper Quartile =", fn[4]), adj = 0, cex = .7)
    text(1.15, fn[5], paste("Maximum Value =", fn[5]), adj = 0, cex = .7)

    arrows(1.14, fn[1], 1.02, fn[1])
    arrows(1.14, fn[2], 1.02, fn[2])
    arrows(1.14, fn[3], 1.02, fn[3])
    arrows(1.14, fn[4], 1.02, fn[4])
    arrows(1.14, fn[5], 1.02, fn[5])

    title("Annotated Boxplot of Tooth Growth")

> Daniel Siddle

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
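Because text() and arrows() accept vectors, the five repeated calls can be collapsed into one call each. The sketch below is a compact restatement of the same idea, with a hypothetical labs vector. Note that the stats component of boxplot() holds the whisker ends and hinges, so the "Minimum" and "Maximum" labels are only literally true when there are no outliers.

```r
# Five summary values used by boxplot(): whisker ends, hinges and median.
s <- boxplot(ToothGrowth$len, plot = FALSE)$stats[, 1]
labs <- c("Minimum", "Lower quartile", "Median", "Upper quartile", "Maximum")

par(mar = c(4, 6, 4, 2))
boxplot(ToothGrowth$len, ylab = "Length", at = 0.8)

# One vectorised call places all five labels; another draws all five arrows.
text(1.15, s, paste(labs, "=", s), adj = 0, cex = 0.7)
arrows(1.14, s, 1.02, s, length = 0.08)
title("Annotated Boxplot of Tooth Growth")
```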
Re: [R] read.spss (package foreign) and SPSS 15.0 files
On Apr 6, 2007, at 12:32 PM, John Kane wrote: > I have simply moved to exporting the SPSS file to a > delimited file and loading it. Unfortunately I'm > losing all the labelling which can be time-consuming > to redo.Some of the data has something like 10 > categories for a variable. I save as csv format all the time, and it offers me a choice to use the labels instead of the corresponding numbers. So you shouldn't have to lose that labelling. Haris Skiadas Department of Mathematics and Computer Science Hanover College __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Computing the rank of a matrix.
Hi,

qr(A)$rank will work, but just be wary of the tolerance parameter (default is 1.e-07), since the rank computation could be sensitive to the tolerance chosen.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Friday, April 06, 2007 11:07 AM
To: José Luis Aznarte M.
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Computing the rank of a matrix.

How about

qr(A)$rank

or perhaps

qr(A, LAPACK=TRUE)$rank

Cheers,
Andy

__
Andy Jaworski
518-1-01 Process Laboratory
3M Corporate Research Laboratory
-
E-mail: [EMAIL PROTECTED]
Tel: (651) 733-6092
Fax: (651) 736-3122

From: "José Luis Aznarte M." <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Date: 04/06/2007 06:39 AM
Subject: [R] Computing the rank of a matrix.

Hi! Maybe this is a silly question, but I need the column rank (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function 'rank()' only gives me the ordering of the elements of my matrix. How can I compute the column rank of a matrix? Is there not an R equivalent to Matlab's 'rank()'? I've been browsing for a time now and I can't find anything, so any help will be greatly appreciated. Best regards!

--
Jose Luis Aznarte M. http://decsai.ugr.es/~jlaznarte
Department of Computer Science and Artificial Intelligence
Universidad de Granada Tel. +34 - 958 - 24 04 67
GRANADA (Spain) Fax: +34 - 958 - 24 00 79
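To make the tolerance caveat concrete, here is a minimal sketch. The matrix `A` and the alternative value `tol = 1e-12` are made up for illustration; they are not from the thread. The second column differs from the first by only 1e-9 in one entry, so whether it counts as linearly independent depends on `tol`. As far as I know, ?qr also documents the `rank` component as always full in the LAPACK case, so the LINPACK default is the one to use for rank estimation.

```r
# Two nearly collinear columns: they differ by 1e-9 in a single entry
A <- cbind(c(1, 1, 1), c(1, 1, 1 + 1e-9))

rank_default <- qr(A)$rank               # default tol = 1e-07
rank_strict  <- qr(A, tol = 1e-12)$rank  # much smaller tolerance

# The second column is treated as negligible under the default tolerance,
# but as independent under the smaller one
print(c(default = rank_default, strict = rank_strict))
```

Which answer is "right" depends on whether a difference of 1e-9 is signal or rounding noise in the application at hand.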
Re: [R] read.spss (package foreign) and SPSS 15.0 files
On Thu, 5 Apr 2007, John Kane wrote: > Heck. I'd be happy to get an answer to what is > happening here: >> mac <- spss.get("H:/ONTH/Raw.data/Follow.sav") > Warning message: > H:/ONTH/Raw.data/Follow.sav: Unrecognized record type > 7, subtype 16 encountered in system file > It means that your file had a record of type 7, subtype 16 in it, and read.spss doesn't know how to handle these. You would have to ask SPSS what record type 7 and subtype 16 represent -- their software put them there, and it's their terminology. People's experience with unrecognised record types is that they usually don't matter, which would make sense from a backwards-compatibility point of view, but in the absence of documentation or psychic powers it is hard to be sure. Avoiding read.spss is a perfectly reasonable strategy, and is in fact what we have always recommended in the Data Import-Export manual. AFAIK the only commercial statistical software vendor that does provide complete, public documentation of their file formats is Stata, and this is one reason why there are fewer complaints about read.dta and write.dta. It also probably helps that the code was written by someone who uses Stata -- there hasn't been much contribution of code or patches for the foreign package from SPSS users. -thomas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.spss (package foreign) and SPSS 15.0 files
--- Thomas Lumley <[EMAIL PROTECTED]> wrote: > On Thu, 5 Apr 2007, John Kane wrote: > > Heck. I'd be happy to get an answer to what is > > happening here: > >> mac <- spss.get("H:/ONTH/Raw.data/Follow.sav") > > Warning message: > > H:/ONTH/Raw.data/Follow.sav: Unrecognized record > type > > 7, subtype 16 encountered in system file > > > > It means that your file had a record of type 7, > subtype 16 in it, and > read.spss doesn't know how to handle these. > > You would have to ask SPSS what record type 7 and > subtype 16 represent -- > their software put them there, and it's their > terminology. > > People's experience with unrecognised record types > is that they usually > don't matter, which would make sense from a > backwards-compatibility point > of view, but in the absence of documentation or > psychic powers it is hard > to be sure. Yes, that actually was what I meant. I have had no problems with SPSS 12 but 14 seems a bit nasty. Sometime I may get a change to build a couple of test files in SPSS that I can check. >Avoiding read.spss is a perfectly > reasonable strategy, and is > in fact what we have always recommended in the Data > Import-Export manual. I have simply moved to exporting the SPSS file to a delimited file and loading it. Unfortunately I'm losing all the labelling which can be time-consuming to redo.Some of the data has something like 10 categories for a variable. > > AFAIK the only commercial statistical software > vendor that does provide > complete, public documentation of their file formats > is Stata, and this > is one reason why there are fewer complaints about > read.dta and write.dta. > It also probably helps that the code was written by > someone who uses Stata > -- there hasn't been much contribution of code or > patches for the > foreign package from SPSS users. 
> > > -thomas > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading a csv file row by row
Hi.

On 4/6/07, Yuchen Luo <[EMAIL PROTECTED]> wrote:
> Hi, my friends.
> When a data file is large, loading the whole file into the memory all
> together is not feasible. A feasible way is to read one row, process it,
> store the result, and read the next row.
>
> In Fortran, by default, the 'read' command reads one line of a file, which
> is convenient, and when the same 'read' command is executed the next time,
> the next row of the same file will be read.
>
> I tried to replicate such row-by-row reading in R. I use scan( ) to do so
> with the "skip= xxx " option. It takes only seconds when the number of the
> rows is within 1000. However, it takes hours to read 1 rows. I think it
> is because every time R reads, it needs to start from the first row of the
> file and count xxx rows to find the row it needs to read. Therefore, it
> takes more time for R to locate the row it needs to read.

Yes, to skip rows scan() needs to locate every single row (line feed/carriage return). The only gain you get is that it does not have to parse and store the contents of those skipped lines.

One solution is to first go through the file and register the file position of the first character in every line, and then make use of this in subsequent reads. In order to do this, you have to work with an opened connection and pass that to scan instead. Rough sketch:

con <- file(pathname, open="r")
# Scan file for first position of every line
rowStarts <- scanForRowStarts(con)
# Skip to a certain row (here row number 'row') and read a set of lines:
seek(con, where=rowStarts[row], origin="start", rw="r")
data <- scan(con, ..., skip=0, nlines=rowsPerChunk)
close(con)

That's the idea. The tricky part is to get scanForRowStarts() correct. After reading a line you can always query the connection for the current file position using:

pos <- seek(con, rw="r")

so you could always iterate between readLines(con, n=1) and pos <- c(pos, seek(con, rw="r")), but there might be a faster way.

Cheers

/Henrik

> Is there a solution to this problem?
>
> Your help will be highly appreciated!
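If the rows are processed strictly in order, the position bookkeeping can be avoided altogether: an open connection remembers where the last read stopped, so each readLines() call continues from there. The following is a minimal sketch under that assumption. The temporary file, the column names `id` and `value`, and the chunk size of 4 are all made up for illustration.

```r
# Build a small demonstration file (stand-in for the large CSV)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10)^2), path, row.names = FALSE)

con <- file(path, open = "r")
# The first line holds the column names; scan() strips the quotes
header <- scan(text = readLines(con, n = 1), what = "", sep = ",", quiet = TRUE)

total  <- 0   # running result kept between chunks
n_rows <- 0
repeat {
  lines <- readLines(con, n = 4)        # next chunk; continues where we stopped
  if (length(lines) == 0) break         # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  total  <- total + sum(chunk$value)    # process the chunk, keep only the result
  n_rows <- n_rows + nrow(chunk)
}
close(con)
unlink(path)
```

Only one chunk is held in memory at a time, and no line is read twice, so the cost grows linearly with the number of rows rather than quadratically as with repeated skip=.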
Re: [R] Computing the rank of a matrix.
How about

qr(A)$rank

or perhaps

qr(A, LAPACK=TRUE)$rank

Cheers,
Andy

__
Andy Jaworski
518-1-01 Process Laboratory
3M Corporate Research Laboratory
-
E-mail: [EMAIL PROTECTED]
Tel: (651) 733-6092
Fax: (651) 736-3122

From: "José Luis Aznarte M." <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Date: 04/06/2007 06:39 AM
Subject: [R] Computing the rank of a matrix.

Hi! Maybe this is a silly question, but I need the column rank (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function 'rank()' only gives me the ordering of the elements of my matrix. How can I compute the column rank of a matrix? Is there not an R equivalent to Matlab's 'rank()'? I've been browsing for a time now and I can't find anything, so any help will be greatly appreciated. Best regards!

--
Jose Luis Aznarte M. http://decsai.ugr.es/~jlaznarte
Department of Computer Science and Artificial Intelligence
Universidad de Granada Tel. +34 - 958 - 24 04 67
GRANADA (Spain) Fax: +34 - 958 - 24 00 79
Re: [R] Labelling boxplot with fivenumber summary
Daniel Siddle wrote: > I am very new to R so forgive me if this seems basic but I have done > extensive searching and failed to come up with the answer for myself. > > I am trying to label a boxplot I have created with the values for the median, > upper and lower quartiles and max and min values. I have been unable to do > this or find anything on the net to say how it might be done. Is this > possible and if so how? Regards, This message from back in 2002 gives a function called bp.example(), which shows how a boxplot might be annotated: http://tolstoy.newcastle.edu.au/R/help/02a/1515.html You could easily modify it into a stripped down version that does what you want. > Daniel Siddle > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reasons to Use R
Hi Lorenzo,

On 4/5/07, Lorenzo Isella <[EMAIL PROTECTED]> wrote:
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R

A starting point might be to look at the R-project homepage and look at the members and donors list. This is, of course, not a comprehensive list; but at least it can give an overview of the diverse backgrounds in which people are using R, even if it is only the tip of the iceberg.

> (2)Hardware requirements, possibly benchmarks

Maybe you should also mention that you can run R just from a USB stick if you want (see R for Windows FAQ 2.6).

> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.

Have a look at the 'R Installation and Administration' manual; it gives a nice overview of how many platforms R is running on.

Best,
Roland
[R] Labelling boxplot with fivenumber summary
I am very new to R so forgive me if this seems basic but I have done extensive searching and failed to come up with the answer for myself. I am trying to label a boxplot I have created with the values for the median, upper and lower quartiles and max and min values. I have been unable to do this or find anything on the net to say how it might be done. Is this possible and if so how? Regards, Daniel Siddle __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.spss (package foreign) and SPSS 15.0 files
--- Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > On Thu, 5 Apr 2007, Michael Conklin wrote: > > > Not being the developer I cannot answer > definitively but, as a frequent > > user of SPSS files I can give you my experience. > > > > 1) The unrecognized coding is perhaps due to the > locale of the SPSS > > installation. I have had success reading in files > from version 15 but > > often encounter that error when the file was > created with data that > > included some foreign language. I often receive > survey files that were > > administered in a non-English language and that is > when I usually see > > the error. > > That is what is surmised in this recent R-devel > thread: > > https://stat.ethz.ch/pipermail/r-devel/2007-April/045238.html > > although it may also happen in an English locale > (since after all Windows > uses codepage 1252, not ASCII, for American > 'English'). > > The next release of package foreign will give a > warning (rather than an > error) with an unrecognized encoding and recognize a > few more. > > > 2) My experience with the "Warning - unrecognized > record type" message > > is that it has no effect whatsoever on the data > file. > > > > 3) Others on the list have noted that you are > safer exporting POR files > > instead of SAV files from SPSS. Both are read by > the read.spss function. > > The R Data Import/Export manual recommends an open > format such as .csv. > (Look like John Kane has yet to read it ) Well, as I mentioned, I've been using a tab delimited approach. I suppose I could move to .csv... >R > does have quite extensive > facilities for dealing with encodings in text files. > > > > > Hope that helps. > > > > > > > > Michael Conklin > > Chief Methodologist - Advanced Analytics > > MarketTools, Inc. 
> > > > > > -Original Message- > > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf > Of John Kane > > Sent: Thursday, April 05, 2007 7:45 PM > > To: RINNER Heinrich; r-help@stat.math.ethz.ch > > Subject: Re: [R] read.spss (package foreign) and > SPSS 15.0 files > > > > > > --- RINNER Heinrich <[EMAIL PROTECTED]> > > wrote: > > > >> Hello, > >> > >> does anyone have experience with reading SPSS > >> Version 15.0 files into R (version 2.4.1, WinXP)? > >> > >> I have long been sucessfully reading SPSS files > with > >> read.spss from the wonderful foreign package, but > >> somehow after upgrading from SPSS14 to SPSS15 I > seem > >> to have problems. > >> > >> Trying a simple example, where test.sav is a SPSS > >> 15.0 data file consisting of x1=c(1,2,3) and > >> x2=c("a","b","c"), I get this: > >>> read.spss(file = "C:\\temp\\test.sav") > >> Fehler in read.spss(file = "C:\\temp\\test.sav") > : > >> error reading system-file header > >> Zusätzlich: Warning message: > >> C:\temp\test.sav: File-indicated character > >> representation code (Unknown) is not ASCII > >> > >> version infos: > >> R version 2.4.1 (under WinXP) > >> foreign version 0.8-18 > >> > >> Has anyone experienced the same, and can give a > >> solution here (possibly other than "downgrade to > >> SPSS14.0" ;-))? > >> > >> Regards, > >> Heinrich. > > > > Heck. I'd be happy to get an answer to what is > > happening here: > >> mac <- spss.get("H:/ONTH/Raw.data/Follow.sav") > > Warning message: > > H:/ONTH/Raw.data/Follow.sav: Unrecognized record > type > > 7, subtype 16 encountered in system file > > > > I have taken to exporting the file to a delimited > > format and reading it into R since I cannot trust > the > > R import. > > > -- > Brian D. 
Ripley, > [EMAIL PROTECTED] > Professor of Applied Statistics, > http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 > 272861 (self) > 1 South Parks Road, +44 1865 > 272866 (PA) > Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Likelihood returning inf values to optim(L-BFGS-B) other
Hello,

A couple of ideas... I'm not clear on your whole problem, however...

Consider making use of the lgamma function, which returns the natural log of the gamma function. This may help. The gamma function gets awfully big very fast.

Also, multivariable likelihoods can be bumpy like a mountain range, with minor peaks and valleys. It is possible that your likelihood has such a shape. Maybe each iteration Xn is trying to get closer to the main peak, but instead goes up the ridge of a valley and gets lost, ultimately reaching a boundary of the region.

You could try starting at a variety of locations. Possibly many hundreds of starting points, randomly selected from within your region. Then examine the ending point for each starting point. If you do have a bumpy likelihood surface you might have to start very close to the actual maximum to get there.

Surface plots might help, setting some variables to a constant. I know in 4D this will be tough.

Here is a useful reference that helped me recently with a similar maximization problem: "Computational Statistics" by Geof H. Givens and Jennifer A. Hoeting. They have R-code examples here: http://www.stat.colostate.edu/computationalstatistics/

Good luck!

Joe Liddle
University of Alaska Southeast
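Both suggestions can be sketched on a toy problem. What follows is not the BG/NBD likelihood from the original question; it assumes a plain two-parameter gamma model fitted to simulated data, purely to show (a) a log-likelihood written with lgamma() instead of gamma(), and (b) L-BFGS-B restarted from many random starting points. The sample size, the 25 starts, the start range and the lower bounds are arbitrary choices for illustration.

```r
# gamma() overflows double precision quickly; lgamma() does not
overflowed <- !is.finite(suppressWarnings(gamma(200)))
log_ok     <- is.finite(lgamma(200))

set.seed(1)
x <- rgamma(1000, shape = 2, rate = 3)   # simulated data, illustration only

# Negative log-likelihood of a gamma(shape, rate) model, built from sums of logs
nll <- function(p) {
  shape <- p[1]
  rate  <- p[2]
  -sum(shape * log(rate) - lgamma(shape) + (shape - 1) * log(x) - rate * x)
}

# Random starting points inside the feasible region
starts <- matrix(runif(2 * 25, min = 0.1, max = 10), ncol = 2)

# Run the bounded optimiser from every start; a failed run yields NULL
fits <- lapply(seq_len(nrow(starts)), function(i) {
  tryCatch(
    optim(starts[i, ], nll, method = "L-BFGS-B", lower = c(1e-4, 1e-4)),
    error = function(e) NULL
  )
})
fits <- Filter(Negate(is.null), fits)

# Compare the end points and keep the best one
values <- sapply(fits, function(f) f$value)
best   <- fits[[which.min(values)]]
print(best$par)
```

If the end points in `values` disagree substantially between starts, that is a hint of the bumpy surface described above; if they all agree, the surface is probably well behaved near the optimum.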
[R] dbinom and Catherine Loader
Hi Folks, There has been past correspondence regarding Catherine Loader's Bell Labs (oops, Lucent) paper "Fast and Accurate Computation of Binomial Probabilities" which gives the algorithm on which R's dbinom() is based. The original URL given in the R documentation "?dbinom" is: http://cm.bell-labs.com/cm/ms/departments/sia/catherine/dbinom but this link is dead. Likewise, Marc Schwarz (in reply to Aries Arditi on Thu Dec 11 2003) gives http://kiefer.stat.cwru.edu/~catherine/pubs.html "There is a link to the paper (as a Postscript file) at the bottom of that page, however the link appears to be dead." I've just discovered that Catherine Loader seems to have cunningly encoded herself as "c at herine.net". So now we can find a URL for her dbinom: http://www.herine.net/stat/software/dbinom.html which points to a PDF of the above paper at http://www.herine.net/stat/papers/dbinom.pdf (which, today at least, works). More generally, see http://www.herine.net/stat/index.html Best wishes to all, Ted. E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 094 0861 Date: 06-Apr-07 Time: 14:56:29 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reasons to Use R
> (1)Institutions (not only academia) using R http://www.r-project.org/useR-2006/participants.html > (2)Hardware requirements, possibly benchmarks Since you mention huge data sets, GNU/Linux running on 64-bit machines with as much RAM as your budget allows. > (3)R & clusters, R & multiple CPU machines, > R performance on different hardware. OpenMosix, Quantian for clusters; the archive for multiple CPUs (this was asked quite a few times). It may be best to measure R performance on different hardware by yourself, using your own data and code. > (4)finally, a list of the advantages for using R over > commercial statistical packages. I'd say it's not R vs. commercial packages, but S vs. the rest of the world. Check http://www.insightful.com/ , much of what they say is applicable to R. Make the case that S is vastly superior directly, not just through a list of reasons: take a few data sets and show how they can be analyzed with S compared to other choices. Both R and S-Plus are likely to significantly outperform most other software, depending on the kind of work that needs to be done. > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Lorenzo Isella > Sent: Thursday, April 05, 2007 11:02 AM > To: r-help@stat.math.ethz.ch > Subject: [R] Reasons to Use R > > Dear All, > The institute I work for is organizing an internal workshop for High > Performance Computing (HPC). > I am planning to attend it and talk a bit about fluid dynamics, but > there is also quite a lot of interest devoted to data post-processing > and management of huge data sets. > A lot of people are interested in image processing/pattern recognition > and statistic applied to geography/ecology, but I would like not to > post this on too many lists. > The final aim of the workshop is understanding hardware requirements > and drafting a list of the equipment we would like to buy. I think > this could be the venue to talk about R as well. 
> Therefore, even if it is not exactly a typical mailing list question, > I would like to have suggestions about where to collect info about: > (1)Institutions (not only academia) using R > (2)Hardware requirements, possibly benchmarks > (3)R & clusters, R & multiple CPU machines, R performance on > different hardware. > (4)finally, a list of the advantages for using R over commercial > statistical packages. The money-saving in itself is not a reason good > enough and some people are scared by the lack of professional support, > though this mailing list is simply wonderful. > > Kind Regards > > Lorenzo Isella > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading a csv file row by row
And _file()_ is helpful in such a situation. R/S-PLUS Fundamentals and Programming Techniques by Thomas Lumley has something relevant on page 185 (of 208 pages in total). I believe you can find it by googling.

On 4/6/07, Martin Becker <[EMAIL PROTECTED]> wrote:
> readLines (which is mentioned in the "See also" section of ?scan with
> the hint "to read a file a line at a time") should work.
>
> Regards,
> Martin
>
> Yuchen Luo schrieb:
> > Hi, my friends.
> > When a data file is large, loading the whole file into the memory all
> > together is not feasible. A feasible way is to read one row, process it,
> > store the result, and read the next row.
> >
> > In Fortran, by default, the 'read' command reads one line of a file, which
> > is convenient, and when the same 'read' command is executed the next time,
> > the next row of the same file will be read.
> >
> > I tried to replicate such row-by-row reading in R. I use scan( ) to do so
> > with the "skip= xxx " option. It takes only seconds when the number of the
> > rows is within 1000. However, it takes hours to read 1 rows. I think it
> > is because every time R reads, it needs to start from the first row of the
> > file and count xxx rows to find the row it needs to read. Therefore, it
> > takes more time for R to locate the row it needs to read.
> >
> > Is there a solution to this problem?
> >
> > Your help will be highly appreciated!

--
Ronggui Huang
Department of Sociology
Fudan University, Shanghai, China
Re: [R] plotting multilevel / lme lines
Rense Nieuwenhuis wrote:
> Dear expeRts,
>
> I am trying to plot a lme-object {package nlme) in such a way, that
> on a selected level the x-axis represents the value on a selected
> predictor and the y-axis represents the predicted-outcome variable.
> The graphs would than consist of several lines that each represent
> one group. I can't find such a plotting function.
>
> I could write such a function myself, based on ranef() and fixef(),
> but it would be a waste of time if such a function would already exist.
>
> Does any of you such a function?

I don't know of a single function with an lme object as argument, but for what I think you have in mind, here is how you might go about it:

library(nlme)
fm2 <- lme(distance ~ poly(age, 2) * Sex, data = Orthodont, random = ~ 1)
newdat <- expand.grid(age = 8:14, Sex = c("Male","Female"))
newdat$PREDDIST <- predict(fm2, newdat, level = 0)
library(lattice)
xyplot(PREDDIST ~ age, groups=Sex, ylab="Model Predicted Distance",
       data = newdat, xlab="Age",
       panel = function(x, y, ...){
         panel.grid(h=6,v=6)
         panel.superpose(x, y, type="l", ...)},
       main="Orthodont Growth Model",
       key = simpleKey(levels(newdat$Sex), lines=TRUE, points=FALSE)
)

> Regards,
>
> Rense Nieuwenhuis

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
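The code above plots population-level predictions (level = 0), one line per Sex. If one line per group is wanted, as the question seems to ask, a possible variation is to predict at the innermost grouping level instead. This is only a sketch: the simpler linear model, the explicit `random = ~ 1 | Subject` and the column name `fit_subject` are assumptions made for illustration, not part of the reply.

```r
library(nlme)
library(lattice)

# Plain data frame copy of the Orthodont example data
dat <- as.data.frame(Orthodont)

# Random intercept per subject
fm <- lme(distance ~ age * Sex, data = dat, random = ~ 1 | Subject)

# level = 1 gives predictions that include each subject's random effect
dat$fit_subject <- as.numeric(predict(fm, level = 1))

# One predicted line per subject, one panel per Sex
p <- xyplot(fit_subject ~ age | Sex, groups = Subject, data = dat,
            type = "l", xlab = "Age",
            ylab = "Subject-level predicted distance")
print(p)
```

With a random intercept only, the lines within a panel are parallel; adding a random slope (random = ~ age | Subject) would let them fan out.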
Re: [R] Computing the rank of a matrix.
On Apr 6, 2007, at 7:39 AM, José Luis Aznarte M. wrote: > Hi! Maybe this is a silly question, but I need the column rank > (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function > 'rank()' only gives me the ordering of the elements of my matrix. > How can I compute the column rank of a matrix? Is there not an R > equivalent to Matlab's 'rank()'? > I've been browsing for a time now and I can't find anything, so > any > help will be greatly appreciated. Best regards! > Surprisingly, google searching for "r matrix rank" actually returns a R link: http://tolstoy.newcastle.edu.au/R/help/05/05/4000.html I suppose the point is that in R you usually need a bit more than just the rank, so instead you want an object that contains all that info and more. Like we have the various lm objects, so to speak. They do the hard work once, and then we can ask them more particular questions. ?qr > -- -- > Jose Luis Aznarte M. http://decsai.ugr.es/~jlaznarte > Department of Computer Science and Artificial Intelligence > Universidad de Granada Tel. +34 - 958 - 24 04 67 > GRANADA (Spain) Fax: +34 - 958 - 24 00 79 Haris Skiadas Department of Mathematics and Computer Science Hanover College __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Computing the rank of a matrix.
On 4/6/07, "José Luis Aznarte M." <[EMAIL PROTECTED]> wrote: > Hi! Maybe this is a silly question, but I need the column rank > (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function > 'rank()' only gives me the ordering of the elements of my matrix. > How can I compute the column rank of a matrix? Is there not an R > equivalent to Matlab's 'rank()'? > I've been browsing for a time now and I can't find anything, so any > help will be greatly appreciated. Best regards! This discussion may help you: http://marc.info/?l=r-help&m=111522337531442&w=2 Paul __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Computing the rank of a matrix.
Hi! Maybe this is a silly question, but I need the column rank (http://en.wikipedia.org/wiki/Rank_matrix) of a matrix and R function 'rank()' only gives me the ordering of the elements of my matrix. How can I compute the column rank of a matrix? Is there not an R equivalent to Matlab's 'rank()'? I've been browsing for a time now and I can't find anything, so any help will be greatly appreciated. Best regards! -- -- Jose Luis Aznarte M. http://decsai.ugr.es/~jlaznarte Department of Computer Science and Artificial Intelligence Universidad de Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax: +34 - 958 - 24 00 79 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to set the scale of axis?
Try this:

plot(1:100, xaxt = "n")
axis(1, c(1, 20, 40, 60, 80, 100))
# and optionally add this third line
axis(1, 1:100, FALSE, tcl = -0.3)

See ?par (for the xaxt argument to plot) and ?axis. ?plot and ?plot.default have info on the plot command. A good source of sample code for graphics is: http://addictedtor.free.fr/graphiques/

On 4/6/07, Shao <[EMAIL PROTECTED]> wrote:
> Hello,everyone.
>
> I want to know how to control the scale of axises.
>
> For example, the range of x axis is (1,100),and I want to show the scale in
> the axis as this:
> 1 20 40 60 80 100.
>
> Is there any parameters in plot() or other functions to set the scale?
>
> Thands!
[R] plotting multilevel / lme lines
Dear expeRts,

I am trying to plot an lme object (package nlme) in such a way that, on a selected level, the x-axis represents the value of a selected predictor and the y-axis represents the predicted outcome variable. The graph would then consist of several lines, each representing one group.

I can't find such a plotting function. I could write one myself, based on ranef() and fixef(), but that would be a waste of time if such a function already exists. Does any of you know of such a function?

Regards,
Rense Nieuwenhuis
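A possible starting point, sketched under assumptions: predict() on an lme fit with level = 1 returns predictions that include the group-level random effects, so one line per group can be drawn with base graphics. The Orthodont data set and the model formula below are only stand-ins for the poster's data, not taken from the message.

```r
library(nlme)  # recommended package, normally installed with R

# Stand-in data and model: growth measurements nested within Subject.
d <- as.data.frame(Orthodont)
fit <- lme(distance ~ age, data = d, random = ~ age | Subject)

# level = 1: predictions including each Subject's random effects.
# level = 0 would give the fixed-effects (population) prediction instead.
d$pred <- as.numeric(predict(fit, level = 1))

# Empty plot frame, then one predicted line per group.
plot(distance ~ age, data = d, type = "n",
     xlab = "age (predictor)", ylab = "predicted distance")
for (g in split(d, d$Subject)) {
  g <- g[order(g$age), ]
  lines(g$age, g$pred, col = "grey40")
}

# Overlay the population line from the fixed effects.
abline(coef = fixef(fit), lwd = 2)
```

The same loop works for any grouping factor in the model; nlme also provides augPred(), whose plot method draws a lattice panel per group, which may already be close to what is wanted.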
[R] How to set the scale of axis?
Hello, everyone.

I want to know how to control the scale of the axes.

For example, the range of the x axis is (1,100), and I want to show the scale on the axis like this:
1 20 40 60 80 100.

Are there any parameters in plot() or other functions to set the scale?

Thanks!
Re: [R] how to exclude some packages from help.search() ?
On Fri, 6 Apr 2007, Vladimir Eremeev wrote:
>
> I have installed RGtk2 to satisfy other package requirements.
> I am not planning to use it in my own work.
>
> Occasionally I search through the R help using the help.search() function,
> and every time it returns me lots of references to the functions in the
> RGtk2 package, which I don't need.
> I would like to avoid them.
>
> At present, I have renamed the file hsearch.rds in the RGtk2 directory.
>
> This worked, however, help.search now gives a warning, that it didn't find
> that file.
>
> Is there any other way to avoid extraneous information, returned by
> help.search, which is not such crude as mine?

Use the package= argument to say which packages you want, or even install little-used packages in a different library and use lib.loc= (or only have that library in .libPaths() when you need it).

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
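As a small illustration of the package= suggestion above: the search pattern and the package names below are arbitrary examples, not taken from the thread.

```r
# Restrict the search to the packages named in 'package';
# anything else installed (for example RGtk2) is simply not searched.
res <- help.search("matrix", package = c("base", "stats"))

# The matches component records which package each hit came from.
print(unique(res$matches$Package))
```

The lib.loc= argument works along the same lines, but restricts the search by library directory rather than by package name.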
Re: [R] Reading a large csv file row by row
The solution is to read the 'R Data Import/Export Manual' and make use of connections or databases. What you want to do is very easy in RODBC, for example, but can be done with scan() easily provided you keep a connection open.

On Fri, 6 Apr 2007, Yuchen Luo wrote:
> Hi, my friends.
>
> When a data file is large, loading the whole file into the memory all
> together is not feasible. A feasible way is to read one row, process it,
> store the result, and read the next row.

It makes a lot more sense to process say 1000 rows at a time.

> In Fortran, by default, the 'read' command reads one line of a file, which
> is convenient, and when the same 'read' command is executed the next time,
> the next row of the same file will be read.
>
> I tried to replicate such row-by-row reading in R. I use scan( ) to do so
> with the "skip= xxx " option. It takes only seconds when the number of the
> rows is within 1000. However, it takes hours to read 1 rows. I think it
> is because every time R reads, it needs to start from the first row of the
> file and count xxx rows to find the row it needs to read. Therefore, it
> takes more time for R to locate the row it needs to read.

Yes, R does tend to do what you tell it to.

> Is there a solution to this problem?
>
> Your help will be highly appreciated!
> Best Wishes
> Yuchen Luo
>
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

PLEASE do as we ask.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
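The advice above (keep a connection open and process about 1000 rows at a time) might look roughly like the following. This is only a sketch: the temporary file, the column names id and value, and the running sum are invented for the example.

```r
# Build a small example file so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2500, value = (1:2500) / 10),
          tmp, row.names = FALSE)

con <- file(tmp, open = "r")   # an open connection remembers its position

# Read the header line once and split it into column names.
header <- readLines(con, n = 1)
cols <- scan(text = header, what = "", sep = ",", quiet = TRUE)

total <- 0
rows <- 0
repeat {
  chunk <- readLines(con, n = 1000)   # next block of up to 1000 lines
  if (length(chunk) == 0) break       # end of file reached
  dat <- read.csv(text = chunk, header = FALSE, col.names = cols)
  total <- total + sum(dat$value)     # "process" the block
  rows <- rows + nrow(dat)
}
close(con)

print(c(rows = rows, total = total))
```

Because the connection stays open, each readLines() call continues where the previous one stopped, so nothing is re-read from the top of the file as happens with skip=.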
[R] how to exclude some packages from help.search() ?
I have installed RGtk2 to satisfy other package requirements. I am not planning to use it in my own work.

Occasionally I search through the R help using the help.search() function, and every time it returns lots of references to functions in the RGtk2 package, which I don't need. I would like to avoid them.

At present, I have renamed the file hsearch.rds in the RGtk2 directory. This worked; however, help.search now gives a warning that it didn't find that file.

Is there any other way to avoid the extraneous information returned by help.search that is less crude than mine?
Re: [R] Reading a csv file row by row
readLines (which is mentioned in the "See also" section of ?scan with the hint "to read a file a line at a time") should work.

Regards,
Martin

Yuchen Luo wrote:
> Hi, my friends.
> When a data file is large, loading the whole file into the memory all
> together is not feasible. A feasible way is to read one row, process it,
> store the result, and read the next row.
>
> In Fortran, by default, the 'read' command reads one line of a file, which
> is convenient, and when the same 'read' command is executed the next time,
> the next row of the same file will be read.
>
> I tried to replicate such row-by-row reading in R. I use scan( ) to do so
> with the "skip= xxx " option. It takes only seconds when the number of the
> rows is within 1000. However, it takes hours to read 1 rows. I think it
> is because every time R reads, it needs to start from the first row of the
> file and count xxx rows to find the row it needs to read. Therefore, it
> takes more time for R to locate the row it needs to read.
>
> Is there a solution to this problem?
>
> Your help will be highly appreciated!
Re: [R] Reasons to Use R
Hi Lorenzo,

I don't think I'm qualified to provide solid information on the first three questions, but I'd like to drop a few thoughts on (4). While there is no shortage of language advocates out there, I'd like to join in just this once.

My background is in chemical engineering and atmospheric science; I've done simulation on a smaller scale but spend much of my time analyzing large sets of experimental data. I am comfortable programming in Matlab, R, Python, C, Fortran, and Igor Pro, and I also know a little IDL but have not programmed in it extensively. As you are probably aware, among these I would count Matlab, R, Python, and IDL as good candidates for processing large data sets, as they are high-level languages and can communicate with netCDF files (which I imagine will be used to transfer data).

Each language boasts an impressive array of libraries, but what I think gives R the advantage for analyzing data is the level of abstraction in the language. I am extremely impressed with the objects available to represent data sets, and the functions support them very well - I need to carry around fewer objects to hold information about my data (and I don't have to "unpack" them to feed them into functions). The language is also very "expressive" in that it lets you write a procedure in many different ways, some shorter, some more readable, depending on what your situation requires. System commands and text processing are integrated into the language, and the input/output facilities are excellent, in terms of data and graphics. Once I have my data object I am only a few keystrokes away from splitting, sorting, and visualizing multivariate data; even after several years I keep discovering new functions for basic things like manipulation of data objects, descriptive statistics, and plotting - truly, an analyst's needs have been well anticipated.
And this is a recent obsession of mine, which I was introduced to through Python, but the functional programming support for R is amazing. By using higher-order functions like lapply(), I infrequently rely on FOR-LOOPS, which have often caused me trouble in the past because I had forgotten to re-initialize a variable, or incremented the wrong variable, etc. Though I'm definitely not militant about functional programming, in general I try to write functions and then apply them to the data (if the functions don't exist in R already), often through higher-order functions such as lapply(). This approach keeps most variables out of the global namespace and so I am less likely to reassign a value to a variable that I had intended to keep. It also makes my code more modular so that I can re-use bits of my code as my analysis inevitably grows much larger than I had originally intended. Furthermore, my code in R ends up being much, much shorter than code I imagine writing in other languages to accomplish the same task; I believe this leads to fewer places for errors to occur, and the nature of the code is immediately comprehensible (though a series of nested functions can get pretty hard to read at times), not to mention it takes less effort to write. This also makes it easier to interact with the data, I think, because after making a plot I can set up for the next plot with only a few function calls instead of setting out to write a block of code with loops, etc. I have actually recommended R to colleagues who needed to analyze the information from large-scale air quality/ global climate simulations, and they are extremely pleased. I think the capability for statistics and graphics is well-established enough that I don't need to do a hard-sell on that so much, but R's language is something I get very excited about. I do appreciate all the contributors who have made this available. 
Best regards, ST --- Lorenzo Isella <[EMAIL PROTECTED]> wrote: > Dear All, > The institute I work for is organizing an internal workshop for High > Performance Computing (HPC). > I am planning to attend it and talk a bit about fluid dynamics, but > there is also quite a lot of interest devoted to data post-processing > and management of huge data sets. > A lot of people are interested in image processing/pattern recognition > and statistic applied to geography/ecology, but I would like not to > post this on too many lists. > The final aim of the workshop is understanding hardware requirements > and drafting a list of the equipment we would like to buy. I think > this could be the venue to talk about R as well. > Therefore, even if it is not exactly a typical mailing list question, > I would like to have suggestions about where to collect info about: > (1)Institutions (not only academia) using R > (2)Hardware requirements, possibly benchmarks > (3)R & clusters, R & multiple CPU machines, R performance on different > hardware. > (4)finally, a list of the advantages for using R over commercial > statistical packages. The money-saving in itself is not a reason good > enough and some peop
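The style described in the reply above (write a small function, then apply it over pieces of the data with lapply() instead of an explicit loop) can be illustrated roughly as follows. The built-in airquality data set and the helper name summarise_month are only assumptions chosen for the example.

```r
# Split the data into one data frame per month.
by_month <- split(airquality, airquality$Month)

# A small function that works on one piece; no loop counters or
# accumulator variables leak into the global workspace.
summarise_month <- function(d) {
  c(n = nrow(d), mean_ozone = mean(d$Ozone, na.rm = TRUE))
}

# Apply it to every piece and bind the results into one matrix.
res <- do.call(rbind, lapply(by_month, summarise_month))
print(res)
```

Changing the analysis then means changing only summarise_month(); the split/apply/combine skeleton stays the same.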
Re: [R] Reasons to Use R
To my knowledge, the core of R is considered "adequate" and "good" by statisticians. That's sufficient, isn't it? Last year I read some documentation about R in which most routines were considered "good", but some "very bad". That is a benchmark of sorts.

There must be some benchmarks of the kind you want. R is widely used, and there must be people around who can provide you with adequate material. CRAN is one route to that, or the project page. The core is free, by the way, and you can participate in its development. People there can provide you with the information you want. R is quite well documented (not everybody thinks it is well documented, but... you know... opinions do vary).

There is one simple reason to use R: it's free. If you have the money, commercial software is sufficient. That doesn't mean that R is the poor man's software. It works quite well, actually (but... you know... opinions vary, especially about statistical software). I think that's the usual reason to use it: it works quite well, and its documentation is widely available. A LOT of statistical procedures are available. R crashed about 2 times last year on my computer, which is better than SPSS, and there are a lot of user interfaces available which make working with R easier.

Personally I don't like SPSS, but I do know that the R core is used in commercial applications. So at least one person has done some benchmarks.

Wilfred
Re: [R] Reading a csv file row by row
Hi, my friends.

When a data file is large, loading the whole file into memory at once is not feasible. A feasible way is to read one row, process it, store the result, and read the next row.

In Fortran, by default, the 'read' command reads one line of a file, which is convenient, and when the same 'read' command is executed the next time, the next row of the same file will be read.

I tried to replicate such row-by-row reading in R. I use scan( ) to do so with the "skip= xxx " option. It takes only seconds when the number of rows is within 1000. However, it takes hours to read 1 rows. I think it is because every time R reads, it needs to start from the first row of the file and count xxx rows to find the row it needs to read. Therefore, it takes more time for R to locate the row it needs to read.

Is there a solution to this problem?

Your help will be highly appreciated!
[R] Reading a large csv file row by row
Hi, my friends.

When a data file is large, loading the whole file into memory at once is not feasible. A feasible way is to read one row, process it, store the result, and read the next row.

In Fortran, by default, the 'read' command reads one line of a file, which is convenient, and when the same 'read' command is executed the next time, the next row of the same file will be read.

I tried to replicate such row-by-row reading in R. I use scan( ) to do so with the "skip= xxx " option. It takes only seconds when the number of rows is within 1000. However, it takes hours to read 1 rows. I think it is because every time R reads, it needs to start from the first row of the file and count xxx rows to find the row it needs to read. Therefore, it takes more time for R to locate the row it needs to read.

Is there a solution to this problem?

Your help will be highly appreciated!

Best Wishes
Yuchen Luo
Re: [R] Generate a serie of new vars that correlate withexistingvar
Hello Greg (and List),

Thanks for your reply and reflections (and sorry for my "frenglish"). Of course you're right, and I agree "a posteriori" with all your views. Probably my suggestion was first of all a mark of appreciation for your solution ;-) .

Here is the path I followed to get where I was, but I see that I was probably misunderstanding what makes the "core" of R:

1) The question of making such related pairs of vectors is nearly a FAQ, as you point out in your reply.
2) It appeared to me that it is often asked by newbies or users with relatively little statistical knowledge.
3) To get to your solution, a good understanding is needed of what correlation is, as well as of matrix properties and operators. My guess was that the people listed above generally do not have it.
4) I believed from my own experience that the core of R was dedicated either to basics or to rather complicated algorithms to handle or produce results appearing as "simple" or "classical".
5) From the same experience, I was not able to imagine to which non-core package such a function should "obviously" be added. I imagined that, in the same manner, a person looking for the function could have some problems locating it. Until now I had not had a look at your TeachingDemos package (I'll do it), but I know of other categories of searchers, often not statisticians, who need to generate such data and would not think of looking there to find a way.

In the end, all this mainly shows that I did not understand the R philosophy as well as I thought!

Thanks, and regards.
Olivier

Greg Snow wrote:
> Olivier,
>
> I have thought of adding something like this to a package, but here is my
> current thinking on the issue.
>
> This question (or similar) has been asked a few times, so there is some
> demand for a general answer. I see three approaches:
>
> 1. Have an example of the necessary steps archived in a publicly available
> place.
> 2. Write a function and include it in a non-core package.
> 3. Add it to the core of R or a core package. > > Number 1 is already in process as the e-mails will be part of the archive. > Though someone is welcome to add it to the Wiki if they think that would be > useful as well. > > Your suggestion is number 3, but I would argue that 2 is better than 3 for > the simple reason that anything added to the core is implied to be top > quality and have pretty much any options that most people would think of. > Putting it in a non-core package makes it available, with less implications > of quality. > > The question then becomes, what options do we make available? Do we have > them specify the entire correlation structure? Or just assume the new > variables will be independent of each other? What should the function do if > the set of correlations result in a matrix that is not positive definite? > What if the user wants to have 2 fixed variables? And other questions. > > My current thinking is that the process is simple enough that it is easier to > do this by hand than to remember all the options to the function. There are > currently people who use bootstrap and permutation tests without loading in > the packages that do these because it is quicker to write the code by hand > than to remember the syntax of the functions. I think this type of data > generation falls under the same situation. But if you, or someone else > thinks that there is enough justification for a function to do this, and can > specify what options it should have, I will be happy to add it to my > TeachingDemos package (this seems an appropriate place, since one of the > places that I want to generate data with a specific correlation structure is > when creating an example for students). > > > Hope this helps, > > -- Olivier ETERRADOSSI Maître-Assistant CMGD / Equipe "Propriétés Psycho-Sensorielles des Matériaux" Ecole des Mines d'Alès Hélioparc, 2 av. P. 
Angot, F-64053 PAU CEDEX 9 tel std: +33 (0)5.59.30.54.25 tel direct: +33 (0)5.59.30.90.35 fax: +33 (0)5.59.30.63.68 http://www.ema.fr __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
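For readers who land on this thread without the earlier messages: the solution being discussed is not reproduced here, so the following is only one standard way to build a new variable with a chosen sample correlation to an existing one, and is not necessarily the method that was proposed. The names x, rho, and y are illustrative assumptions.

```r
set.seed(1)
x   <- rnorm(100)    # the existing (fixed) variable
rho <- 0.6           # desired sample correlation with x

# Start from random noise and keep only the part orthogonal to x.
z <- rnorm(100)
e <- residuals(lm(z ~ x))

# Standardise both pieces, then mix them so that cor(x, y) = rho.
xs <- (x - mean(x)) / sd(x)
es <- e / sd(e)
y  <- rho * xs + sqrt(1 - rho^2) * es

print(cor(x, y))
```

Because es is exactly uncorrelated with x in the sample, the correlation is hit exactly (up to floating-point error) rather than only on average. Repeating the last steps with fresh noise gives a series of new variables, each with the same correlation to x but not constrained in their correlations with each other.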
Re: [R] Reasons to Use R
John Kane wrote:
> --- Lorenzo Isella <[EMAIL PROTECTED]> wrote:
>
>> (4) finally, a list of the advantages for using R over commercial
>> statistical packages. The money-saving in itself is not a reason good
>> enough and some people are scared by the lack of professional support,
>> though this mailing list is simply wonderful.
>
> Given that I can do as much if not more with R (in most cases) than with
> commercial software, as an independent consultant, 'cost' is a very
> significant factor.
>
> A very major advantage of R is the money-saving. Have a look at
> http://www.spss.com/stores/1/Software_Full_Version_C2.cfm
>
> and convince me that cost (for an independent contractor) is not a good
> reason.

Hello,

No doubt that for an independent contractor money is a significant issue, but we are talking about the case of a large organization for which spending a few thousand euros on software is routine. To avoid misunderstandings: I am myself an R user and I have no intention of paying a cent for statistical software, but in order to speak up for R versus any commercial software for data analysis and postprocessing, I need technical details (benchmarks, etc.) rather than the fact that it helps save money.

Kind Regards
Lorenzo
Re: [R] Logistic/Cox regression: Parameter estimates directly from model matrix
On 4/6/07, Kaspar Rufibach <[EMAIL PROTECTED]> wrote:
> Hi out there
>
> Is there a way to get the estimated coefficients in a logistic / Cox
> regression without having to specify a 'formula' but by only giving the
> model matrix?

See 'coxreg.fit' in package 'eha'. Or 'glm.fit' for logistic regression.

hth, Göran

>
> Example for Cox regression:
>
> ## predictors
> n <- 50
> q1 <- rnorm(n)
> q2 <- rgamma(n, 2, 2)
> Z <- cbind(q1, q2)
>
> ## response
> ttf <- rexp(n)
> tf <- round(runif(n))
>
> ## compute estimates
> res <- coxph(Surv(ttf, tf) ~ q1 + q2)
> r <- res$coef
>
> My goal is to have a function
>
> estFromModelMatrix <- function(tf, ttf, Z){
>
>   # do something meaningful using built-in functions
>
>   return(r)}
>
> I have written such functions myself using LL - maximization from
> scratch, but these are slower than the built-in functions. Since I
> intend to do some simulations (where I specify the model matrix, but not
> want to give a 'formula' manually for each simulation scenario), it
> would be nice to have a function estFromModelMatrix().
>
> I searched the help extensively, but did not find a way to do this.
>
> Hope I was clear enough, any help is appreciated!
> Kaspar Rufibach
>
> --
> Kaspar Rufibach
> Department of Statistics -- Sequoia Hall
> 390 Serra Mall
> Stanford University
> Stanford, CA 94305-4065
>
> mailto:[EMAIL PROTECTED]
> skype:kasparrufibach
> http://www.stanford.edu/~kasparr

--
Göran Broström
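For the logistic case, the glm.fit suggestion could be used along these lines. This is a sketch with simulated data; the variable names and the true coefficients are made up, and note that glm.fit() does not add an intercept column by itself.

```r
set.seed(42)
n <- 50
Z <- cbind(q1 = rnorm(n), q2 = rgamma(n, 2, 2))   # model matrix without intercept
y <- rbinom(n, 1, plogis(0.5 * Z[, "q1"]))        # simulated binary response

# glm.fit() takes the design matrix directly, so the intercept
# column has to be added by hand.
X <- cbind(Intercept = 1, Z)
fit <- glm.fit(x = X, y = y, family = binomial())
print(fit$coefficients)

# Same model through the formula interface, for comparison.
fit_formula <- glm(y ~ Z, family = binomial())
```

For the Cox model, another option that avoids writing out a formula per scenario is to put the whole matrix on the right-hand side, as in coxph(Surv(ttf, tf) ~ Z), since model formulas accept matrix terms.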
Re: [R] creating a data frame from a list
Hi Dimitri,

You can try this one if you'd like:

  lst = list(a=c(A=1,B=8), b=c(A=2,B=3,C=0), c=c(B=2,D=0))

  # get unique names
  nms <- unique(rapply(lst, function(x) names(x)))

  # create a vector of NA's and then fill it according
  # to matching names for each element of list
  doit <- function(x, nms) {
    y <- rep(NA, length(nms)); names(y) <- nms
    y[match(names(x), names(y))] <- x
    return(y)
  }

  # apply it to the data
  dtf <- as.data.frame(sapply(lst, doit, nms))

--- Dimitri Szerman <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> A few months ago, I asked for your help on the following problem:
>
> I have a list with three (named) numeric vectors:
>
> > lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> > lst
> $a
> A B
> 1 8
>
> $b
> A B C
> 2 3 0
>
> $c
> B D
> 2 0
>
> Now, I'd love to use this list to create the following data frame:
>
> > dtf = data.frame(a=c(A=1,B=8,C=NA,D=NA),
> +                  b=c(A=2,B=3,C=0,D=NA),
> +                  c=c(A=NA,B=2,C=NA,D=0) )
>
> > dtf
>    a  b  c
> A  1  2 NA
> B  8  3  2
> C NA  0 NA
> D NA NA  0
>
> That is, I wish to "merge" the three vectors in the list into a data frame
> by their "(row)"names.
>
> And I got the following answer:
>
> library(zoo)
> z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
> rownames(z) <- time(z)
> coredata(z)
>
> However, it does not seem to be working. Here's what I get when I try it:
>
> > lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> > library(zoo)
> > z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
> Error in if (freq > 1 && identical(all.equal(freq, round(freq)),
> TRUE)) freq <- round(freq) :
>         missing value where TRUE/FALSE needed
> In addition: Warning message:
> NAs introduced by coercion
>
> and z was not created.
>
> Any ideas on what is going on here?
>
> Thank you,
> Dimitri