Re: [R] A question about dput
Try control = NULL:

    dput(list(x = rnorm(3), N = 3), control = NULL)
    list(x = c(-0.254393363810571, -0.650328028909466, -1.20888767858120), N = 3)

On 2/2/07, Tong Wang [EMAIL PROTECTED] wrote:

    Hi, I am trying to output an R data set for use in WinBUGS. I used

        dput(list(x = rnorm(100), N = 100), file = "bug.dat")

    but I can't get the intended format, list(x = c(...), N = 100). Instead,
    I got something like this (copied the first two lines):

        [0000] 737472756374757265286C6973742878   structure(list(x
        [0010] 203D2063282D302E3336333136313033   = c(-0.36316103

    Did I do something wrong here? Thanks a lot for any help.

    tong

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
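A minimal sketch of the suggested fix, writing to a temporary file rather than the poster's bug.dat (the control = NULL argument suppresses dput's attribute-preserving structure() output, leaving the plain list(...) form that WinBUGS expects):

```r
## dput() with control = NULL drops all structure()/attribute output,
## producing a bare list(x = c(...), N = ...) expression.
f <- tempfile(fileext = ".dat")     # stand-in for file = "bug.dat"
dat <- list(x = rnorm(3), N = 3)
dput(dat, file = f, control = NULL)

## The file now contains a single plain list(...) expression:
readLines(f)
```

The same call with file = "bug.dat" reproduces the approach from the reply above.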
Re: [R] Regression trees with an ordinal response variable
Den Fr, 2007-02-02, 06:03 skrev Stacey Buckelew:

    Hi, I am working on a regression tree in rpart that uses a continuous
    response variable that is ordered. I read a previous response by Prof.
    Ripley to an inquiry regarding the ability of rpart to handle ordinal
    responses in 2003. At that time rpart was unable to implement an
    algorithm to handle ordinal responses. Has there been any effort to
    rectify this in recent years?

The `ctree' function in the `party' package is able to handle ordered
responses, but note that there are fundamental differences between the
former and `rpart'. Reading the package vignette and the relevant
references will help.

However, at the moment there seems to be a problem related to the ordinal
case (predicted probabilities > 1) and I've CC:ed the package's maintainer
(Torsten Hothorn).

HTH,
Henric

- - - - -

Torsten, consider the following:

    ### ordinal regression
    mammoct <- ctree(ME ~ ., data = mammoexp)
    Warning message:
    no admissible split found

    ### estimated class probabilities
    treeresponse(mammoct, newdata = mammoexp[1:5, ])
    [[1]]
    [1] 1.822115

    [[2]]
    [1] 1.265487

    [[3]]
    [1] 1.822115

    [[4]]
    [1] 1.560440

    [[5]]
    [1] 1.822115

    sessionInfo()
    R version 2.4.1 Patched (2007-01-06 r40399)
    i386-pc-mingw32

    locale:
    LC_COLLATE=Swedish_Sweden.1252;LC_CTYPE=Swedish_Sweden.1252;LC_MONETARY=Swedish_Sweden.1252;LC_NUMERIC=C;LC_TIME=Swedish_Sweden.1252

    attached base packages:
    [1] stats4    grid      splines   stats     graphics  grDevices
    [7] utils     datasets  methods   base

    other attached packages:
          party         vcd  colorspace        MASS strucchange    sandwich
          0.9-8       1.0-2        0.95      7.2-31       1.3-1       2.0-1
            zoo        coin     mvtnorm  modeltools    survival
          1.2-2       0.5-2       0.7-5      0.2-10        2.30

    Thanks!
    Stacey

    On Mon, 2 Jun 2003, Andreas Christmann wrote:

        1. RE: Ordinal data - Regression Trees Proportional Odds (Liaw, Andy)

            AFAIK there's no implementation (or description) of a tree
            algorithm that handles ordinal response.

        Regression trees with an ordinal response variable can be computed
        with SPSS Answer Tree 3.0.
They *can* be handled by tree or rpart in R. I think Andy's point was that
there is no consensus as to the right way to handle them: certainly using
the codes of categories works and may often be reasonable, and treating
ordinal responses as categorical is also very often perfectly adequate.

Note that rpart is user-extensible, so it would be reasonably easy to
write an extension for a proportional-odds logistic regression model, if
that is thought appropriate (and it seems strange to me to impose such
strong structure on the model with such a general `linear predictor':
POLR models are often in my experience a poor reflection of real
problems).

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272860 (secr)
Oxford OX1 3TG, UK               Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
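A sketch of the two pragmatic options mentioned above, on simulated data (the variable names and the data-generating process are invented for illustration; rpart ships with R as a recommended package):

```r
library(rpart)

## Simulated data: an ordered three-level response driven by x.
set.seed(1)
x <- runif(200)
y <- cut(x + rnorm(200, sd = 0.2),
         breaks = c(-Inf, 0.33, 0.66, Inf),
         labels = c("low", "mid", "high"),
         ordered_result = TRUE)
d <- data.frame(x = x, y = y)

## Option 1: use the numeric codes of the ordered categories,
## i.e. fit an ordinary regression (anova) tree to the scores.
fit_num <- rpart(as.integer(y) ~ x, data = d)

## Option 2: ignore the ordering and fit a classification tree,
## treating the response as an unordered categorical variable.
fit_cls <- rpart(y ~ x, data = d, method = "class")
```

Both trees are legitimate under the "no consensus" caveat above; which is preferable depends on whether the category spacing is meaningful.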
Re: [R] Autocorrelated Binomial
Den To, 2007-02-01, 22:38 skrev Rick Bilonick:

    I need to generate autocorrelated binary data. I've found references to
    the IEKS package, but none of the web pages currently exist. Does
    anyone know where I can find this package, or suggest another package?

The `bindata' package can generate correlated binary data.

HTH,
Henric

    Rick B.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
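Not from the thread, but if `bindata` is unavailable, one standard base-R sketch generates autocorrelated binary data by thresholding a latent Gaussian AR(1) process (the parameter values here are arbitrary illustrations):

```r
## Autocorrelated binary series via a latent AR(1) process:
## z_t = phi * z_{t-1} + e_t, then y_t = 1 if z_t exceeds a threshold.
## The binary series inherits serial dependence from z.
set.seed(42)
n   <- 1000
phi <- 0.8                                   # latent autocorrelation
z   <- as.numeric(arima.sim(list(ar = phi), n = n))

p   <- 0.3                                   # target marginal P(y = 1)
y   <- as.integer(z > quantile(z, 1 - p))

mean(y)                        # close to p by construction
acf(y, plot = FALSE)$acf[2]    # positive lag-1 autocorrelation
```

The mapping from phi to the binary-scale autocorrelation is nonlinear, so phi usually has to be tuned if a specific binary correlation is required; `bindata` handles that calibration directly.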
Re: [R] Regression trees with an ordinal response variable
On Fri, 2 Feb 2007, Henric Nilsson (Public) wrote:

    Den Fr, 2007-02-02, 06:03 skrev Stacey Buckelew:

        Hi, I am working on a regression tree in rpart that uses a
        continuous response variable that is ordered. I read a previous
        response by Prof. Ripley to an inquiry regarding the ability of
        rpart to handle ordinal responses in 2003. At that time rpart was
        unable to implement an algorithm to handle ordinal responses. Has
        there been any effort to rectify this in recent years?

    The `ctree' function in the `party' package is able to handle ordered
    responses, but note that there are fundamental differences between the
    former and `rpart'. Reading the package vignette and the relevant
    references will help.

Hi Henric,

    However, at the moment there seems to be a problem related to the
    ordinal case (predicted probabilities > 1) and I've CC:ed the package's
    maintainer (Torsten Hothorn).

Yep, you are right (as always) -- a bug introduced by a fix, grrr. It's a
little bit more complicated, but I'll make correct predictions available
again *asap*.

Thanks!

Torsten

    HTH,
    Henric

    - - - - -

    Torsten, consider the following:

        ### ordinal regression
        mammoct <- ctree(ME ~ ., data = mammoexp)
        Warning message:
        no admissible split found

        ### estimated class probabilities
        treeresponse(mammoct, newdata = mammoexp[1:5, ])
        [[1]]
        [1] 1.822115

        [[2]]
        [1] 1.265487

        [[3]]
        [1] 1.822115

        [[4]]
        [1] 1.560440

        [[5]]
        [1] 1.822115

        sessionInfo()
        R version 2.4.1 Patched (2007-01-06 r40399)
        i386-pc-mingw32

        locale:
        LC_COLLATE=Swedish_Sweden.1252;LC_CTYPE=Swedish_Sweden.1252;LC_MONETARY=Swedish_Sweden.1252;LC_NUMERIC=C;LC_TIME=Swedish_Sweden.1252

        attached base packages:
        [1] stats4    grid      splines   stats     graphics  grDevices
        [7] utils     datasets  methods   base

        other attached packages:
              party         vcd  colorspace        MASS strucchange    sandwich
              0.9-8       1.0-2        0.95      7.2-31       1.3-1       2.0-1
                zoo        coin     mvtnorm  modeltools    survival
              1.2-2       0.5-2       0.7-5      0.2-10        2.30

    Thanks!
    Stacey

    On Mon, 2 Jun 2003, Andreas Christmann wrote:

        1. RE: Ordinal data - Regression Trees Proportional Odds (Liaw, Andy)

            AFAIK there's no implementation (or description) of a tree
            algorithm that handles ordinal response.

        Regression trees with an ordinal response variable can be computed
        with SPSS Answer Tree 3.0.

    They *can* be handled by tree or rpart in R. I think Andy's point was
    that there is no consensus as to the right way to handle them:
    certainly using the codes of categories works and may often be
    reasonable, and treating ordinal responses as categorical is also very
    often perfectly adequate.

    Note that rpart is user-extensible, so it would be reasonably easy to
    write an extension for a proportional-odds logistic regression model,
    if that is thought appropriate (and it seems strange to me to impose
    such strong structure on the model with such a general `linear
    predictor': POLR models are often in my experience a poor reflection of
    real problems).

    -- 
    Brian D. Ripley, [EMAIL PROTECTED]
    Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
    University of Oxford,            Tel: +44 1865 272861 (self)
    1 South Parks Road,                   +44 1865 272860 (secr)
    Oxford OX1 3TG, UK               Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Affymetrix data analysis
On Thu, 1 Feb 2007, Sicotte, Hugues Ph.D. wrote:

    Tristan, I have a soft spot for problems analyzing microarrays with R.
    For the memory issue, there have been previous posts to this list, but
    here is the answer I gave a few weeks ago. If you need more memory, you
    have to move to Linux or recompile R for Windows yourself... but you'll
    still need a computer with more memory. The long-term solution, which
    we are implementing, is to rewrite the normalization code so it doesn't
    need to load all those arrays at once.

    -- cut previous part of message --

    The default in R is to play nice and limit your allocation to half the
    available RAM. Make sure you have a lot of disk swap space (at least 1G
    with 2G of RAM) and you can set your memory limit to 2G for R.

That just isn't true (R uses as much of the RAM as is reasonable, all for
up to 1.5Gb installed). Please consult the rw-FAQ for the whole truth.

    See help(memory.size) and use the memory.limit function

[Please follow the advice you quote.]

    Hugues

    P.s. Someone let me use their 16Gig of RAM Linux box, and I was able to
    run 64-bit R with top showing 6Gigs of RAM allocated (with suitable
    --max-mem-size command line parameters at startup for R).

There is no such 'command' for R under Linux.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] features of save and save.image (unexpected file sizes)
On Thu, 1 Feb 2007, Vaidotas Zemlys wrote:

    Hi,

    On 2/1/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:

        On Thu, 1 Feb 2007, Vaidotas Zemlys wrote:

            Hi,

            On 1/31/07, Professor Brian Ripley [EMAIL PROTECTED] wrote:

                Two comments:

                1) ls() does not list all the objects: it has an all.names
                argument.

            Yes, I tried it with all.names, but the effect was the same; I
            forgot to mention it in my letter.

                2) save.image() does not just save the objects in the
                workspace, it also saves any environments they may have.
                Having a function with a large environment is the usual
                cause of a large saved image.

            I have little experience dealing with environments, so is
            there a quick way to discard the environments of the
            functions? When saving the session I really do not need them.

        Change, not discard. E.g. environment(f) <- .GlobalEnv. If
        environments are not mentioned by anything saved, they will not be
        saved.

    I found the culprit. I was parsing formulas in my code, and I saved
    them in that large object. So the environment came with the saved
    formulas. Is there a nice way to tell R: please do not save the
    environments with the formulas, I do not need them?

No, but why create them that way? You could do

    mmodel <- as.formula(mmodel, env = .GlobalEnv)

The R way is to create what you want, not fix it up afterwards. (I find
your code unreadable -- spaces help a great deal -- so am not sure if I
have understood it correctly.)

    This is what I was doing (I am discarding irrelevant code):

        testf <- function(formula) {
            mainform <- formula
            if (deparse(mainform[[3]][[1]]) != "|")
                pandterm("invalid conditioning for main regression")
            mmodel <- substitute(y ~ x,
                                 list(y = mainform[[2]], x = mainform[[3]][[2]]))
            mmodel <- as.formula(mmodel)
            list(formula = list(main = mmodel))
        }

    When called,

        bu <- testf(lnp ~ I(CE/12000) + hhs | Country)

    I get

        ls(env = environment(bu$formula$main))
        [1] "formula"  "mainform" "mmodel"

    or, in the actual case, a lot more objects, which I do not need but
    which take a lot of space.

    For the moment I solved the problem with environment(mmodel) <- NULL,
    but is this the correct R way?

    Vaidotas Zemlys
    --
    Doctorate student, http://www.mif.vu.lt/katedros/eka/katedra/zemlys.php
    Vilnius University

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
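A small self-contained illustration of why the environment matters for the saved size (the function and object names here are invented). A formula created inside a function captures that function's evaluation frame, including any large local objects; re-pointing the environment, as suggested above, drops the baggage:

```r
## A formula captures the environment it is created in -- including
## any large objects that happen to live there.
make_formula <- function() {
  big <- rnorm(1e5)   # large local object, captured by the formula's env
  y ~ x
}
fm <- make_formula()

## Serialising fm drags 'big' along:
size_with_env <- length(serialize(fm, NULL))

## Re-pointing the environment (cf. environment(f) <- .GlobalEnv above)
## leaves only the formula itself; the global environment is stored by
## reference, not by value:
fm2 <- fm
environment(fm2) <- globalenv()
size_without_env <- length(serialize(fm2, NULL))

size_with_env / size_without_env   # the difference should be dramatic
```

The same mechanism explains the large save.image() files in the thread: the formulas kept references to the parsing function's frame.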
Re: [R] R for bioinformatics
Marc Schwartz wrote:

    On Thu, 2007-02-01 at 21:32 +0100, Peter Dalgaard wrote:

        Marc Schwartz wrote:

            On Thu, 2007-02-01 at 10:45 -0800, Seth Falcon wrote:

                Benoit Ballester [EMAIL PROTECTED] writes:

                    Hi, I was wondering if someone could tell me more
                    about this book (if it's a good or a bad one). I can't
                    find it, as it seems that O'Reilly doesn't publish it
                    any more.

                I've never seen a copy so I can't comment on its quality
                (has anyone seen a copy?). You might want to take a look
                at _Bioinformatics and Computational Biology Solutions
                Using R and Bioconductor_.
                http://www.bioconductor.org/pub/docs/mogr/

            I'll stand (or sit) to be corrected on this as I cannot find
            the source, but I have a recollection from seeing something
            quite some time ago that the book may have never been
            published. It's been a while since the status was something
            along the lines that the authors may or may not complete it.

        Subject matter moving faster than pen, I suspect.

    Peter, that wording does seem familiar; I just cannot recall where I
    saw it. Perhaps on the O'Reilly web site, where it is no longer
    listed.

    For confirmation, I called O'Reilly's customer service in Cambridge,
    MA. They confirm that the book was indeed cancelled and never
    published. No reasons were given.

Thanks for those replies. I also contacted the O'Reilly offices in the
UK, and they told me the same thing: the book was never published. I just
wanted to compare R for Bioinformatics with Bioinformatics and
Computational Biology Solutions Using R and Bioconductor, and see which
one suits me better -- but I guess I don't have the choice now :-)

Ben
--
Benoit Ballester
Ensembl Team

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] time series analysis
Hello John,

as a starting point you might also want to have a look at:

    @book{BOOK,
      author    = {Robert S. Pindyck and Daniel L. Rubinfeld},
      title     = {Econometric Models and Economic Forecasts},
      year      = {1997},
      publisher = {McGraw-Hill/Irwin},
      isbn      = {0079132928}
    }

The monographs of Hamilton and Lütkepohl might then be taken into focus.

Best,
Bernhard

    John --

    Well, as a start, have a look at Modern Applied Statistics with S, by
    Venables and Ripley, both of which names you will recognize if you
    read this list often. There is a 30-page chapter on time series (with
    suggestions for other readings), obviously geared to S and R, that is
    a good jumping-off place.

    Ben Fairbank

    -----Original Message-----
    From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of lamack lamack
    Sent: Thursday, February 01, 2007 3:12 PM
    To: R-help@stat.math.ethz.ch
    Subject: [R] time series analysis

    Does anyone know a good introductory book or tutorial about time
    series analysis? (Time series for a beginner.) Thank you so much.

    John Lamak

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] indexing
Thanks a lot. The match() function does the task (of course this is a
simple example of my long datasets). Just for the record, the merge()
function does not seem to work properly for this (even with the
sort = FALSE argument), as the order of the results does not match the
corresponding order of classes in the x vector.

Wishes,
Javier

--- Tony Plate wrote:

    a <- data.frame(value = c(6.5, 7.5, 8.5, 12.0), class = c(1, 3, 5, 2))
    x <- c(1, 1, 2, 7, 6, 5, 4, 3, 2, 2, 2)
    match(x, a$class)
    [1]  1  1  4 NA NA  3 NA  2  4  4  4
    a[match(x, a$class), "value"]
    [1]  6.5  6.5 12.0   NA   NA  8.5   NA  7.5 12.0 12.0 12.0

    -- Tony Plate

    javier garcia-pintado wrote:

        Hello,
        In a nutshell, I've got a data.frame like this:

            assignation <- data.frame(value = c(6.5, 7.5, 8.5, 12.0),
                                      class = c(1, 3, 5, 2))
            assignation
              value class
            1   6.5     1
            2   7.5     3
            3   8.5     5
            4  12.0     2

        and a long vector of classes like this:

            x <- c(1, 1, 2, 7, 6, 5, 4, 3, 2, 2, 2, ...)

        And I would like to obtain a vector of length(x), with the
        corresponding values extracted from the assignation table, like
        this:

            x.value
            [1]  6.5  6.5 12.0   NA   NA  8.5   NA  7.5 12.0 12.0 12.0

        Could you help me with an elegant way to do this? (I can do it by
        looping over each class in the assignation table, which I think
        is not perfect in R's sense.)

        Wishes,
        Javier

        --
        Javier García-Pintado
        Institute of Earth Sciences Jaume Almera (CSIC)
        Lluis Sole Sabaris s/n, 08028 Barcelona
        Phone: +34 934095410
        Fax: +34 934110012
        e-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
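An equivalent one-liner, not from the thread: build a named lookup vector keyed by class and index it with x coerced to character. It gives the same result as the match() solution, including NA for unmatched classes:

```r
assignation <- data.frame(value = c(6.5, 7.5, 8.5, 12.0),
                          class = c(1, 3, 5, 2))
x <- c(1, 1, 2, 7, 6, 5, 4, 3, 2, 2, 2)

## Named vector: names are the classes, values are the looked-up values.
lookup  <- setNames(assignation$value, assignation$class)

## Character indexing returns NA for classes absent from the table:
x.value <- unname(lookup[as.character(x)])
x.value   # 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0
```

This scales well and reads clearly when the key column is discrete; match() remains the more general tool when keys are not valid names.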
[R] Problem installing R-2.4.1 on AIX 5.3
Dear all,

I have some problems installing R-2.4.1 on AIX 5.3.

Configure string:

    ./configure --with-readline=no LDFLAGS='-bshared' --with-jpeglib=no \
      --with-libpng=no --with-lapack=no --prefix=/cineca/prod/Bioinf/R-2.4.1

config.site:

    #! /bin/sh
    CC=xlc
    F77=xlf
    MAIN_LDFLAGS=-Wl,-brtl
    SHLIB_LDFLAGS=-Wl,-G
    CXX=xlc
    CXXFLAGS=' -g -O'
    SHLIB_LDFLAGS=-W1, -G
    MAKE=gmake

configure ends successfully, but the make fails:

    --- cut ---
    ... -DHAVE_CONFIG_H -I/usr/local/include -g -c init.c -o init.o
    xlc -qlanglvl=extc99 -Wl,-G -Wl,-G -Wl,-bexpall -Wl,-bnoentry -bshared
      -o grDevices.so chull.o devNull.o devPicTeX.o devPS.o devQuartz.o init.o -lm
    Error in solve.default(rgb) : lapack routines cannot be loaded
    In addition: Warning message:
    unable to load shared library '/cineca/prod/Bioinf/R-2.4.1/modules//lapack.so':
      rtld: 0712-001 Symbol _xldipow was referenced from module
            /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a runtime
            definition of the symbol was not found.
      rtld: 0712-001 Symbol _log was referenced from module
            /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a runtime
            definition of the symbol was not found.
      rtld: 0712-001 Symbol _sqrt was referenced from module
            /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a runtime
            definition of the symbol was not found.
      rtld: 0712-001 Symbol idamax was referenced from module
            /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a runtime
            definition of the symbol was not found.
      rtld: 0712-001 Symbol dger was referenced from module
            /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a runtime
            definition of the symbol was not found.
      rtld: 0712-001 Symbo
    Error: unable to load R code in package 'grDevices'
    Execution halted
    make: 1254-004 The error code from the last command is 1. Stop.
    make: 1254-004 The error code from the last command is 1. Stop.
    make: 1254-004 The error code from the last command is 1. Stop.
    make: 1254-004 The error code from the last command is 1. Stop.
    ---

Can you help me? Thank you in advance.

FF

---
Dr Francesco Falciano
CINECA (High Performance Systems)
via Magnanelli, 6/3
40033 Casalecchio di Reno (BO)-ITALY
tel: +39-051-6171724
fax: +39-051-6132198
e-mail: [EMAIL PROTECTED]

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Affymetrix data analysis
Of course, you would know best, so can you tell us if the help page I pull
up using help(Memory) is wrong? That help page says (2nd paragraph):

    (On Windows the --max-mem-size option sets the maximum memory
    allocation: it has a minimum allowed value of 16M. This is intended to
    catch attempts to allocate excessive amounts of memory which may cause
    other processes to run out of resources. The default is the smaller of
    the amount of physical RAM in the machine and 1024Mb. See also
    memory.limit.)

Hugues

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Friday, February 02, 2007 3:05 AM
To: Sicotte, Hugues Ph.D.
Cc: Tristan Coram; R-help@stat.math.ethz.ch
Subject: Re: [R] Affymetrix data analysis

On Thu, 1 Feb 2007, Sicotte, Hugues Ph.D. wrote:

    Tristan, I have a soft spot for problems analyzing microarrays with R.
    For the memory issue, there have been previous posts to this list, but
    here is the answer I gave a few weeks ago. If you need more memory,
    you have to move to Linux or recompile R for Windows yourself... but
    you'll still need a computer with more memory. The long-term solution,
    which we are implementing, is to rewrite the normalization code so it
    doesn't need to load all those arrays at once.

    -- cut previous part of message --

    The default in R is to play nice and limit your allocation to half the
    available RAM. Make sure you have a lot of disk swap space (at least
    1G with 2G of RAM) and you can set your memory limit to 2G for R.

That just isn't true (R uses as much of the RAM as is reasonable, all for
up to 1.5Gb installed). Please consult the rw-FAQ for the whole truth.

    See help(memory.size) and use the memory.limit function

[Please follow the advice you quote.]

    Hugues

    P.s. Someone let me use their 16Gig of RAM Linux box, and I was able
    to run 64-bit R with top showing 6Gigs of RAM allocated (with suitable
    --max-mem-size command line parameters at startup for R).

There is no such 'command' for R under Linux.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Affymetrix data analysis
On Fri, 2 Feb 2007, Sicotte, Hugues Ph.D. wrote:

    Of course, you would know best, so can you tell us if the help page I
    pull up using help(Memory) is wrong? That help page says (2nd
    paragraph):

        (On Windows the --max-mem-size option sets the maximum memory
        allocation: it has a minimum allowed value of 16M. This is
        intended to catch attempts to allocate excessive amounts of memory
        which may cause other processes to run out of resources. The
        default is the smaller of the amount of physical RAM in the
        machine and 1024Mb. See also memory.limit.)

It says nothing about 'half', does it? Depending on your version of R and
Windows, the default is 1Gb, 1.5Gb or 2.5Gb, and the rw-FAQ gives the
whole truth. The current version of that help page is different:

    https://svn.r-project.org/R/trunk/src/library/base/man/Memory.Rd

It looks like in 2.4.1 it had not been updated yet.

    Hugues

    -----Original Message-----
    From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
    Sent: Friday, February 02, 2007 3:05 AM
    To: Sicotte, Hugues Ph.D.
    Cc: Tristan Coram; R-help@stat.math.ethz.ch
    Subject: Re: [R] Affymetrix data analysis

    On Thu, 1 Feb 2007, Sicotte, Hugues Ph.D. wrote:

        Tristan, I have a soft spot for problems analyzing microarrays
        with R. For the memory issue, there have been previous posts to
        this list, but here is the answer I gave a few weeks ago. If you
        need more memory, you have to move to Linux or recompile R for
        Windows yourself... but you'll still need a computer with more
        memory. The long-term solution, which we are implementing, is to
        rewrite the normalization code so it doesn't need to load all
        those arrays at once.

        -- cut previous part of message --

        The default in R is to play nice and limit your allocation to
        half the available RAM. Make sure you have a lot of disk swap
        space (at least 1G with 2G of RAM) and you can set your memory
        limit to 2G for R.

    That just isn't true (R uses as much of the RAM as is reasonable, all
    for up to 1.5Gb installed). Please consult the rw-FAQ for the whole
    truth.

        See help(memory.size) and use the memory.limit function

    [Please follow the advice you quote.]

        Hugues

        P.s. Someone let me use their 16Gig of RAM Linux box, and I was
        able to run 64-bit R with top showing 6Gigs of RAM allocated
        (with suitable --max-mem-size command line parameters at startup
        for R).

    There is no such 'command' for R under Linux.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem installing R-2.4.1 on AIX 5.3
Please check the archives: these problems are known, and await help from
an AIX expert.

On Fri, 2 Feb 2007, Francesco Falciano wrote:

    Dear all,

    I have some problems installing R-2.4.1 on AIX 5.3.

    Configure string:

        ./configure --with-readline=no LDFLAGS='-bshared' --with-jpeglib=no \
          --with-libpng=no --with-lapack=no --prefix=/cineca/prod/Bioinf/R-2.4.1

    config.site:

        #! /bin/sh
        CC=xlc
        F77=xlf
        MAIN_LDFLAGS=-Wl,-brtl
        SHLIB_LDFLAGS=-Wl,-G
        CXX=xlc
        CXXFLAGS=' -g -O'
        SHLIB_LDFLAGS=-W1, -G
        MAKE=gmake

    configure ends successfully, but the make fails:

        --- cut ---
        ... -DHAVE_CONFIG_H -I/usr/local/include -g -c init.c -o init.o
        xlc -qlanglvl=extc99 -Wl,-G -Wl,-G -Wl,-bexpall -Wl,-bnoentry -bshared
          -o grDevices.so chull.o devNull.o devPicTeX.o devPS.o devQuartz.o init.o -lm
        Error in solve.default(rgb) : lapack routines cannot be loaded
        In addition: Warning message:
        unable to load shared library '/cineca/prod/Bioinf/R-2.4.1/modules//lapack.so':
          rtld: 0712-001 Symbol _xldipow was referenced from module
                /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a
                runtime definition of the symbol was not found.
          rtld: 0712-001 Symbol _log was referenced from module
                /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a
                runtime definition of the symbol was not found.
          rtld: 0712-001 Symbol _sqrt was referenced from module
                /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a
                runtime definition of the symbol was not found.
          rtld: 0712-001 Symbol idamax was referenced from module
                /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a
                runtime definition of the symbol was not found.
          rtld: 0712-001 Symbol dger was referenced from module
                /cineca/prod/Bioinf/R-2.4.1/lib/libRlapack.so(), but a
                runtime definition of the symbol was not found.
          rtld: 0712-001 Symbo
        Error: unable to load R code in package 'grDevices'
        Execution halted
        make: 1254-004 The error code from the last command is 1. Stop.
        make: 1254-004 The error code from the last command is 1. Stop.
        make: 1254-004 The error code from the last command is 1. Stop.
        make: 1254-004 The error code from the last command is 1. Stop.
        ---

    Can you help me? Thank you in advance.

    FF

    ---
    Dr Francesco Falciano
    CINECA (High Performance Systems)
    via Magnanelli, 6/3
    40033 Casalecchio di Reno (BO)-ITALY
    tel: +39-051-6171724
    fax: +39-051-6132198
    e-mail: [EMAIL PROTECTED]

    [[alternative HTML version deleted]]

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] features of save and save.image (unexpected file sizes)
Hi,

On 2/2/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:

        I found the culprit. I was parsing formulas in my code, and I
        saved them in that large object. So the environment came with the
        saved formulas. Is there a nice way to tell R: please do not save
        the environments with the formulas, I do not need them?

    No, but why create them that way? You could do

        mmodel <- as.formula(mmodel, env = .GlobalEnv)

Hm, but say I have some large object in .GlobalEnv, and I generate mmodel
10 different times and save the result as a list of length 10. Now if I
try to save this list, will R save 10 different copies of .GlobalEnv
together with the aforementioned large object?

    The R way is to create what you want, not fix it up afterwards. (I
    find your code unreadable -- spaces help a great deal -- so am not
    sure if I have understood it correctly.)

Hm, I copied this code directly from Emacs+ESS; maybe the mailer mangled
something. What I want to do with this piece of code (I will repaste it
here)

    testf <- function(formula) {
        mainform <- formula
        if (deparse(mainform[[3]][[1]]) != "|")
            stop("invalid conditioning")
        mmodel <- substitute(y ~ x,
                             list(y = mainform[[2]], x = mainform[[3]][[2]]))
        mmodel <- as.formula(mmodel)
        list(formula = list(main = mmodel))
    }

is to read a formula with a condition, formula(y ~ x | z), and construct
the formula formula(y ~ x). I looked for examples in the code of coplot
in package graphics and latticeParseFormula in package lattice.

Vaidotas Zemlys
--
Doctorate student, http://www.mif.vu.lt/katedros/eka/katedra/zemlys.php
Vilnius University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
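One way to build the y ~ x sub-formula so that it never captures the function's frame in the first place (a sketch, not from the thread; the function name split_formula is invented): evaluating a `~` call in a chosen environment creates the formula there directly.

```r
## Build y ~ x from y ~ x | z, giving the result a clean environment
## so that save() does not drag local objects along.
split_formula <- function(formula) {
  if (deparse(formula[[3]][[1]]) != "|")
    stop("invalid conditioning")
  ## eval() of a `~` call creates the formula in the supplied environment,
  ## so no local objects of split_formula() are captured:
  eval(call("~", formula[[2]], formula[[3]][[2]]), envir = globalenv())
}

f <- split_formula(lnp ~ I(CE/12000) + hhs | Country)
f               # lnp ~ I(CE/12000) + hhs
environment(f)  # <environment: R_GlobalEnv>
```

This follows the "create what you want" advice above: the formula's environment is the global environment from the start, so neither mainform nor mmodel ends up in the saved image.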
[R] Assigning labels to a list created with apply
I have a simple data base and I want to produce tables for each variable. I wrote a simple function fn1 <- function(x) {table(x)} where x is a matrix or data.frame, and used apply to produce a list of tables. Example below. How do I apply the colnames from the matrix, or the names from the data.frame, to label the tables in the resulting list? I know that I can do this individually, but there should be a way to do something like an apply, and I am missing it.

cata <- c(1, 1, 6, 1, 1, NA)
catb <- c(1, 2, 3, 4, 5, 6)
doga <- c(3, 5, 3, 6, 4, 0)
dogb <- c(2, 4, 6, 8, 10, 12)
rata <- c(NA, 9, 9, 8, 9, 8)
ratb <- c(1, 2, 3, 4, 5, 6)
bata <- c(12, 42, NA, 45, 32, 54)
batb <- c(13, 15, 17, 19, 21, 23)
id <- c('a', 'b', 'b', 'c', 'a', 'b')
site <- c(1, 1, 4, 4, 1, 4)
mat1 <- cbind(cata, catb, doga, dogb, rata, ratb, bata, batb)
fn1 <- function(x) {table(x)}
jj <- apply(mat1, 1, fn1); jj
## Slow way to label a list ###
label(jj[[1]]) <- "cata"
label(jj[[2]]) <- "catb"
# and so on ...
[R] Double labels and scales
Dear All, Say you want to plot, on the same figure, two quantities, concentration and temperature, both as functions of the same variable. I'd like to be able to put a certain label and scale on the y axis on the left of the figure (referring to the temperature) and another label and scale for the concentration on the right. Any suggestion about what to do? I am sure it is trivial, but I could not find what I needed on the net. I found some reference to a plot.double function by Anne York, but I do not think that I need anything so elaborate. Many thanks Lorenzo
Re: [R] features of save and save.image (unexpected file sizes)
On Fri, 2 Feb 2007, Vaidotas Zemlys wrote: Hi, On 2/2/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: I found the culprit. I was parsing formulas in my code, and I saved them in that large object. So the environment came with saved formulas. Is there a nice way to say R: please do not save the environments with the formulas, I do not need them? No, but why create them that way? You could do mmodel <- as.formula(mmodel, env = .GlobalEnv) Hm, but say I have some large object in .GlobalEnv, and I generate mmodel 10 different times and save the result as a list of length 10. Now if I try to save this list, will R save 10 different copies of .GlobalEnv together with the aforementioned large object? No, it saves the environment (here .GlobalEnv), not objects, and there can be many shared references. The R way is to create what you want, not fix it up afterwards. (I find your code unreadable; spaces help a great deal, so I am not sure if I have understood it correctly.) Hm, I copied this code directly from Emacs+ESS, maybe the mailer mangled something. What I want to do with this piece of code (I will repaste it here)

testf <- function(formula) {
  mainform <- formula
  if (deparse(mainform[[3]][[1]]) != "|") stop("invalid conditioning")
  mmodel <- substitute(y ~ x, list(y = mainform[[2]], x = mainform[[3]][[2]]))
  mmodel <- as.formula(mmodel)
  list(formula = list(main = mmodel))
}

You use no spaces around your operators or after commas. R does when deparsing:

testf
function (formula)
{
    mainform <- formula
    if (deparse(mainform[[3]][[1]]) != "|")
        stop("invalid conditioning")
    mmodel <- substitute(y ~ x, list(y = mainform[[2]],
        x = mainform[[3]][[2]]))
    mmodel <- as.formula(mmodel)
    list(formula = list(main = mmodel))
}

because it is (at least to old hands) much easier to read. IcanreadEnglishtextwithoutanyspacesbutIchoosenotto.Similarly,Rcode.Occasional spacesare evenharderto parse.
is to read a formula with a condition, formula(y ~ x | z), and construct the formula formula(y ~ x). I looked for examples in the code of coplot in package graphics and latticeParseFormula in package lattice. Vaidotas Zemlys -- Doctorate student, http://www.mif.vu.lt/katedros/eka/katedra/zemlys.php Vilnius University -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
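[The suggestion in this thread of attaching .GlobalEnv to a formula can be illustrated with a minimal sketch; the objects here are invented for illustration, not from the thread:]

```r
## A formula keeps a reference to the environment where it was created,
## and save() will serialize that environment along with the formula.
make_formula <- function() {
  big <- rnorm(1e6)   # a large local object that would be dragged along
  y ~ x
}
f <- make_formula()
environment(f)                    # the function's (large) evaluation environment
g <- as.formula(f, env = .GlobalEnv)  # re-attach .GlobalEnv instead
environment(g)                    # <environment: R_GlobalEnv>
```

As Ripley notes, .GlobalEnv itself is saved as a reference (shared across the 10 formulas), so replacing each formula's environment this way avoids saving the function's local workspace ten times over.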
Re: [R] Affymetrix data analysis
I stand corrected, the rule was not 1/2. (I have 2 Gigs.. So the rule was for my office's PC's). Still, R doesn't always use all available memory on Windows, and one may be able to set the options in order to get more. Hugues Ps. By the way Prof. Ripley, Thanks for all your efforts for R. -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Friday, February 02, 2007 5:37 AM To: Sicotte, Hugues Ph.D. Cc: Tristan Coram; R-help@stat.math.ethz.ch Subject: RE: [R] Affymetrix data analysis On Fri, 2 Feb 2007, Sicotte, Hugues Ph.D. wrote: Of course, you would know best, so can you tell us if the help pages I pull using help(Memory) is wrong? That help page says (2nd paragraph) (On Windows the --max-mem-size option sets the maximum memory allocation: it has a minimum allowed value of 16M. This is intended to catch attempts to allocate excessive amounts of memory which may cause other processes to run out of resources. The default is the smaller of the amount of physical RAM in the machine and 1024Mb. See also memory.limit.) It says nothing about 'half' does it? Depending on your version of R and Windows, the default is 1Gb, 1.5Gb or 2.5Gb, and the rw-FAQ gives the whole truth. The current version of that help page is different: https://svn.r-project.org/R/trunk/src/library/base/man/Memory.Rd it looks like in 2.4.1 it had not been updated yet. Hugues -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Friday, February 02, 2007 3:05 AM To: Sicotte, Hugues Ph.D. Cc: Tristan Coram; R-help@stat.math.ethz.ch Subject: Re: [R] Affymetrix data analysis On Thu, 1 Feb 2007, Sicotte, Hugues Ph.D. wrote: Tristan, I have a soft spot for problems analyzing microarrays with R.. for the memory issue, there have been previous posts to this list.. But here is the answer I gave a few weeks ago. If you need more memory, you have to move to linux or recompile R for windows yourself.. .. But you'll still need a computer with more memory. 
The long term solution, which we are implementing, is to rewrite the normalization code so it doesn't need to load all those arrays at once. -- cut previous part of message -- The default in R is to play nice and limit your allocation to half the available RAM. Make sure you have a lot of disk swap space (at least 1G with 2G of RAM) and you can set your memory limit to 2G for R. That just isn't true (R uses as much of the RAM as is reasonable, all for up to 1.5Gb installed). Please consult the rw-FAQ for the whole truth. See help(memory.size) and use the memory.limit function. [Please follow the advice you quote.] Hugues P.s. Someone let me use their 16 Gig of RAM Linux box and I was able to run 64-bit R with top showing 6 Gigs of RAM allocated (with suitable --max-mem-size command line parameters at startup for R). There is no such 'command' for R under Linux. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Assigning labels to a list created with apply
On Fri, 2 Feb 2007, John Kane wrote: I have a simple data base and I want to produce tables for each variable. I wrote a simple function fn1 <- function(x) {table(x)} where x is a matrix or data.frame, and used apply to produce a list of tables. Example below. How do I apply the colnames from the matrix, or the names from the data.frame, to label the tables in the resulting list? I know that I can do this individually, but there should be a way to do something like an apply, and I am missing it.

cata <- c(1, 1, 6, 1, 1, NA)
catb <- c(1, 2, 3, 4, 5, 6)
doga <- c(3, 5, 3, 6, 4, 0)
dogb <- c(2, 4, 6, 8, 10, 12)
rata <- c(NA, 9, 9, 8, 9, 8)
ratb <- c(1, 2, 3, 4, 5, 6)
bata <- c(12, 42, NA, 45, 32, 54)
batb <- c(13, 15, 17, 19, 21, 23)
id <- c('a', 'b', 'b', 'c', 'a', 'b')
site <- c(1, 1, 4, 4, 1, 4)
mat1 <- cbind(cata, catb, doga, dogb, rata, ratb, bata, batb)
fn1 <- function(x) {table(x)}
jj <- apply(mat1, 1, fn1); jj
## Slow way to label a list ###
label(jj[[1]]) <- "cata"
label(jj[[2]]) <- "catb"

That does not work in vanilla R. There is no function label<- (or label). Have you a package attached you did not tell us about? (E.g. are you using the Hmisc system, not the R system?) You have applied fn1 to the rows and not the columns, and jj <- apply(mat1, 2, fn1) would give you the column labels as they do make sense. The way to add names to a list is names(). -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
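[Putting the advice above together in one small sketch; a cut-down version of the poster's data, kept minimal for illustration:]

```r
## Tabulate each column; apply over margin 2 (columns), not margin 1 (rows).
mat1 <- cbind(cata = c(1, 1, 6, 1, 1, NA), catb = c(1, 2, 3, 4, 5, 6))
jj <- apply(mat1, 2, table)   # a list here, since the tables differ in length
## names() is the way to label a list; colnames(mat1) supplies the labels
names(jj) <- colnames(mat1)
names(jj)                     # "cata" "catb"
```

In many cases apply() already carries the column names through as list names, so the explicit names() assignment is only needed when they have been lost.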
Re: [R] Assigning labels to a list created with apply
--- Prof Brian Ripley [EMAIL PROTECTED] wrote: On Fri, 2 Feb 2007, John Kane wrote: I have a simple data base and I want to produce tables for each variable. I wrote a simple function fn1 <- function(x) {table(x)} where x is a matrix or data.frame, and used apply to produce a list of tables. Example below. How do I apply the colnames from the matrix, or the names from the data.frame, to label the tables in the resulting list? I know that I can do this individually, but there should be a way to do something like an apply, and I am missing it.

cata <- c(1, 1, 6, 1, 1, NA)
catb <- c(1, 2, 3, 4, 5, 6)
doga <- c(3, 5, 3, 6, 4, 0)
dogb <- c(2, 4, 6, 8, 10, 12)
rata <- c(NA, 9, 9, 8, 9, 8)
ratb <- c(1, 2, 3, 4, 5, 6)
bata <- c(12, 42, NA, 45, 32, 54)
batb <- c(13, 15, 17, 19, 21, 23)
id <- c('a', 'b', 'b', 'c', 'a', 'b')
site <- c(1, 1, 4, 4, 1, 4)
mat1 <- cbind(cata, catb, doga, dogb, rata, ratb, bata, batb)
fn1 <- function(x) {table(x)}
jj <- apply(mat1, 1, fn1); jj
## Slow way to label a list ###
label(jj[[1]]) <- "cata"
label(jj[[2]]) <- "catb"

That does not work in vanilla R. There is no function label<- (or label). Have you a package attached you did not tell us about? (E.g. are you using the Hmisc system, not the R system?) Yes I am. I usually load it automatically and forgot that it is not part of the base package. My apologies. You have applied fn1 to the rows and not the columns, and jj <- apply(mat1, 2, fn1) would give you the column labels as they do make sense. The way to add names to a list is names(). Now that makes me feel stupid. I knew I was overlooking something blindingly obvious. Thanks very much.
Re: [R] Double labels and scales
On 2/2/2007 8:07 AM, Lorenzo Isella wrote: Dear All, Say you want to plot, on the same figure, two quantities, concentration and temperature, both as functions of the same variable. I'd like to be able to put a certain label and scale on the y axis on the left of the figure (referring to the temperature) and another label and scale for the concentration on the right. Any suggestion about what to do? I am sure it is trivial, but I could not find what I needed on the net. I found some reference to a plot.double function by Anne York, but I do not think that I need anything so elaborate. Many thanks One way is to work out the linear transformation that puts the other one in the scale of the one you want on the left axis, then plot both and manually apply an axis to the right side. For example:

x <- 1:10
temp <- rnorm(10, mean = 5 - 20*x)
conc <- rnorm(10, mean = x)
b <- diff(range(temp))/diff(range(conc))
a <- min(temp) - b*min(conc)
par(mar = c(5, 5, 5, 5))
plot(x, temp, ylim = range(c(temp, a + b*conc)), type = 'l')
lines(x, a + b*conc, lty = 2)
ticks <- pretty(conc)
axis(4, at = a + b*ticks, labels = ticks)
mtext("conc", 4, 3)
legend("top", c("temp", "conc"), lty = 1:2)
Re: [R] Wiki for Graphics tips for MacOS X
On Wed, 31-Jan-2007 at 12:11PM -0500, Gabor Grothendieck wrote: | To get the best results you need to transfer it using vector | graphics rather than bitmapped graphics: | | http://www.stc-saz.org/resources/0203_graphics.pdf | | There are a number of variations described here (see | entire thread). It's for UNIX and Windows but I think | it would likely work similarly on Mac and Windows: | | http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32297.html I found that interesting, particularly this part: For example, on Linux do this:

dev.control(displaylist = "enable")  # enable display list
plot(1:10)
myplot <- recordPlot()               # load display list into variable
save(myplot, file = "myplot", ascii = TRUE)

Send the ascii file, myplot, to the Windows machine and on Windows do this:

dev.control(displaylist = "enable")  # enable display list
load("myplot")
myplot                               # displays the plot
savePlot("myplot", type = "wmf")     # saves current plot as wmf

I tried that, but I was never able to load the myplot file in the Windows R. I always got a message about a syntax error to do with ' ' but I was unable to work out what the problem was. I thought it was because the transfer to Windows wasn't binary, but that wasn't the problem. I was unable to get the thread view at that archive to function, so I was unable to see if there were any follow-ups which offered an explanation. R has changed quite a bit in the years since then, so it might be that something needs to be done differently with more recent versions. Has anyone done this recently? -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_Middle minds discuss events (:_~*~_:)Small minds discuss people (_)-(_) . Anon ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
[R] 2 axes in an coordinate system
Hello sir: I want to get this kind of plot: x: location on chromosome; y_left: p value; y_right: logratio. In other words, on the left side of the plot, y is the p value; on the right side of the plot, y is the logratio. How can I do it via R commands? Thanks a lot! My best!
Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
On Thu, 1 Feb 2007, Ravi Varadhan wrote: Jeff, Here is something which is a little faster:

sum1 <- sum(outer(x, x, FUN = pmax))
sum3 <- sum(outer(x, y, FUN = pmax))

This is the sort of problem where profiling can be useful. My experience with pmax() is that it is surprisingly slow, presumably because it handles recycling and NAs. In the example I profiled (an MCMC calculation) it was measurably faster to use

function(x, y) { i <- x < y; x[i] <- y[i]; x }

-thomas Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeffrey Racine Sent: Thursday, February 01, 2007 1:18 PM To: r-help@stat.math.ethz.ch Subject: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors) Greetings. For R gurus this may be a no brainer, but I could not find pointers to efficient computation of this beast in past help files. Background - I wish to implement a Cramer-von Mises type test statistic which involves double sums of max(X_i,Y_j) where X and Y are vectors of differing length. I am currently using ifelse pointwise in a vector, but have a nagging suspicion that there is a more efficient way to do this. Basically, I require three sums: sum1: \sum_i\sum_j max(X_i,X_j) sum2: \sum_i\sum_j max(Y_i,Y_j) sum3: \sum_i\sum_j max(X_i,Y_j) Here is my current implementation - any pointers to more efficient computation greatly appreciated.

nx <- length(x)
ny <- length(y)
sum1 <- 0
sum3 <- 0
for (i in 1:nx) {
  sum1 <- sum1 + sum(ifelse(x[i] > x, x[i], x))
  sum3 <- sum3 + sum(ifelse(x[i] > y, x[i], y))
}
sum2 <- 0
sum4 <- sum3  # symmetric and identical
for (i in 1:ny) {
  sum2 <- sum2 + sum(ifelse(y[i] > y, y[i], y))
}

Thanks in advance for your help. -- Jeff -- Professor J. S.
Racine Phone: (905) 525 9140 x 23825 Department of Economics FAX: (905) 521-8232 McMaster University e-mail: [EMAIL PROTECTED] 1280 Main St. W., Hamilton, URL: http://www.economics.mcmaster.ca/racine/ Ontario, Canada. L8S 4M4 `The generation of random numbers is too important to be left to chance' Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
Re: [R] Can this loop be delooped?
Consider

na = 43; nb = 5; x = 1:na
ns = rep(na %/% nb, nb) + (1:nb <= na %% nb)
split(x, rep(1:nb, ns))

Heikki Kaskelma On Fri, 2 Feb 2007, jim holtman [EMAIL PROTECTED] wrote: This might do what you want:

# test data
x <- 1:43
nb <- 5  # number of subsets
# create vector of lengths of subsets
ns <- rep(length(x) %/% nb, nb)
# see if we have to adjust counts of initial subsets
if ((.offset <- length(x) %% nb) != 0) ns[1:.offset] <- ns[1:.offset] + 1
# create the subsets
split(x, rep(1:nb, ns))
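[The idea in both answers is the same: build a vector of group sizes, giving the first (length %% nb) groups one extra element, then split() on a matching grouping vector. A quick check on a smaller vector, added here for illustration:]

```r
## Split 1:10 into 3 nearly equal consecutive groups.
nb <- 3
x <- 1:10
ns <- rep(length(x) %/% nb, nb) + (1:nb <= length(x) %% nb)
ns                       # 4 3 3 -- the first group absorbs the remainder
split(x, rep(1:nb, ns))  # groups 1:4, 5:7, 8:10
```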
Re: [R] Double labels and scales
On Feb 2, 2007, at 8:07 AM, Lorenzo Isella wrote: Dear All, Say you want to plot, on the same figure, two quantities, concentration and temperature, both as functions of the same variable. I'd like to be able to put a certain label and scale on the y axis on the left of the figure (referring to the temperature) and another label and scale for the concentration on the right. Any suggestion about what to do? I am sure it is trivial, but I could not find what I needed on the net. I found some reference to a plot.double function by Anne York, but I do not think that I need anything so elaborate. Many thanks Sounds like you just need the following two commands probably: ?axis ?mtext Lorenzo Haris
[R] New RODBC: a problem with sqlSave
Till yesterday I was more than satisfied by RODBC, specifically by sqlSave, which I had been happily using in a daily crontab R batch job regularly updating a postgresql db for as long as one year. In this R batch I was able to save records into indexed tables *** even though *** they already existed: in this case sqlSave simply neglected those rows of the input table that were offending the constraints of existing rows of a table, only appending the *** very new *** rows. Yesterday I updated the R packages in my unix box, including RODBC, and running the R batch the following error popped up (I only added the verbose option to sqlSave!)

sqlSave(canale, tabella1, tablename = "tminmax", rownames = FALSE,
        varTypes = tipicampitminmax, append = TRUE, fast = TRUE, verbose = T)
Query: INSERT INTO tminmax ( data, cod_wmo, tipo, t ) VALUES ( ?,?,?,? )
Binding: data: DataType 9
Binding: cod_wmo: DataType 1
Binding: tipo: DataType 1
Binding: t: DataType 2
Parameters: no: 1: data 2007-01-26/***/no: 2: cod_wmo 16045/***/no: 3: tipo MIN/***/no: 4: t -2.5/***/
sqlwrite returned [RODBC] Failed exec in Update 01000 -1 [unixODBC] Error while executing the query (non-fatal); ERROR: duplicate key violates unique constraint data_cod_tipo
Query: DROP TABLE tminmax
Error in sqlSave(canale, tabella1, tablename = "tminmax", rownames = FALSE, : unable to append to table 'tminmax'

=== It seems to say that the violation caused a non-fatal error, yet it stopped the batch job without going on! What should I do? Ciao Vittorio PS Eliminating the offending records from the original postgresql table, of course, the above mentioned sqlSave command works like a charm!
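[One possible workaround, an untested sketch rather than an answer from the thread: append rows one at a time inside tryCatch(), so that a duplicate-key failure on one row does not abort the whole sqlSave() call. Object names (canale, tabella1, tminmax) follow the poster's example.]

```r
library(RODBC)
## canale: an open odbcConnect() channel; tabella1: the data frame to append
for (i in seq_len(nrow(tabella1))) {
  tryCatch(
    sqlSave(canale, tabella1[i, , drop = FALSE], tablename = "tminmax",
            rownames = FALSE, append = TRUE, fast = TRUE),
    error = function(e) NULL  # skip rows that violate the unique constraint
  )
}
```

Row-at-a-time inserts are much slower than a bulk append, so this is only reasonable for modest daily updates; filtering the duplicates out of tabella1 in R before a single sqlSave() call would be faster.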
[R] dynamic loading error with Open Watcom object file
Hello. I am trying to use a FORTRAN subroutine from within R (Windows version). This fortran subroutine is compiled using the Open Watcom Fortran compiler and the compiled object file is called ritscale.obj. Following the explanation on pages 193-194 of The New S Language I use the dyn.load command:

dyn.load("f:/maxent/ritscale.obj")
Error in dyn.load(x, as.logical(local), as.logical(now)) :
  unable to load shared library 'f:/maxent/ritscale.obj':
  LoadLibrary failure: %1 n'est pas une application Win32 valide.

The error message says: LoadLibrary failure: %1 is not a valid Win32 application. I do not know what this means. Can someone help? Bill Shipley
[R] Dealing with Duplicates - How to count instances?
Hi there, given a data.frame 'data' I managed to filter out entries (rows) that are identical with respect to one column like so:

duplicity <- duplicated(data[column])
data_unique <- subset(data, duplicity != TRUE)

But I'm trying to extract how many duplicates each of the remaining rows had. Can someone please send me down the right path for this? Joh
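[No answer to this one appears in the digest; one way to get the counts, sketched on invented data with a single key column named key:]

```r
## table() gives the number of occurrences of each key value;
## look those counts up for the de-duplicated rows.
data <- data.frame(key = c("a", "a", "b", "c", "c", "c"), val = 1:6)
counts <- table(data$key)
data_unique <- data[!duplicated(data$key), ]
data_unique$n <- as.vector(counts[as.character(data_unique$key)])
data_unique$n   # 2 1 3 -- total occurrences of each key, duplicates included
```

Note that n counts all occurrences; subtract 1 if "number of duplicates" should exclude the row that was kept.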
Re: [R] features of save and save.image (unexpected file sizes)
Hi, On 2/2/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Fri, 2 Feb 2007, Vaidotas Zemlys wrote: Hm, I copied this code directly from Emacs+ESS, maybe the mailer mangled something. What I want to do with this piece of code (I will repaste it here)

testf <- function(formula) {
  mainform <- formula
  if (deparse(mainform[[3]][[1]]) != "|") stop("invalid conditioning")
  mmodel <- substitute(y ~ x, list(y = mainform[[2]], x = mainform[[3]][[2]]))
  mmodel <- as.formula(mmodel)
  list(formula = list(main = mmodel))
}

You use no spaces around your operators or after commas. R does when deparsing:

testf
function (formula)
{
    mainform <- formula
    if (deparse(mainform[[3]][[1]]) != "|")
        stop("invalid conditioning")
    mmodel <- substitute(y ~ x, list(y = mainform[[2]],
        x = mainform[[3]][[2]]))
    mmodel <- as.formula(mmodel)
    list(formula = list(main = mmodel))
}

because it is (at least to old hands) much easier to read. IcanreadEnglishtextwithoutanyspacesbutIchoosenotto.Similarly,Rcode.Occasional spacesare evenharderto parse. Sorry for that, it is an old bad habit of mine. I'll try to get rid of it. Does anybody know if ESS can do this automatically, besides automatic indenting? Vaidotas Zemlys -- Doctorate student, http://www.mif.vu.lt/katedros/eka/katedra/zemlys.php Vilnius University
[R] CGIwithR question
I use svm in R to do some prediction. Now I want to get input from the web, do the prediction using svm, then output the prediction results to the web. I know CGIwithR could do this, but I don't know the details. Could someone give me some advice on this? thanks, Aimin Yan
[R] Problem with party and ordered factors
Hi All, I've got a problem using the ctree function of the party package. I've searched around a lot but couldn't manage it. I've got an ordered factor as response. As far as I know I have to use scores to be able to use this ordered factor. But if I do so I get a tree which predicts all observations as the first level of my ordered factor. In order to test if I did anything wrong I tried the example of Torsten Hothorn, Kurt Hornik and Achim Zeileis described in the documentation "party: A Laboratory for Recursive Part(y)titioning". There I got the same problem. I execute the following code:

data(mammoexp, package = "party")
mtree <- ctree(ME ~ ., data = mammoexp,
               scores = list(ME = 1:3, SYMPT = 1:4, DECT = 1:3))
plot(mtree)

Aside from getting one warning, everything's OK. But when I execute summary(predict(mtree)) the result is:

Never Within a Year Over a Year
  412             0           0

So now I'm stuck. Am I doing anything wrong? I'm using R 2.4.1 and all packages are up to date. Thanks in advance for your help. Sincerely yours Christoph
Re: [R] dynamic loading error with Open Watcom object file
On Fri, 2 Feb 2007, Bill Shipley wrote: Hello. I am trying to use a FORTRAN subroutine from within R (Windows version). This fortran subroutine is compiled using the Open Watcom Fortran compiler and the compiled object file is called ritscale.obj. Following the explanation on pages 193-194 of The New S Language I use the dyn.load command:

dyn.load("f:/maxent/ritscale.obj")
Error in dyn.load(x, as.logical(local), as.logical(now)) :
  unable to load shared library 'f:/maxent/ritscale.obj':
  LoadLibrary failure: %1 n'est pas une application Win32 valide.

The error message says: LoadLibrary failure: %1 is not a valid Win32 application. I do not know what this means. Can someone help? Yes. Unlike versions of S from the 1980s and 1990s, dyn.load() in R loads a DLL and not a compiled object. As the help page says:

dyn.load(x, local = TRUE, now = TRUE)
x: a character string giving the pathname to a shared library or DLL.

'Shared library' is a Unix name for what Windows calls a DLL. So, you need to make a DLL from ritscale.obj. I used to know how to do that under Watcom, but (S Programming p.245) 'it is fraught with difficulties'. It would be much easier to use the recommended compiler (MinGW's g77). -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
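[With the recommended MinGW toolchain, building a loadable DLL is a single command; a sketch, assuming the Fortran source is ritscale.f and R's bin directory is on the PATH:]

```shell
R CMD SHLIB ritscale.f
# produces ritscale.dll on Windows, which can then be loaded with
#   dyn.load("f:/maxent/ritscale.dll")
```

The routine is then called via .Fortran("ritscale", ...) with the appropriate arguments.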
Re: [R] Can this loop be delooped?
Jim and Heikki, Thank you so much for your inventive solutions, much better and more efficient than my approach! As a footnote to my original post, it turns out that using lapply rather than sapply consistently returns a list, and not an array (which sapply gives when the number of points is an exact multiple of the number of groups). -- TMK -- 212-460-5430 home 917-656-5351 cell From: Kaskelma, Heikki [EMAIL PROTECTED] To: jim holtman [EMAIL PROTECTED], Talbot Katz [EMAIL PROTECTED] CC: r-help@stat.math.ethz.ch Subject: RE: [R] Can this loop be delooped? Date: Fri, 2 Feb 2007 16:06:51 +0200 Consider

na = 43; nb = 5; x = 1:na
ns = rep(na %/% nb, nb) + (1:nb <= na %% nb)
split(x, rep(1:nb, ns))

Heikki Kaskelma On Fri, 2 Feb 2007, jim holtman [EMAIL PROTECTED] wrote: This might do what you want:

# test data
x <- 1:43
nb <- 5  # number of subsets
# create vector of lengths of subsets
ns <- rep(length(x) %/% nb, nb)
# see if we have to adjust counts of initial subsets
if ((.offset <- length(x) %% nb) != 0) ns[1:.offset] <- ns[1:.offset] + 1
# create the subsets
split(x, rep(1:nb, ns))
Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
Thomas, you are absolutely correct. Here are some results comparing Jeff's original implementation, my suggestion using outer and pmax, and your clever trick using fast.pmax. Your fast.pmax function is more than 3 times faster than pmax. Thanks for this wonderful insight. Best, Ravi. x <- rnorm(2000) y <- runif(500) nx <- length(x) ny <- length(y) sum1 <- 0 sum3 <- 0 # Here is the straightforward way system.time( + for (i in 1:nx) { + sum1 <- sum1 + sum(ifelse(x[i] > x, x[i], x)) + sum3 <- sum3 + sum(ifelse(x[i] > y, x[i], y)) + } + ) [1] 3.83 0.00 3.83 NA NA # Here is a faster way using outer and pmax system.time({ + sum11 <- sum(outer(x, x, FUN = pmax)) + sum33 <- sum(outer(x, y, FUN = pmax)) + }) [1] 2.55 0.48 3.04 NA NA # Here is an even faster method using Tom Lumley's suggestion: fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x } system.time({ + sum111 <- sum(outer(x, x, FUN = fast.pmax)) + sum333 <- sum(outer(x, y, FUN = fast.pmax)) + }) [1] 0.78 0.08 0.86 NA NA all.equal(sum1, sum11, sum111) [1] TRUE all.equal(sum3, sum33, sum333) [1] TRUE --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -----Original Message----- From: Thomas Lumley [mailto:[EMAIL PROTECTED] Sent: Friday, February 02, 2007 9:06 AM To: Ravi Varadhan Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Subject: Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors) On Thu, 1 Feb 2007, Ravi Varadhan wrote: Jeff, Here is something which is a little faster: sum1 <- sum(outer(x, x, FUN = pmax)) sum3 <- sum(outer(x, y, FUN = pmax)) This is the sort of problem where profiling can be useful.
My experience with pmax() is that it is surprisingly slow, presumably because it handles recycling and NAs. In the example I profiled (an MCMC calculation) it was measurably faster to use function(x, y) { i <- x < y; x[i] <- y[i]; x } -thomas Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeffrey Racine Sent: Thursday, February 01, 2007 1:18 PM To: r-help@stat.math.ethz.ch Subject: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors) Greetings. For R gurus this may be a no brainer, but I could not find pointers to efficient computation of this beast in past help files. Background - I wish to implement a Cramer-von Mises type test statistic which involves double sums of max(X_i, Y_j) where X and Y are vectors of differing length. I am currently using ifelse pointwise in a vector, but have a nagging suspicion that there is a more efficient way to do this. Basically, I require three sums: sum1: \sum_i \sum_j max(X_i, X_j) sum2: \sum_i \sum_j max(Y_i, Y_j) sum3: \sum_i \sum_j max(X_i, Y_j) Here is my current implementation - any pointers to more efficient computation greatly appreciated. nx <- length(x) ny <- length(y) sum1 <- 0 sum3 <- 0 for (i in 1:nx) { sum1 <- sum1 + sum(ifelse(x[i] > x, x[i], x)) sum3 <- sum3 + sum(ifelse(x[i] > y, x[i], y)) } sum2 <- 0 sum4 <- sum3 # symmetric and identical for (i in 1:ny) { sum2 <- sum2 + sum(ifelse(y[i] > y, y[i], y)) } Thanks in advance for your help. -- Jeff -- Professor J. S. Racine Phone: (905) 525 9140 x 23825 Department of Economics FAX: (905) 521-8232 McMaster University e-mail: [EMAIL PROTECTED] 1280 Main St. W., Hamilton, URL: http://www.economics.mcmaster.ca/racine/ Ontario, Canada.
L8S 4M4 `The generation of random numbers is too important to be left to chance' __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Thomas
[R] R syntaxe
Hi all, Suppose I have a vector x with numerical values. In y, I have a categorical variable: y <- c(1,1,...,2,2,...,30,30,30) x and y have the same length. I would like to compute the mean of x for each modality 1 to 30 of y: mean(x[y==1]), ..., mean(x[y==30]) I do not want to use an iterative procedure such as for (i in 1:30) ... Thanks for your help, Regards. Olivier. -- - Martin Olivier INRA - Unité Biostatistique Processus Spatiaux Domaine St Paul, Site Agroparc 84914 Avignon Cedex 9, France Tel : 04 32 72 21 57 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] A question about dput
Have you looked at the BRugs and R2WinBUGS packages? They have functions for automatically converting R lists and other objects into WinBUGS files (you can even run WinBUGS from within R). Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tong Wang Sent: Friday, February 02, 2007 12:38 AM To: R help Subject: [R] A question about dput Hi, I am trying to output an R data set for use in WinBUGS. I used dput(list(x=rnorm(100), N=100), file="bug.dat") But I can't get the intended format: list(x=c(...), N=100); instead, I got something like this (copied the first two lines): [0000] 737472756374757265286C6973742878 structure(list(x [0010] 203D2063282D302E3336333136313033 = c(-0.36316103 Did I do something wrong here? Thanks a lot for any help tong __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
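As a footnote, the control=NULL fix suggested earlier in this thread also works when writing to a file: dput() emits the structure(...) form by default, and passing control = NULL suppresses the attribute wrapping so the output is the plain WinBUGS-style list. A minimal sketch (the file name is just a placeholder):

```r
# Write a plain list(x = c(...), N = ...) representation to a file.
# control = NULL drops the default "structure(...)" wrapping that dput()
# would otherwise produce.
dat <- list(x = rnorm(100), N = 100)
dput(dat, file = "bug.dat", control = NULL)
```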
Re: [R] R syntaxe
Hi, On 2/2/07, Martin Olivier [EMAIL PROTECTED] wrote: Hi all, Suppose I have a vector x with numerical values. In y, I have a categorical variable: y <- c(1,1,...,2,2,...,30,30,30) x and y have the same length. I would like to compute the mean of x for each modality 1 to 30 of y: mean(x[y==1]), ..., mean(x[y==30]) I hope I understand your question correctly, but is this what you are looking for? x <- runif(1000) y <- sample(x=1:30, size=length(x), replace=TRUE) one.possibility <- tapply(X=x, INDEX=list(my.categ.var=y), FUN=mean) another.possibility <- by(data=x, INDICES=list(my.categ.var=y), FUN=mean) one.possibility another.possibility Which one you prefer depends probably on your taste and what you intend to do with the results. HTH, Roland __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] reading very large files
Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is: trs <- scan("myfile", what=character(), sep="\n") trs <- trs[sample(length(trs), 3000)] And this works OK; however my computer seems not able to handle the 1.8 G file. I thought of an alternative way that does not require reading the whole file: sel <- sample(1:900000, 3000) for (i in 1:3000) { un <- scan("myfile", what=character(), sep="\n", skip=sel[i], nlines=1) write(un, "myfile_short", append=TRUE) } This works on my computer; however it is extremely slow; it reads one line each time. It has been running for 25 hours and I think it has done less than half of the file (Yes, probably I do not have a very good computer and I'm working under Windows ...). So my question is: do you know any other faster way to do this? Thanks in advance Juli -- http://www.ceam.es/pausas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Make error for R-devel package under linux
Hi, I got an error while compiling the R-devel (R-2.5.0) package under Redhat Linux, but the R-patch package (R-2.4.1) was compiled perfectly fine. The following is what I got from the R-devel package downloaded on Jan. 30th, 2007: --- ./configure R is now configured for x86_64-unknown-linux-gnu Source directory: . Installation directory: /usr/local C compiler: gcc -std=gnu99 -g -O2 Fortran 77 compiler: g77 -g -O2 C++ compiler: g++ -g -O2 Fortran 90/95 compiler: gfortran -g -O2 Obj-C compiler: Interfaces supported: X11 External libraries: readline Additional capabilities: PNG, JPEG, iconv, MBCS, NLS Options enabled: shared BLAS, R profiling, Java Recommended packages: yes configure: WARNING: you cannot build DVI versions of the R manuals configure: WARNING: you cannot build PDF versions of the R manuals make . mkdir -p -- ../../../../library/tools/libs make[5]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library/tools/src' make[4]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library/tools/src' Error in loadNamespace(name) : there is no package called 'tools' In addition: Warning messages: 1: Line starting 'Package: tools ...' is malformed! 2: Line starting 'Version: 2.5.0 ...' is malformed! 3: Line starting 'Priority: base ...' is malformed! 4: Line starting 'Title: Tools for Pac ...' is malformed! 5: Line starting 'Author: Kurt Hornik ...' is malformed! 6: Line starting 'Maintainer: R Core T ...' is malformed! 7: Line starting 'Description: Tools f ...' is malformed! 8: Line starting 'License: GPL Version ...' is malformed! 9: Line starting 'Built: R 2.5.0; x86_ ...' is malformed!
Execution halted make[3]: *** [all] Error 1 make[3]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library/tools' make[2]: *** [R] Error 1 make[2]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library' make[1]: *** [R] Error 1 make[1]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src' make: *** [R] Error 1 --- I got the same error after I went into the directory 'R-devel/src/library/tools' and ran 'make' again. Any suggestions? Thanks, Xinxia __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading very large files
Hi. General idea: 1. Open your file as a connection, i.e. con <- file(pathname, open="r") 2. Generate a row to (file offset, row length) map of your text file, i.e. numeric vectors 'fileOffsets' and 'rowLengths'. Use readBin() for this. You build this up as you go by reading the file in chunks, meaning you can handle files of any size. You can store this lookup map to file for your future R sessions. 3. Sample a set of rows r = (r1, r2, ..., rR), i.e. rows = sample(length(fileOffsets)). 4. Look up the file offsets and row lengths for these rows, i.e. offsets = fileOffsets[rows]; lengths = rowLengths[rows]. 5. In case your subset of rows is not ordered, it is wise to order them first to speed things up. If order is important, keep track of the ordering and re-order them at the end. 6. For each row r, use seek(con=con, where=offsets[r]) to jump to the start of the row. Use readBin(..., n=lengths[r]) to read the data. 7. Repeat from (3). /Henrik On 2/2/07, juli g. pausas [EMAIL PROTECTED] wrote: Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is: trs <- scan("myfile", what=character(), sep="\n") trs <- trs[sample(length(trs), 3000)] And this works OK; however my computer seems not able to handle the 1.8 G file.
I thought of an alternative way that does not require reading the whole file: sel <- sample(1:900000, 3000) for (i in 1:3000) { un <- scan("myfile", what=character(), sep="\n", skip=sel[i], nlines=1) write(un, "myfile_short", append=TRUE) } This works on my computer; however it is extremely slow; it reads one line each time. It has been running for 25 hours and I think it has done less than half of the file (Yes, probably I do not have a very good computer and I'm working under Windows ...). So my question is: do you know any other faster way to do this? Thanks in advance Juli -- http://www.ceam.es/pausas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
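A rough, untested sketch of steps (1)-(6) above, assuming single-byte characters and "\n" line endings ("myfile.txt", the chunk size, and the sample size are placeholders; note a later reply cautions that seek() is unreliable on Windows):

```r
con <- file("myfile.txt", open = "r")
offsets <- numeric(0)   # file offset of the start of each row
pos <- 0
repeat {
  chunk <- readLines(con, n = 10000)
  if (length(chunk) == 0) break
  # +1 per line for the "\n" terminator (single-byte encoding assumed)
  starts <- pos + cumsum(c(0, nchar(chunk) + 1))
  offsets <- c(offsets, starts[seq_along(chunk)])
  pos <- starts[length(chunk) + 1]
}
# Sample rows, sorted so the reads move forward through the file
rows <- sort(sample(length(offsets), 3000))
sel <- character(length(rows))
for (i in seq_along(rows)) {
  seek(con, where = offsets[rows[i]])
  sel[i] <- readLines(con, n = 1)
}
close(con)
```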
Re: [R] reading very large files
Forgot to say: in your script you're reading the rows unordered, meaning you're jumping around in the file, and there is no way the hardware or the file caching system can optimize that. I'm pretty sure you would see a substantial speedup if you did: sel <- sort(sel) /H On 2/2/07, Henrik Bengtsson [EMAIL PROTECTED] wrote: Hi. General idea: 1. Open your file as a connection, i.e. con <- file(pathname, open="r") 2. Generate a row to (file offset, row length) map of your text file, i.e. numeric vectors 'fileOffsets' and 'rowLengths'. Use readBin() for this. You build this up as you go by reading the file in chunks, meaning you can handle files of any size. You can store this lookup map to file for your future R sessions. 3. Sample a set of rows r = (r1, r2, ..., rR), i.e. rows = sample(length(fileOffsets)). 4. Look up the file offsets and row lengths for these rows, i.e. offsets = fileOffsets[rows]; lengths = rowLengths[rows]. 5. In case your subset of rows is not ordered, it is wise to order them first to speed things up. If order is important, keep track of the ordering and re-order them at the end. 6. For each row r, use seek(con=con, where=offsets[r]) to jump to the start of the row. Use readBin(..., n=lengths[r]) to read the data. 7. Repeat from (3). /Henrik On 2/2/07, juli g. pausas [EMAIL PROTECTED] wrote: Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is: trs <- scan("myfile", what=character(), sep="\n") trs <- trs[sample(length(trs), 3000)] And this works OK; however my computer seems not able to handle the 1.8 G file.
I thought of an alternative way that does not require reading the whole file: sel <- sample(1:900000, 3000) for (i in 1:3000) { un <- scan("myfile", what=character(), sep="\n", skip=sel[i], nlines=1) write(un, "myfile_short", append=TRUE) } This works on my computer; however it is extremely slow; it reads one line each time. It has been running for 25 hours and I think it has done less than half of the file (Yes, probably I do not have a very good computer and I'm working under Windows ...). So my question is: do you know any other faster way to do this? Thanks in advance Juli -- http://www.ceam.es/pausas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading very large files
On Fri, 2007-02-02 at 18:40 +0100, juli g. pausas wrote: Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is: trs <- scan("myfile", what=character(), sep="\n") trs <- trs[sample(length(trs), 3000)] And this works OK; however my computer seems not able to handle the 1.8 G file. I thought of an alternative way that does not require reading the whole file: sel <- sample(1:900000, 3000) for (i in 1:3000) { un <- scan("myfile", what=character(), sep="\n", skip=sel[i], nlines=1) write(un, "myfile_short", append=TRUE) } This works on my computer; however it is extremely slow; it reads one line each time. It has been running for 25 hours and I think it has done less than half of the file (Yes, probably I do not have a very good computer and I'm working under Windows ...). So my question is: do you know any other faster way to do this? Thanks in advance Juli Juli, I don't have a file to test this on, so caveat emptor. The problem with the approach above is that you are re-reading the source file, once per line, or 3000 times. In addition, each read is likely going through half the file on average to locate the randomly selected line. Thus, the reality is that you are probably reading on the order of: 3000 * 450000 [1] 1.35e+09 lines in the file, which of course is going to be quite slow. In addition, you are also writing to the target file 3000 times. The basic premise of the approach below is that you are in effect creating a sequential file cache in an R object: reading large chunks of the source file into the cache, then randomly selecting rows within the cache, and then writing out the selected rows. Thus, if you can read 100,000 rows at once, you would have 9 reads of the source file, and 9 writes of the target file.
The key thing here is to ensure that the offsets within the cache and the corresponding random row values are properly set. Here's the code: # Generate the random values sel <- sample(1:900000, 3000) # Set up a sequence for the cache chunks # Presume you can read 100,000 rows at once Cuts <- seq(0, 900000, 100000) # Loop over the length of Cuts, less 1 for (i in seq(along = Cuts[-1])) { # Get a 100,000 row chunk, skipping rows # as appropriate for each subsequent chunk Chunk <- scan("myfile", what = character(), sep = "\n", skip = Cuts[i], nlines = 100000) # set up a row sequence for the current # chunk Rows <- (Cuts[i] + 1):(Cuts[i + 1]) # Are any of the random values in the # current chunk? Chunk.Sel <- sel[which(sel %in% Rows)] # If so, get them if (length(Chunk.Sel) > 0) { Write.Rows <- Chunk[sel - Cuts[i]] # Now write them out write(Write.Rows, "myfile_short", append = TRUE) } } As noted, I have not tested this, so there may yet be additional ways to save time with file seeks, etc. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading very large files
I had a file with 200,000 lines in it and it took 1 second to select 3000 sample lines out of it. One of the things is to use a connection so that the file stays open and then just 'skip' to the next record to read: input <- file("/tempxx.txt", "r") sel <- 3000 remaining <- 200000 # get the record numbers to select recs <- sort(sample(1:remaining, sel)) # compute number to skip on each read; account for the record just read skip <- diff(c(1, recs)) - 1 # allocate my data mysel <- vector('character', sel) system.time({ + for (i in 1:sel){ + mysel[i] <- scan(input, what="", sep="\n", skip=skip[i], n=1, quiet=TRUE) + } + }) [1] 0.97 0.02 1.00 NA NA On 2/2/07, juli g. pausas [EMAIL PROTECTED] wrote: Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is: trs <- scan("myfile", what=character(), sep="\n") trs <- trs[sample(length(trs), 3000)] And this works OK; however my computer seems not able to handle the 1.8 G file. I thought of an alternative way that does not require reading the whole file: sel <- sample(1:900000, 3000) for (i in 1:3000) { un <- scan("myfile", what=character(), sep="\n", skip=sel[i], nlines=1) write(un, "myfile_short", append=TRUE) } This works on my computer; however it is extremely slow; it reads one line each time. It has been running for 25 hours and I think it has done less than half of the file (Yes, probably I do not have a very good computer and I'm working under Windows ...). So my question is: do you know any other faster way to do this? Thanks in advance Juli -- http://www.ceam.es/pausas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading very large files
I suspect that reading from a connection in chunks of say 10,000 rows and discarding those you do not want would be simpler and at least as quick. Not least because seek() on Windows is so unreliable. On Fri, 2 Feb 2007, Henrik Bengtsson wrote: Hi. General idea: 1. Open your file as a connection, i.e. con <- file(pathname, open="r") 2. Generate a row to (file offset, row length) map of your text file, i.e. numeric vectors 'fileOffsets' and 'rowLengths'. Use readBin() for this. You build this up as you go by reading the file in chunks, meaning you can handle files of any size. You can store this lookup map to file for your future R sessions. 3. Sample a set of rows r = (r1, r2, ..., rR), i.e. rows = sample(length(fileOffsets)). 4. Look up the file offsets and row lengths for these rows, i.e. offsets = fileOffsets[rows]; lengths = rowLengths[rows]. 5. In case your subset of rows is not ordered, it is wise to order them first to speed things up. If order is important, keep track of the ordering and re-order them at the end. 6. For each row r, use seek(con=con, where=offsets[r]) to jump to the start of the row. Use readBin(..., n=lengths[r]) to read the data. 7. Repeat from (3). /Henrik On 2/2/07, juli g. pausas [EMAIL PROTECTED] wrote: Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is: trs <- scan("myfile", what=character(), sep="\n") trs <- trs[sample(length(trs), 3000)] And this works OK; however my computer seems not able to handle the 1.8 G file. I thought of an alternative way that does not require reading the whole file: sel <- sample(1:900000, 3000) for (i in 1:3000) { un <- scan("myfile", what=character(), sep="\n", skip=sel[i], nlines=1) write(un, "myfile_short", append=TRUE) } This works on my computer; however it is extremely slow; it reads one line each time.
It has been running for 25 hours and I think it has done less than half of the file (Yes, probably I do not have a very good computer and I'm working under Windows ...). So my question is: do you know any other faster way to do this? Thanks in advance Juli -- http://www.ceam.es/pausas -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
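The chunk-and-discard idea suggested above can be sketched as follows (an untested sketch; "myfile.txt", the 900,000-line count from the original post, and the chunk size are placeholders):

```r
con <- file("myfile.txt", open = "r")
keep <- sort(sample(900000, 3000))   # line numbers to keep, in file order
got <- character(0)
done <- 0                            # lines consumed so far
repeat {
  chunk <- readLines(con, n = 10000)
  if (length(chunk) == 0) break
  # which of the sampled line numbers fall inside this chunk?
  idx <- keep[keep > done & keep <= done + length(chunk)]
  got <- c(got, chunk[idx - done])   # keep those, discard the rest
  done <- done + length(chunk)
}
close(con)
```

This reads the file strictly sequentially, so it needs no seek() at all.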
Re: [R] features of save and save.image (unexpected file sizes)
On Fri, 2 Feb 2007, Prof Brian Ripley wrote: On Fri, 2 Feb 2007, Vaidotas Zemlys wrote: Hi, On 2/2/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: I found the culprit. I was parsing formulas in my code, and I saved them in that large object. So the environment came with the saved formulas. Is there a nice way to say R: please do not save the environments with the formulas, I do not need them? No, but why create them that way? You could do mmodel <- as.formula(mmodel, env=.GlobalEnv) Hm, but say I have some large object in .GlobalEnv, and I generate mmodel 10 different times and save the result as a list of length 10. Now if I try to save this list, will R save 10 different copies of .GlobalEnv together with the aforementioned large object? No, it saves the environment (here .GlobalEnv), not objects, and there can be many shared references. Just to amplify this point: only a marker representing .GlobalEnv is saved; on load into a new session that marker becomes the .GlobalEnv of the new session. Best, luke The R way is to create what you want, not fix up afterwards. (I find your code unreadable--spaces help a great deal, so am not sure if I have understood it correctly.) Hm, I copied this code directly from Emacs+ESS, maybe the mailer mangled something. What I want to do with this piece of code (I will repaste it here) testf <- function(formula) { mainform <- formula if(deparse(mainform[[3]][[1]])!="|") stop("invalid conditioning") mmodel <- substitute(y~x,list(y=mainform[[2]],x=mainform[[3]][[2]])) mmodel <- as.formula(mmodel) list(formula=list(main=mmodel)) } You use no spaces around your operators or after commas. R does when deparsing: testf function (formula) { mainform <- formula if (deparse(mainform[[3]][[1]]) != "|") stop("invalid conditioning") mmodel <- substitute(y ~ x, list(y = mainform[[2]], x = mainform[[3]][[2]])) mmodel <- as.formula(mmodel) list(formula = list(main = mmodel)) } because it is (at least to old hands) much easier to read.
IcanreadEnglishtextwithoutanyspacesbutIchoosenotto.Similarly,Rcode.Occasional spacesare evenharderto parse. What this code is meant to do is read a formula with a condition, formula(y~x|z), and construct the formula formula(y~x). I looked for examples in the code of coplot in package graphics and latticeParseFormula in package lattice. Vaidotas Zemlys -- Doctorate student, http://www.mif.vu.lt/katedros/eka/katedra/zemlys.php Vilnius University -- Luke Tierney Chair, Statistics and Actuarial Science Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Actuarial Science Fax: 319-335-3017 241 Schaeffer Hall email: [EMAIL PROTECTED] Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
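The point about environments in this thread can be seen directly: a formula created inside a function (or local()) keeps a reference to that evaluation environment, and save() serializes whatever that environment reaches. A small illustrative sketch (variable names are placeholders):

```r
# A formula built inside local() captures the local() environment,
# including the large object created there.
f <- local({
  big <- rnorm(1e6)   # a large object visible where the formula is built
  y ~ x
})
ls(environment(f))    # the captured environment still holds 'big'

# Re-pointing the formula at .GlobalEnv, as suggested above, drops that
# reference, so save() no longer drags 'big' along:
f2 <- as.formula(f, env = .GlobalEnv)
environment(f2)       # now the global environment
```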
Re: [R] reading very large files
On Fri, 2007-02-02 at 12:32 -0600, Marc Schwartz wrote: On Fri, 2007-02-02 at 18:40 +0100, juli g. pausas wrote: Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is: trs <- scan("myfile", what=character(), sep="\n") trs <- trs[sample(length(trs), 3000)] And this works OK; however my computer seems not able to handle the 1.8 G file. I thought of an alternative way that does not require reading the whole file: sel <- sample(1:900000, 3000) for (i in 1:3000) { un <- scan("myfile", what=character(), sep="\n", skip=sel[i], nlines=1) write(un, "myfile_short", append=TRUE) } This works on my computer; however it is extremely slow; it reads one line each time. It has been running for 25 hours and I think it has done less than half of the file (Yes, probably I do not have a very good computer and I'm working under Windows ...). So my question is: do you know any other faster way to do this? Thanks in advance Juli Juli, I don't have a file to test this on, so caveat emptor. The problem with the approach above is that you are re-reading the source file, once per line, or 3000 times. In addition, each read is likely going through half the file on average to locate the randomly selected line. Thus, the reality is that you are probably reading on the order of: 3000 * 450000 [1] 1.35e+09 lines in the file, which of course is going to be quite slow. In addition, you are also writing to the target file 3000 times. The basic premise of the approach below is that you are in effect creating a sequential file cache in an R object: reading large chunks of the source file into the cache, then randomly selecting rows within the cache, and then writing out the selected rows. Thus, if you can read 100,000 rows at once, you would have 9 reads of the source file, and 9 writes of the target file.
The key thing here is to ensure that the offsets within the cache and the corresponding random row values are properly set. Here's the code: # Generate the random values sel <- sample(1:900000, 3000) # Set up a sequence for the cache chunks # Presume you can read 100,000 rows at once Cuts <- seq(0, 900000, 100000) # Loop over the length of Cuts, less 1 for (i in seq(along = Cuts[-1])) { # Get a 100,000 row chunk, skipping rows # as appropriate for each subsequent chunk Chunk <- scan("myfile", what = character(), sep = "\n", skip = Cuts[i], nlines = 100000) # set up a row sequence for the current # chunk Rows <- (Cuts[i] + 1):(Cuts[i + 1]) # Are any of the random values in the # current chunk? Chunk.Sel <- sel[which(sel %in% Rows)] # If so, get them if (length(Chunk.Sel) > 0) { Write.Rows <- Chunk[sel - Cuts[i]] Quick typo correction: The last line above should be: Write.Rows <- Chunk[Chunk.Sel - Cuts[i]] # Now write them out write(Write.Rows, "myfile_short", append = TRUE) } } As noted, I have not tested this, so there may yet be additional ways to save time with file seeks, etc. If that's the only error in the code... :-) Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Make error for R-devel package under linux
It is Feb 2 today: this was fixed 60 hours ago. In any case, R-help is not the list to discuss any aspect of R version 2.5.0 Under development (unstable) (2007-02-01 r40632), which cannot be expected to build in all locales and architectures at all times. (This was a UTF-8 x 64-bit problem, and you omitted to tell us your locale, crucial here.) 'R-devel' is not a package: it is both a series of versions of R and a mailing list to discuss R developments (amongst other things). On Fri, 2 Feb 2007, Xinxia Peng wrote: Hi, I got an error while compiling the R-devel (R-2.5.0) package under Redhat Linux, but the R-patch package (R-2.4.1) was compiled perfectly fine. The following is what I got from the R-devel package downloaded on Jan. 30th, 2007: --- ./configure R is now configured for x86_64-unknown-linux-gnu Source directory: . Installation directory: /usr/local C compiler: gcc -std=gnu99 -g -O2 Fortran 77 compiler: g77 -g -O2 C++ compiler: g++ -g -O2 Fortran 90/95 compiler: gfortran -g -O2 Obj-C compiler: Interfaces supported: X11 External libraries: readline Additional capabilities: PNG, JPEG, iconv, MBCS, NLS Options enabled: shared BLAS, R profiling, Java Recommended packages: yes configure: WARNING: you cannot build DVI versions of the R manuals configure: WARNING: you cannot build PDF versions of the R manuals make . mkdir -p -- ../../../../library/tools/libs make[5]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library/tools/src' make[4]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library/tools/src' Error in loadNamespace(name) : there is no package called 'tools' In addition: Warning messages: 1: Line starting 'Package: tools ...' is malformed! 2: Line starting 'Version: 2.5.0 ...' is malformed! 3: Line starting 'Priority: base ...' is malformed! 4: Line starting 'Title: Tools for Pac ...' is malformed! 5: Line starting 'Author: Kurt Hornik ...' is malformed! 6: Line starting 'Maintainer: R Core T ...' is malformed!
7: Line starting 'Description: Tools f ...' is malformed! 8: Line starting 'License: GPL Version ...' is malformed! 9: Line starting 'Built: R 2.5.0; x86_ ...' is malformed! Execution halted make[3]: *** [all] Error 1 make[3]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library/tools' make[2]: *** [R] Error 1 make[2]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src/library' make[1]: *** [R] Error 1 make[1]: Leaving directory `/nethome/xpeng/linux/tools/R-devel/src' make: *** [R] Error 1 --- I got the same error after I went into the directory 'R-devel/src/library/tools' and ran 'make' again. Any suggestions? Thanks, Xinxia __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Make error for R-devel package under linux
Thanks a lot for the quick help. That's the first message I sent since I joined the list last night. :) Best, Xinxia -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Friday, February 02, 2007 10:56 AM To: Xinxia Peng Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Make error for R-devel package under linux [...] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] good brochure about matrix operations
Hi there, I would like to know if there is a good PDF available about matrix operations in R. Kind regards, miltinho Brazil __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading very large files
On Fri, 2007-02-02 at 12:42 -0600, Marc Schwartz wrote: On Fri, 2007-02-02 at 12:32 -0600, Marc Schwartz wrote: Juli, I don't have a file to test this on, so caveat emptor. The problem with the approach above is that you are re-reading the source file, once per line, or 3000 times. In addition, each read is likely going through half the file on average to locate the randomly selected line. Thus, the reality is that you are probably reading on the order of: > 3000 * 450000 [1] 1.35e+09 lines in the file, which of course is going to be quite slow. In addition, you are also writing to the target file 3000 times. The basic premise with the approach below is that you are in effect creating a sequential file cache in an R object: reading large chunks of the source file into the cache, then randomly selecting rows within the cache and then writing out the selected rows. Thus, if you can read 100,000 rows at once, you would have 9 reads of the source file, and 9 writes of the target file. The key thing here is to ensure that the offsets within the cache and the corresponding random row values are properly set. Here's the code: # Generate the random values sel <- sample(1:900000, 3000) # Set up a sequence for the cache chunks # Presume you can read 100,000 rows at once Cuts <- seq(0, 900000, 100000) # Loop over the length of Cuts, less 1 for (i in seq(along = Cuts[-1])) { # Get a 100,000 row chunk, skipping rows # as appropriate for each subsequent chunk Chunk <- scan("myfile", what = character(), sep = "\n", skip = Cuts[i], nlines = 100000) # set up a row sequence for the current # chunk Rows <- (Cuts[i] + 1):(Cuts[i + 1]) # Are any of the random values in the # current chunk? Chunk.Sel <- sel[which(sel %in% Rows)] # If so, get them if (length(Chunk.Sel) > 0) { Write.Rows <- Chunk[sel - Cuts[i]] Quick typo correction: The last line above should be: Write.Rows <- Chunk[sel - Cuts[i], ] # Now write them out write(Write.Rows, "myfile_short", append = TRUE) } } OK, I knew it was too good to be true... 
One more correction on that same line (Chunk is a character vector from scan(), so it takes vector indexing, not matrix indexing): Write.Rows <- Chunk[Chunk.Sel - Cuts[i]] For clarity, here is the full set of code: # Generate the random values sel <- sample(900000, 3000) # Set up a sequence for the cache chunks # Presume you can read 100,000 rows at once Cuts <- seq(0, 900000, 100000) # Loop over the length of Cuts, less 1 for (i in seq(along = Cuts[-1])) { # Get a 100,000 row chunk, skipping rows # as appropriate for each subsequent chunk Chunk <- scan("myfile", what = character(), sep = "\n", skip = Cuts[i], nlines = 100000) # set up a row sequence for the current # chunk Rows <- (Cuts[i] + 1):(Cuts[i + 1]) # Are any of the random values in the # current chunk? Chunk.Sel <- sel[which(sel %in% Rows)] # If so, get them if (length(Chunk.Sel) > 0) { Write.Rows <- Chunk[Chunk.Sel - Cuts[i]] # Now write them out write(Write.Rows, "myfile_short", append = TRUE) } } Regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
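The same chunked-sampling idea can also be sketched with a single open connection, so the source file is traversed only once instead of being re-scanned with skip= on every pass. This is a sketch, not tested against a real file; the file names ("myfile", "myfile_short") and the line counts are assumptions carried over from the example above.

```r
## Chunked sampling via one open connection: the file is read
## sequentially, chunk by chunk, and only the sampled lines are kept.
n.lines    <- 900000                     # assumed total lines in the source file
chunk.size <- 100000                     # lines held in memory at once
sel <- sort(sample(n.lines, 3000))       # random line numbers to keep

con <- file("myfile", open = "r")
out <- file("myfile_short", open = "w")
offset <- 0
while (length(Chunk <- readLines(con, n = chunk.size)) > 0) {
    Rows <- offset + seq_along(Chunk)    # absolute line numbers in this chunk
    keep <- sel[sel %in% Rows] - offset  # their positions within the chunk
    if (length(keep) > 0) writeLines(Chunk[keep], out)
    offset <- offset + length(Chunk)
}
close(con); close(out)
```

Because readLines() advances the connection itself, no skip= bookkeeping is needed and the end of the file is detected when a read returns zero lines.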
Re: [R] Dealing with Duplicates - How to count instances?
table(data[column]) will give you the number of items in each subgroup; that would be the count you are after. On 2/2/07, Johannes Graumann [EMAIL PROTECTED] wrote: Hi there, given a data.frame 'data' I managed to filter out entries (rows) that are identical with respect to one column like so: duplicity <- duplicated(data[column]) data_unique <- subset(data, duplicity != TRUE) But I'm trying to extract how many duplicates each of the remaining rows had. Can someone please send me down the right path for this? Joh __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
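A minimal illustration of the suggestion, using a made-up data frame (the column names here are invented for the example, not taken from the poster's data): table() gives the per-key counts, which can then be attached to the de-duplicated rows.

```r
## Count how often each key occurs, then attach that count to the
## first (retained) row of each group.
data <- data.frame(key = c("a", "a", "b", "c", "c", "c"),
                   val = 1:6, stringsAsFactors = FALSE)
counts <- table(data$key)                       # a: 2, b: 1, c: 3
data_unique <- data[!duplicated(data$key), ]    # rows 1, 3, 4
data_unique$n <- as.vector(counts[as.character(data_unique$key)])
data_unique
##   key val n
## 1   a   1 2
## 3   b   3 1
## 4   c   4 3
```

Indexing the table by the character keys (rather than by a factor) avoids accidental indexing by integer level codes.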
Re: [R] Problem with party and ordered factors
Christoph: I've got an ordered factor as response. As far as I know I have to use scores to be able to use this ordered factor. If you want to exploit the ordering in the statistical tests (used for variable selection in CTree), a natural approach is to use a linear-by-linear test with scores assigned to the ordered levels of your factor. That's what the example below does. But if I do so I get a tree which predicts all observations as the first level of my ordered factor. That is not due to the factor being ordered. It results simply from the fact that more than half of the observations have Never in the variable ME. There I got the same problem. I execute the following code: data(mammoexp, package = "party") mtree <- ctree(ME ~ ., data = mammoexp, scores = list(ME = 1:3, SYMPT = 1:4, DECT = 1:3)) plot(mtree) If you look at this picture, you can see that majority voting in each node will result in the prediction Never. So now I'm stuck. Am I doing anything wrong? Nothing. If you want to see how the distribution in each node changes, you can look at treeresponse(mtree) I'm using R 2.4.1 and all packages are up to date. Not anymore, I just uploaded a new party version to CRAN ;-)) Best wishes, Z __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] good brochure about matrix operations
There are some reference cards and other contributed documents here: http://cran.r-project.org/other-docs.html On 2/2/07, Milton Cezar Ribeiro [EMAIL PROTECTED] wrote: Hi there, I would like to know if there is a good PDF available about matrix operations in R. Kind regards, miltinho Brazil __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
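For quick reference, these are a few of the base-R matrix operations that the contributed documents cover (a small sketch with an invented 2x2 example):

```r
## Basic matrix algebra in base R.
A <- matrix(1:4, nrow = 2)   # column-major fill: 1 3 / 2 4
B <- diag(2)                 # 2x2 identity matrix
A %*% B                      # matrix product (here, equals A)
t(A)                         # transpose
solve(A)                     # inverse (A must be non-singular)
crossprod(A)                 # t(A) %*% A, computed more efficiently
A * A                        # note: '*' is ELEMENTWISE, not matrix product
```

The distinction between `*` (elementwise) and `%*%` (true matrix multiplication) is the most common stumbling block.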
Re: [R] Inquiry
Angie Hernandez wrote: I came across your website and thought it would be a great resource to have listed on my friends page. Would you be interested in exchanging links with my new site? Well, I don't see why we can't make CRAN more like MySpace? Barry [joke] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multinomial logistic regression with equality constraints?
I'm interested in doing multinomial logistic regression with equality constraints on some of the parameter values. For example, with categorical outcomes Y_1 (baseline), Y_2, and Y_3, and covariates X_1 and X_2, I might want to impose the equality constraint that \beta_{2,1} = \beta_{3,2}, that is, that the effect of X_1 on the logit of Y_2 is the same as the effect of X_2 on the logit of Y_3. Is there an existing facility or package in R for doing this? Would multinomRob fit the bill? Many thanks, Roger -- Roger Levy Email: [EMAIL PROTECTED] Assistant Professor Phone: 858-534-7219 Department of Linguistics Fax: 858-534-4789 UC San Diego Web: http://ling.ucsd.edu/~rlevy __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Snow Package and R: Exported Variable Problem
Hello and thanks in advance for your time. I've created a simulation on my cluster which uses a custom package developed by me for different functions and also the snow package. Right now I'm using LAM to communicate between nodes and am currently only testing my code on 3 nodes for simplicity, though I plan on expanding to 16 later. My problem is this error: Error in fn(par, ...) : object "x1" not found attr(,"class") try-error In my simulation I need to run a function several times with a different variable each time. All the invocations of the function are independent of the others. I start the simulation on one node, create a cluster of several nodes, load my custom package and snow on all of them, use clusterExport(cl, "x1") to export the variable x1 (among other variables I need), then I call my simulation on the cluster using clusterApplyLB(cl, 2:S, simClust) where cl is the cluster and S is a constant defined above as 500. Using print statements (since snow, or R for that matter, has next to no ability to debug) I found that the error cropped up in this statement: theta6 = optim(c(0,0,0,0,0,0,.2), loglikelihood, score6, method = "CG", control=list(fnscale=-1,reltol=1e-8,maxit=2000))$par Both the functions loglikelihood and score6 use x1, but I know that it is getting exported to the node correctly since it gets assigned earlier in the simulation: x1 = rep(0,n1) The error I stated above happens for every iteration of the simulation (499 times) and I'm really at a loss as to why it's happening and what I can do to determine what it is. I'm wondering at this point if exporting the variable makes it unavailable to certain other packages, though that doesn't really make any sense. If anyone can help me with this problem, or let me know how I can debug this, or even a clue as to why it might be happening, I would greatly appreciate it. I've been wrestling with this for some time and no online documentation can help. Thank you for your time and help. 
Just so you know, I'm a Computer Scientist, not a Statistician, though I will be able to give any information about the statistics involved in this program. I am reluctant to give away all the source code since it is not my work but rather code I'm converting from standard code to parallelized code for a professor of mine. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
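A common cause of errors like this is that clusterExport() copies objects into the workers' global environments, while exported functions evaluate in their own enclosing environments and may not find them there. A minimal pattern for checking visibility is sketched below; the function and variable names (simClust, x1) follow the poster's description, but the cluster setup itself is an illustrative assumption.

```r
## Sketch: verify that an exported object is visible on the workers
## before running the real job.
library(snow)
cl <- makeCluster(3, type = "MPI")   # assumes an Rmpi/LAM setup
x1 <- rnorm(100)
clusterExport(cl, "x1")              # note: the *name* as a string

## Each worker should report TRUE here:
clusterEvalQ(cl, exists("x1", envir = globalenv()))

## Passing the data explicitly as an argument sidesteps scoping
## problems entirely:
simClust <- function(i, x1) mean(x1) + i     # illustrative stand-in
res <- clusterApplyLB(cl, 2:10, simClust, x1 = x1)
stopCluster(cl)
```

Extra arguments after the function in clusterApplyLB() are forwarded to every call, which is often more robust than relying on exported globals.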
[R] Access to column names stored in a vector in lm procedure
Hello everybody I have to run many statistical tests that are identical, with the exception of the dependent variable. Is there a possibility to store the dependent variable names e.g. in a vector (in the example below called 'variable') and to use the content of this vector in a simple statistical test (e.g. a regression)? I would like to write the statistical procedure only once. For example: I would like to store 100 dependent variable names in a vector called 'variable' and then I would like to do a simple regression with each of these variables and the variable 'length' using e.g. the while function. I would then extract e.g. the t-value and add it to a vector ('result') that contains all results. Something like that: 'variable' should contain the names of the 100 dependent variables (Var1, Var2, ..., Var100) while(i < 101){ result <- c(result, coef(summary(lm(variable[i] ~ length, data = data2)))[2,4]); i <- i + 1 } This example does not work since the lm function does not recognize the dependent variable's name. Does somebody know how to store the names of the dependent variables in e.g. a vector and to make them available for the lm function? Many thanks Patrick __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
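One common approach to this is to build each formula as a character string and convert it with as.formula(), so that lm() receives a real formula rather than a character vector element. The sketch below assumes the data frame 'data2' and the names Var1...Var100 from the question.

```r
## Loop over dependent-variable names, building a formula for each.
variable <- paste("Var", 1:100, sep = "")   # "Var1", ..., "Var100"
result <- numeric(0)
for (v in variable) {
    f   <- as.formula(paste(v, "~ length"))
    fit <- lm(f, data = data2)
    ## row 2 = the 'length' coefficient; pick the statistic you need:
    result <- c(result, coef(summary(fit))[2, "t value"])
}
```

reformulate("length", response = v) is an equivalent, slightly tidier way to build the formula.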
[R] problem with survreg and anova function
Hi, I make a weibull survival regression using suvreg function. Bu when I try to get the P values from anova, it give me NAs: I'm using R version 2.4.0 Patched (2006-11-25 r39997) and survival 2.30 library Look: m - survreg(Surv(tempo,censor)~grupo*peso) anova(m) Df Deviance Resid. Df-2*LL P(|Chi|) NULL NA NA 148 966.6416NA grupo -2 25.6334407 146 941.0081NA peso -1 0.7088892 145 940.2992NA grupo:peso -2 1.0731248 143 939.2261NA Older version of survival dont have this problem, look this printed result: anova(glex8.m1) -2LL DfDeviance Resid. Df P(|Chi|) NULL NA NA 2 966.6416 NA grupo 2 25.6334408 4 941.0081 2.714995e-06 peso1 0.7088892 5 940.2992 3.998128e-01 grupo:peso 2 1.07312487 939.2261 5.847550e-01 This a bug or a change in package? The data from crawley book: structure(list(tempo = c(20, 34, 1, 2, 3, 3, 50, 26, 1, 50, 21, 3, 13, 11, 22, 50, 50, 1, 50, 9, 50, 1, 13, 50, 50, 1, 6, 50, 50, 50, 36, 3, 46, 10, 50, 1, 18, 3, 36, 37, 50, 7, 1, 1, 7, 24, 4, 50, 12, 17, 1, 1, 1, 21, 50, 50, 1, 46, 50, 1, 8, 2, 12, 3, 2, 1, 5, 50, 1, 2, 2, 4, 17, 5, 1, 11, 8, 1, 5, 2, 41, 5, 21, 1, 38, 50, 3, 19, 4, 7, 1, 46, 2, 5, 40, 4, 50, 2, 1, 17, 7, 1, 5, 1, 1, 5, 6, 2, 24, 1, 1, 1, 1, 7, 13, 6, 11, 46, 5, 14, 2, 1, 20, 2, 20, 1, 23, 11, 1, 1, 20, 9, 1, 1, 1, 1, 7, 11, 1, 3, 1, 5, 9, 21, 10, 11, 30, 1, 1, 17), censor = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ), peso = c(5.385, 7.413, 9.266, 6.228, 5.229, 9.699, 1.973, 5.838, 2.088, 0.237, 6.814, 5.502, 1.137, 6.323, 7.384, 8.713, 7.458, 1.424, 1.312, 5.162, 7.187, 4.677, 6.548, 5.903, 2.113, 7.617, 3.737, 8.972, 6.523, 2.165, 
4.895, 6.538, 1.674, 6.726, 2.671, 4.949, 4.819, 5.08, 3.532, 4.406, 6.286, 5.529, 2.27, 5.245, 9.675, 5.61, 4.297, 3.179, 6.776, 0.466, 0.626, 1.221, 0.124, 0.32, 2.282, 0.287, 3.468, 7.314, 4.901, 5.418, 6.344, 1.163, 12.126, 11.561, 8.333, 0.055, 10.583, 9.534, 13.182, 10.156, 16.881, 15.452, 16.831, 18.947, 19.099, 19, 9.652, 1.544, 10.786, 4.13, 2.2, 7.567, 14.581, 26.259, 0.44, 18.188, 6.789, 16.669, 38.177, 29.154, 14.578, 1.569, 0.345, 33.929, 28.958, 38.139, 26.822, 39.501, 9.264, 22.88, 27.48, 35.069, 4.974, 41.521, 42.09, 25.037, 9.509, 23.682, 0.352, 19.589, 7.426, 7.913, 2.37, 5.533, 18.8, 18.508, 3.343, 26.926, 2.388, 21.567, 5.594, 17.15, 15.986, 1.588, 2.055, 16.074, 12.086, 20.524, 6.493, 7.258, 16.635, 10.324, 5.228, 0.784, 5.587, 5.011, 7.441, 3.69, 4.708, 9.207, 1.4, 6.309, 1.784, 0.767, 1.993, 1.03, 2.875, 1.82, 0.974, 0.1), grupo = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), .Label = c(g1, g2, g3), class = factor)), .Names = c(tempo, censor, peso, grupo), class = data.frame, row.names = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 
133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150)) Thanks Inte Ronaldo -- The cigarette said to the smoker: Today you light me, tomorrow I put you out. -- Prof. Ronaldo Reis Júnior | .''`. UNIMONTES/Depto. Biologia Geral/Lab. Ecologia Evolutiva | : :' : Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia | `. `'` CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil | `- Fone: (38) 3229-8190 | [EMAIL PROTECTED] | ICQ#: 5692561 | LinuxUser#: 205366 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Another loop - deloop question
Hi. You folks are so clever, I thought perhaps you could help me make another procedure more efficient. Right now I have the following setup: p is a vector of length m. g is a list of length n; g[[i]] is a vector whose elements are indices of p (i.e., integers between 1 and m, inclusive); the g[[i]] cover the full set 1:m, but they don't have to constitute an exact partition; they can overlap. y is another list of length n; each y[[i]] is a vector of the same length as g[[i]]. Now I build up the vector p as follows: p=rep(0,m) for(i in 1:n){p[g[[i]]]=p[g[[i]]]+y[[i]]} Can this loop be vectorized? Thanks! -- TMK -- 212-460-5430 home 917-656-5351 cell __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dealing with Duplicates - How to count instances?
jim holtman wrote: table(data[column]) will give you the number of items in each subgroup; that would be the count you are after. Thanks for your help! That rocks! I can do copynum <- table(data_6plus["Accession.number"]) data_6plus$Repeats <- sapply(data_6plus[["Accession.number"]], function(x) copynum[x][[1]]) now! But how about this: - do something along the lines of duplicity <- duplicated(data_6plus["Accession.number"]) data_6plus_unique <- subset(data_6plus, duplicity != TRUE) - BUT: retain from each deleted row one field, append it to a vector and fill that into a new field of the remaining row of the set sharing data_6plus["Accession.number"]? How would you do something like that? Joh __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
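One way to do the second part, keeping one row per key while collapsing a field from the dropped duplicates into a new column, is tapply() plus paste(). The sketch below uses made-up data; the column names ("Accession.number"-style key plus an invented "peptide" field) are illustrative only.

```r
## Collapse a field across duplicate rows before de-duplicating.
d <- data.frame(acc = c("A1", "A1", "B2"),
                peptide = c("p1", "p2", "p3"),
                stringsAsFactors = FALSE)
collapsed <- tapply(d$peptide, d$acc,
                    function(x) paste(x, collapse = ";"))
d_unique <- d[!duplicated(d$acc), ]
d_unique$peptides <- as.vector(collapsed[d_unique$acc])
d_unique
##   acc peptide peptides
## 1  A1      p1    p1;p2
## 3  B2      p3       p3
```

tapply() groups the field by key, so the retained row for each key carries all values from its deleted duplicates.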
Re: [R] [BioC] Outlook does threading [Broadcast]
To close out this thread... I have solved my problem with Outlook not displaying threads by creating a Gmail account for all my R and BioC needs and am viewing that with Mozilla Thunderbird. Seems to be working nicely and I now have the benefit of viewing by threads like so many of you have been doing all along. I continue to use Outlook for my other needs. I thought I would share this in case other Outlook users search the archives and wonder how the problem was ever solved. Thanks again, Mark Kimpel IU School of Medicine Liaw, Andy wrote: This is really off-topic for both BioC and R-help, so I'll keep it short. From: Kimpel, Mark William See below for Bert Gunter's off-list reply to me (which I do appreciate). I'm putting it back on the list because it seems there is still confusion regarding the difference between threading and sorting by subject. I thought the example I will give below will serve as instructional for other Outlook users who may be similarly confused as I was (am?). Per Bert's instructions, I just set up my inbox to sort by subject. I sent one email to myself with the subject test1 and then replied to it without changing the subject. The reply correctly went to test1 in the inbox sorter. I then changed the subject heading in the test1 reply to test2 and sent it to myself. This time Outlook re-categorized it and put it in a separate compartment in the view called test2. If Outlook can do threading the way the R mail server does, I don't think this is the way to do it. AFAIK there's no proper way to get the correct threading in Outlook. What I do is group by conversation topic, but that doesn't solve the problem. This is only a problem on your (and all Outlook users'?) end, though. The bigger problem that affects the lists is that some versions of MS Exchange Server do not include the In-reply-to header field that many mailing lists rely on for proper threading. 
As a result, when I reply to other people's posts, it may show up in Outlook as having been threaded properly (because the subject is fine), but it throws everything else that does proper threading off. Unless someone has an idea of how to correctly set up Outlook to do threading in the manner that the R mail server does, Maybe some VBA coding can be done to get it right, but short of that, I very much doubt it. I think the message for us Outlook users is to just create, from scratch, a new message when initiating a new subject. That message ought to be clear for everyone. You should never reply to a message when you really mean to start a new topic, regardless of what you are using. Andy Thanks for all your help. Mark -Original Message- From: Bert Gunter [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 31, 2007 7:03 PM To: Kimpel, Mark William Subject: Outlook does threading Mark: No need to bother the R list with this. Outlook does threading. Just sort on Subject in the viewer. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kimpel, Mark William Sent: Wednesday, January 31, 2007 3:36 PM To: Peter Dalgaard Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] Subject: Re: [R] possible spam alert Peter, Thank you for your explanation; I had taken Mr. Connolly's message to me to imply that I was not changing the subject line. I use MS Outlook 2007 and, unless I am just not seeing it, Outlook does not normally display the In-reply-to header; I was under the mistaken impression that that was what the Subject line was for. See, for example, the header to your message to me below. Outlook will, however, sort messages by Subject, and that is what I thought was meant by threading. Well, I learned something today and apologize for any inconvenience my posts may have caused. 
BTW, I use Outlook because it is supported by my university server and will synch my appointments and contacts with my PDA, which runs Windows CE. If anyone has a suggestion for me of a better email program that will provide proper threading AND work with a MS email server and synch with Windows CE, I'd love to hear it. Thanks again, Mark Mark W. Kimpel MD (317) 490-5129 Work, Mobile (317) 663-0513 Home (no voice mail please) 1-(317)-536-2730 FAX -Original Message- From: Peter Dalgaard [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 31, 2007 6:25 PM To: Kimpel, Mark William Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Subject: Re: [R] possible spam alert Kimpel, Mark William wrote: The last two times I have originated message threads on R or Bioconductor I have received the message included below from someone named Patrick Connolly. Both times I was the originator of the
[R] CGIwithR
I tried an example, http://omega.psi.iastate.edu/bootstrapFile.html, but it doesn't give me output. I don't know why. Another example, http://omega.psi.iastate.edu/trivial.html, seems to work, except that it doesn't display the figure. Does anyone know how to figure this out? Aimin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Adding Histograms to Leaves of Rpart Tree or other Dendrogram
Hi - I'm trying to append simple density histograms of a continuous variable to the leaves of an rpart tree. The splits in the tree are all levels of a factor and I'm hoping to make the histograms out of the subsets of the dataframe corresponding to the splits and for them to be attached to the appropriate leaf of the final tree. Any help would be much appreciated, thanks, Jon Zelner University of Michigan Gerald R. Ford School of Public Policy __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding Histograms to Leaves of Rpart Tree or other Dendrogram
Hi Jon, Take a look at this graph http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=85 I think it is very close to what you need (source code provided at the site). -Christos Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Zelner Sent: Friday, February 02, 2007 5:07 PM To: r-help@stat.math.ethz.ch Subject: [R] Adding Histograms to Leaves of Rpart Tree or other Dendrogram Hi - I'm trying to append simple density histograms of a continuous variable to the leaves of an rpart tree. The splits in the tree are all levels of a factor and I'm hoping to make the histograms out of the subsets of the dataframe corresponding to the splits and for them to be attached to the appropriate leaf of the final tree. Any help would be much appreciated, thanks, Jon Zelner University of Michigan Gerald R. Ford School of Public Policy __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] silent loading of packages
Thank you for your help. It answers my question but I'm still unclear about something. I understand that suppressWarnings() will suppress warnings and suppressMessages() will suppress *messages*. What other *message* types are there, and how do I suppress all of them? For example, I want to quietly *install* a package: install.packages("pkgDepTools", repos="http://bioconductor.org", lib="~/.R/library-i686") but I always get the following printed. How do I suppress it? trying URL 'http://bioconductor.org/src/contrib/pkgDepTools_1.0.1.tar.gz' Content type 'application/x-gzip' length 145394 bytes opened URL == downloaded 141Kb * Installing *source* package 'pkgDepTools' ... ** R ** inst ** preparing package for lazy loading Loading required package: graph Loading required package: RBGL ** help Building/Updating help pages for package 'pkgDepTools' Formats: text html latex example basicInstallOrder text html latex cleanPkgField text html latex getDownloadSize text html latex getInstallOrder text html latex makeDepGraph text html latex example makePkgUrl text html latex parseContentLength text html latex pkgDepTools-package text html latex ** building package indices ... * DONE (pkgDepTools) The downloaded packages are in /tmp/RtmpYl8LZA/downloaded_packages Warning message: cannot create HTML package index in: tools:::unix.packages.html(.Library) Thank you again, -Johan - Original Message From: Prof Brian Ripley [EMAIL PROTECTED] To: johan Faux [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Sent: Tuesday, January 30, 2007 5:24:25 PM Subject: Re: [R] silent loading of packages It depends on the 'message'. In this case library(VGAM, warn.conflicts=FALSE) suppressMessages(library(VGAM)) both work. (How did you manage to miss the first?) In general, it depends on whether the 'message' is a message in the sense of message() or produced some other way. sink() would work, but these are *messages*, so how did you use it? 
On Tue, 30 Jan 2007, johan Faux wrote:

 > I would like to turn off all the messages during library(aPackage) or
 > require(aPackage). I tried different commands: invisible, capture.output,
 > sink, but none of them is working. For example, loading VGAM gives a lot
 > of unnecessary messages:
 >
 > library(VGAM)
 > Attaching package: 'VGAM'
 > The following object(s) are masked from package:splines : bs, ns
 > The following object(s) are masked from package:boot : logit, simplex
 > The following object(s) are masked from package:stats : glm, lm, poly,
 >   predict.glm, predict.lm, predict.mlm
 > The following object(s) are masked from package:base : scale.default
 >
 > Any hint/help will be appreciated.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
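To summarize the thread, R's output channels have different suppressors, and a short sketch may help keep them apart (the package name is just the one from the thread; any attached package with conflicts behaves the same):

```r
## conflict notices from attaching a package are message()s:
suppressMessages(library(VGAM))        # silences the masking messages
library(VGAM, warn.conflicts = FALSE)  # or skip the conflict report entirely

## warnings go through a separate handler:
suppressWarnings(log(-1))              # returns NaN without the warning

## ordinary printed output goes to stdout and can be diverted
## with capture.output() (or sink()):
out <- capture.output(print(summary(1:10)))
```

As for install.packages(): part of its output comes from the external tools it runs (download and R CMD INSTALL), which write directly to the terminal rather than through R's output connections, so that part is not reliably catchable with sink() or capture.output().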
Re: [R] Another loop - deloop question
Talbot,

Vectorization is not a panacea. For n == 100, m == 1000:

 > system.time( for( i in 1:n ){ p[ g[[i]] ] <- p[ g[[i]] ] + y[[i]] })
[1] 0 0 0 NA NA
 > system.time( p2 <- tapply( unlist(y), unlist(g), sum ))
[1] 0.16 0.00 0.16 NA NA
 > all.equal(p, as.vector(p2))
[1] TRUE
 > system.time( p3 <- xtabs( unlist(y) ~ unlist(g) ) )
[1] 0.08 0.00 0.08 NA NA
 > all.equal(p, as.vector(p3))
[1] TRUE
 > system.time( p4 <- unlist(y) %*% diag(m)[ unlist(g), ] )
[1] 4.16 0.20 4.36 NA NA
 > all.equal(p, as.vector(p4))
[1] TRUE

Vectorization has had no victory, Grasshopper.

For n == 1, m == 10, the slowest method above becomes the fastest, and the fastest becomes the slowest. So you need to consider the applications to which you will apply this. Read up on profiling if you really 'feel the need for speed' (Writing R Extensions, section 3.2, 'Profiling R code for speed').

Chuck

p.s. Please read Writing R Extensions, section 3.1, 'Tidying R code', and follow the wisdom therein.

On Fri, 2 Feb 2007, Talbot Katz wrote:

 > Hi. You folks are so clever, I thought perhaps you could help me make
 > another procedure more efficient. Right now I have the following setup:
 >
 > p is a vector of length m.
 > g is a list of length n; g[[i]] is a vector whose elements are indices
 > of p (i.e., integers between 1 and m inclusive). The g[[i]] cover the
 > full set 1:m, but they need not form an exact partition; they can
 > overlap.
 > y is another list of length n; each y[[i]] is a vector of the same
 > length as g[[i]].
 >
 > Now I build up the vector p as follows:
 >
 > p <- rep(0, m)
 > for (i in 1:n) { p[g[[i]]] <- p[g[[i]]] + y[[i]] }
 >
 > Can this loop be vectorized? Thanks!
 >
 > -- TMK --
 > 212-460-5430 home
 > 917-656-5351 cell

Charles C.
Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]
UC San Diego http://biostat.ucsd.edu/~cberry/
La Jolla, San Diego 92093-0901
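Chuck's benchmark can be made self-contained with a sketch like the following (the sizes and random data are illustrative, not his actual inputs):

```r
set.seed(1)
n <- 100; m <- 1000

## g[[i]]: indices into p; y[[i]]: values to accumulate at those indices
g <- replicate(n, sample(m, 50), simplify = FALSE)
y <- lapply(g, function(idx) rnorm(length(idx)))

## the original loop
p <- rep(0, m)
for (i in 1:n) p[g[[i]]] <- p[g[[i]]] + y[[i]]

## the tapply version; pinning the factor levels to 1:m guards against
## indices that never occur, which tapply would otherwise drop
p2 <- tapply(unlist(y), factor(unlist(g), levels = 1:m), sum)
p2[is.na(p2)] <- 0
all.equal(p, as.vector(p2))
```

In Chuck's timings the g[[i]] happened to cover all of 1:m, so the bare tapply(unlist(y), unlist(g), sum) sufficed; the factor() guard makes the equivalence hold in general.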
Re: [R] CGIwithR
Aimin Yan wrote:
 > I tried an example: http://omega.psi.iastate.edu/bootstrapFile.html
 > It doesn't give me any output, and I don't know why. Another example,
 > http://omega.psi.iastate.edu/trivial.html, seems to work except that it
 > doesn't display the figure.

But it gives you an error that states why there is no figure. One has to install and configure CGIwithR for use with the Apache server. This is described in a paper that accompanies the package. Please read that, and if you have specific questions, then we can address those. That is where the details are described, so there is little point in cutting and pasting them here too!

D.

 > Does anyone know how to figure it out?
 >
 > Aimin