[R] Re : Combining two datasets
data3-sort(c(data1,data2)) See ?cbind, ? rbind ?append and ?merge for combining data Justin BEM Elève Ingénieur Statisticien Economiste BP 294 Yaoundé. Tél (00237)9597295. ___ Découvrez une nouvelle façon d'obtenir des réponses à toutes vos questions ! Profitez des connaissances, des opinions et des expériences des internautes sur Yahoo! Questions/Réponses [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sleep Function
Hi R, I am fetching Bloomberg data from R. The problem I face is that I get a downloading error once pasted, but the same code run again will download the data. So, I assure you that it is not the problem with the R code. I will not be able to download the data due to some system capacities...Could I use sleep function of R here? Thanks, Shubha [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calling C code from R
Hi! Thanks in advance. I am using R-2.4.0 on Windows XP. I am trying to create dll file. My C code: /* useC1.c */ void useC(int *i) { i[6] = 100; } I have tried to create useC1.dll. C:\R-2.4.0\binR CMD SHLIB useC1.c 'perl' is not recognized as an internal or external command, operable program or batch file. Then I have tried: C:\R-2.4.0\binRcmd SHLIB useC1.c 'perl' is not recognized as an internal or external command, operable program or batch file. I am looking forward for your reply. Regards, Deb Statistician NSW Department of Commerce Sydney Australia. - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Extracting part of date variable
Dear all, Suppose I have a date variable: c = 99/05/12 I want to extract the parts of this date like month number, year and day. I can do it in SPSS. Is it possible to do this in R as well? Rgd, - Heres a new way to find what you're looking for - Yahoo! Answers [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Estimation of discrete unimodal density
Dear All, A method for the estimation is univariate unimodal densities (with unknown mode) is described in Statistical Inference under Order Restrictions by Barlow et al.. Would anyone know whether there is an R-implementation (preferably with reference) for the estimation of univariate discrete unimodal densities (with unknown mode)? Thanks in advance for your help. Kind regards, Wessel van Wieringen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R for bioinformatics
Hi, I was wondering if someone could tell me more about this book, (if it's a good or bad one). I can't find it, as it seems that O'Reilly doesn't publish any more. Thanks, Ben -- Benoit Ballester __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting part of date variable
format.Date(c, %d) format.Date(c, %m) format.Date(c, %y) format.Date(c, %Y) On 01/02/07, stat stat [EMAIL PROTECTED] wrote: Dear all, Suppose I have a date variable: c = 99/05/12 I want to extract the parts of this date like month number, year and day. I can do it in SPSS. Is it possible to do this in R as well? Rgd, - Here's a new way to find what you're looking for - Yahoo! Answers [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Wavlet filter using morlet mother wavelet
nbsp; Hi, List ,I am searching any package on R which can do wavelet filtering for mother wavelet morlet ,is anybody having any script for the same ?I am new to the RwAVELET ANALSSIS..THANKS IN ADVANCE ANIL KUMAR ANIL KUMAR(nbsp;METEOROLOGIST)LRF SECTIONnbsp;NATIONAL CLIMATEnbsp;CENTER ADGM(RESEARCH)INDIA METEOROLOGICALnbsp;DEPARTMENT SHIVIJI NAGARPUNE-411005 INDIAMOBILE +919422023277[EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calling C code from R
You need to install perl and MinGW, at least. If you have them installed, then you need to properly set PATH environment variable and, probably, restart your command line session. See chapter 5 of the manual Writing R extensions (installed in R_HOME/doc/manual) and these two links http://www.murdoch-sutherland.com/Rtools/ http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/ Also, it would be great to upgrade R to 2.4.1 Deb Midya wrote: I am using R-2.4.0 on Windows XP. I am trying to create dll file. My C code: /* useC1.c */ void useC(int *i) { i[6] = 100; } I have tried to create useC1.dll. C:\R-2.4.0\binR CMD SHLIB useC1.c 'perl' is not recognized as an internal or external command, operable program or batch file. Then I have tried: C:\R-2.4.0\binRcmd SHLIB useC1.c 'perl' is not recognized as an internal or external command, operable program or batch file. -- View this message in context: http://www.nabble.com/-R--Calling-C-code-from-R-tf3154058.html#a8746593 Sent from the R help mailing list archive at Nabble.com. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] matrix of matrices
Dear all, it is likely a stupid question but I cannot solve it. I want to have a matrix of 100 elements. Each element must be a vector of 500 elements. If I do: imp-array(dim=100) imp[1]-vector(length=500) it does not work. Warning message: number of items to replace is not a multiple of replacement length If I do: imp - array(dim=c(100,500)) and then fill imp: for(i in c(1:500)) { imp[i,] - im[1:500,] #im[1:500,] is a vector of length 500, of class numeric. IT CONTAINS NAMES! } Now it works, but I loose the labels (names) associated to the original im variable. If I just do: j- im[1:500,] I do not loose the labels. names(j) = list of labels names(imp[1,]) = NULL Any clue? Thanks in advance! Federico __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] indexing
Hello, In a nutshell, I've got a data.frame like this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 and a long vector of classes like this: x - c(1,1,2,7,6,5,4,3,2,2,2...) And would like to obtain a vector of length = length(x), with the corresponding values extracted from assignation table. Like this: x.value [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 Could you help me with an elegant way to do this ? (I just can do it with looping for each class in the assignation table, what a think is not perfect in R's sense) Wishes, Javier -- Javier García-Pintado Institute of Earth Sciences Jaume Almera (CSIC) Lluis Sole Sabaris s/n, 08028 Barcelona Phone: +34 934095410 Fax: +34 934110012 e-mail:[EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting part of date variable
Read the help desk article in R News 4/1 about dates and note the table at the end of it, in particular. On 2/1/07, stat stat [EMAIL PROTECTED] wrote: Dear all, Suppose I have a date variable: c = 99/05/12 I want to extract the parts of this date like month number, year and day. I can do it in SPSS. Is it possible to do this in R as well? Rgd, - Here's a new way to find what you're looking for - Yahoo! Answers [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calling C code from R
Deb Midya wrote: Hi! Thanks in advance. I am using R-2.4.0 on Windows XP. I am trying to create dll file. My C code: /* useC1.c */ void useC(int *i) { i[6] = 100; } I have tried to create useC1.dll. C:\R-2.4.0\binR CMD SHLIB useC1.c 'perl' is not recognized as an internal or external command, operable program or batch file. Then I have tried: C:\R-2.4.0\binRcmd SHLIB useC1.c 'perl' is not recognized as an internal or external command, operable program or batch file. I am looking forward for your reply. Did you install Perl? and did you read http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset and http://cran.r-project.org/doc/manuals/R-exts.html#Creating-R-packages? -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting part of date variable
stat stat wrote: Dear all, Suppose I have a date variable: c = 99/05/12 I want to extract the parts of this date like month number, year and day. I can do it in SPSS. Is it possible to do this in R as well? Rgd, Yes. One way is to use substr(), e.g.: substr(c,1,2) [1] 99 as.numeric(substr(c,1,2)) [1] 99 This also nicely sidesteps the ambiguity issue: 1999 or 1899? May or December? On the other hand, you'll get in trouble if leading zeros are sometimes absent (strsplit() or gsub() if you want to pursue that route further). For a more principled approach, use the time and date handling tools. Assuming that you can live with the system defaults for 2-digit years, strptime(c,format=%y/%m/%d) [1] 1999-05-12 strptime(c,format=%y/%m/%d)$year [1] 99 strptime(c,format=%y/%m/%d)$mon [1] 4 strptime(c,format=%y/%m/%d)$mday [1] 12 Beware the peculiarities of the entries defined by POSIX standard, see ?DateTimeClasses, and also: '%y' Year without century (00-99). If you use this on input, which century you get is system-specific. So don't! Often values up to 69 (or 68) are prefixed by 20 and 70(or 69) to 99 by 19. (I'm at a bit of a loss as to fixing up two digit years once the damage has been done. Presumably, you can just diddle the year field, but I'm a bit uneasy about the fact that 2000 was a leap year and 1900 was not.) -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] indexing
one way is the following: assignation$value[match(x, assignation$class)] I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: javier garcia-pintado [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Thursday, February 01, 2007 12:05 PM Subject: [R] indexing Hello, In a nutshell, I've got a data.frame like this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 and a long vector of classes like this: x - c(1,1,2,7,6,5,4,3,2,2,2...) And would like to obtain a vector of length = length(x), with the corresponding values extracted from assignation table. Like this: x.value [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 Could you help me with an elegant way to do this ? (I just can do it with looping for each class in the assignation table, what a think is not perfect in R's sense) Wishes, Javier -- Javier García-Pintado Institute of Earth Sciences Jaume Almera (CSIC) Lluis Sole Sabaris s/n, 08028 Barcelona Phone: +34 934095410 Fax: +34 934110012 e-mail:[EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] indexing
assignation$value[match(x,assignation$class)] [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 On 2/1/07, javier garcia-pintado [EMAIL PROTECTED] wrote: Hello, In a nutshell, I've got a data.frame like this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 and a long vector of classes like this: x - c(1,1,2,7,6,5,4,3,2,2,2...) And would like to obtain a vector of length = length(x), with the corresponding values extracted from assignation table. Like this: x.value [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 Could you help me with an elegant way to do this ? (I just can do it with looping for each class in the assignation table, what a think is not perfect in R's sense) Wishes, Javier -- Javier García-Pintado Institute of Earth Sciences Jaume Almera (CSIC) Lluis Sole Sabaris s/n, 08028 Barcelona Phone: +34 934095410 Fax: +34 934110012 e-mail:[EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read.spss and encodings
Hi! I'm having trouble with importing spss files containing non-ascii characters (R 2.4.1, debian linux, i386). To reproduce: Download the following file: http://statmath.wu-wien.ac.at/data/spss/de/comphomeneu.sav require (foreign) Sys.setlocale (locale=C) read.spss(comphomeneu.sav)$ARBEIT[1] # prints: # [1] im B\374ro # Levels: im B\374ro zuhause \374 of course is actually a u-umlaut. However, I guess in the C locale it's not expected to print as such. But now try this (use any UTF-8 locale you may have installed): Sys.setlocale (locale=de_DE.UTF-8) read.spss(comphomeneu.sav)$ARBEIT[1] # prints: # [1]Error in print.default(xx, quote = quote, ...) : #invalid multibyte string To me it looks, like read.spss () would probably need an encoding parameter, and / or some iconv () magic. Now, locale conversion always makes my head spin, so I thought I'd better post here, before calling this to be a bug in R. Two questions: 1) Is there some way to work around this, i.e. make sure it is converted to proper UTF-8 while importing? Am I missing something obvious? 2) Should I submit this as a bug report? Thanks! Thomas Friedrichsmeier pgpEhd7gpCdY9.pgp Description: PGP signature __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How can I calculate conditional mean in a large dataset including date data
Dear R users, I have a dataframe with two columns: first column is date data (e.g. 1/1/2000 with character format: daily data from 1/1/1970 till 31/12/2003) and second column is temperature value. Now I'd like to calculate mean for each month in a year (i.e. May 2001, June 1997) and mean for each month in all of years. As the number of days in some months is different from others I could not write appreciate command for this. Therefore I would greatly appreciate if somebody can help me in this case Thank you Majid Majid Iravani PhD Student Swiss Federal Research Institute WSL Research Group of Vegetation Ecology Zürcherstrasse 111 CH-8903 Birmensdorf Switzerland Phone: +41-1-739-2693 Fax: +41-1-739-2215 Email: [EMAIL PROTECTED] http://www.wsl.ch/staff/majid.iravani/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.spss and encodings
Thomas Friedrichsmeier wrote: Hi! I'm having trouble with importing spss files containing non-ascii characters (R 2.4.1, debian linux, i386). To reproduce: Download the following file: http://statmath.wu-wien.ac.at/data/spss/de/comphomeneu.sav require (foreign) Sys.setlocale (locale=C) read.spss(comphomeneu.sav)$ARBEIT[1] # prints: # [1] im B\374ro # Levels: im B\374ro zuhause \374 of course is actually a u-umlaut. However, I guess in the C locale it's not expected to print as such. But now try this (use any UTF-8 locale you may have installed): Sys.setlocale (locale=de_DE.UTF-8) read.spss(comphomeneu.sav)$ARBEIT[1] # prints: # [1]Error in print.default(xx, quote = quote, ...) : #invalid multibyte string To me it looks, like read.spss () would probably need an encoding parameter, and / or some iconv () magic. Now, locale conversion always makes my head spin, so I thought I'd better post here, before calling this to be a bug in R. Two questions: 1) Is there some way to work around this, i.e. make sure it is converted to proper UTF-8 while importing? Am I missing something obvious 2) Should I submit this as a bug report? 1) Yes, 2) No This is really not in read.spss, but in R itself. The short version is that in released versions, we have Im B\374ro [1]Error: invalid multibyte string which is indeed a buglet, since it is not good if you cannot output what you can input (notice that there is no problem until you try to print). In r-devel, this has become Im B\374ro [1] Im B\xfcro so that invalid multibytes at least do not cause error. However, the real issue is that the string is in the wrong encoding for your locale, so you should convert it: iconv(Im B\xfcro, from=latin1, to=UTF-8) [1] Im Büro iconv(Im B\374ro,from=latin1, to=UTF-8) [1] Im Büro -p -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory-efficient column aggregation of a sparse matrix
On 1/31/07, Jon Stearley [EMAIL PROTECTED] wrote: I need to sum the columns of a sparse matrix according to a factor - ie given a sparse matrix X and a factor fac of length ncol(X), sum the elements by column factors and return the sparse matrix Y of size nrow(X) by nlevels(f). The appended code does the job, but is unacceptably memory-bound because tapply() uses a non-sparse representation. Can anyone suggest a more memory and cpu efficient approach? Eg, a sparse matrix tapply method? Thanks. This is the sort of operation that is much more easily performed in the triplet representation of a sparse matrix where each nonzero element is represented by its row index, column index and value. Using that representation you could map the column indices according to the factor then convert back to one of the other representations. The only question would be what to do about nonzeros in different columns of the original matrix that get mapped to the same element in the result. It turns out that in the sparse matrix code used by the Matrix package the triplet representation allows for duplicate index positions with the convention that the resulting value at a position is the sum of the values of any triplets with that index pair. If you decide to use this approach please be aware that the indices for the triplet representation in the Matrix package are 0-based (as in C code) not 1-based (as in R code). (I imagine that Martin is thinking we really should change that as he reads this part.) -- +--+ | Jon Stearley (505) 845-7571 (FAX 844-9297) | | Sandia National Laboratories Scalable Systems Integration | +--+ # x and y are of SparseM class matrix.csr aggregate.csr - function(x, fac) { # make a vector indicating the row of each nonzero rows - integer(length=length([EMAIL PROTECTED])) [EMAIL PROTECTED]:nrow(x)]] - 1 # put a 1 at start of each row rows - as.integer(cumsum(rows)) # and finish with a cumsum # make a vector indicating the column factor of each nonzero f - [EMAIL PROTECTED] # aggregate by row,f y - tapply([EMAIL PROTECTED], list(rows,f), sum) # sparsify it y[is.na(y)] - 0 # change tapply NAs to as.matrix.csr 0s y - as.matrix.csr(y) y } __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matrix of matrices
For the case someone is interested in it, here it is the solution somebody suggested me: to use a list. imp - vector(list, 100) imp[[1]] - im[1:500,] names(imp[[1]]) = the list of labels of imp[1:500,] Thanks! Federico Federico Abascal wrote: Dear all, it is likely a stupid question but I cannot solve it. I want to have a matrix of 100 elements. Each element must be a vector of 500 elements. If I do: imp-array(dim=100) imp[1]-vector(length=500) it does not work. Warning message: number of items to replace is not a multiple of replacement length If I do: imp - array(dim=c(100,500)) and then fill imp: for(i in c(1:500)) { imp[i,] - im[1:500,] #im[1:500,] is a vector of length 500, of class numeric. IT CONTAINS NAMES! } Now it works, but I loose the labels (names) associated to the original im variable. If I just do: j- im[1:500,] I do not loose the labels. names(j) = list of labels names(imp[1,]) = NULL Any clue? Thanks in advance! Federico __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] features of save and save.image (unexpected file sizes)
Hi, On 2/1/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Thu, 1 Feb 2007, Vaidotas Zemlys wrote: Hi, On 1/31/07, Professor Brian Ripley [EMAIL PROTECTED] wrote: Two comments: 1) ls() does not list all the objects: it has all.names argument. Yes, I tried it with all.names, but the effect was the same, I forgot to mention it in a letter. 2) save.image() does not just save the objects in the workspace, it also saves any environments they may have. Having a function with a large environment is the usual cause of a large saved image. I have little experience dealing with enivronments, so is there a quick way to discard the environments of the functions? When saving the session I really do not need them. Change, not discard. E.g. environment(f) - .GlobalEnv. If environments are not mentioned by anything saved, they will not be saved. I found the culprit. I was parsing formulas in my code, and I saved them in that large object. So the environment came with saved formulas. Is there a nice way to say R: please do not save the environments with the formulas, I do not need them? This is what I was doing (I am discarding irrelevant code) testf- function(formula) { mainform - formula if(deparse(mainform[[3]][[1]])!=|) pandterm(invalid conditioning for main regression) mmodel - substitute(y~x,list(y=mainform[[2]],x=mainform[[3]][[2]])) mmodel - as.formula(mmodel) list(formula=list(main=mmodel)) } when called bu - testf(lnp~I(CE/12000)+hhs|Country) I get ls(env=environment(bu$formula$main)) [1] formula mainform mmodel or in actual case, a lot of more objects, which I do not need, but which take a lot of place. For the moment I solved the problem with environment(mmodel) - NULL but is this correct R way? Vaidotas Zemlys -- Doctorate student, http://www.mif.vu.lt/katedros/eka/katedra/zemlys.php Vilnius University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory-efficient column aggregation of a sparse matrix
Doug is right, I think, that this would be easier with full indexing using the matrix.coo classe, if you want to use SparseM. But then the tapply seems to be the way to go. url:www.econ.uiuc.edu/~rogerRoger Koenker email[EMAIL PROTECTED]Department of Economics vox: 217-333-4558University of Illinois fax: 217-244-6678Champaign, IL 61820 On Feb 1, 2007, at 7:22 AM, Douglas Bates wrote: On 1/31/07, Jon Stearley [EMAIL PROTECTED] wrote: I need to sum the columns of a sparse matrix according to a factor - ie given a sparse matrix X and a factor fac of length ncol(X), sum the elements by column factors and return the sparse matrix Y of size nrow(X) by nlevels(f). The appended code does the job, but is unacceptably memory-bound because tapply() uses a non-sparse representation. Can anyone suggest a more memory and cpu efficient approach? Eg, a sparse matrix tapply method? Thanks. This is the sort of operation that is much more easily performed in the triplet representation of a sparse matrix where each nonzero element is represented by its row index, column index and value. Using that representation you could map the column indices according to the factor then convert back to one of the other representations. The only question would be what to do about nonzeros in different columns of the original matrix that get mapped to the same element in the result. It turns out that in the sparse matrix code used by the Matrix package the triplet representation allows for duplicate index positions with the convention that the resulting value at a position is the sum of the values of any triplets with that index pair. If you decide to use this approach please be aware that the indices for the triplet representation in the Matrix package are 0-based (as in C code) not 1-based (as in R code). (I imagine that Martin is thinking we really should change that as he reads this part.) -- +--+ | Jon Stearley (505) 845-7571 (FAX 844-9297) | | Sandia National Laboratories Scalable Systems Integration | +--+ # x and y are of SparseM class matrix.csr aggregate.csr - function(x, fac) { # make a vector indicating the row of each nonzero rows - integer(length=length([EMAIL PROTECTED])) [EMAIL PROTECTED]:nrow(x)]] - 1 # put a 1 at start of each row rows - as.integer(cumsum(rows)) # and finish with a cumsum # make a vector indicating the column factor of each nonzero f - [EMAIL PROTECTED] # aggregate by row,f y - tapply([EMAIL PROTECTED], list(rows,f), sum) # sparsify it y[is.na(y)] - 0 # change tapply NAs to as.matrix.csr 0s y - as.matrix.csr(y) y } __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Index mapping on arrays
Dear R-community, I have some trouble with index mappings for arrays. If, for example, I have the array R A - array(1:9, c(3,3,2)) and two index mappings both of same size R x - c(2, 3) R y - c(1, 2) Now I want to access the elements (A[1, x[i], y[i]])_i of A, i.e. A[1, x[1], y[1]] = A[1, 2, 1] and A[1, x[2], y[2]] = A[1, 3, 2]. If I use R A[1, x, y] I would get every combinations of indices of all elements of x and y i.e. A[1, x[1], y[1]], A[1, x[1], y[2]], A[1, x[2], y[1]] and A[1, x[2], y[2]]. But how can I access the elements (A[1, x[i], y[i]])_i. My arrays dimensions are actually large in the second component (for example the dimension might be 10*1*10) so I'm looking for a method avoiding loops. The question is probably trivial for you, but I just could not figure it out. So sorry for bugging you and many thanks in advance for any help. Best wishes, Demi Anderson. -- Feel free - 10 GB Mailbox, 100 FreeSMS/Monat ... __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wiki for Graphics tips for MacOS X
Gabor Grothendieck ggrothendieck at gmail.com writes: To get the best results you need to transfer it using vector graphics rather than bitmapped graphics: http://www.stc-saz.org/resources/0203_graphics.pdf There are a number of variations described here (see entire thread). Its for UNIX and Windows but I think it would likely work similarly on Mac and Windows: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32297.html yes, but: the whole point of this discussion was that there _is_ no vector format that anyone knows of that (1) can be reliably created on MacOS using R/open source tools and (2) can be reliably imported into MS Word (with working preview etc.). The thread you reference assumes that one has a Windows machine handy (with or without R installed) for creating WMF graphics. Hence the advice to create a high-resolution PNG, which seems to work well enough even if it is not optimal. cheers Ben Bolker __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] any implementations for adaptive modeling of time series?
Hi Peter, generally speaking, wavelets are known to be good at extracting signal from noisy data and are adaptive but I am not familiar with any R implementation of wavelets. A simple way of looking at changes would be to use CUSUM (strucchange package). I hope this helps. Ansel. On 1/30/07, Peter Nimda [EMAIL PROTECTED] wrote: Hallo, my noisy time series represent a fading signal comprising of long enough parts with a simple trend inside of each such a part. Transition from one part into another is always a non-smooth and very sharp/acute. In other words I have a piecewise polynomial noisy curve asymptotically converging to the biased constant, points between pieces are non-differentiable. I am looking for implementations of models adequate for such a data. Are there any possibilities to adapt the ARIMA or MCMC? Many thanks in advance for any help/URLs __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Estimation of discrete unimodal density
Wessel van Wieringen wrote: A method for the estimation is univariate unimodal densities (with unknown mode) is described in Statistical Inference under Order Restrictions by Barlow et al.. Would anyone know whether there is an R-implementation (preferably with reference) for the estimation of univariate discrete unimodal densities (with unknown mode)? Thanks in advance for your help. You could have a look at my ``isotonic'' package. Go to: http://www.math.unb.ca/~rolf/Research/Packages/ Click on ``gzipped tar file for R'' under ``isotonic''. cheers, Rolf Turner [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Index mapping on arrays
probably you want something like the following: A[cbind(rep(1, length(x)), x, y)] I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Demi Anderson [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Thursday, February 01, 2007 2:34 PM Subject: [R] Index mapping on arrays Dear R-community, I have some trouble with index mappings for arrays. If, for example, I have the array R A - array(1:9, c(3,3,2)) and two index mappings both of same size R x - c(2, 3) R y - c(1, 2) Now I want to access the elements (A[1, x[i], y[i]])_i of A, i.e. A[1, x[1], y[1]] = A[1, 2, 1] and A[1, x[2], y[2]] = A[1, 3, 2]. If I use R A[1, x, y] I would get every combinations of indices of all elements of x and y i.e. A[1, x[1], y[1]], A[1, x[1], y[2]], A[1, x[2], y[1]] and A[1, x[2], y[2]]. But how can I access the elements (A[1, x[i], y[i]])_i. My arrays dimensions are actually large in the second component (for example the dimension might be 10*1*10) so I'm looking for a method avoiding loops. The question is probably trivial for you, but I just could not figure it out. So sorry for bugging you and many thanks in advance for any help. Best wishes, Demi Anderson. -- Feel free - 10 GB Mailbox, 100 FreeSMS/Monat ... __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prop.test() references
On Thu, 2007-02-01 at 07:22 +0100, Jean lobry wrote: Dear R-help, I'm using prop.test() to compute a confidence interval for a proportion under R version 2.4.1, as in: prop.test(x = 340, n = 400)$conf [1] 0.8103309 0.8827749 I have two questions: 1) from the source code my understanding is that the confidence interval is computed according to Wilson, E.B. (1927) Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc., 22:209-212. Is it correct? Yes. 2) The doc says Continuity correction is used only if it does not exceed the difference between sample and null proportions in absolute value. Does someone has a reference in which this point is discussed? I believe that this is a modification by Newcombe. See: Newcombe RG: Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine 1998;17:857-872. Newcombe RG: Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine 1998;17:873-890. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wiki for Graphics tips for MacOS X
On 2/1/07, Ben Bolker [EMAIL PROTECTED] wrote: Gabor Grothendieck ggrothendieck at gmail.com writes: To get the best results you need to transfer it using vector graphics rather than bitmapped graphics: http://www.stc-saz.org/resources/0203_graphics.pdf There are a number of variations described here (see entire thread). Its for UNIX and Windows but I think it would likely work similarly on Mac and Windows: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32297.html yes, but: the whole point of this discussion was that there _is_ no vector format that anyone knows of that (1) can be reliably created on MacOS using R/open source tools and (2) can be reliably imported into MS Word (with working preview etc.). The thread you reference assumes that one has a Windows machine handy (with or without R installed) for creating WMF graphics. Hence the advice to create a high-resolution PNG, which seems to work well enough even if it is not optimal. AFAIK there do exist tools for the Mac for fig graphics and that was one of the several solutions proposed there. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] what is the purpose of an error message in uniroot?
Matt, Some time back I didn't like the uniroot restriction either so I wrote a short function manyroots that breaks an interval into many shorter intervals and looks for a single root in each of them. This function is NOT guaranteed to find all roots in an interval even if you specify many subintervals. Graphing is always a good idea. There is always a chance the function dips to or below the axis and back up in an arbitrarily short interval. For example, f(x) = x^2. If one of your subintervals doesn't happen to end at 0, you're out of luck. (This is in contrast to uniroot working on a continuous function that is positive at one end of an interval and negative at the other end: at least one root is guaranteed and uniroot will find it given enough iterations.) Furthermore, if you are working with a polynomial, just use polyroot. Chris (Sorry for the dearth of code comments in the following) manyroots - function(f, interval, ints=1, maxlen=NULL, lower = min(interval), upper = max(interval), tol = .Machine$double.eps^0.25, maxiter = 1000, ...) { if (!is.numeric(lower) || !is.numeric(upper) || lower = upper) stop(lower upper is not fulfilled) if (is.infinite(lower) || is.infinite(upper)) stop(Interval must have finite length) if (!is.null(maxlen)) ints - ceiling((upper-lower)/maxlen) if (!is.numeric(ints) || length(ints)1 || floor(ints)!=ints || ints1) stop(ints must be positive integer) ends - seq(lower, upper, length=ints+1) fends - numeric(length(ends)) for (i in seq(along=ends)) fends[i] - f(ends[i], ...) zeros - iters - prec - rep(NA, ints) for (i in seq(ints)) { cat(i, ends[i], ends[i+1], fends[i], fends[i+1], \n) if (fends[i] * fends[i+1] 0) { #cat(f() values at end points not of opposite sign\n) next; } if (fends[i] == 0 i1) { #cat(this was found in previous iteration\n) next; } val - .Internal(zeroin(function(arg) f(arg, ...), ends[i], ends[i+1], tol, as.integer(maxiter))) if (as.integer(val[2]) == maxiter) { warning(Iteration limit (, maxiter, ) reached in interval (, ends[i], ,, ends[i+1], ).) } zeros[i] - val[1] iters[i] - val[2] prec[i] - val[3] } zeros - as.vector(na.omit(zeros)) fzeros - numeric(length(zeros)) for (i in seq(along=zeros)) fzeros[i] - f(zeros[i], ...) list(root = zeros, f.root = fzeros, iter = as.vector(na.omit(iters)), estim.prec = as.vector(na.omit(prec))) } gg - function(x) x*(x-1)*(x+1) manyroots(gg, c(-4,4), 13, maxiter=200, tol=10^-10) hh - function(x,x2) x^2-x2 manyroots(hh, c(-10, 10), maxlen=.178, x2=9) manyroots(sin, c(-4,20), maxlen=.01) #but ss - function(x) sin(x)^2 manyroots(ss, c(-4,20), maxlen=.01) plot(ss, -4,20) abline(h=0) From: [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: This is probably a blindingly obvious question: Yes, it is. Why does it matter in the uniroot function whether the f() values at the end points that you supply are of the same sign? Plot some graphs. Think about the *name* of the function --- *uni*root. Does that ring any bells? And how do you know there *is* a root in the interval in question? Try your ``uniroot2'' on f(x) = 1+x2 and the interval [-5,5]. To belabour the point --- if the f() values are of the same sign, then there are 0, or 2, or 4, or roots in the interval in question. Rolf, Only if f is continuous (of course finding roots of discontinuous functions is a greater challenge) The ***only chance*** you have of there being a unique root is if the f() values are of opposite sign. The algorithm used and the precision estimates returned presumably depend on the change of sign. You can get answers --- sometimes --- if the change of sign is not present, but the results could be seriously misleading. Without the opposite sign requirement the user will often wind up trying to do something impossible or getting results about which he/she is deluded. cheers, Rolf Turner [EMAIL PROTECTED] P. S. If the f() values are of the same sign, uniroot() DOES NOT give a warning! It gives an error. R. T. -- Christopher Andrews, PhD SUNY Buffalo, Department of Biostatistics 242 Farber Hall, [EMAIL PROTECTED], 716 829 2756 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible
Re: [R] How can I calculate conditional mean in a large dataset including date data
dfr-data.frame(day=c(1/1/1970,5/1/1970,5/12/2003,31/12/2003),temperature=c(1,-1,2,0.5)) dfr day temperature 1 1/1/1970 1.0 2 5/1/1970-1.0 3 5/12/2003 2.0 4 31/12/2003 0.5 aggregate(dfr[temperature],by=list(format(as.Date(dfr$day,format=%d/%m/%Y),%m-%Y)),mean,na.rm=TRUE) Group.1 temperature 1 01-19700.00 2 12-20031.25 aggregate(dfr[temperature],by=list(format(as.Date(dfr$Dt,format=%d/%m/%Y),%m)),mean,na.rm=TRUE) Group.1 temperature 1 010.00 2 121.25 Majid Iravani wrote: Dear R users, I have a dataframe with two columns: first column is date data (e.g. 1/1/2000 with character format: daily data from 1/1/1970 till 31/12/2003) and second column is temperature value. Now I'd like to calculate mean for each month in a year (i.e. May 2001, June 1997) and mean for each month in all of years. As the number of days in some months is different from others I could not write appreciate command for this. Therefore I would greatly appreciate if somebody can help me in this case Thank you Majid -- View this message in context: http://www.nabble.com/-R--How-can-I-calculate-conditional-mean-in-a-large-dataset-including-date-data-tf3154751.html#a8748821 Sent from the R help mailing list archive at Nabble.com. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.spss and encodings
On Thursday 01 February 2007 14:18, Peter Dalgaard wrote: so you should convert it: iconv(Im B\xfcro, from=latin1, to=UTF-8) [1] Im Büro iconv(Im B\374ro,from=latin1, to=UTF-8) [1] Im Büro I see. Thanks! Any chances of adding something like this to read.spss()? read.spss - function([...], encoding=NULL) { [...] if (!is.null(encoding)) { iconv.recursive - function(x, from) { attribs - attributes(x); if (is.character(x)) { x - iconv(x, from=from, to=, sub=) } else if (is.list(x)) { x - lapply(x, function(sub) iconv.recursive(sub, from)) } # convert factor levels and all other attributes attributes(x) - lapply(attribs, function(sub) iconv.recursive(sub, from)) x } convert.recursive(rval, from=encoding) } else { rval } } Now that I've written this iconv.recursive() function once, I'm fine. But I guess something like this might be useful to others as well. Regards Thomas pgpzGuoLD95Bi.pgp Description: PGP signature __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: Re: read.spss and encodings
--- John Kane [EMAIL PROTECTED] wrote: Date: Thu, 1 Feb 2007 09:07:11 -0500 (EST) From: John Kane [EMAIL PROTECTED] Subject: Re: [R] read.spss and encodings To: Thomas Friedrichsmeier [EMAIL PROTECTED] Hi Thomas, I am using R 2.4.1 on WindowsXP and I don't seem to be having any problem, im Büro, and zuhause are coming in just fine in a 200 line dataset. I have imported it with both read.spss and spss.get (package Hmisc) with no problems. I am afraid I have no idea what the problem is but it does not seem to be specifically an R problem --- Thomas Friedrichsmeier [EMAIL PROTECTED] wrote: Hi! I'm having trouble with importing spss files containing non-ascii characters (R 2.4.1, debian linux, i386). To reproduce: Download the following file: http://statmath.wu-wien.ac.at/data/spss/de/comphomeneu.sav require (foreign) Sys.setlocale (locale=C) read.spss(comphomeneu.sav)$ARBEIT[1] # prints: # [1] im B\374ro # Levels: im B\374ro zuhause \374 of course is actually a u-umlaut. However, I guess in the C locale it's not expected to print as such. But now try this (use any UTF-8 locale you may have installed): Sys.setlocale (locale=de_DE.UTF-8) read.spss(comphomeneu.sav)$ARBEIT[1] # prints: # [1]Error in print.default(xx, quote = quote, ...) : #invalid multibyte string To me it looks, like read.spss () would probably need an encoding parameter, and / or some iconv () magic. Now, locale conversion always makes my head spin, so I thought I'd better post here, before calling this to be a bug in R. Two questions: 1) Is there some way to work around this, i.e. make sure it is converted to proper UTF-8 while importing? Am I missing something obvious? 2) Should I submit this as a bug report? Thanks! Thomas Friedrichsmeier __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ Do You Yahoo!? protection around http://mail.yahoo.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I calculate conditional mean in a large dataset including date data
days - seq(as.Date(1970/1/1), as.Date(2003/12/31), days) temp - rnorm(length(days), mean=10, sd=8) tapply(temp, format(days,%Y-%m), mean) tapply(temp, format(days,%b), mean) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Majid Iravani Sent: Thursday, February 01, 2007 8:11 AM To: r-help@stat.math.ethz.ch Subject: [R] How can I calculate conditional mean in a large dataset including date data Dear R users, I have a dataframe with two columns: first column is date data (e.g. 1/1/2000 with character format: daily data from 1/1/1970 till 31/12/2003) and second column is temperature value. Now I'd like to calculate mean for each month in a year (i.e. May 2001, June 1997) and mean for each month in all of years. As the number of days in some months is different from others I could not write appreciate command for this. Therefore I would greatly appreciate if somebody can help me in this case Thank you Majid -- -- Majid Iravani PhD Student Swiss Federal Research Institute WSL Research Group of Vegetation Ecology Zürcherstrasse 111 CH-8903 Birmensdorf Switzerland Phone: +41-1-739-2693 Fax: +41-1-739-2215 Email: [EMAIL PROTECTED] http://www.wsl.ch/staff/majid.iravani/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting part of date variable
Thank you Peter. It was not my question but I was just about to start the morning's work by searching Help and RSiteSearch() for this exact question. --- Peter Dalgaard [EMAIL PROTECTED] wrote: stat stat wrote: Dear all, Suppose I have a date variable: c = 99/05/12 I want to extract the parts of this date like month number, year and day. I can do it in SPSS. Is it possible to do this in R as well? Rgd, Yes. One way is to use substr(), e.g.: substr(c,1,2) [1] 99 as.numeric(substr(c,1,2)) [1] 99 This also nicely sidesteps the ambiguity issue: 1999 or 1899? May or December? On the other hand, you'll get in trouble if leading zeros are sometimes absent (strsplit() or gsub() if you want to pursue that route further). For a more principled approach, use the time and date handling tools. Assuming that you can live with the system defaults for 2-digit years, strptime(c,format=%y/%m/%d) [1] 1999-05-12 strptime(c,format=%y/%m/%d)$year [1] 99 strptime(c,format=%y/%m/%d)$mon [1] 4 strptime(c,format=%y/%m/%d)$mday [1] 12 Beware the peculiarities of the entries defined by POSIX standard, see ?DateTimeClasses, and also: '%y' Year without century (00-99). If you use this on input, which century you get is system-specific. So don't! Often values up to 69 (or 68) are prefixed by 20 and 70(or 69) to 99 by 19. (I'm at a bit of a loss as to fixing up two digit years once the damage has been done. Presumably, you can just diddle the year field, but I'm a bit uneasy about the fact that 2000 was a leap year and 1900 was not.) -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Index mapping on arrays
Hello Demi The trick for array indexing on an array A, where length(dim(A))==n, is to use an n-column matrix M to extract the elements by rows of M. If I understand correctly, the following should help: A - array(1:18,c(3,3,2)) x - 2:3 y - 1:2 A[cbind(x,y,1)] [1] 2 6 You may find the following useful too: A[as.matrix(cbind(1,expand.grid(x,y)))] [1] 4 7 13 16 best rksh On 1 Feb 2007, at 13:34, Demi Anderson wrote: Dear R-community, I have some trouble with index mappings for arrays. If, for example, I have the array R A - array(1:9, c(3,3,2)) and two index mappings both of same size R x - c(2, 3) R y - c(1, 2) Now I want to access the elements (A[1, x[i], y[i]])_i of A, i.e. A[1, x[1], y[1]] = A[1, 2, 1] and A[1, x[2], y[2]] = A[1, 3, 2]. If I use R A[1, x, y] I would get every combinations of indices of all elements of x and y i.e. A[1, x[1], y[1]], A[1, x[1], y[2]], A[1, x[2], y[1]] and A[1, x[2], y[2]]. But how can I access the elements (A[1, x[i], y[i]])_i. My arrays dimensions are actually large in the second component (for example the dimension might be 10*1*10) so I'm looking for a method avoiding loops. The question is probably trivial for you, but I just could not figure it out. So sorry for bugging you and many thanks in advance for any help. Best wishes, Demi Anderson. -- Feel free - 10 GB Mailbox, 100 FreeSMS/Monat ... __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. -- Robin Hankin Uncertainty Analyst National Oceanography Centre, Southampton European Way, Southampton SO14 3ZH, UK tel 023-8059-7743 -- This e-mail (and any attachments) is confidential and intend...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] xtable and column headings
When I generate a LaTeX table using xtable I have been setting column names to strings with LaTeX code in order to get features like subscripts in the column headings. I recently had to reinstall xtable and discovered that all my LaTeX column headings were printing out in LaTeX code rather than with LaTeX formatting. For example, with the older xtable I could give my column a name something like $A_b$ to get printed column heading of A with the subscript b. Now my printed column heading is $A_b$ and the LaTeX code in the .tex file generated by Sweave is \$A\_b\$. It seems that the newest version of print.xtable takes all my LaTeX special characters and inserts backslashes, making LaTeX print the special characters rather than interpreting them. Is there a way to keep xtable from fixing my column names like this? Is there another (maybe better) way to get nicely LaTeX formatted column headings from xtable? Thanks, Ian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I calculate conditional mean in a large dataset including date data
You could also use aggregate with the zoo package. Using the same input data that Vladimir used, create a zoo variable and aggregate it: library(zoo) z - zoo(dfr[,2], as.Date(dfr[,1], %d/%m/%Y)) aggregate(z, as.yearmon, mean) Jan 1970 Dec 2003 0.00 1.25 zoo is described in the vignette: library(zoo) vignette(zoo) On 2/1/07, Vladimir Eremeev [EMAIL PROTECTED] wrote: dfr-data.frame(day=c(1/1/1970,5/1/1970,5/12/2003,31/12/2003),temperature=c(1,-1,2,0.5)) dfr day temperature 1 1/1/1970 1.0 2 5/1/1970-1.0 3 5/12/2003 2.0 4 31/12/2003 0.5 aggregate(dfr[temperature],by=list(format(as.Date(dfr$day,format=%d/%m/%Y),%m-%Y)),mean,na.rm=TRUE) Group.1 temperature 1 01-19700.00 2 12-20031.25 aggregate(dfr[temperature],by=list(format(as.Date(dfr$Dt,format=%d/%m/%Y),%m)),mean,na.rm=TRUE) Group.1 temperature 1 010.00 2 121.25 Majid Iravani wrote: Dear R users, I have a dataframe with two columns: first column is date data (e.g. 1/1/2000 with character format: daily data from 1/1/1970 till 31/12/2003) and second column is temperature value. Now I'd like to calculate mean for each month in a year (i.e. May 2001, June 1997) and mean for each month in all of years. As the number of days in some months is different from others I could not write appreciate command for this. Therefore I would greatly appreciate if somebody can help me in this case Thank you Majid -- View this message in context: http://www.nabble.com/-R--How-can-I-calculate-conditional-mean-in-a-large-dataset-including-date-data-tf3154751.html#a8748821 Sent from the R help mailing list archive at Nabble.com. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] line plot
Hi, or otherwise you may try: plot(c(1,5), c(1,10),type=l) kindest regard, Rense On Feb 1, 2007, at 8:14 , Petr Pikal wrote: Hi see ?segments segments(1,10,5,10) HTH Petr On 1 Feb 2007 at 14:21, XinMeng wrote: From: XinMeng [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Date sent:Thu, 01 Feb 2007 14:21:34 +0800 Subject: [R] line plot Send reply to:XinMeng [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] Hello sir: I wanna get such kind of plot: a line whose start point is(1,10),end point is(5,10) In other words: How can I draw a line if I only know the coordinate of the start point and end point? Thanks! My best __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Petr Pikal [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Need help writing a faster code
Hi, I apologize for this repeat posting, which I first posted yesterday. I would appreciate any hints on solving this problem: I have two matrices A (m x 2) and B (n x 2), where m and n are large integers (on the order of 10^4). I am looking for an efficient way to create another matrix, W (m x n), which can be defined as follows: for (i in 1:m){ for (j in 1:n) { W[i,j] - g(A[i,], B[j,]) } } where g(x,y) is a function that takes two vectors and returns a scalar. The following works okay, but is not fast enough for my purpose. I am sure that I can do better: for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } How can I do this in a faster manner? I attempted outer, kronecker, expand.grid, etc, but with no success. Here is an example: m - 2000 n - 5000 A - matrix(rnorm(2*m),ncol=2) B - matrix(rnorm(2*n),ncol=2) W - matrix(NA, m, n) for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } g - function(x,y){ theta - atan((y[2]-x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } Thanks for any suggestions. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mca-graphics: all elements overlapping in the help-example for multiple correspondence analysis
Thank you very much, that works fine! I now realize that I should have looked up not only the help pages for plot and mca but also for plot.mca, which I did not think possible (unfortunately I am a too sporadic user to know where to get the appropriate information). Best regards, Michael -Ursprüngliche Nachricht- Von: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 31. Januar 2007 10:19 An: Michael Reinecke Cc: r-help@stat.math.ethz.ch Betreff: Re: [R] mca-graphics: all elements overlapping in the help-example for multiple correspondence analysis On Wed, 31 Jan 2007, Michael Reinecke wrote: Dear all, I tried out the example in the help document for mca (the multiple correspondence analysis of the MASS package): farms.mca - mca(farms, abbrev=TRUE) farms.mca plot(farms.mca) But the graphic that I get seems unfeasible to me: I cannot recognize the numbers (printed in black) because they are all overlapping and concealing each other. I don ´t dare using my own data, which consist of several hundred cases - I guess I won ´t see anything. How can I solve this? Thank you for any idea! Some levels do overplot, as they are identical (this is an unusual example). But as you see in the book, not many, and you can adjust pointsize of your device or 'cex' to mitigate the problem. Plotting the rows is optional: see the help page. I would not recommend plotting rows for several hundred cases. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading functions in R
Hi Jeff, The way I do this is to place all the options that I want, along with functions I've written that I always want available, into the Rprofile.site file. R always loads this file upon startup. That file is located in the etc/ folder. E.g., on my computer, it is at: C:\Program Files\R\R-2.2.1\etc\Rprofile.site . This is explained in section 10.8 of the R-intro.pdf manual that comes with R. If you only want the functions available sometimes, then use Christos' suggestions. -- Matt On 1/31/07, Christos Hatzis [EMAIL PROTECTED] wrote: The recommended approach is to make a package for your functions that will include documentation, error checks etc. Another way to accomplish what you want is to start a new R session and 'source' your .R files and then to save the workspace in a .RData file, e.g. myFunctions.RData. Finally attach(myFunctions.RData) should do the trick without cluttering your workspace. -Christos Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Forest Floor Sent: Wednesday, January 31, 2007 10:41 PM To: r-help@stat.math.ethz.ch Subject: [R] Loading functions in R Hi all, This information must be out there, but I can't seem to find it. What I want to do is to store functions I've created (as .R files or in whatever form) and then load them when I need them (or on startup) so that I can access without cluttering my program with the function code. This seems like it should be easy, but Thanks! Jeff __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need help writing a faster code
the following seems to be a first improvement: m - 2000 n - 5000 A - matrix(rnorm(2*m), ncol=2) B - matrix(rnorm(2*n), ncol=2) W1 - W2 - matrix(0, m, n) ## ## g1 - function(x, y){ theta - atan((y[2] - x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } invisible({gc(); gc()}) system.time(for (i in 1:m) { W1[i, ] - apply(B, 1, y = A[i,], function(x, y) g1(y, x)) }) ## g2 - function(x){ out - tB - x theta - atan(out[2, ] / out[1, ]) theta + 2*pi*(theta 0) } tB - t(B) invisible({gc(); gc()}) system.time(for (i in 1:m) { W2[i, ] - g2(A[i, ]) }) ## or invisible({gc(); gc()}) system.time(W3 - t(apply(A, 1, g2))) all.equal(W1, W2) all.equal(W1, W3) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Ravi Varadhan [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Thursday, February 01, 2007 4:10 PM Subject: [R] Need help writing a faster code Hi, I apologize for this repeat posting, which I first posted yesterday. I would appreciate any hints on solving this problem: I have two matrices A (m x 2) and B (n x 2), where m and n are large integers (on the order of 10^4). I am looking for an efficient way to create another matrix, W (m x n), which can be defined as follows: for (i in 1:m){ for (j in 1:n) { W[i,j] - g(A[i,], B[j,]) } } where g(x,y) is a function that takes two vectors and returns a scalar. The following works okay, but is not fast enough for my purpose. I am sure that I can do better: for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } How can I do this in a faster manner? I attempted outer, kronecker, expand.grid, etc, but with no success. Here is an example: m - 2000 n - 5000 A - matrix(rnorm(2*m),ncol=2) B - matrix(rnorm(2*n),ncol=2) W - matrix(NA, m, n) for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } g - function(x,y){ theta - atan((y[2]-x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } Thanks for any suggestions. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need help writing a faster code
Hi A - matrix(runif(10),ncol=2) B - matrix(runif(10),ncol=2) g - function(i4){theta - atan2( (i4[4]-i4[2]),(i4[3]-i4[1])) + return(theta + 2*pi*(theta0))} apply(A,1,function(x){apply(B,1,function(y){g(c(x,y))})}) [,1] [,2] [,3][,4] [,5] [1,] 1.1709326 2.6521457 3.857477 0.219274562 1.2948374 [2,] 1.1770919 4.2109056 4.057313 4.918552967 1.9733967 [3,] 0.9171661 0.6721475 4.193675 0.434253839 0.9781060 [4,] 0.9181475 0.6911804 4.213295 0.455127422 0.9771797 [5,] 1.0467449 4.9263243 3.983248 0.004371504 1.1693707 HTH rksh On 1 Feb 2007, at 15:10, Ravi Varadhan wrote: Hi, I apologize for this repeat posting, which I first posted yesterday. I would appreciate any hints on solving this problem: I have two matrices A (m x 2) and B (n x 2), where m and n are large integers (on the order of 10^4). I am looking for an efficient way to create another matrix, W (m x n), which can be defined as follows: for (i in 1:m){ for (j in 1:n) { W[i,j] - g(A[i,], B[j,]) } } where g(x,y) is a function that takes two vectors and returns a scalar. The following works okay, but is not fast enough for my purpose. I am sure that I can do better: for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } How can I do this in a faster manner? I attempted outer, kronecker, expand.grid, etc, but with no success. Here is an example: m - 2000 n - 5000 A - matrix(rnorm(2*m),ncol=2) B - matrix(rnorm(2*n),ncol=2) W - matrix(NA, m, n) for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } g - function(x,y){ theta - atan((y[2]-x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } Thanks for any suggestions. Best, Ravi. -- -- --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/ Varadhan.html -- -- [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. -- Robin Hankin Uncertainty Analyst National Oceanography Centre, Southampton European Way, Southampton SO14 3ZH, UK tel 023-8059-7743 -- This e-mail (and any attachments) is confidential and intend...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.spss and encodings
On Thu, 1 Feb 2007, Thomas Friedrichsmeier wrote: Hi! I'm having trouble with importing spss files containing non-ascii characters Peter has explained what is going on. It would be ideal for read.spss() to do the translation to the current locale. This would require knowing what encoding the SPSS file is using. I think it is always a one-byte encoding and in your case it is apparently Latin-1, but I don't know if this is always the case, or how to tell which encoding it uses. -thomas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xtable and column headings
I would use latex() in the Hmsic package. Here is a short example tmp - matrix(1:12,4) library(Hmisc) tmp.latex - latex(tmp, colheads=c(abc$_1$,def$^{12}_4$,$g\\times h$)) ## note the escaped \ in the above colheads vector print.default(tmp.latex) Copy the contents of the file referenced in tmp.latex to your real myfile.tex file. There are about a zillion optional arguments to latex() that give you very fine control over the appearance of the typeset object. See ?latex __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [lattice] levelplot for 2D density plots
Hello all, I'm trying to use the levelplot lattice function and can not adapt it to my tastes concering colors: dens - data.frame(x=c(), y=c(), z=c(), run=c()) for(l in levels(degCorrel$run)) { ind - degCorrel$run == l dk - kde2d(log10(degCorrel$correlFunc[ind]), log10(degCorrel $correlFunc.ref[ind]), n=50) dt - cbind(con2tr(dk), run=l) dt$z - dt$z/sum(dt$z) dens - rbind(dens, dt) } dens$run - factor(dens$run) levelplot(z ~ x *y | run, data=dens) However, I need to adjust the cuts for every panel differently since the scales are very different. I know, that this is not a very good practice, but anyway, how can I do it? Any help is greatly appreciated. Thanks in advance, Sebastian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] indexing without looping
Hello, I've got a data.frame like this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 and a long vector of classes like this: x - c(1,1,2,7,6,5,4,3,2,2,2...) And would like to obtain a vector of length = length(x), with the corresponding values extracted from assignation table. Like this: x.value [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 Could you help me with an elegant way to do this ? (I just can do it with looping for each class in the assignation table, what a think is not perfect in R's sense) Wishes, Javier -- Javier García-Pintado Institute of Earth Sciences Jaume Almera (CSIC) Lluis Sole Sabaris s/n, 08028 Barcelona Phone: +34 934095410 Fax: +34 934110012 e-mail:[EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need help writing a faster code
Thank you, Dimitris and Robin. Dimitris - your solution(s) works very well. Although my g function is a lot more complicated than that in the simple example that I gave, I think that I can use your idea of taking the whole matrix inside the function and working directly with it. Robin - using two applys doesn't make the code any faster, it just produces a compact one-liner. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Dimitris Rizopoulos [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 10:33 AM To: Ravi Varadhan Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Need help writing a faster code the following seems to be a first improvement: m - 2000 n - 5000 A - matrix(rnorm(2*m), ncol=2) B - matrix(rnorm(2*n), ncol=2) W1 - W2 - matrix(0, m, n) ## ## g1 - function(x, y){ theta - atan((y[2] - x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } invisible({gc(); gc()}) system.time(for (i in 1:m) { W1[i, ] - apply(B, 1, y = A[i,], function(x, y) g1(y, x)) }) ## g2 - function(x){ out - tB - x theta - atan(out[2, ] / out[1, ]) theta + 2*pi*(theta 0) } tB - t(B) invisible({gc(); gc()}) system.time(for (i in 1:m) { W2[i, ] - g2(A[i, ]) }) ## or invisible({gc(); gc()}) system.time(W3 - t(apply(A, 1, g2))) all.equal(W1, W2) all.equal(W1, W3) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Ravi Varadhan [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Thursday, February 01, 2007 4:10 PM Subject: [R] Need help writing a faster code Hi, I apologize for this repeat posting, which I first posted yesterday. I would appreciate any hints on solving this problem: I have two matrices A (m x 2) and B (n x 2), where m and n are large integers (on the order of 10^4). I am looking for an efficient way to create another matrix, W (m x n), which can be defined as follows: for (i in 1:m){ for (j in 1:n) { W[i,j] - g(A[i,], B[j,]) } } where g(x,y) is a function that takes two vectors and returns a scalar. The following works okay, but is not fast enough for my purpose. I am sure that I can do better: for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } How can I do this in a faster manner? I attempted outer, kronecker, expand.grid, etc, but with no success. Here is an example: m - 2000 n - 5000 A - matrix(rnorm(2*m),ncol=2) B - matrix(rnorm(2*n),ncol=2) W - matrix(NA, m, n) for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } g - function(x,y){ theta - atan((y[2]-x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } Thanks for any suggestions. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] when i configure the R 2.4.1,i meet the problem
i get the message : configure: WARNING: you cannot build info or html versions of the R manuals how to deal with it ? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] indexing without looping
One way would be to use merge, like this: merge(assignation,data.frame(class=x),all.y=TRUE) There might well be better ways... On 01/02/07, javier garcia-pintado [EMAIL PROTECTED] wrote: Hello, I've got a data.frame like this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 and a long vector of classes like this: x - c(1,1,2,7,6,5,4,3,2,2,2...) And would like to obtain a vector of length = length(x), with the corresponding values extracted from assignation table. Like this: x.value [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 Could you help me with an elegant way to do this ? (I just can do it with looping for each class in the assignation table, what a think is not perfect in R's sense) Wishes, Javier -- Javier García-Pintado Institute of Earth Sciences Jaume Almera (CSIC) Lluis Sole Sabaris s/n, 08028 Barcelona Phone: +34 934095410 Fax: +34 934110012 e-mail:[EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- = David Barron Said Business School University of Oxford Park End Street Oxford OX1 1HP __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] indexing without looping
Hi, xt - assignation$value[match(x,assignation$class)] HTH ido __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] indexing
a - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) x - c(1,1,2,7,6,5,4,3,2,2,2) match(x, a$class) [1] 1 1 4 NA NA 3 NA 2 4 4 4 a[match(x, a$class), value] [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 -- Tony Plate javier garcia-pintado wrote: Hello, In a nutshell, I've got a data.frame like this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 and a long vector of classes like this: x - c(1,1,2,7,6,5,4,3,2,2,2...) And would like to obtain a vector of length = length(x), with the corresponding values extracted from assignation table. Like this: x.value [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 Could you help me with an elegant way to do this ? (I just can do it with looping for each class in the assignation table, what a think is not perfect in R's sense) Wishes, Javier __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need help writing a faster code
Dear Dimitris, I implemented your solution on my actual problem. I was able to generate my large transition matrix in 56 seconds, compared to the previous time of around 27 minutes. Wow!!! I thank you very much for the help. R and the R-user group are truly amazing! Best regards, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Dimitris Rizopoulos [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 10:33 AM To: Ravi Varadhan Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Need help writing a faster code the following seems to be a first improvement: m - 2000 n - 5000 A - matrix(rnorm(2*m), ncol=2) B - matrix(rnorm(2*n), ncol=2) W1 - W2 - matrix(0, m, n) ## ## g1 - function(x, y){ theta - atan((y[2] - x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } invisible({gc(); gc()}) system.time(for (i in 1:m) { W1[i, ] - apply(B, 1, y = A[i,], function(x, y) g1(y, x)) }) ## g2 - function(x){ out - tB - x theta - atan(out[2, ] / out[1, ]) theta + 2*pi*(theta 0) } tB - t(B) invisible({gc(); gc()}) system.time(for (i in 1:m) { W2[i, ] - g2(A[i, ]) }) ## or invisible({gc(); gc()}) system.time(W3 - t(apply(A, 1, g2))) all.equal(W1, W2) all.equal(W1, W3) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Ravi Varadhan [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Thursday, February 01, 2007 4:10 PM Subject: [R] Need help writing a faster code Hi, I apologize for this repeat posting, which I first posted yesterday. I would appreciate any hints on solving this problem: I have two matrices A (m x 2) and B (n x 2), where m and n are large integers (on the order of 10^4). I am looking for an efficient way to create another matrix, W (m x n), which can be defined as follows: for (i in 1:m){ for (j in 1:n) { W[i,j] - g(A[i,], B[j,]) } } where g(x,y) is a function that takes two vectors and returns a scalar. The following works okay, but is not fast enough for my purpose. I am sure that I can do better: for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } How can I do this in a faster manner? I attempted outer, kronecker, expand.grid, etc, but with no success. Here is an example: m - 2000 n - 5000 A - matrix(rnorm(2*m),ncol=2) B - matrix(rnorm(2*n),ncol=2) W - matrix(NA, m, n) for (i in 1:m) { W[i,] - apply(B, 1, y=A[i,], function(x,y) g(y,x)) } g - function(x,y){ theta - atan((y[2]-x[2]) / (y[1] - x[1])) theta + 2*pi*(theta 0) } Thanks for any suggestions. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] when i configure the R 2.4.1,i meet the problem
On 2/1/07, xiaopeng hu [EMAIL PROTECTED] wrote: i get the message : configure: WARNING: you cannot build info or html versions of the R manuals how to deal with it ? did you read the manual 'R Installation and Administration'? see section 2.2 there. it is mentioned there that you need makeinfo version 4.7 or later for the manuals in info format. (No, this is not circular; you don't have to build the manuals first to be able to read them. Check www.r-project.org - Manuals). I guess several other things are not available either on your computer. For example section 2.1 of the aforementioned manual points out that you need Perl 5. Hope this helps, Roland [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] would you please navigate me?
Hello My name is Aida Eslami. I am a M.S.c student of statistics at Shahid Beheshti University , Tehran, Iran. The subject of my thesis is Analysis of Masked Data. I have some problems in writing of my program (optimization). Would you please navigate me and introduce some to help me? Thank you. Yours sincerely Aida Eslami [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Losing factor levels when moving variables from one context to another
Hi, there I'm currently trying to figure out how to keep my factor levels for a variable when moving it from one data frame or matrix to another. Example below: vec1-(rep(10,5)) vec2-(rep(30,5)) vec3-(rep(80,5)) vecs-c(vec1, vec2, vec3) resp-rnorm(2,15) dat-as.data.frame(cbind(resp, vecs)) dat$vecs-factor(dat$vecs) dat R returns: resp vecs 1 1.57606068767956 10 2 2.30271782269308 10 3 2.39874788444542 10 40.963987738423353 10 5 2.03620782454740 10 6 -0.0706713324725649 30 7 1.49001721222926 30 8 2.00587718501980 30 90.450576585429981 30 102.87120375367357 30 112.25575058079324 80 122.03471288724508 80 132.67432066972984 80 141.74102136279177 80 152.29827581276955 80 and now: newvar-(rnorm(15,4)) newdat-as.data.frame(cbind(newvar, dat$vecs)) newdat R returns: newvar V2 1 4.300788 1 2 5.295951 1 3 5.099849 1 4 3.211045 1 5 3.703554 1 6 3.693826 2 7 5.314679 2 8 4.70 2 9 3.534515 2 10 4.037401 2 11 4.476808 3 12 4.842449 3 13 3.109677 3 14 4.752961 3 15 4.445216 3 I seem to have lost everything I once has associated with vecs, and it's turned my actual values into arbitrary groupings. I assume this has something to do with the behaviour of factors? Does anyone have any suggestions on how to get my original levels, etc., back? Cheers, Mike Michael Rennie Ph.D. Candidate, University of Toronto at Mississauga 3359 Mississauga Rd. N. Mississauga, ON L5L 1C6 Ph: 905-828-5452 Fax: 905-828-3792 www.utm.utoronto.ca/~w3rennie __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Losing factor levels when moving variables from one context to another
Michael Rennie wrote: Hi, there I'm currently trying to figure out how to keep my factor levels for a variable when moving it from one data frame or matrix to another. Example below: vec1-(rep(10,5)) vec2-(rep(30,5)) vec3-(rep(80,5)) vecs-c(vec1, vec2, vec3) resp-rnorm(2,15) dat-as.data.frame(cbind(resp, vecs)) dat$vecs-factor(dat$vecs) dat R returns: resp vecs 1 1.57606068767956 10 2 2.30271782269308 10 3 2.39874788444542 10 40.963987738423353 10 5 2.03620782454740 10 6 -0.0706713324725649 30 7 1.49001721222926 30 8 2.00587718501980 30 90.450576585429981 30 102.87120375367357 30 112.25575058079324 80 122.03471288724508 80 132.67432066972984 80 141.74102136279177 80 152.29827581276955 80 and now: newvar-(rnorm(15,4)) newdat-as.data.frame(cbind(newvar, dat$vecs)) newdat R returns: newvar V2 1 4.300788 1 2 5.295951 1 3 5.099849 1 4 3.211045 1 5 3.703554 1 6 3.693826 2 7 5.314679 2 8 4.70 2 9 3.534515 2 10 4.037401 2 11 4.476808 3 12 4.842449 3 13 3.109677 3 14 4.752961 3 15 4.445216 3 I seem to have lost everything I once has associated with vecs, and it's turned my actual values into arbitrary groupings. I assume this has something to do with the behaviour of factors? Does anyone have any suggestions on how to get my original levels, etc., back? It has more to do with the behavior of cbind(). Construct the data frame with data.frame() rather than the combination of as.data.frame() and cbind(). For example: vec1 - (rep(10,2)) vec2 - (rep(30,2)) vec3 - (rep(80,2)) vecs - c(vec1, vec2, vec3) resp - rnorm(6,2) dat - data.frame(resp, vecs) dat$vecs - factor(dat$vecs) dat resp vecs 1 2.795851 10 2 3.673296 10 3 1.731921 30 4 1.172945 30 5 2.427164 80 6 1.470758 80 newvar - (rnorm(6,4)) newdat - data.frame(newvar, dat$vecs) newdat newvar dat.vecs 1 6.389386 10 2 3.453535 10 3 3.807821 30 4 6.067712 30 5 4.978724 80 6 3.015975 80 ?data.frame Cheers, Mike Michael Rennie Ph.D. Candidate, University of Toronto at Mississauga 3359 Mississauga Rd. N. Mississauga, ON L5L 1C6 Ph: 905-828-5452 Fax: 905-828-3792 www.utm.utoronto.ca/~w3rennie __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Losing factor levels when moving variables from one context to another
On Thu, 2007-02-01 at 12:13 -0500, Michael Rennie wrote: Hi, there I'm currently trying to figure out how to keep my factor levels for a variable when moving it from one data frame or matrix to another. Example below: vec1-(rep(10,5)) vec2-(rep(30,5)) vec3-(rep(80,5)) vecs-c(vec1, vec2, vec3) resp-rnorm(2,15) dat-as.data.frame(cbind(resp, vecs)) dat$vecs-factor(dat$vecs) dat R returns: resp vecs 1 1.57606068767956 10 2 2.30271782269308 10 3 2.39874788444542 10 40.963987738423353 10 5 2.03620782454740 10 6 -0.0706713324725649 30 7 1.49001721222926 30 8 2.00587718501980 30 90.450576585429981 30 102.87120375367357 30 112.25575058079324 80 122.03471288724508 80 132.67432066972984 80 141.74102136279177 80 152.29827581276955 80 and now: newvar-(rnorm(15,4)) newdat-as.data.frame(cbind(newvar, dat$vecs)) newdat R returns: newvar V2 1 4.300788 1 2 5.295951 1 3 5.099849 1 4 3.211045 1 5 3.703554 1 6 3.693826 2 7 5.314679 2 8 4.70 2 9 3.534515 2 10 4.037401 2 11 4.476808 3 12 4.842449 3 13 3.109677 3 14 4.752961 3 15 4.445216 3 I seem to have lost everything I once has associated with vecs, and it's turned my actual values into arbitrary groupings. I assume this has something to do with the behaviour of factors? Does anyone have any suggestions on how to get my original levels, etc., back? Cheers, Mike Mike, The problem (specific to your example) is that you are using as.data.frame() and cbind(), which will first coerce the columns to a common data type, create a matrix and then coerce the matrix to a dataframe. Thus, in the second case, your factor dat$vecs is first being coerced to its numeric equivalent values, rather then being retained as a factor, since a matrix can contain only one data type and the first column is numeric. Try this instead: vec1-(rep(10, 5)) vec2-(rep(30, 5)) vec3-(rep(80, 5)) vecs-c(vec1, vec2, vec3) set.seed(1) resp-rnorm(15, 2) dat - data.frame(resp, vecs) str(dat) 'data.frame': 15 obs. of 2 variables: $ resp: num 1.37 2.18 1.16 3.60 2.33 ... $ vecs: Factor w/ 3 levels 10,30,80: 1 1 1 1 1 2 2 2 2 2 .. set.seed(2) newvar - rnorm(15, 4) newdat - data.frame(newvar, dat$vecs) str(newdat) 'data.frame': 15 obs. of 2 variables: $ newvar : num 3.10 4.18 5.59 2.87 3.92 ... $ dat.vecs: Factor w/ 3 levels 10,30,80: 1 1 1 1 1 2 2 2 2 2 ... all(levels(newdat$dat.vecs) == levels(dat$vecs)) [1] TRUE BTW, there may very well be times when you are combining two factors together and need to ensure that the factor levels either are intentionally different or need to relevel the combined factors into common levels. See the Warning and other information in ?factor. This would be critical, for example, if you are combining data sets to then run modeling functions on the combined data sets. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] would you please navigate me?
a.eslami a.eslami at Mail.sbu.ac.ir writes: Hello My name is Aida Eslami. I am a M.S.c student of statistics at Shahid Beheshti University , Tehran, Iran. The subject of my thesis is Analysis of Masked Data. I have some problems in writing of my program (optimization). Would you please navigate me and introduce some to help me? Thank you. Yours sincerely Aida Eslami I'm sorry, but we can only answer _specific_ questions about R on this mailing list; there are too many deserving students all over the world for us to help them all. You should try to get enough help from someone at your local institution to get you to the point where you can formulate a specific question about R; failing that, you will have to struggle with the R documentation on your own until you can get to that point. As it says at the bottom of every e-mail to the list, please read the posting guide as well ... __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
Greetings. For R gurus this may be a no brainer, but I could not find pointers to efficient computation of this beast in past help files. Background - I wish to implement a Cramer-von Mises type test statistic which involves double sums of max(X_i,Y_j) where X and Y are vectors of differing length. I am currently using ifelse pointwise in a vector, but have a nagging suspicion that there is a more efficient way to do this. Basically, I require three sums: sum1: \sum_i\sum_j max(X_i,X_j) sum2: \sum_i\sum_j max(Y_i,Y_j) sum3: \sum_i\sum_j max(X_i,Y_j) Here is my current implementation - any pointers to more efficient computation greatly appreciated. nx - length(x) ny - length(y) sum1 - 0 sum3 - 0 for(i in 1:nx) { sum1 - sum1 + sum(ifelse(x[i]x,x[i],x)) sum3 - sum3 + sum(ifelse(x[i]y,x[i],y)) } sum2 - 0 sum4 - sum3 # symmetric and identical for(i in 1:ny) { sum2 - sum2 + sum(ifelse(y[i]y,y[i],y)) } Thanks in advance for your help. -- Jeff -- Professor J. S. Racine Phone: (905) 525 9140 x 23825 Department of EconomicsFAX:(905) 521-8232 McMaster Universitye-mail: [EMAIL PROTECTED] 1280 Main St. W.,Hamilton, URL: http://www.economics.mcmaster.ca/racine/ Ontario, Canada. L8S 4M4 `The generation of random numbers is too important to be left to chance' __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
Jeff, Here is something which is a little faster: sum1 - sum(outer(x, x, FUN=pmax)) sum3 - sum(outer(x, y, FUN=pmax)) Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeffrey Racine Sent: Thursday, February 01, 2007 1:18 PM To: r-help@stat.math.ethz.ch Subject: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors) Greetings. For R gurus this may be a no brainer, but I could not find pointers to efficient computation of this beast in past help files. Background - I wish to implement a Cramer-von Mises type test statistic which involves double sums of max(X_i,Y_j) where X and Y are vectors of differing length. I am currently using ifelse pointwise in a vector, but have a nagging suspicion that there is a more efficient way to do this. Basically, I require three sums: sum1: \sum_i\sum_j max(X_i,X_j) sum2: \sum_i\sum_j max(Y_i,Y_j) sum3: \sum_i\sum_j max(X_i,Y_j) Here is my current implementation - any pointers to more efficient computation greatly appreciated. nx - length(x) ny - length(y) sum1 - 0 sum3 - 0 for(i in 1:nx) { sum1 - sum1 + sum(ifelse(x[i]x,x[i],x)) sum3 - sum3 + sum(ifelse(x[i]y,x[i],y)) } sum2 - 0 sum4 - sum3 # symmetric and identical for(i in 1:ny) { sum2 - sum2 + sum(ifelse(y[i]y,y[i],y)) } Thanks in advance for your help. -- Jeff -- Professor J. S. Racine Phone: (905) 525 9140 x 23825 Department of EconomicsFAX:(905) 521-8232 McMaster Universitye-mail: [EMAIL PROTECTED] 1280 Main St. W.,Hamilton, URL: http://www.economics.mcmaster.ca/racine/ Ontario, Canada. L8S 4M4 `The generation of random numbers is too important to be left to chance' __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
Well, a reproducible example would be nice =) not tested: x = rnorm(10) y = rnorm(20) mymax - function(t1, t2) apply(cbind(t1, t2), 1, max) sum(outer(x, y, mymax)) is this sth like what you need? b On Feb 1, 2007, at 1:18 PM, Jeffrey Racine wrote: Greetings. For R gurus this may be a no brainer, but I could not find pointers to efficient computation of this beast in past help files. Background - I wish to implement a Cramer-von Mises type test statistic which involves double sums of max(X_i,Y_j) where X and Y are vectors of differing length. I am currently using ifelse pointwise in a vector, but have a nagging suspicion that there is a more efficient way to do this. Basically, I require three sums: sum1: \sum_i\sum_j max(X_i,X_j) sum2: \sum_i\sum_j max(Y_i,Y_j) sum3: \sum_i\sum_j max(X_i,Y_j) Here is my current implementation - any pointers to more efficient computation greatly appreciated. nx - length(x) ny - length(y) sum1 - 0 sum3 - 0 for(i in 1:nx) { sum1 - sum1 + sum(ifelse(x[i]x,x[i],x)) sum3 - sum3 + sum(ifelse(x[i]y,x[i],y)) } sum2 - 0 sum4 - sum3 # symmetric and identical for(i in 1:ny) { sum2 - sum2 + sum(ifelse(y[i]y,y[i],y)) } Thanks in advance for your help. -- Jeff -- Professor J. S. Racine Phone: (905) 525 9140 x 23825 Department of EconomicsFAX:(905) 521-8232 McMaster Universitye-mail: [EMAIL PROTECTED] 1280 Main St. W.,Hamilton, URL: http://www.economics.mcmaster.ca/racine/ Ontario, Canada. L8S 4M4 `The generation of random numbers is too important to be left to chance' __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R for bioinformatics
Benoit Ballester [EMAIL PROTECTED] writes: Hi, I was wondering if someone could tell me more about this book, (if it's a good or bad one). I can't find it, as it seems that O'Reilly doesn't publish any more. I've never seen a copy so I can't comment about its quality (has anyone seen a copy?). You might want to take a look at _Bioinformatics and Computational Biology Solutions Using R and Bioconductor_. http://www.bioconductor.org/pub/docs/mogr/ + seth __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Upcoming Course**** R/Splus Fundamentals and Programming Techniques**** In Washington DC, San Francisco and Princeton
XLSolutions Corporation (www.xlsolutions-corp.com) is proud to announce our R/S-plus Fundamentals and Programming Techniques : www.xlsolutions-corp.com/Rfund.htm *** Washington DC / March 1-2, 2007 *** San Francisco / March 15-16, 2007 *** Princeton / Week of Feb 26 (dates coming soon) Should we bring this course to your city? please let us know! Interested in R/Splus Advanced course? email us. Reserve your seat now at the early bird rates! Payment due AFTER the class Course Description: This two-day beginner to intermediate R/S-plus course focuses on a broad spectrum of topics, from reading raw data to a comparison of R and S. We will learn the essentials of data manipulation, graphical visualization and R/S-plus programming. We will explore statistical data analysis tools,including graphics with data sets. How to enhance your plots, build your own packages (librairies) and connect via ODBC,etc. We will perform some statistical modeling and fit linear regression models. Participants are encouraged to bring data for interactive sessions With the following outline: - An Overview of R and S - Data Manipulation and Graphics - Using Lattice Graphics - A Comparison of R and S-Plus - How can R Complement SAS? - Writing Functions - Avoiding Loops - Vectorization - Statistical Modeling - Project Management - Techniques for Effective use of R and S - Enhancing Plots - Using High-level Plotting Functions - Building and Distributing Packages (libraries) - Connecting; ODBC, Rweb, Orca via sockets and via Rjava Email us for group discounts. Email Sue Turner: [EMAIL PROTECTED] Phone: 206-686-1578 Visit us: www.xlsolutions-corp.com/training.htm Please let us know if you and your colleagues are interested in this classto take advantage of group discount. Register now to secure your seat! Interested in R/Splus Advanced course? email us. Cheers, Elvis Miller, PhD Manager Training. XLSolutions Corporation 206 686 1578 www.xlsolutions-corp.com [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] indexing without looping
javier garcia-pintado jgarcia at ija.csic.es writes: Hello, I've got a data.frame like this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 and a long vector of classes like this: x - c(1,1,2,7,6,5,4,3,2,2,2...) And would like to obtain a vector of length = length(x), with the corresponding values extracted from assignation table. Like this: x.value [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0 Could you help me with an elegant way to do this ? (I just can do it with looping for each class in the assignation table, what a think is not perfect in R's sense) Wishes, Javier Javier, you might try this: assignation - data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2)) assignation value class 1 6.5 1 2 7.5 3 3 8.5 5 4 12.0 2 x - c(1,1,2,7,6,5,4,3,2,2,2) x [1] 1 1 2 7 6 5 4 3 2 2 2 merge( x, assignation, by.x=1, by.y=2, all.x=T ) x value 1 1 6.5 2 1 6.5 3 2 12.0 4 2 12.0 5 2 12.0 6 2 12.0 7 3 7.5 8 4NA 9 5 8.5 10 6NA 11 7NA __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R for bioinformatics
On Thu, 2007-02-01 at 10:45 -0800, Seth Falcon wrote: Benoit Ballester [EMAIL PROTECTED] writes: Hi, I was wondering if someone could tell me more about this book, (if it's a good or bad one). I can't find it, as it seems that O'Reilly doesn't publish any more. I've never seen a copy so I can't comment about its quality (has anyone seen a copy?). You might want to take a look at _Bioinformatics and Computational Biology Solutions Using R and Bioconductor_. http://www.bioconductor.org/pub/docs/mogr/ I'll stand (or sit) to be corrected on this as I cannot find the source, but I have a recollection from seeing something quite some time ago that the book may have never been published. It is no longer listed on Amazon.com (USA), but here is a listing on UK: http://www.amazon.co.uk/R-Bioinformatics-Kimberley-Seefeld/dp/059600544X I located a posting from Kim Seefeld (one of the authors) on a usenet group from back in 2003. Her e-mail then was listed as: [EMAIL PROTECTED] You might want to drop her a line if the e-mail is still valid. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with efficient double sum of max (X_i, Y_i) (X Y vectors)
Jeff, you can do sum1: \sum_i\sum_j max(X_i,X_j) sum2: \sum_i\sum_j max(Y_i,Y_j) sum(x * (2 * rank(x) - 1)) sum3: \sum_i\sum_j max(X_i,Y_j) sum(outer(x, y, pmax)) Probably, the latter can be speeded up even more... Z __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] plot.function with xlim, bug?
Consider the following lines of code: plot(function(x) sin(cos(x)*exp(-x/2)), from=-8,to=7,xlim=c(-5,5)) Uses integral points (integers from -5 to 5) to draw the plot, instead of the usual default of n= 101 equally spaced points (from ?plot.function). plot(function(x) sin(cos(x)*exp(-x/2)), from=-8,to=7,n=101,xlim=c(-5,5)) Gives the following error: Error in add par(xlog) : invalid 'x' type in 'x y' Any explanations? The following modification, in the plot.R by NOT passing 'y' to plot.function, seems to fix both of the above problems ! I am sure to be missing something! plot2 - function (x, y, ...) { if (is.null(attr(x, class)) is.function(x)) { nms - names(list(...)) ## need to pass 'y' to plot.function() when positionally matched if(missing(y)) # set to defaults {could use formals(plot.default)}: y - { if (!from %in% nms) 0 else if (!to %in% nms) 1 else if (!xlim %in% nms) NULL } if (ylab %in% nms) plot.function(x, ...) else plot.function(x, ylab=paste(deparse(substitute(x)),(x)), ...) } else UseMethod(plot) } --- version.string R version 2.4.1 (2006-12-18) platform i486-pc-gnu-linux __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Lining up x-y datasets based on values of x
Hi, I was wondering if there is a direct approach for lining up 2-column matrices according to the values of the first column. An example and a brute-force approach is given below: x - cbind(1:10, runif(10)) y - cbind(5:14, runif(10)) z - cbind((-4):5, runif(10)) xx - seq( min(c(x[,1],y[,1],z[,1])), max(c(x[,1],y[,1],z[,1])), 1) w - cbind(xx, matrix(rep(0, 3*length(xx)), ncol=3)) w[ xx = x[1,1] xx = x[10,1], 2 ] - x[,2] w[ xx = y[1,1] xx = y[10,1], 3 ] - y[,2] w[ xx = z[1,1] xx = z[10,1], 4 ] - z[,2] w I appreciate any pointers. Thanks. Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
On Thu, 2007-02-01 at 15:05 -0500, Christos Hatzis wrote: Hi, I was wondering if there is a direct approach for lining up 2-column matrices according to the values of the first column. An example and a brute-force approach is given below: x - cbind(1:10, runif(10)) y - cbind(5:14, runif(10)) z - cbind((-4):5, runif(10)) xx - seq( min(c(x[,1],y[,1],z[,1])), max(c(x[,1],y[,1],z[,1])), 1) w - cbind(xx, matrix(rep(0, 3*length(xx)), ncol=3)) w[ xx = x[1,1] xx = x[10,1], 2 ] - x[,2] w[ xx = y[1,1] xx = y[10,1], 3 ] - y[,2] w[ xx = z[1,1] xx = z[10,1], 4 ] - z[,2] w I appreciate any pointers. Thanks. How about this: x - cbind(1:10, runif(10)) y - cbind(5:14, runif(10)) z - cbind((-4):5, runif(10)) colnames(x) - c(X, Y) colnames(y) - c(X, Y) colnames(z) - c(X, Y) xy - merge(x, y, by = X, all = TRUE) xyz - merge(xy, z, by = X, all = TRUE) xyz[is.na(xyz)] - 0 xyz X Y.x Y.y Y 1 -4 0.000 0.000 0.3969099 2 -3 0.000 0.000 0.8943127 3 -2 0.000 0.000 0.4882819 4 -1 0.000 0.000 0.0275787 5 0 0.000 0.000 0.7562341 6 1 0.6873130 0.000 0.6185218 7 2 0.1930880 0.000 0.2318025 8 3 0.1164783 0.000 0.7336057 9 4 0.7408532 0.000 0.3006347 10 5 0.7112887 0.6383823 0.8515126 11 6 0.2719079 0.5952721 0.000 12 7 0.2067017 0.8178048 0.000 13 8 0.2085043 0.5714917 0.000 14 9 0.2251435 0.4032660 0.000 15 10 0.3471888 0.5247478 0.000 16 11 0.000 0.6899197 0.000 17 12 0.000 0.7188912 0.000 18 13 0.000 0.9133252 0.000 19 14 0.000 0.9186001 0.000 Note that 'xyz' will be a data frame, so just use as.matrix(xyz) to coerce back to a numeric matrix if needed. See ?merge HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R for bioinformatics
Marc Schwartz wrote: On Thu, 2007-02-01 at 10:45 -0800, Seth Falcon wrote: Benoit Ballester [EMAIL PROTECTED] writes: Hi, I was wondering if someone could tell me more about this book, (if it's a good or bad one). I can't find it, as it seems that O'Reilly doesn't publish any more. I've never seen a copy so I can't comment about its quality (has anyone seen a copy?). You might want to take a look at _Bioinformatics and Computational Biology Solutions Using R and Bioconductor_. http://www.bioconductor.org/pub/docs/mogr/ I'll stand (or sit) to be corrected on this as I cannot find the source, but I have a recollection from seeing something quite some time ago that the book may have never been published. It's been a while since the status was something along the lines that the authors may or may not complete it. Subject matter moving faster than pen, I suspect -p __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can this loop be delooped?
Hi. I have the following code in a loop. It splits a vector into subvectors of equal size. But if the size of the original vector is not an exact multiple of the desired subvector size, then the first few subvectors have one more element than the last few. I know that the cut function could be used to determine where to break up the vector, but it doesn't seem to provide control over where to put the larger and smaller subvectors. numgp1_v=sidect_v%/%compmin numgroup_v[small]=max(1,numgp1_v[small]) sidemin_v=sidect_v%/%numgroup_v nummax_v=sidect_v%%sidemin_v eix=0 smallindexlist-list(NULL) for(i in 1:numgroup_v[small]) { bix=eix+1 eix=bix+sidemin_v[small]+(i=nummax_v[small])-1 smallindexlist[[i]]-dlpo_sm_v[bix:eix] } The key fact is that smallindexlist is a list, each list element is a subvector of dlpo_sm_v of the proper size. The sizes may be different. I tried to see whether I could eliminate the loop, as follows. First I defined a function: intgpi - function(totalength,numgroups,groupnum,place=LEFT){ # function to split the integer sequence, 1:totalength, into the groupnum group out of numgroups groups of equal size, totalength%/%numgroups # there are totalength%%numgroups number of groups of length 1+totalength%/%numgroups, with the large groups all to one side, left if place=LEFT # totalength = numgroups = groupnum all integers, or it won't work right if(charmatch(toupper(place),RIGHT,nomatch=FALSE)==1){ extra1_1=max((groupnum-1)+((totalength%%numgroups)-numgroups),0) extra1_2=(groupnumnumgroups-totalength%%numgroups) } else{ extra1_1=min(totalength%%numgroups,groupnum-1) extra1_2=(groupnum=totalength%%numgroups) } gsize=totalength%/%numgroups gleft=((groupnum-1)*gsize)+extra1_1+1 gright=gleft+gsize+extra1_2-1 gleft:gright } The function appears to work okay. Then I used it as follows: numgp1_v=sidect_v%/%compmin numgroup_v[small]=max(1,numgp1_v[small]) smallindexlist-list(NULL) smallindexlist=sapply(1:numgroup_v[small],function(i){dlpo_sm_v[intgpi(sidect_v[small],numgroup_v[small],i)]}) In this case, smallindexlist will be a list like I had before if the subvectors are not all the same size, but if the subvectors are all the same size, it appears that I get an array. Can I force this operation to give me a list the way I want it in all cases? Or is there a better way to deloop my original code? Thanks! -- TMK -- 212-460-5430home 917-656-5351cell __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems installing R-2.4.1 on Solaris 11 x-86 from source: error in gmake after successful configure
What is 'Solaris 11'? According to www.sun.com, the latest Solaris version is 10, and my sysadmins have not heard of Solaris 11. You seem to be missing the Solaris compilation tools, ar in this case. In Solaris = 10 they are in /usr/ccs/bin, not in the path by default. On Wed, 31 Jan 2007, Octavio Tourinho wrote: Dear friends, I am trying to install R-2.4.1 from source on Solaris 11 x-86. 64 bits, There is 32-bit x86 and 64-bit amd64 or x86_64. running on Sun Ultra-20 workstation, and using the SunStudio 11 compilers. I was able to configure R correctly, but received an error in gmake, aparently related to bzip2 which I have been unable to debug. The messages are listed below. The configure.log and configure.status files are attached. Any help would be sincerely appreciated. Octavio Tourinho = R is now configured for i386-pc-solaris2.11 Source directory: . Installation directory:/usr/local C compiler:gcc -std=gnu99 -D__NO_MATH_INLINES -g -O2 Fortran 77 compiler: g77 -g -O2 C++ compiler: g++ -g -O2 Fortran 90/95 compiler:f95 -g Interfaces supported: X11, tcltk External libraries:readline Additional capabilities: PNG, JPEG, NLS Options enabled: shared BLAS, R profiling Recommended packages: yes configure: WARNING: you cannot build DVI versions of the R manuals configure: WARNING: you cannot build PDF versions of the R manuals # gmake gmake[1]: Entering directory `/usr/local/R-2.4.1/m4' gmake[1]: Nothing to be done for `R'. gmake[1]: Leaving directory `/usr/local/R-2.4.1/m4' gmake[1]: Entering directory `/usr/local/R-2.4.1/tools' gmake[1]: Nothing to be done for `R'. gmake[1]: Leaving directory `/usr/local/R-2.4.1/tools' gmake[1]: Entering directory `/usr/local/R-2.4.1/doc' gmake[2]: Entering directory `/usr/local/R-2.4.1/doc/html' gmake[3]: Entering directory `/usr/local/R-2.4.1/doc/html/search' gmake[3]: Leaving directory `/usr/local/R-2.4.1/doc/html/search' gmake[2]: Leaving directory `/usr/local/R-2.4.1/doc/html' gmake[2]: Entering directory `/usr/local/R-2.4.1/doc/manual' gmake[2]: Nothing to be done for `R'. gmake[2]: Leaving directory `/usr/local/R-2.4.1/doc/manual' gmake[1]: Leaving directory `/usr/local/R-2.4.1/doc' gmake[1]: Entering directory `/usr/local/R-2.4.1/etc' gmake[1]: Leaving directory `/usr/local/R-2.4.1/etc' gmake[1]: Entering directory `/usr/local/R-2.4.1/share' gmake[1]: Leaving directory `/usr/local/R-2.4.1/share' gmake[1]: Entering directory `/usr/local/R-2.4.1/src' gmake[2]: Entering directory `/usr/local/R-2.4.1/src/scripts' creating src/scripts/R.fe gmake[3]: Entering directory `/usr/local/R-2.4.1/src/scripts' gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/scripts' gmake[2]: Leaving directory `/usr/local/R-2.4.1/src/scripts' gmake[2]: Entering directory `/usr/local/R-2.4.1/src/include' config.status: creating src/include/config.h config.status: src/include/config.h is unchanged Rmath.h is unchanged gmake[3]: Entering directory `/usr/local/R-2.4.1/src/include/R_ext' gmake[3]: Nothing to be done for `R'. gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/include/R_ext' gmake[2]: Leaving directory `/usr/local/R-2.4.1/src/include' gmake[2]: Entering directory `/usr/local/R-2.4.1/src/extra' gmake[3]: Entering directory `/usr/local/R-2.4.1/src/extra/blas' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/blas' gmake[4]: `libRblas.so' is up to date. gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/blas' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/blas' /usr/local/R-2.4.1/lib/libRblas.so is unchanged gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/blas' gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/extra/blas' gmake[3]: Entering directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/bzip2' rm -f libbz2.a false cr libbz2.a blocksort.o bzlib.o compress.o crctable.o decompress.o huffman.o randtable.o gmake[4]: *** [libbz2.a] Error 1 gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[3]: *** [R] Error 2 gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[2]: *** [R] Error 1 gmake[2]: Leaving directory `/usr/local/R-2.4.1/src/extra' gmake[1]: *** [R] Error 1 gmake[1]: Leaving directory `/usr/local/R-2.4.1/src' gmake: *** [R] Error 1 -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595
Re: [R] Lining up x-y datasets based on values of x
Thanks Marc and Phil. My dataset actually consists of 50+ individual files, so I will have to do this one column at a time in a loop... I might look into SQL and outer joints as an alternative to avoid looping. Thanks again. -Christos -Original Message- From: Marc Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 3:29 PM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Lining up x-y datasets based on values of x On Thu, 2007-02-01 at 15:05 -0500, Christos Hatzis wrote: Hi, I was wondering if there is a direct approach for lining up 2-column matrices according to the values of the first column. An example and a brute-force approach is given below: x - cbind(1:10, runif(10)) y - cbind(5:14, runif(10)) z - cbind((-4):5, runif(10)) xx - seq( min(c(x[,1],y[,1],z[,1])), max(c(x[,1],y[,1],z[,1])), 1) w - cbind(xx, matrix(rep(0, 3*length(xx)), ncol=3)) w[ xx = x[1,1] xx = x[10,1], 2 ] - x[,2] w[ xx = y[1,1] xx = y[10,1], 3 ] - y[,2] w[ xx = z[1,1] xx = z[10,1], 4 ] - z[,2] w I appreciate any pointers. Thanks. How about this: x - cbind(1:10, runif(10)) y - cbind(5:14, runif(10)) z - cbind((-4):5, runif(10)) colnames(x) - c(X, Y) colnames(y) - c(X, Y) colnames(z) - c(X, Y) xy - merge(x, y, by = X, all = TRUE) xyz - merge(xy, z, by = X, all = TRUE) xyz[is.na(xyz)] - 0 xyz X Y.x Y.y Y 1 -4 0.000 0.000 0.3969099 2 -3 0.000 0.000 0.8943127 3 -2 0.000 0.000 0.4882819 4 -1 0.000 0.000 0.0275787 5 0 0.000 0.000 0.7562341 6 1 0.6873130 0.000 0.6185218 7 2 0.1930880 0.000 0.2318025 8 3 0.1164783 0.000 0.7336057 9 4 0.7408532 0.000 0.3006347 10 5 0.7112887 0.6383823 0.8515126 11 6 0.2719079 0.5952721 0.000 12 7 0.2067017 0.8178048 0.000 13 8 0.2085043 0.5714917 0.000 14 9 0.2251435 0.4032660 0.000 15 10 0.3471888 0.5247478 0.000 16 11 0.000 0.6899197 0.000 17 12 0.000 0.7188912 0.000 18 13 0.000 0.9133252 0.000 19 14 0.000 0.9186001 0.000 Note that 'xyz' will be a data frame, so just use as.matrix(xyz) to coerce back to a numeric matrix if needed. See ?merge HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R for bioinformatics
On Thu, 2007-02-01 at 21:32 +0100, Peter Dalgaard wrote: Marc Schwartz wrote: On Thu, 2007-02-01 at 10:45 -0800, Seth Falcon wrote: Benoit Ballester [EMAIL PROTECTED] writes: Hi, I was wondering if someone could tell me more about this book, (if it's a good or bad one). I can't find it, as it seems that O'Reilly doesn't publish any more. I've never seen a copy so I can't comment about its quality (has anyone seen a copy?). You might want to take a look at _Bioinformatics and Computational Biology Solutions Using R and Bioconductor_. http://www.bioconductor.org/pub/docs/mogr/ I'll stand (or sit) to be corrected on this as I cannot find the source, but I have a recollection from seeing something quite some time ago that the book may have never been published. It's been a while since the status was something along the lines that the authors may or may not complete it. Subject matter moving faster than pen, I suspect Peter, that wording does seem familiar, just cannot recall where I saw it. Perhaps on the O'Reilly web site, where it is no longer listed. For confirmation, I called O'Reilly's customer service in Cambridge, MA. They confirm that the book was indeed cancelled and never published. No reasons were given. Regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] time series analysis
Does anyone know a good introductory book or tutorial about time series analysis? (time series for a beginner). Thank you so much. John Lamak _ Descubra como mandar Torpedos SMS do seu Messenger para o celular dos seus amigos. http://mobile.msn.com/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] time series analysis
John -- Well, as a start, have a look at Modern Applied Statistics with S, by Venables and Ripley, both of which names you will recognize if you read this list often. There is a 30-page chapter on time series (with suggestions for other readings), obviously geared to S and R, that is a good jumping-off place. Ben Fairbank -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of lamack lamack Sent: Thursday, February 01, 2007 3:12 PM To: R-help@stat.math.ethz.ch Subject: [R] time series analysis Does anyone know a good introductory book or tutorial about time series analysis? (time series for a beginner). Thank you so much. John Lamak _ Descubra como mandar Torpedos SMS do seu Messenger para o celular dos seus amigos. http://mobile.msn.com/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
On Thu, 2007-02-01 at 15:45 -0500, Christos Hatzis wrote: Thanks Marc and Phil. My dataset actually consists of 50+ individual files, so I will have to do this one column at a time in a loop... I might look into SQL and outer joints as an alternative to avoid looping. Thanks again. -Christos If the files conform to some naming convention and/or are all located in a common sub-directory, you can use list.files() to get the file names into a vector. If not, you could use file.choose() interactively. Then use either a for() loop or sapply() to loop over the filenames, read them in to data frames using read.table() and merge them together in the same loop. When it comes to basic data manipulation like this, loops are not a bad thing. The overhead of a loop is typically outweighed by the file I/O and related considerations. HTH, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Autocorrelated Binomial
I need to generate autocorrelated binary data. I've found references to the IEKS package but none of the web pages currently exist. Does anyone know where I can find this package or suggest another package? Rick B. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
Christos, Haccording to the Value section in ?merge: A data frame. The rows are by default lexicographically sorted on the common columns, but for sort=FALSE are in an unspecified order. Looking at the code, while there is a lot of time spent on matching things, the key sort() code seems to be near the end of the function: if (sort) res - res[if (all.x || all.y) do.call(order, x[, 1:l.b, drop = FALSE]) else sort.list(bx[m$xi]), , drop = FALSE] I wonder if you could create a local version of merge(), say my.merge(), without that code and without breaking things. A quick glance suggests that as long as you are not merging on the rownames, I think that you might be OK. You would want to test that hypothesis however. HTH, Marc On Thu, 2007-02-01 at 16:48 -0500, Christos Hatzis wrote: [Sorry I meant to reply to the list] Thanks, Marc. That's what I have done. However, there seems to be a penalty from using merge repeatedly as it appears to internally re-sort the datasets. In my case the datasets are long (~35K rows) and already sorted so this step adds considerable and unnecessary overhead. There doesn't seem to be an option for disabling sorting. Setting 'sort=F' only affects sorting of the final data.frame. system.time(merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=T)) [1] 6.96 0.00 7.24 NA NA system.time(merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=F)) [1] 6.82 0.00 7.14 NA NA I was wondering if perhaps there is a parallel between this problem and methods for linining up time-series data, since such data are also usually sorted on the time dimension. -Christos -Original Message- From: Marc Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 4:21 PM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Lining up x-y datasets based on values of x On Thu, 2007-02-01 at 15:45 -0500, Christos Hatzis wrote: Thanks Marc and Phil. My dataset actually consists of 50+ individual files, so I will have to do this one column at a time in a loop... I might look into SQL and outer joints as an alternative to avoid looping. Thanks again. -Christos If the files conform to some naming convention and/or are all located in a common sub-directory, you can use list.files() to get the file names into a vector. If not, you could use file.choose() interactively. Then use either a for() loop or sapply() to loop over the filenames, read them in to data frames using read.table() and merge them together in the same loop. When it comes to basic data manipulation like this, loops are not a bad thing. The overhead of a loop is typically outweighed by the file I/O and related considerations. HTH, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
[Sorry I meant to reply to the list] Thanks, Marc. That's what I have done. However, there seems to be a penalty from using merge repeatedly as it appears to internally re-sort the datasets. In my case the datasets are long (~35K rows) and already sorted so this step adds considerable and unnecessary overhead. There doesn't seem to be an option for disabling sorting. Setting 'sort=F' only affects sorting of the final data.frame. system.time(merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=T)) [1] 6.96 0.00 7.24 NA NA system.time(merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=F)) [1] 6.82 0.00 7.14 NA NA I was wondering if perhaps there is a parallel between this problem and methods for linining up time-series data, since such data are also usually sorted on the time dimension. -Christos -Original Message- From: Marc Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 4:21 PM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Lining up x-y datasets based on values of x On Thu, 2007-02-01 at 15:45 -0500, Christos Hatzis wrote: Thanks Marc and Phil. My dataset actually consists of 50+ individual files, so I will have to do this one column at a time in a loop... I might look into SQL and outer joints as an alternative to avoid looping. Thanks again. -Christos If the files conform to some naming convention and/or are all located in a common sub-directory, you can use list.files() to get the file names into a vector. If not, you could use file.choose() interactively. Then use either a for() loop or sapply() to loop over the filenames, read them in to data frames using read.table() and merge them together in the same loop. When it comes to basic data manipulation like this, loops are not a bad thing. The overhead of a loop is typically outweighed by the file I/O and related considerations. HTH, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Affymetrix data analysis
Hi, I am trying to read in my Affymetrix CEL files (48 files, total ~600 MB) but I keep getting memory errors. Can somebody please help me with this. Or is therea remote server I can send my data to for computation? Any help is much appreciated. Thanks Dr. Tristan Coram Postdoctoral Research Associate Research Plant Pathologist/Geneticist United States Department of Agriculture Agricultural Research Service Wheat Genetics, Quality Physiology Disease Research 209 Johnson Hall Washington State University Pullman, WA 99163 Office: +1 509 335-1596 Fax: +1 509 335-2553 Email: [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can this loop be delooped?
This might do what you want: # test data x - 1:43 nb - 5 # number of subsets # create vector of lengths of subsets ns - rep(length(x) %/% nb, nb) # see if we have to adjust counts of initial subsets if ((.offset - length(x) %% nb) != 0) ns[1:.offset] = ns[1:.offset] + 1 # create the subsets split(x, rep(1:nb,ns)) $`1` [1] 1 2 3 4 5 6 7 8 9 $`2` [1] 10 11 12 13 14 15 16 17 18 $`3` [1] 19 20 21 22 23 24 25 26 27 $`4` [1] 28 29 30 31 32 33 34 35 $`5` [1] 36 37 38 39 40 41 42 43 On 2/1/07, Talbot Katz [EMAIL PROTECTED] wrote: Hi. I have the following code in a loop. It splits a vector into subvectors of equal size. But if the size of the original vector is not an exact multiple of the desired subvector size, then the first few subvectors have one more element than the last few. I know that the cut function could be used to determine where to break up the vector, but it doesn't seem to provide control over where to put the larger and smaller subvectors. numgp1_v=sidect_v%/%compmin numgroup_v[small]=max(1,numgp1_v[small]) sidemin_v=sidect_v%/%numgroup_v nummax_v=sidect_v%%sidemin_v eix=0 smallindexlist-list(NULL) for(i in 1:numgroup_v[small]) { bix=eix+1 eix=bix+sidemin_v[small]+(i=nummax_v[small])-1 smallindexlist[[i]]-dlpo_sm_v[bix:eix] } The key fact is that smallindexlist is a list, each list element is a subvector of dlpo_sm_v of the proper size. The sizes may be different. I tried to see whether I could eliminate the loop, as follows. First I defined a function: intgpi - function(totalength,numgroups,groupnum,place=LEFT){ # function to split the integer sequence, 1:totalength, into the groupnum group out of numgroups groups of equal size, totalength%/%numgroups # there are totalength%%numgroups number of groups of length 1+totalength%/%numgroups, with the large groups all to one side, left if place=LEFT # totalength = numgroups = groupnum all integers, or it won't work right if(charmatch(toupper(place),RIGHT,nomatch=FALSE)==1){ extra1_1=max((groupnum-1)+((totalength%%numgroups)-numgroups),0) extra1_2=(groupnumnumgroups-totalength%%numgroups) } else{ extra1_1=min(totalength%%numgroups,groupnum-1) extra1_2=(groupnum=totalength%%numgroups) } gsize=totalength%/%numgroups gleft=((groupnum-1)*gsize)+extra1_1+1 gright=gleft+gsize+extra1_2-1 gleft:gright } The function appears to work okay. Then I used it as follows: numgp1_v=sidect_v%/%compmin numgroup_v[small]=max(1,numgp1_v[small]) smallindexlist-list(NULL) smallindexlist=sapply(1:numgroup_v[small],function(i){dlpo_sm_v[intgpi(sidect_v[small],numgroup_v[small],i)]}) In this case, smallindexlist will be a list like I had before if the subvectors are not all the same size, but if the subvectors are all the same size, it appears that I get an array. Can I force this operation to give me a list the way I want it in all cases? Or is there a better way to deloop my original code? Thanks! -- TMK -- 212-460-5430home 917-656-5351cell __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wavlet filter using morlet mother wavelet
Anil Kumar, it seems there isn't packages for continuous wavelet transforms in R. Anyway, take a look at the packages waveslim, wavethresh, wavelets or rwt. Maybe one of them can be useful to you. Rogerio. -- Cabeçalho original --- De: [EMAIL PROTECTED] Para: r-help@stat.math.ethz.ch Cópia: Data: 1 Feb 2007 07:33:52 - Assunto: [R] Wavlet filter using morlet mother wavelet nbsp; Hi, List ,I am searching any package on R which can do wavelet filtering for mother wavelet morlet ,is anybody having any script for the same ?I am new to the RwAVELET ANALSSIS..THANKS IN ADVANCE ANIL KUMAR ANIL KUMAR(nbsp;METEOROLOGIST)LRF SECTIONnbsp; NATIONAL CLIMATEnbsp;CENTER ADGM(RESEARCH)INDIA METEOROLOGICALnbsp;DEPARTMENTSHIVIJI NAGARPUNE-411005 INDIA MOBILE +919422023277[EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RSiteSearch() etc. - speed improvement
My R search page at http://finzi.psych.upenn.edu/ which is also what you get with RSiteSearch() has been slowing down the last few months (years?). I thought this was because the archives were just getting too big. But I discovered a simple fix. The technical term for the problem is garbage. By cleaning up the garbage, I increased the speed to the point where now most searches - even those with three search terms - are instantaneous. Thus, it is probably good for a few more years, before I have to think about a different search engine or a faster computer (which I should get anyway). If you have given up on it because of its slow response, do try again. Jon -- Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
On Thu, 1 Feb 2007, Marc Schwartz wrote: Christos, Haccording to the Value section in ?merge: A data frame. The rows are by default lexicographically sorted on the common columns, but for sort=FALSE are in an unspecified order. There is also a sort in the .Internal code. But I am not buying that this is a major part of the time without detailed evidence from profiling. Sorting 35k numbers should take a few milliseconds, and less if they are already sorted. x - rnorm(35000) system.time(y - sort(x, method=quick)) [1] 0.003 0.001 0.004 0.000 0.000 system.time(sort(y, method=quick)) [1] 0.002 0.000 0.001 0.000 0.000 Looking at the code, while there is a lot of time spent on matching things, the key sort() code seems to be near the end of the function: if (sort) res - res[if (all.x || all.y) do.call(order, x[, 1:l.b, drop = FALSE]) else sort.list(bx[m$xi]), , drop = FALSE] I wonder if you could create a local version of merge(), say my.merge(), without that code and without breaking things. A quick glance suggests that as long as you are not merging on the rownames, I think that you might be OK. You would want to test that hypothesis however. HTH, Marc On Thu, 2007-02-01 at 16:48 -0500, Christos Hatzis wrote: [Sorry I meant to reply to the list] Thanks, Marc. That's what I have done. However, there seems to be a penalty from using merge repeatedly as it appears to internally re-sort the datasets. In my case the datasets are long (~35K rows) and already sorted so this step adds considerable and unnecessary overhead. There doesn't seem to be an option for disabling sorting. Setting 'sort=F' only affects sorting of the final data.frame. system.time(merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=T)) [1] 6.96 0.00 7.24 NA NA system.time(merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=F)) [1] 6.82 0.00 7.14 NA NA I was wondering if perhaps there is a parallel between this problem and methods for linining up time-series data, since such data are also usually sorted on the time dimension. -Christos -Original Message- From: Marc Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 4:21 PM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Lining up x-y datasets based on values of x On Thu, 2007-02-01 at 15:45 -0500, Christos Hatzis wrote: Thanks Marc and Phil. My dataset actually consists of 50+ individual files, so I will have to do this one column at a time in a loop... I might look into SQL and outer joints as an alternative to avoid looping. Thanks again. -Christos If the files conform to some naming convention and/or are all located in a common sub-directory, you can use list.files() to get the file names into a vector. If not, you could use file.choose() interactively. Then use either a for() loop or sapply() to loop over the filenames, read them in to data frames using read.table() and merge them together in the same loop. When it comes to basic data manipulation like this, loops are not a bad thing. The overhead of a loop is typically outweighed by the file I/O and related considerations. HTH, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory-efficient column aggregation of a sparse matrix
On Feb 1, 2007, at 6:22 AM, Douglas Bates wrote: It turns out that in the sparse matrix code used by the Matrix package the triplet representation allows for duplicate index positions with the convention that the resulting value at a position is the sum of the values of any triplets with that index pair. Very handy! I suggest adding this nugget near the (possibly redundant) triplets phrase in Matrix.pdf. If you decide to use this approach please be aware that the indices for the triplet representation in the Matrix package are 0-based (as in C code) not 1-based (as in R code). (I imagine that Martin is thinking we really should change that as he reads this part.) The Value of the appended function is equivalent to my previous version, but it runs in 1/10'th the time, uses vastly less memory, and is fewer lines of code to boot! Sure it's tricky, but it does the trick. THANK YOU SO MUCH! -jon NEWaggregate.csr - function(x,fac) { # cast into handy Matrix sparse Triplet form x.T - as(as(x, dgRMatrix), dgTMatrix) # factor column indexes (compensating for 0 vs 1 indexing) [EMAIL PROTECTED] - as.integer(as.integer([EMAIL PROTECTED])-1) # cast back, magically computing factor sums along the way :) y - as(x.T, matrix.csr) # and fix the dimension (doing this on x.T bus errors!) [EMAIL PROTECTED] - as.integer(c(nrow(y),nlevels(fac))) y } __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
The zoo package has a multiway merge with optional zero fill. Here are two ways: library(zoo) merge(x = zoo(x[,2], x[,1]), y = zoo(y[,2], y[,1]), z = zoo(z[,2], z[,1]), fill = 0) # or library(zoo) X - list(x = x, y = y, z = z) merge0 - function(..., fill = 0) merge(..., fill = fill) do.call(merge0, lapply(X, function(x) zoo(x[,2], x[,1]))) To get more info on zoo try: vignette(zoo) On 2/1/07, Christos Hatzis [EMAIL PROTECTED] wrote: Hi, I was wondering if there is a direct approach for lining up 2-column matrices according to the values of the first column. An example and a brute-force approach is given below: x - cbind(1:10, runif(10)) y - cbind(5:14, runif(10)) z - cbind((-4):5, runif(10)) xx - seq( min(c(x[,1],y[,1],z[,1])), max(c(x[,1],y[,1],z[,1])), 1) w - cbind(xx, matrix(rep(0, 3*length(xx)), ncol=3)) w[ xx = x[1,1] xx = x[10,1], 2 ] - x[,2] w[ xx = y[1,1] xx = y[10,1], 3 ] - y[,2] w[ xx = z[1,1] xx = z[10,1], 4 ] - z[,2] w I appreciate any pointers. Thanks. Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Affymetrix data analysis
The bioconductor mailing list is probably a better place to ask this type of question. [EMAIL PROTECTED] But we also need to know what arrays are you working with, what the errors are, what your sessionInfo() is Let us know, ok? b On Feb 1, 2007, at 5:46 PM, Tristan Coram wrote: Hi, I am trying to read in my Affymetrix CEL files (48 files, total ~600 MB) but I keep getting memory errors. Can somebody please help me with this. Or is therea remote server I can send my data to for computation? Any help is much appreciated. Thanks Dr. Tristan Coram Postdoctoral Research Associate Research Plant Pathologist/Geneticist United States Department of Agriculture Agricultural Research Service Wheat Genetics, Quality Physiology Disease Research 209 Johnson Hall Washington State University Pullman, WA 99163 Office: +1 509 335-1596 Fax: +1 509 335-2553 Email: [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Affymetrix data analysis
Tristan, I have a soft spot for problems analyzing microarrays with R.. for the memory issue, there have been previous posts to this list.. But here is the answer I gave a few weeks ago. If you need more memory, you have to move to linux or recompile R for windows yourself.. .. But you'll still need a computer with more memory. The long term solution, which we are implementing, is to rewrite the normalization code so it doesn't Need to load all those arrays at once. -- cut previous part of message-- The defaults in R is to play nice and limit your allocation to half the available RAM. Make sure you have a lot of disk swap space (at least 1G with 2G of RAM) and you can set your memory limit to 2G for R. See help(memory.size) and use the memory.limit function Hugues P.s. Someone let me use their 16Gig of RAM linux And I was able to run R-64 bits with top showing 6Gigs of RAM allocated (with suitable --max-mem-size command line parameters at startup for R). -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Benilton Carvalho Sent: Thursday, February 01, 2007 6:47 PM To: Tristan Coram Cc: R-help@stat.math.ethz.ch Subject: Re: [R] Affymetrix data analysis The bioconductor mailing list is probably a better place to ask this type of question. [EMAIL PROTECTED] But we also need to know what arrays are you working with, what the errors are, what your sessionInfo() is Let us know, ok? b On Feb 1, 2007, at 5:46 PM, Tristan Coram wrote: Hi, I am trying to read in my Affymetrix CEL files (48 files, total ~600 MB) but I keep getting memory errors. Can somebody please help me with this. Or is therea remote server I can send my data to for computation? Any help is much appreciated. Thanks Dr. Tristan Coram Postdoctoral Research Associate Research Plant Pathologist/Geneticist United States Department of Agriculture Agricultural Research Service Wheat Genetics, Quality Physiology Disease Research 209 Johnson Hall Washington State University Pullman, WA 99163 Office: +1 509 335-1596 Fax: +1 509 335-2553 Email: [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wiki for Graphics tips for MacOS X
I don't have a Linux system to try it with but omitting both dev.control statements it worked for me between two Windows XP sessions on the same machine using this version of R: R.version.string # Windows XP [1] R version 2.4.1 Patched (2006-12-30 r40331) It also successfully worked with: R.version.string # Windows XP [1] R version 2.5.0 Under development (unstable) (2007-01-31 r40623) On 2/1/07, Patrick Connolly [EMAIL PROTECTED] wrote: On Wed, 31-Jan-2007 at 12:11PM -0500, Gabor Grothendieck wrote: | To get the best results you need to transfer it using vector | graphics rather than bitmapped graphics: | | http://www.stc-saz.org/resources/0203_graphics.pdf | | There are a number of variations described here (see | entire thread). Its for UNIX and Windows but I think | it would likely work similarly on Mac and Windows: | | http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32297.html I found that interesting, particularly this part: For example, on Linux do this: dev.control(displaylist=enable) # enable display list plot(1:10) myplot - recordPlot() # load displaylist into variable save(myplot, file=myplot, ascii=TRUE) Send the ascii file, myplot, to the Windows machine and on Windows do this: dev.control(displaylist=enable) # enable display list load(myplot) myplot # displays the plot savePlot(myplot, type=wmf) # saves current plot as wmf I tried that, but I was never able to load the myplot in the Windows R. I always got a message about a syntax error to do with ' ' but I was unable to work out what the problem was. I thought it was because the transfer to Windows wasn't binary, but that wasn't the problem. I was unable to get the thread view at that archive to function so I was unable to see if there were any follow ups which offered an explanation. R has changed quite a bit in the years since then, so it might be that something needs to be done differently with more recent versions. Has anyone done this recently? -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_Middle minds discuss events (:_~*~_:)Small minds discuss people (_)-(_) . Anon ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
On Thu, 2007-02-01 at 23:34 +, Prof Brian Ripley wrote: On Thu, 1 Feb 2007, Marc Schwartz wrote: Christos, Haccording to the Value section in ?merge: A data frame. The rows are by default lexicographically sorted on the common columns, but for sort=FALSE are in an unspecified order. There is also a sort in the .Internal code. But I am not buying that this is a major part of the time without detailed evidence from profiling. Sorting 35k numbers should take a few milliseconds, and less if they are already sorted. x - rnorm(35000) system.time(y - sort(x, method=quick)) [1] 0.003 0.001 0.004 0.000 0.000 system.time(sort(y, method=quick)) [1] 0.002 0.000 0.001 0.000 0.000 Having had a chance to mock up some examples, I would have to agree with Prof. Ripley on this point. Presuming that we are not missing something about the nature of Christos' data sets, here are 4 examples, with rows sorted in ascending order, descending order, reversed sort order and random order. In theory, the descending order example should, I believe, represent a worst cast scenario, since reverse sorting a sorted list is typically slowest. However, note that there is not much time variation below and running each of the examples several times resulted in material differences across runs. 1. Ascending order DF.X - data.frame(X = 1:35000, Y = runif(35000)) DF.Y - data.frame(X = 1:35000, Y = runif(35000)) system.time(DF.XY - merge(DF.X, DF.Y, by = X, all = TRUE)) [1] 0.249 0.004 0.264 0.000 0.000 2. Descending order DF.X - data.frame(X = 35000:1, Y = runif(35000)) DF.Y - data.frame(X = 35000:1, Y = runif(35000)) system.time(DF.XY - merge(DF.X, DF.Y, by = X, all = TRUE)) [1] 0.300 0.007 0.309 0.000 0.000 3. Reversed sort order DF.X - data.frame(X = 35000:1, Y = runif(35000)) DF.Y - data.frame(X = 1:35000, Y = runif(35000)) system.time(DF.XY - merge(DF.X, DF.Y, by = X, all = TRUE)) [1] 0.236 0.008 0.245 0.000 0.000 4. Random order DF.X - data.frame(X = sample(35000), Y = runif(35000)) DF.Y - data.frame(X = sample(35000), Y = runif(35000)) system.time(DF.XY - merge(DF.X, DF.Y, by = X, all = TRUE)) [1] 0.339 0.016 0.357 0.000 0.000 Spending some time looking at profiling the descending order example, we get: summaryRprof() $by.self self.time self.pct total.time total.pct duplicated.default 0.16 38.1 0.16 38.1 match 0.08 19.0 0.08 19.0 sort.list 0.08 19.0 0.08 19.0 [.data.frame0.04 9.5 0.24 57.1 merge.data.frame0.02 4.8 0.42 100.0 names.default 0.02 4.8 0.02 4.8 seq_len 0.02 4.8 0.02 4.8 merge 0.00 0.0 0.42 100.0 [ 0.00 0.0 0.24 57.1 any 0.00 0.0 0.18 42.9 duplicated 0.00 0.0 0.18 42.9 cbind 0.00 0.0 0.04 9.5 data.frame 0.00 0.0 0.04 9.5 data.row.names 0.00 0.0 0.02 4.8 names 0.00 0.0 0.02 4.8 row.names- 0.00 0.0 0.02 4.8 row.names-.data.frame 0.00 0.0 0.02 4.8 $by.total total.time total.pct self.time self.pct merge.data.frame 0.42 100.0 0.02 4.8 merge0.42 100.0 0.00 0.0 [.data.frame 0.24 57.1 0.04 9.5 [0.24 57.1 0.00 0.0 any 0.18 42.9 0.00 0.0 duplicated 0.18 42.9 0.00 0.0 duplicated.default 0.16 38.1 0.16 38.1 match0.08 19.0 0.08 19.0 sort.list0.08 19.0 0.08 19.0 cbind0.04 9.5 0.00 0.0 data.frame 0.04 9.5 0.00 0.0 names.default0.02 4.8 0.02 4.8 seq_len 0.02 4.8 0.02 4.8 data.row.names 0.02 4.8 0.00 0.0 names0.02 4.8 0.00 0.0 row.names- 0.02 4.8 0.00 0.0 row.names-.data.frame 0.02 4.8 0.00 0.0 $sampling.time [1] 0.42 The above suggests that a meaningful amount of time is spent in checking for and dealing with duplicates in the common ('by') columns. To that end: DF.X - data.frame(X = sample(1, 35000, replace = TRUE), Y = runif(35000)) DF.Y - data.frame(X = sample(1, 35000, replace = TRUE), Y = runif(35000)) system.time(DF.XY - merge(DF.X, DF.Y, by = X, all = TRUE)) [1] 3.316 0.148
Re: [R] Lining up x-y datasets based on values of x
Marc, I don't think the issue is duplicates in the matching columns. The data were generated by an instrument (NMR spectrometer), processed by the instrument's software through an FFT transform and other transformations and finally reported as a sequence of chemical shift (x) vs intensity (y) pairs. So all x values are unique. For the example that I reported earlier: length(nmr.spectra.serum[[1]]$V1) [1] 32768 length(unique(nmr.spectra.serum[[1]]$V1)) [1] 32768 length(nmr.spectra.serum[[2]]$V1) [1] 32768 length(unique(nmr.spectra.serum[[2]]$V1)) [1] 32768 And most of the x-values are common sum(nmr.spectra.serum[[1]]$V1 %in% nmr.spectra.serum[[2]]$V1) [1] 32625 For this reason, merge is probably an overkill for this problem and my initial thought was to align the datasets through some simple index-shifting operation. Profiling of the merge code in my case shows that most of the time is spent on data frame subsetting operations and on internal merge and rbind calls secondarily (if I read the summary output correctly). So even if most of the time in the internal merge function is spent on sorting (haven't checked the source code), this is in the worst case a rather minor effect, as suggested by Prof. Ripley. Rprof(merge.out) zz - merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=T) Rprof(NULL) summaryRprof(merge.out) $by.self self.time self.pct total.time total.pct merge.data.frame6.56 50.0 11.84 90.2 [.data.frame2.42 18.4 3.68 28.0 merge 1.28 9.8 13.12 100.0 rbind 1.24 9.5 1.36 10.4 names-.default 1.16 8.8 1.16 8.8 row.names-.data.frame 0.12 0.9 0.18 1.4 duplicated.default 0.12 0.9 0.12 0.9 make.unique 0.10 0.8 0.10 0.8 data.frame 0.02 0.2 0.04 0.3 * 0.02 0.2 0.02 0.2 is.na 0.02 0.2 0.02 0.2 match 0.02 0.2 0.02 0.2 order 0.02 0.2 0.02 0.2 unclass 0.02 0.2 0.02 0.2 [ 0.00 0.0 3.68 28.0 do.call 0.00 0.0 1.18 9.0 names- 0.00 0.0 1.16 8.8 row.names- 0.00 0.0 0.18 1.4 any 0.00 0.0 0.14 1.1 duplicated 0.00 0.0 0.12 0.9 cbind 0.00 0.0 0.04 0.3 as.vector 0.00 0.0 0.02 0.2 seq 0.00 0.0 0.02 0.2 seq.default 0.00 0.0 0.02 0.2 $by.total total.time total.pct self.time self.pct merge 13.12 100.0 1.28 9.8 merge.data.frame11.84 90.2 6.56 50.0 [.data.frame 3.68 28.0 2.42 18.4 [3.68 28.0 0.00 0.0 rbind1.36 10.4 1.24 9.5 do.call 1.18 9.0 0.00 0.0 names-.default 1.16 8.8 1.16 8.8 names- 1.16 8.8 0.00 0.0 row.names-.data.frame 0.18 1.4 0.12 0.9 row.names- 0.18 1.4 0.00 0.0 any 0.14 1.1 0.00 0.0 duplicated.default 0.12 0.9 0.12 0.9 duplicated 0.12 0.9 0.00 0.0 make.unique 0.10 0.8 0.10 0.8 data.frame 0.04 0.3 0.02 0.2 cbind0.04 0.3 0.00 0.0 *0.02 0.2 0.02 0.2 is.na0.02 0.2 0.02 0.2 match0.02 0.2 0.02 0.2 order0.02 0.2 0.02 0.2 unclass 0.02 0.2 0.02 0.2 as.vector0.02 0.2 0.00 0.0 seq 0.02 0.2 0.00 0.0 seq.default 0.02 0.2 0.00 0.0 $sampling.time [1] 13.12 Thanks again for your time in looking into this. -Christos -Original Message- From: Marc Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 9:59 PM To: Prof Brian Ripley Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] Subject: Re: [R] Lining up x-y datasets based on values of x On Thu, 2007-02-01 at 23:34 +, Prof Brian Ripley wrote: On Thu, 1 Feb 2007, Marc Schwartz wrote: Christos,
Re: [R] Problems installing R-2.4.1 on Solaris 11 x-86 from source: error in gmake after successful configure
On Thu, 2007-02-01 at 20:39 +, Prof Brian Ripley wrote: What is 'Solaris 11'? According to www.sun.com, the latest Solaris version is 10, and my sysadmins have not heard of Solaris 11. That is the solaris express community release, or the pre-release of the upcoming OpenSolaris http://www.opensolaris.org You seem to be missing the Solaris compilation tools, ar in this case. In Solaris = 10 they are in /usr/ccs/bin, not in the path by default. On Wed, 31 Jan 2007, Octavio Tourinho wrote: Dear friends, I am trying to install R-2.4.1 from source on Solaris 11 x-86. 64 bits, There is 32-bit x86 and 64-bit amd64 or x86_64. running on Sun Ultra-20 workstation, and using the SunStudio 11 compilers. I was able to configure R correctly, but received an error in gmake, aparently related to bzip2 which I have been unable to debug. The messages are listed below. The configure.log and configure.status files are attached. Any help would be sincerely appreciated. Octavio Tourinho = R is now configured for i386-pc-solaris2.11 Source directory: . Installation directory:/usr/local C compiler:gcc -std=gnu99 -D__NO_MATH_INLINES -g -O2 Fortran 77 compiler: g77 -g -O2 C++ compiler: g++ -g -O2 Fortran 90/95 compiler:f95 -g Interfaces supported: X11, tcltk External libraries:readline Additional capabilities: PNG, JPEG, NLS Options enabled: shared BLAS, R profiling Recommended packages: yes configure: WARNING: you cannot build DVI versions of the R manuals configure: WARNING: you cannot build PDF versions of the R manuals # gmake gmake[1]: Entering directory `/usr/local/R-2.4.1/m4' gmake[1]: Nothing to be done for `R'. gmake[1]: Leaving directory `/usr/local/R-2.4.1/m4' gmake[1]: Entering directory `/usr/local/R-2.4.1/tools' gmake[1]: Nothing to be done for `R'. gmake[1]: Leaving directory `/usr/local/R-2.4.1/tools' gmake[1]: Entering directory `/usr/local/R-2.4.1/doc' gmake[2]: Entering directory `/usr/local/R-2.4.1/doc/html' gmake[3]: Entering directory `/usr/local/R-2.4.1/doc/html/search' gmake[3]: Leaving directory `/usr/local/R-2.4.1/doc/html/search' gmake[2]: Leaving directory `/usr/local/R-2.4.1/doc/html' gmake[2]: Entering directory `/usr/local/R-2.4.1/doc/manual' gmake[2]: Nothing to be done for `R'. gmake[2]: Leaving directory `/usr/local/R-2.4.1/doc/manual' gmake[1]: Leaving directory `/usr/local/R-2.4.1/doc' gmake[1]: Entering directory `/usr/local/R-2.4.1/etc' gmake[1]: Leaving directory `/usr/local/R-2.4.1/etc' gmake[1]: Entering directory `/usr/local/R-2.4.1/share' gmake[1]: Leaving directory `/usr/local/R-2.4.1/share' gmake[1]: Entering directory `/usr/local/R-2.4.1/src' gmake[2]: Entering directory `/usr/local/R-2.4.1/src/scripts' creating src/scripts/R.fe gmake[3]: Entering directory `/usr/local/R-2.4.1/src/scripts' gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/scripts' gmake[2]: Leaving directory `/usr/local/R-2.4.1/src/scripts' gmake[2]: Entering directory `/usr/local/R-2.4.1/src/include' config.status: creating src/include/config.h config.status: src/include/config.h is unchanged Rmath.h is unchanged gmake[3]: Entering directory `/usr/local/R-2.4.1/src/include/R_ext' gmake[3]: Nothing to be done for `R'. gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/include/R_ext' gmake[2]: Leaving directory `/usr/local/R-2.4.1/src/include' gmake[2]: Entering directory `/usr/local/R-2.4.1/src/extra' gmake[3]: Entering directory `/usr/local/R-2.4.1/src/extra/blas' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/blas' gmake[4]: `libRblas.so' is up to date. gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/blas' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/blas' /usr/local/R-2.4.1/lib/libRblas.so is unchanged gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/blas' gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/extra/blas' gmake[3]: Entering directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[4]: Entering directory `/usr/local/R-2.4.1/src/extra/bzip2' rm -f libbz2.a false cr libbz2.a blocksort.o bzlib.o compress.o crctable.o decompress.o huffman.o randtable.o gmake[4]: *** [libbz2.a] Error 1 gmake[4]: Leaving directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[3]: *** [R] Error 2 gmake[3]: Leaving directory `/usr/local/R-2.4.1/src/extra/bzip2' gmake[2]: *** [R] Error 1 gmake[2]: Leaving directory `/usr/local/R-2.4.1/src/extra' gmake[1]: *** [R] Error 1 gmake[1]: Leaving directory `/usr/local/R-2.4.1/src' gmake: *** [R] Error 1 I have build
Re: [R] Lining up x-y datasets based on values of x
Thanks Gabor. This is along the lines of what I was looking for. In fact the merge function for zoo objects (ordered) turns out to be almost an order of magnitude faster than the generic merge function for my problem: system.time( + zz - merge( spec.1 = zoo(nmr.spectra.serum[[1]]$V2, nmr.spectra.serum[[1]]$V1), +spec.2 = zoo(nmr.spectra.serum[[2]]$V2, nmr.spectra.serum[[2]]$V1), fill=NA ) + ) [1] 0.74 0.07 0.82 NA NA system.time( + ww - merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=T) + ) [1] 6.85 0.05 6.94 NA NA head(zz) spec.1 spec.2 -1322.2 -0.651 NA -1321.9 -0.266 NA -1321.7 -0.962 NA -1321.4 -0.602 NA -1321.2 0.753 NA -1320.9 1.212 NA head(ww) V1 V2.x V2.y 1 -1322.2 -0.651 NA 2 -1321.9 -0.266 NA 3 -1321.7 -0.962 NA 4 -1321.4 -0.602 NA 5 -1321.2 0.753 NA 6 -1320.9 1.212 NA Thanks again. -Christos -Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 7:25 PM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Lining up x-y datasets based on values of x The zoo package has a multiway merge with optional zero fill. Here are two ways: library(zoo) merge(x = zoo(x[,2], x[,1]), y = zoo(y[,2], y[,1]), z = zoo(z[,2], z[,1]), fill = 0) # or library(zoo) X - list(x = x, y = y, z = z) merge0 - function(..., fill = 0) merge(..., fill = fill) do.call(merge0, lapply(X, function(x) zoo(x[,2], x[,1]))) To get more info on zoo try: vignette(zoo) On 2/1/07, Christos Hatzis [EMAIL PROTECTED] wrote: Hi, I was wondering if there is a direct approach for lining up 2-column matrices according to the values of the first column. An example and a brute-force approach is given below: x - cbind(1:10, runif(10)) y - cbind(5:14, runif(10)) z - cbind((-4):5, runif(10)) xx - seq( min(c(x[,1],y[,1],z[,1])), max(c(x[,1],y[,1],z[,1])), 1) w - cbind(xx, matrix(rep(0, 3*length(xx)), ncol=3)) w[ xx = x[1,1] xx = x[10,1], 2 ] - x[,2] w[ xx = y[1,1] xx = y[10,1], 3 ] - y[,2] w[ xx = z[1,1] xx = z[10,1], 4 ] - z[,2] w I appreciate any pointers. Thanks. Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Lining up x-y datasets based on values of x
On Thu, 2007-02-01 at 22:46 -0500, Christos Hatzis wrote: Marc, I don't think the issue is duplicates in the matching columns. The data were generated by an instrument (NMR spectrometer), processed by the instrument's software through an FFT transform and other transformations and finally reported as a sequence of chemical shift (x) vs intensity (y) pairs. So all x values are unique. For the example that I reported earlier: length(nmr.spectra.serum[[1]]$V1) [1] 32768 length(unique(nmr.spectra.serum[[1]]$V1)) [1] 32768 length(nmr.spectra.serum[[2]]$V1) [1] 32768 length(unique(nmr.spectra.serum[[2]]$V1)) [1] 32768 And most of the x-values are common sum(nmr.spectra.serum[[1]]$V1 %in% nmr.spectra.serum[[2]]$V1) [1] 32625 For this reason, merge is probably an overkill for this problem and my initial thought was to align the datasets through some simple index-shifting operation. Profiling of the merge code in my case shows that most of the time is spent on data frame subsetting operations and on internal merge and rbind calls secondarily (if I read the summary output correctly). So even if most of the time in the internal merge function is spent on sorting (haven't checked the source code), this is in the worst case a rather minor effect, as suggested by Prof. Ripley. Rprof(merge.out) zz - merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=T) Rprof(NULL) summaryRprof(merge.out) $by.self self.time self.pct total.time total.pct merge.data.frame6.56 50.0 11.84 90.2 [.data.frame2.42 18.4 3.68 28.0 merge 1.28 9.8 13.12 100.0 rbind 1.24 9.5 1.36 10.4 names-.default 1.16 8.8 1.16 8.8 row.names-.data.frame 0.12 0.9 0.18 1.4 duplicated.default 0.12 0.9 0.12 0.9 make.unique 0.10 0.8 0.10 0.8 data.frame 0.02 0.2 0.04 0.3 * 0.02 0.2 0.02 0.2 is.na 0.02 0.2 0.02 0.2 match 0.02 0.2 0.02 0.2 order 0.02 0.2 0.02 0.2 unclass 0.02 0.2 0.02 0.2 [ 0.00 0.0 3.68 28.0 do.call 0.00 0.0 1.18 9.0 names- 0.00 0.0 1.16 8.8 row.names- 0.00 0.0 0.18 1.4 any 0.00 0.0 0.14 1.1 duplicated 0.00 0.0 0.12 0.9 cbind 0.00 0.0 0.04 0.3 as.vector 0.00 0.0 0.02 0.2 seq 0.00 0.0 0.02 0.2 seq.default 0.00 0.0 0.02 0.2 $by.total total.time total.pct self.time self.pct merge 13.12 100.0 1.28 9.8 merge.data.frame11.84 90.2 6.56 50.0 [.data.frame 3.68 28.0 2.42 18.4 [3.68 28.0 0.00 0.0 rbind1.36 10.4 1.24 9.5 do.call 1.18 9.0 0.00 0.0 names-.default 1.16 8.8 1.16 8.8 names- 1.16 8.8 0.00 0.0 row.names-.data.frame 0.18 1.4 0.12 0.9 row.names- 0.18 1.4 0.00 0.0 any 0.14 1.1 0.00 0.0 duplicated.default 0.12 0.9 0.12 0.9 duplicated 0.12 0.9 0.00 0.0 make.unique 0.10 0.8 0.10 0.8 data.frame 0.04 0.3 0.02 0.2 cbind0.04 0.3 0.00 0.0 *0.02 0.2 0.02 0.2 is.na0.02 0.2 0.02 0.2 match0.02 0.2 0.02 0.2 order0.02 0.2 0.02 0.2 unclass 0.02 0.2 0.02 0.2 as.vector0.02 0.2 0.00 0.0 seq 0.02 0.2 0.00 0.0 seq.default 0.02 0.2 0.00 0.0 $sampling.time [1] 13.12 Thanks again for your time in looking into this. -Christos Christos, Thanks for the follow up. Thought I had something, but apparently not. Question: What is the actual structure of the nmr.spectra.serum objects? The indexing approach that you have suggests they are
Re: [R] Lining up x-y datasets based on values of x
Marc, The data structure is a list of data frames generated from read.table: class(nmr.spectra.serum) [1] list class(nmr.spectra.serum[[1]]) [1] data.frame dim(nmr.spectra.serum[[1]]) [1] 32768 2 Converting the data.frames to matrices does not have much of an effect on timing. -Christos -Original Message- From: Marc Schwartz [mailto:[EMAIL PROTECTED] Sent: Thursday, February 01, 2007 11:06 PM To: [EMAIL PROTECTED] Cc: 'Prof Brian Ripley'; r-help@stat.math.ethz.ch Subject: Re: [R] Lining up x-y datasets based on values of x On Thu, 2007-02-01 at 22:46 -0500, Christos Hatzis wrote: Marc, I don't think the issue is duplicates in the matching columns. The data were generated by an instrument (NMR spectrometer), processed by the instrument's software through an FFT transform and other transformations and finally reported as a sequence of chemical shift (x) vs intensity (y) pairs. So all x values are unique. For the example that I reported earlier: length(nmr.spectra.serum[[1]]$V1) [1] 32768 length(unique(nmr.spectra.serum[[1]]$V1)) [1] 32768 length(nmr.spectra.serum[[2]]$V1) [1] 32768 length(unique(nmr.spectra.serum[[2]]$V1)) [1] 32768 And most of the x-values are common sum(nmr.spectra.serum[[1]]$V1 %in% nmr.spectra.serum[[2]]$V1) [1] 32625 For this reason, merge is probably an overkill for this problem and my initial thought was to align the datasets through some simple index-shifting operation. Profiling of the merge code in my case shows that most of the time is spent on data frame subsetting operations and on internal merge and rbind calls secondarily (if I read the summary output correctly). So even if most of the time in the internal merge function is spent on sorting (haven't checked the source code), this is in the worst case a rather minor effect, as suggested by Prof. Ripley. Rprof(merge.out) zz - merge(nmr.spectra.serum[[1]], nmr.spectra.serum[[2]], by=V1, all=T, sort=T) Rprof(NULL) summaryRprof(merge.out) $by.self self.time self.pct total.time total.pct merge.data.frame6.56 50.0 11.84 90.2 [.data.frame2.42 18.4 3.68 28.0 merge 1.28 9.8 13.12 100.0 rbind 1.24 9.5 1.36 10.4 names-.default 1.16 8.8 1.16 8.8 row.names-.data.frame 0.12 0.9 0.18 1.4 duplicated.default 0.12 0.9 0.12 0.9 make.unique 0.10 0.8 0.10 0.8 data.frame 0.02 0.2 0.04 0.3 * 0.02 0.2 0.02 0.2 is.na 0.02 0.2 0.02 0.2 match 0.02 0.2 0.02 0.2 order 0.02 0.2 0.02 0.2 unclass 0.02 0.2 0.02 0.2 [ 0.00 0.0 3.68 28.0 do.call 0.00 0.0 1.18 9.0 names- 0.00 0.0 1.16 8.8 row.names- 0.00 0.0 0.18 1.4 any 0.00 0.0 0.14 1.1 duplicated 0.00 0.0 0.12 0.9 cbind 0.00 0.0 0.04 0.3 as.vector 0.00 0.0 0.02 0.2 seq 0.00 0.0 0.02 0.2 seq.default 0.00 0.0 0.02 0.2 $by.total total.time total.pct self.time self.pct merge 13.12 100.0 1.28 9.8 merge.data.frame11.84 90.2 6.56 50.0 [.data.frame 3.68 28.0 2.42 18.4 [3.68 28.0 0.00 0.0 rbind1.36 10.4 1.24 9.5 do.call 1.18 9.0 0.00 0.0 names-.default 1.16 8.8 1.16 8.8 names- 1.16 8.8 0.00 0.0 row.names-.data.frame 0.18 1.4 0.12 0.9 row.names- 0.18 1.4 0.00 0.0 any 0.14 1.1 0.00 0.0 duplicated.default 0.12 0.9 0.12 0.9 duplicated 0.12 0.9 0.00 0.0 make.unique 0.10 0.8 0.10 0.8 data.frame 0.04 0.3 0.02 0.2 cbind0.04 0.3 0.00 0.0 *0.02 0.2 0.02 0.2 is.na0.02 0.2 0.02 0.2 match0.02 0.2 0.02 0.2 order0.02 0.2 0.02 0.2
[R] Regression trees with an ordinal response variable
Hi, I am working on a regression tree in Rpart that uses a continuous response variable that is ordered. I read a previous response by Pfr. Ripley to a inquiry regarding the ability of rpart to handle ordinal responses in 2003. At that time rpart was unable to implement an algorithm to handle ordinal responses. Has there been any effort to rectify this in recent years? Thanks! Stacey On Mon, 2 Jun 2003, Andreas Christmann wrote: 1. RE: Ordinal data - Regression Trees Proportional Odds (Liaw, Andy) AFAIK there's no implementation (or description) of tree algorithm that handles ordinal response. Regression trees with an ordinal response variable can be computed with SPSS Answer Tree 3.0. They *can* be handled by tree or rpart in R. I think Andy's point was that there is no consensus as to the right way to handle them: certainly using the codes of categories works and may often be reasonable, and treating ordinal responses as categorical is also very often perfectly adequate. Note that rpart is user-extensible, so it would be reasonably easy to write an extension for a proportional-odds logistic regression model, if that is thought appropriate (and it seems strange to me to impose such strong structure on the model with such a general `linear predictor': POLR models are often in my experience a poor reflection of real problems). -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.