Re: [R] panel.first problem when plotting with formula
Peter,

Good idea! (Why didn't I think of that?) If it stumped the R list, I think there is probably a slight bug in the plot formula method.

Problems like this make me realize how amazingly full-featured and relatively bug-free R is. A problem like this would never happen in Excel, because this level of functionality does not exist there. However, if it did, it would probably never be fixed... and you could substitute any commercial software for Excel.

Gene

On Tue, May 24, 2011 at 3:13 AM, Peter Ehlers ehl...@ucalgary.ca wrote:
> On 2011-05-23 16:54, Gene Leynes wrote:
>> I wrote a little function called bgfun that adds gridlines and a
>> background, but it's not working when I plot using the formula.
>>
>> I have some theories on what's happening, but even if my theory is
>> right, I don't know how to fix it.
>>
>> Does someone have a straightforward silver bullet?
>
> No silver bullet, but this seems to work:
>
>   plot(y ~ x, data=dat, type="n")
>   points(y ~ x, data=dat, panel.first=bgfun())
>
> (I think that plot.formula may need a fix but offhand
> I can't see whether that's easy or hard.)
>
> Peter Ehlers
>
>> Thank you,
>> Gene
>>
>>   bgfun = function(color='honeydew2', linecolor='grey45',
>>                    addgridlines=TRUE){
>>       tmp = par("usr")
>>       rect(tmp[1], tmp[3], tmp[2], tmp[4], col=color)
>>       if(addgridlines){
>>           ylimits = par()$usr[c(3,4)]
>>           abline(h=pretty(ylimits,10), lty=2, col=linecolor)
>>       }
>>   }
>>   dat = data.frame(x=1:10, y=1:10)
>>   ## Works
>>   plot(dat$x, dat$y, panel.first=bgfun())
>>   ## Why doesn't this work?
>>   plot(y ~ x, data=dat, panel.first=bgfun())

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
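A more explicit variant of the workaround quoted above, sketched on the assumption that the goal is simply to have the background drawn before the points: set up the axes with type = "n", call bgfun() directly, then add the points. Two things here are not from the thread: bgfun is changed to return the plot limits invisibly, and pdf(NULL) is used only so the sketch runs without opening a window.

```r
# bgfun as posted in the thread, with par("usr") quoted;
# returning the limits invisibly is an addition for this sketch.
bgfun <- function(color = "honeydew2", linecolor = "grey45",
                  addgridlines = TRUE) {
  usr <- par("usr")  # c(x1, x2, y1, y2) of the current plot region
  rect(usr[1], usr[3], usr[2], usr[4], col = color)
  if (addgridlines) {
    abline(h = pretty(usr[3:4], 10), lty = 2, col = linecolor)
  }
  invisible(usr)
}

dat <- data.frame(x = 1:10, y = 1:10)

pdf(NULL)                            # assumption: null device, no file written
plot(y ~ x, data = dat, type = "n")  # 1. set up the axes only
usr <- bgfun()                       # 2. background and gridlines
points(y ~ x, data = dat)            # 3. data drawn on top
dev.off()
```

Because nothing is passed through panel.first, this version does not depend on when plot.formula evaluates its extra arguments.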
[R] plotting texas school district using shape files
Hi,

I was trying to create a map of Texas school districts using the shape file of Texas. I could not find any other helpful mail in the mailing list.

  txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package="maptools"))

Error: read.shape not found. But read.shape is there in maptools. If anyone can help me out it will be great.

Thanks in advance.
Shant
[R] Processing large datasets
Hi R list,

I'm new to R software, so I'd like to ask about its capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed.

This is a typical set of data. Each day contains millions of records (up to 10 million, unfiltered).

  2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
  2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
  2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0

I'll need to filter it first based on some criteria. Since I keep it in a MySQL database, this can be done through a query. It is not super efficient; I have checked that already.

Then I need to aggregate the dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, this can be done through a database query; I'm not sure which is going to be faster. The aggregated tables are going to be much smaller, like thousands of rows per observation day.

Then I calculate basic statistics: mean, standard deviation, sums etc. After the stats are calculated, I need to perform some statistical hypothesis tests.

So, my question is: which tool is faster for data aggregation and filtration on big datasets: MySQL or R?

Thanks,
--Roman N.
Re: [R] How to call an external program/web page under R for Mac OS?
On 05/24/2011 10:56 PM, jbrezmes wrote:
> I would like to be able to call external programs such as Java programs
> (*.jar files) or bring up the browser at a given address. Can that be
> done from R? I am running R on a Mac OS X system.
>
> Thanks again for any suggestions or solutions.
>
> Best regards,
>
> Jesus Brezmes

See ?system for executing external calls...

cheers,
Paul

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
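To make the pointer to ?system a little more concrete, here is a minimal sketch using system2() and browseURL(), both of which ship with R. The helper names run_jar and open_page, and the idea that a java launcher is on the PATH, are assumptions for illustration only; they are not from the thread.

```r
# Hypothetical helper: run a jar with whatever "java" is found on the PATH.
# system2() returns the exit status of the command by default.
run_jar <- function(jar, args = character()) {
  system2("java", args = c("-jar", shQuote(jar), args))
}

# Hypothetical helper: open a URL in the system's default browser.
open_page <- function(url) {
  browseURL(url)
}

# Harmless demonstration: capture the output of an external command
# as a character vector instead of letting it print to the console.
out <- system2("echo", args = "hello", stdout = TRUE)
print(out)
```

A call such as run_jar("myprogram.jar") or open_page("https://www.r-project.org/") would then start the external program or the browser; both are left uncalled here so the sketch has no side effects.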
[R] questions about rpart
Hi,

I have applied rpart to my data set and for cp=.01, the cross-validation error (xerr) is less (min 0.05) than for other cp. However, in the final tree, an important predictor is not retained. Moreover, another predictor contains missing values in 40% of samples. So I don't know if the important predictor is not retained as the result of missing values or if I should have selected other values of cp. Note that the data contains a binary class.

Another question is how to interpret the relative or cross-validation error, for example in terms of the number of samples. I know that they are scaled to 1 at the root node of the tree, but for any number of splits, how much error do we make for each sample (we don't know the number of samples in each split returned by printcp)?

Any other information is welcome.

Look forward to your reply,

Carol
Re: [R] plotting single variables common to multiple data frames
Hi John,

First off, thanks again for your help with this. Much appreciated. I've attached a file of the original data (yes, as you can see there are header names). These hour-long files are zipped together on a computer (which is actually an analyzer) and sent each morning to a server. I then run the function below, which extracts the data files and binds them into daily files. I then save these files in the .RData format (the 'stuff' file I sent you).

I agree that the way I am saving these files must be causing the problem. I'm very new to R and this was the only way I found to save the files that 'seemed' to work. Please tell me if you know a better way (I'm sure you do)!

  ### Function used to extract and save ###
  # this function extracts 1 hour iso data from zip files and creates daily files
  #
  path = "Y:\\Data\\"
  pathout = "Y:\\Daily\\"
  time = Sys.time()
  tt <- as.numeric(format.Date(time, "%Y%m%d"))
  #tt <- 20110523  # used to manually enter in date
  end = ".zip"
  ind = paste(path, tt, end, sep="")
  xx = unzip(ind)

  # merge all files to one and split by day
  iso <- c(); d <- c()
  for (x in xx) {
      u <- read.table(x, header=TRUE, sep="", dec=".")
      u$dataset = x
      iso = rbind(iso, u)
      udate <- unique(iso$DATE)
      d = split(iso, iso$DATE)
  }

  # create directories and load if files exist. Then merge data from same day and save
  fname <- c()
  finame <- c()
  old <- c()
  udate = gsub("[^0-9]", "", udate)
  for (i in 1:length(udate)){
      #fname[i] <- paste(pathout, udate[i], sep="")
      finame[i] = paste(pathout, udate[i], ".RData", sep="")
      deskdir <- dir.create(pathout, showWarnings=FALSE)
      if (file.exists(finame[i])){
          old = load(finame[i])
          e3 <- new.env()
          old <- get('isot', e3)
          isot = merge(old, d[i], all = TRUE)
          save(isot, file=finame[i])
      } else {
          isot = d[i]
          save(isot, file=finame[i])
      }
  }
  rm(list = ls(all = TRUE))
  # End

Once I have these daily RData files (e.g. 20110520.RData) I'd like to be able to grab any number of them and plot them all together. I'm trying to get this process streamlined as much as possible so I can come into work each day and plot the data from the last week with 'a click of a button'.

Thanks again!
Mat

On 5/24/2011 7:15 PM, John Kane wrote:
> Whoa, more data than I needed.
>
> I called the rdata file from your dput results 'stuff', so any commands
> to stuff are to that file.
>
> You say "The structure is kind of strange" and I have to agree with you.
> As it stands I cannot get it to do anything.
>
> A str(stuff) command shows that it is a data.frame with 8258 obs. of
> 38 variables. However it also reports a variable called
> X2011.05.20.TIME which is a Factor w/ 114230 levels, and this is
> patently nonsense.
>
> It is almost a certainty that it is something about the code you are
> using to load the data or the original structure of the file which is
> causing the problem.
>
> Simple commands like:
>   names(stuff)
>   stuff[1,1]
>   stuff[,1]
>   dim(stuff)
> are not working or are returning nonsense.
>
> I took the file, wrote it back out of R as a csv file and read it back
> in, and I seem to have something I can work with but, of course, that
> does not mean it looks like your original data. See my code below.
>
> Some quick questions
> 1. What is the format of the original data files?
> 2. What commands are you using to read the data into R? Please supply
>    the code.
> 3. Do the files actually have header names? It looks to me as if the
>    reading-in command thinks you have variable names at the top of the
>    column but you don't, and so it's using the first row of data as the
>    variable names.
>
> My steps
> #===
> # I took the stuff file and did a write.table on it,
> # storing the file as a text (or csv) file called mystuff
> #===
> write.table(stuff, file="c:/rdata/mystuff.csv", row.names = FALSE,
>             sep=",", col.names=FALSE)
> #
> # I, then, read the data back into a new file new.data
> #
> new.data <- read.csv("c:/rdata/mystuff.csv", sep=",", header=FALSE)
> #
> # now commands like names(new.data) new.data[,1] dim(new.data)
> # are working the way we would expect.
> #=
> names(new.data)
> new.data[,1]
>
> --- On Tue, 5/24/11, Mathew Brown mathew.br...@forst.uni-goettingen.de wrote:
>
>> From: Mathew Brown mathew.br...@forst.uni-goettingen.de
>> Subject: Re: [R] plotting single variables common to multiple data frames
>> To: John Kane jrkrid...@yahoo.ca
>> Cc: r-help@r-project.org
>> Received: Tuesday, May 24, 2011, 10:38 AM
>>
>> Here is some data. Only one day, as two days were too big. The
>> structure is kind of strange and I'm not sure how to 'grab' a single
>> variable from it to plot. I would be happy if someone could tell me
>> how to do that.
>>
>> Cheers
>>
>> On 5/24/2011 3:55 PM, John Kane wrote:
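One possible contributor to the odd structure discussed above, offered as a guess rather than a diagnosis: in the posted loop, d[i] is a one-element list, not a data frame, so saving or merging it may well give prefixed column names such as X2011.05.20.TIME. The sketch below uses a tiny made-up data set (the column names DATE, TIME and value are assumptions) to show the difference between d[i] and d[[i]], and a simpler save/append cycle with saveRDS() and readRDS() in a temporary directory.

```r
# Made-up stand-in for the hourly data; real files will have other columns.
iso <- data.frame(DATE  = c("2011-05-20", "2011-05-20", "2011-05-21"),
                  TIME  = c("00:00", "01:00", "00:00"),
                  value = c(1.5, 2.5, 3.5))

d <- split(iso, iso$DATE)   # a named list with one data frame per day

class(d[1])     # single brackets keep the list wrapper
class(d[[1]])   # double brackets return the data frame itself

# Save one day as a plain data frame (assumption: a temp dir stands in
# for the Y:\Daily folder used in the thread).
day_file <- file.path(tempdir(), "20110520.rds")
saveRDS(d[["2011-05-20"]], day_file)

# Later: append newly arrived rows for the same day and save again.
new_rows <- data.frame(DATE = "2011-05-20", TIME = "02:00", value = 4.5)
old  <- readRDS(day_file)
isot <- rbind(old, new_rows)
saveRDS(isot, day_file)
```

Several daily files could then be read with lapply(files, readRDS) and stacked with do.call(rbind, ...) before plotting a single variable.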
Re: [R] Count of rows while looping through data
An alternative approach would be to `split` the data frame by family, then `lapply` a function selecting a random row from each slice, and then `rbind` it all together.

  x = data.frame(family = rep(1:20, sample(2:5, 20, replace=TRUE)), xyz=1)
  randomrow <- function(x) x[sample(1:nrow(x), 1), ]

  # step by step
  x.split <- split(x, x$family)
  x.rnd <- lapply(x.split, randomrow)
  x.togetheragain <- do.call(rbind, x.rnd)

  # or more concisely
  do.call(rbind, lapply(split(x, x$family), randomrow))

Best regards,
Kenn

On Wed, May 25, 2011 at 12:54 AM, Phil Spector spec...@stat.berkeley.edu wrote:
> Jeanna -
> I can't imagine how you could solve this problem with a loop, but
> here's one way to solve it using R:
>
> First, I'll create a data frame with a family variable:
>
>   x = data.frame(family = rep(1:20, sample(2:5, 20, replace=TRUE)))
>
> Next, I'll number each family member within each family:
>
>   x$seq = ave(x$family, x$family, FUN=seq)
>
> Now I'll choose a random number within each family:
>
>   x$use = ave(x$family, x$family, FUN=function(x) sample(1:length(x), 1))
>
> Finally, I'll select the family member whose sequence number matches
> the random number:
>
>   answer = subset(x, seq == use)
>
> Hope this helps. Take a look at the help page for the ave function to
> understand how it works.
>
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
>
> On Tue, 24 May 2011, Jeanna wrote:
>> I have a data table with one column that indicates families, and
>> subsequent columns with other characteristics. I want to randomize one
>> member of the family to a separate table. My approach is to count the
>> number of members, set up a random number generator, and assign the
>> family member based on where they fall within the random number
>> spectrum.
>>
>> Is there a way to count the number of family members as I loop through
>> the whole table? Something like this:
>>
>>   for (j in 1:15){
>>     if (x$family[j] == x$family[j+1]){
>>       count = count + 1
>>
>> (which doesn't work) as I do the larger:
>>
>>   for (i in 2:nrow(x.tab)){
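For the narrower question of counting members per family without a loop, a small sketch with table() and ave() may help. The data frame below is invented for illustration (five families of fixed, known size), and the final step reuses the split/lapply/rbind idea from the replies above.

```r
set.seed(1)  # only so the random pick is repeatable

# Invented example: five families with 2, 3, 5, 2 and 4 members.
x <- data.frame(family = rep(1:5, times = c(2, 3, 5, 2, 4)))

# Count per family, as a table ...
sizes <- table(x$family)

# ... or attached to every row, so each member knows its family size.
x$n_members <- ave(x$family, x$family, FUN = length)

# One random member per family.
pick_one <- function(g) g[sample(nrow(g), 1), , drop = FALSE]
picked <- do.call(rbind, lapply(split(x, x$family), pick_one))
```

Here picked has exactly one row per family, and sizes holds the counts that the original loop was trying to accumulate.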
[R] help with tune.svm() e1071
Hi,

I am trying to use tune.svm in the e1071 package. The command I use is

  tobj <- tune.svm(labels, data = data, cost = 10^(1:2))

Should the last column of 'data' contain the labels as well? I want to use the linear kernel. But it gives me the error

  Error in model.frame.default(formula, data) :
    'data' must be a data.frame, not a matrix or an array

Do you know why this might happen?

best,
salih
[R] Fw: questions about rpart - cont.
Forgot to specify that the cross-val error cannot be decreased lower than 0.91. Note that for smaller values of cp than 0.01, the cross-val error increases.

Is the cross-val error a sum of squared errors or a relative error for a classification problem (method = "class" in the rpart function), or another type of error?

Is it possible to determine the true positives and false positives using rpart?

Thanks

----- Forwarded Message -----
From: carol white wht_...@yahoo.com
To: r-h...@stat.math.ethz.ch
Sent: Wed, May 25, 2011 9:06:15 AM
Subject: questions about rpart

Hi,

I have applied rpart to my data set and for cp=.01, the cross-validation error (xerr) is less (min 0.05) than for other cp. However, in the final tree, an important predictor is not retained. Moreover, another predictor contains missing values in 40% of samples. So I don't know if the important predictor is not retained as the result of missing values or if I should have selected other values of cp. Note that the data contains a binary class.

Another question is how to interpret the relative or cross-validation error, for example in terms of the number of samples. I know that they are scaled to 1 at the root node of the tree, but for any number of splits, how much error do we make for each sample (we don't know the number of samples in each split returned by printcp)?

Any other information is welcome.

Look forward to your reply,

Carol
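As a rough illustration of the two questions above, the sketch below uses the kyphosis data that ships with rpart, not the poster's data. It assumes default priors and losses, in which case the "rel error" and "xerror" columns of the cp table are misclassification errors scaled by the root node error, so multiplying by the root node error gives a per-sample error rate. The confusion table at the end is computed on the training data and will therefore look optimistic.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0.01)

printcp(fit)            # prints the cp table and the root node error
cptab <- fit$cptable    # columns: CP, nsplit, rel error, xerror, xstd

# Root node error: share of samples misclassified when every sample is
# assigned to the majority class (assumes default priors and losses).
root_err <- min(table(kyphosis$Kyphosis)) / nrow(kyphosis)

# Convert the scaled errors back to error rates per sample.
abs_err  <- cptab[, "rel error"] * root_err
abs_xerr <- cptab[, "xerror"] * root_err

# Observed versus predicted classes; with a binary outcome the cells are
# the true/false positives and negatives.
pred <- predict(fit, type = "class")
conf <- table(observed = kyphosis$Kyphosis, predicted = pred)
print(conf)
```

Multiplying abs_xerr by nrow(kyphosis) would give an approximate count of misclassified samples under cross-validation for each tree size.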
Re: [R] RGL package installation problem on Centos
Hi.

Thank you for your help. From your suggestions, I tried the following:

  R CMD INSTALL --no-test-load rgl_0.92.798.tar.gz

This seemed to load and install (starting R and issuing library(rgl) did not flag any problems). But running the sphere example from rgl causes big problems :-)

  # R

  R version 2.13.0 (2011-04-13)
  Copyright (C) 2011 The R Foundation for Statistical Computing
  ISBN 3-900051-07-0
  Platform: x86_64-unknown-linux-gnu (64-bit)

  R is free software and comes with ABSOLUTELY NO WARRANTY.
  You are welcome to redistribute it under certain conditions.
  Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

  R is a collaborative project with many contributors.
  Type 'contributors()' for more information and
  'citation()' on how to cite R or R packages in publications.

  Type 'demo()' for some demos, 'help()' for on-line help, or
  'help.start()' for an HTML browser interface to help.
  Type 'q()' to quit R.

  library(rgl)
  open3d()
  [1] 1
  spheres3d(rnorm(10), rnorm(10), rnorm(10), radius=runif(10), color=rainbow(10))
  X Error of failed request:  GLXUnsupportedPrivateRequest
    Major opcode of failed request:  143 (GLX)
    Minor opcode of failed request:  16 (X_GLXVendorPrivate)
    Serial number of failed request:  75
    Current serial number in output stream:  80

   *** caught segfault ***
  address (nil), cause 'memory not mapped'

  Traceback:
   1: .C(rgl_spheres, success = as.integer(FALSE), idata, as.numeric(vertex), as.numeric(radius), NAOK = TRUE)
   2: rgl.spheres(x = c(0.506515614656334, -0.610549216480097, 1.08552683577513, 0.189935807154803, 1.3670636776769, 1.0181689602839, -1.51133180077403, 1.41127485066926, 0.199668469858345, -1.22523054947931), y = c(-0.323499291411831, -1.00507951141751, -0.901821819799205, 1.41189828512003, -0.131573335707317, -0.308459525548042, 1.50221794165404, -0.154047787639801, 0.44717002689869, -0.93671163236924), z = c(0.836709660070246, -0.251235618242673, -2.02289120416259, 0.499914144749108, -0.458094619767492, 1.48047512280956, 0.80987242929676, -1.17963322744287, 0.81492625128413, 0.475181724036684), radius = c(0.174093995941803, 0.75503840832971, 0.562892300076783, 0.541058518458158, 0.724675815086812, 0.828356854617596, 0.423405217472464, 0.540400178171694, 0.0765824350528419, 0.55016236170195), color = c("#FF0000FF", "#FF9900FF", "#CCFF00FF", "#33FF00FF", "#00FF66FF", "#00FFFFFF", "#0066FFFF", "#3300FFFF", "#CC00FFFF", "#FF0099FF"), alpha = 1, lit = TRUE, ambient = "#000000", specular = "#FFFFFF", emission = "#000000", shininess = 50, smooth = TRUE, front = "filled", back = "filled", size = 3, lwd = 1, fog = FALSE, point_antialias = FALSE, line_antialias = FALSE, texture = NULL, textype = "rgb", texmipmap = FALSE, texminfilter = "linear", texmagfilter = "linear", texenvmap = FALSE)
   3: do.call(rgl.spheres, c(list(x = x, y = y, z = z, radius = radius), .fixMaterialArgs(..., Params = save)))
   4: spheres3d(rnorm(10), rnorm(10), rnorm(10), radius = runif(10), color = rainbow(10))

  Possible actions:
  1: abort (with core dump, if enabled)
  2: normal R exit
  3: exit R without saving workspace
  4: exit R saving workspace
  Selection:

Does this error message make anything clearer?

On Mon, May 23, 2011 at 2:43 PM, john herbert arraystrugg...@gmail.com wrote:
> Dear R users,
> I have installed the latest version of R from source on CentOS (using
> configure and make install). This seemed to work fine, with no errors
> reported, and R at the command line starts R.
>
> However, if I try to install the package rgl using
>
>   install.packages("rgl")
>
> I get the following error:
>
>   installing to /usr/local/lib64/R/library/rgl/libs
>   ** R
>   ** demo
>   ** inst
>   ** preparing package for lazy loading
>   ** help
>   *** installing help indices
>   ** building package indices ...
>   ** testing if installed package can be loaded
>
>    *** caught segfault ***
>   address (nil), cause 'memory not mapped'
>   aborting ...
>   sh: line 1: 23732 Segmentation fault '/usr/local/lib64/R/bin/R' --no-save --slave /tmp/RtmpkvIjOb/file6d97876
>   ERROR: loading failed
>   * removing ‘/usr/local/lib64/R/library/rgl’
>
>   The downloaded packages are in
>     ‘/tmp/Rtmp5OaGuQ/downloaded_packages’
>   Updating HTML index of packages in '.Library'
>   Making packages.html ... done
>   Warning message:
>   In install.packages("rgl") :
>     installation of package 'rgl' had non-zero exit status
>
> I read that OpenGL header files have to be present and are in
> /usr/include/GL.
> I also read about different graphics cards causing problems but I
> don't know how to find this info out.
>
> Any help appreciated and full error message included below.
>
> Thanks,
>
>   sessionInfo()
>   R version 2.13.0 (2011-04-13)
>   Platform: x86_64-unknown-linux-gnu (64-bit)
>
>   locale:
>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>    [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>   [11]
Re: [R] RGL package installation problem on Centos
On 11-05-25 6:08 AM, john herbert wrote:
> Hi.
>
> Thank you for your help. From your suggestions, I tried the following:
>
>   R CMD INSTALL --no-test-load rgl_0.92.798.tar.gz
>
> This seemed to load and install (starting R and issuing library(rgl)
> did not flag any problems). But running the sphere example from rgl
> causes big problems :-)
>
>   # R
>
>   R version 2.13.0 (2011-04-13)
>   Copyright (C) 2011 The R Foundation for Statistical Computing
>   ISBN 3-900051-07-0
>   Platform: x86_64-unknown-linux-gnu (64-bit)
>
>   R is free software and comes with ABSOLUTELY NO WARRANTY.
>   You are welcome to redistribute it under certain conditions.
>   Type 'license()' or 'licence()' for distribution details.
>
>   Natural language support but running in an English locale
>
>   R is a collaborative project with many contributors.
>   Type 'contributors()' for more information and
>   'citation()' on how to cite R or R packages in publications.
>
>   Type 'demo()' for some demos, 'help()' for on-line help, or
>   'help.start()' for an HTML browser interface to help.
>   Type 'q()' to quit R.
>
>   library(rgl)
>   open3d()
>   [1] 1
>   spheres3d(rnorm(10), rnorm(10), rnorm(10), radius=runif(10), color=rainbow(10))
>   X Error of failed request:  GLXUnsupportedPrivateRequest
>     Major opcode of failed request:  143 (GLX)
>     Minor opcode of failed request:  16 (X_GLXVendorPrivate)
>     Serial number of failed request:  75
>     Current serial number in output stream:  80
>
>    *** caught segfault ***
>   address (nil), cause 'memory not mapped'
>
>   Traceback:
>    1: .C(rgl_spheres, success = as.integer(FALSE), idata, as.numeric(vertex), as.numeric(radius), NAOK = TRUE)
>    2: rgl.spheres(x = c(0.506515614656334, -0.610549216480097, 1.08552683577513, 0.189935807154803, 1.3670636776769, 1.0181689602839, -1.51133180077403, 1.41127485066926, 0.199668469858345, -1.22523054947931), y = c(-0.323499291411831, -1.00507951141751, -0.901821819799205, 1.41189828512003, -0.131573335707317, -0.308459525548042, 1.50221794165404, -0.154047787639801, 0.44717002689869, -0.93671163236924), z = c(0.836709660070246, -0.251235618242673, -2.02289120416259, 0.499914144749108, -0.458094619767492, 1.48047512280956, 0.80987242929676, -1.17963322744287, 0.81492625128413, 0.475181724036684), radius = c(0.174093995941803, 0.75503840832971, 0.562892300076783, 0.541058518458158, 0.724675815086812, 0.828356854617596, 0.423405217472464, 0.540400178171694, 0.0765824350528419, 0.55016236170195), color = c("#FF0000FF", "#FF9900FF", "#CCFF00FF", "#33FF00FF", "#00FF66FF", "#00FFFFFF", "#0066FFFF", "#3300FFFF", "#CC00FFFF", "#FF0099FF"), alpha = 1, lit = TRUE, ambient = "#000000", specular = "#FFFFFF", emission = "#000000", shininess = 50, smooth = TRUE, front = "filled", back = "filled", size = 3, lwd = 1, fog = FALSE, point_antialias = FALSE, line_antialias = FALSE, texture = NULL, textype = "rgb", texmipmap = FALSE, texminfilter = "linear", texmagfilter = "linear", texenvmap = FALSE)
>    3: do.call(rgl.spheres, c(list(x = x, y = y, z = z, radius = radius), .fixMaterialArgs(..., Params = save)))
>    4: spheres3d(rnorm(10), rnorm(10), rnorm(10), radius = runif(10), color = rainbow(10))
>
>   Possible actions:
>   1: abort (with core dump, if enabled)
>   2: normal R exit
>   3: exit R without saving workspace
>   4: exit R saving workspace
>   Selection:
>
> Does this error message make anything clearer?

The problem is being reported by your X Windows system, because something that rgl is doing is not supported by it. If you Google for GLXUnsupportedPrivateRequest you'll see a lot of similar reports of this for various systems, but I don't see a lot of solutions. I suspect it's a badly implemented graphics driver for your graphics card. All I can suggest is that you contact the vendor to see if there's an update.

Duncan Murdoch

> On Mon, May 23, 2011 at 2:43 PM, john herbert arraystrugg...@gmail.com wrote:
>> Dear R users,
>> I have installed the latest version of R from source on CentOS (using
>> configure and make install). This seemed to work fine, with no errors
>> reported, and R at the command line starts R.
>>
>> However, if I try to install the package rgl using
>>
>>   install.packages("rgl")
>>
>> I get the following error:
>>
>>   installing to /usr/local/lib64/R/library/rgl/libs
>>   ** R
>>   ** demo
>>   ** inst
>>   ** preparing package for lazy loading
>>   ** help
>>   *** installing help indices
>>   ** building package indices ...
>>   ** testing if installed package can be loaded
>>
>>    *** caught segfault ***
>>   address (nil), cause 'memory not mapped'
>>   aborting ...
>>   sh: line 1: 23732 Segmentation fault '/usr/local/lib64/R/bin/R' --no-save --slave /tmp/RtmpkvIjOb/file6d97876
>>   ERROR: loading failed
>>   * removing ‘/usr/local/lib64/R/library/rgl’
>>
>>   The downloaded packages are in
>>     ‘/tmp/Rtmp5OaGuQ/downloaded_packages’
>>   Updating HTML index of packages in '.Library'
>>   Making packages.html ... done
>>   Warning message:
>>   In install.packages("rgl") :
>>     installation of package 'rgl' had non-zero exit status
>>
>> I read that OpenGL header files have to be present and are in
>> /usr/include/GL.
>> I also read about different graphics cards causing problems but I don't
Re: [R] plotting texas school district using shape files
Shant Ch sha1one at yahoo.com writes:
> Hi,
> I was plotting or creating a map for Texas school districts using the
> shape file of Texas. I could not find any other helpful mail in the
> mailing list.
>
>   txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package="maptools"))
>
> Error: read.shape not found. But read.shape is there in maptools.

A couple of things: that's probably not the *exact* error you got. Did you remember to load the package first with library(maptools) ... ? (You did install the package first, too, right?)

After you have done that I suspect you will still have a problem with finding the file -- I think you want something like

  library(maptools)
  txtshp <- read.shape("S:\\Districts_10_11.shp")

Ben Bolker
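The file-finding point above comes down to what system.file() does: it looks for a file inside an installed package and returns an empty string when nothing is found there. A short sketch, using the stats package only because it is always installed:

```r
# A file that really is inside an installed package: a full path comes back.
found <- system.file("DESCRIPTION", package = "stats")

# A file that is not part of the package: the result is the empty string "",
# which a reading function would then fail to open.
not_found <- system.file("Districts_10_11.shp", package = "stats")

nzchar(found)      # is the first result a non-empty path?
nzchar(not_found)  # is the second one?
```

For a shapefile that lives on an ordinary drive, the path can be passed to the reading function directly, as in the reply above; file.exists() on that path is a quick sanity check beforehand.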
Re: [R] Thiessen method
federico.eccel federico.eccel at gmail.com writes:
> I tried to search the web and the R forum for any package for computing
> the Thiessen method but I didn't find anything.
> I would like to ask if there is any package in R that provides the
> possibility to compute the Thiessen method for interpolating rain gauges.

Do any of the hits provided by

  library(sos)
  findFn("thiessen")

help?
Re: [R] Processing large datasets
In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that data in an R data.frame.

[1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
[2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote:
> Hi R list,
>
> I'm new to R software, so I'd like to ask about its capabilities.
> What I'm looking to do is to run some statistical tests on quite big
> tables which are aggregated quotes from a market feed.
>
> This is a typical set of data.
> Each day contains millions of records (up to 10 million, unfiltered).
>
>   2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
>   2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
>   2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0
>
> I'll need to filter it first based on some criteria.
> Since I keep it in a MySQL database, this can be done through a query.
> It is not super efficient; I have checked that already.
>
> Then I need to aggregate the dataset into different time frames (time
> is represented in ms from midnight, like 35482391).
> Again, this can be done through a database query; I'm not sure which
> is going to be faster.
> The aggregated tables are going to be much smaller, like thousands of
> rows per observation day.
>
> Then I calculate basic statistics: mean, standard deviation, sums etc.
> After the stats are calculated, I need to perform some statistical
> hypothesis tests.
>
> So, my question is: which tool is faster for data aggregation and
> filtration on big datasets: MySQL or R?
>
> Thanks,
> --Roman N.

--
===
Jon Daily
Technician
===
#!/usr/bin/env outside
# It's great, trust me.
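Once a filtered subset has been pulled into a data.frame as described above, the time-frame aggregation could be sketched in base R roughly as follows. The column names (price, size, ms) and the four rows are invented for illustration; only the idea of time as milliseconds from midnight comes from the thread, and nothing here says whether R or the database would be faster on the full data.

```r
# Invented stand-in for a filtered query result.
quotes <- data.frame(
  price = c(15.48, 15.48, 15.49, 15.50),
  size  = c(400, 300, 135, 200),
  ms    = c(35482391, 35482391, 35490000, 35545000)  # ms since midnight
)

# Assign every record to a one-minute bucket (assumption: 60 s frames).
bucket_ms <- 60 * 1000
quotes$bucket <- quotes$ms %/% bucket_ms

# Basic statistics per bucket.
mean_price <- aggregate(price ~ bucket, data = quotes, FUN = mean)
total_size <- aggregate(size ~ bucket, data = quotes, FUN = sum)
stats <- merge(mean_price, total_size, by = "bucket")
print(stats)
```

Changing bucket_ms gives other time frames, and the resulting per-bucket table is small enough to feed into the usual test functions such as t.test().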
[R] barplot groups of different size i.e. height is NOT a matrix
Hello,

I want to use the function barplot to display several groups of bars. A standard example is given at this link:

http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html

But in their example the 4 groups of bars are all composed of 8 bars. I want to be able to display the same kind of graph but where the number of bars in each group is not the same. For example, the first group of bars would have 2 bars and the second group of bars would have 10 bars.

The barplot function has a first parameter named height, which is a matrix where each line holds the values for the bars of one particular group. One solution could be to have a height matrix with NA values, but then the space occupied by each group is equal to the size of the largest group! So you end up with empty gaps where there are NAs.

Do you know how to solve this problem? Do I have to consider multiple barplots in the same plot with the same axis? (By the way, I don't know how to do that.)

In fact the bars would represent the performance of an algorithm. A group of bars would be the performance of one algorithm with different parameters. But when comparing different algorithms it is possible that we don't want to display the same number of parameters for each algorithm.

Thanks for your help.

Victor
Re: [R] barplot groups of different size i.e. height is NOT a matrix
Dear Victor,

Here is a basic solution using ggplot2:

    library(ggplot2)
    dataset <- data.frame(Main = c("A", "A", "A", "B", "B"), Detail = c("a", "b", "c", "1", "2"), value = runif(5, min = 0.5, max = 1))
    ggplot(dataset, aes(x = Detail, y = value)) + geom_bar() + facet_grid(. ~ Main, scales = "free_x")

Best regards,
Thierry

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Victor Gabillon
Sent: Wednesday, 25 May 2011 14:56
To: r-help@r-project.org
Subject: [R] barplot groups of different size i.e. height is NOT a matrix

Hello, I want to use the function barplot to display several groups of bars. A standard example is given at this link: http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html But in that example the 4 groups of bars are all composed of 8 bars. I want to display the same kind of graph, but where the number of bars in each group is not the same. For example, the first group would have 2 bars and the second group would have 10 bars. The barplot function has a first parameter named height, which is a matrix where each line holds the values for the bars of one particular group. One solution could be a height matrix with NA values, but then the space occupied by each group is equal to the size of the largest group, so you end up with empty gaps where there are NAs. Do you know how to solve this problem? Do I have to consider multiple barplots in the same plot with the same axis? (By the way, I don't know how to do that.) In fact, each bar would represent the performance of an algorithm. A group of bars would be the performance of one algorithm with different parameters. When comparing different algorithms, we may not want to display the same number of parameters for each algorithm. Thanks for your help.
Victor
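For readers who would rather stay with base graphics, one possible approach is to give barplot() a plain vector of heights and use its space argument to open a wider gap before the first bar of each group. This is only a sketch: the algorithm names, performance values, colours and gap widths below are invented for illustration.

```r
# Hypothetical performance values: algoA has 2 parameter settings, algoB has 5.
perf <- list(algoA = c(0.61, 0.74),
             algoB = c(0.55, 0.58, 0.66, 0.70, 0.72))

heights     <- unlist(perf, use.names = FALSE)
group_sizes <- lengths(perf)

# 'space' is the gap left before each bar, in units of bar width:
# 0.2 inside a group, 1.5 before the first bar of every later group.
space  <- rep(0.2, length(heights))
starts <- cumsum(group_sizes)[-length(group_sizes)] + 1
space[starts] <- 1.5

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(heights, space = space,
                col = rep(c("steelblue", "tomato"), group_sizes),
                ylab = "performance")

# Put one label under the centre of each group.
grp     <- factor(rep(names(perf), group_sizes), levels = names(perf))
centres <- tapply(mids, grp, mean)
axis(1, at = as.numeric(centres), labels = names(centres), tick = FALSE)
```

Because the heights are a vector rather than a matrix, each group takes only as much horizontal room as its own bars need.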
Re: [R] help with tune.svm() e1071
Hi,

On Wed, May 25, 2011 at 4:54 AM, Salih Tuna saliht...@gmail.com wrote:
> Hi, I am trying to use tune.svm in e1071 package. the command i use is
> tobj <- tune.svm(labels, data= data, cost = 10^(1:2))

The first few arguments in the method signature for tune.svm are:

    tune.svm(x, y = NULL, data = NULL, ...)

I'm assuming your `labels` variable is a vector of class labels (or real values if you are doing regression). This corresponds to the `y` in the method signature. Also note that in your call to tune.svm, you are missing a correct value for the `x` parameter.

> Should the last column of the 'data' contain the labels as well?

This depends on whether you are using a formula for x.

> I want to use the linear kernel. But it gives me the error
> Error in model.frame.default(formula, data) : 'data' must be a data.frame, not a matrix or an array

What type of object is `data`? What is the result of:

    R> is(data)

> Do you know why this might happen?

You aren't calling the function correctly. Either

(1) create a matrix of predictor variables (rows are observations, columns are features, dimensions, whatever you want to call them) and a vector of class labels (I guess this is your `labels` variable?). Do *not* put the class labels as an extra column in your predictor variable matrix. Then do:

    R> tune.svm(predictors, labels, ...)

or (2) use the formula interface and pass in a data.frame as the data argument:

    R> tune.svm(y ~ some + thing, data=your.data.frame)

(where 'some' and 'thing' are names of feature columns in your.data.frame, and y is the name of your label column).

Please read through the help pages ?tune and ?tune.svm for more examples.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
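The error quoted above is raised by model.frame(), which formula interfaces call internally, so the cause can be reproduced and fixed in base R without e1071. The sketch below uses made-up data; the column names some, thing and y simply mirror the placeholders in the reply.

```r
set.seed(1)
# Hypothetical predictors stored as a matrix, with labels kept separately.
m      <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("some", "thing")))
labels <- factor(rep(c("a", "b"), each = 5))

# Passing a matrix as 'data' to a formula interface fails in model.frame().
res <- try(model.frame(labels ~ some + thing, data = m), silent = TRUE)
inherits(res, "try-error")

# Converting to a data.frame (with the labels as a column) resolves it.
df <- data.frame(m, y = labels)
mf <- model.frame(y ~ some + thing, data = df)
str(mf)
```

With e1071 installed, the same data frame should then be usable in a call along the lines of tune.svm(y ~ some + thing, data = df, kernel = "linear", cost = 10^(1:2)); that call is not run here.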
Re: [R] Processing large datasets
Hi, On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote: Hi R list, I'm new to R software, so I'd like to ask about it is capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non filtered). 2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0 2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0 2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0 I'll need to filter it out first based on some criteria. Since I keep it mysql database, it can be done through by query. Not super efficient, checked it already. Then I need to aggregate dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, can be done through a databases query, not sure what gonna be faster. Aggregated tables going to be much smaller, like thousands rows per observation day. Then calculate basic statistic: mean, standard deviation, sums etc. After stats are calculated, I need to perform some statistical hypothesis tests. So, my question is: what tool faster for data aggregation and filtration on big datasets: mysql or R? Why not try a few experiments and see for yourself -- I guess the answer will depend on what exactly you are doing. 
If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html

Also, if you find yourself needing to do lots of grouping/summarizing type of calculations over large data frame-like objects, you might want to check out the data.table package: http://cran.r-project.org/web/packages/data.table/index.html

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
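As a starting point for the suggested experiments, the aggregation step described in the question (bucketing by time, then mean, standard deviation and sums) can be sketched in base R. The column names and the five sample records below are assumptions made up for illustration; only the idea that time is stored as milliseconds from midnight comes from the original post.

```r
# Hypothetical quote records; 'ms' is milliseconds from midnight.
quotes <- data.frame(
  symbol = "DELL",
  ms     = c(35482391, 35482950, 35541000, 35545500, 35605000),
  price  = c(15.48, 15.48, 15.49, 15.50, 15.47),
  size   = c(400, 300, 135, 200, 500)
)

# Assign each record to a one-minute bucket (60,000 ms per minute).
quotes$minute <- quotes$ms %/% 60000

# One summary row per bucket: count, mean, standard deviation, total size.
agg <- do.call(rbind, lapply(split(quotes, quotes$minute), function(d) {
  data.frame(minute     = d$minute[1],
             n          = nrow(d),
             mean_price = mean(d$price),
             sd_price   = sd(d$price),   # NA when a bucket has one record
             total_size = sum(d$size))
}))
agg
```

Wrapping the same steps in system.time() on a realistic day of data, and comparing against the equivalent GROUP BY query, would give a direct answer for a particular setup.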
[R] transpose ?
Dear All,

Suppose this data.frame D:

    V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
    C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C
    G  G  T  T   A   A   A   A   T   A   T   T   C   C   G   G
    C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C

I would like to transform D as follows (just for the first line):

    C (V7) C (V9)  T G A C G C
    C (V8) C (V10) T G A C G C

Any help would be appreciated.

Regards
M

--
Mohamed Lajnef, IE INSERM U955 eq 15 # Pôle de Psychiatrie # Hôpital CHENEVIER # 40, rue Mesly # 94010 CRETEIL Cedex FRANCE # mohamed.laj...@inserm.fr # tel : 01 49 81 32 79 # Sec : 01 49 81 32 90 # fax : 01 49 81 30 99
Re: [R] transpose ?
See ?t

Scott Chamberlain
Rice University, EEB Dept.

On Wednesday, May 25, 2011 at 9:07 AM, Mohamed Lajnef wrote:
> Dear All,
> Suppose this data.frame D:
>
>     V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
>     C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C
>     G  G  T  T   A   A   A   A   T   A   T   T   C   C   G   G
>     C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C
>
> I would like to transform D as follows (just for the first line):
>
>     C (V7) C (V9)  T G A C G C
>     C (V8) C (V10) T G A C G C
>
> Any help would be appreciated.
> Regards
> M
> --
> Mohamed Lajnef, IE INSERM U955 eq 15 # Pôle de Psychiatrie # Hôpital CHENEVIER # 40, rue Mesly # 94010 CRETEIL Cedex FRANCE # mohamed.laj...@inserm.fr # tel : 01 49 81 32 79 # Sec : 01 49 81 32 90 # fax : 01 49 81 30 99
[R] Fwd: transpose ?
Dear All,

Sorry for the previous mail. Suppose this data.frame D:

    V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
    C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C
    G  G  T  T   A   A   A   A   T   A   T   T   C   C   G   G
    C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C

I would like to transform D as follows (just for the first line):

    C C T G A C G C
    C C T G A C G C

(V8 under V7) (V10 under V9) ...

Any help would be appreciated.

Regards
M
[R] issues with rJava; cannot run JRI example
Hello,

I am trying to run the JRI example from rJava, but I have some issues. I have read many posts and didn't find any solution to my problem. I have the following code:

    Rengine re = new Rengine(null, false, null);
    System.out.println("Rengine created, waiting for R");
    if (!re.waitForR()) {
        System.out.println("Cannot load R");
        return;
    }
    System.out.println("re-routing stdout/err into R console");

But when I run the program I get the message:

    Creating Rengine
    Java Result: 10

It never reaches the statement after Rengine re = new Rengine(null, false, null); so I think there might be some problem while creating the Rengine. Then I tried something like this, without any parameters in Rengine:

    Rengine re = new Rengine();
    System.out.println("Rengine created, waiting for R");
    if (!re.waitForR()) {
        System.out.println("Cannot load R");
        return;
    }
    System.out.println("re-routing stdout/err into R console");

Now it shows me the following message:

    Creating Rengine
    Rengine created, waiting for R
    re-routing stdout/err into R console

which means that the Rengine was created. But if I add some lines like this after the code above:

    double[] d = {1.0, 2.0, 3.0};
    re.assign("a", d);

and run again, it shows me the following error messages:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6c731a9e, pid=5152, tid=492
    #
    # JRE version: 6.0_25-b06
    # Java VM: Java HotSpot(TM) Client VM (20.0-b11 mixed mode windows-x86 )
    # Problematic frame:
    # C [R.dll+0x31a9e]
    #
    # An error report file with more information is saved as:
    # C:\Documents and Settings\ajayami\My Documents\NetBeansProjects\JAVA_R\hs_err_pid5152.log
    #
    # If you would like to submit a bug report, please visit:
    # http://java.sun.com/webapps/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    Java Result: 1
    BUILD SUCCESSFUL (total time: 2 seconds)

I don't know how to solve this problem. Does anyone have an idea how to solve this? I have kept all the .dll files in the System32 folder. Any kind of help is appreciated.

Regards,
Ajaya
[R] Print the content of several columns in only one
Hi,

I'm an R beginner and I'd really appreciate a hand. I'd like to create a new column in a data frame in which the content of several other columns is printed. For instance, I've got 2 columns, site and sampling number, and I would like to create a third column, ID, in which both the site name and the sampling number appear, like:

    site   sampling  ID
    site1  1         site1.1
    site1  2         site1.2
    site2  1         site2.1
    site3  1         site3.1

How could I do that in R? If someone could help me it'd be great. Thanks in advance!

Zoé
[R] select levels of factor variables
Hi again,

I've got another question. I often use the operator == to select some levels of factor variables, like:

    data[data$var == "blabla", ]

But this time I'd like to select all the levels of my variable which contain the letter B. Is there a way to express this condition?

Thanks a lot!
Zoé
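One way to express "contains the letter B" is pattern matching with grepl(), which returns a logical vector that can be used in place of the == test. A small sketch, with a made-up data frame standing in for the poster's data:

```r
# Hypothetical data: 'var' is a factor, as in the question.
data <- data.frame(var   = factor(c("AB1", "C2", "B7", "xyz", "aBc")),
                   value = 1:5)

# TRUE where the level name contains an upper-case "B" (fixed = literal match).
keep <- grepl("B", as.character(data$var), fixed = TRUE)

# Subset the rows and drop factor levels that no longer occur.
subset_B <- droplevels(data[keep, ])
subset_B
```

For a case-insensitive match, grepl("b", x, ignore.case = TRUE) could be used instead of the fixed = TRUE form.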
[R] approximate function and find local peaks (Maxima or Minima)
Hi,

I have a data matrix CB:

        Zeit                 Low
    2   2011-05-02 08:05:05  7596.0
    3   2011-05-02 08:10:06  7593.5
    4   2011-05-02 08:15:11  7594.5
    5   2011-05-02 08:20:15  7597.5
    6   2011-05-02 08:25:18  7595.0
    7   2011-05-02 08:30:20  7593.5
    8   2011-05-02 08:35:21  7593.0
    9   2011-05-02 08:40:21  7593.0
    10  2011-05-02 08:45:25  7599.0
    11  2011-05-02 08:50:34  7596.0
    12  2011-05-02 08:55:59  7591.0
    13  2011-05-02 09:01:00  7590.5
    14  2011-05-02 09:06:00  7590.5
    15  2011-05-02 09:11:04  7590.5
    16  2011-05-02 09:16:04  7591.0
    17  2011-05-02 09:21:06  7593.0
    18  2011-05-02 09:26:08  7596.0
    19  2011-05-02 09:31:09  7596.0
    20  2011-05-02 09:36:10  7599.0
    21  2011-05-02 09:41:11  7601.5
    22  2011-05-02 09:46:11  7608.0
    23  2011-05-02 09:51:18  7611.5
    24  2011-05-02 09:56:20  7605.5
    25  2011-05-02 10:01:20  7601.5

I want to approximate this data by a function (I don't mind whether the time information is kept or lost in the process). With approxfun() I seem to have managed to approximate a function:

    f <- approxfun(2:nrow(CB), CB[2:nrow(CB), 2])

But how do I differentiate f()?

    g <- deriv(f(2:nrow(CB)), "x")

did not work for me, or at least I don't know how to get those x with g(x) = 0. My ultimate goal is to find all the local minima of CB[,2] (min() gives only the global minimum). Any suggestions how to do it?

Thanks in advance for your help.
Michael
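Two remarks may help here. deriv() works on expressions, not on function objects such as the one approxfun() returns, and an approxfun() interpolant is piecewise linear, so its slope is piecewise constant and rarely exactly zero. For the stated goal, local minima can be located directly from the sign changes of successive differences. The function below is a sketch written for this note, not something proposed in the thread; it collapses runs of equal values first so that flat valleys (such as 7590.5 three times in a row) are reported once, and it ignores the two end points.

```r
# Return the index (first position of a flat run) of each interior local minimum.
local_minima <- function(x) {
  r <- rle(x)                    # collapse plateaus into single values
  v <- r$values
  s <- diff(sign(diff(v)))       # +2 where the slope turns from down to up
  idx <- which(s == 2) + 1       # positions within the collapsed vector
  run_start <- cumsum(r$lengths) - r$lengths + 1
  run_start[idx]                 # map back to positions in x
}

# The 'Low' column from the message above.
low <- c(7596.0, 7593.5, 7594.5, 7597.5, 7595.0, 7593.5, 7593.0, 7593.0,
         7599.0, 7596.0, 7591.0, 7590.5, 7590.5, 7590.5, 7591.0, 7593.0,
         7596.0, 7596.0, 7599.0, 7601.5, 7608.0, 7611.5, 7605.5, 7601.5)

min_pos <- local_minima(low)
min_pos        # positions within 'low'
low[min_pos]   # the corresponding values
```

Applying the same function to -low would give the local maxima.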
[R] combined odds ratio
Dear all,

I am looking for an R function which does stepwise selection for a Cox model (delta chi-square likelihood ratio test), similar to "stepwise, pe(0.05) lr: stcox" in Stata. I am very thankful for any reply.

Regards,
Linda
[R] stepwise selection cox model
Sorry, I wrote the wrong subject in the first email!

Regards,
Linda

---------- Forwarded message ----------
From: linda Porz linda.p...@gmail.com
Date: 2011/5/25
Subject: combined odds ratio
To: r-help@r-project.org
Cc: r-help-requ...@stat.math.ethz.ch

Dear all, I am looking for an R function which does stepwise selection for a Cox model (delta chi-square likelihood ratio test), similar to "stepwise, pe(0.05) lr: stcox" in Stata. I am very thankful for any reply.

Regards,
Linda
[R] connection problem
Hi,

I have a problem when choosing a CRAN mirror. An error message appears:

    In open.connection(con, "r") : connection to 'cran.r-project.org' impossible on port 80

I don't know why. Can you help me choose a CRAN mirror? Thanks for any suggestion.
[R] What does smaller than comparison do on strings?
What's the logic behind the following, and where can I find any documentation about it? In particular, why are 2:9, as characters, not regarded as being smaller than 10?

    # R-Code:
    a <- as.character(1:12)
    a < 10
    # [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Thanks in advance!
Niklaus
Re: [R] Print the content of several columns in only one
OK, I found out how to do it, with the function paste().
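For anyone landing on this thread later, the paste() solution mentioned above might look like the following sketch. The data frame name df is an assumption; the column names and values are taken from the example in the original question.

```r
# The example table from the question.
df <- data.frame(site     = c("site1", "site1", "site2", "site3"),
                 sampling = c(1, 2, 1, 1))

# Join the two columns element by element, separated by a dot.
df$ID <- paste(df$site, df$sampling, sep = ".")
df
```

paste() is vectorised, so no loop over the rows is needed.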
Re: [R] R as.numeric()
Thanks a lot for both replies. If I set the option as proposed, everything works as I wanted it to. I guess as.character would work as well, only then I guess I would need to loop through the data frame.

Lutz

On 24/05/11 22:42, Ista Zahn wrote:
> This is a FAQ: http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
> Please try there before posting a question to the list.
> Best, Ista
>
> On Tue, May 24, 2011 at 5:33 PM, David Scott d.sc...@auckland.ac.nz wrote:
>> On 25/05/2011 9:20 a.m., Lutz Fischer wrote:
>>> Hi, I have a bit of a problem with as.numeric or as.double. I read in an excel-file (either xlsx::read.xlsx2 or gdata::read.xls), select a subset and then try to make it numeric:
>>>
>>>     # read in the excel-file
>>>     alldata <- read.xlsx2("input.xls", 1)
>>>     # select the subset
>>>     s <- subset(alldata, select=c(cI,cII,cIII,cIV,cV))
>>>     # unluckily we have n/a for missing values in the file - so we turn it into proper missing values
>>>     s[s == "n/a"] <- NA
>>>     n <- data.matrix(s)
>>>
>>> The problem I have is that it does not convert the data the way I would expect. Just as an example:
>>>
>>>     s[1,2]
>>>     [1] 30.94346629
>>>     3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352 0.030248034 0.030841352 0.032966308 ... n/a
>>>
>>> turned into:
>>>
>>>     n[1,2]
>>>     [1] 3020
>>>
>>> And I would like to have 30.94346629 there as well. I assume that has to do with the Levels attribute, but I am not sure what to make of these in the first place. I also tried to convert each value on its own:
>>>
>>>     # make some space that holds the actual numeric data
>>>     n <- array(dim=c(length(s[,1]), length(s)))
>>>     # now turn everything into doubles
>>>     for (c in 1:length(s)) { for (r in 1:length(s[,1])) { n[r,c] <- as.double(s[r,c]) } }
>>>
>>> but that gave the same result, just a lot slower.
>>> Thanks
>>> Lutz
>>
>> Your problem is the conversion to factors when the data is read. Use options(stringsAsFactors = FALSE) before you read the data; then the mixed columns of numeric and missing will be read as character data and the conversion to numeric will go as you expect. (But I haven't tested this.)
>>
>> David Scott
>> --
>> David Scott, Department of Statistics, The University of Auckland, PB 92019, Auckland 1142, NEW ZEALAND
>> Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
>> Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018
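The behaviour described in this thread, and the as.character() route from the FAQ, can be seen on a tiny example. The values below are borrowed from the output quoted above; the rest is a made-up sketch. No row-by-row loop is needed, because the conversion is vectorised.

```r
# A factor column as it might arrive from the spreadsheet, with "n/a" for missing.
f <- factor(c("30.94346629", "0.026307482", "n/a", "0.028239812"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(f)

# Convert via character, after turning "n/a" into a real missing value.
vals <- as.character(f)
vals[vals == "n/a"] <- NA
x <- as.numeric(vals)
x

# The same idea applied to every column of a data frame at once.
s <- data.frame(cI = f, cII = f)
s[] <- lapply(s, function(col) {
  v <- as.character(col)
  v[v == "n/a"] <- NA
  as.numeric(v)
})
str(s)
```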
Re: [R] Multinomial Logistical Model
On May 24, 2011; 11:06pm Belle wrote:
> Does anyone know how to run Multinomial logistical Model in R in order to get predicted probability?

Yes. I could stop there but you shouldn't. The author of the package provides plenty of examples (and two good vignettes) showing you how to do this. Suggest you do some work in that area. Look especially at how model formulas are used/specified. This is at least one area where you have gone wrong, as the error message clearly tells you.

Good luck.
Mark.

-
Mark Difford (Ph.D.), Research Associate, Botany Department, Nelson Mandela Metropolitan University, Port Elizabeth, South Africa
[R] Adjusted Rate Ratios in R
I am trying to calculate Poisson-regression-based adjusted rate ratios in R, but R's default in glm does not code the intercept as the global rate. In SAS I use cell means coding so that the intercept is the global rate, but I do not know how to do this in R. If anyone knows a way to make glm use cell means, or how to find adjusted rate ratios, I would be grateful.
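A possible starting point, offered as a sketch rather than a full answer: in R's formula language, removing the intercept with 0 + gives cell-means coding for the first factor, and the default treatment coding already yields rate ratios relative to the reference level once the coefficients are exponentiated. The data, the variable names and the use of person-years as an offset are all assumptions made for illustration.

```r
# Hypothetical event counts and person-years for three groups.
d <- data.frame(
  group  = factor(c("A", "A", "B", "B", "C", "C")),
  events = c(12, 15, 30, 22, 9, 11),
  pyears = c(100, 120, 110, 90, 150, 160)
)

# Cell-means coding: no intercept, so each coefficient is one group's log rate.
fit_cm <- glm(events ~ 0 + group + offset(log(pyears)),
              family = poisson, data = d)
rates <- exp(coef(fit_cm))

# Default (treatment) coding: exponentiated non-intercept coefficients are
# rate ratios of B and C relative to the reference group A.
fit_tr <- glm(events ~ group + offset(log(pyears)),
              family = poisson, data = d)
rate_ratios <- exp(coef(fit_tr))[-1]

rates
rate_ratios
```

Adding further covariates to the right-hand side of the second formula would make the exponentiated group coefficients adjusted rate ratios. If the intercept should represent an overall average on the log scale instead, contrasts = list(group = "contr.sum") is one option to look into.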
Re: [R] plotting texas school district using shape files
Yes, I had included library(maptools) in my code, and the package is already installed on my computer, but it is still showing the same error.

From: Ben Bolker bbol...@gmail.com
To: r-h...@stat.math.ethz.ch
Sent: Wed, May 25, 2011 8:06:19 AM
Subject: Re: [R] plotting texas school district using shape files

Shant Ch sha1one at yahoo.com writes:
> Hi, I was plotting or creating a map for Texas school districts using the shape file of Texas. I could not find any other helpful mail in the mailing list.
>
>     txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package="maptools"))
>
> Error: read.shape not found. But read.shape is there in maptools.

A couple of things: that's probably not the *exact* error you got. Did you remember to load the package first with library(maptools) ... ? (You did install the package first, too, right?) After you have done that I suspect you will still have a problem with finding the file. I think you want something like

    library(maptools)
    txtshp <- read.shape("S:\\Districts_10_11.shp")

Ben Bolker
Re: [R] Processing large datasets
Thanks Jonathan. I'm already using RMySQL to load data for couple of days. I wanted to know what are the relevant R capabilities if I want to process much bigger tables. R always reads the whole set into memory and this might be a limitation in case of big tables, correct? Doesn't it use temporary files or something similar to deal such amount of data? As an example I know that SAS handles sas7bdat files up to 1TB on a box with 76GB memory, without noticeable issues. --Roman - Original Message - In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that data in an R data.frame. [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote: Hi R list, I'm new to R software, so I'd like to ask about it is capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non filtered). 2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0 2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0 2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0 I'll need to filter it out first based on some criteria. Since I keep it mysql database, it can be done through by query. Not super efficient, checked it already. Then I need to aggregate dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, can be done through a databases query, not sure what gonna be faster. Aggregated tables going to be much smaller, like thousands rows per observation day. 
Then calculate basic statistic: mean, standard deviation, sums etc. After stats are calculated, I need to perform some statistical hypothesis tests. So, my question is: what tool faster for data aggregation and filtration on big datasets: mysql or R? Thanks, --Roman N.

--
Jon Daily, Technician
#!/usr/bin/env outside
# It's great, trust me.
Re: [R] Processing large datasets
Hi, On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote: Hi R list, I'm new to R software, so I'd like to ask about it is capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non filtered). 2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0 2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0 2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0 I'll need to filter it out first based on some criteria. Since I keep it mysql database, it can be done through by query. Not super efficient, checked it already. Then I need to aggregate dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, can be done through a databases query, not sure what gonna be faster. Aggregated tables going to be much smaller, like thousands rows per observation day. Then calculate basic statistic: mean, standard deviation, sums etc. After stats are calculated, I need to perform some statistical hypothesis tests. So, my question is: what tool faster for data aggregation and filtration on big datasets: mysql or R? Why not try a few experiments and see for yourself -- I guess the answer will depend on what exactly you are doing. 
If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html Also, if you find yourself needing to do lots of grouping/summarizing type of calculations over large data frame-like objects, you might want to check out the data.table package: http://cran.r-project.org/web/packages/data.table/index.html -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

I don't think data.table is fundamentally different from the data.frame type, but thanks for the suggestion. From http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf: "Just like data.frames, data.tables must fit inside RAM".

The ff package by Adler, listed under "Large memory and out-of-memory data", is probably the most interesting.

--Roman Naumenko
Re: [R] Fwd: transpose ?
Then use as.matrix. Transpose is not a well-defined operation for data frames.

---
Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Mohamed Lajnef mohamed.laj...@inserm.fr wrote:
> Dear All,
> Sorry for the previous mail. Suppose this data.frame D:
>
>     V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
>     C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C
>     G  G  T  T   A   A   A   A   T   A   T   T   C   C   G   G
>     C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C
>
> I would translate D as follow (just for the first line):
>
>     C C T G A C G C
>     C C T G A C G C
>
> (V8 under V7) (V9 under V10) ...
> Any help would be appreciated
> Regards
> M
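Reading the request as "put the second column of each pair underneath the first", one way to get there is to flatten a row and refill it into a two-row matrix, since matrix() fills column by column. This is a sketch of that interpretation, not something spelled out in the thread; note the colClasses argument, without which columns consisting only of "T" would be read as logical TRUE.

```r
# The example data frame from the question, read as character columns.
D <- read.table(text = "
V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
C C C C T T G G A A C C G G C C
G G T T A A A A T A T T C C G G
C C C C T T G G A A C C G G C C
", header = TRUE, colClasses = "character")

# Reshape one row: odd columns (V7, V9, ...) on top, even columns below.
pair_rows <- function(row) matrix(unlist(row), nrow = 2)

first <- pair_rows(D[1, ])
first

# The same reshaping for every row, returned as a list of 2 x 8 matrices.
all_rows <- lapply(seq_len(nrow(D)), function(i) pair_rows(D[i, ]))
```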
Re: [R] panel.first problem when plotting with formula
On May 24, 2011, at 11:42 PM, Gene Leynes wrote:
> Peter, Good idea! (why didn't I think of that?) If it stumped the r-list, I think there is probably a slight bug with the plot formula. Problems like this make me realize how amazingly full featured and relatively bug free R is. A problem like this would never happen in Excel, because this level of functionality does not exist. However, if it did, it would probably never be fixed... and you could substitute Excel with Any commercial software.

    plot(dat, panel.first=bgfun())  # succeeds

So the problem is not with plot.data.frame. So someplace in the processing of dots and the handoff to

    do.call(funname, c(list(mf[[i]], y, ylab = yl, xlab = xl), dots))

... where funname = "plot", the dot identities do not get honored. The 'plot' function is where it all started, but the first argument is now mf[[i]], and that is now a numeric vector. So I think it gets handed off to plot.default, which sets panel.first to NULL.

-- David.

> Gene
>
> On Tue, May 24, 2011 at 3:13 AM, Peter Ehlers ehl...@ucalgary.ca wrote:
>> On 2011-05-23 16:54, Gene Leynes wrote:
>>> I wrote a little function called bgfun that adds gridlines and a background, but it's not working when I plot using the formula. I have some theories on what's happening, but even if my theory is right, I don't know how to fix it. Someone have a straightforward silver bullet?
>>
>> No silver bullet, but this seems to work:
>>
>>     plot(y ~ x, data=dat, type="n")
>>     points(y ~ x, data=dat, panel.first=bgfun())
>>
>> (I think that plot.formula may need a fix but offhand I can't see whether that's easy or hard.)
>> Peter Ehlers
>>
>>> Thank you, Gene
>>>
>>>     bgfun = function(color='honeydew2', linecolor='grey45', addgridlines=TRUE){
>>>       tmp=par("usr")
>>>       rect(tmp[1], tmp[3], tmp[2], tmp[4], col=color)
>>>       if(addgridlines){
>>>         ylimits=par()$usr[c(3,4)]
>>>         abline(h=pretty(ylimits,10), lty=2, col=linecolor)
>>>       }
>>>     }
>>>     dat = data.frame(x=1:10, y=1:10)
>>>     ## Works
>>>     plot(dat$x, dat$y, panel.first=bgfun())
>>>     ## Why doesn't this work?
>>>     plot(y ~ x, data=dat, panel.first=bgfun())

David Winsemius, MD
West Hartford, CT
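A variation on Peter's workaround that avoids panel.first altogether is to set up the axes with an empty plot, draw the background with an explicit call, and then add the points. The sketch below restates bgfun from the thread with the quotes restored; returning the plot limits invisibly and the final box() call are small additions made here, not part of the original function.

```r
# Background rectangle plus optional horizontal gridlines, as in the thread.
bgfun <- function(color = "honeydew2", linecolor = "grey45", addgridlines = TRUE) {
  usr <- par("usr")  # plot-region limits: x1, x2, y1, y2
  rect(usr[1], usr[3], usr[2], usr[4], col = color)
  if (addgridlines) {
    abline(h = pretty(usr[3:4], 10), lty = 2, col = linecolor)
  }
  invisible(usr)     # added here so callers can inspect the limits
}

dat <- data.frame(x = 1:10, y = 1:10)

plot(y ~ x, data = dat, type = "n")  # axes only, nothing drawn yet
usr <- bgfun()                       # background drawn explicitly
points(y ~ x, data = dat)            # data on top of the background
box()                                # redraw the frame over the rectangle
```

Because bgfun() is called as an ordinary statement after the coordinate system exists, the result no longer depends on when the formula method evaluates its extra arguments.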
Re: [R] What does smaller than comparison do on strings?
On 25/05/2011 6:06 AM, Niklaus Kuehnis wrote:

What's the logic behind the following, and where can I find any documentation about it? In particular, why are 2:9 - as characters - not regarded as being smaller than 10?

# R-Code:
a <- as.character(1:12)
a < 10
# [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

See ?Comparison for help. There are lots of details given there. In summary: your second comparison is of "2" to "10". Since the character "2" sorts later than the character "1", "2" < "10" is FALSE.

Duncan Murdoch
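A small sketch of the point made above, assuming nothing beyond base R: when one side of a comparison is character, the other side is coerced to character too, and the strings are compared character by character. Converting back to numbers gives the comparison the poster presumably wanted.

```r
a <- as.character(1:12)

# Character comparison: 10 is coerced to "10" and strings are compared
# character by character, so only "1" sorts before "10".
as_strings <- a < 10

# Numeric comparison: convert back to numbers first.
as_numbers <- as.numeric(a) < 10

print(as_strings)
print(sum(as_numbers))
```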
Re: [R] Processing large datasets
Take a look at the High-Performance and Parallel Computing with R CRAN Task View:

http://cran.us.r-project.org/web/views/HighPerformanceComputing.html

specifically at the section labeled "Large memory and out-of-memory data". Some specific R features have been implemented in a fashion that enables out-of-memory operations, but not all. I believe that Revolution's commercial version of R has developed 'big data' functionality, but I would defer to them for additional details.

You can of course use a 64 bit version of R on a 64 bit OS to increase accessible RAM; however, there will still be object size limitations predicated upon the fact that R uses 32 bit signed integers for indexing into objects. See ?Memory-limits for more information.

HTH, Marc Schwartz

On May 25, 2011, at 8:49 AM, Roman Naumenko wrote:

Thanks Jonathan. I've already been using RMySQL to load data for a couple of days. I wanted to know what the relevant R capabilities are if I want to process much bigger tables. R always reads the whole set into memory and this might be a limitation in the case of big tables, correct? Doesn't it use temporary files or something similar to deal with such an amount of data? As an example, I know that SAS handles sas7bdat files up to 1TB on a box with 76GB memory, without noticeable issues.

--Roman

- Original Message -

In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that data in an R data.frame.

[1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
[2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote:

Hi R list, I'm new to R software, so I'd like to ask about its capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non filtered).

2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0

I'll need to filter it first based on some criteria. Since I keep it in a mysql database, it can be done through a query. Not super efficient, checked it already. Then I need to aggregate the dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, this can be done through a database query; not sure what is going to be faster. Aggregated tables are going to be much smaller, like thousands of rows per observation day. Then calculate basic statistics: mean, standard deviation, sums etc. After the stats are calculated, I need to perform some statistical hypothesis tests.

So, my question is: which tool is faster for data aggregation and filtration on big datasets: mysql or R?

Thanks, --Roman N.
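For the aggregation step described in the original question, a minimal base R sketch, assuming the already-filtered rows fit in memory. The data frame, its column names (ms, price, size) and its values are invented for illustration; only the "milliseconds from midnight" convention comes from the post. It says nothing about whether R or MySQL is faster on the real data.

```r
# Toy quotes: column names and values are invented for illustration.
quotes <- data.frame(
  ms    = c(35482391, 35482900, 35541000, 35545000, 35601000),
  price = c(15.48, 15.50, 15.46, 15.44, 15.52),
  size  = c(400, 300, 135, 200, 100)
)

# One-minute buckets: integer-divide milliseconds since midnight by 60000.
quotes$minute <- quotes$ms %/% 60000

# Per-bucket summary statistics (sd is NA for a bucket with one row).
stats <- aggregate(price ~ minute, data = quotes,
                   FUN = function(p) c(mean = mean(p), sd = sd(p), n = length(p)))
volume <- tapply(quotes$size, quotes$minute, sum)
print(stats)
print(volume)
```

Changing the divisor (for example 1000 for one-second buckets) changes the time frame without touching the rest.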
Re: [R] select levels of factor variables
On May 25, 2011, at 4:46 AM, zoe.cryocla wrote:

Hi again, I've got another question... I often use the symbol == to select some levels of factor variables, like:

data[data$var=="blabla", ]

But this time, I'd like to select all the levels of my variable which contain the letter B. Is there a way to specify this condition?

Perhaps with grep and/or %in%. Got a reproducible example? ...preferably constructed with dput

-- David Winsemius, MD West Hartford, CT
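A minimal sketch of the grep / %in% suggestion, using a made-up data frame since no reproducible example was posted (the names data, var and value are assumptions for illustration only).

```r
# Toy data; 'var' and its levels are invented for illustration.
data <- data.frame(var = factor(c("AB1", "C2", "B7", "XY", "cb")),
                   value = 1:5)

# grepl() returns one TRUE/FALSE per row; matching is case-sensitive.
has_B <- grepl("B", data$var, fixed = TRUE)
subset_B <- droplevels(data[has_B, ])

# Equivalent route via the levels and %in%.
wanted <- grep("B", levels(data$var), value = TRUE, fixed = TRUE)
subset_B2 <- droplevels(data[data$var %in% wanted, ])
print(subset_B)
```

droplevels() is optional; it only removes the levels that no longer occur in the subset.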
Re: [R] Processing large datasets
Hi,

On Wed, May 25, 2011 at 10:18 AM, Roman Naumenko ro...@bestroman.com wrote: [snip]

I don't think data.table is fundamentally different from the data.frame type, but thanks for the suggestion. http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf "Just like data.frames, data.tables must fit inside RAM"

Yeah, I know -- I only mentioned it in the context of manipulating data.frame-like objects -- sorry if I wasn't clear. If you've got data that's data.frame-like that you can store in RAM AND you find yourself wanting to do some summary calcs over different subgroups of it, you might find that data.table will be a quicker way to get that done -- the larger your data.frame/table, the more noticeable the speed. To give you an idea of what scenarios I'm talking about, other packages you'd use to do the same would be plyr and sqldf. For out-of-memory datasets, you're in a different realm -- hence the HPC Task View link.

The ff package by Adler, listed in "Large memory and out-of-memory data", is probably most interesting.

Cool. I've had some luck using the bigmemory package (and friends) in the past as well.

-steve

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Processing large datasets/ non answer but Q on writing data frame derivative.
Date: Wed, 25 May 2011 09:49:00 -0400 From: ro...@bestroman.com To: biomathjda...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Processing large datasets

Thanks Jonathan. I've already been using RMySQL to load data for a couple of days. I wanted to know what the relevant R capabilities are if I want to process much bigger tables. R always reads the whole set into memory and this might be a limitation in the case of big tables, correct?

OK, now I ask: perhaps for my first R effort I will try to find the source code for data frame and make a paging or streaming derivative. That is, at least for fixed-size things, it can supply things like the total number of rows but has facilities for paging in and out of memory. Presumably all users of data frame have to work through a limited interface, which I guess could be expanded with various hints ("prefetch this", for example). I haven't looked at this idea in a while but the issue keeps coming up; dev list maybe?

Anyway, for your immediate issues with a few statistics you could probably write a simple C++ program that ultimately becomes part of an R package. It is a good idea to see what is available, but these questions come up here a lot and the normal suggestion is a DB, which is exactly the opposite of what you want if you have predictable access patterns (although even here prefetch could probably be implemented).

Doesn't it use temporary files or something similar to deal with such an amount of data? As an example, I know that SAS handles sas7bdat files up to 1TB on a box with 76GB memory, without noticeable issues.

--Roman

- Original Message -

In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that data in an R data.frame.

[1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
[2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko wrote:

Hi R list, I'm new to R software, so I'd like to ask about its capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non filtered).

2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0

I'll need to filter it first based on some criteria. Since I keep it in a mysql database, it can be done through a query. Not super efficient, checked it already. Then I need to aggregate the dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, this can be done through a database query; not sure what is going to be faster. Aggregated tables are going to be much smaller, like thousands of rows per observation day. Then calculate basic statistics: mean, standard deviation, sums etc. After the stats are calculated, I need to perform some statistical hypothesis tests.

So, my question is: which tool is faster for data aggregation and filtration on big datasets: mysql or R?

Thanks, --Roman N.

-- === Jon Daily Technician === #!/usr/bin/env outside # It's great, trust me.
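The "streaming" idea raised in this message can be approximated in plain R, without a new data frame type, by reading a file through a connection in fixed-size chunks and keeping only running totals. The sketch below is an illustration under simple assumptions (one numeric value per line, a temporary toy file, a chunk size of 100); it is not the paging data frame the poster has in mind.

```r
# Toy input file with one number per line (stands in for a very large file).
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:1000), path)

con <- file(path, open = "r")
total <- 0
n <- 0
repeat {
  chunk <- readLines(con, n = 100)   # read at most 100 lines per pass
  if (length(chunk) == 0) break      # end of file
  vals <- as.numeric(chunk)
  total <- total + sum(vals)         # only running totals stay in memory
  n <- n + length(vals)
}
close(con)

stream_mean <- total / n
print(stream_mean)
```

This works for statistics that can be accumulated in one sequential pass (sums, counts, means); anything needing random access to rows calls for the out-of-memory packages mentioned elsewhere in the thread.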
Re: [R] barplot groups of different size i.e. height is NOT a matrix
On May 25, 2011, at 7:56 AM, Victor Gabillon wrote:

Hello, I want to use the function barplot to display several groups of bars. A standard example is given at this link:

http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html

But in their example the 4 groups of bars are all composed of 8 bars. I want to be able to display the same kind of graph but where the number of bars in each group is not the same. For example, the first group of bars would have 2 bars and the second group of bars would have 10 bars.

The barplot function has a first parameter named height, which is a matrix where each line holds the values for the bars of one particular group. One solution could be to have a height matrix with NA values, but then the space occupied by each group is equal to the size of the largest group!! So you end up with (empty) gaps where there are NAs. Do you know how to solve this problem? Do I have to consider multiple barplots in the same plot with the same axis? (btw, I don't know how to do that)

In fact the bars would represent the performance of an algorithm. A group of bars would be the performance of one algorithm with different parameters. But when comparing different algorithms it is possible that we don't want to display the same number of parameters for each algorithm.

Thanks for your help. Victor

barplot() is fundamentally built upon the use of rect() to construct the bars, so you could always create your own variant to allow for the flexibility that you desire.

That being said, if your performance measures (the bar heights) are other than discrete counts or proportions, I would advise you to consider using other visual presentation forms, as these are really the only two types of data for which barplots are generally considered satisfactory.

A key to barplots of course is that they are based at 0 for proper visual comparison. Thus, if you need to have the minimum of the relevant axis at a value other than 0, this is another reason not to use them.

Even then, many folks have moved away from barplots to use point or dot plots and similar formats, especially where you also need to include some type of confidence interval for each measure.

HTH, Marc Schwartz
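As an alternative to writing a rect()-based variant, groups of unequal size can be sketched with barplot() itself by passing a plain vector of heights and a per-bar space vector. This is a different technique from the one suggested above, shown only as one possible approach; the scores and group labels are invented.

```r
# Invented performance scores: algorithm A with 2 settings, B with 10.
scores_A <- c(0.61, 0.72)
scores_B <- c(0.55, 0.58, 0.63, 0.66, 0.70, 0.71, 0.69, 0.64, 0.60, 0.57)
heights  <- c(scores_A, scores_B)

# 'space' gives the gap before each bar (in bar widths); a larger gap
# before the first bar of group B separates the two groups.
gaps <- c(0.2, 0.2, 1.5, rep(0.2, length(scores_B) - 1))
cols <- rep(c("grey30", "grey70"),
            times = c(length(scores_A), length(scores_B)))

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(heights, space = gaps, col = cols, ylim = c(0, 1),
                ylab = "performance")

# Label each group at the centre of its own bars.
centres <- c(mean(mids[1:2]), mean(mids[3:12]))
axis(1, at = centres, labels = c("A", "B"), tick = FALSE)
```

Each group then occupies only as much width as it has bars. The caveats above about when barplots are appropriate still apply.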
Re: [R] plotting texas school district using shape files
Shant Ch sha1one at yahoo.com writes:

Yes, I had included library(maptools) in my code; it is already installed on my computer. But it is still showing the same error.

In that case you should (1) read the posting guide, (2) copy and paste the code you ran, and the precise error you got, into an email to the list that also includes (3) the results of running sessionInfo() during your R session (after loading the maptools package). Then perhaps we will have enough information to help diagnose the problem.

PS: a little more poking around shows that, at least on my system, read.shape() is *not* part of the maptools package. help.search("read.shape") finds maptools::readShapeSpatial. library(sos); findFn("read.shape") discovers that there is a read.shape() function in the spsurvey package.

Ben Bolker

I was plotting or creating a map for Texas school districts using the shape file of Texas. I could not find any other helpful mail in the mailing list.

txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package="maptools"))

Error - read.shape not found. But read.shape is there in maptools.

After you have done that I suspect you will still have a problem with finding the file -- I think you want something like

library(maptools)
txtshp <- read.shape("S:\\Districts_10_11.shp")

Ben Bolker
Re: [R] Multinomial Logistical Model
I suggest a couple of courses before proceeding. Multinomial logistic models have special challenges. And note that you have two nomenclature errors in your note, which is usually a sign of not having taken the relevant coursework.

Frank

Belle wrote:

Does anyone know how to run Multinomial logistical Model in R in order to get predicted probability?

The response is content (5 levels: 1, 2, 3, 4, 5)

The covariance are:
assignment - int (0, 1)
dr0 - int (0, 1)
dr1 - int (0, 1)
yr_exp - num
yr_exp_s - num
ismgdr - int (0, 1)
ismgyr_t_A - int (0, 1)
pair - int (41 pairs: 1001, 1002, ...)

There is no random effect involved, all the variables are fixed. I have tried mlogit, but it does not work.

x <- SciContent
x$content <- as.factor(x$content)
mldata <- mlogit.data(x, varying=NULL, choice="content", shape="wide")
SciCt <- mlogit(mldata$content | mldata$assignment + mldata$dr0 + mldata$dr1 + mldata$yr_tch_exp + mldata$yr_tch_exp_s + mldata$ismgdr + mldata$ismgyr_t_A + mldata$pair)

Error: inherits(object, "formula") is not TRUE

- Frank Harrell Department of Biostatistics, Vanderbilt University

-- View this message in context: http://r.789695.n4.nabble.com/Multinomial-Logistical-Model-tp3548239p3550003.html Sent from the R help mailing list archive at Nabble.com.
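For readers who only need the mechanics of getting predicted probabilities, here is a minimal sketch using multinom() from the nnet package instead of the mlogit() call in the question (a swapped-in function, chosen because it takes an ordinary formula and data frame). The built-in iris data stands in for the poster's data, which is not available, so the variables are illustrative only. The modelling cautions above still apply.

```r
library(nnet)  # a recommended package included with standard R installations

# iris stands in for the poster's data: Species plays the role of 'content'.
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris,
                trace = FALSE)

# One row per observation, one column per response level.
probs <- predict(fit, type = "probs")
print(head(round(probs, 3)))
```

Passing newdata = to predict() gives probabilities for new covariate values.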
[R] the mgcv package can not be loaded
Hi. I have been trying to load the mgcv package but I always get the error message: there is no package called 'nlme' Error: package/namespace load failed for 'mgcv' I load the package nlme and still I get the same message. I have noticed that there are some problems in using nlme in recent versions of R. Is there any suggestion or any special issue that I should know about nlme or mgcv? Thanks Gilbert
Re: [R] stepwise selection cox model
On May 25, 2011, at 5:28 AM, linda Porz wrote:

Sorry, I wrote the wrong subject in the first email! Regards, Linda

-- Forwarded message -- From: linda Porz linda.p...@gmail.com Date: 2011/5/25 Subject: combined odds ratio To: r-help@r-project.org Cc: r-help-requ...@stat.math.ethz.ch

Dear all, I am looking for an R function which does stepwise selection for a Cox model in R (delta chisq likelihood ratio test), similar to "stepwise, pe(0.05) lr: stcox" in STATA.

Does the Stata method apply appropriate penalization to its stepwise procedures? I suspect you will find that the experts in survival analysis around these parts take a very dim view of stepwise procedures, and I would not be surprised if they purposely put a barrier in front of naive users to protect them from falling into the well-described but perhaps not widely understood pitfalls of such methods.

I do know that Harrell provides some support for penalized methods in his cph-related functions. See the function pentrace. He also has a fastbw function in rms, which is provided mainly so one can investigate and demonstrate those aforementioned pitfalls.

Note: You should not be adding the address r-help-requ...@stat.math.ethz.ch to your postings. You may get cryptic replies in your Inbox. It is the address for interacting with the mail server to manage your subscription options.

David Winsemius, MD West Hartford, CT
Re: [R] R as.numeric()
On May 25, 2011, at 7:25 AM, Lutz Fischer wrote:

Thanks a lot for both replies. If I set up the option as proposed, everything works as I wanted it to. I guess as.character would work as well. Only then I guess I would need to loop through the data frame.

as.character is vectorized. You should not need loops.

-- David.

Lutz

On 24/05/11 22:42, Ista Zahn wrote:

This is a FAQ: http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f Please try there before posting a question to the list. Best, Ista

On Tue, May 24, 2011 at 5:33 PM, David Scott d.sc...@auckland.ac.nz wrote:

On 25/05/2011 9:20 a.m., Lutz Fischer wrote:

Hi, I have a bit of a problem with as.numeric or as.double. I read in an excel file (with either xlsx::read.xlsx2 or gdata::read.xls), select a subset and then try to make it numeric:

# read in the excel-file
alldata <- read.xlsx2("input.xls", 1)
# select the subset
s <- subset(alldata, select=c(cI,cII,cIII,cIV,cV))
# unluckily we have n/a for missing values in the file - so we turn it into proper missing values
s[s == "n/a"] <- NA
n <- data.matrix(s)

The problem I have is that it does not convert the data the way I would expect. Just as an example:

s[1,2]
[1] 30.94346629
3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352 0.030248034 0.030841352 0.032966308 ... n/a

turned into:

n[1,2]
[1] 3020

And I would like to have 30.94346629 there as well. I assume that has to do with the Levels attribute - but I am not sure what to make of these in the first place. I also tried to convert each value on its own:

# make some space that holds the actual numeric data
n <- array(dim=c(length(s[,1]), length(s)))
# now turn everything into doubles
for (c in 1:length(s)) {
  for (r in 1:length(s[,1])) {
    n[r,c] <- as.double(s[r,c])
  }
}

but that gave the same result - just a lot slower.

Thanks Lutz

Your problem is the conversion to factors when the data is read. Use options(stringsAsFactors = FALSE) before you read the data; then the mixed columns of numeric and missing will be read as character data and the conversion to numeric will go as you expect. (But I haven't tested this.)

David Scott

-- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

David Winsemius, MD West Hartford, CT
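A small sketch of what goes wrong and of the vectorized fix, using a made-up four-element factor in place of the spreadsheet column (the values are invented, apart from two taken from the post). as.numeric() on a factor returns the internal level codes; converting through as.character() first returns the numbers as written.

```r
# Values invented to mimic a column read in as a factor with "n/a" entries.
s <- factor(c("30.94346629", "0.026307482", "n/a", "0.5"))

codes <- as.numeric(s)               # internal level codes, not the values

s_chr <- as.character(s)             # vectorized; no loop needed
s_chr[s_chr == "n/a"] <- NA
values <- as.numeric(s_chr)          # the numbers as written in the file
print(codes)
print(values)
```

For a whole data frame, the same conversion can be applied column by column with lapply(), which is the step the linked FAQ entry describes for a single factor.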
Re: [R] the mgcv package can not be loaded
We really need some more information to be able to help you (as requested in the posting guide): What OS? What version of R? How did you install nlme? Were there any messages? What happens when you type library(nlme) at the R prompt? How did you install mgcv? Were there any messages?

On Wed, May 25, 2011 at 11:13 AM, gbre...@ssc.wisc.edu wrote:

Hi. I have been trying to load the mgcv package but I always get the error message: there is no package called 'nlme' Error: package/namespace load failed for 'mgcv' I load the package nlme and still I get the same message. I have noticed that there are some problems in using nlme in recent versions of R. Is there any suggestion or any special issue that I should know about nlme or mgcv? Thanks Gilbert

-- Sarah Goslee http://www.functionaldiversity.org
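Most of the information requested above can be collected from within R. The following is a sketch using base functions only; it reports what R can see and does not attempt to fix anything. The variable name nlme_found is arbitrary.

```r
# Information the posting guide asks for.
print(R.version.string)
print(.libPaths())

# Is nlme visible in any library on the search path?
nlme_found <- nzchar(system.file(package = "nlme"))
print(nlme_found)

# If it is, where was it found and which version is it?
if (nlme_found) {
  print(find.package("nlme"))
  print(packageVersion("nlme"))
}
```

Adding the output of sessionInfo() to the post covers the OS and R version questions as well.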
Re: [R] Print the content of several columns in only one
Or use melt in the reshape2 package to melt all columns to one with an indexing column to boot... __Scott Chamberlain Rice University, EEB Dept.

On Wednesday, May 25, 2011 at 7:06 AM, zoe.cryocla wrote:

Ok, I found how to do it, with the function paste()

-- View this message in context: http://r.789695.n4.nabble.com/Print-the-content-of-several-columns-in-only-one-tp3549114p3549514.html Sent from the R help mailing list archive at Nabble.com.
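Since the thread does not show the actual call, here is a sketch of both readings of "several columns in only one", with an invented data frame d. paste() joins the columns row by row; stack() is used as a base R stand-in for melt() (a swapped-in function, so that the example does not depend on reshape2).

```r
# Invented three-column data frame.
d <- data.frame(a = c("x", "y"), b = c("1", "2"), c = c("p", "q"),
                stringsAsFactors = FALSE)

# paste() is vectorized, so this works row by row.
d$combined <- paste(d$a, d$b, d$c, sep = "-")

# Same thing without naming each column.
combined2 <- do.call(paste, c(d[c("a", "b", "c")], sep = "-"))

# Base R alternative to melt(): stack all columns into one, with an index.
long <- stack(d[c("a", "b", "c")])
print(d$combined)
print(long)
```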
Re: [R] Processing large datasets
Date: Wed, 25 May 2011 10:18:48 -0400 From: ro...@bestroman.com To: mailinglist.honey...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Processing large datasets

Hi, If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it does paging and all the needed buffering for arbitrary size data. Does it work with everything? I seem to recall bigmemory came up before in this context and there was some problem. Thanks.

Also, if you find yourself needing to do lots of grouping/summarizing type of calculations over large data frame-like objects, you might want to check out the data.table package: http://cran.r-project.org/web/packages/data.table/index.html

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

I don't think data.table is fundamentally different from the data.frame type, but thanks for the suggestion. http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf "Just like data.frames, data.tables must fit inside RAM" The ff package by Adler, listed in "Large memory and out-of-memory data", is probably most interesting.

--Roman Naumenko
Re: [R] stepwise selection cox model
Hi,

You are unlikely to find one, as fundamentally, stepwise procedures are a bad way to engage in covariate selection. Search the list archives at rseek.org using 'stepwise' as the keyword to see a plethora of discussion on this point.

This is not a new issue BTW, as I happened to stumble upon this 1998 Stata FAQ recently during a related search:

http://www.stata.com/support/faqs/stat/stepwise.html

and there are more recent literature citations and books that reinforce those points.

HTH, Marc Schwartz

On May 25, 2011, at 4:28 AM, linda Porz wrote:

Sorry, I wrote the wrong subject in the first email! Regards, Linda

-- Forwarded message -- From: linda Porz linda.p...@gmail.com Date: 2011/5/25 Subject: combined odds ratio To: r-help@r-project.org Cc: r-help-requ...@stat.math.ethz.ch

Dear all, I am looking for an R function which does stepwise selection for a Cox model in R (delta chisq likelihood ratio test), similar to "stepwise, pe(0.05) lr: stcox" in STATA. I am very thankful for any reply. Regards, Linda
[R] Importing fixed-width data
I have a data set where the lines look like:

2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM

Some lines are missing the field before and after the NON:

2011-05-13 00:00:05 EONBHS229 mia13001621NON

I read them into R using

df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
  col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
  colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))

The documentation for read.fwf says that the data are read into a data frame. Yet, I get a list, and the conversions I specified do not seem to have been obeyed:

df[1:20,]
  DateTime            Flight Dest ArrTime  MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330 dfa  13002516 PS      C    NON    A
2 2011-05-13 00:00:01 AAL223 laa  13044510 AS      .    NON    M
. . .

sapply(df, mode)
 DateTime    Flight      Dest   ArrTime   MsgType      Conf
  numeric   numeric   numeric   numeric character   numeric
   Runway    Source
  numeric   numeric

dfn = df[!is.na(df$Source),]
mode(df)
[1] "list"

What am I doing wrong?

Thanks, Jim Rome
Re: [R] Importing fixed-width data
Everything looks OK. Does this help?

test <- data.frame(alpha=as.factor(c("A","A","B","B","C")), number=c(1,2,3,4,5))
mode(test)
[1] "list"
class(test)
[1] "data.frame"
sapply(test, mode)
    alpha    number
"numeric" "numeric"
sapply(test, class)
    alpha    number
 "factor" "numeric"

On 5/25/11 10:42 AM, James Rome jamesr...@gmail.com wrote:

I have a data set where the lines look like:

2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM

Some lines are missing the field before and after the NON:

2011-05-13 00:00:05 EONBHS229 mia13001621NON

I read them into R using

df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
  col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
  colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))

The documentation for read.fwf says that the data are read into a data frame. Yet, I get a list, and the conversions I specified do not seem to have been obeyed:

df[1:20,]
  DateTime            Flight Dest ArrTime  MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330 dfa  13002516 PS      C    NON    A
2 2011-05-13 00:00:01 AAL223 laa  13044510 AS      .    NON    M
. . .

sapply(df, mode)
 DateTime    Flight      Dest   ArrTime   MsgType      Conf
  numeric   numeric   numeric   numeric character   numeric
   Runway    Source
  numeric   numeric

dfn = df[!is.na(df$Source),]
mode(df)
[1] "list"

What am I doing wrong?

Thanks, Jim Rome
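To connect the mode/class distinction back to read.fwf, here is a sketch with two invented fixed-width records and a simplified layout (19 + 6 + 3 characters, not the poster's actual widths). It suggests that class(), rather than mode(), is the function that shows whether colClasses was honoured.

```r
# Two invented fixed-width records: 19 chars timestamp, 6 flight, 3 destination.
txt <- c("2011-05-13 00:00:00AAL330dfa",
         "2011-05-13 00:00:01AAL223laa")

con <- textConnection(txt)
df <- read.fwf(con, widths = c(19, 6, 3),
               col.names = c("DateTime", "Flight", "Dest"),
               colClasses = c("POSIXct", "character", "factor"))
close(con)

print(mode(df))           # storage mode of the container
print(class(df))          # what kind of object it is
print(sapply(df, class))  # per-column classes, where colClasses shows up
```

A data frame is stored as a list of columns, and both factors and date-times are stored as numbers, which is why mode() reports "list" and "numeric" even when the read worked as intended.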
Re: [R] stepwise selection cox model
See the vignette in the glmnet package for one alternative approach to variable selection. Of course, you need to gain some background to know what you're doing here.

-- Bert

On Wed, May 25, 2011 at 8:38 AM, Marc Schwartz marc_schwa...@me.com wrote:

Hi, You are unlikely to find one, as fundamentally, stepwise procedures are a bad way to engage in covariate selection. Search the list archives at rseek.org using 'stepwise' as the keyword to see a plethora of discussion on this point. This is not a new issue BTW, as I happened to stumble upon this 1998 Stata FAQ recently during a related search: http://www.stata.com/support/faqs/stat/stepwise.html and there are more recent literature citations and books that reinforce those points. HTH, Marc Schwartz

On May 25, 2011, at 4:28 AM, linda Porz wrote:

Sorry, I wrote the wrong subject in the first email! Regards, Linda

-- Forwarded message -- From: linda Porz linda.p...@gmail.com Date: 2011/5/25 Subject: combined odds ratio To: r-help@r-project.org Cc: r-help-requ...@stat.math.ethz.ch

Dear all, I am looking for an R function which does stepwise selection for a Cox model in R (delta chisq likelihood ratio test), similar to "stepwise, pe(0.05) lr: stcox" in STATA. I am very thankful for any reply. Regards, Linda

-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204)

Bert Gunter Genentech Nonclinical Biostatistics
Re: [R] Processing large datasets
With PostgreSQL at least, R can also be used as the implementation language for stored procedures, so data transfers between processes can be avoided altogether.

http://www.joeconway.com/plr/

Implementing such a procedure in R appears to be straightforward:

CREATE OR REPLACE FUNCTION overpaid (emp) RETURNS bool AS '
    if (20 < arg1$salary) { return(TRUE) }
    if (arg1$age < 30 && 10 < arg1$salary) { return(TRUE) }
    return(FALSE)
' LANGUAGE 'plr';

CREATE TABLE emp (name text, age int, salary numeric(10,2));
INSERT INTO emp VALUES ('Joe', 41, 25.00);
INSERT INTO emp VALUES ('Jim', 25, 12.00);
INSERT INTO emp VALUES ('Jon', 35, 5.00);

SELECT name, overpaid(emp) FROM emp;
 name | overpaid
------+----------
 Joe  | t
 Jim  | t
 Jon  | f
(3 rows)

Best

On Wednesday 25 May 2011 14:12:23 Jonathan Daily wrote:

In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that data in an R data.frame.

[1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
[2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote:

Hi R list,

I'm new to R software, so I'd like to ask about its capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed.

This is a typical set of data. Each day contains millions of records (up to 10 non filtered).

2011-05-24 750 Bid DELL14130770400 15.4800 BATS35482391Y 1 1 0 0
2011-05-24 904 Bid DELL14130772300 15.4800 BATS35482391Y 1 0 0 0
2011-05-24 904 Bid DELL14130773135 15.4800 BATS35482391Y 1 0 0 0

I'll need to filter it first based on some criteria. Since I keep it in a MySQL database, this can be done with a query. It is not super efficient; I have checked that already.

Then I need to aggregate the dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, this can be done with a database query; I am not sure which is going to be faster. The aggregated tables are going to be much smaller, on the order of thousands of rows per observation day.

Then I calculate basic statistics: mean, standard deviation, sums etc. After the stats are calculated, I need to perform some statistical hypothesis tests.

So, my question is: which tool is faster for data aggregation and filtering on big datasets: MySQL or R?

Thanks,
--Roman N.
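For the aggregation step Roman describes (bucketing millisecond timestamps into time frames, then computing per-frame statistics), a minimal base R sketch might look like the following. The column names, the five sample rows, and the one-second frame width are all assumptions chosen for illustration; it says nothing about whether R or MySQL would be faster on the full data.

```r
# Hypothetical, already-filtered quotes: time in ms from midnight, and price
quotes <- data.frame(
  ms    = c(35482391, 35482950, 35483100, 35490010, 35490500),
  price = c(15.48, 15.48, 15.49, 15.50, 15.47)
)

# Assign each record to a one-second frame (assumed width: 1000 ms)
frame <- quotes$ms %/% 1000

# Per-frame count, mean and standard deviation
agg <- data.frame(
  frame = sort(unique(frame)),
  n     = as.vector(tapply(quotes$price, frame, length)),
  mean  = as.vector(tapply(quotes$price, frame, mean)),
  sd    = as.vector(tapply(quotes$price, frame, sd))
)
agg
```

A frame holding a single record gets an sd of NA, which is worth keeping in mind before running tests on the aggregated table.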
[R] Trouble Combining With Paste
Dear R Helpers,

I am having trouble combining some pieces of programming that work fine individually, but fall down when I try to get them to work together. The end goal is to take a data frame and, if any of the variables has more than 10 values, use cut2 to reduce the number of (effective) values to 10. I want to do this in an automated fashion, which is where the combining comes in.

For example, all of these pieces work as I would expect:

tables <- lapply(infert, table)
lengths <- lapply(tables, length)
toolong <- which(lengths > 10)
require(Hmisc)
foo <- as.numeric(cut2(infert$age, g=10, levels.mean=TRUE))
str(foo)
# num [1:248] 2 10 9 7 7 8 1 6 1 3 ...
bar <- paste("infert$", attr(toolong[1], "names"), sep="")
bar
#[1] "infert$age"

But the following gives an error:

foobar <- as.numeric(cut2(paste("infert$", attr(toolong[1], "names"), sep=""), g=10, levels.mean=TRUE))
Error in min(diff(x.unique))/2 : non-numeric argument to binary operator
In addition: Warning message:
In min(diff(x.unique)) : no non-missing arguments, returning NA

Your guidance would be much appreciated.

--John J. Sparks, Ph.D.
Re: [R] the mgcv package can not be loaded
Well, that answered some of my questions, though you forgot to send your answer to the r-help list rather than just to me. I don't use windows, so someone else may have better advice. On Wed, May 25, 2011 at 12:02 PM, gbre...@ssc.wisc.edu wrote: Sorry, I forgot to be more specific. I am using Windows XP. I am using R.12.2 I installed both packages from the install packages menu. And were there any messages? I always write library(name.of.library), and it is enough. But when I write library(nlme), R does not find nlme right away I load nlme first and it says package was downloaded succesfully. load? Installed? Downloaded successfully is not the same as installed successfully. How about the actual wording? However, when I try to do this again in another day, R cannot find nlme, so I try to load mgcv with library(mgcv), then I get this message: Error: package 'nlme' could not be loaded In addition: Warning message: In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) : there is no package called 'nlme' Is there any problem with nlme that I need to install it every time I open R? I wouldn't think so. But obviously something is not right, and you still haven't provided enough information to be able to diagnose the problem. Sarah Gilbert We really need some more information to be able to help you (as requested in the posting guide): What OS? What version of R? How did you install nlme? Were there any messages? What happens when you type library(nlme) at the R prompt? How did you install mgcv? Were there any messages? On Wed, May 25, 2011 at 11:13 AM, gbre...@ssc.wisc.edu wrote: Hi. I have been trying to load the mgcv package but I always get the error message: there is no package called 'nlme' Error: package/namespace load failed for 'mgcv' I load the package nlme and still I get the same message. I have noticed that there are some problems in using nlme in recent versions of R. 
Is there any suggestion or any special issue that I should know about nlme or mgcv? Thanks Gilbert -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Combining With Paste
Hi John,

The issue is that:

"infert$age" != infert$age

One is a text string; the other references the information stored in the age variable of the infert object. If you need to pass the names as a string, use [ instead:

## for a data frame
infert[, "age"]
## for a list
infert[["age"]]

From your code, it looks like you may want:

infert[, attr(toolong[1], "names")]

HTH,
Josh

On Wed, May 25, 2011 at 9:02 AM, Sparks, John James jspa...@uic.edu wrote:
[snip]

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
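The difference Josh describes can be seen directly with the infert data set that ships with R. The sketch below uses only base R; n_values and age are names introduced here for illustration.

```r
# Columns of infert with more than 10 distinct values
n_values <- sapply(infert, function(col) length(unique(col)))
toolong  <- names(n_values)[n_values > 10]

# paste() builds a character string; it does not look anything up
bar <- paste("infert$", toolong[1], sep = "")
is.character(bar)   # TRUE: just the text "infert$age"

# Indexing by name returns the actual column
age <- infert[[toolong[1]]]
is.numeric(age)     # TRUE: the age values themselves
```

Passing `bar` to a function that expects numbers fails for the same reason the cut2() call in the question did: it receives text rather than data.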
Re: [R] Trouble Combining With Paste
You need to use get() so that you are acting on the data frame, and not the string that names the data frame.

Sarah

On Wed, May 25, 2011 at 12:02 PM, Sparks, John James jspa...@uic.edu wrote:
[snip]

--
Sarah Goslee
http://www.functionaldiversity.org
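One detail worth noting about the get() suggestion: get() looks up a whole object by name, so it can fetch the data frame from the string "infert", but the string "infert$age" is not the name of any object. A small sketch, in base R only, with the tryCatch() wrapper added here just to show the failure without stopping the script:

```r
# get() retrieves an object by its name
df <- get("infert")
identical(df, infert)   # TRUE

# "infert$age" is not an object name, so get() signals an error
res <- tryCatch(get("infert$age"), error = function(e) "not found")
res

# Combining get() for the object with [[ ]] for the column does work
age <- get("infert")[["age"]]
str(age)
```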
Re: [R] barplot groups of different size i.e. height is NOT a matrix
Victor,

I agree with Marc's point of view. So, if you can use another representation of your data, such as points, consider looking at http://lmdvr.r-forge.r-project.org/figures/figures.html figures 10.20 and 10.21 as a starting point.

Walmes.

==
Walmes Marques Zeviani
LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W)
Departamento de Estatística - Universidade Federal do Paraná
fone: (+55) 41 3361 3573
VoIP: (3361 3600) 1053 1173
e-mail: wal...@ufpr.br
twitter: @walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
==
[R] [Fwd: Re: the mgcv package can not be loaded]
Sorry, I forgot to be more specific. I am using Windows XP. I am using R.12.2 I installed both packages from the install packages menu. I always write library(name.of.library), and it is enough. But when I write library(nlme), R does not find nlme right away I load nlme first and it says package was downloaded succesfully. However, when I try to do this again in another day, R cannot find nlme, so I try to load mgcv with library(mgcv), then I get this message: Error: package 'nlme' could not be loaded In addition: Warning message: In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) : there is no package called 'nlme' Is there any problem with nlme that I need to install it every time I open R? Gilbert We really need some more information to be able to help you (as requested in the posting guide): What OS? What version of R? How did you install nlme? Were there any messages? What happens when you type library(nlme) at the R prompt? How did you install mgcv? Were there any messages? On Wed, May 25, 2011 at 11:13 AM, gbre...@ssc.wisc.edu wrote: Hi. I have been trying to load the mgcv package but I always get the error message: there is no package called 'nlme' Error: package/namespace load failed for 'mgcv' I load the package nlme and still I get the same message. I have noticed that there are some problems in using nlme in recent versions of R. Is there any suggestion or any special issue that I should know about nlme or mgcv? Thanks Gilbert -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Combining With Paste
John -
   Try

infert[,toolong] = sapply(infert[,toolong],cut2,g=10,levels.mean=TRUE)

                                       - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu

On Wed, 25 May 2011, Sparks, John James wrote:
[snip]
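For readers without Hmisc installed, the same column-wise idea can be sketched in base R. Note the substitution: bin10() below is a hypothetical helper that uses cut() with quantile breaks as a stand-in for cut2(g=10); it returns bin numbers rather than bin means, so it is not equivalent to cut2(..., levels.mean=TRUE).

```r
# Columns of infert with more than 10 distinct values
n_values <- sapply(infert, function(col) length(unique(col)))
toolong  <- names(n_values)[n_values > 10]

# Hypothetical helper: bin a numeric vector into at most 10 quantile groups
bin10 <- function(x) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = 11)))
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE))
}

# Work on a copy, replacing every over-long column in one step
dat <- infert
dat[toolong] <- lapply(dat[toolong], bin10)

sapply(dat[toolong], function(col) length(unique(col)))
```

lapply() is used here rather than sapply() so that each column is returned as its own vector instead of being simplified into a matrix.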
Re: [R] Processing large datasets
Hi, On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka marchy...@hotmail.com wrote: [snip] If your datasets are *really* huge, check out some packages listed under the Large memory and out-of-memory data section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html Does this have any specific limitations ? It sounds offhand like it does paging and all the needed buffering for arbitrary size data. Does it work with everything? I'm not sure what limitations ... I know the bigmemory (and ff) packages try hard to make using out-of-memory datasets as transparent as possible. That having been said, I guess you will have to port more advanced methods to use such packages, hence the existence of the biglm, biganalytics, bigtabulate packages do. I seem to recall bigmemory came up before in this context and there was some problem. Well -- I don't often see emails on this list complaining about their functionality. That doesn't mean they're flawless (I also don't scrutinize the list traffic too closely). It could be that not too many people use them, or that people give up before they come knocking when there is a problem. Has something specifically failed for you in the past, or? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adjusted Rate Ratios in R
Matthew,

You can change the contrast matrix (restriction) involved. Start from help(contr.sum) to learn how to specify it.

Walmes.

==
Walmes Marques Zeviani
LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W)
Departamento de Estatística - Universidade Federal do Paraná
fone: (+55) 41 3361 3573
VoIP: (3361 3600) 1053 1173
e-mail: wal...@ufpr.br
twitter: @walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
==
Re: [R] stepwise selection cox model
On May 25, 2011, at 12:11 PM, linda Porz wrote: Many thanks for your reply. I have run a stepwise selection in Stata and R using the function fastbw (rule=p) from Design package. Both functions give the same results. Is this because both functions do the same job or can it be that for different data one will have different results? I don't understand your question. Why would giving the same results be a concern? And why would one expect that with different data one would _not_ get different results? The point of the critique against stepwise procedures is that they assume too much determinism (i.e. that all of the internal structure of the small sample of data will be present in the wider universe) and that they generate too much confidence on the part of the unwary and insufficiently educated user. -- David. Many thanks, Linda 2011/5/25 Bert Gunter gunter.ber...@gene.com See the Vignette in the glmnet package for one alternative approach to variable selection. Of course, you need to gain some background to know what you're doing here. -- Bert On Wed, May 25, 2011 at 8:38 AM, Marc Schwartz marc_schwa...@me.com wrote: Hi, You are unlikely to find one, as fundamentally, stepwise procedures are a bad way to engage in covariate selection. Search the list archives at rseek.org using 'stepwise' as the keyword to see a plethora of discussion on this point. This is not a new issue BTW, as I happened to stumble upon this 1998 Stata FAQ recently during a related search: http://www.stata.com/support/faqs/stat/stepwise.html and there are more recent literature citations and books that reinforce those points. HTH, Marc Schwartz On May 25, 2011, at 4:28 AM, linda Porz wrote: Sorry, I have wrote a wrong subject in the first email! 
Regards, Linda -- Forwarded message -- From: linda Porz linda.p...@gmail.com Date: 2011/5/25 Subject: combined odds ratio To: r-help@r-project.org Cc: r-help-requ...@stat.math.ethz.ch Dear all, I am looking for an R function which does stepwise selection cox model in r (delta chisq likelihood ratio test) similar to the stepwise, pe (0.05) lr: stcox in STATA. I am very thankful for any reply. Regards, Linda __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I assign boolean (o,1) values to a column?
Thank you very much. I managed to count the number of Markers 2 linked to Markers 3, and Markers 1 to Markers 3, with the aggregate function:

with(data, aggregate(Marker1, list(Marker2=Marker2), length))
data2 <- with(data, aggregate(Marker1, list(Marker2=Marker2, Marker3=Marker3), length))

So now it is easy: I only need to apply an if and it is solved. I want to thank you, Steve and David; the info you gave was really useful, and I have now learned ave. I hope I can start being useful on the R blog myself soon.

Regards

--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-assign-boolean-o-1-values-to-a-column-tp3544304p3550309.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] How to intantiate a list of data.frames?
Hi Josh, You are definitely right. And were all time. Yes, the problem was always with the write.csv(). I though it was with the ds. Thank you very much. Cheers, Rui Date: Tue, 24 May 2011 14:30:56 -0700 Subject: Re: [R] How to intantiate a list of data.frames? From: jwiley.ps...@gmail.com To: ruimax...@hotmail.com CC: r-help@r-project.org Hi Rui, Please look at the documentation for ?write.csv I do not have oilDF, but my guess is that you make the object, ds fine, but then you are trying to pass a list to write.csv which works on matrices or data frames (or attempts to coerce to such). The easiest answer is probably to write each element of ds (that is, each data frame) to a separate file. Cheers, Josh On Sun, May 22, 2011 at 12:11 PM, Rui Maximo ruimax...@hotmail.com wrote: I will post the whole function, but I believe the problem is in the 3th part. The issue is that oilDF has different number of rows than oilDF2. Thank you, Rui myScan - function(dirPath, num) { #dirPath is the name of the directory where we want to apply the function. It should be called from the immediate above level without the last 3 characters. 
For example dirPath="oil_0"
  #num is the mussel number

  #Heart rate
  startPath=getwd()
  workPath=paste(startPath,"/", dirPath,"_HR", sep="")
  setwd(workPath)
  temp=dir()
  d=sort(temp)
  oilDF=read.table(d[1], header=TRUE)
  oilDF=data.frame(oilDF[,1], oilDF[,2], oilDF[,num+2])
  for(i in 2:length(d)) {
    temp <- read.table(d[i], header=TRUE)
    temp=data.frame(temp[,1], temp[,2], temp[,num+2])
    colnames(temp) <- colnames(oilDF)
    oilDF=rbind(oilDF,temp)
  }
  setwd(startPath)

  #Valve Gape
  workPath=paste(startPath,"/", dirPath,"_VG", sep="")
  setwd(workPath)
  temp=dir()
  d=sort(temp)
  oilDF2=read.table(d[1], header=FALSE)
  oilDF2=data.frame(oilDF2[,1],oilDF2[,2],oilDF2[,num+3])
  for(i in 2:length(d)) {
    temp <- read.table(d[i], header=FALSE)
    temp=data.frame(temp[,1], temp[,2], temp[,num+3])
    colnames(temp) <- colnames(oilDF2)
    oilDF2=rbind(oilDF2,temp)
  }

  #Pack both signals in a vector of dataframes for each Mussel.
  ds <- vector("list", 2)
  timeHR = as.numeric(strptime(paste(oilDF[,1],oilDF[,2]), "%m/%d/%y %H:%M:%OS"))
  timeVG = as.numeric(strptime(paste(oilDF2[,1],oilDF2[,2]), "%d/%m/%y %H:%M:%OS"))
  ds[[1]] <- data.frame(timeHR,oilDF[,3])
  ds[[2]] <- data.frame(timeVG,oilDF2[,3])
  write.csv(ds,paste(startPath, "/", "mussel_", i, dirPath, ".csv", sep=""))
  return(ds)
}

Date: Sun, 22 May 2011 11:33:38 -0700
Subject: Re: [R] How to intantiate a list of data.frames?
From: jwiley.ps...@gmail.com
To: ruimax...@hotmail.com
CC: r-help@r-project.org

Hi Rui,

Data frames must have the same number of rows, but two different data frames stored within a list do not need to have the same number of rows. Can you please post the code that is giving the error?

Josh

On Sun, May 22, 2011 at 9:41 AM, Rui Maximo ruimax...@hotmail.com wrote:

Hi Josh,

Sorry, your examples have an equal number of rows in both df and df2. In my situation they do not. Strangely, your solution has worked only when I copy and paste the code into the command line.
If I use the code inside of a function I get an error at: return(ds) ERROR: arguments imply differing number of rows Thanks, Rui Date: Sat, 21 May 2011 11:46:05 -0700 Subject: Re: [R] How to intantiate a list of data.frames? From: jwiley.ps...@gmail.com To: ruimax...@hotmail.com CC: r-help@r-project.org Hi Rui, Here is one option: ds - vector(list, 6) for(i in 1:6) ds[[i]] - list(df = mtcars[, c(i, i + 2)], df2 = mtcars[, c(i, i + 2)] + 10) another could be: altds - lapply(1:6, function(x) { list(df = mtcars[, c(x, x + 2)], df2 = mtcars[, c(x, x + 2)] + 10) }) all.equal(ds, altds) For some documentation, see ?vector ?lapply Cheers, Josh On Sat, May 21, 2011 at 10:47 AM, Rui Maximo ruimax...@hotmail.com wrote: Hello, I am newbie to R and I want to do this: for(i in 1:6) { ds[i] - list(df=data.frame(oilDF[,1],oilDF[,i+2]), df2=data.frame(oilDF2[,1],oilDF2[,i+2])) } #oilDF and oilDF2 are 2 data frames with several columns. They have different number of rows #I want to have for example ds[1]$df, ds[1]$df2 with the respective data.frames. #How can I instantiate a list of data.frames pairs with different number of rows? Thank you, Rui [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
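Josh's suggestion in this thread, writing each data frame in the list to its own file instead of passing the whole list to write.csv(), could be sketched as below. The list contents, the element names HR and VG, the "mussel_" file prefix, and the use of tempdir() are all placeholders chosen for illustration, not Rui's actual data or paths.

```r
# Two hypothetical data frames with different numbers of rows
ds <- list(
  HR = data.frame(time = 1:5, value = c(61, 63, 60, 62, 64)),
  VG = data.frame(time = 1:3, value = c(0.8, 0.7, 0.9))
)

# A list can hold them together even though the row counts differ
sapply(ds, nrow)

# write.csv() expects one data frame, so write each element separately
out_dir <- tempdir()
for (nm in names(ds)) {
  out_file <- file.path(out_dir, paste0("mussel_", nm, ".csv"))
  write.csv(ds[[nm]], out_file, row.names = FALSE)
}
```

Calling write.csv(ds, ...) on the list itself makes R try to coerce the list to a single data frame, which is where the "arguments imply differing number of rows" error comes from.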
Re: [R] stepwise selection cox model
Many thanks for your reply. I have run a stepwise selection in Stata and R using the function fastbw (rule=p) from Design package. Both functions give the same results. Is this because both functions do the same job or can it be that for different data one will have different results? Many thanks, Linda 2011/5/25 Bert Gunter gunter.ber...@gene.com See the Vignette in the glmnet package for one alternative approach to variable selection. Of course, you need to gain some background to know what you're doing here. -- Bert On Wed, May 25, 2011 at 8:38 AM, Marc Schwartz marc_schwa...@me.com wrote: Hi, You are unlikely to find one, as fundamentally, stepwise procedures are a bad way to engage in covariate selection. Search the list archives at rseek.org using 'stepwise' as the keyword to see a plethora of discussion on this point. This is not a new issue BTW, as I happened to stumble upon this 1998 Stata FAQ recently during a related search: http://www.stata.com/support/faqs/stat/stepwise.html and there are more recent literature citations and books that reinforce those points. HTH, Marc Schwartz On May 25, 2011, at 4:28 AM, linda Porz wrote: Sorry, I have wrote a wrong subject in the first email! Regards, Linda -- Forwarded message -- From: linda Porz linda.p...@gmail.com Date: 2011/5/25 Subject: combined odds ratio To: r-help@r-project.org Cc: r-help-requ...@stat.math.ethz.ch Dear all, I am looking for an R function which does stepwise selection cox model in r (delta chisq likelihood ratio test) similar to the stepwise, pe (0.05) lr: stcox in STATA. I am very thankful for any reply. Regards, Linda __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. 
If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Subtracting rows by id
Dear R users,

I have two datasets:

id1 <- c(rep(1,10), rep(2,10), rep(3,10))
value1 <- sample(1:100, 30, replace=TRUE)
dataset1 <- cbind(id1,value1)

id2 <- c(1,2,3)
subtract.value <- c(1,3,5)
dataset2 <- cbind(id2, subtract.value)

For each id in dataset1, I want to remove the number of rows given by the corresponding subtract.value in dataset2. So for id 1 I want to remove the first row, for id 2 the first 3 rows, and for id 3 the first 5 rows, finally creating a new data frame with the remaining values.

I am having trouble structuring a loop that can do this by the unique ids in the first dataset while matching the ids in the datasets. Any thoughts would be greatly appreciated.

Thank you,
Sara
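One possible loop-free approach to the question above, offered only as a sketch: number the rows within each id with ave(), look up how many rows to drop for each id with match(), and keep the rows whose position exceeds that count. It assumes the two datasets are held as data frames rather than the cbind() matrices in the post; pos, n_drop and result are names introduced here.

```r
set.seed(1)
dataset1 <- data.frame(id1 = rep(1:3, each = 10),
                       value1 = sample(1:100, 30, replace = TRUE))
dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

# Position of each row within its id group: 1, 2, ..., 10 for every id
pos <- ave(seq_len(nrow(dataset1)), dataset1$id1, FUN = seq_along)

# Number of leading rows to drop, looked up by id for every row
n_drop <- dataset2$subtract.value[match(dataset1$id1, dataset2$id2)]

# Keep only the rows that come after the dropped ones
result <- dataset1[pos > n_drop, ]
table(result$id1)
```

With 10 rows per id and 1, 3 and 5 rows removed, the three groups end up with 9, 7 and 5 rows.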
Re: [R] barplot groups of different size i.e. height is NOT a matrix
You can produce a graph similar to the ggplot one with lattice::barchart:

require(lattice)
dataset <- data.frame(Main=c("A","A","A","B","B"), Detail=c("a","b","c","1","2"), value=runif(5, min=0.5, max=1))
barchart(value~Detail|Main, data=dataset, scales=list(x=list(relation="free")))

Walmes.

==
Walmes Marques Zeviani
LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W)
Departamento de Estatística - Universidade Federal do Paraná
fone: (+55) 41 3361 3573
VoIP: (3361 3600) 1053 1173
e-mail: wal...@ufpr.br
twitter: @walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
==

On Wed, May 25, 2011 at 12:04 PM, Marc Schwartz marc_schwa...@me.com wrote:

On May 25, 2011, at 7:56 AM, Victor Gabillon wrote:

Hello,

I want to use the function barplot to display several groups of bars. A standard example is given at this link:

http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html

But in their example the 4 groups of bars are all composed of 8 bars. I want to be able to display the same kind of graph, but where the number of bars in each group is not the same. For example, the first group of bars would have 2 bars and the second group would have 10 bars.

The barplot function has a first parameter named height, which is a matrix where each line holds the values for the bars of one particular group. One solution could be to have a height matrix with NA values, but then the space occupied by each group is equal to the size of the largest group, so you end up with empty gaps where there are NAs.

Do you know how to solve this problem? Do I have to consider multiple barplots in the same plot with the same axis? (By the way, I don't know how to do that.)

In fact, the bars would represent the performance of an algorithm. A group of bars would be the performance of one algorithm with different parameters. But when comparing different algorithms, it is possible that we don't want to display the same number of parameters for each algorithm.

Thanks for your help.
Victor

barplot() is fundamentally built upon the use of rect() to construct the bars, so you could always create your own variant to allow for the flexibility that you desire.

That being said, if your performance measures (the bar heights) are other than discrete counts or proportions, I would advise you to consider using other visual presentation forms, as these are really the only two types of data for which barplots are generally considered satisfactory. A key to barplots, of course, is that they are based at 0 for proper visual comparison. Thus, if you need to have the minimum of the relevant axis at a value other than 0, this is another reason not to use them.

Even then, many folks have moved away from barplots to use point or dot plots and similar formats, especially where you also need to include some type of confidence interval for each measure.

HTH,
Marc Schwartz
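If a base graphics barplot is still wanted despite the caveats above, groups of unequal size can be sketched by passing a plain vector as height and giving the space argument one value per bar, with a wider gap before the first bar of each group. The values, group labels and gap sizes below are made up for illustration.

```r
# Hypothetical performance values: group A has 2 bars, group B has 5
values <- c(0.90, 0.70, 0.60, 0.80, 0.75, 0.65, 0.85)
group  <- c("A", "A", "B", "B", "B", "B", "B")

# Wide gap (1 bar width) before the first bar of each group, narrow otherwise
new_group <- c(TRUE, group[-1] != group[-length(group)])
gap <- ifelse(new_group, 1, 0.1)

# barplot() returns the bar midpoints, which are reused to place group labels
mids <- barplot(values, space = gap, ylim = c(0, 1),
                col = ifelse(group == "A", "grey30", "grey70"))
centers <- tapply(mids, group, mean)
mtext(names(centers), side = 1, line = 1, at = centers)
```

Each group then occupies only as much horizontal room as it has bars, which avoids the empty slots that NA padding in a height matrix would leave.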
Re: [R] Data Frame housekeeping
On May 25, 2011, at 1:16 PM, Scott Hatcher wrote:

Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply. Also, for providing something I hoped would be produced from joining this mailing list: a means of discovering incredibly useful packages such as the reshape2 one you have introduced me to.

I have a follow up question to your solution (which should produce exactly what I need): when I run the cast function to reassemble the data frame I get:

I used `dcast`.

Error in names(data) <- array_names(res$labels[[2]]) :
  'names' attribute [7] must be the same length as the vector [1]

And I obviously didn't get that error, so there might be a difference in either the code (which you did not show), or the data (which you did not offer in a reproducible form).

This signaled to me that the function was returning 7 values where it expected only 1. To test this I applied a summary function "mean" to the cast, and the result processed (however it only produced NA's because my values were class:factors). What I don't understand is where these multiple values are coming from; there should be only a single value corresponding to the 4 id.vars given in the cast function (STN_ID,YEAR,MM,variable).

If you want further effort you should address the inadequacies of your question. It is very possible that you will need to acquaint yourself with the use of either `dump` or `dput`.

-- David.

Thanks again for your help,
Scott Hatcher

On 24/05/2011 5:16 PM, David Winsemius wrote:

On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:

Hello,

I have a large data frame that is organized by date in a peculiar way. I am seeking advice on how to transform the data into a format that is of more use to me.

The data is organized as follows:

   STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
1 2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
2 2402594 1997 10    1       -3       -5       -1 -00039 -00031 -00036 -00033
3 2402594 1997 11    1       25       65       70     69 000115     72     93

Where MM is the month of the year, and ELEM is the variable that the values in the X* columns describe (in the actual data there are 31 X columns, one for each day of the month). The values in bold are the values that are transferred into the small chart below (which is the result I hope to get). This is to give a sense of how the data is picked out of the original data frame.

assuming this dataframe is named 'tst':

require(reshape2)
mtst <- melt(tst[, 1:7], id.vars=1:4)   # Only select idvars and X1:X3
str(mtst)
#--
'data.frame': 54 obs. of 6 variables:
 $ STN_ID  : num 2402594 2402594 2402594 2402594 2402594 ...
 $ YEAR    : num 1997 1997 1997 1997 1998 ...
 $ MM      : num 9 10 11 12 1 2 3 4 5 9 ...
 $ ELEM    : num 1 1 1 1 1 1 1 1 1 2 ...
 $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : chr "-00233" "-3" "25" "000160" ...

dcast(mtst, STN_ID + YEAR + MM + variable ~ ELEM)
#-
   STN_ID YEAR MM variable      1      2
1 2402594 1997  9       X1 -00233 -00339
2 2402594 1997  9       X2 -00204 -00339
3 2402594 1997  9       X3 -00119 -00343
4 2402594 1997 10       X1     -3 -00207
5 2402594 1997 10       X2     -5 -00289
6 2402594 1997 10       X3     -1 -00278
7 2402594 1997 11       X1     25 -00242
snipped output

I would like to organize the data so it looks like this:

   STN_ID YEAR MM DAY  ELEM1  ELEM2
1 2402594 1997  9  X1 -00233 -00339
2 2402594 1997  9  X2 -00204     77
3 2402594 1997  9  X3 -00119     30

Where is that second column coming from? I don't see it in the data example.

Such that I create a new column named DAY that is made up of the numbers following X in the original data.frame columns. Also, the ELEM values are converted to columns and parsed with the ELEM code (in this case 1 and 2).

I have tried to split apart the columns, transform them, and bind them back together, but my ability to do so just isn't there yet. I am still fairly new to R, and would really appreciate some help in working towards organizing this data frame.

Thanks in advance,
Scott Hatcher

David Winsemius, MD
West Hartford, CT
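Since the actual data were never posted, the cause of the error cannot be confirmed here. One thing that can be checked without reshape2 is whether any combination of the id columns and ELEM occurs more than once in the melted data, which is the situation Scott suspects. The sketch below uses base R only; the toy rows reuse values shown in the thread, and the duplicated third row is an assumption added purely to illustrate the check.

```r
# Toy melted data in the shape shown in the thread; the third row is a
# deliberate duplicate of the first, added only to illustrate the check
mtst <- data.frame(
  STN_ID = 2402594, YEAR = 1997, MM = c(9, 9, 9, 10),
  ELEM = c(1, 2, 1, 1), variable = "X1",
  value = c("-00233", "-00339", "-00233", "-3"),
  stringsAsFactors = FALSE
)

# Every cell of the wide table is identified by these five columns
key <- mtst[c("STN_ID", "YEAR", "MM", "variable", "ELEM")]

# Flag all rows that share their key with at least one other row
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
mtst[is_dup, ]

# Count rows per cell; anything above 1 means several values compete for one cell
cell_counts <- aggregate(value ~ STN_ID + YEAR + MM + variable + ELEM,
                         data = mtst, FUN = length)
cell_counts[cell_counts$value > 1, ]

# Values stored as factor or character need converting before taking a mean
mtst$value_num <- as.numeric(as.character(mtst$value))
```

If the check returns no rows on the real data, duplicates are not the explanation and a reproducible example via dput(), as David suggests, is the next step.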
Re: [R] Fwd: transpose ?
Hi:

Does this work?

dd <- read.table(textConnection("
C C C C T T G G A A C C G G C C
G G T T A A A A T A T T C C G G
C C C C T T G G A A C C G G C C
"), stringsAsFactors = FALSE)

# Convert the data frame to a character matrix
# To do this, you need to make sure that the variables in
# your data frame are character rather than factor
dm <- as.matrix(dd)
dm   # elements should be quoted if character

# Create an empty list of nrow(dm) components
mm <- vector('list', nrow(dm))

# Create a two-row matrix from each row of dm
for(i in seq_len(nrow(dm))) mm[[i]] <- matrix(dm[i, ], nrow = 2)
mm
[[1]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"
[2,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"

[[2]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "G"  "T"  "A"  "A"  "T"  "T"  "C"  "G"
[2,] "G"  "T"  "A"  "A"  "A"  "T"  "C"  "G"

[[3]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"
[2,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"

HTH,
Dennis

On Wed, May 25, 2011 at 7:19 AM, Mohamed Lajnef mohamed.laj...@inserm.fr wrote:

Dear All,

Sorry for the previous mail, suppose this data.frame D

V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C
G  G  T  T   A   A   A   A   T   A   T   T   C   C   G   G
C  C  C  C   T   T   G   G   A   A   C   C   G   G   C   C

I would translate D as follow ( just for the first line)

C C T G A C G C
C C T G A C G C

(V8 under V7) (V9 under V10) ...

Any help would be appreciated

Regards
M
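A small variation on the same idea, offered as a sketch rather than as Dennis's code: reading with colClasses = "character" guards against read.table() turning a column that happens to contain only T (or only F) into logical values, and lapply() builds the list without preallocating it. The `text =` argument assumes a reasonably recent R.

```r
txt <- "C C C C T T G G A A C C G G C C
G G T T A A A A T A T T C C G G
C C C C T T G G A A C C G G C C"

# Force every column to character so that "T" is never read as TRUE
dd <- read.table(text = txt, colClasses = "character")
dm <- as.matrix(dd)

# One two-row matrix per original row: consecutive pairs fill each column
mm <- lapply(seq_len(nrow(dm)), function(i) matrix(dm[i, ], nrow = 2))
mm[[2]]
```

matrix() fills column by column, which is why each pair (V7, V8), (V9, V10), and so on ends up stacked in one column.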
Re: [R] What does smaller than comparison do on strings?
Hi:

Here are two alternatives that do work as you expect; sprintf() is your friend:

sprintf("%2d", 1:12)
 [1] " 1" " 2" " 3" " 4" " 5" " 6" " 7" " 8" " 9" "10" "11" "12"
sprintf("%02d", 1:12)
 [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
sprintf("%2d", 1:12) < 10
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
sprintf("%02d", 1:12) < 10
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

A leading space or leading 0 on the digits 1-9 'fixes' the problem for the reason Duncan mentioned.

HTH,
Dennis

On Wed, May 25, 2011 at 3:06 AM, Niklaus Kuehnis kuehnik_0...@gmx-topmail.de wrote:

What's the logic behind the following, and where can I find any documentation about it? In particular, why are 2:9 - as characters - not regarded as being smaller than 10?

# R-Code:
a <- as.character(1:12)
a < 10
# [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Thanks in advance!
Niklaus
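To make the mechanism concrete, here is a short sketch (not from the thread) contrasting the character comparison with a numeric one. When a character vector is compared with a number, the number is coerced to a string, so the comparison is made character by character, as described in ?Comparison.

```r
a <- as.character(1:12)

# 10 is coerced to "10": "2" sorts after "10" because "2" > "1"
as_strings <- a < 10

# Converting back to numbers gives the comparison that was intended
as_numbers <- as.numeric(a) < 10

# Zero padding makes string order agree with numeric order
padded <- sprintf("%02d", 1:12) < "10"

rbind(as_strings, as_numbers, padded)
```

If the values are really numbers, converting with as.numeric() is usually the simplest fix; padding is mainly useful when the strings have to stay strings, for example in file names that should sort correctly.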
Re: [R] grep pattern
try this using strsplit:

x <- round(runif(10)*10, digits=0)
y <- as.Date(x, origin="1970-01-01")
str(y)
Class 'Date'  num [1:10] 26551 37212 57285 90821 20168 ...
y1 <- as.character(y)
str(y1)
 chr [1:10] "2042-09-11" "2071-11-19" "2126-11-04" "2218-08-30" "2025-03-21" "2215-12-22" ...
x <- strsplit(y1, '-')
x[1:3]
[[1]]
[1] "2042" "09" "11"

[[2]]
[1] "2071" "11" "19"

[[3]]
[1] "2126" "11" "04"

x.1 <- sapply(x, '[', 3)
str(x.1)
 chr [1:10] "11" "19" "04" "30" "21" "22" "24" "03" "31" "02"

On Tue, May 24, 2011 at 10:19 AM, Kang Min ngokang...@gmail.com wrote:

I have another question - I'd like to extract dates from a vector of yyyy-mm-dd, so I just want the dd.

x <- round(runif(10)*10, digits=0)
y <- as.Date(x, origin="1970-01-01")

I tried this based on the code that Jim provided, but it just printed the whole date. I think I just need to tweak it a little, but haven't been able to figure it out.

y[grep("[[:digit:]]{2}$", y)]

Thanks.
Kang Min

On May 23, 7:22 am, jim holtman jholt...@gmail.com wrote:

If you want to only match names of length 6, you will have to use this pattern:

x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ", "ZZAZ", "ZRITEZ")
# match exactly values of length 6
len6 <- "^Z[[:alpha:]]{4}Z$"
grep(len6, x)
[1] 2 5 9

On Sun, May 22, 2011 at 5:10 PM, Kang Min ngokang...@gmail.com wrote:

Thanks!

On May 21, 7:09 am, David Winsemius dwinsem...@comcast.net wrote:

On May 20, 2011, at 11:57 AM, Kang Min wrote:

Hi all,

I'm trying to subset a pattern in a vector. Each argument has 6 letters, and I need those that start with Z and end with Z. e.g.

x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

I've looked up other discussions but still can't seem to find the answer.

You may need to study the regex page a bit longer: the ^ is the beginning of a string, .+ will match an arbitrarily long string of anything, and $ indicates the end of a string.

x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
grep("^Z.+Z$", x)
[1] 2 5
grep("^Z.+Z$", x, value=TRUE)
[1] "ZFHJKZ" "ZKFLPZ"

Thanks.
Kangmin

David Winsemius, MD West Hartford, CT

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
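Kang Min's grep() attempt returns whole dates because grep() selects the elements that match; it does not extract the matching part. As an alternative to strsplit(), here is a sketch of two other ways to pull out the day, using three of the day counts printed in Jim's str(y) output. Neither approach is taken from the thread.

```r
# Three of the values shown in the thread (days since 1970-01-01)
y <- as.Date(c(26551, 37212, 57285), origin = "1970-01-01")

# Ask the Date object for its day-of-month directly
day_chr <- format(y, "%d")
day_num <- as.integer(day_chr)

# Or delete everything up to and including the last hyphen
day_sub <- sub("^.*-", "", as.character(y))

day_chr
```

format() has the advantage of not depending on how the date happens to be printed.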
Re: [R] matrix Manipulation...
Hello everyone,

I have a 2 x 5 matrix: say

0.2 0.3 1 -1   3
0.2 0.4 5  0.5 -1

I want to replace all the values greater than or equal to 1 with 1 and those less than or equal to 0 with 0. So I should end up with a matrix looking like:

0.2 0.3 1 0   1
0.2 0.4 1 0.5 0

Any ideas how to do this?

--
Thanks,
Jim.
Re: [R] matrix Manipulation...
It's very easy to do in two steps:

testmat <- matrix(c(.2, .3, 1, -1, 3, .2, .4, 5, .5, -1), byrow=TRUE, nrow=2)
testmat
     [,1] [,2] [,3] [,4] [,5]
[1,]  0.2  0.3    1 -1.0    3
[2,]  0.2  0.4    5  0.5   -1
testmat[testmat >= 1] <- 1
testmat[testmat < 0] <- 0
testmat
     [,1] [,2] [,3] [,4] [,5]
[1,]  0.2  0.3    1  0.0    1
[2,]  0.2  0.4    1  0.5    0

This is pretty basic. You might want to read one of the many excellent intro to R guides, especially the subsetting section.

Sarah

On Wed, May 25, 2011 at 2:51 PM, Jim Silverton jim.silver...@gmail.com wrote:

Hello everyone,

I have a 2 x 5 matrix: say

0.2 0.3 1 -1   3
0.2 0.4 5  0.5 -1

I want to replace all the values greater than or equal to 1 with 1 and those less than or equal to 0 with 0. So I should end up with a matrix looking like:

0.2 0.3 1 0   1
0.2 0.4 1 0.5 0

Any ideas how to do this?

--
Thanks,
Jim.

--
Sarah Goslee
http://www.functionaldiversity.org
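The two replacement steps amount to clamping every value to the interval [0, 1]. As a sketch of an equivalent one-liner (not what Sarah posted), pmax() and pmin() do the same thing and keep the matrix dimensions of their first argument:

```r
testmat <- matrix(c(.2, .3, 1, -1, 3, .2, .4, 5, .5, -1), byrow = TRUE, nrow = 2)

# Raise anything below 0 to 0, then lower anything above 1 to 1
clamped <- pmin(pmax(testmat, 0), 1)
clamped
```

Unlike the indexing version, this leaves testmat itself untouched, which can be handy when the original values are still needed.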
Re: [R] Importing fixed-width data
I get a data frame on my end:

lines <- "2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
2011-05-13 00:00:05 EONBHS229 mia13001621NON"

df = read.fwf(textConnection(lines), widths=c(19,-4,7,3,8,2,1,3,1),
   col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
   colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))
df
             DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
2 2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
3 2011-05-13 00:00:05 BHS229   mia 13001621      NO    N   <NA>   <NA>
str(df)
'data.frame': 3 obs. of 8 variables:
 $ DateTime: POSIXct, format: "2011-05-13 00:00:00" "2011-05-13 00:00:01" ...
 $ Flight  : Factor w/ 3 levels "AAL223 ","AAL330 ",..: 2 1 3
 $ Dest    : Factor w/ 3 levels "dfa","laa","mia": 1 2 3
 $ ArrTime : Factor w/ 3 levels "13001621","13002516",..: 2 3 1
 $ MsgType : chr "PS" "AS" "NO"
 $ Conf    : Factor w/ 3 levels ".","C","N": 2 1 3
 $ Runway  : Factor w/ 1 level "NON": 1 1 NA
 $ Source  : Factor w/ 2 levels "A","M": 1 2 NA

sessionInfo()
R version 2.13.0 Patched (2011-04-19 r55523)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets grid methods
[8] base

other attached packages:
[1] gplots_2.8.0 caTools_1.12 bitops_1.0-4.1 gdata_2.8.2
[5] gtools_2.6.2 sos_1.3-0 brew_1.0-6 lattice_0.19-26
[9] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2

loaded via a namespace (and not attached):
[1] tools_2.13.0

Dennis

On Wed, May 25, 2011 at 8:42 AM, James Rome jamesr...@gmail.com wrote:

I have a data set where the lines look like:

2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM

Some lines are missing the field before and after the NON:

2011-05-13 00:00:05 EONBHS229 mia13001621NON

I read them into R using

df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
   col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
   colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))

The documentation for read.fwf says that the data are read into a dataframe. Yet, I get a list, and the conversions I specified do not seem to have been obeyed:

df[1:20,]
             DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
2 2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
. . .
sapply(df, mode)
   DateTime      Flight        Dest     ArrTime     MsgType        Conf
  "numeric"   "numeric"   "numeric"   "numeric" "character"   "numeric"
     Runway      Source
  "numeric"   "numeric"
dfn = df[!is.na(df$Source),]
mode(df)
[1] "list"

What am I doing wrong?

Thanks,
Jim Rome
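A likely source of the confusion, sketched below on the first two sample lines: mode() reports how an object is stored, not what it is. A data frame is stored as a list, and factors and POSIXct values are stored as numbers, so the output above is what a correctly converted data frame looks like. class() and str() are the calls that show the types one usually cares about. This explanation is an addition, not something stated in the thread.

```r
lines <- c("2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA",
           "2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM")

df <- read.fwf(textConnection(lines), widths = c(19, -4, 7, 3, 8, 2, 1, 3, 1),
               col.names = c("DateTime", "Flight", "Dest", "ArrTime",
                             "MsgType", "Conf", "Runway", "Source"),
               colClasses = c("POSIXct", NA, "factor", "factor",
                              "character", "factor", "factor", "factor"))

mode(df)            # storage mode of the container: "list"
class(df)           # what the object is: "data.frame"
sapply(df, mode)    # factors and date-times are numbers underneath
sapply(df, function(col) class(col)[1])   # the types that were requested
```

So the conversions were obeyed; only the inspection tool was misleading.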
[R] Job opening at Harvard Business School
Dear Colleagues, I'd like to draw your attention to the following job available at the Harvard Business School. We are looking for a candidate with strong statistical/econometrical background, with strong programming skills in R or Stata. Please apply through the following link. If you know someone with this qualification, please forward the email. Thanks! Job Number: 23803BR Statistician/Analyst Harvard Business School Boston, Massachusetts Application: http://jobs.brassring.com/1033/asp/tg/cim_jobdetail.asp?partnerID=25240siteID=5341AReq=23803br Duties Responsibilities Reporting to Director of Research Computing Services, works directly with faculty and other RCS staff in support of research-related projects. Provides advanced statistical consultation for faculty researchers and doctoral students. Maintains expertise in new research methodologies and techniques. Provides design and statistical consultation for researchers, as well as primary support and training for two or more statistical software packages (e.g., R, STATA, MATLAB, SAS, Mathematica). Employs methodological approaches such as multinomial logit and similar models, time-series analysis, random effects models, survival analysis, text analysis, and other appropriate tools. Manages and manipulates data using packages such as Python, mySQL, and other tools. Produces results as reports, presentations, graphics, web sites. Explores and tests statistical software. Develops statistical and technical documents for the RCS web site. Basic Qualifications Advanced degree in quantitative field required. 3+ years statistical/programming experience in research based setting; mathematics background; broad training and good habits in data management and analysis; expertise with multiple statistical software packages, including R, Stata, MatLab, SAS, or Mathematica; problem solving skills, organizational ability, communication skills, initiative. Strong customer service orientation. 
Ability to work independently and on a team. Demonstrated ability and desire to develop and maintain expertise in emerging research methods and technologies. Additional Qualifications Ph.D. preferred. Desired abilities include experience in Linux-based parallel processing computing environments; business-related research experience; experience with large data sets; familiarity with computer programming languages, such as Python or C++. Chase H. Harrison Director, Research Computing Services Principal Survey Methodologist Harvard Business School Baker Library | Bloomberg Center B-93 Soldiers Field Rd. Boston, MA 02163 617.495.6100 (Main) 617.496.6252 (Direct) 617.495.5287 (FAX) charri...@hbs.edu
Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset
At 00:41 25/05/2011, Greg Snow wrote:

The only statistical method that I know of that can be applied to any dataset without further definition of the nature of the data or the question being asked is SnowsCorrectlySizedButOtherwiseUselessTestOfAnything which is found in the TeachingDemos package for R.

Greg, have you overlooked the intra-ocular trauma test?

However this test is not common (for a couple of very good reasons). If you want a more useful method you first need to decide on what your question is that you want answered and have some more detail about the dataset.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ramnath R
Sent: Monday, May 23, 2011 12:12 PM
To: r-help@r-project.org
Subject: [R] What are the common Standard Statistical methods used for the analysis of a dataset

Hi,

Anybody know what are the common Standard statistical methods used for the analysis of a dataset, and anybody know which of these methods give similar results

Ram

Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html
[R] Accessing elements of a list
I have a list that is made of lists of varying length. I wish to create a new vector that contains the last element of each list. So far I have used sapply to determine the length of each list, but I'm stymied at the part where I index the list to make a new vector containing only the last item of each list

mylist = list(c(1,2,3),c("cat","dog"),c("x","y","z","zz")) # Create list
last <- sapply(mylist,length) # Make vector with list lengths
last_only <- mylist[[1:length(mylist)]][last] # Crash and burn trying to make new vector with last items!

How do I do this last step?

Dr. Seth W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
sbige...@fs.fed.us / ph. 530 759 1718
Re: [R] Accessing elements of a list
On May 25, 2011, at 3:25 PM, Seth W Bigelow wrote:

I have a list that is made of lists of varying length. I wish to create a new vector that contains the last element of each list. So far I have used sapply to determine the length of each list, but I'm stymied at the part where I index the list to make a new vector containing only the last item of each list

mylist = list(c(1,2,3),c("cat","dog"),c("x","y","z","zz")) # Create list
last <- sapply(mylist,length) # Make vector with list lengths
last_only <- mylist[[1:length(mylist)]][last] # Crash and burn trying to make new vector with last items!

How do I do this last step?

lapply(mylist, tail, 1)
[[1]]
[1] 3

[[2]]
[1] "dog"

[[3]]
[1] "zz"

unlist(lapply(mylist, tail, 1))
[1] "3"   "dog" "zz"

Dr. Seth W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
sbige...@fs.fed.us / ph. 530 759 1718

David Winsemius, MD
West Hartford, CT
Re: [R] Accessing elements of a list
On May 25, 2011, at 2:25 PM, Seth W Bigelow wrote:

I have a list that is made of lists of varying length. I wish to create a new vector that contains the last element of each list. So far I have used sapply to determine the length of each list, but I'm stymied at the part where I index the list to make a new vector containing only the last item of each list

mylist = list(c(1,2,3),c("cat","dog"),c("x","y","z","zz")) # Create list
last <- sapply(mylist,length) # Make vector with list lengths
last_only <- mylist[[1:length(mylist)]][last] # Crash and burn trying to make new vector with last items!

How do I do this last step?

See ?tail

lapply(mylist, tail, 1)
[[1]]
[1] 3

[[2]]
[1] "dog"

[[3]]
[1] "zz"

You can't actually create a vector, since your list contains both numeric and alpha data types and a vector can only contain a single data type. The 3 would be coerced to "3" (a character "3", not the number 3).

If your actual data contains the same type in each element, replace lapply() above with sapply() and that will return a vector.

HTH,
Marc Schwartz
Re: [R] Accessing elements of a list
On May 25, 2011, at 3:25 PM, Seth W Bigelow wrote:

I have a list that is made of lists of varying length. I wish to create a new vector that contains the last element of each list. So far I have used sapply to determine the length of each list, but I'm stymied at the part where I index the list to make a new vector containing only the last item of each list

mylist = list(c(1,2,3),c("cat","dog"),c("x","y","z","zz")) # Create list
last <- sapply(mylist,length) # Make vector with list lengths
last_only <- mylist[[1:length(mylist)]][last] # Crash and burn trying to make new vector with last items!

If you wanted to apply the successive values of last using "[" to successive values of mylist, there is a list-ish method via mapply:

mapply("[", mylist, last)
[1] "3"   "dog" "zz"

`mapply` is also the function underlying `Vectorize`

--
David Winsemius, MD
West Hartford, CT
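The original attempt fails because `[[` with a vector of indices does recursive indexing (element 2 of element 1, and so on) rather than visiting each list element in turn. As a sketch that pulls the replies together, the version below indexes each element by its own length and shows both a type-preserving list result and an explicitly character vector result. The helper names are made up for this example.

```r
mylist <- list(c(1, 2, 3), c("cat", "dog"), c("x", "y", "z", "zz"))

# Keep each last element in its own type: the result is a list
last_list <- lapply(mylist, function(v) v[length(v)])

# Or state up front that a character vector is wanted;
# vapply() checks that every result is a single string
last_chr <- vapply(mylist, function(v) as.character(v[length(v)]), character(1))

last_chr
```

vapply() is a little stricter than sapply(): it fails loudly if some element does not yield exactly one string, which avoids surprises with empty list elements.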
Re: [R] Subtracting rows by id
Hi:

Interesting problem. Here's one approach:

library(plyr)

# Read in your datasets as data frames rather than matrices
dataset1 <- data.frame(id1 = rep(1:3, each = 10), value1 = sample(seq_len(100), 30, replace = TRUE))
dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

# The idea is to use the rows of dataset2 as parameters for
# subsetting and removing the first n_i rows. The tail() function
# serves the purpose:
foo <- function(id2, subtract.value) tail(subset(dataset1, id1 == id2), -subtract.value)

# Use the mdply function in the plyr package:
mdply(dataset2, foo)[, -(1:2)]
   id1 value1
1    1      2
2    1     55
3    1     18
4    1      4
5    1      3
6    1     76
7    1     74
8    1     21
9    1     97
10   2     19
11   2     49
12   2     20
13   2     73
14   2     79
15   2     95
16   2     52
17   3     60
18   3     58
19   3     68
20   3     59
21   3     13

HTH,
Dennis

On Wed, May 25, 2011 at 9:55 AM, Sara Maxwell smaxw...@ucsc.edu wrote:

Dear R users,

I have two datasets:

id1 <- c(rep(1,10), rep(2,10), rep(3,10))
value1 <- sample(1:100, 30, replace=TRUE)
dataset1 <- cbind(id1,value1)

id2 <- c(1,2,3)
subtract.value <- c(1,3,5)
dataset2 <- cbind(id2, subtract.value)

I want to subtract the number of rows in the subtract.value that corresponds to the id value in dataset1. So for the 1 in id1, I want to remove the first row, for 2 in id1 I want to remove the first 3 rows, for 3 in id1 I want to remove the first 5 rows, finally creating a new dataframe with the remaining values.

I am having trouble structuring a loop that can do this by the unique ids in the first dataset while matching the ids in the datasets. Any thoughts would be greatly appreciated.

Thank you,
Sara
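For anyone without plyr installed, the same result can be sketched in base R by numbering the rows within each id and keeping only those past the cut-off. This is an alternative to Dennis's approach, not a restatement of it; set.seed() is added here only so the example is repeatable, and it assumes the rows are already in the order in which they should be dropped.

```r
set.seed(1)  # only for repeatability of this example
dataset1 <- data.frame(id1 = rep(1:3, each = 10),
                       value1 = sample(seq_len(100), 30, replace = TRUE))
dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

# Look up, for every row of dataset1, how many leading rows its id should lose
n_drop <- dataset2$subtract.value[match(dataset1$id1, dataset2$id2)]

# Position of each row within its own id group: 1, 2, 3, ...
pos <- ave(seq_along(dataset1$id1), dataset1$id1, FUN = seq_along)

# Keep only the rows that come after the dropped ones
trimmed <- dataset1[pos > n_drop, ]
table(trimmed$id1)
```

Because everything is vectorised, no explicit loop over ids is needed.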
Re: [R] What are the common Standard Statistical methods used fo
[See in-line below]

On 25-May-11 19:14:11, Michael Dewey wrote:

At 00:41 25/05/2011, Greg Snow wrote:

The only statistical method that I know of that can be applied to any dataset without further definition of the nature of the data or the question being asked is SnowsCorrectlySizedButOtherwiseUselessTestOfAnything which is found in the TeachingDemos package for R.

Greg, have you overlooked the intra-ocular trauma test?

No, Greg has not overlooked it. He invented it. However, he never published it, preferring to communicate it by causing others to feel its impact whenever he writes anything.

Ted.

However this test is not common (for a couple of very good reasons). If you want a more useful method you first need to decide on what your question is that you want answered and have some more detail about the dataset.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ramnath R
Sent: Monday, May 23, 2011 12:12 PM
To: r-help@r-project.org
Subject: [R] What are the common Standard Statistical methods used for the analysis of a dataset

Hi,

Anybody know what are the common Standard statistical methods used for the analysis of a dataset, and anybody know which of these methods give similar results

Ram

Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html
E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 25-May-11 Time: 21:15:36 -- XFMail --
Re: [R] Processing large datasets
Date: Wed, 25 May 2011 12:32:37 -0400
Subject: Re: [R] Processing large datasets
From: mailinglist.honey...@gmail.com
To: marchy...@hotmail.com
CC: ro...@bestroman.com; r-help@r-project.org

Hi,

On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka wrote:

[snip]

If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN:
http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it does paging and all the needed buffering for arbitrary size data. Does it work with everything?

I'm not sure what limitations ... I know the bigmemory (and ff) packages try hard to make using out-of-memory datasets as transparent as possible. That having been said, I guess you will have to port more advanced methods to use such packages, hence the existence of the biglm, biganalytics, and bigtabulate packages.

I seem to recall bigmemory came up before in this context and there was some problem.

Well -- I don't often see emails on this list complaining about their functionality. That doesn't mean they're flawless (I also don't scrutinize the list traffic too closely). It could be that not too many people use them, or that people give up before they come knocking when there is a problem. Has something specifically failed for you in the past, or?

No, I haven't tried. I may have it confused with something else. But this question does come up a bit, usually related to "I tried to read huge file into data frame and wanted to pass it to something with predictable memory access patterns and it ran out of memory. What can I do?" I guess I also stopped reading anything after "using a DB", as this is generally not a replacement for a data structure. I'll take a look when I have a big dataset that I can't condense easily.
-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
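When the computation has a predictable, sequential access pattern, as in the scenario Mike describes, one option that needs no extra package is to read the file in chunks through an open connection and keep only running totals in memory. The sketch below is an illustration in base R, not something proposed in the thread; the temporary file, the chunk size of 100 lines and the single-column layout are all assumptions made for the example.

```r
# Write a small stand-in for a "huge" file (hypothetical data)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:1000), tmp, row.names = FALSE)

con <- file(tmp, open = "r")
header <- readLines(con, n = 1)   # consume the header line once

total <- 0
n <- 0
repeat {
  chunk <- readLines(con, n = 100)   # next 100 lines; fewer at the end
  if (length(chunk) == 0) break      # nothing left to read
  vals <- as.numeric(chunk)
  total <- total + sum(vals)         # only the running sums stay in memory
  n <- n + length(vals)
}
close(con)

chunk_mean <- total / n
chunk_mean
```

This only works for statistics that can be accumulated piece by piece; methods that need random access to the whole dataset are where packages such as bigmemory or ff come in.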
Re: [R] Data Frame housekeeping
Hello Dr. Winsemius,
First of all, thank you for your prompt and helpful reply. Also, for providing something I hoped would come from joining this mailing list: a means of discovering incredibly useful packages such as the reshape2 one you have introduced me to.
I have a follow-up question to your solution (which should produce exactly what I need): when I run the cast function to reassemble the data frame I get:
    Error in names(data) <- array_names(res$labels[[2]]) :
      'names' attribute [7] must be the same length as the vector [1]
This signaled to me that the function was returning 7 values where it expected only 1. To test this I applied a summary function (mean) to the cast, and the result processed (however, it only produced NAs because my values were of class factor). What I don't understand is where these multiple values are coming from; there should be only a single value corresponding to the 4 id.vars given in the cast function (STN_ID, YEAR, MM, variable).
Thanks again for your help,
Scott Hatcher
On 24/05/2011 5:16 PM, David Winsemius wrote:
On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:
Hello, I have a large data frame that is organized by date in a peculiar way. I am seeking advice on how to transform the data into a format that is of more use to me. The data is organized as follows:
       STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
    1 2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
    2 2402594 1997 10    1       -3       -5       -1 -00039 -00031 -00036 -00033
    3 2402594 1997 11    1       25       65       70     69 000115     72     93
Where MM is the month of the year, and ELEM is the variable that the values in the X* columns describe (in the actual data there are 31 X columns, one for each day of the month). The values in bold are the values that are transferred into the small chart below (which is the result I hope to get). This is to give a sense of how the data is picked out of the original data frame.
assuming this dataframe is named 'tst':
    require(reshape2)
    mtst <- melt(tst[, 1:7], id.vars=1:4)  # Only select idvars and X1:X3
    str(mtst)
    #--
    'data.frame': 54 obs. of 6 variables:
     $ STN_ID  : num 2402594 2402594 2402594 2402594 2402594 ...
     $ YEAR    : num 1997 1997 1997 1997 1998 ...
     $ MM      : num 9 10 11 12 1 2 3 4 5 9 ...
     $ ELEM    : num 1 1 1 1 1 1 1 1 1 2 ...
     $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
     $ value   : chr "-00233" "-3" "25" "000160" ...
    dcast(mtst, STN_ID + YEAR + MM + variable ~ ELEM)
    #-
       STN_ID YEAR MM variable      1      2
    1 2402594 1997  9       X1 -00233 -00339
    2 2402594 1997  9       X2 -00204 -00339
    3 2402594 1997  9       X3 -00119 -00343
    4 2402594 1997 10       X1     -3 -00207
    5 2402594 1997 10       X2     -5 -00289
    6 2402594 1997 10       X3     -1 -00278
    7 2402594 1997 11       X1     25 -00242
    snipped output
I would like to organize the data so it looks like this:
       STN_ID YEAR MM DAY  ELEM1  ELEM2
    1 2402594 1997  9  X1 -00233 -00339
    2 2402594 1997  9  X2 -00204     77
    3 2402594 1997  9  X3 -00119     30
Where is that second column coming from? I don't see it in the data example.
Such that I create a new column named DAY that is made up of the numbers following X in the original data.frame columns. Also, the ELEM values are converted to columns and parsed with the ELEM code (in this case 1 and 2). I have tried to split apart the columns, transform them, and bind them back together, but my ability to do so just isn't there yet. I am still fairly new to R, and would really appreciate some help in working towards organizing this data frame.
Thanks in advance,
Scott Hatcher
David Winsemius, MD
West Hartford, CT
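For readers without reshape2, the same wide-to-long-to-wide reshaping can be sketched in base R. This is only an illustration and not the melt/dcast approach from the reply above: the data frame below is a hand-typed excerpt of the values shown in the thread (September and October 1997, days X1 to X3), and the names `day_cols`, `id_cols`, `long`, `elem1`, `elem2` and `wide` are made up for the example. It also assumes each station, year, month, day and ELEM combination occurs only once.

```r
# Hand-typed excerpt of the thread's data; values stay character strings,
# as in the str() output quoted above.
tst <- data.frame(
  STN_ID = 2402594, YEAR = 1997,
  MM   = c(9, 10, 9, 10),
  ELEM = c(1, 1, 2, 2),
  X1 = c("-00233", "-3", "-00339", "-00207"),
  X2 = c("-00204", "-5", "-00339", "-00289"),
  X3 = c("-00119", "-1", "-00343", "-00278"),
  stringsAsFactors = FALSE
)

day_cols <- c("X1", "X2", "X3")
id_cols  <- c("STN_ID", "YEAR", "MM", "ELEM")

# Long format: repeat the id columns once per day column and stack the values.
long <- data.frame(
  tst[rep(seq_len(nrow(tst)), times = length(day_cols)), id_cols],
  DAY   = rep(seq_along(day_cols), each = nrow(tst)),
  value = as.numeric(unlist(tst[day_cols], use.names = FALSE))
)

# Wide format: one value column per ELEM code, joined on the remaining ids.
by_cols <- c("STN_ID", "YEAR", "MM", "DAY")
elem1 <- long[long$ELEM == 1, c(by_cols, "value")]
elem2 <- long[long$ELEM == 2, c(by_cols, "value")]
names(elem1)[names(elem1) == "value"] <- "ELEM1"
names(elem2)[names(elem2) == "value"] <- "ELEM2"

wide <- merge(elem1, elem2, by = by_cols)
wide <- wide[order(wide$MM, wide$DAY), ]  # explicit ordering by month and day
print(wide)
```

Converting the values with as.numeric() before reshaping also avoids the situation described in the follow-up, where mean() returned only NAs because the values were factors.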
[R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution
Hi, can anyone help me figure out how to compute the percentile of an individual observation with respect to a reference distribution? What I mean is: let's assume I have a vector consisting of 10 numbers {3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the number 4.9 corresponds to. I failed to find any reference to such a function, although I would assume this must frequently be necessary. Thanks in advance for your help. /Rudi
Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution
On May 25, 2011, at 3:42 PM, rudi wrote:
Hi, can anyone help me figure out how to compute the percentile of an individual observation with respect to a reference distribution? What I mean is: let's assume I have a vector consisting of 10 numbers {3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the number 4.9 corresponds to. I failed to find any reference to such a function, although I would assume this must frequently be necessary.
    ?quantile
Talking about percentiles when you only have 10 numbers seems rather misleading, don't you think?
--
David Winsemius, MD
West Hartford, CT
Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution
Hi Rudi,
Take a look at
    ?ecdf
HTH,
Jorge
On Wed, May 25, 2011 at 3:42 PM, rudi wrote:
Hi, can anyone help me figure out how to compute the percentile of an individual observation with respect to a reference distribution? What I mean is: let's assume I have a vector consisting of 10 numbers {3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the number 4.9 corresponds to. I failed to find any reference to such a function, although I would assume this must frequently be necessary. Thanks in advance for your help. /Rudi
Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset
How can anyone overlook the intra-ocular trauma test (or sometimes called the inter-ocular concussion test). But the i-o trauma test needs either a small data set or an appropriate graph of the data (or can you look at a dataset of a hundred columns and a million rows and do an intra-ocular trauma test?). We were not told the size of the dataset or enough information to know what type of graph to make. You do make a good point though that with minimal additional information the intra-ocular trauma test can be useful (well if it is significant, there are many datasets that fail the intra-ocular trauma test, but still yield interesting results after careful study). And for any dataset that has a significant intra-ocular trauma test result, that should trump the results of SnowsCorrectlySizedButOtherwiseUselessTestOfAnything. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: Michael Dewey [mailto:i...@aghmed.fsnet.co.uk] Sent: Wednesday, May 25, 2011 1:14 PM To: Greg Snow; Ramnath R; r-help@r-project.org Subject: Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset At 00:41 25/05/2011, Greg Snow wrote: The only statistical method that I know of that can be applied to any dataset without further definition of the nature of the data or the question being asked is SnowsCorrectlySizedButOtherwiseUselessTestOfAnything which is found in the TeachingDemos package for R. Greg, have you overlooked the intra-ocular trauma test? However this test is not common (for a couple of very good reasons). If you want a more useful method you first need to decide on what your question is that you want answered and have some more detail about the dataset. 
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ramnath R
Sent: Monday, May 23, 2011 12:12 PM
To: r-help@r-project.org
Subject: [R] What are the common Standard Statistical methods used for the analysis of a dataset
Hi, does anybody know what the common standard statistical methods used for the analysis of a dataset are, and which of these methods give similar results?
Ram
Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html
Re: [R] Fwd: Opening R in 64-bit version by default
I have no problems configuring .r files to start in Emacs or RStudio and then using Emacs or RStudio to call the required version of R. You might check, when you use "Open with" from Windows Explorer, that the check box "Always open with this program" is ticked. If you are using Windows 7 you can set and change default programs as follows:
1. Open Control Panel.
2. Click on Programs.
3. Click Default Programs and follow the options to set the required defaults.
I have never used Vista and don't know if this works in Vista.
Best regards
John
On 25 May 2011 01:55, Michael Sumner mdsum...@gmail.com wrote:
When you installed R there should be shortcuts on your desktop, or under /R/ in the Start menu, unless you opted for the installation not to create those. Click (or double-click) the one that has a name like "R x64 2.13.0"; the x64 indicates that the shortcut is for the 64-bit R. You won't have this if you opted not to install the 64-bit R components. Use that shortcut every time to start R, and when it's running right-click the task bar item and click "Pin this program to taskbar" to make it super accessible. If you have R older than 2.12.0 then the 32-bit and 64-bit installers are separate, but you don't specify your version and you should use the latest in any case. If you have shortcuts for 32-bit R, or other versions, then you'll need to clean up or organize them in whatever way works best for you.
Cheers, Mike.
On Wed, May 25, 2011 at 5:23 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
On 24/05/2011 1:27 PM, Josh Browning wrote:
Oh, of course, sorry. I'm running Windows 7. Thanks!
Your question is probably a question for Microsoft. Why doesn't whatever you did work? Someone here might be able to help if you describe what you did. I just tried Open with... and selected Rgui.exe from the bin/x64 directory, and that failed. A couple of other things I tried worked:
1. Edit the registry key HKEY_CLASSES_ROOT\RWorkspace\shell\open\command
2. Rename the bin/x64/Rgui.exe file to something else, and ask to open with that.
Duncan Murdoch
Josh
-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Tuesday, May 24, 2011 11:25 AM
To: Josh Browning
Cc: r-help@r-project.org
Subject: Re: [R] Opening R in 64-bit version by default
On May 24, 2011, at 11:03 AM, Josh Browning wrote:
Hi Everyone, This may be a dumb question, but I can't seem to figure it out. I have 32 and 64 bit versions of R installed on my machine, and I'd really like the 64-bit version to be the default (i.e. what opens when I open up a workspace). I've tried right-clicking on the workspace and setting the default option as the 64 bit version, but it still opens the workspace in 32-bit. Am I missing something here? Any help would be greatly appreciated!
Shirley, you don't expect us to read your mind. OS?
--
David Winsemius, MD
West Hartford, CT
--
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com
--
John C Frain
Economics Department
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:fra...@tcd.ie
mailto:fra...@gmail.com
Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset
Dear all,
may I suggest the acronym IOTT for the inter-ocular trauma test? Now we just need someone to implement iot.test(). I assume it will appear on CRAN within the next 24 hours.
Looking forward to yet another base package,
Stephan
On 25.05.2011 23:36, Greg Snow wrote:
How can anyone overlook the intra-ocular trauma test (or sometimes called the inter-ocular concussion test). But the i-o trauma test needs either a small data set or an appropriate graph of the data (or can you look at a dataset of a hundred columns and a million rows and do an intra-ocular trauma test?). We were not told the size of the dataset or enough information to know what type of graph to make. You do make a good point though that with minimal additional information the intra-ocular trauma test can be useful (well if it is significant, there are many datasets that fail the intra-ocular trauma test, but still yield interesting results after careful study). And for any dataset that has a significant intra-ocular trauma test result, that should trump the results of SnowsCorrectlySizedButOtherwiseUselessTestOfAnything.
Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution
Hi:
On Wed, May 25, 2011 at 12:42 PM, rudi rudi.stras...@gmail.com wrote:
Hi, can anyone help me figure out how to compute the percentile of an individual observation with respect to a reference distribution? What I mean is: let's assume I have a vector consisting of 10 numbers {3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the number 4.9 corresponds to. I failed to find any reference to such a function, although I would assume this must frequently be necessary.
The simple answer is, I believe,
    x <- c(3,5,8,1,9,5,4,3,5.5,7)
    plot(ecdf(x))
    sum(x <= 4.9)/length(x)
    [1] 0.4
This would correspond to the empirical cumulative distribution function (ecdf) to which Jorge alluded.
HTH,
Dennis
Thanks in advance for your help. /Rudi
Re: [R] Fwd: Opening R in 64-bit version by default
On 25/05/2011 5:43 PM, John C Frain wrote:
I have no problems configuring .r files to start in Emacs or RStudio and then using Emacs or RStudio to call the required version of R. You might check, when you use "Open with" from Windows Explorer, that the check box "Always open with this program" is ticked. If you are using Windows 7 you can set and change default programs as follows:
1. Open Control Panel.
2. Click on Programs.
3. Click Default Programs and follow the options to set the required defaults.
I have never used Vista and don't know if this works in Vista.
I suspect the latter method will fail, since it's using the same tools as Open with... uses, and that appears to be buggy. But I'm on 32 bit XP right now, so I can't verify.
Duncan Murdoch
Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution
On May 25, 2011, at 5:50 PM, Dennis Murphy wrote:
Hi:
On Wed, May 25, 2011 at 12:42 PM, rudi rudi.stras...@gmail.com wrote:
Hi, can anyone help me figure out how to compute the percentile of an individual observation with respect to a reference distribution? What I mean is: let's assume I have a vector consisting of 10 numbers {3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the number 4.9 corresponds to. I failed to find any reference to such a function, although I would assume this must frequently be necessary.
The simple answer is, I believe,
    x <- c(3,5,8,1,9,5,4,3,5.5,7)
Try instead:
    ecdf(x)(4.9)
    [1] 0.4
ecdf returns a function, so why not use it as such? It is also linked from the quantile help page, where it is called the inverse of quantile. -- David.
    plot(ecdf(x))
    sum(x <= 4.9)/length(x)
    [1] 0.4
(Somewhat more complicated than necessary.)
This would correspond to the empirical cumulative distribution function (ecdf) to which Jorge alluded.
HTH,
Dennis
Thanks in advance for your help.
David Winsemius, MD
West Hartford, CT
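The two suggestions in this thread can be put side by side in a short sketch. It uses only base R (`ecdf()` lives in the stats package, which is attached by default); the variable names `Fn`, `p`, `p_manual` and `p_many` are illustrative and not from the thread.

```r
x <- c(3, 5, 8, 1, 9, 5, 4, 3, 5.5, 7)

# ecdf() returns a function; calling it gives the proportion of x <= q.
Fn <- ecdf(x)
p  <- Fn(4.9)

# The same proportion computed by hand, as in Dennis's reply.
p_manual <- mean(x <= 4.9)

# The returned function is vectorised, so several observations can be
# looked up at once.
p_many <- Fn(c(1, 4.9, 9))

print(p)
print(p_manual)
print(p_many)
```

Four of the ten values (1, 3, 3 and 4) are at or below 4.9, so both computations give 0.4, matching the output quoted above. Multiply by 100 if a percentile on the 0 to 100 scale is wanted.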
Re: [R] Subtracting rows by id
That worked perfectly. Thank you Dennis - I very much appreciate the help!
Sara Maxwell, PhD, Postdoctoral Fellow
Marine Conservation Institute
University of California Santa Cruz Long Marine Laboratory
100 Shaffer Road, Santa Cruz CA 95060 USA
+1 206 355 3249
sara.maxw...@marine-conservation.org
www.Marine-Conservation.org
On May 25, 2011, at 12:58 PM, Dennis Murphy wrote:
Hi: Interesting problem. Here's one approach:
    library(plyr)
    # Read in your datasets as data frames rather than matrices
    dataset1 <- data.frame(id1 = rep(1:3, each = 10),
                           value1 = sample(seq_len(100), 30, replace = TRUE))
    dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))
    # The idea is to use the rows of dataset2 as parameters for
    # subsetting and removing the first n_i rows. The tail() function
    # serves the purpose:
    foo <- function(id2, subtract.value)
        tail(subset(dataset1, id1 == id2), -subtract.value)
    # Use the mdply function in the plyr package:
    mdply(dataset2, foo)[, -(1:2)]
       id1 value1
    1    1      2
    2    1     55
    3    1     18
    4    1      4
    5    1      3
    6    1     76
    7    1     74
    8    1     21
    9    1     97
    10   2     19
    11   2     49
    12   2     20
    13   2     73
    14   2     79
    15   2     95
    16   2     52
    17   3     60
    18   3     58
    19   3     68
    20   3     59
    21   3     13
HTH,
Dennis
On Wed, May 25, 2011 at 9:55 AM, Sara Maxwell smaxw...@ucsc.edu wrote:
Dear R users, I have two datasets:
    id1 <- c(rep(1,10), rep(2,10), rep(3,10))
    value1 <- sample(1:100, 30, replace=TRUE)
    dataset1 <- cbind(id1, value1)
    id2 <- c(1,2,3)
    subtract.value <- c(1,3,5)
    dataset2 <- cbind(id2, subtract.value)
For each id in dataset1, I want to remove the number of rows given by the matching subtract.value in dataset2. So for the 1 in id1, I want to remove the first row, for 2 in id1 I want to remove the first 3 rows, for 3 in id1 I want to remove the first 5 rows, finally creating a new dataframe with the remaining values. I am having trouble structuring a loop that can do this by the unique ids in the first dataset while matching the ids in the datasets. Any thoughts would be greatly appreciated.
Thank you, Sara
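The same result can also be sketched without plyr. This is a base R alternative, not the approach Dennis posted; it replaces the random `sample()` values with `seq_len(30)` so that the outcome is reproducible, and the helper names `pos`, `n_drop` and `result` are made up for the illustration.

```r
# Fixed values instead of sample() so the result is reproducible.
dataset1 <- data.frame(id1 = rep(1:3, each = 10), value1 = seq_len(30))
dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

# Position of each row within its id group: 1, 2, ..., 10 for every id.
pos <- ave(seq_len(nrow(dataset1)), dataset1$id1, FUN = seq_along)

# Number of leading rows to drop for each row, looked up by id.
n_drop <- dataset2$subtract.value[match(dataset1$id1, dataset2$id2)]

# Keep only rows that come after the dropped ones in their group.
result <- dataset1[pos > n_drop, ]
print(table(result$id1))
```

With 10 rows per id and 1, 3 and 5 rows dropped, 9, 7 and 5 rows remain, 21 in total, which agrees with the 21 rows in the output quoted above.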
Re: [R] panel.first problem when plotting with formula
On May 25, 2011, at 5:56 PM, Gene Leynes wrote:
David, Peter (and others),
If you're interested, I submitted this as a bug, and was informed of the error of my ways by Professor Ripley.
* His informative reply is copied below.
* The short answer is that panel.first is not a documented argument of plot.formula, which is called by the generic plot.
Apparently not the first time he has been called upon to do so. Here is a similar question, albeit with no answer (at least in Baron's archive) at that time: http://finzi.psych.upenn.edu/Rhelp10/2009-September/210328.html (... the link to the ancient bug is broken.)
But plot.formula promises to pass ... arguments to later hand-offs, and apparently it munges up the 'dots' in a manner that plot.data.frame does not. In fact, plot.formula gets handed back to generic `plot`. Prof Ripley obviously has an understanding of the term `expression` that surpasses mine. Does your understanding of his reply extend to explaining why plot.data.frame works with our naive invocation of panel.first while his suggested syntax does not?
    plot(dat, panel.first=quote( bgfun() ) )  # Fails.
    plot(dat, panel.first= bgfun() )          # Succeeds.
So it still appears there is a demonstrable degree of inconsistency, even if there is no bug.
The solution gives me some insight into how the lazy evaluation works.
    ## Note: It's still not a documented use of the function!
    plot(y ~ x, data=dat, panel.first=quote(bgfun()))
On Wed, May 25, 2011 at 2:13 AM, r-b...@r-project.org wrote:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14591
Brian Ripley rip...@stats.ox.ac.uk changed:
               What    |Removed    |Added
    Status             |NEW        |CLOSED
    Resolution         |           |INVALID
--- Comment #1 from Brian Ripley rip...@stats.ox.ac.uk 2011-05-25 03:13:34 EDT ---
panel.first is not a documented argument to plot.formula: please do read the help.
Yes, I did read the help page. I also looked at the code (of plot.formula, plot.data.frame, and plot.default) and made a good faith effort at following the flow of data through that code by inserting print and str statements at what appeared to be critical points so I could see where plot.formula was going and what it was being given to work with.
It is a documented argument to plot.default(), as
    panel.first: an expression to be evaluated after the plot axes are set
                    ^^^^^^^^^^
but you passed an evaluated function call. It first ran bgfun() and then the plot call. It worked for plot.default() by lazy evaluation.
I also tried using just panel.first=bgfun as I would have with lattice calls, and it did not succeed in any application.
You needed
    plot(y ~ x, data=dat, panel.first=quote(bgfun()))
--
David Winsemius, MD
West Hartford, CT
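The ordering problem described in the bug reply can be imitated without any graphics. The sketch below is only an illustration of lazy versus early evaluation: `draw()` and `draw_formula()` are hypothetical stand-ins for plot.default() and plot.formula(), not their real code, and it assumes (as the reply states) that the formula method evaluates its extra arguments before the plot is set up. The helpers `record()` and `run()` are likewise invented for the example.

```r
events <- character()
record <- function(msg) { events <<- c(events, msg); invisible(NULL) }

# Stand-in for plot.default(): panel.first is a promise, forced only
# after the "axes" step.
draw <- function(x, panel.first = NULL) {
  record("axes set up")
  panel.first                      # lazy argument is evaluated here
  record("points drawn")
  invisible(x)
}

# Stand-in for plot.formula(): evaluates the ... arguments up front,
# then passes the results on with do.call().
draw_formula <- function(x, ...) {
  caller <- parent.frame()
  dots <- lapply(as.list(substitute(list(...)))[-1], eval, envir = caller)
  do.call(draw, c(list(x), dots))
}

# Helper: clear the log, run one call, return the order of events.
run <- function(expr) { events <<- character(); expr; events }

lazy_direct <- run(draw(1:3, panel.first = record("background")))
eager_dots  <- run(draw_formula(1:3, panel.first = record("background")))
quoted_dots <- run(draw_formula(1:3, panel.first = quote(record("background"))))

print(lazy_direct)
print(eager_dots)
print(quoted_dots)
```

In the direct call the background is recorded between the two drawing steps. Through the eager wrapper it is recorded before anything else, which corresponds to bgfun() running before the plot exists. Wrapping the call in quote() means the early evaluation only yields the unevaluated call, which do.call() then hands to `draw()` as a lazy argument again, so the original order is restored.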