[R] Data generation
I want to generate a 20 x 30 data matrix of normally distributed values with mean 3 and standard deviation 1. Please help.

Partha

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: repeated measures MANOVA with interaction
If I have a matrix x:

x
slug surgery swat prey predator
1 122 2 91
2 240 8 115
3 348 3 110

slug = individual, tested in each of the swat, prey and predator odour treatments
surgery = surgical treatment applied to the slug
values in the swat, prey and predator columns = average heading of the slug

There are 2 treatments: surgery and odour (swat, prey, predator).

How do I test (with appropriate code) a repeated measures MANOVA with an interaction between surgery and odour (swat, prey, predator)? Do I need to re-arrange the matrix?

The solution might be:

y <- cbind(swat,prey,predator)
fit <- manova(y ~ surgery * swat,prey,predator+ Error(slug), data = x)

???

Greg
Re: [R] repeat function for entire list of matrices
That didn't seem to quite work; I tried different subsetting:

lapply(nestedseasonlower, nested(nestedseason,.)

Are there any functions that can repeat a function while counting each iteration of the repeated function (n=1, n=2, n=3)?

Thanks

-- View this message in context: http://r.789695.n4.nabble.com/repeat-function-for-entire-list-of-matrices-tp4334587p4339299.html
[R] height of plots
Hello R gurus,

I have to create 12 plots. I have been using the following script, which leaves a large white space between each pair of plots. I would appreciate it if someone could suggest an alternative that reduces the white space.

par(mar=c(3,3,.5,.5))
split.screen(c(6,2)) # split display into a 6 x 2 grid of screens
for (i in 1:12) {
  if (i<11) {
    screen(i)
    plot(1:10,xaxt='n', xlab='', ylab='')
    box()
  }else{
    screen(i)
    plot(1:10, xlab='', ylab='', cex=0.75)
    box()
  }
}

Thanks
Sharad

-- View this message in context: http://r.789695.n4.nabble.com/height-of-plots-tp4339152p4339152.html
[R] need some help with model.matrix
Hello folks,

I am learning R and microarray analysis from scratch using different sites. Today I am doing an exercise from http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#R_functions

The section I am at is 2. Affymetrix data analysis. I understand the syntax given in this section up until:

design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3))) # Creates appropriate design matrix. Alternatively, such a design matrix can be created in any spreadsheet program and then imported into R.

I am stuck at this point. I believe model.matrix is creating a design matrix that the data will be put in later. The data in the example is:

Name            FileName                         Target
Shoot12h.1      COLD_CONTROL_12H_SHOOT_REP1.cel  c12h
Shoot12h.2      COLD_CONTROL_12H_SHOOT_REP2.cel  c12h
ColdShoot6h.1   COLD_6H_SHOOT_REP1.cel           t6h
ColdShoot6h.2   COLD_6H_SHOOT_REP2.cel           t6h
ColdShoot12h.1  COLD_12H_SHOOT_REP1.cel          t12h
ColdShoot12h.2  COLD_12H_SHOOT_REP2.cel          t12h

That is three experimental samples, with duplicates of each, giving a total of 6 arrays.

Now back to where I got stuck:

design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3)))

What exactly is model.matrix doing? The real data that I will analyze after figuring this out has 49 arrays (columns): 3, 6, 9 and 12 month samples with 9 replicates each, and 23 month samples with 13 replicates, for a total of 49. How should I create an appropriate design matrix?

PLEASE help?

Thanks,
daniel
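As a rough illustration of what the model.matrix call in this question produces, here is a minimal sketch. The variable names (groups, age, design49) are made up for the example, and the 49-array design assumes the arrays are ordered by age group (9, 9, 9, 9, 13 replicates), which the post does not state.

```r
# Six arrays in three groups of two, as in the tutorial example.
groups <- factor(c(1, 1, 2, 2, 3, 3))

# "-1" drops the intercept, so each group gets its own 0/1 indicator column.
design <- model.matrix(~ -1 + groups)
print(design)

# Hypothetical layout for the 49-array experiment: 3, 6, 9 and 12 month
# samples with 9 replicates each, then 23 month samples with 13 replicates.
# ASSUMPTION: the array columns are sorted in exactly this order.
age <- factor(rep(c(3, 6, 9, 12, 23), times = c(9, 9, 9, 9, 13)))
design49 <- model.matrix(~ 0 + age)
colnames(design49) <- paste0("m", levels(age))

dim(design49)      # one row per array, one column per age group
colSums(design49)  # number of arrays assigned to each group
```

Each row of the matrix describes one array, and the single 1 in that row marks the group the array belongs to. If the arrays are stored in a different order, the factor has to be built in that order instead.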
[R] how to calculate length of each triangulated face in deldir
Hi,

I have some data:

library(sp)      # spDistsN1
library(deldir)  # deldir

data <- read.table("SCI.was", header=TRUE)
sci_lat <- data[,7] # latitude
temp_lon <- data[,8] # longitude
# the longitude data is in 360 degree format and needs converting to -180 to 180
sci_lon <- ((temp_lon+180) %% 360) - 180
m <- cbind(sci_lon,sci_lat)
dist <- spDistsN1(m, m[1,], longlat=TRUE)
hist(dist[2:9997])
try <- deldir(sci_lon,sci_lat)
try_p <- deldir(sci_lon,sci_lat,plot=TRUE,wl='tr')

Now I would like to calculate the length of each triangulated face. Could somebody please tell me how to calculate it?

Cheers
Uday

-- View this message in context: http://r.789695.n4.nabble.com/how-to-calculate-length-of-each-triangulated-face-in-deldir-tp4339205p4339205.html
[R] how to select columns
Hello,

I have the following question: when creating a data.frame

a1 <- c(1,2,3)
a2 <- c(1,2,3)
c <- data.frame(a1,a2)

I can select columns using an index like:

c[,1:2]

Is this also possible using column names? (Something like c(,a1:a2), which doesn't work.)

Alternative question: is there a function to get the index of a variable by name, or can I select certain columns using a loop? (a_1, a_2, ..., a_n)

Thank you very much!
David
[R] Modifying whiskers in boxplots?
Hello,

I know this has been covered on here before, but as a complete novice, I need a little more guidance. I would like to produce boxplots with the whiskers extending to the 10th and 90th percentiles. I found this code:

myboxplot.stats <- function (x, coef = NULL, do.conf = TRUE, do.out = TRUE)
{
  nna <- !is.na(x)
  n <- sum(nna)
  stats <- quantile(x, c(.1,.25,.5,.75,.9), na.rm = TRUE)
  iqr <- diff(stats[c(2, 4)])
  out <- x < stats[1] | x > stats[5]
  conf <- if (do.conf) stats[3] + c(-1.58, 1.58) * diff(stats[c(2, 4)])/sqrt(n)
  list(stats = stats, n = n, conf = conf, out = x[out & nna])
}

posted by Mr. Jim Bowers, and additional posts discussing how to make it work. The issue I am having is that all the posts say to edit boxplot.default, but I have no idea how to actually do that. I've tried fix(boxplot.default) and fixInNamespace(...), but what do I actually change? I tried including an argument stats=myboxplot.stats but that did not change anything. Once it is changed, can I just use the same boxplot(...) code as normal or do I need to use myboxplot?

I know this should be simple, but it is generating a lot of frustration. Any help would be greatly appreciated.

Thanks,
James
[R] about undefined columns selected
Hi all,

When I run the code below, an error occurs. Could you please tell me how to deal with it?

pdf('covariate.pdf')
par(mfrow=c(1,1))
pairs(data2[,c("ID","TYPE","AGE","GNDR","HT")],
+ panel=function(x,y) { points(x,y); lines(lowess(x,y))})
Error in `[.data.frame`(data2, , c("ID", "TYPE", "AGE", "GNDR", "HT")) : undefined columns selected
dev.off()
RStudioGD
        2

Thank you very much!
[R] Using influence plots and obtaining id numbers
I am a novice R user, and I am having difficulty understanding R's influence plots. I am trying to remove outliers from a particular variable, sib. I am able to generate influence plots and further outlier information such as below (which is a shortened example). For my analyses, I end up excluding the points R refers to: 7, 18, 26, and 105. However, my question is, how can I find out which ID numbers these points (7, 18, 26, and 105) refer to? These numbers are definitely not my study ID numbers.

Myoutput <- aov(sib~newgroup1, data=Study1)
influencePlot(Myoutput)
[1] 7 18 26 105
influence.measures(Myoutput)
Influence measures of aov(formula = sib ~ newgroup1, data = Study1) :

      dfb.1_  dfb.nw12  dfb.nw13  dfb.nw14  dfb.nw15     dffit cov.r   cook.d    hat inf
33  1.70e-01 -1.33e-01 -1.53e-01 -1.56e-01 -1.52e-01  0.170405 1.124 5.83e-03 0.0909   *
34  7.79e-02 -6.07e-02 -7.00e-02 -7.14e-02 -6.94e-02  0.077934 1.131 1.22e-03 0.0909   *
35  1.47e-01 -1.15e-01 -1.32e-01 -1.35e-01 -1.31e-01  0.147268 1.126 4.36e-03 0.0909   *
36  6.64e-02 -5.17e-02 -5.96e-02 -6.08e-02 -5.91e-02  0.066386 1.132 8.86e-04 0.0909   *
37 -3.15e-01  2.46e-01  2.83e-01  2.89e-01  2.81e-01 -0.315448 1.100 1.99e-02 0.0909   *
38  1.47e-01 -1.15e-01 -1.32e-01 -1.35e-01 -1.31e-01  0.147268 1.126 4.36e-03 0.0909   *
39 -9.26e-01  7.22e-01  8.32e-01  8.48e-01  8.24e-01 -0.926059 0.882 1.64e-01 0.0909   *

-- View this message in context: http://r.789695.n4.nabble.com/Using-influence-plots-and-obtaining-id-numbers-tp4339144p4339144.html
Re: [R] about undefined columns selected
On Sunday 29 January 2012 at 21:50 -0500, xiaocong zuo wrote:
> Hi all,
> When I run the code below, an error occurs. Could you please tell me how to deal with it?
>
> pdf('covariate.pdf')
> par(mfrow=c(1,1))
> pairs(data2[,c("ID","TYPE","AGE","GNDR","HT")],
> + panel=function(x,y) { points(x,y); lines(lowess(x,y))})
> Error in `[.data.frame`(data2, , c("ID", "TYPE", "AGE", "GNDR", "HT")) : undefined columns selected
> dev.off()
> RStudioGD
>         2
>
> Thank you very much!

This simply means that one of the columns you tried to select doesn't exist in data2. You can see what columns are present using:

colnames(data2)

or, since data2 is a data frame:

names(data2)

But you could probably have figured this out by yourself... ;-)

Hope this helps
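To pinpoint exactly which requested columns are absent, rather than reading through names(data2) by eye, something along these lines may help. This is only a sketch: the toy data2 below is invented for illustration and stands in for the poster's real data frame.

```r
# Toy stand-in for the poster's data frame (hypothetical contents).
data2 <- data.frame(ID = 1:3,
                    TYPE = c("a", "b", "a"),
                    AGE = c(30, 41, 25))

wanted <- c("ID", "TYPE", "AGE", "GNDR", "HT")

# Requested names that do not exist as columns of data2.
missing_cols <- setdiff(wanted, names(data2))
print(missing_cols)

# Requested names that do exist; safe to use for subsetting.
present_cols <- intersect(wanted, names(data2))
str(data2[, present_cols, drop = FALSE])
```

Column names are case-sensitive, so a name such as "Gndr" in the data would also show up as missing here.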
Re: [R] Data generation
Assuming you want the whole data matrix to come from a single distribution:

matrix(rnorm(20 * 30, 3, 1), 20, 30)

On 30/01/12 06:33, Partha Sinha wrote:
> I want to generate a data matrix (20*30) having mean 3 and std deviation 1 (normal dist). pl help
> Partha
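A short sketch of how this could be checked is given below. Values drawn with rnorm only have mean 3 and standard deviation 1 in expectation, so the sample values will differ slightly. The second half rescales a sample so that its sample mean and standard deviation are exact; whether that is what the original poster wanted is an assumption.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Draws from N(3, 1): sample mean and sd are close to, not equal to, 3 and 1.
m <- matrix(rnorm(20 * 30, mean = 3, sd = 1), nrow = 20, ncol = 30)
dim(m)
mean(m)
sd(m)

# If the sample mean and sd must be exactly 3 and 1, standardise first.
z <- as.vector(scale(rnorm(20 * 30)))  # sample mean 0, sample sd 1
m_exact <- matrix(3 + 1 * z, nrow = 20, ncol = 30)
mean(m_exact)
sd(m_exact)
```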
Re: [R] how to select columns
On Monday 30 January 2012 at 08:30 +0100, David Studer wrote:
> Hello,
> I have the following question: when creating a data.frame
>
> a1 <- c(1,2,3)
> a2 <- c(1,2,3)
> c <- data.frame(a1,a2)
>
> I can select columns using an index like:
>
> c[,1:2]
>
> Is this possible too when using column-names? (something like c(,a1:a2), which doesn't work)

Read the R intro, or any tutorial on R. You can just do:

c[,c("a1", "a2")]

(And I think you don't understand what : does; read the manual. At least, it doesn't work the way your attempt c(,a1:a2) would imply.)

> Alternative question: Is there a function to get the index of a variable by name or can I select certain columns using a loop? (a_1, a_2, ..., a_n)

No need for a loop:

which(colnames(c) == "a1")

Cheers
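The options mentioned in this reply, plus two common alternatives, can be sketched as follows. The data frame is called df here instead of c to avoid masking the base function c(), and the third column a3 is added purely for illustration.

```r
df <- data.frame(a1 = 1:3, a2 = 4:6, a3 = 7:9)

# Select columns by a character vector of names.
by_name <- df[, c("a1", "a2")]

# Look up column positions by name, then use an ordinary numeric range.
pos <- match(c("a1", "a2"), names(df))
by_range <- df[, pos[1]:pos[2]]

# subset() allows the name:name shorthand the question was after.
by_subset <- subset(df, select = a1:a2)

# Build the names a1, a2, ..., an programmatically instead of looping.
by_pattern <- df[, paste0("a", 1:3)]
```

All of the first three calls return the same two columns; which one reads best is largely a matter of taste.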
Re: [R] height of plots
On 01/30/2012 07:24 AM, 1Rnwb wrote:
> Hello R gurus,
> I have to create 12 plots, I have been using the following script, which leaves a large white space between two plot. I would appreciate if someone can suggest an alternative to reduce the white space.
>
> par(mar=c(3,3,.5,.5))
> split.screen(c(6,2))# split display into two screens
> for (i in 1:12) {
>   if (i<11) {
>     screen(i)
>     plot(1:10,xaxt='n', xlab='', ylab='')
>     box()
>   }else{
>     screen(i)
>     plot(1:10, xlab='', ylab='', cex=0.75)
>     box()
>   }
> }

Hi Sharad,
Specify your margins like this:

split.screen(c(6,2))
for (i in 1:12) {
  if (i<11) {
    screen(i)
    par(mar=c(0.5,3,0,0.5))
    plot(1:10,xaxt='n', xlab='', ylab='')
    box()
  } else {
    screen(i)
    par(mar=c(3,3,0,0.5))
    plot(1:10, xlab='', ylab='', cex=0.75)
  }
}

Jim
[R] handling a lot of data
Hi,

I have got a lot of SPSS data for the years 1993-2010. I load all the data into lists so I can easily index the values over the years. Unfortunately the loaded data occupy quite a lot of memory (10 GB), so my question is: what's the best approach to working with big data files? Can R get a value from the data file without loading it fully into memory? How can a slower computer without enough memory work with such data?

I use the following commands:

data1993 = vector("list", 4);
data1993[[1]] = read.spss(...) # first trimester
data1993[[2]] = read.spss(...) # second trimester
...
data_all = vector("list", 17);
data_all[[1993]] = data1993;
...

and indexing, e.g.: data_all[[1993]][[1]]$DISTRICT, etc.

Thanks,
Petr Kurtin
Re: [R] handling a lot of data
On Monday 30 January 2012 at 09:54 +0100, Petr Kurtin wrote:
> Hi,
> I have got a lot of SPSS data for years 1993-2010. I load all data into lists so I can easily index the values over the years. Unfortunately loaded data occupy quite a lot of memory (10Gb) - so my question is, what's the best approach to work with big data files? Can R get a value from the file data without full loading into memory? How can a slower computer with not enough memory work with such data?
>
> I use the following commands:
>
> data1993 = vector("list", 4);
> data1993[[1]] = read.spss(...) # first trimester
> data1993[[2]] = read.spss(...) # second trimester
> ...
> data_all = vector("list", 17);
> data_all[[1993]] = data1993;
> ...
>
> and indexing, e.g.: data_all[[1993]][[1]]$DISTRICT, etc.

Have a look at the "Large memory and out-of-memory data" section of the High Performance Computing task view [1]. In particular, you may want to use the ff package and its ffdf object, which allows backing a data frame with a file so that RAM can be freed when needed.

Another piece of advice I'd give you is to convert the data from SPSS format to .RData once, and to always use the latter. In my experience, importing often creates memory fragmentation, in addition to being very slow (don't hesitate to save, quit and restart R to reduce this problem).

What use do you make of the different years? If you need e.g. to run a model on all of them at the same time, then you'll need to concatenate all the data frames from the data_all list, and I guess that's where RAM will be the problem: you'll have two copies of the data at the same time. Once you've succeeded in doing this, loading the full data set will use less RAM, and so may work on lower-end computers.

A general solution is also to load only the variables you really need. The saves package allows you to save the whole data set into an archive of several .RData files, and to load only what you want from it.

It all depends on your needs, constraints, and failed attempts. ;-)

Regards

1: http://cran.r-project.org/web/views/HighPerformanceComputing.html
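The "convert once, then keep only the variables you need" idea from this reply could look roughly like the sketch below. It is only an illustration under several assumptions: small random data frames stand in for the result of read.spss(), the files go to a temporary directory, saveRDS/readRDS are used in place of .RData files, and the helper load_year is a made-up name. Note that readRDS still reads a whole file before the columns are dropped, so the saving is in what stays in memory afterwards.

```r
set.seed(1)
years <- 1993:1995                 # shortened range for the example
out_dir <- tempfile("survey_")     # hypothetical storage location
dir.create(out_dir)

# One-off conversion step: in real use, d would come from read.spss().
for (y in years) {
  d <- data.frame(DISTRICT = sample(1:5, 10, replace = TRUE),
                  INCOME = rnorm(10),
                  OTHER = runif(10))
  saveRDS(d, file.path(out_dir, paste0("data", y, ".rds")))
}

# Load one year and keep only the requested variables.
load_year <- function(y, vars) {
  d <- readRDS(file.path(out_dir, paste0("data", y, ".rds")))
  d[, vars, drop = FALSE]
}

# Index the list by year *name*; a numeric index such as data_all[[1993]]
# on a shorter list would pad it with NULLs up to element 1993.
districts <- lapply(setNames(years, years), load_year, vars = "DISTRICT")
districts[["1993"]]$DISTRICT
```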
[R] Odp: Modifying whiskers in boxplots?
Hi

> Hello,
> I know this has been covered on here before, but as a complete novice, I need a little more guidance. I would like to produce boxplots with the whiskers extending to the 10 and 90th percentiles. I found this code:
>
> myboxplot.stats <- function (x, coef = NULL, do.conf = TRUE, do.out = TRUE)
> {
>   nna <- !is.na(x)
>   n <- sum(nna)
>   stats <- quantile(x, c(.1,.25,.5,.75,.9), na.rm = TRUE)
>   iqr <- diff(stats[c(2, 4)])
>   out <- x < stats[1] | x > stats[5]
>   conf <- if (do.conf) stats[3] + c(-1.58, 1.58) * diff(stats[c(2, 4)])/sqrt(n)
>   list(stats = stats, n = n, conf = conf, out = x[out & nna])
> }
>
> posted by Mr. Jim Bowers, and additional posts discussing how to make it work. The issue I am having is that all the posts say to edit boxplot.default, but I have no idea how to actually do that. I've tried fix(boxplot.default)

I do not see any need for modifying boxplot.default. You could change the $stats part of the list produced by the boxplot call:

set.seed(111)
x <- rnorm(100)
bb <- boxplot(x, plot=FALSE)
quantile(x,c(.1,.9))
      10%       90%
-1.315051  1.400721
bb$stats
           [,1]
[1,] -2.3023457
[2,] -0.7581696
[3,]  0.1315965
[4,]  0.6211842
[5,]  2.4856616
bb$stats[c(1,5),] <- quantile(x,c(.1,.9))
bb$stats
           [,1]
[1,] -1.3150509
[2,] -0.7581696
[3,]  0.1315965
[4,]  0.6211842
[5,]  1.4007212
bxp(bb, add=TRUE, col=2)

Regards
Petr

> and fixInNamespace(...), but what do I actually change? I tried including an argument stats=myboxplot.stats but that did not change anything. Once it is changed, can I just use the same boxplot(...) code as normal or do I need to use myboxplot?
>
> I know this should be simple, but it is generating a lot of frustration. Any help would be greatly appreciated.
>
> Thanks,
> James
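Building on the approach in this reply, a self-contained sketch might look like the following. Beyond moving the whiskers, it also recomputes the points drawn as outliers so that they match the new whisker ends; that extra step is an addition for illustration and not part of the reply above.

```r
set.seed(111)
x <- rnorm(100)

# Compute the usual boxplot statistics without drawing anything.
bb <- boxplot(x, plot = FALSE)

# Replace the whisker ends (rows 1 and 5 of $stats) with the
# 10th and 90th percentiles.
q <- quantile(x, c(0.1, 0.9), names = FALSE)
bb$stats[c(1, 5), 1] <- q

# Points beyond the new whiskers are now the ones to plot individually.
bb$out <- x[x < q[1] | x > q[2]]
bb$group <- rep(1, length(bb$out))

# Draw the modified boxplot as a new plot.
bxp(bb)
```

With several groups, bb$stats has one column per group, so the same replacement would have to be done column by column.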
Re: [R] ColorBrewer question
It works! Thanks a lot for your explanations, Michael.

Good luck,
Mario

From: R. Michael Weylandt michael.weyla...@gmail.com
Cc: r-help@r-project.org r-help@r-project.org
Sent: 5:22 Monday, 30 January 2012
Subject: Re: [R] ColorBrewer question

I believe you need to use scale_fill_brewer, since fill is the color of the bars while color is the outline of the bars in ggplot2-speak. E.g., with built-in data (it's polite to provide yours so that your minimal working example is working):

library(ggplot2)
data(diamonds)
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity, color = clarity))

# Note the borders are now changed but the fill is the same
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity, color = clarity)) + scale_color_brewer(pal = "Blues")

# Now the fill is changed, but you probably want to drop the border coloring since it's hideous against the blues
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity, color = clarity)) + scale_fill_brewer(pal = "Blues")

# So lovely
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity)) + scale_fill_brewer(pal = "Blues")

Michael

> Hello, R friends,
> I'm trying to change colors of my horizontal bars so that they show a sequence. I chose the ColorBrewer palette Blues. However the resulting plot doesn't show any changes to the default. I tried several places of + scale_colour_brewer(type="seq", pal = "Blues") with no effect. This is my code:
>
> p <- ggplot(data, aes(x = gender)) + scale_y_continuous("",formatter="percent") + xlab("Gender") + coord_flip() + scale_colour_brewer(type="seq", pal = "Blues")
> p+geom_bar(aes(fill=pet),colour='black',position='fill')
>
> Any ideas welcome.
> Thanks,
> Mario
Re: [R] multiple column comparison
Hi

I did not see any response, and actually I cannot offer any ready-made solution either. For such problems there could be various solutions, from loops to *apply, reshape or plyr options. However, for anybody to get started it would be nice to have a clearer description together with some small, readily available toy data (preferably produced by dput) and the desired result.

Regards
Petr

> Hello,
> I have a very large content analysis project, which I've just begun to collect training data on. I have three coders, who are entering data on up to 95 measurements. Traditionally, I've used Excel to check coder agreement (e.g., percentage agreement), by lining up each coder's measurements side-by-side, creating a new column with the results using if statements. That is, if (a=b, 1, 0). With this many variables, I am clearly interested in something that I don't have to create manually every time I check percentage agreement for coders.
>
> The data are set up like this:
>
> ID   CODER  V1  V2   V3   V4  ...  V95
> ID1  C1     y   int  doc  y
> ID2  C1     y   ext  doc  y
> ID1  C2     n   int  doc  y
> ID2  C2     n   int  doc  y
> ID1  C3     n   int  doc  y
> ID2  C3     n   int  doc  y
>
> I would like to write a script to do the following:
>
> For each variable, compare each pair of coders using if statements (e.g., if C1.V1.==C1.V2, 1, 0)
>
> ID   C1.V1  C2.V1  C3.V1
> ID1  y      y      y
> ID2  y      y      y
>
> For each coding pair, enter the resulting 1s and 0s into a new column. The new column name would reflect the results of the comparison, such as C1.C2.V1
>
> I'd ideally like to create this so that it can handle any number of variables and any number of coders.
>
> I appreciate any thoughts, help, and pointers on this. Thanks in advance.
>
> Best,
> Ryan Fuller
> Doctoral Candidate, Communication
> Graduate Student Researcher, Carsey-Wolf Center http://carseywolf.ucsb.edu
> University of California, Santa Barbara
>
> -- View this message in context: http://r.789695.n4.nabble.com/multiple-column-comparison-tp4332604p4332604.html
[R] Consultant to program R-code dealing with social networks
Dear all,

I am looking for a consultant/programmer to write a relatively simple piece of R code for me.

Specifically, I have about 50 social networks. These networks have between 5,000 and 5 million nodes and between 30,000 and 70 million edges. The code should (a) read one network into R, (b) draw a snowball sample of size x out of the network (e.g., a snowball sample of 1,000 nodes), (c) determine some basic network statistics for that sample and (d) save the sample and network statistics into two files for further use.

Let me know by email in case you are interested so that we can discuss the remaining details.

Thanks,
Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
Re: [R] nice report generator?
Hello dear Duncan, Gabor, Michael and others,

After taking some time, I wrote a bridge function from a cast_df object from the {reshape} package to a table in Duncan's new {tables} package. The motivation was to make cast_df tables prettier in the R terminal, as well as to allow us to export a pretty version of the table to LaTeX (using Hmisc::latex on the output of tabular.cast_df).

The code is now available at:
http://www.r-statistics.com/2012/01/printing-nested-tables-in-r-bridging-between-the-reshape-and-tables-packages/

I would be happy for any input/revisions/suggestions from you.

With regards,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
[R] Installing Rcompression package
Dear all

I'm trying to install the Rcompression package under R-2.14.0 on a Windows platform. I need it in order to use the Ropenoffice package.

Because there is no binary available, I'm trying to install it from source, but I always get some error messages. I have installed the zlib and Bzip2 software, defined the LIB_ZLIB and LIB_BZIP2 variables, and I have no spaces in my R home directories... but nothing works!

Is there anyone who could help me?

Jérémy Mazet
Re: [R] Installing Rcompression package
On 30/01/2012 12:24, Jeremy MAZET wrote:
> Dear all
> I'm trying to install the Rcompression package under R-2.14.0 on a Windows plateform. I need it to use the Ropenoffice package
> Because there is no binary available, I'm trying to install it from source but I have always some error messages. I have installed zlib and Bzip2 softwares, defined LIB_ZLIB and LIB_BZIP2 variables and I have no space in my R home directories... but nothing tod do!
> Is there anyone who could help me?

Well, you have not even told us where you got either of those packages from. But as the posting guide told you, your first port of call is the maintainer. AFAIK these are Omegahat packages, and Omegahat often provides Windows binaries for its packages. So if they have chosen not to do so for Rcompression, the maintainer may have a good reason.

> Jérémy Mazet

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] Variable selection based on both training and testing data
Dear all,

Variable selection in regression is usually determined from the training data using AIC or an F value, for example with stepAIC. Is there an R package that can consider both the training and the test dataset? For example, I have separate training data and test data. First, a regression model is obtained using the training data, and then this model is tested using the test data. This process continues in order to find some possible optimal models in terms of RMSE or R2 for both the training and the test data.

Thanks,
Jim
[R] Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list.

I would like htmlParse to work well with Hebrew, but it keeps scrambling the Hebrew text in the pages I feed into it. For example:

# why can't I parse the Hebrew correctly?
library(RCurl)
library(XML)
u = "http://humus101.com/?p=2737"
a = getURL(u)
a # Here - the hebrew is fine.
a2 <- htmlParse(a)
a2 # Here it is a mess...

None of these seem to fix it:

htmlParse(a, encoding = "utf-8")
htmlParse(a, encoding = "iso8859-8")

This is my locale:

Sys.getlocale()
[1] "LC_COLLATE=Hebrew_Israel.1255;LC_CTYPE=Hebrew_Israel.1255;LC_MONETARY=Hebrew_Israel.1255;LC_NUMERIC=C;LC_TIME=Hebrew_Israel.1255"

Any suggestions?

Thanks up front,
Tal
Re: [R] Variable selection based on both training and testing data
Variable selection is part of the training process -- it chooses the model. By definition, test data is used only for testing (evaluating the chosen model). If you find a package or function that does variable selection on test data, run from it!

Best, Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jin Minming
Sent: Monday, January 30, 2012 8:14 AM
To: r-help@r-project.org
Subject: [R] Variable selection based on both training and testing data

Dear all, The variable selection in regression is usually determined by the training data using AIC or F value, such as stepAIC. Is there some R package that can consider both the training and test dataset? For example, I have two separate training data and test data. Firstly, a regression model is obtained by using training data, and then this model is tested by using test data. This process continues in order to find some possible optimal models in terms of RMSE or R2 for both training and test data. Thanks, Jim
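The separation Andy describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the simulated data, the 70/30 split, the variable names, and the use of base R's step() in place of stepAIC. The point is only that the test rows are set aside before any selection happens and are used once, at the end.

```r
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)   # x3 is pure noise

# Hold out the test rows before any modelling decision is made.
train_idx <- sample(n, size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Variable selection (AIC-based stepwise search) sees the training data only.
full_fit <- lm(y ~ x1 + x2 + x3, data = train)
selected <- step(full_fit, trace = 0)

# The test data is touched once, to evaluate the model that was chosen.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
test_rmse <- rmse(test$y, predict(selected, newdata = test))
```

If the test RMSE were fed back into the choice of variables, the test set would effectively become part of the training data and would no longer give an honest error estimate.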
[R] Checking for invalid dates: Code works but needs improvement
Hi Rui, Marc, and Gabor,

Thanks for your replies to my question. All were helpful, and it was interesting to see how different people approach various aspects of the same problem.

I spent some time this weekend looking at Rui's solution, which is certainly much clearer than my own. I managed to figure out pretty much all the details of how it works, and to tweak it slightly so that it does exactly what I wanted. (See revised code below.) I still have a couple of questions though.

The first concerns the insertion of the code Y > 2012 to set year values beyond 2012 to NA (on line 10 of the function below). When I add this (or use it in place of nchar(Y) > 4), the code successfully finds the problem date 05/16/2015. After that, though, it produces the following error message:

Error in if (any(is.na(x) & M != "un" & Y != "un")) cat("Warning: Invalid date values in", : missing value where TRUE/FALSE needed

Why is this happening? If the code correctly handles the date 06/20/1840 without producing an error, why can't it do likewise with 05/16/2015?

The second question is why it's necessary to put x on line 15, following cat("Warning ..."). I know that I don't get any date columns if I don't include this, but I am not sure why.

The third question is whether it's possible to change the class of the date variables without using a for loop. I played around with this a little but didn't find a vectorized alternative. It may be that this is not really important. It's just that I've read in several places that for loops should be avoided wherever possible.
Thanks, Paul

## Code for detecting invalid dates

## Test Data
connection <- textConnection("
1 11/23/21931 05/23/2009 un/17/2011
2 06/20/1840 02/30/2010 03/17/2011
3 06/17/1935 12/20/2008 07/un/2011
4 05/31/1937 01/18/2007 04/30/2011
5 06/31/1933 05/16/2015 11/20/un
")
TestDates <- data.frame(scan(connection, list(Patient=0, birthDT="", diagnosisDT="", metastaticDT="")))
close(connection)

# Input Data
TDSaved <- TestDates

# List of Date Variables
DateNames <- c("birthDT", "diagnosisDT", "metastaticDT")

# Date Function
fun <- function(Dat){
  f <- function(jj, DF){
    x <- as.character(DF[, jj])
    x <- unlist(strsplit(x, "/"))
    n <- length(x)
    M <- x[seq(1, n, 3)]
    D <- x[seq(2, n, 3)]
    Y <- x[seq(3, n, 3)]
    D[D == "un"] <- "15"
    Y <- ifelse(nchar(Y) > 4 | Y > 2012 | Y < 1900, NA, Y)
    x <- as.Date(paste(Y, M, D, sep="-"), format="%Y-%m-%d")
    if(any(is.na(x) & M != "un" & Y != "un"))
      cat("Warning: Invalid date values in", jj, "\n",
          as.character(DF[is.na(x), jj]), "\n")
    x
  }
  Dat <- data.frame(sapply(names(Dat), function(j) f(j, Dat)))
  for(i in names(Dat)) class(Dat[[i]]) <- "Date"
  Dat
}

# Output Data
TD <- TDSaved

# Read Dates
TD[, DateNames] <- fun(TD[, DateNames])
TD
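Two general R behaviours bear on the questions above, and a small sketch may help. First, any() returns NA when none of its inputs is TRUE but at least one is NA, and if() stops with "missing value where TRUE/FALSE needed" when handed NA. Second, a function returns the value of its last evaluated expression, and cat() returns NULL, which is why a trailing x is needed. The toy data frame below is an assumption for illustration, and lapply() is offered as one possible loop-free conversion, not as the approach used in the thread.

```r
# any() yields NA when nothing is TRUE but something is NA;
# if() cannot branch on NA.
flags <- c(FALSE, NA, FALSE)
any(flags)                 # NA
any(flags, na.rm = TRUE)   # FALSE
isTRUE(any(flags))         # FALSE, safe to use inside if()

# Converting several character columns to Date without a for loop.
# Impossible dates such as 06/31/1933 come back as NA.
df <- data.frame(birthDT     = c("06/17/1935", "06/31/1933"),
                 diagnosisDT = c("12/20/2008", "02/30/2010"),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, as.Date, format = "%m/%d/%Y")
```

Assigning into df[] keeps the result a data frame, so each column is stored as a Date vector and no class has to be set afterwards.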
[R] useR! 2012: Earlybird Registration for International R Users Conference, Nashville TN 12-15 2012
The 8th international R users conference useR! 2012 will be in Nashville TN USA June 12-15 with a special all-day pre-conference course from Bill Venables on June 11. We have a terrific lineup of half-day tutorials on June 12 and will have invited and contributed presentations of interest to a wide variety of R users. Details may be found at http://biostat.mc.vanderbilt.edu/UseR-2012 and online early bird registration is now available. Abstract submissions for contributed talks and posters are welcomed. There are major entertainment events in and around Nashville before and during the conference that you may also want to take advantage of.

- Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] nice report generator?
On 30/01/2012 6:59 AM, Tal Galili wrote:
> Hello dear Duncan, Gabor, Michael and others,
> After taking some time, I wrote a bridge function that turns a cast_df object from the {reshape} package into a table in Duncan's new {tables} package. The motivation was to make cast_df tables prettier in the R terminal, as well as to allow us to export a pretty version of the table to LaTeX (using Hmisc::latex on the output of tabular.cast_df).
> The code is now available at: http://www.r-statistics.com/2012/01/printing-nested-tables-in-r-bridging-between-the-reshape-and-tables-packages/
> I would be happy for any input/revisions/suggestions from you.

Seems like a nice idea. Two comments:

1. I did add a Factor() function as described in the message you quote from me, so you might be able to use that and simplify things a little.

2. It's more flexible to construct the language object as a language object, rather than pasting something together and parsing it. For one thing, that allows non-syntactic variable names; I think it's also easier to read. So your code

txt <- paste("tabular(value*v*", LEFT, "~", RIGHT, ", data = m_xx, suppressLabels = 2, ...)", sep = "")
eval(parse(text = txt))

could be rewritten as

formula <- substitute(value*v*LEFT ~ RIGHT, list(LEFT=LEFT, RIGHT=RIGHT))
tabular(formula, data = m_xx, suppressLabels = 2, ...)

It might make sense to put something like this into the tables package, but I don't want to have a dependency on reshape.

Duncan Murdoch
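The substitute() idiom in the second comment can be tried on its own in base R, without the tables package. In this sketch the two variable names are hypothetical, and one is deliberately non-syntactic to show the case that pasting and parsing text would break on.

```r
# Hypothetical row and column variables; "treatment group" is not a
# syntactic R name, so it could not simply be pasted into code text.
LEFT  <- as.name("treatment group")
RIGHT <- as.name("visit")

# Build the formula as a language object instead of pasting and parsing.
f <- substitute(value * v * LEFT ~ RIGHT, list(LEFT = LEFT, RIGHT = RIGHT))

vars <- all.vars(f)   # variable names used, in order of appearance
frm  <- eval(f)       # evaluating the call yields a formula object
```

Because the names are inserted as symbols rather than as text, no quoting or escaping is needed however odd the column names are.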
Re: [R] percentage from density()
Great suggestions and comments, Bill, Greg and Rolf. You provided me with some valuable ways to deal with the data I am working with. Thank you all so much!

Bests, D.

On 1/29/12 4:03 PM, William Dunlap wrote:

If v is your original data,

v <- c(-20, rep(0,98), 20)

why not use

mean(-20 < v & v < 2)

as your estimate of the probability that v is in (-20,2)? Estimating a density is like taking the derivative of a smooth of the empirical distribution function, so why not eliminate the middleman instead of integrating the estimated density? Any difference between the two methods tells more about the smoothing used than about the data involved. (Not that I am any sort of expert in this matter.)

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Greg Snow
Sent: Saturday, January 28, 2012 8:12 PM
To: Duke; r-help@r-project.org
Subject: Re: [R] percentage from density()

If you use logspline estimation (logspline package) instead of kernel density estimation then this is simple, as there are cumulative area functions for logspline fits. If you need to do this with kernel density estimates then you can just find the area over your region for the kernel centered at each data point and average those values together to get the area under the entire density estimate.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Duke
Sent: Friday, January 27, 2012 3:45 PM
To: r-help@r-project.org
Subject: [R] percentage from density()

Hi folks, I know that the density function will give an estimated density for a given dataset. Now from that I want to have a percentage estimate for a certain range. For example:

y = density(c(-20,rep(0,98),20))
plot(y, xlim=c(-4,4))

Now I want to know the percentage of data lying in (-20,2). Basically it should be the area under the curve from -20 to 2. Does anybody know a simple function to do it?

Thanks, D.
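Both suggestions quoted above can be written out in a few lines. This is a sketch under one assumption: that the density estimate uses density()'s default Gaussian kernel, for which the reported bandwidth is the kernel's standard deviation, so the mass of each kernel in the interval is a difference of two pnorm() values.

```r
v <- c(-20, rep(0, 98), 20)

# Bill Dunlap's suggestion: the empirical proportion strictly inside (-20, 2).
p_empirical <- mean(-20 < v & v < 2)   # 98 of the 100 points, i.e. 0.98

# Greg Snow's suggestion for a Gaussian kernel density estimate: average,
# over the data points, the kernel mass that falls inside the interval.
bw <- density(v)$bw                    # bandwidth chosen by density()
p_kernel <- mean(pnorm(2, mean = v, sd = bw) - pnorm(-20, mean = v, sd = bw))
```

As the reply notes, any gap between the two numbers reflects the smoothing (here, half of the kernel centred at -20 lies outside the interval) rather than the data.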
[R] ROC curve
Hello all,

I am very new to R and I am facing two problems. First, I did not succeed in changing the console language to English, even after trying the command set language='en'.

Second, I would like to plot ROC curves. I have a series of 10 threshold tests that I do for 10 patients. The prediction for the patients is always the same, but the status can change depending on the threshold considered. I have 11 columns of 10 rows: the first column contains the predicted status of the 10 patients (0 = cured, 1 = not cured), followed by 10 columns (one per threshold) containing the status found using that threshold. Does anyone know how I can use those values in R to plot ROC curves?

Thank you for your understanding, Josiane.

Everything should be made as simple as possible, but not simpler. Albert Einstein.
[R] RCurl format
I am having trouble with the postForm function in RCurl. I want to send the command

DELETE https://somewebsite.com.json

but I can't seem to find how to do it. I could try:

postForm(url, _method="DELETE", .opts = list(username:password))

but I get the error:

Error: unexpected input in "postForm(url4, _"

This error seems to be due to the underscore _ before method. Any ideas how I can do a DELETE command another way in RCurl? Thanks.
[R] parameter estimate
I need help. The code below estimates the Weibull parameters with complete failure data. My question is: how do I change the state to include some censoring (maybe right, Type-I or Type-II) to generate the data and estimate the parameters? Thank you.

x=rweibull(10,2,2)
library(survival)
d<-data.frame(ob=c(x),state=1)
s <- Surv(d$ob,d$state)
sr <- survreg(s~1,dist="weibull")
print(paste("beta =",1/sr$scale))
print(paste("eta =",exp(sr$coefficients[1])))

or

library(MASS)
set.seed(123)
m <- replicate(1000, coef(fitdistr(rweibull(50, 0.8, 2), "weibull")))
summary(t(m))
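One way the code above might be extended to right censoring is sketched below. It assumes Type-I censoring at a fixed time; the cutoff value, the sample size, and the name cens_time are illustrative choices and do not come from the post. The state column becomes 1 for an observed failure and 0 for a censored observation, which is the coding Surv() expects.

```r
library(survival)

set.seed(1)
x <- rweibull(200, shape = 2, scale = 2)   # true failure times

# Type-I censoring: observation stops at a fixed time (assumed value).
cens_time <- 2.5
d <- data.frame(ob    = pmin(x, cens_time),          # what is actually observed
                state = as.integer(x <= cens_time))  # 1 = failure, 0 = censored

fit <- survreg(Surv(ob, state) ~ 1, data = d, dist = "weibull")

shape_hat <- 1 / fit$scale                 # "beta" in the original code
scale_hat <- unname(exp(coef(fit)[1]))     # "eta" in the original code
```

For Type-II censoring the same construction could be used with the cutoff set to the r-th smallest failure time, for example cens_time <- sort(x)[r] for a chosen r.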
[R] ANOVA factors
Hi all,

How can I turn an n x m matrix into two categorical factors in R, the row factor and the column factor, using the stack command? Is there a function for nice graphical output in ANOVA?

Thanks, Wolfgang
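A possible starting point for the question above, shown here as a sketch only: instead of stack(), the matrix can be converted through as.table(), which yields one row per cell together with a row factor and a column factor. The small matrix and its dimension names are made up for illustration.

```r
# A hypothetical 2 x 3 matrix with named dimensions.
m <- matrix(c(5.1, 4.8, 6.0, 5.5, 7.2, 6.9), nrow = 2,
            dimnames = list(row = c("r1", "r2"), col = c("c1", "c2", "c3")))

# One row per cell: two factor columns (row, col) plus the cell value.
long <- as.data.frame(as.table(m), responseName = "value")

# Two-way ANOVA without interaction (one observation per cell).
fit <- aov(value ~ row + col, data = long)
```

For graphics, base R functions such as interaction.plot() or boxplot(value ~ col, data = long) could be tried on the long-format data.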
[R] [R-pkgs] New package geotools
Dear All,

I have uploaded a new package, geotools, whose main purpose is to provide functions to get the distance between cities, by city name or postal code (usage: shipment). For now, there is only the French cities dataset.

An example, returning all postal codes within 7 km of Paris:

codesNearToCode(zipCode("Paris"),7)

Regards, Antoine Lucas.
Re: [R] merge multiple data frames
hi don I followed your advice about using sqldf package but the problem of labelling the fields persists; for some reasons I can not properly handle the sql 'as' statement a_b-sqldf(select a.*, b.* from a left join b on a.date=b.date) a_b_c-sqldf(select a_b.*, c.* from a_b left join c on a_b.date=c.date) bye max - Original Message - From: MacQueen, Don macque...@llnl.gov To: maxbre mbres...@arpa.veneto.it; r-help@r-project.org Sent: Saturday, January 28, 2012 12:24 AM Subject: Re: [R] merge multiple data frames Not tested, but this might be a case for the sqldf package. -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 1/26/12 9:29 AM, maxbre mbres...@arpa.veneto.it wrote: This is my reproducible example (three data frames: a, b, c) a-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783, 0.200154920565217, 0.473866969181818), nox = c(111.716109973913, 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435, 110.425185027727), no = c(48.8543691516522, 88.7197448817391, 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364 ), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783, 49.2986070321739, 46.5978461731818), co = c(0.618856168125, 0.99659347508, 0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043 ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043, 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667, 252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667, 173.868599272609), ws = c(1.47191016429167, 0.765781205208333, 0.937053086791667, 1.581022406625, 0.909756802125, 
0.959252831695652 ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214, 319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.9197282588, 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722, 3.09864120704348), umr = c(86.11566638875, 94.5034087491667, 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087 )), .Names = c(date, so2, nox, no, no2, co, o3, ipa, ws, wd, temp, umr), row.names = c(NA, 6L), class = data.frame) b-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511, 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222 ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972, 16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381, 28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917, 0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897, 9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971 ), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998, 66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995, 0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031, 221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465, 215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993, 0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736, 88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874 )), .Names = c(date, so2, nox, no, no2, co, o3, ipa, ws, wd, temp, umr), row.names = c(NA, 6L), class = data.frame) c-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 
2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(2.617839247, 0, 0, 0.231044086, 0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142, 178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498, 221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447, 59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816 ), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745, 0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989, 37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056, 432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128, 1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057 ), wd = c(58.15639877, 64.5657153143848, 39.9754269501381, 24.0739884380921, 55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029, 4.33258009,
Re: [R] merge multiple data frames
thanks michael

it's working like a charm: that's exactly what I was looking for

bye max

----- Original Message -----
From: R. Michael Weylandt michael.weyla...@gmail.com
To: Massimo Bressan mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Friday, January 27, 2012 4:16 PM
Subject: Re: [R] merge multiple data frames

Oh, sorry -- I assumed that was intentional since my code passed the identical() test with what you said you wanted. Perhaps this gets what you meant you wanted instead (though the treatment of the names is far from elegant):

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  dotNames <- lapply(dotArgs, names)
  repNames <- Reduce(intersect, dotNames)
  repNames <- repNames[repNames != by]
  for(i in seq_along(dotArgs)){
    wn <- which( (names(dotArgs[[i]]) %in% repNames) & (names(dotArgs[[i]]) != by))
    names(dotArgs[[i]])[wn] <- paste(names(dotArgs[[i]])[wn], names(dotArgs)[[i]], sep = ".")
  }
  Reduce(function(x, y) merge(x, y, by = by, all = all), dotArgs)
}

print(str(mergeAll(a=a,b=b,c=c)))

Is that what you were going for?

Michael

On Fri, Jan 27, 2012 at 3:19 AM, Massimo Bressan mbres...@arpa.veneto.it wrote:

I tested your code: it's OK, but there is still the problem of the suffixes for the last data frame. Thank you for the support.

----- Original Message -----
From: R. Michael Weylandt michael.weyla...@gmail.com
To: maxbre mbres...@arpa.veneto.it
Cc: r-help@r-project.org
Sent: Thursday, January 26, 2012 8:19 PM
Subject: Re: [R] merge multiple data frames

I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y) merge(x, y, by = by, all = all, suffixes=paste(".", names(dotArgs), sep = "")), dotArgs)
}

mergeAll(a = a, b = b, c = c)
str(.Last.value)

You also might be able to set it up to capture names without you having to put a = a etc. using substitute.
On Thu, Jan 26, 2012 at 12:29 PM, maxbre mbres...@arpa.veneto.it wrote: This is my reproducible example (three data frames: a, b, c) a-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783, 0.200154920565217, 0.473866969181818), nox = c(111.716109973913, 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435, 110.425185027727), no = c(48.8543691516522, 88.7197448817391, 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364 ), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783, 49.2986070321739, 46.5978461731818), co = c(0.618856168125, 0.99659347508, 0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043 ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043, 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667, 252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667, 173.868599272609), ws = c(1.47191016429167, 0.765781205208333, 0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652 ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214, 319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.9197282588, 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722, 3.09864120704348), umr = c(86.11566638875, 94.5034087491667, 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087 )), .Names = c(date, so2, nox, no, no2, co, o3, ipa, ws, wd, temp, umr), row.names = c(NA, 6L), class = data.frame) b-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 
2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511, 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222 ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972, 16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381, 28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917, 0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897, 9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971 ), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998, 66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995, 0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031, 221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465, 215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993, 0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736, 88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874 )), .Names = c(date, so2, nox, no,
[R] Problem in Fitting model equation in nls function
Dear R users,

I am struggling to fit an expo-linear equation to my data using the nls function. I always get the error messages shown below.

### The expo-linear equation which I am interested in fitting to my data:
### response_variable = (c/r)*log(1+exp(r*(Day-tt))), where Day is the time variable

## my response variable
rl <- c(2,1.5,1.8,2,2,2.5,2.6,1.5,2.4,1.7,2.3,2.4,2.2,2.6,
        2.8,2,2.5,1.8,2.4,2.4,2.3,2.6,3,2,2.6,1.8,2.5,2.5,
        2.3,2.7,3,2.2,2.6,1.8,2.5,2.5,2.3,2.7,3,2.2)
myday <- rep(c(3,5,7,9,10), each = 8) # creating my predictor time-variable
mydata <- data.frame(rl,myday) # data object

### fitting the model equation with the nls function

### CASE I: when I assigned the initial value tt = 0.6
mytest <- nls(rl ~ (c/r)*log(1+exp(r*(myday-tt))), data = mydata,
              na.action = na.omit,
              start = list(c = 2.0, r = 0.05, tt = 0.6), algorithm = "plinear")
Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model

### CASE II: when I assigned the initial value tt = 1
mytest <- nls(rl ~ (c/r)*log(1+exp(r*(myday-tt))), data = mydata,
              na.action = na.omit,
              start = list(c = 2.0, r = 0.5, tt = 1), algorithm = "plinear")
Error in nls(rl ~ (c/r) * log(1 + exp(r * (myday - tt))), data = mydata, : singular gradient

Truly speaking, I do not have much experience with fitting a specific model equation in R. I have the following queries:

1. Can anyone explain to me what is going wrong here?
2. Importantly, how can I write the above equation for the nls function?

I would be very thankful if anyone could help me.

Thanks and regards, Ram Kumar Basnet
[R] about changing line type and line width in Taylor Diagram
Dear all, I am new to plotting Taylor Diagram using plotrix package within R, hence this post. I have written a script which plots Taylor Diagram with one reference and 7 model values. However the font size, line width and line type are not clear when saving the diagram as a jpeg file. I tried the functions lty, lwd and font but no apparent change. I am attaching the script here. Any help would be greatly appreciated. The script is # my first taylor diagram ref-c(0.00640091,0.00533091,0.00381636,0.00275519,0.00277649,0.00280806,0.00267945,0.00237123,0.000970663,0.000986191,0.00100226,0.00086391,0.000622819,0.000485319,0.000362976,0.000246112,0.000165615,0.8184,0.4) m1-c(0.0124827,0.011662,0.0102956,0.0091183,0.00813907,0.007192,0.00662517,0.00433745,0.00184044,0.000649477,0.00024642,5.43E-05,0.97696,0.000194817,0.000182709,0.000134398,0.000106024,8.92E-05,6.28E-05) taylor.diagram(ref,m1,pos.cor=FALSE,ngamma=3,pcex=1,grad.corr.lines=c(-0.99,-0.95,-0.9,-0.8,-0.6,-0.4,-0.2,0,0.2,0.4,0.6,0.8,0.9,0.95,0.99),lty=1,lwd=10,font=5) m2-c(0.0101348,0.00920886,0.0086196,0.00785134,0.00723838,0.00675833,0.00579093,0.00540478,0.00226489,0.000809049,0.00019625,3.95E-05,8.89E-05,0.000195028,0.000185004,0.000131202,0.000109852,9.98E-05,6.80E-05) taylor.diagram(ref,m2,add=TRUE,pch=19,col=blue,lty=solid,lwd=3) m3-c(0.0123251,0.0120384,0.00871793,0.00678519,0.00628331,0.00532673,0.00486861,0.0048328,0.0038655,0.00143683,0.00022057,8.61E-06,7.79E-05,0.000184976,0.000185927,0.000133771,0.000104613,9.26E-05,6.38E-05) taylor.diagram(ref,m3,add=TRUE,pch=19,col=orange,lty=solid,lwd=3) m4-c(0.0134251,0.0126776,0.012559,0.0121933,0.0099911,0.00727952,0.00475407,0.00227909,0.00130748,0.000705607,0.000304828,5.70E-05,0.000109972,0.000187504,0.0002016,0.000133706,0.000109697,9.54E-05,6.35E-05) taylor.diagram(ref,m4,add=TRUE,pch=19,col=pink,lty=solid,lwd=3) 
m5-c(0.0124275,0.0112242,0.00886243,0.00793019,0.0067846,0.00603205,0.00566561,0.00530552,0.00318331,0.000961854,0.000218234,3.66E-05,7.99E-05,0.000182724,0.000196627,0.000136862,0.000104907,0.94622,6.20E-05) taylor.diagram(ref,m5,add=TRUE,pch=19,col=purple,lty=solid,lwd=3) m6-c(0.0142817,0.0134474,0.0129694,0.0113914,0.0102208,0.00920309,0.00555206,0.00289796,0.00143831,0.000706277,0.000277201,5.60E-05,0.000114714,0.000186412,0.000198743,0.000134991,0.000108689,9.43E-05,6.16E-05) taylor.diagram(ref,m6,add=TRUE,pch=19,col=brown,lty=solid,lwd=3) m7-c(0.0120621,0.0117936,0.00854782,0.00734006,0.00669576,0.00629334,0.00595018,0.00564455,0.00396859,0.000991006,0.000171742,9.68E-06,8.10E-05,0.000186982,0.00018854,0.000136548,0.000104581,9.18E-05,6.20E-05) taylor.diagram(ref,m7,add=TRUE,pch=19,col=cyan,lty=solid,lwd=3) lpos-3.5*sd(ref) legend(.75*lpos,1.5*lpos,legend=c(,1171,1211,2121,2221,4141,5251),pch=19,col=c(red,blue,orange,pink,purple,brown,cyan)) Thanking you, Warm Regards Roopa [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] And Statement for two if functions
I want to perform two if functions at the same time: if(home team away team home team = away team + 7) in R but i am struggling to work out how to write this correctly. Thanks for any help.
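The comparison operators in the condition above were lost in the archive, so the following sketch rests on a guess: that the intended test is "home score greater than away score, and at most 7 more". The scores and variable names are hypothetical. In R, & combines conditions element by element over vectors, while && is the form meant for a single test inside if().

```r
home <- c(3, 10, 2, 5)   # hypothetical home-team scores
away <- c(1,  2, 2, 0)   # hypothetical away-team scores

# Vectorised: one TRUE/FALSE per match.
narrow_home_win <- home > away & home <= away + 7
# -> TRUE FALSE FALSE TRUE

# Scalar: a single combined test inside if().
if (home[1] > away[1] && home[1] <= away[1] + 7) {
  result <- "home win by at most 7"
} else {
  result <- "other"
}
```

If the intended condition was different, only the two comparisons need to change; the way the conditions are combined stays the same.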
Re: [R] package does not have a NAMESPACE
Hello Ondrej,

I experienced the same problem and circumvented it by installing R 2.13.2, where the package runs fine. I also tried contacting the authors, with no reply so far, but if you manage to solve the NAMESPACE problem in 2.14 I would be interested. The source does not seem to be available.

Stefan
[R] Fw: Variable selection based on both training and testing data
From: SR Millis srmil...@yahoo.com
To: Jin Minming jminm...@yahoo.com
Sent: Monday, January 30, 2012 9:25 AM
Subject: Re: [R] Variable selection based on both training and testing data

Jim,

First, stepwise methods for variable selection should be avoided. Frank Harrell (in Regression Modeling Strategies) discusses this at length.

Second, splitting a dataset into training and validation sets is generally not a good idea unless you have a really large sample, e.g., 20,000. As Harrell has discussed, split-sample validation does not provide external validation, is terribly inefficient, and is arbitrary.

It's better to specify your model a priori and use the bootstrap to obtain an estimate of your model's over-optimism. Bootstrapping can be implemented with Harrell's rms package in R.

Scott

~~~ Scott R Millis, PhD, ABPP, CStat, PStat® Professor Wayne State University School of Medicine Email: aa3...@wayne.edu Email: srmil...@yahoo.com Tel: 313-993-8085

To: r-help@r-project.org
Sent: Monday, January 30, 2012 8:14 AM
Subject: [R] Variable selection based on both training and testing data

Dear all, The variable selection in regression is usually determined by the training data using AIC or F value, such as stepAIC. Is there some R package that can consider both the training and test dataset? For example, I have two separate training data and test data. Firstly, a regression model is obtained by using training data, and then this model is tested by using test data. This process continues in order to find some possible optimal models in terms of RMSE or R2 for both training and test data. Thanks, Jim
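The idea of estimating over-optimism by bootstrap can be sketched in base R. This is not the rms implementation, only a hand-rolled illustration of the usual recipe: refit the pre-specified model on each bootstrap sample, compare its performance on that sample with its performance on the original data, and subtract the average difference from the apparent performance. The simulated data, the model formula, and the 200 replicates are all assumptions.

```r
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)

# R-squared of a fitted model evaluated on an arbitrary data set.
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

# Apparent performance: model fitted and evaluated on the same data.
fit <- lm(y ~ x1 + x2, data = d)
apparent <- r2(fit, d)

# Optimism: performance on the bootstrap sample minus performance of the
# same bootstrap fit on the original data, averaged over replicates.
optimism <- replicate(200, {
  b  <- d[sample(n, replace = TRUE), ]
  fb <- lm(y ~ x1 + x2, data = b)
  r2(fb, b) - r2(fb, d)
})

corrected <- apparent - mean(optimism)
```

Every observation is used both to fit and to assess the model, which is the efficiency argument made above against a single train/test split.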
[R] Displaying percentages within bars
Hello, R friends, I've got this graph:
  p <- ggplot(diamonds, aes(x = color)) + scale_fill_brewer(type = "seq", pal = "Blues") + scale_y_continuous("", formatter = "percent") + coord_flip()
  p + geom_bar(aes(fill = cut), colour = 'black', position = 'fill')
Is it possible to place percentages within each field of the bars? So, for instance, the dark blue field of color = D would contain a number of about 42.0%. A second question is how to change the size of this number. Any comments are welcome! Mario
Re: [R] merge multiple data frames
Does this example help? It doesn't handle the problem of common field names, but see below for another example.
  df1 <- data.frame(jn=1:4, a1=letters[1:4], a2=LETTERS[1:4])
  df2 <- data.frame(jn=2:6, b1=month.abb[2:6])
  df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)
  dfn <- sqldf('select * from df1 left join df2 using (jn) left join df3 using (jn)')
In this example, you automatically get all fields from all three data frames, without having to name them in the SQL statement -- but you should not have common names. To deal with common names, I myself would probably rename the variables in the data frames before trying to merge. A general method would be something like:
  nms1 <- names(df1)
  nms1[nms1 != 'date'] <- paste(nms1[nms1 != 'date'], '.1', sep='')
  names(df1) <- nms1
Of course it has to be done for every data frame, but this can be put in a loop, if necessary. However, here is an example where I have changed df1 and df2; they both have a field named 'aa', in addition to the matching field.
  df1 <- data.frame(jn=1:4, aa=letters[1:4], a2=LETTERS[1:4])
  df2 <- data.frame(jn=2:6, aa=month.abb[2:6])
  df3 <- data.frame(jn=3:7, x=rnorm(5), y=13:17)
  dfn <- sqldf('select jn, df1.aa aa1, df2.aa aa2, a2, x, y from df1 left join df2 using (jn) left join df3 using (jn)')
By the way, you can still select *, even with common names:
  dfx <- sqldf('select * from df1 left join df2 using (jn) left join df3 using (jn)')
but you might not like the result. Try it and see! It's my understanding that in the current SQL definition 'as' is no longer required when changing field names (though it is also still allowed in the databases I work with, Oracle and MySQL). Perhaps sqldf does not allow it. I don't know. Hope this helps.
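The "rename first, then merge" idea described above can also be sketched without sqldf, using only base R. This is an illustration rather than Don's own code: the suffix scheme (.1, .2, .3) and the use of Reduce() with merge() are assumptions made for the example.

```r
df1 <- data.frame(jn = 1:4, aa = letters[1:4], a2 = LETTERS[1:4])
df2 <- data.frame(jn = 2:6, aa = month.abb[2:6])
df3 <- data.frame(jn = 3:7, x = rnorm(5), y = 13:17)

dfs <- list(df1, df2, df3)
key <- "jn"

# Append ".1", ".2", ... to every column except the join key,
# so that no two data frames share a non-key column name.
dfs <- lapply(seq_along(dfs), function(i) {
  d <- dfs[[i]]
  nms <- names(d)
  nms[nms != key] <- paste(nms[nms != key], i, sep = ".")
  names(d) <- nms
  d
})

# Chain of left joins: keep every row of the first data frame.
dfn <- Reduce(function(x, y) merge(x, y, by = key, all.x = TRUE), dfs)
names(dfn)
```

Because the renaming happens in a loop over the list, the same code works for any number of data frames that share the key column.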
-Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 1/30/12 4:40 AM, Massimo Bressan mbres...@arpa.veneto.it wrote: hi don I followed your advice about using sqldf package but the problem of labelling the fields persists; for some reasons I can not properly handle the sql 'as' statement a_b-sqldf(select a.*, b.* from a left join b on a.date=b.date) a_b_c-sqldf(select a_b.*, c.* from a_b left join c on a_b.date=c.date) bye max - Original Message - From: MacQueen, Don macque...@llnl.gov To: maxbre mbres...@arpa.veneto.it; r-help@r-project.org Sent: Saturday, January 28, 2012 12:24 AM Subject: Re: [R] merge multiple data frames Not tested, but this might be a case for the sqldf package. -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 1/26/12 9:29 AM, maxbre mbres...@arpa.veneto.it wrote: This is my reproducible example (three data frames: a, b, c) a-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783, 0.200154920565217, 0.473866969181818), nox = c(111.716109973913, 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435, 110.425185027727), no = c(48.8543691516522, 88.7197448817391, 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364 ), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783, 49.2986070321739, 46.5978461731818), co = c(0.618856168125, 0.99659347508, 0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043 ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043, 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667, 
252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667, 173.868599272609), ws = c(1.47191016429167, 0.765781205208333, 0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652 ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214, 319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.9197282588, 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722, 3.09864120704348), umr = c(86.11566638875, 94.5034087491667, 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087 )), .Names = c(date, so2, nox, no, no2, co, o3, ipa, ws, wd, temp, umr), row.names = c(NA, 6L), class = data.frame) b-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511, 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222 ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972, 16.32300019, 8.883637786), no2 =
Re: [R] And Statement for two if functions
Hi kerry1912, And what exactly would you like to do after the if(...) statement? How did you read your data in? What's the output of str(yourdata)? Please see http://www.R-project.org/posting-guide.html and help us to help you. Regards, Jorge On Mon, Jan 30, 2012 at 9:52 AM, kerry1912 wrote: I want to perform two if functions at the same time: if(home team away team home team = away team + 7) in R but i am struggling to work out how to write this correctly. Thanks for any help. -- View this message in context: http://r.789695.n4.nabble.com/And-Statement-for-two-if-functions-tp4341179p4341179.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] ROC curve
Hi Josiane, Concerning ROC curves, the package ROCR should do what you want to do. Use install.packages to add it to your library. After getting your data into a text file format, use read.delim to read it into a data frame. Once you have a data frame, you can use the methods in ROCR to analyze the data. Best, Corey
On Mon, Jan 30, 2012 at 1:52 AM, Josiane NJIWA joa...@yahoo.com wrote: Hello all, I am very new to R and I am facing two problems. First, I didn't succeed in changing the console language to English even after trying the line command set language='en'. I would like to plot ROC curves. I have a series of 10 threshold tests that I do for 10 patients. The prediction for the patients is always the same, but the status can change according to the threshold considered. I have 11 columns of 10 rows, the first column containing the 10 lines of the predicted status of the patients (0=cured, 1=non cured). Then follow 10 columns (10 thresholds) containing the status found using each threshold. Does someone know how I can use those values with R to plot ROC curves? I thank you for your understanding, Josiane. Everything should be made as simple as possible, but not simpler. Albert Einstein.
-- *The mark of a successful man is one that has spent an entire day on the bank of a river without feeling guilty about it.*
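Since the data described above are already thresholded (one 0/1 column per threshold), one ROC point per threshold can be computed directly in base R. The sketch below does not use ROCR. It assumes the first column is the reference status, and it simulates the ten threshold columns from a made-up score, so score, thresholds and calls are hypothetical stand-ins for the real file.

```r
truth <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)            # reference status, 1 = non cured
score <- c(0.10, 0.20, 0.30, 0.40, 0.60,             # hypothetical test values
           0.35, 0.50, 0.70, 0.80, 0.90)
thresholds <- seq(0.05, 0.95, by = 0.1)              # 10 thresholds

# 10 x 10 matrix: one column of 0/1 calls per threshold
calls <- sapply(thresholds, function(t) as.integer(score >= t))

# True positive rate and false positive rate for each threshold column
tpr <- apply(calls, 2, function(p) sum(p == 1 & truth == 1) / sum(truth == 1))
fpr <- apply(calls, 2, function(p) sum(p == 1 & truth == 0) / sum(truth == 0))

# Sort the points and anchor the curve at (0,0) and (1,1)
ord <- order(fpr, tpr)
plot(c(0, fpr[ord], 1), c(0, tpr[ord], 1), type = "b",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)                                # chance line
```

With real data, calls would simply be columns 2 to 11 of the data frame returned by read.delim, and truth its first column.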
Re: [R] package does not have a NAMESPACE
On Mon, Jan 30, 2012 at 02:35:29PM +, Reinker, Stefan wrote: Hello Ondrej, I experienced the same problem and circumvented it by installing R 2.13.2, where the package runs fine. I also tried contacting the authors, with no reply so far, but if you manage to solve the NAMESPACE problem in 2.14 I would be interested. The source does not seem to be available.
Hello: The source code kopls_1.1.1.tar.gz is available at http://kopls.sourceforge.net/download.shtml Hope this helps. Petr Savicky.
Re: [R] handling a lot of data
This won't help with large memory issues, but just a pointer: When you start to construct data_all with these commands
  data_all = vector("list", 17); data_all[[1993]] = data1993;
the first pre-allocates a list of length 17, but the second adds the data to the 1993rd slot, requiring a complete reallocation. Look at length(data_all). You'd be better off in general with something like this:
  data_all <- vector("list", 18)
  names(data_all) <- 1993:2010
  data_all[["1993"]] <- data1993
etc., which creates a list of length 18 with components named after the years (1993:2010 covers 18 years, so the names would not fit a list of length 17). If you want to automate that last bit over each year, this would work:
  for (yr in 1993:2010) { data_all[[as.character(yr)]] <- get(paste("data", yr, sep = "")) }
It's also been pointed out to me that the Oarray package allows one to start indexing at an arbitrary point (e.g., 1993 for the first slot), which might be helpful for managing your data_all object. Michael
On Mon, Jan 30, 2012 at 3:54 AM, Petr Kurtin kur...@avast.com wrote: Hi, I have got a lot of SPSS data for years 1993-2010. I load all data into lists so I can easily index the values over the years. Unfortunately the loaded data occupy quite a lot of memory (10Gb) - so my question is, what's the best approach to work with big data files? Can R get a value from the file data without fully loading it into memory? How can a slower computer with not enough memory work with such data? I use the following commands:
  data1993 = vector("list", 4);
  data1993[[1]] = read.spss(...) # first trimester
  data1993[[2]] = read.spss(...) # second trimester
  ...
  data_all = vector("list", 17); data_all[[1993]] = data1993; ...
and indexing, e.g.: data_all[[1993]][[1]]$DISTRICT, etc. Thanks, Petr Kurtin
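The difference between numeric and name-based indexing described above can be checked with a small base R sketch. The yearly objects here are tiny stand-ins (the real ones would come from read.spss), and the object names data1993 ... data2010 are created only for illustration.

```r
years <- 1993:2010

# Numeric index 1993 silently grows the list to 1993 slots
bad <- vector("list", 17)
bad[[1993]] <- "data for 1993"
length(bad)

# Stand-ins for the yearly objects data1993, data1994, ...
for (yr in years) {
  assign(paste0("data", yr), list(q1 = data.frame(DISTRICT = yr)))
}

# One slot per year, addressed by name rather than by position
data_all <- vector("list", length(years))
names(data_all) <- years
for (yr in years) {
  data_all[[as.character(yr)]] <- get(paste0("data", yr))
}

length(data_all)
data_all[["1993"]][[1]]$DISTRICT
```

Using length(years) instead of a hard-coded count avoids a mismatch between the list length and the number of names.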
Re: [R] repeat function for entire list of matrices
lapply() takes a function as its second argument, but that is not what you passed it. Also, there's no such construct in R as ".". What happens with the code I gave you? Michael
On Sun, Jan 29, 2012 at 4:28 PM, pabears danss...@gmail.com wrote: didn't seem to quite work: i tried different subsetting. lapply(nestedseasonlower, nested(nestedseason,.) are there any functions that can repeat a function while counting each iteration of the repeated function? (n=1, n=2, n=3) thanks -- View this message in context: http://r.789695.n4.nabble.com/repeat-function-for-entire-list-of-matrices-tp4334587p4339299.html Sent from the R help mailing list archive at Nabble.com.
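On the question of counting iterations while applying a function to a list: a common pattern is to loop over the indices with seq_along(), or to pass the counter as a second vector to Map(). The sketch below is an illustration only; col_sums_scaled() is a hypothetical stand-in for the poster's nested() function, whose definition does not appear in the thread.

```r
mats <- list(matrix(1:4, 2), matrix(5:8, 2), matrix(9:12, 2))

# Hypothetical function of a matrix m and an iteration counter n
col_sums_scaled <- function(m, n) colSums(m) * n

# Option 1: iterate over the indices, so n is available inside the function
res1 <- lapply(seq_along(mats), function(n) col_sums_scaled(mats[[n]], n))

# Option 2: Map() walks the list and the counter in parallel
res2 <- Map(col_sums_scaled, mats, seq_along(mats))

res1[[2]]
```

In both cases the second argument of lapply() or the first of Map() is a function, not a call to one.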
Re: [R] question on model.matrix
Greetings
On Sat, Jan 28, 2012 at 2:43 PM, Daniel Negusse daniel.negu...@my.mcphs.edu wrote: while reading some tutorials, i came across this and i am stuck. i want to understand it and would appreciate if anyone can tell me. design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3))) can someone break down this code and explain to me what the "~", and the "-1+factor" are doing?
A formula would be y ~ x, so when you don't include y, it means you only want the right hand side variables. The term "design matrix" generally means the numeric coding that is fitted in a statistical procedure. The -1 in the formula means "do not insert an intercept for me." It affects the way the factor variable is converted to numeric contrasts in the design matrix. If there is an intercept, then the contrasts have to be adjusted to prevent perfect multicollinearity. If you run a few examples, you will see. This uses lm, but the formula and design matrix ideas are the same. Note, with an intercept, I get 3 dummy variables from x2, but with no intercept, I get 4 dummies:
  x1 <- rnorm(16)
  x2 <- gl(4, 4, labels=c("none","some","more","lots"))
  y <- rnorm(16)
  m1 <- lm(y ~ x1 + x2)
  model.matrix(m1)
     (Intercept)          x1 x2some x2more x2lots
  1            1 -0.2567          0      0      0
  2            1  0.94963659      0      0      0
  3            1  0.06915561      0      0      0
  4            1  0.89971204      0      0      0
  5            1  0.73817482      1      0      0
  6            1  2.92451195      1      0      0
  7            1 -0.80682449      1      0      0
  8            1  1.07472998      1      0      0
  9            1  1.34949123      0      1      0
  10           1 -0.42203984      0      1      0
  11           1 -1.66316740      0      1      0
  12           1 -2.83232063      0      1      0
  13           1  1.26177313      0      0      1
  14           1  0.10359857      0      0      1
  15           1 -1.85671242      0      0      1
  16           1 -0.25140729      0      0      1
  attr(,"assign")
  [1] 0 1 2 2 2
  attr(,"contrasts")
  attr(,"contrasts")$x2
  [1] "contr.treatment"

  m2 <- lm(y ~ -1 + x1 + x2)
  model.matrix(m2)
              x1 x2none x2some x2more x2lots
  1  -0.2567          1      0      0      0
  2   0.94963659      1      0      0      0
  3   0.06915561      1      0      0      0
  4   0.89971204      1      0      0      0
  5   0.73817482      0      1      0      0
  6   2.92451195      0      1      0      0
  7  -0.80682449      0      1      0      0
  8   1.07472998      0      1      0      0
  9   1.34949123      0      0      1      0
  10 -0.42203984      0      0      1      0
  11 -1.66316740      0      0      1      0
  12 -2.83232063      0      0      1      0
  13  1.26177313      0      0      0      1
  14  0.10359857      0      0      0      1
  15 -1.85671242      0      0      0      1
  16 -0.25140729      0      0      0      1
  attr(,"assign")
  [1] 1 2 2 2 2
  attr(,"contrasts")
  attr(,"contrasts")$x2
  [1] "contr.treatment"
I think you'll need to mess about with R basics like plot and lm before you go off using the formulas that you really care about. Otherwise, well, you'll always be lost about stuff like ~ and -1. I've started posting all my lecture notes (source code, R code, pdf output) at http://pj.freefaculty.org/guides. That might be a quick start for you. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
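The same point can be seen on the exact expression from the original question. The short sketch below builds the design matrix with and without the intercept; the object names g, X_noint and X_int are chosen only for this illustration.

```r
g <- factor(c(1, 1, 2, 2, 3, 3))

# No intercept: one indicator column per factor level
X_noint <- model.matrix(~ -1 + g)
colnames(X_noint)

# With intercept: the first level is absorbed into the intercept,
# leaving two treatment-contrast dummies
X_int <- model.matrix(~ g)
colnames(X_int)

# Each row of the no-intercept matrix has exactly one 1
rowSums(X_noint)
```

Both matrices span the same column space; only the parameterization (group means versus baseline plus differences) changes.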
Re: [R] handling a lot of data
If you do not need all the variables in the SPSS files, use package 'memisc'. spss.system.file() and its subset() method allow you to load just the variables needed. You will need to transform the result into a data.frame, as the memisc data.set includes the SPSS attributes, user-missings etc. Paul Bivand Centre for Economic and Social Inclusion London
[R] discrete simulated annealing
Dear All, I need to use simulated annealing for optimization. Is there a way to limit the search space to only discrete values, and also to exclude certain solutions, e.g. exclude the solutions where all the variables are the same? Many thanks, Yan
Re: [R] RCurl format
Hi KTD Services (!) I assume by DELETE, you mean the HTTP method and not the value of a parameter named _method that is processed by the URL script. If that is the case, then you want to use the customRequest option for the libcurl operation and you don't need or want to use postForm(). Either
  curlPerform(url = url, customrequest = "DELETE", userpwd = "user:password")
or, with a recent version of the RCurl package,
  httpDELETE(url, userpwd = "user:password")
The parameter _method you are using is being passed on to the form script. It is not recognized by postForm() as being something controlling the request, but just part of the form submission. D.
On 1/30/12 2:55 AM, KTD Services wrote: I am having trouble with the postForm function in RCurl. I want to send the command DELETE https://somewebsite.com.json but I can't seem to find it. I could try: postForm(url, _method="DELETE", .opts = list(username:password) ) but I get the error: Error: unexpected input in "postForm(url4, _" This error seems to be due to the underscore _ before method. Any ideas how I can do a DELETE command another way in RCurl? Thanks.
Re: [R] ROC curve
On Jan 30, 2012, at 4:52 AM, Josiane NJIWA wrote: Hello all, I am very new to R and I am facing two problems. First, I didn't succeed in changing the console language to English even after trying the line command set language='en'.
R is a functional language, so it shouldn't surprise you that issuing a command does not do what you apparently expected based on your experience with macro languages. You should read: ?locales
I would like to plot ROC curves. I have a series of 10 threshold tests that I do for 10 patients. The prediction for the patients is always the same, but the status can change according to the threshold considered. I have 11 columns of 10 rows, the first column containing the 10 lines of the predicted status of the patients (0=cured, 1=non cured). Then follow 10 columns (10 thresholds) containing the status found using each threshold. Does someone know how I can use those values with R to plot ROC curves? I thank you for your understanding, Josiane. Everything should be made as simple as possible, but not simpler. Albert Einstein.
David Winsemius, MD Heritage Laboratories West Hartford, CT
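For the language question, settings in R are changed by calling functions rather than by typing a shell-style command. A minimal sketch follows; it assumes an R build with message translation support, where the LANGUAGE environment variable selects the language of messages, while Sys.setlocale() (documented under ?locales) governs locale categories such as collation and date formats.

```r
# Ask R to produce messages in English for the rest of this session
Sys.setenv(LANGUAGE = "en")
Sys.getenv("LANGUAGE")

# Inspect a locale category; see ?locales and ?Sys.setlocale for changing it
Sys.getlocale("LC_COLLATE")
```

To make the setting permanent it can be placed in a startup file such as .Renviron (as LANGUAGE=en); whether that is needed depends on the installation.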
Re: [R] And Statement for two if functions
On Jan 30, 2012, at 9:52 AM, kerry1912 wrote: I want to perform two if functions at the same time: if(home team away team home team = away team + 7) in R but i am struggling to work out how to write this correctly.
Generally newcomers to the R language find that the ifelse function does what they expect. The if function is quite different and seems less likely to be what you want: ?Control ?ifelse -- David Winsemius, MD Heritage Laboratories West Hartford, CT
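The difference between if and ifelse can be sketched as follows. The comparison operators in the original post were lost in the archive, so the condition used here (home score greater than away score, but by no more than 7) is only an assumed reading, and the variable names and labels are made up.

```r
# Scalar case: if() with && takes a single TRUE/FALSE
home <- 10
away <- 5
if (home > away && home <= away + 7) {
  result <- "narrow home win"
} else {
  result <- "other"
}

# Vector case: ifelse() with & works element by element on whole columns
home_v <- c(10, 3, 20)
away_v <- c(5, 4, 6)
result_v <- ifelse(home_v > away_v & home_v <= away_v + 7,
                   "narrow home win", "other")
result_v
```

When the scores sit in data frame columns, the ifelse() form is usually the one wanted, since if() only looks at a single condition.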
Re: [R] discrete simulated annealing
On Mon, Jan 30, 2012 at 04:57:36PM +, yan jiao wrote: Dear All, I need to use simulated annealing for optimization is there a way to limit the search place to only discrete values? And also exclude certain solutions, e.g. exclude the solutions when all the variables are the same?
Dear Yan: The page ?optim says "If a function to generate a new candidate point is given, method ‘SANN’ can also be used to solve combinatorial optimization problems." If you supply your own function to generate a new point, this function can apply the required restrictions. Hope this helps. Petr Savicky.
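A minimal sketch of this suggestion: with method "SANN", the gr argument of optim() is used as the candidate generator, so the restrictions can be enforced there. The objective fn, the integer range 0 to 10, the one-step move and the control values below are all made up for illustration.

```r
set.seed(123)
target <- c(3, 7, 1)
fn <- function(x) sum((x - target)^2)       # hypothetical objective

# Candidate generator: move one coordinate by +/- 1, stay inside 0..10,
# and reject candidates in which all variables are equal.
gen <- function(x, ...) {
  repeat {
    cand <- x
    i <- sample(length(x), 1)
    cand[i] <- cand[i] + sample(c(-1, 1), 1)
    cand <- pmin(pmax(cand, 0), 10)
    if (length(unique(cand)) > 1) return(cand)
  }
}

res <- optim(c(5, 4, 5), fn, gen, method = "SANN",
             control = list(maxit = 5000, temp = 10))
res$par
res$value
```

Because every candidate comes from gen(), the search only visits integer points that satisfy the exclusion rule, provided the starting point does too.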
Re: [R] Checking for invalid dates: Code works but needs improvement
On Jan 30, 2012, at 8:44 AM, Paul Miller wrote: Hi Rui, Marc, and Gabor, Thanks for your replies to my question. All were helpful and it was interesting to see how different people approach various aspects of the same problem. I spent some time this weekend looking at Rui's solution, which is certainly much clearer than my own. I managed to figure out pretty much all the details of how it works, and also managed to tweak it slightly in order to make it do exactly what I wanted. (See revised code below.) I still have a couple of questions though. The first concerns the insertion of the code Y > 2012 to set year values beyond 2012 to NA (on line 10 of the function below). When I add this (or use it in place of nchar(Y) > 4), the code successfully finds the problem date 05/16/2015. After that though, it produces the following error message:
  Error in if (any(is.na(x) & M != "un" & Y != "un")) cat("Warning: Invalid date values in", : missing value where TRUE/FALSE needed

It's a bit dangerous to use comparison operators on mixed data types. In your case you are comparing a character value to a numeric value and may not realize that "2015" is not the same as 2015. Try "123" > 1000 if you want a quick counter-example. You may want to coerce the Y value to numeric mode to be safe. Also 'any' does not expect the logical connectives. You probably want: any(is.na(x), M != "un", Y != "un")

Why is this happening? If the code correctly handles the date 06/20/1840 without producing an error, why can't it do likewise with 05/16/2015? The second question is why it's necessary to put x on line 15 following cat("Warning ..."). I know that I don't get any date columns if I don't include this but am not sure why. The third question is whether it's possible to change the class of the date variables without using a for loop. I played around with this a little but didn't find a vectorized alternative. It may be that this is not really important. It's just that I've read in several places that for loops should be avoided wherever possible. Thanks, Paul
  ## Code for detecting invalid dates
  ## Test Data
  connection <- textConnection("
  1 11/23/21931 05/23/2009 un/17/2011
  2 06/20/1840 02/30/2010 03/17/2011
  3 06/17/1935 12/20/2008 07/un/2011
  4 05/31/1937 01/18/2007 04/30/2011
  5 06/31/1933 05/16/2015 11/20/un
  ")
  TestDates <- data.frame(scan(connection, list(Patient=0, birthDT="", diagnosisDT="", metastaticDT="")))
  close(connection)
  ## Input Data
  TDSaved <- TestDates
  ## List of Date Variables
  DateNames <- c("birthDT", "diagnosisDT", "metastaticDT")
  ## Date Function
  fun <- function(Dat){
    f <- function(jj, DF){
      x <- as.character(DF[, jj])
      x <- unlist(strsplit(x, "/"))
      n <- length(x)
      M <- x[seq(1, n, 3)]
      D <- x[seq(2, n, 3)]
      Y <- x[seq(3, n, 3)]
      D[D == "un"] <- "15"
      Y <- ifelse(nchar(Y) > 4 | Y > 2012 | Y < 1900, NA, Y)
      x <- as.Date(paste(Y, M, D, sep="-"), format="%Y-%m-%d")
      if(any(is.na(x) & M != "un" & Y != "un"))
        cat("Warning: Invalid date values in", jj, "\n", as.character(DF[is.na(x), jj]), "\n")
      x
    }
    Dat <- data.frame(sapply(names(Dat), function(j) f(j, Dat)))
    for(i in names(Dat)) class(Dat[[i]]) <- "Date"
    Dat
  }
  ## Output Data
  TD <- TDSaved
  ## Read Dates
  TD[, DateNames] <- fun(TD[, DateNames])
  TD
David Winsemius, MD Heritage Laboratories West Hartford, CT
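The two pitfalls discussed in this message, comparing character values with numbers and feeding NA into if(), can be reproduced in a few lines of base R. The vector Y below is a made-up example, not the poster's data, and the isTRUE() guard is one possible remedy rather than the fix proposed in the thread.

```r
# Character vs numeric: 1000 is coerced to "1000" and compared as text
chr_cmp <- "123" > 1000              # TRUE, because "2" sorts after "0"
num_cmp <- as.numeric("123") > 1000  # FALSE, the numeric comparison

# Coerce years to numeric before comparing; "un" becomes NA
Y <- c("2009", "2015", "un")
Y_num <- suppressWarnings(as.numeric(Y))
too_late <- !is.na(Y_num) & Y_num > 2012

# any() returns NA when nothing is TRUE but something is NA,
# and if(NA) stops with "missing value where TRUE/FALSE needed"
cond <- any(c(NA, FALSE))
safe <- isTRUE(cond)                 # FALSE, safe to use inside if()
```

This is why setting Y to NA for out-of-range years makes the later test Y != "un" return NA and trips the if() statement.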
Re: [R] Variable selection based on both training and testing data
I do not have enough test data for regression analysis, although I know there are some statistical regression methods that can be used for small datasets. That is why I need to build a model first using the training dataset. Thanks, Jim
--- On Mon, 30/1/12, Liaw, Andy andy_l...@merck.com wrote: From: Liaw, Andy andy_l...@merck.com Subject: RE: [R] Variable selection based on both training and testing data To: 'Jin Minming' jminm...@yahoo.com, r-help@r-project.org r-help@r-project.org Date: Monday, 30 January, 2012, 13:39
Variable selection is part of the training process -- it chooses the model. By definition, test data is used only for testing (evaluating the chosen model). If you find a package or function that does variable selection on test data, run from it! Best, Andy
[R] replacing characters in matrix. substitute, delayedAssign, huh?
A user question today has me stumped. Can you advise me, please? The user wants a matrix that has some numbers, some variables, possibly even some function names. So that has to be a character matrix. Consider:
  BM <- matrix(0.1, 5, 5)
Use data.entry(BM) or similar to set some entries to more abstract values.
  BM[3,1] <- "a"
  BM[4,2] <- "b"
  BM[5,2] <- "b"
  BM[5,3] <- "d"
  BM
       var1  var2  var3  var4  var5
  [1,] "0.1" "0.1" "0.1" "0.1" "0.1"
  [2,] "0.1" "0.1" "0.1" "0.1" "0.1"
  [3,] "a"   "0.1" "0.1" "0.1" "0.1"
  [4,] "0.1" "b"   "0.1" "0.1" "0.1"
  [5,] "0.1" "b"   "d"   "0.1" "0.1"
Later on, user code will set values, e.g.,
  a <- rnorm(1)
  b <- 17
  d <- 4
Now, push those into BM, convert the whole thing to numeric
  newBM <- apply(BM, c(1,2), as.numeric)
and use newBM for some big calculation. Then re-set new values for a, b, d, and do the same over again. I've been trying lots of variations on parse, substitute, and eval. The most interesting function I learned about this morning was delayedAssign. If I had only to work with one scalar, it does what I want:
  delayedAssign("a", whatA)
  whatA <- 91
  a
  [1] 91
I can't see how to make that work in the matrix context, though. Got ideas? pj
  sessionInfo()
  R version 2.14.1 (2011-12-22)
  Platform: x86_64-pc-linux-gnu (64-bit)
  locale:
  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
  attached base packages:
  [1] stats graphics grDevices utils datasets methods base
  loaded via a namespace (and not attached):
  [1] tools_2.14.1
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
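One possible way to approach the workflow described above is to treat each cell as a small R expression and evaluate it whenever a numeric copy is needed. This is only a sketch of that idea, not an answer given in the thread; the helper fill_matrix() is a made-up name, and eval(parse()) should only be applied to cell contents that are trusted.

```r
BM <- matrix(0.1, 5, 5)
BM[3, 1] <- "a"      # assigning a string turns BM into a character matrix
BM[4, 2] <- "b"
BM[5, 2] <- "b"
BM[5, 3] <- "d"

# Evaluate every cell as R code in the given environment and
# return a numeric matrix of the same shape (hypothetical helper).
fill_matrix <- function(M, env = parent.frame()) {
  out <- vapply(M, function(s) eval(parse(text = s), envir = env),
                numeric(1), USE.NAMES = FALSE)
  dim(out) <- dim(M)
  out
}

a <- 2; b <- 17; d <- 4
newBM <- fill_matrix(BM)

# Re-set the values and evaluate again; BM itself is unchanged
b <- 100
newBM2 <- fill_matrix(BM)
```

Because the character matrix keeps the symbols, the same template can be re-evaluated after each change to a, b or d.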
[R] plot with ylim with regural interval
Dear Researchers,

sorry for the easy question, but is it possible to plot with an interval of 1 or .5 in a plot using ylim?

Thanks
gianni

    x = 0:10; y = 0:10; plot(x~y,ylim=c(0,10),las=1)
Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?
Are you sure this isn't a data frame? Some minor rethinking of the structure might get it there.

Rich

On Mon, Jan 30, 2012 at 1:26 PM, Paul Johnson <pauljoh...@gmail.com> wrote:
> User wants a matrix that has some numbers, some variables, possibly even some function names. So that has to be a character matrix.
Re: [R] how to select columns
On Jan 30, 2012, at 2:30 AM, David Studer wrote:

> Hello,
>
> I have the following question: when creating a data.frame
>
>     a1<-c(1,2,3)
>     a2<-c(1,2,3)
>     c<-data.frame(a1,a2)
>
> I can select columns using an index like:
>
>     c[,1:2]
>
> Is this possible too when using column-names? (something like c(,a1:a2), which doesn't work):

Generally you need to use grep to convert column names to numbers for use within "[" operations:

    df[ , grep("^a1$", names(df)):grep("^a2$", names(df)) ]

--
Another David

> Alternative question: Is there a function to get the index of a variable by name

That's what grep will do.

> or can I select certain columns using a loop? (a_1, a_2, ..., a_n)
>
> Thank you very much!
> David

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
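The name-to-index lookup described above can also be done without regular expressions. The following is a minimal sketch, assuming a made-up data frame `df` with hypothetical columns `a1`, `b`, `a2` and `z`; `match()` returns the position of each exact name, which can then be used in a `start:end` range.

```r
# Hypothetical data frame; the column names are for illustration only
df <- data.frame(a1 = 1:3, b = 4:6, a2 = 7:9, z = 10:12)

# Positions of the two columns, matched by exact name
idx <- match(c("a1", "a2"), names(df))

# Every column from a1 through a2, by position
block <- df[, idx[1]:idx[2]]

# Only the two named columns, no positions needed
pair <- df[, c("a1", "a2")]
```

Unlike `grep()`, `match()` compares whole strings, so no `^` and `$` anchors are needed to avoid partial hits such as `a10`.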
[R] ode() tries to allocate an absurd amount of memory
Hi there R-helpers:

I'm having problems with the function ode() found in the package deSolve. It seems that when my state variables are too numerous (33000 elements), the function throws the following error:

    Error in vode(y, times, func, parms, ...) :
      cannot allocate memory block of size 137438953456.0 Gb
    In addition: Warning message:
    In vode(y, times, func, parms, ...) : NAs introduced by coercion

This appears to be the case regardless of the computer I use; that is, whether it's a laptop or a server with 24Gb of RAM. Why is ode() trying to allocate 137 billion gigabytes of memory?! (I receive exactly the same error message whether I have, for example, 34000 or 8 state variables: the amount of memory it tries to allocate is exactly the same.)

I have included a trivial example below that uses a function that returns a rate of change of zero for all state variables.

    require(deSolve)
    Loading required package: deSolve
    C<-rep(0,34000)
    TestFunc<-function(t,C,para){
      return(list(rep(0,length(C))))
    }
    soln<-ode(y=C,times=seq(0,1,0.1),func=TestFunc,parms=c(0),method="vode")
    Error in vode(y, times, func, parms, ...) :
      cannot allocate memory block of size 137438953456.0 Gb
    In addition: Warning message:
    In vode(y, times, func, parms, ...) : NAs introduced by coercion

Am I making a foolish mistake somewhere or is this simply a limitation of the function?

Thanks in advance!
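One plausible reading of this error, offered as an assumption and not a confirmed diagnosis: the Fortran DVODE solver documents a real work array of length 22 + 9*NEQ + 2*NEQ^2 when a full Jacobian is used, and if that length is computed as a 32-bit integer it overflows somewhere between 32000 and 33000 states, which would fit the "NAs introduced by coercion" warning. The sketch below only does the arithmetic in base R; the helper name `vode_full_workspace` is made up and nothing here calls deSolve.

```r
# Length of the real work array documented for DVODE with a full Jacobian.
# Assumption: this is what ode(..., method = "vode") requests by default.
vode_full_workspace <- function(n) 22 + 9 * n + 2 * n^2

n_states <- c(8000, 32000, 33000, 34000)
needed   <- vode_full_workspace(n_states)

# Compare the requested length with the largest 32-bit integer
report <- data.frame(
  n_states      = n_states,
  doubles       = needed,
  gigabytes     = needed * 8 / 1024^3,
  fits_in_int32 = needed <= .Machine$integer.max
)
report
```

If this reading is right, the full Jacobian is the problem, not the state vector itself. deSolve also provides solvers and settings for banded or sparse Jacobians (for example `method = "lsodes"` or `ode.1D`), whose workspace grows roughly linearly with the number of states; whether they suit a given model depends on its structure.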
[R] Euler identity with complex exp
Hi,

Am I doing something silly here in expecting Euler's formula to be handled by exp?

    exp( ix ) = cos x + i sin x

The first example below follows this, the others not. Thanks for the education!

    exp( complex(real = 0, imag = 2*pi) )
    [1] 1-0i
    exp( complex(real = pi, imag = 2*pi) )
    [1] 23.14069-0i
    exp( complex(real = pi/2, imag = 0) )
    [1] 4.810477+0i
[R] Fw: Variable selection based on both training and testing data
From: SR Millis <srmil...@yahoo.com>
To: Jin Minming <jminm...@yahoo.com>
Sent: Monday, January 30, 2012 9:25 AM
Subject: Re: [R] Variable selection based on both training and testing data

Jim,

First, stepwise methods for variable selection should be avoided. Frank Harrell (in Regression Modeling Strategies) discusses this at length.

Second, splitting a dataset into training and validation sets is generally not a good idea unless you have a really large sample, eg, 20,000. As Harrell has discussed, split-sample validation does not provide external validation, is terribly inefficient, and is arbitrary.

It's better to specify your model a priori and use the bootstrap to obtain an estimate of your model's over-optimism. Bootstrapping can be implemented with Harrell's rms package in R.

Scott
~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Professor
Wayne State University School of Medicine
Email: aa3...@wayne.edu
Email: srmil...@yahoo.com
Tel: 313-993-8085

To: r-help@r-project.org
Sent: Monday, January 30, 2012 8:14 AM
Subject: [R] Variable selection based on both training and testing data

Dear all,

The variable selection in regression is usually determined by the training data using AIC or F value, such as stepAIC. Is there some R package that can consider both the training and test dataset? For example, I have two separate training data and test data. Firstly, a regression model is obtained by using training data, and then this model is tested by using test data. This process continues in order to find some possible optimal models in terms of RMSE or R2 for both training and test data.

Thanks,
Jim
[R] Reg : Hello all.. help needed regarding heatmaps
Hello all,

I am a beginner and new to the R world. I have heard much about R and have started working with it. I have some data for 20 business applications (y-axis) by month (x-axis), with the values being their scores for each month. I tried to generate a heatmap with this data and got some good results. Can someone help me with how to generate the legend next to the heatmap, please? Can someone send me some sample code?

A useful link that I found on the web: http://www.oga-lab.net/RGM2/func.php?rd_id=gplots:heatmap.2

Thanks in advance.
Re: [R] Calculate a function repeatedly over sections of a ts object
Thank you very much Mike. The script is working now.

Jorge

From: R. Michael Weylandt [michael.weyla...@gmail.com]
Sent: 30 January 2012 04:29
To: Jorge Molinos; r-help
Subject: Re: [R] Calculate a function repeatedly over sections of a ts object

Sorry, that last line should read:

    FUN=function(z){
      lz <- length(z)
      SDF(z,method="lag window", window=taper(type="parzen",n.sample=lz,cutoff= 2*sqrt(lz)), npad=2*lz)
    }

On Sun, Jan 29, 2012 at 11:29 PM, R. Michael Weylandt <michael.weyla...@gmail.com> wrote:

It's customary to keep the list cc'd.

I can't run your code without the data, but it does seem to me that your problem is in the FUN argument, as you guess. You have:

    FUN=function(z) SDF(adezoo,method="lag window", window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))), npad=2*n.d)

But this function doesn't actually act on its argument: you tell it to accept something called z but then it never gets told to do anything to z. Perhaps you meant

    FUN=function(z) SDF(z,method="lag window", window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))), npad=2*n.d)

I also worry about your use of n.d; are you sure you don't want to use the length of the rolling window? Something more like:

    FUN=function(z){
      lz <- length(z)
      SDF(z,method="lag window", window=taper(type="parzen",n.sample=lz,cutoff= 2*sqrt(lz)), npad=2*nlz)
    }

Does that fix it?

Michael

On Fri, Jan 27, 2012 at 1:06 PM, Jorge Molinos <jgarc...@tcd.ie> wrote:

Hi Michael,

Sorry, I've been trying to use rollapply with my function but it seems I can't get it to work properly. The function seems to be dividing the time series accordingly (every 1) and using the correct length for the time window (10 years), but when I look at the results all of them are the same for all the subseries, which doesn't make sense. The problem has to be within the FUN argument, though I cannot figure out what it is. Would you mind checking the code to see if you can spot where the problem is?

    adets<-ts(adeery$DA,c(adeery$Year[1],adeery$Day[1]),frequency=365)
    adezoo<-as.zoo(adets)
    n.d<-length(adets)
    especlist<-rollapply(adezoo, width=3650, FUN=function(z) SDF(adezoo,method="lag window", window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))), npad=2*n.d), by = 365, align="left")

And these are, for example, the SDF values at the last day for each 10-y subseries (all the same, though they should be different, as I have verified by doing the SDF step by step using the same values for the arguments within the function):

    especlist1.7048
    1978(20) 1.998068e-06
    1979(20) 1.998068e-06
    1980(20) 1.998068e-06
    1981(20) 1.998068e-06
    1982(20) 1.998068e-06
    1983(20) 1.998068e-06
    1984(20) 1.998068e-06
    1985(20) 1.998068e-06
    1986(20) 1.998068e-06
    1987(20) 1.998068e-06

Thanks a lot.

Jorge

From: R. Michael Weylandt [michael.weyla...@gmail.com]
Sent: 26 January 2012 21:00
To: Jorge Molinos
Cc: r-help@R-project.org
Subject: Re: [R] Calculate a function repeatedly over sections of a ts object

I'm not sure if it's easily doable with a ts class, but the rollapply function in the zoo package will do this easily. (Also, I find zoo to be a much more natural time-series workflow than ts, so it might make the rest of your life easier as well.)

Michael

On Thu, Jan 26, 2012 at 2:24 PM, Jorge Molinos <jgarc...@tcd.ie> wrote:

Hi,

I want to apply a function (in my case SDF; package “sapa”) repeatedly over discrete sections of a daily time series object by sliding a time window of constant length (e.g. 10 consecutive years or 1825 days) over the entire ts at increments of 1 time unit (e.g. 1 year or 365 days). So for example, the first SDF would be calculated for the daily values of my variable recorded between years 1 to 5, SDF2 to those for years 2 to 6 and so on until the total length of the series is covered. How can I implement this into a R script?

Any help is much appreciated.

Jorge
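The bug discussed in this thread (a window function that ignores its argument) is easy to reproduce without zoo or sapa. The sketch below uses a small base-R helper, `roll_windows`, which is a hypothetical stand-in for `rollapply`, and `var()` as a stand-in for `SDF`; the daily series is simulated.

```r
# Apply FUN to windows of `width` observations, advancing `by` observations
roll_windows <- function(x, width, by, FUN) {
  starts <- seq(1, length(x) - width + 1, by = by)
  vapply(starts, function(s) FUN(x[s:(s + width - 1)]), numeric(1))
}

set.seed(42)
daily <- cumsum(rnorm(365 * 15))   # simulated 15-year daily series

# Buggy pattern: FUN accepts z but computes on the whole series every time
same <- roll_windows(daily, width = 3650, by = 365,
                     FUN = function(z) var(daily))

# Corrected pattern: FUN works on the window it is handed
diffs <- roll_windows(daily, width = 3650, by = 365,
                      FUN = function(z) var(z))
```

`same` holds one value repeated for every window, which matches the symptom in the posted output, while `diffs` varies from window to window. The same reasoning applies to quantities derived from the series length, such as `n.d` in the original call: inside the window function they should come from `length(z)`.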
[R] how to sum multiple data entries for the same sampling event?
I'm having trouble with some catch per unit effort data (CPUE, fisheries data). Some of the samples were retained and some unretained, and they are entered as 2 separate entries for the same sampling event (Date and time). I want to calculate the total CPUE (so sum the retained and unretained number for each sampling event) and am having trouble doing so. Here's a sample of what my data.frame looks like now:

    Date                 lmb.cpue    Disposition.of.Catch
    1999-07-10 12:10:00  0.6667      Unretained
    1999-07-10 12:10:00  0.1667      Retained
    1999-07-14 11:22:00  0.8333      Unretained
    1999-07-14 11:22:00  0.5556      Retained
    1999-07-14 11:48:00  0.1667      Unretained
    1999-07-14 11:48:00  0.5833      Retained
    1999-07-14 13:56:00  0.57142857  Retained
    1999-07-15 10:23:00  0.          Retained
    1999-07-22 12:03:00  0.          Retained
    1999-07-25 11:26:00  0.4000      Unretained
    1999-07-25 11:26:00  1.          Retained

And I would like to end up with:

    Date                 lmb.cpue
    1999-07-10 12:10:00  0.8333
    1999-07-14 11:22:00  1.3889
    1999-07-14 11:48:00  0.7500
    1999-07-14 13:56:00  0.57142857
    1999-07-15 10:23:00  0.
    1999-07-22 12:03:00  0.
    1999-07-25 11:26:00  1.4000

Thanks for any help you have to offer!
Re: [R] plot with ylim with regural interval
Hi Gianni,

Yes, take a look at

    x <- y <- 1:10
    plot(x, y, ylim=c(0,10),las=1, yaxt = 'n')
    axis(2, seq(0, 10, by = .5), seq(0, 10, by = .5), las = 2)

    plot(x, y, ylim=c(0,10),las=1, yaxt = 'n')
    axis(2, seq(0, 10, by = 1), seq(0, 10, by = 1), las = 2)

Also, check ?plot and ?par for more details.

HTH,
Jorge.-

On Mon, Jan 30, 2012 at 1:28 PM, gianni lavaredo wrote:
> Is it possible to plot with an interval of 1 or .5 in a plot using ylim?
Re: [R] Fw: Variable selection based on both training and testing data
Dear Scott,

I am so sorry that I think I just sent an empty email to you.

Thanks a lot for your advice. The problem is that we do not have sufficient prior knowledge of the regression form or even of the appropriate inputs. We need to try to find some possible regression equations, then add our explanation to them. So we need to explore a lot of options.

The two input datasets are very different in nature and they are from two locations. Hence, one of them can be used for testing purposes, although it may turn out that there is no appropriate regression due to the intrinsic differences between these two datasets.

In fact, if I can extract the models used (not only the final model) in the stepAIC function, then it will be easier to add some simple scripts to calculate R2 or RMSE for both datasets.

Thanks,
Jim

--- On Mon, 30/1/12, SR Millis <aa3...@wayne.edu> wrote:
> It's better to specify your model a priori and use the bootstrap to obtain an estimate of your model's over-optimism.
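On the narrower question of getting at the intermediate models: base R's `step()` accepts a `keep` argument, a function that is called with each model retained along the selection path and its AIC, and whose results are returned in the `keep` component. The sketch below is only an illustration under stated assumptions: it splits the built-in `mtcars` data into made-up "training" and "test" halves, `rmse` and `record_fit` are hypothetical helper names, and it does not address the objections to stepwise selection raised earlier in this thread.

```r
# Made-up split of a built-in dataset, standing in for the two locations
train <- mtcars[seq(1, nrow(mtcars), by = 2), ]
test  <- mtcars[seq(2, nrow(mtcars), by = 2), ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Called by step() on the model retained at each step of the path
record_fit <- function(model, aic) {
  c(rmse_train = rmse(train$mpg, fitted(model)),
    rmse_test  = rmse(test$mpg, predict(model, newdata = test)),
    aic        = aic)
}

path <- step(lm(mpg ~ 1, data = train),
             scope = list(lower = ~ 1, upper = ~ wt + hp + disp),
             direction = "forward", trace = 0, keep = record_fit)

# One column per model on the path, one row per recorded quantity
visited <- path$keep
```

Comparing `visited["rmse_train", ]` with `visited["rmse_test", ]` shows how each model on the path fares on the second dataset. Choosing a model on that basis reuses the test data for selection, so the resulting test RMSE is optimistic.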
Re: [R] how to select columns
On Jan 30, 2012, at 12:33 PM, David Winsemius wrote:

> Generally you need to use grep to convert column names to numbers for use within "[" operations:
>
>     df[ , grep("^a1$", names(df)):grep("^a2$", names(df)) ]

Just to throw out another option here, the ?subset function has a 'select' argument, which supports a start:end syntax to extract sequential columns from a data frame. Thus:

    subset(DF, select = StartColumnName:EndColumnName)

gets you that ability. The column names are NOT quoted, so in your case:

    subset(DF, select = a1:a2)

You can even select sequential and non-sequential columns by using c() along with the start:end syntax:

    subset(DF, select = c(ColA, ColF:ColH, ColK, ColN:ColW, ColZ))

HTH,

Marc Schwartz
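A short runnable sketch of the `select` syntax described above, using a made-up data frame `DF` with hypothetical columns. It also covers the "loop over a_1, ..., a_n" part of the original question by building the names with `paste0()` and indexing with the resulting character vector.

```r
# Hypothetical data frame for illustration
DF <- data.frame(id = 1:3,
                 a1 = c(1, 2, 3), a2 = c(4, 5, 6), a3 = c(7, 8, 9),
                 note = c("x", "y", "z"))

# Sequential columns by unquoted start:end names
run <- subset(DF, select = a1:a3)

# A single column combined with a sequential run
mixed <- subset(DF, select = c(id, a2:a3))

# Names built programmatically, no explicit loop required
wanted  <- paste0("a", 1:3)
by_name <- DF[, wanted]
```

`subset()` is convenient at the prompt, but its documentation recommends standard `[` indexing, as in the last line, for code inside functions.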
Re: [R] repeat function for entire list of matrices
michael, i don't know what happened, i was reading up on ?lapply(), i was up really late, and somehow it didn't seem to take, but i tried it again this morning and it worked like a charm. (sorry about the ellipses, i was just being lazy/unclear). that's great, thanks, this is a great help...
[R] Linear Mixed Model set-up
Hello,

I have some data covering contaminant concentrations in fish over a time period of ~35 years. Each year, multiple samples of fish were taken (with varying sample sizes each year). Ultimately, I want an estimate of the variance between years, and the variance within years + random effects. I used a linear mixed model to estimate these variances, but after reading a number of different references and examples, I am still unclear as to whether I have set up the model correctly to obtain these values. I've used the *lme* function as follows - the example here is on an abbreviated version of my data set:

    fish<-read.csv("data.csv",header=TRUE)
    fish
       SPECIES YEAR CONTAMINANT
    1  Walleye 1970        2.83
    2  Walleye 1970        2.56
    3  Walleye 1970        2.83
    4  Walleye 1970        2.56
    5  Walleye 1970        2.77
    6  Walleye 1970        2.56
    7  Walleye 1970        2.64
    8  Walleye 1970        2.22
    9  Walleye 1970        2.56
    10 Walleye 1970        2.40
    11 Walleye 1975        1.59
    12 Walleye 1975        1.53
    13 Walleye 1975        2.16
    14 Walleye 1975        1.60
    15 Walleye 1975        2.16
    16 Walleye 1976        2.03
    17 Walleye 1976        1.97
    18 Walleye 1976        1.95
    19 Walleye 1976        2.36
    20 Walleye 1976        1.82
    21 Walleye 1976        1.99
    22 Walleye 1977        1.06
    23 Walleye 1977        2.00
    24 Walleye 1977        1.97
    25 Walleye 1977        2.00
    26 Walleye 1977        1.99
    27 Walleye 1977        1.95
    28 Walleye 1977        2.10
    29 Walleye 1977        2.29
    30 Walleye 1977        2.20
    31 Walleye 1979        1.90
    32 Walleye 1979        1.98
    33 Walleye 1979        2.00
    34 Walleye 1979        2.11
    35 Walleye 1980        1.92
    36 Walleye 1980        2.00
    37 Walleye 1980        1.98
    38 Walleye 1980        2.25
    39 Walleye 1981        1.22
    40 Walleye 1981        1.36
    41 Walleye 1981        1.48
    42 Walleye 1981        1.86
    43 Walleye 1981        1.41
    44 Walleye 1982        1.25
    45 Walleye 1982        1.10
    46 Walleye 1982        1.28
    47 Walleye 1982        1.28
    48 Walleye 1982        1.77
    49 Walleye 1982        1.59
    50 Walleye 1982        1.61
    51 Walleye 1982        1.55
    52 Walleye 1984        1.25
    53 Walleye 1984        1.41
    54 Walleye 1984        1.50
    55 Walleye 1984        1.39

    contaminant<-fish$CONTAMINANT
    year<-fish$YEAR
    mod<-lme(contaminant~year,random=~1|year,data=data)
    varcomp(mod,cum=FALSE)
          year     Within
    0.02695566 0.05758531
    attr(,"class")
    [1] "varcomp"

Thanks in advance for your help - I am very new to formula-building in R.
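One thing that may matter for the question above: in the posted call, year appears both as a fixed effect and as the grouping factor, so any linear trend over time is absorbed by the fixed slope and is not counted in the between-year variance. A minimal sketch of the intercept-only alternative follows, under the assumption that all year-to-year variation should go into the random effect. The data are simulated (the group sizes mirror the post, the values do not), and `fish_sim` is a hypothetical name.

```r
library(nlme)  # recommended package shipped with standard R installations

set.seed(123)
years    <- c(1970, 1975, 1976, 1977, 1979, 1980, 1981, 1982, 1984)
n_per_yr <- c(10, 5, 6, 9, 4, 4, 5, 8, 4)
year_eff <- rnorm(length(years), mean = 0, sd = 0.4)   # simulated between-year effect

fish_sim <- data.frame(
  YEAR = factor(rep(years, times = n_per_yr)),
  CONTAMINANT = 2 + rep(year_eff, times = n_per_yr) +
    rnorm(sum(n_per_yr), sd = 0.25)                     # simulated within-year noise
)

# Random intercept per year, no fixed year effect
fit <- lme(CONTAMINANT ~ 1, random = ~ 1 | YEAR, data = fish_sim)

# VarCorr() returns a character matrix, so convert the entries explicitly
vc <- VarCorr(fit)
variances <- c(between_year = as.numeric(vc["(Intercept)", "Variance"]),
               within_year  = as.numeric(vc["Residual", "Variance"]))
variances
```

Whether to keep a fixed trend in the model depends on the question being asked: with `~ year` in the fixed part, the random-effect variance describes year-to-year scatter around the trend line and not total between-year variation.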
Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?
The quick solution:

    parseAndEval <- function(x, ...) eval(parse(text=x))
    apply(BM, MARGIN=c(1,2), FUN=parseAndEval)

My $.02

/Henrik

On Mon, Jan 30, 2012 at 10:26 AM, Paul Johnson <pauljoh...@gmail.com> wrote:
> I've been trying lots of variations on parse, substitute, and eval.
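A variation on the parse-and-eval approach above, sketched under the assumption that the placeholder values are kept together in a named list instead of as loose variables in the workspace. The helper name `fill_matrix` is made up; evaluating each cell against the list makes it explicit where `a`, `b` and `d` come from and lets the same character matrix be refilled with new values.

```r
# Character matrix with symbolic placeholders, as in the original question
BM <- matrix("0.1", 5, 5)
BM[3, 1] <- "a"; BM[4, 2] <- "b"; BM[5, 2] <- "b"; BM[5, 3] <- "d"

# Evaluate every cell as an R expression, looking names up in `values` first
fill_matrix <- function(char_mat, values) {
  out <- vapply(char_mat,
                function(cell) eval(parse(text = cell), envir = values),
                numeric(1), USE.NAMES = FALSE)
  dim(out) <- dim(char_mat)
  out
}

newBM  <- fill_matrix(BM, list(a = 2.5, b = 17, d = 4))
newBM2 <- fill_matrix(BM, list(a = -1,  b = 0,  d = 9))   # same template, new values
```

As with any `eval(parse(text = ...))` construction, the cells are executed as code, so this is only appropriate when the matrix contents come from a trusted source.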
Re: [R] Euler identity with complex exp
Seems fine to me:

    exp(pi + i*2pi) = exp(pi) * exp(i*2pi)
                    = exp(pi) * (cos(2pi) + i*sin(2pi))
                    = exp(pi) * (1 + 0i)
                    = exp(pi) ~ 23.14

    exp(pi/2) ~ 4.81

What would you expect?

Michael

On Mon, Jan 30, 2012 at 10:37 AM, Joseph Park <josephp...@ieee.org> wrote:
> Am i doing something silly here in expecting Euler's formula to be handled by exp?
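The decomposition above can be checked numerically. A small sketch, where the variable names are made up: the real part of the argument scales the result by `exp(Re(z))`, and only the imaginary part goes through Euler's formula.

```r
z <- complex(real = pi, imaginary = 2 * pi)

# Direct evaluation
direct <- exp(z)

# exp(a + ib) = exp(a) * (cos(b) + i*sin(b))
by_parts <- exp(Re(z)) * complex(real = cos(Im(z)), imaginary = sin(Im(z)))

all.equal(direct, by_parts)

# The modulus carries exp(Re(z)); the angle is Im(z) reduced modulo 2*pi
Mod(direct)
Arg(direct)
```

The `-0i` in the printed results of the original post is floating-point residue: `2*pi` is not exactly representable, so `sin(2*pi)` comes out as a tiny negative number and not exactly zero.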
Re: [R] Reg : Hello all.. help needed regarding heatmaps
If you don't mind using an external (but very popular) graphics package known as ggplot2 it's super easy: https://learnr.wordpress.com/2010/01/26/ggplot2-quick-heatmap-plotting/

I'm sure it can be done in base graphics as well, but I'll leave that to someone else.

It's also well implemented in gplots::heatmap.2 (that is, the heatmap.2 function from the gplots package which, name notwithstanding, is unrelated to ggplot2). Run

    if(!require(gplots)) {install.packages("gplots"); library(gplots)}
    example(heatmap.2)

for some examples.

Michael

On Mon, Jan 30, 2012 at 10:47 AM, koushik gangavaram <kgangava...@gmail.com> wrote:
> Can some one help me on how to generate the legend next to heatmap please... can some one send me a sample code ..?
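For the base-graphics route mentioned above, one possible sketch is to draw the heatmap with `image()` and put a second, narrow `image()` panel beside it as the colour key. Everything here is illustrative: the scores are random, and `heatmap_with_key` is a hypothetical helper, not a function from any package.

```r
heatmap_with_key <- function(scores, palette = heat.colors(12)) {
  old_mar <- par("mar")
  on.exit({ par(mar = old_mar); layout(1) })
  layout(matrix(1:2, nrow = 1), widths = c(4, 1))   # wide map, narrow key

  # Main panel: image() expects x along the rows of z, so transpose
  par(mar = c(4, 6, 2, 1))
  image(x = seq_len(ncol(scores)), y = seq_len(nrow(scores)), z = t(scores),
        col = palette, axes = FALSE, xlab = "Month", ylab = "")
  axis(1, at = seq_len(ncol(scores)), labels = colnames(scores), cex.axis = 0.7)
  axis(2, at = seq_len(nrow(scores)), labels = rownames(scores),
       las = 1, cex.axis = 0.7)
  box()

  # Key panel: a single column of colours spanning the score range
  key <- seq(min(scores), max(scores), length.out = length(palette))
  par(mar = c(4, 1, 2, 4))
  image(x = c(0, 1), y = key, z = matrix(key, nrow = 1),
        col = palette, axes = FALSE, xlab = "", ylab = "")
  axis(4, las = 1)
  box()
  invisible(key)
}

# Random scores for 20 applications over 12 months
set.seed(1)
scores <- matrix(round(runif(20 * 12, 0, 100)), nrow = 20,
                 dimnames = list(paste("App", 1:20), month.abb))

pdf(NULL)                      # null device, so nothing is written to disk
key <- heatmap_with_key(scores)
invisible(dev.off())
```

For interactive use, drop the `pdf(NULL)` and `dev.off()` lines so the plot appears on the screen device. `heatmap.2` from gplots draws a comparable colour key through its `key` argument.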
Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?
On Mon, Jan 30, 2012 at 1:26 PM, Paul Johnson <pauljoh...@gmail.com> wrote:
> User wants a matrix that has some numbers, some variables, possibly even some function names. So that has to be a character matrix.
> [...]
> I can't see how to make that work in the matrix context, though.

You can do this:

    m <- list("a", 1L, 2.5, function(x)x^2)
    dim(m) <- c(2, 2)
    m
         [,1] [,2]
    [1,] "a"  2.5
    [2,] 1    ?

    # Run the function in 2,2 passing it argument in 1,2
    m[[2,2]]( m[[1, 2]] )
    [1] 6.25

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Euler identity with complex exp
Not sure why you think the formula does not hold... but am guessing you think that sin(x) and cos(x) have values in [-1, 1]? Well, that only holds for real x. If you have a complex x, sin(x) and cos(x) are unbounded. Indeed, if you can write x = iy with y real, you can show

    cos(x) = cosh(y) and sin(x) = i*sinh(y)

simply by expressing (from the formula you wrote) cos(x) and sin(x) as

    cos(x) = ( exp(ix) + exp(-ix) )/2 and sin(x) = ( exp(ix) - exp(-ix) )/(2i)

In any case, plug any complex number into exp( ix ) and cos x + i sin x in R and you will get the exact same answers.

HTH,
Peter

On Mon, Jan 30, 2012 at 7:37 AM, Joseph Park <josephp...@ieee.org> wrote:
> exp( ix ) = cos x + i sin x. The first example below follows this, the others not.
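The claim that the identity holds for any complex argument is easy to try out. A minimal sketch with arbitrarily chosen values (the numbers 0.7, -1.3 and 2 have no special meaning):

```r
# Euler's formula for a genuinely complex x
x   <- complex(real = 0.7, imaginary = -1.3)
lhs <- exp(1i * x)
rhs <- cos(x) + 1i * sin(x)
all.equal(lhs, rhs)

# Purely imaginary argument x = iy: cos and sin turn into cosh and i*sinh
y <- 2
all.equal(cos(1i * y), complex(real = cosh(y), imaginary = 0))
all.equal(sin(1i * y), complex(real = 0, imaginary = sinh(y)))

# Unbounded for complex arguments: the modulus of cos(iy) grows like exp(y)/2
Mod(cos(1i * 10))
```

In the original examples the argument passed to `exp()` was `pi + 2*pi*i`, which corresponds to x = 2*pi - i*pi in exp(ix), so the result has modulus exp(pi) and not 1.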
Re: [R] plot with ylim with regural interval
On Jan 30, 2012, at 1:28 PM, gianni lavaredo wrote: Dear Researchers, sorry for the easy question, but is it possible to plot with an interval of 1 or .5 in a plot using ylim? Thanks gianni
x = 0:10; y = 0:10; plot(x~y,ylim=c(0,10),las=1)
plot(x~y, ylim=c(0,10), xaxt="n")
axis(1, at=seq(0, 10, by=0.5), labels=seq(0, 10, by=0.5), cex.axis=0.75)
Unless you make the cex.axis number small enough, axis() won't put them all in.
David Winsemius, MD Heritage Laboratories West Hartford, CT
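The reply above customizes the x axis (side 1), while the question asks about the y axis. A sketch of the same technique applied to side 2 might look like the following; it assumes the poster wants ticks every 0.5 on the vertical axis, and the name ticks is introduced here for illustration.

```r
# Suppress the default y axis, then draw one with ticks every 0.5.
x <- 0:10
y <- 0:10
ticks <- seq(0, 10, by = 0.5)

plot(x, y, ylim = c(0, 10), yaxt = "n")
# side = 2 is the left (y) axis; las = 1 keeps the labels horizontal.
# A small cex.axis leaves room for more labels, but axis() may still
# skip some of them if they would overlap.
axis(2, at = ticks, labels = ticks, las = 1, cex.axis = 0.75)
```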
Re: [R] how to sum multiple data entries for the same sampling event?
Perhaps something like
# Untested
library(plyr)
ddply(DATA, "Date", function(d) sum(d$lmb.cpue))
For example, on some fake data
DATA <- data.frame(class = rep(letters[1:5], each = 2), type = rep(c("good", "bad"), 5), value = rnorm(10))
ddply(DATA, "class", function(d) sum(d$value))
If you want to send example data, it's best to send it with the plaintext output of dput(). Michael
On Mon, Jan 30, 2012 at 12:16 PM, karengrace84 kgfis...@alumni.unc.edu wrote: I'm having trouble with some catch per unit effort data (CPUE, fisheries data). Some of the samples were retained and some unretained, and they are entered as 2 separate entries for the same sampling event (Date and time). I want to calculate the total CPUE (so sum the retained and unretained number for each sampling event) and am having trouble doing so. Here's a sample of what my data.frame looks like now:
Date lmb.cpue Disposition.of.Catch
1999-07-10 12:10:00 0.6667 Unretained
1999-07-10 12:10:00 0.1667 Retained
1999-07-14 11:22:00 0.8333 Unretained
1999-07-14 11:22:00 0.5556 Retained
1999-07-14 11:48:00 0.1667 Unretained
1999-07-14 11:48:00 0.5833 Retained
1999-07-14 13:56:00 0.57142857 Retained
1999-07-15 10:23:00 0. Retained
1999-07-22 12:03:00 0. Retained
1999-07-25 11:26:00 0.4000 Unretained
1999-07-25 11:26:00 1. Retained
And I would like to end up with:
Date lmb.cpue
1999-07-10 12:10:00 0.8333
1999-07-14 11:22:00 1.3889
1999-07-14 11:48:00 0.7500
1999-07-14 13:56:00 0.57142857
1999-07-15 10:23:00 0.
1999-07-22 12:03:00 0.
1999-07-25 11:26:00 1.4000
Thanks for any help you have to offer! -- View this message in context: http://r.789695.n4.nabble.com/how-to-sum-multiple-data-entries-for-the-same-sampling-event-tp4341670p4341670.html Sent from the R help mailing list archive at Nabble.com.
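The same per-event sum can also be done without plyr. The sketch below uses base R's aggregate() on three rows copied from the post; it swaps in aggregate() for the ddply() call suggested above, and it assumes Date is stored as a character string (a POSIXct column would group the same way).

```r
# Three rows from the posted sample: two entries for one sampling event,
# one entry for another.
DATA <- data.frame(
  Date = c("1999-07-10 12:10:00", "1999-07-10 12:10:00", "1999-07-14 13:56:00"),
  lmb.cpue = c(0.6667, 0.1667, 0.57142857),
  Disposition.of.Catch = c("Unretained", "Retained", "Retained"),
  stringsAsFactors = FALSE
)

# Sum lmb.cpue within each Date; the result has one row per sampling event.
total <- aggregate(lmb.cpue ~ Date, data = DATA, FUN = sum)
print(total)
```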
Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?
On 30/01/2012 1:26 PM, Paul Johnson wrote: A user question today has me stumped. Can you advise me, please? User wants a matrix that has some numbers, some variables, possibly even some function names. So that has to be a character matrix.
It might make more sense for it to be a list-mode matrix. Lists are vectors, and if they have dimension, they are matrices, but the entries need not be the same types.
Consider:
BM <- matrix(0.1, 5, 5)
Use data.entry(BM) or similar to set some to more abstract values.
BM[3,1] <- "a"
BM[4,2] <- "b"
BM[5,2] <- "b"
BM[5,3] <- "d"
BM
     var1 var2 var3 var4 var5
[1,] 0.1  0.1  0.1  0.1  0.1
[2,] 0.1  0.1  0.1  0.1  0.1
[3,] a    0.1  0.1  0.1  0.1
[4,] 0.1  b    0.1  0.1  0.1
[5,] 0.1  b    d    0.1  0.1
Later on, user code will set values, e.g.,
a <- rnorm(1)
b <- 17
d <- 4
Now, push those into BM, convert whole thing to numeric
newBM <- apply(BM, c(1,2), as.numeric)
and use newBM for some big calculation. Then re-set new values for a, b, d, do the same over again. I've been trying lots of variations on parse, substitute, and eval. The most interesting function I learned about this morning was delayedAssign. If I had only to work with one scalar, it does what I want
delayedAssign("a", whatA)
whatA <- 91
a
[1] 91
I can't see how to make that work in the matrix context, though. Got ideas?
I don't think delayedAssign is what you want: it creates promises, and promises can only be evaluated once. You want language entries in your matrix, and you want to use eval() to evaluate them. (Or character entries, and use Henrik's parseAndEval.) Duncan Murdoch
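Duncan's suggestion (a list-mode matrix holding language entries, evaluated with eval()) could be sketched as follows. This is an illustration under assumptions rather than code from the thread: the helper name evalMatrix and its values argument are hypothetical, and the current values are passed in explicitly as a list instead of being looked up in the global environment.

```r
# A 2 x 2 list-mode matrix: plain numbers mixed with unevaluated expressions.
# Elements are stored column by column, so quote(a) lands in cell [2, 1]
# and quote(b * 2) in cell [1, 2].
m <- list(0.1, quote(a), quote(b * 2), 0.1)
dim(m) <- c(2, 2)

# Hypothetical helper: evaluate every cell against a list of current values
# and return an ordinary numeric matrix of the same shape.
evalMatrix <- function(m, values) {
  out <- vapply(m, function(cell) as.numeric(eval(cell, envir = values)),
                numeric(1))
  dim(out) <- dim(m)
  out
}

# Unlike a promise, a quoted expression can be evaluated again and again.
first  <- evalMatrix(m, list(a = 3, b = 5))
second <- evalMatrix(m, list(a = 17, b = 4))
print(first)
print(second)
```

Storing quoted expressions rather than character strings avoids a parse() step on every evaluation.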
[R] timeseries highlighting
I'd like to plot a given time series in a primary color but highlight a segment of it in a different color. Is there an elegant way to do it? A+ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] timeseries highlighting
library(zoo)
demo("zoo-overplot")
Michael
On Mon, Jan 30, 2012 at 2:05 PM, Alexy Khrabrov delivera...@gmail.com wrote: I'd like to plot a given time series in a primary color but highlight a segment of it in a different color. Is there an elegant way to do it? A+
Re: [R] timeseries highlighting
On Mon, Jan 30, 2012 at 2:12 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: library(zoo) demo("zoo-overplot")
Also:
library(zoo)
example(xblocks)
-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
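For readers without zoo installed, a base-graphics sketch of the same idea is to draw the whole series and then overdraw the segment of interest. This is a different technique from the zoo demos named above; the simulated series and the chosen window (time 40 to 60) are made up for illustration.

```r
# Simulated series standing in for the poster's data.
set.seed(1)
y <- ts(cumsum(rnorm(100)), start = 1, frequency = 1)

# window() extracts the sub-series to highlight, keeping its time index.
seg <- window(y, start = 40, end = 60)

plot(y, col = "blue")              # whole series in the primary color
lines(seg, col = "red", lwd = 2)   # highlighted segment drawn on top
```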
Re: [R] repeat function for entire list of matrices
No problem. Glad it worked for you. Michael On Mon, Jan 30, 2012 at 12:05 PM, pabears danss...@gmail.com wrote: michael, i don't know what happened, i was reading up on ?lapply(), i was up really late, and somehow it didn't seem to take, but i tried it again this morning and it worked like a charm.(sorry about the ellipses, i was just being lazy/unclear). that's great, thanks, this is a great help... -- View this message in context: http://r.789695.n4.nabble.com/repeat-function-for-entire-list-of-matrices-tp4334587p4341629.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] And Statement for two if functions
Sorry, that post was written in a bit of a rush. I am writing a function in which I am trying to create a league table from a data frame of rugby matches with the columns as follows: home team, away team, home score and away score. In rugby you can get an extra bonus point if you are the losing team and lose by less than 7 points. So in my function, if the away team loses AND loses by less than or equal to 7 points, then the away team will get an extra point. So ideally I want to write:
if(games[i,3] > games[i,4] AND games[i,3] <= games[i,4] + 7) { T[which(teams == games[i,2]),"Points"] <- T[which(teams == games[i,2]),"Points"] + 1}
This is inset into a function in R where the input of the function is 'games', which will be the list of the 132 matches of rugby being analysed, and where teams is the list of 12 teams in the league. I wasn't sure if it was possible to write an 'if' function embedded in another 'if' function or which method would be best to achieve this. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/And-Statement-for-two-if-functions-tp4341179p4342098.html Sent from the R help mailing list archive at Nabble.com.
[R] Problem in fitting model equation in nls function
Dear R users, I am struggling to fit an expo-linear equation to my data using the nls function. I always get the error messages shown below.
### The expo-linear equation which I am interested in fitting to my data:
response_variable = (c/r)*log(1+exp(r*(Day-tt))), where Day is the time variable
## my response variable
rl <- c(2,1.5,1.8,2,2,2.5,2.6,1.5,2.4,1.7,2.3,2.4,2.2,2.6, 2.8,2,2.5,1.8,2.4,2.4,2.3,2.6,3,2,2.6,1.8,2.5,2.5, 2.3,2.7,3,2.2,2.6,1.8,2.5,2.5,2.3,2.7,3,2.2)
myday <- rep(c(3,5,7,9,10), each = 8) # creating my predictor time-variable
mydata <- data.frame(rl,myday) # data object
### fitting model equation in nls function
CASE-I: when I assigned the initial value tt = 0.6:
mytest <- nls(rl ~ (c/r)*log(1+exp(r*(myday-tt))), data = mydata, na.action = na.omit, start = list(c = 2.0, r = 0.05, tt = 0.6), algorithm = "plinear")
Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model
CASE-II: when I assigned the initial value tt = 1:
mytest <- nls(rl ~ (c/r)*log(1+exp(r*(myday-tt))), data = mydata, na.action = na.omit, start = list(c = 2.0, r = 0.5, tt = 1), algorithm = "plinear")
Error in nls(rl ~ (c/r) * log(1 + exp(r * (myday - tt))), data = mydata, : singular gradient
Truly speaking, I do not have much experience with fitting a specific model equation in R. I have the following queries: 1. Can anyone explain to me what is going wrong here? 2. Importantly, how can I write the above equation for the nls function? I will be very thankful if anyone can help me. Thanks. Regards, Ram Kumar Basnet
Re: [R] Fw: Variable selection based on both training and testing data
Jim, With regard to variable and model selection, you might consider using Bayesian model averaging (bma program) or some sort of shrinkage (lars or lasso2 programs). Scott Millis From: Jin Minming jminm...@yahoo.com To: r-help@r-project.org r-help@r-project.org; SR Millis aa3...@wayne.edu Sent: Monday, January 30, 2012 11:30 AM Subject: Re: [R] Fw: Variable selection based on both training and testing data Dear Scott, I am so sorry that I think I just sent an empty email to you. Thanks a lot for your advice. The problem is that we do not have sufficient prior knowledge for the regression form and even appropriate inputs. We need try to find some possible regression equations, then add our explanation to them. So we need explore a lot of options. The two input datasets are very different in nature and they are from two locations. Hence, it can be used for testing purpose although it may turn out to be that there is not an appropriate regression due to the intrinsic difference in these two datasets. In fact, if I can extract the models used (not only the final model) in stepAIC function, then it will be easier to add some simple scripts to calculate R2 or RMSE for both datasets. Thanks, Jim --- On Mon, 30/1/12, SR Millis aa3...@wayne.edu wrote: From: SR Millis aa3...@wayne.edu Subject: [R] Fw: Variable selection based on both training and testing data To: r-help@r-project.org r-help@r-project.org Date: Monday, 30 January, 2012, 14:57 From: SR Millis srmil...@yahoo.com To: Jin Minming jminm...@yahoo.com Sent: Monday, January 30, 2012 9:25 AM Subject: Re: [R] Variable selection based on both training and testing data Jim, First, stepwise methods for variable selection should be avoided. Frank Harrell (in Regression Modeling Strategies) discusses this at length. Second, splitting a dataset into training and validation sets is generally not a good idea unless you have a really large sample, eg, 20,000. 
As Harrell has discussed, split-sample validation does not provide external validation, is terribly inefficient, and is arbitrary. It's better to specify your model a priori and use the bootstrap to obtain an estimate of your model's over-optimism. Bootstrapping can be implemented with Harrell's rms package in R. Scott ~~~ Scott R Millis, PhD, ABPP, CStat, PStat® Professor Wayne State University School of Medicine Email: aa3...@wayne.edu Email: srmil...@yahoo.com Tel: 313-993-8085 To: r-help@r-project.org Sent: Monday, January 30, 2012 8:14 AM Subject: [R] Variable selection based on both training and testing data Dear all, The variable selection in regression is usually determined by the training data using AIC or F value, such as stepAIC. Is there some R package that can consider both the training and test dataset? For example, I have two separate training data and test data. Firstly, a regression model is obtained by using training data, and then this model is tested by using test data. This process continues in order to find some possible optimal models in terms of RMSE or R2 for both training and test data. Thanks, Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] -Inline Attachment Follows- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rpart usersplits
I'm inspecting tests/usersplits.R in rpart, trying to get my head around how to pass data to the split function. I'm trying to instantiate a number of goodness measures which compare treatment vs control within splits. A simple example is difference-in-difference estimate of a candidate split, (Y_t - Y_c)_L - (Y_t - Y_c)_R (difference between treatment and control for the left minus difference between treatment and control for the right) I need to know whether each Y value is a treatment Y or a control. the documentation in usersplits.R says that Y is provided in sort order of X, so I'm not sure how to pass a vector indicating treatment vs control that will split Y appropriately. rpart.poisson takes a two column matrix as an input, and I've been trying to mimic that, but there's no documentation in rpart.poisson (i can't figure out where it creates the 'vector of goodness') I'd appreciate any advice, J. Cress -- View this message in context: http://r.789695.n4.nabble.com/rpart-usersplits-tp4342156p4342156.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] User Interface Equivalent Code
When I plot, the plot's user interface offers me a choice: File | Copy to the Clipboard | as a Bitmap. What is the equivalent code for achieving this but without the plot interface becoming visible? Thanks. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Checking for invalid dates: Code works but needs improvement
On Jan 30, 2012, at 12:15 PM, David Winsemius wrote: On Jan 30, 2012, at 8:44 AM, Paul Miller wrote: Hi Rui, Marc, and Gabor, Thanks for your replies to my question. All were helpful and it was interesting to see how different people approach various aspects of the same problem. Spent some time this weekend looking at Rui's solution, which is certainly much clearer than my own. Managed to figure out pretty much all the details of how it works. Also managed to tweak it slightly in order to make it do exactly what I wanted. (See revised code below.) Still have a couple of questions though. The first concerns the insertion of the code Y 2012 to set year values beyond 2012 to NA (on line 10 of the function below). When I add this (or use it in place of nchar(Y) 4), the code succesfully finds the problem date 05/16/2015. After that though, it produces the following error message: Error in if (any(is.na(x) M != un Y != un)) cat(Warning: Invalid date values in, : missing value where TRUE/FALSE needed It's a bit dangerous to use comparison operators on mixed data types. In your case you are comparing a character value to a numeric value and may not realize that 2015 is not the same as 2015. Try 123 1000 if you want a quick counter-example. You may want to coerce the Y value to numeric mode to be safe. Also 'any' does not expect the logical connectives. You probably want: any(is.na(x) , M != un , Y != un) Perhaps I am missing something relevant here, but I am still confused by what I see as an over engineering of the code being implemented. If the primary requirements are: 1. Impute the 15th of month if it is 'un' 2. Reject dates prior to 1900 or after 2011 3. Reject dates with an unknown ('un') month or year 4. 
Reject years with more than 4 digits, also presuming that the value passed should always be 10 characters in length. If that is the basic functionality required, then a modest modification of my prior code should work:
checkDate <- function(x) {
  # Replace unknown day with 15
  tmp <- gsub("/un/", "/15/", x)
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")
  as.character(x[is.na(tmp2) | tmp2 < as.Date("1900/01/01") | tmp2 > as.Date("2012/01/01") | nchar(as.character(x)) > 10])
}
TestDates
  Patient birthDT     diagnosisDT metastaticDT
1 1       11/23/21931 05/23/2009  un/17/2011
2 2       06/20/1840  02/30/2010  03/17/2011
3 3       06/17/1935  12/20/2008  07/un/2011
4 4       05/31/1937  01/18/2007  04/30/2011
5 5       06/31/1933  05/16/2015  11/20/un
lapply(TestDates[, -1], checkDate)
$birthDT
[1] 11/23/21931 06/20/1840 06/31/1933
$diagnosisDT
[1] 02/30/2010 05/16/2015
$metastaticDT
[1] un/17/2011 11/20/un
Does that not do what you require Paul? Marc
Why is this happening? If the code correctly handles the date 06/20/1840 without producing an error, why can't it do likewise with 05/16/2015? The second question is why it's necessary to put x on line 15 following cat(Warning ...). I know that I don't get any date columns if I don't include this but am not sure why. The third question is whether it's possible to change the class of the date variables without using a for loop. I played around with this a little but didn't find a vectorized alternative. It may be that this is not really important. It's just that I've read in several places that for loops should be avoided wherever possible. Thanks, Paul snip prior content
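David's two remarks in the quoted exchange (mixed-type comparison, and how any() behaves) can be illustrated with a short sketch. The vector Y below is made up for illustration; it is not Paul's data, and the explanation of the "missing value where TRUE/FALSE needed" error is offered as a likely cause rather than a confirmed diagnosis of his function.

```r
# Comparing a character string with a number coerces the number to character,
# so the comparison is done on strings, not on numeric values.
charCompare <- "123" < 1000               # compares "123" with "1000"
numCompare  <- as.numeric("123") < 1000   # compares 123 with 1000
print(c(charCompare, numCompare))

# Coerce year strings to numeric before comparing; "un" becomes NA.
Y  <- c("1840", "2015", "un")
Yn <- suppressWarnings(as.numeric(Y))
tooLate <- Yn > 2012
print(tooLate)

# any() returns NA when nothing is TRUE but something is NA, and if (NA)
# stops with "missing value where TRUE/FALSE needed".
print(any(c(NA, FALSE)))
print(any(c(NA, FALSE), na.rm = TRUE))
```

Using na.rm = TRUE, or wrapping the condition in isTRUE(), keeps the if() test from receiving NA.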
Re: [R] Checking for invalid dates: Code works but needs improvement
On Jan 30, 2012, at 1:30 PM, Marc Schwartz wrote: On Jan 30, 2012, at 12:15 PM, David Winsemius wrote: On Jan 30, 2012, at 8:44 AM, Paul Miller wrote: Hi Rui, Marc, and Gabor, Thanks for your replies to my question. All were helpful and it was interesting to see how different people approach various aspects of the same problem. Spent some time this weekend looking at Rui's solution, which is certainly much clearer than my own. Managed to figure out pretty much all the details of how it works. Also managed to tweak it slightly in order to make it do exactly what I wanted. (See revised code below.) Still have a couple of questions though. The first concerns the insertion of the code Y 2012 to set year values beyond 2012 to NA (on line 10 of the function below). When I add this (or use it in place of nchar(Y) 4), the code succesfully finds the problem date 05/16/2015. After that though, it produces the following error message: Error in if (any(is.na(x) M != un Y != un)) cat(Warning: Invalid date values in, : missing value where TRUE/FALSE needed It's a bit dangerous to use comparison operators on mixed data types. In your case you are comparing a character value to a numeric value and may not realize that 2015 is not the same as 2015. Try 123 1000 if you want a quick counter-example. You may want to coerce the Y value to numeric mode to be safe. Also 'any' does not expect the logical connectives. You probably want: any(is.na(x) , M != un , Y != un) Perhaps I am missing something relevant here, but I am still confused by what I see as an over engineering of the code being implemented. If the primary requirements are: 1. Impute the 15th of month if it is 'un' 2. Reject dates prior to 1900 or after 2011 3. Reject dates with an unknown ('un') month or year 4. 
Reject years with more than 4 digits, also presuming that the value passed should always be 10 characters in length. If that is the basic functionality required, then a modest modification of my prior code should work:
Ack...typo in my code for the upper end of the date range. Should be:
checkDate <- function(x) {
  # Replace unknown day with 15
  tmp <- gsub("/un/", "/15/", x)
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")
  as.character(x[is.na(tmp2) | tmp2 < as.Date("1900/01/01") | tmp2 > as.Date("2011/12/31") | nchar(as.character(x)) > 10])
}
Marc
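A self-contained version of the corrected function, run on the birth-date and metastatic-date columns from the earlier message, might look like this. The quotes and comparison operators were lost in the archived copy, so the ones used here are a reconstruction from the stated requirements, not a verbatim copy of Marc's code.

```r
# Return the entries of x that fail validation:
#   - not a real calendar date (after imputing day 15 for an unknown day)
#   - before 1900-01-01 or after 2011-12-31
#   - longer than 10 characters (e.g. a 5-digit year)
checkDate <- function(x) {
  x <- as.character(x)
  tmp <- gsub("/un/", "/15/", x)               # unknown day -> 15th
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")    # NA when the date is invalid
  bad <- is.na(tmp2) |
    tmp2 < as.Date("1900-01-01") |
    tmp2 > as.Date("2011-12-31") |
    nchar(x) > 10
  x[bad]
}

birthDT <- c("11/23/21931", "06/20/1840", "06/17/1935",
             "05/31/1937", "06/31/1933")
metastaticDT <- c("un/17/2011", "03/17/2011", "07/un/2011",
                  "04/30/2011", "11/20/un")

print(checkDate(birthDT))
print(checkDate(metastaticDT))
```

Because is.na(tmp2) comes first in the chain of | conditions, an unparseable date yields TRUE rather than NA, so the logical index never contains missing values.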
Re: [R] Euler identity with complex exp
On Mon, Jan 30, 2012 at 11:43 AM, Joseph Park josephp...@ieee.org wrote: Thanks Michael & Peter. Michael's expansion makes sense. This is what I expected:
a = pi + 0i
complex( real = cos(Re(a)), imaginary = sin(Im(a)) )
[1] -1+0i
As they say, the error is between the keyboard and the chair. You cannot drop parts of a: in your formula above you dropped the imaginary part of a in cos and the real part of a in sin. In this case it doesn't make a difference, but in general it will.
Not this:
exp(a)
[1] 23.14069+0i
You need exp(ia), not exp(a):
i = complex(real = 0, imaginary = 1)
exp(i*a)
[1] -1+0i
Is this not an implementation of Euler's formula:
complex( real = cos(2*pi), imaginary = sin(2*pi) )
[1] 1-0i
And that is a result Michael depends on in his expansion, yet if we pass this argument to exp:
exp( (complex( real = 2*pi, imaginary = 2*pi) ) )
[1] 535.4917-0i
Again, you are not using the formula correctly. Remember x = 2*pi, so you need exp( i * 2 * pi) and you get the same result as complex( real = cos(2*pi), imaginary = sin(2*pi) ). Peter
Re: [R] Euler identity with complex exp
This is off-topic for R-help, but we might as well finish what's been started: Take a closer look at exp(i*x). If x is real, i*x is a pure imaginary number, not a complex number so the formula you are using doesn't hold in general.** The general Euler result for complex (= mixed real and imaginary) numbers looks like this: exp(x + iy) = exp(x)*(cos(y) + i sin(y)) That is, the real part gives the modulus and the imaginary part goes solely to the argument. What's often surprising about this is that exp(2 + 2*pi*i) = exp(2) = exp(2+4*pi*i) = exp(2 - 2*pi*i) because the trig functions which get applied to the imaginary part are periodic. Take a closer look at what you wrote: complex( real = cos(2*pi), imaginary = sin(2*pi) ) exp( (complex( real = 2*pi, imaginary = 2*pi) ) ) The number in the first line is not what gets exponentialed in the second! You'll get the expected (by you) behavior if you actually use the same number for both calculations: complex( real = cos(2*pi), imaginary = sin(2*pi) ) exp(complex( real = cos(2*pi), imaginary = sin(2*pi) )) or complex(real = 2*pi, imaginary = 2*pi) exp(complex(real = 2*pi, imaginary = 2*pi)) If you work out the second like I did for exp(pi + 2*pi*i) in my first email, you'll get the correct answer. All in all, R is definitely correct in it's interpretation of Euler's formula. There's only one way to parse this relationship that gives mathematical consistency and it's what Peter and I have set out for you. Michael ** Not actually true, if x is complex, it of course works out correctly as well, but you wind up having to use the more general expression I give to get there. On Mon, Jan 30, 2012 at 2:43 PM, Joseph Park josephp...@ieee.org wrote: Thanks Michael Peter. Michael's expansion makes sense. 
This is what I expected:
a = pi + 0i
complex( real = cos(Re(a)), imaginary = sin(Im(a)) )
[1] -1+0i
Not this:
exp(a)
[1] 23.14069+0i
Is this not an implementation of Euler's formula:
complex( real = cos(2*pi), imaginary = sin(2*pi) )
[1] 1-0i
And that is a result Michael depends on in his expansion, yet if we pass this argument to exp:
exp( (complex( real = 2*pi, imaginary = 2*pi) ) )
[1] 535.4917-0i
That would not work in Michael's expansion, the answer must be 1 + 0i. Which seems to suggest that exp( ix ) and cos x + i sin x (as written above) are different interpretations.
On 01/30/2012 12:47 PM, Peter Langfelder wrote: Not sure why you think the formula does not hold, but I am guessing you think that sin(x) and cos(x) take values only in [-1, 1]? That holds only for real x. For complex x, sin(x) and cos(x) are unbounded. Indeed, if you can write x = iy with y real, you can show that cos(x) = cosh(y) and sin(x) = i sinh(y), simply by expressing (from the formula you wrote) cos(x) and sin(x) as cos(x) = ( exp(ix) + exp(-ix) )/2 and sin(x) = ( exp(ix) - exp(-ix) )/(2i). In any case, plug any complex number x into exp( ix ) and cos x + i sin x in R and you will get the same answers, up to floating-point rounding. HTH, Peter
On Mon, Jan 30, 2012 at 7:37 AM, Joseph Park josephp...@ieee.org wrote: Hi, Am I doing something silly here in expecting Euler's formula to be handled by exp? exp( ix ) = cos x + i sin x. The first example below follows this, the others not. Thanks for the education!
exp( complex(real = 0, imag = 2*pi) )
[1] 1-0i
exp( complex(real = pi, imag = 2*pi) )
[1] 23.14069-0i
exp( complex(real = pi/2, imag = 0) )
[1] 4.810477+0i
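Michael's general form, exp(x + iy) = exp(x)*(cos(y) + i sin(y)), can be checked numerically on the very number Joseph passed to exp(). The sketch below is an illustration added here; the names z, direct and viaEuler are arbitrary.

```r
# The argument Joseph used: real part 2*pi, imaginary part 2*pi.
z <- complex(real = 2 * pi, imaginary = 2 * pi)

direct   <- exp(z)
# General Euler form: the real part sets the modulus, the imaginary part
# sets the argument (angle).
viaEuler <- exp(Re(z)) * complex(real = cos(Im(z)), imaginary = sin(Im(z)))

print(direct)
print(viaEuler)
print(Mod(direct))    # equals exp(2*pi), the 535.4917 seen in the thread

# Periodicity: adding 2*pi*i to the exponent does not change the result.
print(exp(complex(real = 2, imaginary = 2 * pi)))
print(exp(2))
```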
Re: [R] And Statement for two if functions
Nested if's are fine in R, but as David said you probably want ifelse(). This sounds sufficiently homework-y that I'm hesitant to give example code, but it's all over the archives. Just to head off a problem I see in your pseudo-code: you're going to want to use ifelse() to construct the points vector and then assign it. It's terribly dangerous to do assignment within ifelse() as if it were a simple if(). Michael
On Mon, Jan 30, 2012 at 1:55 PM, kerry1912 kerry1...@hotmail.com wrote: Sorry, that post was written in a bit of a rush. I am writing a function in which I am trying to create a league table from a data frame of rugby matches with the columns as follows: home team, away team, home score and away score. In rugby you can get an extra bonus point if you are the losing team and lose by less than 7 points. So in my function, if the away team loses AND loses by less than or equal to 7 points, then the away team will get an extra point. So ideally I want to write:
if(games[i,3] > games[i,4] AND games[i,3] <= games[i,4] + 7) { T[which(teams == games[i,2]),"Points"] <- T[which(teams == games[i,2]),"Points"] + 1}
This is inset into a function in R where the input of the function is 'games', which will be the list of the 132 matches of rugby being analysed, and where teams is the list of 12 teams in the league. I wasn't sure if it was possible to write an 'if' function embedded in another 'if' function or which method would be best to achieve this. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/And-Statement-for-two-if-functions-tp4341179p4342098.html Sent from the R help mailing list archive at Nabble.com.
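The advice above (build the vector with ifelse(), then assign it) could be sketched as follows. The three matches and the column names are made up for illustration, and the rule "away team loses by at most 7" follows the poster's pseudo-code rather than any official scoring rule. Inside a scalar if(), the "AND" would be written &&; on whole columns the vectorized & is the one to use.

```r
# Made-up results: home team, away team, home score, away score.
games <- data.frame(
  home = c("A", "B", "C"),
  away = c("B", "C", "A"),
  home.score = c(20, 10, 30),
  away.score = c(15, 12, 10),
  stringsAsFactors = FALSE
)

# Positive margin means the away team lost.
margin <- games$home.score - games$away.score

# Vectorized condition: away team lost AND lost by 7 or fewer.
away.bonus <- ifelse(margin > 0 & margin <= 7, 1, 0)

# Total losing-bonus points per away team, ready to add to a league table.
bonus.by.team <- tapply(away.bonus, games$away, sum)
print(away.bonus)
print(bonus.by.team)
```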
Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?
Henrik's proposal works well, so far. Thanks very much. I could not have figured that out (without much more suffering). Here's the working example in case future googlers find their way to this thread.
## Paul Johnson paulj...@ku.edu
## 2012-01-30
## Special thanks to r-help email list contributors,
## especially Henrik Bengtsson
BM <- matrix(0.1, 5, 5)
BM[2,1] <- "a"
BM[3,2] <- "b"
BM
parseAndEval <- function(x, ...) eval(parse(text=x))
a <- 0.5
b <- 0.4
realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)
BM[4,5] <- "rnorm(1, m=7, sd=1)"
BM
realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)
realBM
## Now, what about gui interaction with that table?
## The best nice looking options are not practical at the moment.
## Try this instead
data.entry(BM)
## That will work on all platforms, so far as I know, without
## any special effort from us. Run that, make some changes, then
## make sure you insert new R variables to match in your environment.
## Suppose you inserted the letter z in there somewhere
## set z out here
z <- rpois(1, lambda=10)
realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
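A variation on the example above passes the current values explicitly instead of relying on variables in the global environment. This is a sketch of one possible design, not the approach used in the thread; the helper name evalCell and its values argument are hypothetical.

```r
# Character matrix mixing numbers-as-text with variable names.
BM <- matrix("0.1", 5, 5)
BM[2, 1] <- "a"
BM[3, 2] <- "b"

# Hypothetical helper: parse one cell and evaluate it against a list of values.
evalCell <- function(x, values) eval(parse(text = x), envir = values)

# apply() forwards the extra 'values' argument to evalCell for every cell.
realBM <- apply(BM, MARGIN = c(1, 2), FUN = evalCell,
                values = list(a = 0.5, b = 0.4))
print(realBM)
```

Keeping the values in a list makes it easy to re-evaluate the same matrix under several settings. Note that eval(parse()) runs whatever text is in a cell, so it is only appropriate for input from a trusted source.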
[R] Different type of legend?
How would I create a legend that looks like the attached image? Basically all of the color boxes are right next to each other and the text is below. This kind of arrangement allows for many more items in the legend. Using the legend() method seems to top out at about 14 items (that will fit in the horizontal plot). Suggestions? Thank you. Kevin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Different type of legend?
Server stripped the attachment. Can you post a link somewhere? Michael On Mon, Jan 30, 2012 at 4:25 PM, rkevinbur...@charter.net wrote: How would I create a legend that looks like the attached image? Basically all of the color boxes are right next to each other and the text is below. This kind of arrangement allows for many more items in the legend. Using the legend() method seems to top out at about 14 items (that will fit in the horizontal plot). Suggestions? Thank you. Kevin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.