Re: [R] Extract elements from objects in a list
x <- lapply(1:100, function(i) summary(runif(100)))
head(x, 4)
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02922 0.38330 0.58120 0.58230 0.83430 0.99870

[[2]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.004903 0.281400 0.478900 0.497100 0.729900 0.990700

[[3]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.002561 0.283100 0.567400 0.516500 0.736800 0.986700

[[4]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.004827 0.264000 0.499500 0.503200 0.732300 0.998500

sapply(x, "[", 3)[1:4]
Median Median Median Median
0.5812 0.4789 0.5674 0.4995

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeremy Miles
Sent: Wednesday, 29 June 2011 9:23 AM
To: r-help
Subject: [R] Extract elements from objects in a list

Hi All,

I want to extract elements of elements in a list. Here's an example of what I mean. If I create a list:

x <- as.list(100)
for (loop in c(1:100)) {
  x[[loop]] <- summary(runif(100))
}
head(x)
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02271 0.25260 0.58130 0.52120 0.77270 0.99670

[[2]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.006796 0.259700 0.528100 0.515500 0.781900 0.993100

[[3]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.00927 0.22800 0.40780 0.46410 0.69460 0.98780

I want to extract (say) the medians as a vector. This would be:

x[[1]][[3]]
x[[2]][[3]]
x[[3]][[3]]

I thought there would be a way of doing this with something like apply(), but I cannot work it out. Is there a way of doing this without a loop?

Thanks,

Jeremy

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
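A footnote on the `sapply(x, "[", 3)` idiom above: since each summary() result is a named vector, the same extraction can be done by name, which is more robust than a positional index. A minimal sketch (the fixed seed is an assumption, added only so the example is reproducible):

```r
set.seed(1)  # assumption: seed fixed for reproducibility
x <- lapply(1:100, function(i) summary(runif(100)))

# extract the medians by name rather than by position
med <- vapply(x, function(s) s[["Median"]], numeric(1))

stopifnot(length(med) == 100, all(med == sapply(x, "[", 3)))
```

vapply also type-checks each result, which sapply does not.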
Re: [R] New to R, trying to use agnes, but can't load my distance matrix
The first problem is that you are using a character string as the first argument to agnes(). The help information for agnes says that its first argument, x, is

  x: data matrix or data frame, or dissimilarity matrix, depending on
     the value of the 'diss' argument.

Not a character string. So first you have to read your data into R and hold it as a data matrix or data frame. Then you have a choice. Either you can calculate your own distance matrix with it and then call agnes() with that as the first argument (and with diss = TRUE), or you can get agnes() to calculate the distance matrix for you, in which case you need to specify how, using the metric = argument.

With 10000 entities to cluster, your distance matrix will require

10000 * 9999/2
[1] 49995000

numbers to be stored at once. I hope you are using a 64-bit OS! With such large numbers of entities to cluster, the usual advice is to try something more suited to the job. clara() is designed for this kind of problem.

It might be useful to keep in mind that R is not a package. (Repeat: R is NOT a package - I cannot stress that strongly enough.) It is a programming language. To use it effectively you really need to know something about how it works first. It might pay you to spend a little time getting used to the protocols, and how to do simple things in R like reading in data and manipulating it, before you tackle such a large and potentially tricky clustering problem.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Karen R. Khar
Sent: Monday, 27 June 2011 5:44 PM
To: r-help@r-project.org
Subject: [R] New to R, trying to use agnes, but can't load my distance matrix

Hi, I'm mighty new to R. I'm using it on Windows. I'm trying to cluster using a distance matrix I created from the data on my own and called it D10.dist. I loaded the cluster package. Then tried the following command...
agnes("E:D10.dist", diss = TRUE, metric = "euclidean", stand = FALSE,
      method = "average", par.method, keep.diss = n < 100, keep.data = !diss)

And it responded...

Error in agnes("E:D10.dist", diss = TRUE, metric = "euclidean", stand = FALSE, :
  x is not and cannot be converted to class "dissimilarity"

D10.dist has the following data...

D1  0
D2  0.608392 0
D3  0.497451 0.537662 0
D4  0.634548 0.393343 0.537426 0
D5  0.558785 0.543399 0.632221 0.726633 0
D6  0.659483 0.701778 0.741425 0.668624 0.655914 0
D7  0.603012 0.659173 0.571776 0.687599 0.383712 0.683948 0
D8  0.611919 0.665357 0.526453 0.715093 0.457496 0.698213 0.317039 0
D9  0.415010 0.652117 0.552011 0.689690 0.485988 0.702738 0.428190 0.442598 0
D10 0.376512 0.600607 0.517857 0.673515 0.530421 0.667736 0.537025 0.480620 0.240559 0

I would appreciate any suggestions. Please assume I know virtually nothing about R.

Thanks, Karen

PS I'll eventually be using ~10,000 species to cluster. I'll need to have within- and between-cluster distance info, and I'll want a plot colored by cluster. Is agnes the right R tool to use?

--
View this message in context: http://r.789695.n4.nabble.com/New-to-R-trying-to-use-agnes-but-can-t-load-my-ditance-matrix-tp3627154p3627154.html
Sent from the R help mailing list archive at Nabble.com.
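As a sketch of the first route described above (hold the data in R, build the dissimilarity yourself, then hand it to agnes() with diss = TRUE). The data here are entirely made up; only the cluster-package calls are the point:

```r
library(cluster)

set.seed(42)                                  # assumption: made-up example data
dat <- matrix(runif(50), nrow = 10,
              dimnames = list(paste0("D", 1:10), NULL))

d <- dist(dat, method = "euclidean")          # a "dist" object (lower triangle)
ag <- agnes(d, diss = TRUE, method = "average")

stopifnot(inherits(d, "dist"), length(ag$order) == 10)
```

An existing full symmetric matrix of distances can be converted with as.dist() before the agnes() call.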
Re: [R] Accessing variables in a data frame
Just to start things off:

var.name <- c("gdp", "inf", "unp")
var.id <- c("w", "i")
x <- paste(var.name, rep(var.id, each = length(var.name)), sep = "_")
x
[1] "gdp_w" "inf_w" "unp_w" "gdp_i" "inf_i" "unp_i"

Now the three differences

gdp_w - gdp_i
inf_w - inf_i
unp_w - unp_i

can be got using

dwi <- dat[, x[1:3]] - dat[, x[4:6]]

and the other three differences

gdp - gdp_w
inf - inf_w
unp - unp_w

by

dw <- dat[, var.name] - dat[, x[1:3]]

The results, in both cases, should be data frames.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Serguei Kaniovski
Sent: Sunday, 26 June 2011 10:01 PM
To: r-help@r-project.org
Subject: [R] Accessing variables in a data frame

Hello,

My data.frame (dat) contains many variables named var.names and others named var.names_var.id. For example:

var.name <- c("gdp", "inf", "unp")
var.id <- c("w", "i")
x <- paste(var.name, rep(var.id, each = length(var.name)), sep = "_")

How can I access variables in the data.frame by the names listed in x, for example to compute

gdp_w - gdp_i
inf_w - inf_i
unp_w - unp_i

or

gdp - gdp_w
inf - inf_w
unp - unp_w

without needing to code each difference separately? Thanks for your help!

Serguei Kaniovski

[[alternative HTML version deleted]]
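The two one-liners above can be checked end to end on a made-up data frame that follows the same naming scheme (all values random; only the indexing is the point):

```r
set.seed(1)  # assumption: toy data, seeded for reproducibility
var.name <- c("gdp", "inf", "unp")
var.id <- c("w", "i")
x <- paste(var.name, rep(var.id, each = length(var.name)), sep = "_")

dat <- as.data.frame(matrix(rnorm(5 * 9), nrow = 5))
names(dat) <- c(var.name, x)            # gdp, inf, unp, gdp_w, ..., unp_i

dwi <- dat[, x[1:3]] - dat[, x[4:6]]    # gdp_w - gdp_i, etc.
dw  <- dat[, var.name] - dat[, x[1:3]]  # gdp - gdp_w, etc.

stopifnot(is.data.frame(dwi),
          all(dwi[, 1] == dat$gdp_w - dat$gdp_i),
          all(dw[, 1]  == dat$gdp  - dat$gdp_w))
```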
Re: [R] extract worksheet names from an Excel file
Package XLConnect appears to provide this kind of thing.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Shi, Tao
Sent: Friday, 24 June 2011 2:42 PM
To: r-help@r-project.org
Subject: [R] extract worksheet names from an Excel file

Hi list,

Is there an R function I can use to extract the worksheet names from an Excel file? If not, any other automatic ways (not using R) to do this?

thanks!

...Tao
Re: [R] omitting columns from a data frame
Suppose

names(xm1) <- c("alpha", "beta", "gamma", "delta")

then

xm2 <- subset(xm1, select = alpha:gamma)

or

xm2 <- subset(xm1, select = -delta)

will do the same job as xm1[, -4].

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Tuesday, 21 June 2011 1:45 PM
To: R help
Subject: [R] omitting columns from a data frame

Dear R People:

I have a data frame, xm1, which has 12 rows and 4 columns. If I type xm1[, -4], I get all rows, and columns 1 - 3, which is as it should be. Now, is there a way to use the names of the columns to omit them, please?

Thanks so much in advance!

Sincerely,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
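A self-contained check of the equivalence claimed above (the toy values are an assumption, chosen to match the 12 x 4 frame in the question):

```r
xm1 <- as.data.frame(matrix(1:48, nrow = 12))   # made-up 12 x 4 frame
names(xm1) <- c("alpha", "beta", "gamma", "delta")

a <- subset(xm1, select = alpha:gamma)   # keep a contiguous range of columns
b <- subset(xm1, select = -delta)        # drop one column by name
p <- xm1[, -4]                           # positional equivalent

stopifnot(identical(names(a), c("alpha", "beta", "gamma")),
          all(a == b), all(b == p))
```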
Re: [R] and now a cut question
con <- textConnection("13053 13068 13068 13053 14853 14853 14850 14850 13053 13053 13068 13068")
x <- scan(con)
Read 12 items
cut(x, 4)
 [1] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
 [4] (1.31e+04,1.35e+04] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04]
 [7] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04] (1.31e+04,1.35e+04]
[10] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
4 Levels: (1.31e+04,1.35e+04] (1.35e+04,1.4e+04] ... (1.44e+04,1.49e+04]
cut(x, 4, dig.lab = 5)
 [1] (13051,13502] (13051,13502] (13051,13502] (13051,13502] (14404,14855]
 [6] (14404,14855] (14404,14855] (14404,14855] (13051,13502] (13051,13502]
[11] (13051,13502] (13051,13502]
Levels: (13051,13502] (13502,13953] (13953,14404] (14404,14855]

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Tuesday, 21 June 2011 10:46 AM
To: R help
Subject: [R] and now a cut question

Hello again R People:

I have the following:

xm1[,1]
 [1] 13053 13068 13068 13053 14853 14853 14850 14850 13053 13053 13068 13068
cut(xm1[,1], 4)
 [1] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
 [4] (1.31e+04,1.35e+04] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04]
 [7] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04] (1.31e+04,1.35e+04]
[10] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
4 Levels: (1.31e+04,1.35e+04] (1.35e+04,1.4e+04] ... (1.44e+04,1.49e+04]

Is there any way to have the levels print as 13100, 13500, etc., please?

Thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
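Beyond dig.lab, round labels such as 13000, 13500, ... can be had by supplying the breaks (and labels) explicitly rather than asking cut() for 4 equal-width intervals; a sketch using the same twelve values (the particular breaks are an assumption):

```r
x <- c(13053, 13068, 13068, 13053, 14853, 14853,
       14850, 14850, 13053, 13053, 13068, 13068)

brks <- seq(13000, 15000, by = 500)        # round, human-readable cut points
f <- cut(x, breaks = brks,
         labels = paste(head(brks, -1), brks[-1], sep = "-"))

levels(f)   # "13000-13500" "13500-14000" "14000-14500" "14500-15000"
stopifnot(as.character(f[1]) == "13000-13500",
          as.character(f[5]) == "14500-15000")
```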
Re: [R] converting character to numeric
...or something like that. Without more details it is hard to know just what is going on.

Firstly, in R the object is a 'data frame' (or object of class data.frame, to be formal). There is no standard object in R called a 'database'. If you read in your data using read.csv, then mydata is going to be a data frame. Character columns in the original .csv file will (most likely) be factors in the R object. (This varies with how you import it, though.) This means you will need to convert them to character before you convert them to numeric. If they really are character, this initial conversion will not do anything (good or bad).

If you want to operate on the individual columns of the data frame, then I would recommend you do it using something like:

mydata <- within(mydata, {
  apples <- as.numeric(as.character(apples))
  oranges <- as.numeric(as.character(oranges))
  ...
})

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Steven Kennedy
Sent: Wednesday, 22 June 2011 6:52 AM
To: Alina Sheyman
Cc: r-help@r-project.org
Subject: Re: [R] converting character to numeric

You need:

mydata$apples <- as.numeric(mydata$apples)

On Wed, Jun 22, 2011 at 6:38 AM, Alina Sheyman alina...@gmail.com wrote:

I'm trying to convert data from character to numeric. I've imported the data as a csv file. I'm assuming that the import is a database - are all the columns in a database considered vectors, and can they be operated on individually? Therefore I've tried the following:

mydata <- as.numeric(mydata$apples)

When I then look at mydata again, the named column is still in character format. If I do

mydata2 <- as.numeric(mydata$apples)

the new object mydata2 is empty. Am I missing something about the structure of R?
alina

[[alternative HTML version deleted]]
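The within() recipe above, run on a made-up frame, also shows the classic trap it avoids: as.numeric() applied directly to a factor returns the underlying level codes, not the printed numbers.

```r
mydata <- data.frame(apples  = c("3", "10", "7"),   # numbers stored as text
                     oranges = c("1", "4", "2"),
                     stringsAsFactors = TRUE)        # columns become factors

wrong <- as.numeric(mydata$apples)                   # level codes, not 3 10 7

mydata <- within(mydata, {
  apples  <- as.numeric(as.character(apples))
  oranges <- as.numeric(as.character(oranges))
})

stopifnot(!all(wrong == c(3, 10, 7)),
          identical(mydata$apples, c(3, 10, 7)))
```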
Re: [R] converting character to numeric
The point I would make is that for safety it's much better to use FALSE rather than F. FALSE is a reserved word in R; F is a pre-set variable, but it can easily be changed at any time by the user.

Secondly, doesn't this do the same as yours?

readFF.csv <- function(..., stringsAsFactors = FALSE)
    read.csv(..., stringsAsFactors = stringsAsFactors)

(Warning: untested code...)

Personally, I much prefer stringsAsFactors = TRUE, but then that's the way I was brought up...

Bill Venables.

-----Original Message-----
From: Don McKenzie [mailto:d...@u.washington.edu]
Sent: Wednesday, 22 June 2011 8:40 AM
To: Venables, Bill (CMIS, Dutton Park)
Cc: stevenkennedy2...@gmail.com; alina...@gmail.com; r-help@r-project.org
Subject: Re: [R] converting character to numeric

I have to chime in here with a slight revision to read.csv(), for those of us like me who have whined over time about stringsAsFactors. (Tested on my machine, macOS X 10.6.)

readFF.csv <- function(file, header = TRUE, sep = ",", quote = "\"", dec = ".",
                       fill = TRUE, comment.char = "", ...)
    read.table(file = file, header = header, sep = sep, quote = quote,
               dec = dec, fill = fill, stringsAsFactors = FALSE,
               comment.char = comment.char, ...)

On 21-Jun-11, at 3:16 PM, bill.venab...@csiro.au wrote:

...or something like that. Without more details it is hard to know just what is going on. Firstly, in R the object is a 'data frame' (or object of class data.frame, to be formal). There is no standard object in R called a 'database'. If you read in your data using read.csv, then mydata is going to be a data frame. Character columns in the original .csv file will (most likely) be factors in the R object. (This varies with how you import it, though.) This means you will need to convert them to character before you convert them to numeric. If they really are character, this initial conversion will not do anything (good or bad).
Science is organized skepticism in the reliability of expert opinion -- Richard Feynman

Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

phone: 206-732-7824
d...@uw.edu
Re: [R] Unreasonable syntax error
The advice is always NOT to use Microsoft Word to edit an R file. That stuff is poisonous. Microsoft Word, typical of all Microsoft software, does not do what you tell it to do but helpfully does what it thinks you meant to ask it to do, but were too dumb to do so. Even Notepad, gawdelpus, would be better.

When I look at your script, the first sign of trouble I see is:

Error: unexpected input in:
"s[1,3,k] <- (0.1) * runif(1) + 0.1; s[1,1,k] <- (0.02) * runif(1) + 0.98 ¨"

which is the spurious character Microsoft Word has helpfully inserted to make it all look nice. It's downhill all the way from there.

(R does not expect Microsoft Word, just as nobody expects the Spanish Inquisition. See http://www.youtube.com/watch?v=CSe38dzJYkY).

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of james198877
Sent: Tuesday, 21 June 2011 7:13 AM
To: r-help@r-project.org
Subject: [R] Unreasonable syntax error

http://r.789695.n4.nabble.com/file/n3612530/PSC.r PSC.r

Hi all,

I just wrote a program in R by editing it in Microsoft Word and then pasting it into the text editor of R. The above is the file, and below is what the console complains about. Why doesn't it recognise 'r'?? I have to mention that when I typed at least the first several lines into the console directly, the first error didn't appear. I didn't try the later errors, as there would be too many lines to type... I'm not sure if this is something about Word.

http://r.789695.n4.nabble.com/file/n3612530/lastsave.txt lastsave.txt

Thanks a lot for your help!!!

--
View this message in context: http://r.789695.n4.nabble.com/Unreasonable-syntax-error-tp3612530p3612530.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] setting breaks in hist
The way to guarantee a specific number of panels in the histogram, say n, is to specify n + 1 breaks which cover the range of the data. As far as I know this is the only way.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Tuesday, 21 June 2011 10:17 AM
To: R help
Subject: [R] setting breaks in hist

Dear R People:

Is there a way to guarantee that breaks = n will give you exactly n breaks, please? I'm fairly certain that the answer is no, but thought I'd check.

Thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
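A sketch of that advice: to get exactly n bars, hand hist() a vector of n + 1 breaks spanning the data (the data here are made up):

```r
set.seed(1)  # assumption: made-up data
x <- rnorm(100)

n <- 7                                           # want exactly 7 bars
brks <- seq(min(x), max(x), length.out = n + 1)  # n + 1 breaks cover the range
h <- hist(x, breaks = brks, plot = FALSE)

stopifnot(length(h$counts) == n, sum(h$counts) == length(x))
```

With an explicit breaks vector, hist() uses it exactly; a single number is only a suggestion passed to pretty().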
Re: [R] please help! what is the difference between using a log-link function and a log transformation?
The two commands you give below are certain to lead to very different results, because they are fitting very different models. The first is a gaussian model for the response with a log link and constant variance. The second is a gaussian model for a log-transformed response with identity link. On the original scale, this second model implies a constant coefficient of variation, and hence a variance proportional to the square of the mean, not constant.

Your problem is not particularly an R issue, but a difficulty with understanding generalized linear models (and hence generalized additive models, which are based on them).

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of pigpigmeow [gloryk...@hotmail.com]
Sent: 19 June 2011 17:39
To: r-help@r-project.org
Subject: [R] please help! what is the difference between using a log-link function and a log transformation?

I'm a new R user and I need to use the gam function.

y <- gam(a ~ s(b), family = gaussian(link = log), data)
y <- gam(log(a) ~ s(b), family = gaussian(link = identity), data)

Why do these two commands give different results? I guessed the results would be the same, but actually they are different. Why?

--
View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] For loop by factor.
If I understand you correctly, you are trying to find the cumulative maximum from the end within each level of the factor. If this is what you are trying to do, then here is one way you might like to do it.

First, define the function:

cumMax <- function(x) Reduce(max, x, right = TRUE, accumulate = TRUE)

Here is how you might use it:

test
  A B
1 a 3
2 a 2
3 a 1
4 b 3
5 b 2
6 c 2
7 c 3
8 c 1
9 c 1
(test <- within(test, B <- ave(B, A, FUN = cumMax)))
  A B
1 a 3
2 a 2
3 a 1
4 b 3
5 b 2
6 c 3
7 c 3
8 c 1
9 c 1

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Christopher Peters
Sent: Monday, 20 June 2011 6:21 AM
To: r-help@r-project.org
Subject: [R] For loop by factor.

I have a data.frame as follows:

a 3
a 2
a 1
b 3
b 2
c 2
c 3
c 1
c 1

Each factor level (a, b, c) should be monotonically decreasing; notice that factor 'c' is not. I could use some help figuring out the logical structure (mostly just syntax) that will check each 'next value' within each level to see if it is less than the previous value. If it is less than the previous value, do nothing; otherwise carry the larger value back over the preceding entries so that each level's sequence becomes non-increasing. So basically the data.frame would look like:

a 3
a 2
a 1
b 3
b 2
c 3
c 3
c 1
c 1

Thanks for your help!

__
Christopher P. Peters
Research Associate
Center for Energy Studies
http://www.enrg.lsu.edu/staff/peters
Energy, Coast Environment Building
Louisiana State University
Baton Rouge, LA 70803
Telephone: 225-578-4400
Fax: 225-578-4541

[[alternative HTML version deleted]]
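The Reduce()/ave() combination above can be checked end to end on the thread's data:

```r
cumMax <- function(x) Reduce(max, x, right = TRUE, accumulate = TRUE)

test <- data.frame(A = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
                   B = c(3, 2, 1, 3, 2, 2, 3, 1, 1))

# within each level of A, replace B by the cumulative maximum from the end
test <- within(test, B <- ave(B, A, FUN = cumMax))

stopifnot(identical(test$B, c(3, 2, 1, 3, 2, 3, 3, 1, 1)))
```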
Re: [R] can this sequence be generated easier?
Here is an idea that might be useful to adapt:

fixedSumCombinations <- function(N, terms) {
  if (terms == 1) return(N)
  if (terms == 2) return(cbind(0:N, N:0))
  X <- NULL
  for (i in 0:N) X <- rbind(X, cbind(i, Recall(N - i, terms - 1)))
  X
}

Example:

fixedSumCombinations(4, 3)
      i
 [1,] 0 0 4
 [2,] 0 1 3
 [3,] 0 2 2
 [4,] 0 3 1
 [5,] 0 4 0
 [6,] 1 0 3
 [7,] 1 1 2
 [8,] 1 2 1
 [9,] 1 3 0
[10,] 2 0 2
[11,] 2 1 1
[12,] 2 2 0
[13,] 3 0 1
[14,] 3 1 0
[15,] 4 0 0

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of pdb [ph...@philbrierley.com]
Sent: 18 June 2011 14:34
To: r-help@r-project.org
Subject: [R] can this sequence be generated easier?

I have 'x' variables that I need to find the optimum combination of, with the constraint that the sum of all x variables needs to be exactly 100. I need to test all combinations to get the optimal mix. This is easy if I know how many variables I have - I can hard code as below. But what if I don't know the number of variables and want this to be a flexible parameter? Is there a sexy recursive way that this can be done in R?

# for combinations of 2 variables
vars <- 2
for (i in 0:100) {
  for (j in 0:(100 - i)) {
    # ...do some test on i,j combination
  }
}

# for combinations of 3 variables
vars <- 3
for (i in 0:100) {
  for (j in 0:(100 - i)) {
    for (k in 0:(100 - (i + j))) {
      # ...do some test on i,j,k combination
    }
  }
}

--
View this message in context: http://r.789695.n4.nabble.com/can-this-sequence-be-generated-easier-tp3607240p3607240.html
Sent from the R help mailing list archive at Nabble.com.
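Two quick properties sanity-check the recursion: every row sums to N, and the number of rows is the number of weak compositions of N into `terms` parts, choose(N + terms - 1, terms - 1).

```r
fixedSumCombinations <- function(N, terms) {
  if (terms == 1) return(N)
  if (terms == 2) return(cbind(0:N, N:0))
  X <- NULL
  for (i in 0:N) X <- rbind(X, cbind(i, Recall(N - i, terms - 1)))
  X
}

combs <- fixedSumCombinations(4, 3)
stopifnot(nrow(combs) == choose(4 + 3 - 1, 3 - 1),  # 15 rows
          all(rowSums(combs) == 4))
```

For the poster's problem, each row of fixedSumCombinations(100, vars) is one candidate mix summing to 100.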
Re: [R] functions for polynomial and rational interpolation?
Neville's algorithm is not an improvement on Lagrange interpolation; it is simply one way of calculating it that has some useful properties. The result is still the Lagrange interpolating polynomial, though, with all its flaws. Implementing Neville's algorithm is fairly easy using the PolynomF package, but I'm not sure it really offers much advantage. YMMV.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of dslowik
Sent: Tuesday, 14 June 2011 10:34 AM
To: r-help@r-project.org
Subject: [R] functions for polynomial and rational interpolation?

Are there implementations of, e.g., Neville's algorithm for interpolating polynomials through some data points? Neville's is an improvement on Lagrange interpolation. And how about interpolating rational functions? I could not find anything at rseek.org or at crantastic.org.

thanks

--
View this message in context: http://r.789695.n4.nabble.com/functions-for-polynomial-and-rational-interplation-tp3595334p3595334.html
Sent from the R help mailing list archive at Nabble.com.
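For evaluating the interpolating polynomial at a single point (as opposed to building it symbolically with PolynomF), Neville's tableau is only a few lines of base R. This is a generic sketch of the standard algorithm, not code from the thread:

```r
# Neville's algorithm: evaluate at x0 the unique polynomial through (x, y)
neville <- function(x, y, x0) {
  n <- length(x)
  p <- y
  for (k in 1:(n - 1))
    for (i in 1:(n - k))
      p[i] <- ((x0 - x[i + k]) * p[i] + (x[i] - x0) * p[i + 1]) /
        (x[i] - x[i + k])
  p[1]
}

# the unique quadratic through three points on y = x^2 reproduces x^2
xs <- c(0, 1, 3)
stopifnot(isTRUE(all.equal(neville(xs, xs^2, 2), 4)))
```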
Re: [R] Score Test Function
The score test looks at the effect of adding extra columns to the model matrix. The function glm.scoretest takes the fitted model object as the first argument and the extra column, or columns, as the second argument. Your x2 argument has length only 3. Is this really what you want? I would have expected you need to specify a vector of length nrow(DF), [as in the help information for glm.scoretest itself].

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tyler Rinker
Sent: Sunday, 12 June 2011 2:46 PM
To: r-help@r-project.org
Subject: [R] Score Test Function

Greetings R Community,

I'm trying to learn logistic regression on my own and am using "An Introduction to Logistic Regression Analysis and Reporting" (Peng, C., Lee, K., Ingersoll, G., 2002). This article uses a score test statistic as a measure of overall fit for a logistic regression model. The author calculates this statistic using SAS. I am looking for an [R] function that can compute this statistic and a p-value (please don't critique the stat itself). As far as I can tell, glm.scoretest() from library(statmod) is the only function for this, but it does not give a p-value and appears not to correspond with the author's example. Some chat on the discussion board a while back indicated that this was a well known test but difficult to calculate in [R]. I'm wondering if there is a way to do this fairly easily in [R] at this time?

I am working with the data set from the study and am looking to get close to the 9.5177 score test stat and .0086 p-value with 2 degrees of freedom. Below is the code for the data set and some descriptives so you can see the data set is the same as in the study. I highlighted my attempt to get a score test statistic and am curious if this is it (minus the p-value).
# BEGINNING OF CODE
id <- factor(1:189)
gender <- factor(c(
  "Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl",
  "Boy","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl",
  "Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl","Boy","Boy","Boy","Boy","Girl","Boy",
  "Girl","Boy","Boy","Boy","Girl","Boy","Girl","Girl","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Girl",
  "Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Boy",
  "Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl",
  "Girl","Girl","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Girl","Boy","Boy","Girl","Boy","Boy","Boy",
  "Boy","Girl","Boy","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Boy","Boy","Girl"))
reading.score <- c(
  91.0,77.5,52.5,54.0,53.5,62.0,59.0,51.5,61.5,56.5,47.5,75.0,47.5,53.5,50.0,50.0,49.0,59.0,60.0,60.0,
  60.5,50.0,101.0,60.0,60.0,83.5,61.0,75.0,84.0,56.5,56.5,45.0,60.5,77.5,62.5,70.0,69.0,62.0,107.5,54.5,92.5,94.5,65.0,
  80.0,45.0,45.0,66.0,66.0,57.5,42.5,60.0,64.0,65.0,47.5,57.5,55.0,55.0,76.5,51.5,59.5,59.5,59.5,55.0,70.0,66.5,84.5,
  57.5,125.0,70.5,79.0,56.0,75.0,57.5,56.0,67.5,114.5,70.0,67.0,60.5,95.0,65.5,85.0,55.0,63.5,61.5,60.0,52.5,65.0,87.5,
  62.5,66.5,67.0,117.5,47.5,67.5,67.5,77.0,73.5,73.5,68.5,55.0,92.0,55.0,55.0,60.0,120.5,56.0,84.5,60.0,85.0,93.0,60.0,
  65.0,58.5,85.0,67.0,67.5,65.0,60.0,47.5,79.0,80.0,57.5,64.5,65.0,60.0,85.0,60.0,58.0,61.5,60.0,65.0,93.5,52.5,42.5,
  75.0,48.5,64.0,66.0,82.5,52.5,45.5,57.5,65.0,46.0,75.0,100.0,77.5,51.5,62.5,44.5,51.0,56.0,58.5,69.0,65.0,60.0,65.0,
  65.0,40.0,55.0,52.5,54.5,74.0,55.0,60.5,50.0,48.0,51.0,55.0,93.5,61.0,52.5,57.5,60.0,71.0,65.0,60.0,55.0,60.0,77.0,
  52.5,95.0,50.0,47.5,50.0,47.0,71.0,65.0)
reading.recommendation <- as.factor(c(rep("no", 130), rep("yes", 59)))
DF <- data.frame(id, gender, reading.score, reading.recommendation)
head(DF)
#=
# DESCRIPTIVES
#=
table(DF[2:4])  # BREAKDOWN OF SCORES BY GENDER AND REMEDIAL READING RECOMMENDATIONS
table(DF[c(2)])  # TABLE OF GENDER
print(table(DF[c(2)])/sum(table(DF[c(4)]))*100, digits = 4)  # PERCENT GENDER BREAKDOWN
table(DF[c(4)])  # TABLE OF THOSE RECOMMENDED FOR REMEDIAL READING
print(table(DF[c(4)])/sum(table(DF[c(4)]))*100, digits = 4)  # Probability of recommended
table(DF[c(2,4)])  # TABLE OF GIRLS AND BOYS RECOMMENDED FOR REMEDIAL READING
print(prop.table(table(DF[c(2,4)]), 1)*100, digits = 4)  # Probability of recommended
#=
# ANALYSIS
#=
(mod1 <- glm(reading.recommendation ~ reading.score + gender, family = binomial, data = DF))
library(statmod)
with(DF, glm.scoretest(mod1, c(0,2,3), dispersion = NULL))^2  # If I move the decimal over 2 to the right
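For the p-value Tyler is missing: glm.scoretest() in the statmod package returns a z-statistic, so a two-sided p-value follows from the normal distribution (or, equivalently, from z^2 against a chi-square on 1 df). A hedged sketch with simulated data, not the study's data set; note the candidate column x2 has length nrow of the data, as Bill points out, and the code is guarded in case statmod is not installed:

```r
# Score test for adding one column to a fitted binomial glm.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)                      # candidate column: length n, not 3
y  <- rbinom(n, 1, plogis(0.5 * x1))
fit <- glm(y ~ x1, family = binomial)

if (requireNamespace("statmod", quietly = TRUE)) {
  z <- statmod::glm.scoretest(fit, x2)      # a z-statistic
  p <- 2 * pnorm(-abs(z))                   # same as pchisq(z^2, 1, lower.tail = FALSE)
  c(score.z = z, p.value = p)
}
```

The two p-value formulas agree exactly; for a multi-column (multi-df) test, as in the 2-df example from the paper, the single-column statistics do not simply add unless the columns are orthogonal, so treat this as a per-column sketch.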
Re: [R] Custom Sort on a Table object
Here is one way.

tab
                     f        m
0 to 5       11.328000 6.900901
15 to 24      6.100570 5.190058
25 to 34      9.428707 6.567280
35 to 44     10.462158 7.513270
45 to 54      7.621988 5.692905
5 to 14       6.502741 6.119663
55 to 64      5.884737 4.319905
65 to 74      5.075606 4.267810
75 to 84      4.702020 3.602362
85 and over   4.750000 3.877551

lowAge <- as.numeric(sapply(strsplit(rownames(tab), " "), "[", 1))
(tab <- tab[order(lowAge), ])
                     f        m
0 to 5       11.328000 6.900901
5 to 14       6.502741 6.119663
15 to 24      6.100570 5.190058
25 to 34      9.428707 6.567280
35 to 44     10.462158 7.513270
45 to 54      7.621988 5.692905
55 to 64      5.884737 4.319905
65 to 74      5.075606 4.267810
75 to 84      4.702020 3.602362
85 and over   4.750000 3.877551

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Galen Moore Sent: Tuesday, 7 June 2011 1:23 PM To: r-help@r-project.org Subject: [R] Custom Sort on a Table object Greetings - I've got the following table (the result of a two-way table operation):

                     f        m
0 to 5       11.328000 6.900901
15 to 24      6.100570 5.190058
25 to 34      9.428707 6.567280
35 to 44     10.462158 7.513270
45 to 54      7.621988 5.692905
5 to 14       6.502741 6.119663
55 to 64      5.884737 4.319905
65 to 74      5.075606 4.267810
75 to 84      4.702020 3.602362
85 and over   4.750000 3.877551

which I'd like to sort so that the column of rownames (which represent age bands) and their corresponding f and m values appear in logical order. I've tried a bunch of things: merging with a separate df bearing age bands paired with a sequence number, stripping out row vectors and rbind-ing a new df, etc., all to no avail. It seems to be very difficult to spin a table object into a data frame without being stuck with the table's rownames! I haven't yet tried writing to an external file and then reading it back (so as to get R to forget that it's a table object), and then merging on the group bands to pull in a sequence vector upon which to do an order(). Seems like it should be easier.
Many thanks, Galen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
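Bill's order-by-leading-number idea works on any matrix or table with age-band rownames; a small self-contained sketch (hypothetical values, not Galen's data):

```r
# Sort rows by the number at the start of each rowname.
tab <- matrix(1:6, ncol = 2,
              dimnames = list(c("15 to 24", "0 to 5", "5 to 14"),
                              c("f", "m")))
lowAge <- as.numeric(sapply(strsplit(rownames(tab), " "), "[", 1))
tab[order(lowAge), ]
#          f m
# 0 to 5   2 5
# 5 to 14  3 6
# 15 to 24 1 4
```

A band such as "85 and over" still sorts correctly, because its first space-separated token is "85".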
Re: [R] Matrix Question
Here is one way you might do it.

con <- textConnection("
characteristics_ch1.3 Stage: T1N0 Stage: T2N1
Stage: T0N0 Stage: T1N0 Stage: T0N3
")
txt <- scan(con, what = "")
## Read 11 items
close(con)
Ts <- grep("^T", txt, value = TRUE)
Ts <- sub("T([[:digit:]]+)N([[:digit:]]+)", "\\1x\\2", Ts)
out <- do.call(rbind, strsplit(Ts, "x"))
mode(out) <- "numeric"
dimnames(out) <- list(rep("", nrow(out)), c("T", "N"))
out
##  T N
##  1 0
##  2 1
##  0 0
##  1 0
##  0 3

Now you can print 'out' however you want it, e.g.

sink("outfile.txt")
out
sink()

This is slightly more complex than it might be as I have allowed for the possibility that your numbers have more than one digit. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Ganzfried Sent: Friday, 3 June 2011 4:54 AM To: r-help@r-project.org Subject: [R] Matrix Question Hi, First of all, I would like to introduce myself as I will probably have many questions over the next few weeks and want to thank you guys in advance for your help. I'm a cancer researcher and I need to learn R to complete a few projects. I have an introductory background in Python. My questions at the moment are based on the following sample input file: *Sample_Input_File* characteristics_ch1.3 Stage: T1N0 Stage: T2N1 Stage: T0N0 Stage: T1N0 Stage: T0N3 characteristics_ch1.3 is a column header in the input Excel file. T's represent stage and N's represent degree of disease spreading.
I want to create output that looks like this:

*Sample_Output_File*
T N
1 0
2 1
0 0
1 0
0 3

As it currently stands, my code is the following:

rm(list = ls())
source("../../functions.R")
uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv", as.is = TRUE, row.names = 1)
## initial creation of curated dataframe
curated <- initialCuratedDF(rownames(uncurated), template.filename = "Sample_Template_File.csv")
##
## start the mappings
##
## title -> alt_sample_name
curated$alt_sample_name <- uncurated$title
# T
tmp <- uncurated$characteristics_ch1.3
tmp <- *??*
curated$T <- tmp
# N
tmp <- uncurated$characteristics_ch1.3
tmp <- *??*
curated$N <- tmp
write.table(curated, row.names = FALSE, file = "../curated/Sample_Output_File_curated_pdata.txt", sep = "\t")

My question is the following: What code gets me the desired output (replacing the *??*'s above)? I want to: a) find the integer value one element to the right of T; and b) find the integer value one element to the right of N. I've read the regular expression tutorial for R, but could only figure out how to grab an integer value if it is the only integer value in the row (i.e. more than one integer value makes this basic regular expression unsuccessful). Thank you very much for any help you can provide. Sincerely, Ben Ganzfried
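An alternative to the scan()/strsplit() route, staying entirely with base R regular expressions (a sketch on the sample values): capture the digits after T and after N directly with sub() and back-references.

```r
x <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0", "Stage: T1N0", "Stage: T0N3")
# "\\1" substitutes whatever the first parenthesised group matched.
T.val <- as.numeric(sub(".*T([0-9]+)N[0-9]+.*", "\\1", x))
N.val <- as.numeric(sub(".*T[0-9]+N([0-9]+).*", "\\1", x))
cbind(T = T.val, N = N.val)
#      T N
# [1,] 1 0
# [2,] 2 1
# [3,] 0 0
# [4,] 1 0
# [5,] 0 3
```

In Ben's script, T.val and N.val computed from uncurated$characteristics_ch1.3 could stand in for the two *??* placeholders, assuming every entry really contains a TxNy code.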
Re: [R] count value changes in a column
I thought so too. If so, here is one way you could do it:

fixSeq <- function(state) {
  shift1 <- function(x) c(1, x[-length(x)])
  repeat {
    change <- state %in% c(4, 5) & shift1(state) == 3
    if (any(change)) state[change] <- 3 else break
  }
  state
}

e.g.

state
 [1] 1 3 3 5 5 3 2 4 2 1 5 3 3 5
fixSeq(state)
 [1] 1 3 3 3 3 3 2 4 2 1 5 3 3 3

For the data frame:

set.seed(144)
dfr <- data.frame(state = sample(rep(1:5, 200)))
dfr <- within(dfr, changed <- fixSeq(state))
head(dfr, 11)
   state changed
1      5       5
2      2       2
3      1       1
4      5       5
5      2       2
6      1       1
7      1       1
8      4       4
9      3       3
10     5       3  ### --- change
11     3       3

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman Sent: Wednesday, 1 June 2011 11:20 AM To: Justin Haynes Cc: r-help@r-project.org Subject: Re: [R] count value changes in a column Why isn't the sequence: 1 3 3 3 3 3 2 4 2 1 5 3 3 3 according to your rule about 5s following 3s? On Tue, May 31, 2011 at 6:23 PM, Justin Haynes jto...@gmail.com wrote: Is there a way to look for value changes in a column?

set.seed(144)
df <- data.frame(state = sample(rep(1:5, 200), 1000))

Any of the five states are acceptable. However if, for example, states 4 or 5 follow state 3, I want to overwrite them with 3. Changes from 1 to any value and 2 to any value are acceptable, as are changes from any value to 1 or 2. By way of an example: the sequence 1 3 3 5 5 3 2 4 2 1 5 3 3 5 should read 1 3 3 3 3 3 2 4 2 1 5 5 5 5. Thanks for the help! Justin -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
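The fixed-point loop in fixSeq can be checked against the example vector (a sketch; the function body is copied from the reply above). Each pass replaces any 4 or 5 that immediately follows a 3, and repeats until nothing changes, so a run like 3 5 5 collapses to 3 3 3 over two passes:

```r
fixSeq <- function(state) {
  shift1 <- function(x) c(1, x[-length(x)])  # lag by one position
  repeat {
    change <- state %in% c(4, 5) & shift1(state) == 3
    if (any(change)) state[change] <- 3 else break
  }
  state
}

state <- c(1, 3, 3, 5, 5, 3, 2, 4, 2, 1, 5, 3, 3, 5)
fixSeq(state)
# 1 3 3 3 3 3 2 4 2 1 5 3 3 3
```

Note the result matches Jim's reading of the rule, not the 1 5 5 5 5 tail in Justin's original example, which is exactly the discrepancy Jim queried.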
Re: [R] Forcing a negative slope in linear regression?
If you want to go ahead with this in cold blood, you might look at the 'nnls' package. It fits regressions with non-negative coefficients. This might seem like the very opposite of what you want, but it essentially gets you there. You have to be prepared for the coefficient to go to zero though, if according to the data it really needs to be positive to minimise the residual SSQ. Here's what you do: * For any predictor, x, for which you want the regression coefficient to be non-positive, use -x as the predictor in the model. Think about it. * (The real trick) For any predictor, z, whose coefficient is not to be constrained at all, put *both* z and -z in as predictors. The algorithm will choose only one of them. nnls is now quite an old package and the interface is rather klunky, but the method is still OK. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Newmiller Sent: Wednesday, 1 June 2011 11:38 AM To: J S; r-help@r-project.org Subject: Re: [R] Forcing a negative slope in linear regression? If you force the slope, it is no longer a regression, so no. It is best to add those other dependent variables to the regression and evaluate whether their presence causes the fit to improve and yield signs of coefficients that match what you expect. --- Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. J S yulya...@gmail.com wrote: Dear forum members, How can I force a negative slope in a linear regression even though the slope might be positive? I will need it for the purpose of determining the trend due reasons other than biological because the biological (genetic) trend is not positive for these data. Thanks. 
Julia

Example of the data: [1] 1.254 1.235 1.261 0.952 1.202 1.152 0.801 0.424 0.330 0.251 0.229 0.246 [13] 0.414 0.494 0.578 0.628 0.514 0.594 0.827 0.812 0.629 0.928 0.707 0.976 [25] 1.099 1.039 1.272 1.398 1.926 1.987 2.132 1.644 2.174 2.453 2.392 3.002 [37] 3.352 2.410 2.206 2.692 2.653 1.604 2.536 3.070 3.137 4.187 4.803 4.575 [49] 4.580 3.779 4.201 5.685 4.915 5.929 5.474 6.140 5.182 5.524 5.848 5.830 [61] 5.800 7.517 6.422
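Bill's two sign tricks can be sketched in a few lines (a hypothetical example; nnls() from the 'nnls' package solves min ||Ax - b||^2 subject to x >= 0, and the call is guarded in case the package is absent):

```r
set.seed(1)
n <- 60
x <- runif(n)   # coefficient to be constrained non-positive
z <- rnorm(n)   # coefficient left unconstrained
y <- 2 - 1.5 * x + 0.8 * z + rnorm(n, sd = 0.2)

# Trick 1: enter -x, so a non-negative coefficient on -x is a
#          non-positive coefficient on x.
# Trick 2: enter both signs of the unconstrained terms (intercept and z);
#          the algorithm will use at most one of each pair.
A <- cbind(1, -1, -x, z, -z)

if (requireNamespace("nnls", quietly = TRUE)) {
  fit <- nnls::nnls(A, y)
  b <- fit$x
  c(intercept = b[1] - b[2],
    beta.x    = -b[3],        # guaranteed <= 0
    beta.z    = b[4] - b[5])  # free to be either sign
}
```

As Bill warns, beta.x comes out exactly zero whenever an unconstrained fit would have wanted it positive.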
Re: [R] Value of 'pi'
There is an urban legend that says Indiana passed a law implying pi = 3. (Because it says so in the bible...) -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joshua Wiley Sent: Monday, 30 May 2011 4:10 PM To: Vincy Pyne Cc: r-help@r-project.org Subject: Re: [R] Value of 'pi' Dear Vincy, I hope that in school you also learned that 22/7 is an approximation. Please consult your local mathematician for a proof that pi != 22/7. A quick search will provide you with volumes of information on what pi is, how it may be calculated, and calculations out to thousands of digits. Cheers, Josh On Sun, May 29, 2011 at 11:01 PM, Vincy Pyne vincy_p...@yahoo.ca wrote: Dear R helpers, I have one basic doubt about the value of pi. In school, we have learned that pi = 22/7 (which is = 3.142857). However, if I type pi in R, I get pi = 3.141593. So which value of pi should be considered? Regards Vincy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
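Josh's point is easy to verify at the console (a quick sketch; R's pi is the double-precision value, accurate to about 16 significant digits, while 22/7 is only a school approximation):

```r
print(pi, digits = 15)     # 3.14159265358979
print(22/7, digits = 15)   # 3.14285714285714
pi - 22/7                  # about -0.00126: 22/7 overshoots in the third decimal
```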
Re: [R] Basic question about three factor Anova
This is really a question about the help file for gl. The arguments are

gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)

'n' is the number of factor levels. That seems to be easy enough. 'k' is called the number of replications. This is perhaps not the best way to express what it is: k is the number of times each of the n levels is to be repeated before starting again. In your example the 'a' levels are repeated 3 times (to cover 'b'), the 'b' levels are repeated once since you read in the values b1 b2 b3 b1 b2 ..., and the levels of 'c' are repeated 60 times each since the top 60 values are all c1 and the bottom 60 values are all c2. 'length' is the overall length of the factor you are generating. By default it is just n*k, but in this case it has to be 4 (A levels) x 3 (B levels) x 2 (C levels) x 5 (reps in each A:B:C subgroup). The other two arguments are clear enough. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bogdan Lataianu Sent: Tuesday, 31 May 2011 8:05 AM To: r-help@r-project.org Subject: [R] Basic question about three factor Anova Read the data using scan():

#
#       a1            a2            a3            a4
#  -----------   -----------   -----------   -----------
#  b1  b2  b3    b1  b2  b3    b1  b2  b3    b1  b2  b3
#  --- --- ---   --- --- ---   --- --- ---   --- --- ---
#
# c1:
# 4.1 4.6 3.7    4.9 5.2 4.7   5.0 6.1 5.5   3.9 4.4 3.7
# 4.3 4.9 3.9    4.6 5.6 4.7   5.4 6.2 5.9   3.3 4.3 3.9
# 4.5 4.2 4.1    5.3 5.8 5.0   5.7 6.5 5.6   3.4 4.7 4.0
# 3.8 4.5 4.5    5.0 5.4 4.5   5.3 5.7 5.0   3.7 4.1 4.4
# 4.3 4.8 3.9    4.6 5.5 4.7   5.4 6.1 5.9   3.3 4.2 3.9
#
# c2:
# 4.8 5.6 5.0    4.9 5.9 5.0   6.0 6.0 6.1   4.1 4.9 4.3
# 4.5 5.8 5.2    5.5 5.3 5.4   5.7 6.3 5.3   3.9 4.7 4.1
# 5.0 5.4 4.6    5.5 5.5 4.7   5.5 5.7 5.5   4.3 4.9 3.8
# 4.6 6.1 4.9    5.3 5.7 5.1   5.7 5.9 5.8   4.0 5.3 4.7
# 5.0 5.4 4.7    5.5 5.5 4.9   5.5 5.7 5.6   4.3 4.3 3.8
#
# NOTE: Cut and paste the numbers without the leading # or labels
#
Y <- scan()
A <- gl(4, 3, 4*3*2*5, labels = c("a1", "a2", "a3", "a4"))
B <- gl(3, 1, 4*3*2*5, labels = c("b1", "b2", "b3"))
C <- gl(2, 60, 4*3*2*5, labels = c("c1", "c2"))
anova(lm(Y ~ A*B*C)) # all effects
and interactions

In the above example, why is the number of replications 3 for A, 1 for B and 60 for C? And why 4*3*2*5? Is the 5 because there are 5 lines in each 4*3*2 group? What is the logic behind this?
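Bill's description of n, k and length is quick to see on a small case (a sketch, not from the original data): with n = 2 levels, each level is repeated in blocks of k = 3, and the blocks are recycled until the requested length is reached.

```r
# gl(n, k, length): 2 levels, blocks of 3, recycled out to length 12.
f <- gl(2, 3, 12, labels = c("low", "high"))
as.character(f)
# "low" "low" "low" "high" "high" "high" "low" "low" "low" "high" "high" "high"
```

Setting k = 1 instead would give the fastest-cycling pattern low high low high ..., which is exactly how B is generated in the example above.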
Re: [R] DateTime Math in R - POSIXct
Perhaps because the timezone is specified as a character string and not a date-time object complete with timezone. From the help file for as.POSIXct.numeric:

origin: a date-time object, or something which can be coerced by as.POSIXct(tz = "GMT") to such an object.

Note the coercion. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Galen Moore Sent: Tuesday, 31 May 2011 12:20 PM To: r-help@r-project.org Subject: [R] DateTime Math in R - POSIXct Greetings - I'm battling POSIXct, as per the code below. My input is actually an XL file, but the weird results below correctly model what I am seeing in my program. Before I punt and use lubridate or timeDate, could anyone please help me understand why POSIXct forces my variable back to GMT? I suspect that I'm not properly coding the tzone value, but it does not throw an error as-is.

tstamp <- "2011-05-22 11:45:00 MDT"
mode(tstamp)
[1] "character"
dateP <- as.POSIXct(tstamp, origin = "1970-01-01", tzone = "MDT")
mode(dateP)
[1] "numeric"
dateP
[1] "2011-05-22 11:45:00 MDT"
dateN <- as.numeric(dateP)
dateN
[1] 1306086300
dateP2 <- as.POSIXct(dateN, origin = "1970-01-01", tzone = "MDT")
dateP2
[1] "2011-05-22 18:45:00 MDT"

Many thanks. Galen Moore
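A further point worth checking here: as.POSIXct takes its timezone through an argument named tz, so a misspelled tzone= is silently swallowed by the ... argument and the conversion falls back to the default zone. A hedged sketch, using the Olson name "America/Denver" rather than the abbreviation "MDT", which many systems do not accept as a tz value:

```r
dateN <- 1306086300                    # seconds since the epoch, from Galen's session
# tz (not tzone) is the argument as.POSIXct actually consults:
dateP <- as.POSIXct(dateN, origin = "1970-01-01", tz = "America/Denver")
format(dateP)  # "2011-05-22 11:45:00"
```

With the correctly named argument the round trip numeric -> POSIXct recovers the original local time instead of drifting by the zone offset.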
Re: [R] Adding a numeric to all values in the dataframe
Oops. The first line of my template should use data.matrix() rather than data.frame(). data.matrix() is guaranteed to return a numerical matrix from a data frame, making arithmetic always possible. Bill Venables. From: Venables, Bill (CMIS, Dutton Park) Sent: 20 May 2011 14:30 To: 'Ramya'; 'r-help@r-project.org' Subject: RE: [R] Adding a numeric to all values in the dataframe For that kind of operation (unusual as it is) work with numeric matrices. When you are finished, if you still want a data frame, make it then, not before. If your data starts off as a data frame to begin with, turn it into a matrix first. E.g.

myMatrix <- data.frame(myData)
myMatrix2 <- myMatrix + 2
myData2 <- data.frame(myMatrix2)

Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ramya Sent: Friday, 20 May 2011 1:54 PM To: r-help@r-project.org Subject: [R] Adding a numeric to all values in the dataframe Hi there I just want to add 2 to all the values in the dataframe. I tried using sapply but it seems to die all the time. Any help would be appreciated. Thanks Ramya -- View this message in context: http://r.789695.n4.nabble.com/Adding-a-numeric-to-all-values-in-the-dataframe-tp3537594p3537594.html Sent from the R help mailing list archive at Nabble.com.
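The corrected template in full, on a tiny hypothetical data frame (a sketch): data.matrix() coerces every column to numeric, so the whole-object arithmetic is guaranteed to work.

```r
myData <- data.frame(a = 1:3, b = c(4.5, 5.5, 6.5))
myMatrix <- data.matrix(myData)   # numeric matrix, per the correction above
myMatrix2 <- myMatrix + 2
myData2 <- as.data.frame(myMatrix2)
myData2$a  # 3 4 5
```

For a data frame whose columns are already all numeric, `myData + 2` happens to work directly as well; the data.matrix() route is the one that is safe in general.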
Re: [R] Contrasts in Penalized Package
The key line is

prep <- .checkinput(match.call(), parent.frame())

Among other things the model matrix is built in .checkinput( ), which is not exported from the package namespace. So you have to get rough with it and use penalized:::.checkinput, and then you see this line of code:

options(contrasts = c(unordered = "contr.none", ordered = "contr.diff"))

and the scene is set. The selection of contrasts is hardwired, and to get around it you would need to hack the package; not a great idea. The namespace will get you if you try to mask contr.none with a function of your own, for example. Bill Venables. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Lars Bishop [lars...@gmail.com] Sent: 20 May 2011 20:52 To: r-help@r-project.org Subject: [R] Contrasts in Penalized Package Hi, The penalized documentation says that "Unordered factors are turned into as many dummy variables as the factor has levels." This is done by a function in the package called contr.none. I'm trying to figure out how exactly a model matrix is created with this contrast option when the user calls the function with a formula. I typed library(penalized); penalized but couldn't find the part of the code where this is done. I'll appreciate any help on this. Lars.
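What "one dummy variable per factor level" produces can be imitated in base R with an identity-matrix contrast (a sketch that mimics the effect described for penalized's contr.none; it is not the package's own function):

```r
f <- factor(c("a", "b", "c", "a"))
contr.id <- diag(nlevels(f))            # one column per level
dimnames(contr.id) <- list(levels(f), levels(f))
mm <- model.matrix(~ f, data = data.frame(f = f),
                   contrasts.arg = list(f = contr.id))
mm   # an intercept column plus one indicator column for every level
```

The resulting matrix is rank-deficient under ordinary least squares, which is exactly why this coding only makes sense with a penalized fit.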
Re: [R] [r] regression coefficient for different factors
You have received suggestions about this already, but you may want to consider something like this as an alternative:

require(english)
lev <- as.character(as.english(0:9))
dat <- data.frame(f = factor(sample(lev, 500, rep = TRUE), levels = lev),
                  B = rnorm(500))
dat <- within(dat, A <- 2 + 3*B + rnorm(B))

### method 1: using a loop
coefs <- sapply(levels(dat$f),
                function(x) coef(lm(A ~ B, dat, subset = f == x)))
t(coefs)
      (Intercept)        B
zero     1.967234 2.795218
one      1.864298 3.048861
two      1.978757 2.893950
three    2.035777 2.796963
four     2.092047 2.826677
five     2.263936 3.229843
six      1.740911 3.114069
seven    1.975918 3.090971
eight    2.064802 3.048225
nine     2.030697 3.059960

### Greg Snow's suggestion - use lmList
require(nlme)
coef(lmList(A ~ B | f, dat))
      (Intercept)        B
zero     1.967234 2.795218
one      1.864298 3.048861
two      1.978757 2.893950
three    2.035777 2.796963
four     2.092047 2.826677
five     2.263936 3.229843
six      1.740911 3.114069
seven    1.975918 3.090971
eight    2.064802 3.048225
nine     2.030697 3.059960

Bill Venables From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Francesco Nutini [nutini.france...@gmail.com] Sent: 20 May 2011 19:17 To: [R] help Subject: [R] [r] regression coefficient for different factors Dear R-helpers, In my dataset I have two continuous variables (A and B) and one factor. I'm investigating the regression between the two variables using the command lm(A ~ B, ...) but now I want to know the regression coefficient (r2) of A vs. B for each level of the factor. I know that I can obtain this information with Excel, but the factor has 68 levels... maybe [r] has a useful command. Thanks, Francesco Nutini
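Francesco actually asked for r2 rather than the coefficients; the same looping idea delivers that too (a sketch with simulated data, since the original data are not shown):

```r
set.seed(1)
dat <- data.frame(f = factor(sample(letters[1:3], 300, replace = TRUE)),
                  B = rnorm(300))
dat$A <- 2 + 3 * dat$B + rnorm(300)

# R-squared of A ~ B within each level of f
r2 <- sapply(levels(dat$f), function(l)
  summary(lm(A ~ B, dat, subset = f == l))$r.squared)
r2
```

With 68 levels this is exactly the same call; only levels(dat$f) gets longer.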
Re: [R] *not* using attach() *but* in one case ....
Martin Maechler writes: Well, then you don't know *THE ONE* case where modern users of R should use attach() ... as I have been teaching for a while, but seem not to have got enough students listening ;-) ... --- Use it instead of load() {for save()d R objects} --- The advantage of attach() over load() there is that loaded objects (and there may be a bunch!) are put into a separate place in the search path and will not accidentally overwrite objects in the global workspace. Of course, there are still quite a few situations {e.g. in typical BATCH use of R for simulations, or Sweaving, etc} where load() is good enough, and the extras of using attach() are not worth it. But the unconditional "do not use attach()" is not quite ok, at least not when you talk to non-beginners. Martin Maechler, ETH Zurich Curiously, this is why I wrote the SOAR package. Instead of save() you use Store() (Frank Harrell had already taken 'Save') and instead of attach() use Attach(). The objects are saved as separate rda files in a special subdirectory and when Attach() is used on it they are placed on the search path, normally at position 2, as promises. The global environment is kept relatively free of extraneous objects (if you want to work like that) and the operation is fast and very low on memory use, (unless you want to use every object you see all at once, of course). The track package from Tony Plate achieves somewhat similar results but with a different method. Essentially it implements something very like the old S-PLUS style of keeping everything out of memory as files in a .Data subdirectory, (although Tony uses the name 'rdatadir'), unless actually in use. Bill Venables.
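Martin's recommended pattern in miniature (a sketch using a temporary file): attach() a saved image instead of load()ing it, and the objects sit at position 2 of the search path rather than landing in, and possibly clobbering, the global workspace.

```r
x <- 1:10
f <- tempfile(fileext = ".rda")
save(x, file = f)
rm(x)

attach(f)                              # saved objects go to search position 2
has.x     <- exists("x")               # TRUE: found via the search path
in.global <- "x" %in% ls(globalenv())  # FALSE: global workspace stays clean
detach()                               # and they are dropped again in one step
```

load(f), by contrast, would have recreated x directly in the global environment, silently overwriting any existing x.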
Re: [R] extraction of mean square value from ANOVA
That only applies if you have the same factors a and b each time. If this is the case you can do things in a much more slick way.

u <- matrix(rnorm(5000), nrow = 10)  ## NB, nrow
AB <- expand.grid(a = letters[1:2], b = letters[1:5])
M <- lm(u ~ a + b, AB)
rmsq <- colSums(resid(M)^2)/M$df.resid

and Bob's your uncle. If you really want to do it quickly you would bypass lm() altogether and use something like lsfit() or, at an even lower level, qr() and qr.resid(). There are umpteen ways of fitting linear models. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Murphy Sent: Friday, 20 May 2011 1:38 PM To: Cheryl Johnson Cc: r-help@r-project.org Subject: Re: [R] extraction of mean square value from ANOVA Hi: It's easier to use an apply family function than a loop for this type of problem, as illustrated below:

# Generate 30 random samples of size 10 from a standard
# normal distribution and put them into a matrix
u <- matrix(rnorm(300), ncol = 10)
a <- factor(rep(1:5, each = 2))
b <- factor(rep(1:2, 5))

# Side note: It's not a good idea to name an object 'c' because a commonly used function in the base package has that name already, as in c(1, 3, 5)...

# A function to fit the model and to compute the MSE
# deviance() returns the residual sum of squares in a linear model
meansq <- function(y) {
  m <- lm(y ~ a + b)
  deviance(m)/df.residual(m)
}

# Apply the function to each row of u; the result is a vector of MSEs
msevec <- apply(u, 1, meansq)
msevec

Note 2: The function works if a and b are in the same environment as the meansq() function. If you do all of this in the console, there should be no problem. If you decide to put all of this into a function, then you need to be more careful. HTH, Dennis On Thu, May 19, 2011 at 6:46 PM, Cheryl Johnson johnson.cheryl...@gmail.com wrote: Hello, I am randomly generating values and then using an ANOVA table to find the mean square value.
I would like to form a loop that extracts the mean square value from ANOVA in each iteration. Below is an example of what I am doing.

a <- rnorm(10)
b <- factor(c(1,1,2,2,3,3,4,4,5,5))
c <- factor(c(1,2,1,2,1,2,1,2,1,2))
mylm <- lm(a ~ b + c)
anova(mylm)

Since I would like to use a loop to generate this several times it would be helpful to know how to extract the mean square value from ANOVA. Thanks
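Since anova() returns a data frame, the mean squares can also be read off directly by row and column name (a sketch based on Cheryl's code; 'd' is used here in place of 'c', following Dennis's advice about not masking base c()):

```r
set.seed(1)
a <- rnorm(10)
b <- factor(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))
d <- factor(c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))
mylm <- lm(a ~ b + d)
tab <- anova(mylm)
tab[, "Mean Sq"]              # all mean squares, in table order
tab["Residuals", "Mean Sq"]   # the residual mean square alone
```

The last value is identical to deviance(mylm)/df.residual(mylm), the quantity Dennis's meansq() computes.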
Re: [R] Adding a numeric to all values in the dataframe
For that kind of operation (unusual as it is) work with numeric matrices. When you are finished, if you still want a data frame, make it then, not before. If your data starts off as a data frame to begin with, turn it into a matrix first. E.g.

myMatrix <- data.frame(myData)
myMatrix2 <- myMatrix + 2
myData2 <- data.frame(myMatrix2)

Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ramya Sent: Friday, 20 May 2011 1:54 PM To: r-help@r-project.org Subject: [R] Adding a numeric to all values in the dataframe Hi there I just want to add 2 to all the values in the dataframe. I tried using sapply but it seems to die all the time. Any help would be appreciated. Thanks Ramya -- View this message in context: http://r.789695.n4.nabble.com/Adding-a-numeric-to-all-values-in-the-dataframe-tp3537594p3537594.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb
Hi Bert, I think people should know about the Google Style Guide for R because, as I said, it represents a thoughtful contribution to the debate. Most of its advice is very good (meaning I agree with it!) but some is a bit too much (for example, the blanket advice never to use S4 classes and methods - that's just resisting progress, in my view). The advice on using <- for the (normal) assignment operator rather than = is also good advice, (according to me), but people who have to program in both C and R about equally often may find it a bit tedious. We can argue over that one. I suggest it has a place in the R FAQ but with a suitable warning that this is just one view, albeit a thoughtful one. I don't think it need be included in the posting guide, though. It would take away some of the fun. :-) Bill Venables. -Original Message- From: Bert Gunter [mailto:gunter.ber...@gene.com] Sent: Wednesday, 18 May 2011 11:47 PM To: Venables, Bill (CMIS, Dutton Park) Cc: r-help@r-project.org Subject: R Style Guide -- Was Post-hoc tests in MASS using glm.nb Thanks Bill. Do you and others think that a link to this guide (or another) should be included in the Posting Guide and/or R FAQ? -- Bert On Tue, May 17, 2011 at 4:07 PM, bill.venab...@csiro.au wrote: Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html) I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost). The difference between with() and within() is as follows: with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result.
By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data' and returns *the modified data set* as the result. I use this a lot in the data preparation phase of a project, especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project. Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x * x2
  b <- runif(4)
  eta <- cbind(1, x, x2, x3) %*% b
  y <- eta + rnorm(x, sd = 0.5)
  rm(b)
})

Check:

str(polyData)
'data.frame': 500 obs. of 5 variables:
 $ x  : num 0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num 1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num 0.268811 0.034224 0.309802 0.060844 0.000315 ...

Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Wednesday, 18 May 2011 12:08 AM To: Peter Ehlers Cc: R list Subject: Re: [R] Post-hoc tests in MASS using glm.nb

Folks: Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: "attach can lead to confusion." I can't remember the last time I needed attach(). Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following. The potential confusion comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment.
The usual example is that one has an object named, say, 'a' in the formula and in the attached data, and another 'a' also in the global environment. Then the wrong 'a' would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.) Using the data = argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis. I hope this clarifies the previous posters' comments. Cheers, Bert

[... non-germane material snipped ...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
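To make the masking hazard described above concrete, here is a small sketch (the names 'a' and 'dat' are invented for illustration):

```r
## A sketch of the masking problem with attach() (names are invented).
a <- 1:10                               # an 'a' in the global environment
dat <- data.frame(a = 101:110, y = rnorm(10))

attach(dat)
mean(a)        # finds the global 'a' (1:10), not dat$a -- attach() does warn
               # that 'a' is masked by .GlobalEnv, but the warning is easy to miss
detach(dat)

## Safer alternatives: the data is tightly coupled to the call
with(dat, mean(a))                      # uses dat$a
fm <- lm(y ~ a, data = dat)             # uses dat$a via the data= argument
```

The point is exactly Bert's: at top level the global environment is searched before the attached data frame, so which 'a' wins depends on the runtime environment rather than on the code.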
Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb
I used to think like that. However, I have recently re-read John Chambers' Software for Data Analysis and now I'm starting to see the point. S4 classes and methods do require you to plan your classes and methods well, and they do impose a discipline that can seem rigid and unnecessary. But I have found that to program well you do need to exercise a lot of discipline, mainly because it can take quite some time to spot all the inadequacies and even traps in your code that an ill-disciplined approach lets you get away with at first. IMHO, of course. Perhaps you can all see the traps that elude me. Cheers, Bill. PS Rolf Turner? Respectful? Goodness, what's going on? :-)

-----Original Message----- From: Rolf Turner [mailto:rolf.tur...@xtra.co.nz] Sent: Thursday, 19 May 2011 9:34 AM To: Venables, Bill (CMIS, Dutton Park) Cc: r-help@r-project.org Subject: Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb

On 19/05/11 10:26, bill.venab...@csiro.au wrote: SNIP Most of [the Google style guide's] advice is very good (meaning I agree with it!) but some is a bit too much (for example, the blanket advice never to use S4 classes and methods - that's just resisting progress, in my view). SNIP

I must respectfully disagree with this view, and concur heartily with the style guide. S4 classes and methods are a ball-and-chain that one has to drag along. See also fortune("S4 methods"). :-) cheers, Rolf
Re: [R] Post-hoc tests in MASS using glm.nb
Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost). The difference between with() and within() is as follows: with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result. By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data' and returns *the modified data set* as the result. I use this a lot in the data preparation phase of a project, especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project. Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x * x2
  b <- runif(4)
  eta <- cbind(1, x, x2, x3) %*% b
  y <- eta + rnorm(x, sd = 0.5)
  rm(b)
})

Check:

str(polyData)
'data.frame': 500 obs. of 5 variables:
 $ x  : num 0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num 1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num 0.268811 0.034224 0.309802 0.060844 0.000315 ...

Bill Venables.
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Wednesday, 18 May 2011 12:08 AM To: Peter Ehlers Cc: R list Subject: Re: [R] Post-hoc tests in MASS using glm.nb

Folks: Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: "attach can lead to confusion." I can't remember the last time I needed attach(). Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following. The potential confusion comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment. The usual example is that one has an object named, say, 'a' in the formula and in the attached data, and another 'a' also in the global environment. Then the wrong 'a' would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.) Using the data = argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis. I hope this clarifies the previous posters' comments. Cheers, Bert

[... non-germane material snipped ...]

-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them.
If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics
Re: [R] Post-hoc tests in MASS using glm.nb
PS I should have followed the example with one using with() for something that would often be done with attach(). Consider:

with(polyData, {
  plot(x, y, pch = ".")
  o <- order(x)
  lines(x[o], eta[o], col = "red")
})

I use this kind of dodge a lot, too, but now you can mostly use data= arguments on the individual functions. Bill Venables.

-----Original Message----- From: Venables, Bill (CMIS, Dutton Park) Sent: Wednesday, 18 May 2011 9:07 AM To: 'Bert Gunter'; 'Peter Ehlers' Cc: 'R list' Subject: RE: [R] Post-hoc tests in MASS using glm.nb

Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost). The difference between with() and within() is as follows: with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result. By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data' and returns *the modified data set* as the result. I use this a lot in the data preparation phase of a project, especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project. Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x * x2
  b <- runif(4)
  eta <- cbind(1, x, x2, x3) %*% b
  y <- eta + rnorm(x, sd = 0.5)
  rm(b)
})

Check:

str(polyData)
'data.frame': 500 obs.
of 5 variables:
 $ x  : num 0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num 1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num 0.268811 0.034224 0.309802 0.060844 0.000315 ...

Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Wednesday, 18 May 2011 12:08 AM To: Peter Ehlers Cc: R list Subject: Re: [R] Post-hoc tests in MASS using glm.nb

Folks: Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: "attach can lead to confusion." I can't remember the last time I needed attach(). Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following. The potential confusion comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment. The usual example is that one has an object named, say, 'a' in the formula and in the attached data, and another 'a' also in the global environment. Then the wrong 'a' would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.) Using the data = argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis. I hope this clarifies the previous posters' comments. Cheers, Bert

[... non-germane material snipped ...]
-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics
Re: [R] simprof test using jaccard distance
The documentation for simprof says, with respect to the method.distance argument, "This value can also be any function which returns a dist object." So you should be able to use the Jaccard index by setting up your own function to compute it, e.g.

Jaccard <- function(X) vegan::vegdist(X, method = "jaccard")
tst <- simprof( ..., method.distance = Jaccard, ...)

by rights, ought to do the job. Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of maia.ber...@csiro.au Sent: Tuesday, 17 May 2011 6:03 PM To: r-help@r-project.org Subject: [ExternalEmail] [R] simprof test using jaccard distance

Dear All, I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for pres/abs community data. Here is the core of the function:

simprof
function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "euclidean", method.transform = "identity", alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE, increment = 100)
{
    if (!is.matrix(data)) data <- as.matrix(data)
    rawdata <- data
    if (sample.orientation == "column") rawdata <- t(rawdata)
    if (is.function(method.distance)) rawdata.dist <- method.distance(rawdata)
    else if (method.distance == "braycurtis") {
        rawdata.dist <- braycurtis(rawdata, const)
        if (!is.null(rownames(rawdata))) attr(rawdata.dist, "Labels") <- rownames(rawdata)
    }
    else rawdata.dist <- dist(rawdata, method = method.distance)
    if (!method.transform == "identity") rawdata <- trans(rawdata, method.transform)
    hclust.results <- hclust(rawdata.dist, method = method.cluster)
    pMatrix <- cbind(hclust.results$merge, matrix(data = NA, nrow = nrow(hclust.results$merge), ncol = 1))
    simprof.results <- simprof.body(rawdata = rawdata, num.expected = num.expected, num.simulated = num.simulated, method.cluster = method.cluster, method.distance = method.distance, originaldata = rawdata, alpha = alpha, clust.order =
hclust.results$merge, startrow = nrow(hclust.results$merge), pMatrix = pMatrix, const = const, silent = silent, increment = increment)
    results <- list()
    results[["numgroups"]] <- length(simprof.results$samples)
    results[["pval"]] <- simprof.results$pval
    results[["hclust"]] <- hclust.results
    if (!is.null(rownames(rawdata))) {
        for (i in 1:length(simprof.results$samples)) {
            for (j in 1:length(simprof.results$samples[[i]])) {
                simprof.results$samples[[i]][j] <- rownames(rawdata)[as.numeric(simprof.results$samples[[i]][j])]
            }
        }
    }
    results[["significantclusters"]] <- simprof.results$samples
    return(results)
}
<environment: namespace:clustsig>

I tried to trick the function by bypassing the input of a raw community matrix (sites x species) by inputting instead a distance matrix already calculated with vegdist (vegan package) (Jaccard distance is available in vegdist, but not in the function dist used in simprof...). I therefore modified the function as follows:

simprof2 <- function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE, increment = 100)
{
    hclust.results <- hclust(data, method = method.cluster)
    pMatrix <- cbind(hclust.results$merge, matrix(data = NA, nrow = nrow(hclust.results$merge), ncol = 1))
    simprof.results <- simprof.body(data = data, num.expected = num.expected, num.simulated = num.simulated, method.cluster = method.cluster, originaldata = data, alpha = alpha, clust.order = hclust.results$merge, startrow = nrow(hclust.results$merge), pMatrix = pMatrix, const = const, silent = silent, increment = increment)
    results <- list()
    results[["numgroups"]] <- length(simprof.results$samples)
    results[["pval"]] <- simprof.results$pval
    results[["hclust"]] <- hclust.results
    if (!is.null(rownames(data))) {
        for (i in 1:length(simprof.results$samples)) {
            for (j in 1:length(simprof.results$samples[[i]])) {
                simprof.results$samples[[i]][j] <- rownames(data)[as.numeric(simprof.results$samples[[i]][j])]
            }
        }
    }
    results[["significantclusters"]] <- simprof.results$samples
    return(results)
}

But upon running it on my vegdist object, I get the following error:

simprof2(G3_jaccard)
Error in simprof2(G3_jaccard) : could not find function "simprof.body"

I guess this function is part of the initial environment in which simprof runs, but how do I access it in order to call it when I run my own function outside of the initial environment? Another possible but maybe more complex way would be to define the Jaccard distance (as J = 2B/(1+B), B being the Bray-Curtis distance already used in the initial function)
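Bill's suggestion above, put together as a runnable sketch (the presence/absence matrix `commMat` is a hypothetical stand-in for the poster's community data; passing a function as method.distance is per the documentation quoted above):

```r
library(clustsig)   # provides simprof()
library(vegan)      # provides vegdist()

## Any function returning a "dist" object can be passed as method.distance
Jaccard <- function(X) vegan::vegdist(X, method = "jaccard")

## commMat: hypothetical sites x species presence/absence matrix
set.seed(42)
commMat <- matrix(rbinom(100, 1, 0.4), nrow = 10,
                  dimnames = list(paste0("site", 1:10), paste0("sp", 1:10)))

res <- simprof(commMat, method.distance = Jaccard)
res$numgroups   # number of significant clusters found
```

This avoids modifying simprof at all, so simprof.body is found in the package namespace as usual.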
Re: [R] Multiple plots on one device using stl
If you ?plot.stl you will see that the second argument, set.pars, is a list of argument settings for par(), including a (variable) default setting for mfrow. I.e. plot.stl overrides your external setting (which will also override any layout() setting). It looks like to override it back, you may need to take charge yourself. Here is the gist of a partial way round it, perhaps, sort of. ...

par(mar = c(0, 6, 0, 6), oma = c(6, 0, 4, 0), tck = -0.01, mfcol = c(4, 2))
plot(stl.1, set.pars = list())
plot(stl.2, set.pars = list())

Things might be a bit more flexible if you were to use xyplot.stl in the latticeExtra package. With lattice the operations of making the plot and displaying it are more cleanly separated. Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Madin Sent: Wednesday, 18 May 2011 11:55 AM To: r-help@r-project.org Subject: [R] Multiple plots on one device using stl

G'day, I am looking at monthly reports, and have three series of monthly data from 2007 to 2009. I would like to show the seasonal decomposition of these two series side by side on the one device, however using plot doesn't seem to respect any use of layout(matrix(1:3, ncol=3)) or par(mfcol=c(1,3)). I'm guessing that this means that plot(stl) perhaps uses them, but I can't find anywhere the / a plot.stl() - i.e., I can't work out where the plot() call is going? I've attached a small example of data and some probably overly verbose code. There are only two sets of example data, not three as mentioned above.
# load the data
vals.1 <- structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2007", "2008", "2009"), class = c("ordered", "factor")), month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = c("ordered", "factor")), count = c(105, 100, 64, 44, 49, 65, 88, 90, 99, 92, 93, 88, 212, 146, 96, 131, 220, 143, 137, 138, 395, 362, 349, 222, 294, 268, 298, 310, 426, 348, 287, 101, 66, 112, 105, 4)), .Names = c("year", "month", "count"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36"), class = "data.frame")

vals.2 <- structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2007", "2008", "2009"), class = c("ordered", "factor")), month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = c("ordered", "factor")), count = c(45, 34, 17, 6, 7, 16, 12, 11, 14, 17, 11, 20, 27, 12, 10, 14, 22, 23, 92, 144, 385, 274, 320, 252, 240, 146, 222, 142, 122, 117, 163, 89, 51, 89, 108)), .Names = c("year", "month", "count"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35"), class = "data.frame")

# create the time series
series.1 <- ts(vals.1$count, start = c(2007, 1), freq = 12)
series.2 <- ts(vals.2$count, start = c(2007, 1), freq = 12)

# apply the seasonal decomposition
stl.1 <- stl(series.1, "per", robust = TRUE)
stl.2 <- stl(series.2, "per", robust = TRUE)

# set up the device for side by side display
layout(matrix(1:2, ncol = 2))
# little check
layout.show(2)

# plot the first series and second series
plot(stl.1, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.1 Decomposition')
plot(stl.2, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.2 Decomposition')

# hmm, used whole device twice, try again
par(mfcol = c(1, 2))
# and now the second
plot(stl.1, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.1 Decomposition')
plot(stl.2, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.2 Decomposition')

# oh what about split.screen()
split.screen(c(1, 2))
screen(1)
# now plot
plot(stl.1, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.1 Decomposition')
# something wrong with the plot, not seeing original series at the top.
screen(2)
# Error in par(split.screens[[n]]) : parameter "j" in "mfg" is out of range

cheers Ben

sessionInfo()
R version 2.13.0 Patched (2011-05-05 r55784) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8 attached base packages: [1]
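The latticeExtra route Bill mentions can be sketched as follows (assuming the xyplot() method for "stl" objects that latticeExtra provides, and the stl.1 and stl.2 objects from above; whether extra arguments such as main are honoured may vary):

```r
library(lattice)
library(latticeExtra)   # supplies an xyplot() method for "stl" objects

## Trellis objects are created first, displayed later -- so splitting the
## device is under your control rather than plot.stl's
p1 <- xyplot(stl.1)
p2 <- xyplot(stl.2)

## split = c(column, row, ncolumns, nrows); more = TRUE keeps the page open
print(p1, split = c(1, 1, 2, 1), more = TRUE)
print(p2, split = c(2, 1, 2, 1))
```

This sidesteps the par()/layout() override entirely, since lattice does not use base-graphics mfrow settings.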
Re: [R] Extracting the dimnames of an array with variable dimensions
Here is an alternative solution:

foo <- array(data = rnorm(32), dim = c(4, 4, 2),
             dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))
ind <- which(foo > 0, arr.ind = TRUE)
row.names(ind) <- NULL  ## to avoid warnings.
mapply("[", dimnames(foo), data.frame(ind))
      [,1] [,2] [,3]
 [1,] "a"  "A"  "e"
 [2,] "b"  "A"  "e"
 [3,] "a"  "B"  "e"
 [4,] "c"  "B"  "e"
 [5,] "a"  "C"  "e"
 [6,] "c"  "C"  "e"
 [7,] "b"  "D"  "e"
 [8,] "a"  "A"  "f"
 [9,] "b"  "B"  "f"
[10,] "c"  "B"  "f"
[11,] "d"  "B"  "f"
[12,] "a"  "C"  "f"
[13,] "d"  "C"  "f"
[14,] "a"  "D"  "f"
[15,] "b"  "D"  "f"

Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Murphy Sent: Monday, 16 May 2011 3:14 PM To: Pierre Roudier Cc: r-help@r-project.org Subject: Re: [R] Extracting the dimnames of an array with variable dimensions

Hi: Does it have to be an array? If all you're interested in is the dimnames, how about this?

library(plyr)
foo <- array(data = rnorm(32), dim = c(4, 4, 2),
             dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))
foo
, , e
           A          B          C          D
a -0.2183877 -0.8912908 -2.0175612 -0.8080548
b  0.4870784 -0.8626293 -0.5641368 -0.5219722
c  0.8821044  0.3187850  1.2203297 -0.3151186
d -0.9894656 -1.1779108  0.9853935  0.3560747

, , f
           A          B          C          D
a  0.7357773 -1.7591637  1.6320887  1.2248529
b  0.4662315  0.1131432 -0.9790887 -0.6575306
c -0.3564725 -0.9202688  0.1017894  0.7382683
d  0.2825117  0.9242299  0.3577063 -1.3297339

# flatten array into a data frame with dimnames as factors
# adply() converts an array to a data frame, applying a function
# along the stated dimensions
u <- adply(foo, c(1, 2, 3), as.vector)
subset(u, V1 > 0)[, 1:3]
   X1 X2 X3
2   b  A  e
3   c  A  e
7   c  B  e
11  c  C  e
12  d  C  e
16  d  D  e
17  a  A  f
18  b  A  f
20  d  A  f
22  b  B  f
24  d  B  f
25  a  C  f
27  c  C  f
28  d  C  f
29  a  D  f
31  c  D  f

HTH, Dennis

On Sun, May 15, 2011 at 9:20 PM, Pierre Roudier pierre.roud...@gmail.com wrote: Hi list, In a function I am writing, I need to extract the dimension names of an array.
I know this can be achieved easily using dimnames() but my problem is that I want my function to be robust when the number of dimensions varies. Consider the following case:

foo <- array(data = rnorm(32), dim = c(4, 4, 2),
             dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))

# What I want is to extract the *names of the dimensions* for which foo has positive values:
ind <- which(foo > 0, arr.ind = TRUE)

# A first solution is:
t(apply(ind, 1, function(x) unlist(dimnames(foo[x[1], x[2], x[3], drop = FALSE]))))

# But it does require to know the dimensions of foo. I would like to do something like:
ind <- which(foo > 0, arr.ind = TRUE)
t(apply(ind, 1, function(x) unlist(dimnames(foo[x, drop = FALSE]))))
# but in that case the dimnames are dropped.

Any suggestion? Cheers, Pierre -- Scientist Landcare Research, New Zealand
Re: [R] Post-hoc tests in MASS using glm.nb
?relevel Also, you might want to fit the models as follows:

Model1 <- glm.nb(Cells ~ Cryogel*Day, data = myData)
myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = 2))
Model2 <- update(Model1, data = myData2)

You should always specify the data set when you fit a model if at all possible. I would recommend you NEVER use attach() to put it on the search path (under all but the most exceptional circumstances). You could fit your model as

Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)

This will give you the subclass means as the regression coefficients. You can then use vcov(Model0) to get the variance matrix and compare any two you like using directly calculated t-statistics. This is pretty straightforward as well. Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of bryony Sent: Tuesday, 17 May 2011 3:46 AM To: r-help@r-project.org Subject: [R] Post-hoc tests in MASS using glm.nb

I am struggling to generate p values for comparisons of levels (post-hoc tests) in a glm with a negative binomial distribution. I am trying to compare cell counts on different days as grown on different media (e.g. types of cryogel), so I have 2 explanatory variables (Day and Cryogel), which are both factors, and an over-dispersed count variable (number of cells) as the response. I know that both variables are significant, and that there is a significant interaction between them. However, I seem unable to generate multiple comparisons between the days and cryogels. So my model is

Model1 <- glm.nb(Cells ~ Cryogel + Day + Day:Cryogel)

The output gives me comparisons between levels of the factors relative to a reference level (Day 0 and Cryogel 1) as follows:

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      1.2040     0.2743   4.389 1.14e-05 ***
Day14            3.3226     0.3440   9.658  < 2e-16 ***
Day28            3.3546     0.3440   9.752  < 2e-16 ***
Day7             3.3638     0.3440   9.779  < 2e-16 ***
Cryogel2         0.7097     0.3655   1.942  0.05215 .
Cryogel3         0.7259     0.3651   1.988  0.04677 *
Cryogel4         1.4191     0.3539   4.010 6.07e-05 ***
Day14:Cryogel2  -0.7910     0.4689  -1.687  0.09162 .
Day28:Cryogel2  -0.5272     0.4685  -1.125  0.26053
Day7:Cryogel2   -1.1794     0.4694  -2.512  0.01199 *
Day14:Cryogel3  -1.0833     0.4691  -2.309  0.02092 *
Day28:Cryogel3   0.1735     0.4733   0.367  0.71395
Day7:Cryogel3   -1.0907     0.4690  -2.326  0.02003 *
Day14:Cryogel4  -1.2834     0.4655  -2.757  0.00583 **
Day28:Cryogel4  -0.6300     0.4591  -1.372  0.16997
Day7:Cryogel4   -1.3436     0.4596  -2.923  0.00347 **

HOWEVER I want ALL the comparisons, e.g. Cryogel 2 versus 4, 3 versus 2 etc. on each of the days. I realise that such multiple comparisons need to be approached with care to avoid Type 1 error, however it is easy to do this in other programmes (e.g. SPSS, Genstat) and I'm frustrated that it appears to be difficult in R. I have tried the glht (multcomp) function but it gives me the same results. I assume that there is some way of entering the data differently so as to tell R to use a different reference level each time and re-run the analysis for each level, but don't know what this is. Please help! Many thanks for your input Bryony

-- View this message in context: http://r.789695.n4.nabble.com/Post-hoc-tests-in-MASS-using-glm-nb-tp3526934p3526934.html Sent from the R help mailing list archive at Nabble.com.
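To make the vcov() approach from the reply above concrete, here is a sketch (the data frame myData with Cells, Cryogel and Day is the original poster's and is hypothetical here):

```r
library(MASS)   # glm.nb()

## Cell-means parameterisation: one coefficient per Cryogel x Day subclass
Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)

b <- coef(Model0)
V <- vcov(Model0)

## Wald z-statistic comparing any two subclass means, say coefficients i and j
i <- 1; j <- 2
z <- (b[i] - b[j]) / sqrt(V[i, i] + V[j, j] - 2 * V[i, j])
p <- 2 * pnorm(-abs(z))   # two-sided, unadjusted for multiple comparisons
```

Any pair of subclass means can be compared this way; adjusting for the many comparisons (e.g. via p.adjust()) remains the user's responsibility, as the thread notes.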
Re: [R] Filtering out bad data points
You could use a function to do the job:

withinRange <- function(x, r = quantile(x, c(0.05, 0.95))) x >= r[1] & x <= r[2]
dftest2 <- subset(dftest, withinRange(x))

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert A'gata Sent: Tuesday, 10 May 2011 10:57 AM To: r-help@r-project.org Subject: [R] Filtering out bad data points

Hi, I always have a question about how to do this best in R. I have a data frame and a set of criteria to filter points out. My procedure is to always locate indices of those points, check if the index vector length is greater than 0 or not, and then remove them. Meaning:

dftest <- data.frame(x = rnorm(100), y = rnorm(100));
qtile <- quantile(dftest$x, probs = c(0.05, 0.95));
badIdx <- which((dftest$x < qtile[1]) | (dftest$x > qtile[2]));
if (length(badIdx) > 0) {
  dftest <- dftest[-badIdx, ];
}

My question is that is there a more streamlined way to achieve this? Thank you. Cheers, Robert
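A quick usage check of the helper above (the exact row count depends on the random data and on how the quantiles fall, so none is asserted):

```r
set.seed(1)
dftest <- data.frame(x = rnorm(100), y = rnorm(100))

## Keep rows whose x lies inside the central 90% range, in one step
withinRange <- function(x, r = quantile(x, c(0.05, 0.95)))
  x >= r[1] & x <= r[2]

dftest2 <- subset(dftest, withinRange(x))
nrow(dftest2)   # roughly 90 of the 100 rows remain
```

The helper also documents intent better than an index-based delete, and there is no need for the length(badIdx) > 0 guard: subset() with an all-TRUE condition simply returns everything.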
Re: [R] Creating Observation ID
Here is one way:

df <- data.frame(Value = rnorm(30), Group = sample(c('A', 'B', 'C'), 30, replace = TRUE))

## make a little function to do the job
iNumber <- function(f) {
  f <- as.factor(f)
  X <- outer(f, levels(f), "==") + 0
  rowSums(X * apply(X, 2, cumsum))
}

## add the numbering column
df <- within(df, internalNumber <- iNumber(Group))

## Check that it works
head(df)
       Value Group internalNumber
1 -1.5014788     C              1
2  0.6035679     C              2
3 -0.6953930     C              3
4 -0.2413863     A              1
5 -0.1170961     A              2
6  1.5110721     C              4

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Baer Sent: Tuesday, 10 May 2011 7:22 AM To: R-help@r-project.org Subject: [R] Creating Observation ID

If I have a data frame something like:

Value = rnorm(30)
Group = sample(c('A', 'B', 'C'), 30, replace = TRUE)
df = data.frame(Value, Group)

It seems like it should be simple to create an 'ObsID' column which indicates the observation order of each Value within each of the 3 groups. Somehow, I can't quite see how to do it without manually sub-setting the parent data frame and then putting it back together again. Anyone able to get me started on a cleaner (more R-like) approach? Thanks, Rob

-- Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences 800 W. Jefferson St. Kirksville, MO 63501 660-626-2322 FAX 660-626-2965
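The same within-group numbering can also be had in one line with ave(), a standard base-R idiom for this task:

```r
df <- data.frame(Value = rnorm(30),
                 Group = sample(c("A", "B", "C"), 30, replace = TRUE))

## ave() applies FUN within each level of Group and returns the results
## in the original row order, so each row gets its rank within its group
df$ObsID <- ave(seq_along(df$Group), df$Group, FUN = seq_along)
head(df)
```

This avoids building the indicator matrix altogether; for 30 rows either approach is instant, but ave() also scales cleanly to large data.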
Re: [R] is this an ANOVA ?
You probably want to do something like this:

fm <- lm(y ~ x, MD)
anova(fm)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          2    250   125.0      50 1.513e-06
Residuals 12     30     2.5

Answers to questions: 1. No. 2. Yes. (whoever you are).

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ubuntu Diego
Sent: Wednesday, 13 April 2011 10:10 AM
To: r-help@r-project.org
Subject: [R] is this an ANOVA ?

Hi all, I have a very easy question (I hope). I had measured a property of plants growing in three different substrates (A, B and C). The rest of the conditions remained constant. There was very high variation in the results. I want to address whether there is any difference in the response (my measurement) from substrate to substrate.

x <- c('A','A','A','A','A','B','B','B','B','B','C','C','C','C','C')  # substrate type
y <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)  # results of the measurement
MD <- data.frame(x, y)

I wrote a linear model for this:

summary(lm(y ~ x, data = MD))

This is the output:

Call: lm(formula = y ~ x, data = MD)

Residuals:
       Min         1Q     Median         3Q        Max
-2.000e+00 -1.000e+00  5.551e-17  1.000e+00  2.000e+00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0000     0.7071   4.243 0.001142 **
xB            5.0000     1.0000   5.000 0.000309 ***
xC           10.0000     1.0000  10.000 3.58e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.581 on 12 degrees of freedom
Multiple R-squared: 0.8929, Adjusted R-squared: 0.875
F-statistic: 50 on 2 and 12 DF, p-value: 1.513e-06

I conclude that there is an effect of substrate type (x). NOW the questions: 1) Does the fact that all the p-values are significant mean that all the groups are different from each other? 2) Is there an (easy) way to plot mean plus/minus 2*sd for each substrate type, with asterisks denoting significant differences? THANKS !
version
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status
major          2
minor          11.1
year           2010
month          05
day            31
svn rev        52157
language       R
version.string R version 2.11.1 (2010-05-31)
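Regarding question 2 in the original post, here is a minimal base-graphics sketch (not from the thread; assumes the MD data frame defined above) of mean plus/minus 2*sd per substrate:

```r
means <- with(MD, tapply(y, x, mean))
sds   <- with(MD, tapply(y, x, sd))
bp <- barplot(means, xlab = "substrate", ylab = "y",
              ylim = range(0, means - 2 * sds, means + 2 * sds))
## error bars at mean +/- 2*sd
arrows(bp, means - 2 * sds, bp, means + 2 * sds,
       angle = 90, code = 3, length = 0.1)
```

Asterisks for pairwise differences would need a multiple-comparison procedure on top of this, e.g. TukeyHSD(aov(y ~ x, data = MD)).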
Re: [R] multinom() residual deviance
The residual deviance from a multinomial model is numerically equal (up to round-off error) to the one you would get had you fitted the model as a surrogate Poisson generalized linear model. Here is a short demo building on your example:

set.seed(101)
df <- data.frame(f = sample(letters[1:3], 500, rep = TRUE),
                 x = rnorm(500))
library(nnet)
mm <- multinom(f ~ x, df)
# weights:  9 (4 variable)
initial  value 549.306144
final  value 548.246724
converged

X <- with(df, outer(f, levels(f), "==") + 0)
DF <- within(expand.grid(a = 1:500, b = letters[1:3]),
             a <- factor(a))
DF <- cbind(DF, f = as.vector(X), x = df$x)
gm <- glm(f ~ a + b/x, poisson, DF)  ## surrogate Poisson equivalent model

deviance(gm)  ## surrogate Poisson
[1] 1096.493
deviance(mm)  ## multinomial from nnet package
[1] 1096.493

Bill Venables

From: Sascha Vieweg [saschav...@gmail.com]
Sent: 09 April 2011 17:06
To: Venables, Bill (CMIS, Dutton Park)
Cc: saschav...@gmail.com; r-help@r-project.org
Subject: RE: [R] multinom() residual deviance

Hello. I am aware of the differences between the two models; excuse me for being imprecise about that in my first posting. My question only concerns the nature or structure of the deviance, and thus whether the residual deviance of the multinomial model is the same residual deviance as reported by, say, glm. This is particularly important, since I want to calculate a pseudo R-squared as follows (data from below):

library(nnet)
m <- multinom(y ~ ., data = df)
(llnull <- deviance(update(m, . ~ 1, trace = F)))
(llmod <- deviance(m))
(RMcFadden <- 1 - (llmod/llnull))

(I know that many statisticians here highly discourage people from using the pseudo R-squareds; however, not many editors read this mailing list and still insist.) The MASS book warning alerted me that the multinom residual deviance may be of a principally different structure/nature than the glm one. Thus, while the calculation above holds for a glm, it does not for a multinom model. Am I right? And how to fix it?
On 11-04-09 05:43, bill.venab...@csiro.au wrote:

The two models you fit are quite different. The first is a binomial model equivalent to

fm <- glm(I(y == "a") ~ x, binomial, df)

which you can check leads to the same result. I.e. this model amalgamates classes b and c into one. The second is a multivariate logistic model that considers all three classes defined by your factor y (and has twice the number of parameters, among other things). The three classes a, b and c remain separate in the model. Hence the two models are not directly comparable, so why should the deviances be?

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Sascha Vieweg [saschav...@gmail.com]
Sent: 09 April 2011 01:14
To: r-help@r-project.org
Subject: [R] multinom() residual deviance

Running a binary logit model on the data

df <- data.frame(y = sample(letters[1:3], 100, repl = T), x = rnorm(100))

reveals some residual deviance:

summary(glm(y ~ ., data = df, family = binomial(logit)))

However, running a multinomial model on that data (multinom, nnet) also reveals a residual deviance:

summary(multinom(y ~ ., data = df))

On page 203, the MASS book says that here the deviance is comparing with the model that correctly predicts each person, not the multinomial response for each cell of the minimal model, followed by an instruction on how to compare with the saturated model. For me as a beginner, this sounds like an important warning; however, I don't know what the warning exactly means and hence do not know what the difference between the residual deviance of the former (binary) and the latter (multinomial) model is. (I need the deviances to calculate some of the pseudo R-squares with function pR2(), package pscl.) Could you give good advice?
Thanks *S*
--
Sascha Vieweg, saschav...@gmail.com
Re: [R] multinom() residual deviance
The two models you fit are quite different. The first is a binomial model equivalent to

fm <- glm(I(y == "a") ~ x, binomial, df)

which you can check leads to the same result. I.e. this model amalgamates classes b and c into one. The second is a multivariate logistic model that considers all three classes defined by your factor y (and has twice the number of parameters, among other things). The three classes a, b and c remain separate in the model. Hence the two models are not directly comparable, so why should the deviances be?

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Sascha Vieweg [saschav...@gmail.com]
Sent: 09 April 2011 01:14
To: r-help@r-project.org
Subject: [R] multinom() residual deviance

Running a binary logit model on the data

df <- data.frame(y = sample(letters[1:3], 100, repl = T), x = rnorm(100))

reveals some residual deviance:

summary(glm(y ~ ., data = df, family = binomial(logit)))

However, running a multinomial model on that data (multinom, nnet) also reveals a residual deviance:

summary(multinom(y ~ ., data = df))

On page 203, the MASS book says that here the deviance is comparing with the model that correctly predicts each person, not the multinomial response for each cell of the minimal model, followed by an instruction on how to compare with the saturated model. For me as a beginner, this sounds like an important warning; however, I don't know what the warning exactly means and hence do not know what the difference between the residual deviance of the former (binary) and the latter (multinomial) model is. (I need the deviances to calculate some of the pseudo R-squares with function pR2(), package pscl.) Could you give good advice?

Thanks *S*
--
Sascha Vieweg, saschav...@gmail.com
Re: [R] Pulling strings from a Flat file
Isn't all you need read.fwf?

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Kalicin, Sarah [sarah.kali...@intel.com]
Sent: 06 April 2011 09:48
To: r-help@r-project.org
Subject: [R] Pulling strings from a Flat file

Hi, I have a flat file that contains a bunch of strings that look like this (the file was originally in Unix and brought over into Windows):

E123456E234567E345678E456789E567891E678910E. . . .

Basically each string starts with E and is followed by 6 numbers: one string = E123456, length = 7 characters. This file contains 10,000's of these strings. I want to separate them into one vector, the length of the number of strings in the flat file, where each string is its own unique value.

cc <- c(7,7,7,7,7,7,7)
aa <- file("Master", "r", raw = TRUE)
readChar(aa, cc, useBytes = FALSE)
[1] "E123456" "\nE23456" "7\nE3456" "78\nE456" "789\nE56" "7891\nE6" "78910\nE"
close(aa)
unlink("Master")

The biggest issue is that I am getting \n added into the strings, which I am not sure where it is coming from, and which splices the strings. Any suggestions on getting rid of the \n, and on creating an indefinitely long sequence of 7's for the string lengths in the cc vector? Is there a better way to do this?

Sarah
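One way around the embedded newlines (a sketch, not from the thread; assumes the file name "Master" as in the question) is to read the whole file as text, drop the line breaks, and then split on fixed width:

```r
txt <- readLines("Master", warn = FALSE)   # one element per physical line
txt <- paste(txt, collapse = "")           # removes the troublesome newlines
starts <- seq(1, nchar(txt), by = 7)
ids <- substring(txt, starts, starts + 6)  # the 7-character codes
```

Alternatively, regmatches(txt, gregexpr("E[0-9]{6}", txt))[[1]] extracts the codes by pattern rather than by position, which is more robust if the widths ever vary.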
Re: [R] fraction with timelag
How about simply

df <- data.frame(id = 1:6,
                 xout = c(12.34, 21.34, 2.34, 4.56, 3.24, 3.45),
                 xin = c(NA, 34, 67, 87, 34, NA))
with(df, c(NA, xin[-1] / xout[-length(xout)]))
[1]        NA  2.755267  3.139644 37.179487  7.456140        NA

BTW you seem to have some decimal points missing in your example. BTW2: zoo?

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Patrick Hausmann [patrick.hausm...@uni-bremen.de]
Sent: 24 March 2011 21:31
To: r-help@r-project.org
Subject: [R] fraction with timelag

Dear r-help, I'm having this DF

df <- data.frame(id = 1:6,
                 xout = c(1234, 2134, 234, 456, 324, 345),
                 xin = c(NA, 34, 67, 87, 34, NA))

and would like to calculate the fraction (xin_t / xout_t-1). The result should be:

# NA, 2.76, 3.14, 37.18, 7.46, NA

I am sure there is a solution using zoo... but I don't know how... Thanks for any help! Patrick
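Since the poster asked about zoo, here is a sketch (assumes the zoo package is installed, and uses the answer's corrected data):

```r
library(zoo)
df <- data.frame(id = 1:6,
                 xout = c(12.34, 21.34, 2.34, 4.56, 3.24, 3.45),
                 xin  = c(NA, 34, 67, 87, 34, NA))
z <- zoo(df[c("xout", "xin")], order.by = df$id)
## lag(..., k = -1) aligns each time t with the value at t-1,
## so this computes xin_t / xout_{t-1}; indices align automatically.
frac <- z[, "xin"] / lag(z[, "xout"], k = -1)
```

The division drops the non-overlapping first index, matching the leading NA in the base-R answer.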
Re: [R] R as a non-functional language
That's not the point. The point is that R has functions which have side-effects and hence does not meet the strict requirements for a functional language.

-----Original Message-----
From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be]
Sent: Monday, 21 March 2011 7:20 PM
To: russ.abb...@gmail.com; Venables, Bill (CMIS, Dutton Park)
Cc: r-help@r-project.org
Subject: RE: [R] R as a non-functional language

Dear Russ,

Why not simply use

pH <- c(area1 = 4.5, area2 = 7, mud = 7.3, dam = 8.2, middle = 6.3)

That notation is IMHO the most readable for students.

Best regards,
Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. +32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Russ Abbott
Sent: Sunday, 20 March 2011 6:46
To: bill.venab...@csiro.au
Cc: r-help@r-project.org
Subject: Re: [R] R as a non-functional language

I'm afraid I disagree. As a number of people have shown, it's certainly possible to get the end result

pH <- c(4.5,7,7.3,8.2,6.3)
names(pH) <- c('area1','area2','mud','dam','middle')
pH
 area1  area2    mud    dam middle
   4.5    7.0    7.3    8.2    6.3

using a single expression. But what makes this non-functional is that the names() function operates on a reference to the pH object/entity/element.
In other words, the names() function has a side effect, which is not permitted in strictly functional programming. I don't know whether R has threads, but imagine it did and that one ran

names(pH) <- c('area1','area2','mud','dam','middle')

and another ran

names(pH) <- c('areaA','areaB','dirt','blockage','center')

in simultaneous threads. Since both operate on the same pH element, it is uncertain what the result would be. That's one of the things that functional programming prevents.

-- Russ

On Sat, Mar 19, 2011 at 10:22 PM, bill.venab...@csiro.au wrote:

PS the form names(p) <- c(...) is still functional, of course. It is just a bit of syntactic sugar for the clumsier

p <- `names<-`(p, c(...))

e.g.

pH <- `names<-`(pH, letters[1:5])
pH
  a   b   c   d   e
4.5 7.0 7.3 8.2 6.3

-----Original Message-----
From: Venables, Bill (CMIS, Dutton Park)
Sent: Sunday, 20 March 2011 3:09 PM
To: 'Gabor Grothendieck'; 'russ.abb...@gmail.com'
Cc: 'r-help@r-project.org'
Subject: RE: [R] R as a non-functional language

The idiom I prefer is

pH <- structure(c(4.5,7,7.3,8.2,6.3),
                names = c('area1','area2','mud','dam','middle'))

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
Sent: Sunday, 20 March 2011 2:33 PM
To: russ.abb...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] R as a non-functional language

On Sun, Mar 20, 2011 at 12:20 AM, Russ Abbott russ.abb...@gmail.com wrote: I'm reading Torgo (2010) *Data Mining with R* http://www.liaad.up.pt/~ltorgo/DataMiningWithR/code.html in preparation for a class I'll be teaching next quarter. Here's an example that is very non-functional.

pH <- c(4.5,7,7.3,8.2,6.3)
names(pH) <- c('area1','area2','mud','dam','middle')
pH
 area1  area2    mud    dam middle
   4.5    7.0    7.3    8.2    6.3

This sort of thing seems to be quite common in R.

Try this:

pH <- setNames(c(4.5,7,7.3,8.2,6.3), c('area1','area2','mud','dam','middle'))

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
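The three idioms in this thread (the `names<-` replacement form, structure(), and setNames()) produce identical objects; a quick sketch to confirm:

```r
vals <- c(4.5, 7, 7.3, 8.2, 6.3)
nms  <- c('area1', 'area2', 'mud', 'dam', 'middle')

p1 <- setNames(vals, nms)                # Gabor's suggestion
p2 <- structure(vals, names = nms)       # Bill's preferred idiom
p3 <- `names<-`(vals, nms)               # the replacement-function form
identical(p1, p2) && identical(p1, p3)   # TRUE
```

All three are single expressions with no side effect on an existing binding, which is the point at issue in the thread.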
Re: [R] Help with POSIXct
You might try

dat$F1 <- format(as.Date(dat$F1), format = "%b-%y")

although it rather depends on the class of F1 as it has been read.

Bill Venables. (It would be courteous of you to give us your name, by the way.)

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of i...@mathewanalytics.com
Sent: Tuesday, 22 March 2011 6:31 AM
To: r-help@r-project.org
Subject: [R] Help with POSIXct

I rarely work with dates in R, so I know very little about the POSIXct and POSIXlt classes. I'm importing an Excel file into R using the RODBC package, and am having issues reformatting the dates.

1. The important info:

sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RODBC_1.3-2

loaded via a namespace (and not attached):
[1] tools_2.12.2

2. My question: My data looks like the following once it is imported into R.

one <- odbcConnectExcel("marketshare.xls")
dat <- sqlFetch(one, "Sheet1")
close(one)
dat
          F1 Marvel DC
1 2010-01-01     42 34
2 2010-02-01     45 34
3 2010-03-01     47 29
4 2010-04-01     45 32
5 2010-05-01     45 35
6 2010-06-01     42 34

Variable F1 is supposed to be Jan-10, Feb-10, Mar-10, etc. However, in the process of importing the .xls file, R is reformatting the dates. How can I retrieve the original month-year format that I have in my Excel file?

3. While the R help list discourages asking multiple questions in one inquiry, it'd be really helpful if someone could point me to any good online resources on the POSIXct and POSIXlt classes.
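A further option (a sketch, not from the thread; assumes the zoo package is installed and that F1 converts cleanly with as.Date): zoo's yearmon class keeps month-year values sortable and computable, instead of turning them into plain character strings as format() does:

```r
library(zoo)
dat$F1 <- as.yearmon(as.Date(dat$F1))   # yearmon prints like "Jan 2010"
format(dat$F1, "%b-%y")                 # the "Jan-10" style asked for
```

With yearmon, the column still sorts chronologically and can be plotted on a time axis; convert with format() only at presentation time.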
Re: [R] How to substract a valur from dataframe with condition
dat <- within(dat, {
  X2 <- ifelse(X2 > 50, 100 - X2, X2)
  X3 <- ifelse(X3 > 50, 100 - X3, X3)
})

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of joe82
Sent: Tuesday, 22 March 2011 7:40 AM
To: r-help@r-project.org
Subject: [R] How to substract a valur from dataframe with condition

Hello All, I need help with my dataframe. It is big, but here I am using a small table as an example. My dataframe df looks like:

       X1   X2    X3
1 2011-02 0.00 96.00
2 2011-02 0.00  2.11
3 2011-02 2.00  3.08
4 2011-02 0.06  2.79
5 2011-02 0.00 96.00
6 2011-02 0.00 97.00
7 2011-02 0.08  2.23

I want the values in columns X2 and X3 to be checked for being greater than 50; if so, subtract them from 100. df should then look like:

       X1   X2   X3
1 2011-02 0.00 4.00
2 2011-02 0.00 2.11
3 2011-02 2.00 3.08
4 2011-02 0.06 2.79
5 2011-02 0.00 4.00
6 2011-02 0.00 3.00
7 2011-02 0.08 2.23

Please help, I will really appreciate it. Thanks, Joe

--
View this message in context: http://r.789695.n4.nabble.com/How-to-substract-a-valur-from-dataframe-with-condition-tp3394907p3394907.html
Sent from the R help mailing list archive at Nabble.com.
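If many columns need the same fold-over, a sketch (hypothetical; column names as in the question) that treats every numeric column at once rather than naming each:

```r
num <- sapply(dat, is.numeric)    # skips the character/date column X1
dat[num] <- lapply(dat[num], function(v) ifelse(v > 50, 100 - v, v))
```

This scales to the poster's "big" data frame without repeating the ifelse() line per column.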
Re: [R] R as a non-functional language
The idiom I prefer is

pH <- structure(c(4.5,7,7.3,8.2,6.3),
                names = c('area1','area2','mud','dam','middle'))

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
Sent: Sunday, 20 March 2011 2:33 PM
To: russ.abb...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] R as a non-functional language

On Sun, Mar 20, 2011 at 12:20 AM, Russ Abbott russ.abb...@gmail.com wrote: I'm reading Torgo (2010) *Data Mining with R* http://www.liaad.up.pt/~ltorgo/DataMiningWithR/code.html in preparation for a class I'll be teaching next quarter. Here's an example that is very non-functional.

pH <- c(4.5,7,7.3,8.2,6.3)
names(pH) <- c('area1','area2','mud','dam','middle')
pH
 area1  area2    mud    dam middle
   4.5    7.0    7.3    8.2    6.3

This sort of thing seems to be quite common in R.

Try this:

pH <- setNames(c(4.5,7,7.3,8.2,6.3), c('area1','area2','mud','dam','middle'))

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] making dataframes
Firstly, the way you have constructed your data frame in the example will convert everything to factors. What you need to do is actually a bit simpler:

###
dum <- data.frame(date, col1, col2)
###

One way to turn this into the kind of data frame you want is to convert the main part of it to a table first, and then coerce that into a data frame:

###
tab <- as.table(as.matrix(dum[, -1]))
row.names(tab) <- date
names(dimnames(tab)) <- c("date", "category")
Dum <- as.data.frame(tab, responseName = "rainfall")
Dum$date <- factor(Dum$date, levels = date)
###

Here is a check:

head(Dum)
  date category rainfall
1  jan     col1      8.2
2  feb     col1      5.4
3  mar     col1      4.3
4  apr     col1      4.1
5  may     col1      3.1
6 june     col1      2.5

with(Dum, tapply(rainfall, date, mean))
 jan  feb  mar  apr  may june july  aug  sep  oct  nov  dec
5.65 3.85 4.50 5.50 5.30 1.80 2.35 6.50 5.35 2.20 5.95 4.40

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of pelt
Sent: Thursday, 17 March 2011 12:28 AM
To: r-help@r-project.org
Subject: [R] making dataframes

Dear all, I have a dataframe which looks like this (dummy):

date <- c("jan","feb","mar","apr","may","june","july","aug","sep","oct","nov","dec")
col1 <- c(8.2,5.4,4.3,4.1,3.1,2.5,1.1,4.5,3.2,1.9,7.8,6.5)
col2 <- c(3.1,2.3,4.7,6.9,7.5,1.1,3.6,8.5,7.5,2.5,4.1,2.3)
dum <- data.frame(cbind(date, col1, col2))
dum
   date col1 col2
1   jan  8.2  3.1
2   feb  5.4  2.3
3   mar  4.3  4.7
4   apr  4.1  6.9
5   may  3.1  7.5
6  june  2.5  1.1
7  july  1.1  3.6
8   aug  4.5  8.5
9   sep  3.2  7.5
10  oct  1.9  2.5
11  nov  7.8  4.1
12  dec  6.5  2.3

I would like to convert this data.frame into something that looks like this:

   date rainfall category
1   jan      8.2     col1
2   feb      5.4     col1
3   mar      4.3     col1
4   apr      4.1     col1
5   may      3.1     col1
6  june      2.5     col1
7  july      1.1     col1
8   aug      4.5     col1
9   sep      3.2     col1
10  oct      1.9     col1
11  nov      7.8     col1
12  dec      6.5     col1
1   jan      3.1     col2
2   feb      2.3     col2
3   mar      4.7     col2
4   apr      6.9     col2
5   may      7.5     col2
6  june      1.1     col2
7  july      3.6     col2
8   aug      8.5     col2
9   sep      7.5     col2
10  oct      2.5     col2
11  nov      4.1     col2
12  dec      2.3     col2

So the column-names become categories. The dataset is rather large, with many columns and a lengthy date-string. Is there an easy way to do this? Thank you for your help, Kind regards, Saskia van Pelt
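A shorter route to the same long format (a sketch, not from the thread) uses base R's stack(), again starting from the non-factor data frame recommended in the reply:

```r
date <- c("jan","feb","mar","apr","may","june",
          "july","aug","sep","oct","nov","dec")
col1 <- c(8.2,5.4,4.3,4.1,3.1,2.5,1.1,4.5,3.2,1.9,7.8,6.5)
col2 <- c(3.1,2.3,4.7,6.9,7.5,1.1,3.6,8.5,7.5,2.5,4.1,2.3)
dum <- data.frame(date, col1, col2)   # no cbind(), so columns stay numeric

## stack() gives values/ind columns; date recycles over the stacked rows
long <- cbind(date = dum$date, stack(dum[c("col1", "col2")]))
names(long) <- c("date", "rainfall", "category")
```

With many value columns, stack(dum[-1]) generalizes without listing the column names.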
Re: [R] Why doesn't this work ?
It doesn't work (in R) because it is not written in R. It's written in some other language that looks a bit like R.

t <- 3
z <- t %in% 1:3
z
[1] TRUE
t <- 4
z <- t %in% 1:3
z
[1] FALSE

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of eric
Sent: Thursday, 17 March 2011 1:26 PM
To: r-help@r-project.org
Subject: [R] Why doesn't this work ?

Why doesn't this work, and is there a better way?

z <- ifelse(t == 1 || 2 || 3, 1, 0)
t <- 3
z
[1] 1
t <- 4
z
[1] 1

Trying to say: if t == 1 or if t == 2 or if t == 3 then true, otherwise false.

--
View this message in context: http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.html
Sent from the R help mailing list archive at Nabble.com.
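Why the original always yields 1 (a brief sketch, not from the thread): `||` coerces the bare numbers 2 and 3 to TRUE, so the condition reduces to (t == 1) || TRUE || TRUE, which is TRUE regardless of t:

```r
t <- 4
(t == 1) || 2 || 3   # FALSE || TRUE short-circuits to TRUE for any t
```

Hence `t %in% 1:3`, which tests membership element-wise, is the idiomatic spelling of "t is 1, 2, or 3".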
Re: [R] Subset using grepl
subset(data, grepl("[1-5]", section) & !grepl("0", section))

BTW grepl("[1:5]", section) does "work". It checks for the characters "1", ":", or "5".

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kang Min
Sent: Thursday, 17 March 2011 1:29 PM
To: r-help@r-project.org
Subject: Re: [R] Subset using grepl

I have a new question, also regarding grepl. I would like to subset rows with numbers from 1 to 5 in the section column, so I used

subset(data, grepl("[1:5]", section))

but this gave me rows with 10, 1 and 5. (Why is this so?) So I tried

subset(data, grepl("[1,2,3,4,5]", section))

which worked. But I also got 10 in the dataframe as well. How can I exclude 10?

data
section piece LTc1 LTc2 10a 10-1 0.729095368 NA 10a 10-2 59.53292189 13.95612454 10h 10-3 0.213756661 NA 10i 10-4 NA NA 10b NA NA NA 10c NA NA NA 10d NA NA NA 10e NA NA NA 10f NA NA NA 10g NA NA NA 10h NA NA NA 10j NA NA NA 1b 1-1 NA NA 1d 1-2 29.37971303 12.79688209 1g 1-6 NA 7.607911603 1h 1-3 0.298059164 27.09896941 1i 1-4 25.11261782 19.87149991 1j 1-5 36.66969601 42.28507923 1a NA NA NA 1c NA NA NA 1e NA NA NA 1f NA NA NA 2a 2-1 15.98582117 10.58696146 2a 2-2 0.557308341 41.52650718 2c 2-3 14.99499024 10.0896793 2e 2-4 148.4530636 56.45493191 2f 2-5 25.27493551 12.98808577 2i 2-6 20.32857108 22.76075728 2b NA NA NA 2d NA NA NA 2g NA NA NA 2h NA NA NA 2j NA NA NA 3a 3-1 13.36602867 11.47541439 3a 3-7 NA 111.9007822 3c 3-2 10.57406701 5.58567 3d 3-3 11.73240891 10.73833651 3e 3-8 NA 14.54214165 3h 3-4 21.56072089 21.59748884 3i 3-5 15.42846935 16.62715409 3i 3-6 129.7367193 121.8206045 3b NA NA NA 3f NA NA NA 3g NA NA NA 3j NA NA NA 5b 5-1 18.61733498 18.13545293 5d 5-3 NA 7.81018526 5f 5-2 12.5158971 14.37884817 5a NA NA NA 5c NA NA NA 5e NA NA NA 5g NA NA NA 5h NA NA NA 5i NA NA NA 5j NA NA NA 9h 9-1 NA NA 9a NA NA NA 9b NA NA NA 9c NA NA NA 9d NA NA NA 9e NA NA NA 9f NA NA NA 9g NA NA NA 9i NA NA NA 9j NA NA NA 8a 8-1 14.29712852 12.83178905 8e 8-2 23.46594953 9.097377872 8f 8-3 NA NA 8f 8-4 22.20001584 20.39646766 8h 8-5 50.54497551 56.93752065 8b NA NA NA 8c NA NA NA 8d NA NA NA 8g NA NA NA 8i NA NA NA 8j NA NA NA 4b 4-1 40.83468857 35.99017683 4f 4-3 NA 182.8060799 4f 4-4 NA 36.81401955 4h 4-2 17.13625062 NA 4a NA NA NA 4c NA NA NA 4d NA NA NA 4e NA NA NA 4g NA NA NA 4i NA NA NA 4j NA NA NA 7b 7-1 8.217605633 8.565035083 7a NA NA NA 7c NA NA NA 7d NA NA NA 7e NA NA NA 7f NA NA NA 7g NA NA NA 7h NA NA NA 7i NA NA NA 7j NA NA NA 6b 6-6 NA 11.57887288 6c 6-1 27.32608984 17.17778959 6c 6-2 78.21988783 61.80558768 6d 6-7 NA 3.599685625 6f 6-3 26.78838281 23.33258286 6h 6-4 NA NA 6h 6-5 NA NA 6a NA NA NA 6e NA NA NA 6g NA NA NA 6i NA NA NA 6j NA NA NA

On Jan 29, 10:43 pm, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Sat, 29 Jan 2011, Kang Min wrote: Thanks Prof Ripley, the condition worked! Btw I tried to search ?repl but I don't have documentation for it. Is it in a non-basic package?

I meant grepl: the edit messed up (but not on my screen, as sometimes happens when working remotely). The point is that 'perl=TRUE' guarantees that [A-J] is interpreted in ASCII order.

On Jan 29, 6:54 pm, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: The grep condition is "[A-J]". BTW, why are there lots of unnecessary steps here, including using cbind() and subset()?

x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")
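An alternative to combining two grepl() calls (a sketch; assumes section values of the form digit(s)-plus-letter such as "10a", "1b", "5j", as in the posted data): anchor the pattern so a following digit cannot match, which excludes 10 directly:

```r
## "^" pins the match to the start; requiring a letter right after the
## digit means "10a" (second char "0") cannot match, while "1b" .. "5j" do.
subset(data, grepl("^[1-5][a-j]$", section))
```

Character-class notes from the thread apply here too: inside `[...]`, `1:5` and `1,2,3,4,5` are literal character lists, not ranges; only `1-5` is a range.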
Re: [R] data.frame transformation
It is possible to do it with numeric comparisons as well, but to make life comfortable you need to turn off the warning system temporarily.

df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"))

conv <- function(x, cutoff) {
  oldOpt <- options(warn = -1)
  on.exit(options(oldOpt))
  x <- as.factor(x)
  lev <- as.numeric(levels(x))
  levels(x)[!is.na(lev) & lev < cutoff] <- "."
  x
}

Check:

(df1 <- data.frame(lapply(df, conv, cutoff = 10)))
     q1    q2    q3    q4
1     .     . check     .
2     . 33.33 check    35
3 33.33 check    25   100
4 check     .   100 check

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, 15 March 2011 6:29 AM
To: andrija djurovic
Cc: r-help@r-project.org
Subject: Re: [R] data.frame transformation

On Mar 14, 2011, at 3:51 PM, andrija djurovic wrote:

I would like to hide cells with values less than 10%, so "." or just "" makes no difference to me. Also I used apply combined with as.character:

apply(df, 2, function(x) ifelse(as.character(x) < 10, ".", x))

This is probably not a good solution, but it works, except that I lose the row names, and because of that I was wondering if there is some other way to do this. Anyway, thank you both; I will try to do this before combining numbers and strings.

I saw your later assertion that it didn't work, which surprised me. My version of your data followed my advice not to use factors, and your effort did succeed when the columns were character rather than factor. I put back the row numbers by coercing back to a data.frame; `apply` returns a matrix.

df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)
as.data.frame(apply(df, 2, function(x) ifelse(as.character(x) < 10, ".", x)))
     q1    q2    q3    q4
1     .     . check 7.123
2     . 33.33 check    35
3 33.33     .    25   100
4 check 9.156   100 check

There is a danger in using character collation, in that if there are any leading characters in those strings that collate below "1", such as a blank or any other punctuation, they will get dotted.

"," < 1
[1] TRUE
"." < 1
[1] TRUE
"-" < 1
[1] TRUE

And "1.check" would also get dotted:

"1.check" < 10
[1] TRUE

Andrija

On Mon, Mar 14, 2011 at 8:11 PM, David Winsemius dwinsem...@comcast.net wrote: On Mar 14, 2011, at 2:52 PM, andrija djurovic wrote:

Hi R users, I have the following data frame

df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"))

and I would like to replace every element that is less than 10 with "." (dot) in order to obtain this:

     q1    q2    q3    q4
1     .     . check     .
2     . 33.33 check    35
3 33.33 check    25   100
4 check     .   100 check

I had a lot of difficulties because each variable is a factor.

Right, so comparisons with < will throw an error. I would sidestep the factor problem with stringsAsFactors=FALSE in the data.frame call. You might want to reconsider the "." as a missing value. If you are coming from a SAS background, you should try to get comfortable with NA or NA_character_ as a value.

df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)
is.na(df) <- t(apply(df, 1, function(x) as.numeric(x) < 10))
Warning messages:
1: In FUN(newX[, i], ...) : NAs introduced by coercion
2: In FUN(newX[, i], ...) : NAs introduced by coercion
3: In FUN(newX[, i], ...) : NAs introduced by coercion
4: In FUN(newX[, i], ...) : NAs introduced by coercion
df
     q1    q2    q3    q4
1    NA    NA check    NA
2    NA 33.33 check    35
3 33.33 check    25   100
4 check    NA   100 check

Could someone help me with this? Thanks in advance for any help.
Andrija

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
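Combining the two replies above, here is a small sketch (a toy illustration, not from the original thread) that keeps the row names and uses NA rather than "." for the suppressed cells, wrapping the coercion in suppressWarnings() instead of toggling options():

```r
df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

## mark every cell that parses as a number below 10; non-numeric cells give NA
## from as.numeric() and are skipped by which(..., arr.ind = TRUE)
low <- suppressWarnings(sapply(df, function(x) as.numeric(x) < 10))
df[which(low, arr.ind = TRUE)] <- NA
df   # same result as the is.na<- approach, without any warnings printed
```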
Re: [R] Stepwise Discriminant... in R
If you want to do a stepwise selection there is a function in the klaR package to do it. This is not what you are asking for, though. You want a way of finding the successive error rates as additional variables are added in the forward selection process. As far as I can see you have to do this yourself, and it is a mildly interesting little exercise in R programming. Here is one possible way to do it. First you need a couple of functions:

###

errorRate <- function(object, ...) {
  if(!require(MASS)) stop("you need the MASS package installed")
  UseMethod("errorRate")
}

errorRate.lda <- function(object, data = eval.parent(object$call$data),
                          type = "plug-in") {
  pred <- predict(object, data, type = type)$class
  actu <- eval(formula(object)[[2]], data)
  conf <- table(pred, actu)
  1 - sum(diag(conf))/sum(conf)
}

eRates <- function(object, data = eval.parent(object$call$data),
                   type = "plug-in") {
  f <- formula(object)
  r <- data.frame(formula = deparse(f[[3]]),
                  Error = errorRate(object, data, type = type))
  while(length(f[[3]]) > 1) {
    f[[3]] <- f[[3]][[2]]
    object$call$formula <- f
    object <- update(object, data = data)
    r <- rbind(data.frame(formula = deparse(f[[3]]),
                          Error = errorRate(object, data, type = type)), r)
  }
  r
}

###

(I have made errorRate generic as it is potentially a generic sort of operation.) Now look at your trivial example (extended a bit):

###

require(klaR)
QRBdfa <- data.frame(LANDUSE = sample(c("A", "B", "C"), 270, rep = TRUE),
                     Al = runif(270, 0, 125),
                     Sb = runif(270, 0, 1),
                     Ba = runif(270, 0, 235),
                     Bi = runif(270, 0, 0.11),
                     Cr = runif(270, 0, 65))
gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 1)  ## NB large 'niveau'

###

To use the functions you need an lda fit with the same formula as for the gw object and the same data argument as in the original call. (If you try to do this the way suggested in the help file for greedy.wilks, the functions used here will not work. "No dollars in formulae" is always a good rule to follow.)
The way greedy.wilks is written makes this a bit tricky, but unless you want to just type it in, here is a partly automated way of doing it:

###

require(MASS)
fit <- do.call("lda", list(formula = formula(gw_obj), data = quote(QRBdfa)))

###

To use the functions:

> errorRate(fit)   ## for one error rate
[1] 0.5962963

> eRates(fit)      ## for a sequence of error rates
                 formula     Error
1                     Ba 0.6148148
2                Ba + Bi 0.6296296
3           Ba + Bi + Al 0.6074074
4      Ba + Bi + Al + Cr 0.5740741
5 Ba + Bi + Al + Cr + Sb 0.5962963

Since this example uses very artificial random data, the output will be different every time you re-create the data. Note also that the error rates are not necessarily monotonically decreasing.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ty Smith
Sent: Monday, 14 March 2011 3:51 AM
To: r-help@r-project.org
Subject: [R] Stepwise Discriminant... in R

Hello R list,

I'm looking to do some stepwise discriminant function analysis (DFA) based on the minimization of Wilks' lambda in R, to end up with a composite signature (of the metals Al, Sb, Bi, Cr, Ba) capable of discriminating 100% of the source factors (LANDUSE: A, B, C). The Wilks' lambda portion seems straightforward. I am using the following:

gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 0.1)
gw_obj

thus determining the stepwise order of metals. But I can't seem to figure out how to coerce the DFA to give me an output with the % of factors which each successive metal (variable) correctly classifies (discriminates), e.g.

Step  Metal  % correctly classified
1     Al     25
2     Sb     75
3     Bi     89
4     Cr     100

I've worked up a trivial example below. Can anyone offer any suggestions on how I might go about doing this in R? I am working in a Mac OS environment with a current version of R.

Many thanks in advance!
Tyler

#Example
library(scatterplot3d)
library(klaR)

Al <- runif(27, 0, 125)
QRBdfa <- as.data.frame(Al)
QRBdfa$LANDUSE <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C"))
QRBdfa$Sb <- runif(27, 0, 1)
QRBdfa$Ba <- runif(27, 0, 235)
QRBdfa$Bi <- runif(27, 0, 0.11)
QRBdfa$Cr <- runif(27, 0, 65)

gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 0.1)
gw_obj

fit <- lda(LANDUSE ~ Al + Sb + Bi + Cr + Ba, data = QRBdfa)

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] troubles with logistic regression
It means you have selected a response variable from one data frame (unmarried.male) and a predictor from another data frame (fieder.male), and they have different lengths. You might be better off if you used the names in the data frame rather than selecting columns in a form such as 'some.data.frame[, 3]'. This just confuses the issue and makes it very easy to make mistakes - as indeed you have done.

Also, to fit models on subsets of the data, you do not have to create separate data frames. See the 'subset' argument of glm, which is standard for most fitting functions. This is also a way to avoid problems and would have helped you here as well.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of gked
Sent: Monday, 14 March 2011 4:33 AM
To: r-help@r-project.org
Subject: [R] troubles with logistic regression

hello everyone, I am working on the dataset for my project in class and got stuck trying to run a logistic regression.
here is my code:

data <- read.csv(file = "C:/Users/fieder.data.2000.csv")

# creating subset of men
fieder.male <- subset(data, data[, 8] == 1)
unmarried.male <- subset(data, data[, 8] == 1 & data[, 6] == 1)

# glm fit
agesq.male <- (unmarried.male[, 5])^2
male.sqrtincome <- sqrt(unmarried.male[, 9])
fieder.male.mar.glm <- glm(as.factor(unmarried.male[, 6]) ~
    factor(fieder.male[, 7]) + fieder.male[, 5] + agesq.male +
    male.sqrtincome, binomial(link = "logit"))

par(mfrow = c(1, 1))
plot(c(0, 300), c(0, 1), pch = " ",
     xlab = "sqrt income, truncated at 9",
     ylab = "modeled probability of being never-married")
junk <- lowess(male.sqrtincome,
               log(fieder.male.mar.glm$fitted.values /
                   (1 - fieder.male.mar.glm$fitted.values)))
lines(junk$x, exp(junk$y)/(1 + exp(junk$y)))
title(main = "probability of never marrying\n males, by sqrt(income)")
points(male.sqrtincome[unmarried.male == 0],
       fieder.male.mar.glm$fitted.values[unmarried.male == 0], pch = 16)
points(male.sqrtincome[unmarried.male == 1],
       fieder.male.mar.glm$fitted.values[unmarried.male == 1], pch = 1)

The error says:

Error in model.frame.default(formula = as.factor(unmarried.male[, 6]) ~ :
  variable lengths differ (found for 'factor(fieder.male[, 7])')

What does it mean? Where am I making a mistake?

Thank you

P.S. I am also attaching the data file in .csv format http://r.789695.n4.nabble.com/file/n3352356/fieder.data.2000.csv fieder.data.2000.csv

--
View this message in context: http://r.789695.n4.nabble.com/troubles-with-logistic-regression-tp3352356p3352356.html
Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
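Following the advice in the reply, the same fit can be sketched with column names and the `subset` argument instead of mixed column indices. The names used here (`married`, `sex`, `educ`, `age`, `income`) are hypothetical stand-ins for `data[, 6]`, `data[, 8]`, `data[, 7]`, `data[, 5]`, and `data[, 9]`, since the real names are only known from the attached CSV:

```r
## hypothetical column names standing in for the numeric indices above;
## subset = sex == 1 replaces the separate fieder.male data frame, and
## keeping everything in one data frame avoids the length mismatch
fit <- glm(as.factor(married) ~ factor(educ) + age + I(age^2) + sqrt(income),
           family = binomial, data = data, subset = sex == 1)
```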
Re: [R] Cleaning date columns
Here is one possible way (I think - untested code):

cData <- do.call(rbind, lapply(split(data, data$prochi), function(dat) {
  dat <- dat[order(dat$date_admission), ]
  while(any(d <- (diff(dat$date_admission) <= 3)))
    dat <- dat[-(min(which(d)) + 1), ]
  dat
}))

(It would be courteous of you to give us your real name, by the way.)

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Newbie19_02
Sent: Wednesday, 9 March 2011 9:20 PM
To: r-help@r-project.org
Subject: [R] Cleaning date columns

Hi Everyone,

I have the following problem:

data <- structure(list(prochi = c("IND1", "IND1", "IND1", "IND2", "IND2",
    "IND2", "IND2", "IND3", "IND4", "IND5"),
    date_admission = structure(c(6468, 6470, 7063, 9981, 9983, 14186,
        14372, 5129, 9767, 11168), class = "Date")),
    .Names = c("prochi", "date_admission"),
    row.names = c("27", "28", "21", "86", "77", "80", "1", "114", "192", "322"),
    class = "data.frame")

I have records for individuals that were taken on specific dates. Some of the dates are within 3 days of each other. I want to be able to clean my date column and select the earliest of the dates that occur within 3 days of each other per individual, as a single observation that represents the N observations. So for example:

input:
IND1 1987-09-17
IND1 1987-09-19
IND1 1989-05-04

output:
IND1 1987-09-17
IND1 1989-05-04

I'm not sure where to start with this?

Thanks, Nat

--
View this message in context: http://r.789695.n4.nabble.com/Cleaning-date-columns-tp3343359p3343359.html
Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
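Since the reply is flagged as untested, here is a quick check (not part of the original thread) of the split/while approach on the small input/output example from the question:

```r
## the three-row example from the question
data <- data.frame(prochi = c("IND1", "IND1", "IND1"),
                   date_admission = as.Date(c("1987-09-17", "1987-09-19",
                                              "1989-05-04")))

do.call(rbind, lapply(split(data, data$prochi), function(dat) {
  dat <- dat[order(dat$date_admission), ]
  ## drop the later of any pair of dates at most 3 days apart,
  ## repeating until no such pair remains
  while (any(d <- (diff(dat$date_admission) <= 3)))
    dat <- dat[-(min(which(d)) + 1), ]
  dat
}))
## keeps 1987-09-17 and 1989-05-04 for IND1, dropping 1987-09-19
```

Note that `diff()` on a Date column returns a difftime in days, so the `<= 3` comparison behaves as intended.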
Re: [R] Extracting only odd columns from a matrix
Xonly <- XY[, grep("^X", dimnames(XY)[[2]])]

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Nixon, Matthew
Sent: Thursday, 10 March 2011 12:20 AM
To: r-help@R-project.org
Subject: [R] Extracting only odd columns from a matrix

Hi,

This might seem like a simple question but at the moment I am stuck for ideas. The columns of my matrix in which some data is stored are of this form:

X1 Y1 X2 Y2 X3 Y3 ... Xn Yn

with n ~ 100. I would like to look at just the X values (i.e. odd column numbers). Is there an easy way to loop round extracting only these columns?

Any help would be appreciated. Thank you.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
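If the columns lack X/Y names to grep on, positional indexing works as well; a small self-contained sketch:

```r
## toy matrix with alternating X/Y columns, standing in for the real data
XY <- matrix(1:12, nrow = 2,
             dimnames = list(NULL, c("X1", "Y1", "X2", "Y2", "X3", "Y3")))

Xonly <- XY[, seq(1, ncol(XY), by = 2)]   # odd-numbered columns: the X's
colnames(Xonly)
## [1] "X1" "X2" "X3"
```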
Re: [R] attr question
Erin,

You could use

as.vector(t.test(buzz$var1, conf.level = 0.98)$conf.int)

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Monday, 7 March 2011 3:12 PM
To: R help
Subject: [R] attr question

Dear R People:

When I want to produce a small sample confidence interval using t.test, I get the following:

> t.test(buzz$var1, conf.level = .98)$conf.int
[1] 2.239337 4.260663
attr(,"conf.level")
[1] 0.98

How do I keep the attr statement from printing, please? I'm sure it's something really simple.

Thanks,
Sincerely,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
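For completeness, `c()` and `as.numeric()` also strip attributes here; a small sketch with simulated data standing in for `buzz$var1` (which is not shown in the thread):

```r
set.seed(1)                         # reproducible toy data
var1 <- rnorm(20, mean = 3)         # stand-in for buzz$var1
ci <- t.test(var1, conf.level = 0.98)$conf.int

c(ci)           # prints just the two endpoints, no conf.level attribute
as.numeric(ci)  # likewise
```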
Re: [R] extract rows with unique values from data.frame
Here is possibly one method (if I have understood you correctly):

> con <- textConnection("
+ xloc yloc   gonad ind  Ene     W Agent
+ 1 23 20  516.74 1 0.02 20.21 0.25
+ 2 23 20 1143.20 1 0.02 20.21 0.50
+ 3 21 19  250.00 1 0.02 20.21 0.25
+ 4 22 15  251.98 1 0.02 18.69 0.25
+ 5 24 18  598.08 1 0.02 18.69 0.25
+ ")
> pop <- read.table(con, header = TRUE)
> close(con)
> i <- with(pop, cumsum(!duplicated(cbind(xloc, yloc))))
> k <- 2   ## how many unique cells do you want?
> no <- min(which(i == k))
> pop[1:no, ]
  xloc yloc   gonad ind  Ene     W Agent
1   23   20  516.74   1 0.02 20.21  0.25
2   23   20 1143.20   1 0.02 20.21  0.50
3   21   19  250.00   1 0.02 20.21  0.25

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Nicolas Gutierrez
Sent: Tuesday, 8 March 2011 1:52 PM
To: R help
Subject: [R] extract rows with unique values from data.frame

Hello! I have the data frame pop:

  xloc yloc   gonad ind  Ene     W Agent
1   23   20  516.74   1 0.02 20.21  0.25
2   23   20 1143.20   1 0.02 20.21  0.50
3   21   19  250.00   1 0.02 20.21  0.25
4   22   15  251.98   1 0.02 18.69  0.25
5   24   18  598.08   1 0.02 18.69  0.25

And I need to extract the number of rows that have unique (xloc, yloc) values. For example, if I need 2 unique cells (xloc, yloc), the number of rows to extract would be 3, as follows:

  xloc yloc   gonad ind  Ene     W Agent
1   23   20  516.74   1 0.02 20.21  0.25
2   23   20 1143.20   1 0.02 20.21  0.50
3   21   19  250.00   1 0.02 20.21  0.25

Thanks for any help!

Nico

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating a count variable in R
You can probably simplify this if you can assume that the dates are in sorted order. Here is a way of doing it even if the days are in arbitrary order. The count refers to the number of times that this date has appeared so far in the sequence.

con <- textConnection("
01/01/2011
01/01/2011
02/01/2011
02/01/2011
02/01/2011
02/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
")
days <- scan(con, what = "")
close(con)

X <- model.matrix(~ days - 1)
XX <- apply(X, 2, cumsum)
dat <- data.frame(days = days, count = rowSums(X * XX))
dat

### This uses days as a character string vector. If they are actual dates, then convert them to character strings for this operation.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of JonC
Sent: Friday, 4 March 2011 7:58 AM
To: r-help@r-project.org
Subject: [R] creating a count variable in R

Hi R helpers,

I'm trying to create a count in R, but as there is no retain function like in SAS I'm running into difficulties. I have a Date_var column and wish to obtain a Count_var alongside it:

Date_var    Count_var
01/01/2011  1
01/01/2011  2
02/01/2011  1
02/01/2011  2
02/01/2011  3
02/01/2011  4
03/01/2011  1
03/01/2011  2
03/01/2011  3
03/01/2011  4
03/01/2011  5
03/01/2011  6
03/01/2011  7

As can be seen above, the count variable is reinitialised every time a new date is found. I hope this is easy. Many thanks in advance for assistance. It is appreciated.

Cheers, Jon

--
View this message in context: http://r.789695.n4.nabble.com/creating-a-count-variable-in-R-tp3334288p3334288.html
Sent from the R help mailing list archive at Nabble.com.
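As an aside (not from the original reply), the same within-date counter can be had more directly with `ave()`, which also works when the dates arrive in arbitrary order:

```r
days <- c("01/01/2011", "01/01/2011", "02/01/2011", "02/01/2011")
## for each group of equal dates, replace the entries with 1, 2, 3, ...
ave(seq_along(days), days, FUN = seq_along)
## [1] 1 2 1 2
```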
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R usage survey
No. That's not answering the question. ALL surveys are for collecting information. The substantive issue is what purpose do you have in seeking this information in the first place and what are you going to do with it when you get it? Do you have some commercial purpose in mind? If so, what is it? -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Harsh Sent: Friday, 4 March 2011 1:13 AM To: rex.dw...@syngenta.com Cc: r-help@r-project.org Subject: Re: [R] R usage survey Hi Rex and useRs, The purpose of the survey has been mentioned on the survey link goo.gl/jw1ig but I will also reproduce it here. - Geographical distribution of R users - Application areas where R is being used - Supporting technology being used along with R - Academic background distribution of R users The potential personally identifiable information such as name and employer name are optional fields. Actually all the fields in the survey are optional. Some of the analysis output(s) could be along the lines of :- - Usage statistics of various R packages - Distribution of R users across countries/cities - Mapping various applications to packages - Text Mining of the responses to create informative word clouds Personally, I am excited about the kind of data I will receive through this survey and the various insights that could be derived. As already mentioned, the results will be shared with the community. Thank you Rex for raising an important point. It is indeed necessary for me to personally assure the user community that the results will be shared in a manner that will not contain any personally identifiable information. Those who wish to gain access to the raw data will be provided with all the fields but not the name and employer name fields. Just out of curiosity : It is possible to get name, employer name, location, usage information and academic background details when searching for R users on LinkedIn and the many R related groups there. 
Does this also provide potential opportunities for misuse and outrageous analyses, since almost anyone can get onto LinkedIn and access user profiles ? Thank you for your interest and support. Regards, Harsh On Thu, Mar 3, 2011 at 8:02 PM, rex.dw...@syngenta.com wrote: Harsh, Suitably analyzed for whose purposes? One man's suitable is another's outrageous. That's why people want to see the gowns at the Oscars. Under what auspices are you conducting this survey? What do you intend to do with it? You don't give any assurance that the results you post won't have personally identifiable information. I don't get the impression that you know much about survey design. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Harsh Sent: Thursday, March 03, 2011 5:53 AM To: r-help@r-project.org Subject: [R] R usage survey Hi R users, I request members of the R community to consider filling a short survey regarding the use of R. The survey can be found at http://goo.gl/jw1ig Please accept my apologies for posting here for a non-technical reason. The data collected will be suitably analyzed and I'll post a link to the results in the coming weeks. Thank you all for your interest and for sharing your R usage information. Regards, Harsh Singhal [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. message may contain confidential information. If you are not the designated recipient, please notify the sender immediately, and delete the original and any copies. Any use of the message by you is prohibited. 
[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Non-conformable arrays
Here is one way.

1. Make sure y.test is a factor.
2. Use table(y.test, factor(PredictedTestCurrent, levels = levels(y.test))).
3. If PredictedTestCurrent is already a factor with the wrong levels, turn it back into a character string vector first.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gregory Ryslik
Sent: Thursday, 3 March 2011 1:46 PM
To: r-help Help
Subject: [R] Non-conformable arrays

Hi Everyone,

I'm running some simulations where eventually I need to table the results. The problem is that while in most simulations I have at least one predicted outcome for each of the six possible categories, sometimes the algorithm assigns all the outcomes and one category is left out. Thus when I try to add the misclassification matrices I get an error. Is there a simple way I can make sure that all my tables have the same size (with a row or column of zeros if appropriate) without a messy if structure checking each condition?

To be more specific, here's my line of code for the table command and the two matrices that I sometimes have. Notice that in the second matrix, the fad column is missing. Basically, I want all the columns and rows to be predetermined so that no columns/rows go missing.

Thanks for your help!

Kind regards,
Greg

table(y.test, PredictedTestCurrent):

       PredictedTestCurrent
y.test  adi car con fad gla mas
   adi    9   0   0   0   0   0
   car    0   6   1   0   0   3
   con    1   0   3   0   0   0
   fad    0   0   0   2   5   4
   gla    0   1   0   0   6   3
   mas    0   0   0   1   4   4

       PredictedTestCurrent
y.test  adi car con gla mas
   adi    8   0   0   0   0
   car    0   8   0   0   1
   con    2   0   3   0   0
   fad    0   1   0   4   7
   gla    0   0   0   3   5
   mas    0   2   0   6   3

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
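A minimal sketch of point 2 with toy data (not from the original thread), showing how fixing the factor levels keeps a dropped category as a column of zeros:

```r
## two of the six categories, plus one ("fad") that never occurs in this run
y.test <- factor(c("adi", "car", "con"), levels = c("adi", "car", "con", "fad"))
pred   <- c("adi", "car", "car")          # "fad" never predicted here

## give the prediction vector the same levels as y.test before tabling
tab <- table(y.test, factor(pred, levels = levels(y.test)))
dim(tab)   # 4 x 4 even though "fad" appears in neither vector's data
```

Because every table built this way has identical dimensions and dimnames, the misclassification matrices from different simulation runs can be added directly.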
Re: [R] What am I doing wrong with this loop ?
Here is a start:

> x <- as.data.frame(runif(2000, 12, 38))
> length(x)
[1] 1
> names(x)
[1] "runif(2000, 12, 38)"

Why are you turning x and y into data frames? It also looks as if you should be using if(...) ... else ... rather than ifelse(., ., .), too. You need to sort out a few issues, it seems.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of eric
Sent: Thursday, 3 March 2011 1:19 PM
To: r-help@r-project.org
Subject: [R] What am I doing wrong with this loop ?

What is wrong with this loop? I am getting an error saying "incorrect number of dimensions" at y[i, 2].

x <- as.data.frame(runif(2000, 12, 38))
z <- numeric(length(x))
y <- as.data.frame(z)

for(i in 1:length(x)) {
  y <- ifelse(i < 500,
              as.data.frame(lowess(x[1:i, 1], f = 1/9)),
              as.data.frame(lowess(x[(i-499):i, 1], f = 1/9)))
  z[i] <- y[i, 2]
}

--
View this message in context: http://r.789695.n4.nabble.com/What-am-I-doing-wrong-with-this-loop-tp3332703p3332703.html
Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Logistic Stepwise Criterion
The probability OF the residual deviance is zero. The significance level for the residual deviance according to its asymptotic Chi-squared distribution is a possible criterion, but a silly one. If you want to minimise that, just fit no variables at all. That's the best you can do. If you want to maximise it, just minimise the deviance itself, which means including all possible variables in the regression, together with as many interactions as you can as well. (Incidentally, R doesn't have restrictions on how many interaction terms it can handle; those are imposed by your computer.)

I suggest you think again about what criterion you really want to use. Somehow you need to balance fit in the training sample against some complexity measure. AIC and BIC are commonly used criteria, but not the only ones. I suggest you start with these and see if either does the kind of job you want. Stepwise regression with interaction terms can be a bit tricky if you want to impose the marginality constraints, but that is a bigger issue.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mano Gabor Toth
Sent: Wednesday, 2 March 2011 6:44 AM
To: r-help@r-project.org
Subject: [R] Logistic Stepwise Criterion

Dear R-help members,

I'd like to run a binomial logistic stepwise regression with ten explanatory variables and as many interaction terms as R can handle. I'll come up with the right R command sooner or later, but my real question is whether and how the criterion for the evaluation of the different models can be set to be the probability of the residual deviance in the Chi-square distribution (which would be more informative of overall model fit than AIC).

Thanks in advance for all your help.
Kind regards,

Mano Gabor TOTH
MA Political Science
Central European University

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to prove the MLE estimators are normal distributed?
This is a purely statistical question and you should try asking it on some statistics list. This list is for help with using R, mostly for data analysis and graphics. A glance at the posting guide (see the footnote below) might be a good idea.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ning Cheng
Sent: Wednesday, 2 March 2011 8:21 AM
To: r-help@r-project.org
Subject: [R] How to prove the MLE estimators are normal distributed?

Dear List,

I'm now working on MLE and OLS estimators. I just noticed that the textbook argues they are jointly normally distributed. But how to prove the conclusion?

Thanks for your time in advance!

Best, Ning

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transforming list into an array
One way to do it is to use the 'abind' package:

> NCurvas <- 10
> NumSim <- 15
> dW <- replicate(NumSim, matrix(rnorm(NCurvas * 3), NCurvas, 3),
+                 simplify = FALSE)
> library(abind)
> DW <- do.call(abind, c(dW, rev.along = 0))
> dim(DW)
[1] 10  3 15

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Luis Felipe Parra
Sent: Monday, 28 February 2011 10:59 AM
To: r-help
Subject: [R] Transforming list into an array

Hello. I have the following object, which is a list of length NumSim with each entry being a matrix of dimensions Ncurvas x 3:

dW <- replicate(NumSim,
                cbind(rnorm(Ncurvas), rnorm(Ncurvas), rnorm(Ncurvas)),
                simplify = FALSE)

I would like to transform it into an array of dimension Ncurvas x 3 x NumSim. Does anybody know how to do this? Or how to generate directly an array composed of independent random normal numbers of dimensions Ncurvas x 3 x NumSim?

Thank you, Felipe Parra

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
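It is worth noting that `replicate()`'s default `simplify = "array"` already builds the array directly, so the `abind` step can be skipped when the list form isn't needed:

```r
NCurvas <- 10
NumSim  <- 15

## replicate() simplifies a matrix-valued expression to a 3-D array by default
DW <- replicate(NumSim, matrix(rnorm(NCurvas * 3), NCurvas, 3))
dim(DW)
## [1] 10  3 15
```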
Re: [R] Combinations
You can compute the logarithm of it easily enough:

> lchoose(54323456, 2345)
[1] 25908.4

Now, what did you want to do with it?

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Silverton
Sent: Monday, 28 February 2011 10:38 AM
To: r-help@r-project.org
Subject: Re: [R] Combinations

any idea how to get R to compute a combination like choose(54323456, 2345)?

--
Thanks,
Jim.

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
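For a sense of why only the logarithm is practical here, the natural log can be converted to base 10 to see how many digits the exact integer would have:

```r
lc <- lchoose(54323456, 2345)  # natural log of choose(54323456, 2345)
lc / log(10)                   # base-10 log: the exact value has ~11252 digits,
                               # far beyond what a double can represent
```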
Re: [R] Weighted Mean By Factor Using BY
Here is the party line, perhaps:

by(data, data$TYPE, function(dat) with(dat, weighted.mean(MEASURE, COUNT)))

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Schumacher
Sent: Thursday, 24 February 2011 9:40 AM
To: r-help@r-project.org
Subject: Re: [R] Weighted Mean By Factor Using BY

I withdraw this question; I was able to accomplish this by creating a new function. Now if only I could get the by output into a dataframe...

Mike

On Wed, Feb 23, 2011 at 2:25 PM, Mike Schumacher mike.schumac...@gmail.com wrote:

Hello R folks,

Reproducible code below - I'm trying to do a weighted mean by a factor and can't figure it out. Thanks in advance for your assistance.

Mike

data <- data.frame(c(5, 5, 1, 1, 1), c(10, 8, 9, 5, 3), c("A", "A", "A", "B", "B"))
names(data) <- c("COUNT", "MEASURE", "TYPE")

## UNWEIGHTED MEAN BY TYPE
by(data$MEASURE, data$TYPE, mean)

## WEIGHTED MEAN WITHOUT TYPE
weighted.mean(x = data$MEASURE, w = data$COUNT)

## WEIGHTED MEAN BY TYPE - DOESN'T WORK
by(data$MEASURE, data$TYPE, weighted.mean(x = data$MEASURE, w = data$COUNT))

--
Michael Schumacher
Manager, Data and Analytics
ValueClick
mike.schumac...@gmail.com

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
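As for the follow-up ("if only I could get the by output into a dataframe"), one sketch (not from the original thread) is to unlist the `by` result and rebuild a data frame from its names:

```r
data <- data.frame(COUNT   = c(5, 5, 1, 1, 1),
                   MEASURE = c(10, 8, 9, 5, 3),
                   TYPE    = c("A", "A", "A", "B", "B"))

res <- by(data, data$TYPE, function(dat) with(dat, weighted.mean(MEASURE, COUNT)))

## the by result is named by TYPE, so it flattens cleanly
data.frame(TYPE = names(res), wmean = unlist(res), row.names = NULL)
##   TYPE wmean
## 1    A     9
## 2    B     4
```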
Re: [R] problem with rbind when data frame contains an date-time variable POSIXt POSIXlt
The solution is probably to make the date-time columns POSIXct: x <- read.table(textConnection(" + ID event.date.time + 1 '2009-07-23 00:20:00' + 2 '2009-08-18 16:25:00' + 3 '2009-08-13 08:30:00' + "), header = TRUE) y <- read.table(textConnection(" + ID event.date.time + 4 '2009-08-25 10:25:00' + 5 '2009-08-10 06:20:00' + 6 '2009-10-09 08:20:00' + "), header = TRUE) closeAllConnections() x <- within(x, + event.date.time <- as.POSIXct(as.character(event.date.time), + format = "%Y-%m-%d %H:%M:%S")) y <- within(y, + event.date.time <- as.POSIXct(as.character(event.date.time), + format = "%Y-%m-%d %H:%M:%S")) z <- rbind(x, y) z ID event.date.time 1 1 2009-07-23 00:20:00 2 2 2009-08-18 16:25:00 3 3 2009-08-13 08:30:00 4 4 2009-08-25 10:25:00 5 5 2009-08-10 06:20:00 6 6 2009-10-09 08:20:00 No problems. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Andrew Anglemyer Sent: Friday, 18 February 2011 3:54 PM To: r-help@r-project.org Subject: [R] problem with rbind when data frame contains an date-time variable POSIXt POSIXlt I'm trying to rbind two data frames, both with the same column names. One of the columns is a variable with date-time, and this variable is causing the rbind to fail--giving the error Error in names(value[[jj]])[ri] <- nm : 'names' attribute [7568] must be the same length as the vector [9] Is there a way to stack or rbind these two data frames even with this extended date-time variable? The class of event.date.time in each data frame is POSIXt POSIXlt. 
x ID event.date.time 1 2009-07-23 00:20:00 2 2009-08-18 16:25:00 3 2009-08-13 08:30:00 y ID event.date.time 4 2009-08-25 10:25:00 5 2009-08-10 06:20:00 6 2009-10-09 08:20:00 I would like to get z ID event.date.time 1 2009-07-23 00:20:00 2 2009-08-18 16:25:00 3 2009-08-13 08:30:00 4 2009-08-25 10:25:00 5 2009-08-10 06:20:00 6 2009-10-09 08:20:00 I've looked at stripping the dates and times, but it would be really helpful for my purposes to keep the extended variable date-time variable (have to eventually get 24 hours from baseline.date.time). thanks for any and all help! Andy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
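A brief sketch of why POSIXlt trips rbind up while POSIXct does not (my illustration, not from the thread):

```r
## POSIXlt stores a date-time as a *list* of components (sec, min, hour,
## mday, mon, year, ...), so a POSIXlt column is effectively a list column,
## and rbind() on data frames cannot align it row-wise -- hence the
## 'names' attribute error above.
t1 <- as.POSIXlt("2009-07-23 00:20:00")
unclass(t1)$hour   # one of the list components; here 0

## POSIXct is a single number (seconds since 1970-01-01 UTC), which
## rbind() handles like any other atomic column:
t2 <- as.POSIXct("2009-07-23 00:20:00", tz = "UTC")
unclass(t2)
```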
Re: [R] Saturated model in binomial glm
This is a very good question. You have spotted something that not many people see, and it is important. The bland assertion that the deviance can be used as a test of fit can be seriously misleading. For this data the response is clearly binary, Admitted (success) or Rejected (failure), and the other two factors are explanatory variables. Any binomial model can be fitted by expressing the data as a binary response. In this case there are sum(UCBAd$Freq) [1] 4526 4526 trials, corresponding to the individual applicants for admission. We can expand the data frame right out to this level and fit the model with the data in that form, and in this case the weights will be the default, i.e. all 1's. We can also *group* the data into subsets which are homogeneous with respect to the explanatory variables. The most succinct grouping would be into 12 groups corresponding to the 2 x 6 distinct classes defined by the two explanatory variables. In this case you would specify the response either as a two-column matrix of successes/failures, or as a proportion with the totals for each of the 12 cases as weights. Another succinct grouping is into 2 x 2 x 6 classes, as you do in your example. In this case your response is the factor and the weights are the frequencies. For all three cases a) the estimates will be the same, and so the predictions will be identical, b) the deviance will also be the same, but c) the degrees of freedom attributed to the deviance will be different. The reason for c) is, as you have intuited, that the saturated model is different. Literally, the saturated model is a model with one mean parameter for each value taken as an observation when the model is fitted. So the saturated model is *not* invariant with respect to grouping. 
Let's look at two of these cases computationally: UCB_Expanded <- UCBAd[rep(1:24, UCBAd$Freq), 1:3] ## expand the data frame dim(UCB_Expanded) [1] 4526 3 ### now fit your model m1 <- glm(Admit ~ Gender + Dept, binomial, UCBAd, weights = Freq) ### and the same model using the binary data form m2 <- glm(Admit ~ Gender + Dept, binomial, UCB_Expanded) ### as predicted, the coefficients are identical (up to round off) coef(m1) (Intercept) GenderFemale DeptB DeptC DeptD DeptE -0.58205140 -0.09987009 0.04339793 1.26259802 1.29460647 1.73930574 DeptF 3.30648006 coef(m2) (Intercept) GenderFemale DeptB DeptC DeptD DeptE -0.58205140 -0.09987009 0.04339793 1.26259802 1.29460647 1.73930574 DeptF 3.30648006 ### and so are the deviances, perhaps surprisingly: deviance(m1) [1] 5187.488 deviance(m2) [1] 5187.488 ### but the degrees of freedom attributed to the deviance are different! m1$df.resid [1] 17 m2$df.resid [1] 4519 If you were to fit the model in the most succinct form, with 12 relative frequencies, then you would get the same deviance again, but the degrees of freedom would be only 5. So you need to be very careful in taking the deviance, even in binomial models, as a test of fit. The way the data are grouped is relevant. If you have two fixed models, e.g. Admit ~ Gender and Admit ~ Gender + Dept, then the estimated coefficients and their standard errors (and vcov matrix), the deviance, and hence *differences* in deviances and *differences* in degrees of freedom, will be the same however the data are grouped, and so the usual tests and CI processes go ahead fine. But the deviance itself can be misleading as a test of fit, since the outer hypothesis, the saturated model, is not fixed and depends on grouping. For the ungrouped binary case it is *usually* misleading when taken simply at face value as chi-squared distributed under the null hypothesis. I think there is a book or two around that discusses this issue, but probably not well enough. Bill Venables. 
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Giovanni Petris Sent: Thursday, 17 February 2011 7:58 AM To: r-help@r-project.org Subject: [R] Saturated model in binomial glm Hi all, Could somebody be so kind as to explain to me what is the saturated model on which deviance and degrees of freedom are calculated when fitting a binomial glm? Everything makes sense if I fit the model using as response a vector of proportions or a two-column matrix. But when the response is a factor and counts are specified via the weights argument, I am kind of lost as far as what the implied saturated model is. Here is a simple example, based on the UCBAdmissions data. UCBAd <- as.data.frame(UCBAdmissions) UCBAd <- glm(Admit ~ Gender + Dept, family = binomial, + weights = Freq, data = UCBAd) UCBAd$deviance [1] 5187.488 UCBAd$df.residual [1] 17 I can see that the data frame UCBAd has 24 rows and using 1+1+5 parameters to fit the model leaves me with 17 degrees of freedom.
Re: [R] How to get warning about implicit factor to integer coercion?
Your complaint is based on what you think a factor should be rather than what it actually is and how it works. The trick with R (BTW I think it's version 2.12.x rather than 12.x at this stage...) is learning to work *with* it as it is rather than making it work the way you would like it to. Factors are a bit tricky. They are numeric objects, even if arithmetic is inhibited. f <- factor(letters) is.numeric(f) ## this looks strange [1] FALSE mode(f) ## but at a lower level [1] numeric Take a simple example. x <- structure(1:26, names = sample(letters)) x h o u w l z a j e n k i s v t g f x c b y d m q p r 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 If you use the factor f as an index, it behaves as a numeric vector of indices: x[f] h o u w l z a j e n k i s v t g f x c b y d m q p r 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 That's just the way it is. That's the reality. You have to learn to deal with it. Sometimes a factor behaves as a character string vector, when no other interpretation would make sense, e.g. which(f == "o") [1] 15 but in other cases they do not. In this case you can make the coercion explicit, of course, if that is your bent: which(as.character(f) == "o") [1] 15 but here there is no need. There are cases where you *do* need to make an explicit coercion, though, if you want it to behave as a character string vector, and indexing is one: x[as.character(f)] a b c d e f g h i j k l m n o p q r s t u v w x y z 7 20 19 22 9 17 16 1 12 8 11 5 23 10 2 25 24 26 13 15 3 14 4 18 21 6 If you want factors to behave universally as character string vectors, the solution is not to use factors at all but to use character string vectors instead. You can get away with a surprising amount this way, e.g. character string vectors used in model formulae are silently coerced to factors anyway. 
What you need to learn is how to read in data frames keeping the character string columns as is and stop them from being made into factors at that stage. That is a lesson for another day... Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of WB Kloke Sent: Monday, 14 February 2011 8:31 PM To: r-h...@stat.math.ethz.ch Subject: [R] How to get warning about implicit factor to integer coercion? Is there a way in R (12.x) to avoid the implicit coercion of factors to integers in the context of subscripts? If this is not possible, is there a way to get at least a warning, if any coercion of this type happens, given that the action of this coercion is almost never what is wanted? Of course, in the rare case that as.integer() is applied explicitly onto a factor, the warning is not needed, but certainly not as disastrous as in the other case. Probably, in a future version of R, an option similar to that for partial matches of subscripts might be useful to change the default from maximal unsafety to safe use. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
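A hedged sketch of that "lesson for another day" — keeping character columns as characters on input (note that from R 4.0.0 this is the default behaviour):

```r
## Read a small CSV; stringsAsFactors = FALSE keeps 'grp' as character
## rather than converting it to a factor on input.
tc <- textConnection("id,grp
1,a
2,b
3,a")
d <- read.csv(tc, stringsAsFactors = FALSE)
close(tc)
class(d$grp)   # "character", not "factor"
```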
Re: [R] Predictions with missing inputs
With R it is always possible to shoot yourself squarely in the foot, as you seem keen to do, but R does at least often make it difficult. When you predict, you need to have values for ALL variables used in the model. Just leaving out the coefficients corresponding to absent predictors is equivalent to assuming that those coefficients are zero, and there is no basis whatever for so assuming. (In this constructed example things are different because the missing variable is a nonsense variable and the coefficient should be roughly zero, as it is, but in general that is not going to be the case.) So you need to supply some value for each of the missing predictors if you are going to use the standard prediction tools. An obvious plug is the mean of that variable in the training data, though more sophisticated alternatives would often be available. Here is a suggestion for your case. ## fit some linear model to random data x <- matrix(rnorm(100*3), 100, 3) y <- sample(1:2, 100, replace = TRUE) mydata <- data.frame(y, x) library(splines) ## missing from your code. mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata) summary(mymodel) ## create new data with 1 missing input mynewdata <- within(data.frame(matrix(rnorm(100*2), 100, 2)), ## add in an X3 X3 <- mean(mydata$X3)) mypred <- predict(mymodel, mynewdata) From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Axel Urbiz [axel.ur...@gmail.com] Sent: 12 February 2011 11:51 To: R-help@r-project.org Subject: [R] Predictions with missing inputs Dear users, I'll appreciate your help with this (hopefully) simple problem. I have a model object which was fitted to inputs X1, X2, X3. Now, I'd like to use this object to make predictions on a new data set where only X1 and X2 are available (just use the estimated coefficients for these variables in making predictions and ignore the coefficient on X3). Here's my attempt but, of course, it didn't work. 
#fit some linear model to random data x=matrix(rnorm(100*3),100,3) y=sample(1:2,100,replace=TRUE) mydata - data.frame(y,x) mymodel - lm(y ~ ns(X1, df=3) + X2 + X3, data=mydata) summary(mymodel) #create new data with 1 missing input mynewdata - data.frame(matrix(rnorm(100*2),100,2)) mypred - predict(mymodel, mynewdata) Thanks in advance for your help! Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if a variable is defined
!is.null(my.obj...@my.data.frame$my.var) -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kushan Thakkar Sent: Friday, 11 February 2011 10:30 AM To: r-help@r-project.org Subject: [R] if a variable is defined I have an object type my.object. One of its slots contains a data frame (i.e. my.obj...@my.data.frame) .. I want to check if one of the variables exists in this data frame (i.e. my.obj...@my.data.frame$my.var) I am trying to use the exists function but can't figure out how the arguments work. Please help. So far I have tried exists(my.obj...@my.data.frame$my.var) exists(my.obj...@my.data.frame$my.var) exists(my.var,where=my.obj...@my.data.frame) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
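A minimal sketch of the idiom with an ordinary data frame standing in for the slot (names hypothetical):

```r
df <- data.frame(a = 1:3)   # stand-in for my.object@my.data.frame

!is.null(df$a)       # TRUE  -- column present
!is.null(df$zzz)     # FALSE -- column absent ($ on a missing name gives NULL)

## An exact-name alternative, immune to partial matching of $ on lists:
"a" %in% names(df)   # TRUE
```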
Re: [R] Getting p-value from summary output
Hi Alice, You can use pvals <- summary(myprobit)$coefficients[, "Pr(>|z|)"] Notice that if the p-value is very small, the printed version is abbreviated, but the object itself has full precision (not that it matters). Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Allie818 Sent: Friday, 11 February 2011 9:46 AM To: r-help@r-project.org Subject: [R] Getting p-value from summary output I can get this summary of a model that I am running: summary(myprobit) Call: glm(formula = Response_Slot ~ trial_no, family = binomial(link = probit), data = neg_data, na.action = na.pass) Deviance Residuals: Min 1Q Median 3Q Max -0.9528 -0.8934 -0.8418 1.4420 1.6026 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.340528 0.371201 -0.917 0.359 trial_no -0.005032 0.012809 -0.393 0.694 (Dispersion parameter for binomial family taken to be 1) Null deviance: 62.687 on 49 degrees of freedom Residual deviance: 62.530 on 48 degrees of freedom AIC: 66.53 Number of Fisher Scoring iterations: 4 But I would like to get the p-value [column heading Pr(>|z|)] for the estimate. I can get the coefficient estimates with myprobit$coefficients. Is there something similar to get the p-value? Thank you in advance. -- View this message in context: http://r.789695.n4.nabble.com/Getting-p-value-from-summary-output-tp3300503p3300503.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
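An equivalent hedged sketch using the accessor `coef()` on the summary object, with a small made-up probit fit (data and names are mine, not from the thread):

```r
set.seed(1)
d <- data.frame(y = rbinom(50, 1, 0.5), x = rnorm(50))
fit <- glm(y ~ x, family = binomial(link = "probit"), data = d)

## coef() on a summary.glm returns the printed coefficient matrix,
## whose fourth column holds the p-values:
ctab  <- coef(summary(fit))
pvals <- ctab[, "Pr(>|z|)"]   # named numeric vector, full precision
pvals["x"]
```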
Re: [R] factor.scores
The function factor.scores does not inherit anything. It is a generic function that provides methods for a number of classes, including those you mention. (The terminology is important if you are to understand what is going on here): library(ltm) Loading required package: MASS Loading required package: msm Loading required package: mvtnorm Loading required package: polycor Loading required package: sfsmisc This is package 'ltm' version '0.9-5' find("factor.scores") [1] "package:ltm" factor.scores function (object, ...) { UseMethod("factor.scores") } <environment: namespace:ltm> methods(factor.scores) [1] factor.scores.gpcm factor.scores.grm factor.scores.ltm factor.scores.rasch [5] factor.scores.tpm So what you have to do, if you want factor.scores() to work on objects that you create, is two things: 1. Give your objects an S3 class attribute, say "bifactor", and 2. Write your own S3 method function factor.scores.bifactor <- function(object, z, y, x, ...) { } to perform the task. The name of the function has to be factor.scores.bifactor and the name of the first formal argument has to be object, and you must have a ... in the argument list, but from that point on you have a pretty free hand. What you probably want to do is use some of the existing method functions to do the hard work once you have munged your object into a form that makes that possible. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of James Lewis Sent: Thursday, 10 February 2011 12:09 PM To: r-help@R-project.org Subject: [R] factor.scores The function factor.scores is used with package ltm and others to estimate IRT type scores for various models. It inherits objects of class grm, gpcm and a few others. What I would like to do is to use the factor.scores function, but feed it my own item parameters (from a bifactor model where the 2PL parameters are adjusted for the bifactor structure). Does anybody have an idea of how this might be done? 
I can, of course, create a list, matrix or other object containing the item parameters, but factor.scores only inherits particular object classes. Any thoughts would be great. Thanks, Jimmy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
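A skeleton of the two steps Bill describes (the class name "bifactor", the slot name `ipar`, and the body are all hypothetical, to be filled in per problem):

```r
## 2. The S3 method: first argument must be `object`, and `...` must appear.
factor.scores.bifactor <- function(object, ...) {
    ## Munge `object` (the adjusted 2PL parameters) into the shape one of
    ## the existing ltm fits has, then delegate the hard work to an
    ## existing method such as ltm's factor.scores.ltm().
    stop("body is problem-specific; see the discussion above")
}

## 1. The class attribute on your own object:
myfit <- structure(list(ipar = matrix(0, 5, 2)), class = "bifactor")

## factor.scores(myfit) would now dispatch to factor.scores.bifactor().
class(myfit)   # "bifactor"
```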
Re: [R] merge multiple .csv files
You want advice? 1. Write sentences that contain a subject and, where appropriate, an object as well. This makes your email just that bit more polite. This list is not a paid service. 2. The sheets may have variables in common, but do they have the same name in both, and the same class, and some values in common? You do not give us very much to go on. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Benjamin Caldwell Sent: Thursday, 10 February 2011 1:22 PM To: r-help Subject: [R] merge multiple .csv files I am trying to merge about 15 .csv tables - I tried a test run but came up with 0 rows (no data for each variable/column head) CAHSEE.EA.feb.2009 <- read.csv("2009 CAHSEE EA feb 2009.csv", header=TRUE) CAHSEE.IM.MATH.2009 <- read.csv("2009 CAHSEE Impact Math.csv", header=TRUE) testmerge <- merge(CAHSEE.EA.feb.2009, CAHSEE.IM.MATH.2009) testmerge [1] Grade LocalStudentID MathPassed MathScaleScore SchoolCode [6] LastName FirstName ELAPassed ELAScaleScore MathTestDate 0 rows (or 0-length row.names) I know several variables are shared in both sheets. Please advise. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
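A hedged sketch of the likely failure mode (my example; column names modelled loosely on the post): `merge()` joins on the *intersection* of column names by default, so if the shared columns have no values in common — or share names that mean different things — the result has 0 rows. Being explicit about the key avoids the surprise:

```r
a <- data.frame(LocalStudentID = 1:3, MathScaleScore = c(350, 380, 410))
b <- data.frame(LocalStudentID = 2:4, ELAScaleScore  = c(300, 320, 340))

merge(a, b, by = "LocalStudentID")              # inner join: only IDs 2 and 3
merge(a, b, by = "LocalStudentID", all = TRUE)  # full outer join: IDs 1-4, NAs filled in
```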
Re: [R] Newb Prediction Question using stepAIC and predict(), is R wrong?
Using complex names, like res[, 3+i] or res$var, in the formula for a model is a very bad idea, especially if you eventually want to predict to new data. (In fact it won't work, so that makes it very bad indeed.) So do not use '$' or '[..]' terms in model formulae - this is going to cause problems when it comes to predict, because the names in the model's formula will not match the names in the new data frame. When you think about it, this is obvious. In your case you will have to identify the actual names and build the formula that way. So your model will be fitted with a call something like fm <- lm(paid ~ x3i + xi + Sun + Fri + Sat, data = reservesub) (but you will have to use the real names for the first two, of course). If you are doing this in some kind of loop, there are ways to handle it without using terms such as reservesub[, 3+i], but they are not all that simple. Still, if you want to predict from the model to new data, there is no way round it. Interactions are generally included with the * or the / linear model operators. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of BSanders Sent: Thursday, 10 February 2011 2:49 PM To: r-help@r-project.org Subject: [R] Newb Prediction Question using stepAIC and predict(), is R wrong? I'm using stepAIC to fit a model. Then I'm trying to use that model to predict future happenings. My first few variables are labeled as their column. (Is this a problem?) The dataframe that I use to build the model is the same as the data I'm using to predict with. Here is a portion of what is happening.. This is the value it is predicting = [1] 9.482975 Summary of the model Call: lm(formula = reservesub$paid ~ reservesub[, 3 + i] + reservesub$grads[, i] + reservesub$Sun + reservesub$Fri + reservesub$Sat) Residuals: Min 1Q Median 3Q Max -15.447 -4.993 -1.090 3.910 27.454 Coefficients: Estimate Std. 
Error t value Pr(>|t|) (Intercept) 5.71370 1.46449 3.902 0.000149 *** reservesub[, 3 + i] 1.00868 0.01643 61.391 < 2e-16 *** reservesub$grads[, i] 0.44649 0.12131 3.681 0.000333 *** reservesub$Sun 8.63606 1.95100 4.426 1.93e-05 *** reservesub$Fri 3.76928 2.00079 1.884 0.061682 . reservesub$Sat 4.03103 2.12754 1.895 0.060225 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 7.842 on 138 degrees of freedom (131 observations deleted due to missingness) Multiple R-squared: 0.9794, Adjusted R-squared: 0.9787 F-statistic: 1312 on 5 and 138 DF, p-value: < 2.2e-16 Here is the data that is being fed into predicted[p] = predict(stepsaicguess[[p]], newdata = reservesubpred[p,]) V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 paid Mon Tue Wed Thu 276 10/3/2010 155 84 76 68 64 63 53 42 42 42 42 38 38 38 35 31 31 NA 84 0 0 0 0 Fri Sat Sun grads.1 grads.2 grads.3 grads.4 grads.5 grads.6 grads.7 0 0 1 8 4 1 10 11 0 0 grads.8 grads.9 grads.10 grads.11 grads.12 grads.13 grads.14 0 4 0 0 3 4 0 In this case, i = 1, so I calculate the predicted value should be 5.7137 + 1.00868*84 + 0.44649*8 + 1*8.636 + 0*3.769 + 0*4.03 = 102. But R is giving me 9.482975 for a predicted value .. (which, interestingly, is 5.7137 + 3.769*1, i.e. Intercept + Sat). Another question I have is, if I were to include interactions in this model, would I have to make those variables in my prediction dataframe, or would R 'know' what to do? Thanks in advance for your expert assistance. -- View this message in context: http://r.789695.n4.nabble.com/Newb-Prediction-Question-using-stepAIC-and-predict-is-R-wrong-tp3298569p3298569.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
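One hedged way to build the formula from plain column names inside a loop, so that predict() can find the same names in newdata (the variable names here are hypothetical stand-ins for the real ones):

```r
## reformulate() builds `response ~ term1 + term2 + ...` from strings:
vars <- c("x3i", "gradsi", "Sun", "Fri", "Sat")
f <- reformulate(vars, response = "paid")
f   # paid ~ x3i + gradsi + Sun + Fri + Sat
```

The fitted model then carries ordinary column names, so `predict(fm, newdata)` matches them against `newdata` directly.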
Re: [R] leap year and order function
yearLength <- function(year) 365 + (year %% 4 == 0) yearLength(1948:2010) [1] 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 [22] 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 [43] 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bobby Lee Sent: Monday, 31 January 2011 2:23 PM To: R-help@r-project.org Subject: [R] leap year and order function I'm trying to write a for loop so that every leap year the number of days becomes 366 instead of 365. Could someone help me out? Also, this set of data has 99.99s. I set all the 99.99 values to NA; however, when I'm using the order function to find the max value of the year, it still reads 99.99 as the max value. Thank you very much maxday <- matrix(NA, 63, 5) for (a in 1948:2010){ maxday[,1] <- 1948:2010 yearly <- na.omit(dat.mat[dat.mat[,1]==a,]) maxday[a-1947,2] <- yearly[order(yearly[,4])[365],2] maxday[a-1947,3] <- yearly[order(yearly[,4])[365],3] maxday[63,2] <- yearly[order(yearly[,4])[127],2] maxday[63,3] <- yearly[order(yearly[,4])[127],3] maxday[a-1947,4] <- max(yearly[,4]) maxday[,5] <- len[,2] } [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
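A caveat worth noting (my addition): the one-liner above is exactly right for 1948-2010, where every fourth year is a leap year, but the full Gregorian rule also excludes most century years. A sketch of the general version:

```r
## Gregorian rule: divisible by 4, except centuries, except multiples of 400.
isLeap <- function(year)
    (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0

yearLength2 <- function(year) 365 + isLeap(year)

yearLength2(c(1900, 2000, 2004, 2010))   # 365 366 366 365
```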
Re: [R] Unexpected Gap in simple line plot
You do have missing values. Setting xlim does not subset the data. How about link <- "http://processtrends.com/files/RClimate_CTS_latest.csv" cts <- read.csv(link, header = TRUE) scts <- subset(cts, !is.na(GISS) & !is.na(yr_frac)) ## remove defectives plot(GISS ~ yr_frac, scts, type = "l", xlim = c(1982, 1983), xaxs = "i", yaxs = "i") points(GISS ~ yr_frac, scts, type = "p", col = "red") ? Bill Venables -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of D Kelly O'Day Sent: Friday, 21 January 2011 11:12 AM To: r-help@r-project.org Subject: [R] Unexpected Gap in simple line plot I am getting an unexpected gap in a simple plot of a monthly data series. I have created a csv file of monthly climate observations that I store on-line. When I download the csv file and plot one of the series, I get a gap even though there is data for the missing point. Here is a snippet to show the problem. ## Strange plot results link <- "http://processtrends.com/files/RClimate_CTS_latest.csv" cts <- read.csv(link, header=TRUE) ## Simple line plot - gap for no reason plot(cts$yr_frac, cts$GISS, type="l", xlim=c(1982, 1983), xaxs="i", yaxs="i") ## May, 1982 observation missing ## Add points to plot in red, to see if May shows up points(cts$yr_frac, cts$GISS, type="p", col="red") ## yes, May shows up in points ## Look at cts data.frame. No obvious problems to me?? small <- subset(cts, yr_frac > 1982 & yr_frac < 1983) small[,c(1,3)] The same problem occurs in the other data vectors (HAD, NOAA, RSS, UAH, etc.). I have not been able to figure out why I am getting the gap and how to show a continuous line through the May, 1982 data points. Any suggestions? D Kelly O'Day http://chartsgraphs.worpress.com http://chartsgraphs.wordprss.com -- View this message in context: http://r.789695.n4.nabble.com/Unexpected-Gap-in-simple-line-plot-tp3228853p3228853.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Log difference in a dataframe column
lag and as.ts are separate operations (which in fact commute) lag(as.ts(1:10), 1) Time Series: Start = 0 End = 9 Frequency = 1 [1] 1 2 3 4 5 6 7 8 9 10 as.ts(lag(1:10, 1)) Time Series: Start = 0 End = 9 Frequency = 1 [1] 1 2 3 4 5 6 7 8 9 10 You do NOT need to call as.ts at all if that's your bent: lag(1:10, 1) [1] 1 2 3 4 5 6 7 8 9 10 attr(,"tsp") [1] 0 9 1 but as lag is an operation designed for time series objects, it's a pretty good bet that to make any sense of the result you need to start (or end up) with a time series object. The key is the concept of an 'object' in R. You need to know what kind of objects you are dealing with. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of eric Sent: Wednesday, 19 January 2011 1:08 PM To: r-help@r-project.org Subject: Re: [R] Log difference in a dataframe column Just learning, so excuse me if I'm being too basic here. But I'm wondering how I should know that as.ts would be needed for lag? Is there a thought process or way to inspect that I should have gone through to know that log would work on y[,5] but lag would not work on [,5]? Is the general rule that lag is not in the base package and therefore would not work? Thanks for the comments -- View this message in context: http://r.789695.n4.nabble.com/Log-difference-in-a-dataframe-column-tp3221225p3224478.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot continuous data vs clock time
plot(y ~ x, type = "p", xlim = x[c(2,4)]) ? -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of wangxipei Sent: Tuesday, 18 January 2011 1:27 PM To: r-help Subject: [R] plot continuous data vs clock time Dear R users, I have a question about plotting clock time; an example is below: y <- seq(from=1, to=30, by=5) x <- c("0:01", "1:20", "8:40", "9:25", "15:30", "21:23") x <- as.POSIXct(strptime(paste(x), "%H:%M")) plot(y ~ x, type="p") I got the plot, but if I want to plot the x range from 1:20 to 9:25, how can I set the xlim argument? Any suggestions about clock time plotting would be appreciated. Xipei Wang [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data prep question
Here is one way:

con <- textConnection("
ID TIME OBS
001 2200 23
001 2400 11
001 3200 10
001 4500 22
003 3900 45
003 5605 32
005 1800 56
005 1900 34
005 2300 23")
dat <- read.table(con, header = TRUE,
                  colClasses = c("factor", "numeric", "numeric"))
closeAllConnections()
tmp <- lapply(split(dat, dat$ID),
              function(x) within(x, TIME <- TIME - min(TIME)))
split(dat, dat$ID) <- tmp
dat
   ID TIME OBS
1 001    0  23
2 001  200  11
3 001 1000  10
4 001 2300  22
5 003    0  45
6 003 1705  32
7 005    0  56
8 005  100  34
9 005  500  23

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Matthew Strother [rstro...@gmail.com]
Sent: 16 January 2011 07:26
To: r-help@r-project.org
Subject: [R] data prep question

I have a data set with several thousand observations across time, grouped by subject (example format below):

ID  TIME  OBS
001 2200  23
001 2400  11
001 3200  10
001 4500  22
003 3900  45
003 5605  32
005 1800  56
005 1900  34
005 2300  23
...

I would like to identify the first time for each subject, and then subtract this value from each subsequent time. However, the number of observations per subject varies widely (from 1 to 20), and the intervals between times vary widely. Is there a package that can help do this, or a loop that can be set up to evaluate ID, then calculate the values? The outcome I would like is presented below.

ID  TIME  OBS
001    0  23
001  200  11
001 1000  10
001 2300  22
003    0  45
003 1705  32
005    0  56
005  100  34
005  500  23
...

Any help appreciated.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
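The same rebasing can also be written as a one-liner with ave(), which applies a function within groups and recycles the group results back to full length (a sketch on a cut-down copy of the example data):

```r
## Cut-down copy of the thread's example data
dat <- data.frame(ID   = factor(c("001", "001", "003", "005", "005")),
                  TIME = c(2200, 2400, 3900, 1800, 1900),
                  OBS  = c(23, 11, 45, 56, 34))

## subtract each subject's first (minimum) time within its own group
dat$TIME <- dat$TIME - ave(dat$TIME, dat$ID, FUN = min)
```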
Re: [R] Rounding variables in a data frame
If you can specify the omitted columns as numbers there is a quick way to do it, e.g.

d
          d1       d2        d3        d4
1   9.586524 4.833417 0.8142588 -3.237877
2  11.481521 6.536360 2.3361894 -4.042314
3  10.243192 5.506440 2.0443788 -3.478543
4   9.969548 6.159666 3.0449121 -4.827746
5   9.193832 4.085796 3.0176759 -2.199658
6  10.821571 5.442870 1.2630331 -3.483939
7   9.288714 5.756906 1.2432079 -4.387761
8   8.258251 7.729755 0.9170335 -5.416554
9  10.371080 6.234921 2.4462774 -4.433582
10  9.378106 5.389561 4.0592613 -3.433946

out <- d
out[-3] <- round(d[-3])
out
   d1 d2        d3 d4
1  10  5 0.8142588 -3
2  11  7 2.3361894 -4
3  10  6 2.0443788 -3
4  10  6 3.0449121 -5
5   9  4 3.0176759 -2
6  11  5 1.2630331 -3
7   9  6 1.2432079 -4
8   8  8 0.9170335 -5
9  10  6 2.4462774 -4
10  9  5 4.0592613 -3

If you want to specify the column names as a character vector, this is almost as easy, e.g.

omit <- c("d1", "d3")
leave <- match(omit, names(d))
out <- d
out[-leave] <- round(d[-leave])
out
          d1 d2        d3 d4
1   9.586524  5 0.8142588 -3
2  11.481521  7 2.3361894 -4
3  10.243192  6 2.0443788 -3
4   9.969548  6 3.0449121 -5
5   9.193832  4 3.0176759 -2
6  10.821571  5 1.2630331 -3
7   9.288714  6 1.2432079 -4
8   8.258251  8 0.9170335 -5
9  10.371080  6 2.4462774 -4
10  9.378106  5 4.0592613 -3

Bill Venables

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Pete B [peter.breckn...@bp.com]
Sent: 15 January 2011 14:53
To: r-help@r-project.org
Subject: [R] Rounding variables in a data frame

Hi All

I am trying to use the round function on some columns of a dataframe while leaving others unchanged. I wish to specify those columns to leave unchanged. My attempt is below - here, I would like the column d3 to be left alone but columns d1, d2 and d4 to be rounded to 0 decimal places. I would welcome any suggestions for a nicer way of doing this.

d1 = rnorm(10,10)
d2 = rnorm(10,6)
d3 = rnorm(10,2)
d4 = rnorm(10,-4)
d = data.frame(d1,d2,d3,d4)
x = NULL
for (i in 1:ncol(d)){
  if (colnames(d)[i] == "d3") {
    x[i] = d[i]
  } else {
    x[i] = round(d[i])
  }
  out = do.call(cbind, x)
}
colnames(out) = colnames(d)

Thanks and regards
Pete

--
View this message in context: http://r.789695.n4.nabble.com/Rounding-variables-in-a-data-frame-tp3218729p3218729.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
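A compact variant of the same idea, selecting the kept columns by name with setdiff() rather than by negative index (a sketch; the column names follow the thread's example):

```r
## Reproducible toy data in the shape of the question's
set.seed(1)
d <- data.frame(d1 = rnorm(10, 10), d2 = rnorm(10, 6),
                d3 = rnorm(10, 2),  d4 = rnorm(10, -4))

omit <- "d3"                       # columns to leave unrounded
keep <- setdiff(names(d), omit)    # everything else gets rounded
d[keep] <- lapply(d[keep], round)
```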
Re: [R] Weighted least squares regression for an exponential decay function
nls in the stats package.

?nls

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Erik Thulin [ethu...@gmail.com]
Sent: 15 January 2011 16:16
To: r-help@r-project.org
Subject: [R] Weighted least squares regression for an exponential decay function

Hello, I have a data set which is best fit by an exponential decay function. I would like to use a nonlinear weighted least squares regression. What function should I be using? Thank you!

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems creating a PNG file for a dendrogram: Error in plot.window(...) : need finite 'xlim' values
I very much doubt your first example works. The value of plot() is NULL, which, if you plot it again, will give the error message you see in your second example. What were you trying to achieve with

p <- plot(hc)
plot(p)  ### this one is trying to plot NULL

? Here is an example (such as you might have given, according to the posting guide):

x <- matrix(rnorm(500*5), 500, 5)
dc <- as.dist(1 - cor(x))
hc <- hclust(dc)
p <- plot(hc)
plot(p)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

Look familiar? This is why:

p
NULL

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Richard Vlasimsky
Sent: Wednesday, 12 January 2011 10:01 AM
To: r-help@r-project.org
Subject: [R] Problems creating a PNG file for a dendrogram: Error in plot.window(...) : need finite 'xlim' values

Has anyone successfully created a PNG file for a dendrogram? I am able to successfully launch and view a dendrogram in Quartz. However, the dendrogram is quite large (too large to read on a computer screen), so I am trying to save it to a file (1000x4000 pixels) for viewing in other apps. However, whenever I try to initiate a PNG device, I get a "need finite 'xlim' values" error. Here is some example code to illustrate my point:

cor.matrix <- cor(mydata, method = "pearson", use = "pairwise.complete.obs");
distance <- as.dist(1.0 - cor.matrix);
hc <- hclust(distance);
p <- plot(hc);
plot(p);  # This works! Plot is generated in quartz no problem.

# Now, try this:
png(filename = "delme.png", width = 4000, height = 1000);
cor.matrix <- cor(mydata, method = "pearson", use = "pairwise.complete.obs");
distance <- as.dist(1.0 - cor.matrix);
hc <- hclust(distance);
p <- plot(hc);
plot(p);
# Error in plot.window(...) : need finite 'xlim' values
# In addition: Warning messages:
# 1: In min(x) : no non-missing arguments to min; returning Inf
# 2: In max(x) : no non-missing arguments to max; returning -Inf
# 3: In min(x) : no non-missing arguments to min; returning Inf
# 4: In max(x) : no non-missing arguments to max; returning -Inf

This is the exact same code, only a prior call to png() causes the seemingly unrelated xlim to fail. Why is this?

Thanks,
Richard Vlasimsky

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
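For completeness, the working pattern the reply implies, as a self-contained sketch (simulated data stand in for the poster's `mydata`; the output path is a temporary file):

```r
## The fix: open the device, draw the plot directly (don't capture
## plot()'s NULL return value), then close the device
set.seed(1)
x  <- matrix(rnorm(50 * 5), 50, 5)
hc <- hclust(as.dist(1 - cor(x)))

f <- tempfile(fileext = ".png")   # illustrative output path
png(filename = f, width = 800, height = 400)
plot(hc)                          # draw while the png device is open
dev.off()                         # closing the device writes the file
```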
Re: [R] Plotting Factors -- Sorting x-axis
That rather depends on what kind of plot you want to use. Here is one option that you can use without any changes:

con <- textConnection("
   Months  Prec
1     Jan 102.1
2     Feb  69.7
3     Mar  44.7
4     Apr  32.1
5     May  24.0
6     Jun  18.7
7     Jul  14.0
8     Aug  20.0
9     Sep  32.4
10    Oct  58.9
11    Nov  94.5
12    Dec 108.2
")
tab <- read.table(con)
closeAllConnections()
rm(con)
with(tab, barplot(Prec, names = as.character(Months)))

If you want to adjust the factor so that the levels remain in natural month order, then

tab <- within(tab, Months <- factor(Months, levels = month.abb))
with(tab, barplot(Prec, names = levels(Months)))

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Taylor, Eric HLS:EX [eric.tay...@gov.bc.ca]
Sent: 07 January 2011 09:13
To: 'r-h...@stat.math.ethz.ch'
Subject: [R] Plotting Factors -- Sorting x-axis

Hello; How do I plot these data in R without the Months being ordered alphabetically?

   Months  Prec
1     Jan 102.1
2     Feb  69.7
3     Mar  44.7
4     Apr  32.1
5     May  24.0
6     Jun  18.7
7     Jul  14.0
8     Aug  20.0
9     Sep  32.4
10    Oct  58.9
11    Nov  94.5
12    Dec 108.2

Eric

Eric Taylor, Air Quality Meteorologist, Health Protection, Ministry of Health Services

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
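The levels trick in isolation, as a tiny sketch: a factor keeps whatever level order you give it, so supplying `month.abb` preserves calendar order rather than the default alphabetical sort.

```r
## Without explicit levels these would sort as Dec, Feb, Jan, Mar
m <- factor(c("Mar", "Jan", "Feb", "Dec"), levels = month.abb)
sort(m)  # calendar order: Jan, Feb, Mar, Dec
```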
Re: [R] anova vs aov commands for anova with repeated measures
lm() and aov() are not fully equivalent. They both fit linear models, but they use different algorithms, and this allows aov, for example, to handle some simple multistratum models. The algorithm used by lm does not allow this, but it has other advantages for simpler models. If you want to fit a multistratum model, such as a repeated measures model, you need to use aov.

When it comes to finding the residuals, you have to be explicit about which residuals you mean, too. You get residuals for each stratum in a multistratum model. Using plain old resid() will not work, as that function can only be used when there is only one kind of residual vector defined. (It could be made to do something sensible, but that's another issue. Right now, it doesn't.)

The function proj() can be used on a fitted model object to obtain the projections, at each stratum, onto the subspaces defined by the terms in the model, and includes the residuals (the projection onto the orthogonal complement of the model space in R^n) as the final column of the matrix of projections. These are easy to dig out and you can analyse away. See ?proj for an example of its use, but you will need to dig out the appropriate column yourself.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Frodo Jedi [frodo.j...@yahoo.com]
Sent: 08 January 2011 01:51
To: r-help@r-project.org
Subject: [R] anova vs aov commands for anova with repeated measures

Dear all, I need to understand something about the behaviour of the two functions aov and anova in the following case involving an analysis of ANOVA with repeated measures. If I use the following command I don't get any problem:

aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
summary(aov1)

Instead if I try to fit the same model for the regression I get an error:

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards. I want to use

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

because I want to analyse the residuals in order to check normality, and see if the anova assumption of normality still holds. Could you please help me understand how to do this?

Thanks in advance
Best regards

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
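A minimal sketch of the proj() route described above, on simulated data (the variable names echo the poster's, but the data and the simplified one-factor Error term are invented for illustration):

```r
## Invented balanced repeated-measures data: 6 subjects x 2 stimuli x 2 reps
set.seed(42)
scrd <- data.frame(subject  = factor(rep(1:6, each = 4)),
                   stimulus = factor(rep(1:2, times = 12)),
                   response = rnorm(24))

## multistratum fit; stimulus is tested in the subject:stimulus stratum
aov1 <- aov(response ~ stimulus + Error(subject/stimulus), data = scrd)

## proj() returns one projection matrix per stratum; the "Residuals"
## column of the lowest ("Within") stratum holds the within-cell residuals,
## which can then be fed to qqnorm() or shapiro.test() to check normality
pr  <- proj(aov1)
res <- pr[["Within"]][, "Residuals"]
```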
Re: [R] packagename:::functionname vs. importFrom
If you use ::: to access non-exported functions, as Frank confesses he does, then you can't complain if in the next release of the package involved the non-exported objects are missing and things are being done another way entirely. That's the deal.

On the other hand, sometimes package authors do not envisage all the ways their package will be used, and neglecting to export some object is mostly because the author simply did not anticipate that anyone would ever need to use it. But sometimes they do. A common case is when you need to do some operations very efficiently and there are simplifications in the input of which you can take advantage to cut down on the overheads. In that case you usually need the cut-down (non-exported) workhorse rather than the (exported) show-pony front end.

The documentation suggests that if you ever need to use ::: perhaps you should be contacting the package maintainer to have the article in question exported. This makes a lot of sense, but it can also create quite a bit of work for the maintainers if they agree to do it. It's a very grey area, in my experience.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Hadley Wickham
Sent: Tuesday, 4 January 2011 3:06 PM
To: Frank Harrell
Cc: r-help@r-project.org
Subject: Re: [R] packagename:::functionname vs. importFrom

> Correct. I'm doing this because of non-exported functions in other packages, so I need :::

But you really really shouldn't be doing that. Is there a reason that the package authors won't export the functions?

> I'd still appreciate any insight about whether importFrom in NAMESPACE defers package loading so that if the package is not actually used (and is not installed) there will be no problem.

Imported packages need to be installed - but it's the import vs. suggests vs. depends statement in DESCRIPTION that controls this behaviour, not the namespace.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Changing column names
You don't give us much to go on, but some variant of

country <- c("US", "France", "UK", "NewZealand", "Germany", "Austria", "Italy", "Canada")
result <- read.csv("result.csv", header = FALSE)
names(result) <- country

should do what you want.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Vincy Pyne [vincy_p...@yahoo.ca]
Sent: 31 December 2010 16:07
To: r-help@r-project.org
Subject: [R] Changing column names

Dear R helpers

Wish you all a very Happy and Prosperous New Year 2011. I have the following query.

country = c("US", "France", "UK", "NewZealand", "Germany", "Austria", "Italy", "Canada")

Through some other R process, the result.csv file is generated as

result.csv
  var1 var2 var3 var4 var5 var6 var7 var8
1   25   45   29   92  108  105   65   56
2   80  132   83   38   38   11   47   74
3  135   11   74   56   74   74   74   29

I need the country names to be column heads, i.e. I need an output like

result_new
   US France UK NewZealand Germany Austria Italy Canada
1  25     45 29         92     108     105    65     56
2  80    132 83         38      38      11    47     74
3 135     11 74         56      74      74    74     29

The number of countries, i.e. length(country), matches the total number of variables (i.e. number of columns in 'result.csv'). One way of doing this is to use country names as column names while writing the 'result.csv' file:

write.csv(data.frame(US = ..., France = ...), 'result.csv', row.names = FALSE)

However, the problem is I don't know in what order the country names will appear, and also there could be addition or deletion of some country names. Also, if there are say 150 country names, the above way (i.e. writing the csv) of defining the column names is not practical. Basically I want to change the column heads after the 'result.csv' is generated. Kindly guide.

Regards
Vincy

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] access a column of a dataframe without qualifying the name of the column
Here is an alternative approach that is closer to that used by lm and friends.

df <- data.frame(x = 1:10, y = 11:20)
test <- function(col, dat) eval(substitute(col), envir = dat)
test(x, df)
 [1]  1  2  3  4  5  6  7  8  9 10
test(y, df)
 [1] 11 12 13 14 15 16 17 18 19 20

There is a slight added bonus this way:

test(x + y + 1, df)
 [1] 13 15 17 19 21 23 25 27 29 31

(Well, I did say 'slight'.)

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of David Winsemius [dwinsem...@comcast.net]
Sent: 30 December 2010 10:44
To: John Sorkin
Cc: r-help@r-project.org
Subject: Re: [R] access a column of a dataframe without qualifying the name of the column

On Dec 29, 2010, at 7:11 PM, John Sorkin wrote:

> I am trying to write a function that will access a column of a data frame without having to qualify the name of the data frame column, as long as the name of the dataframe is passed to the function. As can be seen from the code below, my function is not working:

Not sure what the verb "qualify" means in programming. Quoting?

> df <- data.frame(x = 1:10, y = 11:20)
> df
> test <- function(column, data) {
>   print(data$column)
> }
> test(x, df)
>
> I am trying to model my function after the way that lm works, where one need not qualify column names, i.e.

df <- data.frame(x = 1:10, y = 11:20)
test <- function(column, dat) {
  print(colname <- deparse(substitute(column)))
  dat[[colname]]
}
test(x, df)
[1] "x"
 [1]  1  2  3  4  5  6  7  8  9 10

-- David.

> fit1 <- lm(y ~ x, data = df)
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> Confidentiality Statement: This email message, including any attachments, is for th...{{dropped: 6}}

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
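A side note, not from the original replies: for interactive use, base R's with() packages the same eval/substitute idiom, so a helper function is often unnecessary.

```r
## with() evaluates an expression in an environment built from the data,
## so columns can be referenced by bare name
df  <- data.frame(x = 1:10, y = 11:20)
res <- with(df, x + y + 1)
```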
Re: [R] filling up holes
Dear 'analyst41' (it would be a courtesy to know who you are),

Here is a low-level way to do it. First create some dummy data:

allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
client_ID <- sample(LETTERS[1:5], 50, rep = TRUE)
value <- 1:50
date <- sample(allDates)
clientData <- data.frame(client_ID, date, value)

At this point clientData has 50 rows, with 5 clients, each with a sample of dates. Everything is in random order except value. Now write a little function to fill out a subset of the data consisting of one client's data only:

fixClient <- function(cData) {
  dateRange <- range(cData$date)
  dates <- seq(dateRange[1], dateRange[2], by = 1)
  fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
                        date = dates, value = NA)
  fullSet$value[match(cData$date, dates)] <- cData$value
  fullSet
}

Now split up the data, apply the fixClient function to each section and re-combine them again:

allData <- do.call(rbind,
                   lapply(split(clientData, clientData$client_ID), fixClient))

Check:

head(allData)
    client_ID       date value
A.1         A 2010-01-04    36
A.2         A 2010-01-05    18
A.3         A 2010-01-06    NA
A.4         A 2010-01-07    NA
A.5         A 2010-01-08    NA
A.6         A 2010-01-09    49

Seems OK. At this point the data are in sorted order by client and date, but that should not matter.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of analys...@hotmail.com
Sent: Wednesday, 29 December 2010 10:45 AM
To: r-help@r-project.org
Subject: [R] filling up holes

I have a data frame with three columns:

client ID | date | value

For each client ID I want to determine the min date and max date, and for any dates in between that are missing I want to insert a row

client ID | date | NA

Any help would be appreciated.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
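An alternative sketch of the same fill-in using merge(): build the full client-by-date grid first and then left-join the observed values onto it, leaving NA where there was no observation (tiny invented data, for illustration only):

```r
## Invented data: client A has a one-day hole, client B has a single date
clientData <- data.frame(client_ID = c("A", "A", "B"),
                         date  = as.Date(c("2010-01-01", "2010-01-03",
                                           "2010-01-02")),
                         value = 1:3)

## full min..max date sequence per client
grid <- do.call(rbind, lapply(split(clientData, clientData$client_ID),
  function(cd) data.frame(client_ID = cd$client_ID[1],
                          date = seq(min(cd$date), max(cd$date), by = 1))))

## left join: rows with no matching observation get value = NA
filled <- merge(grid, clientData, all.x = TRUE)
```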
Re: [R] linear regression for grouped data
library(nlme)
lmList(y ~ x | factor(ID), myData)

This gives a list of fitted model objects.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Entropi ntrp
Sent: Wednesday, 29 December 2010 12:24 PM
To: r-help@r-project.org
Subject: [R] linear regression for grouped data

Hi, I have been examining large data and need to do simple linear regression with the data, which is grouped based on the values of a particular attribute. For instance, consider three columns: ID, x, y, where I need to regress x on y for each distinct value of ID. Specifically, for the set of data corresponding to each of the 4 values of ID (76, 111, 121, 168) in the data below, I should invoke linear regression 4 times. The challenge is that the length of the ID vector is around 2 and therefore linear regression must be done automatically for each distinct value of ID.

 ID     x    y
 76 36476 15.8
 76 36493 66.9
 76 36579 65.6
111 35465 10.3
111 35756  4.8
121 38183 16
121 38184 15
121 38254  9.6
121 38255  7
168 37727 21.9
168 37739 29.7
168 37746 97.4

I was wondering whether there is an easy way to group data based on the values of ID in R so that linear regression can be done easily for each group determined by each value of ID. Or is the only way to construct loops with 'for' or 'while' in which a matrix is generated for each distinct value of ID that stores corresponding values of x and y by screening the entire ID vector?

Thanks in advance,
Yasin

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
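The same per-group fits can be done in base R without nlme, via split() and lapply() (a sketch using the first two IDs from the thread's data):

```r
## First two groups of the question's data
myData <- data.frame(ID = c(76, 76, 76, 111, 111),
                     x  = c(36476, 36493, 36579, 35465, 35756),
                     y  = c(15.8, 66.9, 65.6, 10.3, 4.8))

## one lm() per distinct ID; each element of 'fits' is a fitted model
fits   <- lapply(split(myData, myData$ID), function(d) lm(y ~ x, data = d))
slopes <- sapply(fits, function(f) coef(f)[["x"]])
```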
Re: [R] monthly median in a daily dataset
I find this function useful for digging out months from Date objects:

Month <- function(date, ...)
  factor(month.abb[as.POSIXlt(date)$mon + 1], levels = month.abb)

For the little data set below, this is what it gives:

with(data, tapply(value, Month(date), median, na.rm = TRUE))
  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 0.035 0.030

Here is another useful little one:

Year <- function(date, ...) as.POSIXlt(date)$year + 1900

So if you wanted the median by year and month you could do

with(data, tapply(value, list(Year(date), Month(date)), median, na.rm = TRUE))
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct   Nov  Dec
2008  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 0.035 0.03

(The result is a matrix, which in this case has only one row, of course.)

See how you go.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of HUXTERE
Sent: Monday, 20 December 2010 8:32 AM
To: r-help@r-project.org
Subject: [R] monthly median in a daily dataset

Hello, I have a multi-year dataset (see below) with date, a data value and a flag for the data value. I want to find the monthly median for each month in this dataset and then plot it. If anyone has suggestions they would be greatly appreciated. It should be noted that there are some dates with no values and they should be removed.

Thanks
Emily

print(str(data$flow$daily))
'data.frame':   16071 obs. of  3 variables:
 $ date :Class 'Date'  num [1:16071] -1826 -1825 -1824 -1823 -1822 ...
 $ value: num  NA NA NA NA NA NA NA NA NA NA ...
 $ flag : chr  ...
NULL
520   2008-11-01 0.034
1041  2008-11-02 0.034
1562  2008-11-03 0.034
2083  2008-11-04 0.038
2604  2008-11-05 0.036
3125  2008-11-06 0.035
3646  2008-11-07 0.036
4167  2008-11-08 0.039
4688  2008-11-09 0.039
5209  2008-11-10 0.039
5730  2008-11-11 0.038
6251  2008-11-12 0.039
6772  2008-11-13 0.039
7293  2008-11-14 0.038
7814  2008-11-15 0.037
8335  2008-11-16 0.037
8855  2008-11-17 0.037
9375  2008-11-18 0.037
9895  2008-11-19 0.034 B
10415 2008-11-20 0.034 B
10935 2008-11-21 0.033 B
11455 2008-11-22 0.034 B
11975 2008-11-23 0.034 B
12495 2008-11-24 0.034 B
13016 2008-11-25 0.034 B
13537 2008-11-26 0.033 B
14058 2008-11-27 0.033 B
14579 2008-11-28 0.033 B
15068 2008-11-29 0.034 B
15546 2008-11-30 0.035 B
521   2008-12-01 0.035 B
1042  2008-12-02 0.034 B
1563  2008-12-03 0.033 B
2084  2008-12-04 0.031 B
2605  2008-12-05 0.031 B
3126  2008-12-06 0.031 B
3647  2008-12-07 0.032 B
4168  2008-12-08 0.032 B
4689  2008-12-09 0.032 B
5210  2008-12-10 0.033 B
5731  2008-12-11 0.033 B
6252  2008-12-12 0.032 B
6773  2008-12-13 0.031 B
7294  2008-12-14 0.030 B
7815  2008-12-15 0.030 B
8336  2008-12-16 0.029 B
8856  2008-12-17 0.028 B
9376  2008-12-18 0.028 B
9896  2008-12-19 0.028 B
10416 2008-12-20 0.027 B
10936 2008-12-21 0.027 B
11456 2008-12-22 0.028 B
11976 2008-12-23 0.028 B
12496 2008-12-24 0.029 B
13017 2008-12-25 0.029 B
13538 2008-12-26 0.029 B
14059 2008-12-27 0.030 B
14580 2008-12-28 0.030 B
15069 2008-12-29 0.030 B
15547 2008-12-30 0.031 B
15851 2008-12-31 0.031 B

--
View this message in context: http://r.789695.n4.nabble.com/monthly-median-in-a-daily-dataset-tp3094917p3094917.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
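The same month grouping can also be done with format() instead of a helper function, which keeps year and month together in one key (a sketch on invented data):

```r
## Invented daily data spanning two months
d <- data.frame(date  = as.Date(c("2008-11-01", "2008-11-02", "2008-12-01")),
                value = c(0.034, 0.036, 0.030))

## "%Y-%m" yields keys like "2008-11", so tapply groups by calendar month
med <- tapply(d$value, format(d$date, "%Y-%m"), median, na.rm = TRUE)
```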
Re: [R] Use generalised additive model to plot curve
Dear Lurker,

If all you are trying to do is to plot something, isn't all you need something like the following?

x <- c( 30,  50,  80,  90, 100)
y <- c(160, 180, 250, 450, 300)
sp <- spline(x, y, n = 500)
plot(sp, type = "l", xlab = "x", ylab = "y", las = 1,
     main = "A Spline Interpolation")
points(x, y, pch = 3, col = "red", lwd = 2)

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of e-letter
Sent: Wednesday, 15 December 2010 8:37 AM
To: r-help@r-project.org
Subject: [R] Use generalised additive model to plot curve

Readers,

I have been reading 'the r book' by Crawley and think that the generalised additive model is appropriate for this problem. The package 'gam' was installed using the command (as root):

install.packages("gam")
...
library(gam)
Loading required package: splines
Loading required package: akima
library(mgcv)
This is mgcv 1.3-25
Attaching package: 'mgcv'
The following object(s) are masked from package:gam :
 gam, gam.control, gam.fit, plot.gam, predict.gam, s, summary.gam

x <- c(30,50,80,90,100)
y <- c(160,180,250,450,300)
model <- gam(y ~ s(x))
Error in smooth.construct.tp.smooth.spec(object, data, knots) :
  A term has fewer unique covariate combinations than specified maximum degrees of freedom

The objective is to plot y against x, finally to produce a graph with a smooth curve (and then remove the data points). What is my mistake please?

yours,
r251
gnu/linux mandriva2008

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] LaTeX, MiKTeX, LyX: A Guide for the Perplexed
A new fortune is born? Sharing LaTeX documents with people using word processors only is no more difficult than giving driving directions to someone who is blindfolded and has all 4 limbs tied behind their back. Collaboration with people who insist on using programs that process their words much like a food processor processes food is the one legitimate reason to not use LaTeX (but untying them and removing the blindfold is much better). Gets my vote. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Greg Snow Sent: Wednesday, 8 December 2010 2:24 PM To: Paul Miller; r-help@r-project.org Subject: Re: [R] LaTeX, MiKTeX, LyX: A Guide for the Perplexed See inline below -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Paul Miller Sent: Tuesday, December 07, 2010 4:29 PM To: r-help@r-project.org Subject: [R] LaTeX, MiKTeX, LyX: A Guide for the Perplexed Hello Everyone, Been learning R over the past several months. Read several books and have learned a great deal about data manipulation, statistical analysis, and graphics. Now I want to learn how to make nice looking documents and about literate programming. My understanding is that R users normally do this using LaTeX, MiKTeX, LyX, etc. in conjuction with Sweave. An alternative might be to use the R2wd package to create Word documents. So I guess I have four questions: 1. How do I choose between the various options? Why would someone decide to use LaTeX instead of MiKTeX or vice versa for example? MikTeX is a flavor of LaTeX for windows, so there is not much to decide between them, MikTeX is LaTeX (though there are other implementations on windows) 2. What are the best sources of information about LaTeX, MiKTeX, LyX, etc.? There are many documents about LaTeX, including books. But I would suggest starting with the Not So Short Introduction to LaTeX that is free and comes with MikTeX (and probably the others). 
LyX should have its own docs with it. There are several other free guides that would be good to read after that. If you want a paper book then I would suggest the original by Lamport.

3. What is the learning curve like for each of these? What do you get for the time you put in learning something that is more difficult?

That depends on your background: if you have never worked from the command line or with a markup language, then it is pretty steep; if you have worked with those types of tools then it is not as bad. But either way it is worth it. When I first learned LaTeX I was tempted to go back and rewrite my master's thesis in it instead of the word processor. I did not, but my dissertation was in LaTeX.

4. How do people who use LaTeX, MiKTeX, LyX, etc. share documents with people who are just using Word? How difficult does using LaTeX, MiKTeX, LyX, etc. make it to collaborate on projects with others?

Sharing LaTeX documents with people using word processors only is no more difficult than giving driving directions to someone who is blindfolded and has all 4 limbs tied behind their back. Collaboration with people who insist on using programs that process their words much like a food processor processes food is the one legitimate reason to not use LaTeX (but untying them and removing the blindfold is much better). If you just need basic input or approval then give them a paper version or a pdf file and make the changes yourself. If they are going to be writing major portions or doing a lot of editing, then using LaTeX without all people understanding it will be a headache.

Some other packages to consider are odfWeave (odf is the OpenDocument Format used by the open-source office suite; it can read and write MSWord documents, but still Sweave with R); R2HTML, which works with Sweave where the base document is in HTML; and SWord, a tool from the same group as RExcel that gives the same general idea as Sweave but using MSWord (Windows only).

Thanks, Paul

-- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
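To make the Sweave workflow discussed above concrete, here is a minimal .Rnw document (a sketch; the file name, chunk name, and contents are illustrative). Running Sweave("example.Rnw") in R produces example.tex, which pdflatex then typesets:

```latex
\documentclass{article}
\begin{document}

Sweave weaves R results into \LaTeX:

% An R code chunk; echo=TRUE prints the code as well as its output
<<summary-chunk, echo=TRUE>>=
x <- rnorm(100)
summary(x)
@

% \Sexpr{} inlines a single R value into the prose
The sample mean was \Sexpr{round(mean(x), 2)}.

\end{document}
```

The `<<...>>= ... @` chunk is replaced in the output by the echoed R code and its results, which is what makes the document "literate": text and computation live in one source file.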
Re: [R] Newbie - want to view code for a function
For a substantial calculation like this the algorithms will likely be in C or Fortran. You will need to download the source for the stats package from CRAN (as a tar.gz file), expand it, and look at the source code in the appropriate sub-directories. You can get a bit of a road map in R by

find("ar.yw")
methods(ar.yw)
stats:::ar.yw.default

but you can't see the underlying C or Fortran code, which is where the real action will be. (Of course you might want to read the help information first and follow up on the references there. Only in desperation, of course.)

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dick Knox Sent: Wednesday, 8 December 2010 2:58 PM To: r-help@r-project.org Subject: [R] Newbie - want to view code for a function

(I am) Brand new to R. (I) Want to understand the algorithm used in the Yule-Walker time series autoregression model. I assume there is a way to see the source for ar.yw. I also assume that everybody except me knows how. Could someone suggest to me how to find out? I've looked through some of the documentation - there's a lot - and apparently I haven't looked in the right place. Thanks in advance Dick

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Summing up Non-numeric column
?unique

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zhiji19 Sent: Wednesday, 8 December 2010 2:57 PM To: r-help@r-project.org Subject: [R] Summing up Non-numeric column

Dear All, If I have the following dataset

V1 V2
x  y
y  x
z  b
a  c
b  j
d  l
c  o

how do I use an R command to get the total number of different letters in column V1? Column V1 has 7 different letters. Thank you

-- View this message in context: http://r.789695.n4.nabble.com/Summing-up-Non-numeric-column-tp3077710p3077710.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
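The ?unique pointer above leads directly to the answer; a minimal sketch (the data frame name df is illustrative, re-creating the posted example):

```r
# Re-create the posted two-column dataset
df <- data.frame(V1 = c("x", "y", "z", "a", "b", "d", "c"),
                 V2 = c("y", "x", "b", "c", "j", "l", "o"))

# Count the distinct letters in column V1
length(unique(df$V1))  # 7
```

unique() drops duplicate values; length() then counts what remains.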
Re: [R] Two time measures
I think all you need is ?split

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eduardo de Oliveira Horta Sent: Sunday, 28 November 2010 8:02 AM To: r-help@r-project.org Subject: [R] Two time measures

Hello! I have a csv file of intra-day financial data (5-min closing prices) that looks like this: (obs - the dates are formatted as day/month/year, as is usual here in Brazil)

Date;Time;Close
01/09/2009;10:00;56567
01/09/2009;10:05;56463
01/09/2009;10:10;56370
##(goes on all day)
01/09/2009;16:45;55771
01/09/2009;16:50;55823
01/09/2009;16:55;55814
##(jumps to the subsequent day)
02/09/2009;10:00;55626
02/09/2009;10:05;55723
02/09/2009;10:10;55659
##(goes on all day)
02/09/2009;16:45;55742
02/09/2009;16:50;55717
02/09/2009;16:55;55385
## (and so on to the next day)

I would like to store the intra-day 5-min prices in a list, where each element would represent one day, something like

list[[1]]
price at time 1, day 1
price at time 2, day 1
...
price at time n_1, day 1
list[[2]]
price at time 1, day 2
price at time 2, day 2
...
price at time n_2, day 2

and so on. As the n_1, n_2, etc. suggest, each day has its own number of observations (this reflects the fact that the market may open and close at varying daytimes). I have guessed that a list would be a better way to store my data, since a matrix would be filled with NA's and that is not exactly what I'm looking for. Thanks in advance, and best regards, Eduardo Horta

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
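A sketch of the ?split suggestion above, using a few rows of the posted data inlined as text so the example is self-contained (read.csv2 reads semicolon-separated files; the object names d and prices are illustrative):

```r
# A few rows of the posted file, inlined in place of the real csv
txt <- "Date;Time;Close
01/09/2009;10:00;56567
01/09/2009;10:05;56463
01/09/2009;16:55;55814
02/09/2009;10:00;55626
02/09/2009;10:05;55723"
d <- read.csv2(textConnection(txt), stringsAsFactors = FALSE)

# One list element per day; days may contain different numbers of rows
prices <- split(d$Close, as.Date(d$Date, format = "%d/%m/%Y"))
lengths(prices)  # 3 observations on day 1, 2 on day 2
```

split() groups the Close vector by day, giving exactly the ragged list-of-days structure asked for, with no NA padding.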
Re: [R] Gap between graph and axis
Perhaps you need something like this:

par(yaxs = "i")
plot(runif(10), type = "h", ylim = c(0, 1.1))

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sebastian Rudnick Sent: Tuesday, 23 November 2010 10:37 AM To: r-help@r-project.org Subject: [R] Gap between graph and axis

Hi everyone! I want to plot some precipitation data via plot(type="h"). Unfortunately there is always a gap between the bars and the x-axis, so that the bars reach into the negative area below 0 on the y-axis, which is very misleading. The ylim parameter is set to 0 and the max of precipitation; the min value of precipitation is 0 as well. I tried to fix this via the fig parameter, but I have no idea how to do it at all. I hope anyone can help. Thanks a lot, Sebastian

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a philosophy R question
The conventional view used to be that S is the language and that R and S-PLUS are implementations of it. R is usually described as 'a programming environment for data analysis and graphics' (as was S-PLUS before it). However, as the language that R implements diverges inexorably from the classical definition of S, it is now probably more accurate to describe the language itself as R as well - now a dialect of S, if you will. The only thing most would agree upon, though, is that neither R nor S-PLUS should *ever* be described as a stats package. Such a description is definitely to be avoided. In fact calling R a 'package' at all would be both confusing and misleading (not to mention demeaning!). Bill Venables.

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess Sent: Sunday, 21 November 2010 11:56 AM To: R help Subject: [R] a philosophy R question

Dear R People: Do we say that R is a programming language or a programming environment, please? Which is correct, please? Thanks in advance, Sincerely, Erin

-- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can't invert matrix
What you show below is only a representation of the matrix to 7dp. If you look at that, though, the condition number is suspiciously large (i.e. the matrix is very ill-conditioned):

txt <- textConnection("
0.99252358 0.93715047 0.7540535 0.4579895
0.01607797 0.09616267 0.2452471 0.3088614
0.09772828 0.58451468 1.4907090 1.8773815
-0.0100000 0.0000000 0.0900000 0.1700000
")
mat <- matrix(scan(txt, quiet = TRUE), ncol = 4, byrow = TRUE)
close(txt)
with(svd(mat), d[1]/d[4])
[1] 82473793

With this representation, though, the matrix does seem to allow inversion on a Windows 32-bit machine:

solve(mat)
           [,1]      [,2]     [,3]        [,4]
[1,]  1.4081192  -7826819  1287643  21.5115344
[2,] -0.3987761  18796863 -3092403 -25.7757002
[3,] -0.1208272 -18836093  3098860  -0.9159397
[4,]  0.1467979   9511648 -1564829   7.6326466

and horrible as it is (look closely at columns 2 and 3) it does check out pretty well:

mat %*% solve(mat)
              [,1]          [,2]          [,3]          [,4]
[1,]  1.000000e+00 -3.060450e-10  1.108447e-10 -1.025872e-15
[2,]  3.571091e-18  1.000000e+00 -5.269385e-11 -1.840975e-16
[3,]  8.201989e-17  2.559318e-09  1.000000e+00 -1.284563e-15
[4,] -3.203479e-18 -8.867573e-11  1.813305e-11  1.000000e+00

but a generalized inverse (with default tolerances) is very different:

library(MASS)
ginv(mat)
            [,1]       [,2]       [,3]      [,4]
[1,]  1.27299552 -0.2800302 -1.7022000  16.38665
[2,] -0.07426349  0.1804989  1.0971974 -13.46780
[3,] -0.44601712  0.2273789  1.3821521 -13.24953
[4,]  0.31100880 -0.1368457 -0.8318576  13.86073

which emphasises the delicacy of dealing with an ill-conditioned matrix. Incidentally, this also checks out fairly well according to the definition of a g-inverse:

range(mat - mat %*% ginv(mat) %*% mat)
[1] -2.132894e-08 2.128452e-08

When dealing with numerical matrices you have to be prepared for the unexpected. Bill Venables.
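The condition number above was computed from the SVD by hand; base R's kappa() reports the same diagnostic directly (a sketch re-entering the matrix from the post):

```r
# The matrix from the original post, entered row by row
mat <- matrix(c( 0.99252358, 0.93715047, 0.7540535, 0.4579895,
                 0.01607797, 0.09616267, 0.2452471, 0.3088614,
                 0.09772828, 0.58451468, 1.4907090, 1.8773815,
                -0.01,       0.00,       0.09,      0.17),
              ncol = 4, byrow = TRUE)

# Exact 2-norm condition number: ratio of largest to smallest singular value.
# A value around 8e7 signals severe ill-conditioning in double precision.
kappa(mat, exact = TRUE)
```

With exact = FALSE (the default) kappa() uses a cheaper estimate; exact = TRUE matches the d[1]/d[4] ratio from the SVD.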
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Noah Silverman Sent: Sunday, 21 November 2010 2:56 PM To: r-help@r-project.org Subject: [R] Can't invert matrix

Hi, I'm trying to use the solve() function in R to invert a matrix. I get the following error:

Lapack routine dgesv: system is exactly singular

However, my matrix doesn't appear to be singular:

           [,1]       [,2]      [,3]      [,4]
[1,] 0.99252358 0.93715047 0.7540535 0.4579895
[2,] 0.01607797 0.09616267 0.2452471 0.3088614
[3,] 0.09772828 0.58451468 1.4907090 1.8773815
[4,] -0.0100000 0.0000000  0.0900000 0.1700000

Can anyone help me understand what is happening here? Thanks! -N

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] density at particular values
It's actually not too difficult to write the density function itself as returning a function rather than a list of x and y values. Here is a no-frills (well, few-frills) version:

### cut here ###
densityfun <- local({
  normd <- function(value, bw) {
    force(value); force(bw)
    function(z) dnorm(z, mean = value, sd = bw)
  }
  function(x, bw = bw.nrd0, adjust = 1) {
    bandw <- bw(x) * adjust
    flist <- lapply(x, normd, bw = bandw)
    function(z)
      if (length(z) <= 1)
        mean(sapply(flist, function(fun) fun(z)))
      else
        rowMeans(sapply(flist, function(fun) fun(z)))
  }
})
### cut here ###

To test it:

library(MASS)
x <- faithful$eruptions
dx <- density(x, n = 500)
plot(dx)
dfun <- densityfun(x)
sx <- sample(dx$x, 25)
points(sx, dfun(sx), pch = 4)
curve(dfun, add = TRUE, col = "blue", n = 500)

This idea could be extended to provide essentially the same features as density() itself. The details are left as an exercise... If anyone ever needs to integrate a kernel density estimate (and for now I can't see why they would, but if), then this would provide a way to do it with integrate(). Bill Venables.

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius Sent: Sunday, 21 November 2010 3:21 PM To: Shant Ch Cc: r-help@r-project.org Subject: Re: [R] density at particular values

On Nov 20, 2010, at 9:34 PM, Shant Ch wrote: David, I did look at ?density many times. I think I didn't explain what I have to find. Suppose I have a data:

x <- c(rnorm(40,5,3), rcauchy(30,0,4), rexp(30,6))

Suppose I don't have information about the mixture; I have been given only the data. density(x) will give the 6-number summary of the data, given as x, and also the 6-number summary of the density values, given as y.

I am not sure what the number six (6) represents in the two places it occurs:

str(density(x))
List of 7
 $ x: num [1:512] -59.2 -59 -58.8 -58.5 -58.3 ...
 $ y: num [1:512] 3.30e-05 5.27e-05 8.19e-05 1.24e-04 1.82e-04 ...
 $ bw : num 1.37
 $ n: int 100
 $ call : language density.default(x = x)
 $ data.name: chr "x"
 $ has.na : logi FALSE
 - attr(*, "class")= chr "density"

Perhaps you want to look at: ?approxfun It would let you interpolate at points that are not part of the set density(x)$x

__ David.

I want to find the density of the given data at x=1. I basically want the value of y (= density) for x=1, i.e. the kernel density at x=1. Shant

From: David Winsemius dwinsem...@comcast.net To: Shant Ch sha1...@yahoo.com Cc: r-help@r-project.org Sent: Sat, November 20, 2010 8:54:32 PM Subject: Re: [R] density at particular values

On Nov 20, 2010, at 8:07 PM, Shant Ch wrote: Hello everyone! I want to use the density function of R to compute the density at x (=0, say). But it is giving me the 5-number summary and mean of the data and densities at that point. I just want the densities at different values specified by me. Can anyone let me know how I can find that?

Here's what you should have done (even before posting): ?density Read the help page to see the structure of what density() returns.

Value
x: the n coordinates of the points where the density is estimated.
y: the estimated density values. These will be non-negative, but can be zero.

Realize that the "specified by me" part is either going to be modified to pick an existing estimate near my specification or that you will need to approximate the value. So what is the actual problem (and the actual data setup)? --David.

For example Thanks in advance for your help. Shant

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD West Hartford, CT

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
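The ?approxfun route suggested in this thread can be sketched as follows: estimate the density once, then interpolate it at any point, e.g. x = 1 (set.seed is added here for reproducibility; it is not in the original post):

```r
set.seed(1)
x <- c(rnorm(40, 5, 3), rcauchy(30, 0, 4), rexp(30, 6))

dx <- density(x)
# Linear interpolation between the 512 grid points density() returns
dfun <- approxfun(dx$x, dx$y)
dfun(1)  # estimated density at x = 1
```

This answers the original question directly: dfun() gives a density value at any point inside the estimation grid, not just at the grid points density() happened to use.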