[R] changing the x axis labels in a time series plot
OK, this has to be simple, but I've searched through help files, mailing list archives and, well, everything I could think of, and still no luck. I simply want to change the x axis labels in a time series graph from the default numbering (which starts at 1 and increments by 1) to the values I have in another vector, Year. It has to be a time series graph. I don't want to use a scatter plot because there are many lines to draw. Example:
    z = cbind(1:100,100:1); Year = 1322:1421
    windows()
    plot.ts(z[,1:2], , "single", xaxt="n", xlab="")
    axis(1, at=Year)
This doesn't work, and neither do any of the permutations I've tried with the various arguments to plot.ts and axis. Thanks for any help.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
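Here is a sketch, not from the thread, of two ways this could be done. plot.ts() converts a plain matrix to a series indexed 1 to 100. As a result, axis(1, at=Year) asks for ticks at 1322 to 1421, which lie outside the plotted range, so none are drawn. Option 1 gives the series a time base with ts(). Option 2 keeps the index and passes the years as tick labels. The spacing of 20 is an arbitrary choice.

```r
z <- cbind(1:100, 100:1)
Year <- 1322:1421

# Option 1: attach the years as the time base, so the default axis shows them
zt <- ts(z, start = Year[1])
plot.ts(zt, plot.type = "single", xlab = "Year")

# Option 2: keep the 1..100 index, suppress the axis, then label chosen ticks
plot.ts(z, plot.type = "single", xaxt = "n", xlab = "")
ticks <- seq(1, length(Year), by = 20)   # every 20th observation (arbitrary spacing)
axis(1, at = ticks, labels = Year[ticks])
```

Option 1 is probably simpler if the series really are annual. Option 2 is useful when the labels are not evenly spaced numbers.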
[R] extract the p value
OK, what is the trick to extracting the overall p-value from an lm object? It shows up in the summary(lm(model)) output, but I can't seem to extract it:
    test2 = apply(aa, 1, function(x) summary(lm(x[,1] ~ 0 + x[,3] + x[,6])))
    test2[[1]]
    Call:
    lm(formula = x[, 1] ~ 0 + x[, 3] + x[, 6])
    [omitted summary output]
    F-statistic: 40.94 on 2 and 7 DF,  p-value: 0.0001371
It does not seem to be obtainable from anova(lm(model)) either, which gives only the p-values for the individual predictors. Stumped.
Jim Bouldin
Research Ecologist
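As far as I know, summary.lm() keeps the overall F statistic and its two degrees of freedom in its fstatistic component but does not store the p-value it prints. The usual approach is therefore to recompute the p-value with pf(). Below is a minimal sketch on made-up data; x1, x2, y and the seed are assumptions, chosen to give the same 2 and 7 DF as the post:

```r
set.seed(1)
x1 <- rnorm(9); x2 <- rnorm(9)
y <- 2 * x1 + x2 + rnorm(9)

fit <- lm(y ~ 0 + x1 + x2)            # no intercept, as in the post: 2 and 7 DF
fstat <- summary(fit)$fstatistic      # named vector: value, numdf, dendf
p_overall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
                lower.tail = FALSE)   # upper tail of the F distribution
p_overall
```

Inside the apply() call from the post, the same summary/pf lines could go in the anonymous function instead of returning the whole summary.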
[R] converting object elements to variable names and making subsequent assignments thereto
This has got to be incredibly simple, but I nevertheless can't figure it out, as I am apparently brain dead. I just want to convert the elements of a character vector to variable names, so that I can then assign model fits to them, e.g.:
    z = c("model1", "model2")
I want to assign fits such as lm(y~x[,1]) and lm(y~x[,2]) to the variables model1 and model2. There are, of course, many more than 2 models involved, so brute force is the option of absolute last resort. Thanks for any help.
--
Jim Bouldin, Research Ecologist
Re: [R] converting object elements to variable names and making subsequent assignments thereto
Yes, I tried to do it using assign, but I couldn't get that to work. E.g.:
    z=1:2; zz=rep("model",2); zzz = paste(zz,z,sep=''); zzz
    [1] "model1" "model2"
    y = 1:10; v = rnorm(10,0,2); x2 = y + v; x3 = y + v^0.5
    x = data.frame(x2,x3)
    for (i in 1:2){assign(zzz[i],lm(y~x[,i]))}; zzz
    [1] "model1" "model2"
Stumped.
On Fri, Sep 23, 2011 at 1:08 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
> The usual response to this sort of question is something like the following: assign() will do what you want; get() runs in the other direction. But the more R way to do it is to put all the models in a list.
> Michael
> On Fri, Sep 23, 2011 at 1:03 PM, Jim Bouldin bouldi...@gmail.com wrote:
>> [...]
--
Jim Bouldin, PhD
Research Ecologist
Re: [R] converting object elements to variable names and making subsequent assignments thereto
OK, I see. I thought R was just returning the character strings of the model names without doing any assigning, since that's what it displayed. I had it right all along. Thanks for your help.
On Fri, Sep 23, 2011 at 1:45 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
> What exactly is the problem? Like I said, I'd personally put this in a list, but this seems like exactly what you wanted...
>     model1
>     Call:
>     lm(formula = y ~ x[, i])
>     Coefficients:
>     (Intercept)       x[, i]
>          1.0489       0.7175
>     model2
>     Call:
>     lm(formula = y ~ x[, i])
>     Coefficients:
>     (Intercept)       x[, i]
>         -0.4342       0.8734
Re: [R] converting object elements to variable names and making subsequent assignments thereto
OK. I was assuming that the call to zzz would print the model formulae, not the object names. That's what threw me.
Jim
On Fri, Sep 23, 2011 at 1:59 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
> assign() doesn't return anything in this case. It's your additional (unnecessary?) call to zzz at the end which triggers a print statement.
> Michael
> On Fri, Sep 23, 2011 at 1:56 PM, Jim Bouldin bouldi...@gmail.com wrote:
>> [...]
--
Jim Bouldin, PhD
Research Ecologist
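Here is a sketch of both routes Michael mentions, using made-up data; the column names x2 and x3 and the seed are assumptions. The for loop itself prints nothing. What displays a model is typing its name or calling print() on it.

```r
set.seed(42)
y <- 1:10
x <- data.frame(x2 = y + rnorm(10, 0, 2), x3 = y + rnorm(10, 0, 1))
model_names <- paste0("model", seq_len(ncol(x)))   # "model1" "model2"

# Route 1: assign() creates one object per name; get() fetches one back by name
for (i in seq_along(model_names)) {
  assign(model_names[i], lm(y ~ x[, i]))
}
print(get("model2"))

# Route 2: keep all the fits together in one named list
models <- lapply(seq_len(ncol(x)), function(i) lm(y ~ x[, i]))
names(models) <- model_names
models$model1
```

The list version is usually easier to work with afterwards. For example, sapply(models, coef) collects every coefficient in one step.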
[R] functions on rows or columns of two (or more) arrays
I realize this should be simple, but even after reading the relevant help pages several times, I still cannot decide which of the myriad apply functions to use. I simply want to apply a function to all the rows (or columns) of the same index from two (or more) identically sized arrays (or data frames). For example:
    a=matrix(1:50,nrow=10)
    a2=floor(jitter(a,amount=50))
    a
          [,1] [,2] [,3] [,4] [,5]
     [1,]    1   11   21   31   41
     [2,]    2   12   22   32   42
     [3,]    3   13   23   33   43
     [4,]    4   14   24   34   44
     [5,]    5   15   25   35   45
     [6,]    6   16   26   36   46
     [7,]    7   17   27   37   47
     [8,]    8   18   28   38   48
     [9,]    9   19   29   39   49
    [10,]   10   20   30   40   50
    a2
          [,1] [,2] [,3] [,4] [,5]
     [1,] 31 56 -29 -13 10
     [2,] 38 61 71 559
     [3,] -29 38 47 12 38
     [4,] 122 43 39 93
     [5,] -43 23 -23 621
     [6,] -13 61 55 112
     [7,] -421 38 128
     [8,] -13 -6 -18 16 95
     [9,] -19 -2 78 331
    [10,] 20 -16 -11 19 17
If I try the following, for example:
    apply(a,1,function(x) lm(a~a2))
I get 10 identical repeats (except for the list index) of the following:
    [[1]]
    Call:
    lm(formula = a ~ a2)
    Coefficients:
                     [,1]       [,2]       [,3]       [,4]       [,5]
    (Intercept)  8.372135  18.372135  28.372135  38.372135  48.372135
    a21         -0.006163  -0.006163  -0.006163  -0.006163  -0.006163
    a22         -0.093390  -0.093390  -0.093390  -0.093390  -0.093390
    a23          0.009315   0.009315   0.009315   0.009315   0.009315
    a24         -0.015143  -0.015143  -0.015143  -0.015143  -0.015143
    a25         -0.026761  -0.026761  -0.026761  -0.026761  -0.026761
...which is clearly very wrong, in a number of ways. If I try by columns:
    apply(a,2,function(x) lm(a~a2))
...I get exactly the same result. So which is the appropriate apply-type function when two arrays (or data frames) are involved like this? Or is it none of them, and some other approach (other than looping, which I can do but which I assume is not optimal)? Thanks for any help.
--
Jim Bouldin, PhD
Research Ecologist
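The function passed to apply() in the post ignores its argument x and fits lm(a ~ a2) on the whole matrices every time, which would explain the identical results. Here is a sketch of one alternative, with an arbitrary seed. It loops over the shared column index with lapply(), or walks both objects in parallel with mapply() after turning them into data frames, whose columns are list elements. For rows, swap ncol and [, j] for nrow and [i, ].

```r
set.seed(1)
a  <- matrix(1:50, nrow = 10)
a2 <- floor(jitter(a, amount = 50))

# One regression per column index j: a[, j] regressed on a2[, j]
fits <- lapply(seq_len(ncol(a)), function(j) lm(a[, j] ~ a2[, j]))
slopes <- sapply(fits, function(f) coef(f)[[2]])

# Same idea with mapply(), pairing column j of one data frame with column j of the other
fits2 <- mapply(function(yy, xx) lm(yy ~ xx),
                as.data.frame(a), as.data.frame(a2), SIMPLIFY = FALSE)
slopes
```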
[R] object names from character strings
I realize this is probably pretty basic, but I can't figure it out. I'm looping through an array, doing various calculations and producing a resulting data frame in each loop iteration. I need to give each data frame a different name. I can easily create a new character string for writing each frame to an output file. What I cannot figure out is how to convert such strings to corresponding object names within the R workspace itself, so as to give each data frame a distinct name. The closest I got was various attempts with the as.name function, but I couldn't get that to work either. Any help appreciated. Thanks.
--
Jim Bouldin, PhD
Research Ecologist
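as.name() only builds a symbol, which is a language object; it does not create a variable. Here is a sketch using assign() instead, plus the list alternative. The per-iteration data frame and the naming scheme result_1, result_2, ... are hypothetical stand-ins for the real loop.

```r
results <- list()
for (k in 1:3) {
  res <- data.frame(iter = k, value = (1:4) * k)   # stand-in for each iteration's real result
  nm <- paste0("result_", k)                       # hypothetical naming scheme
  assign(nm, res)          # creates result_1, result_2, result_3 in the workspace
  results[[nm]] <- res     # alternative: a single named list holding every frame
}

class(as.name("result_1"))   # "name": a symbol, not a new object
eval(as.name("result_1"))    # evaluating the symbol looks up the object assign() made
```

The same nm string can also be reused for the output file name, e.g. paste0(nm, ".csv").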
[R] nls error regarding numerics vs logicals
I am trying to perform an nls fit of a valid negative exponential function:
    zz=nls(y~constant+a.est*2.7183^(b.est*x), start=list(constant=4.0, a.est=-4, b.est=-.005), trace=T)
and am getting a number of different error messages, the most problematic of which is:
    Error in nls(ring.area ~ constant + a.est * 2.7183^(b.est * ba.beg), start = list(constant = 4,  :
      REAL() can only be applied to a 'numeric', not a 'logical'
I can't see any logicals in this equation that could cause this problem. Any help appreciated. Thank you.
Jim Bouldin
Re: [R] nls error regarding numerics vs logicals
> 1. The expression you gave us is clearly not the one that produced the error: it involved ring.area and ba.beg.
> 2. You don't tell us what x and y are, so we can't reproduce anything.
Sorry, I guess that was unclear. I changed the response and independent variable names to y and x, respectively, in hopes that would be clearer. Both are numeric variables.
Jim
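The thread does not pin down what caused the REAL() error. One thing that may be worth checking is that both variables really are numeric. str() will show, for example, a column read in entirely as NA, which R stores as logical. Below is a sketch of the same model on simulated data, written with exp() instead of 2.7183^. The true values 4, -4 and -0.005 and the noise level are assumptions.

```r
set.seed(123)
x <- seq(0, 1000, by = 20)
y <- 4 - 4 * exp(-0.005 * x) + rnorm(length(x), sd = 0.1)

str(x); str(y)                           # both should report num, not logi
stopifnot(is.numeric(x), is.numeric(y))  # fail early if either is not numeric

fit <- nls(y ~ constant + a.est * exp(b.est * x),
           start = list(constant = 4, a.est = -4, b.est = -0.005))
coef(fit)
```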
[R] quantiles on rows of a matrix
I'm trying to obtain the mean of the middle 95% of the values in each row of a matrix (that is, the highest and lowest 2.5% of values in each row are removed before calculating the mean). I am having all sorts of problems with this. For example, the command
    apply(matrix1,1,function(x) quantile(c(.05,.90),na.rm=T))
returns exactly the same quantile values for each row, which is clearly wrong. But even if the values were right, I'm not sure how I would then feed those quantile values into another apply function to get the mean, since they differ from row to row. I also tried
    apply(matrix,1,mean,na.rm=T,trim=.05)
and the trim argument was simply ignored. Stumped. Any help appreciated. Thanks.
Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740
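Two things may explain what is described. First, the quantile() call is given c(.05, .90) instead of the row x, so every row returns the same numbers. Second, mean(trim=) is applied, but it drops floor(n * trim) values from each end. With trim = .05, a row therefore needs at least 20 non-missing values before anything is removed. The "middle 95%" also corresponds to trim = 0.025. Here is a sketch on a made-up 5 x 200 matrix:

```r
set.seed(7)
m <- matrix(rnorm(5 * 200), nrow = 5)   # 5 rows of 200 values (made-up data)
m[1, 1:3] <- NA                          # a few missing values

# Mean of the middle 95%: drop floor(n * 0.025) values from each end of each row
trimmed <- apply(m, 1, mean, trim = 0.025, na.rm = TRUE)

# The same thing by hand, to show what trim does
manual <- apply(m, 1, function(x) {
  x <- sort(x[!is.na(x)])
  k <- floor(length(x) * 0.025)
  mean(x[(k + 1):(length(x) - k)])
})

# Per-row quantiles: pass the row itself and give the probabilities separately
qs <- apply(m, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE)

# With only 10 values, trim = 0.05 removes nothing, because floor(10 * 0.05) is 0
mean(c(1:9, 100), trim = 0.05) == mean(c(1:9, 100))
```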
[R] removing duplicate rows
I'm trying to identify and remove rows in a data frame that are duplicated only on particular columns (i.e. not on all columns). The unique function looks for uniqueness across all columns of a data frame. Identifying unique rows based only on the columns of interest returns only those columns, not all of the columns in the original frame. So I tried that, added an identifier column to the truncated data frame, merged it with the original data frame, and selected only those rows containing the identifier. But this did not work no matter how the arguments were altered: all records were returned instead of the unique ones. Completely stumped; any help appreciated. Thanks.
Jim Bouldin, PhD
Research Ecologist
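duplicated() can be applied to just the key columns, and the resulting logical vector then selects whole rows of the original frame. Here is a sketch with a made-up data frame; the columns site, year and value are hypothetical.

```r
df <- data.frame(site  = c("A", "A", "B", "B", "C"),
                 year  = c(2001, 2001, 2002, 2003, 2002),
                 value = c(1.5, 2.0, 3.1, 4.2, 5.0))

key_cols <- c("site", "year")                # the columns that define a duplicate
dedup <- df[!duplicated(df[, key_cols]), ]   # keeps every column; first occurrence wins
dedup
```

duplicated(..., fromLast = TRUE) would keep the last occurrence of each key instead.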
[R] splitting character strings and converting to numeric vectors
This seemingly should be quite simple, but I can't solve it. I have a long character vector of geographic data (a data frame column named XY) whose elements vary in length (from 11 to 14 characters). Each element is structured as a set of digits, then an underscore, then more digits, e.g.:
    data.frame(head(as.character(XY)))
      head.as.character.XY..
    1         -448623_854854
    2         -448563_854850
    3         -448442_854842
    4         -448301_854833
    5         -448060_854818
    6         -446828_854736
I simply need to separate the two sets of digits from each other and assign them to new columns. The closest I've been able to get is:
    test=t(as.matrix(data.frame(head(strsplit(as.character(XY), "\\_")))))
    test
                           [,1]    [,2]
    c...448623854854..  -448623  854854
    c...448563854850..  -448563  854850
    c...448442854842..  -448442  854842
    c...448301854833..  -448301  854833
    c...448060854818..  -448060  854818
    c...446828854736..  -446828  854736
So far so good, but columns 1:2 will not coerce to either numeric or integer, for unknown reasons. Thanks for any help (and/or suggestions for a better way to code this).
Jim Bouldin, PhD
Research Ecologist
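Here is a sketch using three of the values from the post. strsplit() returns a list of character vectors, and do.call(rbind, ...) stacks them into a character matrix whose columns as.numeric() converts directly. fixed = TRUE makes "_" match literally. A sub()-based alternative avoids the intermediate list altogether.

```r
XY <- c("-448623_854854", "-448563_854850", "-448442_854842")

parts  <- do.call(rbind, strsplit(XY, "_", fixed = TRUE))   # 3 x 2 character matrix
coords <- data.frame(X = as.numeric(parts[, 1]),
                     Y = as.numeric(parts[, 2]))

# Alternative without strsplit(): strip everything after / before the underscore
X2 <- as.numeric(sub("_.*$", "", XY))
Y2 <- as.numeric(sub("^.*_", "", XY))
coords
```

If XY is stored as a factor, wrapping it in as.character() first (as the post already does) matters. as.numeric() on a factor returns the level codes, not the digits.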
[R] NAs and row/column calculations
I continue to have great frustrations with NA values, in particular when making summary calculations on rows or columns of a matrix containing them. For example, why does:
    a = matrix(1:30,nrow=5)
    a[c(1:2),c(3:4)] <- NA; a
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    6   NA   NA   21   26
    [2,]    2    7   NA   NA   22   27
    [3,]    3    8   13   18   23   28
    [4,]    4    9   14   19   24   29
    [5,]    5   10   15   20   25   30
    apply(a[!is.na(a)],2,sum)
give me this:
    Error in apply(a[!is.na(a)], 2, sum) : dim(X) must have a positive length
when
    dim(a)
    [1] 5 6
What is the trick to calculating summary values from rows or columns containing NAs? Drives me nuts. More nuts, that is. Thanks.
Jim Bouldin, PhD
Research Ecologist
Re: [R] NAs and row/column calculations
> On 12/03/2010, at 11:25 AM, Jim Bouldin wrote:
>> I continue to have great frustrations with NA values [...] What is the trick to calculating summary values from rows or columns containing NAs? Drives me nuts. More nuts, that is.
> When you do a[!is.na(a)] you get a ***vector*** --- not a matrix. ``Obviously''!!!
Well, obvious to you maybe, or to someone who's done it before, but not to me.
> The non-missing values of a cannot be arranged in a 5 x 6 matrix; there are only 26 of them. So (as my late Uncle Stanley would have said) ``What the hell do you expect?''.
Silly me. I expected that
    apply(a[!is.na(a)],2,sum)
would sum the non-NA elements of matrix a, by columns. I based this on (1) previous experience doing summary calculations on subsets of a matrix using exactly that style of command, (2) the fact that dim(a) returns [1] 5 6, (3) the fact that the help page for the apply function gives NO INDICATION of any possible use of the na.rm argument, AND (4) the fact that the help page for na.action does not even mention na.rm. Terribly faulty reasoning on my part, obviously.
> The ``trick'' is to remove the NAs at the summing stage:
>     apply(a,2,sum,na.rm=TRUE)
> Not all that tricky.
> cheers,
> Rolf Turner
Jim Bouldin, PhD
Research Ecologist
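Here is a short sketch of Rolf's point and his fix, using the matrix from the post. apply() passes any extra arguments (here na.rm = TRUE) on to the function it calls, which is what makes his version work. colSums() and rowMeans() accept na.rm directly.

```r
a <- matrix(1:30, nrow = 5)
a[1:2, 3:4] <- NA

v <- a[!is.na(a)]         # logical indexing returns a plain vector of the 26 non-NA values
length(v); dim(v)          # 26 and NULL: no matrix shape left for apply() to use

col_totals  <- apply(a, 2, sum, na.rm = TRUE)   # na.rm is passed through to sum()
col_totals2 <- colSums(a, na.rm = TRUE)         # vectorised equivalent
row_means   <- rowMeans(a, na.rm = TRUE)
col_totals                                      # 15 40 42 57 115 140
```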
[R] assigning a file name, or part of, to an object
Is there a way to capture all, or part, of a file name and assign it to an object? Say I wanted to read in a file titled example.txt and then assign the character string "example" (or "exa", or any other substring of "example", for that matter) to object a. Is there a simple way to do so? Thanks in advance for any help.
Jim Bouldin
Research Ecologist
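Here is a sketch using base and tools functions. The path is hypothetical; in practice it would be whatever string was passed to read.table() or returned by list.files().

```r
path <- "C:/data/example.txt"   # hypothetical path to the file being read

a       <- tools::file_path_sans_ext(basename(path))   # "example"
a_short <- substr(a, 1, 3)                              # "exa"
a_alt   <- sub("\\.txt$", "", basename(path))          # regex alternative, also "example"
a
```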
[R] calculations on columns with partially matching names
Is there a command for partial matching of character strings? Specifically, I'd like to be able to calculate the mean of the values in any columns of a data frame or matrix whose column names are partly identical. For example, columns labeled mpw06a and mpw06b match on the first five characters, so their mean would be taken, whereas any columns beginning with something other than mpw06 would be excluded. I need to compare every pair of columns in the frame, and in some cases possibly three at a time. Thanks in advance for any ideas.
Jim Bouldin
Research Ecologist
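grep() with an anchored pattern selects the matching columns. Grouping by substr() of the names handles every prefix at once. Here is a sketch on a made-up data frame. It assumes the wanted quantity is the row-wise mean across matching columns; mean(as.matrix(df[, cols])) would give a single overall mean instead.

```r
df <- data.frame(mpw06a = c(1, 2, 3), mpw06b = c(3, 4, 5), mpw07a = c(10, 20, 30))

# Columns whose names start with "mpw06"
cols <- grep("^mpw06", names(df))
mpw06_mean <- rowMeans(df[, cols])          # row-wise mean across the matching columns

# All groups at once: group columns by their first five characters
prefix <- substr(names(df), 1, 5)
group_means <- sapply(split(names(df), prefix),
                      function(nm) rowMeans(df[, nm, drop = FALSE]))
group_means
```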
[R] nls error message
When I try to run the following non-linear regression with variables index1 and prl3:
    beta = 4
    nls(index1~beta*(1/prl3),start = list(beta = 4))
I get this error message:
    Error in nls(index1 ~ beta * (1/prl3), start = list(beta = 4)) :
      REAL() can only be applied to a 'numeric', not a 'logical'
I've got no clue what the REAL() it refers to is. Any help appreciated. Thanks in advance.
Jim Bouldin
Research Ecologist
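The post doesn't show index1 and prl3, so the sketch below simulates them; the true value 3.5 and the noise level are assumptions. It also leaves out the separate beta = 4 assignment, since nls() takes the starting value from start. Because this model is linear in beta, lm() with no intercept on 1/prl3 should give the same estimate, which makes a handy cross-check.

```r
set.seed(11)
prl3   <- runif(30, 1, 10)
index1 <- 3.5 / prl3 + rnorm(30, sd = 0.05)

str(index1); str(prl3)    # both should be num, not logi

fit_nls <- nls(index1 ~ beta * (1 / prl3), start = list(beta = 4))
fit_lm  <- lm(index1 ~ 0 + I(1 / prl3))   # the same model, fitted by least squares directly
c(nls = coef(fit_nls)[["beta"]], lm = coef(fit_lm)[[1]])
```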
[R] no html help upon upgrading to 2.10
I just upgraded from 2.8.1 to 2.10 on Windows Vista. BIG MISTAKE, apparently, because now when I type
    help(functionname)
or
    ?functionname
I get only a small text window giving some very basic information on the topic, e.g.:
    base-package            package:base            R Documentation
    The R Base Package
    Description:
         Base R functions
    Details:
         This package contains the basic functions which let R function as a language: arithmetic, input/output, basic programming support, etc. Its contents are available through inheritance from any environment.
         For a complete list of functions, use library(help=base).
and not the HTML help screen with the full package or function description that I used to get. This is exceedingly problematic, and I can find nothing in either the FAQs or the R search sites on what to do. Solutions much appreciated, thanks.
Jim Bouldin, PhD
Research Ecologist
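If I remember correctly, R 2.10.0 replaced the old precompiled HTML help with a dynamically generated one and introduced the help_type option, which chooses between text and HTML display. Here is a sketch of switching a session to HTML help. Putting the options() line in a startup file such as Rprofile.site is one way to make it permanent; treat the exact file and its location as an assumption for your installation.

```r
options(help_type = "html")   # ask help() and ? to display HTML help
getOption("help_type")        # "html"

# Or for a single lookup, without changing the option:
# help("mean", help_type = "html")
```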
[R] linear regression on groups of consecutive rows of a matrix
I want to perform linear regression on groups of consecutive rows (say 5 to 10 at a time) of two matrices. There are many such potential groups because the matrices have thousands of rows. Both matrices are of the form:
    shp[1:5,16:20]
          SL495B SL004C SL005C SL005A SL017A
    -2649   1.06   0.56     NA     NA     NA
    -2648   0.97   0.57     NA     NA     NA
    -2647   0.46   0.30     NA     NA     NA
    -2646   0.92   0.48     NA     NA     NA
    -2645   0.82   0.48     NA     NA     NA
That is, they both have NA values, and non-NA values, in the same matrix positions. In my attempts so far I have had two problems. First, using the split function (which I assume is essential here), I am unable to split the matrices by groups of rows (say rows 1 to 5, 6 to 10, etc.):
    shp_split = split(shp,row(shp))
will split the matrix by rows, but not by groups of rows. Stumped. Second, I cannot seem to get rid of the NA values, which would prevent the regression even if I could figure out how to split the matrices correctly, e.g.:
    shp_split = split(shp,row(shp))
    shp_split = shp_split[!is.na(shp_split)]
    shp_split[1]
    $`1`
     [1] 0.68 0.28 0.43 0.47 0.64 0.40 0.69 0.56 0.62 0.40 1.01 0.67 0.17 1.36 1.84 1.06 0.56 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA etc.
If I solve these problems, will I in fact be able to perform individual linear regressions on the (numerous) collections of 5 to 10 rows? Thanks as always for any insight.
Jim Bouldin
Research Ecologist
Re: [R] linear regression on groups of consecutive rows of a matrix
> But I do feel compelled to ask: Do you really get meaningful information from lm applied to 5 cases? Especially when the predictors used may not be the same from subset to subset???
Thanks again for your help, David. Your question is a good one. It's a bit complicated, but here are the basics. The predictors are the same between subsets in the following sense. For each group of rows (which represent tree-ring years), the predictors and predictands always come from the same set of trees, even though that set changes slightly between consecutive subsets. Typically there will be 20+ observations per year (row), so for 5 rows I have n = 100+. For my purposes (removing the effect of tree size on ring width for small groups of years) that is more than good enough. Now to try out your suggestion...
Jim
> --
> David
> On Nov 24, 2009, at 3:25 PM, Jim Bouldin wrote:
>> [...]
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
Jim Bouldin, PhD
Research Ecologist
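Here is a sketch of one way to do the blocked regressions, on made-up matrices standing in for shp and a same-shaped predictor; size, the dimensions and the NA count are hypothetical. A group id such as ceiling(row / 5) does the grouping; split(shp, ceiling(row(shp) / 5)) would give the values in blocks of five rows directly. The NAs need no special handling here because lm()'s default na.action drops incomplete pairs.

```r
set.seed(3)
n_row <- 20; n_col <- 6
shp  <- matrix(round(runif(n_row * n_col, 0.1, 2), 2), n_row, n_col)  # made-up ring widths
size <- matrix(runif(n_row * n_col, 10, 50), n_row, n_col)             # hypothetical predictor
shp[sample(length(shp), 15)] <- NA
size[is.na(shp)] <- NA          # NAs in the same positions in both matrices, as in the post

grp <- ceiling(seq_len(n_row) / 5)      # rows 1-5 -> 1, rows 6-10 -> 2, ...

fits <- lapply(split(seq_len(n_row), grp), function(rows) {
  d <- data.frame(y = as.vector(shp[rows, ]), x = as.vector(size[rows, ]))
  lm(y ~ x, data = d)           # default na.action drops the NA pairs
})
sapply(fits, nobs)              # observations actually used in each block
```

Changing the 5 in grp changes the block length. The sketch does not guard against a block that is entirely NA, which lm() would reject.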
Re: [R] consecutive numbering of elements in a matrix
Many thanks to Dimitris, William and David for very helpful answers, which solved my problem. Being a relative newbie, I am confused by something in the solutions by Dimitris and David.
    #Create a matrix A as follows:
    A <- matrix(sample(50, 21), 7, 3)
    A[sample(21, 5)] <- NA; A
         [,1] [,2] [,3]
    [1,]   36   38   24
    [2,]    6   33   13
    [3,]   12   42   10
    [4,]    7   NA   NA
    [5,]   48   NA   NA
    [6,]    3   NA   47
    [7,] 29 234
    B = row(A) - apply(is.na(A), 2, cumsum); B
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    [4,]    4    3    3
    [5,]    5    3    3
    [6,]    6    3    4
    [7,]    7    4    5
    #But:
    B = row(A) - apply(!is.na(A), 2, cumsum); B
         [,1] [,2] [,3]
    [1,]    0    0    0
    [2,]    0    0    0
    [3,]    0    0    0
    [4,]    0    1    1
    [5,]    0    2    2
    [6,]    0    3    2
    [7,]    0    3    2
This seems exactly backwards to me. The is.na(A) command should be cumulatively summing the NA values, and !is.na(A) should be doing so on the non-NA values. But the opposite is the case. I'm glad I have a solution, but this apparent reversal of the expected logic has me worried.
I do have another, tougher question, if anyone has the time. Given a resulting matrix like B below:
    is.na(B) <- is.na(A); B
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    [4,]    4   NA   NA
    [5,]    5   NA   NA
    [6,]    6   NA    4
    [7,]    7    4    5
how can I rearrange all the columns so that equal values are in the same row? In the case above, that means removing the NA values from columns 2 and 3 and moving up all the non-NA values that had been below them to replace them. Thanks again for your help.
Jim
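The two results are not backwards once the subtraction is taken into account. row(A) counts every cell so far, and subtracting cumsum(is.na(A)) removes the NAs. What remains is the running count of non-NA cells, which is simply cumsum(!is.na(A)). Subtracting that count instead leaves the number of NAs. Here is a sketch with the same NA pattern as the post; the NA positions are fixed, not sampled, so the result is reproducible.

```r
set.seed(5)
A <- matrix(sample(50, 21), 7, 3)
A[4:6, 2] <- NA
A[4:5, 3] <- NA

via_na     <- row(A) - apply(is.na(A), 2, cumsum)   # row index minus NAs seen so far
via_not_na <- apply(!is.na(A), 2, cumsum)           # non-NAs seen so far, counted directly
all(via_na == via_not_na)                           # TRUE: the same running count

B <- via_not_na
is.na(B) <- is.na(A)        # blank out the positions that were NA in A
B
```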
Re: [R] consecutive numbering of elements in a matrix
Thank you Dimitris, that solves it exactly! I continue to be amazed at how powerful a single line of R code can be, and how much information it can contain. Hard as hell to interpret, though (for me).
Jim
> one approach is the following:
>     B <- cbind(c(1:6, NA), c(1:3, NA,NA,NA, 4), c(1:3, NA,NA, 4,5))
>     matrix(B[order(col(B), B)], nrow(B), ncol(B))
> I hope it helps.
> Best,
> Dimitris
Jim Bouldin, PhD
Research Ecologist
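Here is a sketch that unpacks Dimitris's one-liner on his own example B. col(B) gives the column number of every cell. order(col(B), B) therefore sorts cell positions first by column and then by value within each column, with NAs last (order()'s default na.last = TRUE). Refilling a matrix of the same shape in that order moves each column's non-NA values to the top.

```r
B <- cbind(c(1:6, NA), c(1:3, NA, NA, NA, 4), c(1:3, NA, NA, 4, 5))

idx <- order(col(B), B)          # cell positions sorted by column, then by value, NAs last
shifted <- matrix(B[idx], nrow(B), ncol(B))   # refill column by column in that order
shifted
```

This relies on the values within each column already being in the order wanted, as they are for the consecutive numbering here. For arbitrary values, sorting by value would also reorder them.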
[R] consecutive numbering of elements in a matrix
Within a very large matrix composed of a mix of values and NAs, e.g. matrix A:
         [,1] [,2] [,3]
    [1,]    1   NA   NA
    [2,]    3   NA   NA
    [3,]    3   10   17
    [4,]    4   12   18
    [5,]    6   16   19
    [6,]    6   22   20
    [7,]    5   11   NA
I need to be able to consecutively number, in new columns, the non-NA values within each column. That is, A[1,1], A[3,2] and A[3,3] would all be set to one, and subsequent values in those columns would increase by one until the last non-NA value, if any, is reached. Any ideas? Thanks
Jim Bouldin, PhD
Research Ecologist
Re: [R] consecutive numbering of elements in a matrix
Thank you, and apologies: I did not make it clear that there are no NAs mixed in with the valid values. Rather, they all occur consecutively, either toward the beginning or toward the end of the column.
Jim
> I didn't know what you wanted to do if there were NAs in the middle of a column.
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>> [...]
Jim Bouldin, PhD
Research Ecologist
[R] subsetting from a vector or matrix
I realize this should be simple, but I'm having trouble subsetting vectors and matrices, for example extracting from a vector all the values that meet a certain criterion. I cannot seem to figure out the correct syntax, and the help page is not very helpful. Or should I be using some function other than subset? Thanks for any help.
Jim Bouldin
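Here is a sketch, with made-up values, of the two usual idioms. The first is a logical index inside [ ]. The second is subset(), which for vectors does the same thing and also drops elements where the condition is NA.

```r
v <- c(3, 8, 1, 12, 7)
v[v > 5]                 # 8 12 7: keep the elements where the condition is TRUE
subset(v, v > 5)         # same result

m <- matrix(1:12, nrow = 3)
m[m[, 1] > 1, ]          # rows whose first column exceeds 1
m[m > 6]                 # every element above 6, returned as a plain vector
```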
Re: [R] problem selecting rows meeting a criterion
No problem John, thanks for your help, and also thanks to Dan and Patrick. I wasn't able to read or try anybody's suggestions yesterday. Here's what I've discovered in the meantime. What I did not include yesterday is that my original data frame, called data, was this:
       X Y       V3
     1 1 1     0.00
     2 2 1 8.062258
     3 3 1 2.236068
     4 4 1 6.324555
     5 5 1     5.00
     6 1 2 8.062258
     7 2 2     0.00
     8 3 2 9.486833
     9 4 2 2.236068
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    13 3 3     0.00
    14 4 3 8.062258
    15 5 3 5.099020
    16 1 4 6.324555
    17 2 4 2.236068
    18 3 4 8.062258
    19 4 4     0.00
    20 5 4 5.385165
    21 1 5     5.00
    22 2 5 5.656854
    23 3 5 5.099020
    24 4 5 5.385165
    25 5 5     0.00
To this data frame I applied the following command:
    data <- data[data$V3 > 0,]; data   #to remove all rows where V3 = 0
giving me this (the point from which I started yesterday):
       X Y       V3
     2 2 1 8.062258
     3 3 1 2.236068
     4 4 1 6.324555
     5 5 1     5.00
     6 1 2 8.062258
     8 3 2 9.486833
     9 4 2 2.236068
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    14 4 3 8.062258
    15 5 3 5.099020
    16 1 4 6.324555
    17 2 4 2.236068
    18 3 4 8.062258
    20 5 4 5.385165
    21 1 5     5.00
    22 2 5 5.656854
    23 3 5 5.099020
    24 4 5 5.385165
So far so good. But when I then submit the command
    data = data[X > Y,]   #to select all rows where X > Y
I get the problem result already mentioned, namely:
       X Y       V3
     3 3 1 2.236068
     4 4 1 6.324555
     5 5 1     5.00
     6 1 2 8.062258
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    17 2 4 2.236068
    18 3 4 8.062258
    24 4 5 5.385165
which is clearly wrong! It doesn't matter whether I give the data frame a new name at each step, or whether I use the name data at all: it always gives the same wrong answer. However, if I instead use the command subset(data, X > Y), I get the right answer, namely:
       X Y       V3
     2 2 1 8.062258
     3 3 1 2.236068
     4 4 1 6.324555
     5 5 1     5.00
     8 3 2 9.486833
     9 4 2 2.236068
    10 5 2 5.656854
    14 4 3 8.062258
    15 5 3 5.099020
    20 5 4 5.385165
OK, so the lesson so far is to use the subset function. But here it gets weirder.
Suppose I instead go straight from the initial data frame (data, given at the top of this post) and select only the rows where X > Y. This skips the intermediate step of removing rows with V3 = 0; that step is unnecessary for getting the result I want, but it is very relevant to the larger issue here. If I do this with the command that caused me the original trouble (data = data[X > Y,]), I get the RIGHT answer (the data frame just above). The subset function also gives the right answer. Now what in the world is going on? This kind of thing scares me. Below is the full set of commands, starting from scratch:

    # The point of the following is to measure the pairwise Euclidean distances between 5 objects,
    # each having X and Y coordinates, and put them into data frame format that labels each pair
    # and gives the distance between them
    d = data.frame(x=sample(1:10, 5), y=sample(1:10, 5))  # create a sample data set
    ss2 = as.data.frame(as.matrix(dist(d)))  # create a data.frame to extract row and column names
    X = rep(seq(1:length(row.names(ss2))), length(names(ss2)))  # make a vector containing the X coordinate names
    Y = rep(seq(1:length(names(ss2))), length(row.names(ss2)))  # the same for Y
    Y = sort(Y)  # first sort
    coords = cbind(X, Y); rm(X, Y)  # then cbind and remove X and Y
    data1 = as.data.frame(cbind(coords, as.vector(as.matrix(dist(d))))); rm(coords)  # column bind the 3 vectors
    data2 = data1[data1$V3 > 0,]  # remove those with V3 = 0 (= the original matrix diagonal)
    data3 = data2[X > Y,]  # remove duplicates from original distance matrix
    data1; data2; data3

Thoughts much appreciated. Thanks. Jim Bouldin

Clearly I was more tired than I realised last night. :( My apologies.
In any case, with the data.frame name changed to xx, this seems to give you what you want: subset(xx, xx[,1] > xx[,2]). Using the data name instead, subset(data, data[,1] > data[,2]) should work as well.
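For comparison, here is a sketch (not from the thread) that builds the same labelled pair table straight from the distance matrix. The row filter then refers to the data frame's own columns instead of to free-standing X and Y vectors. The names m and pair_tab are hypothetical, and the seed is fixed only to make the example reproducible.

```r
set.seed(1)                                       # reproducible example data
d <- data.frame(x = sample(1:10, 5), y = sample(1:10, 5))
m <- as.matrix(dist(d))                           # 5 x 5 Euclidean distance matrix

# One row per cell of m; X varies fastest, matching the column-major order of as.vector(m).
pair_tab <- expand.grid(X = seq_len(nrow(m)), Y = seq_len(ncol(m)))
pair_tab$V3 <- as.vector(m)

# X > Y keeps each pair once (lower triangle) and also drops the zero diagonal.
pair_tab <- pair_tab[pair_tab$X > pair_tab$Y, ]
pair_tab
```

dist() itself stores only the lower triangle, column by column. As a result, as.vector(dist(d)) holds the same ten distances in the same order.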
Re: [R] problem selecting rows meeting a criterion
Yes, thanks Steve, and also to everyone else for helping me clear this up. The issue was definitely the existence of other objects named X and Y that I inadvertently referred to in my command. Only when these objects are removed AND the data frame in question is attached will the command I originally used work. However, I see that it is much easier to just use the subset function, or perhaps the with function. Seems that R has many painful lessons to teach. Thanks again. Jim Bouldin

This won't work in general. It is probably only working in this particular case because you have already defined variables named X and Y somewhere in your workspace. What you wrote above isn't taking the values X, Y from data$X and data$Y, respectively, but rather from variables X and Y defined elsewhere. Instead of doing data[X > Y,], do:

    data[data$X > data$Y,]

This should get you what you're expecting. ... Hopefully you're learning a slightly different lesson now :-) Does that clear things up at all? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
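The reconstruction below reproduces what went wrong. It assumes, as Steve suggests, that the workspace held length-25 vectors X and Y left over from building the full 5 x 5 grid; the V3 values are copied from the thread. With those stray vectors, data2[X > Y, ] indexes the 20-row frame with a 25-element logical vector. That picks rows by position rather than by their own X and Y values, and it yields exactly the "wrong" rows Jim posted. It also explains why the same command looked right on the unfiltered 25-row frame: there the stray vectors happen to line up row for row with the columns.

```r
# V3 column copied from the thread (distances on a 5 x 5 grid of object indices).
v3 <- c(0,        8.062258, 2.236068, 6.324555, 5,
        8.062258, 0,        9.486833, 2.236068, 5.656854,
        2.236068, 9.486833, 0,        8.062258, 5.099020,
        6.324555, 2.236068, 8.062258, 0,        5.385165,
        5,        5.656854, 5.099020, 5.385165, 0)

# Stray length-25 vectors in the workspace, as in Jim's script before rm(X, Y).
X <- rep(1:5, times = 5)
Y <- rep(1:5, each = 5)

data1 <- data.frame(X = X, Y = Y, V3 = v3)
data2 <- data1[data1$V3 > 0, ]               # drop the diagonal: 20 rows remain

# Wrong: X and Y are the 25-element workspace vectors, so their TRUE positions
# (2, 3, 4, 5, 8, 9, 10, 14, 15, 20) select rows of data2 by position.
# attach(data2) would not change this: the global environment is searched first.
wrong <- data2[X > Y, ]

# Right: name the columns explicitly, or let subset()/with() evaluate inside data2.
right  <- data2[data2$X > data2$Y, ]
right2 <- subset(data2, X > Y)
right3 <- data2[with(data2, X > Y), ]

rownames(wrong)
rownames(right)
```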
[R] problem selecting rows meeting a criterion
When I try to select only those rows from the following data frame, called data, in which X > Y:

       X Y       V3
    2  2 1 8.062258
    3  3 1 2.236068
    4  4 1 6.324555
    5  5 1 5.000000
    6  1 2 8.062258
    8  3 2 9.486833
    9  4 2 2.236068
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    14 4 3 8.062258
    15 5 3 5.099020
    16 1 4 6.324555
    17 2 4 2.236068
    18 3 4 8.062258
    20 5 4 5.385165
    21 1 5 5.000000
    22 2 5 5.656854
    23 3 5 5.099020
    24 4 5 5.385165

using the commands

    attach(data)
    data2 = data[X > Y,]; data2

I get this for data2:

       X Y       V3
    3  3 1 2.236068
    4  4 1 6.324555
    5  5 1 5.000000
    6  1 2 8.062258
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    17 2 4 2.236068
    18 3 4 8.062258
    24 4 5 5.385165

Clearly, this is not what I intend, but I cannot figure out what I've done wrong. Any help appreciated. Thanks. Jim Bouldin
Re: [R] problem selecting rows meeting a criterion
What's wrong is that I'm trying to select only those rows in which X > Y, but I'm getting rows in which Y > X and losing some in which X > Y. The row numbers are not being read as values. Very confusing. Jim

What's wrong with it? It looks okay to me. If you use subset(data, data$X > data$Y) you get the same results. Any chance you're reading the row numbers as values? BTW, data is the name of a base R function (data()), and it is good practice not to use it as a variable name. My results:

       X Y       V3
    3  3 1 2.236068
    4  4 1 6.324555
    5  5 1 5.000000
    6  1 2 8.062258
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    17 2 4 2.236068
    18 3 4 8.062258
    24 4 5 5.385165

--- On Mon, 8/10/09, Jim Bouldin jrboul...@ucdavis.edu wrote: From: Jim Bouldin jrboul...@ucdavis.edu Subject: [R] problem selecting rows meeting a criterion To: r-help@r-project.org Received: Monday, August 10, 2009, 5:49 PM

Jim Bouldin, PhD Research Ecologist Department of Plant Sciences, UC Davis Davis CA, 95616 530-554-1740
[R] R's database capabilities
I admit that I haven't done a thorough search on this topic. Still, in the several instructional manuals and tutorials I've looked at, I don't see any mention of relational database capabilities in R. Have I missed something, and if so, can someone point me in the right direction to get started? Thanks! Jim Bouldin
[R] error message: .Random.seed is not an integer vector but of type 'list'
I'm trying to run this simple random sampling procedure and keep getting the error message shown. I don't understand this; I've designated x as a numeric vector, so what is going on here? Thanks.

    > x = as.vector(c(1:12)); x
     [1]  1  2  3  4  5  6  7  8  9 10 11 12
    > mode(x)
    [1] "numeric"
    > sample(x, 3)
    Error in sample(x, 3) : .Random.seed is not an integer vector but of type 'list'

Jim Bouldin
Re: [R] error message: .Random.seed is not an integer vector but of type 'list'
Thank you. However, when I tried that, I got this message:

    Warning message:
    In rm(.Random.seed) : variable ".Random.seed" was not found

Something has changed or corrupted an object called .Random.seed that is required by the random number generator. Just say rm(.Random.seed) and try again. Uwe Ligges

Jim Bouldin
Re: [R] error message: .Random.seed is not an integer vector but of type 'list'
In that case, have you attached some package that has its own .Random.seed? Try to find out where the .Random.seed that R complains about comes from. Uwe Ligges

No, there are no attached packages, just the ones that load automatically. The R Commander has some type of RNG, but it is not loaded. Completely stumped.
Re: [R] error message: .Random.seed is not an integer vector but
Thanks much, Ted. I had actually just tried what you suggest here before you posted, and it resolved the problem. Thanks also for the other tips. I wrote x = as.vector(c(1:12)) because I thought that the mode of x might be the problem, the error message pointing to .Random.seed notwithstanding. On a related note, a couple of weeks back I ran a brief test: I drew a million random samples of 3 from the vector 1:12 and compared their mean against the known mean. It was off by 1 percent, which suggested that the RNG was more biased than I'd have thought. Comments? Jim

Follow-up to my previous reply (just posted). Having read the other responses and your reactions, try the following:

    rm(.Random.seed)
    set.seed(54321)  ## (Or your favourite magic number) [*]
    x = as.vector(c(1:12))  ## To reproduce your original code ... !
    sample(x,3)

[*] When you did rm(.Random.seed) as suggested by Uwe, the variable .Random.seed was lost, so you have to create it again. If, after the above, you still get the problem, then something is very seriously wrong. Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 17:23:09 -- XFMail --

Jim Bouldin
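Here is Ted's recipe, sketched so that it runs cleanly whether or not a .Random.seed is currently present. The exists() guard and the second set.seed() call are additions for illustration, not part of Ted's code. The guard avoids the "not found" warning Jim reported. Repeating the seed shows that set.seed() both re-creates a valid integer .Random.seed and makes the draw reproducible.

```r
# Discard any existing RNG state in the workspace, then re-seed.
if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
  rm(".Random.seed", envir = globalenv())
}
set.seed(54321)                  # re-creates .Random.seed as an integer vector
x <- 1:12                        # already an integer vector; as.vector(c(...)) is not needed
s1 <- sample(x, 3)

set.seed(54321)                  # same seed ...
s2 <- sample(x, 3)               # ... same draw
```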
[R] Random # generator accuracy
Dan Nordlund wrote: It would be necessary to see the code for your 'brief test' before anyone could meaningfully comment on your results. But your results for a single test could have been a valid random result.

I've re-created what I did below. The problem appears to be with the weighting process: the unweighted sample came out much closer to the expected mean than the weighted sample (1% error) did. Comments? Jim

    > x
     [1]  1  2  3  4  5  6  7  8  9 10 11 12
    > weights
     [1] 1 1 1 1 1 1 2 2 2 2 2 2
    > a = mean(replicate(1000000, (sample(x, 3, prob = weights)))); a  # (1 million samples from x, of size 3, weighted by weights; the mean should be 7.50)
    [1] 7.406977
    > 7.406977/7.5
    [1] 0.987597
    > b = mean(replicate(1000000, (sample(x, 3)))); b  # (1 million samples from x, of size 3, not weighted this time; the mean should be 6.50)
    [1] 6.501477
    > 6.501477/6.5
    [1] 1.000227

Jim Bouldin
Re: [R] Random # generator accuracy
Thanks Greg, that was definitely it. So apparently the default is sampling without replacement. Fine, but this brings up a question I've had for a while now: how do you know what the default settings are for the arguments of any given function? The HTML help files don't seem to indicate this in many (most) cases. Thanks.

Try adding replace=TRUE to your call to sample; then you will get numbers closer to what you are expecting. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Bouldin Sent: Thursday, July 23, 2009 12:00 PM To: r-help@r-project.org Subject: [R] Random # generator accuracy
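Greg's fix and Jim's follow-up question can both be sketched in a few lines. Defaults are listed in the Usage section of a help page such as ?sample, and they can also be queried with args() or formals(). The simulation uses 100,000 replicates instead of the thread's 1,000,000 to stay quick, and the seed is arbitrary. The weighted draws should average close to the weighted mean of 7.5 with replace = TRUE, while the default replace = FALSE draws should fall short of it.

```r
x <- 1:12
weights <- rep(c(1, 2), each = 6)

args(sample)                                  # shows the formal arguments with their defaults
default_replace <- formals(sample)$replace    # the default value of 'replace'

set.seed(1)
n <- 100000                                   # fewer replicates than the thread, to keep it quick
with_repl    <- mean(replicate(n, sample(x, 3, replace = TRUE, prob = weights)))
without_repl <- mean(replicate(n, sample(x, 3, prob = weights)))

target <- sum(x * weights) / sum(weights)     # weighted mean of x
c(target = target, with = with_repl, without = without_repl)
```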
Re: [R] Random # generator accuracy
You are absolutely correct, Ted. When no weights are applied, it doesn't matter whether you sample with or without replacement, because the probability of choosing any particular value is equally distributed among all values. But when they're weighted unequally, that's not the case. It is also interesting to note what happens if the problem is set up slightly differently. Say we define the vector as x = c(1,2,3,4,5,6,7,7,8,8,9,9,10,10,11,11,12,12), which effectively gives the 12 integers the same selection probabilities as before. Then the same problem does not arise, or at least not as severely:

    > x2
     [1]  1  2  3  4  5  6  7  8  9 10 11 12  7  8  9 10 11 12
    > d = mean(replicate(1000000, (sample(x2, 3)))); d  # (1 million samples from x2, of size 3; the mean should be 7.50)
    [1] 7.499233
    > e = mean(replicate(1000000, (sample(x2, 3, replace = TRUE)))); e  # (1 million samples from x2, of size 3; with replacement this time the mean should still be 7.50)
    [1] 7.502085
    > d/e
    [1] 0.9996198

Jim

To obtain the result you expected, you would need to explicitly specify replace=TRUE, since the default for replace is FALSE (though probably what you really intended was sampling without replacement). Read carefully what is said about prob in '?sample': when replace=FALSE, the probability of inclusion of an element is not proportional to its weight in 'prob'. The reason is that elements with higher weights are more likely to be chosen early on. This then knocks that higher weight out of the contest, making it more likely that elements with smaller weights will be sampled subsequently. Hence the mean of the sample will be biased slightly downwards, relative to the weighted mean of the values in x.
    table(replicate(1000000, (sample(x, 3))))
    #      1      2      3      4      5      6
    # 250235 250743 249603 250561 249828 249777
    #      7      8      9     10     11     12
    # 249780 250478 249591 249182 249625 250597

(so all nice equal frequencies)

    table(replicate(1000000, (sample(x, 3, prob=weights))))
    #      1      2      3      4      5      6
    # 174873 175398 174196 174445 173240 174110
    #      7      8      9     10     11     12
    # 325820 326140 325289 325098 325475 325916

Note that the frequencies of the values with weight=2 are a bit less than twice the frequencies of the values with weight=1:

    (325820+326140+325289+325098+325475+325916)/
      (174873+175398+174196+174445+173240+174110)
    # [1] 1.867351

In fact this is fairly easily calculated. The possible combinations (in order of sampling) of the two weights, with their probabilities, are:

                                          1s  2s
                                          --  --
    1 1 1   P = 6/18 * 5/17 * 4/16     3   0
    1 1 2   P = 6/18 * 5/17 * 12/16    2   1
    1 2 1   P = 6/18 * 12/17 * 5/15    2   1
    1 2 2   P = 6/18 * 12/17 * 10/15   1   2
    2 1 1   P = 12/18 * 6/16 * 5/15    2   1
    2 1 2   P = 12/18 * 6/16 * 10/15   1   2
    2 2 1   P = 12/18 * 10/16 * 6/14   1   2
    2 2 2   P = 12/18 * 10/16 * 8/14   0   3

So the expected number of weight=1 in the sample is

    3*(6/18 * 5/17 * 4/16) + 2*(6/18 * 5/17 * 12/16) + 2*(6/18 * 12/17 * 5/15) + 1*(6/18 * 12/17 * 10/15)
      + 2*(12/18 * 6/16 * 5/15) + 1*(12/18 * 6/16 * 10/15) + 1*(12/18 * 10/16 * 6/14) + 0 = 1.046218

Hence the expected number of weight=2 in the sample is 3 - 1.046218 = 1.953782, and their ratio is 1.953782/1.046218 = 1.867471. Compare this with the value 1.867351 (above) obtained by simulation! Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 21:05:07 -- XFMail --

Jim Bouldin
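Ted's enumeration can be checked mechanically. The sketch below is an illustration, not code from the thread. It walks all eight orders in which the two weight classes can be drawn and multiplies the step probabilities exactly as in the table above. It assumes, as ?sample describes, that sample(x, 3, prob = weights) picks each successive item with probability proportional to its weight among the items not yet drawn. The helper prob_seq() is hypothetical.

```r
# Six items of weight 1 (values 1-6) and six of weight 2 (values 7-12).
n1 <- 6; n2 <- 6; w1 <- 1; w2 <- 2

# Every order in which the weight classes can be drawn: 1 = weight-1 item, 2 = weight-2 item.
seqs <- expand.grid(d1 = 1:2, d2 = 1:2, d3 = 1:2)

# Probability of one particular order under sequential weighted sampling without replacement.
prob_seq <- function(s) {
  r1 <- n1; r2 <- n2; p <- 1
  for (cls in s) {
    total <- r1 * w1 + r2 * w2              # total weight still available
    if (cls == 1) {
      p <- p * r1 * w1 / total; r1 <- r1 - 1
    } else {
      p <- p * r2 * w2 / total; r2 <- r2 - 1
    }
  }
  p
}

p    <- apply(seqs, 1, prob_seq)
ones <- rowSums(seqs == 1)                  # number of weight-1 items in each order

e1    <- sum(p * ones)                      # expected count of weight-1 items
e2    <- 3 - e1                             # expected count of weight-2 items
ratio <- e2 / e1

# Within each weight class every value is equally likely, so the expected sample mean is:
mean_exact <- (e1 * mean(1:6) + e2 * mean(7:12)) / 3
round(c(e1 = e1, e2 = e2, ratio = ratio, mean = mean_exact), 6)
```

The exact expected mean, about 7.4076, is close to Jim's simulated 7.406977. This is consistent with Ted's explanation that the shortfall from 7.5 comes from sequential weighted sampling, not from the random number generator.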
Re: [R] Random # generator accuracy
Perfectly explained, Ted. At first reflection, one might think that simply repeating the values 7 through 12 and sampling (without replacement) from the 18 resulting values would be equivalent to doubling the selection probabilities for 7 through 12 and then sampling. That would clearly not be true, though. Jim

Whereas, if you replace x = c(1,2,3,4,5,6,7,8,9,10,11,12) with the weighted equivalent, doubling up 7-12 as in your x2 = c(1,2,3,4,5,6,7,7,8,8,9,9,10,10,11,11,12,12), each of the 18 items now has the same weight as the others. The unweighted sampling mean(replicate(1000000, (sample(x2, 3)))) then gives the mean of the 18 values (7.5). By contrast, as my calculation showed, sequential weighting biases the result slightly downwards: in your example, and in general, it favours the items with lower weights. This is because the way weighting works in sample() is not equivalent to replicating each item 'weight' times. Consider the general problem of sampling without replacement so that each item's probability of being included in the sample is proportional to a pre-assigned weight (sampling with probability proportional to size). That problem is quite tricky and, for certain choices of weights, impossible. For a glimpse of what's inside the can of worms, have a look at the reference manual for the 'sampfling' package, in particular the function samprop(): http://www.stats.bris.ac.uk/R/web/packages/sampfling/sampfling.pdf Ted.

On 23-Jul-09 20:56:43, Jim Bouldin wrote: You are absolutely correct Ted. When no weights are applied it doesn't matter if you sample with or without replacement, because the probability of choosing any particular value is equally distributed among all such. But when they're weighted unequally that's not the case.
[R] unloading loaded packages
I can't seem to find information on how to unload packages that have been loaded. My goal in doing so is to regain access to functions that have been masked by those packages. Or is there another way to do this? Thanks in advance. Jim Bouldin
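Both routes can be sketched with base R calls only. detach() removes an attached package from the search path. The pkg::fun form reaches a masked function without detaching anything. The splines package (shipped with R) and the masking function defined in the global environment are stand-ins chosen for illustration.

```r
# Route 1: take an attached package off the search path again.
library(splines)                            # stand-in package that ships with R
attached_before <- "package:splines" %in% search()
detach("package:splines")                   # unload = TRUE would also try to unload its namespace
attached_after <- "package:splines" %in% search()

# Route 2: keep everything attached and call the masked function through its namespace.
mean <- function(x, ...) "masked"           # hypothetical function that masks base::mean
masked   <- mean(1:4)                       # the global-environment version is found first
original <- base::mean(1:4)                 # the pkg::fun form bypasses the masking
rm(mean)                                    # remove the masking definition again
```

Unloading with detach(..., unload = TRUE) can fail with a warning when other loaded namespaces import the package. In that case the :: form is the simpler option.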