[R] changing the x axis labels in a time series plot

2012-07-14 Thread Jim Bouldin

OK, this has to be simple but I've searched through help files, mailing
list archives and well, everything I could think of, and still no luck.

I simply want to change the x axis labels in a time series graph, from its
default numbering (which starts at 1 and increments by 1), to values I have
in another vector, Year.  It has to be a time series graph, I don't want
to have to use a scatter plot because there are many lines to draw.

Example:

z = cbind(1:100,100:1); Year = 1322:1421
windows()
plot.ts(z[,1:2],,"single", xaxt="n", xlab="")
axis(1,at=Year)

This doesn't work, nor do any of the permutations I've tried with the various
arguments to plot.ts and axis.
Thanks for any help.
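
A possible explanation, offered as a sketch rather than as the list's answer: plot.ts draws the series against an index running from 1 to 100, so axis(1, at=Year) asks for ticks at 1322..1421, which lie outside the plotted range and are never drawn. Two options are shown below. The tick spacing of 20 and the axis label "Year" are arbitrary choices made for illustration.

```r
z <- cbind(1:100, 100:1)
Year <- 1322:1421

# Option 1: attach the years to the series, so plot.ts labels the axis itself
z_ts <- ts(z, start = Year[1])
plot(z_ts, plot.type = "single", xlab = "Year")

# Option 2: keep the 1..100 index, suppress the default axes, and draw them by hand
plot.ts(z, plot.type = "single", axes = FALSE, xlab = "Year")
ticks <- seq(1, length(Year), by = 20)   # positions on the 1..100 index scale
axis(1, at = ticks, labels = Year[ticks])
axis(2)
box()
```

Option 1 is usually less work when many lines are drawn, since every later call to lines() on a ts object with the same start shares the year scale.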

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] extract the p value

2011-10-24 Thread Jim Bouldin

OK, what is the trick to extracting the overall p value from an lm object?
It shows up in the summary(lm(model)) output but I can't seem to extract it:

 test2 = apply(aa, 1, function(x) summary(lm(x[,1] ~ 0 + x[,3] + x[,6])))
 test2[[1]]

Call:
lm(formula = x[, 1] ~ 0 + x[, 3] + x[, 6])

[omitted summary output]
F-statistic: 40.94 on 2 and 7 DF,  p-value: 0.0001371

It does not seem to be obtainable from anova(lm(model)) either, only the p
values for the individual predictors.
Stumped.
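
For reference, a minimal sketch of one way to get at it: the summary object stores the F statistic and its degrees of freedom in its fstatistic component, but not the p-value, which is computed only when the summary is printed. It can be recomputed with pf(). The helper name overall_p and the use of the built-in mtcars data are assumptions made for illustration.

```r
# Overall F-test p-value, recomputed from the F statistic stored in the summary
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic          # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

fit <- lm(mpg ~ 0 + wt + hp, data = mtcars)   # built-in data, for illustration only
overall_p(fit)
```

Inside an apply() or lapply() call the same helper can be applied to each fitted model to return a plain numeric vector of p-values.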

Jim Bouldin
Research Ecologist

[R] converting object elements to variable names and making subsequent assignments thereto

2011-09-23 Thread Jim Bouldin

This has got to be incredibly simple but I nevertheless can't figure it out
as I am apparently brain dead.

I just want to convert the elements of a character vector to variable names,
so as to then assign formulas to them, e.g:
z = c("model1","model2"); I want to assign formulas, such as lm(y~x[,1]) and
lm(y~x[,2]), to the variables model1 and model2.

There are of course, many more than 2 models involved, so brute force is the
option of absolute last resort.
Thanks for any help.
-- 
Jim Bouldin, Research Ecologist

Re: [R] converting object elements to variable names and making subsequent assignments thereto

2011-09-23 Thread Jim Bouldin

Yes, I tried to do it using assign.  I couldn't get that to work.  E.g:

 z=1:2; zz=rep("model",2);zzz = paste(zz,z,sep='');zzz
[1] "model1" "model2"
 y = 1:10; v = rnorm(10,0,2); x2 = y + v; x3 = y + v^0.5
 x = data.frame(x2,x3)
 for (i in 1:2){assign(zzz[i],lm(y~x[,i]))};zzz
[1] "model1" "model2"

stumped
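
For what it is worth, the loop above does create model1 and model2; the trailing zzz only prints the vector of names. A minimal sketch of both routes follows, using simulated data. The names model_names and models are hypothetical, and abs(v) is used here only to avoid NaN from the square root of negative values.

```r
set.seed(1)
y <- 1:10
v <- rnorm(10, 0, 2)
x <- data.frame(x2 = y + v, x3 = y + abs(v)^0.5)
model_names <- paste0("model", 1:2)

# Route 1: assign() creates the objects; get() retrieves one by name
for (i in seq_along(model_names)) assign(model_names[i], lm(y ~ x[, i]))
coef(get("model1"))

# Route 2: keep the fits in a named list and index by name
models <- setNames(lapply(seq_along(model_names), function(i) lm(y ~ x[, i])),
                   model_names)
coef(models[["model2"]])
```

The list route tends to scale better with many models, because lapply() and sapply() can then be run over all fits at once.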


On Fri, Sep 23, 2011 at 1:08 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 The usual response to this sort of question is something like the
 following:

 assign() will do what you want; get() runs the other direction. But the
 more R way to do it is to put all the models in a list.

 Michael

-- 
Jim Bouldin, PhD
Research Ecologist

Re: [R] converting object elements to variable names and making subsequent assignments thereto

2011-09-23 Thread Jim Bouldin

OK, I see.  I thought R was just returning the character strings of the
model names without doing any assigning, since that's what it displayed. I
had it right all along. Thanks for your help.

On Fri, Sep 23, 2011 at 1:45 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 What exactly is the problem? Like I said, I'd personally put this in a
 list, but this seems like exactly what you wanted...

  model1

 Call:
 lm(formula = y ~ x[, i])

 Coefficients:
 (Intercept)   x[, i]
  1.0489   0.7175

  model2

 Call:
 lm(formula = y ~ x[, i])

 Coefficients:
 (Intercept)   x[, i]
 -0.4342   0.8734


Re: [R] converting object elements to variable names and making subsequent assignments thereto

2011-09-23 Thread Jim Bouldin

OK.  I was assuming that the call to zzz would print the model formulae, not
the object names.  That's what threw me.
Jim

On Fri, Sep 23, 2011 at 1:59 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 assign() doesn't return anything in this case. It's your additional
 (unnecessary?) call to zzz at the end which triggers a print statement.

 Michael


-- 
Jim Bouldin, PhD
Research Ecologist

[R] functions on rows or columns of two (or more) arrays

2011-08-04 Thread Jim Bouldin

I realize this should be simple, but even after reading over the several
help pages several times, I still cannot decide between the myriad apply
functions to address it.  I simply want to apply a function to all the rows
(or columns) of the same index from two (or more) identically sized arrays
(or data frames).

For example:

 a=matrix(1:50,nrow=10)
 a2=floor(jitter(a,amount=50))
 a
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1   11   21   31   41
 [2,]    2   12   22   32   42
 [3,]    3   13   23   33   43
 [4,]    4   14   24   34   44
 [5,]    5   15   25   35   45
 [6,]    6   16   26   36   46
 [7,]    7   17   27   37   47
 [8,]    8   18   28   38   48
 [9,]    9   19   29   39   49
[10,]   10   20   30   40   50
 a2
      [,1] [,2] [,3] [,4] [,5]
 [1,]   31   56  -29  -13   10
 [2,]   38   61   71   55    9
 [3,]  -29   38   47   12   38
 [4,]   12    2   43   39   93
 [5,]  -43   23  -23   62    1
 [6,]  -13   61   55   11    2
 [7,]  -42    1   38   12    8
 [8,]  -13   -6  -18   16   95
 [9,]  -19   -2   78   33    1
[10,]   20  -16  -11   19   17

if I try the following for example:
apply(a,1,function(x) lm(a~a2))

I get 10 identical repeats (except for the list indexer) of the following:

[[1]]

Call:
lm(formula = a ~ a2)

Coefficients:
 [,1]   [,2]   [,3]   [,4]   [,5]
(Intercept)   8.372135  18.372135  28.372135  38.372135  48.372135
a21  -0.006163  -0.006163  -0.006163  -0.006163  -0.006163
a22  -0.093390  -0.093390  -0.093390  -0.093390  -0.093390
a23   0.009315   0.009315   0.009315   0.009315   0.009315
a24  -0.015143  -0.015143  -0.015143  -0.015143  -0.015143
a25  -0.026761  -0.026761  -0.026761  -0.026761  -0.026761

...Which is clearly very wrong, in a number of ways.  If I try by columns:
apply(a,2,function(x) lm(a~a2))
...I get exactly the same result.

So, which is the appropriate apply-type function when two arrays (or
d.f.'s?) are involved like this? Or none of them and some other approach
(other than looping which I can do but which I assume is not optimal)?
Thanks for any help.
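
A note and a sketch, assuming the goal is one regression per row: apply() walks over a single array, and the anonymous function above never uses its argument x, so every iteration fits the same lm(a ~ a2) on the full matrices. One option is to iterate over a shared row index instead. The names row_fits and row_coefs are hypothetical.

```r
set.seed(1)
a  <- matrix(1:50, nrow = 10)
a2 <- floor(jitter(a, amount = 50))

# One fit per row: regress row i of a on row i of a2
row_fits <- lapply(seq_len(nrow(a)), function(i) lm(a[i, ] ~ a2[i, ]))

# Collect the coefficients into a 2 x 10 matrix (one column per row of a)
row_coefs <- sapply(row_fits, coef)
```

For columns, the same pattern works with seq_len(ncol(a)) and a[, j] ~ a2[, j]. mapply() is an alternative when the inputs are lists of vectors rather than matrices.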
-- 
Jim Bouldin, PhD
Research Ecologist

[R] object names from character strings

2010-12-26 Thread Jim Bouldin


I realize this is probably pretty basic but I can't figure it out.

I'm looping through an array, doing various calculations and producing a 
resulting data frame in each loop iteration.  I need to give each data 
frame a different name.  Although I can easily create a new character 
string for writing each frame to an output file, I cannot figure out how 
to convert such strings to corresponding object names within the R 
workspace itself, so as to give each d.f. a distinct name.  The closest 
I got were various attempts with the as.name function, but couldn't get 
that to work either.  Any help appreciated.  Thanks.


--
Jim Bouldin, PhD
Research Ecologist

[R] nls error regarding numerics vs logicals

2010-07-09 Thread Jim Bouldin


I am trying to perform an nls for a valid negative exponential function:

zz=nls(y~constant+a.est*2.7183^(b.est*x),start=list(constant=4.0,a.est=-4,b.est=-.005),trace=T)

and am getting a number of different error messages, the most problematic
of which is:

Error in nls(ring.area ~ constant + a.est * 2.7183^(b.est * ba.beg), start = list(constant = 4,  : 
  REAL() can only be applied to a 'numeric', not a 'logical'

I can't see where there are any logicals in this equation to cause this
problem.  Any help appreciated. Thank you.
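
The cause of the REAL() message cannot be determined from the post alone. The sketch below, on simulated data, only shows one way to rule out type problems before fitting: check that every column is numeric, and pass the variables through the data argument. The data frame d, its column names, and the noise level are assumptions; exp() is used in place of 2.7183^.

```r
set.seed(1)
# Simulated data; the true values 4, -4, -0.005 mirror the starting values in the post
d <- data.frame(x = seq(0, 1000, by = 10))
d$y <- 4 - 4 * exp(-0.005 * d$x) + rnorm(nrow(d), sd = 0.05)

# Both columns should be numeric (not logical, factor, or character)
stopifnot(sapply(d, is.numeric))

zz <- nls(y ~ constant + a.est * exp(b.est * x),
          data  = d,
          start = list(constant = 3.5, a.est = -3.5, b.est = -0.004))
coef(zz)
```

If a column of the real data is entirely NA, R stores it as logical, which is one situation worth checking for with str() before calling nls().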

Jim Bouldin

Re: [R] nls error regarding numerics vs logicals

2010-07-09 Thread Jim Bouldin


 1.  The expression you gave us is clearly not the one that produced the 
 error:  it involved ring.area and ba.beg.
 
 2.  You don't tell us what x and y are, so we can't reproduce anything.

Sorry, I guess that was unclear.  I changed the response and independent
variable names to y and x respectively, in hopes that would be clearer.
Both are numeric variables.
Jim

[R] quantiles on rows of a matrix

2010-07-07 Thread Jim Bouldin


I'm trying to obtain the mean of the middle 95% of the values from each row
of a matrix  (that is, the highest and lowest 2.5% of values in each row
are removed before calculating the mean).  I am having all sorts of
problems with this; for example the command:

apply(matrix1,1,function(x) quantile(c(.05,.90),na.rm=T)) 

returns the exact same quantile values for each row, which is clearly
wrong.  But even if the values were right, I'm not sure how I would then
translate those quantile values into another apply function to get the
mean, since they differ from row to row.

I also tried:
apply(matrix,1,mean,na.rm=T,trim=.05)
and the trim argument was simply ignored.


Stumped. Any help appreciated. Thanks.
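
A sketch of one possible approach. In the first call above, quantile() is applied to c(.05,.90) rather than to x, which is why every row returns the same values; the probabilities belong in the probs argument. The helper name middle_mean and the small test matrix are hypothetical, and the 2.5% and 97.5% cut points follow the description in the post.

```r
# Mean of the values lying between the lower and upper quantiles of x
middle_mean <- function(x, lower = 0.025, upper = 0.975) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  mean(x[x >= q[1] & x <= q[2]], na.rm = TRUE)
}

# Two rows, each with one extreme value that should be excluded
m <- rbind(c(1:99, 1000), c(-1000, 1:99))
row_means <- apply(m, 1, middle_mean)
```

mean(x, trim = 0.025) is a related built-in: it drops a fixed fraction of the sorted observations from each end rather than cutting at interpolated quantile values, so the two results can differ slightly.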


Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] removing duplicate rows

2010-05-11 Thread Jim Bouldin


I'm trying to identify and remove rows in a data frame that are duplicated
only on particular columns within it (i.e. not on all columns).  The
unique function looks for uniqueness across all columns of a data frame.
 Identifying unique rows based only on specific columns of interest returns
only those columns, not all of the columns in the original frame.  I tried
this, and then added an identifier column to this truncated data frame, and
then tried merging this with the original data frame and selecting only
those rows containing the identifier.  But this did not work no matter how
the arguments were altered: all records were returned instead of the
uniques.  Completely stumped--any help appreciated. Thanks.
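
A minimal sketch, assuming a data frame with hypothetical columns site, year and value: duplicated() can be run on just the columns of interest, and the resulting logical vector then indexes the rows of the full frame, so all columns are kept.

```r
df <- data.frame(site  = c("A", "A", "B", "B", "C"),
                 year  = c(2001, 2001, 2001, 2002, 2001),
                 value = c(1.2, 3.4, 5.6, 7.8, 9.0))

key_cols <- c("site", "year")

# Keep the first row for each site/year combination, with every column intact
dedup <- df[!duplicated(df[, key_cols]), ]
```

This keeps the first occurrence of each combination; duplicated(..., fromLast = TRUE) keeps the last one instead.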



Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] splitting character strings and converting to numeric vectors

2010-05-06 Thread Jim Bouldin


This seemingly should be quite simple but I can't solve it:

I have a long character vector of geographic data (data frame column named
XY) whose elements vary in length (from 11 to 14 chars).  Each element is
structured as a set of digits, then an underscore, then more digits, e.g:

 data.frame(head(as.character(XY)))
  head.as.character.XY..
1 -448623_854854
2 -448563_854850
3 -448442_854842
4 -448301_854833
5 -448060_854818
6 -446828_854736

I simply need to separate the two sets of digits from each other and assign
them into new columns.  The closest I've been able to get is by:

 test=t(as.matrix(data.frame(head(strsplit(as.character(XY), "\\_")))))
 test
                       [,1]      [,2]
c...448623....854854.. "-448623" "854854"
c...448563....854850.. "-448563" "854850"
c...448442....854842.. "-448442" "854842"
c...448301....854833.. "-448301" "854833"
c...448060....854818.. "-448060" "854818"
c...446828....854736.. "-446828" "854736"

So far so good, but  columns 1:2 will not coerce to either numeric or
integer, for unknown reasons.  Thanks for any help (and/or suggestions on a
better way to code this).
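
A sketch of a shorter route, with a guess at the cause: the matrix above holds character strings, and as.numeric() applied to a whole matrix returns a plain vector without dimensions, which may be why the coercion looked as if it had failed. Converting one column at a time avoids that. The three sample values are taken from the post; the names parts and coords are hypothetical.

```r
XY <- c("-448623_854854", "-448563_854850", "-448442_854842")

# Split on the underscore and stack the pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(as.character(XY), "_", fixed = TRUE))

# Convert each column separately so the result is a numeric data frame
coords <- data.frame(X = as.numeric(parts[, 1]),
                     Y = as.numeric(parts[, 2]))
```

The two new columns can then be attached to the original data frame with cbind().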



Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] NAs and row/column calculations

2010-03-11 Thread Jim Bouldin


I continue to have great frustrations with NA values--in particular making
summary calculations on rows or cols of a matrix containing them.  For
example, why does:

 a = matrix(1:30,nrow=5)
 is.na(a[c(1:2),c(3:4)]) = TRUE;a
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6   NA   NA   21   26
[2,]    2    7   NA   NA   22   27
[3,]    3    8   13   18   23   28
[4,]    4    9   14   19   24   29
[5,]    5   10   15   20   25   30
 apply(a[!is.na(a)],2,sum)

give me this:

Error in apply(a[!is.na(a)], 2, sum) : dim(X) must have a positive length

when

 dim(a)
[1] 5 6

What is the trick to calculating summary values from rows or columns
containing NAs?  Drives me nuts.  More nuts that is.

Thanks.
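
A short sketch of what is going on, using the matrix from the post: indexing a matrix with a logical mask of the same shape returns a plain vector, so the dimensions that apply() needs are gone. Passing na.rm = TRUE to the summary function keeps the matrix intact. The names v, col_sums and row_means are hypothetical.

```r
a <- matrix(1:30, nrow = 5)
a[1:2, 3:4] <- NA

# Logical indexing with a matrix-shaped mask returns a plain vector, so dim is lost
v <- a[!is.na(a)]
dim(v)        # NULL
length(v)     # 26

# Drop the NAs inside the summary function instead
col_sums  <- apply(a, 2, sum, na.rm = TRUE)
row_means <- rowMeans(a, na.rm = TRUE)
```

Extra arguments after the function name in apply() are passed on to that function, which is how na.rm reaches sum(). colSums(a, na.rm = TRUE) gives the same column totals directly.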




Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

Re: [R] NAs and row/column calculations

2010-03-11 Thread Jim Bouldin


 
 On 12/03/2010, at 11:25 AM, Jim Bouldin wrote:
 
  
  I continue to have great frustrations with NA values--in particular
 making
  summary calculations on rows or cols of a matrix containing them.  For
  example, why does:
  
  a = matrix(1:30,nrow=5)
  is.na(a[c(1:2),c(3:4)]) = TRUE;a
       [,1] [,2] [,3] [,4] [,5] [,6]
  [1,]    1    6   NA   NA   21   26
  [2,]    2    7   NA   NA   22   27
  [3,]    3    8   13   18   23   28
  [4,]    4    9   14   19   24   29
  [5,]    5   10   15   20   25   30
  apply(a[!is.na(a)],2,sum)
  
  give me this:
  
  Error in apply(a[!is.na(a)], 2, sum) : dim(X) must have a positive
 length
  
  when
  
  dim(a)
  [1] 5 6
  
  What is the trick to calculating summary values from rows or columns
  containing NAs?  Drives me nuts.  More nuts that is.
 
 When you do a[!is.na(a)] you get a ***vector*** --- not a matrix.
 ``Obviously''!!!  

Well, obvious to you maybe, or someone who's done it before, but not to me.

The non-missing values of a cannot be arranged in
 a 5 x 6 matrix; there are only 26 of them.  So (as my late Uncle
 Stanley would have said) ``What the hell do you expect?''.

Silly me, I expected, based on (1) previous experience doing summary calcs
on subsets of a matrix using exactly that style of command, and (2) the
fact that dim(a) returns: [1] 5 6, and (3) the fact that a help search
under the apply function gives NO INDICATION of any possible use of the
na.rm command, AND (4) a help search on na.action does not even mention
na.rm, that:
 
 apply(a[!is.na(a)],2,sum)

would sum the non-NA elements of matrix a, by columns.  Terribly faulty
reasoning on my part, obviously.


 
 The ``trick'' is to remove the NAs at the summing stage:
 
 apply(a,2,sum,na.rm=TRUE)
 
 Not all that tricky.
 
   cheers,
 
   Rolf Turner
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] assigning a file name, or part of, to an object

2010-01-22 Thread Jim Bouldin


Is there a way to capture all, or part, of a filename and assign it to an
object?  Say I wanted to read in a file titled "example.txt" and then assign
the character string "example" (or "exa" or any other substring of
"example" for that matter), to object a.  Is there a simple way to do so? 
Thanks in advance for any help.
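
A minimal sketch, assuming the file name is already held in a character string (here the hypothetical variable path): basename() strips any directory part, file_path_sans_ext() from the tools package strips the extension, and substr() takes any part of what is left.

```r
path <- "example.txt"                              # hypothetical file name

a  <- tools::file_path_sans_ext(basename(path))    # "example"
a3 <- substr(a, 1, 3)                              # "exa"

# dat <- read.table(path)   # reading the file itself is a separate step
```

When many files are involved, list.files() returns their names as a character vector, and the same calls work on the whole vector at once.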



Jim Bouldin
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] calculations on columns with partially matching names

2010-01-03 Thread Jim Bouldin


Is there a command for partial matching of character strings? Specifically,
I'd like to be able to calculate the mean of the values in any columns in a
data frame or matrix that have identity in part of their column names.  For
example, columns labeled "mpw06a" and "mpw06b" match on the first five
characters; their mean would be taken whereas any columns beginning with
other than "mpw06" would be excluded.  I need to compare every pair of
columns in the frame, and in some cases, possibly three at a time. 

Thanks in advance for any ideas.
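
One way this could be sketched, assuming the shared part of the name is always the first five characters: take that prefix with substr(), group the column names by it with split(), and compute row means within each group. The data frame d and its values are made up for illustration.

```r
d <- data.frame(mpw06a = c(1, 2, 3),
                mpw06b = c(3, 4, 5),
                mpw07a = c(10, 20, 30))

# The first five characters of each column name define the group
prefix <- substr(names(d), 1, 5)

# Row-wise mean across all columns sharing a prefix (two, three, or more)
group_means <- sapply(split(names(d), prefix),
                      function(cols) rowMeans(d[, cols, drop = FALSE], na.rm = TRUE))
```

For a single known prefix, grep("^mpw06", names(d)) returns the matching column positions, which can be used to index the data frame directly.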




Jim Bouldin
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] nls error message

2009-12-28 Thread Jim Bouldin


When I try to run the following non-linear regression with  variables
index1 and prl3:

 beta = 4
 nls(index1~beta*(1/prl3),start = list(beta = 4))

I get this error message:

Error in nls(index1 ~ beta * (1/prl3), start = list(beta = 4)) : 
  REAL() can only be applied to a 'numeric', not a 'logical'

I've got no clue as to the REAL() to which this is referring.  Any help
appreciated. Thanks in advance.


Jim Bouldin
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] no html help upon upgrading to 2.10

2009-12-04 Thread Jim Bouldin


I just upgraded from 2.8.1 to 2.10 on Windows Vista.  BIG MISTAKE
apparently because now when I type:

 help(functionname)

or
?functionname

I get only a small text window giving some very basic info on the topic, e.g.:

base-package               package:base               R Documentation

The R Base Package

Description:

 Base R functions

Details:

 This package contains the basic functions which let R function as
 a language: arithmetic, input/output, basic programming support,
 etc.  Its contents are available through inheritance from any
 environment.

 For a complete list of functions, use library(help=base).


and not the html help screen with full package or function description like
I used to.  Exceedingly problematic, and I can find nothing either in the
FAQs or the R search sites on what to do.   Solutions much appreciated, thanks.
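
A possible remedy, given as a sketch and assuming the HTML help files were installed with R: from R 2.10.0 the help format is selected by the help_type option, or by the help_type argument of help().

```r
# Make HTML the default help format for the current session
options(help_type = "html")

# Or request it for a single call (this opens a browser, so it is commented out here)
# help("apply", help_type = "html")
```

Placing the options() line in a startup file such as Rprofile.site would make the setting persist across sessions.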


Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] linear regression on groups of consecutive rows of a matrix

2009-11-24 Thread Jim Bouldin


I want to perform linear regression on groups of consecutive rows--say 5 to
10 such--of two matrices.  There are many such potential groups because the
matrices have thousands of rows. The matrices are both of the form:

 shp[1:5,16:20]
  SL495B SL004C SL005C SL005A SL017A
-2649   1.06   0.56 NA NA NA
-2648   0.97   0.57 NA NA NA
-2647   0.46   0.30 NA NA NA
-2646   0.92   0.48 NA NA NA
-2645   0.82   0.48 NA NA NA

That is, they both have NA values, and non-NA values, in the same matrix
positions.  In my attempts so far, I have had two problems.  First, in
using the split function (which I assume is essential here), I am unable to
split the matrices by groups of rows (say rows 1 to 5, 6 to 10, etc):

 shp_split = split(shp,row(shp))

will split the matrix by rows but not by groups thereof. Stumped.

Second, I cannot seem to get rid of the NA values, which would prevent the
regression even if I could figure out how to split the matrices correctly,
e.g.:
 shp_split = split(shp,row(shp))
 shp_split = shp_split[!is.na(shp_split)]
 shp_split[1]
$`1`
  [1] 0.68 0.28 0.43 0.47 0.64 0.40 0.69 0.56 0.62 0.40 1.01 0.67 0.17 1.36
1.84 1.06 0.56   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
  NA   NA   NA etc

IF I solve these problems, will I in fact be able to perform individual
linear regressions on the (numerous) collections of 5 to 10 rows?

Thanks as always for any insight.
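
A sketch of one way to set this up, on simulated data. Instead of splitting the matrix itself, the row indices are split into blocks of five, and each block of both matrices is flattened before the regression. lm() drops incomplete cases by default, so the NAs need no separate handling. The second matrix rw, the block size, and all values are assumptions made for illustration.

```r
set.seed(42)
n_rows <- 20
n_cols <- 6

# Two matrices with NAs in the same positions (rw is a hypothetical response matrix)
shp <- matrix(runif(n_rows * n_cols), n_rows, n_cols)
rw  <- 2 * shp + matrix(rnorm(n_rows * n_cols, sd = 0.1), n_rows, n_cols)
shp[1:10, 5:6] <- NA
rw[is.na(shp)] <- NA

# Block label for each row: rows 1-5 are block 1, rows 6-10 are block 2, and so on
block <- ceiling(seq_len(n_rows) / 5)

# One regression per block, pooling every cell in that block of rows
fits <- lapply(split(seq_len(n_rows), block), function(rows) {
  lm(as.vector(rw[rows, ]) ~ as.vector(shp[rows, ]))
})

block_coefs <- sapply(fits, coef)   # 2 x 4 matrix: intercept and slope per block
```

Changing the divisor in ceiling() changes the block size; overlapping (moving) windows would need a list of index ranges built by hand instead of split().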


Jim Bouldin
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

Re: [R] linear regression on groups of consecutive rows of a matrix

2009-11-24 Thread Jim Bouldin


 But I do feel compelled to ask: Do you really get meaningful  
 information from lm applied to 5 cases? Especially when the predictors  
 used may not be the same from subset to subset???

Thanks again for your help, David.  Your question is a good one. It's a bit
complicated but here are the basics. The predictors are the same between
subsets, in the sense that, for each group of rows (which represent tree
ring years), the predictors and predictands are always from the same set of
trees, even though that set changes slightly between consecutive subsets. 
Typically there will be 20+ observations per year (row), so for 5 rows I
have n = 100+.  For my purposes (removing the effect of tree size on ring
width for small groups of years) that is more than good enough.

Now to try out your suggestion...
Jim


 
 -- 
 David
 
 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT
 
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

Re: [R] consecutive numbering of elements in a matrix

2009-11-22 Thread Jim Bouldin


Many thanks to Dimitris, William and David for very helpful answers which
solved my problem.  Being a relative newb, I am confused by something in the
solutions by Dimitris and David.

#Create a matrix A as follows:

 A <- matrix(sample(50, 21), 7, 3)
 A[sample(21, 5)] <- NA;A

     [,1] [,2] [,3]
[1,]   36   38   24
[2,]    6   33   13
[3,]   12   42   10
[4,]    7   NA   NA
[5,]   48   NA   NA
[6,]    3   NA   47
[7,]   29   23    4

 B = row(A) - apply(is.na(A), 2, cumsum);B

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
[4,]    4    3    3
[5,]    5    3    3
[6,]    6    3    4
[7,]    7    4    5

#But:

 B = row(A) - apply(!is.na(A), 2, cumsum);B
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    1    1
[5,]    0    2    2
[6,]    0    3    2
[7,]    0    3    2

This seems exactly backwards to me.  The is.na(A) command should be
cumulatively summing the NA values and !is.na(A) should be doing so on the
non-NA values.  But the opposite is the case.  I'm glad I have a solution
but this apparent backwardness of expected logic has me worried.
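
A note on the logic, with a short check using the matrix above. The result is not backwards once the subtraction is taken into account: row(A) counts all cells down to the current row, apply(is.na(A), 2, cumsum) counts the NAs so far, and their difference is therefore the running count of non-NA values. Subtracting the non-NA count instead leaves the running count of NAs. The names by_subtraction and direct_count are hypothetical.

```r
A <- cbind(c(36, 6, 12, 7, 48, 3, 29),
           c(38, 33, 42, NA, NA, NA, 23),
           c(24, 13, 10, NA, NA, 47, 4))

by_subtraction <- row(A) - apply(is.na(A), 2, cumsum)   # rows so far minus NAs so far
direct_count   <- apply(!is.na(A), 2, cumsum)           # non-NA values so far

all(by_subtraction == direct_count)

# Number only the non-NA cells, leaving NA where A is NA
B <- direct_count
B[is.na(A)] <- NA
```

The direct form, apply(!is.na(A), 2, cumsum), may be easier to read because it contains no subtraction to reason about.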

I do have another, tougher question if anyone has the time, which is, given
a resulting matrix like B below:

 is.na(B) <- is.na(A);B

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
[4,]    4   NA   NA
[5,]    5   NA   NA
[6,]    6   NA    4
[7,]    7    4    5

how can I rearrange all the columns so that equal values are in the same
row, i.e. in the case above, the NA values are removed from columns 2 and 3
and all non-NA values that had been below them are moved up to replace them.

Thanks again for your help.

Jim

Re: [R] consecutive numbering of elements in a matrix

2009-11-22 Thread Jim Bouldin


Thank you Dimitris, that solves it exactly!  I continue to be amazed at how
a single line of code can be so powerful in R, containing so much
information.  Hard as hell to interpret though (for me).
Jim

 one approach is the following:
 
 B <- cbind(c(1:6, NA), c(1:3, NA,NA,NA, 4), c(1:3, NA,NA, 4,5))
 matrix(B[order(col(B), B)], nrow(B), ncol(B))
 
 
 I hope it helps.
 
 Best,
 Dimitris


Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

[R] consecutive numbering of elements in a matrix

2009-11-21 Thread Jim Bouldin


Within a very large matrix composed of a mix of values and NAs, e.g, matrix A:

     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    3   NA   NA
[3,]    3   10   17
[4,]    4   12   18
[5,]    6   16   19
[6,]    6   22   20
[7,]    5   11   NA

I need to be able to consecutively number, in new columns, the non-NA
values within each column (i.e. A[1,1] A[3,2] and A[3,3] would all be set
to one, and subsequent values in those columns would increase by one, until
the last non-NA value is reached, if any). 

Any ideas?
Thanks


Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

Re: [R] consecutive numbering of elements in a matrix

2009-11-21 Thread Jim Bouldin


Thank you and apologies--I did not make it clear that there are no NAs
mixed in with the valid values.  Rather, they all occur consecutively,
either toward the beginning or end of the column.
Jim

 I didn't know what you wanted to do if there were NA's
 in the middle of a column.
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com  
 
 
 
  
  
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


[R] subsetting from a vector or matrix

2009-09-24 Thread Jim Bouldin


I realize this should be simple but I'm having trouble subsetting vectors
and matrices, for example extracting all values meeting a certain
criterion from a vector. I cannot seem to figure out the correct syntax,
and the help page is not very helpful.  Or should I be using some function
other than subset?  Thanks for any help.
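
For reference, a small sketch of the usual idiom: a logical condition inside square brackets. The objects v and m are invented for illustration and are not from the post.

```r
v <- c(4, 9, 1, 7, 12)
big <- v[v > 5]                 # elements of v greater than 5

m <- matrix(1:12, nrow = 4)     # columns hold 1:4, 5:8 and 9:12
rows <- m[m[, 1] > 2, ]         # rows whose first column exceeds 2
vals <- m[m > 6]                # all entries above 6, returned as a vector

also_big <- subset(v, v > 5)    # subset() gives the same result as v[v > 5]
```

The comma matters for matrices: m[cond, ] selects whole rows, while m[cond] with a matrix-shaped condition returns the matching entries as a plain vector.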

Jim Bouldin


Re: [R] problem selecting rows meeting a criterion

2009-08-11 Thread Jim Bouldin


No problem John, thanks for your help, and also thanks to Dan and Patrick.
Wasn't able to read or try anybody's suggestions yesterday.  Here's what
I've discovered in the meantime:

What I did not include yesterday is that my original data frame, called
data, was this:

   X Y   V3
1  1 1 0.00
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.00
6  1 2 8.062258
7  2 2 0.00
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
13 3 3 0.00
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
19 4 4 0.00
20 5 4 5.385165
21 1 5 5.00
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165
25 5 5 0.00

To this data frame I applied the following command:

data <- data[data$V3 > 0,];data #to remove all rows where V3 = 0

giving me this (the point from which I started yesterday):

   X Y   V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.00
6  1 2 8.062258
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
20 5 4 5.385165
21 1 5 5.00
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165

So far so good.  But when I then submit the command
 data = data[X>Y,] #to select all rows where X > Y

I get the problem result already mentioned, namely:

   X Y   V3
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.00
6  1 2 8.062258
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
17 2 4 2.236068
18 3 4 8.062258
24 4 5 5.385165

which is clearly wrong!  It doesn't matter if I give a new name to the data
frame at each step or not, or whether I use the name data or not.  It
always gives the same wrong answer.

However, if I instead use the command:
subset(data, X>Y), I get the right answer, namely:

   X Y   V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.00
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
14 4 3 8.062258
15 5 3 5.099020
20 5 4 5.385165

OK so the lesson so far is use the subset function.  But here it gets
weirder.  If I instead go straight from the initial data frame (data,
given at the top of this post), selecting only rows where X>Y (without the
intermediate step of removing rows with V3 = 0, which, although unnecessary
for getting the result I want, is very relevant to the larger issue here),
by using the command that caused me the original trouble (data =
data[X>Y,]), I get the RIGHT answer (the data frame just above).  The
subset function also gives the right answer. Now what in the world is going
on?  This kind of thing scares me.

Below is the full set of commands starting from scratch: 

#Point of the following is to measure the pairwise euclidean distances
#between 5 objects, each having X and Y coordinates
#and put them into data frame format that labels each pair and gives the
#distance between them

d = data.frame(x=sample(1:10, 5), y=sample(1:10, 5)) #create a sample data set
ss2 = as.data.frame(as.matrix(dist(d))) #create a data.frame to extract row
#and column names
X = rep(seq(1:length(row.names(ss2))), length(names(ss2))) #make a vector
#containing the X coordinate names
Y = rep(seq(1:length(names(ss2))), length(row.names(ss2))) #the same for Y
Y = sort(Y) #first sort
coords = cbind(X, Y);rm(X,Y) #then cbind and remove X and Y
data1 = as.data.frame(cbind(coords,
as.vector(as.matrix(dist(d)))));rm(coords) # column bind the 3 vectors
data2 = data1[data1$V3 > 0,] #remove those with V3 = 0 (= the original
#matrix diagonal)
data3 = data2[X>Y,] #remove duplicates from original distance matrix
data1;data2;data3

Thoughts much appreciated.  Thanks.
Jim Bouldin

 
 Clearly I was more tired than I realised last night. :( My apologies.
 
 In any case with the data.frame name changed to xx this seems to give you
 what you want
 
   subset(xx, xx[,1] > xx[,2])
 
 or using the data name
   subset(data, data[,1] > data[,2])
 should work as well


Re: [R] problem selecting rows meeting a criterion

2009-08-11 Thread Jim Bouldin


Yes, thanks Steve and also to everyone else for helping me clear this up.
The issue was definitely the existence of other objects named X and Y that
I inadvertently referred to in my command statement.  Only when these
objects are removed AND the data frame in question is attached, will the
command I originally used work.  However, I see that it is much easier to
just use the subset function or perhaps the with function. Seems that R has
many painful lessons to teach. Thanks again.
Jim Bouldin

 This won't work in general, and is probably only working in this  
 particular case because you already have defined somewhere in your  
 workspace vars named X and Y.
 
 What you wrote above isn't taking the values X,Y from data$X and
 data$Y, respectively, but rather from var X and Y defined elsewhere.
 
 Instead of doing data[X > Y], do:
 
 data[data$X > data$Y,]
 
 This should get you what you're expecting.
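
A sketch of how stale X and Y could produce exactly the wrong rows reported in this thread. It assumes, without confirmation from the posts, that the stray X and Y were left over from the 25-row version of the data frame (for example via attach). The names full, trimmed, wrong and right are hypothetical, and the V3 values are placeholders.

```r
# Rebuild the 5 x 5 layout of the original frame (X varies fastest)
full <- expand.grid(X = 1:5, Y = 1:5)
full$V3 <- ifelse(full$X == full$Y, 0, 1)   # placeholder distances

# Assumed stale copies, taken while the frame still had 25 rows
X <- full$X
Y <- full$Y

trimmed <- full[full$V3 > 0, ]              # 20 rows remain

# The 25-element condition is applied by position to the 20 remaining rows
wrong <- trimmed[X > Y, ]
# Taking the columns from the frame itself keeps condition and rows aligned
right <- trimmed[trimmed$X > trimmed$Y, ]

rownames(wrong)
rownames(right)
```

Under that assumption the misaligned version returns rows 3, 4, 5, 6, 10, 11, 12, 17, 18 and 24, the same row labels as the puzzling result in the question, while the aligned version returns rows 2, 3, 4, 5, 8, 9, 10, 14, 15 and 20.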
...
 
 Hopefully you're learning a slightly different lesson now :-)
 
 Does that clear things up at all?
 
 -steve
 
 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
|  Memorial Sloan-Kettering Cancer Center
|  Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact
 



[R] problem selecting rows meeting a criterion

2009-08-10 Thread Jim Bouldin


When I try to select only those rows from the following data frame, called
data, in which X > Y

   X Y   V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.00
6  1 2 8.062258
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
20 5 4 5.385165
21 1 5 5.00
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165

using the commands
 attach(data) 
 data2 = data[X > Y,];data2

I get this for data2:

   X Y   V3
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.00
6  1 2 8.062258
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
17 2 4 2.236068
18 3 4 8.062258
24 4 5 5.385165

Clearly, this is not what I intend but I cannot figure out what I've done
wrong.  Any help appreciated.  Thanks.

Jim Bouldin


Re: [R] problem selecting rows meeting a criterion

2009-08-10 Thread Jim Bouldin


What's wrong is I'm trying to select only those rows in which X > Y, but
I'm getting rows in which Y > X and losing some in which X > Y.  The row
numbers are not being read as values.  Very confusing.
Jim
 
 What's wrong with it? It looks okay to me.  If you use
  subset(data, data$X > data$Y) you get the same results. Any chance you're
 reading the row.numbers as values?
 
 BTW data is the name of a built-in function in R (not a reserved word),
 and it is good practice not to use it as a variable name.
 
 My Results
 
 X Y   V3
  3  3 1 2.236068
  4  4 1 6.324555
  5  5 1 5.00
  6  1 2 8.062258
  10 5 2 5.656854
  11 1 3 2.236068
  12 2 3 9.486833
  17 2 4 2.236068
  18 3 4 8.062258
  24 4 5 5.385165
 
 
 
 
 
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


[R] R's database capabilities

2009-08-04 Thread Jim Bouldin


I admit that I've not done a thorough search on this topic, but from the
several instructional manuals and/or tutorials I've looked at, I don't see
any mention of relational database capabilities in R?  Have I missed
something, and if so, can someone  point me in the right direction to get
started?  Thanks!
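
As a starting point, base R can do relational-style joins on data frames with merge(). The sketch below uses two invented tables (plots and trees, with made-up columns). Connecting to an external database is a separate matter that relies on add-on packages such as DBI or RODBC, which are not shown here.

```r
# Two small made-up tables sharing the key column plot_id
plots <- data.frame(plot_id = 1:3, site = c("A", "B", "C"))
trees <- data.frame(tree_id = 1:4,
                    plot_id = c(1, 1, 2, 3),
                    dbh     = c(30, 45, 22, 60))

# Inner join on the shared key, much like a SQL JOIN
joined <- merge(trees, plots, by = "plot_id")

# Filter rows and pick columns, much like WHERE and SELECT
big <- subset(joined, dbh > 25, select = c(tree_id, site, dbh))
big
```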


Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


[R] error message: .Random.seed is not an integer vector but of type 'list'

2009-07-23 Thread Jim Bouldin


I'm trying to run this simple random sample procedure and keep getting the
error message shown. I don't understand this; I've designated x as a
numeric vector, so what is going on here?  Thanks.

> x = as.vector(c(1:12));x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> mode(x)
[1] "numeric"
> sample(x, 3)
Error in sample(x, 3) : 
  .Random.seed is not an integer vector but of type 'list'
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


Re: [R] error message: .Random.seed is not an integer vector but of type 'list'

2009-07-23 Thread Jim Bouldin


Thank you.  However, when I tried that, I got this message:

Warning message:
In rm(.Random.seed) : variable .Random.seed was not found
 
 
 
 
 Something has changed/corrupted an object called .Random.seed that is 
 required by the Random Number Generator.
 
 Just say
   rm(.Random.seed)
 and try again.
 
 Uwe Ligges
 
 
 



Re: [R] error message: .Random.seed is not an integer vector but of type 'list'

2009-07-23 Thread Jim Bouldin


 
 
 Jim Bouldin wrote:
  Thank you.  However, when I tried that, I got this message:
  
  Warning message:
  In rm(.Random.seed) : variable .Random.seed was not found
 
 
 In that case, have you attached some package that has its own
 .Random.seed?
 Try to find where the .Random.seed that R complains about comes from.
 
 Uwe Ligges

No, there are no attached packages, just the ones that load automatically.
The R commander has some type of RNG but it is not loaded.  Completely stumped.


Re: [R] error message: .Random.seed is not an integer vector but

2009-07-23 Thread Jim Bouldin


Thanks much Ted.  I actually had just tried what you suggest here before
you posted, and resolved the problem.  Thanks also for the other tips.  I
wrote x = as.vector(c(1:12)) because I thought that the mode of x might be
the problem, the error message pointing to .Random.seed notwithstanding.

On a related note, I did a brief test a couple weeks back where I ran a
million random samples of 3 from the vector 1:12 and compared the mean
against the known mean.  It was off by 1 percent, which indicated that the
RNG was biased more than I'd have thought.  Comments?
Jim
 
 Follow-up to my previous reply (just posted). Having read the other
 responses and your reactions, try the following:
 
   rm(.Random.seed)
   set.seed(54321) ## (Or your favourite magic number) [*]
   x = as.vector(c(1:12))  ## To reproduce your original code ... !
   sample(x,3)
 
 [*] When you did rm(.Random.seed) as suggested by Uwe, the variable
 .Random.seed was lost, so you have to create it again.
 
 If, after the above, you still get the problem, then something is
 very seriously wrong.
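
 A sketch of the repair sequence on a deliberately corrupted seed. How the seed was damaged in the original session is not known; the assign() line below is only an assumed way to imitate it, and picked is a made-up name.

```r
# Imitate the corruption: a non-integer object named .Random.seed (assumption)
assign(".Random.seed", list(1, 2, 3), envir = globalenv())
is.integer(get(".Random.seed", envir = globalenv()))   # FALSE: unusable as a seed

# Remove the bad object, then let set.seed() create a valid one
rm(list = ".Random.seed", envir = globalenv())
set.seed(54321)
is.integer(get(".Random.seed", envir = globalenv()))   # TRUE again

picked <- sample(1:12, 3)
picked
```

 If rm() reports that .Random.seed was not found, the offending object presumably lives somewhere other than the workspace, which is why a fresh set.seed() call is the useful step.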
 
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 23-Jul-09   Time: 17:23:09
 -- XFMail --
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


[R] Random # generator accuracy

2009-07-23 Thread Jim Bouldin


Dan Nordlund wrote:

It would be necessary to see the code for your 'brief test' before anyone
could meaningfully comment on your results.  But your results for a single
test could have been a valid random result.

I've re-created what I did below.  The problem appears to be with the
weighting process: the unweighted sample came out much closer to the actual
than the weighted sample (1% error) did.  Comments?
Jim

> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> weights
 [1] 1 1 1 1 1 1 2 2 2 2 2 2

> a = mean(replicate(1000000,(sample(x, 3, prob = weights))));a  # (1
# million samples from x, of size 3, weighted by weights; the mean should
# be 7.50)
[1] 7.406977
> 7.406977/7.5
[1] 0.987597

> b = mean(replicate(1000000,(sample(x, 3))));b  # (1 million samples from
# x, of size 3, not weighted this time; the mean should be 6.50)
[1] 6.501477
> 6.501477/6.5
[1] 1.000227

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


Re: [R] Random # generator accuracy

2009-07-23 Thread Jim Bouldin


Thanks Greg, that most definitely was it.  So apparently the default is
sampling without replacement.  Fine, but this brings up a question I've had
for a bit now, which is, how do you know what the default settings are for
the arguments of any given function?  The HTML help files don't seem to
indicate in many (most) cases.  Thanks. 
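
For what it is worth, the defaults can be read straight from the function itself; they also appear in the Usage section of each help page. A short sketch (defaults is a made-up name):

```r
args(sample)                    # prints the argument list, defaults included

defaults <- formals(sample)     # the same information as a list
defaults$replace                # FALSE, i.e. sampling without replacement
defaults$prob                   # NULL, i.e. equal weights unless supplied
```

An argument shown without a value, such as size here, has no simple default in the signature and its behaviour is described in the help text instead.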

 Try adding replace=TRUE to your call to sample, then you will get numbers
 closer to what you are expecting.
 
 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111
 
 
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


Re: [R] Random # generator accuracy

2009-07-23 Thread Jim Bouldin


You are absolutely correct Ted.  When no weights are applied it doesn't
matter if you sample with or without replacement, because every value is
equally likely to be chosen.  But when they're weighted unequally that's
not the case.

It is also interesting to note that if the problem is set up slightly
differently, by say defining the vector x as:
x = c(1,2,3,4,5,6,7,7,8,8,9,9,10,10,11,11,12,12), effectively giving the
same probability of selection for the 12 integers as before, the same
problem does not arise, or at least not as severely:

> x2
 [1]  1  2  3  4  5  6  7  8  9 10 11 12  7  8  9 10 11 12

> d = mean(replicate(1000000,(sample(x2, 3))));d  # (1 million samples from
# x2, of size 3; the mean should be 7.50)
[1] 7.499233

> e = mean(replicate(1000000,(sample(x2, 3, replace = TRUE))));e  # (1
# million samples from x2, of size 3; with replacement this time the mean
# should still be 7.50)
[1] 7.502085

> d/e
[1] 0.9996198

Jim
 
 To obtain the result you expected, you would need to explicitly
 specify replace=TRUE, since the default for replace is FALSE.
 (though probably what you really intended was sampling without
 replacement).
 
 Read carefully what is said about prob in '?sample' -- when
 replace=FALSE, the probability of inclusion of an element is not
 proportional to its weight in 'prob'.
 
 The reason is that elements with higher weights are more likely
 to be chosen early on. This then knocks that higher weight out
 of the contest, making it more likely that elements with smaller
 weights will be sampled subsequently. Hence the mean of the sample
 will be biased slightly downwards, relative to the weighted mean
 of the values in x.
 
   table(replicate(1000000,(sample(x, 3))))
   #      1      2      3      4      5      6
   # 250235 250743 249603 250561 249828 249777
   #      7      8      9     10     11     12
   # 249780 250478 249591 249182 249625 250597
 
 (so all nice equal frequencies)
 
   table(replicate(1000000,(sample(x, 3,prob=weights))))
   #      1      2      3      4      5      6
   # 174873 175398 174196 174445 173240 174110
   #      7      8      9     10     11     12
   # 325820 326140 325289 325098 325475 325916
 
 Note that the frequencies of the values with weight=2 are a bit
 less than twice the frequencies of the values with weight=1:
 
   (325820+326140+325289+325098+325475+325916)/
     (174873+175398+174196+174445+173240+174110)
   # [1] 1.867351
 
 
 In fact this is fairly easily calculated. The possible combinations
 (in order of sampling) of the two weights, with their probabilities,
 are:
                                          1s  2s
 ------------------------------------------------
 1 1 1   P =  6/18 *  5/17 *  4/16         3   0
 1 1 2   P =  6/18 *  5/17 * 12/16         2   1
 1 2 1   P =  6/18 * 12/17 *  5/15         2   1
 1 2 2   P =  6/18 * 12/17 * 10/15         1   2
 2 1 1   P = 12/18 *  6/16 *  5/15         2   1
 2 1 2   P = 12/18 *  6/16 * 10/15         1   2
 2 2 1   P = 12/18 * 10/16 *  6/14         1   2
 2 2 2   P = 12/18 * 10/16 *  8/14         0   3
 
 So the expected number of weight=1 in the sample is
 
   3*(6/18 *  5/17 *  4/16)  + 2*(6/18 *  5/17 * 12/16) +
   2*(6/18 * 12/17 *  5/15)  + 1*(6/18 * 12/17 * 10/15) +
   2*(12/18 *  6/16 *  5/15) + 1*(12/18 *  6/16 * 10/15) +
   1*(12/18 * 10/16 *  6/14) + 0
   = 1.046218
 
 Hence the expected number of weight=2 in the sample is
 
   3 - 1.046218 = 1.953782
 
 and their ratio 1.953782/1.046218 = 1.867471
 
 Compare this with the value 1.867351 (above) obtained by simulation!
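
 The enumeration can also be left to R. This is a sketch, assuming that sample() without replacement draws one item at a time with probability proportional to the weights still available; seq_prob, orders and the other names are invented for illustration.

```r
# Assumption: sequential draws, each proportional to the remaining weights
n_items <- c(6, 6)      # six items of weight 1, six items of weight 2
w <- c(1, 2)

# Probability of drawing the weight classes in the order given by 'classes'
seq_prob <- function(classes) {
  left <- n_items
  p <- 1
  for (k in classes) {
    p <- p * left[k] * w[k] / sum(left * w)
    left[k] <- left[k] - 1
  }
  p
}

orders <- as.matrix(expand.grid(1:2, 1:2, 1:2))   # all 8 possible orderings
probs <- apply(orders, 1, seq_prob)

expected_ones <- sum(probs * rowSums(orders == 1))
expected_twos <- 3 - expected_ones
ratio <- expected_twos / expected_ones

# Implied mean of a sample of three: class means are 3.5 (1-6) and 9.5 (7-12)
implied_mean <- (expected_ones * mean(1:6) + expected_twos * mean(7:12)) / 3
```

 The implied mean comes out at roughly 7.41 rather than 7.5, in line with the simulated 7.406977 reported earlier in the thread.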
 
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 23-Jul-09   Time: 21:05:07
 -- XFMail --
 

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740


Re: [R] Random # generator accuracy

2009-07-23 Thread Jim Bouldin


Perfectly explained Ted.  One might, at first reflection, consider that
simply repeating the values 7 through 12 and  sampling (w/o replacement)
from among the 18 resulting values, would be similar to just doubling the
selection probabilities for 7 through 12 and then sampling. That would
clearly not be true though.
Jim
 
 Whereas, if you replace x = c(1,2,3,4,5,6,7,8,9,10,11,12)
 with the weighted equivalent, doubling up 7-12 as in your
   x2 = c(1,2,3,4,5,6,7,7,8,8,9,9,10,10,11,11,12,12),
 each of the 18 items now has the same weight as the others,
 and the unweighted sampling
   mean(replicate(1000000,(sample(x2, 3))))
 now gives the mean of the 18 values (7.5); whereas -- as my
 calculation showed -- the effect of the sequential weighting is
 to bias the result slightly downwards (in your example; generally,
 in favour of the items with lower weights), since the way weighting
 works in sample() is not equivalent to replicating each item weight
 times.
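
 A small simulation sketch of the three cases side by side. The seed and the number of draws are arbitrary choices made here (fewer draws than in the thread, to keep it quick), and the m_* names are made up.

```r
set.seed(1)                          # arbitrary, only for repeatability
x <- 1:12
weights <- rep(c(1, 2), each = 6)    # 1-6 get weight 1, 7-12 get weight 2
x2 <- rep(x, times = weights)        # 18 values, with 7-12 appearing twice
n <- 100000                          # number of samples of size 3

# Weighted, without replacement: biased towards the low-weight values
m_weighted <- mean(replicate(n, sample(x, 3, prob = weights)))
# Unweighted draws from the doubled-up vector
m_repeated <- mean(replicate(n, sample(x2, 3)))
# Weighted, with replacement
m_replace <- mean(replicate(n, sample(x, 3, replace = TRUE, prob = weights)))

c(weighted = m_weighted, repeated = m_repeated, with_replacement = m_replace)
```

 Only the first mean should sit noticeably below 7.5; the other two should agree with 7.5 up to simulation noise.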
 
 The general problem, of sampling without replacement in such a way
 that for each item the probability that it is included in the sample
 is proportional to a pre-assigned weight (sampling with probability
 proportional to size) is quite tricky and, for certain choices
 of weights, impossible. For a glimpse of what's inside the can of
 worms, have a look at the reference manual for the 'sampfling'
 package, in particular the function samprop():
 
 http://www.stats.bris.ac.uk/R/web/packages/sampfling/sampfling.pdf
 
 Ted.
 

[R] unloading loaded packages

2009-04-30 Thread Jim Bouldin


I can't seem to find info on how to unload packages that have been loaded.
 My goal in doing so is to gain access to functions that have been masked
out by those packages.  Or is there another way to do so?  Thanks in advance.
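
Two approaches that may help, sketched with the splines package only because it ships with R. The replacement filter function is a made-up stand-in for whatever is doing the masking.

```r
library(splines)                            # attach a package
"package:splines" %in% search()             # TRUE while it is attached
detach("package:splines", unload = TRUE)    # take it off the search path again
"package:splines" %in% search()             # FALSE afterwards

# A masked function can also be reached without detaching anything,
# by naming its package explicitly with '::'
filter <- function(...) "my own filter"     # masks stats::filter in the workspace
smoothed <- stats::filter(1:5, rep(1/3, 3)) # still calls the stats version
```

The '::' form is often the simpler choice, since the masking package stays available for everything else.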



Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

