from:"Gavin Simpson"

Re: [R] Question about unicode characters in tcltk

2007-08-18 Thread Gavin Simpson

On Sat, 2007-08-18 at 14:40 +0200, Peter Dalgaard wrote:
 R Help wrote:
  hello list,
 
  Can someone help me figure out why the following code doesn't work?
  I'm trying to put both Greek letters and subscripts into a tcltk menu.
  The code creates all the mu's, and the 1 and 2 subscripts, but it
  won't create the 0.  Is there a certain set of characters that R won't
  recognize the unicode for?  Or am I inputting the \u2080 incorrectly?
 
  library(tcltk)
  m <- tktoplevel()
  frame1 <- tkframe(m)
  frame2 <- tkframe(m)
  frame3 <- tkframe(m)
  entry1 <- tkentry(frame1,width=5,bg='white')
  entry2 <- tkentry(frame2,width=5,bg='white')
  entry3 <- tkentry(frame3,width=5,bg='white')
 
  tkpack(tklabel(frame1,text='\u03bc\u2080'),side='left')
  tkpack(tklabel(frame2,text='\u03bc\u2081'),side='left')
  tkpack(tklabel(frame3,text='\u03bc\u2082'),side='left')
 
  tkpack(frame1,entry1,side='top')
  tkpack(frame2,entry2,side='top')
  tkpack(frame3,entry3,side='top')
 
  thanks
  -- Sam
 

 Which OS was this? I can reproduce the issue on SuSE, but NOT Fedora 7.

I can reproduce this on Fedora 7, in that the \u2080 is displayed as is
and not as a subscript, unlike the other \u escapes, which appear as
subscript characters:

> sessionInfo()
R version 2.5.1 Patched (2007-08-02 r42389) 
i686-pc-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "tcltk"     "stats"     "graphics"  "grDevices" "utils"     "datasets" 
[7] "methods"   "base"     

If there is something specific to my Fedora installation that is
different to Peter's that I can ascertain from installed packages/fonts
etc, then let me know and I can provide the output from my laptop.

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RFclustering - is it available in R?

2007-08-15 Thread Gavin Simpson

On Wed, 2007-08-15 at 09:44 -0700, David Katz wrote:
 Several searches turned up nothing. Perhaps I will try to implement it if
 nobody else has. Thanks.

You can do this with Andy Liaw's randomForest package, and the first
hit on a Google search (for the term RFclustering) was this:

http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm

which shows how one might go about this with some helper functions.

G


Re: [R] Values in horizontal versus vertical position on 'y' axe

2007-08-13 Thread Gavin Simpson

On Mon, 2007-08-13 at 13:37 +0200, akki wrote:
 hi,
 When I draw a graph, the values on the y axis are in a vertical position.
 How can I put the values in a horizontal position?
 
 thanks

If you mean "How do I rotate the tick labels?" then look at ?par and
parameter 'las'. E.g. this shows the options available:

opar <- par(mfrow = c(2,2))
plot(1:10, main = expression(las == 0)) ## las = 0, default
plot(1:10, las = 1, main = expression(las == 1))
plot(1:10, las = 2, main = expression(las == 2))
plot(1:10, las = 3, main = expression(las == 3))
par(opar)

If not, try rephrasing your question, and provide an example of what you
can plot at the moment and what is wrong with it that you'd like to
change.

HTH

G


Re: [R] clear workspace

2007-08-01 Thread Gavin Simpson

On Wed, 2007-08-01 at 14:06 +0200, Dong GUO 郭东 wrote:
 Dear all,
 
 How can I clear the workspace, as we do in Matlab with "clear all"?
 
 Many thanks in advance.
 
 Dong

?rm 

E.g.:

rm(list = ls())

will remove everything shown by ls(). Look at ?ls to see possible
arguments to that function to fine tune this, for example, by default
objects that start with a . are omitted from the results of ls().
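
As a small sketch of that last point (run in a scratch environment so
nothing in your real workspace is touched; the environment e and the
object names are made up purely for illustration):

```r
# Work in a scratch environment rather than the real workspace
e <- new.env()
assign("x", 1, envir = e)
assign(".hidden", 2, envir = e)

ls(e)                    # by default, names starting with "." are skipped
ls(e, all.names = TRUE)  # all.names = TRUE includes them

# Default ls(): removes x but leaves .hidden behind
rm(list = ls(e), envir = e)
after_default <- ls(e, all.names = TRUE)

# With all.names = TRUE everything is removed
rm(list = ls(e, all.names = TRUE), envir = e)
after_all <- ls(e, all.names = TRUE)
```

In the global workspace the equivalent would be
rm(list = ls(all.names = TRUE)), which you probably only want if you
really do mean to drop the hidden objects as well.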

G

Re: [R] Plotting a smooth curve from predict

2007-07-31 Thread Gavin Simpson

On Tue, 2007-07-31 at 11:21 +0100, Wilson, Andrew wrote:
 Probably a very simple query:
 
 When I try to plot a curve from a fitted polynomial, it comes out rather
 jagged, not smooth like fitted curves in other stats software.  Is there
 a way of getting a smooth curve in R?
 
 What I'm doing at the moment (for the sake of example) is:
 
  x <- c(1,2,3,4,5,6,7,8,9,10)
 
  y <- c(10,9,8,7,6,6.5,7,8,9,10)
 
  b <- data.frame(cbind(x,y))
 
  w <- gls(y ~ I(x)+I(x^2),correlation=corARMA(p=1),method="ML",data=b)
 
  plot(predict(w),type="l")

Replace the line above with the following:

pred.dat <- data.frame(x = seq(min(x), max(x), length.out = 100))
plot(predict(w, pred.dat), type = "l")

The general idea is to produce predictions over the range of x, so we
produce a new data frame with component x that contains 100 values from
min(x) to max(x). We then get predicted values for each of these new
values of the predictor in pred.dat, and plot them.

Increase/decrease length.out to get something suitably smooth without
sending your computer into meltdown.
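
A self-contained sketch of the same idea, assuming an ordinary lm() fit
in place of the gls() model so that no extra package is needed (the
object names mod and pred.dat$fit are mine, not from the original
post). It also plots the predictions against the new x values rather
than against their index:

```r
# Example data from the question
x <- 1:10
y <- c(10, 9, 8, 7, 6, 6.5, 7, 8, 9, 10)
b <- data.frame(x = x, y = y)

# Quadratic fit; lm() is used here only as a stand-in for gls()
mod <- lm(y ~ x + I(x^2), data = b)

# Fine grid of predictor values spanning the observed range
pred.dat <- data.frame(x = seq(min(x), max(x), length.out = 100))
pred.dat$fit <- predict(mod, newdata = pred.dat)

# Observed points, then the smooth fitted curve on top
plot(y ~ x, data = b)
lines(pred.dat$x, pred.dat$fit)
```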

HTH

G

 
 Many thanks,
 
 Andrew Wilson
 

Re: [R] Q: obtaining non-transparent background in png

2007-07-31 Thread Gavin Simpson

On Tue, 2007-07-31 at 10:22 -0600, D. R. Evans wrote:
 I am not understanding something about generating PNG plots.
 
 I have tried several ways to obtain something other than a transparent
 background, but nothing I've done seems to change the background.
 
 For example:
 
 dev.print(png, width=800, height=600, bg='red', filename='example.png')
 
 which I thought would give a red background, simply gives the same
 transparent background I always get.

?dev.print says:

 'dev.print' copies the graphics contents of the current device to
 a new device which has been created by the function specified by
 'device' and then shuts the new device.

Note "copies" - given that you've already drawn a figure with a white
background, should this then produce one that is red? However, you are
correct that it does produce a plot with a transparent background.

I find it easier to wrap my plotting commands in the relevant device,
e.g. this works with the desired background:

> png("mypng.png", height = 400, width = 400, bg = "red", 
+     pointsize = 12)
> plot(1:10)
> dev.off()

Whereas these do not give red backgrounds as one might have expected,
but transparent ones:

> plot(1:10)
> dev.print(png, height = 400, width = 400, bg = "red", pointsize = 12, 
+           filename = "mypng2.png")
X11
  2
> dev.copy(png, height = 400, width = 400, bg = "red", pointsize = 12, 
+          filename = "mypng3.png")
PNG
  3
> dev.off()
X11
  2

I'm not sure whether this is intentional or not, but it does not appear
to be passing the bg argument on to the 'device', or if it does, it is
not being used/respected - perhaps all that is needed is clarification as
to what can be specified in '...' in ?dev.print

> version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Patched
major          2
minor          5.1
year           2007
month          07
day            05
svn rev        42131
language       R
version.string R version 2.5.1 Patched (2007-07-05 r42131)

G

 
 And I also don't understand why the default background is transparent,
 when the documentation seems to say that it's white:
   png(filename = "Rplot%03d.png", width = 480, height = 480,
       pointsize = 12, bg = "white",  res = NA,...)
 
 (This is on a Kubuntu dapper 64-bit system.)
 
 [I looked through the mail archives, and there seem to be a few very
 old postings talking about the opposite problem, but nothing recent;
 so I conclude that I'm doing something wrong.]
 

Re: [R] Data Set

2007-07-23 Thread Gavin Simpson

On Sun, 2007-07-22 at 21:51 -0700, Stephen Tucker wrote:
 It turns out that "-" and " " (space) are not valid variable names. 

They are valid names, the problem is that they aren't very convenient to
use, as the OP discovered, because they need to be quoted.

Note that if using something like read.csv or read.table, R will correct
these problem variable names for you when you import the data. If you
read this file in for example:

Mydata,S-sharif,A site
1,45,34
2,66,45
3,79,56

using read.csv, you get easy to use names:

> dat <- read.csv("temp.csv")
> dat
  Mydata S.sharif A.site
1      1       45     34
2      2       66     45
3      3       79     56

You can turn off this safety checking using the argument check.names =
FALSE
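
A minimal sketch of both behaviours, reading the example file from a
character string via the text argument instead of from temp.csv (the
object names csv, dat and raw are just for illustration). It also shows
how the awkward names have to be quoted when check.names = FALSE:

```r
# The example file from above, held in a string
csv <- "Mydata,S-sharif,A site
1,45,34
2,66,45
3,79,56"

# Default: names are passed through make.names()
dat <- read.csv(text = csv)
names(dat)

# check.names = FALSE keeps the names exactly as they are in the file
raw <- read.csv(text = csv, check.names = FALSE)
names(raw)

# Such names are valid, but must be quoted to be used
raw$`S-sharif`
raw[["A site"]]
```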

G


Re: [R] Data Set

2007-07-22 Thread Gavin Simpson

On Sun, 2007-07-22 at 03:25 -0700, amna khan wrote:
 Hi Sir
 I have made a data set having 23 stations of rainfall.
 When I use the attach function to access individual stations, the
 following error occurs.
 
 attach(data)
 S.Sharif   # S.Sharif is the station name, which has 50 data values
 Error: object "S.Sharif" not found
 Now how to solve this problem?

Then you don't have a column named exactly "S.Sharif" in your object
data.

What do str(data) and names(data) tell you about the columns in your
data set? If looking at these doesn't help you, post the output from
str(data) and names(data) and someone might be able to help.

You should always check that R has imported the data in the way you
expect; just because you think there is something in there called
S.Sharif doesn't mean R sees it that way.
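
As a rough sketch of the kind of check I mean, using a made-up data
frame (rain, with invented station names and values; your object and
columns will differ):

```r
# Hypothetical data: the station name contains "-" rather than "."
rain <- data.frame(`S-Sharif` = c(12.1, 0, 3.4),
                   Lahore     = c(5, 7.2, 0),
                   check.names = FALSE)

# What R actually thinks the columns are called
names(rain)
str(rain)

# Is there a column with exactly this name?
has_exact <- "S.Sharif" %in% names(rain)

# Look for near misses, ignoring case
hits <- grep("sharif", names(rain), ignore.case = TRUE, value = TRUE)
```

If the exact name is missing but a near miss turns up, then the name R
holds is simply not the one being typed.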

You also seem to have included the R-Help email address twice in the To:
header of your email - once is sufficient.

G

 Thank You
 Regards
 

Re: [R] Data Set

2007-07-22 Thread Gavin Simpson

On Sun, 2007-07-22 at 12:09 -0700, amna khan wrote:
 Sir, the station name S.Sharif exists in the data but the "not found"
 error is still occurring.
 Please help in this regard.

If you take the time to do what I asked and actually post the results of
typing the following into your R session:

str(data)

And send the output to the list, then we will be able to help.

Did you read /all/ of my email? I did ask you to do this.

HTH

G

 
 

Re: [R] Gamma MLE

2007-07-22 Thread Gavin Simpson

On Sat, 2007-07-21 at 11:36 -0700, rach.s wrote:
 Hello,
 
 I was asked to try the following code on R,
 

I think, if you typed the code below *exactly* as you reproduced it in
your email, that you are missing the assignment operator <- between
gamma.mles and function(xx, shape0, rate0), i.e.:

gamma.mles <- function(xx, shape0, rate0)
{
   ##function body here
}

Does this help? Notice where the error occurs - immediately after the
line gamma.mles, if you paste the code exactly as you reproduced.
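
For what it's worth, here is a sketch of the function with the
assignment in place, tried on simulated data. The body follows the
code quoted below; the maxit argument (a cap on the number of
iterations, replacing the open-ended repeat loop), the simulated data
and the method-of-moments starting values are my own additions, not
part of the original code:

```r
gamma.mles <- function(xx, shape0, rate0, maxit = 100) {
  n <- length(xx)
  xbar <- mean(xx)
  logxbar <- mean(log(xx))
  theta <- c(shape0, rate0)
  for (i in seq_len(maxit)) {     # assumption: capped loop instead of repeat
    theta0 <- theta
    shape <- theta0[1]
    rate <- theta0[2]
    ## score vector and information matrix of the gamma log-likelihood
    S <- n * matrix(c(log(rate) - digamma(shape) + logxbar,
                      shape / rate - xbar), ncol = 1)
    I <- n * matrix(c(trigamma(shape), -1 / rate,
                      -1 / rate, shape / rate^2), ncol = 2)
    theta <- theta0 + solve(I) %*% S
    if (max(abs(theta - theta0)) < 1e-08) break
  }
  list(estimates = theta, infmat = I)
}

## simulated data with known shape and rate
set.seed(42)
xx <- rgamma(200, shape = 2, rate = 3)

## method-of-moments starting values
shape0 <- mean(xx)^2 / var(xx)
rate0 <- mean(xx) / var(xx)

fit <- gamma.mles(xx, shape0, rate0)
fit$estimates
```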

G

 gamma.mles
 function (xx,shape0,rate0)
 {
 n <- length(xx)
 xbar <- mean(xx)
 logxbar <- mean(log(xx))
 theta <- c(shape0,rate0)
 repeat {
 theta0 <- theta
 shape <- theta0[1]
 rate <- theta0[2]
 S <- n*matrix(c(log(rate)-digamma(shape)+logxbar,shape/rate-xbar),ncol=1)
 I <- n*matrix(c(trigamma(shape),-1/rate,-1/rate,shape/rate^2),ncol=2)
 theta <- theta0 + solve(I) %*% S
 if(max(abs(theta-theta0)) < 1e-08)
 break
 }
 list(estimates=theta, infmat=I)
 }
 
 However, this appears: Error: object "gamma.mles" not found
 
 I tried looking in the packages for gamma.mles, but I couldn't find it
 anywhere. Can someone tell me where can I load it?
 
 Thanks

Re: [R] main title on splited windows.

2007-07-20 Thread Gavin Simpson

On Fri, 2007-07-20 at 12:16 -0700, Milton Cezar Ribeiro wrote:
 Dear all,
 
 How can I put a main title at the top of a window?
 I would like to put a title like "This is my for graphics" :-)

Create an outer margin to the plot and pop the title in there, e.g.:

## your data
v1 <- sort(runif(50))
v2 <- sin(v1*3.14)

## set up plotting region
## set oma to have a 2 line + a bit margin at the top, 
## 0 lines on other 3 sides
opar <- par(mfrow=c(2,2), oma = c(0, 0, 2.1, 0))
plot(v1,main="Sort V1")
plot(v2,main="Sin(V1)")
hist(v1,main="Histogram of V1")
boxplot(v1,v2, main="Box plot - v1 & v2")
## now add the title - use outer = TRUE to get it in correct place
## we use cex.main to increase the size a bit
title(main = "This is my for graphics", outer = TRUE, cex.main = 1.5)
## reset the plotting parameters
par(opar)

Is this what you wanted?

G


Re: [R] Subsetting Enigma: More rows after dataframe[-list,]?

2007-07-18 Thread Gavin Simpson

On Wed, 2007-07-18 at 11:40 +0200, Johannes Graumann wrote:
 Hello again,
 
 I'm trying to purge the indexes in i.delete from frame and end up with more
 rows!? Please be so kind and let me know where I screw this up ...

I think you'll have to explain why you think there are more rows after
using i.delete than before. (1975 - 173 = 1802). By purge, you mean
delete the rows indexed by i.delete? If so, you are doing nothing wrong:

> frame <- data.frame(matrix(runif(1975*10), ncol = 10))
> i.delete <- sample(nrow(frame), 173) # random rows to delete
> nrow(frame)
[1] 1975
> nrow(frame[-i.delete, ])
[1] 1802
> nrow(frame) > nrow(frame[-i.delete, ])
[1] TRUE
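
Two things that can make the row count after negative indexing differ
from what you expect are duplicated indices and an empty index vector.
A small sketch with a toy data frame (all names and values here are
made up, and none of this is a diagnosis of your particular case):

```r
frame <- data.frame(id = 1:10, value = (1:10)^2)

## duplicated indices: each row can only be dropped once
i.delete <- c(2, 5, 5, 9)
kept <- frame[-i.delete, ]
n_kept <- nrow(kept)   # 10 minus the 3 unique indices

## an empty index: -integer(0) is still integer(0), which selects no rows
none <- integer(0)
n_none <- nrow(frame[-none, ])

## a safer way to drop rows when the index might be empty
keep <- setdiff(seq_len(nrow(frame)), none)
n_safe <- nrow(frame[keep, ])
```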

G

 
 Joh
 
  > i.delete
    [1]   40   45  165  212  253  270  280  287  301  352  421  433  463  467  487
   [16]  517  537  542  573  594  596  612  614  621  635  650  696  699  707  732
   [31]  738  776  826  891  892  936  937  935  940  976  988  995 1037 1043 1059
   [46] 1081      1123 1128 1132 1140 1153 1155 1165 1176 1179 1200 1281 1289 1300
   [61] 1320 1346 1356 1366 1369 1396 1406 1420 1428 1429 1471 1474 1475 1525 1540
   [76] 1554 1565 1645 1667 1665 1706 1711 1724 1764 1788 1791 1805 1808 1847 1881
   [91]   10   18  137  238  254  260  262  288  292  314  338  349  414  447  457
  [106]  465  470  478  511  530  536  552  582  588  644  655  687  693  701  724
  [121]  739  763  771  836  848  859  888  900  902  919  939  972  979  989 1000
  [136] 1002 1015 1020 1026 1029 1032 1055 1060 1073 1088 1104 1117 1124 1130 1135
  [151] 1144 1221 1225 1249 1251 1257 1376 1384 1386 1453 1487 1529 1532 1534 1605
  [166] 1624 1633 1646 1648 1702 1787 1948 1951
 
  > length(i.delete)
  [1] 173
 
  > nrow(frame)
  [1] 1975
 
  > nrow(frame[-i.delete,])
  [1] 1802
 

Re: [R] dates() is a great date function in R

2007-07-18 Thread Gavin Simpson

On Wed, 2007-07-18 at 12:14 -0700, Mr Natural wrote: 
 Proper calendar dates in R are great for plotting and calculating. 
 However for the non-wonks among us, they can be very frustrating.
 I have recently discussed the pains that people in my lab have had 
 with dates in R. Especially the frustration of bringing date data into R 
 from Excel, which we have to do a lot. 

I've always found the following reasonably intuitive:

Given the csv file that I've pasted in below, the following reads the
csv file in, converts the dates to class Date and then draws a plot.

I have dates in DD/MM/YYYY format, so year is not first - thus attesting
to R not hating dates in this format ;-)

## read in csv data
## as.is = TRUE stops characters being converted to factors
## thus saving us an extra step to convert them back
dat <- read.csv("date_data.csv", as.is = TRUE)

## we convert to class Date
## format tells R how the dates are formatted in our character strings
## see ?strftime for the meaning and available codes
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")

## check this worked ok
str(dat$Date)
dat$Date

## see nicely formatted dates and not a drop of R-related hatred 
## but just about the most boring graph I could come up with
plot(Data ~ Date, dat, type = "l")

And you can keep your Excel file formatted as dates as well - bonus!
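
If you do end up with day numbers rather than formatted dates, base R
can convert those as well. A hedged sketch: the csv here is read from a
string rather than from date_data.csv, and the origin "1899-12-30" is
the one suggested in ?as.Date for Excel on Windows (Excel on the Mac
may use a different origin), so it is worth checking the result against
a date you know:

```r
## a few rows of the example file, held in a string
csv <- "Data,Date
1,01/01/2007
2,02/01/2007
3,03/01/2007"

## character dates in DD/MM/YYYY format to class Date
dat <- read.csv(text = csv, as.is = TRUE)
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")
class(dat$Date)

## serial day numbers to class Date
## assumption: the numbers come from Excel on Windows
serial <- c(34252, 35981)
converted <- as.Date(serial, origin = "1899-12-30")
converted
```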

Oh, and before you get Martin'd, it is the chron *package*!

HTH

G

CSV file I used, generated in OpenOffice.org, but I presume it stores
Dates in the same way as Excel?:

Data,Date
1,01/01/2007
2,02/01/2007
3,03/01/2007
4,04/01/2007
5,05/01/2007
6,06/01/2007
7,07/01/2007
8,08/01/2007
9,09/01/2007
10,10/01/2007
11,11/01/2007
10,12/01/2007
9,13/01/2007
8,14/01/2007
7,15/01/2007
6,16/01/2007
5,17/01/2007
4,18/01/2007
3,19/01/2007
2,20/01/2007
1,21/01/2007
1,22/01/2007
2,23/01/2007
3,24/01/2007

 Please find below a simple analgesic for R date importation that I
 discovered over the last 1.5 days (learning new stuff in R is calculated
 in 1/2 days).
 
 The function dates() gives the simplest way to get calendar dates into
 R from Excel that I can find.
 But straight importation of Excel dates, via a csv or txt file, can be a
 huge pain (I'll give details for anyone who cares to know). 
 
 My pain killer is:
 Consider that you have Excel columns in month, day, year format. Note that R
 hates date data that does not lead with the year. 
 
 a. Load the chron library by typing   library(chron)   in the console.
 You know that you need this library from information revealed by 
 performing the query
 ?dates()   in the Console window. This gives the R documentation 
 help file for this and related time and date functions.  In the upper left 
 of the documentation, one sees dates(chron). This tells you that you
 need the library chron. 
 
 b. Change the format of the dates in Excel to format "general", which gives 
 5 digit Julian dates. Import the csv file (I use read.csv()) with the 
 Julian dates and other data of interest.
 
 c.  Now, change the Julian dates that came in with the csv file into 
 calendar dates with the dates() function. Below is my code for performing 
 this activity, concerning an R data file called ss,
 
 ss holds the Julian dates, illustrated below from the column MPdate,
 
 ss$MPdate[1:5]
 [1] 34252 34425 34547 34759 34773
 
 The dates() function makes calendar dates from Julian dates,
 
 dmp <- dates(ss$MPdate,origin=c(month = 1, day = 1, year = 1900))
 
  dmp[1:5]
 [1] 10/12/93 04/03/94 08/03/94 03/03/95 03/17/95
 
 I would appreciate the comments of more sophisticated programmers who
 can suggest streamlining or shortcutting this operation.
 
 regards, Don
 
 
 
  

Re: [R] distance function (analogue)

2007-07-17 Thread Gavin Simpson

On Tue, 2007-07-17 at 14:32 +0200, Birgit Lemcke wrote:
 Hello R-Users,
 
 It's me again with a question.
 I'm using R 2.5.0 on a Mac PowerBook running Mac OS X 10.4.10

And again, you should send this to the maintainer (that'd be me), *not*
R-help! Unless you are asking help on a widely used package, it is
unlikely that anyone on R-Help can help you. And I'm not being modest
when I say that analogue is unlikely to fit that description.

 
 I am trying to calculate distances between two data tables looking like this
 
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
 1  1  0  0  0  1  1  0  1  0   0   0   1   0   0   0   0   0   1
 2  1  1  0  0  1  1  0  0  1   0   1   1   0   0   0   0   0   1
 3  1  1  0  0  1  1  1  1  0   0   1   0   0   0   0   0   0   1
 4  0  1  0  0  1  1  0  1  0   0   0   1   0   0   0   0   0   1
 5  0  1  0  0  1  1  0  0  1   0   0   1   1   0   0   0   0   1
 6  1  1  0  0  1  1  0  1  1   0   1   0   0   0   0   0   1   1
   V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36
 1   0   0   1   0   0   0   0   1   1   0   1   1   0   0   0   0   0   0
 2   0   0   1   0   0   0   0   1   1   0   0   1   0   0   1   1   0   0
 3   0   0   1   1   0   0   0   0   1   0   1   0   0   1   0   0   0   0
 4   0   0   1   0   0   1   0   0   1   0   0   1   0   0   1   0   0   0
 5   0   0   1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0
 6   0   0   1   0   0   0   0   0   1   0   0   1   0   1   1   0   0   0

I need to know what R thinks the objects look like, not how you think
they look.

 As far as I know, I did the same 2 weeks ago and it worked properly.
 
 Here is my code:
 
 Table1 <- read.table("Alle_DatenFemBeschr4a.csv", sep = ";")
 
 Table0 <- read.table("Alle_DatenMalBeschr4a.csv", sep = ";")
 
 Dist.Gower <- distance(Table1, Table0, method = "mixed", weights = c 
 (0.333, 0.333, 0.333, 0.500, 0.500, 0.500,  
 0.500, 0.500, 0.500, 0.250, 0.250, 0.250,  
 0.250, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571,  
 0.1428571, 0.1428571, 0.333, 0.333, 0.333, 0.200,  
 0.200, 0.200, 0.200, 0.200, 0.111, 0.111,  
 0.111, 0.111, 0.111, 0.111, 0.111, 0.111,  
 0.111, 0.200, 0.200, 0.200, 0.200, 0.200,  
 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571,  
 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571,  
 0.1428571, 0.1428571, 0.200, 0.200, 0.200, 0.200,  
 0.200, 0.500, 0.500, 0.500, 0.500, 0.500,  
 0.500, 0.500, 0.500, 0.500, 0.500, 0.500,  
 0.500, 0.500, 0.500, 0.500, 0.500, 0.500,  
 0.500, 0.500, 0.500, 0.500, 0.500, 0.500,  
 0.500))
 
 This produces the following message:
 
 Error in distance(Table1, Table0, method = "mixed", weights = c 
 (0.333,  :
   Different columns (species) are coded as factors in 'x' and 'y'

This means that Table1 has columns coded as factors that are not coded
as factors in Table0, or vice versa.

Do 

> str(Table1)
> str(Table0)

and compare the output.
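
With this many columns, comparing the str() output by eye is tedious.
A quick sketch of a programmatic check, using two small made-up data
frames in place of your Table1 and Table0 (the values are invented and
the two objects are assumed to have the same columns in the same
order):

```r
## toy stand-ins for the two tables; V2 is a factor in one only
Table1 <- data.frame(V1 = c(1, 0, 1),
                     V2 = factor(c("a", "b", "a")),
                     V3 = c(0, 1, 1))
Table0 <- data.frame(V1 = c(0, 0, 1),
                     V2 = c(1, 0, 1),
                     V3 = c(1, 1, 0))

## which columns are factors in each table?
f1 <- sapply(Table1, is.factor)
f0 <- sapply(Table0, is.factor)

## columns whose factor status differs between the two tables
mismatch <- names(f1)[f1 != f0]
mismatch
```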

Alternatively, send me the two csv files you are reading in and I'll try
to track down the problem for you.

G

Ps: It would be easier to read if you did

my.weights <- c(0.333, 0.333, 0.333, 0.500, 0.500,
0.500, 0.500, 0.500, 0.500, 0.250, 0.250,
0.250, 0.250, 0.1428571, 0.1428571, 0.1428571, 0.1428571,
0.1428571, 0.1428571, 0.1428571, 0.333, 0.333, 0.333,
0.200, 0.200, 0.200, 0.200, 0.200, 0.111,
0.111, 0.111, 0.111, 0.111, 0.111, 0.111,
0.111, 0.111, 0.200, 0.200, 0.200, 0.200,
0.200, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571,
0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571,
0.1428571, 0.1428571, 0.1428571, 0.200, 0.200, 0.200,
0.200, 0.200, 0.500, 0.500, 0.500, 0.500,
0.500, 0.500, 0.500, 0.500, 0.500, 0.500,
0.500, 0.500, 0.500, 0.500, 0.500, 0.500,
0.500, 0.500, 0.500, 0.500, 0.500, 0.500,
0.500, 0.500)

Dist.Gower <- distance(Table1, Table0, method = "mixed", weights =
my.weights)


Re: [R] Alternative to xyplot()?

2007-07-17 Thread Gavin Simpson

On Tue, 2007-07-17 at 16:39 -0400, Manuel Morales wrote:
 Hello list,
 
 Is anyone aware of a non-lattice-based alternative to xyplot()?

x <- rnorm(20)
y <- rnorm(20)
plot(x, y) ?

If you mean some specific aspect of xyplot(), you'll have to tell us
what this is.

HTH

G

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] random sampling with some limitive conditions?

2007-07-09 Thread Gavin Simpson


Re: [R] why function 'sum' within 'aggregate' need an 'index'?

2007-07-09 Thread Gavin Simpson

On Mon, 2007-07-09 at 20:14 +0800, jingjiang yan wrote:
 Hi, people.
 I am using R-2.5.0 now. When I tried the function aggregate with sum, it
 gave the following error:
  a <- gl(3,10)
  b <- rnorm(30)
  aggregate(b,list(a),sum)
 #  here is the error message: it complained about an error in FUN(X[[1L]],
 missing INDEX, and no default value.
 
 but the tapply function will be okay.
  tapply(b,list(a),sum)
 1 2 3
  2.113349 -5.972195  4.854731
 
 Furthermore, when I was using the R-2.5.0 pre-release version before,
 it worked well.
  a <- gl(3,10)
  b <- rnorm(30)
  aggregate(b,list(a),sum)  # it works well
   Group.1  x
 1   1 -1.0330482
 2   2  0.1235796
 3   3 -1.0086930
  tapply(b,list(a),sum)# so does tapply
  1  2  3
 -1.0330482  0.1235796 -1.0086930
 
 So, can anyone tell me what I should do to overcome this?
 thanks a lot.

Update to R 2.5.1 as your first example works for me [version info
below]. If that is not possible, just use the tapply version for now.
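
If you want to convince yourself that the two routes agree, here is a
small sketch along the lines of your example (the seed and the object
names by.tapply and by.aggregate are my own additions):

```r
set.seed(42)                # arbitrary seed, only for reproducibility
a <- gl(3, 10)              # grouping factor: 3 levels, 10 values each
b <- rnorm(30)

## group sums via tapply(): a named 1-d array
by.tapply <- tapply(b, list(a), sum)

## group sums via aggregate(): a data frame with columns Group.1 and x
by.aggregate <- aggregate(b, list(a), sum)

## the sums themselves should be the same
all.equal(as.vector(by.tapply), by.aggregate$x)
```

The only real difference is the shape of the result, so tapply() is a
reasonable stand-in until you can update.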

G

 version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Patched
major          2
minor          5.1
year           2007
month          07
day            05
svn rev        42131
language       R
version.string R version 2.5.1 Patched (2007-07-05 r42131)


Re: [R] Fees to use R

2007-07-06 Thread Gavin Simpson

On Fri, 2007-07-06 at 09:31 +0200, [EMAIL PROTECTED] wrote:
 Good morning to all,
 
 I work for a bank in Italy. I want to know if I can install R and
 related add-ons like Rbloomberg for free, or whether my company has to
 pay some fee.
 Thanks to all.
 Stefano Colucci

R is released under the GNU GPL licence version 2. You can read the
licence online here:

http://www.gnu.org/copyleft/gpl.html

As such R is free (as in beer) and you can install it without paying a
fee. The source code is also free (as in speech) and is available from
www.r-project.org as are pre-compiled binaries for various systems.

You are, however, bound by the GPL licence and you should evaluate the
implications of the GPL for the use you/your employer has in mind.

HTH

G


Re: [R] using self-written functions

2007-06-28 Thread Gavin Simpson

On Thu, 2007-06-28 at 17:29 +0800, R. Leenders wrote:
 Hi, I am pretty new to R, so I apologize for the obvious question. I
 have worked with R for a few months now and in the process have written
 several functions that I frequently use in various data analysis
 projects. I tend to give each project a directory of its own and set
 the working directory to that.
 Since there are several tasks that
 need to be accomplished in many of my projects, I frequently want to
 use functions I have written previously. My question is, how do I get
 access to them? The way I do it now is copy the relevant code to the
 script file of the project I am working on at the time and then run it
 so as to make the functions available. But that seems to be
 unnecessarily cumbersome. I used to work a lot with gauss, which offered
 the option of putting one's own functions in one directory, and
 gauss would then always have that directory in its search path. How can
 I access my own functions in R without having to copy-paste them
 every time and run them manually so I can call them later? Do I need to
 learn how to write a package and attach the package to make the
 functions available at all times? Is there another way?

Building a package is one way, and not that difficult once you've read
the Writing R Extensions manual.

An alternative is to have a directory where you keep R function scripts.
Put your functions in here in text files with say a .R extension. Then
in R you can source one or more of these R scripts as required, using
the source() function.

Say you have a directory, myScripts, at the base of your file system
(/home/user, say, on Linux or C:\ on Windows). In this directory there is
a file called my_r_function.R. To use this script/function in an R
session, you would issue:

## replace /home/user/ with whatever is the correct path for your
## system
source("/home/user/myScripts/my_r_function.R")

Which would make available to your current session any functions defined
in my_r_function.R.

Read ?source for more information.
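
If the directory holds several script files, a loop over list.files()
saves typing one source() call per file. This is only a sketch: the
directory is created under tempdir() and the one-line function is made
up, so that the example runs anywhere; in practice you would point
script.dir at your own directory.

```r
## Hypothetical script directory; a temporary one is used here so the
## sketch is self-contained.
script.dir <- file.path(tempdir(), "myScripts")
dir.create(script.dir, showWarnings = FALSE)

## Write a toy function to a .R file, standing in for your own scripts
writeLines("my_r_function <- function(x) x^2",
           file.path(script.dir, "my_r_function.R"))

## Source every file ending in .R found in that directory
r.files <- list.files(script.dir, pattern = "\\.R$", full.names = TRUE)
for (f in r.files) source(f)

## The function defined in the file is now available in the session
my_r_function(3)
```

Putting those last two lines of the loop in your .Rprofile would, I
think, get you close to the search-path behaviour you describe, but a
package is the tidier long-term solution.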

HTH

G

 
 thanks, James

Re: [R] aggregating daily values

2007-06-26 Thread Gavin Simpson

)
 
 TIA
 
 Antonio
 
 

Re: [R] Names of objects passed as ... to a function?

2007-06-24 Thread Gavin Simpson

On Sat, 2007-06-23 at 16:52 +0100, Prof Brian Ripley wrote:
 On Sat, 23 Jun 2007, Gavin Simpson wrote:
 
  Dear list,
 
  I have a function whose first argument is '...'. Each element of '...'
  is a data frame, and there will be at least 2 data frames in '...'. The
  function processes each of the data frames in '...' and returns a list,
  whose components are the processed data frames. I would like to name the
  components of this returned list with the names of the original data
  frames.
 
  Normally I'd use deparse(substitute()) to do this, but here I do not
  know the appropriate argument to run deparse(substitute()) on, and doing
  this on ... only returns a single name:
 
  foo <- function(...)
  + deparse(substitute(...))
  dat1 <- rnorm(10)
  dat2 <- runif(10)
  foo(dat1, dat2)
  [1] "dat1"
 
  Can anyone suggest to me a way to get the names of objects passed as
  the ... argument of a function?
 
 That's a little tricky.  The following may suffice:
 
 foo <- function(...)
 {
     as.character(match.call())[-1]
 }

Thanks Brian and Marc for this solution. I simplified my example too
much, as in reality there are additional arguments after '...', but with
a minor change to the solution you provided I got it working.
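
For the archives, here is one way such a variant might look when there
are further arguments after '...'. It is a sketch rather than the exact
code I ended up with; the 'verbose' argument is invented for the
example.

```r
foo <- function(..., verbose = FALSE) {
    ## expand.dots = FALSE keeps the '...' arguments together as a list
    ## of unevaluated expressions, separate from the named arguments
    dots <- match.call(expand.dots = FALSE)$...
    ## turn each expression back into text
    vapply(dots, function(x) paste(deparse(x), collapse = " "),
           character(1))
}

dat1 <- rnorm(10)
dat2 <- runif(10)
nms <- foo(dat1, dat2, verbose = TRUE)
nms
```

Because only the '...' component of the call is used, the extra
argument does not end up among the names. I would expect the '..1',
'..2' caveat to apply to this version as well.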

 
 The problem is that under certain circumstances match.call can give names 
 like '..2'
 
  bar <- function(...) foo(...)
  bar(dat1, dat2)
 [1] "..1" "..2"
 
 and I don't know a comprehensive R-level solution to that.

Are there any particular situations (other than the one you show) that
you are aware of when this might happen? I will put a Warning section in
the Rd page for my function explaining that it might not name the
components correctly, so any further examples of where this might not
work could be helpful in writing that.

All the best,

G

[R] Names of objects passed as ... to a function?

2007-06-23 Thread Gavin Simpson

Dear list,

I have a function whose first argument is '...'. Each element of '...'
is a data frame, and there will be at least 2 data frames in '...'. The
function processes each of the data frames in '...' and returns a list,
whose components are the processed data frames. I would like to name the
components of this returned list with the names of the original data
frames. 

Normally I'd use deparse(substitute()) to do this, but here I do not
know the appropriate argument to run deparse(substitute()) on, and doing
this on ... only returns a single name:

 foo <- function(...)
+ deparse(substitute(...))
 dat1 <- rnorm(10)
 dat2 <- runif(10)
 foo(dat1, dat2)
[1] "dat1"

Can anyone suggest to me a way to get the names of objects passed as
the ... argument of a function?

TIA

G

Re: [R] Distance function

2007-06-21 Thread Gavin Simpson

On Thu, 2007-06-21 at 19:56 +0200, Birgit Lemcke wrote:
 Hello you all from the R Help mailing list!
 
 I am working on a PowerBook with Mac Os X and use R 2.5.0.
 I used the distance function from the analogue package to perform a  
 similarity analysis using Gower's index and weighted variables.
 My variables are bivariate data and measurements as well as interval  
 data transformed into minimum and maximum variables.
 I used this code:
 
 Dist.Gowa<-distance(Table1a ,Table0a ,method ="mixed", weights
 
  
 (weighting),R = NULL )
   ^^^
Something is not right there () is this exactly what you typed?

You should not send questions about contributed packages to the list ---
as detailed in the posting guide. Without seeing Table1a and Table0a, it
is hard to say why this is failing - I suspect something about the
structure of the two data frames is throwing the function off.

If you can, send me that data ***off-list*** and I will take a look for
you, but as I'm teaching all day tomorrow, it won't happen till after
the weekend now.

HTH

G

 
 
 weighting is a vector created by this code:
 
   (weighting <- c 
 (1/3,1/3,1/3,1/2,1/2,1/2,1/2,1/2,1/2,1/4,1/4,1/4,1/4,1/7,1/7,1/7,1/7,1/7 
 , 
 1/7,1/7,1/3,1/3,1/3,1/5,1/5,1/5,1/5,1/5,1/9,1/9,1/9,1/9,1/9,1/9,1/9,1/9, 
 1/9,1/5,1/5,1/5,1/5,1/5,1/7,1/7,1/7,1/7,1/7,1/7,1/7,1/7,1/7,1/7,1/7,1/7, 
 1/7,1/7,1/5,1/5,1/5,1/5,1/5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 
 1,1))
 It contains the weightings for the variables of the two data tables.
 
 My data tables look like this:
 
 
 Anth_cap18.0  NA   NA  4.0  5.0  1  1  3.0  5.0  2.4  4.5  5   
 5  2  2  2  3  1  1  1  1
 Anth_crin1   5.0  NA   NA  3.5 11.0  1  1  3.0 10.0  2.0  4.5  3   
 4  2  2  3  3  1  1  2  3
 Anth_eck17.0  NA   NA  6.0 12.0  1  1  6.0 11.0  2.0  3.0  3   
 5  2  2  3  3  1  1  1  2
 
 At the end of the analysis I get always this message:
 
 1: $ operator is deprecated for atomic vectors, returning NULL in:  
 object$na.action
 2: $ operator is deprecated for atomic vectors, returning NULL in:  
 object$weights
 
 Can anybody explain to me what this means?
 
 Does anybody know if I have to standardize my measurements? As I  
 understand it, this is included in Gower's index. If not, is there a  
 function with more standardization options than rescaler  
 from the reshape package provides?
 
 Thanks for your help in advance.
 
 Greetings
 
 Birgit
 
 Birgit Lemcke
 Institut für Systematische Botanik
 Zollikerstrasse 107
 CH-8008 Zürich
 Switzerland
 Ph: +41 (0)44 634 8351
 [EMAIL PROTECTED]

Re: [R] How to extract diagonals

2007-06-20 Thread Gavin Simpson

On Wed, 2007-06-20 at 13:26 +0200, Birgit Lemcke wrote:
 Hello,
 
 I am using Mac OS X on a power book and R 2.5.0
 
 I try to extract a diagonal from a dissimilarity matrix made with  
 dsvdis, with this code:
 
 diag(DiTestRR)
 
 But I get this error message:
 
 Fehler in array(0, c(n, p)) : 'dim' spezifiziert ein zu großes Array
 
 english:
 
 Error in array(0, c(n, p)) : 'dim' specifies a too big array.
 
 Is there a limit to extract diagonals?

The returned object is not a matrix, but an object of class dist which
doesn't store the diagonals or the upper triangle of the dissimilarity
matrix to save memory. You need to convert the dist object to a matrix
first, then extract the diagonal. But, as this shows:

 require(labdsv)
 ?dsvdis
 data(bryceveg)
 dis.bc <- dsvdis(bryceveg, index = "bray/curtis")
Warning in symbol.For("dsvdis") : 'symbol.For' is not needed: please
remove it
 diag(as.matrix(dis.bc))

This is meaningless as the diagonals are all zero, as they should be;
this is the distance between a site and itself.
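
The same behaviour can be seen without labdsv, using dist() from base R
on some made-up data. This is a sketch under the assumption that dsvdis()
returns an ordinary dist object, as described above; the matrix x is
invented.

```r
set.seed(1)                       # arbitrary seed for reproducibility
x <- matrix(runif(50), nrow = 5)  # 5 made-up samples, 10 variables

d <- dist(x)    # Euclidean distances, stored as a 'dist' object
class(d)
length(d)       # only the lower triangle is stored: 5 * 4 / 2 values

## convert to a full square matrix before asking for the diagonal
m <- as.matrix(d)
dim(m)
diag(m)         # distance of each sample to itself
```

Only after as.matrix() is there a diagonal to extract, and it holds
nothing but zeros.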

 
 I hope somebody will help me!

So perhaps you could explain why you want the diagonal. It would be
easier to just do:

diags <- rep(0, length = nrow(bryceveg))

That will be without the sample labels, but that is easily rectified:

 names(diags) <- rownames(bryceveg)
 all.equal(diags, diag(as.matrix(dis.bc)))
[1] TRUE

So you'll have to reformulate your question if this is not what you
wanted.

A word of warning: do not do diag(dis.bc) on the above, as it brought my
Linux box to its knees trying to do something silly - easily
recoverable, but beware.

HTH

G

 
 Greetings
 
 Birgit Lemcke


Re: [R] Dissimilarity

2007-06-20 Thread Gavin Simpson

On Wed, 2007-06-20 at 16:13 +0200, Birgit Lemcke wrote:
 Hello Stephen,
 
 I am happy that you help me. Thanks a million.
 
 It is a good feeling that you confirm my assumption that dsvdis is  
 not able to deal with missing data, because it tells me that I am not  
 completely incapable.
 Okay, now I have the problem of what to do.
 I used this function because there is an option to weight columns  
 differently, which I haven't found in other functions.
 
 But now I don't understand why I have to transpose the species as  
 columns. As I read in the help manual of dsvdis, this function  
 calculates dissimilarities between rows.
 I have to calculate the dissimilarities between species that are in  
 rows by the use of morphological characters that are in columns.

If you really want to measure the associations between species then
leave them as you had them, as the rows. But make sure you are choosing a
dissimilarity coefficient that works well for species associations.
There is a whole section in Legendre and Legendre (1998) Numerical
Ecology, 2nd English Edition, Elsevier, which may help here.

HTH

G

 
 Am I completely wrong with my thoughts?
 
 Birgit
 
 On 20.06.2007 at 15:52, Stephen B. Cox wrote:
 
  Hi Birgit - looks like you have a few issues here.
 
  Birgit Lemcke birgit.lemcke at systbot.uzh.ch writes:
 
 
  Hello you all!
 
  I am a completely new user of R and I have a problem to solve.
  I am using Mac OS X on a PowerBook.
 
  I have a table that looks like this:
 
  species X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14
  X15 X16 X17 X18 X19 X20 X21
  1Anth_cap1  1  0  0  1  0  1  0  0  1   0   0   0   0   0
  0   0   1   0   0   0   1
  2   Anth_crin1  1  0  0  1  0  1  0  0  1   0   1   0   0   0
  0   0   0   1   0   0   1
  3Anth_eck1  1  0  0  1  0  1  0  0  1   0   0   0   0   0
  0   0   0   1   0   0   1
  4   Anth_gram1  1  0  0  1  0  1  0  0  1  NA  NA  NA  NA   0
  0   0   0   1   0   0   0
  5   Anth_insi1  1  0  0  1  0  1  0  0  1   0   0   0   1   0
  0   0   0   1   0   0   1
 
  All columns  are binary coded characters.
  The Import was done by this
 
  Test <- read.table("TestRFemMalBivariat1.csv", header = TRUE, sep = ";")
 
  First - you need to transpose the matrix to have species as
  columns.  You can do this with:
 
  d2 = data.frame(t(Test[,-1]))
  colnames(d2) = Test[,1]  #now use d2
 
 
 
  Now I try to perform a similarity analysis with the dsvdis function
  of the labdsv package with the sorensen-Index.
 
  My first question is whether all zeros in my table are seen as missing
  values and, if so, how I can change that without turning zero
  into other numbers?
 
  no - the zeros are valid observations.  the na's are missing data.
 
 
  DisTest <- dsvdis(Test, index = "sorensen")
 
  But I always get back this error message:
 
  Warning in symbol.For("dsvdis") : 'symbol.For' is not needed: please
  remove it
  Error in dsvdis(Test, index = "sorensen") :
     NA/NaN/Inf in foreign function call (arg 1)
  In addition: Warning message:
  NAs introduced by coercion
 
 
 
  Second - you have an issue with missing data.  It looks like dsvdis
  does not like the NA's - so you must make a decision about what to do.
  Delete that species, delete that site, or whatever...
 
  Finally - the warning over symbol.For is an issue with the labdsv
  library itself - nothing you are doing wrong.  The results will still
  be valid - but the use of symbol.For is something that will eventually
  need to be changed in the labdsv library.
 
  hth,
 
  stephen
 
 Birgit Lemcke
 Institut für Systematische Botanik
 Zollikerstrasse 107
 CH-8008 Zürich
 Switzerland
 Ph: +41 (0)44 634 8351
 [EMAIL PROTECTED]

Re: [R] How to extract diagonals

2007-06-20 Thread Gavin Simpson

On Wed, 2007-06-20 at 15:09 +0200, Birgit Lemcke wrote:
 Hello Gavin and thanks for your answer.
 
 You're completely right, I don't need the diagonal that is the bisecting  
 line of the angle.
 
 I need another diagonal of the (now) matrix.
 
  A1 A2 A3 A4 B1 B2 B3 B4
 A1
 A2
 A3
 A4
 B1 X
 B2   X
 B3X
 B4 X
 

Not easily, especially without knowing how many samples are in A or B,
although all that is really needed is some careful subsetting of the
dist object and a minor amount of programming - unfortunately after
close to two weeks intensive teaching my brain isn't up to doing that
just now.

One simple way to do this is to use the distance() function in my
analogue package (on CRAN). distance() can calculate the dissimilarities
between one group of samples and another. Here is a simple example using
some dummy data, from ?distance:

 ## simple example using dummy data
 train <- data.frame(matrix(abs(runif(200)), ncol = 10))
 rownames(train) <- LETTERS[1:20]
 colnames(train) <- as.character(1:10)
 fossil <- data.frame(matrix(abs(runif(100)), ncol = 10))
 colnames(fossil) <- as.character(1:10)
 rownames(fossil) <- letters[1:10]

 ## calculate distances/dissimilarities between train and fossil
 ## samples
 test <- distance(train, fossil)

test is now a matrix, the diagonal elements of which are the values that
you appear to want:

 diag(test)

if I'm reading your diagram correctly. Note that for this, you need to
be comparing row 1 from matrix A with row 1 from matrix B - if they are
in some other order, then this won't work.
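
For completeness, the "careful subsetting" route can be sketched with
base R alone. This uses dist() (Euclidean distance) on made-up matrices
A and B rather than distance() from analogue, and it assumes the rows of
A and B are in matching order; all object names are invented for the
example.

```r
set.seed(1)   # arbitrary seed for reproducibility
## two made-up groups of 4 samples each, measured on 10 variables
A <- matrix(runif(40), nrow = 4, dimnames = list(paste0("A", 1:4), NULL))
B <- matrix(runif(40), nrow = 4, dimnames = list(paste0("B", 1:4), NULL))

## full 8 x 8 dissimilarity matrix for the stacked samples
full <- as.matrix(dist(rbind(A, B)))

## the off-diagonal block: rows are the B samples, columns the A samples
n <- nrow(A)
cross <- full[n + seq_len(n), seq_len(n)]

## its diagonal holds the A1-B1, A2-B2, ... dissimilarities
paired <- diag(cross)
names(paired) <- paste(rownames(A), rownames(B), sep = "-")
paired
```

The block 'cross' corresponds to the lower-left quarter of your diagram,
and 'paired' to the cells you marked with X.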

distance() has a version of Gower's coefficient for mixed data that allows
you to specify weights. The function is just about clever enough to
allow missing values if you use method = "mixed" in distance(). Be sure
to read up about Gower's mixed coefficient in his 1971 paper (Gower,
1971, Biometrics 27; 857--871) and the use that weights and the range
parameter Rj are put to, or see the relevant section in Legendre &
Legendre (1998).

 I need for example the diagonal that compares A1 with B1.
 Do you have an idea how I can handle this?
 
 What is the effect of this code?
 
 all.equal(diags, diag(as.matrix(dis.bc)))

This was showing you that the diagonals of the dissimilarity matrix are
just a vector of zeroes. all.equal tests equality of its arguments.
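
More precisely, all.equal() tests for near equality, allowing a small
numerical tolerance, which is why it is usually preferred over == for
comparing floating point results. A tiny sketch with made-up numbers:

```r
x <- 0.1 + 0.2
y <- 0.3

x == y                    # exact comparison of the two doubles
identical(x, y)           # also an exact comparison
isTRUE(all.equal(x, y))   # comparison up to a small tolerance
```

When its arguments differ, all.equal() returns a description of the
difference rather than FALSE, hence the isTRUE() wrapper if you want to
use it in an if() statement.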

 
 Thanks a lot and sorry for my inability to solve my problems on my own.

You're welcome. Using R is a learning experience. You only need to
grovel and apologise if you have not done your homework before posting
and not read the FAQ, the documentation or searched the archives, or
followed the posting guide. Which is not the case here.

HTH

G

 

Re: [R] How to extract diagonals

2007-06-20 Thread Gavin Simpson

On Wed, 2007-06-20 at 18:24 +0200, Birgit Lemcke wrote:
 Hello Gavin!
 
 
 I thank you so much that you help me here.
 Only to answer your questions there are 452 samples (species) in A and
 the same number in B.

If your matrices are both 452x452 then distance() is going to take a
while to crunch through the numbers: 24 seconds or thereabouts on my
work desktop. distance() (and analogue for that matter) is still very
much in development, and these dissimilarities should really be coded in
C for speed. It is on the todo list but I need to learn some C first...

HTH

G

 Unfortunately I will only get the book by Legendre & Legendre in 2
 days (small library), but I think for the moment I am busy trying to
 learn with the code you gave me here.
 For me it seems that this will solve all the problems I have at the
 moment.
 Now it is my turn to learn about it.
 
 
 Once again: thanks
 
 
 Greetings 
 
 
 Birgit
 
 
 

Re: [R] BIC and Hosmer-Lemeshow statistic for logistic regression

2007-06-19 Thread Gavin Simpson

On Tue, 2007-06-19 at 04:59 -0700, spime wrote:
 
 I haven't found any helpful thread. How can I calculate BIC and the
 Hosmer-Lemeshow statistic for a logistic regression model? I have used glm
 for the logistic fit.

Not sure about the Hosmer-Lemeshow, but AIC() with argument k = log(n),
where n is the number of observations, will give you BIC. See ?AIC.
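
For example, a small sketch on a built-in data set (mtcars is only a
stand-in for the poster's data, and the model is purely illustrative):

```r
## Toy logistic regression on a built-in data set (not the poster's data)
mod <- glm(am ~ wt, data = mtcars, family = binomial)
n <- nrow(mtcars)             # number of observations used in the fit
bic <- AIC(mod, k = log(n))   # AIC with a penalty of log(n) per parameter
bic
```

If there are missing values, n should be the number of rows actually
used in the fit rather than the number of rows in the data frame.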

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Efficiently calculate sd on an array?

2007-06-17 Thread Gavin Simpson

Dear list,

Consider the following problem:

n.obs <- 167
n.boot <- 100
arr <- array(runif(n.obs*n.obs*n.boot), dim = c(n.obs, n.obs, n.boot))
arr[sample(n.obs, 3), sample(n.obs, 3), ] <- NA

Given the array arr, with dims = 167*167*100, I would like to calculate
the sd of the values in the 3rd dimension of arr, and an obvious way to
do this is via apply():

system.time(res <- apply(arr, c(2,1), sd, na.rm = TRUE))

This takes over 4 seconds on my desktop.

I have found an efficient way to calculate the means of the 3rd
dimension using

temp <- t(rowMeans(arr, na.rm = TRUE, dims = 2))

instead of

temp <- apply(arr, c(2,1), mean, na.rm = TRUE)

but I am having difficulty seeing how to calculate the standard
deviations efficiently.

Any idea how I might go about this?
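
One direction that might work, sketched here as an illustration rather
than anything proposed in the thread, is to build the standard deviation
from rowSums() and rowMeans() in the same way as the means above. The
names n, m, ss, sd.fast and sd.slow are made up for this sketch, and the
array is smaller than the one in the post so that the check runs quickly:

```r
## Smaller toy array than in the post
set.seed(42)
n.obs <- 20
n.boot <- 30
arr <- array(runif(n.obs * n.obs * n.boot), dim = c(n.obs, n.obs, n.boot))
arr[sample(n.obs, 3), sample(n.obs, 3), ] <- NA

## number of non-missing values in each arr[i, j, ] fibre
n <- rowSums(!is.na(arr), dims = 2)
## fibre means
m <- rowMeans(arr, na.rm = TRUE, dims = 2)
## sum of squared deviations; as.vector(m) recycles over the 3rd dimension
ss <- rowSums((arr - as.vector(m))^2, na.rm = TRUE, dims = 2)
## sd() is NA for fewer than two values, so mirror that here
ss[n < 2] <- NA
sd.fast <- t(sqrt(ss / (n - 1)))

## reference result from the slow apply() route
sd.slow <- apply(arr, c(2, 1), sd, na.rm = TRUE)
all.equal(sd.fast, sd.slow)
```

The subtraction creates a temporary array as large as arr, so this
trades memory for speed.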

All the best,

G

Re: [R] How do you do an e-mail post that is within an ongoing thread?

2007-06-09 Thread Gavin Simpson

On Fri, 2007-06-08 at 20:39 -0500, Robert Wilkins wrote:
 That may sound like a stupid question, but if it confuses me, I'm sure
 it confuses others as well. I've tried to find that information on the
 R mail-group info pages, can't seem to find it. Is it something
 obvious?
 
 To begin a brand new discussion, you do your post as an e-mail sent to
  r-help@stat.math.ethz.ch .
 As I am doing right now.
 
 How do I do an additional post that gets included in the
 [R] Tools For Preparing Data For Analysis thread, a thread which I
 started myself yesterday ( thanks for all the responses everybody )?

Just reply all (to the list and the sender of the email, plus any other
recipients in the CC list if appropriate) to the email you wish to
comment on. You can reply at any point in the thread and your email will
end up located in that position in the thread, i.e. underneath the
message you replied to in the thread.

The actual threading is dealt with by people's own email software (and by
the software used to manage the archives), via some of the headers sent
along with your email, for example:

In-Reply-To: [EMAIL PROTECTED]
References: [EMAIL PROTECTED]

The long code there is the Message-Id header of the email that the reply
references.

Most emailers will hide all of these headers from you, but a good one
will allow you to look at all the headers or the actual source of the
email, where you will be able to see them, along with a lot of other
information about the message sent.

 
 There's got to be a real easy answer to that, since everybody else does that.
 (I'm using gmail, does it make a difference what e-mail host you use?).

I've not used Gmail much, but many people on the list do and their posts
end up in the correct place in the thread. Note that in Gmail to see the
headers for a message you can select "show original" from the little
drop down menu (down triangle) next to the reply button.

HTH

G

Re: [R] Formating the data

2007-06-08 Thread Gavin Simpson

On Fri, 2007-06-08 at 06:13 -0700, A Ezhil wrote:
 Hi All,
 
 I have a vector of length 48, something like:
 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 
 I would like to print (reformat) this vector as:
 00110011
 
 by simply removing the spaces between them. I have
 been trying with many option but not able to do this
 task.
 I would greatly appreciate your suggestion on fixing
 this simple task.
 
 Thanks in advance.

> dat <- scan()
1: 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
28: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
49:
Read 48 items
> dat
 [1] 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1
> print(dat, print.gap = 0)
 [1]00110011

Is that what you want? It is just altering how the data are printed. You
still get the [1] at the start though.
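
If the [1] is unwanted, one alternative (a sketch, not part of the
original reply) is to collapse the vector into a single string with
paste(), or to write it out with cat(). The vector below re-creates the
48 values from the post; the name bits is only for illustration:

```r
## the 48 values from the original post: 0 0 1 1, ten 0s, then 34 1s
dat <- c(0, 0, 1, 1, rep(0, 10), rep(1, 34))
## collapse into one character string with no separators
bits <- paste(dat, collapse = "")
bits
## or write straight to the console without creating a string
cat(dat, "\n", sep = "")
```

Note that bits is a character string, so it is meant for display or
output rather than for further arithmetic.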

G

Re: [R] Spectral analysis

2007-06-06 Thread Gavin Simpson

On Wed, 2007-06-06 at 13:55 -0700, David LEDU wrote:
 Hi all,
 
 I am dealing with paleoceanographic data and I have a C14 time series
 and one other variable. I would like to perform a spectral analysis
 (fft or wavelet) and plot it. Unfortunately I don't know the exact
 script to do this. Could anybody send me an example to perform my
 spectral analysis?
 
 Thank you
 
 David

Invariably data of this nature are irregularly sampled in time, so you
should check whether the in-built spectrum() function is suitable for
your core data. I'm not aware of much else available in R, but one thing
I am aware of is a paper and R code by Mathias et al in the Journal of
Statistical Software:

http://www.jstatsoft.org/index.php?vol=11

It is issue 2 in that volume. This might be more suitable given your
data. The code is a few years old now and there isn't a ready built
package on CRAN so you'll have to compile it yourself.
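
For reference, a minimal sketch of the in-built spectrum() function on a
regularly sampled series. The lh data set ships with R and merely stands
in for real data here; this is not appropriate as-is for irregularly
sampled core data:

```r
## 'lh' is a regularly sampled series shipped with R (not C14 data)
sp <- spectrum(lh, plot = FALSE)   # raw periodogram by default
## frequency with the largest estimated spectral density
peak <- sp$freq[which.max(sp$spec)]
peak
## plot(sp) would then draw the periodogram
```

The frequencies are in cycles per unit time of the series, so 1 / peak
gives the corresponding period.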

HTH

G


Re: [R] Why is the R mailing list so hard to figure out?

2007-06-05 Thread Gavin Simpson

On Mon, 2007-06-04 at 18:25 -0500, Robert Wilkins wrote:
 Why does the R mailing list need such an unusual and customized user 
 interface?
 
 Last January, I figured out how to read Usenet mailing lists ( or
 Usenet groups ) and they all pretty much work the same, learn to use
 one, you've learned to use them all ( gnu.misc.discuss ,
 comp.lang.lisp , and so on ).
 
 What's the best way to view and read discussions in this group for
 recent days? Can I view the postings for the current day via Google
 Groups?
 
 I hope I'm posting correctly.

Having never used Google Groups or Usenet I have no idea if this is the
same/similar, but the R mailing lists are archived via the Gmane service
(amongst others), which you can view, threaded in your browser (over
http), or point your favourite news (nntp) or RSS feed collator at it.

The entry page is:
http://gmane.org/info.php?group=gmane.comp.lang.r.general

In the Groups section, you'll find links to the various options. If
using a news reader (nntp) or RSS feed collator, just copy the relevant
links into your application of choice.

A threaded web view of the latest posts, for example, is at (via the
threaded HTTP link in the Groups section of the above mentioned link):

http://news.gmane.org/gmane.comp.lang.r.general

I personally let my emailer manage the posting for me as I find that the
easiest way to work...

HTH

G


Re: [R] 'trim' must be numeric of length one？

2007-05-28 Thread Gavin Simpson

On Mon, 2007-05-28 at 10:58 +0800, Ruixin ZHU wrote:
 Hi everybody,
  
 When I followed a practice example, I got an error as follows:
 
 ###
 > cc <- read.table('example5_2.dat', header=TRUE)
 > cc
   EXAM1 EXAM2 EXAM3 EXAM4 EXAM5
 1    45    34    23    35    50
 2    23    36    66    66    34
 3    67    59    72    80    69
 4    56    43    31    34    40
 5    74    66    57    32    66
 > mean(cc)
 EXAM1 EXAM2 EXAM3 EXAM4 EXAM5 
  53.0  47.6  49.8  49.4  51.8 
 > attach(cc)
 > mean(EXAM1,EXAM2,EXAM3,EXAM4,EXAM5)
 Error in mean.default(EXAM1, EXAM2, EXAM3, EXAM4, EXAM5) : 
 'trim' must be numeric of length one

Why did you think that mean would work in the way you used it?

Reading ?mean shows that the default method for mean has a first
argument 'x', and second argument 'trim', plus some others. So in your
2nd example, you passed EXAM1 as argument 'x' and then EXAM2 as 'trim',
and the other EXAMx variables as other arguments. R was not expecting a
vector as argument 'trim' and rightly complained.
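
A small sketch of that argument matching, using the first two columns
from the post typed in by hand (x, y and res are names made up for this
illustration):

```r
x <- c(45, 23, 67, 56, 74)    # EXAM1 from the post
y <- c(34, 36, 59, 43, 66)    # EXAM2 from the post
## y is matched to 'trim', so this call fails rather than averaging both
res <- tryCatch(mean(x, y), error = function(e) e)
inherits(res, "error")
## to average over both vectors, combine them into a single 'x' first
mean(c(x, y))
```

Naming the arguments, as in mean(x = c(x, y), trim = 0), makes it
obvious which value ends up where.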

The reason the first example worked is that there is a method for data
frames (see the first entry in the usage section of ?mean) - where you
correctly passed cc as argument 'x' as the function/method requires.

 In addition: Warning message:
 the condition has length > 1 and only the first element will be used in:
 if (na.rm) x <- x[!is.na(x)] 
 Would anybody explain what caused this error, and how to fix it?

What is wrong with the first example you used? Why do you need to get
the means by specifying all the variables explicitly?

There are various ways of getting means other than mean():

lapply(cc, mean)
sapply(cc, mean)
colMeans(cc)

If you want specific columns, either subset the returned object:

mean(cc)[c("EXAM1", "EXAM4")]

or subset the object before calculating the means:

mean(cc[, c("EXAM1", "EXAM4")])
        ^

note the extra , as we need to specify the columns here.
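
As far as I know, later versions of R dropped the mean() method for data
frames, so mean(cc) may return NA with a warning there. A sketch that
does not rely on that method, with the data from the post typed in by
hand:

```r
cc <- data.frame(EXAM1 = c(45, 23, 67, 56, 74),
                 EXAM2 = c(34, 36, 59, 43, 66),
                 EXAM3 = c(23, 66, 72, 31, 57),
                 EXAM4 = c(35, 66, 80, 34, 32),
                 EXAM5 = c(50, 34, 69, 40, 66))
## all column means
all.means <- colMeans(cc)
all.means
## subset the data first, then take the means
sub.means <- colMeans(cc[, c("EXAM1", "EXAM4")])
## equivalent: take all the means, then subset the result
sapply(cc, mean)[c("EXAM1", "EXAM4")]
```

Both subsetting routes give the same two numbers; which one reads better
is a matter of taste.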

You will need to explain more clearly what you want to do if the above
is not sufficient to solve your problem.

Also, be wary of overly using attach. It can be a handy little tool,
until it bites you in the ass because you forgot to detach/reattach the
object after making some really critical change to the underlying
data/object.

HTH

G

  
 Thanks!
 _
 Dr.Ruixin ZHU
 Shanghai Center for Bioinformation Technology
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 86-21-13040647832
  
 

Re: [R] R-About PLSR

2007-05-25 Thread Gavin Simpson

On Fri, 2007-05-25 at 17:25 +0530, Nitish Kumar Mishra wrote:
 hi R help group,
 I have installed the PLS package in R and used the princomp & prcomp
 commands for calculating PCA using its example file (USArrests example).
 But how can I use PLS for partial least squares, R square, mvrCv? One more
 thing: how can I import an external file into R? When I use plsr, R2, RMSEP it
 shows the error "could not find function" for plsr, RMSEP etc.
 How can I calculate PLS, R2, RMSEP, PCR, MVR using the pls package in R?
 Thanking you

Did you load the package with:

library(pls)

Before you tried to use the functions you mention?

HTH

G


Re: [R] Goodness of fit for hclust?

2007-05-22 Thread Gavin Simpson

On Tue, 2007-05-22 at 00:35 +, [EMAIL PROTECTED] wrote:
 I'd like to get a measure of goodness of fit for a hierarchical
 clustering result from hclust.  Something that would indicate the
 extent to which the dendrogram accurately represents the original
 dissimilarity matrix.  Is there an easy way to do this?
 
 Or, does anyone have code for computing distances between nodes given
 an hclust structure?  So far, my searches have come up dry.
 
 -- David Hinds

Try ?cophenetic which calculates the cophenetic distances of a
hierarchical cluster analysis. The example on that help page shows how
to use the function to get the correlation between the original
distances and cophenetic distances.
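
Along the lines of that help page, a short sketch (the choice of
USArrests and of average linkage is only for illustration):

```r
d <- dist(USArrests)                  # original dissimilarities
hc <- hclust(d, method = "average")   # hierarchical clustering
d.coph <- cophenetic(hc)              # distances implied by the dendrogram
## cophenetic correlation: closer to 1 means a more faithful dendrogram
r <- cor(d, d.coph)
r
```

Comparing r across linkage methods is one way to see which dendrogram
distorts the original dissimilarities least.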

HTH

G


Re: [R] basic problem but can't solve it

2007-05-22 Thread Gavin Simpson

On Tue, 2007-05-22 at 19:01 +0200, Benoit Chemineau wrote:
 Hello,
 I have a basic problem but I can't figure it out with the
 table underneath. I would like to compute monthly averages.
 I would like to have the average measure for month #5 for the first
 three rows (the same number in the first three lines) and the average
 measure for month #6 for the last four rows (the same number in the last
 four lines) in a separate vector (let's call it 'result').
 I tried to use a while statement inside a for loop but it doesn't
 seem to work.
 Can someone please help me with this?
 
Measure Month
2.28 5
14.04 5
0.60 5
0.21 6
0.96 6
0.75 6
1.28 6

If dat is a data frame containing your data:

> dat
  Measure Month
1    2.28     5
2   14.04     5
3    0.60     5
4    0.21     6
5    0.96     6
6    0.75     6
7    1.28     6

> aggregate(dat$Measure, by = list(Month = dat$Month), mean)
  Month    x
1     5 5.64
2     6 0.80

> tapply(dat$Measure, dat$Month, mean)
   5    6 
5.64 0.80 

see ?aggregate and ?tapply for two solutions. The tapply one seems
cleaner and easier to get the vector you need, the aggregate version
needs an extra step:

aggregate(dat$Measure, by = list(Month = dat$Month), mean)$x
                                                          ^^
Note the $x at the end to subset the object returned by aggregate.
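
If what is wanted is the monthly mean repeated on every row (one value
per line of the table, as the question seems to describe), ave() may be
closer to the mark. This is a sketch added for illustration, with the
data typed in from the post:

```r
dat <- data.frame(Measure = c(2.28, 14.04, 0.60, 0.21, 0.96, 0.75, 1.28),
                  Month   = c(5, 5, 5, 6, 6, 6, 6))
## ave() returns the group mean repeated for every row of that group
result <- ave(dat$Measure, dat$Month)
result
```

The result has the same length as dat$Measure, so it can be added to the
data frame as a new column.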

HTH

G


Re: [R] xyplot with grid?

2007-05-09 Thread Gavin Simpson

On Wed, 2007-05-09 at 19:13 +0100, Gav Wood wrote:
  Giving a reproducible example would be a good start.
 
 Ok, what's the easiest way to get a grid (ala grid()) on this graph?
 
 xyplot(x~y,data.frame(x=1:9,y=1:9,z=sort(rep(c('A','B','C'),3))),
  groups=z,auto.key=list(columns=3))
 
 Bish bosh,

Er, write your own panel function:

xyplot(x~y,data.frame(x=1:9,y=1:9,z=sort(rep(c('A','B','C'),3))),
   groups=z,auto.key=list(columns=3), h = -1, v = -1,
   panel = function(x, y, ...) {
   panel.grid(...)
   panel.xyplot(x, y, ...)
 })

Not sure if that is the easiest way, or the best, but that's how I've
learnt to use lattice recently. The v and h arguments are passed to
panel.grid as part of ... and just tell it to plot the grids at the
tick marks.
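
A shorter route that may also do the job, assuming a version of lattice
in which panel.xyplot() accepts "g" in its type argument, is sketched
below (the data frame d simply re-creates the example data):

```r
library(lattice)
d <- data.frame(x = 1:9, y = 1:9, z = rep(c("A", "B", "C"), each = 3))
## "g" asks panel.xyplot to draw a reference grid, "p" draws the points
p <- xyplot(x ~ y, data = d, groups = z,
            type = c("g", "p"), auto.key = list(columns = 3))
class(p)   # a "trellis" object; print(p) draws it
```

Writing the panel function by hand remains the more flexible option when
other elements need to be layered into the panel.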

 
 Gav

HTH Gav,

Gav


Re: [R] randomForest gives different results for formula call v. x, y methods. Why?

2007-04-29 Thread Gavin Simpson

On Sat, 2007-04-28 at 21:13 -0400, David L. Van Brunt, Ph.D. wrote:
 Just out of curiosity, I took the default iris example in the RF
 helpfile...
 but seeing the admonition against using the formula interface for large data
 sets, I wanted to play around a bit to see how the various options affected
 the output. Found something interesting I couldn't find documentation for...
 
 Just like the example...
  set.seed(12) # to be sure I have reproducibility

No differences between runs for me on FC4 using R 2.4.1 and 2.5.0 with:

 require(randomForest)
Loading required package: randomForest
randomForest 4.5-18

*if* I reset the seed before each call to randomForest.

Your example code doesn't seem to be resetting the random seed before
each run. As such, each run is using a different set of random variables
at each bootstrap sample.
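
The effect of resetting the seed can be seen without randomForest at
all. In this sketch sample() stands in for the random draws made inside
randomForest(), and the object names are made up:

```r
set.seed(12)
run1 <- sample(150, 5)   # first "run"
run2 <- sample(150, 5)   # seed not reset: continues the random stream
set.seed(12)
run3 <- sample(150, 5)   # seed reset: starts the stream again
## TRUE: resetting the seed reproduces the first run
identical(run1, run3)
```

run2 will in general differ from run1, which is the situation in the
example code quoted above.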

E.g. runs all same with reset seed:

> set.seed(12)
> randomForest(Species ~ ., data=iris)

Call:
 randomForest(formula = Species ~ ., data = iris)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
> set.seed(12)
> randomForest(x=iris[,1:4],y=iris[,5])

Call:
 randomForest(x = iris[, 1:4], y = iris[, 5])
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
> set.seed(12)
> randomForest(x=iris[,c(1:4)],y=iris[,5])

Call:
 randomForest(x = iris[, c(1:4)], y = iris[, 5])
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
> set.seed(12)
> randomForest(x=iris[,c(1,2,3,4)],y=iris[,5])

Call:
 randomForest(x = iris[, c(1, 2, 3, 4)], y = iris[, 5])
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06

HTH

G

Re: [R] Too slow to execute!

2007-04-29 Thread Gavin Simpson

On Sun, 2007-04-29 at 06:56 -0700, Usman Shehu wrote:
 Greetings,
 I have the following simple function but what worries me is that it
 takes about  5 or more minutes to execute. My machine runs on windows
 with 1.8GHz and 256 Ram.
 > Re=NULL
 > for(i in 1:100000){
 + x=rnorm(20)
 + Re[i]=(x-2*10)/20
 + Re
 + }
 I would appreciate any help on how to make it faster.
 
 Usman

It is not clear exactly what you want to do, but taking what you wrote
literally, there are 3 problems that I see:

 1. You haven't allocated sufficient storage space for 'Re'. As
    such, at each iteration, R has to copy and enlarge the object,
    which takes up all the time.
 2. The result of (x-2*10)/20 is a vector of length 20, which you
    are trying to force into the space for a vector of length 1.
 3. In a loop like this, the last line containing just 'Re' does
    nothing. If you want 'Re' printed to the console, then you need
    to wrap it in print(). Quite why you'd want 'Re' flashing up on
    the screen 100 000 times is beyond me...

Fixing each of these gives:

## number of permutations
n.perm <- 100000
## storage space for a 100 000 x 20 matrix
Re <- matrix(ncol = 20, nrow = n.perm)
## set up loop
for(i in seq_len(n.perm)) {
   x <- rnorm(20)
   ## store in a row of Re
   Re[i,] <- (x-2*10)/20
}

Timing this shows that it runs in 3.5 seconds on my desktop - which has
similar processor but a lot more RAM:

> system.time({
+ n.perm <- 100000
+ Re <- matrix(ncol = 20, nrow = n.perm)
+ for(i in seq_len(n.perm)) {
+    x <- rnorm(20)
+    Re[i,] <- (x-2*10)/20
+ }
+ })
   user  system elapsed
  3.336   0.056   3.394
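
If the loop itself is not needed, a vectorised version may be faster
still. The sketch below is an addition for illustration, using a smaller
n.perm than in the post so that the comparison runs quickly; Re.loop and
Re.vec are made-up names:

```r
n.perm <- 1000   # kept small here; the post used 100 000
set.seed(1)
## loop version from the reply, one row per iteration
Re.loop <- matrix(ncol = 20, nrow = n.perm)
for (i in seq_len(n.perm)) {
  x <- rnorm(20)
  Re.loop[i, ] <- (x - 2 * 10) / 20
}
set.seed(1)
## vectorised version: draw everything at once and fill row by row
Re.vec <- (matrix(rnorm(n.perm * 20), ncol = 20, byrow = TRUE) - 2 * 10) / 20
all.equal(Re.loop, Re.vec)
```

Filling with byrow = TRUE keeps the draws in the same order as the loop,
which is what makes the two results comparable under the same seed.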

HTH

G

Re: [R] Randomising matrices

2007-04-27 Thread Gavin Simpson

 + 1]
  changed - changed + 1}
  }
  return(x)
}

 
 I'm working with vegetation presence-absence matrices based on field 
 observations. The matrices are formatted to have sites as rows and 
 species as columns. The presence of a species on a site is indicated 
 with a 1 (absence is obviously indicated with a 0).
 
 I would like to randomise the matrices many times in order to construct 
 null models. However, I cannot identify a function in R to do this, and 
 the programming looks tricky for someone of my limited skills.
 
 Can anybody help me out?
 
 Many thanks,
 
 Nick Cutler
 
 Institute of Geography
 School of Geosciences
 University of Edinburgh
 Drummond Street
 Edinburgh EH8 9XP
 United Kingdom
 
 Tel: 0131 650 2532
 Web: http://www.geos.ed.ac.uk/homes/s0455078
 

Re: [R] partitioning variation using the Vegan CCA routine?

2007-04-27 Thread Gavin Simpson

On Fri, 2007-04-27 at 16:03 +1000, Matthew McArthur wrote:
 Hello
 I am using Jari Oksanen's CCA routine from the Vegan package on some estuary
 data, following a technique applied in (Anderson, M.J. & Gribble, N.A.,
 1998, Partitioning the variation among spatial, temporal and environmental
 components in a multivariate data set, Australian Journal of Ecology 23,
 158-167).
 Some steps in the process require that the dependent matrix be constrained
 by one independent matrix, given the effect of another independent matrix.
 
 eg: CCA of species matrix, constrained by the environmental matrix, with
 spatial variables treated as covariables
 or:  CCA of species matrix, constrained by the temporal matrix, with
 environmental and spatial variables treated as covariables
 
 Does anyone know of a partitioning routine able to perform this feat or have
 suggestions on how I might approach the problem from scratch?

If you can survive with using RDA ( rda() ), then vegan has function
varpart() to do this automagically for you. If you really need CCA, then
perhaps try a standardisation of the raw data so that when you use rda()
via varpart(), what you get is close to something that cca() would
return or is a good compromise for species data - see ?decostand with
method = "chi.square" or method = "hellinger" in vegan and the cited
reference to see what I'm talking about here.

If you want to do things by hand the old fashioned way, then look at
using Condition(var_x) in your formula:

res <- cca(spp ~ var1 + var2 + Condition(spatial.vars), data = my.data)

see ?cca

HTH

G

 
 Cheers
 Matt
 

Re: [R] how to be clever with princomp?

2007-04-27 Thread Gavin Simpson

On Fri, 2007-04-27 at 12:58 +0100, Simon Pickett wrote:
 Hi all,
 
 I have been using princomp() recently, its very useful indeed, but I have
 a question about how to specify the rows of data you want it to choose.
 
 I have a set of variables relating to bird characteristics and I have been
 using princomp to produce PC scores from these.
 
 However since I have multiple duplicate entries per individual (each bird
 had a varying number of chicks), I only want princomp to treat each
 individual bird as the sample and not include all the duplicates. Then I
 want to replicate the pc scores for all the duplicated rows for that
 individual.
 
 Any idea how to do this?

## example data using the vegan package
require(vegan)
data(varespec)
## duplicate some rows
vare2 <- varespec
vare2 <- rbind(vare2, varespec[sample(nrow(varespec), 50, replace = TRUE), ])
## build the model using prcomp - it is better - on the original data
## without duplicates
mod <- prcomp(varespec, center = TRUE, scale. = TRUE)
## predict for full matrix inc duplicated rows
pred <- predict(mod, vare2)

Takes 0.005 seconds on my machine. So get a subset of your data without
the duplicates, then use the predict method for prcomp.
See ?predict.prcomp.
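
The same idea in a form that runs without vegan, as a sketch: USArrests
stands in for the one-row-per-bird data, and idx is a made-up index
saying which individual each row of the full data belongs to:

```r
## built-in data standing in for one row per individual (an assumption)
birds <- USArrests
mod <- prcomp(birds, center = TRUE, scale. = TRUE)
## pretend some individuals occur several times in the full data
idx <- c(1, 1, 2, 3, 3, 3)
full <- birds[idx, ]
pred <- predict(mod, full)
## each duplicated row gets exactly the scores of its original
all.equal(unname(pred), unname(mod$x[idx, ]))
```

When such an index is available, mod$x[idx, ] replicates the scores
directly and the call to predict() is not even needed.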

Is that what you wanted?

G

 
 Up to now I have been using princomp to only select the entries which are
 not duplicated which is easy, but the difficult bit is the programming to
 duplicate the pc scores across the entries for each individual.
 
 (I developed something that worked but it takes about 5 minutes to run!)
 
 Thanks for all your help,
 
 very much appreciated,
 
 Simon.
 
 
 
 
 Simon Pickett
 PhD student
 Centre For Ecology and Conservation
 Tremough Campus
 University of Exeter in Cornwall
 TR109EZ
 Tel 01326371852
 

Re: [R] queries

2007-04-22 Thread Gavin Simpson

On Sat, 2007-04-21 at 12:03 -0700, Nima Tehrani wrote:
 Dear Help Desk,

   Is there any way to change some of the labels on R diagrams? 

   Specifically in histograms, I would like to: 

   1. change the word frequency to count. 
   2. Make the font of the title (Histogram of ) smaller.
   3. Have a different word below the histogram than the one 
 occurring in the title (right now if you choose X for your variable, it comes 
 both above the histogram (in the phrase Histogram of X) and below it).

   Thanks for your time,
   Nima

dat <- rnorm(100)
hist(dat, ylab = "Count", cex.main = 0.7, xlab = "Something else")

for example.

But this is all R 101 and you could have found this by reading the "An
Introduction to R" manual that comes with your R installation or can be
found at the R website (www.r-project.org, Manuals section of the menu),
and by reading the help for ?hist and ?par (for cex.main).

 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Please do.

HTH

G


Re: [R] division of decimal number

2007-04-19 Thread Gavin Simpson

On Wed, 2007-04-18 at 16:06 +0100, Barry Rowlingson wrote:
 Schmitt, Corinna wrote:
  Dear R-Experts,
  
  how can I divide the number 0.285 with 2. I need a function.
  Result: 0.285 / 2 = 0.1425
 
   Just get the / operator:
 
   divide = get("/")
   
   divide(0.285,2)
   [1] 0.1425
 
 Is that what you want?
 
 Barry

You can use the function directly, by quoting it:

> "/"(0.285, 2)
[1] 0.1425
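
Because the operator is just a function of two arguments, it can be stored or handed to other functions like any other. A small sketch (the names divide and halves are only for illustration):

```r
## the backquoted operator is an ordinary function
divide <- `/`
divide(0.285, 2)

## pass it by name to sapply(); the extra argument 2 is the divisor
halves <- sapply(c(0.285, 1, 3), "/", 2)
halves

## or call it via do.call() with a list of arguments
do.call("/", list(0.285, 2))
```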

G

Re: [R] Abundance data ordination in R

2007-04-02 Thread Gavin Simpson

On Sun, 2007-04-01 at 09:20 -0700, Milton Cezar Ribeiro wrote:
 Dear R-gurus
 
 I have a data.frame with abundance data for species and sites which looks 
 like:
 mydf <- data.frame(
  sp1=sample(0:10,5,replace=T),
  sp2=sample(0:20,5,replace=T),
  sp3=sample(0:4,5,replace=T),
  sp4=sample(0:2,5,replace=T))
 rownames(mydf) <- paste("sites",1:5,sep="")
 
 I would like to make an ordination analysis of these data, and my worry
 is about the zeros (absence of species) in the matrix. From what I have
 read (Gotelli - A Primer of Ecological Statistics, 2004), when I have
 abundance data I can't compute Euclidean distances because the zeros
 mean absence of the species rather than a zero count.
 Gotelli suggests doing a principal coordinates analysis. I would
 like to hear from you what you think about this and what the best
 packages and functions are to compute my distance matrices and do my
 ordination analysis. Can I consider zero as NA in my data.frame? Is
 there a good PDF book about multivariate analysis for
 abundance data available on the web?

In addition to the other suggestions, there is a Task View on CRAN for
the topic of Environmetrics. This has a section describing the various
ordination techniques available in R as well as functions to calculate
distance/dissimilarity matrices:

http://cran.r-project.org/src/contrib/Views/Environmetrics.html
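
To give a flavour of what a principal coordinates analysis looks like, here is a rough base-R sketch. It assumes Bray-Curtis is a suitable dissimilarity for your counts (that is my assumption, not something from your post), and the abundances are made up. The vegan package has vegdist() for this sort of thing; the hand calculation here is only so the example runs without extra packages.

```r
## made-up abundances for 5 sites and 4 species
mydf <- data.frame(sp1 = c(3, 0, 7, 10, 1),
                   sp2 = c(0, 12, 5, 20, 8),
                   sp3 = c(4, 0, 0, 2, 1),
                   sp4 = c(0, 2, 1, 0, 2))
rownames(mydf) <- paste("site", 1:5, sep = "")

## Bray-Curtis: sum of absolute differences over sum of all counts
## for each pair of sites (needs every site total to be > 0)
x <- as.matrix(mydf)
tot <- rowSums(x)
bray <- as.dist(as.matrix(dist(x, method = "manhattan")) /
                outer(tot, tot, "+"))

## principal coordinates analysis (classical scaling) on 2 axes
pcoa <- cmdscale(bray, k = 2)
pcoa
```

Shared zeros do not make two sites look more alike under this coefficient, which is the worry with Euclidean distance on raw counts.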

G

 
 Kind regards
 
 Miltinho
 Brazil
 

Re: [R] partial R

2007-04-02 Thread Gavin Simpson

On Mon, 2007-04-02 at 09:16 -0400, Michael Kubovy wrote:
 On Apr 2, 2007, at 5:49 AM, Pedram Rowhani wrote:
 
  i am wondering if there is a command in R that will give me the
  partial regression coefficients
 
 To answer your question, you could have started with
 RSiteSearch('partial regression')
 
 It's then likely that you would have figured out that one way to proceed is
 install.packages('car')
 ?cr.plots
 
 (You may have to restart R to get the help on a newly-installed  
 package.)

No, you just missed out the fundamental step of loading the package from
the library:

install.packages('car')
library(car)
?cr.plots

HTH

G


Re: [R] Kmeans centers

2007-03-30 Thread Gavin Simpson

On Fri, 2007-03-30 at 09:07 +0200, Sergio Della Franca wrote:
 My simple problem is that when i run kmeans this give me different
 results because if centers is a number, a random set of (distinct)
 rows in x is chosen as the initial centres.

You can stop this and make it reproducible by setting the seed for the
random number generator before doing kmeans - this way the same
(pseudo)random set of rows get selected each time:

dat <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
set.seed(1234)
km <- kmeans(dat, 2)
set.seed(1234)
km2 <- kmeans(dat, 2)
all.equal(km, km2) ## TRUE

But ask yourself: is this helpful? Are the solutions similar each time
you run the function (without setting the seed), or do you get different
results? If the runs give very different results then it is likely that
you are finding local minima, not an optimal solution - a common problem
with iterative algorithms using random starts.

One solution to this /is/ to use several random starts and see if you
get similar results. Some samples may switch clusters, but if the bulk
of samples are assigned to the same cluster (i.e. together, not in
"cluster 1", as the cluster number is arbitrary) then you can be happy
with the result. That some samples switch clusters may just indicate
that there isn't a clearly defined clustering of all your samples - some
are intermediate between clusters.
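
A rough sketch of the several-random-starts idea, on simulated data (the object names are only for illustration). Cross-tabulating two runs shows whether samples stay together even if the cluster labels differ, and the nstart argument of kmeans() tries a number of random starts and keeps the best one:

```r
set.seed(1)
dat <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))

## two independent runs from different random starts
km1 <- kmeans(dat, 2)
km2 <- kmeans(dat, 2)

## compare the partitions; labels are arbitrary, so look for most
## samples falling in one cell per row rather than on the diagonal
tab <- table(run1 = km1$cluster, run2 = km2$cluster)
tab

## let kmeans() try 25 random starts and keep the best solution found
km.best <- kmeans(dat, 2, nstart = 25)
km.best$tot.withinss
```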

Another is to use a hierarchical cluster analysis (via hclust()). Cut it
at the number of clusters you want and use the centers (sic) of those
clusters as the starting points for kmeans. This way the hclust()
results get you close to a good solution, which kmeans then updates as
it is not constrained by having a hierarchical structure.

There is an example of this in Modern Applied Statistics with S (2002 -
Venables and Ripley, Springer), but if you don't have this book, you can
see the MASS scripts for Chapter 11 of the book. The MASS scripts should
have been provided with your copy of R, in
RINSTALL/library/MASS/scripts/ where RINSTALL is where your version
of R is installed. Then you want ch11.R in that directory. Look at
section 11.2 Cluster Analysis in that file.
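
The hclust()-then-kmeans() idea might look something like the following. This is my own minimal sketch on simulated data, not the code from the MASS scripts; the choice of average linkage and of 3 clusters is arbitrary.

```r
set.seed(42)
dat <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))

## hierarchical clustering, cut into 3 groups
hc <- hclust(dist(dat), method = "average")
grp <- cutree(hc, k = 3)

## centroid of each group on each variable: one row per cluster
init <- apply(dat, 2, function(v) tapply(v, grp, mean))

## use those centroids as the starting centres for kmeans
km <- kmeans(dat, centers = init)
table(hclust = grp, kmeans = km$cluster)
```

With the starting centres supplied there is no random start, so repeated runs give the same answer.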

  
 About me the problem is simple. 
  
 The question i ask you is if it possible that centers could be
 different from number. 
 i.e. instead of indicate a number of center, could be possible
 indicate different character lable to identify the cluster i want to
 obtain?

No. And this is why, despite how clear and simple the problem is to you,
you need to show us an example of your data! Surely, if you have
information that exactly identifies the clusters you want to find, why
do you need a clustering algorithm to find them for you?

G


 thk you
 
 
  
 2007/3/29, Gavin Simpson [EMAIL PROTECTED]: 
 On Thu, 2007-03-29 at 15:02 +0200, Sergio Della Franca wrote:
  Dear R-Helpers,
 
  I read in the R documentation, about kmeans: 
 
centers
 
  Either the number of clusters or a set of initial (distinct)
 cluster
  centres. *If a number*, a random set of (distinct) rows in x
 is chosen as
  the initial centres. 
  My question is: could it be possible that the centers are
 character and not
  number?
 
 I think you misunderstand - centers is the number of clusters
 you want
 to partition your data into. How else would you specify the
 number of 
 clusters other than by a number? So no, it has to be a numeric
 number.
 
 The alternative use of centers is to provide known starting
 points for
 the algorithm, such as from the results of a hierarchical
 cluster 
 analysis, that are the locations of the cluster centroids, for
 each
 cluster, on each of the feature variables.
 
 Also, argument x to kmeans() is specific about requiring a
 numeric
 matrix (or something coercible to one), so characters here are
 not 
 allowed either.
 
 But then again, I may not have understood what it is that you
 are
 asking, but that is not surprising given that you have not
 provided an
 example of what you are trying to do, and how you tried to do
 it but 
 failed.
 
  and provide commented, minimal, self-contained, reproducible
 code.
 
 ^^^
 G

Re: [R] substitute NA values

2007-03-30 Thread Gavin Simpson

On Fri, 2007-03-30 at 16:25 +0200, Sergio Della Franca wrote:
 This is that i obtained.
 
 There isn't a method to replace the NA values only for character variable?

This is R, there is always a way (paraphrasing an R-Helper the name of
whom I forget just now). If you mean a canned function, not that I'm
aware of.

Here is one way:

## some example data - not exactly like yours
set.seed(1234)
dat <- data.frame(test = sample(c("t","f"), 9, replace = TRUE),
                  num = c(10,14,25,NA,40,45,44,47,NA))

## add an NA to dat$test to match your example
dat$test[8] <- NA

## print out dat
dat

## count the various options in $test and return the name of
## the most frequent
freq <- names(which.max(table(dat$test)))

## replace NA in $test with most frequent
dat$test[is.na(dat$test)] <- freq

## print out dat again to show this worked
dat

There may be better ways - the names(which.max(table(...))) seems a bit
clunky to me but it is Friday afternoon and it's been a long week...

And, as this /is/ R, you could wrap that into a function for your use on
other data sets, but I'll leave that bit up to you.
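
For what it is worth, one possible shape for such a function is sketched below. The names fill_mode and fill_na are made up for this example, the data are invented, and it is only meant for character or factor columns plus numeric ones (median), as in the original question.

```r
## replace NA in a character/factor vector with the most frequent value
fill_mode <- function(x) {
    counts <- table(x)                   # NA is left out of the counts
    if (length(counts) == 0) return(x)   # nothing to fill with
    x[is.na(x)] <- names(counts)[which.max(counts)]
    x
}

## apply to a data frame: median for numeric columns, mode otherwise
fill_na <- function(df) {
    df[] <- lapply(df, function(col) {
        if (is.numeric(col)) {
            col[is.na(col)] <- median(col, na.rm = TRUE)
            col
        } else {
            fill_mode(col)
        }
    })
    df
}

dat <- data.frame(test = c("t","f","f","t","f",NA,"f","t","f"),
                  num = c(10,14,25,NA,40,45,44,47,NA))
fixed <- fill_na(dat)
fixed
```

Ties in the counts go to whichever value table() lists first, which may or may not be what you want.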

HTH

G

 
 2007/3/30, Gabor Grothendieck [EMAIL PROTECTED]:
 
  I assume you are referring to na.roughfix in randomForest.  I don't think
  it
  works for logical vectors or for factors outside of data frames:
 
   library(randomForest)
   DF <- data.frame(a = c(T, F, T, NA, T), b = c(1:3, NA, 5))
   na.roughfix(DF)
  Error in na.roughfix.data.frame(DF) : na.roughfix only works for
  numeric or factor
   DF$a <- factor(DF$a)
   na.roughfix(DF$a)
  Error in na.roughfix.default(DF$a) : roughfix can only deal with numeric
  data.
   na.roughfix(DF)
   a   b
  1  TRUE 1.0
  2 FALSE 2.0
  3  TRUE 3.0
  4  TRUE 2.5
  5  TRUE 5.0
 
 
  On 3/30/07, Sergio Della Franca [EMAIL PROTECTED] wrote:
   Dear R-Helpers,
  
  
   I have the following data set(y):
  
   Test_Result   #_Test
        t          10
        f          14
        f          25
        f          NA
        f          40
        t          45
        t          44
       NA          47
        t          NA
  
  
   I want to replace the NA values with the following method:
   - for the numeric variable, replace NA with median
   - for character variable , replace NA with the most frequent level
  
   If I use x <- na.roughfix(y) the NA values are correctly replaced.
   But if I use x <- na.roughfix(y$Test_Result) I obtain the following error:
  
   roughfix can only deal with numeric data.
  
   How can I solve this problem, which I meet every time I want to replace
   only the NA values of a column (type character)?
  
   Thank you in advance.
  
  
   Sergio Della Franca
  

Re: [R] Kmeans centers

2007-03-29 Thread Gavin Simpson

On Thu, 2007-03-29 at 15:02 +0200, Sergio Della Franca wrote:
 Dear R-Helpers,
 
 I read in the R documentation, about kmeans:
 
   centers
 
 Either the number of clusters or a set of initial (distinct) cluster
 centres. *If a number*, a random set of (distinct) rows in x is chosen as
 the initial centres.
 My question is: could it be possible that the centers are character and not
 number?

I think you misunderstand - centers is the number of clusters you want
to partition your data into. How else would you specify the number of
clusters other than by a number? So no, it has to be a numeric number.

The alternative use of centers is to provide known starting points for
the algorithm, such as from the results of a hierarchical cluster
analysis, that are the locations of the cluster centroids, for each
cluster, on each of the feature variables.

Also, argument x to kmeans() is specific about requiring a numeric
matrix (or something coercible to one), so characters here are not
allowed either.
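
To make the two uses of centers concrete, here is a small sketch on simulated data; the choice of rows 1, 50 and 100 as starting points is arbitrary and only for illustration.

```r
set.seed(10)
x <- matrix(rnorm(200), ncol = 2)

## centers as a number: 3 rows of x are picked at random as the start
km.n <- kmeans(x, centers = 3)

## centers as a matrix: one row per cluster, one column per variable,
## giving the starting location of each cluster centre
start <- x[c(1, 50, 100), ]
km.m <- kmeans(x, centers = start)

km.n$centers
km.m$centers
```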

But then again, I may not have understood what it is that you are
asking, but that is not surprising given that you have not provided an
example of what you are trying to do, and how you tried to do it but
failed.

 and provide commented, minimal, self-contained, reproducible code.
  ^^^
G

Re: [R] Xemacs, ESS, R config issue

2007-03-29 Thread Gavin Simpson

On Thu, 2007-03-29 at 13:57 -0500, c n wrote:
 I've searched for 45 minutes, apparently in all the wrong places for a
 solution to a configuration issue I'm having.
 
 When I use Xemacs with ESS running in R-mode, and I type a _ character, it
 autocompletes it to <- .  How do I disable this annoying feature?
 
 Thanks much.

Don't know about disabling it, but to get a _, just press it twice.
_ doesn't auto-complete in text strings (quoted).

_ isn't allowed for assignment anymore, so the only place I can think
of at the mo is that you want to type a _ in a variable name?

Also, this is best addressed on the ESS mailing list:
ess-helpATstatDOTmathDOTethzDOTch (just remove the AT and DOTs)

G

Re: [R] concatenate 2 data.frames

2007-03-23 Thread Gavin Simpson

On Fri, 2007-03-23 at 08:51 +0100, João Fadista wrote:
 Dear all,
  
 I would like to know how can I concatenate 2 data.frames into a single
 one. Both data frames have the same number of columns and the same
 class type in each correspondent column. So what I want is to have a
 new data.frame where I have first the values from one data.frame and
 then the values from a second data.frame would came after in this new
 data.frame.
  
 Thanks in advance.

By "after", do you mean columns for dataframe1 then columns of
dataframe2, or do you mean you want to append dataframe2 onto the bottom
of dataframe1?

The first is:
dat1 <- data.frame(var1 = rnorm(10), var2 = rnorm(10),
                   var3 = gl(2, 5, labels = c("red", "blue")))
dat2 <- data.frame(var4 = rnorm(10), var5 = rnorm(10),
                   var6 = gl(2, 5, labels = c("red", "blue")))
combined <- data.frame(dat1, dat2)
combined
          var1        var2 var3        var4        var5 var6
1  -1.61397560 -0.40296928  red  1.48380888  1.35501273  red
2   1.01901681 -0.27616320  red -1.00234243 -0.79328309  red
3  -0.88272375 -0.42375566  red -1.31503261 -0.04570735  red
4   1.37368014 -0.63154987  red -1.40635604  1.50906371  red
5   0.66810230 -0.43453383  red  0.30449564 -0.24893343  red
6  -0.06403118 -1.59095216 blue  0.41945472  0.09143192 blue
7   0.02208197  1.70299530 blue -1.64188953 -0.30545702 blue
8  -1.13057000 -0.67610437 blue -1.15801044  1.17682587 blue
9  -2.32315433 -0.07500192 blue  0.03576081 -1.14670543 blue
10 -0.64734307  0.74789423 blue -0.57466841 -1.69753353 blue

You could also use cbind().

The second could be:
## need to  provide the same variables names for matching columns
names(dat2) <- c("var1", "var2", "var3")
rbind(dat1, dat2)

          var1        var2 var3
1  -1.61397560 -0.40296928  red
2   1.01901681 -0.27616320  red
3  -0.88272375 -0.42375566  red
4   1.37368014 -0.63154987  red
5   0.66810230 -0.43453383  red
6  -0.06403118 -1.59095216 blue
7   0.02208197  1.70299530 blue
8  -1.13057000 -0.67610437 blue
9  -2.32315433 -0.07500192 blue
10 -0.64734307  0.74789423 blue
11  1.48380888  1.35501273  red
12 -1.00234243 -0.79328309  red
13 -1.31503261 -0.04570735  red
14 -1.40635604  1.50906371  red
15  0.30449564 -0.24893343  red
16  0.41945472  0.09143192 blue
17 -1.64188953 -0.30545702 blue
18 -1.15801044  1.17682587 blue
19  0.03576081 -1.14670543 blue
20 -0.57466841 -1.69753353 blue

HTH

G

  
 
 Med venlig hilsen / Regards
 
 João Fadista

Re: [R] Change Axis Size

2007-03-23 Thread Gavin Simpson

On Fri, 2007-03-23 at 12:40 +, [EMAIL PROTECTED] wrote:
 Sorry if this is an obvious question but I have not been able to find the
 answer.
 I wish to plot 3 lines on the same plot. However, whichever one I plot
 first, the axis does not have a big enough range for the other two to be
 shown in the plot (they get cut off at the top and the bottom). Is there a
 way to change the range of the y-axis when adding a new line to the plot?
 
 Thank you,
 
 Alex

Use the ylim parameter in plot():

x <- 1:100
y1 <- sort(runif(100))
y2 <- y1 * 1.2
y3 <- y2 * rnorm(100)
plot(x, y1, type = "n", ylim = range(y1, y2, y3))
lines(x, y1, col = "red")
lines(x, y2, col = "blue")
lines(x, y3, col = "green")

An alternative is matplot:

matplot(x, cbind(y1,y2,y3), col = c("red", "blue", "green"),
        type = "l", lty = "solid")

or

matplot(x, cbind(y1,y2,y3), type = "n")
matlines(x, cbind(y1,y2,y3), col = c("red", "blue", "green"),
         lty = "solid")

HTH

G

Re: [R] dynamic linear models in R

2007-03-22 Thread Gavin Simpson

On Wed, 2007-03-21 at 23:58 -0400, Johann Hibschman wrote:
 Hi all,
 
 I've just started working my way through Mike West and Jeff Harrison's
 _Bayesian Forecasting and Dynamic Models_, and I was wondering if
 there were any publically-available packages to handle dynamic linear
 models, as they describe.

Johann,

The one I'm most familiar with is package dlm by Giovanni Petris. There
is also package sspir by Claus Dethlefsen and Søren Lundbye-Christensen.
Both packages are on CRAN.

I have been using dlm for some recent DLM analysis I was doing and have
found it reasonably easy to use and the maintainer, Giovanni Petris, has
been extremely patient and helpful with the odd question I have had
about how to specify the models I wanted in dlm.

sspir has a formula interface so it may be easier to specify models in
it than dlm, but I have no experience of sspir in use.

HTH

G


Re: [R] kmeans

2007-03-20 Thread Gavin Simpson

On Tue, 2007-03-20 at 19:10 +0100, Sergio Della Franca wrote:
 Dear R-helpers,
 
 I have this dataset(y):
 
   YEAR   PRODUCTS
   1 10
   2 42
   3 25
   4 42
   5 40
   6 45
   7 44
   8 47
   9 42
 
 I perform kmeans clustering, and the results are the following:
 
 
 Cluster means:
   YEAR  PRODUCTS
 1 3.67 41.3
 2 7.50 44.5
 3 2.00 17.5
 
 Clustering vector:
 1 2 3 4 5 6 7 8 9
 3 1 3 1 1 2 2 2 2
 Now my problem is to add a column to my dataset (y) with the information
 from the clustering vector, i.e.:
 
YEAR   PRODUCTS *clustering vector*
   1 10*3*
   2 42*1*
   3 25*3*
   4 42*1*
   5 40*1*
   6 45*2*
   7 44*2*
   8 47*2*
   9 42*2*
 
 
 How can I obtain my new dataset with the information of clustering
 vector?

Given dat is your data.frame:

> dat
  YEAR PRODUCTS
1    1       10
2    2       42
3    3       25
4    4       42
5    5       40
6    6       45
7    7       44
8    8       47
9    9       42

then the following does what you want:

set.seed(12345)
clust <- kmeans(dat, 3) # 3 clusters as per example
new.dat <- data.frame(dat, Cluster = clust$cluster)
new.dat

Gives a new data frame with the extra column:

  YEAR PRODUCTS Cluster
1    1       10       1
2    2       42       3
3    3       25       1
4    4       42       3
5    5       40       3
6    6       45       2
7    7       44       2
8    8       47       2
9    9       42       2

Or if you really want to add to the original data do this directly:

dat$Cluster <- clust$cluster

which yields:

> dat
  YEAR PRODUCTS Cluster
1    1       10       1
2    2       42       3
3    3       25       1
4    4       42       3
5    5       40       3
6    6       45       2
7    7       44       2
8    8       47       2
9    9       42       2

This is all covered in "An Introduction to R", which the posting guide
asks you to read.

HTH

G

Re: [R] MANOVA permutation testing

2007-03-16 Thread Gavin Simpson

On Fri, 2007-03-16 at 00:50 +, Tyler Smith wrote:
 Hi,
 
 I've got a dataset with 7 variables for 8 different species. I'd like
 to test the null hypothesis of no difference among species for these
 variables. MANOVA seems like the appropriate test, but since I'm
 unsure of how well the data fit the assumptions of equal
 variance/covariance and multivariate normality, I want to use a
 permutation test. 
 
 I've been through CRAN looking at packages boot, bootstrap, coin,
 permtest, but they all seem to be doing more than I need. Is the
 following code an appropriate way to test my hypothesis:
 
 result.vect <- c()
 
 for (i in 1:1000){
   wilks <- summary.manova(manova(maxent~sample(max.spec)),
                           test="Wilks")$stats[1,2]
   result.vect <- c(result.vect,wilks)
 }
 
 maxent is the data, max.spec is a vector of species names. Comparing
 the result.vect with the wilks value for the unpermuted data suggests
 there are very significant differences among species -- but did I do
 this properly?
 

Hi Tyler,

(without knowing more about your data) I think you are almost there, but
the R code can be made much more efficient.

When you create your result vector, it is of length 0. Each time you add
a result, R has to copy the current result object, enlarge it and so on.
This all takes a lot of time. Better to allocate storage first, then add
each result in turn by replacement. E.g.:

Using an example from ?summary.manova:

tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
Y <- cbind(tear, gloss, opacity)
rate <- factor(gl(2,10), labels=c("Low", "High"))

## define number of permutations
nperm <- 999
## allocate storage, here we want 999 + 1 for our observed stat
res <- numeric(nperm+1)
## do the loop - the seq(along ...) bit avoids certain problems
for(i in seq(along = res[-1])) {
    ## here we replace the ith value in the vector res with the stat
    res[i] <- summary(manova(Y ~ sample(rate)),
                      test = "Wilks")$stats[1,2]
}
## now we append the observed stat onto the end of the result vector res
## we also store this in 'obs' for convenience
res[nperm+1] <- obs <- summary(manova(Y ~ rate),
                               test = "Wilks")$stats[1,2]

## this is the permutation p-value - the proportion of the nperm
## permutations + 1 that are less than or equal to the observed
## stat 'obs' (for Wilks' lambda it is the small values that are extreme)
sum(res <= obs) / (nperm+1)
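
If you want to reuse this, the same steps can be wrapped into a small function. The sketch below is one way to do it; the name perm_manova and its arguments are made up for this example. It compares with <= because small values of Wilks' lambda indicate stronger separation between groups.

```r
perm_manova <- function(Y, group, nperm = 999) {
    ## Wilks' lambda for a given grouping
    wilks <- function(g) {
        summary(manova(Y ~ g), test = "Wilks")$stats[1, 2]
    }
    obs <- wilks(group)
    ## statistic under nperm random relabellings of the samples
    perm <- replicate(nperm, wilks(sample(group)))
    ## count the observed value as one of the permutations
    (sum(perm <= obs) + 1) / (nperm + 1)
}

tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
Y <- cbind(tear, gloss, opacity)
rate <- factor(gl(2, 10), labels = c("Low", "High"))

set.seed(123)
p <- perm_manova(Y, rate, nperm = 199)
p
```

replicate() allocates the result vector for you, so there is no need to grow it inside a loop.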

HTH,

G


Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Gavin Simpson

On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
 Since you can index a matrix or dataframe with
 a matrix of logicals, you can use is.na()
 to index all the NA locations and replace them
 all with 0 in one command.
 

A quicker solution, that, IIRC,  was posted to the list by Peter
Dalgaard several years ago is:

sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

Some timings on a larger problem with 100 columns:

> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
+                                   size = 1000*100, replace = TRUE),
+                            nrow = 1000))

> system.time(retval <- sapply(mydata.df,
+                              function(x) {x[is.na(x)] <- 0; x}))
[1] 0.108 0.008 0.120 0.000 0.000

> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 2.460 0.028 2.498 0.000 0.000

And a larger problem still, 1000 columns:

> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
+                                   size = 1000*1000, replace = TRUE),
+                            nrow = 1000))

> system.time(retval <- sapply(mydata.df,
+                              function(x) {x[is.na(x)] <- 0; x}))
[1] 0.908 0.068 2.657 0.000 0.000
> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 43.127  0.332 46.440  0.000  0.000

Profiling mydata.df[is.na(mydata.df)] <- 0 shows that it spends most of
this time subsetting the individual cells of the data frame in turn
and setting the NA ones to 0.

HTH

G

  mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30, 
  replace = TRUE), nrow = 6))
  mydata.df
   V1 V2 V3 V4 V5
 1  1 NA  1  1  1
 2  1 NA NA NA  1
 3 NA NA  1 NA NA
 4 NA NA NA NA  1
 5 NA  1 NA NA  1
 6  1 NA NA  1  1
  is.na(mydata.df)
      V1    V2    V3    V4    V5
 1 FALSE  TRUE FALSE FALSE FALSE
 2 FALSE  TRUE  TRUE  TRUE FALSE
 3  TRUE  TRUE FALSE  TRUE  TRUE
 4  TRUE  TRUE  TRUE  TRUE FALSE
 5  TRUE FALSE  TRUE  TRUE FALSE
 6 FALSE  TRUE  TRUE FALSE FALSE
  mydata.df[is.na(mydata.df)] <- 0
  mydata.df
   V1 V2 V3 V4 V5
 1  1  0  1  1  1
 2  1  0  0  0  1
 3  0  0  1  0  0
 4  0  0  0  0  1
 5  0  1  0  0  1
 6  1  0  0  1  1
  
 
 Steven McKinney
 
 Statistician
 Molecular Oncology and Breast Cancer Program
 British Columbia Cancer Research Centre
 
 email: [EMAIL PROTECTED]
 
 tel: 604-675-8000 x7561
 
 BCCRC
 Molecular Oncology
 675 West 10th Ave, Floor 4
 Vancouver B.C. 
 V5Z 1L3
 Canada
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED] on behalf of David L. Van Brunt, Ph.D.
 Sent: Wed 3/14/2007 5:22 PM
 To: R-Help List
 Subject: [R] replacing all NA's in a dataframe with zeros...
  
 I've seen how to  replace the NA's in a single column with a data frame
 
 mydata$ncigs[is.na(mydata$ncigs)] <- 0
 
 But this is just one column... I have thousands of columns (!) that I need
 to do this, and I can't figure out a way, outside of the dreaded loop, do
 replace all NA's in an entire data frame (all vars) without naming each var
 separately. Yikes.
 
 I'm racking my brain on this, seems like I must be staring at the obvious,
 but it eludes me. Searches have come up CLOSE, but not quite what I need..
 
 Any pointers?
 

Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Gavin Simpson

On Thu, 2007-03-15 at 10:21 +0100, Peter Dalgaard wrote:
 Gavin Simpson wrote:
  On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:

  Since you can index a matrix or dataframe with
  a matrix of logicals, you can use is.na()
  to index all the NA locations and replace them
  all with 0 in one command.
 
  
 
  A quicker solution, that, IIRC,  was posted to the list by Peter
  Dalgaard several years ago is:
 
  sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

 I hope your memory fails you, because it doesn't actually work.

Ah, yes, apologies Peter. I have the sapply version embedded in a
package function that I happened to be working on (where I wanted the
result to be a matrix) and pasted directly from there and not my crib
sheet of useful R-help snippets where I do have it as lapply(...). I'd
forgotten I'd changed Peter's suggestion slightly in my function.
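
A quick way to see the difference between the two forms, as a sketch on a small made-up data frame (test.df and na2zero() here are illustrative names, not taken from my package):

```r
## small example data frame with some NAs
test.df <- data.frame(x1 = c(NA, 2, 3, NA),
                      x2 = 1:4,
                      x3 = c(1, NA, NA, 4))

## replace NA by 0 in a single vector
na2zero <- function(x) { x[is.na(x)] <- 0; x }

## sapply() simplifies the list of columns to a matrix
m <- sapply(test.df, na2zero)
is.matrix(m)

## assigning into d[] keeps the data frame and just swaps the columns
d <- test.df
d[] <- lapply(d, na2zero)
is.data.frame(d)
```

Both hold the same values; only the container differs, which matters if later code relies on data frame methods.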

That'll teach me to reply before my morning cup of Earl Grey.

All the best,

G

 
  sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
  x1 x2 x3
 [1,]  0  1  1
 [2,]  2  2  0
 [3,]  3  3  0
 [4,]  0  4  4
 
 is a matrix, not a data frame.
 
 Instead:
 
  test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
  test.df
   x1 x2 x3
 1  0  1  1
 2  2  2  0
 3  3  3  0
 4  0  4  4
 
 Speedwise, sapply() is doing lapply() internally, and the assignment
 overhead should be small, so I'd expect similar timings.
 

Re: [R] dataframe layout

2007-03-14 Thread Gavin Simpson

On Wed, 2007-03-14 at 03:53 -0500, Robert Baer wrote:
 Can someone remind me how to change the columns in df.a into a two column 
 df.b that contains one column of data and another column of the original 
 column headings as levels.
 
 Example:
 a=1:3
 b=4:6
 c=7:9
 df.a=data.frame(a,b,c)
 
 Should become in df.b:
 dat   lev
 1  a
 2  a
 3  a
 4  b
 5  b
 6  b
 7  c
 8  c
 9  c
 
 Thanks.

One option is stack()

> a=1:3
> b=4:6
> c=7:9
> df.a=data.frame(a,b,c)
> df.a
  a b c
1 1 4 7
2 2 5 8
3 3 6 9
> stack(df.a)
  values ind
1      1   a
2      2   a
3      3   a
4      4   b
5      5   b
6      6   b
7      7   c
8      8   c
9      9   c
> class(stack(df.a))
[1] "data.frame"
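
To get the column names asked for in the question (dat and lev), the result of stack() can simply be renamed. A minimal sketch, with df.b as in the original post:

```r
df.a <- data.frame(a = 1:3, b = 4:6, c = 7:9)

## stack() returns columns 'values' and 'ind'; 'ind' is a factor
## whose levels are the original column names
df.b <- stack(df.a)
names(df.b) <- c("dat", "lev")

str(df.b)
levels(df.b$lev)
```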

HTH

G


Re: [R] distance metrics

2007-03-12 Thread Gavin Simpson

On Mon, 2007-03-12 at 16:02 -0700, Sender wrote:
 Thanks for the suggestion Christian. I'm trying to avoid expanding the dist
 object to a matrix, since i'm usually working with microarray data which
 produces a distance matrix of size 5000 x 5000.
 
 If i can keep it in its condensed form i think it will speed things up.
 
 Is my thinking correct?

That will all depend on what you want to do with it...

A dist object of that size is c. 100 MB in memory, and c. 200 MB in size
as the full dissimilarity matrix - values from object.size(). Of course,
you'll need a reasonable amount of free memory over and above this to do
anything useful with the matrix as copies may be required during
analysis/processing etc.

Of course, a dist object is just a vector of observed distances with
various attributes, so one can always use [ for vectors, but I imagine
that anything other than trivial operations will become fiddly,
complicated and time consuming - if you have the memory, give the
as.matrix option a try and see how it works for your specific problems.
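
For the simple case of looking up single distances, something along these lines should work without expanding the object. It is only a sketch: dist_get() is a made-up name, and the index formula is the one given in ?dist for the lower triangle stored by columns.

```r
## hypothetical helper: distance between observations i and j taken
## straight from the vector underlying a "dist" object
dist_get <- function(d, i, j) {
    n <- attr(d, "Size")
    if (i == j) return(0)
    lo <- min(i, j)
    hi <- max(i, j)
    ## position of the (lo, hi) pair in the stored vector, see ?dist
    d[n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo]
}

set.seed(1)
x <- matrix(rnorm(50), nrow = 10)
d <- dist(x)

dist_get(d, 3, 7)
as.matrix(d)[3, 7]  # same value, but via the full 10 x 10 matrix
```

Anything more involved than this (rows, columns, sub-matrices) quickly gets into the fiddly territory mentioned above.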

G

 
 
 On 3/12/07, Christian Hennig [EMAIL PROTECTED] wrote:
 
  On Mon, 12 Mar 2007, Sender wrote:
 
   Hello:
  
   Does anyone know if there exists a package that handles methods for [
  for
   dist objects?
  
   I would like to access a dist object using matrix notation
  
   e.g.
  
   dMat = dist(x)
   dMat[i,j]
 
  Try
  dMat - as.matrix(dist(x))
 
  Christian
 
 
 
  *** --- ***
  Christian Hennig
  University College London, Department of Statistical Science
  Gower St., London WC1E 6BT, phone +44 207 679 1698
  [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
 
 

Re: [R] dendrogram / clusteranalysis plotting

2007-03-09 Thread Gavin Simpson

On Fri, 2007-03-09 at 01:01 +0100, bunny , lautloscrew.com wrote:
 Dear all,
 
 i performed a clusteranalysis - which worked so far...
 i plotted the dendrogram and sooo many branches, a rough sketch would  
 be enough ;)
 
 i tried max.levels therefore which worked, but not for the plot...

(Re-)read ?dendrogram. The function cut.dendrogram() can prune a tree's
lower branches. You can plot the returned object's $upper component,
which is itself an object of class dendrogram.

There is an example in ?dendrogram of using cut.
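
Roughly, the idea looks like this. It is a sketch using the USArrests data that ships with R; the linkage method and the cut height h = 50 are arbitrary choices for illustration, so read a sensible height off your own full plot.

```r
## cluster a built-in data set and convert to a dendrogram
hc <- hclust(dist(USArrests), method = "average")
dend <- as.dendrogram(hc)

## cut at height 50: $upper is the pruned tree, $lower a list of the
## branches that were cut off
pruned <- cut(dend, h = 50)

## plot only the pruned top of the tree
plot(pruned$upper)

## number of branches hanging below the cut
length(pruned$lower)
```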

HTH

G

 
 i used the following
 
 plot(hcd,nodePar =nP, str(hcd,max.level=1))
 
 the output on the terminal was:
 
 --[dendrogram w/ 2 branches and 196 members at h = 2.70]
|--[dendrogram w/ 2 branches and 34 members at h = 1.79] ..
`--[dendrogram w/ 2 branches and 162 members at h = 1.95] ..
 
 which is great !
 
 but i cant get it done for the plot, the plot always shows all the  
 branches...!
 does anybody know how to fix this one ?
 
 thx in advance
 
 -m.
 

Re: [R] how can i group branches of a dendrogram

2007-03-09 Thread Gavin Simpson

On Fri, 2007-03-09 at 02:00 +0100, bunny , lautloscrew.com wrote:
 Hi all,
 
 how can i group branches of a dendrogram ?

Err... you'll need to give us more than that to go on. What do you mean
by group? Draw a marker round broad clusters, or prune them? Or
something else? I just replied with an answer that deals with pruning
back objects of class dendrogram, but if this is not what you mean in
this mail, reply with an example of what you tried and a description of
exactly what you want to do, and maybe someone can help.

G

 
 thx in advance
 

Re: [R] dendrogram again

2007-03-09 Thread Gavin Simpson

On Fri, 2007-03-09 at 12:17 +0100, bunny , lautloscrew.com wrote:
 Hi all,
 
 ok, i know i can cut a dendrogram, which i did.
 all i get is three objects that a dendrograms itself.
 
 for example:
 myd$upper, myd$lower[[1]], myd$lower[[2]]
 and so on. of course i can plot them seperately now.
 
 but the lower parts still have hundreds of branches. i´ll need a 30   
 widescreen to watch the whole picture.
 what i´d like to is group the lower branches , so that i get a  
 dendrogram with a few branches, splitting only in the upper levels.  
 In terms of the cluster analysis, i just want to have a few bigger  
 clusters.
 
 thx,
 
 m.
 
 P.S.:
 putting parts of a cutted dendrogram back into to one could be an  
 idea ? is it somehow possible ?

Again, perhaps I'm missing something, but if I understand you correctly
(again no example I can follow - what is myd and how did you create
it?), you only want to plot the upper part of the dendrogram and not the
lower branches. If so, then this /is/ on ?dendrogram and you /do/ use
cut() to do it ...:

'cut.dendrogram()' returns a list with components '$upper' and
 '$lower', the first is a truncated version of the original tree,
 also of class 'dendrogram', the latter a list with the branches
 obtained from cutting the tree, each a 'dendrogram'.

So to only show the pruned tree, you just plot $upper - it does say that
$upper is a dendrogram and that it is the truncated version of the
original tree - which is what I understand you to be asking for. This
example shows it in action - this is what I mean by a reproducible
example - (I'm using package vegan as I am familiar with this data set):

require(vegan) ## if FALSE, install it
data(varespec)

hc <- hclust(vegdist(varespec, "bray"), method = "ward")
hc <- as.dendrogram(hc)

## this is the full dendrogram - too many nodes, so prune
plot(hc)

## let's take four clusters and prune it back
hc.pruned <- cut(hc, h = 1) # can't specify k so read height off first
                            # plot - cutting at h = 1 gives 4 clusters

# plot only the upper part of the tree showing only the 4 clusters
plot(hc.pruned$upper, center = TRUE)

Is this what you want? If not, using the example I provide above, tell
us exactly what you want to achieve.

HTH

G


Re: [R] How to open more windows to make more graphs at once!

2007-03-07 Thread Gavin Simpson

On Wed, 2007-03-07 at 09:39 +0100, Faramarzi Monireh wrote:
 Dear R users,
 I have a data frame (test) including five columns of upper (numeric),
 lower (numeric), observed (numeric), best_sim (numeric) and stname
 (factor with 80 levels, each level with different length). Now I would
 like to write a short program to draw one graph as follow for each
 level of stname but I would like also to draw each time 12 graphs for
 the 12 levels of stname in the same graphic windows and save it as
 jpeg' file . This means at the end I will have 7 (80 levels/12=7)
 graphic windows and 7 jpeg files each one with 12 graphs (the last one
 with 8 graphs) for the 12 levels of stname. I already wrote the
 following script to do it each time for 12 levels of stname but I have
 to change script each time for the another 12 levels [line 3 in the
 script for example: for( i in levels(test$stname)[12:24))] and I do
 not know how can I save the obtained graphs (seven graphic windows) as
 jpeg files (e.g. plot1.jpeg, plot2.jpeg and so on). As I have 45
 dataset like this it would be gr!
  eat if somebody can help me to complete this script to do all
 together for a dataset using a script.
 Thank you very much in advance for your cooperation,
 Monireh
 

Hi Monireh,

I don't have your data set so I have generated some random data to
illustrate one approach to this.

## generate some data
set.seed(1234)
dat <- data.frame(upper = rnorm(100), lower = rnorm(100),
                  observed = rnorm(100), best_sim = rnorm(100),
                  stname = factor(gl(5, 20), labels = letters[1:5]))

## because this is going to be called 45 times, I've wrapped it in a
## function, foo()
## Note the filename arg. It contains %03d which means that R will
## insert a number and produce many jpegs, varying by this number
## e.g. Rplot001.jpeg, Rplot002.jpeg - see ?jpeg.
## the ... allow passing of arguments to jpeg
foo <- function(x, filename = "Rplot%03d.jpeg", ...) {
    ## start the jpeg device
    jpeg(filename = filename, ...)
    ## store the parameter defaults and set a 2 by 2 plot region
    opar <- par(mfrow = c(2,2))
    ## this ensures that the defaults are restored and the device is
    ## closed on function exit
    on.exit({par(opar); dev.off()})
    ## set up a loop to go over the levels of your factor
    for(i in levels(x$stname)) {
        ## do the plotting - here you need to add the plot commands
        ## you really want to use - these are just examples.
        plot(lower ~ upper, data = x, subset = stname == i)
        ## this just adds a lowess line, I use with() to make it easier
        ## to read.
        with(x, lines(lowess(upper[stname == i], lower[stname == i]),
                      col = "red"))
    }
    invisible()
}

## to use the function on the demo data
## uses default filename
foo(dat)

## or passing arguments to jpeg()
foo(dat, width = 600, height = 600, pointsize = 10)

## or using your own file name
foo(dat, filename = "dataset1_%03d.jpeg", width = 600, height = 600,
    pointsize = 10)

See ?jpeg to see why this works - the filename with %03d allows R to
produce several jpegs.
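
If it helps, the file names can be previewed with sprintf(), which understands the same C-style %03d format (a zero-padded, three-digit page number). The page count below assumes 12 panels per page, as in your par(mfrow = c(3,4)) layout.

```r
## names of the first three files for the default filename
sprintf("Rplot%03d.jpeg", 1:3)

## 80 levels at 12 panels per page gives this many files
ceiling(80 / 12)
```

So with 80 stations and a 3 by 4 layout you would end up with 7 files, the last one only partly filled.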

   
 windows(9,9)
 par(mfrow = c(3,4))
 for( i in levels(test$stname)[1:12])
 { 
 data <- test[test$stname==i,]
 xx <- c(1:length(data$upper), length(data$upper):1)
 yy <- c(data$upper, rev(data$lower))
 zz <- data$observed
 tt <- data$Best_Sim
 par(lab =c(10,15,2))

In the line below, where you set the x- and y-limits, it would be
simpler and more readable to use range(x) instead of c(min(x), max(x)),
so your plot call could be:

plot.jpeg <- plot(xx, yy, type = "n", xlim = range(xx),
                  ylim = range(zz, yy, tt) * 1.4, main = i,
                  xlab = "Month (1990-2002)",
                  ylab = "Discharge(m3/s)",
                  font.axis = 6)

Also, you can format the y-label more nicely with:

ylab = expression(paste("Discharge (", m^3 * s^{-1}, ")"))

HTH

G


Re: [R] Sweave issue: quotes in verbatim-like output print incorrectly

2007-03-07 Thread Gavin Simpson

On Wed, 2007-03-07 at 15:33 +1000, Peter Dunn wrote:
 Hi all
 
 I love Sweave; use it all the time.
 
 But I recently received a new computer, and ever since I
 have had a problem I've never seen before.
 
 For example, I place the following in my Snw file:

Try this in the preamble of your Snw file:

\usepackage[utf8x]{inputenc}

(assuming you have the inputenc package installed and available). I'm
assuming you are now using a machine using UTF-8 for character
encodings. I used to get that output on my linux box (FC4 - 6) before I
added the above \usepackage statement.

HTH

G
 
 <<>>=
 sms <- 
 read.table("http://www.sci.usq.edu.au/staff/dunn/Datasets/applications/popular/smsspeed.dat",
 header=TRUE)
 attach(sms)
 
 sms.lm <- lm( Time ~ Age*Phone, subset=(Age30) )
 summary(sms.lm)
 @
 
 Standard stuff.   The output appears in the corresponding LaTeX
 file as it should, in a verbatim-like environment as it should. 
 
 But since I have had this new machine, this line of output:
 
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 appears in my resulting pdf document as
 
 Signif. codes: 0 ^a˘A¨Y***^a˘A´Z0.001 ^a˘A¨Y**^a˘A´Z0.01 ^a˘A¨Y*^a
 ˘A´Z0.05 ^a˘A¨Y.^a˘A´Z0.1 ^a˘A¨Y^a˘A´Z1
 
 In short, every quote is replaced by garbage.  This makes my
 output looks incredibly bad.  (This is true for all cases; the above
 is the output from my example.)
 
 I also imagine (hope!) there is a very simple fix.  Can anyone help me?
 
 Documents which used to produce the correct output document
 now do this, so it must be something to do with my machine 
 set up, or R set up, rather than the documents themselves, I guess.
 
 Any help appreciated.  I have no idea where to look for the solution
 (the FAQ. manuals and mailing archives were no help that I could see;
 happy to be corrected).
 
 P.
 
 
  version
_
 platform   i486-pc-linux-gnu
 arch   i486
 os linux-gnu
 system i486, linux-gnu
 status Patched
 major  2
 minor  4.0
 year   2006
 month  11
 day25
 svn rev39997
 language   R
 version.string R version 2.4.0 Patched (2006-11-25 r39997)
 
  sessionInfo()
 R version 2.4.0 Patched (2006-11-25 r39997)
 i486-pc-linux-gnu
 
 locale:
 LC_CTYPE=en_AU.UTF-8;LC_NUMERIC=C;LC_TIME=en_AU.UTF-8;LC_COLLATE=en_AU.UTF-8;LC_MONETARY=en_AU.UTF-8;LC_MESSAGES=en_AU.UTF-8;LC_PAPER=en_AU.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_AU.UTF-8;LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods
 [7] base
 

Re: [R] Multi-line plots with matrices in R

2007-03-07 Thread Gavin Simpson

On Wed, 2007-03-07 at 12:30 +, Joseph Wakeling wrote:
 Hello all,
 
 I'm a new user of R, experienced with Octave/MATLAB and therefore
 struggling a bit with the new syntax.
 
 One of the easy things in Octave or MATLAB is to plot multiple lines or
  sets of points by using a matrix where either the columns or the rows
 contain the y-values to be plotted.  Both packages automatically give
 each line/points their own unique colour, character etc.
 
 I'm wondering how I get the same functionality in R.  For example, if X
 is a vector of x-values and Y is a matrix whose rows contain the
 y-values, I can do,
 
 apply(Y,1,lines,x=X)

You want matplot here. See ?matplot, but here is an example:

## generate some data to use, a matrix of Y values
## and a vector of x indices.
mat <- matrix(runif(100), ncol = 5)
vec <- seq(1, 100, length = 20)

## plot it using matplot
matplot(vec, mat, type = "l") # type = "l" to get lines

There is also matlines() and matpoints() for adding lines and points to
existing plots.
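
One thing to watch: matplot() draws one line per *column*, so if your series are in the rows of Y, as in your apply() call, transpose first. A small sketch with made-up data (X and Y named as in your mail):

```r
set.seed(1)
X <- 1:20
## three series stored in the rows of Y
Y <- matrix(runif(60), nrow = 3)

## t(Y) puts each series in its own column; col and lty are recycled
## over the columns, so every line gets its own colour and style
matplot(X, t(Y), type = "l", col = 1:3, lty = 1:3)

## add the points on top, one plotting character per series
matpoints(X, t(Y), pch = 1:3, col = 1:3)
```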

 
 ... but of course everything is all in black, with the same type of line
 or points.  I'd like each line to have its own unique colour and/or style.
 
 Another thing I'd like clarification on is the ability to update an
 existing plot.  For example if I do,
 
 plot.window(xlim=c(0,100),ylim=c(0,1))

Standard graphics in R are not modifiable after being plotted. You need
to re-plot. When plotting data, I rarely need plot.window. This is what
I would do:

x <- 1:100 * runif(100)
y <- seq(0,1, length = 100) * runif(100)

plot(x, y, xlim = c(0, 100), ylim = c(0, 1))

# now change the limits
plot(x, y, xlim = c(0, 100), ylim = c(0, 0.5))

 
 and then after plotting data decide I want ylim=c(0,0.5), how do I
 update the graphic?  A new plot.window() command does nothing.

But it does:

opar <- par(mfrow = c(1,2))
plot(x, y, xlim = c(0, 100), ylim = c(0, 0.5))
plot(x, y, xlim = c(0, 100), ylim = c(0, 1))
plot.window(xlim = c(0, 100), ylim = c(0, 0.5))
points(x, y, col = "red")
par(opar)

The points on the left plot correspond exactly to the points in red on
the right plot. The axis limits have changed, but because the axes have
already been labelled, these are not updated. We can illustrate this by
adding axes to the top and right of that plot

opar <- par(mfrow = c(1,2), mar = c(5,4,4,4) + 0.1)
plot(x, y, xlim = c(0, 100), ylim = c(0, 0.5))
plot(x, y, xlim = c(0, 100), ylim = c(0, 1))
plot.window(xlim = c(0, 100), ylim = c(0, 0.5))
points(x, y, col = "red")
axis(3)
axis(4)
par(opar)

Note the changed axis range in the right-hand margin. The problem is
that you can't use plot.window to achieve what you want, not that
plot.window doesn't do anything.

 
 Many thanks,
 
 -- Joe

HTH

G


Re: [R] Multi-line plots with matrices in R

2007-03-07 Thread Gavin Simpson

On Wed, 2007-03-07 at 15:11 +, Joseph Wakeling wrote:
 Gavin Simpson wrote:
  You want matplot here. See ?matplot  but here is an example:
 
 Great!  Thanks to you and Petr for pointing this out, it's exactly what
 I wanted.  Petr's other suggestions look interesting and I'll explore
 them at length later.
 
  Note the changed axis range in the right-hand margin. The problem is
  that you can't use plot.window to achieve what you want, not that
  plot.window doesn't do anything.
 
 Ahhh, I see.  So, it does not affect what has already been plotted, but
 affects how new material is inserted into the plot area.  Entering
 
 plot.window(xlim=c(0,100),ylim=c(0,0.5))
 axis(1)
 axis(2)
 plot.window(xlim=c(0,100),ylim=c(0,1))
 axis(2)
 
 ... is instructive. :-)
 
 So, _is_ there a command which will rearrange the existing plotted
 items, including axes?  Or does R require that I have a good idea of the
 space in which I want to plot from the start?

Not with the standard R graphics - think of the graphics window as a
piece of paper and if you draw anything on it you have done so in
permanent ink. If something needs changing you need a new sheet of paper
and have to redraw the lot. Most people I know write their code in some
text editor and send (or copy paste) it into R. It is an easy matter to
edit one or two bits of your code to tweak the display and re-plot...

I think you can modify lattice graphics objects and just plot (print
really) them again - but again you are really redrawing the whole plot
from scratch. IIRC grid might be able to do some of what you are looking
for.
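
For the lattice route, something like the following is what I have in mind. Treat it as a sketch with made-up data: the plot is stored as an object, update() returns a modified copy, and printing that copy redraws everything from scratch.

```r
library(lattice)

set.seed(1)
d <- data.frame(x = 1:100 * runif(100),
                y = seq(0, 1, length.out = 100) * runif(100))

## build the plot object; nothing is drawn until it is printed
p <- xyplot(y ~ x, data = d, xlim = c(0, 100), ylim = c(0, 1))

## change the y limits on a copy of the object, then draw that copy
p2 <- update(p, ylim = c(0, 0.5))
print(p2)
```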

 
 Oh, and a quick cosmetic query---I notice that the axes when created are
 spaced apart somewhat so the axis lines do not meet at the plot origin.
  Is there a way to alter this so that the outline of the box, and the
 extreme values of the axis, match up?
 

Look at ?par and xaxs and yaxs. E.g.

plot(1:10, xaxs = "i", yaxs = "i")

G

 Thanks again,
 
 -- Joe
 

Re: [R] stl function

2007-02-27 Thread Gavin Simpson

On Tue, 2007-02-27 at 15:55 +0100, Anja Eggert wrote:
 I want to apply the stl-function to decompose a time series (daily 
 measurements over 22 years) into seasonal component, trend and 
 residuals. I was able to get the diagrams.
 However, I could not find out what are the equations behind it. I.e. it 
 is probably not an additive or multiplicative combination of season (as 
 sin and cos-functions) and a linear trend?
 Furthermore, what are the grey bars on the right hand side of the diagrams?
 I would appreciate very much to receive some information or maybe a good 
 reference.
 
 Thank you very much,
 Anja
 

?stl tells you all you need to know to answer this, including the
reference to the academic publication that describes the method.

?plot.stl tells you that the grey bars are range bars - they are used to
assess the relative magnitude of various decomposed components.
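
On the additive question: the three components returned by stl() sum back to the original series, and the seasonal and trend parts are loess smooths rather than sin/cos terms plus a straight line. A quick check, sketched on the built-in nottem series instead of your data:

```r
## decompose a monthly series that ships with R
fit <- stl(nottem, s.window = "periodic")

## columns are "seasonal", "trend" and "remainder"
comp <- fit$time.series
colnames(comp)

## adding the three components recovers the data
recon <- rowSums(comp)
all.equal(as.numeric(recon), as.numeric(nottem))
```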

G


Re: [R] ts; decompose; plot and title

2007-02-27 Thread Gavin Simpson

On Tue, 2007-02-27 at 15:24 -0200, Alberto Monteiro wrote:
 Is there any way to give a decent title after I plot something
 generated by decompose?
 
 For example:
 
 # generate something with period 12
 x <- rnorm(600) + sin(2 * pi * (1:600) / 12)
 
 # transform to a monthly time series
 y <- ts(x, frequency=12, start=c(1950,1))
 
 # decompose
 z <- decompose(y)
 
 # plot
 plot(z)
 
 Now, the title is the ugly "Decomposition of additive time series".
 How can I do this with a decent title, like "Analysis of UFO abductions"?
 
 Alberto Monteiro

It is because plot.decompose.ts decides to impose its own title for
some reason (using getAnywhere(plot.decompose.ts) to get the function
definition):

function (x, ...)
{
    plot(cbind(observed = x$random + if (x$type == "additive")
        x$trend + x$seasonal
    else x$trend * x$seasonal, trend = x$trend, seasonal = x$seasonal,
        random = x$random), main = paste("Decomposition of",
        x$type, "time series"), ...)
}

I'd just write your own wrapper instead, using plot.decompose.ts, along
the lines of:

decomp.plot <- function(x, main = NULL, ...)
{
    if(is.null(main))
        main <- paste("Decomposition of", x$type, "time series")
    plot(cbind(observed = x$random + if (x$type == "additive")
        x$trend + x$seasonal
    else x$trend * x$seasonal, trend = x$trend, seasonal = x$seasonal,
        random = x$random), main = main, ...)
}

#then to complete your example:

# generate something with period 12
x <- rnorm(600) + sin(2 * pi * (1:600) / 12)

# transform to a monthly time series
y <- ts(x, frequency=12, start=c(1950,1))

# decompose
z <- decompose(y)

# plot
decomp.plot(z, main = "Analysis of UFO abductions")

Perhaps you could also file a bug report under the wish list category,
showing your example and the fact that

plot(z, main = "Analysis of UFO abductions")

gives this error:

Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels =
xy.labels,  :
formal argument "main" matched by multiple actual arguments

It isn't really a bug, but an infelicity in the way the function
currently works - my decomp.plot function may even be a suitable patch
or maybe the following is better:

decomp.plot2 <- function(x, main, ...)
{
    if(missing(main))
        main <- paste("Decomposition of", x$type, "time series")
    plot(cbind(observed = x$random + if (x$type == "additive")
        x$trend + x$seasonal
    else x$trend * x$seasonal, trend = x$trend, seasonal = x$seasonal,
        random = x$random), main = main, ...)
}
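
The mechanism behind that error can be reproduced without any plotting. The sketch below uses made-up stand-in functions (inner, hardcoded and flexible are hypothetical names, not part of R) to show why a hard-coded main clashes with one passed through ..., and why the missing(main) pattern avoids the clash:

```r
## Hypothetical stand-ins for the low-level plot function and the
## method that calls it; neither is a real R function
inner <- function(x, main = "default", ...) main
hardcoded <- function(x, ...) inner(x, main = "fixed title", ...)

## supplying main a second time via ... triggers the same kind of error
msg <- tryCatch(hardcoded(1, main = "my title"),
                error = function(e) conditionMessage(e))
msg

## letting the caller's main take precedence avoids it
flexible <- function(x, main, ...) {
    if (missing(main))
        main <- "fixed title"
    inner(x, main = main, ...)
}
flexible(1)                     # falls back to the built-in title
flexible(1, main = "my title")  # uses the supplied title
```

Because main is now a named formal argument of the wrapper, it is never swept up in ... and so only ever reaches the inner call once.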

HTH

G


Re: [R] RDA and trend surface regression

2007-02-27 Thread Gavin Simpson


Re: [R] TRUE/FALSE as numeric values

2007-02-23 Thread Gavin Simpson

On Fri, 2007-02-23 at 14:38 +0100, Thomas Preuth wrote:
 Hello,
 
 I want to select in a column of a dataframe all numbers smaller than a 
 value x
 but when I type in test<-(RSF_EU$AREA<=x) I receive as answer:
   test
  [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE 
 FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
 [18]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  
 TRUE  TRUE FALSE  TRUE  TRUE  TRUE
 [35] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  
 TRUE  TRUE FALSE FALSE  TRUE FALSE
 [52]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE
 
 How can I get the values smaller than x and not the TRUE/FALSE reply?
 
 Thanks in advance,
 Thomas

You need to subset your object based on the results you achieved above.
What you did was only half the job. See this example, with a number of
ways to get what you want:

## some dummy data to work with
dat <- 10 * runif(100)
dat <- data.frame(AREA = dat, FOO = dat + rnorm(100))

## select values of AREA less than or equal to mean AREA
mn <- mean(dat$AREA)
want1 <- with(dat, AREA[AREA <= mn])
## or
want2 <- dat$AREA[dat$AREA <= mn]
## or
want3 <- subset(dat$AREA, dat$AREA <= mn)
## or
want4 <- subset(dat, AREA <= mn)$AREA
## check they all do same thing (all.equal() compares only two objects,
## so compare each alternative against want1 in turn)
all(sapply(list(want2, want3, want4), identical, want1)) ## TRUE

want2 is closest to how you tried to do it:

dat$AREA[dat$AREA <= mn]
         ^^^^^^^^^^^^^^
Notice that you only did the inner bit marked, which as you found
returns TRUE/FALSE depending on whether that element of AREA met the
criterion of being less than or equal to your x. This information is
used to select elements from AREA using the subsetting functions for
objects.
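
A small sketch of the same idea with made-up numbers (area and x here are illustrative values, not the RSF_EU data) shows the two steps separately: first the logical vector, then its use as an index:

```r
## illustrative values only
area <- c(5, 12, 3, 8, 20)
x <- 8

## step 1: the comparison returns TRUE/FALSE for each element
keep <- area <= x
keep

## step 2: use that logical vector to pull out the values themselves
area[keep]

## related helpers: positions of the matches, and how many there are
## (TRUE counts as 1 and FALSE as 0 when summed)
which(keep)
sum(keep)
```

Keeping the logical vector in its own object is handy when the same selection has to be applied to several columns.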

HTH

G

Re: [R] Point estimate from loess contour plot

2007-02-09 Thread Gavin Simpson

On Thu, 2007-02-08 at 19:09 +, Laura Quinn wrote:
 Hi,
 
 I was wondering if anyone knows of a way by which one can estimate values
 from a contour plot created by using the loess function? I am hoping to
 use the loess contour plot as a means of interpolation to identify
 the loess created values at points at pre-defined (x,y) locations.
 
 Could anyone point me in the right direction please?
 
 Thanks.
 
 Laura Quinn

Hi Laura,

Using an example from MASS (the book by Venables and Ripley, page 423 in
4th Ed (2002)) and the topo data set:

require(MASS)
## loess surface
topo.lo <- loess(z ~ x * y, topo, degree = 1, span = 0.25,
                 normalize = FALSE)
topo.mar <- list(x = seq(0, 6.5, 0.1), y = seq(0, 6.5, 0.1))
new.dat <- expand.grid(topo.mar)
topo.pred <- predict(topo.lo, new.dat)
## draw the contour map based on loess predictions
eqscplot(topo.mar, type = "n")
contour(topo.mar$x, topo.mar$y, topo.pred,
        levels = seq(700, 1000, 25), add = TRUE)
## original points
points(topo, col = "red")

So now we have a loess surface defined by the model (topo.lo) and we can
use the predict.loess method to get point predictions based on the
model. This is what was used to produce the points used to draw the
contour surface, but on a regular grid. For some new point, not on this
regular grid, we can use the same approach:

> predict(topo.lo, data.frame(x = 4.8, y = 3.1))
[1] 824.0046

which yields a height of 824 and a bit feet for those coordinates. You
can get the standard errors of the predicted value as well:

> predict(topo.lo, data.frame(x = 4.8, y = 3.1), se = TRUE)
$fit
[1] 824.0046

$se.fit
[1] 7.677035

$residual.scale
[1] 18.95324

$df
[1] 34.00484

And of course, you aren't restricted to doing this one point at a time:

> predict(topo.lo, data.frame(x = c(4.8, 4.9, 3.1, 2.6),
+                             y = c(3.1, 2.3, 4.5, 5.6)),
+         se = TRUE)
$fit
[1] 824.0046 849.2514 760.2926 744.2987

$se.fit
[1] 7.677035 7.127979 6.364493 7.093619

$residual.scale
[1] 18.95324

$df
[1] 34.00484
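
For anyone without the MASS package to hand, the same workflow can be sketched on simulated data. Everything below (the sim data frame, the surface it is generated from, and the two locations in pts) is made up for illustration; only loess() and predict() are the real functions used above:

```r
## simulated surface: a tilted plane plus noise (illustrative only)
set.seed(1)
n <- 200
sim <- data.frame(x = runif(n, 0, 6), y = runif(n, 0, 6))
sim$z <- 800 + 20 * sim$x - 10 * sim$y + rnorm(n, sd = 5)

## fit the loess surface
fit <- loess(z ~ x * y, data = sim, degree = 1, span = 0.5,
             normalize = FALSE)

## predictions, with standard errors, at arbitrary interior locations
pts <- data.frame(x = c(2.5, 4.0), y = c(3.0, 1.5))
pred <- predict(fit, newdata = pts, se = TRUE)
pred$fit
pred$se.fit
```

Locations outside the range of the fitted data would typically come back as NA with the default settings, so it is worth keeping the prediction points inside the surveyed area.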

Is this what you were looking for?

HTH

G

Re: [R] R and S-Plus got the different results of principal component analysis from SAS, why?

2007-01-30 Thread Gavin Simpson


Re: [R] RMySQL connection

2007-01-19 Thread Gavin Simpson

On Fri, 2007-01-19 at 09:02 +, qing Jing wrote:
<snip />
 Dear All,
 
 What's wrong?
<snip />
 I am running R 2.1.1  DBI 0.1-9  RMySQL 0.5-6  Windows XP
  
I'd start with updating your R installation to something less archaic -
yours is at least 18 months out of date. Latest version is R 2.4.1.

 
 Thank you much for your help.
 
 Qing Jing   PhD  MD

If you still have problems, first read the posting guide and then email
the list, following the instructions in the guide.

HTH

G


Re: [R] hiccup in apply?

2007-01-19 Thread Gavin Simpson

On Fri, 2007-01-19 at 11:36 -0500, bogdan romocea wrote:
 Hello, I don't understand the behavior of apply() on the data frame below.
 
 test <-
 structure(list(Date = structure(c(13361, 13361, 13361, 13361,
 13361, 13361, 13361, 13361, 13362, 13362, 13362, 13362, 13362,
 13362, 13362, 13362, 13363, 13363, 13363, 13363, 13363, 13363,
 13363, 13363, 13364, 13364, 13364, 13364, 13364, 13364, 13364,
 13364, 13365, 13365, 13365, 13365, 13365, 13365, 13365, 13365,
 13366, 13366, 13366, 13366, 13366, 13366, 13366, 13366, 13367,
 13367), class = "Date"), RANK = as.integer(c(19, 7, 5, 4, 6,
 3, 3, 4, 18, 7, 6, 4, 6, 3, 3, 4, 19, 7, 6, 4, 6, 3, 3, 4, 18,
 6, 7, 4, 6, 3, 3, 4, 18, 6, 7, 4, 6, 3, 3, 4, 18, 6, 7, 4, 6,
 3, 3, 4, 18, 6))), .Names = c("Date", "RANK"), row.names = c("1",
 "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
 "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
 "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
 "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
 "47", "48", "49", "50"), class = "data.frame")
 
 #---fine
  summary(test)
   Date RANK
  Min.   :2006-08-01   Min.   : 3.00
  1st Qu.:2006-08-02   1st Qu.: 4.00
  Median :2006-08-04   Median : 5.50
  Mean   :2006-08-03   Mean   : 6.62
  3rd Qu.:2006-08-05   3rd Qu.: 6.75
  Max.   :2006-08-07   Max.   :19.00
 
 #---isn't this supposed to work?
  apply(test,2,mean)
 Date RANK
   NA   NA
 Warning messages:
 1: argument is not numeric or logical: returning NA in:
 mean.default(newX[, i], ...)
 2: argument is not numeric or logical: returning NA in:
 mean.default(newX[, i], ...)

Look at ?apply and details. 

Argument X of apply is supposed to be an array. Details says:

 If 'X' is not an array but has a dimension attribute, 'apply'
 attempts to coerce it to an array via 'as.matrix' if it is
 two-dimensional (e.g., data frames) or via 'as.array'.

So you should look at what is happening with as.matrix():

str(as.matrix(test))
 chr [1:50, 1:2] "2006-08-01" "2006-08-01" "2006-08-01" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:50] "1" "2" "3" "4" ...
  ..$ : chr [1:2] "Date" "RANK"

Notice this is now a character matrix and not what you thought it was.
So look at ?as.matrix and we see:

 'as.matrix' is a generic function. The method for data frames will
 convert any non-numeric/complex column into a character vector
 using 'format' and so return a character matrix, except that
 all-logical data frames will be coerced to a logical matrix.  When
 coercing a vector, it produces a one-column matrix, and promotes
 the names (if any) of the vector to the rownames of the matrix.

Which explains what is happening.

Workaround:

lapply(test, mean)
sapply(test, mean)

Both work
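
A compact sketch of the difference, using a four-row data frame made up for illustration (df is not the poster's data): apply() goes through as.matrix() and so sees character data, whereas lapply() and sapply() work on each column with its own class intact.

```r
## small illustrative data frame with a Date and an integer column
df <- data.frame(Date = as.Date("2006-08-01") + 0:3,
                 RANK = c(19L, 7L, 5L, 4L))

## what apply() actually works on: a character matrix
m <- as.matrix(df)
typeof(m)

## column-wise functions keep each column's class
col_means <- lapply(df, mean)
col_means$Date   # still a Date
col_means$RANK

## restricting to the numeric columns gives a plain numeric vector
num_means <- sapply(df[sapply(df, is.numeric)], mean)
num_means
```

Selecting the numeric columns first is also a reasonable way to make apply() itself safe, since a data frame of purely numeric columns is coerced to a numeric matrix.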

HTH,

G

 Thank you,
 b.
 
 platform   i386-pc-mingw32
 arch   i386
 os mingw32
 system i386, mingw32
 status
 major  2
 minor  4.0
 year   2006
 month  10
 day03
 svn rev39566
 language   R
 version.string R version 2.4.0 (2006-10-03)
 

Re: [R] Newbie question: Statistical functions (e.g., mean, sd) in a transform statement?

2007-01-19 Thread Gavin Simpson

On Fri, 2007-01-19 at 11:54 -0600, Ben Fairbank wrote:
 Greetings listeRs - 

Here are two solutions, depending on whether you wanted the NA's or not,
and I assume you wanted the row means:

> times3 <- transform(times, meantime = rowMeans(times))
> times3
      time1    time2     time3    time4 meantime
1 70.408543 48.92378  7.399605 95.93050 55.66561
2 17.231940 27.48530 82.962916 10.20619 34.47159
3 20.279220 10.33575 66.209290 30.71846 31.88568
4        NA 53.31993 12.398237 35.65782       NA
5  9.295965       NA 48.929201       NA       NA
6 63.966518 42.16304  1.777342       NA       NA
> times4 <- transform(times, meantime = rowMeans(times, na.rm = TRUE))
> times4
      time1    time2     time3    time4 meantime
1 70.408543 48.92378  7.399605 95.93050 55.66561
2 17.231940 27.48530 82.962916 10.20619 34.47159
3 20.279220 10.33575 66.209290 30.71846 31.88568
4        NA 53.31993 12.398237 35.65782 33.79200
5  9.295965       NA 48.929201       NA 29.11258
6 63.966518 42.16304  1.777342       NA 35.96897
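
For row-wise statistics that have no dedicated function (sd, for instance), one option is apply() over the rows. The sketch below uses a small made-up data frame (three columns, not the original times data) and a helper vector cols that is my own naming:

```r
## illustrative data, not the original 'times' object
times <- data.frame(time1 = c(10, 20, NA),
                    time2 = c(30, NA, 50),
                    time3 = c(50, 40, 70))
cols <- c("time1", "time2", "time3")

res <- transform(times,
                 meantime = rowMeans(times[cols], na.rm = TRUE),
                 sdtime   = apply(times[cols], 1, sd, na.rm = TRUE))
res
```

rowMeans() and rowSums() are generally much faster than the equivalent apply() call, so apply() is best kept for statistics that have no built-in row version.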

HTH

G

 
 Given a data frame such as 
 
  times
       time1    time2     time3    time4
 1 70.408543 48.92378  7.399605 95.93050
 2 17.231940 27.48530 82.962916 10.20619
 3 20.279220 10.33575 66.209290 30.71846
 4        NA 53.31993 12.398237 35.65782
 5  9.295965       NA 48.929201       NA
 6 63.966518 42.16304  1.777342       NA
 
 one can use transform to total all or some columns, thus,
 
 times2 <- transform(times,totaltime=time1+time2+time3+time4)
 
  times2
       time1    time2     time3    time4 totaltime
 1 70.408543 48.92378  7.399605 95.93050  222.6624
 2 17.231940 27.48530 82.962916 10.20619  137.8863
 3 20.279220 10.33575 66.209290 30.71846  127.5427
 4        NA 53.31993 12.398237 35.65782        NA
 5  9.295965       NA 48.929201       NA        NA
 6 63.966518 42.16304  1.777342       NA        NA
 
 I cannot, however, find a way, other than for looping,
 to use statistical functions, such as mean or sd, to 
 compute the new column.  For example,
 
 times2<-transform(times,meantime=(mean(c(time1,time2,time3,time4),na.rm=TRUE)))
 
  times2
       time1    time2     time3    time4 meantime
 1 70.408543 48.92378  7.399605 95.93050 45.54178
 2 17.231940 27.48530 82.962916 10.20619 45.54178
 3 20.279220 10.33575 66.209290 30.71846 45.54178
 4        NA 53.31993 12.398237 35.65782 45.54178
 5  9.295965       NA 48.929201       NA 45.54178
 6 63.966518 42.16304  1.777342       NA 45.54178
 
 How can this be done?  And, generally, what is the recommended method 
 for creating computed new columns in data frames when for loops take 
 too long?
 
 With thanks for any suggestions,
 
 Ben Fairbank
 
 Using version 2.4.1 on a Windows XP professional operating system.
 

Re: [R] The math underlying the `betareg' package?

2007-01-18 Thread Gavin Simpson

On Thu, 2007-01-18 at 20:00 +0530, Ajay Narottam Shah wrote:
 Folks,
 
 The betareg package appears to be polished and works well. But I would
 like to look at the exact formulas for the underlying model being
 estimated, the likelihood function, etc. E.g. if one has to compute
 \frac{\partial E(y)}{\partial x_i}, this requires careful calculations
 through these formulas. I read "Regression analysis of variates
 observed on (0,1): percentages, proportions and fractions", by
 Kieschnick & McCullough, Statistical Modelling 2003, 3:193-213. They
 say that the beta regression that they show is a proposal of theirs -
 is this the same as what betareg does, or is this the Standard
 Formulation?

If you want to know, the best place to look is the source code for the
package, available as a tar.gz file from all good CRAN Mirrors.

I suggest this as the Windows binary might not contain the original
source (i.e. unprocessed, with comments etc.) - I forget now exactly how
the binaries on that platform differ.

 
 What else should I be reading about beta regressions? :-)

The reference cited in the References section of ?betareg would also be
a good start, esp to understand what the betareg package is doing and
how it compares to the other ref you cite.

HTH

G

Re: [R] percent sign in plot annotation

2007-01-17 Thread Gavin Simpson

On Wed, 2007-01-17 at 09:57 +, Martin Keller-Ressel wrote:
 Hello,
 
 I would like to annotate a graph with the expression 'alpha = 5%' (the  
 alpha should be displayed as the greek letter).
 I tried
 
  text(1,1,expression(alpha == 5%))
 
 which gives a syntax error.
 escaping the percent sign (\%) or doubling (%%) does not help.
 What do I do?
 
 Thanks,
 
 Martin Keller-Ressel

Escaping a % with \ and then escaping the \ is not valid syntactically.

This works, but there may be better ways to do this:

plot(0:10, 0:10, type = "n")
text(5,5,expression(paste(alpha == 5, "%", sep = "")))
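
Two possible alternatives, offered as a sketch rather than as the recommended way: quote the whole right-hand side as a string, or build the label with bquote() when the percentage lives in a variable (pct below is a hypothetical variable name). The null PDF device is used only so the example runs without opening a window.

```r
## option 1: put the number and the percent sign in one string
lab1 <- expression(alpha == "5%")

## option 2: substitute a value held in a variable
pct <- 5
lab2 <- bquote(alpha == .(paste0(pct, "%")))

## draw both labels on a throwaway device
pdf(NULL)
plot(0:10, 0:10, type = "n")
text(5, 6, lab1)
text(5, 4, lab2)
invisible(dev.off())
```

The bquote() form is convenient inside loops or functions, where the significance level is not known until run time.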

HTH

G

Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Gavin Simpson

On Sun, 2007-01-07 at 12:01 +, Mark Wardle wrote:
 Dear all,
 
 The as.Date() function appears to give different results depending on
 the order of the vector passed into it.
 
 d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
 d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
 as.Date(d1)   # gives correct results
 as.Date(d2)   # fails with error (* see below)
 
 This problem does not arise if the dates are NA rather than an empty
 string, but my data is coming via RODBC and I still don't have NAs
 passed across properly.
 
 I might add that I initially noticed this behaviour when using RODBC's
 sqlQuery() function call, and I initially had difficulty explaining why
 one column of dates was passed correctly, but another failed. The
 failing column was a date of death column where it was NA ("") for
 most patients.
 
 I've come up with two workarounds that work. The first is to sort the
 data at the SQL level, ensuring the initial record is not null. The
 second is to use sqlQuery() with as.is=T option, and then do the sorting
 and conversion afterwards.

Why not just tell R what format the dates are in, using the format
argument to as.Date?

> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1, "%Y-%m-%d")
[1] "1900-01-01" "2007-01-01" NA           "2001-05-03"
> as.Date(d2, "%Y-%m-%d")
[1] NA           "1900-01-01" "2007-01-01" "2001-05-03"

 
 Is the behaviour of as.Date() shown above as expected/designed?

I don't know about expected/designed, but I would have thought
explicitly stating the date format would be the most fool-proof way of
making sure R did what you wanted, and the easiest way to work around
your problem.
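
A belt-and-braces variant, sketched here with a made-up vector (raw is an illustrative name standing in for a column returned by the database), is to turn the empty strings into NA explicitly before converting with a stated format:

```r
## illustrative stand-in for a column of date strings from a query
raw <- c("", "1900-01-01", "2007-01-01", "2001-05-03")

## make the missing values explicit, then convert with a fixed format
raw[raw == ""] <- NA
dates <- as.Date(raw, format = "%Y-%m-%d")
dates
class(dates)
```

With the format given, the result no longer depends on which element happens to come first in the vector.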

HTH

G

 
 Many thanks,
 
 Mark
 
 
 (*) Error in fromchar(x) : character string is not in a standard
 unambiguous format
 
 sessionInfo():
 R version 2.4.0 (2006-10-03) powerpc-apple-darwin8.7.0 locale:
 C/en_GB.UTF-8/C/C/C/C
 attached base packages:
 [1] methods   stats graphics  grDevices utils
 datasets base
 
 other attached packages:
 rcompletion   RODBC
0.0-12 1.1-7
 

[R] listing all functions in R

2007-01-06 Thread Gavin Simpson

Dear List,

I'm building an R syntax highlighting file for GeSHi [*] for a website I
am currently putting together. The syntax file needs a list of keywords
to highlight. How can I generate a list of all the functions in a base R
installation?

Ideally the list would be formatted like this:

'fun1', 'fun2', 'fun3' 

when printed to the screen so I can copy and paste it into the syntax
file.

I'm sure this has been asked before, but I stupidly didn't save that
email and I couldn't come up with a suitable query parameter for
Jonathan Baron's search site to return results before timing out.

Thanks in advance,

Gav

Re: [R] listing all functions in R

2007-01-06 Thread Gavin Simpson

On Sat, 2007-01-06 at 13:48 +, Prof Brian Ripley wrote:
 Could you tell us what you mean by

Thank you for your reply, Prof. Ripley.

 
 - 'function'  (if() and + are functions in R, so do you want those?)

I was thinking about functions that are used like this: foo()
So I don't need things like "names<-". I don't need functions like "+", "-",
"$", as I can highlight them separately if desired, though I'm not doing
this at the moment.

Functions like for(), while(), if(), function() are handled separately.

 
 - 'a base R installation'?   What is 'base R' (standard + recommended 
 packages?)  And on what platform: the list is platform-specific?

Yes, I mean standard + recommended packages. As for platform, most of my
intended audience will be MS Windows users, though I am using Linux
(Fedora) to generate this list (i.e. my R installation is on Linux).

 
 Here is a reasonable shot:
 
 findfuns <- function(x) {
      if(require(x, character.only=TRUE)) {
         env <- paste("package", x, sep=":")
         nm <- ls(env, all=TRUE)
         nm[unlist(lapply(nm, function(n) exists(n, where=env,
                                                mode="function",
                                                inherits=FALSE)))]
      } else character(0)
 }
 pkgs <- dir(.Library)
 z <-  lapply(pkgs, findfuns)
 names(z) <- pkgs

Excellent, that works just fine for me. I can edit out certain packages
that I don't expect to use, before formatting as desired. I can also use
this function on a library of packages that I use regularly and will be
using in the web pages.
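
To see what the function does for a single package, here is a stripped-down sketch of the same idea applied to stats only (chosen because it is attached by default; the object names env, nm and funs are my own):

```r
## look inside one attached package
env <- as.environment("package:stats")
nm <- ls(env, all.names = TRUE)

## keep only the names that are bound to functions
is_fun <- vapply(nm, function(n) is.function(get(n, envir = env)),
                 logical(1))
funs <- nm[is_fun]

length(funs)
head(funs)
```

This only sees exported objects of packages that are already attached, which is why findfuns calls require() on each package first.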

 
 I don't understand your desired format, but
 
 write(sQuote(sort(unique(unlist(z)))), "")

I wanted a single string "...", with entries enclosed in '' and
separated by "," (this is to go in a PHP array). I can generate such a
string from your z, above, as follows:

paste(sQuote(sort(unique(unlist(z)), decreasing = TRUE)), 
      collapse = ", ")

 
 gives a single-column quoted list.  It does include internal functions, 
 operators, S3 methods ... so you probably want to edit it.
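
One caveat with sQuote() is that, depending on the locale and the useFancyQuotes option, it may emit typographic quotes rather than plain ASCII apostrophes, which a PHP array would not accept. A sketch that sidesteps this by pasting the apostrophes on directly (funs is a made-up three-element example standing in for unlist(z)):

```r
## illustrative stand-in for the full list of function names
funs <- c("mean", "sd", "apply", "mean")

## plain ASCII quotes, unique and sorted, collapsed into one string
php_list <- paste(paste0("'", sort(unique(funs)), "'"), collapse = ", ")
php_list
cat(php_list, "\n")
```

Using cat() rather than print() writes the string without the surrounding double quotes and escapes, which makes it easier to copy into the syntax file.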

Once again, thank you.

All the best

Gav

 
 

Re: [R] listing all functions in R

2007-01-06 Thread Gavin Simpson

On Sat, 2007-01-06 at 10:43 -0500, Duncan Murdoch wrote:
 On 1/6/2007 9:25 AM, Gavin Simpson wrote:
  On Sat, 2007-01-06 at 13:48 +, Prof Brian Ripley wrote:
  Could you tell us what you mean by
  
  Thank you for your reply, Prof. Ripley.
  
  - 'function'  (if() and + are functions in R, so do you want those?)
  
  I was thinking about functions that are used like this: foo()
  So I don't need things like "names<-". I don't need functions like "+", "-",
  "$", as I can highlight them separately if desired, though I'm not doing
  this at the moment.
  
  Functions like for(), while(), if(), function() are handled separately.
  
  - 'a base R installation'?   What is 'base R' (standard + recommended 
  packages?)  And on what platform: the list is platform-specific?
  
  Yes, I mean standard + recommended packages. As for platform, most of my
  intended audience will be MS Windows users, though I am using Linux
  (Fedora) to generate this list (i.e. my R installation is on Linux).

Cheers Duncan.

 Be careful:  the installed list of functions differs slightly from 
 platform to platform.  For example, on Windows there's a function 
 choose.dir in the utils package, but I don't think this exists on Unix.

Good point. However as I am in control of the R snippets I display on
the web pages and the highlighting file/list, I can add the odd thing
that Brian Ripley's findfuns function doesn't list because of platform
differences.
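
A quick way to check for such platform-specific gaps from within R is sketched below; extra is a hypothetical name for the vector of keywords to be added by hand:

```r
## which platform is this list being generated on?
on_windows <- .Platform$OS.type == "windows"

## is the Windows-only function mentioned above visible here?
has_choose_dir <- exists("choose.dir", mode = "function")

## names to append manually when they are missing on this platform
extra <- if (has_choose_dir) character(0) else "choose.dir"
extra
```

The same test can be repeated for any other function known to exist on only one platform.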

What I wanted to avoid was having to add functions to my keyword list
each time I wrote another page on the site that used a new R snippet. As
it is early days, I'd probably spend about as much time adding functions
to the keyword list as writing pages for the site - which would put me
off a bit and slow me down. At least now I only have to add the odd
function missed.

 
 The list also varies from version to version, so if you could manage to 
 run some code in the user's installed R to generate the list on the fly, 
 you'd get the most accurate list.

Yes. I'm planning on wrapping findfuns into a little R script that
searches additional packages that I'll use, and that will update the
packages before compiling the list. I can then run this script
periodically in R to update the list, as R is updated etc.

 
 Duncan Murdoch

Many thanks for your reply,

All the best,

G

 
  

Re: [R] listing all functions in R

2007-01-06 Thread Gavin Simpson

On Sat, 2007-01-06 at 10:58 -0500, Gabor Grothendieck wrote:
 The arguments to the functions can differ too even if they
 exist on multiple platforms.   system() on Windows has the
 input= argument but not on UNIX.

That's a good point Gabor, and one I hadn't considered as yet. As I'm
only just setting out on the road to providing R help resources for the
wider world (rather than the limited environs of the courses I have
run), I tend to not have thought about these things much - though I
guess I have a few gotchas waiting to bite me in the ass before too
long.

I am just starting to think about the best way to organise the snippets
of code to allow me to keep them up-to-date with current R and changes
in package code that the snippets use. Dropping the code verbatim into
PHP scripts isn't a good idea. At the moment I intend to store all
snippets in individual *.R files and read them into variables within
the PHP scripts, from where they will be highlighted and formatted for
display.

It would be reasonably easy to write an R script to source all *.R files
in a directory to look for errors and problems. And having them all as
separate files means I can still use Emacs/ESS to prepare, format, and
run the code through R, which is my preferred environment.
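
A rough sketch of such a checking script, assuming the snippets live as *.R files in one directory. The function name check_snippets is my own invention, and each file is sourced into a fresh environment so that one snippet cannot lean on objects left behind by another.

```r
# Source every *.R file in 'dir' and report "ok" or the error message.
check_snippets <- function(dir) {
  files <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  status <- vapply(files, function(f) {
    tryCatch({
      source(f, local = new.env(), echo = FALSE)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
  names(status) <- basename(files)
  status
}

# Usage with two throwaway snippets in a temporary directory.
snippet_dir <- tempfile("snippets")
dir.create(snippet_dir)
writeLines("x <- 1 + 1", file.path(snippet_dir, "good.R"))
writeLines("stop('boom')", file.path(snippet_dir, "bad.R"))
result <- check_snippets(snippet_dir)
```

The returned character vector is named by file, so result[result != "ok"] lists the snippets that need attention.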

All the best,

G

 
 On 1/6/07, Duncan Murdoch [EMAIL PROTECTED] wrote:
  On 1/6/2007 9:25 AM, Gavin Simpson wrote:
   On Sat, 2007-01-06 at 13:48 +, Prof Brian Ripley wrote:
   Could you tell us what you mean by
  
   Thank you for your reply, Prof. Ripley.
  
   - 'function'  (if() and + are functions in R, so do you want those?)
  
   I was thinking about functions that are used like this: foo()
   So I don't need things like names<-. I don't need functions like +, -,
   $, as I can highlight them separately if desired, though I'm not doing
   this at the moment.
  
   Functions like for() while(), if() function() are handled separately.
  
   - 'a base R installation'?   What is 'base R' (standard + recommended
   packages?)  And on what platform: the list is platform-specific?
  
   Yes, I mean standard + recommended packages. As for platform, most of my
   intended audience will be MS Windows users, though I am using Linux
   (Fedora) to generate this list (i.e. my R installation is on Linux).
 
  Be careful:  the installed list of functions differs slightly from
  platform to platform.  For example, on Windows there's a function
  choose.dir in the utils package, but I don't think this exists on Unix.
 
  The list also varies from version to version, so if you could manage to
  run some code in the user's installed R to generate the list on the fly,
  you'd get the most accurate list.
 
  Duncan Murdoch
 
  
   Here is a reasonable shot:
  
   findfuns <- function(x) {
       if(require(x, character.only=TRUE)) {
           env <- paste("package", x, sep=":")
           nm <- ls(env, all=TRUE)
           nm[unlist(lapply(nm, function(n) exists(n, where=env,
                                                   mode="function",
                                                   inherits=FALSE)))]
       } else character(0)
   }
   pkgs <- dir(.Library)
   z <-  lapply(pkgs, findfuns)
   names(z) <- pkgs
  
   Excellent, that works just fine for me. I can edit out certain packages
   that I don't expect to use, before formatting as desired. I can also use
   this function on a library of packages that I use regularly and will be
   using in the web pages.
  
   I don't understand your desired format, but
  
   write(sQuote(sort(unique(unlist(z)))), "")
  
   I wanted a single string ..., with entries enclosed in '' and
   separated by , (this is to go in a PHP array). I can generate such a
   string from your z, above, as follows:
  
   paste(sQuote(sort(unique(unlist(z)), decreasing = TRUE)),
         collapse = ", ")
  
   gives a single-column quoted list.  It does include internal functions,
   operators, S3 methods ... so you probably want to edit it.
  
   Once again, thank you.
  
   All the best
  
   Gav
  
  
   On Sat, 6 Jan 2007, Gavin Simpson wrote:
  
   Dear List,
  
   I'm building an R syntax highlighting file for GeSHi [*] for a website I
   am currently putting together. The syntax file needs a list of keywords
   to highlight. How can I generate a list of all the functions in a base R
   installation?
  
   Ideally the list would be formatted like this:
  
   'fun1', 'fun2', 'fun3'
  
   when printed to the screen so I can copy and paste it into the syntax
   file.
  
   I'm sure this has been asked before, but I stupidly didn't save that
   email and I couldn't come up with a suitable query parameter for
   Jonathan Baron's search site to return results before timing out.
  
   Thanks in advance,
  
   Gav
  
 

Re: [R] pretended size postscript and size of the graphic device window

2007-01-04 Thread Gavin Simpson

On Thu, 2007-01-04 at 12:21 +0100, mirca heli wrote:
 Dear list members!
 
 I've two questions concerning graphic export:
 
 a) I want to export my graphics as PostScript files. in this way I use
 the postscript() function. The tricky part is that they must have a
 pretended size (7 x 7 cm) and an absolute font size (10pt).

If I understand you correctly, ?postscript contains all you need to
know, eg:

postscript(file="foo.eps", paper="special", onefile=FALSE,
           width=7/2.54, height=7/2.54, pointsize=10,
           horizontal=FALSE)
plot(rnorm(100), rnorm(100), main = "foo")
dev.off()

Is this what you wanted?
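
The 7/2.54 in the call converts centimetres to inches, which is the unit postscript() expects for width and height. A small sketch that makes the conversion explicit; the helper cm_to_in and the temporary output path are my own illustrative choices, not part of grDevices.

```r
# Convert centimetres to inches for the device size arguments.
cm_to_in <- function(cm) cm / 2.54

# Temporary output file, used here only so the sketch cleans up easily.
out <- file.path(tempdir(), "foo.eps")

postscript(file = out, paper = "special", onefile = FALSE,
           width = cm_to_in(7), height = cm_to_in(7),
           pointsize = 10, horizontal = FALSE)
plot(rnorm(100), rnorm(100), main = "foo")
invisible(dev.off())

# The EPS header normally records the figure size in points (1/72 inch).
bbox <- grep("^%%BoundingBox", readLines(out), value = TRUE)
```

Printing bbox is a quick way to confirm the figure size that was actually written.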

 b) how can I (permanently) change the size of the graphic device window?

This may well depend on your OS (unstated). I was looking for this the
other day as the window is too big on my laptop - I didn't look too hard
though so it is no surprise that I did not find a solution.

HTH

G

 
 Best regards
 mirca heli

Re: [R] graphical parameters: margins

2007-01-02 Thread Gavin Simpson

On Tue, 2007-01-02 at 17:02 +0100, Ricardo Rodríguez - Your XEN ICT Team
wrote:
 Hi all,
  
 Please, while using image() which is the graphical parameter which
 control the space between ylab and the y axis? I do need to write a
 number of relatively long y labels and I am not able the control, if
 possible, this space.
 
 See the effect I need to avoid...
 
 http://nvx.environmentalchange.net/@rrodriguez/images/overlapping.jpg
 
 Thanks for your help,
 
 Ricardo

Either of these two gives you the answer

 help.search("graphical parameters")
 RSiteSearch("graphical parameters margin")

more specifically, read ?par and in particular, the entry for parameter
'mar' and its relatives.

You might also need to add the axis label separately from the figure:

opar <- par(mar = c(5,7,4,2) + 0.1)
plot(1:10, ann = FALSE) # or plot(1:10, ylab = "")
mtext("label", side = 2, line = 6)
par(opar)

1) opar <- par(mar = c(5,7,4,2) + 0.1) creates 7.1 lines on the left of
the plot and saves defaults
2) mtext("label", side = 2, line = 6) displays the axis label on line 6
to push it away from the plot axis. Repeat for other sides...
3) par(opar) resets to the defaults.
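
Since the question was about image(), here is the same idea as a sketch with image() and long tick labels. The volcano matrix ships with R; the tick labels and axis title are made up for illustration, and pdf(NULL) is used only so the sketch runs without opening a window.

```r
pdf(NULL)  # null device: nothing is written to disk

# Widen the left margin to 7.1 lines and remember the old settings.
opar <- par(mar = c(5, 7, 4, 2) + 0.1)

# Draw the image without a y axis, then add long horizontal labels.
image(volcano, yaxt = "n", ylab = "")
axis(2, at = c(0, 0.5, 1),
     labels = c("southern edge", "middle", "northern edge"), las = 1)

# Place the axis title on margin line 6, clear of the tick labels.
mtext("A long y-axis title", side = 2, line = 6)

mar_used <- par("mar")  # margins in effect while drawing
par(opar)               # restore the previous settings
invisible(dev.off())
```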

HTH

G


Re: [R] Drawing a 3-D plot for PCA?

2006-12-22 Thread Gavin Simpson

On Thu, 2006-12-21 at 16:28 -0600, Frank Duan wrote:
 Hi All,
 
 Can anyone point me a hint (package) how to draw a 3D plot using the first 3
 components from PCA?
 
 Thanks a lot,
 
 FD

See ?ordiplot3d and ?ordirgl in package vegan. rda() in that package can
be used to perform PCA, which can then be drawn in 3D using the rgl
package or the scatterplot3d package, the former allowing dynamic
rotation and zooming of the ordination configuration.
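
As a rough illustration that avoids assuming vegan is installed, the sketch below swaps in prcomp() from base R for the PCA and draws the first three components with scatterplot3d only if that package is available. The iris data set is just a stand-in for your own data.

```r
# PCA on the four numeric iris columns, scaled to unit variance.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Site scores on the first three principal components.
scores <- pca$x[, 1:3]

# Static 3D scatterplot, drawn only when scatterplot3d is installed.
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  pdf(NULL)  # null device so the sketch runs without a window
  scatterplot3d::scatterplot3d(scores[, 1], scores[, 2], scores[, 3],
                               xlab = "PC1", ylab = "PC2", zlab = "PC3")
  invisible(dev.off())
}
```

Drop the pdf(NULL) and dev.off() lines to see the plot on screen.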

HTH

G


Re: [R] if(){} else{}

2006-12-05 Thread Gavin Simpson

On Tue, 2006-12-05 at 16:33 +0100, Hans-Juergen Eickelmann wrote:
 Dear R-community,
 
 my data set looks like 'mat' below.
 
 Plant<-c(NA,1,1,1,NA,NA,NA,NA,NA,1);
 Value1<-rnorm(1:10);

You only really need rnorm(10), as in ?rnorm, rnorm is defined as
rnorm(n, mean=0, sd=1) and n is the number of observations.

 Value2<-rnorm(1:10);
 mat<-cbind(Plant,Value1,Value2);

You don't need the ; at the ends of the lines, and cbind() returns a
matrix, for which you cannot use $ to access the columns:

> class(mat)
[1] "matrix"
> mat$Plant
NULL

What you are looking for is ifelse(), see ?ifelse, but here is your
example, suitably spaced out and minus the other infelicities.

Plant <- c(NA, 1, 1, 1, NA, NA, NA, NA, NA, 1)
Value1 <- rnorm(10)
Value2 <- rnorm(10)
mat <- data.frame(Plant, Value1, Value2)
mat$Plant1 <- ifelse(is.na(mat$Plant), "A", "B")

> mat$Plant1
 [1] "A" "B" "B" "B" "A" "A" "A" "A" "A" "B"
> mat
   Plant      Value1      Value2 Plant1
1     NA  2.76603270 -0.20435729      A
2      1 -0.54688170 -0.81943566      B
3      1  0.30480812 -0.05404563      B
4      1  1.64959026 -0.10762260      B
5     NA  1.13528236 -0.04670294      A
6     NA  1.55636761  0.87617575      A
7     NA  0.40651924  1.90516887      A
8     NA  1.49827147  0.05080935      A
9     NA -0.04396752  0.53267040      A
10     1  0.42714137 -0.55944595      B

HTH

G

 I receive data from two different sites.
 One site is identified by an integer number, the other site has no data in
 column Plant=NA.
 
 My pb:
 
 I'm trying to assign labels A or B to these 2 sites into a new column,
 but my if(){} else{} statement fails with the following statement:
 Error in if (is.na(mat$Plant == TRUE)) { :
 argument is of length zero
 
 if(is.na(mat$Plant==TRUE)){mat$Plant1="A"} else{mat$Plant1="B"};

That's not how you use is.na(), see ?is.na, as is.na(x) returns
TRUE/FALSE depending on whether x is NA or not
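
If you do want to keep the cbind() matrix, a small sketch of the same labelling using matrix-style indexing instead of $. The name label is my own choice.

```r
Plant <- c(NA, 1, 1, 1, NA, NA, NA, NA, NA, 1)
mat <- cbind(Plant, Value1 = rnorm(10), Value2 = rnorm(10))

# A matrix column is selected with [, "name"], not with $.
plant <- mat[, "Plant"]

# is.na() is applied to the vector itself, not to a comparison with TRUE.
label <- ifelse(is.na(plant), "A", "B")
```

A matrix can hold only one type, so the character labels are kept in a separate vector rather than bound on as a new column.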

 I looked through the avail doc and R-help for some time but wasn't able to
 fix the pb.
 
 Thx Hans
 

Re: [R] if(){} else{}

2006-12-05 Thread Gavin Simpson

On Wed, 2006-12-06 at 05:26 +1300, David Scott wrote:
 Test should be
 
 if(is.na(mat$Plant)){ ...

No, that won't work because if() is not vectorized:

> if(is.na(mat$Plant)){mat$Plant1 <- "A"} else{mat$Plant1 <- "B"}
Warning message:
the condition has length > 1 and only the first element will be used in:
if (is.na(mat$Plant)) {
> mat$Plant1
 [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
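
To make the distinction concrete, a short sketch with a made-up vector x: if() wants a single TRUE or FALSE, so a vector has to be reduced first (here with anyNA()), whereas ifelse() works element by element. Recent versions of R signal an error rather than a warning when if() is given a condition of length greater than one.

```r
x <- c(NA, 1, 1)

# One decision for the whole vector: reduce to a single logical first.
if (anyNA(x)) {
  msg <- "has missing values"
} else {
  msg <- "complete"
}

# One decision per element: use the vectorized ifelse().
labels <- ifelse(is.na(x), "A", "B")
```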

G

 
 
 
 On Tue, 5 Dec 2006, Hans-Juergen Eickelmann wrote:
 
 
  Dear R-community,
 
  my data set looks like 'mat' below.
 
  Plant<-c(NA,1,1,1,NA,NA,NA,NA,NA,1);
  Value1<-rnorm(1:10);
  Value2<-rnorm(1:10);
  mat<-cbind(Plant,Value1,Value2);
  I receive data from two different sites.
  One site is identified by an integer number, the other site has no data in
  column Plant=NA.
 
  My pb:
 
  I'm trying to assign labels A or B to these 2 sites into a new column,
  but my if(){} else{} statement fails with the following statement:
  Error in if (is.na(mat$Plant == TRUE)) { :
 argument is of length zero
 
  if(is.na(mat$Plant==TRUE)){mat$Plant1="A"} else{mat$Plant1="B"};
 
  I looked through the avail doc and R-help for some time but wasn't able to
  fix the pb.
 
  Thx Hans
 
 
 
 _
 David Scott   Visiting (Until January 07)
   Department of Probability and Statistics
   The University of Sheffield
   The Hicks Building
   Hounsfield Road
   Sheffield S3 7RH
   United Kingdom
 Phone:+44 114 222 3908
 Email:[EMAIL PROTECTED]
 

Re: [R] Summary shows wrong maximum

2006-12-04 Thread Gavin Simpson

On Mon, 2006-12-04 at 12:04 +0100, Sebastian Spaeth wrote:
 Hi all,
 I have a list with a numerical column cum_hardreuses. By coincidence I  
 discovered this:
 
  > max(libs[,"cum_hardreuses"])
 [1] 1793
 
  > summary(libs[,"cum_hardreuses"])
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       1       2       4      36      14    1790
 
 (note the max value of 1790) Ouch this is bad! Anything I can do to remedy  
 this? Known bug?

Did you read ?summary, which has:

 ## Default S3 method:
 summary(object, ..., digits = max(3, getOption("digits")-3))

so this is a rounding issue of the *printed* representation of the
summary. Just change digits to be a larger number:

> dat <- rnorm(100)
> max(dat)
[1] 2.434443
> summary(dat)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.21100 -0.65450  0.03793  0.06919  0.84650  2.43400
> summary(dat, digits = 10)
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
-2.21106232 -0.65451716  0.03793040  0.06919486  0.84652269  2.43444263
> # same with integer as in your example
> dat <- floor(dat * 1000000)
> max(dat)
[1] 2434442
> summary(dat)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2211000  -654500    37930    69190   846500  2434000
> summary(dat, digits = 10)
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
-2211063.00  -654517.50    37930.00    69194.38   846522.00  2434442.00
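
The effect can be reproduced by hand with signif(), on the assumption that the summary values are reduced to the requested number of significant digits in this way. The value is taken from the integer example; the variable names are my own.

```r
# Default for summary(): 4 when options(digits = 7) is in effect.
default_digits <- max(3, getOption("digits") - 3)

x <- 2434442

# Four significant digits lose the last three places of the maximum.
rounded <- signif(x, 4)

# Ten significant digits are enough to keep the value intact.
full <- signif(x, 10)
```

Here rounded matches the 2434000 shown by summary(dat), and full matches the value returned by max(dat).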

HTH

G

 
 This is a Version 1.16 (3198) of the MacOSX R.
 
 Regards,
 Sebastian Spaeth
 

Re: [R] Quicker way of combining vectors into a data.frame

2006-12-01 Thread Gavin Simpson

  -0.520  0.278 -0.546 -0.925  1.507 ...
 
 
 
 
 So there is something else going on, either with your code or some other
 conflict, unless my assumptions about your data are incorrect.
 
 HTH,
 
 Marc
 

Re: [R] Quicker way of combining vectors into a data.frame

2006-12-01 Thread Gavin Simpson

On Fri, 2006-12-01 at 12:13 +0100, Peter Dalgaard wrote:
 Gavin Simpson wrote:
snip /
 
  I just don't understand what is going on with data.frame.
 

 I think there is something about the data you're not telling us...

Yes, that I was doing something very, very silly that I thought would
work (produce a vector CLmaxN of the required length), but was in fact
blowing out to a huge named list. It was this that was causing the
massive increase in computation time in data.frame over cbind.
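
In hindsight, a quick check of the inputs would have caught this. A minimal sketch with made-up columns, where bad stands in for the component that had silently become a long list:

```r
n <- 4471

# Two well-formed numeric vectors and one that is a list by mistake.
cols <- list(a = rnorm(n),
             b = runif(n),
             bad = as.list(seq_len(n)))

# Each component should be an atomic vector of the expected length.
ok <- vapply(cols, function(v) is.atomic(v) && length(v) == n, logical(1))

# Names of the offending components, if any.
offenders <- names(ok)[!ok]
```

Calling stopifnot(all(ok)) just before data.frame() would turn the slow-down into an immediate, explicit error.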

After correcting my mistake, timings for data.frame are:

system.time(fab <- data.frame(lc.ratio, Q,
+  fNupt,
+  rho.n, rho.s,
+  net.Nimm,
+  net.Nden,
+  CLminN,
+  CLmaxN,
+  CLmaxS))
[1] 0.012 0.000 0.011 0.000 0.000
Browse[1]> system.time(fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
+  fNupt = fNupt,
+  rho.n = rho.n, rho.s = rho.s,
+  net.Nimm = net.Nimm,
+  net.Nden = net.Nden,
+  CLminN = CLminN,
+  CLmaxN = CLmaxN,
+  CLmaxS = CLmaxS))
[1] 0.008 0.000 0.018 0.000 0.000

One vector has names for some reason, removing them brings the un-named
data.frame version down to the named version timing and makes no
difference to the named version

Browse[1]> names(CLmaxS) <- NULL
Browse[1]> system.time(fab <- data.frame(lc.ratio, Q,
+  fNupt,
+  rho.n, rho.s,
+  net.Nimm,
+  net.Nden,
+  CLminN,
+  CLmaxN,
+  CLmaxS))
[1] 0.008 0.000 0.016 0.000 0.000
Browse[1]> system.time(fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
+  fNupt = fNupt,
+  rho.n = rho.n, rho.s = rho.s,
+  net.Nimm = net.Nimm,
+  net.Nden = net.Nden,
+  CLminN = CLminN,
+  CLmaxN = CLmaxN,
+  CLmaxS = CLmaxS))
[1] 0.008 0.000 0.009 0.000 0.000

Apologies to the list for bothering you all with my stupidity and thank
you again to everyone who replied - I knew it was I who was doing
something wrong, but couldn't see it and thanks to your comments,
suggestions and queries I was able to work out what that was.

All the best,

G


[R] Quicker way of combining vectors into a data.frame

2006-11-30 Thread Gavin Simpson

Hi,

In a function, I compute 10 (un-named) vectors of reasonable length
(4471 in the particular example I have to hand) that I want to combine
into a data frame object, that the function will return.

This is very slow, so *I'm* doing something wrong if I want it to be
quick and efficient, though I'm not sure what the best way to do this
would be.

I know it is the combining into data frame bit that is slow, because
I've Rprof'ed it:

$by.self
                      self.time self.pct total.time total.pct
names<-.default           16.58     52.8      16.58      52.8
unlist                     7.22     23.0       7.26      23.1
data.frame                 1.72      5.5      29.38      93.6
duplicated.default         1.66      5.3       1.66       5.3
+                          1.20      3.8       1.20       3.8
list                       0.40      1.3       0.40       1.3
as.data.frame.numeric      0.28      0.9       3.32      10.6
apply                      0.26      0.8       1.70       5.4
pmatch                     0.22      0.7       0.22       0.7
paste                      0.20      0.6       0.90       2.9
deparse                    0.14      0.4       0.70       2.2
eval                       0.12      0.4      31.28      99.7
names<-                    0.12      0.4      16.70      53.2
FUN                        0.12      0.4       1.32       4.2
names                      0.12      0.4       0.14       0.4
as.list.default            0.12      0.4       0.12       0.4
duplicated                 0.10      0.3       1.76       5.6
gc                         0.10      0.3       0.10       0.3

And I stepped through it under debug() and all the calculations before
are quick, and then this bit takes a little over 20 seconds to complete

 fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
 fNupt = fNupt,
 rho.n = rho.n, rho.s = rho.s,
 net.Nimm = net.Nimm,
 net.Nden = net.Nden,
 CLminN = CLminN,
 CLmaxN = CLmaxN,
 CLmaxS = CLmaxS)

I can get it down to c. 5 seconds if I do (not Rprof'ed):

 fab <- data.frame(lc.ratio, Q,
 fNupt,
 rho.n, rho.s,
 net.Nimm,
 net.Nden,
 CLminN,
 CLmaxN,
 CLmaxS)

But this still seems quite a long time, so I'm thinking that there must
be a quicker way of doing what I want (end up with a data.frame with the 10
vectors in it).

Can anyone enlighten me?

> version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Patched
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39576
language       R
version.string R version 2.4.0 Patched (2006-10-03 r39576)

 sessionInfo()
R version 2.4.0 Patched (2006-10-03 r39576) 
i686-pc-linux-gnu 

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] methods   stats graphics  grDevices utils
datasets 
[7] base

Thanks in advance,

G

Re: [R] GAMS and Knots

2006-11-28 Thread Gavin Simpson

On Wed, 2006-11-29 at 11:40 +1300, Kathryn Baldwin wrote:
 Hi
 I was wondering if anyone knew how to work out the number of knots that 
 should be applied to each variable when using gams in the mgcv library?
 Any help or references would be much appreciated.
 Thanks
 Kathryn Baldwin

mgcv works out an optimal number of knots to use, using a Generalised
Cross-Validation (GCV) routine. Take a look at:

Simon N. Wood. mgcv: GAMs and generalized ridge regression for R. R
News, 1(2):20-25, June 2001.

And Simon's new book:

Simon N. Wood. Generalized Additive Models: An Introduction with R.
Chapman & Hall/CRC, Boca Raton, FL, 2006. ISBN 1-584-88474-6.

For further info on using mgcv for GAMs.
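
As a hedged illustration with simulated data: in mgcv the k argument of s() sets the basis dimension, which acts as an upper limit on how wiggly the smooth can be, and the fitting routine then chooses the amount of smoothing. The effective degrees of freedom of the fit show how much of that flexibility was used. The data, the choice k = 20 and the variable names are all made up for this sketch, and it runs only if mgcv is installed.

```r
if (requireNamespace("mgcv", quietly = TRUE)) {
  set.seed(1)
  d <- data.frame(x = runif(200))
  d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)

  # k = 20 is a generous upper limit, not the final complexity.
  fit <- mgcv::gam(y ~ s(x, k = 20), data = d)

  # Total effective degrees of freedom, intercept included.
  edf_total <- sum(fit$edf)

  # Effective degrees of freedom of the smooth term itself.
  edf_smooth <- summary(fit)$edf
}
```

If the smooth's effective degrees of freedom sit close to k - 1, that is usually a sign to refit with a larger k.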

HTH

G


Re: [R] question about capscale (vegan)

2006-11-27 Thread Gavin Simpson

On Mon, 2006-11-27 at 15:37 +0100, Alicia Amadoz wrote:
 Hi Gavin,
 
 I have been analyzing real data (sorry but I am not allowed to post
 these data here) and what I got was this,
 
 mydistmat_f.cap <- capscale(distmat_f ~ F + L + F:L, mfactors_frame)

I believe you can write that formula as: distmat_f ~ F * L
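
A quick way to check that the two formulas are equivalent is to compare the terms R expands them into. The response name y is a placeholder.

```r
# Term labels from the long form and from the shorthand.
long  <- attr(terms(y ~ F + L + F:L), "term.labels")
short <- attr(terms(y ~ F * L), "term.labels")

same <- identical(long, short)
```

Both expand to the main effects F and L plus the interaction F:L.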

 
 Warning messages:
 1: some of the first 30 eigenvalues are < 0 in: cmdscale(X, k = k, eig =
 TRUE, add = add)
 2: NaNs produced in: sqrt(ev)

Sorry, I don't know enough about this method to know whether this is a
problem you should worry about or not. You should read up on the method
some more to decide if the first warning is something you should be
worried about. IIRC, negative eigenvalues are to be expected with this
method as they are handled explicitly by capscale, and as this is a
warning coming from cmdscale(), I suspect it is a helpful feature of
that function, which you don't need to worry about when used in
capscale().

 
  mydistmat_f.cap
 
 Call:
 capscale(formula = distmat_f ~ F + L + F:L, data = mfactors_frame)
 
                Inertia Rank
 Total           0.3758
 Constrained     0.2110    4
 Unconstrained   0.1648    4
 Inertia is squared  distance
 Some constraints were aliased because they were collinear (redundant)
 
 Eigenvalues for constrained axes:
  CAP1  CAP2  CAP3  CAP4
 1.679e-01 2.954e-02 1.349e-02 1.233e-05
 
 Eigenvalues for unconstrained axes:
  MDS1  MDS2  MDS3  MDS4
 1.388e-01 2.601e-02 4.076e-05 2.064e-07
 
 So, by these results I can tell that there are 4 axes that explain
 0.1648 of the total variance and another 4 axes that explain 0.2110 of
 the total variance. But I don't understand the difference between
 constrained and unconstrained.

The constrained axes are axes that are linear combinations of your
explanatory variables (F, L and F:L), so this is the bit of your genomic
data that is explained by those explanatory factors. The unconstrained
bit is the remaining variance not explained, and are MDS (PCoord) axes.

So you can explain c. 56% of the variance in your genomic data with F,
L, and F:L.
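
The 56% figure comes straight from the inertia table in your output:

```r
total         <- 0.3758
constrained   <- 0.2110
unconstrained <- 0.1648

# Percentage of the total inertia in each component.
pct_constrained   <- round(100 * constrained / total, 1)
pct_unconstrained <- round(100 * unconstrained / total, 1)
```

This gives 56.1% for the constrained part and 43.9% for the unconstrained part.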

Note the warning about aliased constraints - this means that at least
one term in the model (including interactions) is completely collinear
with another variable (or possibly a combination of variables) and is
therefore redundant.

Type alias(mydistmat_f.cap) to see which coefficients are aliased
and ?alias to see what this means.

 
  anova(mydistmat_f.cap)
 
 Permutation test for capscale under direct model
 
 Model: capscale(formula = distmat_f ~ F + L + F:L, data = mfactors_frame)
           Df    Var      F N.Perm  Pr(>F)
 Model      4   0.21 1.2798 400.00  0.0875 .
 Residual   4   0.16
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
  summary(anova(mydistmat_f.cap))
        Df         Var               F             N.Perm        Pr(>F)
  Min.   :4   Min.   :0.1648   Min.   :1.280   Min.   :200   Min.   :0.12
  1st Qu.:4   1st Qu.:0.1764   1st Qu.:1.280   1st Qu.:200   1st Qu.:0.12
  Median :4   Median :0.1879   Median :1.280   Median :200   Median :0.12
  Mean   :4   Mean   :0.1879   Mean   :1.280   Mean   :200   Mean   :0.12
  3rd Qu.:4   3rd Qu.:0.1994   3rd Qu.:1.280   3rd Qu.:200   3rd Qu.:0.12
  Max.   :4   Max.   :0.2110   Max.   :1.280   Max.   :200   Max.   :0.12
                               NA's   :1.000   NA's   :  1   NA's   :1.00
 
 Then, I want to know the sum of squares of anova to check with other
 analysis that we performed but I can't see them by the output of anova.
 Besides, I am wondering if there is any manner to identify the main
 effects, factor effects and interaction in this anova analysis. I would
 be very grateful if you could help me to understand these results.

There isn't a summary method for anova.cca, and anyway, this anova isn't
working on sums of squares, but on other measures of variance. It is a
permutation test, and simply works out with brute force how likely you
are to have a model explaining 56% of the total variance given your
sample size and model complexity, under a null/random model.
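
To show the general idea only, here is a toy permutation test on an ordinary linear model. This is not how vegan implements anova for capscale; the data, the F statistic from lm() and the 199 permutations are all my own illustrative choices.

```r
set.seed(42)
group <- gl(3, 10)                           # three groups of ten
y <- rnorm(30) + as.numeric(group)           # made-up response

# Test statistic: the F value for the group term.
f_stat <- function(y, g) anova(lm(y ~ g))[["F value"]][1]

obs <- f_stat(y, group)

# Shuffle the response to break any link with the groups, then recompute.
n_perm <- 199
perm <- replicate(n_perm, f_stat(sample(y), group))

# Proportion of statistics at least as large as the observed one.
p_value <- (sum(perm >= obs) + 1) / (n_perm + 1)
```

The p-value is simply how often a shuffled data set does as well as the real one, counting the observed arrangement as one of the permutations.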

It sounds like you haven't grasped fully the fundamentals of the methods
you are employing, and I would strongly advise you to do some more
reading up on these methods. I can, at best, only guide you as I am not
that familiar with the technique myself.

A good start would be the refs in ?capscale and then search for papers
that cite Anderson & Willis and that use the methodology.

 
 Thank you very much,
 Alicia

HTH

G


Re: [R] automatic cleaning of workspace

2006-11-27 Thread Gavin Simpson

On Mon, 2006-11-27 at 18:29 -0500, Leeds, Mark (IED) wrote:
 I'm having that problem where I am sometimes using an object that's from
 a previous workspace when I don't want to be doing that. I was thinking
 of putting rm(objects=ls()) in my first.R function But, the problem with
 doing this, is that it doesn't prompt you with are you sure and there
 could be very rare cases where I don't want to delete the workspace ? Is
 there a way to
 make the cleaning of the workspace automatic but still prompt you ? I
 guess I can always just try to remember to manually 
 do rm(objects=ls())when I start up in whatever workspace I am in but I
 don't think I can trust my memory. Thanks.

I assume you are saving your workspace when exiting R, and then when
restarting R in same directory it is auto loading the saved workspace?

If so, try not saving the workspace at the end of the session when
quitting, but explicitly save the objects you wish to save using save()
and load() it when restarting.

You might need to locate and delete the file .RData in the working
directory before you start R, to stop it being loaded when you start R
again. Alternatively, give it a new name with a .RData extension, so
you can load the workspace again if needed.

You might also want save.image if you are interested in saving
workspaces rather than individual objects.

This way you are in control of what is and is not saved/reloaded and you
won't have to rely on your memory.
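
A rough sketch of that workflow, plus an optional prompt before clearing. The object names, the file name keep.RData and the helper clear_workspace() are hypothetical, chosen for illustration only. Note that the call that removes everything is rm(list = ls()), not rm(objects = ls()).

```r
# Objects we want to keep between sessions (made-up examples)
results <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
settings <- list(alpha = 0.05)

# Save only these objects explicitly, rather than the whole workspace
keep_file <- file.path(tempdir(), "keep.RData")
save(results, settings, file = keep_file)

# Hypothetical helper: clear an environment, asking first when interactive
clear_workspace <- function(ask = interactive(), envir = globalenv()) {
  if (ask) {
    answer <- readline("Remove all objects from the workspace? [y/N] ")
    if (!tolower(answer) %in% c("y", "yes")) {
      message("Workspace left untouched.")
      return(invisible(FALSE))
    }
  }
  rm(list = ls(envir = envir), envir = envir)
  invisible(TRUE)
}

# Later (or in a new session): load the saved objects back
restored <- new.env()
load(keep_file, envir = restored)
ls(restored)
```

In a real session you would call load(keep_file) without the envir argument, and clear_workspace() with its defaults, so that both act on the global workspace.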

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/cv/
 London, UK. WC1E 6BT. [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] What training algorithm does nnet package use?

2006-11-23 Thread Gavin Simpson

On Thu, 2006-11-23 at 09:39 +0000, Wee-Jin Goh wrote:
 Hello,
 
 Thanks for that. I've taken a look at the source code, and I see that  
 the bulk of the processing is done in C, with R acting as a wrapper.  
 Below is the function I think is doing the training in the network.
 
 I'm guessing it's the standard Backpropagation with a decay term  
 algorithm? Can anyone confirm if that's correct?
 
 Cheers,
 Wee-Jin

How about you take a look at Section 8.10 in MASS (the book; Venables &
Ripley (2002) Modern Applied Statistics with S, 4th Edition, Springer) and
Brian Ripley's 1996 book Pattern Recognition and Neural Networks,
Cambridge University Press.

Both of which are documented in the help page for nnet; see ?nnet. nnet
(the package) is support software for these books so you should consult
these references.

nnet is a package within the VR bundle by the way.

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] Mayday ! Needing urgent help about writing results to a file

2006-11-17 Thread Gavin Simpson

On Fri, 2006-11-17 at 01:39 -0800, Marc Feuerstein wrote:
 Hey listmembers,
 I am desperately trying to write a data frame to a file. Not in CSV,
 but as they appear on the screen (nice, easy to read tables). I've
 read that the sink function is the way to go.
 
 I have tried the following code inside a function. 
 
 sink("ABC.txt")
 MyFrameA
 MyFrameB
 sink()
 
 It gives the result I need when I use it outside a function, but when
 I use it *inside* a function I wrote, it creates an empty file. I want
 to pull my hair out !

Are you forgetting to *print* MyFrameA etc.? It works fine for me in a
function if you print the object:

foo <- function(x, file = "temp.file.txt")
{
    sink(file = file)
    print(x)    # objects are not auto-printed inside a function
    sink(file = NULL)
}

dat <- as.data.frame(matrix(rnorm(10), ncol = 5))
names(dat) <- LETTERS[1:5]
rownames(dat) <- letters[1:2]
foo(dat)
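
To get two data frames into the same file, something along these lines should work. This is a sketch only: write_frames() is a made-up name, the two data frames are stand-ins for MyFrameA and MyFrameB, and I write to a temporary directory rather than the working directory.

```r
# Hypothetical helper: print any number of objects into one text file
write_frames <- function(..., file = "ABC.txt") {
  sink(file = file)
  on.exit(sink(file = NULL))  # restore console output even if printing fails
  for (obj in list(...)) {
    print(obj)                # explicit print() is needed inside a function
    cat("\n")                 # blank line between tables
  }
  invisible(file)
}

# Stand-in data; replace with your own data frames
MyFrameA <- data.frame(A = 1:2, B = c("x", "y"))
MyFrameB <- data.frame(C = c(0.5, 1.5))

out <- write_frames(MyFrameA, MyFrameB,
                    file = file.path(tempdir(), "ABC.txt"))
readLines(out)
```

The reason the original attempt produced an empty file is that typing an object's name only auto-prints at the top level; inside a function nothing is printed unless you ask for it.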

G

 
 What should I do so that, when the function ends, I have a text file
 called ABC.txt having MyFrameA and MyFrameB in the same file. 
 
 Thanks so much (in advance) to help me out on this one ! I'm sure some
 of you already encountered such a situation.
 
 Marc.
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC  [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK[w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT  [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] question about capscale (vegan)

2006-11-17 Thread Gavin Simpson

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] gower distance calculation

2006-11-17 Thread Gavin Simpson

On Fri, 2006-11-17 at 14:50 +0200, Roy Spitz wrote:
 Hello
 
  
 
 I have 2 rows in a matrix and I want to calculate the Gower Distance between
 the 2 , how can I do it?
 I searched and found nothing that can help me, and my program doesn't know
 the gdist function and I couldn't find it on the R help site.
 
  
 
 Can anyone help me plz

vegdist in package vegan has Gower's distance, but all variables have to
be numeric.

If you want to use mixed data (numerics, factors, binary), see ?daisy in
package cluster.
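
If it helps to see what is being computed, here is a hand-rolled base R sketch of Gower's distance for numeric variables only: the mean, over variables, of the absolute difference divided by that variable's range. The function name gower_numeric() and the example matrix are made up, and dropping constant columns is my own assumption, not necessarily what vegdist or daisy do. Note the ranges come from the whole data set, so they should be taken from your full matrix rather than from the two rows alone.

```r
# Hypothetical helper: Gower distance between rows i and j of a numeric matrix
gower_numeric <- function(x, i, j) {
  x <- as.matrix(x)
  # Range of each variable across all rows
  ranges <- apply(x, 2, function(col) diff(range(col)))
  # Assumption: variables with zero range carry no information, so skip them
  keep <- ranges > 0
  mean(abs(x[i, keep] - x[j, keep]) / ranges[keep])
}

# Made-up example data: three samples, three variables
m <- rbind(c(1, 10, 100),
           c(3, 20, 100),
           c(5, 40, 100))

gower_numeric(m, 1, 2)
```

For real work, the packaged functions mentioned above are the better choice.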

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] question about capscale (vegan)

2006-11-16 Thread Gavin Simpson

On Thu, 2006-11-16 at 17:25 +0100, Alicia Amadoz wrote:
 Hello,
 
 I am interested in using the capscale function of vegan package of R. I
 already have a dissimilarity matrix and I am intended to use it as
 'distance' argument. But then, I don't know what kind of data must be in
 'comm' argument. I don't understand what type of data must be referred
 as 'species scores' and 'community data frame' since my data refer to
 nucleic distances between different sequences.

No, that is all wrong. Read ?capscale more closely! It says that you
need to use the formula to describe the model. distance is used to
tell capscale which distance coefficient to use if the LHS of the model
formula is a community matrix.

Argument comm is used to tell capscale where to find the species
matrix that will be used to determine species scores in the analysis,
*if* the LHS of the formula is a distance matrix. comm isn't used if
the LHS is a data frame, and distance is ignored if the LHS is a
distance matrix.

As you don't provide a reproducible example of your problem, I will use
the inbuilt example from ?capscale

## load vegan and some data
library(vegan)
data(varespec)
data(varechem)

Now if you want to fit a capscale model using the raw species data, then
you would describe the model as so:

vare.cap <- capscale(varespec ~ N + P + K + Condition(Al),
                     data = varechem,
                     distance = "bray")
vare.cap

In the above, LHS of formula is a data frame so capscale looks to
argument distance for the name of the coefficient to turn it into a
distance matrix. The terms on the RHS of the formula are variables
looked up in the object assigned to the data argument.

Now let's alter this to start with a dissimilarity/distance matrix
instead. The exact complement of the above would be:

dist.mat <- vegdist(varespec, method = "bray")
vare.cap2 <- capscale(dist.mat ~ N + P + K + Condition(Al),
                      data = varechem,
                      comm = varespec)
vare.cap2

To explain the above example: first create the Bray-Curtis distance
matrix (dist.mat). Then use this on the LHS of the formula. When
capscale now wants to calculate the species scores of the analysis it
will look to argument comm to use in the calculation; which in this
case we specify is the original species matrix varespec.

As for what are species scores, well this is a throwback to the origins
of the package and the methods included - all of this is related to
ecology and mainly vegetation analysis (hence vegan).

For species scores, read variable scores. The distance matrix (however
calculated) describes how similar your individual sites (read samples)
are to one another. You can also display information about the variables
used to determine those distances/similarities, and this is what is
meant by species scores. Whatever you used to generate the distance
matrix, the columns represent the info used to generate the species
scores.

If some of this still isn't clear, email the list with the commands used
to generate your distance matrix in R and I'll have a go at explaining
this with reference to your data/example.

 
 I would be very grateful if you could help me with this fact in any
 manner. Thank you in advance for your help.
 
 Regards,
 Alicia

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

