Re: [R] creating a reverse geometric sequence

2010-05-23 Thread Dan Davison
Erik Iverson er...@ccbr.umn.edu writes:

 Hello,

 Can anyone think of a non-iterative way to generate a decreasing
 geometric sequence in R?

 For example, for a hypothetical function dg, I would like:

 dg(20)
 [1] 20 10 5 2 1

 where I am using integer division by 2 to get each subsequent value in
 the sequence.


 There is of course:

 dg <- function(x) {
   res <- integer()
   while (x >= 1) {
     res <- c(res, x)
     x <- x %/% 2
   }
   res
 }

 dg(20)
 [1] 20 10  5  2  1

 This implementation of 'dg' uses an iterative 'while' loop.  I'm
 simply wondering if there is a way to vectorize this process?

Hi Erik,

How about

dg <- function(x) {
    maxi <- floor(log(x)/log(2))
    floor(x / (2^(0:maxi)))
}

I don't think the remainders cause a problem.
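
One way to convince yourself about the remainders is to compare the two versions directly. The following is only a sketch: the names dg.loop and dg.vec are made up for illustration, and log2() is swapped in for log(x)/log(2) on the assumption that it behaves at least as well close to exact powers of two.

```r
## Iterative version, as in the question
dg.loop <- function(x) {
    res <- integer()
    while (x >= 1) {
        res <- c(res, x)
        x <- x %/% 2
    }
    res
}

## Vectorized version; log2() replaces log(x)/log(2) (an assumption, see above)
dg.vec <- function(x) {
    maxi <- floor(log2(x))
    floor(x / 2^(0:maxi))
}

## Compare the two on a handful of starting values
starts <- c(1, 7, 20, 100, 999)
same <- sapply(starts, function(x) isTRUE(all.equal(dg.loop(x), dg.vec(x))))
```

The reason it works is that repeated integer division by 2 gives the same result as a single division by 2^k followed by floor, so nothing is lost by doing all the divisions at once.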

Dan


 Thanks,
 Erik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] C function call in R

2010-05-18 Thread Dan Davison
John Lande john.land...@gmail.com writes:

 dear all,

 we are trying to improve the performance of our R code by implementing
 some functions in custom C code.
 we find it difficult to pass data structures such as
 matrices or data.frames to and from the external C functions.

Please give a *very simple* example of what you're trying and failing to
do.

Use the .C() interface, forget about the .Call interface. Then it is not
that hard. Start with the convolve example on p.69 and 70 of Writing R
Extensions. Get that working and then turn it into your problem.

Forget about lists and data frames: everything is going to be a simple
vector. That includes arrays and matrices: you can pass them in, but C
will know nothing about their dimensions until you tell it. Of course,
you can pass the dimension vectors in as a separate vector. So, if you
use arrays, you need to understand the order in which R stores the
elements of the array. If your problem cannot be solved with the .C
interface then you should consider whether it is worthwhile to proceed
as the .Call interface repays those who use it frequently but has a
considerably steeper learning (and forgetting) curve.
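
The point about element order can be checked from within R before any C is written. This is a sketch only: the helper pos() is made up for illustration, and the comment about C indexing assumes the usual 0-based translation of the same formula.

```r
## A 2 x 3 matrix; R stores the elements column by column
m <- matrix(c(11, 21, 12, 22, 13, 23), nrow = 2)

## Roughly what you would hand to .C(): the flat data and,
## separately, the dimensions
flat <- as.double(m)
dims <- dim(m)

## Element (i, j), counting from 1, sits at position (j - 1) * nrow + i
## of the flat vector (in C, counting from 0, that is flat[j * nrow + i])
pos <- function(i, j, nrow) (j - 1) * nrow + i

flat[pos(2, 3, dims[1])]   # the same element as m[2, 3]
```

So a C function that receives the flat vector plus the number of rows has everything it needs to address individual cells.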

Dan



 we already tried the solution from Writing R Extensions from the R
 webpage.

 do you have any other solution or more advanced documentation on that point?

 looking forward to your answer



Re: [R] reading in all files of a certain type

2010-05-17 Thread Dan Davison
Dimitri Liakhovitski dimitri.liakhovit...@gmail.com writes:

 Thanks again - and one follow-up question.
 When I do do.call(rbind, lapply(dir(patt = "\\.csv$"), read.csv))
 What is the right way to specify (probably under patt) that I only
 need to grab those .csv files that contain a certain string, e.g.,
 "result"?

I assume you mean whose names contain a certain string, rather than the
string being in the file contents.

The pattern argument to dir() is a regular expression. They are a
worthwhile thing to know a bit about, so, you should have a look at some
introductory material on regular expressions, but this might also help a
bit:

> dir()
[1] "1-result.csv" "result-2.csv" "resultcsv"    "result.csv"  
> dir(patt="result\\.csv$")
[1] "1-result.csv" "result.csv"  
> dir(patt="result.*\\.csv$")
[1] "1-result.csv" "result-2.csv" "result.csv"  
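
If you want to experiment with patterns without touching the file system, the same regular expressions can be tried on a plain character vector with grep(). A small sketch, with the file names made up to mirror the listing above:

```r
## Made-up file names, standing in for the output of dir()
files <- c("1-result.csv", "result-2.csv", "resultcsv", "result.csv")

## Names ending in exactly "result.csv"; the escaped dot must be a literal dot
ends.in.result <- grep("result\\.csv$", files, value = TRUE)

## Names containing "result" somewhere before a final ".csv"
contains.result <- grep("result.*\\.csv$", files, value = TRUE)
```

Here ends.in.result holds "1-result.csv" and "result.csv", while contains.result also picks up "result-2.csv". "resultcsv" is never matched because it has no literal dot.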

Dan


 I tried a couple of things, like patt = "\\.csv$" & pat = "result" -
 but it does not seem to work
 Thanks a lot!
 Dimitri



 On Wed, May 12, 2010 at 6:16 PM, Dimitri Liakhovitski
 dimitri.liakhovit...@gmail.com wrote:
 Thanks a lot, Henrique, will try!
 Dimitri

 On Wed, May 12, 2010 at 3:41 PM, Henrique Dallazuanna www...@gmail.com 
 wrote:
 Try this:

 do.call(rbind, lapply(dir(patt = "\\.csv$"), read.csv))

 On Wed, May 12, 2010 at 4:32 PM, Dimitri Liakhovitski
 dimitri.liakhovit...@gmail.com wrote:

 Hello,

 I am wondering if it's possible to read in all files of a certain type
 - without specifying their names.
 For example, I have 10 .csv files in my working directory.
 I would like to read them in and bind them all together. I was
 thinking of writing a loop, read in all files, and then bind them.
 Is it possible?

 Thanks a lot!

 --
 Dimitri Liakhovitski
 Ninah Consulting
 www.ninah.com




 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




 --
 Dimitri Liakhovitski
 Ninah Consulting
 www.ninah.com




Re: [R] reading in all files of a certain type

2010-05-17 Thread Dan Davison
jim holtman jholt...@gmail.com writes:

 try:

 pattern="*result*\\.csv$"

Just for the record, that's not quite correct. The * doesn't behave like
in a shell glob. Instead, * means "zero or more copies of the previous
character". So the above pattern picks up "resul.csv", which I don't think
was intended.

I don't know what a * is defined to do when it is the first character of
a regexp, but I believe it should be avoided.
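
To see the difference between the glob-style reading and the regexp reading, here is a small sketch using grepl() on made-up file names. The leading * is left out, since its meaning at the start of a pattern is exactly what is in doubt.

```r
## Made-up file names for illustration
files <- c("resul.csv", "result.csv", "resulttt.csv", "result-2.csv")

## "t*" means zero or more copies of "t", so "resul.csv" matches
## and "result-2.csv" does not
star.after.t <- grepl("result*\\.csv$", files)

## ".*" means zero or more of any character, which is the
## regexp counterpart of a shell "*"
dot.star <- grepl("result.*\\.csv$", files)
```

With these names, star.after.t is TRUE, TRUE, TRUE, FALSE and dot.star is FALSE, TRUE, TRUE, TRUE.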

Dan



 On Mon, May 17, 2010 at 9:06 PM, Dimitri Liakhovitski
 dimitri.liakhovit...@gmail.com wrote:
 Thanks again - and one follow-up question.
 When I do do.call(rbind, lapply(dir(patt = "\\.csv$"), read.csv))
 What is the right way to specify (probably under patt) that I only
 need to grab those .csv files that contain a certain string, e.g.,
 "result"?
 I tried a couple of things, like patt = "\\.csv$" & pat = "result" -
 but it does not seem to work
 Thanks a lot!
 Dimitri





Re: [R] Variable variables using R ... e.g., looping over data frames with a numeric separator

2010-05-17 Thread Dan Davison
Monte Shaffer monte.shaf...@gmail.com writes:

 Hello,

 I have programmed in PHP a lot, and wanted to know if anyone figured out
 Variable variables using R.

 For example, I have several dataframes of unequal sizes that relate to L
 treatments (1, 2, 3, 4, 5,6, L) ... in this case L=7

You should create a list containing 7 data frames, rather than attempting
to identify them with names containing integers. Then you can process
your data frames in a for loop, or with lapply etc, and things should
generally seem much better.

df.list <- list(fData.1, fData.2, fData.3, fData.4, fData.5, fData.6, fData.7)
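
As a rough illustration of what working with such a list looks like, here is a sketch with three small made-up data frames (the columns x and y and the lm() call are just placeholders for whatever "do stuff" means in your case). If the separately named objects already exist, get(paste("fData.", i, sep="")) is the nearest thing to PHP's $$, but the list tends to be easier to live with.

```r
## Three made-up data frames of unequal size
fData.1 <- data.frame(x = 1:3, y = c(2.0, 4.1, 5.9))
fData.2 <- data.frame(x = 1:4, y = c(1.9, 4.2, 5.8, 8.1))
fData.3 <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.0, 9.9))

## Put them in a list once ...
df.list <- list(fData.1, fData.2, fData.3)

## ... then process them all with lapply/sapply
n.rows <- sapply(df.list, nrow)
fits <- lapply(df.list, function(df) lm(y ~ x, data = df))

## or with an explicit loop over the list index
for (i in seq_along(df.list)) {
    cat("treatment", i, "has", nrow(df.list[[i]]), "rows\n")
}
```

The results (fits, summaries and so on) then live in lists of the same length, so fits[[2]] belongs to df.list[[2]] without any name pasting.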

Dan


 fData.1
 unique.1
 fit.nls.1
 summary.nls.1
 fit.var.1
 summary.var.1
 .
 fData.2
 unique.2
 fit.nls.2
 summary.nls.2
 fit.var.2
 summary.var.2
 .
 fData.L
 unique.L
 fit.nls.L
 summary.nls.L
 fit.var.L
 summary.var.L
 =

 I want to do something like

 for(i in 1:L-1)
 {
 dataStr = gsub(' ','',paste("fData.",i));
 dataVar = eval(dataStr);
  ## GOAL is to grab data frame  'fData.1' and do stuff with it, then in next
 loop grab data frame 'fData.2' and do stuff with it


 }

 #

 in PHP, I would define the string $dataStr = "final.1" and then $dataVar =
 $$dataStr which is a variable variables use.

 Thanks in advance for any help you can offer or suggest.  My current
 solution is to write code in PHP that generates lots of R code.  I would
 like to do it all in R, so I don't have to rely on another language.


 monte



Re: [R] applying quantile to a list using values of another object as probs

2010-05-17 Thread Dan Davison
Lorenzo Cattarino l.cattar...@uq.edu.au writes:

 Hi Jim,

 Thanks for your reply. Your code does work but I was hoping to find a
 way to use lapply and avoid the for loop.

 Lorenzo


 -Original Message-
 From: Jim Lemon [mailto:j...@bitwrit.com.au] 
 Sent: Monday, 17 May 2010 8:27 PM
 To: Lorenzo Cattarino
 Cc: r-help@r-project.org
 Subject: Re: [R] applying quantile to a list using values of another
 object as probs

 On 05/17/2010 06:01 PM, Lorenzo Cattarino wrote:
 Hi r-users,

 I have a matrix B and a list of 3x3 matrices (mylist). I want to
 calculate the quantiles in the list using each of the values of B as
 probabilities.

It's a little confusing, because it isn't clear why the elements of
mylist are matrices, nor why B is a matrix. I.e. why aren't these things
just dimensionless vectors? However if you really do want to ignore the
row/column information then perhaps what you're looking for is

lapply(mylist, quantile, probs=B)

[[1]]
 26.55087%  37.21239%  57.28534%  90.82078%  20.16819%  89.83897%
-0.2191315  0.3738468  0.5389231  1.2277025 -0.4274793  1.1973174
 94.46753%  66.07978%   62.9114%  6.178627%  20.59746%  17.65568%
 1.3405621  0.6223309  0.5811310 -1.4270686 -0.4166326 -0.4909661

[[2]]
   26.55087%    37.21239%    57.28534%    90.82078%    20.16819%    89.83897%
-0.004930323  0.072476814  0.703609732  0.925581428 -0.027300847  0.923628895
   94.46753%    66.07978%     62.9114%    6.178627%    20.59746%    17.65568%
 0.932833742  0.793329524  0.783422677 -1.028244961 -0.026313767 -0.033078300

[[3]]
 26.55087%  37.21239%  57.28534%  90.82078%  20.16819%  89.83897%
-0.1492189 -0.1040074  0.2025300  0.8161114 -0.2803999  0.7580782
 94.46753%  66.07978%   62.9114%  6.178627%  20.59746%  17.65568%
 1.0316644  0.3963404  0.3886679 -0.9801188 -0.2693299 -0.3451936
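
If a matrix of results is more convenient than a list, sapply can be used in the same way. The following is a sketch with simulated data; the seed, the use of as.vector() to drop the dimensions of B explicitly, and names=FALSE are choices made here for illustration, not something taken from the question.

```r
set.seed(1)
B <- matrix(runif(12), 3, 4)
mylist <- lapply(1:3, function(i) matrix(rnorm(9), 3, 3))

## Treat B as a plain vector of 12 probabilities
probs <- as.vector(B)

## One vector of 12 quantiles per list element
q.list <- lapply(mylist, quantile, probs = probs)

## Same thing, collected into a 12 x 3 matrix (one column per list element)
q.mat <- sapply(mylist, quantile, probs = probs, names = FALSE)
```

Row k of q.mat then corresponds to the k-th element of B, counted column by column.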

Dan




 The code I wrote is:

 B <- matrix(runif(12, 0, 1), 3, 4)
 mylist <- lapply(mylist, function(x) {matrix(rnorm(9), 3, 3)})

 for (i in 1:length(B))
 {
     quant <- lapply(mylist, quantile, probs=B[i])
 }

 But quant returned the quantiles calculated using only the last value
 ([3,4]) of the matrix B.


 Hi Lorenzo,
 This works for me:

 B <- matrix(runif(12,0,1),3,4)
 mylist <- list()
 for(i in 1:3) mylist[[i]] <- matrix(rnorm(9),3,3)
 myq <- list()
 for(i in 1:3) myq[[i]] <- quantile(mylist[[i]],probs=B[i,])

 Although looking at your example, I may have misunderstood what you want
 the result to be.

 Jim



[R] Using dev.copy

2010-03-22 Thread Dan Davison
I'm working over an ssh connection without X11 graphics. I'm making a
plot, the first stage of drawing which takes a long time. I want to
experiment with adding details. Here is what I was hoping to do, which
results in error.

## Draw the master plot on png dev 2
png(file="master.png")
plot(1:10)

## Save a copy on png dev 3
png(file="copy1.png")
dev.set(2)
dev.copy(which=3)

## Add details to copy, write to disk and view
abline(v=5)
Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) : 
  plot.new has not been called yet

Can someone tell me how to do this correctly?

Thanks a lot,

Dan



Re: [R] lapply to apply a function using a vector

2010-02-17 Thread Dan Davison
Flana flana.bristo at gmail.com writes:

 
 Hi,
 
 First, thank you all for your help.
 
 Here is my problem (simplified):
 
 Say I have a list:
 a=list(matrix(50,nrow=5,ncol=5),
  matrix(25,nrow=5,ncol=5),
  matrix(10,nrow=5,ncol=5))
 
 I'd like to use rbinom with a different probability for each matrix. I
 tried:
 
 b=c(.8,.1,.9)
 brep=rep(b,each=25)
 lapply(a,function(a) rbinom(25,a,brep))
 
 but that doesn't work-- it just uses the first value of b rather than
 applying it over that list.

Seeing as you want to index in to both the size and prob arguments of
rbinom, you can use mapply, rather than lapply:

mapply(function(size, prob)
   matrix(rbinom(25, size=size, prob=prob), nrow=5, ncol=5),
   c(50,25,10), c(.8,.1,.9), SIMPLIFY=FALSE)

An lapply equivalent would have to use an explicit index variable, e.g.

lapply(1:3, function(i) matrix(rbinom(25, size=a[[i]], prob=b[i]), nrow=5))

However, it may be that neither of these are the most efficient way to
do this, as they involve calling rbinom multiple times. For just 3
different parameter sets (prob and size) that's unlikely to be a
problem, but if you were simulating for a large number of parameter sets 
then you might want to consider calling rbinom once and subsequently
unpacking the results, e.g.

size <- rep(c(50,25,10), each=25)
prob <- rep(c(.8,.1,.9), each=25)
x <- rbinom(25*3, size=size, prob=prob)
lapply(split(x, rep(1:3, each=25)), matrix, nrow=5)
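
To check that the single-call version really gives the same kind of object as the mapply version, something like the following can be run. It is a sketch: the seed and the variable names (sizes, probs, via.mapply, via.single) are made up here.

```r
set.seed(42)
sizes <- c(50, 25, 10)
probs <- c(.8, .1, .9)

## One call to rbinom per parameter set
via.mapply <- mapply(function(size, prob)
                         matrix(rbinom(25, size = size, prob = prob), nrow = 5),
                     sizes, probs, SIMPLIFY = FALSE)

## A single call to rbinom, unpacked afterwards
x <- rbinom(25 * 3, size = rep(sizes, each = 25), prob = rep(probs, each = 25))
via.single <- lapply(split(x, rep(1:3, each = 25)), matrix, nrow = 5)

## Both are lists of three 5 x 5 matrices ...
shapes.ok <- all(sapply(c(via.mapply, via.single),
                        function(m) identical(dim(m), c(5L, 5L))))

## ... and every draw lies between 0 and the size for its matrix
in.range <- mapply(function(m, size) all(m >= 0 & m <= size),
                   via.single, sizes)
```

The individual draws differ between the two versions, of course, but each matrix is drawn with its own size and prob in both.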

Dan


 what I am currently doing is:
 c=list()
 for (i in 1:3){c[[i]]=rbinom(25,a[[i]],b[i])}



Re: [R] how to do calculations in data matrices?

2010-02-13 Thread Dan Davison
Zoppoli, Gabriele (NIH/NCI) [G] zoppolig at mail.nih.gov writes:

 
 Please give me just a reference where I can find something useful. 

The others are right that rather than randomly googling, you should bite
the bullet and sit down for a couple of hours with some introductory
material on R (a book, or one of the freely available pdfs). Unless you are
never going to use R again, it will be worth it. But seeing as you asked your
question clearly, here's one way to do the steps you specify. Hopefully this
will help as well.

First, make a matrix to work with:

> mat1 <- matrix(sample(1:10, size=12, replace=TRUE), ncol=4)
> mat1
     [,1] [,2] [,3] [,4]
[1,]    5    7   10    9
[2,]    4   10    8    2
[3,]    8   10    9    5
 
 
 In summary, I need to :
 
 - find the median of each row of a matrix

You can use apply for that:

> row.medians <- apply(mat1, 1, median)
> row.medians
[1] 8.0 6.0 8.5

 - create a new matrix with each value in the first matrix divided by
 the median of its row

That's easy to *do*:

> mat2 <- mat1 / row.medians
> mat2
          [,1]     [,2]     [,3]      [,4]
[1,] 0.6250000 0.875000 1.250000 1.1250000
[2,] 0.6666667 1.666667 1.333333 0.3333333
[3,] 0.9411765 1.176471 1.058824 0.5882353

but it may take more time to understand why that worked. How come it
knew that we wanted to divide each row by the median of the row? (Hint:
understand the byrow argument in ?matrix and the mentions of the word
"recycling" in ?Arithmetic).
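
In case the hint is too terse, here is a sketch that spells the recycling out, using the same numbers as the matrix above. The names expanded, by.recycling and by.sweep are made up for illustration; sweep() is shown as a more explicit alternative, not as what the division does internally.

```r
## The same matrix as above, entered column by column
mat1 <- matrix(c(5, 4, 8,  7, 10, 10,  10, 8, 9,  9, 2, 5), ncol = 4)
row.medians <- apply(mat1, 1, median)

## A matrix is stored column by column, and the shorter vector is
## recycled along that storage order, so element [i, j] is divided
## by row.medians[i]
by.recycling <- mat1 / row.medians

## Writing the recycled vector out as a full matrix shows the same thing
expanded <- matrix(row.medians, nrow = nrow(mat1), ncol = ncol(mat1))

## sweep() says explicitly "divide along margin 1 (rows)"
by.sweep <- sweep(mat1, 1, row.medians, "/")
```

Note that this only lines up so neatly because there is one median per row. Dividing each column by a per-column value needs sweep(mat1, 2, ...) or a transpose.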

 - if a value a in the second matrix is < 1, I need to substitute it
 with 1/a

First make a logical vector which identifies the elements of the matrix
you want to operate on:

> is.small <- mat2 < 1

Then perform the operation on those elements:

> mat2[is.small] <- 1 / mat2[is.small]
> mat2
       [,1]     [,2]     [,3]  [,4]
[1,] 1.6000 1.142857 1.250000 1.125
[2,] 1.5000 1.666667 1.333333 3.000
[3,] 1.0625 1.176471 1.058824 1.700


# you could also use
ifelse(mat2 < 1, 1/mat2, mat2)
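
For completeness, a sketch showing that the logical-index and ifelse versions agree, using the same numbers as above. The pmax() line is an extra alternative added here for illustration; it relies on all entries being positive.

```r
mat1 <- matrix(c(5, 4, 8,  7, 10, 10,  10, 8, 9,  9, 2, 5), ncol = 4)
mat2 <- mat1 / apply(mat1, 1, median)

## Version 1: logical index, modifying a copy in place
via.index <- mat2
is.small <- via.index < 1
via.index[is.small] <- 1 / via.index[is.small]

## Version 2: ifelse, returning a new matrix
via.ifelse <- ifelse(mat2 < 1, 1 / mat2, mat2)

## Version 3: for positive entries, the larger of a and 1/a
via.pmax <- pmax(mat2, 1 / mat2)
```

All three give a matrix with no entry below 1.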

dan

 
 I know that for some of you it must be overeasy, but I swear I googled
 for two hours with keywords operations, calculations, data
 matrices, data tables, and CRAN, and I didn't find anything
 useful.
 
 Thank you all
 
 Gabriele Zoppoli, MD
 Ph.D. Fellow, Experimental and Clinical Oncology and Hematology,
 University of Genova, Genova, Italy
 Guest Researcher, LMP, NCI, NIH, Bethesda MD
 
 Work: 301-451-8575
 Mobile: 301-204-5642
 Email: zoppolig at mail.nih.gov




Re: [R] NMDS ordination

2010-02-13 Thread Dan Davison
Aisyah aisyah.faruk at ioz.ac.uk writes:

 
 
 Hi
 
 I'm currently trying to plot my NMDS data together with fitted variables
 (envfit function) on an ordination plot. The plot function shows two
 displays, "sites" and "sp". I was wondering how to plot it so that the sites
 come up as different points for different sites but the species come up as
 actual names? It looks a little busy at the moment with everything in.

Please provide an example. I.e. working code creating the plot as you have it at
the moment. Include an example data set and be explicit about what packages are
needed. Don't post a large data set -- just create a minimal example
demonstrating the problem you are having.


 
 Sya



Re: [R] List to matrix or to vectors conversion

2010-02-12 Thread Dan davison
 Ted.Harding at manchester.ac.uk writes:

 
 On 12-Feb-10 13:14:29, Juan Tomas Sayago wrote:
  Dear list,
  I have a list with 1000 x1000 lines and columns

Lists have neither lines nor columns. Can you explain exactly what you have?
E.g. show us the code that created your list?

 do you know how I can
  convert it to matrix or data.frame.
  Thanks.
  Juan
 
 as.data.frame() will convert it to a dataframe. If you then apply
 as.matrix() to the result you will get a matrix:
 
   L <- list(X=c(1,2,3),Y=c(4,5,6),Z=c(7,8,9))

If you want a matrix as opposed to a data.frame (e.g. your list entries are all
numeric), and the data set is large, this more efficient method might be useful:


> matrix(unlist(L), nrow=3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

If it's not obvious to you what that does, consider:

> unlist(L)
X1 X2 X3 Y1 Y2 Y3 Z1 Z2 Z3 
 1  2  3  4  5  6  7  8  9 

> matrix(unlist(L), nrow=3, byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> matrix(unlist(L), nrow=3, byrow=FALSE)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
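
If you also want to keep the list names as column names, here is a sketch of two ways to do it, checked against the as.data.frame route. It assumes every list entry is a numeric vector of the same length; the names m.unlist, m.cbind and m.df are made up.

```r
L <- list(X = c(1, 2, 3), Y = c(4, 5, 6), Z = c(7, 8, 9))

## unlist, then reshape, supplying the column names by hand
m.unlist <- matrix(unlist(L), nrow = length(L[[1]]),
                   dimnames = list(NULL, names(L)))

## bind the entries together as columns; names are carried over
m.cbind <- do.call(cbind, L)

## the data.frame route, for comparison
m.df <- as.matrix(as.data.frame(L))
```

All three hold the same numbers in the same layout, one column per list entry.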



   L
   # $X
   # [1] 1 2 3
   # $Y
   # [1] 4 5 6
   # $Z
   # [1] 7 8 9
 
   D <- as.data.frame(L)
   D
   #   X Y Z
   # 1 1 4 7
   # 2 2 5 8
   # 3 3 6 9
 
   M <- as.matrix(D)
   M
   #  X Y Z
   # [1,] 1 4 7
   # [2,] 2 5 8
   # [3,] 3 6 9
 
 Note that applying as.matrix() directly to the original L will
 not work. It returns a list, not a matrix.
 
 Ted.
 
 
 E-Mail: (Ted Harding) Ted.Harding at manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 12-Feb-10   Time: 13:40:32
 -- XFMail --
 




Re: [R] paired wilcox test on each row of a large dataframe

2010-02-12 Thread Dan Davison
gauravbhatti gaurav15984 at hotmail.com writes:

 
 
 hI

 I have to calculate V statistic for each row of a large dataframe
 (28000). I can not use multtest package for paired wilcox test. I have
 been using for loops, which are slow. Is there a way to speed up the computation
 with another method like using apply or tapply?

Using a for loop is fine here (and basically unavoidable). If you need
it to be faster, use a matrix rather than a data.frame. (i.e. make a
matrix containing columns 1-12, which are all numeric and so do not need
to be in a data frame).

Below are versions using apply, sapply and an explicit for loop. There's
not much difference in speed. But the last one, in which the data is in
a data.frame with rownames, is much slower.


> d <- matrix(rnorm(12000), nrow=1000)
> system.time(ans <- apply(d, 1, function(row)
+     unlist(wilcox.test(row[1:6], row[7:12])[c("p.value","statistic")])))
   user  system elapsed 
  2.660   0.064   2.730 
> system.time(ans2 <- sapply(1:nrow(d), function(i)
+     unlist(wilcox.test(d[i,1:6], d[i,7:12])[c("p.value","statistic")])))
   user  system elapsed 
  2.480   0.108   2.583 
> system.time({ans3 <- matrix(nrow=nrow(d), ncol=2) ;
+     for(i in 1:nrow(d)) {
+         ans3[i,] <- unlist(wilcox.test(d[i,1:6], d[i,7:12])
+                            [c("p.value","statistic")])}})
   user  system elapsed 
  2.504   0.000   2.503 

> d <- as.data.frame(d)
> rownames(d) <- paste(letters, 1:nrow(d))
> system.time(ans2 <- sapply(1:nrow(d), function(i)
+     unlist(wilcox.test(as.numeric(d[i,1:6]),
+                        as.numeric(d[i,7:12]))[c("p.value","statistic")])))
   user  system elapsed 
  5.673   0.212   5.885 
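
Note that the timings above call wilcox.test without paired=TRUE. For the paired test asked about, a sketch along the same lines might look like this (simulated data, a made-up helper name paired.row, and only 50 rows to keep it quick):

```r
set.seed(1)
d <- matrix(rnorm(12 * 50), ncol = 12)

## Paired test of columns 1-6 against columns 7-12 for one row;
## returns the V statistic and the p-value
paired.row <- function(row) {
    res <- wilcox.test(row[1:6], row[7:12], paired = TRUE)
    c(V = unname(res$statistic), p.value = res$p.value)
}

## apply() returns one column per row of d, so transpose to get
## one row per row of d and two columns
ans <- t(apply(d, 1, paired.row))
```

The result has one row per row of the data and columns named V and p.value, which can be cbind-ed back onto the original matrix if that is wanted.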

Dan


 My data set looks like this:
          11573_MB   11911_MB   11966_MB   12091_MB  12168_MB   12420_MB
 cg0292 0.62123125 0.82663502 0.74687013 0.61774927 0.7337809 0.73203721
 cg2426 0.63631315 0.64408750 0.61975158 0.72500713 0.5753110 0.65146526
 cg3994 0.05035499 0.05189776 0.05882848 0.11198073 0.1313330 0.03883439
 cg5847 0.13936423 0.14967690 0.31874454 0.15876243 0.117     0.15070058
 cg6414 0.09059770 0.09915681 0.09952658 0.13955982 0.1757718 0.07566312
 cg7981 0.05622769 0.04143790 0.07167018 0.08051046 0.1378107 0.0543
 ...
          11573_CB   11911_CB   11966_CB   12091_CB   12168_CB  12420_CB
 cg0292 0.83059018 0.65396035 0.74519819 0.76007659 0.70335691 0.7857631
 cg2426 0.61450928 0.59160923 0.69857198 0.73028911 0.71808719 0.6741295
 cg3994 0.04223668 0.07910444 0.05416764 0.06156407 0.06381321 0.0643354
 cg5847 0.13897704 0.06407313 0.20449931 0.15683154 0.18936196 0.1610695
 cg6414 0.06520757 0.12243180 0.11380134 0.10957321 0.15759518 0.1236715
 cg7981 0.04789030 0.11699024 0.07143036 0.05996888 0.10829510 0.1069037
 ...
 There are 12 columns and 27000 rows. I have to perform paired test on each
 row (1:6 vs 7:12) and store the p value and statistic in two columns . Whats
 the fastest way?
 Gaurav Bhatti




Re: [R] Code working but too slow, any idea for how to speed it up ?(no loop in it)

2010-02-12 Thread Dan Davison
anna lippelanna24 at hotmail.com writes:

 
 Hello my friends, 
 here is some code I wrote with no loops on matrices that is taking too long (2
 seconds and I call it 720 times -- 12 minutes):
 
 mat1 and mat2 are both matrix with 103 columns and 164 rows.

Could you provide some example code creating matrices mat1 and mat2 which have
exactly the same structure as the mat1 and mat2 you are using. We don't really
want your exact data, but just toy matrices that have exactly the same form as
your data matrices. Without that your question's hard to answer as we can't try
out your code. Failing that, please post the output of str(mat1) and str(mat2).



Re: [R] Access variables by string

2010-02-11 Thread Dan davison
Philipp Rappold philipp.rappold at gmail.com writes:

 
 Dear all,
 
[...]
 (2) I need this functionality for a customized na.exclude() function 
 that I am building, which should only exclude rows that have NA in 
 certain columns. Maybe there is already a function which does 
 exactly what I need, so I'd highly appreciate if someone could point 
 me there ;)

I would use something like

naexclude <- function(data, varnames)
    data[rowSums(is.na(data[,varnames,drop=FALSE])) == 0,]
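
Here is a small sketch of that function in use, on a made-up data frame, together with a check against complete.cases() restricted to the same columns (which might be the existing function you are after). The trailing drop=FALSE is an addition here, so that a one-column data frame stays a data frame.

```r
naexclude <- function(data, varnames)
    data[rowSums(is.na(data[, varnames, drop = FALSE])) == 0, , drop = FALSE]

## Made-up data: NAs scattered over three columns
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(NA, 2, 3, 4),
                 c = c(1, 2, NA, 4))

## Only NAs in columns a and b cause a row to be dropped
kept <- naexclude(df, c("a", "b"))

## The same selection written with complete.cases()
same <- df[complete.cases(df[, c("a", "b"), drop = FALSE]), , drop = FALSE]
```

Rows 3 and 4 survive: row 3 has an NA in column c, but that column was not in varnames.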

Dan

 
 My current implementation looks like this:
 
 naexlcude <- function(data, varnames)
 {
   for(v in varnames){
   data = subset(data, !is.na(v))
   }
 
   data
 }
 
 Best
 Philipp
 




Re: [R] matching matrix columns to a vector

2008-11-24 Thread Dan Davison



Jagat.K.Sheth wrote:
 
 How about which(colSums(t-v) == 0) ?
 

But what about v=c(2,1,3)? It needs to be something like

which(colSums((t - v)^2)) == 0
or
which(colSums(abs(t - v))) == 0

Dan


Jagat.K.Sheth wrote:
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 On Behalf Of Salas, Andria Kay
 Sent: Monday, November 24, 2008 10:04 AM
 To: r-help@r-project.org
 Subject: [R] matching matrix columns to a vector
 
 I need help with (hopefully) just one more thing.  I have been fussing
 with this for quite some time and have decided just to give up and ask!
 I want to match a column in a matrix to a vector.  I found a which
 command that I thought would be helpful as it does the following:
 
 g=c(1,5,3,2,7)
 which(g==5)
 [1] 2
 
 As the above gave which placement in the g vector corresponded to 5 (the
 second place), I need this command to give me which column in a matrix
 matches to a vector.
 
 This is just a toy example of what I am trying to do:
 t=matrix(1:12,3,4)
v=c(1,2,3)
which(t[,j]==v)
 
 This does not work, and with my real matrices and vectors, I was
 getting outputs that did not make sense.  These examples are more to
 give an idea of what I am aiming to accomplish.
 
 Thank you for all the help!!


Re: [R] matching matrix columns to a vector

2008-11-24 Thread Dan Davison


Dan Davison wrote:
 
 
 
 Jagat.K.Sheth wrote:
 
 How about which(colSums(t-v) == 0) ?
 
 
 But what about v=c(2,1,3)? It needs to be something like
 
 which(colSums((t - v)^2)) == 0
 or
 which(colSums(abs(t - v))) == 0
 
Sorry, apparently I tried to write a line of R code without using emacs. Bad
idea. I meant
which(colSums((t - v)^2) == 0)
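
To spell out why the plain sum of differences is not enough, here is a small sketch on the toy example. The matrix is called m rather than t to avoid masking the transpose function, and the last line is an alternative added here that avoids arithmetic altogether.

```r
m <- matrix(1:12, 3, 4)
v <- c(1, 2, 3)      # equal to column 1
w <- c(2, 1, 3)      # same elements as column 1, different order

## Differences can cancel: (1-2) + (2-1) + (3-3) is 0, so w "matches"
naive.w <- which(colSums(m - w) == 0)

## Squared differences cannot cancel, so w matches nothing
squared.w <- which(colSums((m - w)^2) == 0)
squared.v <- which(colSums((m - v)^2) == 0)

## Alternative: count the mismatches in each column directly
mismatch.v <- which(colSums(m != v) == 0)
```

With real-valued data computed in different ways, an exact == 0 may be too strict, and comparing against a small tolerance is probably safer.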
Dan



Re: [R] Is there a way to not use an explicit loop?

2008-09-17 Thread Dan Davison
Both shape parameters of rbeta can be vectors; for

x <- rbeta(n, shape1, shape2)

x[i] ~ Beta(shape1[i], shape2[i])

so

bbsim <- function(m=1000, num.post.draws=1e4, size.a=100, prob.a=.27,
                  prior.count=1) {
    data.count <- rbinom(m, size.a, prob.a)
    shape1 <- rep(prior.count + data.count, each=num.post.draws)
    shape2 <- rep(prior.count + size.a - data.count, each=num.post.draws)
    matrix(rbeta(m * num.post.draws, shape1, shape2), num.post.draws, m)
}

Then you can do

beta.draws <- bbsim()
means <- apply(beta.draws, 2, mean)
medians <- apply(beta.draws, 2, median)
etc
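
A scaled-down run makes the layout of the result easier to see. This is a sketch: the seed and the small values of m and num.post.draws are chosen here just to keep it quick, and colMeans() is used as a shortcut for apply(..., 2, mean).

```r
bbsim <- function(m = 1000, num.post.draws = 1e4, size.a = 100, prob.a = .27,
                  prior.count = 1) {
    data.count <- rbinom(m, size.a, prob.a)
    ## each = ... repeats the parameters of data set j num.post.draws times
    ## in a row, so they end up together in column j of the matrix below
    shape1 <- rep(prior.count + data.count, each = num.post.draws)
    shape2 <- rep(prior.count + size.a - data.count, each = num.post.draws)
    matrix(rbeta(m * num.post.draws, shape1, shape2), num.post.draws, m)
}

set.seed(123)
beta.draws <- bbsim(m = 5, num.post.draws = 2000)

## One column per binomial draw, one row per posterior draw
post.means <- colMeans(beta.draws)
post.medians <- apply(beta.draws, 2, median)
```

Each column is a sample from one Beta posterior, so any per-posterior summary is just a column-wise operation.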

Dan

On Wed, Sep 17, 2008 at 11:56:36AM -0700, Juancarlos Laguardia wrote:
 I have a problem in where i generate m independent draws from a binomial 
 distribution,
 say

 draw1 = rbinom( m , size.a, prob.a )


 then I need to use each draw to generate a beta distribution.  So, like 
 using a beta prior, binomial likelihood, and obtain beta posterior, m 
 many times.  I have not found out a way to vectorize draws from a beta 
 distribution, so I have an explicit for loop within my code



 for( i in 1: m ) {

 beta.post = rbeta( 1, draw1[i] + prior.constant  ,  prior.constant + 
 size.a  - draw1[i] )

 beta.post.mean[i] = mean(beta.post)
 beta.post.median[i] = median(beta.post)

 etc.. for other info

 }

 Is there a way to vectorize draws from an beta distribution?

 UC Slug


-- 
http://www.stats.ox.ac.uk/~davison



Re: [R] Scripting in R -- pattern matching, logic, system calls, the works!

2008-09-16 Thread Dan Davison
Instead of writing some long, ugly, script, the way to use R is to
break problems down into distinct tasks. Reading data is one task, and
performing regressions on the data, plotting & summarising are
different tasks. Write functions to do each task in general, and then
use those functions.

So one task is reading the data from a Coverage dir. You want to do a
linear regression on the data, so you want to have the data stored as
a data frame. Following on from Don MacQueen's good advice, here's a
function that does the job:

read.data.from.coverage.dir <- function(dir, pattern="Length_[0-9]+",
                                        min.length=0, max.length=Inf) {

    ## return a data frame with lengths in first column and means of
    ## file contents in second column

    files <- list.files(dir, pattern)
    lengths <- as.numeric(gsub("Length_", "", files, perl=TRUE))
    ## filter files and lengths together so the columns stay aligned
    keep <- lengths >= min.length & lengths <= max.length
    files <- files[keep]
    lengths <- lengths[keep]
    get.mean.from.file <- function(file) mean(scan(file.path(dir,file),
                                                   quiet=TRUE))
    data.frame(x=lengths, y=sapply(files, get.mean.from.file))
}

And here's a function, that uses the first one, to get all the data
from your various coverage dirs

get.all.data <- function(topdir) {
    coverage.dirs <- list.files(path=topdir, pattern="Coverage",
                                full.names=TRUE)
    lapply(coverage.dirs, read.data.from.coverage.dir)
}

So now you can do

## read all the data
all.data <- get.all.data(topdir="~")

## perform all the regressions
regression.fits <- lapply(all.data, function(df) lm(y ~ x, data=df))

## summarise them
summaries <- lapply(regression.fits, summary)

## etc

All those commands are generating lists of objects; lapply is a
shorthand for doing a for loop over a list.

You can use sink() to redirect output, but it would probably be better
to create tables and/or figures in R first, then write them to files.
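
For instance, one way to build a table first and then write it out
might look like the sketch below. The simulated all.data is only a
stand-in for whatever get.all.data() returns, and the file names come
from tempfile(), so nothing here reflects your real directories:

```r
## Hypothetical stand-in for the list returned by get.all.data()
set.seed(42)
all.data <- lapply(1:3, function(i) {
    x <- seq(30, 50, by=5)
    data.frame(x=x, y=2 * x + i + rnorm(length(x)))
})

regression.fits <- lapply(all.data, function(df) lm(y ~ x, data=df))
summaries <- lapply(regression.fits, summary)

## One row of coefficients per regression
coef.table <- t(sapply(regression.fits, coef))
colnames(coef.table) <- c("intercept", "slope")

## Write the table to a file
out.file <- tempfile(fileext=".csv")
write.csv(coef.table, out.file, row.names=FALSE)

## The full printed summaries can go to a text file as well
summary.file <- tempfile(fileext=".txt")
capture.output(print(summaries), file=summary.file)
```

A table like coef.table is easier to reuse later (plotting, merging
with other results) than text captured with sink().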

Dan


On Tue, Sep 16, 2008 at 07:01:42AM -0700, bioinformatics_guy wrote:
 
 Don,
 Excellent advice.  I've gone back and done a bit of coding and wanted to see
 what you think and possibly shore up some of the technical stuff I am
 still having a bit of difficulty with.
 
 I'll paste the code I have to date with any important annotations:
 
 topdir="~"
 library(gmodels)
 
 setwd(topdir)
 
 ### Will probably want to do two for loops as opposed to recursive
 files=list.files(path=topdir,pattern="Coverage")
 
 for (i in files)
 {
 dir=paste("~/hangers/",i,sep="")
 
 files2=list.files(path=dir,pattern="Length")
 
 ### Make an empty matrix that will have the independent variable as
 the filenum and the dependent variable
 ### as the mean of the length or should I have two vectors for the
 regression.  Basically the Length_(\d+) is the independent variable (which
 is taken from the filename) which all the regressions will have and then
 inside the Length_(\d+) is a 1d set of numbers which I take the mean of
 which in turn becomes the dependent variable.  So in essence the points are:
 f(length)=mean(length$V1)
 f(45)=50
 f(50)=60
 etc ...
 
 
 for (j in files2)
 {
 ## I just rearranged the following line but I'm not sure what the
 command is doing
 ## I am assuming 'as.numeric' means take the input as a number
 instead of a string and the gsub has me stumped 

 filenum=as.numeric(gsub('Length_','',j))
 
 ## Can I assign variables at the top instead of hardcoding? like
 upper=50 , lower=30?
 ## And I don't need to put brackets for this if statement do I? 
 Does it basically just
 ## say that if the filenum is outside those parameters, just go to
 the next j in files2?
 if (filenum > 200 | filenum < -10) next
 
 dir2=paste("~/hangers",i,j,sep="/")
 
 tmp=read.table(dir2)
 
 mean(tmp$V1)
 
 Now should I put these in a matrix or a vector (all j values (length
 vs mean(tmp$V1) for each i iteration) 
 }
 }
 
 I think lastly, Id like to get a print out of each of the regressions (each
 iteration of i).  Is that when I use the summary command?  And, like in
 unix, can I redirect the output to a file?
 
 Best
 
 
 Don MacQueen wrote:
  
  I can't go through all the details, but hopefully this will help get 
  you started.
  
  If you look at the help page for the list.files() function, you will see
  this:
  
  list.files(path = ".", pattern = NULL, all.files = FALSE,
             full.names = FALSE, recursive = FALSE,
             ignore.case = FALSE)
  
  The "." in path means to start at your current working directory. 
  Assuming your 5 Coverage directories are subdirectories of your 
  current working directory, that's what you want.
  
  Then, setting recursive to TRUE will cause it to also list the 
  contents of all subdirectories. Since your Length files are in the 
  Coverage subdirectories, that's what you want.
  
  Finally, the pattern argument returns only files that match the 
  pattern, so something like
     pattern="Length"
  should get you 

Re: [R] Spatial join ? optimizing code

2008-09-16 Thread Dan Davison
Hi Monica,

I think the key to speeding this up is, for every point in 'track', to
compute the distance to all points in 'classif' 'simultaneously',
using vectorized calculations. Here's my function. On my laptop it's
about 160 times faster than the original for the case I looked at
(10,000 observations in track and 500 in classif). I get around 18
seconds for the 30,000 and 4,000 example (2 GHz processor running
linux).

Dan

dist.merge2 <- function(x, y, xeast, xnorth, yeast, ynorth) {
    ## construct data frame d in which d[i,] contains information
    ## associated with the closest point in y to x[i,]
    xpos <- as.matrix(x[,c(xeast, xnorth)])
    xposl <- lapply(seq.int(nrow(x)), function(i) xpos[i,])
    ypos <- t(as.matrix(y[,c(yeast, ynorth)]))
    ## drop=FALSE keeps yinfo a data frame even with one attribute column
    yinfo <- y[,! colnames(y) %in% c(yeast,ynorth), drop=FALSE]

    get.match.and.dist <- function(point) {
        ## squared distance from 'point' to every column of ypos at once
        sqdists <- colSums((point - ypos)^2)
        ind <- which.min(sqdists)
        c(ind, sqrt(sqdists[ind]))
    }
    match <- sapply(xposl, get.match.and.dist)
    cbind(xpos, mindist=match[2,], yinfo[match[1,],,drop=FALSE])
}

It's marginally faster to convert xpos to a list followed by sapply as
I do here, than to leave it as a matrix and use apply to get the
matches.
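
To see what the vectorised step inside get.match.and.dist is doing,
here is a tiny worked case. The coordinates are invented for
illustration; ypos has one column per candidate point, as in the
function above:

```r
## three candidate points, stored column-wise (rows: east, north)
ypos <- rbind(east=c(0, 10, 20), north=c(0, 0, 5))
point <- c(9, 1)    # the point we want to match

## 'point' is recycled down each column, so this subtracts it from
## every candidate in one go
sqdists <- colSums((point - ypos)^2)
ind <- which.min(sqdists)
nearest <- c(index=ind, dist=sqrt(sqdists[ind]))
print(nearest)
```

Here the squared distances are 82, 2 and 137, so the second candidate
is the match, at distance sqrt(2).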






On Tue, Sep 16, 2008 at 04:23:33PM +, Monica Pisica wrote:
 
 Hi,
 
 Few days ago I have asked about spatial join on the minimum distance between 
 2 sets of points with coordinates and attributes in 2 different data frames.
 
 Simon Knapp sent code to do it when calculating distance on a sphere using 
 lat, long coordinates and I've change his code to use Euclidian distances 
 since my data had UTM coordinates. 
 
 Typically one data frame has around 30 000 points and the classification data 
 frame has around 4000 points, and the aim is to add to each point from the 
 first data frame all the attributes from the second data frame of the point 
 that is closest to it. 
 
 On my PC (Dell, OptiPlex GX620, X86 ? based PC, 4 GB RAM, 3192 Mhz processor)
 It took quite a long time to do the join:
 
user  system   elapsed 
 8166.07   2.98  8194.43
 
 Sys.info()
  sysname  release 
Windows XP 
  version nodename 
 build 2600, Service Pack 2  
  machine
x86   
 I am running R 2.7.1 patched.
 I wonder if any of you can suggest or help (or have time) in optimizing this 
 code to make it run faster. My programming skills are not high enough to do 
 it.
 
 Thanks,
 
 Monica
 
  code follows:
  x a data frame with over 3 points with coord in UTM, xeast, xnorth
  y a data frame with over 4000 points with UTM coord (yeast, ynorth) and 
 # classification
 ### calculating Euclidian distance
 
 dist <- function(xeast, xnorth, yeast, ynorth) {
     ((xeast-yeast)^2 + (xnorth-ynorth)^2)^0.5
 }
 
 ### doing the merge by location with minimum distance
 
 dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
     tmp <- t(apply(x[,c(xeast, xnorth)], 1, function(x, y){
         dists <- apply(y, 1, function(x, y) dist(x[2],
                        x[1], y[2], y[1]), x)
         cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
         }
     , y[,c(yeast, ynorth)]))
     tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(yeast,
                  ynorth), names(y))])
     row.names(tmp) <- NULL
     tmp
 }
 
  code end
 

-- 
http://www.stats.ox.ac.uk/~davison



Re: [R] Very confused with class

2008-08-22 Thread Dan Davison

Hi Rich,


Richard M. Heiberger wrote:
 
 Dan,
 
 The real problem is the use of csv files.  csv files don't handle missing
 values
 (#VALUE is most likely from Excel), dates, or other complications very
 well.
 
 Read your Excel file directly into
 R with one of the packages designed specifically for that purpose.  I
 recommend
 RExcel (Windows only) which allows complete two-way communication between
 R
 and Excel.
 Missing values and dates are handled correctly.
 You can download the RExcelInstaller package from CRAN.
 

I'm sure RExcel is an excellent technology. However, it is an unnecessarily
complex technology in this instance. What I was trying to do was help the
original poster read in tabular data stored in a standard text format, which
is a fundamental skill for any R programmer. In general, I would encourage
people (beginners especially) to avoid the use of hi-tech solutions, when
simple text-based solutions suffice. But when people do need to have more
sophisticated integration of R and e.g. Excel, it's nice that the tools
exist.

Dan


Richard M. Heiberger wrote:
 
 
 Rich
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Very-confused-with-class-tp19090246p19104343.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] [help] simulation of a simple Marcov Stochastic process for population genetics

2008-08-21 Thread Dan Davison
On Thu, Aug 21, 2008 at 03:00:51AM -0700, z3mao wrote:
 
 Hi, this is my first time using R. I want to simulate the following process:
 in a population of size N, there are i individuals bearing genotype A, the
 number of those bearing A is j in the next generation, which following a
 binominal distribution (choose j from 2*N, the p is i/2*N), to plot the
 probability of the next generations, my script is as follows. It cannot run
 successfully, declaring that the ylim should be limited. 

In a situation like this, try using options(error=recover) to debug.

 I wonder where  the bug is. Thanks very much!

There are several bugs... The most serious is that your homemade
binomial random number generator is wrong. (For example, look at what
happens when it is given a probability parameter of 0: it returns 1
rather than 0. Your alleles aren't going to be lost from the
population very often!). So, if someone has set you the task of
simulating drift without using a built-in binomial RNG, then you'll
need to think through your RNG code again. But if you are free to do
what you want, then you should use the R function rbinom to generate
binomial RVs.

Here are comments on the other bugs with a cleaned up (but still
probabilistically wrong) version below.

 
 generation<-function(i,N)
 {
 m<-1;

## Don't initialise m here; it gets initialised in the for loop

 gen<-numeric(); 

##  gen <- rep(NA, 50) is better

 for(m in 1:50)
   {
    testp<-runif(1,0,1);
    j<-0; sump<-0;
    while(sump < testp)
      {  sump<-sump+dbinom(j,2*N,i/(2*N));
         j<-j+1;
      }
## I've already said that the above is wrong

    i<-j; gen[m]<-j/(2*N);

    m<-m+1; 

## The for loop deals with incrementing m; don't do it yourself!


   }
   plot(m, gen[m]); 

## You want plot(1:50, gen, type="l")

## You don't need semicolons at the end of lines in R!

 }

Here's a version of your code that corrects the other bugs, but
still has your incorrect binomial RNG code in it.

generation <- function(i,N)
{
    warning("binomial RNG code is wrong")
    mvals <- 1:50;
    gen <- numeric();
    for(m in mvals)
    {
        testp <- runif(1,0,1);
        j <- 0; sump <- 0;
        while(sump < testp)
        {
            sump <- sump+dbinom(j,2*N,i/(2*N));
            j <- j+1;
        }
        i <- j; gen[m] <- j/(2*N);## m <- m+1;
    }
    plot(mvals, gen, type="l");
}
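
If you are free to use rbinom, a version along the following lines
avoids the homemade RNG altogether. This is only a sketch under the
assumptions in your description (2N gene copies, i of them of type A,
binomial sampling each generation); the names drift.sim and ngen are
made up for this example:

```r
## Sketch only: drift simulated with R's built-in binomial RNG
drift.sim <- function(i, N, ngen=50) {
    freq <- rep(NA, ngen)
    for (g in seq_len(ngen)) {
        ## number of copies of A in the next generation
        i <- rbinom(1, size=2*N, prob=i/(2*N))
        freq[g] <- i/(2*N)
    }
    freq
}

set.seed(1)
freq <- drift.sim(i=20, N=50)
plot(seq_along(freq), freq, type="l", ylim=c(0, 1),
     xlab="generation", ylab="frequency of A")
```

With i=0 every generation stays at frequency 0, which is the
behaviour the homemade generator gets wrong.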

Dan




 -- 
 View this message in context: 
 http://www.nabble.com/-help--simulation-of-a-simple-Marcov-Stochastic-process-for-population-genetics-tp19085705p19085705.html
 Sent from the R help mailing list archive at Nabble.com.
 

-- 
http://www.stats.ox.ac.uk/~davison



Re: [R] Very confused with class

2008-08-21 Thread Dan Davison
Hi Robin,

You haven't said where you're getting the data from. But if the answer
is that you're using read.table, read.csv or similar to read the data
into R, then I advise you to go back to that stage and get it right
from the outset. It's very, very common to see people who are
relatively new to R splattering their code with calls to as.numeric,
just because they haven't read the data in properly in the first
place. It's also common in those who aren't new to R... So e.g. if you
are using read.table, then use the colClasses argument to specify the
classes of your columns, and use str() on the result until you're
happy with the data frame produced.

It's not entirely clear why you would have ended up with factors if
your data are numeric. That often happens when people mix characters
with numbers. Perhaps you have mixed the header row up with the data?

Anyway, what you are seeing are the integer encodings of the factors. E.g. 

> f <- factor(11:20)
> str(f)
 Factor w/ 10 levels "11","12","13",..: 1 2 3 4 5 6 7 8 9 10
> as.numeric(f)
 [1]  1  2  3  4  5  6  7  8  9 10

But don't mess with them. Just make sure that things which shouldn't
be factors never become factors.
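
If you ever do inherit a factor that really holds numbers, the usual
rescue is to go via the level labels rather than the codes. A small
sketch (the three pressure values are invented for illustration):

```r
f <- factor(c("1028", "1023", "1015"))

codes <- as.numeric(f)                     # integer codes, not the values
values <- as.numeric(as.character(f))      # the values themselves
values2 <- as.numeric(levels(f))[f]        # same result, converts each level once

print(codes)
print(values)
```

The codes come out as 3 2 1 because the levels are sorted, which is
exactly the sort of nonsense summary statistic you were seeing.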

Dan

On Thu, Aug 21, 2008 at 03:40:58PM +0100, Williams, Robin wrote:
 Hi all,
   I am very confused with class.
   I am looking at some weather data which I want to use as explanatory
 variables in an lm. R has treated these variables as factors (i.e. with
 different levels), whereas I want them treated as discretely measured
 continuous variables. So I need to reassign the class of these
 variables, right?
 Indeed, doing 
 class(southwest$pressure)
 (pressure being air pressure), I get 
 # factor.
   Now what class should I use to reassign them so that my model fitting
 process goes as I want it to? I have obviously done something wrong. I
 did 
 southwest$pressure <- as(southwest$pressure,"numeric")
 numeric seeming like a reasonable class to assign to this variable.
 However, doing some summary stats like 
 mean(southwest$pressure)
 # 341,
 max(southwest$pressure)
 # 761,
 which is clearly nonsense, as my maximum value is around 1040. Something
 similar has happened to maxtemp (maximum temperature), which I also
 reassigned from a factor to class numeric, which now apparently has a
 maximum value of 147! 
   Clearly it must be the reassignment of class that has caused these
 problems, as summary stats on the data before I reassigned the classes
 were fine. What is wrong with the class numeric? Reading the numeric
 help page didn't reveal anything to me. Can someone suggest the correct
 class?
 Many thanks for any help.  
 Robin Williams
 Met Office summer intern - Health Forecasting
 [EMAIL PROTECTED] 
  
 

-- 
http://www.stats.ox.ac.uk/~davison



Re: [R] Very confused with class

2008-08-21 Thread Dan Davison
On Thu, Aug 21, 2008 at 04:20:57PM +0100, Williams, Robin wrote:

 Hi Dan, 
   Thanks for the reply, yes, I am using read.csv on the attached file.

OK, so how about using the colClasses argument. Your problem is that
some malfunctioning software has inserted the value "#VALUE!" into
some of your supposedly numeric cells. So deal with that with the
na.strings argument. Like I said, when reading in data, it's worth
spending a minute looking at the documentation for read.table/read.csv
rather than spending an hour messing about with the results of not
doing so.

> Southwest <- read.csv("southwest.csv",
      colClasses=c("character",rep("numeric",10), "character"),
      na.strings="#VALUE!")
> str(Southwest)
'data.frame':   1530 obs. of  12 variables:
 $ date  : chr  "5/1/1997" "5/2/1997" "5/3/1997" "5/4/1997" ...
 $ maxtemp   : num  18.8 21.8 16.6 14.9 14.2 9.3 9.9 12.7 12.8 13.2 ...
 $ mintemp   : num  7.7 9.8 11 12.2 11.3 4.5 2.1 5.7 6.7 7.3 ...
 $ pressure  : num  1028 1023 1015 1001  989 ...
 $ humid : num  59 44 83 80 87 57 64 83 70 69 ...
 $ wind  : num  8.4 11.1 8.2 17.4 13.8 16.2 11.1 14.9 12.7 16.6 ...
 $ rain  : num  0 0 6 1 3.3 2.6 4.3 6 3.2 1.6 ...
 $ index : num  1 2 3 4 5 6 7 8 9 10 ...
 $ admissions: num  5.00 4.72 5.16 3.67 3.62 ...
 $ detrended : num  4.79 4.47 5.30 3.91 3.51 ...
 $ detrended2: num  4.79 4.47 5.30 3.91 3.51 ...
 $ d.o.w.: chr  "Thu" "Fri" "Sat" "Sun" ...

NB you could coerce those dates to a date class rather than character
but I'll leave that up to you.

str() is your friend.
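
Here is the same idea on a few made-up rows, so you can run it
without the real file. The three-column layout and the values are
invented for illustration, and the date format is my guess from the
look of your date column (month/day/year):

```r
csv.text <- "date,pressure,maxtemp
5/1/1997,1028,18.8
5/2/1997,#VALUE!,21.8
5/3/1997,1015,16.6"

## na.strings turns the spreadsheet error marker into NA, so the
## column can be read as numeric rather than ending up a factor
d <- read.csv(text=csv.text,
              colClasses=c("character", "numeric", "numeric"),
              na.strings="#VALUE!")
str(d)

## optional: coerce the dates (format assumed to be month/day/year)
d$date <- as.Date(d$date, format="%m/%d/%Y")
str(d)
```

After that, mean(d$pressure, na.rm=TRUE) behaves as you would expect.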

Dan

 However, as when I do 
 Southwest <- data.frame(read.csv("southwest.csv"))

read.csv returns a data frame; no need to wrap it in data.frame()

 Names(southwest)
   the output is the column headings (i.e. the variables), and looking at
 the data I only get the numbers, I assume the column headings haven't
 become confused with the data. 
 I.e. if I just do 
 Southwest$pressure
 The output is correct, i.e. the values contained in the pressure column.
 
   Apologies for my repeated question, but I'm somewhat confused on this
 one and my lack of experience with R isn't helping matters. I don't even
 understand why R is interpreting these figures as factors in the first
 place, doesn't this imply that any similar data would be interpreted as
 factors?   
 Thanks for any further help.
 Robin Williams 
 Met Office summer intern - Health Forecasting 
 [EMAIL PROTECTED] 

Re: [R] Quickly calculating the mean results over a collection of data sets?

2008-08-12 Thread Dan Davison
On Tue, Aug 12, 2008 at 04:47:14AM -0400, Michael R. Head wrote:
 I have a collection of datasets in separate data frames which have 3
 independent test parameters (w, x, y) and one dependent variable (z) ,
 together with some additional static test data on each row. What I want
 is a data frame which contains the test data, the parameters (w, x, y)
 and the mean value of all (z)s in the Z column.
 
 Each datasets has  around 6000 rows and around 7 columns, which doesn't
 seem outrageously large, so it seems like this shouldn't too time
 consuming, but the way I've been approaching it seems to take way too
 long (20 seconds for datasets over 4 runs, longer for my datasets over
 10 runs). 
 
 My imperative-coding brain lead me to use for loops, which seems to be
 particularly problematic for R performance. My first attempt at this
 looked like the following, which takes roughly 60 seconds to complete. I
 rewrote it a little, but the code was much longer and effectively
 replaces one of the for loops with an lapply(). I could paste the other
 code, but it's much longer and less clear about its intent.
 

Hi Michael,

 ###
 # Start code snippet
 ###
 ### inputFiles just a list of paths to the test runs
 testRuns <- lapply(inputFiles, 
   function(x) {
   read.table(x, header=TRUE)})

(Just BTW lapply(inputFiles, read.table, header=TRUE) is slightly nicer to look 
at)

 
 ### W, X, Y have (small) natural values
 w <- unique(testRuns[[1]]$W)
 x <- unique(testRuns[[1]]$X)
 y <- unique(testRuns[[1]]$Y)
 
 ### All runs have the same values for all columns
 ### with the exception of the Z values, so just
 ### copy the first test run data
 testMeans <- data.frame(testRuns[[1]])

How about rbind()ing all the data frames together, and working with
the combined data frame? Say that testRuns is

> testRuns
[[1]]
  W X Y  Z
1 1 5 5 -0.5251156
2 5 1 3  1.1761139
3 2 4 4 -0.8934380
4 5 1 1  1.4076303
5 5 3 1  0.4679745

[[2]]
  W X Y  Z
1 1 5 5 -0.8556862
2 5 1 3  0.3517671
3 2 4 4 -1.0202064
4 5 1 1  1.2152349
5 5 3 1  0.4340249

> allRuns <- do.call(rbind, testRuns)
> aggregate(allRuns$Z, by=allRuns[c("W","X","Y")], mean)
  W X Y  x
1 5 1 1  1.3114326
2 5 3 1  0.4509997
3 5 1 3  0.7639405
4 2 4 4 -0.9568222
5 1 5 5 -0.6904009
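
The same thing can be written with aggregate's formula interface,
which keeps the column name Z in the result. A self-contained sketch
using the two toy runs printed above (run1 and run2 are just names
for this example):

```r
run1 <- data.frame(W=c(1, 5, 2, 5, 5), X=c(5, 1, 4, 1, 3), Y=c(5, 3, 4, 1, 1),
                   Z=c(-0.5251156, 1.1761139, -0.8934380, 1.4076303, 0.4679745))
run2 <- transform(run1,
                  Z=c(-0.8556862, 0.3517671, -1.0202064, 1.2152349, 0.4340249))
testRuns <- list(run1, run2)

## stack the runs, then average Z within each (W, X, Y) combination
allRuns <- do.call(rbind, testRuns)
meanZ <- aggregate(Z ~ W + X + Y, data=allRuns, FUN=mean)
print(meanZ)
```

From there, subset(meanZ, W == 5) and similar give the pieces you
want to plot.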

Dan

 for(w0 in w) {
   for(y0 in y) {
     for (x0 in x) {
       row <- which(testMeans$W == w0 &
                    testMeans$Y == y0 &
                    testMeans$X == x0)
       meanValues <- sapply(testRuns,
                            function(r)
                            {mean( subset(r,
                                          r$W == w0 &
                                          r$Y == y0 &
                                          r$X == x0)$Z )})
       testMeans[row,]$Z = mean(meanValues)
     }
   }
 }
 ### I will then want to plot certain values over (X, Z),
 ### so ultimately, I'm going to subset the data further.
 ### Code which gives me a list of W tables with mean Z values
 ### works, too.
 ###
 # End code snippet
 ###
 
 
 Thanks,
 mike
 
 -- 
 Michael R. Head [EMAIL PROTECTED]
 http://www.cs.binghamton.edu/~mike/
 

-- 
www.stats.ox.ac.uk/~davison



Re: [R] Frequency vector

2008-08-12 Thread Dan Davison
On Tue, Aug 12, 2008 at 01:21:29AM -0700, dennis11 wrote:
 
 I want to create a vecor with frequencies. 
 
 I have tried this:
 
 a <- c(1,1,1,1,2,3,4,5,5)
 b <- table(a)
 print (b[1])
 
 which results in:
  print (b[1])
 1 
 4 
 
 The only thing I want is the 4.
 
 So this seems obvious:
 print (b[1,2])

No! The "1" is just a label. You're not looking at a matrix. (BTW, I think you 
meant b[2,1]).

First I would say don't get rid of the "1" label unless you need
to. It's just a label telling you what the count is referring to, and
it wouldn't be there if there weren't a good reason for it. It won't
interfere with any numeric calculations you do, e.g.

> b[1] * 2
1
8

But if you really want to extract the integer counts from an object of
class table you could do

> as.vector(b)
[1] 4 1 1 1 2

Remember that if an object is not behaving as you would expect, use
str() and class() to see what you've really got:

> class(b)
[1] "table"
> str(b)
 'table' int [, 1:5] 4 1 1 1 2
 - attr(*, "dimnames")=List of 1
  ..$ a: chr [1:5] "1" "2" "3" "4" ...
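
Putting that together, a short sketch of the ways to get at the
counts (the variable names first.count, counts and named.counts are
just for this example):

```r
a <- c(1, 1, 1, 1, 2, 3, 4, 5, 5)
b <- table(a)

## the count for the value 1, looked up by its label, without the label
first.count <- as.vector(b["1"])

## all counts as a plain integer vector
counts <- as.vector(b)

## a plain vector that keeps the labels as names
named.counts <- setNames(as.vector(b), names(b))

print(first.count)
print(named.counts)
```

Looking the count up by label ("1") rather than by position is safer
if some values might be missing from the data.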

Dan


 
 but it does not work:
 Error in b[1, 2] : incorrect number of dimensions
 
 How do I get a vector or how do I refer to the 4 without getting the 1
 label as well?





 -- 
 View this message in context: 
 http://www.nabble.com/Frequency-vector-tp18939882p18939882.html
 Sent from the R help mailing list archive at Nabble.com.
 

-- 
www.stats.ox.ac.uk/~davison



Re: [R] Between the values

2008-08-12 Thread Dan Davison
On Tue, Aug 12, 2008 at 05:16:01PM +0530, Shubha Vishwanath Karanth wrote:
 Hi R,
 
  
 
 This is a very trivial one
 
  
 
 C=0.1
 
  
 
 I want to check whether my value of C is between 0 and 1 exclusively
 I don't want to use (C>0 & C<1). And I can't use a single statement like
 (0<C<1). Is there a between function? Or how do we specify from 0 to 1?
 Does %in% help me?

If you don't like (C > 0 && C < 1), then just write your own function
is.between(x, low, high) (NB1 you've basically written it already; NB2
single '&' for the vectorised version 'are.between'). People's
personal tastes about what's desirable will vary, and anyway it's good
practice to build up your own personal library of
functions. Ultimately if you have a high quality collection of related
functions for working on a particular sort of problem, then you should
publish them as an R package on CRAN.
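
Something like this minimal sketch is all I mean (bounds are
exclusive, as in your question):

```r
## scalar version: '&&' expects single TRUE/FALSE values
is.between <- function(x, low, high) x > low && x < high

## vectorised version: '&' works element by element
are.between <- function(x, low, high) x > low & x < high

C <- 0.1
print(is.between(C, 0, 1))
print(are.between(c(-1, 0.5, 1, 2), 0, 1))
```

Note that 1 itself is not between 0 and 1 here, because the
comparison is strict.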

Dan

 
  
 
  
 
 Many Thanks,
 
 Shubha
 
  
 

-- 
www.stats.ox.ac.uk/~davison



Re: [R] Between the values

2008-08-12 Thread Dan Davison



Shubha Vishwanath Karanth wrote:
 
 Or at least anyways of defining a vector/(or something like that) which
 has all values between 0 and 1? 
 
 For example:
 C(0,1) is incorrect, seq(0,1,0.2) is also incorrect, seq(0,1,0.1) is
 also incorrect How does one specify this?
 
 

Hi Shubha,

What are you trying to do? The set of all real numbers between 0 and 1 is
infinitely large. Obviously you can't explicitly construct an infinitely
large vector in R. If you want to construct an implicit specification of
that set, then I think I've already given you a good answer in R: define a
predicate function and use it. E.g.

between <- function(x, low, high) x > low & x < high

I don't know much at all about symbolic mathematics packages like Maple and
Mathematica, but maybe you're thinking of something you can do in those
softwares? R is not trying to be a competitor to them; they do lots of
things R doesn't, and vice versa.

Dan


Shubha Vishwanath Karanth wrote:
 
 
 Thanks, Shubha
  
 
 

-- 
View this message in context: 
http://www.nabble.com/Between-the-values-tp18943069p18944668.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] dynamically extract data from a list

2008-08-12 Thread Dan Davison


Dries Knapen-2 wrote:
 
 Hi,
 
 Thanks for your reply. However, this didn't work exactly as I needed  
 it to since the expression is dynamically built as a character vector
 
 i.e. not executed as
 e <- expression(Sepal.Width > 4)
 
 but as
 e <- expression("Sepal.Width > 4")
 
 in which case subset() throws an error ('subset' must evaluate to logical).
 
 Fortunately, a good night of sleep resulted in this workaround:
 
 s <- "iris[iris$Sepal.Width > 4,]"
 execute.string <- function(string) {
    write(string, 'tmp.txt')
    out <- source('tmp.txt')
    unlink('tmp.txt')
    return(out$value)
 }
 execute.string(s)
 
 

Is this what you want?

> eval(parse(text=s))
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16          5.7         4.4          1.5         0.4  setosa
33          5.2         4.1          1.5         0.1  setosa
34          5.5         4.2          1.4         0.2  setosa
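
For reference, a small self-contained sketch of the same idea: a condition that only exists as a character string is parsed and then evaluated. The variable names s, cond, res and res2 are illustrative only, and evaluating the condition inside the data frame is just one of several ways to do it.

```r
## Whole expression held as a string
s <- "iris[iris$Sepal.Width > 4, ]"
res <- eval(parse(text = s))

## Only the condition held as a string; evaluate it inside the data frame
cond <- "Sepal.Width > 4"
keep <- eval(parse(text = cond), envir = iris)
res2 <- iris[keep, ]

rownames(res)
```

Because parse() will run whatever text it is given, this is probably best kept to strings that the program itself has built.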

Dan


Dries Knapen-2 wrote:
 
 
 
 On 12 Aug 2008, at 04:08, Gabor Grothendieck wrote:
 
 Try this:

 > e <- expression(Sepal.Width > 4)
 > subset(iris, eval(e), select = Sepal.Length)
    Sepal.Length
 16          5.7
 33          5.2
 34          5.5
 > subset(iris, eval(e))
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 16          5.7         4.4          1.5         0.4  setosa
 33          5.2         4.1          1.5         0.1  setosa
 34          5.5         4.2          1.4         0.2  setosa


 On Mon, Aug 11, 2008 at 9:36 PM, Dries Knapen  
 [EMAIL PROTECTED] wrote:
 Hi,

 Based on user input, I wrote a function that creates a list which  
 looks
 like:

 > str(list)
 List of 4
  $ varieties: chr [1:12] "temp.26_time.5dagen_biorep.1"
 "time.5dagen_temp.26_biorep.2" "temp.18_time.5dagen_biorep.1"
 "temp.18_time.5dagen_biorep.2" ...
  $ temp     : Factor w/ 2 levels "18","26": 2 2 1 1 2 2 1 1 1 1 ...
  $ time     : Factor w/ 3 levels "14dagen","28dagen",..: 3 3 3 3 1 1 1 1 2 2 ...
  $ biorep   : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...

 Now, based on user input as well, I want to dynamically extract  
 data from
 list$varieties. Therefore, I wrote a function which generates a  
 string
 containing the data extraction conditions which looks like this:

 > query <- make.contrast.substring(negative.contrast, list)
 Read 1 item
 [1]
 "(list$temp=='18')&(list$time=='14dagen'|list$time=='28dagen'|list$time=='5dagen')&(list$biorep=='1'|list$biorep=='2')"

 Now what I want to achieve is to extract data by doing:

 list$varieties[query]

 which doesn't work since query is a string and object names are not
 expanded...

 Obviously, manually copying the string like so

 list$varieties[(list$temp=='18')&(list$time=='14dagen'|list$time=='28dagen'|list$time=='5dagen')&(list$biorep=='1'|list$biorep=='2')]

 works perfectly - but I need it to be automated.

 I'm quite new to R and used to programming in PHP, so I may just be
 conceptually confused about how to do this. Any help would be  
 greatly
 appreciated.

 thanks in advance,
 Dries Knapen



 
 Dr. Dries Knapen

 University of Antwerp
 Department of Biology
 Ecophysiology, Biochemistry and Toxicology
 Groenenborgerlaan 171 - U711, B-2020 Antwerp
 Belgium

 tel ++32 3 265 33 49
 fax ++32 3 265 34 97

 
 

-- 
View this message in context: 
http://www.nabble.com/dynamically-extract-data-from-a-list-tp18936737p18945945.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Scripting - query

2008-08-10 Thread Dan Davison
On Sun, Aug 10, 2008 at 02:44:00PM +1200, Gareth Campbell wrote:
 I have a vector:
 alleles.present <- c("D3", "D16", ... )
 
 The alleles present changes given the case I'm dealing with - i.e. either
 all of the alleles I use for my calculations are present, or some of them.
 
 Depending on what alleles are present, I need to make matrices and do
 calculations on those alleles present and completely disregard any formula
 or other use of the alleles not present.
 
 I'm trying to figure out the best way to do this.
 
 Basically I'm trying to do if() commands (with no success so far) to allow
 me to query the alleles.present for the presence of each allele I use and
 then let dictate which formula to use etc...
 
 Does anyone have a good way to do this?  I've been fiddling with grep()
 etc... but I can't get it to do what I need!!  Very frustrating.

It's going to be hard for people to make good suggestions here without
a concrete example. Can you provide a toy example that is as simple as
possible, while illustrating (some of) the problems you are trying to
solve?

Dan

p.s. Are you familiar with %in% ? E.g.

if("D3" %in% alleles.present) {
    do.something()
} else {
    do.something.else()
}

See help("%in%")
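
A small sketch of how %in% might be used for this, under made-up data: the allele name "D18" and the vector all.alleles are hypothetical and only stand in for whatever full set of alleles the calculations normally use.

```r
alleles.present <- c("D3", "D16")
all.alleles <- c("D3", "D16", "D18")   # hypothetical full set

## Is one particular allele present?
has.D3 <- "D3" %in% alleles.present

## Which of the full set are present? (one TRUE/FALSE per allele)
present.flags <- all.alleles %in% alleles.present

## Restrict later calculations to the alleles that are actually there
usable <- all.alleles[present.flags]
```

Looping over usable (or indexing matrices by it) then skips the absent alleles without a separate if() for each one.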


 
 Thanks very much
 
 -- 
 Gareth Campbell
 PhD Candidate
 The University of Auckland
 
 P +649 815 3670
 M +6421 256 3511
 E [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 


Re: [R] Converting nested for loops to an apply function(s)

2008-08-10 Thread Dan Davison
On Sat, Aug 09, 2008 at 08:53:00PM -0400, Kurt Newman wrote:
 
 Resending.  Previous message was truncated.  Sorry for possible confusion.
 
 
  From: [EMAIL PROTECTED]
  To: r-help@r-project.org
  Date: Sat, 9 Aug 2008 18:25:47 -0400
  Subject: [R] Converting nested for loops to an apply function(s)
  
  
  Hello,
  

  I would like to know more about how to use the apply family and
  have attempted to convert nested for loops in example code from
  Contributed Documentation ("The Friendly Beginners' R Course" by
  Toby Marthews (ZIP, 2007-03-01)) to an apply function(s).  The
  relevant code is:

  
  distances=c(51,65,175,196,197,125,10,56)  #distances of 8 houses from the town centre in m
  bearings=c(10,8,210,25,74,128,235,335)    #bearings of the houses in degrees
  
  xpos=distances*sin(bearings*pi/180)       #in sin and cos the argument MUST be in radians
  ypos=distances*cos(bearings*pi/180)
  
  numpoints=length(distances)
  nnd=rep(sqrt(2*400*400),times=numpoints)  #start with the maximum possible distance
  for (i in 1:numpoints) {
    for (j in 1:numpoints) {
      if (i!=j) {
        diffx=abs(xpos[i]-xpos[j])
        diffy=abs(ypos[i]-ypos[j])
        nd=sqrt((diffx^2)+(diffy^2))
        if (nd < nnd[i]) {nnd[i]=nd}
      }
    }
  }
  print(data.frame(xpos,ypos,nnd))
 

 My attempts to convert the nested for loops to an apply
  function(s) have not been successful.  I would like to know how to
  convert the code to increase my knowledge of R programming and to
  evaluate operational efficiency of the different strategies.

Hi Kurt,

It's not just the apply() family that helps in vectorising problems. In
this case, outer() is also going to be helpful, as well as remembering
that all the standard arithmetical operators automatically
vectorise. I would use something like this:

nearest.neighbour.distance <- function(xpos, ypos) {
    xdist <- abs(outer(xpos, xpos, "-"))
    ydist <- abs(outer(ypos, ypos, "-"))
    dist <- sqrt(xdist^2 + ydist^2)
    diag(dist) <- NA
    apply(dist, 1, min, na.rm=TRUE)
}
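
As a rough check of the outer() approach, the sketch below applies it to the house data from the question and compares it with a second route through dist(). The function name nnd.outer and the use of dist() are additions for illustration, not part of the original reply.

```r
distances <- c(51, 65, 175, 196, 197, 125, 10, 56)
bearings  <- c(10, 8, 210, 25, 74, 128, 235, 335)
xpos <- distances * sin(bearings * pi / 180)
ypos <- distances * cos(bearings * pi / 180)

## All pairwise distances at once, then the row minimum off the diagonal
nnd.outer <- function(xpos, ypos) {
    d <- sqrt(outer(xpos, xpos, "-")^2 + outer(ypos, ypos, "-")^2)
    diag(d) <- NA
    apply(d, 1, min, na.rm = TRUE)
}

## Same quantity via dist(), which computes Euclidean distances between rows
d2 <- as.matrix(dist(cbind(xpos, ypos)))
diag(d2) <- Inf
nnd.dist <- unname(apply(d2, 1, min))

nnd <- nnd.outer(xpos, ypos)
print(data.frame(xpos, ypos, nnd))
```

Both routes build the full n-by-n distance matrix, so they trade memory for the absence of explicit loops.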

Dan


 
 Thank you in advance for your comments / suggestions.
 
 Kurt Newman
 
 
 


Re: [R] help using outer function

2008-08-10 Thread Dan Davison
On Sun, Aug 10, 2008 at 09:02:59AM -0700, warthog29 wrote:
 
 Hi,
 I would like to use the R's outer function on y below so that I can subtract
 elements from each other. The resulting dataframe is symmetric, save for the
   ^^
outer() returns a matrix, not a data frame.

 negative signs on the other half of the numbers. I would like to get only
 half of the dataframe. Here is the code I wrote (it is returning only the
 first line of the all elements I want. Please help). 
 y<-c(4,4,3.9,3.8,3.7,3.6,3.5,3.5,3.5,3.3,3.2,3.2)
 
 b<-outer(y,y,"-")

 b<-as.matrix(by)

I assume that line was supposed to be b<-as.matrix(by). In any case
you don't need it; b is a matrix already.

 # I want to keep the elements:
 #b[1,2:12],
 #b[2,3:12],
 #.until
 #b[11,12:12].

Use upper.tri() to get the upper-triangle:

> b[upper.tri(b, diag=FALSE)]
 [1] 0.0 0.1 0.1 0.2 0.2 0.1 0.3 0.3 0.2 0.1 0.4 0.4 0.3 0.2 0.1 0.5 0.5 0.4 0.3
[20] 0.2 0.1 0.5 0.5 0.4 0.3 0.2 0.1 0.0 0.5 0.5 0.4 0.3 0.2 0.1 0.0 0.0 0.7 0.7
[39] 0.6 0.5 0.4 0.3 0.2 0.2 0.2 0.8 0.8 0.7 0.6 0.5 0.4 0.3 0.3 0.3 0.1 0.8 0.8
[58] 0.7 0.6 0.5 0.4 0.3 0.3 0.3 0.1 0.0

Or perhaps you want to knock out the negative entries, but still keep the 
matrix structure:

> b[lower.tri(b)] <- NA

or perhaps you wanted 

b <- abs(outer(y,y,"-"))

in the first place?

 #Here is the function I wrote to get half of matrix:
 
 wk<-function(p){
 for (i in 2:p){
 ri<-b[i-1,i:p]
 return(ri)
 }
 }
 wk(12)
 #[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.5 0.5 0.7 0.8 0.8

I think you were intending this function to be something like this

wk<-function(p){
    ri <- NULL
    for (i in 2:p){
        ri<-c(ri, b[i-1,i:p])
    }
    return(ri)
}

Note that this function will give a different result from upper.tri(),
because you are concatenating elements in the *rows* of the matrix,
whereas the way matrices are represented in R has consecutive elements
running down the columns. I.e. look at

> A <- matrix(nrow=2,ncol=2)
> A
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> A[] <- 1:4
> A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
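
If the row-by-row order is the one that is wanted, one possible loop-free way to get it is to transpose first, since the lower triangle of t(b) read down the columns is the upper triangle of b read along the rows. This is a sketch of that idea, with by.column and by.row as made-up names:

```r
y <- c(4, 4, 3.9, 3.8, 3.7, 3.6, 3.5, 3.5, 3.5, 3.3, 3.2, 3.2)
b <- outer(y, y, "-")

## Column-major order, as returned by upper.tri()
by.column <- b[upper.tri(b)]

## Row-major order: b[1, 2:12], then b[2, 3:12], ..., then b[11, 12]
by.row <- t(b)[lower.tri(b)]

## The explicit loop, for comparison
wk <- function(p) {
    ri <- NULL
    for (i in 2:p) {
        ri <- c(ri, b[i - 1, i:p])
    }
    ri
}
```

The two vectors hold the same 66 values and differ only in their ordering.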

Dan

 
 As you can see, it is only returning the first line. I would like other
 corresponding elements too, to be found in row 2 to 12. Thanks. 
 -- 
 View this message in context: 
 http://www.nabble.com/help-using-outer-function-tp18914432p18914432.html
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] help using outer function

2008-08-10 Thread Dan Davison
On Sun, Aug 10, 2008 at 06:00:21PM +0100, Dan Davison wrote:
 On Sun, Aug 10, 2008 at 09:02:59AM -0700, warthog29 wrote:
  
  Hi,
  I would like to use the R's outer function on y below so that I can subtract
  elements from each other. The resulting dataframe is symmetric, save for the
^^
 outer() returns a matrix, not a data frame.
 
  negative signs on the other half of the numbers. I would like to get only
  half of the dataframe. Here is the code I wrote (it is returning only the
  first line of the all elements I want. Please help). 
  y<-c(4,4,3.9,3.8,3.7,3.6,3.5,3.5,3.5,3.3,3.2,3.2)
  
  b<-outer(y,y,"-")
 
  b<-as.matrix(by)
 
 I assume that line was supposed to be b<-as.matrix(by). In any case
 
Hmm, I didn't really clarify things there. I meant
b<-as.matrix(b). But anyway, not needed.

 you don't need it; b is a matrix already.
 


Re: [R] import/export txt file

2008-08-09 Thread Dan Davison
On Fri, Aug 08, 2008 at 04:44:13PM -0700, Alessandro wrote:
 Hi All,
 
  
 
 I have 2 questions:
 
 1.   Import: when I import my txt file (X,Y and Z) in R with testground
 <- read.table(file="c:/work_LIDAR_USA/R_kriging/ground26841492694149.txt",
 header=T), I lost the 4 number after the point (.).  does It possible add
 in the code the possibility to read the 4 numbers after the .
 

I think the problem is simply that you have options()$digits set to 7
(the default). Read the 'digits' section in help(options) and try

options(digits=11)

  
 
 2.   Does It possible to write a X, Y,  Z *txt file without the ID in R
 and sep, for the rows?
 

write.csv(your.data.frame, row.names=FALSE, quote=FALSE)

x <- read.table("path/to/your/file.txt", header=T)
x
#          X       Y       Z
# 1 26800.47 4149984 1543.39
# 2 26800.47 4149984 1543.39
options(digits=11)
x
#          X          Y       Z
# 1 26800.47 4149983.94 1543.39
# 2 26800.47 4149983.94 1543.39
write.csv(x, row.names=FALSE, quote=FALSE)
# X,Y,Z
# 26800.47,4149983.94,1543.39
# 26800.47,4149983.94,1543.39
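
The point that 'digits' only affects how numbers are printed, not what is stored, can be checked with a small sketch like the one below. The temporary file and the use of sprintf() to force four decimal places on output are assumptions made for illustration.

```r
## Write a tiny file in the same layout as the example data
tmp <- tempfile(fileext = ".txt")
writeLines(c("X Y Z", "26800.4700 4149983.9400 1543.3900"), tmp)

x <- read.table(tmp, header = TRUE)

## The stored value keeps its decimals even when print() rounds the display
stored.ok <- abs(x$Y - 4149983.94) < 1e-6
print(x, digits = 11)

## To force exactly 4 decimal places in the output file, format as text first
out <- data.frame(lapply(x, function(col) sprintf("%.4f", col)),
                  stringsAsFactors = FALSE)
write.csv(out, row.names = FALSE, quote = FALSE)
unlink(tmp)
```

Passing digits to print() changes the display for that one call only, whereas options(digits=11) changes it for the rest of the session.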

Dan

  
 
 Example:
 
 Original data:
 
 X Y Z
 
 26800.4700 4149983.9400 1543.3900
 
 ... . ..
 
  
 
 I wish to create a txt file (with , sep):
 
 X, Y, Z
 
 26800.4700, 4149983.9400, 1543.3900
 
 ..., ., ..
 
  
 
 Thanks (It's Friday night, sorry I am tired)
 
  
 
 Ale
 
 


Re: [R] read.table question

2008-08-09 Thread Dan Davison
On Fri, Aug 08, 2008 at 07:27:13PM -0700, Alessandro wrote:
 Hi All.
 
  
 
 I have a file txt with 3 columns (X, Y and Z).  every rows has 4 decimal
 place (i.e. x.). I use read.table to import the data in R, but with
 summary(), I don't see the decimal place after the dot. Is there any way for
 me to preserve the information?

I hope I've answered this in the first thread on the subject.

https://stat.ethz.ch/pipermail/r-help/2008-August/170422.html

Dan

p.s. People don't like it if you submit the same question twice.


 
  
 
 testground <- read.table(file="c:/work_LIDAR_USA/R_kriging/ground26841492694149.txt", header=T)
 
  
 
 thanks
 
  
 
 Ale
 
 


Re: [R] effective matrix subset

2008-08-09 Thread Dan Davison
On Sat, Aug 09, 2008 at 06:29:59AM -0500, Marc Schwartz wrote:
 on 08/09/2008 06:01 AM [EMAIL PROTECTED] wrote:
 Hi;
 If we have a matrix A, and a vector X, where length(X)=nrow(A), and X
 contains a wanted column for each row in A, in row ascending order. How
 would be the most effective way to extract the desired vector V (with
 length(V)=nrow(A))?


 A <- matrix(1:20, 4, 5)

 > A
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    5    9   13   17
 [2,]    2    6   10   14   18
 [3,]    3    7   11   15   19
 [4,]    4    8   12   16   20


 # Create an arbitrary set of indices, one for each row in A
 X <- c(2, 5, 1, 4)

 > X
 [1] 2 5 1 4


 Presumably you want:

 V <- c(A[1, 2], A[2, 5], A[3, 1], A[4, 4])

 > V
 [1]  5 18  3 16


 If so, then:

 > sapply(seq(nrow(A)), function(i) A[i, X[i]])
 [1]  5 18  3 16

Or

> A[cbind(seq(nrow(A)), X)]
[1]  5 18  3 16
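
The second form works because indexing a matrix by a two-column matrix treats each row of the index as one (row, column) pair. A short sketch, with idx and V.linear as illustrative names:

```r
A <- matrix(1:20, 4, 5)
X <- c(2, 5, 1, 4)

## Each row of idx is one (row, column) pair: (1,2), (2,5), (3,1), (4,4)
idx <- cbind(seq_len(nrow(A)), X)
V <- A[idx]

## Equivalent linear indexing, using the column-major storage of matrices
V.linear <- A[(X - 1) * nrow(A) + seq_len(nrow(A))]
```

Both forms avoid calling a function once per row, which is what sapply() does.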

Dan



 Is that what you were looking for?


 BTW, see ?diag for a special case:

 > diag(A)
 [1]  1  6 11 16


 HTH,

 Marc Schwartz



Re: [R] Index alternative to nasty FOR loop?

2008-08-07 Thread Dan Davison
On Wed, Aug 06, 2008 at 05:42:21PM +, zack holden wrote:
 
 Dear R wizards,
  
 I have a folder containing 1000 files. For each file, I need to extract the 
 first row of each file, paste it to a new file, then write out that file. 
 Then I need to repeat this operation for each additional row (row 2, then row 
 3, etc) for 23 rows in each file.
  
 I can do this with a for loop (as below). 

Hi Zack,

There's a few problems with your sketched-out for loop (see below),
but if I've understood your problem, then here are a couple of
solutions that use for loops in the way you were intending. They both
take line i from file 1, line i from file 2, ..., and write them to a
file called lines_i, for i in 1:23. The first one is for the case when
you have tabular data, so it uses read.table, and write.table. You
might want to mess about with the arguments to read.table and
write.table, specifying whether you have a header, and whether you
want the row.names printed out, etc. The second one is similar but
just works line by line, regardless of what the line looks like
(i.e. doesn't assume you have tabular data in the files).

collate.lines.1 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        file.as.data.frame <- read.table(file)
        for(row in 1:nrows) {
            outfile <- paste("lines_", row, ".csv", sep="")
            write.table(file.as.data.frame[row,], file=outfile, append=TRUE,
                        row.names=FALSE, col.names=FALSE, sep=",")
        }
    }
}

collate.lines.2 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        file.as.character.vector <- scan(file, what="", sep="\n")
        for(row in 1:nrows) {
            outfile <- paste("lines", row, sep="_")
            cat(file.as.character.vector[row], "\n", file=outfile, append=TRUE)
        }
    }
}
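
To try the line-by-line idea without touching real data, something like the following sketch can be run in a temporary folder. The file names, the readLines() variant and the separate output folder are assumptions made for the demo, not part of the functions above.

```r
## Three small input files with two lines each, in a fresh temporary folder
folder <- tempfile("collate_in_")
outdir <- tempfile("collate_out_")
dir.create(folder)
dir.create(outdir)
for (k in 1:3) {
    writeLines(paste0("file", k, "_line", 1:2),
               file.path(folder, paste0("in_", k, ".txt")))
}

## Append line i of every input file to the output file lines_i
collate.lines.sketch <- function(folder, outdir, nrows) {
    files <- list.files(folder, full.names = TRUE)
    for (file in files) {
        lines <- readLines(file, n = nrows)
        for (row in seq_along(lines)) {
            outfile <- file.path(outdir, paste("lines", row, sep = "_"))
            cat(lines[row], "\n", sep = "", file = outfile, append = TRUE)
        }
    }
}

collate.lines.sketch(folder, outdir, nrows = 2)
result <- readLines(file.path(outdir, "lines_1"))
```

Because the output files are opened in append mode, running the function twice on the same output folder would duplicate every line; writing to a fresh folder each time avoids that.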

  
 Is there a way to use some of the indexing power of R to get around this 
 nasty loop?

If you really mean that you want a solution without explicit for loops
in R, then that is possible. But I would recommend that you stick to
a straightforward solution until you're completely comfortable with
programming in that style. It's conceivable that the no-for-loop
versions might be faster if you have lots of files / rows, but don't
worry about speed until it's a problem. Here's my effort at doing it
without for loops; it's a bit of a stretch and wasn't as easy to write
down as the first two. I've probably missed a cleaner solution.

collate.lines.1.fancy <- function(folder, nrows=23) {
    outfiles <- paste("lines_", 1:nrows, ".csv", sep="")
    files <- list.files(folder, full.names=TRUE)
    files.as.data.frames <- lapply(files, read.table)
    ## split all rows apart
    x <- lapply(files.as.data.frames, function(df) split(df, f=factor(1:nrow(df))))
    ## collate rows from different data frames
    x <- do.call(mapply, c(x, list(FUN=function(...) rbind(...), SIMPLIFY=FALSE)))
    write.function <- function(dataframe, outfile)
        write.table(dataframe, file=outfile, row.names=FALSE, col.names=FALSE, sep=",")
    invisible(mapply(write.function, x, outfiles))
}

  
 Thank you in advance for any suggestions
  
 ###
 newoutfile <- data.frame()
 list <- list.files("c:/data") ## 'list' not such a good name as it's a built-in function
  
 file = 1 ## you don't need this
 for(file in list) {
    row <- file[1, ] ## that's not going to work; 'list' is a character vector, you haven't got the files as data.frames yet
    newoutfile <- rbind(row, newoutfile)
    file = file + 1
    write.csv(outfile, file = "output.csv")
 }
 
  
  


Re: [R] Union of columns of two matrices

2008-08-07 Thread Dan Davison
On Wed, Aug 06, 2008 at 06:32:43PM -0400, Giuseppe Paleologo wrote:
 I was posed the following problem/teaser:
 
 given two matrices, come up with an elegant (=fast & short) function that
 returns a matrix with all and only the non-duplicated columns of both
 matrices; the column order does not matter. In essence, a matrix equivalent
 of union(x,y), where x and y are vectors. I could not come with anything
 nice. Any ideas?

union.matrices <- function(a, b) {
    u <- cbind(a,b)
    u[,!duplicated(u, MARGIN=2)]
}

?

(Obviously not attempting to deal with issues of identity of columns containing 
real numbers)
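
A quick usage sketch on two small made-up matrices. The only change from the function above is drop=FALSE, added here as an assumption so that the result stays a matrix even when a single column survives.

```r
union.matrices <- function(a, b) {
    u <- cbind(a, b)
    ## duplicated() with MARGIN=2 flags columns that repeat an earlier column
    u[, !duplicated(u, MARGIN = 2), drop = FALSE]
}

a <- cbind(c(1, 2), c(3, 4))
b <- cbind(c(3, 4), c(5, 6), c(1, 2))
res <- union.matrices(a, b)
res
```

As with union() on vectors, the first occurrence of each column is the one that is kept.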

Dan

 
 Giuseppe
 
 -- 
 Giuseppe A. Paleologo :: Email: [EMAIL PROTECTED] :: AOL: gappy3000 ::
 Skype :: gappy3000 :: Gtalk: paleologo :: Mobile: 917.331.3497
 fact: 2^32,582,657-1 is a prime
 


Re: [R] List of occurrence matrices

2008-08-07 Thread Dan Davison


Lauri Nikkinen wrote:
 
 R users,
 
 I don't know if I can make myself clear but I'll give it a try. I have
 a data.frame like this
 
 x <- "var1,var2,var3,var4
 a,b,b,a
 b,b,c,b
 c,a,a,a
 a,b,c,c
 b,a,c,a
 c,c,b,b
 a,c,a,b
 b,c,a,c
 c,a,b,c"
 DF <- read.table(textConnection(x),  header=T, sep=",")
 DF
 
 and I would like to sum all the combinations/occurences by a factor
 (letter in this case) between variables and produce a list of
 occurrence matrices. For example in this case the occurrence
 matrix (first element of list) for factor a should look like this
 
 > occulist
 $a
       var1  var2  var3  var4
 var1  x     0     1     1
 var2  0     x     1     2
 var3  1     1     x     1
 var4  1     2     1     x
 
 $b
 etc.
 
 because there are two rows where var2 and var4 have "a"
 
 
 I think this does it:
 
 occur.matrices <- function(df) {
     levels <- levels(unlist(df))
     ans <- lapply(levels, function(level) crossprod(df == level))
     structure(ans, names=levels)
 }
 
 Dan
 
 > occur.matrices(DF)
 $a
      var1 var2 var3 var4
 var1    3    0    1    1
 var2    0    3    1    2
 var3    1    1    3    1
 var4    1    2    1    3
 
 $b
      var1 var2 var3 var4
 var1    3    1    0    1
 var2    1    3    1    1
 var3    0    1    3    1
 var4    1    1    1    3
 
 $c
      var1 var2 var3 var4
 var1    3    1    0    1
 var2    1    3    0    1
 var3    0    0    3    1
 var4    1    1    1    3
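
 Why crossprod() gives the counts: (DF == level) is a TRUE/FALSE matrix, and entry (i, j) of t(M) %*% M adds up, over the rows, the cases where columns i and j are both TRUE. The sketch below repeats the calculation in a form that should not depend on the columns being factors; read.table(text=...), sort(unique(...)) and the name occ are choices made for this illustration.

```r
DF <- read.table(text = "var1,var2,var3,var4
a,b,b,a
b,b,c,b
c,a,a,a
a,b,c,c
b,a,c,a
c,c,b,b
a,c,a,b
b,c,a,c
c,a,b,c", header = TRUE, sep = ",", stringsAsFactors = FALSE)

## Distinct letters, taken from the values rather than from factor levels
lv <- sort(unique(unlist(DF)))

## One co-occurrence matrix per letter; '+ 0' turns TRUE/FALSE into 1/0
occ <- lapply(lv, function(level) crossprod((DF == level) + 0))
names(occ) <- lv

occ$a["var2", "var4"]
```

 The diagonal holds the number of times the letter occurs in each variable, which is where the 3s in the output above come from.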
 
 
 
 DF[DF$var2=="a" & DF$var4=="a",]
 
 Can you give an advice how to achieve this kind of a list of matrices?
 
 -Lauri
 
 
 

-- 
View this message in context: 
http://www.nabble.com/List-of-%22occurrence%22-matrices-tp18870809p18871268.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Creating an array of lists

2008-08-07 Thread Dan Davison


Gang Chen-4 wrote:
 
 Hi,
 
 I want to store some number of outputs from running a bunch of
 analyses such as lm() into an array. I know how to do this with a
 one-dimensional array (vector) by creating
 
 myArray <- vector(mode='list', length=10)
 
Note that in R terminology, 'myArray' is a list, not an array. You are right
to store things like lm() output in a list. If you want to store multiple lm
outputs in a way that is conceptually multi-dimensional, I would suggest
using lists of lists. Then you can use rapply(lm.fits, some.function,
how="replace") to process the model fits while keeping the multi-dimensional
structure.
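
Another option, sketched here as an alternative rather than as what the reply proposes, is a list that carries a dim attribute, so that model fits can be addressed as fits[[i, j]]. The mtcars variables are arbitrary choices for the demo. Note also that rapply() descends into anything that is a list, and fitted lm objects are lists themselves, so a plain loop over the cells is used below.

```r
## A 2 x 3 "list-matrix": a list of length 6 with dimensions
fits <- matrix(vector(mode = "list", length = 6), nrow = 2, ncol = 3)

responses  <- c("mpg", "qsec")          # arbitrary demo variables from mtcars
predictors <- c("wt", "hp", "disp")

for (i in 1:2) {
    for (j in 1:3) {
        f <- reformulate(predictors[j], response = responses[i])
        fits[[i, j]] <- lm(f, data = mtcars)
    }
}

## Pull one number out of every fit, then restore the 2 x 3 shape
slopes <- vapply(seq_along(fits), function(k) coef(fits[[k]])[[2]], numeric(1))
dim(slopes) <- dim(fits)
```

Single cells are read back with double brackets, e.g. summary(fits[[2, 3]]).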

Dan


Gang Chen-4 wrote:
 
 and storing each lm() result into a component of myArray.
 
 My question is, how can do this for a multiple dimensional array? It
 seems array() does not have such a 'mode' option as in vector(). Any
 alternatives?
 
 Thanks in advance,
 Gang
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Creating-an-array-of-lists-tp18874326p18875567.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Font size in plots (I do NOT understand par help)

2008-08-06 Thread Dan Davison
On Wed, Aug 06, 2008 at 03:37:48PM +0100, Stephane Bourgeois wrote:
 Hi,
 
  
 
 I do not get how par works, help please.
 
  
 
 Let's say I have a simple plot: plot(1:10)
 
  
 
 I want to change the font size for the x axis... how do I do that?


OK, so firstly go to the help page for par by typing
?par

I'm not saying you should read the whole thing right now. There's
quite a lot of options. But you want to change something to do with
axes, so search for the word 'axis'. The 3rd hit I get shows the
following lines.

  'cex.axis' The magnification to be used for axis annotation
  relative to the current setting of 'cex'.

 'cex.lab' The magnification to be used for x and y labels relative
  to the current setting of 'cex'.

Note that one of those refers to the axis annotation (i.e the numbers
along the axis), whereas the other refers to the axis labels. Now
there's two ways to proceed. First, note that par() is a
function. When you call the function, it changes the values of the
graphics parameters you specify. So say you want to make the axis
labels font twice as big. The first method would be

par(cex.lab=2)
plot(1:10)

An alternative method is as follows:

plot(1:10, cex.lab=2)

If you don't know why that works, look at the help page for plot by
typing ?plot, and read the stuff about the three dots (...)

If you go for the first method, one useful trick is to save the
previous values, so you can restore them. You would do that like this:

old.par.settings - par(cex.lab=2)
plot(1:10)
## now restore them
par(old.par.settings)

That works because the function par() happens to spit out the old
values as its return value, although its effect is to change them.

To be fair, you actually asked how to change the font size on the
x-axis, whereas the above changes it on both axes. AFAIK there's no
par() options that do exactly that, so the way I'd do it would be to
first plot without any axis labels, and subsequently add the x- and y-
labels independently using the title() function, and passing extra
'cex.lab=' arguments in the same way as the second method above:

 plot(1:10, xlab="", ylab="")
 title(xlab="xlab title", cex.lab=3)
 title(ylab="ylab title", cex.lab=.5)
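
The pieces above can be put together in one runnable sketch. Drawing on pdf(NULL), a device that creates no file, is an assumption made so that the example also runs without a screen; on an interactive session the pdf() and dev.off() lines can simply be left out.

```r
pdf(NULL)                      # throwaway device, no file is written

## Method 1: change the setting, plot, then restore the old value
old <- par(cex.lab = 2)
plot(1:10)
par(old)
restored <- par("cex.lab")

## Separate sizes for the two axis labels via title()
plot(1:10, xlab = "", ylab = "")
title(xlab = "xlab title", cex.lab = 3)
title(ylab = "ylab title", cex.lab = 0.5)

invisible(dev.off())
```

The object old is a named list holding the previous value of cex.lab, which is why passing it back to par() undoes the change.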

Dan


 
  
 
 Thank you,
 
  
 
 Stephane
 
 
 
 