[R] suggestions for plotting 5000 data points

2008-10-03 Thread Tania Oh

Dear all,

I have a collection of 5000 entries which represent the evolutionary  
rates of 3 animals.


I would like to show the differences between the rates of all 3  
animals and have tried using the function parallel (from the lattice  
package) and pairs() function.


The parallel function would have been perfect save for the large
number of data points (5000). The pairs() function doesn't show the
differences explicitly. Does anyone have any suggestions on
representing such data, or has anyone done similar plots?


I attach some simulated data:

mat3 <- matrix(sample(1:5000), nrow=5000, ncol=3, byrow=TRUE)
colnames(mat3) <- c("human", "mouse", "chicken")
mat3 <- data.frame(mat3)
mat2$model <- factor( rep("Model 3"), labels="model3")


## code I used for parallel

require(lattice)
parallel( ~ mat3[1:3] | model, mat3,
          varnames = c("human\ndnds", "mouse\ndnds", "chicken\ndnds") )



any suggestions or pointers would be greatly appreciated.

many thanks
tania

D.phil student
Department of Physiology, Anatomy and Genetics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] suggestions for plotting 5000 data points

2008-10-03 Thread Tania Oh

sorry, I made a slight typo in the code below, it should be


mat3 <- matrix(sample(1:5000), nrow=5000, ncol=3, byrow=TRUE)
colnames(mat3) <- c("human", "mouse", "chicken")
mat3 <- data.frame(mat3)
mat3$model <- factor( rep("Model 3"), labels="model3")


## code I used for parallel

require(lattice)
parallel( ~ mat3[1:3] | model, mat3,
          varnames = c("human\ndnds", "mouse\ndnds", "chicken\ndnds") )



so very sorry to clog up your inboxes,
tania
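
One possible way to cope with 5000 overlapping lines, offered only as a sketch: draw the parallel plot with nearly transparent lines so that dense regions show up darker, and plot the pairwise differences directly so that they can be read off explicitly. The simulated rates (`rexp`), the seed, the alpha value of 0.05 and the object names `rates`, `p` and `diffs` are assumptions for illustration, not taken from the post. Newer lattice releases name the function `parallelplot()`; `parallel()` was its name in 2008.

```r
library(lattice)

set.seed(1)
# Simulated stand-in for the real rates (assumption: positive, skewed values)
rates <- data.frame(human   = rexp(5000),
                    mouse   = rexp(5000),
                    chicken = rexp(5000))

# Parallel-coordinates plot with semi-transparent lines
p <- parallelplot(~ rates[1:3], data = rates, col = "black", alpha = 0.05)
print(p)

# Pairwise differences between the three animals, one box per pair
diffs <- data.frame(
  human_mouse   = rates$human - rates$mouse,
  human_chicken = rates$human - rates$chicken,
  mouse_chicken = rates$mouse - rates$chicken
)
boxplot(diffs, ylab = "difference in rate")
abline(h = 0, lty = 2)  # reference line: no difference
```

Whether transparency is rendered depends on the graphics device; `pdf()` and `png()` generally support it.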



Re: [R] how to read in multiple files with unequal number of columns

2008-04-23 Thread Tania Oh
Thank you John.
It was useful to know about this package.

I tried merge_all and I got this error:

Error in .subset2(x, i, exact = exact) : subscript out of bounds

It could be due to the way my data is and I will try the other  
solutions suggested by the other kind souls on this list.

Best wishes,
tania

On 22 Apr 2008, at 19:29, John Kane wrote:

 You might want to have a look at the merge_all
 function in the reshape package.


[R] how to read in multiple files with unequal number of columns

2008-04-22 Thread Tania Oh
Dear all,

I want to read in 1000 files which contain a varying number of columns.
For example:

file[1] contains 8 columns (mixture of characters and numbers)
file[2] contains 16 columns etc

I'm reading everything into one big data frame and when I try rbind, R  
returns an error of
Error in rbind(deparse.level, ...) :
   numbers of columns of arguments do not match


Below is my code:

all <- NULL
all <- as.data.frame(all)

## read in the contents of the files
for (f in seq_along(fnames)) {

   tmp <- try(read.table(fnames[f], header=FALSE, fill=TRUE, sep="\t"),
              silent=TRUE)

   if (inherits(tmp, "try-error")) {
      next ## skip this file if it's empty/non-existent
   } else {
      ## combine all the file contents into one big data frame
      all <- rbind(all, tmp)
   }
}


Here is an example of what the data in the files look like:

L3 <- LETTERS[1:3]
(d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE)))

> str(d)
'data.frame':   10 obs. of  3 variables:
 $ x  : num  1 1 1 1 1 1 1 1 1 1
 $ y  : num  1 2 3 4 5 6 7 8 9 10
 $ fac: Factor w/ 3 levels "A","B","C": 1 3 1 2 2 2 2 1 1 2

my.fake.data <- data.frame(cbind(x=1, y=2))
> str(my.fake.data)
'data.frame':   1 obs. of  2 variables:
 $ x: num 1
 $ y: num 2


all <- rbind(d, my.fake.data)

Error in rbind(deparse.level, ...) :
   numbers of columns of arguments do not match


I've searched the R site but couldn't find any relevant solution. I
might have used the wrong keywords to search, so if this question has
been answered already, I'd be very grateful if someone could point me
to the post. Otherwise any help/suggestions would be greatly appreciated.

Many thanks in advance,
tania

D.Phil student
Department of Physiology, Anatomy and Genetics
University of Oxford
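
A minimal sketch of one way around the `rbind` error, assuming that missing columns may simply be padded with `NA`: collect the data frames in a list, give each one the union of all column names, then bind once at the end. The helper name `rbind_fill` is hypothetical and is not a base R function. With `header=FALSE`, `read.table` names the columns `V1`, `V2`, and so on, so matching by name lines the files up by position.

```r
# Hypothetical helper: pad each data frame with NA columns, then rbind once
rbind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (m in setdiff(all_cols, names(df))) {
      df[[m]] <- rep(NA, nrow(df))  # add each missing column, filled with NA
    }
    df[all_cols]                    # same column order in every data frame
  })
  do.call(rbind, padded)
}

# The two example data frames from the post (strings kept as characters)
L3 <- LETTERS[1:3]
d <- data.frame(x = 1, y = 1:10, fac = sample(L3, 10, replace = TRUE),
                stringsAsFactors = FALSE)
my.fake.data <- data.frame(x = 1, y = 2)

combined <- rbind_fill(list(d, my.fake.data))
str(combined)
```

In the file-reading loop, each `tmp` could be appended to a list and `rbind_fill` called once afterwards, which also avoids copying a growing data frame on every iteration.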



Re: [R] how to read in multiple files with unequal number of columns

2008-04-22 Thread Tania Oh
Thanks Ingmar,

but when I used merge in:

all <- merge(all, tmp)

I get an error:

Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
   invalid 'times' value

Is the error because of the way I initialised 'all'?
What is the correct way of using merge in this case?

thanks
tania




On 22 Apr 2008, at 14:12, Ingmar Visser wrote:

 you may be looking for ?merge
 hth, Ingmar


 Ingmar Visser
 Department of Psychology, University of Amsterdam
 Roetersstraat 15
 1018 WB Amsterdam
 The Netherlands
 t: +31-20-5256723





[R] Is this an artifact of using which?

2008-04-14 Thread Tania Oh
Dear all,

I used which() to obtain a subset of values from my data.frame.
However, I find that there is a trace of the values I have removed.
Any suggestions would be greatly appreciated.

Below is my data:

d <- data.frame( val   = 1:10,
                 group = sample(LETTERS[1:5], 10, replace=TRUE) )

> d
   val group
1    1     B
2    2     E
3    3     B
4    4     C
5    5     A
6    6     B
7    7     A
8    8     E
9    9     E
10  10     A

## selecting everything that is not group A
> d <- d[which(d$group != "A"), ]

> d
  val group
1   1     B
2   2     E
3   3     B
4   4     C
6   6     B
8   8     E
9   9     E

> levels(d$group)
[1] "A" "B" "C" "E"

## why is group A still reflected here?

Many thanks in advance,
tania

D.phil student
Department of Physiology, Anatomy and Genetics
Oxford University



Re: [R] Is this an artifact of using which?

2008-04-14 Thread Tania Oh
Dear Uwe,
thank you very much for this.
After reading your solution below, I searched the help pages for
data.frame, which and factor, but I didn't see the option for drop in
them.

I googled and found drop associated with the function subset. Is
this the help page you were alluding to?


Sorry if I've missed something.
thanks so much in advance again.
tania

On 14 Apr 2008, at 12:39, Uwe Ligges wrote:




 Because you have removed elements from a factor object that has
 particular levels. You remove elements (= observations), but the
 factor still knows that all levels are possible (stored in the
 attributes of the object).

 If you want to remove all levels without corresponding observations,
 use an explicit drop=TRUE as the help page suggests, e.g.:


 d <- d[d$group != "A", ]
 d$group <- d$group[ , drop = TRUE]

 Uwe Ligges
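
As a hedged aside: the `drop` argument belongs to the `[` method for factors, so it should be documented on the help page for `[.factor` rather than on those for data.frame or which. In later versions of R (2.12.0 onwards, as far as I know), `droplevels()` does the same job, and re-applying `factor()` also works. The fixed `group` column and the names `kept` and `kept2` below are assumptions chosen to make the result deterministic. R 4.0.0 and later no longer convert strings to factors by default, so the sketch builds the factor explicitly.

```r
# Deterministic stand-in for the data in the post
d <- data.frame(val = 1:10, group = factor(rep(LETTERS[1:5], 2)))

kept <- d[d$group != "A", ]
levels(kept$group)                    # "A" is still listed as a level

kept$group <- droplevels(kept$group)  # equivalently: factor(kept$group)
levels(kept$group)                    # "A" is gone

# droplevels() also accepts a whole data frame
kept2 <- droplevels(d[d$group != "A", ])
```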





[R] how to check if a variable is preferentially present in a sample

2008-04-08 Thread Tania Oh
Dear All,

I do apologise if this question is out of place for this list, but I've
tried searching the mailing lists and read "Introductory Statistics with
R" by Peter Dalgaard, and couldn't find any hints on solving my
question below:

I have a data frame (d) of values which I will rank in decreasing  
order of val. Each value belongs to a group, either 'A', 'B', 'C',  
'D', or 'E'.  I then take the first 10 entries in data frame 'd'  and  
count the number of occurrences for each of the groups.  I want to  
test if certain groups occur more frequently than by chance in my  
first 10 entries. Would a chi-square test or a hypergeometric test be  
more suitable? If neither, what would be an alternative solution in  
R?  Below is my data:


## data
L5 <- LETTERS[1:5]
d <- data.frame(cbind(val= rnorm(1:10)^2, group=sample(L5,100,
                repl=TRUE)))

str(d)
##'data.frame': 100 obs. of  2 variables:
##$ val  : Factor w/ 10 levels "0.000169268449333046",..: 10 3 5 6 1 2 7 8 4 9 ...
##$ group: Factor w/ 5 levels "A","B","C","D",..: 4 4 4 5 3 1 5 2 1 2 ...


Many thanks in advance and apologies again,
tania

D. phil student
Department of Physiology, Anatomy and Genetics
University of Oxford
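
One way the top-10 question could be framed, as a sketch rather than a recommendation: treat the top 10 as a draw without replacement from the 100 entries and compute, for each group, the hypergeometric upper-tail probability of seeing at least the observed count. The cleaned-up simulation (numeric `val`, one value per row), the seed and the names `n_top`, `top`, `K`, `k` and `p_enrich` are assumptions for illustration. Because five groups are tested, some multiple-testing adjustment such as `p.adjust()` is probably sensible.

```r
set.seed(42)
L5 <- LETTERS[1:5]

# Assumed clean version of the data: numeric val, 100 independent values
d <- data.frame(val   = rnorm(100)^2,
                group = factor(sample(L5, 100, replace = TRUE), levels = L5))

n_top <- 10
top <- d[order(d$val, decreasing = TRUE)[seq_len(n_top)], ]

K <- table(d$group)    # group sizes among all 100 entries
k <- table(top$group)  # group counts among the top 10 (all 5 levels kept)

# P(X >= k) when drawing n_top entries without replacement
p_enrich <- setNames(
  phyper(as.vector(k) - 1, as.vector(K), nrow(d) - as.vector(K), n_top,
         lower.tail = FALSE),
  L5)

p_enrich
p.adjust(p_enrich, method = "holm")
```

A chi-square test relies on a large-sample approximation, which may be shaky with only 10 entries spread over five groups; the hypergeometric calculation is exact under the sampling-without-replacement assumption.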
