[R] suggestions for plotting 5000 data points
Dear all,

I have a collection of 5000 entries which represent the evolutionary rates of 3 animals. I would like to show the differences between the rates of all 3 animals and have tried using the function parallel() (from the lattice package) and the pairs() function. The parallel() function would have been perfect were it not for the large number of data points (5000). The pairs() function doesn't show the differences explicitly. Does anyone have any suggestions for representing such data, or has anyone done similar plots?

I attach some simulated data:

    mat3 <- matrix(sample(1:5000), nrow = 5000, ncol = 3, byrow = TRUE)
    colnames(mat3) <- c("human", "mouse", "chicken")
    mat3 <- data.frame(mat3)
    mat2$model <- factor(rep("Model 3"), labels = "model3")

    ## code I used for parallel
    require(lattice)
    parallel(~ mat3[1:3] | model, mat3,
             varnames = c("human\ndnds", "mouse \ndnds", "chicken\ndnds"))

Any suggestions or pointers would be greatly appreciated.

Many thanks,
tania
D.Phil student
Department of Physiology, Anatomy and Genetics
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] suggestions for plotting 5000 data points
Sorry, I made a slight typo in the code in my previous message; it should be:

    mat3 <- matrix(sample(1:5000), nrow = 5000, ncol = 3, byrow = TRUE)
    colnames(mat3) <- c("human", "mouse", "chicken")
    mat3 <- data.frame(mat3)
    mat3$model <- factor(rep("Model 3"), labels = "model3")

    ## code I used for parallel
    require(lattice)
    parallel(~ mat3[1:3] | model, mat3,
             varnames = c("human\ndnds", "mouse \ndnds", "chicken\ndnds"))

So very sorry to clog up your inboxes,
tania

On 3 Oct 2008, at 15:17, Tania Oh wrote:

    Dear all, I have a collection of 5000 entries which represent the evolutionary rates of 3 animals.
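One option, offered here as a suggestion rather than something proposed in this thread, is to plot per-entry differences between species instead of the raw rates. Because every entry has a rate for all three animals, the differences are paired, and one box per species pair summarises all 5000 entries without the overplotting that makes a 5000-line parallel plot hard to read. The sketch below uses base R graphics and the simulated mat3 from the post; the column names human_mouse, human_chicken and mouse_chicken are illustrative only.

```r
## Simulated data, as in the corrected code in the post
set.seed(1)
mat3 <- matrix(sample(1:5000), nrow = 5000, ncol = 3, byrow = TRUE)
colnames(mat3) <- c("human", "mouse", "chicken")
mat3 <- data.frame(mat3)

## Paired differences: one value per entry for each pair of species
## (column names are illustrative, not from the original post)
diffs <- data.frame(
  human_mouse   = mat3$human - mat3$mouse,
  human_chicken = mat3$human - mat3$chicken,
  mouse_chicken = mat3$mouse - mat3$chicken
)

## One box per species pair summarises all 5000 entries;
## the dashed line marks "no difference"
boxplot(diffs, ylab = "difference in rate")
abline(h = 0, lty = 2)

## Numerical summary of the same differences
summary(diffs)
```

For strictly positive rates such as dN/dS values, a log ratio (for example log(human / mouse)) may be a more natural comparison than a plain difference; the same boxplot works either way.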
Re: [R] how to read in multiple files with unequal number of columns
Thank you John. It was useful to know about this package. I tried merge_all and got this error:

    Error in .subset2(x, i, exact = exact) : subscript out of bounds

It could be due to the way my data is structured, and I will try the other solutions suggested by the other kind souls on this list.

Best wishes,
tania

On 22 Apr 2008, at 19:29, John Kane wrote:

    You might want to have a look at the merge_all function in the reshape package.

    --- Tania Oh wrote:
    Dear all, I want to read in 1000 files which contain varying numbers of columns.
[R] how to read in multiple files with unequal number of columns
Dear all,

I want to read in 1000 files which contain varying numbers of columns. For example, file[1] contains 8 columns (a mixture of characters and numbers), file[2] contains 16 columns, etc. I'm reading everything into one big data frame, and when I try rbind, R returns this error:

    Error in rbind(deparse.level, ...) :
      numbers of columns of arguments do not match

Below is my code:

    all <- NULL
    all <- as.data.frame(all)
    ## read in the contents of the files
    for (f in seq_along(fnames)) {
      tmp <- try(read.table(fnames[f], header = FALSE, fill = TRUE, sep = "\t"), TRUE)
      if (inherits(tmp, "try-error")) {
        next  ## skip this file if it's empty/non-existent
      } else {
        ## combine all the file contents into one big data frame
        all <- rbind(all, tmp)
      }
    }

Here is an example of what the data in the files look like:

    L3 <- LETTERS[1:3]
    (d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE)))
    str(d)
    ## 'data.frame': 10 obs. of 3 variables:
    ##  $ x  : num 1 1 1 1 1 1 1 1 1 1
    ##  $ y  : num 1 2 3 4 5 6 7 8 9 10
    ##  $ fac: Factor w/ 3 levels "A","B","C": 1 3 1 2 2 2 2 1 1 2
    my.fake.data <- data.frame(cbind(x = 1, y = 2))
    str(my.fake.data)
    ## 'data.frame': 1 obs. of 2 variables:
    ##  $ x: num 1
    ##  $ y: num 2
    all <- rbind(d, my.fake.data)
    ## Error in rbind(deparse.level, ...) :
    ##   numbers of columns of arguments do not match

I've searched the R site but couldn't find any relevant solution. I might have used the wrong keywords, so if this question has been answered already, I'd be very grateful if someone could point me to the post. Otherwise, any help or suggestions would be greatly appreciated.

Many thanks in advance,
tania
D.Phil student
Department of Physiology, Anatomy and Genetics
University of Oxford
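A minimal sketch, not taken from the replies in this thread: rbind() on data frames matches columns by name, so it is enough to give every piece the same set of columns, filling the missing ones with NA, before stacking. pad_and_rbind() is a hypothetical helper written for this illustration. With header = FALSE, read.table() names the columns V1, V2, ..., so a file with 8 columns simply gets NA in V9 to V16 when stacked with a 16-column file.

```r
## Hypothetical helper (not part of base R or any package): pad each data
## frame with NA columns so all share the same column names, then stack them.
pad_and_rbind <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA            # missing column filled with NA
    }
    df[all_cols]                 # put the columns in a common order
  })
  do.call(rbind, padded)         # rbind() matches data frame columns by name
}

## Example data from the question
set.seed(1)
L3 <- LETTERS[1:3]
d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE))
my.fake.data <- data.frame(cbind(x = 1, y = 2))

combined <- pad_and_rbind(list(d, my.fake.data))
dim(combined)        # 11 rows, 3 columns
tail(combined, 2)    # fac is NA in the row that came from my.fake.data

## In the file loop, the pieces could be collected in a list first:
## pieces <- list()
## for (f in seq_along(fnames)) {
##   tmp <- try(read.table(fnames[f], header = FALSE, fill = TRUE, sep = "\t"), TRUE)
##   if (!inherits(tmp, "try-error")) pieces[[length(pieces) + 1]] <- tmp
## }
## all <- pad_and_rbind(pieces)
```

Collecting the pieces in a list and combining them once also avoids starting from an empty data frame and growing it inside the loop.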
Re: [R] how to read in multiple files with unequal number of columns
Thanks Ingmar, but when I used merge in

    all <- merge(all, tmp)

I get an error:

    Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
      invalid 'times' value

Is the error because of the way I initialised 'all'? What is the correct way of using merge in this case?

thanks
tania

On 22 Apr 2008, at 14:12, Ingmar Visser wrote:

    you may be looking for ?merge

    hth, Ingmar

    On 22 Apr 2008, at 15:05, Tania Oh wrote:
    Dear all, I want to read in 1000 files which contain varying numbers of columns.
    Ingmar Visser
    Department of Psychology, University of Amsterdam
    Roetersstraat 15
    1018 WB Amsterdam
    The Netherlands
    t: +31-20-5256723
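A caveat on using merge() here, illustrated with a sketch that is not from the thread: merge() performs a database-style join on the columns two data frames have in common, so rows that agree on those columns are combined rather than stacked. The example folds a list of data frames together with Reduce() and a full outer join (all = TRUE), which also avoids having to initialise an empty 'all' object first.

```r
## Example data from the question
set.seed(1)
L3 <- LETTERS[1:3]
d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE))
my.fake.data <- data.frame(cbind(x = 1, y = 2))

## Collect the pieces in a list, then fold them together pairwise with a
## full outer join on the shared columns (here x and y)
pieces <- list(d, my.fake.data)
joined <- Reduce(function(a, b) merge(a, b, all = TRUE), pieces)

## The row (x = 1, y = 2) from my.fake.data matches an existing row of d
## on both shared columns, so it is joined to that row, not appended
nrow(joined)   # 10, not 11
```

A full outer join only behaves like stacking when no rows match on the shared columns; for combining files row by row, aligning the columns and using rbind() is likely a better fit.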
[R] Is this an artifact of using which?
Dear all,

I used which to obtain a subset of values from my data.frame. However, I find that there is a trace of the values I have removed. Any suggestions would be greatly appreciated. Below is my data:

    d <- data.frame(val = 1:10, group = sample(LETTERS[1:5], 10, replace = TRUE))
    d
       val group
    1    1     B
    2    2     E
    3    3     B
    4    4     C
    5    5     A
    6    6     B
    7    7     A
    8    8     E
    9    9     E
    10  10     A

    ## selecting everything that is not group A
    d <- d[which(d$group != "A"), ]
    d
      val group
    1   1     B
    2   2     E
    3   3     B
    4   4     C
    6   6     B
    8   8     E
    9   9     E

    levels(d$group)
    [1] "A" "B" "C" "E"
    ## why is group A still reflected here?

Many thanks in advance,
tania
D.Phil student
Department of Physiology, Anatomy and Genetics
Oxford University
Re: [R] Is this an artifact of using which?
Dear Uwe,

thank you very much for this. After reading your solution below, I searched the help pages for data.frame, which and factor, but I didn't see the drop option in them. I googled and found drop associated with the function subset. Is this the help page you were alluding to? Sorry if I've missed something.

Thanks so much in advance again,
tania

On 14 Apr 2008, at 12:39, Uwe Ligges wrote:

    Tania Oh wrote:
        levels(d$group)
        [1] "A" "B" "C" "E"
        ## why is group A still reflected here?

    Because you have removed elements from a factor object that has particular levels. You removed elements (= observations), but the factor still knows that all levels are possible (stored in the attributes of the object). If you want to remove all levels without corresponding observations, use an explicit drop = TRUE as the help page suggests, e.g.:

        d <- d[d$group != "A", ]
        d$group <- d$group[ , drop = TRUE]

    Uwe Ligges
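The drop argument Uwe refers to belongs to factor subsetting and is documented on the help page for `[.factor` (opened with ?"[.factor"), not on the pages for data.frame or which. The sketch below reproduces the example and shows Uwe's approach next to two alternatives; it wraps the sampled letters in factor() explicitly because R 4.0.0 and later no longer convert strings to factors in data.frame() by default, so without it levels(d$group) would be NULL.

```r
set.seed(1)
d <- data.frame(val = 1:10,
                group = factor(sample(LETTERS[1:5], 10, replace = TRUE)))

## Removing rows keeps the full set of factor levels
d <- d[d$group != "A", ]

## Option 1 (Uwe's suggestion): subset the factor with drop = TRUE
d$group <- d$group[ , drop = TRUE]

## Option 2: rebuild the factor from its remaining values
## d$group <- factor(d$group)

## Option 3 (later R versions): drop unused levels in every factor column
## d <- droplevels(d)

levels(d$group)   # "A" is no longer listed
```

Using which() is not the cause: subsetting with the logical vector directly, as in Uwe's example, leaves the unused level in place in the same way.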
[R] how to check if a variable is preferentially present in a sample
Dear All,

I do apologise if this question is out of place for this list, but I've tried searching the mailing lists and read Introductory Statistics with R by Peter Dalgaard, and couldn't find any hints on solving my question below.

I have a data frame (d) of values which I rank in decreasing order of val. Each value belongs to a group: 'A', 'B', 'C', 'D' or 'E'. I then take the first 10 entries in data frame d and count the number of occurrences of each group. I want to test whether certain groups occur more frequently than expected by chance in my first 10 entries. Would a chi-square test or a hypergeometric test be more suitable? If neither, what would be an alternative solution in R?

Below is my data:

    ## data
    L5 <- LETTERS[1:5]
    d <- data.frame(cbind(val = rnorm(1:10)^2, group = sample(L5, 100, repl = TRUE)))
    str(d)
    ## 'data.frame': 100 obs. of 2 variables:
    ##  $ val  : Factor w/ 10 levels "0.000169268449333046",..: 10 3 5 6 1 2 7 8 4 9 ...
    ##  $ group: Factor w/ 5 levels "A","B","C","D",..: 4 4 4 5 3 1 5 2 1 2 ...

Many thanks in advance and apologies again,
tania
D.Phil student
Department of Physiology, Anatomy and Genetics
University of Oxford
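A sketch under stated assumptions, not an answer from the list: with only 10 entries in the top set and 5 groups, the expected count per group is about 2, below the common rule of thumb of 5 for the chi-square approximation, so an exact test seems the safer choice. The hypergeometric distribution describes this setup directly (drawing 10 entries without replacement from 100 and counting members of one group), and its one-sided upper-tail p-value is the same as a one-sided Fisher's exact test (fisher.test(..., alternative = "greater")) on the corresponding 2 x 2 table. The code also corrects the data construction: cbind() first builds a character matrix, which is why val appears as a factor in the str() output, and rnorm(1:10) draws only 10 values that are then recycled. Testing each group separately and adjusting with p.adjust(method = "BH") are choices made for this illustration.

```r
## Corrected example data: build the columns directly so val stays numeric
set.seed(1)
L5 <- LETTERS[1:5]
d <- data.frame(val = rnorm(100)^2,
                group = factor(sample(L5, 100, replace = TRUE), levels = L5))

## Rank by val in decreasing order and keep the first 10 entries
top <- head(d[order(d$val, decreasing = TRUE), ], 10)

N <- nrow(d)                        # all entries
n <- nrow(top)                      # entries drawn (the top 10)
K <- as.vector(table(d$group))      # size of each group overall
k <- as.vector(table(top$group))    # occurrences of each group in the top 10

## One-sided hypergeometric test per group: P(X >= k), where X counts the
## group's members in a random draw of n entries out of N
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

## Five groups are tested, so adjust for multiple comparisons
res <- data.frame(group = L5, top10 = k, overall = K, p = p_hyper)
res$p_adj <- p.adjust(res$p, method = "BH")
res
```

Because the top 10 are drawn from the same 100 entries, the group counts in the top set are not independent of each other; the per-group tests above only ask whether each group, taken on its own, is over-represented.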