Please reply to r-help and not only to me personally. That way
others can also help, or perhaps benefit from the answers.
You can use strsplit to remove the last part of the strings. strsplit
returns a list of character vectors, from which you (if I understand
you correctly) only want to select the first element. I use laply from
the plyr package for this, although there are probably also other ways
of doing this.
library(plyr)
dat$V3 <- laply(strsplit(as.character(dat$V1), '_'), function(l) l[1])
After that you can use daply as I showed in my previous post
[daply(dat, V3 ~ V2, nrow)] or use the methods suggested by Dennis
Murphy to build your table.
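For completeness, the same thing can be done entirely in base R. A small self-contained sketch (the IDs below are made up to stand in for mock.txt):

```r
# Toy stand-in for dat; the real data come from mock.txt
dat <- data.frame(V1 = c("1079_17891", "1079_14794", "111_463428"),
                  V2 = c("abc", "abc", "xyz"))

# Keep only the part of each ID before the first underscore
dat$V3 <- sapply(strsplit(as.character(dat$V1), "_"), `[`, 1)

# Cross-tabulate sample vs. virus
table(dat$V3, dat$V2)
```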
Regards,
Jan
On Thu, Aug 26, 2010 at 1:41 AM, Trip Sweeney tripswee...@gmail.com wrote:
Jan,
Thanks for responding to my post to the list about arranging a data matrix
for a heat map.
I am still a beginner, so below is the code I used for the matrix; I have
not yet learned how to input a 'data.frame' (which I need to know to use
your code). The code below works, and the mock.txt file is attached.
There is one thing, though. The input in column 1 of mock.txt is tricky:
I need it summed per unique ID based on the characters prior to the _.
So, for example, the current script calls 1079_17891 and 1079_14794 unique
when I want them tallied together, since they are both part of the same
1079 sample. Occasionally a sample has three characters before the _,
like 111_463428 in mock.txt. The substring after the _ is of variable
length. In the end, there should be one row for 1079, one for 111, and
one for 5576.
Can you help me with this modification of the code? Any advice much
appreciated. Sincerely, Trip
dat <- read.table('mock.txt', sep = '\t')
sumData <- matrix(NA, nrow = length(unique(dat[,1])), ncol = length(unique(dat[,2])))
rownames(sumData) <- unique(dat[,1])
colnames(sumData) <- unique(dat[,2])
for (i in 1:dim(sumData)[1]) {
  for (j in 1:dim(sumData)[2]) {
    sumData[i,j] <- sum(dat[,1] == unique(dat[,1])[i] &
                        dat[,2] == unique(dat[,2])[j])
  }
}
write.table(sumData, 'SummarizedData.txt', sep = '\t', col.names = NA)
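Incidentally, the whole double loop above can be replaced by one call to table(), which cross-tabulates the two columns directly. A sketch using the example rows from the earlier post, since mock.txt itself is not at hand:

```r
# Example rows standing in for the two columns read from mock.txt
dat <- data.frame(V1 = c(111, 111, 111, 1079, 1079, 1079, 5576, 5576, 5576),
                  V2 = c("abc", "sdf", "xyz", "abc", "xyz", "xyz",
                         "abc", "sdf", "sdf"))

# One call replaces both for-loops; absent combinations become 0, not NA
sumData <- as.matrix(table(dat[, 1], dat[, 2]))
write.table(sumData, "SummarizedData.txt", sep = "\t", col.names = NA)
```

A side benefit is that pairs that never occur get a count of 0 automatically, which is what a heat map needs.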
On Wed, Aug 25, 2010 at 4:53 PM, rtsweeney tripswee...@gmail.com wrote:
Hi all,
I have read posts on heat map creation, but I am one step prior --
here is what I am trying to do; I wonder if you have any tips?
We are trying to map sequence reads from tumors to viral genomes.
Example input file:
111 abc
111 sdf
111 xyz
1079 abc
1079 xyz
1079 xyz
5576 abc
5576 sdf
5576 sdf
How many xyz's are there for 1079 and 111? How many abc's, etc.?
How many times did reads from sample 1079 align to virus xyz?
In some cases there are thousands per virus in a given sample, sometimes one.
The original file (two columns by tens of thousands of rows; 20 MB) is a
text file (tab delimited).
Output file:
     abc sdf xyz
111    1   1   1
1079   1   0   2
5576   1   2   0
Or, other ways to generate this data so I can then use it for heat map
creation?
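One such way, sketched with the example rows above: read the two-column file with read.table and let base R's xtabs() build the count matrix (the data frame below stands in for the real 20 MB file):

```r
# The example input above, as a data frame; read.table on the real
# tab-delimited file would produce the same two columns
dat <- data.frame(V1 = c(111, 111, 111, 1079, 1079, 1079, 5576, 5576, 5576),
                  V2 = c("abc", "sdf", "xyz", "abc", "xyz", "xyz",
                         "abc", "sdf", "sdf"))

# Contingency table: one row per sample, one column per virus
counts <- xtabs(~ V1 + V2, data = dat)
counts
```

The result can be passed to heatmap() (via as.matrix) or written out with write.table.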
Thanks for any help you may have,
rtsweeney
palo alto, ca
--
View this message in context:
http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
1079_346281416490|ref|NC_013643.1|
1079_346281416323|ref|NC_013646.1|
1079_378 9629367|ref|NC_001803.1|
1079_58830984428|ref|NC_004812.1|
1079_1292 9629367|ref|NC_001803.1|
1079_3956 9629357|ref|NC_001802.1|
1079_4736 9629357|ref|NC_001802.1|
1079_7732 21427641|ref|NC_004015.1|
1079_7855 118197620|ref|NC_008584.1|
1079_8618 32453484|ref|NC_004928.1|
1079_11540 10140926|ref|NC_002531.1|
1079_14794 9629367|ref|NC_001803.1|
1079_15738 109255272|ref|NC_008168.1|
1079_17891 299778956|ref|NC_014260.1|
1079_18414 157781212|ref|NC_009823.1|
1079_18414 157781216|ref|NC_009824.1|
1079_20312 9629367|ref|NC_001803.1|
1079_20497 9629357|ref|NC_001802.1|
1079_26750 9629367|ref|NC_001803.1|
1079_27926 9628113|ref|NC_001659.1|
1079_27926 9628113|ref|NC_001659.1|
1079_28033 84662653|ref|NC_007710.1|
1079_30020 47835019|ref|NC_004333.2|
1079_30371 9629367|ref|NC_001803.1|
1079_35750 50313241|ref|NC_001491.2|
1079_35750 50313241|ref|NC_001491.2|
111_463428 56694721|ref|NC_006560.1|
111_464636 114680053|ref|NC_008349.1|
111_464636 9627742|ref|NC_001623.1|
111_465190 9627186|ref|NC_001539.1|
111_467613 51557483|ref|NC_006151.1|
111_467613 51557483|ref|NC_006151.1|
111_467975 9627742|ref|NC_001623.1|
111_467975 114680053|ref|NC_008349.1|
111_467975