Re: [R] frequency, count rows, data for heat map

2010-08-26 Thread Jan van der Laan
Please reply to r-help and not only to me personally. That way
others can also help, or perhaps benefit from the answers.

You can use strsplit to remove the last part of the strings. strsplit
returns a list of character vectors from which you (if I understand
you correctly) only want to select the first element. I use laply from
the plyr library for this, although there are probably other ways
of doing this.

library(plyr)
dat$V3 <- laply(strsplit(as.character(dat$V1), '_'), function(l) l[1])

After that you can use daply as I showed in my previous post
[daply(dat, V3 ~ V2, nrow)] or use the methods suggested by Dennis
Murphy to build your table.
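
The same idea can also be sketched without plyr. The example below is only an illustration under assumptions: the small data frame stands in for mock.txt (V1 and V2 are the default column names read.table assigns), the virus labels and the ID 5576_12 are made up, and sub() is swapped in for the strsplit/laply step.

```r
# Toy stand-in for mock.txt; V1/V2 are read.table's default column names.
# The virus labels and "5576_12" are invented for illustration.
dat <- data.frame(
  V1 = c("1079_17891", "1079_14794", "111_463428", "111_464636", "5576_12"),
  V2 = c("virusA", "virusA", "virusB", "virusA", "virusB"),
  stringsAsFactors = FALSE
)

# Drop the first underscore and everything after it, keeping the sample ID.
dat$V3 <- sub("_.*$", "", as.character(dat$V1))

# Cross-tabulate sample ID against the second column.
counts <- table(dat$V3, dat$V2)
print(counts)
```

Because the pattern removes everything from the first underscore onward, it does not matter whether the sample ID has three or four characters, or how long the part after the underscore is.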

Regards,

Jan



On Thu, Aug 26, 2010 at 1:41 AM, Trip Sweeney tripswee...@gmail.com wrote:
 Jan,
 Thanks for responding to my post to the list about arranging a data matrix
 for a heat map.
 I am still a beginner, so below is the code I used for the matrix; I have
 not yet learned how to input a 'data.frame' (which I need to know to use
 your code). The code below works, and the mock.txt file is attached.
 There is one thing, though. The input in column 1 of mock.txt is tricky.
 I need it to sum per unique ID based on the characters before the _
 So, for example, the current script treats 1079_17891 and 1079_14794 as
 unique when I want them to be tallied together, since they are both part
 of the same 1079 sample. Occasionally a sample has three characters before
 the _, like 111_463428 etc. in mock.txt. The substring after the _ is of
 variable length. In the end, there should be one row for 1079,
 one for 111, and one for 5576.
 Can you help me with this modification of the code? Any advice much
 appreciated. Sincerely, Trip

 dat <- read.table('mock.txt', sep="\t")
 sumData = matrix(NA, nrow=length(unique(dat[,1])), ncol=length(unique(dat[,2])))
 rownames(sumData) <- unique(dat[,1])
 colnames(sumData) <- unique(dat[,2])

 for (i in 1:dim(sumData)[1]){
   for (j in 1:dim(sumData)[2]){
     sumData[i,j] <- sum(dat[,1]==unique(dat[,1])[i] &
                         dat[,2]==unique(dat[,2])[j])
   }
 }

 write.table(sumData, "SummarizedData.txt", sep="\t", col.names=NA)




On Wed, Aug 25, 2010 at 4:53 PM, rtsweeney tripswee...@gmail.com wrote:

 Hi all,
 I have read posts of heat map creation but I am one step prior --
 Here is what I am trying to do and wonder if you have any tips?
 We are trying to map sequence reads from tumors to viral genomes.

 Example input file :
 111     abc
 111     sdf
 111     xyz
 1079   abc
 1079   xyz
 1079   xyz
 5576   abc
 5576   sdf
 5576   sdf

 How many xyz's are there for 1079 and 111? How many abc's, etc.?
 How many times did reads from sample (1079) align to virus xyz?
 In some cases there are thousands per virus in a given sample, sometimes one.
 The original file (two columns by tens of thousands of rows; 20 MB) is
 a text file (tab delimited).

 Output file:
         abc  sdf  xyz
 111     1      1     1
 1079   1      0     2
 5576   1      2     0

 Or, other ways to generate this data so I can then use it for heat map
 creation?

 Thanks for any help you may have,

 rtsweeney
 palo alto, ca
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

1079_346    281416490|ref|NC_013643.1|
1079_346    281416323|ref|NC_013646.1|
1079_378    9629367|ref|NC_001803.1|
1079_588    30984428|ref|NC_004812.1|
1079_1292   9629367|ref|NC_001803.1|
1079_3956   9629357|ref|NC_001802.1|
1079_4736   9629357|ref|NC_001802.1|
1079_7732   21427641|ref|NC_004015.1|
1079_7855   118197620|ref|NC_008584.1|
1079_8618   32453484|ref|NC_004928.1|
1079_11540  10140926|ref|NC_002531.1|
1079_14794  9629367|ref|NC_001803.1|
1079_15738  109255272|ref|NC_008168.1|
1079_17891  299778956|ref|NC_014260.1|
1079_18414  157781212|ref|NC_009823.1|
1079_18414  157781216|ref|NC_009824.1|
1079_20312  9629367|ref|NC_001803.1|
1079_20497  9629357|ref|NC_001802.1|
1079_26750  9629367|ref|NC_001803.1|
1079_27926  9628113|ref|NC_001659.1|
1079_27926  9628113|ref|NC_001659.1|
1079_28033  84662653|ref|NC_007710.1|
1079_30020  47835019|ref|NC_004333.2|
1079_30371  9629367|ref|NC_001803.1|
1079_35750  50313241|ref|NC_001491.2|
1079_35750  50313241|ref|NC_001491.2|
111_463428  56694721|ref|NC_006560.1|
111_464636  114680053|ref|NC_008349.1|
111_464636  9627742|ref|NC_001623.1|
111_465190  9627186|ref|NC_001539.1|
111_467613  51557483|ref|NC_006151.1|
111_467613  51557483|ref|NC_006151.1|
111_467975  9627742|ref|NC_001623.1|
111_467975  114680053|ref|NC_008349.1|
111_467975  

Re: [R] frequency, count rows, data for heat map

2010-08-25 Thread Jan van der Laan
Your problem is not completely clear to me, but perhaps something like

data <- data.frame(
   a = rep(c(1,2), each=10),
   b = rep(c('a', 'b', 'c', 'd'), 5))
library(plyr)
daply(data, a ~ b, nrow)

does what you need.

Regards,
Jan



Re: [R] frequency, count rows, data for heat map

2010-08-25 Thread Dennis Murphy
Hi:

Here are a couple of ways to render a basic 2D table. Let's call your input
data frame dat:

 names(dat) <- c('samp', 'sequen')
 ssTab <- as.data.frame(with(dat, table(samp, sequen)))
 ssTab   # data frame version
  samp sequen Freq
1  111    abc    1
2 1079    abc    1
3 5576    abc    1
4  111    sdf    1
5 1079    sdf    0
6 5576    sdf    2
7  111    xyz    1
8 1079    xyz    2
9 5576    xyz    0
 with(dat, table(samp, sequen))   # table version
      sequen
samp   abc sdf xyz
  111    1   1   1
  1079   1   0   2
  5576   1   2   0
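
Since the stated goal is a heat map, here is a rough sketch of one way to go from the table to a plot. It rebuilds the example input from the original post, turns the table into a plain matrix, and passes it to heatmap() from base R. The choice of scale = "none" is an assumption made for this toy data (row 111 is constant, so row scaling would produce NaN); real data may call for a different setting.

```r
# Example input from the original post, as a data frame.
dat <- data.frame(
  samp   = c(111, 111, 111, 1079, 1079, 1079, 5576, 5576, 5576),
  sequen = c("abc", "sdf", "xyz", "abc", "xyz", "xyz", "abc", "sdf", "sdf")
)

# Sample-by-sequence counts, as in the table version above.
tab <- with(dat, table(samp, sequen))

# Drop the "table" class to get a plain integer matrix with dimnames.
mat <- unclass(tab)

# Draw the heat map on the raw counts (no row or column scaling).
heatmap(mat, scale = "none")
```

The same matrix can be written out with write.table() if a tab-delimited file is needed for another program.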

HTH,
Dennis
