Re: [R] How to Store the executed values in a dataframe rle function

2011-09-29 Thread viritha k
Thanks, it does work for the sample data.
When I use it on my actual data, it throws this error:
Error in data.frame(Sample = .samp, Chr = .set$Chr[1L], Start = min(.set$Start),  :
  arguments imply differing number of rows: 1, 0
I do not understand why I am getting this.
Waiting for your reply,
Thanks,
Suji
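
A message of this form means that one of the arguments passed to data.frame() has zero rows. One plausible cause, assuming the header of the real file differs from the sample (the actual data is not shown in the thread), is a column name that does not match: R names are case-sensitive, and `[[` on a data frame returns NULL for a column that does not exist. The sketch below uses a hypothetical helper, check_columns(), and made-up data to list the names that a function expects but the data frame lacks.

```r
# Hypothetical helper: report which of the expected column names are absent.
check_columns <- function(df, needed) {
    setdiff(needed, names(df))
}

# Made-up data whose header uses lower-case 'start' and 'end'.
x <- data.frame(Chr = c("chr2", "chr2"),
                start = c(9896633L, 9896639L),
                end = c(9896683L, 9896690L),
                sample1 = c(0, 0),
                stringsAsFactors = FALSE)

# Names a function might refer to; 'Start' and 'End' are capitalised here.
missing_cols <- check_columns(x, c("Chr", "Start", "End", "sample1"))
print(missing_cols)

# A column that does not exist comes back as NULL, which has length zero.
print(is.null(x[["Start"]]))
```

Running the same check against the real data frame and the sample names passed to lapply would show whether any name is misspelled or differs in case.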

Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread jim holtman
Here is one approach:

> x <- read.table(textConnection("Chr start end sample1 sample2
+ chr2 9896633 9896683 0 0
+ chr2 9896639 9896690 0 0
+ chr2 14314039 14314098 0 -0.35
+ chr2 14404467 14404502 0 -0.35
+ chr2 14421718 14421777 -0.43 -0.35
+ chr2 16031710 16031769 -0.43 -0.35
+ chr2 16036178 16036237 -0.43 -0.35
+ chr2 16048665 16048724 -0.43 -0.35
+ chr2 37491676 37491735 0 0
+ chr2 37702947 37703009 0 0"), header = TRUE, as.is = TRUE)
> closeAllConnections()
>
> result <- lapply(c('sample1', 'sample2'), function(.samp){
+     # split by breaks in the values
+     .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
+
+     # combine the list of dataframes
+     .range <- do.call(rbind, lapply(.grps, function(.set){
+         # create a dataframe of the results
+         data.frame(Sample = .samp
+                    , Chr = .set$Chr[1L]
+                    , Start = min(.set$start)
+                    , End = max(.set$end)
+                    , Values = .set[[.samp]][1L]
+                    , Probes = nrow(.set)
+                    )
+         }))
+     })
> # put the list of dataframes together
> result <- do.call(rbind, result)
> result
    Sample  Chr    Start      End Values Probes
0  sample1 chr2  9896633 14404502   0.00      4
1  sample1 chr2 14421718 16048724  -0.43      4
2  sample1 chr2 37491676 37703009   0.00      2
01 sample2 chr2  9896633  9896690   0.00      2
11 sample2 chr2 14314039 16048724  -0.35      6
21 sample2 chr2 37491676 37703009   0.00      2
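
The grouping step works because diff() is non-zero exactly where the value changes, so the cumulative sum of those change flags gives every row of a run the same id. A minimal sketch on the sample1 column from the data above (the names v, run_id and r are only for illustration) shows that the group sizes agree with what rle() reports:

```r
# sample1 column from the example data
v <- c(0, 0, 0, 0, -0.43, -0.43, -0.43, -0.43, 0, 0)

# TRUE where a value differs from the previous one; the leading 0 starts the first run
run_id <- cumsum(c(0, diff(v) != 0))
print(run_id)

# rle() describes the same runs as lengths and values
r <- rle(v)
print(r$lengths)
print(r$values)

# number of rows per run id, for comparison with r$lengths
print(table(run_id))
```

Splitting the data frame on run_id therefore yields one piece per run, which is why rle() is not needed in this approach.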



On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote:
 Hi group,

 This is what my test file looks like:
 Chr start end sample1 sample2
 chr2 9896633 9896683 0 0
 chr2 9896639 9896690 0 0
 chr2 14314039 14314098 0 -0.35
 chr2 14404467 14404502 0 -0.35
 chr2 14421718 14421777 -0.43 -0.35
 chr2 16031710 16031769 -0.43 -0.35
 chr2 16036178 16036237 -0.43 -0.35
 chr2 16048665 16048724 -0.43 -0.35
 chr2 37491676 37491735 0 0
 chr2 37702947 37703009 0 0

 This is the output that I am expecting:
 Sample Chr Start End Values Probes
 sample1 chr2 9896633 14404502 0 4
 sample1 chr2 14421718 16048724 -0.43 4
 sample1 chr2 37491676 37703009 0 2
 sample2 chr2 9896633 9896690 0 2
 sample2 chr2 14314039 16048724 -0.35 6
 sample2 chr2 37491676 37703009 0 2

 Here the Chr value is the same throughout, but it could be any other value
 as well, so Chr should be the unique value within each run of similar values.
 Start is the smallest start value within a run (the first run of sample1
 spans 4 rows) and End is the largest end value within it. Values is the
 single value shared by the run, and Probes is the number of rows in the run.

 Code:
 m <- read.table("test.txt", sep='\t', header=TRUE, colClasses=c('character','integer','integer','numeric','numeric'))
 # reading the test file
 s <- data.frame(c(rle(m$Sample1)[[2]], rle(m$Sample2)[[2]]), c(rle(m$Sample1)[[1]], rle(m$Sample2)[[1]]))
 # to get the last 2 columns
 names(s) = c("Values", "Probes")
 G = 1
 for(i in 1:length(s$Probes)){
 + if(G==1){first <- unique(m$Chr[G:s$Probes[i]])
 + second <- min(m$Start[G:s$Probes[i]])
 + third <- max(m$End[G:s$Probes[i]])
 + c <- cbind(first,second,third,s$Values[i],s$Probes[i])
 + print (c)
 + G=(G+s$Probes[i])}
 + else if((G-1) < length(m$Sample1)) {
 + first <- unique(m$Chr[G:(G+s$Probes[i]-1)])
 + second <- min(m$Start[G:(G+s$Probes[i]-1)])
 + third <- max(m$End[G:(G+s$Probes[i]-1)])
 + c <- cbind(first,second,third,s$Values[i],s$Probes[i])
 + print (c)
 + G=(G+s$Probes[i])}
 + else {
 + G=1
 + first <- unique(m$Chr[G:s$Probes[i]])
 + second <- min(m$Start[G:s$Probes[i]])
 + third <- max(m$End[G:s$Probes[i]])
 + c <- cbind(first,second,third,s$Values[i],s$Probes[i])
 + print (c)
 + G=(G+s$Probes[i])}
 + }
 so the output is:
     first  second    third
 [1,] chr2 9896633 14404502 0 4
     first  second     third
 [1,] chr2 14421718 16048724 -0.43 4
     first  second     third
 [1,] chr2 37491676 37703009 0 2
     first  second    third
 [1,] chr2 9896633 9896690 0 2
     first  second     third
 [1,] chr2 14314039 16048724 -0.35 6
     first  second     third
 [1,] chr2 37491676 37703009 0 2

 I get almost the required output, but I need 3 modifications to this code:
 1) This is just a small part of the file (with 2 samples); my actual file
 has 150 samples, so how do I write the rle calls for that?
 2) How do I store all the computed c values in a data frame (here I am just
 printing them)?
 3) How do I include the sample name in the output?
 Waiting for your reply,
 Thanks,
 Suji


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread jim holtman
The solution that I sent will handle the 150 different samples; just
list the column names in the argument to the top 'lapply'.  You don't
need the 'rle' in my approach.
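
For 150 samples, typing every column name is unnecessary if the sample columns can be identified from the header. A sketch, assuming the first three columns are always Chr, start and end and every remaining column is a sample; the function name summarise_runs and the small data frame are hypothetical and not from the thread:

```r
# Small stand-in for the real file; only the column layout matters here.
x <- data.frame(Chr = rep("chr2", 4),
                start = c(10L, 20L, 30L, 40L),
                end = c(15L, 25L, 35L, 45L),
                sample1 = c(0, 0, -0.43, -0.43),
                sample2 = c(0, -0.35, -0.35, -0.35),
                stringsAsFactors = FALSE)

# Assumption: every column after the first three holds one sample.
sample_cols <- names(x)[-(1:3)]

# Same logic as the approach posted earlier in the thread, for one sample.
summarise_runs <- function(samp, df) {
    grps <- split(df, cumsum(c(0, diff(df[[samp]]) != 0)))
    do.call(rbind, lapply(grps, function(set) {
        data.frame(Sample = samp,
                   Chr = set$Chr[1L],
                   Start = min(set$start),
                   End = max(set$end),
                   Values = set[[samp]][1L],
                   Probes = nrow(set),
                   stringsAsFactors = FALSE)
    }))
}

# One data frame per sample, stacked into a single result.
result <- do.call(rbind, lapply(sample_cols, summarise_runs, df = x))
rownames(result) <- NULL
print(result)
```

If the file has other non-sample columns, sample_cols would need to be chosen differently, for example by a name pattern.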


Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread viritha k
Hi,
This is the code that I wrote for 3 samples:
code:
> m <- read.table("test.txt", sep='\t', header=TRUE, colClasses=c('character','integer','integer','numeric','numeric','numeric'))
>
> s <- data.frame(c(rle(m$Sample1)[[2]], rle(m$Sample2)[[2]], rle(m$Sample3)[[2]]), c(rle(m$Sample1)[[1]], rle(m$Sample2)[[1]], rle(m$Sample3)[[1]]))
>
> names(s) = c("Values", "Probes")
>
> # one row per run, so preallocate nrow(s) rows
> c <- data.frame(Sample=character(nrow(s)), Chr=character(nrow(s)), Start=numeric(nrow(s)), End=numeric(nrow(s)), Values=numeric(nrow(s)), Probes=numeric(nrow(s)), stringsAsFactors=FALSE)
> G = 1
> n = 4
>
> for(i in 1:length(s$Probes)){
+ if(G==1){c[i,1] <- names(m[n])
+ c[i,2] <- unique(m$Chr[G:s$Probes[i]])
+ c[i,3] <- min(m$Start[G:s$Probes[i]])
+ c[i,4] <- max(m$End[G:s$Probes[i]])
+ c[i,] <- cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
+ G=(G+s$Probes[i])}
+ else if((G-1) < length(m$Sample1)) {
+ c[i,1] <- names(m[n])
+ c[i,2] <- unique(m$Chr[G:(G+s$Probes[i]-1)])
+ c[i,3] <- min(m$Start[G:(G+s$Probes[i]-1)])
+ c[i,4] <- max(m$End[G:(G+s$Probes[i]-1)])
+ c[i,] <- cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
+ G=(G+s$Probes[i])}
+ else {
+ G=1
+ n=n+1
+ c[i,1] <- names(m[n])
+ c[i,2] <- unique(m$Chr[G:s$Probes[i]])
+ c[i,3] <- min(m$Start[G:s$Probes[i]])
+ c[i,4] <- max(m$End[G:s$Probes[i]])
+ c[i,] <- cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
+ G=(G+s$Probes[i])}}
>
> c
    Sample  Chr    Start      End Values Probes
1  Sample1 chr2  9896633 14404502      0      4
2  Sample1 chr2 14421718 16048724  -0.43      4
3  Sample1 chr2 37491676 37703009      0      2
4  Sample2 chr2  9896633  9896690      0      2
5  Sample2 chr2 14314039 16048724  -0.35      6
6  Sample2 chr2 37491676 37703009      0      2
7  Sample3 chr2  9896633 14314098      0      3
8  Sample3 chr2 14404467 16031769   0.32      3
9  Sample3 chr2 16036178 37491735   0.45      3
10 Sample3 chr2 37702947 37703009      0      1


The problem I am facing is extending the rle calls for Values and Probes
to more samples.
Your code definitely looks simpler, but I would like to read the file by just
giving its name, as in my code, because my original file contains 150
samples. How do I use lapply or rle for 150 such samples, all similar to
sample1 and sample2?
Waiting for your reply,
Thanks,
Suji
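
The rle() calls do not have to be written out per sample: the run lengths from a single rle() call already give the first and last row of every run via cumsum(). A sketch of that idea, assuming the column names Chr, Start and End used in the code above; the helper runs_for() and the small data frame are made up for illustration:

```r
m <- data.frame(Chr = rep("chr2", 5),
                Start = c(10L, 20L, 30L, 40L, 50L),
                End = c(15L, 25L, 35L, 45L, 55L),
                Sample1 = c(0, 0, -0.43, -0.43, 0),
                Sample2 = c(0, 0.32, 0.32, 0.32, 0.32),
                stringsAsFactors = FALSE)

# Hypothetical helper: one output row per run of equal values in column 'samp'.
runs_for <- function(samp, df) {
    r <- rle(df[[samp]])
    last <- cumsum(r$lengths)          # last row of each run
    first <- last - r$lengths + 1L     # first row of each run
    data.frame(Sample = samp,
               Chr = df$Chr[first],
               Start = mapply(function(a, b) min(df$Start[a:b]), first, last),
               End = mapply(function(a, b) max(df$End[a:b]), first, last),
               Values = r$values,
               Probes = r$lengths,
               stringsAsFactors = FALSE)
}

# Assumption: sample columns are those whose names begin with "Sample".
sample_cols <- grep("^Sample", names(m), value = TRUE)
res <- do.call(rbind, lapply(sample_cols, runs_for, df = m))
print(res)
```

Because each call returns a data frame, nothing has to be preallocated or filled row by row, and the numeric columns keep their types.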


Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread viritha k
Hi Jim,
Thanks for the reply. I do not want to use textConnection and paste in
each line, though; I want the input to be read from a file, like
m <- read.table("test.txt", sep='\t', header=TRUE, colClasses=c('character','integer','integer','numeric','numeric'))
How do I incorporate that into your code?
Thanks,
Suji

Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread jim holtman
I only used textConnection for the sample data.  Just put your file
name in the read.table; e.g.,

x <- read.table("test.txt", sep='\t', header=TRUE, colClasses=c('character','integer','integer','numeric','numeric'))

as you have in your email.  I used 'x' in my code, so I replaced your
'm' with 'x'.

Try it and see if it works; no reason it shouldn't.
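
With many sample columns, the colClasses vector can be built with rep() instead of being typed out. A sketch, assuming a tab-separated file with three leading columns followed only by sample columns; a temporary file stands in for test.txt, and n_samples is a placeholder for the real count:

```r
# Write a small tab-separated stand-in for test.txt to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("Chr\tstart\tend\tsample1\tsample2",
             "chr2\t9896633\t9896683\t0\t0",
             "chr2\t14314039\t14314098\t0\t-0.35"),
           path)

n_samples <- 2  # would be 150 for the full file
x <- read.table(path, sep = "\t", header = TRUE,
                colClasses = c("character", "integer", "integer",
                               rep("numeric", n_samples)))
str(x)

# Remove the temporary file again.
unlink(path)
```

The resulting x can then be passed to the lapply code unchanged, as long as the column names it refers to match the file's header.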


