[R] read only certain parts of a file

2007-10-09 Thread João Fadista
Dear all,
 
I would like to know how I can read a text file and create a data frame from
only certain parts of the file.
For instance, from this text file:
 
===

Matches For Query 0 (108 bases): 19_0070

===

Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

89 19_0070 Chr15 3 108 43251883 43251778 C 106 95.28 

88 19_0070 Chr1 4 108 85826948 85826844 C 105 95.24 

===

Matches For Query 1 (124 bases): 24_1262

===

Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

99 24_1262 Chr6 16 124 35738256 35738364 F 109 100.00 

 

I would like to have a data frame that has only:

Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

89 19_0070 Chr15 3 108 43251883 43251778 C 106 95.28 

88 19_0070 Chr1 4 108 85826948 85826844 C 105 95.24 

99 24_1262 Chr6 16 124 35738256 35738364 F 109 100.00 

 
 
Best regards,
João Fadista


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read only certain parts of a file

2007-10-09 Thread Gabor Grothendieck
Here are two possibilities.  The first extracts all lines that consist
only of alphanumerics, spaces, underscores and periods and then takes
the unique ones, while the second extracts all lines with 10 fields and
then also takes the unique lines.  Both then read the result using
read.table.

The first one assumes the garbage always has a character not in the set
indicated and the second one assumes the garbage never has 10 fields.
You can probably come up with other rules as well along these lines.


Lines.raw <- "===

Matches For Query 0 (108 bases): 19_0070

===

Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

89 19_0070 Chr15 3 108 43251883 43251778 C 106 95.28

88 19_0070 Chr1 4 108 85826948 85826844 C 105 95.24

===

Matches For Query 1 (124 bases): 24_1262

===

Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

99 24_1262 Chr6 16 124 35738256 35738364 F 109 100.00
"

# keep only lines made up of alphanumerics, space, period and underscore;
# unique() drops the repeated header line
Lines <- readLines(textConnection(Lines.raw))
Lines <- unique(grep("^[[:alnum:] ._]+$", Lines, value = TRUE))
read.table(textConnection(Lines), header = TRUE)

# or

# keep only lines with exactly 10 whitespace-separated fields
Lines <- readLines(textConnection(Lines.raw))
idx <- count.fields(textConnection(Lines.raw), blank.lines.skip = FALSE)
Lines <- unique(Lines[idx == 10])
read.table(textConnection(Lines), header = TRUE)
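
Since the original question was about a file on disk rather than a string, here is a rough sketch of the 10-field rule applied to a file path. The function name read_matches, its n_fields argument and the temporary file are made up for illustration; the sketch also assumes every wanted row (including the header) has exactly 10 whitespace-separated fields and that no other line does.

```r
# Hypothetical helper (not from the thread): keep only the lines of a
# report file that have `n_fields` whitespace-separated fields.
read_matches <- function(path, n_fields = 10) {
  lines <- readLines(path)
  # Count fields per line; blank lines are kept so the counts stay
  # aligned with `lines`. Quotes and comment characters are disabled
  # so that every character is treated as ordinary data.
  n <- count.fields(path, blank.lines.skip = FALSE,
                    quote = "", comment.char = "")
  # unique() drops the header line repeated before each block of hits
  keep <- unique(lines[!is.na(n) & n == n_fields])
  read.table(text = keep, header = TRUE, stringsAsFactors = FALSE)
}

# A temporary file stands in for the real report here.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "===", "",
  "Matches For Query 0 (108 bases): 19_0070", "",
  "===", "",
  "Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity", "",
  "89 19_0070 Chr15 3 108 43251883 43251778 C 106 95.28", "",
  "88 19_0070 Chr1 4 108 85826948 85826844 C 105 95.24", "",
  "===", "",
  "Matches For Query 1 (124 bases): 24_1262", "",
  "===", "",
  "Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity", "",
  "99 24_1262 Chr6 16 124 35738256 35738364 F 109 100.00"
), tmp)

matches <- read_matches(tmp)
print(matches)
```

With a real file you would call read_matches() on its path instead of tmp. If the garbage lines can also have 10 fields, the regular expression rule is probably the safer of the two.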


