Re: [R] problem with reading data files with different numbers of lines to skip

john seers (IFR) Fri, 03 Aug 2007 05:44:25 -0700

Hi Tom

It looks as if you are reading in GenePix files. I believe the second line of 
the file gives, as its first number, how many header lines follow. Something 
like this, specifying 27 header lines (so 29 lines to skip in total, counting 
these first two):


ATF     1
27      43
Type=GenePix Results 1.4        
DateTime=2003/11/14 17:18:30    

If so, here is a function I use to do what you want. If your files have a 
different format, you will need to change how the number of lines to skip 
is set.



# Preprocess the GenePix files: strip off the leading header lines
dopix <- function(genepixfiles, workingdir) {
    pre <- "Pre"
    # Read in each GenePix file, strip unwanted rows and write out again
    for (pixfile in genepixfiles) {
        # workingdir must end with a path separator, e.g. "out/"
        pixfileout <- paste(workingdir, pre, basename(pixfile), sep="")
        # Line 2 holds the number of header lines as its first field
        secondline <- read.table(pixfile, skip=1, nrows=1)
        # Skip those header lines plus the first two lines of the file
        skiplines <- as.numeric(secondline[[1]]) + 2
        outdf <- read.table(pixfile, header=TRUE, skip=skiplines, sep="\t")
        write.table(outdf, file=pixfileout, sep="\t", row.names=FALSE)
    }
}
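
To illustrate the skip calculation used in the function above, here is a 
minimal sketch on a made-up file. The file contents, the temporary file and 
the variable names are assumptions for illustration only, not real GenePix 
output; the sketch simply shows that the first number on line 2, plus 2, 
lands read.table on the column-name row.

```r
# Build a tiny made-up ATF-style file: 2 header lines, 3 data columns.
tmp <- tempfile(fileext = ".gpr")
writeLines(c(
    "ATF\t1.0",
    "2\t3",
    "Type=GenePix Results 3",
    "PixelSize=10",
    "Block\tColumn\tRow",
    "1\t1\t1",
    "1\t2\t1"
), tmp)

# Line 2: the first field is the number of header lines that follow.
secondline <- read.table(tmp, skip = 1, nrows = 1)
skiplines <- as.numeric(secondline[[1]]) + 2   # + 2 for the first two lines

# Reading now starts at the "Block Column Row" line, used as the header.
dat <- read.table(tmp, header = TRUE, skip = skiplines, sep = "\t")
print(skiplines)
print(dat)
```

If the second line of a file is missing or laid out differently, this 
calculation will not apply and the skip count has to be found another way.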


Regards

John Seers


 -----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Cohen
Sent: 03 August 2007 13:04
To: r-help@stat.math.ethz.ch
Subject: Re: [R] problem with reading data files with different numbers of lines 
to skip

Thanks to Ted and Gabor for your responses. 
  I apologise for not being clear in my previous description of the problem. I 
tried your suggestions using readLines but couldn't make them work. I will now 
explain the problem in more detail and hope that you can help me out.
   
   I have 30 data files; some of them have 33 lines that I want to skip and the 
rest have 31 (examples below with bold text). That is, I only want to keep the 
lines starting from 
          Block  Column  Row  Name  ID
   
  I read in the data files with a loop like the one below. The problem is: how 
do I tell the loop to skip 31 lines in some data files and 33 in the rest?
   
  > for (i in 1:num.files) {
  >   a<-read.table(file=data[i],
  >                 header=T,skip=31,sep='\t',na.strings="NA")
  > }
   
  Thanks for your help,
  Tom
   
  # 33 lines to skip
   
  Type=GenePix Results 3
  DateTime=2006/10/20 13:35:11
  Settings=
  GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-human-pep2.gal
  PixelSize=10
  Wavelengths=635
  ImageFiles=M:\Peptidearrays\061020\742-2.tif 1
  NormalizationMethod=None
  NormalizationFactors=1
  JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-2.s1.jpg
  StdDev=Type 1
  RatioFormulations=W1/W2 (635/)
  FeatureType=Circular
  Barcode=
  BackgroundSubtraction=LocalFeature
  ImageOrigin=560, 1360
  JpegOrigin=1940, 3670
  Creator=GenePix Pro 6.0.1.25
  Scanner=GenePix 4000B [84948]
  FocusPosition=0
  Temperature=30.2
  LinesAveraged=1
  Comment=
  PMTGain=600
  ScanPower=100
  LaserPower=3.36
  Filters=<Empty>
  ScanRegion=56,136,2123,6532
  Supplier=Genetix Ltd.
  ArrayerSoftwareName=MicroArraying
  ArrayerSoftwareVersion=QSoft XP Build 6450 (Revision 131)
  Block  Column  Row  Name                  ID             X     Y     Dia.  F635 Median  F635 Mean
  1      1       1    IgG-human             none           2390  4140  200   301          317
  1      2       1    >PGDR_HUMAN (P09619)  AHASDEIYEIMQK  2630  4140  200   254          250
  1      3       1    >ML1X_HUMAN (Q13585)  AIAHPVSDDSDLP  2860  4140  200   268          252
  1000 more rows....
   
   
   
  # 31 lines to skip
   
  ATF  1.0
  29  41
  Type=GenePix Results 3
  DateTime=2006/10/20 13:05:20
  Settings=
  GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-s2.gal
  PixelSize=10
  Wavelengths=635
  ImageFiles=M:\Peptidearrays\061020\742-4.tif 1
  NormalizationMethod=None
  NormalizationFactors=1
  JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-4.s2.jpg
  StdDev=Type 1
  RatioFormulations=W1/W2 (635/)
  FeatureType=Circular
  Barcode=
  BackgroundSubtraction=LocalFeature
  ImageOrigin=560, 1360
  JpegOrigin=1950, 24310
  Creator=GenePix Pro 6.0.1.25
  Scanner=GenePix 4000B [84948]
  FocusPosition=0
  Temperature=28.49
  LinesAveraged=1
  Comment=
  PMTGain=600
  ScanPower=100
  LaserPower=3.32
  Filters=<Empty>
  ScanRegion=56,136,2113,6532
  Supplier=
  Block  Column  Row  Name                  ID             X     Y      Dia.  F635 Median  F635 Mean
  1      1       1    IgG-human             none           2370  24780  200   133          175
  1      2       1    >PGDR_HUMAN (P09619)  AHASDEIYEIMQK  2600  24780  200   120          121
  1      3       1    >ML1X_HUMAN (Q13585)  AIAHPVSDDSDLP  2840  24780  200   120          118
  1000 more rows....
   
  
[EMAIL PROTECTED] wrote:
  On 02-Aug-07 21:14:20, Tom Cohen wrote:
> Dear List,
> 
> I have 30 data files that need different numbers of lines (31 or 33) 
> skipped before reading. With the skip option I can only choose one 
> value, either 31 or 33. The data files with 31 lines have no blank 
> rows between those lines and the header row. How can I read the files 
> without manually checking which files have 31 lines and which have 33? 
> The only text line I want to keep is the header.
> 
> Thanks for your help,
> Tom
> 
> 
> for (i in 1:num.files) {
> a<-read.table(file=data[i],
> header=T,skip=31,sep='\t',na.strings="NA")
> 
> }

Apologies, I misunderstood your description in my previous response (I thought 
that the total number of lines in one of your files was either 31 or 33, and 
you wanted to know which was which).

I now think you mean that there are either 0 (you want to skip 31) or 2 (you 
want to skip 33) blank lines in the first 33, and then you want the remainder 
(as well as the header). Though it's still not really clear ...

You can find out how many blank lines there are in the first 33 with

> sum(cbind(readLines("~/00_junk/temp.tr", 33))=="")

and then choose how many lines to skip.
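
As a rough sketch of how that count could drive the loop: the helper below 
adds the number of blank lines found in the first 33 lines to a base skip of 
31, so 0 blank lines gives 31 and 2 gives 33. The helper name skip_for, its 
arguments and the demo files are made up for illustration, and treating 
whitespace-only lines as blank is an assumption that may not match the real 
files.

```r
# Hypothetical helper: 31 lines to skip, plus one for every blank line
# found among the first 33 lines of the file.
skip_for <- function(path, base_skip = 31, look = 33) {
    first <- readLines(path, n = look)
    # Assumption: a line holding only whitespace also counts as blank.
    base_skip + sum(grepl("^\\s*$", first))
}

# Made-up demo files: 31 header lines, optionally 2 blank lines, then a table.
header <- paste0("Key", 1:31, "=value")
table_lines <- c("Block\tColumn\tRow", "1\t1\t1", "1\t2\t1")
f31 <- tempfile(); writeLines(c(header, table_lines), f31)
f33 <- tempfile(); writeLines(c(header, "", "", table_lines), f33)

# Same loop shape as in the question, with skip chosen per file.
for (f in c(f31, f33)) {
    a <- read.table(f, header = TRUE, skip = skip_for(f), sep = "\t")
    print(dim(a))
}
```

This only works if the extra lines really are blank; if the longer files 
instead carry two additional non-blank header lines, the count stays at 0 and 
a different rule is needed.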

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding)
Fax-to-email: +44 (0)870 094 0861
Date: 03-Aug-07 Time: 00:11:21
------------------------------ XFMail ------------------------------


       

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
