Re: [Tutor] Text Processing Query

Prasad, Ramit Thu, 14 Mar 2013 10:44:52 -0700

Spyros Charonis wrote:
> Hello Pythoners,
> 
> I am trying to extract certain fields from a file whose text looks like 
> this:
> 
> COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
> COMPND   3 CHAIN: A, B;
> 
> COMPND  10 MOL_ID: 2;
> COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
> COMPND  12 CHAIN: D, F;
> COMPND  13 ENGINEERED: YES;
> COMPND  14 MOL_ID: 3;
> COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
> COMPND  16 CHAIN: E, G;
> 
> I would like the chain IDs, but only those following the text heading 
> "ANTIBODY FAB FRAGMENT", i.e. I
> need to create a list with D,F,E,G  which excludes A,B which have a 
> non-antibody text heading. I am
> using the following syntax:
> 
> with open(filename) as file:
>     scanfile=file.readlines()
>     for line in scanfile:
>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>         elif line[0:6]=='COMPND' and 'CHAIN' in line:
>             print line


There is no reason to use readlines() in this example. It reads the
whole file into a list first; iterating over the file object directly
gives you the same lines one at a time.

 with open(filename) as file:
     for line in file:
         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
         elif line[0:6]=='COMPND' and 'CHAIN' in line:
             print line


> 
> But this yields:
> 
> COMPND   3 CHAIN: A, B;
> COMPND  12 CHAIN: D, F;
> COMPND  16 CHAIN: E, G;
> 
> I would like to ignore the first line since A,B correspond to non-antibody 
> text headings, and instead
> want to extract only D,F & E,G whose text headings are specified as antibody 
> fragments.
> 
> Many thanks,
> Spyros
> 

Will 'FAB FRAGMENT' always be the line before 'CHAIN'? 
If so, then just keep track of the previous line. 

>>> raw = ('COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;\n'
...        'COMPND   3 CHAIN: A, B;\n'
...        'COMPND  10 MOL_ID: 2;\n'
...        'COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;\n'
...        'COMPND  12 CHAIN: D, F;\n'
...        'COMPND  13 ENGINEERED: YES;\n'
...        'COMPND  14 MOL_ID: 3;\n'
...        'COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;\n'
...        'COMPND  16 CHAIN: E, G;')

>>> prev = ''
>>> chains = []
>>> for line in raw.split('\n'):
...     if 'COMPND' in prev and 'FAB FRAGMENT' in prev and 'CHAIN' in line:
...         ids = line.split(':')[1].replace(',', '').replace(';', '')
...         chains.extend(ids.split())
...     prev = line
...     
>>> chains
['D', 'F', 'E', 'G']
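
If 'FAB FRAGMENT' is not guaranteed to sit on the line directly before
'CHAIN' (say an ENGINEERED line ends up between them), one alternative is
to remember whether the most recent MOLECULE field was an antibody
fragment and only collect CHAIN fields while that flag is set. The
following is a rough sketch, not a full PDB parser. The function name
antibody_chains and the marker parameter are made up for illustration,
and it assumes each CHAIN list fits on a single COMPND line.

```python
def antibody_chains(lines, marker='FAB FRAGMENT'):
    """Collect chain IDs from CHAIN fields whose most recent
    MOLECULE field contains `marker` (hypothetical helper)."""
    chains = []
    in_antibody = False  # True while the current MOLECULE matches marker
    for line in lines:
        # Expect: record name, serial number, then "KEY: value;"
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[0] != 'COMPND':
            continue
        key, sep, value = parts[2].partition(':')
        if not sep:
            continue  # no "KEY:" on this line (assumed not to occur here)
        key = key.strip()
        if key == 'MOLECULE':
            in_antibody = marker in value
        elif key == 'CHAIN' and in_antibody:
            ids = value.strip().rstrip(';').split(',')
            chains.extend(c.strip() for c in ids if c.strip())
    return chains


sample = """\
COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
COMPND   3 CHAIN: A, B;
COMPND  10 MOL_ID: 2;
COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
COMPND  12 CHAIN: D, F;
COMPND  13 ENGINEERED: YES;
COMPND  14 MOL_ID: 3;
COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
COMPND  16 CHAIN: E, G;
"""

result = antibody_chains(sample.splitlines())
```

Because the function only needs an iterable of lines, you should be able
to pass the open file object straight in, e.g. antibody_chains(f) inside
a "with open(filename) as f:" block. Splitting on ':' and comparing the
key also avoids the trap of 'CHAIN' in line matching a MOLECULE line
such as "ANTIBODY FAB FRAGMENT LIGHT CHAIN".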


