Spyros Charonis wrote:
> Hello Pythoners,
>
> I am trying to extract certain fields from a file whose text looks like
> this:
>
> COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
> COMPND 3 CHAIN: A, B;
>
> COMPND 10 MOL_ID: 2;
> COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
> COMPND 12 CHAIN: D, F;
> COMPND 13 ENGINEERED: YES;
> COMPND 14 MOL_ID: 3;
> COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
> COMPND 16 CHAIN: E, G;
>
> I would like the chain IDs, but only those following the text heading
> "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with D,F,E,G which
> excludes A,B, which have a non-antibody text heading. I am using the
> following syntax:
>
> with open(filename) as file:
>     scanfile=file.readlines()
>     for line in scanfile:
>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>         elif line[0:6]=='COMPND' and 'CHAIN' in line:
>             print line
There is no reason to use readlines in this example; just iterate over the
file object directly.

with open(filename) as file:
    for line in file:
        if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
        elif line[0:6]=='COMPND' and 'CHAIN' in line:
            print line

> But this yields:
>
> COMPND 3 CHAIN: A, B;
> COMPND 12 CHAIN: D, F;
> COMPND 16 CHAIN: E, G;
>
> I would like to ignore the first line since A,B correspond to non-antibody
> text headings, and instead want to extract only D,F & E,G, whose text
> headings are specified as antibody fragments.
>
> Many thanks,
> Spyros

The 'continue' only skips the 'FAB FRAGMENT' line itself; nothing remembers
that it was seen, so every 'CHAIN' line is still printed.

Will 'FAB FRAGMENT' always be the line before 'CHAIN'? If so, then just keep
track of the previous line.

>>> raw = ('COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;\n'
...        'COMPND 3 CHAIN: A, B;\n'
...        'COMPND 10 MOL_ID: 2;\n'
...        'COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;\n'
...        'COMPND 12 CHAIN: D, F;\n'
...        'COMPND 13 ENGINEERED: YES;\n'
...        'COMPND 14 MOL_ID: 3;\n'
...        'COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;\n'
...        'COMPND 16 CHAIN: E, G;')
>>> prev = ''
>>> chains = []
>>> for line in raw.split('\n'):
...     # A CHAIN line counts only if the line before it named a Fab fragment.
...     if 'COMPND' in prev and 'FAB FRAGMENT' in prev and 'CHAIN' in line:
...         chains.extend(line.split(':')[1].replace(',','').replace(';','').split())
...     prev = line
...
>>> chains
['D', 'F', 'E', 'G']

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
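
If 'FAB FRAGMENT' is not always the line directly before 'CHAIN', one alternative to tracking the previous line is to remember whether the most recent MOLECULE record mentioned a Fab fragment. The following is only a sketch under assumptions: the names fab_chains, in_fab and sample are made up for illustration, and it assumes every record looks like "COMPND <number> KEY: value;" as in the sample above, with no values continued across lines. It uses the print function, so it is written for Python 3.

```python
def fab_chains(lines):
    """Collect chain IDs that belong to a molecule described as a Fab fragment."""
    chains = []
    in_fab = False  # True while the current molecule mentions 'FAB FRAGMENT'
    for line in lines:
        if not line.startswith('COMPND'):
            continue
        # Split off the record name and the serial number, keep "KEY: value;"
        fields = line.split(None, 2)
        if len(fields) < 3 or ':' not in fields[2]:
            continue
        key, value = fields[2].split(':', 1)
        key = key.strip()
        if key == 'MOL_ID':
            in_fab = False  # a new molecule starts, forget the old state
        elif key == 'MOLECULE':
            in_fab = 'FAB FRAGMENT' in value
        elif key == 'CHAIN' and in_fab:
            ids = value.strip().rstrip(';').split(',')
            chains.extend(c.strip() for c in ids if c.strip())
    return chains


# The sample records from the original question.
sample = [
    'COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;',
    'COMPND 3 CHAIN: A, B;',
    'COMPND 10 MOL_ID: 2;',
    'COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;',
    'COMPND 12 CHAIN: D, F;',
    'COMPND 13 ENGINEERED: YES;',
    'COMPND 14 MOL_ID: 3;',
    'COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;',
    'COMPND 16 CHAIN: E, G;',
]
print(fab_chains(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, e.g. fab_chains(f) inside a "with open(filename) as f:" block, without calling readlines first.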