Hi,

I would suggest using the Biopython package. It has a PDB parser that lets you extract specific information such as atom name, residue, chain, etc.

Bala
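A minimal sketch of what the Biopython route could look like, assuming Biopython is installed. The helper name `iter_atoms`, the tuple layout, and the file name `protein.pdb` are illustrative assumptions, not part of Biopython. The calls it relies on (`PDBParser`, `get_structure`, `get_atoms`, `get_parent`, `get_resname`, `get_id`, `get_name`, `get_coord`) come from `Bio.PDB`.

```python
def iter_atoms(pdb_path, structure_id="query"):
    """Yield (chain_id, res_name, res_seq, atom_name, x, y, z) for each atom.

    iter_atoms is a hypothetical helper for illustration; it is not part of
    Biopython.
    """
    # Imported inside the function so Biopython is only needed when called.
    from Bio.PDB import PDBParser

    parser = PDBParser(QUIET=True)  # QUIET suppresses parser warnings
    structure = parser.get_structure(structure_id, pdb_path)

    for atom in structure.get_atoms():
        residue = atom.get_parent()
        chain = residue.get_parent()
        # A residue id is a (hetero flag, sequence number, insertion code) tuple.
        _hetero_flag, res_seq, _insertion_code = residue.get_id()
        x, y, z = (float(c) for c in atom.get_coord())
        yield (chain.get_id(), residue.get_resname(), res_seq,
               atom.get_name(), x, y, z)


# Example usage (assumes a file named protein.pdb exists):
# for record in iter_atoms("protein.pdb"):
#     print(record)
```

Because the parser reads the fixed-width columns itself, a residue number that runs into the chain identifier (as in `A1005`) should not need special handling on the caller's side.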
On Wed, May 9, 2012 at 3:19 AM, Jerry Hill <[email protected]> wrote:

> On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <[email protected]> wrote:
> > Hello python community,
> >
> > I'm having a small issue with list indexing. I am extracting certain
> > information from a PDB (protein information) file and need certain fields of
> > the file to be copied into a list. The entries look like this:
> >
> > ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> > ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> > ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
> >
> > I am using the following syntax to parse these lines into a list:
> ...
> > charged_res_coord.append(atom_coord[i].split()[1:9])
>
> You're using split, assuming that there will be blank spaces between
> your fields. That's not true, though. PDB is a fixed length record
> format, according to the documentation I found here:
> http://www.wwpdb.org/docs.html
>
> If you just have a couple of items to pull out, you can just slice the
> string at the appropriate places. Based on those docs, you could pull
> the x, y, and z coordinates out like this:
>
> x_coord = atom_line[30:38]
> y_coord = atom_line[38:46]
> z_coord = atom_line[46:54]
>
> If you need to pull more of the data out, or you may want to reuse
> this code in the future, it might be worth actually parsing the record
> into all its parts.
> For a fixed length record, I usually do something like this:
>
> pdbdata = """
> ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
> ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N
> """.splitlines()
>
> atom_field_spec = [
>     slice(0,6),    # record name
>     slice(6,11),   # atom serial number
>     slice(12,16),  # atom name
>     slice(16,17),  # alternate location indicator
>     slice(17,20),  # residue name
>     slice(21,22),  # chain identifier
>     slice(22,26),  # residue sequence number
>     slice(26,27),  # insertion code
>     slice(30,38),  # x coordinate
>     slice(38,46),  # y coordinate
>     slice(46,54),  # z coordinate
>     slice(54,60),  # occupancy
>     slice(60,66),  # temperature factor
>     slice(76,78),  # element symbol
>     slice(78,80),  # charge
>     ]
>
> for line in pdbdata:
>     if line.startswith('ATOM'):
>         data = [line[field_spec] for field_spec in atom_field_spec]
>         print(data)
>
> You can build all kinds of fancy data structures on top of that if you
> want to. You could use that extracted data to build a namedtuple for
> convenient access to the data by names instead of indexes into a list,
> or to create instances of a custom class with whatever functionality
> you need.
>
> --
> Jerry
> _______________________________________________
> Tutor maillist - [email protected]
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

--
C. Balasubramanian
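Jerry's closing suggestion, building a namedtuple on top of the fixed-width slices, might look roughly like the sketch below. The names `ATOM_FIELDS`, `AtomRecord`, and `parse_atom_line` are made up for illustration, and the field names are assumptions loosely based on the wwPDB ATOM record description. The slices are zero-based, end-exclusive versions of the 1-based column ranges in that description.

```python
from collections import namedtuple

# (field name, slice) pairs for an ATOM record; names are illustrative.
ATOM_FIELDS = [
    ("record", slice(0, 6)),
    ("serial", slice(6, 11)),
    ("name", slice(12, 16)),
    ("alt_loc", slice(16, 17)),
    ("res_name", slice(17, 20)),
    ("chain_id", slice(21, 22)),
    ("res_seq", slice(22, 26)),
    ("i_code", slice(26, 27)),
    ("x", slice(30, 38)),
    ("y", slice(38, 46)),
    ("z", slice(46, 54)),
    ("occupancy", slice(54, 60)),
    ("temp_factor", slice(60, 66)),
    ("element", slice(76, 78)),
    ("charge", slice(78, 80)),
]

AtomRecord = namedtuple("AtomRecord", [name for name, _ in ATOM_FIELDS])

INT_FIELDS = {"serial", "res_seq"}
FLOAT_FIELDS = {"x", "y", "z", "occupancy", "temp_factor"}


def parse_atom_line(line):
    """Slice one ATOM line by column and convert the numeric fields."""
    values = []
    for name, field in ATOM_FIELDS:
        text = line[field].strip()
        if name in INT_FIELDS:
            values.append(int(text))
        elif name in FLOAT_FIELDS:
            values.append(float(text))
        else:
            values.append(text)  # left as a (possibly empty) string
    return AtomRecord(*values)


# Chain "A" and residue number 1005 touch here, so split() would fuse them.
line = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N"
atom = parse_atom_line(line)
print(atom.chain_id, atom.res_seq, atom.x, atom.y, atom.z)
```

Accessing `atom.res_seq` or `atom.x` by name avoids having to remember list positions, and slicing by column keeps `A` and `1005` apart where `line.split()` would return them as the single token `A1005`.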
