Hi Roger,

Thanks for the response.
> The use of an integer file format "flavor" argument allows the caller to
> customize the behavior of the readers and writers. The semantics is that a
> reasonable default is zero (for all bits), but that new features may be added
> without changing the API/ABI.
> Most of the bits above (for the writer) control strict compliance with the PDB
> format specification. For example, a flavor of 12 will write bond orders the
> way the RCSB expects them both throwing away bond orders and increasing
> the size of the PDB file.

As a test, I am using 2VCI, and am retrieving the PDB data from the RCSB using the following:

    import requests

    # Download the PDB-format entry for 2VCI from the RCSB
    url = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VCI"
    response = requests.get(url)
    pdb_block = response.content
    response.close()

pdb_block shows CONECT records only for the HETATM records.

If I now read into RDKit, using the defaults, and write back out using the defaults, I see CONECT records for every atom (i.e. protein as well). And I can't see any double bonds rendered in PyMOL:

    from rdkit import Chem
    from rdkit.Chem import AllChem

    # Read and write with the default flavor (0) in both directions
    pdb = Chem.MolFromPDBBlock(pdb_block)
    pdb_block_out = Chem.MolToPDBBlock(pdb)

First 10 CONECT records of output:

    CONECT 1 2
    CONECT 2 3 5
    CONECT 3 4 4 10
    CONECT 5 6
    CONECT 6 7
    CONECT 7 8 8 9
    CONECT 10 11
    CONECT 11 12 14
    CONECT 12 13 13 17
    CONECT 14 15 16

If I use Chem.MolToPDBBlock(pdb, flavor=12) I do, indeed, see the ligand CONECT records in what looks like the original format (albeit now numbered differently), and I still see CONECT records for the protein - but this PDB *will* render double bonds in PyMOL.

First 10 CONECT records of output:

    CONECT 3 4 4
    CONECT 7 8 8
    CONECT 12 13 13
    CONECT 19 20 20
    CONECT 23 24 24
    CONECT 28 29 29
    CONECT 35 36 36
    CONECT 38 39 39
    CONECT 40 42 42
    CONECT 41 43 43

And even if I use Chem.MolToPDBBlock(pdb, flavor=2) I still see CONECT records for every protein residue (and, again, I also see double bonds in PyMOL).
First 10 CONECT records of output:

    CONECT 1 2
    CONECT 2 1 3 5
    CONECT 3 2 4 10
    CONECT 4 3
    CONECT 5 2 6
    CONECT 6 5 7
    CONECT 7 6 8 9
    CONECT 8 7
    CONECT 9 7
    CONECT 10 3 11

Am I maybe doing something wrong with options in the reading step?

> For the reader, the flavor argument controls whether alternate locations are
> read (for use by PDB power users), or whether a sensible subset of atoms is
> used for the RDKit::ROMol.

Can you (or Greg) post a list of what the current input flavors do?

> > 6. I also seem to be seeing all of the bonds (for all
> > residues) being written out in CONECT records - such that they all
> > appear as single bonds in eg PyMOL - is this expected behaviour at the
> > moment?
> >
> > Another one for Roger.
>
> I believe this should work fine. RDKit's PDB file writer by default encodes
> the bond orders, which should be interpreted by PyMol. In the words of the
> late great Warren:
> http://www.phenix-online.org/pipermail/phenixbb/2008-April/012188.html
>
> We need to check where the bond orders are getting lost. If you read the
> PDB file back RDKit's PDB file reader and write out the SMILES does it have
> double bonds?

More weirdness here... Reading the 3 flavours of output (pdb_block_out) from above back in and showing the kekulised SMILES gives the same SMILES - but not fully kekulised...

print Chem.MolToSmiles(Chem.MolFromPDBBlock(pdb_block_out), kekuleSmiles=True) O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.CCNC(O)C1NOC(C2CC(C(C)C)C(O)CC2O)C1C1CCC(CN2CCOCC2)CC1.CCC(C)C(NC(=O)CNC(=O)C(NC(=O)C(CC(=O)O)NC(=O)C(NC(=O)C(NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(NC(=O)C(CCCNC(=N)N)NC(=O)C(CC(=O)O)NC(=O)C(CCC(N)=O)NC(=O)C(CCCCN)NC(=O)C(CC(N)=O)NC(=O)C1CCCN1C(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CC(N)=O)NC(=O)C(NC(=O)C(CC1:C:[NH]:C:N:1)NC(=O)C(CC(C)C)NC(=O)C(CCC(=O)O)NC(=O)C(CCCCN)NC(=O)CNC(=O)C(CO)NC(=O)C(CC(=O)O)NC(=O)C(CC(C)C)NC(=O)C(CCCCN)NC(=O)C(CO)NC(=O)C1CCCN1C(=O)C(CC(=O)O)NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CO)NC(=O)C(CCC(=O)O)NC(=O)C(CC1:C:C:C(O):C:C:1 
)NC(=O)C(CCCNC(=N)N)NC(=O)C(NC(=O)C(CCCCN)NC(=O)C(CC(=O)O)NC(=O)C(CC(C)C)NC(=O)C(C)NC(=O)C(CC(=O)O)NC(=O)C(CO)NC(=O)C(CO)NC(=O)C(CC(N)=O)NC(=O)C(CO)NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CCC(=O)O)NC(=O)C(CCCNC(=N)N)NC(=O)C(CC(C)C)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(NC(=O)C(CCC(=O)O)NC(=O)C(CCCCN)NC(=O)C(CC(N)=O)NC(=O)C(CO)NC(=O)C(CC1:C:C:C(O):C:C:1)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(NC(=O)C(CC(N)=O)NC(=O)C(NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CO)NC(=O)C(CCSC)NC(=O)C(CC(C)C)NC(=O)C(CCC(N)=O)NC(=O)C(C)NC(=O)C(NC(=O)C(CCC(=O)O)NC(=O)C(C)NC(=O)C(CCC(N)=O)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(C)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(NC(=O)C(CCC(=O)O)NC(=O)C(NC(=O)C(N)CCC(=O)O)C(C)C)C(C)O)C(C)CC)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)CC)C(C)O)C(C)O)C(C)CC)C(C)C)C(C)O)C(=O)NCC(=O)NC(CCSC)C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(C)C(=O)NC(CC(=O)O)C(=O)NC(CC(C)C)C(=O)NC(C(=O)NC(CC(N)=O)C(=O)NC(CC(N)=O)C(=O)NC(CC(C)C)C(=O)NCC(=O)NC(C(=O)NC(C(=O)NC(C)C(=O)NC(CCCCN)C(=O)NC(CO)C(=O)NCC(=O)NC(C(=O)NC(CCCCN)C(=O)NC(C)C(=O) NC(CC1:C:C:C:C:C:1)C(=O)NC(CCSC)C(=O)NC(CCC(=O)O)C(=O)NC(C)C(=O)NC(CC(C)C)C(=O)NC(C)C(=O)NC(C)C(=O)NCC(=O)NC(C)C(=O)NC(CC(=O)O)C(=O)NC(C(=O)NC(CO)C(=O)NC(CCSC)C(=O)NC(C(=O)NCC(=O)NC(CCC(N)=O)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NCC(=O)NC(C(=O)NCC(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(CO)C(=O)NC(C)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(CC(C)C)C(=O)NC(C(=O)NC(C)C(=O)NC(CCC(=O)O)C(=O)NC(CCCCN)C(=O)NC(C(=O)NC(C(=O)NC(C(=O)NC(C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(CC1:C:[NH]:C:N:1)C(=O)NC(CC(N)=O)C(=O)NC(CC(=O)O)C(=O)NC(CC(=O)O)C(=O)NC(CCC(=O)O)C(=O)NC(CCC(N)=O)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(C)C(=O)NC(CC1:C:[NH]:C2:C:C:C:C:C:1:2)C(=O)NC(CCC(=O)O)C(=O)NC(CO)C(=O)NC(CO)C(=O)NC(C)C(=O)NCC(=O)NCC(=O)NC(CO)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(C(=O)NC(C(=O)NC(CCCNC(=N)N)C(=O)NC(C(=O)NC(CC(=O)O)C(=O)NC(C(=O)NCC(=O)NC(CCC(=O)O)C(=O)N1CCCC1C(=O)NC(CCSC)C(=O)NCC(=O)NC(CCCNC(=N)N)C(=O)NCC(=O)NC(C(=O)NC(CCCCN)C(=O)NC(C(=O)NC(C(=O)NC(CC(C)C)C(=O)NC(CC1:C:[NH]:C:N:1)C(=O)NC(CC(C)C)C(=O)NC(CCCCN)C(=O)NC(C 
CC(=O)O)C(=O)NC(CC(=O)O)C(=O)NC(CCC(N)=O)C(=O)NC(C(=O)NC(CCC(=O)O)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(CC(C)C)C(=O)NC(CCC(=O)O)C(=O)NC(CCC(=O)O)C(=O)NC(CCCNC(=N)N)C(=O)NC(CCCNC(=N)N)C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(CCC(=O)O)C(=O)NC(C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(CCCCN)C(=O)NC(CC1:C:[NH]:C:N:1)C(=O)NC(CO)C(=O)NC(CCC(N)=O)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(C(=O)NCC(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)N1CCCC1C(=O)NC(C(=O)NC(C(=O)NC(CC(C)C)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(C(=O)NC(CCC(=O)O)C(N)=O)C(C)C)C(C)O)C(C)CC)C(C)CC)C(C)C)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)C)C(C)O)C(C)O)C(C)O)C(C)C)C(C)O)C(C)O)C(C)CC)C(C)C)C(C)O)C(C)C)C(C)C)C(C)C)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)O)C(C)CC)C(C)O

Hopefully what I have described above clarifies what I am seeing(?)

It looks like two issues:

(1) non-rendering of bond orders when the default flavor=0 is used (either due to the format of the CONECT block, or PyMOL's interpretation of it?) - but the double bonds aren't lost; and

(2) non-kekulisable aromatics coming from Chem.MolFromPDBBlock() - at least with the default options(?)

Cheers
James
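One way to compare the CONECT listings quoted above is to reduce each to a set of bonds with an inferred order. The following is only a minimal sketch under stated assumptions: it assumes that a partner serial repeated on one CONECT line encodes a higher bond order (as the "3 4 4" style records suggest), and it splits on whitespace rather than reading the fixed-width columns of a real PDB file. The helper name parse_conect is hypothetical and is not part of RDKit or PyMOL; the sample records are the first few lines of the default and flavor=2 outputs shown earlier.

```python
from collections import Counter


def parse_conect(lines):
    """Map each bond (low_serial, high_serial) to an inferred bond order.

    Assumption: a partner serial listed n times on one CONECT line means
    bond order n. Whitespace splitting is a simplification; real PDB files
    use fixed 5-character columns.
    """
    bonds = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "CONECT":
            continue
        atom = int(fields[1])
        partners = Counter(int(f) for f in fields[2:])
        for partner, count in partners.items():
            key = (min(atom, partner), max(atom, partner))
            # Keep the highest order seen, since a bond may be listed
            # from both of its atoms.
            bonds[key] = max(bonds.get(key, 0), count)
    return bonds


# First few records of the default (flavor=0) output quoted above
default_records = [
    "CONECT 1 2",
    "CONECT 2 3 5",
    "CONECT 3 4 4 10",
    "CONECT 5 6",
]

# First few records of the flavor=2 output quoted above
flavor2_records = [
    "CONECT 1 2",
    "CONECT 2 1 3 5",
    "CONECT 3 2 4 10",
    "CONECT 4 3",
    "CONECT 5 2 6",
]

default_bonds = parse_conect(default_records)
flavor2_bonds = parse_conect(flavor2_records)

# Report bonds whose inferred order differs between the two listings
for bond in sorted(set(default_bonds) & set(flavor2_bonds)):
    if default_bonds[bond] != flavor2_bonds[bond]:
        print(bond, default_bonds[bond], flavor2_bonds[bond])
```

Under that assumption, the two excerpts describe the same connectivity and differ only in the order inferred for the bond between atoms 3 and 4, which is the kind of difference worth checking when a viewer draws or omits double bonds.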