Hi James and Greg,

On Oct 25, 2013, at 4:03 AM, Greg Landrum wrote:
> 1. Do I remember correctly that there was a proposal (from
> Roger) to add some auto bond-type perception to the PDB parser for
> ligands (or is that just wishful thinking!)?
>
> Roger will have to confirm this, but I believe he said something
> along the lines of "that way lies madness".

My first comment is that a computational chemistry toolkit's "assign bond orders, formal charges and protonation states from 3D coordinates" function is/should be a (sanitize-like) step independent of its PDB file reader. For one thing, this functionality is required for reading XYZ format files, Schrodinger Maestro files, and quantum mechanics file formats, such as Gaussian and MOPAC. For another thing, many PDB file reading applications don't require bond orders, e.g. GRASP surfaces and many docking functions/forcefield calculations, so handling bond order perception independently of PDB reading has some merit.

All I'll say at this stage is that correctly perceiving bonds, formal charges and protonation state (they're all interdependent) is probably more complicated than most folks think. Indeed, many of the crystallographers at the RDKit meeting claimed it was impossible. The "bondage" algorithm used in OpenEye's OEChem is several thousand lines of C++, and was still improving (on things like iron-sulfur clusters and oxime vs. nitroso perception) up to the point I left Santa Fe in 2010.

The state-of-the-art from a decade ago is described at:
http://www.daylight.com/meetings/mug01/Sayle/m4xbondage.html
and was used at the time to produce a searchable database of PDB ligands:
http://www.metaphorics.com/products/luna.html

> 3. Is there some explanation for what the ‘flavor’ option does for
> reading/writing PDB?
>
> I'm not sure about the reader. Roger, can you answer that?
>
> This is what's in the C++ for the PDBWriter:
> // PDBWriter support multiple "flavors" of PDB output
> // flavor & 1 : Write MODEL/ENDMDL lines around each record
> // flavor & 2 : Don't write any CONECT records
> // flavor & 4 : Write CONECT records in both directions
> // flavor & 8 : Don't use multiple CONECTs to encode bond order
> // flavor & 16 : Write MASTER record
> // flavor & 32 : Write TER record
>
> This is now in the docs for both the Python and C++ code.

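As a small illustration of how the flavor bits quoted above combine, here is a minimal sketch in plain Python. The bit meanings are copied from the quoted PDBWriter comment; the names PDB_WRITER_FLAVOR_BITS and describe_flavor are hypothetical helpers made up for this example and are not part of the RDKit API.

```python
# Bit meanings copied from the PDBWriter comment quoted above.
# PDB_WRITER_FLAVOR_BITS and describe_flavor are hypothetical names used
# only for illustration; they are not RDKit functions.
PDB_WRITER_FLAVOR_BITS = {
    1: "Write MODEL/ENDMDL lines around each record",
    2: "Don't write any CONECT records",
    4: "Write CONECT records in both directions",
    8: "Don't use multiple CONECTs to encode bond order",
    16: "Write MASTER record",
    32: "Write TER record",
}


def describe_flavor(flavor):
    """Return the descriptions of every bit set in a writer flavor value."""
    return [text for bit, text in PDB_WRITER_FLAVOR_BITS.items() if flavor & bit]


if __name__ == "__main__":
    # 12 == 4 | 8, so two options are reported.
    for line in describe_flavor(12):
        print(line)
```

A flavor of 0 sets no bits and so returns an empty list, which matches the idea that zero means "all defaults".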
The use of an integer file format "flavor" argument allows the caller to customize the behavior of the readers and writers. The semantics are that zero (all bits clear) is a reasonable default, and that new features may be added without changing the API/ABI. Most of the bits above (for the writer) control strict compliance with the PDB format specification. For example, a flavor of 12 (4 + 8) will write connectivity the way the RCSB expects it, both throwing away bond orders and increasing the size of the PDB file. For the reader, the flavor argument controls whether alternate locations are read (for use by PDB power users), or whether a sensible subset of atoms is used for the RDKit::ROMol.

> 5. It seems to me that GetResidueNumber() and
> GetSerialNumber() may have got mixed-up at some point(?). At least,
> when I call GetSerialNumber() I see what appears to be the residue
> number; and when I call GetResidueNumber() I get “0”!
>
> This was another dumb bug from me. It's fixed.

Greg is being modest. At the time of the RDKit meeting, the MonomerInfo data structure had just a "SerialNumber" field, which was used for storing residue numbers. One of my suggestions back to Greg was that although everything worked, this nomenclature might be confusing to folks using the API, so I suggested renaming the field for the Q3 beta. The better solution was to support fields for both ResidueNumber and SerialNumber, but following that change I failed to send the patch to make the reader/writer use the correct (changed) residueNumber field, and record/honour the serial number field. My apologies. I share some of the blame for this one.

> 6. I also seem to be seeing all of the bonds (for all
> residues) being written out in CONECT records – such that they all
> appear as single bonds in eg PyMOL – is this expected behaviour at
> the moment?
>
> Another one for Roger. I believe this should work fine.

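One way to check a written file by hand is to count how often each bonded partner is repeated in the CONECT records. The sketch below is plain Python and rests on two assumptions: that "multiple CONECTs" in the flavor list means a double bond lists the partner atom twice (and a triple bond three times), and that serial numbers sit in the fixed 5-character columns of the PDB CONECT record. The function name conect_bond_orders is hypothetical and is not part of RDKit.

```python
from collections import Counter


def conect_bond_orders(pdb_text):
    """Infer bond multiplicities from repeated CONECT entries.

    Assumes a double bond is encoded by listing the partner atom twice.
    Returns a dict mapping (low_serial, high_serial) to the repeat count.
    """
    counts = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("CONECT"):
            continue
        # Serial numbers occupy 5-character fields after the record name.
        fields = [line[i:i + 5].strip() for i in range(6, len(line), 5)]
        serials = [int(f) for f in fields if f]
        if len(serials) < 2:
            continue
        source = serials[0]
        for partner in serials[1:]:
            counts[(source, partner)] += 1

    orders = {}
    for (a, b), n in counts.items():
        key = (min(a, b), max(a, b))
        # Bonds may be written in one or both directions; keep the larger count.
        orders[key] = max(orders.get(key, 0), n)
    return orders


if __name__ == "__main__":
    # Hypothetical record: atom 1 bonded to atom 2 (listed twice), 3 and 4.
    sample = "CONECT" + "".join("%5d" % n for n in (1, 2, 2, 3, 4))
    print(conect_bond_orders(sample))
```

If every pair comes back with a count of 1, the bond orders were already lost by the time the file was written; if some pairs have a count of 2 or more, the file carries them and the viewer is the place to look.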
RDKit's PDB file writer by default encodes the bond orders, which should be interpreted by PyMOL. In the words of the late great Warren:
http://www.phenix-online.org/pipermail/phenixbb/2008-April/012188.html

We need to check where the bond orders are getting lost. If you read the PDB file back with RDKit's PDB file reader and write out the SMILES, does it have double bonds?

I hope this helps. Many thanks again to Greg for all the code polishing described above.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, Cambridge CB4 0EY

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss