Re: [Rdkit-discuss] Beta of Q3 2013 release available

ro...@nextmovesoftware.com Fri, 25 Oct 2013 02:28:48 -0700

Hi James and Greg,

On Oct 25, 2013, at 4:03 AM, Greg Landrum wrote:
> 1. Do I remember correctly that there was a proposal (from Roger) to add
> some auto bond-type perception to the PDB parser for ligands (or is that
> just wishful thinking!)?
>
> Roger will have to confirm this, but I believe he said something along
> the lines of "that way lies madness".


My first comment is that a computational chemistry toolkit's "assign bond
orders, formal charges and protonation states from 3D coordinates"
function is (or should be) a sanitize-like step independent of its PDB
file reader. For one thing, this functionality is also required for
reading XYZ files, Schrodinger Maestro files, and quantum mechanics file
formats such as Gaussian and MOPAC. For another, many applications that
read PDB files don't require bond orders (e.g. GRASP surfaces and many
docking and force-field calculations), so handling bond-order perception
independently of PDB reading has some merit.
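
To illustrate why this step belongs outside any one reader, here is a
minimal Python sketch of only the first stage: guessing connectivity from
3D coordinates by comparing each interatomic distance with the sum of the
two covalent radii plus a tolerance. The function name
`perceive_connectivity`, the abbreviated radii table and the 0.45 Å
tolerance are assumptions for illustration, not RDKit's (or OEChem's)
API. Assigning bond orders, formal charges and protonation states on top
of this connectivity is the genuinely hard part.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms. These are assumed,
# rounded values for a handful of elements, for illustration only.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def perceive_connectivity(atoms, tolerance=0.45):
    """Return (i, j) index pairs for atoms closer than r_i + r_j + tolerance.

    atoms: list of (element, (x, y, z)) tuples, e.g. parsed from an XYZ file.
    This yields connectivity only; it assigns no bond orders, formal
    charges or protonation states.
    """
    bonds = []
    for (i, (el_i, p_i)), (j, (el_j, p_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
        if math.dist(p_i, p_j) <= cutoff:
            bonds.append((i, j))
    return bonds
```

For a water molecule with O at the origin and both O-H distances near
0.96 Å, this returns the two O-H pairs and rejects the H...H contact
(about 1.5 Å, above the 1.07 Å H-H cutoff). Nothing in it depends on the
input having come from a PDB file, which is the point being made above.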

All I'll say at this stage is that correctly perceiving bonds, formal
charges and protonation states (they're all interdependent) is probably
more complicated than most folks think. Indeed, many of the
crystallographers at the RDKit meeting claimed it was impossible. The
"bondage" algorithm used in OpenEye's OEChem is several thousand lines
of C++, and was still improving (on things like iron-sulfur clusters and
oxime vs. nitroso perception) up to the point I left Santa Fe in 2010.
The state of the art from a decade ago is described at
http://www.daylight.com/meetings/mug01/Sayle/m4xbondage.html and was
used at the time to produce a searchable database of PDB ligands:
http://www.metaphorics.com/products/luna.html

> 3. Is there some explanation for what the ‘flavor’ option does for
> reading/writing PDB?
>
> I'm not sure about the reader. Roger, can you answer that?
>
> This is what's in the C++ for the PDBWriter:
> // PDBWriter supports multiple "flavors" of PDB output
> // flavor & 1 : Write MODEL/ENDMDL lines around each record
> // flavor & 2 : Don't write any CONECT records
> // flavor & 4 : Write CONECT records in both directions
> // flavor & 8 : Don't use multiple CONECTs to encode bond order
> // flavor & 16 : Write MASTER record
> // flavor & 32 : Write TER record
>
> This is now in the docs for both the Python and C++ code.

The use of an integer file-format "flavor" argument allows the caller to
customize the behavior of the readers and writers. The semantics are
that zero (all bits clear) is a reasonable default, and that new
features can be added as new bits without changing the API/ABI. Most of
the bits above (for the writer) control strict compliance with the PDB
format specification. For example, a flavor of 12 (4 | 8) writes CONECT
records the way the RCSB expects them, which both throws away the bond
orders and increases the size of the PDB file.
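
As a hedged sketch of how such a bitmask composes, the pure-Python
helper below decodes a writer flavor using the bit meanings quoted from
the PDBWriter comment. `describe_flavor` and `WRITER_FLAVOR_BITS` are
hypothetical names for illustration, not part of RDKit.

```python
# Writer flavor bits, as listed in the PDBWriter comment quoted above.
WRITER_FLAVOR_BITS = {
    1: "Write MODEL/ENDMDL lines around each record",
    2: "Don't write any CONECT records",
    4: "Write CONECT records in both directions",
    8: "Don't use multiple CONECTs to encode bond order",
    16: "Write MASTER record",
    32: "Write TER record",
}


def describe_flavor(flavor):
    """List the behaviours selected by an integer flavor (0 = all defaults)."""
    return [text for bit, text in sorted(WRITER_FLAVOR_BITS.items()) if flavor & bit]
```

Here `describe_flavor(12)` lists exactly the two RCSB-style bits (4 and
8), while `describe_flavor(0)` is empty. Because each option is a
separate bit, a later release can define 64, 128, ... without changing
the function signature. In RDKit's Python wrapper the integer is passed
as the `flavor` argument, e.g. `Chem.MolToPDBBlock(mol, flavor=12)`.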

For the reader, the flavor argument controls whether alternate locations
are read (for use by PDB power users), or whether a sensible subset of
atoms is used for the RDKit::ROMol.

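
A rough illustration of the "sensible subset" idea: the hypothetical
helper below keeps ATOM/HETATM records whose alternate-location
indicator (column 17 of the record) is blank or matches one chosen
location. Keeping location "A" is an assumption made for the sketch; it
is not a description of how RDKit's reader actually selects atoms.

```python
def select_alt_loc(lines, keep="A"):
    """Keep ATOM/HETATM records whose altLoc (column 17) is blank or `keep`.

    All other record types are passed through unchanged.
    """
    selected = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # Column 17 is index 16; treat a truncated line as blank altLoc.
            alt_loc = line[16] if len(line) > 16 else " "
            if alt_loc not in (" ", keep):
                continue
        selected.append(line)
    return selected
```

A "power user" mode would simply skip this filter and keep every
alternate location.
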
> 5. It seems to me that GetResidueNumber() and GetSerialNumber() may have
> got mixed-up at some point(?). At least, when I call GetSerialNumber() I
> see what appears to be the residue number; and when I call
> GetResidueNumber() I get “0”!
>
> This was another dumb bug from me. It's fixed.

Greg is being modest. At the time of the RDKit meeting, the MonomerInfo
data structure had just a "SerialNumber" field, which was used for
storing residue numbers. One of my suggestions back to Greg was that,
although everything worked, this nomenclature might be confusing to
folks using the API, so I suggested renaming the field for the Q3 beta.
The better solution was to support fields for both ResidueNumber and
SerialNumber, but following that change I failed to send the patch to
make the reader/writer use the correct (changed) residueNumber field and
record/honour the serial number field.

My apologies.  I share some of the blame for this one.
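
For anyone unsure which number is which: in a PDB ATOM/HETATM record the
atom serial number occupies columns 7-11 and the residue sequence number
columns 23-26. The hypothetical helper below pulls both out of a raw
record, which is a quick way to check what GetSerialNumber() and
GetResidueNumber() ought to return for a given atom.

```python
def parse_atom_numbers(line):
    """Return (serial_number, residue_number) from a PDB ATOM/HETATM record."""
    serial_number = int(line[6:11])    # columns 7-11: atom serial number
    residue_number = int(line[22:26])  # columns 23-26: residue sequence number
    return serial_number, residue_number
```

For a record for atom 145 in residue 20, this returns `(145, 20)`. In
RDKit both values are exposed on the atom's PDB residue info object
(`atom.GetPDBResidueInfo()`), so its GetSerialNumber() and
GetResidueNumber() should agree with these two fields.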

> 6. I also seem to be seeing all of the bonds (for all residues) being
> written out in CONECT records – such that they all appear as single
> bonds in eg PyMOL – is this expected behaviour at the moment?
>
> Another one for Roger.

I believe this should work fine. By default, RDKit's PDB file writer
encodes bond orders using multiple CONECT records, which PyMOL should
interpret. In the words of the late great Warren:
http://www.phenix-online.org/pipermail/phenixbb/2008-April/012188.html
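
A minimal sketch of the multiple-CONECT convention that flavor bit 8
turns off: a bond of order n lists the partner atom n times. The
function `conect_records` is hypothetical; it writes records in both
directions (the behaviour flavor bit 4 requests) and wraps after four
partners per line, as the fixed-column CONECT layout allows. It is meant
only to show what PyMOL would be reading, not to reproduce RDKit's
writer byte for byte.

```python
from collections import defaultdict


def conect_records(bonds, encode_orders=True):
    """Build CONECT lines from (serial_a, serial_b, order) tuples.

    With encode_orders=True a bond of order n repeats its partner n times;
    with encode_orders=False every partner appears once (order discarded).
    """
    partners = defaultdict(list)
    for a, b, order in bonds:
        repeat = order if encode_orders else 1
        partners[a].extend([b] * repeat)
        partners[b].extend([a] * repeat)
    lines = []
    for atom in sorted(partners):
        bonded = partners[atom]
        # A CONECT record holds at most four bonded-atom fields (5 chars each).
        for start in range(0, len(bonded), 4):
            chunk = bonded[start:start + 4]
            lines.append(f"CONECT{atom:5d}" + "".join(f"{p:5d}" for p in chunk))
    return lines
```

For a C=O double bond between atoms 1 and 2, the encoded form is
`CONECT    1    2    2`; with `encode_orders=False` it collapses to
`CONECT    1    2`, which any viewer can only show as a single bond.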

We need to check where the bond orders are getting lost. If you read the
PDB file back with RDKit's PDB file reader and write out the SMILES, does
it have double bonds?


I hope this helps.

Many thanks again to Greg for all the code polishing described above.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park,
Cambridge CB4 0EY


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
