Re: [Rdkit-discuss] Beta of Q3 2013 release available

S.L. Chan Sat, 26 Oct 2013 00:16:39 -0700

Paul has a very good point. The PDB site does have information about the
expected chemical structure of the ligands. Indeed, if you are working with a
single entry, you can simply download an SDF for the ligand: just scroll down
to the "Ligand Chemical Component" section and click download. Its atomic
coordinates are already good to go, so even atom matching is unnecessary.


However, in the real world, you sometimes get PDB-formatted files that do not
come from the PDB. (Don't ask me why they use the PDB format.) You may need to
guess the bonding pattern, so an algorithm that makes such a guess would still
be handy.
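
To make the "guess" concrete, here is a minimal sketch of the simplest kind of
perception: pairing atoms whose distance is within the sum of their covalent
radii plus some slack. It is not RDKit's algorithm. The radii table, the 0.45
angstrom tolerance, the helper names (parse_atoms, guess_bonds, hetatm), and
the acetaldehyde-like coordinates are all assumptions made for illustration.

```python
import math

# Approximate covalent radii in angstroms; illustrative values, not a full table.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
TOLERANCE = 0.45  # assumed slack in angstroms added to the summed radii


def parse_atoms(pdb_block):
    """Return (serial, element, (x, y, z)) for each ATOM/HETATM record."""
    atoms = []
    for line in pdb_block.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            element = line[76:78].strip().capitalize()
            atoms.append((serial, element, xyz))
    return atoms


def guess_bonds(atoms):
    """Pair atoms whose separation is within the summed radii plus TOLERANCE."""
    bonds = []
    for i, (serial_a, elem_a, pos_a) in enumerate(atoms):
        for serial_b, elem_b, pos_b in atoms[i + 1:]:
            cutoff = COVALENT_RADII[elem_a] + COVALENT_RADII[elem_b] + TOLERANCE
            if math.dist(pos_a, pos_b) <= cutoff:
                bonds.append((serial_a, serial_b))
    return bonds


def hetatm(serial, name, element, x, y, z):
    """Format one fixed-column HETATM record for a made-up residue LIG."""
    return ("HETATM%5d %-4s LIG A   1    %8.3f%8.3f%8.3f  1.00  0.00"
            "          %2s" % (serial, name, x, y, z, element))


# Hypothetical heavy-atom coordinates resembling acetaldehyde (CH3-CH=O).
pdb_block = "\n".join([
    hetatm(1, "C1", "C", -1.500, 0.000, 0.000),
    hetatm(2, "C2", "C", 0.000, 0.000, 0.000),
    hetatm(3, "O1", "O", 0.610, 1.060, 0.000),
])

bonds = guess_bonds(parse_atoms(pdb_block))
print(bonds)  # [(1, 2), (2, 3)]
```

Note that this only recovers connectivity: the C-C and C=O contacts both come
out as plain "bonded" pairs. Deciding the bond order takes a further step,
which is why a known chemical structure for the ligand is preferable when one
exists.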

I certainly agree with Paul that one (both the user and the program) should
check the PDB site first to see whether the ligand is already there.

Perhaps there should be a hint message reminding the user to check the PDB
first whenever the "assign bond order" routine is called? But it may not be
easy to strike a balance so as not to annoy a user who knowingly calls the
routine multiple times on PDB-formatted files that did not come from the PDB.
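
One possible way to strike that balance, sketched with Python's standard
warnings module: show the hint only on the first call, and let the caller opt
out. Everything here is hypothetical. assign_bond_orders is a stand-in for
whatever routine would do the perception, and PDBHintWarning and the quiet
argument are made-up names, not part of RDKit.

```python
import warnings


class PDBHintWarning(UserWarning):
    """Hypothetical category, so users can filter this one hint by type."""


_hint_shown = False  # module-level flag: the hint is emitted at most once


def assign_bond_orders(mol, quiet=False):
    """Stand-in for a bond-order perception routine (hypothetical)."""
    global _hint_shown
    if not quiet and not _hint_shown:
        warnings.warn(
            "Bond orders are being guessed; if this structure comes from the "
            "PDB, check the ligand's chemical component definition first.",
            PDBHintWarning,
            stacklevel=2,
        )
        _hint_shown = True
    # Real bond-order perception would go here; the sketch returns mol as is.
    return mol
```

A user processing many non-PDB files sees the message once per session, can
pass quiet=True, or can silence it globally with
warnings.filterwarnings("ignore", category=PDBHintWarning).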

Ling



>________________________________
> From: Paul Emsley <pems...@mrc-lmb.cam.ac.uk>
>To: "rdkit-discuss@lists.sourceforge.net" 
><rdkit-discuss@lists.sourceforge.net> 
>Sent: Friday, October 25, 2013 10:24 AM
>Subject: Re: [Rdkit-discuss] Beta of Q3 2013 release available
> 
>
>On 25/10/13 08:09, James Davidson wrote:
>> Hi Roger,
>>
>> Thanks for the response
>>
>>> The use of an integer file format "flavor" argument allows the caller to
>>> customize the behavior of the readers and writers.  The semantics is that a
>>> reasonable default is zero (for all bits), but that new features may be
>>> added without changing the API/ABI.
>>> Most of the bits above (for the writer) control strict compliance with the
>>> PDB format specification.  For example, a flavor of 12 will write bond
>>> orders the way the RCSB expects them both throwing away bond orders and
>>> increasing the size of the PDB file.
>> As a test, I am using 2VCI, and am retrieving the PDB data from the RCSB 
>> using the following
>>
>> import requests
>> url = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VCI"
>> response = requests.get(url)
>> pdb_block = response.content
>> response.close()
>>
>>
>> pdb_block shows CONECT records only for the HETATM records.
>> If I now read into RDKit, using the defaults, and write back out using the 
>> defaults, I see CONECT records for every atom (ie protein as well).  And I 
>> can't see any double-bonds rendered in PyMOL:
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> pdb = Chem.MolFromPDBBlock(pdb_block)
>> pdb_block_out = Chem.MolToPDBBlock(pdb)
>>
>> First 10 CONECT records of output:
>> CONECT    1    2
>> CONECT    2    3    5
>> CONECT    3    4    4   10
>> CONECT    5    6
>> CONECT    6    7
>> CONECT    7    8    8    9
>> CONECT   10   11
>> CONECT   11   12   14
>> CONECT   12   13   13   17
>> CONECT   14   15   16
>>
>>
>> If I use Chem.MolToPDBBlock(pdb, flavor=12) I do, indeed see the ligand 
>> CONECT records in what looks like the original format (albeit now numbered 
>> differently), and I still see CONECT records for the protein - but this PDB 
>> *will* render double bonds in PyMOL.
>>
>> First 10 CONECT records of output:
>> CONECT    3    4    4
>> CONECT    7    8    8
>> CONECT   12   13   13
>> CONECT   19   20   20
>> CONECT   23   24   24
>> CONECT   28   29   29
>> CONECT   35   36   36
>> CONECT   38   39   39
>> CONECT   40   42   42
>> CONECT   41   43   43
>>
>
>If I may be so bold, I believe an important part of the puzzle is 
>missing.  The residue-name/3-letter-code/comp-id in the PDB file is a 
>pointer to an entry in the mmCIF-formatted chemical component dictionary 
>that describes the compound, for all compounds for all entries released 
>by the PDB.
>
>http://deposit.pdb.org/cc_dict_tut.html
>
>If this is an "internal" PDB file there will very likely be a similar 
>mmCIF file used for crystallographic refinement.
>
>Only when these options fail would I consider turning to bond-order 
>perception and CONECT records.
>
>Paul.
>
>
>
>------------------------------------------------------------------------------
>October Webinars: Code for Performance
>Free Intel webinars can help you accelerate application performance.
>Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
>the latest Intel processors and coprocessors. See abstracts and register >
>http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk
>_______________________________________________
>Rdkit-discuss mailing list
>Rdkit-discuss@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>

