Paul has a very good point. The PDB site does have information about the
expected chemical structure of the ligands. Indeed, if you are working with a
single entry, you could simply download an SDF for the ligand: just scroll
down to the "Ligand Chemical Component" section and click download. Its atomic
coordinates are already good to go, so even atom matching is unnecessary.
However, in the real world, you sometimes get PDB-formatted files that do not
come from the PDB. (Don't ask me why they use the PDB format.) In that case
you may need to guess the bonding pattern, so an algorithm that makes that
guess would still be handy.
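Before guessing, it may be worth checking whether such a file already carries
bond orders in its CONECT records, as in the RDKit output James quotes below,
where a partner serial listed twice (e.g. "CONECT 3 4 4 10") marks a double
bond. Here is a minimal sketch of reading that convention with plain Python.
The function name bond_orders_from_conect is my own, not an RDKit call, and
splitting on whitespace is a simplifying assumption (the real format uses
fixed-width columns).

```python
from collections import Counter


def bond_orders_from_conect(pdb_block):
    """Return {(serial_a, serial_b): order} read from CONECT records.

    Assumes the convention where a partner serial repeated on one CONECT
    line encodes the bond order (twice = double bond).
    """
    orders = {}
    for line in pdb_block.splitlines():
        if not line.startswith("CONECT"):
            continue
        # Simplification: split on whitespace. Real PDB files use fixed
        # 5-character columns, so very large serials can run together.
        fields = [int(tok) for tok in line[6:].split()]
        if len(fields) < 2:
            continue
        source, partners = fields[0], fields[1:]
        for partner, count in Counter(partners).items():
            # Store each bond once, keyed by the sorted pair of serials.
            key = (min(source, partner), max(source, partner))
            orders[key] = max(orders.get(key, 0), count)
    return orders


sample = """\
CONECT    1    2
CONECT    2    3    5
CONECT    3    4    4   10
"""
for bond, order in sorted(bond_orders_from_conect(sample).items()):
    print(bond, order)
```

If every bond comes back with order 1, the file probably does not encode bond
orders at all, and a template or perception step is still needed.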
I certainly agree with Paul that one (both the user and the program) should
check the PDB site first to see if the ligand definition is already there.
Perhaps there should be a hint message reminding the user to check the PDB
first whenever the "assign bond order" routine is called? But it may not be
easy to strike a balance so as not to annoy a user who knowingly calls the
routine multiple times on non-PDB-sourced, PDB-formatted files.
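One possible compromise, sketched here only to make the idea concrete: show
the hint once per session and let the caller silence it. Everything in this
sketch is hypothetical (the wrapper name, the quiet argument, the message
text); it is not an existing RDKit function, and the actual bond-order step
is left as a placeholder.

```python
import warnings

# Module-level flag: has the reminder already been shown in this session?
_hint_shown = False


def assign_bond_orders_with_hint(mol, quiet=False):
    """Hypothetical wrapper around a bond-order assignment routine.

    Emits a one-time reminder to check the PDB first, unless quiet=True.
    """
    global _hint_shown
    if not quiet and not _hint_shown:
        warnings.warn(
            "Bond orders are being guessed. If this structure came from "
            "the PDB, check the ligand definition there first.",
            UserWarning,
            stacklevel=2,
        )
        _hint_shown = True
    # Placeholder: the real bond-order perception would happen here.
    return mol
```

A user processing many non-PDB-sourced files would see the message at most
once, or never if they pass quiet=True from the start.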
Ling
>________________________________
> From: Paul Emsley <pems...@mrc-lmb.cam.ac.uk>
>To: "rdkit-discuss@lists.sourceforge.net"
><rdkit-discuss@lists.sourceforge.net>
>Sent: Friday, October 25, 2013 10:24 AM
>Subject: Re: [Rdkit-discuss] Beta of Q3 2013 release available
>
>
>On 25/10/13 08:09, James Davidson wrote:
>> Hi Roger,
>>
>> Thanks for the response
>>
>>> The use of an integer file format "flavor" argument allows the caller to
>>> customize the behavior of the readers and writers. The semantics is that a
>>> reasonable default is zero (for all bits), but that new features may be
>>> added without changing the API/ABI.
>>> Most of the bits above (for the writer) control strict compliance with the
>>> PDB format specification. For example, a flavor of 12 will write bond
>>> orders the way the RCSB expects them both throwing away bond orders and
>>> increasing the size of the PDB file.
>> As a test, I am using 2VCI, and am retrieving the PDB data from the RCSB
>> using the following
>>
>> import requests
>> url = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VCI"
>> response = requests.get(url)
>> pdb_block = response.text  # decoded text rather than raw bytes
>> response.close()
>>
>>
>> pdb_block shows CONECT records only for the HETATM records.
>> If I now read into RDKit, using the defaults, and write back out using the
>> defaults, I see CONECT records for every atom (ie protein as well). And I
>> can't see any double-bonds rendered in PyMOL:
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> pdb = Chem.MolFromPDBBlock(pdb_block)
>> pdb_block_out = Chem.MolToPDBBlock(pdb)
>>
>> First 10 CONECT records of output:
>> CONECT 1 2
>> CONECT 2 3 5
>> CONECT 3 4 4 10
>> CONECT 5 6
>> CONECT 6 7
>> CONECT 7 8 8 9
>> CONECT 10 11
>> CONECT 11 12 14
>> CONECT 12 13 13 17
>> CONECT 14 15 16
>>
>>
>> If I use Chem.MolToPDBBlock(pdb, flavor=12) I do, indeed see the ligand
>> CONECT records in what looks like the original format (albeit now numbered
>> differently), and I still see CONECT records for the protein - but this PDB
>> *will* render double bonds in PyMOL.
>>
>> First 10 CONECT records of output:
>> CONECT 3 4 4
>> CONECT 7 8 8
>> CONECT 12 13 13
>> CONECT 19 20 20
>> CONECT 23 24 24
>> CONECT 28 29 29
>> CONECT 35 36 36
>> CONECT 38 39 39
>> CONECT 40 42 42
>> CONECT 41 43 43
>>
>
>If I may be so bold, I believe an important part of the puzzle is
>missing. The residue-name/3-letter-code/comp-id in the PDB file is a
>pointer to an entry in the mmCIF-formatted chemical component dictionary
>that describes the compound, for all compounds for all entries released
>by the PDB.
>
>http://deposit.pdb.org/cc_dict_tut.html
>
>If this is an "internal" PDB file there will, very likely be a similar
>mmCIF file used for crystallographic refinement.
>
>Only when these options fail would I consider turning to bond-order
>perception and CONECT records.
>
>Paul.
>
>
>
>------------------------------------------------------------------------------
>October Webinars: Code for Performance
>Free Intel webinars can help you accelerate application performance.
>Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
>the latest Intel processors and coprocessors. See abstracts and register >
>http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk
>_______________________________________________
>Rdkit-discuss mailing list
>Rdkit-discuss@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>