Hi Roger,

Thanks for the response.

> The use of an integer file format "flavor" argument allows the caller to
> customize the behavior of the readers and writers.  The semantics is that a
> reasonable default is zero (for all bits), but that new features may be added
> without changing the API/ABI.
> Most of the bits above (for the writer) control strict compliance with the PDB
> format specification.  For example, a flavor of 12 will write bond orders the
> way the RCSB expects them both throwing away bond orders and increasing
> the size of the PDB file.
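
Since the flavor is a set of bit flags, it helps to see which individual bits a value such as 12 switches on. A minimal sketch follows; flavor_bits is a hypothetical helper of mine, not part of RDKit, and it makes no assumption about what each bit means to the reader or writer.

```python
def flavor_bits(flavor):
    """Return the individual power-of-two flags set in an integer flavor."""
    bits = []
    bit = 1
    while bit <= flavor:
        if flavor & bit:
            bits.append(bit)
        bit <<= 1
    return bits


print(flavor_bits(12))  # [4, 8]
print(flavor_bits(2))   # [2]
print(flavor_bits(0))   # []
```

So flavor=12 combines the 4 and 8 bits, while the default of 0 sets none of them.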

As a test, I am using 2VCI and retrieving the PDB data from the RCSB as 
follows:

import requests

# Download PDB entry 2VCI as uncompressed text
url = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VCI"
response = requests.get(url)
pdb_block = response.content
response.close()


pdb_block contains CONECT records only for the HETATM records.
If I now read it into RDKit using the defaults and write it back out using the 
defaults, I see CONECT records for every atom (i.e. for the protein as well), 
and I can't see any double bonds rendered in PyMOL:

from rdkit import Chem
from rdkit.Chem import AllChem
pdb = Chem.MolFromPDBBlock(pdb_block)
pdb_block_out = Chem.MolToPDBBlock(pdb)

First 10 CONECT records of output:
CONECT    1    2
CONECT    2    3    5
CONECT    3    4    4   10
CONECT    5    6
CONECT    6    7
CONECT    7    8    8    9
CONECT   10   11
CONECT   11   12   14
CONECT   12   13   13   17
CONECT   14   15   16
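
To make listings like this easier to compare, here is a rough sketch that tallies how often each neighbour is listed per CONECT record. It assumes the convention that a repeated neighbour serial encodes bond order (which is what "CONECT    3    4    4   10" appears to do for the 3-4 bond) and that serials sit in the fixed five-character columns of the PDB format. parse_conect and bond_multiplicities are hypothetical helper names, and the input must be text (str), not bytes.

```python
from collections import Counter


def parse_conect(line):
    """Split one CONECT record into (atom serial, list of bonded serials).

    Serials are read from fixed five-character columns after 'CONECT'.
    Returns None if the record carries no serial numbers.
    """
    line = line.rstrip()
    fields = [line[i:i + 5].strip() for i in range(6, len(line), 5)]
    serials = [int(f) for f in fields if f]
    if not serials:
        return None
    return serials[0], serials[1:]


def bond_multiplicities(pdb_block):
    """Map (atom, neighbour) -> number of times the neighbour is listed."""
    counts = Counter()
    for line in pdb_block.splitlines():
        if line.startswith("CONECT"):
            parsed = parse_conect(line)
            if parsed is None:
                continue
            atom, neighbours = parsed
            for neighbour in neighbours:
                counts[(atom, neighbour)] += 1
    return dict(counts)


# Two records copied from the default-flavor output above
records = "CONECT    2    3    5\nCONECT    3    4    4   10\n"
print(bond_multiplicities(records))
```

A count of 2 for the pair (3, 4) would correspond to a double bond under the assumed convention; pairs listed once would be single bonds.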


If I use Chem.MolToPDBBlock(pdb, flavor=12) I do indeed see the ligand CONECT 
records in what looks like the original format (albeit now numbered 
differently), and I still see CONECT records for the protein, but this PDB 
*will* render double bonds in PyMOL.

First 10 CONECT records of output:
CONECT    3    4    4
CONECT    7    8    8
CONECT   12   13   13
CONECT   19   20   20
CONECT   23   24   24
CONECT   28   29   29
CONECT   35   36   36
CONECT   38   39   39
CONECT   40   42   42
CONECT   41   43   43


And even if I use Chem.MolToPDBBlock(pdb, flavor=2) I still see CONECT records 
for every protein residue (and, again, I also see double bonds in PyMOL).

First 10 CONECT records of output:
CONECT    1    2
CONECT    2    1    3    5
CONECT    3    2    4   10
CONECT    4    3
CONECT    5    2    6
CONECT    6    5    7
CONECT    7    6    8    9
CONECT    8    7
CONECT    9    7
CONECT   10    3   11
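
To put numbers on "CONECT records for the protein" versus "for the ligand" in each output, something along these lines could be used. It is only a sketch: conect_counts_by_record_type is a hypothetical helper, it classifies each CONECT record by the record type (ATOM or HETATM) of its first serial within the same block, and it expects text (str) with well-formed fixed-width columns. The toy block in the example is made up for illustration and is not taken from 2VCI.

```python
def conect_counts_by_record_type(pdb_block):
    """Count CONECT records whose first serial is an ATOM vs a HETATM atom."""
    lines = pdb_block.splitlines()

    # First pass: remember the record type of every atom serial (columns 7-11)
    record_type = {}
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            record_type[int(line[6:11])] = line[:6].strip()

    # Second pass: classify each CONECT record by its first serial
    counts = {"ATOM": 0, "HETATM": 0, "unknown": 0}
    for line in lines:
        if line.startswith("CONECT"):
            serial = int(line[6:11])
            counts[record_type.get(serial, "unknown")] += 1
    return counts


# Made-up toy block: two protein atoms, two ligand atoms, three CONECT records
toy_block = "\n".join([
    "ATOM  %5d  N   GLY A   1" % 1,
    "ATOM  %5d  CA  GLY A   1" % 2,
    "HETATM%5d  C1  LIG A   2" % 3,
    "HETATM%5d  C2  LIG A   2" % 4,
    "CONECT%5d%5d" % (1, 2),
    "CONECT%5d%5d" % (2, 1),
    "CONECT%5d%5d" % (3, 4),
])
print(conect_counts_by_record_type(toy_block))
```

Running this on the original RCSB block and on each RDKit output would show directly whether protein CONECT records are only being added on writing.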


Am I maybe doing something wrong with options in the reading step?  


> For the reader, the flavor argument controls whether alternate locations are
> read (for use by PDB power users), or whether a sensible subset of atoms is
> used for the RDKit::ROMol.

Can you (or Greg) post a list of what the current input flavors do?



> > 6.       I also seem to be seeing all of the bonds (for all
> > residues) being written out in CONECT records - such that they all
> > appear as single bonds in eg PyMOL - is this expected behaviour at the
> > moment?
> >
> > Another one for Roger.
> 
> I believe this should work fine.  RDKit's PDB file writer by default encodes
> the bond orders, which should be interpreted by PyMol.  In the words of the late
> great Warren:
> http://www.phenix-online.org/pipermail/phenixbb/2008-April/012188.html
> 
> We need to check where the bond orders are getting lost.  If you read the
> PDB file back RDKit's PDB file reader and write out the SMILES does it have
> double bonds?

More weirdness here...  Reading the 3 flavours of output (pdb_block_out) from 
above back in and showing the kekulised SMILES gives the same SMILES - but not 
fully kekulised...

print(Chem.MolToSmiles(Chem.MolFromPDBBlock(pdb_block_out), kekuleSmiles=True))

O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.CCNC(O)C1NOC(C2CC(C(C)C)C(O)CC2O)C1C1CCC(CN2CCOCC2)CC1.CCC(C)C(NC(=O)CNC(=O)C(NC(=O)C(CC(=O)O)NC(=O)C(NC(=O)C(NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(NC(=O)C(CCCNC(=N)N)NC(=O)C(CC(=O)O)NC(=O)C(CCC(N)=O)NC(=O)C(CCCCN)NC(=O)C(CC(N)=O)NC(=O)C1CCCN1C(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CC(N)=O)NC(=O)C(NC(=O)C(CC1:C:[NH]:C:N:1)NC(=O)C(CC(C)C)NC(=O)C(CCC(=O)O)NC(=O)C(CCCCN)NC(=O)CNC(=O)C(CO)NC(=O)C(CC(=O)O)NC(=O)C(CC(C)C)NC(=O)C(CCCCN)NC(=O)C(CO)NC(=O)C1CCCN1C(=O)C(CC(=O)O)NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CO)NC(=O)C(CCC(=O)O)NC(=O)C(CC1:C:C:C(O):C:C:1
)NC(=O)C(CCCNC(=N)N)NC(=O)C(NC(=O)C(CCCCN)NC(=O)C(CC(=O)O)NC(=O)C(CC(C)C)NC(=O)C(C)NC(=O)C(CC(=O)O)NC(=O)C(CO)NC(=O)C(CO)NC(=O)C(CC(N)=O)NC(=O)C(CO)NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CCC(=O)O)NC(=O)C(CCCNC(=N)N)NC(=O)C(CC(C)C)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(NC(=O)C(CCC(=O)O)NC(=O)C(CCCCN)NC(=O)C(CC(N)=O)NC(=O)C(CO)NC(=O)C(CC1:C:C:C(O):C:C:1)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(NC(=O)C(CC(N)=O)NC(=O)C(NC(=O)C(NC(=O)C(CC(C)C)NC(=O)C(CO)NC(=O)C(CCSC)NC(=O)C(CC(C)C)NC(=O)C(CCC(N)=O)NC(=O)C(C)NC(=O)C(NC(=O)C(CCC(=O)O)NC(=O)C(C)NC(=O)C(CCC(N)=O)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(C)NC(=O)C(CC1:C:C:C:C:C:1)NC(=O)C(NC(=O)C(CCC(=O)O)NC(=O)C(NC(=O)C(N)CCC(=O)O)C(C)C)C(C)O)C(C)CC)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)CC)C(C)O)C(C)O)C(C)CC)C(C)C)C(C)O)C(=O)NCC(=O)NC(CCSC)C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(C)C(=O)NC(CC(=O)O)C(=O)NC(CC(C)C)C(=O)NC(C(=O)NC(CC(N)=O)C(=O)NC(CC(N)=O)C(=O)NC(CC(C)C)C(=O)NCC(=O)NC(C(=O)NC(C(=O)NC(C)C(=O)NC(CCCCN)C(=O)NC(CO)C(=O)NCC(=O)NC(C(=O)NC(CCCCN)C(=O)NC(C)C(=O)
NC(CC1:C:C:C:C:C:1)C(=O)NC(CCSC)C(=O)NC(CCC(=O)O)C(=O)NC(C)C(=O)NC(CC(C)C)C(=O)NC(C)C(=O)NC(C)C(=O)NCC(=O)NC(C)C(=O)NC(CC(=O)O)C(=O)NC(C(=O)NC(CO)C(=O)NC(CCSC)C(=O)NC(C(=O)NCC(=O)NC(CCC(N)=O)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NCC(=O)NC(C(=O)NCC(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(CO)C(=O)NC(C)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(CC(C)C)C(=O)NC(C(=O)NC(C)C(=O)NC(CCC(=O)O)C(=O)NC(CCCCN)C(=O)NC(C(=O)NC(C(=O)NC(C(=O)NC(C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(CC1:C:[NH]:C:N:1)C(=O)NC(CC(N)=O)C(=O)NC(CC(=O)O)C(=O)NC(CC(=O)O)C(=O)NC(CCC(=O)O)C(=O)NC(CCC(N)=O)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(C)C(=O)NC(CC1:C:[NH]:C2:C:C:C:C:C:1:2)C(=O)NC(CCC(=O)O)C(=O)NC(CO)C(=O)NC(CO)C(=O)NC(C)C(=O)NCC(=O)NCC(=O)NC(CO)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(C(=O)NC(C(=O)NC(CCCNC(=N)N)C(=O)NC(C(=O)NC(CC(=O)O)C(=O)NC(C(=O)NCC(=O)NC(CCC(=O)O)C(=O)N1CCCC1C(=O)NC(CCSC)C(=O)NCC(=O)NC(CCCNC(=N)N)C(=O)NCC(=O)NC(C(=O)NC(CCCCN)C(=O)NC(C(=O)NC(C(=O)NC(CC(C)C)C(=O)NC(CC1:C:[NH]:C:N:1)C(=O)NC(CC(C)C)C(=O)NC(CCCCN)C(=O)NC(C
CC(=O)O)C(=O)NC(CC(=O)O)C(=O)NC(CCC(N)=O)C(=O)NC(C(=O)NC(CCC(=O)O)C(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)NC(CC(C)C)C(=O)NC(CCC(=O)O)C(=O)NC(CCC(=O)O)C(=O)NC(CCCNC(=N)N)C(=O)NC(CCCNC(=N)N)C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(CCC(=O)O)C(=O)NC(C(=O)NC(C(=O)NC(CCCCN)C(=O)NC(CCCCN)C(=O)NC(CC1:C:[NH]:C:N:1)C(=O)NC(CO)C(=O)NC(CCC(N)=O)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(C(=O)NCC(=O)NC(CC1:C:C:C(O):C:C:1)C(=O)N1CCCC1C(=O)NC(C(=O)NC(C(=O)NC(CC(C)C)C(=O)NC(CC1:C:C:C:C:C:1)C(=O)NC(C(=O)NC(CCC(=O)O)C(N)=O)C(C)C)C(C)O)C(C)CC)C(C)CC)C(C)C)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)C)C(C)O)C(C)O)C(C)O)C(C)C)C(C)O)C(C)O)C(C)CC)C(C)C)C(C)O)C(C)C)C(C)C)C(C)C)C(C)CC)C(C)CC)C(C)O)C(C)CC)C(C)O)C(C)CC)C(C)O


Hopefully what I have described above clarifies what I am seeing.  It looks 
like two issues: (1) bond orders are not rendered in PyMOL when the default 
flavor=0 is used (due either to the format of the CONECT block or to PyMOL's 
interpretation of it), although the double bonds themselves are not lost; and 
(2) aromatic systems coming from Chem.MolFromPDBBlock() appear to be 
non-kekulisable, at least with the default options.
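
For the second issue, a quick way to flag a SMILES that is "not fully kekulised" is to count the ':' aromatic bond symbols left in it. The sketch below is plain string handling rather than an RDKit call; count_aromatic_bond_symbols is a hypothetical helper, and it skips any ':' inside [...] so that atom-map numbers are not miscounted.

```python
def count_aromatic_bond_symbols(smiles):
    """Count ':' bond symbols, ignoring any ':' inside [...] atom brackets."""
    depth = 0
    count = 0
    for ch in smiles:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == ":" and depth == 0:
            count += 1
    return count


# Histidine-like ring fragment as it appears in the output above
print(count_aromatic_bond_symbols("CC1:C:[NH]:C:N:1"))  # 5
# A fragment with explicit double bonds only
print(count_aromatic_bond_symbols("CC(N)=O"))           # 0
```

A non-zero count on a string written with kekuleSmiles=True would indicate rings that were left with aromatic bonds.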


Cheers

James

