I'll note that the official definitions for all the chemical entities in
the PDB can be found in the wwPDB's Chemical Component Dictionary:
https://www.wwpdb.org/data/ccd

That's in mmCIF format, but there are various SMILES and InChI definitions
for the residues included in the file. (Your mileage may vary for the
quality of those representations, though, especially for the rarer ones,
but it should be no worse than the SDFs.)

You should be able to use an mmCIF parser to extract them.

e.g.
# py-mmcif from the RCSB: `pip install mmcif`
from mmcif.core.mmciflib import ParseCifSimple

# "logfile.txt" is an arbitrary name
ccd = ParseCifSimple("components.cif", True, 0, 255, "?", "logfile.txt")

ALA = ccd.GetBlock("ALA")
desc = ALA.GetTable("pdbx_chem_comp_descriptor")
print( desc.GetColumnNames() )
for ii in range(desc.GetNumRows()):
    print( desc.GetRow(ii) )

['comp_id', 'type', 'program', 'program_version', 'descriptor']
['ALA', 'SMILES', 'ACDLabs', '10.04', 'O=C(O)C(N)C']
['ALA', 'SMILES_CANONICAL', 'CACTVS', '3.341', 'C[C@H](N)C(O)=O']
['ALA', 'SMILES', 'CACTVS', '3.341', 'C[CH](N)C(O)=O']
['ALA', 'SMILES_CANONICAL', 'OpenEye OEToolkits', '1.5.0', 'C[C@@H](C(=O)O)N']
['ALA', 'SMILES', 'OpenEye OEToolkits', '1.5.0', 'CC(C(=O)O)N']
['ALA', 'InChI', 'InChI', '1.03', 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1']
['ALA', 'InChIKey', 'InChI', '1.03', 'QNAYBMKLOCPYGJ-REOHCLBHSA-N']
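
If the goal is a residue-name to SMILES lookup like the ref_smi dictionary in the original question, each component's rows can be filtered down to a single descriptor. Here is a minimal sketch, assuming the rows have the five-column layout printed for ALA. The function name pick_smiles and the preference order are my own illustrative choices, not anything defined by the CCD or by py-mmcif, and the sample rows are copied by hand from the ALA output so the snippet runs without the parser.

```python
# Column layout as printed by desc.GetColumnNames() for ALA.
COLUMNS = ['comp_id', 'type', 'program', 'program_version', 'descriptor']

# Hypothetical preference order (an assumption, adjust as needed):
# stereo-aware canonical SMILES first, plain SMILES as a fallback.
PREFERRED = [
    ('SMILES_CANONICAL', 'OpenEye OEToolkits'),
    ('SMILES_CANONICAL', 'CACTVS'),
    ('SMILES', 'ACDLabs'),
]


def pick_smiles(rows, columns=COLUMNS, preferred=PREFERRED):
    """Return the first descriptor matching the preference order, or None."""
    i_type = columns.index('type')
    i_prog = columns.index('program')
    i_desc = columns.index('descriptor')
    for want_type, want_prog in preferred:
        for row in rows:
            if row[i_type] == want_type and row[i_prog] == want_prog:
                return row[i_desc]
    return None


# SMILES rows copied from the ALA output (stand-in for desc.GetRow(ii)).
ala_rows = [
    ['ALA', 'SMILES', 'ACDLabs', '10.04', 'O=C(O)C(N)C'],
    ['ALA', 'SMILES_CANONICAL', 'CACTVS', '3.341', 'C[C@H](N)C(O)=O'],
    ['ALA', 'SMILES', 'CACTVS', '3.341', 'C[CH](N)C(O)=O'],
    ['ALA', 'SMILES_CANONICAL', 'OpenEye OEToolkits', '1.5.0', 'C[C@@H](C(=O)O)N'],
    ['ALA', 'SMILES', 'OpenEye OEToolkits', '1.5.0', 'CC(C(=O)O)N'],
]

ref_smi = {'ALA': pick_smiles(ala_rows)}
print(ref_smi)  # {'ALA': 'C[C@@H](C(=O)O)N'}
```

With the real file, the list passed to pick_smiles would be built from desc.GetRow(ii) for each component block instead of being typed in.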

The components file is rather large, so parsing it can take a while.
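
One way around the parse time is to extract the descriptors once and cache the resulting lookup in a small JSON file, so later runs skip components.cif entirely. A rough sketch using only the standard library follows. The names load_or_build and build_example and the cache file name are hypothetical, and build_example is just a stand-in for the slow step of parsing the CCD.

```python
import json
import os
import tempfile


def load_or_build(cache_path, build):
    """Load a {comp_id: smiles} dict from cache_path, or call build() and save it."""
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            return json.load(fh)
    table = build()
    with open(cache_path, 'w') as fh:
        json.dump(table, fh, indent=1, sort_keys=True)
    return table


def build_example():
    # Stand-in for the slow step (parsing components.cif and picking one
    # SMILES per component); here it returns a single hand-typed entry.
    return {'ALA': 'C[C@@H](C(=O)O)N'}


# Use a temporary directory so the example does not touch the working directory.
cache = os.path.join(tempfile.mkdtemp(), 'ccd_smiles.json')
first = load_or_build(cache, build_example)   # builds the table and writes the cache
second = load_or_build(cache, build_example)  # reads the cache, build() is not called
print(first == second)  # True
```

The cache needs to be deleted and rebuilt whenever a newer components.cif is downloaded.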

On Fri, Oct 27, 2023 at 10:55 AM He, Amy <he.1...@buckeyemail.osu.edu>
wrote:

> Dear RDKit experts,
>
>
>
> I need your advice on finding a source Smiles library for reference, to
> build the template molecule from Smiles for AssignBondOrdersFromTemplate
> <https://www.rdkit.org/docs/source/rdkit.Chem.AllChem.html>.
>
>
>
> I am using AssignBondOrdersFromTemplate to perceive bonds in a
> residue-wise manner from an input PDB, using a reference Smiles library
> like this:
>
>
>
> ref_smi = {
>
>
>
>     "ALA": "NC(C)C(=O)",
>
>     "GLY": "NCC(=O)",
>
>     "ILE": "NC(C(C)CC)C(=O)",
>
>
>
> }
>
>
> I wonder if there has been an open reference library for common amino
> acids and ligands that present in PDB files. A previous post on
> rdkit-discuss (
> https://rdkit-discuss.narkive.com/JM2IGLQz/pdb-reader-and-bond-perception)
> points me to this website:
>
> ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz
>
> and useful links from
>
> http://www.ebi.ac.uk/pdbe-srv/pdbechem/
>
>
>
> But I am no longer able to access the contents.
>
>
>
> I guess we could always generate Smiles from the standardized SDF files..
> Still I am wondering if there is an existing Smiles library (like a
> reference datafile), where we can retrieve the Smiles string using the
> residue names of common amino acids and maybe also ligands.
>
>
>
> Any comments or suggestions would be greatly appreciated. Thank you for
> your time and kind support in advance!
>
>
>
>
>
> Bests,
>
>
>
>
>
> --
>
> Amy He
>
> Chemistry Graduate Teaching Assistant
>
> Hadad Lab
>
> Ohio State University
>
> he.1...@osu.edu
>
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>