Since Roger's the expert here and isn't on the list, I forwarded the email
to him. Just to have it on the list as well, here's his answer:


On Thu, Mar 26, 2020 at 10:27 AM Roger Sayle <ro...@nextmovesoftware.com>
wrote:

>
>
> Hi Greg (and Graham),
>
>
>
> The short answer is that RDKit currently only supports a “bioinformatics”
> subset of HELM, i.e. the protein and nucleic acid sequences that can usefully
> be read from and written to FASTA sequence files, and easily converted to or
> from PDB files. Actually it allows a bit more than this, including support
> for disulphide bridges, D-amino acids, and some common non-standard amino
> acids (like phosphotyrosine and phosphoserine in kinases), but again only
> things that can internally be handled/represented using PDB residue codes.
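A minimal sketch of what that supported subset looks like from Python, using RDKit's sequence readers and writers (`MolFromHELM`, `MolFromSequence`, `MolToSequence`, `MolToHELM`). The tripeptide Gly-Ala-Cys is an arbitrary example of mine, not something from this thread.

```python
from rdkit import Chem

# A plain linear peptide of standard amino acids sits inside the supported subset.
helm = "PEPTIDE1{G.A.C}$$$$"
mol = Chem.MolFromHELM(helm)
if mol is None:
    raise ValueError(f"RDKit could not parse HELM: {helm}")

# The same molecule can be written out as a one-letter sequence or as SMILES.
sequence = Chem.MolToSequence(mol)
smiles = Chem.MolToSmiles(mol)
print(sequence)
print(smiles)

# Going the other way: one-letter sequence -> Mol -> HELM.
round_trip = Chem.MolToHELM(Chem.MolFromSequence(sequence))
print(round_trip)
```

Anything outside the subset (custom monomers, branching through side chains) is where the reader is expected to give up.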
>
>
>
> HELM allows folks to do some pretty wacky stuff. In your example, that means
> connecting an amino group to a lysine sidechain to create a hydrazine, and
> connecting an acetyl group to the C-terminus of a peptide to form some weird
> kind of oxalate, i.e. a peptide that ends -C(=O)C(=O)C. Alas, these cases,
> along with dendritic and cross-linked peptides, aren’t in the subset of HELM
> that RDKit currently supports.
>
>
>
> I suspect that the HELM reader in RDKit will continue to improve over time,
> but this is mostly a convenience, as there already exists a completely free,
> open source and definitive reference implementation for converting HELM to
> SMILES: the Pistoia Alliance’s Java toolkit (perhaps even via Jython). Hence,
> the same editors that folks use to enter HELM also have “export as SMILES”
> or “export as Mol” options. Alas, any significant effort to support all the
> corner cases and monomer definitions of HELM would mostly be “reinventing
> the wheel” by porting this Pistoia Alliance code from Java to C++, and would
> then carry the maintenance burden of keeping it up to date as the HELM
> specification continues to evolve.
>
>
>
> Finally, for sticking arbitrary functional groups (SMILES strings) onto the
> N- and C-termini of peptides, Greg may have suggestions for ways of
> manipulating RDKit mols without the need to go via HELM, which has the burden
> of defining custom monomers. Instead, a SMIRKS or reaction SMARTS could be
> used, easily enumerating libraries of alternately terminated peptides.
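A rough sketch of the reaction SMARTS idea, assuming the caps are supplied as carboxylic acids and only the N-terminus is modified. The SMARTS pattern, the `cap_n_terminus` helper and the two example caps are my own illustrative choices, not something Roger specified, and the pattern only handles a free primary amine at the N-terminus (so not an N-terminal proline).

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Amide coupling restricted to the N-terminus: a primary amine on an sp3
# carbon that also carries the backbone carbonyl. Lysine side-chain amines
# and Asn/Gln side-chain amides do not fit this pattern.
N_CAP_RXN = AllChem.ReactionFromSmarts(
    "[NX3;H2:1][CX4:2][CX3:3]=[O:4].[CX3:5](=[O:6])[OX2H1]"
    ">>[C:5](=[O:6])[N:1][C:2][C:3]=[O:4]"
)


def cap_n_terminus(peptide: Chem.Mol, acid_smiles: str) -> str:
    """Acylate the peptide's N-terminus with a carboxylic acid; return SMILES."""
    acid = Chem.MolFromSmiles(acid_smiles)
    if acid is None:
        raise ValueError(f"Invalid SMILES for capping acid: {acid_smiles!r}")
    product_sets = N_CAP_RXN.RunReactants((peptide, acid))
    if not product_sets:
        raise ValueError("No free N-terminal primary amine or no acid group found")
    product = product_sets[0][0]
    Chem.SanitizeMol(product)  # reaction products come back unsanitized
    return Chem.MolToSmiles(product)


# Enumerate a tiny library of alternately N-capped peptides.
peptide = Chem.MolFromSequence("ICECREAM")
caps = {"acetyl": "CC(=O)O", "benzoyl": "OC(=O)c1ccccc1"}
for name, acid_smiles in caps.items():
    print(name, cap_n_terminus(peptide, acid_smiles))
```

A C-terminal variant would need its own pattern that tells the backbone carboxylic acid apart from Asp/Glu side chains, for example by requiring a nitrogen on the neighbouring carbon.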
>
>
>
> Graham: If you select your HELM depiction, then right click in the
> “Structure View” tab of the Pistoia HELM webeditor, you’ll be given the
> option “Copy Molfile”, which should hopefully be sufficient for you to work
> around this issue and get your modified peptides into RDKit.
>
> Greg: Hopefully this response makes sense and the relevant bits can be
> pasted to the rdkit mailing list. I suspect there’s an interesting blog post
> (or RDKit meeting talk) on using pyjnius or similar (perhaps even just
> os.system(“java …”)) to call the Pistoia Java libraries from Python, and then
> process the resulting SMILES in RDKit. I’ve also heard rumours that something
> called KNIME is a convenient way to link RDKit, Python and Java.
>
>
>
> I hope this helps. Please let me know if you disagree and would prefer
> branched peptides to be supported directly in RDKit’s HELM readers and
> writers. Sorry for any inconvenience.
>
>
>
> Best regards,
>
> Roger
>
> --
> Roger Sayle, PhD.
> CEO and founder
> NextMove Software Limited
> Registered in England No. 07588305
> Registered Office: Innovation Centre, 320 Cambridge Science Park,
> Cambridge, CB4 0WG
>


On Wed, Mar 25, 2020 at 3:43 PM Graham Simpson <gra...@daringbio.com> wrote:

> Hi all, Please be gentle with me! To be honest I am an absolute amateur
> at Python and RDKit. I'm a trained peptide chemist and am trying to convert
> some peptide sequences with modifications into SMILES codes. I've decided,
> maybe wrongly, that the best route is via HELM codes to MOL to SMILES using
> RDKit in Python, mainly since the peptides are quite modified at the
> C-/N-terminus and branched.
> I'm a little embarrassed by my code but I have posted it here!
>
> https://github.com/grahamsimpson/peptides/blob/master/single_peptide_to_HELM.py
>
> The issue I'm having is with RDKit's MolFromHELM. I've been using the HELM
> Webeditor (http://webeditor.openhelm.org/hwe/examples/App.htm) to
> validate HELM codes, but MolFromHELM keeps throwing up an error
> (resulting in None), and subsequently MolToSequence or MolToSmiles don't
> work.
>
> Traceback (most recent call last):
>   File "test_single_HELM.py", line 34, in <module>
>     Sequence = Chem.MolToSequence(mol)
> Boost.Python.ArgumentError: Python argument types in
>     rdkit.Chem.rdmolfiles.MolToSequence(NoneType)
> did not match C++ signature:
>     MolToSequence(RDKit::ROMol mol)
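The traceback above is what you get when the `None` from a failed `MolFromHELM` call is passed straight on to another RDKit function. A small sketch of a guard that fails at the parsing step instead; `helm_to_smiles` is a hypothetical helper name, not an RDKit function.

```python
from rdkit import Chem


def helm_to_smiles(helm: str) -> str:
    """Convert a HELM string to SMILES, failing loudly if RDKit cannot parse it."""
    mol = Chem.MolFromHELM(helm)
    if mol is None:
        # As in the case reported above, an unsupported HELM string shows up
        # as None, so check before handing the result to any other function.
        raise ValueError(f"RDKit could not parse this HELM string: {helm!r}")
    return Chem.MolToSmiles(mol)


print(helm_to_smiles("PEPTIDE1{I.C.E}$$$$"))
```

With this in place the error message names the offending HELM string rather than complaining about a `NoneType` argument several lines later.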
>
> The peptides I want are of the form
> - linear: Ac-ICECREAMAAICECREAMDD-NH2
> - branched: (Ac-ICECREAMAA)(Ac-ICECREAMDD)K-NH2
> and I'm looking to get the SMILES codes.
>
>
> PEPTIDE1{[ac]}|PEPTIDE2{[am]}|PEPTIDE3{[ac].I.C.E.C.R.E.A.M.A.A.K.D.D.M.A.E.R.C.E.C.I}$PEPTIDE1,PEPTIDE3,1:R2-22:R2|PEPTIDE2,PEPTIDE3,1:R1-12:R3$$$V2.0
> or drawn another way
>
>
> PEPTIDE1{[ac].I.C.E.C.R.E.A.M.A.A}|PEPTIDE2{[ac].I.C.E.C.R.E.A.M.D.D.K.[am]}$PEPTIDE2,PEPTIDE1,12:R3-11:R2$$$V2.0
>
> [image: image.png]
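For the linear case, the HELM string can be assembled with plain string handling. A minimal sketch, assuming one-letter codes for the 20 standard amino acids and the `[ac]`/`[am]` cap monomers used in the strings above; `linear_peptide_to_helm` is a hypothetical helper, not taken from the posted script, and it does not attempt the branched form.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def linear_peptide_to_helm(sequence: str, n_cap: str = "", c_cap: str = "") -> str:
    """Build a HELM V2.0 string for a single unbranched peptide chain.

    `sequence` uses one-letter amino acid codes; `n_cap` and `c_cap` are
    optional HELM monomer IDs such as "ac" (acetyl) and "am" (amide).
    """
    monomers = list(sequence.upper())
    unknown = sorted(set(monomers) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Not standard one-letter codes: {unknown}")
    if n_cap:
        monomers.insert(0, f"[{n_cap}]")
    if c_cap:
        monomers.append(f"[{c_cap}]")
    # HELM V2.0 layout: polymers $ connections $ groups $ annotations $ version
    return "PEPTIDE1{" + ".".join(monomers) + "}$$$$V2.0"


print(linear_peptide_to_helm("ICECREAMAAICECREAMDD", n_cap="ac", c_cap="am"))
```

Whether a given reader accepts the result still depends on which monomers and connections that reader supports.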
>
>
> I would be very grateful for any help you could give me. I'm sure there
> are more efficient ways of doing this with dictionaries or other
> approaches. This has been bugging me for a while (but has been a great
> learning experience!).
> Please let me know if there are other places I could post to find the
> answer to this.
> Thanks very much in advance,
>
> Best wishes,
>
> Graham
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>