Re: [Cdk-user] Atom Signatures with stereo-info

2017-07-21 Thread John Mayfield
Here's how you can convert the atom indices to a SMILES with stereo.
2.1-SNAPSHOT cleans up the stereo API, avoids the cast, and actually makes
this a lot easier. It's done quick and dirty here, but you get the idea.

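import java.util.HashSet;
import java.util.Set;
import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.fingerprint.CircularFingerprinter;
import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IBond;
import org.openscience.cdk.interfaces.IStereoElement;
import org.openscience.cdk.interfaces.ITetrahedralChirality;
import org.openscience.cdk.smiles.SmilesGenerator;

public static String toSmiles(CircularFingerprinter.FP fp,
                              IAtomContainer mol) throws CDKException
{
  // copy the atoms of this fingerprint feature into a new container
  IAtomContainer part = mol.getBuilder().newAtomContainer();
  Set<IAtom> aset = new HashSet<>();
  for (int idx : fp.atoms) {
    aset.add(mol.getAtom(idx));
    part.addAtom(mol.getAtom(idx));
  }
  // keep only the bonds that have both ends inside the atom set
  for (IBond bond : mol.bonds()) {
    if (aset.contains(bond.getBegin()) &&
        aset.contains(bond.getEnd()))
      part.addBond(bond);
  }
  // keep a tetrahedral centre only if it and all four ligands are in the set
  for (IStereoElement se : mol.stereoElements()) {
    if (se instanceof ITetrahedralChirality) {
      ITetrahedralChirality tc = (ITetrahedralChirality) se;
      if (aset.contains(tc.getChiralAtom()) &&
          aset.contains(tc.getLigands()[0]) &&
          aset.contains(tc.getLigands()[1]) &&
          aset.contains(tc.getLigands()[2]) &&
          aset.contains(tc.getLigands()[3]))
        part.addStereoElement(tc);
    }
  }
  return SmilesGenerator.isomeric().create(part);
}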



Re: [Cdk-user] Atom Signatures with stereo-info

2017-07-21 Thread John Mayfield
> Although this produces bit-fingerprints and not any String-representation
> of the signatures if I'm reading this correctly?


Yes, but notice that it also gives you the atom indexes, which is much more
powerful than just giving the String. We actually have a utility to get the
SMARTS for the atoms. It won't give you stereo, but it would be pretty easy
to make it do that if you were so inclined, by outputting SMILES instead of
SMARTS:


SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
IAtomContainer mol = smipar.parseSmiles("CC[C@H](C)CO");
CircularFingerprinter fp = new CircularFingerprinter(CircularFingerprinter.CLASS_ECFP6);
fp.calculate(mol);

SmartsFragmentExtractor smafrag = new SmartsFragmentExtractor(mol);

for (int i = 0; i < fp.getFPCount(); i++)
  System.out.println(smafrag.generate(fp.getFP(i).atoms));


Result:

[CH3v4X4+0]
> [CH2v4X4+0]
> [CH2v4X4+0]
> [CH2v4X4+0]
> [CH2v4X4+0]
> [CH2v4X4+0]
> [CH1v4X4+0]
> [CH3v4X4+0]
> [CH2v4X4+0]
> [OH1v2X2+0]
> [CH3v4X4+0][CH2v4X4+0]
> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
> [CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
> [CH1v4X4+0][CH3v4X4+0]
> [CH1v4X4+0][CH2v4X4+0][OH1v2X2+0]
> [CH2v4X4+0][OH1v2X2+0]
> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
> [CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
> [CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>
> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
>
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
>
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>
> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]



> However, I have done some experiments comparing the circular fingerprints
> of enantiomers and also diastereomers, and they turn out to have 1.0
> Tanimoto scores.
> What am I doing wrong?


Unfortunately, the way it was written, you currently need 2D coordinates.
It's an easy fix if you want to submit the patch: it just needs to pull the
tetrahedral rubric out of the IStereoElements. Note that the IStereoElements
are created automatically from 2D/3D coordinates.

SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
IAtomContainer mol1 = smipar.parseSmiles("CC[C@H](C)CO");
IAtomContainer mol2 = smipar.parseSmiles("CC[C@@H](C)CO");
CircularFingerprinter fp = new CircularFingerprinter(CircularFingerprinter.CLASS_ECFP6);
// without coordinates the two enantiomers are indistinguishable
System.out.println(Tanimoto.calculate(fp.getFingerprint(mol1),
                                      fp.getFingerprint(mol2)));
// 1.0
// after generating 2D coordinates the stereo is picked up
StructureDiagramGenerator sdg = new StructureDiagramGenerator();
sdg.generateCoordinates(mol1);
sdg.generateCoordinates(mol2);
System.out.println(Tanimoto.calculate(fp.getFingerprint(mol1),
                                      fp.getFingerprint(mol2)));
// 0.77
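
For anyone wondering why the score is exactly 1.0 before the coordinates are
generated: Tanimoto only compares which bits are set, so if the stereo never
reaches the fingerprint the two enantiomers set identical bits. A minimal
plain-Java sketch of the coefficient on java.util.BitSet follows. It does not
use the CDK; the class name, the bit positions, and the convention for two
empty fingerprints are assumptions made for illustration only.

```java
import java.util.BitSet;

public class TanimotoSketch {

    /** Tanimoto coefficient: (bits set in both) / (bits set in either). */
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        // Convention chosen for this sketch: two empty fingerprints are identical.
        if (or.isEmpty()) return 1.0;
        return (double) and.cardinality() / or.cardinality();
    }

    public static void main(String[] args) {
        BitSet fpA = new BitSet(1024);
        BitSet fpB = new BitSet(1024);
        // Made-up bit positions standing in for features shared by both enantiomers.
        for (int bit : new int[] {3, 17, 42, 256, 700}) {
            fpA.set(bit);
            fpB.set(bit);
        }
        System.out.println(tanimoto(fpA, fpB)); // identical bit sets, prints 1.0

        // Made-up stereo-specific bits, a different one for each enantiomer.
        fpA.set(901);
        fpB.set(902);
        System.out.println(tanimoto(fpA, fpB)); // 5 shared bits out of 7 set in total
    }
}
```

As soon as each molecule sets even one bit the other lacks, the score drops
below 1.0, which is the effect seen above once the coordinates are present.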



On 21 July 2017 at 12:25, Christoph Steinbeck <
christoph.steinb...@uni-jena.de> wrote:

> CircularFingerprinter.getBitFingerprint().asBitString().toString();
>
> or
>
> Integer.toString(CircularFingerprinter.getFP())
>
> Did not test this.
>
> Kind regards,
>
> Chris
>
>
> —
> Prof. Dr. Christoph Steinbeck
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> Phone Secretariat: +49-3641-948171
> http://orcid.org/-0001-6966-0814
>
> What is man but that lofty spirit - that sense of enterprise.
> ... Kirk, "I, Mudd," stardate 4513.3..
>
> > On 21 Jul 02017, at 13:09, Staffan Arvidsson <
> staffan.arvids...@gmail.com> wrote:
> >
> > OK thanks! Although this produces bit-fingerprints and not any
> String-representation of the signatures if I'm reading this correctly?
> Currently all our code requires the Signatures to be Strings. Would require
> a large rewrite to get this to work for us. Because the javadoc says that
> method getRawFingerprint is not correct so I should not use it? (Even
> though this would be something more like what we want)
> >
> > Best,
> > Staffan
> >
> > 2017-07-21 11:59 GMT+02:00 John Mayfield :
> > Yes,
> >
> > Use the CircularFingerprinter, it encodes stereochemistry, the relevant
> method is CircularFingerprinter.getFP() which will give you the atoms
> involved and the hashed value. IIRC the first atom in the list is the
> 'root'.
> >
> > John
> >
> > On 21 July 2017 at 09:39, Staffan Arvidsson 
> wrote:
> > Hi all,
> >
> > I wonder if there is any way of producing atom signatures with
> 

Re: [Cdk-user] Atom Signatures with stereo-info

2017-07-21 Thread John Mayfield
Yes,

Use the CircularFingerprinter, it encodes stereochemistry. The relevant
method is CircularFingerprinter.getFP(), which will give you the atoms
involved and the hashed value. IIRC the first atom in the list is the
'root'.
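
If the surrounding code expects String signatures, one option is to derive a
String key from the hashed value, optionally with the atom indices appended.
The sketch below is plain Java and does not call the CDK: the Feature class is
a hypothetical stand-in holding a hashed value and atom indices, and the key
formats are assumptions chosen for illustration, not anything the CDK defines.

```java
import java.util.Arrays;

public class FeatureKeySketch {

    /** Hypothetical stand-in for one fingerprint feature: hashed value plus atom indices. */
    public static final class Feature {
        public final int hashCode;
        public final int[] atoms;

        public Feature(int hashCode, int[] atoms) {
            this.hashCode = hashCode;
            this.atoms = atoms;
        }
    }

    /** String key built from the hashed value only (unsigned hexadecimal). */
    public static String hashKey(Feature f) {
        return Integer.toHexString(f.hashCode);
    }

    /** Hash key followed by the atom indices, kept in their original order. */
    public static String locatedKey(Feature f) {
        return hashKey(f) + "@" + Arrays.toString(f.atoms);
    }

    public static void main(String[] args) {
        // Made-up values; a real feature would come from the fingerprinter.
        Feature f = new Feature(0x1a2b3c, new int[] {2, 1, 3});
        System.out.println(hashKey(f));
        System.out.println(locatedKey(f));
    }
}
```

The atom order is left untouched on purpose, so that whatever atom comes first
in the list stays first in the key.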

John

On 21 July 2017 at 09:39, Staffan Arvidsson wrote:

> Hi all,
>
> I wonder if there is any way of producing atom signatures with
> stereoinformation? Currently we're using
>
> String signature = new AtomSignature(atom, height,
> molecule).toCanonicalString();
>
> to produce the signatures.
>
>
> Best,
> Staffan
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
>


Re: [Cdk-user] Atom Signatures with stereo-info

2017-07-21 Thread Egon Willighagen
Hi Staffan,

some years ago we looked at exactly this with AZ (Lars) and Gilleain
(EBI)... I need to look up that code...

Egon


On Fri, Jul 21, 2017 at 10:39 AM, Staffan Arvidsson <
staffan.arvids...@gmail.com> wrote:

> Hi all,
>
> I wonder if there is any way of producing atom signatures with
> stereoinformation? Currently we're using
>
> String signature = new AtomSignature(atom, height,
> molecule).toCanonicalString();
>
> to produce the signatures.
>
>
> Best,
> Staffan


-- 
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: -0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen


[Cdk-user] Atom Signatures with stereo-info

2017-07-21 Thread Staffan Arvidsson
Hi all,

I wonder if there is any way of producing atom signatures with
stereoinformation? Currently we're using

String signature = new AtomSignature(atom, height,
molecule).toCanonicalString();

to produce the signatures.


Best,
Staffan