from:"Esben Jannik Bjerrum via Rdkit\-discuss"

Re: [Rdkit-discuss] enumeration of smiles question

2018-08-10 Thread Esben Jannik Bjerrum via Rdkit-discuss

Hi There, Just saw this interesting thread :-) The code I posted on GitHub
https://github.com/EBjerrum/SMILES-enumeration as referenced previously in this
thread also uses randomization of atom order, similar to Greg's solution here,
to generate more enumerated SMILES than using the rootedAtom approach. Its not
a complete enumeration, as there interestingly also seem to be other ways to
represent the molecules with dots! Thanks, could be interesting to explore!

Nevertheless, the actual enumerator code is wrapped in a couple of objects,
which can be used to either just generate the SMILES dataset in various forms,
or do it on the fly as batch generators. That works nicely with the
fit_generator function of Keras if you use that framework. This avoids memory
issues with large datasets and is convenient, at the cost of some overhead in
the training (a few percent longer training).
In some of my recent applications I use the binary format or the mol objects
directly, instead of round tripping the SMILES over an RDKit molecule.

It seems like the enumeration trick is a nice way to break the SMILES
serialization of the molecular representation and somehow generate an internal
representation of the molecule closer to the graph we think of molecules in. I
did some work with autoencoders as hetereoencoder, trying to encode different
molecular formats and also from enumerated to enumerated. It seem to work! even
though I'm presenting a random SMILES and ask the network to encode it to a
vector and then decode into another randomly chosen SMILES of the same molecule
during training. Each time a new pair of two randomly generated SMILES of the
same molecule. The teacher forcing of the decoder is probably crucial here, as
it lets the decoder correct its later guesses, based on the actual right answer
pr. character. Doing this seem to have a lot of influence on the latent space
encoded by the autoencoder, with possible implications for molecular de novo
generation.
Theres a preprint here: https://arxiv.org/abs/1806.09300
Some researchers at Bayer have independently from me also worked on similar
approaches and showed improvements for using the latent space representation
for QSAR modelling.
https://chemrxiv.org/articles/Learning_Continuous_and_Data-Driven_Molecular_Descriptors_by_Translating_Equivalent_Chemical_Representations/6871628
I guess we haven't seen the end of this yet, as there is a lot to explore and
improve on. Its super fascinating how far a bit of deep learning and data
augmentation of the SMILES works.
Best RegardsEsben
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Reverse scale of svg coordinates to atom coordinates

2017-06-21 Thread Esben Jannik Bjerrum via Rdkit-discuss

Thank you for the fast reply, Greg and Yang,   GetDrawCoords seem to be exactly 
what I need to proceed. I'll get the bleeding edge version or try and patch the 
code :-)
Esben Jannik Bjerrum
cand.pharm, Ph.D
/Sent from my Ubuntu Touch Phone

Phone +45 2823 8009
http://dk.linkedin.com/in/esbenbjerrum
http://www.wildcardconsulting.dk--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

[Rdkit-discuss] Reverse scale of svg coordinates to atom coordinates

2017-06-21 Thread Esben Jannik Bjerrum via Rdkit-discuss

Hi RDkitters,   I'm experimenting a bit with an application with some user 
interactivity. I get the SVG coordinates from the Mol SVG drawing from the user 
interaction and need to get back to the atom coordinates with the goal of 
identifying the atom nearest the selected coordinates (or is there a smarter 
way to achieve that goal?).
Is this possible from Python currently?
I see that there is a MolDraw2D::getAtomCoords function in the cpp code for 
MolDraw2D, but I can't see it from the Python side, and there don't seem to be 
a way to get the scaling from the MolDraw2DSVG object.

As I understand it, the coordinates from the molecule are offset and scaled 
(and flipped for y) to fit the drawing canvas of the specified size. To get 
back to the original atom coordinates I must somehow reverse the 
transformation. Here's some script snippets illustrating what I try to achieve.

#Get som SVG depiction of a mol
mol = Chem.MolFromSmiles('')
mc = Chem.Mol(mol.ToBinary())
rdDepictor.Compute2DCoords(mc)
drawer = rdMolDraw2D.MolDraw2DSVG(300,300)
drawer.DrawMolecule(mc)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')

##Visualization and User interaction code here gives SVG coordinates
#
svg_x = 271.0svg_y = 237.0

#Is there a function to scale back the coordinates? alternatively get the 
scaling and the offset from drawer and handle it manually
atomcoords = drawer.getAtomCoords((svg_x, svg_y))
#But this function doesn't exist:-(
#...#Continue working with the rdkit mol

I would welcome some hints or suggestions.

Esben Jannik Bjerrum
cand.pharm, Ph.D
/Sent from my Ubuntu Touch Phone

Phone +45 2823 8009
http://dk.linkedin.com/in/esbenbjerrum
http://www.wildcardconsulting.dk--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] enumeration of smiles question

Re: [Rdkit-discuss] Reverse scale of svg coordinates to atom coordinates

[Rdkit-discuss] Reverse scale of svg coordinates to atom coordinates

3 matches

Site Navigation

Mail list logo

Footer information