On 06/15/2017 03:50 PM, Greg Landrum wrote:
Thanks for letting people know about this. If we can get a consensus form that people agree makes sense, this might be a nice addition to either the RDKit/Scripts directory or the cookbook.

A couple of smallish comments after a quick skim:
- I would really strongly encourage you to use the ETKDG parameters (http://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00654) when doing the embedding. This really helps a lot with the quality of the conformations and lets you skip the UFF step.
- The built-in RMSD pruning has improved since JP's article, it may be worth looking at that.

It would be nice to have a much faster protocol than the one I implemented.

This protocol (the one from the paper) is very slow because of
the RMSD pruning step, not because of UFF.
The more conformers per molecule you need, the slower it gets.

But it works, at least.
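
To illustrate why a pruning step like this gets slower as more conformers are requested, here is a minimal pure-Python sketch of greedy RMSD pruning. It is not the code from smi2sdf3d or from RDKit: the function names are hypothetical, and the RMSD is computed without any alignment or symmetry handling, which a real implementation would need. Each new conformer is compared against every conformer kept so far, so the worst case is n(n-1)/2 RMSD evaluations for n conformers.

```python
import math


def rmsd(a, b):
    """Plain RMSD between two equally sized coordinate lists (no alignment)."""
    total = sum(
        (p - q) ** 2
        for xyz_a, xyz_b in zip(a, b)
        for p, q in zip(xyz_a, xyz_b)
    )
    return math.sqrt(total / len(a))


def greedy_prune(conformers, threshold):
    """Keep a conformer only if it is at least `threshold` from every kept one.

    Returns the kept indices and the number of RMSD evaluations performed.
    """
    kept = []
    n_comparisons = 0
    for i, conf in enumerate(conformers):
        is_new = True
        for j in kept:
            n_comparisons += 1
            if rmsd(conf, conformers[j]) < threshold:
                # Too close to an already kept conformer: drop it.
                is_new = False
                break
        if is_new:
            kept.append(i)
    return kept, n_comparisons
```

When most conformers are distinct, the kept list grows with every iteration and the comparison count grows quadratically, which matches the slowdown described above.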

The problem with changing the protocol to something more modern
is that you have to redo all the statistical validation they
did to confirm that it works well.
That requires quite some time and motivation.

- If you want to make the embedding step itself robust, it wouldn't be a bad idea to try switching to random coordinate generation if the initial embedding fails.

Thanks for the comment. I might update this part if I see it fail.
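
For reference, the two embedding suggestions (ETKDG parameters, then a fall back to random starting coordinates when embedding fails) could look roughly like the sketch below. This is not the protocol from the paper and not the code in smi2sdf3d. The function name, the 0.5 RMSD threshold and the seed are illustrative assumptions, and the sketch assumes an RDKit release that provides AllChem.ETKDG().

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_conformers(smiles, n_confs=10, rms_thresh=0.5, seed=42):
    """Embed up to n_confs conformers with ETKDG.

    If nothing can be embedded, retry once with random starting coordinates.
    rms_thresh and seed are illustrative values, not validated settings.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    mol = Chem.AddHs(mol)

    params = AllChem.ETKDG()
    params.randomSeed = seed
    # Built-in RMSD pruning applied during embedding.
    params.pruneRmsThresh = rms_thresh
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))

    if not conf_ids:
        # Fallback: start from random coordinates instead of the
        # distance-matrix eigenvalue embedding.
        params.useRandomCoords = True
        conf_ids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))
    return mol, conf_ids
```

A call such as embed_conformers("CCO", n_confs=10) returns the molecule with explicit hydrogens and the list of conformer IDs that survived pruning. Whether the result is as good as the published protocol would still need the validation mentioned above.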

Regards,
F.

Best,
-greg



On Wed, Jun 14, 2017 at 9:27 AM, Francois BERENGER <beren...@bioreg.kyushu-u.ac.jp> wrote:

    Hello,

    I tried to reproduce the protocol described in:

    @article{DBLP:journals/jcisd/EbejerMD12,
       author    = {Jean{-}Paul Ebejer and Garrett M. Morris and
                    Charlotte M. Deane},
       title     = {Freely Available Conformer Generation Methods:
                    How Good Are They?},
       journal   = {Journal of Chemical Information and Modeling},
       volume    = {52},
       number    = {5},
       pages     = {1146--1158},
       year      = {2012},
       url       = {https://doi.org/10.1021/ci2004658},
       doi       = {10.1021/ci2004658},
    }

    The resulting script is here:

    https://github.com/UnixJunkie/smi2sdf3d

    I hope I have reproduced their protocol exactly.
    Sorry, my Python is quite rusty these days.

    Comments and contributions are welcome.

    Even auditing the code for correctness is welcome since it is
    doing some scientific computation.

    It is a little too slow for my taste.

    You can use it like this to get a max of 10 conformers
    per molecule in your input.smi file:

    ./smi2sdf.py 10 input.smi output.sdf

    Best regards,
    Francois.

    
------------------------------------------------------------------------------
    Check out the vibrant tech community on one of the world's most
    engaging tech sites, Slashdot.org! http://sdm.link/slashdot
    _______________________________________________
    Rdkit-discuss mailing list
    Rdkit-discuss@lists.sourceforge.net
    <mailto:Rdkit-discuss@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
    <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>


