[Rdkit-discuss] Chemical Formula to SMILES

2018-08-10 Thread Ali Eftekhari
Hello,

I am new to RDKit and, to start, I want to convert a chemical formula
to SMILES. For example, I have 'C16H14O10'; how can I get its SMILES?

Thanks
Ali
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] UFF/MMFF atom types

2018-08-10 Thread Paolo Tosco
Hi Michal,

Ouch, it looks like I have somehow forgotten about this. I’ll have a look next 
week and get back to you.

Cheers,
p.

> On 10 Aug 2018, at 19:20, Michal Krompiec  wrote:
> 
> Hi Paolo,
> Has anything changed (re adding new atom types to UFF or MMFF) since then?
> Best,
> Michal 
> 
>> On Tue, 5 Nov 2013 at 06:56, Paolo Tosco  wrote:
>> Hi both,
>> 
>> now I realize that yesterday I replied to Michal and not to the list; sorry
>> about that. Adding the option to force an atom type replacement wouldn't be
>> too much work, but would not be ideal because in the case of selenium you
>> would get, for instance, the same VdW radius and equilibrium bond distances
>> as for sulfur, which would both be too short. That could also be handled
>> upstream of the atom typing by replacing Se with S and putting Se back
>> downstream, as I suggested yesterday to Michal, but that is again a bit of
>> a hack. Probably a better solution would be to allow the user to provide
>> some new parameters (as for UFF, adding Python support) and fall back to a
>> related atom type (sulfur, in this case) for the missing ones. I'll look
>> into that over the next few days and let you know.
>> 
>> Best,
>> p.
>> 
>> -- 
>> ==
>> Paolo Tosco, Ph.D.
>> Department of Drug Science and Technology
>> Via Pietro Giuria, 9 - 10125 Torino (Italy)
>> Tel: +39 011 670 7680 | Mob: +39 348 5537206
>> Fax: +39 011 670 7687 | E-mail: paolo.to...@unito.it
>> http://open3dqsar.org | http://open3dalign.org
>> ==
>> 
>> 
>> 
>>> On 5 Nov 2013, at 07:20, Greg Landrum  wrote:
>>> 
>>> Hi Michal,
>>> 
>>> 
>>>> On Mon, Nov 4, 2013 at 11:45 AM, Michal Krompiec wrote:
>>>> Hello,
>>>> Is Se defined in UFF and/or MMFF94? Apparently, molecules with
>>>> selenophene moieties don't optimize in RDKit, and a warning appears in
>>>> the log: UFFTYPER: Unrecognized atom type: Se2+2
>>> 
>>> Right. UFF Parameters are there for sp3 Se ("Se3+2"), but not for the sp2 
>>> version.
>>> 
>>> There are no MMFF94 parameters for Se.
>>> 
>>>> Is it possible to define/modify the force field by hand? (for example,
>>>> use the parameters of S for Se)
>>> 
>>> If you are working from C++, you can provide UFF parameters when you build 
>>> the force field, but it's not currently possible to do so from Python. It's 
>>> probably possible to add an option to the python UFF code to allow you to 
>>> provide a new atom type (or to over-ride parameters for an existing atom 
>>> type); I'd have to look into that.
>>> In the meantime, the quickest thing you could do would be to modify 
>>> $RDBASE/Code/ForceField/UFF/Params.cpp to add the atom type you want and 
>>> then to rebuild the RDKit.
>>> 
>>> I guess that adding new atom types to MMFF94S is considerably more complex. 
>>> Here one could imagine providing an interface to explicitly set the type of 
>>> an atom to another existing atom type. Paolo would have to let us know how 
>>> much work that is.
>>> 
>>> -greg
>>> 
>> 


Re: [Rdkit-discuss] UFF/MMFF atom types

2018-08-10 Thread Michal Krompiec
Hi Paolo,
Has anything changed (re adding new atom types to UFF or MMFF) since then?
Best,
Michal

On Tue, 5 Nov 2013 at 06:56, Paolo Tosco  wrote:

> Hi both,
>
> now I realize that yesterday I replied to Michal and not to the list;
> sorry about that. Adding the option to force an atom type replacement
> wouldn't be too much work, but would not be ideal because in the case of
> selenium you would get, for instance, the same VdW radius and equilibrium
> bond distances as for sulfur, which would both be too short. That could
> also be handled upstream of the atom typing by replacing Se with S and
> putting Se back downstream, as I suggested yesterday to Michal, but that is
> again a bit of a hack. Probably a better solution would be to allow the
> user to provide some new parameters (as for UFF, adding Python support) and
> fall back to a related atom type (sulfur, in this case) for the missing
> ones. I'll look into that over the next few days and let you know.
>
> Best,
> p.
>
> --
> ==
> Paolo Tosco, Ph.D.
> Department of Drug Science and Technology
> Via Pietro Giuria, 9 - 10125 Torino (Italy)
> Tel: +39 011 670 7680 | Mob: +39 348 5537206
> Fax: +39 011 670 7687 | E-mail: paolo.to...@unito.it
> http://open3dqsar.org | http://open3dalign.org
> ==
>
>
>
> On 5 Nov 2013, at 07:20, Greg Landrum  wrote:
>
> Hi Michal,
>
>
> On Mon, Nov 4, 2013 at 11:45 AM, Michal Krompiec <
> michal.kromp...@gmail.com> wrote:
>
>> Hello,
>> Is Se defined in UFF and/or MMFF94? Apparently, molecules with
>> selenophene moieties don't optimize in RDKit, and a warning appears in
>> the log: UFFTYPER: Unrecognized atom type: Se2+2
>>
>
> Right. UFF Parameters are there for sp3 Se ("Se3+2"), but not for the sp2
> version.
>
> There are no MMFF94 parameters for Se.
>
>> Is it possible to define/modify the force field by hand? (for example,
>> use the parameters of S for Se)
>>
>
> If you are working from C++, you can provide UFF parameters when you build
> the force field, but it's not currently possible to do so from Python. It's
> probably possible to add an option to the python UFF code to allow you to
> provide a new atom type (or to over-ride parameters for an existing atom
> type); I'd have to look into that.
> In the meantime, the quickest thing you could do would be to modify
> $RDBASE/Code/ForceField/UFF/Params.cpp to add the atom type you want and
> then to rebuild the RDKit.
>
> I guess that adding new atom types to MMFF94S is considerably more
> complex. Here one could imagine providing an interface to explicitly set
> the type of an atom to another existing atom type. Paolo would have to let
> us know how much work that is.
>
> -greg
>
>
>
>
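
The fallback idea discussed in this thread (user-supplied parameters, with a
related atom type filling in whatever is missing) can be sketched in plain
Python. This only illustrates the lookup logic and is not RDKit's API: the
function name resolve_params, the parameter names, the sulfur label "S_R" and
all numeric values are hypothetical placeholders. Only the label "Se2+2" comes
from the warning quoted above.

```python
from typing import Dict, Mapping

Params = Dict[str, float]


def resolve_params(
    atom_type: str,
    user_params: Mapping[str, Params],
    builtin_params: Mapping[str, Params],
    fallback_types: Mapping[str, str],
) -> Params:
    """Merge parameters for one atom type (hypothetical helper).

    Start from the built-in entry if there is one, otherwise from the
    related atom type named in fallback_types, then let any user-supplied
    values override individual parameters.
    """
    if atom_type in builtin_params:
        merged = dict(builtin_params[atom_type])
    else:
        related = fallback_types.get(atom_type)
        if related is None or related not in builtin_params:
            raise KeyError(f"no parameters or fallback for {atom_type!r}")
        merged = dict(builtin_params[related])
    # User values win; anything the user left out keeps the base value.
    merged.update(user_params.get(atom_type, {}))
    return merged


# Placeholder numbers for illustration only, NOT real force-field values.
builtin = {"S_R": {"bond_radius": 1.0, "vdw_distance": 4.0}}
user = {"Se2+2": {"bond_radius": 1.2}}
fallbacks = {"Se2+2": "S_R"}

se_params = resolve_params("Se2+2", user, builtin, fallbacks)
```

Here the user overrides only the bond radius, so the van der Waals entry is
still inherited from sulfur. That is the limitation raised in the thread:
every parameter that is not supplied explicitly keeps the sulfur value.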


Re: [Rdkit-discuss] enumeration of smiles question

2018-08-10 Thread Esben Jannik Bjerrum via Rdkit-discuss
Hi there,

I just saw this interesting thread :-) The code I posted on GitHub,
https://github.com/EBjerrum/SMILES-enumeration, as referenced previously in
this thread, also uses randomization of the atom order, similar to Greg's
solution here, to generate more enumerated SMILES than the rootedAtAtom
approach does. It's not a complete enumeration, as interestingly there also
seem to be other ways to represent the molecules with dots! Thanks, that could
be interesting to explore!

Nevertheless, the actual enumerator code is wrapped in a couple of objects,
which can be used either to just generate the SMILES dataset in various forms
or to do it on the fly as batch generators. That works nicely with the
fit_generator function of Keras if you use that framework. This avoids memory
issues with large datasets and is convenient, at the cost of some overhead in
the training (a few percent longer training).

In some of my recent applications I use the binary format or the mol objects
directly, instead of round-tripping the SMILES through an RDKit molecule.

It seems like the enumeration trick is a nice way to break the SMILES
serialization of the molecular representation and somehow generate an internal
representation of the molecule that is closer to the graph we think of
molecules as. I did some work with autoencoders used as heteroencoders, trying
to encode between different molecular formats and also from enumerated to
enumerated SMILES. It seems to work, even though I'm presenting a random
SMILES and asking the network to encode it to a vector and then decode it into
another randomly chosen SMILES of the same molecule during training, each time
with a new pair of two randomly generated SMILES of the same molecule. The
teacher forcing of the decoder is probably crucial here, as it lets the
decoder correct its later guesses based on the actual right answer per
character. Doing this seems to have a lot of influence on the latent space
encoded by the autoencoder, with possible implications for molecular de novo
generation.

There's a preprint here: https://arxiv.org/abs/1806.09300

Some researchers at Bayer have, independently of me, also worked on similar
approaches and showed improvements from using the latent space representation
for QSAR modelling.
https://chemrxiv.org/articles/Learning_Continuous_and_Data-Driven_Molecular_Descriptors_by_Translating_Equivalent_Chemical_Representations/6871628

I guess we haven't seen the end of this yet, as there is a lot to explore and
improve on. It's super fascinating how far a bit of deep learning and data
augmentation of the SMILES can get you.
Best Regards,
Esben
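
A rough sketch of the two ideas described in this message, assuming a recent
RDKit: shuffling the atom order before writing a non-canonical SMILES, and
wrapping that in an endless batch generator so the augmentation happens on the
fly. The names random_smiles and augmented_batches are made up for this
illustration and are not taken from the SMILES-enumeration repository. A
generator meant for Keras would also have to vectorize each batch, which is
left out here.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence


def random_smiles(smiles: str) -> str:
    """Return a non-canonical SMILES written from a shuffled atom order."""
    # Imported lazily so the generator below can be used without RDKit
    # when a different augment function is supplied.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    order = list(range(mol.GetNumAtoms()))
    random.shuffle(order)
    shuffled = Chem.RenumberAtoms(mol, order)
    # canonical=False keeps the shuffled atom order in the output string.
    return Chem.MolToSmiles(shuffled, canonical=False)


def augmented_batches(
    smiles_list: Sequence[str],
    batch_size: int,
    augment: Callable[[str], str] = random_smiles,
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield batches forever, re-augmenting every sampled SMILES."""
    if rng is None:
        rng = random.Random()
    while True:
        picks = [rng.choice(smiles_list) for _ in range(batch_size)]
        yield [augment(s) for s in picks]
```

Because each batch is built only when it is requested, the enumerated dataset
never has to be held in memory, at the price of doing the SMILES round trip
during training.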