Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Curt Fischer
Thanks Ivan -- very helpful.

Is there any consensus on idioms for identifying multiple moieties in the
same fragment?  Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as
some kind of selector and then do some kind of graph traversal routine to
see if any of the matches are covalently connected?
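
One possible idiom, offered as a minimal sketch rather than a list consensus: split the molecule into its covalently connected fragments with Chem.GetMolFrags(mol, asMols=True) and count matches per fragment, which avoids any hand-written graph traversal. The helper name max_matches_in_one_fragment and the test molecules are illustrative assumptions, not something from the thread.

```python
from rdkit import Chem


def max_matches_in_one_fragment(mol, patt):
    """Largest number of matches of patt found within any single fragment."""
    # asMols=True returns each covalently connected component as its own Mol
    frags = Chem.GetMolFrags(mol, asMols=True)
    return max((len(f.GetSubstructMatches(patt)) for f in frags), default=0)


# Primary/secondary amine pattern taken from the original post
patt = Chem.MolFromSmarts('[NX3;H2,H1;!$(NC=O)]')

diamine = Chem.MolFromSmiles('NCCN')        # two amines in one fragment
two_salts = Chem.MolFromSmiles('CCN.CCN')   # one amine in each of two fragments

print(max_matches_in_one_fragment(diamine, patt))
print(max_matches_in_one_fragment(two_salts, patt))
```

A filter along the lines of "reject if the value is greater than 1" would then keep the two-fragment case while dropping the diamine.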

On Sat, Mar 7, 2020 at 3:34 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi Curt,
>
> According to
> https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
> it's not supported:
>
>> Here’s the (hopefully complete) list of SMARTS features that are *not*
>> supported:
>>
>> - Non-tetrahedral chiral classes
>> - the @? operator
>> - explicit atomic masses (though isotope queries are supported)
>> - component level grouping requiring matches in different components,
>>   i.e. (C).(C)
>
> OK, the way it's worded it sounds like (C.C) might be supported (since
> that would be requiring matches in the same component), but as you've seen,
> it isn't supported either...
>
> Ivan
>
>
> On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer 
> wrote:
>
>> Hi rdkit fiends!
>>
>> The [Daylight SMARTS example page](
>> https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
>> gives several examples for "multiple group" smarts, including these strings:
>>
>> ([Cl!$(Cl~c)].[c!$(c~Cl)])
>> ([Cl]).([c])
>> ([Cl].[c])
>> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>>
>> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>>
>> For example,  Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
>> this error message:
>>
>> ```
>> [13:01:41] SMARTS Parse Error: syntax error while parsing:
>> ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
>> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS
>> '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
>> ```
>> My understanding of SMARTS is that the outermost parentheses in this
>> SMARTS string are required to force the chlorine and the aromatic carbon to
>> be somewhere in the same covalently connected fragment.  E.g. this pattern
>> *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
>> hydrochloride salt of aniline Cl.Nc1ccccc1.
>>
>> What am I getting wrong?  Is there a way to write rdkit-parsable SMARTS
>> that achieves this?  (I want to filter out molecules that contain more than
>> one of certain moieties, while allowing molecules that have one (or zero)
>> such moieties.  But salts or covalently disconnected fragments that each
>> contain one instance of the moiety should be fine.)
>>
>> Details on my setup:
>>
>> - RDKit Version: 2019.09.3
>> - Operating system: macOS 10.15.2
>> - Python version (if relevant): 3.6
>> - Are you using conda? yes
>> - If you are using conda, which channel did you install the rdkit from?
>> `conda-forge`
>> - If you are not using conda: how did you install the RDKit?
>>
>> Curt
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>


[Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Curt Fischer
Hi rdkit fiends!

The [Daylight SMARTS example page](
https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
gives several examples for "multiple group" smarts, including these strings:

([Cl!$(Cl~c)].[c!$(c~Cl)])
([Cl]).([c])
([Cl].[c])
[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]

In general, I cannot get these to be parsed by Chem.MolFromSmarts().

For example,  Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
this error message:

```
[13:01:41] SMARTS Parse Error: syntax error while parsing:
([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
[13:01:41] SMARTS Parse Error: Failed parsing SMARTS
'([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
```
My understanding of SMARTS is that the outermost parentheses in this SMARTS
string are required to force the chlorine and the aromatic carbon to be
somewhere in the same covalently connected fragment.  E.g. this pattern
*should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
hydrochloride salt of aniline Cl.Nc1ccccc1.
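
A minimal sketch of how the intended "same component" behaviour might be emulated when the grouped SMARTS will not parse: test the two sub-patterns separately on each fragment. The function name both_in_same_fragment is an assumption for illustration, and the simplified patterns [Cl] and c stand in for the recursive ones above. Note that this check does not stop the two patterns from matching overlapping atoms, which the Daylight component syntax would.

```python
from rdkit import Chem


def both_in_same_fragment(mol, patt_a, patt_b):
    """True if some single fragment of mol matches both patterns."""
    for frag in Chem.GetMolFrags(mol, asMols=True):
        if frag.HasSubstructMatch(patt_a) and frag.HasSubstructMatch(patt_b):
            return True
    return False


chlorine = Chem.MolFromSmarts('[Cl]')
aromatic_c = Chem.MolFromSmarts('c')

benzyl_chloride = Chem.MolFromSmiles('ClCc1ccccc1')
aniline_hcl = Chem.MolFromSmiles('Cl.Nc1ccccc1')

print(both_in_same_fragment(benzyl_chloride, chlorine, aromatic_c))
print(both_in_same_fragment(aniline_hcl, chlorine, aromatic_c))
```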

What am I getting wrong?  Is there a way to write rdkit-parsable SMARTS
that achieves this?  (I want to filter out molecules that contain more than
one of certain moieties, while allowing molecules that have one (or zero)
such moieties.  But salts or covalently disconnected fragments that each
contain one instance of the moiety should be fine.)

Details on my setup:

- RDKit Version: 2019.09.3
- Operating system: macOS 10.15.2
- Python version (if relevant): 3.6
- Are you using conda? yes
- If you are using conda, which channel did you install the rdkit from?
`conda-forge`
- If you are not using conda: how did you install the RDKit?

Curt


Re: [Rdkit-discuss] suggestions for comprehensive searchable database of natural products

2017-11-28 Thread Curt Fischer
I'd also be interested in Jupyter notebooks for this.  Thanks Tyler!

Curt

On Tue, Nov 28, 2017 at 8:26 AM, Tyler Backman  wrote:

> Hi Jim,
>
> MiBIG is a useful database of natural product gene clusters and
> structures, which you can download in JSON format here, and use pretty
> easily from within Python:
> https://mibig.secondarymetabolites.org/repository.html This also
> includes pathway and organism information.
>
> Secondly, our ClusterCAD database is built with RDKit and Django, but
> only includes Type I modular PKSs imported from MiBIG. You can use it
> online at clustercad.jbei.org, or view the code and launch a docker
> install locally from https://github.com/JBEI/clusterCAD. Internally,
> it has a RDKit postgresql database, and includes predicted chemical
> intermediates at each step of biosynthesis in addition to final
> products. It is hand curated, to improve on the automatic AntiSMASH
> annotations in MiBIG. I will gradually expand this to support a
> greater diversity of natural products. I could send you an example
> Jupyter notebook for using it programmatically.
>
> Sincerely,
> Tyler
>
> On Mon, Nov 27, 2017 at 1:30 PM, James T. Metz via Rdkit-discuss
>  wrote:
> > RDkit Discussion Group,
> >
> > My apologies in advance if my request is not appropriate for this
> > discussion group.
> >
> > Given a small molecule that might have some resemblance to natural
> > products, can someone suggest a free, comprehensive, PYTHON/RDkit
> > searchable database of natural products that might be suitable for
> > similarity and substructure searching?
> >
> > I am aware of a few websites that permit searching on the website. If
> > possible, I would like to programmatically search by running a
> > PYTHON/RDkit script on my local machine and then return the structures
> > of related molecules to my local script.
> >
> > I would prefer not having to download and store a huge database.
> >
> > Also, if possible, it would be important to return the organism(s) that
> > create the natural product.  Pathway information would also be very,
> > very helpful.
> >
> > I greatly welcome comments and suggestions.
> >
> > Thank you.
> >
> > Regards,
> > Jim Metz
> > Northwestern University
> >
> >
> >
> >
> >
> > 
> --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >
>
>
>
> --
> Tyler W. H. Backman
> Postdoctoral Fellow
> Lawrence Berkeley National Laboratory
> Joint BioEnergy Institute
>


Re: [Rdkit-discuss] two rdkit versions, always one is used

2017-10-25 Thread Curt Fischer
I find it also helps to install nb_conda in all the conda environments that
I use Jupyter in.

What happens when you do jupyter kernelspec list ?
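
A complementary check that can be run from inside the notebook itself, sketched here under the assumption that the kernel in question is a Python one: print which interpreter the kernel is using and which RDKit it imports. The function name describe_environment is illustrative.

```python
import sys


def describe_environment():
    """Report the interpreter path and RDKit version seen by this kernel."""
    info = {"python": sys.executable}
    try:
        import rdkit
        info["rdkit"] = rdkit.__version__
    except ImportError:
        # RDKit is not importable from this interpreter at all
        info["rdkit"] = None
    return info


print(describe_environment())
```

If the reported interpreter path points into the wrong environment, the notebook is running a kernel from that environment rather than the one that was activated.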

Curt

On Wed, Oct 25, 2017 at 8:29 AM, Greg Landrum 
wrote:

> Hopefully you are using conda environments and not virtual envs, but
> whenever this happens to me it’s because I forgot to install Jupyter in the
> new environment, so when I invoke Jupyter from the command line it uses the
> one in the default conda environment.
>
> Test for this: check that you have Jupyter installed and are using the
> right one (“which jupyter”)
>
> -greg
>
> --
> *From:* Markus Metz 
> *Sent:* Wednesday, October 25, 2017 3:36:08 PM
> *To:* RDKit Discuss
> *Subject:* [Rdkit-discuss] two rdkit versions, always one is used
>
> Hello all:
>
> I just installed the newest rdkit version via conda in a virtual env
> called 092017.
> This went without any problems.
> In addition I have an older rdkit version installed in a virtual env
> called 092016.
>
> Now, when I run a jupyter notebook in the 092017 env and display the rdkit
> version it still says 092016.
>
> Does anybody have an idea what is going on?
>
> Many thanks in advance,
>
> Markus
>
>
>
>


Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Curt Fischer
I'm not 100% sure about this particular case, but I suspect this is a
limitation of InChI.  For example, the InChI representation of zwitterionic
phenylalanine (negative COO-, positive NH3+) and "neutral" phenylalanine
(neutral COOH, neutral NH2) is exactly the same.  This is by design.  See
https://chemistry.stackexchange.com/questions/34563/pubchem-inchi-smiles-and-uniqueness
for some possibly useful additional discussion.
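
A small sketch of the point, using glycine (the FAQ's own example) instead of phenylalanine, and assuming an RDKit build with InChI support: the two charge states have different SMILES but are expected to give the same standard InChI.

```python
from rdkit import Chem

neutral = Chem.MolFromSmiles('NCC(=O)O')
zwitterion = Chem.MolFromSmiles('[NH3+]CC(=O)[O-]')

# Standard InChI treats these two protonation patterns as the same compound
inchi_neutral = Chem.MolToInchi(neutral)
inchi_zwitterion = Chem.MolToInchi(zwitterion)
same_inchi = inchi_neutral == inchi_zwitterion

# Canonical SMILES keeps the explicit charges, so the strings differ
same_smiles = Chem.MolToSmiles(neutral) == Chem.MolToSmiles(zwitterion)

print(inchi_neutral)
print(inchi_zwitterion)
print(same_inchi, same_smiles)
```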

The InChI FAQ at http://www.inchi-trust.org/technical-faq/#13.2 says:

> This is exemplified below by standard InChIKeys as well as standard InChI
> strings for neutral, zwitterionic, anionic and cationic states of glycine
> (note that neutral and zwitterionic states do not differ in the total
> number of protons so they have the same standard InChI/InChIKey):


Is this the same as or at least similar to the issue you are encountering?

Curt

On Thu, Sep 14, 2017 at 11:09 AM, Jason Biggs  wrote:

> Okay, all three of these smiles strings resolve to the same inchi,
>
> "O=[N+](C1=NC2=CC=CC=C2N=C1)[N-](=O)C1=NC2=CC=CC=C2N=C1"
> "C1=CC=C2C(=C1)N=CC(=N2)N(=N(=O)C3=NC4=CC=CC=C4N=C3)=O"
> "[O-][N+](c1cnc2ccccc2n1)=[N+]([O-])c3cnc4ccccc4n3"
>
> even though to me they seem like different structures due to the specified
> charges.  Is this a limitation of inchi, or do I need to rethink my ideas
> of what makes two chemical structures the same?
>
>
>
>
>
> Jason Biggs
>
>
> On Thu, Sep 14, 2017 at 12:38 PM, John Mayfield <
> john.wilkinson...@gmail.com> wrote:
>
>> InChI is an identifier and not a representation, you should not read
>> InChIs... but we are beyond hope there so...
>>
>> The InChI string is correct and is the same if you roundtrip your
>> preferred one with charge separated bonds and the 5 valent one.
>>
>> All toolkits will use the InChI library to read/write InChIs and it
>> generates the representation with 5v nitrogens; cactus is either applying
>> normalisation after reading or in this case (since it's the name resolved)
>> doing an identifier lookup from an original SMILES used to generate this
>> InChI:
>>
>>     echo 'InChI=1S/C16H10N6O2/c23-21(15-9-17-11-5-1-3-7-13(11)19-15)22(24)16-10-18-12-6-2-4-8-14(12)20-16/h1-10H' | inchi -STDIO -inChi2Struct -OutputSDF | obabel -imol -osmi
>>
>>     c1ccc2c(c1)ncc(n2)N(=N(=O)c1cnc2ccccc2n1)=O Structure #1
>>
>> SDF also attached without going through Open Babel.
>>
>> - John
>>
>>
>>
>


Re: [Rdkit-discuss] Announce: RDKit for Excel.

2017-06-19 Thread Curt Fischer
This is great!  Thanks Jan.

I haven't tried it yet, but based on the GitHub README it looks like this
is only for Excel on a Windows box.  Is that right, or can Mac versions of
Excel also work?  And since it's come up on the mailing list recently, is
there a plan to expand to/move to Python 3 at any point soon?

Curt

On Sun, Jun 18, 2017 at 11:06 AM, Jan Holst Jensen 
wrote:

> Hi RDKitters,
>
> I am happy to announce an open source Excel add-in that gives easy access
> to the RDKit Python API. The add-in is BSD-licensed like RDKit.
> https://github.com/janholstjensen/rdkit4excel
>
> Screenshot of the add-in running in Excel 2016 (note: molecule rendering
> requires additional 3rd party software):
>
>
> The add-in is easily extendable via pure Python scripting. A new Excel
> function is added by adding a function to the CRDKitXL Python class and
> annotating the new function's input/output parameter types through
> structured comments. For example, adding an "rdkit_SmilesToMolBlock()"
> function that has a single "smiles" string input parameter:
>
>     #RDKITXL: in:smiles:str, out:str
>     def rdkit_SmilesToMolBlock(self, smiles):
>         # Python function implementation follows here...
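
For illustration only, here is a guess at what a body for that function could look like. The CRDKitXL class below is a bare stand-in written for this sketch, not the add-in's real class, and the error-string return value is an assumption; only Chem.MolFromSmiles and Chem.MolToMolBlock are standard RDKit calls.

```python
from rdkit import Chem


class CRDKitXL(object):
    """Hypothetical stand-in for the add-in's class (not the real one)."""

    #RDKITXL: in:smiles:str, out:str
    def rdkit_SmilesToMolBlock(self, smiles):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # Assumed convention: hand an error string back to the cell
            return "ERROR: could not parse SMILES"
        return Chem.MolToMolBlock(mol)


block = CRDKitXL().rdkit_SmilesToMolBlock('CCO')
print(block)
```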
>
>
> Many thanks to Esben Jannik Bjerrum who did the implementation of this
> first version.
>
> Cheers
> -- Jan Holst Jensen
>


Re: [Rdkit-discuss] difficulties with AllChem.EmbedMultipleConfs() on a macrocycle

2017-05-08 Thread Curt Fischer
Thanks Greg!  This was the impetus I needed to upgrade to 2017.03.

I now confirm that conformer generation happens very quickly.  It's so nice
to be able to visualize so nicely with py3dmol too.

Curt

On Sun, May 7, 2017 at 9:37 PM, Greg Landrum <greg.land...@gmail.com> wrote:

> Going back through old mail I realized that I never did follow up on this
> one.
>
> It looks like these structures now embed (using the 2017.03.1 release). I
> assume that the changes I made to fix #1240
> (https://github.com/rdkit/rdkit/issues/1240) - which make the code more
> permissive in how the conformers are filtered - are responsible for that.
>
> -greg
>
>
>
>
> On Sat, Mar 4, 2017 at 7:39 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi Curt,
>>
>> I believe that the problem here is caused by the number of specified
>> chiral centers in a ring. I'm basing that guess on the fact that if I turn
>> off the option to enforce chirality I get an answer very quickly:
>>
>> In [12]: ps = AllChem.ETKDG()
>>
>> In [13]: ps.randomSeed = 0xf00d
>>
>> In [14]: ps.enforceChirality=False
>>
>> In [15]: AllChem.EmbedMolecule(mh,ps)
>> Out[15]: 0
>>
>>
>> but if I go back to the defaults I get the same lack of results that you
>> were seeing:
>>
>> In [16]: ps.enforceChirality=True
>>
>> In [17]: AllChem.EmbedMolecule(mh,ps)
>> Out[17]: -1
>>
>>
>>
>> I'm not sure that there's a straightforward solution to this problem
>> without code changes, but I'll do a bit of looking to see if I can figure
>> something out.
>>
>> -greg
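
The session above could be wrapped into a small fallback helper, sketched here under stated assumptions: the function name embed_with_chirality_fallback is invented for illustration, and a small chiral molecule (lactic acid) stands in for the macrocycle so the example runs quickly. It first tries the default ETKDG settings and only turns off enforceChirality if that attempt returns -1.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_with_chirality_fallback(mol, seed=0xf00d):
    """Embed one conformer; relax chirality enforcement only if needed.

    Returns (conformer_id, chirality_enforced); conformer_id is -1 on failure.
    """
    ps = AllChem.ETKDG()
    ps.randomSeed = seed
    cid = AllChem.EmbedMolecule(mol, ps)
    if cid >= 0:
        return cid, True
    # Same switch as in the session above
    ps.enforceChirality = False
    return AllChem.EmbedMolecule(mol, ps), False


mh = Chem.AddHs(Chem.MolFromSmiles('C[C@H](O)C(=O)O'))
cid, enforced = embed_with_chirality_fallback(mh)
print(cid, enforced)
```

Any conformer obtained with enforcement switched off should be checked afterwards, since its stereocentres are not guaranteed to match the input.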
>>
>>
>>
>> On Thu, Mar 2, 2017 at 7:34 PM, Curt Fischer <curt.r.fisc...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I really like the combination of rdkit and py3dmol and have been able to
>>> replicate e.g. Greg's notebook here:
>>> http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/Trying%20py3Dmol.ipynb
>>>
>>> But I can't seem to get AllChem.EmbedMultipleConfs() to generate any
>>> valid conformers for a macrotriolide, macrosphelide A.
>>>
>>>     macrosphelide_a_smiles = 'C[C@H]1CC(O[C@H](C)[C@H](O)/C=C/C(O[C@@H](C)[C@@H](O)/C=C/C(O1)=O)=O)=O'
>>>     m = Chem.MolFromSmiles(macrosphelide_a_smiles)
>>>     mh = Chem.AddHs(m)
>>>     AllChem.EmbedMultipleConfs(mh, useExpTorsionAnglePrefs=True,
>>>                                useBasicKnowledge=True)
>>>     mb = Chem.MolToMolBlock(mh)
>>>
>>> The EmbedMultipleConfs() call never terminates for me.  If I use a
>>> non-zero value for *maxAttempts*, the call does terminate, but when I
>>> look at *mb*, the coordinates for all atoms are zero.
>>>
>>> I've tried playing around with a few of the other options, without
>>> luck.  Either all atom coordinates are still zero after
>>> *EmbedMultipleConfs()*, or the function call never terminates.
>>>
>>> Any chance someone knows how to coax this function into yielding a
>>> useful conformation for my molecule?
>>>
>>> Curt
>>>
>>>


Re: [Rdkit-discuss] nitro-compounds from smarts

2017-05-04 Thread Curt Fischer
Rafal,


> these three Mols (m1, m2, m33) below should all represent nitrobenzene,
> right?
> >>> m33=Chem.MolFromSmarts('c1ccccc1[N+](=O)[O-]')
> >>> m1=Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')
> >>> m2=Chem.MolFromSmiles('c1ccccc1N(=O)(=O)')
>

I guess m33 represents nitrobenzene in a sense, but unlike m1 and m2, it
also is a representation of nitrotoluene, nitronaphthalene, and all 2-, 3-,
and 4-substituted (and multiply substituted) nitrobenzenes.

If I wanted a SMARTS query that would match "only" nitrobenzene, I would
start with something like
'[cH1]1[cH1][cH1][cH1][cH1][cH0]1[NX3H0](~[OX1H0])~[OX1H0]', but even that
might also represent some nitrobenzyl radical cation or something.

    from rdkit import Chem

    nitrobenz = Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')
    nitrotol = Chem.MolFromSmiles('Cc1ccccc1[N+](=O)[O-]')
    nitrobenz_smarts = Chem.MolFromSmarts(
        '[cH1]1[cH1][cH1][cH1][cH1][cH0]1[NX3H0](~[OX1H0])~[OX1H0]')
    m33 = Chem.MolFromSmarts('c1ccccc1[N+](=O)[O-]')

    # strict query: requires five ring CH atoms
    print([m.HasSubstructMatch(nitrobenz_smarts) for m in [nitrotol, nitrobenz]])
    # loose query: any nitro-substituted benzene ring
    print([m.HasSubstructMatch(m33) for m in [nitrotol, nitrobenz]])



> 2. results of HasSubstructMatch is really unexpected:
>
> >>> m2.HasSubstructMatch(m33)
> True
> >>> m1.HasSubstructMatch(m33)
> True
> >>> m33.HasSubstructMatch(m1)
> False
> >>> m33.HasSubstructMatch(m2)
> False
> >>>
>
> m1, m2 is substruct of m33 but m33 is not substruct of m1 or m2. I
> really don't understand this.
> It seems this is problem with smarts mol:
> >>> m33.HasSubstructMatch(m33)
> False
>

I think if you're going to find substructure matches on unsanitized
molecules, you probably want to use the useQueryQueryMatches option.  The
following results in *True*:

    m33.HasSubstructMatch(m33, useQueryQueryMatches=True)

But it's not clear to me why you would need to query in this way.

I think there likely is some weird behavior of nitro groups that is
happening here, but it's hard to be sure unless we are all using our terms
the same way.


Curt


Re: [Rdkit-discuss] Information contained in SMARTS and SMILES

2017-04-19 Thread Curt Fischer
Hi Thilo,

Interesting question.  rdkit-discuss members should know you also posted a
very similar question to
https://chemistry.stackexchange.com/questions/72880/is-converting-smarts-to-smiles-a-lossless-operation
.

If an interesting answer materializes here, it would be useful to post it
there, and vice-versa.

Curt

On Wed, Apr 19, 2017 at 3:03 AM, Thilo Bauer  wrote:

> Dear mailinglist-members,
>
> is converting SMARTS to SMILES a "lossless" operation, or does one lose
> information on doing so?
>
> Background:
> I've got three different SMARTS strings representing the same structure
> - at least when depicting it. Also all three strings result in the exact
> same SMILES (see code and output below).
>
> Now, don't take this wrong, I do know the differences between SMARTS and
> SMILES, and I do know what the symbols in SMARTS mean. I just wonder,
> when I use either the three SMARTS or the single SMILES as a pattern
> for a substruct match, if there is a chance that I get different
> results, or let's say if I would miss substructure occurrences by using
> the single SMILES? I could not make up a case where this happened.
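
A hedged illustration of the general point, not of the three patterns in this message: a SMARTS can carry constraints that a SMILES has no way to express, so in general the two need not behave the same as queries. The sketch below compares a SMARTS with a ring constraint against the plain SMILES CO used as a pattern; both patterns and both test molecules are chosen here for illustration.

```python
from rdkit import Chem

smarts_patt = Chem.MolFromSmarts('[#6;R]-[#8]')  # ring carbon bonded to oxygen
smiles_patt = Chem.MolFromSmiles('CO')           # any carbon singly bonded to oxygen

molecules = {
    'ethanol': Chem.MolFromSmiles('CCO'),
    'cyclohexanol': Chem.MolFromSmiles('OC1CCCCC1'),
}

# For each molecule: (matches the SMARTS, matches the SMILES-derived pattern)
results = {
    name: (mol.HasSubstructMatch(smarts_patt), mol.HasSubstructMatch(smiles_patt))
    for name, mol in molecules.items()
}
print(results)
```

For SMARTS that only use atomic numbers and explicit bond orders, as in the patterns here, there is much less for a SMILES to drop.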
>
>
>  >>> m = Chem.MolFromSmarts('[#6]-1=[#6]-[#6](-[#6]-[#6](-[#6]-1)-[#6])=[#8]')
>  >>> Chem.MolToSmiles(m)
> 'CC1CC=CC(=O)C1'
>  >>> m = Chem.MolFromSmarts('[#6]-1-[#6]=[#6]-[#6](-[#6]-[#6]-1-[#6])=[#8]')
>  >>> Chem.MolToSmiles(m)
> 'CC1CC=CC(=O)C1'
>  >>> m = Chem.MolFromSmarts('[#6]-1-[#6](-[#6]=[#6]-[#6]-[#6]-1-[#6])=[#8]')
>  >>> Chem.MolToSmiles(m)
> 'CC1CC=CC(=O)C1'
>
>
> Thanks a lot in advance!
>
> Thilo
>
>
>
>
>


Re: [Rdkit-discuss] Check If Atom Is in Two Small Rings

2017-04-11 Thread Curt Fischer
Brian's solution is obviously better (shorter, uses fewer functions) than
mine.  (Although mine assumes that you want atoms that are part of
_exactly_ two rings, not atoms that are part of _at least_ two rings as
Brian's does.  Probably Brian's solution is what you want but worth noting.)
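
The "exactly" versus "at least" distinction can be sketched with the ring list from GetRingInfo().AtomRings(). This is neither Brian's code nor my earlier version, just an illustration; the function name and its exactly flag are assumptions. Unlike Brian's test, it counts any two small rings, not specifically one 3-ring plus one 4-ring.

```python
from rdkit import Chem


def atoms_in_two_small_rings(mol, sizes=(3, 4), exactly=False):
    """Indices of atoms that sit in two (or at least two) rings of the given sizes."""
    counts = {}
    for ring in mol.GetRingInfo().AtomRings():
        if len(ring) in sizes:
            for idx in ring:
                counts[idx] = counts.get(idx, 0) + 1
    if exactly:
        return sorted(i for i, n in counts.items() if n == 2)
    return sorted(i for i, n in counts.items() if n >= 2)


spiro = Chem.MolFromSmiles('C1CC12CCC2')
print(atoms_in_two_small_rings(spiro))
print(atoms_in_two_small_rings(spiro, exactly=True))
```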

CF

On Tue, Apr 11, 2017 at 1:03 PM, Brian Kelley  wrote:

> You are so close!
>
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles("C1CC12CCC2")
> >>> for atom in m.GetAtoms():
> ...   if atom.IsInRingSize(3) and atom.IsInRingSize(4): print(atom.GetIdx())
> ...
> 2
> >>>
>
> Cheers,
>  Brian
>
> On Tue, Apr 11, 2017 at 1:38 PM, Jonathan Saboury 
> wrote:
>
>> Hello All,
>>
>> I'm trying to make a function to check if a mol has an atom that is part
>> of two small rings (3 or 4 atoms). Using GetRingInfo()/NumAtomRings() I can
>> find out how many ring systems each atom is in, but not the details of the
>> rings. atom.IsInRingSize(size) returns a bool so I couldn't use that. I'm
>> using the python api.
>>
>> Any suggestions? Thanks!
>>
>> - Jonathan
>>


Re: [Rdkit-discuss] forcing better depictions for macrocycles

2017-04-05 Thread Curt Fischer
Thanks to Greg for the feedback.

One more follow-up question: it seems that *Compute2DCoords()* offers an
argument *coordMap* which is a dictionary of the format {int:
rdkit.Geometry.rdGeometry.Point2D}, where int is an atom index.
It seems like this is a way to fix certain atoms in a predefined position,
so that only non-specified atoms can float during *Compute2DCoords()*.
Thus it seems like a "polishing" step for depictions would be to use this
function  after aligning to a template, fixing the template-associated
atoms in place and letting any non-templated atoms float.

What's the easiest way to get an appropriately formatted *coordMap*
dictionary?

I tried something like this:

my_dict = {idx: mol.GetConformer(0).GetAtomPosition(idx) for idx in atom_list}
AllChem.Compute2DCoords(radicicol, coordMap = my_dict)


But it seems that *GetAtomPosition(idx)* returns *Point3D* objects instead
of Point2D objects, so I'm not sure how I can get *Point2D* objects.
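
One way this might be done, sketched under assumptions: build each Point2D by hand from the x and y of the Point3D. The molecule, the list of fixed atoms, and the choice to pass canonOrient=False (so the locked atoms are not re-oriented afterwards) are all mine, not something confirmed in this thread.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Geometry import Point2D

mol = Chem.MolFromSmiles('c1ccccc1CCO')
AllChem.Compute2DCoords(mol)          # starting depiction
conf = mol.GetConformer(0)

fixed_atoms = [0, 1, 2, 3, 4, 5]      # the ring atoms, chosen for illustration

# Drop the z coordinate: Point3D -> Point2D
coord_map = {}
for idx in fixed_atoms:
    p = conf.GetAtomPosition(idx)
    coord_map[idx] = Point2D(p.x, p.y)

before = {idx: (pt.x, pt.y) for idx, pt in coord_map.items()}

# Recompute the depiction with the mapped atoms locked in place
AllChem.Compute2DCoords(mol, coordMap=coord_map, canonOrient=False)

new_conf = mol.GetConformer(0)
after = {idx: (new_conf.GetAtomPosition(idx).x, new_conf.GetAtomPosition(idx).y)
         for idx in fixed_atoms}
print(before)
print(after)
```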

Help appreciated!

Curt

On Wed, Apr 5, 2017 at 12:38 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Curt,
>
>
> On Tue, Apr 4, 2017 at 12:03 PM, Curt Fischer <curt.r.fisc...@gmail.com>
> wrote:
>
>>
>> RDKit's default 2D-depictions of macrocycles are very "round".  I found
>> some slides
>> <https://www.slideshare.net/NextMoveSoftware/rdkit-ugm-2016-higher-quality-chemical-depictions>
>>  from
>> John Mayfield that come from a 2016 UK RDKit user group meeting that says
>> the same thing.  (See in particular slide 31.)
>>
>
> Yes, they are indeed very round and non-chemical.
>
>
>>
>> I'm wondering, what is the best way of forcing RDKit's depictions of
>> these types of molecules to be less round?  (And I'm aware that a possible
>> answer is, "there isn't a good way yet".)
>>
>
> I'm afraid the answer is "there isn't a good way yet". I do really hope
> that this answer will change in a not-too-distant release, but I cannot
> promise anything.
>
>> In a Jupyter notebook, I (hopefully) illustrate three approaches: (i) just
>> importing an .sdf of your molecules from somewhere else, (ii) aligning to a
>> non-macrocyclic substructure, and (iii) using the TemplateAlign module.
>> https://github.com/tentrillion/ipython_notebooks/blob/master/force_pretty_macrocycles.ipynb
>>
>>
> I think you hit on everything that's currently possible here. Using a
> template to organize the atoms of the macrocycle, like you do to produce
> outputs 8 and 11, seems to me like the strategy that's most likely to work.
> It's unfortunate that it doesn't. My normal answer to this kind of
> situation is "it's a hard problem and the code does what it can. Changing
> the algorithm is a lot of work.", but those particular pathologies almost
> look like bugs, not algorithmic deficiencies. I will take a look to see if I
> can track down what's causing that.
>
> -greg
>
>
>
>
>> What approaches did I miss?  What should I be doing?
>>
>> Curt
>>
>>


[Rdkit-discuss] forcing better depictions for macrocycles

2017-04-04 Thread Curt Fischer
Hi RDKit community,

RDKit's default 2D-depictions of macrocycles are very "round".  I found
some slides
<https://www.slideshare.net/NextMoveSoftware/rdkit-ugm-2016-higher-quality-chemical-depictions>
from John Mayfield that come from a 2016 UK RDKit user group meeting that say
the same thing.  (See in particular slide 31.)

I'm wondering, what is the best way of forcing RDKit's depictions of these
types of molecules to be less round?  (And I'm aware that a possible answer
is, "there isn't a good way yet".)

In a Jupyter notebook, I (hopefully) illustrate three approaches: (i) just
importing an .sdf of your molecules from somewhere else, (ii) aligning to a
non-macrocyclic substructure, and (iii) using the TemplateAlign module.
https://github.com/tentrillion/ipython_notebooks/blob/master/force_pretty_macrocycles.ipynb

What approaches did I miss?  What should I be doing?

Curt


Re: [Rdkit-discuss] numpy array to bit vector

2017-03-17 Thread Curt Fischer
Hi Greg,

On Thu, Mar 16, 2017 at 9:05 PM, Greg Landrum 
wrote:

> I'm a bit confused by all this. The RDKit has Tanimoto (and a bunch of
> other similarity measures) built in:
>
>
Good point (as always).  I'd been assuming that for some reason the OP had
fingerprints that had been converted to *numpy.ndarray* objects, not
*rdkit.DataStructs.ExplicitBitVect* objects.

Looking back over the thread, maybe what was really being asked was, "how
do I convert *numpy.ndarray* objects to *rdkit.DataStructs.ExplicitBitVect*
objects?"
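
If that is the question, one straightforward way (a sketch, with numpy_to_bitvect being a name made up here) is to create an empty ExplicitBitVect of the right length and set the on bits one by one, after which the built-in similarity functions work as usual.

```python
import numpy as np
from rdkit import DataStructs


def numpy_to_bitvect(arr):
    """Convert a 1-D 0/1 (or boolean) numpy array to an ExplicitBitVect."""
    arr = np.asarray(arr).astype(bool)
    bv = DataStructs.ExplicitBitVect(int(arr.size))
    for idx in np.flatnonzero(arr):
        bv.SetBit(int(idx))   # SetBit expects a plain Python int
    return bv


bv1 = numpy_to_bitvect(np.array([1, 0, 1, 1]))
bv2 = numpy_to_bitvect(np.array([1, 1, 0, 1]))

# Two shared on-bits out of four distinct on-bits
similarity = DataStructs.TanimotoSimilarity(bv1, bv2)
print(similarity)
```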

Curt


Re: [Rdkit-discuss] numpy array to bit vector

2017-03-16 Thread Curt Fischer
If you are looking for something quick and dirty, you could stay in numpy
to calculate Tanimoto.

    from __future__ import division  # must be the first import; only needed on Python 2

    import numpy as np
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol1 = Chem.MolFromSmiles('CCO')
    mol2 = Chem.MolFromSmiles('CCC')

    fp1 = np.array(AllChem.GetMorganFingerprintAsBitVect(mol1, 8),
                   dtype='bool')
    fp2 = np.array(AllChem.GetMorganFingerprintAsBitVect(mol2, 8),
                   dtype='bool')

    def tanimoto(v1, v2):
        """
        Calculates tanimoto similarity for two bit vectors
        """
        return np.bitwise_and(v1, v2).sum() / np.bitwise_or(v1, v2).sum()

    tanimoto(fp1, fp2)

    Out[4]: 0.42857142857142855


On Thu, Mar 16, 2017 at 7:28 AM, Thomas Evangelidis 
wrote:

> Hello,
>
> I created a numpy array from a molecule using the following function:
>
> AllChem.GetMorganFingerprintAsBitVect()
>
>
> Now I would like to convert the numpy array back to a bit vector, in order
> to calculate the Tanimoto similarity of two compounds. Is this possible?
>
> thanks
> Thomas
>
>
>
> --
>
> ==
>
> Thomas Evangelidis
>
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr
>
>   teva...@gmail.com
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>


Re: [Rdkit-discuss] difficulties with AllChem.EmbedMultipleConfs() on a macrocycle

2017-03-02 Thread Curt Fischer
Thanks for the notebook Sereina!

Unfortunately when I run it I get different results.  In your version, the
very first call to EmbedMolecule() returns 0, which presumably means that
embedding went OK.




## Embed the molecule without Hs
AllChem.EmbedMolecule(m, useExpTorsionAnglePrefs=True, useBasicKnowledge=True)
Out[7]: 0


When I run your notebook, this same call returns -1.  Maybe my rdkit is
different than yours?  I'm using '2016.09.2' on Mac OSX 64-bit.
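
For anyone following along without the notebook, the workaround Sereina describes in the message quoted below (embed first, add hydrogens afterwards) can be sketched roughly as follows. This is only an illustration on ethanol, not the notebook's actual code; the helper name and the fixed random seed are assumptions, and as noted above the embedding step may still return -1 for the macrocycle depending on the RDKit version.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_then_add_hs(smiles, seed=42):
    """Embed a conformer for the heavy atoms, then add Hs with coordinates."""
    mol = Chem.MolFromSmiles(smiles)
    conf_id = AllChem.EmbedMolecule(mol, randomSeed=seed)
    if conf_id < 0:
        # -1 means the embedding failed; nothing useful to add Hs to
        return None
    # addCoords=True places the new hydrogens using the existing conformer
    return Chem.AddHs(mol, addCoords=True)


mol_h = embed_then_add_hs('CCO')
```

Checking the return value of EmbedMolecule() before going further avoids silently working with all-zero coordinates.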



On Thu, Mar 2, 2017 at 12:00 PM, Sereina <sereina.rini...@gmail.com> wrote:

> Hi Curt,
>
> This is an interesting one. If you add the hydrogens before generating the
> conformer as in your example, then no conformation can be found. However,
> if you add them *after* the conformer generation, it works fine. Maybe that
> could serve as a work around for you. I attach a notebook as illustration.
> As this occurs with both DG and ETKDG, it may be due to the tests to ensure
> that the chiral centers are correct. I will have a closer look (hopefully
> with Greg’s help).
>
> Best,
> Sereina
>
>
>
>
>
> On 02 Mar 2017, at 19:34, Curt Fischer <curt.r.fisc...@gmail.com> wrote:
>
> Hi all,
>
> I really like the combination of rdkit and py3dmol and have been able to
> replicate e.g. Greg's notebook here:
> http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/Trying%20py3Dmol.ipynb
>
> But I can't seem to get AllChem.EmbedMultipleConfs() to generate any
> valid conformers for a macrotriolide, macrosphelide A.
>
> macrosphelide_a_smiles = 'C[C@H]1CC(O[C@H](C)[C@H](O)/C=C/C(O[C@@H](C)[C@@H](O)/C=C/C(O1)=O)=O)=O'
> m = Chem.MolFromSmiles(macrosphelide_a_smiles)
> mh = Chem.AddHs(m)
> AllChem.EmbedMultipleConfs(mh, useExpTorsionAnglePrefs=True, useBasicKnowledge=True)
> mb = Chem.MolToMolBlock(mh)
>
> The EmbedMultipleConfs() call never terminates for me.  If I use a
> non-zero value for *maxAttempts*, the call does terminate, but when I
> look at *mb*, the coordinates for all atoms are zero.
>
> I've tried playing around with a few of the other options, without luck.
> Either all atom coordinates are still zero after *EmbedMultipleConfs()*,
> or the function call never terminates.
>
> Any chance someone knows how to coax this function into yielding a useful
> conformation for my molecule?
>
> Curt
>


[Rdkit-discuss] difficulties with AllChem.EmbedMultipleConfs() on a macrocycle

2017-03-02 Thread Curt Fischer
Hi all,

I really like the combination of rdkit and py3dmol and have been able to
replicate e.g. Greg's notebook here:
http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/Trying%20py3Dmol.ipynb

But I can't seem to get AllChem.EmbedMultipleConfs() to generate any valid
conformers for a macrotriolide, macrosphelide A.

macrosphelide_a_smiles = 'C[C@H]1CC(O[C@H](C)[C@H](O)/C=C/C(O[C@@H](C)[C@@H](O)/C=C/C(O1)=O)=O)=O'
m = Chem.MolFromSmiles(macrosphelide_a_smiles)
mh = Chem.AddHs(m)
AllChem.EmbedMultipleConfs(mh, useExpTorsionAnglePrefs=True, useBasicKnowledge=True)
mb = Chem.MolToMolBlock(mh)

The EmbedMultipleConfs() call never terminates for me.  If I use a non-zero
value for *maxAttempts*, the call does terminate, but when I look at *mb*,
the coordinates for all atoms are zero.

I've tried playing around with a few of the other options, without luck.
Either all atom coordinates are still zero after *EmbedMultipleConfs()*, or
the function call never terminates.

Any chance someone knows how to coax this function into yielding a useful
conformation for my molecule?

Curt


Re: [Rdkit-discuss] isotopic SMILES

2017-02-07 Thread Curt Fischer
I think you've persuaded me that .SetIsotope() is the way to go...

> I don't understand how that avoids any problem. How do you specify the
> target atom for that case?
> In any case, won't the InChI normalization affect some of your structures
> (e.g., detaching metals) and make it even harder to specify isotopes?


but just to clarify, here is the admittedly hackish via-InChI conversion I
had in mind.  The specification of isotopes still happens in smiles, which
I find easier for humans to grok than InChI.  {{Side note: the end
application here is modeling the labeling patterns in structurally complex
natural products that are biosynthesized from simple labeled substrates
(such as ethanol).  So I ultimately want to feed my labeled molecules to a
series of SMARTS reactions that will eventually lead to a complex
structure.  Given the vagaries of SMARTS reaction matching, I want to be
sure that all my reactions apply equally well to labeled and unlabeled
substrates.}}

from rdkit import Chem

def convert_to_smiles_via_inchi(smiles):
    """Make a molecule from SMILES but via Inchi"""
    temp_mol = Chem.MolFromSmiles(smiles)
    inchi = Chem.MolToInchi(temp_mol)
    final_mol = Chem.MolFromInchi(inchi)
    return final_mol

def same_implicit_valence(mol_1, mol_2, atom_idx=1):
    """Returns True if mol_1 and mol_2 have the same implicit valence for
    the indexed atom"""
    mol_1_implicitH = mol_1.GetAtomWithIdx(atom_idx).GetImplicitValence()
    mol_2_implicitH = mol_2.GetAtomWithIdx(atom_idx).GetImplicitValence()
    return mol_1_implicitH == mol_2_implicitH

etoh_v1 = 'C[13C]O'
etoh_v2 = 'CCO'

etoh_versions = [etoh_v1, etoh_v2]

via_inchi = [convert_to_smiles_via_inchi(mol) for mol in etoh_versions]
smiles_only = [Chem.MolFromSmiles(mol) for mol in etoh_versions]

# this works
assert same_implicit_valence(*via_inchi)

# this doesn't
assert same_implicit_valence(*smiles_only)


Re: [Rdkit-discuss] isotopic SMILES

2017-02-07 Thread Curt Fischer
I replied to Andrew's very nice discussion of implicit hydrogens in SMILES
 but forgot to include the whole list.

Wow, thank you, that was very useful.  I didn't realize those nuances of
> SMILES.
>
> On the rdkit "side", the distinction made in Smiles between implicit and
> explicit hydrogens seems to live in the atom properties ImplicitValence and
> NumImplicitHs.  rdkit, unlike SMILES apparently, does not require that
> isotopically labeled atoms have an implicit valence explicitly specified.
>
> # print out some key properties of all atoms in a molecule
> def print_atom_info(mol):
>     for atom in mol.GetAtoms():
>         print(atom.GetIdx(), atom.GetSymbol(),
>               atom.GetIsotope(), atom.GetImplicitValence(),
>               atom.GetNumImplicitHs())
>     print('\n')
>
> # make some ethanol
> ethanol_v1 = Chem.MolFromSmiles('CCO')
>
> # label carbon #1 with C13
> ethanol_v1.GetAtomWithIdx(1).SetIsotope(13)
>
> # make more ethanol, labeled as it's created
> ethanol_v2 = Chem.MolFromSmiles('C[13C]O')
>
> # implicit valence is different for the two molecules
> # so is the number of implicit Hs
> print_atom_info(ethanol_v1)
> print_atom_info(ethanol_v2)
>
> Draw.MolsToGridImage([ethanol_v1, ethanol_v2])
>

I also didn't finish my thought here.  My ultimate goal is an easy way to
create rdkit molecules that have isotopic substitutions but which are
otherwise exactly the same as non-substituted variants.  What's the best
approach?  Is it to directly call .SetIsotope() like I do above?  This
requires figuring out the rdkit atom index of my target atom, which is
doable but perhaps (?) overly complicated?   Converting to InChI and back
seems to avoid the problem, I guess because MolFromInchi() has a removeHs
parameter that defaults to True.  That also seems a bit hack-ish but I'm
not sure what the best approach is.
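
To make the .SetIsotope() route concrete, here is a minimal sketch of one way to find the target atom without hard-coding its index: match a SMARTS pattern and label one atom of the first match. The helper name label_first_match and the pattern '[CH2][OH]' are illustrative choices, not an established idiom.

```python
from rdkit import Chem


def label_first_match(mol, smarts, isotope, atom_in_pattern=0):
    """Return a copy of mol with one atom of the first SMARTS match relabeled."""
    pattern = Chem.MolFromSmarts(smarts)
    match = mol.GetSubstructMatch(pattern)
    if not match:
        raise ValueError('pattern not found in molecule')
    labeled = Chem.Mol(mol)  # work on a copy, leave the input untouched
    labeled.GetAtomWithIdx(match[atom_in_pattern]).SetIsotope(isotope)
    return labeled


ethanol = Chem.MolFromSmiles('CCO')
# Label the carbon that carries the hydroxyl group with 13C
labeled = label_first_match(ethanol, '[CH2][OH]', 13)
```

Because the label is applied after parsing, the labeled atom keeps the hydrogen count it had in the unlabeled molecule.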

Curt


[Rdkit-discuss] isotopic SMILES

2017-02-06 Thread Curt Fischer
Hello rdkit users,

What behavior should we expect for Chem.MolToSmiles() when dealing with
isotopically substituted molecules?


I am confused by this behavior:

>>> labeled_etoh = Chem.MolFromSmiles('C[13C]O')
>>> print(Chem.MolToSmiles(labeled_etoh))

C[C]O


>>> print(Chem.MolToSmiles(labeled_etoh, isomericSmiles=True))

C[13C]O


1. Why are there any brackets at all in the first output?  Why not just
'CCO'?
2. Is there any documentation anywhere that the "isomericSmiles" argument
is also an "isotopicSmiles" argument?

I am also confused about when Chem.MolToSmiles() puts in H atoms in the
output.

>>> three_hb1 = Chem.MolFromSmiles('C[13CH](O)C[13C](=O)O')
>>> three_hb2 = Chem.MolFromSmiles('C[13C](O)C[13C](=O)O')
>>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=True))


C[13CH](O)C[13C](=O)O


>>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=True))

C[13C](O)C[13C](=O)O


>>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=False))

CC(O)CC(=O)O


>>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=False))


C[C](O)CC(=O)O


3. Why are there no brackets for three_hb1 output, but there are for
three_hb2?
4. As far as I can tell, the two three_hb molecules are identical.   Why
aren't all Hs removed during canonicalization?
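
A short sketch that makes the difference visible: it compares the hydrogen count RDKit assigns to the labeled carbon when the SMILES does or does not spell out the hydrogens inside the brackets. The helper name total_hs is made up; my understanding is that an atom written in brackets carries only the hydrogens written inside the brackets, which is what the counts below are meant to check.

```python
from rdkit import Chem


def total_hs(smiles, atom_idx):
    """Total number of hydrogens RDKit assigns to one atom of a SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return mol.GetAtomWithIdx(atom_idx).GetTotalNumHs()


unlabeled = total_hs('CCO', 1)          # ordinary CH2 carbon
bare_bracket = total_hs('C[13C]O', 1)   # bracket atom with no H written
explicit = total_hs('C[13CH2]O', 1)     # bracket atom with its Hs spelled out
```

If the counts differ between the second and third forms, then the two three_hb SMILES above are not the same molecule as far as RDKit is concerned.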

Curt


Re: [Rdkit-discuss] Kekulizing thiazoles

2017-01-17 Thread Curt Fischer
To troubleshoot your sanitization problems, I think it would be helpful if
you could share your SMARTS reaction string and the rdkit version you are
using.

I just simulated the Hantzsch thiazole synthesis shown on Wikipedia, and
everything worked normally for me.  Admittedly, my reaction definition is
overly tailored toward these two reactants, but I think it shows that rdkit
can *Sanitize()* thiazoles correctly.

# Hantzsch thiazole synthesis
thiourea = Chem.MolFromSmiles('CN(C)C(=S)N')
haloketone = Chem.MolFromSmiles('c1c1C(=O)C(C)Cl')
rxn_smarts = ('[NH2:1][C:2](=[S:3])[NH0:4].[C:5](=[O:6])[C:7][Cl:8]'
              '>>[N:4][c:2]1[s:3][c:5][c:7][n:1]1')
rxn = AllChem.ReactionFromSmarts(rxn_smarts)
product = rxn.RunReactants((thiourea, haloketone))[0][0]
Chem.SanitizeMol(product)
Chem.MolToSmiles(product)

Out[33]: 'Cc1nc(N(C)C)sc1-c1c1'


On Tue, Jan 17, 2017 at 9:29 AM, Curt Fischer <curt.r.fisc...@gmail.com>
wrote:

> I can't answer your root question, but if you want to go to SMILES and
> then back, I think you want *Chem.MolFromSmiles()*, not
> *Chem.MolToSmiles()*.
>
> Curt
>
> On Tue, Jan 17, 2017 at 8:52 AM, Chris Arthur <chris.art...@bristol.ac.uk>
> wrote:
>
>> Dear all
>>
>>
>> I have a molecule containing a thiazole ring which has been generated by
>> a reaction in Rdkit.
>>
>> Sanitising the molecule gives kekulization error...
>>
>> Chem.SanitizeMol(forwardProduct_)
>> Traceback (most recent call last):
>>
>>   File "", line 1, in 
>> Chem.SanitizeMol(forwardProduct_)
>>
>> ValueError: Sanitization error: Can't kekulize mol
>>
>> I can generate a smiles string from it (I had thought of doing a smiles
>> to molecule conversion)
>>
>> #Rdkit generated smiles that started us down this rabbit-hole
>> temp = Chem.MolToSmiles('CC(=O)c1sc(C2CCOCC2)nc1C')
>>
>> But this fails
>>
>> ArgumentError: Python argument types in
>> rdkit.Chem.rdmolfiles.MolToSmiles(str)
>> did not match C++ signature:
>> MolToSmiles(class RDKit::ROMol mol, bool isomericSmiles=False, bool
>> kekuleSmiles=False, int rootedAtAtom=-1, bool canonical=True, bool
>> allBondsExplicit=False, bool allHsExplicit=False)
>>
>>
>> So I thought I would try with simpler thiazoles
>>
>> #ChemDraws smiles representation
>> temp = Chem.MolToSmiles('C1=CN=CS1')
>>
>> #From wikipedias smile for thiazole
>> temp = Chem.MolToSmiles('n1ccsc1')
>>
>> These however also fail.
>>
>>  Can anyone suggest how I can proceed in order to sanitize such molecules
>>
>>  Thanks
>>
>>  Chris
>>
>>
>>
>> --
>> Dr Christopher J. Arthur
>> School of Chemistry
>> University of Bristol
>> BRISTOL, BS8 1TS,  UK
>> E-mail:  chris.art...@bristol.ac.uk
>>
>> Office: (+44 117) 331 7192
>> Mass Spectrometry Lab: (+44 117) 331 7358
>> FAX: (+44 117) 927 7985
>>
>> WWW URL: http://www.chm.bris.ac.uk/staff/carthur.htm
>> LinkedIn  Profile: https://www.linkedin.com/in/drchrisarthur
>>
>>


Re: [Rdkit-discuss] UpdatePropertyCache() after RunReactants

2017-01-12 Thread Curt Fischer
Oof, thanks for the pointer Michal.  I'm sorry I didn't read the docs
carefully enough! ~CF
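
For the archive, here is a minimal sketch of what that looks like with the reactants and reaction SMARTS from my original post: flatten the tuple of tuples returned by RunReactants(), then call Chem.SanitizeMol() on each product before doing anything else with it.

```python
from itertools import chain

from rdkit import Chem
from rdkit.Chem import AllChem

diyne = Chem.MolFromSmiles('C#CCC(O)C#C')
azide = Chem.MolFromSmiles('CCCN=[N+]=[N-]')
copper_click = AllChem.ReactionFromSmarts(
    '[C:1]#[C:2].[N:3]=[N+:4]=[N-:5]>>[c:1]1[c:2][n-0:3][n-0:4][n-0:5]1')

# RunReactants returns one tuple of product molecules per reactant match
products = list(chain.from_iterable(copper_click.RunReactants((diyne, azide))))

# The products are not sanitized, so sanitize each one in place
for product in products:
    Chem.SanitizeMol(product)

smiles = [Chem.MolToSmiles(product) for product in products]
```

SanitizeMol() works in place and raises an exception if a product cannot be sanitized, so a bad reaction definition shows up right away.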

On Thu, Jan 12, 2017 at 9:32 AM, Michal Krompiec <michal.kromp...@gmail.com>
wrote:

> You need to sanitize the products, just run Chem.SanitizeMol on each
> molecule. See http://www.rdkit.org/docs/GettingStartedInPython.html#
> chemical-reactions : "the molecules that are produced by the chemical
> reaction processing code are not sanitized".
>
> Best,
> Michal
>
> On 12 January 2017 at 17:22, Curt Fischer <curt.r.fisc...@gmail.com>
> wrote:
>
>> What makes you think the molecules are nonsensical?  They look OK to me.
>> Converting to SMILES before doing any UpdatePropertyCache() stuff
>>
>>
>>
>> products_tuples = copper_click.RunReactants((diyne, azide))
>> products = list(chain(*products_tuples))
>> print([Chem.MolToSmiles(prod) for prod in products])
>>
>> gives
>>
>>> ['C#CC(O)Cc1cnnn1CCC', 'C#CC(O)Cc1cn(CCC)nn1', 'C#CCC(O)c1cn(CCC)nn1',
>>> 'C#CCC(O)c1cnnn1CCC']
>>
>>
>> ...and those all look like valid SMILES strings to me.
>>
>> I'm not sure exactly how to turn off all sanitization, but I did
>>
>> *Draw.MolsToGridImage(products, kekulize = False)*
>>
>> and as long as that is invoked before UpdatePropertyCache, there is a
>> *different* error than the one I reported last time.
>>
>> RuntimeError                              Traceback (most recent call last)
>> in ()
>>       7 print [Chem.MolToSmiles(prod) for prod in products]
>>       8
>> ----> 9 Draw.MolsToGridImage(products, kekulize = False)
>>
>> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/IPythonConsole.pyc
>> in ShowMols(mols, **kwargs)
>>     198   else:
>>     199     fn = Draw.MolsToGridImage
>> --> 200   res = fn(mols, **kwargs)
>>     201   if kwargs['useSVG']:
>>     202     return SVG(res)
>>
>> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
>> in MolsToGridImage(mols, molsPerRow, subImgSize, legends,
>> highlightAtomLists, useSVG, **kwargs)
>>     400   if useSVG:
>>     401     return _MolsToGridSVG(mols, molsPerRow=molsPerRow, subImgSize=subImgSize, legends=legends,
>> --> 402                           highlightAtomLists=highlightAtomLists, **kwargs)
>>     403   else:
>>     404     return _MolsToGridImage(mols, molsPerRow=molsPerRow, subImgSize=subImgSize, legends=legends,
>>
>> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
>> in _MolsToGridSVG(mols, molsPerRow, subImgSize, legends,
>> highlightAtomLists, stripSVGNamespace, **kwargs)
>>     374     nmol = rdMolDraw2D.PrepareMolForDrawing(mol, kekulize=kwargs.get('kekulize', True))
>>     375     d2d = rdMolDraw2D.MolDraw2DSVG(subImgSize[0], subImgSize[1])
>> --> 376     d2d.DrawMolecule(nmol, legend=legends[i], highlightAtoms=highlights)
>>     377     d2d.FinishDrawing()
>>     378     txt = d2d.GetDrawingText()
>>
>> RuntimeError: Pre-condition Violation
>>  getNumImplicitHs() called without preceding call to calcImplicitValence()
>>  Violation occurred on line 153 in file Code/GraphMol/Atom.cpp
>>  Failed Expression: d_implicitValence > -1
>>  RDKIT: 2016.09.2
>>  BOOST: 1_56
>>
>>
>>
>> On Thu, Jan 12, 2017 at 4:41 AM, Brian Kelley <fustiga...@gmail.com>
>> wrote:
>>
>>> The outputs of reaction are a bit confusing.
>>>
>>> Reactions can have multiple product templates so the output of
>>> RunReactants is a list of list of molecules.
>>>
>>> for products in result:
>>>     for molecule in products:
>>>         molecule.UpdatePropertyCache()
>>>
>>> However, it looks like your reaction is generating nonsensical
>>> molecules, so you may want to draw with sanitization turned off so you can
>>> see the reaction output.
>>>
>>> 
>>> Brian Kelley
>>>
>>> On Jan 11, 2017, at 9:11 PM, Curt Fischer <curt.r.fisc...@gmail.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I recently wanted to use RDKit to model the famous copper-catalyzed
>>> cycloaddition of alkynes and azides.
>>>
>>> I eventually got things working, kind of, but had two questions.  First,
>>> I was surprised to find that the products of RunReactants don't have update
>>> property caches.  Is this something I should have expected, or is it a
>&

[Rdkit-discuss] UpdatePropertyCache() after RunReactants

2017-01-11 Thread Curt Fischer
Hi all,

I recently wanted to use RDKit to model the famous copper-catalyzed
cycloaddition of alkynes and azides.

I eventually got things working, kind of, but had two questions.  First, I
was surprised to find that the products of RunReactants don't have updated
property caches.  Is this something I should have expected, or is it a
bug?  If the latter, is it an easy-to-fix bug or a hard-to-fix one?

Second, how can I modify my SMARTS reaction query to avoid duplication of
each product?

Here's some example code, also available at
https://github.com/tentrillion/ipython_notebooks/blob/master/rdkit_smarts_reactions_needs_updating.ipynb

# ---BEGIN CODE-- #
# import rdkit components
from rdkit import rdBase
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw

# use IPythonConsole for pretty drawings
from rdkit.Chem.Draw import IPythonConsole
# IPythonConsole.ipython_useSVG=True  # leave out for github

# for flattening
from itertools import chain

# define reactants
diyne_smiles = 'C#CCC(O)C#C'
azide_smiles = 'CCCN=[N+]=[N-]'

diyne = Chem.MolFromSmiles(diyne_smiles)
azide = Chem.MolFromSmiles(azide_smiles)

# define reaction
copper_click_smarts = (
    '[C:1]#[C:2].[N:3]=[N+:4]=[N-:5]>>[c:1]1[c:2][n-0:3][n-0:4][n-0:5]1')
copper_click = AllChem.ReactionFromSmarts(copper_click_smarts)

# run reaction
products_tuples = copper_click.RunReactants((diyne, azide))

# flatten product tuple of tuples into list
products = list(chain(*products_tuples))

# FAILS: mol property caches are not updated
try:
    Draw.MolsToGridImage(products)
except (RuntimeError, ValueError) as e:
    print('FAILED!')
    my_error = e

# this works: force updating
for product in products:
    product.UpdatePropertyCache()

Draw.MolsToGridImage(products)

my_error

products_tuples = copper_click.RunReactants((diyne, azide))
products = list(chain(*products_tuples))
# FAILS: mol property caches are not updated
Draw.MolsToGridImage(products)

# ---END CODE-- #

The stacktrace is:

ValueError                                Traceback (most recent call last)
in ()
      2 products = list(chain(*products_tuples))
      3 # FAILS: mol property caches are not updated
----> 4 Draw.MolsToGridImage(products)

/Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/IPythonConsole.pyc
in ShowMols(mols, **kwargs)
    198   else:
    199     fn = Draw.MolsToGridImage
--> 200   res = fn(mols, **kwargs)
    201   if kwargs['useSVG']:
    202     return SVG(res)

/Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
in MolsToGridImage(mols, molsPerRow, subImgSize, legends,
highlightAtomLists, useSVG, **kwargs)
    403   else:
    404     return _MolsToGridImage(mols, molsPerRow=molsPerRow, subImgSize=subImgSize, legends=legends,
--> 405                             highlightAtomLists=highlightAtomLists, **kwargs)
    406
    407

/Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
in _MolsToGridImage(mols, molsPerRow, subImgSize, legends,
highlightAtomLists, **kwargs)
    344       highlights = highlightAtomLists[i]
    345     if mol is not None:
--> 346       img = _moltoimg(mol, subImgSize, highlights, legends[i], **kwargs)
    347       res.paste(img, (col * subImgSize[0], row * subImgSize[1]))
    348   return res

/Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
in _moltoimg(mol, sz, highlights, legend, **kwargs)
    309   from rdkit.Chem.Draw import rdMolDraw2D
    310   if not hasattr(rdMolDraw2D, 'MolDraw2DCairo'):
--> 311     img = MolToImage(mol, sz, legend=legend, highlightAtoms=highlights, **kwargs)
    312   else:
    313     nmol = rdMolDraw2D.PrepareMolForDrawing(mol, kekulize=kwargs.get('kekulize', True))

/Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
in MolToImage(mol, size, kekulize, wedgeBonds, fitImage, options,
canvas, **kwargs)
    112     from rdkit import Chem
    113     mol = Chem.Mol(mol.ToBinary())
--> 114     Chem.Kekulize(mol)
    115
    116   if not mol.GetNumConformers():

ValueError: Sanitization error: Can't kekulize mol.  Unkekulized atoms: 3


Re: [Rdkit-discuss] SD file read error

2017-01-11 Thread Curt Fischer
I also got this to run with no problem in a Jupyter notebook.

BUT...I did see the error messages Milinda mentioned in the terminal that
was running the jupyter notebook server.  If I do *from rdkit.Chem.Draw
import IPythonConsole *before running the code, I see all the
errors/warnings in Jupyter.

I think this version of the loop is a bit more informative (best to do with
IPythonConsole disabled):


> from rdkit import Chem
> from rdkit.Chem import Descriptors
>
> input_file = 'structures.sdf'
> suppl = Chem.SDMolSupplier(input_file)
>
> low_mass = 50
> high_mass = 1000
> ms = []
>
> for idx, mol in enumerate(suppl):
>     if mol is None:
>         print("No molecule: " + str(idx))
>         continue
>     try:
>         if (mol and
>                 round(Descriptors.ExactMolWt(mol), 4) >= low_mass and
>                 round(Descriptors.ExactMolWt(mol), 4) <= high_mass):
>             ms.append(mol)
>     except:
>         print("Error: " + str(idx))
>         pass



It shows that all the problems are from rdkit failing to generate
molecules, i.e. the try/except isn't doing anything.  (Note it is bad
practice to have a naked *except*).
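
Since the try/except isn't doing anything, a version without it might look like the sketch below. It takes any iterable of molecules (an SDMolSupplier yields None for entries it cannot parse), computes the exact mass once per molecule, and keeps track of which entries failed. The function name and the two test SMILES are made up for illustration, and the round() call from the loop above is dropped here as a simplification.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def filter_by_exact_mass(mols, low_mass=50, high_mass=1000):
    """Split an iterable of molecules into (kept molecules, failed indices)."""
    kept, failed = [], []
    for idx, mol in enumerate(mols):
        if mol is None:
            # RDKit could not build this entry; record where it was
            failed.append(idx)
            continue
        mass = Descriptors.ExactMolWt(mol)  # computed once per molecule
        if low_mass <= mass <= high_mass:
            kept.append(mol)
    return kept, failed


# Stand-in for a supplier: one unparsable entry, ethanol (~46 Da), benzene (~78 Da)
mols = [None, Chem.MolFromSmiles('CCO'), Chem.MolFromSmiles('c1ccccc1')]
kept, failed = filter_by_exact_mass(mols)
```

With a real file you would pass the supplier itself, e.g. filter_by_exact_mass(Chem.SDMolSupplier(input_file)), and then look up the failed indices afterwards.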

The first molecule that fails is #491, heparin sulfate.  The molecule can
be imported using *Chem.MolFromInchi()*. This gels nicely with the rdkit
error message for this molecule:

RDKit ERROR: [12:12:56] Unhandled CTAB feature: S group SRU on line: 75.
Molecule skipped.



The problem is thus the line M STY 1 1 SRU in the mol block, which you can
see if you do

suppl.reset()
for idx, mol in enumerate(suppl):
    if idx == 491:
        print(suppl.GetItemText(idx))


I don't know enough to pinpoint the precise reason for the error.  And
there are lots more errors to go through to get everything from HMDB into
RDKit, it seems.

Curt

On Wed, Jan 11, 2017 at 11:39 AM, Steve O'Hagan 
wrote:

> With same code and fresh file download, works fine for me without error.
>
> ms contains 35177 molecules. Perhaps your download was corrupt?
>
>
> On 11/01/2017 18:26, Milinda Samaraweera wrote:
>
> Dear Experts,
>
> I was trying to read in the attached SD file (downloaded from HMDB) and
> trying to calculate the exact mass of each entry:
>
> structures.sdf
>
> from rdkit import Chem
> from rdkit.Chem import Descriptors
>
> suppl  = Chem.SDMolSupplier(input_file)
>
> low_mass=50
> high_mass=1000
>
> ms = []
>
> for mol in suppl:
>     if mol is None:
>         continue
>     try:
>         if (mol and round(Descriptors.ExactMolWt(mol), 4) >= low_mass
>                 and round(Descriptors.ExactMolWt(mol), 4) <= high_mass):
>             ms.append(mol)
>     except:
>         pass
>
> By running the script, I got a barrage of errors as:
>
> [13:15:14] ERROR: Could not sanitize molecule ending on line 1993855
> [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than
> permitted
> [13:15:14] Explicit valence for atom # 9 O, 3, is greater than permitted
> [13:15:14] ERROR: Could not sanitize molecule ending on line 1994014
> [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than
> permitted
> [13:15:14] Explicit valence for atom # 9 O, 3, is greater than permitted
> [13:15:14] ERROR: Could not sanitize molecule ending on line 1996036
> [13:15:14] ERROR: Explicit valence for atom # 9 O, 3, is greater than
> permitted
> [13:15:16] Explicit valence for atom # 46 N, 4, is greater than permitted
> [13:15:16] ERROR: Could not sanitize molecule ending on line 2302532
> [13:15:16] ERROR: Explicit valence for atom # 46 N, 4, is greater than
> permitte
> [13:15:16] Explicit valence for atom # 16 N, 4, is greater than permitted
> [13:15:16] ERROR: Could not sanitize molecule ending on line 2302918
> [13:15:16] ERROR: Explicit valence for atom # 16 N, 4, is greater than
> permitte
> [13:15:17] Explicit valence for atom # 11 N, 4, is greater than permitted
> [13:15:17] ERROR: Could not sanitize molecule ending on line 2556541
> [13:15:17] ERROR: Explicit valence for atom # 11 N, 4, is greater than
> permitte
> [13:15:18]  S group SUP ignored on line 2836416
> [13:15:18] Explicit valence for atom # 1 Cl, 4, is greater than permitted
> [13:15:18] ERROR: Could not sanitize molecule ending on line 2841449
> [13:15:18] ERROR: Explicit valence for atom # 1 Cl, 4, is greater than
> permitte
> [13:15:19] Warning: conflicting stereochemistry at atom 10 ignored.
> [13:15:19] Warning: conflicting stereochemistry at atom 10 ignored.
> [13:15:19] Warning: conflicting stereochemistry at atom 17 ignored.
> [13:15:19] Warning: conflicting stereochemistry at atom 17 ignored.
> [13:15:19] Explicit valence for atom # 3 B, 4, is greater than permitted
> [13:15:19] ERROR: Could not sanitize molecule ending on line 3107498
> 

Re: [Rdkit-discuss] Fwd: conda / Windows update to 2016.09 release gives error

2017-01-05 Thread Curt Fischer
This worked for me.  Thanks Greg.  CF
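
For anyone else hitting this, the difference between the two os.path.join calls discussed further down the thread can be reproduced with the standard library alone. This is only an illustration: the install prefix below is hypothetical, and ntpath is used so the Windows behaviour shows up on any platform.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

# Hypothetical install prefix, used only for illustration
conda_dir = ['C:\\Anaconda2']
conda_dir += ['Library', 'share', 'RDKit']

# Unpacking the list passes each component as its own argument
share = ntpath.join(*conda_dir)

# Passing the list itself is the mistake described in the thread
try:
    ntpath.join(conda_dir)
    list_call_failed = False
except (TypeError, AttributeError):
    # TypeError on Python 3; the thread reports an AttributeError on Python 2
    list_call_failed = True
```

The asterisk is what turns the list into separate path components, so leaving it out fails no matter what the list contains.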

On Tue, Jan 3, 2017 at 8:03 PM, Greg Landrum <greg.land...@gmail.com> wrote:

> Curt,
>
> If you change lines 32 and 33 in /lib/site-packages\rdkit\RDConfig.py
> to:
>   condaDir += ['Library', 'share', 'RDKit']
>   _share = os.path.join(*condaDir)
>
> I think it should work.
>
> Sorry for the inconvenience here; we will fix it before running the next
> conda builds.
>
> -greg
>
>
> On Wed, Jan 4, 2017 at 1:44 AM, Curt Fischer <curt.r.fisc...@gmail.com>
> wrote:
>
>>
>> Thanks for writing in Matt!
>>
>> Do you or any other readers think there is any chance that a small manual
>> fix to RDConfig.py could fix the problem?  I have very little experience
>> with building anything from source and would like to use the newest version
>> of rdkit if possible.  Would it be as simple as adding the *.sep* to
>> */lib/site-packages\rdkit\RDConfig.py* ?
>> Curt
>>
>> On Wed, Dec 21, 2016 at 2:22 AM, Matthew Swain <m.sw...@me.com> wrote:
>>
>>> I've also encountered this problem with the 2016.09.2 windows packages
>>> on the rdkit conda channel. It looks like somehow the RDConfig patch in the
>>> conda recipe hasn't been applied properly in the published packages.
>>>
>>> The original lines in the rdkit are:
>>>
>>> condaDir += ['share', 'RDKit']
>>> _share = os.path.join(*condaDir)
>>>
>>> The conda recipe has a Windows-specific patch to change this to:
>>>
>>> condaDir += ['Library','share','RDKit']
>>> _share = os.path.sep.join(condaDir)
>>>
>>> Which looks fine (although the second line doesn't really need
>>> changing?). But in the published packages it is:
>>>
>>> condaDir += ['share', 'RDKit', 'RDKit']
>>> _share = os.path.join(condaDir)
>>>
>>> This causes the AttributeError because it incorrectly passes a list to
>>> os.path.join, with no asterisk for unpacking the list into *args. The first
>>> line is also incorrect.
>>>
>>> I built the package myself from the recipe, and didn't see this issue.
>>>
>>> Matt
>>>
>>> On Dec 09, 2016, at 05:05 PM, Curt Fischer <curt.r.fisc...@gmail.com>
>>> wrote:
>>>
>>> I'm not sure of the source of the problem with the conda 2016.09 release
>>> on my Windows box, but I was able to revert to a 2016.03 release with a
>>> *conda install -c rmg rdkit=2016.03**
>>>
>>> conda couldn't seem to solve the specifications automagically, but after
>>> I uninstalled boost and did the above command, it identified the proper
>>> boost to install along with the 2016.03 rdkit.
>>>
>>> I now have a functioning rdkit again, but would still be interested in
>>> hearing from anyone that experiences a similar problem.
>>>
>>> On Thu, Dec 8, 2016 at 9:27 AM, Curt Fischer <curt.r.fisc...@gmail.com>
>>> wrote:
>>>
>>>> To update rdkit to the September release, I recently did a
>>>>
>>>> *conda install -f --channel https://conda.anaconda.org/rdkit rdkit*
>>>>
>>>> on my Windows box, and everything seemed to update fine.
>>>>
>>>> However now, when I try from rdkit import Chem, I get the disturbing
>>>> error message below.
>>>>
>>>> Is this a sign that my particular installation got borked somehow, and
>>>> maybe I should reinstall everything again?  Or is this perchance a known
>>>> issue with the 2016.09 release?  If the latter, how do I roll back to the
>>>> old release using conda?  I tried a *conda install --channel
>>>> https://conda.anaconda.org/rdkit rdkit=2016.03.4* but that didn't seem
>>>> to do it.
>>>>
>>>> Thanks all for any help!
>>>>
>>>> Curt
>>>>
>>>> AttributeError                            Traceback (most recent call last)
>>>> in ()
>>>> ----> 1 from rdkit import Chem
>>>>
>>>> C:\Anaconda2\lib\site-packages\rdkit\Chem\__init__.py in ()
>>>>      17 """
>>>>      18 from rdkit import rdBase
>>>> ---> 19 from rdkit import RDConfig
>>>>      20
>>>>      21 from rdkit import DataStructs

[Rdkit-discuss] Fwd: conda / Windows update to 2016.09 release gives error

2017-01-03 Thread Curt Fischer
Thanks for writing in Matt!

Do you or any other readers think there is any chance that a small manual
fix to RDConfig.py could fix the problem?  I have very little experience
with building anything from source and would like to use the newest version
of rdkit if possible.  Would it be as simple as adding the *.sep* to
*/lib/site-packages\rdkit\RDConfig.py* ?
Curt

On Wed, Dec 21, 2016 at 2:22 AM, Matthew Swain <m.sw...@me.com> wrote:

> I've also encountered this problem with the 2016.09.2 windows packages on
> the rdkit conda channel. It looks like somehow the RDConfig patch in the
> conda recipe hasn't been applied properly in the published packages.
>
> The original lines in the rdkit are:
>
> condaDir += ['share', 'RDKit']
> _share = os.path.join(*condaDir)
>
> The conda recipe has a Windows-specific patch to change this to:
>
> condaDir += ['Library','share','RDKit']
> _share = os.path.sep.join(condaDir)
>
> Which looks fine (although the second line doesn't really need changing?).
> But in the published packages it is:
>
> condaDir += ['share', 'RDKit', 'RDKit']
> _share = os.path.join(condaDir)
>
> This causes the AttributeError because it incorrectly passes a list to
> os.path.join, with no asterisk for unpacking the list into *args. The first
> line is also incorrect.
>
> I built the package myself from the recipe, and didn't see this issue.
>
> Matt
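
The difference between the broken and the intended call can be reproduced without RDKit. The following is a minimal sketch, not the actual RDConfig.py code: the install prefix is a hypothetical example, and `ntpath` (the Windows flavour of `os.path`) is used so that it behaves the same on any platform.

```python
import ntpath  # Windows flavour of os.path; importable on any platform

# Hypothetical install prefix, held as a list of path components
conda_dir = ['C:\\', 'Anaconda2']
conda_dir += ['Library', 'share', 'RDKit']

# Broken form: the whole list is passed as a single argument
try:
    ntpath.join(conda_dir)
    error = None
except (AttributeError, TypeError) as exc:
    # AttributeError on Python 2 (as in the traceback), TypeError on Python 3
    error = exc

# Working form: the asterisk unpacks the list into separate arguments
share = ntpath.join(*conda_dir)
print(share)
```

If this reading is right, restoring the asterisk (or using `os.path.sep.join(condaDir)` as in the recipe patch) should be enough to get past the AttributeError, although the duplicated 'RDKit' component would still need correcting.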

Re: [Rdkit-discuss] conda / Windows update to 2016.09 release gives error

2016-12-09 Thread Curt Fischer
I'm not sure of the source of the problem with the conda 2016.09 release on
my Windows box, but I was able to revert to a 2016.03 release with a *conda
install -c rmg rdkit=2016.03**

conda couldn't seem to solve the specifications automagically, but after I
uninstalled boost and did the above command, it identified the proper boost
to install along with the 2016.03 rdkit.

I now have a functioning rdkit again, but would still be interested in
hearing from anyone that experiences a similar problem.

On Thu, Dec 8, 2016 at 9:27 AM, Curt Fischer <curt.r.fisc...@gmail.com>
wrote:

> To update rdkit to the September release, I recently did a
>
> *conda install -f --channel https://conda.anaconda.org/rdkit
> <https://conda.anaconda.org/rdkit> rdkit*
>
> on my Windows box, and everything seemed to update fine.
>
> However now, when I try from rdkit import Chem, I get the disturbing
> error message below.
>
> Is this a sign that my particular installation got borked somehow, and
> maybe I should reinstall everything again?  Or is this perchance a known
> issue with the 2016.09 release?  If the latter, how do I roll back to the
> old release using conda?  I tried a *conda install --channel
> https://conda.anaconda.org/rdkit <https://conda.anaconda.org/rdkit>
> rdkit=2016.03.4 *but that didn't seem to do it.
>
> Thanks all for any help!
>
> Curt
>
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> in ()
> ----> 1 from rdkit import Chem
>
> C:\Anaconda2\lib\site-packages\rdkit\Chem\__init__.py in ()
>      17 """
>      18 from rdkit import rdBase
> ---> 19 from rdkit import RDConfig
>      20
>      21 from rdkit import DataStructs
>
> C:\Anaconda2\lib\site-packages\rdkit\RDConfig.py in ()
>      31   condaDir[0] = os.path.sep
>      32   condaDir += ['share', 'RDKit', 'RDKit']
> ---> 33   _share = os.path.join(condaDir)
>      34   RDDataDir = os.path.join(_share, 'Data')
>      35   RDDocsDir = os.path.join(_share, 'Docs')
>
> C:\Anaconda2\lib\ntpath.pyc in join(path, *paths)
>      63 def join(path, *paths):
>      64     """Join two or more pathname components, inserting "\\" as needed."""
> ---> 65     result_drive, result_path = splitdrive(path)
>      66     for p in paths:
>      67         p_drive, p_path = splitdrive(p)
>
> C:\Anaconda2\lib\ntpath.pyc in splitdrive(p)
>     114     """
>     115     if len(p) > 1:
> --> 116         normp = p.replace(altsep, sep)
>     117         if (normp[0:2] == sep*2) and (normp[2:3] != sep):
>     118         # is a UNC path:
>
> AttributeError: 'list' object has no attribute 'replace'
>
>
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] File Conversion?

2016-12-04 Thread Curt Fischer
This is not really possible.  Fasta files contain only sequence
information, not 3D structural information.

Curt

On Sun, Dec 4, 2016 at 7:00 AM, Carl MacGentey 
wrote:

> Dear RDKit Discussion Group-
>
>
>
> Is it possible to convert fasta files (DNA nucleotide sequences) into PDB
> files? I am wanting to view strands of DNA and full length genes in three
> dimensions.


Re: [Rdkit-discuss] new rdkit-tutorial repository

2016-10-09 Thread Curt Fischer
I think this is an amazing idea.  Thanks for starting it Greg!

I'm looking forward to porting some of my own self-tutorial Jupyter
notebooks into this repo.

Curt

On Sat, Oct 8, 2016 at 10:33 PM, Greg Landrum 
wrote:

> Dear all,
>
> Based on a bunch of feedback I've gotten from multiple people, I've
> created a new RDKit repository in github to host short tutorials:
> https://github.com/rdkit/rdkit-tutorials
>
> The Python-based tutorials (currently the only type there) are created
> using jupyter, so that they can be nicely viewed in github, and are
> automatically tested.
>
> There's not much there at the moment, but I will try and get into the
> habit of adding new ones on a fairly frequent basis. Pull requests with new
> tutorials are very welcome. Please take a look at the Contributing.md
> document (https://github.com/rdkit/rdkit-tutorials/blob/master/
> Contributing.md) for some guidelines.
>
> Best,
> -greg


Re: [Rdkit-discuss] (no subject)

2016-09-27 Thread Curt Fischer
Last week, Greg gave us a nice example of substructure matching using a
SMARTS query (see below).  I learned a lot from this thread, not the least
of which was to use the IPythonConsole module and to enable SVG for
beautiful drawings.

This style is indeed much better than the old drawings.

I was interested in making grid images of molecules, where some molecules
in the grid matched a query structure and some didn't.  I eventually
figured it out.  See my Jupyter notebook at
http://nbviewer.jupyter.org/github/tentrillion/ipython_notebooks/blob/master/smarts_queries_in_rdkit.ipynb
if you are interested.

I had a question about these lines:

highlight_lists = [mol.__sssAtoms for mol in my_molecules]
Draw.MolsToGridImage(my_molecules, highlightAtomLists=highlight_lists)


It took me forever to figure out that this was required.  Is this desired
behavior?

I ask because it seems to differ from what's required to show matches in
single molecules.  No explicit reference to atoms for highlighting was
required.  The behavior for multiple molecules -- at least the way I am
doing it -- seems different.  Would it just be a small addition to the
ShowMols() function in rdkit.Chem.Draw.IPythonConsole to fix this, if it is
indeed a bug?
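
For anyone who would rather not depend on the `__sssAtoms` attribute, here is a minimal sketch of building the highlight lists explicitly from the substructure matches. The molecules and the query are illustrative choices of mine, not the ones from the notebook, and `match_highlights` and `draw_grid` are hypothetical helper names.

```python
from rdkit import Chem
from rdkit.Chem import Draw


def match_highlights(mols, query):
    """Return one list of matched atom indices per molecule (empty if no match)."""
    return [list(mol.GetSubstructMatch(query)) for mol in mols]


def draw_grid(mols, highlights):
    """Draw all molecules in one grid, highlighting the matched atoms."""
    return Draw.MolsToGridImage(mols, molsPerRow=3,
                                highlightAtomLists=highlights)


# Illustrative input: benzene, pyridine, cyclohexane
smiles = ['c1ccccc1', 'c1ccncc1', 'C1CCCCC1']
my_molecules = [Chem.MolFromSmiles(s) for s in smiles]

# Illustrative query: any aromatic nitrogen
query = Chem.MolFromSmarts('[n]')

highlight_lists = match_highlights(my_molecules, query)
print(highlight_lists)
```

Calling `draw_grid(my_molecules, highlight_lists)` in a notebook should then give a grid in which only the pyridine nitrogen is highlighted. Molecules that do not match get an empty list and are drawn without highlighting.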

Curt


On Wed, Sep 21, 2016 at 8:31 PM, Greg Landrum 
wrote:

> Hi Markus,
>
> Curt's instincts are dead on: the problem here is the rings.
>
> I'll show the fix and then explain what's going on. You just need to add
> one line to your code:
>
> core = "[a]12[a][a][a][a][a]1[a][a][a]2"
> pattern = Chem.MolFromSmarts(core)
> Chem.GetSSSR(pattern)
> AllChem.Compute2DCoords(pattern)
>
> when I do this, I get the following depiction for "c1cccc(ocn2)c21":
>
> (The highlighting is due to the substructure match that's done during the
> generation of coordinates).
>
> So why is this necessary?
> The code that generates 2D coordinates uses information about the size of
> ring systems in the molecule as part of the coordinate generation. If no
> ring information is present (which is true of molecules generated from
> SMARTS since they are not fully sanitized on construction) then the code
> calls FastFindRings(). This function is perfectly capable of identifying
> all ring atoms and bonds, but it isn't very good at getting ring sizes
> correct for fused systems (it finds rings, but not the smallest rings). The
> consequences are the badly generated coordinates for fused ring systems
> that you were seeing.
>
> I think the current behavior of the code "isn't really ideal": the
> coordinate generation code should call the SSSR algorithm in these cases so
> that it can generate better coordinates. I'll take a look at the code and
> think about changing it.
>
> As an aside: if you're puzzled by the behavior of
> AllChem.GenerateDepictionMatching2DStructure() you can always just take a
> look at the drawing of the query molecule itself. It's not always the most
> informative depiction when it comes to what the atom and bond queries are,
> but you at least will see the coordinates.
>
> A second aside: the molecule depictions in that notebook indicate that you
> are stuck using the fallback drawing code, which creates fairly ugly
> pictures. You can get better drawings by either installing cairo and
> pycairo (in which case the code should automatically use those) or telling
> the drawing code to use SVG for the rendering:
>
> from rdkit.Chem.Draw import IPythonConsole
> IPythonConsole.ipython_useSVG=True
>
> It really does make the drawings a lot better.
>
> I hope this helps,
> -greg
>
>
>
>
>
>
> On Wed, Sep 21, 2016 at 8:47 PM, Markus Metz  wrote:
>
>> Hello all:
>>
>> I am trying to perform a 2D alignment of molecules by using a pattern for
>> which I am using Compute2DCoords.
>>
>> If I use a SMARTS string matching naphthalene, the 2D depiction is as one
>> would expect.
>> However, if I switch to a 5,6 aromatic SMARTS pattern, the 2D structure of
>> the matched benzoxazole looks rather unusual.
>>
>> Is there a way to match the 5,6 with the 6,6 pattern behavior?
>>
>> Any hint is very much appreciated,
>>
>> Markus
>>
>> P.S. a work book is attached.


Re: [Rdkit-discuss] (no subject)

2016-09-21 Thread Curt Fischer
Hi Markus,

I suspect the problem is that your SMARTS query is not as specific as you
might think.  For example, RDKit does not understand how many rings there
are in your SMARTS query.  Each [a] could be an atom arbitrarily connected
to many other rings that wouldn't be a part of the substructure match.
Thus, RDKit cannot generate a meaningful set of 2D coordinates for your
SMARTS patterns, and defaults somehow to the unhelpful representation you
reported.

Compare:

from rdkit import Chem

# define two molecules, one from SMILES, one from SMARTS
naphthalene = Chem.MolFromSmiles('c1cccc2c1cccc2')
naphthalene_smarts = Chem.MolFromSmarts('c1cccc2c1cccc2')

# define a query that hits atoms that are in two rings
in_two_rings = Chem.MolFromSmarts('[R2]')

# find atoms in our molecules that are in two rings
naphthalene.GetSubstructMatches(in_two_rings)
# fails because the RingInfo object of this molecule could not be initiated
naphthalene_smarts.GetSubstructMatches(in_two_rings)

A path forward for you could be setting the RingInfo of your SMARTS query
manually, but I'm not exactly sure how to do that.  Maybe others could
weigh in?  Here's a SMARTS that might be useful: it should hit any molecule
that consists of an aromatic benzene fused to any (aromatic or aliphatic)
five-membered ring:
benzene_with_five_membered_fusion = Chem.MolFromSmarts(
    '[*r5R1]1[cR2]2[cR1][cR1][cR1][cR1][cR2]2[*r5R1][*r5R1]1')
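
On the RingInfo question, the approach Greg Landrum describes in his reply in this thread is to run SSSR ring perception on the query before generating coordinates. A minimal sketch of that, using the 5,6-fused pattern from his message (the variable names `ring_sizes` and `fused_atoms` are mine):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# 5,6-fused aromatic query pattern quoted in this thread
pattern = Chem.MolFromSmarts('[a]12[a][a][a][a][a]1[a][a][a]2')

# Molecules built from SMARTS are not sanitized, so run ring perception
# explicitly; this populates the RingInfo of the query molecule
Chem.GetSSSR(pattern)

ring_info = pattern.GetRingInfo()

# Sizes of the perceived rings, smallest first
ring_sizes = sorted(len(ring) for ring in ring_info.AtomRings())

# Atoms that sit in both rings, i.e. the fusion atoms
fused_atoms = [atom.GetIdx() for atom in pattern.GetAtoms()
               if ring_info.NumAtomRings(atom.GetIdx()) == 2]

# Coordinate generation can now use the correct ring sizes
AllChem.Compute2DCoords(pattern)

print(ring_sizes, fused_atoms)
```

With the ring information in place, the query should report one five-membered and one six-membered ring, which is what the coordinate generation needs for a sensible fused-ring layout.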

Curt

On Wed, Sep 21, 2016 at 11:47 AM, Markus Metz  wrote:

> Hello all:
>
> I am trying to perform a 2D alignment of molecules by using a pattern for
> which I am using Compute2DCoords.
>
> If I use a SMARTS string matching naphthalene, the 2D depiction is as one
> would expect.
> However, if I switch to a 5,6 aromatic SMARTS pattern, the 2D structure of
> the matched benzoxazole looks rather unusual.
>
> Is there a way to match the 5,6 with the 6,6 pattern behavior?
>
> Any hint is very much appreciated,
>
> Markus
>
> P.S. a work book is attached.


Re: [Rdkit-discuss] a SMILES that rdkit cannot read

2016-09-19 Thread Curt Fischer
> I ran it in jupyter via browser. Both docker/ubuntu and windows were
> tested and found that jupyter won't give error. Ipython or python in
> terminal(cmd) would show this error.
>
>
That problem is related to how Jupyter interacts with python kernels.  It's
not an RDKit issue.  If you check the terminal that you used to launch the
jupyter notebook process, the error/warning that you are talking about is
probably displayed there rather than in the Jupyter notebook.

Curt


[Rdkit-discuss] Fwd: Chem.Descriptors.ExactMolWt

2015-12-08 Thread Curt Fischer
Hi RDKit users,

Should we expect the ExactMolWt() function from the Descriptors module to
know about the mass of electrons?  I initially expected that it would, and
thus was surprised by this behavior:


> from rdkit import Chem
> from rdkit.Chem import Descriptors, rdmolops
>
> proton_smiles = '[H+]'
> proton = Chem.MolFromSmiles(proton_smiles)
> proton_mass = Descriptors.ExactMolWt(proton)
> H_atom_smiles = '[H]'
> H_atom = Chem.MolFromSmiles(H_atom_smiles)
> H_atom_mass = Descriptors.ExactMolWt(H_atom)
> print(proton_mass)
> print(H_atom_mass)
> print(rdmolops.GetFormalCharge(proton))
> print(rdmolops.GetFormalCharge(H_atom))

> 1.007825032
> 1.007825032
> 1
> 0


That is, the proton and the neutral hydrogen atom have the same "exact"
mass.  But since electrons weigh 0.0005485799 Daltons, I was hoping that
*Descriptors.ExactMolWt(proton) *would return 1.00727645.
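
To make the arithmetic behind that expectation explicit, here is a small sketch that applies the electron-mass correction by hand. It uses only the two numbers quoted above and does not call RDKit; `ion_mass` is a hypothetical helper, not an RDKit function.

```python
# Monoisotopic mass of a neutral 1H atom and the electron mass, in daltons,
# as quoted in this message
H_ATOM_MASS = 1.007825032
ELECTRON_MASS = 0.0005485799


def ion_mass(neutral_mass, formal_charge):
    """Correct a neutral mass for electrons removed (+) or added (-)."""
    return neutral_mass - formal_charge * ELECTRON_MASS


proton_mass = ion_mass(H_ATOM_MASS, +1)
hydride_mass = ion_mass(H_ATOM_MASS, -1)

print(round(proton_mass, 8))
print(round(hydride_mass, 8))
```

The same correction could be applied to any ExactMolWt() result by passing the value of rdmolops.GetFormalCharge() for the molecule as `formal_charge`.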

Am I misunderstanding what this function is for, or is this a bug?

Curt


[Rdkit-discuss] Fwd: Possible stereo centers incorrectly assigned?

2015-09-08 Thread Curt Fischer
Hi John, Michal, and everyone,

I've been following this discussion with interest and wrote a code snippet
to test out John M.'s idea to access the '_CIPCode' property of atoms.  The
results of doing so are opposite to the results of the original getStereoInfo()
function, and thus are in accord with Michal's reports of JChem output.  So
for the simple test case of ephedrine all is well, but I'd be interested in
hearing stories about what goes wrong in more complex cases.

from rdkit import Chem



>
> # old function from Michal Nowotka
> def getStereoInfo(smiles):
>     ret = []
>     mol = Chem.MolFromSmiles(smiles)
>     Chem.AssignStereochemistry(mol, flagPossibleStereoCenters=True,
>                                force=True)
>     for atom in mol.GetAtoms():
>         stereo = str(atom.GetChiralTag())
>         atomIndex = atom.GetIdx()
>         if str(atom.GetChiralTag()) != "CHI_UNSPECIFIED":
>             if stereo == "CHI_TETRAHEDRAL_CW":
>                 chirality = "R"
>             elif stereo == "CHI_TETRAHEDRAL_CCW":
>                 chirality = "S"
>             else:
>                 chirality = "R/S"
>             ret.append({"atomIndex":atomIndex,"chirality":chirality})
>     return ret



>
> # new function as suggested by John M.
> def get_stereo_info_new(smiles):
>     # import the molecule into rdkit
>     mol = Chem.MolFromSmiles(smiles)
>
>     # find chiral centers
>     Chem.FindMolChiralCenters(mol)
>
>     # recover info on chiral centers
>     chiral_centers = {}
>     for atom in mol.GetAtoms():
>         try:
>             stereo = str(atom.GetProp('_CIPCode'))
>             chiral_centers[atom.GetIdx()] = stereo
>         except KeyError:
>             pass
>     return chiral_centers



>
> # compare the functions on a chiral molecule
> ephedrine = 'O[C@H](c1ccccc1)[C@@H](NC)C'
> print(getStereoInfo(ephedrine))
> print(get_stereo_info_new(ephedrine))


[{'chirality': 'S', 'atomIndex': 1}, {'chirality': 'R', 'atomIndex': 8}]
{8: 'S', 1: 'R'}
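
A shorter route to the same information, sketched here on the assumption that only the atom indices and CIP labels are needed: Chem.FindMolChiralCenters() already returns (atom index, label) pairs, so the loop over '_CIPCode' properties can be skipped. The function name `cip_labels` is mine.

```python
from rdkit import Chem


def cip_labels(smiles):
    """Map atom index -> CIP label ('R', 'S', or '?' if unassigned)."""
    mol = Chem.MolFromSmiles(smiles)
    # includeUnassigned=True also reports possible centers without a label
    centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    return dict(centers)


ephedrine = 'O[C@H](c1ccccc1)[C@@H](NC)C'
labels = cip_labels(ephedrine)
print(labels)
```

For the ephedrine test case this should agree with the dictionary reported above; it says nothing about the harder cases John M. alludes to.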


On Tue, Sep 8, 2015 at 7:19 AM, John M  wrote:

> Yes, but the ordering is relative to some ranking. I think you're
> accessing "local parity" here, see slide 8
> http://baoilleach.blogspot.co.uk/2015/08/the-whole-of-cheminformatics-best.html
> .
>
> Try accessing the "_CIPCode" prop on the atoms. Note there are still
> problems.
>
> J


[Rdkit-discuss] serialization of EditableMol objects ?

2015-07-15 Thread Curt Fischer
Hi all,

I use rdkit through its Python API.  As part of an iterative computation on
a molecule, I have to repeatedly remove different combinations of bonds
from it.  I noticed that the calls to Chem.EditableMol() to create the em,
to em.GetMol() to get back the new molecule, and to em.RemoveBond() to
remove the bonds were taking the most time in my code that loops over
different combinations of bonds.  To speed things up, I thought about
storing copies of intermediate ems in a dictionary, but unfortunately no
methods for pickling EditableMol() objects seem to be available.

For example, this code...

from copy import copy, deepcopy
from rdkit import Chem

trp_inchi = 'InChI=1S/C11H12N2O2/c12-9(11(14)15)5-7-6-13-10-4-2-1-3-8(7)10/h1-4,6,9,13H,5,12H2,(H,14,15)/t9-/m0/s1'
trp = Chem.MolFromInchi(trp_inchi)
e_trp = Chem.EditableMol(trp)

# try each copy/serialization route on the Mol and on the EditableMol
for mol in [trp, e_trp]:
    try:
        copy(mol); print('%s can be copied' % mol)
        deepcopy(mol); print('%s can be deep copied' % mol)
        mol.ToBinary(); print('%s can be serialized by converting to binary' % mol)
    except RuntimeError:
        print('These methods are not available for %s' % mol)


...results in

<rdkit.Chem.rdchem.Mol object at 0x11334ee60> can be copied
<rdkit.Chem.rdchem.Mol object at 0x11334ee60> can be deep copied
<rdkit.Chem.rdchem.Mol object at 0x11334ee60> can be serialized by converting to binary
These methods are not available for <rdkit.Chem.rdchem.EditableMol object at 0x11334f7e0>


Is there an easy way to roll my own serialization or pickle method for
this class, if I have extremely limited knowledge of C++ but an OK to good
knowledge of Python?

Also, I'm new to the rdkit community and hope I'm sending my message to the
right place.  Would rdkit-devel be better for this sort of question?