Re: [Rdkit-discuss] RDKit Cookbook

2017-01-18 Thread Chris Swain
Hi,

I copied only the first four occurrences of the warning; it is actually repeated
many, many times.

The change Greg suggests fixes things, thanks.

Chris
> On 19 Jan 2017, at 03:32, Greg Landrum  wrote:
> 
> Peter is correct, those are just warnings. But they are irritating warnings.
> You should be able to get rid of them by changing the call to:
> fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m5, 
> SimilarityMaps.GetMorganFingerprint, lambda x: getProba((x,), 
> rf.predict_proba))
> This passes getProba() a tuple with the fingerprint in question.
> 
> -greg
> 
> 
> On Thu, Jan 19, 2017 at 12:12 AM, Peter S. Shenkin  > wrote:
> These are not errors. They are warnings, as stated in the messages. 
> 
> Do you have any evidence that the procedure is not working? If not, the 
> reason is unlikely to be these warnings.
> 
> What the message is saying is that in a future release of sklearn 
> (scikit-learn), the calling code will have to use the validation module 
> differently. 
> 
> I don't know this part of RDKit, but the purport is presumably that whoever 
> is maintaining the part of RDKit that calls the validation module (directly 
> or indirectly) will have to make the change described in the warning messages 
> before RDKit starts using version 0.19 of sklearn. A quick google search 
> indicates that version 0.19 hasn't been released yet, so the current calling 
> protocol should work for now. But we have been forewarned
> 
> -P.
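To make the warning-versus-error distinction concrete, here is a minimal sketch using only the standard library. The function legacy_call is a hypothetical stand-in for a library routine that emits a DeprecationWarning; it is not sklearn code. The point is that the call still returns its result, and the warning is merely recorded.

```python
import warnings


def legacy_call(x):
    # Hypothetical stand-in for a library function with a deprecated calling style.
    warnings.warn("Passing 1d arrays as data is deprecated", DeprecationWarning)
    return x * 2


# Record warnings instead of printing them, so we can inspect what happened.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_call(21)

# The call completed normally; the warning did not interrupt it.
print(result, len(caught), caught[0].category.__name__)
```

If the messages are only noise, warnings.filterwarnings("ignore", category=DeprecationWarning) silences them, although fixing the call is the better long-term option.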
> 
> On Wed, Jan 18, 2017 at 2:26 PM, Chris Swain  > wrote:
> Hi,
> 
> I’ve been trying a few of the examples in 
> http://www.rdkit.org/docs/Cookbook.html 
> 
> 
> and I’d like to use the Similarity Maps as shown below
> 
> from rdkit.Chem.Draw import SimilarityMaps
> 
> # helper function
> def getProba(fp, predictionFunction):
>   return predictionFunction(fp)[0][1]
> 
> m5 = Chem.MolFromSmiles('c1ccccc1O')
> fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m5, 
> SimilarityMaps.GetMorganFingerprint, lambda x: getProba(x, rf.predict_proba))
> But I get this error.
> 
> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
> your data has a single feature or X.reshape(1, -1) if it contains a single 
> sample.
>   DeprecationWarning)
> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
> your data has a single feature or X.reshape(1, -1) if it contains a single 
> sample.
>   DeprecationWarning)
> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
> your data has a single feature or X.reshape(1, -1) if it contains a single 
> sample.
>   DeprecationWarning)
> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
> your data has a single feature or X.reshape(1, -1) if it contains a single 
> sample.
>   DeprecationWarning)
> 
> Cheers,
> 
> Chris
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot 
> 
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net 
> 
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss 

Re: [Rdkit-discuss] UpdatePropertyCache() after RunReactants

2017-01-18 Thread Greg Landrum
Coming back to an old one that was mostly answered already:

On Thu, Jan 12, 2017 at 3:11 AM, Curt Fischer 
wrote:

>
> I recently wanted to use RDKit to model the famous copper-catalyzed
> cycloaddition of alkynes and azides.
>
> I eventually got things working, kind of, but had two questions.  First, I
> was surprised to find that the products of RunReactants don't have updated
> property caches.  Is this something I should have expected, or is it a
> bug?  If the latter, is it an easy-to-fix bug or a hard-to-fix one?
>

This was covered already: you need to sanitize the outputs of RunReactants()


> Second, how can I modify my SMARTS reaction query to avoid duplication of
> each product?
>

You probably can't and still support the same reaction.
Here's your input:

copper_click_smarts = '[C:1]#[C:2].[N:3]=[N+:4]=[N-:5]>>[c:1]1[c:2][n-0:3][n-0:4][n-0:5]1'

The first reactant - '[C:1]#[C:2]' - can match any given acetylene group
two ways. This means you'll get two possible products for every acetylene.
In the specific example you give below, these actually yield different
products, but there are more symmetric cases where the products are
degenerate.
If you were willing to only allow terminal groups - *-C#[CH] - to react you
could use [C:1]#[CH1:2] as a SMARTS pattern. This would prevent you from
getting duplicates, but I suspect this isn't what you want.
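A toy model of why the symmetric pattern matches twice may help. This is not RDKit code: orientations and h_count are hypothetical names, and the "molecule" is just one triple bond between two atom indices with a hydrogen count per atom. It only illustrates that an unconstrained two-atom pattern can be laid onto a bond in either direction, while requiring one hydrogen on the second pattern atom leaves a single orientation.

```python
def orientations(bond, h_count, terminal_only=False):
    """Return the ways a two-atom pattern (map 1, map 2) can be placed on a bond."""
    a, b = bond
    found = []
    for first, second in ((a, b), (b, a)):
        # Mimic the [CH1:2] constraint: the atom matched to map 2 needs one H.
        if terminal_only and h_count[second] != 1:
            continue
        found.append((first, second))
    return found


# Atom 0 is the terminal CH; atom 1 carries the substituent and has no H.
h_count = {0: 1, 1: 0}

both_ways = orientations((0, 1), h_count)
terminal = orientations((0, 1), h_count, terminal_only=True)
print(both_ways)
print(terminal)
```

Each orientation pairs the azide with the alkyne differently, which is where the two products per acetylene come from.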

The standard approach to solving the degenerate products problem is to
uniquify them via SMILES:

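from itertools import chain
from rdkit import Chem

products = list(chain(*products_tuples))
for p in products:
    Chem.SanitizeMol(p)  # sanitize in place
uproducts = []
smis = set()
for p in products:
    # the canonical isomeric SMILES serves as the identity key
    smi = Chem.MolToSmiles(p, isomericSmiles=True)
    if smi in smis:
        continue
    smis.add(smi)
    uproducts.append(p)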

There's a bit of extra computation here due to the need to generate
canonical SMILES, but it shouldn't be too bad.
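The same pattern can be written as a small reusable helper. This is a sketch rather than anything from RDKit: unique_by_key and the toy canonical function are names made up for illustration, and sorting the characters of a string is only a stand-in for real canonicalization so that the example runs without RDKit installed.

```python
def unique_by_key(items, key):
    """Keep the first item for each distinct key, preserving input order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k in seen:
            continue
        seen.add(k)
        unique.append(item)
    return unique


def canonical(text):
    # Toy "canonical form": NOT real SMILES canonicalization, just a stand-in.
    return "".join(sorted(text))


products = ["CCO", "OCC", "CCO", "CCN"]
uproducts = unique_by_key(products, canonical)
print(uproducts)
```

With real molecules the key would be something like lambda m: Chem.MolToSmiles(m, isomericSmiles=True), applied after sanitizing each product.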

Best,
-greg





> Here's some example code, also available at
> https://github.com/tentrillion/ipython_notebooks/blob/master/rdkit_smarts_reactions_needs_updating.ipynb
>
> # ---BEGIN CODE-- #
> # import rdkit components
> from rdkit import rdBase
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit.Chem import Draw
>
> # use IPythonConsole for pretty drawings
> from rdkit.Chem.Draw import IPythonConsole
> # IPythonConsole.ipython_useSVG=True  # leave out for github
>
> # for flattening
> from itertools import chain
>
> # define reactants
> diyne_smiles = 'C#CCC(O)C#C'
> azide_smiles = 'CCCN=[N+]=[N-]'
>
> diyne = Chem.MolFromSmiles(diyne_smiles)
> azide = Chem.MolFromSmiles(azide_smiles)
>
> # define reaction
> copper_click_smarts = '[C:1]#[C:2].[N:3]=[N+:4]=[N-:5]>>[c:1]1[c:2][n-0:3][n-0:4][n-0:5]1'
> copper_click = AllChem.ReactionFromSmarts(copper_click_smarts)
>
> # run reaction
> products_tuples = copper_click.RunReactants((diyne, azide))
>
> # flatten product tuple of tuples into list
> products = list(chain(*products_tuples))
>
> # FAILS: mol property caches are not updated
> try:
>     Draw.MolsToGridImage(products)
> except (RuntimeError, ValueError) as e:
>     print('FAILED!')
>     my_error = e
>
> # this works: force updating
> for product in products:
>     product.UpdatePropertyCache()
>
> Draw.MolsToGridImage(products)
>
> my_error
>
> products_tuples = copper_click.RunReactants((diyne, azide))
> products = list(chain(*products_tuples))
> # FAILS: mol property caches are not updated
> Draw.MolsToGridImage(products)
>
> # ---END CODE-- #
>
> The stacktrace is:
>
> ---
> ValueError                                Traceback (most recent call last)
> in ()
>       2 products = list(chain(*products_tuples))
>       3 # FAILS: mol property caches are not updated
> ----> 4 Draw.MolsToGridImage(products)
>
> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/IPythonConsole.pyc
>  in ShowMols(mols, **kwargs)
>     198   else:
>     199     fn = Draw.MolsToGridImage
> --> 200   res = fn(mols, **kwargs)
>     201   if kwargs['useSVG']:
>     202     return SVG(res)
>
> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
>  in MolsToGridImage(mols, molsPerRow, subImgSize, legends, highlightAtomLists, useSVG, **kwargs)
>     403   else:
>     404     return _MolsToGridImage(mols, molsPerRow=molsPerRow, subImgSize=subImgSize, legends=legends,
> --> 405                             highlightAtomLists=highlightAtomLists, **kwargs)
>     406
>     407
>
> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
>  in _MolsToGridImage(mols, molsPerRow, subImgSize, legends, highlightAtomLists, **kwargs)
>     344       highlights = highlightAtomLists[i]
>     345     if mol is not None:
> --> 346       img = _moltoimg(mol, subImgSize, highlights, legends[i], **kwargs)
>     347       res.paste(img, (col * subImgSize[0], row * subImgSize[1]))
>     348   return res
> 

Re: [Rdkit-discuss] invisible methanol / water

2017-01-18 Thread Greg Landrum
That's a bug in the new drawing code with molecules that have no extent in
either the X or Y directions. Here's the github item:
https://github.com/rdkit/rdkit/issues/1271
I'm surprised that the non-SVG version is working for you.
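For readers wondering why zero extent matters, here is a toy illustration. It is not taken from the RDKit drawing code, and the function names, the coordinates, and the fallback value are all assumptions: a typical way to fit a drawing into a canvas divides the canvas size by the coordinate range, and that range is zero for a single atom (water) or for atoms lying on one horizontal line (methanol, with the coordinates assumed here).

```python
def naive_scale(coords, width, height):
    """Scale factor that fits coords into the canvas; breaks on zero extent."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(width / (max(xs) - min(xs)), height / (max(ys) - min(ys)))


def guarded_scale(coords, width, height, fallback=1.0):
    """Same idea, but a zero extent is replaced by a fallback range."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_range = (max(xs) - min(xs)) or fallback
    y_range = (max(ys) - min(ys)) or fallback
    return min(width / x_range, height / y_range)


water = [(0.0, 0.0)]                 # one heavy atom: no extent in X or Y
methanol = [(0.0, 0.0), (1.5, 0.0)]  # assumed coordinates: no extent in Y

try:
    naive_scale(water, 200, 200)
    naive_failed = False
except ZeroDivisionError:
    naive_failed = True

print(naive_failed, guarded_scale(water, 200, 200), guarded_scale(methanol, 200, 200))
```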

On Wed, Jan 18, 2017 at 11:19 PM, Curt Fischer  wrote:

> Hi all,
>
> I'm not sure if this topic has come up before, but SVG rendering in
> Jupyter notebooks seems to make small molecules like methanol and water
> completely invisible.  The same molecules render successfully for me using
> non-SVG renderers.
>
> Can others reproduce this behavior?  Are there any easy-ish fixes for this?
>
> Curt
>
>
> # import rdkit components
> from rdkit import Chem
> from rdkit.Chem import Draw
> from rdkit.Chem.Draw import IPythonConsole
>
> # two very small molecules
> methanol = Chem.MolFromSmiles('CO')
> water = Chem.MolFromSmiles('O')
>
> # SVG rendering seems to make small molecules like water and methanol
> # completely invisible
> IPythonConsole.ipython_useSVG=True
> Draw.MolsToGridImage([water, methanol])
>
> # With SVG off the molecules are visible
> IPythonConsole.ipython_useSVG=False
> Draw.MolsToGridImage([water, methanol])
>


Re: [Rdkit-discuss] RDKit Cookbook

2017-01-18 Thread Greg Landrum
Peter is correct, those are just warnings. But they are irritating warnings.
You should be able to get rid of them by changing the call to:

fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m5,
SimilarityMaps.GetMorganFingerprint, lambda x: getProba((x,),
rf.predict_proba))

This passes getProba() a tuple with the fingerprint in question.
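A small sketch of what the extra tuple does, written without sklearn so it runs anywhere. toy_predict_proba is a hypothetical stand-in for rf.predict_proba that insists on a batch of samples (a 2-D input) and returns fixed probabilities; the real method's checks and outputs will differ.

```python
def toy_predict_proba(batch):
    # Hypothetical stand-in: every element of the batch must itself be a sample.
    for row in batch:
        if not hasattr(row, "__len__"):
            raise ValueError("expected a batch of samples, got a single 1-D sample")
    return [[0.25, 0.75] for _ in batch]


def getProba(fp, predictionFunction):
    # Same helper as in the Cookbook example: probability of class 1, first sample.
    return predictionFunction(fp)[0][1]


fp = [0, 1, 1, 0]  # toy fingerprint bits

# Wrapping the fingerprint in a tuple turns it into a batch with one sample.
proba = getProba((fp,), toy_predict_proba)

try:
    getProba(fp, toy_predict_proba)
    rejected = False
except ValueError:
    rejected = True

print(proba, rejected)
```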

-greg


On Thu, Jan 19, 2017 at 12:12 AM, Peter S. Shenkin 
wrote:

> These are not errors. They are warnings, as stated in the messages.
>
> Do you have any evidence that the procedure is not working? If not, the
> reason is unlikely to be these warnings.
>
> What the message is saying is that in a future release of sklearn
> (scikit-learn), the calling code will have to use the validation module
> differently.
>
> I don't know this part of RDKit, but the purport is presumably that
> whoever is maintaining the part of RDKit that calls the validation module
> (directly or indirectly) will have to make the change described in the
> warning messages before RDKit starts using version 0.19 of sklearn. A quick
> google search indicates that version 0.19 hasn't been released yet, so the
> current calling protocol should work for now. But we have been
> forewarned
>
> -P.
>
> On Wed, Jan 18, 2017 at 2:26 PM, Chris Swain  wrote:
>
>> Hi,
>>
>> I’ve been trying a few of the examples in
>> http://www.rdkit.org/docs/Cookbook.html
>>
>> and I’d like to use the Similarity Maps as shown below
>>
>> from rdkit.Chem.Draw import SimilarityMaps
>>
>> # helper function
>> def getProba(fp, predictionFunction):
>>   return predictionFunction(fp)[0][1]
>>
>> m5 = Chem.MolFromSmiles('c1ccccc1O')
>> fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m5,
>> SimilarityMaps.GetMorganFingerprint, lambda x: getProba(x, rf.predict_proba))
>>
>> But I get this error.
>>
>> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
>> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
>> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
>> your data has a single feature or X.reshape(1, -1) if it contains a single 
>> sample.
>>   DeprecationWarning)
>> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
>> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
>> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
>> your data has a single feature or X.reshape(1, -1) if it contains a single 
>> sample.
>>   DeprecationWarning)
>> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
>> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
>> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
>> your data has a single feature or X.reshape(1, -1) if it contains a single 
>> sample.
>>   DeprecationWarning)
>> /usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
>> DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
>> raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
>> your data has a single feature or X.reshape(1, -1) if it contains a single 
>> sample.
>>   DeprecationWarning)
>>
>>
>> Cheers,
>>
>> Chris
>>


Re: [Rdkit-discuss] RDKit "cannot create mol from SMILE" error

2017-01-18 Thread Peter S. Shenkin
In addition to Brian's observation, there is also a "C1" early in the
SMILES, but no corresponding X1 to make a ring bond before or after it.

It appears that you might be reading the second half of a SMILES for some
reason. My guess is that the (C=C1) is associated with a preceding atom
that was not read.
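Both symptoms can be caught with a quick text-level check before the string ever reaches the database. The sketch below is not a SMILES parser and the function name smiles_red_flags is made up; it only looks for a leading branch, unbalanced parentheses, and ring-closure labels that are opened but never closed, skipping anything inside square brackets.

```python
def smiles_red_flags(smiles):
    """Return a list of cheap structural complaints about a SMILES string."""
    flags = []
    if smiles.startswith("("):
        flags.append("starts with a branch")
    depth = 0
    open_rings = set()
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            in_bracket = ch != "]"      # digits inside [...] are not ring labels
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                flags.append("unmatched ')'")
                depth = 0
        elif ch == "%":
            open_rings ^= {smiles[i + 1:i + 3]}  # two-digit ring label
            i += 2
        elif ch.isdigit():
            open_rings ^= {ch}          # toggle: first sight opens, second closes
        i += 1
    if depth > 0:
        flags.append("unclosed '('")
    if open_rings:
        flags.append("unpaired ring closure: " + ", ".join(sorted(open_rings)))
    return flags


print(smiles_red_flags("(C=C1)[N+]([O-])=O"))
print(smiles_red_flags("c1ccccc1O"))
```

A string that passes this check can still be invalid; Chem.MolFromSmiles() returning None remains the authoritative test on the Python side.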

-P

On Wed, Jan 18, 2017 at 6:32 PM, Brian Kelley  wrote:

> That doesn't look like a valid SMILES to me; I don't think a
> SMILES string can start with a parenthesis (branch).
>
> 
> Brian Kelley
>
> On Jan 18, 2017, at 6:18 PM, Larson Danes  wrote:
>
> Hi all,
>
> I'm using the following query in postgresql (with the rdkit extension
> installed):
>
> "select casrn from mols where m @> CAST(? AS mol)"
>
>
> This returns "ERROR: could not create molecule from SMILES '...' " on 
> occasion. One such SMILES that causes this error regularly is 
> '(C=C1)[N+]([O-])=O'. I'm curious if there's documentation on this specific 
> error message anywhere. I've looked and haven't had luck finding any.
>
> Any information about this error message is much appreciated.
>
>
> Thanks,
>
>
> Larson
>


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Milinda Samaraweera
Yes, most common should be the correct term.

Thanks,
Milinda

On Wed, Jan 18, 2017 at 5:49 PM, Peter S. Shenkin  wrote:

> You say "most stable", but I think you mean "most common." 2H is as stable
> as 1H, but less common.
>
> -P.
>
> On Wed, Jan 18, 2017 at 5:01 PM, Milinda Samaraweera <
> milindaatw...@gmail.com> wrote:
>
>> Hi Bob,
>>
>> I am trying to filter out any compound that does not have the most stable
>> isotopic form;  (anything other than: 12C,1H,14N,16O, 31P, 32S) or to
>> contain only MonoIsotopic compounds.
>>
>> Thanks,
>> Milinda
>> ​
>>


-- 
Milinda Samaraweera, Ph.D.
Postdoctoral Fellow, Department of Pharmacy
University of Connecticut
69 North Eagleville road
Storrs, CT, 06269
milindaatw...@gmail.com
860-617-8046


Re: [Rdkit-discuss] RDKit "cannot create mol from SMILE" error

2017-01-18 Thread Brian Kelley
That doesn't look like a valid SMILES to me; I don't think a SMILES 
string can start with a parenthesis (branch).


Brian Kelley

> On Jan 18, 2017, at 6:18 PM, Larson Danes  wrote:
> 
> Hi all,
> 
> I'm using the following query in postgresql (with the rdkit extension 
> installed):
> 
> "select casrn from mols where m @> CAST(? AS mol)"
> 
> This returns "ERROR: could not create molecule from SMILES '...' " on 
> occasion. One such SMILES that causes this error regularly is 
> '(C=C1)[N+]([O-])=O'. I'm curious if there's documentation on this specific 
> error message anywhere. I've looked and haven't had luck finding any.
> Any information about this error message is much appreciated.
> 
> Thanks,
> 
> Larson


[Rdkit-discuss] RDKit "cannot create mol from SMILE" error

2017-01-18 Thread Larson Danes
Hi all,

I'm using the following query in postgresql (with the rdkit extension
installed):

"select casrn from mols where m @> CAST(? AS mol)"


This returns "ERROR: could not create molecule from SMILES '...' " on
occasion. One such SMILES that causes this error regularly is
'(C=C1)[N+]([O-])=O'. I'm curious if there's documentation on this
specific error message anywhere. I've looked and haven't had luck
finding any.

Any information about this error message is much appreciated.


Thanks,


Larson


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Peter S. Shenkin
You say "most stable", but I think you mean "most common." 2H is as stable
as 1H, but less common.

-P.

On Wed, Jan 18, 2017 at 5:01 PM, Milinda Samaraweera <
milindaatw...@gmail.com> wrote:

> Hi Bob,
>
> I am trying to filter out any compound that does not have the most stable
> isotopic form;  (anything other than: 12C,1H,14N,16O, 31P, 32S) or to
> contain only MonoIsotopic compounds.
>
> Thanks,
> Milinda
> ​
>


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Milinda Samaraweera
Hi Bob,

I am trying to filter out any compound that does not have the most stable
isotopic form;  (anything other than: 12C,1H,14N,16O, 31P, 32S) or to
contain only MonoIsotopic compounds.

Thanks,
Milinda
​


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Bob Funchess
Hi Milinda,



As an aside, most of the isotopes you listed are stable.  The only nuclides
in the list that are actually unstable are 14C, 3H, 24P and 46P.



If your goal is to exclude isotopically enriched structures rather than
radioactive ones, it might be better to just look for ANY isotopic
specification, rather than for a specific list.
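As a rough sketch of the "any isotopic specification" idea at the text level: in SMILES an isotope is written as a number directly after the opening square bracket, so a regular expression can flag it. The name has_isotope_label is made up for this example, and the check works on SMILES strings only, not on SD records. Inside RDKit the equivalent test would be whether atom.GetIsotope() is non-zero for any atom.

```python
import re

# An isotope label is a run of digits right after '[' and before the element symbol.
_ISOTOPE = re.compile(r"\[\d+[A-Za-z]")


def has_isotope_label(smiles):
    """True if any bracket atom in the SMILES carries an explicit isotope."""
    return _ISOTOPE.search(smiles) is not None


for smi in ("CC[13CH3]", "[12CH3]CC", "CCO", "[NH4+]"):
    print(smi, has_isotope_label(smi))
```

Note that this also flags explicitly labelled common isotopes such as [12CH3], which is the intended behaviour when the goal is to keep only entries with no isotopic specification at all.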



Kind Regards,

Bob



--

Bob Funchess, Ph.D.
Kelaroo, Inc
Director of Software Support & Development
www.kelaroo.com
bfunch...@kelaroo.com
(858) 259-7561 x3







*From:* Milinda Samaraweera [mailto:milindaatw...@gmail.com]
*Sent:* Wednesday, January 18, 2017 11:48 AM
*To:* Greg Landrum 
*Cc:* RDKit Discuss 
*Subject:* Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit



Greg,

I am looking to remove entries that contain un-stable isotopes of elements
CHNOPS (e.g. heavy_isotopes =['13C', '14C', '2H', '3H', '15N', '24P',
'46P', '33S', '34S', '36S'] ). Is there a way to modify the above code to
achieve that?

Thanks,

Milinda





On Wed, Jan 18, 2017 at 11:16 AM, Greg Landrum 
wrote:

Hi Milinda,



Here's an approach that finds all the atoms that have an isotope specified:



In [1]: from rdkit import Chem



In [2]: from rdkit.Chem import rdqueries



In [3]: q = rdqueries.IsotopeGreaterQueryAtom(1)



In [7]: list(x.GetIdx() for x in
Chem.MolFromSmiles('CC[13CH3]').GetAtomsMatchingQuery(q))

Out[7]: [2]



In [8]: list(x.GetIdx() for x in
Chem.MolFromSmiles('[12CH3]CC[13CH3]').GetAtomsMatchingQuery(q))

Out[8]: [0, 3]



Does that do what you want it to do?



-greg







On Wed, Jan 18, 2017 at 3:56 PM, Milinda Samaraweera <
milindaatw...@gmail.com> wrote:

Dear Experts,

I am trying to figure out a way to exclude entries which contain heavy
atoms (13C, 2H, 3H, etc), from a SD file (which has close to two thousand
entries) and write an updated file with the remaining entries.

I do understand how to read/write SD files using rdkit.

What I do not understand is how to detect entries with heavy isotopes: Is there
an efficient and correct way of achieving this using rdkit?



thanks,

-- 

Milinda Samaraweera









-- 

Milinda Samaraweera, Ph.D.

Postdoctoral Fellow, Department of Pharmacy

University of Connecticut

69 North Eagleville road

Storrs, CT, 06269

milindaatw...@gmail.com
860-617-8046


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Stiefl, Nikolaus
Hi
Maybe this is much less efficient but I guess if you need it for specific 
isotopes then you could try using a smarts pattern and check for that?

In [20]: q = Chem.MolFromSmarts("[13C,14C,2H,3H,15N,24P,46P,33S,34S,36S]")

In [21]: m = Chem.MolFromSmiles('CC[15NH2]')

In [22]: m.HasSubstructMatch(q)
Out[22]: True


So you could loop over your molecules and then remove the ones that match the 
smarts.
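The loop-and-remove step might look like the following sketch. To keep it runnable without RDKit it works on SMILES strings and pulls the isotope labels out with a regular expression; isotope_labels, keep, and the entries list are hypothetical names for illustration. With RDKit molecules the test inside keep would instead be not m.HasSubstructMatch(q).

```python
import re

# Captures the isotope number and element symbol at the start of a bracket atom.
ISOTOPE_RE = re.compile(r"\[(\d+)([A-Za-z][a-z]?)")

EXCLUDED = {"13C", "14C", "2H", "3H", "15N", "24P", "46P", "33S", "34S", "36S"}


def isotope_labels(smiles):
    """Return labels such as '13C' for every isotope-labelled bracket atom."""
    return {mass + symbol.capitalize() for mass, symbol in ISOTOPE_RE.findall(smiles)}


def keep(smiles, excluded=EXCLUDED):
    """True if the SMILES contains none of the excluded isotopes."""
    return not (isotope_labels(smiles) & excluded)


entries = ["CC[15NH2]", "CCO", "[12CH3]CC", "[2H]OC"]
kept = [smi for smi in entries if keep(smi)]
print(kept)
```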
Ciao
Nik


From: Milinda Samaraweera 
Date: Wednesday 18 January 2017 at 20:47
To: Greg Landrum 
Cc: RDKit Discuss 
Subject: Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

Greg,
I am looking to remove entries that contain un-stable isotopes of elements 
CHNOPS (e.g. heavy_isotopes =['13C', '14C', '2H', '3H', '15N', '24P', '46P', 
'33S', '34S', '36S'] ). Is there a way to modify the above code to achieve that?
Thanks,
Milinda


On Wed, Jan 18, 2017 at 11:16 AM, Greg Landrum 
> wrote:
Hi Milinda,

Here's an approach that finds all the atoms that have an isotope specified:

In [1]: from rdkit import Chem

In [2]: from rdkit.Chem import rdqueries

In [3]: q = rdqueries.IsotopeGreaterQueryAtom(1)

In [7]: list(x.GetIdx() for x in 
Chem.MolFromSmiles('CC[13CH3]').GetAtomsMatchingQuery(q))
Out[7]: [2]

In [8]: list(x.GetIdx() for x in 
Chem.MolFromSmiles('[12CH3]CC[13CH3]').GetAtomsMatchingQuery(q))
Out[8]: [0, 3]

Does that do what you want it to do?

-greg



On Wed, Jan 18, 2017 at 3:56 PM, Milinda Samaraweera 
> wrote:
Dear Experts,
I am trying to figure out a way to exclude entries which contain heavy atoms 
(13C, 2H, 3H, etc), from a SD file (which has close to two thousand entries) 
and write an updated file with the remaining entries.

I do understand how to read/write SD files using rdkit.

What I do not understand is how to detect entries with heavy isotopes: Is there an 
efficient and correct way of achieving this using rdkit?

thanks,
--
Milinda Samaraweera





--
Milinda Samaraweera, Ph.D.
Postdoctoral Fellow, Department of Pharmacy
University of Connecticut
69 North Eagleville road
Storrs, CT, 06269
milindaatw...@gmail.com
860-617-8046


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Milinda Samaraweera
Nik,

That works too...

Thanks
Milinda

On Wed, Jan 18, 2017 at 3:08 PM, Stiefl, Nikolaus <
nikolaus.sti...@novartis.com> wrote:

> Hi
>
> Maybe this is much less efficient but I guess if you need it for specific
> isotopes then you could try using a smarts pattern and check for that?
>
>
>
> In [20]: q = Chem.MolFromSmarts("[13C,14C,2H,3H,15N,24P,46P,33S,34S,36S]")
>
> In [21]: m = Chem.MolFromSmiles('CC[15NH2]')
>
> In [22]: m.HasSubstructMatch(q)
> Out[22]: True
>
>
>
>
>
> So you could loop over your molecules and then remove the ones that match
> the smarts.
>
> Ciao
>
> Nik
>
>
>
>
>
> *From: *Milinda Samaraweera 
> *Date: *Wednesday 18 January 2017 at 20:47
> *To: *Greg Landrum 
> *Cc: *RDKit Discuss 
> *Subject: *Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit
>
>
>
> Greg,
>
> I am looking to remove entries that contain un-stable isotopes of elements
> CHNOPS (e.g. heavy_isotopes =['13C', '14C', '2H', '3H', '15N', '24P',
> '46P', '33S', '34S', '36S'] ). Is there a way to modify the above code to
> achieve that?
>
> Thanks,
>
> Milinda
>
>
>
>
>
> On Wed, Jan 18, 2017 at 11:16 AM, Greg Landrum 
> wrote:
>
> Hi Milinda,
>
>
>
> Here's an approach that finds all the atoms that have an isotope specified:
>
>
>
> In [1]: from rdkit import Chem
>
>
>
> In [2]: from rdkit.Chem import rdqueries
>
>
>
> In [3]: q = rdqueries.IsotopeGreaterQueryAtom(1)
>
>
>
> In [7]: list(x.GetIdx() for x in Chem.MolFromSmiles('CC[13CH3]'
> ).GetAtomsMatchingQuery(q))
>
> Out[7]: [2]
>
>
>
> In [8]: list(x.GetIdx() for x in Chem.MolFromSmiles('[12CH3]CC[13CH3]').
> GetAtomsMatchingQuery(q))
>
> Out[8]: [0, 3]
>
>
>
> Does that do what you want it to do?
>
>
>
> -greg
>
>
>
>
>
>
>
> On Wed, Jan 18, 2017 at 3:56 PM, Milinda Samaraweera <
> milindaatw...@gmail.com> wrote:
>
> Dear Experts,
>
> I am trying to figure out a way to exclude entries which contain heavy
> atoms (13C, 2H, 3H, etc), from a SD file (which has close to two thousand
> entries) and write an updated file with the remaining entries.
>
> I do understand how to read/write SD files using rdkit.
>
> What I do not understand is how to detect entries with heavy isotopes: Is
> there an efficient and correct way of achieving this using rdkit?
>
>
>
> thanks,
>
> --
>
> Milinda Samaraweera
>
>
>
>
>
>
>
>
>
> --
>
> Milinda Samaraweera, Ph.D.
>
> Postdoctoral Fellow, Department of Pharmacy
>
> University of Connecticut
>
> 69 North Eagleville road
>
> Storrs, CT, 06269
>
> milindaatw...@gmail.com
> 860-617-8046 <(860)%20617-8046>
>



-- 
Milinda Samaraweera, Ph.D.
Postdoctoral Fellow, Department of Pharmacy
University of Connecticut
69 North Eagleville road
Storrs, CT, 06269
milindaatw...@gmail.com
860-617-8046


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Milinda Samaraweera
Greg,

I am looking to remove entries that contain un-stable isotopes of elements
CHNOPS (e.g. heavy_isotopes =['13C', '14C', '2H', '3H', '15N', '24P',
'46P', '33S', '34S', '36S'] ). Is there a way to modify the above code to
achieve that?

Thanks,
Milinda



On Wed, Jan 18, 2017 at 11:16 AM, Greg Landrum 
wrote:

> Hi Milinda,
>
> Here's an approach that finds all the atoms that have an isotope specified:
>
> In [1]: from rdkit import Chem
>
> In [2]: from rdkit.Chem import rdqueries
>
> In [3]: q = rdqueries.IsotopeGreaterQueryAtom(1)
>
> In [7]: list(x.GetIdx() for x in Chem.MolFromSmiles('CC[13CH3]'
> ).GetAtomsMatchingQuery(q))
> Out[7]: [2]
>
> In [8]: list(x.GetIdx() for x in Chem.MolFromSmiles('[12CH3]CC[13CH3]').
> GetAtomsMatchingQuery(q))
> Out[8]: [0, 3]
>
> Does that do what you want it to do?
>
> -greg
>
>
>
> On Wed, Jan 18, 2017 at 3:56 PM, Milinda Samaraweera <
> milindaatw...@gmail.com> wrote:
>
>> Dear Experts,
>>
>> I am trying to figure out a way to exclude entries which contain heavy
>> isotopes (13C, 2H, 3H, etc.) from an SD file (which has close to two thousand
>> entries) and write an updated file with the remaining entries.
>>
>> I do understand how to read/write SD files using rdkit.
>>
>> What I don't understand is how to detect entries with heavy isotopes: Is
>> there an efficient and correct way of achieving this using rdkit?
>>
>> thanks,
>> --
>> Milinda Samaraweera
>>
>> 
>>
>>
>


-- 
Milinda Samaraweera, Ph.D.
Postdoctoral Fellow, Department of Pharmacy
University of Connecticut
69 North Eagleville road
Storrs, CT, 06269
milindaatw...@gmail.com
860-617-8046


[Rdkit-discuss] RDKit Cookbook

2017-01-18 Thread Chris Swain
Hi,

I’ve been trying a few of the examples in 
http://www.rdkit.org/docs/Cookbook.html 


and I’d like to use the Similarity Maps as shown below

from rdkit import Chem
from rdkit.Chem.Draw import SimilarityMaps

# helper function: probability of class 1 for the first sample
def getProba(fp, predictionFunction):
  return predictionFunction(fp)[0][1]

# rf: a previously trained scikit-learn classifier (not shown here)
m5 = Chem.MolFromSmiles('c1ccccc1O')
fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m5,
    SimilarityMaps.GetMorganFingerprint, lambda x: getProba(x, rf.predict_proba))

But I get this error.

/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
your data has a single feature or X.reshape(1, -1) if it contains a single 
sample.
  DeprecationWarning)
/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
your data has a single feature or X.reshape(1, -1) if it contains a single 
sample.
  DeprecationWarning)
/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
your data has a single feature or X.reshape(1, -1) if it contains a single 
sample.
  DeprecationWarning)
/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: 
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will 
raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if 
your data has a single feature or X.reshape(1, -1) if it contains a single 
sample.
  DeprecationWarning)

Cheers,

Chris
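
For illustration only, here is a small sketch of what the message is
complaining about. The warning says that a single 1-D array was passed where
a 2-D input (one row per sample) is expected. The function
fake_predict_proba below is a made-up stand-in, not part of scikit-learn or
RDKit; it only mimics the "sequence of samples in, one [p0, p1] row per
sample out" shape so the example runs without a trained model. Wrapping the
single fingerprint in a one-element tuple gives the helper the 2-D shape it
needs.

```python
def fake_predict_proba(samples):
    """Hypothetical stand-in for a classifier's predict_proba (2-D input)."""
    rows = []
    for sample in samples:
        p = sum(sample) / float(len(sample))  # toy score: fraction of bits set
        rows.append([1.0 - p, p])
    return rows


def getProba(fp, predictionFunction):
    # probability of class 1 for the first (and only) sample
    return predictionFunction(fp)[0][1]


fp = [1, 0, 1, 1]  # toy "fingerprint"

# Wrap the single fingerprint in a tuple: one sample, many features.
p_active = getProba((fp,), fake_predict_proba)

# Passing the bare fingerprint treats each bit as if it were a whole sample.
try:
    getProba(fp, fake_predict_proba)
    bare_call_failed = False
except TypeError:
    bare_call_failed = True
```

With the real classifier the same idea would read
lambda x: getProba((x,), rf.predict_proba).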


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Greg Landrum
Hi Milinda,

Here's an approach that finds all the atoms that have an isotope specified:

In [1]: from rdkit import Chem

In [2]: from rdkit.Chem import rdqueries

In [3]: q = rdqueries.IsotopeGreaterQueryAtom(1)

In [7]: list(x.GetIdx() for x in Chem.MolFromSmiles('CC[13CH3]').GetAtomsMatchingQuery(q))
Out[7]: [2]

In [8]: list(x.GetIdx() for x in Chem.MolFromSmiles('[12CH3]CC[13CH3]').GetAtomsMatchingQuery(q))
Out[8]: [0, 3]

Does that do what you want it to do?

-greg



On Wed, Jan 18, 2017 at 3:56 PM, Milinda Samaraweera <
milindaatw...@gmail.com> wrote:

> Dear Experts,
>
> I am trying to figure out a way to exclude entries which contain heavy
> isotopes (13C, 2H, 3H, etc.) from an SD file (which has close to two thousand
> entries) and write an updated file with the remaining entries.
>
> I do understand how to read/write SD files using rdkit.
>
> What I don't understand is how to detect entries with heavy isotopes: Is
> there an efficient and correct way of achieving this using rdkit?
>
> thanks,
> --
> Milinda Samaraweera
>
> 
>
>


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Peter S. Shenkin
How about a regex filter on the all-atom SMILES?

-P.
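
A rough sketch of what such a regex filter could look like follows. It is an
assumption about how the suggestion might be implemented, not something
posted in the thread: the pattern and the function name isotope_labels are
invented for this example. It relies on the SMILES convention that an isotope
is written as a mass number directly after the opening bracket of an atom
(for example [13CH3] or [2H]), so the SMILES has to be written with isotope
information for this to find anything.

```python
import re

# '[' followed by a mass number and an element symbol, e.g. [13CH3] or [2H].
# Atom-map numbers such as [CH3:1] do not match because the digits must
# come straight after the opening bracket.
ISOTOPE_ATOM = re.compile(r'\[(\d+)([A-Z][a-z]?|[a-z]{1,2})')


def isotope_labels(smiles):
    """Return (mass number, element symbol) for every isotope-labelled atom."""
    return [(int(mass), symbol.capitalize())
            for mass, symbol in ISOTOPE_ATOM.findall(smiles)]
```

An entry could then be dropped when isotope_labels(smiles) is non-empty, or
when any returned pair is on a list of unwanted isotopes.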

On Wed, Jan 18, 2017 at 9:56 AM, Milinda Samaraweera <
milindaatw...@gmail.com> wrote:

> Dear Experts,
>
> I am trying to figure out a way to exclude entries which contain heavy
> isotopes (13C, 2H, 3H, etc.) from an SD file (which has close to two thousand
> entries) and write an updated file with the remaining entries.
>
> I do understand how to read/write SD files using rdkit.
>
> What I don't understand is how to detect entries with heavy isotopes: Is
> there an efficient and correct way of achieving this using rdkit?
>
> thanks,
> --
> Milinda Samaraweera
>
> 
>
>


[Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Milinda Samaraweera
Dear Experts,

I am trying to figure out a way to exclude entries which contain heavy
isotopes (13C, 2H, 3H, etc.) from an SD file (which has close to two thousand
entries) and write an updated file with the remaining entries.

I do understand how to read/write SD files using rdkit.

What I don't understand is how to detect entries with heavy isotopes: Is there
an efficient and correct way of achieving this using rdkit?

thanks,
-- 
Milinda Samaraweera


Re: [Rdkit-discuss] Kekulizing thiazoles

2017-01-18 Thread Rafal Roszak
On Tue, 17 Jan 2017 16:52:36 +
Chris Arthur wrote:

> ValueError: Sanitization error: Can't kekulize mol

In most cases I get the 'Can't kekulize mol' error for heterocycles that have a
hydrogen on nitrogen when the SMILES does not give that hydrogen explicitly.
For example:

>>> Chem.MolFromSmiles('c1ccnc1')
[10:23:21] Can't kekulize mol 
>>> Chem.MolFromSmiles('c1cc[nH]c1')



> I can generate a smiles string from it (I had thought of doing a smiles to
> molecule conversion)

So if this is the issue, you can convert your Mol object to SMILES, add the
missing H, and build a new Mol from the corrected SMILES.

Regards,

Rafał
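
A minimal sketch of the "add the missing H" step is given below, under the
assumption that the only problem is a bare aromatic 'n' that should have been
written '[nH]'. The helper names nh_candidates and first_kekulizable are
invented for this example. The first function is plain string handling; the
second relies on Chem.MolFromSmiles returning None when a SMILES cannot be
sanitized. Which nitrogen should carry the hydrogen is a chemistry decision,
so the candidates are worth checking by eye rather than accepting blindly.

```python
def nh_candidates(smiles):
    """Return SMILES variants with one bare aromatic 'n' rewritten as '[nH]'."""
    variants = []
    in_bracket = False
    for i, ch in enumerate(smiles):
        if ch == '[':
            in_bracket = True
        elif ch == ']':
            in_bracket = False
        elif ch == 'n' and not in_bracket:
            # outside brackets a lowercase 'n' can only be aromatic nitrogen
            variants.append(smiles[:i] + '[nH]' + smiles[i + 1:])
    return variants


def first_kekulizable(smiles):
    """Try the input and each candidate with RDKit; return the first that parses."""
    from rdkit import Chem  # imported here so nh_candidates() runs without RDKit

    for candidate in [smiles] + nh_candidates(smiles):
        if Chem.MolFromSmiles(candidate) is not None:
            return candidate
    return None
```

For the example above, nh_candidates('c1ccnc1') gives the single candidate
'c1cc[nH]c1'.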
