Re: [Rdkit-discuss] Fast similarity search

2017-05-18 Thread Nils Weskamp
Hi Tim,

According to https://www.knime.org/files/01_greg_landrum.pdf, the
PostgreSQL cartridge can compare ~1 million compounds/sec on a single
CPU (and this talk is from 2011). chemfp is much faster if you pre-load
all your fingerprints into main memory.
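
To make "comparing compounds" against pre-loaded fingerprints concrete, here is a minimal pure-Python sketch of a brute-force Tanimoto scan. It is only an illustration: fingerprints are modelled as plain Python ints, the names tanimoto and search are hypothetical, and this is neither the cartridge's nor chemfp's API or implementation.

```python
# Illustrative only: fingerprints are plain Python ints used as bit sets.
# This is not the RDKit cartridge or the chemfp API.

def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit sets stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def search(query: int, fingerprints: list, threshold: float = 0.7) -> list:
    """Return (index, similarity) for every fingerprint at or above threshold, best first."""
    hits = []
    for idx, fp in enumerate(fingerprints):
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    hits.sort(key=lambda hit: -hit[1])
    return hits


# Toy in-memory "database" of 8-bit fingerprints.
db = [0b10110010, 0b10110011, 0b01001100, 0b10100010]
print(search(0b10110010, db, threshold=0.5))
```

A linear scan like this touches every fingerprint once per query, which is why keeping the whole set in main memory matters for throughput.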

Hope this helps,
Nils

On 18.05.2017 at 23:15, Tim Dudgeon wrote:
> I think I recall Greg mentioning that RDKit can be used for very fast 
> similarity search (e.g. all vs. all comparisons or searches against 
> multi-million sized datasets).
> If so, is this part of the standard distro, or something extra 
> (chemfp?).
> And can it run inside the cartridge?
> And any benchmarks?
> 
> Thanks
> Tim
> 

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MaxMinPicker Bug

2017-05-18 Thread Greg Landrum
Hi Steve,

That is indeed a bug. Thanks for the detailed report!

Here's a very small reproducible that demonstrates it:

def pick2(n=1000,m=10,seed=2748):
    def func(i, j):
        assert(i [...]

https://github.com/rdkit/rdkit/issues/1421

-greg





On Thu, May 18, 2017 at 8:59 PM, Steven Wilkens wrote:

> I've been using MaxMinPicker() to run a series of simulations where I
> select several small subsets of molecules from a larger set and I've come
> across some odd behavior. In summary, this is my algorithm:
>
> 1. select a small subset using MaxMinPicker.Pick()
> 2. remove that subset from the input set
> 3. repeat until the desired number of subsets is reached
> 4. store subsets, and restart the process to generate a new set of subsets
>
> The process seems to work fine for a few simulations. However, eventually
> and randomly MaxMinPicker.Pick() returns an index that is 1 position above
> the end of the input array. After debugging the behavior, I added error
> checking to detect this situation. This fix works fine in Linux. However,
> my fix does not work in Windows. The error condition is detected, but
> Python still crashes.
>
> The most obvious source of the bug is that I'm making an error when I
> construct the input matrix. However, I've gone over my code several times
> and I'm quite sure I'm doing it right. Also, successful simulations produce
> subsets that are diverse by the desired metric. Unfortunately, the random
> nature of the bug makes it difficult to pinpoint the root cause. My current
> hunch is that MaxMinPicker has some static variables that are hanging
> around from one run to the next. If that is the case, one would only
> encounter the bug if one were to repeatedly call the Pick() method within a
> single script like I am doing (maybe that is why no one has encountered
> this bug yet?)
>
> Any help would be most appreciated. Thanks!
> Regards,
> Steve
>
> 


[Rdkit-discuss] Fast similarity search

2017-05-18 Thread Tim Dudgeon
I think I recall Greg mentioning that RDKit can be used for very fast 
similarity search (e.g. all vs. all comparisons or searches against 
multi-million sized datasets).
If so, is this part of the standard distro, or something extra 
(chemfp?).
And can it run inside the cartridge?
And any benchmarks?

Thanks
Tim




[Rdkit-discuss] MaxMinPicker Bug

2017-05-18 Thread Steven Wilkens
I've been using MaxMinPicker() to run a series of simulations where I
select several small subsets of molecules from a larger set and I've come
across some odd behavior. In summary, this is my algorithm:

1. select a small subset using MaxMinPicker.Pick()
2. remove that subset from the input set
3. repeat until the desired number of subsets is reached
4. store subsets, and restart the process to generate a new set of subsets
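
The steps above can be sketched roughly as follows. This is a pure-Python stand-in under stated assumptions, not the code from the report and not RDKit's MaxMinPicker: maxmin_pick and pick_subsets are hypothetical names, the "molecules" are plain numbers, and the distance is their absolute difference. The assertion shows where a bounds check on the returned indices would sit.

```python
import random


def maxmin_pick(dist, pool_size, pick_size, seed=None):
    """Greedy MaxMin selection over indices 0..pool_size-1 (toy stand-in)."""
    if not 0 < pick_size <= pool_size:
        raise ValueError("pick_size must be between 1 and pool_size")
    rng = random.Random(seed)
    picks = [rng.randrange(pool_size)]
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = [dist(i, picks[0]) for i in range(pool_size)]
    while len(picks) < pick_size:
        picked = set(picks)
        best = max((i for i in range(pool_size) if i not in picked),
                   key=lambda i: min_dist[i])
        picks.append(best)
        for i in range(pool_size):
            min_dist[i] = min(min_dist[i], dist(i, best))
    return picks


def pick_subsets(points, n_subsets, subset_size, seed=0):
    """Steps 1-3: pick a subset, remove it from the pool, repeat."""
    pool = list(points)
    subsets = []
    for round_no in range(n_subsets):
        current = pool  # bind the pool used for this round's distances

        def dist(i, j, current=current):
            return abs(current[i] - current[j])

        idxs = maxmin_pick(dist, len(current), subset_size, seed=seed + round_no)
        # Guard against an index past the end of the pool.
        assert all(0 <= i < len(current) for i in idxs), "index outside the pool"
        subsets.append([current[i] for i in idxs])
        chosen = set(idxs)
        pool = [p for k, p in enumerate(current) if k not in chosen]
    return subsets


# Step 4 would store these and restart with a fresh pool.
subsets = pick_subsets(range(100), n_subsets=5, subset_size=10)
print(subsets[0])
```

Because each round rebuilds the pool, the indices returned in one round are only valid against that round's pool, so the bounds check has to use the current pool length.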

The process seems to work fine for a few simulations. However, eventually
and randomly MaxMinPicker.Pick() returns an index that is 1 position above
the end of the input array. After debugging the behavior, I added error
checking to detect this situation. This fix works fine in Linux. However,
my fix does not work in Windows. The error condition is detected, but
Python still crashes.

The most obvious source of the bug is that I'm making an error when I
construct the input matrix. However, I've gone over my code several times
and I'm quite sure I'm doing it right. Also, successful simulations produce
subsets that are diverse by the desired metric. Unfortunately, the random
nature of the bug makes it difficult to pinpoint the root cause. My current
hunch is that MaxMinPicker has some static variables that are hanging
around from one run to the next. If that is the case, one would only
encounter the bug if one were to repeatedly call the Pick() method within a
single script like I am doing (maybe that is why no one has encountered
this bug yet?)

Any help would be most appreciated. Thanks!
Regards,
Steve


Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-18 Thread Alexis Parenty
Hi Brian, thanks a lot, this will be very useful for getting my head around
writing SMARTS.
Best,
Alexis

On 18 May 2017 at 02:06, Brian Kelley wrote:

> Dear All,
>   In case it helps, there is a wealth of functional groups already in
> RDKit available here:
>
> https://github.com/rdkit/rdkit/blob/master/Data/Functional_Group_Hierarchy.txt
>
> For instance, the functional group halogen pattern we use is a bit more
> complicated:
>
> [$([F,Cl,Br,I]-!@[#6]);!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]
>
> That can (1) help you write your own patterns and (2) be used (from
> python) as follows:
>
>
> from __future__ import print_function
> from rdkit import Chem
> from rdkit.Chem import FilterCatalog
>
> queryDefs = FilterCatalog.GetFlattenedFunctionalGroupHierarchy()
> smiles = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
> mol = Chem.MolFromSmiles(smiles)
> items = sorted(queryDefs.items())
> for name, pat in items:
>     print("%s\t%s"%(name, mol.HasSubstructMatch(pat)))
>
>
> AcidChloride False
> AcidChloride.Aliphatic False
> AcidChloride.Aromatic False
> Alcohol False
> Alcohol.Aliphatic False
> Alcohol.Aromatic False
> Aldehyde False
> Aldehyde.Aliphatic False
> Aldehyde.Aromatic False
> Amine True
> Amine.Aliphatic True
> Amine.Aromatic False
> Amine.Cyclic True
> Amine.Primary False
> Amine.Primary.Aliphatic False
> Amine.Primary.Aromatic False
> Amine.Secondary True
> Amine.Secondary.Aliphatic True
> Amine.Secondary.Aromatic False
> Amine.Tertiary False
> Amine.Tertiary.Aliphatic False
> Amine.Tertiary.Aromatic False
> Azide False
> Azide.Aliphatic False
> Azide.Aromatic False
> BoronicAcid False
> BoronicAcid.Aliphatic False
> BoronicAcid.Aromatic False
> CarboxylicAcid False
> CarboxylicAcid.Aliphatic False
> CarboxylicAcid.AlphaAmino False
> CarboxylicAcid.Aromatic False
> Halogen True
> Halogen.Aliphatic False
> Halogen.Aromatic True
> Halogen.Bromine False
> Halogen.Bromine.Aliphatic False
> Halogen.Bromine.Aromatic False
> Halogen.Bromine.BromoKetone False
> Halogen.NotFluorine True
> Halogen.NotFluorine.Aliphatic False
> Halogen.NotFluorine.Aromatic True
> Isocyanate False
> Isocyanate.Aliphatic False
> Isocyanate.Aromatic False
> Nitro False
> Nitro.Aliphatic False
> Nitro.Aromatic False
> SulfonylChloride False
> SulfonylChloride.Aliphatic False
> SulfonylChloride.Aromatic False
> TerminalAlkyne False
>
>
> Cheers,
>  Brian
>
> On Wed, May 17, 2017 at 9:20 AM, Alexis Parenty <
> alexis.parenty.h...@gmail.com> wrote:
>
>> Hi Michal, thanks for your response.
>> I think I made a typo somewhere in my previous code since it now works
>> fine, even without the Kekule notation... Sorry about the confusion...
>> Best,
>>
>> Alexis
>>
>> On 17 May 2017 at 13:59, Michal Krompiec wrote:
>>
>>> Hi Alexis,
>>> Try aromatic form instead of Kekule notation.
>>> Best,
>>> Michal
>>>
>>> On 17 May 2017 at 12:55, Alexis Parenty wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I am looking for a substructure match between a SMARTS and a SMILES, but
>>>> I want any heteroatom from the SMARTS to match any heteroatom from the
>>>> SMILES.
>>>>
>>>> The following does not return what I would expect:
>>>>
>>>> from rdkit import Chem
>>>>
>>>> smarts1 = "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
>>>> smiles2 = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
>>>>
>>>> mol1 = Chem.MolFromSmarts(smarts1)
>>>> mol2 = Chem.MolFromSmiles(smiles2)
>>>> print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
>>>> print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))
>>>>
>>>> => mol1 is a substructure of mol2: False
>>>> => mol2 is a substructure of mol1: False
>>>>
>>>> How could I do that?
>>>>
>>>> Thanks,
>>>> Alexis
>>>>
>>>
>>
>> 