Re: [Rdkit-discuss] Fast similarity search
Hi Tim,

according to https://www.knime.org/files/01_greg_landrum.pdf, the PostgreSQL cartridge can compare ~1 million compounds/sec on a single CPU (and this talk is from 2011). ChemFP is much faster if you pre-load all your FPs into main memory.

Hope this helps,
Nils

On 18.05.2017 at 23:15, Tim Dudgeon wrote:
> I think I recall Greg mentioning that RDKit can be used for very fast
> similarity search (e.g. all vs. all comparisons or searches against
> multi-million sized datasets).
> If so, is this part of the standard distro, or something extra
> (chemfp?).
> And can it run inside the cartridge?
> And any benchmarks?
>
> Thanks
> Tim

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] MaxMinPicker Bug
Hi Steve,

That is indeed a bug. Thanks for the detailed report!

Here's a very small reproducible example that demonstrates it:

def pick2(n=1000,m=10,seed=2748):
    def func(i, j):
        assert(i [...]

https://github.com/rdkit/rdkit/issues/1421

-greg

On Thu, May 18, 2017 at 8:59 PM, Steven Wilkens wrote:
> I've been using MaxMinPicker() to run a series of simulations where I
> select several small subsets of molecules from a larger set and I've come
> across some odd behavior. In summary, this is my algorithm:
>
> 1. select a small subset using MaxMinPicker.Pick()
> 2. remove that subset from the input set
> 3. repeat until the desired number of subsets is reached
> 4. store subsets, and restart the process to generate a new set of subsets
>
> The process seems to work fine for a few simulations. However, eventually
> and randomly MaxMinPicker.Pick() returns an index that is 1 position above
> the end of the input array. After debugging the behavior, I added error
> checking to detect this situation. This fix works fine in Linux. However,
> my fix does not work in Windows. The error condition is detected, but
> Python still crashes.
>
> The most obvious source of the bug is that I'm making an error when I
> construct the input matrix. However, I've gone over my code several times
> and I'm quite sure I'm doing it right. Also, successful simulations produce
> subsets that are diverse by the desired metric. Unfortunately, the random
> nature of the bug makes it difficult to pinpoint the root cause. My current
> hunch is that MaxMinPicker has some static variables that are hanging
> around from one run to the next. If that is the case, one would only
> encounter the bug if one were to repeatedly call the Pick() method within a
> single script like I am doing (maybe that is why no one has encountered
> this bug yet?)
>
> Any help would be most appreciated. Thanks!
> Regards,
> Steve
[Rdkit-discuss] Fast similarity search
I think I recall Greg mentioning that RDKit can be used for very fast similarity search (e.g. all vs. all comparisons or searches against multi-million sized datasets).

If so, is this part of the standard distro, or something extra (chemfp?).
And can it run inside the cartridge?
And any benchmarks?

Thanks
Tim
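The throughput figures quoted in this thread refer to fingerprint comparisons. As a rough illustration of what one such comparison computes, here is a minimal sketch, assuming the Tanimoto coefficient as the similarity measure (the thread does not name one) and modelling fingerprints as plain Python integers used as bitsets. The names popcount, tanimoto and search, the 0.7 threshold, and the 8-bit toy data are all hypothetical; this is not how chemfp or the cartridge are implemented, only the arithmetic behind an in-memory search.

```python
# Illustrative sketch only: fingerprints are modelled as Python ints used as
# bitsets. This is an assumption for readability, not the chemfp or RDKit
# cartridge implementation.

def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto similarity of two bitset fingerprints: |a & b| / |a | b|."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(a & b) / union


def search(query, fingerprints, threshold=0.7):
    """Return (index, score) pairs with score >= threshold, best first."""
    hits = []
    for idx, fp in enumerate(fingerprints):
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((idx, score))
    # Sort by descending score, then by index for a stable order
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Toy in-memory "database" of hypothetical 8-bit fingerprints
database = [0b10110010, 0b10110011, 0b01001100, 0b10100010]
query = 0b10110010
hits = search(query, database, threshold=0.7)
print(hits)
```

Keeping every fingerprint in one in-memory list is the same idea as pre-loading all FPs into main memory; real tools gain their speed from packed binary storage and hardware popcount rather than from Python-level loops.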
[Rdkit-discuss] MaxMinPicker Bug
I've been using MaxMinPicker() to run a series of simulations where I select several small subsets of molecules from a larger set and I've come across some odd behavior. In summary, this is my algorithm:

1. select a small subset using MaxMinPicker.Pick()
2. remove that subset from the input set
3. repeat until the desired number of subsets is reached
4. store subsets, and restart the process to generate a new set of subsets

The process seems to work fine for a few simulations. However, eventually and randomly MaxMinPicker.Pick() returns an index that is 1 position above the end of the input array. After debugging the behavior, I added error checking to detect this situation. This fix works fine in Linux. However, my fix does not work in Windows. The error condition is detected, but Python still crashes.

The most obvious source of the bug is that I'm making an error when I construct the input matrix. However, I've gone over my code several times and I'm quite sure I'm doing it right. Also, successful simulations produce subsets that are diverse by the desired metric. Unfortunately, the random nature of the bug makes it difficult to pinpoint the root cause. My current hunch is that MaxMinPicker has some static variables that are hanging around from one run to the next. If that is the case, one would only encounter the bug if one were to repeatedly call the Pick() method within a single script like I am doing (maybe that is why no one has encountered this bug yet?)

Any help would be most appreciated. Thanks!

Regards,
Steve
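Steps 1 to 3 of the workflow above can be sketched as follows. This is a plain-Python stand-in for the picker, written only to show the shape of the loop and the kind of bounds check the report describes; it is not RDKit's MaxMinPicker and does not reproduce the bug. The names maxmin_pick and draw_subsets, the seed handling, and the 1-D toy data with absolute difference as the distance are all assumptions made for illustration.

```python
import random


def maxmin_pick(dist, pool_size, pick_size, seed=None):
    """Greedy MaxMin selection: return pick_size indices from range(pool_size).

    dist(i, j) is a caller-supplied distance between items i and j.
    Plain-Python stand-in, NOT RDKit's MaxMinPicker implementation.
    """
    if not 0 < pick_size <= pool_size:
        raise ValueError("pick_size must be in 1..pool_size")
    rng = random.Random(seed)
    picks = [rng.randrange(pool_size)]  # random first pick
    # min_dist[i] = distance from item i to its nearest already-picked item
    min_dist = [dist(i, picks[0]) for i in range(pool_size)]
    while len(picks) < pick_size:
        picked = set(picks)
        candidates = [i for i in range(pool_size) if i not in picked]
        # Next pick is the candidate farthest from everything picked so far
        best = max(candidates, key=lambda i: min_dist[i])
        picks.append(best)
        for i in range(pool_size):
            min_dist[i] = min(min_dist[i], dist(i, best))
    return picks


def draw_subsets(points, n_subsets, subset_size, seed=0):
    """Pick a subset, remove it from the pool, repeat (steps 1 to 3)."""
    pool = list(points)
    subsets = []
    for round_no in range(n_subsets):
        # Bind the current pool so the distance always uses this round's data
        def dist(i, j, pool=pool):
            return abs(pool[i] - pool[j])

        picks = maxmin_pick(dist, len(pool), subset_size, seed=seed + round_no)
        # Defensive check for the reported symptom: an index past the end
        bad = [i for i in picks if not 0 <= i < len(pool)]
        if bad:
            raise IndexError("picker returned out-of-range indices: %r" % bad)
        subsets.append([pool[i] for i in picks])
        picked = set(picks)
        pool = [p for i, p in enumerate(pool) if i not in picked]
    return subsets


# Hypothetical toy data: 20 points on a line, three subsets of four
subsets = draw_subsets(range(20), n_subsets=3, subset_size=4)
print(subsets)
```

Step 4 (storing the subsets and restarting) would simply call draw_subsets again with a fresh seed. Rebuilding the pool and the distance function on every round keeps each call independent of the previous one, which is one way to rule out stale state in calling code before suspecting the picker itself.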
Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?
Hi Brian,

thanks a lot, this will be very useful to get my head around SMARTS writing.

Best,
Alexis

On 18 May 2017 at 02:06, Brian Kelley wrote:
> Dear All,
> In case it helps, there is a wealth of functional groups already in
> RDKit available here:
>
> https://github.com/rdkit/rdkit/blob/master/Data/Functional_Group_Hierarchy.txt
>
> For instance, the functional group halogen pattern we use is a bit more
> complicated:
>
> [$([F,Cl,Br,I]-!@[#6]);!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]
>
> That can (1) help you write your own patterns and (2) be used (from
> python) as follows:
>
> from __future__ import print_function
> from rdkit import Chem
> from rdkit.Chem import FilterCatalog
>
> queryDefs = FilterCatalog.GetFlattenedFunctionalGroupHierarchy()
> smiles = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
> mol = Chem.MolFromSmiles(smiles)
> items = sorted(queryDefs.items())
> for name, pat in items:
>     print("%s\t%s"%(name, mol.HasSubstructMatch(pat)))
>
> AcidChloride False
> AcidChloride.Aliphatic False
> AcidChloride.Aromatic False
> Alcohol False
> Alcohol.Aliphatic False
> Alcohol.Aromatic False
> Aldehyde False
> Aldehyde.Aliphatic False
> Aldehyde.Aromatic False
> Amine True
> Amine.Aliphatic True
> Amine.Aromatic False
> Amine.Cyclic True
> Amine.Primary False
> Amine.Primary.Aliphatic False
> Amine.Primary.Aromatic False
> Amine.Secondary True
> Amine.Secondary.Aliphatic True
> Amine.Secondary.Aromatic False
> Amine.Tertiary False
> Amine.Tertiary.Aliphatic False
> Amine.Tertiary.Aromatic False
> Azide False
> Azide.Aliphatic False
> Azide.Aromatic False
> BoronicAcid False
> BoronicAcid.Aliphatic False
> BoronicAcid.Aromatic False
> CarboxylicAcid False
> CarboxylicAcid.Aliphatic False
> CarboxylicAcid.AlphaAmino False
> CarboxylicAcid.Aromatic False
> Halogen True
> Halogen.Aliphatic False
> Halogen.Aromatic True
> Halogen.Bromine False
> Halogen.Bromine.Aliphatic False
> Halogen.Bromine.Aromatic False
> Halogen.Bromine.BromoKetone False
> Halogen.NotFluorine True
> Halogen.NotFluorine.Aliphatic False
> Halogen.NotFluorine.Aromatic True
> Isocyanate False
> Isocyanate.Aliphatic False
> Isocyanate.Aromatic False
> Nitro False
> Nitro.Aliphatic False
> Nitro.Aromatic False
> SulfonylChloride False
> SulfonylChloride.Aliphatic False
> SulfonylChloride.Aromatic False
> TerminalAlkyne False
>
> Cheers,
> Brian
>
> On Wed, May 17, 2017 at 9:20 AM, Alexis Parenty <
> alexis.parenty.h...@gmail.com> wrote:
>
>> Hi Michal, thanks for your response.
>> I think I made a typo somewhere in my previous code since it now works
>> fine, even without the Kekule notation... Sorry about the confusion...
>> Best,
>>
>> Alexis
>>
>> On 17 May 2017 at 13:59, Michal Krompiec wrote:
>>
>>> Hi Alexis,
>>> Try aromatic form instead of Kekule notation.
>>> Best,
>>> Michal
>>>
>>> On 17 May 2017 at 12:55, Alexis Parenty wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I am looking for a substructure match between a SMARTS and a SMILES,
>>>> but I want any heteroatom from the SMARTS to match any heteroatom
>>>> from the SMILES:
>>>>
>>>> [image: Inline images 1]
>>>>
>>>> The following does not return what I would expect:
>>>>
>>>> from rdkit import Chem
>>>>
>>>> smarts1 = "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
>>>> smiles2 = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
>>>> mol1 = Chem.MolFromSmarts(smarts1)
>>>> mol2 = Chem.MolFromSmiles(smiles2)
>>>> print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
>>>> print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))
>>>>
>>>> => mol1 is a substructure of mol2: False
>>>> => mol2 is a substructure of mol1: False
>>>>
>>>> How could I do that?
>>>>
>>>> Thanks,
>>>> Alexis