Hi Maciek, thanks for your response. I did try that function too, but it
also only works for SMILES (not SMARTS). I think Gregori's solution is
very interesting: I am going to transform all SMILES and SMARTS into their
single-bonded, carbon-based skeletons and store the pattern fingerprint
of each skeleton in a dictionary, using the SMARTS or the SMILES as the key.
Then I will use your proposed function to match the sub-skeletons against
the skeletons, and will run the expensive molecular graph substructure
search only for the keys whose fingerprints have been identified as
potential substructures of others. Thanks Gregori!
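
A rough sketch of that two-stage plan, in case it is useful to anyone
following the thread. It is deliberately library-agnostic: `fingerprint`,
`may_contain` and `exact_match` are hypothetical callables made up for
illustration. With RDKit they could presumably wrap
`Chem.PatternFingerprint` (applied to the skeleton),
`DataStructs.AllProbeBitsMatch` and `HasSubstructMatch`, but that wiring is
an assumption and is not shown here.

```python
def two_stage_search(fragments, structures, fingerprint, may_contain, exact_match):
    """Return {fragment: [structures that contain it]}.

    All three callables are hypothetical placeholders:
      fingerprint(s)            -> fingerprint of the SMILES/SMARTS string s
      may_contain(fp_f, fp_s)   -> False only if the fragment cannot match
      exact_match(frag, struct) -> the slow, exact substructure test
    """
    # Compute every fingerprint once and store it under its input string.
    fp_cache = {s: fingerprint(s) for s in set(fragments) | set(structures)}

    hits = {}
    for frag in fragments:
        matched = []
        for struct in structures:
            # Stage 1: cheap screen. False positives are acceptable here;
            # false negatives are not.
            if not may_contain(fp_cache[frag], fp_cache[struct]):
                continue
            # Stage 2: expensive exact check, run on the survivors only.
            if exact_match(frag, struct):
                matched.append(struct)
        hits[frag] = matched
    return hits


# Toy stand-ins so the sketch runs without any chemistry toolkit:
# the "fingerprint" is the set of characters, the exact test is a substring check.
toy_hits = two_stage_search(
    fragments=["CO", "OC"],
    structures=["CCO", "CCC"],
    fingerprint=frozenset,
    may_contain=lambda fp_f, fp_s: fp_f <= fp_s,
    exact_match=lambda frag, struct: frag in struct,
)
```

The point of the cache is that each string is fingerprinted once, however
many pairs it takes part in; the screen only has to guarantee that it never
rejects a true match.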
Any other good tips?
Cheers,
Alexis

On Mon, 10 Feb 2020 at 20:33, Maciek Wójcikowski <mac...@wojcikowski.pl>
wrote:

> Alexis,
>
> I believe that `DataStructs.AllProbeBitsMatch(query_fp, mol_fp)` is the
> function you are looking for here. You can find more advanced usage and
> code snippets in the RDKit blog post that Greg put together here:
> https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
>
> Best,
> Maciek
>
> ----
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
>
> On Mon, 10 Feb 2020 at 16:10, Alexis Parenty <alexis.parenty.h...@gmail.com>
> wrote:
>
>> Dear Rdkiters,
>>
>> I am interested in doing substructure searches between many thousands of
>> structures and many thousands of fragments, as quickly as possible, with
>> reasonable accuracy (> 0.95)...
>>
>> I did read Greg's excellent post on that subject:
>>
>>
>> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>>
>> I was using the RDKit pattern fingerprint approach to filter out any
>> fragments that have no chance of matching the bigger structure before
>> running the slower, more accurate molecular graph approach, which saves a
>> lot of time.
>>
>> However, I realized that this RDKit pattern fingerprint approach only
>> works well if we compare SMILES with SMILES:
>>
>>
>>
>> from rdkit import Chem
>>
>> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>>     # Pattern fingerprints of the fragment and of the full structure
>>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
>>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>>
>>     frag_bits = set(pfp_frag.GetOnBits())
>>     structure_bits = set(pfp_structure.GetOnBits())
>>
>>     # A match is possible only if every fragment bit is set in the structure
>>     return frag_bits.issubset(structure_bits)
>>
>>
>>
>> Unfortunately, some of my fragments are SMARTS that are not valid SMILES.
>> Using Chem.MolFromSmarts(smarts) gives really poor results (many false
>> positives, leading to poor specificity). Interestingly, there are no false
>> negatives, so the sensitivity is 1!
>>
>>
>>
>> from rdkit import Chem
>>
>> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>>     # The fragment is parsed as SMARTS, the full structure as SMILES
>>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
>>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>>
>>     frag_bits = set(pfp_frag.GetOnBits())
>>     structure_bits = set(pfp_structure.GetOnBits())
>>
>>     # A match is possible only if every fragment bit is set in the structure
>>     return frag_bits.issubset(structure_bits)
>>
>>
>>
>> Is there a way to use pattern fingerprints (or another method) for
>> substructure searches independently of the SMILES/SMARTS format of the
>> fragments? If not, is mol_struct.HasSubstructMatch(mol_frag) the only
>> option I am left with?
>>
>> Many thanks,
>>
>> Alexis
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>