The function takes two bit vectors, either explicit or sparse (ExplicitBitVect
or SparseBitVect). Could you elaborate on what you mean by it accepting SMILES
only? Pattern fingerprints work with SMARTS too.
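A minimal sketch, assuming a reasonably recent RDKit: the pattern fingerprint of a SMARTS query is an ordinary ExplicitBitVect, so it can go straight into DataStructs.AllProbeBitsMatch. The query and target strings below are illustrative only.

```python
from rdkit import Chem, DataStructs

# Illustrative inputs: the atom list [Cl,Br] makes this SMARTS invalid as SMILES.
query = Chem.MolFromSmarts("[Cl,Br]CCO")
target = Chem.MolFromSmiles("ClCCOC")

# Both fingerprints are ordinary ExplicitBitVect objects.
query_fp = Chem.PatternFingerprint(query)
target_fp = Chem.PatternFingerprint(target)

# True when every bit set in the query fingerprint is also set in the target's.
passes_screen = DataStructs.AllProbeBitsMatch(query_fp, target_fp)
# The screen can only rule candidates out; survivors still need the exact check.
is_match = passes_screen and target.HasSubstructMatch(query)
print(passes_screen, is_match)
```

Only a failed screen is conclusive; a pass still has to be confirmed with HasSubstructMatch.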

Screening is always more effective when the SMARTS is as explicit as
possible. If atoms are given as lists of alternatives, the fingerprint cannot
make many assumptions about the molecule, so things like filling the valences
on atoms and explicitly defining bonds as single will help a lot. For very
small SMARTS the screen-out rate may be low anyway, unfortunately.
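To see the effect of making a query more explicit, one can count how many bits each variant of a fragment sets, since every set bit is a chance to reject a non-matching structure. This is a sketch with made-up variants; the exact counts depend on the RDKit version's pattern set, so treat them as something to measure on your own queries rather than a guarantee.

```python
from rdkit import Chem, DataStructs

# Hypothetical variants of one three-atom fragment, from vague to explicit.
VARIANTS = [
    "[C,N]~[C,N]~[O,S]",  # atom lists and "any" bonds
    "CCO",                # unspecified SMARTS bonds mean "single or aromatic"
    "C-C-O",              # bonds written explicitly as single
]

def pattern_bit_count(smarts):
    """Number of bits set in the pattern fingerprint of a SMARTS query."""
    return Chem.PatternFingerprint(Chem.MolFromSmarts(smarts)).GetNumOnBits()

# Ethanol matches all three variants, so each must still pass the screen.
ethanol_fp = Chem.PatternFingerprint(Chem.MolFromSmiles("CCO"))
for smarts in VARIANTS:
    query_fp = Chem.PatternFingerprint(Chem.MolFromSmarts(smarts))
    passes = DataStructs.AllProbeBitsMatch(query_fp, ethanol_fp)
    print(f"{smarts:20s} {pattern_bit_count(smarts):4d} bits, passes for ethanol: {passes}")
```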

----
Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Mon, 10 Feb 2020 at 21:02, Alexis Parenty <alexis.parenty.h...@gmail.com>
wrote:

> Hi Maciek, thanks for your response. I did try that function too, but it
> also only takes SMILES (not SMARTS). I think Gregori's solution is very
> interesting: I am going to transform all SMILES and SMARTS into their
> single-bonded, carbon-only skeletons and store the pattern fingerprints of
> those skeletons in a dictionary, using the SMARTS or SMILES as the key.
> Then I will use your proposed function to match sub-skeletons against
> skeletons, and will only run the expensive molecular-graph substructure
> search on the dictionary keys whose fingerprints (the dictionary values)
> flag them as potential substructures of others. Thanks Gregori!
> Any other good tips?
> Cheers,
> Alexis
>
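The skeleton-based plan above can be sketched as follows, under stated assumptions: `parse_fragment`, `carbon_skeleton`, `skeleton_index` and `find_matches` are hypothetical helpers, fragments that parse as SMILES are treated as SMILES and everything else as SMARTS, and the skeleton simply turns every atom into carbon and every bond into a single bond. Since any real substructure match also maps the fragment's skeleton onto the structure's skeleton, this screen should not lose true hits, but it gives up the element and bond-order information that would make the screen more selective.

```python
from rdkit import Chem, DataStructs

def parse_fragment(text):
    # Assumption: use the SMILES parse if it succeeds, otherwise parse as SMARTS.
    mol = Chem.MolFromSmiles(text)
    return mol if mol is not None else Chem.MolFromSmarts(text)

def carbon_skeleton(mol):
    # Copy the graph with every atom as carbon and every bond as single;
    # works the same for SMILES-derived molecules and SMARTS queries.
    skel = Chem.RWMol()
    for _ in mol.GetAtoms():
        skel.AddAtom(Chem.Atom(6))
    for bond in mol.GetBonds():
        skel.AddBond(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(),
                     Chem.BondType.SINGLE)
    skel = skel.GetMol()
    skel.UpdatePropertyCache(strict=False)  # allow "carbons" with >4 bonds
    Chem.FastFindRings(skel)                 # ring info used by the fingerprint
    return skel

def skeleton_index(texts, parser):
    # Dictionaries keyed by the original SMILES/SMARTS string.
    mols = {text: parser(text) for text in texts}
    fps = {text: Chem.PatternFingerprint(carbon_skeleton(m))
           for text, m in mols.items()}
    return mols, fps

def find_matches(fragments, structures):
    frag_mols, frag_fps = skeleton_index(fragments, parse_fragment)
    struct_mols, struct_fps = skeleton_index(structures, Chem.MolFromSmiles)
    hits = []
    for s_text, s_fp in struct_fps.items():
        for f_text, f_fp in frag_fps.items():
            # Cheap skeleton screen first; exact graph match only for survivors.
            if (DataStructs.AllProbeBitsMatch(f_fp, s_fp)
                    and struct_mols[s_text].HasSubstructMatch(frag_mols[f_text])):
                hits.append((f_text, s_text))
    return hits

print(find_matches(["c1ccccc1", "[C,N]~[C,N]~[O,S]", "C(=O)O"],
                   ["CCO", "c1ccccc1C(=O)O"]))
```

The SMILES parser logs an error for each fragment that is only valid as SMARTS before the fallback is used; that noise is expected with this parsing rule.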
> On Mon, 10 Feb 2020 at 20:33, Maciek Wójcikowski <mac...@wojcikowski.pl>
> wrote:
>
>> Alexis,
>>
>> I believe that `DataStructs.AllProbeBitsMatch(query_fp, mol_fp)` is the
>> function you are looking for here. You can find more advanced usage and
>> code snippets in the RDKit blog post that Greg put together here:
>> https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
>>
>> Best,
>> Maciek
>>
>>
>>
>> On Mon, 10 Feb 2020 at 16:10, Alexis Parenty <alexis.parenty.h...@gmail.com>
>> wrote:
>>
>>> Dear Rdkiters,
>>>
>>> I am interested in doing substructure searches between many thousands of
>>> structures and many thousands of fragments, as quickly as possible, with
>>> reasonable accuracy (> 0.95)...
>>>
>>> I did read Greg's excellent post on that subject:
>>>
>>>
>>> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>>>
>>> I was using the RDKit pattern fingerprint approach to filter out any
>>> fragments that have no chance of matching the bigger structure before
>>> running the slower but exact molecular-graph match, which saves a lot of
>>> time.
>>>
>>> However, I realized that this pattern fingerprint approach only works
>>> well when comparing SMILES with SMILES:
>>>
>>>
>>>
>>> from rdkit import Chem
>>>
>>> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>>>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
>>>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>>>
>>>     # The fragment can only be a substructure if every bit it sets
>>>     # is also set in the structure's fingerprint.
>>>     frag_bits = set(pfp_frag.GetOnBits())
>>>     structure_bits = set(pfp_structure.GetOnBits())
>>>     return frag_bits.issubset(structure_bits)
>>>
>>>
>>>
>>> Unfortunately, some of my fragments are SMARTS that are not valid
>>> SMILES. Using Chem.MolFromSmarts(smarts) gives really poor results (many
>>> false positives, hence poor specificity). Interestingly, there are no
>>> false negatives, giving a sensitivity of 1!
>>>
>>>
>>>
>>> from rdkit import Chem
>>>
>>> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>>>     # Same screen, but the fragment is parsed as a SMARTS query.
>>>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
>>>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>>>
>>>     frag_bits = set(pfp_frag.GetOnBits())
>>>     structure_bits = set(pfp_structure.GetOnBits())
>>>     return frag_bits.issubset(structure_bits)
>>>
>>>
>>>
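One way to put numbers on the behaviour described above (false positives but no false negatives) is to run the fingerprint screen and the exact match side by side over every pair. The sketch below uses a hypothetical helper, `screen_confusion`, with made-up inputs; unlike the per-pair function above, it computes each fingerprint once per molecule, which matters when comparing thousands of fragments against thousands of structures.

```python
from rdkit import Chem, DataStructs

def screen_confusion(fragments, structures, parse=Chem.MolFromSmarts):
    """Count true/false positives/negatives of the pattern-fingerprint screen,
    using HasSubstructMatch as the ground truth."""
    frag_mols = [parse(f) for f in fragments]
    struct_mols = [Chem.MolFromSmiles(s) for s in structures]
    # One fingerprint per molecule, reused for every pair.
    frag_fps = [Chem.PatternFingerprint(m) for m in frag_mols]
    struct_fps = [Chem.PatternFingerprint(m) for m in struct_mols]
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for s_mol, s_fp in zip(struct_mols, struct_fps):
        for f_mol, f_fp in zip(frag_mols, frag_fps):
            passed = DataStructs.AllProbeBitsMatch(f_fp, s_fp)
            matched = s_mol.HasSubstructMatch(f_mol)
            # "t"/"f": screen agrees with the exact match; "p"/"n": screen verdict.
            key = ("t" if passed == matched else "f") + ("p" if passed else "n")
            counts[key] += 1
    return counts

counts = screen_confusion(["[C,N]~[C,N]~[O,S]", "C(=O)O", "C-N"],
                          ["CCO", "NCCC(=O)O", "CCN"])
print(counts)
```

Sensitivity is then tp / (tp + fn) and specificity tn / (tn + fp).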
>>> Is there a way to use pattern fingerprints (or another method) for
>>> substructure searches regardless of whether the fragments are written as
>>> SMILES or SMARTS? If not, is mol_struct.HasSubstructMatch(mol_frag) the
>>> only option I am left with?
>>>
>>> Many thanks,
>>>
>>> Alexis
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>