Dear Rdkiters,

I am interested in running substructure searches of many thousands of
fragments against many thousands of structures, as quickly as possible, with
reasonable accuracy (> 0.95)...

I did read Greg's excellent post on that subject:

http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html

I was using the RDKit pattern fingerprint approach to screen out fragments
that have no chance of matching the larger structure before running the
slower, more accurate molecular graph match. This saves a lot of time.
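A minimal sketch of that screen-then-confirm workflow, assuming every structure and fragment is a valid SMILES string (the function name substructure_matches is hypothetical, not an RDKit API). It computes each pattern fingerprint once up front, rather than inside every pairwise comparison, and only calls HasSubstructMatch on pairs that pass the bit-subset test:

```python
from rdkit import Chem


def substructure_matches(structure_smiles, fragment_smiles):
    """Return the set of (structure index, fragment index) pairs that match.

    Hypothetical helper: assumes all inputs are SMILES strings.
    """
    structs = [Chem.MolFromSmiles(s) for s in structure_smiles]
    frags = [Chem.MolFromSmiles(f) for f in fragment_smiles]
    if any(m is None for m in structs + frags):
        raise ValueError("could not parse one of the SMILES strings")

    # Compute each pattern fingerprint once, not once per pair
    struct_bits = [set(Chem.PatternFingerprint(m).GetOnBits()) for m in structs]
    frag_bits = [set(Chem.PatternFingerprint(m).GetOnBits()) for m in frags]

    matches = set()
    for i, (mol, s_bits) in enumerate(zip(structs, struct_bits)):
        for j, (frag, f_bits) in enumerate(zip(frags, frag_bits)):
            # Screen: if any fragment bit is missing, skip the expensive match
            if not f_bits.issubset(s_bits):
                continue
            # Confirm the survivors with the slower, exact graph match
            if mol.HasSubstructMatch(frag):
                matches.add((i, j))
    return matches
```

For example, substructure_matches(["CCO", "CCCC"], ["CO", "CCC"]) should return {(0, 0), (1, 1)}. The double loop is still quadratic in the number of molecules; the saving comes from skipping HasSubstructMatch for screened-out pairs and from not recomputing fingerprints.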

However, I realized that this pattern fingerprint approach only works well
when both the fragment and the structure are parsed from SMILES:



from rdkit import Chem

def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
    # Pattern fingerprints of the fragment and of the full structure (both SMILES)
    pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
    pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))

    frag_bits = set(pfp_frag.GetOnBits())
    structure_bits = set(pfp_structure.GetOnBits())

    # The fragment can only match if every bit it sets is also set in the structure
    return frag_bits.issubset(structure_bits)



Unfortunately, some of my fragments are SMARTS patterns that are not valid
SMILES. Parsing them with Chem.MolFromSmarts(smarts) gives really poor
results: many false positives, and therefore poor specificity. Interestingly,
there are no false negatives, so the sensitivity is 1!



from rdkit import Chem

def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
    # The fragment is parsed as a SMARTS query; the structure is still SMILES
    pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
    pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))

    frag_bits = set(pfp_frag.GetOnBits())
    structure_bits = set(pfp_structure.GetOnBits())

    # The fragment can only match if every bit it sets is also set in the structure
    return frag_bits.issubset(structure_bits)
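One way to put numbers on that trade-off is to compare the screen's verdict with the exact HasSubstructMatch result for each (fragment, structure) pair. A minimal pure-Python sketch; screen_metrics is a hypothetical helper that takes two parallel lists of booleans, one entry per pair:

```python
def screen_metrics(screen_passed, exact_match):
    """Sensitivity and specificity of a fingerprint screen.

    screen_passed[k]: True if pair k passed the pattern-fingerprint screen.
    exact_match[k]:   True if HasSubstructMatch found a real match for pair k.
    """
    if len(screen_passed) != len(exact_match):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(screen_passed, exact_match))
    tp = sum(1 for p, t in pairs if p and t)          # real match, screen passed
    fn = sum(1 for p, t in pairs if not p and t)      # real match, screen rejected
    fp = sum(1 for p, t in pairs if p and not t)      # no match, screen passed
    tn = sum(1 for p, t in pairs if not p and not t)  # no match, screen rejected
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

With no false negatives the sensitivity is 1.0 regardless of how many false positives there are; the false positives only lower the specificity, which translates into extra HasSubstructMatch calls rather than missed hits.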



Is there a way to use pattern fingerprints (or another method) for
substructure searches regardless of whether the fragments are written as
SMILES or SMARTS? If not, is mol_struct.HasSubstructMatch(mol_frag) the only
option I have left?

Many thanks,

Alexis
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
