Hi Evgueni,

On Thu, Jul 16, 2009 at 11:23 AM, Evgueni Kolossov<ekolos...@gmail.com> wrote:
> Hi Greg,
>
> What's the best way to check for duplicate in this smart pointers vector of
> fragments when adding a new fragment to the vector?

The easy answer would be to compare the canonical SMILES of the new
fragment against those of the fragments already in the vector.
This *might* work, and it would be easy, but I'm not sure I'd trust
it.
Here's a simple example where it did work:
>>> from rdkit import Chem
>>> # the second argument is sanitize=False: parse without sanitization
>>> m1 = Chem.MolFromSmiles('Occcc', False)
>>> m2 = Chem.MolFromSmiles('ccccO', False)
>>> m1.UpdatePropertyCache()
>>> m2.UpdatePropertyCache()
>>> Chem.MolToSmiles(m1)
'ccccO'
>>> Chem.MolToSmiles(m2)
'ccccO'
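
A minimal sketch of how the duplicate check itself could look when
adding fragments one at a time. The class name UniqueFragments and its
key_func parameter are made up for illustration and are not part of
the RDKit; the idea is just to keep a dictionary of keys already seen
next to the list of fragments:

```python
from typing import Callable, Dict, Generic, Hashable, List, TypeVar

T = TypeVar("T")


class UniqueFragments(Generic[T]):
    """Hypothetical container that keeps the first fragment seen per key."""

    def __init__(self, key_func: Callable[[T], Hashable]) -> None:
        # key_func maps a fragment to a hashable key, e.g. a canonical SMILES
        self._key_func = key_func
        self._seen: Dict[Hashable, int] = {}  # key -> index in self.fragments
        self.fragments: List[T] = []

    def add(self, frag: T) -> bool:
        """Append frag unless its key was already seen; return True if added."""
        key = self._key_func(frag)
        if key in self._seen:
            return False
        self._seen[key] = len(self.fragments)
        self.fragments.append(frag)
        return True
```

With the RDKit, Chem.MolToSmiles could be passed as key_func, with the
same caveat as above: the check is only as reliable as the canonical
SMILES of the fragments.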

An answer that's more likely to be correct, but perhaps more difficult
to implement, is the use of subgraph invariants. This is what the
existing RDKit fragment catalog code does. Take a look at the
getDiscrims() method of FragCatalogEntry
($RDBASE/Code/GraphMol/FragCatalog/FragCatalogEntry.cpp); it shows an
approach that I have more confidence in than the "canonical SMILES for
pieces of molecules" method.
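
To illustrate the general idea of an order-independent invariant, here
is a toy sketch. It is NOT the algorithm used by getDiscrims(); it is a
much simpler stand-in, and the function name fragment_invariant and the
atoms/bonds representation are assumptions made for this example.
Fragments with different invariants are certainly different; fragments
with equal invariants may still differ, so a match would need to be
confirmed with a full comparison:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fragment_invariant(atoms: Sequence[str],
                       bonds: Sequence[Tuple[int, int]]) -> Tuple:
    """Toy invariant: independent of the order atoms and bonds are listed in.

    atoms: atom labels, e.g. ['O', 'c', 'c']
    bonds: pairs of atom indices, e.g. [(0, 1), (1, 2)]
    """
    # collect the labels of each atom's neighbors
    neighbors: Dict[int, List[str]] = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(atoms[b])
        neighbors[b].append(atoms[a])
    # describe each atom by its own label plus its sorted neighbor labels
    per_atom = [(label, tuple(sorted(neighbors[i])))
                for i, label in enumerate(atoms)]
    # sorting the per-atom descriptions removes the dependence on atom order
    return (len(atoms), len(bonds), tuple(sorted(per_atom)))
```

Equal invariants only say that two fragments are candidates for being
duplicates, which is why a cheap invariant like this is best used as a
pre-filter rather than as the final test.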

-greg
