Hi Evgueni,

On Thu, Jul 16, 2009 at 11:23 AM, Evgueni Kolossov<ekolos...@gmail.com> wrote:
> Hi Greg,
>
> What's the best way to check for duplicate in this smart pointers vector of
> fragments when adding a new fragment to the vector?

The easy answer would be to use the canonical smiles for the fragment.
This *might* work, and it would be easy, but I'm not sure I'd trust
it. Here's a simple example where it did work:

[6] >>> m1 = Chem.MolFromSmiles('Occcc',False)
[7] >>> m2 = Chem.MolFromSmiles('ccccO',False)
[8] >>> m1.UpdatePropertyCache()
[9] >>> m2.UpdatePropertyCache()
[10] >>> Chem.MolToSmiles(m1)
Out[10]: 'ccccO'
[11] >>> Chem.MolToSmiles(m2)
Out[11]: 'ccccO'

An answer that's more likely to be correct, but perhaps more difficult
to implement, is the use of subgraph invariants. This is what the
existing RDKit fragment catalog code does. Take a look at the
getDiscrims() method of FragCatalogEntry
($RDBASE/Code/GraphMol/FragCatalog/FragCatalogEntry.cpp); it shows an
approach that I have more confidence in than the "canonical smiles for
pieces of molecules" method.

-greg
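
P.S. For what it's worth, here is a rough sketch of the bookkeeping side of the "easy answer": keep a lookup of keys next to the vector and only append a fragment when its key has not been seen. The class name UniqueFragments and the toy_key function are made up for illustration. With RDKit you would presumably pass Chem.MolToSmiles as the key; toy_key is only a stand-in so the sketch runs without RDKit, and it is not a real canonicalization.

```python
from typing import Any, Callable, Dict, Hashable, List


class UniqueFragments:
    """Hypothetical container: keeps fragments in insertion order and
    skips any fragment whose key has already been seen."""

    def __init__(self, key: Callable[[Any], Hashable]) -> None:
        self._key = key                       # e.g. Chem.MolToSmiles (assumption)
        self._index: Dict[Hashable, int] = {}  # key -> position in self.items
        self.items: List[Any] = []

    def add(self, frag: Any) -> bool:
        """Append frag unless an equivalent one is stored; return True if added."""
        k = self._key(frag)
        if k in self._index:
            return False
        self._index[k] = len(self.items)
        self.items.append(frag)
        return True


def toy_key(chain: str) -> str:
    # Stand-in for a canonical key: treats a linear chain string and its
    # reverse as the same fragment. NOT a substitute for canonical SMILES.
    return min(chain, chain[::-1])


store = UniqueFragments(key=toy_key)
added_first = store.add("Occcc")    # new fragment, gets stored
added_second = store.add("ccccO")   # same chain reversed, rejected
```

The same caveat as above applies: the check is only as trustworthy as the key function, so swapping in a key built from subgraph invariants would leave the container unchanged.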