On Oct 3, 2019, at 20:34, Ondrej Gutten via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
> # MCS is a benzene
> my_mcs = Chem.MolFromSmiles(res.smarts)

The res.smarts attribute (or res.smartsString if you use the rdFMCS module) is a SMARTS string, not a SMILES string. The code I quoted should use Chem.MolFromSmarts() instead.

More specifically, the MCS SMARTS pattern is:

  [#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1

This can also be parsed as a SMILES, because RDKit supports "#6" as part of its extensions to the SMILES grammar. The "#6" means an atom with atomic number 6.

>>> Chem.CanonSmiles("[#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1")
'[c]1[c][c][c][c][c]1'

However, benzene has a different SMILES:

>>> Chem.CanonSmiles("c1ccccc1")
'c1ccccc1'
>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1ccccc1"), allBondsExplicit=True, allHsExplicit=True)
'[cH]1:[cH]:[cH]:[cH]:[cH]:[cH]:1'

Each of the benzene atoms has a hydrogen on it, while the atoms parsed from the MCS pattern have none.

The difference appears when you call:

  mol1.HasSubstructMatch(my_mcs)

The result depends on whether my_mcs was made from a SMILES or from a SMARTS, because the two kinds of query have different definitions of what it means to match. One of the differences is that a SMILES-made molecule considers the hydrogen counts:

>>> mol = Chem.MolFromSmiles("OCc1ccccc1")
>>> query = Chem.MolFromSmiles("OC"); mol.GetSubstructMatches(query)
((0, 1),)
>>> query = Chem.MolFromSmiles("[O]C"); mol.GetSubstructMatches(query)
()
>>> query = Chem.MolFromSmiles("[OH]C"); mol.GetSubstructMatches(query)
((0, 1),)

while if I make the query from a SMARTS:

>>> query = Chem.MolFromSmarts("[O]C"); mol.GetSubstructMatches(query)
((0, 1),)

Cheers,

        Andrew
        da...@dalkescientific.com
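
For reference, here is a minimal sketch of the fix described above, assuming the rdFMCS module is used. The two input molecules (phenol and aniline) are hypothetical stand-ins chosen so that the common substructure is a benzene ring; they are not the molecules from the original question. The second half repeats the "[O]C" comparison from the message as a script.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Hypothetical inputs (not from the original question): phenol and aniline,
# which share only the six-membered aromatic carbon ring.
mols = [Chem.MolFromSmiles(smi) for smi in ("Oc1ccccc1", "Nc1ccccc1")]

# Find the maximum common substructure with the default settings.
res = rdFMCS.FindMCS(mols)

# Parse the result as SMARTS so that it is treated as a query pattern.
my_mcs = Chem.MolFromSmarts(res.smartsString)
all_match = all(mol.HasSubstructMatch(my_mcs) for mol in mols)
print(res.numAtoms, all_match)

# The "[O]C" comparison from the message, once per parser.
mol = Chem.MolFromSmiles("OCc1ccccc1")
smiles_hits = mol.GetSubstructMatches(Chem.MolFromSmiles("[O]C"))
smarts_hits = mol.GetSubstructMatches(Chem.MolFromSmarts("[O]C"))
print(smiles_hits, smarts_hits)
```

With the older MCS module the attribute would be res.smarts rather than res.smartsString, as noted above. The last two results should mirror the interactive session in the message.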