Hi Paolo,

Thanks very much for this, that's very helpful.

I am actually using this query in the PostgreSQL cartridge and am seeing similar behaviour with version 2021.09.2, but I think this might be fixed in 2021.09.3 by #4787 <https://github.com/rdkit/rdkit/issues/4787>. However, I think the conda package (https://anaconda.org/rdkit/rdkit-postgresql) is still on version 2021.09.2. Would it be possible to update it?

I also wanted to ask about the concept of a qmol in the cartridge, which doesn't undergo sanitization, versus the corresponding behaviour in Python. Please correct me if I'm wrong, but is there no concept of a qmol in Python?

Many thanks!

Susan

On Tue, Jul 26, 2022 at 9:04 PM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:

> Hi Susan,
>
> I see why that happens, and I'll let Greg comment if this is a bug or the
> intended behavior.
> In the meantime, I can propose a workaround.
>
> The reason why it happens is that aromatization, which is part of the
> sanitization operations, converts your aromatic query bond into a single
> bond, probably in the assumption that it was labelled as aromatic by
> mistake (it is indeed an exocyclic bond and it is not part of a ring in the
> q_aromatic molecule). You can clearly see that if you carry out
> sanitization as a separate step:
>
> q_aromatic = Chem.MolFromMolBlock(qb_aromatic, sanitize=False)
> q_aromatic
> [image: 16cd080c-d76f-457e-a210-1b5f9a347b77.png]
> for b in q_aromatic.GetBonds():
>     print(b.GetIdx(), b.GetBondType(), b.DescribeQuery())
> 0 DOUBLE
> 1 SINGLE
> 2 DOUBLE
> 3 SINGLE
> 4 SINGLE
> 5 DOUBLE
> 6 AROMATIC
>
> Bond 6 is AROMATIC, but bears no query.
> Let's store an array of the currently aromatic bonds:
>
> are_aromatic = [b.GetIdx() for b in q_aromatic.GetBonds() if b.GetIsAromatic()]
> are_aromatic
> [6]
>
> After sanitization, the aromatic bond turns into a single bond:
>
> Chem.SanitizeMol(q_aromatic)
> rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
> for b in q_aromatic.GetBonds():
>     print(b.GetIdx(), b.GetBondType(), b.DescribeQuery())
> 0 AROMATIC
> 1 AROMATIC
> 2 AROMATIC
> 3 AROMATIC
> 4 AROMATIC
> 5 AROMATIC
> 6 SINGLE
>
> We know that it was aromatic before sanitization, so let's make it an
> aromatic query bond:
>
> aromatic_query_bond = Chem.MolFromSmarts("*:*").GetBondWithIdx(0)
> aromatic_query_bond.GetBondType(), aromatic_query_bond.DescribeQuery()
> (rdkit.Chem.rdchem.BondType.AROMATIC, 'BondOrder 12 = val\n')
>
> q_aromatic = Chem.RWMol(q_aromatic)
> for b_idx in are_aromatic:
>     q_aromatic.ReplaceBond(b_idx, aromatic_query_bond)
>
> Now q_aromatic matches as expected:
>
> print(m.HasSubstructMatch(q_aromatic))
> True
>
> Cheers,
> p.
>
>
> On Tue, Jul 26, 2022 at 6:17 PM Susan Leung <susanhle...@gmail.com> wrote:
>
>> Hi all,
>>
>> Sorry it's me with another substructure query question...
>>
>> Please can anyone explain the following behaviour to me? I have 4 queries
>> that differ by just one query bond. To me, it should match an aromatic bond
>> type (4) but it doesn't. However, it matches single_or_aromatic and
>> double_or_aromatic query bond types but not single_or_double...
>>
>> Best wishes,
>>
>> Susan
>>
>> import rdkit
>> print(rdkit.__version__)
>> from rdkit import Chem
>>
>> qb_double_or_aromatic = """
>>   ACCLDraw07262216372D
>>
>>   0  0  0     0  0            999 V3000
>> M  V30 BEGIN CTAB
>> M  V30 COUNTS 7 7 0 0 0
>> M  V30 BEGIN ATOM
>> M  V30 1 C 45.7538 -37.6779 0 0
>> M  V30 2 C 44.7323 -37.0872 0 0
>> M  V30 3 C 44.7323 -35.9099 0 0
>> M  V30 4 C 45.7553 -35.3193 0 0
>> M  V30 5 C 46.7807 -35.9049 0 0
>> M  V30 6 C 46.7807 -37.085 0 0
>> M  V30 7 O 45.7553 -34.1382 0 0
>> M  V30 END ATOM
>> M  V30 BEGIN BOND
>> M  V30 1 2 2 1
>> M  V30 2 1 3 2
>> M  V30 3 2 4 3
>> M  V30 4 1 5 4
>> M  V30 5 1 1 6
>> M  V30 6 2 6 5
>> M  V30 7 7 4 7
>> M  V30 END BOND
>> M  V30 END CTAB
>> M  END
>> """
>> qb_single_or_aromatic = """
>>   ACCLDraw07262216372D
>>
>>   0  0  0     0  0            999 V3000
>> M  V30 BEGIN CTAB
>> M  V30 COUNTS 7 7 0 0 0
>> M  V30 BEGIN ATOM
>> M  V30 1 C 45.7538 -37.6779 0 0
>> M  V30 2 C 44.7323 -37.0872 0 0
>> M  V30 3 C 44.7323 -35.9099 0 0
>> M  V30 4 C 45.7553 -35.3193 0 0
>> M  V30 5 C 46.7807 -35.9049 0 0
>> M  V30 6 C 46.7807 -37.085 0 0
>> M  V30 7 O 45.7553 -34.1382 0 0
>> M  V30 END ATOM
>> M  V30 BEGIN BOND
>> M  V30 1 2 2 1
>> M  V30 2 1 3 2
>> M  V30 3 2 4 3
>> M  V30 4 1 5 4
>> M  V30 5 1 1 6
>> M  V30 6 2 6 5
>> M  V30 7 6 4 7
>> M  V30 END BOND
>> M  V30 END CTAB
>> M  END
>> """
>> qb_aromatic = """
>>   ACCLDraw07262216372D
>>
>>   0  0  0     0  0            999 V3000
>> M  V30 BEGIN CTAB
>> M  V30 COUNTS 7 7 0 0 0
>> M  V30 BEGIN ATOM
>> M  V30 1 C 45.7538 -37.6779 0 0
>> M  V30 2 C 44.7323 -37.0872 0 0
>> M  V30 3 C 44.7323 -35.9099 0 0
>> M  V30 4 C 45.7553 -35.3193 0 0
>> M  V30 5 C 46.7807 -35.9049 0 0
>> M  V30 6 C 46.7807 -37.085 0 0
>> M  V30 7 O 45.7553 -34.1382 0 0
>> M  V30 END ATOM
>> M  V30 BEGIN BOND
>> M  V30 1 2 2 1
>> M  V30 2 1 3 2
>> M  V30 3 2 4 3
>> M  V30 4 1 5 4
>> M  V30 5 1 1 6
>> M  V30 6 2 6 5
>> M  V30 7 4 4 7
>> M  V30 END BOND
>> M  V30 END CTAB
>> M  END
>> """
>> qb_single_or_double = """
>>   ACCLDraw07262216372D
>>
>>   0  0  0     0  0            999 V3000
>> M  V30 BEGIN CTAB
>> M  V30 COUNTS 7 7 0 0 0
>> M  V30 BEGIN ATOM
>> M  V30 1 C 45.7538 -37.6779 0 0
>> M  V30 2 C 44.7323 -37.0872 0 0
>> M  V30 3 C 44.7323 -35.9099 0 0
>> M  V30 4 C 45.7553 -35.3193 0 0
>> M  V30 5 C 46.7807 -35.9049 0 0
>> M  V30 6 C 46.7807 -37.085 0 0
>> M  V30 7 O 45.7553 -34.1382 0 0
>> M  V30 END ATOM
>> M  V30 BEGIN BOND
>> M  V30 1 2 2 1
>> M  V30 2 1 3 2
>> M  V30 3 2 4 3
>> M  V30 4 1 5 4
>> M  V30 5 1 1 6
>> M  V30 6 2 6 5
>> M  V30 7 5 4 7
>> M  V30 END BOND
>> M  V30 END CTAB
>> M  END
>> """
>> mb = """
>>   ACCLDraw07262216212D
>>
>>   0  0  0     0  0            999 V3000
>> M  V30 BEGIN CTAB
>> M  V30 COUNTS 10 11 0 0 0
>> M  V30 BEGIN ATOM
>> M  V30 1 O 4.9598 -34.3327 0 0
>> M  V30 2 O 3.4666 -32.8272 0 0
>> M  V30 3 C 6.9426 -35.2057 0 0
>> M  V30 4 C 4.5926 -33.1985 0 0
>> M  V30 5 C 6.143 -34.3327 0 0
>> M  V30 6 C 8.0972 -34.9529 0 0
>> M  V30 7 C 6.5101 -33.1985 0 0
>> M  V30 8 C 8.4562 -33.8227 0 0
>> M  V30 9 N 5.5514 -32.509 0 0 CFG=3
>> M  V30 10 C 7.6606 -32.9537 0 0
>> M  V30 END ATOM
>> M  V30 BEGIN BOND
>> M  V30 1 1 1 4
>> M  V30 2 2 4 2
>> M  V30 3 1 5 3
>> M  V30 4 1 5 1
>> M  V30 5 2 3 6
>> M  V30 6 2 5 7
>> M  V30 7 1 6 8
>> M  V30 8 1 7 9
>> M  V30 9 1 9 4
>> M  V30 10 1 7 10
>> M  V30 11 2 10 8
>> M  V30 END BOND
>> M  V30 END CTAB
>> M  END
>> """
>> m = Chem.MolFromMolBlock(mb)
>>
>> q_double_or_aromatic = Chem.MolFromMolBlock(qb_double_or_aromatic)
>> print(m.HasSubstructMatch(q_double_or_aromatic))
>>
>> q_single_or_aromatic = Chem.MolFromMolBlock(qb_single_or_aromatic)
>> print(m.HasSubstructMatch(q_single_or_aromatic))
>>
>> q_aromatic = Chem.MolFromMolBlock(qb_aromatic)
>> print(m.HasSubstructMatch(q_aromatic))
>>
>> q_single_or_double = Chem.MolFromMolBlock(qb_single_or_double)
>> print(m.HasSubstructMatch(q_single_or_double))
>>
>> >>> 2022.03.2
>> >>> True
>> >>> True
>> >>> False
>> >>> False
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
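For reference, the workaround quoted above (note which bonds are aromatic before sanitization, sanitize, then swap those bonds for aromatic query bonds) can be condensed into one helper. This is only a minimal sketch: the function name `sanitize_keeping_aromatic_bonds` is made up for illustration and is not part of RDKit, and it assumes the query was read with `sanitize=False` so that the original aromatic flags are still present.

```python
from rdkit import Chem


def sanitize_keeping_aromatic_bonds(query):
    """Return a sanitized copy of `query` in which every bond that was
    flagged aromatic before sanitization is an aromatic query bond.

    Hypothetical helper, not an RDKit function.
    """
    # Record the aromatic bonds before sanitization has a chance to change them.
    aromatic_before = [b.GetIdx() for b in query.GetBonds() if b.GetIsAromatic()]

    # Work on an editable copy so the caller's molecule is left untouched.
    rw = Chem.RWMol(query)
    Chem.SanitizeMol(rw)

    # Generic aromatic query bond, taken from a two-atom SMARTS pattern.
    aromatic_query_bond = Chem.MolFromSmarts("*:*").GetBondWithIdx(0)
    for idx in aromatic_before:
        rw.ReplaceBond(idx, aromatic_query_bond)

    return rw.GetMol()
```

With the names from the thread it would be called as `q_fixed = sanitize_keeping_aromatic_bonds(Chem.MolFromMolBlock(qb_aromatic, sanitize=False))`, after which `m.HasSubstructMatch(q_fixed)` should behave like the step-by-step version. Note that every previously aromatic bond, ring bonds included, ends up as the generic `*:*` aromatic query bond.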