Re: [Rdkit-discuss] Changes in morgan fingerprint code?
Hi Eric,

That would be due to the fix for this bug:
https://github.com/rdkit/rdkit/issues/5036

If you were generating the fingerprints on "normal" (i.e. hydrogen-suppressed) graphs, you wouldn't notice this one, but the fact that you add the Hs before generating the fingerprint causes you to notice it.

Just as an FYI: the best easy way, by far, to keep track of whether or not you've seen a particular molecule is to use the SMILES.

-greg

On Fri, Jan 13, 2023 at 6:27 AM Eric Jonas wrote:
> Hello! I use the crc of morgan fingerprints as a quick-and-dirty way to
> keep track of different molecules, but now I realize it might have been too
> quick and dirty! [...]

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
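Greg's suggestion to key on the SMILES could look roughly like the sketch below. This is not code from the thread: the helper name `molecule_key` is hypothetical, and removing explicit Hs before writing the SMILES is an assumption made here so that a molecule gets the same key whether or not `Chem.AddHs` was called on it.

```python
from rdkit import Chem

def molecule_key(mol):
    """Return a canonical SMILES to use as a lookup key (hypothetical helper)."""
    # Assumption: explicit Hs carry no extra information for this use case,
    # so strip them to get one key with or without Chem.AddHs().
    return Chem.MolToSmiles(Chem.RemoveHs(mol))

seen = set()
# Two different SMILES spellings of the molecule from the original post.
for smi in ['Oc1cc(O)c(O)c(O)c1', 'c1c(O)cc(O)c(O)c1O']:
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    key = molecule_key(mol)
    print(key, 'already seen' if key in seen else 'new')
    seen.add(key)
```

Because the key is a canonical SMILES rather than a hash of a fingerprint, it stays human-readable and does not depend on fingerprint internals.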
[Rdkit-discuss] Changes in morgan fingerprint code?
Hello!

I use the crc of morgan fingerprints as a quick-and-dirty way to keep track of different molecules, but now I realize it might have been too quick and dirty! In particular, there appears to have been a change in the morgan code sometime between 2021.09.2 and 2022.03.5. The following code produces different output under these versions:

import rdkit.Chem
import pickle
from rdkit import Chem

import rdkit.Chem.rdMolDescriptors
import zlib

def get_morgan4_crc32(m):
    mf = Chem.rdMolDescriptors.GetHashedMorganFingerprint(m, 4)
    morgan4_crc32 = zlib.crc32(mf.ToBinary())
    return morgan4_crc32

mol = Chem.AddHs(Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1'))
print(get_morgan4_crc32(mol))

2021.09.2 : 1567135676
2022.03.5 : 204854560

I tried looking at the release notes but I didn't seem to see any breaking changes (I might have missed them!) and I tried looking at "blame" for the relevant source but didn't see any seemingly-substantive changes within the relevant timeframe.

So am I doing something crazy here, or did something change deliberately, or is it possible this is a bug?

...E
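The reply in this thread attributes the difference to a bug fix that only shows up when Hs are added before fingerprinting. As a minimal sketch (the helper name `morgan_crc32` is hypothetical), the same CRC can be computed on both the hydrogen-suppressed graph and the H-added molecule, so the two values can be compared across RDKit installations. No expected output is shown because the numbers depend on the installed release, and newer releases may emit a deprecation warning for this fingerprint function.

```python
import zlib

from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

def morgan_crc32(mol, radius=4):
    """CRC32 of the hashed Morgan count fingerprint, as in the post above."""
    fp = rdMolDescriptors.GetHashedMorganFingerprint(mol, radius)
    return zlib.crc32(fp.ToBinary())

mol = Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1')

# Hydrogen-suppressed graph: per the reply, not affected by the bug fix.
crc_no_hs = morgan_crc32(mol)
# Explicit Hs added, as in the original post.
crc_with_hs = morgan_crc32(Chem.AddHs(mol))

print('hydrogen-suppressed:', crc_no_hs)
print('explicit Hs:        ', crc_with_hs)
```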
Re: [Rdkit-discuss] Question about tautomer hash
Hi Greg,

Actually, sorry, I forgot I had another question. How about cis/trans imines? I would expect them to have different tautomer hashes, but they don't, e.g.

m1 = Chem.MolFromSmiles('C\\N=C\\C')
m2 = Chem.MolFromSmiles('C/N=C\\C')
h1 = RegistrationHash.GetMolLayers(m1)
h2 = RegistrationHash.GetMolLayers(m2)
print(f'>> {h1[HashLayer.TAUTOMER_HASH]}')
print(f'>> {h2[HashLayer.TAUTOMER_HASH]}')

>> C[CH][N]C_0_0
>> C[CH][N]C_0_0

Thanks!

Susan

On Thu, Jan 12, 2023 at 9:22 AM Susan Leung wrote:
> Hi Greg,
>
> Thanks very much, I suspected as much!
>
> Susan
>
> On Thu, Jan 12, 2023 at 5:45 AM Greg Landrum wrote:
>> Hi Susan,
>>
>> The current version of the tautomer hash doesn't do keto-enol tautomerism
>> (your first example). It would be worthwhile for us to add this as an
>> option, but it's not currently available.
>>
>> -greg
>>
>> On Wed, Jan 11, 2023 at 3:04 PM Susan Leung wrote:
>>> Hi all,
>>>
>>> I am trying out the new registration hash and have a question about the
>>> tautomer hash. I think these two molecules (m1 and m2) should have the same
>>> tautomer hash but they are different. However, molecules m3 and m4 have the
>>> same hash. Please can you explain?
>>>
>>> import rdkit
>>> from rdkit import Chem
>>> from rdkit.Chem import Draw
>>> from rdkit.Chem import RegistrationHash
>>> from rdkit.Chem.RegistrationHash import HashLayer
>>>
>>> print(f'>> {rdkit.__version__}')
>>>
>>> m1 = Chem.MolFromSmiles('C=C(O)C')
>>> m2 = Chem.MolFromSmiles('CC(=O)C')
>>> h1 = RegistrationHash.GetMolLayers(m1)
>>> h2 = RegistrationHash.GetMolLayers(m2)
>>> print(f'>> {h1[HashLayer.TAUTOMER_HASH]}')
>>> print(f'>> {h2[HashLayer.TAUTOMER_HASH]}')
>>>
>>> m3 = Chem.MolFromSmiles('N=C(O)C')
>>> m4 = Chem.MolFromSmiles('NC(=O)C')
>>> h3 = RegistrationHash.GetMolLayers(m3)
>>> h4 = RegistrationHash.GetMolLayers(m4)
>>> print(f'>> {h3[HashLayer.TAUTOMER_HASH]}')
>>> print(f'>> {h4[HashLayer.TAUTOMER_HASH]}')
>>>
>>> >> 2022.09.1
>>> >> [CH2][C](C)[O]_1_0
>>> >> C[C](C)[O]_0_0
>>> >> C[C]([N])[O]_2_0
>>> >> C[C]([N])[O]_2_0
>>>
>>> Thanks!
>>>
>>> Susan
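For the cis/trans imine question at the top of this message, one way to see what the registration hash distinguishes is to compare every layer returned by `GetMolLayers`, not only `TAUTOMER_HASH`. The sketch below is illustrative only: `differing_layers` is a hypothetical helper, and no output is shown because which layers differ may depend on the RDKit version.

```python
from rdkit import Chem
from rdkit.Chem import RegistrationHash

def differing_layers(mol_a, mol_b):
    """Names of the registration-hash layers whose values differ (hypothetical helper)."""
    layers_a = RegistrationHash.GetMolLayers(mol_a)
    layers_b = RegistrationHash.GetMolLayers(mol_b)
    # Both dicts are keyed by HashLayer members; compare them layer by layer.
    return [layer.name for layer in layers_a
            if layers_a[layer] != layers_b[layer]]

# The two imine SMILES from the question above.
m1 = Chem.MolFromSmiles('C\\N=C\\C')
m2 = Chem.MolFromSmiles('C/N=C\\C')

print(differing_layers(m1, m2))
```

If a stereo-sensitive layer shows up in the printed list, that layer rather than the tautomer hash is what separates the two isomers in the full registration hash.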
Re: [Rdkit-discuss] Question about tautomer hash
Hi Greg,

Thanks very much, I suspected as much!

Susan

On Thu, Jan 12, 2023 at 5:45 AM Greg Landrum wrote:
> Hi Susan,
>
> The current version of the tautomer hash doesn't do keto-enol tautomerism
> (your first example). It would be worthwhile for us to add this as an
> option, but it's not currently available.
>
> -greg
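Greg's reply in this thread says the tautomer hash does not treat keto-enol pairs as equivalent. If such pairs need to be matched, one possible workaround is to canonicalize the tautomer first with RDKit's `rdMolStandardize.TautomerEnumerator` and compare the resulting SMILES. This is a sketch under that assumption, not something proposed in the thread, and `canonical_tautomer_smiles` is a hypothetical helper name.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# One enumerator instance can be reused across molecules.
enumerator = rdMolStandardize.TautomerEnumerator()

def canonical_tautomer_smiles(smiles):
    """SMILES of the canonical tautomer chosen by RDKit (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(enumerator.Canonicalize(mol))

# The enol/keto pair (m1 and m2) from the original question.
enol = canonical_tautomer_smiles('C=C(O)C')
keto = canonical_tautomer_smiles('CC(=O)C')

print(enol, keto, enol == keto)
```

Tautomer canonicalization picks a single representative by a scoring scheme, so it answers "are these the same compound up to tautomerism?" but it is a separate step from the registration hash layers.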