Re: [Rdkit-discuss] Changes in morgan fingerprint code?

2023-01-12 Thread Greg Landrum
Hi Eric,

That would be due to the fix for this bug:
https://github.com/rdkit/rdkit/issues/5036
If you were generating the fingerprints on "normal" (i.e.
hydrogen-suppressed) graphs, you wouldn't notice this one, but the fact
that you add the Hs before generating the fingerprint causes you to notice
it.
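
A minimal sketch of the comparison described above, assuming the same hashing approach as in the original post (the helper name morgan4_crc32 is made up for illustration). It computes the CRC32 once on the hydrogen-suppressed graph and once after AddHs, and prints both next to the RDKit version so that runs under different releases can be compared. It does not itself show which of the two values changes between releases.

```python
import zlib

import rdkit
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors


def morgan4_crc32(mol):
    # CRC32 of the serialized hashed Morgan fingerprint, radius 4 (as in the original post)
    fp = rdMolDescriptors.GetHashedMorganFingerprint(mol, 4)
    return zlib.crc32(fp.ToBinary())


base = Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1')  # hydrogen-suppressed graph
with_hs = Chem.AddHs(base)                        # explicit Hs, as in the original post

crc_base = morgan4_crc32(base)
crc_with_hs = morgan4_crc32(with_hs)

# Record the version with the values; the hash is only comparable within one release.
print(rdkit.__version__, crc_base, crc_with_hs)
```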

Just as an FYI: by far the easiest reliable way to keep track of whether or
not you've seen a particular molecule is to use its canonical SMILES.
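
A small sketch of that idea, assuming the molecules may carry explicit Hs as in the original post. The helper mol_key and the example inputs are illustrative only; stripping the Hs before writing the SMILES is an assumption made here so that the key does not depend on whether AddHs was called.

```python
from rdkit import Chem


def mol_key(mol):
    # Canonical SMILES of the hydrogen-suppressed molecule, used as a lookup key
    return Chem.MolToSmiles(Chem.RemoveHs(mol))


seen = set()
new_flags = []

# The first two inputs are the same molecule written with different atom orderings.
for smi in ['Oc1cc(O)c(O)c(O)c1', 'c1c(O)cc(O)c(O)c1O', 'CCO']:
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    key = mol_key(mol)
    new_flags.append(key not in seen)  # True if this molecule has not been seen yet
    seen.add(key)

print(new_flags)
```

Unlike a CRC of a fingerprint, the key is human-readable and two different molecules cannot collide on it by accident.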

-greg


On Fri, Jan 13, 2023 at 6:27 AM Eric Jonas wrote:

> Hello! I use the crc of morgan fingerprints as a quick-and-dirty way to
> keep track of different molecules, but now I realize it might have been too
> quick and dirty! In particular, there appears to have been a change in the
> morgan code sometime between 2021.09.2 and 2022.03.5. The following code
> produces different output under these versions:
>
> import zlib
>
> from rdkit import Chem
> from rdkit.Chem import rdMolDescriptors
>
>
> def get_morgan4_crc32(m):
>     # CRC32 of the serialized hashed Morgan fingerprint (radius 4)
>     mf = rdMolDescriptors.GetHashedMorganFingerprint(m, 4)
>     morgan4_crc32 = zlib.crc32(mf.ToBinary())
>     return morgan4_crc32
>
>
> # Explicit Hs are added before fingerprinting
> mol = Chem.AddHs(Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1'))
> print(get_morgan4_crc32(mol))
>
> 2021.09.2 : 1567135676
> 2022.03.5 : 204854560
>
> I tried looking at the release notes but I didn't seem to see any breaking
> changes (I might have missed them!) and I tried looking at "blame" for the
> relevant source but didn't see any seemingly-substantive changes within the
> relevant timeframe.
>
> So am I doing something crazy here, or did something change deliberately,
> or is it possible this is a bug?
>
> ...E
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


[Rdkit-discuss] Changes in morgan fingerprint code?

2023-01-12 Thread Eric Jonas
Hello! I use the crc of morgan fingerprints as a quick-and-dirty way to
keep track of different molecules, but now I realize it might have been too
quick and dirty! In particular, there appears to have been a change in the
morgan code sometime between 2021.09.2 and 2022.03.5. The following code
produces different output under these versions:

import zlib

from rdkit import Chem
from rdkit.Chem import rdMolDescriptors


def get_morgan4_crc32(m):
    # CRC32 of the serialized hashed Morgan fingerprint (radius 4)
    mf = rdMolDescriptors.GetHashedMorganFingerprint(m, 4)
    morgan4_crc32 = zlib.crc32(mf.ToBinary())
    return morgan4_crc32


# Explicit Hs are added before fingerprinting
mol = Chem.AddHs(Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1'))
print(get_morgan4_crc32(mol))

2021.09.2 : 1567135676
2022.03.5 : 204854560

I tried looking at the release notes but I didn't seem to see any breaking
changes (I might have missed them!) and I tried looking at "blame" for the
relevant source but didn't see any seemingly-substantive changes within the
relevant timeframe.

So am I doing something crazy here, or did something change deliberately,
or is it possible this is a bug?

...E


Re: [Rdkit-discuss] Question about tautomer hash

2023-01-12 Thread Susan Leung
Hi Greg,

Actually, sorry I forgot I had another question.

How about cis/trans imines? I would expect them to have different tautomer
hashes, but they don't, e.g.:

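from rdkit import Chem
from rdkit.Chem import RegistrationHash
from rdkit.Chem.RegistrationHash import HashLayer

# Same imine with opposite double-bond stereo
m1 = Chem.MolFromSmiles('C\\N=C\\C')
m2 = Chem.MolFromSmiles('C/N=C\\C')
h1 = RegistrationHash.GetMolLayers(m1)
h2 = RegistrationHash.GetMolLayers(m2)
print(f'>> {h1[HashLayer.TAUTOMER_HASH]}')
print(f'>> {h2[HashLayer.TAUTOMER_HASH]}')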

>> C[CH][N]C_0_0
>> C[CH][N]C_0_0
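
To see where, if anywhere, the two isomers are told apart, one option is to compare every layer returned by GetMolLayers rather than only the tautomer hash. This is just a sketch: it assumes GetMolLayers returns a dict keyed by HashLayer members (as the indexing above suggests), and it makes no claim about which layers will differ in a given RDKit version.

```python
from rdkit import Chem
from rdkit.Chem import RegistrationHash
from rdkit.Chem.RegistrationHash import HashLayer

m1 = Chem.MolFromSmiles('C\\N=C\\C')
m2 = Chem.MolFromSmiles('C/N=C\\C')
layers1 = RegistrationHash.GetMolLayers(m1)
layers2 = RegistrationHash.GetMolLayers(m2)

# Report, layer by layer, whether the two isomers get the same value.
for layer in HashLayer:
    same = layers1.get(layer) == layers2.get(layer)
    print(f'{layer.name:28s} {"same" if same else "DIFFERENT"}')
```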

Thanks!

Susan

On Thu, Jan 12, 2023 at 9:22 AM Susan Leung wrote:

> Hi Greg,
>
> Thanks very much, I suspected as much!
>
> Susan
>
> On Thu, Jan 12, 2023 at 5:45 AM Greg Landrum wrote:
>
>> Hi Susan,
>>
>> The current version of the tautomer hash doesn't do keto-enol tautomerism
>> (your first example). It would be worthwhile for us to add this as an
>> option, but it's not currently available.
>>
>> -greg
>>
>>
>> On Wed, Jan 11, 2023 at 3:04 PM Susan Leung wrote:
>>
>>> Hi all,
>>>
>>> I am trying out the new registration hash and have a question about the
>>> tautomer hash. I think these two molecules (m1 and m2) should have the same
>>> tautomer hash, but they are different. However, molecules m3 and m4 have the
>>> same hash. Can you please explain?
>>>
>>> import rdkit
>>> from rdkit import Chem
>>> from rdkit.Chem import RegistrationHash
>>> from rdkit.Chem.RegistrationHash import HashLayer
>>>
>>> print(f'>> {rdkit.__version__}')
>>>
>>> # Enol / keto pair
>>> m1 = Chem.MolFromSmiles('C=C(O)C')
>>> m2 = Chem.MolFromSmiles('CC(=O)C')
>>> h1 = RegistrationHash.GetMolLayers(m1)
>>> h2 = RegistrationHash.GetMolLayers(m2)
>>> print(f'>> {h1[HashLayer.TAUTOMER_HASH]}')
>>> print(f'>> {h2[HashLayer.TAUTOMER_HASH]}')
>>>
>>> # Imidic acid / amide pair
>>> m3 = Chem.MolFromSmiles('N=C(O)C')
>>> m4 = Chem.MolFromSmiles('NC(=O)C')
>>> h3 = RegistrationHash.GetMolLayers(m3)
>>> h4 = RegistrationHash.GetMolLayers(m4)
>>> print(f'>> {h3[HashLayer.TAUTOMER_HASH]}')
>>> print(f'>> {h4[HashLayer.TAUTOMER_HASH]}')
>>>
>>> >> 2022.09.1
>>> >> [CH2][C](C)[O]_1_0
>>> >> C[C](C)[O]_0_0
>>> >> C[C]([N])[O]_2_0
>>> >> C[C]([N])[O]_2_0
>>>
>>> Thanks!
>>>
>>> Susan
>>>
>>


Re: [Rdkit-discuss] Question about tautomer hash

2023-01-12 Thread Susan Leung
Hi Greg,

Thanks very much, I suspected as much!

Susan

On Thu, Jan 12, 2023 at 5:45 AM Greg Landrum wrote:

> Hi Susan,
>
> The current version of the tautomer hash doesn't do keto-enol tautomerism
> (your first example). It would be worthwhile for us to add this as an
> option, but it's not currently available.
>
> -greg
>
>
> On Wed, Jan 11, 2023 at 3:04 PM Susan Leung wrote:
>
>> Hi all,
>>
>> I am trying out the new registration hash and have a question about the
>> tautomer hash. I think these two molecules (m1 and m2) should have the same
>> tautomer hash, but they are different. However, molecules m3 and m4 have the
>> same hash. Can you please explain?
>>
>> import rdkit
>> from rdkit import Chem
>> from rdkit.Chem import RegistrationHash
>> from rdkit.Chem.RegistrationHash import HashLayer
>>
>> print(f'>> {rdkit.__version__}')
>>
>> # Enol / keto pair
>> m1 = Chem.MolFromSmiles('C=C(O)C')
>> m2 = Chem.MolFromSmiles('CC(=O)C')
>> h1 = RegistrationHash.GetMolLayers(m1)
>> h2 = RegistrationHash.GetMolLayers(m2)
>> print(f'>> {h1[HashLayer.TAUTOMER_HASH]}')
>> print(f'>> {h2[HashLayer.TAUTOMER_HASH]}')
>>
>> # Imidic acid / amide pair
>> m3 = Chem.MolFromSmiles('N=C(O)C')
>> m4 = Chem.MolFromSmiles('NC(=O)C')
>> h3 = RegistrationHash.GetMolLayers(m3)
>> h4 = RegistrationHash.GetMolLayers(m4)
>> print(f'>> {h3[HashLayer.TAUTOMER_HASH]}')
>> print(f'>> {h4[HashLayer.TAUTOMER_HASH]}')
>>
>> >> 2022.09.1
>> >> [CH2][C](C)[O]_1_0
>> >> C[C](C)[O]_0_0
>> >> C[C]([N])[O]_2_0
>> >> C[C]([N])[O]_2_0
>>
>> Thanks!
>>
>> Susan
>>
>