Thanks a bunch, Greg, for the very helpful explanation! Things make more
sense now.

On Wed, May 20, 2020 at 12:51 AM Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Hao,
>
> Good question! I had to do a bit of digging to figure that out.
>
> Here's what's going on:
> The Morgan fingerprint code uses CIP codes when you set useChirality=True.
> The atomic CIP codes are stored as an atom property (_CIPCode).
> When you use the multiprocessing module, everything ends up being pickled
> and sent to the individual workers in the pool.
> By default, when you pickle RDKit molecules, the properties (things you
> access via GetProp()) are not preserved.
> So when you call a function using multiprocessing, the CIP information
> doesn't make it through to the function you call, and you don't see any
> difference between different stereoisomers.
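The same mechanism can be reproduced without RDKit. This stdlib-only toy is a hypothetical model, not RDKit code: `ToyMol`, `assign_labels` and `chiral_key` are made-up names, and the R/S mapping is arbitrary rather than real CIP perception. It only shows that whatever a class leaves out of its pickle arrives empty on the other side, so a consumer that trusts the cached labels can no longer tell the two isomers apart.

```python
import pickle


class ToyMol:
    """Hypothetical stand-in for an RDKit Mol, NOT the real class.

    chiral_tags plays the role of the stored structure (chiral tags do
    survive pickling); props plays the role of the GetProp() property
    store where CIP codes are cached.
    """

    def __init__(self, chiral_tags):
        self.chiral_tags = list(chiral_tags)
        self.props = {}

    def __getstate__(self):
        # Mimic the default behavior described above: only the structure
        # is written to the pickle, the property store is left out.
        return {"chiral_tags": self.chiral_tags}

    def __setstate__(self, state):
        self.chiral_tags = state["chiral_tags"]
        self.props = {}  # arrives empty on the other side


def assign_labels(mol):
    # Placeholder for CIP assignment. The tag -> label mapping is arbitrary,
    # NOT real CIP rules; it only has to differ between the two tags.
    mol.props["cip"] = ["R" if tag == "CCW" else "S" for tag in mol.chiral_tags]


def chiral_key(mol):
    # Toy "fingerprint" that, like the pre-2018.09 Morgan code, reads the
    # cached labels and never recomputes them; missing labels look achiral.
    return tuple(mol.props.get("cip", ()))


a, b = ToyMol(["CCW"]), ToyMol(["CW"])
for m in (a, b):
    assign_labels(m)
print(chiral_key(a) != chiral_key(b))  # the serial run tells them apart

# multiprocessing performs this round trip on every argument it sends
a2, b2 = (pickle.loads(pickle.dumps(m)) for m in (a, b))
print(chiral_key(a2) == chiral_key(b2))  # both are () after unpickling
```

The chiral tags come through intact (just as the SMILES printed inside the pool still carry @/@@), but the derived labels do not.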
>
> The fix to #1993 (https://github.com/rdkit/rdkit/issues/1993), which was
> part of the 2018.09 release, modified the Morgan fingerprinting code so
> that it re-assigns stereochemistry when that information is not present
> already.
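For anyone who has to stay on a release older than 2018.09, the explanation above suggests two workarounds, sketched here under some assumptions: `chiral_fp_bits` is a hypothetical helper, the explicit `pickle` round trip stands in for what `Pool` does to each argument, and I have not checked which older release first shipped `Chem.SetDefaultPickleProperties` / `Chem.PropertyPickleOptions`. `Chem.AssignStereochemistry(..., force=True)` recomputes the CIP labels inside the worker; `PropertyPickleOptions.AllProps` makes the labels travel inside the pickle instead.

```python
import pickle

from rdkit import Chem
from rdkit.Chem import AllChem


def chiral_fp_bits(mol, nbits=2048):
    """Hypothetical helper: on-bits of a chirality-aware Morgan fingerprint.

    Workaround 1: force stereochemistry (and with it the _CIPCode atom
    properties) to be re-assigned inside the worker, roughly what the
    2018.09 fix does automatically when that information is missing.
    """
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    fp = AllChem.GetMorganFingerprintAsBitVect(
        mol, radius=2, nBits=nbits, useChirality=True)
    return set(fp.GetOnBits())


mols = [Chem.MolFromSmiles(s) for s in ("N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O")]

# Workaround 1 alone: even pickles carrying no properties at all work.
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.NoProps)
bare = [pickle.loads(pickle.dumps(m)) for m in mols]  # what Pool does
print(chiral_fp_bits(bare[0]) != chiral_fp_bits(bare[1]))

# Workaround 2: write every property, including private/computed ones such
# as _CIPCode, into pickles. Call this in the parent before using the Pool.
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)
full = [pickle.loads(pickle.dumps(m)) for m in mols]
print(full[0].GetAtomWithIdx(1).HasProp("_CIPCode"))
```

Either one alone should be enough. The first costs one stereo perception per molecule in the worker; the second is a process-wide setting that makes every subsequent Mol pickle larger.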
>
> Best,
> -greg
>
>
> On Tue, May 19, 2020 at 11:53 PM Hao <shenha...@gmail.com> wrote:
>
>> Hello,
>>
>> This was a very strange bug. I was getting inconsistent fingerprints from
>> GetMorganFingerprintAsBitVect with useChirality=True depending on whether I
>> used multiprocessing or ran serially, on RDKit 2017.09.1 and 2018.03.2. It
>> seems to have been fixed in the latest version. Woo! I was just wondering
>> if anyone has any insights into what was causing this, because I was
>> stumped for the longest time. Example:
>>
>> from multiprocessing import Pool
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>>
>> def compute_ecfp_bitvect(mol, ecfp_power=11):
>>     # Print the isomeric SMILES and the on-bits of a chirality-aware
>>     # radius-2 Morgan fingerprint, computing the fingerprint only once.
>>     print(Chem.MolToSmiles(mol, isomericSmiles=True))
>>     fp = AllChem.GetMorganFingerprintAsBitVect(
>>         mol, radius=2, nBits=2 ** ecfp_power, useChirality=True)
>>     print(list(fp.GetOnBits()))
>>     return fp
>>
>> if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
>>     smiles = ["N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"]
>>     mol1 = Chem.MolFromSmiles(smiles[0])
>>     mol2 = Chem.MolFromSmiles(smiles[1])
>>
>>     print("with pool")
>>     with Pool(1) as pool:
>>         # each Mol is pickled on its way to the worker process
>>         list(pool.imap(compute_ecfp_bitvect, [mol1, mol2]))
>>
>>     print("without pool")
>>     for m in [mol1, mol2]:
>>         compute_ecfp_bitvect(m)
>>
>> ===== Output =====
>> with pool
>> C[C@H](N)C(=O)O
>> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
>> C[C@@H](N)C(=O)O
>> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
>> without pool
>> C[C@H](N)C(=O)O
>> [1, 283, 389, 650, 786, 807, 1057, 1112, 1171, 1187, 1844, 1917]
>> C[C@@H](N)C(=O)O
>> [1, 46, 283, 389, 650, 786, 807, 1057, 1113, 1171, 1844, 1917]
>>
>> Thanks and hope everyone is staying healthy!
>> Hao
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>