Hi Hao,

Good question! I had to do a bit of digging to figure that out.

Here's what's going on:
1. The Morgan fingerprint code uses CIP codes when you set useChirality=True.
2. Atomic CIP codes are stored as an atom property (_CIPCode).
3. When you use the multiprocessing module, everything ends up being pickled
   and sent to the individual workers in the pool.
4. By default, when you pickle RDKit molecules, the properties (things you
   access via GetProp()) are not preserved.
5. So when you call a function using multiprocessing, the CIP information
   doesn't make it through to the function you call, and you don't see any
   difference between different stereoisomers.
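To see the pickling step in isolation, here is a minimal sketch that round-trips a molecule through pickle, which is what multiprocessing does when it ships arguments to a worker. The helper name cip_codes is just for illustration. Whether the default round trip keeps the codes depends on the default pickle options of the release you have installed, so I'm not claiming a particular output for that part. The second round trip switches the default to AllProps via Chem.SetDefaultPickleProperties.

```python
import pickle

from rdkit import Chem


def cip_codes(mol):
    """Collect (atom index, _CIPCode) for every atom that carries the property."""
    return [
        (atom.GetIdx(), atom.GetProp("_CIPCode"))
        for atom in mol.GetAtoms()
        if atom.HasProp("_CIPCode")
    ]


mol = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
# Make sure CIP codes have been assigned on the original molecule.
Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
original_codes = cip_codes(mol)

# Round trip with the installed release's default pickle options.
default_copy = pickle.loads(pickle.dumps(mol))
print("original:       ", original_codes)
print("default pickle: ", cip_codes(default_copy))

# Ask RDKit to include all properties in subsequent pickles.
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)
full_copy = pickle.loads(pickle.dumps(mol))
full_codes = cip_codes(full_copy)
print("AllProps pickle:", full_codes)
```

The setting is process-wide, so with a pool it has to be in effect in the process that does the pickling (the parent, for arguments passed to workers).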

The fix to #1993 (https://github.com/rdkit/rdkit/issues/1993), which was
part of the 2018.09 release, modified the Morgan fingerprinting code so
that it re-assigns stereochemistry when that information is not present
already.
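If you are stuck on an older release, one possible workaround is to do the same thing by hand in the worker: re-assign stereochemistry before fingerprinting. The following is only a sketch of that idea, not the code from the fix itself; chiral_on_bits is a hypothetical helper name, and the pickle round trip stands in for what a Pool does to the arguments.

```python
import pickle

from rdkit import Chem
from rdkit.Chem import AllChem


def chiral_on_bits(mol, n_bits=2048):
    """Recompute stereo information, then return the on bits of a chiral Morgan FP."""
    # force=True recomputes the CIP codes even if RDKit thinks they are up to date.
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    fp = AllChem.GetMorganFingerprintAsBitVect(
        mol, radius=2, nBits=n_bits, useChirality=True)
    return list(fp.GetOnBits())


mols = [Chem.MolFromSmiles(s) for s in ("N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O")]

# Fingerprints computed directly in this process.
direct = [chiral_on_bits(m) for m in mols]

# Fingerprints computed on pickled copies, as a pool worker would receive them.
round_tripped = [chiral_on_bits(pickle.loads(pickle.dumps(m))) for m in mols]

print(direct == round_tripped)
```

Passing SMILES strings to the workers and parsing them there is another way to sidestep the problem, since nothing RDKit-specific gets pickled.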

Best,
-greg


On Tue, May 19, 2020 at 11:53 PM Hao <shenha...@gmail.com> wrote:

> Hello,
>
> This was a very strange bug that I saw. I was getting inconsistent
> fingerprints using GetMorganFingerprint with useChirality=True, when I used
> multiprocessing vs when I ran serially on rdkit 2017.09.1 and 2018.03.2. It
> seems to have been fixed in the latest version. Woo! I was just wondering
> if anyone has any insights on what was causing this before because I was
> stumped for the longest time. Example:
>
> from multiprocessing import Pool
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> def compute_ecfp_bitvect(mol, ecfp_power=11):
>     print(Chem.MolToSmiles(mol, isomericSmiles=True))
>     # Compute the chiral Morgan fingerprint once and reuse it.
>     fp = AllChem.GetMorganFingerprintAsBitVect(
>         mol, radius=2, nBits=2 ** ecfp_power, useChirality=True)
>     print(list(fp.GetOnBits()))
>     return fp
>
> smiles = ["N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"]
>
> mol1 = Chem.MolFromSmiles(smiles[0])
> mol2 = Chem.MolFromSmiles(smiles[1])
> print("with pool")
> with Pool(1) as pool:
>     jobs = pool.imap(compute_ecfp_bitvect, [mol1,mol2])
>     list(jobs)
> print("without pool")
> [compute_ecfp_bitvect(m) for m in [mol1,mol2]]
>
> ===== Output =====
> with pool
> C[C@H](N)C(=O)O
> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
> C[C@@H](N)C(=O)O
> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
> without pool
> C[C@H](N)C(=O)O
> [1, 283, 389, 650, 786, 807, 1057, 1112, 1171, 1187, 1844, 1917]
> C[C@@H](N)C(=O)O
> [1, 46, 283, 389, 650, 786, 807, 1057, 1113, 1171, 1844, 1917]
>
> Thanks and hope everyone is staying healthy!
> Hao
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>