Hi Hao,

Good question! I had to do a bit of digging to figure that out.
Here's what's going on:

- The Morgan fingerprint code uses CIP codes when you set useChirality=True.
- Atomic CIP codes are stored as an atomic property.
- When you use the multiprocessing module, everything ends up being pickled and sent to the individual workers in the pool.
- By default, when you pickle RDKit molecules, the properties (things you access via GetProp()) are not preserved.

So when you call a function using multiprocessing, the CIP information doesn't make it through to the function you call and you don't see any difference between different stereoisomers.

The fix to #1993 (https://github.com/rdkit/rdkit/issues/1993), which was part of the 2018.09 release, modified the Morgan fingerprinting code so that it re-assigns stereochemistry when that information is not present already.

Best,
-greg

On Tue, May 19, 2020 at 11:53 PM Hao <shenha...@gmail.com> wrote:

> Hello,
>
> This was a very strange bug that I saw. I was getting inconsistent
> fingerprints using GetMorganFingerprint with useChirality=True, when I used
> multiprocessing vs when I ran serially on rdkit 2017.09.1 and 2018.03.2. It
> seems to have been fixed in the latest version. Woo! I was just wondering
> if anyone has any insights on what was causing this before because I was
> stumped for the longest time.
> Example:
>
> from multiprocessing import Pool
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> def compute_ecfp_bitvect(mol, ecfp_power = 11):
>     print(Chem.MolToSmiles(mol, isomericSmiles=True))
>     print(list(Chem.AllChem.GetMorganFingerprintAsBitVect(mol, radius=2,
>         nBits=2 ** ecfp_power, useChirality=True).GetOnBits()))
>     return Chem.AllChem.GetMorganFingerprintAsBitVect(mol, radius=2,
>         nBits=2 ** ecfp_power, useChirality=True)
>
> smiles = ["N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"]
>
> mol1 = Chem.MolFromSmiles(smiles[0])
> mol2 = Chem.MolFromSmiles(smiles[1])
> print("with pool")
> with Pool(1) as pool:
>     jobs = pool.imap(compute_ecfp_bitvect, [mol1,mol2])
>     list(jobs)
> print("without pool")
> [compute_ecfp_bitvect(m) for m in [mol1,mol2]]
>
> ===== Output =====
> with pool
> C[C@H](N)C(=O)O
> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
> C[C@@H](N)C(=O)O
> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
> without pool
> C[C@H](N)C(=O)O
> [1, 283, 389, 650, 786, 807, 1057, 1112, 1171, 1187, 1844, 1917]
> C[C@@H](N)C(=O)O
> [1, 46, 283, 389, 650, 786, 807, 1057, 1113, 1171, 1844, 1917]
>
> Thanks and hope everyone is staying healthy!
> Hao
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
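
A minimal sketch of the mechanism described above, using pickle directly as a stand-in for what multiprocessing does when it ships a molecule to a worker. The helper names cip_codes and pickle_roundtrip are made up for illustration and are not RDKit functions. The sketch assumes an RDKit build where Chem.SetDefaultPickleProperties is available, and shows two possible workarounds for releases without the #1993 fix: re-assigning stereochemistry on the worker side, or asking RDKit to pickle all properties. What the "default pickle" line prints may depend on the RDKit version and its default pickle settings.

```python
import pickle

from rdkit import Chem


def cip_codes(mol):
    """Hypothetical helper: map atom index -> CIP label for atoms that have one."""
    return {
        atom.GetIdx(): atom.GetProp("_CIPCode")
        for atom in mol.GetAtoms()
        if atom.HasProp("_CIPCode")
    }


def pickle_roundtrip(mol):
    """Hypothetical helper: mimic sending a molecule to a worker process."""
    return pickle.loads(pickle.dumps(mol))


mol = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
print("original:          ", cip_codes(mol))

# With default pickle settings the atom properties may not survive the trip.
print("default pickle:    ", cip_codes(pickle_roundtrip(mol)))

# Workaround 1: recompute the CIP labels on the receiving side.
# The chiral tags themselves are part of the pickled molecule.
recomputed = pickle_roundtrip(mol)
Chem.AssignStereochemistry(recomputed, cleanIt=True, force=True)
print("after reassignment:", cip_codes(recomputed))

# Workaround 2: ask RDKit to include all properties when pickling.
# Note that this changes a process-wide default.
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)
print("AllProps pickle:   ", cip_codes(pickle_roundtrip(mol)))
```

With multiprocessing, the SetDefaultPickleProperties call should be made in the parent process before the molecules are handed to the pool, since the properties are written at pickling time. Another option that sidesteps the issue is to pass SMILES strings to the workers and build the molecules there.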