Thanks a bunch, Greg, for the very helpful explanation! Things make more sense now.
On Wed, May 20, 2020 at 12:51 AM Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Hao,
>
> Good question! I had to do a bit of digging to figure that out.
>
> Here's what's going on:
> - The Morgan fingerprint code uses CIP codes when you set useChirality=True.
> - Atomic CIP codes are stored as an atomic property.
> - When you use the multiprocessing module, everything ends up being pickled
>   and sent to the individual workers in the pool.
> - By default, when you pickle RDKit molecules, the properties (things you
>   access via GetProp()) are not preserved.
> - So when you call a function using multiprocessing, the CIP information
>   doesn't make it through to the function you call, and you don't see any
>   difference between different stereoisomers.
>
> The fix to #1993 (https://github.com/rdkit/rdkit/issues/1993), which was
> part of the 2018.09 release, modified the Morgan fingerprinting code so
> that it re-assigns stereochemistry when that information is not already
> present.
>
> Best,
> -greg
>
>
> On Tue, May 19, 2020 at 11:53 PM Hao <shenha...@gmail.com> wrote:
>
>> Hello,
>>
>> This was a very strange bug. With RDKit 2017.09.1 and 2018.03.2, I was
>> getting different fingerprints from GetMorganFingerprint with
>> useChirality=True when I used multiprocessing than when I ran serially.
>> It seems to have been fixed in the latest version. Woo! I was just
>> wondering if anyone has any insights into what was causing this, because
>> I was stumped for the longest time.
>>
>> Example:
>>
>> from multiprocessing import Pool
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>>
>>
>> def compute_ecfp_bitvect(mol, ecfp_power=11):
>>     print(Chem.MolToSmiles(mol, isomericSmiles=True))
>>     # Compute the chiral Morgan fingerprint once, then print its on-bits.
>>     fp = AllChem.GetMorganFingerprintAsBitVect(
>>         mol, radius=2, nBits=2 ** ecfp_power, useChirality=True)
>>     print(list(fp.GetOnBits()))
>>     return fp
>>
>>
>> # The __main__ guard is required where workers are started by spawning
>> # a fresh interpreter (the default on Windows and on macOS).
>> if __name__ == "__main__":
>>     smiles = ["N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"]
>>     mol1 = Chem.MolFromSmiles(smiles[0])
>>     mol2 = Chem.MolFromSmiles(smiles[1])
>>
>>     print("with pool")
>>     with Pool(1) as pool:
>>         jobs = pool.imap(compute_ecfp_bitvect, [mol1, mol2])
>>         list(jobs)
>>
>>     print("without pool")
>>     [compute_ecfp_bitvect(m) for m in [mol1, mol2]]
>>
>> ===== Output =====
>> with pool
>> C[C@H](N)C(=O)O
>> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
>> C[C@@H](N)C(=O)O
>> [1, 283, 389, 537, 650, 786, 807, 1057, 1119, 1171, 1844, 1917]
>> without pool
>> C[C@H](N)C(=O)O
>> [1, 283, 389, 650, 786, 807, 1057, 1112, 1171, 1187, 1844, 1917]
>> C[C@@H](N)C(=O)O
>> [1, 46, 283, 389, 650, 786, 807, 1057, 1113, 1171, 1844, 1917]
>>
>> Thanks and hope everyone is staying healthy!
>> Hao
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
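The mechanism Greg describes can be illustrated without RDKit at all. What follows is a toy model, not RDKit code: ToyMol, assign_labels and toy_fingerprint are hypothetical stand-ins. It assumes, as Greg explains, that computed per-atom properties such as CIP codes are dropped when a molecule is pickled, and it models the #1993 fix as "re-perceive the labels if they are missing". A pickle round trip stands in for what multiprocessing does when it ships arguments to a worker.

```python
import pickle


class ToyMol:
    """Hypothetical stand-in for an RDKit molecule (not the RDKit API).

    `props` plays the role of computed atom properties such as CIP codes.
    """

    def __init__(self, smiles):
        self.smiles = smiles
        self.props = {}

    def __getstate__(self):
        # Mimic the default pickling behaviour described in the thread:
        # the molecule itself survives, its computed properties do not.
        state = self.__dict__.copy()
        state["props"] = {}
        return state


def assign_labels(mol):
    """Hypothetical stereo perception: read a crude label off the SMILES."""
    mol.props["label"] = "@@" if "@@" in mol.smiles else "@"


def toy_fingerprint(mol, reassign_if_missing=False):
    """Hypothetical fingerprint: one bit per character, plus a chirality
    bit that is only present when a stereo label has been assigned."""
    if reassign_if_missing and "label" not in mol.props:
        # Analogous to the #1993 fix: re-perceive instead of silently
        # computing a fingerprint without stereo information.
        assign_labels(mol)
    bits = set(mol.smiles.replace("@", ""))
    if "label" in mol.props:
        bits.add("chiral" + mol.props["label"])
    return frozenset(bits)


def round_trip(mol):
    # multiprocessing pickles arguments on their way to a worker process.
    return pickle.loads(pickle.dumps(mol))


mol_a = ToyMol("N[C@@H](C)C(=O)O")
mol_b = ToyMol("N[C@H](C)C(=O)O")
for m in (mol_a, mol_b):
    assign_labels(m)  # like the perception done when parsing the SMILES

serial = [toy_fingerprint(m) for m in (mol_a, mol_b)]
pooled_old = [toy_fingerprint(round_trip(m)) for m in (mol_a, mol_b)]
pooled_fixed = [toy_fingerprint(round_trip(m), reassign_if_missing=True)
                for m in (mol_a, mol_b)]

print(serial[0] != serial[1])          # True: stereoisomers differ
print(pooled_old[0] == pooled_old[1])  # True: labels lost in transit
print(pooled_fixed == serial)          # True: labels re-assigned
```

Per the thread, the real fix is to use RDKit 2018.09 or later. On older versions, a workaround consistent with Greg's explanation would be to send SMILES strings to the workers and parse them there, or to re-run perception inside the worker (for example with Chem.AssignStereochemistry(mol, cleanIt=True, force=True)) before fingerprinting; this has not been tested against 2017.09.1 or 2018.03.2 here.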