Hi Pat,

Solution: either change your calc_bcut function to:

    def calc_bcut(smi):
        from rdkit.Chem.rdMolDescriptors import BCUT2D
        mol = Chem.MolFromSmiles(smi)
        return BCUT2D(mol)

or change the import on line 8 at the top to:

    from rdkit.Chem import rdMolDescriptors

and do:

    def calc_bcut(smi):
        mol = Chem.MolFromSmiles(smi)
        return rdMolDescriptors.BCUT2D(mol)

The second approach is probably more efficient.

I'm not 100% sure what's happening, but it looks like dask is trying to
somehow package up whatever is being used in calc_bcut() and is having a
problem when it sees the BCUT2D object, which is a Boost.Python.function
instead of a normal Python function:

    In [3]: type(MolWt)
    Out[3]: function

    In [4]: type(BCUT2D)
    Out[4]: Boost.Python.function

By either explicitly doing the import in calc_bcut() or referencing the
function through the module, dask seems to be able to figure out how to do
the right thing.

-greg

p.s. in case you see different behavior:

    In [2]: dask.__version__
    Out[2]: '2020.12.0'

On Mon, Mar 22, 2021 at 1:51 PM Patrick Walters <wpwalt...@gmail.com> wrote:

> Apologies, there was a bug in the code I sent in my previous message. The
> problem is the same. Here is the corrected code in a gist.
>
> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>
> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters <wpwalt...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I've been trying to calculate BCUT2D descriptors in parallel with Dask
>> and get this error with the code below.
>> TypeError: cannot pickle 'Boost.Python.function' object
>>
>> Everything works if I call mw_df, which calculates molecular weight, but
>> I get the error above if I call bcut_df. Does anyone have a workaround?
>>
>> Thanks,
>>
>> Pat
>>
>> #!/usr/bin/env python
>>
>> import sys
>> import dask.dataframe as dd
>> import pandas as pd
>> from rdkit import Chem
>> from rdkit.Chem.Descriptors import MolWt
>> from rdkit.Chem.rdMolDescriptors import BCUT2D
>> import time
>>
>> # -- molecular weight functions
>> def calc_mw(smi):
>>     mol = Chem.MolFromSmiles(smi)
>>     return MolWt(mol)
>>
>> def mw_df(df):
>>     return df.SMILES.apply(calc_mw)
>>
>> # -- bcut functions
>> def bcut_df(df):
>>     return df.apply(calc_bcut)
>>
>> def calc_bcut(smi):
>>     mol = Chem.MolFromSmiles(smi)
>>     return BCUT2D(mol)
>>
>> def main():
>>     start = time.time()
>>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>>     ddf = dd.from_pandas(df,npartitions=16)
>>     ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
>>     ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
>>     print(time.time()-start)
>>     print(ddf.head())
>>
>>
>> if __name__ == "__main__":
>>     main()
>>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
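
A note on the error discussed in this thread: "cannot pickle ... object" appears to be raised while the function is being serialized for the worker processes. One rough way to see which objects a function refers to can be serialized is to try pickling each of them directly. The sketch below uses only the standard library. `pickle_error` is a hypothetical helper name, and `threading.Lock()` is only a stand-in for an object without pickle support. Dask may use a different serializer than plain `pickle`, so this is a diagnostic aid under those assumptions, not a reproduction of what Dask does internally.

```python
import pickle
import threading


def pickle_error(obj):
    """Return None if obj pickles cleanly, otherwise a short error string.

    Hypothetical helper for illustration; not part of RDKit or Dask.
    """
    try:
        pickle.dumps(obj)
    except Exception as exc:  # pickle can raise several exception types
        return f"{type(exc).__name__}: {exc}"
    return None


# A builtin function is pickled by reference to its module and name.
print(pickle_error(len))

# A lock stands in here for an object that has no pickle support.
print(pickle_error(threading.Lock()))
```

With RDKit installed, the same helper could be called on MolWt and BCUT2D from the script above. Going by the traceback, BCUT2D is the one expected to report an error, though that is an assumption and was not checked here.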