Re: [Rdkit-discuss] Using the RDKit with Dask

Peter Schmidtke Mon, 22 Mar 2021 06:26:34 -0700

Hi,

Did you already try something along these lines?
https://stackoverflow.com/questions/7089386/pickling-boost-python-functions


Peter


________________________________
From: Patrick Walters <wpwalt...@gmail.com>
Sent: Monday, March 22, 2021 13:49
To: rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Using the RDKit with Dask

Apologies, there was a bug in the code I sent in my previous message.  The 
problem is the same.  Here is the corrected code in a gist.

https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd



On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters <wpwalt...@gmail.com> wrote:
Hi All,

I've been trying to calculate BCUT2D descriptors in parallel with Dask and get 
this error with the code below.
TypeError: cannot pickle 'Boost.Python.function' object

Everything works if I call mw_df, which calculates molecular weight, but I get 
the error above if I call bcut_df.  Does anyone have a workaround?

Thanks,

Pat

#!/usr/bin/env python

import sys
import dask.dataframe as dd
import pandas as pd
from rdkit import Chem
from rdkit.Chem.Descriptors import MolWt
from rdkit.Chem.rdMolDescriptors import BCUT2D
import time

# --  molecular weight functions
def calc_mw(smi):
    mol = Chem.MolFromSmiles(smi)
    return MolWt(mol)

def mw_df(df):
    return df.SMILES.apply(calc_mw)

# -- bcut functions
def bcut_df(df):
    # apply to the SMILES column only, as mw_df does
    return df.SMILES.apply(calc_bcut)

def calc_bcut(smi):
    mol = Chem.MolFromSmiles(smi)
    return BCUT2D(mol)

def main():
    start = time.time()
    df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
    ddf = dd.from_pandas(df,npartitions=16)
    ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
    ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
    print(time.time()-start)
    print(ddf.head())


if __name__ == "__main__":
    main()
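
One possible workaround for the script above, as a minimal untested sketch. It assumes the error arises because calc_bcut refers to BCUT2D as a module-level global, so the Boost.Python function has to be serialized together with calc_bcut when Dask ships the task to a worker process. Importing inside the function makes BCUT2D a local name that each worker resolves for itself. The None check for unparsable SMILES is an addition for illustration and is not part of the original script.

```python
def calc_bcut(smi):
    # Assumption: deferring the imports to call time keeps the Boost.Python
    # function out of this function's globals, so it never needs pickling.
    from rdkit import Chem
    from rdkit.Chem.rdMolDescriptors import BCUT2D

    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        # Illustrative addition: skip SMILES that RDKit cannot parse.
        return None
    return BCUT2D(mol)


def bcut_df(df):
    # Apply to the SMILES column only, mirroring mw_df in the script above.
    return df.SMILES.apply(calc_bcut)
```

These two functions are meant as drop-in replacements for calc_bcut and bcut_df in the script. Because BCUT2D returns several values per molecule, meta='object' may be a better fit than meta='float' in the map_partitions call, though that is also an assumption.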

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
