Re: [Rdkit-discuss] [External] Re: Using the RDKit with Dask

Patrick Walters Mon, 22 Mar 2021 06:22:29 -0700

The input is just SMILES and molecule name separated by a space.   I've
attached an example.
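
For readers without the attachment, here is a minimal sketch of that layout: one record per line, SMILES first, then a single space, then the name. The helper name `parse_smi_lines` and the two sample molecules are hypothetical, for illustration only; the quoted script below reads the same layout with `pd.read_csv(..., sep=" ", names=["SMILES","Name"])`.

```python
from typing import Iterable, List, Tuple


def parse_smi_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split 'SMILES name' lines into (smiles, name) pairs, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so a name containing spaces stays intact.
        smiles, _, name = line.partition(" ")
        records.append((smiles, name.strip()))
    return records


# Hypothetical sample records, not taken from the attached file.
example = ["CCO ethanol", "c1ccccc1 benzene", ""]
print(parse_smi_lines(example))
# prints [('CCO', 'ethanol'), ('c1ccccc1', 'benzene')]
```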


Thanks,

Pat


On Mon, Mar 22, 2021 at 9:13 AM Guillaume GODIN <
[email protected]> wrote:

> Hi Pat,
>
>
>
> Do you have a small example file I can work with, or can I use esol.csv,
> for example?
>
>
>
> Thanks
>
>
>
> Guillaume
>
>
>
> *From: *Patrick Walters <[email protected]>
> *Date: *Monday, 22 March 2021 at 13:51
> *To: *rdkit-discuss <[email protected]>
> *Subject: *[*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>
> Apologies, there was a bug in the code I sent in my previous message.  The
> problem is the same.  Here is the corrected code in a gist.
>
>
>
> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>
>
>
>
>
>
>
> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters <[email protected]>
> wrote:
>
> Hi All,
>
>
>
> I've been trying to calculate BCUT2D descriptors in parallel with Dask and
> get this error with the code below.
>
> TypeError: cannot pickle 'Boost.Python.function' object
>
>
>
> Everything works if I call mw_df, which calculates molecular weight, but I
> get the error above if I call bcut_df.  Does anyone have a workaround?
>
>
>
> Thanks,
>
>
>
> Pat
>
>
>
> #!/usr/bin/env python
>
> import sys
> import dask.dataframe as dd
> import pandas as pd
> from rdkit import Chem
> from rdkit.Chem.Descriptors import MolWt
> from rdkit.Chem.rdMolDescriptors import BCUT2D
> import time
>
> # --  molecular weight functions
> def calc_mw(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return MolWt(mol)
>
> def mw_df(df):
>     return df.SMILES.apply(calc_mw)
>
> # -- bcut functions
> def bcut_df(df):
>     return df.SMILES.apply(calc_bcut)
>
> def calc_bcut(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return BCUT2D(mol)
>
> def main():
>     start = time.time()
>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>     ddf = dd.from_pandas(df,npartitions=16)
>     ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
>     ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
>     print(time.time()-start)
>     print(ddf.head())
>
>
> if __name__ == "__main__":
>     main()
>
>
>

zinc_100.smi
Description: Binary data
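
One possible workaround for the pickling error, offered as a sketch and not as a fix confirmed in this thread: the `processes` scheduler has to serialize `calc_bcut` to send it to the workers, and a function defined in the main script is serialized together with the globals it references. In the quoted script that includes `BCUT2D`, which is a `Boost.Python.function`. Moving the imports inside the function makes `BCUT2D` a local name that each worker resolves for itself, so it never needs to be pickled. The `None` guard for unparseable SMILES is an addition that is not in the original script.

```python
def calc_bcut(smi):
    """Return the BCUT2D descriptors for one SMILES string, or None if it does not parse."""
    # Assumption: importing here, not at module level, keeps the Boost.Python
    # function out of the globals that are serialized along with calc_bcut.
    from rdkit import Chem
    from rdkit.Chem.rdMolDescriptors import BCUT2D

    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        # MolFromSmiles returns None for invalid input; pass that through.
        return None
    return list(BCUT2D(mol))


def bcut_df(df):
    """Apply calc_bcut to the SMILES column of one partition."""
    return df.SMILES.apply(calc_bcut)
```

`bcut_df` can then be passed to `ddf.map_partitions` as in the quoted script. Because each result is a list of values and not a single float, a meta such as `meta=(None, 'object')` is probably a better fit than `meta='float'`; treat that as an assumption to check against your Dask version.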

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
