The input is just SMILES and molecule name separated by a space. I've attached an example.
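Because each line of the input holds a SMILES string, a space, and a molecule name, a plain-Python reader is only a short loop. This is a minimal sketch and not part of the original script. The function name `read_smi_records` is hypothetical. It splits each line only on the first run of whitespace, so a name that contains spaces stays in one field. It also skips blank lines and returns an empty name when none is given.

```python
from typing import Iterable, List, Tuple


def read_smi_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'SMILES name' lines into (smiles, name) tuples.

    Hypothetical helper: splits on the first run of whitespace only,
    skips blank lines, and uses "" when a line has no name.
    """
    records = []
    for line in lines:
        fields = line.split(maxsplit=1)  # at most 2 pieces: SMILES, rest of line
        if not fields:
            continue  # blank or whitespace-only line
        smiles = fields[0]
        name = fields[1].strip() if len(fields) > 1 else ""  # drop trailing newline
        records.append((smiles, name))
    return records
```

Passing an open file handle works directly, e.g. `with open("zinc_100.smi") as fh: records = read_smi_records(fh)`.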
Thanks,
Pat

On Mon, Mar 22, 2021 at 9:13 AM Guillaume GODIN <guillaume.go...@firmenich.com> wrote:

> Hi Pat,
>
> Do you have a small example file to proceed with, or can I use esol.csv, for example?
>
> Thanks
>
> Guillaume
>
> *From:* Patrick Walters <wpwalt...@gmail.com>
> *Date:* Monday, 22 March 2021 at 13:51
> *To:* rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
> *Subject:* [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>
> Apologies, there was a bug in the code I sent in my previous message. The
> problem is still the same. Here is the corrected code in a gist.
>
> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>
> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters <wpwalt...@gmail.com> wrote:
>
> Hi All,
>
> I've been trying to calculate BCUT2D descriptors in parallel with Dask and
> get this error with the code below.
>
> TypeError: cannot pickle 'Boost.Python.function' object
>
> Everything works if I call mw_df, which calculates molecular weight, but I
> get the error above if I call bcut_df. Does anyone have a workaround?
>
> Thanks,
>
> Pat
>
> #!/usr/bin/env python
>
> import sys
> import dask.dataframe as dd
> import pandas as pd
> from rdkit import Chem
> from rdkit.Chem.Descriptors import MolWt
> from rdkit.Chem.rdMolDescriptors import BCUT2D
> import time
>
> # -- molecular weight functions
> def calc_mw(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return MolWt(mol)
>
> def mw_df(df):
>     return df.SMILES.apply(calc_mw)
>
> # -- bcut functions
> def bcut_df(df):
>     # apply to the SMILES column, as mw_df does (not to the whole frame)
>     return df.SMILES.apply(calc_bcut)
>
> def calc_bcut(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return BCUT2D(mol)
>
> def main():
>     start = time.time()
>     df = pd.read_csv(sys.argv[1], sep=" ", names=["SMILES", "Name"])
>     ddf = dd.from_pandas(df, npartitions=16)
>     ddf['MW'] = ddf.map_partitions(mw_df, meta='float').compute(scheduler='processes')
>     ddf['BCUT'] = ddf.map_partitions(bcut_df, meta='float').compute(scheduler='processes')
>     print(time.time() - start)
>     print(ddf.head())
>
>
> if __name__ == "__main__":
>     main()
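One workaround worth trying, offered as a sketch rather than a confirmed fix. The assumption is that Dask's process scheduler must pickle `calc_bcut` together with the globals it references, and `BCUT2D`, imported by name, is a Boost.Python function object that pickle cannot handle. A small wrapper that stores only the module path and attribute name, and looks the function up when it is called, pickles cleanly. The class name `LazyCallable` is hypothetical; it is not an RDKit or Dask API. A simpler variant under the same assumption is to use `from rdkit.Chem import rdMolDescriptors` and call `rdMolDescriptors.BCUT2D(mol)` inside `calc_bcut`, because the function is then looked up on the module at call time instead of being captured as a global.

```python
import importlib
import pickle


class LazyCallable:
    """Picklable proxy for a function that cannot itself be pickled.

    Hypothetical helper: only the module path and attribute name are
    stored (and therefore pickled); the real function is imported and
    looked up in whichever process actually calls the proxy.
    """

    def __init__(self, module_name: str, attr_name: str):
        self.module_name = module_name
        self.attr_name = attr_name

    def __call__(self, *args, **kwargs):
        # import_module returns the cached module from sys.modules after the first call
        module = importlib.import_module(self.module_name)
        func = getattr(module, self.attr_name)
        return func(*args, **kwargs)


# Intended use in the script above (assumption, not tested here):
#   bcut2d = LazyCallable("rdkit.Chem.rdMolDescriptors", "BCUT2D")
#   def calc_bcut(smi):
#       return bcut2d(Chem.MolFromSmiles(smi))

if __name__ == "__main__":
    # Demonstrate with a stdlib function: the proxy survives a pickle round trip.
    proxy = pickle.loads(pickle.dumps(LazyCallable("math", "hypot")))
    print(proxy(3, 4))  # 5.0
```

Because the proxy carries only two strings, its pickled form is tiny, and the import cost is paid roughly once per worker process.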
Attachment: zinc_100.smi
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss