Hi,
Thanks for your response.

The problem is that I’d like to chunk Pandas dataframes across different processors and, as efficiently as possible, remove those rows which fail to be converted into RDKit Mols. What I find, however, is that the entire process dies if PandasTools fails to convert a SMILES to a Mol. Chunking individual rows (chunk = 1) should ensure that each row operation is sent to a processor on its own, so a failure will not affect “good” molecules, as they would be in separate dataframes. But this isn’t very efficient for Pool; I’d rather chunk the dataframe into 5-10% chunks.

So the question is: how do I catch failed compounds within a dataframe and still write out something in the new fields (like adding None to ROMol and HAC)?

Does that make sense? Sorry if this isn’t very clear.

Cheers,
mike

From: Greg Landrum <greg.land...@gmail.com>
Sent: 01 November 2019 10:40
To: Mike Mazanetz <mi...@novadatasolutions.co.uk>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] calculating molecular properties on a Pandas dataframe Molecule

What I'm failing to understand here is what you want to do. Do you want the rows with molecules that failed to parse to remain in the DataFrame? If not you can just remove them (there's probably a simpler way to do this, but Pandas never fails to surprise me):

filtered_df = df[df['ROMol'].astype(str).ne('None')]

-greg

On Thu, Oct 31, 2019 at 11:32 AM Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote:

Hi Taka and Jan,

Thanks for your help. I worked out that I shouldn’t have added the names=[] when I read in my csv file (whoops).

It fails if you have a mol which is None; I’ll have to add a line asking it to check that ROMol isn’t None first. Annoying.
Thanks for your help,
mike

From: Taka Seri <serit...@gmail.com>
Sent: 31 October 2019 10:15
To: Jan Halborg Jensen <jhjen...@chem.ku.dk>
Cc: Mike Mazanetz <mi...@novadatasolutions.co.uk>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] calculating molecular properties on a Pandas dataframe Molecule

Hi,

The Pandas apply function will work too. First call AddMoleculeColumnToFrame(DF, "Smiles"). With the default settings, the RDKit mol objects will be added to a "ROMol" column in your dataframe.
https://www.rdkit.org/docs/source/rdkit.Chem.PandasTools.html

Then call apply to run a calculation function over the ROMol column.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

DF['HAC'] = DF["ROMol"].apply(Chem.Lipinski.HeavyAtomCount)

Best regards,
Taka

On Thu, 31 Oct 2019 at 18:30, Jan Halborg Jensen <jhjen...@chem.ku.dk> wrote:

Hi Mike

This should work

DF['HAC'] = [Chem.Lipinski.HeavyAtomCount(mol) for mol in DF['Molecule']]

Best regards,
Jan

On 31 Oct 2019, at 10.16, Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote:

Hi RDKit Gurus,

I’ve followed the docs and created a molecule column in my Pandas dataframe. However, I do not seem to be able to do molecular operations on the column. For example, if you had a SMILES column, how would you calculate heavy atom count and append this result to a new column?
This doesn’t work:

DF['HAC'] = Chem.Lipinski.HeavyAtomCount(DF['Molecule'])

where the Molecule column is generated by PandasTools.AddMoleculeColumnToFrame.

Thanks,
mike

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
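
One possible way to approach the question at the top of the thread (keep the failed rows, write None into ROMol and HAC, and still hand Pool reasonably large chunks) is sketched below. This is only an illustration under stated assumptions, not the approach settled on in the thread: smiles_to_mol, process_chunk and process_in_chunks are hypothetical helper names, the "Smiles" column name is assumed, and the sketch calls Chem.MolFromSmiles directly instead of PandasTools.AddMoleculeColumnToFrame so that a failed parse simply yields None for that row.

```python
import math
from multiprocessing import Pool

import pandas as pd
from rdkit import Chem
from rdkit.Chem import Lipinski


def smiles_to_mol(smiles):
    """Return an RDKit Mol, or None if the value is missing or unparsable."""
    if not isinstance(smiles, str):
        return None  # e.g. NaN or None from an empty CSV cell
    return Chem.MolFromSmiles(smiles)  # RDKit returns None on a parse failure


def process_chunk(chunk, smiles_col="Smiles"):
    """Add ROMol and HAC columns to one chunk; failed rows get missing values."""
    chunk = chunk.copy()  # leave the caller's dataframe untouched
    chunk["ROMol"] = chunk[smiles_col].map(smiles_to_mol)
    # Nullable integer dtype: HAC stays an integer where a molecule exists
    # and is <NA> where the molecule could not be built.
    chunk["HAC"] = pd.array(
        [None if m is None else Lipinski.HeavyAtomCount(m) for m in chunk["ROMol"]],
        dtype="Int64",
    )
    return chunk


def process_in_chunks(df, n_chunks=10, n_workers=4, smiles_col="Smiles"):
    """Split df into roughly n_chunks pieces and process them with a Pool."""
    if df.empty:
        return process_chunk(df, smiles_col)
    size = max(1, math.ceil(len(df) / n_chunks))
    chunks = [df.iloc[i:i + size] for i in range(0, len(df), size)]
    with Pool(n_workers) as pool:
        results = pool.starmap(process_chunk, [(c, smiles_col) for c in chunks])
    return pd.concat(results)  # original index and row order are preserved
```

With this layout a bad SMILES only affects its own row, so the chunk size can be chosen for Pool efficiency (n_chunks=10 gives roughly 10% chunks). If the failed rows are not wanted afterwards, they can be dropped with out[out['ROMol'].notna()]. On platforms that start workers by spawning (Windows, macOS), the call to process_in_chunks should sit under an if __name__ == "__main__": guard.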