Hi,

 

Thanks for your response.

 

The problem is that I’d like to chunk Pandas dataframes across different 
processors and, as efficiently as possible, remove those rows which fail to be 
converted into RDKit Mols.  What I find, however, is that the entire process 
dies if PandasTools fails to convert a SMILES to a Mol.  Chunking individual 
rows (chunk = 1) should ensure that each row operation is sent to a processor, 
so a failure will not affect “good” molecules, as they would be in separate 
dataframes.  But this isn’t very efficient for Pool; I’d rather chunk the 
dataframe into 5-10% chunks.

 

So the question is: how do I catch failed compounds within a dataframe and 
still write something out in the new fields (like adding None to ROMol and HAC)?
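
One possible shape for this, as a rough sketch only: parse each SMILES inside 
the worker with Chem.MolFromSmiles (which returns None on a parse failure) 
instead of going through PandasTools, and guard the descriptor against None. 
The helper names (smiles_to_mol, heavy_atom_count, process_chunk, split_frame, 
process_in_chunks), the "Smiles" column name and the default chunk and process 
counts are all assumptions made for illustration.

```python
from multiprocessing import Pool

import pandas as pd


def smiles_to_mol(smiles):
    """Parse one SMILES string; return None for anything that cannot be parsed."""
    from rdkit import Chem  # imported lazily so each worker process loads RDKit itself

    if not isinstance(smiles, str):
        return None  # missing values (None/NaN) never reach RDKit
    return Chem.MolFromSmiles(smiles)  # RDKit returns None on a parse failure


def heavy_atom_count(mol):
    """Heavy atom count, or None when the molecule is missing."""
    return None if mol is None else mol.GetNumHeavyAtoms()


def process_chunk(chunk, parse=smiles_to_mol, describe=heavy_atom_count):
    """Add ROMol and HAC columns to one chunk; failed rows get a missing value."""
    chunk = chunk.copy()  # leave the caller's dataframe untouched
    mols = [parse(s) for s in chunk["Smiles"]]  # assumed column name
    chunk["ROMol"] = mols
    chunk["HAC"] = [describe(m) for m in mols]
    return chunk


def split_frame(df, n_chunks):
    """Split df into at most n_chunks contiguous pieces of near-equal size."""
    size = max(1, -(-len(df) // n_chunks))  # ceiling division
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


def process_in_chunks(df, n_chunks=10, n_procs=4):
    """Process the chunks in parallel and reassemble them in the original order."""
    with Pool(n_procs) as pool:
        results = pool.map(process_chunk, split_frame(df, n_chunks))
    return pd.concat(results)
```

Calling process_in_chunks(df) from under an `if __name__ == "__main__":` guard 
should then give a frame in which a bad SMILES only blanks its own row, so the 
chunk size can be chosen for Pool efficiency rather than for fault isolation. 
An empty dataframe is not handled in this sketch.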

 

Does that make sense?  Sorry if this isn’t very clear.

 

Cheers,

mike

 

From: Greg Landrum <greg.land...@gmail.com> 
Sent: 01 November 2019 10:40
To: Mike Mazanetz <mi...@novadatasolutions.co.uk>; RDKit Discuss 
<rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] calculating molecular properties on a Pandas 
dataframe Molecule

 

What I'm failing to understand here is what you want to do.

 

Do you want the rows with molecules that failed to parse to remain in the 
DataFrame?

If not you can just remove them (there's probably a simpler way to do this, but 
Pandas never fails to surprise me):

filtered_df = df[df['ROMol'].astype(str).ne('None')]   
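
A possibly simpler filter, assuming failed parses are stored as None (or NaN) 
in the ROMol column: pandas treats both as missing, so notna() can be applied 
to the column directly. The helper name drop_failed_mols is made up for 
illustration.

```python
import pandas as pd


def drop_failed_mols(df: pd.DataFrame, mol_col: str = "ROMol") -> pd.DataFrame:
    """Return only the rows whose molecule column holds a value (not None/NaN)."""
    return df[df[mol_col].notna()]
```

df.dropna(subset=["ROMol"]) should behave the same way and may read more 
naturally in a longer pipeline.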

 

-greg

 

 

On Thu, Oct 31, 2019 at 11:32 AM Mike Mazanetz <mi...@novadatasolutions.co.uk 
<mailto:mi...@novadatasolutions.co.uk> > wrote:

Hi Taka and Jan,

 

Thanks for your help.

Worked out that I shouldn’t have added the names=[] when I read in my csv file 
(whoops).

 

It fails if you have a mol which is None, so I’ll have to add a line asking it 
to check that ROMol isn’t None first.  Annoying.
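
That check can be written once and reused for any per-molecule function by 
wrapping the function before handing it to apply. This is only a sketch; 
none_safe and add_descriptor are hypothetical names, not part of RDKit or 
pandas.

```python
import math

import pandas as pd


def none_safe(func, default=None):
    """Wrap func so that a missing molecule (None or NaN) yields default."""

    def wrapper(mol):
        if mol is None or (isinstance(mol, float) and math.isnan(mol)):
            return default
        return func(mol)

    return wrapper


def add_descriptor(df: pd.DataFrame, name: str, func, mol_col: str = "ROMol") -> pd.DataFrame:
    """Return a copy of df with a new column holding func(mol) for each row."""
    out = df.copy()
    out[name] = out[mol_col].apply(none_safe(func))
    return out
```

With RDKit imported, add_descriptor(DF, "HAC", Chem.Lipinski.HeavyAtomCount) 
would then leave a missing value in HAC wherever ROMol is None instead of 
raising.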

 

Thanks for your help,

 

mike

 

From: Taka Seri <serit...@gmail.com <mailto:serit...@gmail.com> > 
Sent: 31 October 2019 10:15
To: Jan Halborg Jensen <jhjen...@chem.ku.dk <mailto:jhjen...@chem.ku.dk> >
Cc: Mike Mazanetz <mi...@novadatasolutions.co.uk 
<mailto:mi...@novadatasolutions.co.uk> >; RDKit Discuss 
<rdkit-discuss@lists.sourceforge.net 
<mailto:rdkit-discuss@lists.sourceforge.net> >
Subject: Re: [Rdkit-discuss] calculating molecular properties on a Pandas 
dataframe Molecule

 

Hi,

 

The Pandas apply function will work too.

First call AddMoleculeColumnToFrame(DF, "Smiles").

With the default settings, the RDKit Mol objects are added to a "ROMol" column 
in your dataframe.

https://www.rdkit.org/docs/source/rdkit.Chem.PandasTools.html

 

Then call apply to run a calculation function on each entry of the ROMol column.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

 DF['HAC'] = DF["ROMol"].apply(Chem.Lipinski.HeavyAtomCount)

Best regards,

Taka

On Thu, Oct 31, 2019 at 18:30, Jan Halborg Jensen <jhjen...@chem.ku.dk 
<mailto:jhjen...@chem.ku.dk> > wrote:

Hi Mike

 

This should work

 

DF['HAC'] = [Chem.Lipinski.HeavyAtomCount(mol) for mol in DF['Molecule']]

 

Best regards, Jan

 

 

On 31 Oct 2019, at 10.16, Mike Mazanetz <mi...@novadatasolutions.co.uk 
<mailto:mi...@novadatasolutions.co.uk> > wrote:

 

Hi RDKit Gurus,

 

I’ve followed the docs and created a molecule column in my Pandas dataframe.

However, I do not seem to be able to do molecular operations on the column.

 

For example, if you had a SMILES column, how would you calculate heavy atom 
count and append this result to a new column?

 

This doesn’t work:

DF['HAC'] = Chem.Lipinski.HeavyAtomCount(DF['Molecule'])

 

Where the Molecule column is generated by PandasTools.AddMoleculeColumnToFrame

 

Thanks,

mike

 

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
