Re: [Rdkit-discuss] calculating molecular properties on a Pandas dataframe Molecule

Greg Landrum Fri, 01 Nov 2019 03:42:02 -0700

What I'm failing to understand here is what you want to do.

Do you want the rows with molecules that failed to parse to remain in the
DataFrame?
If not, you can just remove them (there's probably a simpler way to do this,
but Pandas never fails to surprise me):
filtered_df = df[df['ROMol'].astype(str).ne('None')]
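
For reference, here is a minimal sketch of both choices: dropping the rows
that failed to parse, or keeping them and leaving the property empty. It is
an illustration under assumptions, not necessarily how anyone in this thread
did it: the example SMILES, the helper name heavy_atom_count_or_none, and the
use of Chem.MolFromSmiles as a stand-in for
PandasTools.AddMoleculeColumnToFrame are all made up for the example.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Lipinski

# Hypothetical example data; the middle entry is deliberately not valid SMILES.
df = pd.DataFrame({"Smiles": ["CCO", "not_a_smiles", "c1ccccc1"]})

# Stand-in for PandasTools.AddMoleculeColumnToFrame(df, "Smiles"):
# Chem.MolFromSmiles returns None when parsing fails.
df["ROMol"] = df["Smiles"].apply(Chem.MolFromSmiles)

# Option 1: drop the rows whose molecule failed to parse, then calculate.
filtered_df = df[df["ROMol"].notna()].copy()
filtered_df["HAC"] = filtered_df["ROMol"].apply(Lipinski.HeavyAtomCount)


# Option 2: keep every row and leave the property missing for failed parses.
def heavy_atom_count_or_none(mol):
    """Return the heavy atom count, or None if the molecule is None."""
    if mol is None:
        return None
    return Lipinski.HeavyAtomCount(mol)


df["HAC"] = df["ROMol"].apply(heavy_atom_count_or_none)
```

The sketch uses notna() rather than the string comparison; for a column that
holds only molecule objects and None, both should select the same rows.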


-greg


On Thu, Oct 31, 2019 at 11:32 AM Mike Mazanetz <
mi...@novadatasolutions.co.uk> wrote:

> Hi Taka and Jan,
>
>
>
> Thanks for your help.
>
> Worked out that I shouldn’t have added the names=[] when I read in my csv
> file (whoops).
>
>
>
> It fails if you have a mol which is None, so I’ll have to add a line
> checking that ROMol isn’t None first.  Annoying.
>
>
>
> Thanks for your help,
>
>
>
> mike
>
>
>
> *From:* Taka Seri <serit...@gmail.com>
> *Sent:* 31 October 2019 10:15
> *To:* Jan Halborg Jensen <jhjen...@chem.ku.dk>
> *Cc:* Mike Mazanetz <mi...@novadatasolutions.co.uk>; RDKit Discuss <
> rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] calculating molecular properties on a
> Pandas dataframe Molecule
>
>
>
> Hi,
>
>
>
> The Pandas apply function will work too.
>
> First call AddMoleculeColumnToFrame(DF, "Smiles").
>
> By default, the RDKit mol objects are added to your dataframe in a
> "ROMol" column.
>
> https://www.rdkit.org/docs/source/rdkit.Chem.PandasTools.html
>
>
>
> Then call the apply function to run a calculation function on each
> molecule in the ROMol column.
>
>
> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
>
>  DF['HAC'] = DF["ROMol"].apply(Chem.Lipinski.HeavyAtomCount)
>
> Best regards,
>
> Taka
>
> On Thu, 31 Oct 2019 at 18:30, Jan Halborg Jensen <jhjen...@chem.ku.dk> wrote:
>
> Hi Mike
>
>
>
> This should work
>
>
>
> DF['HAC'] = [Chem.Lipinski.HeavyAtomCount(mol) for mol in DF['Molecule']]
>
>
>
> Best regards, Jan
>
>
>
>
>
> On 31 Oct 2019, at 10.16, Mike Mazanetz <mi...@novadatasolutions.co.uk>
> wrote:
>
>
>
> Hi RDKit Gurus,
>
>
>
> I’ve followed the docs and created a molecule column in my Pandas
> dataframe.
>
> However, I do not seem to be able to do molecular operations on the column.
>
>
>
> For example, if you had a SMILES column, how would you calculate heavy
> atom count and append this result to a new column?
>
>
>
> This doesn’t work:
>
> DF['HAC'] = Chem.Lipinski.HeavyAtomCount(DF['Molecule'])
>
>
>
> Where the Molecule column is generated by
> PandasTools.AddMoleculeColumnToFrame
>
>
>
> Thanks,
>
> mike
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>

