I think Nikolas is being a bit modest... the Pandas integration is
pretty cool. :-)

Here's an example of using it from the IPython prompt (it's better in
the notebook, but that doesn't paste so nicely into email).

Loading an SD file:

In [1]: from rdkit import Chem

In [2]: from rdkit.Chem import PandasTools

In [3]: import pandas as pd

In [4]: df = PandasTools.LoadSDF('hERG_inhibition_dataset.sdf',includeFingerprints=True)

In [5]: df
Out[5]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 242 entries, 0 to 241
Data columns:
ACTIVITY_CLASS    242  non-null values
CompoundName      242  non-null values
ID                242  non-null values
MDLPublicKeys     242  non-null values
SMILES            242  non-null values
pIC50             242  non-null values
ROMol             242  non-null values
dtypes: object(7)


And doing a substructure search:

In [6]: N3s = df[df['ROMol']>=Chem.MolFromSmiles('N(C)(C)C')]

In [7]: N3s
Out[7]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 177 entries, 0 to 239
Data columns:
ACTIVITY_CLASS    177  non-null values
CompoundName      177  non-null values
ID                177  non-null values
MDLPublicKeys     177  non-null values
SMILES            177  non-null values
pIC50             177  non-null values
ROMol             177  non-null values
dtypes: object(7)
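
For anyone who wants to see the filter spelled out, here is a minimal sketch of what I assume the `>=` comparison boils down to: a per-row call to `HasSubstructMatch`. It skips PandasTools and the SD file entirely; the four SMILES strings are made-up illustration data, not rows from the hERG set.

```python
import pandas as pd
from rdkit import Chem

# Hypothetical toy data, standing in for the molecules loaded from the SD file.
smiles = ['CN(C)C', 'CCO', 'CCN(CC)CC', 'c1ccccc1']
df = pd.DataFrame({'SMILES': smiles})
df['ROMol'] = df['SMILES'].apply(Chem.MolFromSmiles)

# Same query as in the session above: a nitrogen with three carbon neighbours.
query = Chem.MolFromSmiles('N(C)(C)C')

# Build a boolean mask by testing every molecule against the query,
# then use ordinary pandas boolean indexing to keep the matching rows.
mask = df['ROMol'].apply(lambda mol: mol.HasSubstructMatch(query))
N3s = df[mask]

print(N3s['SMILES'].tolist())  # ['CN(C)C', 'CCN(CC)CC']
```

The result is a normal dataframe, so it can be filtered, sorted or grouped further like any other pandas table.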

Because I used the "includeFingerprints" argument, the search was
actually done with a substructure fingerprint as a pre-filter to speed
things up. It currently uses the Avalon fingerprint, but that will change
before the release to avoid adding an extra dependency.
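
The idea behind the fingerprint screen can be sketched as follows. This is an illustration under stated assumptions, not the PandasTools code: it uses RDKit's pattern fingerprint as a stand-in for whichever fingerprint the module ends up with, the helper name `screened_substructure_search` is hypothetical, and the SMILES are toy data. A molecule can only contain the query if every bit set in the query's fingerprint is also set in the molecule's, so the cheap bit test rules out most non-matches before the full substructure match runs.

```python
from rdkit import Chem, DataStructs


def screened_substructure_search(mols, fps, query):
    """Return the molecules containing `query`, using fingerprints as a pre-filter.

    `fps` holds one precomputed pattern fingerprint per entry in `mols`.
    (Hypothetical helper for illustration only.)
    """
    query_fp = Chem.PatternFingerprint(query)
    hits = []
    for mol, fp in zip(mols, fps):
        # Cheap screen: skip molecules missing any bit the query sets.
        if not DataStructs.AllProbeBitsMatch(query_fp, fp):
            continue
        # The screen can give false positives, so confirm with a real match.
        if mol.HasSubstructMatch(query):
            hits.append(mol)
    return hits


# Toy data; the fingerprints are computed once and reused for every query.
smiles = ['CN(C)C', 'CCO', 'CCN(CC)CC', 'c1ccccc1']
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [Chem.PatternFingerprint(m) for m in mols]

query = Chem.MolFromSmiles('N(C)(C)C')
hits = screened_substructure_search(mols, fps, query)
print([Chem.MolToSmiles(m) for m in hits])
```

The payoff comes from computing the fingerprints once at load time and reusing them for every query.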

-greg

On Fri, Apr 19, 2013 at 11:56 AM, Nikolas Fechner <niko...@fechner.cc> wrote:
> Dear all,
> We developed a new module ( rdkit.Chem.PandasTools.py ) that allows for
> using RDKit molecule objects directly in pandas dataframes. Pandas
> (http://pandas.pydata.org/) is a python library that offers table-like
> datacontainers, which are incredibly useful for anything related to data
> mining. Moreover, it integrates nicely with the ipython notebook producing
> rendered HTML tables for the dataframes. The RDKit integration adds
> molecule-type columns and functionality to perform substructure-based
> row filtering directly on the pandas table. Additionally, if a dataframe is
> exported as HTML or shown within an ipython notebook, the molecules in the
> table are rendered as 2D structures.
>
> The new module is available in the current SF trunk and contains a doctest
> header that provides examples of how to use it.
>
> I hope some of you find that interesting. As always, bug reports, comments,
> ideas... are very much appreciated.
>
> Best,
> Nikolas
>
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
