I think Nikolas is being a bit modest... the Pandas integration is pretty cool. :-)
Here's an example of using it from the IPython prompt (it's better in the notebook, but that doesn't paste so nicely into email).

Loading an SD file:

In [1]: from rdkit import Chem

In [2]: from rdkit.Chem import PandasTools

In [3]: import pandas as pd

In [4]: df = PandasTools.LoadSDF('hERG_inhibition_dataset.sdf',includeFingerprints=True)

In [5]: df
Out[5]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 242 entries, 0 to 241
Data columns:
ACTIVITY_CLASS    242  non-null values
CompoundName      242  non-null values
ID                242  non-null values
MDLPublicKeys     242  non-null values
SMILES            242  non-null values
pIC50             242  non-null values
ROMol             242  non-null values
dtypes: object(7)

And doing a substructure search:

In [6]: N3s = df[df['ROMol']>=Chem.MolFromSmiles('N(C)(C)C')]

In [7]: N3s
Out[7]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 177 entries, 0 to 239
Data columns:
ACTIVITY_CLASS    177  non-null values
CompoundName      177  non-null values
ID                177  non-null values
MDLPublicKeys     177  non-null values
SMILES            177  non-null values
pIC50             177  non-null values
ROMol             177  non-null values
dtypes: object(7)

Because I used the "includeFingerprints" argument, the search actually used a substructure fingerprint to speed things up. This is using the Avalon fingerprint at the moment, but that will change between now and the release so as not to add an additional dependency.

-greg

On Fri, Apr 19, 2013 at 11:56 AM, Nikolas Fechner <niko...@fechner.cc> wrote:
> Dear all,
> We developed a new module (rdkit.Chem.PandasTools.py) that allows for
> using RDKit molecule objects directly in pandas dataframes. Pandas
> (http://pandas.pydata.org/) is a python library that offers table-like
> data containers, which are incredibly useful for anything related to data
> mining. Moreover, it integrates nicely with the ipython notebook, producing
> rendered HTML tables for the dataframes.
> The RDKit integration allows molecule-type columns and provides
> functionality to perform substructure-based row filtering directly on the
> pandas table. Additionally, if a dataframe is exported as HTML or shown
> within an ipython notebook, the molecules in the table are rendered as 2D
> structures.
>
> The new module is available in the current SF trunk and contains a doctest
> header that provides examples of how to use it.
>
> I hope some of you find that interesting. As always, bug reports, comments,
> ideas... are very much appreciated.
>
> Best,
> Nikolas
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
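Greg's remark that includeFingerprints=True speeds up the >= search reflects a common screen-then-verify pattern. A substructure fingerprint can prove that a molecule does not contain the query (some bit set in the query is missing from the molecule), so the expensive exact match only needs to run on molecules that survive that cheap test. The snippet below is a toy analogy in plain Python, not RDKit code, and is only a sketch under these assumptions: strings stand in for molecules, substring search stands in for substructure search, and a set of 2-character windows stands in for the substructure fingerprint. The function names (bigram_fingerprint, build_index, screened_search) are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Toy analogy only: strings stand in for molecules, substring search stands in
# for substructure search, and a set of 2-character windows stands in for a
# substructure fingerprint. These names are hypothetical, not RDKit API.


def bigram_fingerprint(s: str) -> FrozenSet[str]:
    """Return the set of all 2-character windows ("bits") in s."""
    return frozenset(s[i:i + 2] for i in range(len(s) - 1))


def build_index(targets: Iterable[str]) -> List[Tuple[str, FrozenSet[str]]]:
    """Precompute each target's fingerprint once, at load time."""
    return [(t, bigram_fingerprint(t)) for t in targets]


def screened_search(index: List[Tuple[str, FrozenSet[str]]],
                    query: str) -> Tuple[List[str], int]:
    """Return (hits, number of exact checks actually performed).

    If query occurs in a target, every window of query also occurs in that
    target, so "query bits are a subset of target bits" is a necessary
    condition. Targets failing it are rejected without the exact check.
    """
    query_fp = bigram_fingerprint(query)
    hits: List[str] = []
    exact_checks = 0
    for target, target_fp in index:
        if not query_fp <= target_fp:  # cheap screen: can only rule targets out
            continue
        exact_checks += 1
        if query in target:  # exact (normally expensive) check on survivors
            hits.append(target)
    return hits, exact_checks


index = build_index(["CCNCC", "CCOCC", "NCCCN", "CCCC"])
hits, n_checked = screened_search(index, "NC")
print(hits, n_checked)  # ['CCNCC', 'NCCCN'] 2
```

The screen can give false positives but never false negatives: for the query "NCN", both "CCNCC" and "NCCCN" pass the subset test yet fail the exact check. That is why the exact match still has to run on everything that survives the screen.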