Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Hi Niko,

I tried this piece of code, adapted from the doctest, and got the same result (the table is fine, but the molecules are not rendered):

    from rdkit.Chem import PandasTools
    import pandas as pd
    import os
    from rdkit import RDConfig
    from rdkit.Chem.Draw import IPythonConsole
    from IPython.core.display import HTML

    antibiotics = pd.DataFrame(columns=['Name','Smiles'])
    # Penicilline G
    antibiotics = antibiotics.append({'Smiles':'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C','Name':'Penicilline G'}, ignore_index=True)
    # Tetracycline
    antibiotics = antibiotics.append({'Smiles':'CC1(C2CC3C(C(=O)C(=C(C3(C(=O)C2=C(C4=C1C=CC=C4O)O)O)O)C(=O)N)N(C)C)O','Name':'Tetracycline'}, ignore_index=True)
    # Ampicilline
    antibiotics = antibiotics.append({'Smiles':'CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=CC=C3)N)C(=O)O)C','Name':'Ampicilline'}, ignore_index=True)

    PandasTools.AddMoleculeColumnToFrame(antibiotics,'Smiles','Molecule',includeFingerprints=True)
    display(HTML(antibiotics.to_html()))

The img tag and the PNG encoding themselves are fine. If I paste one into a simple HTML page and open it with the same browser, the molecule is rendered.

Best,
Markus

On 05/08/2013 09:03 AM, Fechner, Nikolas wrote:
> Hi Markus,
>
> Could you try the examples that are included as doctests in the PandasTools.py module? These should definitely work and show rendered molecules in the tables.
>
> Best,
> Niko
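A side note for anyone re-running the antibiotics snippet from this thread on a recent pandas: DataFrame.append was removed in pandas 2.0, so the row-by-row construction no longer runs there. The following is a minimal sketch of one equivalent way to build the same table, assuming pandas 2.x; the RDKit call is left out and would follow as in the post.

```python
import pandas as pd

# Same three records as in the post, collected up front instead of appended one by one.
records = [
    {'Name': 'Penicilline G',
     'Smiles': 'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C'},
    {'Name': 'Tetracycline',
     'Smiles': 'CC1(C2CC3C(C(=O)C(=C(C3(C(=O)C2=C(C4=C1C=CC=C4O)O)O)O)C(=O)N)N(C)C)O'},
    {'Name': 'Ampicilline',
     'Smiles': 'CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=CC=C3)N)C(=O)O)C'},
]

# Passing columns= fixes the column order used in the post.
antibiotics = pd.DataFrame(records, columns=['Name', 'Smiles'])
```

Building the frame from a list of records also avoids copying the whole table once per added row.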
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Hi Markus,

Sorry, but I am running a bit out of ideas. Could you check whether the structures are rendered if you write the output of dataframe.to_html() to a file and open that as a web page? If this works, then it probably has something to do with the IPython environment (btw, which version are you using?).

Best,
Niko
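The check Niko suggests (write the output of to_html() to a file and open it outside the notebook) could look roughly like the sketch below. It uses only the standard library; the helper names write_html_page and open_in_browser, the file name, and the placeholder fragment are made up for illustration. In practice the fragment would be the string returned by dataframe.to_html().

```python
import tempfile
import webbrowser
from pathlib import Path


def write_html_page(table_html, path):
    """Wrap an HTML fragment in a minimal page and write it to `path`."""
    page = "<!DOCTYPE html>\n<html><body>\n" + table_html + "\n</body></html>\n"
    path = Path(path)
    path.write_text(page, encoding="utf-8")
    return path


def open_in_browser(path):
    """Open the written file in the default browser (defined but not called here)."""
    return webbrowser.open(Path(path).resolve().as_uri())


# Placeholder fragment standing in for dataframe.to_html().
fragment = "<table><tr><td>placeholder cell</td></tr></table>"
out_file = write_html_page(fragment, Path(tempfile.gettempdir()) / "dataframe_check.html")
```

Looking at the written file in a text editor as well as in the browser shows whether the markup itself is intact, independent of the notebook.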
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Hi,

Strange, I'm also using pandas 0.10.1. Still, it seems pretty obvious to me that the problem is related to the escaping, although it is not clear to me why it happens on my system but not on yours :)

For the others following the conversation: sorry for being sloppy and discussing this with Niko directly and in German. I wanted to check the hypothesis with him first to spare you the additional email traffic, and just let you know the result once we had found the problem.

When writing the HTML code to a file as Niko suggested, I realized that the '<' and '>' at the beginning and the end of the img tag are masked in the HTML code as '&lt;' and '&gt;'. This makes the browser's HTML parser ignore them as markup (while still displaying the correct characters in the string in the table). Unmasking them in the HTML code gives the correct renderings of the molecules.

Best,
Markus

On 05/08/2013 11:25 AM, Nikolas Fechner wrote:
> Hi Markus,
>
> Nice find! That could very likely be the cause of the problem. I just saw that in the very recent version 0.11 (22 April 2013) a new parameter was introduced to the pandas to_html() method that should have exactly that effect:
>
>     escape : boolean, default True
>         Convert the characters <, >, and & to HTML-safe sequences.
>
> This wasn't there in versions 0.10/0.10.1, which is what I was using so far. Are you using pandas 0.11? I will update my pandas and check that and, if necessary, find a way to deal with this in PandasTools. Thanks for finding that.
>
> Best,
> Niko
>
> On May 8, 2013 at 10:59 AM Markus Hartenfeller markus.hartenfel...@molecularhealth.com wrote:
>> Hi Niko,
>>
>> I think I now know what the cause is. Attached you will find 2 files: antibiotics.html is the direct print-out from Python. The characters '<' and '>' at the beginning and the end of the img tag are HTML-masked in the code, i.e. replaced by '&lt;' and '&gt;'. That is why the browser displays them as 'normal' characters. If I replace them with the ASCII characters (as in the file _antibiotics.html), the browser displays the structures correctly.
>>
>> When you have time for it: can you reproduce this in the code?
>>
>> Cheers,
>> Markus
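The effect of the escape parameter discussed in this thread can be reproduced without RDKit. The sketch below assumes pandas 0.11 or later and uses a dummy img tag (the base64 payload is not a real image) in place of what PandasTools stores for a molecule.

```python
import pandas as pd

# Stand-in for the <img> tag stored for a molecule; the payload is a dummy.
img_tag = '<img src="data:image/png;base64,AAAA"/>'
df = pd.DataFrame({'Name': ['example'], 'Molecule': [img_tag]})

# Default behaviour: <, > and & in cell values are converted to &lt;, &gt; and &amp;,
# so the browser shows the tag as text.
escaped = df.to_html()

# With escape=False the cell content is written verbatim and the tag is parsed as markup.
unescaped = df.to_html(escape=False)
```

Note that escape=False writes every cell verbatim, so it is only appropriate when the table content is trusted.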
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Hi Nikolas,

I had a first look at the PandasTools package: very cool! I think this is going to be useful for many RDKit users, and I'm looking forward to using it in the future. Thanks for sharing this module.

I'm having trouble seeing the molecule depictions in the IPython notebook, though (both in tables and when just printing out a single molecule). This code in an IPython notebook

    from rdkit import Chem
    from rdkit.Chem import PandasTools
    m=Chem.MolFromSmiles('N1CCNCC1')
    print m

gives me

    <img src=data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ...

a very long string with the base64 encoding of the image, but not the image itself. Plotting from matplotlib works fine. Did I forget to import something, or could it be a browser issue? I am using CentOS 6 and Firefox.

Thanks in advance.

Best,
Markus

On 04/19/2013 11:56 AM, Nikolas Fechner wrote:
> Dear all,
>
> We developed a new module (rdkit.Chem.PandasTools.py) that allows using RDKit molecule objects directly in pandas dataframes. Pandas (http://pandas.pydata.org/) is a Python library that offers table-like data containers, which are incredibly useful for anything related to data mining. Moreover, it integrates nicely with the IPython notebook, producing rendered HTML tables for the dataframes.
>
> The RDKit integration provides molecule-type columns and functionality to perform substructure-based row filtering directly on the pandas table. Additionally, if a dataframe is exported as HTML or shown within an IPython notebook, the molecules in the table are rendered as 2D structures.
>
> The new module is available in the current SF trunk and contains a doctest header that provides examples of how to use it.
>
> I hope some of you find that interesting. As always, bug reports, comments, ideas... are very much appreciated.
>
> Best,
> Nikolas

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Hi Markus,

glad you think it could be useful :).

Regarding the problem, there are two things: you have to import the RDKit IPythonConsole to enable the molecule rendering (from rdkit.Chem.Draw import IPythonConsole), and if you trigger the output using 'print', the notebook will always use string rendering (AFAIK). Just try 'm' alone (instead of 'print m').

Alternatively, you can always force the notebook to do an HTML rendering (useful for large dataframes):

    from IPython.core.display import HTML
    display(HTML('''any HTML string e.g. dataframe.to_html()'''))

I hope that helps.

Best,
Niko
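Why 'print m' and a bare 'm' behave differently can be illustrated with a toy class. This is only a sketch of the general IPython display convention (an object may offer a rich representation through a method such as _repr_html_, while print always goes through __str__); the class Depiction is hypothetical and is not how RDKit implements its hooks.

```python
class Depiction:
    """Toy stand-in for an object with a rich notebook representation."""

    def __init__(self, label):
        self.label = label

    def __str__(self):
        # Used by print: always plain text, whatever the front end can render.
        return "Depiction(" + self.label + ")"

    def _repr_html_(self):
        # Looked up by the notebook when the object is the last expression in a cell.
        return "<b>" + self.label + "</b>"


d = Depiction("piperazine")
as_text = str(d)           # what print would show
as_html = d._repr_html_()  # what a notebook front end would render
```

In a notebook cell, ending the cell with `d` shows the bold label, while printing `d` shows the plain string.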
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Thanks for your help, Niko.

Importing the IPythonConsole from rdkit and removing the 'print' command did the trick for a single molecule :)

Unfortunately, molecules in dataframes are still shown as strings, even when forcing HTML rendering. I will try to get this working and report here if I make any progress. If somebody has already faced the same problem, please let me know.

Best,
Markus
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Just for clarification, are you trying to render a dataframe or a series/single column? The pandas Series object has no to_html() method and is therefore rendered as a string only. Moreover, if you select a single column, e.g. 'ROMol', from a dataframe with df['ROMol'], you will get a Series object that is rendered as a string. If you select a set of columns, you get a dataframe, for which the HTML rendering should work. The latter also works for a single column if you enclose it in double brackets, df[['ROMol']], which gives a single-column dataframe. This took me some time to figure out, and the silent conversion that sometimes occurs can be quite confusing.

Best,
Niko
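The single-bracket versus double-bracket distinction can be checked with plain pandas. In this small sketch, ordinary strings stand in for the molecule objects, and the column name 'ROMol' is taken from the thread.

```python
import pandas as pd

# Plain strings stand in for the molecule objects of the thread.
df = pd.DataFrame({'ROMol': ['mol_a', 'mol_b'], 'Name': ['a', 'b']})

single = df['ROMol']    # single brackets: a Series
framed = df[['ROMol']]  # double brackets: a one-column DataFrame

# Only the DataFrame is converted to an HTML table here.
html = framed.to_html()
```

If a Series is already at hand, `single.to_frame()` is another way to get a one-column dataframe.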
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
When developing the module I occasionally had problems with *very* long png strings, because the pandas maximal column width applies to the string, which is what is stored in the dataframe, before the image rendering. As an effect the truncated png string was shown in the table (exactly the ...' ending shown in your example). You could try manually setting the maximal width very high (e.g. pandas.set_option(display.max_colwidth,10)). This should be done automatically by the PandasTools, which sets it the len(PNG)+100 for the longest string found during rendering, but because this rarely had an impact I could very well have overseen some problems with this strategy. Best, Niko On May 7, 2013 at 1:13 PM Markus Hartenfeller markus.hartenfel...@molecularhealth.com wrote: Thanks again for your reply. That's what I have tried: from rdkit import Chem from rdkit.Chem import AllChem import pandas as pd from rdkit.Chem import PandasTools from rdkit.Chem.Draw import IPythonConsole from IPython.core.display import HTML df = PandasTools.LoadSDF('test.sdf', includeFingerprints=False) display(HTML(df.to_html())) So it is a dataframe and .to_html() works fine in general. I see all sdf fields. It's just that the molecule column contains string value of this kind: img src=data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ... The notebook somehow does not realize that it is an html tag with an image, but instead renders it as a normal string (just like before with the single molecule). Best wishes, Markus On 05/07/2013 12:57 PM, Nikolas Fechner wrote: Just for clarification, are you trying to render a dataframe or a series/single column? The pandas series object has no to_html() method and is therefore rendered as string only. Moreover, if you select a single column, e.g. 'ROMol' from a dataframe by df['ROMol'] you will get a series object that is rendered as string. 
If you select a set of columns you get a dataframe, for which the HTML rendering should work. The latter also works for a single column if you enclose in double brackets df[ ['ROMol' ]], which will give a single-column dataframe. This took me some time to figure out and the silent conversion that sometimes occurs can be quite confusing. Best, Niko On May 7, 2013 at 11:33 AM Markus Hartenfeller markus.hartenfel...@molecularhealth.com mailto:markus.hartenfel...@molecularhealth.com wrote: Thanks for your help, Niko. Importing the iPythonConsole from rdkit + removing the 'print' command did the trick for a single molecule :) Unfortunately, molecules in data frames are still shown as strings, even when forcing html rendering. I will try to get this working and report here if I make any progress. In case somebody has already faced the same problem please let me know. Best, Markus On 05/07/2013 10:27 AM, Nikolas Fechner wrote: Hi Markus, glad you think it could be useful :). Regarding the problem, there are two things: You have to import the RDKit IPythonConsole to enable the molecule rendering (from rdkit.Chem.Draw import IPythonConsole) and if you trigger the output using 'print' the notebook will always use string rendering (AFAIK). Just try 'm' alone (instead of 'print m'). Alternatively, you can always force the notebook to do a HTML rendering (useful for large dataframe): from IPython.core.display import HTML display(HTML('''any HTML string e.g. dataframe.to_html()''')) I hope that helps. Best, Niko On May 7, 2013 at 10:02 AM Markus Hartenfeller markus.hartenfel...@molecularhealth.com mailto:markus.hartenfel...@molecularhealth.com wrote: Hi Nikolas, I had a first look at the PandasTools package: very cool! I think this is going to be useful for many rdkit users. I'm looking forward to using it in the future. Thanks for sharing this module. 
I'm having trouble seeing the molecule depictions in the IPython notebook, though (both in tables and when just printing a single molecule). This code in an IPython notebook

from rdkit import Chem
from rdkit.Chem import PandasTools
m=Chem.MolFromSmiles('N1CCNCC1')
print m

gives me

<img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ...

a very long string with the base64
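
Niko's point about Series versus DataFrame selection can be checked with a small sketch. It assumes only that pandas is installed; the frame and its contents are made up for illustration, and plain placeholder strings stand in for RDKit molecule objects.

```python
import pandas as pd

# Placeholder strings stand in for RDKit molecule objects (illustration only).
df = pd.DataFrame({"Name": ["piperazine"], "ROMol": ["<mol placeholder>"]})

single = df["ROMol"]     # single brackets: a Series
framed = df[["ROMol"]]   # double brackets: a one-column DataFrame

print(type(single).__name__)  # Series
print(type(framed).__name__)  # DataFrame

# to_html() is a DataFrame method; as noted in the thread, a Series lacks it.
html = framed.to_html()
```

So when the HTML rendering is wanted for a single column, selecting with a list of column names keeps the result a DataFrame.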
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Sorry for the confusion, I truncated the string myself in the mail because I did not want to paste the whole beast. The fields contain the full strings and the tag is closed. Best, Markus On 05/07/2013 01:25 PM, Nikolas Fechner wrote: When developing the module I occasionally had problems with *very* long png strings, because the pandas maximal column width applies to the string, which is what is stored in the dataframe, before the image rendering. As an effect the truncated png string was shown in the table (exactly the ...' ending shown in your example). You could try manually setting the maximal width very high (e.g. pandas.set_option(display.max_colwidth,10)). This should be done automatically by the PandasTools, which sets it the len(PNG)+100 for the longest string found during rendering, but because this rarely had an impact I could very well have overseen some problems with this strategy. Best, Niko On May 7, 2013 at 1:13 PM Markus Hartenfeller markus.hartenfel...@molecularhealth.com wrote: Thanks again for your reply. That's what I have tried: from rdkit import Chem from rdkit.Chem import AllChem import pandas as pd from rdkit.Chem import PandasTools from rdkit.Chem.Draw import IPythonConsole from IPython.core.display import HTML df = PandasTools.LoadSDF('test.sdf', includeFingerprints=False) display(HTML(df.to_html())) So it is a dataframe and .to_html() works fine in general. I see all sdf fields. It's just that the molecule column contains string value of this kind: img src=data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ... The notebook somehow does not realize that it is an html tag with an image, but instead renders it as a normal string (just like before with the single molecule). Best wishes, Markus On 05/07/2013 12:57 PM, Nikolas Fechner wrote: Just for clarification, are you trying to render a dataframe or a series/single column? The pandas series object has no to_html() method and is therefore rendered as string only. 
Moreover, if you select a single column, e.g. 'ROMol', from a dataframe with df['ROMol'], you will get a series object that is rendered as a string. If you select a set of columns you get a dataframe, for which the HTML rendering should work. The latter also works for a single column if you enclose it in double brackets, df[['ROMol']], which gives a single-column dataframe. This took me some time to figure out, and the silent conversion that sometimes occurs can be quite confusing.

Best,
Niko

On May 7, 2013 at 11:33 AM Markus Hartenfeller markus.hartenfel...@molecularhealth.com wrote:

Thanks for your help, Niko. Importing the IPythonConsole from rdkit + removing the 'print' command did the trick for a single molecule :) Unfortunately, molecules in data frames are still shown as strings, even when forcing HTML rendering. I will try to get this working and report here if I make any progress. In case somebody has already faced the same problem, please let me know.

Best,
Markus

On 05/07/2013 10:27 AM, Nikolas Fechner wrote:

Hi Markus,

glad you think it could be useful :). Regarding the problem, there are two things: you have to import the RDKit IPythonConsole to enable the molecule rendering (from rdkit.Chem.Draw import IPythonConsole), and if you trigger the output using 'print', the notebook will always use string rendering (AFAIK). Just try 'm' alone (instead of 'print m'). Alternatively, you can always force the notebook to do an HTML rendering (useful for large dataframes):

from IPython.core.display import HTML
display(HTML('''any HTML string e.g. dataframe.to_html()'''))

I hope that helps.

Best,
Niko

On May 7, 2013 at 10:02 AM Markus Hartenfeller markus.hartenfel...@molecularhealth.com wrote:

Hi Nikolas,

I had a first look at the PandasTools package: very cool! I think this is going to be useful for many rdkit users. I'm looking forward to using it in the future.
Thanks for sharing this module. I'm having trouble seeing the molecule depictions in the IPython notebook, though (both in tables and when just printing a single molecule). This code in an IPython notebook

from rdkit import Chem
from rdkit.Chem import PandasTools
m=Chem.MolFromSmiles('N1CCNCC1')
print m

gives me

<img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ...

a very long string with the base64 encoding of the image, but not the image itself. Plotting from matplotlib works fine. Did I forget to import something, or could it be a browser issue? I am using CentOS 6 and Firefox. Thanks in advance.

Best,
Markus

On 04/19/2013 11:56 AM, Nikolas Fechner wrote:

Dear all,

We developed a new module (rdkit.Chem.PandasTools.py) that allows RDKit molecule objects to be used directly in pandas dataframes. Pandas (http://pandas.pydata.org/) is a Python library that offers table-like data containers, which
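
The column-width problem Niko describes in this thread can be illustrated without pandas. The sketch below is not the PandasTools or pandas implementation: truncate_cell and width_for are hypothetical helpers that only mimic a width limit and the "longest string plus 100" strategy, and the payload is a made-up stand-in for a real PNG.

```python
import base64

def truncate_cell(text, max_colwidth):
    """Mimic a column-width limit: cut long cell text and append '...'."""
    if len(text) <= max_colwidth:
        return text
    return text[: max_colwidth - 3] + "..."

def width_for(cells, margin=100):
    """Width large enough for the longest cell, plus a safety margin."""
    return max(len(c) for c in cells) + margin

# Stand-in for a PNG payload; real molecule depictions are far longer.
payload = base64.b64encode(b"\x89PNG" + bytes(300)).decode("ascii")
cell = '<img src="data:image/png;base64,%s" alt="Mol"/>' % payload

clipped = truncate_cell(cell, 50)                 # tag is cut, no longer valid HTML
full = truncate_cell(cell, width_for([cell]))     # limit raised, tag survives intact

print(clipped)
print(full == cell)
```

A clipped cell ends in "..." and never reaches the closing "/>", which is why a browser would show it as text rather than as an image.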
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
I just started playing around with the Pandas module; this is very cool stuff. Thanks so much, Nikolas, for the contribution. I definitely owe you a beer at the UGM. It might be worth noting that you need to install PIL in order to use the Pandas module. Everything will install without a problem, but you'll get an exception like this when you try to print a dataframe without PIL.

File "/Users/walters/python/RDKIT_2013_04_21/rdkit/sping/PIL/pidPIL.py", line 33, in <module>
    import Image, ImageFont, ImageDraw

Best,
Pat

On Sun, Apr 21, 2013 at 5:00 PM, Taka Seri serit...@gmail.com wrote:

Dear Greg,

Thank you for your quick reply! The modified version worked without AvalonTools. That's a nice tool. I appreciate your kindness.

Takayuki

2013/4/22 Greg Landrum greg.land...@gmail.com

Dear Takayuki,

On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:

I'm interested in this work and I want to use PandasTools, but I got an error message: ImportError: cannot import name pyAvalonTools.

I just checked in a modified version that will work when the avalon tools are not installed. If you want to install the avalon tools anyway, there's information below that shows how:

So, I tried to rebuild RDKit like this.

$ cmake -D RDK_BUID_AVALON_SUPPORT=ON

But the build failed.

-- Configuring done
CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
  Cannot find source file: /common/layout.c
  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp .hxx .in .txx
Call Stack (most recent call first):
  External/AvalonTools/CMakeLists.txt:43 (rdkit_library)

If anyone has a suggestion, please help me.

You need to tell it where to find the source for the avalon tools.
- Download the source from here: http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download
- Create an avalon tools directory somewhere, for example in /usr/local/src/avalontools.
- Extract the tar file in that directory.
- Run cmake as follows: cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution -DRDK_BUILD_AVALON_SUPPORT=ON

Best,
-greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
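
Pat's note about the PIL dependency suggests checking for it up front instead of hitting the import error while printing a dataframe. The following is a minimal sketch using only the standard library; pil_available is a hypothetical helper, and checking both the old top-level Image module (as in the traceback above) and the PIL package name is an assumption about how the library may be installed.

```python
import importlib.util

def pil_available():
    """Return True if the old 'Image' module or the 'PIL' package can be found."""
    return any(
        importlib.util.find_spec(name) is not None
        for name in ("PIL", "Image")
    )

if not pil_available():
    print("PIL not found: molecule images in dataframes will likely fail to render")
```

find_spec only locates the module and does not import it, so the check is cheap and has no side effects on the session.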
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Hi Pat, I am glad you find it useful. Many thanks for pointing out the PIL dependency. I had installed It already for different reasons and did not think about mentioning it. Best, Niko On 22 Apr 2013, at 17:52, Patrick Walters wpwalt...@gmail.com wrote: I just started playing around with the Pandas module, this is very cool stuff. Thanks so much Nikolas for the contribution. I definitely owe you a beer at the UGM. It might be worth noting that the you need to install PIL in order to use the Pandas module. Everything will install without a problem, but you'll get an exception like this when you try to print a dataframe without PIL. File /Users/walters/python/RDKIT_2013_04_21/rdkit/sping/PIL/pidPIL.py, line 33, in module import Image, ImageFont, ImageDraw Best, Pat On Sun, Apr 21, 2013 at 5:00 PM, Taka Seri serit...@gmail.com wrote: Dear Greg. Thank you your quick reply ! The modified version was worked without AvalonTools . That's nice tool . I appreciate your kindness. Takayuki 2013/4/22 Greg Landrum greg.land...@gmail.com Dear Takayuki, On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote: I'm interested in this work I want to use PandasTools. But I got error message, ImportError: cannot import name pyAvalonTools. I just checked in a modified version that will work when the avalon tools are not installed. If you want to install the avalon tools anyway, there's information below that shows how: So, I tried to rebuild RDKit like this. $ cmake -D RDK_BUID_AVALON_SUPPORT=ON But build was failed. -- Configuring done CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library): Cannot find source file: /common/layout.c Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp .hxx .in .txx Call Stack (most recent call first): External/AvalonTools/CMakeLists.txt:43 (rdkit_library) If anyone who has suggestion, please help me. You need to tell it where to find the source for the avalon tools. 
- Download the source from here: http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download
- Create an avalon tools directory somewhere, for example in /usr/local/src/avalontools.
- Extract the tar file in that directory.
- Run cmake as follows: cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution -DRDK_BUILD_AVALON_SUPPORT=ON

Best,
-greg
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Dear Takayuki,

On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:

I'm interested in this work and I want to use PandasTools, but I got an error message: ImportError: cannot import name pyAvalonTools.

I just checked in a modified version that will work when the avalon tools are not installed. If you want to install the avalon tools anyway, there's information below that shows how:

So, I tried to rebuild RDKit like this.

$ cmake -D RDK_BUID_AVALON_SUPPORT=ON

But the build failed.

-- Configuring done
CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
  Cannot find source file: /common/layout.c
  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp .hxx .in .txx
Call Stack (most recent call first):
  External/AvalonTools/CMakeLists.txt:43 (rdkit_library)

If anyone has a suggestion, please help me.

You need to tell it where to find the source for the avalon tools.
- Download the source from here: http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download
- Create an avalon tools directory somewhere, for example in /usr/local/src/avalontools.
- Extract the tar file in that directory.
- Run cmake as follows: cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution -DRDK_BUILD_AVALON_SUPPORT=ON

Best,
-greg
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Dear Greg,

Thank you for your quick reply! The modified version worked without AvalonTools. That's a nice tool. I appreciate your kindness.

Takayuki

2013/4/22 Greg Landrum greg.land...@gmail.com

Dear Takayuki,

On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:

I'm interested in this work and I want to use PandasTools, but I got an error message: ImportError: cannot import name pyAvalonTools.

I just checked in a modified version that will work when the avalon tools are not installed. If you want to install the avalon tools anyway, there's information below that shows how:

So, I tried to rebuild RDKit like this.

$ cmake -D RDK_BUID_AVALON_SUPPORT=ON

But the build failed.

-- Configuring done
CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
  Cannot find source file: /common/layout.c
  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp .hxx .in .txx
Call Stack (most recent call first):
  External/AvalonTools/CMakeLists.txt:43 (rdkit_library)

If anyone has a suggestion, please help me.

You need to tell it where to find the source for the avalon tools.
- Download the source from here: http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download
- Create an avalon tools directory somewhere, for example in /usr/local/src/avalontools.
- Extract the tar file in that directory.
- Run cmake as follows: cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution -DRDK_BUILD_AVALON_SUPPORT=ON

Best,
-greg
[Rdkit-discuss] New module for RDKit - PANDAS integration
Dear all,

We developed a new module (rdkit.Chem.PandasTools.py) that allows RDKit molecule objects to be used directly in pandas dataframes. Pandas (http://pandas.pydata.org/) is a Python library that offers table-like data containers, which are incredibly useful for anything related to data mining. Moreover, it integrates nicely with the IPython notebook, producing rendered HTML tables for the dataframes.

The RDKit integration provides molecule-type columns and functionality to perform substructure-based row filtering directly on the pandas table. Additionally, if a dataframe is exported as HTML or shown within an IPython notebook, the molecules in the table are rendered as 2D structures.

The new module is available in the current SF trunk and contains a doctest header that provides examples of how to use it. I hope some of you find it interesting. As always, bug reports, comments, ideas... are very much appreciated.

Best,
Nikolas
Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
I think Nikolas is being a bit modest... the Pandas integration is pretty cool. :-)

Here's an example of using it from the IPython prompt (it's better in the notebook, but that doesn't paste so nicely into email).

Loading an SD file:

In [1]: from rdkit import Chem
In [2]: from rdkit.Chem import PandasTools
In [3]: import pandas as pd
In [4]: df = PandasTools.LoadSDF('hERG_inhibition_dataset.sdf',includeFingerprints=True)
In [5]: df
Out[5]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 242 entries, 0 to 241
Data columns:
ACTIVITY_CLASS    242  non-null values
CompoundName      242  non-null values
ID                242  non-null values
MDLPublicKeys     242  non-null values
SMILES            242  non-null values
pIC50             242  non-null values
ROMol             242  non-null values
dtypes: object(7)

And doing a substructure search:

In [6]: N3s = df[df['ROMol']>=Chem.MolFromSmiles('N(C)(C)C')]
In [7]: N3s
Out[7]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 177 entries, 0 to 239
Data columns:
ACTIVITY_CLASS    177  non-null values
CompoundName      177  non-null values
ID                177  non-null values
MDLPublicKeys     177  non-null values
SMILES            177  non-null values
pIC50             177  non-null values
ROMol             177  non-null values
dtypes: object(7)

Because I used the includeFingerprints argument, that actually did the search using a substructure fingerprint to speed things up. This is using the avalon fingerprint at the moment, but that will change between now and the release so as to not add an additional dependency.

-greg

On Fri, Apr 19, 2013 at 11:56 AM, Nikolas Fechner niko...@fechner.cc wrote:

Dear all,

We developed a new module (rdkit.Chem.PandasTools.py) that allows RDKit molecule objects to be used directly in pandas dataframes. Pandas (http://pandas.pydata.org/) is a Python library that offers table-like data containers, which are incredibly useful for anything related to data mining. Moreover, it integrates nicely with the IPython notebook, producing rendered HTML tables for the dataframes.
The RDKit integration provides molecule-type columns and functionality to perform substructure-based row filtering directly on the pandas table. Additionally, if a dataframe is exported as HTML or shown within an IPython notebook, the molecules in the table are rendered as 2D structures.

The new module is available in the current SF trunk and contains a doctest header that provides examples of how to use it. I hope some of you find it interesting. As always, bug reports, comments, ideas... are very much appreciated.

Best,
Nikolas
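
Greg's remark that the fingerprint speeds up the substructure search can be illustrated with a toy screen in plain Python. This is only a sketch of the general idea and not RDKit's or PandasTools' implementation: strings stand in for molecules, substring containment stands in for substructure matching, and fingerprint and search are hypothetical names.

```python
import zlib

NBITS = 64  # toy fingerprint length (assumption for illustration)

def fingerprint(text):
    """Set one bit per character bigram, using a deterministic hash."""
    bits = 0
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].encode("ascii")
        bits |= 1 << (zlib.crc32(bigram) % NBITS)
    return bits

def search(query, targets):
    """Return targets containing query, plus how many needed the exact test."""
    qfp = fingerprint(query)
    hits, checked = [], 0
    for target in targets:
        if (fingerprint(target) & qfp) != qfp:
            continue  # a required bit is missing, so target cannot contain query
        checked += 1
        if query in target:  # stand-in for the expensive exact match
            hits.append(target)
    return hits, checked

targets = ["CN(C)C", "CCO", "c1ccccc1", "CCN(C)CC"]
hits, checked = search("N(C)C", targets)
print(hits, checked)
```

The screen can only rule candidates out: every bigram of a contained query also occurs in the target, so a missing bit proves a non-match, while a passing screen still needs the exact test because of possible hash collisions.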