Hi,
Strange, I'm also using pandas 0.10.1. Still, it seems pretty obvious to me
that the problem is related to the escaping, although it's not clear to me
right now why it happens on my system but not on yours :)
For the others following the conversation: Sorry for being sloppy and
discussing with Niko directly and in German, but I wanted to check the
hypothesis with him first to spare you the additional email traffic and
just let you know the result once we found the problem:
When printing out the html code to a file as Niko suggested, I realized
that '<' and '>' at the beginning and the end of the img tag are masked
in the html code as '&lt;' and '&gt;'. This makes the html parser of the
browser ignore them (while still displaying the correct characters in
the string in the table). Unmasking them in the html code gives the
correct renderings of the molecules.
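
The masking described above can be reproduced with the Python standard library alone. This is a minimal sketch, not the code pandas or PandasTools actually runs; the `img_tag` value is a hypothetical stand-in, with the base64 payload left elided.

```python
import html

# Hypothetical cell value; the real base64 payload is elided here.
img_tag = '<img src="data:image/png;base64,..." alt="Mol"/>'

# Escaping turns the angle brackets into entities, so a browser shows the
# tag as literal text instead of parsing it as an image element.
escaped = html.escape(img_tag, quote=False)

# Unescaping restores the original tag, which the browser can render again.
restored = html.unescape(escaped)

print(escaped)
print(restored)
```

Opening a page that contains `escaped` shows the tag as text, while `restored` is parsed as an image element.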
Best,
Markus
On 05/08/2013 11:25 AM, Nikolas Fechner wrote:
Hi Markus,
Nice find! That could very likely be the cause of the problem. I just
saw that in the very recent version 0.11 (April 22, 2013) a new
parameter was added to the pandas to_html() method that should
have exactly that effect.
*escape : boolean, default True*
*Convert the characters <, >, and & to HTML-safe sequences.*
This wasn't there in versions 0.10/0.10.1, which is what I was using
so far. Are you using pandas 0.11? I will update my pandas and check
that and if necessary find a way to deal with this in the PandasTools.
Thanks for finding that.
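
A minimal sketch of how the `escape` parameter changes the output of `to_html()`, assuming a pandas version that has it (0.11 or later). The cell content is a hypothetical placeholder, not the tag PandasTools generates.

```python
import pandas as pd

# Hypothetical cell content standing in for a generated image tag.
df = pd.DataFrame({"Molecule": ['<img src="..." alt="Mol"/>']})

# Default behaviour: <, > and & are converted to HTML-safe sequences.
escaped_html = df.to_html()

# With escape=False the cell content is written into the table unchanged.
raw_html = df.to_html(escape=False)

print("&lt;img" in escaped_html)
print('<img src="..."' in raw_html)
```

Only pass `escape=False` for content you trust, since it lets arbitrary markup from the cells into the page.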
Best,
Niko
On May 8, 2013 at 10:59 AM Markus Hartenfeller
<markus.hartenfel...@molecularhealth.com> wrote:
Hi Niko,
I think I now know what the cause is: attached you will find 2
files. antibiotics.html is the direct print-out from python. The
characters '<' and '>' at the beginning and the end of the img tag are
html-masked in the code, i.e. replaced by '&lt;' and '&gt;'. That is why
the browser displays them as 'normal' characters. If I replace them
with the ASCII characters (as in the file _antibiotics.html), the
browser displays the structures correctly.
When you have time for it: can you trace this in the code?
Cheers,
Markus
On 05/08/2013 10:29 AM, Nikolas Fechner wrote:
Hi Markus,
Sorry, but I am running a bit out of ideas. Could you check whether
the structures are rendered if you write the "dataframe.to_html()"
output to a file and open that as a webpage? If this works, then it
probably has something to do with the ipython environment (btw, which
version are you using?).
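
The check suggested above could look roughly like this. It is a sketch under assumptions: `write_html_page` and the output file name are made up for illustration, and `table_html` is a stand-in for the string returned by `dataframe.to_html()`.

```python
import tempfile
from pathlib import Path


def write_html_page(table_html, path):
    """Wrap an HTML fragment in a minimal page and write it to `path`."""
    page = "<html><body>\n" + table_html + "\n</body></html>\n"
    path = Path(path)
    path.write_text(page, encoding="utf-8")
    return path


# Stand-in for dataframe.to_html(); replace it with the real output.
table_html = "<table><tr><td>example</td></tr></table>"

out = write_html_page(
    table_html, Path(tempfile.gettempdir()) / "antibiotics_check.html"
)
print(out)
```

The file can then be opened in the browser by hand, or with `webbrowser.open(out.as_uri())` from the standard library.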
Best,
Niko
On May 8, 2013 at 9:51 AM Markus Hartenfeller
<markus.hartenfel...@molecularhealth.com> wrote:
Hi Niko,
I tried this piece of code adapted from the doctest and got the
same result (table is fine, but no rendering of molecules):
from rdkit.Chem import PandasTools
import pandas as pd
import os
from rdkit import RDConfig
from rdkit.Chem.Draw import IPythonConsole
from IPython.core.display import HTML
antibiotics = pd.DataFrame(columns=['Name','Smiles'])
antibiotics = antibiotics.append({'Smiles':'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C','Name':'Penicilline G'}, ignore_index=True)  # Penicilline G
antibiotics = antibiotics.append({'Smiles':'CC1(C2CC3C(C(=O)C(=C(C3(C(=O)C2=C(C4=C1C=CC=C4O)O)O)O)C(=O)N)N(C)C)O','Name':'Tetracycline'}, ignore_index=True)  # Tetracycline
antibiotics = antibiotics.append({'Smiles':'CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=CC=C3)N)C(=O)O)C','Name':'Ampicilline'}, ignore_index=True)  # Ampicilline
PandasTools.AddMoleculeColumnToFrame(antibiotics,'Smiles','Molecule',includeFingerprints=True)
display(HTML(antibiotics.to_html()))
The img tag and the png encoding themselves are fine. If I paste
one in a simple html page and open it with the same browser the
molecule is rendered.
Best,
Markus
On 05/08/2013 09:03 AM, Fechner, Nikolas wrote:
Hi Markus,
Could you try the examples that are included as doctests in the
PandasTools.py module? These should definitely work and show
rendered molecules in the tables.
Best,
Niko
From: Markus Hartenfeller <markus.hartenfel...@molecularhealth.com>
Date: Tuesday, May 7, 2013 1:40 PM
To: "rdkit-discuss@lists.sourceforge.net" <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] New module for RDKit - PANDAS integration
Sorry for the confusion, I truncated the string myself in the mail
because I did not want to paste the whole beast. The fields
contain the full strings and the tag is closed.
Best,
Markus
On 05/07/2013 01:25 PM, Nikolas Fechner wrote:
When developing the module I occasionally had problems with
*very* long png strings, because the pandas maximal column width
applies to the string, which is what is stored in the dataframe,
before the image rendering. As an effect the truncated png string
was shown in the table (exactly the '...' ending shown in your
example).
You could try manually setting the maximal width very high (e.g.
pandas.set_option("display.max_colwidth",100000)). This should be
done automatically by the PandasTools, which sets it to
len(PNG)+100 for the longest string found during rendering, but
because this rarely had an impact I could very well have overlooked
some problems with this strategy.
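
A minimal sketch of raising the column width limit, here scoped with `pandas.option_context` so the previous setting is restored afterwards. The 200-character value is a made-up stand-in for a long PNG string, and this is not how PandasTools itself applies the setting.

```python
import pandas as pd

# Made-up long cell value standing in for a base64 PNG string.
long_value = "A" * 200
df = pd.DataFrame({"Molecule": [long_value]})

before = pd.get_option("display.max_colwidth")

# Raise the limit only for the duration of the block.
with pd.option_context("display.max_colwidth", 100000):
    inside = pd.get_option("display.max_colwidth")
    table_html = df.to_html()

after = pd.get_option("display.max_colwidth")

print(inside)
print(long_value in table_html)
```

Calling `pandas.set_option("display.max_colwidth", 100000)` as in the message above has the same effect, but for the rest of the session.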
Best,
Niko
On May 7, 2013 at 1:13 PM Markus Hartenfeller
<markus.hartenfel...@molecularhealth.com> wrote:
Thanks again for your reply. That's what I have tried:
from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd
from rdkit.Chem import PandasTools
from rdkit.Chem.Draw import IPythonConsole
from IPython.core.display import HTML
df = PandasTools.LoadSDF('test.sdf', includeFingerprints=False)
display(HTML(df.to_html()))
So it is a dataframe and .to_html() works fine in general. I see
all sdf fields. It's just that the molecule column contains
string values of this kind:
<img src=" ...
The notebook somehow does not realize that it is an html tag
with an image, but instead renders it as a normal string (just
like before with the single molecule).
Best wishes,
Markus
On 05/07/2013 12:57 PM, Nikolas Fechner wrote:
Just for clarification, are you trying to render a dataframe or
a series/single column? The pandas series object has no
to_html() method and is therefore rendered as string only.
Moreover, if you select a single column, e.g. 'ROMol' from a
dataframe by df['ROMol'] you will get a series object that is
rendered as string. If you select a set of columns you get a
dataframe, for which the HTML rendering should work. The latter
also works for a single column if you enclose it in double
brackets, df[['ROMol']], which will give a single-column
dataframe. This took me some time to figure out and the silent
conversion that sometimes occurs can be quite confusing.
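
The difference between the two selections can be checked directly. This is a small sketch with placeholder data; the 'ROMol' column holds a plain string here rather than an RDKit molecule.

```python
import pandas as pd

# Placeholder frame; a real one would hold RDKit molecules in 'ROMol'.
df = pd.DataFrame({"Name": ["example"], "ROMol": ["placeholder"]})

single = df["ROMol"]    # single brackets: a Series
frame = df[["ROMol"]]   # double brackets: a one-column DataFrame

print(type(single).__name__)
print(type(frame).__name__)
```

Only `frame` goes through the dataframe HTML rendering path described above.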
Best,
Niko
On May 7, 2013 at 11:33 AM Markus Hartenfeller
<markus.hartenfel...@molecularhealth.com> wrote:
Thanks for your help, Niko. Importing the IPythonConsole from
rdkit and removing the 'print' command did the trick for a
single molecule :)
Unfortunately, molecules in data frames are still shown as
strings, even when forcing html rendering. I will try to get
this working and report here if I make any progress. In case
somebody has already faced the same problem please let me know.
Best,
Markus
On 05/07/2013 10:27 AM, Nikolas Fechner wrote:
Hi Markus,
glad you think it could be useful :). Regarding the problem,
there are two things: You have to import the RDKit
IPythonConsole to enable the molecule rendering (from
rdkit.Chem.Draw import IPythonConsole) and if you trigger the
output using 'print' the notebook will always use string
rendering (AFAIK). Just try 'm' alone (instead of 'print m').
Alternatively, you can always force the notebook to do an HTML
rendering (useful for large dataframes):
from IPython.core.display import HTML
display(HTML('''any HTML string e.g. dataframe.to_html()'''))
I hope that helps.
Best,
Niko
On May 7, 2013 at 10:02 AM Markus Hartenfeller
<markus.hartenfel...@molecularhealth.com> wrote:
Hi Nikolas,
I had a first look at the PandasTools package: very cool! I
think this is going to be useful for many rdkit users. I'm
looking forward to using it in the future. Thanks for
sharing this module.
I'm having trouble seeing the molecule depictions in the
ipython notebook though (both in tables and when just printing
out a single molecule).
This code in an ipython notebook
from rdkit import Chem
from rdkit.Chem import PandasTools
m=Chem.MolFromSmiles('N1CCNCC1')
print m
gives me
<img src=" ...
a very long string with the base64 encoding of the image,
but not the image itself. Plotting from matplotlib works
fine. Did I forget to import something, or could it be a
browser issue? I am using CentOS 6 and Firefox.
Thanks in advance.
Best,
Markus
On 04/19/2013 11:56 AM, Nikolas Fechner wrote:
Dear all,
We developed a new module ( rdkit.Chem.PandasTools.py )
that allows for using RDKit molecule objects directly in
pandas dataframes. Pandas ( http://pandas.pydata.org/) is a
python library that offers table-like datacontainers, which
are incredibly useful for anything related to data mining.
Moreover, it integrates nicely with the ipython notebook
producing rendered HTML tables for the dataframes. The
RDKit integration allows to have molecule-type columns and
functionality to perform substructure-based row filtering
directly on the pandas table. Additionally, if a dataframe
is exported as HTML or shown within an ipython notebook,
the molecules in the table are rendered as 2D structures.
The new module is available in the current SF trunk and
contains a doctest header that provides examples of how to
use it.
I hope some of you find that interesting. As always, bug
reports, comments, ideas... are very much appreciated.
Best,
Nikolas
------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free
account! http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and
their applications. This 200-page book is written by three acclaimed
leaders in the field. The early access version is available now.
Download your free book today!
http://p.sf.net/sfu/neotech_d2d_may
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
<mailto:Rdkit-discuss@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss