Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-08 Thread Markus Hartenfeller

Hi Niko,

I tried this piece of code adapted from the doctest and got the same 
result (table is fine, but no rendering of molecules):


from rdkit.Chem import PandasTools
import pandas as pd
import os
from rdkit import RDConfig
from rdkit.Chem.Draw import IPythonConsole
from IPython.core.display import HTML
antibiotics = pd.DataFrame(columns=['Name','Smiles'])
antibiotics = antibiotics.append({'Smiles':'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C','Name':'Penicilline G'}, ignore_index=True)#Penicilline G
antibiotics = antibiotics.append({'Smiles':'CC1(C2CC3C(C(=O)C(=C(C3(C(=O)C2=C(C4=C1C=CC=C4O)O)O)O)C(=O)N)N(C)C)O','Name':'Tetracycline'}, ignore_index=True)#Tetracycline
antibiotics = antibiotics.append({'Smiles':'CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=CC=C3)N)C(=O)O)C','Name':'Ampicilline'}, ignore_index=True)#Ampicilline

PandasTools.AddMoleculeColumnToFrame(antibiotics,'Smiles','Molecule',includeFingerprints=True)
display(HTML(antibiotics.to_html()))


The img tag and the png encoding themselves are fine. If I paste one in 
a simple html page and open it with the same browser the molecule is 
rendered.


Best,
Markus



On 05/08/2013 09:03 AM, Fechner, Nikolas wrote:

Hi Markus,
Could you try the examples that are included as doctests in the 
PandasTools.py module? These should definitely work and show rendered 
molecules in the tables.


Best,
Niko

From: Markus Hartenfeller <markus.hartenfel...@molecularhealth.com>
Date: Tuesday, May 7, 2013 1:40 PM
To: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

Sorry for the confusion, I truncated the string myself in the mail 
because I did not want to paste the whole beast. The fields contain 
the full strings and the tag is closed.


Best,
Markus


On May 7, 2013 at 1:13 PM Markus Hartenfeller 
markus.hartenfel...@molecularhealth.com wrote:

Thanks again for your reply. That's what I have tried:

from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd
from rdkit.Chem import PandasTools
from rdkit.Chem.Draw import IPythonConsole
from IPython.core.display import HTML
df = PandasTools.LoadSDF('test.sdf', includeFingerprints=False)
display(HTML(df.to_html()))

So it is a dataframe and .to_html() works fine in general. I see all 
sdf fields. It's just that the molecule column contains string values 
of this kind:

<img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ...

The notebook somehow does not realize that it is an html tag with an 
image, but instead renders it as a normal string (just like before 
with the single molecule).


Best wishes,
Markus



Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-08 Thread Nikolas Fechner
Hi Markus,
Sorry, but I am running a bit out of ideas. Could you check whether the
structures are rendered if you write the dataframe.to_html() output to a file
and open that as a webpage? If this works, then it probably has something to do
with the ipython environment (btw, which version are you using?).

Best,
Niko


Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-08 Thread Markus Hartenfeller

Hi,

Strange, I'm also using pandas 0.10.1, but it seems pretty obvious to me 
that the problem is related to that, although it is not clear to me why 
it happens on my system but not on yours :)


For the others following the conversation: Sorry for being sloppy and 
discussing with Niko directly and in German, but I wanted to check the 
hypothesis with him first to spare you the additional email traffic and 
just let you know the result once we found the problem:


When printing out the html code to a file as Niko suggested I realized 
that '<' and '>' at the beginning and the end of the img tag are masked 
in the html code as '&lt;' and '&gt;'. This makes the html parser of the 
browser ignore them (and at the same time display the correct 
characters in the string in the table). Unmasking them in the html code 
gives the correct renderings of the molecules.


Best,
Markus

On 05/08/2013 11:25 AM, Nikolas Fechner wrote:


Hi Markus,
Nice find! That is very likely the cause of the problem. I just 
saw that the very recent version 0.11 (22 April 2013) added a new 
parameter to the pandas to_html() method that should have exactly 
that effect:


escape : boolean, default True
    Convert the characters <, >, and & to HTML-safe sequences.

This wasn't there in versions 0.10/0.10.1, which is what I was using 
so far. Are you using pandas 0.11? I will update my pandas and check 
that and if necessary find a way to deal with this in the PandasTools.

Thanks for finding that.
Best,
Niko
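
To illustrate the escape option quoted above, here is a minimal sketch. It assumes pandas 0.11 or later; the frame contents and the shortened base64 payload are made-up placeholders, not real image data, and this is not necessarily how PandasTools itself handles the issue.

```python
import pandas as pd

# Placeholder img tag; the base64 payload is a stand-in, not a real PNG.
IMG_TAG = '<img src="data:image/png;base64,AAAA">'

df = pd.DataFrame({"Name": ["piperazine"], "Molecule": [IMG_TAG]})

# Default (escape=True): '<' and '>' become '&lt;' and '&gt;', so a browser
# shows the tag as literal text instead of rendering an image.
escaped_html = df.to_html()

# escape=False leaves the cell content untouched, so the tag reaches
# the browser as markup.
raw_html = df.to_html(escape=False)

print("&lt;img" in escaped_html)  # True
print(IMG_TAG in raw_html)        # True
```

Note that escape=False should only be used when the cell contents are trusted, since any markup in the frame is then passed through unchanged.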




On May 8, 2013 at 10:59 AM Markus Hartenfeller 
markus.hartenfel...@molecularhealth.com wrote:

Hi Niko,

I think I now know what the cause is: attached you will find 2 
files: antibiotics.html is the direct print-out from python. The 
characters '<' and '>' at the beginning and the end of the img tag are 
HTML-escaped in the code, i.e. replaced by '&lt;' and '&gt;'. That is 
why the browser displays them as 'normal' characters. If I replace them 
with the ASCII characters (as in the file _antibiotics.html), the 
browser displays the structures correctly.


If you have time for it at some point: can you reproduce this in the code?

Cheers,
Markus



Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-07 Thread Markus Hartenfeller

Hi Nikolas,

I had a first look at the PandasTools package: very cool! I think this 
is going to be useful for many rdkit users. I'm looking forward to using 
it in the future. Thanks for sharing this module.


I'm having trouble seeing the molecule depictions in the ipython 
notebook, though (both in tables and when just printing a single molecule).


This code in an ipython notebook

from rdkit import Chem
from rdkit.Chem import PandasTools
m=Chem.MolFromSmiles('N1CCNCC1')
print m

gives me

<img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB 
...

a very long string with the base64 encoding of the image, but not the 
image itself. Plotting from matplotlib works fine. Did I forget to 
import something, or could it be a browser issue? I am using CentOS 6 
and Firefox.


Thanks in advance.

Best,
Markus


On 04/19/2013 11:56 AM, Nikolas Fechner wrote:

Dear all,
We developed a new module (rdkit.Chem.PandasTools.py) that allows 
RDKit molecule objects to be used directly in pandas dataframes. Pandas 
(http://pandas.pydata.org/) is a python library that offers table-like 
data containers, which are incredibly useful for anything related to 
data mining. Moreover, it integrates nicely with the ipython notebook, 
producing rendered HTML tables for the dataframes. The RDKit 
integration adds molecule-type columns and functionality to 
perform substructure-based row filtering directly on the pandas table. 
Additionally, if a dataframe is exported as HTML or shown within an 
ipython notebook, the molecules in the table are rendered as 2D 
structures.
The new module is available in the current SF trunk and contains a 
doctest header that provides examples of how to use it.
I hope some of you find that interesting. As always, bug reports, 
comments, ideas... are very much appreciated.

Best,
Nikolas



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-07 Thread Nikolas Fechner
Hi Markus,
glad you think it could be useful :). Regarding the problem, there are two
things: first, you have to import the RDKit IPythonConsole to enable the molecule
rendering (from rdkit.Chem.Draw import IPythonConsole); second, if you trigger the
output using 'print', the notebook will always use string rendering (AFAIK). Just
try 'm' alone (instead of 'print m'). Alternatively, you can always force the
notebook to do an HTML rendering (useful for large dataframes):

from IPython.core.display import HTML
display(HTML('''any HTML string e.g. dataframe.to_html()'''))

I hope that helps.

Best,
Niko
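
The difference between 'print m' and a bare 'm' can be sketched without RDKit. The notebook's rich display looks for methods such as _repr_html_ on the last expression in a cell, whereas print only ever uses the string form. FakeMol below is a hypothetical stand-in class, not part of RDKit, and the sketch assumes the IPythonConsole import works by registering comparable display hooks on molecule objects.

```python
class FakeMol:
    """Hypothetical stand-in for a molecule object; not an RDKit class."""

    def __init__(self, smiles):
        self.smiles = smiles

    def __str__(self):
        # What print shows: always plain text.
        return "FakeMol(%s)" % self.smiles

    def _repr_html_(self):
        # What a notebook asks for when the object is the last
        # expression in a cell.
        return "<b>%s</b>" % self.smiles


m = FakeMol("N1CCNCC1")
text_output = str(m)           # the path taken by print
rich_output = m._repr_html_()  # the path taken by a bare 'm' in a notebook

print(text_output)  # FakeMol(N1CCNCC1)
print(rich_output)  # <b>N1CCNCC1</b>
```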



Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-07 Thread Markus Hartenfeller
Thanks for your help, Niko. Importing the IPythonConsole from rdkit and 
removing the 'print' command did the trick for a single molecule :)


Unfortunately, molecules in data frames are still shown as strings, even 
when forcing html rendering. I will try to get this working and report 
here if I make any progress. In case somebody has already faced the same 
problem please let me know.


Best,
Markus




Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-07 Thread Nikolas Fechner
Just for clarification, are you trying to render a dataframe or a series/single
column? The pandas Series object has no to_html() method and is therefore
rendered as a string only. Moreover, if you select a single column, e.g. 'ROMol',
from a dataframe by df['ROMol'], you will get a Series object that is rendered as
a string. If you select a set of columns you get a dataframe, for which the HTML
rendering should work. The latter also works for a single column if you enclose
it in double brackets, df[['ROMol']], which gives a single-column dataframe.
This took me some time to figure out, and the silent conversion that sometimes
occurs can be quite confusing.

Best,
Niko
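
The single-bracket versus double-bracket distinction can be checked with a small sketch. The toy frame below holds plain strings in the 'ROMol' column instead of RDKit molecules, and the values are made up for illustration.

```python
import pandas as pd

# Toy frame; 'ROMol' holds plain strings here instead of RDKit molecules.
df = pd.DataFrame({"Name": ["a", "b"], "ROMol": ["mol-a", "mol-b"]})

single = df["ROMol"]    # single brackets: a Series
framed = df[["ROMol"]]  # double brackets: a one-column DataFrame

print(type(single).__name__)  # Series
print(type(framed).__name__)  # DataFrame

# The one-column DataFrame can be exported as an HTML table.
html = framed.to_html()
```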


Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-07 Thread Nikolas Fechner
When developing the module I occasionally had problems with *very* long png
strings, because the pandas maximal column width applies to the string, which is
what is stored in the dataframe, before the image rendering. As a result, the
truncated png string was shown in the table (exactly the ...' ending shown in
your example).
You could try manually setting the maximal width very high (e.g.
pandas.set_option('display.max_colwidth', N) with a large N). This should be done
automatically by PandasTools, which sets it to len(PNG)+100 for the longest
string found during rendering, but because this rarely had an impact I could
very well have overlooked some problems with this strategy.

Best,
Niko
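
A minimal sketch of the column-width effect, assuming a reasonably recent pandas in which to_html() honours display.max_colwidth. The 200-character string is a made-up stand-in for a base64 PNG, and pd.option_context is used here only so the setting is restored afterwards.

```python
import pandas as pd

# Stand-in for a long base64 PNG string (made-up data).
long_value = "x" * 200
df = pd.DataFrame({"Molecule": [long_value]})

# With a small limit the cell content should be cut off and end in '...'.
with pd.option_context("display.max_colwidth", 20):
    truncated_html = df.to_html()

# Raising the limit above the longest string keeps the value intact,
# mirroring the len(PNG)+100 strategy described above.
with pd.option_context("display.max_colwidth", len(long_value) + 100):
    full_html = df.to_html()

print(long_value in full_html)  # True
```

Comparing truncated_html with full_html shows whether a given pandas version shortens the cell before the HTML is written.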

On May 7, 2013 at 1:13 PM Markus Hartenfeller
markus.hartenfel...@molecularhealth.com wrote:

 Thanks again for your reply. That's what I have tried:
 
  from rdkit import Chem
  from rdkit.Chem import AllChem
  import pandas as pd
  from rdkit.Chem import PandasTools
  from rdkit.Chem.Draw import IPythonConsole
  from IPython.core.display import HTML
  df = PandasTools.LoadSDF('test.sdf', includeFingerprints=False)
  display(HTML(df.to_html()))
 
  So it is a dataframe and .to_html() works fine in general. I see all sdf
 fields. It's just that the molecule column contains string value of this kind:
 
  <img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ...
 
 
  The notebook somehow does not realize that it is an html tag with an image,
 but instead renders it as a normal string (just like before with the single
 molecule).
 
  Best wishes,
  Markus
 
 
  On 05/07/2013 12:57 PM, Nikolas Fechner wrote:
 
 Just for clarification, are you trying to render a dataframe or
a series/single column? The pandas series object has no to_html()
method and is therefore rendered as string only. Moreover, if you
select a single column, e.g. 'ROMol' from a dataframe by df['ROMol']
you will get a series object that is rendered as string. If you
select a set of columns you get a dataframe, for which the HTML
rendering should work. The latter also works for a single column if
you enclose it in double brackets, df[['ROMol']], which will give a
single-column dataframe. This took me some time to figure out and the
silent conversion that sometimes occurs can be quite confusing.
  
   Best,
   Niko
  
   On May 7, 2013 at 11:33 AM Markus Hartenfeller
  markus.hartenfel...@molecularhealth.com
  mailto:markus.hartenfel...@molecularhealth.com wrote:
  
   Thanks for your help, Niko. Importing the iPythonConsole from
   rdkit + removing the 'print' command did the trick for a single
   molecule :)
   
 Unfortunately, molecules in data frames are still shown as strings,
   even when forcing html rendering. I will try to get this working and
   report here if I make any progress. In case somebody has already faced the
   same problem please let me know.
   
 Best,
 Markus
   
   
 On 05/07/2013 10:27 AM, Nikolas Fechner wrote:
   
   Hi Markus,
  glad you think it could be useful :). Regarding the problem,
there are two things: You have to import the RDKit IPythonConsole to
enable the molecule rendering (from rdkit.Chem.Draw import
IPythonConsole) and if you trigger the output using 'print' the notebook
will always use string rendering (AFAIK). Just try 'm' alone (instead of
'print m'). Alternatively, you can always force the notebook to do a
HTML rendering (useful for large dataframe):

  from IPython.core.display import HTML
  display(HTML('''any HTML string e.g. dataframe.to_html()'''))
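
Why 'print' shows markup while a bare expression shows a picture can be sketched without RDKit. The class below is a toy (FakeDepiction and its payload are hypothetical, not RDKit code): print() goes through str(), which yields plain text, while the notebook's rich display asks an object for _repr_html_() and interprets the result as HTML.

```python
class FakeDepiction:
    """Toy object standing in for something a notebook can draw as an image."""

    def __init__(self, payload):
        self.payload = payload  # would be base64 PNG data in a real case

    def _tag(self):
        return '<img src="data:image/png;base64,%s">' % self.payload

    def __str__(self):
        # Used by print(): the tag is written out as ordinary text.
        return self._tag()

    def _repr_html_(self):
        # Used by the notebook's rich display: the same tag is treated as HTML.
        return self._tag()


d = FakeDepiction("iVBORw0KGgo")
as_text = str(d)            # what print(d) would write
as_html = d._repr_html_()   # what the notebook renders for a bare `d`
print(as_text)
```

Both paths carry the same string here; only the way the front end treats it differs, which matches the symptom of seeing the img tag as text.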

  I hope that helps.

  Best,
  Niko

  On May 7, 2013 at 10:02 AM Markus Hartenfeller
markus.hartenfel...@molecularhealth.com
mailto:markus.hartenfel...@molecularhealth.com wrote:

Hi Nikolas,
 
I had a first look at the PandasTools package: very cool! I
 think this is going to be useful for many rdkit users. I'm looking
 forward to using it in the future. Thanks for sharing this module.
 
I'm having troubles to see the molecule depictions in the
 ipython notebook though (both in tables and by just printing out a
 single molecule).
 
This code in a ipython notebook
 
from rdkit import Chem
from rdkit.Chem import PandasTools
m=Chem.MolFromSmiles('N1CCNCC1')
print m
 
gives me
 
 <img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB ...
 
a very long string with the base64 

Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-05-07 Thread Markus Hartenfeller
Sorry for the confusion, I truncated the string myself in the mail 
because I did not want to paste the whole beast. The fields contain the 
full strings and the tag is closed.


Best,
Markus

On 05/07/2013 01:25 PM, Nikolas Fechner wrote:
When developing the module I occasionally had problems with *very* 
long png strings, because the pandas maximal column width applies to 
the string, which is what is stored in the dataframe, before the image 
rendering. As a result, the truncated png string was shown in the 
table (exactly the '...' ending shown in your example).
You could try manually setting the maximal width very high (e.g. 
pandas.set_option('display.max_colwidth',10)). This should be done 
automatically by PandasTools, which sets it to len(PNG)+100 for 
the longest string found during rendering, but because this rarely had 
an impact I could very well have overlooked some problems with this 
strategy.

Best,
Niko

On May 7, 2013 at 1:13 PM Markus Hartenfeller 
markus.hartenfel...@molecularhealth.com wrote:

Thanks again for your reply. That's what I have tried:

from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd
from rdkit.Chem import PandasTools
from rdkit.Chem.Draw import IPythonConsole
from IPython.core.display import HTML
df = PandasTools.LoadSDF('test.sdf', includeFingerprints=False)
display(HTML(df.to_html()))

So it is a dataframe and .to_html() works fine in general. I see all 
sdf fields. It's just that the molecule column contains string value 
of this kind:

<img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB  ...

The notebook somehow does not realize that it is an html tag with an 
image, but instead renders it as a normal string (just like before 
with the single molecule).


Best wishes,
Markus


On 05/07/2013 12:57 PM, Nikolas Fechner wrote:
Just for clarification, are you trying to render a dataframe or a 
series/single column? The pandas series object has no to_html() 
method and is therefore rendered as string only. Moreover, if you 
select a single column, e.g. 'ROMol' from a dataframe by df['ROMol'] 
you will get a series object that is rendered as string. If you 
select a set of columns you get a dataframe, for which the HTML 
rendering should work. The latter also works for a single column if 
you enclose it in double brackets, df[['ROMol']], which will give 
a single-column dataframe. This took me some time to figure out and 
the silent conversion that sometimes occurs can be quite confusing.
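
The difference between the two selections can be checked directly in pandas. In this sketch the 'ROMol' column holds placeholder strings rather than real molecule objects, so it only illustrates the Series versus DataFrame distinction, not the molecule rendering itself.

```python
import pandas as pd

# Placeholder strings stand in for molecule objects (assumption for the demo).
df = pd.DataFrame({"Name": ["a", "b"], "ROMol": ["mol-a", "mol-b"]})

single = df["ROMol"]     # single brackets: a Series
framed = df[["ROMol"]]   # double brackets: a one-column DataFrame

print(type(single).__name__)
print(type(framed).__name__)

# Only the DataFrame selection is exported as an HTML table here.
html = framed.to_html()
```

If a Series is what you already have, single.to_frame() gives the same one-column dataframe.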

Best,
Niko

On May 7, 2013 at 11:33 AM Markus Hartenfeller 
markus.hartenfel...@molecularhealth.com 
mailto:markus.hartenfel...@molecularhealth.com wrote:
Thanks for your help, Niko. Importing the iPythonConsole from rdkit 
+ removing the 'print' command did the trick for a single molecule :)


Unfortunately, molecules in data frames are still shown as strings, 
even when forcing html rendering. I will try to get this working 
and report here if I make any progress. In case somebody has 
already faced the same problem please let me know.


Best,
Markus


On 05/07/2013 10:27 AM, Nikolas Fechner wrote:

Hi Markus,
glad you think it could be useful :). Regarding the problem, there 
are two things: You have to import the RDKit IPythonConsole to 
enable the molecule rendering (from rdkit.Chem.Draw import 
IPythonConsole) and if you trigger the output using 'print' the 
notebook will always use string rendering (AFAIK). Just try 'm' 
alone (instead of 'print m'). Alternatively, you can always force 
the notebook to do a HTML rendering (useful for large dataframe):

from IPython.core.display import HTML
display(HTML('''any HTML string e.g. dataframe.to_html()'''))
I hope that helps.
Best,
Niko

On May 7, 2013 at 10:02 AM Markus Hartenfeller 
markus.hartenfel...@molecularhealth.com 
mailto:markus.hartenfel...@molecularhealth.com wrote:

Hi Nikolas,

I had a first look at the PandasTools package: very cool! I think 
this is going to be useful for many rdkit users. I'm looking 
forward to using it in the future. Thanks for sharing this module.


I'm having troubles to see the molecule depictions in the ipython 
notebook though (both in tables and by just printing out a single 
molecule).


This code in a ipython notebook

from rdkit import Chem
from rdkit.Chem import PandasTools
m=Chem.MolFromSmiles('N1CCNCC1')
print m

gives me
  <img src="data:image/png;base64,iVBORw0KGgoNSUhEUgAAASwAAAEsCAYAAAB  ...
a very long string with the base64 encoding of the image, but not 
the image itself. Plotting from matplotlib works fine. Did I 
forget to import something, or could it be a browser issue? I am 
using centOS 6 and Firefox.


Thanks in advance.

Best,
Markus


On 04/19/2013 11:56 AM, Nikolas Fechner wrote:

Dear all,
We developed a new module ( rdkit.Chem.PandasTools.py ) that 
allows for using RDKit molecule objects directly in pandas 
dataframes. Pandas ( http://pandas.pydata.org/) is a python 
library that offers table-like datacontainers, which 

Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-04-22 Thread Patrick Walters
I just started playing around with the Pandas module, this is very cool
stuff.  Thanks so much Nikolas for the contribution.  I definitely owe you
a beer at the UGM.  It might be worth noting that you need to install
PIL in order to use the Pandas module.  Everything will install without a
problem, but you'll get an exception like this when you try to print a
dataframe without PIL.

File "/Users/walters/python/RDKIT_2013_04_21/rdkit/sping/PIL/pidPIL.py", line 33, in <module>
    import Image, ImageFont, ImageDraw
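
A quick way to spot such a missing dependency before it surfaces as an exception is to ask the import system whether a module can be found. This is a small sketch; the helper name missing_modules is made up, and the module name "PIL" is an assumption (it is the import name used by current Pillow installs, whereas the traceback above shows the older bare `import Image` style).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["PIL", "pandas"])
if missing:
    print("Install before using PandasTools:", ", ".join(missing))
else:
    print("All listed modules can be found.")
```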

Best,

Pat



On Sun, Apr 21, 2013 at 5:00 PM, Taka Seri serit...@gmail.com wrote:

 Dear Greg.

 Thank you your quick reply !
 The modified version was worked without AvalonTools .
 That's nice tool .
 I appreciate your kindness.

 Takayuki

 2013/4/22 Greg Landrum greg.land...@gmail.com

 Dear Takayuki,

 On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:
 
  I'm interested in this work
  I want to use PandasTools.
  But I got error message, ImportError: cannot import name
 pyAvalonTools.

 I just checked in a modified version that will work when the avalon
 tools are not installed.

 If you want to install the avalon tools anyway, there's information
 below that shows how:

 
  So, I tried to rebuild RDKit like this.
  $ cmake -D RDK_BUID_AVALON_SUPPORT=ON
  But build was failed.
 
  -- Configuring done
  CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
Cannot find source file:
 
  /common/layout.c
 
Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm
 .hpp
.hxx .in .txx
  Call Stack (most recent call first):
External/AvalonTools/CMakeLists.txt:43 (rdkit_library)
 
  If anyone who has suggestion, please help me.

 You need to tell it where to find the source for the avalon tools.

 - Download the source from here:

 http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download

 - Create an avalon tools directory somewhere, for example in
 /usr/local/src/avalontools.
 - Extract the tar file in that directory.
 - Run cmake as follows:
 cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution
 -DRDK_BUILD_AVALON_SUPPORT=ON

 Best,
 -greg




 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-04-22 Thread Nikolas Fechner
Hi Pat,
I am glad you find it useful. Many thanks for pointing out the PIL dependency. 
I had installed it already for different reasons and did not think about 
mentioning it.

Best,
Niko



On 22 Apr 2013, at 17:52, Patrick Walters wpwalt...@gmail.com wrote:

 I just started playing around with the Pandas module, this is very cool 
 stuff.  Thanks so much Nikolas for the contribution.  I definitely owe you a 
 beer at the UGM.  It might be worth noting that you need to install PIL 
 in order to use the Pandas module.  Everything will install without a 
 problem, but you'll get an exception like this when you try to print a 
 dataframe without PIL.
 
 File "/Users/walters/python/RDKIT_2013_04_21/rdkit/sping/PIL/pidPIL.py", line 33, in <module>
     import Image, ImageFont, ImageDraw
 
 Best,
 
 Pat
 
 
 
 On Sun, Apr 21, 2013 at 5:00 PM, Taka Seri serit...@gmail.com wrote:
 Dear Greg.
 
 Thank you your quick reply !
 The modified version was worked without AvalonTools . 
 That's nice tool .
 I appreciate your kindness.
 
 Takayuki
 
 2013/4/22 Greg Landrum greg.land...@gmail.com
 Dear Takayuki,
 
 On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:
 
  I'm interested in this work
  I want to use PandasTools.
  But I got error message, ImportError: cannot import name pyAvalonTools.
 
 I just checked in a modified version that will work when the avalon
 tools are not installed.
 
 If you want to install the avalon tools anyway, there's information
 below that shows how:
 
 
  So, I tried to rebuild RDKit like this.
  $ cmake -D RDK_BUID_AVALON_SUPPORT=ON
  But build was failed.
 
  -- Configuring done
  CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
Cannot find source file:
 
  /common/layout.c
 
Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
.hxx .in .txx
  Call Stack (most recent call first):
External/AvalonTools/CMakeLists.txt:43 (rdkit_library)
 
  If anyone who has suggestion, please help me.
 
 You need to tell it where to find the source for the avalon tools.
 
 - Download the source from here:
 http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download
 
 - Create an avalon tools directory somewhere, for example in
 /usr/local/src/avalontools.
 - Extract the tar file in that directory.
 - Run cmake as follows:
 cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution
 -DRDK_BUILD_AVALON_SUPPORT=ON
 
 Best,
 -greg
 
 


Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-04-21 Thread Greg Landrum
Dear Takayuki,

On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:

 I'm interested in this work
 I want to use PandasTools.
 But I got error message, ImportError: cannot import name pyAvalonTools.

I just checked in a modified version that will work when the avalon
tools are not installed.

If you want to install the avalon tools anyway, there's information
below that shows how:


 So, I tried to rebuild RDKit like this.
 $ cmake -D RDK_BUID_AVALON_SUPPORT=ON
 But build was failed.

 -- Configuring done
 CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
   Cannot find source file:

 /common/layout.c

   Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
   .hxx .in .txx
 Call Stack (most recent call first):
   External/AvalonTools/CMakeLists.txt:43 (rdkit_library)

 If anyone who has suggestion, please help me.

You need to tell it where to find the source for the avalon tools.

- Download the source from here:
http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download

- Create an avalon tools directory somewhere, for example in
/usr/local/src/avalontools.
- Extract the tar file in that directory.
- Run cmake as follows:
cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution
-DRDK_BUILD_AVALON_SUPPORT=ON
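
Before running cmake it can help to confirm that the directory really contains the sources. The sketch below assumes, based only on the error message above (which looks for /common/layout.c when no directory is given), that the build expects common/layout.c underneath AVALONTOOLS_DIR. The helper name avalon_source_ok is hypothetical, and the temporary directory just simulates an extracted tarball.

```python
import tempfile
from pathlib import Path


def avalon_source_ok(avalon_dir):
    """Check for the file that the CMake error message complains about."""
    return (Path(avalon_dir) / "common" / "layout.c").is_file()


# Simulate an empty directory, then one that looks like an extracted source tree.
with tempfile.TemporaryDirectory() as tmp:
    before = avalon_source_ok(tmp)
    (Path(tmp) / "common").mkdir()
    (Path(tmp) / "common" / "layout.c").write_text("/* placeholder */\n")
    after = avalon_source_ok(tmp)

print(before, after)
```

With a real checkout the call would be avalon_source_ok('/usr/local/src/avalontools/SourceDistribution'), using the example path from the steps above.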

Best,
-greg



Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-04-21 Thread Taka Seri
Dear Greg.

Thank you for your quick reply!
The modified version worked without AvalonTools.
That's a nice tool.
I appreciate your kindness.

Takayuki

2013/4/22 Greg Landrum greg.land...@gmail.com

 Dear Takayuki,

 On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:
 
  I'm interested in this work
  I want to use PandasTools.
  But I got error message, ImportError: cannot import name pyAvalonTools.

 I just checked in a modified version that will work when the avalon
 tools are not installed.

 If you want to install the avalon tools anyway, there's information
 below that shows how:

 
  So, I tried to rebuild RDKit like this.
  $ cmake -D RDK_BUID_AVALON_SUPPORT=ON
  But build was failed.
 
  -- Configuring done
  CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
Cannot find source file:
 
  /common/layout.c
 
Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm
 .hpp
.hxx .in .txx
  Call Stack (most recent call first):
External/AvalonTools/CMakeLists.txt:43 (rdkit_library)
 
  If anyone who has suggestion, please help me.

 You need to tell it where to find the source for the avalon tools.

 - Download the source from here:

 http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download

 - Create an avalon tools directory somewhere, for example in
 /usr/local/src/avalontools.
 - Extract the tar file in that directory.
 - Run cmake as follows:
 cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution
 -DRDK_BUILD_AVALON_SUPPORT=ON

 Best,
 -greg



[Rdkit-discuss] New module for RDKit - PANDAS integration

2013-04-19 Thread Nikolas Fechner
Dear all,
We developed a new module ( rdkit.Chem.PandasTools.py ) that allows for using
RDKit molecule objects directly in pandas dataframes. Pandas
(http://pandas.pydata.org/) is a python library that offers table-like
data containers, which are incredibly useful for anything related to data mining.
Moreover, it integrates nicely with the ipython notebook, producing rendered HTML
tables for the dataframes. The RDKit integration provides molecule-type
columns and functionality to perform substructure-based row filtering directly
on the pandas table. Additionally, if a dataframe is exported as HTML or shown
within an ipython notebook, the molecules in the table are rendered as 2D
structures.

The new module is available in the current SF trunk and contains a doctest
header that provides examples of how to use it.

I hope some of you find that interesting. As always, bug reports, comments,
ideas... are very much appreciated.

Best,
Nikolas



Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-04-19 Thread Greg Landrum
I think Nikolas is being a bit modest... the Pandas integration is
pretty cool. :-)

Here's an example of using it from the IPython prompt (it's better in
the notebook, but that doesn't paste so nicely into email)

Loading an SD file:

In [1]: from rdkit import Chem

In [2]: from rdkit.Chem import PandasTools

In [3]: import pandas as pd

In [4]: df = PandasTools.LoadSDF('hERG_inhibition_dataset.sdf',includeFingerprints=True)

In [5]: df
Out[5]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 242 entries, 0 to 241
Data columns:
ACTIVITY_CLASS    242  non-null values
CompoundName      242  non-null values
ID                242  non-null values
MDLPublicKeys     242  non-null values
SMILES            242  non-null values
pIC50             242  non-null values
ROMol             242  non-null values
dtypes: object(7)


And doing a substructure search:

In [6]: N3s = df[df['ROMol'] >= Chem.MolFromSmiles('N(C)(C)C')]

In [7]: N3s
Out[7]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 177 entries, 0 to 239
Data columns:
ACTIVITY_CLASS    177  non-null values
CompoundName      177  non-null values
ID                177  non-null values
MDLPublicKeys     177  non-null values
SMILES            177  non-null values
pIC50             177  non-null values
ROMol             177  non-null values
dtypes: object(7)

Because I used the includeFingerprints argument, that actually did
the search using a substructure fingerprint to speed things up. This
is using the avalon fingerprint at the moment, but that will change
between now and the release so as to not add an additional dependency.
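
The idea of a fingerprint pre-screen can be sketched in a few lines of plain Python. This is a toy model and not the PandasTools implementation: "molecules" are just sets of made-up feature labels, the fingerprint is a 64-bit integer built with crc32, and set inclusion stands in for the real (and much more expensive) substructure match.

```python
import zlib


def toy_fingerprint(features, n_bits=64):
    """Fold each feature label onto one bit of an integer bit mask."""
    fp = 0
    for feature in features:
        fp |= 1 << (zlib.crc32(feature.encode()) % n_bits)
    return fp


def may_contain(query_fp, mol_fp):
    """Cheap screen: every bit set in the query must also be set in the molecule."""
    return (query_fp & mol_fp) == query_fp


def filter_rows(rows, query):
    query_fp = toy_fingerprint(query)
    hits = []
    for name, features, fp in rows:
        if not may_contain(query_fp, fp):
            continue                  # rejected without the expensive test
        if query <= features:         # stand-in for the exact substructure match
            hits.append(name)
    return hits


library = {
    "amine": {"N", "C-N"},
    "ether": {"O", "C-O"},
    "aminoether": {"N", "C-N", "O", "C-O"},
}
# Fingerprints are computed once, up front, like a stored fingerprint column.
rows = [(name, feats, toy_fingerprint(feats)) for name, feats in library.items()]
hits = filter_rows(rows, {"N", "C-N"})
print(hits)
```

The screen can let false positives through (bit collisions) but never rejects a true match, so the exact test after it keeps the result correct while most non-matching rows are skipped cheaply.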

-greg

On Fri, Apr 19, 2013 at 11:56 AM, Nikolas Fechner niko...@fechner.cc wrote:
 Dear all,
 We developed a new module ( rdkit.Chem.PandasTools.py ) that allows for
 using RDKit molecule objects directly in pandas dataframes. Pandas
 (http://pandas.pydata.org/) is a python library that offers table-like
 datacontainers, which are incredibly useful for anything related to data
 mining. Moreover, it integrates nicely with the ipython notebook producing
 rendered HTML tables for the dataframes. The RDKit integration allows to
 have molecule-type columns and functionality to perform substructure-based
 row filtering directly on the pandas table. Additionally, if a dataframe is
 exported as HTML or shown within an ipython notebook, the molecules in the
 table are rendered as 2D structures.

 The new module is available in the current SF trunk and contains a doctest
 header that provides examples of how to use it.

 I hope some of you find that interesting. As always, bug reports, comments,
 ideas... are very much appreciated.

 Best,
 Nikolas




