Re: [Rdkit-discuss] RDKit Release 2018.09.2 available

2019-02-21 Thread Markus Sitzmann
Hi Greg,

I just saw that it is available in the conda-forge channel (with a timestamp
of 2 hours and a few minutes ago). However, if I install it from there in a
fresh container, I get 2018_09_1; I only get 2018_09_2 when I explicitly
force that version (and at a very quick glance it is running).

But why do I have to request version 2018_09_2 explicitly right now? This is
one of the few things I will never understand about conda.

Markus
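
A quick way to confirm which build conda actually resolved is to compare the
installed version string against the expected one. The following is only a
minimal sketch and not part of the original message: it assumes that
rdBase.rdkitVersion reports a dotted string such as 2018.09.2, and the helper
names parse_version and is_at_least are made up for illustration.

```python
import re


def parse_version(version):
    """Turn a version string such as '2018.09.2' into a tuple of ints."""
    # Keep only the first three numeric fields (year, month, patch).
    return tuple(int(field) for field in re.findall(r"\d+", version)[:3])


def is_at_least(installed, required):
    """Return True if the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


try:
    from rdkit import rdBase
    installed = rdBase.rdkitVersion
except ImportError:
    # RDKit is not importable here; fall back to the version from the report.
    installed = "2018.09.1"

print(installed, "is at least 2018.09.2:", is_at_least(installed, "2018.09.2"))
```

Because the parser only looks at the numeric fields, it also accepts the
underscore spelling used above (2018_09_2).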


On Thu, Feb 21, 2019 at 5:32 PM Greg Landrum wrote:

> Dear all,
>
> I normally don't announce the patch releases, but there are a couple of
> changes with the conda builds, so I figured I should probably mention it.
> :-)
>
> This time I did builds for:
> Python 3.7: Mac, Linux, Windows
> Python 3.6: Mac, Linux, Windows
> Python 2.7: Mac, Linux
>
> The boost and numpy dependencies have also been changed.
>
> The conda-forge channel should be updated in the near future as well.
>
> The release notes and source download are here:
> https://github.com/rdkit/rdkit/releases/tag/Release_2018_09_2
>
> Hopefully this all works smoothly, but I'm not 100% optimistic about that;
> please let me know if you encounter any problems with the new builds!
> -greg
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


[Rdkit-discuss] RDKit Release 2018.09.2 available

2019-02-21 Thread Greg Landrum
Dear all,

I normally don't announce the patch releases, but there are a couple of
changes with the conda builds, so I figured I should probably mention it.
:-)

This time I did builds for:
Python 3.7: Mac, Linux, Windows
Python 3.6: Mac, Linux, Windows
Python 2.7: Mac, Linux

The boost and numpy dependencies have also been changed.

The conda-forge channel should be updated in the near future as well.

The release notes and source download are here:
https://github.com/rdkit/rdkit/releases/tag/Release_2018_09_2

Hopefully this all works smoothly, but I'm not 100% optimistic about that;
please let me know if you encounter any problems with the new builds!
-greg


[Rdkit-discuss] Save the Date for the RDKit UGM: 25-27 September

2019-02-21 Thread Greg Landrum
Dear all,

This year's RDKit UGM will take place from 25-27 September at the
University of Hamburg in Hamburg, Germany. Emanuel Ehmki will be the
organizer/host this time around.

We'll put together the usual announcement and registration page over the
next couple of weeks, but definitely go ahead and mark your calendars for
it. :-)

-greg


[Rdkit-discuss] idea for a general discussion thread

2019-02-21 Thread Mario Lovrić
Dear all,

Since I (and I guess others) often have general cheminformatics questions,
I want to propose adding a general discussion thread, e.g. "RDKit general".
There is a cheminformatics tag on chemistry.stackexchange, but I find the
RDKit community much more open-minded, highly specialized, and pleasant to
discuss with. And there are no anonymous "reputation" chasers here.

Let me know your thoughts on this.


Thanks
-- 
Mario Lovrić


Re: [Rdkit-discuss] warnings when exporting pandas tables with molecules to hdf

2019-02-21 Thread Jose Manuel Gally

Hi Greg,

thanks for your input, this is quite a bit faster!

Cheers,
Jose Manuel

On 21.02.19 09:48, Greg Landrum wrote:

This is a nice solution to the problem. Thanks for sharing it!

I think there is, however, a minor mistake. This line:

df['mol'] = df['mol'].map(lambda x: base64.b64encode(pickle.dumps(x)).decode())


should be:

df['mol'] = df['mol'].map(lambda x: base64.b64encode(x.ToBinary()).decode())


You could also fix this by changing how you decode the column, but
this approach is faster.


-greg


On Mon, Feb 18, 2019 at 11:28 AM Jose Manuel Gally
<jose.manuel.ga...@gmail.com> wrote:


Dear all,

in case this is helpful for others, here is the solution I came up
with by combining 2 snippets of code [1, 2]:

# init
import base64
import pickle

import pandas as pd
from rdkit import Chem

n_records = 10
file = '/tmp/test.hdf'
key = 'test'
# cyclopropane, repeated n_records times
df = pd.DataFrame({'mol': [Chem.MolFromSmiles('C1CC1')] * n_records})

# store the molecules as base64-encoded strings
df['mol'] = df['mol'].map(lambda x: base64.b64encode(pickle.dumps(x)).decode())
df.to_hdf(file, key=key)

# read the stored strings and convert them back to molecules
df = pd.read_hdf(file, key=key)
df['mol'] = df['mol'].map(lambda x: Chem.Mol(base64.b64decode(x)))

This is much faster than exporting to MolBlock because there is no
need to reparse the molecules, and it got rid of the PyTables warning.
With this I could even just use good old CSV files instead of HDF.
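
The CSV remark can be illustrated with a small round trip. This is only a
sketch under assumptions, not the code from the thread: the helper names
to_b64_text and from_b64_text and the column name mol_b64 are made up, the
CSV is written to an in-memory buffer instead of a file, and plain byte
strings stand in for the molecule dumps when RDKit cannot be imported.

```python
import base64
import csv
import io


def to_b64_text(blob):
    """Encode raw bytes as ASCII text that is safe to store in a text column."""
    return base64.b64encode(blob).decode("ascii")


def from_b64_text(text):
    """Decode the stored text back into the original bytes."""
    return base64.b64decode(text)


try:
    from rdkit import Chem
    blobs = [Chem.MolFromSmiles(smi).ToBinary() for smi in ("CCO", "c1ccccc1")]
except ImportError:
    # Stand-in byte strings so the sketch still runs without RDKit.
    blobs = [b"\x00\x01binary-mol-1", b"\xff\xfebinary-mol-2"]

texts = [to_b64_text(blob) for blob in blobs]

# Write one encoded molecule per row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["mol_b64"])
for text in texts:
    writer.writerow([text])

# Read the rows back and decode them.
buffer.seek(0)
restored = [from_b64_text(row["mol_b64"]) for row in csv.DictReader(buffer)]

print(restored == blobs)
```

With RDKit available, each restored byte string could then be passed to
Chem.Mol() as in the snippet above.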

Cheers,
Jose Manuel

Refs:
[1]

https://github.com/rdkit/UGM_2016/blob/master/Notebooks/Pahl_NotebookTools_Tutorial.ipynb
[2]
http://rdkit.blogspot.com/2016/09/avoiding-unnecessary-work-and.html



On 15.02.19 22:21, Jose Manuel Gally wrote:


Dear Peter,

thank you for your reply.

That might work for me, I'll look into it.

As a side note, if I convert the Mol into RWMol, I don't get the
warning anymore (but then I cannot read the molecules anymore...)

Cheers,
Jose Manuel

On 15.02.19 17:14, Peter St. John wrote:

You might be better off not storing the RDKit molecule objects
themselves in the HDF file, but rather some other representation
of the molecule. If you need the 3D atom coordinates, you could call
MolToMolBlock() on each of the RDKit mols, and then
MolFromMolBlock() later to regenerate them. If you don't need the 3D
atom coordinates to be saved, SMILES strings would work well.

PyTables expects each entry to be something like an 'int',
'string', 'float64', etc., so the RDKit mol object is a fairly
odd data structure for that library; it's just warning you
that it will have to use Python's `pickle` module to serialize it.

On Fri, Feb 15, 2019 at 6:35 AM Jose Manuel Gally
<jose.manuel.ga...@gmail.com> wrote:

Hi all,

I am working on some molecules in a pandas DataFrame and have to export
them to an HDF file.

This works just fine, but I get a PerformanceWarning due to mixed
types. (1)

Why are RDKit Mol objects causing this warning in the first place? Am I
doing something wrong?

Please find attached a small notebook with an example.

For now I set the type of HDF to 'table', but I'm unsure whether this is
the best work-around.

Also, invoking pytest with the --disable-warnings flag removes the message,
but the warning itself remains.

Thanks in advance for any insight!

Cheers,
Jose Manuel

(1) PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it
cannot map directly to c-types [inferred_type->mixed,key->values]
[items->None]

   return pytables.to_hdf(path_or_buf, key, self, **kwargs)



Re: [Rdkit-discuss] warnings when exporting pandas tables with molecules to hdf

2019-02-21 Thread Greg Landrum
This is a nice solution to the problem. Thanks for sharing it!

I think there is, however, a minor mistake. This line:

df['mol'] = df['mol'].map(lambda x: base64.b64encode(pickle.dumps(x)).decode())

should be:

df['mol'] = df['mol'].map(lambda x: base64.b64encode(x.ToBinary()).decode())

You could also fix this by changing how you decode the column, but this
approach is faster.
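
The reason the two halves have to match can be shown without RDKit. The
following is only an illustrative sketch: the byte string raw is a made-up
stand-in for the output of x.ToBinary(), and pickling that byte string stands
in for pickling the molecule itself. The point is that a pickle stream is not
the same bytes as the raw dump, so it has to be read back with pickle.loads()
rather than handed directly to a constructor that expects the raw dump.

```python
import base64
import pickle

raw = b"raw-molecule-bytes"  # stand-in for x.ToBinary()

# Encode once from the raw dump and once from a pickle stream.
encoded_raw = base64.b64encode(raw).decode()
encoded_pickle = base64.b64encode(pickle.dumps(raw)).decode()

# The raw dump comes back unchanged after base64 decoding.
assert base64.b64decode(encoded_raw) == raw

# The pickle stream carries extra framing, so it is not the raw dump...
assert base64.b64decode(encoded_pickle) != raw

# ...and only round-trips when decoded with the matching pickle.loads().
assert pickle.loads(base64.b64decode(encoded_pickle)) == raw
```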

-greg


On Mon, Feb 18, 2019 at 11:28 AM Jose Manuel Gally <
jose.manuel.ga...@gmail.com> wrote:

> Dear all,
>
> in case this is helpful for others, here is the solution I came up with by
> combining 2 snippets of code [1, 2]:
>
> # init
> import base64
> import pickle
>
> import pandas as pd
> from rdkit import Chem
>
> n_records = 10
> file = '/tmp/test.hdf'
> key = 'test'
> # cyclopropane, repeated n_records times
> df = pd.DataFrame({'mol': [Chem.MolFromSmiles('C1CC1')] * n_records})
>
> # store the molecules as base64-encoded strings
> df['mol'] = df['mol'].map(lambda x: base64.b64encode(pickle.dumps(x)).decode())
> df.to_hdf(file, key=key)
>
> # read the stored strings and convert them back to molecules
> df = pd.read_hdf(file, key=key)
> df['mol'] = df['mol'].map(lambda x: Chem.Mol(base64.b64decode(x)))
>
> This is much faster than exporting to MolBlock because there is no need
> to reparse the molecules, and it got rid of the PyTables warning.
> With this I could even just use good old CSV files instead of HDF.
>
> Cheers,
> Jose Manuel
>
> Refs:
> [1]
> https://github.com/rdkit/UGM_2016/blob/master/Notebooks/Pahl_NotebookTools_Tutorial.ipynb
> [2] http://rdkit.blogspot.com/2016/09/avoiding-unnecessary-work-and.html
>
>
>
> On 15.02.19 22:21, Jose Manuel Gally wrote:
>
> Dear Peter,
>
> thank you for your reply.
>
> That might work for me, I'll look into it.
>
> As a side note, if I convert the Mol into RWMol, I don't get the warning
> anymore (but then I cannot read the molecules anymore...)
>
> Cheers,
> Jose Manuel
> On 15.02.19 17:14, Peter St. John wrote:
>
> You might be better off not storing the RDKit molecule objects themselves
> in the HDF file, but rather some other representation of the molecule. If
> you need the 3D atom coordinates, you could call MolToMolBlock() on each of
> the RDKit mols, and then MolFromMolBlock() later to regenerate them. If you
> don't need the 3D atom coordinates to be saved, SMILES strings would work
> well.
>
> PyTables expects each entry to be something like an 'int', 'string',
> 'float64', etc., so the RDKit mol object is a fairly odd data structure for
> that library; it's just warning you that it will have to use Python's
> `pickle` module to serialize it.
>
> On Fri, Feb 15, 2019 at 6:35 AM Jose Manuel Gally <
> jose.manuel.ga...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am working on some molecules in a pandas DataFrame and have to export
>> them to an HDF file.
>>
>> This works just fine, but I get a PerformanceWarning due to mixed
>> types. (1)
>>
>> Why are RDKit Mol objects causing this warning in the first place? Am I
>> doing something wrong?
>>
>> Please find attached a small notebook with an example.
>>
>> For now I set the type of HDF to 'table', but I'm unsure whether this is
>> the best work-around.
>>
>> Also, invoking pytest with the --disable-warnings flag removes the message,
>> but the warning itself remains.
>>
>> Thanks in advance for any insight!
>>
>> Cheers,
>> Jose Manuel
>>
>> (1) PerformanceWarning:
>> your performance may suffer as PyTables will pickle object types that it
>> cannot map directly to c-types [inferred_type->mixed,key->values]
>> [items->None]
>>
>>    return pytables.to_hdf(path_or_buf, key, self, **kwargs)