Re: [Rdkit-discuss] Ipc descriptor values too large for float32

2019-04-05 Thread Taka Seri
Hi Christoph,

I think the same topic was discussed in a GitHub issue.
You can use the avg=True option to avoid the problem.
I hope the following URL is helpful.
https://github.com/rdkit/rdkit/issues/1527
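
A minimal sketch of the suggestion above, assuming the descriptor is exposed in the Python API as rdkit.Chem.GraphDescriptors.Ipc with an avg keyword. The SMILES string is an arbitrary illustrative molecule, not one from the original post, and the float32 comparison via NumPy is only a sanity check.

```python
import math

import numpy as np
from rdkit import Chem
from rdkit.Chem import GraphDescriptors

# Arbitrary example molecule (eicosane); chosen only for illustration.
mol = Chem.MolFromSmiles("CCCCCCCCCCCCCCCCCCCC")

# Default call: the total (non-averaged) value, which can become very
# large for big molecules.
ipc_total = GraphDescriptors.Ipc(mol)

# Averaged variant suggested in this thread.
ipc_avg = GraphDescriptors.Ipc(mol, avg=True)

# Largest finite float32 value, for comparison.
float32_max = float(np.finfo(np.float32).max)

print("Ipc (default): ", ipc_total)
print("Ipc (avg=True):", ipc_avg)
print("avg value fits in float32:", math.isfinite(ipc_avg) and ipc_avg < float32_max)
```

When building a descriptor table, the same call with avg=True would replace the default Ipc column; whether the averaged value is as useful to the model as the original one is something to check on your own data.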

Best regards,
Taka

On Fri, Apr 5, 2019 at 6:37 PM Christoph Hillisch wrote:

> Hello all,
>
> I use RDKit to calculate descriptors, which I use to train a random forest
> model in scikit-learn.
> Since I do not scale my training data, I run into the problem that the
> descriptor Ipc may contain huge figures (1E+50), which then are too large
> for the data type float32 used in sklearn.
>
> Is there a way of making sure the value of this descriptor fits in
> float32, without scaling my data?
> Otherwise I’d probably have to remove this descriptor from my model.
>
> Best regards,
> Christoph
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Beta of the 2019.03 release available

2019-04-05 Thread Markus Sitzmann
Hi Greg,

my Chembience RDKit image build with version 2019.03-b1b went fine (well, I
just pull it with conda; in case someone is interested it is available with
tag 0.2.10-beta-1 at Dockerhub).

For the Postgres extension (which I still compile myself during the Docker
build against Postgres), your Python 3 enforcement uncovered some dark
corners of my build process, but that is fixed. However, compiling
2019.03-b1b against Postgres 11 fails during compilation (am I too cheeky?).

Markus

On Wed, Apr 3, 2019 at 11:38 AM Greg Landrum  wrote:

> Dear all,
>
> The beta of the 2019.03 RDKit release has been tagged in github:
> https://github.com/rdkit/rdkit/releases/tag/Release_2019_03_1b1
>
> There are a couple more bug fixes and maybe one more feature expected
> before the actual release, but I wanted to go ahead and get the beta out
> there.
>
> I've done conda builds for Python 3.6 and 3.7 for Windows, Mac, and Linux.
> These all use the beta label so that they do not install by default; you'll
> need to run "conda install" as follows:
>
> conda install -c rdkit/label/beta rdkit
>
> Be sure to confirm that it's installing the right version when you are
> prompted (if there's no build available, it will pick the current
> production release instead).
>
> The relevant section of the release notes is below, or you can see a
> nicely formatted version here:
> https://github.com/rdkit/rdkit/releases/tag/Release_2019_03_1b1
>
> As usual, if you have time to try out the new release I would love
> feedback. If nothing major comes up, I plan to do the actual release early
> next week.
>
> Best,
> -greg
>
> # Release_2019.03.1
> (Changes relative to Release_2018.09.1)
>
> ## REALLY IMPORTANT ANNOUNCEMENT
> - As of this release (2019.03.1) the RDKit no longer supports Python 2.
>   Please read this rdkit-discuss post to learn what your options are if you
>   need to keep using Python 2:
>   https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg08354.html
>
> ## Backwards incompatible changes
> - The fix for github #2245 means that the default behavior of the MaxMinPicker
>   is now truly random. If you would like to reproduce the previous behavior,
>   provide a seed value of 42.
> - The uncharging method in the MolStandardizer now attempts to generate
>   canonical results for a given molecule. This may result in different output
>   for some molecules.
>
> ## Highlights:
> - There's now a Japanese translation of large parts of the RDKit documentation
> - SGroup data can now be read from and written to Mol/SDF files
> - The enhanced stereo handling has been improved: the information is now
>   accessible from Python, EnumerateStereoisomers takes advantage of it, and it
>   can be read from and written to CXSmiles
>
> ## Acknowledgements:
> Michael Banck, Francois Berenger, Thomas Blaschke, Brian Cole, Andrew Dalke,
> Bakary N'tji Diallo, Guillaume Godin, Jan Holst Jensen, Sunhwan Jo, Brian
> Kelley, Petr Kubat, Karl Leswing, Susan Leung, John Mayfield, Adam Moyer, Dan
> Nealschneider, Noel O'Boyle, Stephen Roughley, Takayuki Serizawa, Gianluca
> Sforna, Ricardo Rodriguez Schmidt, Matt Swain, Paolo Tosco, Ricardo Vianello,
> 'John-Videogames', 'magattaca', 'msteijaert', 'paconius', 'sirbiscuit'
>
> ## Bug Fixes:
>   - PgSQL: fix boolean definitions for Postgresql 11
>  (github pull #2129 from pkubatrh)
>   - update fingerprint tutorial notebook
>  (github pull #2130 from greglandrum)
>   - Fix typo in RecapHierarchyNode destructor
>  (github pull #2137 from iwatobipen)
>   - SMARTS roundtrip failure
>  (github issue #2142 from mcs07)
>   - Error thrown in rdMolStandardize.ChargeParent
>  (github issue #2144 from paconius)
>   - SMILES parsing inconsistency based on input order
>  (github issue #2148 from coleb)
>   - MolDraw2D: line width not in python wrapper
>  (github issue #2149 from greglandrum)
>   - Missing Python API Documentation
>  (github issue #2158 from greglandrum)
>   - PgSQL: mol_to_svg() changes input molecule.
>  (github issue #2174 from janholstjensen)
>   - Remove Unicode From AcidBasePair Name
>  (github pull #2185 from lilleswing)
>   - Inconsistent treatment of `[as]` in SMILES and SMARTS
>  (github issue #2197 from greglandrum)
>   - RGroupDecomposition fixes, keep userLabels more robust onlyMatchAtRGroups
>  (github pull #2202 from bp-kelley)
>   - Fix TautomerTransform in operator=
>  (github pull #2203 from bp-kelley)
>   - testEnumeration hangs/takes very long on 32bit architectures
>  (github issue #2209 from mbanck)
>   - Silencing some Python 3 warning messages
>  (github pull #2223 from coleb)
>   - removeHs shouldn't remove atom lists
>  (github issue #2224 from rvianello)
>   - failure round-tripping mol block with Q atom
>  (github issue #2225 from rvianello)
>   - problem round-tripping mol files that include bond topology info
>  (github issue #2229 from rvianello)
>   - aromatic main-group atoms written to SMARTS incorrectly
>  (github issue #2237 from 

[Rdkit-discuss] Ipc descriptor values too large for float32

2019-04-05 Thread Christoph Hillisch
Hello all,

I use RDKit to calculate descriptors, which I use to train a random forest 
model in scikit-learn.
Since I do not scale my training data, I run into the problem that the 
descriptor Ipc may contain huge figures (1E+50), which then are too large for 
the data type float32 used in sklearn.

Is there a way of making sure the value of this descriptor fits in float32, 
without scaling my data?
Otherwise I’d probably have to remove this descriptor from my model.
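
One way to see which values would overflow before they reach scikit-learn is to compare them against the largest finite float32 value. The sketch below uses only the standard library; the descriptor values and the helper name fits_in_float32 are made up for illustration and are not real RDKit output.

```python
import math

# Largest finite IEEE 754 single-precision value, about 3.4e38.
FLOAT32_MAX = (2 - 2**-23) * 2.0**127


def fits_in_float32(value: float) -> bool:
    """Return True if value is finite and within the float32 range."""
    return math.isfinite(value) and abs(value) <= FLOAT32_MAX


# Hypothetical Ipc values for three molecules (not real RDKit output).
ipc_values = {"mol_a": 1.2e4, "mol_b": 3.0e38, "mol_c": 1e50}

# Names of the molecules whose value cannot be stored as a finite float32.
too_large = [name for name, value in ipc_values.items() if not fits_in_float32(value)]
print(too_large)  # ['mol_c']
```

A check like this only identifies the problematic rows; what to do with them (drop the descriptor, drop the molecules, or use a different variant of the descriptor) remains a modelling decision.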

Best regards,
Christoph
