[Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Wandré
Hi, (first of all, sorry for my poor English...) I'm trying to cluster a large dataset of molecules, but on a server with 64 GB of RAM and 32 cores, all the RAM and cache are occupied and, after 10 hours, the clustering has still not finished. My set of molecules has more than 1 million hits

Re: [Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Andrew Dalke
Hi Wandré, You may want to look at chemfp for this sort of clustering. Last year Chris Swain reviewed a few different ways to do clustering, at https://www.macinchem.org/reviews/clustering/clustering.php . His data set had 4.4M fingerprints and it took 10 hours to cluster at 0.8 similarity th
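For readers unfamiliar with clustering at a similarity threshold, here is a minimal sketch of one common approach, Taylor-Butina clustering, in plain Python. It is an illustration only, not chemfp's implementation and not necessarily the method used in the review; the function names, the toy fingerprints (sets of on-bit indices), and the choice of Tanimoto similarity are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_clusters(fps, threshold=0.8):
    """Group fingerprints so each member is within `threshold` of its cluster center.

    Returns a list of clusters; each cluster is [center, member, member, ...].
    """
    n = len(fps)
    # Step 1: find every pair that is at least `threshold` similar.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)

    # Step 2: visit candidates from most to fewest neighbors (ties by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters = []
    for center in order:
        if center in assigned:
            continue
        # Only neighbors not already claimed by an earlier cluster join this one.
        members = sorted(neighbors[center] - assigned)
        clusters.append([center] + members)
        assigned.add(center)
        assigned.update(members)
    return clusters


# Hypothetical toy data: each fingerprint is the set of its on-bit indices.
toy_fps = [
    {1, 2, 3, 4, 5},
    {1, 2, 3, 4, 5, 6},
    {1, 2, 3, 4, 6},
    {10, 11},
    {10, 11, 12},
]
print(butina_clusters(toy_fps, threshold=0.8))
```

The all-pairs loop in step 1 is quadratic in the number of fingerprints, which is why a naive version like this one becomes impractical at the million-molecule scale discussed in this thread.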

Re: [Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Wandré
Hi Andrew, Thanks for the link. It is very interesting. I will read it very carefully. So, as input to chemfp, do I have to provide a file with all molecules in one SDF? -- Wandré Nunes de Pinho Veloso Professor Assistente - Unifei - Campus Avançado de Itabira-MG Doutorando em Bioinformática - Universidade F

Re: [Rdkit-discuss] strange linker error using VC++

2018-01-11 Thread Paolo Tosco
Hi Jason, I believe the problem here is that if you are building outside CMake the WIN32 preprocessor macro is not defined (_WIN32 is). So, when ROMol.h is parsed, the ROMol class definition includes a "private" directive that should not be there, hence the error. To fix the issue, you need to

Re: [Rdkit-discuss] strange linker error using VC++

2018-01-11 Thread Jason Biggs
On Thu, Jan 11, 2018 at 8:03 AM, Paolo Tosco wrote: > Hi Jason, > > I believe the problem here is that if you are building outside CMake the > WIN32 preprocessor macro is not defined (_WIN32 is). So, when ROMol.h is > parsed, the ROMol class definition includes a "private" directive that > should

Re: [Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Andrew Dalke
On Jan 11, 2018, at 12:04, Wandré wrote: > Thanks for the link. It is very interesting. I will read very carefully. > So, as input on ChemFP, I have to put a file with all molecules in 1 SDF? Chemfp works with fingerprint files, in your case, chemfp's text-based "FPS" format. You'll need to use
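To give a rough idea of what a text-based fingerprint file looks like in practice, here is a small sketch that reads FPS-style text: `#`-prefixed header lines, followed by one record per line with a hex-encoded fingerprint, a tab, and an identifier. This is a simplified reading of the format for illustration; the function name, the sample ids, and the sample fingerprints are made up, and real files are better handled with chemfp's own readers.

```python
def parse_fps(text):
    """Parse FPS-style text into (header dict, list of (id, fingerprint bytes))."""
    header = {}
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Header lines look like "#key=value"; the "#FPS1" marker has no "=".
            key, sep, value = line[1:].partition("=")
            if sep:
                header[key] = value
            continue
        # Data lines: hex fingerprint, a tab, then the identifier.
        hex_fp, _, mol_id = line.partition("\t")
        records.append((mol_id, bytes.fromhex(hex_fp)))
    return header, records


# Hypothetical two-record file with 16-bit fingerprints.
example = "#FPS1\n#num_bits=16\n0300\tmol1\nff01\tmol2\n"
header, records = parse_fps(example)
print(header)
print(records)
```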

Re: [Rdkit-discuss] strange linker error using VC++

2018-01-11 Thread Paolo Tosco
Hi Jason, Correct. You'd probably be better off using std::unique_ptr unless you are planning to reference the original pointer from multiple std::shared_ptr's. Cheers, p. On 11/01/2018 14:42, Jason Biggs wrote: On Thu, Jan 11, 2018 at 8:03 AM, Paolo Tosco wrote:

[Rdkit-discuss] How to convert numpy array to rdkit fingerprint object?

2018-01-11 Thread Michał Nowotka
Hi, Imagine I have two numpy arrays containing zeros and ones (or bools) effectively being fingerprints: np_1, np_2 = some_fingerprints_as_np_arrays() I want to convert them both to rdkit fingerprint objects so I can use DiceSimilarity: from rdkit import DataStructs # this won't work because of

[Rdkit-discuss] Generate depiction matching 2D structure

2018-01-11 Thread Lukas Pravda
Dear all, I’ve just recently started using rdkit in python. By the way, a very nice piece of work. First, I’d like to generate 2D depictions of molecules I’m constructing from 3D coordinate data. I’m aware of methods such as GenerateDepictionMatching2DStructure(…) and GenerateDepictionMatching3DSt

Re: [Rdkit-discuss] How to convert numpy array to rdkit fingerprint object?

2018-01-11 Thread Jason Biggs
There may be a better way (my Python is rudimentary): explicitList = numpy.array([0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0]) onbits = numpy.where(explicitList==1)[0].tolist() bv1 = DataStructs.SparseBitVect(20) bv1.SetBitsFromList(onbits) bv1 for i in range(20): print(bv

Re: [Rdkit-discuss] How to convert numpy array to rdkit fingerprint object?

2018-01-11 Thread Maciek Wójcikowski
Hi, In DataStructs there are CreateFrom* functions which do what you want, although you'd have to convert the numpy array to a string of ints. ''.join(array) would probably be enough. See http://www.rdkit.org/Python_Docs/rdkit.DataStructs.cDataStructs-module.html#CreateFromBitString Regards,
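A possible sketch of the bit-string route follows. Note that `str.join` only accepts strings, so each element of a 0/1 (or boolean) array has to be mapped to "0" or "1" first. The helper names here are hypothetical; `DataStructs.CreateFromBitString` and `DataStructs.DiceSimilarity` are the RDKit calls, imported inside the functions so that the string conversion can be tried without RDKit installed.

```python
def to_bitstring(bits):
    """Turn an iterable of 0/1 or bool values (list or numpy array) into '0101...'."""
    return "".join("1" if b else "0" for b in bits)


def to_bitvect(bits):
    """Build an RDKit ExplicitBitVect from 0/1 values (requires RDKit)."""
    from rdkit import DataStructs

    return DataStructs.CreateFromBitString(to_bitstring(bits))


def dice_similarity(bits_a, bits_b):
    """Dice similarity between two 0/1 arrays of equal length (requires RDKit)."""
    from rdkit import DataStructs

    return DataStructs.DiceSimilarity(to_bitvect(bits_a), to_bitvect(bits_b))


# The string conversion alone needs no third-party packages.
print(to_bitstring([0, 1, 1, 0]))
```

With RDKit available, `dice_similarity(np_1, np_2)` would then compare the two arrays from the original question.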

Re: [Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Wandré
Thanks Andrew, I will try these steps. So, to avoid recalculating the fingerprints, how can I calculate them and store them in a database? When I calculate the AtomPair fingerprint, it returns a rdkit.DataStructs.cDataStructs.IntSparseIntVect object. How can I store this RDKit Python object in a database and how to read th

Re: [Rdkit-discuss] Explicit hydrogens in substructure search

2018-01-11 Thread Andrey
Hi Greg, First of all, many thanks for all your help! I managed to get it working for the Python wrapper. Could you please give me an idea how to implement it for the Postgres cartridge? Kind regards, Andrew 13.12.2017 08:58, Greg Landrum >On Tue, Dec 12, 2017 at 7:28 PM, Andrey wrote: > > > > >

Re: [Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Andrew Dalke
Hi Wandré, The easiest way to avoid recalculating the fingerprints is to keep the FPS file around. The rdkit2fps program calculates the AtomPair fingerprint and converts the resulting DataStructs fingerprint object into a hex-encoded fingerprint, which is stored as text in the FPS file. One
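For the database side of the question, one option is to store the same kind of hex text in a table. The sketch below is an assumption-laden illustration, not chemfp's or rdkit2fps's code: it assumes a fixed-length bit fingerprint given as a list of on-bit indices, the bit ordering within each byte is my own choice and may not match chemfp's layout, and the table and function names are hypothetical. It uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3


def bits_to_hex(on_bits, num_bits):
    """Pack on-bit indices into bytes and return them as a hex string."""
    buf = bytearray((num_bits + 7) // 8)
    for bit in on_bits:
        buf[bit // 8] |= 1 << (bit % 8)  # assumed ordering: bit 0 is the lowest bit of byte 0
    return buf.hex()


def hex_to_bits(hex_fp):
    """Inverse of bits_to_hex: recover the sorted list of on-bit indices."""
    data = bytes.fromhex(hex_fp)
    return [i * 8 + j for i, byte in enumerate(data) for j in range(8) if (byte >> j) & 1]


def save_fingerprint(conn, mol_id, hex_fp):
    """Insert or overwrite one fingerprint row."""
    conn.execute(
        "INSERT OR REPLACE INTO fingerprints (id, fp_hex) VALUES (?, ?)",
        (mol_id, hex_fp),
    )


def load_fingerprint(conn, mol_id):
    """Return the on-bit indices for mol_id, or None if it is not stored."""
    row = conn.execute(
        "SELECT fp_hex FROM fingerprints WHERE id = ?", (mol_id,)
    ).fetchone()
    return None if row is None else hex_to_bits(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fingerprints (id TEXT PRIMARY KEY, fp_hex TEXT NOT NULL)")
save_fingerprint(conn, "mol1", bits_to_hex([0, 1, 9], num_bits=16))
print(load_fingerprint(conn, "mol1"))
```

Storing the fingerprint as plain text keeps the rows readable and independent of any particular Python object layout, at the cost of a decode step on every read.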