Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-22 Thread Taka Seri
Dear Jing, How about trying bayon? https://code.google.com/p/bayon/ It is not part of RDKit, but I think the library can cluster molecules using ECFP4. Unfortunately, bayon's input file format is not a distance matrix, but the format is easy to prepare. Best regards, Takayuki

[Rdkit-discuss] Clustering 1M molecules

2015-08-22 Thread Jing Lu
Dear RDKit users, If I want to cluster more than 1M molecules by ECFP4, how could I do it? If I calculate the distance between every pair of molecules, the distance matrix will be too big. Does RDKit support any heuristic clustering algorithm without calculating the distance matrix of the
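
To make the size problem in the question concrete, and to show one way of clustering without a full distance matrix, here is a minimal sketch of single-pass "leader" clustering. It is neither an RDKit function nor what bayon does; it is a swapped-in illustration. The names `tanimoto`, `leader_cluster` and `condensed_matrix_bytes` are hypothetical, and fingerprints are assumed to be plain Python sets of on-bit indices (in practice they might be derived from ECFP4-like fingerprints such as RDKit Morgan fingerprints with radius 2).

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def leader_cluster(fps, threshold=0.6):
    """Single-pass leader clustering (illustrative sketch, not an RDKit API).

    Each fingerprint joins the first existing leader whose similarity is
    >= threshold; otherwise it becomes a new leader. Only the fingerprints
    and the cluster lists are kept in memory, never an n x n matrix.
    """
    leaders = []   # indices of cluster representatives
    clusters = []  # clusters[k] holds the member indices of leader k
    for i, fp in enumerate(fps):
        for k, lead in enumerate(leaders):
            if tanimoto(fp, fps[lead]) >= threshold:
                clusters[k].append(i)
                break
        else:
            # No leader was similar enough: start a new cluster.
            leaders.append(i)
            clusters.append([i])
    return clusters


def condensed_matrix_bytes(n, bytes_per_value=4):
    """Bytes needed to store all n*(n-1)/2 pairwise distances."""
    return n * (n - 1) // 2 * bytes_per_value


# Toy fingerprints (made up for illustration only).
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 12, 13}, {20}]
toy_clusters = leader_cluster(toy_fps, threshold=0.5)
print(toy_clusters)
print(condensed_matrix_bytes(1_000_000))  # roughly 2 TB at 4 bytes per value
```

The result depends on input order and on the chosen threshold, and the running time still grows with the number of molecules times the number of leaders, so this trades clustering quality for memory rather than solving the problem outright.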

Re: [Rdkit-discuss] AP / DP descriptors

2015-08-22 Thread Greg Landrum
Just an FYI on this one: I just merged a Python DP and DT implementation onto master. Here's the github issue referencing the commits: https://github.com/rdkit/rdkit/issues/574 I will try to get a C++ version done in time for the next release. On Wed, Jul 15, 2015 at 11:02 AM, Greg Landrum