Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread Francois Berenger
On 21/09/2018 16:53, Chris Earnshaw wrote: Hi I'm afraid I can't help with an RDKit solution to your question, but there are a couple of issues which should be borne in mind: 1) The centroid of a cluster is a vector mean of the fingerprints of all the members of the cluster and probably will not

Re: [Rdkit-discuss] Saving mol file

2018-09-26 Thread GALLY Jose Manuel
Dear Colin, this is a specific problem I stumbled upon some time ago.[1] I also mentioned it to the rDock mailing list.[2] Maybe there is a better work-around, but in the meantime I wrote the attached function. It takes as input the Mol Block, which in my case are in a dataframe. Hope that

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread Andrew Dalke
On Sep 26, 2018, at 20:26, Peter S. Shenkin wrote: > Ah, David, but how do you define a "real" singleton? There can be many different definitions of what a '"real" singleton' might be, but we are specifically talking about Butina clustering. The Butina paper defines the term "false singleton",

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread Peter S. Shenkin
Ah, David, but how do you define a "real" singleton? -P. On Wed, Sep 26, 2018 at 1:30 PM David Cosgrove wrote: > Slightly off topic, but a minor issue with the Taylor-Butina algorithm is > that it generates “false singletons”. These are molecules just outside the > clustering cutoff that are

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread David Cosgrove
Slightly off topic, but a minor issue with the Taylor-Butina algorithm is that it generates “false singletons”. These are molecules just outside the clustering cutoff that are stranded when their neighbours are put in a different, larger cluster. We used to find it convenient to have a sweep of
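
To make the "false singleton" idea from this thread concrete, here is a minimal pure-Python sketch. It is not RDKit's implementation and not the code anyone in the thread used. It assumes a simple Taylor-Butina variant in which neighbour counts are computed once, candidates are visited in order of decreasing neighbour count, and each new centroid claims its still-unassigned neighbours. The names `butina_clusters` and `split_singletons`, the `<=` cutoff test, and the toy 1-D data are all hypothetical choices made for illustration. A singleton is flagged as "false" when its molecule has at least one neighbour within the cutoff that was claimed by another cluster.

```python
def butina_clusters(dist, cutoff):
    """Cluster items given a symmetric distance matrix (list of lists).

    Returns (clusters, neighbours). Each cluster is a tuple whose first
    element is the centroid index. Assumption: neighbour counts are
    computed once up front and not updated as items are assigned.
    """
    n = len(dist)
    # Neighbour set of i: every other item within the cutoff of i.
    neighbours = [
        {j for j in range(n) if j != i and dist[i][j] <= cutoff}
        for i in range(n)
    ]
    # Visit items with the most neighbours first; break ties by index.
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # The new centroid claims only neighbours nobody has taken yet.
        members = sorted(neighbours[i] - assigned)
        clusters.append((i, *members))
        assigned.add(i)
        assigned.update(members)
    return clusters, neighbours


def split_singletons(clusters, neighbours):
    """Separate singletons into (false, true) lists of item indices.

    False singleton: alone in its cluster although it has neighbours
    within the cutoff (they were all claimed by other clusters).
    True singleton: has no neighbours within the cutoff at all.
    """
    false_s, true_s = [], []
    for cluster in clusters:
        if len(cluster) == 1:
            idx = cluster[0]
            (false_s if neighbours[idx] else true_s).append(idx)
    return false_s, true_s


# Toy data: points on a line, distance = absolute difference.
points = [0, 1, 2, 3, 10]
dist = [[abs(a - b) for b in points] for a in points]

clusters, neighbours = butina_clusters(dist, cutoff=1)
false_singletons, true_singletons = split_singletons(clusters, neighbours)
print(clusters)           # [(1, 0, 2), (3,), (4,)]
print(false_singletons)   # [3]
print(true_singletons)    # [4]
```

In the toy data, item 3 is stranded because its only neighbour (item 2) was claimed by the cluster centred on item 1, while item 4 has no neighbours at all. What to do with the flagged items afterwards is left open here.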