I was about to reply "yeah, the first ID is the cluster centroid", but that
may not be true for all definitions of "centroid".
According to the Butina paper, the first point added is considered to be
the centroid. The definition of that is that all the other points in the
cluster are within the exclusion distance of the first point. It does not
necessarily mean that its the point in the cluster that would satisfy a
geometric definition of centroid.
But that's a detail. According to the definitions of the algorithm, the
first point is the centroid. :-)
-greg
On Thu, Oct 26, 2017 at 4:51 PM, Paul Hawkins <phawk...@eyesopen.com> wrote:
> Thanks Greg. In the results from the Butina clustering is the cluster
> centre the first ID in the list? If so then to recover the cluster centres
> I’d just need to do something like:
>
>
>
> For c in cs:
>
> conf_id_I_want = c[0]
>
>
>
>
>
> Paul.
>
>
>
> *From:* Greg Landrum [mailto:greg.land...@gmail.com]
> *Sent:* Thursday, October 26, 2017 12:41 AM
> *To:* Paul Hawkins <phawk...@eyesopen.com>
> *Cc:* rdkit-discuss@lists.sourceforge.net; Sereina <
> sereina.rini...@gmail.com>
> *Subject:* Re: [Rdkit-discuss] Conformer generation
>
>
>
>
>
>
>
> On Wed, Oct 25, 2017 at 6:52 PM, Sereina <sereina.rini...@gmail.com>
> wrote:
>
> Hi Paul,
>
>
>
> Regarding your second question:
>
>
>
> On 25 Oct 2017, at 18:36, Paul Hawkins <phawk...@eyesopen.com> wrote:
>
>
>
> Also, once I generate the conformers what is best way to cluster them by
> RMSD so that each conformer has a minimum RMSD to all the others in the set?
>
>
>
> I think the function AllChem.GetConformerRMSMatrix() might do (parts of)
> what you want.
>
>
>
> And since we're doing the "each reply refines the answer" thing, here's a
> bit of code that does Butina clustering to group the conformers:
>
>
>
> In [71]: m = Chem.AddHs(Chem.MolFromSmiles('OCCc1ncccc1CCCC'))
>
>
>
> In [72]: AllChem.EmbedMultipleConfs(m,50,AllChem.ETKDG())
>
> Out[72]: <rdkit.rdBase._vecti at 0x11630ac90>
>
>
>
> In [73]: dm = AllChem.GetConformerRMSMatrix(m)
>
>
>
> In [76]: cs = Butina.ClusterData(dm,m.GetNumConformers(),1.5,isDistData=
> True,reordering=True)
>
>
>
> In [77]: len(cs)
>
> Out[77]: 9
>
>
>
> In [78]: for c in cs:
>
> ...: print(c)
>
> ...:
>
> (36, 2, 3, 4, 7, 8, 9, 11, 14, 15, 16, 17, 21, 29, 30, 33, 34, 40, 43, 44,
> 46, 47)
>
> (10, 0, 32, 35, 38, 45, 49, 18, 19, 22, 23, 25, 27)
>
> (48, 6, 39, 42, 12, 24, 26)
>
> (5, 41, 31)
>
> (37,)
>
> (28,)
>
> (20,)
>
> (13,)
>
> (1,)
>
>
>
> The 1.5 argument in the call to Butina.ClusterData is the distance
> threshold for things to be considered neighbors.
>
>
>
> -greg
>
>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss