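Hi, how about scaffold-based clustering? You extract the scaffolds, cluster the scaffolds, and then put each compound into the cluster of its scaffold. A library typically contains far fewer unique scaffolds than molecules, so the distance matrix you need is correspondingly smaller.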
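A minimal sketch of the idea in plain Python. It assumes the scaffold of each molecule and a fingerprint for each unique scaffold have already been computed. In RDKit, for example, `MurckoScaffold.MurckoScaffoldSmiles` from `rdkit.Chem.Scaffolds` returns a Murcko scaffold SMILES. The fingerprints below are hypothetical toy sets of on-bit indices. The scaffold-clustering step uses a simple leader algorithm chosen for illustration, since the suggestion above doesn't name a method (RDKit's Butina implementation would be another option). The function names `tanimoto` and `scaffold_cluster` are also hypothetical. Only unique scaffolds are compared, and no distance matrix is stored.

```python
from collections import defaultdict


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def scaffold_cluster(mol_scaffolds, scaffold_fps, threshold=0.6):
    """Cluster molecules via their scaffolds (illustrative sketch).

    mol_scaffolds: iterable of (molecule_id, scaffold_key) pairs.
    scaffold_fps:  dict mapping scaffold_key -> fingerprint (set of on-bits).
    Returns a list of clusters, each a list of molecule ids.
    """
    # 1. Group molecules by scaffold.
    members = defaultdict(list)
    for mol_id, scaf in mol_scaffolds:
        members[scaf].append(mol_id)

    # 2. Leader clustering over the unique scaffolds only, most populated first.
    #    Each scaffold joins the first leader it is similar enough to,
    #    otherwise it becomes a new leader. Nothing is stored pairwise.
    order = sorted(members, key=lambda s: len(members[s]), reverse=True)
    leaders, clusters = [], []
    for scaf in order:
        for i, leader in enumerate(leaders):
            if tanimoto(scaffold_fps[scaf], scaffold_fps[leader]) >= threshold:
                clusters[i].append(scaf)
                break
        else:
            leaders.append(scaf)
            clusters.append([scaf])

    # 3. Put each molecule into the cluster of its scaffold.
    return [[m for scaf in cluster for m in members[scaf]] for cluster in clusters]


# Toy input: scaffold SMILES as keys, hypothetical bit sets as fingerprints.
mols = [("m1", "c1ccccc1"), ("m2", "c1ccccc1"), ("m3", "c1ccncc1"), ("m4", "C1CCCCC1")]
fps = {
    "c1ccccc1": frozenset({1, 2, 3, 4}),
    "c1ccncc1": frozenset({1, 2, 3, 5}),
    "C1CCCCC1": frozenset({10, 11}),
}
print(scaffold_cluster(mols, fps, threshold=0.5))  # → [['m1', 'm2', 'm3'], ['m4']]
```

Processing the most populated scaffolds first makes them the cluster leaders. The leader pass can still need many comparisons in the worst case, but memory stays proportional to the number of scaffolds rather than to the square of the number of molecules.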
> On Aug 22, 2015, at 8:43 PM, Jing Lu <ajin...@gmail.com> wrote:
>
> Dear RDKit users,
>
> If I want to cluster more than 1M molecules by ECFP4. How could I do it? If I
> calculate the distance between every pair of molecules, the size of distance
> matrix will be too big. Does RDKit support any heuristic clustering algorithm
> without calculating the distance matrix of the whole library?
>
> Thanks,
> Jing
>
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss