Dear Jing,

How about trying bayon?
https://code.google.com/p/bayon/
It is not part of RDKit, but I think the library can cluster molecules using ECFP4.
Unfortunately, bayon does not take a distance matrix as input, but its input format is easy to prepare.

Best regards,
Takayuki

On Sun, Aug 23, 2015 at 12:03, Jing Lu <ajin...@gmail.com> wrote:

> Currently, I prefer fingerprint based clustering, because it's hard to set
> the cutoff for scaffold based clustering. Does RDKit have scaffold based
> clustering?
>
> On Sat, Aug 22, 2015 at 10:56 PM, <abhik1...@gmail.com> wrote:
>
>> Hi, how about scaffold based clustering? You extract the scaffolds,
>> cluster them, and then put the compounds for each scaffold inside the
>> corresponding cluster.
>>
>> > On Aug 22, 2015, at 8:43 PM, Jing Lu <ajin...@gmail.com> wrote:
>> >
>> > Dear RDKit users,
>> >
>> > If I want to cluster more than 1M molecules by ECFP4, how could I do
>> > it? If I calculate the distance between every pair of molecules, the size
>> > of the distance matrix will be too big. Does RDKit support any heuristic
>> > clustering algorithm without calculating the distance matrix of the whole
>> > library?
>> >
>> > Thanks,
>> > Jing
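
To illustrate the bayon suggestion above, here is a minimal sketch of how the input file might be prepared from ECFP4-style fingerprints. It assumes bayon reads one tab-separated line per item in the form "id, key, value, key, value, ..." (please check the bayon documentation before relying on this), and it uses RDKit's radius-2 Morgan bit vector as a stand-in for ECFP4. The function names, the 2048-bit length, and the constant weight of 1 per on-bit are my own choices for illustration, not anything bayon or RDKit prescribes.

```python
def bayon_line(mol_id, on_bits):
    """Format one molecule as 'id<TAB>bit<TAB>1<TAB>bit<TAB>1...'.

    Assumption: each on-bit is written as a feature key with weight 1.
    """
    fields = [str(mol_id)]
    for bit in sorted(on_bits):
        fields.extend([str(bit), "1"])
    return "\t".join(fields)


def ecfp4_on_bits(smiles, n_bits=2048):
    """Return on-bit indices of a radius-2 Morgan fingerprint, or None.

    RDKit is imported here so the formatting code above stays usable
    without it. None is returned when the SMILES cannot be parsed.
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return list(fp.GetOnBits())


def write_bayon_input(records, out, bits_fn=ecfp4_on_bits):
    """Stream (mol_id, smiles) pairs to 'out', one line per molecule.

    Molecules with no usable fingerprint are skipped. Nothing is held in
    memory beyond the current molecule, so no distance matrix is built.
    Returns the number of lines written.
    """
    written = 0
    for mol_id, smiles in records:
        bits = bits_fn(smiles)
        if not bits:
            continue
        out.write(bayon_line(mol_id, bits) + "\n")
        written += 1
    return written
```

With a file of SMILES, something like write_bayon_input(pairs, open("input.tsv", "w")) would produce a file that can be handed to bayon on the command line. Because each molecule is written as it is read, memory use should stay flat even for 1M molecules.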
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss