Dear Christian,

2010/4/4 Christian de Bouillé <[email protected]>:
> Dear Greg
>
> Thank you for your email
>
> Now that I have 6 descriptors for each structure
> across 11 million chemicals, my goal is to help the
> visitor get a set of molecules with good diversity.
>
> For instance, if he finds 1000 structures from a substructure query,
> the visitor can reduce them to 96 molecules with good diversity
> and buy the microplate.
>
> My idea is to cluster on the descriptors,
> but it is not obvious how to do that with the scripts I wrote below.
>
> Your "Overview PDF" uses Diversity Picking.
> Could you advise how to use it? With descriptors or fingerprints?
> How do I display the clusters? How do I cut the clustering to get
> 96 molecules?

"Diverse" is not particularly well defined. If you are trying to pick
a set that looks chemically diverse to a chemist, I would suggest
using diversity picking based on a fingerprint like the Morgan
fingerprint or MACCS fingerprint provided by the RDKit. If you want a
set that is diverse in your descriptor space (but maybe doesn't look as
diverse to a chemist), then using your descriptors makes sense.
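
To make "diverse in descriptor space" concrete, here is a minimal
pure-Python sketch of greedy max-min selection over descriptor vectors.
It is only an illustration of the idea, not the RDKit implementation:
the function names (standardize, maxmin_pick), the Euclidean distance,
and the choice to scale each descriptor to unit variance are all my
assumptions, and a plain loop like this would need a faster
implementation for very large hit lists.

```python
import math


def standardize(rows):
    """Scale each descriptor column to zero mean and unit variance.

    Assumption: without scaling, descriptors with large numeric ranges
    would dominate the Euclidean distance.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


def maxmin_pick(rows, n_pick, first=0):
    """Greedy max-min: repeatedly add the row farthest from the picks so far.

    Returns the indices of the picked rows, starting with `first`.
    """
    if not rows or n_pick <= 0:
        return []
    n_pick = min(n_pick, len(rows))
    picks = [first]
    # min_dist[i] is the distance from row i to its nearest picked row;
    # picked rows are marked with -1.0 so they are never chosen again.
    min_dist = [math.dist(r, rows[first]) for r in rows]
    min_dist[first] = -1.0
    while len(picks) < n_pick:
        nxt = max(range(len(rows)), key=min_dist.__getitem__)
        picks.append(nxt)
        min_dist = [d if d < 0 else min(d, math.dist(r, rows[nxt]))
                    for d, r in zip(min_dist, rows)]
        min_dist[nxt] = -1.0
    return picks


if __name__ == "__main__":
    # Toy 2-descriptor data; a real hit list would have 6 columns per row.
    toy = [[0.0, 0.0], [0.1, 0.0], [12.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
    print(maxmin_pick(standardize(toy), 3))
```

For the use case above, the rows would be the descriptor vectors of the
1000 hits and n_pick would be 96.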

The RDKit has a couple of different ways of doing diversity picking;
they are in the rdkit.SimDivFilters module. The one I would
recommend using is the MaxMinPicker:
http://www.rdkit.org/Python_Docs/rdkit.SimDivFilters.rdSimDivPickers.MaxMinPicker-class.html

Unfortunately there's not much sample code available other than the
testing code in $RDBASE/Code/SimDivPickers/Wrap:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/SimDivPickers/Wrap/testPickers.py?revision=997&view=markup
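
In the meantime, a minimal sketch of fingerprint-based picking with the
MaxMinPicker might look like the following. Treat it as a starting
point rather than a reference: the helper name pick_diverse, the Morgan
radius of 2, the 2048-bit length, the Tanimoto distance, and the fixed
seed are my choices for illustration, and the exact fingerprint call may
differ between RDKit versions.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker


def pick_diverse(smiles_list, n_pick, seed=42):
    """Return up to n_pick SMILES chosen by MaxMin picking on Morgan fingerprints."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    # Skip SMILES that fail to parse, but remember their original positions.
    keep = [i for i, m in enumerate(mols) if m is not None]
    fps = [AllChem.GetMorganFingerprintAsBitVect(mols[i], 2, nBits=2048)
           for i in keep]

    n_pick = min(n_pick, len(fps))
    if n_pick <= 0:
        return []

    def distance(i, j):
        # Tanimoto distance between fingerprints i and j.
        return 1.0 - DataStructs.TanimotoSimilarity(fps[i], fps[j])

    picker = MaxMinPicker()
    # LazyPick calls distance() on demand instead of needing a full matrix.
    picked = picker.LazyPick(distance, len(fps), n_pick, seed=seed)
    return [smiles_list[keep[i]] for i in picked]


if __name__ == "__main__":
    hits = ["CCO", "CCCO", "c1ccccc1", "CC(=O)O", "c1ccncc1", "CCN"]
    print(pick_diverse(hits, 3))
```

To reduce 1000 substructure hits to a 96-well plate, you would pass the
hit list and n_pick=96. To pick in descriptor space instead, the same
LazyPick call can be given a distance function over your descriptor
vectors.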

> Is Python/Django quicker to query than JSP for 11 million chemicals?
>
> Can you display the chemicals with Python as well as with JChem?

I think the 2D drawings from the RDKit are not bad; the ones from
JChem/Marvin are often better though.

> Voilà, some advice from you would help me start the work.
>
> Best Regards
> Christian

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
