On Thu, Sep 18, 2008 at 11:07 PM, Robert DeLisle <rkdeli...@gmail.com> wrote:
> Greg,
>
> Thank you for the response.
>
> I was able to get PEOE_VSA1 through PEOE_VSA14, SMR_VSA1 through SMR_VSA10,
> and EState_VSA1 through EState_VSA11 working.  Are these the correct limits
> on the vector components?

Yes. Just in case you used a more painful approach, here's the
simplest way to check (without looking at the source in
$RDBASE/Python/Chem/MolSurf.py):
[17] >>> [x for x in AvailDescriptors.descDict.keys() if x.find('PEOE_VSA')!=-1]
Out[17]:
['PEOE_VSA14',
 'PEOE_VSA13',
 'PEOE_VSA12',
 'PEOE_VSA11',
 'PEOE_VSA10',
 'PEOE_VSA8',
 'PEOE_VSA7',
 'PEOE_VSA6',
 'PEOE_VSA5',
 'PEOE_VSA4',
 'PEOE_VSA3',
 'PEOE_VSA2',
 'PEOE_VSA1',
 'PEOE_VSA9']
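
The names above come back in arbitrary dictionary order, which makes it hard to see at a glance whether any index is missing. A minimal sketch of a check follows, in plain Python with no RDKit import; the helper name sort_by_suffix is made up for illustration, and the list is simply copied from the output above:

```python
import re

def sort_by_suffix(names):
    """Sort names such as 'PEOE_VSA10' by their trailing integer."""
    def suffix(name):
        match = re.search(r'(\d+)$', name)
        # Names without a numeric suffix sort first.
        return int(match.group(1)) if match else -1
    return sorted(names, key=suffix)

# Copied from the interpreter output above (hypothetical stand-in for
# the live descDict keys).
names = ['PEOE_VSA14', 'PEOE_VSA13', 'PEOE_VSA12', 'PEOE_VSA11',
         'PEOE_VSA10', 'PEOE_VSA8', 'PEOE_VSA7', 'PEOE_VSA6',
         'PEOE_VSA5', 'PEOE_VSA4', 'PEOE_VSA3', 'PEOE_VSA2',
         'PEOE_VSA1', 'PEOE_VSA9']

ordered = sort_by_suffix(names)
suffixes = [int(re.search(r'(\d+)$', n).group(1)) for n in ordered]

# True when the suffixes run 1..N with no gaps.
contiguous = suffixes == list(range(1, len(suffixes) + 1))
print(ordered[0], ordered[-1], contiguous)
```

The same check should work for the SMR_VSA and EState_VSA names by changing the list that is passed in.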

> I was unable, however, to get Slogp_VSA or VSA_EState working with any
> integer suffix between 1 and 10.

That's strange. What errors were you getting?

> I've also done a correlation analysis on all the descriptors that I've
> gotten working.  After computing descriptors for some 24,000 compounds I
> removed those with less than 10% variance and limited correlations between
> variables to a maximum of 0.85 (using KNIME).  I'm happy to send a list of
> the resulting descriptors or a correlation matrix if you or anyone else is
> interested.

Sounds interesting. If you are willing, I would be happy to put this
on the wiki, linked from the descriptors page. It would be best if you
could also describe the source of the 24K compounds (or provide SMILES
for them).
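
For anyone who wants to try that kind of filtering outside KNIME, here is a rough sketch in plain Python. It is not a reproduction of the KNIME nodes: the greedy keep-the-first-column rule, the function names, and the min_variance placeholder are all assumptions made for illustration (the exact meaning of the 10% variance cutoff is not given above). Only the 0.85 correlation ceiling comes from the description.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def filter_descriptors(columns, max_corr=0.85, min_variance=0.0):
    """columns maps descriptor name -> list of values.

    Drops columns whose variance is <= min_variance (placeholder
    threshold), then greedily keeps a column only if its absolute
    correlation with every column kept so far is <= max_corr.
    """
    kept = []
    for name, values in columns.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance <= min_variance:
            continue
        if all(abs(pearson(values, columns[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept

# Toy data, not real descriptor values.
columns = {
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10.5],   # nearly collinear with 'a'
    'c': [5, 3, 4, 1, 2],      # correlation with 'a' is -0.8
    'd': [1, 1, 1, 1, 1],      # zero variance
}
kept = filter_descriptors(columns)
print(kept)
```

Note that with a greedy scheme like this the result depends on column order, so the surviving set can differ from what another tool picks.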

-greg
