Re: [Rdkit-discuss] Creating Mol Object From SD File

2018-08-29 Thread Dimitri Maziuk via Rdkit-discuss
On 08/29/2018 01:54 PM, Chris Murphy wrote:
> Hi,
> 
> I finally realized that when passing an sdf string to Chem.MolFromMolBlock,
> the Mol object will not retain the properties from the sdf.

Ugh. You're right.

+1 for a MolFromSdfBlock() that doesn't lose the properties.

> Also, it seems that SDMolSupplier.next() does not work anymore? 

if sys.version_info[0] == 2 : suppl.next()
elif sys.version_info[0] == 3 : suppl.__next__()   # or simply next(suppl), which works in both
else : raise Exception( "Go! is looking better every day" )
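
A minimal sketch of the version-independent spelling: the built-in next()
has been available since Python 2.6 and calls whichever method the iterator
provides. CountingSupplier below is a hypothetical stand-in for an iterator
such as SDMolSupplier, so the snippet runs without RDKit installed.

```python
class CountingSupplier(object):
    """Hypothetical stand-in for an iterator such as SDMolSupplier."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 iterator protocol.
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item

    # Python 2 looks for a method called next(); alias it to the same code.
    next = __next__


suppl = CountingSupplier(["mol-1", "mol-2"])
first = next(suppl)       # built-in next() works on Python 2.6+ and 3.x
second = next(suppl)
last = next(suppl, None)  # a default avoids StopIteration when exhausted
print(first, second, last)
```

Calling next(suppl) instead of suppl.next() means the same line works on
either interpreter, so no version check is needed.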

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Creating Mol Object From SD File

2018-08-29 Thread Chris Murphy
Hi,

I finally realized that when passing an sdf string to Chem.MolFromMolBlock,
the Mol object will not retain the properties from the sdf. Knowing that, I
am wondering if there is a way to create a single Mol object from a SDF
string. Right now, the only way I know is by using SDMolSupplier:

from rdkit import Chem

my_mol = None
suppl = Chem.SDMolSupplier(filename)
for mol in suppl:
    my_mol = mol  # ends up holding the last molecule in the file


...


Are there other ways to do this without needing to create a SDMolSupplier?
Also, it seems that SDMolSupplier.next() does not work anymore? I am
getting the following:

AttributeError: 'SDMolSupplier' object has no attribute 'next'

Any help is much appreciated, thanks!

Best,
Chris Murphy
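
One possible way to get a Mol with its properties from an SDF string,
sketched under a few assumptions: SDMolSupplier can be created without a
filename and given the text through SetData(), and the built-in next()
replaces the Python 2 only suppl.next(). The SD record is built in memory
here, and the "MyTag" field and its value are made up for illustration.

```python
from rdkit import Chem

# Build a small SD record in memory so the example needs no input file:
# a mol block for methanol plus one made-up data field.
mol_block = Chem.MolToMolBlock(Chem.MolFromSmiles("CO"))
sdf_text = mol_block + "> <MyTag>\nhello\n\n$$$$\n"

# MolFromMolBlock reads only the structure part, so the data field is lost.
plain = Chem.MolFromMolBlock(sdf_text)

# An SDMolSupplier can be fed the text directly instead of a filename.
suppl = Chem.SDMolSupplier()
suppl.SetData(sdf_text)
my_mol = next(suppl)  # built-in next(); suppl.next() exists only on Python 2

print(bool(plain.HasProp("MyTag")), my_mol.GetProp("MyTag"))
```

The supplier is still created, but no temporary file or loop is needed to
pull out a single record.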


[Rdkit-discuss] want advice for good teaching data set

2018-08-29 Thread JW Feng via Rdkit-discuss
Hi Andrew,

What about building QSAR models to predict activity for a particular ChEMBL
assay?  This would allow you to discuss the strengths and limitations of QSAR
models.

Best,

JW
___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080


Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor

2018-08-29 Thread Ali Eftekhari
Thank you very much! This is really helpful!

Ali


Re: [Rdkit-discuss] want advice for good teaching data set

2018-08-29 Thread Eloy Félix
Hi Andrew,

If you want to build a model, I guess that what you want is to get
experimental logP values.

This should give you something to start with:

select ACTIVITY_ID, MOLREGNO, STANDARD_VALUE, STANDARD_TYPE from ACTIVITIES
where STANDARD_TYPE = 'LogP' and STANDARD_VALUE is not null and
data_validity_comment is null and POTENTIAL_DUPLICATE = 0;

Eloy.
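
A minimal sketch of running the query above from Python with the standard
sqlite3 module. To keep it self-contained, it builds a tiny in-memory
ACTIVITIES table with only the columns the query touches; the rows are
made-up toy values, not ChEMBL data. With the real SQLite release you would
pass the path of the downloaded database file to sqlite3.connect() instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # real use: path to the ChEMBL .db file

# Toy stand-in for the ACTIVITIES table (only the columns used below).
conn.execute(
    "CREATE TABLE activities ("
    " activity_id INTEGER, molregno INTEGER, standard_value REAL,"
    " standard_type TEXT, data_validity_comment TEXT,"
    " potential_duplicate INTEGER)"
)
toy_rows = [
    (1, 101, 2.5, "LogP", None, 0),                     # kept
    (2, 102, None, "LogP", None, 0),                    # no value
    (3, 103, 1.2, "LogP", "Outside typical range", 0),  # flagged
    (4, 104, 0.7, "LogP", None, 1),                     # potential duplicate
    (5, 105, 7.1, "IC50", None, 0),                     # different type
]
conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?, ?, ?)", toy_rows)

query = (
    "SELECT activity_id, molregno, standard_value, standard_type"
    " FROM activities"
    " WHERE standard_type = 'LogP' AND standard_value IS NOT NULL"
    " AND data_validity_comment IS NULL AND potential_duplicate = 0"
)
logp_rows = conn.execute(query).fetchall()
print(logp_rows)
```

Each filter in the WHERE clause drops one of the toy rows, which makes it
easy to show students what the validity and duplicate flags are for.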




Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor

2018-08-29 Thread Richard Cooper
I think it depends on what you need the descriptor for. If it were for some
kind of fingerprinting, the example implementation would be too noisy. We
used it to estimate how many low energy conformations of a molecule might
be present in a particular system, and it turned out that this correlated
well with our classifications of the system.  The variability increases with
RBC: for totally rigid systems RBC and nConf20 are zero. For more
reproducible results you can increase the number of conformers generated;
the cost is longer calculations, but if you only have 350 molecules this
might be OK.

In the paper there are two example molecules with RBC of 1 and 8
respectively which both have only a single low energy conformation, and it
was this discrimination beyond simple RBC that drove its development.

Analysis of the spread of nConf20 showed that it was larger than the spread
of RBC, which might give it slightly better properties as an input
descriptor. However, if you are finding less variability in your particular
data set, then it might not be such a good discriminator of whatever you're
trying to discriminate. I wouldn't recommend adopting it as the 'main
descriptor' until you test whether it's useful.

Regards,
Richard






Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor

2018-08-29 Thread Ali Eftekhari
Hi Dr. Cooper,

Thanks for your response and the suggestions.  I added randomSeed=737 and I
now get a value of 14 for the nConf20 descriptor for the ZINC000290539224
molecule (although it differs from the value of 10 in your paper, it no
longer changes on each run).  My concern now is the general usage of the
nConf20 descriptor.  For instance, is there a limitation on which molecules
nConf20 can be estimated for? Since the conformers are generated randomly,
how reliable is this descriptor as a replacement for Rotatable Bond Count
(RBC) in all machine learning models?

In my application, the calculated values of RBC for 350 molecules range
from 0 to 7 (80% between 0-4 and 20% between 5-7).  The calculated values
of nConf20 are between 0-40, but with 95% between 0-3.  Since nConf20 for
the majority of molecules is between 0-3, I am concerned about using
nConf20 as the main descriptor.  Could you please comment on that?

Thanks,
Ali



Re: [Rdkit-discuss] want advice for good teaching data set

2018-08-29 Thread TJ O'Donnell
Hi Andrew
ChEMBL 24 has compound properties in the table compound_properties.  I
think the alogp is computed using (Crippen) atom types and the acd_logp
uses ACD Labs methods.
TJ
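
A rough sketch of pulling structure/logP pairs from that table with the
standard sqlite3 module. It assumes compound_properties can be joined to
compound_structures (with a canonical_smiles column) on molregno, as in the
ChEMBL schema. The in-memory tables and their values below are toy
stand-ins made up for illustration, not real ChEMBL rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # real use: path to the ChEMBL .db file

# Toy stand-ins holding only the columns used in this sketch.
conn.execute(
    "CREATE TABLE compound_properties"
    " (molregno INTEGER, alogp REAL, acd_logp REAL)"
)
conn.execute(
    "CREATE TABLE compound_structures"
    " (molregno INTEGER, canonical_smiles TEXT)"
)
conn.executemany(
    "INSERT INTO compound_properties VALUES (?, ?, ?)",
    [(1, 0.0, -0.2), (2, 1.69, 2.0), (3, None, -0.3)],  # made-up values
)
conn.executemany(
    "INSERT INTO compound_structures VALUES (?, ?)",
    [(1, "CCO"), (2, "c1ccccc1"), (3, "CC(=O)O")],
)

# Pair each structure with its computed alogp, skipping missing values.
pairs = conn.execute(
    "SELECT s.canonical_smiles, p.alogp"
    " FROM compound_properties AS p"
    " JOIN compound_structures AS s ON s.molregno = p.molregno"
    " WHERE p.alogp IS NOT NULL"
    " ORDER BY p.molregno"
).fetchall()
print(pairs)
```

The resulting (SMILES, alogp) pairs are the kind of input a descriptor
based scikit-learn model would start from.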



Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor

2018-08-29 Thread Richard Cooper
Just to follow up with the details - here is the line in the script to
change:

   conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3)

to

   conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3, randomSeed=737)

(where 737 is an integer constant of your choice, but not -1).

Richard
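
A small sketch of the effect of the fixed seed, using the SMILES quoted
further down in this thread. The helper embed_coords and the choice of five
conformers are illustrative assumptions, not part of the script from the
supporting information; numThreads is left at its default here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

SMILES = "CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O"  # molecule quoted in this thread


def embed_coords(seed, num_confs=5):
    """Embed conformers with a fixed seed and return rounded coordinates."""
    mol = Chem.AddHs(Chem.MolFromSmiles(SMILES))
    conf_ids = AllChem.EmbedMultipleConfs(
        mol, numConfs=num_confs, pruneRmsThresh=0.5, randomSeed=seed)
    coords = []
    for cid in conf_ids:
        conf = mol.GetConformer(cid)
        for i in range(mol.GetNumAtoms()):
            p = conf.GetAtomPosition(i)
            coords.append((round(p.x, 3), round(p.y, 3), round(p.z, 3)))
    return coords


run_a = embed_coords(seed=737)
run_b = embed_coords(seed=737)
print(run_a == run_b)  # same seed, so the two runs should match
```

With randomSeed left at its default of -1 the two runs would generally give
different coordinates, which is where the run-to-run variation in nConf20
comes from.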


On Tue, Aug 28, 2018 at 12:55 PM Richard Cooper <
richardiancooper+rdkitdisc...@gmail.com> wrote:
>
> Hi Ali,
>
> Sorry I missed your email.
>
> The behaviour you describe is correct, due to a random seed in the
> conformer generation step. The descriptor value usually doesn't vary by too
> much.
>
> I think you can give the conformer generation a constant random seed if
> you need a reproducible number for nConf20.
>
> Regards, Richard
>
>
> On Tue, 28 Aug 2018, 00:25 Ali Eftekhari,  wrote:
>>
>> Hello all,
>>
>> I am trying to calculate 3D Descriptors following this publication:
>> "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility
>> in a Single Descriptor", Jerome G. P. Wicker and Richard I. Cooper.  J.
>> Chem. Inf. Model. 2016, 56, 2347−2352
>>
>> I am essentially using the same script as they have in the supporting
>> information and I have attached it here as well.  In Table 2 from the above
>> calculation, the value of the descriptor (nConf20) for ZINC000290539224
>> molecule is listed as 10.  However, when I run the exact code as the one
>> they used, I get a different value at each run.
>>
>> I have already contacted the authors but got no response.  I am
>> wondering if the code they have in the supporting information is not right
>> or the value they listed in the table is wrong?
>>
>> The SMILES string for this particular molecule is:
>> 'CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O'
>>
>> Thanks in advance for your help!
>>



[Rdkit-discuss] want advice for good teaching data set

2018-08-29 Thread Andrew Dalke
Hi all,

  I am starting to put together materials for the Python/RDKit training course 
I'm giving just before the RDKit UGM next month.

I would like to structure part of it around the SQLite release of the ChEMBL 
data set. More specifically, I plan to include examples of machine learning 
with scikit-learn, using RDKit descriptors and values from ChEMBL 24 (and 
making sure to use the new schema).

Two problems. First, I'm not a computational chemist and I don't know what 
would constitute a good example to use. "Good" in this case means one whose 
outlines are well-known to likely students. Second, I don't have much 
experience with the ChEMBL data.

My thought is to make a logP model. The easiest would be to base it on atom 
types. For this option, can anyone suggest where I can find logP data from 
ChEMBL?

Another possibility is to use a pre-existing model, like the notebook George 
Papadatos did for Ligand-based Target Prediction at 
http://nbviewer.jupyter.org/gist/madgpap/10457778 .

Perhaps someone here could point me to other existing resources along similar 
lines?

Best regards,

Andrew
da...@dalkescientific.com


