[Rdkit-discuss] want advice for good teaching data set

2018-08-29 Thread JW Feng via Rdkit-discuss
Hi Andrew,

What about building QSAR models to predict activity for a particular ChEMBL
assay?  This would allow you to discuss the strengths and limitations of QSAR
models.

Best,

JW
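
A minimal sketch of what such a QSAR exercise could look like in a course setting. It is not taken from the thread: the eight SMILES are arbitrary, the five descriptors are an illustrative choice, and the Crippen logP computed by RDKit stands in for measured assay values, which in a real exercise would have to come from ChEMBL.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors
from sklearn.ensemble import RandomForestRegressor

# Arbitrary small molecules, chosen only for illustration.
SMILES = ["CCO", "CCCCCC", "c1ccccc1", "CC(=O)O",
          "CCN(CC)CC", "c1ccccc1O", "CCCCCCCCO", "OCC(O)CO"]


def featurize(mol):
    """Return a small, hand-picked descriptor vector for one molecule."""
    return [
        Descriptors.MolWt(mol),
        Descriptors.TPSA(mol),
        Descriptors.NumHDonors(mol),
        Descriptors.NumHAcceptors(mol),
        Descriptors.NumRotatableBonds(mol),
    ]


mols = [Chem.MolFromSmiles(s) for s in SMILES]
X = [featurize(m) for m in mols]

# Assumption: a computed logP is used as a stand-in target so the sketch
# runs without any ChEMBL download. It is not experimental data.
y = [Crippen.MolLogP(m) for m in mols]

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
predictions = model.predict(X)

for smi, target, pred in zip(SMILES, y, predictions):
    print(f"{smi:12s} target={target:6.2f} predicted={pred:6.2f}")
```

Predicting on the training molecules only shows the mechanics. A real model needs a held-out test set, and the gap between training and test performance is one of the limitations worth discussing with students.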
___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080


On Wed, Aug 29, 2018 at 7:24 AM 
wrote:

> Send Rdkit-discuss mailing list submissions to
> rdkit-discuss@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> or, via email, send a message with subject or body 'help' to
> rdkit-discuss-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
> rdkit-discuss-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Rdkit-discuss digest..."
>
>
> Today's Topics:
>
>    1. want advice for good teaching data set (Andrew Dalke)
>    2. Re: Capturing 3D Conformational Flexibility in a Single
>       Descriptor (Richard Cooper)
>    3. Re: want advice for good teaching data set (TJ O'Donnell)
>    4. Re: Capturing 3D Conformational Flexibility in a Single
>       Descriptor (Ali Eftekhari)
>
>
> --
>
> Message: 1
> Date: Wed, 29 Aug 2018 14:51:57 +0200
> From: Andrew Dalke 
> To: RDKit Discuss 
> Subject: [Rdkit-discuss] want advice for good teaching data set
> Message-ID: <8625305a-6b76-4721-bdbf-297f23edc...@dalkescientific.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hi all,
>
>   I am starting to put together materials for the Python/RDKit training
> course I'm giving just before the RDKit UGM next month.
>
> I would like to structure part of it around the SQLite release of the
> ChEMBL data set. More specifically, I plan to include examples of machine
> learning with scikit-learn, using RDKit descriptors and values from ChEMBL
> 24 (and making sure to use the new schema).
>
> Two problems. First, I'm not a computational chemist and I don't know what
> would constitute a good example to use. "Good" in this case means one whose
> outlines are well-known to likely students. Second, I don't have much
> experience with the ChEMBL data.
>
> My thought is to make a logP model. The easiest would be to base it on
> atom types. For this option, can anyone suggest where I can find logP data
> from ChEMBL?
>
> Another possibility is to use a pre-existing model, like the notebook
> George Papadatos did for Ligand-based Target Prediction at
> http://nbviewer.jupyter.org/gist/madgpap/10457778 .
>
> Perhaps someone here could point me to other existing resources along
> similar lines?
>
> Best regards,
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
>
> --
>
> Message: 2
> Date: Wed, 29 Aug 2018 14:32:28 +0100
> From: Richard Cooper 
> To: Ali Eftekhari 
> Cc: RDKit Discuss 
> Subject: Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility
> in a Single Descriptor
> Message-ID:
> <
> cajwsdrteawmtnqrhzfnfojj54orgtsgj+-_6rwly26o98as...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Just to follow up with the details - here is the line in the script to
> change:
>
>     conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3)
>
> to
>
>     conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3, randomSeed=737)
>
> (where 737 is an integer constant of your choice, but not -1).
>
> Richard
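
The effect of a fixed seed can be checked with a short, self-contained sketch. This is not the script from the supporting information: the molecule (1-butanol), the conformer count, and the helper name `embed` are assumptions made only for illustration.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def embed(smiles, num_confs, seed):
    """Embed conformers with a fixed random seed; return the molecule and conformer IDs."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    conf_ids = AllChem.EmbedMultipleConfs(
        mol, numConfs=num_confs, pruneRmsThresh=0.5, randomSeed=seed
    )
    return mol, list(conf_ids)


# Two independent runs with the same seed (737, as in the message above).
mol_a, ids_a = embed("CCCCO", 10, 737)
mol_b, ids_b = embed("CCCCO", 10, 737)

same_count = len(ids_a) == len(ids_b)
same_coords = all(
    np.allclose(
        mol_a.GetConformer(i).GetPositions(),
        mol_b.GetConformer(j).GetPositions(),
    )
    for i, j in zip(ids_a, ids_b)
)
print("same number of conformers:", same_count)
print("identical coordinates:", same_coords)
```

With `randomSeed=-1` (the default) the generator is not seeded, so repeated runs can yield different conformers and therefore different descriptor values.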
>
>
> On Tue, Aug 28, 2018 at 12:55 PM Richard Cooper <
> richardiancooper+rdkitdisc...@gmail.com> wrote:
> >
> > Hi Ali,
> >
> > Sorry I missed your email.
> >
> > The behaviour you describe is correct, due to a random seed in the
> > conformer generation step. The descriptor value usually doesn't vary by
> > too much.
> >
> > I think you can give the conformer generation a constant random seed if
> > you need a reproducible number for nConf20.
> >
> > Regards, Richard
> >
> >
> > On Tue, 28 Aug 2018, 00:25 Ali Eftekhari wrote:
> >>
> >> Hello all,
> >>
> >> I am trying to calculate 3D descriptors following this publication:
> >> "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility
> >> in a Single Descriptor", Jerome G. P. Wicker and Richard I. Cooper.  J.
> >> Chem. Inf. Model. 2016, 56, 2347-2352
> >>
> >> I am essentially using the same script as they have in the supporting
> >> information, and I have attached it here as well.  In Table 2 of the above
> >> publication, the value of the descriptor (nConf20) for the ZINC000290539224
> >> molecule is listed as 10.  However, when I run exactly the code they
> >> used, I get a different value on each run.
> >>
> >> I have already contacted the authors but got no response.  I am
> wondering if the code they have in the supporting information is 

Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 124, Issue 10

2018-02-07 Thread JW Feng via Rdkit-discuss
How about setting up a donation fund on rdkit.org to pay for summer
students to document code?  For companies that benefited from using RDKit,
it is a worthy cause to pay it forward.

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Wed, Feb 7, 2018 at 12:24 PM, <
rdkit-discuss-requ...@lists.sourceforge.net> wrote:

> Today's Topics:
>
>    1. Re: RDKit and Google Summer of Code 2018 (Greg Landrum)
>
>
> --
>
> Message: 1
> Date: Wed, 7 Feb 2018 21:23:46 +0100
> From: Greg Landrum 
> To: Cameron Pye 
> Cc: RDKit Discuss 
> Subject: Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018
> Message-ID:
>  o...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> A quick one on this as part of me digging out from the pile of email I
> should have replied to.
>
> Cameron's suggestion is a really good one, but unfortunately GSoC is really
> about coding projects, so it doesn't work here.
>
> But we should still talk about ways to improve the docs.
>
> I agree that this is a really important task but it's also a bit
> overwhelming and difficult to know where to start. This is too bad since
> it's something you don't need to be a coder to approach; more or less any
> RDKit user could contribute. Believe it or not, just having people point
> out pieces of code that could be (better) documented is already useful -
> I'm sure I'm not the only developer who has forgotten which bits of code
> they've left un(der)documented. I often have 10-15 minute slots of time
> that I could use for writing docs, but it really helps to know which pieces
> should be done first.
>
> I would love to hear suggestions for ways that we can make it easier for
> people to submit improved documentation or pointers to pieces of code that
> could use better documentation and then to let people know that these
> options exist. It needs to be something other than "send email to the list"
> though.
>
> It's currently pretty easy to submit bug reports/feature requests using the
> GitHub interface. These could either provide suggested docs/doc changes or
> point to functions/methods/classes that could be better documented. The
> GitHub guys just added the ability to specify different types of issue
> templates; I could look into doing one of these for documentation requests.
>
> -greg
>
>
>
> On Wed, Jan 24, 2018 at 7:38 PM, Cameron Pye 
> wrote:
>
> > I know this isn't a particularly sexy job for a budding cheminformatician
> > but...
> >
> > Work on the Python documentation!!!
> >
> > I love RDKit and occasionally think I'm pretty savvy, but I can't tell you
> > how often I'm scrolling through the documentation (or source) and either:
> >
> > a) discover something that exists but has no documentation beyond the
> > function signature
> > or
> > b) discover some functionality that exists (and that I've wanted) but
> > didn't know was there!
> >
> > I think this mailing list and Greg do a superb job of keeping the
> > community informed and creating and maintaining the codebase but I think
> > having some more "Pythonic" API documentation would be great.
> >
> > One shining example is the scikit-learn documentation, which has a quick
> > start, tutorials, etc., and then a well categorized and explanatory API
> > reference with links to examples in the User Guide (akin to the "Getting
> > Started with the RDKit in Python" doc).
> >
> > Just my 2 cents!
> >
> > Thanks for all the hard work as always,
> > Cam
> >
> >
> > On Mon, Jan 15, 2018 at 12:52 PM  > sourceforge.net> wrote:
> >

Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 123, Issue 26

2018-01-16 Thread JW Feng via Rdkit-discuss
Another +1 for MolVS.

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Tue, Jan 16, 2018 at 10:04 AM, <
rdkit-discuss-requ...@lists.sourceforge.net> wrote:

> Today's Topics:
>
>    1. Re: RDKit and Google Summer of Code 2018 (Brian Cole)
>    2. Re: RDKit and Google Summer of Code 2018 (JP)
>    3. Re: RDKit and Google Summer of Code 2018 (George Papadatos)
>
>
> --
>
> Message: 1
> Date: Tue, 16 Jan 2018 10:00:00 -0500
> From: Brian Cole 
> To: Francois BERENGER 
> Cc: RDKit Discuss 
> Subject: Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018
> Message-ID:
>  gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> +1 to the MolVS project as well.
>
> Perhaps an easy bite-size project is to incorporate the open source mae
> parser code into core RDKit: https://github.com/schrodinger/maeparser
>
>
> On Mon, Jan 15, 2018 at 9:08 PM, Francois BERENGER <
> beren...@bioreg.kyushu-u.ac.jp> wrote:
>
> > On 01/16/2018 05:51 AM, Tim Dudgeon wrote:
> > > Incorporating and "industrialising" Matt's MolVS tautomer and
> > > standardizer code?
> > > http://molvs.readthedocs.io/en/latest/index.html
> >
> > If we can vote, I would vote for this one.
> >
> > > On 15/01/18 07:09, Greg Landrum wrote:
> > >> Dear all,
> > >>
> > >> We've been invited again to participate in the OpenChemistry
> > >> application for Google Summer of Code.
> > >>
> > >> In order to participate we need ideas for projects and mentors to go
> > >> along with them.
> > >>
> > >> The current list of RDKit ideas is being maintained here:
> > >> http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
> > >>
> > >> (Note: at the point that I'm pressing "send", that's still a copy of
> > >> last year's project ideas).
> > >>
> > >> If you're willing to be a mentor (please ask me about the ~5
> > >> hours/week required here) or have ideas, please reply to this thread.
> > >>
> > >> Best,
> > >> -greg
> > >>
> > >>
> > >> 
> > --
> > >> Check out the vibrant tech community on one of the world's most
> > >> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > >>
> > >>
> > >> ___
> > >> Rdkit-discuss mailing list
> > >> Rdkit-discuss@lists.sourceforge.net
> > >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
> --
>
> Message: 2
> Date: Tue, 16 Jan 2018 18:19:46 +0100
> From: JP 
> To: Brian Cole 
> Cc: Francois BERENGER ,  RDKit Discuss
> 
> Subject: Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018
> Message-ID:
>  gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Joining the fray, +1 for MolVS
>
> On 16 January 2018 at 16:00, Brian Cole  wrote:
>
> > +1 to the MolVS project as well.
> >
> > Perhaps an easy bite-size project is to incorporate the open source mae
> > parser code into core RDKit: 

Re: [Rdkit-discuss] Seg fault importing rdkit.Chem on Mac 10.13.2 and Python 3.6.3

2018-01-04 Thread JW Feng via Rdkit-discuss
Thanks. My colleague Katrina Lexa found that Python 3.6.1 worked.  The conda
version is 4.4.6.

conda create --name test-rdkit --channel https://conda.anaconda.org/rdkit rdkit python=3.6.1
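
A small sketch, not from the thread, of one way to test whether an environment is affected without crashing the interpreter you are working in: run the import in a child process and inspect its exit code. The helper name `import_exit_code` is hypothetical. On POSIX systems a negative return code means the child was killed by a signal (a segmentation fault is signal 11).

```python
import subprocess
import sys


def import_exit_code(module_name, python=sys.executable):
    """Try to import a module in a child interpreter and return its exit code."""
    result = subprocess.run(
        [python, "-c", f"import {module_name}"],
        capture_output=True,
    )
    return result.returncode


code = import_exit_code("rdkit.Chem")
if code == 0:
    print("rdkit.Chem imported cleanly")
elif code < 0:
    # POSIX only: the child was terminated by signal number -code.
    print(f"import was killed by signal {-code}")
else:
    print("import failed with an ordinary error (for example, not installed)")
```

Pass the path of the interpreter inside the conda environment as `python` to test an environment other than the current one.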

Best,

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Wed, Jan 3, 2018 at 11:55 PM, Greg Landrum <greg.land...@gmail.com>
wrote:

> I'm going to guess that it's this problem:
> https://github.com/rdkit/rdkit/issues/1617
> and that the solution is to downgrade conda to v4.3.25 (conda install
> conda=4.3.25).
>
> This problem has proven much more frustrating to fix for the mac (linux
> and windows are now fine) than expected, but Brian and I continue to try.
>
> -greg
>
>
> On Tue, Jan 2, 2018 at 9:46 PM, JW Feng via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> Hi,
>>
>> I want to check to see if others encountered this problem before filing a
>> new issue on github.  I got a seg fault trying to import rdkit.Chem.  I am
>> using Python 3.6.3 on Mac OS 10.13.2 (High Sierra).  Below is a screenshot
>> showing how I reproduced the seg fault error.  RDKit was installed using
>> this conda command "conda install --channel
>> https://conda.anaconda.org/rdkit rdkit"
>>
>>
>> [image: Inline image 1]
>>
>> Python 2.7 works just fine.
>>
>> Thanks,
>>
>> JW
>>
>>
>> ___
>> JW Feng, Ph.D.
>> Denali Therapeutics Inc.
>> 151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
>> 270-0628


[Rdkit-discuss] Seg fault importing rdkit.Chem on Mac 10.13.2 and Python 3.6.3

2018-01-02 Thread JW Feng via Rdkit-discuss
Hi,

I want to check to see if others encountered this problem before filing a
new issue on github.  I got a seg fault trying to import rdkit.Chem.  I am
using Python 3.6.3 on Mac OS 10.13.2 (High Sierra).  Below is a screenshot
showing how I reproduced the seg fault error.  RDKit was installed using
this conda command "conda install --channel
https://conda.anaconda.org/rdkit rdkit"


[image: Inline image 1]

Python 2.7 works just fine.

Thanks,

JW


___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628


Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 121, Issue 15

2017-11-08 Thread JW Feng via Rdkit-discuss
The Daylight website is a very good resource for SMILES, SMARTS, and
SMIRKS.

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Wed, Nov 8, 2017 at 2:52 PM,  wrote:

> Today's Topics:
>
>    1. SMARTS for =C=, #CH, #C- (Chenyang Shi)
>    2. Re: SMARTS for =C=, #CH, #C- (Andrew Dalke)
>    3. Re: SMARTS for =C=, #CH, #C- (Chenyang Shi)
>    4. SMARTS for Joback and Reid method (Chenyang Shi)
>
>
> --
>
> Message: 1
> Date: Wed, 8 Nov 2017 14:00:36 -0600
> From: Chenyang Shi 
> To: RDKit Discuss 
> Subject: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
> Message-ID:
>  com>
> Content-Type: text/plain; charset="utf-8"
>
> Dear RDKitters,
>
> I have a question regarding SMARTS codes for three simple functional
> groups, these are =C=, #CH and #C-. I am new to SMARTS/SMILES. I indeed
> tried to guess their codes. Here are my guesses:
>
> =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> #CH : [CH1;A;X2;!R]#[$(*)]
>
> #C- :  [CH0;A;X2;!R]#[$(*)]
>
> I checked these SMARTS at
> http://smartsview.zbh.uni-hamburg.de/smartsview/calculate?method=get; they
> all seem to make sense.
>
> For example, the webpage prints out the following messages:
>
> =C=: it says "aliphatic C with 0 further total connections, with 0 further
> hydrogen, not in a ring".
>
> #CH: "aliphatic C with 0 further total connections, with 1 further
> hydrogen, not in a ring".
>
> #C-: "aliphatic C with 1 further total connections, with 0 further
> hydrogen, not in a ring".
>
> However, when I search subgroups using these SMARTS, I had problems.
>
> For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C=C=O')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> ((1, 0, 2),)
>
> it prints out atomic positions 1, 0, 2--three positions. But I would expect
> only one position for the Carbon in the middle.
>
> Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C#C')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> ((0, 1),)
> I would expect two separate positions such as (0,), (1,), indicating there
> are two carbon triple bonds (with a hydrogen).
>
>
> Then if I search "CC#CC" using " [CH0;A;X2;!R]#[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('CC#CC')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts(" [CH0;A;X2;!R]#[$(*)]"))
> ((1, 2),)
> Again, I would expect two separate positions such as (1,), (2,), indicating
> two carbon triple bonds.
>
> I think the problem might be that my SMARTS for these three groups are not
> SPECIFIC. I would appreciate everyone's help on this.
>
> Cheers,
> Chenyang
>
> --
>
> Message: 2
> Date: Wed, 8 Nov 2017 21:27:29 +0100
> From: Andrew Dalke 
> Cc: RDKit Discuss 
> Subject: Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
> Message-ID: <8478f1ae-4916-4feb-8e67-e6cf4e52f...@dalkescientific.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Nov 8, 2017, at 21:00, Chenyang Shi  wrote:
> > =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> The recursive SMARTS notation, which is the term inside of the [$(...)],
> finds a match for the entire pattern and returns the first atom in that
> pattern.
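
One way to apply this, offered as a sketch rather than as the answer given in the thread: wrap the whole environment in a single recursive SMARTS, so that each match reports only the first atom of the inner pattern. The three wrapped patterns below are assumptions derived from the SMARTS in the question, and `matched_atoms` is a hypothetical helper.

```python
from rdkit import Chem


def matched_atoms(smiles, smarts):
    """Return the sorted list of match tuples for a SMARTS query on a SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    return sorted(mol.GetSubstructMatches(query))


# =C= : the neighbours are still required, but they sit inside $(...)
# and so are not reported as matched atoms.
allene_like = matched_atoms("C=C=O", "[$([CH0;A;X2;!R](=*)=*)]")

# #CH : each terminal alkyne carbon is reported as its own match.
terminal_alkyne = matched_atoms("C#C", "[$([CH1;A;X2;!R]#*)]")

# #C- : each substituted alkyne carbon is reported as its own match.
internal_alkyne = matched_atoms("CC#CC", "[$([CH0;A;X2;!R]#*)]")

print(allene_like)      # [(1,)]
print(terminal_alkyne)  # [(0,), (1,)]
print(internal_alkyne)  # [(1,), (2,)]
```

Because every match is now a one-atom tuple, the number of matches equals the number of groups, which is the count a group-contribution method needs.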
>
> > For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromSmiles('C=C=O')
> > >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> > ((1, 0, 2),)
> >
> > it prints out atomic positions 1, 0, 2--three positions. But I would
> > expect only one position for the Carbon in the middle.
>
> The $(*) finds the pattern, which is a "*" and in this case the terminal
> carbons, and returns it. The substructure search returns 3 positions
> because the first is [CH0;A;X2;!R], the second is the first atom of