Re: [Rdkit-discuss] want advice for good teaching data set
Thanks for the responses. I'll merge them into one reply.

On Aug 29, 2018, at 16:56, Eloy Félix wrote:
> If you want to build a model, I guess what you want is experimental
> logP values.
>
> This should give you something to start with:
>
> select ACTIVITY_ID, MOLREGNO, STANDARD_VALUE, STANDARD_TYPE from ACTIVITIES
> where STANDARD_TYPE = 'LogP' and STANDARD_VALUE is not null and
> data_validity_comment is null and POTENTIAL_DUPLICATE = 0;

Yes, that's what I was looking for, including the pointers about data validity and potential duplicates. Thanks!

On Aug 29, 2018, at 15:51, TJ O'Donnell wrote:
> ChEMBL 24 has compound properties in the table compound_properties. I think
> the alogp is computed using (Crippen) atom types and the acd_logp uses
> ACD/Labs methods.

I can see I wasn't clear. I was looking for experimental data.

The ChEMBL blog post at https://chembl.blogspot.com/2018/05/chembl-24-released.html says that they switched to RDKit for alogp; acd_logp still comes from ACD.

On Aug 29, 2018, at 18:07, JW Feng via Rdkit-discuss wrote:
> What about building QSAR models to predict activity for a particular ChEMBL
> assay? This would allow you to discuss the strengths and limitations of QSAR
> models.

I am, primarily, a software developer working in computational chemistry. Do you want fast similarity search? I can do that. Do you want a maximum common structure algorithm, or a matched molecular pair algorithm? I can do that. Do you want to tell me which parameters and learning algorithm to use? I can make the pieces fit together.

What I don't have is the expertise to build a chemically relevant model on my own and discuss its strengths and weaknesses. When I build a model, I do it to predict molecular weight. :)

Andrew
da...@dalkescientific.com

-- 
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
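The thread ends up with two kinds of logP in ChEMBL 24: experimental values in ACTIVITIES (Eloy's query) and computed ones (alogp, acd_logp) in COMPOUND_PROPERTIES (TJ's pointer). A natural first exercise is to line them up. This is a minimal sketch using Python's standard sqlite3 module, assuming the ChEMBL 24 SQLite schema with both tables keyed on MOLREGNO; the helper names and the database filename are hypothetical.

```python
import sqlite3

# Experimental LogP rows (Eloy's filters) joined on MOLREGNO to the computed
# ALOGP (RDKit in ChEMBL 24) and ACD_LOGP (ACD/Labs) columns.
# Assumes the ChEMBL 24 table and column names.
EXPERIMENTAL_VS_COMPUTED_SQL = """
SELECT a.MOLREGNO, a.STANDARD_VALUE, cp.ALOGP, cp.ACD_LOGP
  FROM ACTIVITIES AS a
  JOIN COMPOUND_PROPERTIES AS cp ON cp.MOLREGNO = a.MOLREGNO
 WHERE a.STANDARD_TYPE = 'LogP'
   AND a.STANDARD_VALUE IS NOT NULL
   AND a.DATA_VALIDITY_COMMENT IS NULL
   AND a.POTENTIAL_DUPLICATE = 0
 ORDER BY a.MOLREGNO
"""


def experimental_vs_computed(conn: sqlite3.Connection):
    """Return (molregno, experimental, alogp, acd_logp) tuples.

    Hypothetical helper; the computed columns may be NULL (None)
    for some compounds.
    """
    return [
        (molregno, float(measured), alogp, acd_logp)
        for molregno, measured, alogp, acd_logp in conn.execute(
            EXPERIMENTAL_VS_COMPUTED_SQL
        )
    ]


def mean_abs_error(rows, column: int = 2) -> float:
    """Mean |experimental - computed| over rows where the computed value exists.

    column=2 compares against ALOGP, column=3 against ACD_LOGP.
    """
    diffs = [abs(r[1] - r[column]) for r in rows if r[column] is not None]
    return sum(diffs) / len(diffs) if diffs else float("nan")


# Usage (the filename is hypothetical):
# conn = sqlite3.connect("chembl_24.db")
# rows = experimental_vs_computed(conn)
# print(len(rows), mean_abs_error(rows, 2), mean_abs_error(rows, 3))
```

The error against alogp gives students a baseline that any model they train on the experimental values should be compared with.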
Re: [Rdkit-discuss] want advice for good teaching data set
Hi Andrew,

If you want to build a model, I guess what you want is experimental logP values.

This should give you something to start with:

select ACTIVITY_ID, MOLREGNO, STANDARD_VALUE, STANDARD_TYPE from ACTIVITIES
where STANDARD_TYPE = 'LogP' and STANDARD_VALUE is not null and
data_validity_comment is null and POTENTIAL_DUPLICATE = 0;

Eloy.

2018-08-29 14:51 GMT+01:00 TJ O'Donnell wrote:
> Hi Andrew
> ChEMBL 24 has compound properties in the table compound_properties. I
> think the alogp is computed using (Crippen) atom types and the acd_logp
> uses ACD/Labs methods.
> TJ
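Eloy's query can be run as-is from Python with the standard sqlite3 module against the ChEMBL SQLite release. A minimal sketch, assuming the ChEMBL 24 table and column names used in the query: it collects the values per MOLREGNO and averages repeated measurements of the same compound so each compound appears once in a training set. The averaging step and the helper name are assumptions of this sketch, not part of Eloy's suggestion.

```python
import sqlite3
from collections import defaultdict
from statistics import mean

# Eloy's query: experimental LogP values, skipping rows with a data
# validity comment and rows flagged as potential duplicates.
LOGP_SQL = """
SELECT ACTIVITY_ID, MOLREGNO, STANDARD_VALUE, STANDARD_TYPE
  FROM ACTIVITIES
 WHERE STANDARD_TYPE = 'LogP'
   AND STANDARD_VALUE IS NOT NULL
   AND DATA_VALIDITY_COMMENT IS NULL
   AND POTENTIAL_DUPLICATE = 0
"""


def experimental_logp(conn: sqlite3.Connection) -> dict[int, float]:
    """Map MOLREGNO -> mean experimental LogP (hypothetical helper).

    Averaging repeated measurements of one compound is an assumption
    made here so that each compound contributes a single target value.
    """
    values = defaultdict(list)
    for _activity_id, molregno, value, _standard_type in conn.execute(LOGP_SQL):
        values[molregno].append(float(value))
    return {molregno: mean(vals) for molregno, vals in values.items()}


# Usage, assuming the ChEMBL 24 SQLite file has been downloaded:
# conn = sqlite3.connect("chembl_24.db")  # hypothetical filename
# logp = experimental_logp(conn)
```

Averaging is only one choice; a course could just as well keep every measurement and discuss the spread between replicates.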
Re: [Rdkit-discuss] want advice for good teaching data set
Hi Andrew

ChEMBL 24 has compound properties in the table compound_properties. I think the alogp is computed using (Crippen) atom types and the acd_logp uses ACD/Labs methods.

TJ

On Wed, Aug 29, 2018 at 5:52 AM Andrew Dalke wrote:
> Hi all,
>
> I am starting to put together materials for the Python/RDKit training
> course I'm giving just before the RDKit UGM next month.
>
> I would like to structure part of it around the SQLite release of the
> ChEMBL data set. More specifically, I plan to include examples of machine
> learning with scikit-learn, using RDKit descriptors and values from ChEMBL
> 24 (and making sure to use the new schema).
>
> Two problems. First, I'm not a computational chemist and I don't know what
> would constitute a good example to use. "Good" in this case means one whose
> outlines are well-known to likely students. Second, I don't have much
> experience with the ChEMBL data.
>
> My thought is to make a logP model. The easiest would be to base it on
> atom types. For this option, can anyone suggest where I can find logP data
> from ChEMBL?
>
> Another possibility is to use a pre-existing model, like the notebook
> George Papadatos did for Ligand-based Target Prediction at
> http://nbviewer.jupyter.org/gist/madgpap/10457778 .
>
> Perhaps someone here could point me to other existing resources along
> similar lines?
>
> Best regards,
>
> Andrew
> da...@dalkescientific.com
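Andrew's plan (a logP model based on atom types) and TJ's remark about Crippen atom types rest on the same additive idea: give each atom a type, give each type a contribution, and sum the contributions over the molecule, with the contributions fitted by linear least squares against experimental logP. The stdlib-only sketch below shows just that fitting step. It is a toy under stated assumptions: the atom-type labels are placeholders, not the real Crippen atom types, there is no intercept term, and the function names are hypothetical.

```python
from typing import Dict, List


def fit_contributions(counts: List[Dict[str, int]], logp: List[float]) -> Dict[str, float]:
    """Least-squares per-type contributions c_t for logP ~ sum_t n_t * c_t.

    Solves the normal equations (X^T X) c = X^T y by Gaussian elimination.
    Assumes the count matrix has full column rank; otherwise the solve
    fails or gives numerically meaningless contributions.
    """
    types = sorted({t for c in counts for t in c})
    k = len(types)
    # One row per molecule, one column per atom type.
    X = [[c.get(t, 0) for t in types] for c in counts]
    A = [[float(sum(row[i] * row[j] for row in X)) for j in range(k)] for i in range(k)]
    b = [float(sum(row[i] * y for row, y in zip(X, logp))) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for cc in range(col, k):
                A[r][cc] -= factor * A[col][cc]
            b[r] -= factor * b[col]
    # Back substitution.
    c = [0.0] * k
    for i in reversed(range(k)):
        c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, k))) / A[i][i]
    return dict(zip(types, c))


def predict_logp(contributions: Dict[str, float], count: Dict[str, int]) -> float:
    """Additive prediction; unseen atom types contribute 0 (an assumption)."""
    return sum(contributions.get(t, 0.0) * n for t, n in count.items())
```

In a course setting the per-molecule counts would come from RDKit atom typing, and scikit-learn's LinearRegression(fit_intercept=False) could replace the hand-written solver; RDKit's rdkit.Chem.Crippen.MolLogP already applies the published Crippen contributions.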