Re: [Rdkit-discuss] PDB processing
If you want to use C++, there is ESBTL, a PDB parser and data structure for the structural and geometric analysis of biological macromolecules (http://esbtl.sourceforge.net). There is also MMTK (http://dirac.cnrs-orleans.fr/MMTK/). Open Babel can read PDB files, but the last time I checked it was not possible to get all the structure factors from atoms. Depending on what you want to do, it will be very hard to find a toolkit that has good support for both small molecules and structural biology. As long as you can separate those things you should be fine.

On Tue, May 10, 2011 at 04:11, Greg Landrum greg.land...@gmail.com wrote:
> On Mon, May 9, 2011 at 7:29 PM, Andrew Fant f...@moleculargeek.com wrote:
>> If you need to work with PDB files, you might look at biopython, which has support for that format available. I don't know specifically about occupancy and B-factors, but it would be someplace to start.
>
> Along those lines, another one worth looking at is Open Structure (http://www.openstructure.org/). Marco Biasini did a presentation on this impressive-looking system at the MIOSS meeting last week.
>
> -greg

--
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know. Learn how Intel has extended the reach of its next-generation tools to help boost performance applications - including clusters.
http://p.sf.net/sfu/intel-dev2devmay
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] Antwort: Re: random forest in RDKit - ctd.
Dear Greg,

>> However, I wonder how to build a 3-class model:
>>
>>     for i,m in enumerate(ms):
>>         if m.GetProp('ACTIVITY_CLASS')=='active':
>>             act=1
>>         else:
>>             act=0
>>         pts.append([m.GetProp('CompoundName')]+list(descrs[i])+[act])
>>
>> Naively, I just tried act 0,1 or 2 - but this did not work.
>
> You do need to add 0,1,2, but you also need to change the last value in nPossible (the number of values each descriptor + the activity can take) to 3. Something like this (not tested):
>
>     nPossible = [0]+[2]*ndescrs+[3]
>
> Then it should work.

Where does ndescrs come from? Using the MorganFingerprint example from the Wiki:

    # build fingerprints:
    fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,2048) for x in ms]
    nPossible = [0]+[2]*fps+[3]

This did not work. Does your example code only go with descriptors rather than fingerprints?

Cheers and thanks,
Paul

This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://disclaimer.merck.de to access the German, French, Spanish and Portuguese versions of this disclaimer.
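The three-class labelling asked about in this message can be sketched without RDKit. Only the class name 'active' and the row layout (name, descriptor values, activity) come from the post; the names 'inactive' and 'intermediate', the label map, the helper name and the toy data are assumptions made for illustration.

```python
# Hypothetical mapping from ACTIVITY_CLASS strings to integer labels.
# Only 'active' appears in the original post; the other names are assumed.
CLASS_TO_LABEL = {"inactive": 0, "intermediate": 1, "active": 2}


def build_points(records, descrs):
    """Build rows shaped like [name, descriptor values..., activity label].

    records: list of (compound_name, activity_class) pairs, standing in for
             m.GetProp('CompoundName') and m.GetProp('ACTIVITY_CLASS').
    descrs:  one sequence of descriptor values per record.
    """
    pts = []
    for (name, activity_class), values in zip(records, descrs):
        act = CLASS_TO_LABEL[activity_class]  # KeyError on an unknown class
        pts.append([name] + list(values) + [act])
    return pts


# Toy data, invented for the example.
records = [("cpd-1", "active"), ("cpd-2", "inactive"), ("cpd-3", "intermediate")]
descrs = [(1, 0, 1), (0, 0, 1), (1, 1, 0)]
pts = build_points(records, descrs)
```

Using a dictionary instead of an if/else chain means an unexpected class name raises an error rather than silently being labelled 0.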
Re: [Rdkit-discuss] More helpful error messages...
Greg,

Thanks for this. I, for one, think it is useful - not in a "parse this particular SMILES string" fashion. But consider this use case: I have 8,000,000 molecules in a few hundred SMILES files on which I am calculating descriptors on the cloud. I only have access to log files. I get some ten thousand "SMILES Parse Error" messages without any additional info.

Also, I think this error should be just one line (no need to bloat log files with redundant static data). These messages should have a bit of static info which is the same for all of them (so you can grep on that) and must have, on the same line, the offending SMILES string, which you could extract easily with a regex. So I suggest something structured like:

    In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
    [06:06:25] SMILES Parse Error: Ccc1XXXcCCC (reason: unknown atoms X)
    In [3]: Chem.MolFromSmiles('C1C')
    [06:06:28] SMILES Parse Error: C1C (reason: unclosed ring for input)

In the words of the master, as it stands: Useless, it is. More info, I need. From SMILES errors, young Padawan must learn.

Jean-Paul Ebejer
Early Stage Researcher
InhibOx Ltd
Pembroke House
36-37 Pembroke Street
Oxford OX1 1BP
UK
(+44 / 0) 1865 262 034

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. Any unauthorised dissemination or copying of this email or its attachments, and any use or disclosure of any information contained in them, is strictly prohibited and may be illegal. If you have received this email in error please notify the sender and delete all copies from your system. We and our group companies accept no liability or responsibility for personal emails or emails unconnected with our business. Internet communications including emails and access and use of web sites cannot be guaranteed to be secure or error free as information can be intercepted, corrupted, lost or arrive late. Furthermore, while we have taken steps to control the spread of viruses on our systems, we cannot guarantee that this email and any files transmitted with it are virus free. No liability is accepted for any errors, omissions, interceptions, corrupted mail, lost communications or late delivery arising as a result of receiving this message via the Internet or for any virus that may be contained in it.

On 10 May 2011 05:14, Greg Landrum greg.land...@gmail.com wrote:
> Best,
> -greg
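To illustrate the grep-and-regex extraction described in this message, here is a minimal sketch written against the one-line format proposed above. That format is the suggestion made in this post, not necessarily what RDKit emits, and the pattern and function name are assumptions.

```python
import re

# Matches the one-line format proposed in the message:
#   [HH:MM:SS] SMILES Parse Error: <smiles> (reason: <text>)
# The "(reason: ...)" part is optional so lines without it still match.
PARSE_ERROR_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] SMILES Parse Error: "
    r"(?P<smiles>\S+)(?: \(reason: (?P<reason>.*)\))?\s*$"
)


def failed_smiles(log_lines):
    """Yield (smiles, reason) for every parse-error line; reason may be None."""
    for line in log_lines:
        match = PARSE_ERROR_RE.match(line)
        if match:
            yield match.group("smiles"), match.group("reason")


log = [
    "[06:06:25] SMILES Parse Error: Ccc1XXXcCCC (reason: unknown atoms X)",
    "[06:06:27] some unrelated message",
    "[06:06:28] SMILES Parse Error: C1C (reason: unclosed ring for input)",
]
failures = list(failed_smiles(log))
```

Because a SMILES string contains no whitespace, matching it with `\S+` keeps branch parentheses such as those in `CC(C)C` inside the captured string.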
[Rdkit-discuss] Access RDKit from Java
Hi Greg,

You mentioned that there is a Jar file and associated shared library for RDKit in the KNIME repository. Unfortunately, they don't provide access except on request (see http://tech.knime.org/node/20231 for example). Could you make these available somewhere or include them in a future release?

- Noel
Re: [Rdkit-discuss] Access RDKit from Java
Hi Noel,

On Tue, May 10, 2011 at 11:30 AM, Noel O'Boyle baoille...@gmail.com wrote:
> Hi Greg,
>
> You mentioned that there is a Jar file and associated shared library for RDKit in the KNIME repository. Unfortunately, they don't provide access except on request (see http://tech.knime.org/node/20231 for example). Could you make these available somewhere or include them in a future release?

I'm not sure if this may be useful, but Greg recently announced the availability of new SWIG-based wrappers in $RDBASE/Code/JavaWrappers. Do you specifically need the KNIME ones?

Riccardo
Re: [Rdkit-discuss] Access RDKit from Java
On Tue, May 10, 2011 at 11:41 AM, Riccardo Vianello riccardo.viane...@gmail.com wrote:
> I'm not sure if this may be useful, but Greg recently announced the availability of new SWIG-based wrappers in $RDBASE/Code/JavaWrappers. Do you specifically need the KNIME ones?

Sorry for the double post. The information appeared on the rdkit-devel list, so I guess those bindings are currently only available in svn.

Riccardo
Re: [Rdkit-discuss] PDB processing
On May 10, 2011, at 2:37 AM, Adrian Schreyer wrote:
> Open Babel can read PDB files but the last time I checked it was not possible to get all the structure factors from atoms.

If someone can explain to me what you actually want, that can get fixed. (I'm more of a materials scientist than a biochemist.) That's more of a subject for the Open Babel tracker or mailing list.

And I completely understand Greg's view on processing PDB. It's one thing if you're talking about authentic PDB files, but the file format has been abused horribly and the OB version is a (necessary) mess. Any time we try to clean it up, people report a dozen bugs on their not-quite-standard files.

Cheers,
-Geoff
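Assuming the per-atom values under discussion are the occupancy and temperature (B) factor mentioned earlier in the thread, the sketch below reads them from the fixed columns of ATOM/HETATM records in a standard PDB file. The function name and the sample record are illustrative only, and not-quite-standard files of the kind described in this message may not follow this column layout.

```python
def atom_occupancy_bfactor(pdb_lines):
    """Return (atom name, residue name, occupancy, B-factor) per atom record.

    Uses the fixed-column layout of standard PDB ATOM/HETATM records:
    columns 13-16 atom name, 18-20 residue name, 55-60 occupancy,
    61-66 temperature factor (1-based, inclusive).
    """
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        res_name = line[17:20].strip()
        occupancy = float(line[54:60])
        b_factor = float(line[60:66])
        atoms.append((name, res_name, occupancy, b_factor))
    return atoms


# Invented sample record, assembled in pieces so the column ranges stay visible.
sample_atom = (
    "ATOM      1  N   MET A   1    "  # columns 1-30: record, serial, names, chain, residue number
    "  27.340  24.430   2.614"        # columns 31-54: x, y, z
    "  1.00  9.67"                    # columns 55-66: occupancy, B-factor
    "           N"                    # columns 67-78: padding and element symbol
)
atoms = atom_occupancy_bfactor(["HEADER    EXAMPLE", sample_atom])
```

Slicing by column rather than splitting on whitespace matters because adjacent numeric fields in PDB records can run together without a separating space.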
Re: [Rdkit-discuss] Antwort: Re: random forest in RDKit - ctd.
Paul,

>> nPossible = [0]+[2]*ndescrs+[3]
>>
>> Then it should work.
>
> Where does ndescrs come from? Using the MorganFingerprint example from the Wiki:
>
>     # build fingerprints:
>     fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,2048) for x in ms]
>     nPossible = [0]+[2]*fps+[3]

I'm new to this too, but I believe that after [2] there should be the length of your fingerprint, which in this case is 2048:

    nPossible = [0]+[2]*2048+[3]

Also, nPossible then goes into Grow() as an argument.

Best regards,
Igor
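A small sketch of why `[2]*fps` fails and what the list should look like, following the suggestion in this message. The helper name `make_n_possible` is hypothetical; the layout (0 for the name column, 2 per fingerprint bit, 3 for the activity) is taken from the thread.

```python
def make_n_possible(n_bits, n_classes):
    """Return [0] + [2]*n_bits + [n_classes].

    0 marks the leading name column, each fingerprint bit can take 2 values,
    and the final activity column can take n_classes values.
    """
    if not isinstance(n_bits, int):
        # [2]*fps fails for exactly this reason: fps is a list, not a count.
        raise TypeError("n_bits must be an integer bit count, e.g. 2048")
    return [0] + [2] * n_bits + [n_classes]


n_possible = make_n_possible(2048, 3)
print(len(n_possible))  # 1 name column + 2048 bits + 1 activity column
```

With real fingerprints, the bit count could be read from the fingerprint itself (for example `fps[0].GetNumBits()`) rather than hard-coded, assuming every fingerprint in the list has the same size.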
Re: [Rdkit-discuss] Antwort: Re: random forest in RDKit - ctd.
Dear Paul,

Not tested, but what I think you want is:

    # build a composite (bag) with 10 trees:
    cmp = Composite()
    cmp.Grow(pts, attrs=[1], nPossibleVals=[3], nTries=10,
             buildDriver=CrossValidate.CrossValidationDriver,
             treeBuilder=SigTreeBuilder, needsQuantization=False, maxDepth=3)

-greg

On Tue, May 10, 2011 at 10:02 AM, paul.czodrow...@merck.de wrote:
> Dear Greg,
>
>>> However, I wonder how to build a 3-class model:
>>>
>>>     for i,m in enumerate(ms):
>>>         if m.GetProp('ACTIVITY_CLASS')=='active':
>>>             act=1
>>>         else:
>>>             act=0
>>>         pts.append([m.GetProp('CompoundName')]+list(descrs[i])+[act])
>>>
>>> Naively, I just tried act 0,1 or 2 - but this did not work.
>>
>> You do need to add 0,1,2, but you also need to change the last value in nPossible (the number of values each descriptor + the activity can take) to 3. Something like this (not tested):
>>
>>     nPossible = [0]+[2]*ndescrs+[3]
>>
>> Then it should work.
>
> Where does ndescrs come from? Using the MorganFingerprint example from the Wiki:
>
>     # build fingerprints:
>     fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,2048) for x in ms]
>     nPossible = [0]+[2]*fps+[3]
>
> This did not work. Does your example code only go with descriptors rather than fingerprints?
>
> Cheers and thanks,
> Paul
Re: [Rdkit-discuss] Antwort: Re: Antwort: Re: random forest in RDKit - ctd.
try:

    from rdkit import ML

(MY FIRST HELPFUL POST!!)

Jean-Paul Ebejer

On 10 May 2011 14:02, paul.czodrow...@merck.de wrote:
> Composite
Re: [Rdkit-discuss] More helpful error messages...
Dear JP,

On Tue, May 10, 2011 at 11:30 AM, JP jeanpaul.ebe...@inhibox.com wrote:
> Thanks for this. I, for one, think it is useful - not in a parse this particular smiles string fashion. But consider this use case. I have 8,000,000 molecules in a few hundred smiles files on which I am calculating descriptors on the cloud. I only have access to log files. I get some ten thousand SMILES Parse Error without any additional info. Also, I think this error should be just one line (no need to bloat log files with redundant static data).

Yeah, the use case is clear.

> These should have a bit of static info which is the same for both (so you can grep on that) and must have (on the same line) the offending smiles string, which you could extract easily with regex, so I suggest something structured like:
>
>     In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
>     [06:06:25] SMILES Parse Error: Ccc1XXXcCCC (reason: unknown atoms X)
>     In [3]: Chem.MolFromSmiles('C1C')
>     [06:06:28] SMILES Parse Error: C1C (reason: unclosed ring for input)

Providing a good reason for the failure would certainly be useful in some cases. It is theoretically possible, but it will require a lot of work (there are many, many reasons a SMILES could fail to parse). I think the initial version of this is going to have to just include the SMILES that caused the failure. Adding explanations is something that will need to wait.

Best,
-greg
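Until the parser reports the offending input itself, a caller could log it. The sketch below assumes a parser that returns None on failure, which is how Chem.MolFromSmiles behaves; the function name, logger name and message format are made up for illustration, and a stand-in parser is used so the block runs without RDKit.

```python
import logging

logger = logging.getLogger("smiles_batch")  # hypothetical logger name


def parse_batch(smiles_list, parser):
    """Parse each SMILES with `parser`; log one line per failure.

    `parser` is any callable returning None for unparsable input. With RDKit
    installed, Chem.MolFromSmiles could be passed in here.
    """
    mols, failed = [], []
    for smi in smiles_list:
        mol = parser(smi)
        if mol is None:
            # Static prefix for grep, offending SMILES on the same line.
            logger.error("SMILES Parse Error: %s", smi)
            failed.append(smi)
        else:
            mols.append(mol)
    return mols, failed


def toy_parser(smi):
    """Stand-in parser for the example: rejects any string containing 'X'."""
    return None if "X" in smi else smi


mols, failed = parse_batch(["CCO", "Ccc1XXXcCCC", "c1ccccc1"], toy_parser)
```

Returning the failed strings alongside the parsed molecules lets a batch job write them to a separate file as well as to the log.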