Re: [Rdkit-discuss] PDB processing

2011-05-10 Thread Adrian Schreyer
If you want to use C++, there is ESBTL, a PDB parser and data
structure for the structural and geometric analysis of biological
macromolecules (http://esbtl.sourceforge.net), and there is also MMTK
(http://dirac.cnrs-orleans.fr/MMTK/), a Python library. Open Babel can
read PDB files, but the last time I checked it was not possible to get
all the structure factors from atoms. Depending on what you want to do,
it will be very hard to find a single toolkit that has good support for
both small molecules and structural biology. As long as you can keep
those tasks separate, you should be fine.
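For reference, per-atom occupancy and B-factor (temperature factor) values sit in fixed columns of the PDB ATOM/HETATM records, so reading just those does not need a full toolkit. Below is a minimal sketch, assuming the standard PDB coordinate-record column layout; `parse_atom_records` and `_optional_float` are hypothetical helpers, not part of Open Babel, Biopython, or any other package mentioned in this thread, and blank columns (common in not-quite-standard files) come back as None.

```python
from typing import Dict, Iterable, List, Optional, Union

Atom = Dict[str, Union[str, int, float, None]]


def _optional_float(field: str) -> Optional[float]:
    """Return None for blank or truncated columns instead of raising ValueError."""
    field = field.strip()
    return float(field) if field else None


def parse_atom_records(lines: Iterable[str]) -> List[Atom]:
    """Read occupancy and B-factor from ATOM/HETATM records.

    Uses the fixed 1-based column ranges of PDB coordinate records:
    atom name 13-16, residue name 18-20, chain ID 22, residue number 23-26,
    occupancy 55-60 and temperature (B) factor 61-66.
    """
    atoms: List[Atom] = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, etc.
        atoms.append({
            "record": record,
            "name": line[12:16].strip(),
            "res_name": line[17:20].strip(),
            "chain": line[21:22].strip(),
            "res_seq": int(line[22:26]),
            # Some non-standard files truncate or blank these columns.
            "occupancy": _optional_float(line[54:60]),
            "b_factor": _optional_float(line[60:66]),
        })
    return atoms
```

Because the function only iterates over lines, `with open(path) as fh: atoms = parse_atom_records(fh)` should work for a file on disk. Anything beyond these columns (alternate locations, insertion codes, multiple models) is better left to a real PDB parser.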

On Tue, May 10, 2011 at 04:11, Greg Landrum greg.land...@gmail.com wrote:
 On Mon, May 9, 2011 at 7:29 PM, Andrew Fant f...@moleculargeek.com wrote:

 If you need to work with PDB files, you might look at biopython, which has 
 support for that format available.  I don't know specifically about 
 occupancy and B-factors, but it would be someplace to start.


 Along those lines, another one worth looking at is Open Structure
 (http://www.openstructure.org/). Marco Biasini did a presentation on
 this impressive-looking system at the MIOSS meeting last week.

 -greg

 --
 Achieve unprecedented app performance and reliability
 What every C/C++ and Fortran developer should know.
 Learn how Intel has extended the reach of its next-generation tools
 to help boost performance applications - inlcuding clusters.
 http://p.sf.net/sfu/intel-dev2devmay
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




[Rdkit-discuss] Antwort: Re: random forest in RDKit - ctd.

2011-05-10 Thread Paul . Czodrowski
Dear Greg,


 
  However, I wonder how to build a 3-class model:
 
  
  for i,m in enumerate(ms):
      if m.GetProp('ACTIVITY_CLASS')=='active':
          act=1
      else:
          act=0
      pts.append([m.GetProp('CompoundName')]+list(descrs[i])+[act])
  
 
  Naively, I just tried act 0,1 or 2 - but this did not work.

 You do need to add 0,1,2, but you also need to change the last value
 in nPossible (the number of values each descriptor + the activity can
 take) to 3. Something like this (not tested):
 nPossible = [0]+[2]*ndescrs+[3]

 Then it should work.


Where does ndescrs come from?

Using the MorganFingerprint example from the Wiki:
# build fingerprints:
fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,2048) for x in ms]
nPossible = [0]+[2]*fps+[3]


This did not work.
Does your example code only work with descriptors rather than fingerprints?

Cheers & Thanks,
Paul

This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient,
you must not copy this message or attachment or disclose the contents to
any other person. If you have received this transmission in error, please
notify the sender immediately and delete the message and any attachment
from your system. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not accept liability for any omissions or errors in this
message which may arise as a result of E-Mail-transmission or for damages
resulting from any unauthorized changes of the content of this message and
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not guarantee that this message is free of viruses and does
not accept liability for any damages caused by any virus transmitted
therewith.

Click http://disclaimer.merck.de to access the German, French, Spanish and
Portuguese versions of this disclaimer.




Re: [Rdkit-discuss] More helpful error messages...

2011-05-10 Thread JP
Greg,

Thanks for this.

I, for one, think it is useful, though not so much for figuring out why
one particular SMILES string fails to parse.

But consider this use case. I have 8,000,000 molecules in a few hundred
SMILES files for which I am calculating descriptors in the cloud. I only
have access to log files.
I get some ten thousand "SMILES Parse Error" messages without any additional info.
Also, I think this error should be just one line (no need to bloat log
files with redundant static data).

Each message should have a bit of static text which is the same for every
error (so you can grep on that) and must have, on the same line, the
offending SMILES string, which you could then extract easily with a
regex. I suggest something structured like:

In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
[06:06:25] SMILES Parse Error: Ccc1XXXcCCC (reason: unknown atoms X)
In [3]: Chem.MolFromSmiles('C1C')
[06:06:28] SMILES Parse Error: C1C (reason: unclosed ring for input)
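If messages followed the one-line layout proposed here, pulling the failing SMILES (and the optional reason) out of a large log would take a single regular expression. A minimal sketch, assuming exactly that proposed format; it is a suggestion in this post, not what RDKit printed at the time, and `extract_failures` is a hypothetical helper. The reason part is optional so that lines carrying only the SMILES still match.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches the proposed one-line format, e.g.
# "[06:06:28] SMILES Parse Error: C1C (reason: unclosed ring for input)"
# SMILES never contain spaces, so \S+ captures the whole string,
# including any parentheses inside it.
PARSE_ERROR_RE = re.compile(
    r"SMILES Parse Error: (?P<smiles>\S+)(?: \(reason: (?P<reason>[^)]*)\))?"
)


def extract_failures(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (smiles, reason) pairs; reason is None when the line has none."""
    failures: List[Tuple[str, Optional[str]]] = []
    for line in lines:
        match = PARSE_ERROR_RE.search(line)
        if match:
            failures.append((match.group("smiles"), match.group("reason")))
    return failures
```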


In the words of the master, as it stands:

Useless, it is.  More info, I need.
From smiles errors, young Padawan must learn.

Jean-Paul Ebejer
Early Stage Researcher

InhibOx Ltd
Pembroke House
36-37 Pembroke Street
Oxford
OX1 1BP
UK

(+44 / 0) 1865 262 034



This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
Any unauthorised dissemination or copying of this email or its attachments,
and any use or disclosure of any information contained in them, is strictly
prohibited and may be illegal.  If you have received this email in error
please notify the sender and delete all copies from your system.

We and our group companies accept no liability or responsibility for
personal emails or emails unconnected with our business.

Internet communications including emails and access and use of web sites
cannot be guaranteed to be secure or error free as information can be
intercepted, corrupted, lost or arrive late. Furthermore, while we have
taken steps to control the spread of viruses on our systems, we cannot
guarantee that this email and any files transmitted with it are virus free.
No liability is accepted for any errors, omissions, interceptions, corrupted
mail, lost communications or late delivery arising as a result of receiving
this message via the Internet or for any virus that may be contained in it.




On 10 May 2011 05:14, Greg Landrum greg.land...@gmail.com wrote:

 Best,
 -greg



[Rdkit-discuss] Access RDKit from Java

2011-05-10 Thread Noel O'Boyle
Hi Greg,

You mentioned that there is a Jar file and associated shared library
for RDKit in the KNIME repository. Unfortunately, they don't provide
access except on request (see http://tech.knime.org/node/20231 for
example). Could you make these available somewhere or include them in
a future release?

- Noel



Re: [Rdkit-discuss] Access RDKit from Java

2011-05-10 Thread Riccardo Vianello
Hi Noel,

On Tue, May 10, 2011 at 11:30 AM, Noel O'Boyle baoille...@gmail.com wrote:
 Hi Greg,

 You mentioned that there is a Jar file and associated shared library
 for RDKit in the KNIME repository. Unfortunately, they don't provide
 access except on request (see http://tech.knime.org/node/20231 for
 example). Could you make these available somewhere or include them in
 a future release?

I'm not sure whether this is useful, but Greg recently announced the
availability of new SWIG-based wrappers in $RDBASE/Code/JavaWrappers.
Do you specifically need the KNIME ones?

Riccardo



Re: [Rdkit-discuss] Access RDKit from Java

2011-05-10 Thread Riccardo Vianello
On Tue, May 10, 2011 at 11:41 AM, Riccardo Vianello
riccardo.viane...@gmail.com wrote:
 I'm not sure if this may be useful, but Greg recently announced the
 availability of new SWIG-based wrappers in $RDBASE/Code/JavaWrappers.
 Do you specifically need the KNIME ones?

Sorry for the double post: the announcement appeared on the rdkit-devel
list, so I guess those bindings are currently only available in SVN.

Riccardo



Re: [Rdkit-discuss] PDB processing

2011-05-10 Thread Geoffrey Hutchison

On May 10, 2011, at 2:37 AM, Adrian Schreyer wrote:

 Open Babel can read PDB files but the last time I checked it was not possible 
 to get all the structure factors from atoms.

If someone can explain to me what you actually want, that can get fixed. (I'm 
more a materials scientist than a biochemist.) That's more of a subject for the 
Open Babel tracker or mailing list.

And I completely understand Greg's view on processing PDB files. It's one thing
if you're talking about authentic PDB files, but the file format has been abused
horribly, and Open Babel's PDB code is a (necessary) mess. Any time we try to
clean it up, people report a dozen bugs on their not-quite-standard files.

Cheers,
-Geoff


Re: [Rdkit-discuss] Antwort: Re: random forest in RDKit - ctd.

2011-05-10 Thread Igor Filippov
Paul,


  nPossible = [0]+[2]*ndescrs+[3]
 
  Then it should work.
 
 
 Where does ndescrs come from?
 
 Using the MorganFingerprint example from the Wiki:
 # build fingerprints:
 fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,2048) for x in ms]
 nPossible = [0]+[2]*fps+[3]
 
I'm new to this too, but I believe [2] should be multiplied by the length
of your fingerprint (2048 in this case), not by the list of fingerprints itself:
nPossible = [0]+[2]*2048+[3]

Also, nPossible then goes into Grow() as an argument.

Best regards,
Igor
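To make the layout discussed above concrete: each example row is [name] + one value per descriptor or fingerprint bit + [activity], and nPossible needs one entry per column in the same order. A minimal pure-Python sketch, assuming the fingerprints have already been converted to equal-length lists of 0/1 values; `build_examples` and the three class names in `CLASS_LABELS` are hypothetical and should be adapted to your data.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical ACTIVITY_CLASS values mapped to integer labels 0..n-1.
CLASS_LABELS: Dict[str, int] = {"inactive": 0, "intermediate": 1, "active": 2}


def build_examples(
    names: Sequence[str],
    fingerprints: Sequence[Sequence[int]],
    classes: Sequence[str],
    class_labels: Dict[str, int] = CLASS_LABELS,
) -> Tuple[List[list], List[int]]:
    """Return (pts, nPossible) with rows laid out as [name] + bits + [activity]."""
    if not fingerprints:
        raise ValueError("need at least one fingerprint")
    if not (len(names) == len(fingerprints) == len(classes)):
        raise ValueError("names, fingerprints and classes must have the same length")
    n_bits = len(fingerprints[0])
    pts = []
    for name, bits, cls in zip(names, fingerprints, classes):
        if len(bits) != n_bits:
            raise ValueError("all fingerprints must have the same number of bits")
        pts.append([name] + list(bits) + [class_labels[cls]])
    # One entry per column: 0 for the name column (not a feature),
    # 2 for every bit (it can only be 0 or 1), then the number of classes.
    n_possible = [0] + [2] * n_bits + [len(class_labels)]
    return pts, n_possible
```

With 2048-bit fingerprints and three classes this yields [0] + [2]*2048 + [3], matching the expression above; pts and nPossible would then be passed to Grow().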




Re: [Rdkit-discuss] Antwort: Re: random forest in RDKit - ctd.

2011-05-10 Thread Greg Landrum
Dear Paul,


Not tested, but what I think you want is:

# build a composite (bag) with 10 trees:
cmp = Composite()
cmp.Grow(pts,attrs=[1],nPossibleVals=[3],nTries=10,
 buildDriver=CrossValidate.CrossValidationDriver,
 treeBuilder=SigTreeBuilder,needsQuantization=False,maxDepth=3)

-greg


On Tue, May 10, 2011 at 10:02 AM,  paul.czodrow...@merck.de wrote:
 Dear Greg,


 
  However, I wonder how to build a 3-class model:
 
  
  for i,m in enumerate(ms):
      if m.GetProp('ACTIVITY_CLASS')=='active':
          act=1
      else:
          act=0
      pts.append([m.GetProp('CompoundName')]+list(descrs[i])+[act])
  
 
  Naively, I just tried act 0,1 or 2 - but this did not work.

 You do need to add 0,1,2, but you also need to change the last value
 in nPossible (the number of values each descriptor + the activity can
 take) to 3. Something like this (not tested):
 nPossible = [0]+[2]*ndescrs+[3]

 Then it should work.


 Where does ndescrs come from?

 Using the MorganFingerprint example from the Wiki:
 # build fingerprints:
 fps = [AllChem.GetMorganFingerprintAsBitVect(x,2,2048) for x in ms]
 nPossible = [0]+[2]*fps+[3]


 This did not work.
 Does your example code only go with descriptors rather than fingerprints?

 Cheers  Thanks,
 Paul





Re: [Rdkit-discuss] Antwort: Re: Antwort: Re: random forest in RDKit - ctd.

2011-05-10 Thread JP
Try:
from rdkit import ML

(MY FIRST HELPFUL POST!!)



Jean-Paul Ebejer
Early Stage Researcher

InhibOx Ltd
Pembroke House
36-37 Pembroke Street
Oxford
OX1 1BP
UK

(+44 / 0) 1865 262 034







On 10 May 2011 14:02, paul.czodrow...@merck.de wrote:

 Composite


Re: [Rdkit-discuss] More helpful error messages...

2011-05-10 Thread Greg Landrum
Dear JP,

On Tue, May 10, 2011 at 11:30 AM, JP jeanpaul.ebe...@inhibox.com wrote:
 Thanks for this.
 I, for one, think it is useful - not in a parse this particular smiles
 string fashion.
 But consider this use case.  I have 8,000,000 molecules in a few hundred
 smiles files on which I am calculating descriptors on the cloud.  I only
 have access to log files.
 I get some ten thousand  SMILES Parse Error without any additional info.
  Also, I think this error should be just one line (no need to bloat log
 files with redundant static data).

Yeah, the use case is clear.

 These should have a bit of static info which is the same for both (so you
 can grep on that) and must have (on the same line) the offending smiles
 string, which you could extract easily with regex, so I suggest something
 structured like:
 In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
 [06:06:25] SMILES Parse Error: Ccc1XXXcCCC (reason: unknown atoms X)
 In [3]: Chem.MolFromSmiles('C1C')
 [06:06:28] SMILES Parse Error: C1C (reason: unclosed ring for input)

Providing a good reason for the failure would certainly sometimes be
useful. It is theoretically possible, but it will require a lot of
work (there are many, many reasons a SMILES could fail to parse). I
think the initial version of this is going to have to just include the
SMILES that caused the failure. Adding explanations is something that
will need to wait.

Best,
-greg
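Until the parser reports the SMILES itself, a caller can already get one greppable line per failure by checking the return value, since Chem.MolFromSmiles returns None when a SMILES cannot be parsed. A minimal sketch; `parse_all` and the logger name "smiles_batch" are hypothetical, and the parse function is passed in so the example runs without RDKit (with RDKit you would pass Chem.MolFromSmiles). RDKit's own error message is still printed separately.

```python
import logging
from typing import Any, Callable, Iterable, List, Optional, Tuple


def parse_all(
    smiles: Iterable[str],
    parse: Callable[[str], Optional[Any]],
    logger: Optional[logging.Logger] = None,
) -> Tuple[List[Any], List[Tuple[int, str]]]:
    """Parse each SMILES and log one line per failure that includes the input."""
    log = logger or logging.getLogger("smiles_batch")  # hypothetical logger name
    mols: List[Any] = []
    failures: List[Tuple[int, str]] = []
    for record_no, smi in enumerate(smiles, start=1):
        mol = parse(smi)
        if mol is None:
            # Fixed prefix to grep on, with the offending SMILES on the same line.
            log.warning("SMILES Parse Error: %s (record %d)", smi, record_no)
            failures.append((record_no, smi))
        else:
            mols.append(mol)
    return mols, failures
```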
