Re: [Rdkit-discuss] 2D Drawing in C++

2014-07-02 Thread Maciek Wójcikowski
Hello,

Adrian: This is already possible using
http://www.rdkit.org/docs/api/rdkit.Chem.AllChem-module.html#GenerateDepictionMatching2DStructure
Note that the scaffold needs pregenerated 2D coordinates.
It is also available through the pandas integration (
https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/PandasTools.py#L284)
and works really well.
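
A minimal sketch of that approach, assuming a benzene core and three made-up analogues (the SMILES strings are illustrative and not taken from this thread). The core gets 2D coordinates first, and each molecule is then depicted so that its matching atoms land on the core's coordinates:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# The shared core must carry 2D coordinates before it can act as a template.
core = Chem.MolFromSmiles('c1ccccc1')
AllChem.Compute2DCoords(core)

# Hypothetical series of molecules that all contain the core.
smiles = ['Cc1ccccc1', 'OCc1ccccc1', 'NC(=O)c1ccccc1']
mols = [Chem.MolFromSmiles(s) for s in smiles]

for mol in mols:
    # Raises ValueError if the molecule does not contain the core.
    AllChem.GenerateDepictionMatching2DStructure(mol, core)
```

After the loop every molecule has a 2D conformer with the ring in the same place, so the series can be passed to a grid drawing such as Draw.MolsToGridImage and will appear in a common orientation.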


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


2014-07-02 14:19 GMT+02:00 Adrian Jasiński jasinski.adr...@gmail.com:

 If I may start a wish list for new functionality, it would be nice to add
 the possibility of drawing a series of molecules that share the same core
 in the same orientation (based on some alignment or something like that).

 Regards,
 Adrian


 2014-07-02 14:04 GMT+02:00 Greg Landrum greg.land...@gmail.com:

  Ah, what timing.

 I have just started thinking about updating the C++ drawing code (which I
 agree is extremely basic) to improve the quality and make it more generally
 useful. I agree that the abstract base class idea makes a lot of sense.
 This is, in essence, what the Python code does already.

 There's definitely interest from my side. I think it would help a lot.

 If there's a way to do so, I would be happy to help out as you proceed. I
 certainly would be more than willing to add refinements like dealing with
 chiral centers.

 -greg




 On Wednesday, July 2, 2014, David Cosgrove davidacosgrov...@gmail.com
 wrote:


 Hi All,

 I am currently writing some code for which I needed a Qt widget with a
 2D RDKit rendering that I could interact with.  This needed
 $RDBASE/GraphMol/MolDrawing/MolDrawing.h, but I found it very difficult to
 find the drawing coordinates for all atoms in the RDKit drawing model and
 convert them to screen coordinates so I could add annotation to the
 drawing, find which atom users had picked, etc.  So I have created new
 drawing classes using a design borrowed from the original OpenEye OEDepict
 toolkit written, I believe, by Roger Sayle.  This uses an abstract base
 class that handles all the drawing in terms of pure virtual functions for
 drawing lines, writing text etc.  The user derives a concrete class that
 implements these functions for his/her drawing library of choice.
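
 The design described above can be sketched roughly as follows. This is an illustration in Python rather than C++, and every class and method name in it (MolDrawer, draw_line, draw_text, RecordingDrawer) is hypothetical; it is not the code Dave wrote and not RDKit's API. The base class owns the layout logic and returns the screen coordinates of each atom, while the derived class supplies only the drawing primitives:

```python
from abc import ABC, abstractmethod


class MolDrawer(ABC):
    """Hypothetical base class: owns the layout logic, delegates primitives."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @abstractmethod
    def draw_line(self, start, end):
        """Draw a line between two screen points."""

    @abstractmethod
    def draw_text(self, pos, text):
        """Write text at a screen point."""

    def draw_molecule(self, atoms, bonds):
        """atoms: list of (symbol, (x, y)); bonds: list of (i, j) index pairs.

        Returns the screen coordinates of every atom, so the caller can
        annotate the drawing or work out which atom was picked.
        """
        xs = [pos[0] for _, pos in atoms]
        ys = [pos[1] for _, pos in atoms]
        x_span = (max(xs) - min(xs)) or 1.0  # avoid dividing by zero
        y_span = (max(ys) - min(ys)) or 1.0
        screen = [
            ((x - min(xs)) / x_span * self.width,
             (max(ys) - y) / y_span * self.height)  # screen y grows downwards
            for _, (x, y) in atoms
        ]
        for i, j in bonds:
            self.draw_line(screen[i], screen[j])
        for (symbol, _), pos in zip(atoms, screen):
            if symbol != "C":  # label heteroatoms only
                self.draw_text(pos, symbol)
        return screen


class RecordingDrawer(MolDrawer):
    """Concrete backend that records commands instead of painting them."""

    def __init__(self, width, height):
        super().__init__(width, height)
        self.commands = []

    def draw_line(self, start, end):
        self.commands.append(("line", start, end))

    def draw_text(self, pos, text):
        self.commands.append(("text", pos, text))


drawer = RecordingDrawer(200, 100)
atoms = [("C", (0.0, 0.0)), ("C", (1.0, 0.5)), ("O", (2.0, 0.0))]
screen = drawer.draw_molecule(atoms, bonds=[(0, 1), (1, 2)])
```

 A Qt, Cairo or SVG backend would replace RecordingDrawer and implement the same two methods with its own drawing calls.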

 Currently I have something that's adequate for my purposes, but if
 there's enough interest from people on the list, I can tidy it up, provide
 some example code and submit it for inclusion. In the short term it would
 be an addition to the existing system so a non-breaking change, but in the
 longer term it could possibly be a route to unifying the python and C++ 2D
 drawing systems which seem at present to be quite separate.

 I mention this at this stage because it would be a non-trivial amount of
 work on my part to do this.  To start with, I don't need to worry about
 drawing chiral centres in my current project, so I haven't done anything
 about that (MolDrawing.h doesn't either, for that matter!). The example
 codes would have to be written to do something useful that showed all the
 features, there might be a need for documentation, etc.  Not insignificant
 is that I would have to bash this square peg through the round hole which
 is AstraZeneca's publication approval system that won't have seen anything
 like it before.  There don't seem to be many of us C++ programmers posting
 to the list, so I'm really looking to gauge interest. I'm happy to put the
 work in to make this a useful submission to the project, but only if other
 people are likely to use it.

 Please let me know, either to this list or privately to
 david.cosgr...@astrazeneca.com.

 Thanks,
 Dave



 --
 Open source business process management suite built on Java and Eclipse
 Turn processes into business applications with Bonita BPM Community
 Edition
 Quickly connect people, data, and systems into organized workflows
 Winner of BOSSIE, CODIE, OW2 and Gartner awards
 http://p.sf.net/sfu/Bonitasoft
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] Installation of RDKit 2014 on Centos 5.10 (Final)

2014-07-28 Thread Maciek Wójcikowski
Just as Christos said: you're missing boost path in LD_LIBRARY_PATH
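
A small sketch for spotting this kind of problem, assuming you have captured the text printed by ldd (which normally marks unresolved libraries with "=> not found"). The helper name missing_libraries is made up for this example, and the sample text is abbreviated from the output quoted below:

```python
def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition("=>")
        if sep and target.strip() == "not found":
            lib = name.strip()
            if lib not in missing:  # ldd may list the same library twice
                missing.append(lib)
    return missing


sample = """\
    libRDGeneral.so.1 => /opt/RDKit/RDKit_2014_03_1/lib/libRDGeneral.so.1 (0x7ff4d1392000)
    libboost_python.so.1.55.0 => not found
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x7ff4d0cae000)
    libboost_python.so.1.55.0 => not found
"""

print(missing_libraries(sample))  # ['libboost_python.so.1.55.0']
```

Each library reported this way needs its directory added to LD_LIBRARY_PATH before Python is started. Where the Boost libraries sit under the BOOST_ROOT used in this build is not stated in the thread.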


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


2014-07-28 15:53 GMT+02:00 Enrico Perspicace 
e.perspic...@mx.uni-saarland.de:

 Dear Greg,

 Thanks for your help.

 I have this:
 [root@Pc-Eric-linux build]# python -c 'from rdkit import rdBase'
 Traceback (most recent call last):
   File "<string>", line 1, in <module>
 ImportError: libboost_python.so.1.55.0: cannot open shared object file: No such file or directory
 [root@Pc-Eric-linux build]#


 and

 [root@Pc-Eric-linux build]# ldd $RDBASE/rdkit/rdBase.so
 linux-vdso.so.1 =>  (0x7fffe57ff000)
 libRDGeneral.so.1 => /opt/RDKit/RDKit_2014_03_1/lib/libRDGeneral.so.1 (0x7ff4d1392000)
 libRDBoost.so.1 => /opt/RDKit/RDKit_2014_03_1/lib/libRDBoost.so.1 (0x7ff4d0fc6000)
 libboost_python.so.1.55.0 => not found
 libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x7ff4d0cae000)
 libm.so.6 => /opt/anaconda/lib/libm.so.6 (0x7ff4d0a2a000)
 libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7ff4d0814000)
 libc.so.6 => /lib64/libc.so.6 (0x7ff4d048)
 /lib64/ld-linux-x86-64.so.2 (0x003148e0)
 libboost_python.so.1.55.0 => not found
 [root@Pc-Eric-linux build]#

 What do you think?

 Thanks

 Enrico



 Greg Landrum greg.land...@gmail.com wrote:

  given that the failures are all for python tests, I guess you don't have
  your PYTHONPATH set properly or that there's an LD_LIBRARY_PATH problem.
 
  What do you get when you run these two commands?
 
  python -c 'from rdkit import rdBase'
 
  ldd $RDBASE/rdkit/rdBase.so
 
 
  -greg
 
 
 
 
 
 
 
 
  On Mon, Jul 28, 2014 at 12:13 PM, Enrico Perspicace 
  e.perspic...@mx.uni-saarland.de wrote:
 
  Dear Christos,
 
  I'm back... I replaced my old CentOS with CentOS 6.5 and updated it.
  I installed all the packages recommended on the RDKit website,
  followed by the Anaconda environment (which includes NumPy), as you did.
  Now I get no errors up to the point of testing the RDKit build.
  RDKit compiled, but after running ctest I have some
  failures...
 
  I did first:
  [root@Pc-Eric-linux build]# cmake -D
  PYTHON_LIBRARY=/opt/anaconda/lib/python2.7/config/libpython2.7.a -D
  PYTHON_INCLUDE_DIR=/opt/anaconda/include/python2.7/ -D
  PYTHON_EXECUTABLE=/opt/anaconda/bin/python2.7 -D
  BOOST_ROOT=/opt/boost/boost_1_55_0 ..
  -- The C compiler identification is GNU 4.4.7
  -- The CXX compiler identification is GNU 4.4.7
  -- Check for working C compiler: /usr/bin/cc
  -- Check for working C compiler: /usr/bin/cc -- works
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++
  -- Check for working CXX compiler: /usr/bin/c++ -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check if the system is big endian
  -- Searching 16 bit integer
  -- Looking for sys/types.h
  -- Looking for sys/types.h - found
  -- Looking for stdint.h
  -- Looking for stdint.h - found
  -- Looking for stddef.h
  -- Looking for stddef.h - found
  -- Check size of unsigned short
  -- Check size of unsigned short - done
  -- Using unsigned short
  -- Check if the system is big endian - little endian
  -- Found PythonLibs: /opt/anaconda/lib/python2.7/config/libpython2.7.a
  (found version 2.7.7)
  -- Found PythonInterp: /opt/anaconda/bin/python2.7 (found version
 2.7.7)
  -- Boost version: 1.55.0
  -- Found the following Boost libraries:
  --   python
  -- Looking for include file pthread.h
  -- Looking for include file pthread.h - found
  -- Looking for pthread_create
  -- Looking for pthread_create - not found
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - found
  -- Found Threads: TRUE
  -- Boost version: 1.55.0
  -- Found the following Boost libraries:
  --   regex
  -- Configuring done
  -- Generating done
  -- Build files have been written to: /opt/RDKit/RDKit_2014_03_1/build
  [root@Pc-Eric-linux build]#
 
  Then
 
  [root@Pc-Eric-linux build]# make
  [root@Pc-Eric-linux build]# make install
 
  RDKit compiles correctly without errors. I just sometimes got this
  warning:
 
  [ 46%] Building CXX object
  Code/DataStructs/Wrap/CMakeFiles/cDataStructs.dir/DataStructs.cpp.o
  In file included from /opt/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1761,
                   from /opt/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17,
                   from /opt/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                   from /opt/RDKit/RDKit_2014_03_1/Code/DataStructs/Wrap/DataStructs.cpp:20:
 
 /opt/anaconda/lib/python2.7/site-packages/numpy/core

Re: [Rdkit-discuss] UGM Update

2014-10-24 Thread Maciek Wójcikowski
Hi,

Actually there are some slides on github: https://github.com/rdkit/UGM_2014
I can't wait to see them all since I couldn't be there myself.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2014-10-24 13:29 GMT+02:00 Nicholas Firth nicholas.fi...@icr.ac.uk:

 Hi Christos,

 I did a few tweets using #rdkitugm2014 but it's by no means representative
 of the excellent presentations.

 I guess people will be putting their slides into the repository in the
 next few days.


 Best,
 Nick

 Nicholas C. Firth | PhD Student | Cancer Therapeutics
 The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton |
 Surrey | SM2 5NG
 T 020 8722 4033 | E nicholas.fi...@icr.ac.uk | W www.icr.ac.uk | Twitter
 @ICRnews

 
 From: Christos Kannas [chriskan...@gmail.com]
 Sent: 24 October 2014 12:17
 To: RDKit
 Subject: [Rdkit-discuss] UGM Update

 Hi RDKiters,

 How is UGM going?
 Is there a tweet feed to follow?

 Hope you are having a nice and interesting time!

 Best,

 Christos

 Christos Kannas

 Researcher
 Ph.D Student

 http://cy.linkedin.com/in/christoskannas



Re: [Rdkit-discuss] A RDKit/Scikit-learn question

2015-02-20 Thread Maciek Wójcikowski
Hello,

If I remember correctly, the coefficients are a NumPy array. You can try
model.coef_.flatten()
to get a flat NumPy array. If you really want a Python list, then you
can wrap it with list(model.coef_.flatten()).

The main reason the vector is nested is that you can have many
output values for one feature vector.
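
A short sketch of the flattening step. It assumes a linear-kernel SVR, for which scikit-learn exposes coef_ as a 2D array with one row and one column per feature; the array below is only a stand-in for model.coef_ with illustrative values, so that the snippet runs without a trained model:

```python
import numpy as np

# Stand-in for model.coef_ (assumed shape: 1 row x n_features columns).
coef = np.array([[-0.0, -0.87146158, -0.46331996, 0.31076767, -0.81882195]])

flat = coef.flatten()      # 1-D NumPy array of length n_features
weights = flat.tolist()    # plain Python list of Python floats

# Pair each weight with its bit position, largest magnitude first.
ranked = sorted(enumerate(weights), key=lambda pair: abs(pair[1]), reverse=True)
print(ranked[0])  # (1, -0.87146158)
```

list(flat) works as well, but its elements stay NumPy scalars, whereas tolist() converts them to built-in floats.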

PS.
I could also recommend my Open Drug Discovery Toolkit for playing around
with RDKit and scikit-learn.
https://github.com/oddt/oddt


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-02-20 7:29 GMT+01:00 Greg Landrum greg.land...@gmail.com:



 On Thu, Feb 19, 2015 at 11:59 PM, Matthew Lardy mla...@gmail.com wrote:


 I have been able to build models via scikit-learn with the RDKit python
 wrappers.  That all works beautifully!


 It's a nice combination, isn't it?


 What I am struggling to get are the weights, or scalers, applied to each
 bit position.  For an SVM regression model (SVR) I think that the values I
 seek are in the coef_ (if the model is created via the linear kernel).
 But, all I get is something like this when I print that out:

 [[-0. -0.87146158 -0.46331996 ...,  0.31076767 -0.
 -0.81882195]]


 I don't really know the SVM regression approach particularly well, but it
 looks like that's a vector of vectors. Is the length of the inner vector
 the same as the length of the fingerprint/descriptor vector you are
 providing?

 -greg





Re: [Rdkit-discuss] A RDKit/Scikit-learn question

2015-02-20 Thread Maciek Wójcikowski
Anyhow, there you have the code. ODDT has another repo with code snippets:

   - snippet #2 [http://nbviewer.ipython.org/github/oddt/jcheminf/blob/master/Snippet_2.ipynb]:
     train various models (RF, SVM, NN, MLR) on RFScore descriptors
   - snippet #3 [http://nbviewer.ipython.org/github/oddt/jcheminf/blob/master/Snippet_3.ipynb]:
     train RF using many fingerprints (OpenBabel's and RDKit's)



Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-02-20 13:52 GMT+01:00 Igor Filippov igor.v.filip...@gmail.com:

 Oops, sorry, got into a wrong branch! I am just repeating Maciek's answer
 it looks like!

 Igor

 On Fri, Feb 20, 2015 at 7:50 AM, Igor Filippov igor.v.filip...@gmail.com
 wrote:

 Maciek,

 I think scikit-learn is using numpy arrays and not plain Python lists.
 They look very similar, but are not quite the same thing.
 Maybe post a bit more complete code sample for people to play with?

 Igor






Re: [Rdkit-discuss] A RDKit/Scikit-learn question

2015-02-20 Thread Maciek Wójcikowski
That's true. It's been a while since the last release of OB, and I'm currently
working from the git master branch, since it has fewer bugs and more features.
Have fun, and feel free to mail me if you have trouble with ODDT.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-02-20 18:21 GMT+01:00 Matthew Lardy mla...@gmail.com:

 Hi Maciek,

 Thanks!  My brain was stuck on this for a while, as it has been ages since
 I have written any Python.

 BTW- I also took a look at your ODDT, and it reminded me that I need to
 get the OB python wrappers re-compiled.  :)

 Thanks,
 Matthew






Re: [Rdkit-discuss] FYI: google code shutting down

2015-03-13 Thread Maciek Wójcikowski
I vote for setting up automatic documentation generation on readthedocs.org,
plus some Sphinx API docs. All we would need to do then is keep track of
changes and write solid docstrings.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-03-13 15:43 GMT+01:00 David Hall li...@cowsandmilk.net:

 well, presumably the documentation not in the wiki would continue to be
 online.

 http://rdkit.org/docs/index.html
 http://rdkit.org/docs/api/index.html
 http://rdkit.org/docs/cppapi/index.html

 None of those are the wiki and tend to be more up-to-date.

 -David


 On Mar 13, 2015, at 10:39 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu
 wrote:

 On 2015-03-13 01:25, Greg Landrum wrote:

 Github does offer the option of setting up a wiki for a project, I
 haven't done this for the RDKit since it doesn't seem that necessary
 (and it seems that the information in wikis has a tendency to rot) but
 if anyone has strong opinion otherwise, we can get something set up.


 Documentation has a tendency to go out of date whether it's in wiki
 format or not. What's the proposed alternative: not have any
 documentation online at all?

 Dimitri





Re: [Rdkit-discuss] FYI: google code shutting down

2015-03-13 Thread Maciek Wójcikowski
My bad. I knew the docs were generated with Sphinx, but had no idea they were
in sync with GitHub :) There is also no mention of it on GitHub (a badge would
be nice). Pull request coming.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-03-13 16:10 GMT+01:00 Greg Landrum greg.land...@gmail.com:

 You mean like the docs that are already there?
 The links that David provided are to docs that are built on Sphinx. Those
 are also available, without the API documentation from ReadTheDocs (
 https://readthedocs.org/projects/rdkit/)

 My solution for keeping the docs as up-to-date as possible is to include
 doctests in them. This isn't perfect, but it's the best I've found.
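
 A minimal illustration of the doctest idea, using only the Python standard library. The function count_heavy_atoms is invented for this example and is not part of the RDKit; the point is that the examples in the docstring are executed, so documentation that drifts away from the code shows up as a test failure:

```python
import doctest


def count_heavy_atoms(symbols):
    """Count the non-hydrogen entries in a list of element symbols.

    >>> count_heavy_atoms(['C', 'H', 'H', 'O'])
    2
    >>> count_heavy_atoms([])
    0
    """
    return sum(1 for s in symbols if s != 'H')


if __name__ == "__main__":
    # Runs every example found in this module's docstrings and reports failures.
    print(doctest.testmod())
```

 Running the file executes both docstring examples; changing the function without updating its docstring would make the run report a failure.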

 -greg








Re: [Rdkit-discuss] Clustering functions in Java API

2015-02-23 Thread Maciek Wójcikowski
Hello,

If interested in clustering in python I can recommend, as usual, sklearn:
http://scikit-learn.org/stable/modules/clustering.html
It's pretty much all you should need. Have fun!
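
A small sketch of clustering from a precomputed distance matrix with scikit-learn. DBSCAN is picked here only for illustration (the thread does not name an algorithm), and the four-by-four matrix is made-up data standing in for pairwise distances such as 1 - Tanimoto similarity:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical pairwise distances for four molecules (symmetric, zero diagonal).
dist = np.array([
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
])

# metric="precomputed" tells DBSCAN that `dist` already holds distances.
# eps and min_samples are arbitrary choices for this toy matrix.
labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(dist)
print(labels)
```

Molecules 0 and 1 end up in one cluster and molecules 2 and 3 in another. On real data, eps needs tuning, and points that belong to no cluster get the label -1.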


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-02-23 11:43 GMT+01:00 Anthony Bradley anthony.brad...@worc.ox.ac.uk:

   Hi Anthony,



 On Sun, Feb 22, 2015 at 11:03 AM, Anthony Bradley 
 anthony.brad...@worc.ox.ac.uk wrote:

 Hi all,

 I am currently working with RDKit from the Java API (well jython actually).

 As has been discussed most of the documentation for this is found by
 trawling:

 Code/JavaWrappers/gmwrapper/src-test/org/RDKit/
 and
 Code/JavaWrappers/gmwrapper/src/org/RDKit/

 However I'm trying to perform a simple clustering. I can build my distance
 matrix - but I can't see where the actual clustering algorithms live.

 It may well be my grepping skills are not what they should be!



 No need to have any concerns about your skills with grep, the clustering
 functionality is not exposed via the SWIG wrappers. As currently configured
 the code isn't available as a library, it's really only useable from
 python. It's a medium-sized amount of work to convert this to a library, so
 it's doable, but I'm not sure it's worth it.



 That seems fair enough and there are definitely other options out there.
 It was more of method consistency thing – so I could be using the same code
 from the python / jython side.



 I've been assuming that there are high(er) quality replacements available
 for most of the RDKit machine learning functionality. Since it's somewhat
 removed from the cheminformatics focus, I haven't really put any time
 into that code in the past few years. Does this sound wrong to anyone? Any
 arguments that the clustering code is worth investing some time in?



 Unless anybody else is interested – I can see why it would be low
 priority!



 -greg



 Thanks a lot for responding so quickly and effectively!



 Best,



 Anthony






Re: [Rdkit-discuss] new parallel execution features in the 2015.03 RDKit release

2015-07-03 Thread Maciek Wójcikowski
Hi Greg,

This is a super feature, but are there any downsides? Why not enable it by
default (provided the dependencies are met)?


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-07-03 16:54 GMT+02:00 Greg Landrum greg.land...@gmail.com:

 Hi Adam,

 Sorry I forgot to mention that.
 If you are building your own version of the RDKit, you will need to
 provide the flag RDK_THREADSAFE_SSS=ON to cmake. You can do this when you
 first call cmake or, if you have run it already, by running ccmake in your
 build directory, modifying the value of the variable, re-generating the
 cmake files (hitting 'c' and then 'g') and then re-building.

 Best,
 -greg


 On Friday, July 3, 2015, az adam.zalew...@mail.com wrote:

  Thanks Greg

 One silly question. Should the parallelization work right off the bat
 or do I need some special steps in building RDKit ? I'm betting the latter,
 but can't find what I'm missing (numThreads=4 does nothing for me).

 Cheers,
 Adam

 On 03-Jul-15 5:11, Greg Landrum wrote:

 Dear all,

  James' question about boost.thread dependencies reminded me that I seem
 to have neglected to post anything about one of the useful additions to the
 most recent RDKit release; I'll remedy that now.

  I added multi-threading support for a few more time-consuming functions
 that are embarrassingly parallel: conformation generation, force field
 minimization of multiple-conformer molecules, and Open3D alignment of
 multiple-conformer molecules.

  Here's a quick demonstration of how this works.

  Start by creating a basic molecule:

   In [1]: from rdkit import Chem

   In [2]: from rdkit.Chem import AllChem

   In [3]: m = Chem.MolFromSmiles('Cc1cc(C(=O)NCC(=O)N)c(C)n1c2ccc(F)cc2') # CHEMBL2113931

   In [4]: mh = Chem.AddHs(m)


  Time how long it takes to generate 50 conformers using one thread:

   In [5]: %timeit AllChem.EmbedMultipleConfs(mh,50)
  1 loops, best of 3: 260 ms per loop


  And using 4 threads:

   In [6]: %timeit AllChem.EmbedMultipleConfs(mh,50,numThreads=4)
  10 loops, best of 3: 77.6 ms per loop


  Nice speed up there.

  Do a force-field minimization of all of those conformations with one
 thread:

   In [7]: %timeit tm = Chem.Mol(mh);AllChem.UFFOptimizeMoleculeConfs(tm)
  1 loops, best of 3: 1.11 s per loop


  And using 4 threads:

   In [8]: %timeit tm = Chem.Mol(mh);AllChem.UFFOptimizeMoleculeConfs(tm,numThreads=4)
  1 loops, best of 3: 288 ms per loop


  Another good improvement.

  Use the O3A code to align all the conformers to another molecule:

   In [16]: m2 = Chem.MolFromSmiles(r'Cc1cc(\C=C\2/SC(=Nc3ccccc3)NC2=O)c(C)n1c4ccccc4') # CHEMBL599702

   In [17]: m2h = Chem.AddHs(m2)

   In [18]: AllChem.EmbedMolecule(m2h)
  Out[18]: 0

   In [19]: AllChem.UFFOptimizeMolecule(m2h)
  Out[19]: 1

   In [20]: refParams = AllChem.MMFFGetMoleculeProperties(m2h)

   In [21]: prbParams = AllChem.MMFFGetMoleculeProperties(mh)

   In [23]: %timeit tm=Chem.Mol(mh);AllChem.GetO3AForProbeConfs(tm,m2h,prbPyMMFFMolProperties=prbParams,refPyMMFFMolProperties=refParams)
  1 loops, best of 3: 1.13 s per loop


  Do the same alignment with 4 threads:

   In [24]: %timeit tm=Chem.Mol(mh);AllChem.GetO3AForProbeConfs(tm,m2h,prbPyMMFFMolProperties=prbParams,refPyMMFFMolProperties=refParams,numThreads=4)
  1 loops, best of 3: 304 ms per loop


  I think this is a convenient (and very easy) way to take advantage of
 one of the convenient features of modern compute hardware.
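
 For reference, the timings quoted in this message can be turned into
 speed-up factors with a few lines of Python. This is only arithmetic on the
 numbers shown above (one thread versus numThreads=4); the dictionary name is
 made up for the illustration.

```python
# (one-thread time, four-thread time) in milliseconds, copied from the
# %timeit results quoted in this message.
timings_ms = {
    "EmbedMultipleConfs": (260.0, 77.6),
    "UFFOptimizeMoleculeConfs": (1110.0, 288.0),
    "GetO3AForProbeConfs": (1130.0, 304.0),
}

n_threads = 4
speedups = {name: one / four for name, (one, four) in timings_ms.items()}

for name, speedup in speedups.items():
    # Parallel efficiency = speed-up divided by the number of threads used.
    print(f"{name}: {speedup:.2f}x speed-up, "
          f"{speedup / n_threads:.0%} parallel efficiency")
```

 All three calls come out between roughly 3.3x and 3.9x on four threads.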

 Best,
 -greg




Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-27 Thread Maciek Wójcikowski
Hi Jing,

Most fingerprints are binary, thus can be stored as np.bool_, which uses
one byte per element and so should be 8 times more memory efficient than
double (packing the bits, e.g. with np.packbits, would make it 64 times).
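
A quick back-of-the-envelope check of the memory footprint, as a plain-Python
sketch. The sizes are assumptions: 1,000,000 molecules (taken from the thread
subject) and 2048-bit fingerprints. The per-element sizes are those of
numpy.double (8 bytes), numpy.bool_ (1 byte) and a bit-packed array (1 bit).

```python
# Assumed problem size: 1M molecules, 2048 bits per fingerprint.
n_mols = 1_000_000
n_bits = 2048

bytes_double = n_mols * n_bits * 8   # numpy.double: 8 bytes per element
bytes_bool = n_mols * n_bits * 1     # numpy.bool_: 1 byte per element
bytes_packed = n_mols * n_bits // 8  # bit-packed: 8 fingerprint bits per byte

for label, size in [("double", bytes_double),
                    ("bool_", bytes_bool),
                    ("packed", bytes_packed)]:
    print(f"{label:>6}: {size / 2**30:6.2f} GiB")

print("double / bool_  =", bytes_double // bytes_bool)
print("double / packed =", bytes_double // bytes_packed)
```

So a boolean array saves a factor of 8 over doubles, and a factor of 64 is
only reached when the bits are packed.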

Best,
Maciej


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-08-27 16:15 GMT+02:00 Jing Lu ajin...@gmail.com:

 Hi Greg,

 Thanks! It works! But is it possible to fold the fingerprint to a smaller
 size? np.zeros((1000000,2048)) still takes a lot of memory...


 Best,
 Jing

 On Wed, Aug 26, 2015 at 11:02 PM, Greg Landrum greg.land...@gmail.com
 wrote:


 On Thu, Aug 27, 2015 at 3:00 AM, Jing Lu ajin...@gmail.com wrote:


 So, I wonder is there any way to convert fingerprint to a numpy vector?


 Indeed there is:

 In [11]: from rdkit import Chem

 In [12]: from rdkit import DataStructs

 In [13]: import numpy

 In [14]: m =Chem.MolFromSmiles('C1CCC1')

 In [15]: fp = Chem.RDKFingerprint(m)

 In [16]: fpa = numpy.zeros((len(fp),),numpy.double)

 In [17]: DataStructs.ConvertToNumpyArray(fp,fpa)


 Best,
 -greg






Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-28 Thread Maciek Wójcikowski
One small note from me: I would use a different aggregating function
instead of sum to get a binary FP:
np.reshape(fpa, (4, -1)).any(axis=0)
I guess it doesn't change a thing with Tanimoto, but if you try other
distances then you can get unexpected results (assuming there are collisions).
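
To illustrate the difference between the two aggregations, here is a minimal
plain-Python sketch (no RDKit or numpy needed). The helper names fold_any and
fold_sum are made up for this illustration. Folding by a factor k sends bit i
to position i % (n / k), which is what the reshape above does for k = 4.

```python
def fold_any(bits, factor):
    """Fold by OR-ing colliding positions, so the result stays binary."""
    if len(bits) % factor:
        raise ValueError("length must be divisible by the fold factor")
    new_size = len(bits) // factor
    folded = [0] * new_size
    for i, bit in enumerate(bits):
        if bit:
            folded[i % new_size] = 1
    return folded


def fold_sum(bits, factor):
    """Fold by summing colliding positions, so collisions give counts > 1."""
    if len(bits) % factor:
        raise ValueError("length must be divisible by the fold factor")
    new_size = len(bits) // factor
    folded = [0] * new_size
    for i, bit in enumerate(bits):
        folded[i % new_size] += bit
    return folded


# Toy 8-bit fingerprint: bits 0 and 4 collide when folded by a factor of 2.
bits = [1, 0, 0, 0, 1, 0, 1, 0]
print(fold_any(bits, 2))
print(fold_sum(bits, 2))
```

With the summed version the colliding position holds a 2, so the folded vector
is no longer a bit vector; any distance that assumes 0/1 values will then
behave differently.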


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-08-28 17:17 GMT+02:00 Jing Lu ajin...@gmail.com:

 Thanks, Greg,

 Yes, scikit-learn will automatically promote to arrays of float with the
 check_array() function. What I am currently doing is


 fpa = numpy.zeros((len(fp),),numpy.double)
 DataStructs.ConvertToNumpyArray(fp,fpa)
 np.sum(np.reshape(fpa, (4, -1)), axis = 0)


 Is this the same as FoldFingerprint()?


 Best,
 Jing



 On Fri, Aug 28, 2015 at 5:03 AM, Greg Landrum greg.land...@gmail.com
 wrote:

 If that doesn't help (and it may not since some Scikit-Learn functions
 automatically promote their arguments to arrays of doubles), you can always
 just generate a shorter fingerprint from the beginning (all the
 fingerprinting functions take an optional argument for this) or fold the
 existing fingerprints to a new size using the function
 rdkit.DataStructs.FoldFingerprint().

 Best,
 -greg










Re: [Rdkit-discuss] possible SMARTS translating mistake?

2015-09-16 Thread Maciek Wójcikowski
Hi Christopher,

Since you're mentioning Rajarshi's SMARTS, I guess that you haven't seen
Greg's latest revision of the PAINS filters (see
http://rdkit.blogspot.com.es/2015/08/curating-pains-filters.html). On the
other hand, I remember Greg saying during the RDKit UGM that some of the
filters would require changes to RDKit's aromaticity model, and this one seems
to be such a case (Greg might confirm/check?).

Best,
Maciej

2015-09-15 18:48 GMT+02:00 Bodle, Christopher R :

> All,
>
> I am working on a filtering code in python to search for substructure
> matches against my hit list (in SMILES) and my filter lists (in SMARTS).
> My current filter lists were copied from Rajarshi Guha's blog at
> http://blog.rguha.net/?p=850.
>
> While working on this I came across the following SMARTS string from
> the p_l150 collection, filter pyrrole_A(118):
>
> n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
>
>
> I have highlighted the problem area in the string.  Although this should
> be interpreted as 'not H', the rendering generated from Chem.MolFromSmarts
> does indeed result in a hydrogen in this position, which is in the middle
> of an aromatic ring and results in a valency issue and as such I can't
> standardize the mol for filtering purposes.
>
> I confirmed this by making the following edit to the SMARTS string:
> n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
>
> Which results in a carbon in the position of the hydrogen from the
> original SMARTS.  Is this a problem with the SMARTS translator?  Or is
> there something that I am missing?
>
> I believe this happens quite frequently.  When running a standardization
> code for the filter p_l150 (55 compounds) using:
>
> p_l150['standardized mol'] = ''
> imax, jmax = p_l150.shape
> for i in range(imax):
>     mol_file = mf = p_l150.loc[i, 'mol file']
>     s = Standardizer()
>     try:
>         m = Chem.MolToSmiles(mf)
>         m2 = standardize_smiles(m)
>         m3 = Chem.MolFromSmiles(m2)
>         smol = s.standardize(m3)
>         p_l150.loc[i, 'standardized mol'] = smol
>     except Exception as e:
>         print(p_l150.loc[i, 'filter'], e)
> p_l150
>
> I return 11 errors, 8 of which are valency (7 of those involve hydrogens):
>
> 

Re: [Rdkit-discuss] Latest version

2015-12-10 Thread Maciek Wójcikowski
Hi David,
I think that SF was abandoned in favor of GitHub, so for new releases go to
https://github.com/rdkit/rdkit/releases


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-12-10 11:12 GMT+01:00 David Cosgrove <davidacosgrov...@gmail.com>:

> Hi All,
>
> I'm sorry to trouble you all with this one, as I feel I should be able to
> do better.  I'm trying to install the latest version, 2015.09.1, but I
> can't find it on sourceforge.  The latest one I can find there is
> 2015.03.1.  I've managed to get the ubuntu installation installed via
> apt-get, and my python interpreter can find it. However, I can't find the
> include files or object libraries for C++ development, which is what I'm
> after.
>
> Can someone please point me in the right direction?  Be as rude as you
> like in the process, as I feel I must be being very dim!
>
> Thanks,
>
> Dave
>
>
>


[Rdkit-discuss] Conda packages bug

2015-11-18 Thread Maciek Wójcikowski
Hi,

I've noticed a bug in the conda packages: when you do a clean install
["conda install -c https://conda.anaconda.org/rdkit rdkit"], the old version
2014.09 is installed by default.

To force installation of 2015.03 you have to specify its version:
conda install -c https://conda.anaconda.org/rdkit rdkit=2015

Here comes the probable cause of the bug: the 2015.03 package requires an
*older* version of boost (as I recall 1.56), whereas the 2014.09 version
requires a newer one: 1.57. During the update boost is downgraded.

Code to reproduce the bug:
conda create -n test_rdkit python
source activate test_rdkit
conda install -c rdkit rdkit
conda install -c rdkit rdkit=2015

I hope we manage to get it fixed for 2015.09 :)


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Re: [Rdkit-discuss] Load mol2 file with partial charges

2015-11-20 Thread Maciek Wójcikowski
Hi Gaetano,

There is a property called "_TriposPartialCharge" on each atom. To get the
partial charges read from the mol2 file, just execute:

[float(a.GetProp("_TriposPartialCharge")) if "_TriposPartialCharge" in
a.GetPropNames() else 0.0 for a in mol.GetAtoms()]
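
For orientation, the charge in question is the ninth column of each record in
the @<TRIPOS>ATOM section of the mol2 file. The following plain-Python sketch
reads that column directly, without RDKit. The function name and the example
file content (including the charge values) are made up for the illustration.

```python
def read_mol2_charges(mol2_text):
    """Return the partial charges listed in the @<TRIPOS>ATOM section."""
    charges = []
    in_atoms = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Only lines after the ATOM record type indicator are atom records.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        if in_atoms and fields:
            # Columns: id name x y z type [subst_id [subst_name [charge]]]
            charges.append(float(fields[8]) if len(fields) > 8 else 0.0)
    return charges


# Hypothetical three-atom example; the numbers are illustrative only.
example = """@<TRIPOS>MOLECULE
water
3 2 1
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 O1         0.0000    0.0000    0.0000 O.3     1 HOH  -0.8340
      2 H1         0.9572    0.0000    0.0000 H       1 HOH   0.4170
      3 H2        -0.2400    0.9266    0.0000 H       1 HOH   0.4170
@<TRIPOS>BOND
     1     1     2 1
     2     1     3 1
"""

print(read_mol2_charges(example))
```

Atoms whose record has no charge column get 0.0, mirroring the fallback in the
list comprehension above.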



Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2015-11-20 0:28 GMT+01:00 Gaetano Calabro <gcala...@uci.edu>:

> Hi there,
>
> I would like to load a mol2 file with partial charge information in
> RDkit. How can I retrieve the atomic partial charge in RDkit? I haven't
> seen any function related to it.
>
> Cheers,
>
> Gaetano
>
>


Re: [Rdkit-discuss] reproduce this method to obtain graph using RDkit

2016-02-03 Thread Maciek Wójcikowski
Hello,

I'd guess that by a graph they mean a 2D representation of the molecule.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-02-03 14:24 GMT+01:00 Guillaume GODIN <guillaume.go...@firmenich.com>:

> Dear All,
>
>
>
> Can you explain a little how to reproduce the experimental process of this
> paper:
> http://www.broadinstitute.org/~jbloom/smrc/neural-fingerprints.pdf (page 5)
> using RDKit?
>
>
>
> It looks like they tweak RDKit:
>
>
>
> From page 5: “Experimental setup:
>
> 1 Our pipeline takes as input the SMILES [30] string encoding of each
> molecule, which is then converted into a graph using RDKit [20].
>
> 2 We also used RDKit to produce the extended circular fingerprints used in
> the baseline.
>
> 3 Hydrogen atoms were treated implicitly.
>
>
>
> In our convolutional networks, the initial atom and bond features were
> chosen to be similar to those used by ECFP: Initial atom features
> concatenated a one-hot encoding of the atom’s element, its degree, the
> number of attached hydrogen atoms, and the implicit valence, and an
> aromaticity indicator.
>
> The bond features were a concatenation of whether the bond type was
> single, double, triple, or aromatic, whether the bond was conjugated, and
> whether the bond was part of a ring.”
>
>
>
> My question: I don’t see how to obtain the graph from the SMILES.
>
>
>
> Best regards,
>
>
>
> *Dr. Guillaume GODIN*
>
> Project Manager
>
> Innovation
>
> CORPORATE R&D DIVISION
>
> DIRECT LINE +41 (0)22 780 3645
>
> MOBILE   +41 (0)79 536 1039
>
> *Firmenich SA*
>
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
>
>
>
>


Re: [Rdkit-discuss] reproduce this method to obtain graph using RDkit

2016-02-03 Thread Maciek Wójcikowski
So my initial guess was wrong. They also provide a GitHub repo, where you
can see exactly how the graph is formed:
https://github.com/HIPS/neural-fingerprint/blob/master/neuralfingerprint/mol_graph.py#L75
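
As a rough illustration of what "molecule as a graph" means at its simplest,
here is a plain-Python sketch that builds the adjacency matrix of a
six-membered ring, the same example used in the quoted reply below. It is not
the code from the linked repository; the function name is made up. In RDKit
itself, Chem.GetAdjacencyMatrix(mol) returns this kind of matrix for a parsed
molecule.

```python
def ring_adjacency(n_atoms):
    """Adjacency matrix of a simple ring: atom i is bonded to i-1 and i+1."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        j = (i + 1) % n_atoms  # next atom around the ring, wrapping at the end
        matrix[i][j] = 1
        matrix[j][i] = 1       # bonds are undirected, so the matrix is symmetric
    return matrix


benzene = ring_adjacency(6)
for row in benzene:
    print(" ".join(str(value) for value in row))
```

Atom and bond features (element, degree, aromaticity, bond type and so on) are
then attached to the nodes and edges of this graph.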


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-02-03 14:35 GMT+01:00 Giuseppe Marco Randazzo <gmranda...@gmail.com>:

> The graph is simply an adjacency matrix (a square matrix) that represents
> the connectivity of the molecule.
>
> The rows and columns of this matrix are the atoms that can be connected.
>
> If you have a simple SMILES like c1ccccc1, this means that your molecule is
> a ring of six atoms (6*c), and the lowercase letter “c” means that the atom
> is aromatic. Thus you can write your adjacency matrix in this manner:
> your adjacency matrix in this manner:
>
>
>
> Adjacency matrix:
>
>
>     c1 c2 c3 c4 c5 c6
> c1   0  1  0  0  0  1
> c2   1  0  1  0  0  0
> c3   0  1  0  1  0  0
> c4   0  0  1  0  1  0
> c5   0  0  0  1  0  1
> c6   1  0  0  0  1  0
>
> 1 means that there is a bond between two atoms.
>
> That’s it.
>
> Marco
>
>


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Maciek Wójcikowski
Hi,

A few months back Greg added CanonicalRankAtoms to rdkit.Chem after a
similar question from me.
http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms
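
The quoted reply below stores the SMILES output order as
order[output_atom_idx] = input_atom_idx. A descriptor that needs the opposite
lookup (for each original atom, its position in the output) can invert that
list. A minimal plain-Python sketch, using the list from the quoted example;
the variable name position_of is made up:

```python
# order[output_atom_idx] = input_atom_idx, as in the quoted example
# for ethane with explicit hydrogens.
order = [2, 0, 3, 4, 1, 5, 6, 7]

# Invert the permutation: position_of[input_atom_idx] = output_atom_idx.
position_of = [0] * len(order)
for output_idx, input_idx in enumerate(order):
    position_of[input_idx] = output_idx

print(position_of)
```

Here the two carbons (input indices 0 and 1) end up at output positions 1 and
4, matching the SMILES '[H]C([H])([H])C([H])([H])[H]'.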


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-03-10 13:18 GMT+01:00 Michal Krompiec <michal.kromp...@gmail.com>:

> Thanks a lot, this is exactly what I wanted.
> Best regards,
> Michal
>
> On 10 March 2016 at 12:13, Brian Kelley <fustiga...@gmail.com> wrote:
>
>> The canonicalizer doesn't treat hydrogens any differently than any other
>> atom, but they have to be in the graph.  If you are starting from smiles,
>> simply add explicit hydrogens, python example below:
>>
>> >>> from rdkit import Chem
>>
>> >>> m = Chem.MolFromSmiles("CC")
>>
>> >>> mh = Chem.AddHs(m)
>>
>> >>> Chem.MolToSmiles(mh)
>>
>> '[H]C([H])([H])C([H])([H])[H]'
>>
>> >>> order = eval(mh.GetProp("_smilesAtomOutputOrder"))
>>
>> # safer non eval version...
>>
>> >>> order = mh.GetPropsAsDict(includePrivate=True,
>> ...                           includeComputed=True)['_smilesAtomOutputOrder']
>>
>> >>> list(order)
>>
>> [2,0,3,4,1,5,6,7]
>>
>> >>>
>>
>> Note that the output order is given in the context of the output SMILES
>> string, i.e. order[0] is the original index of the atom that was written
>> first, and so on. In other words, order[output_atom_idx] = input_atom_idx
>>
>> On Thu, Mar 10, 2016 at 6:27 AM, Michal Krompiec <
>> michal.kromp...@gmail.com> wrote:
>>
>>> Hello,
>>> I need a "canonical" method for generating atom indices for a given
>>> molecule (with 3D coordinates, so the input is e.g. a mol file), for a
>>> molecular descriptor which should be invariant with respect to atom
>>> indexing. As I understand, canonical SMILES will give the same atom indices
>>> for non-hydrogen atoms, but is there a way in RDKit to generate unique
>>> indices for hydrogens as well?
>>> Best regards,
>>> Michal
>>>
>>>
>>>
>>>
>>
>
>


Re: [Rdkit-discuss] Pandas dataframe manipulation

2016-03-11 Thread Maciek Wójcikowski
Hi Paul,

I would suggest:

   - reading the dataframe/column with dtype str/np.object
   - cleaning up the IC50s
   - casting to float/int with dataframe.astype()

Or alternatively you could use the "converters" argument:
pd.read_csv('filename.csv',
            converters={'ic50_colname': lambda x: x.replace('>', '')})

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
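
The cleaning step can be sketched without pandas. The function below strips a
leading '>' or '<' qualifier and converts the rest to float; the function
name, the column name "ic50" and the CSV content are assumptions made up for
the illustration.

```python
import csv
import io


def clean_ic50(text):
    """Drop a leading '>' or '<' qualifier and return the value as a float.

    Empty cells are returned as None.
    """
    stripped = text.strip().lstrip("<>").strip()
    return float(stripped) if stripped else None


# Hypothetical CSV content with one qualified value and one empty cell.
raw = "name,ic50\na,1\nb,>2\nc,0.5\nd,\n"

rows = list(csv.DictReader(io.StringIO(raw)))
values = [clean_ic50(row["ic50"]) for row in rows]
print(values)
```

The same function could be passed as the value in the "converters" dictionary
of pd.read_csv, since a converter receives the raw cell text. Keep in mind
that dropping the qualifier turns ">2" into an exact 2.0, which may or may not
be what you want for censored measurements.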


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-03-11 11:12 GMT+01:00 Paul Czodrowski <paul.czodrow...@merckgroup.com>:

> Dear RDKitter & Pandas-Dataframes heavy users,
>
>
>
> please find below a question concerning the conversion of pandas
> dataframes:
>
> df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
>                    "row1": [1,2,3,">2",5],
>                    "row2": [0.1,0.2,0.3,0.4,0.5],
>                    "row3": ["ab","cd","ed","gh","ij"]})
>
> df_new = df[df[["row1"]].applymap(np.isreal).all(1)]
>
>
>
> I would like to get rid of this nasty ">2" entry in "row1" => This works
> perfectly given the snippet above.
>
>
>
> However, when I read in a CSV file containing similar data (see the
> attached CSV) => The conversion does not work: all values in the IC50
> column are discarded and end up as "NaN".
>
>
>
> What is going wrong?
>
>
>
>
>
> Thanks & Cheers,
>
> Paul
>
>
>


Re: [Rdkit-discuss] Submol: bond vs atom indicies

2016-03-22 Thread Maciek Wójcikowski
Hi Greg,
2016-03-22 6:28 GMT+01:00 Greg Landrum <greg.land...@gmail.com>:
>
> Hi Maciek,
>
>
> On Mon, Mar 21, 2016 at 8:33 PM, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
>>
>> I came across one problem with RDKit today, namely the Chem.PathToSubmol()
>> function. Does the "path" mean atom or bond indices? On this very list I
>> found examples showing usage with atom idx [
>> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03966.html],
>> while the example in "Getting started in python" is feeding it the output of
>> Chem.FindAtomEnvironmentOfRadiusN(), which is a list of bond indices. The
>> documentation could be more explicit here... After my brief analysis of the
>> code I found out that the bonds should be used (correct me if I'm wrong).
>
> The function is still not documented, but it's definitely bonds. I think
> the thread you reference from the mailing list says the same thing.

Ok, you're right. I've just noticed your comment, while the example was
still using atom indices (although they worked for the sample mol, where the
bond indices were fortunately aligned with the atom indices).

>> So here comes the question: is there an equivalent function or a clever
>> way to do Chem.PathToSubmol() on atom indices? Currently I do: 1) get the
>> atom path; 2) get bonds between every atom in path (their indices); 3) get
>> submol with Chem.PathToSubmol()
>
> I don't think so.
>
>> PS.
>> I use it to get each protein residue (amino acid) in a separate mol. It
>> would be much easier if we could use "Molecule -> Residues -> Atoms"
>> instead of "Molecule -> Atoms -> (grouping of monomers) -> Residues".
>
> SplitMolByPDBResidues() doesn't do what you want?
>

Not really. I want to get each amino acid separately, so I'd have to do
SplitMolByPDBChainId() -> SplitMolByPDBResidues() -> break the peptide
bonds (to eliminate series of aa) -> split disconnected molecules. And that
only outputs valid PDB amino acids. Accessing non-standard ones, like HOH,
LIG, UNL, although present in PDB would be also desired. In other words the
unique key should be "monomer index + chain id" instead of only three
letter name as in SplitMolByPDBResidues().

Maciek

>
> -greg
>


Re: [Rdkit-discuss] Submol: bond vs atom indicies

2016-03-22 Thread Maciek Wójcikowski
I correct myself, all residue types are available
from Chem.SplitMolByPDBResidues().


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl



Re: [Rdkit-discuss] Submol: bond vs atom indicies

2016-03-22 Thread Maciek Wójcikowski
FYI: If anybody needs the code which gets all residues as separate
Molecules, here it is.

from rdkit import Chem

prot = Chem.MolFromPDBFile('10gs/10gs_protein_rdkit.pdb', flavor=1)
residues = []
# backbone pattern spanning one peptide bond: N-CA-C(=O)-N
aa = Chem.MolFromSmarts('NCC(=O)N')
for res in Chem.SplitMolByPDBResidues(prot).values():
    for frag in Chem.GetMolFrags(res, asMols=True, sanitizeFrags=False):
        # match a peptide bond (carbonyl C = match[2], amide N = match[4])
        peptide_bonds = [frag.GetBondBetweenAtoms(match[2], match[4]).GetIdx()
                         for match in frag.GetSubstructMatches(aa)]
        if peptide_bonds:
            disconnected_aa = Chem.FragmentOnBonds(frag, peptide_bonds,
                                                   addDummies=False)
            residues.extend(Chem.GetMolFrags(disconnected_aa, asMols=True,
                                             sanitizeFrags=False))
        else:
            residues.append(frag)

The downside is that there is no atom map, so the relation to the original
atom indices is lost. This is why I stick to the original solution of
grouping the atoms in residues by "residue number + residue chain".

Implementing such grouping in a similar way to SplitMolByPDBResidues/Chains
would also lose the atom mapping, if I understand the RDKit code correctly.
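
A minimal sketch of the "residue number + residue chain" grouping, assuming
the atoms carry PDB residue info (as they do after Chem.MolFromPDBFile).
group_atoms_by_residue is a hypothetical helper name, and Chem.MolFromSequence
is used only as a stand-in for a real PDB file. Since it collects indices of
the original molecule, the atom mapping is kept:

```python
from collections import defaultdict
from rdkit import Chem


def group_atoms_by_residue(mol):
    """Map (chain id, residue number, residue name) -> list of atom indices."""
    groups = defaultdict(list)
    for atom in mol.GetAtoms():
        info = atom.GetPDBResidueInfo()
        if info is None:
            # atom without PDB annotation; simply skipped in this sketch
            continue
        key = (info.GetChainId(), info.GetResidueNumber(),
               info.GetResidueName())
        groups[key].append(atom.GetIdx())
    return dict(groups)


# Stand-in for a molecule read with Chem.MolFromPDBFile()
prot = Chem.MolFromSequence('AG')
residue_atoms = group_atoms_by_residue(prot)
for key, idxs in sorted(residue_atoms.items()):
    print(key, idxs)
```

The key also works for non-standard residues such as HOH or LIG, because it
does not depend on the three-letter name alone.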


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl



[Rdkit-discuss] Molecular properties + pickling

2016-03-19 Thread Maciek Wójcikowski
Hi all,

Is this a bug, or am I doing something wrong? Molecule properties are not
preserved when pickling in Python. Here is an example:

from rdkit import Chem
import cPickle as pickle

mol = Chem.MolFromSmiles('c1ccccc1')
mol.SetProp('aaa', '123')
print list(mol.GetPropNames()) # ['aaa']
mol2 = pickle.loads(pickle.dumps(mol))
print list(mol2.GetPropNames()) # ['']


In [19]: rdkit.__version__
Out[19]: '2015.09.2'
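
One possible workaround, sketched here in Python 3 and not taken from this
thread: RDKit ships a PropertyMol wrapper whose pickle includes the
properties. Newer releases also provide Chem.SetDefaultPickleProperties(), but
whether that is available depends on the installed version.

```python
import pickle

from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol

mol = Chem.MolFromSmiles('c1ccccc1')
mol.SetProp('aaa', '123')

# PropertyMol wraps the molecule so that its properties travel with the pickle
pmol = PropertyMol(mol)
mol2 = pickle.loads(pickle.dumps(pmol))
print(list(mol2.GetPropNames()))
```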


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Re: [Rdkit-discuss] mmFF minimization in context of two ligands/binding pocket

2016-04-01 Thread Maciek Wójcikowski
Hi Steven,

Option 1) is exactly what I do to perform rigid-protein minimization of
ligands. I use ff.AddFixedPoint() to fix all protein atoms (you could leave
out the side chains you want to keep flexible). For better performance I also
use the following params:

ff = AllChem.MMFFGetMoleculeForceField(comp, ff_mp, nonBondedThresh=10.,
ignoreInterfragInteractions=False)

where comp is the complex Mol and ff_mp are the force-field molecular
properties.
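
A self-contained sketch of the whole procedure in Python, under loud
assumptions: acetamide stands in for the protein, methanol for the ligand, and
the 6 A offset is arbitrary. Only the MMFFGetMoleculeForceField() call and
AddFixedPoint() come from the description above; the rest is illustrative.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Geometry import Point3D


def embed(smiles, seed=42):
    """Build a molecule with explicit Hs and one 3D conformer."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)
    return mol


def coords(mol, n):
    """Coordinates of the first n atoms as plain tuples."""
    conf = mol.GetConformer()
    return [(p.x, p.y, p.z)
            for p in (conf.GetAtomPosition(i) for i in range(n))]


pocket = embed('CC(N)=O')   # toy stand-in for the protein / binding pocket
ligand = embed('CO')        # toy stand-in for the ligand
# pocket atoms come first in the combined molecule; the ligand is shifted
comp = Chem.CombineMols(pocket, ligand, offset=Point3D(6.0, 0.0, 0.0))
Chem.SanitizeMol(comp)

ff_mp = AllChem.MMFFGetMoleculeProperties(comp)
ff = AllChem.MMFFGetMoleculeForceField(comp, ff_mp, nonBondedThresh=10.,
                                       ignoreInterfragInteractions=False)

# keep every "protein" atom rigid; only the ligand is free to move
n_pocket = pocket.GetNumAtoms()
for idx in range(n_pocket):
    ff.AddFixedPoint(idx)

pocket_before = coords(comp, n_pocket)
e_before = ff.CalcEnergy()
ff.Minimize(maxIts=500)
e_after = ff.CalcEnergy()
pocket_after = coords(comp, n_pocket)
print(e_before, e_after)
```

ignoreInterfragInteractions=False matters here: with the default, non-bonded
terms between the two disconnected fragments would be left out.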

PS. There is no C++ API to my knowledge, although Greg/Paolo might have
something in their undocumented wizard hats ;)


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-04-01 17:29 GMT+02:00 Steven Combs <steven.com...@gmail.com>:

> Hi,
>
> Is it possible to do a minimization in mmFF in context of two
> small-molecules or a single small molecule and sidechains from a binding
> pocket? From what I gather, I have two options:
>
> 1) Make a single RWMol object with the ligand and all residues from the
> binding pocket, then run mmFF on the combine molecule. The steps would look
> like this
>-use combineMols() for all residues
>-use mmFF on the complex
>
> 2) Use a hacked version of MCS from here:
> http://rdkit.blogspot.com/2013/12/using-allchemconstrainedembed.html
>
>
> Is there a better way of doing this? I am using the C++ api.
>
> Steven Combs
>
>
>
>


[Rdkit-discuss] Submol: bond vs atom indicies

2016-03-21 Thread Maciek Wójcikowski
Hi all,

I came across one problem with RDKit today, namely Chem.PathToSubmol()
function. Does the "path" mean atom or bond indices? On this very list I
found examples showing usage with atom idx [
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03966.html],
while the example on "Getting started in python" is feeding
Chem.FindAtomEnvironmentOfRadiusN() which gives a list of bond indices. The
documentation could be more explicit here... After my brief analysis of the
code I found out that the bonds should be used (correct me if I'm wrong).

So here comes the question: is there an equivalent function or a clever way
to do Chem.PathToSubmol() on atom indices? Currently I do: 1) get the atom
path; 2) get bonds between every atom in path (their indices); 3) get
submol with Chem.PathToSubmol()
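
The three steps can be sketched as follows. submol_from_atoms is a
hypothetical helper name, not an RDKit function. The atomMap argument of
PathToSubmol() is filled with the mapping from original atom indices to submol
indices. Note that an atom with no bond to any other selected atom is dropped,
because the submol is built from bonds.

```python
from rdkit import Chem


def submol_from_atoms(mol, atom_ids):
    """Extract the submol spanned by the bonds between the given atoms."""
    atom_ids = set(atom_ids)
    # step 2: indices of all bonds whose two ends are both in the atom set
    bond_ids = [b.GetIdx() for b in mol.GetBonds()
                if b.GetBeginAtomIdx() in atom_ids
                and b.GetEndAtomIdx() in atom_ids]
    # step 3: PathToSubmol() works on bond indices
    amap = {}
    sub = Chem.PathToSubmol(mol, bond_ids, atomMap=amap)
    return sub, amap


mol = Chem.MolFromSmiles('CCOCCN')
sub, amap = submol_from_atoms(mol, [0, 1, 2])
print(sub.GetNumAtoms(), amap)
```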

PS.
I use it to get each protein residue (amino acid) as a separate mol. It
would be much easier if we could use "Molecule -> Residues ->  Atoms"
instead of "Molecule -> Atoms -> (grouping of monomers) -> Residues".


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-21 Thread Maciek Wójcikowski
Hi Janusz,

AddHs has a parameter "onlyOnAtoms" which takes a list of indices of atoms
to include.
[http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#AddHs]


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-01-20 23:21 GMT+01:00 Janusz Petkowski <jjpet...@mit.edu>:

> Dear RDKit Community,
>
> By default H atoms are not explicit in the molecular graph and because of
> that the substructure matching is ignoring them when searching for
> substructures. It is possible to use Chem.AddHs(mol) to add explicit
> hydrogens to all atoms in the molecule and then perform substructure
> matching but is it possible, in RDkit, to add explicit hydrogens
> specifically to atoms of choice instead to all of them?
>
> So let's say if I do:
>
> m1 = Chem.MolFromSmiles('C=C')
> m1_H = Chem.AddHs(m1)
> print m1_H.GetNumAtoms()
> print Chem.MolToSmiles(m1_H)
>
> The result is:
>
> >>> 6
> >>> [H]C([H])=C([H])[H]
>
> What if I would like to add only one (1)  explicit hydrogen atom to a
> specific non-hydrogen atom (let's say m1.GetAtomWithIdx(0). In that case I
> would want to have:
>
> print m1_H.GetNumAtoms()
> print Chem.MolToSmiles(m1_H)
>
> >>> 3
> >>> [H]C=C
>
> I tried to use the following method: m1.GetAtomWithIdx(0).SetNumExplicitHs(1)
> which correctly adds an explicit H to C=C molecule but somehow I cannot
> convert it to smiles with this one additional explicit H added or
> subsequently use  for substructure matching.
>
> At the end I would like to do a substructure matching where the following
> query structures:
>
>
> [H]C=C or [H]C=CC match the following molecule:
> [H]C(=C([H])C([H])([H])[H])C([H])([H])[H]
>
> but at the same time those query structures: [H]C=C([H])[H] or
> [H]C([H])=CC do not match [H]C(=C([H])C([H])([H])[H])C([H])([H])[H]
>
> PS. Of course, the structure [H]C([H])=C([H])[H] converted from C=C using
> Chem.AddHs(mol) will not be matched onto 
> [H]C(=C([H])C([H])([H])[H])C([H])([H])[H]
> which is correct.
>
> Thank you very much for your help,
>
> Best regards,
>
> Janusz Petkowski
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-21 Thread Maciek Wójcikowski
Which RDKit version do you have?

"print rdkit.__version__"


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-01-21 21:38 GMT+01:00 Janusz Petkowski <jjpet...@mit.edu>:

> Czesc again,
>
> Many thanks for the code snippet. I thought that I use it wrongly, I
> previously tried to use it exactly like you wrote, but I always got an
> error back. I think that maybe I am missing a module? I copied your snippet
> and tried to use it and got the same error
>
> m1 = Chem.MolFromSmiles('c1ccccc1')
> m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
> print Chem.MolToSmiles(m1)
>
>
>
> The error is below:
>
> m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
> Boost.Python.ArgumentError: Python argument types in
> rdkit.Chem.rdmolops.AddHs(Mol)
> did not match C++ signature:
> AddHs(class RDKit::ROMol mol, bool explicitOnly=False, bool
> addCoords=False)
>
> It looks like RDkit does not recognize the onlyOnAtoms function?
>
> Thanks again for all your help!
>
> Janusz
>
> --
> *From:* Maciek Wójcikowski [mac...@wojcikowski.pl]
> *Sent:* Saturday, January 21, 2017 3:11 PM
>
> *To:* Janusz Petkowski
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] adding custom number of explicit H to
> specified non-hydrogen atoms
>
> Cześć,
>
> Following code will add Hs to atoms 2,3,4. These are the usual RDKit
> indices which you get from "Atom.GetIdx()".
>
>> In [5]: m1 = Chem.MolFromSmiles('c1ccccc1')
>>    ...: m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
>>    ...: Chem.MolToSmiles(m1)
>> Out[5]: '[H]c1cccc([H])c1[H]'
>
>
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2017-01-21 15:54 GMT+01:00 Janusz Petkowski <jjpet...@mit.edu>:
>
>> Czesc Maciek,
>>
>> Thanks a lot for suggesting "onlyOnAtoms" option out. It looks like it is
>> exactly what I would need. If it is not too big of a problem would it be
>> possible for you to give me a simple example how to toggle that option on?
>> I am sorry if this question seems obvious but I am not a programmer and my
>> python skills are not yet advanced.
>>
>> Best regards,
>>
>> Janusz Petkowski

Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-21 Thread Maciek Wójcikowski
Cześć,

The following code adds Hs to atoms 2, 3 and 4. These are the usual RDKit
indices that you get from "Atom.GetIdx()".

> In [5]: m1 = Chem.MolFromSmiles('c1ccccc1')
>    ...: m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
>    ...: Chem.MolToSmiles(m1)
> Out[5]: '[H]c1cccc([H])c1[H]'
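
Applied to the C=C example from the original question, a short Python 3
sketch (the variable names are just for illustration). Note that onlyOnAtoms
adds all hydrogens of the selected atom, so atom 0 gets two Hs rather than a
single one:

```python
from rdkit import Chem

m1 = Chem.MolFromSmiles('C=C')
# add explicit H atoms only on atom 0; atom 1 keeps its implicit Hs
m1_H = Chem.AddHs(m1, onlyOnAtoms=(0,))

# count the explicit H neighbours of atom 0
h_on_atom0 = sum(1 for nbr in m1_H.GetAtomWithIdx(0).GetNeighbors()
                 if nbr.GetAtomicNum() == 1)
print(m1_H.GetNumAtoms(), h_on_atom0)
```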




Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-01-21 15:54 GMT+01:00 Janusz Petkowski <jjpet...@mit.edu>:

> Czesc Maciek,
>
> Thanks a lot for suggesting "onlyOnAtoms" option out. It looks like it is
> exactly what I would need. If it is not too big of a problem would it be
> possible for you to give me a simple example how to toggle that option on?
> I am sorry if this question seems obvious but I am not a programmer and my
> python skills are not yet advanced.
>
> Best regards,
>
> Janusz Petkowski


Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-22 Thread Maciek Wójcikowski
I find installing RDKit using Conda the easiest and the most
straightforward across all platforms.
First install miniconda [http://conda.pydata.org/miniconda.html] and then
in terminal: "conda install -c rdkit rdkit"


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-01-21 23:33 GMT+01:00 Janusz Petkowski <jjpet...@mit.edu>:

> Ok,  one last question. I try to update my RDKit to the current version
> (rdkit-Release_2016_09_3) which I downloaded from here
> https://github.com/rdkit/rdkit/releases so I can use onlyOnAtoms function.
>
> My current version (2015.03.1.) installed on Win 7 machine works perfectly
> well.  I have downloaded the new one - rdkit-Release_2016_09_3 - I have set
> up environmental variables as described in Win installation guide  (and as
> I had to set them up last time to get the previous 2015.03.1 version
> working) and at the end I have an import error like that:
>
> from rdkit import Chem
>   File "C:\rdkit-Release_2016_09_3\rdkit\__init__.py", line 2, in 
> from .rdBase import rdkitVersion as __version__
> ImportError: No module named rdBase
>
> I presume that this is somehow related to missing DLLs? But I had them
> installed when I got the old version, so they should be there. When I try
> to download them from here
> http://www.microsoft.com/en-us/download/details.aspx?id= anyway, I got a
> notification that newer
> DLLs are already installed.
>
> Reverting to my previous RDkit version 2015.03.1. allows everything to
> work again.
>
> Does anybody know how to circumvent this problem?
>
> Thank you once again!
>
> Janusz
> --
> *From:* Peter Gedeck [peter.ged...@gmail.com]
> *Sent:* Saturday, January 21, 2017 3:44 PM
> *To:* Janusz Petkowski; Maciek Wójcikowski
>
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] adding custom number of explicit H to
> specified non-hydrogen atoms
>
> Looks like you have a very old version of RDkit. The additional option was
> included in RDkit 2016.03.1. Check
>
> import rdkit
> print(rdkit.__version__)
>
> Best,
>
> Peter
>
>
>

Re: [Rdkit-discuss] Angstroms Hydrogen bonding

2016-09-14 Thread Maciek Wójcikowski
Hi Guillaume,

Greg's solution is great for intra-molecular H-Bonds. If you want to
achieve inter-molecular ones then it's a bit more complicated. I did such
implementation in my package ODDT [https://github.com/oddt/oddt], which
also uses RDKit. You can find the hbond function in interactions module [
https://github.com/oddt/oddt/blob/master/oddt/interactions.py#L92].

from oddt.toolkits import rdk
from oddt.interactions import hbond
hbond(rdk.Molecule(your_rdkit_mol1),
      rdk.Molecule(your_rdkit_mol2), cutoff=3.5)


The final function will return a series of donor-acceptor pairs which fall
within cutoff, and a bool array saying if they match the angle criteria.
Note, that it looks at D-A distance and not at H-A distance.
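
For readers without ODDT, the distance part alone can be sketched with plain
RDKit. This is a rough illustration, not the ODDT implementation: the function
name and the SMARTS pattern are assumptions, any N or O counts as a possible
donor or acceptor, and no angle criterion is applied. The toy input is two
copies of methanol, the second shifted by 3 A along x.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Geometry import Point3D


def find_close_polar_pairs(mol1, mol2, pattern='[#7,#8]', cutoff=3.5):
    """Return (idx in mol1, idx in mol2, distance) for N/O pairs in cutoff."""
    query = Chem.MolFromSmarts(pattern)
    conf1, conf2 = mol1.GetConformer(), mol2.GetConformer()
    res = []
    for (i,) in mol1.GetSubstructMatches(query):
        pos_i = conf1.GetAtomPosition(i)
        for (j,) in mol2.GetSubstructMatches(query):
            d = pos_i.Distance(conf2.GetAtomPosition(j))
            if d <= cutoff:
                res.append((i, j, d))
    return res


mol1 = Chem.AddHs(Chem.MolFromSmiles('CO'))
AllChem.EmbedMolecule(mol1, randomSeed=42)

# second molecule: a copy of the first, translated by 3 A along x
mol2 = Chem.Mol(mol1)
conf2 = mol2.GetConformer()
for k in range(mol2.GetNumAtoms()):
    p = conf2.GetAtomPosition(k)
    conf2.SetAtomPosition(k, Point3D(p.x + 3.0, p.y, p.z))

pairs = find_close_polar_pairs(mol1, mol2)
print(pairs)
```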


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-09-14 12:16 GMT+02:00 Greg Landrum <greg.land...@gmail.com>:

> Hi Guillaume,
>
> On Tue, Sep 13, 2016 at 10:12 PM, Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
>> 1 Does 3D coordinates of a conformer is in Angstroms ?
>>
> If you read the conformer from a file, for example a mol file, then the
> 3D coordinates are in whatever units were used in that file. This is
> usually Angstroms.
> If you generate the conformation using one of the RDKit embedding
> functions, then they are certainly in Angstroms.
>
>> 2 How to enumerate all HBonding to determine the bond length ?
>>
>
> Interesting question. Here's a python function that might be a starting
> point:
>
> def findHBonds(m, confId=-1, possiblePartners='[#8,#7]',
>                possibleHs='[#1][#8,#7]', distThresh=2.5):
>     conf = m.GetConformer(confId)
>     partners = [x[0] for x in
>                 m.GetSubstructMatches(Chem.MolFromSmarts(possiblePartners))]
>     hs = [x[0] for x in
>           m.GetSubstructMatches(Chem.MolFromSmarts(possibleHs))]
>     res = []
>     for h in hs:
>         ph = conf.GetAtomPosition(h)
>         for partner in partners:
>             if m.GetBondBetweenAtoms(h, partner) is not None:
>                 continue
>             d = conf.GetAtomPosition(partner).Distance(ph)
>             if d <= distThresh:
>                 res.append((h, partner, d))
>     return res
>
>
> In order to allow flexibility about what an H bond is, I left the
> definitions of the acceptors (partners in the above code) and polar Hs
> (just Hs in the above code) as SMARTS definitions so that they can be
> customized.
>
>
>
> 


Re: [Rdkit-discuss] MolFromMolBlock does not read properties

2016-10-03 Thread Maciek Wójcikowski
I could only imagine the errors when using threading/multiprocessing +
reusing SDMolSupplier object... So if I understand correctly the official
line of RDKit is: "a multimol file => supplier => file(-like) objects".


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-10-03 14:53 GMT+02:00 Brian Kelley <fustiga...@gmail.com>:

> I'll admit that using StringIO here feels more pythonic, although SetData
> can be reused without reconstructing the class.
>
> I suppose I would prefer having something like
>
> MolToSDDataBlock
>
> Which can be used in conjunction with MolToMolBlock.  I have often found
> that many times data changes without molecule change so perhaps both could
> be useful.
>
> 
> Brian Kelley
>
> > On Oct 3, 2016, at 7:08 AM, Andrew Dalke <da...@dalkescientific.com>
> wrote:
> >
> >> On Oct 2, 2016, at 10:48 PM, Maciek Wójcikowski wrote:
> >> Yes I get it, but obviously there is no MolFromSDBlock, so one would
> suspect MolFromMolBlock to support both formats. As I understand correctly
> the only way of reading SD from variable is as presented in my example? Or
> is there some marvelous undocumented API? ;)
> >
> > Six years ago, Greg Landrum at
> > http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01436.html
> > suggested:
> >
> >   nsuppl = Chem.SDMolSupplier()
> >   nsuppl.SetData(mb)
> >   mol = nsuppl.next()
> >
> > This is simpler than passing in a StringIO().
> >
> > I knew about this posting because my own code has MolFromSDBlock()
> wrapper layer, and a comment pointing to that URL as explanation.
> >
> >
> >> On Oct 2, 2016, at 11:06 PM, Brian Kelley wrote:
> >> The general idea, I believe, is that if the format can result in
> multiple molecules a supplier should be used.
> >
> > I wrote the function to make it easier to deal with web service input or
> database records where the text contains one and only one record in SD
> format. This is a proper subset of the SD format, which contains 0 or more
> records.
> >
> > If there are no records then my function returns a None, so I don't need
> to deal with a StopIteration. I don't care if there is more than one
> record, so I ignore anything past the first record.
> >
> > The use case occurs pretty frequently in my work, so I figured a
> MolFromSDBlock() for my own use was worthwhile.
> >
> > Cheers,
> >
> >Andrew
> >da...@dalkescientific.com
> >
> >
> > 


Re: [Rdkit-discuss] MolFromMolBlock does not read properties

2016-10-02 Thread Maciek Wójcikowski
Yes, I get it, but since there is no MolFromSDBlock, one would expect
MolFromMolBlock to support both formats. If I understand correctly, the only
way of reading SD from a variable is the one presented in my example? Or
is there some marvelous undocumented API? ;)
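
A sketch of such a wrapper, built on SDMolSupplier.SetData() as suggested
elsewhere in this thread. mol_from_sd_block is a hypothetical name, not an
RDKit function. It returns the first record only and None for empty input:

```python
from rdkit import Chem


def mol_from_sd_block(sd_block):
    """Read the first record of an SD-formatted string, with its properties."""
    if not sd_block.strip():
        return None
    suppl = Chem.SDMolSupplier()
    suppl.SetData(sd_block)
    return suppl[0] if len(suppl) else None


# build a one-record SD block by hand: mol block + data item + record end
mol = Chem.MolFromSmiles('CCO')
sd_block = Chem.MolToMolBlock(mol) + '>  <aaa>\n123\n\n$$$$\n'

mol2 = mol_from_sd_block(sd_block)
print(mol2.GetProp('aaa'))
```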


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-10-02 22:20 GMT+02:00 Brian Kelley <fustiga...@gmail.com>:

> It's neither a bug nor a feature in this case, simply the specification of
> the mdl format.
>
> The SD in an sd file stands for "structured data" which are the properties
> you are looking for plus the mol block.
>
> A decent write up is here:
>
> https://en.m.wikipedia.org/wiki/Chemical_table_file
>
> If you see the dollar signs in your text block, it is indeed an sd record
> not just a mol block.
>
> ----
> Brian Kelley
>
> On Oct 2, 2016, at 3:46 PM, Maciek Wójcikowski <mac...@wojcikowski.pl>
> wrote:
>
> Hi RDKitters,
>
> Is it a bug or a feature? When using Chem.MolFromMolBlock, no properties
> are read from the SD file. Here is a bit of code to replicate the issue:
>
> from rdkit import Chem
>> tmp = """20346
>>  RDKit  3D
>>  36 38  0  0  0  0  0  0  0  0999 V2000
>>15.8390   -9.3370   68.8840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>17.1400   -9.1830   69.5480 N   0  0  0  0  0  0  0  0  0  0  0  0
>>17.4030   -7.7570   69.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>17.0930   -7.4160   71.2420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.2300   -6.5720   71.8210 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.6770   -7.1570   73.0920 N   0  0  0  0  0  0  0  0  0  0  0  0
>>20.1430   -7.2290   73.1530 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.5650   -8.5770   73.7380 C   0  0  0  0  0  0  0  0  0  0  0  0
>>21.6390   -9.1530   72.9180 N   0  0  0  0  0  0  0  0  0  0  0  0
>>21.3560  -10.5710   72.6640 C   0  0  0  0  0  0  0  0  0  0  0  0
>>21.5940  -10.8820   71.1850 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.4320  -11.7190   70.6460 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.0430  -11.2190   69.3210 N   0  0  0  0  0  0  0  0  0  0  0  0
>>18.5820  -11.1310   69.1980 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.1950   -9.7400   68.6920 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.7370   -9.2360   69.9070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.1700   -7.9140   71.1420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.1800   -8.0060   70.2040 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.2840  -10.3730   70.5500 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.2740  -10.2810   71.4890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>12.7160   -9.0510   71.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>11.6140   -8.9500   72.8070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>11.8810   -7.8160   73.7040 N   0  0  0  0  0  0  0  0  0  0  0  0
>>12.9350   -8.1480   74.6710 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.2790   -7.6200   74.1620 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.9870   -6.8610   75.2860 C   0  0  0  0  0  0  0  0  0  0  0  0
>>15.4440   -5.5580   74.7840 N   0  0  0  0  0  0  0  0  0  0  0  0
>>15.1190   -4.5130   75.7650 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.5870   -3.2770   75.0370 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.3710   -2.7990   75.7080 N   0  0  0  0  0  0  0  0  0  0  0  0
>>12.3120   -2.5090   74.7330 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.9900   -3.1010   75.2260 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.3040   -3.8450   74.0790 C   0  0  0  0  0  0  0  0  0  0  0  0
>> 9.7590   -5.1150   74.5740 N   0  0  0  0  0  0  0  0  0  0  0  0
>>10.0350   -6.2090   73.6340 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.6540   -7.3880   74.3890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>   2  1  1  0
>>   3  2  1  0
>>   4  3  1  0
>>   5  4  1  0
>>   6  5  1  0
>>   7  6  1  0
>>   8  7  1  0
>>   9  8  1  0
>>  10  9  1  0
>>  11 10  1  0
>>  12 11  1  0
>>  13 12  1  0
>>  14 13  1  0
>>  15 14  1  0
>>  15  2  1  0
>>  16  1  1  0
>>  18 16  1  0
>>  18 17  2  0
>>  19 16  2  0
>>  20 19  1  0
>>  21 20  2  0
>>  21 17  1  0
>>  22 21  1  0
>>  23 22  1  0
>>  24 23  1  0
>>  25 24  1  0
>>  26 25  1  0
>>  27 26  1  0
>>  28 27  1  0
>>  29 28  1  0
>>  30 29  1  0
>>  31 30  1  0
>>  32 31  1  0
>>  33 32  1  0
>>  34 33  1  0
>>  35 34  1  0
>>  36 35  1 

Re: [Rdkit-discuss] MolFromMolBlock does not read properties

2016-10-02 Thread Maciek Wójcikowski
I've noticed that GMail might have messed up the spaces in the text, so I
included the example as an attachment.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-10-02 21:46 GMT+02:00 Maciek Wójcikowski <mac...@wojcikowski.pl>:

> Hi RDKitters,
>
> Is it a bug or a feature? When using Chem.MolFromMolBlock, the properties
> from the SD file are not read. Here is a bit of code to reproduce the issue:
>
> from rdkit import Chem
>> tmp = """20346
>>  RDKit  3D
>>  36 38  0  0  0  0  0  0  0  0999 V2000
>>15.8390   -9.3370   68.8840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>17.1400   -9.1830   69.5480 N   0  0  0  0  0  0  0  0  0  0  0  0
>>17.4030   -7.7570   69.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>17.0930   -7.4160   71.2420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.2300   -6.5720   71.8210 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.6770   -7.1570   73.0920 N   0  0  0  0  0  0  0  0  0  0  0  0
>>20.1430   -7.2290   73.1530 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.5650   -8.5770   73.7380 C   0  0  0  0  0  0  0  0  0  0  0  0
>>21.6390   -9.1530   72.9180 N   0  0  0  0  0  0  0  0  0  0  0  0
>>21.3560  -10.5710   72.6640 C   0  0  0  0  0  0  0  0  0  0  0  0
>>21.5940  -10.8820   71.1850 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.4320  -11.7190   70.6460 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.0430  -11.2190   69.3210 N   0  0  0  0  0  0  0  0  0  0  0  0
>>18.5820  -11.1310   69.1980 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.1950   -9.7400   68.6920 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.7370   -9.2360   69.9070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.1700   -7.9140   71.1420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.1800   -8.0060   70.2040 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.2840  -10.3730   70.5500 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.2740  -10.2810   71.4890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>12.7160   -9.0510   71.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>11.6140   -8.9500   72.8070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>11.8810   -7.8160   73.7040 N   0  0  0  0  0  0  0  0  0  0  0  0
>>12.9350   -8.1480   74.6710 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.2790   -7.6200   74.1620 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.9870   -6.8610   75.2860 C   0  0  0  0  0  0  0  0  0  0  0  0
>>15.4440   -5.5580   74.7840 N   0  0  0  0  0  0  0  0  0  0  0  0
>>15.1190   -4.5130   75.7650 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.5870   -3.2770   75.0370 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.3710   -2.7990   75.7080 N   0  0  0  0  0  0  0  0  0  0  0  0
>>12.3120   -2.5090   74.7330 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.9900   -3.1010   75.2260 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.3040   -3.8450   74.0790 C   0  0  0  0  0  0  0  0  0  0  0  0
>> 9.7590   -5.1150   74.5740 N   0  0  0  0  0  0  0  0  0  0  0  0
>>10.0350   -6.2090   73.6340 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.6540   -7.3880   74.3890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>   2  1  1  0
>>   3  2  1  0
>>   4  3  1  0
>>   5  4  1  0
>>   6  5  1  0
>>   7  6  1  0
>>   8  7  1  0
>>   9  8  1  0
>>  10  9  1  0
>>  11 10  1  0
>>  12 11  1  0
>>  13 12  1  0
>>  14 13  1  0
>>  15 14  1  0
>>  15  2  1  0
>>  16  1  1  0
>>  18 16  1  0
>>  18 17  2  0
>>  19 16  2  0
>>  20 19  1  0
>>  21 20  2  0
>>  21 17  1  0
>>  22 21  1  0
>>  23 22  1  0
>>  24 23  1  0
>>  25 24  1  0
>>  26 25  1  0
>>  27 26  1  0
>>  28 27  1  0
>>  29 28  1  0
>>  30 29  1  0
>>  31 30  1  0
>>  32 31  1  0
>>  33 32  1  0
>>  34 33  1  0
>>  35 34  1  0
>>  36 35  1  0
>>  36 23  1  0
>> M  END
>> >(1)
>> 0.81
>> >(1)
>> =
>> >(1)
>> IC50
>> >(1)
>> CHEMBL18442
>> 
>> """
>> m = Chem.MolFromMolBlock(tmp)
>> print m.GetPropsAsDict()
>> from StringIO import StringIO
>> m = Chem.ForwardSDMolSupplier(StringIO(tmp)).next()
>> print m.GetPropsAsDict()
>
>
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>


rdkit_molprops.py
Description: application/chimera


Re: [Rdkit-discuss] MolFromMolBlock does not read properties

2016-10-03 Thread Maciek Wójcikowski
Thank you Andrew! Indeed it's working and it's a tiny bit faster too.

Best,
M


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-10-03 13:08 GMT+02:00 Andrew Dalke <da...@dalkescientific.com>:

> On Oct 2, 2016, at 10:48 PM, Maciek Wójcikowski wrote:
> > Yes I get it, but obviously there is no MolFromSDBlock, so one would
> suspect MolFromMolBlock to support both formats. As I understand correctly
> the only way of reading SD from variable is as presented in my example? Or
> is there some marvelous undocumented API? ;)
>
> Six years ago, Greg Landrum at
> http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01436.html
> suggested:
>
>     nsuppl = Chem.SDMolSupplier()
>     nsuppl.SetData(mb)
>     mol = nsuppl.next()
>
> This is simpler than passing in a StringIO().
>
> I knew about this posting because my own code has a MolFromSDBlock() wrapper
> layer, and a comment pointing to that URL as explanation.
>
>
> On Oct 2, 2016, at 11:06 PM, Brian Kelley wrote:
> > The general idea, I believe, is that if the format can result in
> multiple molecules a supplier should be used.
>
> I wrote the function to make it easier to deal with web service input or
> database records where the text contains one and only one record in SD
> format. This is a proper subset of the SD format, which contains 0 or more
> records.
>
> If there are no records then my function returns a None, so I don't need
> to deal with a StopIteration. I don't care if there is more than one
> record, so I ignore anything past the first record.
>
> The use case occurs pretty frequently in my work, so I figured a
> MolFromSDBlock() for my own use was worthwhile.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
>
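Andrew's MolFromSDBlock() wrapper itself is not shown in the thread. A minimal
sketch of what such a helper could look like, built on the SetData() call
quoted above, follows. The name mol_from_sd_block and its keyword defaults are
made up for illustration; returning None when there is no record follows
Andrew's description, not any RDKit API.

```python
from rdkit import Chem


def mol_from_sd_block(sd_block, sanitize=True, removeHs=True):
    """Return the first molecule of an SD-format string, or None.

    Hypothetical helper, not part of RDKit. The data fields of the first
    record end up as properties on the returned molecule.
    """
    suppl = Chem.SDMolSupplier()
    suppl.SetData(sd_block, sanitize=sanitize, removeHs=removeHs)
    # Iterating avoids handling StopIteration by hand when there are no
    # records; anything after the first record is ignored.
    for mol in suppl:
        # mol can itself be None if the first record fails to parse
        return mol
    return None
```

With the example from the start of this thread, mol_from_sd_block(tmp) would
stand in for the ForwardSDMolSupplier(StringIO(tmp)) call, and
GetPropsAsDict() on the result should then include the SD data fields.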


Re: [Rdkit-discuss] Substructure by atom indices

2016-11-01 Thread Maciek Wójcikowski
Hi,

There is PathToSubmol(), although it takes a list of bond indices. If you have
atom indices:

from itertools import combinations
from rdkit import Chem

bonds = []
atommap = {}

# collect every bond whose two ends are both in atom_path
for i, j in combinations(atom_path, 2):
    b = ParentMol.GetBondBetweenAtoms(i, j)
    if b:
        bonds.append(b.GetIdx())

NewMol = Chem.PathToSubmol(ParentMol, bonds, atomMap=atommap)

atommap is a dictionary populated with the atom index mapping from ParentMol
to the new molecule.



Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-11-01 11:00 GMT+01:00 Juuso Lehtivarjo <juuso.lehtiva...@gmail.com>:

> Hi All,
>
> Is there a python function (or any simple way whatsoever) to create a
> substructure mol object from another one based on the given atom
> indices? In C++ this could apparently be done with
> getMolFragsWithQuery, but that does not seem to be much used in python
> wrappers...
>
> Best,
>Juuso
>


Re: [Rdkit-discuss] Writing a Tripos MOL2 file with charges

2016-10-31 Thread Maciek Wójcikowski
Hi,

If you really desperately need it, there is a mockup of MolToMol2Block()
and MolToMol2File() by Jan and myself [see
https://github.com/rdkit/rdkit/pull/415], but it's still rough around the
edges at best.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-10-31 18:11 GMT+01:00 James Johnson <totalboron...@gmail.com>:

> Is there any supported format that outputs partial charges?
>
> The speed of RDKit is phenomenal (0.02 seconds) vs obabel's 2 seconds, but
> if I cannot output partial charges I'll be forced to use obabel.
>
> Thank you for your time.
>
> -James
>
> On Mon, Oct 31, 2016 at 1:00 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi James,
>>
>> Due to problems with the general ambiguity of the format, the RDKit does
>> not have a mol2 writer.
>>
>> -greg
>>
>>
>>
>>
>>
>> On Mon, Oct 31, 2016 at 12:22 AM +0100, "James Johnson" <
>> totalboron...@gmail.com> wrote:
>>
>> Hello all,
>>>
>>> I've been trying to output my 3D mol object that has Gasteiger charges
>>> to mol2 file format. How would I go about that? I've only found it for mol
>>> and pdb.
>>>
>>> Here is the code I've been using, if that helps:
>>> ~~~
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>>
>>> smile = 'Cc1c1'
>>>
>>> uncharged_mol_1D = Chem.MolFromSmiles(smile)
>>>
>>> uncharged_mol_3D = Chem.AddHs(uncharged_mol_1D)
>>> AllChem.EmbedMolecule(uncharged_mol_3D)
>>> AllChem.UFFOptimizeMolecule(uncharged_mol_3D)
>>>
>>> charged_mol_3D = uncharged_mol_3D
>>> AllChem.ComputeGasteigerCharges(charged_mol_3D)
>>>
>>> fout = Chem.SDWriter('./charged_test.mol')
>>> fout.write(charged_mol_3D)
>>> fout.close()
>>> ~~~
>>>
>>> Thank you!
>>>
>>
>


Re: [Rdkit-discuss] HasSubstructMatch return False where it shouldn't

2016-11-01 Thread Maciek Wójcikowski
Hi Michał,

Have you tried using AdjustQueryProperties()? I think Greg mentioned it in
his presentation at the UGM.

http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#AdjustQueryProperties


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-11-01 19:20 GMT+01:00 Michał Nowotka <mmm...@gmail.com>:

> Hi,
>
> I have this molfile (CHEMBL265667):
>
>
>   11280714432D 1   1.0 0.0 0
>
>  25 27  0 0  0999 V2000
> 3.8042   -1.60000. C   0  0  0  0  0  0   0  0  0
> 4.3167   -1.90000. N   0  0  3  0  0  0   0  0  0
> 3.8042   -1.0. N   0  0  0  0  0  0   0  0  0
> 4.8417   -1.60000. N   0  0  0  0  0  0   0  0  0
> 4.3167   -2.50000. C   0  0  0  0  0  0   0  0  0
> 4.3167   -3.69170. C   0  0  0  0  0  0   0  0  0
> 4.8417   -1.0. C   0  0  0  0  0  0   0  0  0
> 4.3167   -0.70000. C   0  0  0  0  0  0   0  0  0
> 3.7917   -3.39170. C   0  0  0  0  0  0   0  0  0
> 4.8375   -3.39170. C   0  0  0  0  0  0   0  0  0
> 3.8000   -2.79170. C   0  0  0  0  0  0   0  0  0
> 4.8375   -2.79170. C   0  0  0  0  0  0   0  0  0
> 4.3167   -4.29170. C   0  0  3  0  0  0   0  0  0
> 3.2875   -1.89170. O   0  0  0  0  0  0   0  0  0
> 4.8375   -4.59170. C   0  0  0  0  0  0   0  0  0
> 4.3167   -0.09170. O   0  0  0  0  0  0   0  0  0
> 4.8292   -5.19170. C   0  0  0  0  0  0   0  0  0
> 5.3500   -4.29170. C   0  0  0  0  0  0   0  0  0
> 5.8667   -5.19170. C   0  0  0  0  0  0   0  0  0
> 3.7917   -4.59170. O   0  0  0  0  0  0   0  0  0
> 5.8667   -4.59170. C   0  0  0  0  0  0   0  0  0
> 5.3542   -5.49170. C   0  0  0  0  0  0   0  0  0
> 6.3917   -5.49170. Cl  0  0  0  0  0  0   0  0  0
> 3.2750   -3.69170. C   0  0  0  0  0  0   0  0  0
> 5.3542   -3.69170. C   0  0  0  0  0  0   0  0  0
>   2  1  1  0 0  0
>   3  1  1  0 0  0
>   4  2  1  0 0  0
>   5  2  1  0 0  0
>   6 10  1  0 0  0
>   7  8  1  0 0  0
>   8  3  1  0 0  0
>   9 11  1  0 0  0
>  10 12  2  0 0  0
>  11  5  2  0 0  0
>  12  5  1  0 0  0
>  13  6  1  0 0  0
>  14  1  2  0 0  0
>  15 13  1  0 0  0
>  16  8  2  0 0  0
>  17 15  2  0 0  0
>  18 15  1  0 0  0
>  19 21  1  0 0  0
>  20 13  1  0 0  0
>  21 18  2  0 0  0
>  22 17  1  0 0  0
>  23 19  1  0 0  0
>  24  9  1  0 0  0
>  25 10  1  0 0  0
>   4  7  2  0 0  0
>   9  6  2  0 0  0
>  22 19  2  0 0  0
> M  END
>
> and this smarts: [OH1]-C(-c1c1)c2c2
>
> I'm using this code to find a substructure:
>
> mol = Chem.MolFromMolBlock(str(molstring), sanitize=False)
> mol.UpdatePropertyCache(strict=False)
> patt = Chem.MolFromSmarts(str(smarts))
> Chem.GetSSSR(patt)
> Chem.GetSSSR(mol)
> match = mol.HasSubstructMatch(patt)
>
> and `match` is False.
>
> But with indigo code:
>
> mol = indigoObj.loadMolecule(str(molstring))
> patt = indigoObj.loadSmarts(str(smarts))
> match = indigoObj.substructureMatcher(mol).match(patt)
>
> match is valid and I can render it to an image.
>
> Am I missing some flag or doing something wrong?
>
> --
>
> Michal
>


Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Maciek Wójcikowski
Hi Alexis,

You may want to use a regex to filter out strings containing characters that
are not valid in SMILES (i.e. only a small subset of atoms may be written
without brackets). See the "Atoms" section:
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

The excluded set might grow pretty quickly and become inefficient, so I'd
simply parse every string that passes the above filter. There will still be
some false positives like "CC", which may occur in text (emails especially).
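A rough sketch of such a pre-filter, in case it helps. The token list is my own
reading of the SMILES grammar (organic-subset atoms outside brackets, anything
inside brackets), and the function names are made up for illustration; it is
meant only to cut down the number of words handed to Chem.MolFromSmiles(), not
to validate SMILES.

```python
import re

# Tokens that may appear in a SMILES string (deliberately permissive).
_SMILES_TOKEN = (
    r"\[[^\[\]\s]+\]"      # bracket atom, e.g. [Na+], [C@@H]
    r"|Br|Cl|[BCNOPSFI]"   # aliphatic atoms allowed without brackets
    r"|[bcnops]"           # aromatic atoms allowed without brackets
    r"|[-=#$:/\\.]"        # bonds and the dot disconnection
    r"|[()]"               # branches
    r"|%[0-9]{2}|[0-9]"    # ring closures
)
_SMILES_LIKE = re.compile("(?:" + _SMILES_TOKEN + ")+")


def looks_like_smiles(word):
    """True if the word consists only of SMILES-like tokens."""
    return _SMILES_LIKE.fullmatch(word) is not None


def candidate_smiles(text):
    """Return the words of a text that pass the character filter."""
    candidates = []
    for word in text.split():
        # drop sentence punctuation stuck to the word, but keep parentheses
        word = word.strip(".,;:!?\"'")
        if looks_like_smiles(word):
            candidates.append(word)
    return candidates
```

Only the returned candidates would then go through Chem.MolFromSmiles(). Short
words such as "CC" or "I" still pass the filter, so the parse step remains
necessary.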


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-12-02 10:11 GMT+01:00 Alexis Parenty <alexis.parenty.h...@gmail.com>:

> Dear all,
>
>
> I am looking for a way to extract SMILES scattered in many text documents
> (thousands documents of several pages each).
>
> At the moment, I am thinking to scan each words from the text and try to
> make a mol object from them using Chem.MolFromSmiles() then store the words
> if they return a mol object that is not None.
>
> Can anyone think of a better/quicker way?
>
>
> Would it be worth storing in a tuple any word that do not return a mol
> object from Chem.MolFromSmiles() and exclude them from subsequent search?
>
>
> Something along those lines
>
>
> excluded_set = set()
> smiles_list = []
>
> for each_word in text:  # text: an iterable of words
>     if each_word not in excluded_set:
>         each_word_mol = Chem.MolFromSmiles(each_word)
>         if each_word_mol is not None:
>             smiles_list.append(each_word)
>         else:
>             # remember the word itself, not the (None) mol object
>             excluded_set.add(each_word)
>
>
> Would not searching into that growing tuple take actually more time than
> trying to blindly make a mol object for every word?
>
>
>
> Any suggestion?
>
>
> Many thanks and regards,
>
>
> Alexis
>


Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Maciek Wójcikowski
Hi Jean-Marc and others,

There is also CanonicalRankAtoms [
http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms]
which seems to be forgotten.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-12-18 23:14 GMT+01:00 Jean-Marc Nuzillard <jm.nuzill...@univ-reims.fr>:

> Thank you Andrew, Brian and David for your answers.
>
> mol.GetProp("_smilesAtomOutputOrder") does the job.
> I also expected a.GetProp("molAtomMapNumber") could do it for each atom a.
>
> All the best,
>
> Jean-Marc
>
> Le 18/12/2016 à 19:04, Andrew Dalke a écrit :
> > On Dec 18, 2016, at 6:32 PM, Brian Kelley wrote:
> >>>>> m.GetProp("_smilesAtomOutputOrder")
> >> '[3,2,1,0,]'
> >>
> >> Note that this returns the list as a string which is sub-optimal.
> GetPropsAsDict will convert these to proper python objects, however, this
> is considered a private member so you need to return these as well:
> >>
> >>>>> list(m.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"])
> >> [3, 2, 1, 0]
> > For fun, here are a few timing numbers:
> >
> > # Common setup
> > from rdkit import Chem
> > mol = Chem.MolFromSmiles("c1c1Oc1c1")
> > Chem.MolToSmiles(mol)
> > import json
> > import ujson # third-party JSON decoder
> > import re
> > integer_pat = re.compile("[0-9]+")
> >
> >
> > # Get the string (give a lower bound)
> > mol.GetProp("_smilesAtomOutputOrder")
> > 1 loops, best of 3: 31.3 usec per loop
> >
> >
> > Here are variations for how to get that information as a list of
> integers:
> >
> > # Using Python's "eval()" to decode the list (this is generally UNSAFE!)
> > eval(mol.GetProp("_smilesAtomOutputOrder"))
> > 1 loops, best of 3: 157 usec per loop
> >
> > # Use the built-in json module (need to remove the terminal ",")
> > json.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
> > 1 loops, best of 3: 66.5 usec per loop
> >
> > # Use the third-party "ujson" package, which is faster than json.
> > ujson.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
> > 1 loops, best of 3: 41.2 usec per loop
> >
> > ("cjson" takes 49.7 usec per loop)
> >
> > # Use the properties dictionary
> > mol.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"]
> > 1000 loops, best of 3: 462 usec per loop
> >
> > # Parse it more directly
> > map(int, integer_pat.findall(mol.GetProp("_smilesAtomOutputOrder")))
> > 1 loops, best of 3: 89 usec per loop
> >
> >
> >   Andrew
> >   da...@dalkescientific.com
> >
> >
> >
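The regex variant from the timings above can be wrapped up as a small helper.
This is only a sketch: the function names are invented here, and it assumes the
property value has the '[3,2,1,0,]' form shown earlier in the thread, i.e. entry
k is the index of the atom written at position k of the SMILES.

```python
import re

_INTEGER = re.compile(r"[0-9]+")


def parse_atom_output_order(prop_value):
    """Turn a string such as '[3,2,1,0,]' into a list of ints."""
    return [int(token) for token in _INTEGER.findall(prop_value)]


def smiles_position_of_atoms(order):
    """Invert the output order: result[atom_idx] is the position of that
    atom in the written SMILES."""
    position = [0] * len(order)
    for k, atom_idx in enumerate(order):
        position[atom_idx] = k
    return position
```

It would be called as
parse_atom_output_order(mol.GetProp("_smilesAtomOutputOrder")) after
Chem.MolToSmiles(mol) has been run on the molecule.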
>
>
> --
> Jean-Marc Nuzillard
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/ICMR
>
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
>
>


Re: [Rdkit-discuss] Install rdkit with anaconda3

2017-04-12 Thread Maciek Wójcikowski
Hi Francois,

There are no Python 3.6 packages of rdkit right now.

I guess we can ask Greg or Riccardo to build them with the next release of
RDKit.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-04-12 10:27 GMT+02:00 François-Régis Chalaoux <chalaou...@gmail.com>:

> Trying to install rdkit with anaconda on linux I got this error:
>
>
>
> UnsatisfiableError:
> The following specifications were found to be in conflict:
> - python 3.6*
> - rdkit -> boost ==1.56.0 -> python 2.7* -> openssl 1.0.1*
> Use "conda info " to see the dependencies for each package.
>
>
> What can I do ?
>
>
> FR.
>
>


Re: [Rdkit-discuss] [Rdkit-devel] 2017.03 (Q1 2017) RDKit Release

2017-04-21 Thread Maciek Wójcikowski
Hi Greg,

Just FYI: rdkit for Python 3.6 requires boost 1.56, which has no Python 3.6
version in your repo. I only tested the Linux packages, but it should be the
same for other platforms.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-04-21 6:36 GMT+02:00 Greg Landrum <greg.land...@gmail.com>:

> I'm pleased to announce that the next version of the RDKit -- 2017.03
> (a.k.a. Q1 2017) -- is released. The release notes are below.
>
> The release files are on the github release page:
> https://github.com/rdkit/rdkit/releases/tag/Release_2017_03_1
>
> We are in the process of updating the conda build scripts to reflect the
> new version and uploading the binaries to anaconda.org
> (https://anaconda.org/rdkit).
> The plan for conda binaries for this release is:
> Linux 64bit: python 2.7, 3.5, 3.6
> Mac OS 64bit: python 2.7, 3.5, 3.6
> Windows 64bit: python 2.7, 3.5, 3.6
> Windows 32bit: python 2.7
>
> Some things that will be finished over the next couple of days:
> - The conda build scripts will be updated to reflect the new version and
>   new conda builds will be available in the RDKit channel at anaconda.org
>   (https://anaconda.org/rdkit).
> - The homebrew script
> - The online version of the documentation at rdkit.org
>
> Thanks to everyone who submitted bug reports and suggestions for this
> release!
>
> Please let me know if you find any problems with the release or have
> suggestions for the next one, which is scheduled for September 2017.
>
> Best Regards,
> -greg
>
> # Release_2017.03.1
> (Changes relative to Release_2016.09.1)
>
> ## Important
> - The fix for bug #879 changes the definition of the layered fingerprint.
>   This means that all database columns using layered fingerprints as well as
>   all substructure search indices should be rebuilt.
> - All C++ library names now start with RDKit (see #1349).
>
> ## Acknowledgements:
> Brian Cole, David Cosgrove, JW Feng, Berend Huisman, Peter Gedeck, 'i-tub',
> Jan Holst Jensen, Brian Kelley, Rich Lewis, Brian Mack, Eloy Felix Manzanares,
> Stephen Roughley, Roger Sayle, Nadine Schneider, Gregor Simm, Matt Swain,
> Paolo Tosco, Riccardo Vianello, Hsiao Yi
>
> ## Highlights:
>   - It's now possible (though not the default) to pickle molecule properties
>     with the molecule
>   - There's a new, and still in development, "Getting started in C++" document.
>   - A lot of the Python code has been cleaned up
>
> ## New Features and Enhancements:
>   - Add removeHs option to MolFromSmiles()
>  (github issue #554 from greglandrum)
>   - support a fixed bond length in the MolDraw2D code
>  (github issue #565 from greglandrum)
>   - Pattern fingerprint should set bits for single-atom fragments.
>  (github issue #879 from greglandrum)
>   - Reviewed unit tests of rdkit.ML - coverage now 63.1%
>  (github pull #1148 from gedeck)
>   - Reviewed unit tests of rdkit.VLib - coverage now 67.1%
>  (github pull #1149 from gedeck)
>   - Removes exponential numBonds behavior
>  (github pull #1154 from bp-kelley)
>   - Exposes normalize option to GetFlattenedFunctionalGroupHierarchy
>  (github pull #1165 from bp-kelley)
>   - Expose RWMol.ReplaceBond to Python
>  (github pull #1174 from coleb)
>   - Review of rdkit.Chem.Fraggle code
>  (github pull #1184 from gedeck)
>   - Add support for dative bonds.
>  (github pull #1190 from janholstjensen)
>   - Python 3 compatibility (issue #398)
>  (github pull #1192 from gedeck)
>   - 1194: Review assignments of range in Python code
>  (github pull #1195 from gedeck)
>   - Moved GenerateDepictionMatching[23]DStructure from Allchem.py to C++
>  (github pull #1197 from DavidACosgrove)
>   - Review rdkit.Chem.pharm#D modules
>  (github pull #1201 from gedeck)
>   - Find potential stereo bonds should return any
>  (github pull #1202 from coleb)
>   - Gedeck coverage sim div filters
>  (github pull #1208 from gedeck)
>   - Gedeck review unit test inchi
>  (github pull #1209 from gedeck)
>   - Coverage rdkit.Dbase
>  (github pull #1210 from gedeck)
>   - Coverage rdkit.DataStructs
>  (github pull #1211 from gedeck)
>   - UnitTestPandas works on Python3
>  (github pull #1213 from gedeck)
>   - Cleanup and improvement to test coverage of PandasTools
>  (github pull #1215 from gedeck)
>   - Cleanup of rdkit.Chem.Fingerprints
>  (github pull #1217 from gedeck)
>   - Optimization of UFF and MMFF forcefields
>  (github pull #1218 from ptosco)
>   - Support for ChemAxon Extended SMILES/SMARTS
>  (github issue #1226 from greglandrum)
>   - Improved test coverage for rdkit.Chem.Fingerprints
>  (g

Re: [Rdkit-discuss] Mapping of atom numbering, naming and coordinates for different conformers

2017-07-18 Thread Maciek Wójcikowski
Hi Max,

Have you tried getting the atom map simply by matching those molecules?

mol2.GetSubstructMatch(mol1)


If your molecules don't match this way, you can seek inspiration in the
AssignBondsFromTemplate function here:
https://github.com/rdkit/rdkit/blob/83d62a71f28b96b29458bcda225374d7f07f9c82/rdkit/Chem/AllChem.py#L370

Then you can use Chem.RenumberAtoms to set the new order.
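Put together, the matching and renumbering steps could look roughly like this.
It is a sketch only: renumber_to_match is a made-up name, and it assumes both
molecules have the same atoms and bond orders so that a plain substructure
match covers every atom.

```python
from rdkit import Chem


def renumber_to_match(ref_mol, other_mol):
    """Return a copy of other_mol with its atoms in ref_mol's order,
    or None if the two molecules do not match atom for atom."""
    # match[i] is the index in other_mol of the atom matching ref_mol atom i
    match = other_mol.GetSubstructMatch(ref_mol)
    if len(match) != other_mol.GetNumAtoms():
        return None
    # RenumberAtoms takes, for each new position, the old atom index
    return Chem.RenumberAtoms(other_mol, list(match))
```

The renumbered molecule keeps its conformers, so its coordinates should then
line up with the reference atom order before anything is passed to
AddConformer().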



Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-07-18 14:31 GMT+02:00 Max Pillong <max.pill...@gmx.net>:

> Hi everyone,
>
> I am stuck on the following issue: I have two files of different
> conformers for the same molecule. One is an NMR generated reference PDB,
> the other one is an sdf with conformers generated in RDKit. I would now
> like to merge the two files into one PDB preserving the initial atom
> numbering and naming from the reference file. If I simply add the generated
> conformers to the reference molecule using AddConformer() everything seems
> fine in the beginning (even when looking at the generated PDB files, the
> atom numbering/naming seems to be in order), however it does not update the
> coordinates accordingly, resulting in wrong atom typings and connection
> issues in the outfile (see attached out.pdb).
>
> Is there a way to automatically generate an atom mapping between the two
> or update the coordinates accordingly? Or maybe someone can think of an
> overall better solution to the issue?
>
> Thanks and all the best!
> Max
>
> from rdkit import Chem
> from rdkit.Chem import rdmolfiles
>
> refFile="ref.pdb"
> confFile="confs.sdf"
> outfile="out.pdb"
>
> refMol=Chem.MolFromPDBFile(refFile, removeHs=False)
> confSupp=Chem.SDMolSupplier(confFile, removeHs=False)
>
> for m in confSupp:
>     refMol.AddConformer(m.GetConformer(0), assignId=True)
>
> writer=rdmolfiles.PDBWriter(outfile)
> for i in range(0,refMol.GetNumConformers()):
>     writer.write(refMol, confId=i)
> writer.close()
>


Re: [Rdkit-discuss] Is there a Ubuntu ppa or some repository with the latest rdkit release as .deb ?

2017-06-22 Thread Maciek Wójcikowski
Hi,

I don't know of such a PPA, but I'd suppose the latest packaged version is
available in Debian Sid/Experimental. Unfortunately it's still two versions
behind: https://packages.debian.org/sid/python-rdkit It's still a year
newer than the one you have.

You might also try packages from newer versions of Ubuntu: Zesty already
has the 2016.3 version: https://packages.ubuntu.com/zesty/python-rdkit


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-22 5:00 GMT+02:00 Greg Landrum <greg.land...@gmail.com>:

> I'm not aware of any such repository.
>
> The debichem team (really Michael Banck) puts together the debian RDKit
> releases that end up in ubuntu:
> https://debichem.alioth.debian.org/
>
> This isn't an area that I know anything about, but here's the part of
> their SVN repo that has the RDKit stuff, perhaps that is useful?
> https://anonscm.debian.org/viewvc/debichem/unstable/rdkit/debian/
>
> -greg
>
>
> On Thu, Jun 22, 2017 at 1:55 AM, Francois BERENGER <
> beren...@bioreg.kyushu-u.ac.jp> wrote:
>
>> Hello,
>>
>> I'd like to install rdkit system-wide.
>>
>> However, I'd like the install to use a regular system package
>> since rdkit is available in my distro.
>>
>> I don't like system-wide install from sources.
>> Because they tend to install things in different places
>> than what the binary package does, and several other sysadmin reasons.
>>
>> If there is some doc online on how to update the .deb
>> packages for Ubuntu/Debian, I might have a look at them.
>>
>> Regards,
>> F.
>>
>> PS: Ubuntu 16.04.2 LTS ships rdkit 201503-3
>>
>>


Re: [Rdkit-discuss] How to get a list of available properties from SD file

2017-06-27 Thread Maciek Wójcikowski
Hi,

RDKit molecules have the methods GetPropsAsDict() and GetPropNames():
http://www.rdkit.org/Python_Docs/rdkit.Chem.rdchem.Mol-class.html#GetPropsAsDict
Either of them should do what you want.
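
A minimal sketch of both calls, assuming a working RDKit installation. The molecule and its two properties are set by hand here as a stand-in for one record of an SD file; molecules returned by SDMolSupplier carry the SD data fields as properties in the same way.

```python
from rdkit import Chem

# Stand-in for one SD record: build a molecule and attach two data
# fields by hand (the values are made up for illustration).
mol = Chem.MolFromSmiles("CCO")
mol.SetProp("Cluster", "1")
mol.SetProp("MODEL.SOURCE", "example")

# Property names only
prop_names = list(mol.GetPropNames())

# Names and values together
props = mol.GetPropsAsDict()

print(sorted(prop_names))
print(props)
```

To collect the names over a whole file, the same call can be made on every molecule from the supplier and the results merged into a set.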


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-27 11:26 GMT+02:00 Malitha Kabir <malitha12...@gmail.com>:

> Hi,
>
> Thank you very much in advance for kindly looking into this.
> My question is in short:
> Is there any method that can create a list of available properties from SD
> file?
>
> I am describing the scenario here:
> You can view a sample SD file from the following github link:
> https://github.com/rdkit/rdkit/blob/master/Docs/Book/data/cdk2.sdf
>
> That file contains previously calculated properties (eg: Cluster,
> MODEL.SOURCE etc.). I can read the file in RDKit using the following codes:
>
> from rdkit.Chem.rdmolfiles import SDMolSupplier
> file1='cdk2.sdf'
> data=SDMolSupplier(fileName=file1, sanitize=True, removeHs=False,
> strictParsing=True)
>
> I can access the first molecule by using the following code:
> m0=data[0]
>
> Now the object m0 (rdkit Mol object) contains all the necessary
> information about the molecule including properties.
>
> I need to create a list of previously calculated properties from that file
> without seeing the file visually.
>
> Any direction is warmly appreciated. Thank you very much. Have a great day!
>
> -Malitha
>
>
>


Re: [Rdkit-discuss] RDKit on armv7h

2017-06-03 Thread Maciek Wójcikowski
Has anyone tried Arm64 (aarch64)? Is it the same?


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-03 8:46 GMT+02:00 Gianluca Sforna <gia...@gmail.com>:

> On Thu, Jun 1, 2017 at 12:03 AM, Samo Turk <samo.t...@gmail.com> wrote:
> > Thanks! It seems to work, but it's still compiling. Rockchip CPU in my
> > Chromebook is not very fast..
> >
>
> How did it turn out? According to my Fedora builds, compiling is fine
> but tests are going to fail :)
>
> --
> Gianluca Sforna
>
> http://plus.google.com/+gianlucasforna - http://twitter.com/giallu
> Tinker Garage - http://tinkergarage.it
>


Re: [Rdkit-discuss] Clustering

2017-06-04 Thread Maciek Wójcikowski
Is there a big difference in the quality of the final dataset between
K-means and random under-sampling of a big database (~20M)?


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-04 12:24 GMT+02:00 Samo Turk <samo.t...@gmail.com>:

> Hi Chris,
>
> There are other options for clustering. According to this:
> http://hdbscan.readthedocs.io/en/latest/performance_and_scalability.html
> HDBSCAN and K-means scale well. HDBSCAN will find clusters based on
> density and it also allows for outliers, but it can be fiddly to find the
> right parameters. You cannot specify the number of clusters (as in the Butina
> case). If you want to specify the number of clusters, you can simply use
> K-means. The high dimensionality of fingerprints might be a problem for memory
> consumption. In this case you can use PCA to reduce the dimensions to something
> manageable. To avoid memory issues with PCA and speed things up, I would fit
> the model on a random 100k compounds and then just use the transform method
> on the rest.
> http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
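
A rough sketch of the approach described above, assuming NumPy and scikit-learn are installed: fit PCA on a random sample, project everything with transform() (not fit_transform(), which would refit the model), then run K-means on the reduced data. The random bit matrix stands in for real fingerprints, and all sizes (2000 rows, 256 bits, 500-compound sample, 10 components, 5 clusters) are arbitrary placeholders rather than recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for a fingerprint matrix: 2000 "compounds" x 256 bits.
fps = rng.integers(0, 2, size=(2000, 256)).astype(np.float32)

# Fit the PCA model on a random subset only...
sample_idx = rng.choice(len(fps), size=500, replace=False)
pca = PCA(n_components=10, random_state=0)
pca.fit(fps[sample_idx])

# ...then project everything with transform(), in chunks to bound memory use.
reduced = np.vstack([pca.transform(chunk) for chunk in np.array_split(fps, 4)])

# K-means lets the number of clusters be chosen up front.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)

print(reduced.shape, len(set(labels)))
```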
>
> Cheers,
> Samo
>
> On Sun, Jun 4, 2017 at 9:08 AM, Chris Swain <sw...@mac.com> wrote:
>
>> Hi,
>>
>> I want to do clustering on around 4 million structures
>>
>> The Rdkit cookbook (http://www.rdkit.org/docs/Cookbook.html) suggests
>>
>> "For large sets of molecules (more than 1000-2000), it’s most efficient
>> to use the Butina clustering algorithm”
>>
>>  However it is quite a step up from a few thousand to several million and
>> I wondered if anyone had used this algorithm on larger data sets?
>>
>> As far as I can tell it is not possible to define the number of clusters,
>> is this correct?
>>
>> Cheers,
>>
>> Chris
>>


Re: [Rdkit-discuss] RDKit on armv7h

2017-06-04 Thread Maciek Wójcikowski
I must correct myself, pandas was not installed, so the only test that
failed was "test3D".


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-04 15:07 GMT+02:00 Maciek Wójcikowski <mac...@wojcikowski.pl>:

> I tried compiling Git master on armv8/arm64 Debian Sid (as mentioned
> before) and all tests but two passed.
>
> cmake .. -D
>> LD_LIBRARY_PATH="$RDBASE/lib:$PYROOT/lib:$LD_LIBRARY_PATH"
>> PYTHONPATH=$RDBASE:$PYTHONPATH ctest
>
>
> Failures:
>
>> 61: [12:08:04] -
>> 61: [12:08:04] 3D descriptor edge cases.
>> 61: [12:08:04]
>> 61:
>> 61: 
>> 61: Test Assert
>> 61: Expression Failed:
>> 61: Violation occurred on line 456 in file /root/rdkit/Code/GraphMol/
>> Descriptors/test3D.cpp
>> 61: Failed Expression: fabs(val) < 1e-4
>> 61: 
>> 61:
>> 61: terminate called after throwing an instance of 'Invar::Invariant'
>> 61:   what():  Test Assert
>> 1/1 Test #61: test3D ...***Exception: Other
>>  1.76 sec
>>
>
>
>  test 116
>> Start 116: pythonTestDirChem
>> 116: Test command: /usr/bin/python "/root/rdkit/rdkit/Chem/test_list.py"
>> "--testDir" "/root/rdkit/rdkit/Chem"
>> 116: Test timeout computed to be: 9.99988e+06
>> 116: 
>> 116: 
>> --
>> 116: Ran 4 tests in 0.173s
>> 116:
>> 116: OK
>> 116: ..
>> 116: 
>> --
>> 116: Ran 6 tests in 0.169s
>> 116:
>> 116: OK
>> 116: [12:10:20]
>> 116:
>> 116: 
>> 116: Pre-condition Violation
>> 116: valence not defined for atoms not associated with molecules
>> 116: Violation occurred on line 267 in file /root/rdkit/Code/GraphMol/
>> Atom.cpp
>> 116: Failed Expression: dp_mol
>> 116: 
>> 116:
>
>
> And few others, like Pandas fail to import etc. If you want full
> tracestack I can upload it in separate file.
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2017-06-04 12:45 GMT+02:00 Samo Turk <samo.t...@gmail.com>:
>
>> I'll just install debian in chroot and test if armv7 binary from the repo
>> works.
>>
>> On Sat, Jun 3, 2017 at 9:00 PM, Maciek Wójcikowski <mac...@wojcikowski.pl
>> > wrote:
>>
>>> I have an Odroid C2 (the Rpi3's faster cousin), also armv8. That's a myth
>>> rather than reality. Debian is almost 100% arm64 friendly (spoiler alert).
>>>
>>> I fired a Docker container with Debian Sid arm64 and installed RDKit
>>> from repo. All things work fine, unfortunately Conda is a no-go for now. I
>>> can try to compile RDKit from source and let you know how it went.
>>>
>>> 
>>> Pozdrawiam,  |  Best regards,
>>> Maciek Wójcikowski
>>> mac...@wojcikowski.pl
>>>
>>> 2017-06-03 20:02 GMT+02:00 Samo Turk <samo.t...@gmail.com>:
>>>
>>>> It compiled successfully, unfortunately importing rdkit crashes python
>>>> interpreter (Segmentation fault). But I didn't have time to look more
>>>> closely into this. I successfully compiled rdkit on arm few years ago but
>>>> on raspberry pi (arm v6), this time it is a chromebook.
>>>>
>>>> @Maciek If you have rpi 3 in mind, I read that arm64 is not recommended
>>>> at this time since there is no support for vendor provided libraries..
>>>>
>>>> On Sat, Jun 3, 2017 at 5:00 PM, Maciek Wójcikowski <
>>>> mac...@wojcikowski.pl> wrote:
>>>>
>>>>> Has anyone try the Arm64 (aarch64)? Is it the same?
>>>>>
>>>>> 
>>>>> Pozdrawiam,  |  Best regards,
>>>>> Maciek Wójcikowski
>>>>> mac...@wojcikowski.pl
>>>>>
>>>>> 2017-06-03 8:46 GMT+02:00 Gianluca Sforna <gia...@gmail.com>:
>>>>>
>>>>>> On Thu, Jun 1, 2017 at 12:03 AM, Samo Turk <samo.t...@gmail.com>
>>>>>> wrote:
>>>>>> > Thanks! It seems to work, but it's still compiling. Rockchip CPU in
>>>>>> my
>>>>>> > Chromebook is not very fast..
>>>>>> >
>>>>>>
>>>>>> How did it turn out? According to my Fedora builds, compiling is fine
>>>>>> but tests are going to fail :)
>>>>>>
>>>>>> --
>>>>>> Gianluca Sforna
>>>>>>
>>>>>> http://plus.google.com/+gianlucasforna - http://twitter.com/giallu
>>>>>> Tinker Garage - http://tinkergarage.it
>>>>>>


Re: [Rdkit-discuss] RDKit on armv7h

2017-06-04 Thread Maciek Wójcikowski
I tried compiling Git master on armv8/arm64 Debian Sid (as mentioned
before) and all tests but two passed.

cmake .. -D
> LD_LIBRARY_PATH="$RDBASE/lib:$PYROOT/lib:$LD_LIBRARY_PATH"
> PYTHONPATH=$RDBASE:$PYTHONPATH ctest


Failures:

> 61: [12:08:04] -
> 61: [12:08:04] 3D descriptor edge cases.
> 61: [12:08:04]
> 61:
> 61: 
> 61: Test Assert
> 61: Expression Failed:
> 61: Violation occurred on line 456 in file
> /root/rdkit/Code/GraphMol/Descriptors/test3D.cpp
> 61: Failed Expression: fabs(val) < 1e-4
> 61: 
> 61:
> 61: terminate called after throwing an instance of 'Invar::Invariant'
> 61:   what():  Test Assert
> 1/1 Test #61: test3D ...***Exception: Other  1.76
> sec
>


 test 116
> Start 116: pythonTestDirChem
> 116: Test command: /usr/bin/python "/root/rdkit/rdkit/Chem/test_list.py"
> "--testDir" "/root/rdkit/rdkit/Chem"
> 116: Test timeout computed to be: 9.99988e+06
> 116: 
> 116: --
> 116: Ran 4 tests in 0.173s
> 116:
> 116: OK
> 116: ..
> 116: --
> 116: Ran 6 tests in 0.169s
> 116:
> 116: OK
> 116: [12:10:20]
> 116:
> 116: 
> 116: Pre-condition Violation
> 116: valence not defined for atoms not associated with molecules
> 116: Violation occurred on line 267 in file
> /root/rdkit/Code/GraphMol/Atom.cpp
> 116: Failed Expression: dp_mol
> 116: 
> 116:


And a few others, like Pandas failing to import, etc. If you want the full stack
trace I can upload it in a separate file.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-04 12:45 GMT+02:00 Samo Turk <samo.t...@gmail.com>:

> I'll just install debian in chroot and test if armv7 binary from the repo
> works.
>
> On Sat, Jun 3, 2017 at 9:00 PM, Maciek Wójcikowski <mac...@wojcikowski.pl>
> wrote:
>
>> I have an Odroid C2 (the Rpi3's faster cousin), also armv8. That's a myth
>> rather than reality. Debian is almost 100% arm64 friendly (spoiler alert).
>>
>> I fired a Docker container with Debian Sid arm64 and installed RDKit from
>> repo. All things work fine, unfortunately Conda is a no-go for now. I can
>> try to compile RDKit from source and let you know how it went.
>>
>> 
>> Pozdrawiam,  |  Best regards,
>> Maciek Wójcikowski
>> mac...@wojcikowski.pl
>>
>> 2017-06-03 20:02 GMT+02:00 Samo Turk <samo.t...@gmail.com>:
>>
>>> It compiled successfully, unfortunately importing rdkit crashes python
>>> interpreter (Segmentation fault). But I didn't have time to look more
>>> closely into this. I successfully compiled rdkit on arm few years ago but
>>> on raspberry pi (arm v6), this time it is a chromebook.
>>>
>>> @Maciek If you have rpi 3 in mind, I read that arm64 is not recommended
>>> at this time since there is no support for vendor provided libraries..
>>>
>>> On Sat, Jun 3, 2017 at 5:00 PM, Maciek Wójcikowski <
>>> mac...@wojcikowski.pl> wrote:
>>>
>>>> Has anyone try the Arm64 (aarch64)? Is it the same?
>>>>
>>>> 
>>>> Pozdrawiam,  |  Best regards,
>>>> Maciek Wójcikowski
>>>> mac...@wojcikowski.pl
>>>>
>>>> 2017-06-03 8:46 GMT+02:00 Gianluca Sforna <gia...@gmail.com>:
>>>>
>>>>> On Thu, Jun 1, 2017 at 12:03 AM, Samo Turk <samo.t...@gmail.com>
>>>>> wrote:
>>>>> > Thanks! It seems to work, but it's still compiling. Rockchip CPU in
>>>>> my
>>>>> > Chromebook is not very fast..
>>>>> >
>>>>>
>>>>> How did it turn out? According to my Fedora builds, compiling is fine
>>>>> but tests are going to fail :)
>>>>>
>>>>> --
>>>>> Gianluca Sforna
>>>>>
>>>>> http://plus.google.com/+gianlucasforna - http://twitter.com/giallu
>>>>> Tinker Garage - http://tinkergarage.it
>>>>>


Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Maciek Wójcikowski
Hi,

If you really want to rely on the order of atoms, you can renumber them
any way you like with Chem.RenumberAtoms():
http://rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RenumberAtoms
There is also a function that returns the canonical ranking of the atoms:
Chem.CanonicalRankAtoms(). If I remember correctly, that order may differ
from the atom order of the canonical SMILES, although that might have changed.
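
A short sketch of both functions, assuming a working RDKit installation. The three-atom molecule is only an illustration, and sorting the atom indices by rank is one possible way to turn the ranking into an order that RenumberAtoms accepts.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("OCC")  # atom indices follow input order: O=0, C=1, C=2

# RenumberAtoms takes the new order as a list of old indices:
# entry i names the old atom that becomes new atom i.
reversed_mol = Chem.RenumberAtoms(mol, [2, 1, 0])
symbols = [a.GetSymbol() for a in reversed_mol.GetAtoms()]
print(symbols)

# CanonicalRankAtoms returns one rank per atom, indexed by current atom index.
ranks = list(Chem.CanonicalRankAtoms(mol))

# Sort the atom indices by rank and renumber accordingly.
canonical_order = sorted(range(mol.GetNumAtoms()), key=lambda i: ranks[i])
canonical_mol = Chem.RenumberAtoms(mol, canonical_order)
```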


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-15 9:03 GMT+02:00 Brian Kelley <fustiga...@gmail.com>:

> Yes, atoms are always added in file order.  It would take a major change
> in rdkit to change/violate this.
>
> 
> Brian Kelley
>
> > On Jun 15, 2017, at 7:52 AM, Francois BERENGER <
> beren...@bioreg.kyushu-u.ac.jp> wrote:
> >
> > Hello,
> >
> > If I read a molecule from a .sdf file, will the atom indexes be
> conserved/preserved?
> >
> > 1st atom in the file will have index 0,
> > 2nd index 1, etc.
> >
> > And, will this always hold in the future?
> > Is this an invariant of rdkit?
> >
> > Thanks,
> > F.
> >


[Rdkit-discuss] Fixed scale drawing

2017-09-14 Thread Maciek Wójcikowski
Hi RDKitters!

Quick question: is there a way to force the drawing code to output molecules, on a
grid image or separately, at a fixed scale (i.e. constant/matching bond length)?


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Re: [Rdkit-discuss] Fixed scale drawing

2017-09-15 Thread Maciek Wójcikowski
Hi again,

I've managed to make it work. Please find the working code attached as a
notebook. Unfortunately, it seems there is a small bug with SetScale: if the
difference between the minimum and maximum point in x or y is zero or
close to zero, the molecules are drawn broken. I added a margin of 1 on
each side and it worked.
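
The margin workaround can be sketched in plain Python. The attached notebook is not reproduced here, so this is an assumption about its general shape: padded_bounds is a hypothetical helper name, and the two corner points it returns are what one would then hand to SetScale as the minimum and maximum.

```python
def padded_bounds(coords, margin=1.0):
    """Return ((min_x, min_y), (max_x, max_y)) widened by `margin` on each side."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs) - margin, min(ys) - margin), (max(xs) + margin, max(ys) + margin)


# A two-atom molecule drawn horizontally: the y extent is exactly zero,
# which is the case that broke the drawing.
two_atoms = [(-0.75, 0.0), (0.75, 0.0)]
lo, hi = padded_bounds(two_atoms)
print(lo, hi)
```

With the margin added, neither extent is zero, so the scale computed from the two points stays finite.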


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-09-14 20:53 GMT+02:00 Maciek Wójcikowski <mac...@wojcikowski.pl>:

> I've tried that, but ended up with molecules out of the picture. I'll try
> again tomorrow and ping back here should I succeed. The real problem was
> two-atom molecules, which have delta x or y = 0, such as C=O or CN.
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2017-09-14 17:21 GMT+02:00 Greg Landrum <greg.land...@gmail.com>:
>
>> Hi Maciek,
>>
>> You can do this by calling setScale() method on MolDraw2D(). There's not
>> a decent python example around yet (would be something for the cookbook I
>> suppose), but the C++ code isn't too complex and demonstrates how it works:
>> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/MolDraw2D/test1.cpp#L1962
>>
>> Hope this helps,
>> -greg
>>
>>
>> On Thu, Sep 14, 2017 at 10:24 AM, Maciek Wójcikowski <
>> mac...@wojcikowski.pl> wrote:
>>
>>> Hi RDKitters!
>>>
>>> Quick question: is there a way to force drawing to output molecules on a
>>> grid image or separate in fixed scale (i.e. constant/matching bond length)?
>>>
>>> 
>>> Pozdrawiam,  |  Best regards,
>>> Maciek Wójcikowski
>>> mac...@wojcikowski.pl
>>>


fixed_bond_drawing.ipynb
Description: application/ipynb


Re: [Rdkit-discuss] Fixed scale drawing

2017-09-14 Thread Maciek Wójcikowski
I've tried that, but ended up with molecules out of the picture. I'll try again
tomorrow and ping back here should I succeed. The real problem was two-atom
molecules, which have delta x or y = 0, such as C=O or CN.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-09-14 17:21 GMT+02:00 Greg Landrum <greg.land...@gmail.com>:

> Hi Maciek,
>
> You can do this by calling setScale() method on MolDraw2D(). There's not a
> decent python example around yet (would be something for the cookbook I
> suppose), but the C++ code isn't too complex and demonstrates how it works:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/MolDraw2D/test1.cpp#L1962
>
> Hope this helps,
> -greg
>
>
> On Thu, Sep 14, 2017 at 10:24 AM, Maciek Wójcikowski <
> mac...@wojcikowski.pl> wrote:
>
>> Hi RDKitters!
>>
>> Quick question: is there a way to force drawing to output molecules on a
>> grid image or separate in fixed scale (i.e. constant/matching bond length)?
>>
>> 
>> Pozdrawiam,  |  Best regards,
>> Maciek Wójcikowski
>> mac...@wojcikowski.pl
>>


Re: [Rdkit-discuss] Using Chem.WrapLogs()

2017-09-08 Thread Maciek Wójcikowski
Hi Noel,

Call sio.seek(0) before the assert, or use sio.getvalue() instead of read().
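
The difference is easy to see without RDKit. The sketch below uses io.StringIO from Python 3 (the quoted code uses the Python 2 StringIO module, which tracks the stream position in the same way), and the written text is just a stand-in for captured log output.

```python
import io

sio = io.StringIO()
sio.write("SMILES Parse Error: example message\n")  # stand-in for captured log output

# After writing, the stream position is at the end, so read() returns nothing.
at_end = sio.read()

# Option 1: getvalue() ignores the position and returns the whole buffer.
whole = sio.getvalue()

# Option 2: rewind first, then read().
sio.seek(0)
rewound = sio.read()

print(repr(at_end))
print(whole == rewound)
```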


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-09-08 15:51 GMT+02:00 Noel O'Boyle <baoille...@gmail.com>:

> Hi all,
>
> I'd like to capture error messages during SMILES parsing, but am having
> trouble getting this to work.
>
> The following code raises an AssertionError, for example. Is there
> something here I'm missing? I'm using this from a Windows 7 conda
> environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda
> environment is also failing for me on Linux.
>
> import sys
> from rdkit import Chem
> Chem.WrapLogs()
> from StringIO import StringIO
>
> old_stderr = sys.stderr
> sio = sys.stderr = StringIO()
>
> mol = Chem.MolFromSmiles("c1c")
> sys.stderr = old_stderr
>
> assert sio.read() != ""
>
> Regards,
> - Noel
>


Re: [Rdkit-discuss] RPM distros

2017-11-28 Thread Maciek Wójcikowski
Hi Tim and Francois,

To fix missing dependencies on Debian-based systems, run "sudo apt install -f"
after installing the .deb files. On RPM-based systems, install the package with
"yum install rdkit_package_file.rpm" so that its dependencies are pulled in as well.

On Debian (and Debian-based systems) I prefer to install standalone packages via
gdebi, which does this automatically.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-11-28 0:50 GMT+01:00 Francois BERENGER <beren...@bioreg.kyushu-u.ac.jp>:

> On 11/28/2017 12:42 AM, Tim Dudgeon wrote:
> > I see exactly the same when I build with those cmake args.
>
> Maybe you are missing some of the dependencies.
> I don't think the packages we create have all the dependency information:
>
> fonts-freefont-ttf,
> libboost-python1.58.0,
> libboost-regex1.58.0,
> libboost-system1.58.0,
> libboost-thread1.58.0,
> libc6 (>= 2.14),
> libgcc1 (>= 1:4.1.1),
> libpython2.7 (>= 2.7),
> libstdc++6 (>= 5.2),
> python (<< 2.8),
> python (>= 2.7~)
>
> You should install the ones you are missing and test again.
>
> > On 27/11/2017 09:11, Francois BERENGER wrote:
> >> On 11/27/2017 06:01 PM, Tim Dudgeon wrote:
> >>> I did:
> >>>
> >>> cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_INSTALL_INTREE=OFF
> >>> -DCMAKE_INSTALL_PREFIX=/usr/ ..
> >> Try this instead, just for the cmake part:
> >>
> >> cmake -Wno-dev \
> >>  -DRDK_INSTALL_INTREE=OFF \
> >>  -DRDK_BUILD_INCHI_SUPPORT=ON \
> >>  -DRDK_BUILD_AVALON_SUPPORT=ON \
> >>  -DRDK_BUILD_PYTHON_WRAPPERS=ON \
> >>  -DCMAKE_INSTALL_PREFIX=/usr \
> >>  -DRDKit_VERSION=`date +%Y.%m` \
> >>  ../
> >>
> >> then do the rest (cpack ...) and test again
> >> after an install of the freshly created package.
> >>
> >> I advise to wipe out any prior rdkit install from your machine
> >> before installing the new packages (so that we test what we intend to
> >> test).
> >>
> >> On a Debian-like:
> >> sudo apt-get remove $(dpkg -l | grep rdkit | awk '{print $2}')
> >>
> >>> cpack -G DEB
> >>> cpack -G RPM
> >>>
> >>>
> >>> On 27/11/2017 00:05, Francois BERENGER wrote:
> >>>> Hello,
> >>>>
> >>>> What are the exact commands you used to configure and compile rdkit?
> >>>>
> >>>> The script in there is my best attempt:
> >>>>
> >>>> https://github.com/rdkit/rdkit/pull/1655
> >>>>
> >>>> Regards,
> >>>> F.
> >>>>
> >>>> On 11/25/2017 12:50 AM, Tim Dudgeon wrote:
> >>>>> I got round to testing the debs and rpms but without success.
> >>>>>
> >>>>> For the debs the following were built:
> >>>>>
> >>>>> RDKit-2018.03.1.dev1-Linux-Development.deb
> >>>>> RDKit-2018.03.1.dev1-Linux-Extras.deb
> >>>>> RDKit-2018.03.1.dev1-Linux-Python.deb
> >>>>> RDKit-2018.03.1.dev1-Linux-Runtime.deb
> >>>>>
> >>>>> On a clean Ubuntu Xenial system, with just python added (apt-get -y
> >>>>> install python) the packages installed fine:
> >>>>>
> >>>>> # dpkg -i *.deb
> >>>>> Selecting previously unselected package rdkit-development.
> >>>>> (Reading database ... 5666 files and directories currently
> installed.)
> >>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Development.deb ...
> >>>>> Unpacking rdkit-development (2018.03.1.dev1) ...
> >>>>> Selecting previously unselected package rdkit-extras.
> >>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Extras.deb ...
> >>>>> Unpacking rdkit-extras (2018.03.1.dev1) ...
> >>>>> Selecting previously unselected package rdkit-python.
> >>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Python.deb ...
> >>>>> Unpacking rdkit-python (2018.03.1.dev1) ...
> >>>>> Selecting previously unselected package rdkit-runtime.
> >>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Runtime.deb ...
> >>>>> Unpacking rdkit-runtime (2018.03.1.dev1) ...
> >>>>> Setting up rdkit-development (2018.03.1.dev1) ...
> >>>>> Setting up rdkit-extras (2018.03.1.dev1) ...
> >>>>> Setting up rdkit-python (2018.03.1.dev1) ...
> >>>>> Setting up rdkit-runtime (2018.

Re: [Rdkit-discuss] How to create a list of molecules and iterate

2017-11-21 Thread Maciek Wójcikowski
Hi,

Wrap the reader in a list() call:

> mols = list(SDMolSupplier('in.sdf'))
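
A self-contained sketch of the same idea, assuming a working RDKit installation. To avoid depending on a file, the SD text is built in memory from two example SMILES and loaded with SetData(); with a real file, SDMolSupplier('in.sdf') takes its place. Filtering out None entries is an addition here, since the supplier returns None for records it cannot parse.

```python
from rdkit import Chem

# Stand-in for 'in.sdf': build a small SD text in memory from two SMILES.
sdf_text = ""
for smi in ("CCO", "c1ccccc1"):
    sdf_text += Chem.MolToMolBlock(Chem.MolFromSmiles(smi)) + "$$$$\n"

suppl = Chem.SDMolSupplier()
suppl.SetData(sdf_text)

# Parse every record once; drop entries RDKit could not parse (returned as None).
mols = [m for m in suppl if m is not None]

# The list can now be iterated as often as needed without re-parsing.
for template in mols:
    for target in mols:
        pass  # the similarity calculation would go here

print(len(mols))
```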




Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-11-21 15:39 GMT+01:00 abhik <ab...@ebi.ac.uk>:

> Hi,
>
> I am running a similarity search process using rdkit where my query
> consists of ~20 templates and my target consists of ~1000 molecules. How
> can I create a list (or any container) of molecules that I can iterate
> later.
>
> My present algorithm is
>
> read template create molecule > read target create molecule > calculate
> similarity
> go back to template and follow the same process
>
> But by doing this I am creating same molecule multiple times, which is
> increasing the run time.
>
>
> Thank in advance for any helps.
> Regards,
> Abhik
>


Re: [Rdkit-discuss] Atom mapping

2018-05-10 Thread Maciek Wójcikowski
Hi,

The SMILES atom order is saved in a private property,
'_smilesAtomOutputOrder'; see the discussion on GitHub:
https://github.com/rdkit/rdkit/issues/794

The order of atoms in the PDB is the same as in RDKit's Mol object, so it's
fairly easy to find such a mapping.
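
A small sketch of reading that property, assuming a working RDKit installation and that GetProp() returns the order as text such as "[2,1,0,]" (hence the regular expression). The three-atom molecule is only an illustration; a molecule read from a PDB file would be handled the same way.

```python
import re
from rdkit import Chem

mol = Chem.MolFromSmiles("OCC")   # Mol atom order: O=0, C=1, C=2
smiles = Chem.MolToSmiles(mol)    # writing the SMILES sets the private property

# The property is read here as text; pull out the integers.
raw = mol.GetProp("_smilesAtomOutputOrder")
order = [int(i) for i in re.findall(r"\d+", raw)]

# order[k] is the Mol (and hence PDB) index of the k-th atom in the SMILES.
for k, idx in enumerate(order):
    print(k, idx, mol.GetAtomWithIdx(idx).GetSymbol())
```

Given a substructure match on the molecule rebuilt from that SMILES, each matched index can be translated through this list to the corresponding atom of the original Mol, and its coordinates read from the conformer.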


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-05-10 11:39 GMT+02:00 carlo del moro <delmoro.ca...@gmail.com>:

> Thanks to all for the replies,
>
> I've put together an example to better explain my problem.
> Starting from a PDB file representing HPE, I use RDKit/obabel to calculate the
> corresponding SMILES. Next, using an RDKit function, I fragment the SMILES into
> substructures like "CC(=O)O"; now I need to map each substructure back onto
> the starting three-dimensional structure in order to get the atom
> coordinates. The task would be pretty easy if the atom numbering of the SMILES
> representation were the same as in the starting PDB file. Do you know any
> method to unify these two numberings, or to map the SMILES atom sequence onto
> the PDB's one?
> This is the PDB for HPE.
>
>
> HETATM 4176  N   HPE B   2   5.227  20.107  15.512  1.00 17.92   N
> HETATM 4177  CA  HPE B   2   4.065  20.646  16.205  1.00 16.87   C
> HETATM 4178  C   HPE B   2   2.784  20.702  15.373  1.00 18.59   C
> HETATM 4179  O   HPE B   2   2.806  21.092  14.215  1.00 17.45   O
> HETATM 4180  CB  HPE B   2   4.377  22.085  16.699  1.00 17.52   C
> HETATM 4181  CG  HPE B   2   5.532  22.067  17.720  1.00 14.97   C
> HETATM 4182  CD  HPE B   2   5.886  23.416  18.279  1.00 17.87   C
> HETATM 4183  CE1 HPE B   2   6.717  24.309  17.627  1.00 17.26   C
> HETATM 4184  CE2 HPE B   2   5.385  23.752  19.520  1.00 19.20   C
> HETATM 4185  CZ1 HPE B   2   7.025  25.546  18.162  1.00 17.16   C
> HETATM 4186  CZ2 HPE B   2   5.698  24.993  20.061  1.00 22.45   C
> HETATM 4187  CH  HPE B   2   6.517  25.906  19.409  1.00 19.18   C
>
> Thanks to all.
>
> Carlo
>
> On Wed, May 9, 2018 at 8:37 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
> wrote:
>
>> On 05/09/2018 10:27 AM, carlo del moro wrote:
>> > Dear All,
>> >
>> > we would like to know if it is possible to map the atom's ID of a SMILES
>> > represented substructure to the atom sequence of a ligand contained in a
>> > pdb file. This in order to get the spatial coordinates related to such
>> > substructure.
>>
>> http://alatis.nmrfam.wisc.edu/ will generate unique stable IDs from a 3D
>> structure, and output the old->new ID map. It'll take a PDB,  you'll
>> have to convert your SMILES into a 3D .mol. ALATIS atom IDs should be
>> the same in the two maps, *provided both inputs describe the exact same
>> ligand*.
>>
>> (It's the *substructure* bit that I'm not entirely sure about.)
>> --
>> Dimitri Maziuk
>> Programmer/sysadmin
>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>


Re: [Rdkit-discuss] another request for feedback on a new python API documentation format

2018-05-08 Thread Maciek Wójcikowski
Hi Greg,

Speaking of the new docs: would it be possible to have documentation for a
few stable releases back, like 2017.09, 2017.03, etc.? Recently I was
trying to trace changes in RDKit's API and ended up using git blame,
whereas I could have gotten that info by switching the release in the
docs.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-05-02 11:17 GMT+02:00 David Cosgrove <davidacosgrov...@gmail.com>:

> Hi Greg,
> After a quick poke about, I think the new documentation looks great in
> general.  If a change is forced on you, then I suggest you just do it in a
> way that makes your life as easy as possible.  If people don't like it,
> they can always put the effort in to do something different and then I
> expect they'll quickly come round to realising that your way is perfectly
> fine.  One way of fixing the docstring formatting would be to put
> instructions and a couple of examples somewhere handy and ask people to fix
> problems when they encounter them as they read the docs.  That should be a
> small effort from each person that would hopefully fix the important ones
> quickly in a self-prioritising manner.
> Thanks for putting the time into this,
> Dave
>
>
> On Wed, May 2, 2018 at 8:40 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Dear all,
>>
>> Just over a year ago I asked for feedback on a new documentation format
>> for the RDKit python API:
>> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg06688.html
>> Some useful feedback came in on that thread (thanks to those who replied
>> there and in private email), but I ran out of time/motivation to spend time
>> on this.
>>
>> With my motivation recharged thanks to the "fun" of using epydoc to
>> document the last release, I revisited the topic this weekend and actually
>> made some progress.[1] I'd like to gather a second round of feedback on
>> that.
>>
>> The documentation is here:
>> http://rdkit.org/docs_temp/index.html
>> The API docs (which are where the biggest changes are) are here:
>> http://rdkit.org/docs_temp/api-docs.html
>>
>> To address some of the things raised last time:
>> - This really isn't optional. It's been more than a decade since epydoc
>> was updated and it requires python 2.7.
>> - My previous attempt to auto-generate docs used pdoc (
>> https://github.com/BurntSushi/pdoc). That project also seems to have
>> died, so it's not really an option.
>> - Based upon the two factors above I decided to use the autodoc
>> functionality that's part of Sphinx. It's not perfect, but it's supported
>> (and seems likely to continue to be so since it's part of Sphinx)
>>
>> - The docs now have a search box
>>
>> - We've lost the overview (list of classes/functions/etc) that epydoc
>> provides. There likely is a way to do this with sphinx, but I haven't
>> managed to get it to work yet
>>
>> - Formatting: Some of the docstrings end up looking pretty good, others
>> are awful. Here's a module that demonstrates both sides of the coin:
>> http://rdkit.org/docs_temp/source/rdkit.Chem.AtomPairs.Pairs.html#module-rdkit.Chem.AtomPairs.Pairs
>> Fixing this is "just" a matter of editing the doc strings. This is
>> reasonably mechanical, but unfortunately not automatable, work. It should
>> be done, but in the meantime the broken docstrings aren't completely
>> useless.
>>
>> There's also a github issue for this:
>> https://github.com/rdkit/rdkit/issues/1656
>> I'm doing the work on this branch:
>> https://github.com/greglandrum/rdkit/tree/dev/usinx_sphinx_autodoc
>>
>> -greg
>> [1] Remember how I said I was going to take a short break and do
>> something fun? This isn't that.
>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>


Re: [Rdkit-discuss] Properties through Pickle

2017-10-26 Thread Maciek Wójcikowski
Hi Lionel,

There is a PropertyMol class which does what you want: it keeps the
molecule's properties when pickling. See
http://www.rdkit.org/Python_Docs/rdkit.Chem.PropertyMol.PropertyMol-class.html
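
A minimal sketch of how that could look, assuming a molecule with a 'Code' property like the one in the question (the SMILES and the property value here are placeholders, not data from the thread):

```python
import pickle
from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol

mol = Chem.MolFromSmiles('CCO')
mol.SetProp('Code', 'ABC-123')  # stand-in for a property read from an SD file

# wrap the molecule so that its properties travel with the pickle
pkl = pickle.dumps(PropertyMol(mol))
newmol = pickle.loads(pkl)

print(newmol.HasProp('Code'), newmol.GetProp('Code'))
```

In the script from the question, the only change needed would be to pickle PropertyMol(mol) instead of mol.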


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-10-26 15:17 GMT+02:00 Lionel Colliandre <lio...@beckerdata.com>:

> Hi everyone,
>
> I have molecules from a SDfile to store in Pickle format.
>
> So I read the SDfile and with mol.GetProp() function, I can access to a
> property of the molecule included in the SDfile. Unfortunately, when I
> convert the molecule in Pickle and come back to the rdkit mol format, the
> property is lost.
>
> What I am doing wrong?
>
> Here is my script. I am using RDKit 2015_03_1 on Windows 10, 64bit.
>
>
> ---
>
> from __future__ import print_function
> from rdkit import Chem
> import cPickle as pickle
>
>
> SDfile = "sdfile.sdf"
>
> suppl = Chem.SDMolSupplier(SDfile, sanitize=True)
>
> ms = [x for x in suppl if x is not None]
>
> ## Convert in Pickle
> pklsuppl = []
> for mol in suppl:
>     if mol is None: continue
>     if mol.HasProp('Code'):
>         print('Prop ok in Mol')
>     else:
>         print('No Prop in Mol')
>     pkl = pickle.dumps(mol, protocol=pickle.HIGHEST_PROTOCOL)
>     newmol = pickle.loads(pkl)
>     if newmol.HasProp('Code'):
>         print('Prop ok in NewMol')
>     else:
>         print('No Prop in NewMol')
> ---
>
>
> I obtain "Prop ok in Mol" and "No Prop in NewMol".
>
> Thanks for your help,
>
> Lionel


[Rdkit-discuss] PDB and connect the dots algorithm

2017-10-26 Thread Maciek Wójcikowski
Hi,

Is there a way to force the PDB parser not to invoke the ConnectTheDots
algorithm? We are developing an automated protein fixer for RDKit, but even
if our Mol sanitizes, it does not round-trip through MolToPDBBlock ->
MolFromPDBBlock. The latter "rediscovers" the problems that were fixed.

If not, would it be possible to have it as an option? I think most of the
problems in the PDB parser are connected with ConnectTheDots: even if the
molecule is parsed correctly and saved (in RDKit or any other toolkit), the
automated bonding makes it impossible to load it into RDKit again.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Re: [Rdkit-discuss] How to convert numpy array to rdkit fingerprint object?

2018-01-11 Thread Maciek Wójcikowski
Hi,

In DataStructs there are CreateFrom* functions which do what you want,
although you'd have to convert the numpy array to a string of 0s and 1s
first. ''.join(map(str, array)) would probably be enough for an array of
integer 0/1 values.

See
http://www.rdkit.org/Python_Docs/rdkit.DataStructs.cDataStructs-module.html#CreateFromBitString
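
A short sketch of that conversion, using the example arrays as placeholders; the helper name np_to_bitvect is invented for illustration. Mapping each element to '1' or '0' explicitly also covers bool arrays:

```python
import numpy as np
from rdkit import DataStructs


def np_to_bitvect(arr):
    """Convert a 0/1 (or bool) numpy array to an ExplicitBitVect."""
    bits = ''.join('1' if v else '0' for v in arr)
    return DataStructs.CreateFromBitString(bits)


np_1 = np.array([1, 1, 0, 0, 1, 0, 1])
np_2 = np.array([0, 1, 1, 0, 1, 0, 0], dtype=bool)

bv_1, bv_2 = np_to_bitvect(np_1), np_to_bitvect(np_2)
print(DataStructs.DiceSimilarity(bv_1, bv_2))
```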


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-01-11 16:30 GMT+01:00 Michał Nowotka <mmm...@gmail.com>:

> Hi,
>
> Imagine I have two numpy arrays containing zeros and ones (or bools)
> effectively being fingerprints:
>
> np_1, np_2 = some_fingerprints_as_np_arrays()
>
> I want to convert them both to rdkit fingerprint objects so I can use
> DiceSimilarity:
>
> from rdkit import DataStructs
>
> # this won't work because of type incompatibility
> DataStructs.DiceSimilarity(np_1, np_2)
>
> In the
> http://www.rdkit.org/Python_Docs/rdkit.DataStructs.cDataStructs.ExplicitBitVect-class.html
> docs I can't find any constructor apart from FromBase64.
> Any hints?
>
> Cheers,
>
> Michał


Re: [Rdkit-discuss] Tanimoto Similarity

2018-07-04 Thread Maciek Wójcikowski
Hi

As Nils has mentioned, this is fingerprint-dependent. For ECFP4 the
significance cutoff is ~0.4; see https://pubs.acs.org/doi/10.1021/ci7004498
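
As a rough illustration of applying such a cutoff, here is a sketch that uses RDKit's Morgan fingerprint with radius 2 as an ECFP4-like fingerprint. The query, the small library and the CUTOFF constant are all placeholders chosen for this example; the threshold should be tuned for the fingerprint and data at hand.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

CUTOFF = 0.4  # rough threshold for this fingerprint type; tune it for your data

# Morgan fingerprint with radius 2, RDKit's closest analogue of ECFP4
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

query = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')  # example query molecule
library = ['OC(=O)c1ccccc1O', 'CC(=O)Nc1ccc(O)cc1', 'CCCCCCCC']

query_fp = gen.GetFingerprint(query)
library_fps = [gen.GetFingerprint(Chem.MolFromSmiles(smi)) for smi in library]

# Tanimoto similarity of the query against every library fingerprint
sims = DataStructs.BulkTanimotoSimilarity(query_fp, library_fps)
neighbors = [(smi, sim) for smi, sim in zip(library, sims) if sim >= CUTOFF]
print(neighbors)
```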


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-07-04 8:44 GMT+02:00 Nils Weskamp :

> Dear Phuong,
>
> unfortunately, there is no generic answer to this question since it is
> highly dependent on the fingerprint, the type of compounds, your
> specific application and also your chemical intuition. I can only
> recommend to test a range of different cutoff values and to see how
> happy you are with the results.
>
> If you have access to a list of analogs that you definitely want to find
> ("known actives") and a large set of known irrelevant compounds, you
> might be able to use statistical analyses to derive some kind of
> "optimal" threshold.
>
> If we are talking about path-oriented fingerprints (like the RDKit
> Chemical Fingerprints) and "normal" drug-like molecules, I would
> typically go down to 0.70 - 0.75 and then manually weed out false hits.
>
> Hope this helps,
> Nils
>
> On 04.07.2018 at 02:24, Phuong Chau wrote:
> > To whom it may concern,
> >
> > I was working on finding a group of possible neighbors (similar)
> > chemicals based on Tanimoto Similarity. I am not sure what is the
> > optimal cutoff for finding similar chemicals. I searched online and they
> > said it is 0.85 but there are also many exceptions they mentioned about.
> > Do you have any suggestions?
> >
> > Thank you so much for your help


Re: [Rdkit-discuss] rotatable bond

2018-03-09 Thread Maciek Wójcikowski
Hi Mariana,

You can do exactly what this function is doing: it counts matches of a
SMARTS definition of a rotatable bond. For a single bond you can check
whether its two atoms form one of those matches. The definition is here:
http://www.rdkit.org/Python_Docs/rdkit.Chem.Lipinski-module.html

    from rdkit import Chem

    # SMARTS used by RDKit's Lipinski module for rotatable bonds
    SMARTS_DEF = '[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]'
    rot_patt = Chem.MolFromSmarts(SMARTS_DEF)
    bond = mol.GetBondWithIdx(5)  # any bond in an existing molecule `mol`
    # every match is a pair of atom indices describing one rotatable bond
    pair = {bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()}
    if any(set(m) == pair for m in mol.GetSubstructMatches(rot_patt)):
        print('rotatable')


Note that there are more advanced definitions of such bonds; the example
above is taken from RDKit's Lipinski module.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-03-09 13:56 GMT+01:00 Mariana Assmann <mariana.assm...@gmx.net>:

> Hi everyone,
>
> is it possible to check for a single bond if it is rotatable? I only found
> a function to calculate the total number of rotatable bonds in the
> molecule, but nothing specific to a selected bond.
>
> Best regards,
> Mariana


Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-23 Thread Maciek Wójcikowski
>
> >  which could of course also be changed to something expensive to
> calculate.
> Yes, that could be possible. Abstractly, let the first 20 bytes of each
> fingerprint be a salt, and use something like bcrypt so each fingerprint
> test requires that the query structure be re-fingerprinted for the
> per-fingerprint hash function.

I think salting is a must. If any money is at stake, I'd expect comparable
computing power to be used to crack it. The closest analogy, and the
workaround for slow hashing, is "rainbow tables" for strings: instead of
computing the hash, you just look it up. Without salting such lookup tables
would not be that big, I suppose. If you had such a lookup table, you'd
only need an algorithm (or GA) that builds a molecule from a set of
environments, rather than building it randomly.
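
To make the lookup-table argument concrete, here is a toy sketch in plain Python. It is not how RDKit hashes environments: the environments are just strings, SHA-256 stands in for whatever (possibly much slower, bcrypt-like) hash would be used, and the function names are invented for this illustration.

```python
import hashlib
import os


def env_to_bit(env, salt=b'', n_bits=2048):
    """Map a substructure environment (a string) to a bit position."""
    digest = hashlib.sha256(salt + env.encode('utf-8')).digest()
    return int.from_bytes(digest[:8], 'big') % n_bits


def fingerprint(envs, salt=b'', n_bits=2048):
    """Return the sorted on-bits for a collection of environments."""
    return sorted({env_to_bit(env, salt, n_bits) for env in envs})


envs = ['C', 'O', 'CC', 'CO', 'CCO']  # toy environments for ethanol

# Unsalted: anyone can precompute bit -> environments once and reuse it.
lookup = {}
for env in envs:
    lookup.setdefault(env_to_bit(env), []).append(env)

# Salted: the table has to be rebuilt for every fingerprint's own salt.
salt = os.urandom(20)
print(fingerprint(envs))
print(fingerprint(envs, salt))
```

The unsalted table is computed once and works for every fingerprint; with a per-fingerprint salt the attacker has to redo that work for each fingerprint separately.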

----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-04-22 22:25 GMT+02:00 Andrew Dalke <da...@dalkescientific.com>:

> On Apr 22, 2018, at 20:22, Nils Weskamp <nils.wesk...@gmail.com> wrote:
> > Actually, I *was* also thinking about your use cases 2 and 3 since you
> > also need some form of hash function to map substructures to bit
> > numbers. This is normally a rather simple function / pseudo random
> > generator,
>
> Strictly speaking, this is not a requirement.
>
> The term "fingerprint" has taken on quite an encompassing meaning since
> 1990.
>
> The molecular formula is a count fingerprint with 118 keys, based on the
> atomic number. There's no need for hash function there. "CCO" might be:
>   [0, 0, 0, 0, 0, 2, 0, 1, ...]
>
> Or it can be written in more compact form like {"C": 2, "O": 1}.
>
> As an alternative, I could use a mapping from canonical substructures to
> counts, so "CCO" becomes:
>
>   {"C": 2, "O": 1, "CC": 1, "CO": 1, "CCO": 1}
>
> This doesn't require a hash. (While I represent that as a Python
> dictionary, which uses a hash table underneath, it could be implemented
> using a red-black tree or B-tree, or with a simple linear search.)
>
> It's only if I want to convert this into fixed length representation where
> I have to figure out some sort of encoding scheme.
>
> Even then, I don't need a PRNG or hash seed. Suppose I use a bit vector. I
> could have a table which maps all canonical substructures to its bit
> pattern. If I have an unknown fragment, I could use RANDOM.ORG to get the
> bits.
>
> Downsides include potentially unbounded table growth and the need for a
> centralized table.
>
> This is the approach that Zatocoding used, and I see Chemical Zatocoding
> as the only precursor to Daylight hash fingerprints.
>
> >  which could of course also be changed to something expensive to
> calculate.
>
>
> Yes, that could be possible. Abstractly, let the first 20 bytes of each
> fingerprint be a salt, and use something like bcrypt so each fingerprint
> test requires that the query structure be re-fingerprinted for the
> per-fingerprint hash function.
>
> It would, however, take an absurdly long time to do a similarity search.
>
> And in any case, before going further along that path, we would need to
> figure out the risk model. Brian started by saying that he wanted to
> obfuscate molecules for security, but didn't say what he want to use them
> for, and if he want to secure them against nation-states, or simply against
> me. ;)
>
>
>
> Andrew
> da...@dalkescientific.com


Re: [Rdkit-discuss] MolFromMol2Block changes carboxylic group representation

2018-03-28 Thread Maciek Wójcikowski
Hi Maria,

This is one of the many cleanup routines a molecule undergoes when it is
read as a "Corina Mol2" file, which is the flavor implemented in RDKit.
Unfortunately there is no way to turn it off, because it is needed for the
(gu)estimation of formal charges.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-03-28 11:46 GMT+02:00 Maria Matveyeva <yurievnama...@gmail.com>:

> Hello all,
> When i read mol2 format from either mol2 file or block, it changes
> represenation of carboxylic group from tripos  aromatic representaiton with
> -0.5 charges on oxygens to representaion with one single and one double
> bond (when same representation read from sdf/mol  it retains "aromatic"
> form):
>
>
>
> from rdkit import Chem
>
> mol = Chem.MolFromMol2Block('@MOLECULE\n\n7 6
> 1\nSMALL\nUSER_CHARGES\n@ATOM\n1\tC183.123548.4843
> -1.9335\tC.3\t\t1\tnoname\t0.\n2\tC284.405549.1718
> -2.1563\tC.2\t\t1\tnoname\t0.\n3\tO185.578348.5739
> -1.8454\tO.co2\t1\tnoname -0.5000\n4\tO284.432750.3304
> -2.6301\tO.co2\t1\tnoname -0.5000\n5\tH182.711747.9177
>  0.7151\tH\t\t1\tnoname\t0.\n6\tH282.615348.8969
> -1.0496\tH\t\t1\tnoname\t0.\n7\tH382.472748.5910
> -2.8139\tH\t\t1\tnoname\t0.\n@BOND\n1\t1\t2\
> t1\n2\t2\t3\tar\n3\t2\t4\tar\n4\t1\t6\t1\n5\t1\t7\t1\n6\t5\t1\t1\n@
> SUBSTRUCTURE\n1\tnoname\t1\n')
>
> Chem.MolToSmiles(mol)
>
> 'CC(=O)[O-]'
>
> Thanks in advance,
> Maria


Re: [Rdkit-discuss] [Rdkit-devel] Where's the 2018.03 release?

2018-04-04 Thread Maciek Wójcikowski
Hi Greg,

I don't know if it is of any help, but we fixed the recent conda problem
(linking against the Python library) for OpenBabel by adding extra linker
parameters for Mac in the cmake config; see Matt's patch here:
https://github.com/openbabel/openbabel/pull/1807/files


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-04-04 5:39 GMT+02:00 Greg Landrum <greg.land...@gmail.com>:

> Dear all,
>
> As you may have noticed, the new release (2018.03) is late.
>
> This is not, as you might expect, connected to my previous email about the
> backend code changes. It turned out to be much more difficult than
> anticipated to straighten out the problems we were having with newer
> versions of conda, particularly on the Mac, and we didn't want to do a
> release until those were taken care of.
>
> It looks like we're almost there. Hopefully we will be able to do a beta
> of the 2018.03 release by the end of the week.
>
> Best,
> -greg


Re: [Rdkit-discuss] Are atom and bond indexes deterministic?

2018-10-02 Thread Maciek Wójcikowski
Hi Peter and Nils,

To supplement Nils' comment, I'd like to add that writing a SMILES changes
neither the atom nor the bond order of the Mol; the canonical atom output
order is saved in the molecule property "_smilesAtomOutputOrder". This does
not cover bonds. Their order shouldn't change, but if you wish to be safe
it is best to save the two atom indices instead of the bond idx itself.
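
A small sketch of the atom-pair idea, assuming the annotations are stored next to the exact SMILES string that is later used to rebuild the molecule. The helper bond_key and the annotation dictionary are made up for this example.

```python
from rdkit import Chem


def bond_key(bond):
    """Describe a bond by its two atom indices (order-independent)."""
    return tuple(sorted((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())))


smiles = 'CC(=O)O'
mol = Chem.MolFromSmiles(smiles)

# annotate bonds by atom-index pair instead of by bond index
annotations = {bond_key(b): b.GetBondType() for b in mol.GetBonds()}

# later: rebuild the molecule from the same SMILES and look the bonds up again
mol2 = Chem.MolFromSmiles(smiles)
for (i, j), bond_type in annotations.items():
    bond = mol2.GetBondBetweenAtoms(i, j)
    print(i, j, bond_type, bond.GetIdx())
```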


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Tue, 2 Oct 2018 at 22:57, Nils Weskamp wrote:

> Hi Peter,
>
> to the best of my knowledge: for a given SMILES string, you should
> always end up with the same molecule object.
>
> On the other hand, generation of (canonical / unique) SMILES often
> reorders atoms and bonds (to ensure that the SMILES is unique for a
> given structure). A conversion Molecule -> SMILES -> Molecule could thus
> lead to a different ordering of atoms and bonds and you will have to
> canonicalize your structure before you generate your index. [Or make
> sure that you use non-canonical SMILES.]
>
> Best,
> Nils
>
> On 02.10.2018 at 22:32, Peter St. John wrote:
> > If I store a molecule as a SMILES string, along with relevant
> > information about different bonds, is it safe to annotate those bond
> > entries by bond index?
> >
> > I.e., if I create a new rdkit Molecule with
> > rdkit.Chem.MolFromSmiles(xxx), will the bond ordering always be the
> > same? If not, does anyone know a a robust way of specifying a bond
> > within a molecule as a string-based representation?
> >
> > Thanks for the help!
> > -- Peter


Re: [Rdkit-discuss] organometallics?

2018-09-13 Thread Maciek Wójcikowski
I would suggest that all coordination bonds to a metal that exceed the
accepted valence of an atom could be marked as zero-order. This is what
the recent PDB reader changes do, and it fixed a lot of problems with
sanitization.
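
A toy sketch of the effect, assuming an ammonia ligand bound to zinc (the complex and the helper names are invented for illustration). A single bond pushes the nitrogen past its allowed valence, while a zero-order bond does not count towards it:

```python
from rdkit import Chem


def sanitizes(mol):
    """Return True if RDKit can sanitize a copy of the molecule."""
    try:
        Chem.SanitizeMol(Chem.Mol(mol))
        return True
    except Exception:  # valence problems are reported as exceptions
        return False


def ammine_zinc(bond_type):
    """Toy complex: NH3 bound to Zn2+ through a bond of the given type."""
    rw = Chem.RWMol(Chem.MolFromSmiles('[NH3].[Zn+2]', sanitize=False))
    rw.AddBond(0, 1, bond_type)  # atom 0 is N, atom 1 is Zn
    return rw


print(sanitizes(ammine_zinc(Chem.BondType.SINGLE)))
print(sanitizes(ammine_zinc(Chem.BondType.ZERO)))
```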

Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Thu, 13 Sep 2018 at 18:16, Jan Halborg Jensen wrote:

> Here’s a modest step in the right direction
> https://www.wildcardconsulting.dk/useful-information/how-to-solve-problems-with-coordinate-bonds-in-rdkit/
>
> Best regards, Jan
>
> On 13 Sep 2018, at 15:14, Greg Landrum  wrote:
>
> Hi Michal,
>
> Though the RDKit theoretically has many of the infrastructure pieces
> required to handle organometallics (though there's not a lot you can do
> with them once you've loaded them), the difficult part almost always ends
> up being finding input files that have reasonably machine-readable
> structures in them.
>
> If you have some examples you can share, I'd be happy to take a look to
> see if I can suggest ways to read them in.
>
> Best,
> -greg
>
>
> On Wed, Sep 12, 2018 at 10:30 PM Michal Krompiec <
> michal.kromp...@gmail.com> wrote:
>
>> Hello,
>> I've been asked to analyze a dataset of organometallic compounds
>> (provided in SDF), but it turns out that most of them are not compatible
>> with RDKit (due to having pi-alkene, pi-allyl, cyclopentadienyl et al.
>> ligands). The structures can be correctly represented in Marvin, though.
>> Can anybody point me to a toolkit (or RDKit hack) that can handle these?
>> Best,
>> Michal
>>
>> 
>> Dr. Michal Krompiec
>> Adjunct Professor
>> School of Chemistry, University of Southampton
>> Highfield, Southampton SO17 1BJ, UK
>>
>> and
>> Head of Computational Modelling | Performance Materials | Early Research
>> and Business Development
>> Merck


Re: [Rdkit-discuss] numpy array to bit vector

2019-11-14 Thread Maciek Wójcikowski
Hi Thomas,

You could also use SetBitsFromList() method:

    bv.SetBitsFromList(np.where(ar)[0].tolist())
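
Put together as a complete sketch, with the arrays from the question; the function name np_to_bv follows Greg's version, and ar is assumed to be a one-dimensional 0/1 array:

```python
import numpy as np
from rdkit import DataStructs


def np_to_bv(ar):
    """Build an ExplicitBitVect from a 0/1 numpy array via its on-bit indices."""
    bv = DataStructs.ExplicitBitVect(len(ar))
    bv.SetBitsFromList(np.where(ar)[0].tolist())
    return bv


fv1 = np.array([1, 1, 0, 0, 1, 0, 1])
fv2 = np.array([0, 1, 1, 0, 1, 0, 0])

bv1, bv2 = np_to_bv(fv1), np_to_bv(fv2)
print(DataStructs.TanimotoSimilarity(bv1, bv2))
print(DataStructs.McConnaugheySimilarity(bv1, bv2))
```

This avoids the Python-level loop over every element, which should matter mostly for long fingerprints.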


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Thu, 14 Nov 2019 at 16:28, Greg Landrum wrote:

> Hi Thomas,
>
> There may be more efficient ways to do this, but here's something that
> works (and isn't the slowest thing I came up with):
> def np_to_bv(fv):
>     bv = DataStructs.ExplicitBitVect(len(fv))
>     for i,v in enumerate(fv):
>         if v:
>             bv.SetBit(i)
>     return bv
>
> -greg
>
>
>
> On Thu, Nov 14, 2019 at 3:47 PM Thomas Evangelidis 
> wrote:
>
>> Greetings,
>>
>> I am opening this old thread again for someone to answer my initial
>> question this time, which was "How do I convert numpy.ndarray objects to
>> rdkit.DataStructs.ExplicitBitVect objects?". At the time I asked
>> the question I circumvented the problem by calculating Tanimoto
>> similarities with Scipy, but now I want to utilize all similarity functions
>> offered by rdkit.DataStructs. I am struggling with that for quite some time
>> although I feel that the answer is simple.
>>
>> So basically, I have these arrays and want to calculate their
>> DataStructs.McConnaugheySimilarity similarity. How do I do it?
>>
>> fv1 = numpy.array([1,1,0,0,1,0,1])
>>
>>
>> fv2 = numpy.array([0,1,1,0,1,0,0])
>>
>> Thanks in advance.
>> Thomas
>>
>>
>> --
>>
>> ==
>>
>> Dr. Thomas Evangelidis
>>
>> Research Scientist
>>
>> IOCB - Institute of Organic Chemistry and Biochemistry of the Czech
>> Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>
>> , Prague, Czech Republic
>>   &
>> CEITEC - Central European Institute of Technology
>> <https://www.ceitec.eu/>, Brno, Czech Republic
>>
>> email: teva...@gmail.com, Twitter: tevangelidis
>> <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
>> <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
>>
>> website: https://sites.google.com/site/thomasevangelidishomepage/


Re: [Rdkit-discuss] reduced graphs fingerprints in postgresql cartridge

2019-12-17 Thread Maciek Wójcikowski
Hi Peter,

You can index any binary fingerprint (both sparse and explicit). Also, you
can create any custom fp in python and pass it over to postgresql. That
said, I have not managed to transfer a sparse one from python to postgres,
only the explicit.

Best,
Maciek

Tue, 17 Dec 2019, 13:00, Peter Schmidtke <
peter.schmid...@discngine.com> wrote:

> Hi all,
>
> is it possible to index the reduced graphs fingerprints in the pgsql
> cartridge as well? From my understanding the fingerprint provided by rdkit
> isn’t exactly in the same format as for standard morgan fingerprints.
> Would this work anyhow? if yes with which similarity functions in pgsql?
> Anybody ever tried this and has a bit of documentation?
>
> Thanks in advance
>
> Peter


Re: [Rdkit-discuss] reduced graphs fingerprints in postgresql cartridge

2019-12-17 Thread Maciek Wójcikowski
While preparing a more detailed answer for you, I stumbled upon a very
useful blog post by Greg which explains in detail how custom fingerprints
can be handled:
https://rdkit.blogspot.com/2017/04/using-custom-fingerprint-in-postgresql.html

Both Tanimoto and Dice are supported for any sfp/bfp.

Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Tue, 17 Dec 2019 at 18:38, Maciek Wójcikowski wrote:

> Hi Peter,
>
> You can index any binary fingerprint (both sparse and explicit). Also, you
> can create any custom fp in python and pass it over to postgresql. That
> said, I have not managed to transfer a sparse one from python to postgres,
> only the explicit.
>
> Best,
> Maciek
>
> Tue, 17 Dec 2019, 13:00, Peter Schmidtke <
> peter.schmid...@discngine.com> wrote:
>
>> Hi all,
>>
>> is it possible to index the reduced graphs fingerprints in the pgsql
>> cartridge as well? From my understanding the fingerprint provided by rdkit
>> isn’t exactly in the same format as for standard morgan fingerprints.
>> Would this work anyhow? if yes with which similarity functions in pgsql?
>> Anybody ever tried this and has a bit of documentation?
>>
>> Thanks in advance
>>
>> Peter


Re: [Rdkit-discuss] Saving chains from PDB file

2019-10-07 Thread Maciek Wójcikowski
If splitting is the only thing you want to do, turning off sanitization
should make it very fast too.
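
Going one step further, the split can be done on the raw text without building
molecules at all. This is a minimal sketch, assuming standard fixed-column PDB
records where the chain ID sits in column 22; `split_pdb_by_chain` is a made-up
helper, not an RDKit function.

```python
from collections import defaultdict


def split_pdb_by_chain(pdb_text):
    """Group ATOM/HETATM records by chain ID (column 22 of a PDB record)."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        # Only coordinate records carry a chain ID in this column.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line)
    # One PDB-formatted text block per chain, terminated by END.
    return {cid: "\n".join(lines) + "\nEND\n" for cid, lines in chains.items()}
```

For gzipped input the text can be read with `gzip.open(path, 'rt')`, and each
returned block written to its own file. Header, TER and CONECT records are
dropped in this sketch.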

On Mon, 7 Oct 2019, 10:31, Téletchéa Stéphane <
stephane.teletc...@univ-nantes.fr> wrote:

> On 05/10/2019 at 12:46, Chris Swain via Rdkit-discuss wrote:
> > Hi,
> >
> > I have a number of PDB files (foo.pdb.gz) and I want to separate each
> chain in each file out into a separate file. So if a file contains 4 chains
> it will generate 4 separate files.
> >
> > Can I do this using RDKit, if so how?
> >
> > Cheers
> >
> > Chris
>
> Dear Chris,
>
> Even though this could be done in RDKit, I would recommend using
> an external tool, for instance Biopython and its Bio.PDB module
> (https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ),
> or even ProDy (http://prody.csb.pitt.edu/).
>
> RDKit needs to process a lot of atom definitions to load the PDB file
> properly, and that takes time (minutes on my machine, which is a decent
> workstation :-).
> Compared to RDKit, Bio.PDB or ProDy will be lightning fast.
>
> If you still want to use RDKit only, and need to reuse RDKit's
> representation of the PDB file, then (c)pickle it (Python 2):
>
> import cPickle
> from rdkit import Chem
>
> def processReceptor(r):
>     # Load the cached molecule if it exists; otherwise parse the PDB file
>     # and cache the result. Binary mode keeps the pickle intact.
>     try:
>         with open('receptor.pkl', 'rb') as h:
>             receptor = cPickle.load(h)
>     except Exception:
>         receptor = Chem.MolFromPDBFile(r)
>         with open('receptor.pkl', 'wb') as f:
>             cPickle.dump(receptor, f)
>
>     return receptor
>
> HTH,
>
> Stéphane
>
> --
> Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein
> Design In Silico
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322
> Nantes cedex 03, France
> Tél : +33 251 125 636 / Fax : +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org
>
>


Re: [Rdkit-discuss] Inchi which flavour??

2019-10-09 Thread Maciek Wójcikowski
Mike,

On top of what Greg said, what might be particularly useful is the options
parameter, through which you can pass non-default parameters to the InChI call.

On Wed, 9 Oct 2019, 07:22, Greg Landrum wrote:

> Hi Mike,
>
> The InChI API itself is not exposed. The contents of the module are in the
> documentation along with some explanations of how to call it:
> http://rdkit.org/docs/source/rdkit.Chem.rdinchi.html
>
> If something is missing there, please let us know.
> -greg
>
>
> On Tue, Oct 8, 2019 at 5:20 PM  wrote:
>
>> Dear RdKit users,
>> I was reading the inchi module docs and I couldn't find methods to call
>> the InChI API.  Are these exposed in RDKit?
>> It says the default is the standard Inchi.  What happens when this
>> conversion fails?
>>
>> Thanks,
>> Mike


Re: [Rdkit-discuss] Saving chains from PDB file

2019-10-05 Thread Maciek Wójcikowski
Paolo and Chris,

There is actually an RDKit function for this very task: SplitMolByPDBChainId
http://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.SplitMolByPDBChainId


On Sat, 5 Oct 2019, 14:42, Paolo Tosco wrote:

> Hi Chris,
>
> The following, though quite inefficient, will work:
>
> from rdkit import Chem
>
> mol = Chem.MolFromPDBFile("1CX2.pdb")
> chains = {a.GetPDBResidueInfo().GetChainId() for a in mol.GetAtoms()}
> chain_mols = {c: Chem.RWMol(mol) for c in chains}
> for c, m in chain_mols.items():
>     bonds_to_remove = [
>         (b.GetBeginAtomIdx(), b.GetEndAtomIdx())
>         for b in m.GetBonds()
>         if b.GetBeginAtom().GetPDBResidueInfo().GetChainId() != c
>         or b.GetEndAtom().GetPDBResidueInfo().GetChainId() != c
>     ]
>     atoms_to_remove = [
>         a.GetIdx() for a in m.GetAtoms()
>         if a.GetPDBResidueInfo().GetChainId() != c
>     ]
>     for b in bonds_to_remove:
>         m.RemoveBond(*b)
>     # Remove atoms from the highest index down so remaining indices stay valid.
>     for a in sorted(atoms_to_remove, reverse=True):
>         m.RemoveAtom(a)
>     Chem.MolToPDBFile(m, "{0:s}.pdb".format(c))
>
> Individual chains are saved to .
>
> As chains will be separate fragments, a more efficient way would be to use
> rdmolops.GetMolFrags(asMols=True), which would avoid the bond/atom removal.
>
> Sorry for the poor formatting but this is what I could come up with
> IPython on the iPhone :-(
>
> p.
>
> > On 5 Oct 2019, at 12:46, Chris Swain via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
> >
> > Hi,
> >
> > I have a number of PDB files (foo.pdb.gz) and I want to separate each
> chain in each file out into a separate file. So if a file contains 4 chains
> it will generate 4 separate files.
> >
> > Can I do this using RDKit, if so how?
> >
> > Cheers
> >
> > Chris
> >


Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Maciek Wójcikowski
Thanks, Nils, for pointing the list to both algorithms. Interestingly, Greg is
putting together a scaffold tree implementation in this PR
https://github.com/rdkit/rdkit/pull/2911 so anyone will be able to try it in
the near future, hopefully in a 2020 release.
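
The ring-system lookup Nils suggests boils down to a set-inclusion test. This
is a minimal sketch, assuming each query and molecule has already been reduced
to a set of canonical ring-system strings by some fragmentation step. The
strings and names below are placeholders, not actual output of HierS or the
scaffold tree code.

```python
def may_be_substructure(query_rings, mol_rings):
    """Cheap necessary condition: every ring system of the query occurs in the molecule."""
    return query_rings <= mol_rings


# Hypothetical precomputed index: molecule name -> set of ring-system keys.
mol_index = {
    "mol_1": {"c1ccccc1", "C1CCNCC1"},
    "mol_2": {"c1ccncc1"},
}
query = {"c1ccccc1"}

# Only the survivors need the expensive graph-based substructure match.
candidates = [name for name, rings in mol_index.items()
              if may_be_substructure(query, rings)]
print(candidates)
```

Passing this test does not prove a match; it only rules out molecules that
cannot contain the query.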

Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Mon, 10 Feb 2020 at 21:40, Nils Weskamp wrote:

> Hi Alexis,
>
> if you go down that route and calculate artificial skeletons, you could
> also go all the way and use an algorithm like HierS [1] or the scaffold
> tree [2] to perform a recursive fragmentation of your queries and
> molecules into their various rings and ring systems. If a query contains
> a ring system that is not present in the molecule, it cannot be a
> substructure.
>
> This is something you should be able to check with basic string matching
> or lookups in dictionaries / hashes instead of doing fingerprint
> calculations and comparisons.
>
> Not sure if that is actually faster, but might be worth a try.
>
> Hope this helps,
> Nils
>
> [1] https://pubs.acs.org/doi/abs/10.1021/jm049032d
> [2] https://pubs.acs.org/doi/10.1021/ci600338x
>
> On 10.02.2020 at 21:01, Alexis Parenty wrote:
> > Hi Maciek, thanks for your response. I did try that function too, but it
> > also takes smiles only (not smarts). I think the solution of Gregori is
> > very interesting: I am going to transform all smiles and smarts into
> > their single-bonded-carbon-based skeleton and will store the pattern
> > fingerprint of those skeletons in a dictionary using the smarts or the
> > smiles as a key. Then I will use your proposed function to match the
> > sub-skeletons with skeletons and will only do the expensive molecular
> > graph substructure search of the keys of the dictionary from which the
> > dictionary values have been identified as potential substructure of
> > others. Thanks Gregori!
> > Any other good tips?
> > Cheers,
> > Alexis
> >
> > On Mon, 10 Feb 2020 at 20:33, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
> >
> > Alexis,
> >
> > I believe that `DataStructs.AllProbeBitsMatch(query_fp,mol_fp)` is
> > the function you are looking for here. More advanced usage and code
> > snippets you can find on RDKit blog post that Greg has put together
> > here:
> https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
> >
> > Best,
> > Maciek
> >
> > 
> > Pozdrawiam,  |  Best regards,
> > Maciek Wójcikowski
> > mac...@wojcikowski.pl
> >
> >
> > On Mon, 10 Feb 2020 at 16:10, Alexis Parenty
> > <alexis.parenty.h...@gmail.com> wrote:
> >
> > Dear Rdkiters,
> >
> > I am interested in doing substructure searches between many
> > thousands structures and many thousands of fragments, as quickly
> > as possible, with reasonable accuracy (> 0.95)...
> >
> > I did read Greg's excellent post on that subject:
> >
> >
> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
> >
> > I was using the rdkit pattern fingerprint approach to filter out
> > any fragments that have no chance of matching the bigger
> > structure through the slow and more accurate molecular graph
> > approach, saving a lot of time.
> >
> > However, I realized that this rdkit pattern fingerprint approach
> > only works well if we compared smiles with smiles:
> >
> >
> >
> > def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
> >     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
> >     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
> >
> >     frag_bits = set(pfp_frag.GetOnBits())
> >     structure_bits = set(pfp_structure.GetOnBits())
> >
> >     if frag_bits.issubset(structure_bits):
> >         return True
> >     else:
> >         return False
> >
> >
> >
> > Unfortunately, some of my fragments are Smarts that are not
> > valid Smiles: Using Chem.MolFromSmarts(smarts) gives really poor
> > result (Many False Positives leading to poor Specificity).
> > Interestingly, there is no False Negative, leading to a
> > Sensitivity of 1!
> >
> >
> >
> > def frag_is_a_substructure_of_structure_via_pfp(

Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Maciek Wójcikowski
The function takes two explicit or sparse bit vectors. Could you elaborate
on what you mean by it accepting SMILES only? Pattern fingerprints will work
with SMARTS too.

It is always more effective to make the SMARTS as explicit as possible.
If a query atom allows many alternatives, the fingerprint cannot make many
assumptions about the molecule, so things like filling the valences on
atoms and defining bonds explicitly as single will help a lot. For very
small SMARTS the screen-out rate might be low anyhow, unfortunately.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Mon, 10 Feb 2020 at 21:02, Alexis Parenty wrote:

> Hi Maciek, thanks for your response. I did try that function too, but it
> also takes smiles only (not smarts). I think the solution of Gregori is
> very interesting: I am going to transform all smiles and smarts into their
> single-bonded-carbon-based skeleton and will store the pattern fingerprint
> of those skeletons in a dictionary using the smarts or the smiles as a key.
> Then I will use your proposed function to match the sub-skeletons with
> skeletons and will only do the expensive molecular graph substructure
> search of the keys of the dictionary from which the dictionary values have
> been identified as potential substructure of others. Thanks Gregori!
> Any other good tips?
> Cheers,
> Alexis
>
> On Mon, 10 Feb 2020 at 20:33, Maciek Wójcikowski 
> wrote:
>
>> Alexis,
>>
>> I believe that `DataStructs.AllProbeBitsMatch(query_fp,mol_fp)` is the
>> function you are looking for here. More advanced usage and code snippets
>> you can find on RDKit blog post that Greg has put together here:
>> https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
>>
>> Best,
>> Maciek
>>
>> 
>> Pozdrawiam,  |  Best regards,
>> Maciek Wójcikowski
>> mac...@wojcikowski.pl
>>
>>
>> On Mon, 10 Feb 2020 at 16:10, Alexis Parenty wrote:
>>
>>> Dear Rdkiters,
>>>
>>> I am interested in doing substructure searches between many thousands
>>> structures and many thousands of fragments, as quickly as possible, with
>>> reasonable accuracy (> 0.95)...
>>>
>>> I did read Greg's excellent post on that subject:
>>>
>>>
>>> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>>>
>>> I was using the rdkit pattern fingerprint approach to filter out any
>>> fragments that have no chance of matching the bigger structure through the
>>> slow and more accurate molecular graph approach, saving a lot of time.
>>>
>>> However, I realized that this rdkit pattern fingerprint approach only
>>> works well if we compared smiles with smiles:
>>>
>>>
>>>
>>> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>>>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
>>>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>>>
>>>     frag_bits = set(pfp_frag.GetOnBits())
>>>     structure_bits = set(pfp_structure.GetOnBits())
>>>
>>>     if frag_bits.issubset(structure_bits):
>>>         return True
>>>     else:
>>>         return False
>>>
>>>
>>>
>>> Unfortunately, some of my fragments are Smarts that are not valid
>>> Smiles: Using Chem.MolFromSmarts(smarts) gives really poor result (Many
>>> False Positives leading to poor Specificity). Interestingly, there is no
>>> False Negative, leading to a Sensitivity of 1!
>>>
>>>
>>>
>>> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>>>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
>>>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>>>
>>>     frag_bits = set(pfp_frag.GetOnBits())
>>>     structure_bits = set(pfp_structure.GetOnBits())
>>>
>>>     if frag_bits.issubset(structure_bits):
>>>         return True
>>>     else:
>>>         return False
>>>
>>>
>>>
>>> Is there a way to use pattern fingerprint (or other method) for
>>> substructure searches independently of the Smiles/Smarts format of the
>>> fragments? If not, is mol_struct.HasSubstructMatch(mol_frag) the only way I
>>> am left with?
>>>
>>> Many thanks,
>>>
>>> Alexis


Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Maciek Wójcikowski
Alexis,

I believe that `DataStructs.AllProbeBitsMatch(query_fp,mol_fp)` is the
function you are looking for here. More advanced usage and code snippets
you can find on RDKit blog post that Greg has put together here:
https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
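
The idea behind that screen can be sketched on plain Python integers used as
bit vectors. This is an illustration of the logic only: the real fingerprints
and the `AllProbeBitsMatch` call come from RDKit, while the helper names here
are made up and `exact_match` stands in for the expensive graph match.

```python
def all_probe_bits_match(probe, target):
    """True if every bit set in `probe` is also set in `target`."""
    return (probe & target) == probe


def screened_search(query_fp, mol_fps, exact_match):
    """Run the expensive `exact_match(i)` only on molecules that pass the bit screen."""
    hits = []
    for i, fp in enumerate(mol_fps):
        if all_probe_bits_match(query_fp, fp) and exact_match(i):
            hits.append(i)
    return hits


mol_fps = [0b101101, 0b000110, 0b111111]
query_fp = 0b000101

# Pretend the graph match would confirm molecules 0 and 1 only.
confirmed = {0, 1}
print(screened_search(query_fp, mol_fps, lambda i: i in confirmed))
```

Molecule 1 never reaches the graph match because it fails the screen, and
molecule 2 passes the screen but is rejected by the exact check. A passing
screen never guarantees a match, but a failing one rules it out.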

Best,
Maciek


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Mon, 10 Feb 2020 at 16:10, Alexis Parenty wrote:

> Dear Rdkiters,
>
> I am interested in doing substructure searches between many thousands
> structures and many thousands of fragments, as quickly as possible, with
> reasonable accuracy (> 0.95)...
>
> I did read Greg's excellent post on that subject:
>
>
> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>
> I was using the rdkit pattern fingerprint approach to filter out any
> fragments that have no chance of matching the bigger structure through the
> slow and more accurate molecular graph approach, saving a lot of time.
>
> However, I realized that this rdkit pattern fingerprint approach only
> works well if we compared smiles with smiles:
>
>
>
> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>
>     frag_bits = set(pfp_frag.GetOnBits())
>     structure_bits = set(pfp_structure.GetOnBits())
>
>     if frag_bits.issubset(structure_bits):
>         return True
>     else:
>         return False
>
>
>
> Unfortunately, some of my fragments are Smarts that are not valid Smiles:
> Using Chem.MolFromSmarts(smarts) gives really poor result (Many False
> Positives leading to poor Specificity). Interestingly, there is no False
> Negative, leading to a Sensitivity of 1!
>
>
>
> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>
>     frag_bits = set(pfp_frag.GetOnBits())
>     structure_bits = set(pfp_structure.GetOnBits())
>
>     if frag_bits.issubset(structure_bits):
>         return True
>     else:
>         return False
>
>
>
> Is there a way to use pattern fingerprint (or other method) for
> substructure searches independently of the Smiles/Smarts format of the
> fragments? If not, is mol_struct.HasSubstructMatch(mol_frag) the only way I
> am left with?
>
> Many thanks,
>
> Alexis


Re: [Rdkit-discuss] Observations about RDKit performance: PatternFingerprinter, Windows, Linux and Virtual machines

2020-01-23 Thread Maciek Wójcikowski
Thomas,

Could you double check whether your VM exposes the same instruction set as
your host? Hardware popcounts are used to accelerate fingerprint operations,
so their availability might have a profound impact on performance. SSE4.2 is
probably the one that is used in the RDKit (at least this is stated in the
code).

For KVM, see https://www.linux-kvm.org/page/Tuning_KVM (it lists Linux commands
to check what is available on the guest, so it might be helpful for you too).
It also seems that in the VMware world this might be tricky, as it is
considered to be a stability hazard:
https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.vcenterhost.doc_50%2FGUID-8B226625-4923-410C-B7AF-51BCD2806A3B.html
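
On a Linux guest, one quick check is whether the CPU flags the guest sees
include the relevant instruction sets. This is a minimal sketch, assuming the
usual `/proc/cpuinfo` layout with a `flags` line and the flag names `sse4_2`
and `popcnt`; `missing_cpu_flags` is a made-up helper.

```python
def missing_cpu_flags(cpuinfo_text, wanted=("sse4_2", "popcnt")):
    """Return the wanted CPU flags that do not appear on any 'flags' line."""
    seen = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            seen.update(value.split())
    return [flag for flag in wanted if flag not in seen]


# Sample text for illustration; on a real machine read /proc/cpuinfo instead.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 ssse3 sse4_1 popcnt\n"
print(missing_cpu_flags(sample))
```

Running `missing_cpu_flags(open('/proc/cpuinfo').read())` on both host and
guest and comparing the results would show whether the VM hides anything.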

Best,
Maciek


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Thu, 23 Jan 2020 at 08:15, Thomas Strunz wrote:

> Hi Greg,
>
> reopening this old question. I can see that there are potential
> differences between RDKit versions and especially between Linux and Windows,
> but let's leave that aside for now.
>
> After further "playing around", however, I really have the impression there
> is a real issue with running RDKit (or Python?) in a virtualized operating
> system. Since most production software and/or cloud deployments will
> mostly run in a virtualized operating system, I think this should be a
> fairly relevant topic worth investigating. As you showed yourself, the AWS
> system also was fairly slow.
>
> For the following observations I'm keeping the same dataset as before, which
> is from your blog post ( /Regress/Scripts/fingerprint_screenout.py).
> Basically it's that code, slightly adapted:
>
> import gzip
>
> from rdkit import Chem, DataStructs
>
> # data_dir must point to the directory that holds the two data files.
> mols = []
> with gzip.open(data_dir + 'chembl21_25K.pairs.txt.gz', 'rb') as inf:
>     for line in inf:
>         line = line.decode().strip().split()
>         smi1 = line[1]
>         smi2 = line[3]
>         m1 = Chem.MolFromSmiles(smi1)
>         m2 = Chem.MolFromSmiles(smi2)
>         mols.append(m1)
>         mols.append(m2)
>
> frags = [Chem.MolFromSmiles(x.split()[0])
>          for x in open(data_dir + 'zinc.frags.500.q.smi', 'r')]
>
> mfps = [Chem.PatternFingerprint(m, 512) for m in mols]
> fragsfps = [Chem.PatternFingerprint(m, 512) for m in frags]
>
> %%timeit -n1 -r1
> for i, fragfp in enumerate(fragsfps):
>     hits = 0
>     for j, mfp in enumerate(mfps):
>         if DataStructs.AllProbeBitsMatch(fragfp, mfp):
>             if mols[j].HasSubstructMatch(frags[i]):
>                 hits = hits + 1
>
>
> I want to focus on the last cell, namely the "AllProbeBitsMatch" method:
>
> %%timeit
> DataStructs.AllProbeBitsMatch(fragsfps[10], mfps[10])
>
> Results:
>
> Windows 10 native, i7-8850H:
>     567 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> Lubuntu 16.04 virtualized, i7-8850H:
>     1.81 µs ± 56.7 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
>     // the high variation is consistent
> Windows Server 2012 R2 virtualized, Xeon E5-2620 v4:
>     1.18 µs ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> So virtualization seems to reduce the performance of this
> specific method by half, which is also what I see when running the full
> substructure search code: it takes double the time on the virtualized
> machines. (The Windows server actually runs on ESX (i.e. a type 1 hypervisor)
> while the Lubuntu VM is a type 2 (VMware Workstation), but both seem to
> suffer the same.)
>
> We can try the same thing with
>
> %%timeit
> mols[10].HasSubstructMatch(frags[10])
>
> The difference here is smaller but VMs also take >50% more time.
>
> So there seems to be a consistent large performance impact in VMs.
>
> Of course the VM will be a bit slower but not by that much? What am I
> missing? Other experiences?
>
> Best Regards,
>
> Thomas
> --
> *From:* Greg Landrum
> *Sent:* Monday, 16 December 2019 17:10
> *To:* Thomas Strunz
> *Cc:* rdkit-discuss@lists.sourceforge.net <
> rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] Observations about RDKit performance:
> PatternFingerprinter, Windows, Linux and Virtual machines
>
> Hi Thomas,
>
> First it is important to compare equivalent major versions to each other.
> Particularly in this case. On my linux box generating the pattern
> fingerprints takes 24.2 seconds with v2019.03.x and 15.9 seconds with
> v2019.09.x (that's due to the improvements in the substructure matcher that
> the blog post you link to discusses).
>
> Comparing the same versions to each other:
>
> Performance on windows vs linux
> Windows performance with the RDKit has always lagged behind linux
> performance

Re: [Rdkit-discuss] Can we use Rdkit to generate pdbqt file for AutoDock Vina?

2020-04-11 Thread Maciek Wójcikowski
Hi Zhenting,

RDKit does not have a PDBQT writer, as you mentioned, but I have implemented
one in ODDT:
https://oddt.readthedocs.io/en/latest/rst/oddt.toolkits.extras.rdkit.html#oddt.toolkits.extras.rdkit.MolToPDBQTBlock

I should also warn you that reading molecules back from PDBQT might mess up
your bond orders.

In ODDT it is also possible to run a docking "pipeline" with Vina using
RDKit exclusively, with SDFs as input/output.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Sat, 11 Apr 2020 at 15:57, Zhenting Gao <183310...@qq.com> wrote:

> Hi,
>
> I've got a solution now.
> Here are the key points:
> 1. OpenBabel can convert SDF to PDBQT. During conversion, non-polar
> hydrogens will be removed.
> 2. OpenBabel can assign Gasteiger charges to the molecule. Although the
> charges assigned by OpenBabel and AutoDockTools differ, AutoDock
> Vina ignores the charges in the input file.
>
> HTH
> Zhenting
>
>
> -- Original --
> *From:* "我自己的邮箱"<183310...@qq.com>;
> *Date:* Fri, Apr 10, 2020 12:35 PM
> *To:* "Rdkit-discuss";
> *Subject:* Can we use Rdkit to generate pdbqt file for AutoDock Vina?
>
> Hi there,
>
> Regarding open-source CADD tools, RDKit and AutoDock are really
> essential.
> Can we use RDKit to generate a PDBQT file for AutoDock Vina?
> Please also let me know if you have solutions other than RDKit.
>
> I am trying to leverage MGLTools now.
>
> Best regards
> Zhenting


Re: [Rdkit-discuss] Looking for additional GSoC co-mentors

2020-04-03 Thread Maciek Wójcikowski
Hi,

I'm happy to help with Python projects.

On Wed, 1 Apr 2020, 21:31, Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> Wearing my "admin for Open Chemistry" hat for the moment.. As a community,
> we've been very lucky to get a lot of good open source chemistry
> development done over the last few years.
>
> Moreover, many of these students are now exposed to RDKit,
> cheminformatics, and good coding. (In other words, they're good recruits
> for both academic and industrial positions. ;-)
>
> Google is never open about their criteria for allocating the number of
> slots to organizations. Still, groups with more mentors are more likely to
> get more slots.
>
> I'll be more blunt than Greg - without more mentors we will be forced to
> make very hard choices. We'll probably still need to do that, but finding a
> few more co-mentors will certainly help RDKit.
>
> -Geoff
>
>
> On Apr 1, 2020, at 2:16 AM, Greg Landrum  wrote:
>
> Dear all,
>
> Yesterday was the last day for potential students to submit applications
> for Google Summer of Code and we got proposals for the following five
> projects:
> 1) RDKit integration with MongoDB. Python based
> 2) Implement a generalized file reader and a multi-threaded file reader.
> C++ based
> 3) Integrating trained neural networks (specifically ANI-like force
> fields) into the RDKit. C++ based
> 4) RDKit-OpenMM integration. C++ based
> 5) Improved RDKit integration with Jupyter, Dask, Pandas, Plotly, and
> Bokeh. Python based
>
> There's more about the first four projects here:
> https://wiki.openchemistry.org/GSoC_Ideas_2020#RDKit_Project_Ideas and
> I'm happy to answer questions about them.
>
> We have a mentor and co-mentor for each of these projects, but I'm looking
> for a few more people to act as co-mentors. It doesn't make sense to do a
> project without at least two mentors - one primary mentor and one
> co-mentor- and Google is strongly encouraging organizations to have three
> mentors - one primary and two co-mentors - available for each project.
> Being a co-mentor is going to require an hour or two a week on average over
> the course of the program.
>
> There's more information on the program, including the dates, here:
> https://summerofcode.withgoogle.com/how-it-works/
>
> GSoC is really a great program and being a mentor/co-mentor is a good way
> to help move both the open-source community and the RDKit forward.
>
> If you're interested or have questions, feel free to send me email,
> -greg
>


Re: [Rdkit-discuss] Handling PDB files

2020-07-23 Thread Maciek Wójcikowski
Hi Tim,

You have SplitMolByPDBChainId and SplitMolByPDBResidues:
http://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.SplitMolByPDBChainId
They are moderately useful, to be honest, but they work as advertised. I
would prefer a more consistent way, like mol.GetChains() or mol.GetResidues(),
but we have no such option, unfortunately.

Best,
Maciek


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Thu, 23 Jul 2020 at 16:41, Tim Dudgeon wrote:

> RDKit can read PDB files but is there any functionality to work with the
> resulting molecule at the chain and residue level?


Re: [Rdkit-discuss] Removing hydrogen atoms without neighbors

2021-01-21 Thread Maciek Wójcikowski
Hi Navid,

Last but not least, recent versions of Chem.RemoveHs accept additional
parameters which include an option to remove zero degree Hs
http://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RemoveHsParameters.removeDegreeZero

    params = Chem.RemoveHsParameters()
    params.removeDegreeZero = True
    mol_nohs = Chem.RemoveHs(mol, params)



Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Thu, 21 Jan 2021 at 16:58, Paolo Tosco wrote:

> Hi Navid,
>
> if I interpret your question correctly, either of these should do what you
> need:
> Chem.DeleteSubstructs(mol, Chem.MolFromSmarts("[#1X0]"))
> Chem.DeleteSubstructs(mol, Chem.MolFromSmarts("[#1]"), onlyFrags=True)
>
> HTH,
> p.
>
> On Wed, Jan 20, 2021 at 5:38 PM Navid Shervani-Tabar 
> wrote:
>
>> Dear all,
>>
>> I was wondering if there is a function to remove "hydrogen atoms without
>> neighbors" from the mol object. Thanks!
>>
>> Regards,
>> Navid


Re: [Rdkit-discuss] install on macosx with Python 3.8

2021-06-24 Thread Maciek Wójcikowski
Hi Michał,

conda-forge is your solution - you probably checked the legacy rdkit channel.

Best,
Maciek

On Thu, 24 Jun 2021, 19:59, Michal Krompiec <
michal.kromp...@gmail.com> wrote:

> Hello,
> Is it possible to install RDKit on MacOSX in a Python 3.8 environment?
> There is no conda binary for 3.8, so I tried homebrew. But the following
> gives me an error message (brew doesn't like the --with-python3 argument):
>
> brew install rdkit --with-python3 --without-numpy
>
> So I did just "brew install rdkit", but then rdkit is unimportable in
> Python ("No module named 'rdkit'"). What am I doing wrong?
>
> I'm using brew 3.2.0 on MacOS 11.4
>
>
> Thanks in advance,
>
>
> Michal Krompiec


Re: [Rdkit-discuss] explicit H atoms

2021-03-09 Thread Maciek Wójcikowski
Hi,

I'd say that for tetrahedral stereo it is possible to remove all of the Hs.
But for double bonds it might not be as easy, or even possible, in some edge
cases - conjugated double bonds in particular. Here is one:

[image: image.png]
[H]\C(=C/C)C=C\C([H])=C\C



Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Tue, 9 Mar 2021 at 10:47, Paul Emsley wrote:

> On 09/03/2021 09:01, Jean-Marc Nuzillard wrote:
> > Sure, testosterone may be drawn as
> > [snip]
>
> OK :-)
>
> That's a top quality rendering by the way. How did you make it?
>
> Paul.
>
>


Re: [Rdkit-discuss] explicit H atoms

2021-03-09 Thread Maciek Wójcikowski
Hi Jean-Marc,

I know you can draw them, but both SMILES and RDKit internally use two
bond directions (up/down) to assign the double bond stereo, which means that
there are not enough bonds to define the configuration of both double bonds
and leave the middle one undefined at the same time.

Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Tue, 9 Mar 2021 at 13:14, Jean-Marc Nuzillard wrote:

> Hi Maciek,
>
> I would find your example rather readable even without explicit H atoms.
>
>
>
> I drew it like that because I do not have the wavy wedge at hand.
>
> Thanks for your proposal,
> Best,
>
> Jean-Marc
>
>
>
> On 09/03/2021 at 11:26, Maciek Wójcikowski wrote:
>
> Hi,
>
> I'd say that for tetrahedral stereo it is possible to remove all of the
> Hs. But for double bonds it might not be as easy, or might even be
> impossible in some edge cases, conjugated double bonds in particular.
> Here is one:
>
> [image: image.png]
> [H]\C(=C/C)C=C\C([H])=C\C
>
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
>
> Tue, 9 Mar 2021 at 10:47, Paul Emsley wrote:
>
>> On 09/03/2021 09:01, Jean-Marc Nuzillard wrote:
>> > Sure, testosterone may be drawn as
>> > [snip]
>>
>> OK :-)
>>
>> That's a top quality rendering by the way. How did you make it?
>>
>> Paul.
>>
>>
>>
>
>
>
>
>
> --
> Jean-Marc Nuzillard
> Directeur de Recherches au CNRS
>
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/icmr
> http://eos.univ-reims.fr/LSD/CSNteam.html
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
>


Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Maciek Wójcikowski
Hi Pat,

What I found useful in the past is to do the RDKit imports inside the
functions that Dask runs. Not very elegant, but it works.
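
A minimal sketch of that workaround, reusing the function names from Pat's
script. The presumed reason it helps is that the functions no longer hold a
module-level reference to the Boost.Python function when Dask serializes
them; that explanation is an assumption, not something verified here. The
two-row DataFrame is made-up example data, and Dask itself is left out to
keep the sketch short.

```python
import pandas as pd


def calc_bcut(smi):
    # Imports are done here, inside the worker function, rather than at
    # module level (the workaround suggested above).
    from rdkit import Chem
    from rdkit.Chem.rdMolDescriptors import BCUT2D

    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        return None
    # BCUT2D returns a sequence of floats; store it as a plain tuple.
    return tuple(BCUT2D(mol))


def bcut_df(df):
    # Apply to the SMILES column of one partition.
    return df.SMILES.apply(calc_bcut)


if __name__ == "__main__":
    df = pd.DataFrame(
        {"SMILES": ["CCO", "c1ccccc1"], "Name": ["ethanol", "benzene"]}
    )
    print(bcut_df(df))
```

With Dask, bcut_df would be passed to map_partitions in the same way as in
the original script.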

Best,
Maciek

Mon, 22 Mar 2021, 14:30, Patrick Walters wrote:

> 2020.09.5
>
> On Mon, Mar 22, 2021 at 9:24 AM Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
>> Hi Pat,
>>
>>
>>
>> Hmm, I get the same error as you.
>>
>>
>>
>> By the way, I had to change the code to use this
>>
>> from rdkit.Chem.rdMolDescriptors import CalcExactMolWt
>>
>> to avoid another error.
>>
>> Which version of RDKit do you use?
>>
>>
>>
>> BR
>>
>>
>>
>> Guillaume
>>
>>
>>
>>
>>
>> *From: *Patrick Walters
>> *Date: *Monday, 22 March 2021 at 14:20
>> *To: *Guillaume GODIN
>> *Cc: *rdkit-discuss
>> *Subject: *Re: [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>>
>>
>>
>> The input is just SMILES and molecule name separated by a space.   I've
>> attached an example.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Pat
>>
>>
>>
>>
>>
>> On Mon, Mar 22, 2021 at 9:13 AM Guillaume GODIN <
>> guillaume.go...@firmenich.com> wrote:
>>
>> Hi Pat,
>>
>>
>>
>> Do you have a small example file to work with, or can I use esol.csv, for
>> example?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Guillaume
>>
>>
>>
>> *From: *Patrick Walters
>> *Date: *Monday, 22 March 2021 at 13:51
>> *To: *rdkit-discuss
>> *Subject: *[*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>>
>> Apologies, there was a bug in the code I sent in my previous message.
>> The problem is the same.  Here is the corrected code in a gist.
>>
>>
>>
>> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters 
>> wrote:
>>
>> Hi All,
>>
>>
>>
>> I've been trying to calculate BCUT2D descriptors in parallel with Dask
>> and get this error with the code below.
>>
>> TypeError: cannot pickle 'Boost.Python.function' object
>>
>>
>>
>> Everything works if I call mw_df, which calculates molecular weight, but
>> I get the error above if I call bcut_df.  Does anyone have a workaround?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Pat
>>
>>
>>
>> #!/usr/bin/env python
>>
>> import sys
>> import dask.dataframe as dd
>> import pandas as pd
>> from rdkit import Chem
>> from rdkit.Chem.Descriptors import MolWt
>> from rdkit.Chem.rdMolDescriptors import BCUT2D
>> import time
>>
>> # --  molecular weight functions
>> def calc_mw(smi):
>>     mol = Chem.MolFromSmiles(smi)
>>     return MolWt(mol)
>>
>> def mw_df(df):
>>     return df.SMILES.apply(calc_mw)
>>
>> # -- bcut functions
>> def bcut_df(df):
>>     return df.apply(calc_bcut)
>>
>> def calc_bcut(smi):
>>     mol = Chem.MolFromSmiles(smi)
>>     return BCUT2D(mol)
>>
>> def main():
>>     start = time.time()
>>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>>     ddf = dd.from_pandas(df,npartitions=16)
>>     ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
>>     ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
>>     print(time.time()-start)
>>     print(ddf.head())
>>
>>
>> if __name__ == "__main__":
>>     main()
>>
>>
>> ***
>> DISCLAIMER
>> This email and any files transmitted with it, including replies and
>> forwarded copies (which may contain alterations) subsequently transmitted
>> from Firmenich, are confidential and solely for the use of the intended
>> recipient. The contents do not represent the opinion of Firmenich except to
>> the extent that it relates to their official business.
>>
>> ***
>>
>>
>>


Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Maciek Wójcikowski
Hi Lewis,

You can try PreparePDBMol from ODDT
https://github.com/oddt/oddt/blob/master/oddt/toolkits/extras/rdkit/fixer.py#L623-L669
which we used in PLEC model training; PDBFixer didn't work for us
either. Note that once you have the correct bonding, you can disable
automatic bonding in RDKit with proximityBonding=False.
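
A minimal sketch of what proximityBonding=False changes, assuming the bonds
are supplied as CONECT records. The three-atom PDB block is made up for
illustration (the residue name LIG and the coordinates are hypothetical), and
sanitize=False is used only so that bonds can be counted whatever the
geometry looks like.

```python
from rdkit import Chem


def hetatm(serial, name, x, y, z, element):
    """Format one fixed-column HETATM record for a made-up residue LIG."""
    return (
        f"HETATM{serial:5d} {name:<4s} LIG A   1    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}"
    )


PDB_BLOCK = "\n".join(
    [
        hetatm(1, " C1 ", 0.000, 0.000, 0.000, "C"),
        hetatm(2, " O1 ", 1.430, 0.000, 0.000, "O"),
        # O2 is deliberately placed too close to C1 and has no CONECT record.
        hetatm(3, " O2 ", 0.000, 1.000, 0.000, "O"),
        "CONECT    1    2",
        "END",
    ]
) + "\n"


def bond_count(pdb_block, proximity_bonding):
    """Parse the block and return how many bonds RDKit created."""
    mol = Chem.MolFromPDBBlock(
        pdb_block, sanitize=False, proximityBonding=proximity_bonding
    )
    return mol.GetNumBonds()


if __name__ == "__main__":
    print("distance-based bonding on: ", bond_count(PDB_BLOCK, True))
    print("distance-based bonding off:", bond_count(PDB_BLOCK, False))
```

With proximity bonding switched off, only the bond listed in the CONECT
record is created, so atoms that are merely too close stay unbonded.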

Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


Mon, 27 Sep 2021 at 12:25, Lewis Martin wrote:

> Very interesting - thank you Francois! PDB re-do does the trick:
>
> import requests
> from rdkit import Chem
>
> def getPDB(code):
>     out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
>     return out.content
>
> pdb_string = getPDB('3udn')
> Chem.MolFromPDBBlock(pdb_string)
>
> I think this solves it for me, but if anyone knows how to infer correct
> bonding information without relying on distances, I'd love to hear it too!
> So far I've noticed that Parmed and PDBFixer infer correct bonds, but they
> don't determine bond orders, so it's difficult to port the molecule into
> RDKit.
>
> Cheers
> Lewis
>
>
>
> On Mon, Sep 27, 2021 at 5:55 PM Francois Berenger 
> wrote:
>
>> Hi Lewis,
>>
>> Just an idea: you might try to load your PDB in UCSF Chimera, then
>> save it as a mol2 or sdf file.
>> Then, try to read this sdf file from rdkit.
>>
>> Another idea: try to get your pdb file through the pdbredo service.
>> https://pdb-redo.eu/
>> They might have fixed a few things; maybe this PDB will read better in
>> rdkit.
>>
>> Regards,
>> F.
>>
>> On 26/09/2021 17:02, Lewis Martin wrote:
>> > Hi RDKit,
>> > While parsing proteins from the PDB with RDKit, I've come across
>> > situations where the distance-based bond determination leads to
>> > 'incorrect' bonds between atoms that are erroneously too close
>> > together. PDB files have no bond information, so it's not really
>> > 'incorrect' (rather the model coordinates are off), but the bonds are
>> > nonphysical - and it means the Mol objects won't sanitize.
>> >
>> > Here's an example:
>> >
>> > import requests
>> > from io import BytesIO
>> > import gzip
>> > from rdkit import Chem
>> >
>> > def getPDB(code):
>> >     out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
>> >     binary_stream = BytesIO(out.content)
>> >     return gzip.open(binary_stream).read()
>> >
>> > pdb_string = getPDB('3udn')
>> > Chem.MolFromPDBBlock(pdb_string)
>> >
>> > Error is:
>> >
>> > RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is
>> > greater than permitted
>> >
>> > This is caused by the threonine 72 sidechain being too close to the
>> > TYR71 backbone carbonyl oxygen (this can be visualized at
>> > https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction=09B ,
>> > TYR71 is near the ligand).
>> >
>> > Does anyone know how to avoid this to create a Chem.Mol? I've tried
>> > using Parmed and PDBFixer, since they use residue templates to
>> > generate the correct bonding topology, but they don't write CONECT
>> > records or SDFs, so the bonds are still lost to RDKit.
>> >
>> > Thanks for your time!
>> > Lewis
>> > PS - why not just use PDBFixer? I'm trying to calculate atom
>> > invariants using RDKit's morgan fingerprinter implementation, so
>> > ultimately I want a sanitized Mol object
>> >