Hi Noel,

Let me provide a more thorough answer than I did previously. I'll move it back on-list too, so that the answers are out there and archived.

On 9/13/07, Noel O'Boyle <baoille...@gmail.com> wrote:
> (off-list due to inquisitive nature of some questions, although feel
> free to move back onto-list)
>
> First of all, Greg, this is pretty expletive impressive. In fact, it's
> unbelievable. Not only have you matched the OpenBabel or Daylight
> toolkits, in many ways (perhaps all?) you have surpassed them. I can't

Thanks for the compliments. I appreciate them, and I'm sure that Santosh (the other developer) does as well, but I do want to temper the enthusiasm with a bit of reality: the RDKit stuff isn't nearly as highly optimized or thoroughly battle-tested as Daylight.

> believe this code has been out for more than a year. You've got 2D and
> 3D coordinate generation, and everything! Something the open source
> world has been crying out for for the last few years. If this code had
> been around when I first heard about the Python interface to OB
> (almost 2y ago now), I probably wouldn't have been involved. (To
> clarify, I'm involved at the Python end of OpenBabel)
>
> So on to the questions:
> (1) Can I believe my eyes? Is this really open source? A lot of the
> Python code has a very restrictive copyright statement right at the
> start (see Windows release, AllChem.py for example)

The "All rights reserved" is, in my opinion, superseded by the license.txt file (which is BSD, except for the GUI components, which are GPL due to Qt license restrictions). It's really open source, and it's as open as we could make it without going public domain (I consider the BSD license to be far more open than the GPL, which is quite restrictive IMO).

> (2) How does, if at all, being the product of a company affect this
> toolkit? I guess what I'm saying is, are you willing to engage with
> the OS community. Is the code likely to be taken back in house (which
> happened with OEChem, for example)?

The company is no more, so no worries about that.

> (3) What's the story with version numbers and backwards-incompatible
> API changes? That is, do you try to maintain the API across releases?

Yes. API changes would break the unit tests, which would require a lot of work to fix, so pure developer laziness dictates API stability. We also put a lot of time into making sure that the various binary formats stay backwards compatible (i.e. if a file format changes, newer versions can still read old files).

> (4) Why haven't you publicised RDKit, if you don't mind me asking? For
> example, there is an excellent (if I do say so myself) website called
> Linux4Chemistry which lists the excellent (if you do say so yourself)
> YaEHMOP. Also there's the CCL mailing list. I only found RDKit because
> of trawling through the SF software map. Is this, um, shyness,
> intentional?

There are many components to the answer to this question. Some are:
1) Promotion isn't something I enjoy or am particularly good at.
2) I'm kind of afraid of having more users. I do a lot of this as a free-time project and I'm afraid of spending all my time answering questions. This is, of course, a bit stupid, because if the whole open source thing works then other people will pitch in and help with those questions. For that to happen I need those other people as users, which requires that I find them, which... it's a Catch-22.

> (5) You may/not be aware but Numeric is deprecated to the extent that
> it is not available for Python 2.5 on Windows. I had to replace a
> couple of "import Numeric"s with "from numpy import oldnumeric as
> Numeric", but this is only a temporary solution.

The Numeric thing is a definite problem (though it works fine for me with Python 2.5 under Windows). I made an attempt a while ago to port the code to use numpy, but was immediately frustrated by the lack of documentation available (unless you buy the book) and the very aggressive response of the community when I complained about this.

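For what it's worth, the stopgap described in (5) can be generalized into a fallback import that tries each array package in turn. This is only a sketch, not something the RDKit does: `first_importable` and `ARRAY_BACKENDS` are hypothetical names, `numpy.oldnumeric` is a compatibility layer that may not be present in every numpy release, and falling back to plain numpy is not a drop-in replacement for Numeric because the two APIs differ.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first candidate that imports.

    Returns (None, None) if none of the candidates can be imported.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Not installed (or submodule missing): try the next candidate.
            continue
    return None, None


# Hypothetical preference order: legacy Numeric first, then the numpy
# compatibility layer, then plain numpy (whose API is NOT identical).
ARRAY_BACKENDS = ["Numeric", "numpy.oldnumeric", "numpy"]

backend_name, array_module = first_importable(ARRAY_BACKENDS)
if backend_name is None:
    print("no array package found")
else:
    print("using array backend:", backend_name)
```

Code that still needs Numeric-specific behaviour can check `backend_name` before relying on it, so the remaining porting work stays visible rather than hidden behind an alias.
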
> (6) It'd be nice to have an installer for the Python stuff...I've done
> this for OpenBabel. It's pretty easy.

If you care to share how you did this, I'd be happy to learn. It's a nice idea.

> (7) Conceptually is this a C++ toolkit, or a Python toolkit with a C++
> backend? It seems that a lot of the work is done in Python...

It's both. The core data structures and algorithms are almost entirely in C++; a lot of the "end-user" functionality is written in Python. The model has been that new algorithms get coded first in Python and then ported to C++ if that's needed for speed. The two APIs are similar enough (to me at least) that this usually ends up being fairly straightforward.

> (8) Are there any particular reasons you didn't base your code on OpenBabel?

Again, a complicated question. The short answer is:
1) At the time we started the RDKit development, OpenBabel was still OELib (or close to it) and didn't do what we wanted.
2) We were doing this at a software/services company, so the use of the GPL erased any hope that we could make serious use of the code.

> (9) Have you seen Rajarshi's smi23d? (If not, have a google) It
> appears to use a similar method to yours to create a 3D conformation,
> although it has run into some patent problems due to stochastic
> proximity embedding. Do you use the same algorithm?

I haven't seen smi23d. We do *not* use SPE or anything related to it. Our approach for embedding the coordinates from the distance matrix is to just use the standard diagonalization procedure that is used by things like DGeom. SPE (and related algorithms) is really overkill for small molecules.

> (10) 2D depiction. Mega cool!! :-) Can I use it, can I, can I? But how
> does it work? Any chance of you writing a paper so I don't need to
> read the code?

Nope, no chance at all: there's nothing really innovative in it.
Santosh can pipe up with more information if he has time, but the algorithm is pretty much what I'd call the "ChemDraw" algorithm (as described in the Rev. Comp. Chem. article on depiction) with various small mods that we put in to address particular edge cases. The depictor isn't as good as ChemDraw or MOE or OpenEye, but it's not too bad.

> (11) 2D representation of 3D structure. Unbelievable. You might want
> to check out the recent (ASAP alert in JCIM) Alex Clark paper on MOE's
> depiction.

Yeah, that was inspired by Alex's work and the great pictures that MOE makes. That's great stuff.

> (12) Interested in easily converting a ROMol to an OBMol and vice
> versa? I am. It'd be trivial to do this at the Python level. We could
> coordinate a bit to make the methods somewhat symmetrical. It would
> make it easy to unittest shared algorithms against each other, e.g.
> LogP calculation, SMILES, or whatever.

It would be an interesting exercise. I'm not convinced that it would be trivial to get it right. There's a lot of devil in the details of things like aromaticity handling and general sanitization problems (the RDKit is *very* picky about molecules being "clean").

> (13) Atoms don't have coordinates, is this right? You need to get
> their Idx, and look them up in a Conformer?

Correct. Neither atoms nor molecules know about coordinates; that information is carried in the Conformer. This allows a molecule to carry around multiple 2D and/or 3D Conformers.

-greg
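
P.S. To make the answer to (13) a bit more concrete, here is a toy illustration of the atom/Conformer split. It is a plain-Python sketch of the idea only: the `Atom`, `Conformer` and `Molecule` classes and the `atom_position` method are hypothetical stand-ins, not the RDKit API, and the coordinates are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Atom:
    """Hypothetical atom record: identity only, no coordinates."""
    idx: int
    symbol: str


@dataclass
class Conformer:
    """Hypothetical coordinate set, keyed by atom index."""
    positions: Dict[int, Point]
    is_3d: bool = True

    def atom_position(self, atom_idx: int) -> Point:
        return self.positions[atom_idx]


@dataclass
class Molecule:
    """Hypothetical molecule: atoms plus any number of conformers."""
    atoms: List[Atom]
    conformers: List[Conformer] = field(default_factory=list)


water = Molecule(atoms=[Atom(0, "O"), Atom(1, "H"), Atom(2, "H")])

# One flat 2D layout and one 3D geometry for the same three atoms.
water.conformers.append(
    Conformer({0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (-1.0, 0.0, 0.0)},
              is_3d=False))
water.conformers.append(
    Conformer({0: (0.0, 0.0, 0.12), 1: (0.76, 0.0, -0.47), 2: (-0.76, 0.0, -0.47)}))

# Coordinates are looked up through a conformer, using the atom's index.
hydrogen = water.atoms[1]
pos_2d = water.conformers[0].atom_position(hydrogen.idx)
pos_3d = water.conformers[1].atom_position(hydrogen.idx)
print(hydrogen.symbol, pos_2d, pos_3d)
```

Because the atom itself stores nothing positional, adding a third or fourth coordinate set is just another entry in the conformer list and never touches the atoms.
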