Dear James,

On Wed, Nov 24, 2010 at 4:35 PM, James Davidson <[email protected]> wrote:
>
> Great job on the Knime nodes!  I have been giving these a go and am
> impressed (and excited about the future development!).  A couple of
> observations / comments / questions:
Thanks!

>
> 1.  I have observed that sometimes the FP node seems to generate blank
> fingerprints (doesn't appear to just be the rendering - eg blank if I swap
> to 'Bit Scratch' render as well).  I have mainly been trying the default
> Morgan FPs, and find that if I reset the node and re-run, the FP is still
> blank.  If, however, I swap the node to eg atompair, run, then swap back to
> Morgan - it seems to work...  I am running on knime 2.2.2 on Windows 32-bit.

That's odd. I haven't seen anything like this, but I haven't spent a
ton of time using the Windows version. I'll try to see if I can
reproduce it.

> 2.  The next point is probably down to cheminformatics / knime naivety, but
> I must confess I am struggling a little to cluster compounds based on the
> FP...  I have used the 'Distance Matrix Calculate' node (with Tanimoto
> similarity) to get a matrix that can be used by the 'Hierarchical Clustering
> (DistMatrix)' or 'k-Medoids' nodes.  However, both of these appear to
> perform VERY slowly for a set of ~ 4000 compounds.  I also attempted to
> cluster on the fingerprints directly, using the Neighborgrams nodes - but
> must confess I am some way off understanding what I am doing!

Hierarchical Clustering (DistMatrix) does, indeed, scale poorly.
According to the docs it scales cubically in the number of rows...
that's going to hurt when N=4000. The implementation the RDKit uses
(adapted from some code by Murtagh) is pretty heavily optimized and
behaves well for large datasets.

> My limited
> experience of using the RDKit functionality to cluster compounds and eg
> select a representative set (based on the FP Tanimoto distances and the
> Murtagh clustering) was that it performed rather rapidly.  Is there the
> intention to expose this functionality in knime (or is the functionality
> already there and I just don't know how?)

It's not there yet, but it sure would be useful if the knime
implementation were faster.
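To put rough numbers on the scaling concern above, here is a minimal
Python sketch. It is not the KNIME or the RDKit implementation; the
function names, the representation of a fingerprint as a set of on-bit
indices, the toy data, and the choice to treat two empty fingerprints
as distance 0 are all assumptions made for illustration. It computes
Tanimoto distances (1 - similarity) for every pair and then estimates
how many pairs, and how many steps for a cubic-time algorithm, N=4000
would imply.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: treated as identical here (an assumption).
        return 0.0
    return 1.0 - len(a & b) / union


def condensed_distance_matrix(fps):
    """Pairwise distances for i < j, in itertools.combinations order."""
    return [tanimoto_distance(a, b) for a, b in combinations(fps, 2)]


def cost_estimate(n):
    """Rough counts: number of pairwise distances, and n**3 for a cubic algorithm."""
    return n * (n - 1) // 2, n ** 3


# Toy fingerprints written as sets of on-bit indices (made-up data).
toy_fps = [{1, 2, 3}, {2, 3, 4}, {10, 11}]
dists = condensed_distance_matrix(toy_fps)
pairs, cubic_steps = cost_estimate(4000)
print(dists)               # [0.5, 1.0, 1.0]
print(pairs, cubic_steps)  # 7998000 64000000000
```

So for ~4000 compounds the distance matrix alone holds about 8 million
pairs, and anything cubic in N is on the order of 6.4e10 elementary
steps, which is consistent with the slowness reported above.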
I don't think it makes sense to use the RDKit implementation directly,
but it may be possible to do a port of the Murtagh algorithm to java.
Thorsten? What do you think?

>
> 3.  Any plans for Windows 64-bit support?

I haven't had a 64bit windows machine set up for development work, so
I've never even tested the RDKit under 64bit windows. I just got a new
machine, which does have windows installed. I will see about getting a
development environment on there and trying to build the RDKit, but
I'm not going to make any promises there.

> 4.  I would be interested to know what the team views as the next priorities
> - property calcs, 3D conformations, pharmacophores, rendering?  So much
> great stuff to choose from! :-)

We're open to suggestions. In addition to what's already there, the
initial release will contain at least an AddCoordinates node which can
add either 2D coordinates (optionally aligned to a template) or a 3D
conformation. If you have things that you'd really like to see, please
pipe up.

Best Regards,
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

