Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018
Same here. I would also add the standardisation work done by Francis Atkinson at the EBI as an additional starting point.

George.

Sent from my giPhone

> On 16 Jan 2018, at 17:19, JP wrote:
>
> Joining the fray, +1 for MolVS
>
>> On 16 January 2018 at 16:00, Brian Cole wrote:
>> +1 to the MolVS project as well.
>>
>> Perhaps an easy bite-size project is to incorporate the open source mae
>> parser code into core RDKit: https://github.com/schrodinger/maeparser
>>
>>> On Mon, Jan 15, 2018 at 9:08 PM, Francois BERENGER wrote:
>>> On 01/16/2018 05:51 AM, Tim Dudgeon wrote:
>>> > Incorporating and "industrialising" Matt's MolVS tautomer and
>>> > standardizer code?
>>> > http://molvs.readthedocs.io/en/latest/index.html
>>>
>>> If we can vote, I would vote for this one.
>>>
>>> > On 15/01/18 07:09, Greg Landrum wrote:
>>> >> Dear all,
>>> >>
>>> >> We've been invited again to participate in the OpenChemistry
>>> >> application for Google Summer of Code.
>>> >>
>>> >> In order to participate we need ideas for projects and mentors to go
>>> >> along with them.
>>> >>
>>> >> The current list of RDKit ideas is being maintained here:
>>> >> http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
>>> >>
>>> >> (Note: at the point that I'm pressing "send", that's still a copy of
>>> >> last year's project ideas).
>>> >>
>>> >> If you're willing to be a mentor (please ask me about the ~5
>>> >> hours/week required here) or have ideas, please reply to this thread.
>>> >>
>>> >> Best,
>>> >> -greg

--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Default behavior of certain calls
Great example of functools.partial. For those who like functional programming, it can also be used with map and imap when a function needs more than one parameter.

George.

Sent from my giPhone

> On 12 Oct 2017, at 19:04, Andy Jennings wrote:
>
> Hi Paolo,
>
> That's outstanding - thanks very much.
>
> Best,
> Andy
>
>> On Thu, Oct 12, 2017 at 10:27 AM, Paolo Tosco wrote:
>> Dear Andy,
>>
>> you may accomplish that within the scope of a Python script using
>> functools.partial:
>>
>> In [1]: from rdkit import Chem
>>
>> In [2]: import functools
>>
>> In [3]: # redefine Chem.SDMolSupplier to include a custom default parameter
>>
>> In [4]: Chem.SDMolSupplier = functools.partial(Chem.SDMolSupplier, removeHs=False)
>>
>> In [5]: suppl = Chem.SDMolSupplier('/home/paolo/sdf/bilastine.sdf')
>>
>> In [6]: # hydrogens have not been stripped
>>
>> In [7]: suppl[0].GetNumAtoms()
>> Out[7]: 71
>>
>> In [8]: # If you wish to invoke the original function with the original default parameter:
>>
>> In [9]: suppl = Chem.SDMolSupplier.func('/home/paolo/sdf/bilastine.sdf')
>>
>> In [10]: # hydrogens have been stripped as the original function was invoked
>>
>> In [11]: suppl[0].GetNumAtoms()
>> Out[11]: 34
>>
>> HTH, cheers
>> p.
>>
>>> On 10/12/17 18:09, Andy Jennings wrote:
>>> Hi,
>>>
>>> First off: great work on the RDKit - a great resource for those of us that
>>> like to cook up our own solutions to problems.
>>>
>>> Certain calls (e.g. Chem.SDMolSupplier, Chem.MolToSmiles) have default
>>> behavior that is the opposite of what I would generally want. For instance
>>> I might be processing docking files and want to keep those pesky hydrogens,
>>> or I want to keep the stereochemical information when I dump a smiles string.
>>>
>>> I can understand why the current defaults might have been arrived at so I'm
>>> not advocating the change in default behavior. Rather, I'm curious if one
>>> could set the default behavior for an entire script (I write mostly
>>> python). It may be/is lazy of me but every so often I get caught out and
>>> have to backtrack through a workflow.
>>>
>>> Best,
>>> Andy
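For anyone curious about the map remark at the top of this message, here is a minimal sketch. The `scale` function is a hypothetical stand-in for any call that takes extra parameters (such as a supplier or writer with keyword options); it is not an RDKit function. In Python 3 the built-in map is already lazy, so it covers the role that itertools.imap played in Python 2.

```python
from functools import partial


def scale(value, factor, offset=0):
    """Hypothetical stand-in for a function that needs more than one parameter."""
    return value * factor + offset


# Freeze the extra parameters so that map() only has to supply `value`.
double_plus_one = partial(scale, factor=2, offset=1)

results = list(map(double_plus_one, [1, 2, 3]))

# As in Paolo's session, the untouched original stays reachable via .func.
original = double_plus_one.func
```

The same pattern should apply to any callable whose remaining arguments can be fixed by keyword.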
Re: [Rdkit-discuss] RDKit-fingerprints set all bits for complex molecules?
Example: https://www.surechembl.org/chemical/SCHEMBL1895

George.

Sent from my giPhone

> On 1 Jun 2017, at 17:05, Greg Landrum wrote:
>
> Hi Nils,
>
> Can you please send me the SMILES for those structures (or point me to an
> easy way to look up a SCHEMBL id)?
>
> I will take a look at these, but I don't currently have a convenient copy of
> SCHEMBL.
>
> -greg
>
>> On Thu, Jun 1, 2017 at 4:28 PM, Nils Weskamp wrote:
>> Dear RDKitters,
>>
>> I just calculated RDKit "Daylight-like" fingerprints for a number of public
>> compound databases and found quite a number of examples where the resulting
>> fingerprints have *all* bits set to 1. This happens in both KNIME 3.2.1
>> (1024/1/7) and also via the command line (2048/1/7/4) for RDKit 2016.03.
>>
>> Examples include (from SureChEMBL):
>>
>> SCHEMBL5141968
>> SCHEMBL13916889
>> SCHEMBL16257315
>> SCHEMBL16257310
>> SCHEMBL16257297
>> SCHEMBL16257215
>> SCHEMBL16257169
>> SCHEMBL8232906
>> SCHEMBL16257312
>> SCHEMBL13011081
>> SCHEMBL12570100
>> SCHEMBL14524878
>> SCHEMBL6370886
>> SCHEMBL15305169
>> SCHEMBL16912871
>> SCHEMBL13290179
>>
>> Now, these are obviously some very large and complex molecules, so I would
>> expect that they contain many features and thus set many bits - but all of
>> them?
>>
>> So, in short: Are these compounds so ugly that it is normal for the
>> fingerprints to have all bits set or are they so ugly that they trigger some
>> rare bug in RDKit?
>>
>> Any ideas / suggestions / comments?
>>
>> Thanks a lot,
>> Nils
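As a rough plausibility check on Nils's question, the back-of-envelope sketch below estimates how quickly a hashed fingerprint fills up. It assumes every bit-setting event picks one of the bits uniformly at random, which is only an approximation of any real hashing scheme, and reading the quoted 2048/1/7/4 as "2048 bits, 4 bits set per path" is my assumption. It says nothing about whether a bug is involved.

```python
import math


def expected_fraction_set(n_bits, n_events):
    """Expected fraction of bits that are on after n_events uniform random hits."""
    return 1.0 - (1.0 - 1.0 / n_bits) ** n_events


def events_for_fraction(n_bits, fraction):
    """Smallest number of hits whose expected fill reaches `fraction` (< 1)."""
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - 1.0 / n_bits))


if __name__ == "__main__":
    bits_per_path = 4  # assumed meaning of the trailing 4 in 2048/1/7/4
    for n_paths in (100, 1000, 10000, 100000):
        fill = expected_fraction_set(2048, n_paths * bits_per_path)
        print(n_paths, "paths ->", round(fill, 4))
```

Under these assumptions a molecule with many thousands of distinct paths would be expected to come very close to a fully set fingerprint, so saturation for very large structures would not be surprising on its own.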
Re: [Rdkit-discuss] substructure of a fingerprint position
https://iwatobipen.wordpress.com/2017/01/08/get-bit-information-with-rdkit/

George.

Sent from my giPhone

> On 26 Jan 2017, at 11:02, Gonzalo Colmenarejo wrote:
>
> Hi,
>
> is there a way in RDKit to retrieve the substructure(s) corresponding to a
> (hashed or unhashed) Morgan fingerprint position?
>
> Thanks a lot in advance
>
> Gonzalo
Re: [Rdkit-discuss] Extracting SMILES from text
:)

George.

Sent from my giPhone

> On 2 Dec 2016, at 22:11, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote:
>
>> On 12/02/2016 03:12 PM, George Papadatos wrote:
>> Here's a pragmatic idea:
> ...
>> would it not be safe to
>> assume that *any* word containing more than 4 'C' or 'c' characters would
>> only be a SMILES string?
>
> pneumonoultramicroscopicsilicovolcanoconiosis
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Extracting SMILES from text
Here's a pragmatic idea: If Alexis wants to search for valid SMILES strings representing typical *organic* molecules among text of plain English words, would it not be safe to assume that *any* word containing more than 4 'C' or 'c' characters would only be a SMILES string?

This simple filter (word.lower().count('c') > 4) would quickly eliminate all normal English words, leaving only SMILES to parse. No need for regexes, unless you really care for ISIS or IOPS molecules. :)

George

On 2 December 2016 at 19:36, Andrew Dalke wrote:

> On Dec 2, 2016, at 11:11 AM, Greg Landrum wrote:
> > An initial start on some regexps that match SMILES is here:
> > https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b38da34e22fb
> >
> > that may also be useful
>
> I've put together a more gnarly regular expression to find possible SMILES
> strings. It's configured for at least 4 atom terms, but that's easy to
> change (there's a "{3,}" which can be changed as desired.)
>
> It follows the SMILES specification a bit more closely, which means
> there should be fewer false positives than the regular expression Greg
> pointed out.
>
> The file which constructs the regular expression, and an example driver,
> is attached.
> Here's what the output looks like:
>
> % python detect_smiles.py ~/talks/*.txt
> /Users/dalke/talks/ICCS_2014_paper.txt:528:532 'IOPS'
> /Users/dalke/talks/ICCS_2014_paper.txt:30150:30183 'CC12CCC3C(CCC4=CC(O)CCC34C)C1CCC2'
> /Users/dalke/talks/ICCS_2014_paper2.txt:3270:3274 'CBCC'
> /Users/dalke/talks/ICCS_2014_paper2.txt:10229:10239 'CC(=O)[O-]'
> /Users/dalke/talks/ICCS_2014_paper2.txt:32766:32770 'ISIS'
> /Users/dalke/talks/Sheffield2013.txt:25002:25013 'C1=CC=CC=C1'
> /Users/dalke/talks/Sheffield2013.txt:25039:25047 'c1c1'
> /Users/dalke/talks/Sheffield_2016.txt:2767:2771 'CBCC'
> /Users/dalke/talks/Sheffield_2016.txt:10295:10301 'O0'
> /Users/dalke/talks/Sheffield_2016_talk.txt:7302:7306 'CBCC'
> /Users/dalke/talks/Sheffield_2016_talk.txt:7564:7568 'CBCC'
> /Users/dalke/talks/Sheffield_2016_talk.txt:7716:7720 'CBCC'
> /Users/dalke/talks/Sheffield_2016_v2.txt:2874:2878 'soon'
> /Users/dalke/talks/Sheffield_2016_v2.txt:7312:7317 'O'
> /Users/dalke/talks/Sheffield_2016_v2.txt:22770:22774 'ICCS'
> /Users/dalke/talks/Sheffield_2016_v3.txt:2982:2986 'soon'
> /Users/dalke/talks/Sheffield_2016_v3.txt:7627:7632 'O'
> /Users/dalke/talks/Sheffield_2016_v3.txt:24546:24550 'ICCS'
> /Users/dalke/talks/tdd_part_2.txt:7547:7551 'scop'
>
> You can also modify the code for line-by-line processing rather than an
> entire block of text like I did.
>
> As others have pointed out, this is a well-trodden path. Follow their
> warnings and advice.
>
> Also, I didn't fully test it.
>
> Andrew
> da...@dalkescientific.com
>
> P.S.
>
> Here's the regular expression:
>
> (? term
>
> (
>
> (
> (
> Cl? | # Cl and Br are part of the organic subset
> Br? |
> [NOSPFIbcnosp*] | # as are these single-letter elements
>
> # bracket atom
> \[\d* # optional atomic mass
> ( # valid element names
> C[laroudsemf]? |
> Os?|N[eaibdpos]? |
> S[icernbmg]? |
> P[drmtboau]? |
> H[eofgas]? |
> c|n|o|s|p |
> A[lrsgutcm] |
> B[eraik]? |
> Dy|E[urs] |
> F[erm]? |
> G[aed] |
> I[nr]? |
> Kr? |
> L[iaur] |
> M[gnodt] |
> R[buhenaf] |
> T[icebmalh] |
> U|V|W|Xe |
> Yb?|Z[nr]
> )
> [^]]* # ignore anything up to the ']'
> \]
> )
> # allow 0 or more closures directly after any atom
> (
> [-=#$/\\]? # optional bond type
> (
> [0-9] | # single digit closure
> (%[0-9][0-9]) # two digit closure
> )
> ) *
> )
>
> (
>
> (
> (
> \( [-=#$/\\]? # a '(', which can have an optional bond (no dot)
> ) | (
> \)* # any number of close parens, followed by
> (
> ( \( [-=#$/\\]? ) | # an open parens and optional bond (no dot)
> [-.=#$/\\]? # or a dot disconnect or bond
> )
> )
> )
> ?
>
> (
> (
> Cl? | # Cl and Br are part of the organic subset
> Br? |
> [NOSPFIbcnosp*] | # as are these single-letter elements
>
> # bracket atom
> \[\d* # optional atomic mass
> ( # valid element names
> C[laroudsemf]? |
> Os?|N[eaibdpos]? |
> S[icernbmg]? |
> P[drmtboau]? |
> H[eofgas]? |
> c|n|o|s|p |
> A[lrsgutcm] |
> B[eraik]? |
> Dy|E[urs] |
> F[erm]? |
> G[aed] |
> I[nr]? |
> Kr? |
> L[iaur] |
> M[gnodt] |
> R[buhenaf] |
> T[icebmalh] |
> U|V|W|Xe |
> Yb?|Z[nr]
> )
> [^]]* # ignore anything up to the ']'
> \]
> )
> # allow 0 or more closures directly after any atom
> (
> [-=#$/\\]? # optional bond type
> (
> [0-9] | # single digit closure
> (%[0-9][0-9]) # two digit closure
> )
> ) *
> )
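The character-count filter proposed at the top of this message can be sketched in a few lines. This is an illustration only: the function names and the `min_c` parameter are my own, and anything that passes would still need checking with a real SMILES parser (for example RDKit's Chem.MolFromSmiles), since long English words can slip through.

```python
def looks_like_smiles(word, min_c=5):
    """Cheap pre-filter: keep words with at least min_c 'C'/'c' characters.

    min_c=5 corresponds to "more than 4"; it is a tunable guess, not a rule.
    """
    return word.lower().count("c") >= min_c


def smiles_candidates(text, min_c=5):
    """Return the whitespace-separated words that pass the pre-filter."""
    return [word for word in text.split() if looks_like_smiles(word, min_c)]


sample = (
    "The acid CC(=O)Oc1ccccc1C(=O)O is characteristic but "
    "pneumonoultramicroscopicsilicovolcanoconiosis is not a molecule"
)
candidates = smiles_candidates(sample)
```

On this sample the filter keeps the aspirin SMILES and drops "characteristic" (three c's), but it also keeps the 45-letter word with its six c's, so the filter is best treated as a first pass before parsing.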
Re: [Rdkit-discuss] Extracting SMILES from text
I think Alexis was referring to converting actual SMILES strings found in random text. Chemical entity recognition and name-to-structure conversion is another story altogether, and nowadays one can quickly go a long way with open tools such as OSCAR + OPSIN in KNIME or with something like this: http://chemdataextractor.org/docs/intro

George

On 2 December 2016 at 17:35, Brian Kelley wrote:

> This was why they started using the dictionary lookup as I recall :). The
> iupac system they ended up using was Roger's when at OpenEye.
>
> Brian Kelley
>
> On Dec 2, 2016, at 12:33 PM, Igor Filippov wrote:
>
> I could be wrong but I believe IBM system had a preprocessing step which
> removed all known dictionary words - which would get rid of "submarine" etc.
> I also believe this problem has been solved multiple times in the past,
> NextMove software comes to mind, chemical tagger -
> http://chemicaltagger.ch.cam.ac.uk/, etc.
>
> my 2 cents,
> Igor
>
> On Fri, Dec 2, 2016 at 11:46 AM, Brian Kelley wrote:
>
>> I hacked a version of RDKit's smiles parser to compute heavy atom count,
>> perhaps some version of this could be used to check smiles validity without
>> making the actual molecule.
>>
>> From a fun historical perspective: IBM had an expert system to find
>> IUPAC names in documents. They ended up finding things like "submarine"
>> which was amusing. It turned out that just parsing all words with the
>> IUPAC parser was by far the fastest and best solution. I expect the same
>> will be true for finding smiles.
>>
>> It would be interesting to put the common OCR errors into the parser as
>> well (l's and 1's are hard for instance).
>>
>> On Fri, Dec 2, 2016 at 10:46 AM, Peter Gedeck wrote:
>>
>>> Hello Alexis,
>>>
>>> Depending on the size of your document, you could consider limiting the
>>> storage of already tested strings by word length and only memoize shorter
>>> words. SMILES tend to be longer, so everything above a given number of
>>> characters has a higher probability of being a SMILES. Large words probably
>>> also contain a lot of chemical names. They often contain commas (,), so
>>> they are easy to remove quickly.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> On Fri, Dec 2, 2016 at 5:43 AM Alexis Parenty <alexis.parenty.h...@gmail.com> wrote:
>>>
>>> Dear Pavel And Greg,
>>>
>>> Thanks Greg for the regexps link. I'll use that too.
>>>
>>> Pavel, I need to track which document the SMILES are coming from, but I
>>> will indeed make a set of unique words for each document before looping.
>>> Thanks!
>>>
>>> Best,
>>>
>>> Alexis
>>>
>>> On 2 December 2016 at 11:21, Pavel wrote:
>>>
>>> Hi, Alexis,
>>>
>>> if you do not need to track which document the SMILES come from, you may
>>> just combine all words from all documents in a list, take only unique
>>> words and try to test them. Thus, you would not need to store and check
>>> for valid/non-valid strings. That would reduce problem complexity as well.
>>>
>>> Pavel.
>>>
>>> On 12/02/2016 11:11 AM, Greg Landrum wrote:
>>>
>>> An initial start on some regexps that match SMILES is here:
>>> https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b38da34e22fb
>>>
>>> that may also be useful
>>>
>>> On Fri, Dec 2, 2016 at 11:07 AM, Alexis Parenty <alexis.parenty.h...@gmail.com> wrote:
>>>
>>> Hi Markus,
>>>
>>> Yes! I might discover novel compounds that way!! Would be interesting to
>>> see what they look like...
>>>
>>> Good suggestion to also store the words that were correctly identified as
>>> SMILES. I'll add that to the script.
>>>
>>> I also like your "distribution of word" idea. I could safely skip any
>>> words that occur more than 1% of the time and could try to play around
>>> with the threshold to find an optimum.
>>>
>>> I will try every suggestion and will time it to see what is best. I'll
>>> keep everyone in the loop and will share the script and results.
>>>
>>> Thanks,
>>>
>>> Alexis
>>>
>>> On 2 December 2016 at 10:47, Markus Sitzmann wrote:
>>>
>>> Hi Alexis,
>>>
>>> you may also find some "novel" compounds by this approach :-).
>>>
>>> Whether your tuple solution improves performance strongly depends on the
>>> content of your text documents and how often they repeat the same words
>>> again - but my guess would be it will help. Probably the best way is even
>>> to look at the distribution of words before you feed them to RDKit. You
>>> should also "memorize" those ones that successfully generated a
>>> structure, doesn't make sense to do it again, then.
>>>
>>> Markus
>>>
>>> On Fri, Dec 2, 2016 at 10:21 AM, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
>>>
>>> Hi Alexis,
>>>
>>> You may want to filter with some regex
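Several of the suggestions in this thread (unique words per document, remembering words that were already tested, skipping very frequent words) combine naturally. The sketch below is one possible arrangement, not anyone's actual script: `find_smiles` and its parameters are hypothetical names, and `is_valid` is a placeholder for whatever parser is used (for example a wrapper around RDKit's Chem.MolFromSmiles).

```python
from collections import Counter


def find_smiles(documents, is_valid, max_word_freq=0.01):
    """Return {document name: set of words accepted by is_valid}.

    documents     -- mapping of document name to its text
    is_valid      -- callable(word) -> bool, a stand-in for a real SMILES parser
    max_word_freq -- words making up more than this fraction of a document
                     are skipped as ordinary vocabulary
    """
    verdicts = {}  # word -> bool, shared across documents (memoisation)
    hits = {}
    for name, text in documents.items():
        words = text.split()
        counts = Counter(words)  # keys are the unique words of this document
        found = set()
        for word, n in counts.items():
            if n / len(words) > max_word_freq:
                continue  # too common to be a SMILES string
            if word not in verdicts:
                verdicts[word] = is_valid(word)  # each word is parsed only once
            if verdicts[word]:
                found.add(word)
        hits[name] = found
    return hits
```

Keeping the result keyed by document preserves the provenance Alexis asked for, while the shared cache means a word seen in many documents is only parsed once. The 1% default follows the figure mentioned in the thread and would need tuning, particularly for short documents.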
Re: [Rdkit-discuss] comparing two or more tables of molecules
Hi Stephen,

Further to Greg's excellent reply, see this paper on how InChI strings and keys can be used in practice to map together tautomer (ones covered by InChI at least), isotope, stereo and parent-salt variants.
http://rd.springer.com/article/10.1186/s13321-014-0043-5

Francis (cc'ed) has a nice notebook somewhere illustrating these nice InChI splits to find these variants.

For educational purposes, there have been other approaches like the NCI's identifiers - discussion here:
http://acscinf.org/docs/meetings/237nm/presentations/237nm17.pdf

For pure structure standardization using RDKit see here:
https://github.com/flatkinson/standardiser
and
https://github.com/mcs07/MolVS

Cheers,
George

On 29 November 2016 at 17:02, Greg Landrum wrote:

> Wow, this is a great question and quite a fun thread.
>
> It's hard to really make much of a contribution here without writing a
> book/review article (something that I'm really not willing to do!), but I
> have a few thoughts. Most of this is repeating/rephrasing things others
> have already said.
>
> I'm going to propose some things as facts. I think that these won't be
> controversial:
>
> fact 1: if the structures are coming from different sources, they need to
> be standardized/normalized before you compare them. This is true regardless
> of how you want to compare them. The details of the standardization process
> are not incredibly important, but it does need to take care of the things
> you care about when comparing molecules. For example, if you don't care
> about differences between salts, it should strip salts. If you don't care
> about differences between tautomers, it should normalize tautomers.
>
> fact 2: The InChI algorithm includes a standardization step that
> normalizes some tautomers, but does not remove salts.
>
> fact 3: The InChI representation contains a number of layers defining the
> structure in increasing detail (this isn't strictly true, because some of
> the choices about how layers are ordered are arbitrary, but it's close).
>
> fact 4: canonicalization, the way I define it, produces a canonical atom
> numbering for a given structure, but it does *not* standardize.
>
> fact 5: the RDKit has essentially no well-documented standardization code.
>
> fact X: we don't have any standard, broadly accepted approach for
> standardization, canonicalization or representation that is fool-proof or
> that works for even all of organic chemistry, never mind organometallics.
> InChI, useful as it is for some things, completely fails to handle things
> like atropisomers (they are working on this kind of thing, but it's not out
> yet).
>
> Given all of this, if I wanted to have flexible duplicate checking *right*
> now, I think I would use the AvalonTools struchk functionality that the
> RDKit provides (the new pure-RDKit version still needs a bit more testing)
> to handle basic standardization and salt stripping and then produce a table
> that includes the InChI in a couple of different forms. I'd want to be able
> to recognize molecules that differ only by stereochemistry, molecules that
> differ only by location of tautomeric Hs, and molecules that differ only by
> the location of isotopic labels. You can do this with various clever splits
> of the InChI (how to do it is left as an exercise for the reader and/or a
> future RDKit blog post).
>
> I think there's something fun to be done here with SMILES variants,
> borrowing heavily from some of the things that Roger has written about:
> https://nextmovesoftware.com/blog/2013/04/25/finding-all-types-of-every-mer/
> here's a more recent application of that from Noel:
> https://nextmovesoftware.com/blog/2016/06/22/fishing-for-matched-series-in-a-sea-of-structure-representations/
>
> If I didn't really care about details and just wanted something that I
> could explain easily to others, I'd skip all the complication and just use
> InChIs (or InChI keys) to recognize duplicates. There would be times when
> that would be the wrong answer, but it would be a broadly accepted kind of
> wrong.[1]
>
> Regardless of the approach, I would not, under most any circumstances,
> discard the original input structures that I had. It's really good to be
> able to figure out what the original data looked like later.
>
> -greg
> [1] I'm crying as I write this...
>
> On Mon, Nov 28, 2016 at 5:25 PM, Stephen O'hagan wrote:
>
>> Has anyone come up with fool-proof way of matching structurally
>> equivalent molecules?
>>
>> Unique Smiles or InChI String comparisons don't appear to work, presumably
>> because there are different but equivalent structures, e.g. explicit vs
>> non-explicit H's, Kekule vs Aromatic, isomeric forms vs non-isomeric form,
>> tautomers etc.
>>
>> I also expect that comparing InChI strings might need something more than
>> just a simple string comparison, such as masking off stereo information
>> when you don't care about stereo
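One way to read the "clever splits" and "masking off stereo information" mentioned in this thread is plain string handling on the InChI layers. The sketch below is a simplified illustration under my own assumptions: it treats the /b, /t, /m and /s layers as the stereo layers and drops them wherever they occur, and it does not attempt the tautomer or isotope cases. The function names are hypothetical, and the alanine and glycine strings are standard InChIs used as sample data.

```python
from collections import defaultdict

STEREO_PREFIXES = ("b", "t", "m", "s")  # double-bond, tetrahedral and stereo-type layers


def strip_stereo(inchi):
    """Return the InChI with its stereo layers removed.

    The version prefix and the formula are always kept: a formula starts
    with an upper-case element symbol, so it never matches a layer prefix.
    """
    head, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join([head] + kept)


def group_by_key(records, key_func):
    """Group (name, inchi) records by key_func(inchi), keeping the original names."""
    groups = defaultdict(list)
    for name, inchi in records:
        groups[key_func(inchi)].append(name)
    return dict(groups)


L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
GLY = "InChI=1S/C2H5NO2/c3-1-2(4)5/h1,3H2,(H,4,5)"

records = [("L-alanine", L_ALA), ("D-alanine", D_ALA), ("glycine", GLY)]
groups = group_by_key(records, strip_stereo)
```

With this grouping the two alanine enantiomers share a key while glycine stays on its own, and the original records remain available alongside the reduced key, in line with the advice above not to discard the input structures.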
Re: [Rdkit-discuss] Fingerprints_calculation
Hi Sahil,

You'll find the same documentation as a Jupyter Notebook here:
http://nbviewer.jupyter.org/github/chembl/mychembl/blob/master/ipython_notebooks/02_myChEMBL_RDKit_tutorial.ipynb#Morgan-Fingerprints-(Circular-Fingerprints)

Cheers,
George

On 1 October 2016 at 04:17, Greg Landrum wrote:

> Hi Sahil,
>
> The documentation includes some detail about the calculation of the Morgan
> fingerprints, along with a pointer to the original publication describing
> the method:
> http://rdkit.org/docs/GettingStartedInPython.html#morgan-fingerprints-circular-fingerprints
>
> Does the information there answer your question?
> -greg
>
> On Fri, Sep 30, 2016 at 6:37 AM, Sahil Kharangarh <sahilkharang...@gmail.com> wrote:
>
>> I have a problem with the calculation of Morgan (circular) fingerprints:
>> on what basis does the RDKit calculate the fingerprints when we choose
>> the radius?
>> How should the radius be chosen for circular fingerprints?
>> And what does the option useFeatures=True mean in particular?
>>
>> With Warm Regards,
>> SAHIL
>> M.S. Research Scholar,
>> Department of Pharmacoinformatics,
>> National Institute of Pharmaceutical Education and Research (NIPER),
>> sector-67, S.A.S Nagar, Mohali,
>> Punjab- 160062, INDIA
Re: [Rdkit-discuss] OCEAN: Our Target Prediction Paper (including Source Code)
Hi guys,

Congrats - great use of ChEMBL and myChEMBL too :)

George

On 27 September 2016 at 05:13, Paul Czodrowski <paul.czodrow...@merckgroup.com> wrote:

> Dear RDKitters,
>
> Our target prediction method - fully based on RDKit - is now online:
>
> OCEAN: *O*ptimized *C*ross r*EA*ctivity estimatio*N*
>
> http://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00067
>
> The source code can be found here:
>
> https://github.com/rdkit/OCEAN
>
> We will give a talk as well as a hands-on workshop at the upcoming RDKit UGM
> at the end of October.
>
> Cheers,
>
> Guido & Paul
Re: [Rdkit-discuss] RDKit Tools for the IPython Notebook
Axel, this is seriously cool! Many thanks!

George

On 2 July 2015 at 13:31, Axel Pahl axelp...@gmx.de wrote:

> Dear fellow RDKitters,
>
> the RDKit community is always so helpful that I wanted to share back two
> functions that I use in the IPython Notebook and that I thought could be
> of use to others as well.
>
> - show_table: Display a list of molecules in a table with molecule
>   properties as columns. When an ID property is given, the table becomes
>   interactive and compounds can be selected. I know that this can also be
>   done with PandasTools but that might be overkill in some situations.
>   Also the table from Pandas is not interactive to my knowledge.
> - jsme: Display Peter Ertl's JavaScript Molecule Editor to enter a
>   molecule directly in the IPython notebook (how cool is that??)
>
> If you are interested, please have a look at the GitHub repo
> https://github.com/apahl/rdkit_ipynb_tools
> and the example notebook
> http://nbviewer.ipython.org/github/apahl/rdkit_ipynb_tools/blob/master/rdkit_ipynb_tools.ipynb
>
> Kind regards,
> Axel
Re: [Rdkit-discuss] Molecular dis / similarity using fingerprints
Hi JP, Aha, so you're looking for a threshold that will exhibit the optimal balance between the false positives and false negatives in the *biological* *activity* space. This threshold varies depending on the fingerprint and the dataset of course. See here for some generalised insights: (1) Papadatos, G.; Cooper, A. W. J.; Kadirkamanathan, V.; Macdonald, S. J. F.; McLay, I. M.; Pickett, S. D.; Pritchard, J. M.; Willett, P.; Gillet, V. J. Analysis of Neighborhood Behavior in Lead Optimization and Array Design. *J. Chem. Inf. Model.* *2009*, *49*, 195–208. especially Figure 17, and (2) Muchmore, S. W.; Debe, D. A.; Metz, J. T.; Brown, S. P.; Martin, Y. C.; Hajduk, P. J. Application of Belief Theory to Similarity Data Fusion for Use in Analog Searching and Lead Hopping. *J. Chem. Inf. Model.* *2008*, *48*, 941–948. and also Greg's blog post: http://rdkit.blogspot.co.uk/2013/10/fingerprint-thresholds.html The TL/DR version is that for ECFP_4, this threshold should be around 0.45-0.55. Wrt methodology, are you trying to score/rank the intra-diversity/heterogeneity for different structure sets? Cheers, George On 26 May 2015 at 11:59, JP jeanpaul.ebe...@inhibox.com wrote: On 25 May 2015 at 22:23, Tim Dudgeon tdudgeon...@gmail.com wrote: Maybe a clustering approach may work? Something like sphere exclusion clustering with counting the number of clusters at 0.9 - 0.8 similarity)? With 30K structures it sounds computationally tractable? Thanks Tim for this idea. I hadn't heard of sphere exclusion. The problem is we still need a distance / similarity function (which using ECFP with high similarity 0.8-0.9 would result in very few compounds being thrown out). I think the real issue here is selecting a sensible similarity threshold which defines my idea of similarity. But that is a tricky number to get right - too high and you remove nothing, too low and you start catching different molecules. 
I guess the best thing is to try a few values (0.5, 0.6, 0.7, 0.8, 0.9) and have a visual look at the remaining compounds. - JP
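The idea discussed above (sphere exclusion clustering, then counting clusters at several similarity thresholds) can be sketched roughly as follows. This is only an illustration under stated assumptions: fingerprints are represented as plain Python sets of on-bit indices, the toy fingerprints are invented, and the function names `tanimoto` and `sphere_exclusion` are hypothetical helpers, not RDKit API. In a real workflow the sets would be replaced by fingerprints generated with a toolkit such as RDKit.

```python
from typing import FrozenSet, Dict, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sphere_exclusion(fps: Sequence[FrozenSet[int]], threshold: float) -> List[int]:
    """Greedy sphere exclusion: return the indices of the cluster centres.

    Compounds are visited in input order. A compound becomes a new centre
    only if its similarity to every existing centre is below `threshold`;
    otherwise it lies inside an existing sphere and is excluded.
    """
    centres: List[int] = []
    for i, fp in enumerate(fps):
        if all(tanimoto(fp, fps[c]) < threshold for c in centres):
            centres.append(i)
    return centres


# Invented toy fingerprints (hypothetical data, for illustration only).
toy_fps = [
    frozenset({1, 2, 3, 4, 5, 6, 7, 8}),
    frozenset({1, 2, 3, 4, 5, 6, 7, 9}),     # 7 shared / 9 in union, ~0.78 to the first
    frozenset({1, 2, 3, 4, 5, 20, 21, 22}),  # 5 shared / 11 in union, ~0.45 to the first two
    frozenset({30, 31, 32}),                 # unrelated to all others
]

# Number of clusters (retained compounds) at each candidate threshold.
cluster_counts: Dict[float, int] = {
    t: len(sphere_exclusion(toy_fps, t)) for t in (0.5, 0.6, 0.7, 0.8, 0.9)
}

for t, n in cluster_counts.items():
    print(f"threshold {t}: {n} clusters")
```

With these toy values the near-duplicate second compound is only removed at thresholds of 0.7 and below, which mirrors the point made in the thread: the higher the threshold, the fewer compounds are thrown out.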
Re: [Rdkit-discuss] generating scaffold trees
Hi all, Coincidentally, we had a chat about this with James the other day. Maybe the good colleagues at the ICR have implemented this already with RDKit? Nick? Cheers, g On 22 May 2015 at 13:38, Axel Pahl axelp...@gmx.de wrote: Dear RDKitters, has someone used the RDKit to generate scaffold trees from molecules as described in this paper: Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M. A., Waldmann, H., J. Chem. Inf. Model. 2007, 47, 47-58 I know that this is possible with ScaffoldHunter and that there is a Pipeline Pilot component for it, but being able to do it in RDKit would fit especially well in my workflow... Kind regards and have a nice weekend, Axel
Re: [Rdkit-discuss] rdkit.version()
Hi Soren, from rdkit import rdBase; print(rdBase.rdkitVersion) Cheers, George On 11 December 2014 at 18:22, Soren Wacker swac...@ucalgary.ca wrote: Hi, I would like to find out the currently installed version on my system. However, I cannot find a version string in RDKit. Something like rdkit.version() would be nice. Is there something like this implemented? kind regards Soren -- *From:* James Davidson [j.david...@vernalis.com] *Sent:* Wednesday, December 10, 2014 10:48 AM *To:* greg.land...@gmail.com *Cc:* rdkit-discuss@lists.sourceforge.net *Subject:* Re: [Rdkit-discuss] Avalon test failing(?) Hi Greg, The new version of the test code is targeting the 1.2 avalon toolkit version. Here's the commit that did that. https://github.com/rdkit/rdkit/commit/42dab414ee6fbe5489078e5e52046608bbf785cb As an FYI, to make these tests pass on windows, you need to edit the code to fix a bug: you need to comment out line 1446 of reaccsio.c: //MyFree((char *)tempdir); Following your advice, I downloaded the 1.2 source from Sourceforge ( http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.2/); commented-out the line in reaccsio.c; and then reconfigured in cmake and rebuilt in VS. The tests pass now – thanks! Kind regards James
[Rdkit-discuss] myChEMBL
Hi all, Further to Michał's announcement earlier, let me use this opportunity to announce the new release of myChEMBL, as I'm sure it will be relevant to many of you. myChEMBL is an open platform which consists of a Linux (Ubuntu) Virtual Machine featuring a PostgreSQL schema with the latest version of the ChEMBL database, the latest RDKit toolkit and cartridge, along with several Python tools and libraries for scientific computing and data mining. myChEMBL offers several ways to interact with ChEMBL data locally and provides a free and secure environment for application development, teaching and learning. More information here: http://chembl.blogspot.co.uk/2014/06/mychembl-launchpadlaunched.html Cheers, George
Re: [Rdkit-discuss] flexmatch in RDKit cartridge?
Many thanks Jan, that's very helpful. Cheers, George On 20 February 2014 21:32, Jan Holst Jensen j...@biochemfusion.com wrote: Hi George et al, flexmatch(... 'all') is the most strict exact match that the Symyx/Accelrys cartridge has. You can relax the matching behavior to varying degrees by passing it different options, e.g. using 'tau' instead of 'all' will make the identity check tautomer-agnostic (to the extent that the cartridge will perceive tautomers correctly - an interesting discussion topic in itself). The various options to flexmatch() are well documented in the Accelrys documentation for the cartridge, but I don't know if that is publicly available. The short answer in my opinion: Yes, @= should be the equivalent of flexmatch(m1, m2, 'all'). To emulate flexmatch(..., 'all') with rdkit, I find a small gotcha with regards to chiral matching: -- Clearly not identical. postgres=# select mol('CCC') @= mol('CCF'); ?column? -- f (1 row) -- Clearly identical. postgres=# select mol('CCC') @= mol('CCC'); ?column? -- t (1 row) -- Ala versus dAla - should *not* be identical ? postgres=# select mol('C[C@H](N)C(=O)O') @= mol('C[C@@H](N)C(=O)O'); ?column? -- t (1 row) To get the expected behavior of @= you need to turn on chiral matching. Even though the parameter name says that it controls SSS behavior, it apparently also has an effect on exact matching: postgres=# set rdkit.do_chiral_sss=true; SET -- Ala versus dAla - no longer identical. postgres=# select mol('C[C@H](N)C(=O)O') @= mol('C[C@@H](N)C(=O)O'); ?column? -- f (1 row) -- Ala versus Ala - phew, identical. postgres=# select mol('C[C@H](N)C(=O)O') @= mol('C[C@H](N)C(=O)O'); ?column? -- t (1 row) Cheers -- Jan On 2014-02-20 13:46, George Papadatos wrote: Hi there, Wouldn't that be (at least partly) possible with an exact structure search? - @= : returns whether or not two molecules are the same. Cheers, George On 20 February 2014 11:59, Greg Landrum greg.land...@gmail.com wrote: Sounds interesting.
Can anyone provide a pointer to a doc with more specific info about what this actually does? On Thursday, February 20, 2014, Michał Nowotka mmm...@gmail.com wrote: Hi, Symyx cartridge defines something called flexmatch - Finds records that are an exact match of the 2D or 3D structure that you specify in the query. Is there anything similar in RDKit cartridge? I looked into documentation and couldn't find this feature. Regards, Michal Nowotka
Re: [Rdkit-discuss] flexmatch in RDKit cartridge?
Hi there, Wouldn't that be (at least partly) possible with an exact structure search? - @= : returns whether or not two molecules are the same. Cheers, George On 20 February 2014 11:59, Greg Landrum greg.land...@gmail.com wrote: Sounds interesting. Can anyone provide a pointer to a doc with more specific info about what this actually does? On Thursday, February 20, 2014, Michał Nowotka mmm...@gmail.com wrote: Hi, Symyx cartridge defines something called flexmatch - Finds records that are an exact match of the 2D or 3D structure that you specify in the query. Is there anything similar in RDKit cartridge? I looked into documentation and couldn't find this feature. Regards, Michal Nowotka
Re: [Rdkit-discuss] InChI roundtrip
OK just to add some fuel to this fire: A colleague of mine and I looked at the inchi roundtrip using KNIME 2.9 and the latest versions of indigo and rdkit nodes. We used ~90,000 inchis from chembl_17, converted them to mols (sanitise + remove Hs), removed the ones that fail to convert, and then we converted back to inchis (standard ones, no extra parameters). We assessed the discrepancies between indigo and rdkit inchis compared to the original input inchis that are stored in chembl. Rdkit had 10 times more discrepancies, with 200 failures as opposed to 21 from indigo. This rate (~0.2%) was also confirmed using ~1 million inchis. I had a closer look at a couple of cases here: http://nbviewer.ipython.org/gist/madgpap/8715974 It seems that there is more than one reason for the failure. I totally understand Greg's caution about the inchi2mol conversion, but given the difference between rdkit and indigo, there might be room for improvement. Any insights would be very much appreciated. Btw, the KNIME workflow and full list of fails are available to you. Cheers, George On 30 January 2014 04:11, Greg Landrum greg.land...@gmail.com wrote: Yeah, I have been tempted several times to remove the InChI-RDKit functionality entirely On Thu, Jan 30, 2014 at 5:05 AM, Igor Filippov igor.v.filip...@gmail.com wrote: Thank you, Greg! Very nice explanation and I think this issue has confused people before me as well. I am going to have to keep reminding myself about it as the subject comes up every now and then. Igor On Jan 29, 2014 10:59 PM, Greg Landrum greg.land...@gmail.com wrote: Hi Igor, On Wed, Jan 29, 2014 at 2:04 PM, Igor Filippov igor.v.filip...@gmail.com wrote: Greg et al, Here is a little script that demonstrates a problem with fingerprints after the roundtrip through InChI. My input mol file is also attached. As you can see the similarity between before and after is not 1 in 45 out of 100 cases. In one case it is as low as 0.29.
Could someone take a look and tell me what I'm doing wrong? Ah! Now I see what you're doing and understand the problem. It's really important when using InChI to remember that InChI is designed to be an identifier, not an interchange format. The InChI algorithm modifies the molecule as part of its canonicalization step. This modification includes standardizing tautomers. Here's an example of the type of substructure modification that happens in your molecules: input smiles c1c1C(=O)Nc1c1 on being converted to InChI and back yields: OC(=Nc1c1)c1c1 Basically: If you think you know what your molecules are, you probably should be building them from SMILES or CTAB, not InChI. Apologies that I didn't think of this before; I was just focusing on the stereochemistry. -greg
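The roundtrip comparison described in this thread (identifier to molecule, back to identifier, then compare with the input) can be sketched as a small bookkeeping routine. This is a minimal sketch, not the KNIME workflow that was actually used: `roundtrip_report` and the two toy converters are hypothetical names, and the toy converters only imitate a toolkit that normalises one tautomeric form. With RDKit built with InChI support, the two callables would presumably be `Chem.MolFromInchi` and `Chem.MolToInchi`.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def roundtrip_report(
    identifiers: Iterable[str],
    to_mol: Callable[[str], Optional[object]],
    to_identifier: Callable[[object], str],
) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (number converted, list of (input, regenerated) mismatches)."""
    converted = 0
    mismatches: List[Tuple[str, str]] = []
    for ident in identifiers:
        mol = to_mol(ident)
        if mol is None:
            # Inputs that fail to convert are dropped, as in the thread.
            continue
        converted += 1
        regenerated = to_identifier(mol)
        if regenerated != ident:
            mismatches.append((ident, regenerated))
    return converted, mismatches


# Toy stand-ins for a real toolkit (hypothetical, for illustration only).
def toy_to_mol(ident: str) -> Optional[str]:
    return None if ident.startswith("?") else ident


def toy_to_identifier(mol: object) -> str:
    # Imitates a converter that rewrites one tautomeric form into another.
    return str(mol).replace("keto", "enol")


inputs = ["A-enol", "B-keto", "?broken", "C"]
converted, mismatches = roundtrip_report(inputs, toy_to_mol, toy_to_identifier)
discrepancy_rate = len(mismatches) / converted

print(converted, mismatches, round(discrepancy_rate, 3))
```

Applied to the figures quoted above, 200 discrepancies out of roughly 90,000 inputs gives 200 / 90000 ≈ 0.22%, consistent with the "~0.2%" rate mentioned in the message.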
Re: [Rdkit-discuss] InChI roundtrip
Hi Igor, Thanks for the quick reply. I just did in my workflow. The number of discrepancies increased from 200 to 950 :( George On 30 January 2014 19:19, Igor Filippov igor.v.filip...@gmail.com wrote: George, Have you added coordinates to the mols converted from InChI? It made a huge difference for the examples I've tried. Igor
[Rdkit-discuss] MDS using RDKit, SciKit and Pandas
Hi RDKitters, This is not a question, more like an FYI. Inspired by Noel's related post: http://baoilleach.blogspot.co.uk/2014/01/convert-distance-matrix-to-2d.html, I've put together an IPython Notebook example that performs MDS on a bunch of ChEMBL compounds (i.e. visualises their chemical space in 2D). http://nbviewer.ipython.org/gist/madgpap/8538507 Enjoy, George EMBL-EBI
[Rdkit-discuss] postgres FPs in python
Hi there, DB-related question again: When I retrieve fps from a postgres db, they look like this: \x020c00102204810001040001981408420180400040048088c020800423a192001814002021044200092400040208 Is there a way to convert them to RDKit bitvector fingerprint objects or at least bitvector strings in python? Thanks, George
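As a rough illustration of the question above, the `\x...` text is PostgreSQL's hex output format for a bytea value, so it can be decoded to raw bytes and then expanded into bits with the standard library alone. The sketch below is an assumption-laden example, not the cartridge's documented behaviour: the helper names are hypothetical, and the bit order within each byte (least-significant bit first) is assumed and should be checked against the cartridge before relying on it. For real RDKit objects, `DataStructs.CreateFromBinaryText` applied to the bytes returned by the cartridge's `bfp_to_binary_text()` is probably the more direct route.

```python
from typing import List


def bytea_hex_to_bytes(text: str) -> bytes:
    r"""Convert PostgreSQL hex-format bytea text ('\x0123...') to raw bytes."""
    if not text.startswith("\\x"):
        raise ValueError("expected PostgreSQL hex bytea text starting with \\x")
    return bytes.fromhex(text[2:])


def on_bits(raw: bytes) -> List[int]:
    """List set-bit indices, ASSUMING least-significant bit first in each byte."""
    return [
        8 * byte_index + bit
        for byte_index, byte in enumerate(raw)
        for bit in range(8)
        if byte & (1 << bit)
    ]


def to_bit_string(raw: bytes) -> str:
    """Render the fingerprint as a '0'/'1' string in the same assumed bit order."""
    on = set(on_bits(raw))
    return "".join("1" if i in on else "0" for i in range(8 * len(raw)))


# Tiny made-up value: bytes 0x01 and 0x80.
raw = bytea_hex_to_bytes(r"\x0180")
print(on_bits(raw))        # indices of the set bits
print(to_bit_string(raw))  # 16-character bit string
```

Note that database drivers often return bytea columns as a binary buffer rather than as this hex text, in which case the first step is unnecessary.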
[Rdkit-discuss] Friday pandas q
Question to rdkit pandas users (pandaskitters?): I managed to have the mol_send(m) object in a pandas frame: [image: Inline images 1] if I do this: data['mol'].map(str).map(Chem.Mol) I get the mol in base64 PNG: [image: Inline images 2] How do I display the column as rendered images (and keep them internally as a Series of rdmols)? PandasTools.ChangeMoleculeRendering seems relevant but I can't get it to display the mols Cheers, George
Re: [Rdkit-discuss] Friday pandas q
It worked! Many thanks! g On 25 October 2013 16:18, Greg Landrum greg.land...@gmail.com wrote: Hi George, Nikolas is really the expert here, but this just worked for me: curs.execute('select molregno,mol_send(m) from rdk.mols where m@>%s',('c12c1nncc2',)) d = curs.fetchall() df2 = pd.DataFrame(d,columns=('molregno','pkl')) df2['romol']=df2.apply(lambda x:Chem.Mol(str(x['pkl'])),axis=1) PandasTools.RenderImagesInAllDataFrames() del df2['pkl'] df2.head(2) -greg On Fri, Oct 25, 2013 at 4:43 PM, George Papadatos gpapada...@gmail.com wrote: Question to rdkit pandas users (pandaskitters?): I managed to have the mol_send(m) object in a pandas frame: [image: Inline images 1] if I do this: data['mol'].map(str).map(Chem.Mol) I get the mol in base64 PNG: [image: Inline images 2] How do I display the column as rendered images (and keep them internally as a Series of rdmols)? PandasTools.ChangeMoleculeRendering seems relevant but I can't get it to display the mols Cheers, George
Re: [Rdkit-discuss] Notes from the 2013 UGM
Hi all, I'd also like to thank you all for attending this UGM at the EBI and contributing to its success. And, of course, a big thanks to Greg for, well, you know. :) See you all next year - or sooner! George On 22 October 2013 09:09, Greg Landrum greg.land...@gmail.com wrote: Hi, Looks like I'm never going to have time to do a really thorough write up of the UGM. In the interests of getting something out there, I guess I will do something short. From my point of view, the UGM was a great success. George did a great job of getting everything organized, and everything went very smoothly. We had an interesting set of talks, some good questions and discussions during the talks, and a couple of very nice social activities at the pub. The slides and ipython notebooks for many of the talks are available in github: https://github.com/rdkit/UGM_2013 A few things to note from the talks: 1) The code for PDB handling, MMFF94, and Open3DAlign is now all on the trunk. It will be in the upcoming release. 2) Jameed updated the MMPA code in Contrib; the new version is definitely worth checking out, as is Jameed's tutorial on how to use it (part of the materials linked to above). 3) Jameed (and his employer) also contributed an implementation of the Fraggle similarity algorithm described in his talk. The command line tools are now in Contrib and the main similarity code is in $RDBASE/rdkit/Chem/Fraggle. This will be in the upcoming release. The roundtable produced a long list of ideas for future features/changes. Some of these are already done, the rest will land in github as I manage to find time. We also had a discussion about the frequency of RDKit releases. It seems that the quarterly release cycle creates extra work for the community as well as me, so we're going to switch to doing releases every six months. If a critical bug is found (and fixed!) I'll do a patch release, but new features and improvements will only be released twice a year. 
Anyone who wants to stay on the bleeding edge can, of course, track the version of the code in github. That doesn't get checked in without passing tests on at least one platform. If this slower release cycle ends up creating problems, we can always go back to three or four times a year. Many many thanks to everyone for participating; in particular everyone who did a presentation or tutorial and George for the organization. I'm already looking forward to next year! -greg
[Rdkit-discuss] rdkit mol objects from sql
Hi RDKitters, I must have seen this in an ipython notebook but can't find it right now: If I have a table of rdkit mols generated by the cartridge, is there a way to retrieve them using a psycopg2 connection within python - ideally inside a pandas dataframe? I've got this snippet: import pandas as pd import psycopg2 conn = psycopg2.connect("port=5432 user=chembl dbname=chembl_17") data = pd.read_sql(sql, conn) ...but I'm missing the step where I retrieve rdkit mol objects somehow instead of smiles. Many thanks in advance, George
Re: [Rdkit-discuss] rdkit mol objects from sql
Yes it does; many thanks! I've just found the notebook I mentioned: http://nbviewer.ipython.org/4316426/ (Scroll to bottom) I prefer Greg's first solution though, as it avoids the conversion from smiles completely. Best, George Sent from my gPad On 23 Oct 2013, at 20:39, JP jeanpaul.ebe...@inhibox.com wrote: Does the following help you, George? http://comments.gmane.org/gmane.science.chemistry.rdkit.user/860 On 23 October 2013 17:11, George Papadatos gpapada...@gmail.com wrote: Hi RDKitters, I must have seen this in an ipython notebook but can't find it right now: If I have a table of rdkit mols generated by the cartridge, is there a way to retrieve them using a psycopg2 connection within python - ideally inside a pandas dataframe? I've got this snippet: import pandas as pd import psycopg2 conn = psycopg2.connect("port=5432 user=chembl dbname=chembl_17") data = pd.read_sql(sql, conn) ...but I'm missing the step where I retrieve rdkit mol objects somehow instead of smiles. Many thanks in advance, George
Re: [Rdkit-discuss] RDKit UGM 2013 - a few pictures
Lovely pics Paul. Many thanks, g On 14 October 2013 16:56, Paul Emsley pems...@mrc-lmb.cam.ac.uk wrote: https://www.dropbox.com/sh/a3s55kmxa37yx7e/vLC5uea1xP Paul.
Re: [Rdkit-discuss] Problem drawing molecules under windows
Hi Andrea, Seems like a font problem to me, which could indicate the lack of cairo/pango libraries. George On 26 September 2013 17:29, Greg Landrum greg.land...@gmail.com wrote: Hi Andrea, On Thu, Sep 26, 2013 at 10:59 AM, Andrea Volkamer volka...@bio.mx wrote: I am relatively new to rdkit, and just started using IPython notebook under Windows. I installed WinPython-64bit-2.7.5.3 and RDKit_2013_06_1 as well as Pillow-2.1.0.win-amd64-py2.7 to do so. Anyhow, I have some trouble drawing molecules: For some reason, drawing this molecule Chem.MolFromSmiles('C11CC') works, adding, e.g., a nitrogen (Chem.MolFromSmiles('C11CCN')) doesn’t work (rdkit.Chem.rdchem.Mol at 0x7a6e8d0)? This happens for many other examples as well. Any suggestions? Does it happen exclusively for molecules with heteroatoms? -greg
[Rdkit-discuss] failed tests for github master version on ubuntu
Hi there,

I tried to install RDKit on a fresh Ubuntu 13.04 VM today. I checked out the source from GitHub master but I got the following errors after ctest:

65% tests passed, 28 tests failed out of 79

Total Test time (real) = 23.19 sec

The following tests FAILED:
      4 - pyBV (Failed)
      5 - pyDiscreteValueVect (Failed)
      6 - pySparseIntVect (Failed)
      9 - testPyGeometry (Failed)
     12 - pyAlignment (Failed)
     17 - pyDistGeom (Failed)
     27 - pyDepictor (Failed)
     37 - pyChemReactions (Failed)
     42 - pyFragCatalog (Failed)
     44 - pyMolDescriptors (Failed)
     46 - pyPartialCharges (Failed)
     48 - pyMolTransforms (Failed)
     51 - pyForceFieldHelpers (Failed)
     53 - pyDistGeom (Failed)
     55 - pyMolAlign (Failed)
     57 - pyChemicalFeatures (Failed)
     59 - pyShapeHelpers (Failed)
     61 - pyMolCatalog (Failed)
     63 - pySLNParse (Failed)
     64 - pyGraphMolWrap (Failed)
     65 - pyTestConformerWrap (Failed)
     68 - pyMatCalc (Failed)
     69 - pyCMIM (Failed)
     70 - pyRanker (Failed)
     72 - pyFeatures (Failed)
     73 - pythonTestDbCLI (Failed)
     74 - pythonTestDirML (Failed)
     79 - pythonTestDirChem (Failed)
Errors while running CTest

Any ideas why?

Thanks in advance,
George
Re: [Rdkit-discuss] failed tests for github master version on ubuntu
Hello again,

Sorry for the false alarm - that was me messing up the env variables and not having enough coffee to realise it earlier! However, there is still one failure:

*99% tests passed, 1 tests failed out of 79*

Total Test time (real) = 88.37 sec

The following tests FAILED:
     79 - pythonTestDirChem (Failed)
Errors while running CTest

Do you have any ideas why that might be? Is it safe to ignore it?

George
Re: [Rdkit-discuss] failed tests for github master version on ubuntu
Hi Paolo,

Thanks a lot for the quick response and the tips. My problem was actually tk and the lack of the $DISPLAY variable. Now everything has passed. See you next week!

George

On 25 September 2013 10:36, Paolo Tosco paolo.to...@unito.it wrote:

Dear George,

that test depends on the PIL Python module, which was not present in my Linux distro (Scientific Linux 6); once I installed it, the test ran fine. Regarding Ubuntu 13.04 and PIL, I just googled this thread: https://plus.google.com/112555004333838485342/posts/H8iRnbmdv7a Maybe this is your case too. Anyway, I suggest trying

    $ cd rdkit/Chem
    $ python test_list.py

to see what is actually failing.

HTH,
Paolo

--
Paolo Tosco, Ph.D.
Department of Drug Science and Technology
Via Pietro Giuria, 9 - 10125 Torino (Italy)
Tel: +39 011 670 7680 | Mob: +39 348 5537206
Fax: +39 011 670 7687 | E-mail: paolo.tosco@unito.it
http://open3dqsar.org | http://open3dalign.org
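The two causes named in this thread (a missing PIL module, and tk without a usable $DISPLAY) can be checked before running the test suite. The following is only a minimal sketch, not part of RDKit: the function name `check_drawing_prerequisites` is made up for illustration, and it assumes Python 3 module names (`tkinter`, where the Python 2.7 setup in the thread would have used `Tkinter`).

```python
import importlib.util
import os


def check_drawing_prerequisites(environ=None):
    """Report whether the pieces mentioned in the thread look available.

    Hypothetical helper for illustration; it only looks modules up,
    it does not import them or open any window.
    """
    env = os.environ if environ is None else environ
    # find_spec returns None when a top-level module cannot be located.
    has_pil = importlib.util.find_spec("PIL") is not None
    has_tk = importlib.util.find_spec("tkinter") is not None
    # An empty or missing DISPLAY means Tk cannot open a window under X11.
    has_display = bool(env.get("DISPLAY"))
    return {"PIL": has_pil, "tkinter": has_tk, "DISPLAY": has_display}


if __name__ == "__main__":
    for name, ok in check_drawing_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Running `python test_list.py` in rdkit/Chem, as Paolo suggests, remains the way to see which test actually fails; the check above only narrows down the environment.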
Re: [Rdkit-discuss] name generator
I think this is not an actual structure-to-name converter but a look-up service based on a predefined dictionary. If this is true, then it won't return anything for any novel/unseen structures. Give it a try and let us know.

George.

Sent from my giPhone

On 27 Aug 2013, at 18:39, David Hall li...@cowsandmilk.net wrote:

Not sure what software is behind it, but the NCI's Chemical Identifier Resolver may suit your needs. For your example, the URL:

http://cactus.nci.nih.gov/chemical/structure/CC(C)O/iupac_name

returns

Propan-2-ol

-David

On Aug 27, 2013, at 11:54 AM, Sergio Martinez Cuesta sermar...@gmail.com wrote:

thanks Greg, indeed, I only found commercial software for it http://www.chemaxon.com/marvin/help/applications/molconvert.html

cheers
Sergio

On 27 August 2013 16:45, Greg Landrum greg.land...@gmail.com wrote:

Dear Sergio,

On Tue, Aug 27, 2013 at 5:21 PM, Sergio Martinez Cuesta sermar...@gmail.com wrote:

is there any IUPAC name generator in RDKit? e.g. for transforming CC(C)O into propan-2-ol ?

There is not. In fact, I'm not aware of any open source structure-name converters.

-greg
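The resolver URL quoted in this thread follows the pattern base/structure/representation, so building it for other SMILES strings is mechanical. The sketch below is an assumption-laden illustration: the helper name `resolver_url` is made up, and percent-encoding the SMILES is a guess at what is safe for characters such as '#' or '/', which the thread's own example does not need. The actual fetch is left out so the sketch runs offline.

```python
from urllib.parse import quote

# Base URL taken from the example URL given in the thread.
CACTUS_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(smiles, representation="iupac_name"):
    """Build a resolver URL for a SMILES string (hypothetical helper)."""
    # Encode every reserved character so that '#', '/' or '\' in a SMILES
    # string cannot be mistaken for URL syntax.
    encoded = quote(smiles, safe="")
    return f"{CACTUS_BASE}/{encoded}/{representation}"


print(resolver_url("CC(C)O"))
```

As George points out, a dictionary-backed service may return nothing for a novel structure, so any caller should expect a lookup to fail.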
Re: [Rdkit-discuss] MMP analysis - active vs. inactive compounds
Hi Paul,

I guess you first have to generate the list of MMPs as per Jameed's code, then join your property values for MolID1 and MolID2, and finally calculate the property difference/ratio for each MMP.

Best regards,
George

On 3 May 2013 12:10, paul.czodrow...@merckgroup.com wrote:

Dear RDKitters,

has anyone applied Jameed's great code to the following scenario:
- Perform a MMP analysis with respect to a particular property (e.g. activity)

Given the current code, I do not see any chance to consider any property besides the compound ID. It is also not possible to provide 2 files (one for the active compounds, one for the inactive compounds) - or am I wrong?

Cheers & Thanks,
Paul
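The join-then-subtract step George describes can be sketched in a few lines. This is only an illustration under stated assumptions: the tuple layout (MolID1, MolID2, transformation), the function name `annotate_pairs` and the example IDs, transformations and values are all hypothetical, and the real output format of the MMP code may differ.

```python
def annotate_pairs(pairs, values):
    """Attach property values and their difference to each matched pair.

    pairs  : iterable of (mol_id_1, mol_id_2, transformation) tuples
             (assumed layout, not the MMP script's actual output format)
    values : dict mapping molecule ID -> property value, e.g. pIC50
    """
    rows = []
    for id1, id2, transform in pairs:
        # Skip pairs where either molecule has no measured value.
        if id1 not in values or id2 not in values:
            continue
        v1, v2 = values[id1], values[id2]
        rows.append({
            "MolID1": id1,
            "MolID2": id2,
            "transform": transform,
            "value1": v1,
            "value2": v2,
            "delta": v2 - v1,  # for log-scale data this corresponds to a ratio
        })
    return rows


# Made-up example data: mol3 has no measurement, so its pair is dropped.
pairs = [("mol1", "mol2", "[*:1]H>>[*:1]F"),
         ("mol1", "mol3", "[*:1]H>>[*:1]Cl")]
pic50 = {"mol1": 6.5, "mol2": 7.0}

for row in annotate_pairs(pairs, pic50):
    print(row)
```

For an active/inactive split, the same join works with class labels as values; instead of a difference one would flag the pairs whose two labels differ.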
[Rdkit-discuss] Cartridge problems (again) Ubuntu 64-bit
Hi RDKitters,

So I've successfully installed RDKit from the *svn trunk* on a brand new Ubuntu Server 12.10 64-bit VM. All 77/77 tests passed. Yay.

When I tried to build the cartridge against psql 9.1.8, 4/8 tests failed:

    ## Build RDKit Cartridge
    cd $RDBASE/Code/PgSQL/rdkit
    make
    sudo make install
    make installcheck

    == dropping database contrib_regression ==
    NOTICE: database contrib_regression does not exist, skipping
    DROP DATABASE
    == creating database contrib_regression ==
    CREATE DATABASE
    ALTER DATABASE
    == running regression test queries ==
    test rdkit-91 ... FAILED
    test props ... ok
    test btree ... FAILED
    test molgist ... ok
    test bfpgist-91 ... FAILED
    test sfpgist ... ok
    test slfpgist ... ok
    test fps ... FAILED

    == 4 of 8 tests failed. ==

However, the following works:

    createdb test
    psql test
    psql (9.1.8)
    Type "help" for help.

    test=# create extension rdkit;
    CREATE EXTENSION
    test=# show rdkit.tanimoto_threshold;
     rdkit.tanimoto_threshold
    --------------------------
     0.5
    (1 row)

    test=# select 'c1c1O'::mol;
      mol
    -------
     Oc1c1
    (1 row)

Any ideas?

Many thanks in advance,
George
EMBL-EBI
Re: [Rdkit-discuss] Cartridge problems (again) Ubuntu 64-bit
Ah, many thanks for the clarification Greg. Are these changes related to the problematic phenanthrene substructure query? When is the new release scheduled for?

Cheers,
George
EMBL-EBI

On 21 March 2013 17:58, greg landrum greg.land...@gmail.com wrote:

These are due to some ongoing changes in the rdkit fingerprint. Don't worry about them. I will fix those tests after the fingerprint changes settle down, definitely before the next release.

-greg
Re: [Rdkit-discuss] problems with RDKit and Mountain Lion
Hi Greg,

I built boost 1.49 from source and tried again. There is now a similar error but elsewhere:

    [ 44%] Building CXX object Code/GraphMol/SmilesParse/CMakeFiles/SmilesParse.dir/SmartsWrite.cpp.o
    Linking CXX shared library ../../../lib/libSmilesParse.dylib
    ld: warning: path '//Library/Frameworks/Python.framework/Versions/2.7/Python' following -L not a directory
    Undefined symbols for architecture x86_64:
      yysmarts_parse(char const*, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >*, void*), referenced from:
          RDKit::(anonymous namespace)::smarts_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      yysmiles_parse(char const*, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >*, std::list<unsigned int, std::allocator<unsigned int> >*, void*), referenced from:
          RDKit::(anonymous namespace)::smiles_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      yysmarts_lex_init(void**), referenced from:
          RDKit::(anonymous namespace)::smarts_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      yysmiles_lex_init(void**), referenced from:
          RDKit::(anonymous namespace)::smiles_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      setup_smarts_string(std::string const&, void*), referenced from:
          RDKit::(anonymous namespace)::smarts_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      setup_smiles_string(std::string const&, void*), referenced from:
          RDKit::(anonymous namespace)::smiles_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      yysmarts_lex_destroy(void*), referenced from:
          RDKit::(anonymous namespace)::smarts_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      yysmiles_lex_destroy(void*), referenced from:
          RDKit::(anonymous namespace)::smiles_parse(std::string const&, std::vector<RDKit::RWMol*, std::allocator<RDKit::RWMol*> >&) in SmilesParse.cpp.o
      _yysmarts_debug, referenced from:
          RDKit::SmartsToMol(std::string, int, bool, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >*) in SmilesParse.cpp.o
      _yysmiles_debug, referenced from:
          RDKit::SmilesToMol(std::string, int, bool, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >*) in SmilesParse.cpp.o
    ld: symbol(s) not found for architecture x86_64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make[2]: *** [lib/libSmilesParse.2012.09.1beta.dylib] Error 1
    make[1]: *** [Code/GraphMol/SmilesParse/CMakeFiles/SmilesParse.dir/all] Error 2
    make: *** [all] Error 2

On 10 October 2012 02:35, Greg Landrum greg.land...@gmail.com wrote:

Just an FYI, not sure if it's relevant or not: I have not yet done an rdkit build with boost 1.51, so I am not sure that the problem isn't there

-greg
Re: [Rdkit-discuss] problems with RDKit and Mountain Lion
Hello again,

Success at last! I managed to build rdkit using brew and boost 1.49. I think the cause of the problem was a strange combination of Mountain Lion, boost 1.51 and a not up-to-date rdkit svn repo in HomeBrew. So to summarise:

    brew update
    brew uninstall boost
    brew versions boost
    cd /usr/local
    git checkout e40bc41 /usr/local/Library/Formula/boost.rb  #version 1.49.0
    cd
    brew install boost --build-from-source
    brew untap edc/homebrew-rdkit
    brew tap edc/homebrew-rdkit
    brew uninstall rdkit
    brew install --HEAD rdkit

Thanks for all the tips,
George
Re: [Rdkit-discuss] problems with RDKit and Mountain Lion
Hi James,

You're right. I checked out the true HEAD which is 2234 but it still failed! This is the make log:

    MS-Verdun:build georgep$ cmake -D PYTHON_LIBRARY=/${PYTHON_ROOT}/Python -DPYTHON_INCLUDE_DIR=${PYTHON_ROOT}/Headers .. 2>&1 | tee cmake.log
    -- The C compiler identification is GNU 4.2.1
    -- The CXX compiler identification is Clang 4.1.0
    -- Checking whether C compiler has -isysroot
    -- Checking whether C compiler has -isysroot - yes
    -- Checking whether C compiler supports OSX deployment target flag
    -- Checking whether C compiler supports OSX deployment target flag - yes
    -- Check for working C compiler: /usr/bin/gcc
    -- Check for working C compiler: /usr/bin/gcc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check if the system is big endian
    -- Searching 16 bit integer
    -- Looking for sys/types.h
    -- Looking for sys/types.h - found
    -- Looking for stdint.h
    -- Looking for stdint.h - found
    -- Looking for stddef.h
    -- Looking for stddef.h - found
    -- Check size of unsigned short
    -- Check size of unsigned short - done
    -- Using unsigned short
    -- Check if the system is big endian - little endian
    -- Found PythonLibs: //Library/Frameworks/Python.framework/Versions/2.7/Python (found version 2.7.3)
    -- Found PythonInterp: /Library/Frameworks/Python.framework/Versions/2.7/bin/python (found version 2.7.3)
    -- Boost version: 1.51.0
    -- Found the following Boost libraries:
    --   python
    -- Found BISON: /usr/bin/bison
    -- Found FLEX: /usr/bin/flex
    -- Looking for include file pthread.h
    -- Looking for include file pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - found
    -- Found Threads: TRUE
    -- Boost version: 1.51.0
    -- Found the following Boost libraries:
    --   regex
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /Users/georgep/rdkit/rdkit-code/build

*and the error is:*

    Linking CXX shared library ../../lib/libGraphMol.dylib
    ld: warning: path '//Library/Frameworks/Python.framework/Versions/2.7/Python' following -L not a directory
    Undefined symbols for architecture x86_64:
      boost::system::system_category(), referenced from:
          __GLOBAL__I_a in QueryAtom.cpp.o
          __GLOBAL__I_a in QueryBond.cpp.o
          __GLOBAL__I_a in ROMol.cpp.o
          __GLOBAL__I_a in QueryOps.cpp.o
          boost::mutex::mutex() in MolPickler.cpp.o
          __GLOBAL__I_a in MolPickler.cpp.o
          __GLOBAL__I_a in AtomIterators.cpp.o
          ...
      boost::system::generic_category(), referenced from:
          __GLOBAL__I_a in QueryAtom.cpp.o
          __GLOBAL__I_a in QueryBond.cpp.o
          __GLOBAL__I_a in ROMol.cpp.o
          __GLOBAL__I_a in QueryOps.cpp.o
          __GLOBAL__I_a in MolPickler.cpp.o
          __GLOBAL__I_a in AtomIterators.cpp.o
          __GLOBAL__I_a in AddHs.cpp.o
          ...
    ld: symbol(s) not found for architecture x86_64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make[2]: *** [lib/libGraphMol.2012.09.1beta.dylib] Error 1
    make[1]: *** [Code/GraphMol/CMakeFiles/GraphMol.dir/all] Error 2
    make: *** [all] Error 2

Any more ideas?

Regards,
George

On 9 October 2012 22:33, James Swetnam jswet...@gmail.com wrote:

George-

My templating fix was submitted as 2155, and HEAD in SVN is at 2234. I'm not terribly familiar with homebrew, or why it thinks 2148 is HEAD

James

On Tue, Oct 9, 2012 at 2:27 PM, George Papadatos gpapada...@gmail.com wrote:

Hi James,

Many thanks for the quick answer. I'm afraid I'm already using the trunk:

    brew install -v --HEAD rdkit --with-inchi (revision 2148)

Regards,
George

On 9 October 2012 21:30, James Swetnam jswet...@gmail.com wrote:

George-

I believe you're running into an issue that was raised on the developer list. I submitted a patch for this issue, which has been applied in the SVN trunk. If you install from trunk you should be fine.

Best
James

On Tue, Oct 9, 2012 at 12:52 PM, George Papadatos gpapada...@gmail.com wrote:

Hi RDKitters,

I get compilation errors when I try to build RDKit on a new Mountain Lion Mac OS machine. I've tried both Eddie's brew formula and manual installation with cmake. I also tried both the beta 2012_09 versions and the 2012_06 one. Apart from the system python, I use the python.org version (2.7.3). I also used brew to build boost from source. I copied the error I get at the bottom of this message. Has anyone had a similar problem? Any ideas for troubleshooting?

Many thanks,
George

    Linking CXX shared library ../../lib/libGraphMol.dylib
    cd /tmp/rdkit-urlC/Code/GraphMol
    /usr/local/Cellar/cmake/2.8.9/bin/cmake -E cmake_link_script CMakeFiles/GraphMol.dir/link.txt --verbose=1
    /usr/local/Library/ENV/4.3/c++ -shared -compatibility_version 1.0.0 -current_version 2012.9.1 -o ../../lib/libGraphMol.2012.09.1pre.dylib -install_name /tmp/rdkit-urlC/lib
Re: [Rdkit-discuss] parallel conformation generation
Hi Andrew,

Thanks for this. I didn't know about the futures and progressbar modules. You wrote:

---
*I have to use the zip because map(f, iterable, [chunksize=None]) only takes a single iterable. This also means I need to change the generateconformations function so that it takes a single element as input, which is a 2-element tuple of the molecule and the count.*
---

For such cases, there is a more elegant and pythonic way: functools.partial
http://docs.python.org/library/functools.html#functools.partial

It just freezes some of the arguments of a function, so you can use map with a single argument. In your case:

    newfunc = partial(generateconformations, size=n)
    map(newfunc, mols)

Best regards,
George P.

On 4 October 2012 22:47, Andrew Dalke da...@dalkescientific.com wrote:

Hi again,

Greg asked why I used the concurrent.futures module rather than the multiprocessing module which is standard with Python 2.6. There are a few differences in the API which makes the futures module more interesting.

First off, here's how you could write the same process pool part using the existing multiprocessing module:

    from multiprocessing import Pool
    p = Pool(5)
    for mol, ids in p.map(generateconformations, zip(suppl, [n]*len(suppl))):
        for id in ids:
            writer.write(mol, confId=id)

I have to use the zip because map(f, iterable, [chunksize=None]) only takes a single iterable. This also means I need to change the generateconformations function so that it takes a single element as input, which is a 2-element tuple of the molecule and the count. (That is, change from

    def generateconformations(m, n): ...

to

    def generateconformations((m, n)): ...

). That's a touch uglier, but doable.
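The functools.partial alternative suggested at the top of this reply can be tried without RDKit. In the sketch below, `generateconformations` is a stand-in that only returns fake conformer IDs, and the keyword name `size` follows George's snippet rather than the original function signature; both are assumptions made for illustration.

```python
from functools import partial


def generateconformations(mol, size):
    # Stand-in for the real function: return the input together with
    # a list of fake conformer IDs, one per requested conformer.
    return mol, list(range(size))


mols = ["mol_a", "mol_b"]  # placeholders for molecule objects
n = 3

# Freeze the count so the resulting callable takes a single argument,
# which is what a single-iterable map() expects.
newfunc = partial(generateconformations, size=n)

results = list(map(newfunc, mols))
print(results)
```

A partial built from a module-level function can also be pickled, so the same `newfunc` should be usable with `Pool.map` in place of the zip-and-tuple workaround.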
Now, when I posted the code yesterday, I should have posted the simplest version of the code, which is:

    with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        for mol, ids in executor.map(generateconformations, suppl, [n]*len(suppl)):
            for id in ids:
                writer.write(mol, confId=id)

Then Greg wouldn't have asked me about how complex my code was. ;)

This is the easiest to understand. You can see that this API supports multiple iterators. I used [n]*len(suppl) to make a new list containing repeats of the count, so I could have the twin iterators of the molecules and the count. This is a bit simpler than the multiprocessing code.

In addition, the with statement knows how to work with an executor. Here it means that all submitted jobs must finish before leaving the with block, and the process pool will be shut down, even if there's an exception. With the multiprocessing module, you need to manage that yourself, or trust in the memory manager.

But yesterday I wrote something more like this:

    # Submit a set of asynchronous jobs
    jobs = []
    for mol in suppl:
        if mol:
            job = executor.submit(generateconformations, mol, n)
            jobs.append(job)

    # Process the job results (in submission order) and save the conformers.
    for job in jobs:
        mol, ids = job.result()
        for id in ids:
            writer.write(mol, confId=id)

The submit immediately returns a 'future' object, which is called a promise in some other languages. You can ask for its .result() to get its result. That call will block (up to a timeout) if the result isn't there. You can also check to see if there is a result.

The reason I did this is because I usually 1) show a progress bar and 2) have enough memory to store all the results in memory.
I've enjoyed using the 'progressbar' module, from http://pypi.python.org/pypi/progressbar/

I have code which looks like this:

    with futures.ProcessPoolExecutor(max_workers=4) as executor:
        for (collection, first_id, last_id) in blocks:
            jobs.append(executor.submit(process_block, tmpdir, config, collection, first_id, last_id))

        widgets = ["Fingerprinting ", progressbar.Percentage(), " ", progressbar.ETA(), " ", progressbar.Bar()]
        pbar = progressbar.ProgressBar(widgets=widgets, maxval=len(jobs))
        for job in pbar(futures.as_completed(jobs)):
            job.result()

This submits all of the fingerprinting jobs to the process pool. The futures.as_completed() function takes an iterable of jobs and returns each one as it becomes available, no matter what the order is. Then the ProgressBar sees the new item, updates the terminal display to show progress information and an ETA, and returns the original object itself as an iterator. Finally, I call job.result() in the loop, since .result() will forward any exception that happened during the original call.

Then if I want the results I iterate over them again:

    for job in jobs:
        ... do something with job.result() ...

BTW, you don't need to keep things around in memory. You can also do things purely asynchronously, should the output order not matter. In that case, the
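The two ideas in this thread, freezing the count with functools.partial and collecting results either in order (map) or as they finish (as_completed), can be sketched together. This is only an illustration written for Python 3: the generateconformations body is a hypothetical stand-in that does no chemistry and does not touch RDKit, the input strings are placeholders rather than molecules, and a thread pool is used so the snippet runs anywhere. For real conformer generation you would presumably swap in futures.ProcessPoolExecutor as in Andrew's code.

```python
from concurrent import futures
from functools import partial


def generateconformations(m, n):
    """Hypothetical stand-in: returns the 'molecule' and n fake conformer ids."""
    return m, list(range(n))


mols = ["CCO", "c1ccccc1", "CC(=O)O"]  # placeholder inputs, not RDKit molecules
n = 3

# Freeze the count so the worker takes a single argument.
worker = partial(generateconformations, n=n)

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    # map() returns results in the same order as the inputs.
    ordered = list(executor.map(worker, mols))

    # submit() + as_completed() yields each job as it finishes, in any order.
    jobs = [executor.submit(worker, mol) for mol in mols]
    unordered = [job.result() for job in futures.as_completed(jobs)]
```

With partial there is no need for the zip or for the tuple-unpacking signature; the as_completed variant is the one to reach for when output order does not matter.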
Re: [Rdkit-discuss] Reading files (SmilesMolSupplier, SDMolSupplier
Hi Fabian,

The first one is easy: the function expects a header in the file by default. There is a parameter that toggles this, but I don't have access to a computer right now. There is an example in the documentation.

Best regards,
George

Sent from my gPad

On 7 Sep 2012, at 13:34, Fabian Dey fabian...@gmail.com wrote:

Hi

I found two issues when reading files:

1) I might be getting something wrong here, but it seems as if SmilesMolSupplier misses the very first SMILES:

input smiles file test.smi:

    C mola
    CC molb
    CCC molc
    mold

# python script

    from rdkit import Chem
    suppl = Chem.SmilesMolSupplier('test.smi');
    print "TEST-1 : %s %s" %(Chem.MolToSmiles(suppl[0]),suppl[0].GetProp("_Name"))
    print
    for mol in suppl:
        print "TEST-2 : %s %s" %(Chem.MolToSmiles(mol),mol.GetProp("_Name"))
    print
    for i,mol in enumerate(suppl):
        print "TEST-3 : %s %s" %(Chem.MolToSmiles(mol),mol.GetProp("_Name"))

#output

    TEST-1 : CC molb
    TEST-2 : CC molb
    TEST-2 : CCC molc
    TEST-2 : mold
    TEST-3 : CC molb
    TEST-3 : CCC molc
    TEST-3 : mold

The first molecule, mola, is not available through the supplier (this also happens with other SMILES files).

2) SDMolSupplier: I have a script which calculates properties from SD files read in through the corresponding supplier, and RDKit occasionally reported the following errors:

    Pre-condition Violation
    Atomic number not found
    Violation occurred on line 56 in file /home/dey/Downloads/RDKit_2012_06_1/Code/GraphMol/PeriodicTable.h
    Failed Expression: atomicNumber < byanum.size()
    [12:25:23] Unexpected error hit on line 6
    [12:25:23] ERROR: moving to the begining of the next molecule
    ERROR for molecule at position 0

It turned out that in the corresponding SD file the atom elements were written in all capital letters (e.g. CL). If these were changed to the proper format (Cl), RDKit passed without throwing an error. Although I can preprocess the SD files with a script, it would be nice if RDKit could handle these cases internally.
Best
Fabian

--
Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
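Both issues in this thread can be illustrated with a short sketch. For the first, the parameter George alludes to is, as far as I know, titleLine on SmilesMolSupplier; passing titleLine=False tells the supplier that the first line is a molecule rather than a header. For the second, the preprocessing Fabian mentions can be done in plain Python on a V2000 molblock. The helper names and the hand-written chloromethane molblock below are hypothetical, and the column positions assume the fixed-width V2000 atom block layout (symbol in columns 32-34).

```python
def normalize_element_symbols(molblock):
    """Turn all-caps two-letter element symbols ('CL') into 'Cl' in a V2000 molblock."""
    lines = molblock.split("\n")
    n_atoms = int(lines[3][:3])  # counts line: columns 1-3 hold the atom count
    for i in range(4, 4 + n_atoms):
        symbol = lines[i][31:34].strip()  # atom symbol lives in columns 32-34
        # 'LP' (lone pair) is a legal all-caps entry, so leave it alone.
        if len(symbol) == 2 and symbol.isalpha() and symbol.isupper() and symbol != "LP":
            lines[i] = lines[i][:31] + symbol.capitalize().ljust(3) + lines[i][34:]
    return "\n".join(lines)


def read_headerless_smiles(path):
    """Open a SMILES file whose first line is already a molecule, not a header."""
    from rdkit import Chem  # imported lazily so the helper above works without RDKit
    return Chem.SmilesMolSupplier(path, titleLine=False)


# Hand-written example record with the chlorine written as 'CL'.
example = "\n".join([
    "chloromethane",
    "  hand-written",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 CL  0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])
fixed = normalize_element_symbols(example)
```

The normalizer only rewrites the symbol field and keeps every line the same width, so the rest of the record is untouched. For an SD file you would apply it to each record before handing the text to RDKit.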
Re: [Rdkit-discuss] Faulty valence for nitrogen in aromatic ring
Wow, this almost makes me want to re-write my thesis in LaTeX. Almost! :)

George

On 15 August 2012 16:26, Greg Landrum greg.land...@gmail.com wrote:

On Wed, Aug 15, 2012 at 5:10 PM, Michael Palmer mpal...@uwaterloo.ca wrote:
>> Now that I've at least tried to clear up what is going on, maybe I can be more helpful: was there a specific question you were trying to answer that led you to your discovery that the RDKit behaves strangely in this special case?
>
> What I'm trying to do can be inspected here:
> http://chimpsky.uwaterloo.ca/mol2chemfig/index
>
> Briefly, I'm building a program for converting molecular structures from SMILES or molfile format to TeX code, using the syntax defined by the chemfig package as the target.

coool!

> rdkit does all the heavy lifting. I was using the GetImplicitHs method to determine how many hydrogens to attach to carbons and heteroatoms and then noticed that the number of hydrogens on nitrogen in rings was off. From your answer, it seems I should be using GetTotalNumHs. However, I would still like to be able to distinguish between hydrogens that were specified in a molfile, with coordinates, and those that weren't.

The answer to this isn't super straightforward, so it probably won't come until tomorrow.

> Another question I ran into was accessing the coordinates of an atom, either loaded from a molfile or, with SMILES, computed with AllChem.Compute2DCoords. Does the atom object have a method to get at those? Right now, I'm using some embarrassing workaround.

This one I can answer quickly.
You need to use the molecule's conformer:

    In [7]: AllChem.Compute2DCoords(m)
    Out[7]: 0

    In [8]: conf = m.GetConformer()

    In [9]: for atom in m.GetAtoms():
       ...:     aid = atom.GetIdx()
       ...:     print aid,list(conf.GetAtomPosition(aid))
       ...:
    0 [0.15858546683951269, -1.1294387542967057, 0.0]
    1 [-1.3046720119188515, -1.4594047386916416, 0.0]
    2 [-2.3220596761687866, -0.35716958200679838, 0.0]
    3 [-1.8761898616603592, 1.07503155907298, 0.0]
    4 [-0.41293238290199596, 1.4049975434679163, 0.0]
    5 [0.60445528134793969, 0.30276238678307321, 0.0]
    6 [2.0677127601063026, 0.63272837117800962, 0.0]
    7 [3.0851004243562379, -0.46950678550683356, 0.0]

-greg
Re: [Rdkit-discuss] windows binary install
Hi Alan,

You're almost there, but it now seems that you need to upgrade your numpy from 1.4 to 1.6.

Regards,
George P.

Sent from my gPhone

On 11 Aug 2012, at 21:38, stanley5101 stanley5...@yahoo.co.uk wrote:

I've gone to this archived message and installed the library mentioned. It just installed itself rather than asking me where I wanted to put it. However, rdkit seems to be recognising it as I now get a new error (pasted below). Do I have to go to an older rdkit which likes numpy version 4?

    Python 2.7.1 |EPD 7.0-2 (32-bit)| (r271:86832, Dec 2 2010, 10:35:02) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    import rdkit
    from rdkit import Chem
    RuntimeError: module compiled against API version 6 but this version of numpy is 4
    RuntimeError: module compiled against API version 6 but this version of numpy is 4

From: James Davidson j.david...@vernalis.com
To: stanley5...@yahoo.co.uk
Cc: rdkit-discuss@lists.sourceforge.net
Sent: Saturday, 11 August 2012, 7:58
Subject: Re: [Rdkit-discuss] windows binary install

Hi Alan,

My guess is that your problem is missing DLLs, available in the MS C++ Redistributable package. The solution was described by George for a very similar problem: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02381.html. I now tend to explicitly put a copy of these two DLLs into the RDKit lib folder when installing for others, and I can reproduce your error if I remove one of these DLLs from there on my system.

Cheers
James

__
PLEASE READ: This email is confidential and may be privileged. It is intended for the named addressee(s) only and access to it by anyone else is unauthorised. If you are not an addressee, any disclosure or copying of the contents of this email or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If you have received this email in error, please notify the sender or postmas...@vernalis.com.
Email is not a secure method of communication and the Company cannot accept responsibility for the accuracy or completeness of this message or any attachment(s). Please check this email for virus infection for which the Company accepts no responsibility. If verification of this email is sought then please request a hard copy. Unless otherwise stated, any views or opinions presented are solely those of the author and do not represent those of the Company. The Vernalis Group of Companies 100 Berkshire Place Wharfedale Road Winnersh, Berkshire RG41 5RD, England Tel: +44 (0)118 938 To access trading company registration and address details, please go to the Vernalis website at www.vernalis.com and click on the Company address and registration details link at the bottom of the page.. __
Re: [Rdkit-discuss] cartridge problems
packages...
en_au
en_gb
en_us
en_za
Setting up postgresql-9.1 (9.1.3-2) ...
Creating new cluster (configuration: /etc/postgresql/9.1/main, data: /var/lib/postgresql/9.1/main)...
Moving configuration file /var/lib/postgresql/9.1/main/postgresql.conf to /etc/postgresql/9.1/main...
Moving configuration file /var/lib/postgresql/9.1/main/pg_hba.conf to /etc/postgresql/9.1/main...
Moving configuration file /var/lib/postgresql/9.1/main/pg_ident.conf to /etc/postgresql/9.1/main...
Configuring postgresql.conf to use port 5432...
update-alternatives: using /usr/share/postgresql/9.1/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode.
 * Starting PostgreSQL 9.1 database server [ OK ]
Setting up postgresql (9.1+130~precise) ...
Setting up postgresql-server-dev-9.1 (9.1.3-2) ...
Setting up postgresql-server-dev-all (130~precise) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place

Again, *all* the tests fail, as do the create extension attempts. I even tried explicit postgresql-9.1 and postgresql-9.2 (beta version) but with the same sad results. Am I doing something wrong here, like still installing the default postgresql packages and not the good ones?

Regards,
George

On 29 May 2012 23:27, George Papadatos gpapada...@gmail.com wrote:

Hi Jan,

Many thanks for the reply. Yes, I used apt-get and the default repositories to install postgresql on Ubuntu 12.04. I'll follow your guidelines and the new repos tomorrow and I'll let you know.

Many thanks again,
George

On 29 May 2012 23:15, Jan Holst Jensen j...@biochemfusion.com wrote:

On 2012-05-29 17:45, George Papadatos wrote:

Hi RDKitters,

Today I tried to install the RDKit and cartridge on a brand new Ubuntu 12.04 32-bit running on a Virtual Box. [...]
Then when I tried to install the extension:

    georgep@george-VB:~$ psql -c 'CREATE EXTENSION rdkit' gpdb
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    connection to server was lost

or even:

    georgep@george-VB:~/local/rdkit/rdkit_trunk/Code/PgSQL/rdkit$ psql gpdb
    psql (9.1.3)
    Type "help" for help.

    gpdb=# create extension rdkit;
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    The connection to the server was lost. Attempting reset: Succeeded.
    gpdb=# show rdkit.tanimoto_threshold;
    ERROR: unrecognized configuration parameter "rdkit.tanimoto_threshold"

Any ideas would be much appreciated!

Hi George,

Sounds exactly like the behavior I described in this thread:
http://sourceforge.net/mailarchive/forum.php?thread_name=CAD4fdRRHdpqDRCWd5AjEzDWJia5WM6zsq%3DosvVmb%3DYHe%3DpmR7A%40mail.gmail.com&forum_name=rdkit-discuss

My issue seemed to be related to the OpenSCG version of PostgreSQL, and for my purposes the issue was solved by using Martin Pitt's postgres packages instead. However, on one machine, a Linux Mint box, I never got it working.

Are you using all plain vanilla packages from Ubuntu 12.04?

Cheers
-- Jan
Re: [Rdkit-discuss] cartridge problems
Hi Jan,

Mine is exactly the same:

    gcc test.c -I/usr/include/postgresql/9.1/server;./a.out
    90103

So, I am back to square 1! I am starting to get a bit desperate here. Has anyone ever successfully built the cartridge from the trunk on a plain Ubuntu 12.04?

Many thanks for your help,
George

On 30 May 2012 12:48, Jan Holst Jensen j...@biochemfusion.com wrote:

On 2012-05-30 13:24, George Papadatos wrote:

Thanks to both of you. I do not know how to check for the PG_VERSION_NUM. I tried to edit guc.c by removing the conditional check of the PG_VERSION but with the same results. Adrian, is this what you meant?

    DefineCustomRealVariable(
        "rdkit.tanimoto_threshold",
        "Lower threshold of Tanimoto similarity",
        "Molecules with similarity lower than threshold are not similar by % operation",
        &rdkit_tanimoto_smlar_limit,
        0.5,
        0.0,
        1.0,
        PGC_USERSET,
        0,
        (GucRealCheckHook)**TanimotoLimitAssign,
        NULL,
        NULL
    );

    DefineCustomRealVariable(
        "rdkit.dice_threshold",
        "Lower threshold of Dice similarity",
        "Molecules with similarity lower than threshold are not similar by # operation",
        &rdkit_dice_smlar_limit,
        0.5,
        0.0,
        1.0,
        PGC_USERSET,
        0,
        (GucRealCheckHook)**DiceLimitAssign,
        NULL,
        NULL
    );

Regards,
George

Hi George,

I just tried the same on my VM, with no change for the better either.
My version of guc.c now looks like this:

    static void
    initRDKitGUC()
    {
        if (rdkit_guc_inited)
            return;

        DefineCustomRealVariable(
            "rdkit.tanimoto_threshold",
            "Lower threshold of Tanimoto similarity",
            "Molecules with similarity lower than threshold are not similar by % operation",
            &rdkit_tanimoto_smlar_limit,
            0.5,
            0.0,
            1.0,
            PGC_USERSET,
            0,
            //if PG_VERSION_NUM >= 90100
            (GucRealCheckHook)**TanimotoLimitAssign,
            NULL,
            //else
            // TanimotoLimitAssign,
            //endif
            NULL
        );

        DefineCustomRealVariable(
            "rdkit.dice_threshold",
            "Lower threshold of Dice similarity",
            "Molecules with similarity lower than threshold are not similar by # operation",
            &rdkit_dice_smlar_limit,
            0.5,
            0.0,
            1.0,
            PGC_USERSET,
            0,
            //if PG_VERSION_NUM >= 90100
            (GucRealCheckHook)**DiceLimitAssign,
            NULL,
            //else
            // DiceLimitAssign,
            //endif
            NULL
        );

        rdkit_guc_inited = true;
    }

I did a cartridge make clean, make, sudo make install, and it still fails for me with:

    postgres=# create extension rdkit;
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    The connection to the server was lost. Attempting reset: Succeeded.
    postgres=#

Cheers
-- Jan
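The PG_VERSION_NUM check discussed in this thread (George's compiled test.c printed 90103) can also be done without compiling anything, by reading the define out of pg_config.h. The sketch below is only an illustration: the function names are hypothetical, the header location mentioned afterwards is inferred from the -I path in George's gcc command, and the 90100 threshold assumes that the comparison in the commented-out guc.c conditional was "greater than or equal to".

```python
import re


def read_pg_version_num(header_text):
    """Return PG_VERSION_NUM from the text of pg_config.h, or None if absent."""
    match = re.search(
        r"^\s*#\s*define\s+PG_VERSION_NUM\s+(\d+)", header_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def takes_check_hook(version_num):
    """True when the 9.1+ branch of the guc.c conditional would be compiled
    (assumes the threshold is PG_VERSION_NUM >= 90100)."""
    return version_num is not None and version_num >= 90100


# Two lines in the style of a PostgreSQL 9.1.3 pg_config.h.
sample = '#define PG_VERSION "9.1.3"\n#define PG_VERSION_NUM 90103\n'
num = read_pg_version_num(sample)
print(num, takes_check_hook(num))
```

On the Ubuntu box from this thread the header would presumably be /usr/include/postgresql/9.1/server/pg_config.h, so passing open(path).read() to read_pg_version_num should give the same 90103 that the compiled test printed.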
Re: [Rdkit-discuss] cartridge problems
Compare it with this one:

    georgep@george-VB:~/local/rdkit/rdkit_trunk/Code/PgSQL/rdkit$ make installcheck
    /usr/lib/postgresql/9.1/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=. --psqldir='/usr/lib/postgresql/9.1/bin' --dbname=contrib_regression rdkit-91 props btree molgist bfpgist-91 sfpgist slfpgist fps
    (using postmaster on Unix socket, default port)
    == dropping database contrib_regression ==
    DROP DATABASE
    == creating database contrib_regression ==
    CREATE DATABASE
    ALTER DATABASE
    == running regression test queries ==
    test rdkit-91 ... FAILED
    test props ... FAILED
    test btree ... FAILED
    test molgist ... FAILED
    test bfpgist-91 ... FAILED
    test sfpgist ... FAILED
    test slfpgist ... FAILED
    test fps ... FAILED
    == 8 of 8 tests failed. ==
    The differences that caused some tests to fail can be viewed in the file /home/georgep/local/rdkit/rdkit_trunk/Code/PgSQL/rdkit/regression.diffs. A copy of the test summary that you see above is saved in the file /home/georgep/local/rdkit/rdkit_trunk/Code/PgSQL/rdkit/regression.out.
    make: *** [installcheck] Error 1

    gpdb=# create extension rdkit with schema rdkit;
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    FATAL: failed to initialize rdkit.tanimoto_threshold to 0.5
    The connection to the server was lost. Attempting reset: Succeeded.
    gpdb=#

On 30 May 2012 14:49, Adrian Schreyer ams...@cam.ac.uk wrote:

64-bit, PostgreSQL packages are from the official archive. I simply do 'make' followed by 'sudo make install' and then

    create schema rdkit;
    create extension rdkit with schema rdkit;

and that's it.

    $ make installcheck
    /usr/lib/postgresql/9.1/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=. --psqldir='/usr/lib/postgresql/9.1/bin' --dbname=contrib_regression rdkit-91 props btree molgist bfpgist-91 sfpgist slfpgist fps
    (using postmaster on Unix socket, default port)
    == dropping database contrib_regression ==
    DROP DATABASE
    == creating database contrib_regression ==
    CREATE DATABASE
    ALTER DATABASE
    == running regression test queries ==
    test rdkit-91 ... ok
    test props ... ok
    test btree ... ok
    test molgist ... ok
    test bfpgist-91 ... ok
    test sfpgist ... ok
    test slfpgist ... ok
    test fps ... ok
    = All 8 tests passed. =

On Wed, May 30, 2012 at 2:43 PM, Jan Holst Jensen j...@biochemfusion.com wrote:

How odd. Adrian, are you using a 32-bit or 64-bit version of Ubuntu 12.04?

Cheers
-- Jan

On 2012-05-30 15:26, Adrian Schreyer wrote:

Yes, I could build and install the cartridge without problems (Release_2012.03.1) on 12.04.

On Wed, May 30, 2012 at 2:23 PM, George Papadatos gpapada...@gmail.com wrote:

Hi Jan,

Mine is exactly the same:

    gcc test.c -I/usr/include/postgresql/9.1/server;./a.out
    90103

So, I am back to square 1! I am starting to get a bit desperate here. Has anyone ever successfully built the cartridge from the trunk on a plain Ubuntu 12.04?

Many thanks for your help,
George
[Rdkit-discuss] Python 2.7 binaries for WinXP VirtualBox - FYI
Hello RDKiters,

Just a quick note to say that I had some problems installing the latest 2012_03 binaries on a virtual WinXP Pro machine. The error was deceptively familiar:

    In [1]: from rdkit.Chem import AllChem as Chem
    ---
    ImportError Traceback (most recent call last)
    C:\Documents and Settings\georgep\<ipython-input-1-395511f74b21> in <module>()
    ----> 1 from rdkit.Chem import AllChem as Chem

    C:\RDKit_2012_03_1\rdkit\Chem\__init__.py in <module>()
         16
         17
    ---> 18 from rdkit import rdBase
         19 from rdkit import RDConfig
         20

    ImportError: DLL load failed: The specified module could not be found.

...and it is usually attributed to not setting the PATH properly. After making sure that this was fine, I had to run the Dependency Walker against rdBase.pyd, which pointed out that a couple of DLLs were missing (msvcp100 and msvcr100). Everything was solved after installing the MS C++ redist package I found here: http://www.microsoft.com/en-us/download/details.aspx?id=

I hope this will prevent somebody else from wasting their morning on troubleshooting!

Regards,
George Papadatos
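A quick first check for the two DLLs named above can be scripted before reaching for Dependency Walker. This is a rough sketch, not a replacement for it: it uses ctypes.util.find_library from the standard library, which on Windows looks along the system search path, so it will not notice DLLs that sit next to the .pyd files in the RDKit lib folder unless that folder is on the PATH. The helper name is hypothetical.

```python
from ctypes.util import find_library

REQUIRED = ("msvcp100", "msvcr100")  # the two runtime DLLs named in the message


def missing_runtime_dlls(names=REQUIRED):
    """Return the names that the system library search cannot locate."""
    return [name for name in names if find_library(name) is None]


missing = missing_runtime_dlls()
if missing:
    print("not found on the library search path:", ", ".join(missing))
else:
    print("all runtime DLLs found")
```

If either name is reported as missing on the Windows machine, installing the redistributable package (or copying the DLLs into the RDKit lib folder, as suggested elsewhere on this list) is the likely fix.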
Re: [Rdkit-discuss] Failed Expression: pick = 0
Hi Andrew,

This is probably not going to solve the problem at hand, but it may be useful to you or others in the future: ChEMBLdb maintains a molecular hierarchy table where you can retrieve the parent (= desalted, using Pipeline Pilot) structures for each molecular entity. You may try something like this:

    select distinct cs.molregno, cs.molfile, cs.canonical_smiles
    from compound_structures cs, molecule_hierarchy mh
    where cs.molregno = mh.parent_molregno

This will give you all the *unique* desalted structures in ChEMBL. In case you also want to keep track of the molregnos of the salt forms for each parent structure, try (MySQL-specific):

    select cs.molregno, group_concat(mh.molregno), cs.molfile, cs.canonical_smiles
    from compound_structures cs, molecule_hierarchy mh
    where cs.molregno = mh.parent_molregno
    group by cs.molregno

I hope it helps.

Best regards,
George Papadatos
EMBL-EBI

On 30 April 2012 21:32, Andrew Dalke da...@dalkescientific.com wrote:

I'm desalting the ChEMBL data set and generating the corresponding de-salted SD and SMILES files. I found a problem in the conversion step, and found that the problem has nothing to do with the de-salting.

My code failed with CHEMBL1269997, which is record ~750,200 out of 1,142,974. (In other words, it took a while to get to this point.)

Here's a reproducible:

    from rdkit import Chem
    writer = Chem.SDWriter("/dev/stdout")
    for mol in Chem.ForwardSDMolSupplier("CHEMBL1269997.sdf"):
    ...     writer.write(mol)
    ...
    [22:11:05] Invariant Violation
    Violation occurred on line 388 in file /tmp/homebrew-rdkit-HEAD-Ebdo/Code/GraphMol/FileParsers/MolFileStereochem.cpp
    Failed Expression: pick = 0
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    RuntimeError: Invariant Violation
    Chem.MolToSmiles(mol)
    'OCC1=CC2OC(CC(C)C)(CC(C)C)C3C456C(OC(C)(C)O5)C1(O)C46C23'
    Chem.MolToSmiles(mol, isomericSmiles=True)
    'OCC1=C[C@@H]2OC(CC(C)C)(CC(C)C)[C@@H]3[C@H]4CCC[C@@]56[C@@H](OC(C)(C)O5)[C@]1(O)[C@]46[C@H]23'

You can see that the molecule was read in, is not None, and can be used to generate a SMILES. The CHEMBL1269997.sdf is attached.

This error was previously reported in the thread JP started, titled "Invariant violation...", dated July 6, 2011. Greg replied:

    Wow that is certainly an error I never expected to see. From the code, I guess the molecule has a stereocenter that is surrounded by other stereocenters and something extremely unfortunate is happening with the way decisions are being made about which bonds to wedge. As Eddie requested in an earlier message, it would be helpful to have the input that produced the error so that it can be added to the test cases (and so that I can be sure the problem is fixed once I figure out how to).

but I see no posting of a failing structure. I hope the attached structure helps resolve this problem.

Andrew
da...@dalkescientific.com
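George's parent-structure query can be tried out on a toy schema to see what the grouping returns. The sketch below is an illustration only: it uses Python's built-in sqlite3 rather than MySQL (SQLite happens to provide group_concat too), the two tables are cut down to the columns the query needs, the rows are made up, and it assumes that every compound, parents included, has a molecule_hierarchy row pointing at its parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compound_structures (molregno INTEGER PRIMARY KEY, canonical_smiles TEXT);
    CREATE TABLE molecule_hierarchy (molregno INTEGER PRIMARY KEY, parent_molregno INTEGER);
    -- Made-up rows: 1 is a parent, 2 and 3 are salt forms of 1, 4 is its own parent.
    INSERT INTO compound_structures VALUES
        (1, 'CC(=O)O'), (2, 'CC(=O)[O-].[Na+]'), (3, 'CC(=O)[O-].[K+]'), (4, 'c1ccccc1');
    INSERT INTO molecule_hierarchy VALUES (1, 1), (2, 1), (3, 1), (4, 4);
""")

# One row per parent structure, with the molregnos of all its forms concatenated.
rows = conn.execute("""
    SELECT cs.molregno, group_concat(mh.molregno), cs.canonical_smiles
    FROM compound_structures cs
    JOIN molecule_hierarchy mh ON cs.molregno = mh.parent_molregno
    GROUP BY cs.molregno
    ORDER BY cs.molregno
""").fetchall()

for molregno, members, smiles in rows:
    print(molregno, members, smiles)
```

Only the two parents come back, each carrying the list of molregnos that map to it. The order of the ids inside the concatenated string is not guaranteed, so sort them after splitting if the order matters.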
[Rdkit-discuss] Strange SMILES behaviour
Hello all,

Could anyone please explain this:

    In [21]: Chem.CanonSmiles('C1=CC=C2C(=C1)NC=S2')
    Out[21]: 'c1[nH]c2c2s1'

    In [22]: Chem.MolFromSmiles(Out[21])
    [16:47:14] Can't kekulize mol

In other words, how is it possible that a valid RDKit SMILES output fails to be converted to a molecule again? I'm sure this has to do with aromaticity and kekulization for benzothiazole, but still.

Many thanks in advance,
George

--
Try before you buy = See our experts in action! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! http://p.sf.net/sfu/learndevnow-dev2
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Strange SMILES behaviour
Thanks for the prompt reply Greg; this is what I suspected too!

Regards,
George

On 12 March 2012 17:22, Greg Landrum greg.land...@gmail.com wrote:

Hi George,

On Mon, Mar 12, 2012 at 5:58 PM, George Papadatos gpapada...@gmail.com wrote:
> Could anyone please explain this:
>
>     In [21]: Chem.CanonSmiles('C1=CC=C2C(=C1)NC=S2')
>     Out[21]: 'c1[nH]c2c2s1'
>
>     In [22]: Chem.MolFromSmiles(Out[21])
>     [16:47:14] Can't kekulize mol
>
> In other words, how is it possible that a valid RDKit SMILES output fails to be converted to a molecule again?

I'm sure the general answer isn't a surprise: it's a bug.

It may actually be more than one bug. The SMILES 'C1=CC=C2C(=C1)NC=S2' probably shouldn't produce a legal molecule. It certainly shouldn't produce one with an aromatic ring. That's not really a valid/reasonable resonance structure for benzothiazole. This would be ok: S1C=NC2=CC=CC=C12

The output SMILES 'c1[nH]c2c2s1' is also not a reasonable molecule, which the RDKit recognizes when it tries to read it back in.

I'm going to have to think about where the right place to fix this is.

-greg
Re: [Rdkit-discuss] 2011.09 (Q3 2011) RDKit release
Hi James,

It looks like there are DLLs missing from the lib folder. The easiest solution would be to copy all the files from the lib folder of the previous (working) RDKit version and paste them into the lib folder of the current one (without overwriting the files already there).

Regards,
George

On 16 October 2011 15:56, James Davidson j.david...@vernalis.com wrote:

Hi Greg,

I probably should have picked this up in the beta (but didn't...). When I try to import AllChem, I see the following:

    from rdkit import Chem
    from rdkit.Chem import AllChem
    Traceback (most recent call last):
      File "<pyshell#6>", line 1, in <module>
        from rdkit.Chem import AllChem
      File "C:\Python27\RDKit_2011_09_1\rdkit\Chem\AllChem.py", line 28, in <module>
        from rdkit.Chem.rdSLNParse import *
    ImportError: DLL load failed: The specified module could not be found.

Any advice?

Kind regards
James

__
PLEASE READ: This email is confidential and may be privileged. It is intended for the named addressee(s) only and access to it by anyone else is unauthorised. If you are not an addressee, any disclosure or copying of the contents of this email or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If you have received this email in error, please notify the sender or postmas...@vernalis.com. Email is not a secure method of communication and the Company cannot accept responsibility for the accuracy or completeness of this message or any attachment(s). Please check this email for virus infection for which the Company accepts no responsibility. If verification of this email is sought then please request a hard copy. Unless otherwise stated, any views or opinions presented are solely those of the author and do not represent those of the Company. The Vernalis Group of Companies Oakdene Court 613 Reading Road Winnersh, Berkshire RG41 5UA.
Tel: +44 118 977 3133 To access trading company registration and address details, please go to the Vernalis website at www.vernalis.com and click on the Company address and registration details link at the bottom of the page.. __ -- All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense. http://p.sf.net/sfu/splunk-d2d-oct ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss -- All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense. http://p.sf.net/sfu/splunk-d2d-oct___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] 2011.09 (Q3 2011) RDKit release
Hi Greg,

This should work - this is how I solved a similar problem with the latest RDKit version for Windows.

Cheers,
George

On 16 October 2011 16:32, Greg Landrum greg.land...@gmail.com wrote:
> I'm traveling and don't have access to the machine where I normally do windows builds, but I tried to create an alternate binary using dlls from an older RDKit distribution.
>
> Please give this a try:
> http://code.google.com/p/rdkit/downloads/detail?name=RDKit_2011_09_1.win32.py27.pkg2.zip
> and let me know if it works. If so I will go ahead and replace the current binaries with this one.
>
> Sorry for the hassle and thanks for the help,
> -greg
Re: [Rdkit-discuss] Partial/Rooted/Anchored Morgan Fingerprint
Hi Greg,

That's great, thanks a lot for your help.

Regards,
George

On 30 September 2011 07:01, Greg Landrum greg.land...@gmail.com wrote:
> Hi George,
>
> On Thu, Sep 29, 2011 at 1:11 PM, George Papadatos gpapada...@gmail.com wrote:
>> I'd like to calculate the *rooted* Morgan fingerprint for a set of molecules. By rooted I mean the subset of the whole-molecule fingerprint which contains just the bits which correspond to circular atom layers (up to N bond lengths) that include a specific atom.
>> So let's say that there is a single Uranium atom in each molecule. What I want to calculate is the subset of the Morgan fingerprint (let's say with a radius of 3) which contains the bits set on by layers including my U atom. This should include not only the bits where U was the root of the layer, but also the bits where U was in the layer of neighboring atoms, up to 3 bonds away.
>
> A minor point: I wouldn't call this the rooted fingerprint since it includes bits that are set by layers that are not centered at your U atom.
>
>> After checking the super-helpful Getting Started with the RDKit in Python (Q2 2011) tutorial, section 5.4.1, I can see one way of doing this: calculating the Morgan fp and then enumerating all the sub-molecules (or layers) that set the corresponding bits on and then checking if U is in any one of these submolecules. If it is then the corresponding bit is part of the root Morgan fp.
>> Is there any other more efficient way???
>
> If you only want the bits that are set by a particular atom (i.e. those that are centered at that atom), you can use the fromAtoms argument:
>
>     >>> from rdkit import Chem
>     >>> from rdkit.Chem import rdMolDescriptors
>     >>> m1 = Chem.MolFromSmiles('Cc1ccccc1')
>     >>> m2 = Chem.MolFromSmiles('Cc1c(C)cccc1')
>     >>> rdMolDescriptors.GetMorganFingerprint(m1,1,fromAtoms=[0]).GetNonzeroElements()
>     {2246728737: 1, 422715066: 1}
>     >>> rdMolDescriptors.GetMorganFingerprint(m1,2,fromAtoms=[0]).GetNonzeroElements()
>     {2246728737: 1, 422715066: 1, 2218109011: 1}
>     >>> rdMolDescriptors.GetMorganFingerprint(m2,1,fromAtoms=[0]).GetNonzeroElements()
>     {2246728737: 1, 422715066: 1}
>     >>> rdMolDescriptors.GetMorganFingerprint(m2,2,fromAtoms=[0]).GetNonzeroElements()
>     {2246728737: 1, 422715066: 1, 2368203427: 1}
>
> Note that I just fixed a bug that was leading to missing bits in the morgan fingerprints generated with a fromAtoms argument.
>
> If you want all bits that the atom is involved in, I would suggest using the fromAtoms argument, but also including all the atoms that are within the appropriate radius of your atom. You can find these atoms using the molecule's distance matrix:
>
>     >>> m1 = Chem.MolFromSmiles('Cc1ccccc1')
>     >>> dm=Chem.GetDistanceMatrix(m1)
>     >>> dm
>     array([[ 0., 1., 2., 3., 4., 3., 2.],
>            [ 1., 0., 1., 2., 3., 2., 1.],
>            [ 2., 1., 0., 1., 2., 3., 2.],
>            [ 3., 2., 1., 0., 1., 2., 3.],
>            [ 4., 3., 2., 1., 0., 1., 2.],
>            [ 3., 2., 3., 2., 1., 0., 1.],
>            [ 2., 1., 2., 3., 2., 1., 0.]])
>
> I hope this helps,
> -greg
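The last step Greg describes, picking the atoms within a given number of bonds of the atom of interest, can be sketched in plain Python. The helper name `atoms_within` is made up for this illustration, and the matrix is typed in from the message above rather than computed, so the snippet runs without RDKit. With RDKit available, the resulting list is what one would presumably pass as `fromAtoms`.

```python
# Sketch: choose the atom indices to pass as fromAtoms, using the
# topological distance matrix quoted in the message above.
# `atoms_within` is a hypothetical helper, not an RDKit function.

def atoms_within(dist_matrix, center, radius):
    """Indices of all atoms at most `radius` bonds away from `center`."""
    return [i for i, d in enumerate(dist_matrix[center]) if d <= radius]


# Distance matrix from the message (7 atoms, atom 0 is the methyl carbon).
dm = [
    [0, 1, 2, 3, 4, 3, 2],
    [1, 0, 1, 2, 3, 2, 1],
    [2, 1, 0, 1, 2, 3, 2],
    [3, 2, 1, 0, 1, 2, 3],
    [4, 3, 2, 1, 0, 1, 2],
    [3, 2, 3, 2, 1, 0, 1],
    [2, 1, 2, 3, 2, 1, 0],
]

print(atoms_within(dm, 0, 1))  # [0, 1]
print(atoms_within(dm, 0, 3))  # [0, 1, 2, 3, 5, 6]
```

Note that a fingerprint built from these atoms may also contain bits from small environments that are centered on a distant atom and never reach the atom of interest, so some further filtering may still be needed.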
Re: [Rdkit-discuss] label properties on 2D depiction
Hi all,

On a related topic, is it possible to depict an arbitrary string on a cairo canvas? I am thinking particularly of depicting the name or ID of a molecule below its structure.

Regards,
George

On 12 July 2011 06:36, Peter Schmidtke pschmid...@ub.edu wrote:
> Thanks Greg,
>
> I'll give it a try ;)
>
> ++
> Peter
>
> On 07/11/2011 06:47 PM, Greg Landrum wrote:
>> Hi Peter,
>>
>> On Mon, Jul 11, 2011 at 1:35 PM, Peter Schmidtke pschmid...@ub.edu wrote:
>>> I wondered if it was possible and easy to show some numerical properties or strings or whatever for each atom on a 2d representation of a molecule. Is something like that implemented (didn't really see it right now)?
>>
>> There's nothing like that built in, but pretty much everything you need to be able to annotate the drawing yourself after the molecule has been drawn is already there.
>>
>> Take a look at the code in $RDBASE/rdkit/Chem/Draw/__init__.py:MolToImage
>> After line 70 executes, you have a canvas (either cairo, aggdraw, or sping, depending on which system you have installed) that contains the molecule drawing as well as a MolDrawing instance named drawer. drawer has a data element atomPs that can be used to get the position of atoms in canvas coordinates: drawer.atomPs[mol][atomIdx]. The code for the individual canvases shows how to do something with these coordinates.
>>
>> -greg
>
> --
> Peter Schmidtke
> PhD Student
> Dept. Physical Chemistry
> Faculty of Pharmacy
> University of Barcelona
Re: [Rdkit-discuss] random forest in RDKit
Hi guys,

I'd also be interested in some ML examples.

Regards,
George

On 2 May 2011 20:52, Igor Filippov igor.v.filip...@gmail.com wrote:
> Hi Greg,
>
> Yes, actually for this project I'm interested in Python specifically! Time to learn me some new tricks :) I was looking through the docs online but I cannot figure it out :(
>
> Best regards,
> Igor
>
> On Mon, 2011-05-02 at 21:45 +0200, Greg Landrum wrote:
>> Hi Igor,
>>
>> On Mon, May 2, 2011 at 9:08 PM, Igor Filippov igor.v.filip...@gmail.com wrote:
>>> Can anybody point me in the right direction (some simple code snippets would be best) how to use machine learning methods in RDkit? I am especially interested in RandomForest implementation.
>>
>> The machine learning code is mostly written in Python. I know you're primarily a C++ user, are you still interested?
>>
>> -greg
Re: [Rdkit-discuss] Python 2.7 binaries for win32
Hello,

FYI, there seems to be an inconsistency in the RDKit binaries for Windows Python 2.7, as the dependency walker indicated: rdBase.pyd looks for boost_python-vc-mt-1_44.dll in %RDBASE%/lib, whereas the actual name of the DLL is boost_python-vc90-mt-1_44.dll (note the extra '90').

This is probably what caused the problem for me. Removing the '90' from the names of the 2 DLLs in the lib folder seems to do the trick.

Regards,
George

On 18 April 2011 08:17, George Papadatos gpapada...@gmail.com wrote:
> Hi Uwe,
>
> Thanks for the reply. Perhaps I did not make it clear, but what I meant is that I appended %RDBASE%\lib to the PATH variable.
>
> Regards,
> George
>
> Sent from my gPhone
>
> On 18 Apr 2011, at 07:49, Uwe Hoffmann chemis...@uwe-hoffmann.de wrote:
>> Hi George,
>>
>> On 17.04.2011 12:03, George Papadatos wrote:
>>> So...
>>> I've copied the binaries folder to C:\RDKit_2011_03_1
>>> I've added the variables:
>>> RDBASE = C:\RDKit_2011_03_1
>>> PYTHONPATH = %RDBASE%
>>> PATH = %RDBASE%\lib
>>
>> This seems to be problematic because you overwrite the whole PATH environment variable.
>>
>>> ImportError: DLL load failed: The specified module could not be found.
>>
>> regards,
>> Uwe
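The difference between overwriting PATH and appending to it, which Uwe and George discuss above, can be illustrated with a few lines of Python. This is only a sketch: `append_to_path` is a hypothetical helper, the example PATH value is invented, and the `;` separator is assumed because the thread concerns Windows.

```python
# Sketch: append a directory to a PATH-style string instead of replacing it.
# `append_to_path` is a hypothetical helper; ';' is the Windows separator.

def append_to_path(path_value, new_dir, sep=";"):
    """Return `path_value` with `new_dir` added at the end if not present."""
    parts = [p for p in path_value.split(sep) if p]  # drop empty entries
    if new_dir not in parts:
        parts.append(new_dir)
    return sep.join(parts)


# Invented example value; a real PATH will be much longer.
old_path = r"C:\Windows\system32;C:\Python27"
print(append_to_path(old_path, r"C:\RDKit_2011_03_1\lib"))
# C:\Windows\system32;C:\Python27;C:\RDKit_2011_03_1\lib
```

Setting PATH to %RDBASE%\lib alone would correspond to discarding `old_path` entirely, which is the situation Uwe warns about.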
Re: [Rdkit-discuss] Python 2.7 binaries for win32
Hi Uwe,

Thanks for the reply. Perhaps I did not make it clear, but what I meant is that I appended %RDBASE%\lib to the PATH variable.

Regards,
George

Sent from my gPhone

On 18 Apr 2011, at 07:49, Uwe Hoffmann chemis...@uwe-hoffmann.de wrote:
> Hi George,
>
> On 17.04.2011 12:03, George Papadatos wrote:
>> So...
>> I've copied the binaries folder to C:\RDKit_2011_03_1
>> I've added the variables:
>> RDBASE = C:\RDKit_2011_03_1
>> PYTHONPATH = %RDBASE%
>> PATH = %RDBASE%\lib
>
> This seems to be problematic because you overwrite the whole PATH environment variable.
>
>> ImportError: DLL load failed: The specified module could not be found.
>
> regards,
> Uwe
Re: [Rdkit-discuss] Python 2.7 binaries for win32
Cheers, Greg.

George

On 17 April 2011 06:14, Greg Landrum greg.land...@gmail.com wrote:
> Dear all,
>
> After a couple of requests, I just uploaded a win32 build of the 2011.03 release that supports Python 2.7 to both the google code and sourceforge download sites.
>
> Best Regards,
> -greg
Re: [Rdkit-discuss] Installation driving me mad (RDKit on Centos 5.4 final)
Fair enough, I did not know that! However, according to the same documentation, these packages are highly recommended for NumPy and required for SciPy:
http://scipy.org/Installing_SciPy/Linux#head-9cf6f4b7fe9ba63fc228203c4f28554a74970847

In any case, here is a repository for CentOS 5/RHEL 5 with the necessary rpms (for those who can't access yum):
http://download.opensuse.org/repositories/home:/ashigabou/

After that, Kirk's walkthrough has been most helpful.

George

On 23 February 2011 11:12, Greg Landrum greg.land...@gmail.com wrote:
> Let me elaborate on that... from the numpy installation page (http://docs.scipy.org/doc/numpy/user/install.html):
>
> NumPy does not require any external linear algebra libraries to be installed. However, if these are available, NumPy’s setup script can detect them and use them for building. A number of different LAPACK library setups can be used, including optimized LAPACK libraries such as ATLAS, MKL or the Accelerate/vecLib framework on OS X.
>
> Best,
> -greg
>
> On Wed, Feb 23, 2011 at 12:10 PM, Greg Landrum greg.land...@gmail.com wrote:
>> I'm not convinced of that. I'm pretty sure that I have built numpy on redhat and ubuntu systems without ever installing lapack.
>>
>> -greg
>>
>> On Wed, Feb 23, 2011 at 12:06 PM, George Papadatos gpapada...@gmail.com wrote:
>>> ...yet you need them to build Numpy...
>>>
>>> George
>>>
>>> On 23 February 2011 11:03, Greg Landrum greg.land...@gmail.com wrote:
>>>> To be very clear: you do not need *any* of these packages to install the RDKit.
>>>>
>>>> -greg
>>>>
>>>> On Wed, Feb 23, 2011 at 10:53 AM, JP jeanpaul.ebe...@inhibox.com wrote:
>>>>> Great wiki - I wonder how I missed that. But the first instruction
>>>>>
>>>>>     sudo yum install atlas, atlas-devel, blas blas-devel lapack lapack-devel
>>>>>
>>>>> gives me the following error:
>>>>>
>>>>>     No package atlas, available.
>>>>>     No package atlas-devel, available.
>>>>>     No package blas available.
>>>>>     No package lapack available.
>>>>>
>>>>> Is there a repos I have to add to /etc/yum.repos.d/ ?
>>>>>
>>>>> On 22 February 2011 18:41, Robert DeLisle rkdeli...@gmail.com wrote:
>>>>>> What are your environment settings? You should have, at minimum, these:
>>>>>>
>>>>>>     $RDBASE = the directory where you have installed the RDKit code
>>>>>>     $LD_LIBRARY_PATH = /usr/local/lib:/$RDBASE/lib
>>>>>>     $PYTHONPATH = $RDBASE
>>>>>>
>>>>>> At least this worked for me for a CentOS installation, detailed here - http://code.google.com/p/rdkit/wiki/BuildingOnCentOS
>>>>>>
>>>>>> Another possibility is your PATH variable. Make sure that /usr/local pathnames precede any /usr options. This will ensure looking into /usr/local first.
>>>>>>
>>>>>> There also may be options for cmake that will force it into the correct directory. I've found in the past that even though it says in the initial output that it is looking in the correct location for boost and python, it doesn't necessarily follow its own advice.
>>>>>>
>>>>>> -Kirk
>>>>>>
>>>>>> On Tue, Feb 22, 2011 at 9:44 AM, JP jeanpaul.ebe...@inhibox.com wrote:
>>>>>>> I ended up not using yum to install Numpy - I installed it from source, which was only slightly painful.
>>>>>>>
>>>>>>>     import platform; print platform.python_version()
>>>>>>>     # /usr/local/lib/python2.7/platform.pyc matches /usr/local/lib/python2.7/platform.py
>>>>>>>     import platform # precompiled from /usr/local/lib/python2.7/platform.pyc
>>>>>>>     2.7.0
>>>>>>>     import numpy as N
>>>>>>>     a=N.random.randn(10, 10)
>>>>>>>
>>>>>>> In /usr/lib64/ I can find some libpython2.4.so , libpython2.4.so.1.0
>>>>>>> What should I do?
>>>>>>>
>>>>>>> On 22 February 2011 16:23, rkdeli...@gmail.com wrote:
>>>>>>>> Are you sure that your NumPy installation is going to the correct Python instance? I see from the logs that you have Python 2.7 installed, or at least that is what cmake is finding at /usr/local/lib. You use yum to install NumPy, but the standard installation of Python on CentOS 5.x is 2.4 and it is located in /usr/lib. Which version of Python has NumPy?
>>>>>>>>
>>>>>>>> -Kirk
>>>>>>>>
>>>>>>>> On Feb 22, 2011 9:14am, JP jeanpaul.ebe...@inhibox.com wrote:
>>>>>>>>> I've installed Atlas, Numpy, Boost and everything works fine until I try:
>>>>>>>>>
>>>>>>>>>     cmake .. -DBOOST_ROOT=/share/apps/boost_1_45_0/
>>>>>>>>>     sudo make VERBOSE=1
>>>>>>>>>
>>>>>>>>> At which point everything fails as follows:
>>>>>>>>>
>>>>>>>>>     [ 3%] Building CXX object Code/RDBoost/CMakeFiles/RDBoost.dir/Wrap.cpp.o
>>>>>>>>>     cd /share/apps/RDKit_2010_12_1/build/Code/RDBoost
>>>>>>>>>     /usr/bin/c++ -DRDBoost_EXPORTS -O3 -DNDEBUG -fPIC -I/usr/local/include/python2.7 -I/usr/local/lib/python2.7/site-packages/numpy/core/include -I/share/apps/boost_1_45_0/include -I/share/apps/RDKit_2010_12_1/Code -Wno-deprecated -Wno-unused-function -fno-strict-aliasing -fPIC -o CMakeFiles/RDBoost.dir/Wrap.cpp.o -c /share/apps/RDKit_2010_12_1/Code/RDBoost/Wrap.cpp
>>>>>>>>>     Linking CXX
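Kirk's two diagnostic questions in this thread (which Python has NumPy, and are the environment variables set?) can be checked from inside the interpreter in question. The sketch below is an illustration only: `describe_python` and `missing_vars` are hypothetical helper names, and it is written for a current Python 3 rather than the Python 2.x versions discussed in the thread. The variable names are the three Kirk lists.

```python
# Sketch: report which interpreter is running, whether it can see numpy,
# and which of the environment variables named in the thread are unset.
# Helper names are hypothetical; written for Python 3.
import importlib.util
import os
import sys


def describe_python():
    """Basic facts about the interpreter executing this script."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "numpy_found": importlib.util.find_spec("numpy") is not None,
    }


def missing_vars(env, required=("RDBASE", "LD_LIBRARY_PATH", "PYTHONPATH")):
    """Names from `required` that are absent or empty in the mapping `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(key, "=", value)
    print("missing:", missing_vars(os.environ))
```

Running it with each installed interpreter in turn (for example the system Python and the one under /usr/local) shows which of them actually has NumPy.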
[Rdkit-discuss] KNIME + Java RDKit library
Hi guys,

I installed the RDKit nodes for KNIME (by copying the plugins folder manually, as I too had problems with the 'update from file' feature). Inspired by the source code that was bundled with the nodes, I tried to use the RDKit libraries in KNIME/Eclipse in order to develop my own nodes based on the RDKit toolkit.

For example:

    import org.RDKit.RDKFuncs;
    import org.RDKit.ROMol;

    public class RDKitTest {
        public static void main(String[] args) throws Exception {
            ROMol mol = null;
            String smi = "c1ccccc1N";
            mol = RDKFuncs.MolFromSmiles(smi);
            System.out.println(mol.getNumAtoms());
        }
    }

However, this script throws the following runtime error:

    Exception in thread "main" java.lang.UnsatisfiedLinkError: org.RDKit.RDKFuncsJNI.MolFromSmiles(Ljava/lang/String;)J
        at org.RDKit.RDKFuncsJNI.MolFromSmiles(Native Method)
        at org.RDKit.RDKFuncs.MolFromSmiles(RDKFuncs.java:65)

In the Eclipse lib folder, I included all the .jar files and the RDKFuncs.dll.

Any ideas???

Regards,
George Papadatos
[Rdkit-discuss] Antwort: Installation fails for KNIME nodes
Hi Paul,

No worries! :)

Regards,
George
Re: [Rdkit-discuss] KNIME + Java RDKit library problem
Hi Thorsten and Greg,

Many thanks for your replies.

On 16 November 2010 20:16, Thorsten Meinl thorsten.me...@uni-konstanz.de wrote:
> Hi George,
>
>> I installed the RDKit nodes for KNIME (by copying the plugins folder manually, as I too had problems with the 'update from file' feature). Inspired by the source code that was bundled with the nodes,
>
> Did you have the same problems as Paul, i.e. KNIME complaining about some osgi.bundles not being found?

This is the error I get when I use the local update site:

    Cannot complete the request. See the details.
    Unsatisfied dependency: [org.rdkit.knime.source.feature.feature.group 0.9.0.0027626] requiredCapability: org.eclipse.equinox.p2.iu/org.rdkit.knime.bin.macosx.x86_64/[0.9.0.0027589,0.9.0.0027589]
    Unsatisfied dependency: [org.rdkit.knime.source.feature.feature.group 0.9.0.0027626] requiredCapability: org.eclipse.equinox.p2.iu/org.rdkit.knime.bin.linux.x86/[0.9.0.0027561,0.9.0.0027561]
    Unsatisfied dependency: [org.rdkit.knime.source.feature.feature.group 0.9.0.0027626] requiredCapability: org.eclipse.equinox.p2.iu/org.rdkit.knime.bin.linux.x86_64/[1.0.0.0027615,1.0.0.0027615]

>> However, this script throws the following runtime error:
>>
>>     Exception in thread "main" java.lang.UnsatisfiedLinkError: org.RDKit.RDKFuncsJNI.MolFromSmiles(Ljava/lang/String;)J
>>         at org.RDKit.RDKFuncsJNI.MolFromSmiles(Native Method)
>>         at org.RDKit.RDKFuncs.MolFromSmiles(RDKFuncs.java:65)
>>
>> In the Eclipse lib folder, I included all the .jar files and the RDKFuncs.dll.
>> Any ideas???
>
> In order to use code from native libraries, Java needs to be told where to look for them. This is usually done by defining -Djava.library.path appropriately. If the application consists of Eclipse plugins (i.e. not just a bunch of jars), then there is some magic that loads the native libraries without needing to specify them explicitly. This is what happens with the KNIME plugins.
> So you either need to set the Java property or put your code in a plugin, which depends on org.rdkit.knime.types, and run an Eclipse application.

Thanks to your tip and my colleague Nico Fechner, it is working now. For those with the same problem, you need this line at the beginning of your code:

    System.load("Path//to//the//dll//RDKFuncs.dll");

Alternatively, as you suggested, you need to set the VM arguments appropriately in Eclipse, i.e.

    -Djava.library.path=Path//to///the/dll//

and then add this line in the code:

    System.loadLibrary("RDKFuncs");

Thanks again,
George