Re: [Rdkit-discuss] Code efficiency improvement
On 12/19/19 7:27 PM, Francois Berenger wrote:
> You should parallelize the processing of molecules, since each can be
> worked at independently.

Well, for "a lot" of conformers on "a lot" of molecules that'll work if you have access to a compute cluster and/or are willing to pay for spinning up a bunch of VMs on amazon etc. Otherwise the best you can hope for is to run maybe two per CPU core.

-- Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
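For the single-machine case, a minimal sketch of "two per CPU core" using only the standard library might look like the following. `process_molecule` is a hypothetical stand-in for the real per-molecule work (conformer generation or similar), and the chunk size is an arbitrary assumption, not a tuned value.

```python
import os
from multiprocessing import Pool


def worker_count(per_core=2):
    """Number of worker processes: `per_core` per CPU core, at least one."""
    return max(1, (os.cpu_count() or 1) * per_core)


def process_molecule(record):
    """Hypothetical placeholder for the real per-molecule work;
    here it only measures the length of the input string."""
    return len(record)


def process_all(records, per_core=2):
    """Map process_molecule over all records in a local process pool."""
    with Pool(processes=worker_count(per_core)) as pool:
        # chunksize > 1 reduces inter-process overhead for many small jobs
        return pool.map(process_molecule, records, chunksize=16)


if __name__ == "__main__":
    print(process_all(["CCO", "c1ccccc1", "CC(=O)O"]))
```

Each molecule is handled independently, so results come back in input order and no shared state is needed. Whether two workers per core actually helps depends on how much of the per-molecule work is CPU-bound.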
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
On 12/16/19 10:35 AM, Illimar Hugo Rekand wrote:
> Fair point.
>
> But when working in the 100s and 1000s range of PDB-files it would be nice to
> have some fewer steps when designing a pipeline.

But what are the selection criteria? NMR structures are usually deposited with 20 models: do you want the ligand from every one? Only from the representative one? There's at least one PDB ID (I forget which) with 3 stable conformers, i.e. model 1 is not the representative structure. Structures annotated by PDB will have HETATM instead of ATOM for non-standard residues and ligands, but if your files haven't been processed by them, all bets are off. And so on.

-- Dimitri Maziuk
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
On 12/16/2019 10:07 AM, Illimar Hugo Rekand wrote:
> Would it be viable to create a function where you could create a mol object
> from specific lines within a pdb-file?

A PDB file is simple text. There's any number of utilities to extract the lines you want, incl. a plain text editor, so why spend time on reinventing the wheel?

Dima
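As an illustration of how little code the line extraction takes, here is a rough sketch that pulls the HETATM records of one residue name out of PDB text. The function name is made up for this example, and stopping at the first ENDMDL is an assumption about which model is wanted, not a general rule.

```python
def ligand_lines(pdb_text, resname):
    """Return the HETATM lines of residue `resname` from PDB-format text.

    Assumption: only the first model matters, so reading stops at the
    first ENDMDL record.
    """
    wanted = resname.strip().upper()
    picked = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break
        # the residue name occupies columns 18-20, i.e. line[17:20]
        if line.startswith("HETATM") and line[17:20].strip() == wanted:
            picked.append(line)
    return "\n".join(picked)
```

The returned text is still PDB format, so it can be handed to an existing PDB reader (RDKit's `Chem.MolFromPDBBlock`, for instance) instead of a new purpose-built function.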
Re: [Rdkit-discuss] Saving chains from PDB file
On 10/5/2019 10:34 AM, Maciek Wójcikowski wrote:
> Paolo and Chris, There actually is Rdkit function to do this very task:
> SplitMolByPDBChainId

Why, though? -- It's a punch-card format with the chain ID in a specific column: you just read the lines and sort them into buckets on line[X]. Unless you have multi-model NMR files, where you also need to keep track of MODEL/ENDMDL.

Dima
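The bucket sort can be sketched in a few lines. In the fixed-column PDB layout the chain ID sits in column 22, so `line[X]` becomes `line[21]`; the function name and the nested-dict return shape are choices made for this example only.

```python
from collections import defaultdict


def split_by_chain(pdb_text):
    """Group ATOM/HETATM lines by chain ID, separately for each model.

    Returns {model_number: {chain_id: [lines]}}. Files without MODEL
    records are reported under model 1.
    """
    models = defaultdict(lambda: defaultdict(list))
    model = 1
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            # the model serial number follows the record name
            fields = line.split()
            model = int(fields[1]) if len(fields) > 1 else model
        elif line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            models[model][line[21]].append(line)  # chain ID is column 22
    return models
```

Tracking the model number is what covers the multi-model NMR case: without it, atoms of the same chain from all models would land in one bucket.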
Re: [Rdkit-discuss] drawing code
On 8/14/19 3:42 PM, Nicola Zonta wrote:
> Hi Greg,
> yeah, coordgen issues are probably the best way to keep track of where we
> perform poorly.

So... is it "schrodinger/coordgenlibs"? I can open an issue and upload the sdf.

(And yes, since we actually need all the protons with labels, I am painfully aware of how crowded the image becomes. Perhaps some day I'll manage to persuade my spectroscopist to use a 3D image instead -- her argument is that having to move the mouse to the other screen to rotate that thing all the time is too distracting.)

-- Dimitri Maziuk
Re: [Rdkit-discuss] drawing code
PS I played with it a bit: the least ugly version is if you MMFF94-optimize it after rdkit.Chem.rdCoordGen.AddCoords(). It's still far from perfect.

-- Dimitri Maziuk
[Rdkit-discuss] drawing code
Hi all,

for our workflow we need molecule drawings with all atoms (incl. Hs) explicitly labeled. And every once in a while we run into molecules that don't look so good. I wonder if it's worth collecting them somewhere, maybe in another github repo under rdkit, for future developers of 2D layout algorithms.

Here's our latest one, for example. The thing about this one is, the molecule itself is not that bad, so it's not clear why the picture isn't any better. Enjoy. (Try it in OB if you think RDKit's pix is bad. ;)

-- Dimitri Maziuk

[attachment: CID_10955174_alatised.sdf]
Re: [Rdkit-discuss] High-quality matplotlib drawing?
On 8/9/19 12:42 PM, Wout Bittremieux wrote:
> Alternatively I could export both the spectrum plot and the molecule to
> SVG files and then combine them afterwards. But in that case it's not
> possible to manipulate both elements in a single matplotlib figure.

Yeah, that's what I meant, but you're right: getting that to work in an interactive display in a notebook would be a hassle.

-- Dimitri Maziuk
Re: [Rdkit-discuss] High-quality matplotlib drawing?
On 8/7/2019 7:20 PM, Wout Bittremieux wrote:
> ... Unfortunately the quality of the molecule drawing is rather poor (see
> attachment; nonsensical spectrum and molecule). This seems to be true for
> non-SVG drawing in general, and unfortunately it's not really possible to
> combine SVG output with Matplotlib functionality.

Hmm... have you tried wrapping the molecule svg in a `transform="translate(x,y)"` or something along those lines?

Dima
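One way to read "something along those lines" is to nest the molecule SVG inside a translated group on a larger canvas. The sketch below is plain string handling with a made-up function name; it assumes the inner SVG is a complete document and does not try to merge it into a matplotlib figure.

```python
import re


def embed_svg(inner_svg, x, y, width, height):
    """Nest `inner_svg` inside a new SVG canvas, shifted by (x, y)."""
    # drop the XML declaration: it may only appear at the very start of a file
    body = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", inner_svg)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'width="%d" height="%d">\n'
        '<g transform="translate(%d,%d)">\n%s\n</g>\n</svg>'
        % (width, height, x, y, body)
    )
```

The spectrum plot exported as SVG could be placed on the same canvas in a second group. This only produces a static combined file, which is exactly the limitation raised in the follow-up about interactive notebooks.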
Re: [Rdkit-discuss] Read only first model of a pdb-file
On 5/29/19 3:31 PM, David Cosgrove wrote:
> Biopython is excellent for extracting particular models from a PDB file. As
> Dimitri suggests, you can then pass the result into your processing script.
> It is quite straightforward to write the relevant PDB model to a string in
> PDB format and parse with RDKit’s PDB reader, for example.

Just to add more confusion, if you are working with PDB entries, you may also want to look at

"""
REMARK 210 CONFORMERS, NUMBER CALCULATED :
REMARK 210 CONFORMERS, NUMBER SUBMITTED:
REMARK 210 CONFORMERS, SELECTION CRITERIA :
REMARK 210
REMARK 210
REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE :
"""

(the last one is the one I mentioned earlier). You would typically have "lowest energy" as the selection criteria, and "best representative" is the minimized average of those submitted.

-- Dimitri Maziuk
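Reading those remarks programmatically could look roughly like this. It is a sketch that assumes each record has the "KEY : VALUE" shape shown above and that the representative conformer is given as a plain integer. Both function names are invented for the example.

```python
def remark_210(pdb_text):
    """Collect 'KEY : VALUE' pairs from REMARK 210 records into a dict."""
    fields = {}
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 210"):
            continue
        key, sep, value = line[10:].partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def representative_model(pdb_text, default=1):
    """Model number named as best representative, or `default` if absent."""
    value = remark_210(pdb_text).get(
        "BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE", "")
    return int(value) if value.isdigit() else default
```

Falling back to model 1 when the remark is missing or non-numeric is a convenience for the sketch; as noted elsewhere in the thread, model 1 is not guaranteed to be the representative structure.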
Re: [Rdkit-discuss] Read only first model of a pdb-file
On 5/29/19 8:19 AM, Illimar Hugo Rekand wrote:
> Hey, RDKitters!
>
> I am currently trying to figure out how to only read in the first model of a
> pdb-file. I've designed a script that performs calculations on a per-atom
> basis, and this is very slow when it tries to account for multiple models,
> for example with a NMR-structure.

Pre-process the PDB file to cut out the model you want. In the files annotated by PDB it should be the first model, and I believe there is a REMARK something-or-other "best model in this ensemble". However, this fails for multiple conformers in one file; there is at least one such entry in PDB. (It's been a while since I did this, so I don't remember the remark number nor the multi-conformer entry ID off the top of my head.)

-- Dimitri Maziuk
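The pre-processing step can be as small as the following sketch, which keeps everything up to the first ENDMDL record. The function name is hypothetical, and appending an END record is an assumption about what downstream readers expect.

```python
def first_model(pdb_text):
    """Return PDB text truncated after the first model.

    Everything up to and including the first ENDMDL is kept, followed by
    an END record. Text without any ENDMDL is returned unchanged.
    """
    kept = []
    for line in pdb_text.splitlines():
        kept.append(line)
        if line.startswith("ENDMDL"):
            kept.append("END")
            break
    else:
        # no ENDMDL found: single-model file, nothing to cut
        return pdb_text
    return "\n".join(kept) + "\n"
```

Header records before the first MODEL line are preserved, so any REMARKs remain available. This does nothing about the multi-conformer case mentioned above, where the first model is not necessarily the one wanted.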
Re: [Rdkit-discuss] RDKit Release 2018.09.2 available
On 2/22/19 5:01 PM, Markus Sitzmann wrote:
> It is odd, but one thing I learned from using conda is, sometimes it helps
> to ignore problems and wait for a bit and they might go away ... well, I
> have similar experiences with maven :-) ... but most likely I do something
> stupid which I don't see right now :-)

A simple test is to make a clean environment, install only rdkit and nothing else, and see what happens. It's pretty common for packagers to do something-that-may-or-may-not-be-stupid and have a dependency on a specific version of some other package that depends on a specific version of another package that depends on... turtles all the way down.

-- Dimitri Maziuk
Re: [Rdkit-discuss] Warning as error
On 1/21/19 1:42 PM, Jean-Marc Nuzillard wrote:
> sys.stderr.write("Bad: %s\n" % (mol.GetProp("_Name"),))
> I know which bond has a problem but I still do not know in which molecule.

Are you sure they all have _Name's? I'd just print the count outside of the try/catch block and ignore ones not followed by the warning message. (And run with #!/usr/bin/python -u and/or flush sys.stdout/stderr on every iteration for good measure.)

-- Dimitri Maziuk
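A bare-bones version of the counting idea, with the per-record work passed in as a callable so the sketch stays independent of any toolkit. `run_with_counter` and its arguments are names invented here.

```python
import sys


def run_with_counter(records, process, log=sys.stderr):
    """Call process(record) on each record, logging its 1-based position first.

    Returns the positions of the records for which process() raised.
    """
    failures = []
    for count, record in enumerate(records, start=1):
        # written outside the try block, so it appears for every record
        log.write("record %d\n" % count)
        log.flush()  # keep the counter in step with any other stderr output
        try:
            process(record)
        except Exception as exc:
            log.write("record %d failed: %s\n" % (count, exc))
            log.flush()
            failures.append(count)
    return failures
```

Because the counter is flushed before the work starts, a warning printed by the underlying library should appear right after the number of the record that triggered it, even when the record has no usable name.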
Re: [Rdkit-discuss] InChI to Mol to InChi
On 12/18/18 1:57 PM, JEAN-MARC NUZILLARD wrote:
> Dimitri, how can alatis help me to find a first draft of 3D structure
> for a few ten thousands of compounds from InChI strings?

It won't; you have to feed it a 3D structure. However, its InChI string and/or MOL block will give you the same 3D structure with the same atom labels on round-trip, *as long as you don't removeH/addH/recalculate conformers etc.* (At least on all molecules they tried, and I think that includes the entire PubChem.)

-- Dimitri Maziuk
Re: [Rdkit-discuss] InChI to Mol to InChi
On 12/18/18 11:34 AM, JEAN-MARC NUZILLARD wrote:
> Molecules m1 and m2 have identical SMILES representations
> but different InChI representations, which I find odd.

*shrug* this is precisely why they came up with alatis: take a molecule in any input format, round-trip it through any cheminformatics program, and there's a 50% chance you'll get a different molecule out. That's how chemistry works when it meets compsci.

-- Dimitri Maziuk
Re: [Rdkit-discuss] InChI to Mol to InChi
On 12/17/18 4:50 PM, JEAN-MARC NUZILLARD wrote:
> Is there any more deterministic procedure than the one of trying until
> success is obtained?
>
> How do I determine the InChI string of a conformer obtained after
> multiple embedding?

This representation keeps 3D config: http://alatis.nmrfam.wisc.edu/

Generally speaking, the problem with InChI is that the only *required* layer is the formula. Therefore *an* InChI string cannot be used to differentiate conformers; you need the InChI string with all the relevant layers and all the protons.

https://www.nature.com/articles/sdata201773

-- Dimitri Maziuk
Re: [Rdkit-discuss] Re: Re: Help: How to set timeout for the function named RunReactants
On 11/15/2018 11:41 AM, Francis Atkinson wrote:
> products = rxn.RunReactants([mol], maxProducts=1)
> Boost.Python.ArgumentError: Python argument types in
> ChemicalReaction.RunReactants(ChemicalReaction, list)
> did not match C++ signature:
> RunReactants(RDKit::ChemicalReaction*, boost::python::list)
> RunReactants(RDKit::ChemicalReaction*, boost::python::tuple)
>
> I presume I am missing something, but what?!

It doesn't list a candidate w/ the 3rd parameter, so I'd say maxProducts is not exposed to python in your version. ICBW, though: c++ - boost - swig - python is not something I'd want to ever become familiar with...

-- Dimitri Maziuk
Re: [Rdkit-discuss] svg: next question
On 11/02/2018 12:19 AM, Greg Landrum wrote:
> On Fri, Nov 2, 2018 at 12:32 AM Dimitri Maziuk via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> Does anyone know where TH does
>>
>>
>>
>> come from? --
>
> assuming you're using the RDKit's MolDraw2DSVG class, that comes from here:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/MolDraw2D/MolDraw2DSVG.cpp#L53

Should it be changed to utf-8? I suspect any system where RDKit builds at this point is using that, and I believe technically element can contain unicode. E.g. you should be able to render your amino-acids with atoms labeled w/ Greek alphas, betas, etc. as per IUPAC.

>> I have two SVGs generated by the same container running on
>> the same linux host and one has the above, the other has
>>
>>
>>
>
> No idea where that might have come from, but it's not MolDraw2DSVG

Weird. I'll see if I get any more of those...

-- Dimitri Maziuk
Re: [Rdkit-discuss] Plotting values next to atoms
On 11/02/2018 07:59 AM, Eric Jonas wrote:
> Hello! I'm trying to figure out if there's any known or sane way to
> automatically plot numerical values adjacent to atoms using the rdkit
> drawing machinery. Ideally I'd like to annotate certain atoms
> programmatically with values.

This draws atom labels:

    op = dr.drawOptions()
    for i in range( self._mol.GetNumAtoms() ) :
        op.atomLabels[i] = self._mol.GetAtomWithIdx( i ).GetSymbol() + str( (i + 1) )

HTH,
-- Dimitri Maziuk
[Rdkit-discuss] svg: next question
Does anyone know where TH does come from? -- I have two SVGs generated by the same container running on the same linux host and one has the above, the other has

The host is on utf-8, of course, and I can double-check the container, though I don't see it using anything else... certainly not cp1252.

Any ideas?

-- Dimitri Maziuk
Re: [Rdkit-discuss] svg transparent background
On 11/01/2018 06:13 PM, Paolo Tosco wrote:
> Hi Dimitri,
>
> the bit evidenced in red should allow you to achieve what you need:

(no html here, so no red) thank you, I'll add

> opts = dr.drawOptions()
> opts.clearBackground=False

-- Dimitri Maziuk
[Rdkit-discuss] svg transparent background
Hi all,

I finally got around to playing w/ drawing code in the current version and I like it, but how do I set background colour to transparent?

    dr = rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG( 1000, 1000 )

results in in the output, which for now I'll just post-process away.

Thx,
-- Dimitri Maziuk
Re: [Rdkit-discuss] Compilation Errors on RHEL7
On 10/24/2018 12:10 PM, Dimitri Maziuk via Rdkit-discuss wrote:
> Yes. I once spent a couple of hours trying and ended up installing docer

docker

-- Dimitri Maziuk
Re: [Rdkit-discuss] Compilation Errors on RHEL7
On 10/24/2018 11:32 AM, Oellien, Frank wrote:
> Hi,
>
> I am trying to compile RDKit on a RHEL7 system using Python 2.7 and Boost
> 1.68 ...
> Has somebody already seen this error?

Yes. I once spent a couple of hours trying and ended up installing docer and pulling a conda/rdkit container instead. I strongly recommend doing that, or finding a singularity version of the same.

-- Dimitri Maziuk
Re: [Rdkit-discuss] [Question] Ok to switch to conda-forge for RDKit builds?
On 10/18/2018 11:20 AM, Greg Landrum wrote:
> we're drifting a bit from the purpose of this thread, so I will give quick
> answers:
>
> On Thu, Oct 18, 2018 at 6:00 PM Eric Jonas wrote:
>
>> Would there also be a pip install option ? Building from GitHub master
>> is... Challenging.
>
> This is potentially a can of worms. I'd be happy to answer/discuss this in
> a separate thread if someone starts one.

Again, if there was a build container, you could point people to its Dockerfile for the full working build documentation. And you could have one of GitHub's CIs spin it up and run `make && make test` in it on every commit as a bonus.

-- Dimitri Maziuk
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
On 10/03/2018 03:23 PM, Peter St. John wrote:
> Ah, well I suppose the follow up question is then does 'AddHs' add
> hydrogens in a deterministic fashion?

It should; what's not guaranteed is that it will be the right order. Obviously, if (using my previous example) L- and D-alanine are the "same molecule" for your purposes, then it doesn't matter. If it does matter, then alatis (the link I sent earlier) is the best option that I know of.

-- Dimitri Maziuk
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
On 10/02/2018 03:32 PM, Peter St. John wrote:
> I.e., if I create a new rdkit Molecule with rdkit.Chem.MolFromSmiles(xxx),
> will the bond ordering always be the same? If not, does anyone know a
> robust way of specifying a bond within a molecule as a string-based
> representation?

https://www.nature.com/articles/sdata201773

-- Dimitri Maziuk
Re: [Rdkit-discuss] Creating Mol Object From SD File
On 08/29/2018 01:54 PM, Chris Murphy wrote:
> Hi,
>
> I finally realized that when passing an sdf string to Chem.MolFromMolBlock,
> the Mol object will not retain the properties from the sdf.

Ugh. You're right. +1 for a MolFromSdfBlock() that doesn't lose the properties.

> Also, it seems that SDMolSupplier.next() does not work anymore?

    if sys.version_info[0] == 2 : next()
    elif sys.version_info[0] == 3 : __next__()
    else : raise Exception( "Go! is looking better every day" )

-- Dimitri Maziuk
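Until something like the wished-for MolFromSdfBlock() exists, one workaround is to split the SDF text by hand and carry the data items alongside the mol block. The sketch below is a simplified reader, not RDKit's parser: it assumes V2000-style records ending in "M  END", data headers of the form "> <TAG>", and a "$$$$" line closing every record.

```python
def iter_sdf_records(sdf_text):
    """Yield (molblock, properties) for each record in SDF-format text."""
    block, props, tag, in_data = [], {}, None, False
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":  # record separator
            joined = {k: "\n".join(v) for k, v in props.items()}
            yield "\n".join(block) + "\n", joined
            block, props, tag, in_data = [], {}, None, False
        elif not in_data:
            block.append(line)
            if line.startswith("M  END"):
                in_data = True
        elif line.startswith(">"):
            # data header such as "> <NAME>": take the text between < and >
            tag = line[line.find("<") + 1:line.rfind(">")]
            props[tag] = []
        elif line.strip() and tag is not None:
            props[tag].append(line)
        else:
            tag = None  # a blank line ends the current data item


if __name__ == "__main__":
    import sys
    records = iter_sdf_records(sys.stdin.read())
    # the built-in next() works the same way on Python 2.6+ and 3
    molblock, props = next(records)
    print(props)
```

Each mol block can then go through `Chem.MolFromMolBlock`, with the properties re-attached via `SetProp` on the resulting molecule. The built-in `next(obj)` also sidesteps the `.next()` versus `.__next__()` difference on any iterator.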
Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5
On 06/13/2018 01:10 PM, Greg Landrum wrote:
> You don't excerpt the earlier message where I explained how to get things
> working without needing X installed. Was that not clear? If you don't want
> to have X installed but would still like to use the conda builds, you can
> just install the two packages from the RDKit channel.

There's more to it than that. Conda as packaged for a given distro has itself a set of dependencies. As I said before, *for example* installing conda on a centos 6 server will take it off the network at the next maintenance reboot, unless the installer knows what they're doing and watches the whole thing very carefully.

No, it's not your problem, you're doing the best you can, and thank you for that. But the end result is that ready-made builds are getting increasingly too bloated to be of use, and custom builds are too "non-trivial" to attempt.

-- Dimitri Maziuk
Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5
On 06/13/2018 11:44 AM, Geoffrey Hutchison wrote:
> No, you can compile RDKit yourself if you don't want to use X11 features. You
> wanted to install through conda, which has a set of packages for 'most use' -
> YMMV.

Sadly, MM does indeed V. On my box I can't, not without also compiling boost myself -- and I haven't looked further. Wouldn't be surprised if it takes me all the way to compiling GNU Compiler Collection myself too.

Some day I'll get a round tuit to set up an alpine build VM and see if it compiles there... so I can roll it into a reasonable-sized docker container, but compiling it on my desktop is simply not worth my time. (And our compute nodes are the same or older as my desktop, so if it doesn't work on my box, we can't deploy it anywhere.)

-- Dimitri Maziuk
Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5
On 6/13/2018 10:06 AM, Greg Landrum wrote:
> Note that my answer assumes that there is a reason that you don't have X11
> installed on your linux box. If that's not the case, you should be able to
> fix things "more easily" by installing X

Quite frankly, this is rapidly becoming unusable as a software platform. I need to install X11 to UFF-optimize a MOL? Seriously?

E.g. on centos anaconda installs NetworkManager (why?) which comes in "enabled at boot" but not configured, so next time you reboot, perhaps weeks later, tada! -- you've lost the network. And don't get me started on having several versions of boost coexist...

Dima
Re: [Rdkit-discuss] convert a smiles file to a xyz file
On 5/23/2018 10:23 AM, Chenyang Shi wrote:
> A separate question is that is the converted molecular structure from SMILES
> the same as that taken from a crystal structure?

Provided there's no undefined/different stereochemistry on the SMILES side, no quirks with added protons, and so on and so forth... for a small simple molecule... maybe.

Dima
Re: [Rdkit-discuss] Atom mapping
On 05/09/2018 10:27 AM, carlo del moro wrote:
> Dear All,
>
> we would like to know if it is possible to map the atom's ID of a SMILES
> represented substructure to the atom sequence of a ligand contained in a
> pdb file. This in order to get the spatial coordinates related to such
> substructure.

http://alatis.nmrfam.wisc.edu/ will generate unique stable IDs from a 3D structure, and output the old->new ID map. It'll take a PDB; you'll have to convert your SMILES into a 3D .mol. ALATIS atom IDs should be the same in the two maps, *provided both inputs describe the exact same ligand*. (It's the *substructure* bit that I'm not entirely sure about.)

-- Dimitri Maziuk
Re: [Rdkit-discuss] mol file parsing, 3D or 2D
On 2018-01-17 10:25, Jason Biggs wrote:
> For the case in question, I find that if I read in a mol file containing 2D
> coordinates, and I skip the sanitization step altogether, then the 3D
> embedding algorithms fail.

Well, yes, as I mentioned in the other thread: the only way you can get it to work reliably is if you start with 3D coordinates to begin with. Otherwise your users have to get in there every once in a while and decide which way to slice that cake they don't get to eat. ;)

Dima
Re: [Rdkit-discuss] mol file parsing, 3D or 2D
On 2018-01-16 22:46, Greg Landrum wrote:
> It might be worth thinking about adding an option to the aromaticity
> perception code to maintain the original bond types and just set the
> "isAromatic" flag on the bonds.

This is how it's modeled in mmCIF chem. comp. It may or may not come from openeye, which they were using originally to process their ligands/chem comps. From a programming perspective it's pretty annoying, since you have to remember to add an extra if stanza to all your code, queries, etc.

What's wrong with keeping a copy of the original molecule around? -- I'm not sure I get the "I want to sanitize and keep the original bonds too"; it sounds too much like the proverbial cake.

Dima
Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018
On 01/15/2018 02:43 PM, Tim Dudgeon wrote:
> Could there be something in a more general project to bridge the
> compound (mol/smiles), sequence (protein/nucleotide seq + alignments)
> and structure (pdb/mmcif/mmtf) worlds?

FWIW PDB builds everything up from structure because they can derive bonds from the coordinates, and that's the only way you can do it in code. Without bonds, trying to link compounds in a sequence doesn't really work even if you have two cysteines in a bog-standard protein sequence; with generic compounds it gets too hard fast.

In the mmCIF chem. comp. model PDB has a "leaving atom flag" that marks *a* bonding site, but it doesn't tell you what kind of bond can form there, nor what to do if there's more than one. You need a whole lot of other code to figure out how to link two compounds into a sequence.

And then there's structure calculation: I don't know if there's anything that works on non-proteins, or can predict disordered regions well, etc.

If anyone's counting votes, pretty 2D depictions get mine.

-- Dimitri Maziuk
Re: [Rdkit-discuss] Issue with the latest RDKit DB build
PS the real question is why you're trying to run psql built with a newer toolset when there are 2 perfectly good ones available: one from the distro vendor and one from postgres repos.

-- Dimitri Maziuk
Re: [Rdkit-discuss] Issue with the latest RDKit DB build
On 12/29/2017 01:57 PM, Paul Emsley wrote:
> On 29/12/2017 19:01, Drew Gibson via Rdkit-discuss wrote:
>> psql: error while loading shared libraries: libncursesw.so.6: cannot
>> open shared object file: No such file or directory
>
> install the ncurses-libs package (I have
> ncurses-libs-6.0-8.20170212.fc26.x86_64 on fedora)

On centos 7 that's not gonna get you libncursesw.so.6:

$ rpm -q -l ncurses-libs
/usr/lib64/libform.so.5
/usr/lib64/libform.so.5.9
/usr/lib64/libformw.so.5
/usr/lib64/libformw.so.5.9
/usr/lib64/libmenu.so.5
/usr/lib64/libmenu.so.5.9
/usr/lib64/libmenuw.so.5
/usr/lib64/libmenuw.so.5.9
/usr/lib64/libncurses++.so.5
/usr/lib64/libncurses++.so.5.9
/usr/lib64/libncurses++w.so.5
/usr/lib64/libncurses++w.so.5.9
/usr/lib64/libncurses.so.5
/usr/lib64/libncurses.so.5.9
/usr/lib64/libncursesw.so.5
/usr/lib64/libncursesw.so.5.9
/usr/lib64/libpanel.so.5
/usr/lib64/libpanel.so.5.9
/usr/lib64/libpanelw.so.5
/usr/lib64/libpanelw.so.5.9
/usr/lib64/libtic.so.5
/usr/lib64/libtic.so.5.9
/usr/lib64/libtinfo.so.5
/usr/lib64/libtinfo.so.5.9

-- Dimitri Maziuk
Re: [Rdkit-discuss] RDkit and Pubchem
On 12/01/2017 11:55 AM, Tim Dudgeon wrote:
> In what way? Given a single PubChem compound or substance ID you just
> want to pull the smiles or molfile into RDKit?

Furthermore, what's your definition of "a compound"? If it includes stereochemistry, pubchem usually has 3d mol files, except where it doesn't.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Transparent background for 2D molecule images
On 11/20/2017 04:45 PM, Markus Metz wrote:
> opts.clearBackground=False
>
> or
>
> opts.setBackgroundColour((1,1,0))
>
> are not working for me.

What's your output format?

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] ImportError: No module named rdkit
On 09/14/2017 03:04 PM, Markus Sitzmann wrote:
> Not on Centos 6 - Docker requires Centos 7 for the host system.

You can't win... :(

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] ImportError: No module named rdkit
On 09/14/2017 02:58 PM, Andrew Dalke wrote:
> If only Greg got as much money for long term RDKit support as Red Hat
> gets for long term RHEL support. :)

Yep. But an rdkit docker container might be feasible.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] ImportError: No module named rdkit
On 09/14/2017 01:41 PM, Riccardo Vianello wrote:
> True, but there shouldn't be any strong need for using the system python
> for running application software. Python 2.7 (together with python 3) has
> been available to RHEL6 subscribers since almost five years, as part of the
> Red Hat Software Collections (also available to non-subscribers from the
> upstream CentOS/Fedora repositories). A detailed discussion is available
> from this post
> http://www.curiousefficiency.org/posts/2015/04/stop-supporting-python26.html
>
> And the anaconda python distribution of course provides another alternative.

All great when it's one computer and that one's your own personal laptop.

# yum ls \*python27\*
...
python27.x86_64                    2.7.13-2.ius.el6   @salt-2015.8
python27-babel.noarch              0.9.4-5.2.el6      @salt-2015.8
python27-chardet.noarch            2.2.1-3.el6        @salt-2015.8
python27-crypto.x86_64             2.6.1-4.el6        @salt-2015.8
python27-futures.noarch            3.0.3-2.el6        @salt-2015.8
python27-jinja2.noarch             2.8.1-2.el6        @salt-2015.8
python27-libs.x86_64               2.7.13-2.ius.el6   @salt-2015.8
python27-markupsafe.x86_64         0.11-11.el6        @salt-2015.8
python27-msgpack.x86_64            0.4.6-2.el6        @salt-2015.8
python27-psutil.x86_64             5.2.2-1.ius.el6    @salt-2015.8
python27-pycurl.x86_64             7.19.0-10.el6      @salt-2015.8
python27-requests.noarch           2.6.0-4.el6        @salt-2015.8
python27-six.noarch                1.9.0-3.el6        @salt-2015.8
python27-tornado.x86_64            4.2.1-3.el6        @salt-2015.8
python27-urllib3.noarch            1.10.2-2.el6       @salt-2015.8
python27-zmq.x86_64                14.5.0-3.el6       @salt-2015.8
...
python27-babel.noarch              0.9.6-7.sc1.el6    centos-sclo-rh
python27-pip.noarch                9.0.1-1.ius.el6    salt-2015.8
...
python27-python.x86_64             2.7.13-3.el6       centos-sclo-rh
python27-python-babel.noarch       0.9.6-7.sc1.el6    centos-sclo-rh
python27-python-jinja2.noarch      2.6-10.sc1.el6     centos-sclo-rh
python27-python-libs.x86_64        2.7.13-3.el6       centos-sclo-rh
python27-python-markupsafe.x86_64  0.11-11.sc1.el6    centos-sclo-rh
python27-python-pip.noarch         8.1.2-1.el6        centos-sclo-rh
...

Any guesses as to how many things will break in my infrastructure manglement setup (saltstack) if I enable Software Collections and some of those get updated from SCL and some from Salt? And don't get me started on PIP.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] ImportError: No module named rdkit
On 09/14/2017 10:43 AM, Greg Landrum wrote:
> Just to do some expectation management: python 2.6 is pretty ancient and
> there's no guarantee that all of the RDKit code will work with it. Python
> 2.7 is the minimum version that we "officially" support. It's a very good
> idea to update.

Just FYI: python 2.6 is the system python on (at least) the RHEL-6 family of linux distros, which will be officially with us until June 30, 2024.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Non-redundant database of molecules
On 09/13/2017 11:46 AM, Markus Sitzmann wrote:
> The case that you have 3D information available for a molecule dataset is
> rare, if you want it trustworthy it gets even worse than that. And what is
> the point then to generate the configuration of a molecule first if you can
> not trust that either?

Veering further off topic, do you even care in the first place? E.g. if your molecule always exists as a mixture of isomers, except in some megabuck-per-microgram painstakingly created reference samples, a 3D-based system will represent it as two distinct molecules. Whereas you want it represented as one.

Last I looked PDB Ligand Expo had two different benzenes. Their software doesn't (didn't?) do the circle version so they don't have the third one.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Non-redundant database of molecules
On 2017-09-13 10:17, Markus Sitzmann wrote:
> Canonical SMILES are only a very rough approximation for "unique
> molecule" as they usually don't work well for tautomeric forms of
> compound. InChI or Standard InChI is much better although also not
> perfect.

ALATIS I linked to above does impose a stable consistent ordering for everything including hydrogens. The downside is it's garbage in - garbage out: you need to start with a 3D structure, otherwise it has an option to addHs and gen3D but no guarantee it'll generate the one you want.

Dima
Re: [Rdkit-discuss] Non-redundant database of molecules
On 2017-09-13 09:56, TJ O'Donnell wrote:
> Let the database do the work for you. Create a canonical SMILES column
> and/or InChI column and declare them to be unique. As you insert new
> rows, postgres will let you know if there is already a row with the
> same SMILES or InChI. Here's some help on how to handle that.
> https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT

One of the problems with this is that it normally fails on the first conflict, whereas users very often want a list of all conflicts to look at and see what's up. The above mentions a "special excluded table" in passing but I don't see anything about accessing it or what it actually contains.

If you don't care what molecules get dropped or why, "ON CONFLICT DO NOTHING" should work very nicely.

Dima
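To make the "list of all conflicts" idea concrete, here is a minimal sketch of collecting every rejected row instead of stopping at the first one. It uses Python's built-in sqlite3 as a stand-in for postgres, so the table, the column names, and the insert_unique() helper are all assumptions for illustration; in PostgreSQL the silent-skip variant is spelled ON CONFLICT DO NOTHING.

```python
import sqlite3


def insert_unique(conn, rows):
    """Insert (smiles, name) rows; return the rows rejected as duplicates."""
    conflicts = []
    for smiles, name in rows:
        try:
            conn.execute(
                "INSERT INTO molecules (smiles, name) VALUES (?, ?)",
                (smiles, name),
            )
        except sqlite3.IntegrityError:
            # Record the duplicate and keep going instead of aborting.
            conflicts.append((smiles, name))
    conn.commit()
    return conflicts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecules (smiles TEXT UNIQUE, name TEXT)")

rows = [
    ("c1ccccc1", "benzene"),
    ("CCO", "ethanol"),
    ("c1ccccc1", "benzol"),   # same canonical SMILES as the first row
]
conflicts = insert_unique(conn, rows)
stored = conn.execute("SELECT COUNT(*) FROM molecules").fetchone()[0]
```

Row-at-a-time inserts are slower than one bulk statement, but they leave you with the full list of what was dropped and why, which is usually what the user wanted to look at.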
Re: [Rdkit-discuss] Non-redundant database of molecules
On 2017-09-13 05:13, Wandré wrote:
> Compare if the SMILES as already inserted is easy (text compare), but,
> compare fingerprint of molecule...

Here's one option: http://alatis.nmrfam.wisc.edu/ -- you can use string comparison on the resulting inchi string.

Dima
Re: [Rdkit-discuss] Is there a Ubuntu ppa or some repository with the latest rdkit release as .deb ?
On 2017-06-22 01:36, Francois BERENGER wrote: make deb # in rdkit source tree Some people might ask for a make rpm target also. You'd have to track any changes that redhat, canonical, suse, and whoever else's out there might make to e.g. filesystem layout, linked libraries, python and so on. There is snap packages and AppImage packages. I've no idea if either is suitable for shared libraries with python libraries etc. I'd just grab the newer source .deb and try to build it on your system. Dima
Re: [Rdkit-discuss] atom indexes and order of atoms in the input file
On 06/15/2017 01:14 PM, Brian Kelley wrote:
> Sorry to hear about the flooding.
>> Unfortunately we got flooded day before yesterday and the servers doing
>> the crunching are currently down.

I should have mentioned that the server (URL is in the article), which I'll hopefully get back up today, will output a MOL file with atoms ordered as per the article. The downside is it only works on 3D MOLs.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] atom indexes and order of atoms in the input file
On 06/15/2017 10:13 AM, Maciek Wójcikowski wrote:
> Hi,
>
> If you really want to rely on the order of atom you can renumber them
> anyhow you like with Chem.RenumberAtoms()
> http://rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RenumberAtoms
> There is also a function which returns canonical order of atoms for
> you: Chem.CanonicalRankAtoms() As I remember correctly the order may differ
> from the canonical smiles, although that might have changed.

https://www.nature.com/articles/sdata201773

Unfortunately we got flooded day before yesterday and the servers doing the crunching are currently down.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list
On 2017-06-10 07:42, Chris Swain wrote:
> This sounds like the situation where a database might be a better
> option, tuned to store fingerprints in RAM?

The issue is how much programming time it will take, how much that time is worth, and how many times the solution will be reused. A clever coding solution could be preferable for other reasons, like a programming exercise. If it's a one-off and you just need it done and move on, throwing more hardware at it is often the most cost-effective solution.

Dima
Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list
On 2017-06-09 08:12, Alexis Parenty wrote:
> Dear Greg and Brian,
>
> Many thanks for your response. I was also thinking of your streaming
> approach! I think the RAM of most machine would deal with lists of 100K
> mol so we could put the threshold higher than 1000. Actually, I was
> thinking to monitor the available RAM and only start processing the
> matrix and clearing the list when less than 20% of RAM is left. This
> way, the best machines could skip the clearing process and gain time.
> What do you think?

Take $100, buy a 200GB SSD, set it up as the swap space, don't worry about the RAM.

Dima
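For reference, the streaming approach discussed in the quoted text (fill a list up to a threshold, process it, clear it, repeat) can be sketched with the standard library alone. The names chunks() and process_stream() are hypothetical, and integers stand in for molecules so the sketch runs without RDKit.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_stream(items, size, handle_batch):
    """Feed items to handle_batch one bounded batch at a time."""
    results = []
    for batch in chunks(items, size):
        results.append(handle_batch(batch))
        # `batch` is rebound on the next iteration, so the old list can be freed.
    return results


# Stand-in data: integers instead of molecules, len() instead of real work.
batch_sizes = process_stream(range(2500), 1000, len)
```

With a fixed batch size the peak memory is bounded by one batch, so there is no need to monitor free RAM; whether that is worth the programming time over simply adding swap is exactly the trade-off raised above.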
Re: [Rdkit-discuss] Molecule representation
On 03/07/2017 05:42 PM, Markus Metz wrote:
> Dear Stephane:
> Thank you very much.
> I will give it a try.

An alternative:

    import os
    import sys
    import time
    import threading

    # Location of the PyMOL python modules; adjust for your install.
    PYMOL_PATH = "/SOME/PLACE/lib64/python"
    sys.path.append( PYMOL_PATH )
    import pymol

    def make_image( infile, outfile ) :
        # -q: quiet, -c: command-line mode (no GUI)
        pymol.pymol_argv = ['pymol','-qc']
        pymol.finish_launching()
        cmd = pymol.cmd
        cmd.load( infile )
        cmd.hide( "everything" )
        cmd.show( "sticks" )
        cmd.util.cbaw()
        cmd.set( "cartoon_discrete_colors", 1 )
        cmd.set( "ray_opaque_background", "off" )
        cmd.set( "ray_trace_mode", 1 )
        cmd.set( "antialias", 2 )
        cmd.set( "ray_trace_color", "grey" )
        cmd.set( "cartoon_fancy_helices", 1 )
        cmd.set( "cartoon_side_chain_helper", "on" )
        cmd.png( outfile, width = 800, dpi = 300, ray = 1 )
        # give PyMOL's background threads time to finish before quitting
        while threading.active_count() > 2 :
            time.sleep( 2 )
        cmd.quit()

HTH,

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] PBF precision is to high to determine good planarity
On 2017-03-02 04:37, Guillaume GODIN wrote:
> Based on the precision of the coordinates (in rdkit sdf files it's 4
> digits) can we infer the precision on the PBF value based on that ?

Only if you *know* the values are actually accurate to 4 digits and not e.g. were printed as "%.4f" just because the programmer thought it was a "reasonable" mask. :(

Dimitri
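A two-line illustration of the point, with made-up numbers: a format mask fixes how many digits are printed, not how many are meaningful, so two values known to very different accuracy can end up as identical text in the file.

```python
value_measured_coarsely = 1.5          # known only to about one decimal
value_measured_finely = 1.500049       # known to six decimals

# Both go through the same "%.4f" mask on output.
a = "%.4f" % value_measured_coarsely
b = "%.4f" % value_measured_finely
same_text = (a == b)
```

Reading the file back, there is no way to tell which of the two was behind a given "1.5000", so the printed digit count alone says nothing about the error to propagate into a derived quantity.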
Re: [Rdkit-discuss] drawing code take 3
On 12/29/2016 02:35 PM, Peter S. Shenkin wrote:
> Dimitri,
>
> You were the one who suggested that all the structural depictions be
> generated.
>
> I, in contrast, suggested that only the ones users need to look at need be
> generated. I further suggested that these would only constitute a small
> fraction of those in a large DB.

My objection was to using numbers like

> ... for 92877507
> structures (current size PubChem Compound):
> 1s per structure = 1074 days (~3 years)
> 100 ms per structure = 107 days
> 1ms per structure = 25 hours

as if they actually mean something. I responded that *if* the requirement is to generate all 100M depictions, making the code faster on a single CPU core is rarely the cost-effective solution.

That was a purely academic "if" because I don't believe that regenerating all the depictions at once on a regular basis is a realistic use case, either.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] drawing code take 3
On 12/29/2016 12:43 PM, Peter S. Shenkin wrote:
> Of the
> billion structures, only a fraction will ever be visualized, so a
> memoization strategy sounds reasonable, which in turn implies that you want
> rapid response when an unstored structure has to be generated.

:) Now I have a mental picture of a phd student tied to a chair with his eyes taped open, forced to look at a billion depictions for 10ms each.

Pictures are only useful if you have a human looking at them. Looking is only useful if you do it long enough for the brain to process it. The whole "what if we need a billion depictions all at once" implies that you have a billion users looking at them all at once. If you don't, then rapid response is a very interesting academic exercise but its practical usefulness might be somewhat questionable.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] drawing code take 3
On 12/29/2016 12:43 PM, Peter S. Shenkin wrote:
> Look, it all boils down to (CPU) time, and time is money.

It's very hard to say how much a single cpu core actually costs 'cause they don't make them anymore. Similarly, our small molecule SVGs average at around 4K, storing 10M of those will require about 40GB and they don't make disks that small anymore either. 64GB USB stick is twenty bucks.

I've no idea how much I actually cost our funding agency per hour, nor how many hours it would take me to even figure out if a piece of code of any kind of complexity can be optimized. But I can guarantee you that a) it's much more than $20, and b) hiring a competent programmer will cost you more than buying a "better computer" and is not guaranteed to result in any appreciable speed-up.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] drawing code take 3
On 2016-12-29 07:19, John M wrote:
> For why you need sub-second depiction consider these times for 92877507
> structures (current size PubChem Compound):
>
> 1s per structure = 1074 days (~3 years)
> 100 ms per structure = 107 days
> 1ms per structure = 25 hours

The Dilbert answer is buy a better computer.

The serious answer is if you run millions of jobs sequentially on a single core, your problem is not how long a single job takes: no matter how fast you can make it, it will only scale linearly. There will be 1B compounds in PubChem two years from now and your painstakingly crafted 1ms/structure code will still take 3 years, the only difference is you get garbage depictions.

Condor can be persuaded to fire up 92877507 EC2 VMs and run all of those in parallel -- provided you're willing to pay Amazon for it of course. If you can code the algorithm into GPGPU/SIMD parallel flow, you can probably push it into an FPGA and then get that baked into ASICs in China -- they'll give you a discount if you order more than ten thousand. That gets you a $20 USB dongle that will run them at umpteen K/second. And so on.

If you don't want quality depictions because bad ones will work just fine for your needs, that's a perfectly good argument. If you don't want them because generating 10M sequentially on a single core will take a long time, that's a BS argument.

Dima
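The quoted figures are plain arithmetic, and so is the effect of spreading independent jobs over more workers. A small sketch, assuming perfectly even distribution and no scheduling overhead (the function name and the 1000-worker example are illustrative only):

```python
N_STRUCTURES = 92877507          # figure quoted in the message above
SECONDS_PER_DAY = 86400.0


def wall_clock_days(n_items, seconds_per_item, workers=1):
    """Wall-clock days for n_items independent jobs spread evenly over workers."""
    return n_items * seconds_per_item / workers / SECONDS_PER_DAY


days_1s = wall_clock_days(N_STRUCTURES, 1.0)
days_100ms = wall_clock_days(N_STRUCTURES, 0.1)
hours_1ms = wall_clock_days(N_STRUCTURES, 0.001) * 24
days_1s_1000_workers = wall_clock_days(N_STRUCTURES, 1.0, workers=1000)
```

The first three values reproduce the quoted 1074 days, 107 days, and 25 hours; the last shows that the slowest per-structure time drops to about a day on a thousand workers, without touching the depiction code at all.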
Re: [Rdkit-discuss] drawing code take 3
On 12/15/2016 04:23 PM, Peter S. Shenkin wrote:
> Obviously, it doesn't matter if you're rendering just few structures, but
> in a scenario where you might be downloading a hundred SMILES from a DB and
> displaying them on a grid in a browser, computing the 2D depictions on the
> fly, waiting 5 sec for a page refresh wouldn't be great.

Maybe not, but depending how the browser lays out the grid, it may take 5 seconds anyway. My recommendation for that use case would be to pre-generate the images and store the URLs in that database. Which is what we do here. ;)

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
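A rough sketch of the "pre-generate and store the location in the database" idea, using sqlite3 and local file paths in place of a real database and URLs. Everything here (the depictions table, image_path_for(), and the fake_render() placeholder standing in for an actual depiction routine) is an assumption for illustration, not a description of the BMRB setup.

```python
import hashlib
import os
import sqlite3
import tempfile


def image_path_for(conn, smiles, image_dir, render):
    """Return the stored image path for a SMILES, rendering it only once."""
    row = conn.execute(
        "SELECT path FROM depictions WHERE smiles = ?", (smiles,)
    ).fetchone()
    if row:
        return row[0]
    # Not seen before: render, write the file, remember where it went.
    name = hashlib.sha1(smiles.encode("utf-8")).hexdigest() + ".svg"
    path = os.path.join(image_dir, name)
    with open(path, "w") as out:
        out.write(render(smiles))
    conn.execute(
        "INSERT INTO depictions (smiles, path) VALUES (?, ?)", (smiles, path)
    )
    conn.commit()
    return path


calls = []


def fake_render(smiles):
    # Placeholder for a real depiction routine; records how often it runs.
    calls.append(smiles)
    return "<svg xmlns='http://www.w3.org/2000/svg'/>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depictions (smiles TEXT PRIMARY KEY, path TEXT)")
image_dir = tempfile.mkdtemp()

first = image_path_for(conn, "CCO", image_dir, fake_render)
second = image_path_for(conn, "CCO", image_dir, fake_render)
```

The second lookup is a single indexed query and never calls the renderer, so page load time no longer depends on how fast the depiction code is.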
Re: [Rdkit-discuss] drawing code take 3
On 12/15/2016 02:53 PM, Peter S. Shenkin wrote:
> Looks good, but maybe too slow for production use... (?)

I wonder what kind of production use would require sub-second wall clock time for this.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Extracting SMILES from text
On 12/02/2016 03:12 PM, George Papadatos wrote:
> Here's a pragmatic idea: ... would it not be safe to
> assume that *any *word containing more than 4 'C' or 'c' characters would
> only be a SMILES string?

pneumonoultramicroscopicsilicovolcanoconiosis

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
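For the record, the counterexample does trip the proposed rule. A minimal sketch of the heuristic as quoted (the function name and the threshold parameter are illustrative):

```python
def looks_like_smiles(word, threshold=4):
    """The proposed heuristic: more than `threshold` C/c characters."""
    return sum(ch in "Cc" for ch in word) > threshold


word = "pneumonoultramicroscopicsilicovolcanoconiosis"
c_count = sum(ch in "Cc" for ch in word)
flagged = looks_like_smiles(word)
```

The word has six c's and gets flagged as SMILES, while a short but valid SMILES such as "CCO" falls below the threshold and is missed, so the rule fails in both directions.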
Re: [Rdkit-discuss] comparing two or more tables of molecules
On 11/29/2016 11:56 AM, Chris Swain wrote:
> However I’ve found that the success is very much dependent on the
> fact 1 described by Greg, get all the structures standardised then
> comparison using canonical SMILES or InChi seems to work fine.

+1. Essentially you need to get standardized representation of all the properties you consider relevant and produce a unique hash of that. Doesn't matter if it's a SHA-1 string or some graph-based magic or a matrix voodoo. (String comparison is of course easier.)

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
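As a sketch of the "standardize, then hash" step: once the relevant properties are in a standardized form, a single comparable key can be produced with the standard hashlib module. The property names and the structure_key() helper below are made up for illustration; the standardization itself (canonical SMILES, InChI, and so on) is assumed to have happened upstream.

```python
import hashlib


def structure_key(properties):
    """Hash a dict of already-standardised properties into one hex string."""
    # Sort by property name so the key does not depend on insertion order.
    parts = ["%s=%s" % (name, properties[name]) for name in sorted(properties)]
    return hashlib.sha1("\n".join(parts).encode("utf-8")).hexdigest()


a = structure_key({"smiles": "c1ccccc1", "charge": 0})
b = structure_key({"charge": 0, "smiles": "c1ccccc1"})
c = structure_key({"smiles": "CCO", "charge": 0})
```

Two tables can then be compared with an ordinary string join on the key column. The key is only as good as the standardization feeding it: two representations of the same molecule that were not normalized to the same strings will hash differently.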
Re: [Rdkit-discuss] comparing two or more tables of molecules
On 11/28/2016 10:25 AM, Stephen O'hagan wrote:
> Has anyone come up with fool-proof way of matching structurally equivalent
> molecules?

This is somewhat convoluted and there is no proof that it's fool-proof. A few years ago we had good results from running the graphpowerhash() function here: http://madgik.github.io/madis/aggregate.html#module-functions.aggregate.graph on the PDB ligand database.

The parameters were
- atom1, atom2 IDs (names) as node1, node2.
- Atom stereo (R, S, N), aromatic (y/n), and "leaving atom" (y/n) for the atoms as node1_details, node2_details (packed into a single string with the jpack() function: see http://madgik.github.io/madis/row.html). Looking at it now, I don't think the nodeN_details parameter needs to include the atom's "aromatic" flag.
- Massaged bond type and bond stereo (E, Z, N) as edge_details. Also packed into a string as above. The PDB chem comp model has bond type as SING or DOUB with a separate yes/no "aromatic" column. We changed it to AROM for the ones where that was a yes.

The basic model is a list of bonds with atom1, atom2, and type, and a list of atoms with stereo, aromatic, and "leaving" flags -- the last one is "Y" for atoms that "go away" when forming a bond.

The algorithm itself, as far as I know (I am not the author), takes the two "matrices" representing the molecule "graphs", computes their largest eigenvalue/eigenvectors, and compares those. We have no proof that it's 100% correct, but all duplicates it found in the PDB ligand expo at the time were genuine.

Enjoy,

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a bit off-topic)
On 11/17/2016 02:41 PM, Peter S. Shenkin wrote:
...
> I have to say that Marvin displays the connectivity of the structures much
> more
> clearly than RDKit.

Philosophically speaking, there must exist molecules for which a legible 2D projection is simply not possible. PubChem CID 2537 comes close. Marvin doesn't do much better on this one even if you don't turn on all the labels.

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)
On 2016-10-26 23:39, Peter S. Shenkin wrote:
> Hey, by the way, my agenda is trying to understand all this.

(Using python syntax instead of ML)

Recommended by TFM:

    from "http://www.w3.org/2000/svg" import *

All svg names should work with or without package qualifier: point(), line(), etc., as well as svg.point(), svg.line(), ...

Rdkit way:

    import "http://www.w3.org/2000/svg" as svg

All svg names must be prefixed: svg.point(), svg.line(). Using unqualified point() should throw an error. (Unless there's another 'point' in the name resolution chain, yadda, yadda, yadda.)

Unfortunately I find the fact that a lot of software out there doesn't get it right entirely unsurprising. :(

Dima
[Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)
On 10/25/2016 11:21 AM, Peter S. Shenkin wrote:
> Hi, Hongbin,
>
> Thanks. Indeed. svg2.svg, when renamed to svg2.html, shows the correct
> image in Chrome. svg.html shows garbage.
>
> Still, it would be good to be able to create a real .svg file from RDKit.

OK, you made me look and I learned something today. Mozilla claims valid SVG must include the namespace declarations (https://developer.mozilla.org/en-US/docs/Web/SVG/FAQ) citing this document: https://jwatt.org/svg/authoring/#namespace-binding

There it states
"""
<svg xmlns="http://www.w3.org/2000/svg"> ...

Be careful not to type xmlns:svg instead of just xmlns when you bind the SVG namespace. This is an easy mistake to make, but one that can break everything. Instead of making SVG the default namespace, it binds it to the namespace prefix 'svg', and this is almost certainly not what you want to do in an SVG file. A standards compliant browser will then fail to recognise any tags and attributes that don't have an explicit namespace prefix (probably most if not all of them) and fail to render your document as SVG.
"""

Sure enough, rdkit's files start with
"""

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index
On 2016-10-24 19:04, Peter S. Shenkin wrote:
> My second conclusion (based on the .svg-file experiments) is that it's
> not an iPython problem and, since you see the same thing on Firefox,
> it's unlikely to be a Chrome problem.

Well, what I got from (Greg's, I think) tutorial is that if you don't strip off the svg:'s the image won't show up in whatever he's using. In my case (firefox) the image won't show up if you do. Programs that read XML correctly (the gimp, inkscape, eog to name a couple) will show the image either way.

So to your original question: no rhyme or reason that I know of to when you should or should not strip off the svg:'s.

Dima
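What "reads XML correctly" means here can be shown with a namespace-aware parser. The sketch below uses Python's standard xml.etree.ElementTree on two hand-written documents (not actual RDKit output): one binds SVG as the default namespace, the other binds it to the svg: prefix.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Same drawing, once with a default namespace and once with an svg: prefix.
default_ns = '<svg xmlns="%s"><line x1="0" y1="0" x2="1" y2="1"/></svg>' % SVG_NS
prefixed_ns = (
    '<svg:svg xmlns:svg="%s"><svg:line x1="0" y1="0" x2="1" y2="1"/></svg:svg>'
    % SVG_NS
)

root_default = ET.fromstring(default_ns)
root_prefixed = ET.fromstring(prefixed_ns)

# ElementTree expands every tag to "{namespace}localname".
tags_default = [el.tag for el in root_default.iter()]
tags_prefixed = [el.tag for el in root_prefixed.iter()]
```

Both documents resolve to the same fully qualified element names, which is why a namespace-aware reader renders either form. Software that matches on the literal text "svg" or "svg:svg" instead of the resolved name will accept only one of them.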
Re: [Rdkit-discuss] 2D drawing with atoms labeled by index
On 10/24/2016 04:39 PM, Peter S. Shenkin wrote:
> Or is it
> rather because chemists in your target audience will be thinking of the
> first atom in, say, a structure from an sd file as atom #1? That
> 2. Regarding the last line, most of the RDKit code I've seen in past
> examples displays the molecule using code like the following. When is it
> necessary/not necessary to remove the "svg" string from the results of
> GetDrawingText()?

Not sure: it's a namespace, I'm assuming ipython can't deal with xml namespaces. Properly written programs should show it either way, unfortunately my target viewer is firefox (it's a web application and the user's default browser is firefox) and firefox isn't one of them. Without svg:'s it'll show the file as xml text instead of the image.

HTH

-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] 2D drawing with atoms labeled by index
Since you already got your answer I'll just post this for posterity:

    import sys
    import rdkit
    import rdkit.Chem
    import rdkit.Chem.AllChem
    import rdkit.Chem.Draw
    import rdkit.Chem.Draw.rdMolDraw2D

    # read the first molecule from the file, keeping its hydrogens
    mol = next(rdkit.Chem.SupplierFromFilename(sys.argv[1], removeHs=False))
    dr = rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG(800, 800)
    dr.SetFontSize(0.3)
    op = dr.drawOptions()
    # label every atom with its element symbol and 1-based index
    for i in range(mol.GetNumAtoms()):
        op.atomLabels[i] = mol.GetAtomWithIdx(i).GetSymbol() + str(i + 1)
    rdkit.Chem.AllChem.Compute2DCoords(mol)
    dr.DrawMolecule(mol)
    dr.FinishDrawing()
    svg = dr.GetDrawingText()

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] How to find the idx of hydrogens bonded to a specific atom
On 10/13/2016 12:12 PM, Paul Emsley wrote: > Are you sure? I use HasSubstrMatch to match hydrogens. Perhaps this http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg05897.html may be useful? If OP's looking at crystal structures, they're likely dealing with pdb data model where all hydrogens are explicitly present and indexed. I wonder if they stay that way throughout the steps leading to (and past) the smarts match. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] The RDKit and modern C++
On 09/29/2016 04:14 PM, Paolo Tosco wrote:
> Hi Dimitri,
>
> That can be avoided building the RPMs on older RHEL distributions using the Red Hat Developer Toolsets that Greg and others mentioned.

I know. I'll believe it when I see it. Based on prior experience, boost, in particular, is not something I'd care to support in that environment.

That said, I learned more than one programming language as a "personal/professional development" exercise, so I am all for that. ;)
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] The RDKit and modern C++
On 09/29/2016 10:16 AM, Greg Landrum wrote:
> My hope is that all of those people will be able to keep happily
> using a reasonably up-to-date version of the RDKit.

Well, that's kinda the point: there are no RPMs that let you run binaries linked to GLIBC_2.17 on GLIBC_2.5, nor compile c++-14 code with c++-03 compilers.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] The RDKit and modern C++
On 2016-09-29 00:57, Markus Sitzmann wrote:
> I get the feeling, RH/Centos 6 becomes the next XP kind of story - to
> many legacies that make the update impossible or very hard. Also docker,
> a great technology that could mitigate this problem, is very painful
> under RH/Centos 6.

systemd, corosync/pacemaker, apache 2.4, gnome.whichever are some of RH7's "exciting new technologies" a lot of us don't want.

Dimitri
Re: [Rdkit-discuss] drawing code take 3
On 2016-09-26 18:19, Peter S. Shenkin wrote:
> 2D drawing code is tough. The 90/10 rule applies: the last 10% of
> I think for the present purposes what we need is something correct,
> robust and legible, and of course the example shown does not exhibit
> that. (But I don't know what the starting SMILES is, so I don't know
> whether the 7-bonded C is due to a bad SMILES, in which case all bets
> are off.)

That was actually a "kudos to RDKit" post.

I have an application where I need a drawing with all Hs and all atom labels, and a molecule description in mmCIF(-ish) format. I use RDKit for the latter because of OpenBabel's stereochemistry "model", and OpenBabel for the drawings because 90% of the time it generates better layouts.

The comment is that RDKit's layout algorithm appears to be more stable: for this molecule OB generated a "better" picture from the original SDF downloaded from PubChem, and that complete mess when we re-ordered the atoms. RDKit generated the same picture in both cases, only one is a mirror image of the other.

Dima
Re: [Rdkit-discuss] drawing code take 3
On 09/26/2016 04:42 PM, Peter S. Shenkin wrote:
> Also, the C attached to H44 has an extra H (its own or someone else's?)
> superimposed upon it.

I wonder if 2D drawing code should really work the same way as the 3D conformer generation: generate a bunch of candidate layouts and pick the one(s) with least clashes/overlaps.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
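The "generate candidates, keep the least crowded one" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `count_clashes` and `pick_best_layout` are hypothetical, the candidate layouts are made-up coordinate lists, and nothing here reflects how RDKit or OpenBabel actually lay molecules out.

```python
import itertools
import math


def count_clashes(coords, min_dist=0.5):
    """Count atom pairs whose 2D positions are closer than min_dist."""
    return sum(
        1
        for (x1, y1), (x2, y2) in itertools.combinations(coords, 2)
        if math.hypot(x1 - x2, y1 - y2) < min_dist
    )


def pick_best_layout(candidates, min_dist=0.5):
    """Return the candidate layout with the fewest clashing pairs."""
    return min(candidates, key=lambda c: count_clashes(c, min_dist))


# Two made-up candidate layouts for the same three atoms.
crowded = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0)]  # first two atoms overlap
spread = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # no overlaps

best = pick_best_layout([crowded, spread])
print(count_clashes(crowded), count_clashes(spread))
```

A real scorer would presumably also have to account for label sizes and bond crossings, which is where the hard part is.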
[Rdkit-discuss] drawing code take 3
On the plus side, when drawing PubChem CID 5057 from a 3D SDF before and after our canonicalization, RDKit draws a mirror image, but otherwise the same 2D structure.

OB's "after" version is attached: enjoy the 7-bond carbon in the ring. ;)
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] The RDKit and modern C++
On 2016-09-24 01:25, Greg Landrum wrote:
> https://medium.com/@greg.landrum_t5/the-rdkit-and-modern-c-48206b966218?source=linkShare-d698b3fa9f7-1474698147
>
> This is a big and important change and I'd love to hear whatever
> feedback members of the community may have. Please comment either on the
> blog post or here.

What are the concrete benefits C++14 will bring to the toolkit?

The C++ committee has long been criticized for attempting to solve the wrong problems every time and every time coming up with solutions that are reasonable, logical, and wrong. We've been forced to update our code several times due to g++ updates being incompatible with the "language formerly known as C++" and if that's the case with RDKit, then you don't have much choice.

However, if I were rewriting the code for the sake of making it "cleaner" or "more maintainable", I'd be seriously considering go or objective c or maybe gnat even. At this point I can only recommend C++ to Comp. Sci. students in the Programming languages unit, as an object example of where good intentions usually end up. I certainly wouldn't recommend it to chemists as a "modern tool", or even a good tool.

Just my $.02 as an "IT professional".

Dimitri
Re: [Rdkit-discuss] FindChiralCenters in MOL/SDF files howto
On 09/14/2016 02:23 PM, Dimitri Maziuk wrote:
>     lbl=mol.GetAtomWithIdx(s[0]).GetSymbol() + str(s[0]+1)
>     print label, ":", s[1]
            ^^^ Should be lbl
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[Rdkit-discuss] FindChiralCenters in MOL/SDF files howto
I don't know if this fits in the Getting Started, or Cookbook, or if TPTB decide to wikify the docs and it should go there, but anyway, here goes. With thanks to everyone responsible, of course. It does need corrections/clarifications.

-----

# MOL/SDF FindChiralCenters howto

```python
import rdkit.Chem

# list chiral centers in molecule
sts = rdkit.Chem.FindMolChiralCenters(mol, includeUnassigned=True, force=False)
for s in sts:
    # label = element symbol + 1-based atom index
    lbl = mol.GetAtomWithIdx(s[0]).GetSymbol() + str(s[0] + 1)
    print("%s : %s" % (lbl, s[1]))
```

The values for `s[1]` are 'R', 'S', or '?' for unknown/unassigned. So how do you get rid of question marks?

## Best: use coordinates

3D coordinates are stored in `rdkit.Chem.rdchem.Conformer` and every `rdkit.Chem.rdchem.Mol` has at least one -- **question** is that always the case?

```python
# get molecule's list of conformers,
# if the 1st one has 3D coordinates,
# flag chiral centers based on those
c = mol.GetConformers()
# guard against an empty list (e.g. a Mol built from SMILES has no conformers)
if len(c) > 0 and c[0].Is3D():
    rdkit.Chem.rdmolops.AssignAtomChiralTagsFromStructure(mol)
```

## CTFiles parity flags

CTFile (MOL, SDF) can include stereo flags as either bond annotation ("up" or "down"), or atom parity annotation ("cw", "ccw", or undefined), or both. According to the specification, atom parity should be ignored when reading the file, so bond annotation is the only one that actually matters.

RDKit will do the right thing and read the flags from the bond block (and write them out when exporting MOL blocks). The problem is there's plenty of broken software that does not do that and populates atom parity flags instead.

RDKit will read the atom parity flags also, and store them in the `molParity` property of the `rdkit.Chem.rdchem.Atom` (but not turn them into chirality tags). The values are

* 1 for clockwise,
* 2 for counter-clockwise, and
* 3 for unspecified.

**Note** that the winding is relative to atom order in the atom list. If you did anything to the molecule that changed the original atom order, the flags are most likely no longer valid.

```python
# assign chiral tags from atom parity flags
for a in mol.GetAtoms():
    if a.HasProp("molParity"):
        try:
            parity = int(a.GetProp("molParity"))
        except ValueError:
            parity = None
        if parity == 1:
            a.SetChiralTag(rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CW)
        elif parity == 2:
            a.SetChiralTag(rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CCW)
        elif parity == 3:
            a.SetChiralTag(rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED)
```

## Still unassigned

That may be deliberate. For example, PubChem CID 602 (as of the time of this writing) is for "DL-Alanine" describing *either* D- or L-Alanine. In this case "unspecified" is the correct value for chirality tag. (And in the case of "2D" SDF it will be; unfortunately PubChem software will generate a "3D" SDF for CID 602 and it will have a single conformer: L-Alanine.)

-----

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
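To check what a file actually carries in the atom parity column before involving any toolkit, something like the following sketch may help. It assumes a V2000 MOL block with the usual fixed columns (atom count in columns 1-3 of the counts line, parity in columns 40-42 of each atom line). The helper names `atom_line` and `read_parities` are hypothetical, the MOL block is hand-built for the demo, and the labels simply reuse the howto's wording for values 1-3.

```python
PARITY_LABELS = {0: "none", 1: "clockwise", 2: "counter-clockwise", 3: "unspecified"}


def atom_line(x, y, z, symbol, parity):
    """Format a V2000 atom line; only coordinates, symbol and parity are set."""
    # x, y, z (10 chars each), space, symbol (3), mass diff (2), charge (3), parity (3)
    return "%10.4f%10.4f%10.4f %-3s%2d%3d%3d" % (x, y, z, symbol, 0, 0, parity)


def read_parities(mol_block):
    """Return [(1-based atom index, symbol, parity)] from a V2000 MOL block."""
    lines = mol_block.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line follows the 3 header lines
    result = []
    for i, line in enumerate(lines[4:4 + n_atoms], start=1):
        symbol = line[31:34].strip()
        field = line[39:42].strip()
        result.append((i, symbol, int(field) if field else 0))
    return result


# Hand-built two-atom block, just enough structure to exercise the parser.
block = "\n".join([
    "example",
    "  hand-written",
    "",
    "%3d%3d  0  0  0  0  0  0  0  0999 V2000" % (2, 0),
    atom_line(0.0, 0.0, 0.0, "C", 1),
    atom_line(1.5, 0.0, 0.0, "N", 0),
    "M  END",
])

for idx, symbol, parity in read_parities(block):
    print("%s%d : %s" % (symbol, idx, PARITY_LABELS.get(parity, "?")))
```

This only reports what is written in the file; per the note above, the values are tied to the atom order and say nothing about R/S.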
Re: [Rdkit-discuss] Has3D?
On 09/13/2016 05:24 PM, Rich Lewis wrote:
> Hi Dimitri,
>
> 3D geometry information for rdkit `Mol`s is stored as `Conformer`s. These can be accessed with the `GetConformer` method, which takes a confId as an argument. If you have loaded the molecule from a mol/sdf file, there should be a single conformer with ID 0, with the coordinates from the file. `Conformer`s have a `Is3D` method, which *should* do what you want.

It does. "There's conformer[0]" is the bit I was missing. It seems to be there for 2D MOLs as well with Is3D() -> False.

Thank you.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[Rdkit-discuss] Has3D?
Hi all,

a quick one hopefully: is there something like Mol.Has3D()?

I'm looking at "2D" vs "3D" MOL files and as best I can tell, in the 2D ones the Z coord is always 0, whereas in 3D there may (should?) be a non-zero one. Is there a quick way to find out after reading in a MOL if it's one or the other?

TIA,
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
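The "any non-zero Z" observation can be checked on the raw text with a short sketch like this one. It assumes a V2000 MOL block with the Z coordinate in columns 21-30 of each atom line. The helper names (`atom_line`, `mol_block`, `looks_3d`) are hypothetical and the two blocks are hand-built. It is a heuristic only: a genuinely planar 3D structure lying in the z=0 plane would be reported as 2D.

```python
def atom_line(x, y, z, symbol):
    """Format a V2000 atom line with all optional fields set to zero."""
    return "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0" % (x, y, z, symbol)


def mol_block(atoms):
    """Build a minimal V2000 MOL block (no bonds) from (x, y, z, symbol) tuples."""
    header = ["example", "  hand-written", ""]
    counts = "%3d%3d  0  0  0  0  0  0  0  0999 V2000" % (len(atoms), 0)
    return "\n".join(header + [counts] + [atom_line(*a) for a in atoms] + ["M  END"])


def looks_3d(block, tol=1e-6):
    """True if any atom in the block has a Z coordinate further than tol from 0."""
    lines = block.splitlines()
    n_atoms = int(lines[3][0:3])
    return any(abs(float(line[20:30])) > tol for line in lines[4:4 + n_atoms])


flat = mol_block([(0.0, 0.0, 0.0, "C"), (1.5, 0.0, 0.0, "N")])
bent = mol_block([(0.0, 0.0, 0.0, "C"), (1.2, 0.3, -0.8, "N")])

print(looks_3d(flat), looks_3d(bent))
```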
Re: [Rdkit-discuss] AddHs()
On 09/10/2016 12:08 PM, David Cosgrove wrote:
> On the subject of the documentation, I would encourage you to find the
> GettingStartedWithRDKit.rst in the Docs directory, find somewhere where
> this discussion fits, add it, and send the new version to Greg. If everyone
> did this every time they spent time working out how to do something, the
> documentation would grow very rapidly and by definition grow fastest in
> areas that people are actively using. We don't need to wait for Greg to do
> it all! He's busy enough as it is, and let's face it, writing docs is dull
> and I'm sure he would appreciate the help.

GitHub does have a wiki. One has to become a "collaborator" to get edit permissions, AFAIK it doesn't do fine-grained, but what it does should be good enough. The wiki is a git repo itself so it could be pulled and integrated into release builds etc.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] SDF and FindMolChiralCenters()
On 09/10/2016 05:20 PM, Paolo Tosco wrote:
> https://gist.github.com/ptosco/ab668ad5c35875d8c47e0e6be9e37e79#file-set_chirality_from_atom_parity_flags-ipynb

Nice.

I do have 3D SDFs, that is part of the reason I'm going through this exercise, but looking at the 2D SDF for PubChem's L-alanine, they do have atom parity set even though there are no 3D coordinates. So I'll probably go with your solution instead of TagsFromStructure b/c it'll work for both 2D and 3D MOL files.

(elif p == 3 -> rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED, of course)
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] SDF and FindMolChiralCenters()
On 09/10/2016 04:34 PM, David Cosgrove wrote:
...
> Also, the atoms in a molecule should have the property _CIPRank set, you
> might be able to do something with that.

Possibly, but since the non-typo'ed function seems to do the trick, that's good enough for me.

Thanks
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] SDF and FindMolChiralCenters()
Oops. AssignAtomChiralTagsFromStructure() does indeed work.

>> If your file has 3D coordinates, AssignAtomChrialTagsFromStructure

Good to see I'm not the only one with lysdexic fnigers. Apologies for the noise: ;)
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] SDF and FindMolChiralCenters()
On 09/09/2016 10:42 PM, Ling Chan wrote:
> If your file has 3D coordinates, AssignAtomChrialTagsFromStructure may help
> you.

Maybe, if I wasn't getting

    rdkit.Chem.rdmolops.AssignAtomChrialTagsFromStructure( self._mol )
    AttributeError: 'module' object has no attribute 'AssignAtomChrialTagsFromStructure'

-- same when calling from rdkit.Chem without rdmolops.

(centos 7, python2-rdkit-2016.03.1-1.el7.centos.x86_64, rdkit-2016.03.1-1.el7.centos.x86_64)

>> the MOL reader perceives chirality based on the bond stereo field of the
>> bond block. Instead the atom stereo parity value of the atom block is read
>> and stored in the "molParity" atom property, but it is ignored for the
>> purpose of chirality perception, as per the MOL file specs:
>>
>> http://c4.cabrillo.edu/404/ctfile.pdf (see in particular Figure 4)
>>
>> Therefore, if the MOL file lacks the bond stereo information chirality
>> won't be set.

GetProp( "molParity" ) does work, thank you, but as I understand it's based on atom ordering in the CTAB and not on CIP rules. So it's just as good as OB's stereo "feature" for my purposes: either way I'd have to roll my own CIP ordering code to arrive at R/S.

Thanks.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[Rdkit-discuss] SDF and FindMolChiralCenters()
Hi everyone,

    m = rdkit.Chem.SupplierFromFilename( filename, removeHs = False ).next()
    sts = rdkit.Chem.FindMolChiralCenters( m, includeUnassigned = True )
    for s in sts :
        lbl = m.GetAtomWithIdx( s[0] ).GetSymbol() + str( s[0] + 1 )
        print lbl, ":", s[1]

For L-ALA 3D SDF the output is

    C4 : ?

For D-ALA 3D SDF the output is also

    C4 : ?

And for ALA 2D SDF the output is also

    C4 : ?

If I change includeUnassigned to False, the list returned by FindMolChiralCenters() is empty. The SDFs have 1, 2, and 3 in the 7th column in the atom block.

If I use

    m = rdkit.Chem.MolFromSmiles( 'C[C@@H](C(=O)O)N' )

instead, the output is

    C2 : S

(this is L-ALA from the same PubChem record as the SDF).

So it looks like MOL reader ignores chirality, is that the case?

Thanks,
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] AddHs()
On 09/09/2016 12:07 PM, Peter S. Shenkin wrote:
> How about "explicit", rather than "physical", hydrogens?

As a programmer I'd prefer the obmol in a dependable known state at all times. I'd always put all H atom nodes in the molecule graph: calculate them from "explicit" charges and/or "implicit" valences when constructing obmol if not already spelled out in the input.

It's a different issue, but as a side-effect you'd always have only one kind of hydrogens and wouldn't need these labels.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] AddHs()
On 09/09/2016 12:56 AM, Greg Landrum wrote:
> This is absolutely correct: if you remove the Hs and then later re-add them
> it is extremely unlikely that you will end up with the same H indices
> before and after the change. It makes much more sense to just use
> removeHs=False

That's what I expected and "removeHs = False" works, thanks.

And I was kidding about histidine of course.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] AddHs()
On 09/08/2016 02:37 PM, Rocco Moretti wrote:
> (2) There's special complications here that there are certain structures,
> such as imidazole, which needs physical or explicit hydrogens on one of the
> nitrogens in order to Kekulize properly. If you're implicit only, the RDKit
> sanitizer will choke. Thus, there's special casing in various Add/RemoveHs
> function to avoid implicit-izing these critical hydrogens.

So if I feed a mol file describing protonated histidine to rdkit, the rdmol I actually get by default is the one with NH2 and COOH? Ohkay...

Great write-up, though, thank you.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] AddHs()
On 09/08/2016 02:26 PM, Brian Kelley wrote:
> Dimitri, Hs are removed.
>
> Their is a removeHs argument in MolFromMolBlock (python) that defaults to
> true.
>
> There is a corollary in SDMolSupplier if you are using that.
>
> supplier = SDMolSupplier(filename, removeHs=false)
>
> if this helps.

Thank you, it does:

    rdkit.Chem.SupplierFromFilename(sys.argv[1], removeHs = False ).next()

returns a molecule with -- presumably "physical" -- hydrogens.

The reason I ask is if they're removed and re-added, I'd worry about their indexes matching what's in the source file. Which might matter in the case of e.g. stereospecifically assigned methylene protons. (Or so they tell me ;)
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[Rdkit-discuss] AddHs()
On 09/08/2016 10:25 AM, Greg Landrum wrote:
...
> Why do you want 2D drawings that include H atoms?

On the subject of H atoms: when I read in the MOL file that has them, I need to explicitly call AddHs() in order to have them drawn.

Question: do they actually get stripped off by the reader and re-added by AddHs()? Or are they there "hidden" somehow and AddHs() just "unhides" them?

TIA
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] drawing code take 2
On 09/08/2016 10:25 AM, Greg Landrum wrote:
> Dimitri,
> Why do you want 2D drawings that include H atoms?

I have an NMR spectroscopist doing peak assignments, proton spectra are the most commonly used kind for metabolites & such. As you're well aware, atom nomenclature is an "interesting" issue. We need stable atom indexing, including protons, and we need indexes on the picture.

Most of the pipeline is automated, I have a webpage where the spectroscopist hits a button and gets a usable 2D picture and an atom table where she can fill in the numbers. So it's not like someone will sit in front of Marvin and play with options until they get a perfect picture for their paper.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[Rdkit-discuss] drawing code take 2
:( Here's one where AddHs() really breaks things. It is an unpleasant molecule to draw.

So it looks like the issue is that
- for CID 112084 the call to AddHs() changed the layout (arguably not for the better), whereas
- for CID 260719 it didn't change the layout/shorten the bonds when it really should have.

For reference, CID260719.ob.svg is the other toolkit's rendering of the same file (with atom indexes changed to green from OB's default red).
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

alatis_output_Structure3D_CID_260719.sdf Description: application/vnd.kinar
Re: [Rdkit-discuss] rdMolDraw2D drawing code
PS. looking at this, it may have come out a little confusing, so: I start with a 3D mol file and run it through rdkit.Chem.AllChem.Compute2DCoods's correctly spelled sibling, rdkit.Chem.AllChem.Compute2DCoords().

CID112084.svg is what comes out.

Running rdkit.Chem.AddHs() before Compute2DCoords() generates the layout seen on the other 3 pictures.

CID112084.allh.svg is what comes out after AddHs().

CID112084.nonum.svg is what comes out after

    op = dr.drawOptions()
    for i in range( mh.GetNumAtoms() ) :
        op.atomLabels[i] = mh.GetAtomWithIdx( i ).GetSymbol()

CID112084.all.svg is with the above loop changed to

    op.atomLabels[i] = mh.GetAtomWithIdx( i ).GetSymbol() + str ((i+1))

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] rdMolDraw2D drawing code
On 09/05/2016 03:47 AM, Greg Landrum wrote:
> Dave's right about the font size: it's expressed in whatever coordinates
> the molecule is being drawn in.

I'd put Dave's explanation in the doc++ (?) comment on the FontSize get/setter in MolDraw2D: http://www.rdkit.org/Python_Docs/rdkit.Chem.Draw.rdMolDraw2D.MolDraw2D-class.html

Right now they just say "float". If you're dealing with SVG output, font sizes are px, em, and points, and you need a couple of tries to figure them out. It's a minor thing.

> I suspect the other part of Dmitri's question is about the way bonds are
> shortened so that they don't draw all the way through the atom labels.

Yes, I think. It looks like there's more padding on the top and the right, and less padding on the left and bottom. E.g. H39 in the attached "all" version is the worst, but it is consistent in all 3.

The other one on the "all" picture is that the bond to O21 isn't shortened enough and there isn't much room left between O21 and O24. OTOH all three labels: H38, O21, H39 could be drawn without even shortening the bonds if we could just move them up a little. (Well, H39 could be just drawn without shortening the bond at all.)

Last but not least, my starting point is a 3D MOL file and I call rdkit.Chem.AllChem.Compute2DCoords( mh ) -- last thing before DrawMolecule(). Attached CID112084.svg is the one generated without AddHs(), notice how the layout is very different on that one. I've a suspicion that that layout might look better with all the Hs and numbers added than the one I get (the other 3 pictures).
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[Rdkit-discuss] rdMolDraw2D drawing code
Hi all,

I finally got a round tuit for playing with the drawing code and I like it -- great job, thank you Greg and Dave and everyone who contributed.

One question though: is it possible to add padding around atom labels? Or use some other trick to make the attached look less crowded? (Yes, I do want all Hs and all atom labels with numbers.)

The best I can come up with is reduce the font size a little, that works fine. I think it'd be nice if the fine manual for MolDraw2D said what the units used by FontSize()/SetFontSize() are.

So, any better ideas than just slightly smaller labels?

TIA
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] The Chlorine molfile question
On 01/20/2016 08:30 PM, Peter S. Shenkin wrote:
> ... the problem that I thought we were trying to
> address is rather the lack of extensibility, the lack of lower-case, the
> fact that different users (even for deposited structures, IIRC) and
> different software products overload the available fields differently (like
> putting partial charge in the Temperature Factor field) and have violated
> the standard by doing necessary but formally disallowed things ...

PDB has a format, with API and everything, that takes care of all of that. It's called mmCIF. After 25 years (or however long it's been around) nobody uses it outside of PDB.

I've seen this discussion countless times. It always does this exact circle. Everybody wants to *have* a better format. Nobody wants to *use* it because it's "too complex" and "too difficult". In the meantime we are left trying to guess whether a given "CA" stands for C-alpha or calcium.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
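For what it's worth, the C-alpha/calcium guess can be made from column alignment when a file follows the fixed-column PDB layout: the element symbol sits in columns 77-78, and a two-letter element's atom name starts in column 13 while a one-letter element's name starts in column 14. A rough sketch, assuming files that actually respect those columns (many don't, which is the whole complaint). `pdb_atom_line` and `guess_element` are hypothetical helpers, and the fallback mishandles four-character hydrogen names such as HD11.

```python
def pdb_atom_line(record, serial, name_field, res_name, element=""):
    """Build a minimal fixed-column ATOM/HETATM line; name_field is the raw 4-char field."""
    return (
        "%-6s%5d %-4s %3s A%4d    " % (record, serial, name_field, res_name, 1)
        + "%8.3f%8.3f%8.3f%6.2f%6.2f" % (0.0, 0.0, 0.0, 1.0, 0.0)
        + " " * 10
        + "%2s" % element
    )


def guess_element(line):
    """Prefer the element columns (77-78); fall back to atom-name alignment."""
    element = line[76:78].strip()
    if element:
        return element.upper()
    name = line[12:16]
    if name[0].isalpha():
        # name starts in column 13: treat as a two-letter element (e.g. "CA  ")
        return name[:2].strip().upper()
    # name starts in column 14: one-letter element (e.g. " CA " is a carbon)
    return name[1].upper()


c_alpha = pdb_atom_line("ATOM", 2, " CA ", "ALA")
calcium = pdb_atom_line("HETATM", 99, "CA  ", "CA")

print(guess_element(c_alpha), guess_element(calcium))
```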
Re: [Rdkit-discuss] The Chlorine molfile question
On 01/20/2016 04:57 PM, Peter S. Shenkin wrote:
> On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk
> wrote:
>> JSON encodes a single string. That is a problem for sending larger files
>> over the net, say, an NMR structure of a larger molecule with 100 models
>> in the file.
>>
>
> That's not a problem, conceptually, because you can have an array of
> structures.

No, my point was that streaming isn't a part of the JSON specification and common implementations do not offer it. https://en.wikipedia.org/wiki/JSON_Streaming

You can cut one model out of a PDB file (or one structure out of an SDF) and the result is a valid file.

In ASN.1 the length of the value is at the front. If you define your array as a sequence, a single structure pulled out of the middle should be OK, but the entire sequence is invalid until you read it to the end. I think in practice you wouldn't define your array as a sequence and would instead have a file full of "disjoint" single structures, possibly with some kind of metadata header. (I haven't touched ASN.1 since school, so don't quote me on this.)

Oh wait, that sounds exactly like PDB with its REMARKs and MODELs.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
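A small sketch of the difference, using only the standard `json` module. The records are made-up placeholders for "models". The line-delimited layout is one of the conventions described on the JSON streaming page linked above, not part of the JSON specification itself, and `iter_records` is a hypothetical helper.

```python
import io
import json

# Made-up stand-ins for the models in a multi-model structure file.
models = [{"model": 1, "atoms": 3}, {"model": 2, "atoms": 3}, {"model": 3, "atoms": 3}]

# One JSON array: a partially received document is not valid JSON.
as_array = json.dumps(models)
try:
    json.loads(as_array[: len(as_array) // 2])
    truncated_ok = True
except ValueError:
    truncated_ok = False

# Line-delimited: every line is a complete JSON document on its own.
as_lines = "\n".join(json.dumps(m) for m in models)


def iter_records(stream):
    """Yield one parsed record per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# A single record cut out of the middle is still valid by itself.
second = json.loads(as_lines.splitlines()[1])
streamed = list(iter_records(io.StringIO(as_lines)))

print(truncated_ok, second["model"], len(streamed))
```

The second layout behaves like the "file full of disjoint single structures" described above: any record can be pulled out and parsed without reading the rest.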