Re: [Rdkit-discuss] rdkit-cartridge: Inserting new molecules

2020-10-26 Thread Brian Cole
Hi Thomas, It's possible to use a TEMPORARY TABLE for this purpose in a single transaction. This is the scheme we use to convert the input application SMILES into a canonicalized RDKit SMILES. We keep the RDKit canonical SMILES around in the table for exact isomer lookups, but this lets us

Re: [Rdkit-discuss] para-stereochemistry

2021-05-27 Thread Brian Cole
I always refer back to this graphic in Alberto Gobbi's "Handling of Tautomerism and Stereochemistry in Compound Registration" paper: https://pubs.acs.org/doi/10.1021/ci200330x [image: image.png] @Greg Landrum, I would interpret "para stereochemistry" as #3 in the above image. And "dependent ster

[Rdkit-discuss] RDKit version in AWS Aurora?

2021-06-06 Thread Brian Cole
This is a bit more of a question for AWS themselves, though I believe the RDKit build for the Postgres extension can be improved as well. The AWS documentation states, “RDKit extension version 3.8.” https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Updates.20180305.htm

Re: [Rdkit-discuss] RDKit version in AWS Aurora?

2021-06-07 Thread Brian Cole
Landrum wrote: > Hi Brian, > > On Mon, Jun 7, 2021 at 4:36 AM Brian Cole wrote: > >> This is a bit more of a question for AWS themselves, though I believe the >> RDKit build for the Postgres extension can be improved as well. >> >> The AWS documentation

Re: [Rdkit-discuss] validating stereochemistry

2021-09-27 Thread Brian Cole
Good Morning Tim, The RDKit EnumerateStereoisomers function accomplishes this through the ‘tryEmbedding’ flag: https://github.com/rdkit/rdkit/blob/d20e5cadc81bf6c7b4e590124866f178f2f2fe28/rdkit/Chem/EnumerateStereoisomers.py#L8 It attempts to generate a 3D conformer for the given stereo config

Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-31 Thread Brian Cole
I would second the suggestion of continuing to push forward a JSON format that natively supports multiple conformers. I've never seen automatic recombination of an SDF work 100% of the time; it's fraught with corner cases. It's also abysmally slow and takes a huge amount of disk space. -Bruce >

Re: [Rdkit-discuss] https://en.wikipedia.org/wiki/Hansen_solubility_parameter

2016-12-08 Thread Brian Cole
Hi Dr. Guillaume, I played around with the ability to map a set of fragments to molecules a couple of months ago. The results of my experiments are here: https://github.com/coleb/fragment_mapper You give it a set of molecules and fragments you would like to have mapped. It tries to find the smallest

[Rdkit-discuss] Handling SDF with 'aromatic' bonds?

2016-12-08 Thread Brian Cole
Any advice on getting RDKit to read in SDF files that use bond order '4' to mark bonds as aromatic and don't have explicit hydrogens? For example, imagine two fused heterocycles where the hydrogen isn't really known. I have SDF files that just mark the bond orders as '4', aromatic, and don't even tr

Re: [Rdkit-discuss] Generating all stereochem possibilities from smile

2016-12-09 Thread Brian Cole
This has me quite curious now, how do we detect unspecified bond stereochemistry in RDKit?

m = Chem.MolFromSmiles("FC=CF")
assert m.HasProp("_StereochemDone")
for bond in m.GetBonds():
    print(bond.GetBondDir(), bond.GetStereo())

Yields:

(rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.Bond

Re: [Rdkit-discuss] Generating all stereochem possibilities from smile

2016-12-09 Thread Brian Cole
't expose an "easy" way to do this. What is the trickiness and dangerousness of this API? And could we make an easy way to enumerate bond stereo? Thanks! On Fri, Dec 9, 2016 at 5:44 PM, Brian Cole wrote: > This has me quite curious now, how do we detect unspecified

Re: [Rdkit-discuss] Bug in AllChem.EmbedMultipleConfs pruning?

2016-12-22 Thread Brian Cole
RMSD with automorphism symmetries that include hydrogens is crazy expensive to calculate. Symmetry should be on by default, but without hydrogens. Would even love to see the RMSD automorphism symmetry code ignore trifluoro-type groups too, as they dramatically increase the cost of the computation with little

[Rdkit-discuss] Preserving hydrogens necessary for imine cis/trans stereochemistry?

2017-05-17 Thread Brian Cole
Is there a recommended way in RDKit to preserve hydrogens necessary for representing cis/trans stereochemistry of imines? For example, given the attached SDF I need to maintain explicit hydrogens in the output SMILES string to maintain the imine cis/trans stereochemistry. mol = Chem.ForwardSDMol

Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread Brian Cole
You can use Chem.CanonicalRankAtoms to de-duplicate the SMARTS matches based upon the atom symmetry like this:

def count_unique_substructures(smiles, smarts):
    mol = Chem.MolFromSmiles(smiles)
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    pattern = Chem.MolFromSmarts(smart
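The de-duplication idea can be sketched without RDKit. This is a minimal sketch, not the truncated function above: it assumes you already have the match tuples (as a substructure search would return them) and a per-atom rank list in which symmetry-equivalent atoms share a rank (as Chem.CanonicalRankAtoms with breakTies=False is described as producing). The helper name and the example ranks and matches are hypothetical and hand-written for illustration.

```python
def unique_matches_by_rank(matches, ranks):
    """Keep one match per set of symmetry classes.

    matches: iterable of tuples of atom indices.
    ranks:   list where ranks[i] is the symmetry rank of atom i
             (equivalent atoms share the same rank).
    """
    seen = set()
    unique = []
    for match in matches:
        # Two matches are treated as duplicates when they cover the same
        # multiset of symmetry ranks (an assumption of this sketch).
        key = tuple(sorted(ranks[idx] for idx in match))
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Hypothetical data: atoms 1 and 3 are symmetry-equivalent (tied rank 1).
example_ranks = [0, 1, 2, 1]
example_matches = [(0, 1), (0, 3), (2, 1)]
deduplicated = unique_matches_by_rank(example_matches, example_ranks)
print(len(deduplicated), deduplicated)
```

Sorting the ranks treats a match as an unordered set of atoms; if the order in which pattern atoms map matters for your use case, drop the sorted() call.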

[Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-08 Thread Brian Cole
Hi Cheminformaticians, This is an extreme subtlety in the interpretation of SMILES atom stereochemistry and I think a bug in RDKit. Specifically, I think the following SMILES should be the same molecule:

>>> rdkit.__version__
'2017.09.1'
>>> Chem.CanonSmiles('F[C@@]1(C)CCO1')
'C[C@]1(F)CCO1'
>>>

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Brian Cole
Here's an example of why this is useful for maintaining molecular fragmentation inside your molecular representation:

>>> from rdkit import Chem
>>> smiles = 'F9.[C@]91(C)CCO1'
>>> fluorine, core = smiles.split('.')
>>> fluorine
'F9'
>>> fragment = core.replace('9', '([*:9])')
>>> fragment
'[C@]([*
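The string manipulation in the session above can be sketched on its own, without RDKit. This is only a sketch of the text-level step: the function name is hypothetical, it assumes exactly two dot-separated components, and it says nothing about how RDKit then interprets the stereochemistry of the result.

```python
def split_on_closure(smiles, digit="9"):
    """Split a two-component SMILES and turn the shared ring-closure
    digit in the second component into a mapped attachment point."""
    piece, core = smiles.split(".")
    # Naive textual replacement: it would also rewrite any other
    # occurrence of the digit (e.g. in an isotope or a %NN closure).
    fragment = core.replace(digit, "([*:%s])" % digit)
    return piece, fragment


piece, fragment = split_on_closure("F9.[C@]91(C)CCO1")
print(piece)
print(fragment)
```

Because str.replace is purely textual, a closure digit that does not otherwise occur in the SMILES is the safer choice.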

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Brian Cole
> Somehow you got the code to generate a "9" for that ring closure, which is
> not something that RDKit does naturally, so we are only seeing a step in
> the larger part of your goal.

Certainly, but thousands of lines of Python don't fit in an email in an easily digestible way. :-)

> Since

[Rdkit-discuss] conda build instructions for OSX?

2017-12-27 Thread Brian Cole
Trying to 'conda build rdkit' as described in the https://github.com/rdkit/conda-rdkit README, with no success. Are there any OSX 'conda build' instructions tucked away somewhere? It's currently failing on the cairo dependency: -- Checking for one of the modules 'cairo' CMake Error at /Users/coleb/a

Re: [Rdkit-discuss] conda build instructions for OSX?

2018-01-02 Thread Brian Cole
' works. Now the next trick I'm still stuck on is how to build RDKit's master branch using conda. Changing `git_rev` in rdkit/meta.yaml didn't have the desired effect. -Brian On Wed, Dec 27, 2017 at 5:08 PM, Brian Cole wrote: > Trying to 'conda build rdkit' as descr

Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-16 Thread Brian Cole
+1 to the MolVS project as well. Perhaps an easy bite-size project is to incorporate the open source mae parser code into core RDKit: https://github.com/schrodinger/maeparser On Mon, Jan 15, 2018 at 9:08 PM, Francois BERENGER < beren...@bioreg.kyushu-u.ac.jp> wrote: > On 01/16/2018 05:51 AM, Ti

Re: [Rdkit-discuss] Calculating the MOE vsa_acc descriptor using the rdkit (or other Open Source software)?

2018-02-19 Thread Brian Cole
Hi Richard, You can calculate the per-atom contributions to the surface area with _CalcLabuteASAContribs: http://www.rdkit.org/Python_Docs/rdkit.Chem.rdMolDescriptors-module.html#_CalcLabuteASAContribs If you have the MOE SMARTS for "pure hydrogen bond acceptors", the following is the Python I cu

Re: [Rdkit-discuss] Interest in a RDkit UGM in the USA midwest?

2018-04-10 Thread Brian Cole
I would be interested, but I'm not sure we would have as large a draw in the Midwest as we would in Cambridge, MA. One idea would be to schedule it around the SciPy Conference: https://scipy2018.scipy.org/ehome/index.php?eventid=299527& Was thinking about checking that out this year. -Brian

Re: [Rdkit-discuss] seg fault when importing Chem on OS-X 10.12

2018-04-16 Thread Brian Cole
An issue like this was fixed in the past: https://github.com/rdkit/rdkit/commit/009dd580527caa662de8bac5ad0c60f1e9bc90cd Will see if I can reproduce this. -Brian On Mon, Apr 16, 2018 at 12:09 PM, Patrick Walters wrote: > Hi All, > > I installed the latest RDKit using conda > > conda create -c

Re: [Rdkit-discuss] seg fault when importing Chem on OS-X 10.12

2018-04-16 Thread Brian Cole
96 frame #5: 0x00011301 python`main + 497 frame #6: 0x7fff5fe23015 libdyld.dylib`start + 1 frame #7: 0x7fff5fe23015 libdyld.dylib`start + 1 (lldb) info threads On Mon, Apr 16, 2018 at 1:11 PM, Brian Cole wrote: > An issue like this was fixed in the past: https://github.

Re: [Rdkit-discuss] seg fault when importing Chem on OS-X 10.12

2018-04-16 Thread Brian Cole
2017 working. -Brian On Mon, Apr 16, 2018 at 1:20 PM, Brian Cole wrote: > I can reproduce the problem, and the issue does appear to be different > than the previous issue. Reproducible with the following on OSX: > > $ conda create -c rdkit -n rdkit_2017 rdkit python=3.5 >

[Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread Brian Cole
Hi Chem-informaticians: I know it has been talked about in the community that fingerprints are not a way to obfuscate molecules for security, but I don't recall a paper actually demonstrating the reverse engineering of a fingerprint into a chemical structure. Does anyone know if such a paper exist

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-23 Thread Brian Cole
Thanks Andrew, very interesting and useful script! Unfortunately it doesn't work on circular/ECFP-like fingerprints. It has the requirement that the fingerprint be a substructure fingerprint as you described. It seems the evolutionary/genetic algorithm approach is the current state-of-the-art for

[Rdkit-discuss] RDKit Postgres Cartridge Parallel Queries?

2018-05-31 Thread Brian Cole
It appears that Postgres 9.6+ now supports parallel queries to accelerate slow queries: https://www.postgresql.org/docs/10/static/parallel-query.html Has anyone successfully got this to accelerate substructure queries with the RDKit Postgres cartridge? Thanks, Brian

Re: [Rdkit-discuss] RDKit Postgres Cartridge Parallel Queries?

2018-06-01 Thread Brian Cole
hings seemed fine. > The problem (and it's a sizable one) is that parallel queries don't use > the index. Until parallel scans using GIST indices work, I don't think this > is really going to help much. > > -greg > > > On Fri, Jun 1, 2018 at 12:04 AM Brian Cole wro

Re: [Rdkit-discuss] RDKit Postgres Cartridge Parallel Queries?

2018-06-01 Thread Brian Cole
un 1, 2018 at 10:07 AM, Greg Landrum wrote: > I think they should. Does a ::mol query on the same table parallelize? If > it does but a ::qmol query does not maybe I forgot something in the SQL > function definitions > > On Fri, 1 Jun 2018 at 15:43, Brian Cole wrote: > >

Re: [Rdkit-discuss] Chemical Formula to SMILES

2018-08-12 Thread Brian Cole
While Dr. Guillaume is correct, there are some ways to find known molecules given the formula by hacking InChI strings. For example just google search the formula with the InChI prefix, e.g., InChI=1S/C16H14O10. https://www.google.com/search?safe=off&rlz=1C5CHFA_enUS700US700&ei=4ltwW5yzLYvBjwS99L
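The lookup trick described above amounts to building a search string from the formula. A minimal sketch, assuming the formula is already written the way InChI writes its formula layer (Hill order); the function names are hypothetical, and the URL-encoding step uses only the standard library.

```python
from urllib.parse import quote_plus


def inchi_formula_prefix(formula):
    """Return the start of a standard InChI for a molecular formula,
    e.g. 'C16H14O10' -> 'InChI=1S/C16H14O10'."""
    return "InChI=1S/" + formula


def encoded_query(formula):
    """URL-encode the prefix so it can be passed as a query parameter
    to whichever search engine you use."""
    return quote_plus(inchi_formula_prefix(formula))


print(inchi_formula_prefix("C16H14O10"))
print(encoded_query("C16H14O10"))
```

A formula maps to many isomers, so the search returns candidates that share the formula layer rather than a single structure.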

Re: [Rdkit-discuss] descriptors beyond rotatable bond count and possible correlations with entropy

2018-09-01 Thread Brian Cole
Little late to the party, but here is an RDKit implementation of a contiguous rotatable bond count I wrote a while ago: https://gist.github.com/coleb/4737a1dc77b5f5f8a7bbe4b23f39f2c4 Doesn't return the actual bonds like Paolo's does. But it does take into account amides, triple bonds, and terminal
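To make the notion of a contiguous count concrete, here is a small pure-Python sketch. It is not the code from the gist: it assumes the rotatable bonds have already been identified (by whatever rules you choose for amides, triple bonds and terminal groups) and it assumes that "contiguous" means bonds chained together through shared atoms. The function name and example bonds are hypothetical.

```python
def largest_contiguous_group(rotatable_bonds):
    """Size of the largest set of rotatable bonds connected through
    shared atoms.

    rotatable_bonds: iterable of (atom_i, atom_j) index pairs.
    """
    bonds = [tuple(bond) for bond in rotatable_bonds]
    unvisited = set(range(len(bonds)))
    best = 0
    while unvisited:
        # Depth-first walk over bonds that share an atom.
        stack = [unvisited.pop()]
        size = 0
        while stack:
            current = stack.pop()
            size += 1
            touching = [i for i in unvisited
                        if set(bonds[i]) & set(bonds[current])]
            for i in touching:
                unvisited.remove(i)
                stack.append(i)
        best = max(best, size)
    return best


# Hypothetical input: a chain of three rotatable bonds plus an isolated one.
print(largest_contiguous_group([(0, 1), (1, 2), (2, 3), (5, 6)]))
```

Under this definition, branching bonds that meet at one atom count as a single group; a definition based on the longest linear path would need a different traversal.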

[Rdkit-discuss] Do reactions need a useChirality flag?

2018-09-27 Thread Brian Cole
I'm trying to get a reaction SMARTS pattern to ignore chiral atoms and it doesn't appear straightforward. First, it appears RDKit doesn't support '!@' to indicate a non-chiral specified atom. I have to wrap this in a recursive SMARTS to get it to work. For example: In [2]: mol = Chem.MolFromSmiles

[Rdkit-discuss] Docs intentionally broken?

2018-11-05 Thread Brian Cole
My google search for 'rdkit python point3d' yielded the following as the top result: https://rdkit.org/docs/api/rdkit.Geometry.rdGeometry-module.html Which unfortunately now has a 404, page not found. Was this an intentional reorganization of the documentation? -Brian

Re: [Rdkit-discuss] Double Bond Stereochemistry in the RDKit

2018-12-04 Thread Brian Cole
Hi Kovas, For your use-case #2 should suffice, "set STEREOCIS/STEREOTRANS tags + manually set stereo atoms". This is what the EnumerateStereoisomers code does: https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/EnumerateStereoisomers.py#L38 As to what is the 'ground truth', that is a more diff

Re: [Rdkit-discuss] Error parsing a MUTAG smiles

2020-03-04 Thread Brian Cole
Note, the location of the first opening parenthesis is different:

>>> 'c1ccc2=NC3=CC(=CC=C3=c2c1)[N+](=O)[O-]'.find('(')
13
>>> 'c1ccc2=NC3=CC=C(C=C3=c2c1)[N+](=O)[O-]'.find('(')
15

So the SMILES are syntactically correct to represent 2 and 3 nitrocarbazole, though semantically weird as they're a
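The same comparison can be made a little more general by locating the first character at which two SMILES strings diverge. A minimal sketch using only string operations; the helper name is hypothetical, and it compares text only, not chemistry.

```python
def first_difference(a, b):
    """Index of the first position where two strings differ,
    or None if they are identical."""
    for index, (char_a, char_b) in enumerate(zip(a, b)):
        if char_a != char_b:
            return index
    if len(a) == len(b):
        return None
    # One string is a prefix of the other.
    return min(len(a), len(b))


smiles_a = 'c1ccc2=NC3=CC(=CC=C3=c2c1)[N+](=O)[O-]'
smiles_b = 'c1ccc2=NC3=CC=C(C=C3=c2c1)[N+](=O)[O-]'

print(smiles_a.find('('), smiles_b.find('('))
print(first_difference(smiles_a, smiles_b))
```

The two strings agree up to index 12 and diverge at index 13, where the first one opens its branch.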