Re: [Rdkit-discuss] AdditionalOutput from FingerprintGenerator

2020-03-17 Thread Chris Earnshaw
A quick comment on the cosine metric. Unlike Tanimoto it obeys the triangle inequality, so in cases where it's used essentially as a distance metric (e.g. some clustering applications) the results are probably more mathematically correct. I used it a lot in that context. Whether it makes any real d
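For readers comparing the two measures, here is a minimal sketch of how Tanimoto and cosine similarity can be computed for binary fingerprints. It assumes each fingerprint is represented as a plain Python set of on-bit indices; the three sets below are made up for illustration and do not come from the thread. How either similarity is turned into a distance for clustering is a separate choice that is not shown here.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(a, b):
    """Cosine similarity of two binary fingerprints given as sets of on-bits."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical on-bit sets standing in for real fingerprints.
fp1 = {1, 5, 9, 12}
fp2 = {1, 5, 7}
fp3 = {5, 7, 20, 31}

for name, x, y in (("fp1/fp2", fp1, fp2), ("fp2/fp3", fp2, fp3), ("fp1/fp3", fp1, fp3)):
    print(name, round(tanimoto(x, y), 3), round(cosine(x, y), 3))
```

Both functions use the same intersection count and differ only in the normalisation, which is why they usually rank neighbours similarly but not identically.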

Re: [Rdkit-discuss] SMILES/SMARTS codes that match multiple atoms

2020-02-09 Thread Chris Earnshaw
Sorry - tried to type this too early in the morning and introduced some errors transcribing the SMARTS pattern! It should have been "[CH](=O)O[$([CH3]),$([CH2]C)]" as in pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]") Best regards, Chris On Sun, 9 Feb 2020 at 0
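A quick way to see what the corrected pattern does is to run it against a few simple formate esters. This is only a sketch: the test SMILES are chosen here for illustration and are not the molecules from the original question.

```python
from rdkit import Chem

# Pattern as given in the post: a formyl carbon (one H) attached via O to
# either a methyl group or a CH2 that carries a further carbon.
pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

# Illustrative test molecules (assumed, not from the thread).
tests = {
    "methyl formate": "O=COC",
    "ethyl formate": "O=COCC",
    "isopropyl formate": "O=COC(C)C",
}

results = {}
for name, smi in tests.items():
    mol = Chem.MolFromSmiles(smi)
    results[name] = mol.HasSubstructMatch(pat1)
    print(name, results[name])
```

Because the hydrogen counts sit inside the atom primitives, the match does not depend on whether hydrogens are explicit in the molecule.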

Re: [Rdkit-discuss] SMILES/SMARTS codes that match multiple atoms

2020-02-09 Thread Chris Earnshaw
Hi I've always regarded it as dangerous to rely on the use of explicit hydrogens in search queries and pattern matches. I think it's generally safer to use H-count properties in your SMARTS. In your example case this will require the use of recursive SMARTS to allow matching of the CH3 and CH2Cn f

Re: [Rdkit-discuss] Smarts notation for aromatic regioisomers

2020-01-29 Thread Chris Earnshaw
Hi Dot-disconnected fragments are not going to work for this, as you describe. You need to use recursive SMARTS (see https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html section 4.4). Something like: Clc[$(cBr);$(ccBr);$(cccBr)] should (I hope!) be a reasonable starting point. Chris

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Chris Earnshaw
e it seems like it could be fairly easy). > > On Tue, Oct 23, 2018 at 6:13 PM Chris Earnshaw > wrote: > >> >> Following this analysis means you don't need to consider the resonance >> form: >> A carbonyl or imine (open chain or in a partially saturated

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Chris Earnshaw
Hi I think my approach to this is - Is there a resonance form in which the ring in question is unequivocally aromatic and the separated charge ends up somewhere sensible? The 'electron stealing' concept is a sort of handy shortcut for this. For Greg's examples, I'd say: [image: image.png] I'm not

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Chris Earnshaw
Mea culpa - I hit Reply rather than Reply All and so only sent this to Greg... On Tue, 23 Oct 2018 at 13:53, Chris Earnshaw wrote: > Hi Greg > > Apologies again, I'm not trying to stir things up here. As we can see from > some of the other discussion there's

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Chris Earnshaw
yclic bond is allowed to steal electrons, but that may be better than what's happening here. Apologies for the dissent! Chris Earnshaw On Tue, 23 Oct 2018 at 11:57, Greg Landrum wrote: > The current implementation requires "exocyclic" bonds to actually be > *non-ring* b

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Chris Earnshaw
s probably over optimistic... Regards, Chris Earnshaw On Wed, 10 Oct 2018 at 13:16, Michal Krompiec wrote: > Hi Thomas, > Radius 2, 2048 bits, 5200 data points. > > On Wed, 10 Oct 2018 at 13:13, Thomas Evangelidis > wrote: > >> What's your bitvector length and radi

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-27 Thread Chris Earnshaw
optimised code to make it run fast enough to be useful. Regards, Chris Earnshaw On Thu, 27 Sep 2018 at 02:36, Francois Berenger wrote: > On 21/09/2018 16:53, Chris Earnshaw wrote: > > Hi > > > > I'm afraid I can't help with an RDkit solution to your question, but

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-21 Thread Chris Earnshaw
l for this kind of analysis. It's better to use an alternative which does obey the triangle inequality - e.g. the Cosine metric. Regards, Chris Earnshaw On Thu, 20 Sep 2018 at 21:55, James T. Metz via Rdkit-discuss < rdkit-discuss@lists.sourceforge.net> wrote: > RDkit Discussion Gro

Re: [Rdkit-discuss] enumeration of smiles question

2018-08-06 Thread Chris Earnshaw
Hi The question 'what do you mean by ALL?' springs to mind. None of the discussion includes dot-disconnected SMILES, which are also perfectly valid representations. For example C(C1C2)C.C12 is yet another SMILES (of many possible) for the example structure. I've no idea whether this is of any rel

Re: [Rdkit-discuss] Problem getting valence

2018-07-26 Thread Chris Earnshaw
Hi It looks to me like N5 [nH:5] also has a problem. This has 3 connections to heavy atoms, is specified to have a hydrogen attached, but has no charge. This may not have triggered an error but it looks wrong, especially in this structure. Surely this atom should just be [n:5] ? Best regards, Chr

Re: [Rdkit-discuss] Error if run Draw in Python

2018-07-11 Thread Chris Earnshaw
Hi I'm no Python expert, but I think the problem is that Python doesn't (by default) do shell-style filename expansion. As a result it doesn't understand the significance of the ~ character in your directory path and tries to interpret it literally. The simple solution is to just give a path that can be interp
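As an alternative to typing out the full path, the standard library can expand a leading ~ explicitly. A minimal sketch, in which the file name mol.png and the helper name resolve_path are made up for illustration:

```python
import os


def resolve_path(path):
    """Expand a leading ~ to the user's home directory; other paths pass through unchanged."""
    return os.path.expanduser(path)


out_file = resolve_path("~/mol.png")  # hypothetical output file name
print(out_file)
```

The expanded string can then be handed to whichever function writes the image.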

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Chris Earnshaw
e. No contractual relationship is created by this message by any > person unless specifically indicated by agreement in writing other than > email. > Monitoring: MedChemica Limited retains and monitors all email traffic data > and content for the purposes of the prevention and detect

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Chris Earnshaw
I'd say that using RDKit to calculate the numbers of heavy atoms is significantly more robust than a purely lexical approach - and it's easy to implement. It's also dangerous to just discard the smallest fragment. Years ago I worked on a project where the active molecule had only 11 heavy atoms an
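A sketch of the heavy-atom approach might look like the following. The sodium acetate SMILES is an assumed example, and what to do with the counts (keep the largest fragment, apply a size threshold, flag for review) is left to the caller, given the warning above about discarding small fragments blindly.

```python
from rdkit import Chem

# Assumed example: sodium acetate written as a dot-disconnected SMILES.
mol = Chem.MolFromSmiles("CC(=O)[O-].[Na+]")

# Split into separate molecule objects, one per disconnected fragment.
frags = Chem.GetMolFrags(mol, asMols=True)

# Heavy-atom count per fragment, computed from the parsed structure
# rather than from the SMILES text.
counts = [frag.GetNumHeavyAtoms() for frag in frags]
largest = max(frags, key=lambda frag: frag.GetNumHeavyAtoms())

for frag, n in zip(frags, counts):
    print(Chem.MolToSmiles(frag), n)
```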

Re: [Rdkit-discuss] MolFromMol2Block changes carboxylic group representation

2018-03-28 Thread Chris Earnshaw
Hi Maria I would say that the behaviour of RDKit with your MOL2 file is right. SMILES notation doesn't have a way to represent the delocalised form of a carboxylate anion, so the O=C[O-] form is the correct SMILES for this structure. RDKit does a good job in recognising that it's the anion based

Re: [Rdkit-discuss] generate conformes with a restrained core

2018-03-24 Thread Chris Earnshaw
Hi Felipe You're doing something similar to the problem Paolo addressed. ConstrainedEmbed (see http://www.rdkit.org/Python_Docs/rdkit.Chem.AllChem-module.html#ConstrainedEmbed) requires a mol object as the first parameter, but you are passing it an integer cid value, not a molecule. Your code mus

Re: [Rdkit-discuss] remove salts + neutralize

2018-03-14 Thread Chris Earnshaw
The minuses are right. These are the single bonds between the individual aromatic rings and this representation is strictly correct. The OpenBabel representation doesn't mark these bonds as explicitly single and, as they're between two aromatic atoms, the bond type could be inferred to be aromatic.

Re: [Rdkit-discuss] changing atomic charges with ReactionFromSmarts

2018-01-25 Thread Chris Earnshaw
Hi Jan Your code doesn't change the charges because the reaction SMARTS doesn't tell it to. If you say - rxn_smarts = ['[N+:1]=[*:2]-[O-:3]>>[N+0:1]-[*:2]=[O+0:3]'] - the charges in the product are explicitly defined and you should get the result you expect. Best regards, Chris On 25 January 2
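To illustrate, here is a minimal sketch that runs the reaction SMARTS from the post on a charge-separated amide. The input SMILES is an assumption chosen for illustration; it is not the molecule from the original question.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Reaction SMARTS as given in the post: charges are reset explicitly
# on the product side with +0.
rxn = AllChem.ReactionFromSmarts("[N+:1]=[*:2]-[O-:3]>>[N+0:1]-[*:2]=[O+0:3]")

# Assumed input: a zwitterionic form of N-methylacetamide.
mol = Chem.MolFromSmiles("C[NH+]=C(C)[O-]")

products = rxn.RunReactants((mol,))
prod = products[0][0]

# Products come back unsanitized, so sanitize before further use.
Chem.SanitizeMol(prod)

charges = [atom.GetFormalCharge() for atom in prod.GetAtoms()]
print(Chem.MolToSmiles(prod), charges)
```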

Re: [Rdkit-discuss] edge matrix

2018-01-17 Thread Chris Earnshaw
I don't think there's a way to do this using RDKit itself, but it appears to be straightforward using Python with numpy and networkx, e.g. import numpy as np import networkx as nx a = np.matrix([[0, 1, 0, 0, 0],[1, 0, 1, 1, 0],[0, 1, 0, 0, 0],[0, 1, 0, 0, 1],[0, 0, 0, 1, 0]]) b = nx.from_numpy_mat
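For comparison, the same kind of result can be sketched without numpy or networkx. This assumes that "edge matrix" means an edge-adjacency matrix, in which two bonds count as adjacent when they share an atom; that reading is an assumption, as is the variable naming. The adjacency matrix is the one from the post.

```python
# Atom adjacency matrix from the post.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

n = len(adjacency)

# Each bond once, as an (i, j) pair with i < j.
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adjacency[i][j]]

# Two distinct edges are adjacent if they have an atom in common.
edge_matrix = [
    [1 if p != q and set(edges[p]) & set(edges[q]) else 0 for q in range(len(edges))]
    for p in range(len(edges))
]

print(edges)
for row in edge_matrix:
    print(row)
```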

Re: [Rdkit-discuss] Having trouble getting RDKIT to recognize LiAsF6

2017-11-25 Thread Chris Earnshaw
regards, Chris On 24 November 2017 at 21:31, Yoolhee Kim wrote: > Chris, > > Thank you very much for your reply! I'm not very familiar with RDKIT, and > I was wondering if you could elaborate how to fix the problem of adding '7' > in the As entry so that the valence

Re: [Rdkit-discuss] Hypervalent 2nd row element (and higher) representation / sanitization

2017-11-23 Thread Chris Earnshaw
se, but it would be worth > incorporating them into any solution / test set? > > Yours, > > Steve > > On Thu, Nov 23, 2017 at 5:27 PM, Chris Earnshaw > wrote: > >> Following a recent brief discussion about hypervalent halogen salt >> handling in RDKit (chlorates, p

[Rdkit-discuss] Hypervalent 2nd row element (and higher) representation / sanitization

2017-11-23 Thread Chris Earnshaw
ith the new MolOps.cpp: - testMMFFForceField (does some checks on dative bond forms which presumably now get converted) - graphmolMolOpsTest (builds perchlorates etc. and expects the result to be in dative bond form) - pythonTestDirChem (not sure what's wrong with this one - I can't fi

Re: [Rdkit-discuss] Hypervalent halogen structures - chlorate etc.

2017-11-22 Thread Chris Earnshaw
(arguable!), it can cause >> problems of compatibility with other software and looks remarkably >> ugly. It's also inconsistent with the handling of hypervalent P and S >> compounds. Using the same convention, we

Re: [Rdkit-discuss] Hypervalent halogen structures - chlorate etc.

2017-11-21 Thread Chris Earnshaw
lorate [O-][Cl2+]([O+])[O-] > perchlorate [O-][Cl3+]([O-])([O+])[O-] > > it looks wrong to me as there is an overall formal charge of +1. All O's > should bear a -1 charge. > > Cheers, > p. > > > > On 11/21/17 09:12, Chris Earnshaw wrote: >> >> Hi

[Rdkit-discuss] Hypervalent halogen structures - chlorate etc.

2017-11-21 Thread Chris Earnshaw
I really don't want this to happen! Does anyone know a way to restore the old behaviour for chlorites, bromates, periodates etc.? Best regards, Chris Earnshaw -- Check out the vibrant tech community on one of the wo

Re: [Rdkit-discuss] Having trouble getting RDKIT to recognize LiAsF6

2017-11-21 Thread Chris Earnshaw
y iodine has > values > 1, so by default it's not possible to construct e.g. > chlorates, or bromates, and no perhalates are allowed. > > Regards, > Chris Earnshaw > > On 20 November 2017 at 23:03, Yoolhee Kim wrote: >> Hello, >> >> I'm trying

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Chris Earnshaw
Trouble is, you're mixing chemical operations and lexical ones. It might be handy if this 'just worked' but in practice it's not going to produce valid SMILES without more work. I've written code in the past to do this kind of thing for virtual library building, using dummy atoms to mark link posi

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-08 Thread Chris Earnshaw
ersonally I wouldn't change the behaviour - or get RDKit to issue a warning that the SMILES isn't 'strict' in these cases. I think the safest approach is to stick to SMILES which are unequivocally valid, unless RDKit is going to create its own definition of SMILES... Best regards,

Re: [Rdkit-discuss] Reaction changing bonds but not charges

2017-10-09 Thread Chris Earnshaw
result you want - [#8-:2]-[#7+:1]=[O:3]>>[O+0:2]=[N+0:1]=[O:3] Chris Earnshaw On 9 October 2017 at 15:57, Chris Murphy wrote: > Hi, > > I am using rdChemReactions to perform substructure transformations as > defined by configurable reaction smarts. When I create the reaction and run

Re: [Rdkit-discuss] nitrogen valence issues

2017-10-07 Thread Chris Earnshaw
ozen problem cases out of 1.5 million compounds, I > just removed them from my main file and downloaded the mol files from > chembl and double check the structures. > > Bran > > -----Original Message- > From: Chris Earnshaw [mailto:cgearns...@g

Re: [Rdkit-discuss] nitrogen valence issues

2017-10-05 Thread Chris Earnshaw
d these to 5 and 3 respectively to make the correct charge > states, however, that did not resolve the issue. Perhaps the bonding info > is also incorrect. The file is on a remote server so I will repost with > attachment if I continue to have problems. > > Brian > > > _

Re: [Rdkit-discuss] nitrogen valence issues

2017-10-05 Thread Chris Earnshaw
Hi Be aware that there is a problem with one of the azide groups in CHEMBL592333 - in SMILES it's '-N=[NH+]-[NH-]' rather than '-N=[N+]=[N-]'. This doesn't render the structure chemically invalid but it's probably wrong. What's the provenance of your SD file? It isn't the same as a fresh downlo

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread Chris Earnshaw
Hi It amounts to the same thing - either do all tests on one atom, or one test on all atoms. The syntax is shorter for the latter if you can use the vector bindings but may not be otherwise, especially if multiple exclusions are needed. Regards, Chris Earnshaw On 24 Sep 2017 16:54, "

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread Chris Earnshaw
RTS patterns. You'll have to think about checking which atoms have been matched - for example, do you want to match quinoline because it contains a benzene ring, or exclude it because it contains a pyridine? If the former you'll have to check that the atoms matched by your two patterns are d
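Checking which atoms have been matched comes down to comparing the atom-index tuples returned for each pattern. A minimal sketch, assuming the matches are tuples of atom indices (as substructure searches typically return); the index values and the helper name non_overlapping are invented for illustration.

```python
def non_overlapping(matches_a, matches_b):
    """Return (a, b) pairs of matches that have no atom index in common."""
    return [(a, b) for a in matches_a for b in matches_b if not set(a) & set(b)]


# Hypothetical atom-index tuples for two ring patterns in one molecule.
# In a fused system the two rings share two atoms (here 4 and 5).
ring_a_hits = [(0, 1, 2, 3, 4, 5)]
ring_b_hits_fused = [(4, 5, 6, 7, 8, 9)]
ring_b_hits_separate = [(10, 11, 12, 13, 14, 15)]

print(non_overlapping(ring_a_hits, ring_b_hits_fused))     # overlapping rings
print(non_overlapping(ring_a_hits, ring_b_hits_separate))  # independent rings
```

Whether an overlap should count as a hit or an exclusion is the design decision raised in the post; the helper only reports which pairs are independent.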

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread Chris Earnshaw
-ring aromatic pattern a:1:a:a:a:a:a:1, with recursive SMARTS applied to the first atom to ensure that this can't match any of the 6 ring atoms in your undesired system. Regards, Chris Earnshaw On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss wrote: > Hello, > > Supp

Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR

2017-09-19 Thread Chris Earnshaw
ives ((0, 1), (2, 3)) as required, but if you have a specific need for the 'single SMARTS' approach that's not much use. Sorry not to be more helpful... Chris Earnshaw On 19 September 2017 at 16:50, James T. Metz wrote: > Chris, > > Thank you for your interesting su

Re: [Rdkit-discuss] Returning Z-matrix coordinates for a molecule in rdkit?

2017-09-19 Thread Chris Earnshaw
Hi Open Babel will convert a wide range of structure formats and can produce at least a couple of different flavours of Z-matrix, including MOPAC and Gaussian. I'm not aware of any way to get a Z-matrix directly from RDKit (but would be happy to find out I'm wrong). Regards, Chris Ea

Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR

2017-09-19 Thread Chris Earnshaw
Hi Will the recursive SMARTS [$(C-C),$(N-N)] not do the job? I'd parse this in English as 'an atom which is EITHER an aliphatic carbon singly bonded to an aliphatic carbon OR an aliphatic nitrogen singly bonded to an aliphatic nitrogen'. Regards, Chris Earnshaw On 19 Septembe
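A small sketch of what this pattern returns in practice. The test SMILES are assumptions chosen for illustration. Note that a recursive SMARTS matches a single atom, so each hit is a one-element tuple rather than a bonded pair.

```python
from rdkit import Chem

# Recursive SMARTS from the post: an atom that is EITHER a C bonded to C
# OR an N bonded to N.
patt = Chem.MolFromSmarts("[$(C-C),$(N-N)]")

# Assumed test molecules.
both = Chem.MolFromSmiles("CCNN")   # contains a C-C and an N-N bond
neither = Chem.MolFromSmiles("CN")  # contains neither

hits_both = both.GetSubstructMatches(patt)
hits_neither = neither.GetSubstructMatches(patt)

print(hits_both)
print(hits_neither)
```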

Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Chris Earnshaw
Hi The problem is due to RDKit perceiving the embedded pyranone in CHEMBL1999443 as an aromatic system, which is probably correct. However, in the structure of aspirin the carboxyl carbon and singly bonded oxygen are non-aromatic, so if you just use the SMILES of aspirin as a query it won't match

Re: [Rdkit-discuss] SMARTS substructure queries with SQL conjunctions

2017-03-21 Thread Chris Earnshaw
Hi Akos Very strange behaviour. I don't see anything wrong with your SQL syntax. I've tried equivalent searches in my 2.6M compound database and they give the expected results. I used iodine rather than gold, for which there are 19504 structures. Adding the qualifying SQL clauses singly and in com

Re: [Rdkit-discuss] mass replacement of External R-groups with many substituents

2017-03-16 Thread Chris Earnshaw
Hi Brian I'm by no means an expert in RDKit with Python, but until someone else comes along, here are a few thoughts. Your reaction SMARTS specifically defines aromatic carbons joined by single bonds which won't match an incoming benzene ring, and it's a bit redundant to specify that aromatic car

Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Chris Earnshaw
osterity but it does appear > to match the moe PMI's. > > > > On Tue, Jan 17, 2017 at 4:55 AM, Chris Earnshaw > wrote: > >> The new version looks good to me as far as I can test it. PMI and NPR are >> still fine, the radius of gyration is right (for an extremely ar

Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Chris Earnshaw
oordinates' to avoid confusion. Chris On 16 January 2017 at 09:30, Greg Landrum wrote: > > > On Mon, Jan 16, 2017 at 10:22 AM, Chris Earnshaw > wrote: > >> >> Either way, it makes it rather hard to trust their derivations generally >> - especially as there ap

Re: [Rdkit-discuss] PMI API

2017-01-16 Thread Chris Earnshaw
Is is impossible to say. Either way, it makes it rather hard to trust their derivations generally - especially as there appear to be other errors (e.g. the denominator in eq. 16 should be the square root of the given sum of squares, according to their reference). Best regards, Chris Dr Chris Ear

Re: [Rdkit-discuss] PMI API

2017-01-16 Thread Chris Earnshaw
the inertia matrix for benzene, however, >>> are definitely not zero (and not close enough that it's likely to be >>> round-off error). >>> It would be very nice if you could run the three files I mention through >>> Dragon and let me know what it calculates for

Re: [Rdkit-discuss] PMI API

2017-01-15 Thread Chris Earnshaw
k into this this weekend and I've found > a bug and something I don't understand. Hopefully the community can help > out here. > > On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw > wrote: > >> 4) The big one! The returned results look very odd. They appear to relate

Re: [Rdkit-discuss] PMI API

2017-01-13 Thread Chris Earnshaw
he functions individually, but >> the expensive calculation of the moments will only be done once, so it >> doesn't end up doing repeated work. >> >> And, finally, on the values themselves: I will have to take a look at >> that. >> -greg >> >> >> >>

Re: [Rdkit-discuss] PMI API

2017-01-08 Thread Chris Earnshaw
need any more information from me. Chris Earnshaw On 8 Jan 2017 18:17, "Brian Kelley" wrote: I think the relevant issue is that if you are using an existing build, we don't yet have the capability for you to know what was built and what was not. I.e. You need to add the compiler

Re: [Rdkit-discuss] PMI API

2017-01-08 Thread Chris Earnshaw
tidied it up (having just looked at it to get the > link above, I see there's a typo on the first sentence, for example!) and > sent in an interim Pull Request as for people starting out it might already > be of value. > > Cheers, > Dave > > On Sun, 8 Jan 2017 a

[Rdkit-discuss] PMI API

2017-01-08 Thread Chris Earnshaw
Hi A while ago I had a project which needed PMI descriptors (specifically NPR1 and NPR2) which were not available in the main branch of RDKit at the time. I used the fork by 'hahnda6' which provided the calcPMIDescriptors() function, and this worked well. Now that PMI descriptors are a
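For context, NPR1 and NPR2 are the two smaller principal moments of inertia divided by the largest one. The sketch below assumes the three principal moments have already been computed elsewhere; the helper name and the idealised input values are made up for illustration and are not RDKit output.

```python
def normalised_pmi_ratios(moments):
    """Return (NPR1, NPR2) = (I1/I3, I2/I3) for principal moments sorted so I1 <= I2 <= I3."""
    i1, i2, i3 = sorted(moments)
    if i3 <= 0:
        raise ValueError("largest principal moment must be positive")
    return i1 / i3, i2 / i3


# Idealised shapes (assumed values, arbitrary units).
print(normalised_pmi_ratios((0.0, 1.0, 1.0)))  # rod-like
print(normalised_pmi_ratios((1.0, 1.0, 2.0)))  # disc-like
print(normalised_pmi_ratios((1.0, 1.0, 1.0)))  # sphere-like
```

These three cases correspond to the corners of the usual NPR triangle plot, which makes them a handy sanity check on any PMI implementation.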