On Apr 16, 2018, at 16:29, Guillaume GODIN wrote:
> And for this one C[C@@]12CC[C@@](C)(CC1)O2O any idea
>
> Cause your tool failed too.
It's true that smiview failed, in the sense that it shouldn't have tried to do
further analysis with a molecule that RDKit rejected.
However, RDKit does rep
If you try this out with my smiview package, available from
https://bitbucket.org/dalke/smiview/downloads/ , it reports:
% smiview 'C\(C(C)C)=N/O'
Cannot parse --smiles: Unexpected term
C\(C(C)C)=N/O
^ Tokenizing stopped here
A bond must be followed by an atom, closure.
That is, the bond
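The sketch below makes the rule in that error message concrete. It is not smiview's code. It assumes a simplified tokenizer that handles only organic-subset atoms, bracket atoms, ring closures, bonds, branches and dots, and the names `TOKEN` and `first_bad_bond` are hypothetical. Ring-closure digits are accepted after a bond because SMILES allows a bond symbol before a closure, as in C=1CCCCC=1.

```python
import re

# Simplified SMILES tokenizer (an assumption, not the full OpenSMILES grammar):
# bracket atoms, organic-subset atoms, ring closures, bonds, branches, dots.
TOKEN = re.compile(r"""
    (?P<atom>\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*)
  | (?P<closure>%[0-9][0-9]|[0-9])
  | (?P<bond>[-=#$:/\\])
  | (?P<open>\()
  | (?P<close>\))
  | (?P<dot>\.)
""", re.VERBOSE)


def first_bad_bond(smiles):
    """Return the position of the first bond symbol that is not followed by
    an atom or a ring closure, or None if every bond is followed correctly."""
    pos = 0
    prev_kind = prev_pos = None
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError("cannot tokenize position %d of %r" % (pos, smiles))
        kind = match.lastgroup
        if prev_kind == "bond" and kind not in ("atom", "closure"):
            return prev_pos
        prev_kind, prev_pos = kind, pos
        pos = match.end()
    # A bond at the very end of the string has nothing after it.
    return prev_pos if prev_kind == "bond" else None


print(first_bad_bond(r"C\(C(C)C)=N/O"))  # 1: the "\" is followed by "("
print(first_bad_bond("CC(C)C=N/O"))      # None
```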
On Apr 16, 2018, at 05:37, Patrick Walters wrote:
>
> Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so I
> wrote something to work directly on a molecule.
That's the approach I started with, until I figured out that it doesn't
preserve chirality.
If I change the en
Hi Pat,
I wrote something like this for mmpdb, which is the MMPA code I helped
develop, at https://github.com/rdkit/mmpdb .
It has one restriction, which I'll get to in a moment.
The general idea is to convert the attachment points to closures, join them
with a ".", and canonicalize:
>>> fr
On Apr 7, 2018, at 07:13, Greg Landrum wrote:
> Andrew Dalke (Dalke Scientific) will offer a course on Python and the RDKit
I need to finalize what I'm going to cover. I've been going back and forth between two
approaches.
1) Python programming for cheminformatics
This is meant for someone
About 10 days ago I posted a prototype program called 'smiview', which displays
information about the structure of a SMILES string.
Thanks to feedback from a couple of users, and a deep urge to explore the idea,
I've just released smiview 1.2, available from
https://bitbucket.org/dalke/smiview/
Over the last few days I've developed a command-line tool that I call "smiview".
It's a SMILES viewer. It isn't a depiction tool where the input is in SMILES
but rather a tool to highlight different aspects of the SMILES string.
I'll put some examples at the end. If you want to try it out you ca
Hi all,
I've just released version 1.4 of chemfp, my cheminformatics fingerprint
toolkit. It has several new features and bug fixes, which you can read at:
http://chemfp.readthedocs.io/en/chemfp-1.4/#what-s-new-in-1-4
The new RDKit feature is the support for "fromAtoms" for those RDKit
fin
Hi Rajarshi,
Here's what RDKit says from the interactive shell:
>>> from rdkit import Chem
>>> Chem.MolFromSmiles("C1=CC=C(C=C1)[N]2=CC=CC3=C2C4=C(C=CC(=C4)C5=CC=CN=C5)N=C3")
[23:02:36] Explicit valence for atom # 6 N, 4, is greater than permitted
RDKit is pretty strict about accepting chemicall
Hi Wandré,
The easiest way to avoid recalculating the fingerprints is to keep the FPS
file around. The rdkit2fps program calculates the AtomPair fingerprint and
converts the resulting DataStructs fingerprint object into a hex-encoded
fingerprint, which is stored as text in the FPS file.
One
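Here is a rough illustration of what the FPS file stores. The sketch reads one fingerprint line (hex fingerprint, a tab, then the identifier) and computes a Tanimoto score directly from it. This is not chemfp's API. `parse_fps_line` and `tanimoto` are hypothetical helpers, and returning 0.0 for two empty fingerprints is a convention chosen for the sketch.

```python
def parse_fps_line(line):
    """Split an FPS data line into (fingerprint bytes, identifier).
    Assumes the line is '<hex fingerprint>\t<id>'; header lines starting
    with '#' should be skipped by the caller."""
    fields = line.rstrip("\n").split("\t")
    return bytes.fromhex(fields[0]), fields[1]


def popcount(data):
    """Number of 1 bits in a byte string."""
    return bin(int.from_bytes(data, "big")).count("1")


def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    both = bytes(a & b for a, b in zip(fp1, fp2))
    either = bytes(a | b for a, b in zip(fp1, fp2))
    union = popcount(either)
    return popcount(both) / union if union else 0.0


fp_a, id_a = parse_fps_line("0f00\tmol1\n")
fp_b, id_b = parse_fps_line("0300\tmol2\n")
print(id_a, id_b, tanimoto(fp_a, fp_b))  # mol1 mol2 0.5
```

Because the fingerprint is kept as hex text, it can be reloaded without recomputing anything. chemfp itself does the comparisons in compiled code. This pure-Python loop is only meant to show what the file contains.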
On Jan 11, 2018, at 12:04, Wandré wrote:
> Thanks for the link. It is very interesting. I will read very carefully.
> So, as input on ChemFP, I have to put a file with all molecules in 1 SDF?
Chemfp works with fingerprint files, in your case, chemfp's text-based "FPS"
format. You'll need to use
Hi Wandré,
You may want to look at chemfp for this sort of clustering.
Last year Chris Swain reviewed a few different ways to do clustering, at
https://www.macinchem.org/reviews/clustering/clustering.php . His data set had
4.4M fingerprints and it took 10 hours to cluster at 0.8 similarity th
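The threshold-based Taylor-Butina method can be sketched in a few lines of pure Python. This is a hypothetical O(N^2) illustration on fingerprints stored as Python ints. It is not chemfp's implementation or Chris Swain's code. It ranks candidate centroids once by neighbor count, while some variants re-rank after each cluster is removed.

```python
def popcount(x):
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def taylor_butina(fps, threshold):
    """Return a list of (centroid index, sorted member indices)."""
    n = len(fps)
    # Neighbor sets at or above the threshold; every point neighbors itself.
    neighbors = [
        {j for j in range(n) if j == i or tanimoto(fps[i], fps[j]) >= threshold}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    # Points with the most neighbors become centroids first (ties by index).
    for i in sorted(range(n), key=lambda k: len(neighbors[k]), reverse=True):
        if i not in unassigned:
            continue
        members = neighbors[i] & unassigned
        clusters.append((i, sorted(members)))
        unassigned -= members
    return clusters


fps = [0b1111, 0b1110, 0b0001, 0b110000]
print(taylor_butina(fps, 0.7))  # [(0, [0, 1]), (2, [2]), (3, [3])]
```

The neighbor-list step is the O(N^2) part, and it is the step that a fast threshold similarity search speeds up.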
On Nov 9, 2017, at 21:49, Brian Cole wrote:
> Certainly, but thousands of lines of Python doesn't fit in an email in an
> easily digestible way. :-)
I'll restate things since I wasn't clear. While this step may be what you need
for the way you structure things, there might be a better way to st
On Nov 9, 2017, at 16:09, Brian Cole wrote:
> Here's an example of why this is useful at maintaining molecular
> fragmentation inside your molecular representation:
>
> >>> from rdkit import Chem
> >>> smiles = 'F9.[C@]91(C)CCO1'
> >>> fluorine, core = smiles.split('.')
> >>> fluorine
> 'F9
On Nov 9, 2017, at 08:13, Greg Landrum wrote:
> As was discussed in the comments of
> https://github.com/rdkit/rdkit/issues/786, I think it's pretty gross that the
> second syntax is even legal. But that's a side point.
To belabor that point: neither Daylight SMILES nor OpenSMILES accept it, wh
On Nov 8, 2017, at 21:00, Chenyang Shi wrote:
> =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
The recursive SMARTS notation, which is the term inside of the [$(...)], finds
a match for the entire pattern and returns the first atom in that pattern.
> For example, if I search "C=C=O" using "[CH0;A;X2;!R](
Hi Cameron,
While you are waiting for an answer about the proper way to silence errors, I
can give you a work-around which will help with the metaphorical reams of
teletype paper you are printing out.
However, it is a very crude solution. Basically, close the C/C++ stderr file
descriptor, an
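Here is a minimal sketch of that kind of work-around, assuming a POSIX-like system. Instead of closing file descriptor 2 outright, it points the descriptor at another file (os.devnull by default) and restores it afterwards, so later error output still works. The context-manager name `silence_c_stderr` is hypothetical.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def silence_c_stderr(target=os.devnull):
    """Temporarily redirect file descriptor 2, which C and C++ code writes
    to directly, to `target`; restore the original descriptor on exit."""
    sys.stderr.flush()                  # don't lose buffered Python output
    saved_fd = os.dup(2)                # remember where stderr really goes
    target_fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.dup2(target_fd, 2)           # fd 2 now refers to `target`
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)            # put the real stderr back
        os.close(saved_fd)
        os.close(target_fd)


# Usage: anything written to fd 2 inside the block goes to os.devnull.
with silence_c_stderr():
    os.write(2, b"this message is discarded\n")
```

Passing a real filename as `target` keeps the messages in a log file rather than discarding them.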
On Sep 22, 2017, at 14:26, Kramer, Christian wrote:
> thanks for pointing this out. The reason for that error message is that
> signal.SIGPIPE is not available under windows. This seems to have slipped
> below our radar, since we developed the code on Linux.
And Mac. :)
I put that code there
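One portable pattern is to look the signal up before using it. This is a sketch of that guard, not the fix that went into the code being discussed. `reset_sigpipe` is a hypothetical name, and like any call to signal.signal it must run in the main thread.

```python
import signal


def reset_sigpipe():
    """Restore the default SIGPIPE action where the platform defines SIGPIPE.

    Returns True if the handler was changed and False where SIGPIPE does not
    exist (Windows), instead of raising AttributeError there."""
    sigpipe = getattr(signal, "SIGPIPE", None)
    if sigpipe is None:
        return False
    signal.signal(sigpipe, signal.SIG_DFL)
    return True


if reset_sigpipe():
    print("SIGPIPE reset to the default action")
else:
    print("no SIGPIPE on this platform; nothing to do")
```

Python ignores SIGPIPE at startup and raises BrokenPipeError instead. Restoring the default action lets a command-line tool exit quietly when its output is piped into something like `head`.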
Hi all,
I have just released chemfp 1.3. It is available from
http://dalkescientific.com/releases/chemfp-1.3.tar.gz .
Chemfp is a set of command-line tools and a Python library for working with
cheminformatics fingerprints. It can use OEChem/OEGraphSim, RDKit, or Open
Babel to create fingerpri
On Sep 14, 2017, at 19:26, Dimitri Maziuk wrote:
> Just FYI: python 2.6 is the system python on (at least) RHEL-6 family of
> linux distros that will be officially with us until June 30, 2024.
If only Greg got as much money for long-term RDKit support as Red Hat
gets for long-term RHEL support. :
On Sep 8, 2017, at 15:51, Noel O'Boyle wrote:
>
> Hi all,
>
> I'd like to capture error messages during SMILES parsing, but am having
> trouble getting this to work.
...
> assert sio.read() != ""
That should be a sio.getvalue(). The read() starts from the current file
position, which is at
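Here is a small demonstration of the difference using only io.StringIO. After writing, the file position sits at the end of the buffer, so read() returns an empty string. getvalue() returns everything written regardless of the position.

```python
import io

sio = io.StringIO()
sio.write("parse error message\n")

after_write = sio.read()        # starts at the current position: the end
everything = sio.getvalue()     # the whole buffer, regardless of position
sio.seek(0)                     # rewinding is the other way to make read() work
from_start = sio.read()

print(repr(after_write), repr(everything), repr(from_start))
```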
On Aug 8, 2017, at 22:20, Peter S. Shenkin wrote:
> But I would be curious to see the 51 CHEMBL SMILES that RDKit could not parse.
As of ChEMBL 23, the following files are available:
- the sdf.gz file
- pre-computed RDKit Morgan fingerprints in fps.gz format
- the database available as an S
On Jul 1, 2017, at 17:19, Changge Ji wrote:
> I want to do some substructure match using MCS.
> It seems that Sanitize is needed for MCS.
> I met with the over valance error when using sanitize for some molecules.
>
> Like the following one :
>
> sa = Chem.MolFrom
On Jun 19, 2017, at 17:39, Dan Wandschneider wrote:
> Greg-
> Is the RDKit currently compatible with Python3? If not, when do you expect I
> could start migrating a code base that depends on the RDKit?
I'm not Greg, but I can answer that question.
The RDKit has been available for both Python 2
Following up on myself,
On Jun 6, 2017, at 04:00, Andrew Dalke wrote:
> I've fleshed out that algorithm so it's a command-line program that can be
> used for benchmarking purposes. It's available from
> http://dalkescientific.com/writings/taylor_butina.py .
&g
On Jun 5, 2017, at 11:02, Michał Nowotka wrote:
> Is there anyone who actually done this: clustered >2M compounds using
> any well-known clustering algorithm and is willing to share a code and
> some performance statistics?
Yes. People regularly use chemfp (http://chemfp.com/) to cluster >2M comp
On May 19, 2017, at 21:59, Markus Heller wrote:
> [In chemfp] I get the following error:
>
> [11:37:55] Explicit valence for atom # 6 Te, 4, is greater than permitted
> ERROR: Cannot parse the SMILES
> 'CC(C)(/C(=C\\Cl)/[Te-2](c1ccc(cc1)OC)(Cl)Cl)O' at line 155850 of
> chembl_23.fixed.smi. Exit
On May 19, 2017, at 08:33, Greg Landrum wrote:
> The best solution to this is to use chemfp. It's a remarkable piece of
> software.
Thanks, Greg.
> If you aren't willing to license that, the RDKit's search brute-force
> fingerprint search capabilities aren't too bad for in-memory fingerprints.
On Apr 19, 2017, at 23:59, Peter S. Shenkin wrote:
> One more thing. The term "Mol" in RDKit and some other tookits does not
> really mean "molecule" in the sense that chemists use it.
? I don't see how this is connected to the previous emails.
I believe most toolkits use that terminology in th
On Apr 19, 2017, at 18:26, Curt Fischer wrote:
> From chemistry stack exchange, an answer contributed by user R.M.:
>
> SMARTS is deliberately designed to be a superset of SMILES. That is, any
> valid SMILES depiction should also be a valid SMARTS query, one that will
> retrieve the very struct
On Apr 19, 2017, at 12:03, Thilo Bauer wrote:
> is converting SMARTS to SMILES a "lossless" operation, or does one loose
> information on doing so?
It is obviously not lossless if you include terms that cannot be represented in
SMILES.
>>> from rdkit import Chem
>>> Chem.MolToSmiles(Chem.MolF
On Mar 28, 2017, at 17:56, 杨弘宾 wrote:
> Have you tried install rdkit from source? It's ok when I installed rdkit
> by conda in my PC. But when I tried installing it in a server in which I am
> only a user who cannot use "sudo" and the "python" is in a read-only
> directory.
Yes I have, and
On Feb 8, 2017, at 19:22, Markus Metz wrote:
> The question to you is: Is there another more elegant way of doing it? May be
> I missed something from the python API?
I don't quite follow what you are looking for, though I have managed to
condense your code somewhat, into:
updatedMapping = Non
On Feb 7, 2017, at 22:26, Curt Fischer wrote:
> def same_implicit_valence(mol_1, mol_2, atom_idx=1):
> """Returns True if mol_1 and mol_2 have the same implicit valence for the
> indexed atom"""
> mol_1_implicitH = mol_1.GetAtomWithIdx(atom_idx).GetImplicitValence()
> mol_2_implicitH
On Feb 7, 2017, at 19:02, Curt Fischer wrote:
> My ultimate goal is an easy way to create rdkit molecules that have isotopic
> substitutions but which are otherwise exactly the same as non-substituted
> variants. What's the best approach? Is it to directly call .SetIsotope()
> like I do above
On Feb 7, 2017, at 01:17, Curt Fischer wrote:
> I am confused by this behavior:
>
> >>> labeled_etoh = Chem.MolFromSmiles('C[13C]O')
> >>> print(Chem.MolToSmiles(labeled_etoh))
>
> C[C]O
>
> >>> print(Chem.MolToSmiles(labeled_etoh, isomericSmiles=True))
>
> C[13C]O
>
> 1. Why are there any br
Dear Susan,
If I understand what's going on correctly, you have run across the difference
between 0-based and 1-based indexing. See
https://en.wikipedia.org/wiki/Zero-based_numbering .
RDKit, like most programming libraries and languages, indexes based on an offset
from the beginning, so 0 mea
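In the tiny illustration below, a plain Python list stands in for a molecule's atoms. Index 0 is the first atom, so a chemist's "atom 3" is index 2. The helper `atom_by_number` is hypothetical. With RDKit, the same arithmetic applies to mol.GetAtomWithIdx().

```python
# Heavy atoms of ethanol, CCO, in SMILES order (a stand-in for a molecule).
atoms = ["C", "C", "O"]

print(atoms[0])   # C, the first atom has index 0
print(atoms[2])   # O, the third atom has index 2


def atom_by_number(atoms, number):
    """Look up an atom by its 1-based position, as a person would count."""
    if not 1 <= number <= len(atoms):
        raise IndexError("atom number %d out of range" % number)
    return atoms[number - 1]


print(atom_by_number(atoms, 3))  # O
```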
On Dec 19, 2016, at 6:22 PM, Brian Kelley wrote:
> I had thought about making a CanonicalAtomOrder function that does this as
> well, or perhaps making a MolToSmiles variant.
I learned about this function from Noel's blog post at
https://nextmovesoftware.com/blog/2013/07/01/accessing-smiles-atom
On Dec 18, 2016, at 6:32 PM, Brian Kelley wrote:
> >>> m.GetProp("_smilesAtomOutputOrder")
> '[3,2,1,0,]'
>
> Note that this returns the list as a string which is sub-optimal.
> GetPropsAsDict will convert these to proper python objects, however, this is
> considered a private member so you nee
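If you do end up with the string form, converting it takes only a few lines. This minimal sketch assumes the property value looks like the '[3,2,1,0,]' shown above, including the trailing comma. `parse_atom_output_order` is a hypothetical helper.

```python
def parse_atom_output_order(text):
    """Convert a string like '[3,2,1,0,]' into a list of ints.
    Empty fields (from the trailing comma) are skipped."""
    return [int(field) for field in text.strip().strip("[]").split(",") if field]


order = parse_atom_output_order("[3,2,1,0,]")
print(order)  # [3, 2, 1, 0]
```

As I understand the property, order[k] is the original atom index of the k-th atom written in the output SMILES.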
On Dec 16, 2016, at 3:27 PM, Andrew Dalke wrote:
> 2013 RDKit didn't preserve the atom order between labeled and unlabeled atoms.
...
> I no longer have an older version of RDKit installed.
My memory is wrong. I have rebuilt a version from 2013 and have been unable to find
a failure cas
On Dec 17, 2016, at 1:45 AM, Milinda Samaraweera wrote:
> However at the end of each tag header I noticed there is a number (bolded):
>
> ...
> >(1)
> N1-(2-ethylbutyl)hexane-1,3,6-triamine
...
> What is this number and how you avoid printing this number when SDwriter is
> used? As this n
On Dec 16, 2016, at 1:55 PM, Stephen Pickett wrote:
> With a 2013 RDkit install we get consistent canonicalization between reaction
> labelled and unlabelled atoms.
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1
On Dec 9, 2016, at 9:50 PM, Brian Kelley wrote:
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles("F/C=C/F")
> >>> for bond in m.GetBonds():
> ...     print bond.GetStereo()
> ...
> STEREONONE
> STEREOE
> STEREONONE
>
> However, setting bond stereo doesn't appear to be exposed.
I thought
On Dec 5, 2016, at 3:28 PM, Alexis Parenty wrote:
> For the parenthesis issue, the difficulty is to differentiate the SMILES
> formats (xxx)(xxx) from this one (xxx)… I will try and address
> that using something like:
I do not understand. The first one is not a SMILES format.
Can y
On Dec 5, 2016, at 11:35 AM, Alexis Parenty wrote:
> I have tested my script on:
> • 7900 unique SMILES for “drug-like molecules”
> • Alice’s adventure in wonderland (I never read the book but I assumed
> there is no SMILES!)
> • A shuffled mixture of Alice’s in wonderland and 7900 uni
On Dec 2, 2016, at 5:46 PM, Brian Kelley wrote:
> I hacked a version of RDKit's smiles parser to compute heavy atom count,
> perhaps some version of this could be used to check smiles validity without
> making the actual molecule.
FWIW, here's my regex code for it, which makes the assumption tha
On Dec 3, 2016, at 3:02 PM, Brian Kelley wrote:
> If I had to pick, I would just use the normal MolFromSmiles, if you don't
> expect many actual smiles strings in your corpus, it's plenty fast.
I didn't follow from your timings what you used to see if something was a
SMILES candidate?
Was it wo
On Dec 2, 2016, at 10:05 PM, Brian Kelley wrote:
> Here is a very old version of Andrew's parser in code form: ... It was fairy
> well tested on the sigma catalog back in the day. It might be fun to
> resurrect use it in some form.
There's also my OpenSMILES parser written for Ragel:
https:/
On Dec 2, 2016, at 10:12 PM, George Papadatos wrote:
> If Alexis wants to search for valid SMILES strings representing typical
> organic molecules among text of plain English words, would it not be safe to
> assume that any word containing more than 4 'C' or 'c' characters would only
> be a SMIL
osures, and where the "connector"
# is the possible combinations of open/close parentheses, dot disconnect,
# or bond.
# It does not attempt to balance parentheses, ensure matching ring
# closures, or handle aromaticity. Those cannot be done with a regular
# expression.
# Written in 20
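The regex code those comments belong to is not included here. As a separate and much simpler sketch, here is a heavy-atom counter that assumes its input is already valid SMILES. It counts organic-subset and bracket atoms and skips hydrogen bracket atoms such as [H], [2H] and [H+]. `heavy_atom_count` is a hypothetical name, and like the original it does not check parentheses, ring closures, or aromaticity.

```python
import re

# Bracket atoms first, then two-letter organic-subset atoms before one-letter ones.
ATOM_RE = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A bracket hydrogen: optional isotope, "H", then no lowercase letter
# (so [Hg], [He], [Ho], [Hf] are not mistaken for hydrogen).
HYDROGEN_RE = re.compile(r"\[\d*H[^a-z\]]*\]")


def heavy_atom_count(smiles):
    """Count non-hydrogen atoms in an (assumed valid) SMILES string.
    The '*' wildcard atom is not counted."""
    return sum(1 for token in ATOM_RE.findall(smiles)
               if not HYDROGEN_RE.fullmatch(token))


print(heavy_atom_count("c1ccccc1Cl"))           # 7
print(heavy_atom_count("[2H]C([2H])([2H])Br"))  # 2
```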
I'm trying to figure out which atoms lose chirality after breaking bonds using
FragmentOnBonds().
Here's an example where a chiral carbon after fragmentation gets two "*" atoms,
which makes the carbon achiral:
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("F[C@](Cl)(Br)O")
>>> fragmen
On Oct 2, 2016, at 10:48 PM, Maciek Wójcikowski wrote:
> Yes I get it, but obviously there is no MolFromSDBlock, so one would suspect
> MolFromMolBlock to support both formats. As I understand correctly the only
> way of reading SD from variable is as presented in my example? Or is there
> some
On Sep 7, 2016, at 11:53 AM, Stephen O'hagan wrote:
> How would I find the molecular weight (fraction) of that substructure within
> a compounds expressed as a SMILES string, e.g.:
I don't know of a built-in function which does this. It's possible to write
one. Here's a function which will compu
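Here is a sketch of the arithmetic, under stated assumptions:

- The atoms and their attached hydrogen counts are given as a list of (symbol, number of Hs) pairs.
- The substructure match is a tuple of atom indices, the kind of thing GetSubstructMatch() returns.
- Each hydrogen is counted with the heavy atom it is attached to.

The helper name and the small table of standard atomic weights are illustrative.

```python
# Standard atomic weights (abridged), enough for this example.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "S": 32.06, "Cl": 35.45}


def mass_fraction(atoms, match):
    """Fraction of the molecular weight contributed by the matched atoms.

    atoms: list of (element symbol, attached hydrogen count) per heavy atom
    match: indices of the atoms in the substructure match
    Hydrogens are counted with the heavy atom they are attached to."""
    def atom_weight(index):
        symbol, num_hs = atoms[index]
        return ATOMIC_WEIGHT[symbol] + num_hs * ATOMIC_WEIGHT["H"]

    total = sum(atom_weight(i) for i in range(len(atoms)))
    return sum(atom_weight(i) for i in set(match)) / total


# Ethanol, CCO, with the OH group (atom index 2) as the matched substructure.
ethanol = [("C", 3), ("C", 2), ("O", 1)]
print(round(mass_fraction(ethanol, (2,)), 4))
```

With RDKit, the pairs could come from atom.GetSymbol() and atom.GetTotalNumHs(), and the indices from mol.GetSubstructMatch(pattern). That part is not shown because it needs the toolkit.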
On Jun 21, 2016, at 5:26 PM, Greg Landrum wrote:
> Because chirality is represented relative to the ordering of the bonds around
> an atom, it's pretty difficult to do this if you want to actually break and
> add bonds on your own. This would probably be somewhat easier if there were
> an RWMol.
On Feb 10, 2016, at 6:09 AM, Greg Landrum wrote:
> I agree that this is a bug.
Glad to hear. I was wondering how I would get my code to handle that case
otherwise.
On Feb 10, 2016, at 4:19 PM, David Cosgrove wrote:
> As chiralities go, this one has turned out to be quite valuable!
You can tell
On Feb 8, 2016, at 7:03 PM, Paolo Tosco wrote:
> ... there is a "ghost" atom involved in determining the sulfur chirality,
> which is the sulfur lone pair. Even if this is not in the Daylight specs, the
> lone-pair is usually treated as an implicit hydrogen, and therefore
> considered as the fir
Thanks Paolo and Hannes for pointing me to sulfoxide. I am enlightened!
I assume this is something that every chemist knows, but it's not mentioned in
the Daylight SMILES documentation (or the OpenSMILES documentation), so I had
no clue. I wonder how many more cases there are like that.
Any ide
Hi!
Could someone explain to this non-chemist what the chirality means in the
following?
CN[S@@](=O)C1=CC=CC=C1
It comes from PubChem id 12194260 at
https://pubchem.ncbi.nlm.nih.gov/compound/12194260 .
Isn't this a symmetric structure, which can't have an orientation at that
point? Even
Hi Dave,
Thanks for the suggestion about mutating the atom in-place then pruning the
rest of the R-group away.
This will work, but it's inelegant and slow. Here's why.
I'm trying to construct an R-group table for each core, for up to 3 R-groups,
of a data set. For that, I need to know the ca
On Feb 3, 2016, at 6:42 AM, Greg Landrum wrote:
> 2) If you add a call to Chem.SanitizeMol(hydrogren_mol) before any of the
> calls to SMILES generation, it clears up the problem. The calls to
> SetNumExplicitHs() are not necessary.
I am able to fix my problem by adding a SANITIZE_ADJUSTHS :
de
On Feb 3, 2016, at 6:42 AM, Greg Landrum wrote:
> 1) in the code you have this snippet:
> # This gives: c1ccc(nc1)-n1ncc2ccc(nc21)C1CC1
> # That SMILES appears to be incorrect!
> Why do you think that's true?
I was incorrect in saying "incorrect". I should have said "not canonical". I
expect the
I'm working on a project where I cut a molecule along certain single bonds, to
find a core structure and one or more R-groups.
In yesterday's email, I mentioned a problem I have in creating a canonical
SMILES for the core when the R-groups are replaced by a hydrogen.
I also want to create a SMI
Hi all,
I have a problem that I think is due to my not understanding how to work with
explicit (or perhaps implicit) hydrogens.
In my project, I want to find the core of a molecule as well as its R-groups.
I use a SMARTS pattern to find the bonds to cut, then want to store two
versions of
On Dec 9, 2015, at 1:53 PM, chris dalton wrote:
> I have fragmented a molecule using GetMolFrags and want to relate the atoms
> in the fragments to the original molecule. However, each fragment appears to
> start at atom index 0 which prevents direct comparison with the original
> atoms.
One s
On Oct 8, 2015, at 2:38 PM, John M wrote:
> This seems odd... surely you can't go faster that iterating over the atoms
> and counting element 6?
One is in C++, the other is in Python.
> Perhaps the python iter is indeed slower than a SMARTS match but that can't
> be true?
The "for atom in mo
On Oct 7, 2015, at 11:38 PM, Ling Chan wrote:
> Or you can use AllChem.CalcMolFormula() to get the chemical formula.
Well spotted! It's a bit tricky because it needs to handle carbons with/without
count ("CH4", "C2H6"), and structures with no carbons ("P", "Ca", "Cd"); the
last two start with a
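A pure-Python sketch of parsing the carbon count out of such a formula follows. It assumes a Hill-order formula string, in which carbon appears at most once and comes first. The lookahead keeps "Ca", "Cd" and "Cl" from being read as carbon. `carbon_count` is a hypothetical helper.

```python
import re

# "C" not followed by a lowercase letter is carbon (not Ca, Cd, Cl, ...);
# the digits after it, if any, are the count.
CARBON_RE = re.compile(r"C(?![a-z])(\d*)")


def carbon_count(formula):
    """Number of carbons in a Hill-order formula string like 'C2H6O'."""
    match = CARBON_RE.search(formula)
    if match is None:
        return 0
    return int(match.group(1)) if match.group(1) else 1


for formula in ["CH4", "C2H6", "P", "Ca", "CdCl2", "C12H22O11"]:
    print(formula, carbon_count(formula))
```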
On Oct 7, 2015, at 11:30 AM, Christos Kannas wrote:
> Yes there is an easier way, by using substructure search, i.e. do a
> substructure search for [C] and then get the number of matches.
> m = Chem.MolFromSmiles("c1c1")
> patt= Chem.MolFromSmarts("[C]")
> pm = m.GetSubstructMatches
On Sep 16, 2015, at 9:57 PM, Bodle, Christopher R wrote:
> I am having trouble with RDKit correctly interpreting the SMARTS character
> [!#1], which should be interpreted as "any atom not hydrogen.
I've been looking at your emails but it's difficult for me to figure out what
you are doing. Can y
On Aug 23, 2015, at 6:38 PM, Jing Lu wrote:
> I hope the memory issue won't be a problem.
That's up to you and your choice of threshold.
> Most AgglomerativeClustering algorithms have time complexity with N^2. Will
> that be a problem?
You have to decide for yourself what counts as a problem.
On Aug 23, 2015, at 3:43 AM, Jing Lu wrote:
> If I want to cluster more than 1M molecules by ECFP4. How could I do it? If I
> calculate the distance between every pair of molecules, the size of distance
> matrix will be too big. Does RDKit support any heuristic clustering algorithm
> without cal
Dear Takayuki,
On Aug 1, 2015, at 3:54 AM, Taka Seri wrote:
> Why the [MACCSkeys] bit length is not 166 bit ?
>
The RDKit MACCS implementation follows the MACCS key assignments, which start
at 1. MACCS bit 0 is always set to 0, bit 1 corresponds to key 1, etc., so key
166 is at bit 166, givi
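The numbering works out as follows: 166 keys laid out this way need 167 bit positions, with position 0 unused. This is a pure-Python sketch with hypothetical helper names. RDKit's MACCSkeys.GenMACCSKeys() returns a bit vector with this 167-bit layout.

```python
NUM_KEYS = 166


def keys_to_bits(keys):
    """Turn MACCS key numbers (1..166) into a 167-position bit list.
    Position k holds key k, so position 0 is never set."""
    bits = [0] * (NUM_KEYS + 1)
    for key in keys:
        if not 1 <= key <= NUM_KEYS:
            raise ValueError("MACCS keys are numbered 1 to 166, not %d" % key)
        bits[key] = 1
    return bits


def bits_to_keys(bits):
    """Inverse of keys_to_bits: the key numbers whose bits are set."""
    return [position for position, bit in enumerate(bits) if bit and position > 0]


bits = keys_to_bits([1, 42, 166])
print(len(bits), bits[0], bits_to_keys(bits))   # 167 0 [1, 42, 166]
```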
On Jun 16, 2015, at 10:20 PM, Peter Shenkin wrote:
> [N-]=[N+]=NC(=O)N1C(=O)N([N+]([O-])=O)C2(C13C4=C56)C4=C5C2=C36
> [N-]=[N+]=NC(=O)N(C(=O)N1[N+]([O-])=O)C(c23)(c4c56)C16c3c5c24
>
> rdkit canonicalizes the two to the following, respectively:
>
> [N-]=[N+]=NC(=O)N1C(=O)N([N+](=O)[O-])C23c4c5c2c2
On Jun 11, 2015, at 2:20 PM, Laëtitia Bomble wrote:
> Is there a rdkit tool to get IUPAC name of a molecule?
No, there isn't. If you only have a few names, and/or are
willing to wait for a web service, you can use the NCI
resolver at
http://cactus.nci.nih.gov/chemical/structure
For example, th
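One way to build such a request URL is sketched below. The sketch makes several assumptions:

- The resolver uses a '<structure identifier>/<representation>' URL layout.
- 'iupac_name' is the name of the representation to request.
- The SMILES must be percent-encoded, because characters like '#' and '/' are special in URLs.

`iupac_name_url` is a hypothetical helper, and the network call is left commented out.

```python
from urllib.parse import quote

RESOLVER = "http://cactus.nci.nih.gov/chemical/structure"


def iupac_name_url(smiles, base=RESOLVER):
    """Build a resolver URL asking for the IUPAC name of a SMILES string."""
    # safe="" also encodes "/" (used for double-bond stereo in SMILES)
    return "%s/%s/iupac_name" % (base, quote(smiles, safe=""))


url = iupac_name_url("CC(=O)Oc1ccccc1C(=O)O")
print(url)
# from urllib.request import urlopen
# print(urlopen(url).read().decode("utf-8"))
```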
Hi all,
I spent the last couple of weeks working on a project related to molecular
property and model calculations. It's called 'propbox', and is available from
https://bitbucket.org/dalke/propbox .
There are two parts to it:
- a (sparse) table, where the rows are structures and the columns
On May 1, 2015, at 12:01 AM, Michael Reutlinger wrote:
> However, in some cases this does not help. E.g. when an unknown atom (most of
> the time this is X) is found in the MolBlock the import fails with an
> Post-condition Violation and None is yielded. This is fine to detect the
> problem BUT
On Apr 30, 2015, at 6:08 AM, Greg Landrum wrote:
> I still need to put some thought into patching the SDWriter so that it can
> recognize things like consecutive line endings in property values. The big
> question is what it should do when it encounters such a case. Is that an
> error? Should it
On Apr 29, 2015, at 9:19 PM, Dimitri Maziuk wrote:
> There is a difference between ACM members writing network protocols and
> "domain" people writing junk.
I think that you are saying that the MDL connection table
file formats are junk. I do not disagree. But it's something
we have to deal with s
On Apr 29, 2015, at 7:30 PM, Dimitri Maziuk wrote:
> Based on "be liberal in what you accept and conservative in what you
> produce", the writer should
Postel's Robustness principle is a mistake.
See RFC 3117 for elaboration, at http://tools.ietf.org/html/rfc3117#page-16
Counter-intuitively, P
Riccardo Vianello:
> I suppose that if the correctness of the parser is confirmed, then a change
> could be suggested for the writer, consisting in raising an error if blank
> lines are present inside the data item.
Yes, the SD tag data is not a general purpose data field. It's not possible,
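A writer-side check along those lines might look like the following sketch. It is hypothetical and not RDKit's SDWriter. It rejects two kinds of value:

- a value containing a blank line, which a reader would take as the end of the data item;
- a value containing a line starting with '$$$$', which would end the record.

Treating whitespace-only lines as blank is an assumption of this sketch, since readers differ on that.

```python
def check_sd_data_value(value):
    """Raise ValueError if `value` cannot be written safely as an SD data item."""
    for lineno, line in enumerate(value.splitlines(), 1):
        if not line.strip():
            raise ValueError("line %d of the value is blank" % lineno)
        if line.startswith("$$$$"):
            raise ValueError("line %d of the value is a record separator" % lineno)


check_sd_data_value("first line\nsecond line")   # fine
try:
    check_sd_data_value("first line\n\nthird line")
except ValueError as err:
    print("rejected:", err)   # rejected: line 2 of the value is blank
```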
On Nov 3, 2014, at 10:22 AM, Pahl, Axel wrote:
> Please have fun with the program and let me know if there are any bugs
> or improvement proposals (apart from those already listed in the README).
It looks very nice! You might want to put a link to
http://nbviewer.ipython.org/github/apahl/sdf
On Jul 23, 2014, at 10:26 PM, Abhik Seal wrote:
> I have a sdf file attached(2 molecules only)
What you have isn't an SD file. It's missing a line in the header block.
The header block is supposed to contain three lines. I quote from the
specification:
> Line 1: Molecule name. This line is unfo
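The counts line is found by position, so a missing header line shifts everything up by one. A quick pure-Python sanity check, assuming V2000 or V3000 molfiles, is to confirm that the fourth line ends with the version stamp. `check_header_block` is a hypothetical helper.

```python
def check_header_block(molblock):
    """Raise ValueError unless line 4 of a molfile looks like a counts line."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("expected at least 4 lines, got %d" % len(lines))
    counts_line = lines[3]
    if not counts_line.rstrip().endswith(("V2000", "V3000")):
        raise ValueError("line 4 is not a counts line: %r" % counts_line)


counts = "  3  2  0  0  0  0  0  0  0  0999 V2000"
complete = "\n".join(["ethanol", "  example program line", "", counts, "M  END"])
missing = "\n".join(["ethanol", "  example program line", counts, "M  END"])

check_header_block(complete)          # three header lines: passes
try:
    check_header_block(missing)       # one header line short
except ValueError as err:
    print(err)                        # line 4 is not a counts line: 'M  END'
```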
Hi Sushil,
On May 8, 2014, at 12:26 PM, Sushil Mishra wrote:
> MCS algorithm seems to me unable to handle chiral carbons and it can not
> differentiate chiral changes in ligands.
That's correct. The MCS algorithm in RDKit doesn't consider chirality. While in
principle I think it would be possi
On Feb 18, 2014, at 6:51 PM, Matthew Swain wrote:
> I don't really know what's going on here, but you could try [#5!B] for you
> SMARTS.
>
> #5 to match any boron, and !B to disallow non-aromatic.
Another possibility is [#5a], since "a" means "aromatic"
>>> from rdkit import Chem
>>> mol = Chem
On Nov 24, 2013, at 11:58 PM, Nikolas Fechner wrote:
> if I remember correctly 10.1 was an intermediate pandas version where the
> HTML rendering in tables, that we use for rendering the structures, does not
> work as we need it. In this version pandas introduced an HTML escaping, which
> leads
On Oct 25, 2013, at 10:11 AM, Roger Sayle wrote:
> The use of an integer file format "flavor" argument allows the caller
> to customize the behavior of the readers and writers. The semantics
> is that a reasonable default is zero (for all bits), but that new
> features may be added without chang
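The same idea translates directly to Python's enum.IntFlag. The sketch below uses made-up flag names, none of which are real toolkit options. Zero means "all defaults", each option gets its own bit, and adding a new bit later does not change the meaning of existing values.

```python
from enum import IntFlag


class Flavor(IntFlag):
    """Hypothetical reader/writer options; 0 means every default."""
    DEFAULT = 0
    KEEP_HYDROGENS = 1
    SKIP_STEREO = 2
    STRICT_VALENCE = 4   # a later addition: older callers never set it


def describe(flavor=Flavor.DEFAULT):
    """List the options a reader would enable for this flavor value."""
    return [flag.name for flag in (Flavor.KEEP_HYDROGENS, Flavor.SKIP_STEREO,
                                   Flavor.STRICT_VALENCE) if flavor & flag]


print(describe())                                              # []
print(describe(Flavor.KEEP_HYDROGENS | Flavor.STRICT_VALENCE))
print(describe(1))    # a plain integer works too
```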
On Apr 11, 2013, at 1:39 PM, Quentin Delettre wrote:
> I was more concerned about algorithms/implementation, pitfalls that
> could happen and performance.
There are none. "Pretty much every cheminformatics toolkit can do
what you want."
The toolkits I know of use either the Ullmann algorithm or
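The bare-bones backtracking subgraph matcher below, in pure Python, shows what these algorithms do. It is deliberately simpler than Ullmann's algorithm, with no refinement step. It matches element labels only, ignoring bond orders, charges and aromaticity. The sketch assumes each molecule is given as a label list plus neighbor sets. Like substructure search, it requires every query bond to be present in the target but does not forbid extra target bonds.

```python
def find_match(q_labels, q_adj, t_labels, t_adj):
    """Return one mapping {query atom: target atom}, or None if there is none.

    Each molecule is a list of element labels plus a list of neighbor sets,
    so q_adj[i] holds the indices of the query atoms bonded to query atom i."""
    mapping = {}
    used = set()

    def extend(q_atom):
        if q_atom == len(q_labels):
            return True                      # every query atom is mapped
        for t_atom in range(len(t_labels)):
            if t_atom in used or t_labels[t_atom] != q_labels[q_atom]:
                continue
            # Each already-mapped query neighbor must map to a target neighbor.
            if all(mapping[q_nbr] in t_adj[t_atom]
                   for q_nbr in q_adj[q_atom] if q_nbr in mapping):
                mapping[q_atom] = t_atom
                used.add(t_atom)
                if extend(q_atom + 1):
                    return True
                del mapping[q_atom]          # undo and try the next candidate
                used.discard(t_atom)
        return False

    return dict(mapping) if extend(0) else None


# Query: C-O.  Target: ethanol, C-C-O (atom indices 0, 1, 2).
query = (["C", "O"], [{1}, {0}])
ethanol = (["C", "C", "O"], [{1}, {0, 2}, {1}])
print(find_match(*query, *ethanol))   # {0: 1, 1: 2}
```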
On Apr 11, 2013, at 10:10 AM, Quentin Delettre wrote:
> I plan to use substructure search for around 1500 molecules versus 3000 small
> fragments ..
> I am quite new in the field and it's the occasion to compare programs and
> libraries
> that can do that. Can you provide me some links to papers
Hi Fabian,
On Mar 19, 2013, at 2:05 PM, Fabian Dey wrote:
> - in order to get a 1-1 correspondence of atom ids (to get the coordinate
> map) I had to search the MCS-SMARTS match again against the original files to
> get the atom-ids - is there a more direct way to do this?
There is no more dire
On Feb 22, 2013, at 6:51 AM, Greg Landrum wrote:
> Please feel free to add to the list by either commenting on that page,
> sending ideas here, or emailing them to me directly.
Add an implementation of Noel's method to use InChI to get a
canonical ordering for SMILES output.
Any improvements to t
On Feb 18, 2013, at 5:50 PM, paul.czodrow...@merckgroup.com wrote:
> My issue2solve: read in a sdf.gz & simply extract the SD tags.
If you don't mind digging into the undocumented chemfp API
(which means that it may change in the future), then you can
use the simple-minded SDF reader I wrote for it
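To avoid depending on an undocumented API, the core idea can be written as a stand-alone sketch. This is not the chemfp reader. It is a hypothetical pure-Python version that skips each connection table up to 'M  END', then collects '> <TAG>' data items until the '$$$$' separator. It assumes every record has an 'M  END' line. Any text after the closing '>' of the tag name, such as a '(1)' record number, is ignored.

```python
import gzip
import re

# A data header line starts with ">" and contains the tag name in <...>.
TAG_RE = re.compile(r"^>.*?<([^>]*)>")


def iter_sd_tags(lines):
    """Yield a {tag: value} dict for each record in an SD file."""
    record, tag, values, in_data = {}, None, [], False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("$$$$"):
            if tag is not None:          # data item ended without a blank line
                record[tag] = "\n".join(values)
            yield record
            record, tag, values, in_data = {}, None, [], False
        elif not in_data:
            in_data = line.startswith("M  END")  # skip the connection table
        elif tag is not None:
            if line:
                values.append(line)
            else:                        # a blank line ends the data item
                record[tag] = "\n".join(values)
                tag, values = None, []
        else:
            match = TAG_RE.match(line)
            if match:
                tag = match.group(1)


def read_sd_tags(path):
    """Read tags from a gzip-compressed SD file, e.g. 'example.sdf.gz'."""
    with gzip.open(path, "rt") as infile:
        yield from iter_sd_tags(infile)
```

The reader never builds a molecule, so it does not matter whether a toolkit can sanitize the structure.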
On Dec 15, 2012, at 4:40 AM, Greg Landrum wrote:
> Note that this also means that the H in C[OH] is ignored, so it's now a
> substructure of C[O-]. For finer-grain control over H specifications in
> queries, you will need to use either SMARTS or molecules that have Hs added.
>
> This look ok?
Y
On Dec 13, 2012, at 3:32 PM, paul.czodrow...@merckgroup.com wrote:
>> I think I figured out a way around that via some post-processing.
>
> Great!
> Now let's come to another question:
> How does one code the "complete-ring-only" variation?
> Can your code be adapated, or shall I do some post-pr
On Dec 13, 2012, at 9:18 AM, paul.czodrow...@merckgroup.com wrote:
> Regarding the issues you mentioned
...
>
> - non-canonical SMARTS
> - duplicates are not filtered out
I think I figured out a way around that via some post-processing.
>> Or do you mean the number of molecules which contain
On Dec 4, 2012, at 11:56 AM, Greg Landrum wrote:
> Bonds are matched purely using bond type, with the one exception that a bond
> of unspecified type matches anything and is matched by anything.
And I forgot to ask - is there any way in SMILES to produce a
bond of unspecified type?
> hmm, not su
I am beginning to realize the error of my ways.
This is the same issue which occurred in fmcs. Suppose
you have c1c1C and CC. The MCS between those two is
[#6]-[#6]. Atom aromaticity is not useful when doing
a comparison.
On Dec 4, 2012, at 5:32 AM, Greg Landrum wrote:
> Aromaticity is ignor
On Dec 3, 2012, at 4:55 PM, Greg Landrum wrote:
> Yes, it's here:
> http://www.rdkit.org/docs/RDKit_Book.html#atom-atom-matching-in-substructure-queries
Thanks.
It's incomplete though - it doesn't show how bonds are matched nor
how aromaticity is handled for atoms. Does a SMILES with a "C" mean
t
What are the steps one must take to use an input structure (from
a SMILES string) as a substructure query? It looks like I need
to remove explicit hydrogens [* see footnote]. Is there anything
else? And what is the right way to remove explicit hydrogens?
I'm working again on a project to do substru
Hi Greg,
> I've found some behavior in the MCS code that I would call a bug. (I'm
> hedging a bit because I realize one could argue about this one...)
> ...
> [11]>>> MCS.FindMCS(mols,ringMatchesRingOnly=False,completeRingsOnly=True)
> [11]: MCSResult(numAtoms=-1, numBonds=-1, smarts=None, compl
Hi all,
As you may know, Roche funded me to implement the multiple-structure
MCS algorithm which is now part of RDKit. They see it as a way to contribute
back to free and open source cheminformatics software projects.
I would like to show them that it has been a success, or at least that
peop
On Nov 20, 2012, at 11:05 AM, paul.czodrow...@merckgroup.com wrote:
> The situation is getting complicated, since your hack did not help.
With your error message, I see that only 'sans' is allowed in RDKit.
So says rdkit/Chem/Draw/spingCanvas.py:
faceMap={'sans':'helvetica'}
which means I gave y