About 10 days ago I posted a prototype program called 'smiview', which displays
information about the structure of a SMILES string.
Thanks to feedback from a couple of users, and a deep urge to explore the idea,
I've just released smiview 1.2, available from
https://bitbucket.org/dalke/smiview/downloads/smiview-1.2.tar.gz .
For details about what it can do, see the README at
https://bitbucket.org/dalke/smiview .
Some of the changes are:
- lots of bug fixes
- the SMILES tokenizer will now try to parse the contents of an atom in []s
- the atom indicators now point to the first character of the element symbol(s)
  rather than the first character of the atom token (e.g., the "C" in "[35Cl]"
  and not the "[")
- the 'closures' track now highlights the atoms involved in a minimal cycle
  for a closure, rather than the SMILES string between the two closure points
- more control over some of the styles
- there is now code to generate the molecular graph, which means smiview can
  also report errors like C11 (closure to itself) and C1C1 (two bonds between
  atoms)
- new tracks, like "hcounts" to show the number of implicit hydrogens on each
  atom, and "symclasses" to show each atom's symmetry class
- support for both RDKit and OEChem, or no toolkit, albeit with reduced
  functionality
- options to modify the input SMILES so all atoms have explicit hydrogen
  counts, and to set the isotope and atom class fields based on the atom index,
  symmetry class, or element number
- cleaned up and re-organized the internals. It now uses an experimental
  property calculation dependency system, and has a "track manager" to organize
  the tracks.
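To make the two closure errors in the list above concrete, here is a minimal pure-Python sketch of how they could be detected while building a molecular graph. It is not smiview's code: the `TOKEN` pattern, the function name `find_closure_errors`, and the error messages are all made up for illustration, and the sketch assumes a single-fragment SMILES that uses only organic-subset or bracket atoms.

```python
import re

# Simplified token pattern (an assumption, not smiview's tokenizer):
# bracket atoms, organic-subset atoms, ring closures, branches, bond symbols.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFIbcnops]|%\d\d|\d|[()]|[-=#:/\\]")

def find_closure_errors(smiles):
    """Return a list of error messages for a simple, single-fragment SMILES."""
    errors = []
    bonded = set()        # frozenset({i, j}) for every bond seen so far
    open_closures = {}    # closure number -> index of the atom that opened it
    branch_stack = []     # atom to return to when a branch closes
    prev = None           # index of the most recent atom
    num_atoms = 0
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character at offset {pos}")
        token, pos = m.group(), m.end()
        if token == "(":
            branch_stack.append(prev)
        elif token == ")":
            prev = branch_stack.pop()
        elif token[0] == "%" or token.isdigit():
            number = int(token.lstrip("%"))
            if number not in open_closures:
                open_closures[number] = prev
                continue
            other = open_closures.pop(number)
            if other == prev:
                errors.append(f"closure {number} bonds atom {prev} to itself")
            elif frozenset((other, prev)) in bonded:
                errors.append(
                    f"closure {number} adds a second bond between "
                    f"atoms {other} and {prev}")
            else:
                bonded.add(frozenset((other, prev)))
        elif token[0] in "-=#:/\\":
            continue  # bond symbols do not change the connectivity
        else:
            # Any other token is an atom; bond it to the previous atom.
            if prev is not None:
                bonded.add(frozenset((prev, num_atoms)))
            prev = num_atoms
            num_atoms += 1
    return errors

for smi in ("C11", "C1C1", "Cn1c(=O)c2c(ncn2C)n(C)c1=O"):
    print(smi, find_closure_errors(smi))
```

Both problems only become visible once the atoms and bonds are tracked, which is why a tokenizer alone cannot report them.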
Here's what it looks like with most of the tracks enabled (which is rather
overwhelming):
% smiview 'Cn1c(=O)c2c(ncn2C)n(C)c1=O' --fancy
            ┌                   1 1 1  1
       atoms│ 01 2  3 4 5 678 9 0 1 2  3
            └ || |  | | | ||| | | | |  |
byte offsets┌           1    1    2    2
            └ 0    5    0    5    0    5
 token types[ AA%A(BA)A%A(AAA%A)A(A)A%BA
      SMILES[ Cn1c(=O)c2c(ncn2C)n(C)c1=O
     hcounts[ 30 0  0 0 0 010 3 0 3 0  0
    branches┌    *(..)  *(.....)
            └                   *(.)
    closures┌  *1*    * *       *   *1
            └         *2*.***2 .
   fragments[ 00000000000000000000000000
  symclasses┌ 01 7  3 9 1 651 1 1 2 8  4
            └  1        0   2   3
I'll focus on just the closures, and give more emphasis to the element symbols
which make up either end of the closure (marked with a "*") while the other
atoms in the closure ring are marked with an "x":
% smiview 'Cn1c(=O)c2c(ncn2C)n(C)c1=O' -b closures --closure-atom-style end-elements
        ┌                   1 1 1  1
   atoms│ 01 2  3 4 5 678 9 0 1 2  3
        └ || |  | | | ||| | | | |  |
  SMILES[ Cn1c(=O)c2c(ncn2C)n(C)c1=O
closures┌  *1x    x x       x   *1
        └         *2x.xx*2 .
With a bit of counting of *'s and x's you can see there's a ring of size 6 and
another of size 5.
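The same ring sizes can be computed instead of counted. The sketch below is one way to do it, under the assumption that a "minimal cycle for a closure" means the shortest path between the two closure atoms that avoids the closure bond itself, found here with a breadth-first search. How smiview actually finds the cycle is not stated in this post. The names `parse`, `ring_size`, and `closure_ring_sizes` are hypothetical, and the parser handles only single-fragment SMILES with organic-subset or bracket atoms.

```python
import re
from collections import deque

# Simplified token pattern (an assumption, not smiview's tokenizer).
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFIbcnops]|%\d\d|\d|[()]|[-=#:/\\]")

def parse(smiles):
    """Return (adjacency, closures) for a simple, valid SMILES string."""
    adjacency = []        # adjacency[i] = set of neighbor atom indices
    closures = []         # (closure number, atom_i, atom_j), in closing order
    open_closures = {}    # closure number -> atom that opened it
    branch_stack = []
    prev = None
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character at offset {pos}")
        token, pos = m.group(), m.end()
        if token == "(":
            branch_stack.append(prev)
        elif token == ")":
            prev = branch_stack.pop()
        elif token[0] == "%" or token.isdigit():
            number = int(token.lstrip("%"))
            if number in open_closures:
                other = open_closures.pop(number)
                adjacency[other].add(prev)
                adjacency[prev].add(other)
                closures.append((number, other, prev))
            else:
                open_closures[number] = prev
        elif token[0] in "-=#:/\\":
            continue  # bond symbols do not affect connectivity
        else:
            adjacency.append(set())
            atom = len(adjacency) - 1
            if prev is not None:
                adjacency[prev].add(atom)
                adjacency[atom].add(prev)
            prev = atom
    return adjacency, closures

def ring_size(adjacency, a, b):
    """Number of atoms in the smallest cycle that contains the bond a-b."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if {u, v} == {a, b}:
                continue  # do not walk across the closure bond itself
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == b:
                    return dist[v] + 1  # atoms on the path = edges + 1
                queue.append(v)
    return None  # the closure bond is not part of any cycle

def closure_ring_sizes(smiles):
    adjacency, closures = parse(smiles)
    return [(number, ring_size(adjacency, a, b)) for number, a, b in closures]

print(closure_ring_sizes("Cn1c(=O)c2c(ncn2C)n(C)c1=O"))
```

The results are listed in the order the closures are closed, so closure 2 (the five-membered ring) comes before closure 1 (the six-membered ring).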
Here's an example of the input syntax processing; I'll convert all of the atoms
to use the bracket form, by adding the correct hydrogen count to each
non-bracket atom:
% smiview 'Cn1c(=O)c2c(ncn2C)n(C)c1=O' --use-brackets -a input-smiles -b none --width 80
input smiles[  C    n 1 c (= O ) c 2 c ( n  c   n 2 C   ) n ( C   ) c 1= O
      SMILES[ [CH3][n]1[c](=[O])[c]2[c]([n][cH][n]2[CH3])[n]([CH3])[c]1=[O]
If you want the modified SMILES string from another program, or to copy&paste
it, then turn off the legend and use a large enough width. Here I'll also set
the isotope to the atom index+1, which might be used as a way to tag the atoms:
% smiview 'Cn1c(=O)c2c(ncn2C)n(C)c1=O' --set-isotope index+1 -a none -b none --width 100000 --legend off
[1CH3][2n]1[3c](=[4O])[5c]2[6c]([7n][8cH][9n]2[10CH3])[11n]([12CH3])[13c]1=[14O]
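For readers who want to see what the index+1 tagging amounts to, here is a small sketch that reproduces the output above from the all-brackets SMILES shown earlier. It is a plain text substitution, not how smiview does it, and the function name `tag_isotopes` is hypothetical. It assumes every atom is already in bracket form and that none of them has an isotope yet.

```python
import re
from itertools import count

def tag_isotopes(bracket_smiles, start=1):
    """Insert a running number as the isotope of each bracket atom.

    Assumes every atom is written as a bracket atom with no isotope;
    an existing isotope would end up with extra leading digits.
    """
    counter = count(start)
    return re.sub(r"\[", lambda m: f"[{next(counter)}", bracket_smiles)

smiles = "[CH3][n]1[c](=[O])[c]2[c]([n][cH][n]2[CH3])[n]([CH3])[c]1=[O]"
print(tag_isotopes(smiles))
```

Because atoms are numbered from 0 in SMILES order, starting the counter at 1 gives the same "index+1" labels, and the original atom index can be recovered later by subtracting 1 from the isotope.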
Let me know what you think.
Andrew
da...@dalkescientific.com