Re: [Cdk-user] aromatic bonds depicted as any bonds

2018-01-03 Thread John Mayfield
Incidentally, I believe ChemAxon is the only one producing these molfiles
with aromatic bonds. Certainly CDK/RDKit/OpenBabel/OEChem don't; I think
Indigo used to generate them in older versions.

$ obabel -ismi -:'c1ccccc1' -omol

 OpenBabel01031816072D

  6  6  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  6  1  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
M  END
1 molecule converted



Re: [Cdk-user] aromatic bonds depicted as any bonds

2018-01-03 Thread Tim Dudgeon

John,

Thanks for the detailed response. I think it will be useful to be able to
depict aromatic bonds, and, as you mention, the main proper use for this
will be for query structures and fragments. However, many structures out
there in the wild do use aromatic bonds, so I think it's useful to have
it for normal structures too.


Tim

p.s. when I referred to the dotted bond as 'ANY' bond, this is the 
notation that ChemAxon uses to depict this type of query bond. But I 
guess that's not an absolute standard.




Re: [Cdk-user] aromatic bonds depicted as any bonds

2018-01-03 Thread John Mayfield
I'll answer these back to front as the second one is much simpler to answer:

> 1. Why the inconsistency in how the different parsers/readers behave? Is
> this documented anywhere?
> 2. Is it possible to have the aromatic bonds depicted as proper aromatic
> bonds?


Answer 2: I don't think the donuts are useful for plain old structures,
only query structures. The circles also do not scale well to all cases
(porphyrin is a classic). The dashed bond in the depiction is not really
'any' bond as you say but rather "your input was junk/had missing
information" (see the next Answer on why that is). Since, as I said, query
structures do need the 'delocalised' bond depiction, I've updated the
renderer accordingly. For now I've just done an offset dash but will try
and find time to add in the donuts: https://github.com/cdk/cdk/pull/403

Answer 1: The short answer is CDK matches behaviour to what Daylight does
for SMILES and what MDL/Symyx/Accelrys/BIOVIA do for molfile. You can
safely use aromatic bond types in SMILES and not in CTfiles.

In CDK aromaticity is a bond property and not a type/order; that is to say,
the bond order is independent of the aromatic status of the bond. The
"normal form" of a molecule in the CDK is to have all the hydrogen counts
and bond orders set - if this is not so you will get warnings/exceptions all
over the place. A molecule can be in an inconsistent state if an input
format was invalid or you create it that way manually. As I'm sure you
know, bond type = 4 in CTfiles is a query feature; if you use it to
represent a discrete structure there is no way to know what the original
representation was. If I try to read your structure with BIOVIA I get an
error:

> ORA-20100: MDL-1919: Molecule failed registration check:
> Error: (root) No query features allowed for registration
> MDL-0633: Unable to convert molfile string to binary molecule ctab
> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 359
> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 352
> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 335


I've written a wiki section to help explain why the problem exists:
https://github.com/cdk/cdk/wiki/CTfile-Reading#aromatic-query-bonds. Rather
than reject molfiles with aromatic bonds outright, we leave the molecule in
an inconsistent state, as a user knows their data better than us and may be
able to correct it. SMILES input is automatically kekulized because that
can be done safely.
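
As a rough illustration of checking your own data before reading it, here
is a minimal sketch in plain Java (it does not use any CDK API) that scans a
V2000 molfile and reports which bonds carry type 4. The class and method
names (AromaticBondScan, aromaticQueryBonds) are made up for this example,
and it assumes a well-formed V2000 file with a three-line header and
fixed-width counts and bond lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of CDK): finds bond type 4 in a V2000 molfile. */
final class AromaticBondScan {

    /** Returns the 1-based positions in the bond block of bonds whose type is 4. */
    static List<Integer> aromaticQueryBonds(String molfile) {
        String[] lines = molfile.split("\r?\n", -1);
        List<Integer> hits = new ArrayList<>();
        // Lines 0-2 are the header block, line 3 is the counts line.
        if (lines.length < 4 || lines[3].length() < 6) {
            return hits;
        }
        int nAtoms = Integer.parseInt(lines[3].substring(0, 3).trim());
        int nBonds = Integer.parseInt(lines[3].substring(3, 6).trim());
        int firstBond = 4 + nAtoms; // the bond block follows the atom block
        for (int i = 0; i < nBonds && firstBond + i < lines.length; i++) {
            String bond = lines[firstBond + i];
            if (bond.length() < 9) {
                continue; // too short to hold a bond type field
            }
            // Columns 7-9 hold the bond type; 4 is the aromatic query type.
            int type = Integer.parseInt(bond.substring(6, 9).trim());
            if (type == 4) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String molfile = String.join("\n",
            "",
            "  example",
            "",
            "  3  2  0  0  0  0  0  0  0  0999 V2000",
            "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "    1.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "    2.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
            "  1  2  1  0  0  0  0",
            "  2  3  4  0  0  0  0",
            "M  END");
        System.out.println("Bonds with type 4: " + aromaticQueryBonds(molfile));
    }
}
```

A non-empty result means the file uses query bonds for what may be a
discrete structure, so the bond orders would need to be assigned (or the
file fixed at the source) before the molecule is in the "normal form"
described above.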

Hope that helps,
John


On 26 December 2017 at 14:46, Tim Dudgeon wrote:

> I've noticed that if you try to depict a structure in molfile format that
> has bonds in rings defined as aromatic type then they are depicted as any
> bonds (dashed), not aromatic (donuts). For example take this molfile:
>
>
>   Mrv17a0 10061711272D
>
>  14 15  0  0  0  0            999 V2000
>     0.5420    0.2323    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.2564   -0.1802    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.2564   -1.0052    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     1.9239   -1.4901    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     2.7085   -1.2352    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
>     3.3216   -1.7872    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.6689   -2.2748    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     0.8439   -2.2748    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     0.5890   -1.4901    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.1956   -1.2352    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.8631   -1.7201    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.5305   -1.2352    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.2756   -0.4506    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.4506   -0.4506    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0  0  0  0
>   2  3  1  0  0  0  0
>   3  4  4  0  0  0  0
>   4  5  1  0  0  0  0
>   5  6  1  0  0  0  0
>   4  7  4  0  0  0  0
>   7  8  4  0  0  0  0
>   8  9  4  0  0  0  0
>   3  9  4  0  0  0  0
>   9 10  1  0  0  0  0
>  10 11  4  0  0  0  0
>  11 12  4  0  0  0  0
>  12 13  4  0  0  0  0
>  13 14  4  0  0  0  0
>  10 14  4  0  0  0  0
> M  END
>
> Some of the bonds are clearly aromatic (4 in the 3rd column of the bond
> block). But when rendering with code like this you get those bonds depicted
> as dashed bonds:
>
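> import org.openscience.cdk.AtomContainer
> import org.openscience.cdk.depict.Depiction
> import org.openscience.cdk.depict.DepictionGenerator
> import org.openscience.cdk.interfaces.IAtomContainer
> import org.openscience.cdk.io.MDLV2000Reader
>
> String mol = ...
> // configure the depiction: show terminal carbons, fixed size, scale to fit
> DepictionGenerator dg = new DepictionGenerator()
>     .withTerminalCarbons()
>     .withSize(500d, 400d)
>     .withFillToFit()
>
> // read the molfile text and render it to a PNG
> MDLV2000Reader v2000Parser = new MDLV2000Reader(new ByteArrayInputStream(mol.getBytes()))
> IAtomContainer atomContainer = v2000Parser.read(new AtomContainer())
> Depiction depiction = dg.depict(atomContainer)
> depiction.writeTo("png", "/tmp/mol.png")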
>
> This is using either CDK 2.0 or 2.1.
>
> If you try a similar thing with the same molecule in smiles format the
> behaviour is a bit different.
>
> String mol2 =