Re: [Cdk-user] Wrong molecular formula?

2020-12-03 Thread John Mayfield
Hi Manuel,

Chris is right; unfortunately, the ChemDraw export isn't quite correct. It
is actually possible to represent multi-attach in V3000, but it's not used
here. The more common problem is that there is simply a random bond into
the middle of a ring. I've done a fair bit of work on ChemDraw processing
(https://nextmovesoftware.com/blog/2016/07/28/sketchy-sketches/), and the
biggest issue is ChemDraw's chemical formula/abbreviation parsing: for
example, K2CO3 has a peroxide, HATU is a "[H]*[3H][U]", etc. (I show more
examples in the poster).

NextMove has a commercial tool to generate CXSMILES. For your example, note
the *m:* part at the end, which captures the positional variation:

[john@harbinger:Praline]% java -jar exec/target/praline.jar convert ~/Downloads/structure.cdx --cxsmi
[Ru]([P](CCC1=CC=CC=C1)(C2C2)C3C3)(Cl)(Cl)*.C1(=CC=C(C=C1)C(C)C)C |m:24:25.26.27.28.29.30| structure Molecule/Specific/High/+PVar
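Reading the positional variation back out only needs the |...| extension block, not a full SMILES parser. Below is a minimal stdlib-only sketch, assuming a single multi-center group written as m:<dummy atom>:<atom>.<atom>... with zero-based atom indices (assumed here to follow the ChemAxon CXSMILES convention; worth checking against that documentation before relying on it). MultiCenterField is a hypothetical helper, not part of CDK or the NextMove tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CDK or NextMove's tools.
public class MultiCenterField {

    // One multi-center group: the dummy (*) atom and the atoms it may bond to.
    public static final class Group {
        public final int dummyAtom;
        public final List<Integer> candidates;

        Group(int dummyAtom, List<Integer> candidates) {
            this.dummyAtom = dummyAtom;
            this.candidates = candidates;
        }
    }

    // Assumes a single group written as m:<dummy>:<a>.<b>.<c>...
    private static final Pattern M_FIELD = Pattern.compile("m:(\\d+):(\\d+(?:\\.\\d+)*)");

    // Returns the first m: group inside the |...| extension block, or null if none.
    public static Group parse(String cxsmiles) {
        int open = cxsmiles.indexOf('|');
        int close = open < 0 ? -1 : cxsmiles.indexOf('|', open + 1);
        if (close < 0) return null; // no CXSMILES extension block
        Matcher m = M_FIELD.matcher(cxsmiles.substring(open + 1, close));
        if (!m.find()) return null;
        List<Integer> atoms = new ArrayList<>();
        for (String idx : m.group(2).split("\\.")) {
            atoms.add(Integer.parseInt(idx));
        }
        return new Group(Integer.parseInt(m.group(1)), atoms);
    }

    public static void main(String[] args) {
        String line = "[Ru]([P](CCC1=CC=CC=C1)(C2C2)C3C3)(Cl)(Cl)*.C1(=CC=C(C=C1)C(C)C)C"
                + " |m:24:25.26.27.28.29.30| structure";
        Group g = parse(line);
        System.out.println(g.dummyAtom + " -> " + g.candidates); // 24 -> [25, 26, 27, 28, 29, 30]
    }
}
```

On John's output line this yields dummy atom 24 with candidate ring atoms 25 to 30, i.e. the six atoms of the arene ring.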


CDK can read and handle this, although we still get the formula wrong
(I will fix that).

OpenBabel has a FOSS ChemDraw parser; one option would be to modify it
to parse your examples, extract the multi-attach information, and then
generate the MOLfile/CXSMILES. The parsing itself is easy: *NodeType="MultipleAttach"
Attachments="{id1} {id2} .."*, where the ids are node ids. Unfortunately, I
don't think OpenBabel has the data structures to represent it, so beyond
handling these fields it would be a fair bit of work.
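The attribute scan John describes applies to CDXML, the XML form of ChemDraw files (binary .cdx would need a different reader). Below is a minimal sketch with the JDK's built-in DOM parser, assuming the attribute names NodeType, Attachments and id as described above and a plain space-separated id list. Because the exact NodeType value written by real ChemDraw files is not verified here, it is passed in as a parameter. MultiAttachScanner and the CDXML fragment in main are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not OpenBabel or CDK code.
public class MultiAttachScanner {

    // Returns node id -> ids listed in its Attachments attribute, for every
    // element whose NodeType attribute equals the given value.
    public static Map<String, List<String>> scan(String cdxml, String nodeType) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Real CDXML files declare an external DTD; don't try to download it.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(cdxml.getBytes(StandardCharsets.UTF_8)));

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList elements = doc.getElementsByTagName("*"); // every element, document order
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            if (!nodeType.equals(e.getAttribute("NodeType"))) continue;
            List<String> ids = new ArrayList<>();
            for (String id : e.getAttribute("Attachments").trim().split("\\s+")) {
                if (!id.isEmpty()) ids.add(id); // assumes a plain space-separated id list
            }
            result.put(e.getAttribute("id"), ids);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, heavily trimmed CDXML fragment for illustration only.
        String cdxml = "<CDXML><page><fragment>"
                + "<n id=\"5\"/><n id=\"6\"/><n id=\"7\"/>"
                + "<n id=\"20\" NodeType=\"MultipleAttach\" Attachments=\"5 6 7\"/>"
                + "</fragment></page></CDXML>";
        System.out.println(scan(cdxml, "MultipleAttach")); // {20=[5, 6, 7]}
    }
}
```

This only covers extracting the fields; mapping the result onto a V3000 multi-attach or a CXSMILES m: group is the part John flags as the real work.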

All the best,
John

On Wed, 2 Dec 2020 at 15:05, Christoph Steinbeck <
christoph.steinb...@uni-jena.de> wrote:

___
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user


Re: [Cdk-user] Wrong molecular formula?

2020-12-02 Thread Christoph Steinbeck
Dear Manuel, 

If you open the mol file in a text editor, there are clearly 31 C atoms in the
file, so the CDK is “right”. I also opened the file in Marvin Sketch, and it
output the analysis below.
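Chris's text-editor check can be reproduced without any toolkit. The sketch below is a hypothetical, stdlib-only helper (MolAtomCounter is not a CDK class) that tallies element symbols in the atom block of a V2000 MOL file, assuming the standard fixed columns: the atom count in columns 1-3 of the counts line (line 4) and the element symbol in columns 32-34 of each atom line. It only sees explicit atoms, so the implicit hydrogens that CDK adds when computing the formula are not counted.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not CDK code: tallies element symbols in a V2000 MOL file.
public class MolAtomCounter {

    public static Map<String, Integer> countElements(String molfile) {
        String[] lines = molfile.split("\\r?\\n", -1);
        // Line 4 is the counts line; columns 1-3 hold the number of atoms.
        int atomCount = Integer.parseInt(lines[3].substring(0, 3).trim());
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < atomCount; i++) {
            // Atom lines follow the counts line; columns 32-34 hold the symbol.
            String symbol = lines[4 + i].substring(31, 34).trim();
            counts.merge(symbol, 1, Integer::sum);
        }
        // Only explicit atoms are counted; implicit hydrogens never appear here.
        return counts;
    }
}
```

Going by Chris's count, running this on the structure.mol from this thread would report 31 for C, including the atom at the end of the line drawn to the ring center.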

It seems ChemDraw uses a fishy trick to create the illusion of a
multi-center attachment. Clearly, they focus on publication-ready drawings of
chemical structures rather than on creating correct file representations of the
chemistry. The fact is that the end of the line to the center of the benzene ring
is a carbon atom and nothing else.
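The two formulas in the original question differ by exactly one carbon and three hydrogens, which fits this explanation: a real carbon atom bonded only to Ru is given three implicit hydrogens by normal valence rules, i.e. it becomes a CH3 group. A small stdlib-only sketch makes the arithmetic explicit; FormulaDiff is a hypothetical helper that handles only plain formulas without brackets, charges or isotopes.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not CDK code.
public class FormulaDiff {

    // An element symbol (capital + optional lowercase) followed by an optional count.
    private static final Pattern TOKEN = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    // Parses a plain formula such as "C31H49Cl2PRu" into element counts.
    public static Map<String, Integer> parse(String formula) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = TOKEN.matcher(formula);
        while (m.find()) {
            int n = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), n, Integer::sum);
        }
        return counts;
    }

    // Element-wise a - b, dropping elements whose difference is zero.
    public static Map<String, Integer> subtract(String a, String b) {
        Map<String, Integer> diff = new TreeMap<>(parse(a));
        parse(b).forEach((el, n) -> diff.merge(el, -n, Integer::sum));
        diff.values().removeIf(v -> v == 0);
        return diff;
    }

    public static void main(String[] args) {
        // CDK result minus ChemDraw result, both quoted in the original question.
        System.out.println(subtract("C31H49Cl2PRu", "C30H46Cl2PRu")); // {C=1, H=3}
    }
}
```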

Kind regards, 

Chris

— 
Prof. Dr. Christoph Steinbeck
Analytical Chemistry - Cheminformatics and Chemometrics
Friedrich-Schiller-University Jena, Germany
Phone Secretariat: +49-3641-948171
http://cheminf.uni-jena.de
http://orcid.org/-0001-6966-0814

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..





> On 2. Dec 2020, at 14:38, Stesycki, Manuel  
> wrote:


[Cdk-user] Wrong molecular formula?

2020-12-02 Thread Stesycki, Manuel
Dear CDK users,

We are using CDK version 2.3 in our application.
When a user tried to add a structure (see attachment), we found a discrepancy in
the molecular formula of the structure.

The original structure was drawn with ChemDraw 18.
A multi-center attachment was added to the structure, and ChemDraw shows this
molecular formula: C30H46Cl2PRu

Our application, however, takes the MOL version of the CDX file and computes this
formula: C31H49Cl2PRu
To get the formula, we use this piece of code:

// mol is the IAtomContainer read from the MOL file
IMolecularFormula form = MolecularFormulaManipulator.getMolecularFormula(mol);
String sumFormula = MolecularFormulaManipulator.getString(form);
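Both formula strings quoted here are written in Hill order: C first, then H, then the remaining elements alphabetically (Cl, P, Ru). As a stdlib-only illustration of that convention (not CDK's actual implementation of MolecularFormulaManipulator.getString), here is a minimal sketch that formats element counts this way. HillFormula is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not CDK code.
public class HillFormula {

    // Builds a Hill-ordered formula string from element counts:
    // with carbon present, C first, then H, then the rest alphabetically;
    // without carbon, every element (H included) is alphabetical.
    public static String format(Map<String, Integer> counts) {
        List<String> order = new ArrayList<>();
        if (counts.getOrDefault("C", 0) > 0) {
            order.add("C");
            if (counts.getOrDefault("H", 0) > 0) order.add("H");
        }
        for (String el : new TreeMap<>(counts).keySet()) {
            if (!order.contains(el)) order.add(el);
        }
        StringBuilder sb = new StringBuilder();
        for (String el : order) {
            int n = counts.get(el);
            if (n == 0) continue;  // skip elements with a zero count
            sb.append(el);
            if (n > 1) sb.append(n); // a count of 1 is implicit
        }
        return sb.toString();
    }
}
```

For example, counts of C=30, H=46, Cl=2, P=1 and Ru=1 format to C30H46Cl2PRu, the ChemDraw formula above.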

Did we miss something when creating the AtomContainer?
We create the AtomContainer directly by parsing the MOL file:
try (StringReader sr = new StringReader(molFile);
     MDLV2000Reader mr = new MDLV2000Reader(sr, mode)) {
    // read() parses the MOL block and returns the populated container
    AtomContainer mol = new AtomContainer();
    AtomContainer ac = mr.read(mol);
}

Maybe someone can give us a hint as to what we are doing wrong.

Best regards,
   Manuel Stesycki

IT
   0208 / 306-2146
   Physikbau, Büro 117
   stesy...@mpi-muelheim.mpg.de

Max-Planck-Institut für Kohlenforschung
   Kaiser-Wilhelm-Platz 1
   D-45470 Mülheim an der Ruhr
   http://www.kofo.mpg.de/de



structure.cdx
Description: structure.cdx


structure.mol
Description: structure.mol