Here I was getting ready to write a long explanation, but now I see it's
not needed. Thanks Jan!

-greg



On Fri, Aug 22, 2014 at 9:07 PM, Michał Nowotka <mmm...@gmail.com> wrote:

> Hi Jan,
>
> Yes, thanks to Greg's hints I've implemented this:
>
> https://github.com/mnowotka/chembl_beaker/blob/master/chembl_beaker/beaker/core_apps/marvin/MarvinJSONEncoder.py#L292
>
> Anyway, thank you for finding the actual code. It was definitely worth
> taking a look; the whole parser implementation is interesting.
>
> Cheers,
> Michał
>
> On Fri, Aug 22, 2014 at 7:36 PM, Jan Holst Jensen <j...@biochemfusion.com>
> wrote:
> > On 2014-08-22 10:38, Michał Nowotka wrote:
> >
> > A question I have is why you want to access the bond wedging.
> >
> > [...] Now imagine I only have this molfile and I want to convert it back
> > to *mrv. I don't want to write my own parser for molfiles when I know
> > that RDKit can already parse it. But I need to extract this 'bond
> > stereo' information from within RDKit somehow.
> >
> > Now when you say that this '1' or 'W' value corresponds to bond
> > direction, I'm guessing that 'direction' can store only two values, up
> > and down, so '1' and '6' ('W' and 'H' in Marvin terms). So what about
> > the other values this field can have? If, for example, I have this
> > molfile:
> >
> >
> >
> >  10 10  0  0  0  0  0  0  0  0999 V2000
> >    -1.6741   -0.2687    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >    -2.3885   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >    -2.3885   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >    -1.6741   -1.9188    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >    -0.9596   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >    -0.9596   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >    -0.2451   -0.2686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >    -0.2451    0.5563    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
> >     0.4692   -0.6811    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >     0.4692   -1.5061    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> >   1  2  2  0  0  0  0
> >   2  3  1  0  0  0  0
> >   3  4  2  0  0  0  0
> >   4  5  1  0  0  0  0
> >   5  6  2  0  0  0  0
> >   6  1  1  0  0  0  0
> >   6  7  1  0  0  0  0
> >   7  9  1  0  0  0  0
> >   9 10  1  0  0  0  0
> >   7  8  1  4  0  0  0
> > M  END
> >
> > So with 4 instead of 1, how will I get this information from RDKit?
> >
> >
> >
> > Hi Michal,
> >
> > I took a look at the C++ code in GraphMol/FileParsers/MolFileParser.cpp.
> >
> > ParseMolFileBondLine() for parsing V2000 molfiles sets the BondDir to
> > UNKNOWN (case 4, bond stereo type = 4):
> >
> >           stereo = FileParserUtils::toInt(text.substr(9,3));
> >           switch(stereo){
> >           case 0:
> >             res->setBondDir(Bond::NONE);
> >             break;
> >           case 1:
> >             res->setBondDir(Bond::BEGINWEDGE);
> >             break;
> >           case 6:
> >             res->setBondDir(Bond::BEGINDASH);
> >             break;
> >           case 3: // "either" double bond
> >             res->setBondDir(Bond::EITHERDOUBLE);
> >             res->setStereo(Bond::STEREOANY);
> >             break;
> >           case 4: // "either" single bond
> >             res->setBondDir(Bond::UNKNOWN);
> >             break;
> >           }
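
For readers who want to check the mapping without building RDKit, the V2000 switch quoted above can be sketched in plain Python. This is only an illustration: the table and function names are hypothetical (not RDKit API), and the fallback to NONE for codes the switch does not list is an assumption, since the quoted C++ has no default branch.

```python
# Sketch only: mirrors the quoted V2000 switch in plain Python.
# The table and function names are hypothetical, not RDKit API.

# V2000 bond-stereo code -> name of the Bond::BondDir value the parser sets.
V2000_STEREO_TO_BONDDIR = {
    0: "NONE",
    1: "BEGINWEDGE",
    6: "BEGINDASH",
    3: "EITHERDOUBLE",  # "either" double bond (parser also sets STEREOANY)
    4: "UNKNOWN",       # "either" single bond
}


def bond_dir_name_from_v2000_line(line):
    """Return the BondDir name implied by one V2000 bond line.

    The stereo field is read from characters 9-11 (0-based), the same
    slice as text.substr(9,3) in the C++ code.
    """
    stereo = int(line[9:12])
    # Assumption: codes not handled by the quoted switch leave the bond
    # direction at NONE.
    return V2000_STEREO_TO_BONDDIR.get(stereo, "NONE")


# The last bond line of the molfile in the question has stereo code 4.
print(bond_dir_name_from_v2000_line("  7  8  1  4  0  0  0"))  # UNKNOWN
print(bond_dir_name_from_v2000_line("  6  7  1  0  0  0  0"))  # NONE
```
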
> >
> > In ParseV3000BondBlock() for V3000 molfiles the same thing happens, so
> > they agree (case 2, CFG=2, bond type = single (1)):
> >
> >           if(prop=="CFG"){
> >             unsigned int cfg=atoi(val.c_str());
> >             switch(cfg){
> >             case 0: break;
> >             case 1:
> >               bond->setBondDir(Bond::BEGINWEDGE);
> >               chiralityPossible=true;
> >               break;
> >             case 2:
> >               if(bType==1) bond->setBondDir(Bond::UNKNOWN);
> >               else if(bType==2){
> >                 bond->setBondDir(Bond::EITHERDOUBLE);
> >                 bond->setStereo(Bond::STEREOANY);
> >               }
> >               break;
> >             case 3:
> >               bond->setBondDir(Bond::BEGINDASH);
> >               chiralityPossible=true;
> >               break;
> >             default:
> >               errout << "bad bond CFG "<<val<<"' on line "<<line;
> >               throw FileParseException(errout.str()) ;
> >             }
> >           } else if(prop=="TOPO"){
> >
> > The bonds will therefore be assigned a BondDir value of Bond::UNKNOWN for
> > single either bonds and Bond::EITHERDOUBLE for double either bonds.
> >
> > I read in a V2000 molfile where the second bond is a single either bond
> > (stereo bond type of 4) and the third bond is a double either bond
> > (stereo bond type of 3).
> >
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromMolFile("C:/temp/either.mol", sanitize=False,
> > ...                         removeHs=False)
> > >>> for b in m.GetBonds(): print(b.GetBondDir())
> > ...
> > NONE
> > UNKNOWN
> > 5
> > NONE
> > NONE
> > NONE
> > >>>
> >
> >
> > The only slight surprise is that Python returns a "5" instead of an
> > "EITHERDOUBLE" string.
> >
> > >>> Chem.rdchem.BondDir.values
> > {0: rdkit.Chem.rdchem.BondDir.NONE,
> >  1: rdkit.Chem.rdchem.BondDir.BEGINWEDGE,
> >  2: rdkit.Chem.rdchem.BondDir.BEGINDASH,
> >  3: rdkit.Chem.rdchem.BondDir.ENDDOWNRIGHT,
> >  4: rdkit.Chem.rdchem.BondDir.ENDUPRIGHT,
> >  6: rdkit.Chem.rdchem.BondDir.UNKNOWN}
> > >>>
> >
> > For some reason Python does not map the BondDir value 5 to a name. But
> > the value does match EITHERDOUBLE's implicit ordinal value defined in
> > GraphMol/Bond.h, so it matches what I expect from reading the parser
> > code:
> >
> >     //! the bond's direction (for chirality)
> >     typedef enum {
> >       NONE=0,         //!< no special style
> >       BEGINWEDGE,     //!< wedged: narrow at begin
> >       BEGINDASH,      //!< dashed: narrow at begin
> >       // FIX: this may not really be adequate
> >       ENDDOWNRIGHT,   //!< for cis/trans
> >       ENDUPRIGHT,     //!<  ditto
> >       EITHERDOUBLE,   //!< a "crossed" double bond
> >       UNKNOWN,        //!< intentionally unspecified stereochemistry
> >     } BondDir;
> >
> > So the information is retained in GetBondDir() as long as you don't
> > sanitize.
> >
> > Cheers
> > -- Jan
>
>
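
Going the other way, from a parsed bond back to a molfile stereo code, could look roughly like the sketch below. It inverts the V2000 switch quoted in the thread and uses the ordinals implied by the BondDir enum in GraphMol/Bond.h. The constant, table, and function names are hypothetical, and mapping ENDDOWNRIGHT/ENDUPRIGHT to 0 is an assumption, since the quoted parser code has no V2000 stereo code for them.

```python
# Sketch only: constant, table, and function names are hypothetical,
# not RDKit API.

# Ordinals as implied by the BondDir enum quoted from GraphMol/Bond.h.
BONDDIR_NONE = 0
BONDDIR_BEGINWEDGE = 1
BONDDIR_BEGINDASH = 2
BONDDIR_EITHERDOUBLE = 5
BONDDIR_UNKNOWN = 6

# Inverse of the quoted V2000 switch: BondDir ordinal -> stereo code.
BONDDIR_TO_V2000_STEREO = {
    BONDDIR_NONE: 0,
    BONDDIR_BEGINWEDGE: 1,
    BONDDIR_BEGINDASH: 6,
    BONDDIR_EITHERDOUBLE: 3,
    BONDDIR_UNKNOWN: 4,
}


def v2000_stereo_from_bond_dir(bond_dir):
    """Map a BondDir value (or its integer ordinal) to a V2000 stereo code."""
    # Working with int() avoids depending on the enum name, which was
    # missing for value 5 in the session shown in the thread.
    # Assumption: directions without a V2000 code (ENDDOWNRIGHT,
    # ENDUPRIGHT) are written as 0.
    return BONDDIR_TO_V2000_STEREO.get(int(bond_dir), 0)


# Ordinals for the six bonds printed in the session: NONE, UNKNOWN, 5, NONE...
print([v2000_stereo_from_bond_dir(d) for d in (0, 6, 5, 0, 0, 0)])
# [0, 4, 3, 0, 0, 0]
```

With RDKit loaded, the same function would presumably be called as v2000_stereo_from_bond_dir(b.GetBondDir()) for each b in m.GetBonds(), assuming the enum value converts cleanly with int().
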
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds.  Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>