Here I was getting ready to write a long explanation, but now I see it's
not needed. Thanks Jan!
-greg
On Fri, Aug 22, 2014 at 9:07 PM, Michał Nowotka <mmm...@gmail.com> wrote:
> Hi Jan,
>
> Yes, thanks to Greg's hints I've implemented this
>
> https://github.com/mnowotka/chembl_beaker/blob/master/chembl_beaker/beaker/core_apps/marvin/MarvinJSONEncoder.py#L292
> .
> Anyway, thank you for finding actual code, it was definitely worth
> taking a look, the whole parser implementation is interesting.
>
> Cheers,
> Michał
>
> On Fri, Aug 22, 2014 at 7:36 PM, Jan Holst Jensen <j...@biochemfusion.com>
> wrote:
> > On 2014-08-22 10:38, Michał Nowotka wrote:
> >
> > A question I have is why you want to access the bond wedging.
> >
> > [...] Now imagine I only have this molfile and I want to convert it
> > back to *mrv. I don't want to write my own parser for molfiles when I
> > know that RDKit can already parse it. But I need to extract this 'bond
> > stereo' information from within RDKit somehow.
> >
> > Now when you say that this '1' or 'W' value corresponds to bond
> > direction, I'm guessing that 'direction' can store only two values: up
> > and down, so '1' and '6' ('W' and 'H' in Marvin terms). So what about
> > the other values this field can have? If, for example, I have this
> > molfile:
> >
> >
> >
> > 10 10 0 0 0 0 0 0 0 0999 V2000
> > -1.6741 -0.2687 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > -2.3885 -0.6812 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > -2.3885 -1.5063 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > -1.6741 -1.9188 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > -0.9596 -1.5063 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > -0.9596 -0.6812 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > -0.2451 -0.2686 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > -0.2451 0.5563 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
> > 0.4692 -0.6811 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > 0.4692 -1.5061 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
> > 1 2 2 0 0 0 0
> > 2 3 1 0 0 0 0
> > 3 4 2 0 0 0 0
> > 4 5 1 0 0 0 0
> > 5 6 2 0 0 0 0
> > 6 1 1 0 0 0 0
> > 6 7 1 0 0 0 0
> > 7 9 1 0 0 0 0
> > 9 10 1 0 0 0 0
> > 7 8 1 4 0 0 0
> > M END
> >
> > So with 4 instead of 1, how will I get this information from RDKit?
> >
> >
> >
> > Hi Michal,
> >
> > I took a look at the C++ code in GraphMol/FileParsers/MolFileParser.cpp.
> >
> > ParseMolFileBondLine() for parsing V2000 molfiles sets the BondDir to
> > UNKNOWN (case 4, bond stereo type = 4):
> >
> >   stereo = FileParserUtils::toInt(text.substr(9,3));
> >   switch(stereo){
> >   case 0:
> >     res->setBondDir(Bond::NONE);
> >     break;
> >   case 1:
> >     res->setBondDir(Bond::BEGINWEDGE);
> >     break;
> >   case 6:
> >     res->setBondDir(Bond::BEGINDASH);
> >     break;
> >   case 3: // "either" double bond
> >     res->setBondDir(Bond::EITHERDOUBLE);
> >     res->setStereo(Bond::STEREOANY);
> >     break;
> >   case 4: // "either" single bond
> >     res->setBondDir(Bond::UNKNOWN);
> >     break;
> >   }
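
For readers without the C++ source at hand, the lookup in the switch above can be sketched in plain Python. This is only an illustration, not RDKit code: the function name `bond_dir_from_v2000_line` and the dictionary are hypothetical, and the slice `text[9:12]` simply mirrors `text.substr(9,3)`. It assumes the bond line still has its fixed-width V2000 columns (the mail client collapsed the whitespace in the molfile quoted earlier), and it assumes that stereo codes the switch does not list leave the direction at NONE.

```python
# Hypothetical lookup mirroring the switch in ParseMolFileBondLine().
V2000_STEREO_TO_BONDDIR = {
    0: "NONE",
    1: "BEGINWEDGE",
    6: "BEGINDASH",
    3: "EITHERDOUBLE",  # "either" double bond (the parser also sets STEREOANY)
    4: "UNKNOWN",       # "either" single bond
}


def bond_dir_from_v2000_line(text):
    """Return the BondDir name for one fixed-width V2000 bond line."""
    field = text[9:12].strip()  # same columns as text.substr(9, 3)
    stereo = int(field) if field else 0
    # Assumption: codes the switch does not list leave the direction at NONE.
    return V2000_STEREO_TO_BONDDIR.get(stereo, "NONE")


# Bond 7-8 from the molfile in the question, with its columns restored.
print(bond_dir_from_v2000_line("  7  8  1  4  0  0  0"))
```

The point of the sketch is only that stereo code 4 is not dropped by the parser: it lands in its own BondDir value.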
> >
> > In ParseV3000BondBlock() for V3000 molfiles the same thing happens, so
> > they agree (case 2, CFG=2, bond type = single (1)):
> >
> >   if(prop=="CFG"){
> >     unsigned int cfg=atoi(val.c_str());
> >     switch(cfg){
> >     case 0: break;
> >     case 1:
> >       bond->setBondDir(Bond::BEGINWEDGE);
> >       chiralityPossible=true;
> >       break;
> >     case 2:
> >       if(bType==1) bond->setBondDir(Bond::UNKNOWN);
> >       else if(bType==2){
> >         bond->setBondDir(Bond::EITHERDOUBLE);
> >         bond->setStereo(Bond::STEREOANY);
> >       }
> >       break;
> >     case 3:
> >       bond->setBondDir(Bond::BEGINDASH);
> >       chiralityPossible=true;
> >       break;
> >     default:
> >       errout << "bad bond CFG "<<val<<"' on line "<<line;
> >       throw FileParseException(errout.str()) ;
> >     }
> >   } else if(prop=="TOPO"){
> >
> > The bonds will therefore be assigned a BondDir value of Bond::UNKNOWN for
> > single either bonds and Bond::EITHERDOUBLE for double either bonds.
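
The V3000 branch depends on two inputs (the CFG value and the bond type), which is easy to miss when skimming the C++. A minimal Python sketch of the same decision follows. The function name `bond_dir_from_v3000_cfg` is hypothetical, the returned strings are just the BondDir names from the excerpt, and returning NONE for CFG=2 on a bond that is neither single nor double is an assumption (the quoted code sets nothing in that case).

```python
def bond_dir_from_v3000_cfg(cfg, bond_type):
    """Return the BondDir name for a V3000 CFG value and bond type."""
    if cfg == 0:
        return "NONE"
    if cfg == 1:
        return "BEGINWEDGE"
    if cfg == 2:
        if bond_type == 1:
            return "UNKNOWN"       # "either" single bond
        if bond_type == 2:
            return "EITHERDOUBLE"  # "either" double bond
        return "NONE"  # assumption: other bond types are left untouched
    if cfg == 3:
        return "BEGINDASH"
    # Mirrors the default branch, which throws a FileParseException.
    raise ValueError("bad bond CFG %r" % (cfg,))


for cfg, bond_type in [(2, 1), (2, 2), (3, 1)]:
    print(cfg, bond_type, bond_dir_from_v3000_cfg(cfg, bond_type))
```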
> >
> > I read in a V2000 molfile where the second bond is a single either bond
> > (stereo bond type of 4) and the third bond is a double either bond
> > (stereo bond type of 3).
> >
> >>>> from rdkit import Chem
> >>>> m = Chem.MolFromMolFile("C:/temp/either.mol", sanitize=False, removeHs=False)
> >>>> for b in m.GetBonds(): print b.GetBondDir()
> > ...
> > NONE
> > UNKNOWN
> > 5
> > NONE
> > NONE
> > NONE
> >>>>
> >
> >
> > The only slight surprise is that Python returns a "5" instead of an
> > "EITHERDOUBLE" string.
> >
> >>>> Chem.rdchem.BondDir.values
> > {0: rdkit.Chem.rdchem.BondDir.NONE,
> >  1: rdkit.Chem.rdchem.BondDir.BEGINWEDGE,
> >  2: rdkit.Chem.rdchem.BondDir.BEGINDASH,
> >  3: rdkit.Chem.rdchem.BondDir.ENDDOWNRIGHT,
> >  4: rdkit.Chem.rdchem.BondDir.ENDUPRIGHT,
> >  6: rdkit.Chem.rdchem.BondDir.UNKNOWN}
> >>>>
> >
> > For some reason Python does not map the BondDir value 5 to a name. But
> > the value does match EITHERDOUBLE's implicit ordinal value defined in
> > GraphMol/Bond.h, so it matches what I expect from reading the parser
> > code:
> >
> >   //! the bond's direction (for chirality)
> >   typedef enum {
> >     NONE=0,         //!< no special style
> >     BEGINWEDGE,     //!< wedged: narrow at begin
> >     BEGINDASH,      //!< dashed: narrow at begin
> >     // FIX: this may not really be adequate
> >     ENDDOWNRIGHT,   //!< for cis/trans
> >     ENDUPRIGHT,     //!< ditto
> >     EITHERDOUBLE,   //!< a "crossed" double bond
> >     UNKNOWN,        //!< intentionally unspecified stereochemistry
> >   } BondDir;
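
To make the "implicit ordinal" argument concrete, the enum above can be mirrored in Python, numbering the names consecutively from 0 the way the C++ compiler does. This is a stand-alone sketch using the standard `enum` module, not RDKit's own wrapper. The class name `BondDir` here is a local stand-in, and the name order is copied from the Bond.h excerpt.

```python
from enum import IntEnum

# Names in the order they appear in the Bond.h excerpt; start=0 reproduces
# the implicit numbering that follows NONE=0 in the C++ enum.
BondDir = IntEnum(
    "BondDir",
    [
        "NONE",
        "BEGINWEDGE",
        "BEGINDASH",
        "ENDDOWNRIGHT",
        "ENDUPRIGHT",
        "EITHERDOUBLE",
        "UNKNOWN",
    ],
    start=0,
)

# Look up the name behind the bare 5 printed in the session.
print(BondDir(5).name)
print(int(BondDir.UNKNOWN))
```

With this numbering the bare 5 resolves to EITHERDOUBLE and UNKNOWN sits at 6, which agrees with the `BondDir.values` listing.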
> >
> > So the information is retained in GetBondDir() as long as you don't
> > sanitize.
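
For the original goal (writing the bond stereo field back out), the parser's mapping can be inverted. The sketch below is hypothetical and not part of RDKit or chembl_beaker. It assumes that `int()` applied to the value returned by `GetBondDir()` yields the ordinal from Bond.h, as the bare 5 in the session suggests, and that directions with no molfile counterpart are written as 0. The `FakeBond` class is a stand-in so the example runs without RDKit.

```python
# Hypothetical reverse lookup: BondDir ordinal (from Bond.h) -> V2000 stereo code.
BONDDIR_TO_V2000_STEREO = {
    0: 0,  # NONE
    1: 1,  # BEGINWEDGE
    2: 6,  # BEGINDASH
    5: 3,  # EITHERDOUBLE, the value Python printed as a bare 5
    6: 4,  # UNKNOWN ("either" single bond)
}


def v2000_stereo_codes(bonds):
    """Yield a V2000 stereo code for each bond exposing GetBondDir()."""
    for bond in bonds:
        ordinal = int(bond.GetBondDir())
        # Assumption: directions with no molfile counterpart
        # (ENDDOWNRIGHT, ENDUPRIGHT) are written as 0.
        yield BONDDIR_TO_V2000_STEREO.get(ordinal, 0)


class FakeBond:
    """Stand-in for an RDKit bond so the sketch runs without RDKit."""

    def __init__(self, ordinal):
        self._ordinal = ordinal

    def GetBondDir(self):
        return self._ordinal


# Directions printed in the session: NONE, UNKNOWN, 5, NONE, NONE, NONE
session = [FakeBond(n) for n in (0, 6, 5, 0, 0, 0)]
print(list(v2000_stereo_codes(session)))
```

With a real molecule the same generator could be fed `m.GetBonds()`. Working from the integer value rather than the printed name sidesteps the missing EITHERDOUBLE name in the Python wrapper.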
> >
> > Cheers
> > -- Jan
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds. Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>