> On Oct 21, 2021, at 04:50, Ling Chan <lingtrek...@gmail.com> wrote:
>
> I got the attached sdf. When I did a MolToSmiles, it gives me the following.
>
> >>> for m in Chem.SDMolSupplier("pdb_structures/1q6k_ligand.sdf"):
> ...     print (Chem.MolToSmiles(m))
> ...
> [CH3:0][C:0]([CH3:0])([CH3:0])[O:0][C:0](=[O:0])[NH:0][CH:0]([CH:0]=[O:0])[CH:0]1[CH2:0][CH2:0][CH2:0][CH2:0][CH2:0]1
>
> Just wonder why does it not give something like
> O=C(OC(C)(C)C)NC(C=O)C1CCCCC1
The terms after the atom symbol in your atom block lines are center-justified
(or left-justified, in the 2-digit mass difference term 'dd') instead of
right-justified.
Here's a comparison of your first atom line, compared with the ctfile spec, and
then compared with the round-trip through RDKit:
   74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0    <-- yours
xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee   <-- spec
   74.0060   -9.5770  134.8660 N   0  0  0  0  0  0  0  0  0  0  0  0   <-- RDKit
Add a space after the atom symbol field ("aaa") and everything works.
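Here is a rough Python sketch of that fix. The helper name pad_atom_symbol is made up for illustration, and it assumes the line is exactly one column short right after the atom symbol, as in your first atom line. It is not a general SDF repair tool.

```python
def pad_atom_symbol(line: str) -> str:
    """Insert one space after the atom symbol of a V2000 atom line.

    Assumption: the symbol starts at 0-based index 31 and every later
    field is shifted one column to the left.
    """
    return line[:33] + " " + line[33:]


# The first atom line from the file: 68 columns instead of 69.
yours = "   74.0060   -9.5770  134.8660 N" + "  0" * 12
fixed = pad_atom_symbol(yours)

print(len(yours), len(fixed))  # 68 69
print(repr(fixed[60:63]))      # '  0'
```

After the padding, the "mmm" slice reads as a right-justified zero.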
What happened?
The ":0" in the SMILES string derives from the atom-atom mapping number, "mmm",
in the SDF.
The relevant code from
Code/GraphMol/FileParsers/MolFileParser.cpp::ParseMolFileAtomLine() is:
  if (text.size() >= 63 && text.substr(60, 3) != "  0") {
    int atomMapNumber = 0;
    try {
      atomMapNumber = FileParserUtils::toInt(text.substr(60, 3), true);
    } catch (boost::bad_lexical_cast &) {
      std::ostringstream errout;
      errout << "Cannot convert '" << text.substr(60, 3) << "' to int on line "
             << line;
      delete res;
      throw FileParseException(errout.str());
    }
    res->setProp(common_properties::molAtomMapNumber, atomMapNumber);
  }
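This says that if the field in columns 61-63 isn't exactly "  0" (two spaces
and a zero) then parse it as an integer and store it in the atom's
molAtomMapNumber.
Because your fields are shifted one column to the left, that slice of your
line reads " 0 ". Since " 0 " isn't exactly "  0", it gets converted into the
atom map value of 0.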
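To see the difference without RDKit, here is a small Python paraphrase of that check. It is my own sketch, not RDKit code: the function name atom_map_field is hypothetical, and Python's int() stands in for FileParserUtils::toInt().

```python
def atom_map_field(line: str):
    """Return None when the 'mmm' slice is exactly '  0' (or missing),
    otherwise the slice parsed as an integer."""
    field = line[60:63]
    if len(line) < 63 or field == "  0":
        return None       # no atom map property would be set
    return int(field)     # int() ignores surrounding whitespace


prefix = "   74.0060   -9.5770  134.8660 N"
yours = prefix + "  0" * 12        # 68 columns, fields shifted left by one
rdkit = prefix + " " + "  0" * 12  # 69 columns, right-justified fields

print(repr(yours[60:63]), atom_map_field(yours))  # ' 0 ' 0
print(repr(rdkit[60:63]), atom_map_field(rdkit))  # '  0' None
```

An explicitly stored map number of 0 is what ends up as ":0" in the SMILES.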
I don't see an explicit statement in the spec about alignment in fields. It's
clear the spec comes from a Fortran background, so these should be interpreted
as "I2" and "I3", and right-justified.
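In Python terms, Fortran-style "I2" and "I3" fields correspond to the format specs "2d" and "3d", which right-justify by default. A minimal sketch of writing an atom line that way follows. The function name format_atom_line and its signature are invented for this example, and it only handles the all-integer fields shown in the spec line above.

```python
def format_atom_line(x, y, z, symbol, fields=(0,) * 12):
    """Format a V2000 atom line with right-justified integer fields.

    fields holds dd followed by ccc, sss, ..., eee (12 values in total).
    """
    dd, *rest = fields
    return (
        f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3s}{dd:2d}"
        + "".join(f"{value:3d}" for value in rest)
    )


line = format_atom_line(74.006, -9.577, 134.866, "N")
print(repr(line))
print(len(line))  # 69
```

The atom symbol is the one left-justified field. Everything after it is padded on the left.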
By the way, if you pass your file through CDK you get:
org.openscience.cdk.io.MDLV2000Reader ERROR: Error while parsing line 5:
74.0060 -9.5770 134.8660 N 0 0 0 0 0 0 0 0 0 0 0 0 -> invalid
line length, 68: 74.0060 -9.5770 134.8660 N 0 0 0 0 0 0 0 0 0 0
0 0
org.openscience.cdk.io.iterator.IteratingSDFReader ERROR: Error while reading
next molecule: invalid line length, 68: 74.0060 -9.5770 134.8660 N 0 0
0 0 0 0 0 0 0 0 0 0
CDK's
storage/ctab/src/main/java/org/openscience/cdk/io/MDLV2000Reader.java::readAtomFast()
requires that either all characters of a field be present, or the end of line.
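As an illustration of that rule as I've described it (this is not CDK's code, and the names FIELD_ENDS and ends_on_field_boundary are mine), a line passes only if it stops exactly at the end of some field. The sketch assumes the atom symbol field must always be present.

```python
# End offsets of each field in an atom line: three 10-wide coordinates,
# one blank and the 3-wide symbol (34), the 2-wide dd field (36), then
# eleven 3-wide fields ending at column 69.
FIELD_ENDS = [34, 36] + [36 + 3 * k for k in range(1, 12)]


def ends_on_field_boundary(line: str) -> bool:
    """True if the line stops exactly at the end of a field."""
    return len(line) in FIELD_ENDS


prefix = "   74.0060   -9.5770  134.8660 N"
print(ends_on_field_boundary(prefix + "  0" * 12))        # False (68 columns)
print(ends_on_field_boundary(prefix + " " + "  0" * 12))  # True  (69 columns)
```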
Your line is 68 characters long because your last field is the two-character
" 0" instead of a full three-character field such as " 0 ", which is what is
needed to fill the exact change flag "eee".
Best regards,
Andrew
da...@dalkescientific.com