On 2015-04-29 23:08, Greg Landrum wrote:
> Here are my thoughts on this:
> The RDKit is usually strict while parsing molecules from SDF, SMILES, or
> other formats.
My point was that given
'''
> <my_property2>
1234

> <my_property3>
'''
a lexer shouldn't have a problem recognizing the 2 tags. A lenient parser
would return the stuff in between as the value: "1234\n\n"

> There are exceptions to this: the RDKit ignores the limit on line length
> while reading SDFs: there's no chance of confusion here, so I believe
> it's safe to do so.

Similarly, a lenient parser could ignore the line length and value length
limits.

> I still need to put some thought into patching the SDWriter so that it
> can recognize things like consecutive line endings in property values.
> The big question is what it should do when it encounters such a case. Is
> that an error? Should it just write the output up to the blank line?

A conservative writer should never write out "1234\n\n". Squash the
multiple newlines. And/or give it a "strict" flag that makes it error out
instead.

I'm sure Andrew's seen a lot of badly broken SDFs. It doesn't mean you
can't handle the ones you can unambiguously parse.

Dimitri
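For illustration, here is a minimal Python sketch of the two ideas above: a
lenient reader that returns everything between two data headers as the value,
and a conservative writer helper that squashes blank lines or, with a "strict"
flag, errors out. This is not RDKit code. The names parse_data_items_lenient
and clean_value_for_writing are hypothetical, and the sketch assumes data
headers of the "> <name>" form only.

```python
import re

# Matches an SD data header line such as "> <my_property2>".
# Assumption: only the "<name>" form is recognized. A value line that itself
# looks like a header would still be ambiguous and is treated as a header.
TAG_RE = re.compile(r"^>.*<([^>]+)>")


def parse_data_items_lenient(text):
    """Return {tag: raw value}; the value is everything between two headers."""
    items = {}
    tag = None
    buf = []
    for line in text.splitlines(keepends=True):
        match = TAG_RE.match(line)
        if match:
            # A new header closes the previous data item, blank lines and all.
            if tag is not None:
                items[tag] = "".join(buf)
            tag, buf = match.group(1), []
        elif line.startswith("$$$$"):
            # End-of-record delimiter.
            break
        elif tag is not None:
            # No limit on line length or value length is enforced here.
            buf.append(line)
    if tag is not None:
        items[tag] = "".join(buf)
    return items


def clean_value_for_writing(value, strict=False):
    """Drop blank lines from a property value, or raise if strict is set."""
    lines = value.splitlines()
    kept = [ln for ln in lines if ln.strip()]
    if strict and len(kept) != len(lines):
        raise ValueError("property value contains a blank line")
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "> <my_property2>\n1234\n\n> <my_property3>\n"
    items = parse_data_items_lenient(sample)
    print(repr(items["my_property2"]))
    print(repr(clean_value_for_writing(items["my_property2"])))
```

With the sample above, the lenient reader keeps the trailing blank line in the
value, and the writer helper removes it again before output, so a value read
leniently never produces a blank line inside a written data item.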

