On 2015-04-29 23:08, Greg Landrum wrote:
> Here are my thoughts on this:
> The RDKit is usually strict while parsing molecules from SDF, SMILES, or
> other formats.

My point was that given
'''
 >  <my_property2>
1234

 >  <my_property3>
'''
a lexer shouldn't have a problem recognizing the 2 tags. A lenient 
parser would return the stuff in between as the value: "1234\n\n"
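
A rough sketch of that idea in Python, to make it concrete. This is not 
RDKit code: the function name lenient_data_items and the regular 
expression are made up for illustration, and it assumes that a data 
header is any line starting with ">" that contains a <tag>. A value line 
that happens to look like a header would still be misread.

```python
import re

# Hypothetical pattern for an SDF data header line such as " >  <my_property2>".
TAG_RE = re.compile(r"^\s*>.*<([^>]+)>")

def lenient_data_items(text):
    """Return {tag: raw value}; the value is everything between one
    data header and the next header (or the end of the record)."""
    items = {}
    tag = None
    buf = []
    for line in text.splitlines(keepends=True):
        if line.startswith("$$$$"):  # end-of-record delimiter
            break
        m = TAG_RE.match(line)
        if m:
            if tag is not None:
                items[tag] = "".join(buf)  # close the previous item
            tag = m.group(1)
            buf = []
        elif tag is not None:
            buf.append(line)  # blank lines are kept, not treated as terminators
    if tag is not None:
        items[tag] = "".join(buf)
    return items
```

With the snippet above as input, my_property2 comes back as "1234\n\n" 
and my_property3 as an empty string.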

> There are exceptions to this: the RDKit ignores the limit on line length
> while reading SDFs: there's no chance of confusion here, so I believe
> it's safe to do so.

Similarly, a lenient parser could ignore the line length and value 
length limits.

> I still need to put some thought into patching the SDWriter so that it
> can recognize things like consecutive line endings in property values.
> The big question is what it should do when it encounters such a case. Is
> that an error? Should it just write the output up to the blank line?

A conservative writer should never write out "1234\n\n". Squash the 
multiple newlines. And/or give it a "strict" flag that makes it error 
out instead.
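
Something along these lines, as a sketch only. The helper 
clean_property_value and its strict argument are hypothetical, not part 
of SDWriter, and it assumes that a line containing only whitespace 
counts as blank.

```python
def clean_property_value(value, strict=False):
    """Prepare a property value for writing as an SDF data item.

    Blank lines inside a value would end the data item early, so they
    are dropped. With strict=True they raise an error instead.
    """
    lines = value.splitlines()
    kept = [ln for ln in lines if ln.strip()]  # discard blank lines
    if strict and len(kept) != len(lines):
        raise ValueError("blank line inside property value")
    return "\n".join(kept)
```

By default "1234\n\n" is written as "1234"; with strict=True the same 
value raises ValueError.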

I'm sure Andrew's seen a lot of badly broken SDFs. It doesn't mean you 
can't handle the ones you can unambiguously parse.

Dimitri

