On Apr 29, 2015, at 9:19 PM, Dimitri Maziuk wrote:
> There is a difference between ACM members writing network protocols and
> "domain" people writing junk.

I think that you are saying that the MDL connection table file formats are junk. I do not disagree. But it's something we have to deal with, so my personal views matter little.

The MDL file formats are definitely not network protocols, but as you brought up Postel's Robustness Principle I thought you were suggesting that the principle applies more broadly than just network protocols.

And for what it's worth, I used to be an ACM member.

>> Yes, I agree with this. What constitutes "forbidden"?
>
> Simply put, the ones that lexer will match as "not values".

Certainly. My question is, what are the lexer rules? This is not so simple. Do they allow NUL? Do they allow "$$$$"?

Is the goal to handle the SD format exactly as specified? Or to be useful for preventing likely interoperability problems?

>> If there is an error, does the writer generate a partial record,
>
> My interpretation of "conservative" is wipe out the file then crash and
> burn. With a useful error message.

If the output is to a stream then there is no file to wipe.

If the downstream pipe consumer only processes the connection table, and does so at the "M END", then a partial record emitted by upstream code may be enough for the downstream code, which expects to receive valid data, to emit output for the incomplete record.

Thus, the only way to get what you want is to validate all of the fields before emitting any data. However, that adds performance overhead and is more complex to write.

There is also the "worse is better"/"New Jersey style" principle.

> If you define your lexical tokens properly, no problem. The problem is
> when lexer can't decide what's what.

Well, yes. A well-defined grammar is one of the recommendations for the "patched" version of the Robustness Principle. The problem is two-fold: 1) there is no unambiguous language definition for the SDF grammar (I've tried!), and 2) the documentation contains ambiguities on how to handle certain circumstances.

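As an aside on the "validate all of the fields before emitting any data" point, here is a minimal sketch of what that could look like. Everything in it is an assumption for illustration: `check_field`, `write_sd_record` and `SDFieldError` are hypothetical names, not part of RDKit or any other toolkit, and the rules (no NUL, no blank lines inside a value, no line starting with "$$$$", at most 200 characters per line) are one possible reading of the questions above, not a statement of what the SD format actually requires.

```python
import io

MAX_LINE = 200  # per-line limit for data values, as quoted from the spec


class SDFieldError(ValueError):
    """Hypothetical error type: a data field cannot be written safely."""


def check_field(tag, value):
    # Assumed rules only; the real format leaves several of these open.
    if "\0" in tag or "\0" in value:
        raise SDFieldError(f"{tag!r}: contains NUL")
    if not tag or any(c in tag for c in "<>\r\n"):
        raise SDFieldError(f"{tag!r}: unusable tag name")
    lines = value.split("\n") if value else []
    for line in lines:
        if line.strip() == "":
            raise SDFieldError(f"{tag!r}: blank line would end the data item")
        if line.startswith("$$$$"):
            raise SDFieldError(f"{tag!r}: line looks like a record separator")
        if len(line) > MAX_LINE:
            raise SDFieldError(f"{tag!r}: line longer than {MAX_LINE} characters")


def write_sd_record(out, molblock, data):
    # Validate every field first, so that a failure emits nothing at all.
    for tag, value in data.items():
        check_field(tag, value)
    # Build the whole record in memory, then hand it over in one write.
    parts = [molblock.rstrip("\n"), "\n"]
    for tag, value in data.items():
        parts.append(f">  <{tag}>\n{value}\n\n")
    parts.append("$$$$\n")
    out.write("".join(parts))
```

The cost is visible even in this toy version: every value is scanned once before anything is written, and the whole record is held in memory. The molblock itself is not checked at all here, which is where most of the real complexity would be.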
Examples of such ambiguities: 1) can the 'S SKP' field be used to skip the 'M END'? Different Symyx tools give different answers. 2) Are the numeric fields all right-aligned? There was a problem where RDKit expected one alignment and another tool generated the other. RDKit now accepts either.

Or, the spec says of the title line:

  This line is unformatted, but like all other lines in a molfile
  cannot extend beyond column 80

while as you saw earlier, it also says:

  A [Data] value can extend over multiple lines containing up to
  200 characters each.

Which is normative?

I go back to the question of, what is the goal? Is it to prevent RDKit from being used to create ill-formatted SD files? If so, then there are many things to review. For example, the spec says:

  This line must not contain any of the reserved tags that identify
  any of the other CTAB file types such as $MDL (RGfile), $$$$ (SDfile
  record separator), $RXN (rxnfile), or $RDFILE (RDfile headers).

while RDKit allows arbitrary names in the title. (And I'm not even sure whether the spec allows "$$$$12345" or "$MDL3".)

Your points are all valid, but I don't see how they are applicable given the circumstances.

What RDKit, Open Babel, OEChem, and others do is follow the New Jersey style and place a higher burden on API users, instead of spending rather a lot of development time implementing complex and rarely needed validation logic for a format that wasn't designed as an exchange file format and doesn't contain the mechanisms needed to follow the Robustness Principle.

I'm not convinced that they were wrong to do so.

Cheers,

Andrew
da...@dalkescientific.com

P.S.

  "XML in this example ... is written by a ball street wanker."

This slur is both gratuitous and wrong. The example XML was written by Tim Bray, who is not a Wall Street Banker, and the second example concerns EFTPOS messages. I do not wish to participate in discussions with remarks of this sort.

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss