On Oct 25, 2013, at 10:11 AM, Roger Sayle wrote: > The use of an integer file format "flavor" argument allows the caller > to customize the behavior of the readers and writers. The semantics > is that a reasonable default is zero (for all bits), but that new > features may be added without changing the API/ABI.
For some background, this is the API style used by OpenEye's high-level readers and writers. There's more explanation at: http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/molreadwrite.html#flavored-input-and-output It solves a difficult problem, which is that there is no such thing as "the PDB format." (For that matter, there are also variations of the MDL format, if only because the output writer could use V3000 format for all cases, vs. V3000 only when V2000 can't support the structure.) RDKit also supports different input and output flavors, though it uses parameter attributes, like sanitize=False or removeHs=False for reading an SD file. OEChem's interface is more generic, in that the single 'flavor' parameter exists for the high-level readers, which is easier to pass around in a C++ toolkit. (OTOH, this is less important for Python code. In chemfp, I just pass around a Python dictionary of kwargs and apply it like: SDMolSupplier(filename, **kwargs). ) However, these integer flags are tricky to use in practice. For example, if you see flavor=49, what does it mean? Few people will be able to look at that number and know it's: bit 1 = Write MODEL/ENDMDL lines around each record bit 16 = Write MASTER record bit 32 = Write TER record For OEChem support, I ended up writing my own conversion routines between the integer and a string notation. After all, I would rather people do: rdkit2fps input.pdb --flavor "MASTER|MODEL|TER" than have to do bitwise or-ing themselves for: rdkit2fps input.pdb --flavor 49 Bitflags also don't mix well with non-binary states. Consider an SD file writer which supports a three-state option: - only V2000 output (ignore or generate corrupt records otherwise?) - V3000 output if required, otherwise V2000 - always V3000 It's of course possible to encode this using 2 bits, but it loses some of its elegance. Think though of RDKit's SMILES file reader. It supports a 'delimiter' option, in order to support space, tab, comma, and I presume other delimiters as well. It also supports the ability to say that the SMILES come from something other than the first column, and the SMILES from other than the second. These are even harder to encode in a single flavor. BTW, OEChem doesn't support a delimiter option. Their 'SMILES file' comes from the Daylight practice of SMILES + whitespace + rest_of_line_as_title vs. the RDKit practice of assuming the file is a set of delimited columns, with a possible header. Above Roger said above that "a reasonable default is zero (for all bits), but that new features may be added without changing the API/ABI." Most file format work nicely with binary flags, as OEChem's practice well shows. Some do not, as RDKit's SMILES file format suggests. There are other possible APIs which can handle the requirement of supporting new features without changing the API/ABI. RDKit's current method, that of passing additional arguments to the function or constructor, is not scalable. I may have multiple layers before I get to the actual reader or writer, and I don't want to update the intermediate APIs every time something changes. I think it's very interesting that OEChem's new InChI support (added only recently, so Roger might not know about it), takes an InChIOptions object. http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/OEChemClasses/OEInChIOptions.html OEInChIOptions(unsigned int flavor = OEOFlavor::INCHI::Default) with methods like: .GetChiral() .GetFixedHLayer() ... .SetChiral() .SetFixedHLayer() ... I don't know why they switched to this style for this case. I wonder if part of it was to insulate themselves from any odd specifications InChI might add in the future. I prefer this style - an instance which contains the different parameters - though I haven't used it in earnest. This style too has difficulties, especially in C++. Ideally you want to support programs which support, say, version 2013 (without a given feature and associated method) and version 2014 (without). You can't do that in a language like C++ which requires all methods to be resolved in order for the program to run. The XMLReader API supports a 'getFeature(name)' and associated 'getProperty()'/'setProperty()', which might provide the right generic API. That said, you should read my email as commentary, and not as a statement for or against the current code. While I don't like it that much; without doubt, bit flags do work for this task. And because of C++ overloading, there's also a migration path to support an options class API like I promoted just now. Andrew da...@dalkescientific.com ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss