On Oct 25, 2013, at 10:11 AM, Roger Sayle wrote:
> The use of an integer file format "flavor" argument allows the caller  
> to customize the behavior of the readers and writers.  The semantics
> is that a reasonable default is zero (for all bits), but that new
> features may be added without changing the API/ABI.

For some background, this is the API style used by OpenEye's
high-level readers and writers. There's more explanation at:

http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/molreadwrite.html#flavored-input-and-output

It solves a difficult problem, which is that there is no
such thing as "the PDB format." (For that matter, there are
also variations of the MDL format, if only because the
output writer could use V3000 format for all cases, vs. V3000
only when V2000 can't support the structure.)

RDKit also supports different input and output flavors, though
it uses parameter attributes, like sanitize=False or
removeHs=False for reading an SD file.

OEChem's interface is more generic, in that the single 'flavor'
parameter exists for the high-level readers, which is easier
to pass around in a C++ toolkit.

(OTOH, this is less important for Python code. In chemfp, I
just pass around a Python dictionary of kwargs and apply
it like: SDMolSupplier(filename, **kwargs). )


However, these integer flags are tricky to use in practice.

For example, if you see flavor=49, what does it mean? Few
people will be able to look at that number and know it's:

  bit  1 = Write MODEL/ENDMDL lines around each record
  bit 16 = Write MASTER record
  bit 32 = Write TER record

For OEChem support, I ended up writing my own conversion
routines between the integer and a string notation. After
all, I would rather people do:

  rdkit2fps input.pdb --flavor "MASTER|MODEL|TER"

than have to do bitwise or-ing themselves for:

  rdkit2fps input.pdb --flavor 49


Bitflags also don't mix well with non-binary states.
Consider an SD file writer which supports a three-state option:
 - only V2000 output (ignore or generate corrupt records otherwise?)
 - V3000 output if required, otherwise V2000
 - always V3000

It's of course possible to encode this using 2 bits, but it
loses some of its elegance.

Think though of RDKit's SMILES file reader. It supports a
'delimiter' option, in order to support space, tab, comma,
and I presume other delimiters as well. It also supports
the ability to say that the SMILES come from something other
than the first column, and the SMILES from other than the
second.

These are even harder to encode in a single flavor.

BTW, OEChem doesn't support a delimiter option. Their 'SMILES
file' comes from the Daylight practice of

  SMILES + whitespace + rest_of_line_as_title

vs. the RDKit practice of assuming the file is a set of
delimited columns, with a possible header.


Above Roger said above that "a reasonable default is zero (for all
bits), but that new features may be added without changing
the API/ABI."

Most file format work nicely with binary flags, as OEChem's
practice well shows. Some do not, as RDKit's SMILES file
format suggests.

There are other possible APIs which can handle the requirement of
supporting new features without changing the API/ABI.

RDKit's current method, that of passing additional arguments
to the function or constructor, is not scalable. I may have
multiple layers before I get to the actual reader or writer,
and I don't want to update the intermediate APIs every time
something changes.

I think it's very interesting that OEChem's new InChI
support (added only recently, so Roger might not know about
it), takes an InChIOptions object.

http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/OEChemClasses/OEInChIOptions.html

OEInChIOptions(unsigned int flavor = OEOFlavor::INCHI::Default)

with methods like:
  .GetChiral()
  .GetFixedHLayer()
     ...
  .SetChiral()
  .SetFixedHLayer()
     ...

I don't know why they switched to this style for this case.
I wonder if part of it was to insulate themselves from any
odd specifications InChI might add in the future.

I prefer this style - an instance which contains the different
parameters - though I haven't used it in earnest.

This style too has difficulties, especially in C++. Ideally
you want to support programs which support, say, version 2013
(without a given feature and associated method) and version
2014 (without). You can't do that in a language like C++ which
requires all methods to be resolved in order for the program
to run.

The XMLReader API supports a 'getFeature(name)' and associated
'getProperty()'/'setProperty()', which might provide the right
generic API.

That said, you should read my email as commentary, and not
as a statement for or against the current code. While I don't
like it that much; without doubt, bit flags do work for this
task. And because of C++ overloading, there's also a migration
path to support an options class API like I promoted just now.


                                Andrew
                                da...@dalkescientific.com



------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to