Making such an API addition would commit us to correctly maintaining the
underlying information, and that's a maintenance task I'm not willing to
take on.

In addition, I don't personally think that this information adds anything
beyond reporting the version number. The nature of development code is that
its behaviour may change. This includes everything from interpreting SMILES
strings to writing fingerprints. If people use a development version for
scientific work and don't report the git revision (or even just the date),
that's a problem...but it's not our problem. In other words, fingerprints
generated with a development version of Open Babel (or anything else for
that matter) should not be considered stable. If a user pulls down a new
version they should update their fingerprints (and their canonical SMILES
strings, etc.).

Regards,
- Noel


On Tue, 8 Jan 2019, 15:40 Andrew Dalke <da...@dalkescientific.com> wrote:

> > On Jan 7, 2019, at 14:10, Noel O'Boyle <baoille...@gmail.com> wrote:
> >
> > Can you clarify the requirement for bumping the version? That is, which
> > of the following is the invariant:
> > 1. Any molecule represented in any format must create the same
> > fingerprint
> > 2. Any SMILES string must create the same fingerprint
> > 3. Any OBMol must create the same fingerprint
>
> I don't know how to answer that question. I think the answer is (2), but
> where "SMILES" is replaced with "input structure record".
>
> The idea of the version in the chemfp type string is to let people know if
> it's reasonable to use the same fingerprint data set after changing to a
> new version of a toolkit.
>
> For example, I might use Open Babel to generate MACCS fingerprints from a
> ChEMBL SD file, and the same version of Open Babel to convert a query
> SMILES into a query fingerprint to find the k=10 nearest neighbors. After a
> period of time I upgrade to a new version of Open Babel. I would like to get a
> warning if the generation method has changed enough that I should
> re-compute the MACCS keys.
>
> Or, someone may publish a paper which uses an Open Babel-generated FP2
> data set. I download the dataset and want to know if my installed version
> of Open Babel is likely compatible with it.
>
> My criterion hasn't been so strict as "any" change. For example, if the SD
> parser was changed to better support information which is in 1 out of every
> 100,000 PubChem records, and that change sometimes affects one bit of a
> fingerprint, then in principle the version number could be bumped.
>
> Usually that's below the threshold of noticeability. Fingerprints are
> blunt tools for comparing molecules, and we already expect some level of
> error when working with structure and fingerprint files.
>
> On the other hand, a change in 1% of the records seems like enough to bump
> the version number.
>
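To make the 1% rule of thumb above concrete, here is a minimal sketch of how someone could measure it. It is only an illustration: the function names and the 0.01 default are assumptions taken from the wording above, not anything in Open Babel or chemfp. It assumes one hex-encoded fingerprint per record, in the same record order for both runs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Fraction of records whose fingerprint differs between two runs.
// Assumption: both vectors hold one hex-encoded fingerprint per record,
// in the same record order. Only the common prefix is compared.
double ChangedFraction(const std::vector<std::string>& old_fps,
                       const std::vector<std::string>& new_fps) {
  const std::size_t n =
      old_fps.size() < new_fps.size() ? old_fps.size() : new_fps.size();
  if (n == 0) return 0.0;
  std::size_t changed = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (old_fps[i] != new_fps[i]) ++changed;
  }
  return static_cast<double>(changed) / static_cast<double>(n);
}

// Hypothetical policy: bump the type version once at least `threshold`
// of the records change. The 1% default is a rule of thumb, not a standard.
bool ShouldBumpVersion(double changed_fraction, double threshold = 0.01) {
  return changed_fraction >= threshold;
}
```

Running this over the same structure file with the old and new builds would give a number to argue about, rather than a guess.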
> Chemfp has a "software" header which helps in cases where more
> fine-grained versioning might be needed. For example:
>
>   #software=OpenBabel/2.4.1 chemfp/3.0
>
> says that the data set was generated with Open Babel 2.4.1 using chemfp
> 3.0. However, it's impossible for software to look at "2.4.0" vs. "2.4.1"
> or "2.4.90" and tell if the fingerprint generation method changed.
>
> (Plus, the 2.4.90 version string has been the same since 2017-10-11, so it
> isn't enough information if someone wants to reproduce an analysis. Ideally someone who
> publishes a paper based on a version installed from version control should
> include the relevant git commit id.)
>
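As a rough sketch of the point about the headers: software can split a "#key=value" line and compare "type" values exactly, but it cannot infer anything from the toolkit version in "software". The helper names below are hypothetical, and the type string "OpenBabel-FP2/1" is simply what the writer code quoted further down would emit if GetID() returns "FP2".

```cpp
#include <string>

// Split an FPS-style header line of the form "#key=value" into key and value.
// Returns false if the line does not have that shape (e.g. a data line).
bool ParseHeaderLine(const std::string& line, std::string& key,
                     std::string& value) {
  if (line.empty() || line[0] != '#') return false;
  const std::string::size_type eq = line.find('=');
  if (eq == std::string::npos || eq == 1) return false;  // no '=' or empty key
  key = line.substr(1, eq - 1);
  value = line.substr(eq + 1);
  return true;
}

// Hypothetical compatibility check: two data sets are treated as comparable
// only when their "type" values match exactly, version suffix included.
bool SameFingerprintType(const std::string& type_a, const std::string& type_b) {
  return type_a == type_b;
}
```

With this, "OpenBabel-FP2/1" and "OpenBabel-FP2/3" compare as different, which is the warning being asked for. Nothing comparable can be derived from "OpenBabel/2.4.1" alone.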
> > Since you know where to edit, you can if you wish make the change
> > directly on github, if you have an account there. But otherwise, I can do
> > it.
>
> I can make the change. I'm trying to figure out what change to make.
>
> If there were two significant periods of time since 2.4.1 was released,
> with different fingerprint generation methods, then I would build versions
> of Open Babel for those periods so that chemfp's versioning captures that
> information. E.g., have a "/3" and a "/4". But Open Babel would only need the
> "/4".
>
> If there's only one significant implementation change, which is what it
> now seems like, then the easiest code change is to bump all versions to
> "/3".
>
> I'm fine with that.
>
> In principle I would like to add a "version" string to the plugin system,
> so that I can replace:
>
>         << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'
>
> with something like
>
>         << "#type=OpenBabel-" << _pFP->GetID() << "/"
>         << _pFP->GetVersion() << '\n'
>
> which means the implementation version numbers can be bumped independently.
>
> However, that requires adding a new attribute to the OBPlugin class, which
> I think would break ABI compatibility and require a rebuild of all
> third-party extensions.
>
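For concreteness, the proposal amounts to something like the sketch below. The classes are hypothetical stand-ins, not the real OBPlugin or OBFingerprint, and GetVersion() does not exist in Open Babel today. Adding a virtual member to an existing base class changes its vtable layout, which is the kind of change that forces third-party plugins to be rebuilt.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a fingerprint plugin base class.
// This is NOT the real OBFingerprint interface.
class FingerprintSketch {
 public:
  virtual ~FingerprintSketch() {}
  virtual std::string GetID() const = 0;
  // Hypothetical addition: per-fingerprint implementation version.
  // Defaults to "1" so existing subclasses keep their current type string.
  virtual std::string GetVersion() const { return "1"; }
};

// A fingerprint whose implementation changed, so it overrides the version.
class MaccsSketch : public FingerprintSketch {
 public:
  std::string GetID() const override { return "MACCS"; }
  std::string GetVersion() const override { return "3"; }
};

// A fingerprint that never overrides GetVersion() and stays at "/1".
class Fp2Sketch : public FingerprintSketch {
 public:
  std::string GetID() const override { return "FP2"; }
};

// Builds the "#type=" header line the way the writer code above would.
std::string TypeLine(const FingerprintSketch& fp) {
  std::ostringstream out;
  out << "#type=OpenBabel-" << fp.GetID() << "/" << fp.GetVersion() << '\n';
  return out.str();
}
```

The default in the base class is what lets each fingerprint be bumped independently; the cost is the ABI change and the ongoing upkeep of those numbers.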
> I could instead add it to OBFingerprint, which would break fewer things.
>
> On the other hand, my feeling is that that's overkill for the FP[234] and
> MACCS fingerprints as those have been stable for a long time.
>
> The newer circular fingerprints are not as stable, but they also can't be
> exported to FPS format.
>
> Do the Open Babel core developers think this feature is useful enough to
> outweigh the potential of breaking existing third-party plugins? If not, is
> there an alternative way to add a version string which is acceptable?
>
> If not, I'll just change all of the version numbers from "/1" to "/3".
>
> Best regards,
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
> _______________________________________________
> OpenBabel-discuss mailing list
> OpenBabel-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>