Sorry - a more careful read is that it means (3). And indeed this has changed.
Off the top of my head, I don't believe any other commits have changed
this, but they may have. For example, I rewrote the handling of explicit
hydrogens, fixing many related bugs along the way.

Regards,
- Noel

On Mon, 7 Jan 2019 at 13:10, Noel O'Boyle <baoille...@gmail.com> wrote:

> Can you clarify the requirement for bumping the version? That is, which of
> the following is the invariant:
> 1. Any molecule represented in any format changes must create the same
> fingerprint
> 2. Any SMILES string must create the same fingerprint
> 3. Any OBMol must create the same fingerprint
>
> Since you know where to edit, you can if you wish make the change directly
> on github, if you have an account there. But otherwise, I can do it.
>
> - Noel
>
> On Mon, 7 Jan 2019 at 10:50, Andrew Dalke <da...@dalkescientific.com>
> wrote:
>
>> Hi all,
>>
>> I just updated from OB 2.4.1 to the most recent version from version
>> control. (This is part of a migration to Python 3.7.)
>>
>> I noticed that the MACCS key implementation changed for about 1% of the
>> first 27008 ChEMBL-24 structures, and the FP2 fingerprints changed for a
>> bit more than 1% of the structures.
>> Here's a reproducible for MACCS:
>>
>> % cat CHEMBL23759.smi
>> O=C1CC(=O)[N+](CC2CC2)=C2SC=CN12 CHEMBL23759
>>
>> [py36-all] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
>> #FPS1
>> #num_bits=166
>> #type=OpenBabel-MACCS/1
>> #software=OpenBabel/2.4.1
>> #source=CHEMBL23759.smi
>> #date=2019-01-07T09:35:01
>> 000020000840010001b495891b63d043c9e12c6f1f CHEMBL23759
>> 1 molecule converted
>>
>> [py37-2019-1] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
>> #FPS1
>> #num_bits=166
>> #type=OpenBabel-MACCS/1
>> #software=OpenBabel/2.4.90
>> #source=CHEMBL23759.smi
>> #date=2019-01-07T09:34:39
>> 000020000850010000b495891f63d04389612c6f1d CHEMBL23759
>> 1 molecule converted
>>
>> If you compare the two strings you'll see several differences (I picked
>> one with many differences)
>>
>> 000020000840010001b495891b63d043c9e12c6f1f
>> 000020000850010000b495891f63d04389612c6f1d
>>           ^      ^       ^      ^ ^      ^
>>
>> The most common changes from the subset of ChEMBL I tested are:
>>
>> Fewer matches in the new code for:
>>   [#8]!:*:*          Onot%A%A
>>   c:n                C%N
>>   [!#1]!:*:*!:[!#1]  Anot%A%Anot%A
>>   a                  Aromatic
>>
>> More matches for:
>>   [#6]=[#6]          C=C
>>
>> Different matches for:
>>   [#7]!:*:*          Nnot%A%A
>>
>> The same structure (CHEMBL23759) also has a number of changes for the FP2
>> fingerprint, and changes for the FP3 and FP4 fingerprints. I haven't
>> analyzed how many structures have changed for the latter two.
>>
>> I assume it's a side effect of a change to aromaticity perception, and my
>> guess is it's due to the following commit:
>>
>> commit 1991439efd920f27cd9755fe8abf5c18699d4a58
>> Merge: a06e271 d78062b
>> Author: Geoff Hutchison <geoff.hutchi...@gmail.com>
>> Date: Mon Oct 2 16:40:08 2017 -0400
>>
>>     Merge pull request #1638 from baoilleach/daylightarom
>>
>>     Implement the Daylight aromaticity model as described by John Mayfield
>>
>> Is my diagnosis correct?
>>
>> Has there only been one such change between the 2.4.1 release and now?
>>
>> Since the fingerprint output has changed, would someone update the
>> version number in Open Babel's FPS output from "1" to something higher?
>>
>> The "type" version should be updated when the fingerprint implementation
>> changes. Chemfp currently has:
>>
>>   OpenBabel-MACCS/1 -- for pre-2012 versions, before a bug-fix in the
>>                        SMARTS definitions
>>   OpenBabel-MACCS/2 -- for OB 2.4.1
>>
>> and /1 for the FP2, FP3, and FP4 types.
>>
>> The version information helps identify possible incompatibility problems.
>>
>> I am about to add the following types to chemfp, for the tentative reason
>> "support the Daylight aromaticity model added in October 2017":
>>
>>   OpenBabel-MACCS/2 to OpenBabel-MACCS/3
>>   OpenBabel-FP2/1 to OpenBabel-FP2/2
>>   OpenBabel-FP3/1 to OpenBabel-FP3/2
>>   OpenBabel-FP4/1 to OpenBabel-FP4/2
>>
>> I would appreciate it if Open Babel produced the same version string as
>> chemfp.
>>
>> The relevant code is in src/formats/fpsformat.cpp line 130:
>>
>>   << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'
>>
>> That's a hard-coded version number for all fingerprint types.
>>
>> I don't think the OB registry system supports versioning of the entire
>> fingerprinting process, which makes sense from the plugin view because the
>> plugin only knows about the format part, and not the fingerprint generation
>> code. I don't know how the code might change to handle that information in
>> the future.
>>
>> (Chemfp internally has a similar problem. Even there I'm not sure how
>> I'll handle it.)
>>
>> The easy fix for now is likely to replace the "/1" with a "/3".
>>
>> If the Open Babel developers decide to make that change then use
>> "OpenBabel-FP2/3", etc. instead of "/2".
>>
>> That means there wouldn't be an "OpenBabel-FP2/2", FP3/2, or FP4/2, but I
>> think that's okay.
>>
>> Best regards,
>>
>> Andrew
>> da...@dalkescientific.com
>>
>> _______________________________________________
>> OpenBabel-discuss mailing list
>> OpenBabel-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>>
>
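
For anyone who wants to check which bits differ between the two MACCS
fingerprints quoted above, here is a minimal Python sketch. It assumes the
FPS hex encoding stores bit 0 as the least-significant bit of the first
byte; the function name diff_fingerprint_bits is made up for illustration
and is not part of Open Babel or chemfp.

```python
def diff_fingerprint_bits(old_hex, new_hex):
    """Return (only_old, only_new): bit positions set in just one fingerprint.

    Assumption: bit 0 is the least-significant bit of the first byte.
    """
    old = bytes.fromhex(old_hex)
    new = bytes.fromhex(new_hex)
    if len(old) != len(new):
        raise ValueError("fingerprints must have the same length")

    only_old, only_new = [], []
    for byte_index, (a, b) in enumerate(zip(old, new)):
        changed = a ^ b  # bits that differ within this byte
        for bit in range(8):
            mask = 1 << bit
            if changed & mask:
                position = byte_index * 8 + bit
                if a & mask:
                    only_old.append(position)
                else:
                    only_new.append(position)
    return only_old, only_new


# The two fingerprints for CHEMBL23759 from the obabel runs quoted above.
OLD = "000020000840010001b495891b63d043c9e12c6f1f"  # OpenBabel/2.4.1
NEW = "000020000850010000b495891f63d04389612c6f1d"  # OpenBabel/2.4.90

only_old, only_new = diff_fingerprint_bits(OLD, NEW)
print("set only in the 2.4.1 output: ", only_old)
print("set only in the 2.4.90 output:", only_new)
```

Under that bit-order assumption, this reports bits 64, 134, 143 and 161 as
set only in the 2.4.1 output, and bits 44 and 98 as set only in the 2.4.90
output.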