Can you clarify the requirement for bumping the version? That is, which of
the following is the invariant:
1. Any molecule, represented in any format, must create the same
fingerprint
2. Any SMILES string must create the same fingerprint
3. Any OBMol must create the same fingerprint

Since you know where to edit, you can make the change directly on GitHub
if you wish and have an account there. Otherwise, I can do it.

- Noel

On Mon, 7 Jan 2019 at 10:50, Andrew Dalke <da...@dalkescientific.com> wrote:

> Hi all,
>
>   I just updated from OB 2.4.1 to the most recent version from version
> control. (This is part of a migration to Python 3.7.)
>
> I noticed that the MACCS key implementation changed for about 1% of the
> first 27008 ChEMBL-24 structures, and the FP2 fingerprints changed for a
> bit more than 1% of the structures. Here's a reproducible example for MACCS:
>
> % cat CHEMBL23759.smi
> O=C1CC(=O)[N+](CC2CC2)=C2SC=CN12 CHEMBL23759
>
> [py36-all] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
> #FPS1
> #num_bits=166
> #type=OpenBabel-MACCS/1
> #software=OpenBabel/2.4.1
> #source=CHEMBL23759.smi
> #date=2019-01-07T09:35:01
> 000020000840010001b495891b63d043c9e12c6f1f      CHEMBL23759
> 1 molecule converted
>
> [py37-2019-1] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
> #FPS1
> #num_bits=166
> #type=OpenBabel-MACCS/1
> #software=OpenBabel/2.4.90
> #source=CHEMBL23759.smi
> #date=2019-01-07T09:34:39
> 000020000850010000b495891f63d04389612c6f1d      CHEMBL23759
> 1 molecule converted
>
> If you compare the two strings you'll see several differences (I picked
> one with many differences):
>
>
> 000020000840010001b495891b63d043c9e12c6f1f
> 000020000850010000b495891f63d04389612c6f1d
>           ^      ^       ^      ^ ^      ^
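The comparison above can also be done programmatically. The following is a minimal sketch, not part of the original message: the helper name `differing_bits` is hypothetical, and it assumes the FPS hex encoding stores bit 0 as the least-significant bit of the first byte.

```python
def differing_bits(hex_a, hex_b):
    """Return the indices of the bits that differ between two hex fingerprints."""
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    diffs = []
    for byte_index, (x, y) in enumerate(zip(a, b)):
        changed = x ^ y  # a set bit marks a position where the inputs differ
        for bit in range(8):
            if changed & (1 << bit):
                # Assumption: bit 0 is the least-significant bit of byte 0.
                diffs.append(byte_index * 8 + bit)
    return diffs


old = "000020000840010001b495891b63d043c9e12c6f1f"  # from OpenBabel/2.4.1
new = "000020000850010000b495891f63d04389612c6f1d"  # from OpenBabel/2.4.90
bits = differing_bits(old, new)
print(len(bits), bits)
```

For the two fingerprints shown above this reports six differing bits. Under the stated bit-order assumption, a bit index can be turned into a 1-based key number by adding one.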
>
> The most common changes from the subset of ChEMBL I tested are:
>
>   Fewer matches in the new code for:
>     [#8]!:*:*   Onot%A%A
>     c:n   C%N
>     [!#1]!:*:*!:[!#1]   Anot%A%Anot%A
>     a   Aromatic
>
>   More matches for:
>     [#6]=[#6]   C=C
>
>   Different matches for:
>     [#7]!:*:*   Nnot%A%A
>
>
> The same structure (CHEMBL23759) also has a number of changes for the FP2
> fingerprint, and changes for the FP3 and FP4 fingerprints. I haven't
> analyzed how many structures have changed for the latter two.
>
> I assume it's a side effect of a change to aromaticity perception, and my
> guess is it's due to the following commit:
>
> commit 1991439efd920f27cd9755fe8abf5c18699d4a58
> Merge: a06e271 d78062b
> Author: Geoff Hutchison <geoff.hutchi...@gmail.com>
> Date:   Mon Oct 2 16:40:08 2017 -0400
>
>     Merge pull request #1638 from baoilleach/daylightarom
>
>     Implement the Daylight aromaticity model as described by John Mayfield
>
>
> Is my diagnosis correct?
>
> Has there only been one such change between the 2.4.1 release and now?
>
> Since the fingerprint output has changed, would someone update the version
> number in Open Babel's FPS output from "1" to something higher?
>
> The "type" version should be updated when the fingerprint implementation
> changes. Chemfp currently has:
>
>   OpenBabel-MACCS/1 -- for pre-2012 versions, before a bug-fix in the
> SMARTS definitions
>   OpenBabel-MACCS/2 -- for OB 2.4.1
>
> and /1 for the FP2, FP3, and FP4 types.
>
> The version information helps identify possible incompatibility problems.
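To illustrate why the version in the "type" line matters, here is a rough sketch of how a consumer of FPS files might compare types. It is not chemfp's actual code: the function names `parse_fps_type` and `same_fingerprint_type` are hypothetical, and the header lines are copied from the two obabel runs above.

```python
def parse_fps_type(header_lines):
    """Return (name, version) from the '#type=' header line, or None if absent."""
    for line in header_lines:
        if line.startswith("#type="):
            # Keep only the first token; some FPS types append parameters.
            tokens = line[len("#type="):].split()
            if not tokens:
                raise ValueError("empty #type= line")
            name, sep, version = tokens[0].rpartition("/")
            if not sep:
                raise ValueError("type has no version: " + tokens[0])
            return name, int(version)
    return None


def same_fingerprint_type(header_a, header_b):
    """True only if both headers declare the same type name and version."""
    type_a = parse_fps_type(header_a)
    type_b = parse_fps_type(header_b)
    return type_a is not None and type_a == type_b


header_241 = ["#FPS1", "#num_bits=166", "#type=OpenBabel-MACCS/1",
              "#software=OpenBabel/2.4.1"]
header_2490 = ["#FPS1", "#num_bits=166", "#type=OpenBabel-MACCS/1",
               "#software=OpenBabel/2.4.90"]

# Both releases declare the same type, so this check cannot tell them apart
# even though the fingerprints they produce differ.
print(same_fingerprint_type(header_241, header_2490))
```

With the headers from the two runs the check reports the types as identical, which is the incompatibility a version bump would expose.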
>
> I am about to add the following types to chemfp, for the tentative reason
> "support the Daylight aromaticity model added in October 2017":
>
>   OpenBabel-MACCS/2 to OpenBabel-MACCS/3
>   OpenBabel-FP2/1 to OpenBabel-FP2/2
>   OpenBabel-FP3/1 to OpenBabel-FP3/2
>   OpenBabel-FP4/1 to OpenBabel-FP4/2
>
> I would appreciate it if Open Babel produced the same version string as
> chemfp.
>
> The relevant code is in src/formats/fpsformat.cpp line 130:
>
>         << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'
>
> That's a hard-coded version number for all fingerprint types.
>
> I don't think the OB registry system supports versioning of the entire
> fingerprinting process, which makes sense from the plugin view because the
> plugin only knows about the format part, and not the fingerprint generation
> code. I don't know how the code might change to handle that information in
> the future.
>
> (Chemfp internally has a similar problem. Even there I'm not sure how I'll
> handle it.)
>
> The easy fix for now is likely to replace the "/1" with a "/3".
>
> If the Open Babel developers decide to make that change then use
> "OpenBabel-FP2/3", etc. instead of "/2".
>
> That means there wouldn't be an "OpenBabel-FP2/2", FP3/2, or FP4/2, but I
> think that's okay.
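The two numbering options discussed above can be compared with a small sketch. This is only an illustration in Python, not Open Babel's C++ code: the function `fps_type_line` and both tables are hypothetical, with the version numbers taken from this message.

```python
# Per-type versions as proposed for chemfp in this message.
CHEMFP_PROPOSED = {"MACCS": 3, "FP2": 2, "FP3": 2, "FP4": 2}

# Alternative: Open Babel replaces the hard-coded "/1" with "/3" for all types.
UNIFORM_3 = dict.fromkeys(("MACCS", "FP2", "FP3", "FP4"), 3)


def fps_type_line(fp_id, versions, default=1):
    """Build the '#type=' header line for a fingerprint ID.

    Unknown IDs fall back to `default`, mirroring the current hard-coded "/1".
    """
    return "#type=OpenBabel-{}/{}".format(fp_id, versions.get(fp_id, default))


for fp_id in ("MACCS", "FP2", "FP3", "FP4"):
    print(fps_type_line(fp_id, CHEMFP_PROPOSED), fps_type_line(fp_id, UNIFORM_3))
```

A per-type table keeps each fingerprint's version independent, while the uniform table matches the one-line change to the hard-coded string at the cost of skipping "/2" for FP2, FP3, and FP4.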
>
> Best regards,
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
> _______________________________________________
> OpenBabel-discuss mailing list
> OpenBabel-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>