Sorry - on a more careful read, the requirement is (3): any OBMol must
create the same fingerprint. And indeed this has changed.

Off the top of my head, I don't believe any other commits have changed
this, but they may have. For example, I rewrote the handling of explicit
hydrogens, fixing many related bugs along the way.

Regards,
- Noel

On Mon, 7 Jan 2019 at 13:10, Noel O'Boyle <baoille...@gmail.com> wrote:

> Can you clarify the requirement for bumping the version? That is, which of
> the following is the invariant:
> 1. Any molecule, represented in any format, must create the same
> fingerprint
> 2. Any SMILES string must create the same fingerprint
> 3. Any OBMol must create the same fingerprint
>
> Since you know where to edit, you can if you wish make the change directly
> on github, if you have an account there. But otherwise, I can do it.
>
> - Noel
>
> On Mon, 7 Jan 2019 at 10:50, Andrew Dalke <da...@dalkescientific.com>
> wrote:
>
>> Hi all,
>>
>>   I just updated from OB 2.4.1 to the most recent version from version
>> control. (This is part of a migration to Python 3.7.)
>>
>> I noticed that the MACCS key output changed for about 1% of the first
>> 27008 ChEMBL-24 structures, and the FP2 fingerprints changed for a bit
>> more than 1% of the structures. Here's a reproducible example for MACCS:
>>
>> % cat CHEMBL23759.smi
>> O=C1CC(=O)[N+](CC2CC2)=C2SC=CN12 CHEMBL23759
>>
>> [py36-all] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
>> #FPS1
>> #num_bits=166
>> #type=OpenBabel-MACCS/1
>> #software=OpenBabel/2.4.1
>> #source=CHEMBL23759.smi
>> #date=2019-01-07T09:35:01
>> 000020000840010001b495891b63d043c9e12c6f1f      CHEMBL23759
>> 1 molecule converted
>>
>> [py37-2019-1] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
>> #FPS1
>> #num_bits=166
>> #type=OpenBabel-MACCS/1
>> #software=OpenBabel/2.4.90
>> #source=CHEMBL23759.smi
>> #date=2019-01-07T09:34:39
>> 000020000850010000b495891f63d04389612c6f1d      CHEMBL23759
>> 1 molecule converted
>>
>> If you compare the two strings you'll see several differences (I picked
>> a structure with many differences):
>>
>>
>> 000020000840010001b495891b63d043c9e12c6f1f
>> 000020000850010000b495891f63d04389612c6f1d
>>           ^      ^       ^      ^ ^      ^
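
The comparison above can also be done by decoding the two hex strings. The following is a minimal Python sketch, not part of the original message. It assumes the FPS convention that bit 0 is the least-significant bit of the first byte, and `fps_bits` is a hypothetical helper name.

```python
def fps_bits(hex_fp):
    """Return the set of 0-based bit positions that are on in an FPS hex string."""
    data = bytes.fromhex(hex_fp)
    # Assumed FPS convention: bit 0 is the least-significant bit of byte 0.
    return {8 * i + b for i, byte in enumerate(data) for b in range(8) if (byte >> b) & 1}

old = "000020000840010001b495891b63d043c9e12c6f1f"  # from OpenBabel/2.4.1
new = "000020000850010000b495891f63d04389612c6f1d"  # from OpenBabel/2.4.90
lost = sorted(fps_bits(old) - fps_bits(new))    # on in 2.4.1 only
gained = sorted(fps_bits(new) - fps_bits(old))  # on in 2.4.90 only
print("lost:", lost)      # lost: [64, 134, 143, 161]
print("gained:", gained)  # gained: [44, 98]
```

The positions are 0-based bit indices. How they map to MACCS key numbers depends on how the implementation numbers its keys, which this sketch does not assume.
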
>>
>> The most common changes from the subset of ChEMBL I tested are:
>>
>>   Fewer matches in the new code for:
>>     [#8]!:*:*   Onot%A%A
>>     c:n   C%N
>>     [!#1]!:*:*!:[!#1]   Anot%A%Anot%A
>>     a   Aromatic
>>
>>   More matches for:
>>     [#6]=[#6]   C=C
>>
>>   Different matches for:
>>     [#7]!:*:*   Nnot%A%A
>>
>>
>> The same structure (CHEMBL23759) also has a number of changes for the FP2
>> fingerprint, and changes for the FP3 and FP4 fingerprints. I haven't
>> analyzed how many structures have changed for the latter two.
>>
>> I assume it's a side effect of a change to aromaticity perception, and my
>> guess is it's due to the following commit:
>>
>> commit 1991439efd920f27cd9755fe8abf5c18699d4a58
>> Merge: a06e271 d78062b
>> Author: Geoff Hutchison <geoff.hutchi...@gmail.com>
>> Date:   Mon Oct 2 16:40:08 2017 -0400
>>
>>     Merge pull request #1638 from baoilleach/daylightarom
>>
>>     Implement the Daylight aromaticity model as described by John Mayfield
>>
>>
>> Is my diagnosis correct?
>>
>> Has there only been one such change between the 2.4.1 release and now?
>>
>> Since the fingerprint output has changed, would someone update the
>> version number in Open Babel's FPS output from "1" to something higher?
>>
>> The "type" version should be updated when the fingerprint implementation
>> changes. Chemfp currently has:
>>
>>   OpenBabel-MACCS/1 -- for pre-2012 versions, before a bug-fix in the
>> SMARTS definitions
>>   OpenBabel-MACCS/2 -- for OB 2.4.1
>>
>> and /1 for the FP2, FP3, and FP4 types.
>>
>> The version information helps identify possible incompatibility problems.
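
To illustrate why the "type" version matters, here is a minimal Python sketch that reads the header lines of the two FPS outputs quoted above and compares them. It is an illustration only, not chemfp's own code; `fps_header` is a hypothetical helper name, and the two header texts are abbreviated copies of the output shown earlier.

```python
def fps_header(text):
    """Collect '#key=value' header lines from FPS text into a dict."""
    header = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first fingerprint line
        key, sep, value = line[1:].partition("=")
        if sep:  # skips the '#FPS1' line, which has no '='
            header[key] = value
    return header

old = fps_header("#FPS1\n#num_bits=166\n#type=OpenBabel-MACCS/1\n#software=OpenBabel/2.4.1\n")
new = fps_header("#FPS1\n#num_bits=166\n#type=OpenBabel-MACCS/1\n#software=OpenBabel/2.4.90\n")
same_type = old["type"] == new["type"]
same_software = old["software"] == new["software"]
print(same_type, same_software)  # True False
```

Both files report the same type even though the fingerprints differ, so a check based only on the "type" line would not flag the incompatibility.
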
>>
>> I am about to add the following types to chemfp, for the tentative reason
>> "support the Daylight aromaticity model added in October 2017":
>>
>>   OpenBabel-MACCS/2 to OpenBabel-MACCS/3
>>   OpenBabel-FP2/1 to OpenBabel-FP2/2
>>   OpenBabel-FP3/1 to OpenBabel-FP3/2
>>   OpenBabel-FP4/1 to OpenBabel-FP4/2
>>
>> I would appreciate it if Open Babel produced the same version string as
>> chemfp.
>>
>> The relevant code is in src/formats/fpsformat.cpp line 130:
>>
>>         << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'
>>
>> That's a hard-coded version number for all fingerprint types.
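
One alternative to a single hard-coded "/1" is a per-type lookup. The following is a minimal Python sketch of the idea only, not Open Babel's C++ code. `FP_VERSIONS` and `fps_type_line` are hypothetical names, and the version numbers are the ones proposed above for chemfp, not values Open Babel currently emits.

```python
# Versions proposed in this message for chemfp (an assumption for illustration).
FP_VERSIONS = {"MACCS": 3, "FP2": 2, "FP3": 2, "FP4": 2}

def fps_type_line(fp_id, versions=FP_VERSIONS):
    """Build the '#type=' header line for one fingerprint ID."""
    # An unknown ID raises KeyError on purpose: guessing a version
    # would hide exactly the kind of mismatch the version is meant to expose.
    return f"#type=OpenBabel-{fp_id}/{versions[fp_id]}"

type_lines = [fps_type_line(fp_id) for fp_id in FP_VERSIONS]
print("\n".join(type_lines))
```
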
>>
>> I don't think the OB registry system supports versioning of the entire
>> fingerprinting process, which makes sense from the plugin view because the
>> plugin only knows about the format part, and not the fingerprint generation
>> code. I don't know how the code might change to handle that information in
>> the future.
>>
>> (Chemfp internally has a similar problem. Even there I'm not sure how
>> I'll handle it.)
>>
>> The easy fix for now is likely to replace the "/1" with a "/3".
>>
>> If the Open Babel developers decide to make that change, then I'll use
>> "OpenBabel-FP2/3", etc. instead of "/2".
>>
>> That means there wouldn't be an "OpenBabel-FP2/2", FP3/2, or FP4/2, but I
>> think that's okay.
>>
>> Best regards,
>>
>>                                 Andrew
>>                                 da...@dalkescientific.com
>>
>>
>>
>>
>> _______________________________________________
>> OpenBabel-discuss mailing list
>> OpenBabel-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>>
>