Thank you again, Greg.  If you have time to get this into the upcoming
release, great, but please do not rush on my account.

I have another couple of questions regarding fingerprints in general and
the fingerprint generators in particular.


   - To what degree do people use the different fingerprint types? Is it
   more common to use the RDKit fingerprint, for example, as a bit vector, and
   the Morgan fingerprint as a count vector?  Does it depend on the
   application, or is it more a matter of how a particular fingerprint was
   historically used?
   - I notice there is a wider variety of distance measures available for
   bit vectors than for count vectors. Is this because these measures, the
   McConnaughey similarity for example, can't be extended to multisets in the
   way that Tversky similarity can? Or is it just that there hasn't been
   any demand for non-bitvector versions of the measures in BitOps.h?
   - Would it be useful to people for the FingerprintGenerator class to
   return the list of atom invariants (or environments) used?  Or is that what
   the bitInfo member is used for?
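
To make the second question concrete, here is a minimal sketch of what I
mean by extending a measure to multisets. It is only an illustration under
my own assumptions: the count fingerprint is a plain std::map rather than
RDKit's SparseIntVect, the names CountVect, totalCount, commonCount,
tverskyCounts and mcConnaugheyCounts are hypothetical, and the McConnaughey
formula is the one I understand BitOps.h to use for bit vectors, with "bits
in common" replaced by the sum of minimum counts and "number of on bits"
replaced by the sum of counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a sparse count fingerprint: bit id -> count.
// This is NOT RDKit's SparseIntVect, just a std::map for illustration.
using CountVect = std::map<std::uint64_t, int>;

// Sum of all counts: the multiset analogue of "number of on bits".
inline long totalCount(const CountVect &v) {
  long total = 0;
  for (const auto &kv : v) total += kv.second;
  return total;
}

// Sum of min(count_a, count_b): the multiset analogue of "bits in common".
inline long commonCount(const CountVect &a, const CountVect &b) {
  long common = 0;
  for (const auto &kv : a) {
    auto it = b.find(kv.first);
    if (it != b.end()) common += std::min(kv.second, it->second);
  }
  return common;
}

// Tversky similarity on multisets; alpha = beta = 1 gives Tanimoto.
inline double tverskyCounts(const CountVect &a, const CountVect &b,
                            double alpha, double beta) {
  assert(alpha >= 0.0 && beta >= 0.0);
  const double c = static_cast<double>(commonCount(a, b));
  const double denom =
      alpha * (totalCount(a) - c) + beta * (totalCount(b) - c) + c;
  return denom > 0.0 ? c / denom : 0.0;
}

// McConnaughey similarity, (c*(|A|+|B|) - |A||B|) / (|A||B|), using the
// same multiset substitutions for c, |A| and |B|. Assumed generalization.
inline double mcConnaugheyCounts(const CountVect &a, const CountVect &b) {
  const double na = static_cast<double>(totalCount(a));
  const double nb = static_cast<double>(totalCount(b));
  if (na == 0.0 || nb == 0.0) return 0.0;
  const double c = static_cast<double>(commonCount(a, b));
  return (c * (na + nb) - na * nb) / (na * nb);
}
```

If every count is 0 or 1, both functions reduce to the usual bit-vector
formulas, which is why I suspect the gap is more about demand than about
the measures themselves.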

Best,
Jason



On Fri, Mar 13, 2020 at 11:13 PM Greg Landrum <greg.land...@gmail.com>
wrote:

> Unfortunately it looks like the additional outputs for the Morgan and RDKit
> fingerprints are among the parts that weren't finished:
>
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/MorganGenerator.cpp#L143
>
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/RDKitFPGenerator.cpp#L99
>
> I will take a look and see if it's possible to get these into the next
> release. In the meantime, if you want that info it looks like you'll need
> to use the older fingerprinting functions.
>
> -greg
>
> On Fri, Mar 13, 2020 at 11:10 PM Jason Biggs <jasondbi...@gmail.com>
> wrote:
>
>> Thank you Greg.
>>
>> I am working in C++.  I could poke around with this if I knew which members
>> of the AdditionalOutput struct are used by which fingerprint generators.  I
>> just wanted to make sure there wasn't an explanation somewhere that I missed.
>>
>> I can see that with the AtomPairs fingerprints I can do the following
>>
>> // mol is an ROMol* and fpg is a FingerprintGenerator*
>> RDKit::AdditionalOutput ao;
>>
>> // one (initially empty) list of bit ids per atom
>> std::vector<std::vector<std::uint64_t>> atomtobits(mol->getNumAtoms());
>> ao.atomToBits = &atomtobits;
>>
>> auto res = fpg->getSparseCountFingerprint(*mol, nullptr, nullptr, -1, &ao);
>>
>> after which atomtobits contains a list of bits for each atom.  From the
>> comments I think the bitInfo member should be used by the
>> RDKitFingerprintGenerator, but I don't see where it is used in the code.
>> Is that the part that wasn't finished?  Is it possible to get information
>> about the atoms/environments that set particular bits in the Morgan or
>> RDKit fingerprints using the new API?
>>
>> Jason Biggs
>>
>>
>>
>> On Fri, Mar 13, 2020 at 10:20 AM Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> Hi Jason,
>>>
>>> At the moment there's nothing available here except what's in the C++
>>> tests. This part of the code didn't end up being completely finished before
>>> the GSoC project ended and it's never bubbled up on my priority list to
>>> finish it.
>>>
>>> I haven't spent much time with this code, but I can probably put
>>> together an example.
>>> Are you working from C++?
>>>
>>> -greg
>>>
>>>
>>> On Thu, Mar 12, 2020 at 10:42 PM Jason Biggs <jasondbi...@gmail.com>
>>> wrote:
>>>
>>>> I am taking a look at the FingerprintGenerator class and I really like
>>>> this unified interface for these four types of fingerprints.  I have very
>>>> limited experience with the fingerprint code before the generator API was
>>>> introduced.
>>>>
>>>> What I'm not sure about is how to get information about the
>>>> atoms/environments that set the bits.  I believe I need to use the
>>>> AdditionalOutput struct,
>>>> https://www.rdkit.org/docs/cppapi/structRDKit_1_1AdditionalOutput.html,
>>>> but I'm not exactly sure how to do so.  I would normally look at the C++
>>>> test files to see how it is used, and from that I see the atomToBits member
>>>> is used in the atom pairs fingerprints, but I'm not sure about the other
>>>> members of this struct.  For example, there is a bitInfo member; is this
>>>> where I would find information for the RDKit and Morgan fingerprints?
>>>>
>>>> Are there any examples somewhere that I could follow to find out more
>>>> information?
>>>>
>>>> Thank you
>>>>
>>>> Jason
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>