I think Nils is right here. An RDKit fingerprint with a maximum path length of 12 is
going to set A LOT of bits. Try it and see.
Collisions are almost guaranteed.
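To put rough numbers on that, assume (idealized, not how RDKit is guaranteed to
behave) that each distinct path hashes uniformly at random onto one of m bits.
After n distinct paths the expected number of set bits is m(1 - (1 - 1/m)^n),
and every path beyond that landed on an already-set bit. For n much smaller
than m the collision count is roughly n(n-1)/(2m), so doubling the fingerprint
length about halves the collisions. Meanwhile the number of distinct paths, which
grows combinatorially with path length, enters quadratically. This is a plain-Python
sketch, not RDKit code, and the feature counts in it are made-up illustrations,
not measurements.

```python
# Expected bit collisions when n distinct features (paths) are hashed into an
# m-bit folded fingerprint. Assumption: ideal uniform hashing, one bit per feature.

def expected_set_bits(n_features: int, n_bits: int) -> float:
    """Expected number of distinct bits set after hashing n_features into n_bits."""
    return n_bits * (1.0 - (1.0 - 1.0 / n_bits) ** n_features)


def expected_collisions(n_features: int, n_bits: int) -> float:
    """Expected number of features that land on a bit that is already set."""
    return n_features - expected_set_bits(n_features, n_bits)


if __name__ == "__main__":
    # Hypothetical feature counts, chosen only to show the trend.
    for n in (200, 2000, 20000):
        for m in (2048, 4096, 8192):
            print(f"features={n:6d} bits={m:5d} "
                  f"set={expected_set_bits(n, m):8.1f} "
                  f"collisions={expected_collisions(n, m):8.1f}")
```

To check real molecules, comparing something like
Chem.RDKFingerprint(mol, maxPath=7).GetNumOnBits() with the maxPath=12 value shows
how close the bit vector gets to saturation. Note that, by default, RDKFingerprint
sets more than one bit per path (the nBitsPerHash parameter), so real saturation
should be worse than this one-bit-per-feature estimate.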

There are many possible reasons why you may not be getting the results you
expect (that’s the fun in machine learning), but if you suspect that the
fingerprints are the problem, you might try another FP and see if you miss
the same compounds. If so, maybe it’s the data. If not, the different FPs may
be capturing different information, and you could try combining them. We
did a paper on this:
https://pubs.acs.org/doi/abs/10.1021/ci400466r
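Checking whether two fingerprints miss the same compounds only takes a comparison
of the misclassified IDs. This is a minimal plain-Python sketch with hypothetical
compound IDs, labels, and helper names; the fuse_mean helper (averaging two models'
predicted probabilities) is just one simple way of combining fingerprints and is
not necessarily what the paper above does.

```python
from typing import Dict, Set


def missed_compounds(pred: Dict[str, int], truth: Dict[str, int]) -> Set[str]:
    """IDs whose predicted label differs from the true label (missing = missed)."""
    return {cid for cid, y in truth.items() if pred.get(cid) != y}


def overlap_report(missed_a: Set[str], missed_b: Set[str]) -> Dict[str, object]:
    """How much the two models' mistakes overlap; Jaccard is 1.0 if neither misses."""
    union = missed_a | missed_b
    shared = missed_a & missed_b
    return {
        "shared": shared,
        "only_a": missed_a - missed_b,
        "only_b": missed_b - missed_a,
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


def fuse_mean(prob_a: Dict[str, float], prob_b: Dict[str, float]) -> Dict[str, float]:
    """Average the two models' probabilities for compounds scored by both."""
    return {cid: 0.5 * (prob_a[cid] + prob_b[cid]) for cid in prob_a.keys() & prob_b.keys()}


if __name__ == "__main__":
    # Hypothetical data: model A uses one fingerprint, model B another.
    truth = {"c1": 1, "c2": 0, "c3": 1, "c4": 1}
    pred_a = {"c1": 1, "c2": 1, "c3": 0, "c4": 1}
    pred_b = {"c1": 1, "c2": 1, "c3": 1, "c4": 0}
    report = overlap_report(missed_compounds(pred_a, truth), missed_compounds(pred_b, truth))
    print(report)
```

A high Jaccard value points toward the data (both fingerprints fail on the same
compounds); a low one suggests the fingerprints carry different information and
combining them may help.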

There are many things to try... one never runs out of new approaches. :-)

On Thu, 4 Oct 2018 at 21:06, Nils Weskamp <[email protected]> wrote:

> Am 04.10.2018 um 20:53 schrieb Thomas Evangelidis:
> >     not sure if significantly longer path lengths (e.g. 12) actually
> >     "increase the amount of information" since they also increase the
> >     risk of bit collisions in folded fingerprints.
> >
> > If you increase the fpSize to 8192, won't you reduce the risk of bit
> > collisions?
>
> Yes, by a factor of two. However, depending on the size and complexity
> of your compounds, I would expect the number of bits set to grow
> significantly more (due to combinatorial explosion) when you go from
> path length 5 (or 7) to 12.
>
> Best,
> Nils
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>