I think Nils is right here. An RDKit fingerprint with a max path length of 12 is going to set A LOT of bits. Try it and see. Collisions are almost guaranteed.
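One way to see the effect without RDKit is a toy model. The sketch below enumerates linear bond paths in a small molecular graph, hashes each distinct path into a folded bit vector, and counts how many features end up sharing a bit. Treat it as an illustration only. The graph, the SHA-1 based hashing, and setting one bit per path are assumptions made for this example. RDKit's real fingerprint uses its own hashing and by default also includes branched subgraphs, so actual numbers will differ. With RDKit itself, the direct check would be to compare `Chem.RDKFingerprint(mol, maxPath=7).GetNumOnBits()` with the same call using `maxPath=12`.

```python
import hashlib

# Hypothetical toy molecular graph (illustration only, not a real RDKit molecule):
# an indole-like bicycle with a methoxy-like and an amino-like substituent.
ATOMS = ["C", "C", "C", "C", "C", "C", "N", "C", "C", "O", "C", "N"]
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # six-membered ring
         (5, 6), (6, 7), (7, 8), (8, 4),                  # fused five-membered ring
         (2, 9), (9, 10), (7, 11)]                        # substituents


def adjacency(n_atoms, bonds):
    """Build an undirected adjacency list from a bond list."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def path_features(atoms, bonds, max_bonds):
    """Distinct linear-path features with 1..max_bonds bonds.

    A feature is the sequence of atom labels along a simple path (no atom
    repeated), stored direction-independently as min(path, reversed path).
    """
    adj = adjacency(len(atoms), bonds)
    features = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        features.add(min(labels, labels[::-1]))
        if len(path) - 1 == max_bonds:  # path already has max_bonds bonds
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(atoms)):
        for nxt in adj[start]:
            extend([start, nxt])
    return features


def fold(features, fp_size):
    """Hash each feature to a single bit position in [0, fp_size)."""
    bits = set()
    for feat in features:
        digest = hashlib.sha1("-".join(feat).encode()).digest()
        bits.add(int.from_bytes(digest[:4], "little") % fp_size)
    return bits


for max_bonds in (5, 7, 12):
    feats = path_features(ATOMS, BONDS, max_bonds)
    bits = fold(feats, 2048)  # 2048 bits is an assumed fingerprint size
    print(f"maxPath={max_bonds:2d}: {len(feats):4d} features -> "
          f"{len(bits):4d} bits set, {len(feats) - len(bits)} lost to collisions")
```

Even in this small graph the number of distinct path features keeps growing with the maximum path length, while the fingerprint size stays fixed; on realistic drug-sized molecules (and with branched subgraphs) the growth is much steeper.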
There are many possible reasons why you may not be getting the results you expect (that’s the fun in machine learning), but if you suspect that the fingerprints are the problem, you might try another FP and see whether you miss the same compounds. If so, maybe it’s the data. If not, it could be the different information captured by the different FPs, and you could try combining them. We did a paper on this: https://pubs.acs.org/doi/abs/10.1021/ci400466r

There are many things to try... one never runs out of new approaches. :-)

On Thu, 4 Oct 2018 at 21:06, Nils Weskamp <[email protected]> wrote:
> On 04.10.2018 at 20:53, Thomas Evangelidis wrote:
> > not sure if significantly longer path lengths (e.g. 12) actually
> > "increase the amount of information" since they also increase the risk
> > of bit collisions in folded fingerprints.
> >
> > If you increase the fpSize to 8192, won't you reduce the risk of bit
> > collisions?
>
> Yes, by a factor of two. However, depending on the size and complexity
> of your compounds, I would expect that the number of bits grows
> significantly more (due to combinatorial explosion) when you go from
> path length 5 (or 7) to 12.
>
> Best,
> Nils
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
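Nils's "factor of two" can be checked with a simple idealized model. Assume each of n distinct features is hashed to one uniformly random bit out of m (a simplification, not RDKit's exact scheme). Then the expected number of bits set is m(1 - (1 - 1/m)^n), and everything beyond that is features lost to collisions. For n much smaller than m this loss is roughly n²/(2m), so doubling fpSize about halves it, while growing n makes it rise quadratically. The feature counts 500 and 5000 below are made-up illustrative values, not measurements, and 4096 is an assumed starting fpSize.

```python
def expected_bits_set(n_features: int, fp_size: int) -> float:
    """Expected number of distinct bits set when n_features distinct features
    are each hashed to one uniformly random bit out of fp_size bits."""
    return fp_size * (1.0 - (1.0 - 1.0 / fp_size) ** n_features)


def expected_collisions(n_features: int, fp_size: int) -> float:
    """Expected number of features that land on an already-set bit."""
    return n_features - expected_bits_set(n_features, fp_size)


# Hypothetical feature counts for a short vs. a long maximum path length.
for n in (500, 5000):
    for m in (4096, 8192):
        print(f"n={n:5d} fpSize={m}: ~{expected_collisions(n, m):7.1f} collisions")
```

Under these assumptions, doubling fpSize roughly halves the collisions only while n stays small relative to fpSize; a tenfold increase in the number of features more than cancels out the gain, which is the combinatorial-explosion argument above.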

