Hello,

Apologies, this is a very basic question.
If I am converting many ligands into Morgan fingerprints, could I
stack the bit vectors on top of each other, one row per ligand, so that
each column represents the same feature across all ligands? For example,
is the representation below correct?

| sample | feature1 | feature2 | feature3 |
|:----   |:--------:|:--------:|---------:|
| 1      | bit 1    | bit 2    | bit 3    |
| 2      | bit 1    | bit 2    | bit 3    |
| 3      | bit 1    | bit 2    | bit 3    |
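
To make the layout concrete, here is a minimal sketch of what I mean by
stacking. It uses plain Python only; the ligand names, the on-bit indices,
and the toy length of 8 bits are made up for illustration and do not come
from real RDKit fingerprints.

```python
N_BITS = 8  # toy length; my real fingerprints would use 2048 bits

# Hypothetical on-bit indices per ligand, standing in for real fingerprints.
on_bits_per_ligand = {
    "ligand_1": {0, 3, 5},
    "ligand_2": {3, 4},
    "ligand_3": {0, 5, 7},
}


def to_row(on_bits, n_bits=N_BITS):
    """Expand a set of on-bit indices into a fixed-length 0/1 list."""
    return [1 if i in on_bits else 0 for i in range(n_bits)]


# One row per ligand; column j holds bit j for every ligand.
names = list(on_bits_per_ligand)
matrix = [to_row(on_bits_per_ligand[name]) for name in names]

for name, row in zip(names, matrix):
    print(name, row)
```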

So basically, are features 1, 2, 3, etc. always the same type of
substructure no matter what the input molecule is? And what happens if a
molecule we are evaluating contains a new substructure that is not among
the 2048 bits (or substructures) predesignated in RDKit?

Any advice on how to reduce the number of features, and then apply that
reduced feature list to new molecules after training a model, would also be
appreciated. For example, if I remove low-variance bits from the training
set, how would the model extract only the retained bits for a new ligand?
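
To illustrate the workflow I am asking about, here is a rough sketch under
my own assumptions: fingerprints are already plain 0/1 lists of equal
length, and "low variance" means a variance at or below a threshold that I
picked arbitrarily. The helper names (`column_variances`, `select_columns`,
`reduce_row`) are hypothetical, not RDKit functions. The idea is to store
the kept column indices from the training set and reuse them for any new
ligand.

```python
def column_variances(rows):
    """Population variance of each column in a list of equal-length rows."""
    n = len(rows)
    variances = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    return variances


def select_columns(rows, threshold=0.0):
    """Indices of columns whose variance is strictly above the threshold."""
    return [j for j, var in enumerate(column_variances(rows)) if var > threshold]


def reduce_row(row, kept):
    """Keep only the selected bit positions, in their original order."""
    return [row[j] for j in kept]


# Toy training fingerprints (4 bits each); columns 0 and 1 never vary.
train = [
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
]

kept = select_columns(train)  # computed once, on the training set only
train_reduced = [reduce_row(row, kept) for row in train]

# A new ligand is reduced with the SAME indices, never recomputed ones.
new_ligand = [0, 1, 1, 0]
new_reduced = reduce_row(new_ligand, kept)
```

Is saving `kept` alongside the trained model the right way to do this, or
is there a more standard approach?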