Hello,

Apologies, this is a very basic question. If I am converting many ligands into Morgan fingerprints, could I theoretically stack the bit vectors on top of each other so that the same features are represented across ligands? For example, is the representation below correct?

| sample | feature1 | feature2 | feature3 |
|:---- |:--------:|:--------:|---------:|
| 1 | bit 1 | bit 2 | bit 3 |
| 2 | bit 1 | bit 2 | bit 3 |
| 3 | bit 1 | bit 2 | bit 3 |

So basically, are features 1, 2, 3, etc. always the same type of substructure no matter what the input molecule is? What happens if the 2048 bits (or substructures) predesignated in RDKit do not include a new substructure found in a molecule we are evaluating?

Any advice on how to reduce the features, and then use that reduced feature list for new molecules after training a model, would also be appreciated. For example, if I remove low-variance bits from the training set, how would I extract only those same reduced bits for a new ligand?
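
To make the last question concrete, here is a minimal sketch of what I imagine doing. It uses hand-written toy bit vectors (8 bits instead of 2048) as stand-ins for real Morgan fingerprints, so nothing here is actual RDKit output, and the names `select_bits`, `reduce_fp`, and `kept_bits` are made up for illustration. The idea is to stack the training fingerprints as rows, record which bit positions vary across the training set, and then index a new ligand's fingerprint with those same saved positions.

```python
from statistics import pvariance

# Toy stand-ins for Morgan bit vectors: one row per ligand, one column per bit.
# These are hand-written values, not real RDKit output.
train_fps = [
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 1],
]


def select_bits(rows, threshold=0.0):
    """Return the column indices whose variance across rows exceeds threshold."""
    columns = zip(*rows)  # transpose: iterate over bit positions
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def reduce_fp(fp, kept):
    """Keep only the bit positions that were selected on the training set."""
    return [fp[i] for i in kept]


# Fit the selection on the training set only, and save the indices.
kept_bits = select_bits(train_fps)  # [2, 4, 5]
train_reduced = [reduce_fp(fp, kept_bits) for fp in train_fps]

# A new ligand is fingerprinted with the same settings, then indexed
# with the saved positions; bits dropped during training are ignored.
new_fp = [1, 1, 0, 0, 1, 0, 0, 1]
new_reduced = reduce_fp(new_fp, kept_bits)  # [0, 1, 0]
```

Note that the new ligand has bit 1 set, which was never set in the training rows; because that column was dropped, the reduced vector simply does not see it. Is this the right way to think about it, assuming the fingerprint length and radius are kept identical for training and new molecules?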

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss