On 12/12/2022 01:26, Christopher Mayer-Bacon wrote:
Hello all,
I’m starting a project that explores the sampling of a large
compound library. My question is not so much about how to do
something, but rather the specific use cases for weighted sampling
from a compound library.
Given a large compound library and a smaller reference library, I
want to take random samples from the large library such that the
samples resemble the reference library in some way. At the moment
I’m focused on element composition (% of carbon atoms, % of oxygen
atoms, etc.), but I’m open to using other features in the future.
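One possible way to do this kind of weighted sampling is importance weighting. This is sketched here only as an illustration and is not necessarily the approach Chris has in mind. Each library compound is weighted by the ratio of the reference frequency to the library frequency of its composition bin, and compounds are then drawn with those weights. The sketch makes several assumptions. Compounds are given as plain molecular formula strings, and the toy parser ignores charges, isotopes and parentheses. Only one feature is used: the carbon fraction, split into 10 bins. The helper names (`element_counts`, `carbon_fraction_bin`, `weighted_sample`) are hypothetical. In practice the element counts would more likely come from RDKit molecules than from this parser.

```python
import random
import re
from collections import Counter

# Hypothetical toy parser for simple formulas such as "C6H12O6".
# Assumption: no charges, isotopes or parentheses in the input.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Return a Counter of element symbol -> number of atoms."""
    counts = Counter()
    for element, n in FORMULA_TOKEN.findall(formula):
        counts[element] += int(n) if n else 1
    return counts


def carbon_fraction_bin(formula, n_bins=10):
    """Bin a compound by the fraction of its atoms that are carbon."""
    counts = element_counts(formula)
    frac = counts["C"] / sum(counts.values())
    # A fraction of exactly 1.0 goes into the last bin.
    return min(int(frac * n_bins), n_bins - 1)


def weighted_sample(library, reference, k, n_bins=10, seed=None):
    """Draw k compounds from `library` (with replacement) so that the
    carbon-fraction histogram of the sample approximates that of `reference`.

    Each library compound gets weight p_ref(bin) / p_lib(bin), i.e. the
    ratio of the reference and library histograms at its bin.
    """
    lib_bins = [carbon_fraction_bin(f, n_bins) for f in library]
    ref_hist = Counter(carbon_fraction_bin(f, n_bins) for f in reference)
    lib_hist = Counter(lib_bins)
    # lib_hist[b] >= 1 for every b in lib_bins, so no division by zero here.
    weights = [
        (ref_hist[b] / len(reference)) / (lib_hist[b] / len(library))
        for b in lib_bins
    ]
    if sum(weights) == 0:
        # No library compound falls in any bin occupied by the reference.
        raise ValueError("reference and library share no composition bins")
    rng = random.Random(seed)
    return rng.choices(library, weights=weights, k=k)


# Toy usage example (made-up data, for illustration only).
library = ["CH4", "C6H6", "H2O", "NH3", "C2H6O", "CO2"]
reference = ["C6H6", "C7H8"]
print(weighted_sample(library, reference, k=5, seed=0))
# prints ['C6H6', 'C6H6', 'C6H6', 'C6H6', 'C6H6']
```

In this toy case, C7H8 falls into a carbon-fraction bin that no library compound occupies, so that part of the reference distribution cannot be reproduced. Only the shared bin contributes. If the two libraries share no bins at all, as in the no-overlap case raised in the reply, there is nothing to reweight and the function raises an error.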
- What if the smaller reference library has no overlap with the large
compound library, i.e., the two libraries cover disjoint regions of
chemical space?
Another strategy could be to generate new molecules using your
"smaller reference library" as a training set.
Cf. https://doi.org/10.1186/s13321-021-00566-4 for a simple method
with an open-source implementation. ;)
I have an idea of how to perform this sampling; my question for this
community concerns a possible use case. What would be the benefit of
sampling from a compound library such that the samples resemble
another library in some way? I can think of a use case for my
specific research niche (adaptive properties of the canonical amino
acid alphabet), but I can’t think of any others. I know the RDKit
community has a wide variety of backgrounds and expertise, which is
why I wanted to pose this question to you all.
-Chris
--
-Christopher Mayer-Bacon (_he/him/his_)
PhD student
Department of Biological Sciences
University of Maryland, Baltimore County
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss