I don't think there are any such estimators in scikit-learn directly,
but the model selection machinery is there to help. Check out
GroupKFold [1] so you can do cross-validation after concatenating all
the samples, while ensuring that training and validation groups are
separate.
The setup of
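For concreteness, here is a minimal sketch of that kind of group-aware cross-validation. The arrays below are made-up placeholders standing in for the real score1/score2 data, and the classifier, number of splits, and scoring metric are only illustrative choices, not something prescribed in this thread:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Placeholder data: 3 groups of 20 samples, each sample with score1 and score2.
rng = np.random.RandomState(0)
X = rng.rand(60, 2)                      # columns: score1, score2
y = (X.mean(axis=1) > 0.5).astype(int)   # 1 = "active", 0 = "inactive"
groups = np.repeat([1, 2, 3], 20)        # which group each sample came from

# GroupKFold keeps whole groups out of the training fold,
# so validation is always done on groups the forest never saw.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = GroupKFold(n_splits=3)
auc = cross_val_score(clf, X, y, groups=groups, cv=cv, scoring="roc_auc")
print(auc)

Each fold then evaluates on a held-out group, which is closer to how the model would behave when applied to a completely new group.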
Hello Thomas,
I don't personally know of any algorithm that works on collections of
groupings, but why not first test a simple control model: can you
achieve a satisfactory model by simply concatenating all 48 scores
per sample and building a forest the standard way?
If not, what context
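If it helps, a minimal sketch of that pooled baseline (again with placeholder arrays rather than the real data, and with only the two scores from the example below rather than all 48; the split and forest settings are arbitrary):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for all groups concatenated together.
rng = np.random.RandomState(0)
X = rng.rand(100, 2)                     # score1, score2 for every sample, all groups pooled
y = (X.mean(axis=1) > 0.5).astype(int)   # 1 = "active", 0 = "inactive"

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))         # plain accuracy on a held-out split

If this pooled forest is already satisfactory, the grouping may not need special treatment; if not, the group-aware validation above would be the next thing to try.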
Sorry, the previous email was incomplete. Below is what the grouped data
look like:
Group1:
score1 = [0.56, 0.34, 0.42, 0.12, 0.08, 0.21, ...]
score2 = [0.34, 0.27, 0.24, 0.05, 0.13, 0.14, ...]
y = [1, 1, 1, 0, 0, 0, ...] # 1 indicates "active" and 0 "inactive"
Group2:
score1 = [0.34, 0.38, 0.48,
Greetings
I have grouped data which are divided into actives and inactives. The
features are two different types of normalized scores (0-1), where the
higher the score, the more likely an observation is to be "active". My
data look like this:
Group1:
score1 = [0.56, 0.34, 0.42, 0.12, 0.08,