Re: [scikit-learn] random forests using grouped data

2016-12-01 Thread Vlad Niculae
I don't think there are any such estimators in scikit-learn directly, but the model selection machinery is there to help. Check out GroupKFold [1] so you can do cross-validation after concatenating all the samples, while ensuring that training and validation groups are separate. The setup of
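
A minimal sketch of the GroupKFold suggestion above, not the poster's own code. The data are synthetic, and the group count, group sizes, and forest settings are assumptions chosen only for illustration. Samples from all groups are concatenated into one matrix, and a `groups` array keeps every group entirely on one side of each split:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.RandomState(0)

# Hypothetical data: 4 groups of 30 samples, two normalized scores per sample.
n_groups, n_per_group = 4, 30
X = rng.rand(n_groups * n_per_group, 2)
# Synthetic labels: call a sample "active" (1) when its mean score is high.
y = (X.mean(axis=1) > 0.5).astype(int)
# Group membership of each concatenated sample (0, 0, ..., 1, 1, ...).
groups = np.repeat(np.arange(n_groups), n_per_group)

cv = GroupKFold(n_splits=n_groups)

# Check that no group appears in both the training and validation part of a split.
for train_idx, test_idx in cv.split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=cv)
print("Per-fold accuracy:", scores)
```

With one fold per group this amounts to leave-one-group-out validation; fewer folds would hold out several groups at a time.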

Re: [scikit-learn] random forests using grouped data

2016-12-01 Thread Brown J.B.
Hello Thomas, I don't personally know of any algorithm that works on collections of groupings, but why not first test a simple control model: can you achieve a satisfactory model by simply concatenating all 48 scores per sample and building a forest the standard way? If not, what context
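
One way to read the control model suggested above, as a rough sketch only: it assumes every sample can be laid out as a single row of 48 scores, which the thread does not confirm. The data are synthetic, and the sample count and forest settings are hypothetical. The out-of-bag score gives a quick first estimate without a separate validation split:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Hypothetical layout: 200 samples, each described by 48 concatenated scores.
n_samples, n_scores = 200, 48
y = rng.randint(0, 2, size=n_samples)  # 1 = "active", 0 = "inactive"
# Synthetic scores in [0, 1): actives are shifted slightly upward.
X = rng.rand(n_samples, n_scores) * 0.8 + 0.2 * y[:, None]

# Standard forest on the concatenated features, with out-of-bag scoring enabled.
baseline = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
baseline.fit(X, y)
oob = baseline.oob_score_
print("Out-of-bag accuracy of the control model:", oob)
```

If this baseline is already satisfactory, a group-aware method may not be needed; if it is not, the out-of-bag score gives a reference point for comparing alternatives.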

Re: [scikit-learn] random forests using grouped data

2016-12-01 Thread Thomas Evangelidis
Sorry, the previous email was incomplete. Below is what the grouped data look like:

Group1:
score1 = [0.56, 0.34, 0.42, 0.12, 0.08, 0.21, ...]
score2 = [0.34, 0.27, 0.24, 0.05, 0.13, 0.14, ...]
y = [1, 1, 1, 0, 0, 0, ...]  # 1 indicates "active" and 0 "inactive"

Group2:
score1 = [0.34, 0.38, 0.48,

[scikit-learn] random forests using grouped data

2016-12-01 Thread Thomas Evangelidis
Greetings, I have grouped data that are divided into actives and inactives. The features are two different types of normalized scores (0-1), where the higher the score, the more probable it is that an observation is an "active". My data look like this:

Group1:
score1 = [0.56, 0.34, 0.42, 0.12, 0.08,