On 1 October 2016 at 20:48, Алексей Драль <[email protected]> wrote:
> Hi Thomas,
>
> What quality do you get on the training set?
>
> There is no silver bullet, but there is a quite common technique you can
> use to find out whether you are using an appropriate algorithm. Take a
> look at the difference between the "train" and "validation" quality on
> the learning curves (example:
> <http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html#example-model-selection-plot-learning-curve-py>).
> If you see a big gap, you can reduce the complexity of your model to
> overcome overfitting (reduce the interaction parameter / number of
> variables / iterations / ...). If you see a small gap, you can try to
> increase the model complexity to fit your data better.

Hi Алексей, are the "Training examples" in the learning curves the number of
observations used for training? Don't you think my dataset is kind of small
(42 observations) to use that technique?

> Moreover, I see you have a tiny dataset and use a 50/50 split. I presume
> that you will train the "production" model on the whole available dataset.
> In that case, I suggest you use more data for training and an almost-LOO
> <http://scikit-learn.org/stable/modules/cross_validation.html#leave-one-out-loo>
> approach to better estimate your predictive quality. But be really
> cautious with cross-validation, as you can easily overfit your data.
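
For concreteness, here is a minimal sketch of both suggestions (learning
curves and an almost-LOO evaluation) with a recent scikit-learn. The Ridge
model and the synthetic 42-observation dataset are only placeholders for my
actual estimator and data:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import learning_curve, LeaveOneOut, cross_val_score

    # Placeholder small dataset standing in for the real 42-observation one.
    X, y = make_regression(n_samples=42, n_features=5, noise=10.0, random_state=0)

    model = Ridge(alpha=1.0)  # placeholder estimator

    # Learning curve: train vs. validation score as the number of training
    # examples grows. A persistent large gap suggests overfitting; two low,
    # close curves suggest underfitting.
    train_sizes, train_scores, valid_scores = learning_curve(
        model, X, y, cv=5, train_sizes=np.linspace(0.2, 1.0, 5)
    )
    print("train sizes:", train_sizes)
    print("mean train score:", train_scores.mean(axis=1))
    print("mean validation score:", valid_scores.mean(axis=1))

    # Leave-one-out cross-validation: with a tiny dataset, train on almost
    # all observations in each fold and hold out a single point.
    loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                                 scoring="neg_mean_absolute_error")
    print("LOO mean absolute error:", -loo_scores.mean())

The mean-absolute-error scoring is just one choice that stays well defined
on single-observation test folds; the right metric depends on the problem.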
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
