Re: [scikit-learn] suggested machine learning algorithm

Алексей Драль Sun, 02 Oct 2016 15:54:54 -0700

2016-10-02 13:23 GMT+01:00 Thomas Evangelidis <[email protected]>:

>
>
> On 1 October 2016 at 20:48, Алексей Драль <[email protected]> wrote:
>
>> Hi Thomas,
>>
>> What score do you get on the training set?
>>
>> There is no silver bullet, but there is a fairly common technique you can
>> use to check whether your algorithm is appropriate. Look at the gap
>> between the "train" and "validation" scores on the learning curves (
>> example
>> <http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html#example-model-selection-plot-learning-curve-py>).
>> If you see a big gap, the model is overfitting and you can reduce its
>> complexity (reduce interaction parameter / number of variables
>> / iterations / ...). If you see a small gap but both scores are low, you
>> can try to increase model complexity to fit your data better.
>> 
>>
> Hi Алексей,
>
> Are the "Training examples" in the learning curves the number of
> observations used for training? Don't you think my dataset (42
> observations) is too small for that technique?
>


Yes, it is a really tiny dataset =). You don't necessarily need to plot the
curves against the number of training observations. For instance, you can
plot the train and validation scores against the number of iterations.
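
To make that concrete, here is a minimal sketch of a train/validation
comparison over the number of iterations. Everything specific in it is an
assumption for illustration: the data is synthetic (42 samples from
make_regression, not your dataset), the model is a gradient boosting
regressor, and "iterations" is taken to mean its n_estimators parameter.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import validation_curve

# Synthetic stand-in for a 42-observation dataset (assumption, not real data).
X, y = make_regression(n_samples=42, n_features=10, noise=10.0, random_state=0)

# Number of boosting iterations to compare (illustrative values).
n_iterations = [5, 10, 25, 50, 100]

# For each value, fit on the training folds and score both the training
# folds and the held-out fold. Result shape: (len(n_iterations), n_folds).
train_scores, valid_scores = validation_curve(
    GradientBoostingRegressor(random_state=0),
    X,
    y,
    param_name="n_estimators",
    param_range=n_iterations,
    cv=6,
    scoring="r2",
)

train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
gap = train_mean - valid_mean

for n, tr, va, g in zip(n_iterations, train_mean, valid_mean, gap):
    print(f"n_estimators={n:3d}  train R^2={tr:.2f}  "
          f"validation R^2={va:.2f}  gap={g:.2f}")
```

If the gap keeps growing while the validation score stalls or drops, that
is the point where fewer iterations (or a simpler model) is worth trying.
With this few observations the per-fold scores are noisy, so treat the
numbers as a rough guide only.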


>
>
>
>> Moreover, I see you have a tiny dataset and use a 50/50 split. I presume
>> that you will train the "production" model on the whole available dataset.
>> In that case, I suggest using more data for training and something close to a
>> LOO
>> <http://scikit-learn.org/stable/modules/cross_validation.html#leave-one-out-loo>
>> approach
>> to better estimate your predictive quality. But be really cautious with
>> cross-validation, as you can easily overfit your data.
>>
>>
>>
>>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
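
A rough sketch of the leave-one-out idea quoted above, in case it helps.
The assumptions are mine: synthetic data with 42 samples, a Ridge
regressor as a placeholder model, and plain LeaveOneOut rather than an
"almost LOO" scheme. Since each test fold holds a single observation, R^2
cannot be computed per fold, so the sketch pools the out-of-fold
predictions and scores them once.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for a 42-observation dataset (assumption, not real data).
X, y = make_regression(n_samples=42, n_features=10, noise=10.0, random_state=0)

model = Ridge(alpha=1.0)  # placeholder model, chosen only for illustration
loo = LeaveOneOut()

# Each observation is predicted by a model trained on the other 41.
y_pred = cross_val_predict(model, X, y, cv=loo)

n_splits = loo.get_n_splits(X)
mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)

print("number of LOO splits:", n_splits)
print("LOO mean absolute error:", mae)
print("LOO R^2 on pooled predictions:", r2)

# The "production" model is then fit on the whole dataset.
final_model = model.fit(X, y)
```

The caution about overfitting still applies: if many models or
hyperparameters are compared against the same LOO score, the best score
will look better than the model really is.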


-- 
Yours sincerely,
Alexey A. Dral
https://www.linkedin.com/in/alexey-dral

