Splitting the data into training and test sets is needed with any machine
learning model, not just linear regression (whether or not it is fitted by
least squares). The idea is that you want to evaluate the performance of
your model (prediction + scoring) on a portion of the data that you did not
use for training. A score computed on the training data itself is overly
optimistic, because the model has already seen those samples.
You'll find more details in the user guide:
https://scikit-learn.org/stable/modules/cross_validation.html
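
To make this concrete, here is a minimal sketch of holding out a test set
for an ordinary least squares fit. The data are synthetic and made up for
illustration (they are not from your problem), and the sizes, noise level
and variable names are my own assumptions. With many features relative to
the number of samples, the R^2 on the training data will typically look
much better than the R^2 on the held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 60 samples, 30 features,
# but only the first feature actually influences the target.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 30))
y = X[:, 0] + rng.normal(scale=1.0, size=60)

# Hold out 25% of the samples; the model never sees them during fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Ordinary least squares, fitted on the training portion only.
model = LinearRegression().fit(X_train, y_train)

# score() returns R^2. Compare the two numbers: the training score is
# computed on data the model was fitted to, the test score is not.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print("R^2 on training data:", train_r2)
print("R^2 on held-out data:", test_r2)
```

If you fit on the entire dataset, you only ever get the first of those two
numbers, and it says little about how the model behaves on new data.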
Nicolas
On 5/31/19 8:54 PM, C W wrote:
Hello everyone,
I'm new to scikit-learn. I see that many scikit-learn tutorials
follow a workflow along the lines of
1) transform the data
2) split the data: train, test
3) instantiate the sklearn object and fit
4) predict and tune parameters
But linear regression is fitted by least squares, so I don't think a
train/test split is necessary. So, I guess I can just use the entire
dataset?
Thanks in advance!
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn