Re: [scikit-learn] Creating dataset

2020-11-08 Thread Nicolas Hug
load_iris() reads a csv file, and then retrieves/sets some other info like the feature names and a description of the dataset (which comes from another file). Then it packs everything into a Bunch object, which is basically a fancy dict: https://github.com/scikit-learn/scikit-learn/blob/master/
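To illustrate the "fancy dict" remark, here is a minimal sketch that packs hand-made data into a Bunch. `sklearn.utils.Bunch` is the class scikit-learn ships; the feature names, the description string, the second row, and the "M2" label are invented for illustration (only the first row and "M1" appear in this thread).

```python
from sklearn.utils import Bunch

# Illustrative values only; the second row and the "M2" label are invented.
my_data = Bunch(
    data=[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
    target=["M1", "M2"],
    feature_names=["f0", "f1", "f2", "f3"],  # hypothetical names
    DESCR="Toy dataset used to show how a Bunch behaves.",
)

# A Bunch is a dict subclass whose keys can also be read as attributes.
assert isinstance(my_data, dict)
assert my_data["data"] is my_data.data
print(sorted(my_data.keys()))
```

Nothing here depends on where the data came from, so the same object could be filled from any csv reader.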

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Mahmood Naderan
> You need to understand what the different statements are doing; just as you need to understand what processing you apply on your data (whether it's preprocessing or learning) to properly use any machine learning tool.
I know, but the problem is that the csv file of the iris doesn't have such

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Matthieu Brucher
data_file["data"], this works only if you have such a column as well. load_csv can perfectly do what you need, but you have to adapt the script to what you have in the csv (which is something only you know!). You need to understand what the different statements are doing; just as you need to unders

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Mahmood Naderan
Thanks for the replies.
> I'd recommend just reading that csv file with e.g. pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), and then just use the dataframe as input to scikit-learn utilities (you may need to separate the features X from the target y).

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Alex Levin
Guillaume, I only meant that he can do it locally :) submitting that would be a bad idea
On Sun, Nov 8, 2020 at 2:03 PM Guillaume Lemaître wrote:
> I would not recommend the solution of Alex. Do not modify the scikit-learn source code.
> Write it in your own Python module.
>
> But most probabl

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Guillaume Lemaître
I would not recommend the solution of Alex. Do not modify the scikit-learn source code. Write it in your own Python module.
But most probably the solution of Nicolas should be enough for 99% of the use-cases.
Cheers,
On Sun, 8 Nov 2020 at 12:41, Alex Levin wrote:
> Hi Mahmood
> You can add you
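A rough sketch of what "write it in your own Python module" could look like, using only the standard library. The function name `load_my_data` is borrowed from Alex's message; the file layout (no header row, numeric features first, class label in the last column) is an assumption based on the sample row in the original question, and the second row written to the demo file is invented.

```python
import csv
import os
import tempfile


def load_my_data(path):
    """Read a header-less CSV: numeric features first, class label last."""
    X, y = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            X.append([float(v) for v in row[:-1]])
            y.append(row[-1])
    return X, y


# Demo with a throwaway file; the second row is invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "my_data.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("0.1,0.2,0.3,0.4,M1\n0.5,0.6,0.7,0.8,M2\n")
    X, y = load_my_data(demo_path)

print(X, y)
```

Returning an (X, y) pair mirrors what `load_iris(return_X_y=True)` gives back, and it keeps the loader outside the scikit-learn source tree.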

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Alex Levin
Hi Mahmood,
You can add your data set to `datasets/data` and then implement a `load_my_data` function in `datasets/_base.py`. Also register it in `datasets/__init__.py`.
On Sun, Nov 8, 2020 at 1:24 PM Mahmood Naderan wrote:
> Hi,
> I have created an input file similar to iris data set. That is someth

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Nicolas Hug
Mahmood, From what I understand your dataset is stored in a csv file. I'd recommend just reading that csv file with e.g. pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), and then just use the dataframe as input to scikit-learn utilities (you may need t
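A minimal sketch of this suggestion, assuming the file has no header row and the class label sits in the last column (as in the sample row from the original question). An in-memory string stands in for the real file, and the second and third rows are invented for illustration.

```python
import io

import pandas as pd

# Stand-in for the file on disk; only the first row comes from the thread.
csv_text = "0.1,0.2,0.3,0.4,M1\n0.5,0.6,0.7,0.8,M2\n0.9,1.0,1.1,1.2,M1\n"

# header=None because the file has no header row.
df = pd.read_csv(io.StringIO(csv_text), header=None)

X = df.iloc[:, :-1]  # every column except the last: the features
y = df.iloc[:, -1]   # last column: the class label

print(X.shape, y.tolist())
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the csv; X and y can then be passed to an estimator's `fit`.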

[scikit-learn] Creating dataset

2020-11-08 Thread Mahmood Naderan
Hi,
I have created an input file similar to iris data set. That is something like this:
    0.1,0.2,0.3,0.4,M1
    ...
I want to know how I can create my own dataset similar to the following lines?
    from sklearn.datasets import load_iris
    iris = load_iris()
Regards,
Mahmood