Dear all,
I encountered a possible bug when using pandas dataframes with sklearn and
the preprocessing.StandardScaler in particular (I was applying some SVR,
and therefore used the scaler).
In short: when I pass a pandas dataframe to StandardScaler().transform,
the original dataframe is modified as well (although the default value for
'copy' is True, so the input should not be changed in place). This does not
happen when I first convert the dataframe column to a plain numpy array.
I made a notebook showing the problem with some dummy data:
http://nbviewer.ipython.org/5386690
So I have two questions:
1. What is the actual state of pandas support in sklearn? Does sklearn
'officially' say that it supports working with pandas dataframes?
2. Is this indeed a bug, or is there something stupid I am missing? Should I
file an issue for it?
In the meantime, I think I found the source of the bug. I also added it to
the notebook, but in short it comes down to this. When using a pandas
dataframe df, the expression:
    np.asarray(df) is df
evaluates to False, even though the resulting numpy array *is* a view on the
data of df and not a copy (when the dataframe contains only numerical
values). This is a problem in check_arrays(), where such an identity check
seems to decide whether a copy is made:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L215
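
To make the mechanism concrete without depending on a particular pandas or
sklearn version, here is a minimal sketch using numpy only. FrameLike is a
hypothetical stand-in for a numeric dataframe, and naive_maybe_copy and
safer_maybe_copy are illustrative helpers of my own, not the actual sklearn
code. The sketch assumes the copy decision is based on an identity check like
the one above:

```python
import numpy as np


class FrameLike:
    """Hypothetical stand-in for a numeric DataFrame (not pandas itself)."""

    def __init__(self, data):
        self._data = np.array(data, dtype=float)

    def __array__(self, dtype=None, copy=None):
        # Hand out the internal buffer without copying, as a frame holding
        # only numerical values may do.
        if dtype is None or np.dtype(dtype) == self._data.dtype:
            return self._data
        return self._data.astype(dtype)


def naive_maybe_copy(X, copy=True):
    """Copy only when asarray returned the very same object (identity check)."""
    X_arr = np.asarray(X)
    if copy and X_arr is X:
        X_arr = X_arr.copy()
    return X_arr


def safer_maybe_copy(X, copy=True):
    """Copy whenever a copy is requested, regardless of object identity."""
    X_arr = np.asarray(X)
    return X_arr.copy() if copy else X_arr


def center_in_place(X_arr):
    """In-place centering, similar to what a scaler does on its working array."""
    X_arr -= X_arr.mean(axis=0)
    return X_arr


frame = FrameLike([[1.0, 2.0], [3.0, 4.0]])
original = np.asarray(frame).copy()

# The identity check fails, yet the array shares memory with the frame.
identity_same = np.asarray(frame) is frame
shares = np.shares_memory(np.asarray(frame), frame._data)

center_in_place(naive_maybe_copy(frame, copy=True))
naive_changed_input = not np.array_equal(np.asarray(frame), original)

frame2 = FrameLike([[1.0, 2.0], [3.0, 4.0]])
center_in_place(safer_maybe_copy(frame2, copy=True))
safer_changed_input = not np.array_equal(np.asarray(frame2), original)

print(identity_same, shares, naive_changed_input, safer_changed_input)
```

With the identity-based check the input object ends up modified although
copy=True was requested, while copying unconditionally leaves it untouched,
at the cost of one extra copy in cases where np.asarray already made one.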
Kind regards,
Joris Van den Bossche
--
ir. Joris Van den Bossche - PhD Student
KERMIT, Research Unit Knowledge-based Systems
Department of Mathematical Modelling, Statistics and Bioinformatics
Ghent University - Faculty of Bioscience Engineering
Coupure links 653, 9000 Gent, Belgium
URL: http://www.biomath.ugent.be/