Re: [Scikit-learn-general] Inputer, python list and strings

2014-09-25 Thread Vlad Niculae
Hi Zoraida, The Imputer assumes that your data is a numeric numpy array, or convertible to one. You should replace your string "NA" values with np.nan objects, then use the Imputer with the default, `missing_values='NaN'`. It's easier to debug if you explicitly convert your data to a float numpy
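To make the advice above concrete, here is a minimal sketch of the conversion step, using only NumPy. The sample rows, the helper names `to_float_array` and `fill_with_column_means`, and the `missing_token` parameter are all hypothetical, introduced only for illustration. The second helper is a hand-rolled stand-in for column-wise mean imputation, not the Imputer's actual implementation.

```python
import numpy as np

# Hypothetical pipe-delimited rows; the values are made up for illustration.
raw_rows = [
    "1.0|NA|3.0",
    "4.0|5.0|NA",
    "7.0|8.0|9.0",
]


def to_float_array(rows, missing_token="NA"):
    """Parse delimited text rows into a 2-D float array.

    Fields equal to `missing_token` become np.nan, so the result is a
    purely numeric array rather than a mix of strings and numbers.
    """
    parsed = [
        [np.nan if field == missing_token else float(field)
         for field in row.split("|")]
        for row in rows
    ]
    return np.array(parsed, dtype=float)


def fill_with_column_means(X):
    """Return a copy of X with each NaN replaced by its column's mean.

    The mean is taken over the non-missing entries of the column only.
    """
    col_means = np.nanmean(X, axis=0)
    filled = X.copy()
    nan_rows, nan_cols = np.where(np.isnan(filled))
    filled[nan_rows, nan_cols] = col_means[nan_cols]
    return filled


X = to_float_array(raw_rows)
X_filled = fill_with_column_means(X)
```

Converting explicitly first means any field that is neither a number nor the missing marker raises a `ValueError` at parse time, which is usually easier to track down than a failure inside a pipeline step.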

[Scikit-learn-general] Inputer, python list and strings

2014-09-25 Thread ZORAIDA HIDALGO SANCHEZ
Hi all, I am having problems when trying to deal with missing values. I am using Imputer like this:

Pipeline([('imputerNA', Imputer(missing_values='NA', strategy='mean', axis=0, verbose=4)), ('minmax', MinMaxScaler())])

My data looks like this:

24881956.0|NA|1840.0|NA|NA|48.0|1.4|NA|-1.0|0.0|