Hi Zoraida,

The Imputer assumes that your data is a numeric numpy array, or convertible to one. You should replace your string "NA" values with np.nan, then use the Imputer with the default, `missing_values='NaN'`.
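As a rough illustration of that replacement step, here is a minimal sketch using only numpy. The helper name `to_float_array` and the `missing_token` parameter are made up for this example, and the single row is the sample line from the original message, assumed to be split on "|":

```python
import numpy as np

# Sample row from the original message, split on the "|" delimiter.
raw_rows = [
    "24881956.0|NA|1840.0|NA|NA|48.0|1.4|NA|-1.0|0.0|0.0|1.0".split("|"),
]


def to_float_array(rows, missing_token="NA"):
    """Map the missing-value token to np.nan and return a 2-D float64 array.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    cleaned = [
        [np.nan if cell == missing_token else float(cell) for cell in row]
        for row in rows
    ]
    return np.asarray(cleaned, dtype=np.float64)


X = to_float_array(raw_rows)

# Every "NA" is now np.nan, and the array has a single numeric dtype.
print(X.dtype, X.shape)
print(np.isnan(X).sum(), "missing values")
```

An array built this way should be safe to hand to the pipeline, since the string-to-float conversion has already happened and any bad token would fail here, in your own code, rather than deep inside the library.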
It's easier to debug if you explicitly convert your data to a float numpy array before feeding it to the pipeline.

Hope this helps,
Vlad

On Thu, Sep 25, 2014 at 5:41 PM, ZORAIDA HIDALGO SANCHEZ <zoraida.hidalgosanc...@telefonica.com> wrote:
> Hi all,
>
> I am having problems when trying to deal with missing values. I am using
> Imputer like this:
>
> Pipeline([('imputerNA', Imputer(missing_values='NA', strategy='mean',
> axis=0, verbose=4)), ('minmax', MinMaxScaler())]))]
>
> My data looks like this:
>
> 24881956.0|NA|1840.0|NA|NA|48.0|1.4|NA|-1.0|0.0|0.0|1.0
>
> and I am getting this exception:
>
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/pipeline.py", line 119, in _pre_transform
>     Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/base.py", line 429, in fit_transform
>     return self.fit(X, y, **fit_params).transform(X)
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py", line 181, in fit
>     X = atleast2d_or_csc(X, dtype=np.float64, force_all_finite=False)
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.py", line 154, in atleast2d_or_csc
>     force_all_finite)
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.py", line 142, in _atleast2d_or_sparse
>     force_all_finite=force_all_finite)
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.py", line 120, in array2d
>     X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py", line 460, in asarray
>     return array(a, dtype, copy=False, order=order)
> ValueError: could not convert string to float: NA
>
> It fails when it tries to convert X (which is a list of lists) into a numpy
> array. That is fair, because the elements of a numpy array must all be of the
> same type, and I have strings and floats.
>
> Does it make sense?
>
> Thanks in advance,
>
> Zoraida.-
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general