Hi Zoraida,

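The Imputer assumes that your data is a numeric numpy array, or
convertible to one. The string "NA" cannot be stored in a float array,
so the conversion fails inside `fit` before the Imputer ever compares
values against `missing_values`. Replace your "NA" strings with np.nan,
then use the Imputer with its default, `missing_values='NaN'`.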

It's easier to debug if you explicitly convert your data to a float
numpy array prior to feeding it to the pipeline.
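
As a rough sketch of that conversion, assuming your rows arrive as
pipe-delimited strings like the sample in your message (the helper name
`to_float_array` and the `missing_token` parameter are just illustrative,
not part of scikit-learn):

```python
import numpy as np


def to_float_array(rows, missing_token="NA"):
    """Build a 2-D float64 array, mapping missing_token to np.nan.

    `rows` is a list of lists whose items are numbers or numeric strings.
    Hypothetical helper for illustration only.
    """
    cleaned = [
        [np.nan if value == missing_token else float(value) for value in row]
        for row in rows
    ]
    return np.asarray(cleaned, dtype=np.float64)


# Sample line copied from the quoted message below.
line = "24881956.0|NA|1840.0|NA|NA|48.0|1.4|NA|-1.0|0.0|0.0|1.0"
rows = [line.split("|")]

X = to_float_array(rows)
# X is now a plain float array; the former "NA" cells hold np.nan,
# which is what the Imputer's default missing_values='NaN' looks for.
```

Doing this before the pipeline means any value that is neither a number
nor "NA" raises a ValueError in your own code, where it is easier to spot.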

Hope this helps,
Vlad

On Thu, Sep 25, 2014 at 5:41 PM, ZORAIDA HIDALGO SANCHEZ
<zoraida.hidalgosanc...@telefonica.com> wrote:
> Hi all,
>
> I am having problems when trying to deal with missing values. I am using
> Imputer like this:
>
> Pipeline([('imputerNA', Imputer(missing_values='NA', strategy='mean',
> axis=0, verbose=4)), ('minmax', MinMaxScaler())])
>
> My data looks like this:
>
> 24881956.0|NA|1840.0|NA|NA|48.0|1.4|NA|-1.0|0.0|0.0|1.0
>
> and I am getting this exception:
>
>
>
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> site-packages/sklearn/pipeline.py", line 119, in _pre_transform
>     Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> site-packages/sklearn/base.py", line 429, in fit_transform
>     return self.fit(X, y, **fit_params).transform(X)
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> site-packages/sklearn/preprocessing/imputation.py", line 181, in fit
>     X = atleast2d_or_csc(X, dtype=np.float64, force_all_finite=False)
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> site-packages/sklearn/utils/validation.py", line 154, in atleast2d_or_csc
>     force_all_finite)
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> site-packages/sklearn/utils/validation.py", line 142, in
> _atleast2d_or_sparse
>     force_all_finite=force_all_finite)
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> site-packages/sklearn/utils/validation.py", line 120, in array2d
>     X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> site-packages/numpy/core/numeric.py", line 460, in asarray
>     return array(a, dtype, copy=False, order=order)
> ValueError: could not convert string to float: NA
>
>
> It fails when it tries to convert X (which is a list of lists) into a
> numpy array. That is fair, because the elements of a numpy array must all
> be of the same type, and I have both strings and floats.
>
> Does it make sense?
>
> Thanks in advance,
>
> Zoraida.-
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
