Hi all,

Hopefully a simple question:

Imputer with axis=0 drops four columns whose first element is NaN, even though not all of the values in these columns are NaN.

I can't reproduce this with a minimal example. Any ideas what could be going wrong?

Please see below for the code I used to debug.

Thanks,
Fabian

   X.shape
   (385186, 223)

   imp = Imputer(axis=0, verbose=5)

   imp.fit(X)
   Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean',
   verbose=5)

   X[:,72]
   array([nan, 0.0166205 , 0.00619835, ..., 0.00189036, 0.00788955, 0.00378583])

   X[:,73]
   array([nan, 0.31578947, 0.13636364, ..., 0.08695652, 0.30769231, 0.1627907 ])

   X.shape
   (385186, 223)

   np.isnan(X).all(axis=0).any()
   False

   X_imputed = imp.transform(X)
   
   /home/user/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py:347:
   UserWarning: Deleting features without observed values: [ 72  73 131 132]
     "observed values: %s" % missing)

   X_imputed.shape
   (385186, 219)

   # There are also NaNs in other columns
   np.isnan(X).any(axis=0).sum()
   7

   np.isnan(X).any(axis=1).sum()
   107181

   np.isnan(X).all(axis=0).sum()
   0

   np.isnan(X).all(axis=1).sum()
   0
