On Fri, Mar 29, 2013 at 12:57 AM, Nelle Varoquaux
<[email protected]> wrote:
> We need to find a uniform way over the whole scikit to indicate missing
> data. Hence, 0 cannot be how missing data is spotted.
> A solution would be to use NaN, but that is not very satisfying either, as
> it could suggest that data is missing when it isn't.

Encoding missing values with np.nan doesn't scale to very
high-dimensional problems with mostly missing values, since every
missing entry then has to be stored explicitly.
Personally, for encoding missing data, I just use sparse matrices:
entries that are not stored are treated as missing, and values which
are actually zero can be stored explicitly in the .data attribute.
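
A minimal sketch of what I mean, assuming scipy.sparse and a made-up
3x4 matrix (the indices and values are only for illustration). Stored
entries are the observed ones, including two explicit zeros; anything
not stored is read as missing:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical observed entries as (row, col, value) triples.
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 3, 0, 3])
vals = np.array([5.0, 0.0, 2.0, 0.0])  # two observed values are real zeros

# Building from triples keeps the explicit zeros in X.data.
X = csr_matrix((vals, (rows, cols)), shape=(3, 4))

# nnz counts stored entries, explicit zeros included.
n_observed = X.nnz
n_missing = X.shape[0] * X.shape[1] - n_observed

# Dense boolean mask of observed entries, for illustration only.
# Use the COO indices rather than X.nonzero(), which skips stored zeros.
coo = X.tocoo()
observed = np.zeros(X.shape, dtype=bool)
observed[coo.row, coo.col] = True
```

One caveat: calling X.eliminate_zeros() drops the explicit zeros, so
with this convention those entries would then be read as missing.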

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
