On Fri, Mar 29, 2013 at 12:57 AM, Nelle Varoquaux <[email protected]> wrote:
> We need to find a uniform way over the whole scikit to indicate missing
> data. Hence, 0 cannot be how missing data is spotted.
> A solution would be to use "Nan" but it is not very satisfying either, as
> this could lead to think there is missing data, while there isn't.
Encoding missing values with np.nan doesn't scale to very high-dimensional problems where most values are missing: a dense array has to store every entry, missing or not.

Personally, for encoding missing data, I just use sparse matrices: stored entries are the observed values, and unstored entries are missing. Values which are actually zero can be stored explicitly in the .data attribute.

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
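A minimal sketch of the sparse-matrix convention Mathieu describes, assuming SciPy's CSR format: every stored entry (including explicit zeros in `.data`) is an observed value, and every unstored entry is missing. The helpers `is_observed` and `observed_column_means` are hypothetical illustrations, not scikit-learn or SciPy API.

```python
import numpy as np
import scipy.sparse as sp

# Stored entries = observed values; unstored entries = missing.
# Row 0 observes column 0 (1.0) and column 2 (a real zero, stored explicitly).
# Row 1 observes only column 1; row 2 observes columns 0 and 2.
data = np.array([1.0, 0.0, 3.0, 5.0, 4.0])
indices = np.array([0, 2, 1, 0, 2])
indptr = np.array([0, 2, 3, 5])
X = sp.csr_matrix((data, indices, indptr), shape=(3, 3))


def is_observed(X, i, j):
    """Hypothetical helper: True if entry (i, j) is stored in the CSR structure."""
    row_cols = X.indices[X.indptr[i]:X.indptr[i + 1]]
    return bool(np.any(row_cols == j))


def observed_column_means(X):
    """Hypothetical helper: per-column mean over stored (observed) entries only.

    Explicit zeros count as observations; columns with no stored entries get NaN.
    """
    sums = np.asarray(X.sum(axis=0)).ravel()            # sum of stored values per column
    counts = np.bincount(X.indices, minlength=X.shape[1])  # stored entries per column
    means = np.full(X.shape[1], np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means
```

Both `X[0, 2]` and `X[1, 2]` read back as 0.0; only the stored structure tells an observed zero from a missing value. The convention therefore depends on keeping explicit zeros in `X.data`: calling `X.eliminate_zeros()` would silently turn observed zeros into missing entries, and some sparse arithmetic may drop them too.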
