Hi Cory,

The lack of sample_weight support in sparse solvers is a known issue; see
https://github.com/scikit-learn/scikit-learn/issues/1190

In the meantime, I see two solutions.

As described in the above issue, one solution is to multiply each x_i and
y_i in your training set by the square root of its sample weight. This is
exactly equivalent to using sample weights, provided no intercept is
fitted (fit_intercept=False), and it lets you use fast sparse solvers like
"sparse_cg" or "lsqr".

The second solution is to use SGDRegressor(loss="squared_loss") (the loss
is named "squared_error" in recent scikit-learn releases), which should
readily support sample_weight.

HTH,
Mathieu

On Wed, Apr 2, 2014 at 9:18 AM, Cory Dolphin <[email protected]> wrote:

> Hello,
>
> I am trying to perform ridge regression on a relatively large data set:
> 70 million examples and 24 million very sparse features.
>
> E.g. I have created an X matrix with dimensions (73725855, 24652292), an
> associated y vector with dimensions (73725855,), and a sample_weights
> vector with identical dimensions ((73725855,)).
>
> In this case, the y vector is a rating, and the sample_weights describe
> how many times a given rating occurred.
>
> I need to use one of the sparse solvers, as the data set does not fit in
> memory as a dense matrix; however, it seems that none of the sparse
> solvers accept a sample_weights vector.
>
> Does anyone have experience with weighted ridge regression on large sparse
> matrices?
>
> I am new to the world of machine learning, so please forgive me for any
> vocabulary mistakes!
>
> Thanks,
> Cory
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
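For reference, the row-rescaling trick from the reply can be sketched as
below. This is only an illustration under stated assumptions: no intercept
is fitted, the data are small and synthetic (made up here), and the helper
name weighted_ridge_sparse is hypothetical. It calls SciPy's sparse LSQR
directly rather than scikit-learn's Ridge, and checks the result against
the closed-form weighted ridge solution on a dense copy.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr


def weighted_ridge_sparse(X, y, sample_weight, alpha=1.0):
    """Weighted ridge without intercept, via row rescaling (hypothetical helper)."""
    sqrt_w = np.sqrt(sample_weight)
    # Scale each row x_i by sqrt(w_i); the product stays sparse.
    X_scaled = sp.diags(sqrt_w) @ X
    y_scaled = sqrt_w * y
    # lsqr minimises ||A c - b||^2 + damp^2 * ||c||^2, so damp = sqrt(alpha).
    result = lsqr(X_scaled, y_scaled, damp=np.sqrt(alpha), atol=1e-12, btol=1e-12)
    return result[0]


# Small synthetic problem (assumed data) to check the equivalence.
rng = np.random.default_rng(0)
X = sp.random(200, 30, density=0.1, format="csr", random_state=0)
y = rng.normal(size=200)
w = rng.integers(1, 5, size=200).astype(float)  # e.g. rating counts
alpha = 2.0

coef = weighted_ridge_sparse(X, y, w, alpha)

# Reference: closed-form weighted ridge, (X^T W X + alpha I)^-1 X^T W y.
Xd = X.toarray()
gram = Xd.T @ (w[:, None] * Xd) + alpha * np.eye(Xd.shape[1])
coef_ref = np.linalg.solve(gram, Xd.T @ (w * y))

print(np.allclose(coef, coef_ref, atol=1e-6))
```

The same rescaled X and y could instead be passed to
Ridge(fit_intercept=False, solver="lsqr"). With integer weights such as
counts, the result also matches what you would get by repeating each row
that many times.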
