color and clarity are ordered factors, so sparse.model.matrix is generating orthogonal-polynomial contrasts (see ?contr.poly). This is by design ... what are you trying to do? Are you interested in fac2sparse?
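
In case it helps, here is a minimal sketch of the difference. The data frame `d` and its `size` column are made up for illustration (they stand in for the ordered columns of diamonds), and dropping the ordering with `factor(..., ordered = FALSE)` is only one possible way to get indicator columns, assuming that is what you want.

```r
library(Matrix)

# Hypothetical toy data: one ordered factor and one numeric column.
d <- data.frame(
  size  = factor(c("S", "M", "L", "M", "S", "L"),
                 levels = c("S", "M", "L"), ordered = TRUE),
  price = c(1, 2, 3, 2.5, 1.5, 3.5)
)

# Ordered factor: the columns come from contr.poly
# (.L = linear term, .Q = quadratic term), not from the levels.
poly_names <- colnames(sparse.model.matrix(~ size, d))

# Dropping the ordering (level order is kept) gives treatment contrasts,
# i.e. one indicator column per non-reference level.
d_plain <- d
d_plain$size <- factor(d_plain$size, ordered = FALSE)
plain_names <- colnames(sparse.model.matrix(~ size, d_plain))

# fac2sparse() builds the plain indicator matrix directly:
# one row per level, one column per observation.
ind <- fac2sparse(d$size)

print(poly_names)
print(plain_names)
print(dim(ind))
```

Note that `fac2sparse()` returns the transpose of the usual indicator layout, so `t(ind)` would be needed to get one row per observation.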

On 18-02-07 11:00 PM, Dario Strbenac wrote:
> Good day,
>
> Sometimes, sparse.model.matrix outputs a dgCMatrix which has column names
> consisting of factor levels that were not in the original dataset. The first
> factor appears to be correctly transformed, but the following factors don't.
> For example:
>
> diamonds <- as.data.frame(ggplot2::diamonds)
>> colnames(sparse.model.matrix(~ . -1, diamonds))
>  [1] "carat"      "cutFair"    "cutGood"    "cutVery Good" "cutPremium"
>      "cutIdeal"   "color.L"    "color.Q"    "color.C"      "color^4"
>      "color^5"
> [12] "color^6"    "clarity.L"  "clarity.Q"  "clarity.C"    "clarity^4"
>      "clarity^5"  "clarity^6"  "clarity^7"  "depth"        "table"
>      "price"
> [23] "x"          "y"          "z"
>
> The variables color and clarity don't have factor levels which have been
> suffixed to them in the transformed matrix. The values in those columns are
> also wrong. Changing the Ord.factor columns into simply being factors fixes
> the problem.
>
>> diamonds[, "cut"] <- factor(as.character(diamonds[, "cut"]))
>> diamonds[, "color"] <- factor(as.character(diamonds[, "color"]))
>> diamonds[, "clarity"] <- factor(as.character(diamonds[, "clarity"]))
>
>> colnames(sparse.model.matrix(~ . -1, diamonds)) # No more invented factor levels.
>  [1] "carat"      "cutFair"    "cutGood"    "cutIdeal"    "cutPremium"
>      "cutVery Good" "colorE"   "colorF"     "colorG"      "colorH"
> [11] "colorI"     "colorJ"     "clarityIF"  "claritySI1"  "claritySI2"
>      "clarityVS1" "clarityVS2" "clarityVVS1" "clarityVVS2" "depth"
> [21] "table"      "price"      "x"          "y"           "z"
>
> Can it be made to work correctly for both plain and ordered factors?
>
>> sessionInfo()
> R Under development (unstable) (2018-02-06 r74231)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> other attached packages:
> [1] Matrix_1.2-12
>
> loaded via a namespace (and not attached):
>  [1] colorspace_1.3-2 scales_0.5.0    compiler_3.5.0  lazyeval_0.2.1
>  [5] plyr_1.8.4       pillar_1.1.0    gtable_0.2.0    tibble_1.4.2
>  [9] Rcpp_0.12.15     ggplot2_2.2.1   grid_3.5.0      rlang_0.1.6
> [13] munsell_0.4.3    lattice_0.20-35
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel