>>>>> Trevor Hastie <has...@stanford.edu> >>>>> on Thu, 19 Mar 2015 16:03:38 -0700 writes:
> Hi Martin > I got stung by this last week. > glmnet produces a coefficient matrix of class “dgCMatrix” > If a predictor matrix was created using sparseMatrix as follows, > one gets unexpected results, as this simple example shows. > My fix was easy (I always convert the predictor matrix to class “dgCMatrix” now) > Trevor >> y=Matrix(diag(4)) Considerably faster (for larger n): Diagonal(4) if you want a sparse matrix directly, there are .sparseDiagonal() and .symDiagonal() function >> y > 4 x 4 diagonal matrix of class "ddiMatrix" > [,1] [,2] [,3] [,4] > [1,] 1 . . . > [2,] . 1 . . > [3,] . . 1 . > [4,] . . . 1 there's no problem with 'y' which is a "diagonalMatrix" and only needs O(n) storage rather than diag(n), right ? >> z=sparseMatrix(1:4,1:4) >> z > 4 x 4 sparse Matrix of class "ngCMatrix" > [1,] | . . . > [2,] . | . . > [3,] . . | . > [4,] . . . | >> beta=as(Matrix(1:4),"dgCMatrix") >> y%*%beta > 4 x 1 sparse Matrix of class "dgCMatrix" > [1,] 1 > [2,] 2 > [3,] 3 > [4,] 4 >> z%*%beta > 4 x 1 sparse Matrix of class "ngCMatrix" > [1,] | > [2,] | > [3,] | > [4,] | >> Yes, the last one is what I consieder bogous. Thank you, Trevor, for the feedback! Martin >> On Mar 19, 2015, at 3:02 PM, Martin Maechler <maech...@stat.math.ethz.ch> wrote: >> >> This is a Request For Comment, also BCCed to 390 package maintainers >> of reverse dependencies of the Matrix package. >> >> Most users and package authors working with our 'Matrix' package will >> be using it for numerical computations, and so will be using >> "dMatrix" (d : double precision) matrix objects M, and indirectly, e.g., for >> M >= c will also use "lMatrix" (l: logical i.e. TRUE/FALSE/NA). >> All the following is **not** affecting those numerical / logical >> computations. >> >> A few others will know that we also have "pattern" matrices (purely >> binary: TRUE/FALSE, no NA) notably sparse ones, those "ngCMatrix" etc, >> all starting with "n" (from ``patter[n]``) which do play a prominent >> role in the internal sparse matrix algorithms, notably of the >> (underlying C code) CHOLMOD library in the so-called "symbolic" >> cholesky decomposition and other such operations. Another reason you >> may use them because they are equivalent to incidence matrices of >> unweighted (directed or undirected) graphs. >> >> Now, as the subject says, I'm bringing up the topic of what should >> happen when these matrices appear in matrix multiplications. >> Somewhat by design, but also partly by coincidence, the *sparse* >> pattern matrices multiplication in the Matrix package mostly builds on >> the CHOLMOD library `cholmod_ssmult()` function which implements >> "Boolean arithmetic" for them, instead of regular arithmetic: >> "+" is logical "or" >> "*" is logical "and". >> Once we map TRUE <-> 1 and FALSE <-> 0, the only difference between >> boolean and regular arithmetic is that "1+1 = 1" in the (mapped) >> boolean arithmetic, because "TRUE | TRUE" is TRUE in original logic. >> >> The drawback of using the boolean arithmetic here is the "clash" with >> the usual numeric arithmetic, and arithmetic in R where logical is >> coerced to integer (and that to "double") when certain numerical >> functions/operations are used. >> >> A more severe problem --- which I had not been aware of until >> relatively recently -- is the fact that the CHOLMD function >> cholmod_ssdmult(A, B) >> treats *both* A and B as "pattern" as soon as one of them is a >> (sparse) pattern matrix. >> And this is - I say - in clear contrast to what R users would expect: >> If you multiply a numeric with a "kind of logical" matrix (a pattern >> one), you will expect that the >> TRUE/FALSE matrix will be treated as a 1/0 matrix because it is >> combined with a numeric matrix. >> So we could say that in this case, the Matrix package behavior is >> clearly bugous .... but still it has been the behavior for the last 10 >> years or so. >> >> RFC 1: "Change 1": >> I currently propose to change this behavior for the upcoming release >> of Matrix (version 1.2-0), though I have no idea if dependent >> packages would partly fail their checks or otherwise have changed >> behavior subsequently. >> The change seems sensible, since I think if your package relied on >> this behavior, it was inadvertent and accidental. >> Still you may differ in your opinion about this change nr.1 >> >> RFC 2: "Change 2": >> This change would be more radical, and something I would not plan for >> the upcoming release of Matrix, but possibly for an update say one or >> two months later or so: It concerns the matrix products when *both* >> matrices are pattern. A situation where the boolean arithmetic may >> really make sense and where indeed packages may have depended on the >> current behavior ("T + T |--> T"). ... although that is currently >> only used for *sparse* pattern matrices, not for dense ones. >> >> Further, it may still seem surprising that matrix multiplication does >> not behave numerically for a pair of such matrices, and by the >> principle of "least surprise" we should provide the boolean arithmetic >> matrix products in another way than by the standard %*%, >> crossprod() and tcrossprod() functions. >> So one possibility could be to change the standard functions to behave >> numerically, >> and e.g., use %&% (replace the numeric "*" by a logical "&") and >> crossprod(A,B, boolean=TRUE), tcrossprod(A,B, boolean=TRUE) >> for the three boolean arithmetic version of matrix multiplications. >> >> What do you think about this? I'm particularly interested to hear >> from authors and users of packages such as 'arules' which IIRC >> explicitly work with sparse pattern matrices. >> >> Thank you for your thoughts and creative ideas, >> Martin Maechler, ETH Zurich > ---------------------------------------------------------------------------------------- > Trevor Hastie has...@stanford.edu <mailto:has...@stanford.edu> > Professor, Department of Statistics, Stanford University > Phone: (650) 725-2231 Fax: (650) 725-8977 > URL: http://www.stanford.edu/~hastie <http://www-stat.stanford.edu/~hastie> > address: room 104, Department of Statistics, Sequoia Hall > 390 Serra Mall, Stanford University, CA 94305-4065 > -------------------------------------------------------------------------------------- ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel