On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> The other missing bit is dataframes. R and Spark have them in different
> forms but Mahout largely ignores the issue of real world object ids.

Mahout only supports matrices and vectors, not data frames. Data frames imply a mix of data types that have yet to be converted to numerical data before an algebraic algorithm can consume them (in R, this is usually done via a formula). Unfortunately, Mahout has no extension for formulas.

As for data frames, native data frames (Spark data frames specifically) usually work reasonably well for vectorizing non-numerical data.

Distributed matrices indeed do not support column labels, and row labels are only quasi-supported: they share their nature with the unordered row index for transposition purposes. That is, one can either have row labels and limited transposition semantics, or one can have integer labels interpreted as a column index for transposition purposes, but not both.

Another way is to use Mahout NamedVectors for row labeling, but this is not supported consistently by the elementary solvers.
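To make the vectorization point concrete, here is a rough sketch in plain Python of what "converting a data frame to numerical data" amounts to. It is only an illustration: `vectorize`, the sample records, and the one-hot encoding scheme are all assumptions made up for this example, not Mahout, R, or Spark APIs. Row labels are kept in a separate list alongside the matrix, since the matrix itself only carries numbers.

```python
# Hypothetical sketch: none of these names come from Mahout, R, or Spark.

def vectorize(rows, numeric_cols, categorical_cols):
    """Turn mixed-type records into a dense numeric matrix (list of lists).

    Numeric columns are copied as floats; each categorical column is
    one-hot encoded, with one matrix column per distinct value.
    """
    # Column names: numeric columns first, then one per (column, level) pair.
    col_names = list(numeric_cols)
    for c in categorical_cols:
        levels = sorted({r[c] for r in rows})
        col_names.extend(f"{c}={v}" for v in levels)
    col_index = {name: j for j, name in enumerate(col_names)}

    matrix = []
    for r in rows:
        vec = [0.0] * len(col_names)
        for c in numeric_cols:
            vec[col_index[c]] = float(r[c])
        for c in categorical_cols:
            vec[col_index[f"{c}={r[c]}"]] = 1.0
        matrix.append(vec)
    return matrix, col_names


# Made-up sample records standing in for a small data frame.
rows = [
    {"id": "u1", "age": 34, "country": "US"},
    {"id": "u2", "age": 27, "country": "DE"},
    {"id": "u3", "age": 45, "country": "US"},
]

# Real-world object ids live outside the matrix, indexed by row position.
row_labels = [r["id"] for r in rows]
matrix, col_names = vectorize(rows, ["age"], ["country"])
```

In this sketch the column names and row labels are side structures that the caller has to carry around and keep aligned with the matrix, which is roughly the bookkeeping a data frame or a formula mechanism would otherwise do.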