MLlib Logistic Regression performance relative to Mahout

2016-02-26 Thread raj.kumar
Hi, we are trying to port over some code that uses Mahout Logistic Regression to MLlib Logistic Regression, and our preliminary performance tests indicate a bottleneck. It is not clear to me if this is due to one of three factors:
o Comparing apples to oranges
o Inadequate tuning
o
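One way to rule out "inadequate tuning" before comparing timings is to cache the training data and set the MLlib optimizer parameters explicitly. The following is a minimal sketch. It assumes Spark 1.6-era MLlib (`LogisticRegressionWithLBFGS`, `LabeledPoint`) on the classpath. The object `LrTuningSketch`, the method `train`, and any parameter values you pass are hypothetical and do not come from the thread.

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object LrTuningSketch {
  // Hypothetical helper: trains a binary MLlib logistic regression with the
  // optimizer settings spelled out, so timing runs are reproducible.
  def train(
      data: RDD[LabeledPoint],
      numIterations: Int,
      convergenceTol: Double,
      regParam: Double): LogisticRegressionModel = {
    // L-BFGS makes repeated passes over the data; without caching, every
    // pass recomputes the input from its lineage.
    if (data.getStorageLevel == StorageLevel.NONE) {
      data.persist(StorageLevel.MEMORY_AND_DISK)
    }
    val lr = new LogisticRegressionWithLBFGS().setNumClasses(2)
    // The underlying LBFGS optimizer exposes the iteration cap, the
    // convergence tolerance and the L2 regularization strength.
    lr.optimizer
      .setNumIterations(numIterations)
      .setConvergenceTol(convergenceTol)
      .setRegParam(regParam)
    lr.run(data)
  }
}
```

With the iteration count and tolerance pinned, it is easier to check whether both libraries are solving a comparable problem before you attribute the gap to either one.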

Saving and Loading Dataframes

2016-02-25 Thread raj.kumar
Hi, I am using mllib. I use the ml vectorization tools to create the vectorized input dataframe for the ml/mllib machine-learning models with schema:
root
 |-- label: double (nullable = true)
 |-- features: vector (nullable = true)
To avoid repeated vectorization, I am trying to save and load
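Because the vectorized DataFrame already carries its schema, one option is to write it once to Parquet and read it back instead of repeating the vectorization. The following is a minimal sketch. It assumes Spark 1.6 (`SQLContext`, `DataFrame.write`) and that the vector column round-trips through Parquet as its user-defined type. The `VectorizedIo` object and its method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}

object VectorizedIo {
  // Hypothetical helper: persists the (label, features) DataFrame once.
  // Parquet stores the Spark SQL schema alongside the data, including the
  // vector user-defined type, so no re-vectorization is needed on load.
  def save(df: DataFrame, path: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(path)

  // Reads the saved DataFrame back with its stored schema.
  def load(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read.parquet(path)
}
```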

Dataset Encoders for SparseVector

2016-02-04 Thread raj.kumar
Hi, I have a DataFrame df with a column "feature" of type SparseVector that results from the ml library's VectorAssembler class. I'd like to get a Dataset of SparseVectors from this column, but when I call df.as[SparseVector], Scala complains that it doesn't know of an encoder for
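The implicit encoders do not cover `SparseVector`. One workaround is to supply a generic Kryo encoder explicitly and build the Dataset from the column's RDD. The following is a minimal sketch. It assumes Spark 1.6 (`Encoders.kryo`, `SQLContext.createDataset`) and `org.apache.spark.mllib.linalg` vectors, which is what VectorAssembler produces in that version. The `SparseVectorDataset` object and `fromColumn` method are hypothetical.

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vector}
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, Encoders, SQLContext}

object SparseVectorDataset {
  // Generic Kryo encoder: each vector is stored as one opaque binary value,
  // so Spark SQL cannot see its fields, but the Dataset can be built.
  implicit val sparseVectorEncoder: Encoder[SparseVector] = Encoders.kryo[SparseVector]

  // Hypothetical helper: extracts one vector column as a Dataset[SparseVector].
  def fromColumn(sqlContext: SQLContext, df: DataFrame, col: String): Dataset[SparseVector] = {
    // VectorAssembler may emit dense or sparse vectors, so normalize with toSparse.
    val vectors = df.select(col).rdd.map(_.getAs[Vector](0).toSparse)
    sqlContext.createDataset(vectors)
  }
}
```

The trade-off with the Kryo encoder is that the vectors stay opaque to Spark SQL. If only typed access is needed, working on the RDD of vectors directly is an alternative.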