Re: how to construct parameter for model.transform() from datafile

2017-03-14 Thread Liang-Chi Hsieh
Just found that you can specify the number of features when loading a libsvm source: val df = spark.read.option("numFeatures", "100").format("libsvm") Liang-Chi Hsieh wrote > As the libsvm format can't specify the number of features, and it looks like > NaiveBayes doesn't have such a parameter, if your
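
A minimal sketch of the suggestion above, written as spark-shell style statements and assuming Spark with spark-mllib on the classpath. The temporary files, their contents, and the names `load`, `trainDim`, and `testDim` are made up for illustration; only the `numFeatures` option and the `libsvm` format come from the message. The idea is that loading both files with the same `numFeatures` value should give feature vectors of the same size even when the files mention different maximum indices.

```scala
import java.nio.file.Files
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for the sketch; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("libsvm-numFeatures-sketch")
  .getOrCreate()

// Hypothetical tiny data files. Feature indices in libsvm text are 1-based.
val dir = Files.createTempDirectory("libsvm-sketch")
val trainPath = dir.resolve("train.txt")
val testPath = dir.resolve("test.txt")
Files.write(trainPath, "1 1:1.0 5:2.0\n0 2:1.0\n".getBytes("UTF-8"))
Files.write(testPath, "1 3:1.0\n".getBytes("UTF-8"))

// Fix the dimension explicitly instead of letting it be inferred per file.
def load(path: String): DataFrame =
  spark.read.format("libsvm").option("numFeatures", "100").load(path)

val trainDim = load(trainPath.toString).first().getAs[Vector]("features").size
val testDim = load(testPath.toString).first().getAs[Vector]("features").size

println(s"train dimension = $trainDim, test dimension = $testDim")
```

The value 100 is the one from the message; in practice it would need to be at least the largest feature index appearing in any of the files.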

Re: how to construct parameter for model.transform() from datafile

2017-03-14 Thread Liang-Chi Hsieh
As the libsvm format can't specify the number of features, and it looks like NaiveBayes doesn't have such a parameter, if your training/testing data is sparse, the number of features inferred from the data files can be inconsistent. We may need to fix this. Before a fix goes into NaiveBayes,
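
To illustrate why the inferred dimension can differ between sparse files, here is a small sketch in plain Scala (no Spark needed). It assumes the dimension is inferred as the largest 1-based feature index seen in a file, which is my understanding of how the loader behaves when no feature count is given. The helper `inferredDim` and the sample lines are hypothetical.

```scala
// Hypothetical helper: largest 1-based feature index found in libsvm-formatted lines.
// Each line looks like "<label> <index>:<value> <index>:<value> ...".
def inferredDim(lines: Seq[String]): Int =
  lines
    .flatMap(_.trim.split("\\s+").drop(1))   // drop the label, keep index:value tokens
    .map(_.takeWhile(_ != ':').toInt)        // keep only the index part
    .foldLeft(0)((a, b) => math.max(a, b))   // 0 if no features appear at all

// Made-up sparse samples: the test file never mentions the highest-index feature.
val trainLines = Seq("1 1:1.0 5:2.0", "0 2:1.0")
val testLines = Seq("1 3:1.0")

val trainInferred = inferredDim(trainLines)
val testInferred = inferredDim(testLines)

println(s"train inferred = $trainInferred, test inferred = $testInferred")
```

Because the test sample never uses the highest feature index, its inferred dimension comes out smaller than the training one, which is the kind of mismatch described in the message.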

Re: how to construct parameter for model.transform() from datafile

2017-03-14 Thread Yuhao Yang
Hi Jinhong, Based on the error message, your second collection of vectors has a dimension of 804202, while the dimension of your training vectors was 144109. So please make sure your test dataset has the same dimension as the training data. From the test dataset you posted, the vector

Re: Question on Spark's graph libraries roadmap

2017-03-14 Thread Andy
GraphFrame is just a graph analytics/query engine, not a graph engine like GraphX used to be. And I'm sorry to say, it in fact doesn’t fit most scenarios at all. Enzo, I don’t think there is any roadmap for Spark's graph libraries for now. *Andy* On Tue, Mar 14, 2017 at 7:28 AM, Tim Hunter