PySpark 2: Kmeans The input data is not directly cached

2016-11-03 Thread Zakaria Hili
Hi, I don't know why I receive the message "WARN KMeans: The input data is not directly cached, which may hurt performance if its parent RDDs are also uncached." when I try to use Spark KMeans:
df_Part = assembler.transform(df_Part)
df_Part.cache()
while (k<=max_cluster) and (wssse > seuilStop):

convert spark dataframe to numpy (ndarray)

2016-10-28 Thread Zakaria Hili
Hi, is there any way to convert a Spark DataFrame into a numpy ndarray without using the toPandas operation? Example:
C1   C2   C3    C4
0.7  3.0  1000  10954
0.9  4.2  1200  12345
I want to get this output: [(0.7, 3.0, 1000L, 10954),(0.9, 4.2, 1200L, 12345)], dtype=[('C1', '

Re: Can't generate model for prediction

2016-08-11 Thread Zakaria Hili
Here you can find more information about the code of my class "RandomForestRegression.java": http://spark.apache.org/docs/latest/mllib-ensembles.html#regression
2016-08-11 10:18 GMT+02:00 Zakaria Hili:
> Hi,
>
> I realize that Spark can't save the generated model

Can't generate model for prediction

2016-08-11 Thread Zakaria Hili
Hi, I realize that Spark can't save the generated model on HDFS (I used random forest regression and linear regression for this test). It saves only the data directory, as you can see in the picture below: [image: Embedded image 1] but to load a model I will need some data from metadata di

Dataframe : Column features must be of type org.apache.spark.mllib.linalg.VectorUDT

2016-06-13 Thread Zakaria Hili
Hi, I created a DataFrame using a schema, but when I try to create a model, I receive this error: "requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT@f71b0bce but was actually ArrayType(StringType,true)". Piece of code: SQLContext sqlContext = SQL

JavaDStream to Dataframe: Java

2016-06-03 Thread Zakaria Hili
Hi, I'm a newbie in Spark and I want to ask you a simple question. I have a JavaDStream which contains data selected from a SQL database, something like (id, user, score ...), and I want to convert the JavaDStream to a DataFrame. How can I do this with Java? Thank you

Stream reading from database using spark streaming

2016-06-02 Thread Zakaria Hili
I want to use Spark Streaming to read data from an RDBMS database like MySQL, but I don't know how to do this using JavaStreamingContext:
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.milliseconds(500));
DataFrame df = jssc. ??
I searched on the internet but I didn't find anythi