Re: Reading as Parquet a directory created by Spark Structured Streaming - problems

2019-01-10 Thread ddebarbieux
scala> spark.read.schema(StructType(Seq(StructField("_1",StringType,false), StructField("_2",StringType,true)))).parquet("hdfs://---/MY_DIRECTORY/*_1=201812030900*").show()
+----+--------------------+
|  _1|                  _2|
+----+--------------------+
|null|ba1ca2dc033440125...|
|null|ba1ca2dc033440125...|
+----+--------------------+
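The null `_1` values are consistent with Spark losing partition discovery: `_1` exists only in the directory name `_1=201812030900`, not inside the Parquet files, and a glob that reaches past the partition directory hides that layout from the reader. A minimal sketch of the usual workaround, passing `basePath` so partition discovery can still derive `_1` from the path (the HDFS path is elided in the original, so "hdfs://---/MY_DIRECTORY" here is just a placeholder):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a running Spark session and the elided HDFS path.
val spark = SparkSession.builder().getOrCreate()

val df = spark.read
  // basePath tells Spark where the partitioned layout starts, so the
  // partition column _1 is recovered from the _1=... directory name
  // instead of coming back null.
  .option("basePath", "hdfs://---/MY_DIRECTORY")
  .parquet("hdfs://---/MY_DIRECTORY/_1=201812030900")

df.show()
```

Note that with `basePath` set, no explicit schema for `_1` is needed; partition discovery infers it from the directory names.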

Spark 2 - How to order keys in sparse vector (K-means)?

2018-12-21 Thread ddebarbieux
Dear all, I am using Spark 2 in order to cluster data with the K-means algorithm. My input data is flat, and K-means requires sparse vectors with ordered keys. Here is an example of an input and the expected output:

[id, key, value]
[1, 10, 100]
[1, 30, 300]
[2, 40, 400]
[1, 20, 200]

[id,
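One way to get the ordered keys that `Vectors.sparse` requires is to group the flat rows by id and sort each group's (key, value) pairs by key before building the vector. A minimal pure-Scala sketch of that grouping and ordering step, using the sample rows from the message (handing the result to `org.apache.spark.ml.linalg.Vectors.sparse` is the assumed final step, and `vectorSize` below is an assumed upper bound on the key values):

```scala
// Sketch only: groups the flat [id, key, value] rows by id and sorts
// each group's keys ascending, as Vectors.sparse requires.
val flat = Seq((1, 10, 100.0), (1, 30, 300.0), (2, 40, 400.0), (1, 20, 200.0))

val grouped: Map[Int, (Seq[Int], Seq[Double])] =
  flat.groupBy(_._1).map { case (id, rows) =>
    val sorted = rows.map(r => (r._2, r._3)).sortBy(_._1) // order by key
    (id, (sorted.map(_._1), sorted.map(_._2)))
  }

// grouped(1) == (Seq(10, 20, 30), Seq(100.0, 200.0, 300.0))
// In Spark, each pair would then become
//   Vectors.sparse(vectorSize, indices.toArray, values.toArray)
```

On a real DataFrame the same idea can be expressed with `groupByKey` and `mapGroups` on a Dataset, so the sort happens per group on the executors rather than on collected data.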