Re: Building SparkML vectors from long data

2018-07-03 Thread Patrick McCarthy
I'm still validating my results, but my solution for the moment looks like the below. I'm presently dealing with one-hot encoded values, so all the numbers in my array are 1:

    from pyspark.ml.linalg import SparseVector, VectorUDT
    from pyspark.sql import functions as F

    def udfMaker(feature_len):
        return F.udf(lambda x: SparseVector(feature_len, sorted(x), [1.0]*len(x)), VectorUDT())

in
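
To make the one-hot construction above easier to check without a cluster, here is a minimal plain-Python sketch of the three arguments that the lambda hands to SparseVector: the size, the sorted indices, and a 1.0 per index. The helper names `one_hot_sparse_parts` and `to_dense` are hypothetical and are not part of Spark. Dropping duplicate indices and checking the range are assumptions added here, on the understanding that SparseVector expects strictly increasing indices smaller than the vector size; the original snippet only sorts.

```python
def one_hot_sparse_parts(feature_len, active):
    """Return (size, indices, values) for a one-hot style sparse vector.

    Hypothetical helper: mirrors the arguments passed to SparseVector in the
    message, plus de-duplication and a range check (both assumptions).
    """
    indices = sorted(set(active))  # sorted, and duplicates removed
    if indices and (indices[0] < 0 or indices[-1] >= feature_len):
        raise ValueError("index out of range for feature_len=%d" % feature_len)
    values = [1.0] * len(indices)  # one-hot: every active slot holds 1.0
    return feature_len, indices, values


def to_dense(size, indices, values):
    """Expand the sparse parts into a plain list, for eyeballing results."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


size, indices, values = one_hot_sparse_parts(5, [3, 1, 3])
print(size, indices, values)
print(to_dense(size, indices, values))
```

If the arrays in the real data can contain repeated indices, de-duplicating before building the vector (as sketched) may be worth considering; whether that applies depends on how the arrays were collected.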

Re: Building SparkML vectors from long data

2018-06-12 Thread Nathan Kronenfeld
I don't know if this is the best way or not, but:

    import org.apache.spark.ml.feature.StringIndexer
    import org.apache.spark.sql.functions.udf

    val indexer = new StringIndexer().setInputCol("vr").setOutputCol("vrIdx")
    val indexModel = indexer.fit(data)
    val indexedData = indexModel.transform(data)
    val variables = indexModel.labels.length
    val toSeq = udf((a: Double, b: Double) => Seq(a, b))
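
The message is cut off after `toSeq`, so the following is not Nathan's code. It is a plain-Python sketch of one way the long-to-wide step could go, under stated assumptions: rows are hypothetical `(id, vr, value)` tuples, each `(id, vr)` pair appears at most once, and variable names are indexed in sorted order (Spark's StringIndexer orders labels by frequency by default, so the actual indices would differ). The names `index_labels` and `long_to_sparse` are made up for illustration.

```python
from collections import defaultdict


def index_labels(rows):
    """Map each distinct variable name to an integer index (sorted order)."""
    labels = sorted({vr for _, vr, _ in rows})
    return {label: i for i, label in enumerate(labels)}


def long_to_sparse(rows):
    """Group long-format rows by id into (size, indices, values) triples."""
    label_index = index_labels(rows)
    size = len(label_index)  # plays the role of `variables` in the message
    grouped = defaultdict(list)
    for row_id, vr, value in rows:
        # Pair the variable's index with its value, similar in spirit to toSeq.
        grouped[row_id].append((label_index[vr], float(value)))
    result = {}
    for row_id, pairs in grouped.items():
        pairs.sort()  # sparse vectors need indices in increasing order
        result[row_id] = (size, [i for i, _ in pairs], [v for _, v in pairs])
    return result


rows = [
    ("a", "weight", 70.0),
    ("a", "height", 1.8),
    ("b", "weight", 82.5),
]
print(long_to_sparse(rows))
```

In Spark itself the grouping would presumably be a groupBy on the id column with the pairs collected per group, but that part is not shown in the message.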