Maciej Szymkiewicz created SPARK-19940:
------------------------------------------

             Summary: FPGrowthModel.transform should skip duplicated items
                 Key: SPARK-19940
                 URL: https://issues.apache.org/jira/browse/SPARK-19940
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.2.0
            Reporter: Maciej Szymkiewicz
            Priority: Minor


Due to a misplaced {{distinct}}, {{FPGrowthModel.transform}} generates duplicated items in the "prediction" column:

{code}
scala> val data = spark.read.text("data/mllib/sample_fpgrowth.txt").select(split($"value", "\\s+").alias("features"))
data: org.apache.spark.sql.DataFrame = [features: array<string>]

scala> fpm.transform(Seq(Array("t", "s")).toDF("features")).show(1, false)
+--------+---------------------+
|features|prediction           |
+--------+---------------------+
|[t, s]  |[y, x, z, x, y, x, z]|
+--------+---------------------+
{code}
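
To illustrate how a misplaced {{distinct}} can let duplicates through, here is a minimal sketch in plain Scala collections, with no Spark dependency. It is not the Spark source: the rule set, the object name {{DistinctPlacement}}, and the methods {{predictBuggy}} and {{predictFixed}} are all hypothetical, and the sketch assumes the problem is of the kind where {{.distinct}} ends up attached to the else branch of an {{if}}/{{else}} rather than to the collected consequents.

```scala
object DistinctPlacement {
  // Hypothetical association rules (antecedent -> consequent), invented for
  // this illustration; they are not the rules mined from sample_fpgrowth.txt.
  val rules: Seq[(Seq[String], Seq[String])] = Seq(
    Seq("t") -> Seq("y"),
    Seq("t") -> Seq("x"),
    Seq("s") -> Seq("x"),
    Seq("t", "s") -> Seq("z"),
    Seq("t", "s") -> Seq("y")
  )

  // Collect the consequents of every rule whose antecedent is contained in
  // `items`, dropping items that are already present in the input.
  private def consequents(items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { case (antecedent, consequent) =>
      if (antecedent.forall(itemset.contains)) consequent.filterNot(itemset.contains)
      else Seq.empty[String]
    }
  }

  // Misplaced: `.distinct` binds to the else block only, so the result of the
  // main branch is returned with its duplicates intact.
  def predictBuggy(items: Seq[String]): Seq[String] =
    if (items != null) {
      consequents(items)
    } else {
      Seq.empty[String]
    }.distinct

  // Intended: de-duplicate the collected consequents themselves.
  def predictFixed(items: Seq[String]): Seq[String] =
    if (items != null) consequents(items).distinct else Seq.empty[String]
}
```

With these hypothetical rules, {{DistinctPlacement.predictBuggy(Seq("t", "s"))}} returns {{y, x, x, z, y}}, while {{predictFixed}} returns {{y, x, z}}. Since {{distinct}} on a {{Seq}} keeps the first occurrence of each element, the order of the remaining items is unchanged.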