Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20235#discussion_r160969767
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala
---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
}
test("FPGrowth fit and transform with different data types") {
- Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
- val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
- val model = new FPGrowth().setMinSupport(0.5).fit(data)
- val generatedRules = model.setMinConfidence(0.5).associationRules
- val expectedRules = spark.createDataFrame(Seq(
- (Array("2"), Array("1"), 1.0),
- (Array("1"), Array("2"), 0.75)
- )).toDF("antecedent", "consequent", "confidence")
- .withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
- .withColumn("consequent", col("consequent").cast(ArrayType(dt)))
- assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
- generatedRules.sort("antecedent").rdd.collect()))
-
- val transformed = model.transform(data)
- val expectedTransformed = spark.createDataFrame(Seq(
- (0, Array("1", "2"), Array.emptyIntArray),
- (0, Array("1", "2"), Array.emptyIntArray),
- (0, Array("1", "2"), Array.emptyIntArray),
- (0, Array("1", "3"), Array(2))
- )).toDF("id", "items", "prediction")
- .withColumn("items", col("items").cast(ArrayType(dt)))
- .withColumn("prediction", col("prediction").cast(ArrayType(dt)))
- assert(expectedTransformed.collect().toSet.equals(
- transformed.collect().toSet))
+ class DataTypeWithEncoder[A](val a: DataType)
+ (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
--- End diff --
This class is needed for two purposes:
1. To connect each Scala element type with its corresponding DataType.
Note: this information is already available in AtomicType as InternalType,
but it is not accessible from this test, and using it here does not justify
making it public.
2. To pass the proper encoder to the testTransformer method. Because the
data types are put into an array, dt is inferred to be their common parent
type, and implicit search can find encoders only for concrete types.
For a similar reason, the class has to capture the final encoder type rather
than an encoder for A. With only an implicit encoder for A, implicit search
could not construct an encoder for Array[A]: we have implicit encoders for
Array[Int], Array[Short], ... but none for Array[A] where A is generic and
merely has an encoder.
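
To illustrate point 2 without pulling in Spark, here is a minimal sketch of the same implicit-resolution situation. Everything in it is hypothetical: Enc is a toy stand-in for Spark's Encoder, TagWithEnc mirrors the role of DataTypeWithEncoder, and the string tags stand in for DataType values. Instances exist only for concrete array types, so the encoder has to be captured at construction time, while A is still concrete.

```scala
// Toy stand-in for Spark's Encoder. This is NOT the real Spark API.
trait Enc[T] { def name: String }

object Enc {
  // Instances exist only for concrete array types. There is deliberately
  // no rule that derives Enc[Array[A]] from an Enc[A].
  implicit val intArray: Enc[Array[Int]] =
    new Enc[Array[Int]] { val name = "Array[Int]" }
  implicit val shortArray: Enc[Array[Short]] =
    new Enc[Array[Short]] { val name = "Array[Short]" }
}

// Hypothetical analogue of DataTypeWithEncoder: pairs a runtime tag with
// the final encoder, resolved where A is still a concrete type.
class TagWithEnc[A](val tag: String)(implicit val enc: Enc[Array[A]])

object Demo {
  // A is generic here, so implicit search could not find Enc[Array[A]]
  // on its own; the instance carried by the wrapper is used instead.
  def describe[A](t: TagWithEnc[A]): String = s"${t.tag} -> ${t.enc.name}"

  def main(args: Array[String]): Unit = {
    // Once the wrappers are put into one collection, the element type is
    // no longer concrete, but each wrapper still holds its own encoder.
    val cases: Seq[TagWithEnc[_]] = Seq(
      new TagWithEnc[Int]("IntegerType"),
      new TagWithEnc[Short]("ShortType")
    )
    cases.foreach(c => println(describe(c)))
  }
}
```

In this sketch, writing implicitly[Enc[Array[A]]] inside describe would not compile, which is the reason the wrapper carries the already-resolved instance.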
---