GitHub user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/7837#issuecomment-126988403
@mengxr Java compatibility might require API changes, since the `Item` type
parameter will not be inferred when Java calls:
```scala
def run[Item, Basket <: JavaIterable[JavaIterable[Item]]](
    data: JavaRDD[Basket]): JavaRDD[(Iterable[Iterable[Item]], Long)] = {
  implicit val tag = fakeClassTag[Item]
  run[Item](data.rdd.map(_.asScala.toArray.map(_.asScala.toArray)))
    .map { case (pattern: Array[Array[Item]], count: Long) =>
      (pattern.map(_.toIterable).toIterable, count)
    }
    .toJavaRDD()
}
```
so the call will instead return a `JavaRDD[(Iterable[Iterable[Object]], Long)]`.
`FPGrowth` gets around this by defining an `FPGrowthModel[Item: ClassTag]`; by the
time Java users call `freqItemsets`, the item type has already been reified, so
they get the concrete type rather than `Object`.
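For reference, the `FPGrowth` approach looks roughly like this (a minimal sketch, not the actual Spark source; `FreqItemset` is simplified to a case class here):

```scala
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD

// Simplified stand-in for the real FreqItemset type.
case class FreqItemset[Item](items: Array[Item], freq: Long)

// The ClassTag for Item is captured when the model is constructed on the
// Scala side, so by the time a Java caller reads freqItemsets the item type
// is already concrete and nothing needs to be inferred at the Java call site.
class FPGrowthModel[Item: ClassTag](
    val freqItemsets: RDD[FreqItemset[Item]]) extends Serializable
```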
The options I see are:
1. Make the API change, introducing a `PrefixSpanModel` class returned by `run`
with the reified `Item` type (see the sketch after this list).
2. Return `Object` to Java users, forcing them to cast to their desired item
type.
3. Hold off on generic items for 1.5 and only support `Int`s. Generalizing to
generic items later should be backwards compatible, but we will probably run
into this problem with reified `Item` types again in the future.
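Option 1 might look roughly like the following (a minimal sketch; the class and member names `PrefixSpanModel`, `freqSequences`, and `javaFreqSequences` are hypothetical, not part of the current PR):

```scala
import scala.reflect.ClassTag

import org.apache.spark.api.java.JavaRDD
import org.apache.spark.rdd.RDD

// Hypothetical sketch of option 1: run would return a model that captures
// Item's ClassTag, mirroring FPGrowthModel, so Java callers see the concrete
// item type instead of Object.
class PrefixSpanModel[Item: ClassTag](
    val freqSequences: RDD[(Array[Array[Item]], Long)]) extends Serializable {

  // Java-friendly view; Item is already reified here, so no type inference
  // is required at the Java call site.
  def javaFreqSequences(): JavaRDD[(Array[Array[Item]], Long)] =
    freqSequences.toJavaRDD()
}
```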