Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148444535
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] (
// TODO: Check more carefully about whether this whole class will be included in a closure.
/** Per-vector transform function */
- private val transformFunc: Vector => Vector = {
+ private lazy val transformFunc: Vector => Vector = {
val sortedCatFeatureIndices = categoryMaps.keys.toArray.sorted
val localVectorMap = categoryMaps
val localNumFeatures = numFeatures
+ val localHandleInvalid = getHandleInvalid
val f: Vector => Vector = { (v: Vector) =>
assert(v.size == localNumFeatures, "VectorIndexerModel expected vector of length" +
s" $numFeatures but found length ${v.size}")
+ val exceptMsg = "VectorIndexer encountered NULL value. To handle" +
--- End diff --
I would also suggest moving exceptMsg into the case VectorIndexer.ERROR_INVALID
branch, where it can include concrete error information such as the feature
index and the unexpected value. Otherwise, it will be very hard for users to
locate the root cause of the error.
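A minimal, Spark-free sketch of the suggested structure follows. It assumes categorical features are described by a plain Map[Int, Map[Double, Int]] (feature index to value-to-category-index) and that vectors are Array[Double]. HandleInvalidSketch and transform are hypothetical names, not the VectorIndexerModel API. The point it illustrates is that the error message is built only inside the "error" branch, where the offending feature index and value are in scope. The "keep" branch maps unseen values to an extra bucket at index numCategories, which is an assumption about the intended semantics.

```scala
// Hypothetical sketch: names and types are illustrative, not Spark's API.
object HandleInvalidSketch {
  // Option strings mirroring the handleInvalid values discussed in the PR.
  final val ErrorInvalid = "error"
  final val SkipInvalid = "skip"
  final val KeepInvalid = "keep"

  /** Returns the re-indexed vector, or None when the row should be skipped. */
  def transform(
      v: Array[Double],
      categoryMaps: Map[Int, Map[Double, Int]],
      handleInvalid: String): Option[Array[Double]] = {
    val out = v.clone()
    // Visit categorical features in sorted order so errors are deterministic.
    val it = categoryMaps.keys.toArray.sorted.iterator
    var skip = false
    while (!skip && it.hasNext) {
      val featureIndex = it.next()
      val mapping = categoryMaps(featureIndex)
      val value = v(featureIndex)
      mapping.get(value) match {
        case Some(categoryIndex) =>
          out(featureIndex) = categoryIndex.toDouble
        case None =>
          handleInvalid match {
            case ErrorInvalid =>
              // Build the message here, where the feature index and value are known.
              throw new IllegalArgumentException(
                s"VectorIndexer encountered invalid value $value on feature index $featureIndex. " +
                  "To handle or skip invalid values, set handleInvalid to 'keep' or 'skip'.")
            case SkipInvalid =>
              skip = true
            case KeepInvalid =>
              // Assumed semantics: unseen values go to an extra bucket at numCategories.
              out(featureIndex) = mapping.size.toDouble
            case other =>
              throw new IllegalArgumentException(s"Unsupported handleInvalid: $other")
          }
      }
    }
    if (skip) None else Some(out)
  }
}
```

With this layout, a user hitting the "error" path sees which feature and which value triggered the failure, instead of a generic message computed once outside the match.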