Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/21195
Thanks Lu!
I took a pass over this PR and it looks pretty straightforward. One thing I
noticed is that there are two patterns we keep repeating. I think we should
add private APIs for these patterns and delegate to those.
The first pattern is the schema-validation method defined in terms of
`typeCandidates`. I suggest we add something like
`validateVectorCompatibleColumn` to `DatasetUtils`. In addition to helping with
code reuse, this API would make it easier if we ever decide, for example, to
support arrays of ints.
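To illustrate, a minimal sketch of what such a helper could look like in `DatasetUtils`; the method name and the exact list of accepted types are assumptions for discussion, not the final API (it reuses the existing `SchemaUtils.checkColumnTypes`):

```scala
import org.apache.spark.ml.linalg.VectorUDT
import org.apache.spark.ml.util.SchemaUtils
import org.apache.spark.sql.types.{ArrayType, DoubleType, FloatType, StructType}

private[spark] object DatasetUtils {
  /**
   * Sketch: checks that the given column is vector-compatible, i.e. a
   * `Vector` or an array of doubles/floats. Centralizing the check means
   * supporting, say, `ArrayType(IntegerType)` later is a one-line change.
   */
  def validateVectorCompatibleColumn(schema: StructType, colName: String): Unit = {
    val typeCandidates = List(
      new VectorUDT,
      ArrayType(DoubleType, containsNull = false),
      ArrayType(FloatType, containsNull = false))
    SchemaUtils.checkColumnTypes(schema, colName, typeCandidates)
  }
}
```

Callers' `validateSchema`/`transformSchema` implementations would then delegate to this instead of each carrying their own `typeCandidates` list.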
The second pattern is going from a DataFrame and a column name to an
`RDD[OldVector]`. Let's add a method that does this too, with a signature
like `(DataFrame, String) => RDD[OldVector]`.
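A rough sketch of that second helper, again with an assumed name and placement; the array branch assumes the column has already passed the vector-compatibility check above:

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.col

/**
 * Sketch: extracts the given column as an RDD of mllib (old) vectors,
 * converting ml vectors and numeric arrays as needed.
 */
def columnToOldVectorRDD(dataset: DataFrame, colName: String): RDD[OldVector] = {
  dataset.select(col(colName)).rdd.map {
    // ml.linalg.Vector -> mllib.linalg.Vector
    case Row(v: Vector) => OldVectors.fromML(v)
    // Array columns come back as a Seq; convert element-wise to doubles.
    case Row(a: Seq[_]) =>
      OldVectors.dense(a.map(_.asInstanceOf[Number].doubleValue()).toArray)
  }
}
```

With both helpers in place, each algorithm's `fit`/`transform` body shrinks to a validation call plus a conversion call.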