GitHub user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19156#discussion_r149823481
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -197,14 +240,14 @@ private[ml] object SummaryBuilderImpl extends Logging {
* metrics that need to be computed internally to get the final result.
*/
private val allMetrics: Seq[(String, Metric, DataType, Seq[ComputeMetric])] = Seq(
- ("mean", Mean, arrayDType, Seq(ComputeMean, ComputeWeightSum)),
- ("variance", Variance, arrayDType, Seq(ComputeWeightSum, ComputeMean, ComputeM2n)),
+ ("mean", Mean, vectorUDT, Seq(ComputeMean, ComputeWeightSum)),
+ ("variance", Variance, vectorUDT, Seq(ComputeWeightSum, ComputeMean, ComputeM2n)),
("count", Count, LongType, Seq()),
- ("numNonZeros", NumNonZeros, arrayLType, Seq(ComputeNNZ)),
- ("max", Max, arrayDType, Seq(ComputeMax, ComputeNNZ)),
- ("min", Min, arrayDType, Seq(ComputeMin, ComputeNNZ)),
- ("normL2", NormL2, arrayDType, Seq(ComputeM2)),
- ("normL1", NormL1, arrayDType, Seq(ComputeL1))
+ ("numNonZeros", NumNonZeros, vectorUDT, Seq(ComputeNNZ)),
--- End diff --
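To make the structure of `allMetrics` concrete, here is a minimal sketch of how a table like this can be used to work out which internal computations a set of requested metrics needs. It is a simplified, hypothetical model: metric and compute-metric names are plain strings rather than the `Metric`/`ComputeMetric` objects in `Summarizer.scala`, the dependency lists are copied from the diff (using the original lines for `max`, `min`, `normL2`, and `normL1`), and `MetricDeps`/`requiredComputations` are illustrative names, not Spark APIs.

```scala
// Hypothetical, simplified model of the allMetrics table: each user-facing
// metric maps to the internal compute metrics it depends on.
object MetricDeps {
  val allMetrics: Seq[(String, Seq[String])] = Seq(
    "mean"        -> Seq("ComputeMean", "ComputeWeightSum"),
    "variance"    -> Seq("ComputeWeightSum", "ComputeMean", "ComputeM2n"),
    "count"       -> Seq.empty[String],
    "numNonZeros" -> Seq("ComputeNNZ"),
    "max"         -> Seq("ComputeMax", "ComputeNNZ"),
    "min"         -> Seq("ComputeMin", "ComputeNNZ"),
    "normL2"      -> Seq("ComputeM2"),
    "normL1"      -> Seq("ComputeL1")
  )

  private val depsByName: Map[String, Seq[String]] = allMetrics.toMap

  /** Returns the distinct compute metrics needed, in first-seen order. */
  def requiredComputations(requested: Seq[String]): Seq[String] =
    requested.flatMap { name =>
      depsByName.getOrElse(name,
        throw new IllegalArgumentException(s"Unknown metric: $name"))
    }.distinct
}
```

For example, requesting `mean` and `variance` needs only `ComputeMean`, `ComputeWeightSum`, and `ComputeM2n`, because dependencies shared between metrics are deduplicated.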
Could you explain why you made this change? I think we should store `numNonZeros` as an array of longs (`arrayLType`, as before) rather than as a double-valued vector (`vectorUDT`).
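The comment does not spell out the reasoning, but one plausible motivation is that non-zero counts are integers. The sketch below, assuming that reading, accumulates per-feature counts in an `Array[Long]` and checks whether a count survives conversion to `Double` exactly (every integer up to 2^53 does; larger ones may not). `NnzStorage`, `countNonZeros`, and `roundTripsThroughDouble` are hypothetical names for illustration only; this is not Spark's implementation.

```scala
// Hypothetical helpers illustrating Long vs Double storage for counts.
object NnzStorage {
  // Every integer in [0, 2^53] is exactly representable as a Double.
  val MaxExactLongInDouble: Long = 1L << 53

  /** Counts non-zero entries per feature, accumulating in Longs. */
  def countNonZeros(rows: Seq[Array[Double]], numFeatures: Int): Array[Long] = {
    val nnz = new Array[Long](numFeatures)
    for (row <- rows; i <- 0 until numFeatures if row(i) != 0.0) nnz(i) += 1L
    nnz
  }

  /** True if converting the count to Double and back preserves it exactly. */
  def roundTripsThroughDouble(count: Long): Boolean =
    count.toDouble.toLong == count
}
```

Counts in practice rarely approach 2^53, so under this reading the choice is arguably more about making the integer semantics explicit in the schema than about precision.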