srowen commented on issue #24648: [SPARK-27777][ML] Eliminate uncessary sliding job in AreaUnderCurve
URL: https://github.com/apache/spark/pull/24648#issuecomment-494447837

The existing implementation also makes one pass; what do you mean there? This looks like it's just optimizing the implementation, though it makes it significantly more complex. The curve RDD is typically not nearly so large, maybe 1000 points. Is it worth it?

I wonder if a simpler implementation also gets a performance gain:

```
def of(curve: RDD[(Double, Double)]): Double = {
  curve.sliding(2).map { pair: Array[(Double, Double)] =>
    trapezoid(pair)
  }.sum()
}

def of(curve: Iterable[(Double, Double)]): Double = {
  curve.toIterator.sliding(2).withPartial(false).map(trapezoid).sum
}
```
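For anyone who wants to try the local (non-RDD) variant outside Spark, here is a minimal self-contained sketch. It assumes `trapezoid` takes two consecutive `(x, y)` points and returns the area between them; the object name `AreaUnderCurveSketch` and the body of `trapezoid` are illustrative, not copied from the Spark source.

```scala
object AreaUnderCurveSketch {
  // Assumed helper: area of the trapezoid spanned by two consecutive (x, y) points.
  def trapezoid(points: Seq[(Double, Double)]): Double = {
    require(points.length == 2, "trapezoid needs exactly two points")
    val (x0, y0) = points.head
    val (x1, y1) = points.last
    (x1 - x0) * (y0 + y1) / 2.0
  }

  // Sum the trapezoids over every adjacent pair of points.
  // withPartial(false) drops a trailing window shorter than 2,
  // so an empty or single-point curve yields 0.0.
  def of(curve: Iterable[(Double, Double)]): Double =
    curve.iterator.sliding(2).withPartial(false).map(trapezoid).sum
}
```

For example, `AreaUnderCurveSketch.of(Seq((0.0, 0.0), (1.0, 1.0)))` evaluates to `0.5`, the area under the diagonal.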
