srowen commented on issue #24648: [SPARK-27777][ML] Eliminate uncessary sliding job in AreaUnderCurve
URL: https://github.com/apache/spark/pull/24648#issuecomment-494447837
 
 
   The existing implementation also makes one pass; what do you mean there?
   This looks like it's just optimizing the implementation, though it makes it significantly more complex. The curve RDD is typically not nearly so large, maybe 1000 points. Is it worth it?
   
   I wonder if a simpler implementation also gets a performance gain: 
   ```
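     import org.apache.spark.mllib.rdd.RDDFunctions._ // implicit RDD.sliding
     import org.apache.spark.rdd.RDD
   
     // Both overloads reuse the existing trapezoid(points: Seq[(Double, Double)]) helper in AreaUnderCurve.
     def of(curve: RDD[(Double, Double)]): Double = {
       // Pair up consecutive points and sum the trapezoid areas.
       curve.sliding(2).map { pair: Array[(Double, Double)] => trapezoid(pair) }.sum()
     }
   
     def of(curve: Iterable[(Double, Double)]): Double = {
       // withPartial(false) drops a trailing window with fewer than 2 points.
       curve.toIterator.sliding(2).withPartial(false).map(trapezoid).sum
     }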
   ```
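To make the trapezoid-rule summation in the snippet concrete, here is a minimal standalone sketch of the local (Iterable) overload in plain Scala, with no Spark dependency. The object name `AucSketch` and this body for `trapezoid` are hypothetical stand-ins written for illustration, not the Spark source; each consecutive pair of points (x0, y0), (x1, y1) contributes (x1 - x0) * (y0 + y1) / 2.

```scala
// Hypothetical standalone sketch (not Spark's AreaUnderCurve); mirrors the
// local overload from the comment with a stand-in `trapezoid` helper.
object AucSketch {
  // Area of the trapezoid spanned by two consecutive curve points (x, y).
  def trapezoid(points: Seq[(Double, Double)]): Double = {
    require(points.length == 2, "expected exactly two points")
    val (x0, y0) = points.head
    val (x1, y1) = points.last
    (x1 - x0) * (y0 + y1) / 2.0
  }

  // Sum trapezoid areas over every consecutive pair of points.
  // withPartial(false) drops a short final window, so 0 or 1 points yield 0.0.
  def of(curve: Iterable[(Double, Double)]): Double =
    curve.iterator.sliding(2).withPartial(false).map(trapezoid).sum
}
```

For example, the diagonal ROC curve `Seq((0.0, 0.0), (0.5, 0.5), (1.0, 1.0))` gives 0.5, and the perfect-classifier curve `Seq((0.0, 0.0), (0.0, 1.0), (1.0, 1.0))` gives 1.0. This sketch only illustrates the arithmetic; it says nothing about whether the sliding-based version is faster than the existing implementation, which is the open question in the comment.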

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 