LuciferYang commented on code in PR #43890:
URL: https://github.com/apache/spark/pull/43890#discussion_r1399115205
##########
core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala:
##########
@@ -476,7 +476,7 @@ private[spark] object ResourceUtils extends Logging {
if (maxTaskPerExec < (execAmount * numParts / taskAmount)) {
val origTaskAmount = treq.amount
val taskReqStr = s"${origTaskAmount}/${numParts}"
- val resourceNumSlots = Math.floor(execAmount * numParts / taskAmount).toInt
+ val resourceNumSlots = Math.floor(execAmount * numParts / taskAmount.toDouble).toInt
Review Comment:
Before this PR, the result of `execAmount * numParts / taskAmount` was already a Long value. Is there any need to perform `Math.floor` on it at all? @tgravescs Do you have time to help check this logic? Thanks
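A minimal sketch of the distinction in question, with made-up values:

```scala
// Hypothetical values for illustration only.
val execAmount = 4L
val numParts = 3
val taskAmount = 2L

// Old: Long division already truncates, so Math.floor would be a no-op.
val oldSlots = execAmount * numParts / taskAmount                            // 6L
// New: divide as Double first, then floor and narrow to Int.
val newSlots = Math.floor(execAmount * numParts / taskAmount.toDouble).toInt // 6
```

For positive inputs the two agree, which is why the question is whether `Math.floor` is needed at all.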
##########
common/unsafe/src/test/scala/org/apache/spark/unsafe/types/UTF8StringPropertyCheckSuite.scala:
##########
@@ -80,7 +80,9 @@ class UTF8StringPropertyCheckSuite extends AnyFunSuite with ScalaCheckDrivenProp
test("compare") {
forAll { (s1: String, s2: String) =>
- assert(Math.signum(toUTF8(s1).compareTo(toUTF8(s2))) === Math.signum(s1.compareTo(s2)))
+ assert(Math.signum {
Review Comment:
Is it necessary to use `{}` instead of `()` here?
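For context, a sketch of the two call styles; a single-argument Scala method accepts either, and braces are conventionally reserved for multi-line block arguments:

```scala
val x = -3.5
Math.signum(x)    // parentheses: conventional for a one-line expression
Math.signum { x } // braces: legal, but usually signal a multi-line block
```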
##########
graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala:
##########
@@ -321,7 +321,7 @@ class PageRankSuite extends SparkFunSuite with LocalSparkContext {
val rank = if (vid < source) {
0.0
} else {
- a * Math.pow(1 - resetProb, vid - source)
+ a * Math.pow(1.0 - resetProb, vid.toDouble - source)
Review Comment:
nit: `resetProb` is already a Double, so `1 - resetProb` is evaluated as a Double; it seems unnecessary to explicitly change `1` to `1.0`, right?
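A quick illustration of the widening rule behind this nit (standalone values, not the test's):

```scala
val resetProb: Double = 0.15
// The Int literal 1 is widened to Double before the subtraction,
// so both forms produce the same Double result.
val a = 1 - resetProb   // 0.85
val b = 1.0 - resetProb // 0.85
```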
##########
streaming/src/main/scala/org/apache/spark/streaming/ui/UIUtils.scala:
##########
@@ -67,9 +67,9 @@ private[streaming] object UIUtils {
* will discard the fractional part.
*/
def convertToTimeUnit(milliseconds: Long, unit: TimeUnit): Double = unit match {
- case TimeUnit.NANOSECONDS => milliseconds * 1000 * 1000
- case TimeUnit.MICROSECONDS => milliseconds * 1000
- case TimeUnit.MILLISECONDS => milliseconds
+ case TimeUnit.NANOSECONDS => milliseconds * 1000.0 * 1000.0
Review Comment:
How about unifying it to `milliseconds.toDouble`?
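A sketch of what the suggested unification might look like (not the final wording of the change):

```scala
case TimeUnit.NANOSECONDS => milliseconds.toDouble * 1000 * 1000
case TimeUnit.MICROSECONDS => milliseconds.toDouble * 1000
case TimeUnit.MILLISECONDS => milliseconds.toDouble
```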
##########
core/src/main/scala/org/apache/spark/input/FixedLengthBinaryInputFormat.scala:
##########
@@ -74,7 +74,7 @@ private[spark] class FixedLengthBinaryInputFormat
if (defaultSize < recordLength) {
recordLength.toLong
} else {
- (Math.floor(defaultSize / recordLength) * recordLength).toLong
+ (Math.floor(defaultSize.toDouble / recordLength.toDouble) * recordLength).toLong
Review Comment:
1. Is it feasible here to just add `.toDouble` to `defaultSize`?
2. In the current scenario, both `defaultSize` and `recordLength` should theoretically be integers (or longs) greater than 0, so `defaultSize / recordLength` is already a long value. Is there any need to perform `Math.floor` on it?
This is a very old piece of code; do you have time to help check it? @dongjoon-hyun @srowen Thanks
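A sketch of point 2, assuming both values are positive (the values are made up):

```scala
// Hypothetical values for illustration.
val defaultSize = 1000L
val recordLength = 300

// Integer division already truncates, so both forms round down to a
// multiple of recordLength:
(Math.floor(defaultSize.toDouble / recordLength) * recordLength).toLong // 900L
(defaultSize / recordLength) * recordLength                             // 900L
```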
##########
core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala:
##########
@@ -165,7 +165,8 @@ private[spark] class Benchmark(
val best = runTimes.min
val avg = runTimes.sum / runTimes.size
Review Comment:
How about just making `avg` a `Double`?
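A sketch of that suggestion, assuming `runTimes` is a sequence of `Long` timings:

```scala
// Converting before dividing keeps the fractional part of the average
// instead of truncating it with Long division:
val avg = runTimes.sum.toDouble / runTimes.size
```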
##########
core/src/test/scala/org/apache/spark/deploy/history/EventLogTestHelper.scala:
##########
@@ -56,7 +56,7 @@ object EventLogTestHelper {
eventStr: String,
desiredSize: Long): Seq[String] = {
val stringLen = eventStr.getBytes(StandardCharsets.UTF_8).length
- val repeatCount = Math.floor(desiredSize / stringLen).toInt
+ val repeatCount = Math.floor(desiredSize.toDouble / stringLen).toInt
Review Comment:
This is the third case related to `Math.floor`.
`desiredSize / stringLen` is also already a Long value. Is it necessary to perform `Math.floor` here?
@HeartSaVioR Do you have time to help check this logic? Thanks
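If `Math.floor` is indeed redundant, the simplified form might look like this (assuming both values are positive):

```scala
// Long division truncates toward zero, which for positive inputs is the
// same as flooring:
val repeatCount = (desiredSize / stringLen).toInt
```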
##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorRollPlugin.scala:
##########
@@ -150,14 +150,15 @@ class ExecutorRollDriverPlugin extends DriverPlugin with Logging {
* Since we will choose only first item, the duplication is okay.
*/
private def outliersFromMultipleDimensions(listWithoutDriver: Seq[v1.ExecutorSummary]) =
- outliers(listWithoutDriver.filter(_.totalTasks > 0), e => e.totalDuration / e.totalTasks) ++
- outliers(listWithoutDriver, e => e.totalDuration) ++
- outliers(listWithoutDriver, e => e.totalGCTime) ++
- outliers(listWithoutDriver, e => e.failedTasks) ++
- outliers(listWithoutDriver, e => getPeakMetrics(e, "JVMHeapMemory")) ++
- outliers(listWithoutDriver, e => getPeakMetrics(e, "JVMOffHeapMemory"))
++
- outliers(listWithoutDriver, e => e.totalShuffleWrite) ++
- outliers(listWithoutDriver, e => e.diskUsed)
+ outliers(listWithoutDriver.filter(_.totalTasks > 0),
+ e => e.totalDuration.toFloat / e.totalTasks) ++
Review Comment:
Directly changing this to `e.totalDuration.toFloat` seems to change the result? It might need @dongjoon-hyun to confirm this change.
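A small sketch of the behavior difference in question (illustrative values only):

```scala
// Illustrative values, not from the PR.
val totalDuration = 7L
val totalTasks = 2

totalDuration / totalTasks         // 3L  - old code: Long division truncates
totalDuration.toFloat / totalTasks // 3.5 - new code: Float division keeps the fraction
```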