WeichenXu123 commented on a change in pull request #30009:
URL: https://github.com/apache/spark/pull/30009#discussion_r519784311
##########
File path: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
##########
@@ -562,4 +562,22 @@ trait HasBlockSize extends Params {
/** @group expertGetParam */
final def getBlockSize: Int = $(blockSize)
}
+
+/**
+ * Trait for shared param blockSizeInMB (default: 0.0). This trait may be changed or
+ * removed between minor versions.
+ */
+trait HasBlockSizeInMB extends Params {
+
+ /**
+ * Param for Maximum memory in MB for stacking input data in blocks. Data is stacked within partitions. If more than remaining data size in a partition then it is adjusted to the data size. If 0, try to infer an appropriate value based on the statistics of dataset. Must be >= 0..
+ * @group expertParam
+ */
+ final val blockSizeInMB: DoubleParam = new DoubleParam(this, "blockSizeInMB", "Maximum memory in MB for stacking input data in blocks. Data is stacked within partitions. If more than remaining data size in a partition then it is adjusted to the data size. If 0, try to infer an appropriate value based on the statistics of dataset. Must be >= 0.", ParamValidators.gtEq(0.0))
Review comment:
> a block can exceed this size
A block will only slightly exceed the limit, so it does not matter.
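
To illustrate why the overshoot is small, here is a minimal sketch, not the code in this PR. It assumes the limit is checked before each row is added, so a block can exceed the limit by at most one row. `BlockStackingSketch`, `stack`, `rowSizes` and `maxBytes` are hypothetical names used only for this example.

```scala
object BlockStackingSketch {
  // Hypothetical helper: groups per-row sizes (in bytes) into blocks.
  // A block is closed once its accumulated size reaches maxBytes.
  def stack(rowSizes: Iterator[Long], maxBytes: Long): Iterator[Vector[Long]] = {
    require(maxBytes > 0, "maxBytes must be positive")
    new Iterator[Vector[Long]] {
      override def hasNext: Boolean = rowSizes.hasNext

      override def next(): Vector[Long] = {
        val block = Vector.newBuilder[Long]
        var used = 0L
        // The limit is tested before a row is added, so the last row
        // added may push the block over maxBytes by less than one row.
        while (rowSizes.hasNext && used < maxBytes) {
          val size = rowSizes.next()
          block += size
          used += size
        }
        block.result()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Five rows of 40 bytes each, with a 100-byte limit.
    val blocks = stack(Iterator(40L, 40L, 40L, 40L, 40L), maxBytes = 100L).toList
    blocks.foreach(b => println(s"rows=${b.size} bytes=${b.sum}"))
  }
}
```

With five 40-byte rows and a 100-byte limit, the first block holds three rows (120 bytes) and the second holds the remaining two (80 bytes). Under this assumption the excess is always smaller than one row.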