dtarima commented on pull request #33251:
URL: https://github.com/apache/spark/pull/33251#issuecomment-876075630
Instructions for testing manually:
```
export SPARK_HOME=<provide>
$SPARK_HOME/bin/spark-shell --driver-memory 4g \
--conf spark.network.maxRemoteBlockSizeFetchToMem=1 \
<script.scala >output.txt 2>&1
grep "Clean up file" output.txt
```
The `output.txt` should have log messages like
```
21/07/07 21:35:05 DEBUG BlockManager$RemoteBlockDownloadFileManager: Clean up file /tmp/blockmgr-5c650c95-5152-41c0-b4e7-0084ca811a54/2a/temp_local_7488b3ad-429f-48f5-8838-101999ee42b4
21/07/07 21:35:05 DEBUG BlockManager$RemoteBlockDownloadFileManager: Clean up file /tmp/blockmgr-5c650c95-5152-41c0-b4e7-0084ca811a54/0b/temp_local_6001e735-5cea-44ed-a094-f6518bdde6b7
```
If the script finishes successfully but these messages are absent, the files are not being cleaned up properly.
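Another way to confirm the cleanup on disk (a quick sketch; it assumes the default `/tmp` local directory and the `blockmgr-*`/`temp_local_*` naming visible in the log messages above) is to count leftover temp files after the shell has exited:

```shell
# Count temp_local_* files left under any blockmgr-* directory.
# A non-zero count after spark-shell has exited means the files
# were not cleaned up; on a clean run the glob matches nothing.
leftover=$(find /tmp/blockmgr-* -name 'temp_local_*' 2>/dev/null | wc -l)
echo "leftover temp files: $leftover"
```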
Here is the `script.scala`:
```
import org.apache.log4j.Level
import org.apache.log4j.Logger
import org.apache.spark.sql.SparkSession

// Enable DEBUG logging for the storage package so the
// RemoteBlockDownloadFileManager cleanup messages are emitted.
val logger = Logger.getLogger("org.apache.spark.storage")
logger.setLevel(Level.DEBUG)

// Build a DataFrame of random hex-string "documents".
val size: Int = 100 * 1000
val random = new scala.util.Random(123)
val df = Seq.fill(size)(Tuple1.apply(Vector.fill(10)(random.nextInt(size).toHexString))).toDF("numbers")

// Vectorize the strings so LDA can consume them.
import org.apache.spark.ml.feature.CountVectorizer
val cv = new CountVectorizer().setInputCol("numbers").setOutputCol("features")
val dfFeatures = cv.fit(df).transform(df)

// Fit LDA; with spark.network.maxRemoteBlockSizeFetchToMem=1 every
// remote block fetch spills to disk, exercising the cleanup path.
import org.apache.spark.ml.clustering.LDA
val lda = new LDA().setSeed(123).setK(10).setMaxIter(3).setCheckpointInterval(-1)
lda.fit(dfFeatures).describeTopics.show()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]