dtarima commented on pull request #33251:
URL: https://github.com/apache/spark/pull/33251#issuecomment-876075630


   Instructions for testing manually:
   ```
   export SPARK_HOME=<provide>
   $SPARK_HOME/bin/spark-shell --driver-memory 4g \
     --conf spark.network.maxRemoteBlockSizeFetchToMem=1 \
     < script.scala > output.txt 2>&1
   grep "Clean up file" output.txt
   ```
   Setting `spark.network.maxRemoteBlockSizeFetchToMem=1` forces fetched remote blocks to be written to temporary files on disk, which exercises the cleanup path under test. The `output.txt` should contain log messages like
   ```
   21/07/07 21:35:05 DEBUG BlockManager$RemoteBlockDownloadFileManager: Clean up file /tmp/blockmgr-5c650c95-5152-41c0-b4e7-0084ca811a54/2a/temp_local_7488b3ad-429f-48f5-8838-101999ee42b4
   21/07/07 21:35:05 DEBUG BlockManager$RemoteBlockDownloadFileManager: Clean up file /tmp/blockmgr-5c650c95-5152-41c0-b4e7-0084ca811a54/0b/temp_local_6001e735-5cea-44ed-a094-f6518bdde6b7
   ```
   If the script finished successfully but these messages are absent, the temporary files are not being cleaned up properly.
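
   To automate the check, here is a minimal pass/fail wrapper (a sketch; it only assumes `output.txt` was produced as above and greps for the same message text):
   ```
   # Sketch: report PASS if any cleanup messages appear in the log,
   # FAIL otherwise; the exit status follows the result.
   if grep -q "Clean up file" output.txt; then
     echo "PASS: temp block files were cleaned up"
   else
     echo "FAIL: no cleanup messages found in output.txt"
     exit 1
   fi
   ```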
   
   
   Here is the `script.scala`:
   ```
   import org.apache.log4j.{Level, Logger}

   // Enable DEBUG logging for the storage package so the cleanup messages show up.
   Logger.getLogger("org.apache.spark.storage").setLevel(Level.DEBUG)

   val size: Int = 100 * 1000
   val random = new scala.util.Random(123)

   // Build a DataFrame of random hex-string "documents".
   val df = Seq.fill(size)(Tuple1(Vector.fill(10)(random.nextInt(size).toHexString))).toDF("numbers")

   // Vectorize the strings so LDA can consume them.
   import org.apache.spark.ml.feature.CountVectorizer
   val cv = new CountVectorizer().setInputCol("numbers").setOutputCol("features")
   val dfFeatures = cv.fit(df).transform(df)

   // A few LDA iterations generate shuffle traffic; with the conf above,
   // the fetched blocks are written to temp files that must be cleaned up.
   import org.apache.spark.ml.clustering.LDA
   val lda = new LDA().setSeed(123).setK(10).setMaxIter(3).setCheckpointInterval(-1)
   lda.fit(dfFeatures).describeTopics.show()
   ```

