Hi all,

Recently, a leak in SparkR was reported in https://issues.apache.org/jira/browse/SPARK-21093. It was even worse because R workers crash easily on CentOS. This was fixed in https://github.com/apache/spark/commit/6b3d02285ee0debc73cbcab01b10398a498fbeb8.

The fix touched the very core of SparkR, and the logic was changed rather radically after careful review by a few reviewers. Thanks to the reviewers, in particular Felix and Shivaram, who stuck with my PR and the issue.

However, it is still a rather radical change that might affect many APIs that run R's native functions (e.g., gapply, dapply and the old RDD-based APIs), and due to this concern it was not targeted to Spark 2.2.0.

To cut it short, as suggested by the R committers, I would like to encourage testing such APIs that run R native functions (UDFs), so that we find any bugs ahead of time. To be more specific, I would like to suggest the two ways below to check whether the PR really fixed the JIRA and whether it introduced any bugs.

1. Run the APIs multiple times and see if they work. If you are more interested in this, you could open another terminal and enter

     watch -n 0.01 "lsof -c R | wc -l"

   and see if the number consistently increases, which was the original issue. If you want to be more specific, run

     ps -fe | grep /exec/R

   and check the PID of daemon.R. Then run

     watch -n 0.01 "lsof -p [PID] | wc -l"

   and check the same thing. Checking this with other good tools would also be very welcome.

2. Run existing workloads with the APIs and check that they work correctly, to find any hidden bugs ahead of time.

Thanks.
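
P.S. For check 1, here is a minimal sketch of a script that takes a fixed number of "lsof -p [PID] | wc -l" samples and reports whether the count only ever grew. The function names (count_open_files, is_steadily_increasing, monitor), the defaults of 10 samples taken 1 second apart, and the "never decreases and ends higher than it started" rule are assumptions made for illustration; they do not come from the PR or the JIRA.

```shell
#!/bin/sh
# Sketch only: all function names and defaults below are hypothetical.

# Print the number of lines that `lsof -p PID` reports for one process.
count_open_files() {
    lsof -p "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Return 0 if the given samples never decrease and the last sample is
# larger than the first; return 1 otherwise. This is an assumed heuristic
# for "the number consistently increases".
is_steadily_increasing() {
    [ "$#" -ge 2 ] || return 1
    first=$1
    prev=$1
    shift
    for n in "$@"; do
        [ "$n" -ge "$prev" ] || return 1
        prev=$n
    done
    [ "$prev" -gt "$first" ]
}

# Collect SAMPLES counts for PID, INTERVAL seconds apart, then report.
monitor() {
    pid=$1
    samples=${2:-10}
    interval=${3:-1}
    set --                      # reuse the positional parameters as a list
    i=0
    while [ "$i" -lt "$samples" ]; do
        set -- "$@" "$(count_open_files "$pid")"
        sleep "$interval"
        i=$((i + 1))
    done
    echo "samples: $*"
    if is_steadily_increasing "$@"; then
        echo "open-file count kept growing; possible leak"
    else
        echo "open-file count did not grow steadily"
    fi
}

# Only start monitoring when a PID is passed on the command line.
if [ "$#" -gt 0 ]; then
    monitor "$@"
fi
```

Saved as, say, check_leak.sh (a hypothetical file name), it could be run as "sh check_leak.sh [PID] 20 1" with the PID of daemon.R found via the ps command above. A count that fluctuates or stays flat is reported as not growing steadily, so a short run is only a hint, not proof, that the leak is gone.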