Hi all,

Recently, there was an issue about a leak in SparkR, reported in
https://issues.apache.org/jira/browse/SPARK-21093.
It was even worse on CentOS, where R workers crashed easily. This was fixed
in
https://github.com/apache/spark/commit/6b3d02285ee0debc73cbcab01b10398a498fbeb8.
The fix touches the very core
of SparkR, and the logic was changed rather radically after careful review
by a few reviewers.
Thanks to the reviewers, in particular Felix and Shivaram, who stuck with
my PR and the issue.

However, it is still a rather radical change that might affect many APIs
that run R native
functions (e.g., gapply, dapply and the old RDD-based APIs), and due to
this concern it was not targeted
for Spark 2.2.0.
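
Just for reference, here is a minimal sketch of the kind of UDF workload in
question (assuming a local SparkR session; mtcars, the functions and the
schemas are only illustrative):

  library(SparkR)
  sparkR.session()

  df <- createDataFrame(mtcars)

  # dapply: run an R function over each partition
  doubled <- dapply(df,
                    function(x) data.frame(mpg2 = x$mpg * 2),
                    structType(structField("mpg2", "double")))
  head(collect(doubled))

  # gapply: run an R function over each group
  avgs <- gapply(df, "cyl",
                 function(key, x) data.frame(cyl = key[[1]],
                                             avg = mean(x$mpg)),
                 structType(structField("cyl", "double"),
                            structField("avg", "double")))
  head(collect(avgs))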

To cut it short, as suggested by R committers, I would like to encourage
testing such APIs that run
R native functions (UDFs) to find any bugs ahead. To be more specific, I
would like to suggest the two ways
below to check whether the PR really fixed the JIRA and whether it
introduced any new bugs.


1. Run the APIs multiple times and see if they work (an R sketch of such
a loop follows after this item). If you are more interested in this,
you could open another terminal and enter ...

  watch -n 0.01 "lsof -c R | wc -l"

and see if the number consistently increases, which was the original issue.
If you want to be
more specific, run ...

  ps -fe | grep /exec/R

and check the PID of daemon.R. Then, run

  watch -n 0.01 "lsof -p [PID] | wc -l"

and check the same thing. Checking this with other good tools would also be
very welcome.
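
For example, for 1., something like the below (a rough sketch; the
workload and iteration count are arbitrary) can be left running while
watching the open-file count from the other terminal:

  library(SparkR)
  sparkR.session()

  df <- createDataFrame(mtcars)
  schema <- structType(structField("mpg", "double"))

  # Repeatedly run an R-native-function API; with the fix in place,
  # the lsof count in the other terminal should stay roughly flat
  # instead of growing with each iteration.
  for (i in 1:100) {
    invisible(collect(dapply(df,
                             function(x) data.frame(mpg = x$mpg),
                             schema)))
  }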


2. Run existing workloads with the APIs and check that they work correctly,
to find any hidden bugs ahead.


Thanks.
