Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18465
I referred to http://adv-r.had.co.nz/Rcpp.html and your link.
I did as below:
**R test alone**
```
vi tmp.R
```
copied and pasted the code from **Before** and **After** below, and then ran
```
Rscript tmp.R
```
**Before**
```R
library(Rcpp)
cppFunction('double takeLog(double val) {
  try {
    if (val <= 0.0) { // log() not defined here
      throw std::range_error("Inadmissible value");
    }
    return log(val);
  } catch(std::exception &ex) {
    forward_exception_to_r(ex);
  } catch(...) {
    ::Rf_error("c++ exception (unknown reason)");
  }
  return NA_REAL; // not reached
}')
for(i in 0:10000) {
  p <- parallel:::mcfork()
  if (inherits(p, "masterProcess")) {
    # Child process: takeLog(-1.0) raises an error, so the lines after it
    # are not reached (note that `child` is not defined in this branch).
    takeLog(-1.0)
    print("unreachable")
    tools::pskill(child, tools::SIGUSR1)
  }
}
print("end")
Sys.sleep(10L)
```
**After**
```R
library(Rcpp)
cppFunction('double takeLog(double val) {
  try {
    if (val <= 0.0) { // log() not defined here
      throw std::range_error("Inadmissible value");
    }
    return log(val);
  } catch(std::exception &ex) {
    forward_exception_to_r(ex);
  } catch(...) {
    ::Rf_error("c++ exception (unknown reason)");
  }
  return NA_REAL; // not reached
}')
for(i in 0:10000) {
  p <- parallel:::mcfork()
  if (inherits(p, "masterProcess")) {
    takeLog(-1.0)
    print("unreachable")
  }
  children <- suppressWarnings(parallel:::selectChildren(timeout = 0))
  if (is.integer(children)) {
    lapply(children, function(child) {
      print(parallel:::readChild(child))
      tools::pskill(child, tools::SIGUSR1)
    })
  }
}
print("end")
Sys.sleep(10L)
```
The symptoms are similar to those in
https://github.com/apache/spark/pull/18465#issuecomment-313049544
**End to end**
I could not use `cppFunction` here as I did above because of the error
below:
```
Error in as.character(node[[1]]) :
cannot coerce type 'builtin' to vector of type 'character'
```
So, I did as below:
```
vi takeLog.cpp
```
and copied and pasted the following:
```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double takeLog(double val) {
  try {
    if (val <= 0.0) { // log() not defined here
      throw std::range_error("Inadmissible value");
    }
    return log(val);
  } catch(std::exception &ex) {
    forward_exception_to_r(ex);
  } catch(...) {
    ::Rf_error("c++ exception (unknown reason)");
  }
  return NA_REAL; // not reached
}
```
And then ran the following with SparkR:
```R
func <- function(key, x) {
  library(Rcpp)
  path <- "/.../spark/takeLog.cpp"
  sourceCpp(path)
  takeLog(-1.0)
}
df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
collect(gapply(df, "a", func, schema(df)))
# ... 30 times
collect(gapply(df, "a", function(key, x) { x }, schema(df)))
```
The symptoms are also similar to those in
https://github.com/apache/spark/pull/18465#issuecomment-313055990, both
before and after.
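
As a side note, the control flow of `takeLog` can also be exercised without R. The following is only a minimal standalone sketch, not what was run above: `takeLogStandalone` and `lastError` are hypothetical names, and storing the message in `lastError` stands in for `forward_exception_to_r` / `::Rf_error`, which need a live R session. Unlike the Rcpp version, the final `return` is reached here, because nothing transfers control back into R.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the R error channel: holds the last error message.
static std::string lastError;

// Same checks as takeLog, but errors are recorded instead of forwarded to R.
double takeLogStandalone(double val) {
    try {
        if (val <= 0.0) {  // log() not defined here
            throw std::range_error("Inadmissible value");
        }
        return std::log(val);
    } catch (const std::exception &ex) {
        lastError = ex.what();  // assumption: replaces forward_exception_to_r(ex)
    } catch (...) {
        lastError = "c++ exception (unknown reason)";  // replaces ::Rf_error(...)
    }
    // Reached in this sketch, since no R error interrupts the function.
    return std::numeric_limits<double>::quiet_NaN();
}
```

Calling `takeLogStandalone(-1.0)` returns NaN and leaves "Inadmissible value" in `lastError`, while `takeLogStandalone(1.0)` returns 0 and does not touch `lastError`.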