[ https://issues.apache.org/jira/browse/SPARK-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Monica Liu updated SPARK-10981:
---
Description:
I am using SparkR from RStudio and ran into an error with the join function,
which I reproduced with the smaller example below:
{code:title=joinTest.R|borderStyle=solid}
Sys.setenv(SPARK_HOME = "/Users/liumo1/Applications/spark/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init("local[4]")
sqlContext <- sparkRSQL.init(sc)

# Left table: columns n, s, b
df <- data.frame(n = c(2, 3, 5),
                 s = c("aa", "bb", "cc"),
                 b = c(TRUE, FALSE, TRUE))
df1 <- createDataFrame(sqlContext, df)
showDF(df1)

# Right table: columns x, t, c
dff <- data.frame(x = c(2, 3, 10),
                  t = c("dd", "ee", "ff"),
                  c = c(FALSE, FALSE, TRUE))
df2 <- createDataFrame(sqlContext, dff)
showDF(df2)

# Semi join on df1$n == df2$x
res <- join(df1, df2, df1$n == df2$x, "semijoin")
showDF(res)
{code}
Running this code, I encountered the error:
{panel}
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
java.lang.IllegalArgumentException: Unsupported join type 'semijoin'.
Supported join types include: 'inner', 'outer', 'full', 'fullouter',
'leftouter', 'left', 'rightouter', 'right', 'leftsemi'.
{panel}
However, if I changed the joinType to "leftsemi",
{code}
res = join(df1, df2, df1$n == df2$x, "leftsemi")
{code}
I would get the error:
{panel}
Error in .local(x, y, ...) :
joinType must be one of the following types: 'inner', 'outer', 'left_outer',
'right_outer', 'semijoin'
{panel}
Since the join function in R passes joinType straight through to a Java method,
I went into DataFrame.R and changed "semijoin" to "leftsemi" on lines 1374 and
1378 so that the R check accepts the name the Java side expects. This also
brings the join types accepted by R in line with the names accepted by Scala.
Original code (semijoin):
{code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "semijoin")) {
  sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
} else {
  stop("joinType must be one of the following types: ",
       "'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'")
}
{code}
Changed code (leftsemi):
{code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "leftsemi")) {
  sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
} else {
  stop("joinType must be one of the following types: ",
       "'inner', 'outer', 'left_outer', 'right_outer', 'leftsemi'")
}
{code}
This fixed the issue, but I'm not sure whether the change breaks Hive
compatibility or causes other problems. I can submit a pull request with this
change.
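A minimal plain-R sketch of why no semi-join name passed both checks before the change, and why every R-side name passes after it. It does not call SparkR. jvmJoinTypes is copied from the Java error message above; isAcceptedByJvm is a hypothetical helper. It assumes the Java side lower-cases the join type and ignores underscores, which is why 'left_outer' works even though the message lists only 'leftouter'.

{code:title=joinTypeCheck.R|borderStyle=solid}
# Names listed in the Java IllegalArgumentException message quoted above
jvmJoinTypes <- c("inner", "outer", "full", "fullouter",
                  "leftouter", "left", "rightouter", "right", "leftsemi")

# Hypothetical helper (not part of SparkR). Assumption: the Java side
# lower-cases the name and drops underscores before matching.
isAcceptedByJvm <- function(joinType) {
  normalized <- gsub("_", "", tolower(joinType), fixed = TRUE)
  normalized %in% jvmJoinTypes
}

# R-side lists from DataFrame.R, before and after the change
rJoinTypesOld <- c("inner", "outer", "left_outer", "right_outer", "semijoin")
rJoinTypesNew <- c("inner", "outer", "left_outer", "right_outer", "leftsemi")

# Names the R check lets through but the Java side would reject
print(rJoinTypesOld[!isAcceptedByJvm(rJoinTypesOld)])  # "semijoin"
print(rJoinTypesNew[!isAcceptedByJvm(rJoinTypesNew)])  # character(0)
{code}

Under this assumption, every name the changed R check accepts is one the Java side also accepts, though R still accepts fewer aliases than Scala does.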
was:
I am using SparkR from RStudio, and I ran into an error with the join function
that I recreated with a smaller example:
{code:title=joinTest.R|borderStyle=solid}
Sys.setenv(SPARK_HOME="/Users/liumo1/Applications/spark/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init("local[4]")
sqlContext <- sparkRSQL.init(sc)
n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b)
df1= createDataFrame(sqlContext, df)
showDF(df1)
x = c(2, 3, 10)
t = c("dd", "ee", "ff")
c = c(FALSE, FALSE, TRUE)
dff = data.frame(x, t, c)
df2 = createDataFrame(sqlContext, dff)
showDF(df2)
res = join(df1, df2, df1$n == df2$x, "semijoin")
showDF(res)
{code}
Running this code, I encountered the error:
{panel}
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
java.lang.IllegalArgumentException: Unsupported join type 'semijoin'.
Supported join types include: 'inner', 'outer', 'full', 'fullouter',
'leftouter', 'left', 'rightouter', 'right', 'leftsemi'.
{panel}
However, if I changed the joinType to "leftsemi",
{code}
res = join(df1, df2, df1$n == df2$x, "leftsemi")
{code}
I would get the error:
{panel}
Error in .local(x, y, ...) :
joinType must be one of the following types: 'inner', 'outer', 'left_outer',
'right_outer', 'semijoin'
{panel}
Since the join function in R appears to invoke a Java method, I went into
DataFrame.R and changed the code on line 1374 and line 1378 to change the
"semijoin" to "leftsemi" to match the Java function's parameters. These also
make the R joinType accepted values match those of Scala's.
semijoin:
{code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "semijoin"))
{
sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
}
else {
stop("joinType must be one of the following types: ",
"'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'")
}
{code}
leftsemi:
{code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "leftsemi"))
{
sdf <-