ulysses-you commented on pull request #34069:
URL: https://github.com/apache/spark/pull/34069#issuecomment-927217863
Hi @c21, I agree. In general BNLJ is much slower than SMJ. However, I found an
extreme case: a left join with a very small left side and a large right side,
where the right side is unfortunately also skewed. In that case SMJ does not
perform well, and can even fail with OOM on the skewed partition.
Here is a simple benchmark from my local machine:
```scala
// t1: 10M rows, all with c1 = 0 (id % 1), so the join key is fully skewed
spark.range(0, 10000000).selectExpr("id % 1 as c1", "id as c2").repartition(100).createOrReplaceTempView("t1")
// t2: 10 rows, the small left side
spark.range(0, 10).selectExpr("id as c1").createOrReplaceTempView("t2")
// smj: 5s
spark.sql("select /*+ merge(t2) */ count(*) from t2 left join t1 on t1.c1 = t2.c1").collect
// bnlj: 3s
spark.sql("select /*+ broadcast_nl(t2) */ count(*) from t2 left join t1 on t1.c1 = t2.c1").collect
```
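To illustrate why the skewed side hurts SMJ here, below is a minimal sketch in plain Scala (no Spark needed). It assumes a simplified modulo partitioner as a stand-in for Spark's hash partitioning; `SkewSketch`, `partitionOf` and `rowsPerPartition` are hypothetical names for illustration only, and the row count is scaled down from the benchmark. Spark's real partitioner hashes differently, but rows with the same key still end up in the same shuffle partition.

```scala
object SkewSketch {
  // Hypothetical stand-in for a hash partitioner: maps a join key to a
  // shuffle partition. This is an assumption, not Spark's actual hashing.
  def partitionOf(key: Long, numPartitions: Int): Int =
    Math.floorMod(key, numPartitions.toLong).toInt

  // Counts how many rows land in each shuffle partition.
  def rowsPerPartition(keys: Iterator[Long], numPartitions: Int): Array[Long] = {
    val counts = new Array[Long](numPartitions)
    keys.foreach { k => counts(partitionOf(k, numPartitions)) += 1L }
    counts
  }

  def main(args: Array[String]): Unit = {
    // 200 is the default value of spark.sql.shuffle.partitions.
    val numPartitions = 200
    // Scaled-down version of t1.c1 from the benchmark: id % 1 is always 0.
    val t1Keys = Iterator.range(0, 1000000).map(id => (id % 1).toLong)
    val counts = rowsPerPartition(t1Keys, numPartitions)
    println(s"non-empty partitions: ${counts.count(_ > 0)}, largest: ${counts.max}")
  }
}
```

Since every row of `t1` has the same join key, the shuffle for SMJ puts all of them into a single partition, so one task has to sort and join the whole right side. BNLJ with the small side broadcast needs no shuffle on the join key, so the 100 partitions produced by `repartition(100)` can be processed in parallel.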