zhengruifeng commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1478764552
@srowen sounds reasonable
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
zhengruifeng commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1477193966
TL;DR I want to apply a scalar subquery to optimize
`FPGrowthModel.transform`. There are two options:
1, create temp views and use `spark.sql`, see
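Option 1 might look something like the following sketch. The view names (`__dataset`, `__rules`), column names, and the SQL itself are assumptions for illustration, not the PR's actual code; the Spark calls are shown only as comments.

```scala
// Hypothetical sketch of option 1 (all names are assumptions):
// register the input and the association rules as temp views, then run
// a correlated scalar subquery through spark.sql, e.g.
//
//   dataset.createOrReplaceTempView("__dataset")
//   model.associationRules.createOrReplaceTempView("__rules")
//   spark.sql(transformSql("items", "prediction"))
//
// The SQL can be assembled as a plain string; the scalar subquery
// aggregates the consequents of rules whose antecedent is contained
// in the row's item set.
def transformSql(itemsCol: String, predictionCol: String): String =
  s"""SELECT *,
     |       (SELECT collect_set(consequent)
     |        FROM __rules
     |        WHERE size(array_except(antecedent, $itemsCol)) = 0) AS $predictionCol
     |FROM __dataset""".stripMargin
```

The downside the comment alludes to is visible here: the logic lives in a SQL string against temp views, rather than in the DataFrame API.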
zhengruifeng commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1473064231
@srowen if the latest performance test seems fine, then I'd ask the SQL guys
whether we can have a subquery method in DataFrame APIs.
zhengruifeng commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1467821846
yes, the `BroadcastNestedLoopJoin` is slower.
Then I tried a subquery, and it's faster in both execution and
analysis, but I have to create a temp view and write the SQL
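For reference, what `FPGrowthModel.transform` computes, whether implemented as a join or a subquery, can be sketched in plain Scala. The `Rule`/`predict` names are hypothetical; this is only a model of the semantics, not Spark code.

```scala
// Minimal non-Spark sketch of the transform semantics: for each
// transaction, collect the consequents of all association rules whose
// antecedent is a subset of the transaction's items, skipping
// consequents already present in the transaction.
case class Rule(antecedent: Set[String], consequent: String)

def predict(items: Set[String], rules: Seq[Rule]): Set[String] =
  rules.collect {
    case r if r.antecedent.subsetOf(items) && !items.contains(r.consequent) =>
      r.consequent
  }.toSet
```

With a join, this subset check runs as a `BroadcastNestedLoopJoin` over every (row, rule) pair, which is what the comment reports as the slower plan.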
zhengruifeng commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1465662299
I did a quick test with the `T10I4D100K` dataset from
http://fimi.uantwerpen.be/data/
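Each line of the FIMI files is a space-separated list of integer item ids, so the load step reduces to a one-line parser. This is a hedged sketch; the file path and column name are assumptions, and the Spark read is shown only as a comment.

```scala
// Each line of a FIMI file (e.g. T10I4D100K) is a space-separated
// list of integer item ids.
def parseTransaction(line: String): Array[Int] =
  line.trim.split("\\s+").map(_.toInt)

// With Spark this would roughly be (not run here; path assumed):
//   val df = spark.read.textFile("T10I4D100K.dat")
//     .map(parseTransaction).toDF("items")
```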
fit:
```
scala> val df =
```