[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-21 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1478764552 @srowen sounds reasonable

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1477193966 TL;DR: I want to apply a scalar subquery to optimize `FPGrowthModel.transform`. There are two options: 1) create temp views and use `spark.sql`, see
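Option 1 (temp views plus `spark.sql`), as far as it can be reconstructed from the truncated comment, can be sketched as follows. The view names, column names, and the exact subquery predicate below are illustrative assumptions, not the PR's actual code:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Toy stand-ins for the real inputs (names and schemas are assumptions).
val dataset = Seq((0L, Seq("a", "b", "c")), (1L, Seq("a"))).toDF("id", "items")
val rules   = Seq((Seq("a"), Seq("b"))).toDF("antecedent", "consequent")

dataset.createOrReplaceTempView("txns")
rules.createOrReplaceTempView("rules")

// Correlated scalar subquery: for each transaction, collect the consequents
// of every rule whose antecedent is fully contained in the row's items.
val predicted = spark.sql("""
  SELECT t.*,
         (SELECT flatten(collect_list(r.consequent))
          FROM rules r
          WHERE size(array_intersect(t.items, r.antecedent)) = size(r.antecedent)
         ) AS prediction
  FROM txns t""")
```

Whether the optimizer can decorrelate this exact non-equality predicate depends on the Spark version; the follow-up comments in this thread discuss precisely those trade-offs.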

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-16 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1473064231 @srowen if the latest performance test seems fine, then I'd ask the SQL guys whether we can have a subquery method in DataFrame APIs.
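To make the wish concrete: a "subquery method in DataFrame APIs" might look something like the sketch below. This is purely hypothetical; no such public method existed in Spark's Dataset API at the time of this thread, and `scalarSubquery` here is an invented name:

```scala
import org.apache.spark.sql.functions._

// Hypothetical API only: something like this would remove the need for the
// temp-view + spark.sql workaround described in the earlier comments.
val predicted = dataset.select(
  col("*"),
  scalarSubquery {                       // hypothetical constructor
    rules
      .where(size(array_intersect(col("items"), col("antecedent"))) ===
             size(col("antecedent")))
      .agg(flatten(collect_list(col("consequent"))))
  }.as("prediction"))
```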

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-14 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1467821846 yes, the `BroadcastNestedLoopJoin` is slower. I then tried a subquery, which is faster in both execution and analysis, but I have to create temp views and write the SQL
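The join-based path the comment calls slower would look roughly like this (column names assumed). Because "antecedent is a subset of items" is a non-equi condition, Spark cannot plan a broadcast hash join and falls back to `BroadcastNestedLoopJoin`:

```scala
import org.apache.spark.sql.functions._

// Sketch of the join-based alternative (the slower path, per the comment).
// The subset test is a non-equi join condition, so Spark plans a
// BroadcastNestedLoopJoin rather than a hash join.
val joined = dataset
  .join(broadcast(rules),
    size(array_intersect(col("items"), col("antecedent"))) === size(col("antecedent")),
    "left_outer")
  .groupBy(col("id"), col("items"))
  .agg(flatten(collect_list(col("consequent"))).as("prediction"))
```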

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1465662299 I did a quick test with the `T10I4D100K` dataset from http://fimi.uantwerpen.be/data/ fit: ``` scala> val df =
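The quoted fit snippet is cut off by the archive. A minimal reconstruction of loading T10I4D100K (one space-separated transaction per line) and fitting `FPGrowth` might look like this; the file path and the support/confidence thresholds are placeholders, not the values the comment's benchmark used:

```scala
import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.functions._

// T10I4D100K stores one transaction per line as space-separated item ids.
val df = spark.read.text("/tmp/T10I4D100K.dat")
  .select(split(trim(col("value")), "\\s+").as("items"))

// Thresholds below are assumptions for illustration only.
val model = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.001)
  .setMinConfidence(0.8)
  .fit(df)
```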