Re: [I] OOM when nested join + limit [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816753297 I can try and help this effort in a few weeks, but I need to finish up the filter pushdown work in parquet and topk work first (I have too many things outstanding at the moment an

Re: [I] OOM when nested join + limit [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816732058 > What I suggest is that someone updates our documentation with the current state of joins in DataFusion (namely what operators are implemented and what types of joins they ar

Re: [I] OOM when nested join + limit [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816665497 TLDR: while this is a straightforward bug report, I think fixing it is not something we are going to make a patch for -- it will require a more serious implementation effort fo

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816553633 There is also a now-closed PR for IEJoin ( https://github.com/apache/datafusion/pull/12754#pullrequestreview-2350501097 ) that allows joining on some other conditions.

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
kosiew commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816494226 @2010YOUY01 What do you think of using sorted inputs, just like SortMergeJoinExec? Instead of ==, allow conditions like: t1.id < t2.id t1.timestamp <= t2.ev
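
A minimal Python sketch of the idea in the comment above (joining sorted inputs on `t1.id < t2.id` instead of `==`). This is only an illustration: `sorted_less_than_join` is a hypothetical helper name, the inputs are plain lists of ids, and nothing here reflects how SortMergeJoinExec or any DataFusion operator is actually implemented.

```python
from bisect import bisect_right


def sorted_less_than_join(left, right):
    """Lazily yield (l, r) pairs with l < r (hypothetical helper, not a DataFusion API)."""
    left_sorted = sorted(left)
    right_sorted = sorted(right)
    for l in left_sorted:
        # Index of the first value in right_sorted that is strictly greater than l.
        start = bisect_right(right_sorted, l)
        # Everything from `start` onward satisfies l < r, so no per-pair check is needed.
        for r in right_sorted[start:]:
            yield (l, r)


pairs = list(sorted_less_than_join([3, 1, 2], [2, 3, 1]))
```

Because the function is a generator, a consumer that needs only a few rows (as with a `LIMIT`) can stop early without materializing every matching pair. The total output can still be quadratic in the input size when most pairs match.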

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
2010YOUY01 commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2815082026 I think https://github.com/apache/datafusion/issues/15760 is needed to fix the issue.

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
2010YOUY01 commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2814936184 > [@alamb](https://github.com/alamb) > > I think a SortMergeJoin derivative that handles inequality could be helpful here but there are comments that SortMergeJoin is n

Re: [I] OOM when nested join + limit [datafusion]

2025-04-17 Thread via GitHub
kosiew commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2812443761 @alamb I think a SortMergeJoin derivative that handles inequality could be helpful here but there are comments that SortMergeJoin is not used in plans yet. Is t

Re: [I] OOM when nested join + limit [datafusion]

2025-04-14 Thread via GitHub
kosiew commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2801251917 take

[I] OOM when nested join + limit [datafusion]

2025-04-08 Thread via GitHub
mmooyyii opened a new issue, #15628: URL: https://github.com/apache/datafusion/issues/15628 ### Is your feature request related to a problem or challenge? ``` # make testdata import pandas as pd n_rows = 10 data = {'id': range(1, n_rows + 1)} df = pd.DataFram