maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1628961000
> > @beliefer I don't see any difference as well before and after, but the intent of the PR looks good, in case of left outer join, bloom filter should be added. I would like to +1 this PR. Any thoughts? Thanks
>
> Personally, I think we should stay strict. We should find out the case have better performance and other cases without regression.

@beliefer In scenarios where the left-side table is small and the right-side table is huge, I have seen a significant performance jump. The improvement comes mostly from the reduction in the amount of data exchanged and sorted. For the query below, execution takes just 12 seconds with the bloom filter enabled. The same query takes 41 seconds with the bloom filter disabled.
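To illustrate why filtering the huge right side helps a left outer join, here is a minimal, self-contained Python sketch. It is not Spark code and not the query from this comment: the `BloomFilter` class, the `left_outer_join` helper, and the sample data are all hypothetical. It only shows that a bloom filter built from the small left side's keys can drop most right-side rows before they would be exchanged and sorted, without changing the join result.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(key))


def left_outer_join(left, right):
    """Naive left outer join on the first tuple element."""
    by_key = {}
    for key, value in right:
        by_key.setdefault(key, []).append(value)
    out = []
    for key, lval in left:
        matches = by_key.get(key)
        if matches:
            out.extend((key, lval, rval) for rval in matches)
        else:
            out.append((key, lval, None))  # unmatched left rows are kept
    return out


# Assumed sample data: a small left side and a much larger right side.
small_left = [(k, f"l{k}") for k in (1, 2, 3, 1_000_001)]
huge_right = [(k, f"r{k}") for k in range(100_000)]

# Build the filter from the small side, then prune the large side.
bf = BloomFilter()
for key, _ in small_left:
    bf.add(key)
pruned_right = [row for row in huge_right if bf.might_contain(row[0])]

joined_pruned = left_outer_join(small_left, pruned_right)
joined_full = left_outer_join(small_left, huge_right)
print(len(huge_right), "right rows before pruning,", len(pruned_right), "after")
```

Right-side rows that cannot match any left key never appear in a left outer join's output, so discarding them early is safe. False positives only let a few extra rows through, and the join itself still eliminates them.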
