maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1628961000

   > > @beliefer I don't see any difference before and after either, but the intent of the PR looks good: in the case of a left outer join, a bloom filter should be added. I would like to +1 this PR. Any thoughts? Thanks
   > 
   > Personally, I think we should stay strict. We should identify the cases that get better performance and make sure the other cases do not regress.
   
   @beliefer In scenarios where the left-side table is small and the right side is huge, I have seen a significant performance jump. The improvement comes mostly from the reduction in the amount of data exchanged (shuffled) and sorted.
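
To illustrate where the savings come from and why the rewrite is safe, here is a toy, single-process Python sketch (not Spark's implementation). A bloom filter is built from the small left side's join keys and used to prune right-side rows before the join. Because a left outer join never emits unmatched right rows, and a bloom filter has no false negatives, the join result is unchanged; false positives only let a few extra rows through. The `BloomFilter` class, the helper names, the SHA-256 hashing, and the filter sizes are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k salted SHA-256 hashes over an m-bit array (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        data = repr(key).encode()
        for i in range(self.k):
            # Salt each hash with its index to get k roughly independent bit positions.
            digest = hashlib.sha256(i.to_bytes(4, "little") + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key) -> bool:
        # False => key was definitely never added; True => key was *possibly* added.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def left_outer_join(left, right):
    """Hash-based left outer join on the first field of each row."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[0], []).append(rrow)
    out = []
    for lrow in left:
        matches = index.get(lrow[0])
        if matches:
            out.extend((lrow, rrow) for rrow in matches)
        else:
            out.append((lrow, None))  # unmatched left row is kept, padded with None
    return out


def prune_right_side(left, right, num_bits=1024, num_hashes=3):
    """Drop right rows whose key cannot match any left key.

    Safe for a LEFT OUTER join: right rows without a match never reach the
    output, and false positives only let extra rows through.
    """
    bf = BloomFilter(num_bits, num_hashes)
    for lrow in left:
        bf.add(lrow[0])
    return [rrow for rrow in right if bf.might_contain(rrow[0])]


# Small left side (20 keys, half of them absent on the right), large right side.
left = [(k, f"l{k}") for k in range(0, 2000, 100)]
right = [(k % 1000, f"r{k}") for k in range(10_000)]

pruned = prune_right_side(left, right)
print(f"right rows before pruning: {len(right)}, after: {len(pruned)}")
```

The printed counts stand in for the rows that would have to be exchanged and sorted on the large side: with these toy sizes, only the 100 truly matching right rows plus any false positives survive pruning, while the join output stays identical.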
   
   In the query below, with the bloom filter enabled, execution completes in just 12 secs. The same query takes 41 secs with the bloom filter disabled.
   
   
![image](https://github.com/apache/spark/assets/19660171/5b908bfa-c9b7-40be-92da-8a18a1434380)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
