maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1635347890
@beliefer I am running some experiments to check how table sizes affect the performance numbers. As far as the bloom filter is concerned, the worst case seems to be when the left side (bloom creation side) is as large as possible and the right side (bloom application side) is as small as possible. These are the values for a left-side table of ~10MB (the maximum; beyond this size the bloom filter is not applied) and a right-side table of ~10GB (the minimum; below this size the bloom filter is not applied). Here the reduction is from 220M records to 7,952,642 records. I will try some experiments on this reduction ratio.
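For reference, the reduction quoted above can be turned into a ratio with a few lines of arithmetic. This is only a rough sketch: the helper name `reduction_stats` is made up for illustration, and "220M" is taken as exactly 220,000,000, which is an assumption since the comment gives a rounded figure.

```python
def reduction_stats(rows_before, rows_after):
    """Return (fraction of rows retained, reduction factor) for a filter."""
    retained = rows_after / rows_before
    factor = rows_before / rows_after
    return retained, factor


# Assumption: "220M" is treated as exactly 220,000,000 rows.
ROWS_BEFORE = 220_000_000
# Row count after the bloom filter, as quoted in the comment.
ROWS_AFTER = 7_952_642

retained, factor = reduction_stats(ROWS_BEFORE, ROWS_AFTER)
print(f"retained: {retained:.1%}, filtered out: {1 - retained:.1%}, factor: {factor:.1f}x")
```

Under that assumption, about 3.6% of the application-side rows survive the filter, which is roughly a 27.7x reduction.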
