> Another optimization which is far more useful is the shared fetch
> optimization. This tries to avoid copying the same data onto the same
> host multiple times.
> We've seen fairly good gains when fetching data to 10K reducers from a
> single source - 28 minutes down to 2 minutes. There's an example -
> BroadcastLoadGen - which can be used to try out this feature.

Since we missed whitelisting runtime shared fetch in 0.5.x, I couldn't test it heavily in production installs, so it would be amazing if you could run some broadcast JOIN tests with it enabled on a multi-rack cluster.

My testing on a single-rack cluster showed that even with full 10 GigE node-to-node, it improves broadcast performance by about 10-20%. On a large cluster I'm expecting at least a 2-3x performance improvement for large map-joins with that feature.

It would be great to get scale validation on that with a user workload: I tested it on 350 nodes with BroadcastLoadGen (which showed the ~15x gains), but did not test it heavily with a user workload.

Cheers,
Gopal
