I have received a fair number of questions on the topic of handling data
locality and co-located joins in HAWQ 2. Most of the questions come from
users with a HAWQ 1.x background, where tables defaulted to HASH distribution
by a key and hence resulted in local joins in most cases.
With the new architecture and RANDOM distribution policy as default, I
thought it would be good to crowd-source some useful info here from the
community on how performance is achieved with the new architecture and data
distribution policy. In particular: how is data movement minimized,
how/when is dynamic redistribution utilized, how are joins co-located, etc.?
Can someone start by providing insights on this topic?