Cool that you are working on a research project with Impala!
Properly adding such a feature to Impala is a substantial effort, but
hacking the code for an experiment or two seems doable.
I think you will need to modify two things: (1) the planner to not add
exchange nodes, and (2) the scheduler to
Thank you very much for your quick answers!
The intention behind this is to improve the execution time and
(primarily) to examine the impact of block-co-location (research
project) for this particular query (simplified):
select A.x, B.y, A.z from tableA as A inner join tableB as B on
The driver with this fix is now available.
http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-New-Impala-JDBC-Driver-Released/m-p/65325#M220
On Fri, Feb 16, 2018 at 5:59 PM, Greg Rahn wrote:
> Unfortunately there is not a JIRA to track because the drivers are
Such a specific block arrangement is very uncommon for typical Impala
setups, so we don't attempt to recognize and optimize this narrow case. In
particular, such an arrangement tends to be short-lived if you have the
HDFS balancer turned on.
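
To illustrate why the arrangement matters, here is a toy Python sketch. It is
not Impala code: the node names, rows, and helper functions are all made up
for illustration. It models each node joining only the rows it holds locally.
As long as matching IDs sit on the same node, the local joins together produce
the same rows as a join over the complete data. Once one row is moved to
another node, as a balancer might do, the local joins silently drop a match.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner-join two lists of (id, value) rows on id."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, ())]


def flatten(nodes):
    """Collect the rows of all nodes into one list (a full data exchange)."""
    return [row for rows in nodes.values() for row in rows]


def local_joins(nodes_a, nodes_b):
    """Join each node's rows of A only against the same node's rows of B."""
    out = []
    for node, rows_a in nodes_a.items():
        out.extend(hash_join(rows_a, nodes_b.get(node, [])))
    return sorted(out)


# Hypothetical co-located layout: rows with the same id live on the same node.
a = {"node1": [(1, "a1"), (2, "a2")], "node2": [(3, "a3")]}
b = {"node1": [(1, "b1"), (2, "b2")], "node2": [(3, "b3")]}

global_result = sorted(hash_join(flatten(a), flatten(b)))
colocated_result = local_joins(a, b)

# Same data, but the B row with id 2 has been moved from node1 to node2.
b_moved = {"node1": [(1, "b1")], "node2": [(3, "b3"), (2, "b2")]}
moved_result = local_joins(a, b_moved)

print(colocated_result == global_result)  # local joins match the full join
print(moved_result == global_result)      # the match for id 2 is lost
```

In this sketch the co-located layout needs no data exchange at all, which is
the benefit being asked about, but correctness depends entirely on the layout
staying intact.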
Without making code changes, there is no way today to
Hello everyone!
In order to avoid network traffic, I'd like to perform local joins on each
node instead of exchanging the data and performing a join over the complete
data afterwards. My query is basically a join over three tables on an ID
attribute. The blocks are perfectly distributed,
attribute. The blocks are perfectly distributed,