Re: Local join instead of data exchange - co-located blocks

2018-03-12 Thread Alexander Behm
Cool that you are working on a research project with Impala! Properly adding such a feature to Impala is a substantial effort, but hacking the code for an experiment or two seems doable. I think you will need to modify two things: (1) the planner to not add exchange nodes, and (2) the scheduler to

Re: Local join instead of data exchange - co-located blocks

2018-03-12 Thread Philipp Krause
Thank you very much for your quick answers! The intention behind this is to improve the execution time and (primarily) to examine the impact of block co-location (research project) for this particular query (simplified): select A.x, B.y, A.z from tableA as A inner join tableB as B on

Re: Using comments in query for detailed monitoring

2018-03-12 Thread Greg Rahn
The driver with this fix is now available. http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-New-Impala-JDBC-Driver-Released/m-p/65325#M220 On Fri, Feb 16, 2018 at 5:59 PM, Greg Rahn wrote: > Unfortunately there is not a JIRA to track because the drivers are

Re: Local join instead of data exchange - co-located blocks

2018-03-12 Thread Alexander Behm
Such a specific block arrangement is very uncommon for typical Impala setups, so we don't attempt to recognize and optimize this narrow case. In particular, such an arrangement tends to be short-lived if you have the HDFS balancer turned on. Without making code changes, there is no way today to

Local join instead of data exchange - co-located blocks

2018-03-12 Thread Philipp Krause
Hello everyone! In order to prevent network traffic, I'd like to perform local joins on each node instead of exchanging the data and performing a join over the complete data afterwards. My query is basically a join over three tables on an ID attribute. The blocks are perfectly distributed,