IMU, the left side is always co-located with the hash join node. If the stats are
correct, the left side will always be the larger table/input. There are two
terms in the hash join algorithm: build and probe. The smaller table
that can be built into an in-memory hash table is called the
IIUC, every row scanned in a partitioned hash join (both sides) is sent
across the network (an exchange on HASH(key)). The targets of this exchange
are nodes that have data locality with the left side of the join. Why does
Impala do it that way?
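
To make the terms in this thread concrete, here is a minimal Python sketch of a partitioned (shuffle) hash join. It is not Impala's implementation. The function names, the row-as-dict layout, and the `num_nodes` parameter are all assumptions made for illustration. Both inputs are hash-partitioned on the join key (the exchange), and each simulated node then builds a hash table from its share of the right (smaller) input and probes it with its share of the left input.

```python
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Simulate the exchange: route each row to a node by hashing its join key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions


def local_hash_join(build_rows, probe_rows, key):
    """Build an in-memory hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for row in build_rows:          # build phase
        table[row[key]].append(row)
    joined = []
    for row in probe_rows:          # probe phase
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def partitioned_hash_join(left, right, key, num_nodes):
    """Hypothetical driver: partition both sides, then join per node.

    Assumption for this sketch: the right input is the build side and the
    left input is the probe side.
    """
    left_parts = hash_partition(left, key, num_nodes)
    right_parts = hash_partition(right, key, num_nodes)
    result = []
    for node in range(num_nodes):
        result.extend(local_hash_join(right_parts[node], left_parts[node], key))
    return result


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]
joined = sorted(partitioned_hash_join(left, right, "id", 3), key=lambda r: r["id"])
```

Because rows with equal keys always hash to the same node, the joined rows do not depend on `num_nodes`. Only the distribution of work across nodes changes.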
Since all rows are sent across the network anyway,
Thanks for the response, Quanlong. The behaviour you describe is broadcast
join (versus partitioned / shuffle) - sorry for the confusing usage of terms!
Take a look at the differences in the cost model for the two (in lieu of
I would like to help out with the task listed at
I agree with the previous comments on this thread. Thank you for
Putting it behind a flag sounds good to me too. Hopefully we can get
feedback from Hulu and other users of Impala that will try out the
On Mon, Feb 12, 2018 at 10:26 AM, Dimitris Tsirogiannis <
> Does the patch also implement an ORC
Does the patch also implement an ORC writer?
On Mon, Feb 12, 2018 at 8:48 AM, Jim Apple wrote:
> I agree with the previous comments on this thread. Thank you for
> contributing, Quanlong!
Jeszy, the way I read your question is: How much inter-node parallelism is
As usual with perf questions, the answer is "it depends". Involving all nodes
in the cluster for a PHJ may not work well. Intuitively, each node should
have a minimum amount of work for the cost of shipping fragments
Thank you for volunteering! It looks like that ticket,
https://issues.apache.org/jira/browse/IMPALA-5886, already has an
assignee. Are there any other newbie tickets you would be interested
in working on?
On Mon, Feb 12, 2018 at 3:31 AM,
Dimitris, as the first step, this patch only supports reading primitive types
from ORC files. I just created two follow-up JIRAs for reading complex types
(IMPALA-6503) and writing to ORC tables (IMPALA-6504). Will work on them later.
Tim, I also created some follow-on JIRAs as you suggested in
Maybe it would make sense to create an Epic in JIRA for ORC scanner
enhancements, following on from the initial implementation. I don't really
feel strongly as long as the related JIRAs are linked together somehow.
On Mon, Feb 12, 2018 at 1:42 PM, Quanlong Huang