[ https://issues.apache.org/jira/browse/IMPALA-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar resolved IMPALA-6461. ------------------------------------- Resolution: Fixed IMPALA-6461 : Micro-optimizations to DataStreamSender::Send While analyzing performance of partition exchange operator, I noticed that there is dependency and a function call per row in the hot path. Optimizations in this change are: 1) Remove the data dependency between computing the hash and the channel 2) Inline DataStreamSender::Channel::AddRow 3) Save partition_exprs_.size() to save a couple of instructions This translates to improving CPI for DataStreamSender::Send by 10% Change-Id: I642a9dad531a29d4838a3537ab0e04320a69960d Reviewed-on: [http://gerrit.cloudera.org:8080/9221]Reviewed-by: Mostafa Mokhtar <mmokh...@cloudera.com> Tested-by: Impala Public Jenkins > *DataStreamSender::Channel::AddRow needs some micro-optimizations to remove > per row function call and data dependency > ---------------------------------------------------------------------------------------------------------------------- > > Key: IMPALA-6461 > URL: https://issues.apache.org/jira/browse/IMPALA-6461 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec > Reporter: Mostafa Mokhtar > Assignee: Mostafa Mokhtar > Priority: Minor > > While analyzing performance of partition exchange operator I noticed that > there is dependency and a function call per row in the hot path. > {code} > // hash-partition batch's rows across channels > // TODO: encapsulate this in an Expr as we've done for Kudu above and remove > this case > // once we have codegen here. > int num_channels = channels_.size(); > for (int i = 0; i < batch->num_rows(); ++i) { > TupleRow* row = batch->GetRow(i); > uint64_t hash_val = EXCHANGE_HASH_SEED; > for (int j = 0; j < partition_exprs_.size(); ++j) { > ScalarExprEvaluator* eval = partition_expr_evals_[j]; > void* partition_val = eval->GetValue(row); > // We can't use the crc hash function here because it does not result in > // uncorrelated hashes with different seeds. Instead we use FastHash. > // TODO: fix crc hash/GetHashValue() > DCHECK(&(eval->root()) == partition_exprs_[j]); > hash_val = RawValue::GetHashValueFastHash( > partition_val, partition_exprs_[j]->type(), hash_val); > } > RETURN_IF_ERROR(channels_[hash_val % num_channels]->AddRow(row)); > } > {code} > Force inlining DataStreamSender::Channel::AddRow and breaking up the loop > improves partition exchange performance by 5% > Code-generation of the hash computation IMPALA-5168 should give another 10% > speedup. -- This message was sent by Atlassian JIRA (v7.6.3#76005)