Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )
Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors ...................................................................... Patch Set 1: (4 comments) I like this proposal, I'm only afraid that the aggregation level might be too coarse. Currently we only store the first concrete error message per error code, but we might need all unique error messages. Maybe it would be better to aggregate at the error message level, i.e. to store all unique error messages and count them. This way we can't save constructing the error messages, but on the plus side, the interface could be simpler. http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h@236 PS1, Line 236: runtime_ nit: I think it's confusing to call it 'runtime' since it usually has a different meaning. I did a quick sampling over the Impala code base and found that we usually call these variables as 'state_'. http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@216 PS1, Line 216: Maybe you could add a DCHECK that checks errors_.find(message.error()) == errors_.end() http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@221 PS1, Line 221: lock_guard<SpinLock> l(runtime_->error_log_lock_); : for (auto e: errors_) { : // It is assumed that the first occurrence was already reported to RuntimeState. : int new_error_count = e.second.count - 1; : runtime_->error_log_[e.first].count += new_error_count; To me it seems only this part needs locking. The second half of the for loop could be done in an other for loop without locking the spinlock. http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@226 PS1, Line 226: PrintId(runtime_->query_id()) There's a chance that the compiler does it for us, but it might be a good idea to store the result of PrintId() since it is a quite costly function. -- To view, visit http://gerrit.cloudera.org:8080/10793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa Gerrit-Change-Number: 10793 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 25 Jun 2018 11:57:54 +0000 Gerrit-HasComments: Yes
