Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for 
common data errors
......................................................................


Patch Set 1:

(4 comments)

I like this proposal, I'm only afraid that the aggregation level might be too 
coarse.

Currently we only store the first concrete error message per error code, but we 
might need all unique error messages.

Maybe it would be better to aggregate at the error message level, i.e. to store 
all unique error messages and count them.
This way we can't save constructing the error messages, but on the plus side, 
the interface could be simpler.

http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h@236
PS1, Line 236: runtime_
nit: I think it's confusing to call it 'runtime' since it usually has a 
different meaning.
I did a quick sampling over the Impala code base and found that we usually call 
these variables as 'state_'.


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@216
PS1, Line 216:
Maybe you could add a DCHECK that checks errors_.find(message.error()) == 
errors_.end()


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@221
PS1, Line 221:   lock_guard<SpinLock> l(runtime_->error_log_lock_);
             :   for (auto e: errors_) {
             :     // It is assumed that the first occurrence was already 
reported to RuntimeState.
             :     int new_error_count =  e.second.count - 1;
             :     runtime_->error_log_[e.first].count += new_error_count;
To me it seems only this part needs locking.
The second half of the for loop could be done in an other for loop without 
locking the spinlock.


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@226
PS1, Line 226: PrintId(runtime_->query_id())
There's a chance that the compiler does it for us, but it might be a good idea 
to store the result of PrintId() since it is a quite costly function.



--
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 25 Jun 2018 11:57:54 +0000
Gerrit-HasComments: Yes

Reply via email to