Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18846 )
Change subject: [Client] Add query id to trace the whole query process ...................................................................... Patch Set 27: (17 comments) http://gerrit.cloudera.org:8080/#/c/18846/27//COMMIT_MSG Commit Message: PS27: Could you be more specific in describing the use case with Impala? Where does the query ID originates from if talking about Kudu client used by Impala? http://gerrit.cloudera.org:8080/#/c/18846/27//COMMIT_MSG@14 PS27, Line 14: In this patch, Impala can nit: in this patch, no Impala code is involved, as I can see http://gerrit.cloudera.org:8080/#/c/18846/27//COMMIT_MSG@15 PS27, Line 15: Kudu duplicated 'Kudu'? http://gerrit.cloudera.org:8080/#/c/18846/27//COMMIT_MSG@16 PS27, Line 16: convenient to debug problem. useful in troubleshooting and debugging. http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client-test.cc@1278 PS27, Line 1278: specified the specified http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client-test.cc@1288 PS27, Line 1288: ASSERT_STRINGS_ANY_MATCH(capture_logs.logged_msgs(), : "Query id: .* tablet id:"); : ASSERT_STRINGS_ANY_MATCH(capture_logs.logged_msgs(), : "Query id: .* scanner id:"); This matches anything, so 'test_query_id' could match this as well. If so, how is this useful then? http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h File src/kudu/client/client.h: http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@2801 PS27, Line 2801: whole process nit: whole scanning process ? http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@2802 PS27, Line 2802: defaultly automatically by the client library code http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@2817 PS27, Line 2817: for debugging problem nit: drop this extra part? http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@3375 PS27, Line 3375: whole process nit: whole scanning process ? http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@3376 PS27, Line 3376: posted set http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@3377 PS27, Line 3377: for debugging nit: for troubleshooting and debugging http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@3380 PS27, Line 3380: for debugging problem nit: drop this part of the sentence? http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.cc File src/kudu/client/client.cc: http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.cc@2019 PS27, Line 2019: if (data_->next_req_.query_id().empty()) { : static ObjectIdGenerator oid_generator; : data_->next_req_.set_query_id(oid_generator.Next()); : } Do you think it makes sense to make this configurable? http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/scan_token-test.cc File src/kudu/client/scan_token-test.cc: http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/scan_token-test.cc@598 PS27, Line 598: ASSERT_STRINGS_ANY_MATCH(capture_logs.logged_msgs(), : "Query id: .* tablet id:"); : ASSERT_STRINGS_ANY_MATCH(capture_logs.logged_msgs(), : "Query id: .* scanner id:"); This matches anything, right? How is this useful then? http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_server-test.cc@2870 PS27, Line 2870: // Test for a scan with query id in server side. : TEST_F(TabletServerTest, TestScanWithQueryId) { : InsertTestRowsDirect(0, 100); : : ScanRequestPB req; : NewScanRequestPB* scan = req.mutable_new_scan_request(); : scan->set_tablet_id(kTabletId); : req.set_batch_size_bytes(0); // so it won't return data right away : ASSERT_OK(SchemaToColumnPBs(schema_, scan->mutable_projected_columns())); : req.set_query_id("query_id_for_test"); : : ScanResponsePB resp; : RpcController rpc; : // Send the call : { : SCOPED_TRACE(SecureDebugString(req)); : ASSERT_OK(proxy_->Scan(req, &resp, &rpc)); : SCOPED_TRACE(SecureDebugString(resp)); : ASSERT_FALSE(resp.has_error()); : } : } What are we trying to verify here? I'm not sure I'm seeing anything specific to tracing in this scenario, but I might be missing something. http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_service.cc@3039 PS27, Line 3039: LOG(INFO) << Substitute("Query id: $0, scanner id: $1.", : req->query_id(), req->scanner_id()); Why to keep thing along with the trace statement at line 3075? If the scan is problematic in sense of timing, etc., then trace should be printed automatically. Otherwise, if it's necessary to track all the requests, it's possible to enable tracing explicitly, right? My concern is that there would be too much logging if having this INFO log for every scan continuation request. -- To view, visit http://gerrit.cloudera.org:8080/18846 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9dbae801596726fec1c85ee547128da3179345d9 Gerrit-Change-Number: 18846 Gerrit-PatchSet: 27 Gerrit-Owner: Wang Xixu <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Jian Zhang <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Mon, 09 Jan 2023 19:37:57 +0000 Gerrit-HasComments: Yes
