Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18846 )

Change subject: [Client] Add query id to trace the whole query process
......................................................................


Patch Set 30:

(25 comments)

Thanks a lot for the updates!

The patch looks good overall, just a few more comments on the newly introduced 
test in scan_token-test.cc (that I missed to post with PS27), some nits in 
comments and descriptions.

Also, it seems IWYU isn't yet happy:

>>> Fixing #includes in 
>>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/scan_token-test.cc'
@@ -65,7 +65,6 @@
 #include "kudu/tserver/ts_tablet_manager.h"
 #include "kudu/tserver/tserver.pb.h"
 #include "kudu/util/async_util.h"
-#include "kudu/util/logging_test_util.h"
 #include "kudu/util/metrics.h"
 #include "kudu/util/monotime.h"
 #include "kudu/util/net/sockaddr.h"
>>> Fixing #includes in 
>>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/client-test.cc'
@@ -118,7 +118,6 @@
 #include "kudu/util/barrier.h"
 #include "kudu/util/countdown_latch.h"
 #include "kudu/util/locks.h"  // IWYU pragma: keep
-#include "kudu/util/logging_test_util.h"
 #include "kudu/util/metrics.h"
 #include "kudu/util/monotime.h"
 #include "kudu/util/net/sockaddr.h"
IWYU would have edited 2 files on your behalf.

Also, plea

http://gerrit.cloudera.org:8080/#/c/18846/27//COMMIT_MSG
Commit Message:

PS27:
> In Impala, a sql will be translated into a query plan. The
Thank you for adding the details.  I appreciate this!


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@9
PS30, Line 9: will be
is


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@9
PS30, Line 9: a sql
an SQL query


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@10
PS30, Line 10: will be
is


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@11
PS30, Line 11: in
at


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@11
PS30, Line 11: distributly
in a distributed manner


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@13
PS30, Line 13: It is useful to analyze which step get
             : stucked or failed, when a slow or failed query happened.
It is useful in analysis of stuck, failed, and otherwise non-performant queries.


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@18
PS30, Line 18: Kudu is the scan node of a query plan, when defines Kudu table
             : as external table in Impala. And a scanning on a table will be
             : splited into multiple scanning on different tablets. The scanner
             : id is used to trace a scanning on a tablet.
Scanning a Kudu table is split into multiple scan operations on appropriate 
tablets, one scanner per tablet.  When scanning a Kudu table using Impala, 
corresponding scan nodes appear for every scanner in the Impala's query plan.  
Scanner id is used to trace the scan operation on a tablet.


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@24
PS30, Line 24: from
originated from


http://gerrit.cloudera.org:8080/#/c/18846/30//COMMIT_MSG@27
PS30, Line 27: this patch post query id from Impala to Kudu, and
             : binding with scanner id.
with this patch it is now possible to post Impala's query identifier to Kudu, 
so it can be associated with Kudu's scanner id.


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/client.h
File src/kudu/client/client.h:

http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/client.h@3378
PS30, Line 3378: defaultly
automatically


http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h
File src/kudu/client/client.h:

http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@3376
PS27, Line 3376:
> set
Missed this comment?


http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@3377
PS27, Line 3377: ole process.
> nit: for troubleshooting and debugging
Missed this comment?


http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.h@3380
PS27, Line 3380:
> nit: drop this part of the sentence?
Missed this comment?


http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.cc
File src/kudu/client/client.cc:

http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/client/client.cc@2019
PS27, Line 2019:   }
               :
               :   VLOG(2) << "Beginning " << data_->DebugString();
               :
> Do you think it makes sense to make this configurable?
It seems this comment hasn't been answered (it was missed?).

I guess it's not a big deal to have this unconditionally, but at least please 
consider adding similar functionality to the Java Kudu client as well.

Thanks!


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/client.proto@201
PS30, Line 201: Query id to identify one query.
How about:

Identifier of a top-level query which this scan requests is a part of. Query id 
allows to associate the scanner built out of this token with the top-level 
query; it is useful for tracing the execution of the top-level query through 
various parts of the data pipeline.


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/scan_token-test.cc
File src/kudu/client/scan_token-test.cc:

PS30:
A few more comments on this test as of PS30.  It seems I missed posting those 
on PS27.


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/scan_token-test.cc@703
PS30, Line 703: 3 replication factor
nit: but the code below sets the replication factor to 1?


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/scan_token-test.cc@709
PS30, Line 709:     workload.set_num_write_threads(10);
              :     workload.set_write_batch_size(128);
Is this crucial to customize these in the context of this test?  If yes, please 
add a comment to explain why, otherwise remove these customizations.


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/scan_token-test.cc@729
PS30, Line 729:     unique_ptr<KuduScanner> scanner;
              :     ASSERT_OK(IntoUniqueScanner(client_.get(), *tokens[0], 
&scanner));
Does it make sense to check for the 'query_id' field in 
scanner->data_->next_req_?  I guess it should be "query-id-for-test" at this 
point, right?


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/scan_token-test.cc@731
PS30, Line 731:     ASSERT_OK(scanner->Open());
              :     KuduScanBatch batch;
              :     ASSERT_TRUE(scanner->HasMoreRows());
              :     while (scanner->HasMoreRows()) {
              :       ASSERT_OK(scanner->NextBatch(&batch));
              :     }
nit: maybe, check for the number of scanned rows to equal to 
workload.rows_inserted()?


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/scan_token-test.cc@746
PS30, Line 746:     unique_ptr<KuduScanner> scanner;
              :     ASSERT_OK(IntoUniqueScanner(client_.get(), *tokens[0], 
&scanner));
Does it make sense to check for the 'query_id' field in 
scanner->data_->next_req_?  Should it be empty or auto-populated at this point?


http://gerrit.cloudera.org:8080/#/c/18846/30/src/kudu/client/scan_token-test.cc@748
PS30, Line 748:     ASSERT_OK(scanner->Open());
              :     KuduScanBatch batch;
              :     ASSERT_TRUE(scanner->HasMoreRows());
              :     while (scanner->HasMoreRows()) {
              :       ASSERT_OK(scanner->NextBatch(&batch));
              :     }
nit: maybe, check for the number of rows to equal to workload.rows_inserted()?


http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_server-test.cc@2870
PS27, Line 2870:               next_seq_id++;
               :               if (!resp.has_more_results()) {
               :                 VLOG(1) << "Finished scan";
               :                 done = true;
               :               }
               :             }
               :           }
               :         });
               :   }
               :   for (auto& t : threads) {
               :     t.join();
               :   }
               :   ASSERT_EQ(total_rows, kNumRows);
               : }
               :
               : // Test for a scan with query id in server side.
               : TEST_F(TabletServerTest, TestScanWithQueryId) {
               :   InsertTestRowsDirect(0, 100);
               :
               :   ScanRequestPB req;
               :
> Ok. Here add a check about log containing string: query_id_for_test.
All right, but I don't see the code that would check for 'query_id_for_test' 
presence.  Is it still missing?


http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/18846/27/src/kudu/tserver/tablet_service.cc@3039
PS27, Line 3039:   size_t batch_size_bytes = GetMaxBatchSizeBytesHint(req);
               :
> OK. Do you have any other questions about this patch: post query id from Im
Uh-oh, I missed your response here, sorry.


I don't have any other questions w.r.t. query id here.  Thanks a lot for 
addressing this!  With that, I guess having the query_id just in the trace is 
good enough for your use case, right?

The other questions are in other comments for the PS27 :)



--
To view, visit http://gerrit.cloudera.org:8080/18846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9dbae801596726fec1c85ee547128da3179345d9
Gerrit-Change-Number: 18846
Gerrit-PatchSet: 30
Gerrit-Owner: Wang Xixu <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Jian Zhang <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Sat, 11 Feb 2023 04:01:53 +0000
Gerrit-HasComments: Yes

Reply via email to