Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )

Change subject: IMPALA-12426: Query History Table
......................................................................


Patch Set 42:

(34 comments)

http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/query-state-record.cc
File be/src/service/query-state-record.cc:

http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/query-state-record.cc@208
PS39, Line 208:   base_state->num_rows_fetched = 
exec_state.num_rows_fetched_counter();
> Nice, I'd noticed base could be null and fixed it in my patch.
I only fixed this because I saw in your patch that you found and fixed it there!


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.h
File be/src/service/workload-management.h:

http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.h@40
PS41, Line 40:
> Can this renamed with something more explanatory? Perhaps "management" or "
Yup, renamed to "workload_management"


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.h@102
PS41, Line 102:  the comple
> Looks like most columns are primitive types. So this can be replaced with T
Done.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.h@248
PS41, Line 248:
              :     // Query Options set by Configuration
              :     FieldDefinition("query_opts_config", TPrimitiveType::STRING,
              :         [](FieldParserContext& ctx){
              :           const std::string opts_str = 
DebugQueryOptions(ctx.record->query_options);
              :           ctx.sql << "'" << EscapeSql(opts_str) << "'";
              :         }),
              :
              :     // Resource Pool
              :     FieldDefinition("resource_pool", TPrimitiveType::STRING,
              :         [](FieldParserContext& ctx){
              :           ctx.sql << "'" << 
EscapeSql(ctx.record->base_state->resource_pool) << "'";
              :         }),
              :
              :     // Per-host Memory Estimate
              :     FieldDefinition("per_host_mem_estimate", 
TPrimitiveType::BIGINT,
              :         [](FieldParserContext& ctx){
              :           ctx.sql << ctx.record->per_host_mem_estimate;
              :         }),
              :
              :     // Dedi
> This can be feed directly to StringStream sql. This the how generated Types
Done.  That's much nicer!


http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/workload-management.h
File be/src/service/workload-management.h:

http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/workload-management.h@42
PS39, Line 42:
> This is going to allocate storage every place the header is included. Do we
Nope, fixed.  I left the constant declaration here but declared it extern.


http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/workload-management.h@51
PS39, Line 51: /// object that is passed to each parser.
> This should still be extern, or it won't use the right value in all cases.
Done


http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/workload-management.h@105
PS39, Line 105: ///   `ctx`  The field parse context object.
> These should be `std::move`'d in to place. Right now it's making 2 copies o
Done


http://gerrit.cloudera.org:8080/#/c/20770/36/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/36/be/src/service/workload-management.cc@61
PS36, Line 61:
> Decided this didn't need to be configurable?
Yeah, after reviewing your active queries patch where the database was 
hardcoded to "sys", I decided to make the completed queries also be hardcoded 
to "sys" to match.


http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/workload-management.cc@206
PS39, Line 206: namespace workload_management {
> This is still evaluated during static initialization, before query_log_writ
Done


http://gerrit.cloudera.org:8080/#/c/20770/39/be/src/service/workload-management.cc@354
PS39, Line 354:     completed_queries_.emplace_back(make_pair(move(exp_rec), 
0));
> Does this actually need to be a global? Looks like it's only used in this f
nope, it does not.  fixed


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@63
PS41, Line 63:   "the query log table where completed queries will be stored.");
             :
             : DEFINE_validator(query_log_table_name, [](const char* name, 
const string& val) {
             :   if (regex_match(val, alphanum_underscore_dash)) return true;
             :
             :   LOG(ERROR) << "Invalid value for --" << name << ": must only 
contain alphanumeric "
             :       "characters, underscores, or dashes and must not be empty";
             :   return false;
             : });
             :
> What if this constrained further to only contain alphanumeric and underscor
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@79
PS41, Line 79:     return false;
             :   }
             :
             :   if (val.find_first_of('"') != string::npos) {
             :     LOG(ERROR) << "Invalid value for --" << name << ": must not 
contain double quotes.";
             :     return false;
             :   }
             :
             :   if (val.find_first_of('\n') != string::npos) {
             :     LOG(ERROR) << "Invalid value for --" << name << ": must not 
contain newlines.";
             :     return false;
             :   }
             :
             :   return true;
             : });
             :
> Is this mainly validate if val is a legal POSIX path?
This parameter gets passed directly to the "LOCATION" portion of the create 
table statement.  Thus, it could be a URL or a file path.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@140
PS41, Line 140:     "will be recorded in the completed queries table. If a sql 
statement with a length "
              :     "longer than this value is executed, the sql inserted into 
the completed queries "
              :     "table will be trimmed to this length. Any characters that 
need escaping will have "
              :     "their backslash character counted towards this limit.");
              :
> Should this be validated against query_log_write_interval_s?
It does not need to since a graceful shutdown will interrupt the wait and force 
the completed queries queue to be drained to the completed queries table.  
Thus, this value is actually the timeout for the drain process.

I updated this flag to be named "query_log_shutdown_timeout_s" instead to 
better reflect its purpose.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@186
PS41, Line 186: DEFINE_int32_hidden(query_log_max_insert_attempts, 3, "Maximum 
number of times to "
> nit: this could all be static variables local to this file since they're no
I keep going back and forth on this topic, and I'm settling on making variables 
local static unless they are needed outside the cc file in which case they will 
go into the header file.  My original thought was to make these constant in 
case other code needed them in the future, but it's trivial to move from a 
static local to a header file global in the unlikely event that the need 
actually arises.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@368
PS41, Line 368:
> We should use this and the batch size to SET MAX_STATEMENT_LENGTH_BYTES for
I reworked this code.  There are now coordinator startup flags that control the 
max length of sql and plan statements.  If a sql or plan exceeds that max 
length, they are truncated to the specified max length.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@382
PS41, Line 382: tringStreamPop fields;
              :   fields << "INSERT INTO " << table_name << "(";
              :   for (const auto& field : FIELD_DEFINITIONS) {
              :     fields << field.db_column_name <<
> Should this moved inside if, right after L379?
There really is no harm in initializing this variable if the previous condition 
evaluated to false, but to make the code cleaner, I updated the previous if 
statement to return when the condition evaluates to false.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@391
PS41, Line 391: he initialization code only works when run in a separat
> Add DCHECK after this statement:
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@452
PS41, Line 452:       /
> After this loop, DCHECK that COMPLETED_QUERIES_QUEUED value >= size of quer
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@474
PS41, Line 474:           max_row_size = row.size();
              :         }
> DCHECK that COMPLETED_QUERIES_QUEUED >= 0 after this.
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@479
PS41, Line 479:
              :       
DCHECK(ImpaladMetrics::COMPLETED_QUERIES_QUEUED->GetValue() >=
> COMPLETED_QUERIES_QUEUED is not decremented here. Are they enqueued again a
Yes, each completed query is tried three times to be inserted into the 
completed queries table.  The COMPLETED_QUERIES_QUEUED metric is incremented 
when a completed query is added to the completed_queries_ queue and that metric 
is decremented when the completed query is either successfully written to the 
completed queries table or else fails three times.

The completed_queries_ list contains a 
std::pair<std::shared_ptr<QueryStateExpanded>, uint8_t>.  The unit8_t is a 
counter of the number of attempts to insert the completed query into the 
completed queries table.

Since the attempts is tracked on each individual completed query, to determine 
in any particular batch how many succeeded, how many failed but will be tried 
again, and how many failed but won't be tried again requires looping back 
through the queries_to_insert list and tracking each value.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/service/workload-management.cc@481
PS41, Line 481: eries_to_insert.size());
> Unnecessary extra indentation.
Done


http://gerrit.cloudera.org:8080/#/c/20770/42/be/src/util/thread.h
File be/src/util/thread.h:

http://gerrit.cloudera.org:8080/#/c/20770/42/be/src/util/thread.h@228
PS42, Line 228:
Fixed in next patch.


http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/util/thread.h
File be/src/util/thread.h:

http://gerrit.cloudera.org:8080/#/c/20770/41/be/src/util/thread.h@229
PS41, Line 229: }
              :
> There are some specific *ThreadState defined already such as ScannerThreadS
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@80
PS41, Line 80:     .format(self.QUERY_TBL, query_id))
             :       assert res.success
> Add assert message, mention what indices 0 and 1  are about.
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@87
PS41, Line 87: == self.MA
> For with_args, consider adding line break after open parentheses.
There is not a well defined pattern for custom cluster tests.  I like this 
format simply because it makes it easy to see what arguments are being provided 
to the test.


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@168
PS41, Line 168:                                                
"--shutdown_grace_period_s=10 "
              :                                                  
"--shutdown_deadline_s=60",
              :                                     
catalogd_args="--enable_workload_mgmt",
              :
> Can this merged with test_query_log_table_ddl? This looks like a superset o
It possibly could.  The test_query_log_table_ddl test is specifically to test a 
ddl statement while this test is for a dml statement.  There are enough 
differences that I would prefer to keep them separate.


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@199
PS41, Line 199:   finally:
> Move this constant above, near class name.
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@411
PS41, Line 411:         m
> Move this constant above, near class name.
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@473
PS41, Line 473: uster.ge
> Explain what data[42] is.
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@835
PS41, Line 835:
> nit: "timeout=7", for clarity.
I didn't use named arguments anywhere else, thus I don't want to include it 
here and have a one-off pattern.


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@839
PS41, Line 839:     # At least 10 seconds have already elapsed, wait up to 7 
more seconds for the
              :       # queries t
> Move these constant above, near class name. Maybe name them specific to tes
Moved the constants.  This test does not need a cluster_id, so I removed it.  
The other test selects from the completed queries table based on cluster_id, 
thus each run of the test must have it's own unique cluster_id or else records 
from previous tests get picked up and cause the assertions to fail.


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/custom_cluster/test_query_log.py@908
PS41, Line 908:   assert impalad.service.get_metric_value("impala-server.comp
> Move constant above, near class name. Maybe name them specific to test meth
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/util/workload_management.py
File tests/util/workload_management.py:

http://gerrit.cloudera.org:8080/#/c/20770/41/tests/util/workload_management.py@74
PS41, Line 74: EAK_MEM_MIN
> Array data is returned to caller of assert_query() function.
Done


http://gerrit.cloudera.org:8080/#/c/20770/41/tests/util/workload_management.py@259
PS41, Line 259:   end_time_obj = datetime.strptime(end_time.group(1)[:-3], 
"%Y-%m-%d %H:%M:%S.%f")
> This doesn't work when I try to write tests using some existing tables, lik
Interesting.  I switched my tests to this same regex and will see what happens. 
 I'm also looking at using the functional tables in these tests instead of 
setting up new tables.



--
To view, visit http://gerrit.cloudera.org:8080/20770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Gerrit-Change-Number: 20770
Gerrit-PatchSet: 42
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 12 Mar 2024 19:32:00 +0000
Gerrit-HasComments: Yes

Reply via email to