Adar Dembo has posted comments on this change. Change subject: [tools] added insert-generated-rows into kudu tools ......................................................................
Patch Set 11: (26 comments) Can you add a couple things to kudu-tool-test? 1. Modify the help tests to incorporate the new mode and action. 2. Add a quick test or two for the new action. You don't have to test every possible gflag value, but some end-to-end coverage would be nice, to prevent it from bitrotting the way insert-generated-rows could. http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/CMakeLists.txt File src/kudu/tools/CMakeLists.txt: Line 68: tool_action_test.cc Nit: should come after tablet. http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc File src/kudu/tools/tool_action_test.cc: I'm sure a lot of my comments here are to old code (that is, to code that you didn't write). That's not completely fair, but one of the reasons I asked you to move this into the new CLI tool was so that we can take the opportunity to rewrite this code using the Kudu style du jour. I guess what I'm saying is, thanks in advance for being patient. :) PS11, Line 19: which allows to push data Nit: "which pushes data" PS11, Line 26: records Nit: let's use 'rows' instead of 'records'; they mean the same thing but we can avoid some confusion by reusing terminology. PS11, Line 128: DEFINE_string(test_master_addresses, "localhost:7051", : "Comma separated list of master addresses to run against; " : "addresses are 'hostname:port' form, where port may be omitted " : "if a master server listens at the default port."); Hmm. The default value is user friendly, but it's inconsistent with how other actions handle master_addresses. I'd prefer we keep them consistent, which means either making this a required parameter without a default value, or changing the others to provide default values (and accepting the backwards incompatibility). What do you think? Line 132: DEFINE_int32(test_num_threads, 2, Maybe add here that each thread has its own session, so that it's easier to understand the effects of the session gflags. PS11, Line 138: "Size of the session's mutation buffer, in bytes." Nit: reword so that it's clear that this will be the size of each mutation buffer that the session has. PS11, Line 155: Using the preset data allows to squeeze " : "more performance from the code. It's not really clear what this means. Is it a recommendation to use the default value of test_string_fixed? Line 247: srandom(time(nullptr)); Why do we need this? Aren't we already seeding each Random in its constructor? Line 270: vector<thread> threads_; InsertLoadgen is stateless apart from this, and the threads are created and destroyed in the context of Run(). So can we make the entire class stateless? PS11, Line 275: const Schema* schema(row->schema()); Nit: this is only used on the next line; can we combine them? PS11, Line 330: KUDU_CHECK_OK And let's use CHECK_OK here. Line 332: .default_admin_operation_timeout(MonoDelta::FromSeconds(8)) Let's drop this override. PS11, Line 358: Status s = session->Flush(); : if (!s.ok()) { : cout << "Encountered errors: " << s.ToString() << endl; : } Should this also be gated on flush_per_n_rows_ not being 0? Or is the idea to force AUTO_FLUSH_BACKGROUND to finish up everything that's been buffered? If so, should we at least ignore the return value in AUTO_FLUSH_BACKGROUND mode? Or is it still useful? PS11, Line 399: KUDU_RETURN_NOT_OK Nit: should use RETURN_NOT_OK() for all of these. PS11, Line 401: .default_admin_operation_timeout(MonoDelta::FromSeconds(8)) Why lower the timeout? The default value is 30s, which is much safer. PS11, Line 410: KUDU_RETURN_NOT_OK(scanner.NextBatch(&results)); Let's take this opportunity to use the new NextBatch API instead. PS11, Line 427: SleepFor(MonoDelta::FromMilliseconds(50)); To be honest, I think I'd prefer the warning over the sleep. In fact, maybe we should downgrade the warning to INFO? Or remove it altogether? I don't think the underlying issue is something we're going to address, because I think it's good that KuduScanner::Close() doesn't block, and AFAICT the only way to "fix" this is for it to block so that destruction of the KuduClient can be deferred to when there are no outstanding RPCs. PS11, Line 432: const char* const kTableNameArg = "table_name"; : const char* const kTableNameDesc = "Name of an existing table to use " : "for the test. The test will determine the structure of the table column " : "schema and populate it accordingly. It's higly recommended to use " : "a dedicated table created just for testing purposes: the test does not " : "delete the inserted rows."; Nit: seems like an odd placement for this, between two static functions. Why not at the top of the anonymous namespace? PS11, Line 440: const string Maybe cref? Line 446: InsertLoadgen bench(master_addrs, table_name, With InsertLoadgen having no external callers, it's worth reevaluating various parts of the abstraction to see if they make sense. For example: 1. What's the point of the multi-arg constructor given that the only invocation passes gflags verbatim into it? Why not just use the gflags directly inside the class? 2. If the class can be made stateless, perhaps it should be converted into a series of free functions? PS11, Line 466: ostringstream s; : s << "Encountered " << total_err_count << " errors"; : return Status::RuntimeError(s.str()); How about using strings::Substitute here instead? PS11, Line 479: ostringstream s; : s << "Row count mismatch: expected " << total_row_count : << "; actual " << count; : return Status::RuntimeError(s.str()); Likewise here. Line 494: .Description( Can you split this up using Description() and LongDescription()? See ksck for an example. PS11, Line 497: higly highly Line 500: .AddOptionalParameter("test_master_addresses") Nit: please sort this list alphabetically. Makes it easier to find parameters. -- To view, visit http://gerrit.cloudera.org:8080/4412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes