Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 11:

(26 comments)

Can you add a couple things to kudu-tool-test?
1. Modify the help tests to incorporate the new mode and action.
2. Add a quick test or two for the new action. You don't have to test every 
possible gflag value, but some end-to-end coverage would be nice, to prevent it 
from bitrotting the way insert-generated-rows could.

http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

Line 68:   tool_action_test.cc
Nit: should come after tablet.


http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

I'm sure a lot of my comments here are to old code (that is, to code that you 
didn't write). That's not completely fair, but one of the reasons I asked you 
to move this into the new CLI tool was so that we can take the opportunity to 
rewrite this code using the Kudu style du jour.

I guess what I'm saying is, thanks in advance for being patient. :)


PS11, Line 19: which allows to push data
Nit: "which pushes data"


PS11, Line 26: records
Nit: let's use 'rows' instead of 'records'; they mean the same thing but we can 
avoid some confusion by reusing terminology.


PS11, Line 128: DEFINE_string(test_master_addresses, "localhost:7051",
              :               "Comma separated list of master addresses to run 
against; "
              :               "addresses are 'hostname:port' form, where port 
may be omitted "
              :               "if a master server listens at the default 
port.");
Hmm. The default value is user friendly, but it's inconsistent with how other 
actions handle master_addresses. I'd prefer we keep them consistent, which 
means either making this a required parameter without a default value, or 
changing the others to provide default values (and accepting the backwards 
incompatibility).

What do you think?


Line 132: DEFINE_int32(test_num_threads, 2,
Maybe add here that each thread has its own session, so that it's easier to 
understand the effects of the session gflags.


PS11, Line 138: "Size of the session's mutation buffer, in bytes."
Nit: reword so that it's clear that this will be the size of each mutation 
buffer that the session has.


PS11, Line 155: Using the preset data allows to squeeze "
              :               "more performance from the code.
It's not really clear what this means. Is it a recommendation to use the 
default value of test_string_fixed?


Line 247:     srandom(time(nullptr));
Why do we need this? Aren't we already seeding each Random in its constructor?


Line 270:   vector<thread> threads_;
InsertLoadgen is stateless apart from this, and the threads are created and 
destroyed in the context of Run(). So can we make the entire class stateless?


PS11, Line 275:   const Schema* schema(row->schema());
Nit: this is only used on the next line; can we combine them?


PS11, Line 330: KUDU_CHECK_OK
And let's use CHECK_OK here.


Line 332:       .default_admin_operation_timeout(MonoDelta::FromSeconds(8))
Let's drop this override.


PS11, Line 358:   Status s = session->Flush();
              :   if (!s.ok()) {
              :     cout << "Encountered errors: " << s.ToString() << endl;
              :   }
Should this also be gated on flush_per_n_rows_ not being 0? Or is the idea to 
force AUTO_FLUSH_BACKGROUND to finish up everything that's been buffered? If 
so, should we at least ignore the return value in AUTO_FLUSH_BACKGROUND mode? 
Or is it still useful?


PS11, Line 399: KUDU_RETURN_NOT_OK
Nit: should use RETURN_NOT_OK() for all of these.


PS11, Line 401:       
.default_admin_operation_timeout(MonoDelta::FromSeconds(8))
Why lower the timeout? The default value is 30s, which is much safer.


PS11, Line 410:     KUDU_RETURN_NOT_OK(scanner.NextBatch(&results));
Let's take this opportunity to use the new NextBatch API instead.


PS11, Line 427:   SleepFor(MonoDelta::FromMilliseconds(50));
To be honest, I think I'd prefer the warning over the sleep. In fact, maybe we 
should downgrade the warning to INFO? Or remove it altogether?

I don't think the underlying issue is something we're going to address, because 
I think it's good that KuduScanner::Close() doesn't block, and AFAICT the only 
way to "fix" this is for it to block so that destruction of the KuduClient can 
be deferred to when there are no outstanding RPCs.


PS11, Line 432: const char* const kTableNameArg = "table_name";
              : const char* const kTableNameDesc = "Name of an existing table 
to use "
              :     "for the test. The test will determine the structure of the 
table column "
              :     "schema and populate it accordingly.  It's higly 
recommended to use "
              :     "a dedicated table created just for testing purposes: the 
test does not "
              :     "delete the inserted rows.";
Nit: seems like an odd placement for this, between two static functions. Why 
not at the top of the anonymous namespace?


PS11, Line 440: const string
Maybe cref?


Line 446:   InsertLoadgen bench(master_addrs, table_name,
With InsertLoadgen having no external callers, it's worth reevaluating various 
parts of the abstraction to see if they make sense. For example:
1. What's the point of the multi-arg constructor given that the only invocation 
passes gflags verbatim into it? Why not just use the gflags directly inside the 
class?
2. If the class can be made stateless, perhaps it should be converted into a 
series of free functions?


PS11, Line 466:     ostringstream s;
              :     s << "Encountered " << total_err_count << " errors";
              :     return Status::RuntimeError(s.str());
How about using strings::Substitute here instead?


PS11, Line 479:       ostringstream s;
              :       s << "Row count mismatch: expected " << total_row_count
              :         << "; actual " << count;
              :       return Status::RuntimeError(s.str());
Likewise here.


Line 494:       .Description(
Can you split this up using Description() and LongDescription()? See ksck for 
an example.


PS11, Line 497: higly
highly


Line 500:       .AddOptionalParameter("test_master_addresses")
Nit: please sort this list alphabetically. Makes it easier to find parameters.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to