Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools

Patch Set 11:


Will post updated code shortly.
File src/kudu/tools/CMakeLists.txt:

Line 68:
> Nit: should come after tablet.
File src/kudu/tools/

> I'm sure a lot of my comments here are to old code (that is, to code that y
Sure, no problem.  I really appreciate your guidance and feedback! :)

PS11, Line 19: which allows to push data
> Nit: "which pushes data"

PS11, Line 26: records
> Nit: let's use 'rows' instead of 'records'; they mean the same thing but we

PS11, Line 128: DEFINE_string(test_master_addresses, "localhost:7051",
              :               "Comma separated list of master addresses to run 
against; "
              :               "addresses are 'hostname:port' form, where port 
may be omitted "
              :               "if a master server listens at the default 
> Hmm. The default value is user friendly, but it's inconsistent with how oth
Certainly, it's better to keep semantics and usage of the master_addresses 
consistent across different utilities.  I put this parameter as optional since 
'master_address' in the original insert-generated-rows utility was optional and 
had the default value of 'localhost'.

What do you think is best: keep the approach when this parameter is mandatory 
or modify other utilities to make it optional (btw, Todd told me that we can 
break backward-compatibility of the insert-generated-rows tool, so let's don't 
consider backward-compatibility for the insert-generated-rows as a constraint 
here) ?

Line 132: DEFINE_int32(test_num_threads, 2,
> Maybe add here that each thread has its own session, so that it's easier to

PS11, Line 138: "Size of the session's mutation buffer, in bytes."
> Nit: reword so that it's clear that this will be the size of each mutation 

PS11, Line 155: Using the preset data allows to squeeze "
              :               "more performance from the code.
> It's not really clear what this means. Is it a recommendation to use the de
I was trying to say that client generates more data per second using a pre-set 
data string compared with generated strings of the same length if run with the 
same hardware configuration.   Probably, I'll put this explanation into the 
description of the parameter.

Line 247:     srandom(time(nullptr));
> Why do we need this? Aren't we already seeding each Random in its construct
Good catch -- this is a leftover from previous revisions.  Will remove.

Line 270:   vector<thread> threads_;
> InsertLoadgen is stateless apart from this, and the threads are created and
Good idea -- will do.

PS11, Line 275:   const Schema* schema(row->schema());
> Nit: this is only used on the next line; can we combine them?

PS11, Line 330: KUDU_CHECK_OK
> And let's use CHECK_OK here.

Line 332:       .default_admin_operation_timeout(MonoDelta::FromSeconds(8))
> Let's drop this override.

PS11, Line 358:   Status s = session->Flush();
              :   if (!s.ok()) {
              :     cout << "Encountered errors: " << s.ToString() << endl;
              :   }
> Should this also be gated on flush_per_n_rows_ not being 0? Or is the idea 
Yep, since there is a report on the errors below, it makes sense to omit this 
for all session flush modes.  I will remove this.

> Nit: should use RETURN_NOT_OK() for all of these.

PS11, Line 401:       
> Why lower the timeout? The default value is 30s, which is much safer.
I copied it over from some example; no particular reason.  Will drop the 

PS11, Line 410:     KUDU_RETURN_NOT_OK(scanner.NextBatch(&results));
> Let's take this opportunity to use the new NextBatch API instead.

PS11, Line 427:   SleepFor(MonoDelta::FromMilliseconds(50));
> To be honest, I think I'd prefer the warning over the sleep. In fact, maybe
OK, that sounds reasonable.  I'll drop this.  There might be other warning 
messages like information on DNS resolution timings, so this particular warning 
does not look too weird after all.

PS11, Line 432: const char* const kTableNameArg = "table_name";
              : const char* const kTableNameDesc = "Name of an existing table 
to use "
              :     "for the test. The test will determine the structure of the 
table column "
              :     "schema and populate it accordingly.  It's higly 
recommended to use "
              :     "a dedicated table created just for testing purposes: the 
test does not "
              :     "delete the inserted rows.";
> Nit: seems like an odd placement for this, between two static functions. Wh

PS11, Line 440: const string
> Maybe cref?
Yes, sure.  It looks like a copy-and-paste from other place.  Will fix it there 
as well.

Line 446:   InsertLoadgen bench(master_addrs, table_name,
> With InsertLoadgen having no external callers, it's worth reevaluating vari
Good idea -- that would simplify this a bit.  I had some other ideas for 
InsertLoadGen as a class, but since it's not used anywhere else, I think it's 
worth to reduce it to a number of functions.

PS11, Line 466:     ostringstream s;
              :     s << "Encountered " << total_err_count << " errors";
              :     return Status::RuntimeError(s.str());
> How about using strings::Substitute here instead?

PS11, Line 479:       ostringstream s;
              :       s << "Row count mismatch: expected " << total_row_count
              :         << "; actual " << count;
              :       return Status::RuntimeError(s.str());
> Likewise here.

Line 494:       .Description(
> Can you split this up using Description() and LongDescription()? See ksck f
You mean ExtraDescription()?  OK, will do.  I'm also thinking to add examples 
of usage there.  Does it make sense?

PS11, Line 497: higly
> highly

Line 500:       .AddOptionalParameter("test_master_addresses")
> Nit: please sort this list alphabetically. Makes it easier to find paramete

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: Alexey Serbin <>
Gerrit-Reviewer: David Ribeiro Alves <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to