Alexey Serbin has posted comments on this change. Change subject: [tools] added insert-generated-rows into kudu tools ......................................................................
Patch Set 12: (20 comments) http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc File src/kudu/tools/tool_action_test.cc: PS11, Line 358: } : if (total_row_count != nullptr) { : *total_row_count = accumulate(row_count.begin(), row_count.end(), 0UL); : } > Looks like it's still here? Oops, my bad. I'm surprised myself -- probably, I got distracted. Let me remove it, though. http://gerrit.cloudera.org:8080/#/c/4412/12/src/kudu/tools/tool_action_test.cc File src/kudu/tools/tool_action_test.cc: PS12, Line 21: options > s/options/option ? Done Line 33: // --num_threads=1 \ > don't you need master_addresses here as required params ? also applicable i Good catch -- these examples became stale once I switched to the mandatory master(s) location. Will update PS12, Line 75: master server(s) location > as per the code below there is no default_params for master server location Yes, correct -- since changing to use master addresses as mandatory parameter, this no longer has default value. Will update. Line 80: // bench_c_00_03 > Same as above. Also here and above, we could keep this name intuitive to re Done Line 119: using kudu::Schema; > I think Schema is not being used anywhere. Done Line 131: using kudu::client::sp::shared_ptr; > Actually a basic Qn here: why do we use this special shared_ptr as opposed It's just a typedef: it might be either std::shared_ptr or std::tr1::shared_ptr, check $REPO_ROOT/src/kudu/client/shared_ptr.h That's because Kudu C++ client API uses this notation to be compatible with C++03. PS12, Line 142: 1000000 > Nit: Not sure if you want to insert 1 million by default, if I am a curious 1M rows are inserted really fast: at my laptop it's just 5 seconds. Also, the idea was to create some _load_ :) With 1K rows there is no load at all. Besides, take into the consideration that the original insert-generated-rows tools was running with no limit at generated rows at all. So, what do you think would be the right number here? Line 194: int64_t NextImpl() { > Do we want to use uint64_t here ? Although Random.Next64() seems to be fill OK, that makes sense. Will update to uint64_t. PS12, Line 215: != 0 > this check is prolly redundant ? As I remember, Tidy Bot thinks it's better for readability: that's the reason I left it here. I'll try to get rid of it -- let's see what it will say. PS12, Line 285: case UNKNOWN_DATA: // fall-through > redundant ? Yes, it seems so. I copied the switch enumerators from some place and left this UNKNOWN_DATA as is. Will remove. Line 302: FLAGS_buffer_flush_watermark_pct)); > I vaguely recall cpp guideline saying next spill being after 4 spaces I gue QtCreator's editor does this sometimes. Will fix. PS12, Line 314: FLAGS_num_rows_per_thread > consider using variable FLAGS_ directly to be consistent with other places. I wanted to have a local variable since I access it in multiple places. PS12, Line 315: num_rows_per_thread == 0 > given it's default value being 1M, what's this check for ? Take a look at the comment for the '--num_rows_per_thread': 0 means no limit. Line 324: if (row_count != nullptr) { > DCHECK or CHECK for this, unless we have a use case where it could actually I don't see why DCHECK or CHECK is needed here. One can pass nullptr as rowcount: that means there isn't any interest in getting the count. It's a ubiquitous notation for pointer out parameters, which I want to have here, even if nobody uses it right now. Line 403: Stopwatch sw(Stopwatch::ALL_THREADS); > Stupid Qn: Does ALL_THREADS here mean all threads on CPU or all threads gen Threads in this process, so the monitoring/main thread will also be counted in. I don't want LOG_TIMING, since I want to output that information in my own format, not what LOG_TIMING provides. PS12, Line 440: optional scan afterwards > I think this need not be made explicit since this is listed in optional par I want this to be present to emphasize that there are two parts: the write and the read one, even if the latter is optional. That's an essence of this test in a terse phrasing, otherwise it would be necessary to read through all the list of options to understand about the read part. PS12, Line 452: kTableNameArg > Can we not make this table_name optional and create one if the user doesn't The original approach was to be able to use any table structure, not only some fixed pre-generated one which the test would create. So, this test retrieves the table schema out of existing table it's pointed at. That's because specifying custom table structure via parameters of this test would be cumbersome. We can add this an option, with pairing auto-delete option. In case of auto-created table we can safely delete it if nobody asks to leave it alone after the test is completed. PS12, Line 455: > Nit: Double space here. Yes, here and elsewhere in the description for parameters: this is intentional (that's easier for reading, IMO). Or we have some sort of style guide for having one space for those? Line 457: "the inserted rows." }) > Ideally, we would not want this behavior of test leaving the test data on t I'm strongly opposed to auto-deleting any data by the test in its current implementation, since this test might be run in production environment. Nothing prevents a user pointing to already existing table with some data, and I don't want to delete any useful data along with the data generated by the test. To me, using the approach you are advocating makes more sense in case of auto-created tables, because in that case we can have a guarantee the table was created by the test and not existed before. I think we can use auto-generated table names for that and be safe. I'll take a look at that approach. -- To view, visit http://gerrit.cloudera.org:8080/4412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes