Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
......................................................................

Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
> Actually a mod value is equivalent to %rows:

I agree you can achieve the same thing with mod values in principle. Let me rephrase my points:
* Users of this test may not know the number of rows in the target table up front. So before I can begin to run this test, I must first look up the number of rows and then compute the mod values to achieve a desired %rows.
* A single mod value does not represent the same %rows for different tables (which could have a different number of rows). My understanding is that we run on multiple test tables with the same mod values.
* Just like you said, if users were allowed to specify %rows, the framework could internally translate that into a mod value based on the #rows of the table(s). Seems easier for users.
* Further, the "concept" of %rows would still apply even for tables with sparse primary keys, or where there are multiple primary-key columns. The internal mechanism for translating %rows into predicates would be different, of course, but the concept of "mod values" does not seem very intuitive for those cases.

We definitely don't need to do this now, but it might be worth recording the above improvement in a JIRA.


http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1429: cursor.execute("SHOW CREATE TABLE " + table.name)
> Yes, to my knowledge at this time we only use Kudu tables with a simple has

Thanks. You can look at AnalyzeDDLTest#TestCreateManagedKuduTable for examples of more advanced partitioning schemes.


Line 1481: "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b "
> Maybe it's okay to keep it as is? It can potentially result in many rows ha

To me it's not really about validating the results, but more about predictability of the test's behavior. As a user, when I provide a list of mod values as an input I have a certain expectation of the "work" that those translate to. For tables with several primary-key columns this update (and the delete/upsert below) may be modifying far more rows than I expected based on the mod values I gave. Also consider that the join in this update could really blow up.

What's the benefit of leaving it as is? I think it would be better to be explicit about the limitations. Adding a check here seems easy enough.


--
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes
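
To make the two suggestions in the comments above concrete, here is a rough sketch of what they might look like. It is not taken from the patch. The function names mod_value_for_percent and check_single_primary_key are hypothetical, and the sketch assumes the DML statements select rows with a predicate of the form "pk % mod_value = 0" on a single, densely populated integer primary key. Sparse or multi-column keys would need a different translation.

```python
def mod_value_for_percent(row_count, percent_rows):
    """Translate a desired %rows into a mod value (hypothetical helper).

    Assumes a dense integer primary key, so that roughly row_count / m
    rows satisfy "pk % m = 0".
    """
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    if not 0 < percent_rows <= 100:
        raise ValueError("percent_rows must be in (0, 100]")
    # Number of rows the DML statement should touch (at least one).
    target_rows = max(1, int(row_count * percent_rows / 100.0))
    # Pick m so that about target_rows keys are divisible by m.
    return max(1, row_count // target_rows)


def check_single_primary_key(table_name, primary_keys):
    """Reject tables the mod-value predicate cannot handle predictably.

    Returns the single primary-key column name, or raises ValueError.
    """
    if len(primary_keys) != 1:
        raise ValueError(
            "Table %s has %d primary-key columns; mod-value DML generation "
            "assumes exactly one." % (table_name, len(primary_keys)))
    return primary_keys[0]
```

With such a helper a user could ask for 10% of the rows and the framework would pick the mod value per table, for example mod_value_for_percent(1000, 10) for a 1000-row table, instead of the user looking up row counts first.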
