[Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test

Alex Behm (Code Review) Tue, 13 Dec 2016 22:21:48 -0800

Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
......................................................................



Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
> Actually a mod value is equivalent to %rows:
I agree you can achieve the same thing with mod values in principle. Let me 
rephrase my points:
* Users of this test may not know the number of rows in the target table up 
front. Before running the test, they must first look up the number of rows and 
then compute the mod values that achieve a desired %rows.
* A single mod value does not represent the same %rows for different tables 
(which could have a different number of rows). My understanding is that we run 
on multiple test tables with the same mod values.
* Just like you said, if users were allowed to specify %rows, the framework 
could internally translate that into a mod value based on the #rows of the 
table(s). Seems easier for users.
* Further, the "concept" of %rows would still apply even for tables with sparse 
primary keys, or where there are multiple primary-key columns. The internal 
mechanism for translating %rows into predicates would be different, of course, 
but the concept of "mod values" does not seem very intuitive for those cases.

We definitely don't need to do this now, but it might be worth recording the 
above improvement in a JIRA.
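
To make the %rows-to-mod-value translation concrete, here is a minimal sketch. 
It assumes the generated DML predicate has the form `pk % mod_value = 0` on a 
single integer primary key, which this thread does not confirm. The names 
`selected_fraction` and `mod_value_for_percent` are hypothetical and are not 
functions in concurrent_select.py. The sketch picks the mod value whose matched 
share of the actual key values is closest to the requested %rows.

```python
def selected_fraction(keys, mod_value):
    """Share of rows matched by the ASSUMED predicate `key % mod_value == 0`."""
    if not keys:
        raise ValueError("keys must be non-empty")
    matched = sum(1 for k in keys if k % mod_value == 0)
    return matched / len(keys)


def mod_value_for_percent(keys, percent_rows, max_mod=1000):
    """Return the mod value in [1, max_mod] whose matched share of `keys`
    is closest to `percent_rows` (hypothetical helper, brute-force search)."""
    if not 0 < percent_rows <= 100:
        raise ValueError("percent_rows must be in (0, 100]")
    target = percent_rows / 100.0
    return min(
        range(1, max_mod + 1),
        key=lambda m: abs(selected_fraction(keys, m) - target),
    )


# Dense keys 0..999 versus a sparse table that only holds even keys.
dense_keys = list(range(1000))
sparse_keys = [2 * k for k in range(1000)]

dense_mod = mod_value_for_percent(dense_keys, 10)
sparse_mod = mod_value_for_percent(sparse_keys, 10)
```

Under these assumptions a 10% request maps to mod value 10 on the dense table 
and to 20 on the even-keys table, so a single fixed mod value would touch a 
different share of each table.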


http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 1429:       cursor.execute("SHOW CREATE TABLE " + table.name)
> Yes, to my knowledge at this time we only use Kudu tables with a simple has
Thanks.

See AnalyzeDDLTest#TestCreateManagedKuduTable for examples of more advanced 
partitioning schemes.


Line 1481:           "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b "
> Maybe it's okay to keep it as is? It can potentially result in many rows ha
To me it's not really about validating the results, but more about 
predictability of the test's behavior. As a user, when I provide a list of mod 
values as an input I have a certain expectation of the "work" that those 
translate to. For tables with several primary-key columns this update (and the 
delete/upsert below) may be modifying far more rows than I expected based on 
the mod values I gave. Also consider that the join in this update could really 
blow up.

What's the benefit of leaving it as is?

I think it would be better to be explicit about the limitations. Adding a check 
here seems easy enough.
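
A rough sketch of such a check, assuming the list of primary-key column names 
for the table is already available. The function name and its arguments are 
hypothetical and do not come from the patch.

```python
def require_single_primary_key(table_name, primary_key_names):
    """Return the sole primary-key column, or fail loudly.

    Hypothetical guard: reject tables with zero or several primary-key
    columns before any mod-value based DML statement is generated.
    """
    if len(primary_key_names) != 1:
        raise ValueError(
            "DML generation supports exactly one primary-key column; "
            "table %s has %d: %s"
            % (table_name, len(primary_key_names), ", ".join(primary_key_names))
        )
    return primary_key_names[0]


single_key = require_single_primary_key("t1", ["id"])
```

Failing early keeps the amount of work per mod value predictable instead of 
silently modifying more rows than the user expected.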


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes
