Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4467: Add support for DML statements in stress test ......................................................................
Patch Set 4: (13 comments) http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG Commit Message: Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML > Seems ok for now, but from a perspective we should consider changing these Actually a mod value is equivalent to %rows: %rows ~= 100 / mod_value Maybe instead we are more interested in the absolute number of rows? So a user could specify the number of rows that should be touched by a query, and the stress test would then automatically compute what the mod value should be for a given table? http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: Line 368: self._select_probability = select_probability > check that this is indeed a float between 0 and 1 Do you really think that it makes sense to add input validation? We are not validating other inputs in other places. Line 747: self.modifies_table = False > This flag is pretty confusing. What do we use it for? Done Line 1297: return > add comment why we need to skip compute stats Done Line 1428: if table.name + "_original" in set(table.name for table in tables): > What does it mean if this is not true? Maybe add a LOG.debug message to exp Done Line 1429: cursor.execute("SHOW CREATE TABLE " + table.name) > Add a note or TODO that this will not work for certain types of Kudu table Yes, to my knowledge at this time we only use Kudu tables with a simple hash partitioning. Added a note. How do we create tables with more complex partition schemes? Line 1440: For each table in the database that cursor is connected to, create several queries, > Mention the limitations of this function: Added the two points that you suggested. I don't really see what other limitations I can add. Line 1443: exists a table with a '_original' suffix that has unmodified, for example, tpch data. > that has unmodified -> that is never modified Done Line 1450: continue > LOG.debug what is happening here Done Line 1452: if primary_key.exact_type not in (TinyInt, SmallInt, BigInt): > what about Int? Done Line 1456: continue > LOG.debug what is happening here because some of these limitations are not Done Line 1481: "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original b " > This might give you strange results for tables with multiple primary keys. Maybe it's okay to keep it as is? It can potentially result in many rows having the data, but we are not really checking results in the stress test. (The update statement duplicates rows anyways, even if there is only 1 primary key column). Line 1559: cursor.execute("SHOW CREATE TABLE " + table_name) > same deal with SHOW CREATE TABLE as above, add a note or TODO that this pro Done -- To view, visit http://gerrit.cloudera.org:8080/5093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
