Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements ......................................................................
Patch Set 4: (10 comments) http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/db_connection.py File tests/comparison/db_connection.py: PS4, Line 126: ( was this completely broken before? http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/funcs.py File tests/comparison/funcs.py: Line 461: class CastFunc(Func): Don't we already have a cast function in the query generator somewhere? http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/model_translator.py File tests/comparison/model_translator.py: Line 799: return data_type_class.__name__.upper() We are not updating this like for Postgres above. It looks like this part of the code is never being used and is becoming stale. Maybe we should think about addressing this somehow? http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/query_profile.py File tests/comparison/query_profile.py: Line 84: 'INSERT_VALUES_ROWS': (1, 10)} It might be a good idea to increase this (maybe later on) so that we sometimes get large values (like 1000). PS4, Line 643: col1, col2, .. if not all columns that the table contains are in this list, then NULLs will be inserted, right? PS4, Line 647: partial' Just curious, what is the reason to write 'partial' first (before the the ==), instead of second? PS4, Line 653: table.updatable_columns We will always be biased towards inserting from the beginning of the table. How about shuffling first: .extend(shuffle(table.updatable_columns)[:additional]) table.updatable_columns returns a copy, so shuffling shouldn't cause problems. PS4, Line 654: else There is a lot of code duplication here. This can be simplified: additional_column_count = randint(0, len(...)) if additional_column_count == 0 and len(cilumns_to_insert) == 0: additional_column_count = 1 coolumns_to_insert.extend(...) http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/statement_generator.py File tests/comparison/statement_generator.py: Line 93: # expressions will be implicitly casted in the databases. This requires less work Maybe we should add a TODO jira to make it possible for the query generator to be able to deal with implicit casts? We might be losing some coverage here. Line 131: values_row.append(val) Might be interesting to sometimes insert 2 identical rows. Or 2 rows with the same primary key value. -- To view, visit http://gerrit.cloudera.org:8080/5486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
