Michael Brown has posted comments on this change. Change subject: IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements ......................................................................
Patch Set 6: (10 comments) Taras, please see patch set 6 for new changes. http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/db_connection.py File tests/comparison/db_connection.py: PS4, Line 126: ( > was this completely broken before? Yeah, this check was cleary broken, but I tended to give it happy-path migrated tables, so it didn't matter for my use. http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/funcs.py File tests/comparison/funcs.py: Line 461: class CastFunc(Func): > Don't we already have a cast function in the query generator somewhere? We only have a CastAsChar function, which exists for this reason: "This is added so that query generation can assume that any return type can be produced by an aggregate or analytic with only one level of nesting." The CastAsChar SqlWriter implementation basically writes either an Int or Char base type as a string. There is no first-class cast function with many signatures to choose from for casting. http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/model_translator.py File tests/comparison/model_translator.py: Line 799: return data_type_class.__name__.upper() > We are not updating this like for Postgres above. It looks like this part o It's basically the same case as with Oracle: the code exists and maybe worked years ago, but probably doesn't very well anymore. We are certainly not doing regular runs. We've also let slip through all kinds of code (say, in db_connection) that doesn't work and doesn't get regular testing. With Oracle at least, I can see us keeping it around for when we want to do more testing comparing analytic functions, for example. Has there ever been a need for MySQL comparison since preferring PostgreSQL? Maybe we just cut the MySQL code in a follow-on patch. http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/query_profile.py File tests/comparison/query_profile.py: Line 84: 'INSERT_VALUES_ROWS': (1, 10)} > It might be a good idea to increase this (maybe later on) so that we someti I agree. But it makes the statements incredibly long, and without serialization as artifacts, and replay, it's hard for someone to deal with such a long statement, for example when copy/pasting. IMPALA-4732 will deal with this and related concerns. PS4, Line 643: col1, col2, .. > if not all columns that the table contains are in this list, then NULLs wil yes PS4, Line 647: partial' > Just curious, what is the reason to write 'partial' first (before the the = Old habit related to C, when you could mistake = for == in comparison. If you use the literal on the left, you can't mistakenly use =, since you can't assign to a literal. Do you want me to change it? I don't believe there's anything wrong with this from a style standpoint. PS4, Line 653: maining_columns[:additi > We will always be biased towards inserting from the beginning of the table. Good catch. shuffle() acts on the parameter and doesn't have a return value, so my use doesn't match yours, but the concept was kept. PS4, Line 654: shuf > There is a lot of code duplication here. This can be simplified: Removed duplication. http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/statement_generator.py File tests/comparison/statement_generator.py: Line 93: # expressions will be implicitly casted in the databases. This requires less work > Maybe we should add a TODO jira to make it possible for the query generator Done Line 131: values_row.append(val) > Might be interesting to sometimes insert 2 identical rows. Or 2 rows with t I think this can be more effective once we increase the number of rows inserted. IMPALA-4732 -- To view, visit http://gerrit.cloudera.org:8080/5486 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
