[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 7:
Thank you for looking into this. It would be nice to fix it if possible. My only concern is this: even if the statement works correctly in this particular instance without fully qualifying the table name, how can we tell what's needed and what's not? Some statements have qualified names and others don't, which adds confusion to an already convoluted mechanism :)
--
To view, visit http://gerrit.cloudera.org:8080/5177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 7:
I'm 100% willing to admit that I could be wrong in my understanding of this. In the meantime, I'm going to look into whether there's a quick fix to be made in generate-schema-statements.py for why {db_name} is not working here. I agree with you that it probably should.
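One possible shape for such a quick fix, sketched in Python under stated assumptions: the traceback in this thread shows the ALTER template being expanded with only table_name, so a fix could pass every placeholder the templates use. The helper name expand_alter and the sample values are hypothetical and are not taken from generate-schema-statements.py.

```python
def expand_alter(template, db_name, db_suffix, table_name):
    """Expand an ALTER TABLE template, supplying every placeholder it might use.

    str.format ignores keyword arguments the template does not reference,
    so the unqualified form keeps working unchanged.
    """
    return template.format(
        db_name=db_name, db_suffix=db_suffix, table_name=table_name)


# Both template styles seen in testdata/datasets/ (sample values are made up).
unqualified = "ALTER TABLE {table_name} RECOVER PARTITIONS;"
qualified = "ALTER TABLE {db_name}{db_suffix}.{table_name} RECOVER PARTITIONS;"

print(expand_alter(unqualified, "tpcds", "_parquet", "store_sales"))
print(expand_alter(qualified, "tpcds", "_parquet", "store_sales"))
```

With this approach either template style expands without error, which would address the "how can we tell what's needed" concern at the template level; whether it is the right change for the real script is a separate question.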
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 7:
I'm not disputing you from a generic, syntactical standpoint -- but this is what I can tell. The ALTER TABLE statement shows up exactly 100 times in the templates in testdata/datasets/.

This pattern is used 98 times:

    $ git grep 'ALTER TABLE {table_name}' testdata/datasets/ | wc -l
    98

This pattern is used only twice:

    $ git grep 'ALTER TABLE {db_name}{db_suffix}.{table_name}' testdata/datasets/ | wc -l
    2

And since those two instances are both conditional...

    1850:ALTER TABLE {db_name}{db_suffix}.{table_name} ADD IF NOT EXISTS PARTITION (year=2015, month=3);
    1851:ALTER TABLE {db_name}{db_suffix}.{table_name} ADD IF NOT EXISTS PARTITION (year=2010, month=3);

...my *guess* is that they don't ever get executed.

Also, if you look at the traceback I posted in an earlier comment, the error has nothing to do with the syntax of the ALTER TABLE statement itself -- rather, it's something to do with expansion of {db_name} in the schema template file, hence the KeyError on the string 'db_name'.

    Traceback (most recent call last):
      File "./testdata/bin/generate-schema-statements.py", line 753, in
        test_vectors, sections, include_constraints, exclude_constraints, only_constraints)
      File "./testdata/bin/generate-schema-statements.py", line 658, in generate_statements
        output.create.append(use_db + alter.format(table_name=table_name))
    KeyError: 'db_name'

{db_name} expansion seems to work with other kinds of statements produced by generate-schema-statements.py -- just not ALTER TABLE statements. I don't know why. But there's an open bug to refactor generate-schema-statements.py, which is generally recognized to be a rat's nest of suspect code. Since the overwhelming precedent in our code base (98:2) is to use 'ALTER TABLE {table_name}', can I just do that here?

I'll change the comment to read:

    -- {db_name} expansion does not seem to work with ALTER TABLE statements. See IMPALA-4005.
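The KeyError in the traceback is ordinary Python str.format behavior rather than anything specific to ALTER TABLE. A minimal reproduction follows; the try_format helper is hypothetical and exists only to capture the error text, while the format call mirrors the one shown on line 658 of the traceback.

```python
def try_format(template, **kwargs):
    """Return the expanded template, or the error text if a placeholder is missing."""
    try:
        return template.format(**kwargs)
    except KeyError as err:
        return "KeyError: %s" % err


qualified = "ALTER TABLE {db_name}{db_suffix}.{table_name} RECOVER PARTITIONS;"
unqualified = "ALTER TABLE {table_name} RECOVER PARTITIONS;"

# Only table_name is supplied, as in alter.format(table_name=table_name).
print(try_format(qualified, table_name="store_sales"))
print(try_format(unqualified, table_name="store_sales"))
```

The qualified template fails on the first placeholder it cannot resolve, which is db_name, while the unqualified template expands cleanly. That is consistent with the 98 unqualified templates working and the qualified form breaking during statement generation.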
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 7:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5177/7/testdata/datasets/tpcds/tpcds_schema_template.sql
File testdata/datasets/tpcds/tpcds_schema_template.sql:
PS7, Line 373: ALTER does not take a fully qualified table name.
I don't follow this comment. You *can* use a qualified name in an ALTER TABLE statement.
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 7:
The last patch was just a rebase + adding a link to the commit msg.
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Hello Internal Jenkins, Dimitris Tsirogiannis,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5177 to look at the new patch set (#7).
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

This patch changes the way we load tpcds.store_sales test data. Before this, we were relying on a force_reload to build the table partitions based upon the data that had been copied over to HDFS from the warehouse snapshot. This worked on the local mini-cluster, but for some reason, it was selectively duplicating data when run on a remote cluster.

This patch doesn't solve the mystery of why data duplication occurs on remote clusters, but it does resolve the immediate concern of loading test data by using Impala's recover partitions feature to automatically recognize the partitions in the HDFS directories. We just needed to add an ALTER TABLE store_sales RECOVER PARTITIONS to the tpcds schema template file.

Tested by dropping the tpcds table from a remote cluster setup, reloading the table, and running the tests in test_tpcds_queries.py. Tests that had been failing before are now passing.

As an additional check, a pre-review test run was attempted on the upstream Jenkins server. It was successful: http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/9/

Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 6 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/5177/7
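To illustrate the idea behind recovering partitions from what is already in HDFS, here is a conceptual Python sketch. It is not Impala's implementation: it only shows how partition specs can be derived from key=value directory names under a table directory. The function name, the partition column ss_sold_date_sk, and the sample paths are assumptions made for the example.

```python
def partitions_from_paths(table_dir, file_paths):
    """Return the sorted, de-duplicated partition specs implied by file paths.

    Each spec is a tuple of (column, value) pairs taken from the key=value
    directory components between the table directory and the file name.
    """
    prefix = table_dir.rstrip("/") + "/"
    specs = set()
    for path in file_paths:
        if not path.startswith(prefix):
            continue  # file belongs to some other table
        dirs = path[len(prefix):].split("/")[:-1]  # drop the file name
        spec = tuple(tuple(d.split("=", 1)) for d in dirs if "=" in d)
        if spec:
            specs.add(spec)
    return sorted(specs)


# Hypothetical directory layout for a partitioned store_sales table.
paths = [
    "/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2451181/part-0.txt",
    "/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2451181/part-1.txt",
    "/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2451182/part-0.txt",
    "/test-warehouse/tpcds.store/store.txt",
]
for spec in partitions_from_paths("/test-warehouse/tpcds.store_sales", paths):
    print(spec)
```

Because each partition is registered once per directory rather than once per load, this style of discovery does not re-insert rows, which is presumably why it sidesteps the duplication seen with force_reload.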
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 6:
In the interest of making some progress (but more so to try out the preview job available on the upstream Jenkins server), I just kicked this off: http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/9/
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 6:
What's the next step?
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Jim Apple has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 6:
Dry-run of tests: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/193/
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 6:
And just to confirm that this works:

    ./testdata/bin/generate-schema-statements.py --exploration_strategy=core --workload=tpcds --scale_factor= --verbose --force_reload --hive_warehouse_dir=/test-warehouse --hdfs_namenode=localhost:20500 --backend=localhost:21000
    INFO:bootstrap_virtualenv:Installing Kudu into the virtualenv
    Target Dataset: tpcds
    HDFS path: /test-warehouse/tpcds.customer_demographics does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.date_dim does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.time_dim does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.item does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.store does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.promotion does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.household_demographics does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer_address does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.store_sales_unpartitioned does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.store_sales does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer_demographics_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.date_dim_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.time_dim_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.item_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.store_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.promotion_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.household_demographics_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer_address_seq_snap does not exists or is empty. Data will be loaded.
    Skipping 'tpcds_seq_snap.store_sales_unpartitioned' due to include constraint match.
    HDFS path: /test-warehouse/tpcds.store_sales_seq_snap does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer_demographics_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.date_dim_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.time_dim_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.item_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.store_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.promotion_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.household_demographics_parquet does not exists or is empty. Data will be loaded.
    HDFS path: /test-warehouse/tpcds.customer_address_parquet does not exists or is empty. Data will be loaded.
    Skipping 'tpcds_parquet.store_sales_unpartitioned' due to include constraint match.
    HDFS path: /test-warehouse/tpcds.store_sales_parquet does not exists or is empty. Data will be loaded.
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has uploaded a new patch set (#6).
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

This patch changes the way we load tpcds.store_sales test data. Before this, we were relying on a force_reload to build the table partitions based upon the data that had been copied over to HDFS from the warehouse snapshot. This worked on the local mini-cluster, but for some reason, it was selectively duplicating data when run on a remote cluster.

This patch doesn't solve the mystery of why data duplication occurs on remote clusters, but it does resolve the immediate concern of loading test data by using Impala's recover partitions feature to automatically recognize the partitions in the HDFS directories. We just needed to add an ALTER TABLE store_sales RECOVER PARTITIONS to the tpcds schema template file.

Tested by dropping the tpcds table from a remote cluster setup, reloading the table, and running the tests in test_tpcds_queries.py. Tests that had been failing before are now passing.

Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 6 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/5177/6
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 4:
Thanks for the comments (and the prodding.) The verification for this change was breaking here:

    HDFS path: /test-warehouse/tpcds.store_sales_unpartitioned does not exists or is empty. Data will be loaded.
    Traceback (most recent call last):
      File "./testdata/bin/generate-schema-statements.py", line 753, in
        test_vectors, sections, include_constraints, exclude_constraints, only_constraints)
      File "./testdata/bin/generate-schema-statements.py", line 658, in generate_statements
        output.create.append(use_db + alter.format(table_name=table_name))
    KeyError: 'db_name'

When I grep through all of the other datasets for ALTER, it appears that we don't expect a fully-qualified db name when generating ALTER statements? (Luckily someone explicitly commented on that in an earlier change.)

    $ git grep ALTER testdata/datasets/
    testdata/datasets/functional/functional_schema_template.sql:-- ALTER does not take a fully qualified name.
    testdata/datasets/functional/functional_schema_template.sql:ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION (year=2009, month=1);
    testdata/datasets/functional/functional_schema_template.sql:ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION (year=2009, month=2);
    testdata/datasets/functional/functional_schema_template.sql:ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION (year=2009, month=3);
    testdata/datasets/functional/functional_schema_template.sql:ALTER TABLE {table_name}_tmp ADD IF NOT EXISTS PARTITION (year=2009, month=1);
    testdata/datasets/functional/functional_schema_template.sql:ALTER TABLE {table_name}_tmp ADD IF NOT EXISTS PARTITION (year=2009, month=2);
    testdata/datasets/functional/functional_schema_template.sql:ALTER TABLE {table_name}_tmp ADD IF NOT EXISTS PARTITION (year=2009, month=3);
    [...]

Rather than reverse engineering our dataload scripts to try to understand why that's the case, I'm just going to revert the change from patch set 2.
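One way to answer "which placeholders does a template need" without reading the dataload scripts is to parse the template itself. The sketch below uses the standard library's string.Formatter; the placeholders helper and the assumption that only table_name is supplied for ALTER statements (taken from the traceback) are illustrative, not part of the Impala code base.

```python
from string import Formatter


def placeholders(template):
    """Return the set of named replacement fields referenced by a format template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


SUPPLIED = {"table_name"}  # what the traceback shows being passed to format()

templates = [
    "ALTER TABLE {table_name}_tmp ADD IF NOT EXISTS PARTITION (year=2009, month=1);",
    "ALTER TABLE {db_name}{db_suffix}.{table_name} ADD IF NOT EXISTS PARTITION (year=2015, month=3);",
]
for template in templates:
    missing = sorted(placeholders(template) - SUPPLIED)
    print(missing)
```

A check like this could flag a template that would raise KeyError before any statements are generated, instead of failing partway through a data load.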
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Hello Internal Jenkins, Dimitris Tsirogiannis,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5177 to look at the new patch set (#5).
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

This patch changes the way we load tpcds.store_sales test data. Before this, we were relying on a force_reload to build the table partitions based upon the data that had been copied over to HDFS from the warehouse snapshot. This worked on the local mini-cluster, but for some reason, it was selectively duplicating data when run on a remote cluster.

This patch doesn't solve the mystery of why data duplication occurs on remote clusters, but it does resolve the immediate concern of loading test data by using Impala's recover partitions feature to automatically recognize the partitions in the HDFS directories. We just needed to add an ALTER TABLE store_sales RECOVER PARTITIONS to the tpcds schema template file.

Tested by dropping the tpcds table from a remote cluster setup, reloading the table, and running the tests in test_tpcds_queries.py. Tests that had been failing before are now passing.

Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 6 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/5177/5
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Jim Apple has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 4:
> Any progress on this? There is no activity for over a month. If
> we're not going to move forward with this change let's abandon
> this. Do we know if the change works or not?

Hi David. I hope you don't mind, but I am testing this here: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/139/

It won't talk back to this commit or give it a +1 or -1, but I thought you might be interested in seeing which tests are working or not on the public Jenkins machine.
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 4:
Any progress on this? There is no activity for over a month. If we're not going to move forward with this change let's abandon this. Do we know if the change works or not?
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Internal Jenkins has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 4: Verified-1
Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/570/
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Hello Dimitris Tsirogiannis,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5177 to look at the new patch set (#4).
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

This patch changes the way we load tpcds.store_sales test data. Before this, we were relying on a force_reload to build the table partitions based upon the data that had been copied over to HDFS from the warehouse snapshot. This worked on the local mini-cluster, but for some reason, it was selectively duplicating data when run on a remote cluster.

This patch doesn't solve the mystery of why data duplication occurs on remote clusters, but it does resolve the immediate concern of loading test data by using Impala's recover partitions feature to automatically recognize the partitions in the HDFS directories. We just needed to add an ALTER TABLE store_sales RECOVER PARTITIONS to the tpcds schema template file.

Tested by dropping the tpcds table from a remote cluster setup, reloading the table, and running the tests in test_tpcds_queries.py. Tests that had been failing before are now passing.

Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 5 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/5177/4
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 3: Code-Review+2
(2 comments)
http://gerrit.cloudera.org:8080/#/c/5177/3/testdata/datasets/tpcds/tpcds_schema_template.sql
File testdata/datasets/tpcds/tpcds_schema_template.sql:
PS3, Line 371: up
typo: us?
Line 372: -- of the data.
Can you also plz reference the JIRA that caused this change?
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 3:
Any further thoughts on this?
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has posted comments on this change.
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
Patch Set 2:
Comment was added as requested, and IMPALA-4534 was created (and linked to IMPALA-4005.)
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales
David Knupp has uploaded a new patch set (#3).
Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales

This patch changes the way we load tpcds.store_sales test data. Before this, we were relying on a force_reload to build the table partitions based upon the data that had been copied over to HDFS from the warehouse snapshot. This worked on the local mini-cluster, but for some reason, it was selectively duplicating data when run on a remote cluster.

This patch doesn't solve the mystery of why data duplication occurs on remote clusters, but it does resolve the immediate concern of loading test data by using Impala's recover partitions feature to automatically recognize the partitions in the HDFS directories. We just needed to add an ALTER TABLE store_sales RECOVER PARTITIONS to the tpcds schema template file.

Tested by dropping the tpcds table from a remote cluster setup, reloading the table, and running the tests in test_tpcds_queries.py. Tests that had been failing before are now passing.

Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 5 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/5177/3
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
David Knupp has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 2: Just to clarify, because I think it might be confusing to people not familiar with the 3 code paths Harrison referenced:
1. Originally "data load" meant generating all data from scratch. Doesn't assume loading anything from a snapshot file. This is the general use case, and is what an external contributor would need to do.
2. It can also mean that we copy data to HDFS from a snapshot file, but we don't restore the metadata from a snapshot. This is the case that currently applies to loading Impala's test data to a cluster within Cloudera's testing infrastructure, but theoretically this could be done elsewhere.
3. Finally, the case by which *both* HDFS and metadata DB are reconstituted from snapshot files -- though some tweaking of the metadata is still usually required. This is most commonly used by developers who have access to the Cloudera internal resources.
Just wanted to clarify that.
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
David Knupp has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 2: Sorry -- confusing typo. Correction below: "...we can't always assume that HDFS data IS already present."
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
David Knupp has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 2: Dimitris, I spoke with Harrison about your question, and will paraphrase his reply in case he doesn't have a chance to weigh in himself on this review. Essentially, he pointed out that we can't always assume that HDFS data isn't already present -- e.g., we don't necessarily always load test data in advance from the snapshot file. Wouldn't we possibly be introducing new bugs/regressions by replacing all of the ALTER TABLE statements with RECOVER PARTITIONS everywhere?
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
David Knupp has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 2: Thanks, I'll check with Harrison. In the meantime, so that we don't lose track of it, I added the following issue: https://issues.cloudera.org/browse/IMPALA-4520.
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 2: I think we should try to be consistent in the way we handle table loading. Our infrastructure is already too messy to reason about, and changing the behavior in one place makes the problem even worse. I am fine with making this change if it will unblock a specific piece of work. We just need to make sure the follow-up work doesn't fall through the cracks. Maybe also check with Harrison for a second opinion.
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 2: functional_schema_template.sql seems to have tables that could run into the same problems. Why not use RECOVER PARTITIONS everywhere?
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
David Knupp has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 1: (1 comment)
http://gerrit.cloudera.org:8080/#/c/5177/1/testdata/datasets/tpcds/tpcds_schema_template.sql
File testdata/datasets/tpcds/tpcds_schema_template.sql:
PS1, Line 370: {table_name}
> Shouldn't this be {db_name}{db_suffix}.{table_name}?
Maybe? :-) I actually copied the format from the ALTER statements in the functional_schema_template.sql file, which just use {table_name}. But your suggestion makes sense -- I'll make the change and reload/retest.
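For readers unfamiliar with the placeholders discussed in the comment above, the following Python sketch shows how the two forms of the statement would differ once filled in. It uses plain `str.format` as a stand-in; whether generate-schema-statements.py substitutes placeholders this way is an assumption, and the `render` helper and the example suffix `_parquet` are hypothetical.

```python
# Hypothetical template lines; only the placeholder names come from the review.
UNQUALIFIED = "ALTER TABLE {table_name} RECOVER PARTITIONS;"
QUALIFIED = "ALTER TABLE {db_name}{db_suffix}.{table_name} RECOVER PARTITIONS;"


def render(template, db_name, db_suffix, table_name):
    """Fill in the placeholders; names a template does not use are ignored."""
    return template.format(
        db_name=db_name, db_suffix=db_suffix, table_name=table_name
    )


# The unqualified form depends on whichever database is current at run time.
print(render(UNQUALIFIED, "tpcds", "_parquet", "store_sales"))
# The qualified form names the target database explicitly.
print(render(QUALIFIED, "tpcds", "_parquet", "store_sales"))
```

The unqualified statement only reaches the intended table when the loader has already switched to the right database, which is the ambiguity the reviewer's question points at.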
[Impala-ASF-CR] IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store sales
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4482: Use ALTER TABLE / RECOVER PARTITIONS when loading tpcds.store_sales .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5177/1/testdata/datasets/tpcds/tpcds_schema_template.sql File testdata/datasets/tpcds/tpcds_schema_template.sql: PS1, Line 370: {table_name} Shouldn't this be {db_name}{db_suffix}.{table_name}?