[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. IMPALA-8344: Add support for running the minicluster with S3Guard Some tests can fail on S3 due to some operations that are eventually consistent. S3Guard stores extra metadata in a DynamoDB to solve several consistency issues. This adds support for running the minicluster on S3 with S3Guard. S3Guard is configured by the following environment variables: S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must be exclusively owned by this minicluster. The dataload scripts initialize this table and will purge entries if the table already exists. The table should be in the same region as the S3_BUCKET for the minicluster. S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE These environment variables only impact S3 configurations. The support comes from three pieces: 1. Configuration changes in core-site.xml to add the appropriate parameters. 2. Updating dataload to initialize/purge the s3guard dynamodb table and import data appropriately. 3. Update tests to manipulate files through the HDFS command line rather than through s3 utilities. This takes the filesystem utility code for ABFS (which actually calls HDFS command line), makes it generic, and uses it for S3Guard. Testing: - Ran multiple rounds of s3 tests - Aborted tests in the middle and restarted the s3 tests (to test the s3guard reinitialization code) Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Reviewed-on: http://gerrit.cloudera.org:8080/13020 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins Reviewed-by: Sahil Takiar --- M bin/generate_xml_config.py M bin/impala-config.sh A bin/jenkins/release_cloud_resources.sh M infra/python/deps/requirements.txt M testdata/bin/load-test-warehouse-snapshot.sh A testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py D testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py M tests/query_test/test_scanners_fuzz.py D tests/util/abfs_util.py M tests/util/filesystem_utils.py M tests/util/hdfs_util.py D tests/util/s3_util.py 13 files changed, 316 insertions(+), 373 deletions(-) Approvals: Joe McDonnell: Looks good to me, but someone else must approve Impala Public Jenkins: Verified Sahil Takiar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 6: Code-Review+2 Should we file a JIRA for applying the "-d" change in bulk? Otherwise +2. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 23 May 2019 18:23:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 23 May 2019 04:41:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4301/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 22 May 2019 23:05:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 6: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 22 May 2019 23:04:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 6: Rebase to get the drop of hwx.public.repo. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 22 May 2019 23:04:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4295/ -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 22 May 2019 03:14:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4295/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 21 May 2019 21:55:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179 PS5, Line 179: check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location]) > Makes sense. I'm wondering if we could add the -d option just for S3 (and d Within a single test, we are mostly single threaded in terms of accessing tables where we upload files. That's the main reason I'm ok with HDFS having "-d", but since there is no correctness or performance penalty, we could leave off the "-d" for HDFS. I agree that s3 mostly has no reason not to use "-d". I might look into a bulk change for that. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 21 May 2019 21:55:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: The config template changes look fine to me -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 21 May 2019 21:43:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179 PS5, Line 179: check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location]) > I saw a failure here where s3guard showed the temporary file used for copyi Makes sense. I'm wondering if we could add the -d option just for S3 (and do it for all calls to copyFromLocal). For HDFS, I think it makes sense to run without -d, as it makes the upload atomic. For S3, I think 99% of the time, you should run with -d. I don't think there is much benefit to writing data to a temp file on S3, and then re-copying it to the final file, plus I think S3 uploads are already atomic. Not a blocker, so we could do this in a follow up JIRA, just a thought. Yeah, I think your explanation is correct - e.g. you could write a file from one impalad, and S3Guard will store an entry for that file in DynamoDB, then another impalad could try to read that file, and S3Guard will indicate the file exists, but since S3 is eventually consistent the file might not appear yet. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 21 May 2019 21:27:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179 PS5, Line 179: check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location]) > what was this change for? I saw a failure here where s3guard showed the temporary file used for copying existed, but s3 still couldn't access it. I think s3guard gets us past the metadata consistency, but the actual storage is still eventually consistent? So, I'm getting rid of the extra copy and having hdfs write the file directly. I saw this once, and otherwise, builds have been stable. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 21 May 2019 21:18:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: Code-Review+1 (1 comment) LGTM +1 I like the idea of using a Python script to generate core-site.xml rather than using a .tmpl file. I think it would be good if someone else more familiar with the mini-cluster setup took a look as well though, since this is a new change. http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179 PS5, Line 179: check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location]) what was this change for? -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 21 May 2019 20:34:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3309/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 21 May 2019 18:37:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh@341 PS4, Line 341: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore" > Is this necessary? I think the default impl uses the NullMetadataStore alre Changed core-site.xml to use the python templates and pushed this in there. http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh File testdata/bin/load-test-warehouse-snapshot.sh: http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh@67 PS4, Line 67: hadoop s3guard prune -seconds 1 -meta "dynamodb://${S3GUARD_DYNAMODB_TABLE}" \ > is it necessary to prune a newly initialized S3Guard table? >From what I understand, hadoop s3guard init doesn't clear out a DynamoDB table >if it already exists. If somehow a test configuration aborts without calling >bin/jenkins/release_cloud_resources.sh (e.g. a Jenkins reboot), I prune this >to clear about anything left over from previous runs. http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py@a175 PS4, Line 175: > Should we delete the S3Client as well? It doesn't look like it is used anyw Good point, we can always revive it if we need it later. I removed it. I deferrred removing boto3 (but added a comment), because I think we are going to go through a cleanup when switching to toolchain python 2.7. http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@158 PS4, Line 158: def __init__(self, filesystem_type="HDFS"): > the filesystem_type is just purely for debug messages right? might be usefu Added comment http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@163 PS4, Line 163: hadoop_command = ['hadoop', 'fs'] + command > nit: hdfs instead of hadoop I renamed this class to the HadoopFsCommandLineClient. If there is a benefit to the hdfs command, we can switch, but I treat hadoop fs and hdfs dfs as mostly interchangeable, and this class is mainly for non-HDFS filesystems. http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@170 PS4, Line 170: def create_file(self, path, file_data, overwrite=True): > nit: would be nice to have some docs for these methods, e.g. mention that c Added some function comments. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 21 May 2019 17:46:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Hello Michael Ho, Laszlo Gaal, Philip Zeyliger, David Knupp, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13020 to look at the new patch set (#5). Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. IMPALA-8344: Add support for running the minicluster with S3Guard Some tests can fail on S3 due to some operations that are eventually consistent. S3Guard stores extra metadata in a DynamoDB to solve several consistency issues. This adds support for running the minicluster on S3 with S3Guard. S3Guard is configured by the following environment variables: S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must be exclusively owned by this minicluster. The dataload scripts initialize this table and will purge entries if the table already exists. The table should be in the same region as the S3_BUCKET for the minicluster. S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE These environment variables only impact S3 configurations. The support comes from three pieces: 1. Configuration changes in core-site.xml to add the appropriate parameters. 2. Updating dataload to initialize/purge the s3guard dynamodb table and import data appropriately. 3. Update tests to manipulate files through the HDFS command line rather than through s3 utilities. This takes the filesystem utility code for ABFS (which actually calls HDFS command line), makes it generic, and uses it for S3Guard. Testing: - Ran multiple rounds of s3 tests - Aborted tests in the middle and restarted the s3 tests (to test the s3guard reinitialization code) Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d --- M bin/generate_xml_config.py M bin/impala-config.sh A bin/jenkins/release_cloud_resources.sh M infra/python/deps/requirements.txt M testdata/bin/load-test-warehouse-snapshot.sh A testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py D testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py M tests/query_test/test_scanners_fuzz.py D tests/util/abfs_util.py M tests/util/filesystem_utils.py M tests/util/hdfs_util.py D tests/util/s3_util.py 13 files changed, 316 insertions(+), 373 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/5 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 4: (6 comments) mostly minor comments, overall LGTM pending a few comments http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh@341 PS4, Line 341: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore" Is this necessary? I think the default impl uses the NullMetadataStore already, right? http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh File testdata/bin/load-test-warehouse-snapshot.sh: http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh@67 PS4, Line 67: hadoop s3guard prune -seconds 1 -meta "dynamodb://${S3GUARD_DYNAMODB_TABLE}" \ is it necessary to prune a newly initialized S3Guard table? http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py@a175 PS4, Line 175: Should we delete the S3Client as well? It doesn't look like it is used anywhere else, right? We can remove the boto3 dependency as well, which decreases the number of Python dependencies we have. http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@158 PS4, Line 158: def __init__(self, filesystem_type="HDFS"): the filesystem_type is just purely for debug messages right? might be useful to mention that http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@163 PS4, Line 163: hadoop_command = ['hadoop', 'fs'] + command nit: hdfs instead of hadoop http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@170 PS4, Line 170: def create_file(self, path, file_data, overwrite=True): nit: would be nice to have some docs for these methods, e.g. mention that create_file actually writes to a temp file on the local fs first and then uploads the file to the target fs -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Thu, 16 May 2019 20:41:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 14 May 2019 18:31:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3202/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 22:19:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3201/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 22:12:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13020/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/3/bin/impala-config.sh@339 PS3, Line 339: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore" line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py@49 PS3, Line 49: CORE_CONF = HdfsConfig(os.path.join(environ['HADOOP_CONF_DIR'], "core-site.xml")) flake8: E305 expected 2 blank lines after class or function definition, found 1 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 21:28:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh File bin/jenkins/release_cloud_resources.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@29 PS2, Line 29: # This is currently only implemented for s3. > Consider asserting S3GUARD_DYNAMODB_TABLE and S3_BUCKET are non-empty... Added tests for S3_BUCKET, S3GUARD_DYNAMODB_TABLE, and S3GUARD_DYNAMODB_REGION. http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@37 PS2, Line 37: aws s3 rm --recursive --quiet s3://${S3_BUCKET}${TEST_WAREHOUSE_DIR} > This line and line 33 is useful enough that we should either do set +x or e Added echo statements for these two. http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh File testdata/bin/load-test-warehouse-snapshot.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh@62 PS2, Line 62: if [[ "${S3GUARD_ENABLED}" = "true" ]]; then > May be useful to verify that S3GUARD_DYNAMODB_TABLE and S3GUARD_DYNAMODB_RE I added code to bin/impala-config.sh to require those to be set. This file sources that, so I'm relying on bin/impala-config.sh to check it. http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@177 PS2, Line 177: cls.filesystem_client = S3Client(S3_BUCKET_NAME) > I think we could coalesce this to use HDFS/Hadoop always, thereby maybe rem I think that makes sense. Switched this to use HdfsCommandLineClient("S3") unconditionally. http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@178 PS2, Line 178: elif IS_ABFS: : # ABFS is implemented via HDFS command line client : cls.filesystem_client = HdfsCommandLineClient("ABFS") : elif IS_ADLS: : cls.filesystem_client = ADLSClient(ADLS_STORE_NAME) > Do you know which, if any, of these are getting tested these days? Neither are routinely tested. http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@149 PS2, Line 149: # This client is a wrapper around the hdfs command line. This is useful for filesystems > Use pydoc? Done http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@170 PS2, Line 170: f = tempfile.NamedTemporaryFile(delete=False) > Caveat: I commented and then realized this class is just being moved, so ta Good point, that is cleaner http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@181 PS2, Line 181: return True > Caveat: I commented and then realized this class is just being moved, so ta Yeah, it seems like we should care about the return code of the hadoop call. http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@213 PS2, Line 213: fname = f['name'].split("/")[-1] > Caveat: I commented and then realized this class is just being moved, so ta Switched this to use os.path.basename(). -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 21:27:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Hello Michael Ho, Laszlo Gaal, Philip Zeyliger, David Knupp, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13020 to look at the new patch set (#4). Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. IMPALA-8344: Add support for running the minicluster with S3Guard Some tests can fail on S3 due to some operations that are eventually consistent. S3Guard stores extra metadata in a DynamoDB to solve several consistency issues. This adds support for running the minicluster on S3 with S3Guard. S3Guard is configured by the following environment variables: S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must be exclusively owned by this minicluster. The dataload scripts initialize this table and will purge entries if the table already exists. The table should be in the same region as the S3_BUCKET for the minicluster. S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE These environment variables only impact S3 configurations. The support comes from three pieces: 1. Configuration changes in core-site.xml to add the appropriate parameters. 2. Updating dataload to initialize/purge the s3guard dynamodb table and import data appropriately. 3. Update tests to manipulate files through the HDFS command line rather than through s3 utilities. This takes the filesystem utility code for ABFS (which actually calls HDFS command line), makes it generic, and uses it for S3Guard. Testing: - Ran multiple rounds of s3 tests - Aborted tests in the middle and restarted the s3 tests (to test the s3guard reinitialization code) Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d --- M bin/impala-config.sh A bin/jenkins/release_cloud_resources.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py D tests/util/abfs_util.py M tests/util/filesystem_utils.py M tests/util/hdfs_util.py 8 files changed, 209 insertions(+), 121 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/4 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Hello Michael Ho, Laszlo Gaal, Philip Zeyliger, David Knupp, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13020 to look at the new patch set (#3). Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. IMPALA-8344: Add support for running the minicluster with S3Guard Some tests can fail on S3 due to some operations that are eventually consistent. S3Guard stores extra metadata in a DynamoDB to solve several consistency issues. This adds support for running the minicluster on S3 with S3Guard. S3Guard is configured by the following environment variables: S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must be exclusively owned by this minicluster. The dataload scripts initialize this table and will purge entries if the table already exists. The table should be in the same region as the S3_BUCKET for the minicluster. S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE These environment variables only impact S3 configurations. The support comes from three pieces: 1. Configuration changes in core-site.xml to add the appropriate parameters. 2. Updating dataload to initialize/purge the s3guard dynamodb table and import data appropriately. 3. Update tests to manipulate files through the HDFS command line rather than through s3 utilities. This takes the filesystem utility code for ABFS (which actually calls HDFS command line), makes it generic, and uses it for S3Guard. Testing: - Ran multiple rounds of s3 tests - Aborted tests in the middle and restarted the s3 tests (to test the s3guard reinitialization code) Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d --- M bin/impala-config.sh A bin/jenkins/release_cloud_resources.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py D tests/util/abfs_util.py M tests/util/filesystem_utils.py M tests/util/hdfs_util.py 8 files changed, 208 insertions(+), 121 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/3 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py@49 PS3, Line 49: CORE_CONF = HdfsConfig(os.path.join(environ['HADOOP_CONF_DIR'], "core-site.xml")) > flake8: E305 expected 2 blank lines after class or function definition, fou Done -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 21:29:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh@339 PS4, Line 339: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore" line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 13 May 2019 21:30:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh File testdata/bin/load-test-warehouse-snapshot.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh@62 PS2, Line 62: if [[ "${S3GUARD_ENABLED}" = "true" ]]; then May be useful to verify that S3GUARD_DYNAMODB_TABLE and S3GUARD_DYNAMODB_REGION are not empty if S3GUARD_ENABLED is "true" ? -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 29 Apr 2019 20:24:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@170 PS2, Line 170: f = tempfile.NamedTemporaryFile(delete=False) Caveat: I commented and then realized this class is just being moved, so take this as non-blocking. Use a context manager? with tempfile.NamedTemporaryFile(delete=False) as tmp_file: tmp_file.write(file_data) That calls close() for you automatically. You can later still get tmp_file.name on a closed file handle, so no need for tmp_path. http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@181 PS2, Line 181: return True Caveat: I commented and then realized this class is just being moved, so take this as non-blocking. Maybe do the same thing as the previous method? (status, stdout, stderr) = self._hadoop_fs_shell(['-mkdir', '-p', fixed_path]) return status == 0 http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@213 PS2, Line 213: fname = f['name'].split("/")[-1] Caveat: I commented and then realized this class is just being moved, so take this as non-blocking. Don't if it helps in this case, but the python os lib has a lot of path manipulation helpers, e.g.: >>> import os >>> f = 'foo/bar/baz' >>> os.path.dirname(f) 'foo/bar' >>> os.path.basename(f) 'baz' >>> os.path.abspath(f) '//foo/bar/baz' -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 29 Apr 2019 20:10:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh@312 PS2, Line 312: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore" I looked into whether there were other interesting impls here and couldn't find any. hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java doesn't work across processes. If we had something that went cross-process, we wouldn't really need Dynamo in our world. http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh File bin/jenkins/release_cloud_resources.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@29 PS2, Line 29: # This is currently only implemented for s3. Consider asserting S3GUARD_DYNAMODB_TABLE and S3_BUCKET are non-empty... http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@37 PS2, Line 37: aws s3 rm --recursive --quiet s3://${S3_BUCKET}${TEST_WAREHOUSE_DIR} This line and line 33 is useful enough that we should either do set +x or echo what we're about to do. http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@177 PS2, Line 177: cls.filesystem_client = S3Client(S3_BUCKET_NAME) I think we could coalesce this to use HDFS/Hadoop always, thereby maybe removing the S3Client code? http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@178 PS2, Line 178: elif IS_ABFS: : # ABFS is implemented via HDFS command line client : cls.filesystem_client = HdfsCommandLineClient("ABFS") : elif IS_ADLS: : cls.filesystem_client = ADLSClient(ADLS_STORE_NAME) Do you know which, if any, of these are getting tested these days? http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@149 PS2, Line 149: # This client is a wrapper around the hdfs command line. This is useful for filesystems Use pydoc? class Foo(bla): """bla bla bla""" -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 29 Apr 2019 18:15:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2851/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 22 Apr 2019 20:45:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh@310 PS2, Line 310: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore" line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 22 Apr 2019 20:02:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13020 to look at the new patch set (#2). Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. IMPALA-8344: Add support for running the minicluster with S3Guard Some tests can fail on S3 due to some operations that are eventually consistent. S3Guard stores extra metadata in a DynamoDB to solve several consistency issues. This adds support for running the minicluster on S3 with S3Guard. S3Guard is configured by the following environment variables: S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must be exclusively owned by this minicluster. The dataload scripts initialize this table and will purge entries if the table already exists. The table should be in the same region as the S3_BUCKET for the minicluster. S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE These environment variables only impact S3 configurations. The support comes from three pieces: 1. Configuration changes in core-site.xml to add the appropriate parameters. 2. Updating dataload to initialize/purge the s3guard dynamodb table and import data appropriately. 3. Update tests to manipulate files through the HDFS command line rather than through s3 utilities. This takes the filesystem utility code for ABFS (which actually calls HDFS command line), makes it generic, and uses it for S3Guard. Testing: - Ran multiple rounds of s3 tests - Aborted tests in the middle and restarted the s3 tests (to test the s3guard reinitialization code) Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d --- M bin/impala-config.sh A bin/jenkins/release_cloud_resources.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py D tests/util/abfs_util.py M tests/util/filesystem_utils.py M tests/util/hdfs_util.py 8 files changed, 185 insertions(+), 117 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/2 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2792/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 16 Apr 2019 00:14:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13020 Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. IMPALA-8344: Add support for running the minicluster with S3Guard Some tests can fail on S3 due to some operations that are eventually consistent. S3Guard stores extra metadata in a DynamoDB to solve several consistency issues. This adds support for running the minicluster on S3 with S3Guard. S3Guard is configured by the following environment variables: S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must be exclusively owned by this minicluster. The dataload scripts destroy and recreate this table. The table should be in the same region as the S3_BUCKET for the minicluster. S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE These environment variables only impact S3 configurations. The support comes from three pieces: 1. Configuration changes in core-site.xml to add the appropriate parameters. 2. Updating dataload to destroy/create the s3guard dynamodb table and import data appropriately. 3. Update tests to manipulate files through the HDFS command line rather than through s3 utilities. This takes the filesystem utility code for ABFS (which actually calls HDFS command line), makes it generic, and uses it for S3Guard. Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d --- M bin/impala-config.sh A bin/jenkins/release_cloud_resources.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py D tests/util/abfs_util.py M tests/util/filesystem_utils.py M tests/util/hdfs_util.py 8 files changed, 190 insertions(+), 117 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/1 -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13020/1/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/1/bin/impala-config.sh@310 PS1, Line 310: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore" line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 15 Apr 2019 23:52:03 + Gerrit-HasComments: Yes