Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS ......................................................................
Patch Set 1: (12 comments) Thanks for the review, Matt. http://gerrit.cloudera.org:8080/#/c/6910/1//COMMIT_MSG Commit Message: PS1, Line 18: This client seems to have some consistency : issues. > Is this a known issue? Any tickets to reference for this? No, I've opened an IMPALA Jira to track this. We're just talking to an ADLS engineer to confirm. I'll post the updates in that JIRA and we can resolve this when we have more information. http://gerrit.cloudera.org:8080/#/c/6910/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS1, Line 774: S3 > update Done http://gerrit.cloudera.org:8080/#/c/6910/1/infra/python/deps/compiled-requirements.txt File infra/python/deps/compiled-requirements.txt: Line 35: pycparser == 2.17 > IIRC the indentation is meant to indicate dependency. In the Kudu case belo Done http://gerrit.cloudera.org:8080/#/c/6910/1/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: PS1, Line 61: seems > It'd be good to make a conclusive statement. I assume the consistency model It is supposed to be read-after-write consistent, but this looks like a bug in the Python client. We're reaching out to a Microsoft Engineer to clarify this issue. I've opened an IMPALA Jira to track this which I've added to the comments here. Any progress on this issue will be posted there and we can revisit once this is fixed. Line 156: print self.filesystem_client.ls( > remove? Done Line 304: # The ADLS client seems to be inconsistent too. So skip for it. > same as above Done http://gerrit.cloudera.org:8080/#/c/6910/1/tests/metadata/test_stale_metadata.py File tests/metadata/test_stale_metadata.py: PS1, Line 22: SkipIfADLS > not used Done http://gerrit.cloudera.org:8080/#/c/6910/1/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 48: @SkipIfADLS.slow_client > why does this work on s3 but not adls? This technically is not guaranteed to work for S3, since they have eventual deletes. However, we've never seen it fail, probably because this is a very small directory. On the other hand, for ADLS, this test almost always fails, hence the marking with 'slow_client' (tracked by IMPALA-5335) http://gerrit.cloudera.org:8080/#/c/6910/1/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: Line 164: @SkipIfADLS.hdfs_block_size > does this work on s3? Yes, S3 has a config option to control the block size. ADLS currently does not have such an option. The 'PARQUET_FILE_SIZE' is passed into hdfsFileOpen() as the block size to be used while writing into files. This is not respected by ADLS. I'm talking with the ADLS connector folks about this. Will open a JIRA to fix this if they confirm it's a bug or if we're not using the API correctly. http://gerrit.cloudera.org:8080/#/c/6910/1/tests/util/adls_util.py File tests/util/adls_util.py: Line 26: class ADLSClient(BaseFilesystem): > nit: newline above this Done Line 38: f.write(file_data) > this has a return value, #bytes I think, it'd be good to check, e.g. log an Done PS1, Line 47: # The ADLS Python client doesn't support cp() yet, so we have to download and : # reupload to the destination. > ? http://azure-datalake-store.readthedocs.io/en/latest/api.html#azure.datal When you click the [source] link on that, we get this: http://azure-datalake-store.readthedocs.io/en/latest/_modules/azure/datalake/store/core.html#AzureDLFileSystem.cp So it's a mistake in their docs. -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
