Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
......................................................................


Patch Set 1:

(12 comments)

Thanks for the review, Matt.

http://gerrit.cloudera.org:8080/#/c/6910/1//COMMIT_MSG
Commit Message:

PS1, Line 18: This client seems to have some consistency
            : issues.
> Is this a known issue? Any tickets to reference for this?
No, I've opened an IMPALA Jira to track this. We're just talking to an ADLS 
engineer to confirm. I'll post the updates in that JIRA and we can resolve this 
when we have more information.


http://gerrit.cloudera.org:8080/#/c/6910/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 774: S3
> update
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/infra/python/deps/compiled-requirements.txt
File infra/python/deps/compiled-requirements.txt:

Line 35:   pycparser == 2.17
> IIRC the indentation is meant to indicate dependency. In the Kudu case belo
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

PS1, Line 61: seems
> It'd be good to make a conclusive statement. I assume the consistency model
It is supposed to be read-after-write consistent, but this looks like a bug in 
the Python client. We're reaching out to a Microsoft Engineer to clarify this 
issue. I've opened an IMPALA Jira to track this which I've added to the 
comments here. Any progress on this issue will be posted there and we can 
revisit once this is fixed.


Line 156:       print self.filesystem_client.ls(
> remove?
Done


Line 304:       # The ADLS client seems to be inconsistent too. So skip for it.
> same as above
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/metadata/test_stale_metadata.py
File tests/metadata/test_stale_metadata.py:

PS1, Line 22: SkipIfADLS
> not used
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 48:   @SkipIfADLS.slow_client
> why does this work on s3 but not adls?
This technically is not guaranteed to work for S3, since they have eventual 
deletes. However, we've never seen it fail, probably because this is a very 
small directory.

On the other hand, for ADLS, this test almost always fails, hence the marking 
with 'slow_client' (tracked by IMPALA-5335)


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 164:   @SkipIfADLS.hdfs_block_size
> does this work on s3?
Yes, S3 has a config option to control the block size. ADLS currently does not 
have such an option.
The 'PARQUET_FILE_SIZE' is passed into hdfsFileOpen() as the block size to be 
used while writing into files. This is not respected by ADLS. I'm talking with 
the ADLS connector folks about this. Will open a JIRA to fix this if they 
confirm it's a bug or if we're not using the API correctly.


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/util/adls_util.py
File tests/util/adls_util.py:

Line 26: class ADLSClient(BaseFilesystem):
> nit: newline above this
Done


Line 38:       f.write(file_data)
> this has a return value, #bytes I think, it'd be good to check, e.g. log an
Done


PS1, Line 47:     # The ADLS Python client doesn't support cp() yet, so we have 
to download and
            :     # reupload to the destination.
> ? http://azure-datalake-store.readthedocs.io/en/latest/api.html#azure.datal
When you click the [source] link on that, we get this:
http://azure-datalake-store.readthedocs.io/en/latest/_modules/azure/datalake/store/core.html#AzureDLFileSystem.cp

So it's a mistake in their docs.


-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes

Reply via email to