[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942409#comment-16942409 ]

Hudson commented on HADOOP-16578:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17431 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17431/])
HADOOP-16578 : Avoid FileSystem API calls when FileSystem already exists (dazhou: rev 770adc5d4abd71c58812066cf691fc565efea64c)
* (edit) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestGetNameSpaceEnabled.java
* (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java

> ABFS: fileSystemExists() should not call container level apis
> -------------------------------------------------------------
>
>                 Key: HADOOP-16578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16578
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.0
>            Reporter: Da Zhou
>            Assignee: Sneha Vijayarajan
>            Priority: Major
>             Fix For: 3.3.0
>
>
> ABFS driver should not use container level api "Get Container Properties" as
> there is no concept of container in HDFS, and this caused some RBAC check
> issue.
> Fix: use getFileStatus() to check if the container exists.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942406#comment-16942406 ]

Da Zhou commented on HADOOP-16578:
----------------------------------

committed, thanks!
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939244#comment-16939244 ]

Sneha Vijayarajan commented on HADOOP-16578:
--------------------------------------------

Test results using command:
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=1 clean verify

Results from account without namespace enabled (East US2):
{code:java}
[INFO] Results:
[INFO]
[INFO] Tests run: 42, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 394, Failures: 0, Errors: 0, Skipped: 207
[WARNING] Tests run: 190, Failures: 0, Errors: 0, Skipped: 23
{code}

Results from account with namespace enabled (East US2):
{code:java}
[INFO] Results:
[INFO]
[INFO] Tests run: 42, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 394, Failures: 0, Errors: 0, Skipped: 21
[WARNING] Tests run: 190, Failures: 0, Errors: 0, Skipped: 23
{code}
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936124#comment-16936124 ]

Da Zhou commented on HADOOP-16578:
----------------------------------

[~snvijaya] thank you for your work. I added one comment in the PR.
Could you also run the tests twice, once with an XNS account and once with a non-XNS account, and then share the results here?
When switching between an XNS account and a non-XNS account, the account property below needs to be updated:
{code:java}
fs.azure.test.namespace.enabled false
{code}
The error shown in your last comment is most likely related to this setting.
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935387#comment-16935387 ]

Sneha Vijayarajan commented on HADOOP-16578:
--------------------------------------------

[~DanielZhou] - Can you please review the PR change?
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935386#comment-16935386 ]

Sneha Vijayarajan commented on HADOOP-16578:
--------------------------------------------

The PR was tested with an East US account and the command line:
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

[INFO] Results:
[INFO]
[INFO] Tests run: 42, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Failures:
[ERROR] ITestGetNameSpaceEnabled.testNonXNSAccount:57->Assert.assertFalse:64->Assert.assertTrue:41->Assert.fail:88 Expecting getIsNamespaceEnabled() return false
[ERROR] Errors:
[ERROR] ITestClientUrlScheme.testClientUrlScheme:85->AbstractAbfsIntegrationTest.getFileSystem:197 » AbfsRestOperation
[INFO]
[ERROR] Tests run: 382, Failures: 1, Errors: 1, Skipped: 21
[INFO]
[WARNING] Tests run: 190, Failures: 0, Errors: 0, Skipped: 23
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934265#comment-16934265 ]

Vinay Badami commented on HADOOP-16578:
---------------------------------------

Thanks - [~snvijaya] please proceed
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934170#comment-16934170 ]

Sneha Vijayarajan commented on HADOOP-16578:
--------------------------------------------

The FileSystem REST API calls for which the ABFS driver has supporting code are:
* Create FileSystem - REST syntax: PUT https://{accountName}.{dnsSuffix}/{filesystem}?resource=filesystem
* Get FileSystem Properties - REST syntax: HEAD https://{accountName}.{dnsSuffix}/{filesystem}?resource=filesystem
* Set FileSystem Properties - REST syntax: PATCH https://{accountName}.{dnsSuffix}/{filesystem}?resource=filesystem
* Delete FileSystem - REST syntax: DELETE https://{accountName}.{dnsSuffix}/{filesystem}?resource=filesystem

[FileSystem REST API documentation: [https://docs.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2]]

Of these, AzureBlobFileSystem currently triggers only the following two APIs:
* Create FileSystem - called in AzureBlobFileSystem::initialize() if "fs.azure.createRemoteFileSystemDuringInitialization" is true and the GetFileSystemProperties call returned 404 Not Found.
* Get FileSystem Properties - called in the following two places:
# In AzureBlobFileSystem::initialize(), to check FileSystem existence if "fs.azure.createRemoteFileSystemDuringInitialization" is true.
# In the GetFileStatus API code flow: if the request is for the root path and the account is not namespace enabled, GetFileSystemProperties() gets called. This is expected, as GetPathProperties() does not support the root path if the account is not namespace enabled. Hence the GetFileSystemProperties() call will have to be retained in this flow.

The proposed change is to convert the GetFileSystemProperties call in AzureBlobFileSystem::initialize() to GetFileStatus() on the root path. Hence, if the FileSystem is already present and the account is namespace enabled, no FileSystem API will be called.
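The flow described above can be illustrated with a small, self-contained sketch. This is not the ABFS driver code: `FakeStore`, its status codes, and the method names are hypothetical stand-ins chosen for illustration, and the sketch assumes a path-level call on the root returns 404 when the FileSystem is missing. It only shows which calls would be issued in each case once initialize() goes through GetFileStatus() on the root path.

```java
import java.util.ArrayList;
import java.util.List;

public class InitializeSketch {

    /** Hypothetical in-memory stand-in for the remote store; not the real ABFS client. */
    static class FakeStore {
        boolean fileSystemExists;
        final boolean namespaceEnabled;
        final List<String> calls = new ArrayList<>(); // records every call issued

        FakeStore(boolean fileSystemExists, boolean namespaceEnabled) {
            this.fileSystemExists = fileSystemExists;
            this.namespaceEnabled = namespaceEnabled;
        }

        // Path-level call (assumed to return 404 when the FileSystem is missing).
        int getPathProperties(String path) {
            calls.add("GetPathProperties");
            return fileSystemExists ? 200 : 404;
        }

        // FileSystem (container) level call.
        int getFileSystemProperties() {
            calls.add("GetFileSystemProperties");
            return fileSystemExists ? 200 : 404;
        }

        // FileSystem (container) level call.
        int createFileSystem() {
            calls.add("CreateFileSystem");
            fileSystemExists = true;
            return 201;
        }
    }

    // Root path on a non-namespace account falls back to the FileSystem-level
    // call; every other case uses the path-level call.
    static int getFileStatus(FakeStore store, String path) {
        if ("/".equals(path) && !store.namespaceEnabled) {
            return store.getFileSystemProperties();
        }
        return store.getPathProperties(path);
    }

    // Proposed flow: check existence through getFileStatus("/") and create
    // the FileSystem only when that check reports 404.
    static void initialize(FakeStore store, boolean createRemoteFileSystemDuringInitialization) {
        if (!createRemoteFileSystemDuringInitialization) {
            return;
        }
        if (getFileStatus(store, "/") == 404) {
            store.createFileSystem();
        }
    }

    public static void main(String[] args) {
        FakeStore existingXns = new FakeStore(true, true);
        initialize(existingXns, true);
        System.out.println(existingXns.calls);    // [GetPathProperties]

        FakeStore missingXns = new FakeStore(false, true);
        initialize(missingXns, true);
        System.out.println(missingXns.calls);     // [GetPathProperties, CreateFileSystem]

        FakeStore existingNonXns = new FakeStore(true, false);
        initialize(existingNonXns, true);
        System.out.println(existingNonXns.calls); // [GetFileSystemProperties]
    }
}
```

In the first case no FileSystem-level call is made, which is the outcome the proposal aims for. The third case shows why GetFileSystemProperties() still appears for accounts that are not namespace enabled.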
The existing flow, in which the config createRemoteFileSystemDuringInitialization is true and the FileSystem does not exist, will continue to work as it does today; i.e., a Create FileSystem request will be sent to the server, which will validate the RBAC role needed for container creation.

[RBAC - ADLS Gen2 documentation: [https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control#role-based-access-control]]
[jira] [Commented] (HADOOP-16578) ABFS: fileSystemExists() should not call container level apis
[ https://issues.apache.org/jira/browse/HADOOP-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931229#comment-16931229 ]

Vinay Badami commented on HADOOP-16578:
---------------------------------------

Let us use this to audit what container level api we are calling from abfs driver.