[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/652#discussion_r149173203 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory( */ public boolean accessible(final String userName) throws IOException { final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +boolean tryListStatus = false; try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + // access API checks if a user has certain permissions on a file or directory. + // returns normally if requested permissions are granted and throws an exception + // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570). + // It is less expensive (than listStatus which was being used before) and hides the + // complicated access control logic underneath. + fs.access(wsPath, FsAction.READ); } catch (final UnsupportedOperationException e) { - logger.trace("The filesystem for this workspace does not support this operation.", e); + logger.debug("The filesystem for this workspace does not support access operation.", e); + tryListStatus = true; } catch (final FileNotFoundException | AccessControlException e) { - return false; + logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e); + tryListStatus = true; --- End diff -- It can be trusted fine for regular file systems HDFS, MapR FS, Mac OS Extended File System etc. I am not sure if it works for windows or not. We were never able to figure out whether it is a problem with the test setup or it is actually not supported. Updated the diffs with your review comments taken care of. Please take a look. ---
[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/652#discussion_r148991210 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory( */ public boolean accessible(final String userName) throws IOException { final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +boolean tryListStatus = false; try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + // access API checks if a user has certain permissions on a file or directory. + // returns normally if requested permissions are granted and throws an exception + // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570). + // It is less expensive (than listStatus which was being used before) and hides the + // complicated access control logic underneath. + fs.access(wsPath, FsAction.READ); } catch (final UnsupportedOperationException e) { - logger.trace("The filesystem for this workspace does not support this operation.", e); + logger.debug("The filesystem for this workspace does not support access operation.", e); + tryListStatus = true; } catch (final FileNotFoundException | AccessControlException e) { - return false; + logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e); + tryListStatus = true; +} + +// if fs.access fails for some reason, fall back to listStatus. +if (tryListStatus) { + try { +fs.listStatus(wsPath); + } catch (final UnsupportedOperationException e) { +logger.debug("The filesystem for this workspace does not support listStatus operation.", e); + } catch (final FileNotFoundException | AccessControlException e) { +logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e); --- End diff -- let's add `using listStatus` in the log statement to differentiate with previous case ---
[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/652#discussion_r148991415 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory( */ public boolean accessible(final String userName) throws IOException { final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +boolean tryListStatus = false; try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + // access API checks if a user has certain permissions on a file or directory. + // returns normally if requested permissions are granted and throws an exception + // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570). + // It is less expensive (than listStatus which was being used before) and hides the + // complicated access control logic underneath. + fs.access(wsPath, FsAction.READ); } catch (final UnsupportedOperationException e) { - logger.trace("The filesystem for this workspace does not support this operation.", e); + logger.debug("The filesystem for this workspace does not support access operation.", e); + tryListStatus = true; } catch (final FileNotFoundException | AccessControlException e) { - return false; + logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e); + tryListStatus = true; --- End diff -- Is access function never trustworthy for negative cases ? Or is it windows platform specific issue ? ---
[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/652#discussion_r88097089 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -151,15 +152,11 @@ public WorkspaceSchemaFactory( public boolean accessible(final String userName) throws IOException { final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + fs.access(wsPath, FsAction.READ); } catch (final UnsupportedOperationException e) { - logger.trace("The filesystem for this workspace does not support this operation.", e); + logger.debug("The filesystem for this workspace does not support this operation.", e); --- End diff -- I am not returning false to be consistent with the existing behavior. If the user does not have permission, we fail when we try to read the file during execution. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/652#discussion_r88072130 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -151,15 +152,11 @@ public WorkspaceSchemaFactory( public boolean accessible(final String userName) throws IOException { final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + fs.access(wsPath, FsAction.READ); } catch (final UnsupportedOperationException e) { - logger.trace("The filesystem for this workspace does not support this operation.", e); + logger.debug("The filesystem for this workspace does not support this operation.", e); --- End diff -- We should return false in this case as well. Better to have a local boolean variable (like boolean isAccessAllowed;) and set it accordingly in try and catch. Then have just 1 return statement in the end. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/652#discussion_r88072322 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -151,15 +152,11 @@ public WorkspaceSchemaFactory( public boolean accessible(final String userName) throws IOException { final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem --- End diff -- Also can we please replace the comments with why we are using "fs.access" api instead. Will be good for future use. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...
GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/652 DRILL-4990:Use new HDFS API access instead of listStatus to check if ⦠â¦users have permissions to access workspace. Manually tested the fix with impersonation enabled. All unit and regression tests pass. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppadma/drill DRILL-4990 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/652.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #652 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---