[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

2017-11-06 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r149173203
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
*/
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+boolean tryListStatus = false;
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  // access API checks if a user has certain permissions on a file or 
directory.
+  // returns normally if requested permissions are granted and throws 
an exception
+  // if access is denied. This API was added in HDFS 2.6 (see 
HDFS-6570).
+  // It is less expensive (than listStatus which was being used 
before) and hides the
+  // complicated access control logic underneath.
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
access operation.", e);
+  tryListStatus = true;
 } catch (final FileNotFoundException | AccessControlException e) {
-  return false;
+  logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
+  tryListStatus = true;
--- End diff --

It can be trusted fine for regular file systems HDFS, MapR FS, Mac OS 
Extended File System etc.  I am not sure if it works for windows or not. We 
were never able to figure out whether it is a problem with the test setup or it 
is actually not supported. 

Updated the diffs with your review comments taken care of. Please take a 
look.



---


[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

2017-11-05 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r148991210
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
*/
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+boolean tryListStatus = false;
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  // access API checks if a user has certain permissions on a file or 
directory.
+  // returns normally if requested permissions are granted and throws 
an exception
+  // if access is denied. This API was added in HDFS 2.6 (see 
HDFS-6570).
+  // It is less expensive (than listStatus which was being used 
before) and hides the
+  // complicated access control logic underneath.
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
access operation.", e);
+  tryListStatus = true;
 } catch (final FileNotFoundException | AccessControlException e) {
-  return false;
+  logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
+  tryListStatus = true;
+}
+
+// if fs.access fails for some reason, fall back to listStatus.
+if (tryListStatus) {
+  try {
+fs.listStatus(wsPath);
+  } catch (final UnsupportedOperationException e) {
+logger.debug("The filesystem for this workspace does not support 
listStatus operation.", e);
+  } catch (final FileNotFoundException | AccessControlException e) {
+logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
--- End diff --

let's add `using listStatus` in the log statement to differentiate with 
previous case


---


[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

2017-11-05 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r148991415
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
*/
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+boolean tryListStatus = false;
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  // access API checks if a user has certain permissions on a file or 
directory.
+  // returns normally if requested permissions are granted and throws 
an exception
+  // if access is denied. This API was added in HDFS 2.6 (see 
HDFS-6570).
+  // It is less expensive (than listStatus which was being used 
before) and hides the
+  // complicated access control logic underneath.
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
access operation.", e);
+  tryListStatus = true;
 } catch (final FileNotFoundException | AccessControlException e) {
-  return false;
+  logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
+  tryListStatus = true;
--- End diff --

Is access function never trustworthy for negative cases ? Or is it windows 
platform specific issue ?


---


[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

2016-11-15 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r88097089
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
this operation.", e);
--- End diff --

I am not returning false to be consistent with the existing behavior.  If 
the user does not have permission, we fail when we try to read the file during 
execution. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

2016-11-15 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r88072130
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
this operation.", e);
--- End diff --

We should return false in this case as well. Better to have a local boolean 
variable (like boolean isAccessAllowed;) and set it accordingly in try and 
catch. Then have just 1 return statement in the end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

2016-11-15 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r88072322
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
--- End diff --

Also can we please replace the comments with why we are using "fs.access" 
api instead. Will be good for future use.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #652: DRILL-4990:Use new HDFS API access instead of listS...

2016-11-14 Thread ppadma
GitHub user ppadma opened a pull request:

https://github.com/apache/drill/pull/652

DRILL-4990:Use new HDFS API access instead of listStatus to check if …

…users have permissions to access workspace.

Manually tested the fix with impersonation enabled. All unit and regression 
tests pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ppadma/drill DRILL-4990

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #652






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---