[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-12 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-690866308


   @umamaheswararao Thanks a lot!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-10 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-690866308


   @umamaheswararao Thanks a lot!




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-10 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-690016227


   @umamaheswararao Totally makes sense; updated.




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-09 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-690001235


   @umamaheswararao Sure. In fact, I did a major refactor in our internal repo to make it switch-case based (https://github.com/apache/hadoop/pull/424/files#diff-69fd14ba63365b6a428bf7142c463990R511). However, I didn't include it here because it would make the patch harder to review.
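
A rough illustration of what "switch-case based" mount-table parsing can look like: each entry is classified by link type and dispatched to a small per-type method, which also keeps any single method short. This is a hypothetical sketch, not the internal refactor; the class, the enum, and the key prefixes (`link.`, `linkFallback`, `linkRegex.`) are made up for the example and are not claimed to be the real viewfs keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not the Hadoop code): classify each mount-table entry
// by link type, then dispatch through a switch so the handling for each type
// lives in its own small method instead of one long constructor.
public class MountTableParser {

  public enum LinkType { SINGLE, FALLBACK, REGEX }

  private final List<String> parsed = new ArrayList<>();

  // Made-up key prefixes used only for this illustration.
  public static LinkType classify(String key) {
    if (key.startsWith("linkRegex.")) {
      return LinkType.REGEX;
    } else if (key.equals("linkFallback")) {
      return LinkType.FALLBACK;
    } else if (key.startsWith("link.")) {
      return LinkType.SINGLE;
    }
    return null; // not a link entry; ignored by parse()
  }

  public List<String> parse(Map<String, String> mountTable) {
    for (Map.Entry<String, String> e : mountTable.entrySet()) {
      LinkType type = classify(e.getKey());
      if (type == null) {
        continue;
      }
      switch (type) {
        case SINGLE:
          addSingleLink(e.getKey().substring("link.".length()), e.getValue());
          break;
        case FALLBACK:
          addFallbackLink(e.getValue());
          break;
        case REGEX:
          addRegexLink(e.getKey().substring("linkRegex.".length()), e.getValue());
          break;
        default:
          throw new IllegalArgumentException("Unhandled link type: " + type);
      }
    }
    return parsed;
  }

  private void addSingleLink(String src, String target) {
    parsed.add("single " + src + " -> " + target);
  }

  private void addFallbackLink(String target) {
    parsed.add("fallback -> " + target);
  }

  private void addRegexLink(String pattern, String target) {
    parsed.add("regex " + pattern + " -> " + target);
  }

  public static void main(String[] args) {
    Map<String, String> table = new LinkedHashMap<>();
    table.put("link./data", "hdfs://nn1/data");
    table.put("linkRegex.^/user/(?<user>\\w+)", "gs://bucket-${user}");
    table.put("linkFallback", "hdfs://nn1/");
    table.put("someOtherKey", "ignored");
    new MountTableParser().parse(table).forEach(System.out::println);
  }
}
```

Adding a new link type then means adding one enum constant, one `case`, and one helper method.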




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-09 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-689975586


   @umamaheswararao  Thanks a lot!
   There's one checkstyle violation. However, I guess it's not introduced by the patch: Yetus flagged an overly long method that is unchanged in the patch.
   
   > ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:487:  protected InodeTree(final Configuration config, final String viewName,:3: Method length is 191 lines (max allowed is 150). [MethodLength]
   
   As for the failing UTs, I believe they are unrelated to the patch.




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-05 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-687687926


   Thanks for the comments!
   > > I guess most caller of getMountPoints wants to traverse all the file 
systems to do some operation. E.g. setVerifyChecksum(). We didn't see issues on 
our internal Yarn + HDFS and Yarn + GCS clusters. The usage pattern includes 
but not limited to MR, Spark, Presto, Vertica loading and etc. But it's 
possible that some users might rely on these APIs.
   > 
   > In YarnClient seems to be collecting tokens from all DelegationTokenIssuer.
   > in DelegationTokenIssuer#collectDelegationTokens
   > 
   > ```
   > // Now collect the tokens from the children.
   > final DelegationTokenIssuer[] ancillary =
   >     issuer.getAdditionalTokenIssuers();
   > if (ancillary != null) {
   >   for (DelegationTokenIssuer subIssuer : ancillary) {
   >     collectDelegationTokens(subIssuer, renewer, credentials, tokens);
   >   }
   > }
   > ```
   > 
   > If you look here issuer is current fs and it's trying to get 
additionalTokenIssuers.
   > The default implementation of getDelegationTokenIssuers at FileSystem.java 
is simply getting all ChildFileSystems.
   > 
   > ```
   > @InterfaceAudience.Private
   > @Override
   > public DelegationTokenIssuer[] getAdditionalTokenIssuers()
   >     throws IOException {
   >   return getChildFileSystems();
   > }
   > ```
   > 
   > This will get all the child file systems available. Currently the 
implementation is getChildFileSystems in ViewFileSystem is like below:
   > 
   > ```
   > @Override
   > public FileSystem[] getChildFileSystems() {
   >   List<InodeTree.MountPoint<FileSystem>> mountPoints =
   >       fsState.getMountPoints();
   >   Set<FileSystem> children = new HashSet<FileSystem>();
   >   for (InodeTree.MountPoint<FileSystem> mountPoint : mountPoints) {
   >     FileSystem targetFs = mountPoint.target.targetFileSystem;
   >     children.addAll(Arrays.asList(targetFs.getChildFileSystems()));
   >   }
   >
   >   if (fsState.isRootInternalDir() && fsState.getRootFallbackLink() != null) {
   >     children.addAll(Arrays.asList(
   >         fsState.getRootFallbackLink().targetFileSystem
   >             .getChildFileSystems()));
   >   }
   >   return children.toArray(new FileSystem[]{});
   > }
   > ```
   > 
   > It's iterating over mount points available getting all targetFileSystems. 
In the case of REGEX based mount points, we will not have any childFileSystems 
available via getChildFileSystems call.
   > We also implemented ViewDistributedFileSystem to provide hdfs specific API 
compatibility. Here also we used getChildFileSystems for some APIs.
   > 
   > > Returning a MountPint with special FileSystem for Regex Mount points. We 
could cache the initialized fileSystem under the regex mountpoint and perform 
the operation. For filesystems that might appear in the future, we could cache 
the past calls from callers and try to apply it or just not support it.
   > 
   > I am thinking that, how about adding the resolved mount points from 
RegxBased to MountPoints list? So, that when user calls getMounts, it will 
simply return whatever mount points so far inited. How many unique mount points 
could be there in total with Regx based in practice (resolved mappings)? We 
should document that, with RegEX based mount points, getMountPoints will return 
only currently resolved mount points.
   
   Totally make sense. I agree it won't work well with delegation tokens now. A 
few context here, the regex mount point feature was built more than 2 years 
ago. It was before Delegation Tokens was introduced (HADOOP-14556). The version 
we are using is also before HADOOP-14556. I agree we need to do more work to 
support delegation tokens.
 +1 on document it. Internally, we are using a mixed model of regex mount 
points and normal mount points. We mainly use regex mount points for GCS 
buckets. The difference from HDFS is we could use limited user namespaces to 
represent /user. But it's hard to do it for cloud storage as we will have many 
more buckets. We could make it clear to users the pros and cons of regex mount 
points.
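
A minimal sketch of the "return only the mount points resolved so far" idea, assuming a regex mount point is a Java `Pattern` plus a target template that can reference named groups (e.g. `${user}`). `RegexMountPoint`, `resolve`, and `getResolvedMountPoints` are hypothetical names, not the patch's API, and paths are plain strings rather than Hadoop `Path`/`FileSystem` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the patch: a regex mount point whose concrete
// mount points only become known as paths are resolved. Each resolution is
// recorded so that a getMountPoints()-style call can return the mount points
// resolved so far (the limitation that would need to be documented).
public class RegexMountPoint {
  private final Pattern srcPattern;     // e.g. ^/user/(?<user>\w+)
  private final String targetTemplate;  // e.g. /bucket-${user}
  // Resolved source prefix -> resolved target prefix; safe for concurrent use.
  private final Map<String, String> resolved = new ConcurrentHashMap<>();

  public RegexMountPoint(String srcRegex, String targetTemplate) {
    this.srcPattern = Pattern.compile(srcRegex);
    this.targetTemplate = targetTemplate;
  }

  /** Returns the rewritten path, or null if the path does not match. */
  public String resolve(String path) {
    Matcher m = srcPattern.matcher(path);
    if (!m.lookingAt()) {
      return null;
    }
    String srcPrefix = m.group();
    // Expand ${name} / $n references in the template using this match.
    String targetPrefix = srcPattern.matcher(srcPrefix).replaceFirst(targetTemplate);
    resolved.putIfAbsent(srcPrefix, targetPrefix);
    return targetPrefix + path.substring(srcPrefix.length());
  }

  /** Snapshot of mount points resolved so far; it grows as paths are used. */
  public List<String> getResolvedMountPoints() {
    List<String> out = new ArrayList<>();
    resolved.forEach((src, target) -> out.add(src + " -> " + target));
    return out;
  }
}
```

For example, with pattern `^/user/(?<user>\w+)` and template `/bucket-${user}`, resolving `/user/alice/data` yields `/bucket-alice/data` and records `/user/alice -> /bucket-alice`. The order of the returned snapshot is unspecified because it comes from a `ConcurrentHashMap`.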
   
   > > We did see an issue with addDelegationTokens in the secure Hadoop 
cluster. But the problem we met is not all normal mountpoints are secure. So 
the API caused a problem when it tries to initialize all children's file 
systems. We took a workaround by making it path-based. As for 
getDelegationTokens, I guess the problem is similar. We didn't see issues 
because it's not used. Could we make it path based too?
   > 
   > Certainly we can make it uri path based. However users need to make use of 
it and it could be a long term improvement because users would not change 
immediately to new APIs what we introduce now. It will take longer time for 
upstream projects to change.
   > 
   > > Could we make the inner cache a thread-safe structure and track all the 
opened file systems under 

[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-02 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-686183399


   Thanks @umamaheswararao 
   
   
   
   @umamaheswararao Thanks for the comments. Please see my replies inline.
   > Hi @JohnZZGithub, I got few other points to discuss.
   > 
   > 1. We have exposed getMountPoints API. It seems we can't return any mount 
points from REGEX based because you would not know until you got src paths to 
resolve and find real target fs. What should we do for this API?
   
   Great question. I guess most callers of getMountPoints want to traverse all the file systems to perform some operation, e.g. setVerifyChecksum(). We didn't see issues on our internal YARN + HDFS and YARN + GCS clusters, where usage includes, but is not limited to, MR, Spark, Presto, and Vertica loading. But it's possible that some users rely on these APIs. I see two options:
   1. Return a MountPoint with a special FileSystem for regex mount points. We could cache the file systems initialized under the regex mount point and perform the operation on them. For file systems that might appear in the future, we could either record past calls and replay them, or simply not support it.
   2. Document that such APIs are not supported for regex mount points.
   To extend the topic a little: this kind of ViewFileSystem API (one that visits all child file systems) has caused several problems for us. For example, setVerifyChecksum() initialized a file system for a mount point that users didn't intend to use at all, and that initialization failed because it requires credentials, which those users don't have since they never meant to access the mount point. We developed a LazyChRootedFileSystem (not public) on top of every target file system to initialize it lazily for path-based APIs, but APIs that don't take a path are hard to handle this way. To summarize, some users want to keep these non-path-based APIs from triggering actions on every child file system, while others (rare in our scenarios) want these APIs applied to all child file systems. I feel it's hard to satisfy both needs.
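
The lazy-initialization idea can be sketched as follows. LazyChRootedFileSystem is internal and not public, so this is not its code; `LazyTarget` and `applyToInitialized` are hypothetical names, and a generic `T` stands in for a target `FileSystem`. Path-based calls go through `get()`, which creates the target on first use, while "apply to every child" calls only touch targets that were already created.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of lazy target initialization. A target is created only
// when a path-based call actually needs it; broadcast operations (in the
// spirit of setVerifyChecksum) skip targets that were never used, so nobody
// needs credentials for mount points they never access.
public class LazyTarget<T> {
  private final Supplier<T> factory;
  private volatile T instance;

  public LazyTarget(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Path-based APIs call this: initializes the target on first use. */
  public T get() {
    T result = instance;
    if (result == null) {
      synchronized (this) {
        result = instance;
        if (result == null) {
          result = factory.get();  // may need credentials; only runs on demand
          instance = result;
        }
      }
    }
    return result;
  }

  public boolean isInitialized() {
    return instance != null;
  }

  /** Non-path APIs apply the operation only to already-initialized targets. */
  public static <T> int applyToInitialized(List<LazyTarget<T>> targets, Consumer<T> op) {
    int applied = 0;
    for (LazyTarget<T> t : targets) {
      T fs = t.instance;
      if (fs != null) {
        op.accept(fs);
        applied++;
      }
    }
    return applied;
  }
}
```

The trade-off mentioned above is visible here: a target created after `applyToInitialized` runs never sees that operation, which suits users who want to avoid touching unused mount points but not users who expect the call to reach every child.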
   
   > 2. Other API is getDelegationTokenIssuers. Applications like YARN uses this API to get all child fs delegation tokens. This also will not work for REGEX based mount points.

   We did see an issue with addDelegationTokens in a secure Hadoop cluster, but the problem we hit was that not all normal mount points are secure, so the API failed when it tried to initialize all child file systems. We worked around it by making it path-based. As for getDelegationTokens, I guess the problem is similar; we just didn't see issues because it isn't used. Could we make it path-based too? Or we could take the approach described for point 1.
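
A sketch of what "path-based" token collection could mean: instead of asking every child file system for a token, which forces all of them to initialize, only the targets backing the paths a job actually uses are contacted, one token per distinct service. `PathBasedTokens`, `TokenSource`, and the resolver function are hypothetical stand-ins, not Hadoop APIs; a real version would resolve through the mount table and add real tokens to `Credentials`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of path-based token collection. Only the targets that
// the given paths resolve to are asked for tokens, so mount points the job
// never touches (secure or not) are never initialized.
public class PathBasedTokens {

  /** Stand-in for a target file system that can issue a token. */
  public interface TokenSource {
    String serviceName();               // used to de-duplicate targets
    String issueToken(String renewer);  // stand-in for a real delegation token
  }

  /** Returns serviceName -> token for the targets backing the given paths. */
  public static Map<String, String> collect(List<String> paths,
      Function<String, TokenSource> resolver, String renewer) {
    Map<String, String> tokens = new LinkedHashMap<>();
    for (String path : paths) {
      TokenSource target = resolver.apply(path);
      // Several paths may map to the same target; fetch one token per service.
      tokens.computeIfAbsent(target.serviceName(), s -> target.issueToken(renewer));
    }
    return tokens;
  }
}
```

The usual caveat from the thread still applies: callers have to pass the paths in, so existing applications that call the all-children APIs would not benefit until they switch.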
   
   > 3. Other question is how this child filesystem objects gets closed. There was an issue with [ViewFileSystem#close](https://issues.apache.org/jira/browse/HADOOP-15565). I would like to know how that get addressed in this case as don't keep anything in InnerCache.

   Could we make the inner cache a thread-safe structure and track all the file systems opened under regex mount points?
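
One way to read that suggestion, as a minimal sketch: a thread-safe cache keyed by resolved target URI, so each child is opened at most once, and whose `close()` closes every child that was opened. `RegexTargetCache` is a hypothetical name, and `Closeable` stands in for `FileSystem`; this is not the existing InnerCache.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a thread-safe cache of child file systems created while
// resolving regex mount points. Each target URI is opened at most once, and
// close() closes every child opened so far (the ViewFileSystem#close concern).
public class RegexTargetCache<T extends Closeable> implements Closeable {
  private final Map<String, T> cache = new ConcurrentHashMap<>();
  private final Function<String, T> opener;

  public RegexTargetCache(Function<String, T> opener) {
    this.opener = opener;
  }

  /** Returns the cached child for this target URI, opening it if needed. */
  public T get(String targetUri) {
    // computeIfAbsent is atomic per key on ConcurrentHashMap, so concurrent
    // callers resolving the same target share a single instance.
    return cache.computeIfAbsent(targetUri, opener);
  }

  public int size() {
    return cache.size();
  }

  @Override
  public void close() throws IOException {
    IOException first = null;
    for (T child : cache.values()) {
      try {
        child.close();
      } catch (IOException e) {
        // Keep closing the remaining children; report all failures together.
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    cache.clear();
    if (first != null) {
      throw first;
    }
  }
}
```

Opening a child inside `computeIfAbsent` holds that map bin's lock while the child initializes; that is acceptable for a sketch, but a real implementation may prefer to open outside the map and race with `putIfAbsent`.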
   
   These are really great points, thanks a lot.
   
   
   
   
   
   
   
   





[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-01 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-685248748


   @umamaheswararao Thanks. I rebased and updated the patch.
   Created https://issues.apache.org/jira/browse/HADOOP-17238 to track the mount point resolution cache problem.
   As for `porting into FileContext`, do you mean fixing the APIs that are not implemented in ViewFileSystem so that FileContext can find the right path, e.g. the createMultipartUploader API?





[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-01 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-685028507


   @umamaheswararao  Sure, thanks a ton!




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-25 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-680179907


   @umamaheswararao  No worries, thanks a lot!




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-18 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-675689087


   @umamaheswararao Thanks for the review. I updated the diff based on the comments. There may still be some issues with the docs and comments; I will keep an eye on them. Thanks




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-11 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-672293070


   Thank you! Let me check ViewFs.java. Functionality-wise, this patch works for MR jobs and HDFS use cases in our internal clusters.
   
   > I will review it in a day or two, Thanks
   > BTW, you may need the similar changes in ViewFs.java as well, I think nfly 
also missed there.
   
   




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-10 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-671589516


   Just did a rebase and updated the PR.




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-07 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-670807166


   The UT failures are not related to the change. Here are some references:
   https://issues.apache.org/jira/browse/HADOOP-15891 
   Design doc: 
https://issues.apache.org/jira/secure/attachment/12946315/HDFS-13948_%20Regex%20Link%20Type%20In%20Mount%20Table-v1.pdf
   




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-05 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-669606794


   Here are the failed UTs. They don't look related to me, but I can double-check. Thanks
   
   
   [ERROR] Failures:
   [ERROR]   TestBPOfferService.testMissBlocksWhenReregister:350
   [ERROR]   TestDataNodeErasureCodingMetrics.testFullBlock:97->doTest:205 Wrongly computed block reconstruction work
   [ERROR]   TestNameNodeRetryCacheMetrics.testRetryCacheMetrics:95->checkMetrics:103 CacheHit expected:<2> but was:<0>
   [ERROR] Errors:
   [ERROR]   TestHDFSContractMultipartUploader>AbstractContractMultipartUploaderTest.testConcurrentUploads:815 » IllegalArgument
   [ERROR]   TestBlockTokenWithDFSStriped.testRead:92->TestBlockTokenWithDFS.doTestRead:508->isBlockTokenExpired:139->TestBlockTokenWithDFS.isBlockTokenExpired:633 » NullPointer
   [ERROR]   TestBootstrapStandby.testSuccessfulBaseCase:130->restartNameNodesFromIndex:342 » Bind
   [ERROR]   TestExternalStoragePolicySatisfier.testChooseInSameDatanodeWithONESSDShouldNotChooseIfNoSpace:1064 » Timeout
   [INFO]
   [ERROR] Tests run: 6500, Failures: 3, Errors: 4, Skipped: 23
   [INFO]
   [ERROR] There are test failures.




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-03 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-668314560


   @templedf It would be great if you could help with the review, thanks.




[GitHub] [hadoop] JohnZZGithub commented on pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-03 Thread GitBox


JohnZZGithub commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-668288316


   @liuml07 Thanks. This feature has been in use inside our company for almost two years. The code is almost the same as our internal branch, except that I removed some refactored code to make it easier to review. It seems the rebase caused some UT failures; let me fix the UTs and update the user guide.


