[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-09 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r486063312



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ConfigUtil.java
##
@@ -166,6 +166,42 @@ public static void addLinkNfly(final Configuration conf, 
final String src,
 addLinkNfly(conf, getDefaultMountTableName(conf), src, null, targets);
   }
 
+
+  /**
+   * Add a LinkRegex to the config for the specified mount table.
+   * @param conf - the configuration to add the mount table entry to
+   * @param mountTableName - the name of the mount table for this regex config item
+   * @param srcRegex - the src path regex expression that applies to this 
config
+   * @param targetStr - the string of target path
+   */

Review comment:
   Is this method not used anywhere?
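
   If the method is kept, here is a minimal usage sketch, assuming the parameter list implied by the javadoc above; the helper name addLinkRegex and all values are illustrative, not taken from the quoted hunk:

       // Hypothetical caller, e.g. in a test setup (values are examples only).
       Configuration conf = new Configuration();
       ConfigUtil.addLinkRegex(conf, "cluster1",      // mount table name
           "^/user/(?<username>\\w+)",                // src path regex
           "/namenode1/user/${username}");            // target path string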








[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-09 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r485953722



##
File path: hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFs.md
##
@@ -366,6 +366,69 @@ Don't want to change scheme or difficult to copy 
mount-table configurations to a
 
 Please refer to the [View File System Overload Scheme 
Guide](./ViewFsOverloadScheme.html)
 
+Regex Pattern Based Mount Points
+
+
+The view file system mount points have been a key-value based mapping system. This is not friendly for use cases where the mapping configuration could be abstracted into rules, e.g. users want to provide a GCS bucket per user and there might be thousands of users in total. The old key-value based approach won't work well, for several reasons:
+
+1. The mount table is used by FileSystem clients. There is a cost to spreading the config to all clients, and we should avoid it if possible. The [View File System Overload Scheme Guide](./ViewFsOverloadScheme.html) helps distribution through central mount table management, but the mount table still has to be updated on every change. Most such updates could be avoided with a rule-based mount table.
+
+2. The client has to understand all the key-value pairs in the mount table. This is not ideal when the mount table grows to thousands of items, e.g. thousands of file systems might be initialized even though a user needs only one, and the config itself becomes bloated at scale.
+
+### Understand the Difference
+
+In the key-value based mount table, the view file system treats every mount point as a partition. Several file system APIs lead to an operation on all partitions. For example, consider an HDFS cluster with multiple mount points. A user runs the "hadoop fs -put file viewfs://hdfs.namenode.apache.org/tmp/" command to copy data from local disk to the HDFS cluster. The command triggers ViewFileSystem to call the setVerifyChecksum() method, which initializes the file system for every mount point.
+For a regex-based mount table entry, the corresponding target path cannot be known until the source path is parsed, so regex-based entries are ignored in such cases. The file system (ChRootedFileSystem) is created upon access, and the underlying file system is cached by the inner cache of ViewFileSystem.

Review comment:
   Thanks for filing a separate JIRA. It makes sense to me!
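
   For readers skimming the thread, a rough sketch of what a regex mount point entry could look like, assuming the usual fs.viewfs.mounttable.<mount-table-name>. prefix and the linkRegex key; the exact key syntax (including how interceptor settings are attached) is defined by this PR and may differ:

       // Hypothetical configuration; key layout and values are illustrative only.
       Configuration conf = new Configuration();
       conf.set("fs.viewfs.mounttable.cluster1.linkRegex.^/user/(?<username>\\w+)",
           "/namenode1/user/${username}");
       // A path such as viewfs://cluster1/user/alice/data would then resolve to
       // /namenode1/user/alice/data on the target file system (illustrative).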








[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-08 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r485338866



##
File path: hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFs.md
##
@@ -366,6 +366,69 @@ Don't want to change scheme or difficult to copy 
mount-table configurations to a
 
 Please refer to the [View File System Overload Scheme 
Guide](./ViewFsOverloadScheme.html)
 
+Regex Pattern Based Mount Points
+
+
+The view file system mount points have been a key-value based mapping system. This is not friendly for use cases where the mapping configuration could be abstracted into rules, e.g. users want to provide a GCS bucket per user and there might be thousands of users in total. The old key-value based approach won't work well, for several reasons:
+
+1. The mount table is used by FileSystem clients. There is a cost to spreading the config to all clients, and we should avoid it if possible. The [View File System Overload Scheme Guide](./ViewFsOverloadScheme.html) helps distribution through central mount table management, but the mount table still has to be updated on every change. Most such updates could be avoided with a rule-based mount table.
+
+2. The client has to understand all the key-value pairs in the mount table. This is not ideal when the mount table grows to thousands of items, e.g. thousands of file systems might be initialized even though a user needs only one, and the config itself becomes bloated at scale.
+
+### Understand the Difference
+
+In the key-value based mount table, the view file system treats every mount point as a partition. Several file system APIs lead to an operation on all partitions. For example, consider an HDFS cluster with multiple mount points. A user runs the "hadoop fs -put file viewfs://hdfs.namenode.apache.org/tmp/" command to copy data from local disk to the HDFS cluster. The command triggers ViewFileSystem to call the setVerifyChecksum() method, which initializes the file system for every mount point.
+For a regex-based mount table entry, the corresponding target path cannot be known until the source path is parsed, so regex-based entries are ignored in such cases. The file system (ChRootedFileSystem) is created upon access, and the underlying file system is cached by the inner cache of ViewFileSystem.

Review comment:
   >For a regex-based mount table entry, the corresponding target path cannot be known until the source path is parsed.
   
   Should whatever we do know be added to mountPoints, so that getMountPoints returns the known file systems? It may also be a good idea to add Javadoc at the API level.
   I am OK with covering all of these aspects in a follow-up JIRA.
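
   To make the concern concrete, a hedged sketch of how a client enumerates mount points today; a regex-based entry that has not been resolved yet would not appear in this listing (the viewfs URI is illustrative, and conf is an existing Configuration):

       // Sketch only: list the statically configured mount points.
       ViewFileSystem viewFs = (ViewFileSystem) FileSystem.get(
           URI.create("viewfs://ClusterX/"), conf);
       for (ViewFileSystem.MountPoint mp : viewFs.getMountPoints()) {
         System.out.println(mp.getMountedOnPath() + " -> "
             + Arrays.toString(mp.getTargetFileSystemURIs()));
       }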








[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-08 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r485332518



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
##
@@ -217,6 +239,7 @@ public Path getMountedOnPath() {
   Path homeDir = null;
   private boolean enableInnerCache = false;
   private InnerCache cache;
+  private boolean evictCacheOnClose = false;

Review comment:
   Do you see any issue if we make it true? If there are no issues, we can simply clean the cache on close instead of adding another config.
   
   This seems like an improvement to the existing code. If you want, I am OK with filing a small JIRA to fix this cleanup separately. (I am assuming it's not strictly needed for this change.)
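
   For clarity, a hedged sketch of the alternative being suggested here, i.e. cleaning the inner cache unconditionally in close() instead of guarding it with a new flag; the InnerCache method name below is an assumption, and the real close() does more:

       @Override
       public void close() throws IOException {
         super.close();
         if (enableInnerCache && cache != null) {
           cache.closeAll();  // assumed helper: close and drop cached child file systems
         }
       }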

##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestRegexMountPoint.java
##
@@ -0,0 +1,160 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.viewfs;
+
+import java.io.IOException;
+import java.net.URI;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.hadoop.conf.Configuration;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Test Regex Mount Point.
+ */
+public class TestRegexMountPoint {
+  private static final Logger LOGGER =
+  LoggerFactory.getLogger(TestRegexMountPoint.class.getName());
+
+  private InodeTree inodeTree;
+  private Configuration conf;
+
+  class TestRegexMountPointFileSystem {
+public URI getUri() {
+  return uri;
+}
+
+private URI uri;
+
+TestRegexMountPointFileSystem(URI uri) {
+  String uriStr = uri == null ? "null" : uri.toString();
+  LOGGER.info("Create TestRegexMountPointFileSystem Via URI:" + uriStr);
+  this.uri = uri;
+}
+  }
+
+  @Before
+  public void setUp() throws Exception {
+conf = new Configuration();
+ConfigUtil.addLink(conf, TestRegexMountPoint.class.getName(), "/mnt",
+URI.create("file:///"));
+
+inodeTree = new InodeTree(conf,
+TestRegexMountPoint.class.getName(), null, false) {
+  @Override
+  protected TestRegexMountPointFileSystem getTargetFileSystem(
+  final URI uri) {
+return new TestRegexMountPointFileSystem(uri);
+  }
+
+  @Override
+  protected TestRegexMountPointFileSystem getTargetFileSystem(
+  final INodeDir dir) {
+return new TestRegexMountPointFileSystem(null);
+  }
+
+  @Override
+  protected TestRegexMountPointFileSystem getTargetFileSystem(
+  final String settings, final URI[] mergeFsURIList) {
+return new TestRegexMountPointFileSystem(null);
+  }
+};
+  }
+
+  @After
+  public void tearDown() throws Exception {
+inodeTree = null;
+  }
+
+  @Test
+  public void testGetVarListInString() throws IOException {
+String srcRegex = "/(\\w+)";
+String target = "/$0/${1}/$1/${2}/${2}";
+RegexMountPoint regexMountPoint =
+new RegexMountPoint(inodeTree, srcRegex, target, null);
+regexMountPoint.initialize();
+Map<String, Set<String>> varMap = regexMountPoint.getVarInDestPathMap();
+Assert.assertEquals(varMap.size(), 3);
+Assert.assertEquals(varMap.get("0").size(), 1);
+Assert.assertTrue(varMap.get("0").contains("$0"));
+Assert.assertEquals(varMap.get("1").size(), 2);
+Assert.assertTrue(varMap.get("1").contains("${1}"));
+Assert.assertTrue(varMap.get("1").contains("$1"));
+Assert.assertEquals(varMap.get("2").size(), 1);
+Assert.assertTrue(varMap.get("2").contains("${2}"));
+  }
+
+  @Test
+  public void testResolve() throws IOException {
+String regexStr = "^/user/(?<username>\\w+)";
+String dstPathStr = "/namenode1/testResolve/$username";
+String settingsStr = null;
+RegexMountPoint regexMountPoint =
+new RegexMountPoint(inodeTree, regexStr, dstPathStr, settingsStr);
+regexMountPoint.initialize();
+InodeTree.ResolveResult resolveResult =
+regexMountPoint.resolve("/user/hadoop/file1", true);
+Assert.assertEquals(resolveResult.kind, 

[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-06 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r484092172



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestRegexMountPointResolvedDstPathReplaceInterceptor.java
##
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.viewfs;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.Path;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static 
org.apache.hadoop.fs.viewfs.RegexMountPointInterceptorType.REPLACE_RESOLVED_DST_PATH;
+
+/**
+ * Test RegexMountPointResolvedDstPathReplaceInterceptor.
+ */
+public class TestRegexMountPointResolvedDstPathReplaceInterceptor {
+
+  public String createSerializedString(String regex, String replaceString) {
+return REPLACE_RESOLVED_DST_PATH.getConfigName()
++ RegexMountPoint.INTERCEPTOR_INTERNAL_SEP + regex
++ RegexMountPoint.INTERCEPTOR_INTERNAL_SEP + replaceString;
+  }
+
+  @Test
+  public void testDeserializeFromStringNormalCase() throws IOException {
+String srcRegex = "-";
+String replaceString = "_";
+String serializedString = createSerializedString(srcRegex, replaceString);
+RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
+RegexMountPointResolvedDstPathReplaceInterceptor
+.deserializeFromString(serializedString);
+Assert.assertTrue(interceptor.getSrcRegexString().equals(srcRegex));
+Assert.assertTrue(interceptor.getReplaceString().equals(replaceString));
+Assert.assertTrue(interceptor.getSrcRegexPattern() == null);
+interceptor.initialize();
+Assert.assertTrue(
+interceptor.getSrcRegexPattern().toString().equals(srcRegex));
+  }
+
+  @Test
+  public void testDeserializeFromStringBadCase() throws IOException {
+String srcRegex = "-";
+String replaceString = "_";
+String serializedString = createSerializedString(srcRegex, replaceString);
+serializedString = serializedString + ":ddd";
+RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
+RegexMountPointResolvedDstPathReplaceInterceptor
+.deserializeFromString(serializedString);
+Assert.assertEquals(interceptor, null);
+  }
+
+  @Test
+  public void testSerialization() {
+String srcRegex = "word1";
+String replaceString = "word2";
+String serializedString = createSerializedString(srcRegex, replaceString);
+RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
+new RegexMountPointResolvedDstPathReplaceInterceptor(srcRegex,
+replaceString);
+Assert.assertEquals(interceptor.serializeToString(), serializedString);
+  }
+
+  @Test

Review comment:
   No worries. Thanks for addressing. :-)








[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-02 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r481883237



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/RegexMountPointResolvedDstPathReplaceInterceptor.java
##
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.viewfs;
+
+import java.io.IOException;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.hadoop.fs.viewfs.RegexMountPointInterceptorType.REPLACE_RESOLVED_DST_PATH;
+
+/**
+ * Implementation of RegexMountPointResolvedDstPathReplaceInterceptor.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+class RegexMountPointResolvedDstPathReplaceInterceptor
+implements RegexMountPointInterceptor {
+
+  private String srcRegexString;
+  private String replaceString;
+  private Pattern srcRegexPattern;
+
+  RegexMountPointResolvedDstPathReplaceInterceptor(String srcRegex,
+  String replaceString) {
+this.srcRegexString = srcRegex;
+this.replaceString = replaceString;
+this.srcRegexPattern = null;
+  }
+
+  public String getSrcRegexString() {
+return srcRegexString;
+  }
+
+  public String getReplaceString() {
+return replaceString;
+  }
+
+  public Pattern getSrcRegexPattern() {
+return srcRegexPattern;
+  }
+
+  @Override
+  public void initialize() throws IOException {
+try {
+  srcRegexPattern = Pattern.compile(srcRegexString);
+} catch (PatternSyntaxException ex) {
+  throw new IOException(
+  "Initialize interceptor failed, srcRegx:" + srcRegexString, ex);
+}
+  }
+
+  /**
+   * Intercept source before resolution.
+   *
+   * @param source
+   * @return
+   */
+  @Override
+  public String interceptSource(String source) {
+return source;
+  }
+
+  /**
+   * Intercept resolved path, e.g.
+   * Mount point /^(\\w+)/, ${1}.hadoop.net
+   * If incoming path is /user1/home/tmp/job1,
+   * then the resolved path str will be user1.
+   *
+   * @return intercepted string
+   */
+  @Override public String interceptResolvedDestPathStr(
+  String parsedDestPathStr) {
+Matcher matcher = srcRegexPattern.matcher(parsedDestPathStr);
+return matcher.replaceAll(replaceString);
+  }
+
+  /**
+   * Intercept remaining path.
+   *
+   * @return intercepted path
+   */
+  @Override public Path interceptRemainingPath(Path remainingPath) {

Review comment:
   The splitting of the remaining part feels like an implementation detail to me. Users may not have a clear idea of it unless they are very advanced users.
   My question was: what if we use the same API for the remaining part as well? In your example, what would happen with interceptSource("dir1/dir2/dir3/file3")? Interceptors are just for replacing one pattern with another, right? Let me know if I am missing something.
   Thanks for your efforts on fixing the other comments.
   
   Also, please take a look at the checkstyle warnings reported in the latest report.
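
   To make the question concrete, a small sketch based on the interceptor code quoted above: interceptSource() is currently a pass-through, while interceptResolvedDestPathStr() applies the replacement (input values are illustrative, e.g. inside a test method declared to throw IOException):

       RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
           new RegexMountPointResolvedDstPathReplaceInterceptor("-", "_");
       interceptor.initialize();
       // Pass-through: returns "dir1/dir2/dir3/file3" unchanged.
       String src = interceptor.interceptSource("dir1/dir2/dir3/file3");
       // Replacement applied: "user-1/home" becomes "user_1/home".
       String dst = interceptor.interceptResolvedDestPathStr("user-1/home");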








[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-01 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r481539041



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestRegexMountPointResolvedDstPathReplaceInterceptor.java
##
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.viewfs;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.Path;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static 
org.apache.hadoop.fs.viewfs.RegexMountPointInterceptorType.REPLACE_RESOLVED_DST_PATH;
+
+/**
+ * Test RegexMountPointResolvedDstPathReplaceInterceptor.
+ */
+public class TestRegexMountPointResolvedDstPathReplaceInterceptor {
+
+  public String createSerializedString(String regex, String replaceString) {
+return REPLACE_RESOLVED_DST_PATH.getConfigName()
++ RegexMountPoint.INTERCEPTOR_INTERNAL_SEP + regex
++ RegexMountPoint.INTERCEPTOR_INTERNAL_SEP + replaceString;
+  }
+
+  @Test
+  public void testDeserializeFromStringNormalCase() throws IOException {
+String srcRegex = "-";
+String replaceString = "_";
+String serializedString = createSerializedString(srcRegex, replaceString);
+RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
+RegexMountPointResolvedDstPathReplaceInterceptor
+.deserializeFromString(serializedString);
+Assert.assertTrue(interceptor.getSrcRegexString().equals(srcRegex));
+Assert.assertTrue(interceptor.getReplaceString().equals(replaceString));
+Assert.assertTrue(interceptor.getSrcRegexPattern() == null);
+interceptor.initialize();
+Assert.assertTrue(
+interceptor.getSrcRegexPattern().toString().equals(srcRegex));
+  }
+
+  @Test
+  public void testDeserializeFromStringBadCase() throws IOException {
+String srcRegex = "-";
+String replaceString = "_";
+String serializedString = createSerializedString(srcRegex, replaceString);
+serializedString = serializedString + ":ddd";
+RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
+RegexMountPointResolvedDstPathReplaceInterceptor
+.deserializeFromString(serializedString);
+Assert.assertEquals(interceptor, null);
+  }
+
+  @Test
+  public void testSerialization() {
+String srcRegex = "word1";
+String replaceString = "word2";
+String serializedString = createSerializedString(srcRegex, replaceString);
+RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
+new RegexMountPointResolvedDstPathReplaceInterceptor(srcRegex,
+replaceString);
+Assert.assertEquals(interceptor.serializeToString(), serializedString);
+  }
+
+  @Test

Review comment:
   I mean we just need an assertion, I think. For example, a test for "interceptSource" should simply validate what the method returns; in this case it returns the same string. It may not have much value now, but if someone changes the behaviour, the assertion will catch it.
   Currently this test can never fail.
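
   A minimal sketch of the kind of assertion being asked for, pinning down the current pass-through behaviour so a future change is caught; the test name and path are illustrative:

       @Test
       public void testInterceptSource() {
         RegexMountPointResolvedDstPathReplaceInterceptor interceptor =
             new RegexMountPointResolvedDstPathReplaceInterceptor("-", "_");
         String sourcePath = "/user/hadoop-user/dir1";
         // interceptSource() currently returns its input unchanged.
         Assert.assertEquals(sourcePath, interceptor.interceptSource(sourcePath));
       }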








[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-09-01 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r480836939



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/Constants.java
##
@@ -86,12 +86,21 @@
*/
   String CONFIG_VIEWFS_LINK_MERGE_SLASH = "linkMergeSlash";
 
+  /**
+   * Config variable for specifying a regex link which uses regular expressions
+   * as source and target could use group captured in src.
+   * E.g. (^/(?<firstDir>\\w+), /prefix-${firstDir}) =>
+   *   (/path1/file1 => /prefix-path1/file1)
+   */
+  String CONFIG_VIEWFS_LINK_REGEX = "linkRegex";
+
   FsPermission PERMISSION_555 = new FsPermission((short) 0555);
 
   String CONFIG_VIEWFS_RENAME_STRATEGY = "fs.viewfs.rename.strategy";
 
   /**
* Enable ViewFileSystem to cache all children filesystems in inner cache.

Review comment:
   Can the comment below be corrected? Regex-based mount points will not use this caching now.
   Otherwise people could get confused and may tend to disable it. IIUC, even if they enable this, regex-based mount points will continue to work, right?

##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/RegexMountPoint.java
##
@@ -0,0 +1,306 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.viewfs;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.hadoop.fs.viewfs.InodeTree.SlashPath;
+
+/**
+ * Regex mount point is build to implement regex based mount point.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+class RegexMountPoint {
+  private static final Logger LOGGER =
+  LoggerFactory.getLogger(RegexMountPoint.class.getName());
+
+  private InodeTree inodeTree;
+  private String srcPathRegex;
+  private Pattern srcPattern;
+  private String dstPath;
+  private String interceptorSettingsString;
+  private List<RegexMountPointInterceptor> interceptorList;
+
+  public static final String SETTING_SRCREGEX_SEP = "#.";
+  public static final char INTERCEPTOR_SEP = ';';
+  public static final char INTERCEPTOR_INTERNAL_SEP = ':';
+  // ${var},$var
+  public static final Pattern VAR_PATTERN_IN_DEST =
+  Pattern.compile("\\$((\\{\\w+\\})|(\\w+))");
+
+  // Same var might have different representations.
+  // e.g.
+  // key => $key or key = > ${key}
+  private Map<String, Set<String>> varInDestPathMap;
+
+  public Map<String, Set<String>> getVarInDestPathMap() {
+return varInDestPathMap;
+  }
+
+  RegexMountPoint(InodeTree inodeTree, String sourcePathRegex,
+  String destPath, String settingsStr) {
+this.inodeTree = inodeTree;
+this.srcPathRegex = sourcePathRegex;
+this.dstPath = destPath;
+this.interceptorSettingsString = settingsStr;
+this.interceptorList = new ArrayList<>();
+  }
+
+  /**
+   * Initialize regex mount point.
+   *
+   * @throws IOException
+   */
+  public void initialize() throws IOException {
+try {
+  srcPattern = Pattern.compile(srcPathRegex);
+} catch (PatternSyntaxException ex) {
+  throw new IOException(
+  "Failed to initialized mount point due to bad src path regex:"
+  + srcPathRegex + ", dstPath:" + dstPath, ex);
+}
+varInDestPathMap = getVarListInString(dstPath);
+initializeInterceptors();
+  }
+
+  private void initializeInterceptors() throws IOException {
+if (interceptorSettingsString == null
+|| interceptorSettingsString.isEmpty()) {
+  return;
+}
+String[] interceptorStrArray =
+StringUtils.split(interceptorSettingsString, INTERCEPTOR_SEP);
+for (String interceptorStr : interceptorStrArray) {
+  RegexMountPointInterceptor interceptor =
+  

[GitHub] [hadoop] umamaheswararao commented on a change in pull request #2185: HADOOP-15891. provide Regex Based Mount Point In Inode Tree

2020-08-17 Thread GitBox


umamaheswararao commented on a change in pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r471268265



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ConfigUtil.java
##
@@ -166,6 +166,41 @@ public static void addLinkNfly(final Configuration conf, 
final String src,
 addLinkNfly(conf, getDefaultMountTableName(conf), src, null, targets);
   }
 
+
+  /**
+   * Add a LinkRegex to the config for the specified mount table.

Review comment:
   Below params javadoc does not help anything I think. If you want to add, 
please add description( most of that params seems self explanatory ), otherwise 
just remove params part.

##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java
##
@@ -646,102 +714,222 @@ boolean isInternalDir() {
   }
 
   /**
-   * Resolve the pathname p relative to root InodeDir
+   * Resolve the pathname p relative to root InodeDir.
* @param p - input path
* @param resolveLastComponent
* @return ResolveResult which allows further resolution of the remaining 
path
* @throws FileNotFoundException
*/
   ResolveResult resolve(final String p, final boolean resolveLastComponent)
   throws FileNotFoundException {
-String[] path = breakIntoPathComponents(p);
-if (path.length <= 1) { // special case for when path is "/"
-  T targetFs = root.isInternalDir() ?
-  getRootDir().getInternalDirFs() : 
getRootLink().getTargetFileSystem();
-  ResolveResult res = new ResolveResult(ResultKind.INTERNAL_DIR,
-  targetFs, root.fullPath, SlashPath);
-  return res;
-}
+ResolveResult resolveResult = null;
+resolveResult = getResolveResultFromCache(p, resolveLastComponent);
+if (resolveResult != null) {
+  return resolveResult;
+}
+
+try {
+  String[] path = breakIntoPathComponents(p);
+  if (path.length <= 1) { // special case for when path is "/"
+T targetFs = root.isInternalDir() ?
+getRootDir().getInternalDirFs()
+: getRootLink().getTargetFileSystem();
+resolveResult = new ResolveResult(ResultKind.INTERNAL_DIR,
+targetFs, root.fullPath, SlashPath);
+return resolveResult;
+  }
 
-/**
- * linkMergeSlash has been configured. The root of this mount table has
- * been linked to the root directory of a file system.
- * The first non-slash path component should be name of the mount table.
- */
-if (root.isLink()) {
-  Path remainingPath;
-  StringBuilder remainingPathStr = new StringBuilder();
-  // ignore first slash
-  for (int i = 1; i < path.length; i++) {
-remainingPathStr.append("/").append(path[i]);
+  /**
+   * linkMergeSlash has been configured. The root of this mount table has
+   * been linked to the root directory of a file system.
+   * The first non-slash path component should be name of the mount table.
+   */
+  if (root.isLink()) {
+Path remainingPath;
+StringBuilder remainingPathStr = new StringBuilder();
+// ignore first slash
+for (int i = 1; i < path.length; i++) {
+  remainingPathStr.append("/").append(path[i]);
+}
+remainingPath = new Path(remainingPathStr.toString());
+resolveResult = new ResolveResult(ResultKind.EXTERNAL_DIR,
+getRootLink().getTargetFileSystem(), root.fullPath, remainingPath);
+return resolveResult;
   }
-  remainingPath = new Path(remainingPathStr.toString());
-  ResolveResult res = new ResolveResult(ResultKind.EXTERNAL_DIR,
-  getRootLink().getTargetFileSystem(), root.fullPath, remainingPath);
-  return res;
-}
-Preconditions.checkState(root.isInternalDir());
-INodeDir curInode = getRootDir();
+  Preconditions.checkState(root.isInternalDir());
+  INodeDir curInode = getRootDir();
 
-int i;
-// ignore first slash
-for (i = 1; i < path.length - (resolveLastComponent ? 0 : 1); i++) {
-  INode nextInode = curInode.resolveInternal(path[i]);
-  if (nextInode == null) {
-if (hasFallbackLink()) {
-  return new ResolveResult(ResultKind.EXTERNAL_DIR,
-  getRootFallbackLink().getTargetFileSystem(),
-  root.fullPath, new Path(p));
-} else {
-  StringBuilder failedAt = new StringBuilder(path[0]);
-  for (int j = 1; j <= i; ++j) {
-failedAt.append('/').append(path[j]);
+  // Try to resolve path in the regex mount point
+  resolveResult = tryResolveInRegexMountpoint(p, resolveLastComponent);
+  if (resolveResult != null) {
+return resolveResult;
+  }
+
+  int i;
+  // ignore first slash
+  for (i = 1; i < path.length - (resolveLastComponent ? 0 : 1); i++) {
+INode nextInode = curInode.resolveInternal(path[i]);
+if (nextInode ==