[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-16 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r284734055
 
 

 ##
 File path: 
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
 ##
 @@ -1213,8 +1213,12 @@
 
 
   fs.s3a.connection.maximum
-  15
-  Controls the maximum number of simultaneous connections to 
S3.
+  48
 
 Review comment:
   moved to 72. Now, 64 threads *seems* a lot, but most of these are blocking 
operations rather than IO heavy actions.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-16 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r284735973
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
 ##
 @@ -1215,6 +1215,18 @@ sync.
 
 See [Fail on Error](#fail-on-error) for more detail.
 
+### Error `Attempt to change a resource which is still in use: Table is being 
deleted`
+
+```
+com.amazonaws.services.dynamodbv2.model.ResourceInUseException:
+  Attempt to change a resource which is still in use: Table is being deleted: s
+3guard.test.testDynamoDBInitDestroy351245027 
 
 Review comment:
   fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-16 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r284734556
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
 ##
 @@ -138,9 +138,15 @@ private Constants() {
   public static final String ASSUMED_ROLE_CREDENTIALS_DEFAULT =
   SimpleAWSCredentialsProvider.NAME;
 
+
+  // the maximum number of tasks cached if all threads are already uploading
+  public static final String MAX_TOTAL_TASKS = "fs.s3a.max.total.tasks";
+
+  public static final int DEFAULT_MAX_TOTAL_TASKS = 5;
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-16 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r284736917
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
 ##
 @@ -87,12 +100,86 @@ public void testMultiObjectDeleteSomeFiles() throws 
Throwable {
 timer.end("removeKeys");
   }
 
+
+  private Path maybeGetCsvPath() {
+Configuration conf = getConfiguration();
+String csvFile = conf.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE);
+Assume.assumeTrue("CSV test file is not the default",
+DEFAULT_CSVTEST_FILE.equals(csvFile));
+return new Path(csvFile);
+  }
+
+  /**
+   * Test low-level failure handling with low level delete request.
+   */
   @Test
   public void testMultiObjectDeleteNoPermissions() throws Throwable {
-Path testFile = getLandsatCSVPath(getConfiguration());
-S3AFileSystem fs = (S3AFileSystem)testFile.getFileSystem(
+describe("Delete the landsat CSV file and expect it to fail");
+Path csvPath = maybeGetCsvPath();
+S3AFileSystem fs = 
(S3AFileSystem)csvPath.getFileSystem(getConfiguration());
+List keys
+= buildDeleteRequest(
+new String[]{
+fs.pathToKey(csvPath),
+"missing-key.csv"
+});
+MultiObjectDeleteException ex = intercept(
+MultiObjectDeleteException.class,
+() -> fs.removeKeys(keys, false, false));
+
+final List undeleted
+= extractUndeletedPaths(ex, fs::keyToQualifiedPath);
+String undeletedFiles = join(undeleted);
+failIf(undeleted.size() != 2,
+"undeleted list size wrong: " + undeletedFiles,
+ex);
+assertTrue("no CSV in " +undeletedFiles, undeleted.contains(csvPath));
+
+// and a full split, after adding a new key
+String marker = "/marker";
+Path markerPath = fs.keyToQualifiedPath(marker);
+keys.add(new DeleteObjectsRequest.KeyVersion(marker));
+
+Pair, List> pair =
+new MultiObjectDeleteSupport(fs.createStoreContext())
+.splitUndeletedKeys(ex, keys);
+assertEquals(undeleted, pair.getLeft());
+List right = pair.getRight();
+assertEquals("Wrong size for " + join(right), 1, right.size());
+assertEquals(markerPath, right.get(0));
+  }
+
+  /**
+   * See what happens when you delete two entries which do not exist.
+   * The call succeeds; if
 
 Review comment:
   fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-16 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r284734722
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
 ##
 @@ -138,9 +138,15 @@ private Constants() {
   public static final String ASSUMED_ROLE_CREDENTIALS_DEFAULT =
   SimpleAWSCredentialsProvider.NAME;
 
+
+  // the maximum number of tasks cached if all threads are already uploading
+  public static final String MAX_TOTAL_TASKS = "fs.s3a.max.total.tasks";
+
+  public static final int DEFAULT_MAX_TOTAL_TASKS = 5;
+
   // number of simultaneous connections to s3
   public static final String MAXIMUM_CONNECTIONS = "fs.s3a.connection.maximum";
-  public static final int DEFAULT_MAXIMUM_CONNECTIONS = 15;
+  public static final int DEFAULT_MAXIMUM_CONNECTIONS = 
DEFAULT_MAX_TOTAL_TASKS * 2;
 
 Review comment:
   +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-16 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r284734924
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
 ##
 @@ -283,6 +285,22 @@ private Constants() {
   @InterfaceStability.Unstable
   public static final int DEFAULT_FAST_UPLOAD_ACTIVE_BLOCKS = 4;
 
+  /**
+   * The capacity of executor queues for operations other than block
+   * upload, where {@link #FAST_UPLOAD_ACTIVE_BLOCKS} is used instead.
+   * This should be less than {@link #MAX_THREADS} for fair
+   * submission.
+   * Value: {@value}.
+   */
+  public static final String EXECUTOR_CAPACITY = "fs.s3a.executor.capacity";
+
+  /**
+   * The capacity of executor queues for operations other than block
+   * upload, where {@link #FAST_UPLOAD_ACTIVE_BLOCKS} is used instead.
+   * Value: {@value}
+   */
+  public static final int DEFAULT_EXECUTOR_CAPACITY = 10;
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-10 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r282963081
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/StoreContext.java
 ##
 @@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.util.Optional;
+import java.util.function.Function;
+
+import com.google.common.util.concurrent.ListeningExecutorService;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.Invoker;
+import org.apache.hadoop.fs.s3a.S3AInputPolicy;
+import org.apache.hadoop.fs.s3a.S3AInstrumentation;
+import org.apache.hadoop.fs.s3a.S3AStorageStatistics;
+import org.apache.hadoop.fs.s3a.Statistic;
+import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.hadoop.util.SemaphoredDelegatingExecutor;
+
+/**
+ * This class provides the core context of the S3A filesystem to subsidiary
+ * components, without exposing the entire parent class.
+ * This is eliminate explicit recursive coupling.
+ *
+ * Where methods on the FS are to be invoked, they are all passed in
+ * via functional interfaces, so test setups can pass in mock callbacks
+ * instead.
+ *
+ * Warning: this really is private and unstable. Do not use
+ * outside the org.apache.hadoop.fs.s3a package.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+public class StoreContext {
+
+  /** Filesystem URI. */
+  private final URI fsURI;
+
+  /** Bucket name */
+  private final String bucket;
+
+  /** FS configuration after all per-bucket overrides applied. */
+  private final Configuration configuration;
+
+  /** Username. */
+  private final String username;
+
+  /** Principal who created the FS. */
+  private final UserGroupInformation owner;
+
+  /**
+   * Location of a bucket.
+   * Optional as the AWS call to evaluate this may fail from a permissions
+   * or other IOE.
+   */
+  public final Optional bucketLocation;
+
+  /**
+   * Bounded thread pool for async operations.
+   */
+  private final ListeningExecutorService executor;
+
+  /**
+   * Capacity of new executors created.
+   */
+  private final int executorCapacity;
+
+  /** Invoker of operations. */
+  private final Invoker invoker;
+
+  /* Instrumentation and statistics. */
+  private final S3AInstrumentation instrumentation;
+  private final S3AStorageStatistics storageStatistics;
+
+  /** Seek policy. */
+  private final S3AInputPolicy inputPolicy;
+
+  /** How to react to changes in etags and versions. */
+  private final ChangeDetectionPolicy changeDetectionPolicy;
+
+  /** Evaluated options. */
+  private final boolean multiObjectDeleteEnabled;
+
+  /** List algorithm. */
+  private final boolean useListV1;
+
+  /** Is the store versioned? */
+  private final boolean versioned;
+
+  /**
+   * To allow this context to be passed down to the metastore, this field
+   * wll be null until initialized.
+   */
+  private final MetadataStore metadataStore;
+
+  /** Function to take a key and return a path. */
+  private final Function keyToPathQualifier;
+
+  /** Factory for temporary files. */
+  private final TempFileFactory tempFileFactory;
+
+  /**
+   * Instantiate.
+   * No attempt to use a builder here as outside tests
+   * this should only be created in the S3AFileSystem.
+   */
+  public StoreContext(final URI fsURI,
+  final String bucket,
+  final Configuration configuration,
+  final String username,
+  final UserGroupInformation owner,
+  final ListeningExecutorService executor,
+  final int executorCapacity,
+  final Invoker invoker,
+  final S3AInstrumentation instrumentation,
+  final S3AStorageStatistics storageStatistics,
+  final S3AInputPolicy inputPolicy,
+  final ChangeDetectionPolicy 

[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-10 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r282963021
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
 ##
 @@ -1199,115 +1229,248 @@ private boolean innerRename(Path source, Path dest)
   }
 }
 
-// If we have a MetadataStore, track deletions/creations.
-Collection srcPaths = null;
-List dstMetas = null;
-if (hasMetadataStore()) {
-  srcPaths = new HashSet<>(); // srcPaths need fast look up before put
-  dstMetas = new ArrayList<>();
-}
-// TODO S3Guard HADOOP-13761: retries when source paths are not visible yet
-// TODO S3Guard: performance: mark destination dirs as authoritative
-
-// Ok! Time to start
-if (srcStatus.isFile()) {
-  LOG.debug("rename: renaming file {} to {}", src, dst);
-  long length = srcStatus.getLen();
-  if (dstStatus != null && dstStatus.isDirectory()) {
-String newDstKey = maybeAddTrailingSlash(dstKey);
-String filename =
-srcKey.substring(pathToKey(src.getParent()).length()+1);
-newDstKey = newDstKey + filename;
-copyFile(srcKey, newDstKey, length);
-S3Guard.addMoveFile(metadataStore, srcPaths, dstMetas, src,
-keyToQualifiedPath(newDstKey), length, getDefaultBlockSize(dst),
-username);
-  } else {
-copyFile(srcKey, dstKey, srcStatus.getLen());
-S3Guard.addMoveFile(metadataStore, srcPaths, dstMetas, src, dst,
-length, getDefaultBlockSize(dst), username);
-  }
-  innerDelete(srcStatus, false);
-} else {
-  LOG.debug("rename: renaming directory {} to {}", src, dst);
+// Validation completed: time to begin the operation.
+// The store-specific rename operation is used to keep the store
+// to date with the in-progress operation.
+// for the null store, these are all no-ops.
+final RenameTracker renameTracker =
+metadataStore.initiateRenameOperation(
+createStoreContext(),
+src, srcStatus, dest);
+final AtomicLong bytesCopied = new AtomicLong();
+int renameParallelLimit = 10;
 
 Review comment:
   I didn't determine anything, I just didn't want another config param :)
   
   My fear is that someone would find a value which delivered great performance 
on a rename of many small files, but when one thread suddenly came to a large 
set of files it would starve everything else. I think I could make it 
configurable but perhaps hang off the same option of the bulk upload, or at 
least make sure I'm using the same limited executor mechanism


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename

2019-05-06 Thread GitBox
steveloughran commented on a change in pull request #654: HADOOP-15183 S3Guard 
store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/654#discussion_r281163359
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/ExtraAssertions.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.test;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.junit.Assert;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.contract.ContractTestUtils;
+import org.apache.hadoop.util.DurationInfo;
+
+import static org.apache.hadoop.fs.s3a.S3AUtils.applyLocatedFiles;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Some extra assertions for tests.
+ */
+@InterfaceAudience.Private
+public class ExtraAssertions {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+  ExtraAssertions.class);
+
+  /**
+   * Assert that the number of files in a destination matches that expected.
+   * @param text text to use in the message
+   * @param fs filesystem
+   * @param path path to list (recursively)
+   * @param expected expected count
+   * @throws IOException IO problem
+   */
+  public static void assertFileCount(String text, FileSystem fs,
+  Path path, long expected)
+  throws IOException {
+List files = new ArrayList<>();
+try (DurationInfo ignored = new DurationInfo(LOG, false,
+"Counting files in %s", path)) {
+  applyLocatedFiles(fs.listFiles(path, true),
+  (status) -> files.add(status.getPath().toString()));
+}
+long actual = files.size();
+if (actual != expected) {
+  String ls = files.stream().collect(Collectors.joining("\n"));
+  Assert.fail(text + ": expected " + expected + " files in " + path
+  + " but got " + actual + "\n" + ls);
+}
+  }
+
+  /**
+   * Assert that a string contains a piece of text.
+   * @param text text to can.
+   * @param contained text to look for.
+   */
+  public static void assertTextContains(String text, String contained) {
+assertTrue("string \"" + contained + "\" not found in \"" + text + "\"",
+text != null && text.contains(contained));
+  }
+
+  /**
+   * If the condition is met, throw an AssertionError with the message
+   * and any nested exception.
+   * @param condition condition
+   * @param message text to use in the exception
+   * @param cause a (possibly null) throwable to init the cause with
+   * @throws AssertionError with the text and throwable if condition == true.
+   */
+  public static void failIf(boolean condition,
+  String message,
+  Throwable cause) {
+if (condition) {
+  ContractTestUtils.fail(message, cause);
+}
+  }
+
+  /**
+   * If the condition is met, throw an AssertionError with the message
+   * and any nested exception.
+   * @param condition condition
+   * @param message text to use in the exception
+   * @param cause a (possibly null) throwable to init the cause with
+   * @throws AssertionError with the text and throwable if condition == true.
+   */
+  public static void failUnless(boolean condition,
+  String message,
+  Throwable cause) {
+failIf(!condition, message, cause);
+  }
+
+  /**
+   * Extract the inner cause of an exception.
+   * @param expected  expected class of the cuse
 
 Review comment:
   fixed.
   Interesting that the IntelliJ spell checker didn't flag this up, even though 
I haven't accidentally added "cuse" to the dictionary


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services