[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-22 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r305710298

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/main/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatch.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A synchronization aid that allows a single thread to wait on the creation of a specified file.
+ */
+@NotThreadSafe
+public class FileBasedOneShotLatch implements Closeable {
+
+	private final Path latchFile;
+
+	private final WatchService watchService;
+
+	private boolean released;
+
+	public FileBasedOneShotLatch(final Path latchFile) {
+		this.latchFile = checkNotNull(latchFile);
+
+		final Path parentDir = checkNotNull(latchFile.getParent(), "latchFile must have a parent");
+		this.watchService = initWatchService(parentDir);
+	}
+
+	private static WatchService initWatchService(final Path parentDir) {
+		final WatchService watchService = createWatchService();
+		watchForLatchFile(watchService, parentDir);
+		return watchService;
+	}
+
+	private static WatchService createWatchService() {
+		try {
+			return FileSystems.getDefault().newWatchService();
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	private static void watchForLatchFile(final WatchService watchService, final Path parentDir) {
+		try {
+			parentDir.register(
+				watchService,
+				new WatchEvent.Kind[]{StandardWatchEventKinds.ENTRY_CREATE},
+				SensitivityWatchEventModifier.HIGH);
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	/**
+	 * Waits until the latch file is created.
+	 *
+	 * @throws InterruptedException if interrupted while waiting
+	 */
+	public void await() throws InterruptedException {
+		if (isReleasedOrReleasable()) {
+			return;
+		}
+
+		awaitLatchFile(watchService);
+	}
+
+	private void awaitLatchFile(final WatchService watchService) throws InterruptedException {
+		while (true) {
+			WatchKey take = watchService.take();
+			if (isReleasedOrReleasable()) {
 Review comment:
   > Prone to files being deleted in-between, but this seems unlikely.
   
   True, I didn't think about that case. I think it's acceptable to leave it as is because:
   - we wait until the job has finished before deleting files
   - the class is confined to this module
   - the code is simpler (as you already mentioned)
   - judging by [this SO answer](https://stackoverflow.com/a/11182515), even WatchService could lose the event if the file is deleted shortly after creation.
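If `isReleasedOrReleasable()` re-checks the file system after `take()` (its body is not shown in this hunk), a file that is created and deleted before the check is missed. A minimal sketch of one way around that, assuming any observed `ENTRY_CREATE` for the latch file name should count as a release: inspect the queued events instead of only the current state. The class name `EventInspectingLatch` is hypothetical and not part of the PR; it also omits the non-standard `com.sun.nio.file.SensitivityWatchEventModifier`. On platforms where `WatchService` is implemented by polling, a short-lived file can still be missed entirely, as the linked SO answer suggests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Objects;

/**
 * Hypothetical sketch: a one-shot latch released by any observed ENTRY_CREATE event
 * for the latch file name, even if the file has been deleted again since.
 */
final class EventInspectingLatch implements AutoCloseable {

	private final Path latchFile;

	private final WatchService watchService;

	EventInspectingLatch(final Path latchFile) throws IOException {
		this.latchFile = Objects.requireNonNull(latchFile);
		final Path parentDir = Objects.requireNonNull(latchFile.getParent(), "latchFile must have a parent");
		// Register before the first existence check in await() so that no creation slips through.
		this.watchService = parentDir.getFileSystem().newWatchService();
		parentDir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
	}

	void await() throws InterruptedException {
		if (Files.exists(latchFile)) {
			return;
		}
		while (true) {
			final WatchKey watchKey = watchService.take();
			for (WatchEvent<?> event : watchKey.pollEvents()) {
				if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
					// Events may have been dropped; fall back to the current file system state.
					if (Files.exists(latchFile)) {
						return;
					}
				} else if (latchFile.getFileName().equals(event.context())) {
					// The file was created at some point, whether or not it still exists.
					return;
				}
			}
			// Re-arm the key so that further events are delivered.
			if (!watchKey.reset()) {
				throw new IllegalStateException("Watched directory is no longer accessible");
			}
		}
	}

	@Override
	public void close() throws IOException {
		watchService.close();
	}
}
```

The trade-off is the one already named in the thread: this is slightly more code than a plain existence re-check, in exchange for tolerating a delete that happens before the waiting thread wakes up.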


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-22 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r305710298

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/main/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatch.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A synchronization aid that allows a single thread to wait on the creation of a specified file.
+ */
+@NotThreadSafe
+public class FileBasedOneShotLatch implements Closeable {
+
+	private final Path latchFile;
+
+	private final WatchService watchService;
+
+	private boolean released;
+
+	public FileBasedOneShotLatch(final Path latchFile) {
+		this.latchFile = checkNotNull(latchFile);
+
+		final Path parentDir = checkNotNull(latchFile.getParent(), "latchFile must have a parent");
+		this.watchService = initWatchService(parentDir);
+	}
+
+	private static WatchService initWatchService(final Path parentDir) {
+		final WatchService watchService = createWatchService();
+		watchForLatchFile(watchService, parentDir);
+		return watchService;
+	}
+
+	private static WatchService createWatchService() {
+		try {
+			return FileSystems.getDefault().newWatchService();
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	private static void watchForLatchFile(final WatchService watchService, final Path parentDir) {
+		try {
+			parentDir.register(
+				watchService,
+				new WatchEvent.Kind[]{StandardWatchEventKinds.ENTRY_CREATE},
+				SensitivityWatchEventModifier.HIGH);
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	/**
+	 * Waits until the latch file is created.
+	 *
+	 * @throws InterruptedException if interrupted while waiting
+	 */
+	public void await() throws InterruptedException {
+		if (isReleasedOrReleasable()) {
+			return;
+		}
+
+		awaitLatchFile(watchService);
+	}
+
+	private void awaitLatchFile(final WatchService watchService) throws InterruptedException {
+		while (true) {
+			WatchKey take = watchService.take();
+			if (isReleasedOrReleasable()) {
 Review comment:
   > Prone to files being deleted in-between, but this seems unlikely.
   
   True, I didn't think about that case. I think it's acceptable to leave it as is because:
   - we wait until the job has finished before deleting files
   - the class is confined to this module
   - the code is simpler (as you already mentioned)
   - judging by [this SO answer](https://stackoverflow.com/a/11182515), even WatchService could miss the file if it is deleted shortly after creation.
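Since `WatchService` gives no hard guarantee here either, a plain polling loop is a possible simpler alternative. This is a hypothetical sketch (`PollingFileLatch` and `awaitFile` are not part of the PR): it has the same blind spot for a file that is created and deleted between two polls, but it adds a timeout so that a missed file cannot block the caller forever.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/** Hypothetical polling alternative to a WatchService-based latch. */
final class PollingFileLatch {

	private PollingFileLatch() {
	}

	/**
	 * Polls until the file exists or the timeout elapses.
	 *
	 * @return true if the file was observed, false if the timeout elapsed first
	 */
	static boolean awaitFile(final Path file, final Duration timeout, final Duration pollInterval)
			throws InterruptedException {
		final long deadline = System.nanoTime() + timeout.toNanos();
		while (true) {
			if (Files.exists(file)) {
				return true;
			}
			if (System.nanoTime() - deadline >= 0) {
				return false;
			}
			// A file that is created and deleted within one interval is never seen.
			Thread.sleep(pollInterval.toMillis());
		}
	}
}
```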




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-22 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r305710298

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/main/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatch.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A synchronization aid that allows a single thread to wait on the creation of a specified file.
+ */
+@NotThreadSafe
+public class FileBasedOneShotLatch implements Closeable {
+
+	private final Path latchFile;
+
+	private final WatchService watchService;
+
+	private boolean released;
+
+	public FileBasedOneShotLatch(final Path latchFile) {
+		this.latchFile = checkNotNull(latchFile);
+
+		final Path parentDir = checkNotNull(latchFile.getParent(), "latchFile must have a parent");
+		this.watchService = initWatchService(parentDir);
+	}
+
+	private static WatchService initWatchService(final Path parentDir) {
+		final WatchService watchService = createWatchService();
+		watchForLatchFile(watchService, parentDir);
+		return watchService;
+	}
+
+	private static WatchService createWatchService() {
+		try {
+			return FileSystems.getDefault().newWatchService();
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	private static void watchForLatchFile(final WatchService watchService, final Path parentDir) {
+		try {
+			parentDir.register(
+				watchService,
+				new WatchEvent.Kind[]{StandardWatchEventKinds.ENTRY_CREATE},
+				SensitivityWatchEventModifier.HIGH);
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	/**
+	 * Waits until the latch file is created.
+	 *
+	 * @throws InterruptedException if interrupted while waiting
+	 */
+	public void await() throws InterruptedException {
+		if (isReleasedOrReleasable()) {
+			return;
+		}
+
+		awaitLatchFile(watchService);
+	}
+
+	private void awaitLatchFile(final WatchService watchService) throws InterruptedException {
+		while (true) {
+			WatchKey take = watchService.take();
+			if (isReleasedOrReleasable()) {
 Review comment:
   > Prone to files being deleted in-between, but this seems unlikely.
   
   True, I didn't think about that case. I think it's acceptable to leave it as is because:
   - we wait until the job has finished before deleting files
   - the class is confined to this module
   - the code is simpler (as you already mentioned)




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r305113537

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/main/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatch.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A synchronization aid that allows a single thread to wait on the creation of a specified file.
+ */
+@NotThreadSafe
+public class FileBasedOneShotLatch implements Closeable {
+
+	private final Path latchFile;
+
+	private final WatchService watchService;
+
+	private boolean released;
+
+	public FileBasedOneShotLatch(final Path latchFile) {
+		this.latchFile = checkNotNull(latchFile);
+
+		final Path parentDir = checkNotNull(latchFile.getParent(), "latchFile must have a parent");
+		this.watchService = initWatchService(parentDir);
+	}
+
+	private static WatchService initWatchService(final Path parentDir) {
+		final WatchService watchService = createWatchService(parentDir);
+		watchForLatchFile(watchService, parentDir);
+		return watchService;
+	}
+
+	private static WatchService createWatchService(final Path parentDir) {
+		try {
+			return parentDir.getFileSystem().newWatchService();
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	private static void watchForLatchFile(final WatchService watchService, final Path parentDir) {
+		try {
+			parentDir.register(
+				watchService,
+				new WatchEvent.Kind[]{StandardWatchEventKinds.ENTRY_CREATE},
+				SensitivityWatchEventModifier.HIGH);
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	/**
+	 * Waits until the latch file is created.
+	 *
+	 * @throws InterruptedException if interrupted while waiting
+	 */
+	public void await() throws InterruptedException {
+		if (isReleasedOrReleasable()) {
+			return;
+		}
+
+		awaitLatchFile(watchService);
+	}
+
+	private void awaitLatchFile(final WatchService watchService) throws InterruptedException {
+		while (true) {
+			WatchKey take = watchService.take();
 Review comment:
   done




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r305113144

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/main/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatch.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A synchronization aid that allows a single thread to wait on the creation of a specified file.
+ */
+@NotThreadSafe
+public class FileBasedOneShotLatch implements Closeable {
+
+	private final Path latchFile;
+
+	private final WatchService watchService;
+
+	private boolean released;
+
+	public FileBasedOneShotLatch(final Path latchFile) {
+		this.latchFile = checkNotNull(latchFile);
+
+		final Path parentDir = checkNotNull(latchFile.getParent(), "latchFile must have a parent");
+		this.watchService = initWatchService(parentDir);
+	}
+
+	private static WatchService initWatchService(final Path parentDir) {
+		final WatchService watchService = createWatchService(parentDir);
+		watchForLatchFile(watchService, parentDir);
+		return watchService;
+	}
+
+	private static WatchService createWatchService(final Path parentDir) {
+		try {
+			return parentDir.getFileSystem().newWatchService();
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	private static void watchForLatchFile(final WatchService watchService, final Path parentDir) {
+		try {
+			parentDir.register(
+				watchService,
+				new WatchEvent.Kind[]{StandardWatchEventKinds.ENTRY_CREATE},
+				SensitivityWatchEventModifier.HIGH);
+		} catch (IOException e) {
+			throw new RuntimeException(e);
+		}
+	}
+
+	/**
+	 * Waits until the latch file is created.
+	 *
+	 * @throws InterruptedException if interrupted while waiting
+	 */
+	public void await() throws InterruptedException {
+		if (isReleasedOrReleasable()) {
+			return;
+		}
+
+		awaitLatchFile(watchService);
+	}
+
+	private void awaitLatchFile(final WatchService watchService) throws InterruptedException {
+		while (true) {
+			WatchKey take = watchService.take();
 Review comment:
   rename to `watchKey`
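With the suggested rename applied, the loop could look roughly like the hypothetical helper below (`WatchKeyLoop` is illustrative only, not code from the PR). It assumes the latch is released once the file exists and that the parent directory was registered for `ENTRY_CREATE` beforehand; it re-checks `Files.exists` after every wake-up, drains the key with `pollEvents()`, and calls `reset()` so that further events are delivered.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

/** Hypothetical helper illustrating the await loop with the variable renamed to watchKey. */
final class WatchKeyLoop {

	private WatchKeyLoop() {
	}

	/**
	 * Blocks until latchFile exists. Assumes latchFile's parent directory was registered
	 * with watchService for ENTRY_CREATE before this method is called.
	 */
	static void awaitLatchFile(final WatchService watchService, final Path latchFile) throws InterruptedException {
		while (!Files.exists(latchFile)) {
			final WatchKey watchKey = watchService.take();
			// Drain the pending events; only the current file system state is checked.
			watchKey.pollEvents();
			// Re-arm the key; false means it was cancelled or the directory became inaccessible.
			if (!watchKey.reset()) {
				throw new IllegalStateException("Watch key is no longer valid");
			}
		}
	}
}
```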




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304980645

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/pom.xml
 ##
 @@ -0,0 +1,76 @@
+
+
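+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+	<modelVersion>4.0.0</modelVersion>
+
+	<parent>
+		<groupId>org.apache.flink</groupId>
+		<artifactId>flink-end-to-end-tests</artifactId>
+		<version>1.9-SNAPSHOT</version>
+		<relativePath>..</relativePath>
+	</parent>
+
+	<artifactId>flink-dataset-fine-grained-recovery-test</artifactId>
+	<name>flink-dataset-fine-grained-recovery-test</name>
+	<packaging>jar</packaging>
+
+	<dependencies>
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-java</artifactId>
+			<version>${project.version}</version>
+			<scope>provided</scope>
+		</dependency>
+
+		<dependency>
+			<groupId>junit</groupId>
+			<artifactId>junit</artifactId>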
 
 Review comment:
   Removed.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304980030

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/test/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatchTest.java
 ##
 @@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for {@link FileBasedOneShotLatch}.
+ */
+public class FileBasedOneShotLatchTest {
 
 Review comment:
   Changed the surefire config. I find it awkward to have a dependency on `flink-test-utils` with `compile` scope since the job is, strictly speaking, not a test.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304974598

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/pom.xml
 ##
 @@ -0,0 +1,76 @@
+
+
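+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+	<modelVersion>4.0.0</modelVersion>
+
+	<parent>
+		<groupId>org.apache.flink</groupId>
+		<artifactId>flink-end-to-end-tests</artifactId>
+		<version>1.9-SNAPSHOT</version>
+		<relativePath>..</relativePath>
+	</parent>
+
+	<artifactId>flink-dataset-fine-grained-recovery-test</artifactId>
+	<name>flink-dataset-fine-grained-recovery-test</name>
+	<packaging>jar</packaging>
+
+	<dependencies>
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-java</artifactId>
+			<version>${project.version}</version>
+			<scope>provided</scope>
+		</dependency>
+
+		<dependency>
+			<groupId>junit</groupId>
+			<artifactId>junit</artifactId>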
 
 Review comment:
   Actually, this dependency is not needed since it is already inherited from `flink-parent`.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304932329

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/test/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatchTest.java
 ##
 @@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for {@link FileBasedOneShotLatch}.
+ */
+public class FileBasedOneShotLatchTest {
+
+   @Rule
+   public TemporaryFolder temporaryFolder = new TemporaryFolder();
 
 Review comment:
   accidentally used `--amend` when fixing




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304931295

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/test/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatchTest.java
 ##
 @@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for {@link FileBasedOneShotLatch}.
+ */
+public class FileBasedOneShotLatchTest {
+
+   @Rule
+   public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+   private FileBasedOneShotLatch latch;
+
+   private File latchFile;
+
+	@Before
+	public void setUp() {
+		latchFile = new File(temporaryFolder.getRoot(), "latchFile");
+		latch = new FileBasedOneShotLatch(latchFile.toPath());
+	}
+
+	@Test
+	public void awaitReturnsWhenFileIsCreated() throws Exception {
+		final AtomicBoolean awaitCompleted = new AtomicBoolean();
+		final Thread thread = new Thread(() -> {
+			try {
+				latch.await();
+				awaitCompleted.set(true);
+			} catch (InterruptedException e) {
+				Thread.currentThread().interrupt();
+			}
+		});
+		thread.start();
+
+		latchFile.createNewFile();
 
 Review comment:
   Done




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304931209

 ##
 File path: flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/main/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatch.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.batch.tests.util;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A synchronization aid that allows a single thread to wait on the creation of a specified file.
+ */
+@NotThreadSafe
+public class FileBasedOneShotLatch implements Closeable {
+
+   private final Path latchFile;
+
+   private final WatchService watchService;
+
+   private boolean released;
+
+   public FileBasedOneShotLatch(final Path latchFile) {
+       this.latchFile = checkNotNull(latchFile);
+
+       final Path parentDir = checkNotNull(latchFile.getParent(), "latchFile must have a parent");
+       this.watchService = initWatchService(parentDir);
+   }
+
+   private static WatchService initWatchService(final Path parentDir) {
+       final WatchService watchService = createWatchService();
+       watchForLatchFile(watchService, parentDir);
+       return watchService;
+   }
+
+   private static WatchService createWatchService() {
+       try {
+           return FileSystems.getDefault().newWatchService();
 
 Review comment:
   good suggestion, done




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304931093
 
 

 ##
 File path: 
flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/test/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatchTest.java
 ##
 @@ -0,0 +1,84 @@
+package org.apache.flink.batch.tests.util;
+
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for {@link FileBasedOneShotLatch}.
+ */
+public class FileBasedOneShotLatchTest {
+
+   @Rule
+   public TemporaryFolder temporaryFolder = new TemporaryFolder();
 
 Review comment:
   done




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304930823
 
 

 ##
 File path: 
flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/test/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatchTest.java
 ##
 @@ -0,0 +1,84 @@
+package org.apache.flink.batch.tests.util;
+
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for {@link FileBasedOneShotLatch}.
+ */
+public class FileBasedOneShotLatchTest {
+
+   @Rule
+   public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+   private FileBasedOneShotLatch latch;
+
+   private File latchFile;
+
+   @Before
+   public void setUp() {
+       latchFile = new File(temporaryFolder.getRoot(), "latchFile");
+       latch = new FileBasedOneShotLatch(latchFile.toPath());
+   }
+
+   @Test
+   public void awaitReturnsWhenFileIsCreated() throws Exception {
+       final AtomicBoolean awaitCompleted = new AtomicBoolean();
+       final Thread thread = new Thread(() -> {
+           try {
+               latch.await();
+               awaitCompleted.set(true);
+           } catch (InterruptedException e) {
+               Thread.currentThread().interrupt();
+           }
+       });
+       thread.start();
+
+       latchFile.createNewFile();
 
 Review comment:
   I will add a new test case. The latch should not block.
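A test asserting that a latch does not block can bound its wait, so that a regression fails the test instead of hanging the build. The sketch below runs `await()` on a worker thread and waits at most a given timeout for it to return. It is written under assumptions: `NonBlockingAwaitCheck`, `Awaitable` and `completesWithin` are hypothetical names and not part of this PR, and any object with a `void await() throws InterruptedException` method (such as a `java.util.concurrent.CountDownLatch`) can be passed in.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical test helper; the names are illustrative and not part of the PR. */
final class NonBlockingAwaitCheck {

    /** Anything with a blocking, interruptible await(), such as the latch under test. */
    interface Awaitable {
        void await() throws InterruptedException;
    }

    private NonBlockingAwaitCheck() {
    }

    /**
     * Runs awaitable.await() on a worker thread and reports whether it returned
     * within the timeout (true) or was still blocked when the timeout expired (false).
     */
    static boolean completesWithin(final Awaitable awaitable, final long timeout, final TimeUnit unit)
            throws InterruptedException, ExecutionException {
        final ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            final Future<Void> future = executor.submit(() -> {
                awaitable.await();
                return null;
            });
            try {
                future.get(timeout, unit);
                return true;
            } catch (TimeoutException e) {
                return false;
            }
        } finally {
            // Interrupts an await() that is still blocked so that the worker thread terminates.
            executor.shutdownNow();
        }
    }
}
```

For example, with the latch file created before the call, `completesWithin(latch::await, 1, TimeUnit.SECONDS)` would be expected to return `true`.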




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304930823
 
 

 ##
 File path: 
flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/test/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatchTest.java
 ##
 @@ -0,0 +1,84 @@
+package org.apache.flink.batch.tests.util;
+
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for {@link FileBasedOneShotLatch}.
+ */
+public class FileBasedOneShotLatchTest {
+
+   @Rule
+   public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+   private FileBasedOneShotLatch latch;
+
+   private File latchFile;
+
+   @Before
+   public void setUp() {
+       latchFile = new File(temporaryFolder.getRoot(), "latchFile");
+       latch = new FileBasedOneShotLatch(latchFile.toPath());
+   }
+
+   @Test
+   public void awaitReturnsWhenFileIsCreated() throws Exception {
+       final AtomicBoolean awaitCompleted = new AtomicBoolean();
+       final Thread thread = new Thread(() -> {
+           try {
+               latch.await();
+               awaitCompleted.set(true);
+           } catch (InterruptedException e) {
+               Thread.currentThread().interrupt();
+           }
+       });
+       thread.start();
+
+       latchFile.createNewFile();
 
 Review comment:
   I will add a new test case




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-18 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304926563
 
 

 ##
 File path: 
flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/main/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatch.java
 ##
 @@ -0,0 +1,126 @@
+package org.apache.flink.batch.tests.util;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A synchronization aid that allows a single thread to wait on the creation of a specified file.
+ */
+@NotThreadSafe
+public class FileBasedOneShotLatch implements Closeable {
+
+   private final Path latchFile;
+
+   private final WatchService watchService;
+
+   private boolean released;
+
+   public FileBasedOneShotLatch(final Path latchFile) {
+       this.latchFile = checkNotNull(latchFile);
+
+       final Path parentDir = checkNotNull(latchFile.getParent(), "latchFile must have a parent");
+       this.watchService = initWatchService(parentDir);
+   }
+
+   private static WatchService initWatchService(final Path parentDir) {
+       final WatchService watchService = createWatchService();
+       watchForLatchFile(watchService, parentDir);
+       return watchService;
+   }
+
+   private static WatchService createWatchService() {
+       try {
+           return FileSystems.getDefault().newWatchService();
+       } catch (IOException e) {
+           throw new RuntimeException(e);
+       }
+   }
+
+   private static void watchForLatchFile(final WatchService watchService, final Path parentDir) {
+       try {
+           parentDir.register(
+               watchService,
+               new WatchEvent.Kind[]{StandardWatchEventKinds.ENTRY_CREATE},
+               SensitivityWatchEventModifier.HIGH);
+       } catch (IOException e) {
+           throw new RuntimeException(e);
+       }
+   }
+
+   /**
+    * Waits until the latch file is created.
+    *
+    * @throws InterruptedException if interrupted while waiting
+    */
+   public void await() throws InterruptedException {
+       if (isReleasedOrReleasable()) {
+           return;
+       }
+
+       awaitLatchFile(watchService);
+   }
+
+   private void awaitLatchFile(final WatchService watchService) throws InterruptedException {
+       while (true) {
+           WatchKey take = watchService.take();
+           if (isReleasedOrReleasable()) {
 
 Review comment:
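   It could be that other files are created in the directory, so a wake-up from the `WatchService` does not necessarily mean that the latch file itself was created.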
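Because `WatchService.take()` returns a key whenever any file is created in the watched directory, each wake-up has to be followed by a check for the latch file itself. A minimal sketch of that pattern using only the standard `java.nio.file` API: `LatchFileWatcher` and `awaitFile` are hypothetical names, this is not the code from the PR, and the JDK-specific `SensitivityWatchEventModifier.HIGH` used in the PR is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

/** Hypothetical helper illustrating the re-check; this is not the code from the PR. */
final class LatchFileWatcher {

    private LatchFileWatcher() {
    }

    /** Blocks until {@code latchFile} exists. */
    static void awaitFile(final Path latchFile) throws IOException, InterruptedException {
        final Path dir = latchFile.toAbsolutePath().getParent();
        try (WatchService watchService = dir.getFileSystem().newWatchService()) {
            // Register before the first existence check, so that a file created
            // right after the check still queues an event for take().
            dir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
            while (!Files.exists(latchFile)) {
                // take() wakes up for ANY file created in dir; the loop condition
                // re-checks whether the latch file itself now exists.
                final WatchKey key = watchService.take();
                key.pollEvents(); // drain the events, their contents are not needed here
                if (!key.reset()) {
                    throw new IOException("Directory is no longer watchable: " + dir);
                }
            }
        }
    }
}
```

Registering the directory before the first `Files.exists` check closes the window in which the file could be created after the check but before the watch is in place; the same ordering also makes the call return immediately when the file already exists.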




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-17 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304509653
 
 

 ##
 File path: flink-end-to-end-tests/test-scripts/test_ha_dataset.sh
 ##
 @@ -53,20 +52,51 @@ function run_ha_test() {
 
 wait_job_running ${JOB_ID}
 
-# start the watchdog that keeps the number of JMs stable
-start_ha_jm_watchdog 1 "StandaloneSessionClusterEntrypoint" start_jm_cmd "8081"
-
+local c
 for (( c=0; c<${JM_KILLS}; c++ )); do
     # kill the JM and wait for watchdog to
     # create a new one which will take over
     kill_single 'StandaloneSessionClusterEntrypoint'
     wait_job_running ${JOB_ID}
 done
 
-cancel_job ${JOB_ID}
+for (( c=0; c<${TM_KILLS}; c++ )); do
+    sleep $(( ( RANDOM % 10 )  + 1 ))
+    kill_and_replace_random_task_manager
+    wait_job_running ${JOB_ID}
+done
+
+wait_job_terminal_state ${JOB_ID} "FINISHED"
 
 Review comment:
   I added a new job that blocks on an external condition.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-17 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r304509103
 
 

 ##
 File path: 
flink-end-to-end-tests/flink-dataset-fine-grained-recovery-test/src/test/java/org/apache/flink/batch/tests/util/FileBasedOneShotLatchTest.java
 ##
 @@ -0,0 +1,84 @@
+package org.apache.flink.batch.tests.util;
+
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for {@link FileBasedOneShotLatch}.
+ */
+public class FileBasedOneShotLatchTest {
 
 Review comment:
   The test is not run because of the Surefire configuration in the `flink-end-to-end-tests` module. I don't have a solution. Suggestions are welcome.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-10 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r302172084
 
 

 ##
 File path: flink-end-to-end-tests/test-scripts/test_ha_dataset.sh
 ##
 @@ -53,20 +52,51 @@ function run_ha_test() {
 
 wait_job_running ${JOB_ID}
 
-# start the watchdog that keeps the number of JMs stable
-start_ha_jm_watchdog 1 "StandaloneSessionClusterEntrypoint" start_jm_cmd "8081"
-
+local c
 for (( c=0; c<${JM_KILLS}; c++ )); do
     # kill the JM and wait for watchdog to
     # create a new one which will take over
     kill_single 'StandaloneSessionClusterEntrypoint'
     wait_job_running ${JOB_ID}
 done
 
-cancel_job ${JOB_ID}
+for (( c=0; c<${TM_KILLS}; c++ )); do
+    sleep $(( ( RANDOM % 10 )  + 1 ))
+    kill_and_replace_random_task_manager
+    wait_job_running ${JOB_ID}
+done
+
+wait_job_terminal_state ${JOB_ID} "FINISHED"
 
 Review comment:
   These are valid concerns.
   
   > How much longer does the test now run for?
   
   The test runs for 4.5 to 5 minutes on my machine. It takes around 2 minutes to complete the batch job after the last injected fault (timed using unscientific methods). In its current form, the test is rather similar to `test_batch_allround.sh`, so there is a chance that the two can be merged.
   
   > I like neither option, do admit though that this would make it very difficult (or even impossible) to verify the correctness of the output.
   
   I don't see a good solution yet. Here are some options:
   1. Make the job block on external signals (files), and make the job smaller (smaller dataset).
   2. Leave it as before, i.e., don't verify the correctness of the output (but use an infinite data source).




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-10 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r302172084
 
 

 ##
 File path: flink-end-to-end-tests/test-scripts/test_ha_dataset.sh
 ##
 @@ -53,20 +52,51 @@ function run_ha_test() {
 
 wait_job_running ${JOB_ID}
 
-# start the watchdog that keeps the number of JMs stable
-start_ha_jm_watchdog 1 "StandaloneSessionClusterEntrypoint" start_jm_cmd "8081"
-
+local c
 for (( c=0; c<${JM_KILLS}; c++ )); do
     # kill the JM and wait for watchdog to
     # create a new one which will take over
     kill_single 'StandaloneSessionClusterEntrypoint'
     wait_job_running ${JOB_ID}
 done
 
-cancel_job ${JOB_ID}
+for (( c=0; c<${TM_KILLS}; c++ )); do
+    sleep $(( ( RANDOM % 10 )  + 1 ))
+    kill_and_replace_random_task_manager
+    wait_job_running ${JOB_ID}
+done
+
+wait_job_terminal_state ${JOB_ID} "FINISHED"
 
 Review comment:
   These are valid concerns.
   
   > How much longer does the test now run for?
   
   The test runs for 4.5 to 5 minutes on my machine. It takes around 2 minutes to complete the batch job after the last injected fault (timed using unscientific methods). In its current form, the test is rather similar to `test_batch_allround.sh`, so there is a chance that the two can be merged.
   
   > I like neither option, do admit though that this would make it very difficult (or even impossible) to verify the correctness of the output.
   
   I don't see a good solution yet. Here are some options:
   1. Make the job block on external signals (files), and make the job smaller (smaller dataset).
   2. Leave it as before, i.e., don't verify the correctness of the output.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-10 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r302172084
 
 

 ##
 File path: flink-end-to-end-tests/test-scripts/test_ha_dataset.sh
 ##
 @@ -53,20 +52,51 @@ function run_ha_test() {
 
 wait_job_running ${JOB_ID}
 
-# start the watchdog that keeps the number of JMs stable
-start_ha_jm_watchdog 1 "StandaloneSessionClusterEntrypoint" start_jm_cmd "8081"
-
+local c
 for (( c=0; c<${JM_KILLS}; c++ )); do
     # kill the JM and wait for watchdog to
     # create a new one which will take over
     kill_single 'StandaloneSessionClusterEntrypoint'
     wait_job_running ${JOB_ID}
 done
 
-cancel_job ${JOB_ID}
+for (( c=0; c<${TM_KILLS}; c++ )); do
+    sleep $(( ( RANDOM % 10 )  + 1 ))
+    kill_and_replace_random_task_manager
+    wait_job_running ${JOB_ID}
+done
+
+wait_job_terminal_state ${JOB_ID} "FINISHED"
 
 Review comment:
   These are valid concerns.
   
   > How much longer does the test now run for?
   
   The test runs for 4.5 to 5 minutes on my machine. It takes around 2 minutes to complete the batch job after the last injected fault (timed using unscientific methods).
   
   > I like neither option, do admit though that this would make it very difficult (or even impossible) to verify the correctness of the output.
   
   I don't see a good solution yet. Here are some options:
   1. Make the job block on external signals (files).
   2. Leave it as before, i.e., don't verify the correctness of the output.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-10 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r302167993
 
 

 ##
 File path: flink-end-to-end-tests/test-scripts/test_ha_dataset.sh
 ##
 @@ -53,20 +50,51 @@ function run_ha_test() {
 
 wait_job_running ${JOB_ID}
 
-# start the watchdog that keeps the number of JMs stable
-start_ha_jm_watchdog 1 "StandaloneSessionClusterEntrypoint" start_jm_cmd "8081"
-
+local c
 for (( c=0; c<${JM_KILLS}; c++ )); do
     # kill the JM and wait for watchdog to
     # create a new one which will take over
     kill_single 'StandaloneSessionClusterEntrypoint'
     wait_job_running ${JOB_ID}
 done
 
-cancel_job ${JOB_ID}
+for (( c=0; c<${TM_KILLS}; c++ )); do
+    sleep $(( ( RANDOM % 10 )  + 1 ))
+    kill_and_replace_random_task_manager
+    wait_job_running ${JOB_ID}
 
 Review comment:
   `wait_job_running` can be omitted. In fact, it only asserts that the job is listed by `flink list`.




[GitHub] [flink] GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy

2019-07-10 Thread GitBox
GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA 
dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r302055257
 
 

 ##
 File path: flink-end-to-end-tests/test-scripts/test_ha_dataset.sh
 ##
 @@ -53,20 +52,51 @@ function run_ha_test() {
 
 wait_job_running ${JOB_ID}
 
-# start the watchdog that keeps the number of JMs stable
-start_ha_jm_watchdog 1 "StandaloneSessionClusterEntrypoint" start_jm_cmd "8081"
-
+local c
 for (( c=0; c<${JM_KILLS}; c++ )); do
     # kill the JM and wait for watchdog to
     # create a new one which will take over
     kill_single 'StandaloneSessionClusterEntrypoint'
     wait_job_running ${JOB_ID}
 done
 
-cancel_job ${JOB_ID}
+for (( c=0; c<${TM_KILLS}; c++ )); do
+    sleep $(( ( RANDOM % 10 )  + 1 ))
+    kill_and_replace_random_task_manager
+    wait_job_running ${JOB_ID}
 
 Review comment:
   `wait_job_running` will terminate the script if the job is not running within the timeout (10s). Invoking the function does not launch a new process (it runs in the current shell, not in a subshell), so the main script will exit. Am I missing something?

