[GitHub] [flink] becketqin commented on a change in pull request #12234: [FLINK-16986][coordination] Provide exactly-once guaranteed around checkpoints and operator event sending

2020-05-29 Thread GitBox


becketqin commented on a change in pull request #12234:
URL: https://github.com/apache/flink/pull/12234#discussion_r432805261



##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/operators/coordination/OperatorCoordinatorHolder.java
##
@@ -99,29 +187,110 @@ public void close() throws Exception {

	@Override
	public void handleEventFromOperator(int subtask, OperatorEvent event) throws Exception {
+		mainThreadExecutor.assertRunningInMainThread();
		coordinator.handleEventFromOperator(subtask, event);
	}

	@Override
	public void subtaskFailed(int subtask, @Nullable Throwable reason) {
+		mainThreadExecutor.assertRunningInMainThread();
		coordinator.subtaskFailed(subtask, reason);
+		eventValve.resetForTask(subtask);
	}

	@Override
-	public CompletableFuture<byte[]> checkpointCoordinator(long checkpointId) throws Exception {
-		return coordinator.checkpointCoordinator(checkpointId);
+	public void checkpointCoordinator(long checkpointId, CompletableFuture<byte[]> result) {
+		// unfortunately, this method does not run in the scheduler executor, but in the
+		// checkpoint coordinator time thread.
+		// we can remove the delegation once the checkpoint coordinator runs fully in the scheduler's
+		// main thread executor
+		mainThreadExecutor.execute(() -> checkpointCoordinatorInternal(checkpointId, result));
	}

	@Override
	public void checkpointComplete(long checkpointId) {
-		coordinator.checkpointComplete(checkpointId);
+		// unfortunately, this method does not run in the scheduler executor, but in the
+		// checkpoint coordinator time thread.
+		// we can remove the delegation once the checkpoint coordinator runs fully in the scheduler's
+		// main thread executor
+		mainThreadExecutor.execute(() -> checkpointCompleteInternal(checkpointId));
	}

	@Override
	public void resetToCheckpoint(byte[] checkpointData) throws Exception {
+		// ideally we would like to check this here, however this method is called early during
+		// execution graph construction, before the main thread executor is set
+
+		eventValve.reset();

Review comment:
   It seems we have a minor race condition here. It is possible that the 
coordinator sends some events between line 224 and line 225, i.e. after the 
eventValve is reset but before the coordinator has been reset to the checkpoint.
   
   I think those events won't actually be sent to the tasks, because the 
operator coordinators are only restored during a global recovery, and the tasks 
shouldn't have started yet. If so, there is no real impact here, except that 
those events will receive a different exception from the events failed by 
`eventValve.reset()`.
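   To make the window concrete, a rough sketch of the sequence in question (simplified; the interleaving shown is hypothetical):
   ```java
   // inside OperatorCoordinatorHolder#resetToCheckpoint (simplified sketch)
   eventValve.reset();                          // fails all currently buffered events
   // <-- window: the not-yet-reset coordinator may still send events here;
   //     they fail too, but with a different exception than those failed by reset()
   coordinator.resetToCheckpoint(checkpointData);
   ```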

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/operators/coordination/CoordinatorEventsExactlyOnceITCase.java
##
@@ -0,0 +1,494 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.operators.coordination;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.accumulators.ListAccumulator;
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.ConfigOptions;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.RestOptions;
+import org.apache.flink.runtime.checkpoint.CheckpointMetaData;
+import org.apache.flink.runtime.checkpoint.CheckpointMetrics;
+import org.apache.flink.runtime.checkpoint.CheckpointOptions;
+import org.apache.flink.runtime.checkpoint.OperatorSubtaskState;
+import org.apache.flink.runtime.checkpoint.PrioritizedOperatorSubtaskState;
+import org.apache.flink.runtime.checkpoint.StateObjectCollection;
+import org.apache.flink.runtime.checkpoint.TaskStateSnapshot;
+import org.apache.flink.runtime.execution.Environment;
+import org.apache.flink.runtime.jobgraph.JobGraph;

[jira] [Commented] (FLINK-17091) AvroRow(De)SerializationSchema doesn't support new Timestamp conversion classes

2020-05-29 Thread Paul Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120097#comment-17120097
 ] 

Paul Lin commented on FLINK-17091:
--

[~jark] I checked AvroRowDataDeserializationSchema, and yes, the problem should 
be fixed in it. Is AvroRow(De)SerializationSchema going to be deprecated in 
1.11? If so, this fix should only apply to 1.10.x.
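
For reference, a minimal sketch of the kind of conversions the 1.10.x fix needs 
(assuming millisecond-precision logical types; hypothetical helper names, not 
the actual patch):
{code:java}
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

class AvroTimeConversions {
    // serialization: java.time classes -> Avro's int/long representations
    static int toAvroDate(LocalDate date) {
        return (int) date.toEpochDay();                     // days since epoch (Avro "date")
    }

    static int toAvroTimeMillis(LocalTime time) {
        return (int) (time.toNanoOfDay() / 1_000_000L);     // millis of day (Avro "time-millis")
    }

    static long toAvroTimestampMillis(LocalDateTime ts) {
        return ts.toInstant(ZoneOffset.UTC).toEpochMilli(); // Avro "timestamp-millis"
    }

    // deserialization: the inverse, selected by the field's logical type
    static LocalDateTime fromAvroTimestampMillis(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }
}
{code}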

> AvroRow(De)SerializationSchema doesn't support new Timestamp conversion 
> classes
> ---
>
> Key: FLINK-17091
> URL: https://issues.apache.org/jira/browse/FLINK-17091
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.10.0
>Reporter: Paul Lin
>Assignee: Paul Lin
>Priority: Major
>
> AvroRow(De)SerializationSchema doesn't know how to convert the new physical 
> classes of Timestamp (e.g. java.time.LocalDateTime) to/from Avro's int/long 
> based timestamps. Currently, when encountering objects of the new physical 
> classes, AvroRow(De)SerializationSchema just passes them through to Avro's 
> GenericDatumWriter/Reader, which leads to a ClassCastException thrown by 
> GenericDatumWriter/Reader. See 
> [AvroRowSerializationSchema|https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/AvroRowSerializationSchema.java#L251].
> To fix this problem, we should support LocalTime/LocalDate/LocalDateTime 
> conversion to int/long in AvroRowSerializationSchema, and support int/long 
> conversion to LocalTime/LocalDate/LocalDateTime based on logical 
> types(Types.LOCAL_TIME/Types.LOCAL_DATE/Types.LOCAL_DATE_TIME) in 
> AvroRowDeserializationSchema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16085) Translate "Joins in Continuous Queries" page of "Streaming Concepts" into Chinese

2020-05-29 Thread Authuir (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120086#comment-17120086
 ] 

Authuir commented on FLINK-16085:
-

Hi [~jark], sorry for the delay. Actually it's nearing completion. I'll send 
out the PR today.

> Translate "Joins in Continuous Queries" page of "Streaming Concepts" into 
> Chinese 
> --
>
> Key: FLINK-16085
> URL: https://issues.apache.org/jira/browse/FLINK-16085
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Jark Wu
>Assignee: Authuir
>Priority: Major
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/streaming/joins.html
> The markdown file is located in {{flink/docs/dev/table/streaming/joins.zh.md}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12413: [FLINK-16198][core] Fix FileUtilsTest fails on MacOS

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12413:
URL: https://github.com/apache/flink/pull/12413#issuecomment-636264630


   
   ## CI report:
   
   * 9495b0c78ae7f16e5f5fa0c0e7baa265942b742f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2472)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17895) Default value of rows-per-second in datagen can be limited

2020-05-29 Thread liufangliang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120077#comment-17120077
 ] 

liufangliang commented on FLINK-17895:
--

[~lzljs3620320] Can you add a description?

> Default value of rows-per-second in datagen can be limited
> --
>
> Key: FLINK-17895
> URL: https://issues.apache.org/jira/browse/FLINK-17895
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Jingsong Lee
>Priority: Minor
>  Labels: starter
> Fix For: 1.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17736) Add flink CEP examples

2020-05-29 Thread liufangliang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120076#comment-17120076
 ] 

liufangliang commented on FLINK-17736:
--

[~lzljs3620320] Can you add a description?

> Add flink CEP examples
> --
>
> Key: FLINK-17736
> URL: https://issues.apache.org/jira/browse/FLINK-17736
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples, Library / CEP
>Reporter: dengziming
>Priority: Minor
>  Labels: pull-request-available, starter
>
> There is no Flink CEP example; we can add one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #12413: [FLINK-16198][core] Fix FileUtilsTest fails on MacOS

2020-05-29 Thread GitBox


flinkbot commented on pull request #12413:
URL: https://github.com/apache/flink/pull/12413#issuecomment-636264630


   
   ## CI report:
   
   * 9495b0c78ae7f16e5f5fa0c0e7baa265942b742f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18039) Change SourceCoordinator to handle resetToCheckpoint() call after started.

2020-05-29 Thread Jiangjie Qin (Jira)
Jiangjie Qin created FLINK-18039:


 Summary: Change SourceCoordinator to handle resetToCheckpoint() 
call after started.
 Key: FLINK-18039
 URL: https://issues.apache.org/jira/browse/FLINK-18039
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.11.0
Reporter: Jiangjie Qin


Right now the SourceCoordinator assumes that {{resetToCheckpoint()}} is only 
called before {{start()}} is called. We need to change the SourceCoordinator to 
handle the case when {{resetToCheckpoint()}} is invoked after the 
SourceCoordinator has started.
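
A rough sketch of the intended behavior (hypothetical field and helper names, 
not the actual implementation):

{code:java}
// inside SourceCoordinator (simplified, hypothetical)
public void resetToCheckpoint(byte[] checkpointData) throws Exception {
    if (!started) {
        // today's only supported path: remember the state until start()
        this.checkpointToRestore = checkpointData;
        return;
    }
    // new: tear down the running enumerator and restart it from the snapshot
    enumerator.close();
    enumerator = source.restoreEnumerator(context, deserializeCheckpoint(checkpointData));
    enumerator.start();
}
{code}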



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #12413: [FLINK-16198][core] Fix FileUtilsTest fails on MacOS

2020-05-29 Thread GitBox


flinkbot commented on pull request #12413:
URL: https://github.com/apache/flink/pull/12413#issuecomment-636263706


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 9495b0c78ae7f16e5f5fa0c0e7baa265942b742f (Sat May 30 
02:35:40 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-16198) FileUtilsTest fails on Mac OS

2020-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-16198:
---
Labels: pull-request-available starter  (was: starter)

> FileUtilsTest fails on Mac OS
> -
>
> Key: FLINK-16198
> URL: https://issues.apache.org/jira/browse/FLINK-16198
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems, Tests
>Reporter: Andrey Zagrebin
>Assignee: Zhe Yu
>Priority: Major
>  Labels: pull-request-available, starter
>
> The following tests fail if run on Mac OS (IDE/maven).
>  
> FileUtilsTest.testCompressionOnRelativePath: 
> {code:java}
> java.nio.file.NoSuchFileException: 
> ../../../../../var/folders/67/v4yp_42d21j6_n8k1h556h0cgn/T/junit6496651678375117676/compressDir/rootDirjava.nio.file.NoSuchFileException:
>  
> ../../../../../var/folders/67/v4yp_42d21j6_n8k1h556h0cgn/T/junit6496651678375117676/compressDir/rootDir
>  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at 
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>  at java.nio.file.Files.createDirectory(Files.java:674) at 
> org.apache.flink.util.FileUtilsTest.verifyDirectoryCompression(FileUtilsTest.java:440)
>  at 
> org.apache.flink.util.FileUtilsTest.testCompressionOnRelativePath(FileUtilsTest.java:261)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20) at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:137) at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>  at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
>  
> FileUtilsTest.testDeleteDirectoryConcurrently: 
> {code:java}
> java.nio.file.FileSystemException: 
> /var/folders/67/v4yp_42d21j6_n8k1h556h0cgn/T/junit7558825557740784886/junit3566161583262218465/ab1fa0bde8b22cad58b717508c7a7300/121fdf5f7b057183843ed2e1298f9b66/6598025f390d3084d69c98b36e542fe2/8db7cd9c063396a19a86f5b63ce53f66:
>  Invalid argument   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
>   at 
> sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
>   at java.nio.file.Files.deleteIfExists(Files.java:1165)
>   at 
> org.apache.flink.util.FileUtils.deleteFileOrDirectoryInternal(FileUtils.java:324)
>   at org.apache.flink.util.FileUtils.guardIfWindows(FileUtils.java:391)
>   at 
> org.apache.flink.util.FileUtils.deleteFileOrDirectory(FileUtils.java:258)
>   at 
> org.apache.flink.util.FileUtils.cleanDirectoryInternal(FileUtils.java:376)
>   at 
> org.apache.flink.util.FileUtils.deleteDirectoryInternal(FileUtils.java:335)
>   at 
> 

[GitHub] [flink] fishzhe opened a new pull request #12413: [FLINK-16198][core] Fix FileUtilsTest fails on MacOS

2020-05-29 Thread GitBox


fishzhe opened a new pull request #12413:
URL: https://github.com/apache/flink/pull/12413


   
   
   ## What is the purpose of the change
   
   Fix FileUtilsTest fails on MacOS
   
   ## Brief change log
   
 - *Fix concurrent delete failure on MacOS*
 - *Fix relative compression path test failure*
 
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is already covered by existing tests, such as *(FileUtilsTest.)*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no) no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no) no
 - The serializers: (yes / no / don't know) no
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know) no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know) no
 - The S3 file system connector: (yes / no / don't know) no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no) no
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18036) Chinese documentation build is broken

2020-05-29 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120005#comment-17120005
 ] 

Jark Wu commented on FLINK-18036:
-

[~wuxuyang78], do you want to fix this one together? 

> Chinese documentation build is broken
> -
>
> Key: FLINK-18036
> URL: https://issues.apache.org/jira/browse/FLINK-18036
> Project: Flink
>  Issue Type: Bug
>  Components: chinese-translation, Documentation
>Affects Versions: 1.11.0
>Reporter: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.11.0
>
>
> Log from one of the builders: 
> https://ci.apache.org/builders/flink-docs-master/builds/1848/steps/Build%20docs/logs/stdio
> The problem is that the Chinese doc uses "link" tags that refer to documents 
> from the English documentation. It should be as easy as adding {{.zh}} to 
> these links.
> It seems this change introduced the problem: 
> https://github.com/apache/flink/commit/d40abbf0309f414a6acf8a090c448ba397a08d9c



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] AHeise commented on a change in pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-29 Thread GitBox


AHeise commented on a change in pull request #12353:
URL: https://github.com/apache/flink/pull/12353#discussion_r43276



##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/BroadcastRecordWriter.java
##
@@ -130,7 +132,7 @@ public BufferBuilder requestNewBufferBuilder(int targetChannel) throws IOExcepti

	BufferBuilder builder = super.requestNewBufferBuilder(targetChannel);
	if (randomTriggered) {
-		addBufferConsumer(builder.createBufferConsumer(), targetChannel);
+		addBufferConsumer(randomConsumer = builder.createBufferConsumer(), targetChannel);

Review comment:
   You are right, currently the LM gets replicated, which is not good. I was 
expecting `#copy` to pick up the current builder position for some reason.
   
   Coming back to this bug, the issue is that the underlying memory segment 
will be recycled as soon as the first buffer consumer (with the LM) is 
consumed. That causes buffers to be effectively skipped and streams to be 
corrupted.
   
   @pnowojski and I were also questioning why `BufferBuilder` only holds the 
memory segment and not the buffer. By definition, calling 
`#createBufferConsumer` more than once will create the aforementioned race 
condition.
   
   It would be solved if `BufferBuilder` held the buffer instead of the raw 
memory segment, as a buffer would only be recycled when the last buffer 
consumer has been closed. However, it seems so obvious that there must be a 
reason for the current implementation.
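   
   For illustration, the hazardous sequence looks roughly like this (contrived sketch, not code from the PR):
   ```java
   BufferBuilder builder = requestNewBufferBuilder(targetChannel);
   BufferConsumer first  = builder.createBufferConsumer();  // wraps the raw memory segment
   BufferConsumer second = builder.createBufferConsumer();  // second consumer over the SAME segment
   
   first.close();   // recycles the underlying memory segment once consumed...
   // ...while `second` still reads from it -> skipped buffers / corrupted stream
   ```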
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18038) StateBackendLoader logs application-defined state before it is fully configured

2020-05-29 Thread Steve Bairos (Jira)
Steve Bairos created FLINK-18038:


 Summary: StateBackendLoader logs application-defined state before 
it is fully configured
 Key: FLINK-18038
 URL: https://issues.apache.org/jira/browse/FLINK-18038
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.9.1
Reporter: Steve Bairos


In the 
[StateBackendLoader|https://github.com/apache/flink/blob/bb46756b84940a6134910e74406bfaff4f2f37e9/flink-runtime/src/main/java/org/apache/flink/runtime/state/StateBackendLoader.java#L201],
 there's this log line:
{code:java}
logger.info("Using application-defined state backend: {}", fromApplication); 
{code}
This seems inaccurate, though, because immediately after logging this, if 
fromApplication is a ConfigurableStateBackend, we call its .configure() method 
and it is replaced by a newly configured StateBackend.

To me, it seems like it would be better if we logged the state backend after it 
has been fully configured. In the current setup, we get confusing logs like this: 
{code:java}
2020-05-29 21:39:44,387 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask   - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 's3://pinterest-montreal/checkpoints/xenon-dev-001-20191210/Xenon/BasicJavaStream', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=null, enableIncrementalCheckpointing=UNDEFINED, numberOfTransferingThreads=-1}
2020-05-29 21:39:44,387 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask   - Configuring application-defined state backend with job/cluster config
{code}
This makes it ambiguous whether settings in our flink-conf.yaml like 
"state.backend.incremental: true" are being applied properly.

 

I can make a diff for the change if there aren't any objections.
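
A minimal sketch of the proposed reordering (simplified; the exact configure() 
signature depends on the Flink version):
{code:java}
StateBackend backend = fromApplication;
if (backend instanceof ConfigurableStateBackend) {
    // apply the job/cluster config (e.g. state.backend.incremental) first
    backend = ((ConfigurableStateBackend) backend).configure(config, classLoader);
}
// log only after configuration, so the line shows the effective backend
logger.info("Using application-defined state backend: {}", backend);
{code}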



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12285: [FLINK-17445][State Processor] Add Scala support for OperatorTransformation

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12285:
URL: https://github.com/apache/flink/pull/12285#issuecomment-632360179


   
   ## CI report:
   
   * a61851595a23f50e7a282e702bf28cdcabe3e3d8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2468)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12380: [FLINK-17581][docs-zh] Update translation of S3 documentation

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12380:
URL: https://github.com/apache/flink/pull/12380#issuecomment-635291205


   
   ## CI report:
   
   * 823086d411e78ab2510834f96e7203d313026d9c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2467)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12234: [FLINK-16986][coordination] Provide exactly-once guaranteed around checkpoints and operator event sending

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12234:
URL: https://github.com/apache/flink/pull/12234#issuecomment-630433269


   
   ## CI report:
   
   * fe27eb7b98b31454c615e1ae2f07082825a10da3 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2471)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12401: [FLINK-16057] Optimize ContinuousFileReaderOperator

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12401:
URL: https://github.com/apache/flink/pull/12401#issuecomment-635845764


   
   ## CI report:
   
   * 3bde38fba011e5c786da8f072543b673a5b25eb0 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2466)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12234: [FLINK-16986][coordination] Provide exactly-once guaranteed around checkpoints and operator event sending

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12234:
URL: https://github.com/apache/flink/pull/12234#issuecomment-630433269


   
   ## CI report:
   
   * 90e782a60c9a9514716b2a2b501f133847b9dcc2 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2359)
 
   * fe27eb7b98b31454c615e1ae2f07082825a10da3 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2471)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] StephanEwen commented on pull request #12234: [FLINK-16986][coordination] Provide exactly-once guaranteed around checkpoints and operator event sending

2020-05-29 Thread GitBox


StephanEwen commented on pull request #12234:
URL: https://github.com/apache/flink/pull/12234#issuecomment-636172362


   Updated the PR to address the issue raised by @becketqin 
   
   This required a change in the `OperatorCoordinator` interface. To explain 
why, let's revisit the semantics we want to offer.
   
   The exactly-once semantics for the `OperatorCoordinator` are defined as 
follows:
 - The point in time when the checkpoint future is completed is considered 
the point in time when the coordinator's checkpoint takes place.
 - The OperatorCoordinator implementation must have a way of strictly 
ordering the sending of events and the completion of the checkpoint future (for 
example the same thread does both actions, or both actions are guarded by a 
mutex).
 - Every event sent before the checkpoint future is completed is considered 
before the checkpoint.
 - Every event sent after the checkpoint future is completed is considered 
to be after the checkpoint.
   
   The previous interface did not allow us to observe this point accurately. 
The future was created inside the application-specific OperatorCoordinator code 
and returned from the methods. By the time the scheduler/checkpointing code 
could observe the future (attach handlers to it), some small amount of time had 
inevitably passed.
   Within that time, the future could already be complete and some events could 
have been sent; in that case the scheduler/checkpointing code could not 
determine which events came before the completion of the future and which came 
after.
   
   We hence need to change the checkpointing method from
   ```java
   CompletableFuture<byte[]> checkpointCoordinator(long checkpointId) throws Exception;
   ```
   to
   ```java
   void checkpointCoordinator(long checkpointId, CompletableFuture<byte[]> result) throws Exception;
   ```
   
   The changed interface passes the future from the scheduler/checkpointing 
code into the coordinator. The future already has synchronous handlers attached 
to it which mark the exact point when the future is completed, allowing the 
scheduler/checkpointing code to observe the correct order in which the 
OperatorCoordinator implementation performed its actions (event sending, 
future completion).
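   
   To illustrate, a minimal sketch of the calling side under the new interface (assumed names; the real wiring lives in the scheduler/checkpointing code):
   ```java
   final CompletableFuture<byte[]> result = new CompletableFuture<>();
   
   // attach the handler BEFORE handing the future to the coordinator; because it
   // is registered first, it runs synchronously in the thread that completes the
   // future, marking the exact cut point between pre- and post-checkpoint events
   result.whenComplete((snapshot, failure) -> {
       if (failure == null) {
           markCheckpointCutPoint(checkpointId); // assumed helper, e.g. shutting the event valve
       }
   });
   
   coordinator.checkpointCoordinator(checkpointId, result);
   ```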



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12234: [FLINK-16986][coordination] Provide exactly-once guaranteed around checkpoints and operator event sending

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12234:
URL: https://github.com/apache/flink/pull/12234#issuecomment-630433269


   
   ## CI report:
   
   * 90e782a60c9a9514716b2a2b501f133847b9dcc2 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2359)
 
   * fe27eb7b98b31454c615e1ae2f07082825a10da3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12412: [FLINK-18011] Make WatermarkStrategy/WatermarkStrategies more ergonomic

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12412:
URL: https://github.com/apache/flink/pull/12412#issuecomment-636063842


   
   ## CI report:
   
   * 55b61138a63ce9fde288910e00e533798c8cbaec Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2465)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] StephanEwen commented on a change in pull request #12234: [FLINK-16986][coordination] Provide exactly-once guaranteed around checkpoints and operator event sending

2020-05-29 Thread GitBox


StephanEwen commented on a change in pull request #12234:
URL: https://github.com/apache/flink/pull/12234#discussion_r432706623



##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/OperatorCoordinatorCheckpoints.java
##
@@ -58,7 +58,17 @@
	final Collection<CompletableFuture<CoordinatorSnapshot>> individualSnapshots = new ArrayList<>(coordinators.size());

	for (final OperatorCoordinatorCheckpointContext coordinator : coordinators) {
-		individualSnapshots.add(triggerCoordinatorCheckpoint(coordinator, checkpointId));
+		final CompletableFuture<CoordinatorSnapshot> checkpointFuture = triggerCoordinatorCheckpoint(coordinator, checkpointId);
+		coordinator.onCallTriggerCheckpoint(checkpointId);
+
+		individualSnapshots.add(checkpointFuture);
+		checkpointFuture.whenComplete((ignored, failure) -> {
+			if (failure != null) {
+				coordinator.abortCurrentTriggering();
+			} else {
+				coordinator.onCheckpointStateFutureComplete(checkpointId);

Review comment:
   I found a way to implement this test scenario: 
https://github.com/apache/flink/pull/12234/commits/6af459b5a07576f48a600484515ac30d72fcc4eb#diff-89f8c4f657c748242a72e9efdf5e7f97R285
 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] TengHu commented on a change in pull request #12297: [FLINK-12855] [streaming-java][window-assigners] Add functionality that staggers panes on partitions to distribute workload.

2020-05-29 Thread GitBox


TengHu commented on a change in pull request #12297:
URL: https://github.com/apache/flink/pull/12297#discussion_r432696612



##
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/assigners/TumblingProcessingTimeWindows.java
##
@@ -46,28 +46,44 @@

	private final long size;

-	private final long offset;
+	private final long globalOffset;

-	private TumblingProcessingTimeWindows(long size, long offset) {
+	private Long staggerOffset = null;
+
+	private final WindowStagger windowStagger;
+
+	private TumblingProcessingTimeWindows(long size, long offset, WindowStagger windowStagger) {
		if (Math.abs(offset) >= size) {
			throw new IllegalArgumentException("TumblingProcessingTimeWindows parameters must satisfy abs(offset) < size");
		}

		this.size = size;
-		this.offset = offset;
+		this.globalOffset = offset;
+		this.windowStagger = windowStagger;
	}

	@Override
	public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
		final long now = context.getCurrentProcessingTime();
-		long start = TimeWindow.getWindowStartWithOffset(now, offset, size);
+		if (staggerOffset == null) {
+			staggerOffset = windowStagger.getStaggerOffset(context.getCurrentProcessingTime(), size);
Review comment:
   Yes, your interpretation is correct: the intention was to align the panes 
with respect to the first event ever ingested.
   
   Events arrive at different times among partitions; the differences are 
usually caused by the design of partition keys, network delay, etc., which 
leads to an implicit staggering (a normal distribution in our case).
   
   This is useful because, compared to ALIGNED and RANDOM, this one staggers 
the panes but still preserves some useful alignments (for example, in our 
geospatial applications, windows on events partitioned by cities still trigger 
at the same time if the cities are in the same time zone).
   
   Therefore, we think this would be a useful option for staggering.
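   
   For context, a sketch of how the new option could be wired up by a user (assuming the factory overload added in this PR; the keying is hypothetical):
   ```java
   input
       .keyBy(event -> event.getCity())
       .window(TumblingProcessingTimeWindows.of(
           Time.minutes(1),          // size
           Time.seconds(0),          // global offset
           WindowStagger.NATURAL))   // stagger each partition's first pane by arrival time
       .reduce((a, b) -> b);
   ```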





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-29 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119890#comment-17119890
 ] 

Dian Fu commented on FLINK-17923:
-

We assume that the Python worker uses task off-heap memory in this case and so 
it will still work on YARN.

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
>   ... 11 more
> Caused by: org.apache.flink.runtime.memory.MemoryAllocationException: Could 
> not created the shared memory resource of size 536870920. 

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-29 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119883#comment-17119883
 ] 

Stephan Ewen commented on FLINK-17923:
--

I think my suggestion from above is not feasible for 1.11.

[~dian.fu] changing Python to not use managed memory - would it still work on 
Yarn then? Or would we expect that the TM gets killed very often, for using too 
much memory?


> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
>   ... 11 

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-29 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119880#comment-17119880
 ] 

Stephan Ewen commented on FLINK-17923:
--

We could think about making the Python Processes the responsibility of the 
deployment framework.

  - The Kubernetes resource manager would deploy pods with multiple containers 
(one TM and multiple Python processes, one per slot for example).
  - The Yarn resource manager would ask for larger resource containers, to 
accommodate the additional memory that the python processes require.
  - In standalone, the machine simply needs to have enough memory, the same way 
as when starting a standalone session. It is the user's responsibility when 
they set up Flink.

In that case the managed memory would all go to RocksDB or to the batch 
algorithms.

As hinted above, this probably requires a change such that there is one Python 
process per TaskManager, or at least one Python process per slot. Then the 
deployment/resource manager can reason about this well. I am not sure what the 
implications of that change are for the Python language layer.

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 

[GitHub] [flink] flinkbot edited a comment on pull request #12285: [FLINK-17445][State Processor] Add Scala support for OperatorTransformation

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12285:
URL: https://github.com/apache/flink/pull/12285#issuecomment-632360179


   
   ## CI report:
   
   * 67e45d0cd91f2a0f3b7f233e9fb5006933ccad8b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2412)
 
   * a61851595a23f50e7a282e702bf28cdcabe3e3d8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2468)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12285: [FLINK-17445][State Processor] Add Scala support for OperatorTransformation

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12285:
URL: https://github.com/apache/flink/pull/12285#issuecomment-632360179


   
   ## CI report:
   
   * 67e45d0cd91f2a0f3b7f233e9fb5006933ccad8b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2412)
 
   * a61851595a23f50e7a282e702bf28cdcabe3e3d8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] carp84 commented on pull request #12368: [FLINK-15507][state backends] Activate local recovery by default (on release-1.11)

2020-05-29 Thread GitBox


carp84 commented on pull request #12368:
URL: https://github.com/apache/flink/pull/12368#issuecomment-636120661


   We've decided to postpone FLINK-15507 to 1.12.0 to have more thorough 
discussions, thus closing the PR directly (w/o merging).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] carp84 closed pull request #12368: [FLINK-15507][state backends] Activate local recovery by default (on release-1.11)

2020-05-29 Thread GitBox


carp84 closed pull request #12368:
URL: https://github.com/apache/flink/pull/12368


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-15507) Activate local recovery for RocksDB backends by default

2020-05-29 Thread Yu Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-15507:
--
Fix Version/s: (was: 1.11.0)
   1.12.0
 Priority: Critical  (was: Blocker)

Thanks all for sharing your thoughts! [~Zakelly] [~yunta] [~klion26]

After an offline discussion with [~sewen], we think it's not an easy decision 
to enable local recovery by default for all cases: for 
`RocksDBStateBackend` with full checkpoints, or for `FsStateBackend`, local 
recovery is much heavier and will make checkpointing slower.

So let's postpone the JIRA to 1.12 to have more thorough discussions, instead 
of rushing it in (smile).

Thanks.

> Activate local recovery for RocksDB backends by default
> ---
>
> Key: FLINK-15507
> URL: https://issues.apache.org/jira/browse/FLINK-15507
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Stephan Ewen
>Assignee: Zakelly Lan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> For the RocksDB state backend, local recovery has no overhead when 
> incremental checkpoints are used. 
> It should be activated by default, because it greatly helps with recovery.
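
For context, a minimal sketch of what the setting controls on the user side, assuming the `state.backend.local-recovery` option (`CheckpointingOptions.LOCAL_RECOVERY`) and the two-argument `RocksDBStateBackend` constructor whose second parameter enables incremental checkpoints; the checkpoint URI is a placeholder:

{code:java}
import org.apache.flink.configuration.CheckpointingOptions;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalRecoveryExample {

    public static void main(String[] args) throws Exception {
        // Local recovery is a cluster-side setting; in flink-conf.yaml it is
        // "state.backend.local-recovery: true". The Configuration object is
        // shown here only to name the option programmatically.
        Configuration conf = new Configuration();
        conf.setBoolean(CheckpointingOptions.LOCAL_RECOVERY, true);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Incremental checkpoints (second argument) are the case for which
        // local recovery adds no extra overhead, per the description above.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        env.enableCheckpointing(60_000);
    }
}
{code}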



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] carp84 commented on pull request #12312: [FLINK-15507][state backends] Activate local recovery by default

2020-05-29 Thread GitBox


carp84 commented on pull request #12312:
URL: https://github.com/apache/flink/pull/12312#issuecomment-636120242


   We've decided to postpone FLINK-15507 to 1.12.0 to have more thorough 
discussions, thus won't merge the PR at the moment.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12380: [FLINK-17581][docs-zh] Update translation of S3 documentation

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12380:
URL: https://github.com/apache/flink/pull/12380#issuecomment-635291205


   
   ## CI report:
   
   * 4bd1784410abe95638fafa4e1e46c086a6d59663 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2394)
 
   * 823086d411e78ab2510834f96e7203d313026d9c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2467)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12146: [FLINK-17968][hbase] Fix Hadoop Configuration is not properly serialized in HBaseRowInputFormat

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12146:
URL: https://github.com/apache/flink/pull/12146#issuecomment-628508688


   
   ## CI report:
   
   * 59820867eb46cdb2870ca6500ac93858923590f9 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2460)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17800) RocksDB optimizeForPointLookup results in missing time windows

2020-05-29 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119803#comment-17119803
 ] 

Yu Li commented on FLINK-17800:
---

Thanks for debugging and locating the root cause [~yunta].

From the analysis, this should be a RocksDB bug that requires a fix in 
FRocksDB and building a new version, which makes it not an easy fix. It is 
also a long-standing issue, but one with more severe impact since we made 
RocksDB the default timer store for the RocksDB backend in the 1.10.0 release, 
rather than a regression caused by changes in the 1.11 release cycle.

I agree we should try to fix it before 1.11.0, but I tend not to consider it a 
release blocker for the above reasons, and suggest downgrading the severity to 
Critical.

Please also estimate the time needed and see whether we could make it in 
1.11.0. Thanks.

> RocksDB optimizeForPointLookup results in missing time windows
> --
>
> Key: FLINK-17800
> URL: https://issues.apache.org/jira/browse/FLINK-17800
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Yordan Pavlov
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.11.0, 1.10.2
>
> Attachments: MissingWindows.scala, MyMissingWindows.scala, 
> MyMissingWindows.scala
>
>
> +My Setup:+
> We have been using the _RocksDB_ option of _optimizeForPointLookup_ and 
> running version 1.7 for years. Upon upgrading to Flink 1.10 we started 
> seeing a strange behavior of missing time windows on a streaming Flink 
> job. For the purpose of testing I experimented with previous Flink versions 
> (1.8, 1.9, 1.9.3) and none of them showed the problem.
>  
> A sample of the code demonstrating the problem is here:
> {code:java}
> val datastream = env
>   .addSource(KafkaSource.keyedElements(config.kafkaElements, List(config.kafkaBootstrapServer)))
> val result = datastream
>   .keyBy(_ => 1)
>   .timeWindow(Time.milliseconds(1))
>   .print()
> {code}
>  
>  
> The source consists of 3 streams (being either 3 Kafka partitions or 3 Kafka 
> topics); the elements in each of the streams are separately increasing. The 
> elements generate increasing event-time timestamps, starting from 1 and 
> increasing by 1. The first partition would consist of timestamps 1, 2, 10, 
> 15..., the second of 4, 5, 6, 11..., the third of 3, 7, 8, 9...
>  
> +What I observe:+
> The time windows would open as I expect for the first 127 timestamps. Then 
> there would be a huge gap with no opened windows; if the source has many 
> elements, the next open window would have a timestamp in the thousands. 
> A gap of hundreds of elements would be created, with what appear to be 'lost' 
> elements. Those elements are not reported as late (if tested with the 
> _sideOutputLateData_ operator). The way we have been using the option is by 
> setting it inside the config like so:
> ??etherbi.rocksDB.columnOptions.optimizeForPointLookup=268435456??
> We have been using it for performance reasons as we have a huge RocksDB 
> state.
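
For reference, the `optimizeForPointLookup` option is typically wired in through a custom options factory rather than the raw config key quoted above; a minimal sketch, assuming the Flink 1.10-era `RocksDBOptionsFactory` interface and that the quoted value 268435456 is a byte count (256 MB):

{code:java}
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Sketch only: this applies optimizeForPointLookup to every column family,
// including the one backing RocksDB-based timers -- which is where the
// missing-window behavior described above would show up.
public class PointLookupOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // optimizeForPointLookup takes the block cache size in MB.
        return currentOptions.optimizeForPointLookup(256);
    }
}
{code}

The factory would then be registered on the backend via `RocksDBStateBackend#setRocksDBOptions`.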



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12380: [FLINK-17581][docs-zh] Update translation of S3 documentation

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12380:
URL: https://github.com/apache/flink/pull/12380#issuecomment-635291205


   
   ## CI report:
   
   * 4bd1784410abe95638fafa4e1e46c086a6d59663 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2394)
 
   * 823086d411e78ab2510834f96e7203d313026d9c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wydhcws commented on pull request #12380: [FLINK-17581][docs-zh] Update translation of S3 documentation

2020-05-29 Thread GitBox


wydhcws commented on pull request #12380:
URL: https://github.com/apache/flink/pull/12380#issuecomment-636096206


   Thanks for your patience and careful review. I've fixed the problem. @klion26
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wydhcws commented on a change in pull request #12380: [FLINK-17581][docs-zh] Update translation of S3 documentation

2020-05-29 Thread GitBox


wydhcws commented on a change in pull request #12380:
URL: https://github.com/apache/flink/pull/12380#discussion_r432635762



##
File path: docs/ops/filesystems/s3.zh.md
##
@@ -58,11 +58,15 @@ env.setStateBackend(new 
FsStateBackend("s3:///"));
 Flink 提供两种文件系统用来与 S3 交互:`flink-s3-fs-presto` 和 
`flink-s3-fs-hadoop`。两种实现都是独立的且没有依赖项,因此使用时无需将 Hadoop 添加至 classpath。
 
   - `flink-s3-fs-presto`,通过 *s3://* 和 *s3p://* 两种 scheme 使用,基于 [Presto 
project](https://prestodb.io/)。
-  可以使用与[配置 Presto 
文件系统](https://prestodb.io/docs/0.187/connector/hive.html#amazon-s3-configuration)相同的方法进行配置,即将配置添加到
 `flink-conf.yaml` 文件中。推荐使用 Presto 文件系统来在 S3 中建立 checkpoint。
+  可以使用[和 Presto 
文件系统相同的配置项](https://prestodb.io/docs/0.187/connector/hive.html#amazon-s3-configuration)进行配置,方式为将配置添加到
 `flink-conf.yaml` 文件中。如果要在 S3 中使用 checkpoint,推荐使用 Presto S3 文件系统。
 
   - `flink-s3-fs-hadoop`,通过 *s3://* 和 *s3a://* 两种 scheme 使用, 基于 [Hadoop 
Project](https://hadoop.apache.org/)。
-  文件系统可以使用与 [Hadoop S3A 
完全相同的配置方法](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A)进行配置,即将配置添加到
 `flink-conf.yaml` 文件中。它是唯一一个支持 [StreamingFileSink]({{ 
site.baseurl}}/zh/dev/connectors/streamfile_sink.html) 的文件系统。
-
+  本文件系统可以使用类似 [Hadoop S3A 
的配置项](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A)进行配置,方式为将配置添加到
 `flink-conf.yaml` 文件中。
+  
+ 例如,Hadoop 有 `fs.s3a.connection.maximum` 的配置选项。 如果你想在 Flink 
程序中改变该配置的值,你需要将配置 `s3.connection.maximum: xyz` 添加到 `flink-conf.yaml` 文件中。Flink 
会内部将其转换成配置 `fs.s3a.connection.maximum`。 而无需通过 Hadoop 的XML 配置文件来传递参数。
+  
+另外,它是唯一支持 [StreamingFileSink]({{ 
site.baseurl}}/zh/dev/connectors/streamfile_sink.html) 的S3 文件系统。

Review comment:
   Thanks for your patience, this problem has been fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wydhcws commented on a change in pull request #12380: [FLINK-17581][docs-zh] Update translation of S3 documentation

2020-05-29 Thread GitBox


wydhcws commented on a change in pull request #12380:
URL: https://github.com/apache/flink/pull/12380#discussion_r432635557



##
File path: docs/ops/filesystems/s3.zh.md
##
@@ -58,11 +58,15 @@ env.setStateBackend(new 
FsStateBackend("s3:///"));
 Flink 提供两种文件系统用来与 S3 交互:`flink-s3-fs-presto` 和 
`flink-s3-fs-hadoop`。两种实现都是独立的且没有依赖项,因此使用时无需将 Hadoop 添加至 classpath。
 
   - `flink-s3-fs-presto`,通过 *s3://* 和 *s3p://* 两种 scheme 使用,基于 [Presto 
project](https://prestodb.io/)。
-  可以使用与[配置 Presto 
文件系统](https://prestodb.io/docs/0.187/connector/hive.html#amazon-s3-configuration)相同的方法进行配置,即将配置添加到
 `flink-conf.yaml` 文件中。推荐使用 Presto 文件系统来在 S3 中建立 checkpoint。
+  可以使用[和 Presto 
文件系统相同的配置项](https://prestodb.io/docs/0.187/connector/hive.html#amazon-s3-configuration)进行配置,方式为将配置添加到
 `flink-conf.yaml` 文件中。如果要在 S3 中使用 checkpoint,推荐使用 Presto S3 文件系统。
 
   - `flink-s3-fs-hadoop`,通过 *s3://* 和 *s3a://* 两种 scheme 使用, 基于 [Hadoop 
Project](https://hadoop.apache.org/)。
-  文件系统可以使用与 [Hadoop S3A 
完全相同的配置方法](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A)进行配置,即将配置添加到
 `flink-conf.yaml` 文件中。它是唯一一个支持 [StreamingFileSink]({{ 
site.baseurl}}/zh/dev/connectors/streamfile_sink.html) 的文件系统。
-
+  本文件系统可以使用类似 [Hadoop S3A 
的配置项](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A)进行配置,方式为将配置添加到
 `flink-conf.yaml` 文件中。
+  
+ 例如,Hadoop 有 `fs.s3a.connection.maximum` 的配置选项。 如果你想在 Flink 
程序中改变该配置的值,你需要将配置 `s3.connection.maximum: xyz` 添加到 `flink-conf.yaml` 文件中。Flink 
会内部将其转换成配置 `fs.s3a.connection.maximum`。 而无需通过 Hadoop 的XML 配置文件来传递参数。

Review comment:
   Thanks for your patience, this problem has been fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rkhachatryan commented on a change in pull request #12401: [FLINK-16057] Optimize ContinuousFileReaderOperator

2020-05-29 Thread GitBox


rkhachatryan commented on a change in pull request #12401:
URL: https://github.com/apache/flink/pull/12401#discussion_r432618358



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/mailbox/MailboxProcessor.java
##
@@ -114,6 +114,7 @@ public MailboxProcessor(
this.actionExecutor = 
Preconditions.checkNotNull(actionExecutor);
this.mailbox = Preconditions.checkNotNull(mailbox);
this.mainMailboxExecutor = 
Preconditions.checkNotNull(mainMailboxExecutor);
+   ((MailboxExecutorImpl) 
this.mainMailboxExecutor).setMailboxProcessor(this);

Review comment:
   Good idea!
   Removed `MailboxProcessor.mainMailboxExecutor` field.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12401: [FLINK-16057] Optimize ContinuousFileReaderOperator

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12401:
URL: https://github.com/apache/flink/pull/12401#issuecomment-635845764


   
   ## CI report:
   
   * 5ae178ffebe69903ebe8d09bdd8709c299c86dc0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2433)
 
   * 3bde38fba011e5c786da8f072543b673a5b25eb0 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2466)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12411: [FLINK-18005][table] Implement type inference for CAST

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12411:
URL: https://github.com/apache/flink/pull/12411#issuecomment-63596


   
   ## CI report:
   
   * a09af4cc6d36cdc635ac9f216c23d5a81266118e Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2457)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12379: [FLINK-18001][table-planner] Add a new test base for evaluating functions

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12379:
URL: https://github.com/apache/flink/pull/12379#issuecomment-635291122


   
   ## CI report:
   
   * f9fe38e9d7bc846467f0d713a00b1fcf7d5321e0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2464)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12412: [FLINK-18011] Make WatermarkStrategy/WatermarkStrategies more ergonomic

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12412:
URL: https://github.com/apache/flink/pull/12412#issuecomment-636063842


   
   ## CI report:
   
   * 55b61138a63ce9fde288910e00e533798c8cbaec Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2465)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12401: [FLINK-16057] Optimize ContinuousFileReaderOperator

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12401:
URL: https://github.com/apache/flink/pull/12401#issuecomment-635845764


   
   ## CI report:
   
   * 5ae178ffebe69903ebe8d09bdd8709c299c86dc0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2433)
 
   * 3bde38fba011e5c786da8f072543b673a5b25eb0 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12412: [FLINK-18011] Make WatermarkStrategy/WatermarkStrategies more ergonomic

2020-05-29 Thread GitBox


flinkbot commented on pull request #12412:
URL: https://github.com/apache/flink/pull/12412#issuecomment-636063842


   
   ## CI report:
   
   * 55b61138a63ce9fde288910e00e533798c8cbaec UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18037) The doc of StreamTaskNetworkInput.java may have a redundant 'status'

2020-05-29 Thread ZhuShang (Jira)
ZhuShang created FLINK-18037:


 Summary: The doc of StreamTaskNetworkInput.java may have a 
redundant 'status'
 Key: FLINK-18037
 URL: https://issues.apache.org/jira/browse/FLINK-18037
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.10.1, 1.10.0, 1.11.0
Reporter: ZhuShang


The doc of class StreamTaskNetworkInput reads as follows:
{code:java}
* Forwarding elements, watermarks, or status status elements must be 
protected by synchronizing
* on the given lock object. This ensures that we don't call methods on a
* {@link StreamInputProcessor} concurrently with the timer callback or other 
things.{code}
 
 Is one of the two 'status' words redundant?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] walterddr commented on a change in pull request #11586: [FLINK-5552][runtime] make JMXServer static per JVM

2020-05-29 Thread GitBox


walterddr commented on a change in pull request #11586:
URL: https://github.com/apache/flink/pull/11586#discussion_r432582640



##
File path: 
flink-core/src/main/java/org/apache/flink/configuration/JMXServerOptions.java
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.configuration;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.annotation.docs.Documentation;
+
+import static org.apache.flink.configuration.ConfigOptions.key;
+
+/**
+ * The set of configuration options relating to JMX server settings.
+ */
+@PublicEvolving
+public class JMXServerOptions {
+
+   /** Port configured to enable JMX server for metrics and debugging. */
+   @Documentation.Section(Documentation.Sections.EXPERT_DEBUGGING_AND_TUNING)
+   public static final ConfigOption<String> JMX_SERVER_PORT =
+   key("jmx.server.port")
+   .defaultValue("-1")

Review comment:
   Hmm, this is the part I am also struggling with: how to make the original 
`JMXReporter.port` configuration work as a "deprecated option" that still 
works.
   
   Ideally this JMX_SERVER_PORT config should override the one in 
`JMXReporter`, but the config string for that is actually dynamic, e.g. 
`jmxreporter.<name>.port`, which makes it hard to fall back to.
   
   I was trying to use "-1" here to indicate that the global JMXServer 
singleton should not be initialized during taskmanager / jobmaster startup, 
and should instead be deferred to the JMXReporter startup (that's where the 
reporter-scoped config can be accessed).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] walterddr commented on a change in pull request #11586: [FLINK-5552][runtime] make JMXServer static per JVM

2020-05-29 Thread GitBox


walterddr commented on a change in pull request #11586:
URL: https://github.com/apache/flink/pull/11586#discussion_r432579859



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/management/JMXServer.java
##
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.management;
+
+import org.apache.flink.configuration.JMXServerOptions;
+import org.apache.flink.util.NetUtils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.management.remote.JMXConnectorServer;
+import javax.management.remote.JMXServiceURL;
+import javax.management.remote.rmi.RMIConnectorServer;
+import javax.management.remote.rmi.RMIJRMPServerImpl;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.net.MalformedURLException;
+import java.rmi.NoSuchObjectException;
+import java.rmi.NotBoundException;
+import java.rmi.Remote;
+import java.rmi.RemoteException;
+import java.rmi.registry.Registry;
+import java.rmi.server.UnicastRemoteObject;
+import java.util.Iterator;
+import java.util.concurrent.atomic.AtomicReference;
+
+
+/**
+ * JMX Server implementation that JMX clients can connect to.
+ *
+ * Heavily based on j256 simplejmx project
+ *
+ * 
https://github.com/j256/simplejmx/blob/master/src/main/java/com/j256/simplejmx/server/JmxServer.java
+ */
+public class JMXServer {

Review comment:
   This actually is a great idea. I will do that. Thanks for the advice! :-)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW commented on a change in pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-29 Thread GitBox


zhijiangW commented on a change in pull request #12353:
URL: https://github.com/apache/flink/pull/12353#discussion_r432575584



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/BroadcastRecordWriter.java
##
@@ -130,7 +132,7 @@ public BufferBuilder requestNewBufferBuilder(int targetChannel) throws IOException
 
	BufferBuilder builder = super.requestNewBufferBuilder(targetChannel);
	if (randomTriggered) {
-		addBufferConsumer(builder.createBufferConsumer(), targetChannel);
+		addBufferConsumer(randomConsumer = builder.createBufferConsumer(), targetChannel);

Review comment:
   Yeah, your thought is right. But my concern is that the current changes 
would make the latency marker visible to all the channels, not only to one 
random channel, if my understanding is right.
   
   I mean the currently created `randomConsumer` will have read position `0`, 
so all the copies will also have read position `0`, which means they can read 
the latency marker once it is filled into the `BufferBuilder` afterwards.
   
   I guess the proper way might be to create only one consumer for the random 
channel before filling in the latency marker. After that, we create/copy the 
separate consumers with increased references for all the remaining channels, 
so that the read position of these copied consumers skips past the length of 
the latency marker?
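   
   To make the read-position concern concrete, a self-contained toy model (not the real `BufferConsumer` API; all names here are invented):
   
   ```
   // Toy model of the read-position semantics discussed above.
   public class ReadPositionToy {
   
       static class ToyConsumer {
           final int readPosition;
           ToyConsumer(int readPosition) { this.readPosition = readPosition; }
           // Copies inherit the creator's read position.
           ToyConsumer copy() { return new ToyConsumer(readPosition); }
       }
   
       public static void main(String[] args) {
           int writerPosition = 0;
   
           // Consumer created for the random channel before the marker is written.
           ToyConsumer randomConsumer = new ToyConsumer(writerPosition);
   
           // Writing the latency marker advances the writer position.
           int markerLength = 25;
           writerPosition += markerLength;
   
           // Problematic: a copy of randomConsumer starts at 0, so it also reads the marker.
           ToyConsumer badBroadcastCopy = randomConsumer.copy();
   
           // Proposed: create the broadcast consumers only after the marker is
           // written, so their read position already skips past it.
           ToyConsumer goodBroadcastCopy = new ToyConsumer(writerPosition);
   
           System.out.println("bad copy sees marker:  " + (badBroadcastCopy.readPosition < markerLength));  // true
           System.out.println("good copy sees marker: " + (goodBroadcastCopy.readPosition < markerLength)); // false
       }
   }
   ```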





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12412: [FLINK-18011] Make WatermarkStrategy/WatermarkStrategies more ergonomic

2020-05-29 Thread GitBox


flinkbot commented on pull request #12412:
URL: https://github.com/apache/flink/pull/12412#issuecomment-636047460


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 55b61138a63ce9fde288910e00e533798c8cbaec (Fri May 29 
15:51:43 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18011) Make WatermarkStrategy/WatermarkStrategies more ergonomic

2020-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18011:
---
Labels: pull-request-available  (was: )

> Make WatermarkStrategy/WatermarkStrategies more ergonomic
> -
>
> Key: FLINK-18011
> URL: https://issues.apache.org/jira/browse/FLINK-18011
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core, API / DataStream
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Currently, we have an interface {{WatermarkStrategy}}, which is a 
> {{TimestampAssignerSupplier}} and {{WatermarkGeneratorSupplier}}. The very 
> first design (which is also currently implemented) also added 
> {{WatermarkStrategies}} as a convenience builder for a {{WatermarkStrategy}}. 
> However, I don't think users will ever implement a {{WatermarkStrategy}} but 
> always wrap it in a builder. I also think that {{WatermarkStrategy}} itself 
> is already that builder and we currently have two levels of builders, which 
> also makes them harder to use in the {{DataStream API}} because of type 
> checking issues.
> I'm proposing to remove {{WatermarkStrategies}} and to instead put the static 
> methods directly into {{WatermarkStrategy}} and also to remove the 
> {{build()}} method. Instead of a {{build()}} method, API methods on 
> {{WatermarkStrategy}} just keep "piling" features on top of a base 
> {{WatermarkStrategy}} via wrapping.
> Example to show what I mean for the API (current):
> {code}
> DataStream<T> input = ...;
> input.assignTimestampsAndWatermarks(
> WatermarkStrategies.<T>forMonotonousTimestamps().build());
> {code}
> with the proposed change:
> {code}
> DataStream<T> input = ...;
> input.assignTimestampsAndWatermarks(
> WatermarkStrategy.forMonotonousTimestamps());
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] aljoscha commented on pull request #12412: [FLINK-18011] Make WatermarkStrategy/WatermarkStrategies more ergonomic

2020-05-29 Thread GitBox


aljoscha commented on pull request #12412:
URL: https://github.com/apache/flink/pull/12412#issuecomment-636046671


   By now I'm wondering whether we should make `WatermarkStrategy` 
`@PublicEvolving`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] aljoscha opened a new pull request #12412: [FLINK-18011] Make WatermarkStrategy/WatermarkStrategies more ergonomic

2020-05-29 Thread GitBox


aljoscha opened a new pull request #12412:
URL: https://github.com/apache/flink/pull/12412


   ## What is the purpose of the change
   
   From the Jira issue:
   Currently, we have an interface `WatermarkStrategy`, which is a 
`TimestampAssignerSupplier` and `WatermarkGeneratorSupplier`. The very first 
design (which is also currently implemented) also added `WatermarkStrategies` 
as a convenience builder for a `WatermarkStrategy`. However, I don't think 
users will ever implement a `WatermarkStrategy` but always wrap it in a 
builder. I also think that `WatermarkStrategy` itself is already that builder 
and we currently have two levels of builders, which also makes them harder to 
use in the `DataStream API` because of type checking issues.
   
   I'm proposing to remove `WatermarkStrategies` and to instead put the static 
methods directly into `WatermarkStrategy` and also to remove the `build()` 
method. Instead of a `build()` method, API methods on `WatermarkStrategy` just 
keep "piling" features on top of a base `WatermarkStrategy` via wrapping.
   
   Example to show what I mean for the API (current):
   ```
   DataStream<T> input = ...;
   input.assignTimestampsAndWatermarks(
   WatermarkStrategies.<T>forMonotonousTimestamps().build());
   ```
   
   with this change:
   ```
   DataStream<T> input = ...;
   input.assignTimestampsAndWatermarks(
   WatermarkStrategy.forMonotonousTimestamps());
   ```
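   
   As a usage sketch of how features then pile up via wrapping (assuming the `withTimestampAssigner` and `withIdleness` combinators kept by this change; the event type and lambda are illustrative):
   
   ```
   import java.time.Duration;
   
   import org.apache.flink.api.common.eventtime.WatermarkStrategy;
   import org.apache.flink.api.java.tuple.Tuple2;
   
   public class WatermarkStrategySketch {
       public static void main(String[] args) {
           // Each call wraps the base strategy; no builder or build() involved.
           WatermarkStrategy<Tuple2<String, Long>> strategy =
                   WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                           .withTimestampAssigner((event, recordTimestamp) -> event.f1)
                           .withIdleness(Duration.ofMinutes(1));
       }
   }
   ```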
   
   ## Brief change log
   
   - remove `WatermarkStrategies` and instead move all code to 
`WatermarkStrategy`
   
   
   ## Verifying this change
   
   - covered by existing tests
   - new tests are added to verify how `WatermarkStrategy` can be used on 
`DataStream.assignTimestampsAndWatermarks()`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: *yes*
   
   ## Documentation
   
   I didn't yet change the documentation. I would update that once we agree on 
the change.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12379: [FLINK-18001][table-planner] Add a new test base for evaluating functions

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12379:
URL: https://github.com/apache/flink/pull/12379#issuecomment-635291122


   
   ## CI report:
   
   * 4735a8269768a6732018f9ca18ac7448defd6db8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2451)
 
   * f9fe38e9d7bc846467f0d713a00b1fcf7d5321e0 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2464)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18036) Chinese documentation build is broken

2020-05-29 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119714#comment-17119714
 ] 

Aljoscha Krettek commented on FLINK-18036:
--

Yes, that's the same issue.

> Chinese documentation build is broken
> -
>
> Key: FLINK-18036
> URL: https://issues.apache.org/jira/browse/FLINK-18036
> Project: Flink
>  Issue Type: Bug
>  Components: chinese-translation, Documentation
>Affects Versions: 1.11.0
>Reporter: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.11.0
>
>
> Log from one of the builders: 
> https://ci.apache.org/builders/flink-docs-master/builds/1848/steps/Build%20docs/logs/stdio
> The problem is that the Chinese doc uses "link" tags that refer to documents 
> from the English documentation. It should be as easy as adding {{.zh}} to 
> these links.
> It seems this change introduced the problem: 
> https://github.com/apache/flink/commit/d40abbf0309f414a6acf8a090c448ba397a08d9c



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12379: [FLINK-18001][table-planner] Add a new test base for evaluating functions

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12379:
URL: https://github.com/apache/flink/pull/12379#issuecomment-635291122


   
   ## CI report:
   
   * 4735a8269768a6732018f9ca18ac7448defd6db8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2451)
 
   * f9fe38e9d7bc846467f0d713a00b1fcf7d5321e0 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW edited a comment on pull request #12406: [FLINK-17994][checkpointing] Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived

2020-05-29 Thread GitBox


zhijiangW edited a comment on pull request #12406:
URL: https://github.com/apache/flink/pull/12406#issuecomment-636036676


   Thanks for the efficient review @pnowojski !
   
   >As far as I understand it, it's not a full fix, as EndOfPartitionEvents or 
checkpoint barrier cancellation markers can still cause the same race 
condition, and you want to fix it in a follow up PR, right? 
   
   Exactly, this PR only resolves part of the race condition, and I also think 
there are still some issues while handling the EOF and cancellation markers, 
which seem hard to reproduce in `UnalignedCheckpointITCase`. Furthermore, the 
current logic inside `CheckpointBarrierUnaligner` is quite complicated and 
actually totally different from my initial PoC version. I will try to refactor 
some of the logic to simplify the process a bit while resolving the other two 
issues, so I chose to resolve them separately to not delay the whole progress.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12406: [FLINK-17994][checkpointing] Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12406:
URL: https://github.com/apache/flink/pull/12406#issuecomment-635907759


   
   ## CI report:
   
   * bee02498f028f38676813dde3b99d318ace084aa Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2452)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12408: [FLINK-15547][hive] Add test for avro table

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12408:
URL: https://github.com/apache/flink/pull/12408#issuecomment-635936224


   
   ## CI report:
   
   * c66db1484851da00f3b54b1d8302b60053f97c88 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2453)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW edited a comment on pull request #12406: [FLINK-17994][checkpointing] Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived

2020-05-29 Thread GitBox


zhijiangW edited a comment on pull request #12406:
URL: https://github.com/apache/flink/pull/12406#issuecomment-636036676


   Thanks for the efficient review @pnowojski !
   
   >As far as I understand it, it's not a full fix, as EndOfPartitionEvents or 
checkpoint barrier cancellation markers can still cause the same race 
condition, and you want to fix it in a follow up PR, right? 
   
   Exactly, it only resolves partial race condition in this PR, and i also 
thought these still has some issues while handing EOF and cancellation marker, 
which seems hard to reproduce in `UnalignedCheckpointITCase`. Furthermore, 
considering the current logic inside `CheckpointBarrierUnaligner` is too 
complicated, actually it is totally different with my initial PoC version. I 
try to refactor some logics to simplify the process a bit while resolving the 
other two issues, so i choose to resolve them separately to not delay the whole 
progress.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhijiangW commented on pull request #12406: [FLINK-17994][checkpointing] Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived

2020-05-29 Thread GitBox


zhijiangW commented on pull request #12406:
URL: https://github.com/apache/flink/pull/12406#issuecomment-636036676


   >As far as I understand it, it's not a full fix, as EndOfPartitionEvents or 
checkpoint barrier cancellation markers can still cause the same race 
condition, and you want to fix it in a follow up PR, right? 
   
   Exactly, this PR only resolves part of the race condition, and I also think 
there are still some issues while handling EOF and the cancellation marker, 
which seem hard to reproduce in `UnalignedCheckpointITCase`. Furthermore, 
considering that the current logic inside `CheckpointBarrierUnaligner` is too 
complicated (it is actually quite different from my initial PoC version), I 
will try to refactor some of the logic to simplify the process a bit while 
resolving the other two issues. That is why I chose to resolve them 
separately, so as not to delay the overall progress.







[jira] [Commented] (FLINK-17706) Clarify licensing situation for flink-benchmarks

2020-05-29 Thread Roman Grebennikov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119704#comment-17119704
 ] 

Roman Grebennikov commented on FLINK-17706:
---

I'm OK with licensing my contributions under the Apache License 2.0.

> Clarify licensing situation for flink-benchmarks 
> -
>
> Key: FLINK-17706
> URL: https://issues.apache.org/jira/browse/FLINK-17706
> Project: Flink
>  Issue Type: Sub-task
>  Components: Benchmarks
>Affects Versions: 1.11.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
> Fix For: 1.11.0
>
>
> After enabling the rat plugin, it finds the following files with missing or 
> invalid license headers, broken down by all contributors I could find in the 
> git history. If I see this correctly, every contributor should acknowledge 
> the change of their files to the Apache License and then we could add the 
> license headers and continue the project move:
>  * [~rgrebennikov] + [~NicoK]
> {code:java}
>   
> src/main/java/org/apache/flink/benchmark/full/PojoSerializationBenchmark.java
>   
> src/main/java/org/apache/flink/benchmark/full/StringSerializationBenchmark.java
>  {code}
>  * [~sunhaibotb]
> {code:java}
>   src/main/java/org/apache/flink/benchmark/functions/QueuingLongSource.java
>   
> src/main/java/org/apache/flink/benchmark/operators/MultiplyByTwoCoStreamMap.java{code}
>  * [~pnowojski]
> {code:java}
>   src/main/java/org/apache/flink/benchmark/functions/IntLongApplications.java
>   src/main/java/org/apache/flink/benchmark/functions/IntegerLongSource.java
>   src/main/java/org/apache/flink/benchmark/functions/LongSource.java
>   src/main/java/org/apache/flink/benchmark/functions/MultiplyByTwo.java
>   src/main/java/org/apache/flink/benchmark/functions/MultiplyIntLongByTwo.java
>   src/main/java/org/apache/flink/benchmark/functions/SuccessException.java
>   src/main/java/org/apache/flink/benchmark/functions/SumReduce.java
>   src/main/java/org/apache/flink/benchmark/functions/SumReduceIntLong.java
>   src/main/java/org/apache/flink/benchmark/functions/ValidatingCounter.java
>   src/main/java/org/apache/flink/benchmark/CollectSink.java{code}
>  * [~pnowojski]  + [~sunhaibotb]  + [~xintongsong]
> {code:java}
>src/main/java/org/apache/flink/benchmark/FlinkEnvironmentContext.java{code}
>  * [~NicoK]
> {code:java}
>   src/main/resources/avro/mypojo.avsc
>   src/main/resources/protobuf/MyPojo.proto
>   src/main/resources/thrift/mypojo.thrift{code}
>  * [~pnowojski] + [~liyu]
> {code:java}
>   save_jmh_result.py{code}
> The license should be clarified with the author and all contributors of that 
> file.
> Please, every tagged contributor, express your decision (whether you are fine 
> with the change) below, so we can continue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zhijiangW commented on a change in pull request #12406: [FLINK-17994][checkpointing] Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceiv

2020-05-29 Thread GitBox


zhijiangW commented on a change in pull request #12406:
URL: https://github.com/apache/flink/pull/12406#discussion_r432561333



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/CheckpointBarrierUnaligner.java
##
@@ -164,33 +158,32 @@ public void processBarrier(
hasInflightBuffers[channelIndex] = false;
numBarrierConsumed++;
}
-   // processBarrier is called from task thread and can actually 
happen before notifyBarrierReceived on empty
-   // buffer queues
-   // to avoid replicating any logic, we simply call 
notifyBarrierReceived here as well

Review comment:
   It indeed provides a bit richer information than the javadoc of this 
method, so I will consider retaining it. 









[GitHub] [flink] zhijiangW commented on a change in pull request #12406: [FLINK-17994][checkpointing] Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceiv

2020-05-29 Thread GitBox


zhijiangW commented on a change in pull request #12406:
URL: https://github.com/apache/flink/pull/12406#discussion_r432559532



##
File path: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/io/CheckpointBarrierUnalignerTest.java
##
@@ -639,4 +673,79 @@ public long getLastCanceledCheckpointId() {
return lastCanceledCheckpointId;
}
}
+
+   /**
+* Specific {@link AbstractInvokable} implementation to record and 
validate which checkpoint
+* id is executed and how many checkpoints are executed.
+*/
+   private static final class ValidatingCheckpointInvokable extends 
AbstractInvokable {
+
+   private long expectedCheckpointId;
+
+   private int totalNumCheckpoints;
+
+   ValidatingCheckpointInvokable() {
+   super(new DummyEnvironment("test", 1, 0));
+   }
+
+   public void invoke() {
+   throw new UnsupportedOperationException();
+   }
+
+   public void triggerCheckpointOnBarrier(
+   CheckpointMetaData checkpointMetaData,
+   CheckpointOptions checkpointOptions,
+   CheckpointMetrics checkpointMetrics) {
+   expectedCheckpointId = 
checkpointMetaData.getCheckpointId();
+   totalNumCheckpoints++;
+   }
+
+   @Override
+   public <E extends Throwable> void executeInTaskThread(
+   ThrowingRunnable<E> runnable,
+   String descriptionFormat,
+   Object... descriptionArgs) throws E {
+   runnable.run();
+   }
+
+   long getTriggeredCheckpointId() {
+   return expectedCheckpointId;
+   }
+
+   int getTotalTriggeredCheckpoints() {
+   return totalNumCheckpoints;
+   }
+   }
+
+   /**
+* Specific {@link CheckpointBarrierUnaligner} implementation to mock 
the scenario that the later triggered
+* checkpoint executes before the preceding triggered checkpoint.
+*/
+   private static final class ValidatingCheckpointBarrierUnaligner extends 
CheckpointBarrierUnaligner {
+
+   private ThrowingRunnable<?> waitingRunnable;
+   private boolean firstRunnable = true;
+
+   ValidatingCheckpointBarrierUnaligner(AbstractInvokable 
invokable) {
+   super(
+   new int[]{1},
+   new ChannelStateWriter.NoOpChannelStateWriter(),
+   "test",
+   invokable);
+   }
+
+   @Override
+   protected <E extends Throwable> void executeInTaskThread(
+   ThrowingRunnable<E> runnable,
+   String descriptionFormat,
+   Object... descriptionArgs) throws E {
+   if (firstRunnable) {
+   waitingRunnable = runnable;
+   firstRunnable = false;
+   } else {
+   super.executeInTaskThread(runnable, 
"checkpoint");
+   super.executeInTaskThread(waitingRunnable, 
"checkpoint");
+   }
+   }
+   }

Review comment:
   I indeed considered verifying the race condition via a real 
`AbstractInvokable` with `TaskMailbox`, but I also thought that these two 
components are a bit far away from `CheckpointBarrierHandler`, and that they 
are rather heavy-weight components in themselves. 
   
   To touch fewer external components in the unit tests, I chose the current 
way: I bypassed the mailbox implementation and simulate the race condition by 
executing the runnables out of order. The purpose of introducing 
`ValidatingCheckpointInvokable` and `ValidatingCheckpointBarrierUnaligner` is 
just to avoid relying on the external components `AbstractInvokable` and 
`TaskMailbox` in the unit tests.
   
   This test verifies the interplay of 
`CheckpointBarrierUnaligner#processBarrier` and `#notifyBarrierReceived`, to 
confirm that the newly introduced method 
`CheckpointBarrierUnaligner#notifyCheckpoint` really takes effect in these 
interactions. All three methods are actually exercised in this test.
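
   The out-of-order trick can be boiled down to a tiny, self-contained sketch 
(made-up names, not the actual test code): hold back the first runnable and 
replay it only after the second one has run.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simulates the later-triggered checkpoint action overtaking the earlier one.
public class MisorderingExecutor {

    private final Deque<Runnable> held = new ArrayDeque<>();
    private boolean holdFirst = true;

    public void execute(Runnable task) {
        if (holdFirst) {
            held.push(task);      // park the first (earlier) checkpoint action
            holdFirst = false;
        } else {
            task.run();           // the later-triggered action runs first...
            held.pop().run();     // ...then the earlier one, out of order
        }
    }

    public static void main(String[] args) {
        MisorderingExecutor executor = new MisorderingExecutor();
        executor.execute(() -> System.out.println("checkpoint 1"));
        executor.execute(() -> System.out.println("checkpoint 2")); // prints "checkpoint 2", then "checkpoint 1"
    }
}
```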
   
   From another aspect, for the interaction between two components it is better 
to verify the real interactions using two real components, without 
re-implementing either side; then any internal core changes in either 
component will be reflected in the test. In this case, the 
`CheckpointBarrierUnaligner` component will interact with `AbstractInvokable` 
with 

[jira] [Assigned] (FLINK-17918) Jobs with two input operators are losing data with unaligned checkpoints

2020-05-29 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski reassigned FLINK-17918:
--

Assignee: Arvid Heise  (was: Yingjie Cao)

> Jobs with two input operators are losing data with unaligned checkpoints
> -
>
> Key: FLINK-17918
> URL: https://issues.apache.org/jira/browse/FLINK-17918
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Network
>Affects Versions: 1.11.0
>Reporter: Piotr Nowojski
>Assignee: Arvid Heise
>Priority: Blocker
> Fix For: 1.11.0
>
>
> After trying to enable unaligned checkpoints by default, a lot of Blink 
> streaming SQL/Table API tests containing joins or set operations are throwing 
> errors that are indicating we are losing some data (full records, without 
> deserialisation errors). Example errors:
> {noformat}
> [ERROR] Failures: 
> [ERROR]   JoinITCase.testFullJoinWithEqualPk:775 expected: 3,3, null,4, null,5)> but was:
> [ERROR]   JoinITCase.testStreamJoinWithSameRecord:391 expected: 1,1,1,1, 2,2,2,2, 2,2,2,2, 3,3,3,3, 3,3,3,3, 4,4,4,4, 4,4,4,4, 5,5,5,5, 
> 5,5,5,5)> but was:
> [ERROR]   SemiAntiJoinStreamITCase.testAntiJoin:352 expected:<0> but was:<1>
> [ERROR]   SetOperatorsITCase.testIntersect:55 expected: 2,2,Hello, 3,2,Hello world)> but was:
> [ERROR]   JoinITCase.testJoinPushThroughJoin:1272 expected: 2,1,Hello, 2,1,Hello world)> but was:
> {noformat}
>  





[jira] [Closed] (FLINK-17842) Performance regression on 19.05.2020

2020-05-29 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-17842.
--
Resolution: Fixed

Second fix merged to release-1.11 as acb16cbdc3 and to master as 955a683.

> Performance regression on 19.05.2020
> 
>
> Key: FLINK-17842
> URL: https://issues.apache.org/jira/browse/FLINK-17842
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks
>Affects Versions: 1.11.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> There is a noticeable performance regression in many benchmarks:
> http://codespeed.dak8s.net:8000/timeline/?ben=serializerHeavyString=2
> http://codespeed.dak8s.net:8000/timeline/?ben=networkThroughput.1000,1ms=2
> http://codespeed.dak8s.net:8000/timeline/?ben=networkThroughput.100,100ms=2
> http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2
> that happened on May 19th, probably between 260ef2c and 2f18138





[GitHub] [flink] pnowojski merged pull request #12407: [FLINK-17842][network] Remove NextRecordResponse to improve deserialisation performance

2020-05-29 Thread GitBox


pnowojski merged pull request #12407:
URL: https://github.com/apache/flink/pull/12407


   







[GitHub] [flink] twalthr commented on pull request #12379: [FLINK-18001][table-planner] Add a new test base for evaluating functions

2020-05-29 Thread GitBox


twalthr commented on pull request #12379:
URL: https://github.com/apache/flink/pull/12379#issuecomment-636015882


   Thanks @dawidwys. I will check what I can do and merge this once I get a 
green build.







[GitHub] [flink] flinkbot edited a comment on pull request #12407: [FLINK-17842][network] Remove NextRecordResponse to improve deserialisation performance

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12407:
URL: https://github.com/apache/flink/pull/12407#issuecomment-635920790


   
   ## CI report:
   
   * cc1b0b49d02982ca88dfea41393cbbf34748806b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2448)
 
   * 695516408f92a0dc4138ff2aa046a72eab78827a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2461)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12409: [FLINK-18035][runtime] Use fixed thread pool

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12409:
URL: https://github.com/apache/flink/pull/12409#issuecomment-635946066


   
   ## CI report:
   
   * ab960b968d07be175fe6a94f130f4d1f0fd051b3 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2455)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12256: [FLINK-17018][runtime] Allocates slots in bulks for pipelined region scheduling

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12256:
URL: https://github.com/apache/flink/pull/12256#issuecomment-631025695


   
   ## CI report:
   
   * 8ba2a2eb65072f71c66055c3f86549a061eb1514 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2445)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12406: [FLINK-17994][checkpointing] Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12406:
URL: https://github.com/apache/flink/pull/12406#issuecomment-635907759


   
   ## CI report:
   
   * 67af5b88a2032e6515d41c26586f2fba88cbcd0e Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2447)
 
   * bee02498f028f38676813dde3b99d318ace084aa Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2452)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12381: [FLINK-17941][sql-client] Switching catalog or database doesn't work …

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12381:
URL: https://github.com/apache/flink/pull/12381#issuecomment-635335584


   
   ## CI report:
   
   * 007076ff616543b838b538001432546667c78be9 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2446)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12146: [FLINK-17968][hbase] Fix Hadoop Configuration is not properly serialized in HBaseRowInputFormat

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12146:
URL: https://github.com/apache/flink/pull/12146#issuecomment-628508688


   
   ## CI report:
   
   * 2e9f1a46dd294b87fc5577a36d1fdd634ac3c64e Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1268)
 
   * 59820867eb46cdb2870ca6500ac93858923590f9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2460)
 
   
   







[jira] [Comment Edited] (FLINK-17500) Deploy JobGraph from file in StandaloneClusterEntrypoint

2020-05-29 Thread Fabian Paul (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119656#comment-17119656
 ] 

Fabian Paul edited comment on FLINK-17500 at 5/29/20, 2:21 PM:
---

Sorry for the late reply and obviously no progress from my side. I have had a 
lot of fruitful discussions with all the committers involved and I also 
reiterated a couple of times what the right approach could be.

Finally, we have decided that, in its current state, this is not a feature 
that helps the community in any way. I would propose to close this ticket for 
now.


was (Author: fabian.paul):
Sorry for the late reply and obviously no progress from my side. I have a lot 
of fruitful discussions with all the committers involved and I also 
reiterated a couple of times what the right approach could be.

Finally, we have decided that, in its current state, this is not a feature 
that helps the community in any way. I would propose to close this ticket for 
now.

> Deploy JobGraph from file in StandaloneClusterEntrypoint
> 
>
> Key: FLINK-17500
> URL: https://issues.apache.org/jira/browse/FLINK-17500
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / Docker
>Reporter: Ufuk Celebi
>Assignee: Fabian Paul
>Priority: Minor
>
> We have a requirement to deploy a pre-generated {{JobGraph}} from a file in 
> {{StandaloneClusterEntrypoint}}.
> Currently, {{StandaloneClusterEntrypoint}} only supports deployment of a 
> Flink job from the class path using {{ClassPathPackagedProgramRetriever}}. 
> Our desired behaviour would be as follows:
> If {{internal.jobgraph-path}} is set, prepare a {{PackagedProgram}} from a 
> local {{JobGraph}} file using {{FileJobGraphRetriever}}. Otherwise, deploy 
> using {{ClassPathPackagedProgramRetriever}} (current behavior).
> ---
> I understand that this requirement is pretty niche, but wanted to get 
> feedback whether the Flink community would be open to supporting this 
> nonetheless.
>  
>  





[jira] [Comment Edited] (FLINK-17500) Deploy JobGraph from file in StandaloneClusterEntrypoint

2020-05-29 Thread Fabian Paul (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119656#comment-17119656
 ] 

Fabian Paul edited comment on FLINK-17500 at 5/29/20, 2:21 PM:
---

Sorry for the late reply and obviously no progress from my side. I have a lot 
of fruitful discussions with all the committers involved and I also 
reiterated a couple of times what the right approach could be.

Finally, we have decided that, in its current state, this is not a feature 
that helps the community in any way. I would propose to close this ticket for 
now.


was (Author: fabian.paul):
Sorry for the late reply and obviously no progress from my side. I have a lot 
of fruitful discussions with all the committers involved and I also 
reiterated a couple of times what the right approach could be.

Finally, we have decided that, in its current state, this is not a feature 
that helps the community in any way. I would propose to close this ticket for 
now.

> Deploy JobGraph from file in StandaloneClusterEntrypoint
> 
>
> Key: FLINK-17500
> URL: https://issues.apache.org/jira/browse/FLINK-17500
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / Docker
>Reporter: Ufuk Celebi
>Assignee: Fabian Paul
>Priority: Minor
>
> We have a requirement to deploy a pre-generated {{JobGraph}} from a file in 
> {{StandaloneClusterEntrypoint}}.
> Currently, {{StandaloneClusterEntrypoint}} only supports deployment of a 
> Flink job from the class path using {{ClassPathPackagedProgramRetriever}}. 
> Our desired behaviour would be as follows:
> If {{internal.jobgraph-path}} is set, prepare a {{PackagedProgram}} from a 
> local {{JobGraph}} file using {{FileJobGraphRetriever}}. Otherwise, deploy 
> using {{ClassPathPackagedProgramRetriever}} (current behavior).
> ---
> I understand that this requirement is pretty niche, but wanted to get 
> feedback whether the Flink community would be open to supporting this 
> nonetheless.
>  
>  





[jira] [Commented] (FLINK-17500) Deploy JobGraph from file in StandaloneClusterEntrypoint

2020-05-29 Thread Fabian Paul (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119656#comment-17119656
 ] 

Fabian Paul commented on FLINK-17500:
-

Sorry for the late reply and obviously no progress from my side. I have a lot 
of fruitful discussions with all the committers involved and I also 
reiterated a couple of times what the right approach could be.

Finally, we have decided that, in its current state, this is not a feature 
that helps the community in any way. I would propose to close this ticket for 
now.

> Deploy JobGraph from file in StandaloneClusterEntrypoint
> 
>
> Key: FLINK-17500
> URL: https://issues.apache.org/jira/browse/FLINK-17500
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / Docker
>Reporter: Ufuk Celebi
>Assignee: Fabian Paul
>Priority: Minor
>
> We have a requirement to deploy a pre-generated {{JobGraph}} from a file in 
> {{StandaloneClusterEntrypoint}}.
> Currently, {{StandaloneClusterEntrypoint}} only supports deployment of a 
> Flink job from the class path using {{ClassPathPackagedProgramRetriever}}. 
> Our desired behaviour would be as follows:
> If {{internal.jobgraph-path}} is set, prepare a {{PackagedProgram}} from a 
> local {{JobGraph}} file using {{FileJobGraphRetriever}}. Otherwise, deploy 
> using {{ClassPathPackagedProgramRetriever}} (current behavior).
> ---
> I understand that this requirement is pretty niche, but wanted to get 
> feedback whether the Flink community would be open to supporting this 
> nonetheless.
>  
>  





[GitHub] [flink] flinkbot edited a comment on pull request #12407: [FLINK-17842][network] Remove NextRecordResponse to improve deserialisation performance

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12407:
URL: https://github.com/apache/flink/pull/12407#issuecomment-635920790


   
   ## CI report:
   
   * cc1b0b49d02982ca88dfea41393cbbf34748806b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2448)
 
   * 695516408f92a0dc4138ff2aa046a72eab78827a UNKNOWN
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12146: [FLINK-17968][hbase] Fix Hadoop Configuration is not properly serialized in HBaseRowInputFormat

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12146:
URL: https://github.com/apache/flink/pull/12146#issuecomment-628508688


   
   ## CI report:
   
   * 2e9f1a46dd294b87fc5577a36d1fdd634ac3c64e Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1268)
 
   * 59820867eb46cdb2870ca6500ac93858923590f9 UNKNOWN
   
   







[jira] [Commented] (FLINK-17961) Create an Elasticsearch source

2020-05-29 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119635#comment-17119635
 ] 

Etienne Chauchot commented on FLINK-17961:
--

Thanks Aljoscha for commenting. ES has a data streams feature, but only for 
time-series data; the aim of this source is to read all kinds of data. Apart 
from data streams, ES behaves like a database: you read the content of an 
index (similar to a table) that matches a given query (similar to SQL). So, 
regarding streaming changes: if there are changes between two read requests, 
the second request will read the whole index (containing the change) another 
time. Regarding failover: I guess exactly-once semantics cannot be 
guaranteed, only at-least-once, since there is no ack mechanism for 
already-read data. Under those circumstances, I guess such an ES source 
cannot get into Flink. So what should a user do to read from ES? Should they 
send ES requests manually from a Map?
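
A minimal sketch of that polling pattern (host, index name and poll interval 
are assumptions, and this is not the mnubo or Bahir code) would look roughly 
like this; with no ack/offset tracking, a restart replays everything, hence 
at-least-once at best:

{code:java}
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class PollingEsSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            while (running) {
                // Re-read the whole index on every poll; a query could narrow it down.
                SearchRequest request = new SearchRequest("my-index")
                        .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
                SearchResponse response = client.search(request, RequestOptions.DEFAULT);
                for (SearchHit hit : response.getHits()) {
                    ctx.collect(hit.getSourceAsString()); // no ack => at-least-once
                }
                Thread.sleep(10_000); // changes show up on the next full re-read
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
{code}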

> Create an Elasticsearch source
> --
>
> Key: FLINK-17961
> URL: https://issues.apache.org/jira/browse/FLINK-17961
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / ElasticSearch
>Reporter: Etienne Chauchot
>Priority: Minor
>
> There is only an Elasticsearch sink available. There are open-source GitHub 
> repos such as [this 
> one|https://github.com/mnubo/flink-elasticsearch-source-connector]. Also 
> the Apache Bahir project does not provide an Elasticsearch source connector 
> for Flink either. IMHO the project would benefit from having a 
> bundled source connector for ES alongside the available sink connector.





[jira] [Commented] (FLINK-17853) JobGraph is not getting deleted after Job cancelation

2020-05-29 Thread Fritz Budiyanto (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119634#comment-17119634
 ] 

Fritz Budiyanto commented on FLINK-17853:
-

Thanks. We will migrate to 1.10. Feel free to close this ticket. I'll re-open 
if it is still happening in 1.10.

> JobGraph is not getting deleted after Job cancelation
> -
>
> Key: FLINK-17853
> URL: https://issues.apache.org/jira/browse/FLINK-17853
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.2
> Environment: Flink 1.9.2
> Zookeeper from AWS MSK
>Reporter: Fritz Budiyanto
>Priority: Major
> Attachments: flinkissue.txt
>
>
> I have been seeing this issue several times, where the JobGraph is not 
> cleaned up properly after job deletion. Job deletion is performed using the 
> "flink stop" command. As a result, the JobGraph node lingers in ZK; when the 
> Flink cluster is restarted, it attempts HA restoration from a non-existing 
> checkpoint, which prevents the Flink cluster from coming up.
> 2020-05-19 19:56:21,471 INFO 
> org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
> sending final execution state FINISHED to JobManager for task Source: 
> kafkaConsumer[update_server] -> (DetectedUpdateMessageConverter -> Sink: 
> update_server.detected_updates, DrivenCoordinatesMessageConverter -> Sink: 
> update_server.driven_coordinates) 588902a8096f49845b09fa1f595d6065.
> 2020-05-19 19:56:21,622 INFO 
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable - Free slot 
> TaskSlot(index:0, state:ACTIVE, resource profile: 
> ResourceProfile\{cpuCores=1.7976931348623157E308, heapMemoryInMB=2147483647, 
> directMemoryInMB=2147483647, nativeMemoryInMB=2147483647, 
> networkMemoryInMB=2147483647, managedMemoryInMB=642}, allocationId: 
> 29f6a5f83c832486f2d7ebe5c779fa32, jobId: 86a028b3f7aada8ffe59859ca71d6385).
> 2020-05-19 19:56:21,622 INFO 
> org.apache.flink.runtime.taskexecutor.JobLeaderService - Remove job 
> 86a028b3f7aada8ffe59859ca71d6385 from job leader monitoring.
> 2020-05-19 19:56:21,622 INFO 
> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - 
> Stopping ZooKeeperLeaderRetrievalService 
> /leader/86a028b3f7aada8ffe59859ca71d6385/job_manager_lock.
> 2020-05-19 19:56:21,623 INFO 
> org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager 
> connection for job 86a028b3f7aada8ffe59859ca71d6385.
> 2020-05-19 19:56:21,624 INFO 
> org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager 
> connection for job 86a028b3f7aada8ffe59859ca71d6385.
> 2020-05-19 19:56:21,624 INFO 
> org.apache.flink.runtime.taskexecutor.JobLeaderService - Cannot reconnect to 
> job 86a028b3f7aada8ffe59859ca71d6385 because it is not registered.
> ...
> Zookeeper CLI:
> ls /flink/cluster_update/jobgraphs
> [86a028b3f7aada8ffe59859ca71d6385]
>  
> Attached is the Flink logs in reverse order





[GitHub] [flink] AHeise commented on a change in pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-29 Thread GitBox


AHeise commented on a change in pull request #12353:
URL: https://github.com/apache/flink/pull/12353#discussion_r432502570



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/BroadcastRecordWriter.java
##
@@ -130,7 +132,7 @@ public BufferBuilder requestNewBufferBuilder(int 
targetChannel) throws IOExcepti
 
BufferBuilder builder = 
super.requestNewBufferBuilder(targetChannel);
if (randomTriggered) {
-   addBufferConsumer(builder.createBufferConsumer(), 
targetChannel);
+   addBufferConsumer(randomConsumer = 
builder.createBufferConsumer(), targetChannel);

Review comment:
   We have to, or else the underlying buffer is not reference-counted 
properly.
   The idea is to keep the `randomConsumer` and use `copy` to pick up the 
then-updated reader index. 
   If the random event spans multiple buffers, only the last `randomConsumer` 
is effectively used.
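
   The reference-counting idea can be illustrated with a toy stand-in for 
`BufferConsumer` (illustrative only, not the Flink class): every copy bumps 
the shared count and starts reading from the current index, so the underlying 
segment is recycled only when the last consumer closes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy stand-in for a reference-counted buffer consumer.
final class RefCountedConsumer {

    private final AtomicInteger refCount; // shared with all copies
    private final int readerIndex;

    RefCountedConsumer() {
        this(new AtomicInteger(1), 0);
    }

    private RefCountedConsumer(AtomicInteger refCount, int readerIndex) {
        this.refCount = refCount;
        this.readerIndex = readerIndex;
    }

    // A copy shares the buffer (one more reference) but reads from the given index.
    RefCountedConsumer copyFrom(int currentReaderIndex) {
        refCount.incrementAndGet();
        return new RefCountedConsumer(refCount, currentReaderIndex);
    }

    void close() {
        if (refCount.decrementAndGet() == 0) {
            // last reference gone: the underlying memory segment may be recycled
        }
    }
}
```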









[jira] [Updated] (FLINK-17952) Confusing exception was thrown when old planner and batch mode is used via EnvironmentSettings

2020-05-29 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-17952:

Summary: Confusing exception was thrown when old planner and batch mode is 
used via EnvironmentSettings  (was: Improve the error message when old planner 
and batch mode is used via EnvironmentSettings)

> Confusing exception was thrown when old planner and batch mode is used via 
> EnvironmentSettings
> --
>
> Key: FLINK-17952
> URL: https://issues.apache.org/jira/browse/FLINK-17952
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: Dian Fu
>Assignee: Wei Zhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Currently it is not supported to use the batch mode of the old planner via 
> EnvironmentSettings. The following exception will be thrown in that case:
> {code:java}
> : org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find 
> a suitable table factory for 
> 'org.apache.flink.table.delegation.ExecutorFactory' in: 
> org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a 
> suitable table factory for 
> 'org.apache.flink.table.delegation.ExecutorFactory' in the classpath.
> Reason: No factory supports the additional filters.
> The following properties are requested:
> class-name=org.apache.flink.table.executor.StreamExecutorFactory
> streaming-mode=false
> The following factories have been considered:
> org.apache.flink.table.planner.delegation.BlinkExecutorFactory
> at 
> org.apache.flink.table.factories.ComponentFactoryService.find(ComponentFactoryService.java:71)
>  at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.create(TableEnvironmentImpl.java:253)
>  at 
> org.apache.flink.table.api.TableEnvironment.create(TableEnvironment.java:91) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>  at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>  at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) 
> at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>  at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
>  at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
>  at java.lang.Thread.run(Thread.java:745)
> {code}
> This exception message is confusing for Python users and we should improve it.
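
The call pattern that triggers the exception boils down to the following 
(a hedged sketch of the 1.11-era Java API; per the stack trace above, the 
PyFlink path goes through the same TableEnvironment.create):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class OldPlannerBatchRepro {
    public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useOldPlanner()   // legacy planner
                .inBatchMode()     // unsupported combination via EnvironmentSettings
                .build();
        // Throws the NoMatchingTableFactoryException quoted above.
        TableEnvironment tEnv = TableEnvironment.create(settings);
    }
}
{code}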





[jira] [Closed] (FLINK-17952) Improve the error message when old planner and batch mode is used via EnvironmentSettings

2020-05-29 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-17952.
---
Resolution: Fixed

Merged to:
- master via be049121e861837485993e5d6756b7b0266959ee
- release-1.11 via ab8ec625789aa5cc37778e38d816364a319cf633

> Improve the error message when old planner and batch mode is used via 
> EnvironmentSettings
> -
>
> Key: FLINK-17952
> URL: https://issues.apache.org/jira/browse/FLINK-17952
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: Dian Fu
>Assignee: Wei Zhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Currently it is not supported to use the batch mode of the old planner via 
> EnvironmentSettings. The following exception will be thrown in that case:
> {code:java}
> : org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find 
> a suitable table factory for 
> 'org.apache.flink.table.delegation.ExecutorFactory' in: 
> org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a 
> suitable table factory for 
> 'org.apache.flink.table.delegation.ExecutorFactory' in the classpath.
> Reason: No factory supports the additional filters.
> The following properties are requested:
> class-name=org.apache.flink.table.executor.StreamExecutorFactory
> streaming-mode=false
> The following factories have been considered:
> org.apache.flink.table.planner.delegation.BlinkExecutorFactory
> at 
> org.apache.flink.table.factories.ComponentFactoryService.find(ComponentFactoryService.java:71)
>  at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.create(TableEnvironmentImpl.java:253)
>  at 
> org.apache.flink.table.api.TableEnvironment.create(TableEnvironment.java:91) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>  at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>  at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) 
> at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>  at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
>  at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
>  at java.lang.Thread.run(Thread.java:745)
> {code}
> This exception message is confusing for Python users and we should improve it.





[jira] [Commented] (FLINK-17968) Hadoop Configuration is not properly serialized in HBaseRowInputFormat

2020-05-29 Thread tartarus (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119622#comment-17119622
 ] 

tartarus commented on FLINK-17968:
--

I removed the logic of generating hbase-site.xml in the unit test; the HBase 
configuration must now be set when initializing HBaseRowInputFormat.
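
A common way to fix the underlying problem (a sketch of the usual pattern, 
with a hypothetical wrapper class, not the code in the PR) is to keep the 
field transient but serialize the Hadoop Configuration through its own 
Writable methods:

{code:java}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;

public class SerializableHadoopConfig implements Serializable {

    private transient Configuration conf;

    public SerializableHadoopConfig(Configuration conf) {
        this.conf = conf;
    }

    public Configuration get() {
        return conf;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        conf.write(out); // Configuration implements Writable
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        conf = new Configuration(false); // do not load defaults from the classpath
        conf.readFields(in);             // restore the settings on the task side
    }
}
{code}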

> Hadoop Configuration is not properly serialized in HBaseRowInputFormat
> --
>
> Key: FLINK-17968
> URL: https://issues.apache.org/jira/browse/FLINK-17968
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / HBase, Table SQL / Ecosystem
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: tartarus
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> This pull request mentions an issue with the serialization of the 
> {{Configuration}} field in {{HBaseRowInputFormat}}: 
> https://github.com/apache/flink/pull/12146.
> After reviewing the code, it seems that this field is {{transient}}, thus it 
> is always {{null}} at runtime.
> Note {{HBaseRowDataInputFormat}} is likely suffering from the same issue 
> (because the code has been copied)





[GitHub] [flink] Tartarus0zm commented on pull request #12146: [FLINK-17968][hbase] Fix Hadoop Configuration is not properly serialized in HBaseRowInputFormat

2020-05-29 Thread GitBox


Tartarus0zm commented on pull request #12146:
URL: https://github.com/apache/flink/pull/12146#issuecomment-635982323


   @wuchong @rmetzger I removed the logic of generating hbase-site.xml in the 
unit test; the HBase configuration must now be set when initializing 
HBaseRowInputFormat. Please take a look, thanks!







[GitHub] [flink] flinkbot edited a comment on pull request #12402: Update state.zh.md

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12402:
URL: https://github.com/apache/flink/pull/12402#issuecomment-635881017


   
   ## CI report:
   
   * 6ce66868a5bb54a61083ed11703e33d9225229d1 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2438)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12405: [FLINK-16816][table sql/runtime] Add timestamp and date array test for DataFormatConverters.

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12405:
URL: https://github.com/apache/flink/pull/12405#issuecomment-635889281


   
   ## CI report:
   
   * aa0011aa6dbce3a0c6bdcafdba284d0bcb0d862c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2442)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12144: [FLINK-17384][flink-dist] support read hbase conf dir from flink.conf.

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12144:
URL: https://github.com/apache/flink/pull/12144#issuecomment-628475376


   
   ## CI report:
   
   * d36b959fb16a91c15babda10dd884cbbdec58420 UNKNOWN
   * 35f180a59746b17b3b965254b8bcdcb21e75a3c5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2454)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12410: [FLINK-13782][table-api] Implement type strategies for IF ELSE expression

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12410:
URL: https://github.com/apache/flink/pull/12410#issuecomment-635946149


   
   ## CI report:
   
   * 05965a6250f3873dfccc7e226ffb64d40d3a599d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2456)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #12403: [FLINK-18030][hive] Hive UDF doesn't accept empty string literal para…

2020-05-29 Thread GitBox


flinkbot edited a comment on pull request #12403:
URL: https://github.com/apache/flink/pull/12403#issuecomment-635881099


   
   ## CI report:
   
   * 0e667a3bd702288955808b79493a09244513d202 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2439)
 
   
   







[GitHub] [flink] dianfu closed pull request #12400: [FLINK-17952][python] Fix the bug that exception was thrown when creating BatchTableEnvironment via EnvironmentSettings with old planner.

2020-05-29 Thread GitBox


dianfu closed pull request #12400:
URL: https://github.com/apache/flink/pull/12400


   






