[jira] [Commented] (FLINK-20235) Missing Hive dependencies

2020-11-22 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237173#comment-17237173
 ] 

Robert Metzger commented on FLINK-20235:


Shouldn't this ticket be a blocker?

I would like to create the first voting RC soon, and I guess we want to include 
this fix in it? If so, please make sure to merge a fix soon. If you want me to 
track it, please upgrade it to Blocker.

> Missing Hive dependencies
> -
>
> Key: FLINK-20235
> URL: https://issues.apache.org/jira/browse/FLINK-20235
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.12.0
> Environment: hive 2.3.4
> hadoop 2.7.4
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I tried following the setup here: 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
> I put the flink-sql-connector-hive-2.3.6 in the {{/lib}} directory and tried 
> running queries (as described in 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_streaming.html)
>  via {{sql-client}}.
> {code}
> SET table.sql-dialect=hive;
> CREATE TABLE hive_table (
>   user_id STRING,
>   order_amount DOUBLE
> ) PARTITIONED BY (dt STRING, hr STRING) STORED AS parquet TBLPROPERTIES (
>   'partition.time-extractor.timestamp-pattern'='$dt $hr:00:00',
>   'sink.partition-commit.trigger'='partition-time',
>   'sink.partition-commit.delay'='1 s',
>   'sink.partition-commit.policy.kind'='metastore,success-file'
> );
> SET table.sql-dialect=default;
> SELECT * FROM hive_table;
> {code}
> It fails with:
> {code}
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.flink.hive.shaded.parquet.format.converter.ParquetMetadataConverter
>   at 
> org.apache.flink.hive.shaded.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:112)
>   at 
> org.apache.flink.hive.shaded.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:73)
>   at 
> org.apache.flink.connectors.hive.read.HiveBulkFormatAdapter.createReader(HiveBulkFormatAdapter.java:99)
>   at 
> org.apache.flink.connectors.hive.read.HiveBulkFormatAdapter.createReader(HiveBulkFormatAdapter.java:62)
>   at 
> org.apache.flink.connector.file.src.impl.FileSourceSplitReader.checkSplitOrStartNext(FileSourceSplitReader.java:110)
>   at 
> org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:68)
>   at 
> org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
>   at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:136)
>   at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:100)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   ... 1 more
> {code}
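
One note on reading the error: {{NoClassDefFoundError: Could not initialize 
class ...}} means the class itself was found, but its static initialization had 
already failed once (an earlier {{ExceptionInInitializerError}}, typically 
caused by a missing transitive dependency of the shaded parquet format). A 
hedged diagnostic sketch, to be run from the same classpath:

{code}
// Triggers static initialization directly so the root-cause
// ExceptionInInitializerError surfaces, instead of the follow-up
// "Could not initialize class" NoClassDefFoundError.
Class.forName(
    "org.apache.flink.hive.shaded.parquet.format.converter.ParquetMetadataConverter");
{code}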



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20221) DelimitedInputFormat does not restore compressed filesplits correctly leading to dataloss

2020-11-22 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237172#comment-17237172
 ] 

Dian Fu commented on FLINK-20221:
-

I have downgraded this issue to Critical. Feel free to set it back if anyone 
feels that this should be a blocker issue.

> DelimitedInputFormat does not restore compressed filesplits correctly leading 
> to dataloss
> -
>
> Key: FLINK-20221
> URL: https://issues.apache.org/jira/browse/FLINK-20221
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.10.2, 1.12.0, 1.11.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
> Fix For: 1.12.0, 1.10.3, 1.11.3
>
>
> It seems that the delimited input format cannot correctly restore input 
> splits if they belong to compressed files. Basically, when a compressed 
> filesplit is restored in the middle, it is not read any further, leading to 
> data loss.
> The cause of the problem is that for compressed splits that use an inflater 
> stream, the split length is set to the magic number -1, which is ignored in 
> the reopen method and causes the split to go to `end` state immediately.
> The problem and the fix is shown in this commit:
> [https://github.com/gyfora/flink/commit/4adc8ba8d1989fff2db43881c9cb3799848c6e0d]
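
A minimal sketch of the failure mode described above (field and method names 
are illustrative; the actual fix is in the linked commit). 
{{READ_WHOLE_SPLIT_FLAG}} is the real -1 constant from {{FileInputFormat}}:

{code}
// Hedged sketch: restoring a compressed split whose recorded length is the
// magic value -1. A reopen guard that only honours offsets against a real
// split length sends a mid-split restore straight to the "end" state, so
// the remainder of the file is never read.
static final long READ_WHOLE_SPLIT_FLAG = -1L;

void reopen(long splitStart, long splitLength, long restoredOffset) {
    if (splitLength != READ_WHOLE_SPLIT_FLAG
            && restoredOffset < splitStart + splitLength) {
        this.end = false; // plain split: keep reading from restoredOffset
    } else {
        this.end = true;  // compressed split (length == -1): data is skipped
    }
}
{code}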



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20221) DelimitedInputFormat does not restore compressed filesplits correctly leading to dataloss

2020-11-22 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-20221:

Priority: Critical  (was: Blocker)

> DelimitedInputFormat does not restore compressed filesplits correctly leading 
> to dataloss
> -
>
> Key: FLINK-20221
> URL: https://issues.apache.org/jira/browse/FLINK-20221
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.10.2, 1.12.0, 1.11.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
> Fix For: 1.12.0, 1.10.3, 1.11.3
>
>
> It seems that the delimited input format cannot correctly restore input 
> splits if they belong to compressed files. Basically, when a compressed 
> filesplit is restored in the middle, it is not read any further, leading to 
> data loss.
> The cause of the problem is that for compressed splits that use an inflater 
> stream, the split length is set to the magic number -1, which is ignored in 
> the reopen method and causes the split to go to `end` state immediately.
> The problem and the fix is shown in this commit:
> [https://github.com/gyfora/flink/commit/4adc8ba8d1989fff2db43881c9cb3799848c6e0d]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #14164: [FLINK-20278][Python] Throw a meaningful exception if the Python DataStream API job executes in batch mode.

2020-11-22 Thread GitBox


flinkbot commented on pull request #14164:
URL: https://github.com/apache/flink/pull/14164#issuecomment-731986480


   
   ## CI report:
   
   * 8e32013740da2349001a08f807c1ef7f398e9ee0 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14163: [FLINK-20282][runtime] Improve error messages for invalid managed memory fraction

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14163:
URL: https://github.com/apache/flink/pull/14163#issuecomment-731977706


   
   ## CI report:
   
   * ccf1b22d8807727c2f78fa49871ccc620fe126e7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9924)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14162: [FLINK-19687][table] Support to get execution plan in `Flink SQL 1.11`

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14162:
URL: https://github.com/apache/flink/pull/14162#issuecomment-731977185


   
   ## CI report:
   
   * 4dec15772cc7558155d8d8b164e7adc14798bb3a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9923)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete invocation context

2020-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20265:
---
Labels: pull-request-available  (was: )

> Extend invocation protocol to allow functions to indicate incomplete 
> invocation context
> ---
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
>  Labels: pull-request-available
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the functions 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states [A, B].
> - The function receives the request, but since it requires states [A, B, C, 
> D], it responds with an {{IncompleteInvocationContext}} response indicating 
> that state values for [C, D] are missing.
> - StateFun receives this response, and registers new Flink state handles for 
> [C, D].
> - Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for [A, B, C, D] is resent to 
> the function.
> This JIRA only targets updating the Protobuf messages {{ToFunction}} and 
> {{FromFunction}} to fulfill the extended protocol, and support handling 
> {{IncompleteInvocationContext}} responses in the request dispatcher.
> Updating SDKs should be separate subtask JIRAs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] AHeise commented on pull request #14140: [FLINK-19864][tests] Fix unpredictable Thread.getState in StreamTaskTestHarness due to concurrent class loading

2020-11-22 Thread GitBox


AHeise commented on pull request #14140:
URL: https://github.com/apache/flink/pull/14140#issuecomment-731984069


   > @AHeise I am getting confused. We probably have essential divergences on 
what `StreamTaskTestHarness.waitForInputProcessing` should do. From my 
understanding, it should wait until **all currently available input** has been 
processed, not end of stream. It is waiting for an intermediate status, and 
could occur several times for a single test harness, say three times in 
`TwoInputStreamTaskTest.testWatermarkMetrics`. What you try to propose here 
should have been covered by a combination of `StreamTaskTestHarness.endInput` 
and `StreamTaskTestHarness.waitForTaskCompletion`. That combination waits for 
task termination, which is a terminal status, and should occur at most once 
for a single test harness.
   > 
   > @rkhachatryan gave similar suggestion in previous review cycle, I think we 
probably should align on what `StreamTaskTestHarness.waitForInputProcessing` 
should do.
   
   You are completely right. It's just very difficult to realize the original 
implementation without some kind of hacks and assumptions - any hotfix is prone 
to fail again with the next slight change. I'd probably go your way but also 
start migrating the tests - the harness has been deprecated for a reason.
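
   For readers following along, a hedged sketch of the two wait styles being 
contrasted (harness setup elided; the method names are the real 
`StreamTaskTestHarness` ones):

```java
// Intermediate wait: returns once all currently available input has been
// processed; may legitimately happen several times per test harness.
harness.processElement(new StreamRecord<>("a"));
harness.waitForInputProcessing();

// Terminal wait: signal end of input, then wait for the whole task to
// finish. This combination happens at most once per test harness.
harness.endInput();
harness.waitForTaskCompletion();
```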



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-statefun] tzulitai opened a new pull request #177: [FLINK-20265] [core] Extend remote invocation protocol with IncompleteInvocationContext response type

2020-11-22 Thread GitBox


tzulitai opened a new pull request #177:
URL: https://github.com/apache/flink-statefun/pull/177


   This PR extends the current remote invocation request-reply protocol to 
allow functions to reply with an `IncompleteInvocationContext` response.
   
   The end goal for this protocol extension is to allow state specifications to 
be declared in the functions (via the language SDKs), instead of being declared 
statically in the module YAML definition files.
   
   ## Extended protocol with the new `IncompleteInvocationContext` response type
   
   Below is an explanation of how the new protocol works (a hedged code sketch 
of the worker-side handling follows the list):
   
   1. On startup, the StateFun worker processes will not have any knowledge of 
remote function states, and therefore do not register / attempt to access any 
Flink state on their behalf.
   2. On the first remote invocation, the invocation request would not carry 
any state values.
   3. Upon receiving the invocation request, a function may respond with an 
`IncompleteInvocationContext` or with the usual `InvocationResponse` as before. 
If the function finds that the invocation request is missing state values 
(after matching the provided state names against the states it declares), it 
should respond with the new `IncompleteInvocationContext` response type.
   4. When a StateFun worker receives an `IncompleteInvocationContext`, it 
dynamically registers Flink state for the indicated missing states and then 
re-sends the original invocation batch, now "patched" with all required states.
   5. All subsequent invocation requests carry all the required states.
   6. The same state specification discovery process from (3) to (5) applies 
when a function upgrades itself and declares new state, without restarting the 
StateFun workers.
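
   A hedged sketch of the worker-side handling (the message type names follow 
this PR; accessor and helper names are illustrative, not the actual StateFun 
API):

```java
// Dispatch a batch with the state values the worker currently knows about.
FromFunction reply = invoke(requestWithStates(batch, knownStates));

if (reply.hasIncompleteInvocationContext()) {
    // The function declared states the worker did not provide: register
    // Flink state for each missing one, then resend the same batch,
    // "patched" with values for all required states.
    for (String missing : missingStateNames(reply.getIncompleteInvocationContext())) {
        registerFlinkState(missing);
    }
    reply = invoke(requestWithStates(batch, allDeclaredStates()));
}

handleInvocationResult(reply);
```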
   
   ## Brief change log
   
   - 923914b Changes the Protobuf messages definition file to have the new 
`IncompleteInvocationContext` response type
   - f75d7dd to 86aed63 do a few things surrounding 
`PersistedRemoteFunctionValues`. Most importantly, a new `registerStates` 
method is added to the class to support registering new `PersistedValue`s 
dynamically based on `IncompleteInvocationContext` responses from functions. 
Secondly, the original eager state spec constructor (fed from the module 
YAMLs) is marked as deprecated, as it will no longer be supported from the 
next release onwards. Finally, a UT `PersistedRemoteFunctionValuesTest` is 
added to cover the user contracts of the class.
   - 2db4ad1 Actual implementation of the extended protocol in the 
`RequestReplyFunction` dispatching logic.
   
   ## Testing the change
   
   - User contracts of the adapted `PersistedRemoteFunctionValues` class are 
covered in the new UT `PersistedRemoteFunctionValuesTest`
   - A new UT `retryBatchOnIncompleteInvocationContextResponse` has been added 
to `RequestReplyFunctionTest` to verify that the `RequestReplyFunction` 
re-sends the original batch with patched states on 
`IncompleteInvocationContext` responses.
   
   ## Upgrading existing SDKs
   
   After merging this PR to `master`, existing language SDKs may begin to be 
upgraded to implement the new extended protocol. 
   
   A separate PR will be provided for updating the Python SDK, as a "reference" 
for upgrading other language SDKs.
   
   ## Backwards compatibility for existing SDKs
   
   For the time being (before the next release), old SDKs yet to be updated 
would still work as is against `master`, since eagerly declaring state 
specifications via the module YAML definition files (format versions <= `2.0`) 
will still be temporarily supported in the snapshot `master` branch.
   
   With the next major release (version `2.3.0`), module YAML format versions 
<= `2.0` will no longer be supported, and therefore old SDKs that are not 
updated will cease to work with StateFun 2.3.0.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20157) SourceCoordinatorProvider kills JobManager with IllegalStateException on job submission

2020-11-22 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237169#comment-17237169
 ] 

Dian Fu commented on FLINK-20157:
-

It seems that this issue depends on several other tickets. If we see this issue 
as a blocker, then I guess we also need to mark all the issues it depends on as 
blockers.

> SourceCoordinatorProvider kills JobManager with IllegalStateException on job 
> submission
> ---
>
> Key: FLINK-20157
> URL: https://issues.apache.org/jira/browse/FLINK-20157
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 1.12.0
>
>
> While setting up a test job using the new Kafka source for testing the RC1 of 
> Flink 1.12, my JobManager died with a fatal exception:
> {code}
> 2020-11-13 17:05:53,947 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Deploying 
> Flat Map -> Sink: Print to Std. Out (1/1) (attempt #0) with attempt id 
> fc36327d85e775204e82fc8507bf4264 to 192.168.1.25:57387-78ca68 @ 
> robertsbabamac2.localdomain (dataPort=57390) with allocation id 
> a8d918c0cfb57305908ce5a4f4787034
> 2020-11-13 17:05:53,988 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'SourceCoordinator-Source: Kafka Source' produced an uncaught 
> exception. Stopping the process...
> java.lang.IllegalStateException: Should never happen. This factory should 
> only be used by a SingleThreadExecutor.
> at 
> org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_222]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
> {code}
> I'm using the KafkaSource as documented, with a single partition topic:
> {code:java}
> KafkaSource source = KafkaSource
>     .builder()
>     .setBootstrapServers(brokers)
>     .setGroupId("myGroup")
>     .setTopics(Arrays.asList(kafkaTopic))
>     .setDeserializer(new NewEventDeserializer())
>     .build();
> {code}
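
For context, a hedged sketch of the failing pattern (illustrative code, not the 
actual {{SourceCoordinatorProvider}} source): the coordinator's thread factory 
hands out exactly one thread, so when that single worker dies with an uncaught 
exception and the underlying {{ThreadPoolExecutor}} asks the factory for a 
replacement (see {{processWorkerExit}} in the stack trace), the factory throws:

{code}
import java.util.concurrent.ThreadFactory;

// Hedged sketch: a ThreadFactory that allows exactly one thread. If the
// single worker dies, ThreadPoolExecutor.processWorkerExit requests a
// replacement thread, and the factory throws -- matching the stack trace.
final class CoordinatorThreadFactory implements ThreadFactory {
    private Thread created;

    @Override
    public synchronized Thread newThread(Runnable runnable) {
        if (created != null) {
            throw new IllegalStateException(
                    "Should never happen. This factory should only be used by a SingleThreadExecutor.");
        }
        created = new Thread(runnable, "SourceCoordinator-Source: Kafka Source");
        return created;
    }
}
{code}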



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong commented on pull request #14162: [FLINK-19687][table] Support to get execution plan in `Flink SQL 1.11`

2020-11-22 Thread GitBox


wuchong commented on pull request #14162:
URL: https://github.com/apache/flink/pull/14162#issuecomment-731983804


   Hi @V1ncentzzZ , I think this introduces a new API, so it is not a bugfix. 
The Flink community doesn't allow new features to be merged into bugfix 
versions, e.g. `1.11.3`.  
   
   Could you open a new pull request against the `master` branch? Maybe we can 
catch the upcoming `1.12.0` release.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-20229) duplicate "for" security-kerberos doc

2020-11-22 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-20229.
---
Fix Version/s: 1.12.0
   Resolution: Fixed

Fixed in master (1.12.0): 89bc29e2465b14149a47331bcfb9850e2ec6ff3e

> duplicate "for" security-kerberos doc
> -
>
> Key: FLINK-20229
> URL: https://issues.apache.org/jira/browse/FLINK-20229
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: zhouyongjin
>Assignee: zhouyongjin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.12.0
>
> Attachments: image-2020-11-19-09-24-31-331.png
>
>
> !image-2020-11-19-09-24-31-331.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong merged pull request #14124: [FLINK-20229][docs]remove duplicate "for"

2020-11-22 Thread GitBox


wuchong merged pull request #14124:
URL: https://github.com/apache/flink/pull/14124


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] azagrebin commented on pull request #8952: [FLINK-10868][flink-yarn] Add failure rater for resource manager

2020-11-22 Thread GitBox


azagrebin commented on pull request #8952:
URL: https://github.com/apache/flink/pull/8952#issuecomment-731982120


   @HuangZhenQiu 
   ok, I have reopened the PR.
   Please rebase it on master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-20229) duplicate "for" security-kerberos doc

2020-11-22 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-20229:
---

Assignee: zhouyongjin

> duplicate "for" security-kerberos doc
> -
>
> Key: FLINK-20229
> URL: https://issues.apache.org/jira/browse/FLINK-20229
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: zhouyongjin
>Assignee: zhouyongjin
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2020-11-19-09-24-31-331.png
>
>
> !image-2020-11-19-09-24-31-331.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] HuangZhenQiu opened a new pull request #8952: [FLINK-10868][flink-yarn] Add failure rater for resource manager

2020-11-22 Thread GitBox


HuangZhenQiu opened a new pull request #8952:
URL: https://github.com/apache/flink/pull/8952


   ## What is the purpose of the change
   Add a failure rater to the resource manager, so that every type of resource 
manager can cap the rate of failed executors it tolerates.
   
   ## Brief change log
 - Add a failure rater that records the number of failures in the last minute 
(a hedged sketch follows).
 - Update the resource manager to use a maximum failure rate threshold to limit 
resource allocation.
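
   A hedged sketch of the failure rater idea (illustrative names, not this 
PR's actual classes):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Counts failures in a sliding one-minute window so the resource manager
// can stop requesting new resources once a threshold is exceeded.
final class FailureRater {
    private static final long WINDOW_MILLIS = 60_000L; // "last one minute"

    private final Deque<Long> failureTimestamps = new ArrayDeque<>();
    private final int maxFailuresPerWindow;

    FailureRater(int maxFailuresPerWindow) {
        this.maxFailuresPerWindow = maxFailuresPerWindow;
    }

    synchronized void recordFailure(long nowMillis) {
        failureTimestamps.addLast(nowMillis);
        prune(nowMillis);
    }

    /** True if resource allocation should be throttled right now. */
    synchronized boolean exceedsThreshold(long nowMillis) {
        prune(nowMillis);
        return failureTimestamps.size() > maxFailuresPerWindow;
    }

    private void prune(long nowMillis) {
        while (!failureTimestamps.isEmpty()
                && nowMillis - failureTimestamps.peekFirst() > WINDOW_MILLIS) {
            failureTimestamps.removeFirst();
        }
    }
}
```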
   
   ## Verifying this change
 - Added a unit test in the resource manager to verify that the maximum 
failure rate threshold is enforced
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (docs)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-20220:
---
Component/s: (was: Runtime / Coordination)
 API / DataSet

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Reporter: Robert Metzger
>Priority: Critical
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125 MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 

[jira] [Updated] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-20220:
---
Component/s: Runtime / Coordination

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet, Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125 MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 

[jira] [Commented] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237164#comment-17237164
 ] 

Robert Metzger commented on FLINK-20220:


Thanks a lot [~zhuzh]!
I will close this ticket, since we do not intend to put more effort into the 
{{DataSet}} API.
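
For reference, a hedged sketch of the pattern that triggers this (sizes 
illustrative): {{DataSet#collect()}} ships the result back through an 
accumulator that piggybacks on the task's final {{TaskExecutionState}} 
message, so a large result blows past the RPC payload limits.

{code}
import java.util.List;
import org.apache.flink.api.java.ExecutionEnvironment;

// Hedged reproducer sketch: collect() funnels roughly 128 MB of results
// through an accumulator attached to the final TaskExecutionState update
// sent to the JobManager.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List<Long> all = env.generateSequence(1, 16_000_000).collect();
{code}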

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125 MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> 

[jira] [Closed] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-20220.
--
Resolution: Won't Fix

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125 MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 

[GitHub] [flink] YongjinZhou commented on pull request #14124: [FLINK-20229][docs]remove duplicate "for"

2020-11-22 Thread GitBox


YongjinZhou commented on pull request #14124:
URL: https://github.com/apache/flink/pull/14124#issuecomment-731979327


   @wuchong @dianfu Have time to review it? thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20267) JaasModule prevents Flink from starting if working directory is a symbolic link

2020-11-22 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237163#comment-17237163
 ] 

Dian Fu commented on FLINK-20267:
-

[~karmagyz] Could you help to take a look at this issue?

> JaasModule prevents Flink from starting if working directory is a symbolic 
> link
> ---
>
> Key: FLINK-20267
> URL: https://issues.apache.org/jira/browse/FLINK-20267
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Till Rohrmann
>Priority: Blocker
> Fix For: 1.12.0
>
>
> [~AHeise] reported that starting Flink on EMR fails with
> {code}
> java.lang.RuntimeException: unable to generate a JAAS configuration file
> at 
> org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:170)
> at 
> org.apache.flink.runtime.security.modules.JaasModule.install(JaasModule.java:94)
> at 
> org.apache.flink.runtime.security.SecurityUtils.installModules(SecurityUtils.java:78)
> at 
> org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:59)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1045)
> Caused by: java.nio.file.FileAlreadyExistsException: /tmp
> at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at 
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
> at java.nio.file.Files.createDirectory(Files.java:674)
> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
> at java.nio.file.Files.createDirectories(Files.java:727)
> at 
> org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:162)
> ... 4 more
> {code}
> The problem is that on EMR {{/tmp}} is a symbolic link. Due to FLINK-19252, 
> where we introduced the [creation of the working 
> directory|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/JaasModule.java#L162]
>  in order to create the default Jaas config file, the startup process fails 
> if the path for the working directory is not a plain directory (apparently 
> {{Files.createDirectories}} cannot deal with symbolic links).
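
One detail that explains the exception: {{Files.createDirectories}} catches the 
{{FileAlreadyExistsException}} from {{createDirectory}} and only swallows it if 
the existing path is a directory when checked with 
{{LinkOption.NOFOLLOW_LINKS}}, so a symbolic link to a directory fails the 
check. A hedged sketch of one possible workaround (illustrative paths, not 
necessarily the fix the project will choose):

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Resolve symbolic links before creating the working directory, so that
// Files.createDirectories never sees the symlink itself.
Path base = Paths.get("/tmp");          // a symlink on the affected hosts
Path workingDir = base.toRealPath()     // e.g. resolves to the real volume
        .resolve("jaas");
Files.createDirectories(workingDir);    // now succeeds
{code}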



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20221) DelimitedInputFormat does not restore compressed filesplits correctly leading to dataloss

2020-11-22 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237161#comment-17237161
 ] 

Dian Fu commented on FLINK-20221:
-

I'm also in favor of downgrading the priority to Critical and trying to fix 
this issue on a best-effort basis.

> DelimitedInputFormat does not restore compressed filesplits correctly leading 
> to dataloss
> -
>
> Key: FLINK-20221
> URL: https://issues.apache.org/jira/browse/FLINK-20221
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.10.2, 1.12.0, 1.11.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Blocker
> Fix For: 1.12.0, 1.10.3, 1.11.3
>
>
> It seems that the delimited input format cannot correctly restore input 
> splits if they belong to compressed files. Basically, when a compressed 
> filesplit is restored in the middle, it is not read any further, leading to 
> data loss.
> The cause of the problem is that for compressed splits that use an inflater 
> stream, the split length is set to the magic number -1, which is ignored in 
> the reopen method and causes the split to go to `end` state immediately.
> The problem and the fix is shown in this commit:
> [https://github.com/gyfora/flink/commit/4adc8ba8d1989fff2db43881c9cb3799848c6e0d]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #14163: [FLINK-20282][runtime] Improve error messages for invalid managed memory fraction

2020-11-22 Thread GitBox


flinkbot commented on pull request #14163:
URL: https://github.com/apache/flink/pull/14163#issuecomment-731977706


   
   ## CI report:
   
   * ccf1b22d8807727c2f78fa49871ccc620fe126e7 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #14162: [FLINK-19687][table] Support to get execution plan in `Flink SQL 1.11`

2020-11-22 Thread GitBox


flinkbot commented on pull request #14162:
URL: https://github.com/apache/flink/pull/14162#issuecomment-731977185


   
   ## CI report:
   
   * 4dec15772cc7558155d8d8b164e7adc14798bb3a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14160: [FLINK-19795][table-blink] Fix Flink SQL throws exception when changelog source contains duplicate change events

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14160:
URL: https://github.com/apache/flink/pull/14160#issuecomment-731775317


   
   ## CI report:
   
   * 040aeb255189647566ce8417e135c8a9d6fc13aa Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9916)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731917055


   
   ## CI report:
   
   * adb4d262c3bf03ef0abe97cd68adf540521ff68c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9920)
 
   * 405db09250d7ecc514dd691ed2abdfa7d852b530 UNKNOWN
   * 320d598828b80e4afe23fbdd2a82124daa100a94 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9921)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14132: [FLINK-20214][k8s] Fix the unnecessary warning logs when Hadoop environment is not set

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14132:
URL: https://github.com/apache/flink/pull/14132#issuecomment-730290775


   
   ## CI report:
   
   * 71f8e69ff40b4e2bc382ddeacf24b0e975c35cbe Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9915)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #14164: [FLINK-20278][Python] Throw a meaningful exception if the Python DataStream API job executes in batch mode.

2020-11-22 Thread GitBox


flinkbot commented on pull request #14164:
URL: https://github.com/apache/flink/pull/14164#issuecomment-731976203


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 8e32013740da2349001a08f807c1ef7f398e9ee0 (Mon Nov 23 
07:21:31 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20278) Throw a meaningful exception if the Python DataStream API job executes in batch mode

2020-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20278:
---
Labels: pull-request-available  (was: )

> Throw a meaningful exception if the Python DataStream API job executes in 
> batch mode
> 
>
> Key: FLINK-20278
> URL: https://issues.apache.org/jira/browse/FLINK-20278
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Shuiqiang Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Currently, the Python DataStream job still doesn't support batch mode. We 
> should throw a meaningful exception if it runs in batch mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13983: [FLINK-19989][python] Add collect operation in Python DataStream API

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #13983:
URL: https://github.com/apache/flink/pull/13983#issuecomment-723574143


   
   ## CI report:
   
   * a4b291d58c664776ac81ace1f3b020239993f893 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9309)
 
   * fe0505f91d71c3a5947bdac49a8b3fb91983d5c5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9922)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] shuiqiangchen opened a new pull request #14164: [FLINK-20278][Python] Throw a meaningful exception if the Python DataStream API job executes in batch mode.

2020-11-22 Thread GitBox


shuiqiangchen opened a new pull request #14164:
URL: https://github.com/apache/flink/pull/14164


   
   
   ## What is the purpose of the change
   
   *Currently, the Python DataStream job still doesn't support batch mode. We 
should throw a meaningful exception if it runs in batch mode.*
   
   ## Brief change log
   
   - *Throw UnsupportedOperationException if the RuntimeExecutionMode is BATCH 
when generating the StreamGraph (sketched below).*
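
   For illustration, a hedged sketch of that guard; the class and method names 
here are made up and are not the actual patch:

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.ExecutionOptions;

// Hypothetical sketch only (names assumed): reject BATCH mode before the
// StreamGraph for a Python DataStream job is generated.
public class PythonBatchModeGuard {
    static void checkStreamingMode(Configuration conf) {
        if (conf.get(ExecutionOptions.RUNTIME_MODE) == RuntimeExecutionMode.BATCH) {
            throw new UnsupportedOperationException(
                    "Batch mode is not supported by the Python DataStream API yet.");
        }
    }
}
```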
   
   ## Verifying this change
   
   This change has been verified by test_batch_execution_mode in 
test_stream_execution_environment.py.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? ( not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-20281) Window aggregation supports changelog stream input

2020-11-22 Thread Shengkai Fang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237160#comment-17237160
 ] 

Shengkai Fang edited comment on FLINK-20281 at 11/23/20, 7:16 AM:
--

We also can't do window aggregation on upsert-kafka, because upsert-kafka is 
also a changelog stream. 


was (Author: fsk119):
We are also can't do window agg on upsert-kafka. Because upsert-kafka is also a 
changelog stream. 

> Window aggregation supports changelog stream input
> --
>
> Key: FLINK-20281
> URL: https://issues.apache.org/jira/browse/FLINK-20281
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Planner, Table SQL / Runtime
>Reporter: Jark Wu
>Priority: Major
>
> Currently, window aggregation doesn't support consuming a changelog stream. 
> This makes it impossible to do a window aggregation on changelog sources 
> (e.g. Kafka with Debezium format, or upsert-kafka, or mysql-cdc). 
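
For illustration, a minimal sketch of the unsupported case; all table names and 
connector options below are made up:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Hedged sketch only: a tumbling-window aggregate over an upsert-kafka
// (changelog) source, which the planner currently rejects.
public class WindowAggOnChangelog {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  user_id STRING," +
                "  amount DOUBLE," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND," +
                "  PRIMARY KEY (user_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'upsert-kafka'," +
                "  'topic' = 'orders'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'key.format' = 'json'," +
                "  'value.format' = 'json')");
        // Expected to fail today: window aggregation cannot consume the
        // update/delete records produced by a changelog source.
        tEnv.executeSql(
                "SELECT TUMBLE_START(ts, INTERVAL '1' MINUTE) AS w, SUM(amount) " +
                "FROM orders GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE)").print();
    }
}
{code}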



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] dianfu commented on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


dianfu commented on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731974012


   @zhuzhurk Thanks a lot for the review. Will merge once the tests passed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20281) Window aggregation supports changelog stream input

2020-11-22 Thread Shengkai Fang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237160#comment-17237160
 ] 

Shengkai Fang commented on FLINK-20281:
---

We also can't do window aggregation on upsert-kafka, because upsert-kafka is 
also a changelog stream. 

> Window aggregation supports changelog stream input
> --
>
> Key: FLINK-20281
> URL: https://issues.apache.org/jira/browse/FLINK-20281
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Planner, Table SQL / Runtime
>Reporter: Jark Wu
>Priority: Major
>
> Currently, window aggregation doesn't support consuming a changelog stream. 
> This makes it impossible to do a window aggregation on changelog sources 
> (e.g. Kafka with Debezium format, or upsert-kafka, or mysql-cdc). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20180) Translate the FileSink Document into Chinese

2020-11-22 Thread Yun Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Gao updated FLINK-20180:

Fix Version/s: 1.12.0

> Translate the FileSink Document into Chinese
> --
>
> Key: FLINK-20180
> URL: https://issues.apache.org/jira/browse/FLINK-20180
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Connectors / FileSystem
>Reporter: Yun Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Translate the newly added FileSink documentation into Chinese



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #14163: [FLINK-20282][runtime] Improve error messages for invalid managed memory fraction

2020-11-22 Thread GitBox


flinkbot commented on pull request #14163:
URL: https://github.com/apache/flink/pull/14163#issuecomment-731969606


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit ccf1b22d8807727c2f78fa49871ccc620fe126e7 (Mon Nov 23 
07:05:59 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-20282).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


zhuzhurk commented on a change in pull request #14161:
URL: https://github.com/apache/flink/pull/14161#discussion_r528500411



##
File path: 
flink-python/src/main/java/org/apache/flink/streaming/api/runners/python/beam/BeamPythonFunctionRunner.java
##
@@ -250,7 +250,10 @@ public void open(PythonConfig config) throws Exception {
Struct pipelineOptions = 
PipelineOptionsTranslation.toProto(portableOptions);
 
if (memoryManager != null && config.isUsingManagedMemory()) {
-   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0);
+   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0,
+   "The configured managed memory fraction for 
Python worker process must be within (0, 1], was: %s. " +
+   "It maybe because the consumer type \"Python\" 
was missing or set to 0 for the config option 
\"taskmanager.memory.managed.consumer-weights\"." +

Review comment:
   Yes, you are right. That could also happen and would result in problems.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20282) Make invalid managed memory fraction errors more advisory in MemoryManager

2020-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20282:
---
Labels: pull-request-available  (was: )

> Make invalid managed memory fraction errors more advisory in MemoryManager
> --
>
> Key: FLINK-20282
> URL: https://issues.apache.org/jira/browse/FLINK-20282
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Zhu Zhu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The invalid managed memory fraction errors[1] reported from MemoryManager do 
> not give users enough guidance to solve the problem. This error happens when 
> managed memory is required for a use case but its weight is 0. See FLINK-20116.
> I think it would be better to enrich the error message to guide users to 
> properly configure "taskmanager.memory.managed.consumer-weights".
> [1] "Caused by: java.lang.IllegalArgumentException: The fraction of memory to 
> allocate must within (0, 1], was: 0.0"
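
For illustration, a hedged sketch of the kind of enriched check this ticket 
asks for; the exact wording and its location in MemoryManager are up to the 
actual fix:

{code:java}
import org.apache.flink.util.Preconditions;

// Hedged sketch only: an error message that points users at the
// consumer-weights option instead of just stating the invalid fraction.
public class FractionCheckSketch {
    static void checkFraction(double fraction) {
        Preconditions.checkArgument(
                fraction > 0 && fraction <= 1.0,
                "The fraction of memory to allocate must be within (0, 1], was: %s. "
                        + "This usually means the consumer's weight in "
                        + "'taskmanager.memory.managed.consumer-weights' is missing or 0.",
                fraction);
    }
}
{code}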



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] xintongsong commented on pull request #14163: [FLINK-20282][runtime] Improve error messages for invalid managed memory fraction

2020-11-22 Thread GitBox


xintongsong commented on pull request #14163:
URL: https://github.com/apache/flink/pull/14163#issuecomment-731968607


   cc @zhuzhurk 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong opened a new pull request #14163: [FLINK-20282][runtime] Improve error messages for invalid managed memory fraction

2020-11-22 Thread GitBox


xintongsong opened a new pull request #14163:
URL: https://github.com/apache/flink/pull/14163


   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-statefun] wangzzu commented on pull request #131: [FLINK-18968] Translate README.md to Chinese

2020-11-22 Thread GitBox


wangzzu commented on pull request #131:
URL: https://github.com/apache/flink-statefun/pull/131#issuecomment-731966327


   @tzulitai @klion26 I have submitted the new PR: 
https://github.com/apache/flink-statefun/pull/176
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-statefun] wangzzu opened a new pull request #176: [FLINK-18968] Translate README.md to Chinese

2020-11-22 Thread GitBox


wangzzu opened a new pull request #176:
URL: https://github.com/apache/flink-statefun/pull/176


   Translate README.md to Chinese. New file is named README.zh.md
   Methodology: Google Translate on HTML + Manual Correction



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731917055


   
   ## CI report:
   
   * adb4d262c3bf03ef0abe97cd68adf540521ff68c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9920)
 
   * 405db09250d7ecc514dd691ed2abdfa7d852b530 UNKNOWN
   * 320d598828b80e4afe23fbdd2a82124daa100a94 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13983: [FLINK-19989][python] Add collect operation in Python DataStream API

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #13983:
URL: https://github.com/apache/flink/pull/13983#issuecomment-723574143


   
   ## CI report:
   
   * a4b291d58c664776ac81ace1f3b020239993f893 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9309)
 
   * fe0505f91d71c3a5947bdac49a8b3fb91983d5c5 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #14162: [FLINK-19687][table] Support to introduce the json format execution plan by `StatementSet#explain`

2020-11-22 Thread GitBox


flinkbot commented on pull request #14162:
URL: https://github.com/apache/flink/pull/14162#issuecomment-731964807


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 4dec15772cc7558155d8d8b164e7adc14798bb3a (Mon Nov 23 
06:53:58 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19687) Support to get execution plan in `StatementSet`

2020-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19687:
---
Labels: pull-request-available  (was: )

> Support to get execution plan in `StatementSet`
> ---
>
> Key: FLINK-19687
> URL: https://issues.apache.org/jira/browse/FLINK-19687
> Project: Flink
>  Issue Type: Wish
>  Components: Table SQL / API
>Affects Versions: 1.11.0
>Reporter: xiaozilong
>Assignee: xiaozilong
>Priority: Major
>  Labels: pull-request-available
>
> Hi, I want to get the job's execution plan in Flink SQL 1.11, but I get the 
> exception "No operators defined in streaming topology. Cannot execute." when 
> using `env.getExecutionPlan()`. The same code runs fine in Flink SQL 1.10. I 
> found that translation operations only happen when StatementSet.execute() is 
> called in Flink SQL 1.11. So we cannot get the job's execution plan before 
> the job is submitted? Can we support getting the execution plan in 
> `StatementSet`?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] V1ncentzzZ opened a new pull request #14162: [FLINK-19687][table] Support to introduce the json format execution plan by `StatementSet#explain`

2020-11-22 Thread GitBox


V1ncentzzZ opened a new pull request #14162:
URL: https://github.com/apache/flink/pull/14162


   
   
   ## What is the purpose of the change
   
   Support introducing the JSON execution plan via `StatementSet#explain`.
   
   
   ## Brief change log
   
 - Add an enum such as  `JSON_EXECUTION_PLAN` for ExplainDetail.
 - Modify all implementations of `Planner#explain`.
  - Append the execution plan in JSON format to the result when 
`Planner#explain` executes with the parameter 
`ExplainDetail.JSON_EXECUTION_PLAN` (see the usage sketch below).
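
   A hedged usage sketch of the proposed API; note that 
`ExplainDetail.JSON_EXECUTION_PLAN` is the value this PR proposes to add, not 
an existing enum value, and the tables are made up:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.ExplainDetail;
import org.apache.flink.table.api.StatementSet;
import org.apache.flink.table.api.TableEnvironment;

// Hedged sketch: requesting the JSON execution plan without submitting the job,
// once the proposed ExplainDetail.JSON_EXECUTION_PLAN value exists.
public class ExplainJsonPlanExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        tEnv.executeSql("CREATE TABLE src (a INT) WITH ('connector' = 'datagen')");
        tEnv.executeSql("CREATE TABLE snk (a INT) WITH ('connector' = 'blackhole')");
        StatementSet set = tEnv.createStatementSet();
        set.addInsertSql("INSERT INTO snk SELECT a FROM src");
        // Proposed: the extra detail appends the JSON execution plan to the output.
        System.out.println(set.explain(ExplainDetail.JSON_EXECUTION_PLAN));
    }
}
```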
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluser with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] SteNicholas commented on pull request #13983: [FLINK-19989][python] Add collect operation in Python DataStream API

2020-11-22 Thread GitBox


SteNicholas commented on pull request #13983:
URL: https://github.com/apache/flink/pull/13983#issuecomment-731962821


   @shuiqiangchen @dianfu , could you please help to review this pull request?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20285) LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices

2020-11-22 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237151#comment-17237151
 ] 

Xintong Song commented on FLINK-20285:
--

Thanks for the update. Sounds good to me.

> LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices
> --
>
> Key: FLINK-20285
> URL: https://issues.apache.org/jira/browse/FLINK-20285
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> LazyFromSourcesSchedulingStrategy may schedule vertices which are 
> not in CREATED state. This will lead to an unexpected state check failure and 
> a fatal error (see attached error).
> The reason is that the status of a vertex to schedule was changed in 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices() 
> during the invocation of schedulerOperations.allocateSlotsAndDeploy(...) on 
> other vertices.
> e.g. ev1 and ev2 are in the same pipelined region and are restarted one by 
> one in the scheduling loop in 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices(). 
> They are all CREATED at the moment. ev1 is scheduled first but it immediately 
> fails due to some slot allocation error and ev2 will be canceled as a result. 
> So when ev2 is scheduled, its state would be CANCELED and the state check 
> failed.
> For more details, see FLINK-20220.
> {code:java}
> 2020-11-19 13:34:17,231 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'flink-akka.actor.default-dispatcher-15' produced an uncaught 
> exception. Stopping the process...
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
> expected vertex aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, 
> was: CANCELED
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708) 
> ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  ~[?:1.8.0_222]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
>  

[GitHub] [flink] flinkbot edited a comment on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731917055


   
   ## CI report:
   
   * adb4d262c3bf03ef0abe97cd68adf540521ff68c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9920)
 
   * 405db09250d7ecc514dd691ed2abdfa7d852b530 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] dianfu commented on a change in pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


dianfu commented on a change in pull request #14161:
URL: https://github.com/apache/flink/pull/14161#discussion_r528491975



##
File path: 
flink-python/src/main/java/org/apache/flink/streaming/api/runners/python/beam/BeamPythonFunctionRunner.java
##
@@ -250,7 +250,10 @@ public void open(PythonConfig config) throws Exception {
Struct pipelineOptions = 
PipelineOptionsTranslation.toProto(portableOptions);
 
if (memoryManager != null && config.isUsingManagedMemory()) {
-   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0);
+   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0,
+   "The configured managed memory fraction for 
Python worker process must be within (0, 1], was: %s. " +
+   "It maybe because the consumer type \"Python\" 
was missing or set to 0 for the config option 
\"taskmanager.memory.managed.consumer-weights\"." +

Review comment:
   I meant a missing Python entry; for example, users configure 
*taskmanager.memory.managed.consumer-weights* as *DATAPROC:100*
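
   For reference, a hedged sketch of a configuration that avoids the failure 
described above; the 70/30 split is an example value, not a recommendation:

```java
import org.apache.flink.configuration.Configuration;

// Example only: give both consumer types nonzero weights so the Python worker's
// managed memory fraction falls within (0, 1].
public class ConsumerWeightsExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setString("taskmanager.memory.managed.consumer-weights",
                "DATAPROC:70,PYTHON:30");
        System.out.println(conf);
    }
}
```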





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20285) LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices

2020-11-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237144#comment-17237144
 ] 

Zhu Zhu commented on FLINK-20285:
-

I'd like to fix it by changing 
LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices() to 
put the vertex filtering and the allocateSlotsAndDeploy() invocation in the 
same loop.
The fix PR will be opened today.
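
To make the fix concrete, a minimal self-contained model of the race; the types 
below are made up, not Flink's scheduler classes:

{code:java}
import java.util.Arrays;
import java.util.List;

public class OneLoopSchedulingModel {
    enum State { CREATED, CANCELED }

    static class Vertex {
        final String id;
        State state = State.CREATED;
        Vertex(String id) { this.id = id; }
    }

    // Deploying ev1 fails and cancels its whole pipelined region as a side
    // effect, which is the state change that broke filter-then-deploy.
    static void allocateSlotsAndDeploy(Vertex v, List<Vertex> region) {
        if (v.id.equals("ev1")) {
            region.forEach(other -> other.state = State.CANCELED);
        }
    }

    public static void main(String[] args) {
        List<Vertex> region = Arrays.asList(new Vertex("ev1"), new Vertex("ev2"));
        // Filtering and deployment in the same loop: ev2's state is re-checked
        // right before it would be scheduled, so the CANCELED vertex is skipped
        // instead of tripping the CREATED-state check.
        for (Vertex v : region) {
            if (v.state == State.CREATED) {
                allocateSlotsAndDeploy(v, region);
            }
        }
    }
}
{code}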

> LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices
> --
>
> Key: FLINK-20285
> URL: https://issues.apache.org/jira/browse/FLINK-20285
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> LazyFromSourcesSchedulingStrategy may schedule vertices which are 
> not in CREATED state. This will lead to an unexpected state check failure and 
> a fatal error (see attached error).
> The reason is that the status of a vertex to schedule was changed in 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices() 
> during the invocation of schedulerOperations.allocateSlotsAndDeploy(...) on 
> other vertices.
> e.g. ev1 and ev2 are in the same pipelined region and are restarted one by 
> one in the scheduling loop in 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices(). 
> They are all CREATED at the moment. ev1 is scheduled first but it immediately 
> fails due to some slot allocation error and ev2 will be canceled as a result. 
> So when ev2 is scheduled, its state would be CANCELED and the state check 
> failed.
> For more details, see FLINK-20220.
> {code:java}
> 2020-11-19 13:34:17,231 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'flink-akka.actor.default-dispatcher-15' produced an uncaught 
> exception. Stopping the process...
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
> expected vertex aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, 
> was: CANCELED
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708) 
> ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  ~[?:1.8.0_222]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 

[GitHub] [flink] zhuzhurk commented on a change in pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


zhuzhurk commented on a change in pull request #14161:
URL: https://github.com/apache/flink/pull/14161#discussion_r528491131



##
File path: 
flink-python/src/main/java/org/apache/flink/streaming/api/runners/python/beam/BeamPythonFunctionRunner.java
##
@@ -250,7 +250,10 @@ public void open(PythonConfig config) throws Exception {
Struct pipelineOptions = 
PipelineOptionsTranslation.toProto(portableOptions);
 
if (memoryManager != null && config.isUsingManagedMemory()) {
-   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0);
+   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0,
+   "The configured managed memory fraction for 
Python worker process must be within (0, 1], was: %s. " +
+   "It maybe because the consumer type \"Python\" 
was missing or set to 0 for the config option 
\"taskmanager.memory.managed.consumer-weights\"." +

Review comment:
   typo: maybe -> may be





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


zhuzhurk commented on a change in pull request #14161:
URL: https://github.com/apache/flink/pull/14161#discussion_r528491085



##
File path: 
flink-python/src/main/java/org/apache/flink/streaming/api/runners/python/beam/BeamPythonFunctionRunner.java
##
@@ -250,7 +250,10 @@ public void open(PythonConfig config) throws Exception {
Struct pipelineOptions = 
PipelineOptionsTranslation.toProto(portableOptions);
 
if (memoryManager != null && config.isUsingManagedMemory()) {
-   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0);
+   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0,
+   "The configured managed memory fraction for 
Python worker process must be within (0, 1], was: %s. " +
+   "It maybe because the consumer type \"Python\" 
was missing or set to 0 for the config option 
\"taskmanager.memory.managed.consumer-weights\"." +

Review comment:
   PYTHON has a default weight of 30, so not setting it will not result in 
this problem.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-20288) Correct documentation about savepoint self-contained

2020-11-22 Thread Yun Tang (Jira)
Yun Tang created FLINK-20288:


 Summary: Correct documentation about savepoint self-contained
 Key: FLINK-20288
 URL: https://issues.apache.org/jira/browse/FLINK-20288
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Yun Tang
Assignee: Yun Tang
 Fix For: 1.12.0, 1.11.4


Self-contained savepoints are already supported, but the documentation still 
says they are not. We should fix that description.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] godfreyhe commented on a change in pull request #14154: [FLINK-19878][table-planner-blink] Fix WatermarkAssigner shouldn't be after ChangelogNormalize

2020-11-22 Thread GitBox


godfreyhe commented on a change in pull request #14154:
URL: https://github.com/apache/flink/pull/14154#discussion_r528490059



##
File path: 
flink-table/flink-table-planner-blink/src/test/resources/org/apache/flink/table/planner/plan/stream/sql/TableScanTest.xml
##
@@ -556,10 +577,10 @@ LogicalProject(a=[$1], b=[$2], c=[$3])
 
   

[jira] [Commented] (FLINK-20277) flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure

2020-11-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/FLINK-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237143#comment-17237143
 ] 

谢波 commented on FLINK-20277:


Sorry, I'll pay attention next time.

I have a job that needs to read a Hive table once a minute. When the task 
fails, the job keeps restarting and throwing exceptions, and it cannot recover. 
It seems that this exception is thrown when there is no file under the Hive 
table, and once the exception is thrown, the state cannot be recovered.

 

> flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure
> -
>
> Key: FLINK-20277
> URL: https://issues.apache.org/jira/browse/FLINK-20277
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: 谢波
>Assignee: godfrey he
>Priority: Blocker
> Fix For: 1.11.3
>
>
> When consuming a Hive table as a stream, the job cannot recover properly 
> after a failure and keeps restarting.
> It keeps reporting the error: The ContinuousFileMonitoringFunction has 
> already restored from a previous Flink version.
>  
> {color:#FF}java.io.FileNotFoundException: File does not exist: 
> hdfs://nameservice1/rawdata/db/bw_hana/sapecc/hepecc_ekko_cut{color}
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1270)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1262)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1262)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.getFileStatus(HadoopFileSystem.java:85)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:588)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.getInputSplitsSortedByModTime(ContinuousFileMonit
>  oringFunction.java:279) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.monitorDirAndForwardSplits(ContinuousFileMonitori
>  ngFunction.java:251) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:215)
>  ~[
>  flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
>  ~[flink-dist_2
>  .11-1.11.2.jar:1.11.2]
>  
>  
> 2020-11-23 05:00:33,313 INFO 
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Split Reader: 
> HiveFileMonitoringFunction -> S
>  ink: 
> Sink(table=[default_catalog.default_database.kafka_hepecc_ekko_cut_json], 
> fields=[mandt, ebeln, bukrs, bstyp, bsart, bsakz, loekz, statu
>  , aedat, ernam, pincr, lponr, lifnr, spras, zterm, zbd1t, zbd2t, zbd3t, 
> zbd1p, zbd2p, ekorg, ekgrp, waers, wkurs, kufix, bedat, kdatb, kdate,
>  bwbdt, angdt, bnddt, gwldt, ausnr, angnr, ihran, ihrez, verkf, telf1, llief, 
> kunnr, konnr, abgru, autlf, weakt, reswk, lblif, inco1, inco2, 
>  ktwrt, submi, knumv, kalsm, stafo, lifre, exnum, unsez, logsy, upinc, stako, 
> frggr, frgsx, frgke, frgzu, frgrl, lands, lphis, adrnr, stceg_l,
>  stceg, absgr, addnr, kornr, memory, procstat, rlwrt, revno, scmproc, 
> reason_code, memorytype, rettp, retpc, dptyp, dppct, dpamt, dpdat, msr_
>  id, hierarchy_exists, threshold_exists, legal_contract, description, 
> release_date, force_id, force_cnt, reloc_id, reloc_seq_id, source_logsys
>  , auflg, yxcort, ysyb, ypsfs, yxqlx, yjplx, ylszj, yxdry, yxdrymc, ylbid1, 
> yqybm, ysxpt_order, yy_write_if, fanpfg, yresfg, yrcofg, yretxt, y
>  ...skipping...
>  {color:#FF}java.lang.IllegalArgumentException: The 
> ContinuousFileMonitoringFunction has already restored from a previous Flink 
> version.{color}
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.initializeState(ContinuousFileMonitoringFunction.java:176)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:185)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> 

[jira] [Updated] (FLINK-20277) flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure

2020-11-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

谢波 updated FLINK-20277:
---
Description: 
When consuming a Hive table as a stream, the job cannot recover properly after 
a failure and keeps restarting.

It keeps reporting the error: The ContinuousFileMonitoringFunction has already 
restored from a previous Flink version.

 

{color:#FF}java.io.FileNotFoundException: File does not exist: 
hdfs://nameservice1/rawdata/db/bw_hana/sapecc/hepecc_ekko_cut{color}
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1270)
 ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1262)
 ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
 at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-2.6.0-cdh5.16.2.jar:?]
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1262)
 ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
 at 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.getFileStatus(HadoopFileSystem.java:85)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:588)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.getInputSplitsSortedByModTime(ContinuousFileMonit
 oringFunction.java:279) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.monitorDirAndForwardSplits(ContinuousFileMonitori
 ngFunction.java:251) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:215)
 ~[
 flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63) 
~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
 ~[flink-dist_2
 .11-1.11.2.jar:1.11.2]

 

 

2020-11-23 05:00:33,313 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Split Reader: 
HiveFileMonitoringFunction -> S
 ink: Sink(table=[default_catalog.default_database.kafka_hepecc_ekko_cut_json], 
fields=[mandt, ebeln, bukrs, bstyp, bsart, bsakz, loekz, statu
 , aedat, ernam, pincr, lponr, lifnr, spras, zterm, zbd1t, zbd2t, zbd3t, zbd1p, 
zbd2p, ekorg, ekgrp, waers, wkurs, kufix, bedat, kdatb, kdate,
 bwbdt, angdt, bnddt, gwldt, ausnr, angnr, ihran, ihrez, verkf, telf1, llief, 
kunnr, konnr, abgru, autlf, weakt, reswk, lblif, inco1, inco2, 
 ktwrt, submi, knumv, kalsm, stafo, lifre, exnum, unsez, logsy, upinc, stako, 
frggr, frgsx, frgke, frgzu, frgrl, lands, lphis, adrnr, stceg_l,
 stceg, absgr, addnr, kornr, memory, procstat, rlwrt, revno, scmproc, 
reason_code, memorytype, rettp, retpc, dptyp, dppct, dpamt, dpdat, msr_
 id, hierarchy_exists, threshold_exists, legal_contract, description, 
release_date, force_id, force_cnt, reloc_id, reloc_seq_id, source_logsys
 , auflg, yxcort, ysyb, ypsfs, yxqlx, yjplx, ylszj, yxdry, yxdrymc, ylbid1, 
yqybm, ysxpt_order, yy_write_if, fanpfg, yresfg, yrcofg, yretxt, y
 ...skipping...
 {color:#FF}java.lang.IllegalArgumentException: The 
ContinuousFileMonitoringFunction has already restored from a previous Flink 
version.{color}
 at 
org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.initializeState(ContinuousFileMonitoringFunction.java:176)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:185)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:167)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:106)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:258)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
 

[jira] [Created] (FLINK-20287) Add documentation of how to switch memory allocator in Flink docker image

2020-11-22 Thread Yun Tang (Jira)
Yun Tang created FLINK-20287:


 Summary: Add documentation of how to switch memory allocator in 
Flink docker image
 Key: FLINK-20287
 URL: https://issues.apache.org/jira/browse/FLINK-20287
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes, Documentation
Reporter: Yun Tang
Assignee: Yun Tang
 Fix For: 1.12.0


Add documentation to tell user how to switch memory allocator in Flink docker 
image.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20256) UDAF type inference will fail if accumulator contains MapView with Pojo value type

2020-11-22 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237139#comment-17237139
 ] 

Caizhi Weng commented on FLINK-20256:
-

Indeed, if I make the test more complex, for example by using a user-defined 
interface in a user-defined class, the test still fails.

{code:java}
public interface MyInterface {}

public static class MyClass implements Serializable {
public String a;
public int b;
public MyInterface[] c;

public MyClass() {}

public MyClass(String a) {
this.a = a;
this.b = a.length();
this.c = new MyInterface[b];
}
}

public static class MyAcc implements Serializable {

public MapView<String, MyClass> view = new MapView<>();

public MyAcc() {}

public void add(String a, String b) {
try {
view.put(a, new MyClass(b));
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}

public static class TestUDAF extends AggregateFunction<String, MyAcc> {

@Override
public MyAcc createAccumulator() {
return new MyAcc();
}

public void accumulate(MyAcc acc, String value) {
if (value != null) {
acc.add(value, value);
}
}

@Override
public String getValue(MyAcc acc) {
return "test";
}
}

@Test
public void myTest() throws Exception {
String ddl = "create function MyACC as '" + TestUDAF.class.getName() + 
"'";
tEnv().executeSql(ddl).await();
try (CloseableIterator<Row> it = tEnv().executeSql("SELECT 
MyACC('123')").collect()) {
while (it.hasNext()) {
System.out.println(it.next());
}
}
}
{code}

{code}
java.lang.ClassCastException: org.apache.flink.table.types.AtomicDataType 
cannot be cast to org.apache.flink.table.types.KeyValueDataType

at 
org.apache.flink.table.planner.typeutils.DataViewUtils$MapViewSpec.getKeyDataType(DataViewUtils.java:257)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1$$anonfun$22.apply(AggsHandlerCodeGenerator.scala:1231)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1$$anonfun$22.apply(AggsHandlerCodeGenerator.scala:1231)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$.org$apache$flink$table$planner$codegen$agg$AggsHandlerCodeGenerator$$addReusableDataViewSerializer(AggsHandlerCodeGenerator.scala:1294)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1.apply(AggsHandlerCodeGenerator.scala:1228)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1.apply(AggsHandlerCodeGenerator.scala:1211)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$.addReusableStateDataViews(AggsHandlerCodeGenerator.scala:1211)
at 
org.apache.flink.table.planner.codegen.agg.ImperativeAggCodeGen.(ImperativeAggCodeGen.scala:112)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$3.apply(AggsHandlerCodeGenerator.scala:233)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$3.apply(AggsHandlerCodeGenerator.scala:214)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator.initialAggregateInformation(AggsHandlerCodeGenerator.scala:214)
at 
org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator.generateAggsHandler(AggsHandlerCodeGenerator.scala:325)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupAggregate.translateToPlanInternal(StreamExecGroupAggregate.scala:143)
at 
org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupAggregate.translateToPlanInternal(StreamExecGroupAggregate.scala:52)
at 

[jira] [Updated] (FLINK-19125) Avoid memory fragmentation when running flink docker image

2020-11-22 Thread Yu Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-19125:
--
Release Note: With FLINK-19125, jemalloc is adopted as the default memory 
allocator in the Flink docker image to prevent memory fragmentation problems. 
Users can roll back to glibc by passing the 'disable-jemalloc' flag to the 
docker-entrypoint.sh script. For more details, please refer to the Flink 
documentation.  (was: Adopt Jemalloc as default memory allocator in official 
docker image's docker-entrypoint.sh to avoid known memory fragmentation 
problem, and user could also revert back to previous glibc if parameter 
'disable-jemalloc' is given.)

> Avoid memory fragmentation when running flink docker image
> --
>
> Key: FLINK-19125
> URL: https://issues.apache.org/jira/browse/FLINK-19125
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, Runtime / State Backends
>Affects Versions: 1.12.0, 1.11.1
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0, 1.11.3
>
>
> This ticket tracks the problem of memory fragmentation when launching the 
> default Flink docker image.
> In FLINK-18712, a user reported that if they repeatedly submit a job with the 
> RocksDB state backend on a k8s session cluster, the memory usage of the 
> task manager grows continuously until it is OOM killed. 
>  I reproduced this problem with the official Flink docker image no matter how 
> RocksDB is used (whether managed memory is enabled or not).
> I dug into the problem and found it is due to memory fragmentation 
> caused by {{glibc}}, which does not return memory to the kernel gracefully 
> (please refer to [glibc 
> bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and [glibc 
> manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc])
> I found limiting MALLOC_ARENA_MAX to 2 could mitigate this problem (please 
> refer to 
> [choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max]
>  for more details).
> And if we rebuild the docker image to use jemalloc for memory allocation, 
> the problem goes away. 
> {code}
> RUN apt-get update && apt-get -y install libjemalloc-dev
> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
> {code}
> Jemalloc intends to [emphasize fragmentation 
> avoidance|https://github.com/jemalloc/jemalloc/wiki/Background#intended-use] 
> and we might consider refactoring our Dockerfile to be based on jemalloc to 
> avoid memory fragmentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] dianfu commented on pull request #8532: [FLINK-12541][REST] Support to submit Python Table API jobs via REST API

2020-11-22 Thread GitBox


dianfu commented on pull request #8532:
URL: https://github.com/apache/flink/pull/8532#issuecomment-731950941


   @elanv The REST API is still not supported, as we wanted to add client-based 
submission as the first step. However, it makes sense to consider REST API 
support now, as more and more functionality is already supported in PyFlink and 
more and more users are trying it out. As 1.12 has already reached feature 
freeze, I think we cannot make it into 1.12. I will revisit this work and hope 
we can support it in 1.13.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20285) LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices

2020-11-22 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237136#comment-17237136
 ] 

Xintong Song commented on FLINK-20285:
--

[~zhuzh]

Thanks for reporting this issue. This indeed sounds like a release blocker to 
me.

Is there a plan for how and when this issue can be fixed?

> LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices
> --
>
> Key: FLINK-20285
> URL: https://issues.apache.org/jira/browse/FLINK-20285
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> It is possible for LazyFromSourcesSchedulingStrategy to schedule vertices 
> which are not in CREATED state. This results in an unexpected check failure 
> and a fatal error (see attached error).
> The reason is that the status of a vertex to schedule can be changed inside 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices() 
> during the invocation of schedulerOperations.allocateSlotsAndDeploy(...) on 
> other vertices.
> e.g. ev1 and ev2 are in the same pipelined region and are restarted one by 
> one in the scheduling loop in 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices(). 
> They are both CREATED at that moment. ev1 is scheduled first but immediately 
> fails due to a slot allocation error, so ev2 is canceled as a result. 
> When ev2 is then scheduled, its state is CANCELED and the state check fails.
> See FLINK-20220 for more details.
> {code:java}
> 2020-11-19 13:34:17,231 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'flink-akka.actor.default-dispatcher-15' produced an uncaught 
> exception. Stopping the process...
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
> expected vertex aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, 
> was: CANCELED
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708) 
> ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  ~[?:1.8.0_222]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 

[jira] [Commented] (FLINK-20278) Throw a meaningful exception if the Python DataStream API job executes in batch mode

2020-11-22 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237129#comment-17237129
 ] 

Dian Fu commented on FLINK-20278:
-

cc [~csq]

> Throw a meaningful exception if the Python DataStream API job executes in 
> batch mode
> 
>
> Key: FLINK-20278
> URL: https://issues.apache.org/jira/browse/FLINK-20278
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Shuiqiang Chen
>Priority: Major
> Fix For: 1.12.0
>
>
> Currently, the Python DataStream job still doesn't support batch mode. We 
> should throw a meaningful exception if it runs in batch mode.
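A minimal sketch of such a guard (placement and names are illustrative, not 
the actual fix; it assumes the check runs where the Python DataStream job is 
translated):

{code:java}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.ExecutionOptions;

public class BatchModeGuard {
    // Fail fast with an actionable message instead of an obscure runtime error.
    public static void checkStreamingMode(Configuration configuration) {
        if (configuration.get(ExecutionOptions.RUNTIME_MODE) == RuntimeExecutionMode.BATCH) {
            throw new UnsupportedOperationException(
                    "The Python DataStream API does not support batch mode yet. "
                            + "Please set 'execution.runtime-mode' to STREAMING.");
        }
    }
}
{code}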



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19864) TwoInputStreamTaskTest.testWatermarkMetrics failed with "expected:<1> but was:<-9223372036854775808>"

2020-11-22 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237122#comment-17237122
 ] 

Dian Fu commented on FLINK-19864:
-

Another instance: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9912=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=05b74a19-4ee4-5036-c46f-ada307df6cf0

> TwoInputStreamTaskTest.testWatermarkMetrics failed with "expected:<1> but 
> was:<-9223372036854775808>"
> -
>
> Key: FLINK-19864
> URL: https://issues.apache.org/jira/browse/FLINK-19864
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Tests
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Kezhu Wang
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8541=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=7c61167f-30b3-5893-cc38-a9e3d057e392
> {code}
> 2020-10-28T22:40:44.2528420Z [ERROR] 
> testWatermarkMetrics(org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest)
>  Time elapsed: 1.528 s <<< FAILURE! 2020-10-28T22:40:44.2529225Z 
> java.lang.AssertionError: expected:<1> but was:<-9223372036854775808> 
> 2020-10-28T22:40:44.2541228Z at org.junit.Assert.fail(Assert.java:88) 
> 2020-10-28T22:40:44.2542157Z at 
> org.junit.Assert.failNotEquals(Assert.java:834) 2020-10-28T22:40:44.2542954Z 
> at org.junit.Assert.assertEquals(Assert.java:645) 
> 2020-10-28T22:40:44.2543456Z at 
> org.junit.Assert.assertEquals(Assert.java:631) 2020-10-28T22:40:44.2544002Z 
> at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkMetrics(TwoInputStreamTaskTest.java:540)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20286) Support streaming source for filesystem SQL connector

2020-11-22 Thread Jark Wu (Jira)
Jark Wu created FLINK-20286:
---

 Summary: Support streaming source for filesystem SQL connector
 Key: FLINK-20286
 URL: https://issues.apache.org/jira/browse/FLINK-20286
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / FileSystem, Table SQL / Ecosystem
Reporter: Jark Wu


Currently, the filesystem SQL connector only supports a bounded source. It 
would be great to support streaming read just like the Hive connector: the 
source should monitor for newly added files and read their contents. 
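For reference, the DataStream API already provides this kind of continuous 
monitoring; the following is a minimal sketch of it (the path and scan 
interval are made up for illustration):

{code:java}
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousFileReadSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String dir = "/tmp/input"; // directory to monitor (illustrative)
        TextInputFormat format = new TextInputFormat(new Path(dir));
        // PROCESS_CONTINUOUSLY re-scans the directory and picks up newly added files.
        DataStream<String> lines = env.readFile(
                format, dir, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);
        lines.print();
        env.execute("continuous file read sketch");
    }
}
{code}

The filesystem SQL connector would need an equivalent continuous-monitoring 
mode on the table source side.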



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731917055


   
   ## CI report:
   
   * adb4d262c3bf03ef0abe97cd68adf540521ff68c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9920)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] godfreyhe commented on pull request #14103: [FLINK-18545] Introduce `pipeline.name` to allow users to specify job name by configuration

2020-11-22 Thread GitBox


godfreyhe commented on pull request #14103:
URL: https://github.com/apache/flink/pull/14103#issuecomment-731919518


   @twalthr @dawidwys could you take a look at this pr, thanks so much



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-20285) LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices

2020-11-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-20285:
---

Assignee: Zhu Zhu

> LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices
> --
>
> Key: FLINK-20285
> URL: https://issues.apache.org/jira/browse/FLINK-20285
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> It is possible for LazyFromSourcesSchedulingStrategy to schedule vertices 
> which are not in CREATED state. This results in an unexpected check failure 
> and a fatal error (see attached error).
> The reason is that the status of a vertex to schedule can be changed inside 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices() 
> during the invocation of schedulerOperations.allocateSlotsAndDeploy(...) on 
> other vertices.
> e.g. ev1 and ev2 are in the same pipelined region and are restarted one by 
> one in the scheduling loop in 
> LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices(). 
> They are both CREATED at that moment. ev1 is scheduled first but immediately 
> fails due to a slot allocation error, so ev2 is canceled as a result. 
> When ev2 is then scheduled, its state is CANCELED and the state check fails.
> See FLINK-20220 for more details.
> {code:java}
> 2020-11-19 13:34:17,231 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'flink-akka.actor.default-dispatcher-15' produced an uncaught 
> exception. Stopping the process...
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
> expected vertex aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, 
> was: CANCELED
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708) 
> ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  ~[?:1.8.0_222]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> 

[jira] [Updated] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-20220:

Fix Version/s: (was: 1.11.3)
   (was: 1.12.0)

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Priority: Critical
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> 

[jira] [Updated] (FLINK-20285) LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices

2020-11-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-20285:

Description: 
It is possible for LazyFromSourcesSchedulingStrategy to schedule vertices 
which are not in CREATED state. This results in an unexpected check failure 
and a fatal error (see attached error).

The reason is that the status of a vertex to schedule can be changed inside 
LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices() 
during the invocation of schedulerOperations.allocateSlotsAndDeploy(...) on 
other vertices.

e.g. ev1 and ev2 are in the same pipelined region and are restarted one by one 
in the scheduling loop in 
LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices(). 
They are both CREATED at that moment. ev1 is scheduled first but immediately 
fails due to a slot allocation error, so ev2 is canceled as a result. 
When ev2 is then scheduled, its state is CANCELED and the state check fails.

See FLINK-20220 for more details.

{code:java}
2020-11-19 13:34:17,231 ERROR 
org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: Thread 
'flink-akka.actor.default-dispatcher-15' produced an uncaught exception. 
Stopping the process...
java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
expected vertex aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, was: 
CANCELED
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
 ~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
 ~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708) 
~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
 ~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 ~[?:1.8.0_222]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
Caused by: java.lang.IllegalStateException: expected vertex 
aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, was: CANCELED
at 
org.apache.flink.util.Preconditions.checkState(Preconditions.java:217) 
~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$validateDeploymentOptions$3(DefaultScheduler.java:326)
 

[jira] [Assigned] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-20220:
---

Assignee: (was: Zhu Zhu)

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Priority: Critical
> Fix For: 1.12.0, 1.11.3
>
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> 

[jira] [Updated] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-20220:

Affects Version/s: (was: 1.11.0)

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 

[jira] [Commented] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237113#comment-17237113
 ] 

Zhu Zhu commented on FLINK-20220:
-

FLINK-20285 is opened for the fatal error problem.

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Assignee: Zhu Zhu
>Priority: Critical
> Fix For: 1.12.0, 1.11.3
>
> Attachments: jobmanager-1.11.log
>
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> 

[GitHub] [flink] zhuzhurk commented on a change in pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


zhuzhurk commented on a change in pull request #14161:
URL: https://github.com/apache/flink/pull/14161#discussion_r528464485



##
File path: 
flink-python/src/main/java/org/apache/flink/streaming/api/runners/python/beam/BeamPythonFunctionRunner.java
##
@@ -250,7 +250,10 @@ public void open(PythonConfig config) throws Exception {
Struct pipelineOptions = 
PipelineOptionsTranslation.toProto(portableOptions);
 
if (memoryManager != null && config.isUsingManagedMemory()) {
-   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0);
+   Preconditions.checkArgument(managedMemoryFraction > 0 
&& managedMemoryFraction <= 1.0,

Review comment:
   How about we ask users to check whether the `consumer-weights` for 
PYTHON is set to 0? That would be wrong because a Python UDF is used, and I 
think it is the most likely case in which the check fails.
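   For illustration, the enriched check might read roughly like this (the diff 
above is truncated; the message wording here is a sketch, not the merged text):

{code:java}
// Sketch only: managedMemoryFraction comes from the surrounding method.
Preconditions.checkArgument(
        managedMemoryFraction > 0 && managedMemoryFraction <= 1.0,
        "The managed memory fraction of the Python worker process must be within (0, 1], was: %s. "
                + "This usually means that the weight for PYTHON in "
                + "'taskmanager.memory.managed.consumer-weights' is 0 although a Python UDF is used.",
        managedMemoryFraction);
{code}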





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


flinkbot commented on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731917055


   
   ## CI report:
   
   * adb4d262c3bf03ef0abe97cd68adf540521ff68c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-20277) flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure

2020-11-22 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he reassigned FLINK-20277:
--

Assignee: godfrey he

> flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure
> -
>
> Key: FLINK-20277
> URL: https://issues.apache.org/jira/browse/FLINK-20277
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: 谢波
>Assignee: godfrey he
>Priority: Blocker
> Fix For: 1.11.3
>
>
> When streaming-reading a Hive table, the job cannot recover properly after a 
> failure and keeps restarting, always reporting the error: The 
> ContinuousFileMonitoringFunction has already restored from a previous Flink 
> version.
>  
> java.io.FileNotFoundException: File does not exist: 
> hdfs://nameservice1/rawdata/db/bw_hana/sapecc/hepecc_ekko_cut
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1270)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1262)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1262)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.getFileStatus(HadoopFileSystem.java:85)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:588)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.getInputSplitsSortedByModTime(ContinuousFileMonit
> oringFunction.java:279) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.monitorDirAndForwardSplits(ContinuousFileMonitori
> ngFunction.java:251) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:215)
>  ~[
> flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
>  ~[flink-dist_2
> .11-1.11.2.jar:1.11.2]
>  
>  
> 2020-11-23 05:00:33,313 INFO 
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Split Reader: 
> HiveFileMonitoringFunction -> S
> ink: 
> Sink(table=[default_catalog.default_database.kafka_hepecc_ekko_cut_json], 
> fields=[mandt, ebeln, bukrs, bstyp, bsart, bsakz, loekz, statu
> , aedat, ernam, pincr, lponr, lifnr, spras, zterm, zbd1t, zbd2t, zbd3t, 
> zbd1p, zbd2p, ekorg, ekgrp, waers, wkurs, kufix, bedat, kdatb, kdate,
>  bwbdt, angdt, bnddt, gwldt, ausnr, angnr, ihran, ihrez, verkf, telf1, llief, 
> kunnr, konnr, abgru, autlf, weakt, reswk, lblif, inco1, inco2, 
> ktwrt, submi, knumv, kalsm, stafo, lifre, exnum, unsez, logsy, upinc, stako, 
> frggr, frgsx, frgke, frgzu, frgrl, lands, lphis, adrnr, stceg_l,
>  stceg, absgr, addnr, kornr, memory, procstat, rlwrt, revno, scmproc, 
> reason_code, memorytype, rettp, retpc, dptyp, dppct, dpamt, dpdat, msr_
> id, hierarchy_exists, threshold_exists, legal_contract, description, 
> release_date, force_id, force_cnt, reloc_id, reloc_seq_id, source_logsys
> , auflg, yxcort, ysyb, ypsfs, yxqlx, yjplx, ylszj, yxdry, yxdrymc, ylbid1, 
> yqybm, ysxpt_order, yy_write_if, fanpfg, yresfg, yrcofg, yretxt, y
> ...skipping...
> java.lang.IllegalArgumentException: The ContinuousFileMonitoringFunction has 
> already restored from a previous Flink version.
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.initializeState(ContinuousFileMonitoringFunction.java:176)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:185)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:167)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:106)
> 

[jira] [Commented] (FLINK-20277) flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure

2020-11-22 Thread godfrey he (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237112#comment-17237112
 ] 

godfrey he commented on FLINK-20277:


[~hiscat] Thanks for reporting this. I would like to take this ticket.

> flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure
> -
>
> Key: FLINK-20277
> URL: https://issues.apache.org/jira/browse/FLINK-20277
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: 谢波
>Priority: Blocker
> Fix For: 1.11.3
>
>
> When streaming-reading a Hive table, the job cannot recover properly after a 
> failure and keeps restarting, always reporting the error: The 
> ContinuousFileMonitoringFunction has already restored from a previous Flink 
> version.
>  
> java.io.FileNotFoundException: File does not exist: 
> hdfs://nameservice1/rawdata/db/bw_hana/sapecc/hepecc_ekko_cut
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1270)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1262)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1262)
>  ~[hadoop-hdfs-2.6.0-cdh5.16.2.jar:?]
>  at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.getFileStatus(HadoopFileSystem.java:85)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:588)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.getInputSplitsSortedByModTime(ContinuousFileMonit
> oringFunction.java:279) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.monitorDirAndForwardSplits(ContinuousFileMonitori
> ngFunction.java:251) ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:215)
>  ~[
> flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
>  ~[flink-dist_2
> .11-1.11.2.jar:1.11.2]
>  
>  
> 2020-11-23 05:00:33,313 INFO 
> org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Split Reader: 
> HiveFileMonitoringFunction -> S
> ink: 
> Sink(table=[default_catalog.default_database.kafka_hepecc_ekko_cut_json], 
> fields=[mandt, ebeln, bukrs, bstyp, bsart, bsakz, loekz, statu
> , aedat, ernam, pincr, lponr, lifnr, spras, zterm, zbd1t, zbd2t, zbd3t, 
> zbd1p, zbd2p, ekorg, ekgrp, waers, wkurs, kufix, bedat, kdatb, kdate,
>  bwbdt, angdt, bnddt, gwldt, ausnr, angnr, ihran, ihrez, verkf, telf1, llief, 
> kunnr, konnr, abgru, autlf, weakt, reswk, lblif, inco1, inco2, 
> ktwrt, submi, knumv, kalsm, stafo, lifre, exnum, unsez, logsy, upinc, stako, 
> frggr, frgsx, frgke, frgzu, frgrl, lands, lphis, adrnr, stceg_l,
>  stceg, absgr, addnr, kornr, memory, procstat, rlwrt, revno, scmproc, 
> reason_code, memorytype, rettp, retpc, dptyp, dppct, dpamt, dpdat, msr_
> id, hierarchy_exists, threshold_exists, legal_contract, description, 
> release_date, force_id, force_cnt, reloc_id, reloc_seq_id, source_logsys
> , auflg, yxcort, ysyb, ypsfs, yxqlx, yjplx, ylszj, yxdry, yxdrymc, ylbid1, 
> yqybm, ysxpt_order, yy_write_if, fanpfg, yresfg, yrcofg, yretxt, y
> ...skipping...
> java.lang.IllegalArgumentException: The ContinuousFileMonitoringFunction has 
> already restored from a previous Flink version.
>  at 
> org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.initializeState(ContinuousFileMonitoringFunction.java:176)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:185)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:167)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
>  ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>  at 
> 

[jira] [Commented] (FLINK-20282) Make invalid managed memory fraction errors more advisory in MemoryManager

2020-11-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237111#comment-17237111
 ] 

Zhu Zhu commented on FLINK-20282:
-

Sounds good to me.

> Make invalid managed memory fraction errors more advisory in MemoryManager
> --
>
> Key: FLINK-20282
> URL: https://issues.apache.org/jira/browse/FLINK-20282
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Zhu Zhu
>Priority: Critical
> Fix For: 1.12.0
>
>
> The invalid managed memory fraction errors[1] reported from MemoryManager do 
> not give users enough guidance to solve the problem. This error happens when 
> managed memory is required for a use case but its weight is 0. See FLINK-20116.
> I think it would be better to enrich the error message so that it guides 
> users to properly configure "taskmanager.memory.managed.consumer-weights".
> [1] "Caused by: java.lang.IllegalArgumentException: The fraction of memory to 
> allocate must within (0, 1], was: 0.0"
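A sketch of what a more advisory message could look like (the wording and 
placement are illustrative; the actual fix is up to this ticket):

{code:java}
// Sketch only: 'fraction' comes from the surrounding MemoryManager method.
if (fraction <= 0 || fraction > 1.0) {
    throw new IllegalArgumentException(String.format(
            "The fraction of memory to allocate must be within (0, 1], was: %s. "
                    + "If this consumer needs managed memory, check that its weight in "
                    + "'taskmanager.memory.managed.consumer-weights' is larger than 0.",
            fraction));
}
{code}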



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


flinkbot commented on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731915698


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit adb4d262c3bf03ef0abe97cd68adf540521ff68c (Mon Nov 23 
04:07:30 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-20285) LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices

2020-11-22 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-20285:
---

 Summary: LazyFromSourcesSchedulingStrategy is possible to schedule 
non-CREATED vertices
 Key: FLINK-20285
 URL: https://issues.apache.org/jira/browse/FLINK-20285
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.12.0, 1.11.3


It is possible for LazyFromSourcesSchedulingStrategy to schedule vertices 
which are not in CREATED state. This results in an unexpected check failure 
and a fatal error[1].

The reason is that the status of a vertex to schedule can be changed inside 
LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices() 
during the invocation of schedulerOperations.allocateSlotsAndDeploy(...) on 
other vertices.

e.g. ev1 and ev2 are in the same pipelined region and are restarted one by one 
in the scheduling loop in 
LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices(). 
They are both CREATED at that moment. ev1 is scheduled first but immediately 
fails due to a slot allocation error, so ev2 is canceled as a result. 
When ev2 is then scheduled, its state is CANCELED and the state check fails.
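To make the race concrete, here is a self-contained toy model (plain Java, no 
Flink classes; all names are illustrative and this is not the actual fix): 
deploying ev1 fails and cancels ev2 in the same region, so the scheduling loop 
has to re-check each vertex's state instead of assuming it is still CREATED.

{code:java}
import java.util.Arrays;
import java.util.List;

public class SchedulingLoopSketch {
    enum State { CREATED, DEPLOYING, CANCELED }

    static class Vertex {
        final String name;
        State state = State.CREATED;
        Vertex(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Vertex ev1 = new Vertex("ev1");
        Vertex ev2 = new Vertex("ev2");
        List<Vertex> region = Arrays.asList(ev1, ev2);

        for (Vertex v : region) {
            // Re-check the state: deploying ev1 below cancels ev2, so without
            // this guard ev2 would be scheduled in CANCELED state and trigger
            // "expected vertex ... to be in CREATED state, was: CANCELED".
            if (v.state != State.CREATED) {
                continue;
            }
            deploy(v, region);
        }
    }

    static void deploy(Vertex v, List<Vertex> region) {
        v.state = State.DEPLOYING;
        if (v.name.equals("ev1")) {
            // Simulate an immediate slot allocation failure that cancels all
            // other vertices in the same pipelined region.
            region.stream().filter(o -> o != v).forEach(o -> o.state = State.CANCELED);
        }
    }
}
{code}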

[1]
{code:java}
2020-11-19 13:34:17,231 ERROR 
org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: Thread 
'flink-akka.actor.default-dispatcher-15' produced an uncaught exception. 
Stopping the process...
java.util.concurrent.CompletionException: java.lang.IllegalStateException: 
expected vertex aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, was: 
CANCELED
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
 ~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
 ~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708) 
~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
 ~[?:1.8.0_222]
at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 ~[?:1.8.0_222]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
[flink-dist_2.11-1.11.2.jar:1.11.2]
Caused by: java.lang.IllegalStateException: expected vertex 
aafcbb93173905cec9672e46932d7790_3 to be in CREATED state, was: CANCELED
at 

[jira] [Updated] (FLINK-20283) Make invalid managed memory fraction errors of python udf more user friendly

2020-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20283:
---
Labels: pull-request-available  (was: )

> Make invalid managed memory fraction errors of python udf more user friendly
> 
>
> Key: FLINK-20283
> URL: https://issues.apache.org/jira/browse/FLINK-20283
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.12.0
>Reporter: Zhu Zhu
>Assignee: Dian Fu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> When managed memory is required for a Python UDF but its weight in 
> "taskmanager.memory.managed.consumer-weights" is set to 0, an error will 
> happen, but the message is hard for users to understand, see [1].
> I think we should expose the invalid fraction error to users in this case and 
> guide users to properly configure 
> "taskmanager.memory.managed.consumer-weights".
> [1]
> {code:java}
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
>   at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
>   at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
>   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:224)
>   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:217)
>   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:208)
>   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:534)
>   at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
>   at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:419)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:286)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:201)
>   at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>   at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>   at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>   at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalArgumentException
>   at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:126)
>   at 
> org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:254)
>   at 
> org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:121)
>   at 
> org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:134)
>   at 
> 

[GitHub] [flink] dianfu commented on pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


dianfu commented on pull request #14161:
URL: https://github.com/apache/flink/pull/14161#issuecomment-731915323


   cc @zhuzhurk 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] dianfu opened a new pull request #14161: [FLINK-20283][python] Provide a meaningful exception message when managed memory fraction of Python worker process is invalid

2020-11-22 Thread GitBox


dianfu opened a new pull request #14161:
URL: https://github.com/apache/flink/pull/14161


   
   ## What is the purpose of the change
   
   *This pull request improves the exception message when managed memory 
fraction of Python worker process is invalid*
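
   As a hedged sketch of what such an improvement could look like, using Flink's `Preconditions.checkArgument(condition, template, args...)` overload; the variable names and message text here are assumptions, not the merged change:

{code:java}
import org.apache.flink.util.Preconditions;

// Sketch only: attach an advisory message instead of a bare
// checkArgument(...) that surfaces as an opaque IllegalArgumentException.
public class FractionCheck {
    static void checkManagedMemoryFraction(double fraction) {
        Preconditions.checkArgument(
            fraction > 0.0 && fraction <= 1.0,
            "The managed memory fraction of the Python worker process must be "
                + "within (0, 1], was: %s. This usually means the weight of the "
                + "PYTHON consumer in "
                + "'taskmanager.memory.managed.consumer-weights' is 0 or missing.",
            fraction);
    }
}
{code}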
   
   
   
   ## Verifying this change
   
   This change is a trivial rework without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20282) Make invalid managed memory fraction errors more advisory in MemoryManager

2020-11-22 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237110#comment-17237110
 ] 

Xintong Song commented on FLINK-20282:
--

Thanks for reporting this, [~zhuzh].

I think you are right. Unless there is a bug, the only possible reason for the 
zero fraction is that the user has not configured the managed memory consumer 
weights properly. Therefore, it makes sense to use the error message to hint 
users to check the configuration option.

In addition, I think we can add logs in 
{{ManagedMemoryUtils#getManagedMemoryUseCaseWeightsFromConfig}} if the weight 
of a use case is configured to 0 or is missing, which might lead to problems. 
We probably should log this at the DEBUG level, given that the method is 
invoked on every operator that uses managed memory.

WDYT?
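
A minimal sketch of the enriched check being discussed (assumed names and message text, not the actual patch):

{code:java}
// Sketch: validate the fraction in MemoryManager-style code and point the
// user at the relevant configuration option instead of only reporting 0.0.
public class FractionValidation {
    static void validateFraction(double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException(
                "The fraction of memory to allocate must be within (0, 1], was: "
                    + fraction
                    + ". A fraction of 0 usually means the consumer's weight in "
                    + "'taskmanager.memory.managed.consumer-weights' is 0 or missing.");
        }
    }
}
{code}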

> Make invalid managed memory fraction errors more advisory in MemoryManager
> --
>
> Key: FLINK-20282
> URL: https://issues.apache.org/jira/browse/FLINK-20282
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Zhu Zhu
>Priority: Critical
> Fix For: 1.12.0
>
>
> The invalid managed memory fraction errors[1] reported from MemoryManager 
> give users little guidance on how to solve the problem. This error happens 
> when managed memory is required for a use case but its weight is 0. See 
> FLINK-20116.
> I think it would be better to enrich the error message to guide users to 
> properly configure "taskmanager.memory.managed.consumer-weights".
> [1] "Caused by: java.lang.IllegalArgumentException: The fraction of memory to 
> allocate must within (0, 1], was: 0.0"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20116) Test Intra-Slot Managed Memory Sharing

2020-11-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237107#comment-17237107
 ] 

Zhu Zhu edited comment on FLINK-20116 at 11/23/20, 3:59 AM:


Testing is done and no blocker issue was found.

{{Details:}}
Test jobs:
1. Streaming job with RocksDB backend
2. Batch job with HashAgg operator which uses managed memory
3. Python job with Python UDF
4. Streaming job with RocksDB backend and Python UDF
5. Batch job with Sort operator and Python UDF

Test cases:
1. Normal case: default value of "taskmanager.memory.managed.consumer-weights"
2. Normal case: consumer weights set to large values with a summation larger 
than 100, e.g. "DATAPROC:200,PYTHON:100"
3. Exceptional case: required consumer weight set to zero, i.e. 
"DATAPROC:30,PYTHON:0" or "DATAPROC:0,PYTHON:70"
4. Exceptional case: consumer weights all set to zero, i.e. 
"DATAPROC:0,PYTHON:0"

Verification:
1. Is the feature working as expected under normal conditions
status : Ok. Fractions are properly computed and used by batch operators, 
RocksDB state backends and the Python UDF operator. Jobs finished successfully
2. Is the feature working / failing as expected with invalid input, induced 
errors etc.
status : almost Ok. Tasks fail as expected due to invalid fraction when 
allocating managed memory
found FLINK-20284
3. Are the error messages, log messages, APIs etc. easy to understand
opened FLINK-20282 FLINK-20283
4. Is the documentation easy to understand
opened FLINK-20246


was (Author: zhuzh):
Testing is done and no blocker issue was found.

{{Details:}}
Test jobs:
1. Streaming job with RocksDB backend
2. Batch job with HashAgg operator which uses managed memory
3. Python job with Python UDF
4. Streaming job with RocksDB backend and Python UDF
5. Batch job with Sort operator and Python UDF

Test cases:
1. Normal case: default value of "taskmanager.memory.managed.consumer-weights"
2. Normal case: consumer weights set to large values with a summation larger 
than 100, e.g. "DATAPROC:200,PYTHON:100"
3. Exceptional case: required consumer weight set to zero, i.e. 
"DATAPROC:30,PYTHON:0" or "DATAPROC:0,PYTHON:70"
4. Exceptional case: consumer weights all set to zero, i.e. 
"DATAPROC:0,PYTHON:0"

Verification:
1. Is the feature working as expected under normal conditions
status : Ok. Fractions are properly computed and used by batch operators, 
RocksDB state backends and the Python UDF operator. Jobs finished successfully
2. Is the feature working / failing as expected with invalid input, induced 
errors etc.
status : almost Ok. Tasks fail due to invalid fraction when allocating 
managed memory
found FLINK-20284
3. Are the error messages, log messages, APIs etc. easy to understand
opened FLINK-20282 FLINK-20283
4. Is the documentation easy to understand
opened FLINK-20246

> Test Intra-Slot Managed Memory Sharing
> --
>
> Key: FLINK-20116
> URL: https://issues.apache.org/jira/browse/FLINK-20116
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Zhu Zhu
>Priority: Critical
> Fix For: 1.12.0
>
>
> Introduced in https://issues.apache.org/jira/browse/FLINK-19177
> 
> [General Information about the Flink 1.12 release 
> testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand
> - Are the error messages, log messages, APIs etc. easy to understand
> - Is the feature working as expected under normal conditions
> - Is the feature working / failing as expected with invalid input, induced 
> errors etc.
> If you find a problem during testing, please file a ticket 
> (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
> During the testing keep us updated on tests conducted, or please write a 
> short summary of all things you have tested in the end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20116) Test Intra-Slot Managed Memory Sharing

2020-11-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237107#comment-17237107
 ] 

Zhu Zhu edited comment on FLINK-20116 at 11/23/20, 3:57 AM:


Testing is done and no blocker issue was found.

{{Details:}}
Test jobs:
1. Streaming job with RocksDB backend
2. Batch job with HashAgg operator which uses managed memory
3. Python job with Python UDF
4. Streaming job with RocksDB backend and Python UDF
5. Batch job with Sort operator and Python UDF

Test cases:
1. Normal case: default value of "taskmanager.memory.managed.consumer-weights"
2. Normal case: consumer weights set to large values with a summation larger 
than 100, e.g. "DATAPROC:200,PYTHON:100"
3. Exceptional case: required consumer weight set to zero, i.e. 
"DATAPROC:30,PYTHON:0" or "DATAPROC:0,PYTHON:70"
4. Exceptional case: consumer weights all set to zero, i.e. 
"DATAPROC:0,PYTHON:0"

Verification:
1. Is the feature working as expected under normal conditions
status : Ok. Fractions are properly computed and used by batch operators, 
RocksDB state backends and the Python UDF operator. Jobs finished successfully
2. Is the feature working / failing as expected with invalid input, induced 
errors etc.
status : almost Ok. Tasks fail due to invalid fraction when allocating 
managed memory
found FLINK-20284
3. Are the error messages, log messages, APIs etc. easy to understand
opened FLINK-20282 FLINK-20283
4. Is the documentation easy to understand
opened FLINK-20246


was (Author: zhuzh):
Testing is done and no blocker issue was found.

{{Details:}}
Test jobs:
1. Streaming job with RocksDB backend
2. Batch job with HashAgg operator which uses managed memory
3. Python job with Python UDF
4. Streaming job with RocksDB backend and Python UDF
5. Batch job with Sort operator and Python UDF

Test cases:
1. Normal case: default value of "taskmanager.memory.managed.consumer-weights"
2. Normal case: consumer weights set to large values with a summation larger 
than 100, e.g. "DATAPROC:200,PYTHON:100"
3. Exceptional case: required consumer weight set to zero, i.e. 
"DATAPROC:30,PYTHON:0" or "DATAPROC:0,PYTHON:70"
4. Exceptional case: consumer weights all set to zero, i.e. 
"DATAPROC:0,PYTHON:0"

Verification:
1. Is the feature working as expected under normal conditions
  - status : Ok. Fractions are properly computed and used by batch operators, 
RocksDB state backends and the Python UDF operator. Jobs finished successfully
2. Is the feature working / failing as expected with invalid input, induced 
errors etc.
  - status : almost Ok. Tasks fail due to invalid fraction when allocating 
managed memory
  - found FLINK-20284
3. Are the error messages, log messages, APIs etc. easy to understand
  - opened FLINK-20282 FLINK-20283
4. Is the documentation easy to understand
  - opened FLINK-20246


> Test Intra-Slot Managed Memory Sharing
> --
>
> Key: FLINK-20116
> URL: https://issues.apache.org/jira/browse/FLINK-20116
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Zhu Zhu
>Priority: Critical
> Fix For: 1.12.0
>
>
> Introduced in https://issues.apache.org/jira/browse/FLINK-19177
> 
> [General Information about the Flink 1.12 release 
> testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand
> - Are the error messages, log messages, APIs etc. easy to understand
> - Is the feature working as expected under normal conditions
> - Is the feature working / failing as expected with invalid input, induced 
> errors etc.
> If you find a problem during testing, please file a ticket 
> (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
> During the testing keep us updated on tests conducted, or please write a 
> short summary of all things you have tested in the end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20116) Test Intra-Slot Managed Memory Sharing

2020-11-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237107#comment-17237107
 ] 

Zhu Zhu commented on FLINK-20116:
-

Testing is done and no blocker issue was found.

{{Details:}}
Test jobs:
1. Streaming job with RocksDB backend
2. Batch job with HashAgg operator which uses managed memory
3. Python job with Python UDF
4. Streaming job with RocksDB backend and Python UDF
5. Batch job with Sort operator and Python UDF

Test cases:
1. Normal case: default value of "taskmanager.memory.managed.consumer-weights"
2. Normal case: consumer weights set to large values with a summation larger 
than 100, e.g. "DATAPROC:200,PYTHON:100"
3. Exceptional case: required consumer weight set to zero, i.e. 
"DATAPROC:30,PYTHON:0" or "DATAPROC:0,PYTHON:70"
4. Exceptional case: consumer weights all set to zero, i.e. 
"DATAPROC:0,PYTHON:0"

Verification:
1. Is the feature working as expected under normal conditions
  - status : Ok. Fractions are properly computed and used by batch operators, 
RocksDB state backends and the Python UDF operator. Jobs finished successfully
2. Is the feature working / failing as expected with invalid input, induced 
errors etc.
  - status : almost Ok. Tasks fail due to invalid fraction when allocating 
managed memory
  - found FLINK-20284
3. Are the error messages, log messages, APIs etc. easy to understand
  - opened FLINK-20282 FLINK-20283
4. Is the documentation easy to understand
  - opened FLINK-20246
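
For readers unfamiliar with how the tested weights map to memory, a small illustrative calculation (plain arithmetic, not Flink code):

{code:java}
public class ConsumerWeightFractions {
    public static void main(String[] args) {
        // e.g. "DATAPROC:200,PYTHON:100"; only the ratio matters, so a
        // summation larger than 100 is fine.
        double dataproc = 200, python = 100;
        double total = dataproc + python;
        System.out.println("DATAPROC fraction: " + dataproc / total); // ~0.667
        System.out.println("PYTHON fraction:   " + python / total);   // ~0.333
        // A required consumer with weight 0 gets fraction 0.0, which triggers
        // "The fraction of memory to allocate must within (0, 1], was: 0.0".
    }
}
{code}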


> Test Intra-Slot Managed Memory Sharing
> --
>
> Key: FLINK-20116
> URL: https://issues.apache.org/jira/browse/FLINK-20116
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Zhu Zhu
>Priority: Critical
> Fix For: 1.12.0
>
>
> Introduced in https://issues.apache.org/jira/browse/FLINK-19177
> 
> [General Information about the Flink 1.12 release 
> testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand
> - Are the error messages, log messages, APIs etc. easy to understand
> - Is the feature working as expected under normal conditions
> - Is the feature working / failing as expected with invalid input, induced 
> errors etc.
> If you find a problem during testing, please file a ticket 
> (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
> During the testing keep us updated on tests conducted, or please write a 
> short summary of all things you have tested in the end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20284) Error happens in TaskExecutor when closing JobMaster connection if there was a python UDF

2020-11-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-20284:

Component/s: Runtime / Coordination

> Error happens in TaskExecutor when closing JobMaster connection if there was 
> a python UDF
> -
>
> Key: FLINK-20284
> URL: https://issues.apache.org/jira/browse/FLINK-20284
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Zhu Zhu
>Priority: Major
> Fix For: 1.12.0
>
>
> When a TaskExecutor has successfully finished running a Python UDF task and 
> is disconnecting from the JobMaster, the errors below occur. This error, 
> however, does not seem to affect job execution at the moment.
> {code:java}
> 2020-11-20 17:05:21,932 INFO  
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService [] - 1 Beam Fn 
> Logging clients still connected during shutdown.
> 2020-11-20 17:05:21,938 WARN  
> org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer[] - Hanged up 
> for unknown endpoint.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task   
>  [] - Source: Custom Source -> select: (f0) -> select: 
> (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> (b0c2104dd8f87bb1caf0c83586c22a51) switched from RUNNING to FINISHED.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task   
>  [] - Freeing task resources for Source: Custom Source -> select: 
> (f0) -> select: (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select 
> table sink (1/1)#0 (b0c2104dd8f87bb1caf0c83586c22a51).
> 2020-11-20 17:05:22,128 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - 
> Un-registering task and sending final execution state FINISHED to JobManager 
> for task Source: Custom Source -> select: (f0) -> select: (add_one(f0) AS a) 
> -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> b0c2104dd8f87bb1caf0c83586c22a51.
> 2020-11-20 17:05:22,156 INFO  
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot 
> TaskSlot(index:0, state:ACTIVE, resource profile: 
> ResourceProfile{cpuCores=1., taskHeapMemory=384.000mb 
> (402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb 
> (536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId: 
> b67c3307dcf93757adfb4f0f9f7b8c7b, jobId: d05f32162f38ec3ec813c4621bc106d9).
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job 
> d05f32162f38ec3ec813c4621bc106d9 from job leader monitoring.
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Close 
> JobManager connection for job d05f32162f38ec3ec813c4621bc106d9.
> 2020-11-20 17:05:23,064 ERROR 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.rejectedExecution
>  [] - Failed to submit a listener notification task. Event loop shut down?
> java.lang.NoClassDefFoundError: 
> org/apache/beam/vendor/grpc/v1p26p0/io/netty/util/concurrent/GlobalEventExecutor$2
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.startThread(GlobalEventExecutor.java:227)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.execute(GlobalEventExecutor.java:215)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
>  
> 

[jira] [Commented] (FLINK-20284) Error happens in TaskExecutor when closing JobMaster connection if there was a python UDF

2020-11-22 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237105#comment-17237105
 ] 

Dian Fu commented on FLINK-20284:
-

[~hxbks2ks] Could you take a look at this issue?

> Error happens in TaskExecutor when closing JobMaster connection if there was 
> a python UDF
> -
>
> Key: FLINK-20284
> URL: https://issues.apache.org/jira/browse/FLINK-20284
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.12.0
>Reporter: Zhu Zhu
>Priority: Major
> Fix For: 1.12.0
>
>
> When a TaskExecutor has successfully finished running a Python UDF task and 
> is disconnecting from the JobMaster, the errors below occur. This error, 
> however, does not seem to affect job execution at the moment.
> {code:java}
> 2020-11-20 17:05:21,932 INFO  
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService [] - 1 Beam Fn 
> Logging clients still connected during shutdown.
> 2020-11-20 17:05:21,938 WARN  
> org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer[] - Hanged up 
> for unknown endpoint.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task   
>  [] - Source: Custom Source -> select: (f0) -> select: 
> (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> (b0c2104dd8f87bb1caf0c83586c22a51) switched from RUNNING to FINISHED.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task   
>  [] - Freeing task resources for Source: Custom Source -> select: 
> (f0) -> select: (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select 
> table sink (1/1)#0 (b0c2104dd8f87bb1caf0c83586c22a51).
> 2020-11-20 17:05:22,128 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - 
> Un-registering task and sending final execution state FINISHED to JobManager 
> for task Source: Custom Source -> select: (f0) -> select: (add_one(f0) AS a) 
> -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> b0c2104dd8f87bb1caf0c83586c22a51.
> 2020-11-20 17:05:22,156 INFO  
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot 
> TaskSlot(index:0, state:ACTIVE, resource profile: 
> ResourceProfile{cpuCores=1., taskHeapMemory=384.000mb 
> (402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb 
> (536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId: 
> b67c3307dcf93757adfb4f0f9f7b8c7b, jobId: d05f32162f38ec3ec813c4621bc106d9).
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job 
> d05f32162f38ec3ec813c4621bc106d9 from job leader monitoring.
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Close 
> JobManager connection for job d05f32162f38ec3ec813c4621bc106d9.
> 2020-11-20 17:05:23,064 ERROR 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.rejectedExecution
>  [] - Failed to submit a listener notification task. Event loop shut down?
> java.lang.NoClassDefFoundError: 
> org/apache/beam/vendor/grpc/v1p26p0/io/netty/util/concurrent/GlobalEventExecutor$2
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.startThread(GlobalEventExecutor.java:227)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.execute(GlobalEventExecutor.java:215)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
>  
> 

[jira] [Commented] (FLINK-20059) Outdated SQL docs on aggregate functions' merge

2020-11-22 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237101#comment-17237101
 ] 

Jark Wu commented on FLINK-20059:
-

Fixed in master (1.12.0): 
 - 3a066d6ec3fe46716b12e8d2756ba5f0e463fe43
 - d6a1fd049058cb38008289f8563a828930bd55a2



> Outdated SQL docs on aggregate functions' merge
> ---
>
> Key: FLINK-20059
> URL: https://issues.apache.org/jira/browse/FLINK-20059
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table SQL / API
>Affects Versions: 1.12.0, 1.11.2
>Reporter: Nico Kruber
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> In the Java docs as well as the user docs, the {{merge}} method of an 
> aggregation UDF is described as optional, e.g.
> {quote}Merges a group of accumulator instances into one accumulator instance. 
> This function must be implemented for data stream session window grouping 
> aggregates and data set grouping aggregates.{quote}
> However, it seems that nowadays this method is required in more cases (I 
> stumbled on this for a HOP window in streaming):
> {code}
> StreamExecGlobalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap, mergedAccExternalTypes)
> StreamExecGroupWindowAggregateBase.scala
>   generator.needMerge(mergedAccOffset = 0, mergedAccOnHeap = false)
> StreamExecIncrementalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap = true, 
> mergedAccExternalTypes)
> StreamExecLocalGroupAggregate.scala
>   .needMerge(mergedAccOffset = 0, mergedAccOnHeap = true)
> {code}
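
As a concrete illustration of the contract being documented, a minimal aggregate UDF with a {{merge}} implementation (a sketch against the standard Table API signature; CountAgg and CountAcc are illustrative names, not code from the docs):

{code:java}
import org.apache.flink.table.functions.AggregateFunction;

public class CountAgg extends AggregateFunction<Long, CountAgg.CountAcc> {

    public static class CountAcc {
        public long count = 0L;
    }

    @Override
    public CountAcc createAccumulator() {
        return new CountAcc();
    }

    @Override
    public Long getValue(CountAcc acc) {
        return acc.count;
    }

    // Called by generated code for each input row.
    public void accumulate(CountAcc acc, String value) {
        acc.count += 1;
    }

    // Required whenever the planner generates merging code (e.g. HOP or
    // session windows), not only for session windows as the docs suggest.
    public void merge(CountAcc acc, Iterable<CountAcc> others) {
        for (CountAcc other : others) {
            acc.count += other.count;
        }
    }
}
{code}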



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20199) PyFlink e2e DataStream test failed during stopping kafka services

2020-11-22 Thread Shuiqiang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237087#comment-17237087
 ] 

Shuiqiang Chen edited comment on FLINK-20199 at 11/23/20, 3:38 AM:
---

[~hxbks2ks] Thank you for reporting this issue. It seems that it failed to stop 
the Kafka process after the DataStream job finished. I'll keep an eye on this 
problem.


was (Author: csq):
[~hxbks2ks] Thank you for reporting this issue. It is because of the testing 
environment's instability that it failed to stop the Kafka process after the 
DataStream job finished.

> PyFlink e2e DataStream test failed during stopping kafka services
> -
>
> Key: FLINK-20199
> URL: https://issues.apache.org/jira/browse/FLINK-20199
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Huang Xingbo
>Priority: Major
> Fix For: 1.12.0
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9667=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529]
>  
> {code}
> Nov 17 06:04:35 Reading kafka messages... 
> Nov 17 06:05:17 Cancelling job 55acfb14436b24b9c6fb0f47d7297f83. 
> Nov 17 06:05:17 Cancelled job 55acfb14436b24b9c6fb0f47d7297f83. 
> Nov 17 06:05:18 Stopping taskexecutor daemon (pid: 110438) on host 
> fv-az598-520. 
> Nov 17 06:05:24 Stopping standalonesession daemon (pid: 110130) on host 
> fv-az598-520. 
> Nov 17 06:05:25 Waiting till process is stopped: pid = 123127 pattern = 
> 'kafka' 
> Nov 17 06:05:26 Waiting till process is stopped: pid = 123717 pattern = 
> 'kafka'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20284) Error happens in TaskExecutor when closing JobMaster connection if there was a python UDF

2020-11-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237102#comment-17237102
 ] 

Zhu Zhu commented on FLINK-20284:
-

cc [~dian.fu]

> Error happens in TaskExecutor when closing JobMaster connection if there was 
> a python UDF
> -
>
> Key: FLINK-20284
> URL: https://issues.apache.org/jira/browse/FLINK-20284
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.12.0
>Reporter: Zhu Zhu
>Priority: Major
> Fix For: 1.12.0
>
>
> When a TaskExecutor has successfully finished running a Python UDF task and 
> is disconnecting from the JobMaster, the errors below occur. This error, 
> however, does not seem to affect job execution at the moment.
> {code:java}
> 2020-11-20 17:05:21,932 INFO  
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService [] - 1 Beam Fn 
> Logging clients still connected during shutdown.
> 2020-11-20 17:05:21,938 WARN  
> org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer[] - Hanged up 
> for unknown endpoint.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task   
>  [] - Source: Custom Source -> select: (f0) -> select: 
> (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> (b0c2104dd8f87bb1caf0c83586c22a51) switched from RUNNING to FINISHED.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task   
>  [] - Freeing task resources for Source: Custom Source -> select: 
> (f0) -> select: (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select 
> table sink (1/1)#0 (b0c2104dd8f87bb1caf0c83586c22a51).
> 2020-11-20 17:05:22,128 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - 
> Un-registering task and sending final execution state FINISHED to JobManager 
> for task Source: Custom Source -> select: (f0) -> select: (add_one(f0) AS a) 
> -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> b0c2104dd8f87bb1caf0c83586c22a51.
> 2020-11-20 17:05:22,156 INFO  
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot 
> TaskSlot(index:0, state:ACTIVE, resource profile: 
> ResourceProfile{cpuCores=1., taskHeapMemory=384.000mb 
> (402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb 
> (536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId: 
> b67c3307dcf93757adfb4f0f9f7b8c7b, jobId: d05f32162f38ec3ec813c4621bc106d9).
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job 
> d05f32162f38ec3ec813c4621bc106d9 from job leader monitoring.
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Close 
> JobManager connection for job d05f32162f38ec3ec813c4621bc106d9.
> 2020-11-20 17:05:23,064 ERROR 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.rejectedExecution
>  [] - Failed to submit a listener notification task. Event loop shut down?
> java.lang.NoClassDefFoundError: 
> org/apache/beam/vendor/grpc/v1p26p0/io/netty/util/concurrent/GlobalEventExecutor$2
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.startThread(GlobalEventExecutor.java:227)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.execute(GlobalEventExecutor.java:215)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
> at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
>  
> 

[jira] [Closed] (FLINK-20059) Outdated SQL docs on aggregate functions' merge

2020-11-22 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-20059.
---
Resolution: Fixed

> Outdated SQL docs on aggregate functions' merge
> ---
>
> Key: FLINK-20059
> URL: https://issues.apache.org/jira/browse/FLINK-20059
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table SQL / API
>Affects Versions: 1.12.0, 1.11.2
>Reporter: Nico Kruber
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> In the Java docs as well as the user docs, the {{merge}} method of an 
> aggregation UDF is described as optional, e.g.
> {quote}Merges a group of accumulator instances into one accumulator instance. 
> This function must be implemented for data stream session window grouping 
> aggregates and data set grouping aggregates.{quote}
> However, it seems that nowadays this method is required in more cases (I 
> stumbled on this for a HOP window in streaming):
> {code}
> StreamExecGlobalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap, mergedAccExternalTypes)
> StreamExecGroupWindowAggregateBase.scala
>   generator.needMerge(mergedAccOffset = 0, mergedAccOnHeap = false)
> StreamExecIncrementalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap = true, 
> mergedAccExternalTypes)
> StreamExecLocalGroupAggregate.scala
>   .needMerge(mergedAccOffset = 0, mergedAccOnHeap = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #14160: [FLINK-19795][table-blink] Fix Flink SQL throws exception when changelog source contains duplicate change events

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14160:
URL: https://github.com/apache/flink/pull/14160#issuecomment-731775317


   
   ## CI report:
   
   * 44ff14a0d30e454dc2502d1f644fdb94bf6e64bd Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9904)
 
   * 040aeb255189647566ce8417e135c8a9d6fc13aa Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9916)
 
   
   
   Bot commands: the @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14132: [FLINK-20214][k8s] Fix the unnecessary warning logs when Hadoop environment is not set

2020-11-22 Thread GitBox


flinkbot edited a comment on pull request #14132:
URL: https://github.com/apache/flink/pull/14132#issuecomment-730290775


   
   ## CI report:
   
   * 6c498db9839dea282cf02b7629039d829f14d08c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9870)
 
   * 71f8e69ff40b4e2bc382ddeacf24b0e975c35cbe Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9915)
 
   
   
   Bot commands: the @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Issue Comment Deleted] (FLINK-20059) Outdated SQL docs on aggregate functions' merge

2020-11-22 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-20059:

Comment: was deleted

(was: Fixed in master (1.12.0): 3a066d6ec3fe46716b12e8d2756ba5f0e463fe43)

> Outdated SQL docs on aggregate functions' merge
> ---
>
> Key: FLINK-20059
> URL: https://issues.apache.org/jira/browse/FLINK-20059
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table SQL / API
>Affects Versions: 1.12.0, 1.11.2
>Reporter: Nico Kruber
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> In the Java docs as well as the user docs, the {{merge}} method of an 
> aggregation UDF is described as optional, e.g.
> {quote}Merges a group of accumulator instances into one accumulator instance. 
> This function must be implemented for data stream session window grouping 
> aggregates and data set grouping aggregates.{quote}
> However, it seems that nowadays this method is required in more cases (I 
> stumbled on this for a HOP window in streaming):
> {code}
> StreamExecGlobalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap, mergedAccExternalTypes)
> StreamExecGroupWindowAggregateBase.scala
>   generator.needMerge(mergedAccOffset = 0, mergedAccOnHeap = false)
> StreamExecIncrementalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap = true, 
> mergedAccExternalTypes)
> StreamExecLocalGroupAggregate.scala
>   .needMerge(mergedAccOffset = 0, mergedAccOnHeap = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20284) Error happens in TaskExecutor when closing JobMaster connection if there was a python UDF

2020-11-22 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-20284:
---

 Summary: Error happens in TaskExecutor when closing JobMaster 
connection if there was a python UDF
 Key: FLINK-20284
 URL: https://issues.apache.org/jira/browse/FLINK-20284
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.12.0
Reporter: Zhu Zhu
 Fix For: 1.12.0


When a TaskExecutor has successfully finished running a Python UDF task and is 
disconnecting from the JobMaster, the errors below occur. This error, however, 
does not seem to affect job execution at the moment.

{code:java}
2020-11-20 17:05:21,932 INFO  
org.apache.beam.runners.fnexecution.logging.GrpcLoggingService [] - 1 Beam Fn 
Logging clients still connected during shutdown.
2020-11-20 17:05:21,938 WARN  
org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer[] - Hanged up for 
unknown endpoint.
2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task 
   [] - Source: Custom Source -> select: (f0) -> select: (add_one(f0) 
AS a) -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
(b0c2104dd8f87bb1caf0c83586c22a51) switched from RUNNING to FINISHED.
2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task 
   [] - Freeing task resources for Source: Custom Source -> select: 
(f0) -> select: (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select 
table sink (1/1)#0 (b0c2104dd8f87bb1caf0c83586c22a51).
2020-11-20 17:05:22,128 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - 
Un-registering task and sending final execution state FINISHED to JobManager 
for task Source: Custom Source -> select: (f0) -> select: (add_one(f0) AS a) -> 
to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
b0c2104dd8f87bb1caf0c83586c22a51.
2020-11-20 17:05:22,156 INFO  
org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot 
TaskSlot(index:0, state:ACTIVE, resource profile: 
ResourceProfile{cpuCores=1., taskHeapMemory=384.000mb 
(402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb 
(536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId: 
b67c3307dcf93757adfb4f0f9f7b8c7b, jobId: d05f32162f38ec3ec813c4621bc106d9).
2020-11-20 17:05:22,157 INFO  
org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job 
d05f32162f38ec3ec813c4621bc106d9 from job leader monitoring.
2020-11-20 17:05:22,157 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Close 
JobManager connection for job d05f32162f38ec3ec813c4621bc106d9.
2020-11-20 17:05:23,064 ERROR 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.rejectedExecution
 [] - Failed to submit a listener notification task. Event loop shut down?
java.lang.NoClassDefFoundError: 
org/apache/beam/vendor/grpc/v1p26p0/io/netty/util/concurrent/GlobalEventExecutor$2
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.startThread(GlobalEventExecutor.java:227)
 
~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.execute(GlobalEventExecutor.java:215)
 
~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
 
[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
 
[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
 
[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
 
[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
 
[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1089)
 
[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
at 
org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 

[jira] [Commented] (FLINK-20236) flink-dist.jar and hive-exec.jar have conflict with protobuf-java.jar

2020-11-22 Thread CloseRiver (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237099#comment-17237099
 ] 

CloseRiver commented on FLINK-20236:


Hi [~trohrmann], I didn't verify 1.12-SNAPSHOT, so I'm not sure about that.

> flink-dist.jar and hive-exec.jar have conflict with protobuf-java.jar
> -
>
> Key: FLINK-20236
> URL: https://issues.apache.org/jira/browse/FLINK-20236
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Connectors / Hive
>Affects Versions: 1.12.0, 1.11.1
> Environment: flink 1.11.1
> hive 2.3.4
>Reporter: CloseRiver
>Priority: Critical
> Fix For: 1.12.0, 1.11.3
>
>
> We implemented a custom protobuf format. When using the Hive catalog and 
> consuming protobuf data from Kafka, the following exception appears:
> {code:java}
> java.lang.NoSuchMethodError: 
> com.google.protobuf.Descriptors$Descriptor.getOneofs()Ljava/util/List;java.lang.NoSuchMethodError:
>  com.google.protobuf.Descriptors$Descriptor.getOneofs()Ljava/util/List; at 
> com.google.protobuf.GeneratedMessageV3$FieldAccessorTable.(GeneratedMessageV3.java:1813)
> {code}
> What should I do: exclude protobuf-java from flink-dist or from hive-exec?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

