Re: Help. Who can add permission in FLIP.

2021-05-16 Thread Robert Metzger
Hey, I gave you edit permissions in the Flink wiki!

On Mon, May 17, 2021 at 3:30 AM  wrote:

> Hi, I want to write a FLIP in [confluence](
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals). Who
> can help? Thx.
> My username is wangwj. My email is wangw...@sina.cn.


[jira] [Created] (FLINK-22678) Fix Loading Changelog Statebackend with configs set in job-level and cluster-level separately

2021-05-16 Thread Yuan Mei (Jira)
Yuan Mei created FLINK-22678:


   Summary: Fix Loading Changelog Statebackend with configs set in job-level and cluster-level separately
       Key: FLINK-22678
       URL: https://issues.apache.org/jira/browse/FLINK-22678
   Project: Flink
Issue Type: Bug
  Reporter: Yuan Mei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Status of a savepoint operation returns Completed but an error was thrown

2021-05-16 Thread Diogo Santos
Hi guys,

We developed some scripts to improve the rolling updates in our pipelines.
One of their tasks is to trigger a savepoint and wait for the response until
the status is Completed or the retry limit is reached.

We noticed that sometimes the response has the status Completed even though
the request failed:

{
  "status": {
    "id": "COMPLETED"
  },
  "operation": {
    "failure-cause": {
      "class": "java.util.concurrent.CompletionException",
      "stack-trace": "java.util.concurrent.CompletionException: )\n\t... 47 more\n",
      "serialized-throwable": "..."
    }
  }
}

An easy way to reproduce the issue is to put the job in a restart loop and
trigger a savepoint.

The status should be in-progress in that case, right?
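
For scripts like the one described above, one possible workaround is to treat
the status id and the operation outcome as two separate things. The sketch
below is a minimal illustration in Python. It assumes that COMPLETED only means
the asynchronous operation has finished, and that a "failure-cause" entry under
"operation" marks a failed savepoint, as in the response shown in this message.
The function names and the fetch_status callable are hypothetical and not part
of any Flink client.

```python
import time


def classify_savepoint_response(body):
    """Return 'in_progress', 'failed' or 'succeeded' for one status response."""
    status_id = body.get("status", {}).get("id")
    if status_id != "COMPLETED":
        return "in_progress"
    # Assumption: a finished operation that carries "failure-cause" has failed.
    operation = body.get("operation") or {}
    if "failure-cause" in operation:
        return "failed"
    return "succeeded"


def wait_for_savepoint(fetch_status, max_retries=10, delay_s=1.0, sleep=time.sleep):
    """Poll fetch_status() (a hypothetical callable returning the parsed JSON)."""
    for _ in range(max_retries):
        outcome = classify_savepoint_response(fetch_status())
        if outcome != "in_progress":
            return outcome
        sleep(delay_s)
    return "timed_out"
```

With this split, a rolling-update script would only continue when the outcome
is 'succeeded', instead of stopping as soon as it sees COMPLETED.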


[jira] [Created] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-05-16 Thread Jin Xing (Jira)
Jin Xing created FLINK-22677:


   Summary: Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
       Key: FLINK-22677
       URL: https://issues.apache.org/jira/browse/FLINK-22677
   Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
  Reporter: Jin Xing


The current scheduler enforces synchronous registration even though 
ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With a 
remote shuffle service, communication between the ShuffleMaster and the remote 
cluster tends to be expensive. Synchronous registration risks blocking the main 
thread and might cause negative side effects such as heartbeat timeouts.

Additionally, expensive synchronous calls to the remote cluster could bottleneck 
the throughput of shuffle resource allocation, especially for batch jobs with 
complicated DAGs.
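
The difference between the two styles can be illustrated outside Flink with a
small Python analogy, using concurrent.futures in place of Java's
CompletableFuture. Everything here is hypothetical: register_partition only
simulates a slow remote call, and none of the names correspond to Flink's
scheduler or ShuffleMaster code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def register_partition(pool, partition_id, delay_s=0.05):
    """Hypothetical stand-in for a remote registration; returns a Future."""
    def remote_call():
        time.sleep(delay_s)  # simulated network round trip
        return f"descriptor-{partition_id}"
    return pool.submit(remote_call)


def register_blocking(pool, partition_ids):
    # The caller waits for each round trip in turn, so it is stalled for
    # roughly len(partition_ids) * delay_s.
    return [register_partition(pool, p).result() for p in partition_ids]


def register_async(pool, partition_ids, on_registered):
    # All requests are issued up front and the caller returns immediately;
    # the callback runs as each registration completes.
    futures = [register_partition(pool, p) for p in partition_ids]
    for future in futures:
        future.add_done_callback(lambda f: on_registered(f.result()))
    return futures
```

In the blocking variant the calling thread cannot do other work, such as
answering heartbeats, while it waits. The sketch does not model how a
continuation would be handed back to a specific thread, which a real scheduler
would have to handle.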





[jira] [Created] (FLINK-22676) The partition tracker should support remote shuffle properly

2021-05-16 Thread Jin Xing (Jira)
Jin Xing created FLINK-22676:


   Summary: The partition tracker should support remote shuffle properly
       Key: FLINK-22676
       URL: https://issues.apache.org/jira/browse/FLINK-22676
   Project: Flink
Issue Type: Sub-task
Components: Runtime / Network
  Reporter: Jin Xing


In current Flink, a data partition is bound to the ResourceID of the TM in 
Execution#startTrackingPartitions, and the partition tracker stops tracking the 
corresponding partitions when a TM disconnects 
(JobMaster#disconnectTaskManager), i.e. the lifecycle of shuffle data is bound 
to the computing resource (TM). This works fine for the internal shuffle 
service, but not for a remote shuffle service. Because shuffle data is stored 
on the remote cluster, the lifecycle of a completed partition can be decoupled 
from the TM, i.e. a TM can safely be released when no computing task is running 
on it, and further shuffle read requests can be directed to the remote shuffle 
cluster. In addition, when a TM is lost, its completed data partitions on the 
remote shuffle cluster do not need to be reproduced.





[jira] [Created] (FLINK-22675) Add an interface method ShuffleMaster#close

2021-05-16 Thread Jin Xing (Jira)
Jin Xing created FLINK-22675:


   Summary: Add an interface method ShuffleMaster#close
       Key: FLINK-22675
       URL: https://issues.apache.org/jira/browse/FLINK-22675
   Project: Flink
Issue Type: Sub-task
Components: Runtime / Network
  Reporter: Jin Xing


When building a remote shuffle service on top of the 'pluggable shuffle 
service', the ShuffleMaster talks to the remote cluster over a network 
connection. This Jira proposes adding an interface method, ShuffleMaster#close, 
which implementations can override to do cleanup work and which is called when 
the Flink application is closed.





[jira] [Created] (FLINK-22673) Add document about add jar related commands

2021-05-16 Thread Shengkai Fang (Jira)
Shengkai Fang created FLINK-22673:

   Summary: Add document about add jar related commands
       Key: FLINK-22673
       URL: https://issues.apache.org/jira/browse/FLINK-22673
   Project: Flink
Issue Type: Sub-task
Components: Documentation
  Reporter: Shengkai Fang
   Fix For: 1.14.0


Including {{ADD JAR}}, {{SHOW JAR}}, {{REMOVE JAR}}. 





[jira] [Created] (FLINK-22674) Provide JobID when apply shuffle resource by ShuffleMaster

2021-05-16 Thread Jin Xing (Jira)
Jin Xing created FLINK-22674:


   Summary: Provide JobID when apply shuffle resource by ShuffleMaster
       Key: FLINK-22674
       URL: https://issues.apache.org/jira/browse/FLINK-22674
   Project: Flink
Issue Type: Sub-task
Components: Runtime / Network
  Reporter: Jin Xing


In the current Flink 'pluggable shuffle service' framework, only 
PartitionDescriptor and ProducerDescriptor are passed as parameters to 
ShuffleMaster#registerPartitionWithProducer.

But when building a remote shuffle service on top of the 'pluggable shuffle 
service', the JobID is also needed when requesting shuffle resources from the 
remote cluster. It can serve as an identifier that links shuffle resources to 
the corresponding job:
 # The remote shuffle cluster can isolate shuffle resources between jobs or 
apply capacity control to them;
 # The remote shuffle cluster can use the JobID to clean up shuffle data when a 
job is lost, avoiding file leaks.





[jira] [Created] (FLINK-22672) Some enhancements for pluggable shuffle service framework

2021-05-16 Thread Jin Xing (Jira)
Jin Xing created FLINK-22672:


   Summary: Some enhancements for pluggable shuffle service framework
       Key: FLINK-22672
       URL: https://issues.apache.org/jira/browse/FLINK-22672
   Project: Flink
Issue Type: Improvement
Components: Runtime / Network
  Reporter: Jin Xing


"Pluggable shuffle service" in Flink provides an architecture that is unified 
for both streaming and batch jobs, allowing users to customize how data is 
transferred between shuffle stages according to their scenarios.

There are already a number of "remote shuffle service" implementations on 
Spark, such as [1][2][3]. Remote shuffle makes it possible to shuffle data 
from/to a remote cluster and brings benefits such as:
 # The lifecycle of computing resources can be decoupled from shuffle data: 
once a computing task is finished, idle computing nodes can be released while 
their completed shuffle data stays on the remote shuffle cluster.
 # There is no need to reserve disk capacity for shuffle on computing nodes. 
The remote shuffle cluster serves shuffle requests with better scalability and 
relieves local disk pressure on computing nodes when data is skewed.

Based on "pluggable shuffle service", we built our own "remote shuffle service" 
on Flink, called Lattice, which aims to provide functionality and improve 
performance for batch processing jobs. Basically it works as follows:
 # The Lattice cluster works as an independent service for shuffle requests;
 # LatticeShuffleMaster extends ShuffleMaster, runs inside the JM and talks to 
the remote Lattice cluster to request shuffle resources and manage the shuffle 
data lifecycle;
 # LatticeShuffleEnvironment extends ShuffleEnvironment, runs inside the TM and 
provides an environment for shuffling data from/to the remote Lattice cluster.

While building Lattice we found some potential enhancements to "pluggable 
shuffle service". I will enumerate them and create sub JIRAs under this 
umbrella.

 

[1] [https://www.alibabacloud.com/blog/emr-remote-shuffle-service-a-powerful-elastic-tool-of-serverless-spark_597728]

[2] [https://bestoreo.github.io/post/cosco/cosco/]

[3] [https://github.com/uber/RemoteShuffleService]





Help. Who can add permission in FLIP.

2021-05-16 Thread wangwj03
Hi, I want to write a FLIP in 
[confluence](https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals). 
Who can help? Thx.
My username is wangwj. My email is wangw...@sina.cn.




Re: [VOTE] Release 1.12.4, release candidate #1

2021-05-16 Thread Dawid Wysakowicz
+1 (binding)

  * Verified checksums and signatures
  * Checked no significant version changes compared to 1.12.3 (one new
test scope dependency)
  * Checked no changes to the NOTICE files
  * Built from sources
  * Ran an example using the binary 2.12 distribution
  * Verified that a random class in flink-scala_2.11 and _2.12 was
    compiled with the correct Scala version

Best,

Dawid

On 10/05/2021 23:34, Arvid Heise wrote:
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 1.12.4,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 476DAA5D1FF08189 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.12.4-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Your friendly release manager Arvid
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350110
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.4-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1421
> [5] https://github.com/apache/flink/releases/tag/release-1.12.4-rc1
> [6] https://github.com/apache/flink-web/pull/446
>




[jira] [Created] (FLINK-22671) xxx

2021-05-16 Thread Jira
王彬 created FLINK-22671:

   Summary: xxx
       Key: FLINK-22671
       URL: https://issues.apache.org/jira/browse/FLINK-22671
   Project: Flink
Issue Type: Bug
  Reporter: 王彬





