[RESULT] [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-05 Thread Dian Fu
Hi everyone,

Thanks for the discussion and votes.

So far we have received 4 approving votes, 3 of which are binding, and there 
are no -1 votes:
* Jincheng (binding)
* Hequn (binding)
* Jark (binding)
* Jingsong (non-binding)

Therefore, I'm happy to announce that FLIP-88 has been accepted.

Thanks everyone!

Regards,
Dian

[jira] [Created] (FLINK-15091) JoinITCase.testFullJoinWithNonEquiJoinPred failed in travis

2019-12-05 Thread Kurt Young (Jira)
Kurt Young created FLINK-15091:
--

 Summary: JoinITCase.testFullJoinWithNonEquiJoinPred failed in 
travis
 Key: FLINK-15091
 URL: https://issues.apache.org/jira/browse/FLINK-15091
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.10.0
Reporter: Kurt Young
Assignee: Jingsong Lee
 Fix For: 1.10.0


04:45:22.404 [ERROR] Tests run: 21, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 4.909 s <<< FAILURE! - in 
org.apache.flink.table.planner.runtime.batch.table.JoinITCase 04:45:22.406 
[ERROR] 
testFullJoinWithNonEquiJoinPred(org.apache.flink.table.planner.runtime.batch.table.JoinITCase)
 Time elapsed: 0.168 s <<< ERROR! 
org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at 
org.apache.flink.table.planner.runtime.batch.table.JoinITCase.testFullJoinWithNonEquiJoinPred(JoinITCase.scala:344)
 Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy Caused by: 
org.apache.flink.runtime.memory.MemoryAllocationException: Could not allocate 
32 pages. Only 0 pages are remaining.

 

details: [https://api.travis-ci.org/v3/job/621407747/log.txt]





[jira] [Created] (FLINK-15090) Reverse the dependency from flink-streaming-java to flink-client

2019-12-05 Thread Zili Chen (Jira)
Zili Chen created FLINK-15090:
-

 Summary: Reverse the dependency from flink-streaming-java to 
flink-client
 Key: FLINK-15090
 URL: https://issues.apache.org/jira/browse/FLINK-15090
 Project: Flink
  Issue Type: Improvement
Reporter: Zili Chen
 Fix For: 1.11.0


After FLIP-73 the dependencies are minor. The tasks I can find are:

1. Move {{StreamGraphTranslator}} to {{flink-client}}.
2. Implement a context environment for streaming similar to the one in 
{{flink-java}}, and set/unset it as the context along with 
{{ExecutionEnvironment}} (a sketch follows below).

After this task we will still have a dependency from {{flink-streaming-java}} to 
{{flink-java}} because of some input format dependencies. We can break those 
dependencies as follow-ups.
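
For illustration only, a minimal sketch of the set/unset context pattern from 
task 2 (the class and method names are assumptions, not the actual Flink API):

{code:java}
// Minimal sketch (illustrative names): a holder that installs an environment
// as the current context and can restore the previous one on unset.
public final class ContextHolder<E> {
    private E current;

    /** Installs an environment as the context and returns the one it replaced. */
    public synchronized E setAsContext(E environment) {
        E previous = current;
        current = environment;
        return previous;
    }

    /** Unsets the context by restoring the previously installed environment. */
    public synchronized void unsetAsContext(E previous) {
        current = previous;
    }

    public synchronized E getCurrent() {
        return current;
    }
}
{code}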

cc [~aljoscha]





Re: [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-05 Thread Jark Wu
Thanks for driving this Dian! The FLIP looks good to me.

+1 (binding)

Best,
Jark

On Thu, 5 Dec 2019 at 16:44, Hequn Cheng  wrote:

> +1 (binding)
>
> Best,
> Hequn
>
> On Thu, Dec 5, 2019 at 4:41 PM jincheng sun 
> wrote:
>
> > +1(binding)
> >
> > Best,
> > Jincheng
> >
> > > Jingsong Li  wrote on Tue, Dec 3, 2019 at 7:30 PM:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Mon, Dec 2, 2019 at 5:30 PM Dian Fu  wrote:
> > >
> > > > Hi Jingsong,
> > > >
> > > > It's fine. :)  Appreciated the comments!
> > > >
> > > > I have replied you in the discussion thread as I also think it's
> better
> > > to
> > > > discuss these in the discussion thread.
> > > >
> > > > Thanks,
> > > > Dian
> > > >
> > > > > On Dec 2, 2019, at 3:47 PM, Jingsong Li wrote:
> > > > >
> > > > > Sorry for bothering your voting.
> > > > > Let's discuss in discussion thread.
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > > On Mon, Dec 2, 2019 at 3:32 PM Jingsong Lee <
> lzljs3620...@apache.org
> > >
> > > > wrote:
> > > > >
> > > > >> Hi Dian:
> > > > >>
> > > > >> Thanks for your driving. I have some questions:
> > > > >>
> > > > >> - Where should these configurations belong? You have mentioned
> > > > >> tableApi/SQL, so should in TableConfig?
> > > > >> - If just in table/sql, whether it should be called:
> > > table.python.,
> > > > >> because in table, all config options are called table.***.
> > > > >> - What should table module do? So in CommonPythonCalc, we should
> > read
> > > > >> options from table config, and set resources to
> > > OneInputTransformation?
> > > > >> - Are all buffer.memory off-heap memory? I took a look
> > > > >> to AbstractPythonScalarFunctionOperator, there is a
> > > > forwardedInputQueue, is
> > > > >> this one a heap queue? So we need heap memory too?
> > > > >>
> > > > >> Hope to get your reply.
> > > > >>
> > > > >> Best,
> > > > >> Jingsong Lee
> > > > >>
> > > > >> On Mon, Dec 2, 2019 at 2:34 PM Dian Fu 
> > wrote:
> > > > >>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> I'd like to start the vote of FLIP-88 [1] since that we have
> > reached
> > > an
> > > > >>> agreement on the design in the discussion thread [2].
> > > > >>>
> > > > >>> This vote will be open for at least 72 hours. Unless there is an
> > > > >>> objection, I will try to close it by Dec 5, 2019 08:00 UTC if we
> > have
> > > > >>> received sufficient votes.
> > > > >>>
> > > > >>> Regards,
> > > > >>> Dian
> > > > >>>
> > > > >>> [1]
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
> > > > >>> [2]
> > > > >>>
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best, Jingsong Lee
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Best, Jingsong Lee
> > > >
> > > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> >
>


[jira] [Created] (FLINK-15089) Pulsar Catalog

2019-12-05 Thread Yijie Shen (Jira)
Yijie Shen created FLINK-15089:
--

 Summary: Pulsar Catalog
 Key: FLINK-15089
 URL: https://issues.apache.org/jira/browse/FLINK-15089
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Common
Reporter: Yijie Shen


Per the discussion in the mailing list, the Pulsar Catalog implementation is 
split out into a single task. The design doc is: 
[https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit?usp=sharing]





[jira] [Created] (FLINK-15088) "misc" build fail on travis-ci

2019-12-05 Thread Dezhi Cai (Jira)
Dezhi Cai created FLINK-15088:
-

 Summary: "misc" build fail on travis-ci
 Key: FLINK-15088
 URL: https://issues.apache.org/jira/browse/FLINK-15088
 Project: Flink
  Issue Type: Bug
  Components: Travis
Affects Versions: 1.10.0
Reporter: Dezhi Cai


log: [https://api.travis-ci.com/v3/job/263294674/log.txt]





Re: [DISCUSS] Drop Heap Backend Synchronous snapshots

2019-12-05 Thread Yun Tang
+1 from my side, as I do not see any real benefit in using synchronous 
snapshots.

Moreover, I think we should also remove the support for synchronous snapshots in 
DefaultOperatorStateBackend and deprecate the config state.backend.async.

Best
Yun Tang

On 12/5/19, 8:06 PM, "Stephan Ewen"  wrote:

Hi all!

I am wondering if there is any case for retaining the option to make
synchronous snapshots on the heap statebackend. Is anyone using that? Or
could we clean that code up and remove it?

Best,
Stephan




Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-05 Thread Dian Fu
Hi Jingsong,

Thanks for sharing. It's very helpful, as the Python operator will take a 
similar approach.

Thanks,
Dian

> On Dec 6, 2019, at 11:12 AM, Jingsong Li wrote:
> 
> Hi Dian,
> 
> After [1] and [2], in the batch SQL world, we will:
> - [2] On the client/compile side: use a memory weight to request memory for
> the Transformation.
> - [1] On the runtime side: use a memory fraction to compute the memory size
> and allocate it in the StreamOperator.
> For your information.
> 
> [1] https://jira.apache.org/jira/browse/FLINK-14063
> [2] https://jira.apache.org/jira/browse/FLINK-15035
> 
> Best,
> Jingsong Lee
> 
> On Tue, Dec 3, 2019 at 6:07 PM Dian Fu  wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks for your valuable feedback. I have updated the "Example" section
>> describing how to use these options in a Python Table API program.
>> 
>> Thanks,
>> Dian
>> 
>>> On Dec 2, 2019, at 6:12 PM, Jingsong Lee wrote:
>>> 
>>> Hi Dian:
>>> 
>>> Thanks for your explanation.
>>> If you can update the document to add explanation for the changes to the
>>> table layer,
>>> it might be better. (it's just a suggestion, it depends on you)
>>> About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
>>> Will this queue take up a lot of memory?
>>> Can it also occupy memory as large as buffer.memory?
>>> If so, what we're dealing with now is the silent use of heap memory?
>>> I feel a little strange, because the memory on the Python side will be
>>> reserved, but the memory on the JVM side is used silently.
>>> 
>>> After carefully seeing your comments on Google doc:
 The memory used by the Java operator is currently accounted as the task
>>> on-heap memory. We can revisit this if we find it's a problem in the
>> future.
>>> I agree that we can ignore it now, But we can add some content to the
>>> document to remind the user, What do you think?
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu  wrote:
>>> 
 Hi Jingsong,
 
 Thanks a lot for your comments. Please see my reply inlined below.
 
> On Dec 2, 2019, at 3:47 PM, Jingsong Lee wrote:
> 
> Hi Dian:
> 
> 
> Thanks for your driving. I have some questions:
> 
> 
> - Where should these configurations belong? You have mentioned
 tableApi/SQL,
> so should in TableConfig?
 
 All Python related configurations are defined in PythonOptions. User
>> could
 configure these configurations via TableConfig.getConfiguration.setXXX
>> for
 Python Table API programs.
 
> 
> - If just in table/sql, whether it should be called: table.python.,
> because in table, all config options are called table.***.
 
 These configurations are not table specific. They will be used for both
 Python Table API programs and Python DataStream API programs (which is
 planned to be supported in the future). So python.xxx seems more
 appropriate, what do you think?
 
> - What should table module do? So in CommonPythonCalc, we should read
> options from table config, and set resources to OneInputTransformation?
 
 As described in the design doc, in the compilation phase, for batch jobs, the
 required memory of the Python worker will be calculated according to the
 configuration and set as the managed memory for the operator. For stream
 jobs, the resource spec will be unknown (the reason is that currently the
 resources for all the operators in stream jobs are unknown and it is not
 supported to configure both known and unknown resources in a single job).
 
> - Are all buffer.memory off-heap memory? I took a look
> to AbstractPythonScalarFunctionOperator, there is a
>> forwardedInputQueue,
 is
> this one a heap queue? So we need heap memory too?
 
 Yes, they are all off-heap memory which is supposed to be used by the
 Python process. The forwardedInputQueue is a buffer used in the Java
 operator and its memory is accounted as the on-heap memory.
 
 Regards,
 Dian
 
> 
> Hope to get your reply.
> 
> 
> Best,
> 
> Jingsong Lee
> 
> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu 
>> wrote:
> 
>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>> offline and also on the design doc.
>> 
>> It seems that we have reached consensus on the design. I would bring
>> up
>> the VOTE if there is no other feedbacks.
>> 
>> Thanks,
>> Dian
>> 
>>> On Nov 22, 2019, at 2:51 PM, Hequn Cheng wrote:
>>> 
>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>> It is great to make sure that the resources used by the Python
>> process
>> are
>>> managed properly by Flink’s resource management framework.
>>> 
>>> Also, thanks to the guys that are working on the unified memory
>>> management framework.
>>> 
>>> Best, Hequn
>>> 
>>> 
>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo 
>> wr

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-05 Thread Jingsong Li
Hi Dian,

After [1] and [2], in the batch SQL world, we will:
- [2] On the client/compile side: use a memory weight to request memory for the
Transformation.
- [1] On the runtime side: use a memory fraction to compute the memory size and
allocate it in the StreamOperator (see the sketch below).
For your information.

[1] https://jira.apache.org/jira/browse/FLINK-14063
[2] https://jira.apache.org/jira/browse/FLINK-15035
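
As a rough illustration of the runtime side of [1], a self-contained sketch 
(the class is hypothetical, not Flink code) of deriving an operator's memory 
size from its fraction of the slot's managed memory:

{code:java}
// Hypothetical sketch: derive an operator's memory budget from its
// managed-memory fraction at runtime.
public final class ManagedMemoryShare {
    /** Total managed memory of the slot, in bytes. */
    private final long slotManagedMemoryBytes;

    public ManagedMemoryShare(long slotManagedMemoryBytes) {
        this.slotManagedMemoryBytes = slotManagedMemoryBytes;
    }

    /** Memory size for an operator that was assigned the given fraction. */
    public long memoryForFraction(double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        return (long) (slotManagedMemoryBytes * fraction);
    }

    public static void main(String[] args) {
        ManagedMemoryShare share = new ManagedMemoryShare(512L * 1024 * 1024);
        // An operator whose weight amounts to a quarter of the slot's managed memory.
        System.out.println(share.memoryForFraction(0.25)); // 134217728
    }
}
{code}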

Best,
Jingsong Lee

On Tue, Dec 3, 2019 at 6:07 PM Dian Fu  wrote:

> Hi Jingsong,
>
> Thanks for your valuable feedback. I have updated the "Example" section
> describing how to use these options in a Python Table API program.
>
> Thanks,
> Dian
>
> > On Dec 2, 2019, at 6:12 PM, Jingsong Lee wrote:
> >
> > Hi Dian:
> >
> > Thanks for your explanation.
> > If you can update the document to add explanation for the changes to the
> > table layer,
> > it might be better. (it's just a suggestion, it depends on you)
> > About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
> > Will this queue take up a lot of memory?
> > Can it also occupy memory as large as buffer.memory?
> > If so, what we're dealing with now is the silent use of heap memory?
> > I feel a little strange, because the memory on the Python side will be
> > reserved, but the memory on the JVM side is used silently.
> >
> > After carefully seeing your comments on Google doc:
> >> The memory used by the Java operator is currently accounted as the task
> > on-heap memory. We can revisit this if we find it's a problem in the
> future.
> > I agree that we can ignore it now, But we can add some content to the
> > document to remind the user, What do you think?
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Dec 2, 2019 at 5:17 PM Dian Fu  wrote:
> >
> >> Hi Jingsong,
> >>
> >> Thanks a lot for your comments. Please see my reply inlined below.
> >>
> >>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee wrote:
> >>>
> >>> Hi Dian:
> >>>
> >>>
> >>> Thanks for your driving. I have some questions:
> >>>
> >>>
> >>> - Where should these configurations belong? You have mentioned
> >> tableApi/SQL,
> >>> so should in TableConfig?
> >>
> >> All Python related configurations are defined in PythonOptions. User
> could
> >> configure these configurations via TableConfig.getConfiguration.setXXX
> for
> >> Python Table API programs.
> >>
> >>>
> >>> - If just in table/sql, whether it should be called: table.python.,
> >>> because in table, all config options are called table.***.
> >>
> >> These configurations are not table specific. They will be used for both
> >> Python Table API programs and Python DataStream API programs (which is
> >> planned to be supported in the future). So python.xxx seems more
> >> appropriate, what do you think?
> >>
> >>> - What should table module do? So in CommonPythonCalc, we should read
> >>> options from table config, and set resources to OneInputTransformation?
> >>
> >> As described in the design doc, in the compilation phase, for batch jobs,
> >> the required memory of the Python worker will be calculated according to
> >> the configuration and set as the managed memory for the operator. For
> >> stream jobs, the resource spec will be unknown (the reason is that
> >> currently the resources for all the operators in stream jobs are unknown
> >> and it is not supported to configure both known and unknown resources in
> >> a single job).
> >>
> >>> - Are all buffer.memory off-heap memory? I took a look
> >>> to AbstractPythonScalarFunctionOperator, there is a
> forwardedInputQueue,
> >> is
> >>> this one a heap queue? So we need heap memory too?
> >>
> >> Yes, they are all off-heap memory which is supposed to be used by the
> >> Python process. The forwardedInputQueue is a buffer used in the Java
> >> operator and its memory is accounted as the on-heap memory.
> >>
> >> Regards,
> >> Dian
> >>
> >>>
> >>> Hope to get your reply.
> >>>
> >>>
> >>> Best,
> >>>
> >>> Jingsong Lee
> >>>
> >>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu 
> wrote:
> >>>
>  Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>  offline and also on the design doc.
> 
>  It seems that we have reached consensus on the design. I would bring
> up
>  the VOTE if there is no other feedbacks.
> 
>  Thanks,
>  Dian
> 
> > On Nov 22, 2019, at 2:51 PM, Hequn Cheng wrote:
> >
> > Thanks a lot for putting this together, Dian! Definitely +1 for this!
> > It is great to make sure that the resources used by the Python
> process
>  are
> > managed properly by Flink’s resource management framework.
> >
> > Also, thanks to the guys that are working on the unified memory
> > management framework.
> >
> > Best, Hequn
> >
> >
> > On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo 
> wrote:
> >
> >> Thanks for driving this discussion, Dian!
> >>
> >> +1 for this proposal. It will help to reduce container failure due
> to
> >> the memory overuse.
> >> Some comments left in the design doc.
> >>
> >> Best,
> >> Ya

[jira] [Created] (FLINK-15087) JobManager is forced to shutdown JVM due to temporary loss of zookeeper connection

2019-12-05 Thread Abdul Qadeer (Jira)
Abdul Qadeer created FLINK-15087:


 Summary: JobManager is forced to shutdown JVM due to temporary 
loss of zookeeper connection
 Key: FLINK-15087
 URL: https://issues.apache.org/jira/browse/FLINK-15087
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.8.2
Reporter: Abdul Qadeer


While testing I found that the loss of connection with ZooKeeper triggers a JVM 
shutdown for the JobManager when started through 
"StandaloneSessionClusterEntrypoint". This happens due to an NPE on 
"taskManagerHeartbeatManager".

When JobManagerRunner suspends the jobMasterService (as the JobManager is no 
longer leader), taskManagerHeartbeatManager is set to null in 
"stopHeartbeatServices".

Next, "AkkaRpcActor" stops the JobMaster and throws an NPE in the following 
method:


{code:java}
@Override
public CompletableFuture<Acknowledge> disconnectTaskManager(
        final ResourceID resourceID, final Exception cause) {
    log.debug("Disconnect TaskExecutor {} because: {}", resourceID,
        cause.getMessage());

    taskManagerHeartbeatManager.unmonitorTarget(resourceID);
    slotPool.releaseTaskManager(resourceID, cause);
{code}
 

This finally leads to a fatal error in "ClusterEntrypoint.onFatalError()" and 
forces a JVM shutdown.
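
For illustration, one possible defensive guard (a sketch, not a proposed 
patch) would skip the unmonitor call when the heartbeat manager has already 
been torn down:

{code:java}
// Sketch only: guard against stopHeartbeatServices() having nulled the field.
if (taskManagerHeartbeatManager != null) {
    taskManagerHeartbeatManager.unmonitorTarget(resourceID);
}
slotPool.releaseTaskManager(resourceID, cause);
{code}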

The stack trace is below:

 
{noformat}
{"timeMillis":1575581120723,"thread":"flink-akka.actor.default-dispatcher-93","level":"ERROR","loggerName":"com.Sample","message":"Failed
 to take leadership with session id 
b4662db5-f065-41d9-aaaf-78625355b251.","thrown":{"commonElementCount":0,"localizedMessage":"Failed
 to take leadership with session id 
b4662db5-f065-41d9-aaaf-78625355b251.","message":"Failed to take leadership 
with session id 
b4662db5-f065-41d9-aaaf-78625355b251.","name":"org.apache.flink.runtime.dispatcher.DispatcherException","cause":{"commonElementCount":18,"localizedMessage":"Termination
 of previous JobManager for job bbb8c430787d92293e9d45c349231d9c failed. Cannot 
submit job under the same job id.","message":"Termination of previous 
JobManager for job bbb8c430787d92293e9d45c349231d9c failed. Cannot submit job 
under the same job 
id.","name":"org.apache.flink.runtime.dispatcher.DispatcherException","cause":{"commonElementCount":6,"localizedMessage":"org.apache.flink.util.FlinkException:
 Could not properly shut down the 
JobManagerRunner","message":"org.apache.flink.util.FlinkException: Could not 
properly shut down the 
JobManagerRunner","name":"java.util.concurrent.CompletionException","cause":{"commonElementCount":6,"localizedMessage":"Could
 not properly shut down the JobManagerRunner","message":"Could not properly 
shut down the 
JobManagerRunner","name":"org.apache.flink.util.FlinkException","cause":{"commonElementCount":13,"localizedMessage":"Failure
 while stopping RpcEndpoint jobmanager_0.","message":"Failure while stopping 
RpcEndpoint 
jobmanager_0.","name":"org.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException","cause":{"commonElementCount":13,"name":"java.lang.NullPointerException","extendedStackTrace":[{"class":"org.apache.flink.runtime.jobmaster.JobMaster","method":"disconnectTaskManager","file":"JobMaster.java","line":629,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"},{"class":"org.apache.flink.runtime.jobmaster.JobMaster","method":"onStop","file":"JobMaster.java","line":346,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"},{"class":"org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState","method":"terminate","file":"AkkaRpcActor.java","line":504,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"},{"class":"org.apache.flink.runtime.rpc.akka.AkkaRpcActor","method":"handleControlMessage","file":"AkkaRpcActor.java","line":170,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"},{"class":"org.apache.flink.runtime.rpc.akka.AkkaRpcActor","method":"onReceive","file":"AkkaRpcActor.java","line":142,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"}]},"extendedStackTrace":[{"class":"org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState","method":"terminate","file":"AkkaRpcActor.java","line":508,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"},{"class":"org.apache.flink.runtime.rpc.akka.AkkaRpcActor","method":"handleControlMessage","file":"AkkaRpcActor.java","line":170,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"},{"class":"org.apache.flink.runtime.rpc.akka.AkkaRpcActor","method":"onReceive","file":"AkkaRpcActor.java","line":142,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"}]},"extendedStackTrace":[{"class":"org.apache.flink.runtime.jobmaster.JobManagerRunner","method":"lambda$closeAsync$0","file":"JobManagerRunner.java","line":207,"exact":false,"location":"flink-runtime_2.11-1.8.2.jar","version":"1.8.2"},{"class":"java.util.concurrent.CompletableFuture","method":"uniWhenComplete"

[jira] [Created] (FLINK-15086) JobCluster may not be shutdown if executeAsync called

2019-12-05 Thread Zili Chen (Jira)
Zili Chen created FLINK-15086:
-

 Summary: JobCluster may not be shutdown if executeAsync called
 Key: FLINK-15086
 URL: https://issues.apache.org/jira/browse/FLINK-15086
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission, Runtime / Coordination
Reporter: Zili Chen


If a JobCluster is started in attached execution mode, it will wait for a client 
to request its result and then shut itself down. However, there is no guarantee 
that a user who gets a {{JobClient}} from {{executeAsync}} will call 
{{getJobExecutionResult}}, so the cluster may remain.
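
A minimal usage sketch of the path described above (assuming the 
{{executeAsync}}/{{JobClient}} API; the example job is arbitrary):

{code:java}
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public final class ExecuteAsyncSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3).print();

        // executeAsync returns immediately with a JobClient ...
        JobClient client = env.executeAsync("sketch");

        // ... but nothing forces the caller to request the result, which is
        // what an attached JobCluster waits for before shutting itself down:
        client.getJobExecutionResult(ExecuteAsyncSketch.class.getClassLoader())
                .get();
    }
}
{code}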

cc [~aljoscha] [~kkl0u]





[jira] [Created] (FLINK-15085) History server overview page fails loading because of web-submit feature not in the config

2019-12-05 Thread chaganti spurthi (Jira)
chaganti spurthi created FLINK-15085:


 Summary: History server overview page fails loading because of 
web-submit feature not in the config
 Key: FLINK-15085
 URL: https://issues.apache.org/jira/browse/FLINK-15085
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Reporter: chaganti spurthi


The history server has JavaScript errors while loading the overview page 
because the web-submit feature is not found in the config.
{code:java}
main.9c4be059472ea41d7052.js:1 ERROR TypeError: Cannot read property 
'web-submit' of undefined
at new t (main.9c4be059472ea41d7052.js:1)
at qr (main.9c4be059472ea41d7052.js:1)
at Gr (main.9c4be059472ea41d7052.js:1)
at ko (main.9c4be059472ea41d7052.js:1)
at Oo (main.9c4be059472ea41d7052.js:1)
at Object.Bo [as createRootView] (main.9c4be059472ea41d7052.js:1)
at e.create (main.9c4be059472ea41d7052.js:1)
at e.create (main.9c4be059472ea41d7052.js:1)
at t.bootstrap (main.9c4be059472ea41d7052.js:1)
at main.9c4be059472ea41d7052.js:1
{code}
It seems to have been introduced by 
[FLINK-13818|https://issues.apache.org/jira/browse/FLINK-13818]: 
[https://github.com/apache/flink/pull/9883]



The issue is that for the history server we are not setting the web-submit 
feature in the config, and the /config endpoint returns 
{code:java}
{"refresh-interval":1,"timezone-offset":-1800,"timezone-name":"Eastern 
Time","flink-version":"","flink-revision":"d9f8abb @ 04.12.2019 @ 
16:16:24 EST"}{code}
whereas in the JobManager the /config endpoint returns
{code:java}
{"refresh-interval":3000,"timezone-name":"Coordinated Universal 
Time","timezone-offset":0,"flink-version":"1.9-criteo-rc1-1573156762","flink-revision":"366237a
 @ 07.11.2019 @ 20:00:32 UTC","features":{"web-submit":true}}

{code}
*AppComponent.ts* fails at this line because the feature web-submit is not 
found in the config:
{code:java}
webSubmitEnabled = this.statusService.configuration.features['web-submit'];

{code}
This can be fixed in two ways:
 # Add a defensive check in *AppComponent.ts*:
{code:java}
webSubmitEnabled =
    (this.statusService.configuration &&
        this.statusService.configuration.features &&
        this.statusService.configuration.features['web-submit']);
{code}

 # Add the features property in the config file that *HistoryServer.java* 
generates (a sketch follows below).
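
For illustration, option 2 could look roughly like the following (field names 
are taken from the /config payloads above; the class and method are 
hypothetical, not the actual HistoryServer code):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: include a "features" section in the history server's
// generated config so the frontend finds configuration.features['web-submit'].
public final class HistoryServerConfigSketch {
    static Map<String, Object> buildConfig(long refreshInterval, String version, String revision) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("refresh-interval", refreshInterval);
        config.put("flink-version", version);
        config.put("flink-revision", revision);

        Map<String, Object> features = new LinkedHashMap<>();
        features.put("web-submit", false); // the history server cannot submit jobs
        config.put("features", features);
        return config;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig(10000L, "1.10-SNAPSHOT", "d9f8abb"));
    }
}
{code}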





Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Robert Metzger
Thanks for your comments Yun.
If there's strong support for idea 2, it would actually make my
life easier: the migration would be easier to do.

I also noticed that the uploads to transfer.sh were broken, but this should
be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink). The
builds in "flink-ci.flink" (coming from flink-ci/flink) might have trouble
with transfer.sh.


On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:

> Hi Robert
>
> Really exciting to see this new, more powerful CI tool to get rid of the
> 50-minute limit of the Travis CI free account.
>
> After reading the wiki, I support idea 2 of AZP-setup version-2.
>
> However, after digging into some failing builds at
> https://dev.azure.com/rmetzger/Flink/_build , I found that we cannot view the
> logs of some IT cases, which previously would have been uploaded by
> travis_watchdog to transfer.sh.
> I think this feature should also be easy to implement in AZP, right?
>
> Best
> Yun Tang
>
> On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
>
> I've created a first draft of my plans in the wiki:
>
> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines
> .
> I'm looking forward to your comments.
>
> On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger 
> wrote:
>
> > Thank you all for the positive feedback. I will start putting
> together a
> > page in the wiki.
> >
> > @Jark: Azure Pipelines provides a free service that is even better than
> > what Travis provides for free: 10 parallel builds with 6-hour timeouts.
> >
> > @Chesnay: I will answer your questions in the yet-to-be-written
> > documentation in the wiki.
> >
> >
> > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise 
> wrote:
> >
> >> +1 I had good experiences with Azure pipelines in the past.
> >>
> >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
> aljos...@apache.org>
> >> wrote:
> >>
> >> > +1
> >> >
> >> > Thanks for the effort! The tooling seems to be quite a bit nicer
> and I
> >> > like that we can grow by adding more machines.
> >> >
> >> > Best,
> >> > Aljoscha
> >> >
> >> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> >> > >
> >> > > +1 for Azure pipeline because it promises better performance.
> >> > >
> >> > > However, I have 2 concerns:
> >> > >
> >> > > 1) Travis provides personal free service for testing personal
> >> branches.
> >> > > Usually, contributors use this feature to test PoC or run CRON
> jobs
> >> for
> >> > > pull requests.
> >> > >Using local machine will cost a lot of time. Does AZP
> provides the
> >> > same
> >> > > free service?
> >> > > 2) Currently, we deployed a webhook [1] to receive Travis CI
> build
> >> > > notifications [2] and send to bui...@flink.apache.org mailing
> list.
> >> > >We need to figure out a way how to send Azure build results
> to the
> >> > > mailing list. And this [3] might be the way to go.
> >> > >
> >> > > builds@f.a.o mailing list
> >> > >
> >> > > Best,
> >> > > Jark
> >> > >
> >> > > [1]: https://github.com/wuchong/flink-notification-bot
> >> > > [2]:
> >> > >
> >> >
> >>
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> >> > > [3]:
> >> > >
> >> >
> >>
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> >> > >
> >> > >
> >> > >
> >> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang 
> wrote:
> >> > >
> >> > >> +1
> >> > >>
> >> > >> Till Rohrmann  wrote on Wed, Dec 4, 2019 at 10:43 PM:
> >> > >>
> >> > >>> +1 for moving to Azure pipelines as it promises better
> scalability
> >> and
> >> > >>> tooling. Looking forward to having faster builds and hence
> shorter
> >> > >> feedback
> >> > >>> cycles :-)
> >> > >>>
> >> > >>> Cheers,
> >> > >>> Till
> >> > >>>
> >> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <
> ches...@apache.org
> >> >
> >> > >>> wrote:
> >> > >>>
> >> >  @robert Can you expand how the azure setup interacts with
> CiBot?
> >> Do we
> >> >  have to continue mirroring builds into flink-ci? How will the
> >> cronjob
> >> >  configuration work? We should have a general idea on how to
> >> implement
> >> >  this before proceeding.
> >> >  Additionally, moving /all /jobs into flink-ci requires
> setting up
> >> the
> >> >  environment variables we have; can we set these up via files
> or
> >> will
> >> > we
> >> >  have to give all committers permissions for flink-ci/flink?
> >> > 
> >> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
> >> > > From what I've seen so far Azure will provide us a better
> >> experience,
> >> > > so I'd say +1 for the transition as a whole.
> >> > >

Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Trevor Grant
If anyone wants to do a drive-by PR linking to vendor docs, just let them.
Just have the menu set up so it doesn't get too visually busy.

E.g.:
Deployment & Ops -> Cluster Deployment -> 3rd Party Vendors -> [ GCP, AWS,
Azure, Ververica, Oracle Cloud, IBM, Lightbend, Crazy Trevo's House of
Streaming, How to Submit to This List ]

where each one links to the vendor docs and opens the link in a
new tab.

When a link goes down, comment it out and ping whoever made the
original PR.


On Thu, Dec 5, 2019 at 11:28 AM Seth Wiesman  wrote:

> @chesnay I'm not sure, however, I don't know what we could do to improve
> the situation that wouldn't effectively be copying those vendors docs into
> our own.
>
> One option would be to do exactly that, but then I feel like we are
> committing to tracking changes on those systems and I just don't know how
> feasible that is.
>
> I am personally still in favor of the removal of all three but as a
> compromise, we could replace these pages with a "Vendor" page that just
> links to the appropriate docs for these services. It could also include the
> most basic Filesystem information that @ufuk mentioned. That still leaves
> an open question of who we allow. Just the cloud providers or also others
> like Cloudera and Ververica? For AWS only EMR or also Kinesis Data
> Analytics, etc.
>
>
>
> On Thu, Dec 5, 2019 at 10:25 AM Robert Metzger 
> wrote:
>
> > The bounce rate of these pages is not particularly bad.
> >
> > On Thu, Dec 5, 2019 at 3:48 PM Trevor Grant 
> > wrote:
> >
> > > You can infer that by looking at the "bounce rate", e.g. someone gets to
> > > the page, looks at it, realizes it's trash and clicks "back".
> > >
> > >
> > >
> > > On Thu, Dec 5, 2019 at 8:46 AM Chesnay Schepler 
> > > wrote:
> > >
> > > > Question now is whether the numbers are so low because the docs
> aren't
> > > > required or because they are so bad.
> > > >
> > > > On 05/12/2019 14:26, Robert Metzger wrote:
> > > > > I just checked GA:
> > > > >
> > > > > All numbers are for the last month, independent of the Flink
> version:
> > > > > aws.html: 918 pageviews
> > > > > mapr_setup.html: 108 pageviews
> > > > > gce_setup.html: 256 pageviews
> > > > >
> > > > > Some other deployment-related pages for reference:
> > > > > yarn_setup: 4687
> > > > > cluster: 4284
> > > > > kubernetes: 3428
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Dec 5, 2019 at 1:53 PM Trevor Grant <
> > trevor.d.gr...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Same as Ufuk (non-binding)
> > > > >>
> > > > >> In general, docs pages are great "first commits" to leave out
> there
> > as
> > > > >> newb-issues.
> > > > >>
> > > > >> Also though, worth checking how often people use the page (e.g.
> GA)
> > > > >>
> > > > >> 3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to
> > fix
> > > > them
> > > > >> (and put a readme explaining why they are in `.bu` status (which
> > > should
> > > > >> prevent the build from picking them up), in essence, you're
> > commenting
> > > > them
> > > > >> all out until someone can come around fix them.
> > > > >>
> > > > >> Further, I would put a holder page in their place that says
> > something
> > > > like,
> > > > >> "This works, but we need someone to update the docs- check out
> JIRA
> > > 
> > > > >> for more details", might get someone to clean em up sooner w a
> > little
> > > > >> advertising.
> > > > >>
> > > > >> Just my .02
> > > > >>
> > > > >> I can't do full overhauls right now, but I could execute the
> > "comment
> > > > out"
> > > > >> option if it comes to that.
> > > > >>
> > > > >>
> > > > >> On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi 
> wrote:
> > > > >>
> > > > >>> +1 to drop the MapR page.
> > > > >>>
> > > > >>> For the other two I'm +0. I fully agree that the linked AWS and
> GCE
> > > > pages
> > > > >>> are in bad shape and don't relate to a component developed by the
> > > > >>> community. Do we have any numbers from Google Analytics on how
> > > popular
> > > > >>> those pages are? If they are somewhat popular, I would prefer to
> > "fix
> > > > >> them"
> > > > >>> to be good starting points for users in those environments
> > (probably
> > > by
> > > > >>> boiling them down to saying something simple such as "You should
> > use
> > > > >>> FileSystem [...] and point it to [...].").
> > > > >>>
> > > > >>> – Ufuk
> > > > >>>
> > > > >>> On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann <
> > trohrm...@apache.org>
> > > > >>> wrote:
> > > >  If the community cannot manage to keep the vendor-specific
> > > > >> documentation
> > > > >>> up
> > > >  to date, then I believe it is better to drop it. Hence +1 for
> the
> > > > >>> proposal.
> > > >  Cheers,
> > > >  Till
> > > > 
> > > >  On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek <
> > > aljos...@apache.org>
> > > > >>> wrote:
> > > > > +1
> > > > >
> > > > > Best,
> > > > > Aljoscha
> > > > >
> > > > >> On 2. Dec 2019, at

Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Seth Wiesman
@chesnay I'm not sure, however, I don't know what we could do to improve
the situation that wouldn't effectively be copying those vendors docs into
our own.

One option would be to do exactly that, but then I feel like we are
committing to tracking changes on those systems and I just don't know how
feasible that is.

I am personally still in favor of the removal of all three but as a
compromise, we could replace these pages with a "Vendor" page that just
links to the appropriate docs for these services. It could also include the
most basic Filesystem information that @ufuk mentioned. That still leaves
an open question of who we allow. Just the cloud providers or also others
like Cloudera and Ververica? For AWS only EMR or also Kinesis Data
Analytics, etc.



On Thu, Dec 5, 2019 at 10:25 AM Robert Metzger  wrote:

> The bounce rate of these pages is not particularly bad.
>
> On Thu, Dec 5, 2019 at 3:48 PM Trevor Grant 
> wrote:
>
> > You can infer that by looking at the "bounce rate", e.g. someone gets to the
> > page, looks at it, realizes it's trash and clicks "back".
> >
> >
> >
> > On Thu, Dec 5, 2019 at 8:46 AM Chesnay Schepler 
> > wrote:
> >
> > > Question now is whether the numbers are so low because the docs aren't
> > > required or because they are so bad.
> > >
> > > On 05/12/2019 14:26, Robert Metzger wrote:
> > > > I just checked GA:
> > > >
> > > > All numbers are for the last month, independent of the Flink version:
> > > > aws.html: 918 pageviews
> > > > mapr_setup.html: 108 pageviews
> > > > gce_setup.html: 256 pageviews
> > > >
> > > > Some other deployment-related pages for reference:
> > > > yarn_setup: 4687
> > > > cluster: 4284
> > > > kubernetes: 3428
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Dec 5, 2019 at 1:53 PM Trevor Grant <
> trevor.d.gr...@gmail.com>
> > > > wrote:
> > > >
> > > >> Same as Ufuk (non-binding)
> > > >>
> > > >> In general, docs pages are great "first commits" to leave out there
> as
> > > >> newb-issues.
> > > >>
> > > >> Also though, worth checking how often people use the page (e.g. GA)
> > > >>
> > > >> 3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to
> fix
> > > them
> > > >> (and put a readme explaining why they are in `.bu` status (which
> > should
> > > >> prevent the build from picking them up), in essence, you're
> commenting
> > > them
> > > >> all out until someone can come around fix them.
> > > >>
> > > >> Further, I would put a holder page in their place that says
> something
> > > like,
> > > >> "This works, but we need someone to update the docs- check out JIRA
> > 
> > > >> for more details", might get someone to clean em up sooner w a
> little
> > > >> advertising.
> > > >>
> > > >> Just my .02
> > > >>
> > > >> I can't do full overhauls right now, but I could execute the
> "comment
> > > out"
> > > >> option if it comes to that.
> > > >>
> > > >>
> > > >> On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi  wrote:
> > > >>
> > > >>> +1 to drop the MapR page.
> > > >>>
> > > >>> For the other two I'm +0. I fully agree that the linked AWS and GCE
> > > pages
> > > >>> are in bad shape and don't relate to a component developed by the
> > > >>> community. Do we have any numbers from Google Analytics on how
> > popular
> > > >>> those pages are? If they are somewhat popular, I would prefer to
> "fix
> > > >> them"
> > > >>> to be good starting points for users in those environments
> (probably
> > by
> > > >>> boiling them down to saying something simple such as "You should
> use
> > > >>> FileSystem [...] and point it to [...].").
> > > >>>
> > > >>> – Ufuk
> > > >>>
> > > >>> On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann <
> trohrm...@apache.org>
> > > >>> wrote:
> > >  If the community cannot manage to keep the vendor-specific
> > > >> documentation
> > > >>> up
> > >  to date, then I believe it is better to drop it. Hence +1 for the
> > > >>> proposal.
> > >  Cheers,
> > >  Till
> > > 
> > >  On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek <
> > aljos...@apache.org>
> > > >>> wrote:
> > > > +1
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > >> On 2. Dec 2019, at 18:38, Konstantin Knauf <
> > > >> konstan...@ververica.com
> > > > wrote:
> > > >> +1 from my side to drop.
> > > >>
> > > >> On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman <
> sjwies...@gmail.com>
> > > >>> wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> I'd like to discuss dropping vendor-specific deployment
> > > >>> documentation
> > > > from
> > > >>> Flink's official docs. To be clear, I am *NOT* suggesting we
> drop
> > > >>> any of
> > > >>> the filesystem documentation, but the following three pages.
> > > >>>
> > > >>> AWS:
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> > > >>> Google Compute Engine:
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https:/

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Yun Tang
Hi Robert

Really exciting to see this new, more powerful CI tool to get rid of the 
50-minute limit of the Travis CI free account.

After reading the wiki, I support idea 2 of AZP-setup version-2.

However, after digging into some failing builds at 
https://dev.azure.com/rmetzger/Flink/_build , I found that we cannot view the 
logs of some IT cases, which previously would have been uploaded by 
travis_watchdog to transfer.sh.
I think this feature should also be easy to implement in AZP, right?

Best
Yun Tang

On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:

I've created a first draft of my plans in the wiki:

https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines.
I'm looking forward to your comments.

On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger  wrote:

> Thank you all for the positive feedback. I will start putting together a
> page in the wiki.
>
> @Jark: Azure Pipelines provides a free service that is even better than
> what Travis provides for free: 10 parallel builds with 6-hour timeouts.
>
> @Chesnay: I will answer your questions in the yet-to-be-written
> documentation in the wiki.
>
>
> On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  wrote:
>
>> +1 I had good experiences with Azure pipelines in the past.
>>
>> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
>> wrote:
>>
>> > +1
>> >
>> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
>> > like that we can grow by adding more machines.
>> >
>> > Best,
>> > Aljoscha
>> >
>> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
>> > >
>> > > +1 for Azure pipeline because it promises better performance.
>> > >
>> > > However, I have 2 concerns:
>> > >
>> > > 1) Travis provides personal free service for testing personal
>> branches.
>> > > Usually, contributors use this feature to test PoC or run CRON jobs
>> for
>> > > pull requests.
>> > >Using local machine will cost a lot of time. Does AZP provides the
>> > same
>> > > free service?
>> > > 2) Currently, we deployed a webhook [1] to receive Travis CI build
>> > > notifications [2] and send to bui...@flink.apache.org mailing list.
>> > >We need to figure out a way how to send Azure build results to the
>> > > mailing list. And this [3] might be the way to go.
>> > >
>> > > builds@f.a.o mailing list
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > [1]: https://github.com/wuchong/flink-notification-bot
>> > > [2]:
>> > >
>> >
>> 
https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
>> > > [3]:
>> > >
>> >
>> 
https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
>> > >
>> > >
>> > >
>> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
>> > >
>> > >> +1
>> > >>
>> > >> Till Rohrmann  wrote on Wed, Dec 4, 2019 at 10:43 PM:
>> > >>
>> > >>> +1 for moving to Azure pipelines as it promises better scalability
>> and
>> > >>> tooling. Looking forward to having faster builds and hence shorter
>> > >> feedback
>> > >>> cycles :-)
>> > >>>
>> > >>> Cheers,
>> > >>> Till
>> > >>>
>> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler > >
>> > >>> wrote:
>> > >>>
>> >  @robert Can you expand how the azure setup interacts with CiBot?
>> Do we
>> >  have to continue mirroring builds into flink-ci? How will the
>> cronjob
>> >  configuration work? We should have a general idea on how to
>> implement
>> >  this before proceeding.
>> >  Additionally, moving /all /jobs into flink-ci requires setting up
>> the
>> >  environment variables we have; can we set these up via files or
>> will
>> > we
>> >  have to give all committers permissions for flink-ci/flink?
>> > 
>> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
>> > > From what I've seen so far Azure will provide us a better
>> experience,
>> > > so I'd say +1 for the transition as a whole.
>> > >
>> > > I'd delay merge at least until the feature branch is cut.
>> > > Given the parental leave it may even make sense to only start
>> merging
>> > > in January afterwards, to reduce the total time taken for the
>> > >>> transition.
>> > >
>> > > Reviews could maybe be made earlier, but I'm wondering whether
>> anyone
>> > > would even have the time at the moment to do so.
>> > >
>> > > On 04/12/2019 12:35, Kurt Young wrote:
>> > >> Thanks Robert for driving this. There is another big pain point
>> of
>> > >> current
>> > >> travis,
>> > >> which is its cache mechanism will fail from time to time. Almost
>> > >> around 50%

Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Robert Metzger
The bounce rate of these pages is not particularly bad.

On Thu, Dec 5, 2019 at 3:48 PM Trevor Grant 
wrote:

> You can infer that by looking at the "bounce rate", e.g. someone gets to the
> page, looks at it, realizes it's trash and clicks "back".
>
>
>
> On Thu, Dec 5, 2019 at 8:46 AM Chesnay Schepler 
> wrote:
>
> > Question now is whether the numbers are so low because the docs aren't
> > required or because they are so bad.
> >
> > On 05/12/2019 14:26, Robert Metzger wrote:
> > > I just checked GA:
> > >
> > > All numbers are for the last month, independent of the Flink version:
> > > aws.html: 918 pageviews
> > > mapr_setup.html: 108 pageviews
> > > gce_setup.html: 256 pageviews
> > >
> > > Some other deployment-related pages for reference:
> > > yarn_setup: 4687
> > > cluster: 4284
> > > kubernetes: 3428
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Dec 5, 2019 at 1:53 PM Trevor Grant 
> > > wrote:
> > >
> > >> Same as Ufuk (non-binding)
> > >>
> > >> In general, docs pages are great "first commits" to leave out there as
> > >> newb-issues.
> > >>
> > >> Also though, worth checking how often people use the page (e.g. GA)
> > >>
> > >> 3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to fix
> > them
> > >> (and put a readme explaining why they are in `.bu` status (which
> should
> > >> prevent the build from picking them up), in essence, you're commenting
> > them
> > >> all out until someone can come around fix them.
> > >>
> > >> Further, I would put a holder page in their place that says something
> > like,
> > >> "This works, but we need someone to update the docs- check out JIRA
> 
> > >> for more details", might get someone to clean em up sooner w a little
> > >> advertising.
> > >>
> > >> Just my .02
> > >>
> > >> I can't do full overhauls right now, but I could execute the "comment
> > out"
> > >> option if it comes to that.
> > >>
> > >>
> > >> On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi  wrote:
> > >>
> > >>> +1 to drop the MapR page.
> > >>>
> > >>> For the other two I'm +0. I fully agree that the linked AWS and GCE
> > pages
> > >>> are in bad shape and don't relate to a component developed by the
> > >>> community. Do we have any numbers from Google Analytics on how
> popular
> > >>> those pages are? If they are somewhat popular, I would prefer to "fix
> > >> them"
> > >>> to be good starting points for users in those environments (probably
> by
> > >>> boiling them down to saying something simple such as "You should use
> > >>> FileSystem [...] and point it to [...].").
> > >>>
> > >>> – Ufuk
> > >>>
> > >>> On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann 
> > >>> wrote:
> >  If the community cannot manage to keep the vendor-specific
> > >> documentation
> > >>> up
> >  to date, then I believe it is better to drop it. Hence +1 for the
> > >>> proposal.
> >  Cheers,
> >  Till
> > 
> >  On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek <
> aljos...@apache.org>
> > >>> wrote:
> > > +1
> > >
> > > Best,
> > > Aljoscha
> > >
> > >> On 2. Dec 2019, at 18:38, Konstantin Knauf <
> > >> konstan...@ververica.com
> > > wrote:
> > >> +1 from my side to drop.
> > >>
> > >> On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman 
> > >>> wrote:
> > >>> Hi all,
> > >>>
> > >>> I'd like to discuss dropping vendor-specific deployment
> > >>> documentation
> > > from
> > >>> Flink's official docs. To be clear, I am *NOT* suggesting we drop
> > >>> any of
> > >>> the filesystem documentation, but the following three pages.
> > >>>
> > >>> AWS:
> > >>>
> > >>>
> > >>>
> > >>
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> > >>> Google Compute Engine:
> > >>>
> > >>>
> > >>>
> > >>
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
> > >>> MapR:
> > >>>
> > >>>
> > >>>
> > >>
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
> > >>> Unlike the filesystems, these docs do not refer to components
> > > maintained by
> > >>> the Apache Flink community, but external commercial services and
> > > products.
> > >>> None of these pages are well maintained and I do not think the
> > > open-source
> > >>> community can reasonably be expected to keep them up to date. In
> > >>> particular,
> > >>>
> > >>>
> > >>>- The AWS page contains sparse information and mostly just
> links
> > >>> to
> > > the
> > >>>official EMR docs.
> > >>>- The Google Compute Engine page is out of date and the
> commands
> > >>> do
> > > not
> > >>>work.
> > >>>- MapR contains some relevant information but the community
> has
> > > already
> > >>>dropped the MapR filesystem so I am not sure that deployment
> > >> would
> > > work
> > >>> (I
> > >>>have not tested).
> > >>>
> > >>>

[jira] [Created] (FLINK-15084) Let MemoryManager allocate and track shared memory resources

2019-12-05 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-15084:


 Summary: Let MemoryManager allocate and track shared memory 
resources
 Key: FLINK-15084
 URL: https://issues.apache.org/jira/browse/FLINK-15084
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.10.0


To allocate and share resources (e.g. the block cache) between the RocksDB 
instances of multiple tasks and operators in a slot, we need a per-slot 
component to track these shared resources.

The MemoryManager is a good fit for that, because we also want to allocate 
these resources from the managed memory budget maintained by the MemoryManager.
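
A minimal sketch of the reference counting such a per-slot component implies 
(illustrative only, not the actual MemoryManager API):

{code:java}
import java.util.function.LongFunction;

// Illustrative: one shared resource per slot, created on first acquire and
// released again when the last user closes it.
public final class SharedResource<T extends AutoCloseable> {
    private final LongFunction<T> initializer;
    private final long budgetBytes;
    private T resource;
    private int refCount;

    public SharedResource(LongFunction<T> initializer, long budgetBytes) {
        this.initializer = initializer;
        this.budgetBytes = budgetBytes;
    }

    public synchronized T acquire() {
        if (resource == null) {
            // e.g. a RocksDB block cache sized from the managed memory budget
            resource = initializer.apply(budgetBytes);
        }
        refCount++;
        return resource;
    }

    public synchronized void release() throws Exception {
        if (refCount > 0 && --refCount == 0) {
            resource.close();
            resource = null;
        }
    }
}
{code}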






Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Robert Metzger
I've created a first draft of my plans in the wiki:
https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines.
I'm looking forward to your comments.

On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger  wrote:

> Thank you all for the positive feedback. I will start putting together a
> page in the wiki.
>
> @Jark: Azure Pipelines provides a free service that is even better than
> what Travis provides for free: 10 parallel builds with 6-hour timeouts.
>
> @Chesnay: I will answer your questions in the yet-to-be-written
> documentation in the wiki.
>
>
> On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  wrote:
>
>> +1 I had good experiences with Azure pipelines in the past.
>>
>> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
>> wrote:
>>
>> > +1
>> >
>> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
>> > like that we can grow by adding more machines.
>> >
>> > Best,
>> > Aljoscha
>> >
>> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
>> > >
>> > > +1 for Azure pipeline because it promises better performance.
>> > >
>> > > However, I have 2 concerns:
>> > >
>> > > 1) Travis provides personal free service for testing personal
>> branches.
>> > > Usually, contributors use this feature to test PoC or run CRON jobs
>> for
>> > > pull requests.
>> > >Using local machine will cost a lot of time. Does AZP provides the
>> > same
>> > > free service?
>> > > 2) Currently, we deployed a webhook [1] to receive Travis CI build
>> > > notifications [2] and send to bui...@flink.apache.org mailing list.
>> > >We need to figure out a way how to send Azure build results to the
>> > > mailing list. And this [3] might be the way to go.
>> > >
>> > > builds@f.a.o mailing list
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > [1]: https://github.com/wuchong/flink-notification-bot
>> > > [2]:
>> > >
>> >
>> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
>> > > [3]:
>> > >
>> >
>> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
>> > >
>> > >
>> > >
>> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
>> > >
>> > >> +1
>> > >>
>> > >> Till Rohrmann  wrote on Wed, Dec 4, 2019 at 10:43 PM:
>> > >>
>> > >>> +1 for moving to Azure pipelines as it promises better scalability
>> and
>> > >>> tooling. Looking forward to having faster builds and hence shorter
>> > >> feedback
>> > >>> cycles :-)
>> > >>>
>> > >>> Cheers,
>> > >>> Till
>> > >>>
>> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler > >
>> > >>> wrote:
>> > >>>
>> >  @robert Can you expand how the azure setup interacts with CiBot?
>> Do we
>> >  have to continue mirroring builds into flink-ci? How will the
>> cronjob
>> >  configuration work? We should have a general idea on how to
>> implement
>> >  this before proceeding.
>> >  Additionally, moving /all /jobs into flink-ci requires setting up
>> the
>> >  environment variables we have; can we set these up via files or
>> will
>> > we
>> >  have to give all committers permissions for flink-ci/flink?
>> > 
>> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
>> > > From what I've seen so far Azure will provide us a better
>> experience,
>> > > so I'd say +1 for the transition as a whole.
>> > >
>> > > I'd delay merge at least until the feature branch is cut.
>> > > Given the parental leave it may even make sense to only start
>> merging
>> > > in January afterwards, to reduce the total time taken for the
>> > >>> transition.
>> > >
>> > > Reviews could maybe be made earlier, but I'm wondering whether
>> anyone
>> > > would even have the time at the moment to do so.
>> > >
>> > > On 04/12/2019 12:35, Kurt Young wrote:
>> > >> Thanks Robert for driving this. There is another big pain point
>> of
>> > >> current
>> > >> travis,
>> > >> which is its cache mechanism will fail from time to time. Almost
>> > >> around 50%
>> > >> of
>> > >> the build fails are caused by cache problem. I opened this issue
>> to
>> > >> travis
>> > >> but
>> > >> got no response yet. So big +1 from my side.
>> > >>
>> > >> Just one comment, it's close to 1.10 feature freeze and we will
>> > >> spend
>> > >> some
>> > >> time
>> > >> to make tests stable before release. I wish this replacement can
>> > >>> happen
>> > >> after
>> > >> 1.10 release, otherwise it will be a unstable factor during
>> release
>> > >> testing.
>> > >>
>> > >> Best,
>> > >> Kurt
>> > >>
>> > >>
>> > >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu 
>> wrote:
>> > >>
>> > >>> Thanks Robert for the updates! And thanks a lot for all the
>> efforts
>> > >>> to
>> > >>> investigate, experiment and tune Azure Pipelines for Flink
>> > >> building.
>> > >>> Big +1 for it.
>> > >>>
>> > >>> It would be great that the community building can be extended
>> with
>> > >>> custom
>> > >

[jira] [Created] (FLINK-15083) Fix connectors only pick physical fields from TableSchema

2019-12-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-15083:
---

 Summary: Fix connectors only pick physical fields from TableSchema
 Key: FLINK-15083
 URL: https://issues.apache.org/jira/browse/FLINK-15083
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Common, Table SQL / Planner
Reporter: Jark Wu
 Fix For: 1.10.0


Currently, all the connectors derive the TableSchema from properties. But 
after introducing computed columns, the TableSchema contains generated columns. 
However, almost all the connectors use the derived TableSchema for 
{{TableSource#getTableSchema}} and {{TableSource#getProducedDataType}}, which 
may produce wrong results. 
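
A minimal sketch of the intended filtering, keeping only the physical 
(non-generated) columns (illustrative types; the planner's actual classes 
differ):

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative column model: computed (generated) columns must not be part of
// the physical schema a connector derives and produces.
public final class PhysicalSchemaSketch {
    static final class Column {
        final String name;
        final boolean generated;
        Column(String name, boolean generated) { this.name = name; this.generated = generated; }
    }

    static List<Column> physicalColumns(List<Column> all) {
        return all.stream().filter(c -> !c.generated).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
                new Column("id", false),
                new Column("ts", false),
                new Column("day", true)); // computed, e.g. DATE_FORMAT(ts, ...)
        // Only "id" and "ts" should back getProducedDataType().
        physicalColumns(schema).forEach(c -> System.out.println(c.name));
    }
}
{code}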





[jira] [Created] (FLINK-15082) Mesos App Master does not respect taskmanager.memory.total-process.size

2019-12-05 Thread Gary Yao (Jira)
Gary Yao created FLINK-15082:


 Summary: Mesos App Master does not respect 
taskmanager.memory.total-process.size
 Key: FLINK-15082
 URL: https://issues.apache.org/jira/browse/FLINK-15082
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Gary Yao
 Fix For: 1.10.0


*Description*
 When the Mesos App Master is started with 
{{taskmanager.memory.total-process.size}}, [the value is not 
respected|https://github.com/apache/flink/blob/d08beaa3255b3df96afe35f17e257df31a0d71ed/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosTaskManagerParameters.java#L339].
 

One can reproduce this when starting the App Master with the command below:
{noformat}
/bin/mesos-appmaster.sh \ 
-Dtaskmanager.memory.total-process.size=2048m \
-Djobmanager.heap.size=2048m \
...
{noformat}
The ClusterEntrypoint will fail with an exception (see below). The reason is 
that the default value of {{mesos.resourcemanager.tasks.mem}} will be taken as 
the total process memory size (1024 MB).
{noformat}
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to 
initialize the cluster entrypoint MesosSessionClusterEntrypoint.
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
at 
org.apache.flink.mesos.entrypoint.MesosSessionClusterEntrypoint.main(MesosSessionClusterEntrypoint.java:126)
Caused by: org.apache.flink.util.FlinkException: Could not create the 
DispatcherResourceManagerComponent.
at 
org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
... 2 more
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Sum of 
configured Framework Heap Memory (134217728 bytes), Framework Off-Heap Memory 
(134217728 bytes), Task Off-Heap Memory (0 bytes), Managed Memory (719407031 
bytes) and Shuffle Memory (80530638 bytes) exceed configured Total Flink Memory 
(805306368 bytes).
at 
org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveInternalMemoryFromTotalFlinkMemory(TaskExecutorResourceUtils.java:273)
at 
org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithTotalProcessMemory(TaskExecutorResourceUtils.java:210)
at 
org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:108)
at 
org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:94)
at 
org.apache.flink.mesos.runtime.clusterframework.MesosTaskManagerParameters.create(MesosTaskManagerParameters.java:341)
at 
org.apache.flink.mesos.util.MesosUtils.createTmParameters(MesosUtils.java:109)
at 
org.apache.flink.mesos.runtime.clusterframework.MesosResourceManagerFactory.createActiveResourceManager(MesosResourceManagerFactory.java:80)
at 
org.apache.flink.runtime.resourcemanager.ActiveResourceManagerFactory.createResourceManager(ActiveResourceManagerFactory.java:58)
at 
org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:170)
... 9 more
{noformat}
*Expected Behavior*
 * If taskmanager.memory.total-process.size and mesos.resourcemanager.tasks.mem 
are both set and differ in their values, an exception should be thrown
 * If only taskmanager.memory.total-process.size is set and 
mesos.resourcemanager.tasks.mem is not set, then the value configured by the 
former should be respected
 * If only mesos.resourcemanager.tasks.mem is set and 
taskmanager.memory.total-process.size is not set, then the value configured by 
the former should be respected
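
A minimal sketch of that expected resolution order; the plain-Map config and 
all names here are illustrative stand-ins, not the actual Flink configuration 
API:

{code:java}
import java.util.Map;

// Illustrative only: resolution of the two overlapping memory options;
// a plain Map stands in for Flink's Configuration API.
public final class MesosMemoryResolution {
    public static long resolveTotalProcessMemoryMb(Map<String, Long> conf) {
        Long processSize = conf.get("taskmanager.memory.total-process.size");
        Long mesosTaskMem = conf.get("mesos.resourcemanager.tasks.mem");
        if (processSize != null && mesosTaskMem != null
                && !processSize.equals(mesosTaskMem)) {
            throw new IllegalArgumentException(
                    "taskmanager.memory.total-process.size and "
                            + "mesos.resourcemanager.tasks.mem are both set but differ");
        }
        if (processSize != null) {
            return processSize;  // respect the unified option
        }
        if (mesosTaskMem != null) {
            return mesosTaskMem; // respect the Mesos-specific option
        }
        return 1024L;            // default of mesos.resourcemanager.tasks.mem
    }
}
{code}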



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15081) Translate "Concepts & Common API" page of Table API into Chinese

2019-12-05 Thread Steve OU (Jira)
Steve OU created FLINK-15081:


 Summary: Translate "Concepts & Common API" page of Table API into 
Chinese
 Key: FLINK-15081
 URL: https://issues.apache.org/jira/browse/FLINK-15081
 Project: Flink
  Issue Type: Task
  Components: chinese-translation
Reporter: Steve OU


The page url is 
[https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/common.html]
The markdown file is located in flink/docs/dev/table/common.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Trevor Grant
You can infer that by looking at the "bounce rate", e.g. someone gets to the
page, looks at it, realizes it's trash, and clicks "back".



On Thu, Dec 5, 2019 at 8:46 AM Chesnay Schepler  wrote:

> Question now is whether the numbers are so low because the docs aren't
> required or because they are so bad.
>
> On 05/12/2019 14:26, Robert Metzger wrote:
> > I just checked GA:
> >
> > All numbers are for the last month, independent of the Flink version:
> > aws.html: 918 pageviews
> > mapr_setup.html: 108 pageviews
> > gce_setup.html: 256 pageviews
> >
> > Some other deployment-related pages for reference:
> > yarn_setup: 4687
> > cluster: 4284
> > kubernetes: 3428
> >
> >
> >
> >
> >
> > On Thu, Dec 5, 2019 at 1:53 PM Trevor Grant 
> > wrote:
> >
> >> Same as Ufuk (non-binding)
> >>
> >> In general, docs pages are great "first commits" to leave out there as
> >> newb-issues.
> >>
> >> Also though, worth checking how often people use the page (e.g. GA)
> >>
> >> 3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to fix
> them
> >> (and put a readme explaining why they are in `.bu` status (which should
> >> prevent the build from picking them up), in essence, you're commenting
> them
> >> all out until someone can come around fix them.
> >>
> >> Further, I would put a holder page in their place that says something
> like,
> >> "This works, but we need someone to update the docs- check out JIRA 
> >> for more details", might get someone to clean em up sooner w a little
> >> advertising.
> >>
> >> Just my .02
> >>
> >> I can't do full overhauls right now, but I could execute the "comment
> out"
> >> option if it comes to that.
> >>
> >>
> >> On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi  wrote:
> >>
> >>> +1 to drop the MapR page.
> >>>
> >>> For the other two I'm +0. I fully agree that the linked AWS and GCE
> pages
> >>> are in bad shape and don't relate to a component developed by the
> >>> community. Do we have any numbers from Google Analytics on how popular
> >>> those pages are? If they are somewhat popular, I would prefer to "fix
> >> them"
> >>> to be good starting points for users in those environments (probably by
> >>> boiling them down to saying something simple such as "You should use
> >>> FileSystem [...] and point it to [...].").
> >>>
> >>> – Ufuk
> >>>
> >>> On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann 
> >>> wrote:
>  If the community cannot manage to keep the vendor-specific
> >> documentation
> >>> up
>  to date, then I believe it is better to drop it. Hence +1 for the
> >>> proposal.
>  Cheers,
>  Till
> 
>  On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek 
> >>> wrote:
> > +1
> >
> > Best,
> > Aljoscha
> >
> >> On 2. Dec 2019, at 18:38, Konstantin Knauf <
> >> konstan...@ververica.com
> > wrote:
> >> +1 from my side to drop.
> >>
> >> On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman 
> >>> wrote:
> >>> Hi all,
> >>>
> >>> I'd like to discuss dropping vendor-specific deployment
> >>> documentation
> > from
> >>> Flink's official docs. To be clear, I am *NOT* suggesting we drop
> >>> any of
> >>> the filesystem documentation, but the following three pages.
> >>>
> >>> AWS:
> >>>
> >>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> >>> Google Compute Engine:
> >>>
> >>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
> >>> MapR:
> >>>
> >>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
> >>> Unlike the filesystems, these docs do not refer to components
> > maintained by
> >>> the Apache Flink community, but external commercial services and
> > products.
> >>> None of these pages are well maintained and I do not think the
> > open-source
> >>> community can reasonably be expected to keep them up to date. In
> >>> particular,
> >>>
> >>>
> >>>- The AWS page contains sparse information and mostly just links
> >>> to
> > the
> >>>official EMR docs.
> >>>- The Google Compute Engine page is out of date and the commands
> >>> do
> > not
> >>>work.
> >>>- MapR contains some relevant information but the community has
> > already
> >>>dropped the MapR filesystem so I am not sure that deployment
> >> would
> > work
> >>> (I
> >>>have not tested).
> >>>
> >>> There is also a larger question of which vendor products should be
> > included
> >>> and which should not. That is why I would like to suggest dropping
> >>> these
> >>> pages and referring users to vendor maintained documentation
> >>> whenever
> > they
> >>> are using one of these services.
> >>>
> >>> Seth Wiesman
> >>>
> >>
> >> --
> >>
> >> Konstantin Knauf | Solutions Architect
> >>
> >>

Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Chesnay Schepler
The question now is whether the numbers are so low because the docs aren't 
needed or because they are so bad.


On 05/12/2019 14:26, Robert Metzger wrote:

I just checked GA:

All numbers are for the last month, independent of the Flink version:
aws.html: 918 pageviews
mapr_setup.html: 108 pageviews
gce_setup.html: 256 pageviews

Some other deployment-related pages for reference:
yarn_setup: 4687
cluster: 4284
kubernetes: 3428





On Thu, Dec 5, 2019 at 1:53 PM Trevor Grant 
wrote:


Same as Ufuk (non-binding)

In general, docs pages are great "first commits" to leave out there as
newb-issues.

Also though, worth checking how often people use the page (e.g. GA)

3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to fix them
(and put a readme explaining why they are in `.bu` status (which should
prevent the build from picking them up), in essence, you're commenting them
all out until someone can come around fix them.

Further, I would put a holder page in their place that says something like,
"This works, but we need someone to update the docs- check out JIRA 
for more details", might get someone to clean em up sooner w a little
advertising.

Just my .02

I can't do full overhauls right now, but I could execute the "comment out"
option if it comes to that.


On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi  wrote:


+1 to drop the MapR page.

For the other two I'm +0. I fully agree that the linked AWS and GCE pages
are in bad shape and don't relate to a component developed by the
community. Do we have any numbers from Google Analytics on how popular
those pages are? If they are somewhat popular, I would prefer to "fix

them"

to be good starting points for users in those environments (probably by
boiling them down to saying something simple such as "You should use
FileSystem [...] and point it to [...].").

– Ufuk

On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann 
wrote:

If the community cannot manage to keep the vendor-specific

documentation

up

to date, then I believe it is better to drop it. Hence +1 for the

proposal.

Cheers,
Till

On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek 

wrote:

+1

Best,
Aljoscha


On 2. Dec 2019, at 18:38, Konstantin Knauf <

konstan...@ververica.com

wrote:

+1 from my side to drop.

On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman 

wrote:

Hi all,

I'd like to discuss dropping vendor-specific deployment

documentation

from

Flink's official docs. To be clear, I am *NOT* suggesting we drop

any of

the filesystem documentation, but the following three pages.

AWS:





https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html

Google Compute Engine:





https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html

MapR:





https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html

Unlike the filesystems, these docs do not refer to components

maintained by

the Apache Flink community, but external commercial services and

products.

None of these pages are well maintained and I do not think the

open-source

community can reasonably be expected to keep them up to date. In
particular,


   - The AWS page contains sparse information and mostly just links

to

the

   official EMR docs.
   - The Google Compute Engine page is out of date and the commands

do

not

   work.
   - MapR contains some relevant information but the community has

already

   dropped the MapR filesystem so I am not sure that deployment

would

work

(I
   have not tested).

There is also a larger question of which vendor products should be

included

and which should not. That is why I would like to suggest dropping

these

pages and referring users to vendor maintained documentation

whenever

they

are using one of these services.

Seth Wiesman



--

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica 


--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng






Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Trevor Grant
Based on that, the only "maybe" is AWS, and I just googled it: the AWS docs
pretty well own the first page (flink.apache.org shows up 3/4 of the way down
the first page, behind the AWS docs).

I revise my "vote" to +1 to dump the whole thing.

On Thu, Dec 5, 2019 at 7:26 AM Robert Metzger  wrote:

> I just checked GA:
>
> All numbers are for the last month, independent of the Flink version:
> aws.html: 918 pageviews
> mapr_setup.html: 108 pageviews
> gce_setup.html: 256 pageviews
>
> Some other deployment-related pages for reference:
> yarn_setup: 4687
> cluster: 4284
> kubernetes: 3428
>
>
>
>
>
> On Thu, Dec 5, 2019 at 1:53 PM Trevor Grant 
> wrote:
>
> > Same as Ufuk (non-binding)
> >
> > In general, docs pages are great "first commits" to leave out there as
> > newb-issues.
> >
> > Also though, worth checking how often people use the page (e.g. GA)
> >
> > 3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to fix
> them
> > (and put a readme explaining why they are in `.bu` status (which should
> > prevent the build from picking them up), in essence, you're commenting
> them
> > all out until someone can come around fix them.
> >
> > Further, I would put a holder page in their place that says something
> like,
> > "This works, but we need someone to update the docs- check out JIRA 
> > for more details", might get someone to clean em up sooner w a little
> > advertising.
> >
> > Just my .02
> >
> > I can't do full overhauls right now, but I could execute the "comment
> out"
> > option if it comes to that.
> >
> >
> > On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi  wrote:
> >
> > > +1 to drop the MapR page.
> > >
> > > For the other two I'm +0. I fully agree that the linked AWS and GCE
> pages
> > > are in bad shape and don't relate to a component developed by the
> > > community. Do we have any numbers from Google Analytics on how popular
> > > those pages are? If they are somewhat popular, I would prefer to "fix
> > them"
> > > to be good starting points for users in those environments (probably by
> > > boiling them down to saying something simple such as "You should use
> > > FileSystem [...] and point it to [...].").
> > >
> > > – Ufuk
> > >
> > > On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann 
> > > wrote:
> > > >
> > > > If the community cannot manage to keep the vendor-specific
> > documentation
> > > up
> > > > to date, then I believe it is better to drop it. Hence +1 for the
> > > proposal.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek  >
> > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Best,
> > > > > Aljoscha
> > > > >
> > > > > > On 2. Dec 2019, at 18:38, Konstantin Knauf <
> > konstan...@ververica.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > +1 from my side to drop.
> > > > > >
> > > > > > On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman  >
> > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I'd like to discuss dropping vendor-specific deployment
> > > documentation
> > > > > from
> > > > > >> Flink's official docs. To be clear, I am *NOT* suggesting we
> drop
> > > any of
> > > > > >> the filesystem documentation, but the following three pages.
> > > > > >>
> > > > > >> AWS:
> > > > > >>
> > > > > >>
> > > > >
> > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> > > > > >> Google Compute Engine:
> > > > > >>
> > > > > >>
> > > > >
> > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
> > > > > >> MapR:
> > > > > >>
> > > > > >>
> > > > >
> > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
> > > > > >>
> > > > > >> Unlike the filesystems, these docs do not refer to components
> > > > > maintained by
> > > > > >> the Apache Flink community, but external commercial services and
> > > > > products.
> > > > > >> None of these pages are well maintained and I do not think the
> > > > > open-source
> > > > > >> community can reasonably be expected to keep them up to date. In
> > > > > >> particular,
> > > > > >>
> > > > > >>
> > > > > >>   - The AWS page contains sparse information and mostly just
> links
> > > to
> > > > > the
> > > > > >>   official EMR docs.
> > > > > >>   - The Google Compute Engine page is out of date and the
> commands
> > > do
> > > > > not
> > > > > >>   work.
> > > > > >>   - MapR contains some relevant information but the community
> has
> > > > > already
> > > > > >>   dropped the MapR filesystem so I am not sure that deployment
> > would
> > > > > work
> > > > > >> (I
> > > > > >>   have not tested).
> > > > > >>
> > > > > >> There is also a larger question of which vendor products should
> be
> > > > > included
> > > > > >> and which should not. That is why I would like to suggest
> dropping
> > > these
> > > > > >> pages and referring users to vendor maintained documentation
> > > whenever
> > > > > they
> > > > > >> are using one of these

[VOTE] Release 1.8.3, release candidate #3

2019-12-05 Thread Hequn Cheng
Hi everyone,

Please review and vote on the release candidate #3 for the version 1.8.3,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.8.3-rc3" [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours.
Please cast your votes before *Dec. 10th 2019, 16:00 UTC*.

It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Hequn

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346112
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.3-rc3/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1314/
[5]
https://github.com/apache/flink/commit/d54807ba10d0392a60663f030f9fe0bfa1c66754
[6] https://github.com/apache/flink-web/pull/285


Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Robert Metzger
I just checked GA:

All numbers are for the last month, independent of the Flink version:
aws.html: 918 pageviews
mapr_setup.html: 108 pageviews
gce_setup.html: 256 pageviews

Some other deployment-related pages for reference:
yarn_setup: 4687
cluster: 4284
kubernetes: 3428





On Thu, Dec 5, 2019 at 1:53 PM Trevor Grant 
wrote:

> Same as Ufuk (non-binding)
>
> In general, docs pages are great "first commits" to leave out there as
> newb-issues.
>
> Also though, worth checking how often people use the page (e.g. GA)
>
> 3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to fix them
> (and put a readme explaining why they are in `.bu` status (which should
> prevent the build from picking them up), in essence, you're commenting them
> all out until someone can come around fix them.
>
> Further, I would put a holder page in their place that says something like,
> "This works, but we need someone to update the docs- check out JIRA 
> for more details", might get someone to clean em up sooner w a little
> advertising.
>
> Just my .02
>
> I can't do full overhauls right now, but I could execute the "comment out"
> option if it comes to that.
>
>
> On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi  wrote:
>
> > +1 to drop the MapR page.
> >
> > For the other two I'm +0. I fully agree that the linked AWS and GCE pages
> > are in bad shape and don't relate to a component developed by the
> > community. Do we have any numbers from Google Analytics on how popular
> > those pages are? If they are somewhat popular, I would prefer to "fix
> them"
> > to be good starting points for users in those environments (probably by
> > boiling them down to saying something simple such as "You should use
> > FileSystem [...] and point it to [...].").
> >
> > – Ufuk
> >
> > On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann 
> > wrote:
> > >
> > > If the community cannot manage to keep the vendor-specific
> documentation
> > up
> > > to date, then I believe it is better to drop it. Hence +1 for the
> > proposal.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek 
> > wrote:
> > >
> > > > +1
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > > > On 2. Dec 2019, at 18:38, Konstantin Knauf <
> konstan...@ververica.com
> > >
> > > > wrote:
> > > > >
> > > > > +1 from my side to drop.
> > > > >
> > > > > On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman 
> > wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> I'd like to discuss dropping vendor-specific deployment
> > documentation
> > > > from
> > > > >> Flink's official docs. To be clear, I am *NOT* suggesting we drop
> > any of
> > > > >> the filesystem documentation, but the following three pages.
> > > > >>
> > > > >> AWS:
> > > > >>
> > > > >>
> > > >
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> > > > >> Google Compute Engine:
> > > > >>
> > > > >>
> > > >
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
> > > > >> MapR:
> > > > >>
> > > > >>
> > > >
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
> > > > >>
> > > > >> Unlike the filesystems, these docs do not refer to components
> > > > maintained by
> > > > >> the Apache Flink community, but external commercial services and
> > > > products.
> > > > >> None of these pages are well maintained and I do not think the
> > > > open-source
> > > > >> community can reasonably be expected to keep them up to date. In
> > > > >> particular,
> > > > >>
> > > > >>
> > > > >>   - The AWS page contains sparse information and mostly just links
> > to
> > > > the
> > > > >>   official EMR docs.
> > > > >>   - The Google Compute Engine page is out of date and the commands
> > do
> > > > not
> > > > >>   work.
> > > > >>   - MapR contains some relevant information but the community has
> > > > already
> > > > >>   dropped the MapR filesystem so I am not sure that deployment
> would
> > > > work
> > > > >> (I
> > > > >>   have not tested).
> > > > >>
> > > > >> There is also a larger question of which vendor products should be
> > > > included
> > > > >> and which should not. That is why I would like to suggest dropping
> > these
> > > > >> pages and referring users to vendor maintained documentation
> > whenever
> > > > they
> > > > >> are using one of these services.
> > > > >>
> > > > >> Seth Wiesman
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Konstantin Knauf | Solutions Architect
> > > > >
> > > > > +49 160 91394525
> > > > >
> > > > >
> > > > > Follow us @VervericaData Ververica 
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Join Flink Forward  - The Apache Flink
> > > > > Conference
> > > > >
> > > > > Stream Processing | Event Driven | Real Time
> > > > >
> > > > > --
> > > > >
> > > > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > > > >
> > > > > --
> >

[jira] [Created] (FLINK-15080) Deploy OSS filesystem to maven central

2019-12-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15080:


 Summary: Deploy OSS filesystem to maven central
 Key: FLINK-15080
 URL: https://issues.apache.org/jira/browse/FLINK-15080
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Connectors / FileSystem, Release System
Affects Versions: 1.8.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0


Just noticed that the OSS filesystem isn't being deployed.

I see no reason why this is the case; we are deploying all other artifacts and 
it just makes it more difficult to access older versions (since you'd have to 
splice them out of a distribution).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Trevor Grant
Same as Ufuk (non-binding)

In general, docs pages are great "first commits" to leave out there as
newb-issues.

Also though, worth checking how often people use the page (e.g. GA)

3rd option: add a `.bu` to AWS/GCE pages and open a JIRA ticket to fix them
(and put a readme explaining why they are in `.bu` status, which should
prevent the build from picking them up). In essence, you're commenting them
all out until someone can come around to fix them.

Further, I would put a holder page in their place that says something like,
"This works, but we need someone to update the docs - check out JIRA
for more details", which might get someone to clean them up sooner with a
little advertising.

Just my .02

I can't do full overhauls right now, but I could execute the "comment out"
option if it comes to that.


On Thu, Dec 5, 2019 at 6:35 AM Ufuk Celebi  wrote:

> +1 to drop the MapR page.
>
> For the other two I'm +0. I fully agree that the linked AWS and GCE pages
> are in bad shape and don't relate to a component developed by the
> community. Do we have any numbers from Google Analytics on how popular
> those pages are? If they are somewhat popular, I would prefer to "fix them"
> to be good starting points for users in those environments (probably by
> boiling them down to saying something simple such as "You should use
> FileSystem [...] and point it to [...].").
>
> – Ufuk
>
> On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann 
> wrote:
> >
> > If the community cannot manage to keep the vendor-specific documentation
> up
> > to date, then I believe it is better to drop it. Hence +1 for the
> proposal.
> >
> > Cheers,
> > Till
> >
> > On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek 
> wrote:
> >
> > > +1
> > >
> > > Best,
> > > Aljoscha
> > >
> > > > On 2. Dec 2019, at 18:38, Konstantin Knauf  >
> > > wrote:
> > > >
> > > > +1 from my side to drop.
> > > >
> > > > On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman 
> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I'd like to discuss dropping vendor-specific deployment
> documentation
> > > from
> > > >> Flink's official docs. To be clear, I am *NOT* suggesting we drop
> any of
> > > >> the filesystem documentation, but the following three pages.
> > > >>
> > > >> AWS:
> > > >>
> > > >>
> > >
>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> > > >> Google Compute Engine:
> > > >>
> > > >>
> > >
>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
> > > >> MapR:
> > > >>
> > > >>
> > >
>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
> > > >>
> > > >> Unlike the filesystems, these docs do not refer to components
> > > maintained by
> > > >> the Apache Flink community, but external commercial services and
> > > products.
> > > >> None of these pages are well maintained and I do not think the
> > > open-source
> > > >> community can reasonably be expected to keep them up to date. In
> > > >> particular,
> > > >>
> > > >>
> > > >>   - The AWS page contains sparse information and mostly just links
> to
> > > the
> > > >>   official EMR docs.
> > > >>   - The Google Compute Engine page is out of date and the commands
> do
> > > not
> > > >>   work.
> > > >>   - MapR contains some relevant information but the community has
> > > already
> > > >>   dropped the MapR filesystem so I am not sure that deployment would
> > > work
> > > >> (I
> > > >>   have not tested).
> > > >>
> > > >> There is also a larger question of which vendor products should be
> > > included
> > > >> and which should not. That is why I would like to suggest dropping
> these
> > > >> pages and referring users to vendor maintained documentation
> whenever
> > > they
> > > >> are using one of these services.
> > > >>
> > > >> Seth Wiesman
> > > >>
> > > >
> > > >
> > > > --
> > > >
> > > > Konstantin Knauf | Solutions Architect
> > > >
> > > > +49 160 91394525
> > > >
> > > >
> > > > Follow us @VervericaData Ververica 
> > > >
> > > >
> > > > --
> > > >
> > > > Join Flink Forward  - The Apache Flink
> > > > Conference
> > > >
> > > > Stream Processing | Event Driven | Real Time
> > > >
> > > > --
> > > >
> > > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > > >
> > > > --
> > > > Ververica GmbH
> > > > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason,
> Ji
> > > > (Tony) Cheng
> > >
> > >
>


Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Ufuk Celebi
+1 to drop the MapR page.

For the other two I'm +0. I fully agree that the linked AWS and GCE pages
are in bad shape and don't relate to a component developed by the
community. Do we have any numbers from Google Analytics on how popular
those pages are? If they are somewhat popular, I would prefer to "fix them"
to be good starting points for users in those environments (probably by
boiling them down to saying something simple such as "You should use
FileSystem [...] and point it to [...].").

– Ufuk

On Thu, Dec 5, 2019 at 10:49 AM Till Rohrmann  wrote:
>
> If the community cannot manage to keep the vendor-specific documentation
up
> to date, then I believe it is better to drop it. Hence +1 for the
proposal.
>
> Cheers,
> Till
>
> On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek 
wrote:
>
> > +1
> >
> > Best,
> > Aljoscha
> >
> > > On 2. Dec 2019, at 18:38, Konstantin Knauf 
> > wrote:
> > >
> > > +1 from my side to drop.
> > >
> > > On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman 
wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'd like to discuss dropping vendor-specific deployment documentation
> > from
> > >> Flink's official docs. To be clear, I am *NOT* suggesting we drop
any of
> > >> the filesystem documentation, but the following three pages.
> > >>
> > >> AWS:
> > >>
> > >>
> >
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> > >> Google Compute Engine:
> > >>
> > >>
> >
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
> > >> MapR:
> > >>
> > >>
> >
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
> > >>
> > >> Unlike the filesystems, these docs do not refer to components
> > maintained by
> > >> the Apache Flink community, but external commercial services and
> > products.
> > >> None of these pages are well maintained and I do not think the
> > open-source
> > >> community can reasonably be expected to keep them up to date. In
> > >> particular,
> > >>
> > >>
> > >>   - The AWS page contains sparse information and mostly just links to
> > the
> > >>   official EMR docs.
> > >>   - The Google Compute Engine page is out of date and the commands do
> > not
> > >>   work.
> > >>   - MapR contains some relevant information but the community has
> > already
> > >>   dropped the MapR filesystem so I am not sure that deployment would
> > work
> > >> (I
> > >>   have not tested).
> > >>
> > >> There is also a larger question of which vendor products should be
> > included
> > >> and which should not. That is why I would like to suggest dropping
these
> > >> pages and referring users to vendor maintained documentation whenever
> > they
> > >> are using one of these services.
> > >>
> > >> Seth Wiesman
> > >>
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf | Solutions Architect
> > >
> > > +49 160 91394525
> > >
> > >
> > > Follow us @VervericaData Ververica 
> > >
> > >
> > > --
> > >
> > > Join Flink Forward  - The Apache Flink
> > > Conference
> > >
> > > Stream Processing | Event Driven | Real Time
> > >
> > > --
> > >
> > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > >
> > > --
> > > Ververica GmbH
> > > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason,
Ji
> > > (Tony) Cheng
> >
> >


[jira] [Created] (FLINK-15079) BashJavaUtilsTest fails when running in a clean directory

2019-12-05 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-15079:


 Summary: BashJavaUtilsTest fails when running in a clean directory
 Key: FLINK-15079
 URL: https://issues.apache.org/jira/browse/FLINK-15079
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.10.0
Reporter: Dawid Wysakowicz
 Fix For: 1.10.0


The {{BashJavaUtilsTest}} fails if it is run in a clean directory.
For example, if you do what the Flink documentation suggests for building a 
Flink binary from source 
(https://ci.apache.org/projects/flink/flink-docs-release-1.9/flinkDev/building.html#dependency-shading):

{code}
cd flink-dist
mvn clean install
{code}

I think the problem is that the test tries to find the results of the shading, 
but the shading runs after the tests.

I think one solution would be to move this test to e2e tests.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15078) Report Python version exceptions in time

2019-12-05 Thread sunjincheng (Jira)
sunjincheng created FLINK-15078:
---

 Summary: Report Python version exceptions in time
 Key: FLINK-15078
 URL: https://issues.apache.org/jira/browse/FLINK-15078
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: sunjincheng
 Fix For: 1.11.0
 Attachments: flink-jincheng.sunjc-python-udf-boot-jincheng.local.log

Python version exceptions are not reported in time when using Python 2.7 with 
Flink 1.10.

We should apply the following config:

{code}
t_env.get_config().set_python_executable("python3")
{code}
Although we can work around this issue via configuration, I think it's better 
to report Python version exceptions in time. 

Short error message:

{code}
RuntimeError: Python versions prior to 3.5 are not supported for PyFlink 
[sys.version_info(major=2, minor=7, micro=16, releaselevel='final', serial=0)].
{code}

The detail info can be found in Attachment.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] Drop Heap Backend Synchronous snapshots

2019-12-05 Thread Stephan Ewen
Hi all!

I am wondering if there is any case for retaining the option to make
synchronous snapshots on the heap statebackend. Is anyone using that? Or
could we clean that code up and remove it?

Best,
Stephan


[jira] [Created] (FLINK-15077) Support Semi/Anti LookupJoin in Blink planner

2019-12-05 Thread Jing Zhang (Jira)
Jing Zhang created FLINK-15077:
--

 Summary: Support Semi/Anti LookupJoin in Blink planner
 Key: FLINK-15077
 URL: https://issues.apache.org/jira/browse/FLINK-15077
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jing Zhang


Support the following SQL in the Blink planner:

{code:sql}
SELECT T.id, T.len, T.content FROM T WHERE T.id IN (
  SELECT id FROM csvDim FOR SYSTEM_TIME AS OF PROCTIME() AS D)
{code}

{code:sql}
SELECT T.id, T.len, T.content FROM T WHERE EXISTS (
  SELECT * FROM csvDim FOR SYSTEM_TIME AS OF PROCTIME() AS D WHERE T.id = D.id)
{code}

{code:sql}
SELECT T.id, T.len, T.content FROM T WHERE NOT EXISTS (
  SELECT * FROM csvDim FOR SYSTEM_TIME AS OF PROCTIME() AS D WHERE T.id = D.id)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Robert Metzger
Thank you all for the positive feedback. I will start putting together a
page in the wiki.

@Jark: Azure Pipelines provides a free service that is even better than
what Travis provides for free: 10 parallel builds with 6-hour timeouts.

@Chesnay: I will answer your questions in the yet-to-be-written
documentation in the wiki.


On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  wrote:

> +1 I had good experiences with Azure pipelines in the past.
>
> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
> wrote:
>
> > +1
> >
> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
> > like that we can grow by adding more machines.
> >
> > Best,
> > Aljoscha
> >
> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> > >
> > > +1 for Azure pipeline because it promises better performance.
> > >
> > > However, I have 2 concerns:
> > >
> > > 1) Travis provides personal free service for testing personal branches.
> > > Usually, contributors use this feature to test PoC or run CRON jobs for
> > > pull requests.
> > >Using local machine will cost a lot of time. Does AZP provides the
> > same
> > > free service?
> > > 2) Currently, we deployed a webhook [1] to receive Travis CI build
> > > notifications [2] and send to bui...@flink.apache.org mailing list.
> > >We need to figure out a way how to send Azure build results to the
> > > mailing list. And this [3] might be the way to go.
> > >
> > > builds@f.a.o mailing list
> > >
> > > Best,
> > > Jark
> > >
> > > [1]: https://github.com/wuchong/flink-notification-bot
> > > [2]:
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > > [3]:
> > >
> >
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> > >
> > >
> > >
> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
> > >
> > >> +1
> > >>
> > >> Till Rohrmann  于2019年12月4日周三 下午10:43写道:
> > >>
> > >>> +1 for moving to Azure pipelines as it promises better scalability
> and
> > >>> tooling. Looking forward to having faster builds and hence shorter
> > >> feedback
> > >>> cycles :-)
> > >>>
> > >>> Cheers,
> > >>> Till
> > >>>
> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
> > >>> wrote:
> > >>>
> >  @robert Can you expand how the azure setup interacts with CiBot? Do
> we
> >  have to continue mirroring builds into flink-ci? How will the
> cronjob
> >  configuration work? We should have a general idea on how to
> implement
> >  this before proceeding.
> >  Additionally, moving /all /jobs into flink-ci requires setting up
> the
> >  environment variables we have; can we set these up via files or will
> > we
> >  have to give all committers permissions for flink-ci/flink?
> > 
> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
> > > From what I've seen so far Azure will provide us a better
> experience,
> > > so I'd say +1 for the transition as a whole.
> > >
> > > I'd delay merge at least until the feature branch is cut.
> > > Given the parental leave it may even make sense to only start
> merging
> > > in January afterwards, to reduce the total time taken for the
> > >>> transition.
> > >
> > > Reviews could maybe be made earlier, but I'm wondering whether
> anyone
> > > would even have the time at the moment to do so.
> > >
> > > On 04/12/2019 12:35, Kurt Young wrote:
> > >> Thanks Robert for driving this. There is another big pain point of
> > >> current
> > >> travis,
> > >> which is its cache mechanism will fail from time to time. Almost
> > >> around 50%
> > >> of
> > >> the build fails are caused by cache problem. I opened this issue
> to
> > >> travis
> > >> but
> > >> got no response yet. So big +1 from my side.
> > >>
> > >> Just one comment, it's close to 1.10 feature freeze and we will
> > >> spend
> > >> some
> > >> time
> > >> to make tests stable before release. I wish this replacement can
> > >>> happen
> > >> after
> > >> 1.10 release, otherwise it will be a unstable factor during
> release
> > >> testing.
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >>
> > >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
> > >>
> > >>> Thanks Robert for the updates! And thanks a lot for all the
> efforts
> > >>> to
> > >>> investigate, experiment and tune Azure Pipelines for Flink
> > >> building.
> > >>> Big +1 for it.
> > >>>
> > >>> It would be great that the community building can be extended
> with
> > >>> custom
> > >>> machines so that the tests would not be queued for long with
> daily
> > >>> growing
> > >>> PRs.
> > >>>
> > >>> The increased timeout would be also very helpful.
> > >>> The 50min timeout for free travis accounts is a pain currently,
> > >>> especially
> > >>> when we'd like to run e2e tests in our own travis. And I had to
> > >>> manually
> > >>> split th

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Arvid Heise
+1 I had good experiences with Azure pipelines in the past.

On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
wrote:

> +1
>
> Thanks for the effort! The tooling seems to be quite a bit nicer and I
> like that we can grow by adding more machines.
>
> Best,
> Aljoscha
>
> > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> >
> > +1 for Azure pipeline because it promises better performance.
> >
> > However, I have 2 concerns:
> >
> > 1) Travis provides personal free service for testing personal branches.
> > Usually, contributors use this feature to test PoC or run CRON jobs for
> > pull requests.
> >Using local machine will cost a lot of time. Does AZP provides the
> same
> > free service?
> > 2) Currently, we deployed a webhook [1] to receive Travis CI build
> > notifications [2] and send to bui...@flink.apache.org mailing list.
> >We need to figure out a way how to send Azure build results to the
> > mailing list. And this [3] might be the way to go.
> >
> > builds@f.a.o mailing list
> >
> > Best,
> > Jark
> >
> > [1]: https://github.com/wuchong/flink-notification-bot
> > [2]:
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > [3]:
> >
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> >
> >
> >
> > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
> >
> >> +1
> >>
> >> Till Rohrmann  于2019年12月4日周三 下午10:43写道:
> >>
> >>> +1 for moving to Azure pipelines as it promises better scalability and
> >>> tooling. Looking forward to having faster builds and hence shorter
> >> feedback
> >>> cycles :-)
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
> >>> wrote:
> >>>
>  @robert Can you expand how the azure setup interacts with CiBot? Do we
>  have to continue mirroring builds into flink-ci? How will the cronjob
>  configuration work? We should have a general idea on how to implement
>  this before proceeding.
>  Additionally, moving /all /jobs into flink-ci requires setting up the
>  environment variables we have; can we set these up via files or will
> we
>  have to give all committers permissions for flink-ci/flink?
> 
>  On 04/12/2019 12:55, Chesnay Schepler wrote:
> > From what I've seen so far Azure will provide us a better experience,
> > so I'd say +1 for the transition as a whole.
> >
> > I'd delay merge at least until the feature branch is cut.
> > Given the parental leave it may even make sense to only start merging
> > in January afterwards, to reduce the total time taken for the
> >>> transition.
> >
> > Reviews could maybe be made earlier, but I'm wondering whether anyone
> > would even have the time at the moment to do so.
> >
> > On 04/12/2019 12:35, Kurt Young wrote:
> >> Thanks Robert for driving this. There is another big pain point of
> >> current
> >> travis,
> >> which is its cache mechanism will fail from time to time. Almost
> >> around 50%
> >> of
> >> the build fails are caused by cache problem. I opened this issue to
> >> travis
> >> but
> >> got no response yet. So big +1 from my side.
> >>
> >> Just one comment, it's close to 1.10 feature freeze and we will
> >> spend
> >> some
> >> time
> >> to make tests stable before release. I wish this replacement can
> >>> happen
> >> after
> >> 1.10 release, otherwise it will be a unstable factor during release
> >> testing.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
> >>
> >>> Thanks Robert for the updates! And thanks a lot for all the efforts
> >>> to
> >>> investigate, experiment and tune Azure Pipelines for Flink
> >> building.
> >>> Big +1 for it.
> >>>
> >>> It would be great that the community building can be extended with
> >>> custom
> >>> machines so that the tests would not be queued for long with daily
> >>> growing
> >>> PRs.
> >>>
> >>> The increased timeout would be also very helpful.
> >>> The 50min timeout for free travis accounts is a pain currently,
> >>> especially
> >>> when we'd like to run e2e tests in our own travis. And I had to
> >>> manually
> >>> split the jobs to make it possible to pass.
> >>>
> >>> Thanks,
> >>> Zhu Zhu
> >>>
> >>> Robert Metzger  于2019年12月4日周三 下午6:36写道:
> >>>
>  Hi all,
> 
>  as a follow up from our discussion on reducing the build time
> >> [1], I
> >>> would
>  like to propose migrating our build infrastructure to Azure
> >>> Pipelines
> >>> (away
>  from Travis).
> 
>  I believe that we have reached the limits of what Travis can
>  provide the
>  Flink community, and I don't want the build system to limit or
>  influence
>  the project's growth.
> 
>  *Bene

[jira] [Created] (FLINK-15076) Source thread should be interrupted during the Task cancellation

2019-12-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-15076:
--

 Summary: Source thread should be interrupted during the Task 
cancellation 
 Key: FLINK-15076
 URL: https://issues.apache.org/jira/browse/FLINK-15076
 Project: Flink
  Issue Type: Bug
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski
 Fix For: 1.10.0


The source thread should be interrupted in more or less the same way the task 
thread is interrupted.

+/- `StreamTaskTest#testCancellationNotBlockedOnLock` should also work in case 
the mailbox (task) thread is blocked trying to acquire the `checkpointLock` 
from within some other currently executed mail (a processing-time timer or a 
checkpoint being performed).

https://github.com/apache/flink/pull/10345#discussion_r353615760
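
A hedged sketch of the intended behaviour; the class, field, and method names 
below are illustrative, not the actual StreamTask internals:

{code:java}
// Illustrative only: cancellation interrupts the dedicated source thread
// the same way the task thread is interrupted, so that a source blocked
// in I/O or on a lock wakes up promptly.
public final class SourceThreadCancellation {
    private final Thread sourceThread;
    // the source loop is expected to check this flag and exit
    private volatile boolean canceled;

    public SourceThreadCancellation(Thread sourceThread) {
        this.sourceThread = sourceThread;
    }

    public void cancel() {
        canceled = true;
        if (sourceThread.isAlive()) {
            sourceThread.interrupt();
        }
    }
}
{code}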



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15075) Cannot run "create table" in IT case with expressions in TableSchema because the copied ParameterScope cannot be seen

2019-12-05 Thread Danny Chen (Jira)
Danny Chen created FLINK-15075:
--

 Summary: Cannot run "create table" in IT case with expressions in 
TableSchema because the copied ParameterScope cannot be seen
 Key: FLINK-15075
 URL: https://issues.apache.org/jira/browse/FLINK-15075
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.9.1
Reporter: Danny Chen


The copied ParameterScope cannot be seen in the LocalExecutorITCase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Aljoscha Krettek
+1

Thanks for the effort! The tooling seems to be quite a bit nicer and I like 
that we can grow by adding more machines.

Best,
Aljoscha

> On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> 
> +1 for Azure pipeline because it promises better performance.
> 
> However, I have 2 concerns:
> 
> 1) Travis provides personal free service for testing personal branches.
> Usually, contributors use this feature to test PoC or run CRON jobs for
> pull requests.
>Using local machine will cost a lot of time. Does AZP provides the same
> free service?
> 2) Currently, we deployed a webhook [1] to receive Travis CI build
> notifications [2] and send to bui...@flink.apache.org mailing list.
>We need to figure out a way how to send Azure build results to the
> mailing list. And this [3] might be the way to go.
> 
> builds@f.a.o mailing list
> 
> Best,
> Jark
> 
> [1]: https://github.com/wuchong/flink-notification-bot
> [2]:
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> [3]:
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> 
> 
> 
> On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
> 
>> +1
>> 
>> Till Rohrmann  于2019年12月4日周三 下午10:43写道:
>> 
>>> +1 for moving to Azure pipelines as it promises better scalability and
>>> tooling. Looking forward to having faster builds and hence shorter
>> feedback
>>> cycles :-)
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
>>> wrote:
>>> 
 @robert Can you expand how the azure setup interacts with CiBot? Do we
 have to continue mirroring builds into flink-ci? How will the cronjob
 configuration work? We should have a general idea on how to implement
 this before proceeding.
 Additionally, moving /all /jobs into flink-ci requires setting up the
 environment variables we have; can we set these up via files or will we
 have to give all committers permissions for flink-ci/flink?
 
 On 04/12/2019 12:55, Chesnay Schepler wrote:
> From what I've seen so far Azure will provide us a better experience,
> so I'd say +1 for the transition as a whole.
> 
> I'd delay merge at least until the feature branch is cut.
> Given the parental leave it may even make sense to only start merging
> in January afterwards, to reduce the total time taken for the
>>> transition.
> 
> Reviews could maybe be made earlier, but I'm wondering whether anyone
> would even have the time at the moment to do so.
> 
> On 04/12/2019 12:35, Kurt Young wrote:
>> Thanks Robert for driving this. There is another big pain point of
>> current
>> travis,
>> which is its cache mechanism will fail from time to time. Almost
>> around 50%
>> of
>> the build fails are caused by cache problem. I opened this issue to
>> travis
>> but
>> got no response yet. So big +1 from my side.
>> 
>> Just one comment, it's close to 1.10 feature freeze and we will
>> spend
>> some
>> time
>> to make tests stable before release. I wish this replacement can
>>> happen
>> after
>> 1.10 release, otherwise it will be a unstable factor during release
>> testing.
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
>> 
>>> Thanks Robert for the updates! And thanks a lot for all the efforts
>>> to
>>> investigate, experiment and tune Azure Pipelines for Flink
>> building.
>>> Big +1 for it.
>>> 
>>> It would be great that the community building can be extended with
>>> custom
>>> machines so that the tests would not be queued for long with daily
>>> growing
>>> PRs.
>>> 
>>> The increased timeout would be also very helpful.
>>> The 50min timeout for free travis accounts is a pain currently,
>>> especially
>>> when we'd like to run e2e tests in our own travis. And I had to
>>> manually
>>> split the jobs to make it possible to pass.
>>> 
>>> Thanks,
>>> Zhu Zhu
>>> 
>>> Robert Metzger  于2019年12月4日周三 下午6:36写道:
>>> 
 Hi all,
 
 as a follow up from our discussion on reducing the build time
>> [1], I
>>> would
 like to propose migrating our build infrastructure to Azure
>>> Pipelines
>>> (away
 from Travis).
 
 I believe that we have reached the limits of what Travis can
 provide the
 Flink community, and I don't want the build system to limit or
 influence
 the project's growth.
 
 *Benefits:*
 1. The free Travis account are limited to 5 parallel builds, with
>> a
>>> timeout
 of 50 minutes. Azure offers *10 parallel builds with 300 minute
 timeouts
 *for
 free for open source projects.
 2. Azure Pipelines allows us to *add custom build machines* to the
 pool
>>> of
 10 free parallel bu

Re: [DISCUSS] Add N-Ary Stream Operator

2019-12-05 Thread Piotr Nowojski
Hi,

Thanks for the clarifications, Jingsong. Indeed, if chaining doesn't work with 
multiple outputs right now (or does it?), that's also a good future story.

Re Kurt:
I think this pattern could be easily handled if those two joins are implemented 
as a single 3-input operator that is internally composed of those three 
operators.
1. You can set the initial InputSelection to Build1 and Build2.
2. When Build1 receives `endOfInput`, InputSelection switches to Probe1 and 
Build2.
3. When Probe1 receives `endOfInput`, you do not forward the `endOfInput` to 
the internal `HashAgg` operator 
4. When Build2 finally receives `endOfInput`, you can finally forward the 
`endOfInput` to the internal `HashAgg`
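
A rough sketch of that switching, where everything below (input ids, the
selection bookkeeping) is purely illustrative and not the actual Flink API:

final class ThreeInputSwitching {
    static final int BUILD_1 = 1, PROBE_1 = 2, BUILD_2 = 3;
    // step 1: initially read both build sides
    private final java.util.Set<Integer> selection =
            new java.util.HashSet<>(java.util.Arrays.asList(BUILD_1, BUILD_2));

    void endInput(int inputId) {
        selection.remove(inputId);
        if (inputId == BUILD_1) {
            selection.add(PROBE_1);   // step 2: start probing HashJoin1
        } else if (inputId == PROBE_1) {
            // step 3: swallow endOfInput, do not flush the internal HashAgg
        } else if (inputId == BUILD_2) {
            // step 4: only now forward endOfInput to the internal HashAgg
        }
    }
}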

Exactly for reasons like that, I wanted to at least postpone handling 
tree-like operator chains in Flink. Logic like that is difficult to express 
generically, since it requires knowledge about the operators' behaviour. When 
hardcoded for a specific project (Blink in this case) and encapsulated behind 
an N-ary input-selectable operator, it's very easy for the runtime to handle, 
at the expense of a bit more complexity in forcing the user to compose 
operators. That's why I'm not saying that we do not want to handle this at 
some point in the future, but at least not in the first version.

Piotrek

> On 5 Dec 2019, at 10:11, Jingsong Li  wrote:
> 
> Kurt mentioned a very interesting thing,
> 
> If we want to better performance to read simultaneously, To this pattern:
> We need to control not only the read order of inputs, but also the outputs
> of endInput.
> In this case, HashAggregate can only call its real endInput after the input
> of build2 is finished, so the endInput of an operator is not necessarily
> determined by its input, but also by other associated inputs.
> I think we have the ability to do this in the n-input operator.
> 
> Note that these behaviors should be determined at compile time.
> 
> Best,
> Jingsong Lee
> 
> On Thu, Dec 5, 2019 at 4:42 PM Kurt Young  wrote:
> 
>> During implementing n-ary input operator in table, please keep
>> this pattern in mind:
>> 
>> Build1 ---+
>>           |
>>           +---> HashJoin1 ---> HashAgg ---+
>>           |                               |
>> Probe1 ---+                               +---> HashJoin2
>>                                           |
>>                                Build2 ---+
>> 
>> It's quite interesting that both `Build1`, `Build2` and `Probe1` can
>> be read simultaneously. But we need to control `HashAgg`'s output
>> before `Build2` finished. I don't have a clear solution for now, but
>> it's a common pattern we will face.
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Thu, Dec 5, 2019 at 4:37 PM Jingsong Li  wrote:
>> 
>>> Hi Piotr,
>>> 
 a) two input operator X -> one input operator Y -> one input operator Z
>>> (ALLOWED)
 b) n input operator X -> one input operator Y -> one input operator Z
>>> (ALLOWED)
 c) two input operator X -> one input operator Y -> two input operator Z
>>> (NOT ALLOWED as a single chain)
>>> 
>>> NOT ALLOWED to c) sounds good to me. I understand that it is very
>> difficult
>>> to propose a general support for any input selectable two input operators
>>> chain with high performance.
>>> And it is not necessary for table layer too. b) has already excited us.
>>> 
>>> Actually, we have supported n output chain too:
>>> d) one/two/n op X -> one op Y -> one op A1 -> one op B1 -> one op C1
>>> -> one op A2 -> one op
>> B2
>>> -> one op C2
>>> d) is a very useful feature too.
>>> 
 Do you mean that those Table API/SQL use cases (HashJoin/SortMergeJoin)
>>> could be easily handled by a single N-Ary Stream Operator, so this would
>> be
>>> covered by steps 1. and 2. from my plan from my previous e-mail? That
>> would
>>> be real nice (avoiding the input selection chaining).
>>> 
>>> Yes, because in the table layer the typical scenarios currently only have
>>> a static order. (We don't consider MergeJoin here, because it's too
>>> complex to optimize, and not worth optimizing at present.)
>>> For example, the current TwoInputOperators HashJoin and NestedLoopJoin
>>> both have a static reading order: we must read the build input before we
>>> can read the probe input.
>>> So after we analyze the chain, we put all the operators that can chain
>>> into an N-input operator. We can analyze the static order required by
>>> this operator, and divide the reading order into several levels:
>>> - first level: input4, input5, input1
>>> - second level: input2, input6
>>> - third level: input1, input7
>>> Note that these analyses happen at the compile time of the client.
>>> At runtime, we just need to read in a fixed order.
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Wed, Dec 4, 2019 at 10:15 PM Piotr Nowojski 
>>> wrote:
>>> 
>>>> Hi Jingsong,
>>>> 
>>>> Thanks for the feedback :)
>>>> 
>>>> Could you clarify a little bit what you mean by your wished use cases?

Re: [VOTE] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-05 Thread Aljoscha Krettek
+1 (binding)

> On 5. Dec 2019, at 10:58, Hequn Cheng  wrote:
> 
> +1 (binding)
> 
> Best,
> Hequn
> 
> On Thu, Dec 5, 2019 at 5:43 PM jincheng sun 
> wrote:
> 
>> +1 (binding)
>> 
>> Best,
>> Jincheng
>> 
>> Dian Fu  wrote on Thu, Dec 5, 2019 at 11:14 AM:
>> 
>>> +1 (non-binding)
>>> 
>>> Regards,
>>> Dian
>>> 
 On Dec 5, 2019, at 11:11 AM, Jark Wu  wrote:
 
 +1 (binding)
 
 Best,
 Jark
 
 On Thu, 5 Dec 2019 at 10:45, Wei Zhong  wrote:
 
> Hi all,
> 
> According to our previous discussion in [1], I'd like to bring up a
>> vote
> to apply the adjustment [2] to the command-line option design of
>> FLIP-78
> [3].
> 
> The vote will be open for at least 72 hours unless there is an
>> objection
> or not enough votes.
> 
> Best,
> Wei
> 
> [1]
> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-the-Pyflink-command-line-options-Adjustment-to-FLIP-78-td35440.html
> [2]
> 
>>> 
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
> [3]
> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
> 
> 
>>> 
>>> 
>> 



Re: [VOTE] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-05 Thread Hequn Cheng
+1 (binding)

Best,
Hequn

On Thu, Dec 5, 2019 at 5:43 PM jincheng sun 
wrote:

> +1 (binding)
>
> Best,
> Jincheng
>
> Dian Fu  wrote on Thu, Dec 5, 2019 at 11:14 AM:
>
> > +1 (non-binding)
> >
> > Regards,
> > Dian
> >
> > > On Dec 5, 2019, at 11:11 AM, Jark Wu  wrote:
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Jark
> > >
> > > On Thu, 5 Dec 2019 at 10:45, Wei Zhong  wrote:
> > >
> > >> Hi all,
> > >>
> > >> According to our previous discussion in [1], I'd like to bring up a
> vote
> > >> to apply the adjustment [2] to the command-line option design of
> FLIP-78
> > >> [3].
> > >>
> > >> The vote will be open for at least 72 hours unless there is an
> objection
> > >> or not enough votes.
> > >>
> > >> Best,
> > >> Wei
> > >>
> > >> [1]
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-the-Pyflink-command-line-options-Adjustment-to-FLIP-78-td35440.html
> > >> [2]
> > >>
> >
> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
> > >> [3]
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
> > >>
> > >>
> >
> >
>


Re: [DISCUSS] Drop vendor specific deployment documentation.

2019-12-05 Thread Till Rohrmann
If the community cannot manage to keep the vendor-specific documentation up
to date, then I believe it is better to drop it. Hence +1 for the proposal.

Cheers,
Till

On Tue, Dec 3, 2019 at 3:12 PM Aljoscha Krettek  wrote:

> +1
>
> Best,
> Aljoscha
>
> > On 2. Dec 2019, at 18:38, Konstantin Knauf 
> wrote:
> >
> > +1 from my side to drop.
> >
> > On Mon, Dec 2, 2019 at 6:34 PM Seth Wiesman  wrote:
> >
> >> Hi all,
> >>
> >> I'd like to discuss dropping vendor-specific deployment documentation
> from
> >> Flink's official docs. To be clear, I am *NOT* suggesting we drop any of
> >> the filesystem documentation, but the following three pages.
> >>
> >> AWS:
> >>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/aws.html
> >> Google Compute Engine:
> >>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/gce_setup.html
> >> MapR:
> >>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/mapr_setup.html
> >>
> >> Unlike the filesystems, these docs do not refer to components
> maintained by
> >> the Apache Flink community, but external commercial services and
> products.
> >> None of these pages are well maintained and I do not think the
> open-source
> >> community can reasonably be expected to keep them up to date. In
> >> particular,
> >>
> >>
> >>   - The AWS page contains sparse information and mostly just links to
> the
> >>   official EMR docs.
> >>   - The Google Compute Engine page is out of date and the commands do
> not
> >>   work.
> >>   - MapR contains some relevant information but the community has
> already
> >>   dropped the MapR filesystem so I am not sure that deployment would
> work
> >> (I
> >>   have not tested).
> >>
> >> There is also a larger question of which vendor products should be
> included
> >> and which should not. That is why I would like to suggest dropping these
> >> pages and referring users to vendor maintained documentation whenever
> they
> >> are using one of these services.
> >>
> >> Seth Wiesman
> >>
> >
> >
> > --
> >
> > Konstantin Knauf | Solutions Architect
> >
> > +49 160 91394525
> >
> >
> > Follow us @VervericaData Ververica 
> >
> >
> > --
> >
> > Join Flink Forward  - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> > Ververica GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> > (Tony) Cheng
>
>


Re: [VOTE] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-05 Thread jincheng sun
+1 (binding)

Best,
Jincheng

Dian Fu  wrote on Thu, Dec 5, 2019 at 11:14 AM:

> +1 (non-binding)
>
> Regards,
> Dian
>
> > On Dec 5, 2019, at 11:11 AM, Jark Wu  wrote:
> >
> > +1 (binding)
> >
> > Best,
> > Jark
> >
> > On Thu, 5 Dec 2019 at 10:45, Wei Zhong  wrote:
> >
> >> Hi all,
> >>
> >> According to our previous discussion in [1], I'd like to bring up a vote
> >> to apply the adjustment [2] to the command-line option design of FLIP-78
> >> [3].
> >>
> >> The vote will be open for at least 72 hours unless there is an objection
> >> or not enough votes.
> >>
> >> Best,
> >> Wei
> >>
> >> [1]
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-the-Pyflink-command-line-options-Adjustment-to-FLIP-78-td35440.html
> >> [2]
> >>
> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
> >> [3]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
> >>
> >>
>
>


Re: [DISCUSS] Voting from apache.org addresses

2019-12-05 Thread Till Rohrmann
+1 for the proposal.

Cheers,
Till

On Wed, Dec 4, 2019 at 12:43 PM Dian Fu  wrote:

> Hi Dawid,
>
> Thanks for the reply. Counting all the votes from non apache addresses as
> non-binding makes sense. Just as Jark mentioned, we can always remind the
> committer/PMC to vote again using the apache address if necessary (i.e.
> when the number of binding votes is not enough).
>
> Thanks,
> Dian
>
> > On Dec 4, 2019, at 7:27 PM, Kurt Young  wrote:
> >
> > +1 (from my apache email ;-))
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Dec 4, 2019 at 7:22 PM Jark Wu  wrote:
> >
> >> I'm +1 on this proposal.
> >>
> >> Regarding the case that Dian mentioned, we can remind the
> >> committer/PMC to vote again using the apache email,
> >> and of course the non-apache vote is counted as non-binding.
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 4 Dec 2019 at 17:33, Dawid Wysakowicz 
> >> wrote:
> >>
> >>> Hi Dian,
> >>>
> >>> I don't want to be very strict, but I think it should be counted as
> >>> non-binding, if it comes from non apache address, yes.
> >>>
> >>> Anybody should be able to verify a vote. Moreover I think this is the only
> >>> way to "encourage" all committers to use their apache addresses ;)
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>> On 04/12/2019 10:26, Dian Fu wrote:
>  Thanks for your explanation Dawid! It makes sense to me now. +1.
> 
>  Just one minor question: Does this mean that if a committer/PMC
> >>> accidentally votes using the non apache email, even if the person who
> >>> summarizes the votes clearly KNOWS who he/she is, that vote will still
> be
> >>> counted as non-binding?
> 
>  Regards,
>  Dian
> 
> > On Dec 4, 2019, at 5:13 PM, Aljoscha Krettek  wrote:
> >
> > Very sensible! +1
> >
> >> On 4. Dec 2019, at 10:02, Chesnay Schepler 
> >> wrote:
> >>
> >> I believe this to be a sensible approach by Dawid; +1.
> >>
> >> On 04/12/2019 09:04, Dawid Wysakowicz wrote:
> >>> Hi all,
> >>>
> >>> Sorry I think I was not clear enough on my initial e-mail. Let me
> >>> first clarify two things and later on try to rephrase my initial
> >> suggestion.
> >>>
> >>> 1. I do not want to count all votes from @apache.org addresses as
> >>> binding
> >>> 2. I do not want to discourage people that do not have @apache.org
> >>>  address from voting
> >>> 3. What I said does not change anything for non-committers/non-PMCs
> >>>
> >>> What I meant is that if you are a committer/PMC please use an
> >>> apache.org address because then the person that summarizes the votes
> can
> >>> check in the apache directory if a person with that address is a
> >>> committer/PMC in flink project. Otherwise if a committer uses a
> different
> >>> address there is no way to check if that person is a committer/PMC or
> >> not.
> >>> It does not mean though that if you vote from apache.org this vote is
> >>> automatically binding. It just allows us to check if it is.
> >>>
> >>> To elaborate on Xuefu's example. It's absolutely fine for you to
> use
> >>> an apache address for voting. I will still check if you are a committer
> >> or
> >>> not. But take me (or any other committer) for example. If I use my
> >>> non-apache address for a vote and the person verifying the vote does
> not
> >>> know me and my address, it is not easy for that person to verify if I
> am
> >> a
> >>> committer or not.
> >>>
> >>> Also it does not mean that other people are not allowed to vote.
> You
> >>> can vote from other addresses, but those votes will be counted as
> >>> non-binding. This does not change anything for non-committers/non-PMC.
> >>> However if you are a committer and vote from non apache address your
> vote
> >>> will be non-binding, because we cannot verify you are indeed a
> committer
> >>> (we might don't know your other address).
> >>>
> >>> I agree the additional information (binding, non-binding) in a vote
> >>> helps, but it still should be verified. People make mistakes.
> >>>
> >>> I hope this clears it up a bit.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>> On 04/12/2019 04:58, Dian Fu wrote:
>  Thanks Dawid for starting this discussion.
> 
>  I have the same feeling with Xuefu and Jingsong. Besides that,
> >>> according to the bylaws, for some kinds of votes, only the votes from
> >>> active PMC members are binding, such as product release. So an email
> >>> address doesn't help here. Even if a vote is from a Flink committer, it
> >> is
> >>> still non-binding.
> 
>  Thanks,
>  Dian
> 
> > On Dec 4, 2019, at 10:37 AM, Jingsong Lee  wrote:
> >
> > Thanks Dawid for driving this discussion.
> >
> > +1 to Xuefu's viewpoint.
> > I am not a Flink committer, but sometimes I use apache email
> >>> address to
> > send email.
> >
> > Another way is that we require the binding ticket to

[jira] [Created] (FLINK-15074) Connection timed out, Standalone

2019-12-05 Thread gameking (Jira)
gameking created FLINK-15074:


 Summary: Connection timed out, Standalone
 Key: FLINK-15074
 URL: https://issues.apache.org/jira/browse/FLINK-15074
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.9.1
 Environment: flink version : 1.5.1 , 1.9.1

jdk version : 1.8.0_181

Number of servers : 15

Number of taskmanagers : 178

Number of slots: 178
Reporter: gameking
 Attachments: flink-conf.yaml, jobmanager.log, taskmanager.log

I am running a Flink streaming application on a standalone cluster.

It works well when the job's parallelism is low, such as 96.

But when I try to increase the job's parallelism to a higher value, like 164 or
more, the job will fail within 10-15 minutes due to a connection timeout error.

I have tried to solve this problem by increasing taskmanager configs such as
'taskmanager.network.netty.server.numThreads',
'taskmanager.network.netty.client.numThreads',
'taskmanager.network.request-backoff.max', 'akka.ask.timeout' and so on, but it
doesn't work.

I have also tried different versions of Flink, such as 1.5.1 and 1.9.1, to
solve this problem, but that doesn't help either.

Does anyone know how to fix this problem? I have no idea now. It looks like a
bug.

I have uploaded my config and logs as attachments; the error trace is below:

 

--

org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: 
Connection timed out
 at 
org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:172)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 ~[flink-dist_2.11-1.5.1.jar:1.5.1]
 at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.io.IOException: Connection timed out
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_181]
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_181]
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_181]
 at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_181]
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) 
~[na:1.8.0_181]
 at 
org.apache.flink.shaded.netty4.io.netty.b

[jira] [Created] (FLINK-15073) sql client fails to run same query multiple times

2019-12-05 Thread Kurt Young (Jira)
Kurt Young created FLINK-15073:
--

 Summary: sql client fails to run same query multiple times
 Key: FLINK-15073
 URL: https://issues.apache.org/jira/browse/FLINK-15073
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Reporter: Kurt Young
Assignee: Danny Chen


Flink SQL> select abs(-1);
[INFO] Result retrieval cancelled.

Flink SQL> select abs(-1);
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Table 'default: select abs(-1)' 
already exists. Please choose a different name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Add N-Ary Stream Operator

2019-12-05 Thread Jingsong Li
Kurt mentioned a very interesting thing,

If we want better performance by reading those inputs simultaneously in this
pattern, we need to control not only the read order of the inputs, but also
the outputs of endInput.
In this case, HashAggregate can only call its real endInput after the input
of Build2 is finished, so the endInput of an operator is not necessarily
determined by its own input alone, but also by other associated inputs.
I think we have the ability to do this in the n-input operator.

Note that these behaviors should be determined at compile time.

Best,
Jingsong Lee

On Thu, Dec 5, 2019 at 4:42 PM Kurt Young  wrote:

> While implementing the n-ary input operator in the table layer, please keep
> this pattern in mind:
>
> Build1 ---+
>           |
>           +---> HashJoin1 ---> HashAgg ---+
>           |                               |
> Probe1 ---+                               +---> HashJoin2
>                                           |
>                                 Build2 ---+
>
> It's quite interesting that `Build1`, `Build2` and `Probe1` can all
> be read simultaneously. But we need to control `HashAgg`'s output
> before `Build2` has finished. I don't have a clear solution for now, but
> it's a common pattern we will face.
>
> Best,
> Kurt
>
>
> On Thu, Dec 5, 2019 at 4:37 PM Jingsong Li  wrote:
>
> > Hi Piotr,
> >
> > > a) two input operator X -> one input operator Y -> one input operator Z
> > > (ALLOWED)
> > > b) n input operator X -> one input operator Y -> one input operator Z
> > > (ALLOWED)
> > > c) two input operator X -> one input operator Y -> two input operator Z
> > > (NOT ALLOWED as a single chain)
> >
> > Not allowing c) sounds good to me. I understand that it is very difficult
> > to propose general support for chaining arbitrary input-selectable
> > two-input operators with high performance.
> > And it is not necessary for the table layer either; b) alone is already
> > exciting for us.
> >
> > Actually, we have supported n-output chains too:
> > d) one/two/n op X -> one op Y -> one op A1 -> one op B1 -> one op C1
> >                               -> one op A2 -> one op B2 -> one op C2
> > d) is a very useful feature too.
> >
> > > Do you mean that those Table API/SQL use cases (HashJoin/SortMergeJoin)
> > > could be easily handled by a single N-Ary Stream Operator, so this would
> > > be covered by steps 1. and 2. from my plan from my previous e-mail? That
> > > would be real nice (avoiding the input selection chaining).
> >
> > Yes, because in the table layer the typical scenarios currently only have
> > a static order. (We don't consider MergeJoin here, because it's too
> > complex to optimize, and not worth optimizing at present.)
> > For example, the current TwoInputOperators HashJoin and NestedLoopJoin
> > both have a static reading order: we must read the build input before we
> > can read the probe input.
> > So after we analyze the chain, we put all the operators that can chain
> > into an N-input operator. We can analyze the static order required by
> > this operator, and divide the reading order into several levels:
> > - first level: input4, input5, input1
> > - second level: input2, input6
> > - third level: input1, input7
> > Note that these analyses happen at the compile time of the client.
> > At runtime, we just need to read in a fixed order.
> >
> > Best,
> > Jingsong Lee
> >
> > On Wed, Dec 4, 2019 at 10:15 PM Piotr Nowojski 
> > wrote:
> >
> > > Hi Jingsong,
> > >
> > > Thanks for the feedback :)
> > >
> > > Could you clarify a little bit what you mean by your wished use cases?
> > >
> > > > There are a large number of jobs (in our production environment) whose
> > > > TwoInputOperators could be chained. We have often watched the last ten
> > > > tasks transmit data through disk and network, which could have been
> > > > done in one task.
> > > > For performance, if we can chain them, the average improvement is 30%+,
> > > > and it is an order of magnitude in extreme cases.
> > >
> > > As I mentioned at the end, I would like to avoid/postpone chaining of
> > > multiple/two input operators one after another because of the complexity
> > > of input selection. For the first version I would like to aim only to
> > > allow chaining single input operators with something (a 2- or N-input
> > > operator must always be the head of the chain). For example chains:
> > > chaining the single input operators with something (2 or N input must
> be
> > > always head of the chain) . For example chains:
> > >
> > > a) two input operator X -> one input operator Y -> one input operator Z
> > > (ALLOWED)
> > > b) n input operator X -> one input operator Y -> one input operator Z
> > > (ALLOWED)
> > > c) two input operator X -> one input operator Y -> two input operator Z
> > > (NOT ALLOWED as a single chain)
> > >
> > > The example above sounds to me like c)
> > >
> > > I think as a follow up, we could allow c), by extending chaining to a
> > > simple rule: there can only be a single input selectable operator in the
> > > chain (again, it’s the chaining of multiple input selectable operators
> > > that’s causing some problems).

Re: [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-05 Thread Hequn Cheng
+1 (binding)

Best,
Hequn

On Thu, Dec 5, 2019 at 4:41 PM jincheng sun 
wrote:

> +1(binding)
>
> Best,
> Jincheng
>
Jingsong Li  wrote on Tue, Dec 3, 2019 at 7:30 PM:
>
> > +1 (non-binding)
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Dec 2, 2019 at 5:30 PM Dian Fu  wrote:
> >
> > > Hi Jingsong,
> > >
> > > It's fine. :)  Appreciated the comments!
> > >
> > > I have replied you in the discussion thread as I also think it's better
> > to
> > > discuss these in the discussion thread.
> > >
> > > Thanks,
> > > Dian
> > >
> > > > On Dec 2, 2019, at 3:47 PM, Jingsong Li  wrote:
> > > >
> > > > Sorry for bothering your voting.
> > > > Let's discuss in discussion thread.
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Mon, Dec 2, 2019 at 3:32 PM Jingsong Lee  >
> > > wrote:
> > > >
> > > >> Hi Dian:
> > > >>
> > > >> Thanks for your driving. I have some questions:
> > > >>
> > > >> - Where should these configurations belong? You have mentioned
> > > >> tableApi/SQL, so should in TableConfig?
> > > >> - If just in table/sql, whether it should be called:
> > table.python.,
> > > >> because in table, all config options are called table.***.
> > > >> - What should table module do? So in CommonPythonCalc, we should
> read
> > > >> options from table config, and set resources to
> > OneInputTransformation?
> > > >> - Are all buffer.memory off-heap memory? I took a look
> > > >> to AbstractPythonScalarFunctionOperator, there is a
> > > forwardedInputQueue, is
> > > >> this one a heap queue? So we need heap memory too?
> > > >>
> > > >> Hope to get your reply.
> > > >>
> > > >> Best,
> > > >> Jingsong Lee
> > > >>
> > > >> On Mon, Dec 2, 2019 at 2:34 PM Dian Fu 
> wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> I'd like to start the vote of FLIP-88 [1] since that we have
> reached
> > an
> > > >>> agreement on the design in the discussion thread [2].
> > > >>>
> > > >>> This vote will be open for at least 72 hours. Unless there is an
> > > >>> objection, I will try to close it by Dec 5, 2019 08:00 UTC if we
> have
> > > >>> received sufficient votes.
> > > >>>
> > > >>> Regards,
> > > >>> Dian
> > > >>>
> > > >>> [1]
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
> > > >>> [2]
> > > >>>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Best, Jingsong Lee
> > > >>
> > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > >
> > >
> >
> > --
> > Best, Jingsong Lee
> >
>


[jira] [Created] (FLINK-15072) Hijack executeAsync instead of execute in context environment

2019-12-05 Thread Zili Chen (Jira)
Zili Chen created FLINK-15072:
-

 Summary: Hijack executeAsync instead of execute in context 
environment
 Key: FLINK-15072
 URL: https://issues.apache.org/jira/browse/FLINK-15072
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: Zili Chen
Assignee: Zili Chen
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Add N-Ary Stream Operator

2019-12-05 Thread Kurt Young
While implementing the n-ary input operator in the table layer, please keep
this pattern in mind:

Build1 ---+
          |
          +---> HashJoin1 ---> HashAgg ---+
          |                               |
Probe1 ---+                               +---> HashJoin2
                                          |
                                Build2 ---+

It's quite interesting that `Build1`, `Build2` and `Probe1` can all
be read simultaneously. But we need to control `HashAgg`'s output
before `Build2` has finished. I don't have a clear solution for now, but
it's a common pattern we will face.

Best,
Kurt


On Thu, Dec 5, 2019 at 4:37 PM Jingsong Li  wrote:

> Hi Piotr,
>
> > a) two input operator X -> one input operator Y -> one input operator Z
> > (ALLOWED)
> > b) n input operator X -> one input operator Y -> one input operator Z
> > (ALLOWED)
> > c) two input operator X -> one input operator Y -> two input operator Z
> > (NOT ALLOWED as a single chain)
>
> Not allowing c) sounds good to me. I understand that it is very difficult
> to propose general support for chaining arbitrary input-selectable two-input
> operators with high performance.
> And it is not necessary for the table layer either; b) alone is already
> exciting for us.
>
> Actually, we have supported n-output chains too:
> d) one/two/n op X -> one op Y -> one op A1 -> one op B1 -> one op C1
>                               -> one op A2 -> one op B2 -> one op C2
> d) is a very useful feature too.
>
> > Do you mean that those Table API/SQL use cases (HashJoin/SortMergeJoin)
> > could be easily handled by a single N-Ary Stream Operator, so this would be
> > covered by steps 1. and 2. from my plan from my previous e-mail? That would
> > be real nice (avoiding the input selection chaining).
>
> Yes, because in the table layer the typical scenarios currently only have
> a static order. (We don't consider MergeJoin here, because it's too complex
> to optimize, and not worth optimizing at present.)
> For example, the current TwoInputOperators HashJoin and NestedLoopJoin
> both have a static reading order: we must read the build input before we
> can read the probe input.
> So after we analyze the chain, we put all the operators that can chain into
> an N-input operator. We can analyze the static order required by this
> operator, and divide the reading order into several levels:
> - first level: input4, input5, input1
> - second level: input2, input6
> - third level: input1, input7
> Note that these analyses happen at the compile time of the client.
> At runtime, we just need to read in a fixed order.
>
> Best,
> Jingsong Lee
>
> On Wed, Dec 4, 2019 at 10:15 PM Piotr Nowojski 
> wrote:
>
> > Hi Jingsong,
> >
> > Thanks for the feedback :)
> >
> > Could you clarify a little bit what you mean by your wished use cases?
> >
> > > There are a large number of jobs (in our production environment) whose
> > > TwoInputOperators could be chained. We have often watched the last ten
> > > tasks transmit data through disk and network, which could have been
> > > done in one task.
> > > For performance, if we can chain them, the average improvement is 30%+,
> > > and it is an order of magnitude in extreme cases.
> >
> > As I mentioned at the end, I would like to avoid/postpone chaining of
> > multiple/two input operators one after another because of the complexity
> > of input selection. For the first version I would like to aim only to
> > allow chaining single input operators with something (a 2- or N-input
> > operator must always be the head of the chain). For example chains:
> >
> > a) two input operator X -> one input operator Y -> one input operator Z
> > (ALLOWED)
> > b) n input operator X -> one input operator Y -> one input operator Z
> > (ALLOWED)
> > c) two input operator X -> one input operator Y -> two input operator Z
> > (NOT ALLOWED as a single chain)
> >
> > The example above sounds to me like c)
> >
> > I think as a follow up, we could allow c), by extending chaining to a simple
> > rule: there can only be a single input selectable operator in the chain
> > (again, it’s the chaining of multiple input selectable operators that’s
> > causing some problems).
> >
> > > The table layer has many special features, which give us the chance to
> > > optimize it, but also mean that it is hard for the underlying layer to
> > > provide an abstract mechanism to implement it. For example:
> > > - HashJoin must read all the data on one side (build side) and then read
> > > the other side (probe side).
> > > - HashJoin only emits data when reading the probe side.
> > > - SortMergeJoin reads randomly, but if we chain SortMergeJoin with
> > > another MergeJoin (sort attribute re-use), that makes things complicated.
> > > - HashAggregate/Sort only emit data in endInput.
> > >
> > > Providing an N-ary stream operator makes everything possible: the upper
> > > layer can do anything. These things can be specific optimizations, which
> > > is much more natural than doing them in the lower layer.
> >
> > Do 

Re: [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-05 Thread jincheng sun
+1(binding)

Best,
Jincheng

Jingsong Li  wrote on Tue, Dec 3, 2019 at 7:30 PM:

> +1 (non-binding)
>
> Best,
> Jingsong Lee
>
> On Mon, Dec 2, 2019 at 5:30 PM Dian Fu  wrote:
>
> > Hi Jingsong,
> >
> > It's fine. :)  Appreciated the comments!
> >
> > I have replied you in the discussion thread as I also think it's better
> to
> > discuss these in the discussion thread.
> >
> > Thanks,
> > Dian
> >
> > > On Dec 2, 2019, at 3:47 PM, Jingsong Li  wrote:
> > >
> > > Sorry for bothering your voting.
> > > Let's discuss in discussion thread.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Mon, Dec 2, 2019 at 3:32 PM Jingsong Lee 
> > wrote:
> > >
> > >> Hi Dian:
> > >>
> > >> Thanks for your driving. I have some questions:
> > >>
> > >> - Where should these configurations belong? You have mentioned
> > >> tableApi/SQL, so should in TableConfig?
> > >> - If just in table/sql, whether it should be called:
> table.python.,
> > >> because in table, all config options are called table.***.
> > >> - What should table module do? So in CommonPythonCalc, we should read
> > >> options from table config, and set resources to
> OneInputTransformation?
> > >> - Are all buffer.memory off-heap memory? I took a look
> > >> to AbstractPythonScalarFunctionOperator, there is a
> > forwardedInputQueue, is
> > >> this one a heap queue? So we need heap memory too?
> > >>
> > >> Hope to get your reply.
> > >>
> > >> Best,
> > >> Jingsong Lee
> > >>
> > >> On Mon, Dec 2, 2019 at 2:34 PM Dian Fu  wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I'd like to start the vote of FLIP-88 [1] since that we have reached
> an
> > >>> agreement on the design in the discussion thread [2].
> > >>>
> > >>> This vote will be open for at least 72 hours. Unless there is an
> > >>> objection, I will try to close it by Dec 5, 2019 08:00 UTC if we have
> > >>> received sufficient votes.
> > >>>
> > >>> Regards,
> > >>> Dian
> > >>>
> > >>> [1]
> > >>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
> > >>> [2]
> > >>>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html
> > >>
> > >>
> > >>
> > >> --
> > >> Best, Jingsong Lee
> > >>
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> >
> >
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] Add N-Ary Stream Operator

2019-12-05 Thread Jingsong Li
Hi Piotr,

> a) two input operator X -> one input operator Y -> one input operator Z
> (ALLOWED)
> b) n input operator X -> one input operator Y -> one input operator Z
> (ALLOWED)
> c) two input operator X -> one input operator Y -> two input operator Z
> (NOT ALLOWED as a single chain)

Not allowing c) sounds good to me. I understand that it is very difficult
to propose general support for chaining arbitrary input-selectable two-input
operators with high performance.
And it is not necessary for the table layer either; b) alone is already
exciting for us.

Actually, we have supported n-output chains too:
d) one/two/n op X -> one op Y -> one op A1 -> one op B1 -> one op C1
                              -> one op A2 -> one op B2 -> one op C2
d) is a very useful feature too.

> Do you mean that those Table API/SQL use cases (HashJoin/SortMergeJoin)
> could be easily handled by a single N-Ary Stream Operator, so this would be
> covered by steps 1. and 2. from my plan from my previous e-mail? That would
> be real nice (avoiding the input selection chaining).

Yes, because in the table layer the typical scenarios currently only have
a static order. (We don't consider MergeJoin here, because it's too complex
to optimize, and not worth optimizing at present.)
For example, the current TwoInputOperators HashJoin and NestedLoopJoin
both have a static reading order: we must read the build input before we
can read the probe input.
So after we analyze the chain, we put all the operators that can chain into
an N-input operator. We can analyze the static order required by this
operator, and divide the reading order into several levels:
- first level: input4, input5, input1
- second level: input2, input6
- third level: input1, input7
Note that these analyses happen at the compile time of the client.
At runtime, we just need to read in a fixed order.
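
To illustrate, here is a minimal sketch of the runtime side of such a
statically planned read order (this is not a Flink API; the names are made up,
and the planner would compute the level lists at compile time):

import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch: drain all inputs of the current level before advancing to the next.
public class StaticReadOrder {

    private final Deque<Set<Integer>> remainingLevels; // e.g. [{4,5,1}, {2,6}, ...]
    private final Set<Integer> currentLevel = new HashSet<>();

    public StaticReadOrder(Deque<Set<Integer>> plannedLevels) {
        this.remainingLevels = plannedLevels;
        advance();
    }

    /** The inputs the operator is allowed to read right now. */
    public Set<Integer> selectableInputs() {
        return currentLevel;
    }

    /** Called when one of the inputs reaches endOfInput. */
    public void onEndOfInput(int inputId) {
        currentLevel.remove(inputId);
        if (currentLevel.isEmpty()) {
            advance();
        }
    }

    private void advance() {
        if (!remainingLevels.isEmpty()) {
            currentLevel.addAll(remainingLevels.poll());
        }
    }
}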

Best,
Jingsong Lee

On Wed, Dec 4, 2019 at 10:15 PM Piotr Nowojski  wrote:

> Hi Jingsong,
>
> Thanks for the feedback :)
>
> Could you clarify a little bit what you mean by your wished use cases?
>
> > There are a large number of jobs (in our production environment) whose
> > TwoInputOperators could be chained. We have often watched the last ten
> > tasks transmit data through disk and network, which could have been
> > done in one task.
> > For performance, if we can chain them, the average improvement is 30%+,
> > and it is an order of magnitude in extreme cases.
>
> As I mentioned at the end, I would like to avoid/postpone chaining of
> multiple/two input operators one after another because of the complexity of
> input selection. For the first version I would like to aim only to allow
> chaining single input operators with something (a 2- or N-input operator must
> always be the head of the chain). For example chains:
>
> a) two input operator X -> one input operator Y -> one input operator Z
> (ALLOWED)
> b) n input operator X -> one input operator Y -> one input operator Z
> (ALLOWED)
> c) two input operator X -> one input operator Y -> two input operator Z
> (NOT ALLOWED as a single chain)
>
> The example above sounds to me like c)
>
> I think as a follow up, we could allow c), by extending chaining to a simple
> rule: there can only be a single input selectable operator in the chain
> (again, it’s the chaining of multiple input selectable operators that’s
> causing some problems).
>
> > The table layer has many special features, which give us the chance to
> > optimize it, but also mean that it is hard for the underlying layer to
> > provide an abstract mechanism to implement it. For example:
> > - HashJoin must read all the data on one side (build side) and then read
> > the other side (probe side).
> > - HashJoin only emits data when reading the probe side.
> > - SortMergeJoin reads randomly, but if we chain SortMergeJoin with another
> > MergeJoin (sort attribute re-use), that makes things complicated.
> > - HashAggregate/Sort only emit data in endInput.
> >
> > Providing an N-ary stream operator makes everything possible: the upper
> > layer can do anything. These things can be specific optimizations, which
> > is much more natural than doing them in the lower layer.
>
> Do you mean that those Table API/SQL use cases (HashJoin/SortMergeJoin)
> could be easily handled by a single N-Ary Stream Operator, so this would be
> covered by steps 1. and 2. from my plan from my previous e-mail? That would
> be real nice (avoiding the input selection chaining).
>
> Piotrek
>
> > On 4 Dec 2019, at 14:29, Jingsong Li  wrote:
> >
> > Hi Piotr,
> >
> > Huge +1 for N-Ary Stream Operator.
> > And I love this Golden Shovel award very much!
> >
> > There are a large number of jobs (in our production environment) whose
> > TwoInputOperators could be chained. We have often watched the last ten
> > tasks transmit data through disk and network, which could have been
> > done in one task.
> > For performance, if we can chain them, the average improvement is 30%+,
> > and it is an order of magnitude in extreme cases.
> >
> > The table 

[jira] [Created] (FLINK-15071) YARN vcore capacity check can not pass when use large slotPerTaskManager

2019-12-05 Thread huweihua (Jira)
huweihua created FLINK-15071:


 Summary: YARN vcore capacity check can not pass when use large 
slotPerTaskManager
 Key: FLINK-15071
 URL: https://issues.apache.org/jira/browse/FLINK-15071
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.9.0
Reporter: huweihua


The YARN vcore capacity check in YarnClusterDescriptor.isReadyForDeployment
cannot pass if we configure a large slotsPerTaskManager (such as 96). The
dynamic property yarn.containers.vcores does not take effect.

This is because we set the dynamicProperties after checking
isReadyForDeployment.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15070) Supplement the case of bounded blocking partition for benchmark

2019-12-05 Thread zhijiang (Jira)
zhijiang created FLINK-15070:


 Summary: Supplement the case of bounded blocking partition for 
benchmark
 Key: FLINK-15070
 URL: https://issues.apache.org/jira/browse/FLINK-15070
 Project: Flink
  Issue Type: Task
  Components: Benchmarks
Reporter: zhijiang


ATM the benchmark only covers the case of pipelined partitions used in
streaming jobs, so it is better to also cover the case of blocking partitions
for batch jobs. Then we can easily trace performance concerns for any future
changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15069) Supplement the compression case for benchmark

2019-12-05 Thread zhijiang (Jira)
zhijiang created FLINK-15069:


 Summary: Supplement the compression case for benchmark
 Key: FLINK-15069
 URL: https://issues.apache.org/jira/browse/FLINK-15069
 Project: Flink
  Issue Type: Task
  Components: Benchmarks
Reporter: zhijiang


While reviewing the PR that introduces data compression for persistent storage
and network shuffle, we concluded that it is better to also cover this scenario
in the benchmark for tracing performance issues in the future.

Refer to https://github.com/apache/flink/pull/10375#pullrequestreview-325193504



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15068) Disable RocksDB's local LOG by default

2019-12-05 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15068:
---

 Summary: Disable RocksDB's local LOG by default
 Key: FLINK-15068
 URL: https://issues.apache.org/jira/browse/FLINK-15068
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Affects Versions: 1.9.1, 1.8.2, 1.7.2
Reporter: Nico Kruber
Assignee: Nico Kruber
 Fix For: 1.10.0


With Flink's default settings for RocksDB, it will write a log file (not the 
WAL, but pure logging statements) into the data folder. Besides periodic 
statistics, it will log compaction attempts, new memtable creations, flushes, 
etc.

A few things to note about this practice:
 # *this LOG file grows over time with no limit (!)*
 # the default logging level is INFO
 # the statistics in there may help when looking into performance and/or disk
space problems (but you should probably be monitoring metrics instead)
 # this file is not useful for debugging errors since it will be deleted along 
with the local dir when the TM goes down

With a custom {{OptionsFactory}}, the user can change the behaviour like the
following:
{code:java}
@Override
public DBOptions createDBOptions(DBOptions currentOptions) {
    currentOptions = super.createDBOptions(currentOptions);

    currentOptions.setKeepLogFileNum(10);                    // keep at most 10 rotated log files
    currentOptions.setInfoLogLevel(InfoLogLevel.WARN_LEVEL); // only log warnings and above
    currentOptions.setStatsDumpPeriodSec(0);                 // disable the periodic statistics dump
    currentOptions.setMaxLogFileSize(1024 * 1024);           // roll the log file at 1 MB each

    return currentOptions;
}{code}
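
Such a factory would then typically be wired in via the state backend, e.g.
(sketch only; {{MyOptionsFactory}} is a made-up class containing the override
above, and the checkpoint URI is a placeholder):
{code:java}
RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///checkpoints");
backend.setOptions(new MyOptionsFactory()); // MyOptionsFactory is hypothetical
env.setStateBackend(backend);
{code}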
However, the rotating logger does not currently work (it will not delete old
log files - see [https://github.com/dataArtisans/frocksdb/pull/12]). Also, the
user should not have to write their own {{OptionsFactory}} to get a sensible
default.

To prevent this file from filling up the disk, I propose to change Flink's 
default RocksDB settings so that the LOG file is effectively disabled (nothing 
is written to it by default).
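
A sketch of what that new default could look like (assuming RocksDB's JNI
exposes {{InfoLogLevel.HEADER_LEVEL}}, which suppresses everything below the
file header):
{code:java}
// Sketch of the proposed default: keep the LOG file effectively empty.
currentOptions.setInfoLogLevel(InfoLogLevel.HEADER_LEVEL);
currentOptions.setStatsDumpPeriodSec(0); // and no periodic statistics dump
{code}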



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15067) Pass execution configuration from TableEnvironment to StreamExecutionEnvironment

2019-12-05 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-15067:


 Summary: Pass execution configuration from TableEnvironment to 
StreamExecutionEnvironment
 Key: FLINK-15067
 URL: https://issues.apache.org/jira/browse/FLINK-15067
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Legacy Planner, Table SQL / Planner
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.10.0


When translating a relational tree to a StreamTransformation we should pass
execution parameters such as the auto-watermark interval, the default
parallelism, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15066) Cannot run multiple `insert into csvTable values ()`

2019-12-05 Thread Kurt Young (Jira)
Kurt Young created FLINK-15066:
--

 Summary: Cannot run multiple `insert into csvTable values ()`
 Key: FLINK-15066
 URL: https://issues.apache.org/jira/browse/FLINK-15066
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Kurt Young
Assignee: Danny Chen
 Fix For: 1.10.0


I created a csv table in sql client, and tried to insert some data into this 
table.

The first INSERT INTO succeeds, but the second one fails with an exception:
{code:java}
Caused by: java.io.IOException: File or directory /.../xxx.csv already exists.
Existing files and directories are not overwritten in NO_OVERWRITE mode.
Use OVERWRITE mode to overwrite existing files and directories.
    at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:817)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop Kafka 0.8/0.9

2019-12-05 Thread Zhenghua Gao
+1 for dropping.

*Best Regards,*
*Zhenghua Gao*


On Thu, Dec 5, 2019 at 11:08 AM Dian Fu  wrote:

> +1 for dropping them.
>
> Just FYI: there was a similar discussion few months ago [1].
>
> [1]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Drop-older-versions-of-Kafka-Connectors-0-9-0-10-for-Flink-1-10-td29916.html#a29997
>
> On Dec 5, 2019, at 10:29 AM, vino yang  wrote:
>
> +1
>
jincheng sun  wrote on Thu, Dec 5, 2019 at 10:26 AM:
>
>> +1 for dropping it, and thanks for bringing up this discussion, Chesnay!
>>
>> Best,
>> Jincheng
>>
>> Jark Wu  wrote on Thu, Dec 5, 2019 at 10:19 AM:
>>
>>> +1 for dropping, also cc'ed user mailing list.
>>>
>>>
>>> Best,
>>> Jark
>>>
>>> On Thu, 5 Dec 2019 at 03:39, Konstantin Knauf 
>>> wrote:
>>>
>>> > Hi Chesnay,
>>> >
>>> > +1 for dropping. I have not heard from any user using 0.8 or 0.9 for a
>>> long
>>> > while.
>>> >
>>> > Cheers,
>>> >
>>> > Konstantin
>>> >
>>> > On Wed, Dec 4, 2019 at 1:57 PM Chesnay Schepler 
>>> > wrote:
>>> >
>>> > > Hello,
>>> > >
>>> > > What's everyone's take on dropping the Kafka 0.8/0.9 connectors from
>>> the
>>> > > Flink codebase?
>>> > >
>>> > > We haven't touched either of them for the 1.10 release, and it seems
>>> > > quite unlikely that we will do so in the future.
>>> > >
>>> > > We could finally close a number of test stability tickets that have
>>> been
>>> > > lingering for quite a while.
>>> > >
>>> > >
>>> > > Regards,
>>> > >
>>> > > Chesnay
>>> > >
>>> > >
>>> >
>>> > --
>>> >
>>> > Konstantin Knauf | Solutions Architect
>>> >
>>> > +49 160 91394525
>>> >
>>> >
>>> > Follow us @VervericaData Ververica 
>>> >
>>> >
>>> > --
>>> >
>>> > Join Flink Forward  - The Apache Flink
>>> > Conference
>>> >
>>> > Stream Processing | Event Driven | Real Time
>>> >
>>> > --
>>> >
>>> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>>> >
>>> > --
>>> > Ververica GmbH
>>> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
>>> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>>> > (Tony) Cheng
>>> >
>>>
>>
>