Re: [jira] [Created] (FLINK-15644) Add support for SQL query validation

2020-01-17 Thread Flavio Pompermaier
Why not also add a suggest() method (likewise unimplemented initially) that
would return the list of suitable completions/tokens for the current query?
How complex would it be to implement, in your opinion?
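
For illustration, a hypothetical shape such a method could take (every name
here is an assumption, not existing Flink API):

public interface SqlSuggester {
    /** Returns candidate completions/tokens for the statement up to the cursor. */
    java.util.List<String> suggest(String partialStatement, int cursorPosition);
}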

On Fri, 17 Jan 2020 at 18:32, Fabian Hueske (Jira) wrote:

> Fabian Hueske created FLINK-15644:
> -
>
>  Summary: Add support for SQL query validation
>  Key: FLINK-15644
>  URL: https://issues.apache.org/jira/browse/FLINK-15644
>  Project: Flink
>   Issue Type: New Feature
>   Components: Table SQL / API
> Reporter: Fabian Hueske
>
>
> It would be good if the {{TableEnvironment}} offered methods to check
> the validity of SQL queries. Such a method could be used by services (CLI
> query shells, notebooks, SQL UIs) that are backed by Flink and execute
> their queries on Flink.
>
> Validation should be available in two levels:
>  # Validation of syntax and semantics: This includes parsing the query,
> checking the catalog for dbs, tables, fields, type checks for expressions
> and functions, etc. This will check if the query is a valid SQL query.
>  # Validation that query is supported: Checks if Flink can execute the
> given query. Some syntactically and semantically valid SQL queries are not
> supported, esp. in a streaming context. This requires running the
> optimizer. If the optimizer generates an execution plan, the query can be
> executed. This check includes the first step and is more expensive.
>
> The reason for this separation is that the first check can be done much
> faster, as it does not involve calling the optimizer. Hence, it would be
> suitable for fast checks in an interactive query editor. The second check
> might take more time (depending on the complexity of the query) and might
> not be suitable for rapid checks but only on explicit user request.
>
> Requirements:
>  * validation does not modify the state of the {{TableEnvironment}}, i.e.
> it does not add plan operators
>  * validation does not require connector dependencies
>  * validation can identify the update mode of a continuous query result
> (append-only, upsert, retraction).
>
> Out of scope for this issue:
>  * better error messages for unsupported features as suggested by
> FLINK-7217
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


ValueState with pure Java class keeping lists/map vs ListState/MapState, which one is a recommended way?

2020-01-17 Thread Elkhan Dadashov
Hi Flinkers,

I was curious whether there is any performance (memory/speed) difference
between these two options when keeping state in window/process functions:

*1) Create a single ValueState<MyClass>, and store state in pure Java
objects*

class MyClass {
   List<OtherClass> listOtherClass;
   Map<KeyType, SomeValue> mapKeyToSomeValue;
}

public class MyProcessFunc
  extends KeyedProcessFunction<KeyType, InputType, OutputType> {
...
   ValueState<MyClass> valueState;
...
}

vs

*2) Create ListState and MapState as two separate Flink state variables:*

public class MyProcessFunc
  extends KeyedProcessFunction<KeyType, InputType, OutputType> {
...
   ListState<OtherClass> listState;
   MapState<KeyType, SomeValue> mapState;
...
}
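
For context, a minimal sketch of how both variants are typically registered
in open() (the type names follow the placeholders above and are assumptions):

// Option 1: one ValueState holding a POJO; the backend (de)serializes the
// whole object on every read/update.
valueState = getRuntimeContext().getState(
    new ValueStateDescriptor<>("my-class", MyClass.class));

// Option 2: dedicated Flink states; with RocksDB, MapState entries can be
// read/updated individually without deserializing the full map.
listState = getRuntimeContext().getListState(
    new ListStateDescriptor<>("list-other-class", OtherClass.class));
mapState = getRuntimeContext().getMapState(
    new MapStateDescriptor<>("key-to-value", KeyType.class, SomeValue.class));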

Which option is the recommended way of storing such state?

Thanks.


[jira] [Created] (FLINK-15644) Add support for SQL query validation

2020-01-17 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-15644:
-

 Summary: Add support for SQL query validation 
 Key: FLINK-15644
 URL: https://issues.apache.org/jira/browse/FLINK-15644
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Fabian Hueske


It would be good if the {{TableEnvironment}} offered methods to check the 
validity of SQL queries. Such a method could be used by services (CLI query 
shells, notebooks, SQL UIs) that are backed by Flink and execute their queries 
on Flink.

Validation should be available in two levels:
 # Validation of syntax and semantics: This includes parsing the query, 
checking the catalog for dbs, tables, fields, type checks for expressions and 
functions, etc. This will check if the query is a valid SQL query.
 # Validation that query is supported: Checks if Flink can execute the given 
query. Some syntactically and semantically valid SQL queries are not supported, 
esp. in a streaming context. This requires running the optimizer. If the 
optimizer generates an execution plan, the query can be executed. This check 
includes the first step and is more expensive.

The reason for this separation is that the first check can be done much faster, as 
it does not involve calling the optimizer. Hence, it would be suitable for fast 
checks in an interactive query editor. The second check might take more time 
(depending on the complexity of the query) and might not be suitable for rapid 
checks but only on explicit user request.
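
A hypothetical sketch of the two methods (all names are illustrative
assumptions, not actual Flink API):

{code:java}
// Level 1: parses the query and checks catalogs/types; fast, no optimizer run.
ValidationResult validate(String statement);

// Level 2: additionally runs the optimizer to verify Flink can execute the
// query; subsumes level 1 and is more expensive.
ValidationResult validateForExecution(String statement);
{code}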

Requirements:
 * validation does not modify the state of the {{TableEnvironment}}, i.e. it 
does not add plan operators
 * validation does not require connector dependencies
 * validation can identify the update mode of a continuous query result 
(append-only, upsert, retraction).

Out of scope for this issue:
 * better error messages for unsupported features as suggested by FLINK-7217



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[ANNOUNCE] Apache Flink 1.10.0, release candidate #0

2020-01-17 Thread Gary Yao
Hi all,

RC0 for Apache Flink 1.10.0 has been created. It has all the artifacts that
we would typically have for a release, except for a source code tag and a PR
for the release announcement.

This preview-only RC is created only to drive the current testing efforts, and
no official vote will take place. It includes the following:

* the preview source release and binary convenience releases [1], which are
signed with the key with fingerprint
BB137807CEFBE7DD2616556710B12A1F89C115E8 [2],
* all artifacts that would normally be deployed to the Maven Central
Repository [3]

To test with these artifacts, you can create a settings.xml file with the
content shown below [4]. This settings file can be referenced in your maven
commands via --settings /path/to/settings.xml. This is useful for creating a
quickstart project based on the staged release and also for building against
the staged jars.

Happy testing!

Best,
Gary

[1] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc0/
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://repository.apache.org/content/repositories/orgapacheflink-1318/
[4]

<settings>
  <activeProfiles>
    <activeProfile>flink-1.10.0</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>flink-1.10.0</id>
      <repositories>
        <repository>
          <id>flink-1.10.0</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1318/</url>
        </repository>
        <repository>
          <id>archetype</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1318/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
</settings>


[jira] [Created] (FLINK-15643) Support to start multiple Flink masters to achieve faster recovery

2020-01-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-15643:
-

 Summary: Support to start multiple Flink masters to achieve faster 
recovery
 Key: FLINK-15643
 URL: https://issues.apache.org/jira/browse/FLINK-15643
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Yang Wang


The replicas of the Flink master deployment could be set to more than 1 to 
achieve faster recovery. When the active Flink master fails, a standby one 
will take over immediately.

For standalone clusters, some users have already used this setup in production.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15642) Support to set JobManager liveness check

2020-01-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-15642:
-

 Summary: Support to set JobManager liveness check
 Key: FLINK-15642
 URL: https://issues.apache.org/jira/browse/FLINK-15642
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Yang Wang


The liveness of the TaskManagers is controlled by the Flink Master: when a 
TaskManager fails or times out, a new pod is started to replace it. We need to 
add a liveness check for the JobManager itself.

 

It is just like what we can do in YAML:
{code}
...
livenessProbe:
  tcpSocket:
    port: 6123
  initialDelaySeconds: 30
  periodSeconds: 60
...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15641) Support to start sidecar container and init container

2020-01-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-15641:
-

 Summary: Support to start sidecar container and init container
 Key: FLINK-15641
 URL: https://issues.apache.org/jira/browse/FLINK-15641
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Yang Wang


* Sidecar Container
 * Add a sidecar container to the Flink Master and TaskManager pods to collect 
logs to shared storage (HDFS, Elasticsearch, etc.).
 * It could also be used for debugging purposes.


 * Init Container
 * Use an init container to download user jars dynamically, or to do other 
preparation work, before starting the JobManager and TaskManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15640) Support to set label and node selector

2020-01-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-15640:
-

 Summary: Support to set label and node selector
 Key: FLINK-15640
 URL: https://issues.apache.org/jira/browse/FLINK-15640
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Yang Wang


Navigate to [Kubernetes 
doc|https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/] 
for more information.

{code:java}
public static final ConfigOption<String> JOB_MANAGER_USER_LABELS =
 key("kubernetes.jobmanager.user.labels")
  .noDefaultValue()
  .withDescription("The labels to be set for the JobManager replica controller, " +
   "service and pods. Specified as key:value pairs separated by commas, " +
   "such as version:alphav1,deploy:test.");

public static final ConfigOption<String> TASK_MANAGER_USER_LABELS =
 key("kubernetes.taskmanager.user.labels")
  .noDefaultValue()
  .withDescription("The labels to be set for TaskManager pods. " +
   "Specified as key:value pairs separated by commas, " +
   "such as version:alphav1,deploy:test.");

public static final ConfigOption<String> JOB_MANAGER_NODE_SELECTOR =
 key("kubernetes.jobmanager.node-selector")
  .noDefaultValue()
  .withDescription("The node selector to be set for the JobManager pod. " +
   "Specified as key:value pairs separated by commas, " +
   "such as environment:dev,tier:frontend.");

public static final ConfigOption<String> TASK_MANAGER_NODE_SELECTOR =
 key("kubernetes.taskmanager.node-selector")
  .noDefaultValue()
  .withDescription("The node selector to be set for TaskManager pods. " +
   "Specified as key:value pairs separated by commas, " +
   "such as environment:dev,tier:frontend.");
{code}
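
For illustration, a minimal sketch of how such options would be set in the
Flink configuration (only the option keys come from the proposal above; the
setter-based usage is an assumption):

{code:java}
Configuration conf = new Configuration();
conf.setString("kubernetes.jobmanager.user.labels", "version:alphav1,deploy:test");
conf.setString("kubernetes.taskmanager.node-selector", "environment:dev,tier:frontend");
{code}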



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15639) Support to set toleration for jobmanager and taskmanager

2020-01-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-15639:
-

 Summary: Support to set toleration for jobmanager and taskmanager
 Key: FLINK-15639
 URL: https://issues.apache.org/jira/browse/FLINK-15639
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Yang Wang


Taints and tolerations work together to ensure that pods are not scheduled onto 
inappropriate nodes. Navigate to [Kubernetes 
doc|https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/] 
for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15638) releasing/create_release_branch.sh does not set version in flink-python/pyflink/version.py

2020-01-17 Thread Gary Yao (Jira)
Gary Yao created FLINK-15638:


 Summary: releasing/create_release_branch.sh does not set version 
in flink-python/pyflink/version.py
 Key: FLINK-15638
 URL: https://issues.apache.org/jira/browse/FLINK-15638
 Project: Flink
  Issue Type: Bug
  Components: Release System
Affects Versions: 1.10.0
Reporter: Gary Yao
 Fix For: 1.10.0


{{releasing/create_release_branch.sh}} does not set the version in 
{{flink-python/pyflink/version.py}}. Currently the version.py contains:

{noformat}
__version__ = "1.10.dev0" 
{noformat}

{{setup.py}} replaces {{.dev0}} with {{-SNAPSHOT}} and tries to find the respective 
Flink distribution in {{flink-dist/target}}, which will not exist.
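
For illustration, on the 1.10 release branch the file would then presumably 
need to contain the release version instead, e.g.:

{noformat}
__version__ = "1.10.0"
{noformat}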




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-91 - Support SQL Client Gateway

2020-01-17 Thread godfrey he
Hi devs,

I've updated FLIP-91 [0] according to the feedback. Please take another
look.

Best,
godfrey

[0]
https://docs.google.com/document/d/1DKpFdov1o_ObvrCmU-5xi-VrT6nR2gxq-BbswSSI9j8/


Kurt Young wrote on Thu, Jan 9, 2020 at 4:21 PM:

> Hi,
>
> +1 to the general idea. Supporting the SQL client gateway mode will bridge
> the gap between Flink SQL and production environments. The JDBC driver is
> also a very good supplement for the usability of Flink SQL; users will have
> more choices for trying out Flink SQL, such as Tableau.
>
> I went through the document and left some comments there.
>
> Best,
> Kurt
>
>
> On Sun, Jan 5, 2020 at 1:57 PM tison  wrote:
>
> > The general idea sounds great. I'm going to keep up with the progress
> soon.
> >
> > Best,
> > tison.
> >
> >
> > Bowen Li wrote on Sun, Jan 5, 2020 at 12:59 PM:
> >
> > > +1. It will improve user experience quite a bit.
> > >
> > >
> > > On Thu, Jan 2, 2020 at 22:07 Yangze Guo  wrote:
> > >
> > > > Thanks for driving this, Xiaoling!
> > > >
> > > > +1 for supporting SQL client gateway.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > >
> > > > On Thu, Jan 2, 2020 at 9:58 AM 贺小令  wrote:
> > > > >
> > > > > Hey everyone,
> > > > > FLIP-24
> > > > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client>
> > > > > proposes the overall concept and architecture of the SQL Client. The
> > > > > embedded mode has been supported since release-1.5, which is helpful
> > > > > for debugging/demo purposes.
> > > > > Many users ask how to submit a Flink job to an online environment
> > > > > without programming against the Flink API. To solve this, we created
> > > > > FLIP-91 [0], which supports a SQL client gateway mode, so users can
> > > > > submit a job through the CLI client, the REST API, or JDBC.
> > > > >
> > > > > I'd be glad to get more feedback on FLIP-91.
> > > > >
> > > > > Best,
> > > > > godfreyhe
> > > > >
> > > > > [0]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway


[jira] [Created] (FLINK-15637) For RocksDBStateBackend, make RocksDB the default store for timers

2020-01-17 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-15637:


 Summary: For RocksDBStateBackend, make RocksDB the default store 
for timers
 Key: FLINK-15637
 URL: https://issues.apache.org/jira/browse/FLINK-15637
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Stephan Ewen
 Fix For: 1.10.0


Set the {{state.backend.rocksdb.timer-service.factory}} to {{ROCKSDB}} by 
default. Also ensure that the programmatic default value becomes the same.

We need to update the performance tuning guide to mention this.
 
We need to update the release notes to mention this.
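
For reference, the timer store can already be switched programmatically today;
a minimal sketch against the 1.10 API (the checkpoint path is a placeholder):

{code:java}
RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///checkpoints");
// This issue proposes making ROCKSDB the default instead of HEAP.
backend.setPriorityQueueStateType(RocksDBStateBackend.PriorityQueueStateType.ROCKSDB);
env.setStateBackend(backend);
{code}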

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15636) Support Python UDF for flink planner under batch mode

2020-01-17 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-15636:
---

 Summary: Support Python UDF for flink planner under batch mode
 Key: FLINK-15636
 URL: https://issues.apache.org/jira/browse/FLINK-15636
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Reporter: Hequn Cheng
Assignee: Hequn Cheng


Currently, Python UDFs are supported by the flink planner (stream mode only) and 
the blink planner (stream mode). This JIRA is dedicated to adding Python UDF 
support for the flink planner under batch mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-17 Thread Dian Fu
Thanks everyone for your warm welcome.
It's my pleasure to be part of the community, and I'm looking forward to
contributing more in the future.

Regards,
Dian

On Fri, Jan 17, 2020 at 4:03 PM Yang Wang  wrote:

> Congratulations!
>
>
> Best,
> Yang
>
Terry Wang wrote on Fri, Jan 17, 2020 at 3:28 PM:
>
>> Congratulations!
>>
>> Best,
>> Terry Wang
>>
>>
>>
>> On Jan 17, 2020 at 14:09, Biao Liu wrote:
>>
>> Congrats!
>>
>> Thanks,
>> Biao /'bɪ.aʊ/
>>
>>
>>
>> On Fri, 17 Jan 2020 at 13:43, Rui Li  wrote:
>>
>>> Congratulations Dian, well deserved!
>>>
>>> On Thu, Jan 16, 2020 at 5:58 PM jincheng sun 
>>> wrote:
>>>
 Hi everyone,

 I'm very happy to announce that Dian accepted the offer of the Flink
 PMC to become a committer of the Flink project.

 Dian Fu has been contributing to Flink for many years. Dian Fu played
 an essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has
 contributed several major features, reported and fixed many bugs, spent a
 lot of time reviewing pull requests, and also frequently helps out on the
 user mailing lists and checks/votes releases.

 Please join me in congratulating Dian on becoming a Flink committer!

 Best,
 Jincheng(on behalf of the Flink PMC)

>>>
>>>
>>> --
>>> Best regards!
>>> Rui Li
>>>
>>
>>


[jira] [Created] (FLINK-15635) Allow passing a ClassLoader to EnvironmentSettings

2020-01-17 Thread Timo Walther (Jira)
Timo Walther created FLINK-15635:


 Summary: Allow passing a ClassLoader to EnvironmentSettings
 Key: FLINK-15635
 URL: https://issues.apache.org/jira/browse/FLINK-15635
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API, Table SQL / Planner
Reporter: Timo Walther


We had a couple of class loading issues in the past because people forgot to 
use the right classloader in {{flink-table}}. The SQL Client executor code 
hacks a classloader into the planner process by using {{wrapClassLoader}}, 
which sets the thread's context classloader.

Instead, we should allow passing a class loader to {{EnvironmentSettings}}. 
This class loader can be passed to the planner and stored in the table 
environment, table config, etc., to achieve consistent class loading behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15634) disableAutoGeneratedUIDs fails with coGroup and join

2020-01-17 Thread Jira
Jürgen Kreileder created FLINK-15634:


 Summary: disableAutoGeneratedUIDs fails with coGroup and join
 Key: FLINK-15634
 URL: https://issues.apache.org/jira/browse/FLINK-15634
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.10.0
Reporter: Jürgen Kreileder


coGroup/join seems to generate two Map operators for which you can't set the 
UID. 

Here's a test case:
{code:java}
// Note: generic types are assumed to be Integer for illustration; the
// setUidHash argument strings are elided in the original report.
@Test
public void testDisablingAutoUidsWorksWithCoGroup() throws Exception {
   StreamExecutionEnvironment env =
      StreamExecutionEnvironment.getExecutionEnvironment();
   env.getConfig().disableAutoGeneratedUIDs();

   env
      .addSource(new NoOpSourceFunction()).setUidHash("")
      .coGroup(env.addSource(new NoOpSourceFunction()).setUidHash(""))
      .where(o -> o).equalTo(o -> o)
      .window(TumblingEventTimeWindows.of(Time.days(1)))
      .with(new CoGroupFunction<Integer, Integer, Integer>() {
         @Override
         public void coGroup(Iterable<Integer> first, Iterable<Integer> second,
               Collector<Integer> out) throws Exception {
         }
      }).setUidHash("")
      .addSink(new DiscardingSink<>()).setUidHash("");

   env.execute();
}
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15633) Bump javadoc-plugin to 3.0.0-M1+

2020-01-17 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15633:


 Summary: Bump javadoc-plugin to 3.0.0-M1+
 Key: FLINK-15633
 URL: https://issues.apache.org/jira/browse/FLINK-15633
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Release System
Affects Versions: 1.10.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0


The javadoc-plugin fails when run on Java 11. This isn't a big issue since we do 
our releases on JDK 8 anyway, but we should still fix it so users can reproduce 
releases on JDK 11.

{code}
java.lang.ExceptionInInitializerError
at 
org.apache.maven.plugin.javadoc.AbstractJavadocMojo.(AbstractJavadocMojo.java:190)
at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
at 
com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:86)
at 
com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
at 
com.google.inject.internal.ConstructorInjector.access$000(ConstructorInjector.java:32)
at 
com.google.inject.internal.ConstructorInjector$1.call(ConstructorInjector.java:89)
at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
at 
com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:87)
at 
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
at 
com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
at 
com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
at 
org.eclipse.sisu.space.AbstractDeferredClass.get(AbstractDeferredClass.java:48)
at 
com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:81)
at 
com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:53)
at 
com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:65)
at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
at 
com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
at 
com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:63)
at 
com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:45)
at 
com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
at org.eclipse.sisu.inject.Guice4$1.get(Guice4.java:162)
at org.eclipse.sisu.inject.LazyBeanEntry.getValue(LazyBeanEntry.java:81)
at 
org.eclipse.sisu.plexus.LazyPlexusBean.getValue(LazyPlexusBean.java:51)
at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:263)
at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:255)
at 
org.apache.maven.plugin.internal.DefaultMavenPluginManager.getConfiguredMojo(DefaultMavenPluginManager.java:519)
at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:121)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at 

[jira] [Created] (FLINK-15632) Zookeeper HA service could not work for active kubernetes integration

2020-01-17 Thread Yang Wang (Jira)
Yang Wang created FLINK-15632:
-

 Summary: Zookeeper HA service could not work for active kubernetes 
integration
 Key: FLINK-15632
 URL: https://issues.apache.org/jira/browse/FLINK-15632
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Reporter: Yang Wang


Things will be somewhat different if we want to support HA for the active 
Kubernetes integration.
 # The K8s service is designed for accessing the jobmanager from outside the 
K8s cluster. So the Flink client will not use the HA service to retrieve the 
jobmanager address. Instead, it always uses the Kubernetes service to contact 
the jobmanager via the REST client.
 # The Kubernetes DNS creates A and SRV records only for Services; it does not 
generate A records for pods. So the IP address, not the hostname, will be used 
as the jobmanager address.

 

All other behavior will be the same as with Zookeeper HA for standalone and Yarn.

To fix this problem, we just need some minor changes to 
{{KubernetesClusterDescriptor}} and {{KubernetesSessionEntrypoint}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-17 Thread Piotr Nowojski
+1 for making it consistent. When using X state backend, timers should be 
stored in X by default.

Also I think any configuration option controlling that needs to be well 
documented in some performance tuning section of the docs.

Piotrek

> On 17 Jan 2020, at 09:16, Congxian Qiu  wrote:
> 
> +1 to storing timers in RocksDB by default.
> 
> Storing timers on the heap can cause OOM problems and make the checkpoint much 
> slower; storing timers in RocksDB can get rid of both.
> 
> Best,
> Congxian
> 
> 
> Biao Liu wrote on Fri, Jan 17, 2020 at 3:10 PM:
> +1
> 
> I think that's how it should be. Timers should align with other regular state.
> 
> If a user wants better performance without memory concerns, the memory or FS 
> state backend might be considered. Or maybe we could optimize the performance 
> by introducing a specific column family for timers. It could have its own 
> tuned options.
> 
> Thanks,
> Biao /'bɪ.aʊ/
> 
> 
> 
> On Fri, 17 Jan 2020 at 10:11, Jingsong Li wrote:
> Hi Stephan,
> 
> Thanks for starting this discussion.
> +1 for storing timers in RocksDB by default.
> In the past, when Flink didn't save the timers in RocksDB, I had a headache. 
> I always adjusted parameters carefully to ensure that there was no risk of 
> Out of Memory.
> 
> Just curious, how much impact do heap vs. RocksDB timers have on performance?
> - If there is no order-of-magnitude difference between heap and RocksDB, 
> there is no problem in using RocksDB.
> - If there is, maybe we should improve our documentation to let users know 
> about this option. (It looks like a lot of users didn't know.)
> 
> Best,
> Jingsong Lee
> 
> On Fri, Jan 17, 2020 at 3:18 AM Yun Tang wrote:
> Hi Stephan,
> 
> I am +1 for the change which stores timers in RocksDB by default. 
> 
> Some users hope the checkpoint can be completed as fast as possible, which 
> also needs the timers stored in RocksDB so as not to affect the sync part of 
> the checkpoint.
> 
> Best
> Yun Tang
> From: Andrey Zagrebin
> Sent: Friday, January 17, 2020 0:07
> To: Stephan Ewen
> Cc: dev; user
> Subject: Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in 
> RocksDB
>  
> Hi Stephan,
> 
> Thanks for starting this discussion. I am +1 for this change.
> In general, the number of timer state keys can be of the same order as the 
> number of main state keys.
> So if RocksDB is used for the main state for scalability, it makes sense to 
> have timers there as well,
> unless timers are used for only a very limited subset of keys which fits into 
> memory.
> 
> Best,
> Andrey
> 
> On Thu, Jan 16, 2020 at 4:27 PM Stephan Ewen wrote:
> Hi all!
> 
> I would suggest a change of the current default for timers. A bit of 
> background:
> 
>   - Timers (for windows, process functions, etc.) are state that is managed 
> and checkpointed as well.
>   - When using the MemoryStateBackend and the FsStateBackend, timers are kept 
> on the JVM heap, like regular state.
>   - When using the RocksDBStateBackend, timers can be kept in RocksDB (like 
> other state) or on the JVM heap. The JVM heap is the default though!
> 
> I find this a bit unintuitive and would propose to change this to let the 
> RocksDBStateBackend store all state in RocksDB by default.
> The rationale being that if there is a tradeoff (like here), safe and 
> scalable should be the default and unsafe performance be an explicit choice.
> 
> This sentiment seems to be shared by various users as well, see 
> https://twitter.com/StephanEwen/status/1214590846168903680 
>  and 
> https://twitter.com/StephanEwen/status/1214594273565388801 
> 
> We would of course keep the switch and mention in the performance tuning 
> section that this is an option.
> 
> # RocksDB State Backend Timers on Heap
>   - Pro: faster
>   - Con: not memory safe, GC overhead, longer synchronous checkpoint time, no 
> incremental checkpoints
> 
> # RocksDB State Backend Timers in RocksDB
>   - Pro: safe and scalable, asynchronously and incrementally checkpointed
>   - Con: performance overhead.
> 
> Please chime in and let me know what you think.
> 
> Best,
> Stephan
> 
> 
> 
> -- 
> Best, Jingsong Lee



[DISCUSS] FLINK-15447: Change "java.io.tmpdir" of JM/TM on Yarn to "{{PWD}}/tmp"

2020-01-17 Thread Victor Wong
Hi,

Currently, the "java.io.tmpdir" property of Flink on Yarn is set to
"/tmp", which may cause exceptions in case of a full "/tmp" directory,
especially when the user depends on some third-party libraries using
"java.io.tmpdir"
to store files.

I would like to suggest using the working directory of the Yarn container
as "java.io.tmpdir", e.g. "{{PWD}}/tmp", which is owned by each application
and will be cleaned up after the application finishes.
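
For illustration, a minimal self-contained sketch of where third-party code
typically lands its temp files (hypothetical demo code, not from the ticket):

import java.io.File;
import java.io.IOException;

public class TmpDirDemo {
    public static void main(String[] args) throws IOException {
        // Resolves under java.io.tmpdir: "/tmp" with today's default on Yarn,
        // or under the container working directory ("{{PWD}}/tmp") with this
        // proposal, where it is cleaned up when the application finishes.
        File f = File.createTempFile("flink-", ".tmp");
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("temp file at: " + f.getAbsolutePath());
    }
}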

There are some +1s in the Flink issue [1]. I'd like to ask for more
suggestions on this, or to have the ticket assigned to me if consensus has
emerged.

Thanks!

[1]. https://issues.apache.org/jira/browse/FLINK-15447
-- 

Best,
Victor


[jira] [Created] (FLINK-15631) Cannot use generic types as the result of an AggregateFunction in Blink planner

2020-01-17 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-15631:


 Summary: Cannot use generic types as the result of an 
AggregateFunction in Blink planner
 Key: FLINK-15631
 URL: https://issues.apache.org/jira/browse/FLINK-15631
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.0, 1.10.0
Reporter: Dawid Wysakowicz


It is not possible to use a {{GenericTypeInfo}} as the result type of an 
{{AggregateFunction}} in retract mode with state cleaning disabled.

{code}

  @Test
  def testGenericTypes(): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
val setting = 
EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
val tEnv = StreamTableEnvironment.create(env, setting)
val t = env.fromElements(1, 2, 3).toTable(tEnv, 'a)

val results = t
  .select(new GenericAggregateFunction()('a))
  .toRetractStream[Row]

val sink = new TestingRetractSink
results.addSink(sink).setParallelism(1)
env.execute()
  }

class RandomClass(var i: Int)

class GenericAggregateFunction extends AggregateFunction[java.lang.Integer, 
RandomClass] {
  override def getValue(accumulator: RandomClass): java.lang.Integer = 
accumulator.i

  override def createAccumulator(): RandomClass = new RandomClass(0)

  override def getResultType: TypeInformation[java.lang.Integer] = new 
GenericTypeInfo[Integer](classOf[Integer])

  override def getAccumulatorType: TypeInformation[RandomClass] = new 
GenericTypeInfo[RandomClass](
classOf[RandomClass])

  def accumulate(acc: RandomClass, value: Int): Unit = {
acc.i = value
  }

  def retract(acc: RandomClass, value: Int): Unit = {
acc.i = value
  }

  def resetAccumulator(acc: RandomClass): Unit = {
acc.i = 0
  }
}
{code}

The code above fails with:

{code}
Caused by: java.lang.UnsupportedOperationException: BinaryGeneric cannot be 
compared
at 
org.apache.flink.table.dataformat.BinaryGeneric.equals(BinaryGeneric.java:77)
at GroupAggValueEqualiser$17.equalsWithoutHeader(Unknown Source)
at 
org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:177)
at 
org.apache.flink.table.runtime.operators.aggregate.GroupAggFunction.processElement(GroupAggFunction.java:43)
at 
org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:85)
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:170)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:151)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:128)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:311)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:702)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:527)
at java.lang.Thread.run(Thread.java:748)
{code}

This is related to FLINK-13702.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15630) Improve the environment requirement documentation of the Python API

2020-01-17 Thread Wei Zhong (Jira)
Wei Zhong created FLINK-15630:
-

 Summary: Improve the environment requirement documentation of the 
Python API
 Key: FLINK-15630
 URL: https://issues.apache.org/jira/browse/FLINK-15630
 Project: Flink
  Issue Type: Improvement
Reporter: Wei Zhong
 Fix For: 1.10.0


The current Python API documentation is not very clear about the environment 
requirements. It should be described in more detail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15629) Remove LegacyScheduler class and the consequently unused codes after this removal

2020-01-17 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-15629:
---

 Summary: Remove LegacyScheduler class and the consequently unused 
codes after this removal
 Key: FLINK-15629
 URL: https://issues.apache.org/jira/browse/FLINK-15629
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
Assignee: Zhu Zhu
 Fix For: 1.11.0


This task is to remove the LegacyScheduler class and the code that becomes 
unused as a consequence of this removal.
Note that the scheduling/failover logic in ExecutionGraph will be removed in 
a separate task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15627) Correct the wrong naming of compareMaps in NFATestUtilities

2020-01-17 Thread shuai.xu (Jira)
shuai.xu created FLINK-15627:


 Summary: Correct the wrong naming of compareMaps in 
NFATestUtilities
 Key: FLINK-15627
 URL: https://issues.apache.org/jira/browse/FLINK-15627
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.9.1
Reporter: shuai.xu


The compareMaps method in NFATestUtilities actually compares two lists, so 
renaming it to compareLists would be better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15628) Fix initialize webSubmissionHandlers list in WebSubmissionExtension with correct size

2020-01-17 Thread lining (Jira)
lining created FLINK-15628:
--

 Summary: Fix initialize webSubmissionHandlers list in 
WebSubmissionExtension with correct size
 Key: FLINK-15628
 URL: https://issues.apache.org/jira/browse/FLINK-15628
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Reporter: lining


The list initialization at 
[WebSubmissionExtension.java#L64|https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/WebSubmissionExtension.java#L64] 
needs a capacity of 6.
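
A minimal sketch of the intended fix (only the capacity argument is the
point; the surrounding declaration is paraphrased):

{code:java}
// six handlers are registered, so size the backing list accordingly
webSubmissionHandlers = new ArrayList<>(6);
{code}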



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-17 Thread Congxian Qiu
+1 to storing timers in RocksDB by default.

Storing timers on the heap can cause OOM problems and make the checkpoint
much slower; storing timers in RocksDB can get rid of both.

Best,
Congxian


Biao Liu wrote on Fri, Jan 17, 2020 at 3:10 PM:

> +1
>
> I think that's how it should be. Timers should align with other regular
> state.
>
> If a user wants better performance without memory concerns, the memory or FS
> state backend might be considered. Or maybe we could optimize the
> performance by introducing a specific column family for timers. It could
> have its own tuned options.
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Fri, 17 Jan 2020 at 10:11, Jingsong Li  wrote:
>
>> Hi Stephan,
>>
>> Thanks for starting this discussion.
>> +1 for storing timers in RocksDB by default.
>> In the past, when Flink didn't save the timers in RocksDB, I had a
>> headache. I always adjusted parameters carefully to ensure that there was
>> no risk of Out of Memory.
>>
>> Just curious, how much impact do heap vs. RocksDB timers have on performance?
>> - If there is no order-of-magnitude difference between heap and RocksDB,
>> there is no problem in using RocksDB.
>> - If there is, maybe we should improve our documentation to let users
>> know about this option. (It looks like a lot of users didn't know.)
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Jan 17, 2020 at 3:18 AM Yun Tang  wrote:
>>
>>> Hi Stephan,
>>>
>>> I am +1 for the change which stores timers in RocksDB by default.
>>>
>>> Some users hope the checkpoint can be completed as fast as possible,
>>> which also needs the timers stored in RocksDB so as not to affect the sync
>>> part of the checkpoint.
>>>
>>> Best
>>> Yun Tang
>>> --
>>> *From:* Andrey Zagrebin 
>>> *Sent:* Friday, January 17, 2020 0:07
>>> *To:* Stephan Ewen 
>>> *Cc:* dev ; user 
>>> *Subject:* Re: [DISCUSS] Change default for RocksDB timers: Java Heap
>>> => in RocksDB
>>>
>>> Hi Stephan,
>>>
>>> Thanks for starting this discussion. I am +1 for this change.
>>> In general, the number of timer state keys can be of the same order as the
>>> number of main state keys.
>>> So if RocksDB is used for the main state for scalability, it makes sense to
>>> have timers there as well,
>>> unless timers are used for only a very limited subset of keys which fits
>>> into memory.
>>>
>>> Best,
>>> Andrey
>>>
>>> On Thu, Jan 16, 2020 at 4:27 PM Stephan Ewen  wrote:
>>>
>>> Hi all!
>>>
>>> I would suggest a change of the current default for timers. A bit of
>>> background:
>>>
>>>   - Timers (for windows, process functions, etc.) are state that is
>>> managed and checkpointed as well.
>>>   - When using the MemoryStateBackend and the FsStateBackend, timers are
>>> kept on the JVM heap, like regular state.
>>>   - When using the RocksDBStateBackend, timers can be kept in RocksDB
>>> (like other state) or on the JVM heap. The JVM heap is the default though!
>>>
>>> I find this a bit unintuitive and would propose to change this to let
>>> the RocksDBStateBackend store all state in RocksDB by default.
>>> The rationale being that if there is a tradeoff (like here), safe and
>>> scalable should be the default and unsafe performance be an explicit choice.
>>>
>>> This sentiment seems to be shared by various users as well, see
>>> https://twitter.com/StephanEwen/status/1214590846168903680 and
>>> https://twitter.com/StephanEwen/status/1214594273565388801
>>> We would of course keep the switch and mention in the performance tuning
>>> section that this is an option.
>>>
>>> # RocksDB State Backend Timers on Heap
>>>   - Pro: faster
>>>   - Con: not memory safe, GC overhead, longer synchronous checkpoint
>>> time, no incremental checkpoints
>>>
>>> # RocksDB State Backend Timers in RocksDB
>>>   - Pro: safe and scalable, asynchronously and incrementally checkpointed
>>>   - Con: performance overhead.
>>>
>>> Please chime in and let me know what you think.
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>
>> --
>> Best, Jingsong Lee
>>
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-17 Thread Yang Wang
Congratulations!


Best,
Yang

Terry Wang wrote on Fri, Jan 17, 2020 at 3:28 PM:

> Congratulations!
>
> Best,
> Terry Wang
>
>
>
> On Jan 17, 2020 at 14:09, Biao Liu wrote:
>
> Congrats!
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Fri, 17 Jan 2020 at 13:43, Rui Li  wrote:
>
>> Congratulations Dian, well deserved!
>>
>> On Thu, Jan 16, 2020 at 5:58 PM jincheng sun 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I'm very happy to announce that Dian accepted the offer of the Flink PMC
>>> to become a committer of the Flink project.
>>>
>>> Dian Fu has been contributing to Flink for many years. Dian Fu played an
>>> essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has
>>> contributed several major features, reported and fixed many bugs, spent a
>>> lot of time reviewing pull requests, and also frequently helps out on the
>>> user mailing lists and checks/votes releases.
>>>
>>> Please join me in congratulating Dian on becoming a Flink committer!
>>>
>>> Best,
>>> Jincheng(on behalf of the Flink PMC)
>>>
>>
>>
>> --
>> Best regards!
>> Rui Li
>>
>
>