[jira] [Created] (FLINK-16013) List config option could not be parsed correctly

2020-02-11 Thread Yang Wang (Jira)
Yang Wang created FLINK-16013:
-

 Summary: List config option could not be parsed correctly
 Key: FLINK-16013
 URL: https://issues.apache.org/jira/browse/FLINK-16013
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Configuration
Reporter: Yang Wang


Currently, if a config option of `List` type is written to flink-conf.yaml, it 
cannot be parsed correctly when reloaded from the yaml resource. The root 
cause is that we use `List#toString` to serialize the value into the yaml resource, 
but when parsing a List back from a string we split the value by semicolon.
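
A minimal sketch of the mismatch (illustrative only; the actual read/write logic 
lives in the Configuration/GlobalConfiguration classes):
{code:java}
List<String> values = Arrays.asList("value1", "value2", "value3");

// What currently ends up in flink-conf.yaml, via List#toString:
String written = values.toString();          // "[value1, value2, value3]"

// What the parser expects when reading the value back: a semicolon-separated string
String expected = String.join(";", values);  // "value1;value2;value3"

// Splitting the written form by ';' yields a single garbled element instead of three values
String[] parsed = written.split(";");
{code}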

 

The following is a unit test to reproduce this problem.
{code:java}
@Test
public void testWriteConfigurationAndReload() throws IOException {
    final File flinkConfDir = temporaryFolder.newFolder().getAbsoluteFile();
    final Configuration flinkConfig = new Configuration();
    final ConfigOption<List<String>> listConfigOption = ConfigOptions
        .key("test-list-string-key")
        .stringType()
        .asList()
        .noDefaultValue();

    final List<String> values = Arrays.asList("value1", "value2", "value3");
    flinkConfig.set(listConfigOption, values);
    assertThat(values, Matchers.containsInAnyOrder(flinkConfig.get(listConfigOption).toArray()));

    BootstrapTools.writeConfiguration(flinkConfig, new File(flinkConfDir, "flink-conf.yaml"));
    final Configuration loadedFlinkConfig =
        GlobalConfiguration.loadConfiguration(flinkConfDir.getAbsolutePath());
    assertThat(values, Matchers.containsInAnyOrder(loadedFlinkConfig.get(listConfigOption).toArray()));
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-11 Thread Becket Qin
+1 (binding)

- verified signature
- Ran word count example successfully.

Thanks,

Jiangjie (Becket) Qin

On Wed, Feb 12, 2020 at 1:29 PM Jark Wu  wrote:

> +1
>
> - checked/verified signatures and hashes
> - Pip installed the package successfully: pip install
> apache-flink-1.9.2.tar.gz
> - Run word count example successfully through the documentation [1].
>
> Best,
> Jark
>
> [1]:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/tutorials/python_table_api.html
>
> On Tue, 11 Feb 2020 at 22:00, Hequn Cheng  wrote:
>
> > +1 (non-binding)
> >
> > - Check signature and checksum.
> > - Install package successfully with Pip under Python 3.7.4.
> > - Run wordcount example successfully under Python 3.7.4.
> >
> > Best, Hequn
> >
> > On Tue, Feb 11, 2020 at 12:17 PM Dian Fu  wrote:
> >
> > > +1 (non-binding)
> > >
> > > - Verified the signature and checksum
> > > - Pip installed the package successfully: pip install
> > > apache-flink-1.9.2.tar.gz
> > > - Run word count example successfully.
> > >
> > > Regards,
> > > Dian
> > >
> > > On Feb 11, 2020, at 11:44 AM, jincheng sun  wrote:
> > >
> > >
> > > +1 (binding)
> > >
> > > - Install the PyFlink by `pip install` [SUCCESS]
> > > - Run word_count in both command line and IDE [SUCCESS]
> > >
> > > Best,
> > > Jincheng
> > >
> > >
> > >
> > > On Tue, Feb 11, 2020 at 11:17 AM, Wei Zhong  wrote:
> > >
> > >> Hi,
> > >>
> > >> Thanks for driving this, Jincheng.
> > >>
> > >> +1 (non-binding)
> > >>
> > >> - Verified signatures and checksums.
> > >> - Verified README.md and setup.py.
> > >> - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and
> > Python
> > >> 3.7.5 successfully.
> > >> - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via
> > >> `pyflink-shell.sh local` and try the examples in the help message, run
> > well
> > >> and no exception.
> > >> - Try a word count example in IDE with Python 2.7.15 and Python 3.7.5,
> > >> run well and no exception.
> > >>
> > >> Best,
> > >> Wei
> > >>
> > >>
> > >> On Feb 10, 2020, at 19:12, jincheng sun  wrote:
> > >>
> > >> Hi everyone,
> > >>
> > >> Please review and vote on the release candidate #1 for the PyFlink
> > >> version 1.9.2, as follows:
> > >>
> > >> [ ] +1, Approve the release
> > >> [ ] -1, Do not approve the release (please provide specific comments)
> > >>
> > >> The complete staging area is available for your review, which
> includes:
> > >>
> > >> * the official Apache binary convenience releases to be deployed to
> > >> dist.apache.org [1], which are signed with the key with fingerprint
> > >> 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2] and built from source
> code
> > [3].
> > >>
> > >> The vote will be open for at least 72 hours. It is adopted by majority
> > >> approval, with at least 3 PMC affirmative votes.
> > >>
> > >> Thanks,
> > >> Jincheng
> > >>
> > >> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/
> > >> [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >> [3] https://github.com/apache/flink/tree/release-1.9.2
> > >>
> > >>
> > >
> >
>


[jira] [Created] (FLINK-16012) Reduce the default number of exclusive buffers from 2 to 1 on receiver side

2020-02-11 Thread zhijiang (Jira)
zhijiang created FLINK-16012:


 Summary: Reduce the default number of exclusive buffers from 2 to 
1 on receiver side
 Key: FLINK-16012
 URL: https://issues.apache.org/jira/browse/FLINK-16012
 Project: Flink
  Issue Type: Improvement
Reporter: zhijiang


In order to reduce the in-flight buffers for checkpointing under back 
pressure, we can reduce the number of exclusive buffers per remote input 
channel from the default of 2 to 1 as a first step. As a result, the total number of 
required buffers is also reduced. We can further verify the performance 
effect via various benchmarks.
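
For reference, a minimal sketch of how this value can be overridden, assuming the existing config key {{taskmanager.network.memory.buffers-per-channel}} (the option whose default this ticket proposes to change):
{code:java}
Configuration config = new Configuration();
// Exclusive buffers per remote input channel on the receiver side (current default: 2)
config.setInteger("taskmanager.network.memory.buffers-per-channel", 1);
{code}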



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] Support scalar vectorized Python UDF in PyFlink

2020-02-11 Thread Dian Fu
Hi all,

I'd like to start the vote of FLIP-97[1] which is discussed and reached 
consensus in the discussion thread[2].

The vote will be open for at least 72 hours. Unless there is an objection, I 
will try to close it by Feb 17, 2020 08:00 UTC if we have received sufficient 
votes.

Regards,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-scalar-vectorized-Python-UDF-in-PyFlink-tt37264.html

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-11 Thread Jark Wu
+1

- checked/verified signatures and hashes
- Pip installed the package successfully: pip install
apache-flink-1.9.2.tar.gz
- Run word count example successfully through the documentation [1].

Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/tutorials/python_table_api.html

On Tue, 11 Feb 2020 at 22:00, Hequn Cheng  wrote:

> +1 (non-binding)
>
> - Check signature and checksum.
> - Install package successfully with Pip under Python 3.7.4.
> - Run wordcount example successfully under Python 3.7.4.
>
> Best, Hequn
>
> On Tue, Feb 11, 2020 at 12:17 PM Dian Fu  wrote:
>
> > +1 (non-binding)
> >
> > - Verified the signature and checksum
> > - Pip installed the package successfully: pip install
> > apache-flink-1.9.2.tar.gz
> > - Run word count example successfully.
> >
> > Regards,
> > Dian
> >
> > On Feb 11, 2020, at 11:44 AM, jincheng sun  wrote:
> >
> >
> > +1 (binding)
> >
> > - Install the PyFlink by `pip install` [SUCCESS]
> > - Run word_count in both command line and IDE [SUCCESS]
> >
> > Best,
> > Jincheng
> >
> >
> >
> > On Tue, Feb 11, 2020 at 11:17 AM, Wei Zhong  wrote:
> >
> >> Hi,
> >>
> >> Thanks for driving this, Jincheng.
> >>
> >> +1 (non-binding)
> >>
> >> - Verified signatures and checksums.
> >> - Verified README.md and setup.py.
> >> - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and
> Python
> >> 3.7.5 successfully.
> >> - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via
> >> `pyflink-shell.sh local` and try the examples in the help message, run
> well
> >> and no exception.
> >> - Try a word count example in IDE with Python 2.7.15 and Python 3.7.5,
> >> run well and no exception.
> >>
> >> Best,
> >> Wei
> >>
> >>
> >> On Feb 10, 2020, at 19:12, jincheng sun  wrote:
> >>
> >> Hi everyone,
> >>
> >> Please review and vote on the release candidate #1 for the PyFlink
> >> version 1.9.2, as follows:
> >>
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> The complete staging area is available for your review, which includes:
> >>
> >> * the official Apache binary convenience releases to be deployed to
> >> dist.apache.org [1], which are signed with the key with fingerprint
> >> 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2] and built from source code
> [3].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Thanks,
> >> Jincheng
> >>
> >> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/
> >> [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> [3] https://github.com/apache/flink/tree/release-1.9.2
> >>
> >>
> >
>


Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Dian Fu
Hi Dawid,

Thanks for your reply. I'm also in favor of "col" as a column expression in the 
Python Table API. Regarding to use "$" in the Java/Scala Table API, I'm fine 
with it. So +1 from my side.

Thanks,
Dian

> On Feb 11, 2020, at 9:48 PM, Aljoscha Krettek  wrote:
> 
> +1
> 
> Best,
> Aljoscha
> 
> On 11.02.20 11:17, Jingsong Li wrote:
>> Thanks Dawid for your explanation,
>> +1 for vote.
>> So I am a big +1 to accepting java.lang.Object in the Java DSL; without
>> Scala implicit conversions, a lot of "lit" calls would look unfriendly to users.
>> Best,
>> Jingsong Lee
>> On Tue, Feb 11, 2020 at 6:07 PM Dawid Wysakowicz 
>> wrote:
>>> Hi,
>>> 
>>> To answer some of the questions:
>>> 
>>> @Jingsong We use Objects in the java API to make it possible to use raw
>>> Objects without the need to wrap them in literals. If an expression is
>>> passed it is used as is. If anything else is used, it is assumed to be
>>> an literal and is wrapped into a literal. This way we can e.g. write
>>> $("f0").plus(1).
>>> 
>>> @Jark I think it makes sense to shorten them, I will do it I hope people
>>> that already voted don't mind.
>>> 
>>> @Dian That's a valid concern. I would not discard the '$' as a column
>>> expression for java and scala. I think once we introduce the expression
>>> DSL for python we can add another alias to java/scala. Personally I'd be
>>> in favor of col.
>>> 
>>> On 11/02/2020 10:41, Dian Fu wrote:
 Hi Dawid,
 
 Thanks for driving this feature. The design looks very well for me
>>> overall.
 
 I have only one concern: $ is not allowed to be used in the identifier
>>> of Python and so we have to come out with another symbol when aligning this
>>> feature in the Python Table API. I noticed that there are also other
>>> options proposed in the discussion thread, e.g. ref, col, etc. I think it
>>> would be great if the proposed symbol could be supported in both the
>>> Java/Scala and Python Table API. What's your thoughts?
 
 Regards,
 Dian
 
> On Feb 11, 2020, at 11:13 AM, Jark Wu  wrote:
> 
> +1 for this.
> 
> I have some minor comments:
> - I'm +1 to use $ in both Java and Scala API.
> - I'm +1 to use lit(), Spark also provides lit() function to create a
> literal value.
> - Is it possible to have `isGreater` instead of `isGreaterThan` and
> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in
>>> BaseExpressions?
> 
> Best,
> Jark
> 
> On Tue, 11 Feb 2020 at 10:21, Jingsong Li 
>>> wrote:
> 
>> Hi Dawid,
>> 
>> Thanks for driving.
>> 
>> - adding $ in scala api looks good to me.
>> - Just a question, what should be expected to java.lang.Object? literal
>> object or expression? So the Object is the grammatical sugar of
>>> literal?
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther 
>>> wrote:
>> 
>>> +1 for this.
>>> 
>>> It will also help in making a TableEnvironment.fromElements() possible
>>> and reduces technical debt. One entry point of TypeInformation less in
>>> the API.
>>> 
>>> Regards,
>>> Timo
>>> 
>>> 
>>> On 10.02.20 08:31, Dawid Wysakowicz wrote:
 Hi all,
 
 I wanted to resurrect the thread about introducing a Java Expression
 DSL. Please see the updated flip page[1]. Most of the flip was
>> concluded
 in previous discussion thread. The major changes since then are:
 
 * accepting java.lang.Object in the Java DSL
 
 * adding $ interpolation for a column in the Scala DSL
 
 I think it's important to move those changes forward as it makes it
 easier to transition to the new type system (Java parser supports
>>> only
 the old type system stack for now) that we are working on for the
>>> past
 releases.
 
 Because the previous discussion thread was rather conclusive I want
>>> to
 start already with a vote. If you think we need another round of
 discussion, feel free to say so.
 
 
 The vote will last for at least 72 hours, following the consensus
>> voting
 process.
 
 FLIP wiki:
 
 [1]
 
>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
 
 Discussion thread:
 
 
>> 
>>> https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
 
 
 
>>> 
>> --
>> Best, Jingsong Lee
>> 
>>> 
>>> 



[jira] [Created] (FLINK-16011) Normalize the within usage in Pattern

2020-02-11 Thread shuai.xu (Jira)
shuai.xu created FLINK-16011:


 Summary: Normalize the within usage in Pattern
 Key: FLINK-16011
 URL: https://issues.apache.org/jira/browse/FLINK-16011
 Project: Flink
  Issue Type: Improvement
  Components: Library / CEP
Affects Versions: 1.9.0
Reporter: shuai.xu


In CEP, we can use Pattern.within() to set a window in which the pattern should 
be matched. However, the usage of within() is ambiguous and confusing to users.

For example:
 # Pattern.begin("a").within(t1).followedBy("b").within(t2) will use the 
minimal of t1 and t2 as the window time for the whole pattern.
 # Pattern.begin("a").followedBy("b").within(t2) will use t2 as the window time.
 # But Pattern.begin("a").within(t1).followedBy("b") will have no window time
 # While 
Pattern.begin("a").notFollowedBy("not").within(t1).followedBy("b").within(t2) 
will use t2 as the window time.

So I propose to normalize the usage of within() and perform strict checks when 
compiling the pattern.

For example, we could allow within() only at the end of the pattern and report an 
error during pattern compilation if the user sets it anywhere else.
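
A sketch of the stricter usage this proposal implies (illustrative only; {{Event}} is a placeholder type, the rest follows the existing CEP Pattern API):
{code:java}
// Allowed: within() appears once, at the end of the whole pattern
Pattern<Event, ?> allowed = Pattern.<Event>begin("a")
    .followedBy("b")
    .within(Time.minutes(10));

// Rejected under the proposal: within() in the middle of the pattern
Pattern<Event, ?> rejected = Pattern.<Event>begin("a")
    .within(Time.minutes(5))   // would be reported as an error when compiling the pattern
    .followedBy("b");
{code}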



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-11 Thread godfreyhe
Hi everyone,

I'd like to resume the discussion for FLIP-84 [0]. I have updated the
document; the main changes are:

1. about "`void sqlUpdate(String sql)`" section
  a) change "Optional executeSql(String sql) throws Exception"
to "ResultTable executeStatement(String statement, String jobName) throws
Exception". The reason is: "statement" is a more general concept than "sql",
e.g. "show xx" is not a sql command (refer to [1]), but is a statement (just
like JDBC). "insert" statement also has return value which is the affected
row count, we can unify the return type to "ResultTable" instead of
"Optional".
  b) add two sub-interfaces for "ResultTable": "RowResultTable" is used for
non-streaming select statement and will not contain change flag;
"RowWithChangeFlagResultTable" is used for streaming select statement and
will contain change flag.

2) about "Support batch sql execute and explain" section
introduce "DmlBatch" to support both sql and Table API (which is borrowed
from the ideas Dawid mentioned in the slack)

interface TableEnvironment {
    DmlBatch startDmlBatch();
}

interface DmlBatch {
    /**
     * Add an insert statement to the batch.
     */
    void addInsert(String insert);

    /**
     * Add a Table with the given sink name to the batch.
     */
    void addInsert(String sinkName, Table table);

    /**
     * Execute the dml statements as a batch.
     */
    ResultTable execute(String jobName) throws Exception;

    /**
     * Returns the AST and the execution plan to compute the result of the
     * batch dml statements.
     */
    String explain(boolean extended);
}
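
A rough usage sketch of the proposed interface (tEnv, the statements and the
job name below are illustrative only):

DmlBatch batch = tEnv.startDmlBatch();
batch.addInsert("INSERT INTO sink1 SELECT * FROM src");
batch.addInsert("sink2", table);
ResultTable result = batch.execute("my-batch-job");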

3) about "Discuss a parse method for multiple statements execute in SQL CLI"
section
add the pros and cons for each solution

4) update the "Examples" section and "Summary" section based on the above
changes

Please refer to the design doc [0] for more details; any feedback is welcome.

Bests,
godfreyhe


[0]
https://docs.google.com/document/d/19-mdYJjKirh5aXCwq1fDajSaI09BJMMT95wy_YhtuZk/edit
[1] https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


[jira] [Created] (FLINK-16010) Support notFollowedBy with interval as the last part of a Pattern

2020-02-11 Thread shuai.xu (Jira)
shuai.xu created FLINK-16010:


 Summary: Support notFollowedBy with interval as the last part of a 
Pattern
 Key: FLINK-16010
 URL: https://issues.apache.org/jira/browse/FLINK-16010
 Project: Flink
  Issue Type: New Feature
  Components: Library / CEP
Affects Versions: 1.9.0
Reporter: shuai.xu


Now, Pattern.begin("a").notFollowedBy("b") is not allowed in CEP. But this a 
useful in many applications. Such as operators may want to find the users who 
created an order but didn't pay in 10 minutes.

So I propose to support that notFollowedBy() with a interval can be the last 
part of a Pattern. 

For example,  Pattern.begin("a").notFollowedBy("b").within(Time.minutes(10)) 
will be valid in the future.

The discussion in the dev mailing list is 
[https://lists.apache.org/thread.html/rc505728048663d672ad379578ac67d3219f6076986c80a2362802ebb%40%3Cdev.flink.apache.org%3E]
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16009) Native support for the Variable-length and Zig-zag Variable-length integers

2020-02-11 Thread Yun Gao (Jira)
Yun Gao created FLINK-16009:
---

 Summary: Native support for the Variable-length and Zig-zag 
Variable-length integers
 Key: FLINK-16009
 URL: https://issues.apache.org/jira/browse/FLINK-16009
 Project: Flink
  Issue Type: Improvement
  Components: API / Type Serialization System
Reporter: Yun Gao


Currently Flink only supports fixed-length integers. However, in many cases the 
values of integer fields tend to be small, and we could reduce the size of 
serialized values by using [Variable-length 
encoding|https://developers.google.com/protocol-buffers/docs/encoding#varints] 
and [Zig-zag variable-length 
encoding|https://developers.google.com/protocol-buffers/docs/encoding#signed-integers].
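
A minimal sketch of the two encodings (illustrative only; not tied to Flink's serializer internals, following the protobuf description linked above):
{code:java}
// ZigZag-encode a signed int so that values of small magnitude map to small unsigned values
public static int zigZagEncode(int n) {
    return (n << 1) ^ (n >> 31);
}

public static int zigZagDecode(int n) {
    return (n >>> 1) ^ -(n & 1);
}

// Variable-length (varint) encoding: 7 bits per byte, the high bit marks a continuation
public static void writeVarInt(int value, java.io.DataOutput out) throws java.io.IOException {
    while ((value & ~0x7F) != 0) {
        out.writeByte((value & 0x7F) | 0x80);
        value >>>= 7;
    }
    out.writeByte(value);
}
{code}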
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16008) Add rules to transpose the join condition of Python Correlate node

2020-02-11 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-16008:


 Summary: Add rules to transpose the  join condition of Python 
Correlate node
 Key: FLINK-16008
 URL: https://issues.apache.org/jira/browse/FLINK-16008
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.11.0


Because the conditions cannot be executed in the Python Correlate node, this rule 
will transpose the conditions to after the Python Correlate node if the join type 
is inner join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16007) Add rules to push down the Java Calls contained in Python Correlate node

2020-02-11 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-16007:


 Summary: Add rules to push down the Java Calls contained in Python 
Correlate node
 Key: FLINK-16007
 URL: https://issues.apache.org/jira/browse/FLINK-16007
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.11.0


The Java calls contained in the Python Correlate node should be extracted to make 
sure the TableFunction works well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16006) Add host blacklist support for Flink YarnResourceManager

2020-02-11 Thread Zhenqiu Huang (Jira)
Zhenqiu Huang created FLINK-16006:
-

 Summary: Add host blacklist support for Flink YarnResourceManager
 Key: FLINK-16006
 URL: https://issues.apache.org/jira/browse/FLINK-16006
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Affects Versions: 1.10.0
Reporter: Zhenqiu Huang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16005) Propagate yarn.application.classpath from client to TaskManager Classpath

2020-02-11 Thread Zhenqiu Huang (Jira)
Zhenqiu Huang created FLINK-16005:
-

 Summary: Propagate yarn.application.classpath from client to 
TaskManager Classpath
 Key: FLINK-16005
 URL: https://issues.apache.org/jira/browse/FLINK-16005
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Zhenqiu Huang


When Flink users want to override the Hadoop YARN container classpath, they 
should only need to specify yarn.application.classpath in yarn-site.xml on the client 
(CLI) side. But currently, the classpath setting is only applied to the Flink 
application master; the classpath of the TaskManagers is still determined by the 
setting on the YARN host.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16004) Exclude flink-rocksdb-state-memory-control-test from the dist

2020-02-11 Thread Yu Li (Jira)
Yu Li created FLINK-16004:
-

 Summary: Exclude flink-rocksdb-state-memory-control-test from the 
dist
 Key: FLINK-16004
 URL: https://issues.apache.org/jira/browse/FLINK-16004
 Project: Flink
  Issue Type: Task
  Components: Tests
Affects Versions: 1.10.0
Reporter: Yu Li
 Fix For: 1.10.1, 1.11.0


Currently {{flink-rocksdb-state-memory-control-test}} will be included in the 
dist as shown 
[here|https://repository.apache.org/content/repositories/orgapacheflink-1333/org/apache/flink/flink-rocksdb-state-memory-control-test/].
 We should remove it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[CANCELED][VOTE] Release flink-shaded 10.0, release candidate #1

2020-02-11 Thread Chesnay Schepler
The vote is hereby cancelled since the NOTICE file entries for the new 
zookeeper modules are not correct.


On 11/02/2020 16:43, Hequn Cheng wrote:

Hi Chesnay,

One thing I need to double-check with you:
it seems the zookeeper version is not correct in the NOTICE file
for flink-shaded-zookeeper-3.5. Should the version be 3.5.6?

Best, Hequn

On Sun, Feb 9, 2020 at 6:31 PM Chesnay Schepler  wrote:


Hi everyone,
Please review and vote on the release candidate #1 for the version 10.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org
[2], which are signed with the key with fingerprint 11D464BA [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-10.0-rc1" [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Chesnay

[1]

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
[2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1334
[5]

https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc1
[6] https://github.com/apache/flink-web/pull/304






Re: [VOTE] Release flink-shaded 10.0, release candidate #1

2020-02-11 Thread Hequn Cheng
Hi Chesnay,

One thing I need to double-check with you:
it seems the zookeeper version is not correct in the NOTICE file
for flink-shaded-zookeeper-3.5. Should the version be 3.5.6?

Best, Hequn

On Sun, Feb 9, 2020 at 6:31 PM Chesnay Schepler  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 10.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which are signed with the key with fingerprint 11D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-10.0-rc1" [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Chesnay
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346746
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-10.0-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1334
> [5]
>
> https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=tag;h=refs/tags/release-10.0-rc1
> [6] https://github.com/apache/flink-web/pull/304
>
>


[jira] [Created] (FLINK-16003) ShardConsumer errors cannot be logged

2020-02-11 Thread Ori Popowski (Jira)
Ori Popowski created FLINK-16003:


 Summary: ShardConsumer errors cannot be logged
 Key: FLINK-16003
 URL: https://issues.apache.org/jira/browse/FLINK-16003
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.6.3
Reporter: Ori Popowski


Some of the errors in the Kinesis connector show up in the Flink UI but are not 
logged. This causes a serious problem since we cannot see them in our log 
aggregation platform and cannot create alerts on them. One of the errors is 
the following:

 
{code:java}
java.lang.RuntimeException: Rate Exceeded for getRecords operation - all 3 
retry attempts returned ProvisionedThroughputExceededException.
 at 
org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.getRecords(KinesisProxy.java:234)
 at 
org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.getRecords(ShardConsumer.java:311)
 at 
org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:219)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748){code}
 

 

The code shows that this exception is not caught, hence it is not logged and 
can be detected only from the "Exceptions" tab in the Flink UI. Users of the 
connector cannot leverage logging and metrics when such an exception occurs.

It can be useful to catch all the throwables from the ShardConsumer's run() 
method and log them.
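
A minimal illustration of the suggested change (hypothetical sketch; the field names and the actual error-propagation path in ShardConsumer differ):
{code:java}
@Override
public void run() {
    try {
        // existing shard-consuming loop ...
    } catch (Throwable t) {
        // Log before propagating, so the error reaches log aggregation and not only the UI
        LOG.error("Error while consuming shard {}.", subscribedShard, t);
        fetcherRef.stopWithError(t);  // hypothetical propagation call
    }
}
{code}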

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-11 Thread Gyula Fóra
Hi All!

I have made a prototype that simply adds a getPipeline() method to the
JobClient interface. Then I could easily implement the Atlas hook using the
JobListener interface. I simply check if Pipeline is instanceof StreamGraph
and do the logic there.
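
A rough sketch of what such a hook could look like, assuming the proposed
JobClient#getPipeline() addition (getPipeline() is not part of the current
API; all names below are illustrative):

public class AtlasJobListener implements JobListener {

    @Override
    public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
        if (throwable != null) {
            return;
        }
        Pipeline pipeline = jobClient.getPipeline(); // proposed addition
        if (pipeline instanceof StreamGraph) {
            // walk the StreamGraph and publish source/sink metadata to Atlas
        }
    }

    @Override
    public void onJobExecuted(JobExecutionResult jobExecutionResult, Throwable throwable) {
        // no-op for lineage reporting
    }
}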

I think this is so far the cleanest approach and I much prefer this
compared to working on the JobGraph directly which would expose even more
messy internals.

Unfortunately this change alone is not enough for the integration, as we
need to make sure that all sources/sinks that we want to integrate with
Atlas publicly expose some of their properties:

   - Kafka source/sink:
  - Kafka props
  - Topic(s) - this is tricky for sinks
   - FS source /sink:
  - Hadoop props
  - Base path for StreamingFileSink
  - Path for ContinuousMonitoringSource

Most of these are straightforward changes, the only question is what we
want to register in Atlas from the available connectors. Ideally users
could also somehow register their own Atlas metadata for custom sources and
sinks, we could probably introduce an interface for that in Atlas.

Cheers,
Gyula

On Fri, Feb 7, 2020 at 10:37 AM Gyula Fóra  wrote:

> Maybe we could improve the Pipeline interface in the long run, but as a
> temporary solution the JobClient could expose a getPipeline() method.
>
> That way the implementation of the JobListener could check if its a
> StreamGraph or a Plan.
>
> How bad does that sound?
>
> Gyula
>
> On Fri, Feb 7, 2020 at 10:19 AM Gyula Fóra  wrote:
>
>> Hi Aljoscha!
>>
>> That's a valid concert but we should try to figure something out, many
>> users need this before they can use Flink.
>>
>> I think the closest thing we have right now is the StreamGraph. In
>> contrast with the JobGraph  the StreamGraph is pretty nice from a metadata
>> perspective :D
>> The big downside of exposing the StreamGraph is that we don't have it in
>> batch. On the other hand we could expose the JobGraph but then the
>> integration component would still have to do the heavy lifting for batch
>> and stream specific operators and UDFs.
>>
>> Instead of exposing either StreamGraph/JobGraph, we could come up with a
>> metadata like representation for the users but that would be like
>> implementing Atlas integration itself without Atlas dependencies :D
>>
>> As a comparison point, this is how it works in Storm:
>> Every operator (spout/bolt), stores a config map (string->string) with
>> all the metadata such as operator class, and the operator specific configs.
>> The Atlas hook works on this map.
>> This is very fragile and depends on a lot of internals. Kind of like
>> exposing the JobGraph but much worse. I think we can do better.
>>
>> Gyula
>>
>> On Fri, Feb 7, 2020 at 9:55 AM Aljoscha Krettek 
>> wrote:
>>
>>> If we need it, we can probably beef up the JobListener to allow
>>> accessing some information about the whole graph or sources and sinks.
>>> My only concern right now is that we don't have a stable interface for
>>> our job graphs/pipelines right now.
>>>
>>> Best,
>>> Aljoscha
>>>
>>> On 06.02.20 23:00, Gyula Fóra wrote:
>>> > Hi Jeff & Till!
>>> >
>>> > Thanks for the feedback, this is exactly the discussion I was looking
>>> for.
>>> > The JobListener looks very promising if we can expose the JobGraph
>>> somehow
>>> > (correct me if I am wrong but it is not accessible at the moment).
>>> >
>>> > I did not know about this feature that's why I added my JobSubmission
>>> hook
>>> > which was pretty similar but only exposing the JobGraph. In general I
>>> like
>>> > the listener better and I would not like to add anything extra if we
>>> can
>>> > avoid it.
>>> >
>>> > Actually the bigger part of the integration work that will need more
>>> > changes in Flink will be regarding the accessibility of sources/sinks
>>> from
>>> > the JobGraph and their specific properties. For instance at the moment
>>> the
>>> > Kafka sources and sinks do not expose anything publicly such as topics,
>>> > kafka configs, etc. Same goes for other data connectors that we need to
>>> > integrate in the long run. I guess there will be a separate thread on
>>> this
>>> > once we iron out the initial integration points :)
>>> >
>>> > I will try to play around with the JobListener interface tomorrow and
>>> see
>>> > if I can extend it to meet our needs.
>>> >
>>> > Cheers,
>>> > Gyula
>>> >
>>> > On Thu, Feb 6, 2020 at 4:08 PM Jeff Zhang  wrote:
>>> >
>>> >> Hi Gyula,
>>> >>
>>> >> Flink 1.10 introduced JobListener which is invoked after job
>>> submission and
>>> >> finished.  May we can add api on JobClient to get what info you
>>> needed for
>>> >> altas integration.
>>> >>
>>> >>
>>> >>
>>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L46
>>> >>
>>> >>
>>> >>> On Wed, Feb 5, 2020 at 7:48 PM, Gyula Fóra  wrote:
>>> >>
>>> >>> Hi all!
>>> >>>
>>> >>> We have started some preliminary work on the Flink - Atlas
>>> integration at
>>> >>> Cloudera. I

Add support for IAM Roles for Service Accounts in AWS EKS (Web Identity)

2020-02-11 Thread Rafi Aroch
Hi,

IAM Roles for Service Accounts have many advantages when deploying Flink on
AWS EKS.

From the AWS documentation:

> With IAM roles for service accounts on Amazon EKS clusters, you can
> associate an IAM role with a Kubernetes service account. This service
> account can then provide AWS permissions to the containers in any pod that
> uses that service account. With this feature, you no longer need to provide
> extended permissions to the worker node IAM role so that pods on that node
> can call AWS APIs.


As Kubernetes becomes the popular deployment method, I believe we should
support this capability.

In order for IAM Roles for Service Accounts to work, I see two necessary
changes:

   - Bump the AWS SDK version to at least:  1.11.623.
   - Add dependency to AWS STS in order for the assume-role to work.

This is relevant for S3 Filesystem & Kinesis modules.

There is already an issue open:
https://issues.apache.org/jira/browse/FLINK-14881

Can I go ahead and create a pull request to add this?

Thanks,
Rafi


[jira] [Created] (FLINK-16002) Add stateful functions test utilities

2020-02-11 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-16002:


 Summary: Add stateful functions test utilities
 Key: FLINK-16002
 URL: https://issues.apache.org/jira/browse/FLINK-16002
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Seth Wiesman


The Stateful Functions SDK does not have any dependencies on the Flink runtime. 
We should offer some test utilities that provide a minimal synchronous 
runtime implementation to facilitate simple unit testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16001) Avoid using Java Streams in construction of ExecutionGraph

2020-02-11 Thread Jiayi Liao (Jira)
Jiayi Liao created FLINK-16001:
--

 Summary: Avoid using Java Streams in construction of ExecutionGraph
 Key: FLINK-16001
 URL: https://issues.apache.org/jira/browse/FLINK-16001
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Jiayi Liao


I think we should avoid {{Java Streams}} in the construction of the {{ExecutionGraph}}, 
e.g. in functions like {{toPipelinedRegionsSet}} in {{PipelinedRegionComputeUtil}}, 
because job submission is definitely performance sensitive, especially when 
{{distinctRegions}} has a large cardinality.

This also applies to some other places in the package 
{{org.apache.flink.runtime.executiongraph}}.

cc [~trohrmann] [~zhuzh] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-11 Thread Hequn Cheng
+1 (non-binding)

- Check signature and checksum.
- Install package successfully with Pip under Python 3.7.4.
- Run wordcount example successfully under Python 3.7.4.

Best, Hequn

On Tue, Feb 11, 2020 at 12:17 PM Dian Fu  wrote:

> +1 (non-binding)
>
> - Verified the signature and checksum
> - Pip installed the package successfully: pip install
> apache-flink-1.9.2.tar.gz
> - Run word count example successfully.
>
> Regards,
> Dian
>
> On Feb 11, 2020, at 11:44 AM, jincheng sun  wrote:
>
>
> +1 (binding)
>
> - Install the PyFlink by `pip install` [SUCCESS]
> - Run word_count in both command line and IDE [SUCCESS]
>
> Best,
> Jincheng
>
>
>
> On Tue, Feb 11, 2020 at 11:17 AM, Wei Zhong  wrote:
>
>> Hi,
>>
>> Thanks for driving this, Jincheng.
>>
>> +1 (non-binding)
>>
>> - Verified signatures and checksums.
>> - Verified README.md and setup.py.
>> - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and Python
>> 3.7.5 successfully.
>> - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via
>> `pyflink-shell.sh local` and try the examples in the help message, run well
>> and no exception.
>> - Try a word count example in IDE with Python 2.7.15 and Python 3.7.5,
>> run well and no exception.
>>
>> Best,
>> Wei
>>
>>
>> On Feb 10, 2020, at 19:12, jincheng sun  wrote:
>>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the PyFlink
>> version 1.9.2, as follows:
>>
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> The complete staging area is available for your review, which includes:
>>
>> * the official Apache binary convenience releases to be deployed to
>> dist.apache.org [1], which are signed with the key with fingerprint
>> 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2] and built from source code [3].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Jincheng
>>
>> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/
>> [2] https://dist.apache.org/repos/dist/release/flink/KEYS
>> [3] https://github.com/apache/flink/tree/release-1.9.2
>>
>>
>


[jira] [Created] (FLINK-16000) Move "Project Build Setup" to "Getting Started" in documentation

2020-02-11 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-16000:


 Summary: Move "Project Build Setup" to "Getting Started" in 
documentation
 Key: FLINK-16000
 URL: https://issues.apache.org/jira/browse/FLINK-16000
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Aljoscha Krettek

+1

Best,
Aljoscha

On 11.02.20 11:17, Jingsong Li wrote:

Thanks Dawid for your explanation,

+1 for vote.

So I am a big +1 to accepting java.lang.Object in the Java DSL; without
Scala implicit conversions, a lot of "lit" calls would look unfriendly to users.

Best,
Jingsong Lee

On Tue, Feb 11, 2020 at 6:07 PM Dawid Wysakowicz 
wrote:


Hi,

To answer some of the questions:

@Jingsong We use Objects in the java API to make it possible to use raw
Objects without the need to wrap them in literals. If an expression is
passed it is used as is. If anything else is used, it is assumed to be
an literal and is wrapped into a literal. This way we can e.g. write
$("f0").plus(1).

@Jark I think it makes sense to shorten them, I will do it I hope people
that already voted don't mind.

@Dian That's a valid concern. I would not discard the '$' as a column
expression for java and scala. I think once we introduce the expression
DSL for python we can add another alias to java/scala. Personally I'd be
in favor of col.

On 11/02/2020 10:41, Dian Fu wrote:

Hi Dawid,

Thanks for driving this feature. The design looks very well for me

overall.


I have only one concern: $ is not allowed to be used in the identifier

of Python and so we have to come out with another symbol when aligning this
feature in the Python Table API. I noticed that there are also other
options proposed in the discussion thread, e.g. ref, col, etc. I think it
would be great if the proposed symbol could be supported in both the
Java/Scala and Python Table API. What's your thoughts?


Regards,
Dian


On Feb 11, 2020, at 11:13 AM, Jark Wu  wrote:

+1 for this.

I have some minor comments:
- I'm +1 to use $ in both Java and Scala API.
- I'm +1 to use lit(), Spark also provides lit() function to create a
literal value.
- Is it possible to have `isGreater` instead of `isGreaterThan` and
`isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in

BaseExpressions?


Best,
Jark

On Tue, 11 Feb 2020 at 10:21, Jingsong Li 

wrote:



Hi Dawid,

Thanks for driving.

- adding $ in scala api looks good to me.
- Just a question, what should be expected to java.lang.Object? literal
object or expression? So the Object is the grammatical sugar of

literal?


Best,
Jingsong Lee

On Mon, Feb 10, 2020 at 9:40 PM Timo Walther 

wrote:



+1 for this.

It will also help in making a TableEnvironment.fromElements() possible
and reduces technical debt. One entry point of TypeInformation less in
the API.

Regards,
Timo


On 10.02.20 08:31, Dawid Wysakowicz wrote:

Hi all,

I wanted to resurrect the thread about introducing a Java Expression
DSL. Please see the updated flip page[1]. Most of the flip was

concluded

in previous discussion thread. The major changes since then are:

* accepting java.lang.Object in the Java DSL

* adding $ interpolation for a column in the Scala DSL

I think it's important to move those changes forward as it makes it
easier to transition to the new type system (Java parser supports

only

the old type system stack for now) that we are working on for the

past

releases.

Because the previous discussion thread was rather conclusive I want

to

start already with a vote. If you think we need another round of
discussion, feel free to say so.


The vote will last for at least 72 hours, following the consensus

voting

process.

FLIP wiki:

[1]




https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL


Discussion thread:





https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E







--
Best, Jingsong Lee








[jira] [Created] (FLINK-15999) Extract “Concepts” material from API/Library sections and start proper concepts section

2020-02-11 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-15999:


 Summary: Extract “Concepts” material from API/Library sections and 
start proper concepts section
 Key: FLINK-15999
 URL: https://issues.apache.org/jira/browse/FLINK-15999
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15998) Revert rename of "Job Cluster" to "Application Cluster" in documentation

2020-02-11 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-15998:


 Summary: Revert rename of "Job Cluster" to "Application Cluster" 
in documentation
 Key: FLINK-15998
 URL: https://issues.apache.org/jira/browse/FLINK-15998
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.10.0, 1.11.0
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
 Fix For: 1.10.1, 1.11.0


[~kkl0u] and I are working on an upcoming FLIP that will propose a "real" 
application mode. The gist of it will be that an application cluster is roughly 
responsible for all the jobs that are executed in a user {{main()}} method. 
Each individual invocation of {{execute()}} in said {{main()}} function would 
launch a job on the aforementioned application cluster. The {{main()}} method 
would run inside the cluster (or not, that's up for discussion).

 FLINK-12625 introduced a glossary that describes what was previously known as 
a "Job Cluster" as an "Application Cluster". We think that in the future users 
will have both: per-job clusters and application clusters. We therefore think 
that we should describe our current per-job clusters as Per-job clusters (or 
job clusters) in the glossary and documentation and reserve the name 
application cluster for the "real" forthcoming application clusters.

This would also render FLINK-12885 outdated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15997) Make documentation 404 page look like a documentation page

2020-02-11 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-15997:


 Summary: Make documentation 404 page look like a documentation page
 Key: FLINK-15997
 URL: https://issues.apache.org/jira/browse/FLINK-15997
 Project: Flink
  Issue Type: Sub-task
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15996) Provide DataStream API for JDBC sinks

2020-02-11 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-15996:
-

 Summary: Provide DataStream API for JDBC sinks
 Key: FLINK-15996
 URL: https://issues.apache.org/jira/browse/FLINK-15996
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: Roman Khachatryan


Currently, JDBC sinks can be used with Table API.

For consistency (with the planned exactly-once datastream sink) and code reuse, 
expose them at the DataStream API level.

This involves refactoring of the implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15995) TO_BASE64 sql operator change operandTypeChecker from OperandTypes.ANY to OperandTypes.family(SqlTypeFamily.STRING)

2020-02-11 Thread hailong wang (Jira)
hailong wang created FLINK-15995:


 Summary: TO_BASE64 sql operator change operandTypeChecker from 
OperandTypes.ANY to OperandTypes.family(SqlTypeFamily.STRING) 
 Key: FLINK-15995
 URL: https://issues.apache.org/jira/browse/FLINK-15995
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.10.0
Reporter: hailong wang
 Fix For: 1.11.0


For the TO_BASE64 sql operator, the operandTypeChecker is OperandTypes.ANY, 
which is too permissive. I think we should change it to 
OperandTypes.family(SqlTypeFamily.STRING).
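
For illustration, this is roughly how such an operand type checker is declared on a Calcite SqlFunction (sketch only, with Calcite classes assumed on the classpath; the actual definition lives in the Flink planner):
{code:java}
new SqlFunction(
    "TO_BASE64",
    SqlKind.OTHER_FUNCTION,
    ReturnTypes.explicit(SqlTypeName.VARCHAR),
    null,
    OperandTypes.family(SqlTypeFamily.STRING),
    SqlFunctionCategory.STRING);
{code}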

If users use

 
{code:java}
testSqlApi("to_base64(11)", "AQIDBA=="){code}
 

it will throw 
{code:java}
Caused by: org.apache.flink.api.common.InvalidProgramException: Table program 
cannot be compiled. This is a bug. Please file an issue.Caused by: 
org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
compiled. This is a bug. Please file an issue. at 
org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:81)
 at 
org.apache.flink.table.runtime.generated.CompileUtils.lambda$compile$1(CompileUtils.java:66)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
 ... 30 moreCaused by: org.codehaus.commons.compiler.CompileException: Line 
206, Column 150: No applicable constructor/method found for actual parameters 
"int"; candidates are: "public static java.lang.String 
org.apache.flink.table.runtime.functions.SqlFunctionUtils.toBase64(org.apache.flink.table.dataformat.BinaryString)",
 "public static java.lang.String 
org.apache.flink.table.runtime.functions.SqlFunctionUtils.toBase64(byte[])" at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)
{code}
This is confusing to users.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15994) Support byte array arguments for FROM_BASE64 function

2020-02-11 Thread hailong wang (Jira)
hailong wang created FLINK-15994:


 Summary: Support byte array arguments for FROM_BASE64 function
 Key: FLINK-15994
 URL: https://issues.apache.org/jira/browse/FLINK-15994
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.10.0
Reporter: hailong wang
 Fix For: 1.11.0


For the following sql,
{code:java}
testSqlApi("FROM_BASE64(cast(x'6147567362473867643239796247513D' as 
varbinary))",
 "hello world"){code}
it will throw:
{code:java}
Caused by: 
org.apache.flink.shaded.guava18.com.google.common.util.concurrent.UncheckedExecutionException:
 org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
compiled. This is a bug. Please file an issue.Caused by: 
org.apache.flink.shaded.guava18.com.google.common.util.concurrent.UncheckedExecutionException:
 org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
compiled. This is a bug. Please file an issue. at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
 at 
org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:66)
 ... 27 moreCaused by: org.apache.flink.api.common.InvalidProgramException: 
Table program cannot be compiled. This is a bug. Please file an issue. at 
org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:81)
 at 
org.apache.flink.table.runtime.generated.CompileUtils.lambda$compile$1(CompileUtils.java:66)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
 at 
org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
 ... 30 moreCaused by: org.codehaus.commons.compiler.CompileException: Line 
114, Column 94: No applicable constructor/method found for actual parameters 
"byte[]"; candidates are: "public static 
org.apache.flink.table.dataformat.BinaryString 
org.apache.flink.table.runtime.functions.SqlFunctionUtils.fromBase64(org.apache.flink.table.dataformat.BinaryString)"
 at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124) at 
org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9176)
 at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9036) at 
org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:8938)
{code}
The reason is that we have not implemented the SqlFunctionUtils.fromBase64(byte[] 
bytes) method.
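
A sketch of what the missing overload could look like (illustrative only; the actual implementation in SqlFunctionUtils may differ):
{code:java}
public static BinaryString fromBase64(byte[] bytes) {
    return BinaryString.fromBytes(java.util.Base64.getDecoder().decode(bytes));
}
{code}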

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15993) Add timeout to 404 documentation redirect, add explanation

2020-02-11 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-15993:


 Summary: Add timeout to 404 documentation redirect, add explanation
 Key: FLINK-15993
 URL: https://issues.apache.org/jira/browse/FLINK-15993
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Support scalar vectorized Python UDF in PyFlink

2020-02-11 Thread Dian Fu
Hi all,

Thank you all for participating in this discussion and sharing your thoughts. It
seems that we have reached consensus on the design now. I will start a VOTE
thread if there is no other feedback.

Thanks,
Dian

On Tue, Feb 11, 2020 at 10:23 AM Dian Fu  wrote:

> Hi Jingsong,
>
> You're right. I have updated the FLIP which reflects this.
>
> Thanks,
> Dian
>
> > > On Feb 11, 2020, at 10:03 AM, Jingsong Li  wrote:
> >
> > Hi Dian and Jincheng,
> >
> > Thanks for your explanation. Thinking again, maybe most users don't want
> > to modify these parameters.
> > We all realize that "batch.size" should be a larger value, so
> "bundle.size"
> > must also be increased. Now the default value of "bundle.size" is only
> 1000.
> > I think you can update design to provide meaningful default value for
> > "batch.size" and "bundle.size".
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Feb 10, 2020 at 4:36 PM Dian Fu  wrote:
> >
> >> Hi Jincheng, Hequn & Jingsong,
> >>
> >> Thanks a lot for your suggestions. I have created FLIP-97[1] for this
> >> feature.
> >>
> >>> One little suggestion: maybe it would be nice if we can add some
> >> performance explanation in the document? (I just very curious:))
> >> Thanks for the suggestion. I have updated the design doc in the
> >> "BackGround" section about where the performance gains could be got
> from.
> >>
> >>> It seems that a batch should always in a bundle. Bundle size should
> >> always
> >> bigger than batch size. (if a batch can not cross bundle).
> >> Can you explain this relationship to the document?
> >> I have updated the design doc explaining more about these two
> >> configurations.
> >>
> >>> In the batch world, vectorization batch size is about 1024+. What do
> you
> >> think about the default value of "batch"?
> >> Is there any link about where this value comes from? I have performed a
> >> simple test for Pandas UDF which performs the simple +1 operation. The
> >> performance is best when the batch size is set to 5000. I think it
> depends
> >> on the data type of each column, the functionality the Pandas UDF does,
> >> etc. However I agree with you that we could give a meaningful default
> value
> >> for the "batch" size which works in most scenarios.
> >>
> >>> Can we only configure one parameter and calculate another
> automatically?
> >> For example, if we just want to "pipeline", "bundle.size" is twice as
> much
> >> as "batch.size", is this work?
> >> I agree with Jincheng that this is not feasible. I think that giving an
> >> meaningful default value for the "batch.size" which works in most
> scenarios
> >> is enough. What's your thought?
> >>
> >> Thanks,
> >> Dian
> >>
> >> [1]
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink
> >>
> >>
> >> On Mon, Feb 10, 2020 at 4:25 PM jincheng sun 
> >> wrote:
> >>
> >>> Hi Jingsong,
> >>>
> >>> Thanks for your feedback! I would like to share my thoughts regarding
> the
> >>> follows question:
> >>>
> > - Can we only configure one parameter and calculate another
> >>> automatically? For example, if we just want to "pipeline",
> "bundle.size"
> >> is
> >>> twice as much as "batch.size", is this work?
> >>>
> >>> I don't think this works. These two configurations are used for
> different
> >>> purposes and there is no direct relationship between them and so I
> guess
> >> we
> >>> cannot infer a configuration from the other configuration.
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>>
> >>> On Mon, Feb 10, 2020 at 1:53 PM, Jingsong Li  wrote:
> >>>
>  Thanks Dian for your reply.
> 
>  +1 to create a FLIP too.
> 
>  About "python.fn-execution.bundle.size" and
>  "python.fn-execution.arrow.batch.size", I got what are you mean about
>  "pipeline". I agree.
>  It seems that a batch should always in a bundle. Bundle size should
> >>> always
>  bigger than batch size. (if a batch can not cross bundle).
>  Can you explain this relationship to the document?
> 
>  I think default value is a very important thing, we can discuss:
>  - In the batch world, vectorization batch size is about 1024+. What do
> >>> you
>  think about the default value of "batch"?
>  - Can we only configure one parameter and calculate another
> >>> automatically?
>  For example, if we just want to "pipeline", "bundle.size" is twice as
> >>> much
>  as "batch.size", is this work?
> 
>  Best,
>  Jingsong Lee
> 
>  On Mon, Feb 10, 2020 at 11:55 AM Hequn Cheng 
> wrote:
> 
> > Hi Dian,
> >
> > Thanks a lot for bringing up the discussion!
> >
> > It is great to see the Pandas UDFs feature is going to be
> >> introduced. I
> > think this would improve the performance and also the usability of
> > user-defined functions (UDFs) in Python.
> > One little suggestion: maybe it would be nice if we can add some
> > performance explanation in the document? (I just very curious:))
> >
> 

Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Jingsong Li
Thanks Dawid for your explanation,

+1 for vote.

So I am a big +1 to accepting java.lang.Object in the Java DSL; without
Scala implicit conversions, a lot of "lit" calls would look unfriendly to users.

Best,
Jingsong Lee

On Tue, Feb 11, 2020 at 6:07 PM Dawid Wysakowicz 
wrote:

> Hi,
>
> To answer some of the questions:
>
> @Jingsong We use Objects in the Java API to make it possible to use raw
> Objects without the need to wrap them in literals. If an expression is
> passed, it is used as is. If anything else is used, it is assumed to be
> a literal and is wrapped into one. This way we can e.g. write
> $("f0").plus(1).
>
> @Jark I think it makes sense to shorten them. I will do it; I hope people
> that have already voted don't mind.
>
> @Dian That's a valid concern. I would not discard the '$' as a column
> expression for Java and Scala. I think once we introduce the expression
> DSL for Python we can add another alias to Java/Scala. Personally I'd be
> in favor of col.
>
> On 11/02/2020 10:41, Dian Fu wrote:
> > Hi Dawid,
> >
> > Thanks for driving this feature. The design looks very good to me
> > overall.
> >
> > I have only one concern: $ is not allowed in Python identifiers, so we
> > have to come up with another symbol when aligning this feature in the
> > Python Table API. I noticed that there are also other options proposed
> > in the discussion thread, e.g. ref, col, etc. I think it would be great
> > if the proposed symbol could be supported in both the Java/Scala and
> > Python Table API. What are your thoughts?
> >
> > Regards,
> > Dian
> >
> >> 在 2020年2月11日,上午11:13,Jark Wu  写道:
> >>
> >> +1 for this.
> >>
> >> I have some minor comments:
> >> - I'm +1 to use $ in both Java and Scala API.
> >> - I'm +1 to use lit(), Spark also provides lit() function to create a
> >> literal value.
> >> - Is it possible to have `isGreater` instead of `isGreaterThan` and
> >> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in
> BaseExpressions?
> >>
> >> Best,
> >> Jark
> >>
> >> On Tue, 11 Feb 2020 at 10:21, Jingsong Li 
> wrote:
> >>
> >>> Hi Dawid,
> >>>
> >>> Thanks for driving.
> >>>
> >>> - adding $ in scala api looks good to me.
> >>> - Just a question: what should be expected for java.lang.Object, a literal
> >>> object or an expression? So the Object is syntactic sugar for a literal?
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther 
> wrote:
> >>>
>  +1 for this.
> 
>  It will also help in making a TableEnvironment.fromElements() possible
>  and reduces technical debt. One entry point of TypeInformation less in
>  the API.
> 
>  Regards,
>  Timo
> 
> 
>  On 10.02.20 08:31, Dawid Wysakowicz wrote:
> > Hi all,
> >
> > I wanted to resurrect the thread about introducing a Java Expression
> > DSL. Please see the updated flip page[1]. Most of the flip was
> >>> concluded
> > in previous discussion thread. The major changes since then are:
> >
> > * accepting java.lang.Object in the Java DSL
> >
> > * adding $ interpolation for a column in the Scala DSL
> >
> > I think it's important to move those changes forward as it makes it
> > easier to transition to the new type system (Java parser supports
> only
> > the old type system stack for now) that we are working on for the
> past
> > releases.
> >
> > Because the previous discussion thread was rather conclusive I want
> to
> > start already with a vote. If you think we need another round of
> > discussion, feel free to say so.
> >
> >
> > The vote will last for at least 72 hours, following the consensus
> >>> voting
> > process.
> >
> > FLIP wiki:
> >
> > [1]
> >
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
> >
> > Discussion thread:
> >
> >
> >>>
> https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
> >
> >
> >
> 
> >>> --
> >>> Best, Jingsong Lee
> >>>
>
>

-- 
Best, Jingsong Lee


[jira] [Created] (FLINK-15992) Incorrect classloader when finding TableFactory

2020-02-11 Thread Victor Wong (Jira)
Victor Wong created FLINK-15992:
---

 Summary: Incorrect classloader when finding TableFactory
 Key: FLINK-15992
 URL: https://issues.apache.org/jira/browse/FLINK-15992
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: Victor Wong


*Background*

As a streaming service maintainer in our company, to ensure our users depend on 
the correct version of Kafka and flink-kafka, we add "flink-connector-kafka" 
into the "flink-dist/lib" directory.

*Problem*

When submitting flink-sql jobs, we encountered below exceptions:
{code:java}
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could 
not find a suitable table factory for 
'org.apache.flink.table.factories.DeserializationSchemaFactory' in
the classpath.
{code}

*Debug*

We found that it was caused by this:

{code:java}
// org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase#getSerializationSchema

final SerializationSchemaFactory formatFactory = TableFactoryService.find(
    SerializationSchemaFactory.class,
    properties,
    this.getClass().getClassLoader());
{code}

It uses `this.getClass().getClassLoader()`. Since the connector jar now sits in 
flink-dist/lib, this resolves to the classloader that loaded the Flink distribution 
rather than the user-code classloader, so factories shipped in the user jar cannot 
be found.
We could replace it with `Thread.currentThread().getContextClassLoader()` to 
solve this.
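A minimal sketch of the proposed change (only the classloader argument differs from 
the snippet above):
{code:java}
// Sketch of the proposed fix: use the context classloader, which would typically be
// the user-code classloader here, so that format factories shipped in the user jar
// can be discovered even though this class lives in flink-dist/lib.
final SerializationSchemaFactory formatFactory = TableFactoryService.find(
    SerializationSchemaFactory.class,
    properties,
    Thread.currentThread().getContextClassLoader());
{code}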

There is a related issue: https://issues.apache.org/jira/browse/FLINK-15552





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15991) Create Chinese documentation for FLIP-49 TM memory model

2020-02-11 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-15991:
---

 Summary: Create Chinese documentation for FLIP-49 TM memory model
 Key: FLINK-15991
 URL: https://issues.apache.org/jira/browse/FLINK-15991
 Project: Flink
  Issue Type: Task
  Components: Documentation
Reporter: Andrey Zagrebin
Assignee: Xintong Song
 Fix For: 1.10.0, 1.11.0


Chinese translation of FLINK-15143



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Dawid Wysakowicz
Hi,

To answer some of the questions:

@Jingsong We use Objects in the Java API to make it possible to use raw
Objects without the need to wrap them in literals. If an expression is
passed, it is used as is. If anything else is used, it is assumed to be
a literal and is wrapped into one. This way we can e.g. write
$("f0").plus(1).
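To make that concrete, here is a rough sketch of what this would look like with
the proposed API (class and method names follow the current FLIP draft and may
still change; "table" is assumed to be an existing Table):

  import static org.apache.flink.table.api.Expressions.$;
  import static org.apache.flink.table.api.Expressions.lit;

  // both calls are meant to be equivalent; the bare 1 is auto-wrapped
  table.select($("f0").plus(1));       // 1 is wrapped into a literal
  table.select($("f0").plus(lit(1)));  // literal spelled out explicitly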

@Jark I think it makes sense to shorten them. I will do it; I hope people
that have already voted don't mind.

@Dian That's a valid concern. I would not discard the '$' as a column
expression for Java and Scala. I think once we introduce the expression
DSL for Python we can add another alias to Java/Scala. Personally I'd be
in favor of col.

On 11/02/2020 10:41, Dian Fu wrote:
> Hi Dawid,
>
> Thanks for driving this feature. The design looks very good to me overall.
>
> I have only one concern: $ is not allowed in Python identifiers, so we have
> to come up with another symbol when aligning this feature in the Python
> Table API. I noticed that there are also other options proposed in the
> discussion thread, e.g. ref, col, etc. I think it would be great if the
> proposed symbol could be supported in both the Java/Scala and Python Table
> API. What are your thoughts?
>
> Regards,
> Dian
>
>> 在 2020年2月11日,上午11:13,Jark Wu  写道:
>>
>> +1 for this.
>>
>> I have some minor comments:
>> - I'm +1 to use $ in both Java and Scala API.
>> - I'm +1 to use lit(), Spark also provides lit() function to create a
>> literal value.
>> - Is it possible to have `isGreater` instead of `isGreaterThan` and
>> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in BaseExpressions?
>>
>> Best,
>> Jark
>>
>> On Tue, 11 Feb 2020 at 10:21, Jingsong Li  wrote:
>>
>>> Hi Dawid,
>>>
>>> Thanks for driving.
>>>
>>> - adding $ in scala api looks good to me.
> >>> - Just a question: what should be expected for java.lang.Object, a literal
> >>> object or an expression? So the Object is syntactic sugar for a literal?
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther  wrote:
>>>
 +1 for this.

 It will also help in making a TableEnvironment.fromElements() possible
 and reduces technical debt. One entry point of TypeInformation less in
 the API.

 Regards,
 Timo


 On 10.02.20 08:31, Dawid Wysakowicz wrote:
> Hi all,
>
> I wanted to resurrect the thread about introducing a Java Expression
> DSL. Please see the updated flip page[1]. Most of the flip was
>>> concluded
> in previous discussion thread. The major changes since then are:
>
> * accepting java.lang.Object in the Java DSL
>
> * adding $ interpolation for a column in the Scala DSL
>
> I think it's important to move those changes forward as it makes it
> easier to transition to the new type system (Java parser supports only
> the old type system stack for now) that we are working on for the past
> releases.
>
> Because the previous discussion thread was rather conclusive I want to
> start already with a vote. If you think we need another round of
> discussion, feel free to say so.
>
>
> The vote will last for at least 72 hours, following the consensus
>>> voting
> process.
>
> FLIP wiki:
>
> [1]
>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
>
> Discussion thread:
>
>
>>> https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
>
>
>

>>> --
>>> Best, Jingsong Lee
>>>





Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-11 Thread Stephan Ewen
IIRC, Guowei wants to work on supporting Table API connectors in Plugins.
With that, we could have the Hive dependency as a plugin, avoiding
dependency conflicts.

On Thu, Feb 6, 2020 at 1:11 PM Jingsong Li  wrote:

> Hi Stephan,
>
> Good idea. Just like hadoop, we can have flink-shaded-hive-uber.
> Then the startup of the hive integration will be very simple with one or two
> pre-bundled jars; users just add these dependencies:
> - flink-connector-hive.jar
> - flink-shaded-hive-uber-.jar
>
> Some changes are needed, but I think it should work.
>
> Another thing: can we put flink-connector-hive.jar into flink/lib? It
> should be clean and have no dependencies.
>
> Best,
> Jingsong Lee
>
> On Thu, Feb 6, 2020 at 7:13 PM Stephan Ewen  wrote:
>
>> Hi Jingsong!
>>
>> This sounds like, with two pre-bundled versions (hive 1.2.1 and hive
>> 2.3.6), you can cover a lot of versions.
>>
>> Would it make sense to add these to flink-shaded (with proper dependency
>> exclusions of unnecessary dependencies) and offer them as a download,
>> similar as we offer pre-shaded Hadoop downloads?
>>
>> Best,
>> Stephan
>>
>>
>> On Thu, Feb 6, 2020 at 10:26 AM Jingsong Li 
>> wrote:
>>
>>> Hi Stephan,
>>>
>>> The hive/lib/ directory has many jars; this lib covers execution, the
>>> metastore, the hive client and everything else.
>>> What we really depend on is hive-exec.jar (hive-metastore.jar is also
>>> required for low Hive versions).
>>> And hive-exec.jar is an uber jar; we only want half of its classes. These
>>> classes are not so clean, but it is OK to have them.
>>>
>>> Our solution now:
>>> - exclude hive jars from build
>>> - provide dependency instructions for 8 versions; users choose according to
>>> their hive version. [1]
>>>
>>> Spark's solution:
>>> - built-in hive 1.2.1 dependencies to support hive 0.12.0 through 2.3.3.
>>> [2]
>>> - hive-exec.jar is hive-exec.spark.jar; Spark has modified the
>>> hive-exec build pom to exclude unnecessary classes, including Orc and
>>> parquet.
>>> - built-in orc and parquet dependencies to optimize performance.
>>> - support Hive versions 2.3.3 and above via "mvn install -Phive-2.3", which
>>> builds in hive-exec-2.3.6.jar. It seems that since this version, hive's API
>>> has been seriously incompatible.
>>> Most of the versions used by users are hive 0.12.0 through 2.3.3, so the
>>> default build of Spark is good for most users.
>>>
>>> Presto's solution:
>>> - Build in Presto's own copy of hive. [3] Shade hive classes instead of
>>> thrift classes.
>>> - Rewrite some client-related code to solve various kinds of issues.
>>> This approach is the heaviest, but also the cleanest. It can support all
>>> kinds of hive versions with one build.
>>>
>>> So I think we can do:
>>>
>>> - The eight versions we now maintain are too many. I think we can move
>>> forward in the direction of Presto/Spark and try to reduce the number of
>>> dependency versions.
>>>
>>> - As you said, between providing fat/uber jars or a helper script, I prefer
>>> uber jars; users can download one jar for their setup. Just like Kafka.
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
>>> [2]
>>> https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
>>> [3] https://github.com/prestodb/presto-hive-apache
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Wed, Feb 5, 2020 at 10:15 PM Stephan Ewen  wrote:
>>>
 Some thoughts about other options we have:

   - Put fat/shaded jars for the common versions into "flink-shaded" and
 offer them for download on the website, similar to pre-bundles Hadoop
 versions.

   - Look at the Presto code (Metastore protocol) and see if we can
 reuse that

   - Have a setup helper script that takes the versions and pulls the
 required dependencies.

 Can you share how a "built-in" dependency could work, if there are
 so many different conflicting versions?

 Thanks,
 Stephan


 On Tue, Feb 4, 2020 at 12:59 PM Rui Li  wrote:

> Hi Stephan,
>
> As Jingsong stated, in our documentation the recommended way to add
> Hive
> deps is to use exactly what users have installed. It's just that we ask
> users to
> manually add those jars, instead of automatically finding them based on
> env
> variables. I prefer to keep it this way for a while, and see if
> there're
> real concerns/complaints from user feedbacks.
>
> Please also note the Hive jars are not the only ones needed to
> integrate
> with Hive, users have to make sure flink-connector-hive and Hadoop
> jars are
> in classpath too. So I'm afraid a single "HIVE" env variable wouldn't
> save
> all the manual work for our users.
>
> On Tue, Feb 4, 2020 at 5:54 PM Jingsong Li 
> wrote:
>
> > Hi all,
> >
> > For your information, we have documented the detailed dependency
> > information [1]. I think it's a lot clearer than before, but it's
> 

Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-11 Thread Chesnay Schepler
I suppose the downside of an HTTP ES sink is that you don't get _any_ 
form of high-level API from ES, and we'd have to manually build an HTTP 
request that matches the ES format. Of course you also lose any 
client-side verification that the clients performed, if there is any (but I 
guess the API itself prevented certain errors).
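To make that concrete, a bare-bones bulk insert over HTTP would look roughly
like the sketch below (plain JDK HTTP client, Bulk API NDJSON body; the URL,
index name and document are made up), with all escaping, batching and per-item
error handling on us:

  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.nio.charset.StandardCharsets;

  HttpURLConnection conn =
      (HttpURLConnection) new URL("http://localhost:9200/_bulk").openConnection();
  conn.setRequestMethod("POST");
  conn.setRequestProperty("Content-Type", "application/x-ndjson");
  conn.setDoOutput(true);

  // The Bulk API expects an action line followed by a document line, each
  // newline-terminated; we would have to assemble and escape this per record.
  String body =
      "{\"index\":{\"_index\":\"my-index\"}}\n"
    + "{\"user\":\"foo\",\"value\":42}\n";
  conn.getOutputStream().write(body.getBytes(StandardCharsets.UTF_8));

  // Even the response needs manual parsing to surface per-item failures.
  int status = conn.getResponseCode();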


On 11/02/2020 09:32, Stephan Ewen wrote:

+1 to drop ES 2.x - unsure about 5.x (makes sense to get more user input
for that one).

@Itamar - if you would be interested in contributing a "universal" or
"cross version" ES connector, that could be very interesting. Do you know
if there are known performance issues or feature restrictions with that
approach?
@dawid what do you think about that?


On Tue, Feb 11, 2020 at 6:28 AM Danny Chan  wrote:


5.x seems to have a lot of users, is the 6.x completely compatible with
5.x ~

Best,
Danny Chan
在 2020年2月10日 +0800 PM9:45,Dawid Wysakowicz ,写道:

Hi all,

As described in this https://issues.apache.org/jira/browse/FLINK-11720
ticket our elasticsearch 5.x connector does not work out of the box on
some systems and requires a version bump. This also happens for our e2e.
We cannot bump the version in es 5.x connector, because 5.x connector
shares a common class with 2.x that uses an API that was replaced in 5.2.

Both versions are already long eol: https://www.elastic.co/support/eol

I suggest to drop both connectors 5.x and 2.x. If it is too much to drop
both of them, I would strongly suggest dropping at least 2.x connector
and update the 5.x line to a working es client module.

What do you think? Should we drop both versions? Drop only the 2.x
connector? Or keep them both?

Best,

Dawid






[jira] [Created] (FLINK-15990) Remove register source and sink in ConnectTableDescriptor

2020-02-11 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-15990:


 Summary: Remove register source and sink in ConnectTableDescriptor
 Key: FLINK-15990
 URL: https://issues.apache.org/jira/browse/FLINK-15990
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jingsong Lee
 Fix For: 1.11.0


We should always use {{ConnectTableDescriptor.createTemporaryTable}}.
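A rough sketch of the intended usage after the removal (the connector, format and 
schema descriptors below are just placeholders):
{code:java}
// Instead of registerTableSource()/registerTableSink(), the descriptor is only
// used to create a temporary table; source/sink are derived from the properties.
tableEnv
    .connect(new FileSystem().path("/path/to/input"))          // placeholder connector
    .withFormat(new OldCsv().field("word", "STRING"))           // placeholder format
    .withSchema(new Schema().field("word", DataTypes.STRING()))
    .createTemporaryTable("MyTempTable");
{code}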



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Dian Fu
Hi Dawid,

Thanks for driving this feature. The design looks very good to me overall.

I have only one concern: $ is not allowed in Python identifiers, so we have
to come up with another symbol when aligning this feature in the Python
Table API. I noticed that there are also other options proposed in the
discussion thread, e.g. ref, col, etc. I think it would be great if the
proposed symbol could be supported in both the Java/Scala and Python Table
API. What are your thoughts?

Regards,
Dian

> 在 2020年2月11日,上午11:13,Jark Wu  写道:
> 
> +1 for this.
> 
> I have some minor comments:
> - I'm +1 to use $ in both Java and Scala API.
> - I'm +1 to use lit(), Spark also provides lit() function to create a
> literal value.
> - Is it possible to have `isGreater` instead of `isGreaterThan` and
> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in BaseExpressions?
> 
> Best,
> Jark
> 
> On Tue, 11 Feb 2020 at 10:21, Jingsong Li  wrote:
> 
>> Hi Dawid,
>> 
>> Thanks for driving.
>> 
>> - adding $ in scala api looks good to me.
>> - Just a question: what should be expected for java.lang.Object, a literal
>> object or an expression? So the Object is syntactic sugar for a literal?
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther  wrote:
>> 
>>> +1 for this.
>>> 
>>> It will also help in making a TableEnvironment.fromElements() possible
>>> and reduces technical debt. One entry point of TypeInformation less in
>>> the API.
>>> 
>>> Regards,
>>> Timo
>>> 
>>> 
>>> On 10.02.20 08:31, Dawid Wysakowicz wrote:
 Hi all,
 
 I wanted to resurrect the thread about introducing a Java Expression
 DSL. Please see the updated flip page[1]. Most of the flip was
>> concluded
 in previous discussion thread. The major changes since then are:
 
 * accepting java.lang.Object in the Java DSL
 
 * adding $ interpolation for a column in the Scala DSL
 
 I think it's important to move those changes forward as it makes it
 easier to transition to the new type system (Java parser supports only
 the old type system stack for now) that we are working on for the past
 releases.
 
 Because the previous discussion thread was rather conclusive I want to
 start already with a vote. If you think we need another round of
 discussion, feel free to say so.
 
 
 The vote will last for at least 72 hours, following the consensus
>> voting
 process.
 
 FLIP wiki:
 
 [1]
 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
 
 
 Discussion thread:
 
 
>>> 
>> https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
 
 
 
 
>>> 
>>> 
>> 
>> --
>> Best, Jingsong Lee
>> 



[DISCUSS] Support notFollowedBy with interval as the last part of a Pattern

2020-02-11 Thread Shuai Xu
Hi all,
CEP is broadly used in more and more applications now. In many cases, users
need to use the pattern CEP.begin().notFollowedBy(). For example, they may
want to get the users who created an order but didn't pay within 10 minutes,
and so on.

However, CEP doesn't support notFollowedBy() as the last part of a pattern
now. So I propose to enable it as follows:

If the pattern ends with notFollowedBy() combined with a time interval within(t),
we take it as a valid pattern. This pattern will be triggered after time t
from the begin stage if the previous pattern is matched and the
notFollowedBy() pattern doesn't appear during the interval.

For example, Pattern.begin("start").where(event.getId() ==
1).notFollowedBy("not").where(event.getId() == 2).within(Time.minutes(10))
would then be a valid pattern.
If the ids of the input events are 1, 3, 3..., then 10 minutes after
receiving the event with id 1, it will emit a match containing that event.
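In code form, the proposal would make a pattern like the following sketch valid
(the Event class with getId() is only illustrative; the pattern classes are
org.apache.flink.cep.pattern.Pattern, SimpleCondition and Time):

  // Sketch: last part of the pattern is notFollowedBy(), bounded by within().
  Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
      .where(new SimpleCondition<Event>() {
          @Override
          public boolean filter(Event event) {
              return event.getId() == 1;
          }
      })
      .notFollowedBy("not")
      .where(new SimpleCondition<Event>() {
          @Override
          public boolean filter(Event event) {
              return event.getId() == 2;
          }
      })
      .within(Time.minutes(10));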

This change will not add any new public interface; it only makes some
previously invalid patterns valid.

The detail implement design is in:
https://docs.google.com/document/d/1swUSHcVxbkWm7EPdOfOQXWj-A4gGDA8Y8R1DOUjokds/edit#

Similar requirements from users can be found in:
https://issues.apache.org/jira/browse/FLINK-9431?filter=12347662

Please let me know if you have any questions or suggestions to improve this
proposal.


Re: [VOTE] Release 1.10.0, release candidate #3

2020-02-11 Thread Andrey Zagrebin
Hi,

@Jingsong Lee
Regarding "OutOfMemoryError: Direct buffer memory" in
FileChannelBoundedData$FileBufferReader
I saw you created a specific issue:
https://issues.apache.org/jira/browse/FLINK-15981

In general, I think we could rewrap this error
in MemorySegmentFactory#allocateUnpooledOffHeapMemory,
e.g. suggesting to increase off heap memory option:
https://issues.apache.org/jira/browse/FLINK-15989
It can always happen independently of Flink if user code over-allocates
direct memory somewhere else.

Thanks,
Andrey

On Tue, Feb 11, 2020 at 4:12 AM Yangze Guo  wrote:

> +1 (non-binding)
>
> - Build from source
> - Run mesos e2e tests(including unmerged heap state backend and rocks
> state backend case)
>
>
> Best,
> Yangze Guo
>
> On Tue, Feb 11, 2020 at 10:08 AM Yu Li  wrote:
> >
> > Thanks for the reminder Patrick! According to the release process [1] we
> > will publish the Dockerfiles *after* the RC voting passed, to finalize
> the
> > release.
> >
> > I have created FLINK-15978 [2] and prepared a PR [3] for it, will follow
> up
> > after we conclude our RC vote. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> > [2] https://issues.apache.org/jira/browse/FLINK-15978
> > [3] https://github.com/apache/flink-docker/pull/6
> >
> >
> > On Mon, 10 Feb 2020 at 20:57, Patrick Lucas 
> wrote:
> >
> > > Now that [FLINK-15828] Integrate docker-flink/docker-flink into Flink
> > > release process  is
> > > complete, the Dockerfiles for 1.10.0 can be published as part of the
> > > release process.
> > >
> > > @Gary/@Yu: please let me know if you have any questions regarding the
> > > workflow or its documentation.
> > >
> > > --
> > > Patrick
> > >
> > > On Mon, Feb 10, 2020 at 1:29 PM Benchao Li 
> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > - build from source
> > > > - start standalone cluster, and run some examples
> > > > - played with sql-client with some simple sql
> > > > - run tests in IDE
> > > > - run some sqls running in 1.9 internal version with 1.10.0-rc3,
> seems
> > > 1.10
> > > > behaves well.
> > > >
> > > > Xintong Song  于2020年2月10日周一 下午8:13写道:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > - build from source (with tests)
> > > > > - run nightly e2e tests
> > > > > - run example jobs in local/standalone/yarn setups
> > > > > - play around with memory configurations on local/standalone/yarn
> > > setups
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Feb 10, 2020 at 7:55 PM Jark Wu  wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > - build the source release with Scala 2.12 and Scala 2.11
> > > successfully
> > > > > > - checked/verified signatures and hashes
> > > > > > - started cluster for both Scala 2.11 and 2.12, ran examples,
> > > verified
> > > > > web
> > > > > > ui and log output, nothing unexpected
> > > > > > - started cluster and run some e2e sql queries, all of them works
> > > well
> > > > > and
> > > > > > the results are as expected:
> > > > > >   - read from kafka source, aggregate, write into mysql
> > > > > >   - read from kafka source with watermark defined in ddl, window
> > > > > aggregate,
> > > > > > write into mysql
> > > > > >   - read from kafka with computed column defined in ddl, temporal
> > > join
> > > > > with
> > > > > > a mysql table, write into kafka
> > > > > >
> > > > > > Cheers,
> > > > > > Jark
> > > > > >
> > > > > >
> > > > > > On Mon, 10 Feb 2020 at 19:23, Kurt Young 
> wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > - verified signatures and checksums
> > > > > > > - start local cluster, run some examples, randomly play some
> sql
> > > with
> > > > > sql
> > > > > > > client, no suspicious error/warn log found in log files
> > > > > > > - repeat above operation with both scala 2.11 and 2.12 binary
> > > > > > >
> > > > > > > Best,
> > > > > > > Kurt
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Feb 10, 2020 at 6:38 PM Yang Wang <
> danrtsey...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > >  +1 non-binding
> > > > > > > >
> > > > > > > >
> > > > > > > > - Building from source with all tests skipped
> > > > > > > > - Build a custom image with 1.10-rc3
> > > > > > > > - K8s tests
> > > > > > > > * Deploy a standalone session cluster on K8s and submit
> > > > multiple
> > > > > > jobs
> > > > > > > > * Deploy a standalone per-job cluster
> > > > > > > > * Deploy a native session cluster on K8s with/without HA
> > > > > > configured,
> > > > > > > > kill TM and jobs could recover successfully
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yang
> > > > > > > >
> > > > > > > > Jingsong Li  于2020年2月10日周一 下午4:29写道:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > +1 (non-binding) Thanks for driving this, 

[jira] [Created] (FLINK-15989) Rewrap OutOfMemoryError in allocateUnpooledOffHeap with better message

2020-02-11 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-15989:
---

 Summary: Rewrap OutOfMemoryError in allocateUnpooledOffHeap with 
better message
 Key: FLINK-15989
 URL: https://issues.apache.org/jira/browse/FLINK-15989
 Project: Flink
  Issue Type: Improvement
Reporter: Andrey Zagrebin
 Fix For: 1.10.1, 1.11.0


Now, if Flink allocates direct memory in 
MemorySegmentFactory#allocateUnpooledOffHeapMemory and the direct memory limit is 
exceeded for any reason, e.g. because user code has over-allocated direct memory, 
ByteBuffer#allocateDirect will throw a generic "OutOfMemoryError: Direct buffer 
memory". We can catch it and add a message which provides more explanation and 
points to the taskmanager.memory.task.off-heap.size option, which can be increased 
as a possible solution.
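A minimal sketch of what the wrapping could look like (the helper name and exact 
wording are illustrative only):
{code:java}
// Sketch: catch the generic error and rethrow it with a more actionable message.
private static ByteBuffer allocateDirectMemoryOrDescribeFailure(int size) {
    try {
        return ByteBuffer.allocateDirect(size);
    } catch (OutOfMemoryError error) {
        throw new OutOfMemoryError(
            "Direct buffer memory. This can happen if the framework or user code "
                + "over-allocates direct memory. Consider increasing "
                + "'taskmanager.memory.task.off-heap.size'. Original message: "
                + error.getMessage());
    }
}
{code}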



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-11 Thread Stephan Ewen
+1 to drop ES 2.x - unsure about 5.x (makes sense to get more user input
for that one).

@Itamar - if you would be interested in contributing a "universal" or
"cross version" ES connector, that could be very interesting. Do you know
if there are known performance issues or feature restrictions with that
approach?
@dawid what do you think about that?


On Tue, Feb 11, 2020 at 6:28 AM Danny Chan  wrote:

> 5.x seems to have a lot of users, is the 6.x completely compatible with
> 5.x ~
>
> Best,
> Danny Chan
> 在 2020年2月10日 +0800 PM9:45,Dawid Wysakowicz ,写道:
> > Hi all,
> >
> > As described in this https://issues.apache.org/jira/browse/FLINK-11720
> > ticket our elasticsearch 5.x connector does not work out of the box on
> > some systems and requires a version bump. This also happens for our e2e.
> > We cannot bump the version in es 5.x connector, because 5.x connector
> > shares a common class with 2.x that uses an API that was replaced in 5.2.
> >
> > Both versions are already long eol: https://www.elastic.co/support/eol
> >
> > I suggest to drop both connectors 5.x and 2.x. If it is too much to drop
> > both of them, I would strongly suggest dropping at least 2.x connector
> > and update the 5.x line to a working es client module.
> >
> > What do you think? Should we drop both versions? Drop only the 2.x
> > connector? Or keep them both?
> >
> > Best,
> >
> > Dawid
> >
> >
>


[jira] [Created] (FLINK-15988) Make JsonRowSerializationSchema's constructor private

2020-02-11 Thread Benchao Li (Jira)
Benchao Li created FLINK-15988:
--

 Summary: Make JsonRowSerializationSchema's constructor private 
 Key: FLINK-15988
 URL: https://issues.apache.org/jira/browse/FLINK-15988
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Benchao Li


The public constructor in {{JsonRowSerializationSchema}} has been deprecated since 
1.9.0, and there is a TODO to make it private.

IMO, it's OK to make it private in the 1.11 release.
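For reference, users would then construct the schema through the builder instead of 
the constructor; roughly as below (the exact builder entry point is written from 
memory and may differ, the point is just that construction goes through the Builder):
{code:java}
// Before (deprecated since 1.9.0):
//   SerializationSchema<Row> deprecated = new JsonRowSerializationSchema(rowTypeInfo);

// After: construction goes through the Builder instead of the public constructor.
SerializationSchema<Row> schema = JsonRowSerializationSchema.builder()
    .withTypeInfo(rowTypeInfo)
    .build();
{code}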



--
This message was sent by Atlassian Jira
(v8.3.4#803005)