[jira] [Created] (FLINK-24596) Bugs in sink.buffer-flush before upsert-kafka

2021-10-19 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-24596:


 Summary: Bugs in sink.buffer-flush before upsert-kafka
 Key: FLINK-24596
 URL: https://issues.apache.org/jira/browse/FLINK-24596
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.15.0
Reporter: Jingsong Lee
 Fix For: 1.15.0


There is no ITCase for sink.buffer-flush before upsert-kafka; we should add one.
FLINK-23735 introduced some bugs:
* SinkBufferFlushMode bufferFlushMode is not Serializable
* Function valueCopyFunction is not Serializable
* The planner does not support DataStreamProvider with the new Sink
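For reference, a minimal sketch of a table program that enables the buffering on an upsert-kafka sink, assuming the documented 'sink.buffer-flush.max-rows' / 'sink.buffer-flush.interval' options; the table definition and topic are made up, and an ITCase would additionally assert the flushed results:
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertKafkaBufferFlushSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical sink table; the two sink.buffer-flush options are the ones
        // this issue is about. An ITCase would write a changelog stream into this
        // sink and assert the flushed, deduplicated results.
        tEnv.executeSql(
                "CREATE TABLE upsert_sink (\n"
                        + "  id BIGINT,\n"
                        + "  cnt BIGINT,\n"
                        + "  PRIMARY KEY (id) NOT ENFORCED\n"
                        + ") WITH (\n"
                        + "  'connector' = 'upsert-kafka',\n"
                        + "  'topic' = 'sink-topic',\n"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                        + "  'key.format' = 'json',\n"
                        + "  'value.format' = 'json',\n"
                        + "  'sink.buffer-flush.max-rows' = '1000',\n"
                        + "  'sink.buffer-flush.interval' = '1s'\n"
                        + ")");
    }
}
{code}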



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24595) Programmatic configuration of S3 doesn't pass parameters to Hadoop FS

2021-10-19 Thread Pavel Penkov (Jira)
Pavel Penkov created FLINK-24595:


 Summary: Programmatic configuration of S3 doesn't pass parameters 
to Hadoop FS
 Key: FLINK-24595
 URL: https://issues.apache.org/jira/browse/FLINK-24595
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hadoop Compatibility
Affects Versions: 1.14.0
 Environment: Flink 1.14.0

JDK 8 

{{openjdk version "1.8.0_302"}}
{{OpenJDK Runtime Environment (Zulu 8.56.0.23-CA-macos-aarch64) (build 1.8.0_302-b08)}}
{{OpenJDK 64-Bit Server VM (Zulu 8.56.0.23-CA-macos-aarch64) (build 25.302-b08, mixed mode)}}
Reporter: Pavel Penkov
 Attachments: FlinkApp.java, TickingSource.java, flink_exception.txt

When running in mini-cluster mode, Flink apparently doesn't pass the S3 
configuration to the underlying Hadoop FS. With code like this:
{code:java}
Configuration conf = new Configuration();
conf.setString("s3.endpoint", "http://localhost:4566");
conf.setString("s3.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider");
conf.setString("s3.access.key", "harvester");
conf.setString("s3.secret.key", "harvester");
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(conf);
{code}
The application fails with an exception; the most relevant error is 
{{Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS 
Credentials provided by SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : 
com.amazonaws.SdkClientException: Failed to connect to service endpoint: }}

So Hadoop lists all the providers, but it should use only the one set in the 
configuration. A full project that reproduces this behaviour is available at 
[https://github.com/PavelPenkov/flink-s3-conf], and the relevant files are 
attached to this issue.
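One possible, untested workaround (not suggested by the reporter) is to initialize Flink's FileSystem factories explicitly with the same configuration before creating the environment, on the assumption that the S3 factory then picks up the s3.* keys:
{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.plugin.PluginUtils;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class S3MiniClusterWorkaroundSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setString("s3.endpoint", "http://localhost:4566");
        conf.setString("s3.access.key", "harvester");
        conf.setString("s3.secret.key", "harvester");

        // Untested assumption: initializing the file systems explicitly makes the
        // S3 factory read the s3.* keys, instead of relying on
        // createLocalEnvironment to forward them to Hadoop.
        FileSystem.initialize(conf, PluginUtils.createPluginManagerFromRootFolder(conf));

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);
        // ... build and execute the job as usual
    }
}
{code}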



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Chesnay Schepler
Could you clarify what release cadence you're thinking of? There's quite 
a big range that fits "more frequent than Flink" (per-commit, daily, 
weekly, bi-weekly, monthly, even bi-monthly).


On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases
of connectors, which are not bound to the release cycle of Flink itself. I
agree that in order to get there, we need to have stable interfaces which
are trustworthy and reliable, so they can be safely used by those
connectors. I do think that work still needs to be done on those
interfaces, but I am confident that we can get there from a Flink
perspective.

I am worried that we would not be able to achieve those frequent releases
of connectors if we are putting these connectors under the Apache umbrella,
because that means that for each connector release we have to follow the
Apache release creation process. This requires a lot of manual steps and
prohibits automation and I think it would be hard to scale out frequent
releases of connectors. I'm curious how others think this challenge could
be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:


Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criteria could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up, can we really
certify a Flink core release without Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well, then so are 
the connector repos.
(As-is) You can't go back to a previous version of the snapshot, which 
also means that checking out older commits can be problematic, because 
you'd still work against the latest snapshots and they might not be 
compatible with each other.


On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshots versions. What are the limits?
Obviously, we can only do a release of a 1.15 connector after 1.15 is
release.






Re: [VOTE] Release 1.13.3, release candidate #1

2021-10-19 Thread Lukáš Drbal
+1 (non-binding)

verified:
- build from source code
- run batch job using native k8s
- restore streaming job written in Apache Beam (java, scala 2.11)
- restore streaming job written in java (scala 2.11)

Everything works and I don't see anything weird in the logs.

Thanks!
L.

On Tue, Oct 12, 2021 at 7:22 PM Chesnay Schepler  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version
> 1.13.3, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to
> be deployed to dist.apache.org [2], which are signed with the key with
> fingerprint C2EED7B111D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.13.3-rc1" [5],
> * website pull request listing the new release and adding announcement
> blog post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350329
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1453
> [5] https://github.com/apache/flink/tree/release-1.13.3-rc1
> [6] https://github.com/apache/flink-web/pull/473
>
>


[jira] [Created] (FLINK-24594) /jobs endpoint returns same job id twice (suspended and running) until stopped

2021-10-19 Thread Azocan Kara (Jira)
Azocan Kara created FLINK-24594:
---

 Summary: /jobs endpoint returns same job id twice (suspended and 
running) until stopped
 Key: FLINK-24594
 URL: https://issues.apache.org/jira/browse/FLINK-24594
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Runtime / REST
Affects Versions: 1.13.2
Reporter: Azocan Kara


We have observed this behavior since our migration to Flink 1.13.2. 

The /jobs GET endpoint returns the same job id twice:

 
{code:java}
{"jobs":[{"id":"58bde1f30e0ec30d511a996b74ab2e12","status":"RUNNING"},{"id":"58bde1f30e0ec30d511a996b74ab2e12","status":"SUSPENDED"}]}
{code}
/jobs/overview GET returns :
{code:java}
{"jobs":[{"jid":"58bde1f30e0ec30d511a996b74ab2e12","name":"Enrichissement 
Incidents 
HTA","state":"RUNNING","start-time":1634479775893,"end-time":-1,"duration":173551611,"last-modification":1634584098667,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":1,"finished":0,"canceling":0,"canceled":0,"failed":0,"reconciling":0,"initializing":0}},{"jid":"58bde1f30e0ec30d511a996b74ab2e12","name":"Enrichissement
 Incidents 
HTA","state":"SUSPENDED","start-time":1634300482098,"end-time":-1,"duration":179293293,"last-modification":0,"tasks":{"total":0,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":0,"failed":0,"reconciling":0,"initializing":0}}]}
{code}
Of course, this also shows up on the overview page of the job-manager web UI.
 
/jobs/58bde1f30e0ec30d511a996b74ab2e12 returns details of the RUNNING job. 
 
The flink stop command stops the running job, and both jobs then disappear from 
/jobs and /jobs/overview.
 
We have observed this several times on clusters we migrated to Flink 1.13.2; 
we are not yet sure how it happens.
 
I noticed https://issues.apache.org/jira/browse/FLINK-20195, which may be 
related, but it is not exactly the same issue: I get this response until the 
job is canceled/stopped.
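For reference, a small sketch (not from the reporter) that polls the two REST endpoints mentioned above, assuming the JobManager REST endpoint is reachable on localhost:8081:
{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JobsEndpointCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String path : new String[] {"/jobs", "/jobs/overview"}) {
            HttpRequest request =
                    HttpRequest.newBuilder(URI.create("http://localhost:8081" + path)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // With the reported bug, the same job id appears twice in the output,
            // once as RUNNING and once as SUSPENDED.
            System.out.println(path + " -> " + response.body());
        }
    }
}
{code}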
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24593) Create Table API Quickstart

2021-10-19 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-24593:


 Summary: Create Table API Quickstart
 Key: FLINK-24593
 URL: https://issues.apache.org/jira/browse/FLINK-24593
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Reporter: Seth Wiesman
Assignee: Seth Wiesman


Following FLINK-24516 we should add a Table API quickstart. The dependencies 
should be structured so that queries can easily be run within the IDE, while 
only packaging what should actually be included in the fat jar.
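As a rough illustration (not a decision on the quickstart's content), a job that runs in the IDE without external infrastructure could look like the sketch below, using only the built-in datagen and print connectors:
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TableApiQuickstartJob {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // The datagen source and print sink need no external dependencies, which
        // keeps the fat jar small while the job stays runnable inside the IDE.
        tEnv.executeSql(
                "CREATE TABLE orders (order_id BIGINT, amount DOUBLE) "
                        + "WITH ('connector' = 'datagen', 'rows-per-second' = '5')");
        tEnv.executeSql(
                "CREATE TABLE results (order_id BIGINT, amount DOUBLE) "
                        + "WITH ('connector' = 'print')");
        tEnv.executeSql("INSERT INTO results SELECT order_id, amount FROM orders");
    }
}
{code}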



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24592) FlinkSQL multiline parser improvements

2021-10-19 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-24592:
---

 Summary: FlinkSQL multiline parser improvements
 Key: FLINK-24592
 URL: https://issues.apache.org/jira/browse/FLINK-24592
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Sergey Nuyanzin


The currently existing multiline parser has limitations, e.g. a line cannot 
end with a semicolon that is part of a field value, a comment, or a column 
name. Also, if a query contains '--' e.g. as part of a varchar field value, 
it fails.

If there are no objections, I would put some effort into improving this 
behavior.

Here is a list of sample problem queries:
{code:sql}
select 123; -- comment

select 1 as `1--`;

select '--';

-- This query works if a user copy-pastes it to FlinkSQL, however it fails if a 
user types it in FlinkSQL
select '1;
';
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [RESULT][VOTE] Release 1.13.3, release candidate #1

2021-10-19 Thread Martijn Visser
Thank you Chesnay and all the others who have contributed!

Best regards,

Martijn

On Tue, 19 Oct 2021 at 15:03, Chesnay Schepler  wrote:

> We have unanimously approved this release:
>
> Binding votes:
> - Dian
> - Arvid
> - Chesnay
>
> On 12/10/2021 19:21, Chesnay Schepler wrote:
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> > 1.13.3, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases
> > to be deployed to dist.apache.org [2], which are signed with the key
> > with fingerprint C2EED7B111D464BA [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.13.3-rc1" [5],
> > * website pull request listing the new release and adding announcement
> > blog post [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350329
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1453
> > [5] https://github.com/apache/flink/tree/release-1.13.3-rc1
> > [6] https://github.com/apache/flink-web/pull/473
> >
>
>


[RESULT][VOTE] Release 1.13.3, release candidate #1

2021-10-19 Thread Chesnay Schepler

We have unanimously approved this release:

Binding votes:
- Dian
- Arvid
- Chesnay

On 12/10/2021 19:21, Chesnay Schepler wrote:

Hi everyone,
Please review and vote on the release candidate #1 for the version 
1.13.3, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases 
to be deployed to dist.apache.org [2], which are signed with the key 
with fingerprint C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.13.3-rc1" [5],
* website pull request listing the new release and adding announcement 
blog post [6].


The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


Thanks,
Release Manager

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350329

[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1453

[5] https://github.com/apache/flink/tree/release-1.13.3-rc1
[6] https://github.com/apache/flink-web/pull/473





Re: [VOTE] Release 1.13.3, release candidate #1

2021-10-19 Thread Chesnay Schepler

+1 (binding)

On 19/10/2021 13:59, Arvid Heise wrote:

+1 (binding)

- build from source on scala 2_12 profile
- ran standalone cluster with examples

Best,

Arvid

On Tue, Oct 19, 2021 at 4:48 AM Dian Fu  wrote:


+1 (binding)

- verified the checksum and signature
- checked the dependency changes since 1.13.2. There is only one dependency
change (commons-compress: 1.20 -> 1.21) and it is well documented in the
NOTICE file
- installed the PyFlink packages in MacOS and runs a few examples
successfully
- the website PR LGTM

Regards,
Dian

On Mon, Oct 18, 2021 at 8:07 PM Leonard Xu  wrote:


+1 (non-binding)

- verified signatures and hashsums
- built from source code with scala 2.11 succeeded
- started a cluster, ran a wordcount job, the result is expected, no
suspicious log output
- started SQL Client, used mysql-cdc connector to consumer changelog from
MySQL, the result is expected
- reviewed the web PR

Best,
Leonard


On Oct 18, 2021, at 16:20, JING ZHANG wrote:

Thanks Chesnay for driving this.

+1 (non-binding)

- built from source code flink-1.13.3-src.tgz
<

https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/flink-1.13.3-src.tgz

succeeded
- started a standalone Flink cluster, ran the WordCount example, WebUI
looks good,  no suspicious output/log.
- started cluster and run some e2e sql queries using SQL Client, query
result is expected
- repeat step 2 and step 3 with flink-1.13.3-bin-scala_2.11.tgz
<

https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/flink-1.13.3-bin-scala_2.11.tgz

- repeat step 2 and step 3 with flink-1.13.3-bin-scala_2.12.tgz
<

https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/flink-1.13.3-bin-scala_2.12.tgz


Best,
JING ZHANG

Matthias Pohl wrote on Friday, Oct 15, 2021, at 10:07 PM:


Thanks Chesnay for driving this.

+1 (non-binding)

- verified the checksums
- build 1.13.3-rc1 from sources
- went over the pom file diff to see whether we missed newly added
dependency in the NOTICE file
- went over the release blog post
- checked that scala 2.11 and 2.12 artifacts are present in the Maven

repo

- Run example jobs without noticing any issues in the logs
- Triggered e2e test run on VVP based on 1.13.3 RC1

On Tue, Oct 12, 2021 at 7:22 PM Chesnay Schepler 
wrote:


Hi everyone,
Please review and vote on the release candidate #1 for the version
1.13.3, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which

includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience releases

to

be deployed to dist.apache.org [2], which are signed with the key

with

fingerprint C2EED7B111D464BA [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.13.3-rc1" [5],
* website pull request listing the new release and adding

announcement

blog post [6].

The vote will be open for at least 72 hours. It is adopted by

majority

approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1]



https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350329

[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]

https://repository.apache.org/content/repositories/orgapacheflink-1453

[5] https://github.com/apache/flink/tree/release-1.13.3-rc1
[6] https://github.com/apache/flink-web/pull/473







Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Chesnay Schepler
TBH I think you're overestimating how much work it is to create a 
non-Flink release. Having done most of the flink-shaded releases, I 
really don't see an issue with doing even weekly releases with that process.


We cannot reduce the number of votes AFAIK; the ASF seems very clear on 
that matter to me: 
https://www.apache.org/foundation/voting.html#ReleaseVotes

However, the vote duration is up to us.

Additionally, we only /need/ to vote on the /source/. This means we 
don't need to create the maven artifacts for each RC, but can do that at 
the very end.


On 19/10/2021 14:21, Arvid Heise wrote:
Okay I think it is clear that the majority would like to keep 
connectors under the Apache Flink umbrella. That means we will not be 
able to have per-connector repositories and project management, 
automatic dependency bumping with Dependabot, or semi-automatic releases.


So then I'm assuming the directory structure that @Chesnay Schepler 
 proposed would be the most beneficial:

- A root project with some convenience setup.
- Unrelated subprojects with individual versioning and releases.
- Branches for minor Flink releases. That is needed anyhow to use new 
features independent of API stability.
- Each connector maintains its own documentation that is accessible 
through the main documentation.


Any thoughts on alternatives? Do you see risks?

@Stephan Ewen  mentioned offline that we 
could adjust the bylaws for the connectors such that we need fewer 
PMCs to approve a release. Would it be enough to have one PMC vote per 
connector release? Do you know of other ways to tweak the release 
process to have fewer manual work?


On Mon, Oct 18, 2021 at 10:22 PM Thomas Weise  wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our
current management of connectors. I would not necessarily characterize
it as a "mess" though. As the points raised so far show, it isn't easy
to find a solution that balances competing requirements and leads to a
net improvement.

It would be great if we can find a setup that allows for connectors to
be released independently of core Flink and that each connector can be
released separately. Flink already has separate releases
(flink-shaded), so that by itself isn't a new thing. Per-connector
releases would need to allow for more frequent releases (without the
baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is
fairly stable though. As evident from Iceberg (and also Beam), that's
not the case currently. We should probably focus on addressing the
stability first, before splitting code. A success criteria could be
that we are able to build Iceberg and Beam against multiple Flink
versions w/o the need to change code. The goal would be that no
connector breaks when we make changes to Flink core. Until that's the
case, code separation creates a setup where 1+1 or N+1 repositories
need to move lock step.

Regarding some connectors being more important for Flink than others:
That's a fact. Flink w/o Kafka connector (and few others) isn't
viable. Testability of Flink was already brought up, can we really
certify a Flink core release without Kafka connector? Maybe those
connectors that are used in Flink e2e tests to validate functionality
of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos
should remain part of the Apache Flink project. Larger organizations
tend to approve the use of and contribution to open source at the
project level. Sometimes it is everything ASF. More often it is
"Apache Foo". It would be fatal to end up with a patchwork of projects
with potentially different licenses and governance to arrive at a
working Flink setup. This may mean we prioritize usability over
developer convenience, if that's in the best interest of Flink as a
whole.

Thanks,
Thomas



On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler
 wrote:
>
> Generally, the issues are reproducibility and control.
>
> Stuffs completely broken on the Flink side for a week? Well then
so are
> the connector repos.
> (As-is) You can't go back to a previous version of the snapshot.
Which
> also means that checking out older commits can be problematic
because
> you'd still work against the latest snapshots, and they not be
> compatible with each other.
>
>
> On 18/10/2021 15:22, Arvid Heise wrote:
> > I was actually betting on snapshots versions. What are the limits?
> > Obviously, we can only do a release of a 1.15 connector after
1.15 is
> > release.
>
>



Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Dawid Wysakowicz
Hey all,

I don't have much to add to the general discussion. Just a single
comment on:

that we could adjust the bylaws for the connectors such that we need
fewer PMCs to approve a release. Would it be enough to have one PMC
vote per connector release?

I think it's not an option. This particular rule is one of the few rules
from the bylaws that actually originates from the ASF rather than being
established within the Flink community. I believe we do need 3 PMC votes
for any formal ASF release [1]:

    Votes on whether a package is ready to release use majority
    approval -- i.e. at least three PMC members must vote affirmatively
    for release, and there must be more positive than negative votes.
    Releases may not be vetoed. Generally the community will cancel the
    release vote if anyone identifies serious problems, but in most
    cases the ultimate decision lies with the individual serving as
    release manager. The specifics of the process may vary from project
    to project, *but the 'minimum quorum of three +1 votes' rule is
    universal.*

Best,

Dawid

https://www.apache.org/foundation/voting.html#ReleaseVotes

On 19/10/2021 14:21, Arvid Heise wrote:
> Okay I think it is clear that the majority would like to keep connectors
> under the Apache Flink umbrella. That means we will not be able to have
> per-connector repositories and project management, automatic dependency
> bumping with Dependabot, or semi-automatic releases.
>
> So then I'm assuming the directory structure that @Chesnay Schepler
>  proposed would be the most beneficial:
> - A root project with some convenience setup.
> - Unrelated subprojects with individual versioning and releases.
> - Branches for minor Flink releases. That is needed anyhow to use new
> features independent of API stability.
> - Each connector maintains its own documentation that is accessible through
> the main documentation.
>
> Any thoughts on alternatives? Do you see risks?
>
> @Stephan Ewen  mentioned offline that we could adjust the
> bylaws for the connectors such that we need fewer PMCs to approve a
> release. Would it be enough to have one PMC vote per connector release? Do
> you know of other ways to tweak the release process to have fewer manual
> work?
>
> On Mon, Oct 18, 2021 at 10:22 PM Thomas Weise  wrote:
>
>> Thanks for initiating this discussion.
>>
>> There are definitely a few things that are not optimal with our
>> current management of connectors. I would not necessarily characterize
>> it as a "mess" though. As the points raised so far show, it isn't easy
>> to find a solution that balances competing requirements and leads to a
>> net improvement.
>>
>> It would be great if we can find a setup that allows for connectors to
>> be released independently of core Flink and that each connector can be
>> released separately. Flink already has separate releases
>> (flink-shaded), so that by itself isn't a new thing. Per-connector
>> releases would need to allow for more frequent releases (without the
>> baggage that a full Flink release comes with).
>>
>> Separate releases would only make sense if the core Flink surface is
>> fairly stable though. As evident from Iceberg (and also Beam), that's
>> not the case currently. We should probably focus on addressing the
>> stability first, before splitting code. A success criteria could be
>> that we are able to build Iceberg and Beam against multiple Flink
>> versions w/o the need to change code. The goal would be that no
>> connector breaks when we make changes to Flink core. Until that's the
>> case, code separation creates a setup where 1+1 or N+1 repositories
>> need to move lock step.
>>
>> Regarding some connectors being more important for Flink than others:
>> That's a fact. Flink w/o Kafka connector (and few others) isn't
>> viable. Testability of Flink was already brought up, can we really
>> certify a Flink core release without Kafka connector? Maybe those
>> connectors that are used in Flink e2e tests to validate functionality
>> of core Flink should not be broken out?
>>
>> Finally, I think that the connectors that move into separate repos
>> should remain part of the Apache Flink project. Larger organizations
>> tend to approve the use of and contribution to open source at the
>> project level. Sometimes it is everything ASF. More often it is
>> "Apache Foo". It would be fatal to end up with a patchwork of projects
>> with potentially different licenses and governance to arrive at a
>> working Flink setup. This may mean we prioritize usability over
>> developer convenience, if that's in the best interest of Flink as a
>> whole.
>>
>> Thanks,
>> Thomas
>>
>>
>>
>> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
>> wrote:
>>> Generally, the issues are reproducibility and control.
>>>
>>> Stuffs completely broken on the Flink side for a week? Well then so are
>>> the connector repos.
>>> (As-is) You can't go back to a previous version of the snapshot. Which
>>> also means that 

Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Konstantin Knauf
Thank you, Arvid & team, for working on this.

I would also favor one connector repository under the ASF. This will
already force us to provide better tools and more stable APIs, which
connectors developed outside of Apache Flink will benefit from, too.

Besides simplifying the formal release process for connectors, I believe,
we can also be more liberal with Committership for connector maintainers.

I expect that this setup can scale better than the current one, but it
doesn't scale super well either. In addition, there is still the ASF
barrier to contributions/releases. So, we might have more connectors in
this repository than we have in Apache Flink right now, but not all
connectors will end up in this repository. For those "external" connectors,
we should still aim to improve visibility, documentation and tooling.

It feels like such a hybrid approach might be the only option given
competing requirements.

Thanks,

Konstantin

On Mon, Oct 18, 2021 at 10:22 PM Thomas Weise  wrote:

> Thanks for initiating this discussion.
>
> There are definitely a few things that are not optimal with our
> current management of connectors. I would not necessarily characterize
> it as a "mess" though. As the points raised so far show, it isn't easy
> to find a solution that balances competing requirements and leads to a
> net improvement.
>
> It would be great if we can find a setup that allows for connectors to
> be released independently of core Flink and that each connector can be
> released separately. Flink already has separate releases
> (flink-shaded), so that by itself isn't a new thing. Per-connector
> releases would need to allow for more frequent releases (without the
> baggage that a full Flink release comes with).
>
> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code. A success criteria could be
> that we are able to build Iceberg and Beam against multiple Flink
> versions w/o the need to change code. The goal would be that no
> connector breaks when we make changes to Flink core. Until that's the
> case, code separation creates a setup where 1+1 or N+1 repositories
> need to move lock step.
>
> Regarding some connectors being more important for Flink than others:
> That's a fact. Flink w/o Kafka connector (and few others) isn't
> viable. Testability of Flink was already brought up, can we really
> certify a Flink core release without Kafka connector? Maybe those
> connectors that are used in Flink e2e tests to validate functionality
> of core Flink should not be broken out?
>
> Finally, I think that the connectors that move into separate repos
> should remain part of the Apache Flink project. Larger organizations
> tend to approve the use of and contribution to open source at the
> project level. Sometimes it is everything ASF. More often it is
> "Apache Foo". It would be fatal to end up with a patchwork of projects
> with potentially different licenses and governance to arrive at a
> working Flink setup. This may mean we prioritize usability over
> developer convenience, if that's in the best interest of Flink as a
> whole.
>
> Thanks,
> Thomas
>
>
>
> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
> wrote:
> >
> > Generally, the issues are reproducibility and control.
> >
> > Stuffs completely broken on the Flink side for a week? Well then so are
> > the connector repos.
> > (As-is) You can't go back to a previous version of the snapshot. Which
> > also means that checking out older commits can be problematic because
> > you'd still work against the latest snapshots, and they not be
> > compatible with each other.
> >
> >
> > On 18/10/2021 15:22, Arvid Heise wrote:
> > > I was actually betting on snapshots versions. What are the limits?
> > > Obviously, we can only do a release of a 1.15 connector after 1.15 is
> > > release.
> >
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Arvid Heise
Okay I think it is clear that the majority would like to keep connectors
under the Apache Flink umbrella. That means we will not be able to have
per-connector repositories and project management, automatic dependency
bumping with Dependabot, or semi-automatic releases.

So then I'm assuming the directory structure that @Chesnay Schepler
 proposed would be the most beneficial:
- A root project with some convenience setup.
- Unrelated subprojects with individual versioning and releases.
- Branches for minor Flink releases. That is needed anyhow to use new
features independent of API stability.
- Each connector maintains its own documentation that is accessible through
the main documentation.

Any thoughts on alternatives? Do you see risks?

@Stephan Ewen  mentioned offline that we could adjust the
bylaws for the connectors such that we need fewer PMCs to approve a
release. Would it be enough to have one PMC vote per connector release? Do
you know of other ways to tweak the release process to require less manual
work?

On Mon, Oct 18, 2021 at 10:22 PM Thomas Weise  wrote:

> Thanks for initiating this discussion.
>
> There are definitely a few things that are not optimal with our
> current management of connectors. I would not necessarily characterize
> it as a "mess" though. As the points raised so far show, it isn't easy
> to find a solution that balances competing requirements and leads to a
> net improvement.
>
> It would be great if we can find a setup that allows for connectors to
> be released independently of core Flink and that each connector can be
> released separately. Flink already has separate releases
> (flink-shaded), so that by itself isn't a new thing. Per-connector
> releases would need to allow for more frequent releases (without the
> baggage that a full Flink release comes with).
>
> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code. A success criteria could be
> that we are able to build Iceberg and Beam against multiple Flink
> versions w/o the need to change code. The goal would be that no
> connector breaks when we make changes to Flink core. Until that's the
> case, code separation creates a setup where 1+1 or N+1 repositories
> need to move lock step.
>
> Regarding some connectors being more important for Flink than others:
> That's a fact. Flink w/o Kafka connector (and few others) isn't
> viable. Testability of Flink was already brought up, can we really
> certify a Flink core release without Kafka connector? Maybe those
> connectors that are used in Flink e2e tests to validate functionality
> of core Flink should not be broken out?
>
> Finally, I think that the connectors that move into separate repos
> should remain part of the Apache Flink project. Larger organizations
> tend to approve the use of and contribution to open source at the
> project level. Sometimes it is everything ASF. More often it is
> "Apache Foo". It would be fatal to end up with a patchwork of projects
> with potentially different licenses and governance to arrive at a
> working Flink setup. This may mean we prioritize usability over
> developer convenience, if that's in the best interest of Flink as a
> whole.
>
> Thanks,
> Thomas
>
>
>
> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
> wrote:
> >
> > Generally, the issues are reproducibility and control.
> >
> > Stuffs completely broken on the Flink side for a week? Well then so are
> > the connector repos.
> > (As-is) You can't go back to a previous version of the snapshot. Which
> > also means that checking out older commits can be problematic because
> > you'd still work against the latest snapshots, and they not be
> > compatible with each other.
> >
> >
> > On 18/10/2021 15:22, Arvid Heise wrote:
> > > I was actually betting on snapshots versions. What are the limits?
> > > Obviously, we can only do a release of a 1.15 connector after 1.15 is
> > > release.
> >
> >
>


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Martijn Visser
Hi all,

I think it would be a huge benefit if we can achieve more frequent releases
of connectors, which are not bound to the release cycle of Flink itself. I
agree that in order to get there, we need to have stable interfaces which
are trustworthy and reliable, so they can be safely used by those
connectors. I do think that work still needs to be done on those
interfaces, but I am confident that we can get there from a Flink
perspective.

I am worried that we would not be able to achieve those frequent releases
of connectors if we are putting these connectors under the Apache umbrella,
because that means that for each connector release we have to follow the
Apache release creation process. This requires a lot of manual steps and
prohibits automation and I think it would be hard to scale out frequent
releases of connectors. I'm curious how others think this challenge could
be solved.

Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise  wrote:

> Thanks for initiating this discussion.
>
> There are definitely a few things that are not optimal with our
> current management of connectors. I would not necessarily characterize
> it as a "mess" though. As the points raised so far show, it isn't easy
> to find a solution that balances competing requirements and leads to a
> net improvement.
>
> It would be great if we can find a setup that allows for connectors to
> be released independently of core Flink and that each connector can be
> released separately. Flink already has separate releases
> (flink-shaded), so that by itself isn't a new thing. Per-connector
> releases would need to allow for more frequent releases (without the
> baggage that a full Flink release comes with).
>
> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code. A success criteria could be
> that we are able to build Iceberg and Beam against multiple Flink
> versions w/o the need to change code. The goal would be that no
> connector breaks when we make changes to Flink core. Until that's the
> case, code separation creates a setup where 1+1 or N+1 repositories
> need to move lock step.
>
> Regarding some connectors being more important for Flink than others:
> That's a fact. Flink w/o Kafka connector (and few others) isn't
> viable. Testability of Flink was already brought up, can we really
> certify a Flink core release without Kafka connector? Maybe those
> connectors that are used in Flink e2e tests to validate functionality
> of core Flink should not be broken out?
>
> Finally, I think that the connectors that move into separate repos
> should remain part of the Apache Flink project. Larger organizations
> tend to approve the use of and contribution to open source at the
> project level. Sometimes it is everything ASF. More often it is
> "Apache Foo". It would be fatal to end up with a patchwork of projects
> with potentially different licenses and governance to arrive at a
> working Flink setup. This may mean we prioritize usability over
> developer convenience, if that's in the best interest of Flink as a
> whole.
>
> Thanks,
> Thomas
>
>
>
> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
> wrote:
> >
> > Generally, the issues are reproducibility and control.
> >
> > Stuffs completely broken on the Flink side for a week? Well then so are
> > the connector repos.
> > (As-is) You can't go back to a previous version of the snapshot. Which
> > also means that checking out older commits can be problematic because
> > you'd still work against the latest snapshots, and they not be
> > compatible with each other.
> >
> >
> > On 18/10/2021 15:22, Arvid Heise wrote:
> > > I was actually betting on snapshots versions. What are the limits?
> > > Obviously, we can only do a release of a 1.15 connector after 1.15 is
> > > release.
> >
> >
>


Re: [VOTE] Release 1.13.3, release candidate #1

2021-10-19 Thread Arvid Heise
+1 (binding)

- build from source on scala 2_12 profile
- ran standalone cluster with examples

Best,

Arvid

On Tue, Oct 19, 2021 at 4:48 AM Dian Fu  wrote:

> +1 (binding)
>
> - verified the checksum and signature
> - checked the dependency changes since 1.13.2. There is only one dependency
> change (commons-compress: 1.20 -> 1.21) and it is well documented in the
> NOTICE file
> - installed the PyFlink packages in MacOS and runs a few examples
> successfully
> - the website PR LGTM
>
> Regards,
> Dian
>
> On Mon, Oct 18, 2021 at 8:07 PM Leonard Xu  wrote:
>
> >
> > +1 (non-binding)
> >
> > - verified signatures and hashsums
> > - built from source code with scala 2.11 succeeded
> > - started a cluster, ran a wordcount job, the result is expected, no
> > suspicious log output
> > - started SQL Client, used mysql-cdc connector to consumer changelog from
> > MySQL, the result is expected
> > - reviewed the web PR
> >
> > Best,
> > Leonard
> >
> > > On Oct 18, 2021, at 16:20, JING ZHANG wrote:
> > >
> > > Thanks Chesnay for driving this.
> > >
> > > +1 (non-binding)
> > >
> > > - built from source code flink-1.13.3-src.tgz
> > > <
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/flink-1.13.3-src.tgz
> > >
> > > succeeded
> > > - started a standalone Flink cluster, ran the WordCount example, WebUI
> > > looks good,  no suspicious output/log.
> > > - started cluster and run some e2e sql queries using SQL Client, query
> > > result is expected
> > > - repeat step 2 and step 3 with flink-1.13.3-bin-scala_2.11.tgz
> > > <
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/flink-1.13.3-bin-scala_2.11.tgz
> > >
> > > - repeat step 2 and step 3 with flink-1.13.3-bin-scala_2.12.tgz
> > > <
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/flink-1.13.3-bin-scala_2.12.tgz
> > >
> > >
> > > Best,
> > > JING ZHANG
> > >
> > > Matthias Pohl wrote on Friday, Oct 15, 2021, at 10:07 PM:
> > >
> > >> Thanks Chesnay for driving this.
> > >>
> > >> +1 (non-binding)
> > >>
> > >> - verified the checksums
> > >> - build 1.13.3-rc1 from sources
> > >> - went over the pom file diff to see whether we missed newly added
> > >> dependency in the NOTICE file
> > >> - went over the release blog post
> > >> - checked that scala 2.11 and 2.12 artifacts are present in the Maven
> > repo
> > >> - Run example jobs without noticing any issues in the logs
> > >> - Triggered e2e test run on VVP based on 1.13.3 RC1
> > >>
> > >> On Tue, Oct 12, 2021 at 7:22 PM Chesnay Schepler 
> > >> wrote:
> > >>
> > >>> Hi everyone,
> > >>> Please review and vote on the release candidate #1 for the version
> > >>> 1.13.3, as follows:
> > >>> [ ] +1, Approve the release
> > >>> [ ] -1, Do not approve the release (please provide specific comments)
> > >>>
> > >>>
> > >>> The complete staging area is available for your review, which
> includes:
> > >>> * JIRA release notes [1],
> > >>> * the official Apache source release and binary convenience releases
> to
> > >>> be deployed to dist.apache.org [2], which are signed with the key
> with
> > >>> fingerprint C2EED7B111D464BA [3],
> > >>> * all artifacts to be deployed to the Maven Central Repository [4],
> > >>> * source code tag "release-1.13.3-rc1" [5],
> > >>> * website pull request listing the new release and adding
> announcement
> > >>> blog post [6].
> > >>>
> > >>> The vote will be open for at least 72 hours. It is adopted by
> majority
> > >>> approval, with at least 3 PMC affirmative votes.
> > >>>
> > >>> Thanks,
> > >>> Release Manager
> > >>>
> > >>> [1]
> > >>>
> > >>>
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350329
> > >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.3-rc1/
> > >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >>> [4]
> > >>
> https://repository.apache.org/content/repositories/orgapacheflink-1453
> > >>> [5] https://github.com/apache/flink/tree/release-1.13.3-rc1
> > >>> [6] https://github.com/apache/flink-web/pull/473
> > >>>
> > >>
> >
> >
>


[jira] [Created] (FLINK-24591) Kafka Producer fails with SecurityException when using cluster.intercept-user-system-exit

2021-10-19 Thread Krzysztof Dziolak (Jira)
Krzysztof Dziolak created FLINK-24591:
-

 Summary: Kafka Producer fails with SecurityException when using 
cluster.intercept-user-system-exit
 Key: FLINK-24591
 URL: https://issues.apache.org/jira/browse/FLINK-24591
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka, Formats (JSON, Avro, Parquet, ORC, 
SequenceFile), Table SQL / Ecosystem
Affects Versions: 1.13.1
Reporter: Krzysztof Dziolak


A user reported in the [mailing 
list|https://lists.apache.org/thread.html/re38a07f6121cc580737a20c11574719cfe554e58d99817f79db9bb4a%40%3Cuser.flink.apache.org%3E]
 that Avro deserialization fails when using Kafka, Avro and Confluent Schema 
Registry:  

{code:java}
Caused by: java.io.IOException: Failed to deserialize Avro record.
  at 
org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:106)
  at 
org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:46)
  
  at 
org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:82)
  at 
org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaDeserializationSchema.deserialize(DynamicKafkaDeserializationSchema.java:113)
  at 
org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:179)
 
  at 
org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.runFetchLoop(KafkaFetcher.java:142)
  at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:826)
  at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
  at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66)
  at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
Caused by: org.apache.avro.AvroTypeException: Found my.type.avro.MyEnumType, 
expecting union
  at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:308)
  at org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
  at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:275)
  at 
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
  at 
org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
  at 
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
  at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
  at 
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
  at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
  at 
org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
  at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
  at 
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
  at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
  at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
  at 
org.apache.flink.formats.avro.RegistryAvroDeserializationSchema.deserialize(RegistryAvroDeserializationSchema.java:81)
  at 
org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:103)
  ... 9 more
{code}

Look in the attachments for a reproducer.

The same data serialized to a file works fine (see the filesystem example in 
the reproducer).
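For orientation only: the failure above goes through the Table/SQL Avro format, but an analogous DataStream API setup with the Confluent registry deserialization schema might look like the sketch below (topic, registry URL, and reader schema are made-up placeholders, not taken from the reproducer):
{code:java}
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class AvroRegistryReadSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Made-up reader schema containing a union with an enum type, similar to
        // the my.type.avro.MyEnumType mentioned in the stack trace.
        Schema readerSchema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"MyRecord\",\"namespace\":\"my.type.avro\",\"fields\":["
                        + "{\"name\":\"kind\",\"type\":[\"null\","
                        + "{\"type\":\"enum\",\"name\":\"MyEnumType\",\"symbols\":[\"A\",\"B\"]}]}]}");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "avro-repro");

        env.addSource(new FlinkKafkaConsumer<>(
                        "my-topic",
                        ConfluentRegistryAvroDeserializationSchema.forGeneric(
                                readerSchema, "http://localhost:8081"),
                        props))
                .print();

        env.execute("avro-confluent-read");
    }
}
{code}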




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Multiple View Creation support

2021-10-19 Thread JING ZHANG
Hi,
I'm not sure I understand your question. Are you looking for a way to
define multiple views in SQL? You can define multiple views by issuing
multiple CREATE VIEW statements; the CREATE VIEW syntax is documented in
[1].
Please let me know if I misunderstood your requirement.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#create-view
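For example, a minimal sketch of defining several views with separate CREATE
VIEW statements through the Table API (table, view, and column names are made
up):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MultipleViewsExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        tEnv.executeSql(
                "CREATE TABLE orders (order_id BIGINT, amount DOUBLE, region STRING) "
                        + "WITH ('connector' = 'datagen')");

        // One CREATE VIEW statement per view.
        tEnv.executeSql("CREATE VIEW big_orders AS SELECT * FROM orders WHERE amount > 100");
        tEnv.executeSql("CREATE VIEW eu_orders AS SELECT * FROM orders WHERE region = 'eu'");

        tEnv.executeSql("SELECT COUNT(*) FROM big_orders").print();
    }
}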

Hugar, Mahesh wrote on Tuesday, Oct 19, 2021, at 4:02 PM:

> Hi,
> I went through the flink documents, multiple view creation through flink
> approach not able to findout, please help me here for implementation.
> Thanks in advance.
> Regards,
> Mahesh Kumar GH
>


[jira] [Created] (FLINK-24590) Consider removing timeout from FlinkMatchers#futureWillCompleteExceptionally

2021-10-19 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24590:


 Summary: Consider removing timeout from 
FlinkMatchers#futureWillCompleteExceptionally
 Key: FLINK-24590
 URL: https://issues.apache.org/jira/browse/FLINK-24590
 Project: Flink
  Issue Type: Technical Debt
  Components: Tests
Reporter: Chesnay Schepler
 Fix For: 1.15.0


We concluded to not use timeouts in tests, but certain utility methods still 
ask for a timeout argument.
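For illustration only (plain JUnit and CompletableFuture, not the FlinkMatchers API), an assertion that a future completes exceptionally without any caller-supplied timeout could look like this:
{code:java}
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

import org.junit.jupiter.api.Test;

class FutureAssertionSketch {

    @Test
    void assertsExceptionalCompletionWithoutTimeout() {
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(new IllegalStateException("boom"));

        // Blocks until the future completes; no timeout parameter is needed,
        // since hung tests are caught by the surrounding CI infrastructure.
        ExecutionException e = assertThrows(ExecutionException.class, future::get);
        assertTrue(e.getCause() instanceof IllegalStateException);
    }
}
{code}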



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24589) FLIP-183: Buffer debloating 1.2

2021-10-19 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created FLINK-24589:
-

 Summary: FLIP-183: Buffer debloating 1.2
 Key: FLINK-24589
 URL: https://issues.apache.org/jira/browse/FLINK-24589
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Network
Affects Versions: 1.14.0
Reporter: Anton Kalashnikov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24588) Optimisation of chained cast calls can lead to unexpected behaviour

2021-10-19 Thread Marios Trivyzas (Jira)
Marios Trivyzas created FLINK-24588:
---

 Summary: Optimisation of chained cast calls can lead to unexpected 
behaviour
 Key: FLINK-24588
 URL: https://issues.apache.org/jira/browse/FLINK-24588
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.15.0
Reporter: Marios Trivyzas


Simplification of chained CAST calls can lead to unexpected behaviour:
CAST(CAST(CAST(123456 AS TINYINT) AS INT) AS BIGINT)
is simplified to 
{noformat}
CAST(123456 AS BIGINT){noformat}
and returns *123456* with the *BIGINT* data type, whereas the innermost cast to 
TINYINT should already fail because the value is out of range.

For example, for PostgreSQL:
{noformat}
postgres=# select 123456::smallint::int::bigint;
ERROR: smallint out of range

{noformat}
A corresponding issue has been created at Calcite: 
https://issues.apache.org/jira/browse/CALCITE-4861



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Multiple View Creation support

2021-10-19 Thread Hugar, Mahesh
Hi,
I went through the Flink documents but could not find out how to create 
multiple views through Flink. Please help me with the implementation.
Thanks in advance.
Regards,
Mahesh Kumar GH


Re: dataStream can not use multiple classloaders

2021-10-19 Thread 百岁
TO: @Arvid Heise @Caizhi Weng

Thanks for your reply. As the stream API code becomes more and more complicated, 
we will add more dependencies from third parties.

This kind of problem will be inevitable. If we only rely on tricks like shading 
the dependencies to solve it, I think that is far from enough.
There should be a once-and-for-all solution. I propose a plan for discussion; 
the figure is shown below:

- Make the server-side classloader created by 
BlobLibraryCacheManager.DefaultClassLoaderFactory pluggable, so that the parent 
classloader of ChildFirstClassLoader becomes configurable.
- The parent classloader would act as a facade over the client-side 
classloaders, finding classes by polling them.

How about it? Thanks.


--
From: Arvid Heise 
Sent: Monday, October 18, 2021, 17:28
To: Caizhi Weng 
Cc: dev; 百岁; user

Subject: Re: dataStream can not use multiple classloaders

You also must ensure that your SourceFunction is serializable, so it's not 
enough to just refer to some classloader, you must ensure that you have access 
to it also after deserialization on the task managers.

On Mon, Oct 18, 2021 at 4:24 AM Caizhi Weng  wrote:

Hi!

There is only one classloader for user code by default in runtime. The main 
method of your code is only executed on the client side. It generates a job 
graph and sends it to the cluster.

To avoid class loading conflict it is recommended to shade the dependencies of 
your source and sink function jars. If you really have to load some 
dependencies with different class loaders, you can load them in the open method 
of a RichSourceFunction or RichSinkFunction.
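A minimal sketch of that second suggestion, assuming a hypothetical
SourceFunctionDelegate SPI provided by the extra jars; the URLClassLoader is
created in open(), i.e. on the task manager after deserialization:

import java.net.URL;
import java.net.URLClassLoader;
import java.util.ServiceLoader;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class ClassLoaderIsolatingSource extends RichSourceFunction<String> {

    private final URL[] extraJars;                      // URLs are serializable
    private transient SourceFunctionDelegate delegate;  // loaded from the extra jars
    private volatile boolean running = true;

    public ClassLoaderIsolatingSource(URL[] extraJars) {
        this.extraJars = extraJars;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // The class loader is created here, i.e. on the task manager after
        // deserialization, so the isolated classes never travel through the job graph.
        ClassLoader isolated =
                new URLClassLoader(extraJars, getRuntimeContext().getUserCodeClassLoader());
        delegate = ServiceLoader.load(SourceFunctionDelegate.class, isolated).iterator().next();
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            ctx.collect(delegate.nextRecord());
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    /** Hypothetical SPI interface implemented inside the isolated jars. */
    public interface SourceFunctionDelegate {
        String nextRecord();
    }
}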
百岁 wrote on Saturday, Oct 16, 2021, at 11:47 PM:
TO: everyone
 I have created a dataStream demo as shown below. The demo is a very simple 
example: it reads stream data from a SourceFunction and sends it to a 
SinkFunction without any processing.
 The point is that the instances of SourceFunction and SinkFunction are created 
through two separate URLClassLoaders with different dependencies, to avoid 
code conflicts.
 The problem is that when the Flink client submits the job to the server, the 
server side throws a ClassNotFoundException for a class defined in the 
classloader dependencies. Obviously the server side does not use the same 
classloader as the client side.
 How can I solve this problem? Can anyone give me some advice? Thanks a lot.



 public class FlinkStreamDemo {
     public static void main(String[] args) throws Exception {
         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
         SourceFunction sourceFunc = createSourceFunction();
         DataStreamSource dtoDataStreamSource = env.addSource(sourceFunc);
         SinkFunction sinkFunction = createSink();
         dtoDataStreamSource.addSink(sinkFunction);
         env.execute("flink-example");
     }

     private static SinkFunction createSink() {
         URL[] urls = new URL[]{...};
         ClassLoader classLoader = new URLClassLoader(urls);
         ServiceLoader<ISinkFunctionFactory> loaders =
                 ServiceLoader.load(ISinkFunctionFactory.class, classLoader);
         Iterator<ISinkFunctionFactory> it = loaders.iterator();
         if (it.hasNext()) {
             return it.next().create();
         }
         throw new IllegalStateException();
     }

     private static SourceFunction createSourceFunction() {
         URL[] urls = new URL[]{...};
         ClassLoader classLoader = new URLClassLoader(urls);
         ServiceLoader<ISourceFunctionFactory> loaders =
                 ServiceLoader.load(ISourceFunctionFactory.class, classLoader);
         Iterator<ISourceFunctionFactory> it = loaders.iterator();
         if (it.hasNext()) {
             return it.next().create();
         }
         throw new IllegalStateException();
     }

     public interface ISinkFunctionFactory {
         SinkFunction create();
     }

     public interface ISourceFunctionFactory {
         SourceFunction create();
     }
 }


 from: 
https://issues.apache.org/jira/projects/FLINK/issues/FLINK-24558?filter=allissues



[jira] [Created] (FLINK-24587) Let PubSub source support changing subscriptions

2021-10-19 Thread Shiao-An Yuan (Jira)
Shiao-An Yuan created FLINK-24587:
-

 Summary: Let PubSub source support changing subscriptions
 Key: FLINK-24587
 URL: https://issues.apache.org/jira/browse/FLINK-24587
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Google Cloud PubSub
Affects Versions: 1.12.2
Reporter: Shiao-An Yuan


Original post on user mailing list: 
[link|https://lists.apache.org/thread.html/ra3047e5105fccbea42de6c37d52d05b492af496bd0bb95cc534630de%40%3Cuser.flink.apache.org%3E]

After resuming a Flink application from a snapshot with a *new subscription*, I 
got the following errors repeatedly.

 
{code:java}
org.apache.flink.util.SerializedThrowable: INVALID_ARGUMENT: You have
passed a subscription that does not belong to the given ack ID
(resource=projects/x/subscriptions/).
at
io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:244)
~[?:?]
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:225)
~[?:?]
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:142)
~[?:?]
at
com.google.pubsub.v1.SubscriberGrpc$SubscriberBlockingStub.acknowledge(SubscriberGrpc.java:1628)
~[?:?]
at
org.apache.flink.streaming.connectors.gcp.pubsub.BlockingGrpcPubSubSubscriber.acknowledge(BlockingGrpcPubSubSubscriber.java:99)
~[?:?]
at
org.apache.flink.streaming.connectors.gcp.pubsub.common.AcknowledgeOnCheckpoint.notifyCheckpointComplete(AcknowledgeOnCheckpoint.java:84)
~[?:?]
at
org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.notifyCheckpointComplete(PubSubSource.java:208)
~[?:?]
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:99)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:319)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:1083)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$11(StreamTask.java:1048)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$13(StreamTask.java:1071)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:189)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
{code}
 

As I see it, the 
[AckId|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-gcp-pubsub/src/main/java/org/apache/flink/streaming/connectors/gcp/pubsub/PubSubSource.java#L149]
 becomes invalid as soon as we change to another subscription.

I also noticed an interesting thing. The process of doing a 
checkpoint/savepoint is as follows:
 # output a checkpoint/savepoint which contains the ackIds of non-acknowledged 
messages
 # if the checkpoint/savepoint succeeds, do the ack 
([src|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-gcp-pubsub/src/main/java/org/apache/flink/streaming/connectors/gcp/pubsub/common/AcknowledgeOnCheckpoint.java#L84])
 # remove those ackIds from state 
([src|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-gcp-pubsub/src/main/java/org/apache/flink/streaming/connectors/gcp/pubsub/common/AcknowledgeOnCheckpoint.java#L87])

If we resume a job from a snapshot, those acknowledged ackIds (removed in step 
3) still exist in the savepoint (created in step 1), so it will do the ack 
again when the next checkpoint completes.

[jira] [Created] (FLINK-24586) SQL functions should return STRING instead of VARCHAR(2000)

2021-10-19 Thread Jira
Ingo Bürk created FLINK-24586:
-

 Summary: SQL functions should return STRING instead of 
VARCHAR(2000)
 Key: FLINK-24586
 URL: https://issues.apache.org/jira/browse/FLINK-24586
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Ingo Bürk


There are some SQL functions which currently return VARCHAR(2000). With the 
stricter CAST behavior from FLINK-24413, this could become an issue.

The following functions return VARCHAR(2000) and should be changed to return 
STRING instead:
* JSON_VALUE
* JSON_QUERY
* JSON_OBJECT
* JSON_ARRAY

There are also some more functions which should be evaluated:
* CHR
* REVERSE
* SPLIT_INDEX
* PARSE_URL
* FROM_UNIXTIME
* DECODE
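A quick way to observe the inferred type during development (a sketch assuming the JSON functions available on the current master; not part of the issue itself):
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JsonFunctionTypeCheck {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Prints the resolved schema of the query; with the current behavior the
        // JSON_VALUE column shows up as VARCHAR(2000) rather than STRING.
        System.out.println(
                tEnv.sqlQuery("SELECT JSON_VALUE('{\"a\": 1}', '$.a')").getResolvedSchema());
    }
}
{code}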



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24585) Print the change in the size of the compacted files

2021-10-19 Thread Yubin Li (Jira)
Yubin Li created FLINK-24585:


 Summary: Print the change in the size of the compacted files
 Key: FLINK-24585
 URL: https://issues.apache.org/jira/browse/FLINK-24585
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Affects Versions: 1.15.0
Reporter: Yubin Li


{code:java}
LOG.info(
        "Compaction time cost is '{}S', target file is '{}', input files are '{}'",
        costSeconds,
        target,
        paths);
{code}
This only prints the file names and the time cost of the compaction; we should 
also print the change in file size.

We have a need for this and have already implemented it. Please assign this 
issue to me, thanks.
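A hypothetical sketch of what the enhanced log statement could look like; the helper method, its parameters, and the message wording are illustrative only, not the actual patch:
{code:java}
import java.util.List;

import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class CompactionLogSketch {

    private static final Logger LOG = LoggerFactory.getLogger(CompactionLogSketch.class);

    // Hypothetical helper: log the compaction result including the size change.
    static void logCompaction(long costSeconds, Path target, List<Path> inputs) throws Exception {
        FileSystem fs = target.getFileSystem();
        long inputBytes = 0;
        for (Path input : inputs) {
            inputBytes += fs.getFileStatus(input).getLen();
        }
        LOG.info(
                "Compaction time cost is '{}S', target file is '{}' ({} bytes), "
                        + "input files are '{}' ({} bytes in total)",
                costSeconds,
                target,
                fs.getFileStatus(target).getLen(),
                inputs,
                inputBytes);
    }
}
{code}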



--
This message was sent by Atlassian Jira
(v8.3.4#803005)