[jira] [Created] (FLINK-27244) Support subdirectories with Hive tables

2022-04-13 Thread luoyuxia (Jira)
luoyuxia created FLINK-27244:


 Summary: Support subdirectories with Hive tables
 Key: FLINK-27244
 URL: https://issues.apache.org/jira/browse/FLINK-27244
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: luoyuxia


Hive supports reading directories recursively by setting the property 
'mapred.input.dir.recursive=true', and Spark also supports [such 
behavior|https://stackoverflow.com/questions/42026043/how-to-recursively-read-hadoop-files-from-directory-using-spark].

In the normal case there is no need to read directories recursively, but the 
need can arise in the following case:

I have a partitioned table `fact_tz` partitioned by day/hour:
{code:java}
CREATE TABLE fact_tz(x int) PARTITIONED BY (ds STRING, hr STRING) {code}
Then I want to create an external table `fact_daily` referring to `fact_tz`, 
but with a coarse-grained partition of day only.
{code:java}
CREATE EXTERNAL TABLE fact_daily(x int) PARTITIONED BY (ds STRING) LOCATION 
'fact_tz_location';

ALTER TABLE fact_daily ADD PARTITION (ds='1') LOCATION 
'fact_tz_location/ds=1'{code}
But querying the table `fact_daily` throws the exception "Not a file: 
fact_tz_location/ds=1", because the added partition path is the first level of 
the original partition hierarchy and is actually a directory.
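For reference, the recursive enumeration that the Hive property enables boils 
down to something like the following sketch against the standard Hadoop 
FileSystem API (the path is the example location above):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Enumerate all files below the partition directory, descending into
// subdirectories instead of failing on them.
FileSystem fs = FileSystem.get(new Configuration());
RemoteIterator<LocatedFileStatus> files =
        fs.listFiles(new Path("fact_tz_location/ds=1"), true); // true = recursive
while (files.hasNext()) {
    LocatedFileStatus status = files.next();
    // listFiles only returns regular files; each would become an input split
    System.out.println(status.getPath());
}
{code}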

 

 





Re: Re: [VOTE] Release 1.15.0, release candidate #2

2022-04-13 Thread Yun Gao
Thanks very much, Dawid, for the check! I think the issue of wrong licenses
should indeed be a blocker. I'll also have a look at these issues. 

Due to these issues, RC2 is officially canceled, and I'll create RC3
after they are fixed. 



 --Original Mail --
Sender: Chesnay Schepler
Send Date: Thu Apr 14 00:40:32 2022
Recipients: Dev, Dawid Wysakowicz, Yun Gao
Subject: Re: [VOTE] Release 1.15.0, release candidate #2
Re why CI didn't pick up on it:

Currently, excessive NOTICE entries don't fail CI, only missing ones do.
They should be logged though!
The reason being that only missing entries are problems (for us, in a 
legal sense).

That would be easy to change though.

On 13/04/2022 17:40, Dawid Wysakowicz wrote:
>
> -1,
>
> I checked new modules and modules with updated dependencies for proper 
> listing of LICENSE files. I found a number of issues, one of which 
> justifies cancelling the rc, imo.
>
> Blocker:
>
>   * https://issues.apache.org/jira/browse/FLINK-27231
>
> Minor:
>
>   * https://issues.apache.org/jira/browse/FLINK-27230
>   * https://issues.apache.org/jira/browse/FLINK-27233
>
> I checked the following modules:
>
>   * flink-connector-kinesis
>   * flink-sql-connector-aws-kinesis-firehose
>   * flink-sql-connector-pulsar (blocker)
>   * flink-dist-scala_2.12
>   * flink-sql-connector-rabbitmq
>   * flink-gs-fs-hadoop
>   * flink-sql-connector-elasticsearch7
>
> Side note, shouldn't the two minor issues be automatically discovered 
> by the license checks on CI?
>
> Please let me know what you think about the blocker I found.
>
> Best,
>
> Dawid
>
> On 12/04/2022 17:09, Yun Gao wrote:
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version 1.15.0, 
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release and binary convenience releases to be 
>> deployed to dist.apache.org [2], which are signed with the key with 
>> fingerprint CBE82BEFD827B08AFA843977EDBF922A7BC84897 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "release-1.15.0-rc2" [5],
>> * website pull request listing the new release and adding announcement blog 
>> post [6].
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> Thanks,
>> Joe, Till and Yun Gao
>> [1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
>> [2]https://dist.apache.org/repos/dist/dev/flink/flink-1.15.0-rc2/
>> [3]https://dist.apache.org/repos/dist/release/flink/KEYS
>> [4]https://repository.apache.org/content/repositories/orgapacheflink-1495/
>> [5]https://github.com/apache/flink/releases/tag/release-1.15.0-rc2/
>> [6]https://github.com/apache/flink-web/pull/526



[jira] [Created] (FLINK-27242) Support RENAME PARTITION statement for partitioned table

2022-04-13 Thread dalongliu (Jira)
dalongliu created FLINK-27242:
-

 Summary: Support RENAME PARTITION statement for partitioned table
 Key: FLINK-27242
 URL: https://issues.apache.org/jira/browse/FLINK-27242
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: dalongliu
 Fix For: 1.16.0








[jira] [Created] (FLINK-27243) Support SHOW PARTITIONS statement for partitioned table

2022-04-13 Thread dalongliu (Jira)
dalongliu created FLINK-27243:
-

 Summary: Support SHOW PARTITIONS statement for partitioned table
 Key: FLINK-27243
 URL: https://issues.apache.org/jira/browse/FLINK-27243
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: dalongliu
 Fix For: 1.16.0








[jira] [Created] (FLINK-27241) Support DROP PARTITION statement

2022-04-13 Thread dalongliu (Jira)
dalongliu created FLINK-27241:
-

 Summary: Support DROP PARTITION statement
 Key: FLINK-27241
 URL: https://issues.apache.org/jira/browse/FLINK-27241
 Project: Flink
  Issue Type: Sub-task
Reporter: dalongliu
 Fix For: 1.16.0








[jira] [Created] (FLINK-27240) Support ADD PARTITION statement

2022-04-13 Thread dalongliu (Jira)
dalongliu created FLINK-27240:
-

 Summary: Support ADD PARTITION statement
 Key: FLINK-27240
 URL: https://issues.apache.org/jira/browse/FLINK-27240
 Project: Flink
  Issue Type: Sub-task
Reporter: dalongliu
 Fix For: 1.16.0








[jira] [Created] (FLINK-27239) rewrite PreValidateReWriter from scala to java

2022-04-13 Thread xuyang (Jira)
xuyang created FLINK-27239:
--

 Summary: rewrite PreValidateReWriter from scala to java
 Key: FLINK-27239
 URL: https://issues.apache.org/jira/browse/FLINK-27239
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: xuyang


See FLIP-28 for details. Rewriting classes from Scala to Java is an ongoing 
task.





[jira] [Created] (FLINK-27238) The HiveGenericUDTF should support primitive arrays, for example Array Array ...

2022-04-13 Thread hehuiyuan (Jira)
hehuiyuan created FLINK-27238:
-

 Summary: The HiveGenericUDTF should support primitive arrays, for 
example Array Array ...
 Key: FLINK-27238
 URL: https://issues.apache.org/jira/browse/FLINK-27238
 Project: Flink
  Issue Type: Bug
Reporter: hehuiyuan
 Attachments: image-2022-04-14-10-27-50-340.png

!image-2022-04-14-10-27-50-340.png|width=381,height=260!

 

If argTypes[0] is a primitive Array, it will throw an exception.
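For concreteness, a hypothetical reproduction sketch (the table name is 
illustrative; explode is the Hive UDTF exposed through the HiveModule):
{code:sql}
-- Per this report, the call fails when the first argument is a primitive array.
SELECT t.x FROM src, LATERAL TABLE(explode(ARRAY[1, 2, 3])) AS t(x);
{code}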





[jira] [Created] (FLINK-27237) Partition table statement enhancement

2022-04-13 Thread dalongliu (Jira)
dalongliu created FLINK-27237:
-

 Summary: Partition table statement enhancement
 Key: FLINK-27237
 URL: https://issues.apache.org/jira/browse/FLINK-27237
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: dalongliu
 Fix For: 1.16.0








[jira] [Created] (FLINK-27236) No task slot allocated for job in large-scale job

2022-04-13 Thread yanpengshi (Jira)
yanpengshi created FLINK-27236:
--

 Summary: No task slot allocated for job in large-scale job
 Key: FLINK-27236
 URL: https://issues.apache.org/jira/browse/FLINK-27236
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.13.3
Reporter: yanpengshi
 Attachments: jobmanager.log.26, taskmanager.log, topology.png

Hey,

 
We run a large-scale Flink job containing six vertices with 3k parallelism. The 
topology is shown below.

!topology.png!

We meet the following exception in jobmanager.log: [^jobmanager.log.26]
{code:java}
2022-03-02 08:01:16,601 INFO  [1998] 
[org.apache.flink.runtime.executiongraph.Execution.transitionState(Execution.java:1446)]
  - Source: tdbank_exposure_wx - Flat Map (772/3000) 
(6cd18d4ead1887a4e19fd3f337a6f4f8) switched from DEPLOYING to FAILED on 
container_e03_1639558254334_10048_01_004716 @ 11.104.77.40 (dataPort=39313).
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.taskexecutor.exceptions.TaskSubmissionException: No 
task slot allocated for job ID ed780087 and allocation 
ID beb058d837c09e8d5a4a6aaf2426ca99. {code}
 

In the taskmanager.log [^taskmanager.log], the slot is freed due to a timeout and 
the taskmanager then receives a new allocation request. By increasing the value of 
the key taskmanager.slot.timeout, we can avoid this exception temporarily.
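As a flink-conf.yaml entry, that temporary workaround looks like this (the 
value is illustrative; by default the option falls back to akka.ask.timeout):
{code}
taskmanager.slot.timeout: 60 s
{code}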

Here are some of our guesses:
 # When the job is scheduled, the slot and the execution are bound, and then 
the task is deployed to the corresponding taskmanager.

 # The slot is released after the idle interval times out, and the 
ResourceManager is notified that the slot is free. Thus, the ResourceManager 
may assign other requests to the slot.

 # The task is deployed to the taskmanager according to the previous 
correspondence.

 

The key problems are:
 # When the slot is freed, the execution is not unassigned from the slot;

 # The slot state is not consistent between the JobMaster and the 
ResourceManager.

 

Has anyone else encountered this problem? When the slot is freed, how can we 
unassign the previously bound execution? Or do we need to update the resource 
address of the execution? @[~zhuzh] @[~wanglijie95] 





[jira] [Created] (FLINK-27235) Publish Flink k8s Operator Helm Charts via Github Actions

2022-04-13 Thread Gezim Sejdiu (Jira)
Gezim Sejdiu created FLINK-27235:


 Summary: Publish Flink k8s Operator Helm Charts via Github Actions
 Key: FLINK-27235
 URL: https://issues.apache.org/jira/browse/FLINK-27235
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Gezim Sejdiu


Hi team, 

 

Thanks a lot for providing the k8s operator for Flink; glad to see the 
community around it.

 

Recently I did some experiments with releasing Helm Charts via GitHub Actions: 
[https://github.com/GezimSejdiu/helm-chart-releaser-demo]. I was thinking it 
would be great to integrate the same for the Flink Kubernetes Operator, so 
that the release doesn't need to be done manually but is triggered whenever 
anything changes in the Helm Chart folder.
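As a sketch, such a workflow could look like the following (the chart 
directory, branch name, and action version are assumptions for illustration, 
based on the helm/chart-releaser-action used in the demo above):
{code:yaml}
name: Release Helm Chart
on:
  push:
    branches: [main]
    paths: ['helm/**']   # trigger only when the chart folder changes
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0   # chart-releaser needs the full tag history
      - name: Configure Git
        run: |
          git config user.name "$GITHUB_ACTOR"
          git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
      - uses: helm/chart-releaser-action@v1.4.0
        with:
          charts_dir: helm
        env:
          CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
{code}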

 

If you think having this process on GitHub Actions adds value, I would be more 
than happy to contribute it.

 

Best regards, 

Gezim





[jira] [Created] (FLINK-27234) Enable fork-reuse for connector-jdbc

2022-04-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27234:


 Summary: Enable fork-reuse for connector-jdbc
 Key: FLINK-27234
 URL: https://issues.apache.org/jira/browse/FLINK-27234
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / JDBC
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0








Re: Discuss making KafkaSubscriber Public

2022-04-13 Thread Mason Chen
Hi Chesnay,

Typically, users want to plug in a KafkaSubscriber that depends on an
external system [1][2]. We could also provide a higher level interface that
doesn’t depend on the Kafka Admin Client, but I think it would be more
flexible to be able to re-use the one created by the enumerator if needed.
If we don't want to expose the Kafka Admin Client and users want to
apply some complex filter, then we can also provide a pluggable interface,
implemented similarly to the subscriber used for topic patterns, that lets
users filter topics after the Kafka API response.

[1] https://www.mail-archive.com/user@flink.apache.org/msg44340.html
[2] https://www.mail-archive.com/dev@flink.apache.org/msg52007.html
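For illustration, a custom subscriber could look roughly like the sketch below
(hedged: it assumes the current internal interface, whose single method is
getSubscribedTopicPartitions(AdminClient); the topic filter is illustrative):

import java.util.Set;
import java.util.stream.Collectors;
import org.apache.flink.connector.kafka.source.enumerator.subscriber.KafkaSubscriber;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.TopicPartition;

public class FilteringTopicSubscriber implements KafkaSubscriber {

    @Override
    public Set<TopicPartition> getSubscribedTopicPartitions(AdminClient adminClient) {
        try {
            // list all topics via the admin client the enumerator already owns,
            // then apply an arbitrary user-defined filter
            Set<String> topics = adminClient.listTopics().names().get().stream()
                    .filter(name -> name.startsWith("events-")) // illustrative filter
                    .collect(Collectors.toSet());
            // expand the surviving topics into their partitions
            return adminClient.describeTopics(topics).all().get().values().stream()
                    .flatMap(desc -> desc.partitions().stream()
                            .map(p -> new TopicPartition(desc.name(), p.partition())))
                    .collect(Collectors.toSet());
        } catch (Exception e) {
            throw new RuntimeException("Failed to subscribe to topics", e);
        }
    }
}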

Best,
Mason


On Wed, Apr 13, 2022 at 6:32 AM Chesnay Schepler  wrote:

> Could you expand a bit on possible alternative implementations that
> require this interface to become public, as opposed to providing more
> built-in ways to subscribe?
>
> On 13/04/2022 11:26, Qingsheng Ren wrote:
> > Thanks for the proposal Mason! I think exposing `KafkaSubscriber` as
> public API is helpful for users to implement more complex subscription
> logic.
> >
> > +1 (non-binding)
> >
> > Cheers,
> >
> > Qingsheng
> >
> >> On Apr 12, 2022, at 11:46, Mason Chen  wrote:
> >>
> >> Hi Flink Devs,
> >>
> >> I was looking to contribute to
> https://issues.apache.org/jira/browse/FLINK-24660, which is a ticket to
> track changing the KafkaSubscriber from Internal to PublicEvolving.
> >>
> >> In the PR, it seems a few of us have agreement on making the subscriber
> pluggable in the KafkaSource, but I'd like to raise the question
> nevertheless. Furthermore, there is also interest in this change from various
> Flink mailing list threads and on the Jira ticket itself, so I think
> the change would be beneficial to users. There is already some feedback
> to make the contract of handling removed splits by the KafkaSource and
> subscriber clearer in the docs.
> >>
> >> I have yet to address all the PR feedback, but does anyone have any
> concerns before I proceed further?
> >>
> >> Best,
> >> Mason
>
>
>


Re: [VOTE] Release 1.15.0, release candidate #2

2022-04-13 Thread Chesnay Schepler

Re why CI didn't pick up on it:

Currently, excessive NOTICE entries don't fail CI, only missing ones do.
They should be logged though!
The reason being that only missing entries are problems (for us, in a 
legal sense).


That would be easy to change though.

On 13/04/2022 17:40, Dawid Wysakowicz wrote:


-1,

I checked new modules and modules with updated dependencies for proper 
listing of LICENSE files. I found a number of issues, one of which 
justifies cancelling the rc, imo.


Blocker:

  * https://issues.apache.org/jira/browse/FLINK-27231

Minor:

  * https://issues.apache.org/jira/browse/FLINK-27230
  * https://issues.apache.org/jira/browse/FLINK-27233

I checked the following modules:

  * flink-connector-kinesis
  * flink-sql-connector-aws-kinesis-firehose
  * flink-sql-connector-pulsar (blocker)
  * flink-dist-scala_2.12
  * flink-sql-connector-rabbitmq
  * flink-gs-fs-hadoop
  * flink-sql-connector-elasticsearch7

Side note, shouldn't the two minor issues be automatically discovered 
by the license checks on CI?


Please let me know what you think about the blocker I found.

Best,

Dawid

On 12/04/2022 17:09, Yun Gao wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the version 1.15.0, as 
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which are signed with the key with fingerprint 
CBE82BEFD827B08AFA843977EDBF922A7BC84897 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.15.0-rc2" [5],
* website pull request listing the new release and adding announcement blog 
post [6].
The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.
Thanks,
Joe, Till and Yun Gao
[1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
[2]https://dist.apache.org/repos/dist/dev/flink/flink-1.15.0-rc2/
[3]https://dist.apache.org/repos/dist/release/flink/KEYS
[4]https://repository.apache.org/content/repositories/orgapacheflink-1495/
[5]https://github.com/apache/flink/releases/tag/release-1.15.0-rc2/
[6]https://github.com/apache/flink-web/pull/526




Re: [VOTE] Release 1.15.0, release candidate #2

2022-04-13 Thread Dawid Wysakowicz

-1,

I checked new modules and modules with updated dependencies for proper 
listing of LICENSE files. I found a number of issues, one of which 
justifies cancelling the rc, imo.


Blocker:

 * https://issues.apache.org/jira/browse/FLINK-27231

Minor:

 * https://issues.apache.org/jira/browse/FLINK-27230
 * https://issues.apache.org/jira/browse/FLINK-27233

I checked the following modules:

 * flink-connector-kinesis
 * flink-sql-connector-aws-kinesis-firehose
 * flink-sql-connector-pulsar (blocker)
 * flink-dist-scala_2.12
 * flink-sql-connector-rabbitmq
 * flink-gs-fs-hadoop
 * flink-sql-connector-elasticsearch7

Side note, shouldn't the two minor issues be automatically discovered by 
the license checks on CI?


Please let me know what you think about the blocker I found.

Best,

Dawid

On 12/04/2022 17:09, Yun Gao wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the version 1.15.0, as 
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which are signed with the key with fingerprint 
CBE82BEFD827B08AFA843977EDBF922A7BC84897 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.15.0-rc2" [5],
* website pull request listing the new release and adding announcement blog 
post [6].
The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.
Thanks,
Joe, Till and Yun Gao
[1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
[2]https://dist.apache.org/repos/dist/dev/flink/flink-1.15.0-rc2/
[3]https://dist.apache.org/repos/dist/release/flink/KEYS
[4]https://repository.apache.org/content/repositories/orgapacheflink-1495/
[5]https://github.com/apache/flink/releases/tag/release-1.15.0-rc2/
[6]https://github.com/apache/flink-web/pull/526




[jira] [Created] (FLINK-27233) Unnecessary entries in connector-elasticsearch7 in NOTICE file

2022-04-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-27233:


 Summary: Unnecessary entries in connector-elasticsearch7 in NOTICE 
file
 Key: FLINK-27233
 URL: https://issues.apache.org/jira/browse/FLINK-27233
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ElasticSearch
Affects Versions: 1.15.0
Reporter: Dawid Wysakowicz


{{flink-sql-connector-elasticsearch7}} lists the following dependencies in the 
NOTICE file, which are not bundled in the jar:

{code}
- com.fasterxml.jackson.core:jackson-databind:2.13.2.2
- com.fasterxml.jackson.core:jackson-annotations:2.13.2
- org.apache.lucene:lucene-spatial:8.7.0
- org.elasticsearch:elasticsearch-plugin-classloader:7.10.2
- org.lz4:lz4-java:1.8.0
{code}





[jira] [Created] (FLINK-27232) .scalafmt.conf can

2022-04-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27232:


 Summary: .scalafmt.conf can
 Key: FLINK-27232
 URL: https://issues.apache.org/jira/browse/FLINK-27232
 Project: Flink
  Issue Type: Technical Debt
Reporter: Chesnay Schepler








[jira] [Created] (FLINK-27231) SQL pulsar connector lists dependencies under wrong license

2022-04-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-27231:


 Summary: SQL pulsar connector lists dependencies under wrong 
license
 Key: FLINK-27231
 URL: https://issues.apache.org/jira/browse/FLINK-27231
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Pulsar
Affects Versions: 1.15.0
Reporter: Dawid Wysakowicz


The Pulsar SQL connector lists the following dependencies under the ASL2 
license, while they are actually licensed under the Bouncy Castle license (a 
variant of MIT?).





[jira] [Created] (FLINK-27230) Unnecessary entries in connector-kinesis NOTICE file

2022-04-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-27230:


 Summary: Unnecessary entries in connector-kinesis NOTICE file
 Key: FLINK-27230
 URL: https://issues.apache.org/jira/browse/FLINK-27230
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.15.0
Reporter: Dawid Wysakowicz


flink-connector-kinesis lists but does not bundle:

{code}
- commons-logging:commons-logging:1.1.3
- com.fasterxml.jackson.core:jackson-core:2.13.2
{code}

{code}
[INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded jar.
[INFO] Excluding com.fasterxml.jackson.core:jackson-core:jar:2.13.2 from the 
shaded jar.
{code}





Re: Re: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner

2022-04-13 Thread 罗宇侠(莫辞)
Hi all,
Sorry for the late reply to this thread.
Decoupling the Hive connector is actually mainly about decoupling the Hive 
dialect. So I think it's a good time to introduce a pluggable dialect 
mechanism for Flink and make the Hive dialect the first one.
Based on this point, I have updated FLIP-216 (Introduce pluggable dialect 
and plan for migrating Hive dialect) [1].
The overview of the FLIP is as follows:

1: Introduce a slim module with limited public interfaces for the pluggable 
dialect. The public interfaces are in their final state, and other dialects 
should implement them and follow the style of converting a SQL statement to 
Flink's Operation tree.

2: Plan for migrating the Hive dialect. The Hive dialect implementation is 
also expected to convert SQL statements to Flink's Operation tree. 
Unfortunately, the current implementation converts SQL statements to Calcite 
RelNodes, and it's hard to migrate to the Operation tree in one shot; it's 
better to migrate step by step. So the first step is to still keep the Calcite 
dependency and introduce an internal interface called CalciteContext for 
creating Calcite RelNodes; then we can decouple from flink-table-planner. As a 
result, we can move the Hive connector out of the Flink repository. The second 
step is to migrate to the Operation tree, so that we can drop the Calcite 
dependency.
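For illustration, such an internal interface could take roughly the following 
shape (the method set here is assumed for the sake of the sketch, not taken 
from the FLIP):

public interface CalciteContext {
    // access to the planner's RelOptCluster, needed to create RelNodes
    RelOptCluster getCluster();
    // programmatic RelNode construction via Calcite's builder
    RelBuilder createRelBuilder();
    // Calcite's type factory, so the dialect can create row types
    RelDataTypeFactory getTypeFactory();
}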

For some questions from previous emails:
>> What about Scala? Is flink-table-planner-spi going to be a scala module with 
>> the related suffix?
flink-table-planner-spi is going to be a Scala-free module; for the Hive 
dialect, it's fine not to expose the couple of types which are implemented in 
Scala.
>> Are you sure exposing the Calcite interfaces is going to be enough?
I'm quite sure. As for specific methods like FlinkTypeFactory#toLogicalType, 
I think we can implement a similar type converter in the Hive connector 
itself.


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A++Introduce+pluggable+dialect+and+plan+for+migrate+Hive+dialect

Best regards,
Yuxia
--
From: Martijn Visser
Date: 2022-03-31 17:09:15
To: dev
Cc: 罗宇侠(莫辞)
Subject: Re: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner

Hi all,

Thanks for opening this discussion. I agree with Francesco that we should
go for the best solution instead of another bandaid or intermediate
solution. I think there's actually consensus for that, given the argument
provided by Yuxia:

> The first way is the ideal way and we should go in that direction. But it
will take much effort for it requires rewriting all the code about Hive
dialect and it's hard to do it in one shot. And given we want to move out
Hive connector in 1.16, it's more practical to decouple first, and then
migrate it to operation tree.

Why should we first invest time in another intermediate solution and then
spend time afterwards to actually get to the proper solution? I would
propose to:

- Remove Hive connector for version 1.*, 2.1.* and 2.2.* in Flink 1.16
- Upgrade to the latest supported Hive 2.3.* and 3.1.* in Flink 1.16
- Get as much work done as possible to decouple the Hive connector from
Flink in 1.16.

There's customer value in this approach, because we'll support the latest
supported Hive versions. If customers need support for older Hive versions,
they can still use older Flink versions. There is also community value in
this approach, because we're actively working on making our codebase
maintainable. If the entire decoupling is not possible, then we won't move
the Hive 2.3.* and 3.1.* connector out in Flink 1.16, but in Flink 1.17.
Support for the new Hive version 4.* could then also be added in Flink 1.17
(we should not add support for newer versions of Hive until this decoupling
has been finished).

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Wed, 30 Mar 2022 at 17:07, Francesco Guardiani 
wrote:

> Sorry I replied on the wrong thread, i repost my answer here :)
>
> As there was already a discussion in the doc, I'll just summarize my
> opinions here on the proposed execution of this FLIP.
>
> I think we should rather avoid exposing internal details, which I consider
> Calcite to be part of, but rather reuse what we already have to define an
> AST from Table API, which is what I'll refer in this mail as Operation
> tree.
>
> First of all, the reason I think this FLIP is not a good idea is that it
> proposes to expose types out of our control, i.e. an API we cannot control
> and may realistically never be able to stabilize. A Calcite bump in the
> table project is already pretty hard today, as shown by tasks like
> https://github.com/apache/flink/pull/13577. This will make them even
> harder. Essentially it will couple us to Calcite even more, and create a
> different but still big maintenance/complexity burden we would like to get
> rid of with this FLIP.
>
> There are also some technical aspects that 

[jira] [Created] (FLINK-27229) Cassandra overrides netty version in tests

2022-04-13 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-27229:


 Summary: Cassandra overrides netty version in tests
 Key: FLINK-27229
 URL: https://issues.apache.org/jira/browse/FLINK-27229
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Cassandra
Affects Versions: 1.15.0
Reporter: Dawid Wysakowicz


{{flink-connector-cassandra}} declares:
{code}


<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.46.Final</version>
    <scope>test</scope>
</dependency>

{code}
which overrides the project-wide version of netty just for tests.





Re: Discuss making KafkaSubscriber Public

2022-04-13 Thread Chesnay Schepler
Could you expand a bit on possible alternative implementations that 
require this interface to become public, as opposed to providing more 
built-in ways to subscribe?


On 13/04/2022 11:26, Qingsheng Ren wrote:

Thanks for the proposal Mason! I think exposing `KafkaSubscriber` as public API 
is helpful for users to implement more complex subscription logic.

+1 (non-binding)

Cheers,

Qingsheng


On Apr 12, 2022, at 11:46, Mason Chen  wrote:

Hi Flink Devs,

I was looking to contribute to 
https://issues.apache.org/jira/browse/FLINK-24660, which is a ticket to track 
changing the KafkaSubscriber from Internal to PublicEvolving.

In the PR, it seems a few of us have agreement on making the subscriber 
pluggable in the KafkaSource, but I'd like to raise the question nevertheless. 
Furthermore, there is also interest in this change from various Flink mailing 
list threads and on the Jira ticket itself, so I think the change would be 
beneficial to users. There is already some feedback to make the contract of 
handling removed splits by the KafkaSource and subscriber clearer in the docs.

I have yet to address all the PR feedback, but does anyone have any concerns 
before I proceed further?

Best,
Mason





[jira] [Created] (FLINK-27228) Redistribute modules across CI profiles

2022-04-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27228:


 Summary: Redistribute modules across CI profiles
 Key: FLINK-27228
 URL: https://issues.apache.org/jira/browse/FLINK-27228
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


With the recent improvements around testing times it is time to redistribute 
the modules again to achieve a more even distribution.





[jira] [Created] (FLINK-27227) ScalarFunctionsTest#testExtract fails with Extract operator does not support unit

2022-04-13 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-27227:
---

 Summary: ScalarFunctionsTest#testExtract fails with Extract 
operator does not support unit
 Key: FLINK-27227
 URL: https://issues.apache.org/jira/browse/FLINK-27227
 Project: Flink
  Issue Type: Bug
Reporter: Sergey Nuyanzin


Probably the reason is that the changes to 
{{flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/expressions/time.scala}}
 were not merged within 
https://github.com/apache/flink/commit/7715392e511b75e1bb21ac1831bb278b4c66.
The whole trace is:
{noformat}
Apr 13 09:26:58 org.apache.flink.table.api.ValidationException: Extract operator 
does not support unit 'PlannerTimeIntervalUnit.MILLENNIUM' for input of type 
'LocalDate'.
Apr 13 09:26:58 at 
org.apache.flink.table.planner.expressions.PlannerTypeInferenceUtilImpl.validateArguments(PlannerTypeInferenceUtilImpl.java:111)
Apr 13 09:26:58 at 
org.apache.flink.table.planner.expressions.PlannerTypeInferenceUtilImpl.runTypeInference(PlannerTypeInferenceUtilImpl.java:69)
Apr 13 09:26:58 at 
org.apache.flink.table.expressions.resolver.rules.ResolveCallByArgumentsRule$ResolvingCallVisitor.runLegacyTypeInference(ResolveCallByArgumentsRule.java:284)
Apr 13 09:26:58 at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Apr 13 09:26:58 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Apr 13 09:26:58 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Apr 13 09:26:58 at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Apr 13 09:26:58 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Apr 13 09:26:58 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Apr 13 09:26:58 at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Apr 13 09:26:58 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Apr 13 09:26:58 at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Apr 13 09:26:58 at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
Apr 13 09:26:58 at 
org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
Apr 13 09:26:58 at 
org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
Apr 13 09:26:58 at 
org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
Apr 13 09:26:58 at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.lambda$execute$1(JUnitPlatformProvider.java:199)
Apr 13 09:26:58 at 
java.util.Iterator.forEachRemaining(Iterator.java:116)
Apr 13 09:26:58 at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.execute(JUnitPlatformProvider.java:193)
Apr 13 09:26:58 at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154)
Apr 13 09:26:58 at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:120)
Apr 13 09:26:58 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
Apr 13 09:26:58 at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
Apr 13 09:26:58 at 
org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
Apr 13 09:26:58 at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
Apr 13 09:26:58 

{noformat}





[jira] [Created] (FLINK-27226) How to implement two partition methods for an operator at the same time?

2022-04-13 Thread Underwood (Jira)
Underwood created FLINK-27226:
-

 Summary: How to implement two partition methods for an operator at 
the same time?
 Key: FLINK-27226
 URL: https://issues.apache.org/jira/browse/FLINK-27226
 Project: Flink
  Issue Type: New Feature
  Components: API / DataStream
Affects Versions: 1.14.3
Reporter: Underwood


For example, I'm counting letters. Each letter from A to Y corresponds to a 
partition, and each partition records the number of times its letter appears, 
but the letter Z needs to be counted by every partition. That is, within one 
stream, A to Y need hash partitioning while Z needs broadcast partitioning. Is 
there any way to achieve this?

I want to use partitionCustom, but it seems that each record can only be sent 
to one partition.
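One possible workaround (a sketch, not a built-in feature; NUM_PARTITIONS and 
the letter-to-key mapping are assumptions for illustration) is to fan the 
broadcast-like records out in a flatMap before keyBy:
{code:java}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.util.Collector;

public class LetterRouting {
    private static final int NUM_PARTITIONS = 25; // one logical partition per letter A..Y

    public static DataStream<Tuple2<Integer, Character>> route(DataStream<Character> input) {
        return input.flatMap(new FlatMapFunction<Character, Tuple2<Integer, Character>>() {
            @Override
            public void flatMap(Character c, Collector<Tuple2<Integer, Character>> out) {
                if (c == 'Z') {
                    // broadcast-like: replicate Z once per logical partition
                    for (int p = 0; p < NUM_PARTITIONS; p++) {
                        out.collect(Tuple2.of(p, c));
                    }
                } else {
                    // hash-like: A..Y each map to their own key
                    out.collect(Tuple2.of(c - 'A', c));
                }
            }
        });
    }
}
// downstream: route(input).keyBy(t -> t.f0) lets the counting operator see Z in every partition
{code}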





[jira] [Created] (FLINK-27225) Remove redundant reuseForks settings

2022-04-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27225:


 Summary: Remove redundant reuseForks settings
 Key: FLINK-27225
 URL: https://issues.apache.org/jira/browse/FLINK-27225
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


A number of modules set reuseForks without it having any effect.
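For context, the redundant setting in question looks like this in a module's 
pom.xml (an illustrative sketch; reuseForks is a standard Surefire parameter):
{code:xml}
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <!-- has no effect where the fork configuration is determined elsewhere -->
        <reuseForks>false</reuseForks>
    </configuration>
</plugin>
{code}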





Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Morávek
Here is a very naive implementation [1] from a prototype I did a few months
back that uses a list and insertion sort. Since the list is sorted, we can use
binary search to create a sub-list, which could leverage the same thing I've
described above.

I think back then I didn't go for the SortedMap as it would be hard to
implement with the current heap state backend internals and would have
bigger memory overhead.

The ideal solution would probably use skip list [2] to lower the overhead
of the binary search, while maintaining a reasonable memory footprint.
Other than that it could be pretty much the same as the prototype
implementation [1].

[1]
https://github.com/dmvk/flink/blob/ecdbc774b13b515e8c0943b2c143fb1e34eca6f0/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapTemporalListState.java
[2] https://en.wikipedia.org/wiki/Skip_list

Best,
D.

On Wed, Apr 13, 2022 at 1:27 PM David Morávek 
wrote:

> Hi David,
>
> It seems to me that at least with the heap-based state backend, readRange
>> is going to have to do a lot of unnecessary work to implement this
>> isEmpty() operation, since it will have to consider the entire range from
>> MIN_VALUE to MAX_VALUE. (Maybe we should add an explicit isEmpty method?
>> I'm not convinced we need it, but it would be cheaper to implement. Or
>> perhaps this join can be rewritten to not need this operation; I haven't
>> thought enough about that alternative.)
>>
>
> I think this really boils down to how the returned iterable is going to be
> implemented. Basically for checking whether state is empty, you need to do
> something along the lines of:
>
> Iterables.isEmpty(state.readRange(Long.MIN_VALUE, MAX_VALUE)); //
> basically checking `hasNext() == false` or `isEmpty()` in case of
> `Collection`
>
> Few notes:
> 1) It could be lazy (the underlying collection doesn't have to be
> materialized - eg. in case of RocksDB);
> 2) For HeapStateBackend it depends on the underlying implementation. I'd
> probably do something along the lines of sorted tree (eg. SortedMap /
> NavigableMap), that allows effective range scans / range deletes. Then you
> could simply do something like (from top of the head):
>
> @Value
> class TimestampedKey<K> {
>   K key;
>   long timestamp;
> }
>
> SortedMap<TimestampedKey<K>, V> internalState;
>
> Iterable<TemporalValue<V>> readRange(long min, long max) {
>   return toIterable(internalState.subMap(new TimestampedKey<>(currentKey(),
> min), new TimestampedKey<>(currentKey(), max)));
> }
>
> This should be fairly cheap. The important bit is that the returned
> iterator is always non-null, but could be empty.
>
> Does that answer your question?
>
> D.
>
> On Wed, Apr 13, 2022 at 12:21 PM David Anderson 
> wrote:
>
>> Yun Tang and Jingsong,
>>
>> Some flavor of OrderedMapState is certainly feasible, and I do see some
>> appeal in supporting Binary**State.
>>
>> However, I haven't seen a motivating use case for this generalization, and
>> would rather keep this as simple as possible. By handling Longs we can
>> already optimize a wide range of use cases.
>>
>> David
>>
>>
>> On Tue, Apr 12, 2022 at 9:21 AM Yun Tang  wrote:
>>
>> >  Hi David,
>> >
>> > Could you explain in detail why SortedMapState cannot work? I just
>> > cannot grasp what the statement below means:
>> >
>> > This was rejected as being overly difficult to implement in a way that
>> > would cleanly leverage RocksDB’s iterators.
>> >
>> >
>> > Best
>> > Yun Tang
>> > 
>> > From: Aitozi 
>> > Sent: Tuesday, April 12, 2022 15:00
>> > To: dev@flink.apache.org 
>> > Subject: Re: [DISCUSS] FLIP-220: Temporal State
>> >
>> > Hi David
>> >  I have looked through the doc; I think it will be a good improvement
>> > to this pattern usage, and I'm interested in it. Do you have some POC
>> > work to share for a closer look?
>> > Besides, I have one question: can we support exposing the namespace in
>> > different state types, not limited to `TemporalState`? This way, the user
>> > can specify the namespace, and TemporalState is one of the special cases,
>> > which uses the timestamp as the namespace. I think it will be more
>> > extendable.
>> > What do you think about this?
>> >
>> > Best,
>> > Aitozi.
>> >
>> > David Anderson wrote on Mon, Apr 11, 2022 at 20:54:
>> >
>> > > Greetings, Flink developers.
>> > >
>> > > I would like to open up a discussion of a proposal [1] to add a new
>> kind
>> > of
>> > > state to Flink.
>> > >
>> > > The goal here is to optimize a fairly common pattern, which is using
>> > >
>> > > MapState<Long, List<Event>>
>> > >
>> > > to store lists of events associated with timestamps. This pattern is
>> used
>> > > internally in quite a few operators that implement sorting and joins,
>> and
>> > > it also shows up in user code, for example, when implementing custom
>> > > windowing in a KeyedProcessFunction.
>> > >
>> > > Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves
>> a
>> > > more than 2x improvement in throughput when performing these
>> 

[jira] [Created] (FLINK-27224) Drop redundant flink.forkCountTestPackage property

2022-04-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27224:


 Summary: Drop redundant flink.forkCountTestPackage property
 Key: FLINK-27224
 URL: https://issues.apache.org/jira/browse/FLINK-27224
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


This property has been identical to flink.forkCount for as long as I can 
remember. We should be able to get rid of it.





Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Morávek
Hi David,

It seems to me that at least with the heap-based state backend, readRange
> is going to have to do a lot of unnecessary work to implement this
> isEmpty() operation, since it will have to consider the entire range from
> MIN_VALUE to MAX_VALUE. (Maybe we should add an explicit isEmpty method?
> I'm not convinced we need it, but it would be cheaper to implement. Or
> perhaps this join can be rewritten to not need this operation; I haven't
> thought enough about that alternative.)
>

I think this really boils down to how the returned iterable is going to be
implemented. Basically for checking whether state is empty, you need to do
something along the lines of:

Iterables.isEmpty(state.readRange(Long.MIN_VALUE, MAX_VALUE)); // basically
checking `hasNext() == false` or `isEmpty()` in case of `Collection`

Few notes:
1) It could be lazy (the underlying collection doesn't have to be
materialized - eg. in case of RocksDB);
2) For HeapStateBackend it depends on the underlying implementation. I'd
probably do something along the lines of sorted tree (eg. SortedMap /
NavigableMap), that allows effective range scans / range deletes. Then you
could simply do something like (from top of the head):

@Value
class TimestampedKey<K> {
  K key;
  long timestamp;
}

SortedMap<TimestampedKey<K>, V> internalState;

Iterable<TemporalValue<V>> readRange(long min, long max) {
  return toIterable(internalState.subMap(new TimestampedKey<>(currentKey(),
min), new TimestampedKey<>(currentKey(), max)));
}

This should be fairly cheap. The important bit is that the returned
iterator is always non-null, but could be empty.

Does that answer your question?

D.

On Wed, Apr 13, 2022 at 12:21 PM David Anderson 
wrote:

> Yun Tang and Jingsong,
>
> Some flavor of OrderedMapState is certainly feasible, and I do see some
> appeal in supporting Binary**State.
>
> However, I haven't seen a motivating use case for this generalization, and
> would rather keep this as simple as possible. By handling Longs we can
> already optimize a wide range of use cases.
>
> David
>
>
> On Tue, Apr 12, 2022 at 9:21 AM Yun Tang  wrote:
>
> >  Hi David,
> >
> > Could you explain in detail why SortedMapState cannot work? I just
> > cannot grasp what the statement below means:
> >
> > This was rejected as being overly difficult to implement in a way that
> > would cleanly leverage RocksDB’s iterators.
> >
> >
> > Best
> > Yun Tang
> > 
> > From: Aitozi 
> > Sent: Tuesday, April 12, 2022 15:00
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-220: Temporal State
> >
> > Hi David
> >  I have looked through the doc; I think it will be a good improvement
> > to this pattern usage, and I'm interested in it. Do you have some POC
> > work to share for a closer look?
> > Besides, I have one question: can we support exposing the namespace in
> > different state types, not limited to `TemporalState`? This way, the user
> > can specify the namespace, and TemporalState is one of the special cases,
> > which uses the timestamp as the namespace. I think it will be more
> > extendable.
> > What do you think about this?
> >
> > Best,
> > Aitozi.
> >
> > David Anderson wrote on Mon, Apr 11, 2022 at 20:54:
> >
> > > Greetings, Flink developers.
> > >
> > > I would like to open up a discussion of a proposal [1] to add a new
> kind
> > of
> > > state to Flink.
> > >
> > > The goal here is to optimize a fairly common pattern, which is using
> > >
> > > MapState<Long, List<Event>>
> > >
> > > to store lists of events associated with timestamps. This pattern is
> used
> > > internally in quite a few operators that implement sorting and joins,
> and
> > > it also shows up in user code, for example, when implementing custom
> > > windowing in a KeyedProcessFunction.
> > >
> > > Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a
> > > more than 2x improvement in throughput when performing these operations
> > on
> > > RocksDB by better leveraging the capabilities of the RocksDB state
> > backend.
> > >
> > > See FLIP-220 [1] for details.
> > >
> > > Best,
> > > David
> > >
> > > [1] https://cwiki.apache.org/confluence/x/Xo_FD
> > >
> >
>


Re: [ANNOUNCE] New scalafmt formatter has been merged

2022-04-13 Thread Jingsong Li
Thanks for your work, Francesco

Best,
Jingsong


On Wed, Apr 13, 2022 at 6:37 PM Guowei Ma  wrote:
>
> Hi, Francesco
> Thanks for your work!
> Best,
> Guowei
>
>
> On Wed, Apr 13, 2022 at 5:35 PM Dian Fu  wrote:
>
> > Thanks a lot for this great work Francesco!
> >
> > Regards,
> > Dian
> >
> > On Wed, Apr 13, 2022 at 3:23 PM Marios Trivyzas  wrote:
> >
> > > Thank you for this Francesco!
> > >
> > > It will really improve the lives of everyone touching scala code!
> > >
> > > Best,
> > > Marios
> > >
> > > On Wed, Apr 13, 2022 at 9:55 AM Timo Walther  wrote:
> > >
> > > > Thanks for the great work Francesco!
> > > >
> > > > This will improve the contributor productivity a lot and ease reviews.
> > > > This change was long overdue.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > Am 12.04.22 um 17:21 schrieb Francesco Guardiani:
> > > > > Hi all,
> > > > > The new scalafmt formatter has been merged. From now on, just using
> > mvn
> > > > > spotless:apply as usual will format both Java and Scala, and Intellij
> > > > will
> > > > > automatically pick up the scalafmt config for who has the Scala
> > plugin
> > > > > installed. If it doesn't, just go in Preferences > Editor > Code
> > Style
> > > >
> > > > > Scala and change the Formatter to scalafmt. If you use the actions on
> > > > save
> > > > > plugin, make sure you have the reformat on save enabled for Scala.
> > > > >
> > > > > For more details on integration with IDEs, please refer to
> > > > > https://scalameta.org/scalafmt/docs/installation.html
> > > > >
> > > > > If you have a pending PR with Scala changes, chances are you're going
> > > to
> > > > > have conflicts with upstream/master now. In order to fix it, here is
> > > the
> > > > > suggested procedure:
> > > > >
> > > > > - Do an interactive rebase on commit
> > > > > 3ea3fee5ac996f6ae8836c3cba252f974d20bd2e, which is the commit
> > > before
> > > > the
> > > > > refactoring of the whole codebase, fixing as usual the
> > conflicting
> > > > changes.
> > > > > This will make sure you won't miss the changes between your
> > branch
> > > > and
> > > > > master *before* the reformatting commit.
> > > > > - Do a rebase on commit 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > > > > accepting all changes from your local branch. You can easily do
> > > that
> > > > via git
> > > > > rebase -Xours 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > > > > - Run mvn spotless:apply and commit all the changes
> > > > > - Do an interactive rebase on upstream/master. This will make
> > sure
> > > > you
> > > > > won't miss the changes between your branch and master *after* the
> > > > > reformatting commit.
> > > > > - Force push your branch to update the PR
> > > > >
> > > > > Sorry for this noise!
> > > > >
> > > > > Thank you,
> > > > > FG
> > > > >
> > > >
> > > >
> > >
> > > --
> > > Marios
> > >
> >


[jira] [Created] (FLINK-27223) State access doesn't work as expected when cache size is set to 0

2022-04-13 Thread Dian Fu (Jira)
Dian Fu created FLINK-27223:
---

 Summary: State access doesn't work as expected when cache size is 
set to 0
 Key: FLINK-27223
 URL: https://issues.apache.org/jira/browse/FLINK-27223
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.15.1


For the following job:
{code}
import json
import logging
import sys

from pyflink.common import Types, Configuration
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.util.java_utils import get_j_env_configuration

if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                        format="%(message)s")

    env = StreamExecutionEnvironment.get_execution_environment()
    config = Configuration(
        j_configuration=get_j_env_configuration(env._j_stream_execution_environment))
    config.set_integer("python.state.cache-size", 0)
    env.set_parallelism(1)

    # define the source
    ds = env.from_collection(
        collection=[
            (1, '{"name": "Flink", "tel": 123, "addr": {"country": "Germany", "city": "Berlin"}}'),
            (2, '{"name": "hello", "tel": 135, "addr": {"country": "China", "city": "Shanghai"}}'),
            (3, '{"name": "world", "tel": 124, "addr": {"country": "USA", "city": "NewYork"}}'),
            (4, '{"name": "PyFlink", "tel": 32, "addr": {"country": "China", "city": "Hangzhou"}}')
        ],
        type_info=Types.ROW_NAMED(["id", "info"], [Types.INT(), Types.STRING()])
    )

    # key by country and sum the 'tel' field
    ds = ds.map(lambda data: (json.loads(data.info)['addr']['country'],
                              json.loads(data.info)['tel'])) \
           .key_by(lambda data: data[0]) \
           .sum(1)
    ds.print()
    env.execute()
{code}

The expected result should be:
{code}
('Germany', 123)
('China', 135)
('USA', 124)
('China', 167)
{code}

However, the actual result is:
{code}
('Germany', 123)
('China', 135)
('USA', 124)
('China', 32)
{code}





[jira] [Created] (FLINK-27222) Execution history limit can lead to eviction of critical local-recovery information

2022-04-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27222:


 Summary: Execution history limit can lead to eviction of critical 
local-recovery information
 Key: FLINK-27222
 URL: https://issues.apache.org/jira/browse/FLINK-27222
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0, 1.15.1


Local recovery relies on knowing the allocation id of the last deployment. To 
that end we iterate over all previous execution attempts and use the last 
{{assignedAllocationID}}, if any.
However, since the execution history is bounded (to 16 entries by default), 
this can lead to this information being evicted.

In other words, with the default configuration (history limit = 16, restart 
delay = 1s), local recovery can only kick in if the TM is restarted within 16 
seconds.

We should decouple this information from the execution (history).
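Until that happens, one mitigation sketch is to raise the history bound via 
configuration (the value is illustrative):
{code}
jobmanager.execution.attempts-history-size: 64
{code}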





Re: [ANNOUNCE] New scalafmt formatter has been merged

2022-04-13 Thread Guowei Ma
Hi, Francesco
Thanks for your work!
Best,
Guowei


On Wed, Apr 13, 2022 at 5:35 PM Dian Fu  wrote:

> Thanks a lot for this great work Francesco!
>
> Regards,
> Dian
>
> On Wed, Apr 13, 2022 at 3:23 PM Marios Trivyzas  wrote:
>
> > Thank you for this Francesco!
> >
> > It will really improve the lives of everyone touching scala code!
> >
> > Best,
> > Marios
> >
> > On Wed, Apr 13, 2022 at 9:55 AM Timo Walther  wrote:
> >
> > > Thanks for the great work Francesco!
> > >
> > > This will improve the contributor productivity a lot and ease reviews.
> > > This change was long overdue.
> > >
> > > Regards,
> > > Timo
> > >
> > > Am 12.04.22 um 17:21 schrieb Francesco Guardiani:
> > > > Hi all,
> > > > The new scalafmt formatter has been merged. From now on, just using
> mvn
> > > > spotless:apply as usual will format both Java and Scala, and Intellij
> > > will
> > > > automatically pick up the scalafmt config for who has the Scala
> plugin
> > > > installed. If it doesn't, just go in Preferences > Editor > Code
> Style
> > >
> > > > Scala and change the Formatter to scalafmt. If you use the actions on
> > > save
> > > > plugin, make sure you have the reformat on save enabled for Scala.
> > > >
> > > > For more details on integration with IDEs, please refer to
> > > > https://scalameta.org/scalafmt/docs/installation.html
> > > >
> > > > If you have a pending PR with Scala changes, chances are you're going
> > to
> > > > have conflicts with upstream/master now. In order to fix it, here is
> > the
> > > > suggested procedure:
> > > >
> > > > - Do an interactive rebase on commit
> > > > 3ea3fee5ac996f6ae8836c3cba252f974d20bd2e, which is the commit
> > before
> > > the
> > > > refactoring of the whole codebase, fixing as usual the
> conflicting
> > > changes.
> > > > This will make sure you won't miss the changes between your
> branch
> > > and
> > > > master *before* the reformatting commit.
> > > > - Do a rebase on commit 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > > > accepting all changes from your local branch. You can easily do
> > that
> > > via git
> > > > rebase -Xours 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > > > - Run mvn spotless:apply and commit all the changes
> > > > - Do an interactive rebase on upstream/master. This will make
> sure
> > > you
> > > > won't miss the changes between your branch and master *after* the
> > > > reformatting commit.
> > > > - Force push your branch to update the PR
> > > >
> > > > Sorry for this noise!
> > > >
> > > > Thank you,
> > > > FG
> > > >
> > >
> > >
> >
> > --
> > Marios
> >
>


Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Anderson
Yun Tang and Jingsong,

Some flavor of OrderedMapState is certainly feasible, and I do see some
appeal in supporting Binary**State.

However, I haven't seen a motivating use case for this generalization, and
would rather keep this as simple as possible. By handling Longs we can
already optimize a wide range of use cases.

David


On Tue, Apr 12, 2022 at 9:21 AM Yun Tang  wrote:

>  Hi David,
>
> Could you explain in detail why SortedMapState cannot work? I just
> cannot grasp what the statement below means:
>
> This was rejected as being overly difficult to implement in a way that
> would cleanly leverage RocksDB’s iterators.
>
>
> Best
> Yun Tang
> 
> From: Aitozi 
> Sent: Tuesday, April 12, 2022 15:00
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-220: Temporal State
>
> Hi David
>  I have looked through the doc; I think it will be a good improvement to
> this pattern usage, and I'm interested in it. Do you have some POC work to
> share for a closer look?
> Besides, I have one question: can we support exposing the namespace in
> different state types, not limited to `TemporalState`? This way, the user
> can specify the namespace, and TemporalState is one of the special cases,
> which uses the timestamp as the namespace. I think it will be more
> extendable.
> What do you think about this?
>
> Best,
> Aitozi.
>
> David Anderson wrote on Mon, Apr 11, 2022 at 20:54:
>
> > Greetings, Flink developers.
> >
> > I would like to open up a discussion of a proposal [1] to add a new kind
> of
> > state to Flink.
> >
> > The goal here is to optimize a fairly common pattern, which is using
> >
> > MapState<Long, List<Event>>
> >
> > to store lists of events associated with timestamps. This pattern is used
> > internally in quite a few operators that implement sorting and joins, and
> > it also shows up in user code, for example, when implementing custom
> > windowing in a KeyedProcessFunction.
> >
> > Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a
> > more than 2x improvement in throughput when performing these operations
> on
> > RocksDB by better leveraging the capabilities of the RocksDB state
> backend.
> >
> > See FLIP-220 [1] for details.
> >
> > Best,
> > David
> >
> > [1] https://cwiki.apache.org/confluence/x/Xo_FD
> >
>


Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Anderson
Aitozi,

Our POC can be seen at [1].

My personal opinion is that the namespace is an implementation detail that is
better not exposed directly. Flink state is already quite complex to
understand, and I fear that if we expose the namespaces the additional
flexibility this offers will be more confusing than helpful.

[1] https://github.com/dataArtisans/flink/tree/temporal-state

On Tue, Apr 12, 2022 at 9:00 AM Aitozi  wrote:

> Hi David
>  I have looked through the doc; I think it will be a good improvement to
> this pattern usage, and I'm interested in it. Do you have some POC work to
> share for a closer look?
> Besides, I have one question: can we support exposing the namespace in
> different state types, not limited to `TemporalState`? This way, the user
> can specify the namespace, and TemporalState is one of the special cases,
> which uses the timestamp as the namespace. I think it will be more
> extendable.
> What do you think about this?
>
> Best,
> Aitozi.
>
> David Anderson  于2022年4月11日周一 20:54写道:
>
> > Greetings, Flink developers.
> >
> > I would like to open up a discussion of a proposal [1] to add a new kind
> of
> > state to Flink.
> >
> > The goal here is to optimize a fairly common pattern, which is using
> >
> > MapState<Long, List<Event>>
> >
> > to store lists of events associated with timestamps. This pattern is used
> > internally in quite a few operators that implement sorting and joins, and
> > it also shows up in user code, for example, when implementing custom
> > windowing in a KeyedProcessFunction.
> >
> > Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a
> > more than 2x improvement in throughput when performing these operations
> on
> > RocksDB by better leveraging the capabilities of the RocksDB state
> backend.
> >
> > See FLIP-220 [1] for details.
> >
> > Best,
> > David
> >
> > [1] https://cwiki.apache.org/confluence/x/Xo_FD
> >
>


[jira] [Created] (FLINK-27221) Improvements to YAML parsing

2022-04-13 Thread yao.zhou (Jira)
yao.zhou created FLINK-27221:


 Summary: Improvements to YAML parsing
 Key: FLINK-27221
 URL: https://issues.apache.org/jira/browse/FLINK-27221
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Reporter: yao.zhou


For example, with the current parsing method, standard YAML features cannot
be used, and a password that contains a '#' character cannot be parsed,
because the content after the '#' is treated as a comment.
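For illustration, a minimal sketch (not Flink code; assumes the SnakeYAML
library) showing that a spec-compliant YAML parser keeps a '#' inside a quoted
scalar as part of the value:
{code:java}
import org.yaml.snakeyaml.Yaml;

import java.util.Map;

public class YamlHashExample {
    public static void main(String[] args) {
        // In spec-compliant YAML, '#' only starts a comment outside a quoted
        // scalar, so the password below keeps its full value.
        String conf = "password: \"abc#123\"\n";
        Map<String, Object> parsed = new Yaml().load(conf);
        System.out.println(parsed.get("password")); // prints abc#123
    }
}
{code}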



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Anderson
David, thanks for the feedback, much appreciated!

I'm hoping you can explain a bit more about how

Iterable<TimestampedValue<T>> readRange(long minTimestamp, long
limitTimestamp);

would be used (and perhaps, implemented) in practice. I worry that this
might either prevent certain optimizations and/or be rather more complex to
optimize well.

I've been studying the implementation of temporal join that we did using
our proposed interface [1] to see what it might look if we used readRange
instead. For the most part, readRange seems like a good fit.

However, I'm wondering about the code which currently reads

   private transient MapState<Long, RowData> rightState;
   ...
   if (lastUnprocessedTime < Long.MAX_VALUE || !rightState.isEmpty()) { ...
}

Our POC implementation changes this to

   private transient TemporalValueState<RowData> tRightState;
   ...
   if (lastUnprocessedTime < Long.MAX_VALUE
 || tRightState.valueAtOrAfter(Long.MIN_VALUE) != null) { ... }

It seems to me that at least with the heap-based state backend, readRange
is going to have to do a lot of unnecessary work to implement this
isEmpty() operation, since it will have to consider the entire range from
MIN_VALUE to MAX_VALUE. (Maybe we should add an explicit isEmpty method?
I'm not convinced we need it, but it would be cheaper to implement. Or
perhaps this join can be rewritten to not need this operation; I haven't
thought enough about that alternative.)
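(For concreteness, a sketch of how the emptiness check would look when
expressed via readRange, assuming a state object exposing the signature
quoted above:)

    // Hypothetical: emptiness via readRange. A heap-based backend would have
    // to consider the whole key range just to answer hasNext() once, which is
    // the unnecessary work described above.
    boolean isEmpty =
            !temporalState.readRange(Long.MIN_VALUE, Long.MAX_VALUE)
                    .iterator()
                    .hasNext();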

As for why we included TemporalValueState, we've found it useful on
multiple occasions. It's the right abstraction for storing the right-hand
side for temporal joins (the example in [2] is much easier to follow than
[1]), and we also found it useful for storing window metadata when we
implemented windowing.

[1]
https://github.com/dataArtisans/flink/blob/temporal-state/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/join/temporal/TemporalRowTimeJoinOperator.java
[2]
https://github.com/dataArtisans/flink/blob/temporal-state/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/temporal/TemporalJoin.java

Regards,
David

On Tue, Apr 12, 2022 at 10:01 AM David Morávek  wrote:

> Hi David,
>
> I really like the proposal. This has so much potential for various
> optimizations, especially for temporal joins. My only concern is that the
> interfaces seems unnecessarily complicated.
>
> My feeling would be that we only need a single, simple interface that would
> fit it all (the same way as it's already present in Apache Beam):
>
> @Experimental
> public interface TemporalListState<T>
> extends MergingState<TimestampedValue<T>, Iterable<TimestampedValue<T>>> {
>
> /**
>  * Read a timestamp-limited subrange of the list. The result is ordered
> by timestamp.
>  *
>  * All values with timestamps >= minTimestamp and < limitTimestamp
> will be in the resulting
>  * iterable. This means that only timestamps strictly less than
>  * Instant.ofEpochMilli(Long.MAX_VALUE) can be used as timestamps.
>  */
> Iterable<TimestampedValue<T>> readRange(long minTimestamp, long
> limitTimestamp);
>
> /**
>  * Clear a timestamp-limited subrange of the list.
>  *
>  * All values with timestamps >= minTimestamp and < limitTimestamp
> will be removed from the
>  * list.
>  */
> void clearRange(long minTimestamp, long limitTimestamp);
> }
>
> Is there anything missing here? Why do we need a temporal value state at
> all? In my understanding it's still basically a "temporal list state", just
> with a slightly different API. If this is indeed necessary with the "temporal
> list state" API you've proposed, would it make sense to try unifying the
> two? I really think that the Beam community already did a good job on
> designing this API.
>
> Adding one state primitive is already a big change, so if we can keep it
> minimal it would be great.
>
> One more point on the proposed API, being able to clear only a single
> "timestamped value" at the time might be limiting for some use cases
> (performance wise, because we can't optimize it as we are with the range
> delete).
>
> Best,
> D.
>
> On Tue, Apr 12, 2022 at 9:32 AM Jingsong Li 
> wrote:
>
> > Hi David,
> >
> > Thanks for driving.
> >
> > I understand that the state storage itself supports byte ordering; have we
> > considered exposing Binary**State? This way the upper layers can be
> > implemented on demand, and Temporal is just one of them.
> >
> > Best,
> > Jingsong
> >
> > On Tue, Apr 12, 2022 at 3:01 PM Aitozi  wrote:
> > >
> > > Hi David
> > >  I have looked through the doc; I think it will be a good improvement to
> > > this usage pattern, and I'm interested in it. Do you have some POC work to
> > > share for a closer look?
> > > Besides, I have one question: can we support exposing the namespace in
> > > the different state types, not limited to `TemporalState`? This way, users
> > > could specify the namespace, and TemporalState would be one special case
> > > that uses the timestamp as the namespace. I think 

Re: [ANNOUNCE] New scalafmt formatter has been merged

2022-04-13 Thread Dian Fu
Thanks a lot for this great work Francesco!

Regards,
Dian

On Wed, Apr 13, 2022 at 3:23 PM Marios Trivyzas  wrote:

> Thank you for this Francesco!
>
> It will really improve the lives of everyone touching scala code!
>
> Best,
> Marios
>
> On Wed, Apr 13, 2022 at 9:55 AM Timo Walther  wrote:
>
> > Thanks for the great work Francesco!
> >
> > This will improve the contributor productivity a lot and ease reviews.
> > This change was long overdue.
> >
> > Regards,
> > Timo
> >
> > Am 12.04.22 um 17:21 schrieb Francesco Guardiani:
> > > Hi all,
> > > The new scalafmt formatter has been merged. From now on, just using mvn
> > > spotless:apply as usual will format both Java and Scala, and Intellij
> > will
> > > automatically pick up the scalafmt config for who has the Scala plugin
> > > installed. If it doesn't, just go in Preferences > Editor > Code Style
> >
> > > Scala and change the Formatter to scalafmt. If you use the actions on
> > save
> > > plugin, make sure you have the reformat on save enabled for Scala.
> > >
> > > For more details on integration with IDEs, please refer to
> > > https://scalameta.org/scalafmt/docs/installation.html
> > >
> > > If you have a pending PR with Scala changes, chances are you're going
> to
> > > have conflicts with upstream/master now. In order to fix it, here is
> the
> > > suggested procedure:
> > >
> > > - Do an interactive rebase on commit
> > > 3ea3fee5ac996f6ae8836c3cba252f974d20bd2e, which is the commit
> before
> > the
> > > refactoring of the whole codebase, fixing as usual the conflicting
> > changes.
> > > This will make sure you won't miss the changes between your branch
> > and
> > > master *before* the reformatting commit.
> > > - Do a rebase on commit 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > > accepting all changes from your local branch. You can easily do
> that
> > via git
> > > rebase -Xours 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > > - Run mvn spotless:apply and commit all the changes
> > > - Do an interactive rebase on upstream/master. This will make sure
> > you
> > > won't miss the changes between your branch and master *after* the
> > > reformatting commit.
> > > - Force push your branch to update the PR
> > >
> > > Sorry for this noise!
> > >
> > > Thank you,
> > > FG
> > >
> >
> >
>
> --
> Marios
>


[DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-04-13 Thread Sebastian Mattheis
Dear Flink developers,

I would like to open a discussion on FLIP-217 [1], an extension of
Watermark Alignment to perform alignment also in SplitReaders. To do so,
SplitReaders must be able to suspend and resume reading from split sources,
where the SourceOperator coordinates and controls suspend and resume. To
gather information about the current watermarks of the SplitReaders, we
extend the internal WatermarkOutputMultiplexer and report watermarks to the
SourceOperator.

There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
reworked by Dawid Wysakowicz (he did most of the work) and me. The changes
are backwards compatible: if the affected components do not support split
alignment, the behavior is unchanged.
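(As a rough sketch of the kind of hook this implies on the reader side; the
interface and signature below are illustrative only, not the FLIP's final API:)

import java.util.Collection;

// Illustrative only: a per-split suspend/resume hook that the SourceOperator
// could drive to align split watermarks. See the FLIP and the PoC for the
// actual design.
interface AlignedSplitReader {
    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume);
}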

Best,
Sebastian

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
[2] https://github.com/dawidwys/flink/tree/aligned-splits


Re: Discuss making KafkaSubscriber Public

2022-04-13 Thread Qingsheng Ren
Thanks for the proposal Mason! I think exposing `KafkaSubscriber` as a public
API is helpful for users to implement more complex subscription logic.

+1 (non-binding)

Cheers, 

Qingsheng

> On Apr 12, 2022, at 11:46, Mason Chen  wrote:
> 
> Hi Flink Devs,
> 
> I was looking to contribute to 
> https://issues.apache.org/jira/browse/FLINK-24660, which is a ticket to track 
> changing the KafkaSubscriber from Internal to PublicEvolving.
> 
> In the PR, it seems a few of us agree on making the subscriber pluggable in
> the KafkaSource, but I'd like to raise the question nevertheless.
> Furthermore, there is interest in this change from various Flink mailing list
> threads and on the Jira ticket itself, so I think it would be beneficial to
> users. There is already some feedback to make the contract of handling
> removed splits by the KafkaSource and the subscriber clearer in the docs.
> 
> I have yet to address all the PR feedback, but does anyone have any concerns 
> before I proceed further?
> 
> Best,
> Mason



[jira] [Created] (FLINK-27220) Remove redundant null-check for int parameter.

2022-04-13 Thread da (Jira)
da created FLINK-27220:
--

 Summary: Remove redundant null-check for int parameter.
 Key: FLINK-27220
 URL: https://issues.apache.org/jira/browse/FLINK-27220
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Reporter: da


Remove redundant null-check for int parameter.

 

https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/job/SubtaskExecutionAttemptAccumulatorsInfo.java#L59-L60
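For illustration (a paraphrased sketch, not the exact Flink code from the
linked lines), the issue is a null-check applied to a primitive parameter:
{code:java}
public class Example {
    private final int subtaskIndex;

    // Paraphrased sketch of the pattern in question: a null-check on a
    // primitive int parameter can never fail, because autoboxing always
    // produces a non-null Integer, so the check is dead code.
    public Example(int subtaskIndex) {
        this.subtaskIndex = java.util.Objects.requireNonNull(subtaskIndex);
    }
}
{code}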



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27219) CliClientITCase.testSqlStatements failed on azure with jdk11

2022-04-13 Thread Yun Gao (Jira)
Yun Gao created FLINK-27219:
---

 Summary: CliClientITCase.testSqlStatements failed on azure with 
jdk11
 Key: FLINK-27219
 URL: https://issues.apache.org/jira/browse/FLINK-27219
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.15.0
Reporter: Yun Gao



{code:java}
Apr 13 04:56:44 [ERROR] Could not execute SQL statement. Reason:
Apr 13 04:56:44 java.lang.ClassCastException: class 
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
java.net.URLClassLoader are in module java.base of loader 'bootstrap')
Apr 13 04:56:44 !error
Apr 13 04:56:44 
Apr 13 04:56:44 # test "ctas" only supported in Hive Dialect
Apr 13 04:56:44 CREATE TABLE foo as select 1;
Apr 13 04:56:44 [ERROR] Could not execute SQL statement. Reason:
Apr 13 04:56:44 java.lang.ClassCastException: class 
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
java.net.URLClassLoader are in module java.base of loader 'bootstrap')
Apr 13 04:56:44 !error
Apr 13 04:56:44 
Apr 13 04:56:44 # list the configured configuration
Apr 13 04:56:44 set;
Apr 13 04:56:44 [ERROR] Could not execute SQL statement. Reason:
Apr 13 04:56:44 java.lang.ClassCastException: class 
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
java.net.URLClassLoader are in module java.base of loader 'bootstrap')
Apr 13 04:56:44 !error
Apr 13 04:56:44 
Apr 13 04:56:44 # reset the configuration
Apr 13 04:56:44 reset;
Apr 13 04:56:44 [ERROR] Could not execute SQL statement. Reason:
Apr 13 04:56:44 java.lang.ClassCastException: class 
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
java.net.URLClassLoader are in module java.base of loader 'bootstrap')
Apr 13 04:56:44 !error
Apr 13 04:56:44 
Apr 13 04:56:44 set;
Apr 13 04:56:44 [ERROR] Could not execute SQL statement. Reason:
Apr 13 04:56:44 java.lang.ClassCastException: class 
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
java.net.URLClassLoader are in module java.base of loader 'bootstrap')
Apr 13 04:56:44 !error

...

Apr 13 04:56:44 [ERROR] Could not execute SQL statement. Reason:
Apr 13 04:56:44 org.apache.flink.sql.parser.impl.ParseException: Encountered 
"STRING" at line 10, column 27.
Apr 13 04:56:44 Was expecting one of:
Apr 13 04:56:44 ")" ...
Apr 13 04:56:44 "," ...
Apr 13 04:56:44 
Apr 13 04:56:44 !error

...

Apr 13 04:56:44 SHOW JARS;
Apr 13 04:56:44 Empty set
Apr 13 04:56:44 !ok
Apr 13 04:56:44 "
Apr 13 04:56:44 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
Apr 13 04:56:44 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Apr 13 04:56:44 at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Apr 13 04:56:44 at 
org.apache.flink.table.client.cli.CliClientITCase.testSqlStatements(CliClientITCase.java:139)
Apr 13 04:56:44 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Apr 13 04:56:44 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Apr 13 04:56:44 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Apr 13 04:56:44 at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
Apr 13 04:56:44 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Apr 13 04:56:44 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Apr 13 04:56:44 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Apr 13 04:56:44 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Apr 13 04:56:44 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Apr 13 04:56:44 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Apr 13 04:56:44 at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Apr 13 04:56:44 at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
Apr 13 04:56:44 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Apr 13 04:56:44 at 

OLM Bundle planned?

2022-04-13 Thread Thoms Sven
Hello

Is there a plan to integrate the Helm-based flink-kubernetes-operator into the
Operator Community Marketplace and the Operator Lifecycle Manager in the
future?

https://sdk.operatorframework.io/docs/olm-integration/tutorial-bundle/




Sven Thoms

Senior IT Architect Data Science
Banking

T +41 81 287 14 47



Grabenstrasse 19 · 7000 Chur

Niederlassungen: Zürich-Flughafen | St. Gallen

www.inventx.ch


Notice: If this e-mail is not addressed to you, please inform us and delete
the e-mail. Since e-mail traffic is neither secure nor protected against
inspection and modification by third parties, we decline any liability for
damages that may arise from it.


smime.p7s
Description: S/MIME cryptographic signature


Re: [DISCUSS] The abstraction of cache lookupFunction and cache metric

2022-04-13 Thread Qingsheng Ren
Hi Yuan,

Sorry for leaving this thread pending for such a long time. I think adding a
unified abstraction and metrics for caches is quite important for users and
developers to optimize and improve their jobs with lookup joins. We also have
our own internal cache abstraction and implementation, so I took a deeper
look, and here are some thoughts of mine.

1. Metrics

I think users would be interested in these 3 aspects when debugging or
benchmarking their jobs:

(1) Hit / Miss rate
- hitCount, Counter type, to track number of cache hit
- missCount, Counter type, to track number of cache miss
Here we just report the raw count instead of rate to external metric system, 
since it’s easier and more flexible to make aggregations and calculate rate in 
metric systems like Prometheus.

(2) Loading throughput and latency
- numBytesLoadedTotal, Counter type, to track number of bytes totally loaded by 
cache
- numRecordsLoadedTotal, Counter type, to track number of records totally loaded
These two can be used for tracking the throughput of loading

- latestLoadTime, Gauge type, to track the time spent for the latest load
operation
Actually it would be better to use a histogram for tracking latency, but a
histogram is quite expensive to manage. Personally I think a gauge would be
good enough to reflect the latency.

- numLoadFailures, Counter type, to track number of failed loads.

(3) Current usage
- numCachedRecords, Gauge type, to track number of entries in cache
- numCachedBytes, Gauge type, to track number of bytes used by cache

Most of the metrics above are similar to your original proposal, and here’s the 
difference: 
(1) I still think it's weird to report identifier and type as metrics. It's
quite handy to get the actual cache type through the code path; moreover, some
metric systems (like Prometheus) don't support string-typed metrics.
(2) numRecords is renamed to numCachedRecords
(3) loadSuccessCount can be derived as missCount - numLoadFailures. I think
users would be interested to know how many times the cache loads (missCount)
and how many loads fail (numLoadFailures)
(4) totalLoadTime is replaced by latestLoadTime. I think it's not quite
meaningful for users to see a long-running job reporting totalLoadTime with
hours or even days as the value.

2. APIs

(1) CacheMetricGroup: 

public interface CacheMetricGroup {
    Counter getHitCounter();

    Counter getMissCounter();

    Counter getNumRecordsLoadedTotalCounter();

    Counter getNumBytesLoadedTotalCounter();

    Gauge<Long> getLatestLoadTimeGauge();

    Counter getNumLoadFailureCounter();

    void setNumCachedRecordsGauge(Gauge<Long> numCachedRecordsGauge);

    void setNumCachedBytesGauge(Gauge<Long> numCachedBytesGauge);
}

Note that some metrics are provided as getters since they are quite
straightforward, except numCachedRecords/Bytes, which should be left for
cache implementers.

(2) Cache

public interface Cache<K, V> extends AutoCloseable {
    void open(CacheMetricGroup cacheMetricGroup);

    V get(K key, Callable<? extends V> loader) throws Exception;

    void put(K key, V value);

    void putAll(Map<? extends K, ? extends V> m);

    void clean();

    long size();
}

Compared to your proposal: 
a. `getIdentifier()` is removed. I can’t see any usage of this function, since 
we are not dynamically loading cache implementations via SPI or factory style.
b. `init()` and `initMetric()` are merged to `open(CacheMetricGroup)`.
c. Extends `AutoCloseable`, symmetric to open, for cleaning up resources
claimed by the cache
d. `getMetricGroup()` is removed. Metric groups should be exposed to cache 
implementations instead of users. 
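
To make the intended usage concrete, here is a rough sketch (the external
query is a placeholder, not an agreed API) of how a lookup function could use
get() with a loader:

    // Sketch only: on a miss, get() invokes the loader, so hit/miss and
    // loading metrics can be recorded inside the cache implementation.
    RowData lookup(Cache<RowData, RowData> cache, RowData key) throws Exception {
        return cache.get(key, () -> queryExternalSystem(key)); // placeholder loader
    }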

3. Other topics
Another point to note is that if you check the usage of the cache in the JDBC
and Hive lookup tables, the value type is List<RowData>, since it's common for
a joining key to map to multiple rows. We could add another layer of
abstraction under the Cache interface, for example:

OneToManyCache<K, V> extends Cache<K, List<V>>

And add interfaces like `appendToKey(List<V>)` to it, as in the sketch below.
What do you think?
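For instance (a strawman shape, not an agreed API):

public interface OneToManyCache<K, V> extends Cache<K, List<V>> {
    /** Appends the given values to the list cached under the key. */
    void appendToKey(K key, List<V> values);
}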

Cheers, 

Qingsheng

> On Mar 7, 2022, at 16:00, zst...@163.com wrote:
> 
> Hi devs,
> 
> 
> I would like to propose a discussion thread about an abstraction of a cache
> LookupFunction, with metrics for the cache, in connectors, to make caching
> work out of the box for connector developers. There are multiple
> LookupFunction implementations in individual connectors [1][2][3][4] so far.
> At the same time, users can monitor the cache in LookupFunction through
> uniform cache metrics, to optimize tasks or troubleshoot.
> 
> 
> I have posted an issue about this, see 
> , and made a brief design 
> .
> 
> 
> Looking forward to your feedback, thanks.
> 
> 
> Best regards,
> Yuan
> 
> 
> 
> 
> [1] 
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java
> [2] 
> 

Re: FLINK-11746 work

2022-04-13 Thread Martijn Visser
Hi Chen,

Thanks for restarting the discussion. I would +1 Yu's suggestion: let's agree
on the scope and the design. Could you elaborate a bit on how you're using Thrift?
When reading more about this, it feels like Thrift is somewhat in the
middle between a connector and a format from a Flink perspective. It would
be interesting to understand better why and how you're using these two
technologies.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Wed, 13 Apr 2022 at 08:28, Yu Li  wrote:

> Hi Chen,
>
> Thanks for the interest to work on this.
>
> Before assigning the JIRA and starting the upstream work, how about
> reviving the discussion thread [1] first, making sure to reach a consensus
> on the design and find committer resources for code review, according to
> our code contribution process [2]?
>
> Best Regards,
> Yu
>
> [1] https://lists.apache.org/thread/lp8xrzbhl70c2zbmd59zcbmtdhk1g8j2
> [2]
>
> https://flink.apache.org/contributing/contribute-code.html#code-contribution-process
>
>
> On Tue, 12 Apr 2022 at 10:37, Chen Qin  wrote:
>
> > Hi there,
> >
> > I would like to reboot the discussion on FLINK-11746 work. Over the course
> > of the last two years, we managed to run a large number of critical Flink
> > apps (table/datastream) with the underlying Thrift format. It would be
> > great if folks could assign this Jira to me so we can kick off the
> > discussion on the upstream work.
> >
> > Chen
> >
>


Re: Update regarding externalizing existing connectors and new connectors

2022-04-13 Thread Marios Trivyzas
Thanks a lot for the update Martijn and all your efforts, and those of
everyone involved in this!

Best,
Marios


On Tue, Apr 12, 2022 at 2:41 PM Martijn Visser 
wrote:

> Hi everyone,
>
> After our discussion thread on "Creating an external connector repository"
> [1], "Moving connectors from Flink to external connector repositories" [2]
> and "Plan to externalize connectors and versioning" [3] I wanted to give
> you an update on the current situation, what we're working on and something
> on new connectors.
>
> We're in the process of moving out the Elasticsearch connector from Flink's
> repository. The code has been moved to the new repository [4] and the first
> CI pipeline has been set up. We're currently figuring out the best way to
> integrate the documentation from this repo in the Flink documentation and
> how to properly setup a pipeline that allows testing of the connector
> between released and snapshot versions of Flink.
>
> When this has been completed, we'll focus on the RabbitMQ connector. We think
> that after these two have been migrated out of the Flink repo, we will have a
> good overview of what's needed for a migration and where we can
> standardize.
>
> Next to that, I would also like to make you aware that there are two new
> connectors that will go straight into their own repositories. One is for
> Opensearch, the other one is for Redis.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
> [2] https://lists.apache.org/thread/bk9f91o6wk66zdh353j1n7sfshh262tr
> [3] https://lists.apache.org/thread/vqbjoo94wwqcvo32c80dkmp7r5gmy68r
> [4] https://github.com/apache/flink-connector-elasticsearch
>


-- 
Marios


Re: [DISCUSS] Roadmap update for 1.15

2022-04-13 Thread Martijn Visser
Hi Joe,

Thanks for the update! I've left a couple of comments, which I'll repeat
here:

- Kafka, File [via Unified Sink API] should go to Production Ready &
Evolving. We should also add Pulsar there.
- We're missing Kinesis and Firehose as a connector, which should probably
be in Beta.
- NiFi Source should be removed from the list altogether, since we've removed
it (it's listed under Beta)
- DataStream (batch) could move to Production Ready & Evolving
- I would move JDBC Sink to Approaching End-of-Life (because we need to
replace it with a JDBC Sink via the Unified Sink API)
- Same for RabbitMQ source, will also be replaced with one via Unified
Source API
- Same for PubSub Source & Sink
- Same for HBase SQL Source & Sink
- Same for Cassandra Sink
- Scala 2.11 can be removed from the diagram, since it has been removed
already

Best regards,


Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Wed, 13 Apr 2022 at 09:25, Jingsong Li  wrote:

> Thanks Joe!
>
> Best,
> Jingsong
>
> On Wed, Apr 13, 2022 at 3:22 PM Johannes Moser  wrote:
> >
> > Hi Jingsong,
> >
> > I already included it in the roadmap [1], will also mention it in the
> release notes.
> >
> > Best,
> > Joe
> >
> >
> > [1]
> https://github.com/apache/flink-web/pull/527/files#diff-51abe96bc493857fdfba599c6b70e3a6c09200cf29e130d15b599fef538e32dfR87-R89
> >
> > > On 13.04.2022, at 09:19, Jingsong Li  wrote:
> > >
> > > Hi Joe,
> > >
> > > Thanks for driving. Looks very good~
> > >
> > > As a part of Apache Flink, Flink Table Store [1][2] is a new feature on
> > > Flink Table. It is now in Beta status.
> > >
> > > Now Flink Table Store is waiting for the release of Flink 1.15 as it
> > > depends on Flink 1.15, so it is ready to release the first version
> > > after Flink 1.15.
> > > We can also mention Flink Table Store in the 1.15 release[3].
> > >
> > > What do you think?
> > >
> > > [1] https://github.com/apache/flink-table-store
> > > [2] https://nightlies.apache.org/flink/flink-table-store-docs-master/
> > > [3] https://github.com/apache/flink-web/pull/526
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Wed, Apr 13, 2022 at 3:06 PM Johannes Moser 
> wrote:
> > >>
> > >> Dear Flink Community,
> > >>
> > >> I created a PR to update the roadmap.
> > >>
> > >> Please have a look at the new feature radar as well as on the
> overview of the most important efforts that are coming up.
> > >>
> > >> https://github.com/apache/flink-web/pull/527 <
> https://github.com/apache/flink-web/pull/527>
> > >>
> > >> Best,
> > >> Joe
> >
>


Re: [DISCUSS] Roadmap update for 1.15

2022-04-13 Thread Jingsong Li
Thanks Joe!

Best,
Jingsong

On Wed, Apr 13, 2022 at 3:22 PM Johannes Moser  wrote:
>
> Hi Jingsong,
>
> I already included it in the roadmap [1], will also mention it in the release 
> notes.
>
> Best,
> Joe
>
>
> [1] 
> https://github.com/apache/flink-web/pull/527/files#diff-51abe96bc493857fdfba599c6b70e3a6c09200cf29e130d15b599fef538e32dfR87-R89
>
> > On 13.04.2022, at 09:19, Jingsong Li  wrote:
> >
> > Hi Joe,
> >
> > Thanks for driving. Looks very good~
> >
> > As a part of Apache Flink, Flink Table Store [1][2] is a new feature on
> > Flink Table. It is now in Beta status.
> >
> > Now Flink Table Store is waiting for the release of Flink 1.15 as it
> > depends on Flink 1.15, so it is ready to release the first version
> > after Flink 1.15.
> > We can also mention Flink Table Store in the 1.15 release[3].
> >
> > What do you think?
> >
> > [1] https://github.com/apache/flink-table-store
> > [2] https://nightlies.apache.org/flink/flink-table-store-docs-master/
> > [3] https://github.com/apache/flink-web/pull/526
> >
> > Best,
> > Jingsong
> >
> > On Wed, Apr 13, 2022 at 3:06 PM Johannes Moser  wrote:
> >>
> >> Dear Flink Community,
> >>
> >> I created a PR to update the roadmap.
> >>
> >> Please have a look at the new feature radar as well as on the overview of 
> >> the most important efforts that are coming up.
> >>
> >> https://github.com/apache/flink-web/pull/527 
> >> 
> >>
> >> Best,
> >> Joe
>


Re: [ANNOUNCE] New scalafmt formatter has been merged

2022-04-13 Thread Marios Trivyzas
Thank you for this Francesco!

It will really improve the lives of everyone touching scala code!

Best,
Marios

On Wed, Apr 13, 2022 at 9:55 AM Timo Walther  wrote:

> Thanks for the great work Francesco!
>
> This will improve the contributor productivity a lot and ease reviews.
> This change was long overdue.
>
> Regards,
> Timo
>
> Am 12.04.22 um 17:21 schrieb Francesco Guardiani:
> > Hi all,
> > The new scalafmt formatter has been merged. From now on, just using mvn
> > spotless:apply as usual will format both Java and Scala, and Intellij
> will
> > automatically pick up the scalafmt config for who has the Scala plugin
> > installed. If it doesn't, just go in Preferences > Editor > Code Style >
> > Scala and change the Formatter to scalafmt. If you use the actions on
> save
> > plugin, make sure you have the reformat on save enabled for Scala.
> >
> > For more details on integration with IDEs, please refer to
> > https://scalameta.org/scalafmt/docs/installation.html
> >
> > If you have a pending PR with Scala changes, chances are you're going to
> > have conflicts with upstream/master now. In order to fix it, here is the
> > suggested procedure:
> >
> > - Do an interactive rebase on commit
> > 3ea3fee5ac996f6ae8836c3cba252f974d20bd2e, which is the commit before
> the
> > refactoring of the whole codebase, fixing as usual the conflicting
> changes.
> > This will make sure you won't miss the changes between your branch
> and
> > master *before* the reformatting commit.
> > - Do a rebase on commit 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > accepting all changes from your local branch. You can easily do that
> via git
> > rebase -Xours 91d81c427aa6312841ca868d54e8ce6ea721cd60
> > - Run mvn spotless:apply and commit all the changes
> > - Do an interactive rebase on upstream/master. This will make sure
> you
> > won't miss the changes between your branch and master *after* the
> > reformatting commit.
> > - Force push your branch to update the PR
> >
> > Sorry for this noise!
> >
> > Thank you,
> > FG
> >
>
>

-- 
Marios


Re: [DISCUSS] Roadmap update for 1.15

2022-04-13 Thread Johannes Moser
Hi Jingsong,

I already included it in the roadmap [1], will also mention it in the release 
notes.

Best,
Joe


[1] 
https://github.com/apache/flink-web/pull/527/files#diff-51abe96bc493857fdfba599c6b70e3a6c09200cf29e130d15b599fef538e32dfR87-R89

> On 13.04.2022, at 09:19, Jingsong Li  wrote:
> 
> Hi Joe,
> 
> Thanks for driving. Looks very good~
> 
> As a part of Apache Flink,Flink Table Store [1][2] is a new feature on
> Flink Table. It is now in Beta status.
> 
> Now Flink Table Store is waiting for the release of Flink 1.15 as it
> depends on Flink 1.15, so it is ready to release the first version
> after Flink 1.15.
> We can also mention Flink Table Store in the 1.15 release[3].
> 
> What do you think?
> 
> [1] https://github.com/apache/flink-table-store
> [2] https://nightlies.apache.org/flink/flink-table-store-docs-master/
> [3] https://github.com/apache/flink-web/pull/526
> 
> Best,
> Jingsong
> 
> On Wed, Apr 13, 2022 at 3:06 PM Johannes Moser  wrote:
>> 
>> Dear Flink Community,
>> 
>> I created a PR to update the roadmap.
>> 
>> Please have a look at the new feature radar as well as on the overview of 
>> the most important efforts that are coming up.
>> 
>> https://github.com/apache/flink-web/pull/527 
>> 
>> 
>> Best,
>> Joe



Re: [DISCUSS] Roadmap update for 1.15

2022-04-13 Thread Jingsong Li
Hi Joe,

Thanks for driving. Looks very good~

As a part of Apache Flink, Flink Table Store [1][2] is a new feature on
Flink Table. It is now in Beta status.

Flink Table Store is waiting for the release of Flink 1.15, as it
depends on Flink 1.15, and is ready to release its first version
after Flink 1.15.
We can also mention Flink Table Store in the 1.15 release[3].

What do you think?

[1] https://github.com/apache/flink-table-store
[2] https://nightlies.apache.org/flink/flink-table-store-docs-master/
[3] https://github.com/apache/flink-web/pull/526

Best,
Jingsong

On Wed, Apr 13, 2022 at 3:06 PM Johannes Moser  wrote:
>
> Dear Flink Community,
>
> I created a PR to update the roadmap.
>
> Please have a look at the new feature radar as well as on the overview of the 
> most important efforts that are coming up.
>
> https://github.com/apache/flink-web/pull/527 
> 
>
> Best,
> Joe


[jira] [Created] (FLINK-27218) Serializer in OperatorState has not been updated when new Serializers are NOT incompatible

2022-04-13 Thread Yue Ma (Jira)
Yue Ma created FLINK-27218:
--

 Summary: Serializer in OperatorState has not been updated when new 
Serializers are NOT incompatible
 Key: FLINK-27218
 URL: https://issues.apache.org/jira/browse/FLINK-27218
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.15.1
Reporter: Yue Ma
 Attachments: image-2022-04-13-14-50-10-921.png

Now OperatorState can only be constructed via DefaultOperatorStateBackend. But
when the *BroadcastState* or *PartitionableListState* serializer changes, the
following problem arises.

As an example, we can see how PartitionableListState is initialized.

First, during the restore operation we construct a restored
PartitionableListState based on the information in the snapshot.

Then we update the StateMetaInfo in partitionableListState as in the
following code:

 
{code:java}
TypeSerializerSchemaCompatibility stateCompatibility =
                
restoredPartitionableListStateMetaInfo.updatePartitionStateSerializer(newPartitionStateSerializer);

partitionableListState.setStateMetaInfo(restoredPartitionableListStateMetaInfo);{code}
 


The main problem is that there is also an internalListCopySerializer in
PartitionableListState that is built using the previous serializer, and it has
not been updated. But the internalListCopySerializer will be used later when
taking the snapshot. Therefore, when we update the StateMetaInfo, the
internalListCopySerializer also needs to be updated, as sketched below.
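A rough sketch of the proposed direction (the helper below is hypothetical;
the actual field is internal to PartitionableListState):
{code:java}
// Hypothetical fix sketch: whenever the partition state serializer is
// replaced after the compatibility check, the copy serializer derived from
// the old serializer must be rebuilt as well.
partitionableListState.setStateMetaInfo(restoredPartitionableListStateMetaInfo);
partitionableListState.recreateInternalListCopySerializer(); // hypothetical helper
{code}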

This problem also exists in BroadcastState



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27217) Support partition filter push down when there exists default_partition

2022-04-13 Thread luoyuxia (Jira)
luoyuxia created FLINK-27217:


 Summary: Support partition filter push down when there exists 
default_partition
 Key: FLINK-27217
 URL: https://issues.apache.org/jira/browse/FLINK-27217
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: luoyuxia
 Fix For: 1.16.0


Using the Hive dialect, when the table's partition column is nullable and a
row whose partition column value is null is inserted, the row will be assigned
to the partition named by 'HiveConf.ConfVars.DEFAULTPARTITIONNAME', which
defaults to "__HIVE_DEFAULT_PARTITION__".

Like this:

 
{code:java}
create table ptestfilter (a string) partitioned by (c int);
INSERT OVERWRITE TABLE ptestfilter PARTITION (c) select 'Col1', null;
INSERT OVERWRITE TABLE ptestfilter PARTITION (c) select 'Col2', 5;
select * from ptestfilter where c between 2 and 6 ;{code}
Flink will try to push down the partition filter and cast
'__HIVE_DEFAULT_PARTITION__' to int type; it then fails with the following
exception:
{code:java}
java.lang.NumberFormatException: For input string: "__HIVE_DEFAULT_PARTITION__"
    at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:273)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at 
org.apache.flink.table.planner.plan.utils.PartitionPruner$.org$apache$flink$table$planner$plan$utils$PartitionPruner$$convertPartitionFieldValue(PartitionPruner.scala:173)
 {code}
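A sketch of a possible guard (illustrative only, not the actual PartitionPruner
code):
{code:java}
// Illustrative guard: treat the default partition name as NULL instead of
// casting it to the partition column's type.
Object convertPartitionValue(String value, String defaultPartitionName) {
    if (defaultPartitionName.equals(value)) { // e.g. "__HIVE_DEFAULT_PARTITION__"
        return null;
    }
    return Integer.parseInt(value); // cast for an INT partition column
}
{code}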
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27216) CheckpointCoordinatorTest.testMinCheckpointPause failed on azure

2022-04-13 Thread Yun Gao (Jira)
Yun Gao created FLINK-27216:
---

 Summary: CheckpointCoordinatorTest.testMinCheckpointPause failed 
on azure
 Key: FLINK-27216
 URL: https://issues.apache.org/jira/browse/FLINK-27216
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.15.0
Reporter: Yun Gao



{code:java}
2022-04-12T04:36:26.8490051Z Apr 12 04:36:26 [ERROR] Tests run: 49, Failures: 
0, Errors: 1, Skipped: 0, Time elapsed: 4.912 s <<< FAILURE! - in 
org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest
2022-04-12T04:36:26.8491666Z Apr 12 04:36:26 [ERROR] 
org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest.testMinCheckpointPause
  Time elapsed: 0.053 s  <<< ERROR!
2022-04-12T04:36:26.8493295Z Apr 12 04:36:26 
org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the 
pending checkpoint 1. Failure reason: Failure to finalize checkpoint.
2022-04-12T04:36:26.8494208Z Apr 12 04:36:26at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1354)
2022-04-12T04:36:26.8495074Z Apr 12 04:36:26at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1241)
2022-04-12T04:36:26.8496598Z Apr 12 04:36:26at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1133)
2022-04-12T04:36:26.8497813Z Apr 12 04:36:26at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest.testMinCheckpointPause(CheckpointCoordinatorTest.java:396)
2022-04-12T04:36:26.8498560Z Apr 12 04:36:26at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-04-12T04:36:26.8499456Z Apr 12 04:36:26at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-04-12T04:36:26.8500181Z Apr 12 04:36:26at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-04-12T04:36:26.8500838Z Apr 12 04:36:26at 
java.lang.reflect.Method.invoke(Method.java:498)
2022-04-12T04:36:26.8501563Z Apr 12 04:36:26at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2022-04-12T04:36:26.8502551Z Apr 12 04:36:26at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2022-04-12T04:36:26.8503624Z Apr 12 04:36:26at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2022-04-12T04:36:26.8504933Z Apr 12 04:36:26at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2022-04-12T04:36:26.8505787Z Apr 12 04:36:26at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2022-04-12T04:36:26.8506629Z Apr 12 04:36:26at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
2022-04-12T04:36:26.8507691Z Apr 12 04:36:26at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
2022-04-12T04:36:26.8508940Z Apr 12 04:36:26at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
2022-04-12T04:36:26.8509960Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2022-04-12T04:36:26.8511048Z Apr 12 04:36:26at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
2022-04-12T04:36:26.8512426Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
2022-04-12T04:36:26.8513106Z Apr 12 04:36:26at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
2022-04-12T04:36:26.8513828Z Apr 12 04:36:26at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
2022-04-12T04:36:26.8514490Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2022-04-12T04:36:26.8515121Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2022-04-12T04:36:26.8515751Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2022-04-12T04:36:26.8516368Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2022-04-12T04:36:26.8516995Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2022-04-12T04:36:26.8517620Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2022-04-12T04:36:26.8518227Z Apr 12 04:36:26at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
2022-04-12T04:36:26.8518804Z Apr 12 04:36:26at 
org.junit.runner.JUnitCore.run(JUnitCore.java:137)
2022-04-12T04:36:26.8519362Z Apr 12 04:36:26at 
org.junit.runner.JUnitCore.run(JUnitCore.java:115)
2022-04-12T04:36:26.8519978Z Apr 12 04:36:26at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)

[DISCUSS] Roadmap update for 1.15

2022-04-13 Thread Johannes Moser
Dear Flink Community,

I created a PR to update the roadmap. 

Please have a look at the new feature radar as well as on the overview of the 
most important efforts that are coming up.

https://github.com/apache/flink-web/pull/527 


Best,
Joe

Re: [ANNOUNCE] New scalafmt formatter has been merged

2022-04-13 Thread Timo Walther

Thanks for the great work Francesco!

This will improve the contributor productivity a lot and ease reviews. 
This change was long overdue.


Regards,
Timo

Am 12.04.22 um 17:21 schrieb Francesco Guardiani:

Hi all,
The new scalafmt formatter has been merged. From now on, just using mvn
spotless:apply as usual will format both Java and Scala, and Intellij will
automatically pick up the scalafmt config for who has the Scala plugin
installed. If it doesn't, just go in Preferences > Editor > Code Style >
Scala and change the Formatter to scalafmt. If you use the actions on save
plugin, make sure you have the reformat on save enabled for Scala.

For more details on integration with IDEs, please refer to
https://scalameta.org/scalafmt/docs/installation.html

If you have a pending PR with Scala changes, chances are you're going to
have conflicts with upstream/master now. In order to fix it, here is the
suggested procedure:

- Do an interactive rebase on commit
3ea3fee5ac996f6ae8836c3cba252f974d20bd2e, which is the commit before the
refactoring of the whole codebase, fixing as usual the conflicting changes.
This will make sure you won't miss the changes between your branch and
master *before* the reformatting commit.
- Do a rebase on commit 91d81c427aa6312841ca868d54e8ce6ea721cd60
accepting all changes from your local branch. You can easily do that via git
rebase -Xours 91d81c427aa6312841ca868d54e8ce6ea721cd60
- Run mvn spotless:apply and commit all the changes
- Do an interactive rebase on upstream/master. This will make sure you
won't miss the changes between your branch and master *after* the
reformatting commit.
- Force push your branch to update the PR

Sorry for this noise!

Thank you,
FG





[jira] [Created] (FLINK-27215) JDBC sink transiently deletes a record because of the -U message of that record

2022-04-13 Thread tim yu (Jira)
tim yu created FLINK-27215:
--

 Summary: JDBC sink transiently deletes a record because of the -U 
message of that record
 Key: FLINK-27215
 URL: https://issues.apache.org/jira/browse/FLINK-27215
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Affects Versions: 1.12.0
Reporter: tim yu


A record is transiently deleted when using the JDBC sink in upsert mode.

The -U message is processed as a delete operation in the class
TableBufferReducedStatementExecutor.
The following code shows how a -U message is processed:
{code:java}
/**
 * Returns true if the row kind is INSERT or UPDATE_AFTER, returns false if 
the row kind is
 * DELETE or UPDATE_BEFORE.
 */
private boolean changeFlag(RowKind rowKind) {
switch (rowKind) {
case INSERT:
case UPDATE_AFTER:
return true;
case DELETE:
case UPDATE_BEFORE:
return false;
default:
throw new UnsupportedOperationException(
String.format(
"Unknown row kind, the supported row kinds is: 
INSERT, UPDATE_BEFORE, UPDATE_AFTER,"
+ " DELETE, but get: %s.",
rowKind));
}
}

@Override
public void executeBatch() throws SQLException {
for (Map.Entry> entry : 
reduceBuffer.entrySet()) {
if (entry.getValue().f0) {
upsertExecutor.addToBatch(entry.getValue().f1);
} else {
// delete by key
deleteExecutor.addToBatch(entry.getKey());
}
}
upsertExecutor.executeBatch();
deleteExecutor.executeBatch();
reduceBuffer.clear();
}
{code}
If the -U and +U messages of one record are executed separately in different
JDBC batches, that record will transiently be deleted from the external
database and then re-inserted with the updated values. In fact, this record
should merely be updated once in the external database, as the timeline below
illustrates.
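An illustrative timeline of the problem (a sketch, not actual code):
{code:java}
// Suppose -U and +U of the same key land in different batches:
//   batch 1 flush: reduceBuffer = { key=1 -> (false, -U row) } => DELETE key 1
//   batch 2 flush: reduceBuffer = { key=1 -> (true,  +U row) } => UPSERT key 1
// Between the two flushes, the record is missing from the external database,
// although semantically it should only have been updated in place.
{code}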


 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: FLINK-11746 work

2022-04-13 Thread Yu Li
Hi Chen,

Thanks for the interest to work on this.

Before assigning the JIRA and starting the upstream work, how about
reviving the discussion thread [1] first, making sure to reach a consensus
on the design and find committer resources for code review, according to
our code contribution process [2]?

Best Regards,
Yu

[1] https://lists.apache.org/thread/lp8xrzbhl70c2zbmd59zcbmtdhk1g8j2
[2]
https://flink.apache.org/contributing/contribute-code.html#code-contribution-process


On Tue, 12 Apr 2022 at 10:37, Chen Qin  wrote:

> Hi there,
>
> I would like to reboot the discussion on FLINK-11746 work. Over the course
> of the last two years, we managed to run a large number of critical Flink apps
> (table/datastream) with the underlying Thrift format. It would be great if
> folks could assign this Jira to me so we can kick off the discussion on the
> upstream work.
>
> Chen
>


Re: Re: [VOTE] Release 1.15.0, release candidate #1

2022-04-13 Thread Yun Gao
Hi Yu,

Yes, the RC1 is officially canceled; the pointer has been moved to RC2 now~

Many thanks for the tip about the remaining open issues! They should indeed be
moved, and I'll move them to the next release~

Best,
Yun Gao




--
From:Yu Li 
Send Time:2022 Apr. 13 (Wed.) 14:05
To:dev 
Cc:Jing Zhang 
Subject:Re: Re: [VOTE] Release 1.15.0, release candidate #1

Thanks for the efforts, Yun and Joe!

According to the earlier reply, the vote of RC1 is officially canceled,
right?

And just a reminder that there are still many open issues with FixVersion
marked as 1.15.0 [1], and I think most of them (if not all, meaning some
more bug fixes are still expected to be included in the next RC) could be
moved out, to clean up the release notes and the related segment of the
blogpost.

Best Regards,
Yu

[1] https://s.apache.org/flink-1.15-open-issues


On Tue, 12 Apr 2022 at 19:03, Yun Gao  wrote:

> Hi Jing,
>
> No worries, and many thanks for fixing the issue and informing us.
>
> I'll then head to create the rc2 today.
>
> Best,
> Yun
>
>
>  --Original Mail --
> Sender:Jing Zhang 
> Send Date:Tue Apr 12 18:55:30 2022
> Recipients:dev 
> Subject:Re: [VOTE] Release 1.15.0, release candidate #1
> Hi, Yun Gao
> There is a new bug [1] introduced in release-1.15.
> It would be better to fix it in the 1.15.0 version.
> I'm terribly sorry for merging the pull request so late.
>
> [1] https://issues.apache.org/jira/browse/FLINK-26681
>
> Best,
> Jing Zhang
>
> Johannes Moser  于2022年4月12日周二 14:53写道:
>
> > Here is the missing link to the announcement blogpost [6]
> >
> > [6] https://github.com/apache/flink-web/pull/526
> >
> > > On 11.04.2022, at 20:12, Yun Gao  wrote:
> > >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for the version
> > 1.15.0, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release and binary convenience releases to
> > be deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint CBE82BEFD827B08AFA843977EDBF922A7BC84897 [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-1.15.0-rc1" [5],
> > > * website pull request listing the new release and adding announcement
> > blog post [6].
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> > > Thanks,
> > > Joe, Till and Yun Gao
> > > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12350442
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.0-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1494/
> > > [5] https://github.com/apache/flink/releases/tag/release-1.15.0-rc1[6]
> > https://github.com/apache/flink-web/pull/526
> >
> >
>



Re: Re: [VOTE] Release 1.15.0, release candidate #1

2022-04-13 Thread Yu Li
Thanks for the efforts, Yun and Joe!

According to the earlier reply, the vote of RC1 is officially canceled,
right?

And just a reminder that there are still many open issues with FixVersion
marked as 1.15.0 [1], and I think most of them (if not all, meaning some
more bug fixes are still expected to be included in the next RC) could be
moved out, to clean up the release notes and the related segment of the
blogpost.

Best Regards,
Yu

[1] https://s.apache.org/flink-1.15-open-issues


On Tue, 12 Apr 2022 at 19:03, Yun Gao  wrote:

> Hi Jing,
>
> No worries, and many thanks for fixing the issue and informing us.
>
> I'll then head to create the rc2 today.
>
> Best,
> Yun
>
>
>  --Original Mail --
> Sender:Jing Zhang 
> Send Date:Tue Apr 12 18:55:30 2022
> Recipients:dev 
> Subject:Re: [VOTE] Release 1.15.0, release candidate #1
> Hi, Yun Gao
> There is a new bug [1] introduced in release-1.15.
> It would be better to fix it in the 1.15.0 version.
> I'm terribly sorry for merging the pull request so late.
>
> [1] https://issues.apache.org/jira/browse/FLINK-26681
>
> Best,
> Jing Zhang
>
> Johannes Moser  于2022年4月12日周二 14:53写道:
>
> > Here is the missing link to the announcement blogpost [6]
> >
> > [6] https://github.com/apache/flink-web/pull/526
> >
> > > On 11.04.2022, at 20:12, Yun Gao  wrote:
> > >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for the version
> > 1.15.0, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release and binary convenience releases to
> > be deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint CBE82BEFD827B08AFA843977EDBF922A7BC84897 [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-1.15.0-rc1" [5],
> > > * website pull request listing the new release and adding announcement
> > blog post [6].
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> > > Thanks,
> > > Joe, Till and Yun Gao
> > > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12350442
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.0-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1494/
> > > [5] https://github.com/apache/flink/releases/tag/release-1.15.0-rc1[6]
> > https://github.com/apache/flink-web/pull/526
> >
> >
>