[jira] [Created] (FLINK-31254) Improve the read performance for files table

2023-02-27 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31254:


 Summary: Improve the read performance for files table
 Key: FLINK-31254
 URL: https://issues.apache.org/jira/browse/FLINK-31254
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0


At present, the read performance of the files table is very poor: every data 
read also reads the schema file. We can optimize the read performance.
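
One possible direction (purely an illustrative sketch, not the actual Table Store code; the Schema type and reader function below are hypothetical) is to cache the parsed schema per schema id so that data reads do not re-read the schema file:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative sketch only: cache schemas so each read does not hit the schema file again. */
public class CachedSchemaReader {

    // Hypothetical Schema type standing in for the table store's schema representation.
    public static final class Schema {
        final String json;
        Schema(String json) { this.json = json; }
    }

    private final Map<Long, Schema> schemaCache = new ConcurrentHashMap<>();
    private final Function<Long, Schema> schemaFileReader;

    public CachedSchemaReader(Function<Long, Schema> schemaFileReader) {
        this.schemaFileReader = schemaFileReader;
    }

    /** Returns the schema for the given schema id, reading the schema file at most once. */
    public Schema schema(long schemaId) {
        return schemaCache.computeIfAbsent(schemaId, schemaFileReader);
    }
}
{code}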



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31253) Port itcases to Flink 1.15 and 1.14

2023-02-27 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31253:


 Summary: Port itcases to Flink 1.15 and 1.14
 Key: FLINK-31253
 URL: https://issues.apache.org/jira/browse/FLINK-31253
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


At present, only the common module has tests. We need to copy part of the 
itcases to the 1.14 and 1.15 modules to ensure they work correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema

2023-02-27 Thread Hang Ruan
Hi, Shammon,

I think the method `void deserialize(byte[] message, Collector out)`
with a default implementation encapsulates how to deal with null for
developers. If we remove `T deserialize(byte[] message)`, developers
have to remember to handle null themselves, and we may end up with
duplicate code among the implementations.
Also, I find there are only 5 implementations that override the method
`void deserialize(byte[] message, Collector out)`; the other
implementations reuse the same code to handle null.
I am not sure what the benefits of removing this method are. Looking forward
to other people's opinions.

Best,
Hang



Shammon FY  wrote on Tue, Feb 28, 2023 at 14:14:

> Hi devs
>
> Currently there are two deserialization methods in `DeserializationSchema`
> 1. `T deserialize(byte[] message)`, which deserializes only one record from
> the binary data and should return null if there is no record.
> 2. `void deserialize(byte[] message, Collector out)`, which supports
> deserializing zero, one, or multiple records gracefully and can completely
> replace the method `T deserialize(byte[] message)`.
>
> The deserialization logic of the two methods largely overlaps, and we
> recommend that users use the second method to deserialize data. To improve
> code maintainability, I'd like to mark the first method as `@Deprecated`
> and remove it once it is no longer used.
>
> I have created an issue[1] to track it, looking forward to your feedback,
> thanks
>
> [1] https://issues.apache.org/jira/browse/FLINK-31251
>
>
> Best,
> Shammon
>


[DISCUSS] Deprecate deserialize method in DeserializationSchema

2023-02-27 Thread Shammon FY
Hi devs

Currently there are two deserialization methods in `DeserializationSchema`
1. `T deserialize(byte[] message)`, which deserializes only one record from
the binary data and should return null if there is no record.
2. `void deserialize(byte[] message, Collector out)`, which supports
deserializing zero, one, or multiple records gracefully and can completely
replace the method `T deserialize(byte[] message)`.

The deserialization logic of the two methods largely overlaps, and we
recommend that users use the second method to deserialize data. To improve
code maintainability, I'd like to mark the first method as `@Deprecated`
and remove it once it is no longer used.
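
For reference, a minimal sketch of how the two methods relate (an approximation of the default implementation, using a renamed illustrative interface rather than the real `DeserializationSchema`):

{code:java}
import java.io.IOException;
import org.apache.flink.util.Collector;

// Sketch: the Collector-based method can subsume the single-record method,
// because a null return simply means "emit nothing".
public interface SketchDeserializationSchema<T> {

    /** Method 1: returns a single record, or null if the bytes produce no record. */
    T deserialize(byte[] message) throws IOException;

    /** Method 2: may emit zero, one, or many records for one message. */
    default void deserialize(byte[] message, Collector<T> out) throws IOException {
        T record = deserialize(message);
        if (record != null) {
            out.collect(record);
        }
    }
}
{code}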

I have created an issue[1] to track it, looking forward to your feedback,
thanks

[1] https://issues.apache.org/jira/browse/FLINK-31251


Best,
Shammon


[jira] [Created] (FLINK-31252) Improve StaticFileStoreSplitEnumerator to assign batch splits

2023-02-27 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31252:


 Summary: Improve StaticFileStoreSplitEnumerator to assign batch 
splits
 Key: FLINK-31252
 URL: https://issues.apache.org/jira/browse/FLINK-31252
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0


{code:java}
// The following batch assignment operation is for two things:
// 1. It can be evenly distributed during batch reading to avoid scheduling problems
//    (for example, the current resource can only schedule part of the tasks) that
//    cause some tasks to fail to read data.
// 2. Read with limit, if split is assigned one by one, it may cause the task to
//    repeatedly create SplitFetchers. After the task is created, it is found that it
//    is idle and then closed. Then, new split coming, it will create SplitFetcher and
//    repeatedly read the data of the limit number (the limit status is in the
//    SplitFetcher).
{code}
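
For illustration only, a rough sketch of batch assignment versus one-by-one assignment in a static enumerator (the Split type parameter and assign callback are hypothetical, not the actual StaticFileStoreSplitEnumerator code):

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.BiConsumer;

/** Illustrative sketch: hand out a whole batch of splits per request instead of one split. */
public class BatchSplitAssigner<Split> {

    private final Queue<Split> pending;
    private final int batchSize;

    public BatchSplitAssigner(List<Split> splits, int batchSize) {
        this.pending = new ArrayDeque<>(splits);
        this.batchSize = batchSize;
    }

    /** Assigns up to batchSize splits to the requesting subtask in one shot. */
    public void handleSplitRequest(int subtask, BiConsumer<Integer, List<Split>> assign) {
        List<Split> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && !pending.isEmpty()) {
            batch.add(pending.poll());
        }
        if (!batch.isEmpty()) {
            assign.accept(subtask, batch);
        }
    }
}
{code}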




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Flink minor version support policy for old releases

2023-02-27 Thread yuxia
Thanks Danny for driving it.

+1 (non-binding)

Best regards,
Yuxia

----- Original Message -----
From: "Weihua Hu" 
To: "dev" 
Sent: Tuesday, February 28, 2023 12:48:09 PM
Subject: Re: [VOTE] Flink minor version support policy for old releases

Thanks, Danny.

+1 (non-binding)

Best,
Weihua


On Tue, Feb 28, 2023 at 12:38 PM weijie guo 
wrote:

> Thanks Danny for bring this.
>
> +1 (non-binding)
>
> Best regards,
>
> Weijie
>
>
Jing Ge  wrote on Mon, Feb 27, 2023 at 20:23:
>
> > +1 (non-binding)
> >
> > BTW, should we follow the content style [1] to describe the new rule
> using
> > 1.2.x, 1.1.y, 1.1.z?
> >
> > [1] https://flink.apache.org/downloads/#update-policy-for-old-releases
> >
> > Best regards,
> > Jing
> >
> > On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl
> >  wrote:
> >
> > > Thanks, Danny. Sounds good to me.
> > >
> > > +1 (non-binding)
> > >
> > > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer <
> dannycran...@apache.org>
> > > wrote:
> > >
> > > > I am starting a vote to update the "Update Policy for old releases"
> [1]
> > > to
> > > > include additional bugfix support for end of life versions.
> > > >
> > > > As per the discussion thread [2], the change we are voting on is:
> > > > - Support policy: updated to include: "Upon release of a new Flink
> > minor
> > > > version, the community will perform one final bugfix release for
> > resolved
> > > > critical/blocker issues in the Flink minor version losing support."
> > > > - Release process: add a step to start the discussion thread for the
> > > final
> > > > patch version, if there are resolved critical/blocking issues to
> flush.
> > > >
> > > > Voting schema: since our bylaws [3] do not cover this particular
> > > scenario,
> > > > and releases require PMC involvement, we will use a consensus vote
> with
> > > PMC
> > > > binding votes.
> > > >
> > > > Thanks,
> > > > Danny
> > > >
> > > > [1]
> > > https://flink.apache.org/downloads.html#update-policy-for-old-releases
> > > > [2] https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv
> > > > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > > >
> > >
> >
>


[jira] [Created] (FLINK-31251) Deprecate deserialize method in DeserializationSchema

2023-02-27 Thread Shammon (Jira)
Shammon created FLINK-31251:
---

 Summary: Deprecate deserialize method in DeserializationSchema
 Key: FLINK-31251
 URL: https://issues.apache.org/jira/browse/FLINK-31251
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Affects Versions: 1.18.0
Reporter: Shammon


Deprecate method `T deserialize(byte[] message)` and use `void 
deserialize(byte[] message, Collector out)` instead in 
`DeserializationSchema`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31250) ParquetSchemaConverter supports MULTISET type for parquet format

2023-02-27 Thread Nicholas Jiang (Jira)
Nicholas Jiang created FLINK-31250:
--

 Summary: ParquetSchemaConverter supports MULTISET type for parquet 
format
 Key: FLINK-31250
 URL: https://issues.apache.org/jira/browse/FLINK-31250
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.18.0
Reporter: Nicholas Jiang
 Fix For: 1.18.0


ParquetSchemaConverter supports the ARRAY, MAP, and ROW types but does not 
support the MULTISET type. ParquetSchemaConverter should also support the 
MULTISET type for the parquet format.
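
Conceptually, a MULTISET<T> in Flink is a map from element to its multiplicity, so one possible approach (a sketch of the idea under that assumption, not the actual converter change) is to rewrite the multiset type and reuse the existing MAP conversion path:

{code:java}
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.MapType;
import org.apache.flink.table.types.logical.MultisetType;

public final class MultisetToMap {

    private MultisetToMap() {}

    /**
     * Sketch: rewrite MULTISET<T> as MAP<T, INT NOT NULL> so that the existing
     * MAP handling in ParquetSchemaConverter could be reused for the Parquet schema.
     */
    public static LogicalType asMap(MultisetType multiset) {
        return new MapType(
                multiset.isNullable(),
                multiset.getElementType(),
                new IntType(false)); // multiplicity of each element
    }
}
{code}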



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-02-27 Thread Matt Wang
This is a good proposal to me; it will make the SlotManager code clearer.



--

Best,
Matt Wang


 Replied Message 
| From | David Morávek |
| Date | 02/27/2023 22:45 |
| To |  |
| Subject | Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager |
Hi Weihua, I still need to dig into the details, but the overall sentiment
of this change sounds reasonable.

Best,
D.

On Mon, Feb 27, 2023 at 2:26 PM Zhanghao Chen 
wrote:

Thanks for driving this topic. I think this FLIP could help clean up the
codebase to make it easier to maintain. +1 on it.

Best,
Zhanghao Chen

From: Weihua Hu 
Sent: Monday, February 27, 2023 20:40
To: dev 
Subject: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

Hi everyone,

I would like to begin a discussion on FLIP-298: Unifying the Implementation
of SlotManager[1]. There are currently two types of SlotManager in Flink:
DeclarativeSlotManager and FineGrainedSlotManager. FineGrainedSlotManager
should behave as DeclarativeSlotManager if the user does not configure the
slot request profile.

Therefore, this FLIP aims to unify the implementation of SlotManager in
order to reduce maintenance costs.

Looking forward to hearing from you.

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager

Best,
Weihua



Re: [VOTE] Flink minor version support policy for old releases

2023-02-27 Thread Weihua Hu
Thanks, Danny.

+1 (non-binding)

Best,
Weihua


On Tue, Feb 28, 2023 at 12:38 PM weijie guo 
wrote:

> Thanks Danny for bring this.
>
> +1 (non-binding)
>
> Best regards,
>
> Weijie
>
>
> Jing Ge  wrote on Mon, Feb 27, 2023 at 20:23:
>
> > +1 (non-binding)
> >
> > BTW, should we follow the content style [1] to describe the new rule
> using
> > 1.2.x, 1.1.y, 1.1.z?
> >
> > [1] https://flink.apache.org/downloads/#update-policy-for-old-releases
> >
> > Best regards,
> > Jing
> >
> > On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl
> >  wrote:
> >
> > > Thanks, Danny. Sounds good to me.
> > >
> > > +1 (non-binding)
> > >
> > > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer <
> dannycran...@apache.org>
> > > wrote:
> > >
> > > > I am starting a vote to update the "Update Policy for old releases"
> [1]
> > > to
> > > > include additional bugfix support for end of life versions.
> > > >
> > > > As per the discussion thread [2], the change we are voting on is:
> > > > - Support policy: updated to include: "Upon release of a new Flink
> > minor
> > > > version, the community will perform one final bugfix release for
> > resolved
> > > > critical/blocker issues in the Flink minor version losing support."
> > > > - Release process: add a step to start the discussion thread for the
> > > final
> > > > patch version, if there are resolved critical/blocking issues to
> flush.
> > > >
> > > > Voting schema: since our bylaws [3] do not cover this particular
> > > scenario,
> > > > and releases require PMC involvement, we will use a consensus vote
> with
> > > PMC
> > > > binding votes.
> > > >
> > > > Thanks,
> > > > Danny
> > > >
> > > > [1]
> > > https://flink.apache.org/downloads.html#update-policy-for-old-releases
> > > > [2] https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv
> > > > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > > >
> > >
> >
>


Re: [VOTE] Flink minor version support policy for old releases

2023-02-27 Thread weijie guo
Thanks Danny for bring this.

+1 (non-binding)

Best regards,

Weijie


Jing Ge  wrote on Mon, Feb 27, 2023 at 20:23:

> +1 (non-binding)
>
> BTW, should we follow the content style [1] to describe the new rule using
> 1.2.x, 1.1.y, 1.1.z?
>
> [1] https://flink.apache.org/downloads/#update-policy-for-old-releases
>
> Best regards,
> Jing
>
> On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl
>  wrote:
>
> > Thanks, Danny. Sounds good to me.
> >
> > +1 (non-binding)
> >
> > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer 
> > wrote:
> >
> > > I am starting a vote to update the "Update Policy for old releases" [1]
> > to
> > > include additional bugfix support for end of life versions.
> > >
> > > As per the discussion thread [2], the change we are voting on is:
> > > - Support policy: updated to include: "Upon release of a new Flink
> minor
> > > version, the community will perform one final bugfix release for
> resolved
> > > critical/blocker issues in the Flink minor version losing support."
> > > - Release process: add a step to start the discussion thread for the
> > final
> > > patch version, if there are resolved critical/blocking issues to flush.
> > >
> > > Voting schema: since our bylaws [3] do not cover this particular
> > scenario,
> > > and releases require PMC involvement, we will use a consensus vote with
> > PMC
> > > binding votes.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1]
> > https://flink.apache.org/downloads.html#update-policy-for-old-releases
> > > [2] https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv
> > > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > >
> >
>


Re: [DISCUSS] SourceCoordinator and ExternallyInducedSourceReader do not work well together

2023-02-27 Thread ming li
Hi, Jiangjie,

Thanks for your reply and suggestion.

In fact, we don't want to modify the way the JM triggers checkpoints; rather,
we hope to give the OperatorCoordinator a mechanism similar to
ExternallyInducedSourceReader to coordinate when the checkpoint barrier is
sent (just moved forward from the Source to the OperatorCoordinator). We
would like the produced data and the checkpoints to have a one-to-one
mapping. With such a mechanism, the programming and design would be greatly
simplified.

In addition, I am not sure whether other OperatorCoordinators have the same
need, because we currently always snapshot the OperatorCoordinator
immediately.

Thanks,
Ming Li


Becket Qin  wrote on Tue, Feb 28, 2023 at 08:31:

> Hi Ming,
>
> I am not sure if I fully understand what you want. It seems what you are
> looking for is to have a checkpoint triggered at a customized timing which
> aligns with some semantic. This is not what the current checkpoint in Flink
> was designed for. I think the basic idea of checkpoint is to just take a
> snapshot of the current state, so we can restore to that state in case of
> failure. This is completely orthogonal to the data semantic.
>
> Even with the ExternallyInducedSourceReader, the checkpoint is still
> triggered by the JM. It is just the effective checkpoint barrier message (a
> custom message in this case) will not be sent by the JM, but by the
> external source storage. This helps when the external source storage needs
> its own internal state to be aligned with the state of the Flink
> SourceReader. For example, if the external source storage can only seek at
> some bulk boundary, then it might wait until the current bulk to finish
> before it sends the custom checkpoint barrier to the SourceReader.
>
>  Considering this scenario, if the data we want has not been produced yet,
> but the *SourceCoordinator* receives the *checkpoint* message, it will
> > directly make a *checkpoint*, and the *ExternallyInducedSource* will not
> > make a *checkpoint* immediately after receiving the *checkpoint*, but
> > continues to wait for a new split. Even if a new split is generated, due
> to
> > the behavior of closing *gateway* in *FLINK-28606*, the new split cannot
> be
> > assigned to the *Source*, resulting in a deadlock (or forced to wait for
> > checkpoint to time out).
>
>
> In this case, the source reader should not "wait" for the splits that are
> not included in this checkpoint. These splits should be a part of the next
> checkpoint. It would be the Sink's responsibility to ensure the output is
> committed in a way that aligns with the user semantic.
>
> That said, I agree it might be useful in some cases if users can decided
> the checkpoint triggering timing. But that will be a new feature which
> needs some careful design.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>


[jira] [Created] (FLINK-31249) Checkpoint Timer failed to process timeout events when it blocked at writing _metadata to DFS

2023-02-27 Thread renxiang zhou (Jira)
renxiang zhou created FLINK-31249:
-

 Summary: Checkpoint Timer failed to process timeout events when it 
blocked at writing _metadata to DFS
 Key: FLINK-31249
 URL: https://issues.apache.org/jira/browse/FLINK-31249
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Affects Versions: 1.16.0, 1.11.6
Reporter: renxiang zhou
 Fix For: 1.18.0
 Attachments: image-2023-02-28-11-25-03-637.png

The jobmanager-future thread may be blocked writing the metadata to DFS because 
of a DFS failure, and the CheckpointCoordinator lock is held by this thread.

When the next checkpoint is triggered, the Checkpoint Timer thread waits for 
the lock to be released. If the previous checkpoint times out, the checkpoint 
timer cannot execute the timeout event since it is blocked waiting for the 
lock. As a result, the previous checkpoint cannot be cancelled.
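
A simplified illustration of the blocking pattern (generic Java, not the actual CheckpointCoordinator code): a thread doing slow I/O while holding the shared lock prevents the timer thread from running the timeout action that needs the same lock.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LockedTimeoutExample {

    private final Object coordinatorLock = new Object();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    /** Simulates writing _metadata to DFS while holding the coordinator lock. */
    void finalizeCheckpoint() throws InterruptedException {
        synchronized (coordinatorLock) {
            Thread.sleep(60_000); // slow / hanging DFS write
        }
    }

    /** The timeout action also needs the lock, so it cannot run while the write hangs. */
    void scheduleTimeout(long checkpointId) {
        timer.schedule(
                () -> {
                    synchronized (coordinatorLock) {
                        System.out.println("Cancel expired checkpoint " + checkpointId);
                    }
                },
                10,
                TimeUnit.SECONDS);
    }
}
{code}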

!image-2023-02-28-11-25-03-637.png|width=1144,height=248!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements

2023-02-27 Thread Ran Tao
Hi guys, thanks for the advice.

Allow me to make a small summary:

1. Support ILIKE
2. Use the catalog API to support the SHOW operations
3. A dedicated FLIP is needed to support INFORMATION_SCHEMA
4. Support SHOW CONNECTORS

If there are no other questions, I will try to start a VOTE for this FLIP.
WDYT?

Best Regards,
Ran Tao


Sergey Nuyanzin  wrote on Mon, Feb 27, 2023 at 21:12:

> Hi Jark,
>
> thanks for your comment.
>
> >Considering they
> > are orthogonal and information schema requires more complex design and
> > discussion, it deserves a separate FLIP
> I'm ok with a separate FLIP for INFORMATION_SCHEMA.
>
> >Sergey, are you willing to contribute this FLIP?
> Seems I need to have more research done for that.
> I would try to help/contribute here
>
>
> On Mon, Feb 27, 2023 at 3:46 AM Ran Tao  wrote:
>
> > Hi, Jing. Thanks.
> >
> > @about ILIKE: from my survey of some popular engines, I found that only
> > Snowflake has this syntax in SHOW with filtering.
> > Do we need to support it? If yes, then some of the current SHOW operations
> > need to be addressed as well.
> > @about ShowOperation with LIKE: it's a good idea. Yes, two parameters for
> > the constructor can work. Thanks for your advice.
> >
> >
> > Best Regards,
> > Ran Tao
> >
> >
> > Jing Ge  wrote on Mon, Feb 27, 2023 at 06:29:
> >
> > > Hi,
> > >
> > > @Aitozi
> > >
> > > This is exactly why LoD has been introduced: to avoid exposing internal
> > > structure(2nd and lower level API).
> > >
> > > @Jark
> > >
> > > IMHO, there is no conflict between LoD and "High power-to-weight ratio"
> > > with the given example, List.subList() returns List interface itself,
> no
> > > internal or further interface has been exposed. After offering
> > > tEvn.getCatalog(), "all" methods in Catalog Interface have been
> provided
> > by
> > > TableEnvironment(via getCatalog()). From user's perspective and
> > maintenance
> > > perspective there is no/less difference between providing them directly
> > via
> > > TableEnvironment or via getCatalog(). They are all exposed. Using
> > > getCatalog() will reduce the number of boring wrapper methods, but on
> the
> > > other hand not every method in Catalog needs to be exposed, so the
> number
> > > of wrapper methods would be limited/less, if we didn't expose Catalog.
> > > Nevertheless, since we already offered getCatalog(), it makes sense to
> > > continue using it. The downside is the learning effort for users - they
> > > have to know that listDatabases() is hidden in Catalog, go to another
> > > haystack and then find the needle in there.
> > >
> > > +1 for Information schema with a different FLIP. From a design
> > perspective,
> > > information schema should be the first choice for most cases and easy
> to
> > > use. Catalog, on the other hand, will be more powerful and offer more
> > > advanced features.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Sat, Feb 25, 2023 at 3:57 PM Jark Wu  wrote:
> > >
> > > > Hi Sergey,
> > > >
> > > > I think INFORMATION_SCHEMA is a very interesting idea, and I hope we
> > can
> > > > support it. However, it doesn't conflict with the idea of auxiliary
> > > > statements. I can see different benefits of them. The information
> > schema
> > > > provides powerful and flexible capabilities but needs to learn the
> > > complex
> > > > entity relationship[1]. The auxiliary SQL statements are super handy
> > and
> > > > can resolve most problems, but they offer limited features.
> > > >
> > > > I can see almost all the mature systems support both of them. I think
> > it
> > > > also makes sense to support both of them in Flink. Considering they
> > > > are orthogonal and information schema requires more complex design
> and
> > > > discussion, it deserves a separate FLIP. Sergey, are you willing to
> > > > contribute this FLIP?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html
> > > >
> > > >
> > > > On Fri, 24 Feb 2023 at 22:43, Ran Tao  wrote:
> > > >
> > > > > Thanks John.
> > > > >
> > > > > It seems that most people prefer the information_schema
> > implementation.
> > > > > information_schema does have more benefits (however, the show
> > operation
> > > > is
> > > > > also an option and supplement).
> > > > > Otherwise, the sql syntax and keywords may be changed frequently.
> > > > > Of course, it will be more complicated than the extension of the
> show
> > > > > operation.
> > > > >  It is necessary to design various tables in information_schema,
> > which
> > > > may
> > > > > take a period of effort.
> > > > >
> > > > > I will try to design the information_schema and integrate it with
> > > flink.
> > > > > This may be a relatively big feature for me. I hope you guys can
> give
> > > > > comments and opinions later.
> > > > > Thank you all.
> > > > >
> > > > > Best Regards,
> > > > > Ran Tao
> > > > >
> > > > >
> > > > > John Roesler  wrote on Fri, Feb 24, 2023 at 21:53:
> > > > >
> > > > > > 

Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements

2023-02-27 Thread Ran Tao
Hi Jing, I think we can support it; it's not a very difficult job.
Thank you.

Best Regards,
Ran Tao


Jing Ge  wrote on Tue, Feb 28, 2023 at 06:04:

> Hi Ran,
>
> It would be great if we could support ILIKE [1]. If you figure out that the
> effort is bigger than you expected, you might want to create a ticket for
> beginners(?) who are interested in Flink development to make a further
> contribution. WDYT?
>
> Best regards,
> Jing
>
>
> [1] https://clickhouse.com/docs/en/sql-reference/statements/show/
>
> On Mon, Feb 27, 2023 at 3:47 AM Ran Tao  wrote:
>
> > Hi, Jing. Thanks.
> >
> > @about ILIKE: from my survey of some popular engines, I found that only
> > Snowflake has this syntax in SHOW with filtering.
> > Do we need to support it? If yes, then some of the current SHOW operations
> > need to be addressed as well.
> > @about ShowOperation with LIKE: it's a good idea. Yes, two parameters for
> > the constructor can work. Thanks for your advice.
> >
> >
> > Best Regards,
> > Ran Tao
> >
> >
> > Jing Ge  wrote on Mon, Feb 27, 2023 at 06:29:
> >
> > > Hi,
> > >
> > > @Aitozi
> > >
> > > This is exactly why LoD has been introduced: to avoid exposing internal
> > > structure(2nd and lower level API).
> > >
> > > @Jark
> > >
> > > IMHO, there is no conflict between LoD and "High power-to-weight ratio"
> > > with the given example, List.subList() returns List interface itself,
> no
> > > internal or further interface has been exposed. After offering
> > > tEvn.getCatalog(), "all" methods in Catalog Interface have been
> provided
> > by
> > > TableEnvironment(via getCatalog()). From user's perspective and
> > maintenance
> > > perspective there is no/less difference between providing them directly
> > via
> > > TableEnvironment or via getCatalog(). They are all exposed. Using
> > > getCatalog() will reduce the number of boring wrapper methods, but on
> the
> > > other hand not every method in Catalog needs to be exposed, so the
> number
> > > of wrapper methods would be limited/less, if we didn't expose Catalog.
> > > Nevertheless, since we already offered getCatalog(), it makes sense to
> > > continue using it. The downside is the learning effort for users - they
> > > have to know that listDatabases() is hidden in Catalog, go to another
> > > haystack and then find the needle in there.
> > >
> > > +1 for Information schema with a different FLIP. From a design
> > perspective,
> > > information schema should be the first choice for most cases and easy
> to
> > > use. Catalog, on the other hand, will be more powerful and offer more
> > > advanced features.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Sat, Feb 25, 2023 at 3:57 PM Jark Wu  wrote:
> > >
> > > > Hi Sergey,
> > > >
> > > > I think INFORMATION_SCHEMA is a very interesting idea, and I hope we
> > can
> > > > support it. However, it doesn't conflict with the idea of auxiliary
> > > > statements. I can see different benefits of them. The information
> > schema
> > > > provides powerful and flexible capabilities but needs to learn the
> > > complex
> > > > entity relationship[1]. The auxiliary SQL statements are super handy
> > and
> > > > can resolve most problems, but they offer limited features.
> > > >
> > > > I can see almost all the mature systems support both of them. I think
> > it
> > > > also makes sense to support both of them in Flink. Considering they
> > > > are orthogonal and information schema requires more complex design
> and
> > > > discussion, it deserves a separate FLIP. Sergey, are you willing to
> > > > contribute this FLIP?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html
> > > >
> > > >
> > > > On Fri, 24 Feb 2023 at 22:43, Ran Tao  wrote:
> > > >
> > > > > Thanks John.
> > > > >
> > > > > It seems that most people prefer the information_schema
> > implementation.
> > > > > information_schema does have more benefits (however, the show
> > operation
> > > > is
> > > > > also an option and supplement).
> > > > > Otherwise, the sql syntax and keywords may be changed frequently.
> > > > > Of course, it will be more complicated than the extension of the
> show
> > > > > operation.
> > > > >  It is necessary to design various tables in information_schema,
> > which
> > > > may
> > > > > take a period of effort.
> > > > >
> > > > > I will try to design the information_schema and integrate it with
> > > flink.
> > > > > This may be a relatively big feature for me. I hope you guys can
> give
> > > > > comments and opinions later.
> > > > > Thank you all.
> > > > >
> > > > > Best Regards,
> > > > > Ran Tao
> > > > >
> > > > >
> > > > > John Roesler  wrote on Fri, Feb 24, 2023 at 21:53:
> > > > >
> > > > > > Hello Ran,
> > > > > >
> > > > > > Thanks for the FLIP!
> > > > > >
> > > > > > Do you mind if we revisit the topic of doing this by adding an
> > > > > Information
> > > > > > schema? The SHOW approach requires modifying the parser/language
> > 

[jira] [Created] (FLINK-31248) Improve documentation for append-only table

2023-02-27 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31248:


 Summary: Improve documentation for append-only table
 Key: FLINK-31248
 URL: https://issues.apache.org/jira/browse/FLINK-31248
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] SourceCoordinator and ExternallyInducedSourceReader do not work well together

2023-02-27 Thread Becket Qin
Hi Ming,

I am not sure if I fully understand what you want. It seems what you are
looking for is to have a checkpoint triggered at a customized timing which
aligns with some semantic. This is not what the current checkpoint in Flink
was designed for. I think the basic idea of checkpoint is to just take a
snapshot of the current state, so we can restore to that state in case of
failure. This is completely orthogonal to the data semantic.

Even with the ExternallyInducedSourceReader, the checkpoint is still
triggered by the JM. It is just the effective checkpoint barrier message (a
custom message in this case) will not be sent by the JM, but by the
external source storage. This helps when the external source storage needs
its own internal state to be aligned with the state of the Flink
SourceReader. For example, if the external source storage can only seek at
some bulk boundary, then it might wait until the current bulk to finish
before it sends the custom checkpoint barrier to the SourceReader.
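
To make the bulk-boundary example concrete, here is a standalone sketch of the decision such a reader makes (it only mirrors the role of ExternallyInducedSourceReader's shouldTriggerCheckpoint hook; it is not a full SourceReader implementation):

{code:java}
import java.util.Optional;

/** Illustration: only release a checkpoint once the current bulk has been fully consumed. */
public class BulkBoundaryCheckpointDecider {

    private Long requestedCheckpointId; // set when the external storage asks for a checkpoint
    private boolean currentBulkFinished;

    public void onCheckpointRequested(long checkpointId) {
        this.requestedCheckpointId = checkpointId;
    }

    public void onBulkFinished() {
        this.currentBulkFinished = true;
    }

    /** Mirrors the role of shouldTriggerCheckpoint(): empty means "not yet". */
    public Optional<Long> shouldTriggerCheckpoint() {
        if (requestedCheckpointId != null && currentBulkFinished) {
            Optional<Long> result = Optional.of(requestedCheckpointId);
            requestedCheckpointId = null;
            currentBulkFinished = false;
            return result;
        }
        return Optional.empty();
    }
}
{code}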

 Considering this scenario, if the data we want has not been produced yet,
> but the *SourceCoordinator* receives the *checkpoint* message, it will
> directly make a *checkpoint*, and the *ExternallyInducedSource* will not
> make a *checkpoint* immediately after receiving the *checkpoint*, but
> continues to wait for a new split. Even if a new split is generated, due to
> the behavior of closing *gateway* in *FLINK-28606*, the new split cannot be
> assigned to the *Source*, resulting in a deadlock (or forced to wait for
> checkpoint to time out).


In this case, the source reader should not "wait" for the splits that are
not included in this checkpoint. These splits should be a part of the next
checkpoint. It would be the Sink's responsibility to ensure the output is
committed in a way that aligns with the user semantic.

That said, I agree it might be useful in some cases if users can decided
the checkpoint triggering timing. But that will be a new feature which
needs some careful design.

Thanks,

Jiangjie (Becket) Qin


On Mon, Feb 27, 2023 at 8:35 PM ming li  wrote:

> Hi, dev,
>
> We recently used *SourceCoordinator* and *ExternallyInducedSource* to work
> together on some file type connectors to fulfill some requirements, but we
> found that these two interfaces do not work well together.
>
> *SourceCoordinator* (FLINK-15101) and *ExternallyInducedSource*
> (FLINK-20270) were introduced in Flip27. *SourceCoordinator* is responsible
> for running *SplitEnumerator* and coordinating the allocation of *Split*.
> *ExternallyInducedSource* allows us to delay making a *checkpoint* in
> Source or make a *checkpoint* at specified data. This works fine with
> connectors like *Kafka*.
>
> But in some connectors (such as hive connector), the split is completely
> allocated by the *SourceCoordinator*, and after the consumption is
> completed, it needs to wait for the allocation of the next batch of splits
> (it is not like kafka that continuously consumes the same split). In
> FLINK-28606, we introduced another mechanism: the *OperatorCoordinator* is
> not allowed to send *OperatorEvents* to the *Operator* before the
> *Operator's* checkpoint is completed.
>
> Considering this scenario, if the data we want has not been produced yet,
> but the *SourceCoordinator* receives the *checkpoint* message, it will
> directly make a *checkpoint*, and the *ExternallyInducedSource* will not
> make a *checkpoint* immediately after receiving the *checkpoint*, but
> continues to wait for a new split. Even if a new split is generated, due to
> the behavior of closing *gateway* in *FLINK-28606*, the new split cannot be
> assigned to the *Source*, resulting in a deadlock (or forced to wait for
> checkpoint to time out).
>
> So should we also add a mechanism similar to *ExternallyInducedSource* in
> *OperatorCoordinator*: only make a checkpoint on *OperatorCoordinator* when
> *OperatorCoordinator* is ready, which allows us to delay making checkpoint?
>
> [1] https://issues.apache.org/jira/browse/FLINK-15101
> [2] https://issues.apache.org/jira/browse/FLINK-20270
> [3] https://issues.apache.org/jira/browse/FLINK-28606
>


[jira] [Created] (FLINK-31247) Support for fine-grained resource allocation (slot sharing groups) at the TableAPI level

2023-02-27 Thread Sergio Sainz (Jira)
Sergio Sainz created FLINK-31247:


 Summary: Support for fine-grained resource allocation (slot 
sharing groups) at the TableAPI level
 Key: FLINK-31247
 URL: https://issues.apache.org/jira/browse/FLINK-31247
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Coordination
Affects Versions: 1.16.1
Reporter: Sergio Sainz


Currently Flink allows for fine-grained resource allocation at the DataStream 
API level (please see 
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/finegrained_resource/).

 

This ticket is an enhancement request to support the same API at the Table 
level:

 

org.apache.flink.table.api.Table.setSlotSharingGroup(String ssg)

org.apache.flink.table.api.Table.setSlotSharingGroup(SlotSharingGroup ssg)
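
For comparison, the DataStream-level API that exists today, and what the proposed Table-level call might look like (the Table methods are only this ticket's proposal and do not exist yet):

{code:java}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingGroupExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Existing DataStream-level API: pin an operator to a named slot sharing group.
        DataStream<String> mapped =
                env.fromElements("a", "b", "c")
                        .map(String::toUpperCase)
                        .slotSharingGroup("ssg-a");

        // Proposed Table-level equivalent from this ticket (hypothetical, does not exist yet):
        // Table result = tableEnv.from("orders").select($("id"));
        // result.setSlotSharingGroup("ssg-a");
    }
}
{code}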

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements

2023-02-27 Thread Jing Ge
Hi Ran,

It would be great if we could support ILIKE [1]. If you figure out that the
effort is bigger than you expected, you might want to create a ticket for
beginners(?) who are interested in Flink development to make a further
contribution. WDYT?

Best regards,
Jing


[1] https://clickhouse.com/docs/en/sql-reference/statements/show/

On Mon, Feb 27, 2023 at 3:47 AM Ran Tao  wrote:

> Hi, Jing. Thanks.
>
> @about ILIKE: from my survey of some popular engines, I found that only
> Snowflake has this syntax in SHOW with filtering.
> Do we need to support it? If yes, then some of the current SHOW operations
> need to be addressed as well.
> @about ShowOperation with LIKE: it's a good idea. Yes, two parameters for
> the constructor can work. Thanks for your advice.
>
>
> Best Regards,
> Ran Tao
>
>
> Jing Ge  wrote on Mon, Feb 27, 2023 at 06:29:
>
> > Hi,
> >
> > @Aitozi
> >
> > This is exactly why LoD has been introduced: to avoid exposing internal
> > structure(2nd and lower level API).
> >
> > @Jark
> >
> > IMHO, there is no conflict between LoD and "High power-to-weight ratio"
> > with the given example, List.subList() returns List interface itself, no
> > internal or further interface has been exposed. After offering
> > tEvn.getCatalog(), "all" methods in Catalog Interface have been provided
> by
> > TableEnvironment(via getCatalog()). From user's perspective and
> maintenance
> > perspective there is no/less difference between providing them directly
> via
> > TableEnvironment or via getCatalog(). They are all exposed. Using
> > getCatalog() will reduce the number of boring wrapper methods, but on the
> > other hand not every method in Catalog needs to be exposed, so the number
> > of wrapper methods would be limited/less, if we didn't expose Catalog.
> > Nevertheless, since we already offered getCatalog(), it makes sense to
> > continue using it. The downside is the learning effort for users - they
> > have to know that listDatabases() is hidden in Catalog, go to another
> > haystack and then find the needle in there.
> >
> > +1 for Information schema with a different FLIP. From a design
> perspective,
> > information schema should be the first choice for most cases and easy to
> > use. Catalog, on the other hand, will be more powerful and offer more
> > advanced features.
> >
> > Best regards,
> > Jing
> >
> >
> > On Sat, Feb 25, 2023 at 3:57 PM Jark Wu  wrote:
> >
> > > Hi Sergey,
> > >
> > > I think INFORMATION_SCHEMA is a very interesting idea, and I hope we
> can
> > > support it. However, it doesn't conflict with the idea of auxiliary
> > > statements. I can see different benefits of them. The information
> schema
> > > provides powerful and flexible capabilities but needs to learn the
> > complex
> > > entity relationship[1]. The auxiliary SQL statements are super handy
> and
> > > can resolve most problems, but they offer limited features.
> > >
> > > I can see almost all the mature systems support both of them. I think
> it
> > > also makes sense to support both of them in Flink. Considering they
> > > are orthogonal and information schema requires more complex design and
> > > discussion, it deserves a separate FLIP. Sergey, are you willing to
> > > contribute this FLIP?
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> > >
> >
> https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html
> > >
> > >
> > > On Fri, 24 Feb 2023 at 22:43, Ran Tao  wrote:
> > >
> > > > Thanks John.
> > > >
> > > > It seems that most people prefer the information_schema
> implementation.
> > > > information_schema does have more benefits (however, the show
> operation
> > > is
> > > > also an option and supplement).
> > > > Otherwise, the sql syntax and keywords may be changed frequently.
> > > > Of course, it will be more complicated than the extension of the show
> > > > operation.
> > > >  It is necessary to design various tables in information_schema,
> which
> > > may
> > > > take a period of effort.
> > > >
> > > > I will try to design the information_schema and integrate it with
> > flink.
> > > > This may be a relatively big feature for me. I hope you guys can give
> > > > comments and opinions later.
> > > > Thank you all.
> > > >
> > > > Best Regards,
> > > > Ran Tao
> > > >
> > > >
> > > > John Roesler  wrote on Fri, Feb 24, 2023 at 21:53:
> > > >
> > > > > Hello Ran,
> > > > >
> > > > > Thanks for the FLIP!
> > > > >
> > > > > Do you mind if we revisit the topic of doing this by adding an
> > > > Information
> > > > > schema? The SHOW approach requires modifying the parser/language
> for
> > > > every
> > > > > gap we identify. On the flip side, an Information schema is much
> > easier
> > > > to
> > > > > discover and remember how to use, and the ability to run queries on
> > it
> > > is
> > > > > quite valuable for admins. It’s also better for Flink maintainers,
> > > > because
> > > > > the information tables’ schemas can be evolved over time just like
> > > > regular
> > > > > tables, whereas every 

[jira] [Created] (FLINK-31246) Remove PodTemplate description from the SpecChange message

2023-02-27 Thread Peter Vary (Jira)
Peter Vary created FLINK-31246:
--

 Summary: Remove PodTemplate description from the SpecChange message
 Key: FLINK-31246
 URL: https://issues.apache.org/jira/browse/FLINK-31246
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Peter Vary


Currently the Spec Change message contains the full PodTemplate twice.
This makes the message very large while adding very little useful
information.

We should abbreviate the message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Contributing a Google Cloud Pub/Sub Lite source and sink?

2023-02-27 Thread Daniel Collins
Hello Martijn,

Thanks for the redirect!

> The process for contributing a connector is to create a Connector FLIP
[1], which needs to be discussed and voted on in the Dev mailing list [2]

I don't mind doing this, but would like to get a read of the room before
doing so. If people object to it, then it probably doesn't make sense.

> One thing in particular is who can help with the maintenance of the
connector: will there be more volunteers who can help with bug fixes, new
features etc.

In this case, the product team is willing to handle feature requests and
bug reports for the connector, as we currently do for our beam and spark
connectors. I don't know if there is any mechanism for sending emails when
jira tickets with a specific tag are opened? But if there is I can ensure
that any tickets get routed to a member of my team to look at.

What would people's thoughts be about this?

-Daniel

On Mon, Feb 27, 2023 at 3:53 AM Martijn Visser 
wrote:

> Hi Daniel,
>
> Thanks for reaching out. Keep in mind that you weren't subscribed to the
> Flink Dev mailing list, I've just pushed this through the mailing list
> moderation.
>
> The process for contributing a connector is to create a Connector FLIP [1],
> which needs to be discussed and voted on in the Dev mailing list [2]. One
> thing in particular is who can help with the maintenance of the connector:
> will there be more volunteers who can help with bug fixes, new features
> etc. As we've seen with the current PubSub connector, that's already quite
> hard: it's currently lacking volunteers overall. I do recall a proposal to
> contribute a PubSub Lite connector a while back, but that ultimately was
> not followed through.
>
> Best regards,
>
> Martijn
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template
> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
>
> On Mon, Feb 27, 2023 at 9:44 AM Daniel Collins
> 
> wrote:
>
> > Hello flink devs,
> >
> > My name is Daniel, I'm the tech lead for Google's Pub/Sub Lite streaming
> > system, which is a lower cost streaming data system with semantics more
> > similar to OSS streaming solutions. I've authored a source and sink
> > connector for flink and load tested it at GiB/s scale- what would be the
> > process for contributing this to flink?
> >
> > I've opened a JIRA to do this
> > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-31229, if
> this
> > seems like a reasonable thing to do could someone assign it to me?
> >
> > The code for the connector currently lives here
> > https://github.com/googleapis/java-pubsublite-flink, I believe it is
> > following the FLIP-27 guidelines, but please let me know if I'm
> > implementing the wrong interfaces. Which repo and in what folder should I
> > move this code into?
> >
> > -Daniel
> >
>


Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-02-27 Thread David Morávek
Hi Weihua, I still need to dig into the details, but the overall sentiment
of this change sounds reasonable.

Best,
D.

On Mon, Feb 27, 2023 at 2:26 PM Zhanghao Chen 
wrote:

> Thanks for driving this topic. I think this FLIP could help clean up the
> codebase to make it easier to maintain. +1 on it.
>
> Best,
> Zhanghao Chen
> 
> From: Weihua Hu 
> Sent: Monday, February 27, 2023 20:40
> To: dev 
> Subject: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager
>
> Hi everyone,
>
> I would like to begin a discussion on FLIP-298: Unifying the Implementation
> of SlotManager[1]. There are currently two types of SlotManager in Flink:
> DeclarativeSlotManager and FineGrainedSlotManager. FineGrainedSlotManager
> should behave as DeclarativeSlotManager if the user does not configure the
> slot request profile.
>
> Therefore, this FLIP aims to unify the implementation of SlotManager in
> order to reduce maintenance costs.
>
> Looking forward to hearing from you.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
>
> Best,
> Weihua
>


Re: [DISCUSS] Deprecating GlobalAggregateManager

2023-02-27 Thread David Morávek
I think this makes sense, +1 from my side; as I wrote on the ticket, I'm
not aware of any other usages apart from the Kinesis connector, and we
already have a more feature-complete API that can replace the functionality
there.

Best,
D.

On Mon, Feb 27, 2023 at 2:44 PM Zhanghao Chen 
wrote:

> Hi dev,
>
> I'd like to open discussion on deprecating Global Aggregate Manager in
> favor of Operator Coordinator.
>
>
>   1.  Global Aggregate Manager is rarely used and can be replaced by
> Operator Coordinator. Global Aggregate Manager was introduced in [1]<
> https://issues.apache.org/jira/browse/FLINK-10886> to support event time
> synchronization across sources and more generally, coordination of parallel
> tasks. AFAIK, this was only used in the Kinesis source [2] for an early
> version of watermark alignment. Operator Coordinator, introduced in [3],
> provides a more powerful and elegant solution for that need and is part of
> the new source API standard.
>   2.  Global Aggregate Manager manages state in JobMaster object, causing
> problems for adaptive parallelism changes. It maintains a state (the
> accumulators field in JobMaster) in JM memory. The accumulator state
> content is defined in user code. In my company, a user stores task
> parallelism in the accumulator, assuming task parallelism never changes.
> However, this assumption is broken when using adaptive scheduler. See [4]
> for more details.
>
> Therefore, I think we should deprecate the use of Global Aggregate
> Manager, which can improve the maintainability of the Flink codebase
> without compromising its functionality. Looking forward to your opinions on
> this.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10886
> [2]
> https://github.com/apache/flink-connector-aws/blob/d0817fecdcaa53c4bf039761c2d1a16e8fb9f89b/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/JobManagerWatermarkTracker.java
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-SplitEnumerator
> [4] [FLINK-31245] Adaptive scheduler does not reset the state of
> GlobalAggregateManager when rescaling - ASF JIRA (apache.org)<
> https://issues.apache.org/jira/browse/FLINK-31245?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
> >
>
> Best,
> Zhanghao Chen
>


[DISCUSS] Deprecating GlobalAggregateManager

2023-02-27 Thread Zhanghao Chen
Hi dev,

I'd like to open discussion on deprecating Global Aggregate Manager in favor of 
Operator Coordinator.


  1.  Global Aggregate Manager is rarely used and can be replaced by Operator 
Coordinator. Global Aggregate Manager was introduced in 
[1] to support event time 
synchronization across sources and more generally, coordination of parallel 
tasks. AFAIK, this was only used in the Kinesis source [2] for an early version 
of watermark alignment. Operator Coordinator, introduced in [3], provides a 
more powerful and elegant solution for that need and is part of the new source 
API standard.
  2.  Global Aggregate Manager manages state in the JobMaster object, causing 
problems for adaptive parallelism changes. It maintains a state (the 
accumulators field in JobMaster) in JM memory, and the accumulator state 
content is defined in user code. In my company, a user stores task parallelism 
in the accumulator, assuming the task parallelism never changes. However, this 
assumption is broken when using the adaptive scheduler (a simplified sketch 
follows below). See [4] for more details.
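
To illustrate the second point, a simplified sketch (generic Java, not the actual GlobalAggregateManager API) of the kind of user-defined aggregate state that breaks when parallelism changes:

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Sketch of the pitfall: JM-side state that bakes in the job's parallelism. */
public class ParallelismTrackingAggregate {

    // Lives in JM memory for the lifetime of the job; it is not reset when the
    // adaptive scheduler rescales, so "expectedParallelism" can go stale.
    private final Map<Integer, Long> progressPerSubtask = new HashMap<>();
    private int expectedParallelism = -1;

    /** Called from parallel tasks; the first caller fixes the expected parallelism. */
    public synchronized boolean allSubtasksReported(int subtaskIndex, int parallelism, long progress) {
        if (expectedParallelism < 0) {
            expectedParallelism = parallelism; // assumption: parallelism never changes
        }
        progressPerSubtask.put(subtaskIndex, progress);
        // After a downscale, fewer subtasks exist than expectedParallelism, so this
        // condition may never become true again.
        return progressPerSubtask.size() >= expectedParallelism;
    }
}
{code}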

Therefore, I think we should deprecate the use of Global Aggregate Manager, 
which can improve the maintainability of the Flink codebase without 
compromising its functionality. Looking forward to your opinions on this.

[1] https://issues.apache.org/jira/browse/FLINK-10886
[2] 
https://github.com/apache/flink-connector-aws/blob/d0817fecdcaa53c4bf039761c2d1a16e8fb9f89b/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/JobManagerWatermarkTracker.java
[3] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-SplitEnumerator
[4] [FLINK-31245] Adaptive scheduler does not reset the state of 
GlobalAggregateManager when rescaling - ASF JIRA 
(apache.org)

Best,
Zhanghao Chen


Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-02-27 Thread Zhanghao Chen
Thanks for driving this topic. I think this FLIP could help clean up the 
codebase to make it easier to maintain. +1 on it.

Best,
Zhanghao Chen

From: Weihua Hu 
Sent: Monday, February 27, 2023 20:40
To: dev 
Subject: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

Hi everyone,

I would like to begin a discussion on FLIP-298: Unifying the Implementation
of SlotManager[1]. There are currently two types of SlotManager in Flink:
DeclarativeSlotManager and FineGrainedSlotManager. FineGrainedSlotManager
should behave as DeclarativeSlotManager if the user does not configure the
slot request profile.

Therefore, this FLIP aims to unify the implementation of SlotManager in
order to reduce maintenance costs.

Looking forward to hearing from you.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager

Best,
Weihua


Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements

2023-02-27 Thread Sergey Nuyanzin
Hi Jark,

thanks for your comment.

>Considering they
> are orthogonal and information schema requires more complex design and
> discussion, it deserves a separate FLIP
I'm ok with a separate FLIP for INFORMATION_SCHEMA.

>Sergey, are you willing to contribute this FLIP?
Seems I need to have more research done for that.
I would try to help/contribute here


On Mon, Feb 27, 2023 at 3:46 AM Ran Tao  wrote:

> Hi, Jing. Thanks.
>
> @about ILIKE: from my survey of some popular engines, I found that only
> Snowflake has this syntax in SHOW with filtering.
> Do we need to support it? If yes, then some of the current SHOW operations
> need to be addressed as well.
> @about ShowOperation with LIKE: it's a good idea. Yes, two parameters for
> the constructor can work. Thanks for your advice.
>
>
> Best Regards,
> Ran Tao
>
>
> Jing Ge  wrote on Mon, Feb 27, 2023 at 06:29:
>
> > Hi,
> >
> > @Aitozi
> >
> > This is exactly why LoD has been introduced: to avoid exposing internal
> > structure(2nd and lower level API).
> >
> > @Jark
> >
> > IMHO, there is no conflict between LoD and "High power-to-weight ratio"
> > with the given example, List.subList() returns List interface itself, no
> > internal or further interface has been exposed. After offering
> > tEvn.getCatalog(), "all" methods in Catalog Interface have been provided
> by
> > TableEnvironment(via getCatalog()). From user's perspective and
> maintenance
> > perspective there is no/less difference between providing them directly
> via
> > TableEnvironment or via getCatalog(). They are all exposed. Using
> > getCatalog() will reduce the number of boring wrapper methods, but on the
> > other hand not every method in Catalog needs to be exposed, so the number
> > of wrapper methods would be limited/less, if we didn't expose Catalog.
> > Nevertheless, since we already offered getCatalog(), it makes sense to
> > continue using it. The downside is the learning effort for users - they
> > have to know that listDatabases() is hidden in Catalog, go to another
> > haystack and then find the needle in there.
> >
> > +1 for Information schema with a different FLIP. From a design
> perspective,
> > information schema should be the first choice for most cases and easy to
> > use. Catalog, on the other hand, will be more powerful and offer more
> > advanced features.
> >
> > Best regards,
> > Jing
> >
> >
> > On Sat, Feb 25, 2023 at 3:57 PM Jark Wu  wrote:
> >
> > > Hi Sergey,
> > >
> > > I think INFORMATION_SCHEMA is a very interesting idea, and I hope we
> can
> > > support it. However, it doesn't conflict with the idea of auxiliary
> > > statements. I can see different benefits of them. The information
> schema
> > > provides powerful and flexible capabilities but needs to learn the
> > complex
> > > entity relationship[1]. The auxiliary SQL statements are super handy
> and
> > > can resolve most problems, but they offer limited features.
> > >
> > > I can see almost all the mature systems support both of them. I think
> it
> > > also makes sense to support both of them in Flink. Considering they
> > > are orthogonal and information schema requires more complex design and
> > > discussion, it deserves a separate FLIP. Sergey, are you willing to
> > > contribute this FLIP?
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> > >
> >
> https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html
> > >
> > >
> > > On Fri, 24 Feb 2023 at 22:43, Ran Tao  wrote:
> > >
> > > > Thanks John.
> > > >
> > > > It seems that most people prefer the information_schema
> implementation.
> > > > information_schema does have more benefits (however, the show
> operation
> > > is
> > > > also an option and supplement).
> > > > Otherwise, the sql syntax and keywords may be changed frequently.
> > > > Of course, it will be more complicated than the extension of the show
> > > > operation.
> > > >  It is necessary to design various tables in information_schema,
> which
> > > may
> > > > take a period of effort.
> > > >
> > > > I will try to design the information_schema and integrate it with
> > flink.
> > > > This may be a relatively big feature for me. I hope you guys can give
> > > > comments and opinions later.
> > > > Thank you all.
> > > >
> > > > Best Regards,
> > > > Ran Tao
> > > >
> > > >
> > > > John Roesler  wrote on Fri, Feb 24, 2023 at 21:53:
> > > >
> > > > > Hello Ran,
> > > > >
> > > > > Thanks for the FLIP!
> > > > >
> > > > > Do you mind if we revisit the topic of doing this by adding an
> > > > Information
> > > > > schema? The SHOW approach requires modifying the parser/language
> for
> > > > every
> > > > > gap we identify. On the flip side, an Information schema is much
> > easier
> > > > to
> > > > > discover and remember how to use, and the ability to run queries on
> > it
> > > is
> > > > > quite valuable for admins. It’s also better for Flink maintainers,
> > > > because
> > > > > the information tables’ schemas can be evolved over time just like
> > > > regular
> > > > > 

[jira] [Created] (FLINK-31245) Adaptive scheduler does not reset the state of GlobalAggregateManager when rescaling

2023-02-27 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-31245:
-

 Summary: Adaptive scheduler does not reset the state of 
GlobalAggregateManager when rescaling
 Key: FLINK-31245
 URL: https://issues.apache.org/jira/browse/FLINK-31245
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.16.1
Reporter: Zhanghao Chen
 Fix For: 1.18.0


*Problem*

GlobalAggregateManager is used to share state amongst parallel tasks in a job 
and thus coordinate their execution. It maintains a state (the _accumulators_ 
field in JobMaster) in JM memory, and the accumulator state content is defined 
in user code. In my company, a user stores task parallelism in the accumulator, 
assuming the task parallelism never changes. However, this assumption is broken 
when using the adaptive scheduler.

*Possible Solutions*
 # Mark GlobalAggregateManager as deprecated. It seems that the operator 
coordinator can completely replace GlobalAggregateManager and is a more elegant 
solution. Therefore, it is fine to deprecate GlobalAggregateManager and leave 
this issue as-is. If that's the case, we can open another ticket for doing that.
 # If we decide to continue supporting GlobalAggregateManager, then we need to 
reset the state when rescaling.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31244) OffHeapUnsafeMemorySegmentTest.testCallCleanerOnceOnConcurrentFree prints IllegalStateException

2023-02-27 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-31244:
-

 Summary: 
OffHeapUnsafeMemorySegmentTest.testCallCleanerOnceOnConcurrentFree prints 
IllegalStateException
 Key: FLINK-31244
 URL: https://issues.apache.org/jira/browse/FLINK-31244
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network, Tests
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Matthias Pohl


We're observing strange IllegalStateException stacktrace output in 
{{OffHeapUnsafeMemorySegmentTest.testCallCleanerOnceOnConcurrentFree}} in CI 
like:

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46283=logs=4d4a0d10-fca2-5507-8eed-c07f0bdf4887=7b25afdf-cc6c-566f-5459-359dc2585798=5584]
 
{code:java}
Feb 18 03:58:47 [INFO] Running 
org.apache.flink.core.memory.OffHeapUnsafeMemorySegmentTest
Exception in thread "Thread-13" java.lang.IllegalStateException: MemorySegment 
can be freed only once!
    at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-15" java.lang.IllegalStateException: MemorySegment 
can be freed only once!
    at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-17" java.lang.IllegalStateException: MemorySegment 
can be freed only once!
    at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244)
    at java.lang.Thread.run(Thread.java:748){code}
This is caused by FLINK-21798. The corresponding system property is enabled as 
part of the CI run (see 
[tools/ci/test_controller.sh:108|https://github.com/apache/flink/blob/7e37d59f834bca805f5fbee99db87eb909d1814f/tools/ci/test_controller.sh#L108]),
 which causes the {{IllegalStateException}} to be thrown.

AFAIU, the intention of this test was to make sure that the cleaner logic is 
only called once even if the free method is called multiple times. 
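
For reference, the intended contract boils down to a guard like this (plain-Java 
sketch, not the actual MemorySegment code): the cleaner must run exactly once, no 
matter how many threads race on free().
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of the "free exactly once" contract the test is supposed to verify. */
final class FreeOnceGuard {

    private final AtomicBoolean freed = new AtomicBoolean(false);
    private final Runnable cleaner;

    FreeOnceGuard(Runnable cleaner) {
        this.cleaner = cleaner;
    }

    void free() {
        // Only the first caller wins the CAS and runs the cleaner. Later or concurrent
        // callers skip it -- or, with the multiple-free check from FLINK-21798 enabled,
        // they may fail with an IllegalStateException as seen in the log above.
        if (freed.compareAndSet(false, true)) {
            cleaner.run();
        }
    }
}
{code}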



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-02-27 Thread Weihua Hu
Hi everyone,

I would like to begin a discussion on FLIP-298: Unifying the Implementation
of SlotManager[1]. There are currently two types of SlotManager in Flink:
DeclarativeSlotManager and FineGrainedSlotManager. FineGrainedSlotManager
should behave as DeclarativeSlotManager if the user does not configure the
slot request profile.

Therefore, this FLIP aims to unify the implementation of SlotManager in
order to reduce maintenance costs.

Looking forward to hearing from you.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager

Best,
Weihua


[DISCUSS] SourceCoordinator and ExternallyInducedSourceReader do not work well together

2023-02-27 Thread ming li
Hi, dev,

We recently used *SourceCoordinator* and *ExternallyInducedSource* to work
together on some file type connectors to fulfill some requirements, but we
found that these two interfaces do not work well together.

*SourceCoordinator* (FLINK-15101) and *ExternallyInducedSource*
(FLINK-20270) were introduced in FLIP-27. The *SourceCoordinator* is responsible
for running the *SplitEnumerator* and coordinating the allocation of *Splits*.
*ExternallyInducedSource* allows us to delay making a *checkpoint* in the
Source, or to make a *checkpoint* at a specified position in the data. This works
fine with connectors like *Kafka*.

But in some connectors (such as the Hive connector), splits are entirely
allocated by the *SourceCoordinator*, and after a split has been consumed the
reader needs to wait for the allocation of the next batch of splits
(unlike Kafka, where the same split is consumed continuously). In
FLINK-28606, we introduced another mechanism: the *OperatorCoordinator* is
not allowed to send *OperatorEvents* to the *Operator* before the
*Operator's* checkpoint is completed.

Considering this scenario: if the data we want has not been produced yet
but the *SourceCoordinator* receives the *checkpoint* message, it will
make a *checkpoint* directly, while the *ExternallyInducedSource* will not
make a *checkpoint* immediately after receiving the request and instead
continues to wait for a new split. Even if a new split is generated, due to
the *gateway*-closing behavior introduced in *FLINK-28606* the new split cannot be
assigned to the *Source*, resulting in a deadlock (or in waiting until the
checkpoint times out).

So should we also add a mechanism similar to *ExternallyInducedSource* to the
*OperatorCoordinator*: only make a checkpoint on the *OperatorCoordinator* when
the *OperatorCoordinator* is ready, which would allow us to delay making the checkpoint?

[1] https://issues.apache.org/jira/browse/FLINK-15101
[2] https://issues.apache.org/jira/browse/FLINK-20270
[3] https://issues.apache.org/jira/browse/FLINK-28606
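
For what it's worth, the behaviour we are asking for on the coordinator side could
look roughly like this (a conceptual sketch in plain Java, not real Flink interfaces):
the coordinator's checkpoint future is only completed once it has reached a consistent
point, e.g. after the outstanding splits have been handed out.

```
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Conceptual sketch: park the coordinator checkpoint until the coordinator is ready. */
final class DeferredCoordinatorCheckpoint {

    private final AtomicReference<CompletableFuture<byte[]>> pending = new AtomicReference<>();

    /** The framework asks the coordinator to take part in a checkpoint. */
    CompletableFuture<byte[]> checkpointCoordinator() {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        pending.set(result);
        // Not completed yet: closing the event gateway (FLINK-28606) could be
        // postponed until markReady() is called, so splits can still be assigned.
        return result;
    }

    /** The coordinator signals that it reached a consistent point (e.g. splits assigned). */
    void markReady(byte[] coordinatorState) {
        CompletableFuture<byte[]> parked = pending.getAndSet(null);
        if (parked != null) {
            parked.complete(coordinatorState);
        }
    }
}
```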


Re: [VOTE] Flink minor version support policy for old releases

2023-02-27 Thread Jing Ge
+1 (non-binding)

BTW, should we follow the content style [1] to describe the new rule using
1.2.x, 1.1.y, 1.1.z?

[1] https://flink.apache.org/downloads/#update-policy-for-old-releases

Best regards,
Jing

On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl
 wrote:

> Thanks, Danny. Sounds good to me.
>
> +1 (non-binding)
>
> On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer 
> wrote:
>
> > I am starting a vote to update the "Update Policy for old releases" [1]
> to
> > include additional bugfix support for end of life versions.
> >
> > As per the discussion thread [2], the change we are voting on is:
> > - Support policy: updated to include: "Upon release of a new Flink minor
> > version, the community will perform one final bugfix release for resolved
> > critical/blocker issues in the Flink minor version losing support."
> > - Release process: add a step to start the discussion thread for the
> final
> > patch version, if there are resolved critical/blocking issues to flush.
> >
> > Voting schema: since our bylaws [3] do not cover this particular
> scenario,
> > and releases require PMC involvement, we will use a consensus vote with
> PMC
> > binding votes.
> >
> > Thanks,
> > Danny
> >
> > [1]
> https://flink.apache.org/downloads.html#update-policy-for-old-releases
> > [2] https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv
> > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> >
>


Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL

2023-02-27 Thread kui yuan
Hi Timo,

Thanks for your advice. I totally agree with your suggested naming
convention; I will rename these options and update the FLIP later, thanks
very much.

In our internal implementation we already put these options inside
`FactoryUtil`, just as you expect. We have also taken the changes to the
CompiledPlan into account and have packaged these options appropriately to
minimize intrusiveness and ensure compatibility with the
`WatermarkPushDownSpec`.
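
For reference, with the renamed keys a DDL would look roughly like this (a sketch
based on the `scan.watermark.*` names quoted below; the exact spelling may still
change in the updated FLIP, and the connector options are placeholders):

```
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class WatermarkOptionsSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Remaining Kafka connector options (topic, bootstrap servers, format) omitted.
        tEnv.executeSql(
                "CREATE TABLE user_actions (\n"
                        + "  user_action_time TIMESTAMP(3),\n"
                        + "  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"
                        + "  'scan.watermark.emit.strategy' = 'on-periodic',\n"
                        + "  'scan.idle-timeout' = '1min',\n"
                        + "  'scan.watermark.alignment.group' = 'alignment-group-1',\n"
                        + "  'scan.watermark.alignment.max-drift' = '1min'\n"
                        + ")");
    }
}
```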

> A hint to the implementation: I would suggest that we add those options
> to `FactoryUtil`. All cross-connector options should end up there.


> Please also consider the changes to the CompiledPlan in your FLIP. This
> change has implications on the JSON format as watermark strategy of
> ExecNode becomes more complex, see WatermarkPushDownSpec

Best
Kui Yuan

Timo Walther  于2023年2月27日周一 18:05写道:

> Hi Kui Yuan,
>
> thanks for working on this FLIP. Let me also give some comments about
> the proposed changes.
>
> I support the direction of this FLIP about handling these
> watermark-specific properties through options and /*+OPTIONS(...) */ hints.
>
> Regarding naming, I would like to keep the options in sync with existing
> options:
>
>  > 'watermark.emit.strategy'='ON_EVENT'
>
> Let's use lower case (e.g. `on-event`) that matches with properties like
> sink.partitioner [1] or sink.delivery-guarantee [2].
>
>  > 'source.idle-timeout'='1min'
>
> According to FLIP-122 [3], we want to prefix all scan-source related
> properties with `scan.*`. This clearly includes idle-timeout and
> actually also watermark strategies which don't apply for lookup sources.
>
> Summarizing the comments above, we should use the following options:
>
> 'scan.watermark.emit.strategy'='on-event',
> 'scan.watermark.emit.on-event.gap'='1',
> 'scan.idle-timeout'='1min',
> 'scan.watermark.alignment.group'='alignment-group-1',
> 'scan.watermark.alignment.max-drift'='1min',
> 'scan.watermark.alignment.update-interval'='1s'
>
> I know that this makes the keys even longer, but given that those
> options are for power users this should be acceptable. It also clearly
> indicates which options are for sinks, scans, and lookups. This
> potentially also helps in allow lists.
>
> A hint to the implementation: I would suggest that we add those options
> to `FactoryUtil`. All cross-connector options should end up there.
>
> Please also consider the changes to the CompiledPlan in your FLIP. This
> change has implications on the JSON format as watermark strategy of
> ExecNode becomes more complex, see WatermarkPushDownSpec [4].
>
> Regards,
> Timo
>
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee
> [3]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
> [4]
>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java
>
>
> On 24.02.23 04:55, kui yuan wrote:
> > Hi all,
> >
> > I have updated the flip according to the discussion, and we will extend
> the
> > watermark-related features with both table options and 'OPTIONS' hint,
> like
> > this:
> >
> > ```
> > -- configure in table options
> > CREATE TABLE user_actions (
> >...
> >user_action_time TIMESTAMP(3),
> >WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5'
> SECOND
> > ) WITH (
> >'watermark.emit.strategy'='ON_PERIODIC',
> >...
> > );
> >
> > -- use 'OPTIONS' hint
> > select ... from source_table /*+ OPTIONS('watermark.emit.strategy'=
> > 'ON_PERIODIC') */
> > ```
> >
> > Does everybody have any other questions?
> >
> > Best
> > Kui Yuan
> >
> > kui yuan  于2023年2月23日周四 20:05写道:
> >
> >> Hi all,
> >>
> >> Thanks for all suggestions.
> >>
> >> We will extend the watermark-related features in SQL layer with dynamic
> >> table options and 'OPTIONS' hint, just as everyone expects. I will
> modify
> >> Flip-296 as discussed.
> >>
> >> @Martijn As far as I know, there is no hint interface in the table API,
> >> so we can't use hint in table API directly. if we need to extend the
> hint
> >> interface in the table API, maybe we need another flip. However, if we
> >> extend the watermark-related features in the dynamic table options,
> maybe
> >> we are able to use them indirectly in the table API like this[1]:
> >>
> >> ```
> >> // register a table named "Orders"
> >> tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, product STRING,
> >> amount INT) WITH ('watermark.emit.strategy'='ON_EVENT'...)");
> >> ```
> >>
> >> [1]
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/
> >>
> >> Best
> >> Kui Yuan
> >>
> >> Yun Tang  于2023年2月23日周四 17:46写道:
> >>
> >>> Thanks for the warm discussions!
> >>>
> >>> I 

Re: [Discuss] Some questions on flink-table-store micro benchmark

2023-02-27 Thread Yun Tang
Thanks for the response. I think we can also incorporate the performance 
regression monitoring with Slack in the feature, which Yanfei ever did [1].


[1] https://lists.apache.org/thread/b75j2sgzhf25298p4982fy9tzdjrvght

Best
Yun Tang

From: Jingsong Li 
Sent: Monday, February 27, 2023 18:02
To: dev@flink.apache.org 
Subject: Re: [Discuss] Some questions on flink-table-store micro benchmark

Hi Yun and Shammon,

> track the performance changes of the micro benchmark

I think we can create a github action for this. Print results every day.

Best,
Jingsong

On Mon, Feb 27, 2023 at 5:59 PM Shammon FY  wrote:
>
> Hi jingsong
>
> Getting rid of JMH is a good idea. For the second point, how can we track
> the performance changes of the micro benchmark? What do you think?
>
> Best,
> Shammon
>
> On Mon, Feb 27, 2023 at 10:57 AM Jingsong Li  wrote:
>
> > Thanks Yun.
> >
> > Another way is we can get rid of JMH, something like Spark
> > `org.apache.spark.benchmark.Benchmark` can replace JMH.
> >
> > Best,
> > Jingsong
> >
> > On Mon, Feb 27, 2023 at 1:24 AM Yun Tang  wrote:
> > >
> > > Hi dev,
> > >
> > > I just noticed that flink-table-store had introduced the micro benchmark
> > module [1] to test the basic performance. And I have two questions here.
> > > First of all, we might not be able to keep the micro benchmark, which is
> > based on JMH, in the main repo of flink-table-store. This is because JMH is
> > under GPL license, which is not compliant with Apache-2 license. That's why
> > Flink moved the flink-benchmark out [2].
> > >
> > > Secondly, I try to run the micro benchmark locally but it seems failed,
> > I just wonder can we make the flink-table-store's micro benchmark could be
> > periodically executed just as flink-benchmarks did on
> > http://codespeed.dak8s.net:8080? Please correct me if such daily
> > benchmark has been set up.
> > > Moreover, maybe we can also consider to integrate this micro benchmark
> > notification in the Slack channel just as what flink-benchmarks did.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-29636
> > > [2] https://issues.apache.org/jira/browse/FLINK-2973
> > >
> > > Best
> > > Yun Tang
> >


Re: [VOTE] Flink minor version support policy for old releases

2023-02-27 Thread Matthias Pohl
Thanks, Danny. Sounds good to me.

+1 (non-binding)

On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer 
wrote:

> I am starting a vote to update the "Update Policy for old releases" [1] to
> include additional bugfix support for end of life versions.
>
> As per the discussion thread [2], the change we are voting on is:
> - Support policy: updated to include: "Upon release of a new Flink minor
> version, the community will perform one final bugfix release for resolved
> critical/blocker issues in the Flink minor version losing support."
> - Release process: add a step to start the discussion thread for the final
> patch version, if there are resolved critical/blocking issues to flush.
>
> Voting schema: since our bylaws [3] do not cover this particular scenario,
> and releases require PMC involvement, we will use a consensus vote with PMC
> binding votes.
>
> Thanks,
> Danny
>
> [1] https://flink.apache.org/downloads.html#update-policy-for-old-releases
> [2] https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv
> [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
>


[jira] [Created] (FLINK-31243) KryoSerializer when loaded from user code classloader cannot load Scala extensions from app classloader

2023-02-27 Thread Amit Gurdasani (Jira)
Amit Gurdasani created FLINK-31243:
--

 Summary: KryoSerializer when loaded from user code classloader 
cannot load Scala extensions from app classloader
 Key: FLINK-31243
 URL: https://issues.apache.org/jira/browse/FLINK-31243
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.16.1, 1.15.3
 Environment: OS: Amazon Linux 2

JVM: Amazon Corretto 11

 
Reporter: Amit Gurdasani


The 
[KryoSerializer|https://github.com/apache/flink/blob/9bf0d9f2c2bcb2bc0c8ab6228bb0a9e76e10ad70/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/kryo/KryoSerializer.java]
 uses Class.forName() to dynamically load Scala extensions by name. This means that 
only its own (defining) classloader is consulted to find these extensions. Since the 
application classloader is favored for KryoSerializer by default, the Scala extensions 
cannot be loaded unless the flink-scala artifact is available to the application 
classloader. Scala applications that include flink-scala are therefore unable to 
benefit from the Scala extensions to the KryoSerializer.
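
One possible direction for a fix (just a sketch; the real change would need to decide 
where the user-code classloader comes from) is to resolve the extension class against 
an explicit classloader instead of the plain Class.forName(name):
{code:java}
/**
 * Sketch only: resolve an optional extension (e.g. the Scala Kryo instantiator) against
 * an explicit classloader instead of Class.forName(name), which only consults the
 * classloader that defined the calling class.
 */
final class ExtensionLoading {

    private ExtensionLoading() {}

    static Class<?> loadExtension(String className, ClassLoader userCodeClassLoader)
            throws ClassNotFoundException {
        // Fall back to the context classloader if no user-code classloader is available.
        ClassLoader cl =
                userCodeClassLoader != null
                        ? userCodeClassLoader
                        : Thread.currentThread().getContextClassLoader();
        return Class.forName(className, false, cl);
    }
}
{code}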

Exception looks like this:
{noformat}
java.lang.ClassNotFoundException: 
org.apache.flink.runtime.types.FlinkScalaKryoInstantiator
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:315)
at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.getKryoInstance(KryoSerializer.java:486)
at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.checkKryoInitialized(KryoSerializer.java:521)
at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.getKryo(KryoSerializer.java:720)
at software.amazon.kinesisanalytics.kryotest.Main.main(Main.java:16)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at 
org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)
at 
org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)
at 
org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:239)
at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829){noformat}
Example code resulting in this issue:

Main class for Flink application:
{noformat}
package software.amazon.kinesisanalytics.kryotest;

import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.io.Serializable;

public class Main {
private static class Something implements Serializable {
public static long serialVersionUID = 289034745902347830L;
}

public static void main(String... args) {
StreamExecutionEnvironment executionEnvironment = new StreamExecutionEnvironment();
KryoSerializer<Something> serializer =
        new KryoSerializer<>(Something.class, executionEnvironment.getConfig());
serializer.getKryo();
}
}
{noformat}
build.gradle for Flink application:
{code:java}
plugins {
id 'application'
id 'java'
id 'com.github.johnrengelman.shadow' version '7.1.2'
}

group 'software.amazon.kinesisanalytics'
version '0.1'

repositories {
mavenCentral()
}

dependencies {
compileOnly 'org.apache.flink:flink-core:1.15.2'
compileOnly 

[jira] [Created] (FLINK-31242) Correct the definition of creating functions in the SQL client documentation

2023-02-27 Thread Xianxun Ye (Jira)
Xianxun Ye created FLINK-31242:
--

 Summary: Correct the definition of creating functions in the SQL 
client documentation
 Key: FLINK-31242
 URL: https://issues.apache.org/jira/browse/FLINK-31242
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.17.0
Reporter: Xianxun Ye


The definition of creating functions is wrong in the SQL Client section:
{code:java}
-- Define user-defined functions here.

CREATE FUNCTION foo.bar.AggregateUDF AS myUDF;{code}
The correct one should be:
{code:java}
-- Define user-defined functions here. 

CREATE FUNCTION myUDF AS 'foo.bar.AggregateUDF'; {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31241) Hive dependency cannot be resolved

2023-02-27 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-31241:
-

 Summary: Hive dependency cannot be resolved
 Key: FLINK-31241
 URL: https://issues.apache.org/jira/browse/FLINK-31241
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.17.0
Reporter: Matthias Pohl


[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46535=logs=bf5e383b-9fd3-5f02-ca1c-8f788e2e76d3=85189c57-d8a0-5c9c-b61d-fc05cfac62cf]
{code:java}
Feb 25 01:24:49 [ERROR] Failed to execute goal on project 
flink-connector-hive_2.12: Could not resolve dependencies for project 
org.apache.flink:flink-connector-hive_2.12:jar:1.17-SNAPSHOT: Failed to collect 
dependencies at org.apache.hive:hive-service:jar:3.1.3 -> 
org.apache.hive:hive-llap-server:jar:3.1.3 -> 
org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> 
org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failed to read artifact 
descriptor for org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Could not 
transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT from/to 
apache.snapshots (https://repository.apache.org/snapshots): transfer failed for 
https://repository.apache.org/snapshots/org/glassfish/javax.el/3.0.1-b06-SNAPSHOT/javax.el-3.0.1-b06-SNAPSHOT.pom,
 status: 502 Proxy Error -> [Help 1] {code}
I would imagine this to be a network issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL

2023-02-27 Thread Timo Walther

Hi Kui Yuan,

thanks for working on this FLIP. Let me also give some comments about 
the proposed changes.


I support the direction of this FLIP about handling these 
watermark-specific properties through options and /*+OPTIONS(...) */ hints.


Regarding naming, I would like to keep the options in sync with existing 
options:


> 'watermark.emit.strategy'='ON_EVENT'

Let's use lower case (e.g. `on-event`) that matches with properties like 
sink.partitioner [1] or sink.delivery-guarantee [2].


> 'source.idle-timeout'='1min'

According to FLIP-122 [3], we want to prefix all scan-source related 
properties with `scan.*`. This clearly includes idle-timeout and 
actually also watermark strategies which don't apply for lookup sources.


Summarizing the comments above, we should use the following options:

'scan.watermark.emit.strategy'='on-event',
'scan.watermark.emit.on-event.gap'='1',
'scan.idle-timeout'='1min',
'scan.watermark.alignment.group'='alignment-group-1',
'scan.watermark.alignment.max-drift'='1min',
'scan.watermark.alignment.update-interval'='1s'

I know that this makes the keys even longer, but given that those 
options are for power users this should be acceptable. It also clearly 
indicates which options are for sinks, scans, and lookups. This 
potentially also helps in allow lists.


A hint to the implementation: I would suggest that we add those options 
to `FactoryUtil`. All cross-connector options should end up there.


Please also consider the changes to the CompiledPlan in your FLIP. This 
change has implications on the JSON format as watermark strategy of 
ExecNode becomes more complex, see WatermarkPushDownSpec [4].


Regards,
Timo


[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner
[2] 
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee
[3] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
[4] 
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java



On 24.02.23 04:55, kui yuan wrote:

Hi all,

I have updated the flip according to the discussion, and we will extend the
watermark-related features with both table options and 'OPTIONS' hint, like
this:

```
-- configure in table options
CREATE TABLE user_actions (
   ...
   user_action_time TIMESTAMP(3),
   WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
   'watermark.emit.strategy'='ON_PERIODIC',
   ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('watermark.emit.strategy'=
'ON_PERIODIC') */
```

Does everybody have any other questions?

Best
Kui Yuan

kui yuan  于2023年2月23日周四 20:05写道:


Hi all,

Thanks for all suggestions.

We will extend the watermark-related features in SQL layer with dynamic
table options and 'OPTIONS' hint, just as everyone expects. I will modify
Flip-296 as discussed.

@Martijn As far as I know, there is no hint interface in the table API,
so we can't use hint in table API directly. if we need to extend the hint
interface in the table API, maybe we need another flip. However, if we
extend the watermark-related features in the dynamic table options, maybe
we are able to use them indirectly in the table API like this[1]:

```
// register a table named "Orders"
tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, product STRING,
amount INT) WITH ('watermark.emit.strategy'='ON_EVENT'...)");
```

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/

Best
Kui Yuan

Yun Tang  于2023年2月23日周四 17:46写道:


Thanks for the warm discussions!

I had an offline discussion with Kui about the replies. I think I could
give some explanations on the original intention to introduce another
WATERMARK_PARAMS. If we take a look at the current datastream API, the
watermark strategy does not belong to any specific connector. And we
thought the dynamic table options were more like the configurations within
some specific connector.

 From the review comments, I think most people feel good to make it part
of the dynamic table options. I think this is fine if we give more clear
scope definition of the dynamic table options here. And I also agree with
Jingsong's concern about adding SQL syntax which is the most concerning
part before launching this discussion.

For Martijn's concern, if we accept to make the watermark-related options
part of dynamic table options, the problem becomes another topic: how to
support the dynamic table options in table API, which is deserved to create
another FLIP.

Best
Yun Tang

From: Martijn Visser 
Sent: Thursday, February 23, 2023 17:14
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL

Hi,

While I can understand that there's a desire to first focus on 

Re: Slack community invitation request

2023-02-27 Thread Viacheslav Chernyshev
Thank you!

From: Martijn Visser 
Sent: 27 February 2023 09:24
To: dev@flink.apache.org 
Subject: Re: Slack community invitation request

Hi Viacheslav,

Thanks for flagging this. I've updated the Slack invite link.

Best regards,

Martijn

On Mon, Feb 27, 2023 at 10:01 AM Viacheslav Chernyshev <
v.chernys...@outlook.com> wrote:

> Good morning,
>
> I am a senior software engineer at Bloomberg working on a streaming system
> that uses Apache Flink. I would like to join the Slack community to ask a
> few questions regarding the best practices for structuring the pipeline.
> Unfortunately, the invite link listed in the Community & Project Info
> section of the website has expired. The instructions point to this mailing
> list. Could someone please invite me into the community?
>
> Kind regards,
> Viacheslav
>


Re: [Discuss] Some questions on flink-table-store micro benchmark

2023-02-27 Thread Jingsong Li
Hi Yun and Shammon,

> track the performance changes of the micro benchmark

I think we can create a github action for this. Print results every day.
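
Printing daily results does not require JMH at all; a tiny harness along these lines
(a sketch, not an existing class in the repo) would already be enough for such a job
to run and log:

```
/** Minimal JMH-free benchmark loop: warm up, time a fixed number of iterations, print the average. */
public final class SimpleBenchmark {

    public static void run(String name, int warmups, int iterations, Runnable workload) {
        for (int i = 0; i < warmups; i++) {
            workload.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            totalNanos += System.nanoTime() - start;
        }
        System.out.printf(
                "%s: %.3f ms/op over %d iterations%n",
                name, totalNanos / 1_000_000.0 / iterations, iterations);
    }

    public static void main(String[] args) {
        // Replace the workload with the actual table-store read/write under test.
        run("example", 3, 10, () -> Math.sqrt(12_345.678));
    }
}
```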

Best,
Jingsong

On Mon, Feb 27, 2023 at 5:59 PM Shammon FY  wrote:
>
> Hi jingsong
>
> Getting rid of JMH is a good idea. For the second point, how can we track
> the performance changes of the micro benchmark? What do you think?
>
> Best,
> Shammon
>
> On Mon, Feb 27, 2023 at 10:57 AM Jingsong Li  wrote:
>
> > Thanks Yun.
> >
> > Another way is we can get rid of JMH, something like Spark
> > `org.apache.spark.benchmark.Benchmark` can replace JMH.
> >
> > Best,
> > Jingsong
> >
> > On Mon, Feb 27, 2023 at 1:24 AM Yun Tang  wrote:
> > >
> > > Hi dev,
> > >
> > > I just noticed that flink-table-store had introduced the micro benchmark
> > module [1] to test the basic performance. And I have two questions here.
> > > First of all, we might not be able to keep the micro benchmark, which is
> > based on JMH, in the main repo of flink-table-store. This is because JMH is
> > under GPL license, which is not compliant with Apache-2 license. That's why
> > Flink moved the flink-benchmark out [2].
> > >
> > > Secondly, I try to run the micro benchmark locally but it seems failed,
> > I just wonder can we make the flink-table-store's micro benchmark could be
> > periodically executed just as flink-benchmarks did on
> > http://codespeed.dak8s.net:8080? Please correct me if such daily
> > benchmark has been set up.
> > > Moreover, maybe we can also consider to integrate this micro benchmark
> > notification in the Slack channel just as what flink-benchmarks did.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-29636
> > > [2] https://issues.apache.org/jira/browse/FLINK-2973
> > >
> > > Best
> > > Yun Tang
> >


Re: [Discuss] Some questions on flink-table-store micro benchmark

2023-02-27 Thread Shammon FY
Hi jingsong

Getting rid of JMH is a good idea. For the second point, how can we track
the performance changes of the micro benchmark? What do you think?

Best,
Shammon

On Mon, Feb 27, 2023 at 10:57 AM Jingsong Li  wrote:

> Thanks Yun.
>
> Another way is we can get rid of JMH, something like Spark
> `org.apache.spark.benchmark.Benchmark` can replace JMH.
>
> Best,
> Jingsong
>
> On Mon, Feb 27, 2023 at 1:24 AM Yun Tang  wrote:
> >
> > Hi dev,
> >
> > I just noticed that flink-table-store had introduced the micro benchmark
> module [1] to test the basic performance. And I have two questions here.
> > First of all, we might not be able to keep the micro benchmark, which is
> based on JMH, in the main repo of flink-table-store. This is because JMH is
> under GPL license, which is not compliant with Apache-2 license. That's why
> Flink moved the flink-benchmark out [2].
> >
> > Secondly, I try to run the micro benchmark locally but it seems failed,
> I just wonder can we make the flink-table-store's micro benchmark could be
> periodically executed just as flink-benchmarks did on
> http://codespeed.dak8s.net:8080? Please correct me if such daily
> benchmark has been set up.
> > Moreover, maybe we can also consider to integrate this micro benchmark
> notification in the Slack channel just as what flink-benchmarks did.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-29636
> > [2] https://issues.apache.org/jira/browse/FLINK-2973
> >
> > Best
> > Yun Tang
>


[jira] [Created] (FLINK-31240) Reduce the overhead of conversion between DataStream and Table

2023-02-27 Thread Jiang Xin (Jira)
Jiang Xin created FLINK-31240:
-

 Summary: Reduce the overhead of conversion between DataStream and 
Table
 Key: FLINK-31240
 URL: https://issues.apache.org/jira/browse/FLINK-31240
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Jiang Xin


In some cases, users may need to convert the underlying DataStream to a Table and 
then convert it back to a DataStream (e.g. some Flink ML libraries accept a Table 
as input and convert it to a DataStream for calculation). This causes unnecessary 
overhead because of the data conversion between the internal data type and the 
external data type.

We can reduce the overhead by checking whether there is a paired 
`fromDataStream`/`toDataStream` call without any transformation in between; if so, 
the source DataStream can be used directly.
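
For illustration, the pattern this targets is a plain round trip like the following 
(sketch; the actual library code varies):
{code:java}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RoundTripExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        DataStream<Long> input = env.fromSequence(0, 99);

        // A library that only accepts Table gets a converted stream ...
        Table table = tEnv.fromDataStream(input);

        // ... and immediately converts it back without any relational operation in between.
        // Today this still pays the internal/external conversion cost; with the proposed
        // check, the original DataStream could be reused directly.
        DataStream<Row> output = tEnv.toDataStream(table);

        output.print();
        env.execute("fromDataStream/toDataStream round trip");
    }
}
{code}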

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Slack community invitation request

2023-02-27 Thread Martijn Visser
Hi Viacheslav,

Thanks for flagging this. I've updated the Slack invite link.

Best regards,

Martijn

On Mon, Feb 27, 2023 at 10:01 AM Viacheslav Chernyshev <
v.chernys...@outlook.com> wrote:

> Good morning,
>
> I am a senior software engineer at Bloomberg working on a streaming system
> that uses Apache Flink. I would like to join the Slack community to ask a
> few questions regarding the best practices for structuring the pipeline.
> Unfortunately, the invite link listed in the Community & Project Info
> section of the website has expired. The instructions point to this mailing
> list. Could someone please invite me into the community?
>
> Kind regards,
> Viacheslav
>


[jira] [Created] (FLINK-31239) Fix sum function can't get the correct value when the argument type is string

2023-02-27 Thread dalongliu (Jira)
dalongliu created FLINK-31239:
-

 Summary: Fix sum function can't get the correct value when the 
argument type is string
 Key: FLINK-31239
 URL: https://issues.apache.org/jira/browse/FLINK-31239
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Affects Versions: 1.17.0
Reporter: dalongliu
 Fix For: 1.17.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Slack community invitation request

2023-02-27 Thread Viacheslav Chernyshev
Good morning,

I am a senior software engineer at Bloomberg working on a streaming system that 
uses Apache Flink. I would like to join the Slack community to ask a few 
questions regarding the best practices for structuring the pipeline. 
Unfortunately, the invite link listed in the Community & Project Info section 
of the website has expired. The instructions point to this mailing list. Could 
someone please invite me into the community?

Kind regards,
Viacheslav


[jira] [Created] (FLINK-31238) Use IngestDB to speed up Rocksdb rescaling recovery

2023-02-27 Thread Yue Ma (Jira)
Yue Ma created FLINK-31238:
--

 Summary: Use IngestDB to speed up Rocksdb rescaling recovery 
 Key: FLINK-31238
 URL: https://issues.apache.org/jira/browse/FLINK-31238
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Affects Versions: 1.16.1
Reporter: Yue Ma
 Fix For: 1.16.2
 Attachments: image-2023-02-27-16-41-18-552.png

There have been many discussions and optimizations in the community around 
RocksDB rescaling and recovery.

https://issues.apache.org/jira/browse/FLINK-17971

https://issues.apache.org/jira/browse/FLINK-8845

https://issues.apache.org/jira/browse/FLINK-21321

We hope to discuss some of our explorations under this ticket

The process of scaling and recovering in rocksdb simply requires two steps
 # Insert the valid keyGroup data of the new task.
 # Delete the invalid data in the old stateHandle.

The current method for data writing is to specify the main DB first and then 
insert the data using writeBatch. In addition, deleteRange is currently used to 
speed up clipping the DB. But in our production environment, we found that the 
speed of rescaling is still very slow, especially when the state of a single 
Task is large.

 

We hope that the previous sst files can be reused directly when restoring 
state, instead of re-traversing the data. So we made some attempts to optimize 
this in our internal versions of Flink and frocksdb.

 

We added two APIs, *ClipDb* and *IngestDb*, in frocksdb.
 * ClipDb is used to clip the data of a DB to a key range. Different from 
db.DeleteRange and db.Delete, no DeleteValue or RangeTombstone will be generated 
for parts beyond the key range. We iterate over the FileMetaData of the DB and 
process each sst file; there are three situations:
If all the keys of a file are within the required range, we keep the sst file 
and do nothing.
If all the keys of the sst file are outside the specified range, we delete the 
file directly.
If we only need some part of the sst file, we rewrite the required keys to 
generate a new sst file.
All sst file changes are placed in a VersionEdit, and the current version will 
LogAndApply this edit to ensure that these changes take effect.
 * IngestDb is used to directly ingest all sst files of one DB into another DB. 
It is necessary to strictly ensure that the keys of the two DBs do not overlap, 
which is easy to guarantee in the Flink scenario. Hard links are used when 
ingesting the files, so this is very fast. At the same time, the file numbers of 
the main DB are incremented sequentially, and the SequenceNumber of the main DB 
is updated to the larger SequenceNumber of the two DBs.

When IngestDb and ClipDb are supported, the state restoration logic is as 
follows:
 * Open the first StateHandle as the main DB and pause compaction.
 * Clip the main DB to the KeyGroup range of the Task with ClipDb.
 * Open the other StateHandles in sequence as tmp DBs, and perform ClipDb 
according to the KeyGroup range.
 * Ingest each tmp DB into the main DB after it has been clipped.
 * Resume the compaction process of the main DB.
!image-2023-02-27-16-41-18-552.png!
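
In Java-flavoured pseudocode, the restore sequence above looks roughly as follows; 
clipDb/ingestDb stand for the proposed frocksdb additions and are not an existing API:
{code:java}
import java.util.List;

/** Sketch of the proposed restore path; the interfaces below are placeholders, not real APIs. */
final class IngestDbRestoreSketch {

    interface RocksDb extends AutoCloseable {
        void pauseCompaction();
        void resumeCompaction();
    }

    interface Frocksdb {
        RocksDb open(String stateHandlePath);
        void clipDb(RocksDb db, byte[] keyGroupRangeStart, byte[] keyGroupRangeEnd);
        void ingestDb(RocksDb target, RocksDb source);
    }

    static RocksDb restore(
            Frocksdb rocks, List<String> stateHandles, byte[] rangeStart, byte[] rangeEnd)
            throws Exception {
        // 1. Open the first StateHandle as the main DB and pause compaction.
        RocksDb main = rocks.open(stateHandles.get(0));
        main.pauseCompaction();
        // 2. Clip the main DB to the Task's key-group range.
        rocks.clipDb(main, rangeStart, rangeEnd);
        // 3. Open the remaining StateHandles as tmp DBs, clip them, and ingest them.
        for (String handle : stateHandles.subList(1, stateHandles.size())) {
            try (RocksDb tmp = rocks.open(handle)) {
                rocks.clipDb(tmp, rangeStart, rangeEnd);
                rocks.ingestDb(main, tmp);
            }
        }
        // 4. Resume compaction of the main DB.
        main.resumeCompaction();
        return main;
    }
}
{code}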

We have done some benchmark tests on our internal Flink version. The test 
results show that, compared with the writeBatch method, the rescaling and 
recovery speed with IngestDb is 5 to 10 times faster, as follows:

 
 * parallelism changes from 4 to 2 (each cell lists iterations 1/2/3)

|*TaskStateSize*|*Write_Batch*|*SST_File_Writer*|*Ingest_DB*|
|500M|8.018 / 9.551 / 7.486 s/op|6.041 / 5.934 / 6.707 s/op|3.922 / 3.208 / 3.096 s/op|
|1G|19.686 / 19.402 / 21.146 s/op|17.538 / 16.933 / 15.486 s/op|6.207 / 7.164 / 6.397 s/op|
|5G|244.795 / 243.141 / 253.542 s/op|78.058 / 85.635 / 76.568 s/op|23.397 / 21.387 / 22.858 s/op|
 * parallelism changes from 4 to 8 (each cell lists iterations 1/2/3)

|*TaskStateSize*|*Write_Batch*|*SST_File_Writer*|*Ingest_DB*|
|500M|3.477 / 3.515 / 3.433 s/op|3.453 / 3.300 / 3.313 s/op|0.941 / 0.963 / 1.102 s/op|
|1G|7.571 / 7.352 / 7.568 s/op|5.032 / 4.689 

Re: Contributing a Google Cloud Pub/Sub Lite source and sink?

2023-02-27 Thread Martijn Visser
Hi Daniel,

Thanks for reaching out. Keep in mind that you weren't subscribed to the
Flink Dev mailing list, I've just pushed this through the mailing list
moderation.

The process for contributing a connector is to create a Connector FLIP [1],
which needs to be discussed and voted on in the Dev mailing list [2]. One
thing in particular is who can help with the maintenance of the connector:
will there be more volunteers who can help with bug fixes, new features
etc. As we've seen with the current PubSub connector, that's already quite
hard: it's currently lacking volunteers overall. I do recall a proposal to
contribute a PubSub Lite connector a while back, but that ultimately was
not followed through.

Best regards,

Martijn

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws

On Mon, Feb 27, 2023 at 9:44 AM Daniel Collins 
wrote:

> Hello flink devs,
>
> My name is Daniel, I'm the tech lead for Google's Pub/Sub Lite streaming
> system, which is a lower cost streaming data system with semantics more
> similar to OSS streaming solutions. I've authored a source and sink
> connector for flink and load tested it at GiB/s scale- what would be the
> process for contributing this to flink?
>
> I've opened a JIRA to do this
> https://issues.apache.org/jira/projects/FLINK/issues/FLINK-31229, if this
> seems like a reasonable thing to do could someone assign it to me?
>
> The code for the connector currently lives here
> https://github.com/googleapis/java-pubsublite-flink, I believe it is
> following the FLIP-27 guidelines, but please let me know if I'm
> implementing the wrong interfaces. Which repo and in what folder should I
> move this code into?
>
> -Daniel
>


Contributing a Google Cloud Pub/Sub Lite source and sink?

2023-02-27 Thread Daniel Collins
Hello flink devs,

My name is Daniel, I'm the tech lead for Google's Pub/Sub Lite streaming
system, which is a lower cost streaming data system with semantics more
similar to OSS streaming solutions. I've authored a source and sink
connector for flink and load tested it at GiB/s scale- what would be the
process for contributing this to flink?

I've opened a JIRA to do this
https://issues.apache.org/jira/projects/FLINK/issues/FLINK-31229, if this
seems like a reasonable thing to do could someone assign it to me?

The code for the connector currently lives here
https://github.com/googleapis/java-pubsublite-flink, I believe it is
following the FLIP-27 guidelines, but please let me know if I'm
implementing the wrong interfaces. Which repo and in what folder should I
move this code into?

-Daniel


[jira] [Created] (FLINK-31237) Fix possible bug of array_distinct

2023-02-27 Thread jackylau (Jira)
jackylau created FLINK-31237:


 Summary: Fix possible bug of array_distinct
 Key: FLINK-31237
 URL: https://issues.apache.org/jira/browse/FLINK-31237
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.18.0
Reporter: jackylau
 Fix For: 1.18.0


As discussed in [https://github.com/apache/flink/pull/19623], we should use 
built-in expressions/functions for the element comparison, because the SQL 
equality semantics are different from Java equals.
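
One concrete example of the mismatch (illustrative only; think of BINARY elements, 
which are backed by byte[] internally):
{code:java}
import java.util.Arrays;

public class EqualitySemanticsExample {
    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Object#equals on arrays compares references, so a naive implementation would
        // treat two equal BINARY values as distinct elements.
        System.out.println(a.equals(b));         // false
        System.out.println(Arrays.equals(a, b)); // true -- the value equality SQL expects
    }
}
{code}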



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31236) Limit pushdown will open useless RecordReader

2023-02-27 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31236:


 Summary: Limit pushdown will open useless RecordReader
 Key: FLINK-31236
 URL: https://issues.apache.org/jira/browse/FLINK-31236
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31235) Flink Jdbc Connector can not push down where condition

2023-02-27 Thread leo.zhi (Jira)
leo.zhi created FLINK-31235:
---

 Summary: Flink Jdbc Connector can not push down where condition
 Key: FLINK-31235
 URL: https://issues.apache.org/jira/browse/FLINK-31235
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: leo.zhi


When we use Flink 1.13/1.14/1.15, I found that every time I query 
TiDB (MySQL), it loads the whole table without applying the WHERE condition.

The table below has 1 million records; it takes 15 minutes to load and return 
one record.

I don't know why, and any help would be much appreciated :)

For example:

val env: StreamExecutionEnvironment = 
StreamExecutionEnvironment.getExecutionEnvironment

env.setRuntimeMode(RuntimeExecutionMode.BATCH)

val tEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)

tEnv.executeSql(
s"""
|CREATE TABLE table(
| ID varchar(50) NOT NULL,
| CreateTime Timestamp NOT NULL
|) with (
| 'connector' = 'jdbc',
| 'url' = 
'jdbc:mysql://:3306/xx?tinyInt1isBit=false=false',
| 'username' = '',
| 'password' = '',
| 'table-name' = 'Service',
| 'driver' = 'com.mysql.cj.jdbc.Driver'
|)
""".stripMargin)

val query: Table = tEnv.sqlQuery("select * from table where ID = '00011'")

query.print()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)