[jira] [Created] (FLINK-22862) Support profiling in PyFlink

2021-06-02 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-22862:


 Summary: Support profiling in PyFlink
 Key: FLINK-22862
 URL: https://issues.apache.org/jira/browse/FLINK-22862
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.14.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: 1.14.0


We will support profiling to help users analyze the performance bottlenecks 
in their Python UDFs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discuss] Planning Flink 1.14

2021-06-02 Thread Kurt Young
Thanks for bringing this up.

I have one thought about the release period. In short: shall we try
to extend the release period by one month?

There are a couple of reasons why I want to bring up this proposal.

1) I observed that lots of users are actually far behind the current Flink
version. For example, we are now actively developing 1.14, but most users I
know who have a migration or upgrade plan are planning to upgrade to 1.12.
This means we need to back-port bug fixes to 1.12 and 1.13. If we extend
the release period by 1 month, there is a chance that users will have a
proper time frame to upgrade to the previously released version. Then we
can have a good development cycle of "actively developing the current
version and making the previous version stable", rather than stabilizing
versions from 2 ~ 3 releases ago. Always being far behind Flink's latest
version also suppresses users' motivation to contribute to Flink.

2) Increasing the release period also eases the workload of committers,
which I think can improve the contributor experience. I have seen several
times that when some contributors want to work on new features or
improvements, we have to respond with "sorry, we are right now focused on
implementing/stabilizing the planned features for this version", and those
contributions mostly end up stalled and are never brought up again.

BTW, extending the release period also has downsides. It slows down the
delivery speed of new features. And I'm also not sure how much it would
improve the above 2 issues.

Looking forward to hearing some feedback from the community, both users and
developers.

Best,
Kurt


On Wed, Jun 2, 2021 at 8:39 PM JING ZHANG  wrote:

> Hi Dawid, Joe & Xintong,
>
> Thanks for starting the discussion.
>
> I would like to polish Window TVFs [1][2], a popular SQL feature
> introduced in 1.13.
>
> The detailed items are as follows.
> 1. Add more computations based on Window TVF
> * Window Join (which is already merged in master branch)
> * Window Table Function
> * Window Deduplicate
> 2. Finish related JIRA issues to improve the user experience
>    * Add offset support for TUMBLE, HOP, and SESSION windows
> 3. Complement the functions that are missing compared to the legacy group
> windows, a precondition for deprecating the legacy Grouped Window Function
> in later versions.
>* Support Session windows
>* Support allow-lateness
>* Support retract input stream
>* Support window TVF in batch mode
>
> [1] https://issues.apache.org/jira/browse/FLINK-19604
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows
>
> Best regards,
> JING ZHANG
>
> Xintong Song wrote on Wed, Jun 2, 2021 at 6:45 PM:
>
> > Hi all,
> >
> > As 1.13 has been released for a while, I think it is a good time to start
> > planning for the 1.14 release cycle.
> >
> > - Release managers: This time we'd like to have a team of 3 release
> > managers. Dawid, Joe and I would like to volunteer for it. What do you
> > think about it?
> >
> > - Timeline: According to our approximately four-month release period, we
> > propose to aim for a feature freeze roughly in early August (which could
> > mean something like early September for the 1.14 release). Does it work
> > for everyone?
> >
> > - Collecting features: It would be helpful to have a rough overview of
> > the new features that will likely be included in this release. We have
> > created a wiki page [1] for collecting such information. We'd like to
> > kindly ask all committers to fill in the page with features that they
> > intend to work on.
> >
> > We would also like to emphasize some aspects of the engineering process:
> >
> > - Stability of master: This has been an issue during the 1.13 feature
> > freeze phase and it is still going on. We encourage every committer not
> > to merge PRs through the GitHub button, but to do this manually, paying
> > attention to commits merged after CI has been triggered. It would be
> > appreciated to always build the project before merging to master.
> >
> > - Documentation: Please try to see documentation as an integrated part
> > of the engineering process and don't push it to the feature freeze phase
> > or even after. You might even think about going documentation-first. We,
> > as the Flink community, are adding great stuff that pushes the limits of
> > streaming data processors with every release. We should also make this
> > stuff usable for our users by documenting it well.
> >
> > - Promotion of 1.14: What applies to documentation also applies to all
> > the activity around the release. We encourage every contributor to also
> > think about, plan and prepare activities like blog posts and talks that
> > will promote and spread the release once it is done.
> >
> > Please let us know what you think.
> >
> > Thank you~
> > Dawid, Joe & Xintong
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release

[jira] [Created] (FLINK-22861) TIMESTAMPADD + timestamp_ltz type throws CodeGenException when comparing with timestamp type

2021-06-02 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-22861:
---

 Summary: TIMESTAMPADD + timestamp_ltz type throws CodeGenException 
when comparing with timestamp type
 Key: FLINK-22861
 URL: https://issues.apache.org/jira/browse/FLINK-22861
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner, Table SQL / Runtime
Affects Versions: 1.14.0, 1.13.2
Reporter: Caizhi Weng


Add the following test case to 
{{org.apache.flink.table.planner.runtime.batch.sql.CalcITCase}} to reproduce 
this issue.

{code:scala}
@Test
def myTest(): Unit = {
  checkResult(
    "SELECT TIMESTAMPADD(MINUTE, 10, CURRENT_TIMESTAMP) < " +
      "TO_TIMESTAMP('2021-06-01 00:00:00')",
    Seq())
}
{code}

The exception stack is:
{code}
org.apache.flink.table.planner.codegen.CodeGenException: Incomparable types: TIMESTAMP_LTZ(3) NOT NULL and TIMESTAMP(3) NOT NULL

    at org.apache.flink.table.planner.codegen.calls.ScalarOperatorGens$.generateComparison(ScalarOperatorGens.scala:633)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.generateCallExpression(ExprCodeGenerator.scala:661)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:529)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:56)
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:174)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator$$anonfun$11.apply(ExprCodeGenerator.scala:526)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator$$anonfun$11.apply(ExprCodeGenerator.scala:517)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:517)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:56)
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:174)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.generateExpression(ExprCodeGenerator.scala:155)
    at org.apache.flink.table.planner.codegen.CalcCodeGenerator$$anonfun$2.apply(CalcCodeGenerator.scala:141)
    at org.apache.flink.table.planner.codegen.CalcCodeGenerator$$anonfun$2.apply(CalcCodeGenerator.scala:141)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.flink.table.planner.codegen.CalcCodeGenerator$.produceProjectionCode$1(CalcCodeGenerator.scala:141)
    at org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateProcessCode(CalcCodeGenerator.scala:167)
    at org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateCalcOperator(CalcCodeGenerator.scala:50)
    at org.apache.flink.table.planner.codegen.CalcCodeGenerator.generateCalcOperator(CalcCodeGenerator.scala)
    at org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:94)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:134)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:247)
    at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecSink.translateToPlanInternal(BatchExecSink.java:58)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:134)
    at org.apache.flink.table.planner.delegation.BatchPlanner$$anonfun$translateToPlan$1.apply(BatchPlanner.scala:80)
    at org.apache.flink.table.planner.delegation.BatchPlanner$$anonfun$translateToPlan$1.apply(BatchPlanner.scala:79)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at

Re: Integrating new connector with Flink SQL.

2021-06-02 Thread Jingsong Li
Hi santhosh,

1. I recommend you use the new Source with ScanTableSource.

2. You can use `org.apache.flink.table.connector.source.SourceProvider` to
integrate it with ScanTableSource (introduced in 1.12).

3. You can implement just one new Source, and it can be used by both Flink
DataStream and Flink SQL. (The same holds for SourceFunction: it can be
used by both Flink DataStream and Flink SQL too.)

Actually, a table connector is just a wrapper around a DataStream
connector. They should not have core differences.
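
For point 2, a minimal sketch of the wiring (BrooklinDynamicTableSource and
its constructor argument are placeholders, not an existing API; only
SourceProvider, ScanTableSource and the related interfaces are real Flink
classes):

import org.apache.flink.api.connector.source.Source;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.SourceProvider;
import org.apache.flink.table.data.RowData;

/** Sketch: exposes any FLIP-27 Source to Flink SQL via ScanTableSource. */
public class BrooklinDynamicTableSource implements ScanTableSource {

    // The FLIP-27 source that does the actual reading.
    private final Source<RowData, ?, ?> source;

    public BrooklinDynamicTableSource(Source<RowData, ?, ?> source) {
        this.source = source;
    }

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext context) {
        // SourceProvider (since 1.12) bridges the new Source API into the planner.
        return SourceProvider.of(source);
    }

    @Override
    public DynamicTableSource copy() {
        return new BrooklinDynamicTableSource(source);
    }

    @Override
    public String asSummaryString() {
        return "Brooklin table source";
    }
}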

I believe we should migrate KafkaDynamicSource to the new KafkaSource in
Flink 1.14. Maybe @Qingsheng Ren is working on this.

Best,
Jingsong

On Thu, Jun 3, 2021 at 5:19 AM santhosh venkat 
wrote:

> Hi,
>
> Please correct me if I'm wrong anywhere. I'm new to Flink and trying to
> navigate the landscape.
>
> Within my company, we're currently trying to develop a Flink connector for
> our internal change data capture system (Brooklin). We are planning to use
> Flink SQL as the primary API to build streaming applications.
>
> While exploring the Flink contracts, we noticed that there are two
> different flavors of APIs available in Flink for source integration.
>
> a) Flink Table API: the Flink ScanTableSource abstractions currently rely
> on the SourceFunction interfaces for integrating with the underlying
> messaging-client libraries. For instance, KafkaDynamicSource and
> KinesisDynamicSource currently use FlinkKafkaConsumer (an implementation
> of RichParallelSourceFunction) and KinesisConsumer (an implementation of
> RichParallelSourceFunction), respectively, to read from the broker.
>
> b) FLIP-27 style connector implementations: connectors that implement the
> SplitEnumerator and SourceReader abstractions, where the Enumerator runs
> in the JobMaster and the Readers run within the TaskManager processes.
>
> Questions:
>
> 1. If I want to integrate a new connector and use Flink SQL, what is the
> recommendation? Are users supposed to implement
> RichParallelSourceFunction, CheckpointListener, etc., similar to
> FlinkKafkaConsumer, and wire them into the ScanTableSource API?
>
> 2. What is the long-term plan for the ScanTableSource APIs? Are there
> plans for them to integrate with the SplitEnumerator and SourceReader
> abstractions?
>
> 3. If I want to offer my connector implementation to both the Flink
> DataStream and Flink SQL APIs, should I implement both flavors of source
> APIs (SplitEnumerator/SourceReader as well as SourceFunction) in Flink?
>
> I would really appreciate it if someone can help and answer the above
> questions.
>
> Thanks.
>


-- 
Best, Jingsong Lee


[jira] [Created] (FLINK-22860) Supplement the 'HELP' command prompt message for SQL-Cli.

2021-06-02 Thread Roc Marshal (Jira)
Roc Marshal created FLINK-22860:
---

 Summary: Supplement the 'HELP' command prompt message for SQL-Cli.
 Key: FLINK-22860
 URL: https://issues.apache.org/jira/browse/FLINK-22860
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Roc Marshal
 Attachments: attach.png

!attach.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22859) Wordcount on Docker test (custom fs plugin) fails due to output hash mismatch

2021-06-02 Thread Xintong Song (Jira)
Xintong Song created FLINK-22859:


 Summary: Wordcount on Docker test (custom fs plugin) fails due to 
output hash mismatch
 Key: FLINK-22859
 URL: https://issues.apache.org/jira/browse/FLINK-22859
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.14.0
Reporter: Xintong Song


[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18568=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529=1213]

Moreover, the CI hangs until timeout after this test failure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


RE: [VOTE] Watermark propagation with Sink API

2021-06-02 Thread Zhou, Brian
+1 (non-binding)

Thanks Eron, looking forward to seeing this feature soon.

Thanks,
Brian

-----Original Message-----
From: Arvid Heise  
Sent: Wednesday, June 2, 2021 15:44
To: dev
Subject: Re: [VOTE] Watermark propagation with Sink API


+1 (binding)

Thanks Eron for driving this effort; it will open up new exciting use cases.

On Tue, Jun 1, 2021 at 7:17 PM Eron Wright 
wrote:

> After some good discussion about how to enhance the Sink API to 
> process watermarks, I believe we're ready to proceed with a vote.  
> Voting will be open until at least Friday, June 4th, 2021.
>
> Reference:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-167%3A+Watermarks+for+Sink+API
>
> Discussion thread:
>
> https://lists.apache.org/thread.html/r5194e1cf157d1fd5ba7ca9b567cb01723bd582aa12dda57d25bca37e%40%3Cdev.flink.apache.org%3E
>
> Implementation Issue:
> https://issues.apache.org/jira/browse/FLINK-22700
>
> Thanks,
> Eron Wright
> StreamNative
>


Integrating new connector with Flink SQL.

2021-06-02 Thread santhosh venkat
Hi,

Please correct me if I'm wrong anywhere. I'm new to Flink and trying to
navigate the landscape.

Within my company, we're currently trying to develop a Flink connector for
our internal change data capture system (Brooklin). We are planning to use
Flink SQL as the primary API to build streaming applications.

While exploring the Flink contracts, we noticed that there are two
different flavors of APIs available in Flink for source integration.

a) Flink Table API: the Flink ScanTableSource abstractions currently rely
on the SourceFunction interfaces for integrating with the underlying
messaging-client libraries. For instance, KafkaDynamicSource and
KinesisDynamicSource currently use FlinkKafkaConsumer (an implementation
of RichParallelSourceFunction) and KinesisConsumer (an implementation of
RichParallelSourceFunction), respectively, to read from the broker.

b) FLIP-27 style connector implementations: connectors that implement the
SplitEnumerator and SourceReader abstractions, where the Enumerator runs
in the JobMaster and the Readers run within the TaskManager processes.

Questions:

1. If I want to integrate a new connector and use Flink SQL, what is the
recommendation? Are users supposed to implement RichParallelSourceFunction,
CheckpointListener, etc., similar to FlinkKafkaConsumer, and wire them into
the ScanTableSource API?

2. What is the long-term plan for the ScanTableSource APIs? Are there plans
for them to integrate with the SplitEnumerator and SourceReader
abstractions?

3. If I want to offer my connector implementation to both the Flink
DataStream and Flink SQL APIs, should I implement both flavors of source
APIs (SplitEnumerator/SourceReader as well as SourceFunction) in Flink?

I would really appreciate it if someone can help and answer the above
questions.

Thanks.


[jira] [Created] (FLINK-22858) avro-confluent doesn't support confluent schema registry that has security enabled

2021-06-02 Thread TAO XIAO (Jira)
TAO XIAO created FLINK-22858:


 Summary: avro-confluent doesn't support confluent schema registry 
that has security enabled
 Key: FLINK-22858
 URL: https://issues.apache.org/jira/browse/FLINK-22858
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
SQL / Ecosystem
Affects Versions: 1.13.1
Reporter: TAO XIAO


The schema registry supports HTTP authentication via configuration [1]; 
however, avro-confluent doesn't pass the format options to the schema registry 
client, which means only unauthenticated connections to the schema registry are 
possible. To fix this, we can pass the format options through to 
CachedSchemaCoderProvider.java.
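
A rough sketch of the intended behavior (the property names are standard 
Confluent client settings; the endpoint and credentials below are made up):

{code:java}
import java.util.HashMap;
import java.util.Map;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;

// Forward the format options to the client instead of dropping them.
Map<String, String> registryConfigs = new HashMap<>();
registryConfigs.put("basic.auth.credentials.source", "USER_INFO");
registryConfigs.put("basic.auth.user.info", "user:password");

CachedSchemaRegistryClient client = new CachedSchemaRegistryClient(
        "https://registry.example.com:8081", 1000, registryConfigs);
{code}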



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22857) Add possibility to call built-in functions in SpecializedFunction

2021-06-02 Thread Timo Walther (Jira)
Timo Walther created FLINK-22857:


 Summary: Add possibility to call built-in functions in 
SpecializedFunction
 Key: FLINK-22857
 URL: https://issues.apache.org/jira/browse/FLINK-22857
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


This is the last missing piece for avoiding code generation when developing 
built-in functions. Core operations such as CAST, EQUALS, etc. will still use 
code generation, but other built-in functions should be able to use these core 
operations without the need for generating code. It should be possible to call 
other built-in functions via a SpecializedFunction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: recover from savepoint

2021-06-02 Thread Piotr Nowojski
Hi,

I think there is no generic way. If this error indeed happened after
starting a second job from the same savepoint, or something like that, the
user can change the sink operator's UID.

If this is a case of intentional recovery from an earlier
checkpoint/savepoint, maybe `FlinkKafkaProducer#setLogFailuresOnly` will be
helpful.
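
To illustrate both options, a minimal sketch (the topic, serialization
lambda, properties and `stream` are made-up placeholders):

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Assume an existing DataStream<String> named `stream`.
Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092");
props.setProperty("transaction.timeout.ms", "900000");

FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(
        "my-topic",
        (element, timestamp) -> new ProducerRecord<>(
                "my-topic", element.getBytes(StandardCharsets.UTF_8)),
        props,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

// Log produce failures instead of failing the job on recovery.
producer.setLogFailuresOnly(true);

// A changed operator UID gives the sink a fresh transactional id space.
stream.addSink(producer).uid("kafka-sink-v2");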

Best, Piotrek

On Tue, 1 Jun 2021 at 15:16, Till Rohrmann wrote:

> The error message says that we are trying to reuse a transaction id that is
> currently being used or has expired.
>
> I am not 100% sure how this can happen. My suspicion is that you have
> resumed a job multiple times from the same savepoint. Have you checked that
> there is no other job which has been resumed from the same savepoint and
> which is currently running or has run and completed checkpoints?
>
> @pnowojski  @Becket Qin  how
> does the transaction id generation ensure that we don't have a clash of
> transaction ids if we resume the same job multiple times from the same
> savepoint? From the code, I do see that we have a TransactionalIdsGenerator
> which is initialized with the taskName and the operator UID.
>
> fyi: @Arvid Heise 
>
> Cheers,
> Till
>
>
> On Mon, May 31, 2021 at 11:10 AM 周瑞  wrote:
>
> > Hi,
> >   When "sink.semantic = exactly-once", the following exception is
> > thrown when recovering from a savepoint:
> >
> >public static final String KAFKA_TABLE_FORMAT =
> > "CREATE TABLE "+TABLE_NAME+" (\n" +
> > "  "+COLUMN_NAME+" STRING\n" +
> > ") WITH (\n" +
> > "   'connector' = 'kafka',\n" +
> > "   'topic' = '%s',\n" +
> > "   'properties.bootstrap.servers' = '%s',\n" +
> > "   'sink.semantic' = 'exactly-once',\n" +
> > "   'properties.transaction.timeout.ms' =
> > '90',\n" +
> > "   'sink.partitioner' =
> > 'com.woqutench.qmatrix.cdc.extractor.PkPartitioner',\n" +
> > "   'format' = 'dbz-json'\n" +
> > ")\n";
> >   [] - Source: TableSourceScan(table=[[default_catalog, default_database,
> > debezium_source]], fields=[data]) -> Sink: Sink
> > (table=[default_catalog.default_database.KafkaTable], fields=[data])
> > (1/1)#859 (075273be72ab01bf1afd3c066876aaa6) switched from INITIALIZING
> > to FAILED with failure cause: org.apache.kafka.common.KafkaException:
> > Unexpected error in InitProducerIdResponse; Producer attempted an
> > operation with an old epoch. Either there is a newer producer with the
> > same transactionalId, or the producer's transaction has been expired by
> > the broker.
> >     at org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1352)
> >     at org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1260)
> >     at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> >     at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:572)
> >     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:564)
> >     at org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:414)
> >     at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:312)
> >     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
> >     at java.lang.Thread.run(Thread.java:748)
> >
>


[jira] [Created] (FLINK-22856) Move our Azure pipelines away from Ubuntu 16.04 by September

2021-06-02 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-22856:
--

 Summary: Move our Azure pipelines away from Ubuntu 16.04 by 
September
 Key: FLINK-22856
 URL: https://issues.apache.org/jira/browse/FLINK-22856
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Reporter: Robert Metzger
 Fix For: 1.14.0


Azure won't support Ubuntu 16.04 starting from October, hence we need to 
migrate to a newer Ubuntu version.

We should do this at a time when the builds are relatively stable, so that we 
can clearly identify issues related to the version upgrade. Also, we shouldn't 
do this before a feature freeze ;)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discuss] Planning Flink 1.14

2021-06-02 Thread JING ZHANG
Hi Dawid, Joe & Xintong,

Thanks for starting the discussion.

I would like to polish Window TVFs [1][2], a popular SQL feature introduced
in 1.13.

The detailed items are as follows.
1. Add more computations based on Window TVF
* Window Join (which is already merged in master branch)
* Window Table Function
* Window Deduplicate
2. Finish related JIRA issues to improve the user experience
   * Add offset support for TUMBLE, HOP, and SESSION windows
3. Complement the functions that are missing compared to the legacy group
windows, a precondition for deprecating the legacy Grouped Window Function
in later versions.
   * Support Session windows
   * Support allow-lateness
   * Support retract input stream
   * Support window TVF in batch mode

[1] https://issues.apache.org/jira/browse/FLINK-19604
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows

Best regards,
JING ZHANG

Xintong Song wrote on Wed, Jun 2, 2021 at 6:45 PM:

> Hi all,
>
> As 1.13 has been released for a while, I think it is a good time to start
> planning for the 1.14 release cycle.
>
> - Release managers: This time we'd like to have a team of 3 release
> managers. Dawid, Joe and I would like to volunteer for it. What do you
> think about it?
>
> - Timeline: According to our approximately four-month release period, we
> propose to aim for a feature freeze roughly in early August (which could
> mean something like early September for the 1.14 release). Does it work
> for everyone?
>
> - Collecting features: It would be helpful to have a rough overview of the
> new features that will likely be included in this release. We have created
> a wiki page [1] for collecting such information. We'd like to kindly ask
> all committers to fill in the page with features that they intend to work
> on.
>
> We would also like to emphasize some aspects of the engineering process:
>
> - Stability of master: This has been an issue during the 1.13 feature
> freeze phase and it is still going on. We encourage every committer not to
> merge PRs through the GitHub button, but to do this manually, paying
> attention to commits merged after CI has been triggered. It would be
> appreciated to always build the project before merging to master.
>
> - Documentation: Please try to see documentation as an integrated part of
> the engineering process and don't push it to the feature freeze phase or
> even after. You might even think about going documentation-first. We, as
> the Flink community, are adding great stuff that pushes the limits of
> streaming data processors with every release. We should also make this
> stuff usable for our users by documenting it well.
>
> - Promotion of 1.14: What applies to documentation also applies to all the
> activity around the release. We encourage every contributor to also think
> about, plan and prepare activities like blog posts and talks that will
> promote and spread the release once it is done.
>
> Please let us know what you think.
>
> Thank you~
> Dawid, Joe & Xintong
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
>


[jira] [Created] (FLINK-22855) Translate the 'Overview of Python API' page into Chinese.

2021-06-02 Thread Roc Marshal (Jira)
Roc Marshal created FLINK-22855:
---

 Summary: Translate the 'Overview of Python API' page into Chinese.
 Key: FLINK-22855
 URL: https://issues.apache.org/jira/browse/FLINK-22855
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation
Affects Versions: 1.14.0
Reporter: Roc Marshal


target link: 
https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/python/overview/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22854) Translate 'Apache Flink Documentation' index page to Chinese

2021-06-02 Thread Roc Marshal (Jira)
Roc Marshal created FLINK-22854:
---

 Summary: Translate 'Apache Flink Documentation' index page to 
Chinese
 Key: FLINK-22854
 URL: https://issues.apache.org/jira/browse/FLINK-22854
 Project: Flink
  Issue Type: Technical Debt
  Components: chinese-translation
Affects Versions: 1.14.0
Reporter: Roc Marshal


target page : https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22853) Flink SQL aggregate functions max/min/sum return duplicate results

2021-06-02 Thread Raypon Wang (Jira)
Raypon Wang created FLINK-22853:
---

 Summary: Flink SQL aggregate functions max/min/sum return duplicate results
 Key: FLINK-22853
 URL: https://issues.apache.org/jira/browse/FLINK-22853
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.12.1
Reporter: Raypon Wang


The MySQL data is as follows:

id | offset
---+-------
 1 |      1
 1 |      3
 1 |      2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[Discuss] Planning Flink 1.14

2021-06-02 Thread Xintong Song
Hi all,

As 1.13 has been released for a while, I think it is a good time to start
planning for the 1.14 release cycle.

- Release managers: This time we'd like to have a team of 3 release
managers. Dawid, Joe and I would like to volunteer for it. What do you
think about it?

- Timeline: According to our approximately four-month release period, we
propose to aim for a feature freeze roughly in early August (which could
mean something like early September for the 1.14 release). Does it work
for everyone?

- Collecting features: It would be helpful to have a rough overview of the
new features that will likely be included in this release. We have created
a wiki page [1] for collecting such information. We'd like to kindly ask
all committers to fill in the page with features that they intend to work
on.

We would also like to emphasize some aspects of the engineering process:

- Stability of master: This has been an issue during the 1.13 feature
freeze phase and it is still going on. We encourage every committer not to
merge PRs through the GitHub button, but to do this manually, paying
attention to commits merged after CI has been triggered. It would be
appreciated to always build the project before merging to master.

- Documentation: Please try to see documentation as an integrated part of
the engineering process and don't push it to the feature freeze phase or
even after. You might even think about going documentation-first. We, as
the Flink community, are adding great stuff that pushes the limits of
streaming data processors with every release. We should also make this
stuff usable for our users by documenting it well.

- Promotion of 1.14: What applies to documentation also applies to all the
activity around the release. We encourage every contributor to also think
about, plan and prepare activities like blog posts and talks that will
promote and spread the release once it is done.

Please let us know what you think.

Thank you~
Dawid, Joe & Xintong

[1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release


[jira] [Created] (FLINK-22852) Add SPNEGO authentication support

2021-06-02 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-22852:
-

 Summary: Add SPNEGO authentication support
 Key: FLINK-22852
 URL: https://issues.apache.org/jira/browse/FLINK-22852
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.13.1
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22851) Add basic authentication support

2021-06-02 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-22851:
-

 Summary: Add basic authentication support
 Key: FLINK-22851
 URL: https://issues.apache.org/jira/browse/FLINK-22851
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.13.1
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-02 Thread Gabor Somogyi
Hi team,

Happy to be here and hope I can provide quality additions in the future.

Thank you all for the helpful suggestions!
Considering them, the FLIP has been modified and the work continues on the
already existing Jira.

BR,
G


On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi 
wrote:

> Thanks, Chesnay - I totally missed that. Answered on the ticket too, let
> us continue there then.
>
> Till, I agree that we should keep this codepath as slim as possible. It is
> an important design decision that we aim to keep the list of authentication
> protocols to a minimum. We believe that this should not be a primary
> concern of Flink and a trusted proxy service (for example Apache Knox)
> should be used to enable a multitude of enduser authentication mechanisms.
> The bare minimum of authentication mechanisms to support consequently
> consists of a single strong authentication protocol, for which Kerberos is
> the enterprise solution, plus HTTP Basic primarily for development and
> lightweight scenarios.
>
> Added the above wording to G's doc.
>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>
>
>
> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler 
> wrote:
>
>> There's a related effort:
>> https://issues.apache.org/jira/browse/FLINK-21108
>>
>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>> > Hi Gabor, welcome to the Flink community!
>> >
>> > Thanks for sharing this proposal with the community, Márton. In general,
>> > I agree that authentication is missing and that this is required for using
>> > Flink within an enterprise. The thing I am wondering is whether this
>> > feature strictly needs to be implemented inside of Flink or whether a
>> > proxy
>> > setup could do the job? Have you considered this option? If yes, then it
>> > would be good to list it under the point of rejected alternatives.
>> >
>> > I do see the benefit of implementing this feature inside of Flink if
>> > many
>> > users need it. If not, then it might be easier for the project to not
>> > increase the surface area since it makes the overall maintenance harder.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi 
>> > wrote:
>> >
>> >> Hi team,
>> >>
>> >> Firstly I would like to introduce Gabor, or G [1] for short, to the
>> >> community. He is a Spark committer who has recently transitioned to the
>> >> Flink Engineering team at Cloudera and is looking forward to
>> >> contributing to Apache Flink. Previously G primarily focused on Spark
>> >> Streaming and security.
>> >>
>> >> Based on requests from our customers, G has implemented Kerberos and
>> >> HTTP Basic Authentication for the Flink Dashboard and HistoryServer.
>> >> These previously lacked an authentication story.
>> >>
>> >> We are looking to contribute this functionality back to the community;
>> >> we believe that given Flink's maturity there should be a common code
>> >> solution for this general pattern.
>> >>
>> >> We are looking forward to your feedback on G's design. [2]
>> >>
>> >> [1] http://gaborsomogyi.com/
>> >> [2]
>> >>
>> >>
>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>> >>
>>
>>


[jira] [Created] (FLINK-22850) org.apache.flink.configuration.ConfigOptions.defaultValue() and noDefaultValue() are Deprecated.

2021-06-02 Thread lixiaobao (Jira)
lixiaobao created FLINK-22850:
-

 Summary: 
org.apache.flink.configuration.ConfigOptions.defaultValue() and 
noDefaultValue() are Deprecated.
 Key: FLINK-22850
 URL: https://issues.apache.org/jira/browse/FLINK-22850
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.12.4, 1.12.3, 1.13.0, 1.11.1, 1.10.1
Reporter: lixiaobao
 Fix For: 1.13.0


{code:java}
@Deprecated
public <T> ConfigOption<T> defaultValue(T value) {
    checkNotNull(value);
    return new ConfigOption<>(
            key, value.getClass(), ConfigOption.EMPTY_DESCRIPTION, value, false);
}

@Deprecated
public ConfigOption<String> noDefaultValue() {
    return new ConfigOption<>(
            key, String.class, ConfigOption.EMPTY_DESCRIPTION, null, false);
}
{code}
These methods are marked as Deprecated; one should define the type explicitly 
first with one of intType(), stringType(), etc.
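
For reference, the replacement style with an explicit type looks like this 
(the option keys here are made up):

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Declare the type explicitly before setting (or omitting) the default value.
ConfigOption<Integer> retries =
        ConfigOptions.key("retries").intType().defaultValue(3);

ConfigOption<String> name =
        ConfigOptions.key("name").stringType().noDefaultValue();
{code}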



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-02 Thread Márton Balassi
Thanks, Chesnay - I totally missed that. Answered on the ticket too, let us
continue there then.

Till, I agree that we should keep this codepath as slim as possible. It is
an important design decision that we aim to keep the list of authentication
protocols to a minimum. We believe that this should not be a primary
concern of Flink and a trusted proxy service (for example Apache Knox)
should be used to enable a multitude of enduser authentication mechanisms.
The bare minimum of authentication mechanisms to support consequently
consists of a single strong authentication protocol, for which Kerberos is
the enterprise solution, plus HTTP Basic primarily for development and
lightweight scenarios.

Added the above wording to G's doc.
https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit



On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler  wrote:

> There's a related effort:
> https://issues.apache.org/jira/browse/FLINK-21108
>
> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
> > Hi Gabor, welcome to the Flink community!
> >
> > Thanks for sharing this proposal with the community, Márton. In general, I
> > agree that authentication is missing and that this is required for using
> > Flink within an enterprise. The thing I am wondering is whether this
> > feature strictly needs to be implemented inside of Flink or whether a
> > proxy
> > setup could do the job? Have you considered this option? If yes, then it
> > would be good to list it under the point of rejected alternatives.
> >
> > I do see the benefit of implementing this feature inside of Flink if many
> > users need it. If not, then it might be easier for the project to not
> > increase the surface area since it makes the overall maintenance harder.
> >
> > Cheers,
> > Till
> >
> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi 
> > wrote:
> >
> >> Hi team,
> >>
> >> Firstly I would like to introduce Gabor, or G [1] for short, to the
> >> community. He is a Spark committer who has recently transitioned to the
> >> Flink Engineering team at Cloudera and is looking forward to
> >> contributing to Apache Flink. Previously G primarily focused on Spark
> >> Streaming and security.
> >>
> >> Based on requests from our customers, G has implemented Kerberos and
> >> HTTP Basic Authentication for the Flink Dashboard and HistoryServer.
> >> These previously lacked an authentication story.
> >>
> >> We are looking to contribute this functionality back to the community;
> >> we believe that given Flink's maturity there should be a common code
> >> solution for this general pattern.
> >>
> >> We are looking forward to your feedback on G's design. [2]
> >>
> >> [1] http://gaborsomogyi.com/
> >> [2]
> >>
> >>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >>
>
>


Re: [VOTE] Watermark propagation with Sink API

2021-06-02 Thread Arvid Heise
+1 (binding)

Thanks Eron for driving this effort; it will open up new exciting use cases.

On Tue, Jun 1, 2021 at 7:17 PM Eron Wright 
wrote:

> After some good discussion about how to enhance the Sink API to process
> watermarks, I believe we're ready to proceed with a vote.  Voting will be
> open until at least Friday, June 4th, 2021.
>
> Reference:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-167%3A+Watermarks+for+Sink+API
>
> Discussion thread:
>
> https://lists.apache.org/thread.html/r5194e1cf157d1fd5ba7ca9b567cb01723bd582aa12dda57d25bca37e%40%3Cdev.flink.apache.org%3E
>
> Implementation Issue:
> https://issues.apache.org/jira/browse/FLINK-22700
>
> Thanks,
> Eron Wright
> StreamNative
>


[jira] [Created] (FLINK-22849) Drop remaining usages of legacy planner in E2E tests and Python

2021-06-02 Thread Timo Walther (Jira)
Timo Walther created FLINK-22849:


 Summary: Drop remaining usages of legacy planner in E2E tests and 
Python
 Key: FLINK-22849
 URL: https://issues.apache.org/jira/browse/FLINK-22849
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Table SQL / Legacy Planner, Test 
Infrastructure
Reporter: Timo Walther
Assignee: Timo Walther


This removes the remaining usages of legacy planner outside of the 
{{flink-table}} module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22848) Deprecate unquoted options for SET / RESET

2021-06-02 Thread Jira
Ingo Bürk created FLINK-22848:
-

 Summary: Deprecate unquoted options for SET / RESET
 Key: FLINK-22848
 URL: https://issues.apache.org/jira/browse/FLINK-22848
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Affects Versions: 1.14.0
Reporter: Ingo Bürk


Eventually we should agree on a version in which to deprecate, and a version 
in which to remove, the unquoted syntax for SET / RESET:
{code:java}
// To be deprecated / removed
SET a = b;
RESET a;

// New
SET 'a' = 'b';
RESET 'a';{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22847) SET should print options quoted

2021-06-02 Thread Jira
Ingo Bürk created FLINK-22847:
-

 Summary: SET should print options quoted
 Key: FLINK-22847
 URL: https://issues.apache.org/jira/browse/FLINK-22847
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Affects Versions: 1.14.0, 1.13.2
Reporter: Ingo Bürk
Assignee: Ingo Bürk


In FLINK-22770 we exposed SET / RESET in the parser and introduced quoting 
the options when using SET, but kept the unquoted support in the SQL Client 
for now as well.

To facilitate the quoted syntax ahead of a future deprecation of the unquoted 
one, SET; should print the current options using quotes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22846) Add SPNEGO authentication support

2021-06-02 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-22846:
-

 Summary: Add SPNEGO authentication support
 Key: FLINK-22846
 URL: https://issues.apache.org/jira/browse/FLINK-22846
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.13.1
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22845) Add Basic authentication support

2021-06-02 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-22845:
-

 Summary: Add Basic authentication support
 Key: FLINK-22845
 URL: https://issues.apache.org/jira/browse/FLINK-22845
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.13.1
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)