Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

2021-08-09 Thread Dong Lin
Thank you Mingliang for providing the comments.

Currently option-1 proposes Graph/GraphModel/GraphBuilder to build an
Estimator from a graph of Estimator/Transformer, where Estimator could
generate the model (as a Transformer) directly. On the other hand, option-2
proposes AlgoOperator that can be linked into a graph of AlgoOperator.

It seems that option-1 is closer to what TF does than option-2. Could you
double check whether you mean option-1 or option-2?




On Tue, Aug 10, 2021 at 11:29 AM 青雉(祁明良)  wrote:

> Vote for option 2.
> It is similar to what we are doing with Tensorflow.
> 1. Define the graph in training phase
> 2. Export model with different input/output spec for online inference
>
> Thanks,
> Mingliang
>
> On Aug 10, 2021, at 9:39 AM, Becket Qin  wrote:
>
> estimatorInputs
>
>
>
>


[jira] [Created] (FLINK-23697) flink standalone Isolation is poor, why not do as spark does

2021-08-09 Thread jackylau (Jira)
jackylau created FLINK-23697:


 Summary: flink standalone Isolation is poor, why not do as spark 
does
 Key: FLINK-23697
 URL: https://issues.apache.org/jira/browse/FLINK-23697
 Project: Flink
  Issue Type: Bug
Reporter: jackylau


Flink standalone isolation is poor; why not do as Spark does?
Spark abstracts the cluster manager, and its executor plays the same role as a Flink TaskManager.
A Spark standalone worker is a process, and each executor runs as a child process of that worker.
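A minimal, generic sketch of the process-per-executor model described above (plain Java, not Flink or Spark code; the executor entry point is hypothetical):

{code:java}
import java.io.IOException;

public class WorkerProcessIsolationSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // A "worker" launches each executor as a separate child JVM, so a crash or
        // memory leak in one executor cannot take down the worker or its siblings.
        // "com.example.ExecutorMain" is a hypothetical entry point.
        ProcessBuilder executor = new ProcessBuilder(
                "java", "-Xmx1g", "-cp", System.getProperty("java.class.path"),
                "com.example.ExecutorMain");
        executor.inheritIO();
        Process child = executor.start();
        System.out.println("Launched executor as child process, pid=" + child.pid());
        child.waitFor();
    }
}
{code}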




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

2021-08-09 Thread 青雉(祁明良)
Vote for option 2.
It is similar to what we are doing with Tensorflow.
1. Define the graph in training phase
2. Export model with different input/output spec for online inference

Thanks,
Mingliang

On Aug 10, 2021, at 9:39 AM, Becket Qin  wrote:

estimatorInputs




[jira] [Created] (FLINK-23696) RMQSourceTest.testRedeliveredSessionIDsAck fails on azure

2021-08-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-23696:


 Summary: RMQSourceTest.testRedeliveredSessionIDsAck fails on azure
 Key: FLINK-23696
 URL: https://issues.apache.org/jira/browse/FLINK-23696
 Project: Flink
  Issue Type: Bug
  Components: Connectors/ RabbitMQ
Affects Versions: 1.13.2
Reporter: Xintong Song
 Fix For: 1.13.3


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21792=logs=e1276d0f-df12-55ec-86b5-c0ad597d83c9=906e9244-f3be-5604-1979-e767c8a6f6d9=13297

{code}
Aug 10 01:15:35 [ERROR] Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 3.181 s <<< FAILURE! - in 
org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
Aug 10 01:15:35 [ERROR] 
testRedeliveredSessionIDsAck(org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest)
  Time elapsed: 0.269 s  <<< FAILURE!
Aug 10 01:15:35 java.lang.AssertionError: expected:<25> but was:<27>
Aug 10 01:15:35 at org.junit.Assert.fail(Assert.java:88)
Aug 10 01:15:35 at org.junit.Assert.failNotEquals(Assert.java:834)
Aug 10 01:15:35 at org.junit.Assert.assertEquals(Assert.java:645)
Aug 10 01:15:35 at org.junit.Assert.assertEquals(Assert.java:631)
Aug 10 01:15:35 at 
org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest.testRedeliveredSessionIDsAck(RMQSourceTest.java:407)
Aug 10 01:15:35 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Aug 10 01:15:35 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 10 01:15:35 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 10 01:15:35 at java.lang.reflect.Method.invoke(Method.java:498)
Aug 10 01:15:35 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
Aug 10 01:15:35 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Aug 10 01:15:35 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
Aug 10 01:15:35 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Aug 10 01:15:35 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Aug 10 01:15:35 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Aug 10 01:15:35 at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
Aug 10 01:15:35 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Aug 10 01:15:35 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
Aug 10 01:15:35 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
Aug 10 01:15:35 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
Aug 10 01:15:35 at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
Aug 10 01:15:35 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
Aug 10 01:15:35 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
Aug 10 01:15:35 at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
Aug 10 01:15:35 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
Aug 10 01:15:35 at 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
Aug 10 01:15:35 at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
Aug 10 01:15:35 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
Aug 10 01:15:35 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
Aug 10 01:15:35 at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
Aug 10 01:15:35 at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Aug 10 01:15:35 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Aug 10 01:15:35 at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Aug 10 01:15:35 at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23695) DispatcherFailoverITCase.testRecoverFromCheckpointAfterJobGraphRemovalOfTerminatedJobFailed fails due to timeout

2021-08-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-23695:


 Summary: 
DispatcherFailoverITCase.testRecoverFromCheckpointAfterJobGraphRemovalOfTerminatedJobFailed
 fails due to timeout
 Key: FLINK-23695
 URL: https://issues.apache.org/jira/browse/FLINK-23695
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.14.0
Reporter: Xintong Song
 Fix For: 1.14.0


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21790=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702=9573

{code}
Aug 09 23:09:32 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 60.331 s <<< FAILURE! - in 
org.apache.flink.runtime.dispatcher.DispatcherFailoverITCase
Aug 09 23:09:32 [ERROR] 
testRecoverFromCheckpointAfterJobGraphRemovalOfTerminatedJobFailed(org.apache.flink.runtime.dispatcher.DispatcherFailoverITCase)
  Time elapsed: 60.265 s  <<< ERROR!
Aug 09 23:09:32 java.util.concurrent.TimeoutException: Condition was not met in 
given timeout.
Aug 09 23:09:32 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:154)
Aug 09 23:09:32 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:132)
Aug 09 23:09:32 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:124)
Aug 09 23:09:32 at 
org.apache.flink.runtime.dispatcher.AbstractDispatcherTest.awaitStatus(AbstractDispatcherTest.java:81)
Aug 09 23:09:32 at 
org.apache.flink.runtime.dispatcher.DispatcherFailoverITCase.testRecoverFromCheckpointAfterJobGraphRemovalOfTerminatedJobFailed(DispatcherFailoverITCase.java:137)
Aug 09 23:09:32 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Aug 09 23:09:32 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 09 23:09:32 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 09 23:09:32 at java.lang.reflect.Method.invoke(Method.java:498)
Aug 09 23:09:32 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Aug 09 23:09:32 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Aug 09 23:09:32 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Aug 09 23:09:32 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Aug 09 23:09:32 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Aug 09 23:09:32 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Aug 09 23:09:32 at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
Aug 09 23:09:32 at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Aug 09 23:09:32 at 
org.apache.flink.runtime.util.TestingFatalErrorHandlerResource$CloseableStatement.evaluate(TestingFatalErrorHandlerResource.java:91)
Aug 09 23:09:32 at 
org.apache.flink.runtime.util.TestingFatalErrorHandlerResource$CloseableStatement.access$200(TestingFatalErrorHandlerResource.java:83)
Aug 09 23:09:32 at 
org.apache.flink.runtime.util.TestingFatalErrorHandlerResource$1.evaluate(TestingFatalErrorHandlerResource.java:55)
Aug 09 23:09:32 at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Aug 09 23:09:32 at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
Aug 09 23:09:32 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Aug 09 23:09:32 at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Aug 09 23:09:32 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Aug 09 23:09:32 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Aug 09 23:09:32 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Aug 09 23:09:32 at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Aug 09 23:09:32 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Aug 09 23:09:32 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Aug 09 23:09:32 at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Aug 09 23:09:32 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Aug 09 23:09:32 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Aug 09 23:09:32 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Aug 09 23:09:32 at 

[jira] [Created] (FLINK-23694) KafkaWriterITCase hangs on azure

2021-08-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-23694:


 Summary: KafkaWriterITCase hangs on azure
 Key: FLINK-23694
 URL: https://issues.apache.org/jira/browse/FLINK-23694
 Project: Flink
  Issue Type: Bug
  Components: kafka
Affects Versions: 1.14.0
Reporter: Xintong Song
 Fix For: 1.14.0


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21761=logs=c5f0071e-1851-543e-9a45-9ac140befc32=15a22db7-8faa-5b34-3920-d33c9f0ca23c=7220



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

2021-08-09 Thread Becket Qin
Thanks Mingliang. It is super helpful to get your input. At this point,
there are two ways mentioned in the FLIP to support heterogeneous topologies
in the training and inference phases.

1. Create two separate DAGs, or separate code, for training and inference.
2. An encapsulation API called Graph, which takes estimatorInputs for
training and transformerInputs for inference. [1]

It is worth mentioning that the Graph API does not conflict with the first
way. It just combines the two separate DAGs into one for the heterogeneous
case we are talking about, and users may use fit and transform calls just
as they do with Pipeline. However, it does mean that users will have to
describe the Graph, rather than calling Transformer.transform() and
Estimator.fit() themselves. This introduces some complexity.

I am curious which way you would prefer when building your system?
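For readability, here is the build method from [1], decoded from the link anchor
and shown on a GraphBuilder-like interface. TableId, Graph, and GraphBuilder are
types proposed in the FLIP and are stubbed below only so the sketch is
self-contained; treat this as an illustration of the proposal, not a released API.

// Stub marker types from the proposal, only to make the sketch compile.
interface TableId {}
interface Graph {}

interface GraphBuilder {
    // One Graph description covers both phases: estimatorInputs are the tables
    // consumed when fitting (training), while transformerInputs are the tables
    // the resulting GraphModel consumes at inference time, so the two phases can
    // have different input/output specs while sharing a single definition.
    Graph build(
            TableId[] estimatorInputs,
            TableId[] transformerInputs,
            TableId[] outputs,
            TableId[] inputStates,
            TableId[] outputStates);
}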

Thanks,

Jiangjie (Becket) Qin

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181311360=true#:~:text=%C2%A0%C2%A0%C2%A0%C2%A0Graph%20build(TableId%5B%5D%20estimatorInputs%2C%20TableId%5B%5D%20transformerInputs%2C%20TableId%5B%5D%20outputs%2C%20TableId%5B%5D%20inputStates%2C%20TableId%5B%5D%20outputStates)%20%7B...%7D

On Mon, Aug 9, 2021 at 7:46 PM 青雉(祁明良)  wrote:

> Hi all,
>
> This is Mingliang, a machine learning engineer in the recommendation area.
> I see there is a discussion about “heterogeneous topologies in training and
> inference.” This is actually a very common case in recommendation systems,
> especially in CTR prediction tasks. For the training task, the data is usually
> compressed and optimized for storage, so additional feature extraction and
> transformation are applied before the training task. For online inference,
> the data is usually already well prepared by the recommendation engine and
> optimized for latency.
>
> Best,
> Mingliang
>
> > On Aug 6, 2021, at 3:55 PM, Becket Qin  wrote:
> >
> > Hi Zhipeng,
> >
> > Yes, I agree that the key difference between the two options is how they
> > support MIMO.
> >
> > My main concern for option 2 is potential inconsistent availability of
> > algorithms in the two sets of API. In order to make an algorithm
> available
> > to both sets of API, people have to implement the same algorithm with two
> > different APIs. And sometimes an algorithm available as AlgoOps may not
> > exist as a Transformer. This seems pretty confusing to the users.
> >
> > Therefore, personally speaking, I am in favor of one set of API. Also I
> > feel that the MIMO extension of the Transformer/Estimator API is still
> > quite intuitive. However, I understand that others may think differently.
> > So it would be good to see the opinion from more people.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Jul 20, 2021 at 7:10 PM Zhipeng Zhang 
> > wrote:
> >
> >> Hi Becket,
> >>
> >> Thanks for the review! I totally agree that it would be easier for
> people
> >> to discuss if we can list the fundamental difference between these two
> >> proposals. (So I want to make the discussion even shorter)
> >>
> >> In my opinion, the fundamental difference between proposal-1 and
> >> proposal-2 is how they support the multi-input multi-output (MIMO)
> machine
> >> learning algorithms.
> >>
> >> Proposal-1 supports MIMO by extending Transformer/Estimator/Pipeline to
> >> take multiple inputs and output multiple outputs.
> >>
> >> Proposal-2 does not change the definition of
> >> Transformer/Estimator/Pipeline. Rather, to support MIMO it comes up
> with a
> >> new abstraction --- AlgoOperator, which is essentially an abstraction
> for
> >> machine learning functions. That is, proposal-2 employs well-recognized
> >> Transformer/Estimator/Pipeline to support single-input single-output
> (SISO)
> >> machine learning algorithms and AlgoOperator to support MIMO.
> >>
> >> In my opinion, the benefit of proposal-1 is that there is only one set
> of
> >> API and it is clean. However, it breaks the user habits (SISO for
> >> Transformer/Estimator/Pipeline). Users have to think more than before
> when
> >> using the new Transformer/Estimator/Pipeline. [1]
> >>
> >> The benefit of proposal-2 is that it does not change anything of the
> >> well-recognized Transformer/Estimator/Pipeline and existing users (e.g.,
> >> Spark MLlib users) would be happy.
> >> However, as you mentioned, proposal-2 introduces a new abstraction
> >> (AlgoOperator), which may increase the burden for understanding.
> >>
> >> [1]
> >>
> https://docs.google.com/document/d/1L3aI9LjkcUPoM52liEY6uFktMnFMNFQ6kXAjnz_11do/edit#heading=h.c2qr9r64btd9
> >>
> >>
> >> Thanks,
> >>
> >> Zhipeng Zhang
> >>
> >> On Tue, Jul 20, 2021 at 11:42 AM, Becket Qin wrote:
> >>
> >>> Hi Dong, Zhipeng and Fan,
> >>>
> >>> Thanks for the detailed proposals. It is quite a lot of reading! Given
> >>> that we are introducing a lot of stuff here, I find that it might be
> easier
> >>> for people to discuss if we can list the fundamental differences first.
> >>> From what I 

Re: [ANNOUNCE] Apache Flink 1.13.2 released

2021-08-09 Thread Xintong Song
Thanks Yun and everyone~!

Thank you~

Xintong Song



On Mon, Aug 9, 2021 at 10:14 PM Till Rohrmann  wrote:

> Thanks Yun Tang for being our release manager and the great work! Also
> thanks a lot to everyone who contributed to this release.
>
> Cheers,
> Till
>
> On Mon, Aug 9, 2021 at 9:48 AM Yu Li  wrote:
>
>> Thanks Yun Tang for being our release manager and everyone else who made
>> the release possible!
>>
>> Best Regards,
>> Yu
>>
>>
>> On Fri, 6 Aug 2021 at 13:52, Yun Tang  wrote:
>>
>>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.13.2, which is the second bugfix release for the Apache
>>> Flink 1.13 series.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>> https://flink.apache.org/news/2021/08/06/release-1.13.2.html
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218==12315522
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Regards,
>>> Yun Tang
>>>
>>


Re: Incompatible RAW types in Table API

2021-08-09 Thread Timo Walther

Sorry, I meant "will be deprecated in Flink 1.14"

On 09.08.21 19:32, Timo Walther wrote:

Hi Dominik,

`toAppendStream` is soft deprecated in Flink 1.13 and will be deprecated 
in Flink 1.13. It uses the old type system and might not match perfectly 
with the other reworked type system in new functions and sources.


For SQL, a lot of Avro classes need to be treated as RAW types. But we 
might address this issue soon and further improve Avro support.


I would suggest continuing this discussion in a JIRA issue. Can you 
also share the code for `NewEvent` and the Avro schema or generated Avro 
class for `Event` so that we have a fully reproducible example?


What can help is to explicitly define the types:

E.g. you can also use `DataTypes.of(TypeInformation)` both in 
`ScalarFunction.getTypeInference` and 
`StreamTableEnvironment.toDataStream()`.


I hope this helps.

Timo

On 09.08.21 16:27, Dominik Wosiński wrote:

It should be `id` instead of `licence` in the error, I've copy-pasted it
incorrectly :<

I've also tried an additional thing, i.e. creating a ScalarFunction that
maps one Avro-generated enum to another Avro-generated enum:


@FunctionHint(
  input = Array(
    new DataTypeHint(value = "RAW", bridgedTo = classOf[OneEnum])
  ),
  output = new DataTypeHint(value = "RAW", bridgedTo = classOf[OtherEnum])
)
class EnumMappingFunction extends ScalarFunction {

  def eval(event: OneEnum): OtherEnum = { OtherEnum.DEFAULT_VALUE }
}

This results in the following error:



*Invalid argument type at position 0. Data type RAW('org.test.OneEnum',
'...') expected but RAW('org.test.OneEnum', '...') passed.*


On Mon, 9 Aug 2021 at 15:13, Dominik Wosiński wrote:


Hey all,

I think I've hit some weird issue in Flink TypeInformation generation. I
have the following code:

val stream: DataStream[Event] = ...
tableEnv.createTemporaryView("TableName",stream)
val table = tableEnv
.sqlQuery("SELECT id, timestamp, eventType from TableName")
tableEnvironment.toAppendStream[NewEvent](table)

In this particular example *Event* is an Avro-generated class and *NewEvent*
is just a POJO. This is just a toy example, so please ignore the fact that
this operation doesn't make much sense.

When I try to run the code I am getting the following error:

*org.apache.flink.table.api.ValidationException: Column types of query
result and sink for unregistered table do not match.
Cause: Incompatible types for sink column 'licence' at position 0.
Query schema: [id: RAW('org.apache.avro.util.Utf8', '...'), timestamp: BIGINT NOT NULL, kind: RAW('org.test.EventType', '...')]
Sink schema: [id: RAW('org.apache.avro.util.Utf8', '?'), timestamp: BIGINT, kind: RAW('org.test.EventType', '?')]*

So, it seems that the type is recognized correctly, but for some reason
there is still a mismatch according to Flink, maybe because a different
type serializer is used?

Thanks in advance for any help,
Best Regards,
Dom.












Re: Incompatible RAW types in Table API

2021-08-09 Thread Timo Walther

Hi Dominik,

`toAppendStream` is soft deprecated in Flink 1.13 and will be deprecated 
in Flink 1.13. It uses the old type system and might not match perfectly 
with the other reworked type system in new functions and sources.


For SQL, a lot of Avro classes need to be treated as RAW types. But we 
might address this issue soon and further improve Avro support.


I would suggest continuing this discussion in a JIRA issue. Can you 
also share the code for `NewEvent` and the Avro schema or generated Avro 
class for `Event` so that we have a fully reproducible example?


What can help is to explicitly define the types:

E.g. you can also use `DataTypes.of(TypeInformation)` both in 
`ScalarFunction.getTypeInference` and 
`StreamTableEnvironment.toDataStream()`.
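For example, a minimal sketch (assuming Flink 1.13, with a placeholder enum
standing in for the Avro-generated org.test.EventType from your example) could
look like this:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.AbstractDataType;

public class ExplicitRawTypeSketch {

    // Placeholder for the Avro-generated enum from the example (org.test.EventType).
    enum EventType { DEFAULT_VALUE }

    public static void main(String[] args) {
        // Wrap the DataStream-side TypeInformation in a data type so that the Table
        // API reuses exactly this type (and its serializer) for the RAW column,
        // instead of deriving its own, incompatible RAW type.
        AbstractDataType<?> rawEventType =
                DataTypes.of(TypeInformation.of(EventType.class));

        // The same data type can then be used in the scalar function's type
        // inference and when converting the Table back to a DataStream, so both
        // sides of the conversion agree on one RAW type.
        System.out.println(rawEventType);
    }
}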


I hope this helps.

Timo

On 09.08.21 16:27, Dominik Wosiński wrote:

It should be `id` instead of `licence` in the error, I've copy-pasted it
incorrectly :<

I've also tried an additional thing, i.e. creating a ScalarFunction that
maps one Avro-generated enum to another Avro-generated enum:

@FunctionHint(
   input = Array(
 new DataTypeHint(value = "RAW", bridgedTo = classOf[OneEnum])
   ),
   output = new DataTypeHint(value = "RAW", bridgedTo = classOf[OtherEnum])
)
class EnumMappingFunction extends ScalarFunction {

   def eval(event: OneEnum): OtherEnum = {OtherEnum.DEFAULT_VALUE}
}

This results in the following error:



*Invalid argument type at position 0. Data type RAW('org.test.OneEnum',
'...') expected but RAW('org.test.OneEnum', '...') passed.*


On Mon, 9 Aug 2021 at 15:13, Dominik Wosiński wrote:


Hey all,

I think I've hit some weird issue in Flink TypeInformation generation. I
have the following code:

val stream: DataStream[Event] = ...
tableEnv.createTemporaryView("TableName",stream)
val table = tableEnv
.sqlQuery("SELECT id, timestamp, eventType from TableName")
tableEnvironment.toAppendStream[NewEvent](table)

In this particular example *Event* is an Avro-generated class and *NewEvent*
is just a POJO. This is just a toy example, so please ignore the fact that
this operation doesn't make much sense.

When I try to run the code I am getting the following error:

*org.apache.flink.table.api.ValidationException: Column types of query
result and sink for unregistered table do not match.
Cause: Incompatible types for sink column 'licence' at position 0.
Query schema: [id: RAW('org.apache.avro.util.Utf8', '...'), timestamp: BIGINT NOT NULL, kind: RAW('org.test.EventType', '...')]
Sink schema: [id: RAW('org.apache.avro.util.Utf8', '?'), timestamp: BIGINT, kind: RAW('org.test.EventType', '?')]*

So, it seems that the type is recognized correctly, but for some reason
there is still a mismatch according to Flink, maybe because a different
type serializer is used?

Thanks in advance for any help,
Best Regards,
Dom.










Re: [VOTE] FLIP-180: Adjust StreamStatus and Idleness definition

2021-08-09 Thread Tzu-Li (Gordon) Tai
+1 (binding)

On Tue, Aug 10, 2021 at 12:15 AM Piotr Nowojski 
wrote:

> +1 (binding)
>
> Piotrek
>
> On Mon, 9 Aug 2021 at 18:00, Stephan Ewen wrote:
>
> > +1 (binding)
> >
> > On Mon, Aug 9, 2021 at 12:08 PM Till Rohrmann 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Aug 5, 2021 at 9:09 PM Arvid Heise  wrote:
> > >
> > > > Dear devs,
> > > >
> > > > I'd like to open a vote on FLIP-180: Adjust StreamStatus and Idleness
> > > > definition [1] which was discussed in this thread [2].
> > > > The vote will be open for at least 72 hours unless there is an
> > objection
> > > or
> > > > not enough votes.
> > > >
> > > > Best,
> > > >
> > > > Arvid
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-180%3A+Adjust+StreamStatus+and+Idleness+definition
> > > > [2]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r8357d64b9cfdf5a233c53a20d9ac62b75c07c925ce2c43e162f1e39c%40%3Cdev.flink.apache.org%3E
> > > >
> > >
> >
>


Re: [VOTE] FLIP-180: Adjust StreamStatus and Idleness definition

2021-08-09 Thread Piotr Nowojski
+1 (binding)

Piotrek

On Mon, 9 Aug 2021 at 18:00, Stephan Ewen wrote:

> +1 (binding)
>
> On Mon, Aug 9, 2021 at 12:08 PM Till Rohrmann 
> wrote:
>
> > +1 (binding)
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 5, 2021 at 9:09 PM Arvid Heise  wrote:
> >
> > > Dear devs,
> > >
> > > I'd like to open a vote on FLIP-180: Adjust StreamStatus and Idleness
> > > definition [1] which was discussed in this thread [2].
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > not enough votes.
> > >
> > > Best,
> > >
> > > Arvid
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-180%3A+Adjust+StreamStatus+and+Idleness+definition
> > > [2]
> > >
> > >
> >
> https://lists.apache.org/thread.html/r8357d64b9cfdf5a233c53a20d9ac62b75c07c925ce2c43e162f1e39c%40%3Cdev.flink.apache.org%3E
> > >
> >
>


Re: [VOTE] FLIP-180: Adjust StreamStatus and Idleness definition

2021-08-09 Thread Stephan Ewen
+1 (binding)

On Mon, Aug 9, 2021 at 12:08 PM Till Rohrmann  wrote:

> +1 (binding)
>
> Cheers,
> Till
>
> On Thu, Aug 5, 2021 at 9:09 PM Arvid Heise  wrote:
>
> > Dear devs,
> >
> > I'd like to open a vote on FLIP-180: Adjust StreamStatus and Idleness
> > definition [1] which was discussed in this thread [2].
> > The vote will be open for at least 72 hours unless there is an objection
> or
> > not enough votes.
> >
> > Best,
> >
> > Arvid
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-180%3A+Adjust+StreamStatus+and+Idleness+definition
> > [2]
> >
> >
> https://lists.apache.org/thread.html/r8357d64b9cfdf5a233c53a20d9ac62b75c07c925ce2c43e162f1e39c%40%3Cdev.flink.apache.org%3E
> >
>


[jira] [Created] (FLINK-23693) Add principal creation possibilities to SecureTestEnvironment

2021-08-09 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-23693:
-

 Summary: Add principal creation possibilities to 
SecureTestEnvironment
 Key: FLINK-23693
 URL: https://issues.apache.org/jira/browse/FLINK-23693
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.14.0
Reporter: Gabor Somogyi


Kerberos authentication handler (external project) needs it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23692) Clarify text on "Cancel Job" confirmation buttons

2021-08-09 Thread Jira
Márton Balassi created FLINK-23692:
--

 Summary: Clarify text on "Cancel Job" confirmation buttons
 Key: FLINK-23692
 URL: https://issues.apache.org/jira/browse/FLINK-23692
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.12.5, 1.13.1
Reporter: Márton Balassi
Assignee: Zsombor Chikán
 Attachments: image-2021-07-16-14-47-12-383.png, 
image-2021-07-21-11-03-05-977.png

When prompting a user to confirm a job cancellation, we currently present the 
following popup (see screenshots). The button texts are the defaults from the 
HTML layer, which happen to be misleading when the action to be confirmed is 
itself "Cancel".

Multiple users reported that this behavior is confusing; as it is only a 
display issue, its scope is very limited.

 

!image-2021-07-21-11-03-05-977.png!!image-2021-07-16-14-47-12-383.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Incompatible RAW types in Table API

2021-08-09 Thread Dominik Wosiński
It should be `id` instead of `licence` in the error, I've copy-pasted it
incorrectly :<

I've also tried an additional thing, i.e. creating a ScalarFunction that
maps one Avro-generated enum to another Avro-generated enum:

@FunctionHint(
  input = Array(
new DataTypeHint(value = "RAW", bridgedTo = classOf[OneEnum])
  ),
  output = new DataTypeHint(value = "RAW", bridgedTo = classOf[OtherEnum])
)
class EnumMappingFunction extends ScalarFunction {

  def eval(event: OneEnum): OtherEnum = {OtherEnum.DEFAULT_VALUE}
}

This results in the following error:



*Invalid argument type at position 0. Data type RAW('org.test.OneEnum',
'...') expected but RAW('org.test.OneEnum', '...') passed.*


On Mon, 9 Aug 2021 at 15:13, Dominik Wosiński wrote:

> Hey all,
>
> I think I've hit some weird issue in Flink TypeInformation generation. I
> have the following code:
>
> val stream: DataStream[Event] = ...
> tableEnv.createTemporaryView("TableName",stream)
> val table = tableEnv
> .sqlQuery("SELECT id, timestamp, eventType from TableName")
> tableEnvironment.toAppendStream[NewEvent](table)
>
> In this particular example *Event* is an Avro-generated class and *NewEvent*
> is just a POJO. This is just a toy example, so please ignore the fact that
> this operation doesn't make much sense.
>
> When I try to run the code I am getting the following error:
>
> *org.apache.flink.table.api.ValidationException: Column types of query
> result and sink for unregistered table do not match.
> Cause: Incompatible types for sink column 'licence' at position 0.
> Query schema: [id: RAW('org.apache.avro.util.Utf8', '...'), timestamp: BIGINT NOT NULL, kind: RAW('org.test.EventType', '...')]
> Sink schema: [id: RAW('org.apache.avro.util.Utf8', '?'), timestamp: BIGINT, kind: RAW('org.test.EventType', '?')]*
>
> So, it seems that the type is recognized correctly, but for some reason
> there is still a mismatch according to Flink, maybe because a different
> type serializer is used?
>
> Thanks in advance for any help,
> Best Regards,
> Dom.
>
>
>
>


Re: Incompatible RAW types in Table API

2021-08-09 Thread Till Rohrmann
Hi Dominik,

I am pulling in Timo who might know more about this.

Cheers,
Till

On Mon, Aug 9, 2021 at 3:21 PM Dominik Wosiński  wrote:

> Hey all,
>
> I think I've hit some weird issue in Flink TypeInformation generation. I
> have the following code:
>
> val stream: DataStream[Event] = ...
> tableEnv.createTemporaryView("TableName",stream)
> val table = tableEnv
> .sqlQuery("SELECT id, timestamp, eventType from TableName")
> tableEnvironment.toAppendStream[NewEvent](table)
>
> In this particular example *Event* is an Avro-generated class and *NewEvent*
> is just a POJO. This is just a toy example, so please ignore the fact that
> this operation doesn't make much sense.
>
> When I try to run the code I am getting the following error:
>
> *org.apache.flink.table.api.ValidationException: Column types of query
> result and sink for unregistered table do not match.
> Cause: Incompatible types for sink column 'licence' at position 0.
> Query schema: [id: RAW('org.apache.avro.util.Utf8', '...'), timestamp: BIGINT NOT NULL, kind: RAW('org.test.EventType', '...')]
> Sink schema: [id: RAW('org.apache.avro.util.Utf8', '?'), timestamp: BIGINT, kind: RAW('org.test.EventType', '?')]*
>
> So, it seems that the type is recognized correctly, but for some reason
> there is still a mismatch according to Flink, maybe because a different
> type serializer is used?
>
> Thanks in advance for any help,
> Best Regards,
> Dom.
>


Re: [ANNOUNCE] Apache Flink 1.13.2 released

2021-08-09 Thread Till Rohrmann
Thanks Yun Tang for being our release manager and the great work! Also
thanks a lot to everyone who contributed to this release.

Cheers,
Till

On Mon, Aug 9, 2021 at 9:48 AM Yu Li  wrote:

> Thanks Yun Tang for being our release manager and everyone else who made
> the release possible!
>
> Best Regards,
> Yu
>
>
> On Fri, 6 Aug 2021 at 13:52, Yun Tang  wrote:
>
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.13.2, which is the second bugfix release for the Apache
>> Flink 1.13 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2021/08/06/release-1.13.2.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218==12315522
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Regards,
>> Yun Tang
>>
>


Incompatible RAW types in Table API

2021-08-09 Thread Dominik Wosiński
Hey all,

I think I've hit some weird issue in Flink TypeInformation generation. I
have the following code:

val stream: DataStream[Event] = ...
tableEnv.createTemporaryView("TableName",stream)
val table = tableEnv
.sqlQuery("SELECT id, timestamp, eventType from TableName")
tableEnvironment.toAppendStream[NewEvent](table)

In this particular example *Event* is an Avro-generated class and *NewEvent*
is just a POJO. This is just a toy example, so please ignore the fact that
this operation doesn't make much sense.

When I try to run the code I am getting the following error:

*org.apache.flink.table.api.ValidationException: Column types of query
result and sink for unregistered table do not match.
Cause: Incompatible types for sink column 'licence' at position 0.
Query schema: [id: RAW('org.apache.avro.util.Utf8', '...'), timestamp: BIGINT NOT NULL, kind: RAW('org.test.EventType', '...')]
Sink schema: [id: RAW('org.apache.avro.util.Utf8', '?'), timestamp: BIGINT, kind: RAW('org.test.EventType', '?')]*

So, it seems that the type is recognized correctly, but for some reason
there is still a mismatch according to Flink, maybe because a different
type serializer is used?

Thanks in advance for any help,
Best Regards,
Dom.


[jira] [Created] (FLINK-23691) flink sql submit job blob parallelism

2021-08-09 Thread Peihui He (Jira)
Peihui He created FLINK-23691:
-

 Summary: flink sql submit job blob parallelism
 Key: FLINK-23691
 URL: https://issues.apache.org/jira/browse/FLINK-23691
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Affects Versions: 1.13.1
Reporter: Peihui He
 Attachments: image-2021-08-09-20-52-01-441.png, 
image-2021-08-09-20-52-45-998.png

I'm using Zeppelin to submit SQL jobs. Recently I found that it takes a lot of 
time to submit a SQL job when there are many dependency packages.

Why is the blob client not concurrent?

org.apache.flink.runtime.client.ClientUtils.java

org.apache.flink.runtime.blob.BlobServer.java

By modifying the above files, it now takes only a few seconds. Before that, it 
always took more than 1 min.
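The idea, sketched below in generic Java (this is only an illustration of making 
the uploads concurrent; {{uploadOne}} is a hypothetical stand-in for the per-file 
upload currently done through the classes above, not an actual Flink API):

{code:java}
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelUploadSketch {

    // Hypothetical stand-in for a single blocking blob upload of one dependency jar.
    static void uploadOne(Path jar) {
    }

    public static void main(String[] args) {
        List<Path> jars = List.of(Path.of("dep-a.jar"), Path.of("dep-b.jar"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<CompletableFuture<Void>> uploads = jars.stream()
                .map(jar -> CompletableFuture.runAsync(() -> uploadOne(jar), pool))
                .collect(Collectors.toList());
        // With several parallel connections, the total submission time is bounded
        // by the slowest file rather than by the sum of all files.
        CompletableFuture.allOf(uploads.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
    }
}
{code}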

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23690) Processing timers can be triggered more efficiently

2021-08-09 Thread Jiayi Liao (Jira)
Jiayi Liao created FLINK-23690:
--

 Summary: Processing timers can be triggered more efficiently
 Key: FLINK-23690
 URL: https://issues.apache.org/jira/browse/FLINK-23690
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Affects Versions: 1.13.2, 1.12.5, 1.11.4, 1.14.0
Reporter: Jiayi Liao


After FLINK-23208, the processing timers are triggered more efficiently but it 
can still be improved. (The performance can be tested with 
[benchmark|https://github.com/apache/flink-benchmarks/blob/master/src/main/java/org/apache/flink/benchmark/ProcessingTimerBenchmark.java])

Currently {{InternalTimerService.onProcessingTime(long)}} polls a timer from 
{{processingTimeTimersQueue}} and registers a new timer after the polled timer 
is triggered, which means timers with different timestamps will be registered 
multiple times. This can be improved with the code below:

{code:java}
// Drain all timers that are already due in a single pass, instead of
// re-registering a new processing-time trigger after every single timer.
long now = System.currentTimeMillis() - 1;
while ((timer = processingTimeTimersQueue.peek()) != null
        && timer.getTimestamp() <= now) {
    processingTimeTimersQueue.poll();
    keyContext.setCurrentKey(timer.getKey());
    triggerTarget.onProcessingTime(timer);
}
{code}

However, due to the bug described in FLINK-23689, this change conflicts with the 
current implementation of {{TestProcessingTimeService.setCurrentTime(long)}}, 
which causes a lot of tests to fail (e.g. InternalTimerServiceImplTest). 
Therefore, before working on this improvement, FLINK-23689 should be fixed 
first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23689) TestProcessingTimeService.setCurrentTime(long) not delay the firing timer by 1ms delay

2021-08-09 Thread Jiayi Liao (Jira)
Jiayi Liao created FLINK-23689:
--

 Summary: TestProcessingTimeService.setCurrentTime(long) not delay 
the firing timer by 1ms delay
 Key: FLINK-23689
 URL: https://issues.apache.org/jira/browse/FLINK-23689
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task, Tests
Affects Versions: 1.13.2, 1.12.4, 1.11.4, 1.14.0
Reporter: Jiayi Liao


FLINK-9857 enabled {{SystemProcessingTimeService}} to fire processing timers 
with a 1ms delay, but it ignored {{TestProcessingTimeService}}: the method 
{{TestProcessingTimeService.setCurrentTime}} can still trigger processing 
timers whose timestamp is equal to {{currentTime}}.

{code:java}
while (!priorityQueue.isEmpty() && currentTime >= priorityQueue.peek().f0) {
..
}
{code}

We can simply fix the problem by using {{currentTime > priorityQueue.peek().f0}} 
(sketched after the list below), but it will break too many existing tests:

* Tests using {{TestProcessingTimeService.setCurrentTime(long)}} (64 usages)
* Tests using {{AbstractStreamOperatorTestHarness.setProcessingTime(long)}} 
(368 usages)

Tip: some tests can be fixed by adding 1ms to the parameter of 
{{TestProcessingTimeService.setCurrentTime(long)}}, but this doesn't work in 
tests that invoke {{setCurrentTime(long)}} several times.
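A minimal sketch of the adjusted check (only an illustration of the strict 
comparison described above, not a tested patch):

{code:java}
// Only fire timers whose timestamp is strictly smaller than the new test time,
// so a timer registered for timestamp T fires once the clock has advanced past
// T, matching the 1ms delay introduced by FLINK-9857.
while (!priorityQueue.isEmpty() && currentTime > priorityQueue.peek().f0) {
    ..
}
{code}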




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)

2021-08-09 Thread 青雉(祁明良)
Hi all,

This is Mingliang, a machine learning engineer in the recommendation area.
I see there is a discussion about “heterogeneous topologies in training and 
inference.” This is actually a very common case in recommendation systems, 
especially in CTR prediction tasks. For the training task, the data is usually 
compressed and optimized for storage, so additional feature extraction and 
transformation are applied before the training task. For online inference, the 
data is usually already well prepared by the recommendation engine and 
optimized for latency.

Best,
Mingliang

> On Aug 6, 2021, at 3:55 PM, Becket Qin  wrote:
>
> Hi Zhipeng,
>
> Yes, I agree that the key difference between the two options is how they
> support MIMO.
>
> My main concern for option 2 is potential inconsistent availability of
> algorithms in the two sets of API. In order to make an algorithm available
> to both sets of API, people have to implement the same algorithm with two
> different APIs. And sometimes an algorithm available as AlgoOps may not
> exist as a Transformer. This seems pretty confusing to the users.
>
> Therefore, personally speaking, I am in favor of one set of API. Also I
> feel that the MIMO extension of the Transformer/Estimator API is still
> quite intuitive. However, I understand that others may think differently.
> So it would be good to see the opinion from more people.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jul 20, 2021 at 7:10 PM Zhipeng Zhang 
> wrote:
>
>> Hi Becket,
>>
>> Thanks for the review! I totally agree that it would be easier for people
>> to discuss if we can list the fundamental difference between these two
>> proposals. (So I want to make the discussion even shorter)
>>
>> In my opinion, the fundamental difference between proposal-1 and
>> proposal-2 is how they support the multi-input multi-output (MIMO) machine
>> learning algorithms.
>>
>> Proposal-1 supports MIMO by extending Transformer/Estimator/Pipeline to
>> take multiple inputs and output multiple outputs.
>>
>> Proposal-2 does not change the definition of
>> Transformer/Estimator/Pipeline. Rather, to support MIMO it comes up with a
>> new abstraction --- AlgoOperator, which is essentially an abstraction for
>> machine learning functions. That is, proposal-2 employs well-recognized
>> Transformer/Estimator/Pipeline to support single-input single-output (SISO)
>> machine learning algorithms and AlgoOperator to support MIMO.
>>
>> In my opinion, the benefit of proposal-1 is that there is only one set of
>> API and it is clean. However, it breaks the user habits (SISO for
>> Transformer/Estimator/Pipeline). Users have to think more than before when
>> using the new Transformer/Estimator/Pipeline. [1]
>>
>> The benefit of proposal-2 is that it does not change anything of the
>> well-recognized Transformer/Estimator/Pipeline and existing users (e.g.,
>> Spark MLlib users) would be happy.
>> However, as you mentioned, proposal-2 introduces a new abstraction
>> (AlgoOperator), which may increase the burden for understanding.
>>
>> [1]
>> https://docs.google.com/document/d/1L3aI9LjkcUPoM52liEY6uFktMnFMNFQ6kXAjnz_11do/edit#heading=h.c2qr9r64btd9
>>
>>
>> Thanks,
>>
>> Zhipeng Zhang
>>
>> On Tue, Jul 20, 2021 at 11:42 AM, Becket Qin wrote:
>>
>>> Hi Dong, Zhipeng and Fan,
>>>
>>> Thanks for the detailed proposals. It is quite a lot of reading! Given
>>> that we are introducing a lot of stuff here, I find that it might be easier
>>> for people to discuss if we can list the fundamental differences first.
>>> From what I understand, the very fundamental difference between the two
>>> proposals is following:
>>>
>>> * In order to support graph structure, do we extend
>>> Transformer/Estimator, or do we introduce a new set of API? *
>>>
>>> Proposal 1 tries to keep one set of API, which is based on
>>> Transformer/Estimator/Pipeline. More specifically, it does the following:
>>>- Make Transformer and Estimator multi-input and multi-output (MIMO).
>>>- Introduce a Graph/GraphModel as counter parts of
>>> Pipeline/PipelineModel.
>>>
>>> Proposal 2 leaves the existing Transformer/Estimator/Pipeline as is.
>>> Instead, it introduces AlgoOperators to support the graph structure. The
>>> AlgoOperators are general-purpose graph nodes supporting MIMO. They are
>>> independent of Pipeline / Transformer / Estimator.
>>>
>>>
>>> My two cents:
>>>
>>> I think it is a big advantage to have a single set of API rather than two
>>> independent sets of API, if possible. But I would suggest we change the
>>> current proposal 1 a little bit, by learning from proposal 2.
>>>
>>> What I like about proposal 1:
>>> 1. A single set of API, symmetric in Graph/GraphModel and
>>> Pipeline/PipelineModel.
>>> 2. Keeping most of the benefits from Transformer/Estimator, including the
>>> fit-then-transform relation and save/load capability.
>>>
>>> However, proposal 1 also introduced some changes that I am not sure about:
>>>
>>> 1. The most controversial part of proposal 1 is whether 

[jira] [Created] (FLINK-23688) Make JaasModule.getAppConfigurationEntries public

2021-08-09 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-23688:
-

 Summary: Make JaasModule.getAppConfigurationEntries public
 Key: FLINK-23688
 URL: https://issues.apache.org/jira/browse/FLINK-23688
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hadoop Compatibility
Affects Versions: 1.14.0
Reporter: Gabor Somogyi


Kerberos authentication handler (external project) needs it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-180: Adjust StreamStatus and Idleness definition

2021-08-09 Thread Till Rohrmann
+1 (binding)

Cheers,
Till

On Thu, Aug 5, 2021 at 9:09 PM Arvid Heise  wrote:

> Dear devs,
>
> I'd like to open a vote on FLIP-180: Adjust StreamStatus and Idleness
> definition [1] which was discussed in this thread [2].
> The vote will be open for at least 72 hours unless there is an objection or
> not enough votes.
>
> Best,
>
> Arvid
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-180%3A+Adjust+StreamStatus+and+Idleness+definition
> [2]
>
> https://lists.apache.org/thread.html/r8357d64b9cfdf5a233c53a20d9ac62b75c07c925ce2c43e162f1e39c%40%3Cdev.flink.apache.org%3E
>


[jira] [Created] (FLINK-23687) Add Sql query hint to enable LookupJoin shuffle by join key of left input

2021-08-09 Thread JING ZHANG (Jira)
JING ZHANG created FLINK-23687:
--

 Summary: Add Sql query hint to enable LookupJoin shuffle by join 
key of left input
 Key: FLINK-23687
 URL: https://issues.apache.org/jira/browse/FLINK-23687
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: JING ZHANG


Add Sql query hint to enable LookupJoin shuffle by join key of left input



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Flink 1.13.2 released

2021-08-09 Thread Yu Li
Thanks Yun Tang for being our release manager and everyone else who made
the release possible!

Best Regards,
Yu


On Fri, 6 Aug 2021 at 13:52, Yun Tang  wrote:

>
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.13.2, which is the second bugfix release for the Apache Flink 1.13
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2021/08/06/release-1.13.2.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218==12315522
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Yun Tang
>


[jira] [Created] (FLINK-23686) KafkaSource metric "commitsSucceeded" should count per-commit instead of per-partition

2021-08-09 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-23686:
-

 Summary: KafkaSource metric "commitsSucceeded" should count 
per-commit instead of per-partition
 Key: FLINK-23686
 URL: https://issues.apache.org/jira/browse/FLINK-23686
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.13.2, 1.14.0
Reporter: Qingsheng Ren
 Fix For: 1.14.0, 1.13.3


Currently, if a successful offset commit includes multiple topic partitions 
(let's say 4), the counter will increase by 4 instead of 1.
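An illustrative sketch of the intended behavior (this is not the actual 
KafkaSource code; the callback shape is an assumption, and only the 
{{commitsSucceeded}} counter name comes from this ticket):

{code:java}
import java.util.Map;

import org.apache.flink.metrics.Counter;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

class CommitMetricsSketch {

    private final Counter commitsSucceeded;

    CommitMetricsSketch(Counter commitsSucceeded) {
        this.commitsSucceeded = commitsSucceeded;
    }

    void onCommitSuccess(Map<TopicPartition, OffsetAndMetadata> committedOffsets) {
        // Per-partition counting (the current behavior described above) would be:
        //   committedOffsets.keySet().forEach(tp -> commitsSucceeded.inc());
        // Per-commit counting increments exactly once per successful commit:
        commitsSucceeded.inc();
    }
}
{code}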



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Looking for Maintainers for Flink on YARN

2021-08-09 Thread Miklos Gergely
Hi All,

Just arrived back from holiday and seen this thread.
Aug 11 works for me too, Till please add me to the meeting too.

Thanks
Miklos

On Thu, Aug 5, 2021 at 4:55 PM Till Rohrmann  wrote:

> Great to hear from everybody. I've created an invite for Thu, Aug 11, 9am
> CEST and invited Gabor and Yangze. Let me know if somebody else wants to
> join as well. I'll add her then to the calendar invite.
>
> Cheers,
> Till
>
> On Thu, Aug 5, 2021 at 3:50 AM Yangze Guo  wrote:
>
> > Thanks Konstantin for bringing this up, and thanks Marton and Gabor
> > for stepping up!
> >
> > Both dates work for me. If there is something our team can help with
> > for the kick-off, please let me know.
> >
> > Best,
> > Yangze Guo
> >
> > On Thu, Aug 5, 2021 at 9:39 AM Xintong Song 
> wrote:
> > >
> > > Yangze will join the kick-off on behalf of our team.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Aug 4, 2021 at 8:36 PM Gabor Somogyi <
> gabor.g.somo...@gmail.com>
> > wrote:
> > >>
> > >> Both are good to me.
> > >>
> > >> G
> > >>
> > >>
> > >> On Wed, Aug 4, 2021 at 9:30 AM Konstantin Knauf 
> > wrote:
> > >>
> > >> > Hi everyone
> > >> >
> > >> > Thank you Marton & team for stepping up and thanks, Xintong, for
> > offering
> > >> > your continued support. I'd like to leave this thread "open" for a
> bit
> > >> > longer so that others can continue to chime in. In the meantime, I
> > would
> > >> > already like to propose two dates for an informal kick-off:
> > >> >
> > >> > * Thu, Aug 11, 9am CEST
> > >> > * Fri, Aug 12, 9am CEST
> > >> >
> > >> > Would any of these dates work for you? As an agenda we propose
> > something
> > >> > along the lines of:
> > >> >
> > >> > * Introductions
> > >> > * Architecture Introduction by Till
> > >> > * Handover of Ongoing Tickets
> > >> > * Open Questions
> > >> >
> > >> > As I am out of office for a month starting tomorrow, Till will take
> > this
> > >> > thread over from hereon.
> > >> >
> > >> > Best,
> > >> >
> > >> > Konstantin
> > >> >
> > >> > Cheers,
> > >> >
> > >> > Konstantin
> > >> >
> > >> >
> > >> > On Mon, Aug 2, 2021 at 9:36 AM Gabor Somogyi <
> > gabor.g.somo...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > Just arrived back from holiday and seen this thread.
> > >> > > I'm working with Marton and you can ping me directly on jira/PR in
> > need.
> > >> > >
> > >> > > BR,
> > >> > > G
> > >> > >
> > >> > >
> > >> > > On Mon, Aug 2, 2021 at 5:02 AM Xintong Song <
> tonysong...@gmail.com>
> > >> > wrote:
> > >> > >
> > >> > > > Thanks Konstantin for bringing this up, and thanks Marton and
> > your team
> > >> > > for
> > >> > > > volunteering.
> > >> > > >
> > >> > > > @Marton,
> > >> > > > Our team at Alibaba will keep helping with the Flink on YARN
> > >> > maintenance.
> > >> > > > However, as Konstaintin said, the time our team dedicated to
> this
> > will
> > >> > > > probably be less than it used to be. Your offer of help is
> > significant
> > >> > > and
> > >> > > > greatly appreciated. Please feel free to reach out to us if you
> > need
> > >> > any
> > >> > > > help on this.
> > >> > > >
> > >> > > > Thank you~
> > >> > > >
> > >> > > > Xintong Song
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Jul 29, 2021 at 5:04 PM Márton Balassi <
> > >> > balassi.mar...@gmail.com
> > >> > > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Konstantin,
> > >> > > > >
> > >> > > > > Thank you for raising this topic, our development team at
> > Cloudera
> > >> > > would
> > >> > > > be
> > >> > > > > happy to step up to address this responsibility.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Marton
> > >> > > > >
> > >> > > > > On Thu, Jul 29, 2021 at 10:15 AM Konstantin Knauf <
> > kna...@apache.org
> > >> > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Dear community,
> > >> > > > > >
> > >> > > > > > We are looking for community members, who would like to
> > maintain
> > >> > > > Flink's
> > >> > > > > > YARN support going forward. So far, this has been handled by
> > teams
> > >> > at
> > >> > > > > > Ververica & Alibaba. The focus of these teams has shifted
> > over the
> > >> > > past
> > >> > > > > > months so that we only have little time left for this topic.
> > Still,
> > >> > > we
> > >> > > > > > think, it is important to maintain high quality support for
> > Flink
> > >> > on
> > >> > > > > YARN.
> > >> > > > > >
> > >> > > > > > What does "Maintaining Flink on YARN" mean? There are no
> known
> > >> > bigger
> > >> > > > > > efforts outstanding. We are mainly talking about addressing
> > >> > > > > > "test-stability" issues, bugs, version upgrades, community
> > >> > > > contributions
> > >> > > > > &
> > >> > > > > > smaller feature requests. The prioritization of these would
> > be up
> > >> > to
> > >> > > > the
> > >> > > > > > future maintainers, except "test-stability" issues which are
> > >> > > important
> > >> > > > to
> > >> > > > > > address for