[GitHub] [flink-ml] weibozhao commented on pull request #198: [FLINK-30568] Add benchmark for PolyNomialExpansion, Normalizer, Binarizer, Interaction, MaxAbsScaler, VectorSlicer, ElementWiseProduct an

2023-01-05 Thread GitBox


weibozhao commented on PR #198:
URL: https://github.com/apache/flink-ml/pull/198#issuecomment-1373311087

   > Can you rebase the PR with the latest master branch?
   
   done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-29427) LookupJoinITCase failed with classloader problem

2023-01-05 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655283#comment-17655283
 ] 

Matthias Pohl commented on FLINK-29427:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44526&view=logs&j=f2c100be-250b-5e85-7bbe-176f68fcddc5&t=05efd11e-5400-54a4-0d27-a4663be008a9&l=17619

> LookupJoinITCase failed with classloader problem
> 
>
> Key: FLINK-29427
> URL: https://issues.apache.org/jira/browse/FLINK-29427
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Assignee: Alexander Smirnov
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-09-27T02:49:20.9501313Z Sep 27 02:49:20 Caused by: 
> org.codehaus.janino.InternalCompilerException: Compiling 
> "KeyProjection$108341": Trying to access closed classloader. Please check if 
> you store classloaders directly or indirectly in static fields. If the 
> stacktrace suggests that the leak occurs in a third party library and cannot 
> be fixed immediately, you can disable this check with the configuration 
> 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9502654Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382)
> 2022-09-27T02:49:20.9503366Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237)
> 2022-09-27T02:49:20.9504044Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465)
> 2022-09-27T02:49:20.9504704Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:216)
> 2022-09-27T02:49:20.9505341Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207)
> 2022-09-27T02:49:20.9505965Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> 2022-09-27T02:49:20.9506584Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:75)
> 2022-09-27T02:49:20.9507261Z Sep 27 02:49:20  at 
> org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:104)
> 2022-09-27T02:49:20.9507883Z Sep 27 02:49:20  ... 30 more
> 2022-09-27T02:49:20.9509266Z Sep 27 02:49:20 Caused by: 
> java.lang.IllegalStateException: Trying to access closed classloader. Please 
> check if you store classloaders directly or indirectly in static fields. If 
> the stacktrace suggests that the leak occurs in a third party library and 
> cannot be fixed immediately, you can disable this check with the 
> configuration 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9510835Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
> 2022-09-27T02:49:20.9511760Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:192)
> 2022-09-27T02:49:20.9512456Z Sep 27 02:49:20  at 
> java.lang.Class.forName0(Native Method)
> 2022-09-27T02:49:20.9513014Z Sep 27 02:49:20  at 
> java.lang.Class.forName(Class.java:348)
> 2022-09-27T02:49:20.9513649Z Sep 27 02:49:20  at 
> org.codehaus.janino.ClassLoaderIClassLoader.findIClass(ClassLoaderIClassLoader.java:89)
> 2022-09-27T02:49:20.9514339Z Sep 27 02:49:20  at 
> org.codehaus.janino.IClassLoader.loadIClass(IClassLoader.java:312)
> 2022-09-27T02:49:20.9514990Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:8556)
> 2022-09-27T02:49:20.9515659Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6749)
> 2022-09-27T02:49:20.9516337Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6594)
> 2022-09-27T02:49:20.9516989Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6573)
> 2022-09-27T02:49:20.9517632Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.access$13900(UnitCompiler.java:215)
> 2022-09-27T02:49:20.9518319Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6481)
> 2022-09-27T02:49:20.9519018Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9519680Z Sep 27 02:49:20  at 
> org.codehaus.janino.Java$ReferenceType.accept(Java.java:3928)
> 2022-09-27T02:49:20.9520386Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9521042Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6469)
> 2022-09-27T02:49:20.9521677Z Sep 27 02:49:20  at 
> 
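
The check named in the stack trace is controlled by the
'classloader.check-leaked-classloader' option quoted in the message. As a
debugging aid only, a minimal sketch of disabling it for a local run (the
option key is taken verbatim from the exception text; everything else here is
illustrative):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DisableLeakCheckSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Option key quoted verbatim from the exception message above.
        conf.setString("classloader.check-leaked-classloader", "false");
        // Local environment that picks up the configuration.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);
        env.fromElements(1, 2, 3).print();
        env.execute("leak-check-disabled-sketch");
    }
}
{code}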

[jira] [Comment Edited] (FLINK-29427) LookupJoinITCase failed with classloader problem

2023-01-05 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655279#comment-17655279
 ] 

Matthias Pohl edited comment on FLINK-29427 at 1/6/23 7:53 AM:
---

Same build, 2 jobs failed:
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44525&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=20486
* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44525&view=logs&j=f2c100be-250b-5e85-7bbe-176f68fcddc5&t=05efd11e-5400-54a4-0d27-a4663be008a9&l=20741


was (Author: mapohl):
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44525&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=20486

> LookupJoinITCase failed with classloader problem
> 
>
> Key: FLINK-29427
> URL: https://issues.apache.org/jira/browse/FLINK-29427
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Assignee: Alexander Smirnov
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-09-27T02:49:20.9501313Z Sep 27 02:49:20 Caused by: 
> org.codehaus.janino.InternalCompilerException: Compiling 
> "KeyProjection$108341": Trying to access closed classloader. Please check if 
> you store classloaders directly or indirectly in static fields. If the 
> stacktrace suggests that the leak occurs in a third party library and cannot 
> be fixed immediately, you can disable this check with the configuration 
> 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9502654Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382)
> 2022-09-27T02:49:20.9503366Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237)
> 2022-09-27T02:49:20.9504044Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465)
> 2022-09-27T02:49:20.9504704Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:216)
> 2022-09-27T02:49:20.9505341Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207)
> 2022-09-27T02:49:20.9505965Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> 2022-09-27T02:49:20.9506584Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:75)
> 2022-09-27T02:49:20.9507261Z Sep 27 02:49:20  at 
> org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:104)
> 2022-09-27T02:49:20.9507883Z Sep 27 02:49:20  ... 30 more
> 2022-09-27T02:49:20.9509266Z Sep 27 02:49:20 Caused by: 
> java.lang.IllegalStateException: Trying to access closed classloader. Please 
> check if you store classloaders directly or indirectly in static fields. If 
> the stacktrace suggests that the leak occurs in a third party library and 
> cannot be fixed immediately, you can disable this check with the 
> configuration 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9510835Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
> 2022-09-27T02:49:20.9511760Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:192)
> 2022-09-27T02:49:20.9512456Z Sep 27 02:49:20  at 
> java.lang.Class.forName0(Native Method)
> 2022-09-27T02:49:20.9513014Z Sep 27 02:49:20  at 
> java.lang.Class.forName(Class.java:348)
> 2022-09-27T02:49:20.9513649Z Sep 27 02:49:20  at 
> org.codehaus.janino.ClassLoaderIClassLoader.findIClass(ClassLoaderIClassLoader.java:89)
> 2022-09-27T02:49:20.9514339Z Sep 27 02:49:20  at 
> org.codehaus.janino.IClassLoader.loadIClass(IClassLoader.java:312)
> 2022-09-27T02:49:20.9514990Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:8556)
> 2022-09-27T02:49:20.9515659Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6749)
> 2022-09-27T02:49:20.9516337Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6594)
> 2022-09-27T02:49:20.9516989Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6573)
> 2022-09-27T02:49:20.9517632Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.access$13900(UnitCompiler.java:215)
> 2022-09-27T02:49:20.9518319Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6481)
> 2022-09-27T02:49:20.9519018Z Sep 27 02:49:20  at 
> 

[jira] [Commented] (FLINK-28818) KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee failed with AssertionFailedError

2023-01-05 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655281#comment-17655281
 ] 

Matthias Pohl commented on FLINK-28818:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44525&view=logs&j=fa307d6d-91b1-5ab6-d460-ef50f552b1fe&t=21eae189-b04c-5c04-662b-17dc80ffc83a&l=37447

> KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee failed with 
> AssertionFailedError
> 
>
> Key: FLINK-28818
> URL: https://issues.apache.org/jira/browse/FLINK-28818
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Priority: Major
>  Labels: test-stability
>
> {code:java}
> 2022-08-04T13:31:52.8933185Z Aug 04 13:31:52 [ERROR] 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee
>   Time elapsed: 4.146 s  <<< FAILURE!
> 2022-08-04T13:31:52.8933887Z Aug 04 13:31:52 
> org.opentest4j.AssertionFailedError: 
> 2022-08-04T13:31:52.8936215Z Aug 04 13:31:52 
> 2022-08-04T13:31:52.8936766Z Aug 04 13:31:52 expected: 1664L
> 2022-08-04T13:31:52.8937362Z Aug 04 13:31:52  but was: 1858L
> 2022-08-04T13:31:52.8937955Z Aug 04 13:31:52  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 2022-08-04T13:31:52.8938814Z Aug 04 13:31:52  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 2022-08-04T13:31:52.8939622Z Aug 04 13:31:52  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 2022-08-04T13:31:52.8940522Z Aug 04 13:31:52  at 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase.writeRecordsToKafka(KafkaSinkITCase.java:399)
> 2022-08-04T13:31:52.8941489Z Aug 04 13:31:52  at 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee(KafkaSinkITCase.java:213)
> 2022-08-04T13:31:52.8942418Z Aug 04 13:31:52  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-04T13:31:52.8943081Z Aug 04 13:31:52  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-04T13:31:52.8944125Z Aug 04 13:31:52  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-04T13:31:52.8944851Z Aug 04 13:31:52  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-04T13:31:52.8945519Z Aug 04 13:31:52  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2022-08-04T13:31:52.8946254Z Aug 04 13:31:52  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2022-08-04T13:31:52.8947307Z Aug 04 13:31:52  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2022-08-04T13:31:52.8948048Z Aug 04 13:31:52  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2022-08-04T13:31:52.8948777Z Aug 04 13:31:52  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2022-08-04T13:31:52.8949511Z Aug 04 13:31:52  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2022-08-04T13:31:52.8950206Z Aug 04 13:31:52  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 2022-08-04T13:31:52.8950885Z Aug 04 13:31:52  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 2022-08-04T13:31:52.8951656Z Aug 04 13:31:52  at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> 2022-08-04T13:31:52.8952421Z Aug 04 13:31:52  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 2022-08-04T13:31:52.8953092Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 2022-08-04T13:31:52.8953802Z Aug 04 13:31:52  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 2022-08-04T13:31:52.8954499Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 2022-08-04T13:31:52.8955195Z Aug 04 13:31:52  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 2022-08-04T13:31:52.8955939Z Aug 04 13:31:52  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 2022-08-04T13:31:52.8956600Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 2022-08-04T13:31:52.8957229Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 2022-08-04T13:31:52.8957880Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 

[jira] [Updated] (FLINK-28818) KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee failed with AssertionFailedError

2023-01-05 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-28818:
--
Affects Version/s: 1.17.0

> KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee failed with 
> AssertionFailedError
> 
>
> Key: FLINK-28818
> URL: https://issues.apache.org/jira/browse/FLINK-28818
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Priority: Major
>  Labels: test-stability
>
> {code:java}
> 2022-08-04T13:31:52.8933185Z Aug 04 13:31:52 [ERROR] 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee
>   Time elapsed: 4.146 s  <<< FAILURE!
> 2022-08-04T13:31:52.8933887Z Aug 04 13:31:52 
> org.opentest4j.AssertionFailedError: 
> 2022-08-04T13:31:52.8936215Z Aug 04 13:31:52 
> 2022-08-04T13:31:52.8936766Z Aug 04 13:31:52 expected: 1664L
> 2022-08-04T13:31:52.8937362Z Aug 04 13:31:52  but was: 1858L
> 2022-08-04T13:31:52.8937955Z Aug 04 13:31:52  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 2022-08-04T13:31:52.8938814Z Aug 04 13:31:52  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 2022-08-04T13:31:52.8939622Z Aug 04 13:31:52  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 2022-08-04T13:31:52.8940522Z Aug 04 13:31:52  at 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase.writeRecordsToKafka(KafkaSinkITCase.java:399)
> 2022-08-04T13:31:52.8941489Z Aug 04 13:31:52  at 
> org.apache.flink.connector.kafka.sink.KafkaSinkITCase.testWriteRecordsToKafkaWithExactlyOnceGuarantee(KafkaSinkITCase.java:213)
> 2022-08-04T13:31:52.8942418Z Aug 04 13:31:52  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-04T13:31:52.8943081Z Aug 04 13:31:52  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-04T13:31:52.8944125Z Aug 04 13:31:52  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-04T13:31:52.8944851Z Aug 04 13:31:52  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-04T13:31:52.8945519Z Aug 04 13:31:52  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2022-08-04T13:31:52.8946254Z Aug 04 13:31:52  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2022-08-04T13:31:52.8947307Z Aug 04 13:31:52  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2022-08-04T13:31:52.8948048Z Aug 04 13:31:52  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2022-08-04T13:31:52.8948777Z Aug 04 13:31:52  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2022-08-04T13:31:52.8949511Z Aug 04 13:31:52  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2022-08-04T13:31:52.8950206Z Aug 04 13:31:52  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 2022-08-04T13:31:52.8950885Z Aug 04 13:31:52  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 2022-08-04T13:31:52.8951656Z Aug 04 13:31:52  at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> 2022-08-04T13:31:52.8952421Z Aug 04 13:31:52  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 2022-08-04T13:31:52.8953092Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 2022-08-04T13:31:52.8953802Z Aug 04 13:31:52  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 2022-08-04T13:31:52.8954499Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 2022-08-04T13:31:52.8955195Z Aug 04 13:31:52  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 2022-08-04T13:31:52.8955939Z Aug 04 13:31:52  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 2022-08-04T13:31:52.8956600Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 2022-08-04T13:31:52.8957229Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 2022-08-04T13:31:52.8957880Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 2022-08-04T13:31:52.8958531Z Aug 04 13:31:52  at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> 2022-08-04T13:31:52.8959336Z Aug 04 13:31:52  at 
> 

[jira] [Commented] (FLINK-29594) RMQSourceITCase.testMessageDelivery timed out

2023-01-05 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655280#comment-17655280
 ] 

Matthias Pohl commented on FLINK-29594:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44525&view=logs&j=fc7981dc-d266-55b0-5fff-f0d0a2294e36&t=1a9b228a-3e0e-598f-fc81-c321539dfdbf&l=37809

> RMQSourceITCase.testMessageDelivery timed out
> -
>
> Key: FLINK-29594
> URL: https://issues.apache.org/jira/browse/FLINK-29594
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / RabbitMQ
>Affects Versions: 1.17.0, rabbitmq-3.0.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> [This 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41843&view=logs&j=fc7981dc-d266-55b0-5fff-f0d0a2294e36&t=1a9b228a-3e0e-598f-fc81-c321539dfdbf&l=38211]
>  failed (not exclusively) due to {{RMQSourceITCase.testMessageDelivery}} 
> timing out.
> I wasn't able to reproduce it locally with 200 test runs.
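
A common way to chase such a flaky test is to rerun it in a loop locally (a
sketch; it assumes Maven Surefire's standard {{-Dtest=Class#method}} filter,
run from the RabbitMQ connector module):

{code}
# Rerun the suspect test repeatedly, stopping at the first failure.
for i in $(seq 1 200); do
  mvn -q test -Dtest='RMQSourceITCase#testMessageDelivery' || break
done
{code}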



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] lindong28 commented on a diff in pull request #197: [FLINK-30566] Add benchmark for AgglomerativeClustering, HashingTF, IDF, KBinsDiscretizer, LinearRegression, LinearSVC, Logi

2023-01-05 Thread GitBox


lindong28 commented on code in PR #197:
URL: https://github.com/apache/flink-ml/pull/197#discussion_r1063213303


##
flink-ml-benchmark/src/main/resources/linearregression-benchmark.json:
##
@@ -0,0 +1,45 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+{
+  "version": 1,
+  "linearregression": {
+"stage": {
+  "className": 
"org.apache.flink.ml.regression.linearregression.LinearRegression",
+  "paramMap": {
+"featuresCol": "features",

Review Comment:
   Would it be simpler (and more consistent with existing benchmark configs) to 
skip specifying the default values for `featuresCol`?
   
   Same for `labelCol`, `weightCol`, `inputCol` and `outputCol`.
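   
   For illustration, the trimmed config would then reduce to roughly the
following (a sketch only: it keeps just the fields visible in the quoted hunk,
relies on the omitted parameters falling back to their defaults, and leaves the
rest of the file unchanged):

```json
{
  "version": 1,
  "linearregression": {
    "stage": {
      "className": "org.apache.flink.ml.regression.linearregression.LinearRegression",
      "paramMap": {}
    }
  }
}
```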






[jira] [Commented] (FLINK-29427) LookupJoinITCase failed with classloader problem

2023-01-05 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655279#comment-17655279
 ] 

Matthias Pohl commented on FLINK-29427:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44525&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=20486

> LookupJoinITCase failed with classloader problem
> 
>
> Key: FLINK-29427
> URL: https://issues.apache.org/jira/browse/FLINK-29427
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Assignee: Alexander Smirnov
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-09-27T02:49:20.9501313Z Sep 27 02:49:20 Caused by: 
> org.codehaus.janino.InternalCompilerException: Compiling 
> "KeyProjection$108341": Trying to access closed classloader. Please check if 
> you store classloaders directly or indirectly in static fields. If the 
> stacktrace suggests that the leak occurs in a third party library and cannot 
> be fixed immediately, you can disable this check with the configuration 
> 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9502654Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382)
> 2022-09-27T02:49:20.9503366Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237)
> 2022-09-27T02:49:20.9504044Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465)
> 2022-09-27T02:49:20.9504704Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:216)
> 2022-09-27T02:49:20.9505341Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207)
> 2022-09-27T02:49:20.9505965Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> 2022-09-27T02:49:20.9506584Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:75)
> 2022-09-27T02:49:20.9507261Z Sep 27 02:49:20  at 
> org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:104)
> 2022-09-27T02:49:20.9507883Z Sep 27 02:49:20  ... 30 more
> 2022-09-27T02:49:20.9509266Z Sep 27 02:49:20 Caused by: 
> java.lang.IllegalStateException: Trying to access closed classloader. Please 
> check if you store classloaders directly or indirectly in static fields. If 
> the stacktrace suggests that the leak occurs in a third party library and 
> cannot be fixed immediately, you can disable this check with the 
> configuration 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9510835Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
> 2022-09-27T02:49:20.9511760Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:192)
> 2022-09-27T02:49:20.9512456Z Sep 27 02:49:20  at 
> java.lang.Class.forName0(Native Method)
> 2022-09-27T02:49:20.9513014Z Sep 27 02:49:20  at 
> java.lang.Class.forName(Class.java:348)
> 2022-09-27T02:49:20.9513649Z Sep 27 02:49:20  at 
> org.codehaus.janino.ClassLoaderIClassLoader.findIClass(ClassLoaderIClassLoader.java:89)
> 2022-09-27T02:49:20.9514339Z Sep 27 02:49:20  at 
> org.codehaus.janino.IClassLoader.loadIClass(IClassLoader.java:312)
> 2022-09-27T02:49:20.9514990Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:8556)
> 2022-09-27T02:49:20.9515659Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6749)
> 2022-09-27T02:49:20.9516337Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6594)
> 2022-09-27T02:49:20.9516989Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6573)
> 2022-09-27T02:49:20.9517632Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.access$13900(UnitCompiler.java:215)
> 2022-09-27T02:49:20.9518319Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6481)
> 2022-09-27T02:49:20.9519018Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9519680Z Sep 27 02:49:20  at 
> org.codehaus.janino.Java$ReferenceType.accept(Java.java:3928)
> 2022-09-27T02:49:20.9520386Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9521042Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6469)
> 2022-09-27T02:49:20.9521677Z Sep 27 02:49:20  at 
> 

[GitHub] [flink-ml] zhipeng93 commented on pull request #198: [FLINK-30568] Add benchmark for PolyNomialExpansion, Normalizer, Binarizer, Interaction, MaxAbsScaler, VectorSlicer, ElementWiseProduct an

2023-01-05 Thread GitBox


zhipeng93 commented on PR #198:
URL: https://github.com/apache/flink-ml/pull/198#issuecomment-1373252824

   Can you rebase the PR with the latest master branch?
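
   For reference, a typical rebase flow for a PR like this (a sketch; remote
and branch names are placeholders, assuming `upstream` points at
apache/flink-ml):

```
git fetch upstream
git checkout <pr-branch>
git rebase upstream/master
git push --force-with-lease origin <pr-branch>
```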





[GitHub] [flink] fsk119 commented on a diff in pull request #20822: [FLINK-25447][table-planner] Fix batch query cannot generate plan when a sorted view into multi sinks

2023-01-05 Thread GitBox


fsk119 commented on code in PR #20822:
URL: https://github.com/apache/flink/pull/20822#discussion_r1063197587


##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/batch/sql/SortTest.scala:
##
@@ -46,6 +46,134 @@ class SortTest extends TableTestBase {
 util.verifyExecPlan("SELECT * FROM MyTable ORDER BY a DESC, b")
   }
 
+  @Test
+  def testSortedResultIntoMultiSinks(): Unit = {
+val env = util.tableEnv
+env.executeSql(s"""
+  |CREATE TABLE Source (
+  |  a int,
+  |  b bigint,
+  |  c string,
+  |  d string,
+  |  e string
+  | ) WITH (
+  |  'connector' = 'values',
+  |  'bounded' = 'true'
+  |)
+  |""".stripMargin)
+env.executeSql("""
+ |CREATE TABLE sink1 (
+ |   a int,
+ |   b bigint,
+ |   c string
+ |) WITH (
+ |   'connector' = 'filesystem',
+ |   'format' = 'testcsv',
+ |   'path' = '/tmp/test'
+ |)
+ |""".stripMargin)
+env.executeSql("""
+ |CREATE TABLE sink2 (
+ |   a int,
+ |   b bigint,
+ |   c string,
+ |   d string
+ |) WITH (
+ |   'connector' = 'filesystem',
+ |   'format' = 'testcsv',
+ |   'path' = '/tmp/test'
+ |)
+ |""".stripMargin)
+val stmtSet = env.createStatementSet()
+stmtSet.addInsertSql("INSERT INTO sink1 select a, b, c from Source order 
by d")
+stmtSet.addInsertSql("INSERT INTO sink2 select a, b, c, d from Source 
order by d")
+
+util.verifyExecPlan(stmtSet)
+  }
+
+  @Test
+  def testSortedViewResultIntoMultiSinks(): Unit = {

Review Comment:
   I think the test is more related to the {@link SubPlanReuseTest}. Could you 
move the test?



##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/batch/sql/SortTest.scala:
##
@@ -46,6 +46,134 @@ class SortTest extends TableTestBase {
 util.verifyExecPlan("SELECT * FROM MyTable ORDER BY a DESC, b")
   }
 
+  @Test
+  def testSortedResultIntoMultiSinks(): Unit = {
+val env = util.tableEnv
+env.executeSql(s"""
+  |CREATE TABLE Source (
+  |  a int,
+  |  b bigint,
+  |  c string,
+  |  d string,
+  |  e string
+  | ) WITH (
+  |  'connector' = 'values',
+  |  'bounded' = 'true'
+  |)
+  |""".stripMargin)
+env.executeSql("""
+ |CREATE TABLE sink1 (
+ |   a int,
+ |   b bigint,
+ |   c string
+ |) WITH (
+ |   'connector' = 'filesystem',
+ |   'format' = 'testcsv',
+ |   'path' = '/tmp/test'
+ |)
+ |""".stripMargin)
+env.executeSql("""
+ |CREATE TABLE sink2 (
+ |   a int,
+ |   b bigint,
+ |   c string,
+ |   d string
+ |) WITH (
+ |   'connector' = 'filesystem',
+ |   'format' = 'testcsv',
+ |   'path' = '/tmp/test'
+ |)
+ |""".stripMargin)
+val stmtSet = env.createStatementSet()
+stmtSet.addInsertSql("INSERT INTO sink1 select a, b, c from Source order 
by d")
+stmtSet.addInsertSql("INSERT INTO sink2 select a, b, c, d from Source 
order by d")
+
+util.verifyExecPlan(stmtSet)
+  }
+
+  @Test
+  def testSortedViewResultIntoMultiSinks(): Unit = {
+val env = util.tableEnv
+env.executeSql(s"""
+  |CREATE TABLE Source (
+  |  a int,
+  |  b bigint,
+  |  c string,
+  |  d string,
+  |  e string
+  | ) WITH (
+  |  'connector' = 'values',
+  |  'bounded' = 'true'
+  |)
+  |""".stripMargin)
+val query = "SELECT * FROM Source order by d"
+val table = env.sqlQuery(query)
+env.registerTable("SortedTable", table)
+env.executeSql("""
+ |CREATE TABLE sink1 (
+ |   a int,
+ |   b bigint,
+ |   c string
+

[jira] [Created] (FLINK-30585) Improve flame graph performance at subtask level

2023-01-05 Thread Rui Fan (Jira)
Rui Fan created FLINK-30585:
---

 Summary: Improve flame graph performance at subtask level
 Key: FLINK-30585
 URL: https://issues.apache.org/jira/browse/FLINK-30585
 Project: Flink
  Issue Type: Sub-task
Reporter: Rui Fan


After FLINK-30185, we can view the flame graph at the subtask level. However,
it always collects flame graphs for all subtasks.

We should collect the flame graph of a single subtask instead of all subtasks.





[jira] [Created] (FLINK-30584) Update the flame graph doc of subtask level

2023-01-05 Thread Rui Fan (Jira)
Rui Fan created FLINK-30584:
---

 Summary: Update the flame graph doc of subtask level
 Key: FLINK-30584
 URL: https://issues.apache.org/jira/browse/FLINK-30584
 Project: Flink
  Issue Type: Sub-task
Reporter: Rui Fan
 Fix For: 1.17.0


Update the flame graph doc of subtask level





[jira] [Updated] (FLINK-30185) Provide the flame graph to the subtask level

2023-01-05 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-30185:

Parent: FLINK-30583
Issue Type: Sub-task  (was: Improvement)

> Provide the flame graph to the subtask level
> 
>
> Key: FLINK-30185
> URL: https://issues.apache.org/jira/browse/FLINK-30185
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
> Attachments: image-2022-11-24-14-49-42-845.png, 
> image-2022-11-28-14-38-47-145.png, image-2022-11-28-14-48-20-462.png
>
>
> FLINK-13550 added support for CPU FlameGraphs in the web UI.
> As the Flink documentation mentions:
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/flame_graphs/#sampling-process
> {code:java}
> Note: Stack trace samples from all threads of an operator are combined 
> together. If a method call consumes 100% of the resources in one of the 
> parallel tasks but none in the others, the bottleneck might be obscured by 
> being averaged out.
> There are plans to address this limitation in the future by providing “drill 
> down” visualizations to the task level. {code}
>  
> The flame graph at the subtask level is very useful when a small number of 
> subtasks are bottlenecked, so we should provide the flame graph at the 
> subtask level.
>  
> !image-2022-11-24-14-49-42-845.png!
>  
>  





[jira] [Updated] (FLINK-30583) Provide the flame graph to the subtask level

2023-01-05 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-30583:

Issue Type: New Feature  (was: Bug)

> Provide the flame graph to the subtask level
> 
>
> Key: FLINK-30583
> URL: https://issues.apache.org/jira/browse/FLINK-30583
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: Rui Fan
>Priority: Major
> Fix For: 1.17.0
>
>
> This is an umbrella Jira about providing the flame graph to the subtask level.





[GitHub] [flink-ml] weibozhao commented on pull request #192: [FLINK-30451] Add Estimator and Transformer for Swing

2023-01-05 Thread GitBox


weibozhao commented on PR #192:
URL: https://github.com/apache/flink-ml/pull/192#issuecomment-1373235362

   Thanks for the update. I left some comments below.





[jira] [Updated] (FLINK-30583) Provide the flame graph to the subtask level

2023-01-05 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-30583:

Description: This is an umbrella Jira about   (was: Provide the flame graph 
to the subtask level)

> Provide the flame graph to the subtask level
> 
>
> Key: FLINK-30583
> URL: https://issues.apache.org/jira/browse/FLINK-30583
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: Rui Fan
>Priority: Major
> Fix For: 1.17.0
>
>
> This is an umbrella Jira about 





[jira] [Updated] (FLINK-30583) Provide the flame graph to the subtask level

2023-01-05 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-30583:

Description: This is an umbrella Jira about providing the flame graph to the 
subtask level.  (was: This is an umbrella Jira about )

> Provide the flame graph to the subtask level
> 
>
> Key: FLINK-30583
> URL: https://issues.apache.org/jira/browse/FLINK-30583
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: Rui Fan
>Priority: Major
> Fix For: 1.17.0
>
>
> This is an umbrella Jira about providing the flame graph to the subtask level.





[jira] [Created] (FLINK-30583) Provide the flame graph to the subtask level

2023-01-05 Thread Rui Fan (Jira)
Rui Fan created FLINK-30583:
---

 Summary: Provide the flame graph to the subtask level
 Key: FLINK-30583
 URL: https://issues.apache.org/jira/browse/FLINK-30583
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST, Runtime / Web Frontend
Reporter: Rui Fan
 Fix For: 1.17.0


Provide the flame graph to the subtask level





[GitHub] [flink-ml] Fanoid commented on pull request #191: [FLINK-30401] Add Estimator and Transformer for MinHashLSH

2023-01-05 Thread GitBox


Fanoid commented on PR #191:
URL: https://github.com/apache/flink-ml/pull/191#issuecomment-1373231437

   @jiangxin369 Thanks again for your comments. I've updated the PR.





[GitHub] [flink-ml] Fanoid commented on a diff in pull request #191: [FLINK-30401] Add Estimator and Transformer for MinHashLSH

2023-01-05 Thread GitBox


Fanoid commented on code in PR #191:
URL: https://github.com/apache/flink-ml/pull/191#discussion_r1063193679


##
flink-ml-lib/src/test/java/org/apache/flink/ml/feature/MinHashLSHTest.java:
##
@@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.ml.feature.lsh.MinHashLSH;
+import org.apache.flink.ml.feature.lsh.MinHashLSHModel;
+import org.apache.flink.ml.feature.lsh.MinHashLSHModelData;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.SparseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.ml.util.TestUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import 
org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.Schema;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.collections.IteratorUtils;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.table.api.Expressions.$;
+
+/** Tests {@link MinHashLSH} and {@link MinHashLSHModel}. */
+public class MinHashLSHTest extends AbstractTestBase {
+@Rule public final TemporaryFolder tempFolder = new TemporaryFolder();
+private StreamExecutionEnvironment env;
+private StreamTableEnvironment tEnv;
+
+/**
+ * Default case for most tests.
+ *
+ * @return a tuple including the estimator, input data table, and output 
rows.
+ */
+private Tuple3<MinHashLSH, Table, List<Row>> getDefaultCase() {

Review Comment:
   I have updated this part in both Java and Python codes to make them 
consistent with other tests.






[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #460: [FLINK-30573] Fix bug that Table Store dedicated compact job may skip some records when checkpoint interval is long

2023-01-05 Thread GitBox


JingsongLi commented on code in PR #460:
URL: https://github.com/apache/flink-table-store/pull/460#discussion_r1063181702


##
flink-table-store-connector/src/main/java/org/apache/flink/table/store/connector/sink/FullChangelogStoreSinkWrite.java:
##
@@ -166,67 +191,83 @@ public List<Committable> prepareCommit(boolean 
doCompaction, long checkpointId)
 currentFirstWriteMs = null;
 }
 
-if (snapshotIdentifierToCheck == null // wait for last forced full 
compaction to complete
-&& !writtenBuckets.isEmpty() // there should be something to 
compact
+if (LOG.isDebugEnabled()) {
+for (Map.Entry>> 
checkpointIdAndBuckets :
+writtenBuckets.entrySet()) {
+LOG.debug(
+"Written buckets for checkpoint #{} are:", 
checkpointIdAndBuckets.getKey());
+for (Tuple2 bucket : 
checkpointIdAndBuckets.getValue()) {
+LOG.debug("  * partition {}, bucket {}", bucket.f0, 
bucket.f1);
+}
+}
+}
+
+if (!writtenBuckets.isEmpty() // there should be something to compact
 && System.currentTimeMillis() - 
firstWriteMs.firstEntry().getValue()
 >= fullCompactionThresholdMs // time without full 
compaction exceeds
 ) {
 doCompaction = true;
 }
 
 if (doCompaction) {
-snapshotIdentifierToCheck = checkpointId;
-Set> compactedBuckets = new 
HashSet<>();
-for (Set> buckets : 
writtenBuckets.values()) {
-for (Tuple2 bucket : buckets) {
-if (compactedBuckets.contains(bucket)) {
-continue;
-}
-compactedBuckets.add(bucket);
-try {
-write.compact(bucket.f0, bucket.f1, true);
-} catch (Exception e) {
-throw new RuntimeException(e);
-}
-}
+if (LOG.isDebugEnabled()) {
+LOG.debug("Submit full compaction for checkpoint #{}", 
checkpointId);
 }
+submitFullCompaction();
+commitIdentifiersToCheck.add(checkpointId);
 }
 
 return super.prepareCommit(doCompaction, checkpointId);
 }
 
-private Optional findSnapshot(long identifierToCheck) {
-// TODO We need a mechanism to do timeout recovery in case of snapshot 
expiration.
+private void checkSuccessfulFullCompaction() {
 SnapshotManager snapshotManager = table.snapshotManager();
 Long latestId = snapshotManager.latestSnapshotId();
 if (latestId == null) {
-return Optional.empty();
+return;
 }
-
 Long earliestId = snapshotManager.earliestSnapshotId();
 if (earliestId == null) {
-return Optional.empty();
+return;
 }
 
-// We must find the snapshot whose identifier is exactly 
`identifierToCheck`.
-// We can't just compare with the latest snapshot identifier by this 
user (like what we do
-// in `AbstractFileStoreWrite`), because even if the latest snapshot 
identifier is newer,
-// compact changes may still be discarded due to commit conflicts.
 for (long id = latestId; id >= earliestId; id--) {
 Snapshot snapshot = snapshotManager.snapshot(id);
-if (!snapshot.commitUser().equals(commitUser)) {
-continue;
-}
-if (snapshot.commitIdentifier() == identifierToCheck) {
-return Optional.of(snapshot);
-} else if (snapshot.commitIdentifier() < identifierToCheck) {
-// We're searching from new snapshots to old ones. So if we 
find an older
-// identifier, we can make sure our target snapshot will never 
occur.
-return Optional.empty();
+if (snapshot.commitUser().equals(commitUser)
+&& snapshot.commitKind() == Snapshot.CommitKind.COMPACT) {
+long commitIdentifier = snapshot.commitIdentifier();
+if (commitIdentifiersToCheck.contains(commitIdentifier)) {
+// we found a full compaction snapshot
+if (LOG.isDebugEnabled()) {
+LOG.debug(
+"Found full compaction snapshot #{} with 
identifier {}",
+id,
+commitIdentifier);
+}
+writtenBuckets.headMap(commitIdentifier, true).clear();
+firstWriteMs.headMap(commitIdentifier, true).clear();
+commitIdentifiersToCheck.headSet(commitIdentifier).clear();
+break;
+}
 }
 }
+}
 
-return Optional.empty();
+private 

[jira] [Assigned] (FLINK-30532) Add benchmark for DCT, SQLTransformer and StopWordsRemover algorithm

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-30532:


Assignee: Yunfeng Zhou

> Add benchmark for DCT, SQLTransformer and StopWordsRemover algorithm
> 
>
> Key: FLINK-30532
> URL: https://issues.apache.org/jira/browse/FLINK-30532
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Yunfeng Zhou
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>






[jira] [Closed] (FLINK-30532) Add benchmark for DCT, SQLTransformer and StopWordsRemover algorithm

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-30532.

Resolution: Fixed

> Add benchmark for DCT, SQLTransformer and StopWordsRemover algorithm
> 
>
> Key: FLINK-30532
> URL: https://issues.apache.org/jira/browse/FLINK-30532
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Yunfeng Zhou
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>






[GitHub] [flink-ml] lindong28 closed pull request #195: [FLINK-30532] Add benchmark for DCT, SQLTransformer and StopWordsRemover

2023-01-05 Thread GitBox


lindong28 closed pull request #195: [FLINK-30532] Add benchmark for DCT, 
SQLTransformer and StopWordsRemover
URL: https://github.com/apache/flink-ml/pull/195





[GitHub] [flink-ml] lindong28 commented on pull request #195: [FLINK-30532] Add benchmark for DCT, SQLTransformer and StopWordsRemover

2023-01-05 Thread GitBox


lindong28 commented on PR #195:
URL: https://github.com/apache/flink-ml/pull/195#issuecomment-1373214965

   Thanks for the update. LGTM.





[GitHub] [flink-ml] zhipeng93 merged pull request #189: [FLINK-30348] Add HasSeed param for RandomSplitter

2023-01-05 Thread GitBox


zhipeng93 merged PR #189:
URL: https://github.com/apache/flink-ml/pull/189





[GitHub] [flink-ml] Fanoid commented on pull request #191: [FLINK-30401] Add Estimator and Transformer for MinHashLSH

2023-01-05 Thread GitBox


Fanoid commented on PR #191:
URL: https://github.com/apache/flink-ml/pull/191#issuecomment-1373205202

   @jiangxin369 Thanks again for your comments. I've updated the PR.





[jira] [Closed] (FLINK-30249) TableUtils.getRowTypeInfo() creating wrong TypeInformation

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-30249.

Resolution: Fixed

> TableUtils.getRowTypeInfo() creating wrong TypeInformation
> --
>
> Key: FLINK-30249
> URL: https://issues.apache.org/jira/browse/FLINK-30249
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Affects Versions: ml-2.0.0, ml-2.1.0
>Reporter: Zhipeng Zhang
>Assignee: Zhipeng Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>






[GitHub] [flink-ml] lindong28 merged pull request #188: [FLINK-30249] Fix TableUtils.getRowTypeInfo by using ExternalTypeInfo.of()

2023-01-05 Thread GitBox


lindong28 merged PR #188:
URL: https://github.com/apache/flink-ml/pull/188





[GitHub] [flink-ml] lindong28 commented on pull request #188: [FLINK-30249] Fix TableUtils.getRowTypeInfo by using ExternalTypeInfo.of()

2023-01-05 Thread GitBox


lindong28 commented on PR #188:
URL: https://github.com/apache/flink-ml/pull/188#issuecomment-1373194232

   Thanks for the update! LGTM.





[GitHub] [flink] godfreyhe commented on a diff in pull request #21489: [FLINK-30365][table-planner] New dynamic partition pruning strategy to support more dpp patterns

2023-01-05 Thread GitBox


godfreyhe commented on code in PR #21489:
URL: https://github.com/apache/flink/pull/21489#discussion_r1063054685


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:
##
@@ -137,6 +140,20 @@ private static RelNode convertDppFactSide(
 || !(tableSource instanceof ScanTableSource)) {
 return rel;
 }
+
+// DPP cannot succeed if the source supports aggregate push-down,
+// aggregate push-down is enabled, and an aggregate was actually pushed down.
+if (tableSource instanceof SupportsAggregatePushDown
+&& ShortcutUtils.unwrapContext(rel)
+.getTableConfig()
+.get(
+OptimizerConfigOptions
+
.TABLE_OPTIMIZER_SOURCE_AGGREGATE_PUSHDOWN_ENABLED)
+&& Arrays.stream(tableSourceTable.abilitySpecs())

Review Comment:
   just need check 
`Arrays.stream(tableSourceTable.abilitySpecs()).anyMatch(spec -> spec 
instanceof AggregatePushDownSpec)`
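
   Spelled out, the suggested check would look roughly like this (a sketch, not
a drop-in patch; it reuses `tableSource`, `rel` and `tableSourceTable` from the
quoted hunk):

```java
// Skip DPP when aggregate push-down is enabled and an aggregate
// was actually pushed into this source.
if (tableSource instanceof SupportsAggregatePushDown
        && ShortcutUtils.unwrapContext(rel)
                .getTableConfig()
                .get(OptimizerConfigOptions
                        .TABLE_OPTIMIZER_SOURCE_AGGREGATE_PUSHDOWN_ENABLED)
        && Arrays.stream(tableSourceTable.abilitySpecs())
                .anyMatch(spec -> spec instanceof AggregatePushDownSpec)) {
    return rel;
}
```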



##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/optimize/program/FlinkDynamicPartitionPruningProgram.java:
##
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.optimize.program;
+
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.table.api.config.OptimizerConfigOptions;
+import 
org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalDynamicFilteringTableSourceScan;
+import 
org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalTableSourceScan;
+import org.apache.flink.table.planner.plan.utils.DefaultRelShuttle;
+import org.apache.flink.table.planner.utils.DynamicPartitionPruningUtils;
+import org.apache.flink.table.planner.utils.ShortcutUtils;
+
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinInfo;
+import org.apache.calcite.rel.core.JoinRelType;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Planner program that tries to do partition prune in the execution phase, 
which can translate a
+ * {@link BatchPhysicalTableSourceScan} to a {@link 
BatchPhysicalDynamicFilteringTableSourceScan}
+ * whose source is a partition source. The {@link
+ * OptimizerConfigOptions#TABLE_OPTIMIZER_DYNAMIC_FILTERING_ENABLED} need to 
be true.
+ *
+ * Suppose we have the original physical plan:
+ *
+ * {@code
+ * LogicalProject(...)
+ * HashJoin(joinType=[InnerJoin], where=[=(fact_partition_key, dim_key)], select=[xxx])
+ * :- TableSourceScan(table=[[fact]], fields=[xxx, fact_partition_key],) # Is a partition table.
+ * +- Exchange(distribution=[broadcast])
+ *    +- Calc(select=[xxx], where=[<(xxx, xxx)]) # Needs an arbitrary filter condition.
+ *       +- TableSourceScan(table=[[dim, filter=[]]], fields=[xxx, dim_key])
+ * }
+ *
+ * This physical plan will be rewritten to:
+ *
+ * {@code
+ * HashJoin(joinType=[InnerJoin], where=[=(fact_partition_key, dim_key)], select=[xxx])
+ * :- DynamicFilteringTableSourceScan(table=[[fact]], fields=[xxx, fact_partition_key]) # Is a partition table.
+ * :  +- DynamicFilteringDataCollector(fields=[dim_key])
+ * :     +- Calc(select=[xxx], where=[<(xxx, xxx)])
+ * :        +- TableSourceScan(table=[[dim, filter=[]]], fields=[xxx, dim_key])
+ * +- Exchange(distribution=[broadcast])
+ *    +- Calc(select=[xxx], where=[<(xxx, xxx)]) # Needs an arbitrary filter condition.
+ *       +- TableSourceScan(table=[[dim, filter=[]]], fields=[xxx, dim_key])
+ * }
+ *
+ * We use a {@link FlinkOptimizeProgram} instead of a
+ * {@link org.apache.calcite.plan.RelRule} to realize dynamic partition pruning
+ * because the {@link org.apache.calcite.plan.hep.HepPlanner} in Flink doesn't
+ * support matching a simple join, replacing one node on one side of the join
+ * and then rebuilding this join node. This is a defect of the existing
+ * optimizer, and it's matching

[jira] [Closed] (FLINK-30523) Refine benchmark of vectorAssembler

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-30523.

Resolution: Fixed

> Refine benchmark of vectorAssembler
> ---
>
> Key: FLINK-30523
> URL: https://issues.apache.org/jira/browse/FLINK-30523
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: weibo zhao
>Assignee: weibo zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Refine benchmark of vectorAssembler



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30523) Refine benchmark of vectorAssembler

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-30523:


Assignee: weibo zhao

> Refine benchmark of vectorAssembler
> ---
>
> Key: FLINK-30523
> URL: https://issues.apache.org/jira/browse/FLINK-30523
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: weibo zhao
>Assignee: weibo zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Refine benchmark of vectorAssembler



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] weibozhao commented on pull request #189: [FLINK-30348] Add HasSeed param for RandomSplitter

2023-01-05 Thread GitBox


weibozhao commented on PR #189:
URL: https://github.com/apache/flink-ml/pull/189#issuecomment-1373185876

   > @weibozhao Thanks for the update :) Can you also update the PR title as 
well as the commit message to make it more informative? (e.g., `Add HasSeed 
param for RandomSplitter`)
   
   done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-29603) Add Transformer for StopWordsRemover

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-29603:


Assignee: Yunfeng Zhou

> Add Transformer for StopWordsRemover
> 
>
> Key: FLINK-29603
> URL: https://issues.apache.org/jira/browse/FLINK-29603
> Project: Flink
>  Issue Type: New Feature
>  Components: Library / Machine Learning
>Affects Versions: ml-2.2.0
>Reporter: Yunfeng Zhou
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add Transformer for StopWordsRemover.
> Its function would be at least equivalent to Spark's 
> org.apache.spark.ml.feature.StopWordsRemover. The relevant PR should contain 
> the following components:
>  * Java implementation/test (Must include)
>  * Python implementation/test (Optional)
>  * Markdown document (Optional)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #460: [FLINK-30573] Fix bug that Table Store dedicated compact job may skip some records when checkpoint interval is long

2023-01-05 Thread GitBox


tsreaper commented on code in PR #460:
URL: https://github.com/apache/flink-table-store/pull/460#discussion_r1063148351


##
flink-table-store-connector/src/main/java/org/apache/flink/table/store/connector/sink/StoreSinkWrite.java:
##
@@ -39,6 +40,14 @@ interface StoreSinkWrite {
 
 void compact(BinaryRowData partition, int bucket, boolean fullCompaction) 
throws Exception;
 
+void compact(

Review Comment:
   `notifyNewFiles`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-29603) Add Transformer for StopWordsRemover

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-29603.

Resolution: Fixed

> Add Transformer for StopWordsRemover
> 
>
> Key: FLINK-29603
> URL: https://issues.apache.org/jira/browse/FLINK-29603
> Project: Flink
>  Issue Type: New Feature
>  Components: Library / Machine Learning
>Affects Versions: ml-2.2.0
>Reporter: Yunfeng Zhou
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add Transformer for StopWordsRemover.
> Its function would be at least equivalent to Spark's 
> org.apache.spark.ml.feature.StopWordsRemover. The relevant PR should contain 
> the following components:
>  * Java implementation/test (Must include)
>  * Python implementation/test (Optional)
>  * Markdown document (Optional)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-29601) Add Estimator and Transformer for UnivariateFeatureSelector

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-29601.

Resolution: Fixed

> Add Estimator and Transformer for UnivariateFeatureSelector
> ---
>
> Key: FLINK-29601
> URL: https://issues.apache.org/jira/browse/FLINK-29601
> Project: Flink
>  Issue Type: New Feature
>  Components: Library / Machine Learning
>Affects Versions: ml-2.2.0
>Reporter: Yunfeng Zhou
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add Estimator and Transformer for UnivariateFeatureSelector.
> Its function would be at least equivalent to Spark's 
> org.apache.spark.ml.feature.UnivariateFeatureSelector. The relevant PR should 
> contain the following components:
>  * Java implementation/test (Must include)
>  * Python implementation/test (Optional)
>  * Markdown document (Optional)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-29911) Improve performance of AgglomerativeClustering

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-29911.

Resolution: Fixed

> Improve performance of AgglomerativeClustering
> --
>
> Key: FLINK-29911
> URL: https://issues.apache.org/jira/browse/FLINK-29911
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Zhipeng Zhang
>Assignee: Zhipeng Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-29601) Add Estimator and Transformer for UnivariateFeatureSelector

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-29601:


Assignee: Jiang Xin

> Add Estimator and Transformer for UnivariateFeatureSelector
> ---
>
> Key: FLINK-29601
> URL: https://issues.apache.org/jira/browse/FLINK-29601
> Project: Flink
>  Issue Type: New Feature
>  Components: Library / Machine Learning
>Affects Versions: ml-2.2.0
>Reporter: Yunfeng Zhou
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add Estimator and Transformer for UnivariateFeatureSelector.
> Its function would be at least equivalent to Spark's 
> org.apache.spark.ml.feature.UnivariateFeatureSelector. The relevant PR should 
> contain the following components:
>  * Java implementation/test (Must include)
>  * Python implementation/test (Optional)
>  * Markdown document (Optional)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30159) Add Transformer for ANOVATest

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-30159:


Assignee: Jiang Xin

> Add Transformer for ANOVATest
> -
>
> Key: FLINK-30159
> URL: https://issues.apache.org/jira/browse/FLINK-30159
> Project: Flink
>  Issue Type: Sub-task
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add Transformer for ANOVATest.
> Its function would be at least equivalent to Spark's 
> org.apache.spark.ml.stat.ANOVATest. The relevant PR should contain the 
> following components:
>  * Java implementation/test (Must include)
>  * Python implementation/test 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30160) Add Transformer for FValueTest

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-30160:


Assignee: Jiang Xin

> Add Transformer for FValueTest
> --
>
> Key: FLINK-30160
> URL: https://issues.apache.org/jira/browse/FLINK-30160
> Project: Flink
>  Issue Type: Sub-task
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add Transformer for FValueTest.
> Its function would be at least equivalent to Spark's 
> org.apache.spark.ml.stat.FValueTest. The relevant PR should contain the 
> following components:
>  * Java implementation/test (Must include)
>  * Python implementation/test 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-30160) Add Transformer for FValueTest

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved FLINK-30160.
--
Resolution: Fixed

> Add Transformer for FValueTest
> --
>
> Key: FLINK-30160
> URL: https://issues.apache.org/jira/browse/FLINK-30160
> Project: Flink
>  Issue Type: Sub-task
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add Transformer for FValueTest.
> Its function would be at least equivalent to Spark's 
> org.apache.spark.ml.stat.FValueTest. The relevant PR should contain the 
> following components:
>  * Java implementation/test (Must include)
>  * Python implementation/test 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30523) Refine benchmark of vectorAssembler

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated FLINK-30523:
-
Fix Version/s: ml-2.2.0

> Refine benchmark of vectorAssembler
> ---
>
> Key: FLINK-30523
> URL: https://issues.apache.org/jira/browse/FLINK-30523
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: weibo zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Refine benchmark of vectorAssembler



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-30515) Add benchmark for CountVectorizer, Imputer, RobustScaler, UnivariateFeatureSelector and VarianceThresholdSelector

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-30515.

Resolution: Fixed

> Add benchmark for CountVectorizer, Imputer, RobustScaler, 
> UnivariateFeatureSelector and VarianceThresholdSelector
> -
>
> Key: FLINK-30515
> URL: https://issues.apache.org/jira/browse/FLINK-30515
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30515) Add benchmark for CountVectorizer, Imputer, RobustScaler, UnivariateFeatureSelector and VarianceThresholdSelector

2023-01-05 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-30515:


Assignee: Jiang Xin

> Add benchmark for CountVectorizer, Imputer, RobustScaler, 
> UnivariateFeatureSelector and VarianceThresholdSelector
> -
>
> Key: FLINK-30515
> URL: https://issues.apache.org/jira/browse/FLINK-30515
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] lindong28 merged pull request #193: [FLINK-30515] Add benchmark for CountVectorizer, Imputer, RobustScaler, UnivariateFeatureSelector and VarianceThresholdSelector

2023-01-05 Thread GitBox


lindong28 merged PR #193:
URL: https://github.com/apache/flink-ml/pull/193


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] lindong28 commented on a diff in pull request #193: [FLINK-30515] Add benchmark for CountVectorizer, Imputer, RobustScaler, UnivariateFeatureSelector and VarianceThresholdSelector

2023-01-05 Thread GitBox


lindong28 commented on code in PR #193:
URL: https://github.com/apache/flink-ml/pull/193#discussion_r1063143447


##
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/datagenerator/clustering/KMeansModelDataGenerator.java:
##
@@ -55,20 +61,36 @@ public Table[] getData(StreamTableEnvironment tEnv) {
 InputDataGenerator vectorArrayGenerator = new 
DenseVectorArrayGenerator();
 ReadWriteUtils.updateExistingParams(vectorArrayGenerator, paramMap);
 vectorArrayGenerator.setNumValues(1);
+vectorArrayGenerator.setColNames(new String[] {"centroids"});

Review Comment:
   Sounds good. Thanks for the explanation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] lindong28 commented on pull request #193: [FLINK-30515] Add benchmark for CountVectorizer, Imputer, RobustScaler, UnivariateFeatureSelector and VarianceThresholdSelector

2023-01-05 Thread GitBox


lindong28 commented on PR #193:
URL: https://github.com/apache/flink-ml/pull/193#issuecomment-1373179625

   Thanks for the update. LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-29845) ThroughputCalculator throws java.lang.IllegalArgumentException: Time should be non negative under very low throughput cluster

2023-01-05 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655255#comment-17655255
 ] 

Weijie Guo commented on FLINK-29845:


[~dawnwords] Thanks for reporting this.

After FLINK-24189, the BufferDebloater will not be scheduled if
`buffer-debloat.enabled` is false. So I think this problem is fixed in
Flink 1.15.

As for the concerns raised by [~kevin.cyj], I also think it's better to fix
it. If you don't have enough time, I can fix this.
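
For illustration, a minimal standalone sketch of that guard (all names here are invented for the example, not the actual Flink runtime classes): the periodic debloat action is only scheduled when the feature is enabled, so the throughput calculation can never run over an empty measurement window.

{code:java}
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DebloatSchedulingSketch {
    public static void main(String[] args) {
        // taskmanager.network.memory.buffer-debloat.enabled
        boolean debloatEnabled = false;
        // taskmanager.network.memory.buffer-debloat.period
        Duration period = Duration.ofMillis(200);

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // The guard: when debloating is disabled, nothing is scheduled at all,
        // so no throughput sample can be taken over a zero-length interval.
        if (debloatEnabled) {
            timer.scheduleAtFixedRate(
                    () -> System.out.println("sample throughput and resize buffers"),
                    period.toMillis(), period.toMillis(), TimeUnit.MILLISECONDS);
        }
        timer.shutdown();
    }
}
{code}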

> ThroughputCalculator throws java.lang.IllegalArgumentException: Time should 
> be non negative under very low throughput cluster
> -
>
> Key: FLINK-29845
> URL: https://issues.apache.org/jira/browse/FLINK-29845
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network, Runtime / Task
>Affects Versions: 1.14.6
>Reporter: Jingxiao GU
>Priority: Major
>
> Our team is using Flink 1.14.6 to process data from Kafka.
> It works fine unless the same job jar with the same arguments is deployed in
> an environment with *very low Kafka source throughput*.
> The job sometimes crashed with the following exception and could not recover
> unless we restarted the TaskManagers, which is unacceptable for a production
> environment.
> {code:java}
> [2022-10-31T15:33:57.153+08:00] [o.a.f.runtime.taskmanager.Task#cess 
> (2/16)#244] - [WARN ] KeyedProcess (2/16)#244 
> (b9b54f6445419fc43c4d58fcd95cee82) switched from RUNNING to FAILED with 
> failure cause: java.lang.IllegalArgumentException: Time should be non negative
>   at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
>   at 
> org.apache.flink.runtime.throughput.ThroughputEMA.calculateThroughput(ThroughputEMA.java:44)
>   at 
> org.apache.flink.runtime.throughput.ThroughputCalculator.calculateThroughput(ThroughputCalculator.java:80)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.debloat(StreamTask.java:789)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$4(StreamTask.java:781)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:806)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:758)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> After checking the source code roughly, we found that if buffer debloating is
> disabled
> ([https://github.com/apache/flink/blob/release-1.14.6/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L427]
>  ), the buffer debloater will still be scheduled
> ([https://github.com/apache/flink/blob/release-1.14.6/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L755]
>  ) so that the {{ThroughputCalculator}} keeps calculating the throughput
> ([https://github.com/apache/flink/blob/release-1.14.6/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L789]
>  ), which causes a division by zero and seems unnecessary, as far as I can tell.
> Currently, we work around this by setting
> {{taskmanager.network.memory.buffer-debloat.period: 365d}} to avoid the
> buffer debloater being scheduled frequently and causing the random crash.
> P.S. We found a bug with similar stacktrace 
> https://issues.apache.org/jira/browse/FLINK-25454 which was fixed in 1.14.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30582) Flink-avro Flink-orc free for flink-table-store-format

2023-01-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-30582:


 Summary: Flink-avro Flink-orc free for flink-table-store-format
 Key: FLINK-30582
 URL: https://issues.apache.org/jira/browse/FLINK-30582
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30577) OpenShift FlinkSessionJob artifact write error on non-default namespaces

2023-01-05 Thread James Busche (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655246#comment-17655246
 ] 

James Busche commented on FLINK-30577:
--

So document that they should edit the configmap and restart the operator? I 
just tried that and it worked:

 

oc edit cm flink-operator-config -n openshift-operators
{quote}change:

    # kubernetes.operator.user.artifacts.base.dir: /opt/flink/artifacts

to

    kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
{quote}
 

Restart the operator pod:

oc get pods -n openshift-operators
{quote}NAME                                         READY   STATUS    RESTARTS   AGE

flink-kubernetes-operator-5f5bb584db-t75ts   2/2     Running   0          4m45s
{quote}
oc delete pod flink-kubernetes-operator-5f5bb584db-t75ts -n openshift-operators

 

Then in a non-default namespace the session jobs work:

oc get flinksessionjobs -n jim
{quote}NAME                             JOB STATUS   RECONCILIATION STATUS

basic-session-job-example        RUNNING      DEPLOYED

basic-session-job-example2       RUNNING      DEPLOYED

basic-session-job-only-example   RUNNING      DEPLOYED
{quote}

> OpenShift FlinkSessionJob artifact write error on non-default namespaces
> 
>
> Key: FLINK-30577
> URL: https://issues.apache.org/jira/browse/FLINK-30577
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.3.0
>Reporter: James Busche
>Priority: Major
>
> [~tagarr] has pointed out an issue with using the /opt/flink/artifacts 
> filesystem on OpenShift in non-default namespaces. OpenShift permissions 
> don't allow writing to /opt.
> ```
> org.apache.flink.util.FlinkRuntimeException: Failed to create the dir: 
> /opt/flink/artifacts/jim/basic-session-deployment-only-example/basic-session-job-only-example
> ```
> A few ways to solve the problem are:
> 1. Remove the comment on line 34 here in 
> [flink-conf.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/conf/flink-conf.yaml#L34]
>  and change it to: /tmp/flink/artifacts
> 2. Append this after line 143 here in 
> [values.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/values.yaml#L142]:
> kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
> 3.  Changing it in line 142 of 
> [KubernetesOperatorConfigOptions.java|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java#L142]
>  like this:
> .defaultValue("/tmp/flink/artifacts") 
> and then rebuilding the operator image.
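
For option 3, a sketch of what the changed option definition could look like (the field name is an assumption; only the default value differs from today):

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Sketch only: the real definition lives in KubernetesOperatorConfigOptions.
public static final ConfigOption<String> OPERATOR_USER_ARTIFACTS_BASE_DIR =
        ConfigOptions.key("kubernetes.operator.user.artifacts.base.dir")
                .stringType()
                // /tmp is writable under OpenShift's restricted SCC, /opt is not
                .defaultValue("/tmp/flink/artifacts")
                .withDescription("Base directory for session job user artifacts.");
{code}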



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] ramkrish86 commented on pull request #21508: [FLINK-30128] Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path

2023-01-05 Thread GitBox


ramkrish86 commented on PR #21508:
URL: https://github.com/apache/flink/pull/21508#issuecomment-1373152299

   Thanks @xintongsong . Also it would be great if this can be there in 1.17 
release. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-30108) ZooKeeperLeaderElectionConnectionHandlingTest.testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled times out

2023-01-05 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655241#comment-17655241
 ] 

Zhu Zhu commented on FLINK-30108:
-

Thanks for the input! [~mapohl]
I can see tens of thousands of occurrences of "type:ping XXX txntype:unknown
reqpath:n/a" logs in the ZK server of this problematic case, which never
happened for other tests.

There is also this kind of ZK client log, which does not appear in other cases:

{code:java}
00:59:34,271 [main-SendThread(127.0.0.1:42967)] WARN  
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Client 
session timed out, have not heard from server in 1001ms for sessionid 0x0
{code}

What's unexpected here is that the leader election is pretty slow in this case.
For other cases, it takes tens of milliseconds to get leadership after
connecting to ZK. However, in this case, the process did not finish after
several seconds. Another unexpected point is that the leader election did not
succeed after reconnecting to ZK.

I have no idea of the root cause, but I suspect it is the session timeout that
triggers this problem. The session timeout happens due to the small timeout
(1000 ms) of the case
{{testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled}}.
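
For context, a sketch of how tight that timeout is when configured through Curator (illustrative values only; the test wires this up differently):

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// A 1000 ms session timeout leaves almost no headroom on a loaded CI machine:
// one long GC pause or a CPU-starved scheduling gap can expire the session and
// force leader election to start over after reconnecting.
CuratorFramework client =
        CuratorFrameworkFactory.builder()
                .connectString("127.0.0.1:2181")
                .sessionTimeoutMs(1000)
                .connectionTimeoutMs(1000)
                .retryPolicy(new ExponentialBackoffRetry(100, 3))
                .build();
client.start();
{code}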

> ZooKeeperLeaderElectionConnectionHandlingTest.testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled
>  times out
> -
>
> Key: FLINK-30108
> URL: https://issues.apache.org/jira/browse/FLINK-30108
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.17.0
>Reporter: Leonard Xu
>Priority: Major
>  Labels: test-stability
> Attachments: zookeeper-server.FLINK-30108.log
>
>
> {noformat}
> Nov 18 01:02:58 [INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, 
> Time elapsed: 109.22 s - in 
> org.apache.flink.runtime.operators.hash.InPlaceMutableHashTableTest
> Nov 18 01:18:09 
> ==
> Nov 18 01:18:09 Process produced no output for 900 seconds.
> Nov 18 01:18:09 
> ==
> Nov 18 01:18:09 
> ==
> Nov 18 01:18:09 The following Java processes are running (JPS)
> Nov 18 01:18:09 
> ==
> Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> Nov 18 01:18:09 924 Launcher
> Nov 18 01:18:09 23421 surefirebooter1178962604207099497.jar
> Nov 18 01:18:09 11885 Jps
> Nov 18 01:18:09 
> ==
> Nov 18 01:18:09 Printing stack trace of Java process 924
> Nov 18 01:18:09 
> ==
> Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> Nov 18 01:18:09 2022-11-18 01:18:09
> Nov 18 01:18:09 Full thread dump OpenJDK 64-Bit Server VM (25.292-b10 mixed 
> mode):
> ...
> ...
> ...
> Nov 18 01:18:09 
> ==
> Nov 18 01:18:09 Printing stack trace of Java process 11885
> Nov 18 01:18:09 
> ==
> 11885: No such process
> Nov 18 01:18:09 Killing process with pid=923 and all descendants
> /__w/2/s/tools/ci/watchdog.sh: line 113:   923 Terminated  $cmd
> Nov 18 01:18:10 Process exited with EXIT CODE: 143.
> Nov 18 01:18:10 Trying to KILL watchdog (919).
> Nov 18 01:18:10 Searching for .dump, .dumpstream and related files in 
> '/__w/2/s'
> Nov 18 01:18:16 Moving 
> '/__w/2/s/flink-runtime/target/surefire-reports/2022-11-18T00-55-55_041-jvmRun3.dumpstream'
>  to target directory ('/__w/_temp/debug_files')
> Nov 18 01:18:16 Moving 
> '/__w/2/s/flink-runtime/target/surefire-reports/2022-11-18T00-55-55_041-jvmRun3.dump'
>  to target directory ('/__w/_temp/debug_files')
> The STDIO streams did not close within 10 seconds of the exit event from 
> process '/bin/bash'. This may indicate a child process inherited the STDIO 
> streams and has not yet exited.
> ##[error]Bash exited with code '143'.
> Finishing: Test - core
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43277=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30561) ChangelogStreamHandleReaderWithCache cause FileNotFoundException

2023-01-05 Thread Yanfei Lei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655240#comment-17655240
 ] 

Yanfei Lei commented on FLINK-30561:


Hi [~Feifan Wang], could you please share under what circumstances the error occurs?

> This happens when changeIterator.read(tuple2.f0, tuple2.f1) throws an 
> exception (for example, when the task is canceled for other reasons during 
> the restore process) 

IIUC, when the task is canceled for some other reason, the whole job is
canceled. On the next restart, the `FileNotFoundException` will not affect the
run, and the refCount would be reset. Will the previous refCount still be used
after the job is canceled?
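
To make the refCount question concrete, an illustrative sketch (names invented, not the actual ChangelogStreamHandleReaderWithCache code) of the pattern where a failed read must still release its reference:

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountedCacheFile {
    private final File tmpFile;
    private final AtomicInteger refCount = new AtomicInteger();

    RefCountedCacheFile(File tmpFile) {
        this.tmpFile = tmpFile;
    }

    void read(Runnable doRead) throws IOException {
        refCount.incrementAndGet();
        try {
            doRead.run(); // may throw, e.g. when the task is canceled mid-restore
        } finally {
            // Without this finally block, an aborted read would leak a reference,
            // and later cleanup could delete the tmp file while it is still "in use".
            if (refCount.decrementAndGet() == 0 && !tmpFile.delete()) {
                throw new IOException("Could not delete " + tmpFile);
            }
        }
    }
}
{code}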

> ChangelogStreamHandleReaderWithCache cause FileNotFoundException
> 
>
> Key: FLINK-30561
> URL: https://issues.apache.org/jira/browse/FLINK-30561
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.16.0
>Reporter: Feifan Wang
>Priority: Major
>
> When a job with state changelog enabled continues to restart, the following 
> exceptions may occur :
> {code:java}
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /data1/hadoop/yarn/nm-local-dir/usercache/hadoop-rt/appcache/application_1671689962742_192/dstl-cache-file/dstl6215344559415829831.tmp
>  (No such file or directory)
>     at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>     at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>     at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>     at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:107)
>     at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:78)
>     at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:94)
>     at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:265)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:726)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:702)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:669)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: 
> /data1/hadoop/yarn/nm-local-dir/usercache/hadoop-rt/appcache/application_1671689962742_192/dstl-cache-file/dstl6215344559415829831.tmp
>  (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.init(FileInputStream.java:138)
>     at 
> org.apache.flink.changelog.fs.ChangelogStreamHandleReaderWithCache.openAndSeek(ChangelogStreamHandleReaderWithCache.java:158)
>     at 
> org.apache.flink.changelog.fs.ChangelogStreamHandleReaderWithCache.openAndSeek(ChangelogStreamHandleReaderWithCache.java:95)
>     at 
> 

[jira] [Commented] (FLINK-30512) Flink SQL state TTL has no effect when using Interval Join

2023-01-05 Thread Yanfei Lei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655237#comment-17655237
 ] 

Yanfei Lei commented on FLINK-30512:


Thanks for reporting this issue and sharing the solution.

From your screenshot, the job ran for 45 min, your TTL was set to 15 min, and
the checkpoint interval is 5 min. The full checkpoint size of chk-14 and chk-18
is smaller than the previous one, so I think TTL is in effect.

> the state size is getting bigger and bigger.

I guess it is because the TPS of your job is relatively high during this
period, so new states grow faster than old states expire.

 

> Flink SQL state TTL has no effect when using Interval Join
> --
>
> Key: FLINK-30512
> URL: https://issues.apache.org/jira/browse/FLINK-30512
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends, Table SQL / Runtime
>Affects Versions: 1.15.1, 1.16.1
>Reporter: wangkang
>Priority: Major
> Attachments: flink1.16.png
>
>
> Take the following join SQL program as an example:
> {code:java}
> SET 'table.exec.state.ttl' = '90 ms';
> select 
> ...
> from kafka_source_dwdexpose as t1
> left join kafka_source_expose_attr_click t3 
> ON t1.mid = t3.mid and t1.sr = t3.sr 
> and t1.time_local = t3.time_local 
> and t1.log_ltz BETWEEN t3.log_ltz - INTERVAL '2' MINUTE  AND t3.log_ltz + 
> INTERVAL '2' MINUTE {code}
> !flink1.16.png|width=906,height=278!
> the state size is getting bigger and bigger.
> we also test the same sql with flink sql 1.13,the state size is stable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] Mrart commented on pull request #21527: [FLINK-27925] [kubernetes]Performance optimization when watch tm pod and list pod.

2023-01-05 Thread GitBox


Mrart commented on PR #21527:
URL: https://github.com/apache/flink/pull/21527#issuecomment-1373105538

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] jiangxin369 commented on a diff in pull request #193: [FLINK-30515] Add benchmark for CountVectorizer, Imputer, RobustScaler, UnivariateFeatureSelector and VarianceThresholdSelecto

2023-01-05 Thread GitBox


jiangxin369 commented on code in PR #193:
URL: https://github.com/apache/flink-ml/pull/193#discussion_r1063068991


##
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/datagenerator/clustering/KMeansModelDataGenerator.java:
##
@@ -55,20 +61,36 @@ public Table[] getData(StreamTableEnvironment tEnv) {
 InputDataGenerator vectorArrayGenerator = new 
DenseVectorArrayGenerator();
 ReadWriteUtils.updateExistingParams(vectorArrayGenerator, paramMap);
 vectorArrayGenerator.setNumValues(1);
+vectorArrayGenerator.setColNames(new String[] {"centroids"});

Review Comment:
   Adding the first line `vectorArrayGenerator.setColNames(new String[] {"centroids"});` is because `DenseVectorArrayGenerator` requires `columnNames` to be set; otherwise an NPE would be thrown.
   The other changes reduce the use of `toDataStream` and `fromDataStream` by doing the conversion based on `Table` directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] swuferhong commented on a diff in pull request #21597: [FLINK-30554][sql-client] Remove useless sessionId in the Executor

2023-01-05 Thread GitBox


swuferhong commented on code in PR #21597:
URL: https://github.com/apache/flink/pull/21597#discussion_r1063058379


##
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/LocalExecutor.java:
##
@@ -99,88 +94,69 @@ public void start() {
 }
 
 @Override
-public String openSession(@Nullable String sessionId) throws 
SqlExecutionException {
-SessionContext sessionContext =
-LocalContextUtils.buildSessionContext(sessionId, 
defaultContext);
-sessionId = sessionContext.getSessionId();
-if (this.contextMap.containsKey(sessionId)) {
-throw new SqlExecutionException(
-"Found another session with the same session identifier: " 
+ sessionId);
-} else {
-this.contextMap.put(sessionId, sessionContext);
-}
-return sessionId;
+public void openSession(@Nullable String sessionId) throws 
SqlExecutionException {
+// do nothing
+sessionContext = LocalContextUtils.buildSessionContext(sessionId, 
defaultContext);
 }
 
 @Override
-public void closeSession(String sessionId) throws SqlExecutionException {
+public void closeSession() throws SqlExecutionException {
 resultStore
 .getResults()
 .forEach(
 (resultId) -> {
 try {
-cancelQuery(sessionId, resultId);
+cancelQuery(resultId);
 } catch (Throwable t) {
 // ignore any throwable to keep the clean up 
running
 }
 });
 // Remove the session's ExecutionContext from contextMap and close it.
-SessionContext context = this.contextMap.remove(sessionId);
-if (context != null) {
-context.close();
-}
-}
-
-private SessionContext getSessionContext(String sessionId) {
-SessionContext context = this.contextMap.get(sessionId);
-if (context == null) {
-throw new SqlExecutionException("Invalid session identifier: " + 
sessionId);
+if (sessionContext != null) {
+sessionContext.close();
 }
-return context;
 }
 
 /**
  * Get the existed {@link ExecutionContext} from contextMap, or thrown 
exception if does not
  * exist.
  */
 @VisibleForTesting
-protected ExecutionContext getExecutionContext(String sessionId) throws 
SqlExecutionException {
-return getSessionContext(sessionId).getExecutionContext();
+protected ExecutionContext getExecutionContext() throws 
SqlExecutionException {
+return sessionContext.getExecutionContext();
 }
 
 @Override
-public Map getSessionConfigMap(String sessionId) throws 
SqlExecutionException {
-return getSessionContext(sessionId).getConfigMap();
+public Map getSessionConfigMap() throws 
SqlExecutionException {
+return sessionContext.getConfigMap();
 }
 
 @Override
-public ReadableConfig getSessionConfig(String sessionId) throws 
SqlExecutionException {
-return getSessionContext(sessionId).getReadableConfig();
+public ReadableConfig getSessionConfig() throws SqlExecutionException {
+return sessionContext.getReadableConfig();
 }
 
 @Override
-public void resetSessionProperties(String sessionId) throws 
SqlExecutionException {
-SessionContext context = getSessionContext(sessionId);
+public void resetSessionProperties() throws SqlExecutionException {
+SessionContext context = sessionContext;
 context.reset();
 }
 
 @Override
-public void resetSessionProperty(String sessionId, String key) throws 
SqlExecutionException {
-SessionContext context = getSessionContext(sessionId);
+public void resetSessionProperty(String key) throws SqlExecutionException {
+SessionContext context = sessionContext;
 context.reset(key);
 }
 
 @Override
-public void setSessionProperty(String sessionId, String key, String value)
-throws SqlExecutionException {
-SessionContext context = getSessionContext(sessionId);
+public void setSessionProperty(String key, String value) throws 
SqlExecutionException {
+SessionContext context = sessionContext;
 context.set(key, value);

Review Comment:
   ditto



##
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/LocalExecutor.java:
##
@@ -320,19 +294,17 @@ public void cancelQuery(String sessionId, String 
resultId) throws SqlExecutionEx
 }
 
 @Override
-public void removeJar(String sessionId, String jarUrl) {
-final SessionContext context = getSessionContext(sessionId);
+public void removeJar(String jarUrl) {
+final SessionContext context = 

[jira] [Created] (FLINK-30581) Deprecate FileStoreTableITCase and use CatalogITCaseBase

2023-01-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-30581:


 Summary: Deprecate FileStoreTableITCase and use CatalogITCaseBase
 Key: FLINK-30581
 URL: https://issues.apache.org/jira/browse/FLINK-30581
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


We recommend users use catalog tables instead of managed tables.
Managed tables should be deprecated. We already do not expose managed tables in
the documentation, so we can remove them.
Before removing them, the tests should be refactored.

FileStoreTableITCase, which uses managed tables, should be changed to
CatalogITCaseBase.
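
A hypothetical shape of such a refactored test, assuming CatalogITCaseBase exposes a `sql(...)` helper like the existing catalog ITs (an assumption about the test base, not confirmed here):

{code:java}
import org.junit.jupiter.api.Test;

// Sketch only: class and helper names are assumptions about the test base.
public class MyRefactoredITCase extends CatalogITCaseBase {

    @Test
    public void testWriteRead() {
        sql("CREATE TABLE T (a INT, b STRING)");
        sql("INSERT INTO T VALUES (1, 'x')");
        // assertions against sql("SELECT * FROM T") would follow here
    }
}
{code}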



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30580) [umbrella] Refactor tests for table store

2023-01-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-30580:


 Summary: [umbrella] Refactor tests for table store
 Key: FLINK-30580
 URL: https://issues.apache.org/jira/browse/FLINK-30580
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


This is an umbrella issue for improving tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] reswqa commented on pull request #21560: [FLINK-29768] Hybrid shuffle supports consume partial finished subtask

2023-01-05 Thread GitBox


reswqa commented on PR #21560:
URL: https://github.com/apache/flink/pull/21560#issuecomment-1373058743

   @xintongsong Thanks for the review, I have updated this pr, PTAL~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-30579) Introducing configurable option to enable hive native function

2023-01-05 Thread dalongliu (Jira)
dalongliu created FLINK-30579:
-

 Summary: Introducing configurable option to enable hive native function
 Key: FLINK-30579
 URL: https://issues.apache.org/jira/browse/FLINK-30579
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Affects Versions: 1.16.0
Reporter: dalongliu
 Fix For: 1.17.0


Currently, the Hive native function implementation can't align its behavior
with Hive UDAFs, so we should introduce a configurable option that allows
enabling this optimization; the default behavior is disabled.
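
A sketch of what such an option could look like (the key name and the owning class are assumptions, not the final design):

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public static final ConfigOption<Boolean> TABLE_EXEC_HIVE_NATIVE_AGG_FUNCTION_ENABLED =
        ConfigOptions.key("table.exec.hive.native-agg-function.enabled")
                .booleanType()
                .defaultValue(false) // disabled by default, as described above
                .withDescription(
                        "When enabled, use the Hive native aggregate function "
                                + "implementation instead of the Hive UDAF.");
{code}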



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30576) JdbcOutputFormat refactor

2023-01-05 Thread Lijie Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655209#comment-17655209
 ] 

Lijie Wang commented on FLINK-30576:


I think having all the changes in the same PR is better; a separate commit is enough.

> JdbcOutputFormat refactor
> -
>
> Key: FLINK-30576
> URL: https://issues.apache.org/jira/browse/FLINK-30576
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Priority: Major
>  Labels: pull-request-available
>
> This refactor is to allow the use of JdbcOutputFormat on Sink2.
> Currently the JdbcOutputFormat needs the RuntimeContext to check whether
> object reuse is active or not.
> The refactor changes from RuntimeContext to ExecutionConfig (we still need
> ExecutionConfig to be available on Sink2.InitContext, and a FLIP will be
> raised).
>  
> [~wanglijie] this is what we talked about
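
A sketch of the direction described above (the method shape is an assumption; the point is that object reuse comes from ExecutionConfig rather than RuntimeContext):

{code:java}
import org.apache.flink.api.common.ExecutionConfig;

// Sketch only: once Sink2.InitContext can hand out an ExecutionConfig (the
// pending FLIP), the same open path works for both the old and new sinks.
public void open(ExecutionConfig executionConfig) {
    // When object reuse is enabled, buffered records must be defensively
    // copied before they are cached for batched JDBC writes.
    this.copyRecordsBeforeBuffering = executionConfig.isObjectReuseEnabled();
}
{code}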



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #188: [FLINK-30249] Fix TableUtils.getRowTypeInfo by using ExternalTypeInfo.of()

2023-01-05 Thread GitBox


zhipeng93 commented on code in PR #188:
URL: https://github.com/apache/flink-ml/pull/188#discussion_r1063043750


##
flink-ml-core/src/main/java/org/apache/flink/ml/common/datastream/TableUtils.java:
##
@@ -26,18 +26,20 @@
 import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
 import org.apache.flink.table.catalog.Column;
 import org.apache.flink.table.catalog.ResolvedSchema;
+import org.apache.flink.table.runtime.typeutils.ExternalTypeInfo;
 import org.apache.flink.types.Row;
 
 /** Utility class for operations related to Table API. */
 public class TableUtils {
-// Constructs a RowTypeInfo from the given schema.
+// Constructs a RowTypeInfo from the given schema. Currently, this 
function does not support
+// the case when the input contains DataTypes.TIMESTAMP_WITH_TIME_ZONE().
 public static RowTypeInfo getRowTypeInfo(ResolvedSchema schema) {
 TypeInformation[] types = new 
TypeInformation[schema.getColumnCount()];
 String[] names = new String[schema.getColumnCount()];
 
 for (int i = 0; i < schema.getColumnCount(); i++) {
 Column column = schema.getColumn(i).get();
-types[i] = 
TypeInformation.of(column.getDataType().getConversionClass());
+types[i] = ExternalTypeInfo.of(column.getDataType());

Review Comment:
   Thanks for the comment. I have updated the PR to use `ExternalTypeInfo.of()` 
when necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-30576) JdbcOutputFormat refactor

2023-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-30576:
---
Labels: pull-request-available  (was: )

> JdbcOutputFormat refactor
> -
>
> Key: FLINK-30576
> URL: https://issues.apache.org/jira/browse/FLINK-30576
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Priority: Major
>  Labels: pull-request-available
>
> This refactor is to allow the use of JdbcOutputFormat on Sink2.
> Currently the JdbcOutputFormat needs the RuntimeContext to check whether
> object reuse is active or not.
> The refactor changes from RuntimeContext to ExecutionConfig (we still need
> ExecutionConfig to be available on Sink2.InitContext, and a FLIP will be
> raised).
>  
> [~wanglijie] this is what we talked about



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-jdbc] wanglijie95 commented on pull request #15: [FLINK-30576] Refactor JdbcOutputFormat to use ExecutionConfig

2023-01-05 Thread GitBox


wanglijie95 commented on PR #15:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/15#issuecomment-1373051123

   @eskabetxe Can we start reviewing this change after the FLIP is approved? At the moment we don't know what we will get (executionConfig, objectReuse, or something else), which will decide how we modify `OutputFormat#open`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-30255) Throw exception for upper case fields are used in hive metastore

2023-01-05 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-30255.

Fix Version/s: table-store-0.3.0
 Assignee: Shammon
   Resolution: Fixed

master: cbdb78a37d520e2adc208867957ce8d5134d0a09
release-0.3: 947a8e2b9fe38f9d647c6ad13f39b294ff541aea

> Throw exception for upper case fields are used in hive metastore
> 
>
> Key: FLINK-30255
> URL: https://issues.apache.org/jira/browse/FLINK-30255
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table Store
>Affects Versions: table-store-0.3.0
>Reporter: Shammon
>Assignee: Shammon
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.3.0
>
>
> Currently there is an incompatibility when users use upper case names in the 
> Hive metastore and Table Store. We should throw an exception for it and find 
> a more elegant compatibility mode later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] JingsongLi merged pull request #413: [FLINK-30255] Throw exception when names are upper case in HiveCatalog

2023-01-05 Thread GitBox


JingsongLi merged PR #413:
URL: https://github.com/apache/flink-table-store/pull/413


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-30569) File Format can not change with data file exists

2023-01-05 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-30569.

Resolution: Fixed

master: 63a27cd7af945839f67afaf6e946bcf617ad18a2

> File Format can not change with data file exists
> 
>
> Key: FLINK-30569
> URL: https://issues.apache.org/jira/browse/FLINK-30569
> Project: Flink
>  Issue Type: Bug
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> # Set file format to orc.
> # Write records.
> # Set file format to parquet.
> # Write records.
> # Read -> an exception is thrown...
> We should support changing the file format.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] JingsongLi merged pull request #459: [FLINK-30569] File Format can not change with data file exists

2023-01-05 Thread GitBox


JingsongLi merged PR #459:
URL: https://github.com/apache/flink-table-store/pull/459


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #21607: fix(sec): upgrade org.apache.calcite:calcite-core to 1.32.0

2023-01-05 Thread GitBox


flinkbot commented on PR #21607:
URL: https://github.com/apache/flink/pull/21607#issuecomment-1373009717

   
   ## CI report:
   
   * cac8ff40e6369c303613b86e1bd8f9ed5e50505f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-30578) Publish SBOM artifacts

2023-01-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated FLINK-30578:
--
Summary: Publish SBOM artifacts  (was: AVRO-3700: Publish SBOM artifacts)

> Publish SBOM artifacts
> --
>
> Key: FLINK-30578
> URL: https://issues.apache.org/jira/browse/FLINK-30578
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.17.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dongjoon-hyun commented on pull request #21606: [FLINK-30578][build] Publish SBOM artifacts

2023-01-05 Thread GitBox


dongjoon-hyun commented on PR #21606:
URL: https://github.com/apache/flink/pull/21606#issuecomment-1372997984

   cc @gyfora , @mbalassi , @morhidi, @gaborgsomogyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #21606: [FLINK-30578][build] Publish SBOM artifacts

2023-01-05 Thread GitBox


flinkbot commented on PR #21606:
URL: https://github.com/apache/flink/pull/21606#issuecomment-1372973843

   
   ## CI report:
   
   * 8059fbaf55cd2876826e9e6cc58fb27d77f126b6 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-30578) AVRO-3700: Publish SBOM artifacts

2023-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-30578:
---
Labels: pull-request-available  (was: )

> AVRO-3700: Publish SBOM artifacts
> -
>
> Key: FLINK-30578
> URL: https://issues.apache.org/jira/browse/FLINK-30578
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.17.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dongjoon-hyun opened a new pull request, #21606: [FLINK-30578][build] Publish SBOM artifacts

2023-01-05 Thread GitBox


dongjoon-hyun opened a new pull request, #21606:
URL: https://github.com/apache/flink/pull/21606

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
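
   For context, Maven builds commonly publish SBOMs via the CycloneDX plugin; a minimal
   sketch of that wiring (plugin version and goal are assumptions here, the actual PR
   may be configured differently):

```xml
<!-- Hypothetical pom.xml snippet: attaches a CycloneDX SBOM during `mvn package`. -->
<plugin>
  <groupId>org.cyclonedx</groupId>
  <artifactId>cyclonedx-maven-plugin</artifactId>
  <version>2.7.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>makeBom</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```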
   





[jira] [Created] (FLINK-30578) AVRO-3700: Publish SBOM artifacts

2023-01-05 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created FLINK-30578:
-

 Summary: AVRO-3700: Publish SBOM artifacts
 Key: FLINK-30578
 URL: https://issues.apache.org/jira/browse/FLINK-30578
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.17.0
Reporter: Dongjoon Hyun








[GitHub] [flink] kristoffSC commented on pull request #21393: [Draft][FLINK-27246_master] - Split generated java methods - Work in progress

2023-01-05 Thread GitBox


kristoffSC commented on PR #21393:
URL: https://github.com/apache/flink/pull/21393#issuecomment-1372780716

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-30577) OpenShift FlinkSessionJob artifact write error on non-default namespaces

2023-01-05 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655131#comment-17655131
 ] 

Gyula Fora commented on FLINK-30577:


Or users on OpenShift could simply configure kubernetes.operator.user.artifacts.base.dir 
to something other than the default :)
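
That is, for example (the path is just an illustration):
{code:yaml}
kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
{code}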

> OpenShift FlinkSessionJob artifact write error on non-default namespaces
> 
>
> Key: FLINK-30577
> URL: https://issues.apache.org/jira/browse/FLINK-30577
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.3.0
>Reporter: James Busche
>Priority: Major
>
> [~tagarr] has pointed out an issue with using the /opt/flink/artifacts 
> filesystem on OpenShift in non-default namespaces.  The OpenShift permissions 
> don't allow writing to /opt.
> {code:none}
> org.apache.flink.util.FlinkRuntimeException: Failed to create the dir: 
> /opt/flink/artifacts/jim/basic-session-deployment-only-example/basic-session-job-only-example
> {code}
> A few ways to solve the problem are:
> 1. Remove the comment on line 34 here in 
> [flink-conf.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/conf/flink-conf.yaml#L34]
>  and change it to: /tmp/flink/artifacts
> 2. Append this after line 143 here in 
> [values.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/values.yaml#L142]:
> kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
> 3. Change it in line 142 of 
> [KubernetesOperatorConfigOptions.java|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java#L142]
>  like this:
> .defaultValue("/tmp/flink/artifacts") 
> and then rebuilding the operator image.





[jira] [Created] (FLINK-30577) OpenShift FlinkSessionJob artifact write error on non-default namespaces

2023-01-05 Thread James Busche (Jira)
James Busche created FLINK-30577:


 Summary: OpenShift FlinkSessionJob artifact write error on 
non-default namespaces
 Key: FLINK-30577
 URL: https://issues.apache.org/jira/browse/FLINK-30577
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.0
Reporter: James Busche


[~tagarr] has pointed out an issue with using the /opt/flink/artifacts 
filesystem on OpenShift in non-default namespaces. The OpenShift permissions 
don't allow writing to /opt.
{code:none}
org.apache.flink.util.FlinkRuntimeException: Failed to create the dir: 
/opt/flink/artifacts/jim/basic-session-deployment-only-example/basic-session-job-only-example
{code}
A few ways to solve the problem are:
1. Remove the comment on line 34 here in 
[flink-conf.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/conf/flink-conf.yaml#L34]
 and change it to: /tmp/flink/artifacts

2. Append this after line 143 here in 
[values.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/values.yaml#L142]:
kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts

3. Change it in line 142 of 
[KubernetesOperatorConfigOptions.java|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java#L142]
 like this:
.defaultValue("/tmp/flink/artifacts") 
and then rebuilding the operator image.
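
For completeness, option 2 as a values.yaml override might look like this (a sketch 
assuming the chart's defaultConfiguration section; untested):
{code:yaml}
defaultConfiguration:
  create: true
  append: true
  flink-conf.yaml: |+
    kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
{code}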








[jira] [Comment Edited] (FLINK-21383) Docker image does not play well together with ConfigMap based flink-conf.yamls

2023-01-05 Thread Arseniy Tashoyan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655112#comment-17655112
 ] 

Arseniy Tashoyan edited comment on FLINK-21383 at 1/5/23 8:04 PM:
--

One more problem caused by this issue: it is impossible to use environment 
variables in _flink-conf.yaml_

For example:
{code:yaml}
metrics.reporter.custom_reporter.username: ${reporter_username}
metrics.reporter.custom_reporter.password: ${reporter_password}
{code}
The script _docker-entrypoint.sh_ attempts to rewrite 
_flink-conf.yaml_ with resolved environment variables:
{code:bash}
envsubst < "${CONF_FILE}" > "${CONF_FILE}.tmp" && mv "${CONF_FILE}.tmp" "${CONF_FILE}"
{code}
This attempt fails because the ConfigMap is mounted read-only:
{code:none}
/docker-entrypoint.sh: line 89: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
{code}
Normally, Flink should resolve environment variables when reading values from 
{_}flink-conf.yaml{_}. A proper way could be to use a template engine like 
Apache Velocity.
 
UPDATE: The snakeyaml library used by Flink seems to support environment 
variables: 
[https://bitbucket.org/snakeyaml/snakeyaml/wiki/Variable%20substitution]. This 
library could be used to parse {_}flink-conf.yaml{_}.
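
As a rough illustration, wiring that substitution up could look like the following 
(a minimal sketch assuming snakeyaml 1.26+ and its EnvScalarConstructor; not how 
Flink currently loads its configuration):
{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;

import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.env.EnvScalarConstructor;

public class EnvAwareConfLoader {
    public static Map<String, Object> load(String path) throws Exception {
        // EnvScalarConstructor resolves ${VAR} scalars against the environment.
        Yaml yaml = new Yaml(new EnvScalarConstructor());
        yaml.addImplicitResolver(
                EnvScalarConstructor.ENV_TAG, EnvScalarConstructor.ENV_FORMAT, "$");
        try (InputStream in = new FileInputStream(path)) {
            return yaml.load(in);
        }
    }
}
{code}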


was (Author: tashoyan):
One more problem caused by this issue: it is impossible to use environment 
variables in _flink-conf.yaml_

For example:
{code:yaml}
metrics.reporter.custom_reporter.username: ${reporter_username}
metrics.reporter.custom_reporter.password: ${reporter_password}
{code}

The script _docker-entrypoint.sh_ attempts to rewrite 
_flink-conf.yaml_ with resolved environment variables:
{code:bash}
envsubst < "${CONF_FILE}" > "${CONF_FILE}.tmp" && mv "${CONF_FILE}.tmp" "${CONF_FILE}"
{code}
This attempt fails because the ConfigMap is mounted read-only:
{code:none}
/docker-entrypoint.sh: line 89: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
{code}
Normally, Flink should resolve environment variables when reading values from 
{_}flink-conf.yaml{_}. A proper way could be to use a template engine like 
Apache Velocity.
 
 

> Docker image does not play well together with ConfigMap based flink-conf.yamls
> --
>
> Key: FLINK-21383
> URL: https://issues.apache.org/jira/browse/FLINK-21383
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, flink-docker
>Affects Versions: 1.11.6, 1.12.7, 1.13.5, 1.14.3
>Reporter: Till Rohrmann
>Priority: Minor
>  Labels: auto-deprioritized-major, usability
>
> Flink's Docker image does not play well together with ConfigMap based 
> flink-conf.yamls. The {{docker-entrypoint.sh}} script offers a few env 
> variables to overwrite configuration values (e.g. {{FLINK_PROPERTIES}}, 
> {{JOB_MANAGER_RPC_ADDRESS}}, etc.). The problem is that the entrypoint script 
> assumes that it can modify the existing {{flink-conf.yaml}}. This is not the 
> case if the {{flink-conf.yaml}} is based on a {{ConfigMap}}.
> Making things worse, failures updating the {{flink-conf.yaml}} are not 
> reported. Moreover, the called {{jobmanager.sh}} and {{taskmanager.sh}} 
> scripts don't support to pass in dynamic configuration properties into the 
> processes.
> I think the problem is that our assumption that we can modify the 
> {{flink-conf.yaml}} does not always hold true. If we updated the final 
> configuration from within the Flink process (dynamic properties and env 
> variables), then this problem could be avoided.





[jira] [Comment Edited] (FLINK-21383) Docker image does not play well together with ConfigMap based flink-conf.yamls

2023-01-05 Thread Arseniy Tashoyan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655112#comment-17655112
 ] 

Arseniy Tashoyan edited comment on FLINK-21383 at 1/5/23 7:39 PM:
--

One more problem caused by this issue: it is impossible to use environment 
variables in _flink-conf.yaml_

For example:
{code:yaml}
metrics.reporter.custom_reporter.username: ${reporter_username}
metrics.reporter.custom_reporter.password: ${reporter_password}
{code}

The script _docker-entrypoint.sh_ attempts to rewrite 
_flink-conf.yaml_ with resolved environment variables:
{code:bash}
envsubst < "${CONF_FILE}" > "${CONF_FILE}.tmp" && mv "${CONF_FILE}.tmp" "${CONF_FILE}"
{code}
This attempt fails because the ConfigMap is mounted read-only:
{code:none}
/docker-entrypoint.sh: line 89: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
{code}
Normally, Flink should resolve environment variables when reading values from 
{_}flink-conf.yaml{_}. A proper way could be to use a template engine like 
Apache Velocity.
 
 


was (Author: tashoyan):
One more problem caused by this issue: it is impossible to use environment 
variables in _flink-conf.yaml_

For example:
{code:yaml}
metrics.reporter.custom_reporter.username: ${reporter_username}
metrics.reporter.custom_reporter.password: ${reporter_password}
{code}
  
The script _docker-entrypoint.sh_ attempts to rewrite 
_flink-conf.yaml_ with resolved environment variables:

{code:bash}
envsubst < "${CONF_FILE}" > "${CONF_FILE}.tmp" && mv "${CONF_FILE}.tmp" "${CONF_FILE}"
{code}

This attempt fails because the ConfigMap is mounted read-only:
 
{code:none}
/docker-entrypoint.sh: line 89: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
{code}
Normally, Flink should resolve environment variables when reading values from 
{_}flink-conf.yaml{_}. A proper way could be to use a template engine like 
Apache Velocity.
 
 

> Docker image does not play well together with ConfigMap based flink-conf.yamls
> --
>
> Key: FLINK-21383
> URL: https://issues.apache.org/jira/browse/FLINK-21383
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, flink-docker
>Affects Versions: 1.11.6, 1.12.7, 1.13.5, 1.14.3
>Reporter: Till Rohrmann
>Priority: Minor
>  Labels: auto-deprioritized-major, usability
>
> Flink's Docker image does not play well together with ConfigMap based 
> flink-conf.yamls. The {{docker-entrypoint.sh}} script offers a few env 
> variables to overwrite configuration values (e.g. {{FLINK_PROPERTIES}}, 
> {{JOB_MANAGER_RPC_ADDRESS}}, etc.). The problem is that the entrypoint script 
> assumes that it can modify the existing {{flink-conf.yaml}}. This is not the 
> case if the {{flink-conf.yaml}} is based on a {{ConfigMap}}.
> Making things worse, failures updating the {{flink-conf.yaml}} are not 
> reported. Moreover, the called {{jobmanager.sh}} and {{taskmanager.sh}} 
> scripts don't support to pass in dynamic configuration properties into the 
> processes.
> I think the problem is that our assumption that we can modify the 
> {{flink-conf.yaml}} does not always hold true. If we updated the final 
> configuration from within the Flink process (dynamic properties and env 
> variables), then this problem could be avoided.





[jira] [Comment Edited] (FLINK-21383) Docker image does not play well together with ConfigMap based flink-conf.yamls

2023-01-05 Thread Arseniy Tashoyan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655112#comment-17655112
 ] 

Arseniy Tashoyan edited comment on FLINK-21383 at 1/5/23 7:38 PM:
--

One more problem caused by this issue: it is impossible to use environment 
variables in _flink-conf.yaml_

For example:
{code:yaml}
metrics.reporter.custom_reporter.username: ${reporter_username}
metrics.reporter.custom_reporter.password: ${reporter_password}
{code}
  
The script _docker-entrypoint.sh_ attempts to rewrite 
_flink-conf.yaml_ with resolved environment variables:

{code:bash}
envsubst < "${CONF_FILE}" > "${CONF_FILE}.tmp" && mv "${CONF_FILE}.tmp" "${CONF_FILE}"
{code}

This attempt fails because the ConfigMap is mounted read-only:
 
{code:none}
/docker-entrypoint.sh: line 89: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
{code}
Normally, Flink should resolve environment variables when reading values from 
{_}flink-conf.yaml{_}. A proper way could be to use a template engine like 
Apache Velocity.
 
 


was (Author: tashoyan):
One more problem caused by this issue: it is impossible to use environment 
variables in _flink-conf.yaml_

For example:
{code:yaml}
metrics.reporter.custom_reporter.username: ${reporter_username}
metrics.reporter.custom_reporter.password: ${reporter_password}
{code}
  
The script _docker-entrypoint.sh_ attempts to rewrite 
_flink-conf.yaml_ with resolved environment variables:
 
{code:bash}
envsubst < "${CONF_FILE}" > "${CONF_FILE}.tmp" && mv "${CONF_FILE}.tmp" "${CONF_FILE}"
{code}
This attempt fails because the ConfigMap is mounted read-only:
 
{code:none}
/docker-entrypoint.sh: line 89: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
{code}
Normally, Flink should resolve environment variables when reading values from 
{_}flink-conf.yaml{_}. A proper way could be to use a template engine like 
Apache Velocity.
 
 

> Docker image does not play well together with ConfigMap based flink-conf.yamls
> --
>
> Key: FLINK-21383
> URL: https://issues.apache.org/jira/browse/FLINK-21383
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, flink-docker
>Affects Versions: 1.11.6, 1.12.7, 1.13.5, 1.14.3
>Reporter: Till Rohrmann
>Priority: Minor
>  Labels: auto-deprioritized-major, usability
>
> Flink's Docker image does not play well together with ConfigMap based 
> flink-conf.yamls. The {{docker-entrypoint.sh}} script offers a few env 
> variables to overwrite configuration values (e.g. {{FLINK_PROPERTIES}}, 
> {{JOB_MANAGER_RPC_ADDRESS}}, etc.). The problem is that the entrypoint script 
> assumes that it can modify the existing {{flink-conf.yaml}}. This is not the 
> case if the {{flink-conf.yaml}} is based on a {{ConfigMap}}.
> Making things worse, failures updating the {{flink-conf.yaml}} are not 
> reported. Moreover, the called {{jobmanager.sh}} and {{taskmanager.sh}} 
> scripts don't support to pass in dynamic configuration properties into the 
> processes.
> I think the problem is that our assumption that we can modify the 
> {{flink-conf.yaml}} does not always hold true. If we updated the final 
> configuration from within the Flink process (dynamic properties and env 
> variables), then this problem could be avoided.





[jira] [Commented] (FLINK-21383) Docker image does not play well together with ConfigMap based flink-conf.yamls

2023-01-05 Thread Arseniy Tashoyan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655112#comment-17655112
 ] 

Arseniy Tashoyan commented on FLINK-21383:
--

One more problem caused by this issue: it is impossible to use environment 
variables in _flink-conf.yaml_

For example:
{code:yaml}
metrics.reporter.custom_reporter.username: ${reporter_username}
metrics.reporter.custom_reporter.password: ${reporter_password}
{code}
  
The script _docker-entrypoint.sh_ attempts to rewrite 
_flink-conf.yaml_ with resolved environment variables:
 
{code:bash}
envsubst < "${CONF_FILE}" > "${CONF_FILE}.tmp" && mv "${CONF_FILE}.tmp" "${CONF_FILE}"
{code}
This attempt fails because the ConfigMap is mounted read-only:
 
{code:none}
/docker-entrypoint.sh: line 89: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
{code}
Normally, Flink should resolve environment variables when reading values from 
{_}flink-conf.yaml{_}. A proper way could be to use a template engine like 
Apache Velocity.
 
 

> Docker image does not play well together with ConfigMap based flink-conf.yamls
> --
>
> Key: FLINK-21383
> URL: https://issues.apache.org/jira/browse/FLINK-21383
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, flink-docker
>Affects Versions: 1.11.6, 1.12.7, 1.13.5, 1.14.3
>Reporter: Till Rohrmann
>Priority: Minor
>  Labels: auto-deprioritized-major, usability
>
> Flink's Docker image does not play well together with ConfigMap based 
> flink-conf.yamls. The {{docker-entrypoint.sh}} script offers a few env 
> variables to overwrite configuration values (e.g. {{FLINK_PROPERTIES}}, 
> {{JOB_MANAGER_RPC_ADDRESS}}, etc.). The problem is that the entrypoint script 
> assumes that it can modify the existing {{flink-conf.yaml}}. This is not the 
> case if the {{flink-conf.yaml}} is based on a {{ConfigMap}}.
> Making things worse, failures updating the {{flink-conf.yaml}} are not 
> reported. Moreover, the called {{jobmanager.sh}} and {{taskmanager.sh}} 
> scripts don't support to pass in dynamic configuration properties into the 
> processes.
> I think the problem is that our assumption that we can modify the 
> {{flink-conf.yaml}} does not always hold true. If we updated the final 
> configuration from within the Flink process (dynamic properties and env 
> variables), then this problem could be avoided.





[jira] [Closed] (FLINK-29607) Simplify controller flow by introducing FlinkControllerContext

2023-01-05 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-29607.
--
Fix Version/s: kubernetes-operator-1.4.0
   Resolution: Fixed

merged to main 35604761eac2c75491811e731ffba540bc7e0363

> Simplify controller flow by introducing FlinkControllerContext
> --
>
> Key: FLINK-29607
> URL: https://issues.apache.org/jira/browse/FLINK-29607
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.4.0
>
>
> Currently contextual information such as observer/reconciler implementations, 
> the flink service, the status recorder and generated configs is created and 
> passed around in many pieces, leading to a lot of code duplication in the system.
> We should introduce a context object that captures these bits and can be 
> reused across the controller flow to simplify the logic.
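
As a rough sketch of the shape this takes (names here are hypothetical; the actual 
classes are in the linked PR):
{code:java}
import org.apache.flink.configuration.Configuration;

/**
 * Minimal sketch: collaborators are created once per reconcile loop and
 * passed around as a single unit instead of as individual parameters.
 */
public final class ResourceContextSketch<R> {

    private final R resource;
    private final Configuration observeConfig;

    public ResourceContextSketch(R resource, Configuration observeConfig) {
        this.resource = resource;
        this.observeConfig = observeConfig;
    }

    public R getResource() {
        return resource;
    }

    /** Further derived pieces (services, status recorder) would hang off here. */
    public Configuration getObserveConfig() {
        return observeConfig;
    }
}
{code}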





[GitHub] [flink-kubernetes-operator] gyfora merged pull request #501: [FLINK-29607] Introduce FlinkResourceContext to simplify observer/reconciler logic

2023-01-05 Thread GitBox


gyfora merged PR #501:
URL: https://github.com/apache/flink-kubernetes-operator/pull/501





[jira] [Commented] (FLINK-30231) Update to Fabric8 Kubernetes Client to a version that has automatic renewal of service account tokens

2023-01-05 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655091#comment-17655091
 ] 

Martijn Visser commented on FLINK-30231:


[~cthomson] [~liadsh] This has been fixed for Flink 1.17. We don't backport 
these changes to Flink 1.15 or Flink 1.16, since that would introduce new 
functionality into patch versions of Flink. In case you need it there, you can 
manually cherry-pick the change and build a patched Flink version yourself.
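
For reference, that flow is roughly (commit hash taken from this ticket; the 
target branch and build flags are up to you):
{code:bash}
git clone https://github.com/apache/flink.git && cd flink
git checkout release-1.16
git cherry-pick 1028ee3285257d39312a2c9b0f91847cca1a2e68
mvn clean install -DskipTests
{code}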

> Update to Fabric8 Kubernetes Client to a version that has automatic renewal 
> of service account tokens
> -
>
> Key: FLINK-30231
> URL: https://issues.apache.org/jira/browse/FLINK-30231
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.15.2
> Environment: Kubernetes 1.23 environment provided by Amazon Web 
> Services managed Kubernetes service (EKS), using Flink 1.15.2.
>Reporter: Chris Thomson
>Assignee: ouyangwulin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> The Fabric8 Kubernetes Client library was updated to account for Kubernetes 
> configuration changes that result in service account tokens becoming bounded 
> in duration, needing to be renewed after an hour. The AWS managed Kubernetes 
> service (AWS EKS) currently has a configuration change that extends the one 
> hour bounded duration for the account to 90 days, but this will eventually be 
> removed by AWS and already produces warnings.
> It appears that Fabric8 Kubernetes Client library version 5.12.4 is the 
> closest version to 5.12.3 that is currently in use by the Apache Flink 
> project to contain https://github.com/fabric8io/kubernetes-client/issues/2271.





[jira] [Assigned] (FLINK-30231) Update to Fabric8 Kubernetes Client to a version that has automatic renewal of service account tokens

2023-01-05 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser reassigned FLINK-30231:
--

Assignee: ouyangwulin

> Update to Fabric8 Kubernetes Client to a version that has automatic renewal 
> of service account tokens
> -
>
> Key: FLINK-30231
> URL: https://issues.apache.org/jira/browse/FLINK-30231
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.15.2
> Environment: Kubernetes 1.23 environment provided by Amazon Web 
> Services managed Kubernetes service (EKS), using Flink 1.15.2.
>Reporter: Chris Thomson
>Assignee: ouyangwulin
>Priority: Minor
>  Labels: pull-request-available
>
> The Fabric8 Kubernetes Client library was updated to account for Kubernetes 
> configuration changes that result in service account tokens becoming bounded 
> in duration, needing to be renewed after an hour. The AWS managed Kubernetes 
> service (AWS EKS) currently has a configuration change that extends the one 
> hour bounded duration for the account to 90 days, but this will eventually be 
> removed by AWS and already produces warnings.
> It appears that Fabric8 Kubernetes Client library version 5.12.4 is the 
> closest version to 5.12.3 that is currently in use by the Apache Flink 
> project to contain https://github.com/fabric8io/kubernetes-client/issues/2271.





[jira] [Closed] (FLINK-30231) Update to Fabric8 Kubernetes Client to a version that has automatic renewal of service account tokens

2023-01-05 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-30231.
--
Fix Version/s: 1.17.0
   Resolution: Fixed

Fixed in master: 1028ee3285257d39312a2c9b0f91847cca1a2e68

> Update to Fabric8 Kubernetes Client to a version that has automatic renewal 
> of service account tokens
> -
>
> Key: FLINK-30231
> URL: https://issues.apache.org/jira/browse/FLINK-30231
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.15.2
> Environment: Kubernetes 1.23 environment provided by Amazon Web 
> Services managed Kubernetes service (EKS), using Flink 1.15.2.
>Reporter: Chris Thomson
>Assignee: ouyangwulin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> The Fabric8 Kubernetes Client library was updated to account for Kubernetes 
> configuration changes that result in service account tokens becoming bounded 
> in duration, needing to be renewed after an hour. The AWS managed Kubernetes 
> service (AWS EKS) currently has a configuration change that extends the one 
> hour bounded duration for the account to 90 days, but this will eventually be 
> removed by AWS and already produces warnings.
> It appears that Fabric8 Kubernetes Client library version 5.12.4 is the 
> closest version to 5.12.3 that is currently in use by the Apache Flink 
> project to contain https://github.com/fabric8io/kubernetes-client/issues/2271.





[GitHub] [flink] MartijnVisser merged pull request #21599: [FLINK-30231][kubernetes] Update to Fabric8 Kubernetes Client to 5.12…

2023-01-05 Thread GitBox


MartijnVisser merged PR #21599:
URL: https://github.com/apache/flink/pull/21599





[jira] [Commented] (FLINK-30562) CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+

2023-01-05 Thread Thomas Wozniakowski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655070#comment-17655070
 ] 

Thomas Wozniakowski commented on FLINK-30562:
-

 [^flink-asf-30562-clean.zip] 

I've produced a (relatively) simple project here that reproduces the problem. 
Please let me know if you have any questions.
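
For anyone skimming without downloading the attachment, the failing setup is 
roughly of this shape (a hand-written sketch with a trivial event type, not the 
attached project; whether declaring idleness changes the behaviour is one of the 
open questions):
{code:java}
import java.time.Duration;
import java.util.Arrays;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CepWatermarkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2); // the problem is only reported with parallelism > 1

        // Timestamps stand in for real events; fromCollection is a
        // non-parallel source, so downstream subtasks that receive no
        // records stay idle unless idleness is declared.
        DataStream<Long> events =
                env.fromCollection(Arrays.asList(1_000L, 2_000L, 3_000L))
                        .assignTimestampsAndWatermarks(
                                WatermarkStrategy.<Long>forMonotonousTimestamps()
                                        .withTimestampAssigner((e, ts) -> e)
                                        .withIdleness(Duration.ofSeconds(1)));

        // A keyed CEP pattern would be applied here; print() keeps the sketch minimal.
        events.print();
        env.execute("flink-30562-sketch");
    }
}
{code}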

> CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+
> 
>
> Key: FLINK-30562
> URL: https://issues.apache.org/jira/browse/FLINK-30562
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Library / CEP
>Affects Versions: 1.16.0, 1.15.3
> Environment: Problem observed in:
> Production:
> Dockerised Flink cluster running in AWS Fargate, sourced from AWS Kinesis and 
> sinking to AWS SQS
> Local:
> Completely local MiniCluster based test with no external sinks or sources
>Reporter: Thomas Wozniakowski
>Priority: Major
> Attachments: flink-asf-30562-clean.zip
>
>
> (Apologies for the speculative and somewhat vague ticket, but I wanted to 
> raise this while I am investigating to see if anyone has suggestions to help 
> me narrow down the problem.)
> We are encountering an issue where our streaming Flink job has stopped 
> working correctly since Flink 1.15.3. This problem is also present on Flink 
> 1.16.0. The Keyed CEP operators that our job uses are no longer emitting 
> Patterns reliably, but critically *this is only happening when parallelism is 
> set to a value greater than 1*. 
> Our local build tests were previously set up using in-JVM `MiniCluster` 
> instances, or dockerised Flink clusters all set with a parallelism of 1, so 
> this problem was not caught and it caused an outage when we upgraded the 
> cluster version in production.
> Observing the job using the Flink console in production, I can see that 
> events are *arriving* into the Keyed CEP operators, but no Pattern events are 
> being emitted out of any of the operators. Furthermore, all the reported 
> Watermark values are zero, though I don't know if that is a red herring, as 
> Watermark reporting seems to have changed since 1.14.x.
> I am currently attempting to create a stripped down version of our streaming 
> job to demonstrate the problem, but this is quite tricky to set up. In the 
> meantime I would appreciate any hints that could point me in the right 
> direction.
> I have isolated the problem to the Keyed CEP operator by removing our real 
> sinks and sources from the failing test. I am still seeing the erroneous 
> behaviour when setting up a job as:
> # Events are read from a list using `env.fromCollection( ... )`
> # CEP operator processes events
> # Output is captured in another list for assertions
> My best guess at the moment is something to do with Watermark emission? There 
> seems to have been changes related to watermark alignment, perhaps this has 
> caused some kind of regression in the CEP library? To reiterate, *this 
> problem only occurs with parallelism of 2 or more. Setting the parallelism to 
> 1 immediately fixes the issue*





[jira] [Updated] (FLINK-30562) CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+

2023-01-05 Thread Thomas Wozniakowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Wozniakowski updated FLINK-30562:

Attachment: flink-asf-30562-clean.zip

> CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+
> 
>
> Key: FLINK-30562
> URL: https://issues.apache.org/jira/browse/FLINK-30562
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Library / CEP
>Affects Versions: 1.16.0, 1.15.3
> Environment: Problem observed in:
> Production:
> Dockerised Flink cluster running in AWS Fargate, sourced from AWS Kinesis and 
> sinking to AWS SQS
> Local:
> Completely local MiniCluster based test with no external sinks or sources
>Reporter: Thomas Wozniakowski
>Priority: Major
> Attachments: flink-asf-30562-clean.zip
>
>
> (Apologies for the speculative and somewhat vague ticket, but I wanted to 
> raise this while I am investigating to see if anyone has suggestions to help 
> me narrow down the problem.)
> We are encountering an issue where our streaming Flink job has stopped 
> working correctly since Flink 1.15.3. This problem is also present on Flink 
> 1.16.0. The Keyed CEP operators that our job uses are no longer emitting 
> Patterns reliably, but critically *this is only happening when parallelism is 
> set to a value greater than 1*. 
> Our local build tests were previously set up using in-JVM `MiniCluster` 
> instances, or dockerised Flink clusters all set with a parallelism of 1, so 
> this problem was not caught and it caused an outage when we upgraded the 
> cluster version in production.
> Observing the job using the Flink console in production, I can see that 
> events are *arriving* into the Keyed CEP operators, but no Pattern events are 
> being emitted out of any of the operators. Furthermore, all the reported 
> Watermark values are zero, though I don't know if that is a red herring, as 
> Watermark reporting seems to have changed since 1.14.x.
> I am currently attempting to create a stripped down version of our streaming 
> job to demonstrate the problem, but this is quite tricky to set up. In the 
> meantime I would appreciate any hints that could point me in the right 
> direction.
> I have isolated the problem to the Keyed CEP operator by removing our real 
> sinks and sources from the failing test. I am still seeing the erroneous 
> behaviour when setting up a job as:
> # Events are read from a list using `env.fromCollection( ... )`
> # CEP operator processes events
> # Output is captured in another list for assertions
> My best guess at the moment is something to do with Watermark emission? There 
> seems to have been changes related to watermark alignment, perhaps this has 
> caused some kind of regression in the CEP library? To reiterate, *this 
> problem only occurs with parallelism of 2 or more. Setting the parallelism to 
> 1 immediately fixes the issue*





[jira] [Commented] (FLINK-30576) JdbcOutputFormat refactor

2023-01-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/FLINK-30576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655058#comment-17655058
 ] 

João Boto commented on FLINK-30576:
---

Moved to sub-task.

Or do you prefer all changes in the same PR?

> JdbcOutputFormat refactor
> -
>
> Key: FLINK-30576
> URL: https://issues.apache.org/jira/browse/FLINK-30576
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Priority: Major
>
> This refactor is to allow the use of JdbcOutputFormat on Sink2.
> Currently the JdbcOutputFormat needs the RuntimeContext to check whether 
> object reuse is active.
> The refactor changes RuntimeContext to ExecutionConfig (we still need 
> ExecutionConfig to be available on Sink2.InitContext; a FLIP will be raised).
>  
> [~wanglijie] this is what we talked about
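
For illustration, the check the format needs boils down to something like this 
(sketch only, class and method names hypothetical):
{code:java}
import org.apache.flink.api.common.ExecutionConfig;

public class ObjectReuseCheckSketch {
    /**
     * With object reuse enabled, the same record instance may be mutated by
     * the runtime after the sink sees it, so buffered JDBC writes must copy
     * records defensively. ExecutionConfig exposes this directly, without a
     * RuntimeContext.
     */
    public static boolean copyBeforeBuffering(ExecutionConfig config) {
        return config.isObjectReuseEnabled();
    }
}
{code}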





[jira] [Updated] (FLINK-30576) JdbcOutputFormat refactor

2023-01-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-30576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

João Boto updated FLINK-30576:
--
Parent: FLINK-28284
Issue Type: Sub-task  (was: Improvement)

> JdbcOutputFormat refactor
> -
>
> Key: FLINK-30576
> URL: https://issues.apache.org/jira/browse/FLINK-30576
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Reporter: João Boto
>Priority: Major
>
> This refactor is to allow the use of JdbcOutputFormat on Sink2.
> Currently the JdbcOutputFormat needs the RuntimeContext to check whether 
> object reuse is active.
> The refactor changes RuntimeContext to ExecutionConfig (we still need 
> ExecutionConfig to be available on Sink2.InitContext; a FLIP will be raised).
>  
> [~wanglijie] this is what we talked about





[GitHub] [flink] gaborgsomogyi commented on a diff in pull request #21604: [FLINK-30425][runtime][security] Generalize token receive side

2023-01-05 Thread GitBox


gaborgsomogyi commented on code in PR #21604:
URL: https://github.com/apache/flink/pull/21604#discussion_r1062654455


##
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DefaultDelegationTokenManager.java:
##
@@ -58,6 +59,13 @@
 @Internal
 public class DefaultDelegationTokenManager implements DelegationTokenManager {
 
+private static final String PROVIDER_RECEIVER_INCONSISTENCY_ERROR =
+" There is an inconsistency between loaded delegation token 
providers and receivers. "

Review Comment:
   Fixed.






[GitHub] [flink] gaborgsomogyi commented on a diff in pull request #21604: [FLINK-30425][runtime][security] Generalize token receive side

2023-01-05 Thread GitBox


gaborgsomogyi commented on code in PR #21604:
URL: https://github.com/apache/flink/pull/21604#discussion_r1062653461


##
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DefaultDelegationTokenManager.java:
##
@@ -118,10 +133,13 @@ private Map<String, DelegationTokenProvider> loadProviders() {
 provider.serviceName());
 }
 } catch (Exception | NoClassDefFoundError e) {
-LOG.warn(
+// The intentional general rule is that if a provider's init 
method throws exception
+// then stop the workload
+LOG.error(
 "Failed to initialize delegation token provider {}",
 provider.serviceName(),
 e);
+throw new RuntimeException(e);

Review Comment:
   Makes sense to use `FlinkRuntimeException`, changed on both places.






[GitHub] [flink] XComp commented on a diff in pull request #21604: [FLINK-30425][runtime][security] Generalize token receive side

2023-01-05 Thread GitBox


XComp commented on code in PR #21604:
URL: https://github.com/apache/flink/pull/21604#discussion_r1062636835


##
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DefaultDelegationTokenManager.java:
##
@@ -118,10 +133,13 @@ private Map<String, DelegationTokenProvider> loadProviders() {
 provider.serviceName());
 }
 } catch (Exception | NoClassDefFoundError e) {
-LOG.warn(
+// The intentional general rule is that if a provider's init 
method throws exception
+// then stop the workload
+LOG.error(
 "Failed to initialize delegation token provider {}",
 provider.serviceName(),
 e);
+throw new RuntimeException(e);

Review Comment:
   Shouldn't we do either logging or throwing the exception? If we decide to 
throw, adding the log message as the error message might be useful. :thinking: 
`FlinkRuntimeException` might be a good fit since it indicates that the 
exception was actually initiated by Flink code. ...just as an idea.
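
   A sketch of that log-and-rethrow shape (hypothetical wrapper, not the actual patch):

```java
import org.apache.flink.util.FlinkRuntimeException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ProviderInitSketch {
    private static final Logger LOG = LoggerFactory.getLogger(ProviderInitSketch.class);

    static void initProvider(String serviceName, Runnable init) {
        try {
            init.run();
        } catch (Exception | NoClassDefFoundError e) {
            // One message, used both for the log and the exception, so the
            // failure carries context wherever it surfaces.
            String message = "Failed to initialize delegation token provider " + serviceName;
            LOG.error(message, e);
            throw new FlinkRuntimeException(message, e);
        }
    }
}
```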



##
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DefaultDelegationTokenManager.java:
##
@@ -58,6 +59,13 @@
 @Internal
 public class DefaultDelegationTokenManager implements DelegationTokenManager {
 
+private static final String PROVIDER_RECEIVER_INCONSISTENCY_ERROR =
+" There is an inconsistency between loaded delegation token 
providers and receivers. "

Review Comment:
   ```suggestion
   "There is an inconsistency between loaded delegation token 
providers and receivers. "
   ```






[jira] [Updated] (FLINK-30562) CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+

2023-01-05 Thread Thomas Wozniakowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Wozniakowski updated FLINK-30562:

Component/s: API / DataStream

> CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+
> 
>
> Key: FLINK-30562
> URL: https://issues.apache.org/jira/browse/FLINK-30562
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Library / CEP
>Affects Versions: 1.16.0, 1.15.3
> Environment: Problem observed in:
> Production:
> Dockerised Flink cluster running in AWS Fargate, sourced from AWS Kinesis and 
> sinking to AWS SQS
> Local:
> Completely local MiniCluster based test with no external sinks or sources
>Reporter: Thomas Wozniakowski
>Priority: Major
>
> (Apologies for the speculative and somewhat vague ticket, but I wanted to 
> raise this while I am investigating to see if anyone has suggestions to help 
> me narrow down the problem.)
> We are encountering an issue where our streaming Flink job has stopped 
> working correctly since Flink 1.15.3. This problem is also present on Flink 
> 1.16.0. The Keyed CEP operators that our job uses are no longer emitting 
> Patterns reliably, but critically *this is only happening when parallelism is 
> set to a value greater than 1*. 
> Our local build tests were previously set up using in-JVM `MiniCluster` 
> instances, or dockerised Flink clusters all set with a parallelism of 1, so 
> this problem was not caught and it caused an outage when we upgraded the 
> cluster version in production.
> Observing the job using the Flink console in production, I can see that 
> events are *arriving* into the Keyed CEP operators, but no Pattern events are 
> being emitted out of any of the operators. Furthermore, all the reported 
> Watermark values are zero, though I don't know if that is a red herring, as 
> Watermark reporting seems to have changed since 1.14.x.
> I am currently attempting to create a stripped down version of our streaming 
> job to demonstrate the problem, but this is quite tricky to set up. In the 
> meantime I would appreciate any hints that could point me in the right 
> direction.
> I have isolated the problem to the Keyed CEP operator by removing our real 
> sinks and sources from the failing test. I am still seeing the erroneous 
> behaviour when setting up a job as:
> # Events are read from a list using `env.fromCollection( ... )`
> # CEP operator processes events
> # Output is captured in another list for assertions
> My best guess at the moment is something to do with Watermark emission? There 
> seems to have been changes related to watermark alignment, perhaps this has 
> caused some kind of regression in the CEP library? To reiterate, *this 
> problem only occurs with parallelism of 2 or more. Setting the parallelism to 
> 1 immediately fixes the issue*





[jira] [Updated] (FLINK-30562) CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+

2023-01-05 Thread Thomas Wozniakowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Wozniakowski updated FLINK-30562:

Summary: CEP Operator misses patterns on SideOutputs and parallelism >1 
since 1.15.x+  (was: Patterns are not emitted with parallelism >1 since 1.15.x+)

> CEP Operator misses patterns on SideOutputs and parallelism >1 since 1.15.x+
> 
>
> Key: FLINK-30562
> URL: https://issues.apache.org/jira/browse/FLINK-30562
> Project: Flink
>  Issue Type: Bug
>  Components: Library / CEP
>Affects Versions: 1.16.0, 1.15.3
> Environment: Problem observed in:
> Production:
> Dockerised Flink cluster running in AWS Fargate, sourced from AWS Kinesis and 
> sinking to AWS SQS
> Local:
> Completely local MiniCluster based test with no external sinks or sources
>Reporter: Thomas Wozniakowski
>Priority: Major
>
> (Apologies for the speculative and somewhat vague ticket, but I wanted to 
> raise this while I am investigating to see if anyone has suggestions to help 
> me narrow down the problem.)
> We are encountering an issue where our streaming Flink job has stopped 
> working correctly since Flink 1.15.3. This problem is also present on Flink 
> 1.16.0. The Keyed CEP operators that our job uses are no longer emitting 
> Patterns reliably, but critically *this is only happening when parallelism is 
> set to a value greater than 1*. 
> Our local build tests were previously set up using in-JVM `MiniCluster` 
> instances, or dockerised Flink clusters all set with a parallelism of 1, so 
> this problem was not caught and it caused an outage when we upgraded the 
> cluster version in production.
> Observing the job using the Flink console in production, I can see that 
> events are *arriving* into the Keyed CEP operators, but no Pattern events are 
> being emitted out of any of the operators. Furthermore, all the reported 
> Watermark values are zero, though I don't know if that is a red herring, as 
> Watermark reporting seems to have changed since 1.14.x.
> I am currently attempting to create a stripped down version of our streaming 
> job to demonstrate the problem, but this is quite tricky to set up. In the 
> meantime I would appreciate any hints that could point me in the right 
> direction.
> I have isolated the problem to the Keyed CEP operator by removing our real 
> sinks and sources from the failing test. I am still seeing the erroneous 
> behaviour when setting up a job as:
> # Events are read from a list using `env.fromCollection( ... )`
> # CEP operator processes events
> # Output is captured in another list for assertions
> My best guess at the moment is something to do with Watermark emission? There 
> seems to have been changes related to watermark alignment, perhaps this has 
> caused some kind of regression in the CEP library? To reiterate, *this 
> problem only occurs with parallelism of 2 or more. Setting the parallelism to 
> 1 immediately fixes the issue*




