[GitHub] [flink] luoyuxia commented on pull request #21149: [FLINK-29527][formats/parquet] Make unknownFieldsIndices work for single ParquetReader

2022-12-04 Thread GitBox


luoyuxia commented on PR #21149:
URL: https://github.com/apache/flink/pull/21149#issuecomment-1336860164

   @saLeox Thanks for the contribution.
   I had a quick look at the changes. They look reasonable to me.
   @Tartarus0zm Could you please take another look at these changes, since 
you are the author of [FLINK-23715]?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #410: [FLINK-30247] Introduce time travel reading for table store

2022-12-04 Thread GitBox


tsreaper commented on code in PR #410:
URL: https://github.com/apache/flink-table-store/pull/410#discussion_r1039215031


##
docs/layouts/shortcodes/generated/core_configuration.html:
##
@@ -8,6 +8,18 @@
 
 
 
+
+as-of-snapshot
+(none)
+Long
+Read snapshot specific by snapshot id.
+
+
+as-of-timestamp-mills

Review Comment:
   Why introduce a new configuration? Time travel reading is actually very 
similar to the "from-timestamp" startup mode in streaming jobs. I'm considering 
supporting startup mode for both batch and streaming jobs. See 
https://issues.apache.org/jira/browse/FLINK-30294 .



##
docs/content/docs/development/query-table.md:
##
@@ -126,3 +126,14 @@ SELECT * FROM MyTable$options;
 +++
 1 rows in set
 ```
+
+## Time Travel reading
+
+You can read snapshot specific by commit time or snapshot id from a table.
+```sql
+-- Read snapshot specific by commit time.
+SELECT * FROM T /+ OPTIONS('as-of-timestamp-mills'='121230')/;

Review Comment:
   Should be `/*+ OPTIONS(...) */`.
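
   For reference, the hint syntax the reviewer is pointing to, applied to the 
snippet under review (table name `T` and the option value are taken from the 
doc change, not from a released API):

   ```sql
   -- SQL hints use block-comment syntax: /*+ ... */, not /+ ... /.
   SELECT * FROM T /*+ OPTIONS('as-of-timestamp-mills'='121230') */;
   ```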






[jira] [Created] (FLINK-30294) Change table property key 'log.scan' to 'startup.mode' and add a default startup mode in Table Store

2022-12-04 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-30294:
---

 Summary: Change table property key 'log.scan' to 'startup.mode' 
and add a default startup mode in Table Store
 Key: FLINK-30294
 URL: https://issues.apache.org/jira/browse/FLINK-30294
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Caizhi Weng
Assignee: Caizhi Weng


We're introducing time-travel reading of Table Store for batch jobs. However, 
this reading mode is quite similar to the "from-timestamp" startup mode for 
streaming jobs; the difference is that "from-timestamp" streaming jobs only 
consume incremental data, not historical data.

We can support startup modes for both batch and streaming jobs. For batch jobs, 
the "from-timestamp" startup mode will produce all records from the last 
snapshot before the specified timestamp. For streaming jobs the behavior 
doesn't change.

Previously, in order to use the "from-timestamp" startup mode, users had to 
specify both "log.scan" and "log.scan.timestamp-millis", which is a little 
inconvenient. We can introduce a "default" startup mode whose behavior is based 
on the execution environment and other configurations. That way, to use the 
"from-timestamp" startup mode, it is enough for users to specify just 
"startup.timestamp-millis".
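
The proposal could be sketched as SQL dynamic table options. Note that the keys 
"startup.mode" and "startup.timestamp-millis" are the names proposed in this 
ticket, not a released API, and the timestamp value is illustrative:

```sql
-- Before (per this ticket): two keys are needed for from-timestamp reads.
SELECT * FROM T /*+ OPTIONS('log.scan' = 'from-timestamp',
                            'log.scan.timestamp-millis' = '1670112000000') */;

-- Proposed: with a "default" startup mode, the timestamp alone is enough.
SELECT * FROM T /*+ OPTIONS('startup.timestamp-millis' = '1670112000000') */;
```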



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2022-12-04 Thread Samrat Deb (Jira)


[ https://issues.apache.org/jira/browse/FLINK-28513 ]


Samrat Deb deleted comment on FLINK-28513:


was (Author: samrat007):
 
org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139)
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:83)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:256)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:247)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:240)
at 
org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:738)
at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:715)
at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78)
at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:477)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:309)
at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:307)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:222)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:84)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:168)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.UnsupportedOperationException: Cannot sync state to system like S3. 
Use persist() to create a persistent recoverable intermediate point.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.completeProcessing(SourceStreamTask.java:363)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:335)
Caused by: java.lang.UnsupportedOperationException: Cannot sync state to system 
like S3. Use persist() to create a persistent recoverable intermediate point.
at 
org.apache.flink.core.fs.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
at 
org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
at 
org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:111)
at 
org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:645)
at 

[jira] [Commented] (FLINK-28513) Flink Table API CSV streaming sink throws SerializedThrowable exception

2022-12-04 Thread Samrat Deb (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643135#comment-17643135
 ] 

Samrat Deb commented on FLINK-28513:


 
org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139)
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:83)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:256)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:247)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:240)
at 
org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:738)
at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:715)
at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78)
at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:477)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:309)
at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:307)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:222)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:84)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:168)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.UnsupportedOperationException: Cannot sync state to system like S3. 
Use persist() to create a persistent recoverable intermediate point.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.completeProcessing(SourceStreamTask.java:363)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:335)
Caused by: java.lang.UnsupportedOperationException: Cannot sync state to system 
like S3. Use persist() to create a persistent recoverable intermediate point.
at 
org.apache.flink.core.fs.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
at 
org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
at 
org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:111)
at 
org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:645)

[jira] [Updated] (FLINK-30293) Create an enumerator for static (batch)

2022-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-30293:
---
Labels: pull-request-available  (was: )

> Create an enumerator for static (batch)
> ---
>
> Key: FLINK-30293
> URL: https://issues.apache.org/jira/browse/FLINK-30293
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.3.0
>
>
> In FLINK-30207, we created an enumerator for continuous reading.
> We should also have an enumerator for static (batch) reading.
> For example, for the current read-compacted mode, time traveling may specify 
> the commit time used to read snapshots in the future.
> I think these capabilities need to be in the core, but should they be in 
> scan? (It seems they should not.)





[GitHub] [flink-table-store] zjureel opened a new pull request, #421: [FLINK-30293] Add static snapshot enumerator

2022-12-04 Thread GitBox


zjureel opened a new pull request, #421:
URL: https://github.com/apache/flink-table-store/pull/421

   Add a `StaticSnapshotEnumerator` to generate snapshots for 
`StaticFileStoreSource` in scan 





[jira] [Commented] (FLINK-30254) Sync Pulsar updates to external Pulsar connector repository

2022-12-04 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643129#comment-17643129
 ] 

Martijn Visser commented on FLINK-30254:


[~syhily] This will probably be done today or tomorrow at latest

> Sync Pulsar updates to external Pulsar connector repository
> ---
>
> Key: FLINK-30254
> URL: https://issues.apache.org/jira/browse/FLINK-30254
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Pulsar
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>
> Currently the external Pulsar repository contains the code from the 
> {release-1.16} branch. This should be synced with the changes that have been 
> merged into {master} since then.





[jira] [Commented] (FLINK-30254) Sync Pulsar updates to external Pulsar connector repository

2022-12-04 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643128#comment-17643128
 ] 

Yufan Sheng commented on FLINK-30254:
-

[~martijnvisser] I want to migrate all the pending PRs to the new repository. 
When can we get this resolved?

> Sync Pulsar updates to external Pulsar connector repository
> ---
>
> Key: FLINK-30254
> URL: https://issues.apache.org/jira/browse/FLINK-30254
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Pulsar
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>
> Currently the external Pulsar repository contains the code from the 
> {release-1.16} branch. This should be synced with the changes that have been 
> merged into {master} since then.





[GitHub] [flink-ml] jiangxin369 commented on a diff in pull request #183: [FLINK-29603] Add Transformer for StopWordsRemover

2022-12-04 Thread GitBox


jiangxin369 commented on code in PR #183:
URL: https://github.com/apache/flink-ml/pull/183#discussion_r1039164531


##
docs/content/docs/operators/feature/stopwordsremover.md:
##
@@ -0,0 +1,165 @@
+---
+title: "StopWordsRemover"
+weight: 1
+type: docs
+aliases:
+- /operators/feature/stopwordsremover.html
+---
+
+
+
+## StopWordsRemover
+
+A feature transformer that filters out stop words from input.
+
+Note: null values from input array are preserved unless adding null to 
stopWords
+explicitly.
+
+See Also: <a href="http://en.wikipedia.org/wiki/Stop_words">Stop words</a>
+(Wikipedia)
+
+### Input Columns
+
+| Param name | Type | Default | Description
|
+|:---|:-|:|:---|
+| inputCols  | String[] | `null`  | Arrays of strings containing stop words to 
remove. |
+
+### Output Columns
+
+| Param name | Type | Default | Description
|
+|:---|:-|:|:---|
+| outputCols | String[] | `null`  | Arrays of strings with stop words removed. 
|
+
+### Parameters
+
+| Key   | Default| Type
 | Required | Description   
 |
+|---||--|--||
+| inputCols | `null` | 
String[] | yes  | Input column names.   
 |
+| outputCols| `null` | 
String[] | yes  | Output column name.   
 |
+| stopWords | `StopWordsRemover.loadDefaultStopWords("english")` | 
String[] | no   | The words to be filtered out. 
 |
+| caseSensitive | `false`| Boolean 
 | no   | Whether to do a case-sensitive comparison over the stop words.
 |
+| locale| `StopWordsRemover.getDefaultOrUS().toString()` | String  
 | no   | Locale of the input for case insensitive matching. Ignored when 
caseSensitive is true. |

Review Comment:
   But we always write the markdown docs in Java style instead of Python style, 
just like the default value of `locale`, `StopWordsRemover.getDefaultOrUS().toString()`.







[GitHub] [flink] fsk119 commented on pull request #21133: [FLINK-29732][sql-gateway] support configuring session with SQL statement.

2022-12-04 Thread GitBox


fsk119 commented on PR #21133:
URL: https://github.com/apache/flink/pull/21133#issuecomment-1336779615

   I have added a commit to improve the code. Could you find some time to take 
a look, @yuzelin?





[jira] (FLINK-27176) FLIP-220: Binary Sorted State

2022-12-04 Thread Echo Lee (Jira)


[ https://issues.apache.org/jira/browse/FLINK-27176 ]


Echo Lee deleted comment on FLINK-27176:
--

was (Author: echo lee):
Hi [~danderson] 

Is there any progress on this feature? We want to optimize the temporal join 
operator, and I think this idea is very good.

I tried your POC, but found that the performance is lower than before. What do 
you think?

> FLIP-220: Binary Sorted State
> -
>
> Key: FLINK-27176
> URL: https://issues.apache.org/jira/browse/FLINK-27176
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Major
>
> Task for implementing FLIP-220: Binary Sorted State: 
> [https://cwiki.apache.org/confluence/x/Xo_FD]





[GitHub] [flink] flinkbot commented on pull request #21453: [hotfix][table-planner] fix tests for FLINK-28988 merged(no code conflict with latest master but a related test case failed)

2022-12-04 Thread GitBox


flinkbot commented on PR #21453:
URL: https://github.com/apache/flink/pull/21453#issuecomment-1336746345

   
   ## CI report:
   
   * 812fe5920a60dfc085c592b4da3802f88201605a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] lincoln-lil opened a new pull request, #21453: [hotfix][table-planner] fix tests for FLINK-28988 merged(no code conflict with latest master but a related test case failed)

2022-12-04 Thread GitBox


lincoln-lil opened a new pull request, #21453:
URL: https://github.com/apache/flink/pull/21453

   FLINK-28988 fixed an incorrect filter pushdown issue, but when merged 
into master it affected a related test case (recently added by FLINK-29781); 
this PR fixes the failing case.





[GitHub] [flink-table-store] zjureel commented on pull request #410: [FLINK-30247] Introduce time travel reading for table store

2022-12-04 Thread GitBox


zjureel commented on PR #410:
URL: 
https://github.com/apache/flink-table-store/pull/410#issuecomment-1336743159

   Pretty good! I will fix https://issues.apache.org/jira/browse/FLINK-30293 
first. Please assign it to me, thanks!





[GitHub] [flink] flinkbot commented on pull request #21452: [FLINK-30282] Fix Logical type ROW lost inner field's nullability after converting to RelDataType

2022-12-04 Thread GitBox


flinkbot commented on PR #21452:
URL: https://github.com/apache/flink/pull/21452#issuecomment-1336741875

   
   ## CI report:
   
   * c3bdb35efe0e90698a18302794ed37e9449b5279 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] LadyForest opened a new pull request, #21452: [FLINK-30282] Fix Logical type ROW lost inner field's nullability after converting to RelDataType

2022-12-04 Thread GitBox


LadyForest opened a new pull request, #21452:
URL: https://github.com/apache/flink/pull/21452

   ## What is the purpose of the change
   
   This pull request fixes the issue that Flink SQL's logical type `Row` loses 
its inner fields' nullability after being converted to `RelDataType`; e.g. a 
row field declared `NOT NULL` becomes nullable after the conversion.
   
   
   ## Brief change log
   - Let `ExtendedSqlRowTypeNameSpec` pass the inner field's nullability when 
deriving the type.
   - Specify `StructKind.PEEK_FIELDS_NO_EXPAND` when creating the struct type. 
This is required because in `FlinkTypeFactory#createTypeWithNullability`, only 
`PEEK_FIELDS_NO_EXPAND` preserves the inner field's nullability.
   
   
   ## Verifying this change
   
   This change is already covered by the existing test 
`SqlToOperationConverterTest#testCreateTableWithFullDataTypes`; the nullability 
check has been tweaked. However, the loss of the `TIME` type's precision and of 
the `Row` type's inner comments still exists, so the `TODO` is not removed. 
Beyond this, the `table.q` file also covers this change.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduces a new feature? no
 - If yes, how is the feature documented? not applicable
   





[jira] [Updated] (FLINK-30282) Logical type ROW lost inner field's nullability after convert to RelDataType

2022-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-30282:
---
Labels: pull-request-available  (was: )

> Logical type ROW lost inner field's nullability after convert to RelDataType
> 
>
> Key: FLINK-30282
> URL: https://issues.apache.org/jira/browse/FLINK-30282
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.16.0, 1.16.1
>Reporter: Jane Chan
>Priority: Major
>  Labels: pull-request-available
>
> h3. Issue History
> This is not a new issue: FLINK-13604 tracked it before, and FLINK-16344 
> made an effort to fix it (but did not tweak the UT case mentioned in 
> FLINK-13604, i.e. 
> SqlToOperationConverterTest#testCreateTableWithFullDataTypes). Nevertheless, 
> the FunctionITCase added by FLINK-16344, which validated the fix, was 
> removed in FLINK-16377. 
> h3. How to Reproduce
>  c.c2 lost nullability
> {code:java}
> Flink SQL> create table dummy (a array not null, b array not null>, c row) with ('connector' = 'datagen');
> [INFO] Execute statement succeed.
> Flink SQL> desc dummy;
> +--++---+-++---+
> | name |                       type |  null | key | extras | watermark |
> +--++---+-++---+
> |    a |        ARRAY | FALSE |     |        |           |
> |    b |     ARRAY |  TRUE |     |        |           |
> |    c | ROW<`c1` INT, `c2` DOUBLE> |  TRUE |     |        |           |
> +--++---+-++---+
> 3 rows in set
> {code}
> h3. Root Cause
> Two places are causing this problem in ExtendedSqlRowTypeNameSpec.
> 1. dt.deriveType should also pass dt's nullability as well. See 
> [https://github.com/apache/flink/blob/fb27e6893506006b9a3b1ac3e9b878fb6cad061a/flink-table/flink-sql-parser/src/main/java/org/apache/flink/sql/parser/type/ExtendedSqlRowTypeNameSpec.java#L159]
>  
> 2. StructKind should be PEEK_FIELDS_NO_EXPAND instead of FULLY_QUALIFIED(see 
> [https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/type/StructKind.java]),
>  so that FlinkTypeFactory#createTypeWithNullability will not fall back to 
> super implement. See 
> [https://github.com/apache/flink/blob/fb27e6893506006b9a3b1ac3e9b878fb6cad061a/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/calcite/FlinkTypeFactory.scala#L417]





[GitHub] [flink] maosuhan commented on a diff in pull request #21436: [FLINK-30093] [formats] Protobuf Timestamp Compile Error

2022-12-04 Thread GitBox


maosuhan commented on code in PR #21436:
URL: https://github.com/apache/flink/pull/21436#discussion_r1039121017


##
flink-formats/flink-protobuf/src/test/java/org/apache/flink/formats/protobuf/Pb3ToRowTest.java:
##
@@ -85,6 +88,8 @@ public void testDeserialization() throws Exception {
 
 assertEquals(1, rowData.getInt(0));
 assertEquals(2L, rowData.getLong(1));
+
+assertEquals(ts.getNanos(), row.getTimestamp(11, 
3).getNanoOfMillisecond());

Review Comment:
   @hdulay Yes, I agree with your idea. It must be doable for a RowType to 
reflect the protobuf Timestamp type. If you want to make it easier to use, you 
can treat `google.protobuf.Timestamp` as a special type and manually convert 
Timestamp to TimestampData in code. 
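
   The suggested conversion boils down to splitting the protobuf 
`(seconds, nanos)` pair into the `(epochMillis, nanoOfMillisecond)` pair that 
Flink's `TimestampData.fromEpochMillis(long, int)` expects. A minimal sketch of 
that arithmetic, with no Flink or protobuf dependency so it stands alone (the 
`split` helper is ours, not a Flink API):

   ```java
   public class ProtoTimestampSketch {
       // Map protobuf (seconds, nanos) to the (millis, nanoOfMillisecond)
       // arguments of TimestampData.fromEpochMillis. protobuf guarantees
       // 0 <= nanos < 1_000_000_000; pre-epoch seconds would need floor
       // division, which this sketch omits.
       static long[] split(long seconds, int nanos) {
           long millis = seconds * 1000L + nanos / 1_000_000;
           long nanoOfMillisecond = nanos % 1_000_000;
           return new long[] {millis, nanoOfMillisecond};
       }

       public static void main(String[] args) {
           long[] r = split(2L, 123_456_789);
           // in Flink this would feed TimestampData.fromEpochMillis(r[0], (int) r[1])
           System.out.println(r[0] + "," + r[1]); // prints "2123,456789"
       }
   }
   ```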






[GitHub] [flink] flinkbot commented on pull request #21451: [hotfix][docs] Update the link for `concepts'

2022-12-04 Thread GitBox


flinkbot commented on PR #21451:
URL: https://github.com/apache/flink/pull/21451#issuecomment-1336707580

   
   ## CI report:
   
   * 811579374aa63e4a411ce2b1d552758cdd0b48b2 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] ChenZhongPu opened a new pull request, #21451: [hotfix][docs] Update the link for `concepts'

2022-12-04 Thread GitBox


ChenZhongPu opened a new pull request, #21451:
URL: https://github.com/apache/flink/pull/21451

   - The `concepts` link in try-flink/local_installation#Summary is broken, and 
it leads to an empty page. (ref "docs/concepts" -> ref "docs/concepts/overview")
   - For consistency, the first letter should be in upper case. (concepts -> 
Concepts)





[jira] (FLINK-29527) Make unknownFieldsIndices work for single ParquetReader

2022-12-04 Thread Sun Shun (Jira)


[ https://issues.apache.org/jira/browse/FLINK-29527 ]


Sun Shun deleted comment on FLINK-29527:
--

was (Author: JIRAUSER283602):
[~lirui] could you please help take a look at this PR when you are free, thanks

> Make unknownFieldsIndices work for single ParquetReader
> ---
>
> Key: FLINK-29527
> URL: https://issues.apache.org/jira/browse/FLINK-29527
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Sun Shun
>Assignee: Sun Shun
>Priority: Major
>  Labels: pull-request-available
>
> Currently, from the improvement FLINK-23715, Flink uses a collection named 
> `unknownFieldsIndices` to track the nonexistent fields; it is kept inside 
> the `ParquetVectorizedInputFormat` and applied to all parquet files under the 
> given path.
> However, some fields may be nonexistent only in some of the historical 
> parquet files, while existing in the latest ones. Based on 
> `unknownFieldsIndices`, Flink will always skip these fields, even though 
> they exist in the later parquet files.
> As a result, the value of these fields becomes empty when they are 
> nonexistent in some historical parquet files.
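A simplified sketch of the fix this issue describes: compute the unknown (nonexistent) field indices per file, instead of caching one shared set for all parquet files under the path. All names below are hypothetical stand-ins for the real `ParquetVectorizedInputFormat` logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Per-file computation of unknown field indices (illustrative only). */
public class UnknownFieldsPerFile {

    /** Indices of projected fields that are absent from this file's schema. */
    public static List<Integer> unknownFieldIndices(
            List<String> projectedFields, List<String> fileFields) {
        List<Integer> unknown = new ArrayList<>();
        for (int i = 0; i < projectedFields.size(); i++) {
            // Recomputed for each file: a field missing in an old file may
            // still exist in a newer file and must not be skipped there.
            if (!fileFields.contains(projectedFields.get(i))) {
                unknown.add(i);
            }
        }
        return unknown;
    }
}
```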





[GitHub] [flink-ml] yunfengzhou-hub commented on a diff in pull request #183: [FLINK-29603] Add Transformer for StopWordsRemover

2022-12-04 Thread GitBox


yunfengzhou-hub commented on code in PR #183:
URL: https://github.com/apache/flink-ml/pull/183#discussion_r1039104387


##
docs/content/docs/operators/feature/stopwordsremover.md:
##
@@ -0,0 +1,165 @@
+---
+title: "StopWordsRemover"
+weight: 1
+type: docs
+aliases:
+- /operators/feature/stopwordsremover.html
+---
+
+
+
+## StopWordsRemover
+
+A feature transformer that filters out stop words from input.
+
+Note: null values from input array are preserved unless adding null to 
stopWords
+explicitly.
+
+See Also: <a href="http://en.wikipedia.org/wiki/Stop_words">Stop words
+(Wikipedia)</a>
+
+### Input Columns
+
+| Param name | Type     | Default | Description                                        |
+|:-----------|:---------|:--------|:---------------------------------------------------|
+| inputCols  | String[] | `null`  | Arrays of strings containing stop words to remove. |
+
+### Output Columns
+
+| Param name | Type     | Default | Description                                |
+|:-----------|:---------|:--------|:-------------------------------------------|
+| outputCols | String[] | `null`  | Arrays of strings with stop words removed. |
+
+### Parameters
+
+| Key           | Default                                            | Type     | Required | Description                                                                            |
+|---------------|----------------------------------------------------|----------|----------|----------------------------------------------------------------------------------------|
+| inputCols     | `null`                                             | String[] | yes      | Input column names.                                                                    |
+| outputCols    | `null`                                             | String[] | yes      | Output column name.                                                                    |
+| stopWords     | `StopWordsRemover.loadDefaultStopWords("english")` | String[] | no       | The words to be filtered out.                                                          |
+| caseSensitive | `false`                                            | Boolean  | no       | Whether to do a case-sensitive comparison over the stop words.                         |
+| locale        | `StopWordsRemover.getDefaultOrUS().toString()`     | String   | no       | Locale of the input for case insensitive matching. Ignored when caseSensitive is true. |

Review Comment:
   We should avoid forcing Python users to learn Java. Can we solve this 
problem by providing `StopWordsRemover.get_available_locales()`?
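On the JVM side, the list such a method would expose is already available from the standard library, as the sketch below shows. `AvailableLocales` and its methods are illustrative names, not the actual Flink ML API; a hypothetical `get_available_locales()` in Python would presumably just surface this list.

```java
import java.util.Arrays;
import java.util.Locale;

/** Sketch: enumerate the locale tags supported by the running JVM. */
public class AvailableLocales {

    /** All locale tags the JVM supports, in Locale.toString() form (e.g. "en_US"). */
    public static String[] availableLocaleTags() {
        return Arrays.stream(Locale.getAvailableLocales())
                .map(Locale::toString)
                .sorted()
                .toArray(String[]::new);
    }

    /** Whether the given tag names a locale available on this JVM. */
    public static boolean isSupported(String localeTag) {
        return Arrays.asList(availableLocaleTags()).contains(localeTag);
    }
}
```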






[GitHub] [flink-ml] yunfengzhou-hub commented on a diff in pull request #183: [FLINK-29603] Add Transformer for StopWordsRemover

2022-12-04 Thread GitBox


yunfengzhou-hub commented on code in PR #183:
URL: https://github.com/apache/flink-ml/pull/183#discussion_r1039103892


##
flink-ml-lib/src/main/resources/org/apache/flink/ml/feature/stopwords/README:
##
@@ -0,0 +1,12 @@
+Stopwords Corpus

Review Comment:
   When the number of words is relatively small, I think plain text has 
equivalent or better readability. An existing practice can be found at 
https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/README.txt.






[GitHub] [flink] dupen01 closed pull request #16947: update hive_function.md

2022-12-04 Thread GitBox


dupen01 closed pull request #16947: update hive_function.md
URL: https://github.com/apache/flink/pull/16947





[GitHub] [flink] godfreyhe merged pull request #20745: [FLINK-28988] Don't push filters down into the right table for temporal join

2022-12-04 Thread GitBox


godfreyhe merged PR #20745:
URL: https://github.com/apache/flink/pull/20745





[GitHub] [flink-ml] yunfengzhou-hub commented on pull request #171: [FLINK-29911] Improve performance of AgglomerativeClustering

2022-12-04 Thread GitBox


yunfengzhou-hub commented on PR #171:
URL: https://github.com/apache/flink-ml/pull/171#issuecomment-1336653558

   Thanks for the update! It LGTM.





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #414: [FLINK-30263] Introduce schemas table in table store

2022-12-04 Thread GitBox


JingsongLi commented on code in PR #414:
URL: https://github.com/apache/flink-table-store/pull/414#discussion_r1039094224


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/table/metadata/SchemasTable.java:
##
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.store.table.metadata;
+
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.data.GenericArrayData;
+import org.apache.flink.table.data.GenericMapData;
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.data.StringData;
+import org.apache.flink.table.store.file.predicate.Predicate;
+import org.apache.flink.table.store.file.schema.SchemaManager;
+import org.apache.flink.table.store.file.schema.TableSchema;
+import org.apache.flink.table.store.file.utils.IteratorRecordReader;
+import org.apache.flink.table.store.file.utils.RecordReader;
+import org.apache.flink.table.store.file.utils.SerializationUtils;
+import org.apache.flink.table.store.table.Table;
+import org.apache.flink.table.store.table.source.Split;
+import org.apache.flink.table.store.table.source.TableRead;
+import org.apache.flink.table.store.table.source.TableScan;
+import org.apache.flink.table.store.utils.ProjectedRowData;
+import org.apache.flink.table.types.logical.ArrayType;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.MapType;
+import org.apache.flink.table.types.logical.RowType;
+
+import org.apache.flink.shaded.guava30.com.google.common.collect.Iterators;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+import static 
org.apache.flink.table.store.file.catalog.Catalog.METADATA_TABLE_SPLITTER;
+
+/** A {@link Table} for showing schemas of table. */
+public class SchemasTable implements Table {
+
+private static final long serialVersionUID = 1L;
+
+public static final String SCHEMAS = "schemas";
+
+    public static final RowType TABLE_TYPE =
+            new RowType(
+                    Arrays.asList(
+                            new RowType.RowField("schema_id", new BigIntType(false)),
+                            new RowType.RowField(
+                                    "fields",
+                                    new ArrayType(
+                                            new RowType(
+                                                    Arrays.asList(
+                                                            new RowType.RowField(
+                                                                    "id", new IntType(false)),
+                                                            new RowType.RowField(
+                                                                    "name",
+                                                                    SerializationUtils
+                                                                            .newStringType(false)),
+                                                            new RowType.RowField(
+                                                                    "type",
+                                                                    SerializationUtils
+                                                                            .newStringType(false)),
+                                                            new RowType.RowField(
+                                                                    "description",
+                                                                    SerializationUtils
+                                                                            .newStringType(
+                                                                                    true)))))),
+                            new RowType.RowField(
+                                    "partition_keys",

[jira] [Closed] (FLINK-30207) Move split initialization and discovery logic fully into SnapshotEnumerator in Table Store

2022-12-04 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-30207.

Resolution: Fixed

master: 23dad789879735249651992ab9fe23b986ef1564

> Move split initialization and discovery logic fully into SnapshotEnumerator 
> in Table Store
> --
>
> Key: FLINK-30207
> URL: https://issues.apache.org/jira/browse/FLINK-30207
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table Store
>Affects Versions: table-store-0.3.0
>Reporter: Caizhi Weng
>Assignee: Caizhi Weng
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.3.0
>
>
> It is possible that the separated compact job is started long after the write 
> jobs, so compact job sources need a special split initialization logic: we 
> will find the latest COMPACT snapshot and start compacting right after this 
> snapshot.
> However, the split initialization logic is currently coded into 
> {{FileStoreSource}}. We should extract this logic into 
> {{SnapshotEnumerator}} so that we can create our special 
> {{SnapshotEnumerator}} for compact sources.
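The startup logic described above can be sketched in a few lines of self-contained Java. The types and method names below are illustrative only, not the actual Table Store `SnapshotEnumerator` API.

```java
import java.util.List;

/** Sketch: a compact job starts enumerating after the latest COMPACT snapshot. */
public class CompactStartupSketch {

    public enum Kind { APPEND, COMPACT }

    public static final class Snapshot {
        final long id;
        final Kind kind;

        public Snapshot(long id, Kind kind) {
            this.id = id;
            this.kind = kind;
        }
    }

    /** First snapshot id to enumerate: one past the latest COMPACT snapshot. */
    public static long firstSnapshotToRead(List<Snapshot> snapshotsAscending) {
        long latestCompact = 0L; // no COMPACT snapshot found -> start from snapshot 1
        for (Snapshot s : snapshotsAscending) {
            if (s.kind == Kind.COMPACT) {
                latestCompact = s.id;
            }
        }
        return latestCompact + 1;
    }
}
```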





[GitHub] [flink-table-store] JingsongLi merged pull request #411: [FLINK-30207] Move split initialization and discovery logic fully into SnapshotEnumerator in Table Store

2022-12-04 Thread GitBox


JingsongLi merged PR #411:
URL: https://github.com/apache/flink-table-store/pull/411





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #410: [FLINK-30247] Introduce time travel reading for table store

2022-12-04 Thread GitBox


JingsongLi commented on code in PR #410:
URL: https://github.com/apache/flink-table-store/pull/410#discussion_r1039089223


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/operation/AbstractFileStoreScan.java:
##
@@ -184,10 +203,14 @@ public Plan plan() {
         Long snapshotId = specifiedSnapshotId;
         if (manifests == null) {
             if (snapshotId == null) {
-                snapshotId =
-                        readCompacted
-                                ? snapshotManager.latestCompactedSnapshotId()
-                                : snapshotManager.latestSnapshotId();
+                if (timestampMills == null) {
+                    snapshotId =
+                            readCompacted
+                                    ? snapshotManager.latestCompactedSnapshotId()
+                                    : snapshotManager.latestSnapshotId();
+                } else {
+                    snapshotId = snapshotManager.earlierThanTimeMills(timestampMills);

Review Comment:
   Maybe we should use `earlier Than or equals`?
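The distinction the review raises can be illustrated with a self-contained sketch. Method and parameter names here are hypothetical, and the real `SnapshotManager` reads snapshot files rather than an in-memory map; the point is only the `<` versus `<=` comparison.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: pick the snapshot to read for a time-travel timestamp. */
public class SnapshotLookup {

    /** Latest snapshot id whose commit time is strictly earlier than ts. */
    public static Long earlierThan(TreeMap<Long, Long> commitTimeById, long ts) {
        Long result = null;
        for (Map.Entry<Long, Long> e : commitTimeById.entrySet()) {
            if (e.getValue() < ts) {
                result = e.getKey();
            }
        }
        return result;
    }

    /** Latest snapshot id whose commit time is earlier than or equal to ts. */
    public static Long earlierOrEqual(TreeMap<Long, Long> commitTimeById, long ts) {
        Long result = null;
        for (Map.Entry<Long, Long> e : commitTimeById.entrySet()) {
            if (e.getValue() <= ts) {
                result = e.getKey();
            }
        }
        return result;
    }
}
```

With "earlier than", a timestamp that exactly matches a snapshot's commit time surprisingly resolves to the previous snapshot; "earlier than or equal" returns the matching one.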






[jira] [Created] (FLINK-30293) Create an enumerator for static (batch)

2022-12-04 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-30293:


 Summary: Create an enumerator for static (batch)
 Key: FLINK-30293
 URL: https://issues.apache.org/jira/browse/FLINK-30293
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.3.0


In FLINK-30207, we created an enumerator for continuous reading.
We should have an enumerator for static (batch) reading.
For example, as with the current read-compacted, time travel may specify the 
commit time at which to read snapshots in the future.
I think these capabilities need to be in the core, but should they be in scan? 
(It seems they should not.)





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #416: [FLINK-30271] Introduce Table.copy from dynamic options

2022-12-04 Thread GitBox


JingsongLi commented on code in PR #416:
URL: https://github.com/apache/flink-table-store/pull/416#discussion_r1039087564


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/table/AbstractFileStoreTable.java:
##
@@ -40,6 +47,43 @@ public AbstractFileStoreTable(Path path, TableSchema tableSchema) {
 
     protected abstract FileStore store();
 
+    protected abstract FileStoreTable copy(TableSchema newTableSchema);
+
+    @Override
+    public FileStoreTable copy(Map<String, String> dynamicOptions) {
+        // check option is not immutable
+        Map<String, String> options = tableSchema.options();
+        dynamicOptions.forEach(
+                (k, v) -> {
+                    if (!Objects.equals(v, options.get(k))) {

Review Comment:
   An immutable field may exist in the dynamic options.
   For example, the Flink planner just passes an options map merged from the 
table options and the dynamic options.
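A minimal sketch of the check being suggested: reject a dynamic option only when it is an immutable key whose value actually differs from the table's current value, since planners may pass a fully merged option map. The key names below are hypothetical, not Table Store's actual immutable-option list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Sketch: merge dynamic options while protecting immutable keys. */
public class DynamicOptionsCheck {

    // Hypothetical immutable keys, for illustration only.
    private static final Set<String> IMMUTABLE_KEYS = Set.of("bucket", "primary-key");

    public static Map<String, String> merge(
            Map<String, String> tableOptions, Map<String, String> dynamicOptions) {
        dynamicOptions.forEach(
                (k, v) -> {
                    // Only reject when the immutable key's value actually changes;
                    // re-stating the current value in a merged map is harmless.
                    if (IMMUTABLE_KEYS.contains(k)
                            && !Objects.equals(v, tableOptions.get(k))) {
                        throw new IllegalArgumentException(
                                "Cannot change immutable option: " + k);
                    }
                });
        Map<String, String> merged = new HashMap<>(tableOptions);
        merged.putAll(dynamicOptions);
        return merged;
    }
}
```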






[GitHub] [flink] Myracle commented on pull request #21347: [FLINK-29640][Client/Job Submission]Enhance the function configured by execution.shutdown-on-attached-exi…

2022-12-04 Thread GitBox


Myracle commented on PR #21347:
URL: https://github.com/apache/flink/pull/21347#issuecomment-1336607971

   @flinkbot run azure





[jira] [Created] (FLINK-30292) Better support for conversion between DataType and TypeInformation

2022-12-04 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-30292:


 Summary: Better support for conversion between DataType and 
TypeInformation
 Key: FLINK-30292
 URL: https://issues.apache.org/jira/browse/FLINK-30292
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.15.3
Reporter: Yunfeng Zhou


In Flink 1.15, we have the following ways to convert a DataType to a 
TypeInformation. Each of them has some disadvantages.

* `TypeConversions.fromDataTypeToLegacyInfo`
It might lead to precision losses in the face of some data types like timestamp.
It has been deprecated.
* `ExternalTypeInfo.of`
It cannot be used to get detailed type information like `RowTypeInfo`.
It might bring some serialization overhead.

Given that neither of the ways mentioned above is perfect, Flink SQL should 
provide a better API to support DataType-TypeInformation conversions, and thus 
better support Table-DataStream conversions.
 





[GitHub] [flink] flinkbot commented on pull request #21450: [FLINK-30291][Connector/DynamoDB] Update docs to render DynamoDB conn…

2022-12-04 Thread GitBox


flinkbot commented on PR #21450:
URL: https://github.com/apache/flink/pull/21450#issuecomment-1336574515

   
   ## CI report:
   
   * c89ac59bba265345ff5d3757492ea9bd8698e3df UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[GitHub] [flink] dannycranmer opened a new pull request, #21450: [FLINK-30291][Connector/DynamoDB] Update docs to render DynamoDB conn…

2022-12-04 Thread GitBox


dannycranmer opened a new pull request, #21450:
URL: https://github.com/apache/flink/pull/21450

   ## What is the purpose of the change
   
   Integrate flink-connector-aws into Flink docs
   
   Related https://github.com/apache/flink/pull/21449
   
   ## Brief change log
   
   - Add a new short_code (`sql_connector_download_table`) to render SQL 
connector info for externalized connectors
   - Integrate AWS connectors, flink-connector-dynamodb
   
   
   ## Verifying this change
   
   Rendered web pages locally
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable 





[GitHub] [flink] flinkbot commented on pull request #21449: [FLINK-30291][Connector/DynamoDB] Update docs to render DynamoDB conn…

2022-12-04 Thread GitBox


flinkbot commented on PR #21449:
URL: https://github.com/apache/flink/pull/21449#issuecomment-1336572479

   
   ## CI report:
   
   * 73129e07b3e7076ed0b0bd2dbe9c89ac8b860972 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Updated] (FLINK-30291) Integrate flink-connector-aws into Flink docs

2022-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-30291:
---
Labels: pull-request-available  (was: )

> Integrate flink-connector-aws into Flink docs
> -
>
> Key: FLINK-30291
> URL: https://issues.apache.org/jira/browse/FLINK-30291
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / AWS, Documentation
>Reporter: Danny Cranmer
>Assignee: Danny Cranmer
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.1
>
>
> Update the docs render to integrate {{{}flink-connector-aws{}}}.
> Add a new shortcode to handle rendering the SQL connector correctly





[GitHub] [flink] dannycranmer opened a new pull request, #21449: [FLINK-30291][Connector/DynamoDB] Update docs to render DynamoDB conn…

2022-12-04 Thread GitBox


dannycranmer opened a new pull request, #21449:
URL: https://github.com/apache/flink/pull/21449

   ## What is the purpose of the change
   
   Integrate flink-connector-aws into Flink docs
   
   
   ## Brief change log
   
   - Add a new short_code (`sql_connector_download_table`) to render SQL 
connector info for externalized connectors
   - Integrate AWS connectors, flink-connector-dynamodb
   
   
   ## Verifying this change
   
   Rendered web pages locally
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable 





[jira] [Assigned] (FLINK-30291) Integrate flink-connector-aws into Flink docs

2022-12-04 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer reassigned FLINK-30291:
-

Assignee: Danny Cranmer

> Integrate flink-connector-aws into Flink docs
> -
>
> Key: FLINK-30291
> URL: https://issues.apache.org/jira/browse/FLINK-30291
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / AWS, Documentation
>Reporter: Danny Cranmer
>Assignee: Danny Cranmer
>Priority: Major
> Fix For: 1.17.0, 1.16.1
>
>
> Update the docs render to integrate {{{}flink-connector-aws{}}}.
> Add a new shortcode to handle rendering the SQL connector correctly





[jira] [Created] (FLINK-30291) Integrate flink-connector-aws into Flink docs

2022-12-04 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-30291:
-

 Summary: Integrate flink-connector-aws into Flink docs
 Key: FLINK-30291
 URL: https://issues.apache.org/jira/browse/FLINK-30291
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / AWS, Documentation
Reporter: Danny Cranmer
 Fix For: 1.17.0, 1.16.1


Update the docs render to integrate {{{}flink-connector-aws{}}}.

Add a new shortcode to handle rendering the SQL connector correctly





[GitHub] [flink-connector-aws] dannycranmer opened a new pull request, #35: [FLINK-29907] Externalize Kinesis/Firehose documentation

2022-12-04 Thread GitBox


dannycranmer opened a new pull request, #35:
URL: https://github.com/apache/flink-connector-aws/pull/35

   Copy over documentation for Kinesis and Firehose from the Flink repo





[jira] [Resolved] (FLINK-29109) Checkpoint path conflict with stateless upgrade mode

2022-12-04 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise resolved FLINK-29109.
--
Resolution: Fixed

> Checkpoint path conflict with stateless upgrade mode
> 
>
> Key: FLINK-29109
> URL: https://issues.apache.org/jira/browse/FLINK-29109
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.1.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.3.0, kubernetes-operator-1.2.0
>
>
> A stateful job with stateless upgrade mode (yes, there are such use cases) 
> fails with a checkpoint path conflict due to the constant jobId and FLINK-19358 
> (applies to Flink < 1.16.x). Since with stateless upgrade mode the checkpoint 
> id resets on restart, the job is going to write to previously used locations 
> and fail. The workaround is to rotate the jobId on every redeploy when the 
> upgrade mode is stateless. While this can be worked around externally, it is 
> best done in the operator itself, because reconciliation resolves when a 
> restart is actually required, while rotating the jobId externally may trigger 
> unnecessary restarts.
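The workaround described above amounts to a one-line decision per deploy, sketched below. The enum and method names are illustrative, not the operator's actual API.

```java
import java.util.UUID;

/** Sketch: rotate the job id per redeploy only for stateless upgrade mode. */
public class JobIdStrategy {

    public enum UpgradeMode { STATELESS, SAVEPOINT, LAST_STATE }

    public static String jobIdFor(String stableId, UpgradeMode mode) {
        // Stateless jobs reset checkpoint ids on restart, so reusing the same
        // job id would write to previously used checkpoint locations and fail.
        return mode == UpgradeMode.STATELESS
                ? UUID.randomUUID().toString().replace("-", "")
                : stableId;
    }
}
```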





[jira] [Updated] (FLINK-29109) Checkpoint path conflict with stateless upgrade mode

2022-12-04 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated FLINK-29109:
-
Fix Version/s: kubernetes-operator-1.3.0

> Checkpoint path conflict with stateless upgrade mode
> 
>
> Key: FLINK-29109
> URL: https://issues.apache.org/jira/browse/FLINK-29109
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.1.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.2.0, kubernetes-operator-1.3.0
>
>
> A stateful job with stateless upgrade mode (yes, there are such use cases) 
> fails with a checkpoint path conflict due to the constant jobId and FLINK-19358 
> (applies to Flink < 1.16.x). Since with stateless upgrade mode the checkpoint 
> id resets on restart, the job is going to write to previously used locations 
> and fail. The workaround is to rotate the jobId on every redeploy when the 
> upgrade mode is stateless. While this can be worked around externally, it is 
> best done in the operator itself, because reconciliation resolves when a 
> restart is actually required, while rotating the jobId externally may trigger 
> unnecessary restarts.





[jira] [Updated] (FLINK-30287) Configmaps get cleaned up when upgrading standalone Flink cluster

2022-12-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-30287:
-
Affects Version/s: (was: 1.2)

> Configmaps get cleaned up when upgrading standalone Flink cluster
> -
>
> Key: FLINK-30287
> URL: https://issues.apache.org/jira/browse/FLINK-30287
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Steven Zhang
>Priority: Major
>  Labels: pull-request-available
>
> I started up a standalone session Flink cluster and ran one job on it. I 
> checked the configMaps and saw the HA data.
>  
> {code:java}
> kubectl get configmaps -n flink-operator
> NAME                                                        DATA   AGE
> flink-config-sql-example-deployment-s3-testing              2      5m41s
> flink-operator-config                                       3      42h
> kube-root-ca.crt                                            1      42h
> sql-example-deployment-s3-testing-3f57cd5f0002-config-map   0      11m
> sql-example-deploymnt-s3-testing-cluster-config-map         5      12m
> {code}
>  
> I then update the FlinkDep image field and the Flink cluster gets restarted. 
> The HA configmap for the job is now gone.
> {code:java}
> kubectl get configmaps -n flink-operator
> NAME                                                   DATA   AGE
> flink-config-sql-example-deployment-s3-testing         2      18m
> flink-operator-config                                  3      43h
> kube-root-ca.crt                                       1      43h
> sql-example-deployment-s3-testing-cluster-config-map   3      31m {code}
>  
> I think this is due to a race condition: the TM terminates first, which causes 
> the JM to interpret the job as entering a failed state, which in turn causes 
> it to clean up the configmaps.





[jira] [Created] (FLINK-30290) IteratorSourceReaderBase should report END_OF_INPUT sooner

2022-12-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30290:


 Summary: IteratorSourceReaderBase should report END_OF_INPUT sooner
 Key: FLINK-30290
 URL: https://issues.apache.org/jira/browse/FLINK-30290
 Project: Flink
  Issue Type: Technical Debt
  Components: API / Core
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.0


The iterator source reader base does not report end_of_input when the last 
value was emitted, but instead requires an additional call to pollNext to be 
made.
This is fine functionality-wise, and allowed by the source reader API 
contracts, but it's not intuitive behavior and it leaks into tests for the 
datagen source.
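The two polling behaviors can be sketched in a few lines (illustrative code, not the actual Flink `IteratorSourceReaderBase` or `InputStatus` classes): checking `hasNext()` *after* emitting lets the reader report end-of-input on the same poll that produced the last element.

```java
import java.util.Iterator;
import java.util.List;

// Minimal sketch (not the real Flink classes) of an iterator-backed reader
// that reports END_OF_INPUT eagerly.
public class EagerEndOfInput {
    enum InputStatus { MORE_AVAILABLE, END_OF_INPUT }

    // Emit the next value and, by checking hasNext() after emitting,
    // report END_OF_INPUT on the same poll that produced the last element,
    // instead of requiring one extra pollNext() call.
    static <T> InputStatus pollNext(Iterator<T> it, List<T> out) {
        if (!it.hasNext()) {
            return InputStatus.END_OF_INPUT;
        }
        out.add(it.next());
        return it.hasNext() ? InputStatus.MORE_AVAILABLE : InputStatus.END_OF_INPUT;
    }
}
```

With the lazy variant, the second poll over a two-element iterator would still return MORE_AVAILABLE and a third call would be needed to learn the input is exhausted.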





[jira] [Created] (FLINK-30289) RateLimitedSourceReader uses wrong signal for checkpoint rate-limiting

2022-12-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30289:


 Summary: RateLimitedSourceReader uses wrong signal for checkpoint 
rate-limiting
 Key: FLINK-30289
 URL: https://issues.apache.org/jira/browse/FLINK-30289
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.0


The checkpoint rate limiter is notified when the checkpoint is complete, but 
since this signal comes at some point in the future (or not at all) it can 
result in no records being emitted for a checkpoint, or more records than 
expected being emitted.
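A sketch of the fix direction, under stated assumptions (hypothetical class and method names, not the real Flink `RateLimiter` interface): refilling the per-checkpoint budget on the synchronous trigger signal keeps the record count per checkpoint deterministic, whereas refilling on the asynchronous completion notification does not.

```java
// Hypothetical per-checkpoint rate limiter. Tying the refill to the
// synchronous checkpoint *trigger* (rather than the async, possibly
// never-arriving *completion*) makes the emitted-records-per-checkpoint
// count exact.
public class CheckpointBudget {
    private final int recordsPerCheckpoint;
    private int remaining;

    public CheckpointBudget(int recordsPerCheckpoint) {
        this.recordsPerCheckpoint = recordsPerCheckpoint;
        this.remaining = recordsPerCheckpoint;
    }

    // Called per record; true if the record may be emitted within this
    // checkpoint's budget.
    public boolean tryAcquire() {
        if (remaining == 0) {
            return false;
        }
        remaining--;
        return true;
    }

    // Refill on the synchronous trigger signal, not on completion.
    public void onCheckpointTriggered() {
        remaining = recordsPerCheckpoint;
    }
}
```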





[jira] [Closed] (FLINK-30202) RateLimitedSourceReader may emit one more record than permitted

2022-12-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-30202.

Fix Version/s: 1.17.0
   Resolution: Fixed

master: 81ed6c649e1a219c457628c54a7165d75b803474

> RateLimitedSourceReader may emit one more record than permitted
> ---
>
> Key: FLINK-30202
> URL: https://issues.apache.org/jira/browse/FLINK-30202
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.17.0
>
>
>  [This 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43483=logs=5cae8624-c7eb-5c51-92d3-4d2dacedd221=5acec1b4-945b-59ca-34f8-168928ce5199=24747]
>  failed due to a failed assertion in 
> {{{}DataGeneratorSourceITCase.testGatedRateLimiter{}}}:
> {code:java}
> Nov 25 03:26:45 org.opentest4j.AssertionFailedError: 
> Nov 25 03:26:45 
> Nov 25 03:26:45 expected: 2
> Nov 25 03:26:45  but was: 1 {code}





[GitHub] [flink] zentol merged pull request #21416: [FLINK-30202][tests] Do not assert on checkpointId

2022-12-04 Thread GitBox


zentol merged PR #21416:
URL: https://github.com/apache/flink/pull/21416


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-27184) Optimize IntervalJoinOperator by using binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27184:
---
Summary: Optimize IntervalJoinOperator by using binary sorted state  (was: 
Optimize IntervalJoinOperator by using temporal state)

> Optimize IntervalJoinOperator by using binary sorted state
> --
>
> Key: FLINK-27184
> URL: https://issues.apache.org/jira/browse/FLINK-27184
> Project: Flink
>  Issue Type: Sub-task
>Reporter: David Anderson
>Priority: Major
>
> The performance of interval joins on RocksDB can be optimized by using 
> temporal state.





[jira] [Updated] (FLINK-27184) Optimize IntervalJoinOperator by using binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27184:
---
Description: The performance of interval joins on RocksDB can be optimized 
by using binary sorted state.  (was: The performance of interval joins on 
RocksDB can be optimized by using temporal state.)

> Optimize IntervalJoinOperator by using binary sorted state
> --
>
> Key: FLINK-27184
> URL: https://issues.apache.org/jira/browse/FLINK-27184
> Project: Flink
>  Issue Type: Sub-task
>Reporter: David Anderson
>Priority: Major
>
> The performance of interval joins on RocksDB can be optimized by using binary 
> sorted state.





[jira] [Updated] (FLINK-27182) Optimize RowTimeSortOperator by using binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27182:
---
Summary: Optimize RowTimeSortOperator by using binary sorted state  (was: 
Optimize RowTimeSortOperator by using temporal state)

> Optimize RowTimeSortOperator by using binary sorted state
> -
>
> Key: FLINK-27182
> URL: https://issues.apache.org/jira/browse/FLINK-27182
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: David Anderson
>Priority: Major
>
> The performance of the RowTimeSortOperator can be significantly improved by 
> using temporal state.





[jira] [Updated] (FLINK-27183) Optimize CepOperator by using temporal state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27183:
---
Description: The performance of CEP on RocksDB can be significantly 
improved by having it use binary sorted state.  (was: The performance of CEP on 
RocksDB can be significantly improved by having it use temporal state.)

> Optimize CepOperator by using temporal state
> 
>
> Key: FLINK-27183
> URL: https://issues.apache.org/jira/browse/FLINK-27183
> Project: Flink
>  Issue Type: Sub-task
>  Components: Library / CEP
>Reporter: David Anderson
>Priority: Major
>
> The performance of CEP on RocksDB can be significantly improved by having it 
> use binary sorted state.





[jira] [Updated] (FLINK-27183) Optimize CepOperator by using binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27183:
---
Summary: Optimize CepOperator by using binary sorted state  (was: Optimize 
CepOperator by using temporal state)

> Optimize CepOperator by using binary sorted state
> -
>
> Key: FLINK-27183
> URL: https://issues.apache.org/jira/browse/FLINK-27183
> Project: Flink
>  Issue Type: Sub-task
>  Components: Library / CEP
>Reporter: David Anderson
>Priority: Major
>
> The performance of CEP on RocksDB can be significantly improved by having it 
> use binary sorted state.





[jira] [Updated] (FLINK-27182) Optimize RowTimeSortOperator by using binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27182:
---
Description: The performance of the RowTimeSortOperator can be 
significantly improved by using binary sorted state.  (was: The performance of 
the RowTimeSortOperator can be significantly improved by using temporal state.)

> Optimize RowTimeSortOperator by using binary sorted state
> -
>
> Key: FLINK-27182
> URL: https://issues.apache.org/jira/browse/FLINK-27182
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: David Anderson
>Priority: Major
>
> The performance of the RowTimeSortOperator can be significantly improved by 
> using binary sorted state.





[jira] [Updated] (FLINK-27181) Optimize TemporalRowTimeJoinOperator by using temporal state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27181:
---
Description: The throughput of the TemporalRowTimeJoinOperator can be 
significantly improved by using binary sorted state in its implementation.  
(was: The throughput of the TemporalRowTimeJoinOperator can be significantly 
improved by using temporal state in its implementation.)

> Optimize TemporalRowTimeJoinOperator by using temporal state
> 
>
> Key: FLINK-27181
> URL: https://issues.apache.org/jira/browse/FLINK-27181
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: David Anderson
>Priority: Major
>
> The throughput of the TemporalRowTimeJoinOperator can be significantly 
> improved by using binary sorted state in its implementation.





[jira] [Updated] (FLINK-27181) Optimize TemporalRowTimeJoinOperator by using binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27181:
---
Summary: Optimize TemporalRowTimeJoinOperator by using binary sorted state  
(was: Optimize TemporalRowTimeJoinOperator by using temporal state)

> Optimize TemporalRowTimeJoinOperator by using binary sorted state
> -
>
> Key: FLINK-27181
> URL: https://issues.apache.org/jira/browse/FLINK-27181
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: David Anderson
>Priority: Major
>
> The throughput of the TemporalRowTimeJoinOperator can be significantly 
> improved by using binary sorted state in its implementation.





[jira] [Closed] (FLINK-27179) Update training example to use temporal state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson closed FLINK-27179.
--
Resolution: Won't Do

> Update training example to use temporal state
> -
>
> Key: FLINK-27179
> URL: https://issues.apache.org/jira/browse/FLINK-27179
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation / Training
>Reporter: David Anderson
>Priority: Major
>
> [https://nightlies.apache.org/flink/flink-docs-master/docs/learn-flink/event_driven/#example]
>  is a good use case for temporal state (this example is doing windowing in a 
> keyed process function).





[jira] [Updated] (FLINK-27180) Docs for binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27180:
---
Description: 
Update

[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state]
 

to include binary sorted state.

  was:
Update

[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state]
 

to include temporal state.


> Docs for binary sorted state
> 
>
> Key: FLINK-27180
> URL: https://issues.apache.org/jira/browse/FLINK-27180
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: David Anderson
>Priority: Major
>
> Update
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state]
>  
> to include binary sorted state.





[jira] [Updated] (FLINK-27180) Docs for binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27180:
---
Summary: Docs for binary sorted state  (was: Docs for temporal state)

> Docs for binary sorted state
> 
>
> Key: FLINK-27180
> URL: https://issues.apache.org/jira/browse/FLINK-27180
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: David Anderson
>Priority: Major
>
> Update
> [https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state]
>  
> to include temporal state.





[jira] [Updated] (FLINK-27178) create examples that use binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27178:
---
Summary: create examples that use binary sorted state  (was: create 
examples that use temporal state)

> create examples that use binary sorted state
> 
>
> Key: FLINK-27178
> URL: https://issues.apache.org/jira/browse/FLINK-27178
> Project: Flink
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: David Anderson
>Priority: Major
>
> Add examples showing how to use temporal state. E.g., sorting and/or a 
> temporal join.





[jira] [Updated] (FLINK-27178) create examples that use binary sorted state

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27178:
---
Description: Add examples showing how to use binary sorted state. E.g., for 
custom windowing, or sorting.  (was: Add examples showing how to use temporal 
state. E.g., sorting and/or a temporal join.)

> create examples that use binary sorted state
> 
>
> Key: FLINK-27178
> URL: https://issues.apache.org/jira/browse/FLINK-27178
> Project: Flink
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: David Anderson
>Priority: Major
>
> Add examples showing how to use binary sorted state. E.g., for custom 
> windowing, or sorting.





[jira] [Updated] (FLINK-27177) Implement Binary Sorted State

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27177:
---
Summary: Implement Binary Sorted State  (was: Implement Temporal State)

> Implement Binary Sorted State
> -
>
> Key: FLINK-27177
> URL: https://issues.apache.org/jira/browse/FLINK-27177
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends
>Reporter: David Anderson
>Priority: Major
>
> Following the plan in [FLIP-220|https://cwiki.apache.org/confluence/x/Xo_FD]:
>  * add methods to the RuntimeContext and KeyedStateStore interfaces for 
> registering TemporalValueState and TemporalListState
>  * implement TemporalValueState and TemporalListState
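For context, a minimal in-memory sketch of the temporal-state lookup pattern this plan targets (class and method names are hypothetical; FLIP-220 defines the real interfaces and a state-backend-native implementation):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative temporal value state: values keyed by event timestamp,
// kept in sorted order so "latest value at or before t" is cheap.
public class TemporalValueSketch<V> {
    private final TreeMap<Long, V> byTime = new TreeMap<>();

    public void update(long timestamp, V value) {
        byTime.put(timestamp, value);
    }

    // Latest value with timestamp <= t, or null if none: the lookup that
    // temporal joins, sorting, and custom windowing need to be fast.
    public V valueAtOrBefore(long t) {
        Map.Entry<Long, V> e = byTime.floorEntry(t);
        return e == null ? null : e.getValue();
    }
}
```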





[jira] [Updated] (FLINK-27177) Implement Binary Sorted State

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27177:
---
Description: Following the plan in 
[FLIP-220|https://cwiki.apache.org/confluence/x/Xo_FD]  (was: Following the plan in 
[FLIP-220|https://cwiki.apache.org/confluence/x/Xo_FD]
 * add methods to the RuntimeContext and KeyedStateStore interfaces for 
registering TemporalValueState and TemporalListState
 * implement TemporalValueState and TemporalListState)

> Implement Binary Sorted State
> -
>
> Key: FLINK-27177
> URL: https://issues.apache.org/jira/browse/FLINK-27177
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends
>Reporter: David Anderson
>Priority: Major
>
> Following the plan in 
> [FLIP-220|https://cwiki.apache.org/confluence/x/Xo_FD]





[jira] [Updated] (FLINK-27176) FLIP-220: Binary Sorted State

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27176:
---
Description: Task for implementing FLIP-220: Binary Sorted State: 
[https://cwiki.apache.org/confluence/x/Xo_FD]  (was: Task for implementing 
FLIP-220: Temporal State: [https://cwiki.apache.org/confluence/x/Xo_FD])

> FLIP-220: Binary Sorted State
> -
>
> Key: FLINK-27176
> URL: https://issues.apache.org/jira/browse/FLINK-27176
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Major
>
> Task for implementing FLIP-220: Binary Sorted State: 
> [https://cwiki.apache.org/confluence/x/Xo_FD]





[jira] [Updated] (FLINK-27176) FLIP-220: Binary Sorted State

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27176:
---
Release Note: Added BinarySortedState to optimize state accesses and 
updates used for sorting, temporal joins, and custom windowing.   (was: Added 
TemporalValueState and TemporalListState to optimize state accesses and updates 
used for sorting, joins, and windowing. )

> FLIP-220: Binary Sorted State
> -
>
> Key: FLINK-27176
> URL: https://issues.apache.org/jira/browse/FLINK-27176
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Major
>
> Task for implementing FLIP-220: Temporal State: 
> [https://cwiki.apache.org/confluence/x/Xo_FD]





[jira] [Updated] (FLINK-27176) FLIP-220: Binary Sorted State

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-27176:
---
Summary: FLIP-220: Binary Sorted State  (was: FLIP-220: Temporal State)

> FLIP-220: Binary Sorted State
> -
>
> Key: FLINK-27176
> URL: https://issues.apache.org/jira/browse/FLINK-27176
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: David Anderson
>Priority: Major
>
> Task for implementing FLIP-220: Temporal State: 
> [https://cwiki.apache.org/confluence/x/Xo_FD]





[jira] [Assigned] (FLINK-27176) FLIP-220: Binary Sorted State

2022-12-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson reassigned FLINK-27176:
--

Assignee: David Anderson

> FLIP-220: Binary Sorted State
> -
>
> Key: FLINK-27176
> URL: https://issues.apache.org/jira/browse/FLINK-27176
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Major
>
> Task for implementing FLIP-220: Temporal State: 
> [https://cwiki.apache.org/confluence/x/Xo_FD]





[GitHub] [flink] hdulay commented on a diff in pull request #21436: [FLINK-30093] [formats] Protobuf Timestamp Compile Error

2022-12-04 Thread GitBox


hdulay commented on code in PR #21436:
URL: https://github.com/apache/flink/pull/21436#discussion_r1039021869


##
flink-formats/flink-protobuf/src/test/java/org/apache/flink/formats/protobuf/Pb3ToRowTest.java:
##
@@ -85,6 +88,8 @@ public void testDeserialization() throws Exception {
 
 assertEquals(1, rowData.getInt(0));
 assertEquals(2L, rowData.getLong(1));
+
+assertEquals(ts.getNanos(), row.getTimestamp(11, 3).getNanoOfMillisecond());

Review Comment:
   @MartijnVisser Hi. I'm suggesting adding support for a 3rd-party protobuf 
datatype, which needs a mapping to either a RowType or a SimpleType in the 
protobuf codegen components. I'm assuming this is the correct approach. Thanks






[GitHub] [flink] MartijnVisser commented on a diff in pull request #21436: [FLINK-30093] [formats] Protobuf Timestamp Compile Error

2022-12-04 Thread GitBox


MartijnVisser commented on code in PR #21436:
URL: https://github.com/apache/flink/pull/21436#discussion_r1039012360


##
flink-formats/flink-protobuf/src/test/java/org/apache/flink/formats/protobuf/Pb3ToRowTest.java:
##
@@ -85,6 +88,8 @@ public void testDeserialization() throws Exception {
 
 assertEquals(1, rowData.getInt(0));
 assertEquals(2L, rowData.getLong(1));
+
+assertEquals(ts.getNanos(), row.getTimestamp(11, 3).getNanoOfMillisecond());

Review Comment:
   Just to step in: adding a new datatype requires a FLIP and a discussion. Why 
can't any of the existing datatypes be reused?






[jira] [Closed] (FLINK-30186) Bump Flink version to 1.15.3

2022-12-04 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-30186.
--
Resolution: Fixed

Merged to main 3c5ec6cd86de61752c708eba385646a7c1164880

> Bump Flink version to 1.15.3
> 
>
> Key: FLINK-30186
> URL: https://issues.apache.org/jira/browse/FLINK-30186
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.3.0
>Reporter: Jeff Yang
>Assignee: Jeff Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.3.0
>
>
> A new stable version of flink has been released, Bump Flink version to 1.15.3.





[GitHub] [flink-kubernetes-operator] gyfora merged pull request #451: [FLINK-30186] Bump Flink version to 1.15.3

2022-12-04 Thread GitBox


gyfora merged PR #451:
URL: https://github.com/apache/flink-kubernetes-operator/pull/451





[GitHub] [flink] hdulay commented on a diff in pull request #21436: [FLINK-30093] [formats] Protobuf Timestamp Compile Error

2022-12-04 Thread GitBox


hdulay commented on code in PR #21436:
URL: https://github.com/apache/flink/pull/21436#discussion_r1039003583


##
flink-formats/flink-protobuf/src/test/java/org/apache/flink/formats/protobuf/Pb3ToRowTest.java:
##
@@ -85,6 +88,8 @@ public void testDeserialization() throws Exception {
 
 assertEquals(1, rowData.getInt(0));
 assertEquals(2L, rowData.getLong(1));
+
+assertEquals(ts.getNanos(), row.getTimestamp(11, 3).getNanoOfMillisecond());

Review Comment:
   @maosuhan Yes, as I stepped through ProtobufSQLITCaseTest to add a 
timestamp field, I realized this issue goes beyond just source generation and 
compilation errors. Is there an example of how to add a new data type? I'm 
assuming I'll need to somehow map it to the TimestampType logical type. Thanks
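Assuming the mapping does land on Flink's `TimestampType`/`TimestampData` (whose `fromEpochMillis(long, int)` factory takes epoch millis plus a nano-of-millisecond remainder), the conversion arithmetic from a protobuf `Timestamp` (seconds + nanos) would look like this plain-Java sketch; the class and method names here are illustrative:

```java
// Illustrative arithmetic for mapping a protobuf Timestamp (seconds, nanos)
// onto Flink's TimestampData representation (epoch millis + nano-of-millisecond).
public class ProtoTimestampMapping {
    // Epoch milliseconds carried by a protobuf (seconds, nanos) pair.
    static long epochMillis(long seconds, int nanos) {
        return seconds * 1000L + nanos / 1_000_000;
    }

    // Sub-millisecond remainder, i.e. the nanoOfMillisecond argument that
    // a TimestampData.fromEpochMillis(millis, nanoOfMillisecond) call expects.
    static int nanoOfMillisecond(int nanos) {
        return nanos % 1_000_000;
    }
}
```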






[GitHub] [flink-table-store] zjureel commented on a diff in pull request #416: [FLINK-30271] Introduce Table.copy from dynamic options

2022-12-04 Thread GitBox


zjureel commented on code in PR #416:
URL: https://github.com/apache/flink-table-store/pull/416#discussion_r1038981579


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/table/AbstractFileStoreTable.java:
##
@@ -40,6 +47,43 @@ public AbstractFileStoreTable(Path path, TableSchema 
tableSchema) {
 
 protected abstract FileStore store();
 
+protected abstract FileStoreTable copy(TableSchema newTableSchema);
+
+@Override
+public FileStoreTable copy(Map<String, String> dynamicOptions) {
+// check option is not immutable
+Map<String, String> options = tableSchema.options();
+dynamicOptions.forEach(
+(k, v) -> {
+if (!Objects.equals(v, options.get(k))) {

Review Comment:
   Should the immutable fields exist in the dynamic parameters at all? Maybe 
it's better to call `SchemaManager.checkAlterTableOption(k)` without comparing 
the values.
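The reviewer's suggestion could look roughly like this (a sketch; the method name mirrors `SchemaManager.checkAlterTableOption`, but the immutable-key set is illustrative, not the real table store list): reject any dynamic option whose key is immutable, without comparing old and new values.

```java
import java.util.Map;
import java.util.Set;

// Sketch: fail fast on any immutable key present in the dynamic options,
// regardless of whether its value actually changes.
public class DynamicOptionCheck {
    // Illustrative set of keys that may not be changed dynamically.
    static final Set<String> IMMUTABLE_KEYS = Set.of("bucket", "merge-engine");

    static void checkAlterTableOption(String key) {
        if (IMMUTABLE_KEYS.contains(key)) {
            throw new UnsupportedOperationException(
                    "Change '" + key + "' is not supported yet.");
        }
    }

    static void validate(Map<String, String> dynamicOptions) {
        dynamicOptions.keySet().forEach(DynamicOptionCheck::checkAlterTableOption);
    }
}
```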






[GitHub] [flink] maosuhan commented on a diff in pull request #21436: [FLINK-30093] [formats] Protobuf Timestamp Compile Error

2022-12-04 Thread GitBox


maosuhan commented on code in PR #21436:
URL: https://github.com/apache/flink/pull/21436#discussion_r1038980183


##
flink-formats/flink-protobuf/src/test/java/org/apache/flink/formats/protobuf/Pb3ToRowTest.java:
##
@@ -85,6 +88,8 @@ public void testDeserialization() throws Exception {
 
 assertEquals(1, rowData.getInt(0));
 assertEquals(2L, rowData.getLong(1));
+
+assertEquals(ts.getNanos(), row.getTimestamp(11, 3).getNanoOfMillisecond());

Review Comment:
   @hdulay I think there is still a gap between the protobuf and Flink 
internal memory layouts. I tried to build a concrete Timestamp like `Timestamp 
ts = Timestamp.newBuilder().setSeconds(xxx).setNanos(yyy).build()` in a unit 
test, and you can easily see the difference.






[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #457: [docs] Add controller flow docs

2022-12-04 Thread GitBox


gyfora commented on PR #457:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/457#issuecomment-1336426415

   I will remove them once I'm back from holiday, this is not urgent. Thank you!





[GitHub] [flink-table-store] zjureel commented on a diff in pull request #418: [FLINK-30273] Introduce RecordReaderUtils.transform to transform RecordReader

2022-12-04 Thread GitBox


zjureel commented on code in PR #418:
URL: https://github.com/apache/flink-table-store/pull/418#discussion_r1038979729


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/utils/RecordReaderUtils.java:
##
@@ -44,4 +45,49 @@ public static <T> void forEachRemaining(
 reader.close();
 }
 }
+
+/**
+ * Returns a {@link RecordReader} that applies {@code function} to each 
element of {@code
+ * fromReader}.
+ */
+public static <T, R> RecordReader<R> transform(
+RecordReader<T> fromReader, Function<T, R> function) {
+return new RecordReader<R>() {
+@Override
+public RecordIterator<R> readBatch() throws IOException {
+RecordIterator<T> iterator = fromReader.readBatch();
+if (iterator == null) {
+return null;
+}
+return transform(iterator, function);
+}
+
+@Override
+public void close() throws IOException {
+fromReader.close();
+}
+};
+}
+
+/**
+ * Returns an iterator that applies {@code function} to each element of 
{@code fromIterator}.
+ */
+public static <T, R> RecordReader.RecordIterator<R> transform(
+RecordReader.RecordIterator<T> fromIterator, Function<T, R> function) {
+return new RecordReader.RecordIterator<R>() {
+@Override
+public R next() throws IOException {

Review Comment:
   Add @Nullable






[GitHub] [flink-table-store] zjureel commented on a diff in pull request #418: [FLINK-30273] Introduce RecordReaderUtils.transform to transform RecordReader

2022-12-04 Thread GitBox


zjureel commented on code in PR #418:
URL: https://github.com/apache/flink-table-store/pull/418#discussion_r1038979666


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/utils/RecordReaderUtils.java:
##
@@ -44,4 +45,49 @@ public static <T> void forEachRemaining(
 reader.close();
 }
 }
+
+/**
+ * Returns a {@link RecordReader} that applies {@code function} to each 
element of {@code
+ * fromReader}.
+ */
+public static <T, R> RecordReader<R> transform(
+RecordReader<T> fromReader, Function<T, R> function) {
+return new RecordReader<R>() {
+@Override
+public RecordIterator<R> readBatch() throws IOException {

Review Comment:
   Add @Nullable?






[jira] [Closed] (FLINK-28766) UnalignedCheckpointStressITCase.runStressTest failed with NoSuchFileException

2022-12-04 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-28766.
--
Resolution: Fixed

Merged to master as 1ddaa2a460d and 9e1cefe2509

> UnalignedCheckpointStressITCase.runStressTest failed with NoSuchFileException
> -
>
> Key: FLINK-28766
> URL: https://issues.apache.org/jira/browse/FLINK-28766
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Assignee: Anton Kalashnikov
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.17.0
>
>
> {code:java}
> 2022-08-01T01:36:16.0563880Z Aug 01 01:36:16 [ERROR] 
> org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase.runStressTest
>   Time elapsed: 12.579 s  <<< ERROR!
> 2022-08-01T01:36:16.0565407Z Aug 01 01:36:16 java.io.UncheckedIOException: 
> java.nio.file.NoSuchFileException: 
> /tmp/junit1058240190382532303/f0f99754a53d2c4633fed75011da58dd/chk-7/61092e4a-5b9a-4f56-83f7-d9960c53ed3e
> 2022-08-01T01:36:16.0566296Z Aug 01 01:36:16  at 
> java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88)
> 2022-08-01T01:36:16.0566972Z Aug 01 01:36:16  at 
> java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104)
> 2022-08-01T01:36:16.0567600Z Aug 01 01:36:16  at 
> java.util.Iterator.forEachRemaining(Iterator.java:115)
> 2022-08-01T01:36:16.0568290Z Aug 01 01:36:16  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> 2022-08-01T01:36:16.0569172Z Aug 01 01:36:16  at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> 2022-08-01T01:36:16.0569877Z Aug 01 01:36:16  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> 2022-08-01T01:36:16.0570554Z Aug 01 01:36:16  at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> 2022-08-01T01:36:16.0571371Z Aug 01 01:36:16  at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> 2022-08-01T01:36:16.0572417Z Aug 01 01:36:16  at 
> java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:546)
> 2022-08-01T01:36:16.0573618Z Aug 01 01:36:16  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase.discoverRetainedCheckpoint(UnalignedCheckpointStressITCase.java:289)
> 2022-08-01T01:36:16.0575187Z Aug 01 01:36:16  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase.runAndTakeExternalCheckpoint(UnalignedCheckpointStressITCase.java:262)
> 2022-08-01T01:36:16.0576540Z Aug 01 01:36:16  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase.runStressTest(UnalignedCheckpointStressITCase.java:158)
> 2022-08-01T01:36:16.0577684Z Aug 01 01:36:16  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-01T01:36:16.0578546Z Aug 01 01:36:16  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-01T01:36:16.0579374Z Aug 01 01:36:16  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-01T01:36:16.0580298Z Aug 01 01:36:16  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-01T01:36:16.0581243Z Aug 01 01:36:16  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2022-08-01T01:36:16.0582029Z Aug 01 01:36:16  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2022-08-01T01:36:16.0582766Z Aug 01 01:36:16  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2022-08-01T01:36:16.0583488Z Aug 01 01:36:16  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2022-08-01T01:36:16.0584203Z Aug 01 01:36:16  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2022-08-01T01:36:16.0585087Z Aug 01 01:36:16  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2022-08-01T01:36:16.0585778Z Aug 01 01:36:16  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 2022-08-01T01:36:16.0586482Z Aug 01 01:36:16  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 2022-08-01T01:36:16.0587155Z Aug 01 01:36:16  at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> 2022-08-01T01:36:16.0587809Z Aug 01 01:36:16  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 2022-08-01T01:36:16.0588434Z Aug 01 01:36:16  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 2022-08-01T01:36:16.0589203Z Aug 01 01:36:16  at 
> 

[GitHub] [flink] pnowojski merged pull request #21440: [FLINK-28766][tests] Safe iterate over checkpoint files in order to avoid exception during parallel savepoint deletion

2022-12-04 Thread GitBox


pnowojski merged PR #21440:
URL: https://github.com/apache/flink/pull/21440





[GitHub] [flink-kubernetes-operator] morhidi commented on pull request #457: [docs] Add controller flow docs

2022-12-04 Thread GitBox


morhidi commented on PR #457:
URL: https://github.com/apache/flink-kubernetes-operator/pull/457#issuecomment-1336415811

   > > +1 I was unable to render the pictures on this PR.
   > 
   > Sorry @morhidi , I should have posted some screenshots. Unfortunately I don’t have the laptop on me
   
   hm.. it is supposed to work without screenshots. I checked and the svgs are not formatted properly. The `` tag is duplicated in them.





[GitHub] [flink-kubernetes-operator] morhidi merged pull request #420: FLINK-29536 - Add WATCH_NAMESPACE env var to operator

2022-12-04 Thread GitBox


morhidi merged PR #420:
URL: https://github.com/apache/flink-kubernetes-operator/pull/420





[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #457: [docs] Add controller flow docs

2022-12-04 Thread GitBox


gyfora commented on PR #457:
URL: https://github.com/apache/flink-kubernetes-operator/pull/457#issuecomment-1336413490

   > +1 I was unable to render the pictures on this PR.
   
   Sorry @morhidi , I should have posted some screenshots. Unfortunately I don’t have the laptop on me





[GitHub] [flink-kubernetes-operator] morhidi merged pull request #469: [FLINK-29109] Generate random jobId for stateless upgrade mode irrespective of Flink version

2022-12-04 Thread GitBox


morhidi merged PR #469:
URL: https://github.com/apache/flink-kubernetes-operator/pull/469





[GitHub] [flink-kubernetes-operator] tagarr commented on a diff in pull request #420: FLINK-29536 - Add WATCH_NAMESPACE env var to operator

2022-12-04 Thread GitBox


tagarr commented on code in PR #420:
URL: https://github.com/apache/flink-kubernetes-operator/pull/420#discussion_r1038965294


##
helm/flink-kubernetes-operator/templates/flink-operator.yaml:
##
@@ -79,7 +79,9 @@ spec:
   {{- end }}
   env:
 - name: OPERATOR_NAMESPACE

Review Comment:
   So it is a more consistent way of defining the operator namespace and works for all install methods. That's why I made the change.
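As background for the namespace discussion, operators commonly resolve their watch scope from environment variables with a fallback chain. A hedged sketch of that resolution (the variable names follow this thread, but the exact precedence logic here is illustrative, not the operator's implementation):

```python
import os

def resolve_watch_namespaces(env=os.environ):
    """Resolve the namespaces an operator should watch.
    WATCH_NAMESPACE (comma-separated) takes precedence; otherwise fall back
    to the operator's own namespace; an empty result means watch cluster-wide."""
    raw = env.get("WATCH_NAMESPACE", "").strip()
    if raw:
        return [ns.strip() for ns in raw.split(",") if ns.strip()]
    own = env.get("OPERATOR_NAMESPACE", "").strip()
    return [own] if own else []  # [] -> watch all namespaces

print(resolve_watch_namespaces({"WATCH_NAMESPACE": "ns1, ns2"}))  # ['ns1', 'ns2']
print(resolve_watch_namespaces({"OPERATOR_NAMESPACE": "flink"}))  # ['flink']
```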






[jira] [Created] (FLINK-30288) Use visitor to convert predicate for orc

2022-12-04 Thread Shammon (Jira)
Shammon created FLINK-30288:
---

 Summary: Use visitor to convert predicate for orc
 Key: FLINK-30288
 URL: https://issues.apache.org/jira/browse/FLINK-30288
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Shammon


Use `PredicateVisitor` to convert `Predicate` in table store for orc
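The visitor pattern proposed here dispatches conversion logic per predicate node instead of a chain of instanceof checks. A small Python sketch of the idea (the predicate classes and the ORC-style output format are simplified stand-ins, not the table store's `PredicateVisitor` API):

```python
from dataclasses import dataclass

@dataclass
class Equal:
    """Leaf predicate: field == value."""
    field: str
    value: object
    def accept(self, visitor):
        return visitor.visit_equal(self)

@dataclass
class And:
    """Compound predicate: left AND right."""
    left: object
    right: object
    def accept(self, visitor):
        return visitor.visit_and(self)

class OrcConvertVisitor:
    """Converts a predicate tree into an ORC-SearchArgument-like s-expression."""
    def visit_equal(self, p):
        return f"(= {p.field} {p.value!r})"
    def visit_and(self, p):
        return f"(and {p.left.accept(self)} {p.right.accept(self)})"

pred = And(Equal("dt", "2022-12-04"), Equal("hr", 12))
print(pred.accept(OrcConvertVisitor()))
# (and (= dt '2022-12-04') (= hr 12))
```

Adding a new target format then means adding one visitor class, leaving the predicate classes untouched.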



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] jiangxin369 commented on a diff in pull request #183: [FLINK-29603] Add Transformer for StopWordsRemover

2022-12-04 Thread GitBox


jiangxin369 commented on code in PR #183:
URL: https://github.com/apache/flink-ml/pull/183#discussion_r1038916503


##
flink-ml-python/pyflink/ml/lib/feature/stopwordsremover.py:
##
@@ -0,0 +1,135 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+import typing
+from typing import Tuple
+
+from pyflink.java_gateway import get_gateway
+from pyflink.ml.core.param import Param, StringArrayParam, BooleanParam, StringParam
+from pyflink.ml.core.wrapper import JavaWithParams
+from pyflink.ml.lib.feature.common import JavaFeatureTransformer
+from pyflink.ml.lib.param import HasInputCols, HasOutputCols
+
+
+def _load_default_stop_words(language: str):
+    return get_gateway().jvm.org.apache.flink.ml.feature. \
+        stopwordsremover.StopWordsRemover.loadDefaultStopWords(language)
+
+
+def _get_default_or_us():
+    return get_gateway().jvm.org.apache.flink.ml.feature. \
+        stopwordsremover.StopWordsRemover.getDefaultOrUS()
+
+
+class _StopWordsRemoverParams(
+    JavaWithParams,
+    HasInputCols,
+    HasOutputCols
+):
+    """
+    Params for :class:`StopWordsRemover`.
+    """
+
+    STOP_WORDS: Param[Tuple[str, ...]] = StringArrayParam(
+        "stop_words",
+        "The words to be filtered out.",
+        _load_default_stop_words('english'))
+
+    CASE_SENSITIVE: Param[bool] = BooleanParam(
+        "case_sensitive",
+        "Whether to do a case-sensitive comparison over the stop words.",
+        False
+    )
+
+    LOCALE: Param[str] = StringParam(
+        "locale",
+        "Locale of the input for case insensitive matching. Ignored when caseSensitive is true.",
+        _get_default_or_us())
+
+    def __init__(self, java_params):
+        super(_StopWordsRemoverParams, self).__init__(java_params)
+
+    def set_stop_words(self, value: Tuple[str, ...]):

Review Comment:
   Let's replace the param with `*value: str` to set `StringArrayParam`.
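The reviewer's suggestion is the varargs setter convention used elsewhere in the Python wrappers: callers write `set_stop_words("a", "b")` instead of passing a tuple. A simplified sketch of that pattern (the class here is a stand-in, not the actual generated wrapper):

```python
from typing import Tuple

class StopWordsParams:
    """Illustrative holder for a string-array parameter."""
    def __init__(self):
        self._stop_words: Tuple[str, ...] = ()

    def set_stop_words(self, *value: str) -> "StopWordsParams":
        # *value collects positional string arguments into a tuple.
        self._stop_words = value
        return self  # fluent style, matching the wrapper setters

    def get_stop_words(self) -> Tuple[str, ...]:
        return self._stop_words

p = StopWordsParams().set_stop_words("the", "a", "an")
print(p.get_stop_words())  # ('the', 'a', 'an')
```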



##
flink-ml-lib/src/main/resources/org/apache/flink/ml/feature/stopwords/README:
##
@@ -0,0 +1,12 @@
+Stopwords Corpus

Review Comment:
   Would it be better to write this file in markdown syntax like `## Stopwords 
Corpus` for better readability?



##
docs/content/docs/operators/feature/stopwordsremover.md:
##
@@ -0,0 +1,165 @@
+---
+title: "StopWordsRemover"
+weight: 1
+type: docs
+aliases:
+- /operators/feature/stopwordsremover.html
+---
+
+
+
+## StopWordsRemover
+
+A feature transformer that filters out stop words from input.
+
+Note: null values from input array are preserved unless adding null to stopWords explicitly.
+
+See Also: <a href="http://en.wikipedia.org/wiki/Stop_words">Stop words (Wikipedia)</a>
+
+### Input Columns
+
+| Param name | Type     | Default | Description                                        |
+|:-----------|:---------|:--------|:---------------------------------------------------|
+| inputCols  | String[] | `null`  | Arrays of strings containing stop words to remove. |
+
+### Output Columns
+
+| Param name | Type     | Default | Description                                |
+|:-----------|:---------|:--------|:-------------------------------------------|
+| outputCols | String[] | `null`  | Arrays of strings with stop words removed. |
+
+### Parameters
+
+| Key        | Default                                            | Type     | Required | Description         |
+|------------|----------------------------------------------------|----------|----------|---------------------|
+| inputCols  | `null`                                             | String[] | yes      | Input column names. |
+| outputCols | `null`                                             | String[] | yes      | Output column name. |
+| stopWords  | `StopWordsRemover.loadDefaultStopWords("english")` | String[] |