[GitHub] [flink] flinkbot edited a comment on pull request #17531: [FLINK-24598][table-runtime] Fix test problem that the current IT cas…
flinkbot edited a comment on pull request #17531: URL: https://github.com/apache/flink/pull/17531#issuecomment-948196975 ## CI report: * 4de4de84a971dbba3b6701922d083359a732b4f2 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25294) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-23062) FLIP-129: Register sources/sinks in Table API
[ https://issues.apache.org/jira/browse/FLINK-23062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ingo Bürk closed FLINK-23062. - Fix Version/s: 1.14.0 Resolution: Fixed > FLIP-129: Register sources/sinks in Table API > - > > Key: FLINK-23062 > URL: https://issues.apache.org/jira/browse/FLINK-23062 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: Ingo Bürk >Assignee: Ingo Bürk >Priority: Major > Fix For: 1.14.0 > > > [https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-23062) FLIP-129: Register sources/sinks in Table API
[ https://issues.apache.org/jira/browse/FLINK-23062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432204#comment-17432204 ] Ingo Bürk commented on FLINK-23062: --- Yes, sorry. Will close it and move the FLIP.
[jira] [Commented] (FLINK-22096) ServerTransportErrorHandlingTest.testRemoteClose fail
[ https://issues.apache.org/jira/browse/FLINK-22096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432202#comment-17432202 ] Yuxin Tan commented on FLINK-22096: --- In the test case, a {{BindException}} may be thrown when initializing the Netty server. To solve this problem, a retry is added when calling the {{initServerAndClient}} method. I have submitted a PR with my proposed fix. What do you think about this case? [~maguowei] [~kevin.cyj]. Please correct me if I missed anything, thanks. > ServerTransportErrorHandlingTest.testRemoteClose fail > -- > > Key: FLINK-22096 > URL: https://issues.apache.org/jira/browse/FLINK-22096 > Project: Flink > Issue Type: Bug > Components: Runtime / Network > Affects Versions: 1.13.0, 1.14.0, 1.15.0 > Reporter: Guowei Ma > Assignee: Yuxin Tan > Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.15.0, 1.14.1, 1.13.4 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15966=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=05b74a19-4ee4-5036-c46f-ada307df6cf0=6580 > {code:java} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.415 > s <<< FAILURE! - in > org.apache.flink.runtime.io.network.netty.ServerTransportErrorHandlingTest > [ERROR] > testRemoteClose(org.apache.flink.runtime.io.network.netty.ServerTransportErrorHandlingTest) > Time elapsed: 1.338 s <<< ERROR! > org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: > bind(..) failed: Address already in use > {code}
[GitHub] [flink] flinkbot edited a comment on pull request #17532: [FLINK-22096][tests] Fix port conflict for ServerTransportErrorHandlingTest#testRemoteClose
flinkbot edited a comment on pull request #17532: URL: https://github.com/apache/flink/pull/17532#issuecomment-948251540 ## CI report: * dd8d7b161f71cef2099c746f4e54cb62d4b881dd Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25296) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot commented on pull request #17532: [FLINK-22096][tests] Fix port conflict for ServerTransportErrorHandlingTest#testRemoteClose
flinkbot commented on pull request #17532: URL: https://github.com/apache/flink/pull/17532#issuecomment-948252161 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit dd8d7b161f71cef2099c746f4e54cb62d4b881dd (Thu Oct 21 04:40:31 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-22096) ServerTransportErrorHandlingTest.testRemoteClose fail
[ https://issues.apache.org/jira/browse/FLINK-22096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-22096: --- Labels: pull-request-available test-stability (was: test-stability)
[GitHub] [flink] flinkbot commented on pull request #17532: [FLINK-22096][tests] Fix port conflict for ServerTransportErrorHandlingTest#testRemoteClose
flinkbot commented on pull request #17532: URL: https://github.com/apache/flink/pull/17532#issuecomment-948251540 ## CI report: * dd8d7b161f71cef2099c746f4e54cb62d4b881dd UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] TanYuxin-tyx opened a new pull request #17532: [FLINK-22096][tests] Fix port conflict for ServerTransportErrorHandlingTest#testRemoteClose
TanYuxin-tyx opened a new pull request #17532: URL: https://github.com/apache/flink/pull/17532 ## What is the purpose of the change Fix the port conflict in `ServerTransportErrorHandlingTest#testRemoteClose`. In the test case, a `BindException` may be thrown when initializing the Netty server: `NetUtils.getAvailablePort()` is called in `createConfig()`, but after the available port is obtained it may be taken by another process, which leads to a `BindException`. To resolve the issue, a retry is added around the `initServerAndClient` call. ## Brief change log - *Add retries when initializing the Netty server in the test case* ## Verifying this change - *The Netty server is initialized successfully in the test case* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
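The retry approach described in the PR can be sketched with plain JDK sockets. This is an illustrative stand-in, not the actual PR code: `BindRetrySketch` and `bindWithRetry` are hypothetical names, and the real test retries `initServerAndClient` rather than a raw `ServerSocket`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class BindRetrySketch {
    // Tries each candidate port in turn, retrying on BindException: a port
    // that looked free when it was picked (e.g. via NetUtils.getAvailablePort())
    // may have been grabbed by another process in the meantime.
    static ServerSocket bindWithRetry(int[] candidatePorts) {
        BindException last = null;
        for (int port : candidatePorts) {
            try {
                ServerSocket socket = new ServerSocket();
                socket.bind(new InetSocketAddress("localhost", port));
                return socket; // bound successfully
            } catch (BindException e) {
                last = e; // port already in use; try the next candidate
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalStateException("all candidate ports in use", last);
    }
}
```

Retrying with a fresh port on each attempt is what removes the race between "find a free port" and "bind it", which is the flakiness the test fix targets.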
[jira] [Commented] (FLINK-21068) Add new timeout options for Elasticsearch connector
[ https://issues.apache.org/jira/browse/FLINK-21068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432192#comment-17432192 ] Yangze Guo commented on FLINK-21068: I think it is a valid improvement. However, it seems the `userConfig` is now only used for the bulk processor in es7 and es6. Does someone know the context of it? cc [~fabian.paul] [~dwysakowicz] > Add new timeout options for Elasticsearch connector > --- > > Key: FLINK-21068 > URL: https://issues.apache.org/jira/browse/FLINK-21068 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch > Affects Versions: 1.12.1 > Reporter: jinfeng > Priority: Minor > Labels: auto-deprioritized-major > > Currently, connection.max-retry-timeout seems not to work with the new Elasticsearch connector. The Elasticsearch community has removed setMaxRetryTimeoutMillis from RestClientBuilder. We can set timeout options when creating the RestHighLevelClient in Elasticsearch7ApiCallBridge, like: > {code:java} > @Override > public RestHighLevelClient createClient(Map<String, String> clientConfig) throws IOException { > RestClientBuilder builder = RestClient.builder(httpHosts.toArray(new HttpHost[httpHosts.size()])); > builder.setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() { > @Override > public RequestConfig.Builder customizeRequestConfig(RequestConfig.Builder builder) { > if (clientConfig.containsKey(CONFIG_KEY_CONNECTION_TIMEOUT)) { > builder.setConnectTimeout(Integer.valueOf(clientConfig.get(CONFIG_KEY_CONNECTION_TIMEOUT))); > } > if (clientConfig.containsKey(CONFIG_KEY_CONNECTION_SOCKET_TIMEOUT)) { > builder.setSocketTimeout(Integer.valueOf(clientConfig.get(CONFIG_KEY_CONNECTION_SOCKET_TIMEOUT))); > } > if (clientConfig.containsKey(CONFIG_KEY_CONNECTION_REQUEST_TIMEOUT)) { > builder.setConnectionRequestTimeout(Integer.valueOf(clientConfig.get(CONFIG_KEY_CONNECTION_REQUEST_TIMEOUT))); > } > return builder; > } > }); > {code} > > So, we can add three table config options to configure the Elasticsearch timeouts: > connection.timeout > connection.socket-timeout > connection.request-timeout
[jira] [Commented] (FLINK-24501) Unexpected behavior of cumulate window aggregate for late event after recover from sp/cp
[ https://issues.apache.org/jira/browse/FLINK-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432190#comment-17432190 ] Wenlong Lyu commented on FLINK-24501: --- I think it may be better to ensure that the watermark does not regress after restoring from a checkpoint/savepoint, instead of modifying the operator's behavior to cover such an abnormal case. For example, add an operator state in the watermark assigner to prevent it from producing a wrong watermark after restore. > Unexpected behavior of cumulate window aggregate for late event after recover > from sp/cp > > > Key: FLINK-24501 > URL: https://issues.apache.org/jira/browse/FLINK-24501 > Project: Flink > Issue Type: Bug > Components: Table SQL / Runtime > Reporter: JING ZHANG > Assignee: JING ZHANG > Priority: Major > Labels: pull-request-available > > *Problem description* > After recovering from a savepoint or checkpoint, unexpected behavior of cumulate window aggregate for late events may happen. > *Bug analysis* > Currently, for cumulate window aggregate, late events belonging to the cleaned slice are merged into the merged window state and counted into the later slices. > For example, for a CUMULATE window, step is 1 minute, size is 1 day. > {code:java} > SELECT window_start, window_end, COUNT(USER_ID) > FROM TABLE( > CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '1' MINUTES, INTERVAL '1' DAY)) > GROUP BY window_start, window_end;{code} > When the watermark comes to 11:01, the result of window [00:00, 11:01) is emitted. Let's assume the result is INSERT (00:00, 11:01, 4). > Then if a late record whose event time is 11:00 comes, it is merged into the merged state and counted into the later slices, for example, for windows [00:00, 11:02), [00:00, 11:03)... But the emitted window result INSERT (00:00, 11:01, 4) is not retracted and updated. > The behavior is different if the job recovers from a savepoint/checkpoint. > Let's do a savepoint after the watermark comes to 11:01 and (00:00, 11:01, 4) is emitted, then recover the job from that savepoint. Watermarks are not checkpointed and need to be repopulated again, so after recovery the watermark may roll back to 11:00. If a record whose event time is 11:00 then comes, it is not processed as a late event, and after the watermark comes to 11:01 again, a window result INSERT (00:00, 11:01, 5) is emitted downstream. > So the downstream operator receives two INSERT records for window (00:00, 11:01), which may lead to a wrong result. > > *Solution* > There are two solutions for the problem: > # Save the watermark to state in the slice shared operator. (Preferred) > # Update the behavior for late events, e.g. retract the emitted result and send the updated result. This requires changing the slice state clean mechanism, because currently the slice state is cleaned after the watermark exceeds the slice end.
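Solution 1 above (persisting the watermark so it never regresses after restore) can be modeled with a minimal sketch. The `WatermarkRestoreModel` class and its in-memory "state" map are hypothetical stand-ins for Flink's checkpointed operator state, not the actual slice shared operator:

```java
import java.util.HashMap;
import java.util.Map;

class WatermarkRestoreModel {
    // Hypothetical stand-in for checkpointed operator state.
    private final Map<String, Long> checkpointedState = new HashMap<>();
    private long watermark = Long.MIN_VALUE;

    void advanceWatermark(long wm) {
        watermark = Math.max(watermark, wm);
    }

    // On checkpoint/savepoint, persist the current watermark with the window state.
    void snapshot() {
        checkpointedState.put("watermark", watermark);
    }

    // On restore, take the max of the repopulated watermark and the saved one,
    // so the operator never sees a watermark lower than the one it already
    // emitted results for.
    void restore(long repopulatedWatermark) {
        long saved = checkpointedState.getOrDefault("watermark", Long.MIN_VALUE);
        watermark = Math.max(saved, repopulatedWatermark);
    }

    // An event is late iff its timestamp is below the (non-regressing) watermark.
    boolean isLate(long eventTime) {
        return eventTime < watermark;
    }
}
```

With this guard, the 11:00 record from the example stays classified as late after restore, so the [00:00, 11:01) window result cannot be emitted a second time.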
[jira] [Commented] (FLINK-24072) Add support for setting default headers in elasticsearch connector
[ https://issues.apache.org/jira/browse/FLINK-24072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432186#comment-17432186 ] Yangze Guo commented on FLINK-24072: [~hackergin] I'm not an expert on Elasticsearch. May I ask two questions about this issue: - The `connection.default-headers` should be a map, do I understand it correctly? - Do we only support `BasicHeader`? Does it make sense for users? > Add support for setting default headers in elasticsearch connector > -- > > Key: FLINK-24072 > URL: https://issues.apache.org/jira/browse/FLINK-24072 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch > Reporter: jinfeng > Priority: Major > > If we add support for setting default headers, we can add some header options in sql options. > The ddl would be like this. > {code:sql} > create table es-sink ( > a varchar, > b varchar > ) with ( > 'connector' = 'elasticsearch-7', > 'connection.default-headers' = 'Authorization:xxx' > ); > {code}
[jira] [Created] (FLINK-24607) SourceCoordinator may miss to close SplitEnumerator when failover frequently
Jark Wu created FLINK-24607: --- Summary: SourceCoordinator may miss to close SplitEnumerator when failover frequently Key: FLINK-24607 URL: https://issues.apache.org/jira/browse/FLINK-24607 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.13.3 Reporter: Jark Wu Attachments: jobmanager.log We are having a connection leak problem when using the mysql-cdc [1] source. We observed from the JM log that many enumerators are not closed. {code} ➜ test123 cat jobmanager.log | grep "SourceCoordinator \[\] - Restoring SplitEnumerator" | wc -l 264 ➜ test123 cat jobmanager.log | grep "SourceCoordinator \[\] - Starting split enumerator" | wc -l 264 ➜ test123 cat jobmanager.log | grep "MySqlSourceEnumerator \[\] - Starting enumerator" | wc -l 263 ➜ test123 cat jobmanager.log | grep "SourceCoordinator \[\] - Closing SourceCoordinator" | wc -l 264 ➜ test123 cat jobmanager.log | grep "MySqlSourceEnumerator \[\] - Closing enumerator" | wc -l 195 {code} We added a "Closing enumerator" log in {{MySqlSourceEnumerator#close()}} and a "Starting enumerator" log in {{MySqlSourceEnumerator#start()}}. From the above result you can see that the SourceCoordinator is restored and closed 264 times, while the split enumerator is started 264 times but closed only 195 times. It seems that {{SourceCoordinator}} misses closing the enumerator when the job fails over frequently. I also went through the code of {{SourceCoordinator}} and found a suspicious point: the {{started}} flag and {{enumerator}} are assigned on the main thread, however {{SourceCoordinator#close()}} is executed asynchronously by {{DeferrableCoordinator#closeAsync}}. That means the close method checks the {{started}} and {{enumerator}} variables asynchronously. Is there a concurrency problem here which may lead to a dirty read and missing the close of the {{enumerator}}? I'm still not sure, because it's hard to reproduce locally, and we can't deploy a custom Flink version to the production env.
[1]: https://github.com/ververica/flink-cdc-connectors
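The suspected race (fields assigned on the main thread but read by an asynchronous close) can be avoided by publishing the enumerator through a safely shared reference. This is only an illustrative sketch using an `AtomicReference`, not the actual `SourceCoordinator` code; the class and method names below are hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

class CoordinatorCloseSketch {
    interface Enumerator extends AutoCloseable {}

    // AtomicReference gives the async close a happens-before view of the
    // enumerator assigned on the main thread, so a started enumerator
    // cannot be missed because of a stale read.
    private final AtomicReference<Enumerator> enumerator = new AtomicReference<>();

    void start(Enumerator e) {
        enumerator.set(e);
    }

    CompletableFuture<Void> closeAsync() {
        return CompletableFuture.runAsync(() -> {
            // getAndSet claims the enumerator exactly once, even if
            // closeAsync() races with another close attempt.
            Enumerator e = enumerator.getAndSet(null);
            if (e != null) {
                try {
                    e.close();
                } catch (Exception ex) {
                    throw new RuntimeException(ex);
                }
            }
        });
    }
}
```

The key design point is that the visibility of the `enumerator` field and the "close at most once" decision are handled by a single atomic operation instead of a plain flag checked from another thread.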
[jira] [Updated] (FLINK-24606) AvroDeserializationSchema buffer is not clean
[ https://issues.apache.org/jira/browse/FLINK-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] heyu dou updated FLINK-24606: - Description: org.apache.flink.formats.avro.AvroDeserializationSchema wants to reuse an org.apache.avro.io.BinaryDecoder, but the way it is used is wrong: the decoder should be reset before each deserialization. If not, when the schema changes, leftover data from the last record will end up in the current one. was: org.apache.flink.formats.avro.AvroDeserializationSchema Decoder want reuse org.apache.avro.io.BinaryDecoder. But the way it is used is wrong. Should be reset Decoder before deserialization(). > AvroDeserializationSchema buffer is not clean > - > > Key: FLINK-24606 > URL: https://issues.apache.org/jira/browse/FLINK-24606 > Project: Flink > Issue Type: Bug > Components: API / DataStream > Reporter: heyu dou > Priority: Major > > org.apache.flink.formats.avro.AvroDeserializationSchema wants to reuse an org.apache.avro.io.BinaryDecoder, but the way it is used is wrong: the decoder should be reset before each deserialization. If not, when the schema changes, leftover data from the last record will end up in the current one.
[jira] [Created] (FLINK-24606) AvroDeserializationSchema buffer is not clean
heyu dou created FLINK-24606: Summary: AvroDeserializationSchema buffer is not clean Key: FLINK-24606 URL: https://issues.apache.org/jira/browse/FLINK-24606 Project: Flink Issue Type: Bug Components: API / DataStream Reporter: heyu dou org.apache.flink.formats.avro.AvroDeserializationSchema wants to reuse an org.apache.avro.io.BinaryDecoder, but the way it is used is wrong: the decoder should be reset before each deserialization.
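The stale-buffer effect described in the report can be reproduced with a plain-JDK model of decoder reuse. The `NaiveDecoder` class below is a hypothetical illustration of the bug pattern, not Avro's actual `BinaryDecoder`:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecoderReuseSketch {
    // Hypothetical reused decoder: keeps a buffer across calls, like a
    // BinaryDecoder that is fed new input without being reset first.
    static class NaiveDecoder {
        private byte[] buffer = new byte[0];

        // Wrong reuse: only overwrites the prefix, leaving stale bytes behind
        // when the new payload is shorter than the previous one.
        void feedWithoutReset(byte[] input) {
            if (buffer.length < input.length) {
                buffer = new byte[input.length];
            }
            System.arraycopy(input, 0, buffer, 0, input.length);
        }

        // Correct reuse: reset the buffer to exactly the new input before use.
        void feedWithReset(byte[] input) {
            buffer = Arrays.copyOf(input, input.length);
        }

        String current() {
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }
}
```

Feeding `"ab"` after `"long-record"` without a reset yields `"abng-record"`: the tail of the previous record survives, which is the same failure mode as the last Avro result leaking into the current one when the schema changes.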
[GitHub] [flink] fsk119 commented on pull request #14444: [FLINK-20091][avro] add ignore-parse-error for avro formats
fsk119 commented on pull request #14444: URL: https://github.com/apache/flink/pull/14444#issuecomment-948210311 I have recently been busy working on an internal branch; I don't think I will have time to finish this PR.
[GitHub] [flink] flinkbot edited a comment on pull request #17436: [FLINK-15987][tabel-planner]SELECT 1.0e0 / 0.0e0 throws NumberFormatException
flinkbot edited a comment on pull request #17436: URL: https://github.com/apache/flink/pull/17436#issuecomment-938641830 ## CI report: * 8e7cadf33f0414a07b1496cabe0c8d7088f870c2 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25262) * c4fd4fc12dddf7d9a37bf111429158d158f32b2c Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25295) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on pull request #17436: [FLINK-15987][tabel-planner]SELECT 1.0e0 / 0.0e0 throws NumberFormatException
flinkbot edited a comment on pull request #17436: URL: https://github.com/apache/flink/pull/17436#issuecomment-938641830 ## CI report: * 8e7cadf33f0414a07b1496cabe0c8d7088f870c2 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25262) * c4fd4fc12dddf7d9a37bf111429158d158f32b2c UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] tsreaper edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory
tsreaper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-948202218 @JingGe > For point 1, the uncompressed data size should be controlled by `StreamFormat.FETCH_IO_SIZE`. It might not be very precise to control the heap size, since the last read might overfulfil the quota a little bit, but it is acceptable. This is not the case. For example, xz compression comes with a compression ratio of ~15% (google xz compression ratio if you want to confirm). Note that avro can be represented both in json and in compact binary form, so you may expect a 6x inflation after uncompressing the data. It becomes worse because Java objects always carry extra overhead, so this is not "overfulfilling the quota a little bit". > `StreamFormatAdapter` has built-in compressors support. Does this PR implementation have the same support too? If you take a look at the implementation of `StreamFormatAdapter` you'll find that it supports decompression by calling `StandardDeCompression#getDecompressorForFileName`, which determines the decompressor by the file extension. Avro files often end with `.avro`, so there will be no match. Also, avro files are compressed by blocks. Avro files contain their own magic numbers, specific headers and block splitters which cannot be understood by the standard xz or bzip2 decompressors. You have to use the avro reader to interpret the file, and the avro reader will deal with all the work such as decompression. > For point 2, `StreamFormat` defines a way to read each record. The problem is that you just cannot read one record at a time from an avro file stream. Avro readers read one **block** at a time from the file stream and store the inflated raw bytes in memory. For detailed code see my reply to @slinkydeveloper.
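The block-at-a-time layout discussed above can be illustrated with a plain length-prefixed format. This is a simplified stand-in for Avro's container blocks, not Avro's actual wire format, and `BlockReadSketch` is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BlockReadSketch {
    // Writes records grouped into one block: [recordCount][len][bytes]...[len][bytes].
    static byte[] writeBlock(List<String> records) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(records.size());
            for (String r : records) {
                byte[] b = r.getBytes(StandardCharsets.UTF_8);
                out.writeInt(b.length);
                out.write(b);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader must materialize the whole block before it can hand out
    // records: record boundaries only exist inside the (possibly compressed)
    // block, which is why a record-at-a-time StreamFormat does not map
    // cleanly onto a block-structured file.
    static List<String> readBlock(byte[] block) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            int count = in.readInt();
            List<String> records = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                byte[] b = new byte[in.readInt()];
                in.readFully(b);
                records.add(new String(b, StandardCharsets.UTF_8));
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real Avro container file the block body would additionally be compressed as a unit, so the inflated bytes of the entire block sit in memory before the first record can be returned.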
[jira] [Commented] (FLINK-24539) ChangelogNormalize operator tooks too long time to INITIALIZING until failed
[ https://issues.apache.org/jira/browse/FLINK-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432172#comment-17432172 ] vmaster.cc commented on FLINK-24539: [~pnowojski] Thank you very much, I have subscribed to the user mailing list. I presume it is busy recovering its state too, so I have optimized the logic to avoid using Flink to process the full data. You said "unless you are using unaligned checkpoints"; how can I use those? > ChangelogNormalize operator tooks too long time to INITIALIZING until failed > > > Key: FLINK-24539 > URL: https://issues.apache.org/jira/browse/FLINK-24539 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Task, Table SQL / > Runtime > Affects Versions: 1.13.1 > Environment: Flink version: 1.13.1 > TaskManager memory: > !image-2021-10-14-13-36-56-899.png|width=578,height=318! > JobManager memory: > !image-2021-10-14-13-37-51-445.png|width=578,height=229! > Reporter: vmaster.cc > Priority: Major > Attachments: image-2021-10-14-13-19-08-215.png, > image-2021-10-14-13-36-56-899.png, image-2021-10-14-13-37-51-445.png, > image-2021-10-14-14-13-13-370.png, image-2021-10-14-14-15-40-101.png, > image-2021-10-14-14-16-33-080.png, > taskmanager_container_e11_1631768043929_0012_01_04_log.txt > > > I'm using Debezium to produce CDC from MySQL; considering its at-least-once delivery, I must set the config 'table.exec.source.cdc-events-duplicate=true'. > But when some unknown cause took my task down, the Flink task always failed to restart. I found that the ChangelogNormalize operator took too long in the INITIALIZING stage. > > screenshot and log fragment are as follows: > !image-2021-10-14-13-19-08-215.png|width=567,height=293! 
> > {code:java} > 2021-10-14 12:32:33,660 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder [] - > Finished building RocksDB keyed state-backend at > /data3/yarn/nm/usercache/flink/appcache/application_1631768043929_0012/flink-io-f31735c3-e726-4c49-89a5-916670809b7a/job_7734977994a6a10f7cc784d50e4a1a34_op_KeyedProcessOperator_dc2290bb6f8f5cd2bd425368843494fe__1_1__uuid_6cbbe6ae-f43e-4d2a-b1fb-f0cb71f257af.2021-10-14 > 12:32:33,662 INFO org.apache.flink.runtime.taskmanager.Task >[] - GroupAggregate(groupBy=[teacher_id, create_day], select=[teacher_id, > create_day, SUM_RETRACT($f2) AS teacher_courseware_count]) -> > Calc(select=[teacher_id, create_day, CAST(teacher_courseware_count) AS > teacher_courseware_count]) -> NotNullEnforcer(fields=[teacher_id, > create_day]) (1/1)#143 (9cca3ef1293cc6364698381bbda93998) switched from > INITIALIZING to RUNNING.2021-10-14 12:38:07,581 INFO > org.apache.flink.runtime.taskmanager.Task[] - Ignoring > checkpoint aborted notification for non-running task > ChangelogNormalize(key=[c_id]) -> Calc(select=[c_author_id AS teacher_id, > DATE_FORMAT(c_create_time, _UTF-16LE'-MM-dd') AS create_day, IF((c_state > = 10), 1, 0) AS $f2], where=[((c_is_group = 0) AND (c_author_id <> > _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"))]) > (1/1)#143.2021-10-14 12:38:07,581 INFO > org.apache.flink.runtime.taskmanager.Task[] - Attempting > to cancel task Sink: > Sink(table=[default_catalog.default_database.t_flink_school_teacher_courseware_count], > fields=[teacher_id, create_day, teacher_courseware_count]) (2/2)#143 > (cc25f9ae49c4db01ab40ff103fae43fd).2021-10-14 12:38:07,581 INFO > org.apache.flink.runtime.taskmanager.Task[] - Sink: > Sink(table=[default_catalog.default_database.t_flink_school_teacher_courseware_count], > fields=[teacher_id, create_day, teacher_courseware_count]) (2/2)#143 > (cc25f9ae49c4db01ab40ff103fae43fd) switched from RUNNING to > CANCELING.2021-10-14 12:38:07,581 INFO > 
org.apache.flink.runtime.taskmanager.Task[] - Triggering > cancellation of task code Sink: > Sink(table=[default_catalog.default_database.t_flink_school_teacher_courseware_count], > fields=[teacher_id, create_day, teacher_courseware_count]) (2/2)#143 > (cc25f9ae49c4db01ab40ff103fae43fd).2021-10-14 12:38:07,583 INFO > org.apache.flink.runtime.taskmanager.Task[] - Attempting > to cancel task Sink: > Sink(table=[default_catalog.default_database.t_flink_school_teacher_courseware_count], > fields=[teacher_id, create_day, teacher_courseware_count]) (1/2)#143 > (5419f41a3f0cc6c2f3f4c82c87f4ae22).2021-10-14 12:38:07,583 INFO > org.apache.flink.runtime.taskmanager.Task[] - Sink: > Sink(table=[default_catalog.default_database.t_flink_school_teacher_courseware_count], > fields=[teacher_id, create_day, teacher_courseware_count])
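In reply to the question above ("how can I use this?"): unaligned checkpoints can be switched on either in flink-conf.yaml or via the checkpoint config on the execution environment. A minimal configuration sketch, assuming Flink 1.13 configuration keys (double-check against the docs for your exact version):

```yaml
# flink-conf.yaml (sketch): let checkpoint barriers overtake in-flight
# buffers so checkpoints can complete even under heavy backpressure
execution.checkpointing.unaligned: true
execution.checkpointing.interval: 60s
```

Equivalently, `env.getCheckpointConfig().enableUnalignedCheckpoints();` in the DataStream API. Note this helps checkpoints complete under backpressure; it does not by itself shorten state recovery during the INITIALIZING stage.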
[jira] [Commented] (FLINK-24539) ChangelogNormalize operator tooks too long time to INITIALIZING until failed
[ https://issues.apache.org/jira/browse/FLINK-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432171#comment-17432171 ] vmaster.cc commented on FLINK-24539: [~twalthr] Yeah, the topic is not partitioned, because the source data comes from MySQL and is then delivered into a single partition of the Kafka topic using Debezium. Is there some problem if multiple parallel instances are used to consume a single partition? Can only one node consume data? > ChangelogNormalize operator tooks too long time to INITIALIZING until failed > > > Key: FLINK-24539 > URL: https://issues.apache.org/jira/browse/FLINK-24539 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Task, Table SQL / > Runtime >Affects Versions: 1.13.1 > Environment: Flink version :1.13.1 > TaskManager memory: > !image-2021-10-14-13-36-56-899.png|width=578,height=318! > JobManager memory: > !image-2021-10-14-13-37-51-445.png|width=578,height=229! >Reporter: vmaster.cc >Priority: Major > Attachments: image-2021-10-14-13-19-08-215.png, > image-2021-10-14-13-36-56-899.png, image-2021-10-14-13-37-51-445.png, > image-2021-10-14-14-13-13-370.png, image-2021-10-14-14-15-40-101.png, > image-2021-10-14-14-16-33-080.png, > taskmanager_container_e11_1631768043929_0012_01_04_log.txt > > > I'm using Debezium to produce CDC from MySQL; considering its at-least-once > delivery, I must set the config > 'table.exec.source.cdc-events-duplicate=true'. > But when some unknown issue brought my task down, the Flink job always failed > to restart. I found that the ChangelogNormalize operator took too long in the > INITIALIZING stage. > > A screenshot and log fragment are as follows: > !image-2021-10-14-13-19-08-215.png|width=567,height=293! 
[GitHub] [flink] flinkbot commented on pull request #17531: [FLINK-24598][table-runtime] Fix test problem that the current IT cas…
flinkbot commented on pull request #17531: URL: https://github.com/apache/flink/pull/17531#issuecomment-948198035 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 4de4de84a971dbba3b6701922d083359a732b4f2 (Thu Oct 21 02:26:43 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-24598).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work. Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. 
For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17531: [FLINK-24598][table-runtime] Fix test problem that the current IT cas…
flinkbot edited a comment on pull request #17531: URL: https://github.com/apache/flink/pull/17531#issuecomment-948196975 ## CI report: * 4de4de84a971dbba3b6701922d083359a732b4f2 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25294) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #17531: [FLINK-24598][table-runtime] Fix test problem that the current IT cas…
flinkbot commented on pull request #17531: URL: https://github.com/apache/flink/pull/17531#issuecomment-948196975 ## CI report: * 4de4de84a971dbba3b6701922d083359a732b4f2 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-24598) Current IT case do not cover fallback path for hash aggregate
[ https://issues.apache.org/jira/browse/FLINK-24598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-24598: --- Labels: pull-request-available (was: ) > Current IT case do not cover fallback path for hash aggregate > - > > Key: FLINK-24598 > URL: https://issues.apache.org/jira/browse/FLINK-24598 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Runtime >Affects Versions: 1.14.0, 1.15.0 >Reporter: Shuo Cheng >Priority: Minor > Labels: pull-request-available > > Test data in AggregateITCaseBase#testBigData is not big enough to trigger > hash agg to sort and spill buffer and fallback to sort agg. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] RocMarshal edited a comment on pull request #17508: [FLINK-24351][docs] Translate "JSON Function" pages into Chinese
RocMarshal edited a comment on pull request #17508: URL: https://github.com/apache/flink/pull/17508#issuecomment-948196371 @MonsterChenzhuo Thanks for the update. Maybe you should rebase your branch onto the latest master and resolve the conflicting file before the next review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] cshuo opened a new pull request #17531: [FLINK-24598][table-runtime] Fix test problem that the current IT cas…
cshuo opened a new pull request #17531: URL: https://github.com/apache/flink/pull/17531 …e do not cover fallback path for hash aggregate ## What is the purpose of the change Test data in AggregateITCaseBase#testBigData is not big enough to trigger the hash agg to sort and spill its buffer and fall back to sort agg. We enlarge the test data to fix this problem. ## Brief change log - Modify the IT test in AggregateITCaseBase ## Verifying this change - The IT test case itself covers the modification. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal commented on pull request #17508: [FLINK-24351][docs] Translate "JSON Function" pages into Chinese
RocMarshal commented on pull request #17508: URL: https://github.com/apache/flink/pull/17508#issuecomment-948196371 @MonsterChenzhuo Maybe you should rebase your branch onto the latest master and resolve the conflicting file before the next review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal commented on pull request #17352: [FLINK-10230][table] Support 'SHOW CREATE VIEW' syntax to print the query of a view
RocMarshal commented on pull request #17352: URL: https://github.com/apache/flink/pull/17352#issuecomment-948195266 @wuchong I really appreciate it. I made some changes based on your suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-23997) Improvement for SQL windowing table-valued function
[ https://issues.apache.org/jira/browse/FLINK-23997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Young updated FLINK-23997: --- Fix Version/s: 1.15.0 > Improvement for SQL windowing table-valued function > --- > > Key: FLINK-23997 > URL: https://issues.apache.org/jira/browse/FLINK-23997 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.15.0 >Reporter: JING ZHANG >Priority: Major > Fix For: 1.15.0 > > > This is an umbrella issue for follow up issues related with windowing > table-valued function. > FLIP-145: > [https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] godfreyhe commented on a change in pull request #17344: [FLINK-20895] [flink-table-planner] support local aggregate push down in table planner
godfreyhe commented on a change in pull request #17344: URL: https://github.com/apache/flink/pull/17344#discussion_r732918626 ## File path: flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/factories/TestValuesTableFactory.java ## @@ -867,27 +885,118 @@ public String asSummaryString() { allPartitions.isEmpty() ? Collections.singletonList(Collections.emptyMap()) : allPartitions; + int numRetained = 0; for (Map partition : keys) { -for (Row row : data.get(partition)) { +Collection rowsInPartition = data.get(partition); + +// handle predicates and projection +List rowsRetained = +rowsInPartition.stream() +.filter( +row -> + FilterUtils.isRetainedAfterApplyingFilterPredicates( +filterPredicates, getValueGetter(row))) +.map( +row -> { +Row projectedRow = projectRow(row); + projectedRow.setKind(row.getKind()); +return projectedRow; +}) +.collect(Collectors.toList()); + +// handle aggregates +if (!aggregateExpressions.isEmpty()) { +rowsRetained = applyAggregatesToRows(rowsRetained); +} + +// handle row data +for (Row row : rowsRetained) { +final RowData rowData = (RowData) converter.toInternal(row); +if (rowData != null) { +if (numRetained >= numElementToSkip) { +rowData.setRowKind(row.getKind()); +result.add(rowData); +} +numRetained++; +} + +// handle limit. No aggregates will be pushed down when there is a limit. 
if (result.size() >= limit) { return result; } -boolean isRetained = - FilterUtils.isRetainedAfterApplyingFilterPredicates( -filterPredicates, getValueGetter(row)); -if (isRetained) { -final Row projectedRow = projectRow(row); -final RowData rowData = (RowData) converter.toInternal(projectedRow); -if (rowData != null) { -if (numRetained >= numElementToSkip) { -rowData.setRowKind(row.getKind()); -result.add(rowData); -} -numRetained++; -} +} +} + +return result; +} + +private List applyAggregatesToRows(List rows) { +if (groupingSet != null && groupingSet.length > 0) { +// has group by, group firstly +Map> buffer = new HashMap<>(); +for (Row row : rows) { +Row bufferKey = new Row(groupingSet.length); +for (int i = 0; i < groupingSet.length; i++) { +bufferKey.setField(i, row.getField(groupingSet[i])); +} +if (buffer.containsKey(bufferKey)) { +buffer.get(bufferKey).add(row); +} else { +buffer.put(bufferKey, new ArrayList<>(Collections.singletonList(row))); } } +List result = new ArrayList<>(); +for (Map.Entry> entry : buffer.entrySet()) { +result.add(Row.join(entry.getKey(), accumulateRows(entry.getValue(; +} +return result; +} else { +return Collections.singletonList(accumulateRows(rows)); +} +} + +// can only apply sum/sum0/avg function for long type fields for testing +private Row accumulateRows(List rows) { +Row result = new Row(aggregateExpressions.size()); +for (int i = 0; i < aggregateExpressions.size(); i++) { +FunctionDefinition aggFunction = +aggregateExpressions.get(i).getFunctionDefinition(); +List arguments = aggregateExpressions.get(i).getArgs(); +if (aggFunction instanceof MinAggFunction) { +int argIndex =
[GitHub] [flink] godfreyhe commented on a change in pull request #17344: [FLINK-20895] [flink-table-planner] support local aggregate push down in table planner
godfreyhe commented on a change in pull request #17344: URL: https://github.com/apache/flink/pull/17344#discussion_r732920524 ## File path: flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/factories/TestValuesTableFactory.java ## @@ -867,27 +885,118 @@ public String asSummaryString() { allPartitions.isEmpty() ? Collections.singletonList(Collections.emptyMap()) : allPartitions; + int numRetained = 0; for (Map partition : keys) { -for (Row row : data.get(partition)) { +Collection rowsInPartition = data.get(partition); + +// handle predicates and projection +List rowsRetained = +rowsInPartition.stream() +.filter( +row -> + FilterUtils.isRetainedAfterApplyingFilterPredicates( +filterPredicates, getValueGetter(row))) +.map( +row -> { +Row projectedRow = projectRow(row); + projectedRow.setKind(row.getKind()); +return projectedRow; +}) +.collect(Collectors.toList()); + +// handle aggregates +if (!aggregateExpressions.isEmpty()) { +rowsRetained = applyAggregatesToRows(rowsRetained); +} + +// handle row data +for (Row row : rowsRetained) { +final RowData rowData = (RowData) converter.toInternal(row); +if (rowData != null) { +if (numRetained >= numElementToSkip) { +rowData.setRowKind(row.getKind()); +result.add(rowData); +} +numRetained++; +} + +// handle limit. No aggregates will be pushed down when there is a limit. 
if (result.size() >= limit) { return result; } -boolean isRetained = - FilterUtils.isRetainedAfterApplyingFilterPredicates( -filterPredicates, getValueGetter(row)); -if (isRetained) { -final Row projectedRow = projectRow(row); -final RowData rowData = (RowData) converter.toInternal(projectedRow); -if (rowData != null) { -if (numRetained >= numElementToSkip) { -rowData.setRowKind(row.getKind()); -result.add(rowData); -} -numRetained++; -} +} +} + +return result; +} + +private List applyAggregatesToRows(List rows) { +if (groupingSet != null && groupingSet.length > 0) { +// has group by, group firstly +Map> buffer = new HashMap<>(); +for (Row row : rows) { +Row bufferKey = new Row(groupingSet.length); +for (int i = 0; i < groupingSet.length; i++) { +bufferKey.setField(i, row.getField(groupingSet[i])); +} +if (buffer.containsKey(bufferKey)) { +buffer.get(bufferKey).add(row); +} else { +buffer.put(bufferKey, new ArrayList<>(Collections.singletonList(row))); } } +List result = new ArrayList<>(); +for (Map.Entry> entry : buffer.entrySet()) { +result.add(Row.join(entry.getKey(), accumulateRows(entry.getValue(; +} +return result; +} else { +return Collections.singletonList(accumulateRows(rows)); +} +} + +// can only apply sum/sum0/avg function for long type fields for testing +private Row accumulateRows(List rows) { +Row result = new Row(aggregateExpressions.size()); +for (int i = 0; i < aggregateExpressions.size(); i++) { +FunctionDefinition aggFunction = +aggregateExpressions.get(i).getFunctionDefinition(); +List arguments = aggregateExpressions.get(i).getArgs(); +if (aggFunction instanceof MinAggFunction) { +int argIndex =
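The quoted hunk above is hard to read because HTML stripping has eaten the Java generics, so here is a hedged, language-agnostic sketch of the grouping step that `applyAggregatesToRows` implements (a Python stand-in, not the actual Flink test code; `apply_aggregates` and its tuple-based rows are illustrative names):

```python
# Sketch of the group-then-accumulate logic: group rows by the grouping-set
# field indices, then fold each group with the pushed-down aggregate function
# and join the group key with the accumulated values.
from collections import defaultdict

def apply_aggregates(rows, grouping_set, accumulate):
    """rows: list of tuples; grouping_set: list of field indices;
    accumulate: folds a list of rows into a tuple of aggregate values."""
    if grouping_set:
        buffer = defaultdict(list)
        for row in rows:
            key = tuple(row[i] for i in grouping_set)
            buffer[key].append(row)
        # join each group key with its accumulated aggregate values
        return [key + accumulate(group) for key, group in buffer.items()]
    # no GROUP BY: a single global aggregate over all rows
    return [accumulate(rows)]

# e.g. SUM over field 1, grouped by field 0
rows = [("a", 1), ("a", 2), ("b", 3)]
result = apply_aggregates(rows, [0], lambda g: (sum(r[1] for r in g),))
# result contains ("a", 3) and ("b", 3)
```

This mirrors the reviewed Java: a `HashMap` keyed by the grouping fields, with `Row.join(key, accumulateRows(group))` producing the output rows.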
[jira] [Created] (FLINK-24605) org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partitions
Abhijit Talukdar created FLINK-24605: Summary: org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partitions Key: FLINK-24605 URL: https://issues.apache.org/jira/browse/FLINK-24605 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.14.0 Reporter: Abhijit Talukdar Getting the issue below when using 'scan.startup.mode' = 'group-offsets'. WITH ( 'connector' = 'kafka', 'topic' = 'ss7gsm-signaling-event', 'properties.bootstrap.servers' = '**:9093', 'properties.group.id' = 'ss7gsm-signaling-event-T5', 'value.format' = 'avro-confluent', 'value.avro-confluent.schema-registry.url' = 'https://***:9099', {color:#ff8b00}'scan.startup.mode' = 'group-offsets',{color} {color:#ff8b00} 'properties.auto.offset.reset' = 'earliest',{color} 'properties.security.protocol'= 'SASL_SSL', 'properties.ssl.truststore.location'= '/*/*/ca-certs.jks', 'properties.ssl.truststore.password'= '*', 'properties.sasl.kerberos.service.name'= 'kafka' ) 'ss7gsm-signaling-event-T5' is a new group id. If the group id is present in ZK, it works; otherwise the exception below is thrown. The 'properties.auto.offset.reset' property is ignored. 
2021-10-20 22:18:28,267 INFO org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.ConsumerConfig [] - ConsumerConfig values: 2021-10-20 22:18:28,267 INFO org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.ConsumerConfig [] - ConsumerConfig values: allow.auto.create.topics = false auto.commit.interval.ms = 5000 {color:#FF} +*auto.offset.reset = none*+{color} bootstrap.servers = [.xxx.com:9093] Exception: 2021-10-20 22:18:28,620 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: KafkaSource-hiveSignaling.signaling_stg.ss7gsm_signaling_event_flink_k -> Sink: Collect table sink (1/1) (89b175333242fab8914271ad7638ba92) switched from INITIALIZING to RUNNING.2021-10-20 22:18:28,620 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: KafkaSource-hiveSignaling.signaling_stg.ss7gsm_signaling_event_flink_k -> Sink: Collect table sink (1/1) (89b175333242fab8914271ad7638ba92) switched from INITIALIZING to RUNNING.2021-10-20 22:18:28,621 INFO org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator [] - Assigning splits to readers \{0=[[Partition: ss7gsm-signaling-event-2, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-8, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-7, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-9, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-5, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-6, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-0, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-4, StartingOffset: -3, StoppingOffset: -9223372036854775808], [Partition: ss7gsm-signaling-event-1, StartingOffset: -3, StoppingOffset: -9223372036854775808], 
[Partition: ss7gsm-signaling-event-3, StartingOffset: -3, StoppingOffset: -9223372036854775808]]}2021-10-20 22:18:28,716 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: KafkaSource-hiveSignaling.signaling_stg.ss7gsm_signaling_event_flink_k -> Sink: Collect table sink (1/1) (89b175333242fab8914271ad7638ba92) switched from RUNNING to FAILED on xx.xxx.xxx.xxx:42075-d80607 @ xx.xxx.com (dataPort=34120).java.lang.RuntimeException: One or more fetchers have encountered exception at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:225) ~[flink-table_2.11-1.14.0.jar:1.14.0] at org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169) ~[flink-table_2.11-1.14.0.jar:1.14.0] at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:130) ~[flink-table_2.11-1.14.0.jar:1.14.0] at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:342) ~[flink-dist_2.11-1.14.0.jar:1.14.0] at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68) ~[flink-dist_2.11-1.14.0.jar:1.14.0] at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist_2.11-1.14.0.jar:1.14.0] at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496) ~[flink-dist_2.11-1.14.0.jar:1.14.0] at
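As context for the report above: in Flink 1.14 the 'group-offsets' startup mode hard-codes auto.offset.reset=none (visible in the ConsumerConfig log), which is why 'properties.auto.offset.reset' appears to be ignored. A hedged workaround sketch for the first run of a new group id (table name and schema elided, not from the report):

```sql
-- Sketch only: start the first run from the earliest offsets, then switch
-- back to 'group-offsets' once the group has committed offsets to Kafka.
CREATE TABLE signaling_event ( ... ) WITH (
  'connector' = 'kafka',
  'topic' = 'ss7gsm-signaling-event',
  'properties.group.id' = 'ss7gsm-signaling-event-T5',
  'scan.startup.mode' = 'earliest-offset'
);
```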
[GitHub] [flink] flinkbot edited a comment on pull request #17352: [FLINK-10230][table] Support 'SHOW CREATE VIEW' syntax to print the query of a view
flinkbot edited a comment on pull request #17352: URL: https://github.com/apache/flink/pull/17352#issuecomment-926627700 ## CI report: * d45a1f76630a03ec6a0efd3d38044ab925ab7533 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25288) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-23062) FLIP-129: Register sources/sinks in Table API
[ https://issues.apache.org/jira/browse/FLINK-23062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431431#comment-17431431 ] Martijn Visser commented on FLINK-23062: [~airblader] Should this ticket be closed? > FLIP-129: Register sources/sinks in Table API > - > > Key: FLINK-23062 > URL: https://issues.apache.org/jira/browse/FLINK-23062 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: Ingo Bürk >Assignee: Ingo Bürk >Priority: Major > > [https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #17440: [FLINK-24468][runtime] Wait for the channel activation before creating partition request client
flinkbot edited a comment on pull request #17440: URL: https://github.com/apache/flink/pull/17440#issuecomment-938816662 ## CI report: * 4be189160a729c2814f3e2400300632d97e76470 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25286) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * a177b8b7a724a0de1c4f76326c55753eabcc417b Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25285) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-20578) Cannot create empty array using ARRAY[]
[ https://issues.apache.org/jira/browse/FLINK-20578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431414#comment-17431414 ] Saad Ur Rahman commented on FLINK-20578: Hello, I would like to try and resolve this issue. I am trying to get familiar with the codebase and this would be an excellent entry point. > Cannot create empty array using ARRAY[] > --- > > Key: FLINK-20578 > URL: https://issues.apache.org/jira/browse/FLINK-20578 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.11.2 >Reporter: Fabian Hueske >Priority: Major > Labels: starter > Fix For: 1.15.0 > > > Calling the ARRAY function without an element (`ARRAY[]`) results in an error > message. > Is that the expected behavior? > How can users create empty arrays? -- This message was sent by Atlassian Jira (v8.3.4#803005)
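For context, the reported behavior can be sketched as SQL (the exact error text varies by version):

```sql
-- Fails: the element type of an empty constructor cannot be inferred
SELECT ARRAY[];
-- Works: the element type is derived from the elements
SELECT ARRAY[1, 2, 3];
```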
[GitHub] [flink] flinkbot edited a comment on pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
flinkbot edited a comment on pull request #17522: URL: https://github.com/apache/flink/pull/17522#issuecomment-94673 ## CI report: * d16cc0f9e2290406d788453db9f771fe6cbc8637 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25283) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-24604) Failing tests for casting decimals to boolean
[ https://issues.apache.org/jira/browse/FLINK-24604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther closed FLINK-24604. Fix Version/s: 1.15.0 Resolution: Fixed Fixed in master: 474b241eb1257f978a83aac934cf6601570d05aa > Failing tests for casting decimals to boolean > - > > Key: FLINK-24604 > URL: https://issues.apache.org/jira/browse/FLINK-24604 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Table SQL / Runtime >Affects Versions: 1.15.0 >Reporter: Marios Trivyzas >Assignee: Marios Trivyzas >Priority: Blocker > Labels: pull-request-available > Fix For: 1.15.0 > > > Currently some tests in *CalcITCase.scala, > SimplifyJoinConditionRuleTest.scala* and *FlinkRexUtilTest.scala* are failing > because of the merge of [https://github.com/apache/flink/pull/17311] and > [https://github.com/apache/flink/pull/17439], where the former adds some > tests with decimal-to-boolean casts and the latter drops this support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] twalthr closed pull request #17530: [FLINK-24604] Remove tests which use decimal -> boolean cast
twalthr closed pull request #17530: URL: https://github.com/apache/flink/pull/17530 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17352: [FLINK-10230][table] Support 'SHOW CREATE VIEW' syntax to print the query of a view
flinkbot edited a comment on pull request #17352: URL: https://github.com/apache/flink/pull/17352#issuecomment-926627700 ## CI report: * 3d03869b64cad0b1051bd0c07427a5a158320668 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=24707) * d45a1f76630a03ec6a0efd3d38044ab925ab7533 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25288) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-24331) PartiallyFinishedSourcesITCase fails with "No downstream received 0 from xxx;"
[ https://issues.apache.org/jira/browse/FLINK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kalashnikov reassigned FLINK-24331: - Assignee: Anton Kalashnikov > PartiallyFinishedSourcesITCase fails with "No downstream received 0 from xxx;" > -- > > Key: FLINK-24331 > URL: https://issues.apache.org/jira/browse/FLINK-24331 > Project: Flink > Issue Type: Bug > Components: API / DataStream, Runtime / Task >Affects Versions: 1.14.0, 1.15.0 >Reporter: Xintong Song >Assignee: Anton Kalashnikov >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > Attachments: > logs-ci_build-test_ci_build_finegrained_resource_management-1633890853.zip > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=24287=logs=4d4a0d10-fca2-5507-8eed-c07f0bdf4887=7b25afdf-cc6c-566f-5459-359dc2585798=10945 > {code} > Sep 18 02:21:08 [ERROR] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, > Time elapsed: 224.44 s <<< FAILURE! - in > org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase > Sep 18 02:21:08 [ERROR] test[complex graph SINGLE_SUBTASK, failover: true, > strategy: region] Time elapsed: 28.807 s <<< FAILURE! 
> Sep 18 02:21:08 java.lang.AssertionError: No downstream received 0 from > 0003[0]; received: {0=OperatorFinished > 0007/0, 1=OperatorFinished > 0007/1, 2=OperatorFinished > 0007/2, 3=OperatorFinished > 0007/3} > Sep 18 02:21:08 at org.junit.Assert.fail(Assert.java:89) > Sep 18 02:21:08 at org.junit.Assert.assertTrue(Assert.java:42) > Sep 18 02:21:08 at > org.apache.flink.runtime.operators.lifecycle.validation.TestJobDataFlowValidator.lambda$checkDataFlow$1(TestJobDataFlowValidator.java:96) > Sep 18 02:21:08 at java.util.HashMap.forEach(HashMap.java:1289) > Sep 18 02:21:08 at > org.apache.flink.runtime.operators.lifecycle.validation.TestJobDataFlowValidator.checkDataFlow(TestJobDataFlowValidator.java:94) > Sep 18 02:21:08 at > org.apache.flink.runtime.operators.lifecycle.validation.TestJobDataFlowValidator.checkDataFlow(TestJobDataFlowValidator.java:62) > Sep 18 02:21:08 at > org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:139) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #17352: [FLINK-10230][table] Support 'SHOW CREATE VIEW' syntax to print the query of a view
flinkbot edited a comment on pull request #17352: URL: https://github.com/apache/flink/pull/17352#issuecomment-926627700 ## CI report: * 3d03869b64cad0b1051bd0c07427a5a158320668 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=24707) * d45a1f76630a03ec6a0efd3d38044ab925ab7533 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17529: [FLINK-24563][table-planner] Fix NullPointerException when comparing timestamp_ltz with random string
flinkbot edited a comment on pull request #17529: URL: https://github.com/apache/flink/pull/17529#issuecomment-947630768 ## CI report: * f1a2a94b889d9645505ec351d434c0c9b7e095f8 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25280) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
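The fix referenced above concerns a NullPointerException when a TIMESTAMP_LTZ value is compared with a string that does not parse as a timestamp. As a general sketch (not the actual planner code), the usual SQL behavior such fixes aim for is three-valued: an unparsable operand yields NULL rather than a runtime exception:

```python
from datetime import datetime
from typing import Optional

def parse_timestamp(s: str) -> Optional[datetime]:
    """Return a datetime, or None (standing in for SQL NULL) when the
    string does not parse as a timestamp."""
    try:
        return datetime.fromisoformat(s)
    except ValueError:
        return None

def timestamp_equals(ts: datetime, s: str) -> Optional[bool]:
    # Three-valued comparison: a NULL operand yields a NULL result,
    # never a crash.
    other = parse_timestamp(s)
    if other is None:
        return None
    return ts == other

ts = datetime(2021, 10, 20, 12, 0, 0)
print(timestamp_equals(ts, "2021-10-20T12:00:00"))  # True
print(timestamp_equals(ts, "random string"))        # None, not an exception
```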
[GitHub] [flink] JingGe edited a comment on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory
JingGe edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-947822402 > @slinkydeveloper There are four reasons why I did not choose `StreamFormat`. > > 1. The biggest concern is that `StreamFormatAdapter.Reader#readBatch` stores all results in a batch in heap memory. This is bad because avro is a format which supports compression. You'll never know how much data will be stuffed into heap memory after inflation. > 2. `StreamFormat`, from its concept, is for a stream of bytes where each record is shipped independently. Avro is a file format which organizes the records in its own blocks, so they do not match from the concept. I would say csv format will be more suitable for `StreamFormat`. > 3. `StreamFormatAdapter` cuts batches by counting number of bytes read from the file stream. If the sync size of avro is 2MB it will read 2M bytes from file in one go and produce a batch containing no records. However this only happens at the beginning of reading a file so this might be OK. > 4. Both orc and parquet formats have implemented `BulkFormat` instead of `StreamFormat`, so why not `StreamFormat` for them? The consideration behind your solution was great! Thanks for your contribution. I will try to share what I understood with you. Let's discuss and understand the design together. Correct me if I am wrong. For point 1, the uncompressed data size should be controlled by `StreamFormat.FETCH_IO_SIZE`. It might not be very precise in controlling the heap size, since the last read might overfulfil the quota a little bit, but it is acceptable. Speaking of compression, StreamFormatAdapter has built-in compressor support. Does this PR implementation have the same support too? For point 2, StreamFormat defines a way to read each record. The granularity of each record could be controlled by the generic type `StreamFormat.Reader`. There is plenty of room to play with if a single avro record is too small in this case. 
For point 4, it is a good question; we should dive deep into the code. Might it make sense to refactor the orc and parquet formats to StreamFormat too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
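The byte-budget batching discussed in points 1 and 3 can be sketched as follows: a batch is closed once the configured fetch size is reached, so a batch can overshoot the budget by up to one block, and a single large block (like a 2 MB avro sync block) is still read in one go. This is a hypothetical sketch with made-up sizes, not the `StreamFormatAdapter` implementation:

```python
def cut_batches(block_sizes, fetch_io_size):
    """Group consecutive blocks into batches, closing a batch once the
    byte budget is met. A single block larger than the budget fills
    (and overshoots) one batch on its own."""
    batches, current, current_bytes = [], [], 0
    for size in block_sizes:
        current.append(size)
        current_bytes += size
        if current_bytes >= fetch_io_size:
            batches.append(current)
            current, current_bytes = [], 0
    if current:  # flush the partially filled last batch
        batches.append(current)
    return batches

# 1 MB fetch budget; the first block is a 2 MB sync block read in one go.
print(cut_batches([2_000_000, 500_000, 500_000, 300_000], 1_000_000))
# -> [[2000000], [500000, 500000], [300000]]
```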
[GitHub] [flink] JingGe commented on pull request #17520: [FLINK-24565][avro] Port avro file format factory to BulkReaderFormatFactory
JingGe commented on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-947822402 > @slinkydeveloper There are four reasons why I did not choose `StreamFormat`. > > 1. The biggest concern is that `StreamFormatAdapter.Reader#readBatch` stores all results in a batch in heap memory. This is bad because avro is a format which supports compression. You'll never know how much data will be stuffed into heap memory after inflation. > 2. `StreamFormat`, from its concept, is for a stream of bytes where each record is shipped independently. Avro is a file format which organizes the records in its own blocks, so they do not match from the concept. I would say csv format will be more suitable for `StreamFormat`. > 3. `StreamFormatAdapter` cuts batches by counting number of bytes read from the file stream. If the sync size of avro is 2MB it will read 2M bytes from file in one go and produce a batch containing no records. However this only happens at the beginning of reading a file so this might be OK. > 4. Both orc and parquet formats have implemented `BulkFormat` instead of `StreamFormat`, so why not `StreamFormat` for them? The consideration behind your solution was great! Thanks for your contribution. Let's discuss and understand the design together. Correct me if I am wrong. For point 1, the uncompressed data size should be controlled by `StreamFormat.FETCH_IO_SIZE`. It might not be very precise in controlling the heap size, since the last read might overfulfil the quota a little bit, but it is acceptable. Speaking of compression, StreamFormatAdapter has built-in compressor support. Does this PR implementation have the same support too? For point 2, StreamFormat defines a way to read each record. The granularity of each record could be controlled by the generic type `StreamFormat.Reader`. There is plenty of room to play with if a single avro record is too small in this case. For point 4, it is a good question; we should dive deep into the code. 
Might it make sense to refactor the orc and parquet formats to StreamFormat too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17530: [FLINK-24604] Remove tests which use decimal -> boolean cast
flinkbot edited a comment on pull request #17530: URL: https://github.com/apache/flink/pull/17530#issuecomment-947758131 ## CI report: * 44a40ae5521cc5684ef7a809f6f772c25cb3d779 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25284) * dcfd55678d1aa04a9223d5b7c60e80a4acaa8643 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25287) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] godfreyhe commented on a change in pull request #17344: [FLINK-20895] [flink-table-planner] support local aggregate push down in table planner
godfreyhe commented on a change in pull request #17344: URL: https://github.com/apache/flink/pull/17344#discussion_r732638497 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/physical/batch/PushLocalHashAggIntoScanRule.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.planner.plan.rules.physical.batch; + +import org.apache.flink.table.api.config.OptimizerConfigOptions; +import org.apache.flink.table.connector.source.abilities.SupportsAggregatePushDown; +import org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalExchange; +import org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalLocalHashAggregate; +import org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalTableSourceScan; +import org.apache.flink.table.planner.plan.schema.TableSourceTable; + +import org.apache.calcite.plan.RelOptRuleCall; + +/** + * Planner rule that tries to push a local hash or sort aggregate which without sort into a {@link Review comment: the comment should be updated, remove "or sort aggregate" ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/physical/batch/PushLocalSortAggWithoutSortIntoScanRule.java ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.planner.plan.rules.physical.batch; + +import org.apache.flink.table.api.config.OptimizerConfigOptions; +import org.apache.flink.table.connector.source.abilities.SupportsAggregatePushDown; +import org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalExchange; +import org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalLocalSortAggregate; +import org.apache.flink.table.planner.plan.nodes.physical.batch.BatchPhysicalTableSourceScan; +import org.apache.flink.table.planner.plan.schema.TableSourceTable; + +import org.apache.calcite.plan.RelOptRuleCall; + +/** + * Planner rule that tries to push a local hash or sort aggregate which without sort into a {@link + * BatchPhysicalTableSourceScan} whose table is a {@link TableSourceTable} with a source supporting + * {@link SupportsAggregatePushDown}. The {@link + * OptimizerConfigOptions#TABLE_OPTIMIZER_SOURCE_AGGREGATE_PUSHDOWN_ENABLED} need to be true. + * + * Suppose we have the original physical plan: + * + * {@code + * BatchPhysicalHashAggregate (global) Review comment: BatchPhysicalSortAggregate ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/physical/batch/PushLocalAggIntoScanRuleBase.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
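The rules reviewed above push the local half of a two-phase aggregate into the table scan. The invariant they rely on — partial aggregates computed per split, then merged by the global aggregate, must equal a direct aggregate over all rows — can be sketched outside Flink with a count/sum aggregate (hypothetical helper names, not the planner rule code):

```python
def local_agg(split):
    # Partial aggregate computed at the source side: (count, sum).
    return (len(split), sum(split))

def global_agg(partials):
    # Merge phase: combine the partial (count, sum) pairs.
    total_count = sum(c for c, _ in partials)
    total_sum = sum(s for _, s in partials)
    return (total_count, total_sum)

splits = [[1, 2, 3], [4, 5], [6]]
partials = [local_agg(s) for s in splits]
all_rows = [x for s in splits for x in s]
# Two-phase result equals the single-phase aggregate over all rows.
print(global_agg(partials))              # (6, 21)
print((len(all_rows), sum(all_rows)))    # (6, 21)
```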
[GitHub] [flink] flinkbot edited a comment on pull request #17530: [FLINK-24604] Remove tests which use decimal -> boolean cast
flinkbot edited a comment on pull request #17530: URL: https://github.com/apache/flink/pull/17530#issuecomment-947758131 ## CI report: * 44a40ae5521cc5684ef7a809f6f772c25cb3d779 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25284) * dcfd55678d1aa04a9223d5b7c60e80a4acaa8643 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
flinkbot edited a comment on pull request #17522: URL: https://github.com/apache/flink/pull/17522#issuecomment-94673 ## CI report: * 2cacbb5c9e9c01e5d047c4a65116c2df43b8ebbf Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25281) * d16cc0f9e2290406d788453db9f771fe6cbc8637 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25283) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17530: [FLINK-24604] Remove tests which use decimal -> boolean cast
flinkbot edited a comment on pull request #17530: URL: https://github.com/apache/flink/pull/17530#issuecomment-947758131 ## CI report: * 44a40ae5521cc5684ef7a809f6f772c25cb3d779 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25284) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17528: [FLINK-24563][table-planner] Fix NullPointerException when comparing timestamp_ltz with random string
flinkbot edited a comment on pull request #17528: URL: https://github.com/apache/flink/pull/17528#issuecomment-947612146 ## CI report: * 61f8472269a74ee2dc391272a9b2a7b4244b44f3 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25278) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17440: [FLINK-24468][runtime] Wait for the channel activation before creating partition request client
flinkbot edited a comment on pull request #17440: URL: https://github.com/apache/flink/pull/17440#issuecomment-938816662 ## CI report: * 658994fafd3947a2a9a49fa91d3c7e4fce3e52b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25227) * 4be189160a729c2814f3e2400300632d97e76470 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25286) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17440: [FLINK-24468][runtime] Wait for the channel activation before creating partition request client
flinkbot edited a comment on pull request #17440: URL: https://github.com/apache/flink/pull/17440#issuecomment-938816662 ## CI report: * 658994fafd3947a2a9a49fa91d3c7e4fce3e52b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25227) * 4be189160a729c2814f3e2400300632d97e76470 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17503: [FLINK-24334][Deployment/Kubernetes] Configuration 'kubernetes.flink.log.dir' compatible
flinkbot edited a comment on pull request #17503: URL: https://github.com/apache/flink/pull/17503#issuecomment-945138587 ## CI report: * a1619682d1eeaefeaec347c646e12f2b4ed7feb9 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25277) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * fc128a1c856f13f484bd2582b7d2d7c6958bb8ec Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25282) * a177b8b7a724a0de1c4f76326c55753eabcc417b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25285) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource
JingGe commented on a change in pull request #17501: URL: https://github.com/apache/flink/pull/17501#discussion_r732881890 ## File path: flink-formats/flink-avro/pom.xml ## @@ -26,7 +26,7 @@ under the License. org.apache.flink flink-formats 1.15-SNAPSHOT - .. + ../pom.xml Review comment: Furthermore, I am not aware that BulkFormat was "specifically" designed to support orc and parquet. The javadoc tells us that "The BulkFormat reads and decodes **batches** of records at a time. **Examples of bulk formats** are formats like ORC or Parquet." -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * fc128a1c856f13f484bd2582b7d2d7c6958bb8ec Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25282) * a177b8b7a724a0de1c4f76326c55753eabcc417b UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource
JingGe commented on a change in pull request #17501: URL: https://github.com/apache/flink/pull/17501#discussion_r732878761 ## File path: flink-formats/flink-avro/pom.xml ## @@ -26,7 +26,7 @@ under the License. org.apache.flink flink-formats 1.15-SNAPSHOT - .. + ../pom.xml Review comment: There is plenty of logic implemented in the StreamFormatAdapter; as I mentioned in the "open questions" section, why should I do my own implementation again from BulkFormat instead of reusing it? The design idea is to let BulkFormat handle batch and let StreamFormat/FileRecordFormat handle streaming, afaik. Your question actually leads to a fundamental question: why do we need StreamFormat/FileRecordFormat at all if we can implement everything from BulkFormat, which supports both batch and streaming (quoting your words)? I didn't see any reference for this conclusion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #17530: [FLINK-24604] Remove tests which use decimal -> boolean cast
flinkbot commented on pull request #17530: URL: https://github.com/apache/flink/pull/17530#issuecomment-947759705 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 44a40ae5521cc5684ef7a809f6f772c25cb3d779 (Wed Oct 20 15:06:18 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] matriv commented on a change in pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
matriv commented on a change in pull request #17522: URL: https://github.com/apache/flink/pull/17522#discussion_r732876782 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/data/casting/CastRuleProvider.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.data.casting; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.table.data.casting.rules.IdentityCastRule$; +import org.apache.flink.table.data.casting.rules.TimestampToStringCastRule$; +import org.apache.flink.table.data.casting.rules.UpcastToBigIntCastRule$; +import org.apache.flink.table.types.DataType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.LogicalTypeFamily; +import org.apache.flink.table.types.logical.LogicalTypeRoot; + +import javax.annotation.Nullable; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** This class resolves {@link CastRule} starting from the input and the target type. */ Review comment: Thx!! -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #17530: [FLINK-24604] Remove tests which use decimal -> boolean cast
flinkbot commented on pull request #17530: URL: https://github.com/apache/flink/pull/17530#issuecomment-947758131 ## CI report: * 44a40ae5521cc5684ef7a809f6f772c25cb3d779 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-24604) Failing tests for casting decimals to boolean
[ https://issues.apache.org/jira/browse/FLINK-24604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-24604: --- Labels: pull-request-available (was: ) > Failing tests for casting decimals to boolean > - > > Key: FLINK-24604 > URL: https://issues.apache.org/jira/browse/FLINK-24604 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Table SQL / Runtime >Affects Versions: 1.15.0 >Reporter: Marios Trivyzas >Assignee: Marios Trivyzas >Priority: Blocker > Labels: pull-request-available > > Currently some tests in *CalcITCase.scala, > SimplifyJoinConditionRuleTest.scala* and *FlinkRexUtilTest.scala* are failing > because of the merge of [https://github.com/apache/flink/pull/17311] and > [https://github.com/apache/flink/pull/17439] where the first one adds some > tests with decimal to boolean cast and the latter drops this support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] matriv opened a new pull request #17530: [FLINK-24604] Remove tests which use decimal -> boolean cast
matriv opened a new pull request #17530: URL: https://github.com/apache/flink/pull/17530 The decimal->boolean cast has been dropped, but another PR added some tests using this cast. This change removes those casts and keeps only the ones that cast integer numerics to boolean. Follows: #32f7cc9e34be67eaf1b746697f2fabefcd5f46c5 Follows: #fc92a8830d07416d37be0ed4c5fe472ac0531c25 ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-24604) Failing tests for casting decimals to boolean
Marios Trivyzas created FLINK-24604: --- Summary: Failing tests for casting decimals to boolean Key: FLINK-24604 URL: https://issues.apache.org/jira/browse/FLINK-24604 Project: Flink Issue Type: Bug Components: Table SQL / Planner, Table SQL / Runtime Affects Versions: 1.15.0 Reporter: Marios Trivyzas Assignee: Marios Trivyzas Currently some tests in *CalcITCase.scala, SimplifyJoinConditionRuleTest.scala* and *FlinkRexUtilTest.scala* are failing because of the merge of [https://github.com/apache/flink/pull/17311] and [https://github.com/apache/flink/pull/17439] where the first one adds some tests with decimal to boolean cast and the latter drops this support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource
JingGe commented on a change in pull request #17501: URL: https://github.com/apache/flink/pull/17501#discussion_r732865726 ## File path: flink-formats/flink-avro/pom.xml ## @@ -26,7 +26,7 @@ under the License. org.apache.flink flink-formats 1.15-SNAPSHOT - .. + ../pom.xml Review comment: Where is the reference that tells us BulkFormat supports streaming? Afaik, all javadocs about BulkFormat only talk about batch; please refer to the javadoc of BulkFormat itself and the javadoc of FileSource. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17472: [FLINK-24486][rest] Make async result store duration configurable
flinkbot edited a comment on pull request #17472: URL: https://github.com/apache/flink/pull/17472#issuecomment-943076656 ## CI report: * f7c2d004a9d29fb2c3445445ed32ba9e9bb74e78 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25274) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18568) Add Support for Azure Data Lake Store Gen 2 in Streaming File Sink
[ https://issues.apache.org/jira/browse/FLINK-18568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431292#comment-17431292 ] Martijn Visser commented on FLINK-18568: [~psrinivasulu] Hi! I wanted to check if there is any update from your end regarding this ticket? > Add Support for Azure Data Lake Store Gen 2 in Streaming File Sink > -- > > Key: FLINK-18568 > URL: https://issues.apache.org/jira/browse/FLINK-18568 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Common >Affects Versions: 1.12.0 >Reporter: Israel Ekpo >Assignee: Srinivasulu Punuru >Priority: Minor > Labels: auto-deprioritized-major, stale-assigned > Fix For: 1.15.0 > > > The objective of this improvement is to add support for Azure Data Lake Store > Gen 2 (ADLS Gen2) [2] as one of the supported filesystems for the Streaming > File Sink [1] > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html > [2] https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
flinkbot edited a comment on pull request #17522: URL: https://github.com/apache/flink/pull/17522#issuecomment-94673 ## CI report: * c5310e9d376b11af0bdbb9d3f420129a5cec7975 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25266) * 2cacbb5c9e9c01e5d047c4a65116c2df43b8ebbf Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25281) * d16cc0f9e2290406d788453db9f771fe6cbc8637 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25283) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
flinkbot edited a comment on pull request #17522: URL: https://github.com/apache/flink/pull/17522#issuecomment-94673 ## CI report: * c5310e9d376b11af0bdbb9d3f420129a5cec7975 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25266) * 2cacbb5c9e9c01e5d047c4a65116c2df43b8ebbf Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25281) * d16cc0f9e2290406d788453db9f771fe6cbc8637 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] slinkydeveloper commented on a change in pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
slinkydeveloper commented on a change in pull request #17522: URL: https://github.com/apache/flink/pull/17522#discussion_r732842077 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/data/casting/CastRuleProvider.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.data.casting; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.table.data.casting.rules.IdentityCastRule$; +import org.apache.flink.table.data.casting.rules.TimestampToStringCastRule$; +import org.apache.flink.table.data.casting.rules.UpcastToBigIntCastRule$; +import org.apache.flink.table.types.DataType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.LogicalTypeFamily; +import org.apache.flink.table.types.logical.LogicalTypeRoot; + +import javax.annotation.Nullable; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** This class resolves {@link CastRule} starting from the input and the target type. */ Review comment: How about now? -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
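The javadoc discussed in the review above describes a class that "resolves {@link CastRule} starting from the input and the target type". The core idea can be sketched as a lookup table keyed by (input, target) type roots, with an identity fallback when both types are equal. This is a simplified illustration under assumed names, not the actual CastRuleProvider, which also matches on LogicalTypeFamily:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of a cast-rule provider: each rule registers the
// (input, target) type-root pair it handles, and resolution is a map lookup.
// All names here are hypothetical and exist only for illustration.
public class CastRuleResolverSketch {

    enum TypeRoot { INTEGER, BIGINT, TIMESTAMP, VARCHAR }

    interface CastRule {
        String name();
    }

    private static final Map<List<TypeRoot>, CastRule> RULES = new HashMap<>();
    static {
        RULES.put(Arrays.asList(TypeRoot.TIMESTAMP, TypeRoot.VARCHAR), () -> "TimestampToString");
        RULES.put(Arrays.asList(TypeRoot.INTEGER, TypeRoot.BIGINT), () -> "UpcastToBigInt");
    }

    // Resolves a rule from the input and the target type; null if unsupported.
    static CastRule resolve(TypeRoot input, TypeRoot target) {
        if (input == target) {
            return () -> "Identity"; // no conversion needed for equal types
        }
        return RULES.get(Arrays.asList(input, target));
    }

    public static void main(String[] args) {
        System.out.println(resolve(TypeRoot.TIMESTAMP, TypeRoot.VARCHAR).name());
        System.out.println(resolve(TypeRoot.INTEGER, TypeRoot.INTEGER).name());
        System.out.println(resolve(TypeRoot.VARCHAR, TypeRoot.BIGINT) == null);
    }
}
```

The map-based lookup keeps individual rules independent of each other, which is the organizational benefit the PR title ("reorganize casting code") aims for.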
[GitHub] [flink] twalthr closed pull request #17405: [FLINK-21456][table] Copy DateTimeUtils from avatica-core and introduce StringUtils
twalthr closed pull request #17405: URL: https://github.com/apache/flink/pull/17405 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17464: [FLINK-24413] Casting to a CHAR() and VARCHAR() doesn't trim the stri…
flinkbot edited a comment on pull request #17464: URL: https://github.com/apache/flink/pull/17464#issuecomment-942309521 ## CI report: * 172ccce01b7019a637a5730c0cd76422cac94f89 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25273) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * fc128a1c856f13f484bd2582b7d2d7c6958bb8ec Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25282) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17527: [docs] Add link from README.md to 'Building Flink from Source'
flinkbot edited a comment on pull request #17527: URL: https://github.com/apache/flink/pull/17527#issuecomment-947535485 ## CI report: * d65d5d3d68e46a5649d762c45a3b973602586607 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25270) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #16606: [FLINK-21357][runtime/statebackend]Periodic materialization for generalized incremental checkpoints
flinkbot edited a comment on pull request #16606: URL: https://github.com/apache/flink/pull/16606#issuecomment-887431748 ## CI report: * f09d671bd530b29cfc8ed31c1300179a97698576 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25272) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-24577) Support cast from BINARY/VARBINARY/BYTES to of RAW()
[ https://issues.apache.org/jira/browse/FLINK-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431252#comment-17431252 ] Marios Trivyzas commented on FLINK-24577: - Sorry, I didn't get it. What do you mean by position-based? Are you maybe mixing up *ROW* with *RAW*? > Support cast from BINARY/VARBINARY/BYTES to of RAW() > --- > > Key: FLINK-24577 > URL: https://issues.apache.org/jira/browse/FLINK-24577 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Reporter: Marios Trivyzas >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-24385) Add TRY_CAST function to be able to handle errors
[ https://issues.apache.org/jira/browse/FLINK-24385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marios Trivyzas reassigned FLINK-24385: --- Assignee: Marios Trivyzas > Add TRY_CAST function to be able to handle errors > - > > Key: FLINK-24385 > URL: https://issues.apache.org/jira/browse/FLINK-24385 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Marios Trivyzas >Assignee: Marios Trivyzas >Priority: Major > > Currently, *CAST* returns null when the requested conversion fails, > whereas normally in SQL it would just throw an error. Maybe it would be > better to change the implementation of *CAST* to throw errors on failed > conversions, and introduce *TRY_CAST*, which would return null in such cases. > Then, by simply wrapping a *TRY_CAST* expression with *COALESCE*, the > user can also use an alternative default value to be returned instead of > null, e.g.: > {{SELECT COALESCE(TRY_CAST(col1 AS INT), -1) FROM test}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
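The COALESCE(TRY_CAST(...)) pattern proposed in the ticket above can be illustrated with a minimal, self-contained Java sketch. This is not Flink's implementation; the helper names (tryCastToInt, coalesce) are hypothetical and exist only to demonstrate the null-on-failure semantics:

```java
// A minimal sketch of the proposed semantics (not Flink's implementation):
// TRY_CAST returns NULL on a failed conversion instead of throwing, and
// COALESCE substitutes a default for NULL.
public class TryCastSketch {

    // Mimics TRY_CAST(s AS INT): null on a failed conversion, no exception.
    static Integer tryCastToInt(String s) {
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Mimics COALESCE(value, fallback): first non-null argument wins.
    static int coalesce(Integer value, int fallback) {
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // COALESCE(TRY_CAST('42' AS INT), -1)
        System.out.println(coalesce(tryCastToInt("42"), -1));
        // COALESCE(TRY_CAST('not-a-number' AS INT), -1)
        System.out.println(coalesce(tryCastToInt("not-a-number"), -1));
    }
}
```

The sketch shows why the ticket pairs the two functions: TRY_CAST alone surfaces NULLs, while the COALESCE wrapper lets the user pick a deterministic default.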
[GitHub] [flink] flinkbot edited a comment on pull request #17508: [FLINK-24351][docs] Translate "JSON Function" pages into Chinese
flinkbot edited a comment on pull request #17508: URL: https://github.com/apache/flink/pull/17508#issuecomment-945469753 ## CI report: * 0bf509b5cfdb5f035d72539c07d7443aa55e181e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25275) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * 817c08dd17d68e1c8939a7f1eef5aa901e128080 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25279) * fc128a1c856f13f484bd2582b7d2d7c6958bb8ec Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25282) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * 817c08dd17d68e1c8939a7f1eef5aa901e128080 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25279) * fc128a1c856f13f484bd2582b7d2d7c6958bb8ec UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * 817c08dd17d68e1c8939a7f1eef5aa901e128080 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25279) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-24544) Failure when using Kafka connector in Table API with Avro and Confluent schema registry
[ https://issues.apache.org/jira/browse/FLINK-24544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431204#comment-17431204 ] Peter Schrott edited comment on FLINK-24544 at 10/20/21, 1:36 PM: -- The underlying problem with deserialization of records with enums from Kafka & schema registry lies in the initialization of {{GenericDatumReader}}: *Case Kafka & SR:* In {{AvroDeserializationSchema.java}} the {{GenericDatumReader}} is initialized with {{writerSchema = null}} and {{readerSchema = the schema gained from the table DDL}} -> When calling {{RegistryAvroDeserializationSchema.deserialize(.)}}, {{datumReader.setSchema()}} sets the attribute {{actual}} to the actual Avro schema, whereas {{expected}} is already set to {{readerSchema}} -> The inequality of {{actual}} and {{expected}} causes the exception on deserializing, as the types of {{actual}} and {{expected}} do not match --> The root of this is the initialization of {{DeserializationSchema}} in {{RegistryAvroFormatFactory.java}}, which uses {{rowType}} && {{ConfluentRegistryAvroDeserializationSchema.forGeneric(.)}} when creating the {{ConfluentRegistryAvroDeserializationSchema}} *Case FS:* In {{AvroInputFormat.java}} the {{GenericDatumReader}} is initialized with {{writerSchema = null}} and {{readerSchema = null}} -> In the initialization of {{DataFileStream}}, {{reader.getSchema(.)}} is called with the actual Avro schema, so both the {{expected}} and {{actual}} attributes of the {{GenericDatumReader}} are set to the passed value -> The Avro schema is taken from the file -> The equality of {{actual}} and {{expected}} means the serialized data can be read from the file > Failure when using Kafka connector in Table API with Avro and Confluent > schema registry > > > Key: FLINK-24544 > URL: https://issues.apache.org/jira/browse/FLINK-24544 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Formats (JSON, Avro, Parquet, ORC, > SequenceFile), Table SQL / Ecosystem >Affects Versions: 1.13.1 >Reporter: Francesco Guardiani >Priority: Major > Attachments: flink-deser-avro-enum.zip > > > A user reported in the [mailing > list|https://lists.apache.org/thread.html/re38a07f6121cc580737a20c11574719cfe554e58d99817f79db9bb4a%40%3Cuser.flink.apache.org%3E] > that Avro deserialization fails when using Kafka, Avro and Confluent Schema > Registry: > {code:java} > Caused by: java.io.IOException: Failed to deserialize Avro record. 
> at > org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:106) > at > org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:46) > > at > org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:82) > at > org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaDeserializationSchema.deserialize(DynamicKafkaDeserializationSchema.java:113) > at > org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:179) > > at >
[GitHub] [flink] zentol commented on a change in pull request #17493: [FLINK-24019][build][dist] Package Scala APIs separately
zentol commented on a change in pull request #17493: URL: https://github.com/apache/flink/pull/17493#discussion_r732774163 ## File path: flink-dist/src/main/assemblies/bin.xml ## @@ -69,6 +69,14 @@ under the License. 0644 + + + ../flink-dist-scala/target/flink-dist-scala_${scala.binary.version}-${project.version}.jar + lib/ + flink-scala_${scala.binary.version}-${project.version}.jar Review comment: No, because that would break all scripts that use globbing patterns like `flink-dist*`. (tried that initially...) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zentol commented on pull request #17493: [FLINK-24019][build][dist] Package Scala APIs separately
zentol commented on pull request #17493: URL: https://github.com/apache/flink/pull/17493#issuecomment-947674061 > [flink-dist does not bundle any scala-reliant classes so let's remove the suffix from the _artifact_ but keep it on the artifactId] That's something we could consider, yes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * 817c08dd17d68e1c8939a7f1eef5aa901e128080 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25279) * fc128a1c856f13f484bd2582b7d2d7c6958bb8ec UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] matriv commented on a change in pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
matriv commented on a change in pull request #17522: URL: https://github.com/apache/flink/pull/17522#discussion_r732782351 ## File path: flink-table/flink-table-planner/src/test/java/org/apache/flink/table/data/casting/CastRulesTest.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.data.casting; + +import org.apache.flink.table.data.StringData; +import org.apache.flink.table.data.TimestampData; +import org.apache.flink.table.types.DataType; + +import org.junit.jupiter.api.DynamicTest; +import org.junit.jupiter.api.TestFactory; + +import java.time.LocalDateTime; +import java.time.ZoneId; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.stream.Stream; + +import static org.apache.flink.table.api.DataTypes.BIGINT; +import static org.apache.flink.table.api.DataTypes.INT; +import static org.apache.flink.table.api.DataTypes.SMALLINT; +import static org.apache.flink.table.api.DataTypes.STRING; +import static org.apache.flink.table.api.DataTypes.TIMESTAMP; +import static org.apache.flink.table.api.DataTypes.TIMESTAMP_LTZ; +import static org.apache.flink.table.api.DataTypes.TINYINT; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNotNull; + +class CastRulesTest { Review comment: nit: Maybe add a class comment linking to `CastFunctionITCase`, and vice versa, so that folks can easily jump back and forth between the unit tests and the IT tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
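The `DynamicTest`/`@TestFactory` imports in the diff above suggest `CastRulesTest` is table-driven: each cast specification becomes one generated test case. A minimal plain-Java sketch of that pattern, with JUnit left out and all names hypothetical (not Flink's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class CastSpecSketch {
    // One row of the test table: an input value, the cast to apply,
    // and the expected result.
    static final class CastSpec {
        final Object input;
        final Function<Object, Object> cast;
        final Object expected;
        CastSpec(Object input, Function<Object, Object> cast, Object expected) {
            this.input = input;
            this.cast = cast;
            this.expected = expected;
        }
    }

    // Runs every spec and returns the number of failures, mimicking how a
    // JUnit 5 @TestFactory would emit one DynamicTest per spec.
    static int runAll(List<CastSpec> specs) {
        int failures = 0;
        for (CastSpec spec : specs) {
            if (!spec.cast.apply(spec.input).equals(spec.expected)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<CastSpec> specs = new ArrayList<>();
        specs.add(new CastSpec(42, v -> ((Integer) v).longValue(), 42L));   // INT -> BIGINT
        specs.add(new CastSpec((byte) 5, v -> ((Byte) v).longValue(), 5L)); // TINYINT -> BIGINT
        System.out.println(runAll(specs)); // prints 0: every cast matched its expectation
    }
}
```

The appeal of this shape is that adding coverage for a new cast rule is one more table row, not one more test method.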
[GitHub] [flink] matriv commented on a change in pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
matriv commented on a change in pull request #17522: URL: https://github.com/apache/flink/pull/17522#discussion_r732780163 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/data/casting/CastRuleProvider.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.data.casting; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.table.data.casting.rules.IdentityCastRule$; +import org.apache.flink.table.data.casting.rules.TimestampToStringCastRule$; +import org.apache.flink.table.data.casting.rules.UpcastToBigIntCastRule$; +import org.apache.flink.table.types.DataType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.LogicalTypeFamily; +import org.apache.flink.table.types.logical.LogicalTypeRoot; + +import javax.annotation.Nullable; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** This class resolves {@link CastRule} starting from the input and the target type. 
*/ Review comment: I would then just drop the `starting from` and instead say "by using lookup tables based on the input and target type". I just think that stating `starting from` reveals an implementation detail which is, in fact, the opposite of the one actually followed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
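The "lookup tables based on input and target type" wording discussed above can be illustrated with a small self-contained Java sketch: rule resolution as a single map lookup keyed by the type pair. All names here are hypothetical stand-ins, not Flink's `CastRuleProvider` API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

public class CastRuleLookup {
    // Hypothetical stand-ins for Flink's LogicalTypeRoot values.
    enum TypeRoot { TINYINT, INT, BIGINT, TIMESTAMP, VARCHAR }

    // Lookup key: the (input, target) type pair.
    static final class TypePair {
        final TypeRoot in;
        final TypeRoot out;
        TypePair(TypeRoot in, TypeRoot out) { this.in = in; this.out = out; }
        @Override public boolean equals(Object o) {
            return o instanceof TypePair
                    && in == ((TypePair) o).in && out == ((TypePair) o).out;
        }
        @Override public int hashCode() { return Objects.hash(in, out); }
    }

    private final Map<TypePair, Function<Object, Object>> rules = new HashMap<>();

    void addRule(TypeRoot in, TypeRoot out, Function<Object, Object> rule) {
        rules.put(new TypePair(in, out), rule);
    }

    // Resolution is a direct map lookup keyed by the pair, not a search
    // "starting from" the input type.
    Function<Object, Object> resolve(TypeRoot in, TypeRoot out) {
        return rules.get(new TypePair(in, out));
    }

    public static void main(String[] args) {
        CastRuleLookup provider = new CastRuleLookup();
        provider.addRule(TypeRoot.INT, TypeRoot.BIGINT, v -> ((Integer) v).longValue());
        provider.addRule(TypeRoot.TIMESTAMP, TypeRoot.VARCHAR, Object::toString);

        System.out.println(provider.resolve(TypeRoot.INT, TypeRoot.BIGINT).apply(42)); // prints 42
        System.out.println(provider.resolve(TypeRoot.INT, TypeRoot.VARCHAR));          // prints null: no rule registered
    }
}
```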
[GitHub] [flink] flinkbot edited a comment on pull request #17518: [FLINK-24018][build] Remove Scala dependencies from Java APIs
flinkbot edited a comment on pull request #17518: URL: https://github.com/apache/flink/pull/17518#issuecomment-946526433 ## CI report: * 817c08dd17d68e1c8939a7f1eef5aa901e128080 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25279) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #17522: [FLINK-24462][table] Introduce CastRule interface to reorganize casting code
flinkbot edited a comment on pull request #17522: URL: https://github.com/apache/flink/pull/17522#issuecomment-94673 ## CI report: * c5310e9d376b11af0bdbb9d3f420129a5cec7975 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25266) * 2cacbb5c9e9c01e5d047c4a65116c2df43b8ebbf Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25281) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org