[GitHub] [flink] flinkbot commented on pull request #23144: [BP-1.17][FLINK-32261][table] Add built-in MAP_UNION function.
flinkbot commented on PR #23144: URL: https://github.com/apache/flink/pull/23144#issuecomment-1666393524 ## CI report: * ae352d372df44f4d3b8bcce8a14a16b910d056e3 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] hanyuzheng7 commented on pull request #23144: [BP-1.17][FLINK-32261][table] Add built-in MAP_UNION function.
hanyuzheng7 commented on PR #23144: URL: https://github.com/apache/flink/pull/23144#issuecomment-1666393326 @flinkbot run azure
[GitHub] [flink] hanyuzheng7 opened a new pull request, #23144: [BP-1.17][FLINK-32261][table] Add built-in MAP_UNION function.
hanyuzheng7 opened a new pull request, #23144: URL: https://github.com/apache/flink/pull/23144

## What is the purpose of the change

This is an implementation of MAP_UNION, which returns a map created by merging one or more maps. All input maps should share a common map type. If there are overlapping keys, the value from `map2` overwrites the value from `map1`, the value from `map3` overwrites the value from `map2`, and in general the value from `mapn` overwrites the value from `map(n-1)`. If any of the maps is null, the function returns null.

## Brief change log

MAP_UNION for Table API and SQL
- Syntax: `MAP_UNION(map1, ...)`
- Arguments: `map1`: the first map to be merged; `map2`: the second map to be merged; `mapn`: the n-th map to be merged.
- Returns: the merged map. If there are overlapping keys, the value from a later map overwrites the value from an earlier one. If any of the maps is null, returns null.
- Examples:

  Merging maps with unique keys:
  ```
  map1 = ['a': 1, 'b': 2]
  map2 = ['c': 3, 'd': 4]
  map_union(map1, map2)
  Output: ['a': 1, 'b': 2, 'c': 3, 'd': 4]
  ```
  Merging maps with overlapping keys:
  ```
  map1 = ['a': 1, 'b': 2]
  map2 = ['b': 3, 'c': 4]
  map_union(map1, map2)
  Output: ['a': 1, 'b': 3, 'c': 4]
  ```

## Verifying this change

- This change added tests in MapFunctionITCase.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)

## Documentation

- Does this pull request introduce a new feature?
(yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
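The merge semantics described in the PR above (later maps win on key collisions, and any null input makes the whole result null) can be sketched in plain Python. This is a hypothetical illustration of the documented behavior, not the actual Flink implementation:

```python
def map_union(*maps):
    """Merge one or more dicts; later dicts overwrite earlier ones on
    overlapping keys. If any argument is None, the result is None,
    mirroring the documented NULL propagation of MAP_UNION."""
    if any(m is None for m in maps):
        return None
    result = {}
    for m in maps:
        result.update(m)  # later map's values overwrite earlier ones
    return result

map_union({'a': 1, 'b': 2}, {'b': 3, 'c': 4})
# → {'a': 1, 'b': 3, 'c': 4}
```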
[GitHub] [flink] flinkbot commented on pull request #23143: [FLINK-32759][doc] Remove the `cluster.declarative-resource-management.enabled` in doc
flinkbot commented on PR #23143: URL: https://github.com/apache/flink/pull/23143#issuecomment-1666379967 ## CI report: * 7ee09938a1c348b0d1552fc1ad3a41d598fd2beb UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-32759) Remove the removed config in the doc
[ https://issues.apache.org/jira/browse/FLINK-32759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32759: --- Labels: pull-request-available (was: ) > Remove the removed config in the doc > > > Key: FLINK-32759 > URL: https://issues.apache.org/jira/browse/FLINK-32759 > Project: Flink > Issue Type: Technical Debt > Components: Documentation, Runtime / Coordination >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Attachments: image-2023-08-05-11-54-17-714.png > > > The cluster.declarative-resource-management.enabled was removed at > FLINK-21095(https://github.com/apache/flink/pull/15838/files), so it doesn't > work now. > However, the flink doc still have it. > !image-2023-08-05-11-54-17-714.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] 1996fanrui opened a new pull request, #23143: [FLINK-32759][doc] Remove the `cluster.declarative-resource-management.enabled` in doc
1996fanrui opened a new pull request, #23143: URL: https://github.com/apache/flink/pull/23143 ## What is the purpose of the change The `cluster.declarative-resource-management.enabled` option was removed in [FLINK-21095](https://issues.apache.org/jira/browse/FLINK-21095)(https://github.com/apache/flink/pull/15838/files), so it no longer works. However, the Flink docs still include it. ![](https://github.com/apache/flink/assets/38427477/1912dc7a-414f-44e1-906b-762ee1b5432b) ## Brief change log [FLINK-32759][doc] Remove the `cluster.declarative-resource-management.enabled` in doc
[jira] [Created] (FLINK-32759) Remove the removed config in the doc
Rui Fan created FLINK-32759: --- Summary: Remove the removed config in the doc Key: FLINK-32759 URL: https://issues.apache.org/jira/browse/FLINK-32759 Project: Flink Issue Type: Technical Debt Components: Documentation, Runtime / Coordination Reporter: Rui Fan Assignee: Rui Fan Attachments: image-2023-08-05-11-54-17-714.png The cluster.declarative-resource-management.enabled option was removed in FLINK-21095(https://github.com/apache/flink/pull/15838/files), so it no longer works. However, the Flink docs still include it. !image-2023-08-05-11-54-17-714.png!
[GitHub] [flink] hanyuzheng7 commented on pull request #23098: [BP-1.17][FLINK-32262] Add MAP_ENTRIES support
hanyuzheng7 commented on PR #23098: URL: https://github.com/apache/flink/pull/23098#issuecomment-1666377009 @flinkbot run azure
[jira] [Commented] (FLINK-32755) Add quick start guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-32755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751246#comment-17751246 ] xiangyu feng commented on FLINK-32755: -- Hi [~zjureel], the quick start guide will also include a Flink OLAP e2e performance test report. Would you please assign this to me? > Add quick start guide for Flink OLAP > > > Key: FLINK-32755 > URL: https://issues.apache.org/jira/browse/FLINK-32755 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Priority: Major > > I propose to add a new {{QUICKSTART.md}} guide that provides instructions for > beginners to build a production-ready Flink OLAP service by using > flink-jdbc-driver, flink-sql-gateway and a Flink session cluster.
[jira] [Comment Edited] (FLINK-32716) Give 'None' option for 'scheduler-mode'
[ https://issues.apache.org/jira/browse/FLINK-32716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750517#comment-17750517 ] Kwangin (Dennis) Jung edited comment on FLINK-32716 at 8/5/23 1:10 AM: --- [~paul8263] thanks for comment! I'm currently working on it. If it's okay please review the work when ready :) But seems I need to wait to be assigned... was (Author: JIRAUSER301448): [~paul8263] thanks for comment! I'm currently working on it. If it's okay please review the work when ready :) > Give 'None' option for 'scheduler-mode' > --- > > Key: FLINK-32716 > URL: https://issues.apache.org/jira/browse/FLINK-32716 > Project: Flink > Issue Type: Improvement >Reporter: Kwangin (Dennis) Jung >Priority: Minor > > By setting-up scheduler-mode as 'REACTIVE', it scales-up/down by computing > status. > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-scheduling-options] > But currently it only allows 'REACTIVE', and when I want to de-activate with > such value as 'None', it causes exception. > (For now, it causes exception if I setup any other value instead of > 'REACTIVE') > > To make configuration bit more flexible, how about give 'None' (or 'Default') > as an option, to run in default mode?
[jira] [Updated] (FLINK-32716) Give 'None' option for 'scheduler-mode'
[ https://issues.apache.org/jira/browse/FLINK-32716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kwangin (Dennis) Jung updated FLINK-32716: -- Description: By setting-up scheduler-mode as 'REACTIVE', it scales-up/down by computing status. [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-scheduling-options] But currently it only allows 'REACTIVE', and when I want to de-activate with such value as 'None', it causes exception. (For now, it causes exception if I setup any other value instead of 'REACTIVE') To make configuration bit more flexible, how about give 'None' (or 'Default') as an option, to run in default mode? was: By setting-up scheduler-mode as 'REACTIVE', it scales-up/down by computing status. To make configuration bit more flexible, how about give 'None' (or 'Default') as an option, to run in default mode? (For now, it causes exception if I setup any other value instead of 'REACTIVE') > Give 'None' option for 'scheduler-mode' > --- > > Key: FLINK-32716 > URL: https://issues.apache.org/jira/browse/FLINK-32716 > Project: Flink > Issue Type: Improvement >Reporter: Kwangin (Dennis) Jung >Priority: Minor > > By setting-up scheduler-mode as 'REACTIVE', it scales-up/down by computing > status. > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-scheduling-options] > But currently it only allows 'REACTIVE', and when I want to de-activate with > such value as 'None', it causes exception. > (For now, it causes exception if I setup any other value instead of > 'REACTIVE') > > To make configuration bit more flexible, how about give 'None' (or 'Default') > as an option, to run in default mode?
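For context, the option under discussion lives in `flink-conf.yaml`; `reactive` is the only accepted value today. A sketch of the current setting and the proposed opt-out (the `none` value is hypothetical, not an existing Flink config value):

```yaml
# Current behavior: setting the key enables reactive scheduling;
# any value other than 'reactive' fails validation.
scheduler-mode: reactive

# Proposed in this issue: an explicit way to run the default scheduler,
# e.g. (hypothetical, not supported today):
# scheduler-mode: none
```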
[GitHub] [flink] hanyuzheng7 closed pull request #23142: test
hanyuzheng7 closed pull request #23142: test URL: https://github.com/apache/flink/pull/23142
[GitHub] [flink] hanyuzheng7 opened a new pull request, #23142: test
hanyuzheng7 opened a new pull request, #23142: URL: https://github.com/apache/flink/pull/23142 ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. 
*(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
[jira] [Created] (FLINK-32758) PyFlink bounds are overly restrictive and outdated
Deepyaman Datta created FLINK-32758: --- Summary: PyFlink bounds are overly restrictive and outdated Key: FLINK-32758 URL: https://issues.apache.org/jira/browse/FLINK-32758 Project: Flink Issue Type: Improvement Components: API / Python Affects Versions: 1.17.1 Reporter: Deepyaman Datta Hi! I am part of a team building the Flink backend for Ibis ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink under the hood for execution; however, PyFlink's requirements are incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's outdated and restrictive requirements prevent it from being used alongside most recent releases of Python data libraries. Some of the major libraries we (and likely others in the Python community interested in using PyFlink alongside other libraries) need compatibility with: * PyArrow (at least >=10.0.0, but there's no reason not to also be compatible with the latest) * pandas (should be compatible with the 2.x series, but also probably with 1.4.x, released January 2022, and 1.5.x) * numpy (1.22 was released in December 2021) * Newer releases of Apache Beam * Newer releases of Cython Furthermore, uncapped dependencies could be more generally preferable, as they avoid the need for frequent PyFlink releases as newer versions of libraries are released. A common (and great) argument for not upper-bounding dependencies, especially for libraries: [https://iscinumpy.dev/post/bound-version-constraints/] I am currently testing removing upper bounds in [https://github.com/apache/flink/pull/23141]; so far, builds pass without issue in [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581], and I'm currently waiting on [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6] to see if I can get PyArrow to resolve >=10.0.0.
Solving the proposed dependencies results in:

```
#
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
#    pip-compile --config=pyproject.toml --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt
#
apache-beam==2.49.0
    # via -r dev/dev-requirements.txt
avro-python3==1.10.2
    # via -r dev/dev-requirements.txt
certifi==2023.7.22
    # via requests
charset-normalizer==3.2.0
    # via requests
cloudpickle==2.2.1
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
crcmod==1.7
    # via apache-beam
cython==3.0.0
    # via -r dev/dev-requirements.txt
dill==0.3.1.1
    # via apache-beam
dnspython==2.4.1
    # via pymongo
docopt==0.6.2
    # via hdfs
exceptiongroup==1.1.2
    # via pytest
fastavro==1.8.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
fasteners==0.18
    # via apache-beam
find-libpython==0.3.0
    # via pemja
grpcio==1.56.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   grpcio-tools
grpcio-tools==1.56.2
    # via -r dev/dev-requirements.txt
hdfs==2.7.0
    # via apache-beam
httplib2==0.22.0
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
idna==3.4
    # via requests
iniconfig==2.0.0
    # via pytest
numpy==1.24.4
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
    #   pyarrow
objsize==0.6.1
    # via apache-beam
orjson==3.9.2
    # via apache-beam
packaging==23.1
    # via pytest
pandas==2.0.3
    # via -r dev/dev-requirements.txt
pemja==0.3.0 ; platform_system != "Windows"
    # via -r dev/dev-requirements.txt
pluggy==1.2.0
    # via pytest
proto-plus==1.22.3
    # via apache-beam
protobuf==4.23.4
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   grpcio-tools
    #   proto-plus
py4j==0.10.9.7
    # via -r dev/dev-requirements.txt
pyarrow==11.0.0
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
pydot==1.4.2
    # via apache-beam
pymongo==4.4.1
    # via apache-beam
pyparsing==3.1.1
    # via
    #   httplib2
    #   pydot
pytest==7.4.0
    # via -r dev/dev-requirements.txt
python-dateutil==2.8.2
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
pytz==2023.3
    # via
    #   -r dev/dev-requirements.txt
    #   apache-beam
    #   pandas
regex==2023.6.3
    # via apache-beam
requests==2.31.0
    # via
    #   apache-beam
    #   hdfs
six==1.16.0
    # via
    #   hdfs
    #   python-dateutil
```
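To see concretely how an upper bound locks out the newer libraries listed above, here is a small self-contained sketch of version-specifier matching. The comparison is hand-rolled for illustration (real resolvers follow PEP 440), and the version numbers are examples rather than PyFlink's actual pins:

```python
def parse(version):
    # Naive numeric parse: "11.0.0" -> (11, 0, 0). Just enough for
    # this illustration; not full PEP 440 version handling.
    return tuple(int(part) for part in version.split("."))

def satisfies(version, lower, upper=None):
    """True if lower <= version, and additionally version < upper
    when an upper cap is given."""
    v = parse(version)
    if v < parse(lower):
        return False
    return upper is None or v < parse(upper)

# A capped requirement like pyarrow>=5.0.0,<9.0.0 rejects pyarrow 11,
# while the uncapped form accepts new releases as they ship.
print(satisfies("11.0.0", "5.0.0", "9.0.0"))  # False
print(satisfies("11.0.0", "5.0.0"))           # True
```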
[jira] [Assigned] (FLINK-32757) Update the copyright year in NOTICE files
[ https://issues.apache.org/jira/browse/FLINK-32757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Liang Teoh reassigned FLINK-32757: --- Assignee: Jiabao Sun > Update the copyright year in NOTICE files > - > > Key: FLINK-32757 > URL: https://issues.apache.org/jira/browse/FLINK-32757 > Project: Flink > Issue Type: Improvement > Components: Connectors / MongoDB >Affects Versions: mongodb-1.0.0, mongodb-1.0.1 >Reporter: Jiabao Sun >Assignee: Jiabao Sun >Priority: Major > Labels: pull-request-available > Fix For: mongodb-1.0.2 > > > The current copyright year is 2014-2022 in NOTICE files. We should change it > to 2014-2023.
[jira] [Resolved] (FLINK-32757) Update the copyright year in NOTICE files
[ https://issues.apache.org/jira/browse/FLINK-32757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Liang Teoh resolved FLINK-32757. - Resolution: Fixed > Update the copyright year in NOTICE files > - > > Key: FLINK-32757 > URL: https://issues.apache.org/jira/browse/FLINK-32757 > Project: Flink > Issue Type: Improvement > Components: Connectors / MongoDB >Affects Versions: mongodb-1.0.0, mongodb-1.0.1 >Reporter: Jiabao Sun >Assignee: Jiabao Sun >Priority: Major > Labels: pull-request-available > Fix For: mongodb-1.0.2 > > > The current copyright year is 2014-2022 in NOTICE files. We should change it > to 2014-2023.
[jira] [Commented] (FLINK-32757) Update the copyright year in NOTICE files
[ https://issues.apache.org/jira/browse/FLINK-32757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751227#comment-17751227 ] Hong Liang Teoh commented on FLINK-32757: - Merged commit [{{fc656c4}}|https://github.com/apache/flink-connector-mongodb/commit/fc656c420e9b20676bf5e67c0c1c059a5ad44216] into apache:main > Update the copyright year in NOTICE files > - > > Key: FLINK-32757 > URL: https://issues.apache.org/jira/browse/FLINK-32757 > Project: Flink > Issue Type: Improvement > Components: Connectors / MongoDB >Affects Versions: mongodb-1.0.0, mongodb-1.0.1 >Reporter: Jiabao Sun >Priority: Major > Labels: pull-request-available > Fix For: mongodb-1.0.2 > > > The current copyright year is 2014-2022 in NOTICE files. We should change it > to 2014-2023.
[GitHub] [flink-kubernetes-operator] HuangZhenQiu opened a new pull request, #642: [FLINK-32729] allow new deployment with suspended state
HuangZhenQiu opened a new pull request, #642: URL: https://github.com/apache/flink-kubernetes-operator/pull/642 ## What is the purpose of the change Allow a new deployment with suspended state, so that a backup application can be created in advance. ## Brief change log - Change the DefaultValidator to accept both running and suspended state for the first deployment - Change DefaultValidatorTest to reflect the latest logic. ## Verifying this change This change is already covered by existing tests in DefaultValidatorTest. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., any changes to the `CustomResourceDescriptors`: (no) - Core observer or reconciler logic that is regularly executed: (no) ## Documentation - Does this pull request introduce a new feature? (yes) - If yes, how is the feature documented? (not documented)
[GitHub] [flink] XComp closed pull request #21555: [FLINK-26522][draft] Cleanup of LeaderElectionService
XComp closed pull request #21555: [FLINK-26522][draft] Cleanup of LeaderElectionService URL: https://github.com/apache/flink/pull/21555
[GitHub] [flink] flinkbot commented on pull request #23141: [python] Remove PyFlink dependencies' upper bounds
flinkbot commented on PR #23141: URL: https://github.com/apache/flink/pull/23141#issuecomment-1666055186 ## CI report: * 118206bfb082af7903c5c0ec135d0f4ef9c6841e UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] deepyaman opened a new pull request, #23141: [python] Remove PyFlink dependencies' upper bounds
deepyaman opened a new pull request, #23141: URL: https://github.com/apache/flink/pull/23141 ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Description: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient and a new ClientHAServices. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have implemented SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. This works well in our production. was: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient and a new ClientHAServices. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. 
> Reuse ClientHighAvailabilityServices in RestClusterClient when submitting > OLAP jobs > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > In OLAP scenario, we submit queries to flink session cluster through the > flink-sql-gateway service. When receiving queries, the gateway service will > create sessions to handle the query, each session will create a new > RestClusterClient and a new ClientHAServices. > > In our production usage, we have enabled JobManager ZK HA and use > ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices > will establish a network connection with ZK and create four ZK related > threads. > > When QPS reaches 200, more than 1000 sessions are created in a single > flink-sql-gateway instance, which means more than 1000 ZK connections and > 4000 ZK related threads are created simultaneously. This will raise a > significant stability risk in production. > > To address this problem, we have implemented SharedZKClientHAService for > different sessions to share a ZK connection and ZKClient. This works well in > our production.
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Description: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient and a new ClientHAServices. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. was: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient and a new ClientHAServices. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. 
> Reuse ClientHighAvailabilityServices in RestClusterClient when submitting > OLAP jobs > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > In OLAP scenario, we submit queries to flink session cluster through the > flink-sql-gateway service. When receiving queries, the gateway service will > create sessions to handle the query, each session will create a new > RestClusterClient and a new ClientHAServices. > > In our production usage, we have enabled JobManager ZK HA and use > ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices > will establish a network connection with ZK and create four ZK related > threads. > > When QPS reaches 200, more than 1000 sessions are created in a single > flink-sql-gateway instance, which means more than 1000 ZK connections and > 4000 ZK related threads are created simultaneously. This will raise a > significant stability risk in production. > > To address this problem, we have created SharedZKClientHAService for > different sessions to share a ZK connection and ZKClient. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Description: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient and a new ClientHAServices. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. was: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient to submit queries and a new ClientHAServices to discover the latest address of the JobManager. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. 
> Reuse ClientHighAvailabilityServices in RestClusterClient when submitting > OLAP jobs > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > In OLAP scenario, we submit queries to flink session cluster through the > flink-sql-gateway service. When receiving queries, the gateway service will > create sessions to handle the query, each session will create a new > RestClusterClient and a new ClientHAServices. > > In our production usage, we have enabled JobManager ZK HA and use > ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices > will establish a network connection with ZK and create four ZK related > threads. > > When QPS reaches 200, more than 1000 sessions are created in a single > flink-sql-gateway instance, which means more than 1000 ZK connections and > more than 4000 ZK related threads are created simultaneously. This will raise > a significant stability risk in production. > > To address this problem, we have created SharedZKClientHAService for > different sessions to share a ZK connection and ZKClient.
[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on pull request #14: [FLINK-32757][connectors/mongodb][build] Update the copyright with year 2023
Jiabao-Sun commented on PR #14: URL: https://github.com/apache/flink-connector-mongodb/pull/14#issuecomment-1665934374 Hi @dannycranmer, Could you help review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32757) Update the copyright year in NOTICE files
[ https://issues.apache.org/jira/browse/FLINK-32757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32757: --- Labels: pull-request-available (was: ) > Update the copyright year in NOTICE files > - > > Key: FLINK-32757 > URL: https://issues.apache.org/jira/browse/FLINK-32757 > Project: Flink > Issue Type: Improvement > Components: Connectors / MongoDB >Affects Versions: mongodb-1.0.0, mongodb-1.0.1 >Reporter: Jiabao Sun >Priority: Major > Labels: pull-request-available > Fix For: mongodb-1.0.2 > > > The current copyright year is 2014-2022 in NOTICE files. We should change it > to 2014-2023.
[jira] [Created] (FLINK-32757) Update the copyright year in NOTICE files
Jiabao Sun created FLINK-32757: -- Summary: Update the copyright year in NOTICE files Key: FLINK-32757 URL: https://issues.apache.org/jira/browse/FLINK-32757 Project: Flink Issue Type: Improvement Components: Connectors / MongoDB Affects Versions: mongodb-1.0.1, mongodb-1.0.0 Reporter: Jiabao Sun Fix For: mongodb-1.0.2 The current copyright year is 2014-2022 in NOTICE files. We should change it to 2014-2023.
[GitHub] [flink] flinkbot commented on pull request #23140: [hotfix] Fix some typo in java doc and comments
flinkbot commented on PR #23140: URL: https://github.com/apache/flink/pull/23140#issuecomment-1665863480 ## CI report: * b2d0952def78f5a644621cbd8f852fba8120b32d UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] zoltar9264 commented on pull request #23140: [hotfix] Fix some typo in java doc and comments
zoltar9264 commented on PR #23140: URL: https://github.com/apache/flink/pull/23140#issuecomment-1665861589 Hi @reswqa, I tried to fix some typos, can you help review them?
[GitHub] [flink] zoltar9264 opened a new pull request, #23140: [hotfix] Fix some typo in java doc and comments
zoltar9264 opened a new pull request, #23140: URL: https://github.com/apache/flink/pull/23140 ## What is the purpose of the change As the title says, fix some typos in Java docs and comments. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no)
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Description: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient to submit queries and a new ClientHAServices to discover the latest address of the JobManager. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. was: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient to submit queries and a new ClientHAServices to discover the latest address of the JobManager. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. 
To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. > Reuse ClientHighAvailabilityServices in RestClusterClient when submitting > OLAP jobs > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > In OLAP scenario, we submit queries to flink session cluster through the > flink-sql-gateway service. When receiving queries, the gateway service will > create sessions to handle the query, each session will create a new > RestClusterClient to submit queries and a new ClientHAServices to discover > the latest address of the JobManager. > > In our production usage, we have enabled JobManager ZK HA and use > ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices > will establish a network connection with ZK and create four ZK related > threads. > > When QPS reaches 200, more than 1000 sessions are created in a single > flink-sql-gateway instance, which means more than 1000 ZK connections and > more than 4000 ZK related threads are created simultaneously. This will raise > a significant stability risk in production. > > To address this problem, we have created SharedZKClientHAService for > different sessions to share a ZK connection and ZKClient.
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Description: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient to submit queries and a new ClientHAServices to discover the latest address of the JobManager. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have created SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. was: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient to submit queries and a new ClientHAServices to discover the latest address of the JobManager. In our production usage, we have enabled JobManager HA and use ZKClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. 
> Reuse ClientHighAvailabilityServices in RestClusterClient when submitting > OLAP jobs > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > In OLAP scenario, we submit queries to flink session cluster through the > flink-sql-gateway service. When receiving queries, the gateway service will > create sessions to handle the query, each session will create a new > RestClusterClient to submit queries and a new ClientHAServices to discover > the latest address of the JobManager. > > In our production usage, we have enabled JobManager ZK HA and use > ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices > will establish a network connection with ZK and create four ZK related > threads. > When QPS reaches 200, more than 1000 sessions are created in a single > flink-sql-gateway instance, which means more than 1000 ZK connections and > more than 4000 ZK related threads are created simultaneously. This will raise > a significant stability risk in production. > > To address this problem, we have created SharedZKClientHAService for > different sessions to share a ZK connection and ZKClient.
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Summary: Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs (was: Reues ZK connections when submitting OLAP jobs to Flink session cluster) > Reuse ClientHighAvailabilityServices in RestClusterClient when submitting > OLAP jobs > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > In OLAP scenario, we submit queries to flink session cluster through the > flink-sql-gateway service. When receiving queries, the gateway service will > create sessions to handle the query, each session will create a new > RestClusterClient to submit queries and a new ClientHAServices to discover > the latest address of the JobManager. > In our production usage, we have enabled JobManager HA and use > ZKClientHAServices to do service discovery. Each ZKClientHAServices will > establish a network connection with ZK and create four ZK related threads. > When QPS reaches 200, more than 1000 sessions are created in a single > flink-sql-gateway instance, which means more than 1000 ZK connections and > more than 4000 ZK related threads are created simultaneously. This will raise > a significant stability risk in production.
[jira] [Updated] (FLINK-32756) Reues ZK connections when submitting OLAP jobs to Flink session cluster
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Description: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient to submit queries and a new ClientHAServices to discover the latest address of the JobManager. In our production usage, we have enabled JobManager HA and use ZKClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and more than 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. > Reues ZK connections when submitting OLAP jobs to Flink session cluster > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > In OLAP scenario, we submit queries to flink session cluster through the > flink-sql-gateway service. When receiving queries, the gateway service will > create sessions to handle the query, each session will create a new > RestClusterClient to submit queries and a new ClientHAServices to discover > the latest address of the JobManager. > In our production usage, we have enabled JobManager HA and use > ZKClientHAServices to do service discovery. Each ZKClientHAServices will > establish a network connection with ZK and create four ZK related threads. 
> When QPS reaches 200, more than 1000 sessions are created in a single > flink-sql-gateway instance, which means more than 1000 ZK connections and > more than 4000 ZK related threads are created simultaneously. This will raise > a significant stability risk in production.
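The connection-sharing fix described in these updates can be sketched as a reference-counted holder. This is a hedged illustration only: the class and method names below (`SharedClientHolder`, `retain`, `release`) are hypothetical and are not Flink's actual `SharedZKClientHAService` API; the point is that N gateway sessions then cost one client connection instead of N.

```java
import java.util.function.Supplier;

/**
 * Sketch only (hypothetical names, not Flink's actual API): many gateway
 * sessions share one expensive client (e.g. a ZooKeeper connection) through
 * reference counting. The client is created when the first session asks for
 * it and closed only after the last session releases it.
 */
final class SharedClientHolder<C extends AutoCloseable> {
    private final Supplier<C> factory; // creates the real client lazily
    private C client;                  // the single shared instance
    private int refCount;              // sessions currently holding the client

    SharedClientHolder(Supplier<C> factory) {
        this.factory = factory;
    }

    /** Called when a session starts; the first caller triggers the real connect. */
    synchronized C retain() {
        if (refCount++ == 0) {
            client = factory.get();
        }
        return client;
    }

    /** Called when a session closes; the last caller closes the shared client. */
    synchronized void release() {
        if (--refCount == 0) {
            try {
                client.close();
            } catch (Exception e) {
                // a real implementation would log this
            }
            client = null;
        }
    }
}
```

With 1000 concurrent sessions, `factory.get()` runs once and `close()` runs once, which is exactly the reduction in ZK connections and threads the issue is after.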
[GitHub] [flink] Samrat002 commented on a diff in pull request #21458: [FLINK-28513] Fix Flink Table API CSV streaming sink throws SerializedThrowable exception
Samrat002 commented on code in PR #21458: URL: https://github.com/apache/flink/pull/21458#discussion_r1284570168

## flink-filesystems/flink-s3-fs-base/src/main/java/org/apache/flink/fs/s3/common/writer/S3RecoverableFsDataOutputStream.java

```diff
@@ -126,7 +126,9 @@ public long getPos() throws IOException {
     @Override
     public void sync() throws IOException {
-        fileStream.sync();
+        // S3 does not support sync();
+        // call persist() instead to put the data into S3.
+        persist();
     }
```

Review Comment: Added a test and validated manually on an EMR cluster by writing CSV data (10 GB and 70 GB) to S3.
[GitHub] [flink] Samrat002 commented on pull request #21458: [FLINK-28513] Fix Flink Table API CSV streaming sink throws SerializedThrowable exception
Samrat002 commented on PR #21458: URL: https://github.com/apache/flink/pull/21458#issuecomment-1665786834 @dannycranmer please review whenever you have time
[jira] [Resolved] (FLINK-32736) Add a post to announce new connectors and connector externalisation
[ https://issues.apache.org/jira/browse/FLINK-32736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer resolved FLINK-32736. --- Resolution: Done > Add a post to announce new connectors and connector externalisation > --- > > Key: FLINK-32736 > URL: https://issues.apache.org/jira/browse/FLINK-32736 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Elphas Toringepi >Assignee: Elphas Toringepi >Priority: Major > Labels: pull-request-available > > Announce the availability of the DynamoDB, MongoDB and OpenSearch connectors > as well as the externalisation of Flink connectors. > *References* > * [FLIP-252: Amazon DynamoDB Sink > Connector|https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector] > * [FLIP-262: Introduce MongoDB > connector|https://cwiki.apache.org/confluence/display/FLINK/FLIP-262%3A+Introduce+MongoDB+connector] > * [FLIP-243: Dedicated OpenSearch > connectors|https://cwiki.apache.org/confluence/display/FLINK/FLIP-243%3A+Dedicated+Opensearch+connectors] > * [Externalized Connector > development|https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development]
[jira] [Commented] (FLINK-32736) Add a post to announce new connectors and connector externalisation
[ https://issues.apache.org/jira/browse/FLINK-32736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751146#comment-17751146 ] Danny Cranmer commented on FLINK-32736: --- Merged commit [{{aca5d7a}}|https://github.com/apache/flink-web/commit/aca5d7a38f0d30590679e039d7b34f73592ac40f] into apache:asf-site > Add a post to announce new connectors and connector externalisation > --- > > Key: FLINK-32736 > URL: https://issues.apache.org/jira/browse/FLINK-32736 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Elphas Toringepi >Assignee: Elphas Toringepi >Priority: Major > Labels: pull-request-available > > Announce the availability of the DynamoDB, MongoDB and OpenSearch connectors > as well as the externalisation of Flink connectors. > *References* > * [FLIP-252: Amazon DynamoDB Sink > Connector|https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector] > * [FLIP-262: Introduce MongoDB > connector|https://cwiki.apache.org/confluence/display/FLINK/FLIP-262%3A+Introduce+MongoDB+connector] > * [FLIP-243: Dedicated OpenSearch > connectors|https://cwiki.apache.org/confluence/display/FLINK/FLIP-243%3A+Dedicated+Opensearch+connectors] > * [Externalized Connector > development|https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development]
[jira] [Updated] (FLINK-32736) Add a post to announce new connectors and connector externalisation
[ https://issues.apache.org/jira/browse/FLINK-32736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-32736: -- Fix Version/s: (was: 1.18.0) > Add a post to announce new connectors and connector externalisation > --- > > Key: FLINK-32736 > URL: https://issues.apache.org/jira/browse/FLINK-32736 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Elphas Toringepi >Assignee: Elphas Toringepi >Priority: Major > Labels: pull-request-available > > Announce the availability of the DynamoDB, MongoDB and OpenSearch connectors > as well as the externalisation of Flink connectors. > *References* > * [FLIP-252: Amazon DynamoDB Sink > Connector|https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector] > * [FLIP-262: Introduce MongoDB > connector|https://cwiki.apache.org/confluence/display/FLINK/FLIP-262%3A+Introduce+MongoDB+connector] > * [FLIP-243: Dedicated OpenSearch > connectors|https://cwiki.apache.org/confluence/display/FLINK/FLIP-243%3A+Dedicated+Opensearch+connectors] > * [Externalized Connector > development|https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development]
[GitHub] [flink-web] dannycranmer merged pull request #668: [FLINK-32736] Externalized connectors
dannycranmer merged PR #668: URL: https://github.com/apache/flink-web/pull/668
[jira] [Commented] (FLINK-32565) Support cast from NUMBER to BYTES
[ https://issues.apache.org/jira/browse/FLINK-32565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751138#comment-17751138 ] Hanyu Zheng commented on FLINK-32565: - [~twalthr], through research, it seems that other vendors use CAST rather than CONVERT. > Support cast from NUMBER to BYTES > - > > Key: FLINK-32565 > URL: https://issues.apache.org/jira/browse/FLINK-32565 > Project: Flink > Issue Type: Sub-task >Reporter: Hanyu Zheng >Assignee: Hanyu Zheng >Priority: Major > Labels: pull-request-available > > We are undertaking a task that requires casting from the DOUBLE type to BYTES > In particular, we have a INTEGER 1234. Our current approach is to convert > this INTEGER to BYTES using the following SQL query: > {code:java} > SELECT CAST(1234 as BYTES);{code} > {{ }} > However, we encounter an issue when executing this query, potentially due to > an error in the conversion between INTEGER and BYTES. Our goal is to identify > and correct this issue so that our query can execute successfully. The tasks > involved are: > # Investigate and pinpoint the specific reason for the conversion failure > from INTEGER to BYTES. > # Design and implement a solution that enables our query to function > correctly. > # Test this solution across all required scenarios to ensure its robustness. > > see also: > 1. PostgreSQL: PostgreSQL supports casting from NUMBER types (INTEGER, > BIGINT, DECIMAL, etc.) to BYTES type (BYTEA). In PostgreSQL, you can use CAST > or TO_BINARY function for performing the conversion. URL: > [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS] > 2. MySQL: MySQL supports casting from NUMBER types (INTEGER, BIGINT, DECIMAL, > etc.) to BYTES type (BINARY or BLOB). In MySQL, you can use CAST or CONVERT > functions for performing the conversion. URL: > [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html] > 3. 
Microsoft SQL Server: SQL Server supports casting from NUMBER types (INT, > BIGINT, NUMERIC, etc.) to BYTES type (VARBINARY or IMAGE). You can use CAST > or CONVERT functions for performing the conversion. URL: > [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql] > 4. Oracle Database: Oracle supports casting from NUMBER types (NUMBER, > INTEGER, FLOAT, etc.) to BYTES type (RAW). You can use UTL_RAW.CAST_TO_RAW > function for performing the conversion. URL: > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_BINARY_DOUBLE.html] > > for the problem of bytes order may arise (little vs big endian). > > 1. Apache Hadoop: Hadoop, being an open-source framework, has to deal with > byte order issues across different platforms and architectures. The Hadoop > File System (HDFS) uses a technique called "sequence files," which include > metadata to describe the byte order of the data. This metadata ensures that > data is read and written correctly, regardless of the endianness of the > platform. > 2. Apache Avro: Avro is a data serialization system used by various big data > frameworks like Hadoop and Apache Kafka. Avro uses a compact binary encoding > format that includes a marker for the byte order. This allows Avro to handle > endianness issues seamlessly when data is exchanged between systems with > different byte orders. > 3. Apache Parquet: Parquet is a columnar storage format used in big data > processing frameworks like Apache Spark. Parquet uses a little-endian format > for encoding numeric values, which is the most common format on modern > systems. When reading or writing Parquet data, data processing engines > typically handle any necessary byte order conversions transparently. > 4. Apache Spark: Spark is a popular big data processing engine that can > handle data on distributed systems. It relies on the underlying data formats > it reads (e.g., Avro, Parquet, ORC) to manage byte order issues. 
These > formats are designed to handle byte order correctly, ensuring that Spark can > handle data correctly on different platforms. > 5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by > Google Cloud. When dealing with binary data and endianness, BigQuery relies > on the data encoding format. For example, when loading data in Avro or > Parquet formats, these formats already include byte order information, > allowing BigQuery to handle data across different platforms correctly.
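The endianness concern discussed in this issue can be made concrete with a small Java sketch. This is illustrative only, not Flink's actual cast implementation: the same INTEGER 1234 from the example (`0x000004D2`) serializes to different byte sequences depending on the declared byte order, which is why a `CAST(number AS BYTES)` must pin one order down.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only (hypothetical helper, not Flink's cast rule):
// serialize a 32-bit INT to 4 bytes in an explicit byte order.
final class NumberToBytes {
    static byte[] intToBytes(int value, ByteOrder order) {
        // ByteBuffer honors the configured order for putInt().
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }
}
```

For 1234, big-endian yields `[0x00, 0x00, 0x04, 0xD2]` while little-endian yields `[0xD2, 0x04, 0x00, 0x00]`; a consumer reading with the wrong assumption reconstructs a completely different number, which matches the Avro/Parquet discussion above about encoding byte-order metadata alongside the data.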
[jira] [Commented] (FLINK-32564) Support cast from BYTES to NUMBER
[ https://issues.apache.org/jira/browse/FLINK-32564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751139#comment-17751139 ] Hanyu Zheng commented on FLINK-32564: - [~twalthr], through research, it seems that other vendors use CAST rather than CONVERT. > Support cast from BYTES to NUMBER > - > > Key: FLINK-32564 > URL: https://issues.apache.org/jira/browse/FLINK-32564 > Project: Flink > Issue Type: Sub-task >Reporter: Hanyu Zheng >Assignee: Hanyu Zheng >Priority: Major > Labels: pull-request-available > > We are dealing with a task that requires casting from the BYTES type to > BIGINT. Specifically, we have a string '00T1p'. Our approach is to convert > this string to BYTES and then cast the result to BIGINT with the following > SQL query: > {code:java} > SELECT CAST((CAST('00T1p' as BYTES)) as BIGINT);{code} > However, an issue arises when executing this query, likely due to an error in > the conversion between BYTES and BIGINT. We aim to identify and rectify this > issue so our query can run correctly. The tasks involved are: > # Investigate and identify the specific reason for the failure of conversion > from BYTES to BIGINT. > # Design and implement a solution to ensure our query can function correctly. > # Test this solution across all required scenarios to guarantee its > functionality. > > see also > 1. PostgreSQL: PostgreSQL supports casting from BYTES type (BYTEA) to NUMBER > types (INTEGER, BIGINT, DECIMAL, etc.). In PostgreSQL, you can use CAST or > type conversion operator (::) for performing the conversion. URL: > [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS] > 2. MySQL: MySQL supports casting from BYTES type (BLOB or BINARY) to NUMBER > types (INTEGER, BIGINT, DECIMAL, etc.). In MySQL, you can use CAST or CONVERT > functions for performing the conversion. URL: > [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html] > 3. 
Microsoft SQL Server: SQL Server supports casting from BYTES type > (VARBINARY, IMAGE) to NUMBER types (INT, BIGINT, NUMERIC, etc.). You can use > CAST or CONVERT functions for performing the conversion. URL: > [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql] > 4. Oracle Database: Oracle supports casting from RAW type (equivalent to > BYTES) to NUMBER types (NUMBER, INTEGER, FLOAT, etc.). You can use the > TO_NUMBER function for performing the conversion. URL: > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html] > 5. Apache Spark: Spark DataFrame supports casting binary types (BinaryType or > ByteType) to numeric types (IntegerType, LongType, DecimalType, etc.) by > using the {{cast}} function. URL: > [https://spark.apache.org/docs/latest/api/sql/#cast] > > for the problem of bytes order may arise (little vs big endian). > > 1. Apache Hadoop: Hadoop, being an open-source framework, has to deal with > byte order issues across different platforms and architectures. The Hadoop > File System (HDFS) uses a technique called "sequence files," which include > metadata to describe the byte order of the data. This metadata ensures that > data is read and written correctly, regardless of the endianness of the > platform. > 2. Apache Avro: Avro is a data serialization system used by various big data > frameworks like Hadoop and Apache Kafka. Avro uses a compact binary encoding > format that includes a marker for the byte order. This allows Avro to handle > endianness issues seamlessly when data is exchanged between systems with > different byte orders. > 3. Apache Parquet: Parquet is a columnar storage format used in big data > processing frameworks like Apache Spark. Parquet uses a little-endian format > for encoding numeric values, which is the most common format on modern > systems. 
When reading or writing Parquet data, data processing engines > typically handle any necessary byte order conversions transparently. > 4. Apache Spark: Spark is a popular big data processing engine that can > handle data on distributed systems. It relies on the underlying data formats > it reads (e.g., Avro, Parquet, ORC) to manage byte order issues. These > formats are designed to handle byte order correctly, ensuring that Spark can > handle data correctly on different platforms. > 5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by > Google Cloud. When dealing with binary data and endianness, BigQuery relies > on the data encoding format. For example, when loading data in Avro or > Parquet formats, these formats already include byte order information, > allowing BigQuery to handle data across different platforms correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
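The endianness concern raised in the comment above comes down to how a byte sequence is interpreted as an integer. As a rough illustration (plain Python, not Flink's actual CAST implementation, which is what the ticket investigates), the same five bytes of '00T1p' decode to two different BIGINT values depending on byte order:

```python
# The five ASCII bytes of '00T1p': 0x30 0x30 0x54 0x31 0x70.
raw = "00T1p".encode("ascii")

# Interpret the same bytes under both byte-order conventions.
big = int.from_bytes(raw, byteorder="big")        # most significant byte first
little = int.from_bytes(raw, byteorder="little")  # least significant byte first

print(hex(big))     # 0x3030543170
print(hex(little))  # 0x7031543030
assert big != little  # same bytes, two different integers
```

Whichever convention a CAST from BYTES to BIGINT adopts needs to be documented; Java's `ByteBuffer`, for comparison, defaults to big-endian.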
[jira] [Updated] (FLINK-32565) Support cast from NUMBER to BYTES
[ https://issues.apache.org/jira/browse/FLINK-32565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanyu Zheng updated FLINK-32565:
Description:
We are undertaking a task that requires casting from NUMBER types to BYTES. In particular, we have the INTEGER value 1234. Our current approach is to convert this INTEGER to BYTES using the following SQL query:
{code:java}
SELECT CAST(1234 AS BYTES);{code}
However, we encounter an issue when executing this query, potentially due to an error in the conversion between INTEGER and BYTES. Our goal is to identify and correct this issue so that the query can execute successfully. The tasks involved are:
# Investigate and pinpoint the specific reason for the conversion failure from INTEGER to BYTES.
# Design and implement a solution that enables the query to function correctly.
# Test this solution across all required scenarios to ensure its robustness.
See also:
1. PostgreSQL: supports casting from NUMBER types (INTEGER, BIGINT, DECIMAL, etc.) to the BYTES type (BYTEA) using CAST or the type-conversion operator (::). URL: [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS]
2. MySQL: supports casting from NUMBER types (INTEGER, BIGINT, DECIMAL, etc.) to BYTES types (BINARY or BLOB) using the CAST or CONVERT functions. URL: [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html]
3. Microsoft SQL Server: supports casting from NUMBER types (INT, BIGINT, NUMERIC, etc.) to BYTES types (VARBINARY or IMAGE) using the CAST or CONVERT functions. URL: [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql]
4. Oracle Database: supports casting from NUMBER types (NUMBER, INTEGER, FLOAT, etc.) to the BYTES type (RAW) using the UTL_RAW.CAST_TO_RAW function. URL: [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_BINARY_DOUBLE.html]
A byte-order problem (little- vs. big-endian) may also arise. How other systems handle it:
1. Apache Hadoop: as an open-source framework, Hadoop has to deal with byte-order issues across different platforms and architectures. The Hadoop File System (HDFS) uses "sequence files", which include metadata describing the byte order of the data. This metadata ensures that data is read and written correctly regardless of the platform's endianness.
2. Apache Avro: Avro is a data serialization system used by big-data frameworks such as Hadoop and Apache Kafka. Avro uses a compact binary encoding format that includes a marker for the byte order, which lets it handle endianness seamlessly when data is exchanged between systems with different byte orders.
3. Apache Parquet: Parquet is a columnar storage format used in big-data processing frameworks such as Apache Spark. Parquet encodes numeric values in little-endian format, the most common format on modern systems. When reading or writing Parquet data, processing engines typically handle any necessary byte-order conversions transparently.
4. Apache Spark: Spark is a popular big-data processing engine for distributed systems. It relies on the underlying data formats it reads (e.g., Avro, Parquet, ORC) to manage byte-order issues; these formats are designed to handle byte order correctly, so Spark can process data correctly on different platforms.
5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by Google Cloud. When dealing with binary data and endianness, BigQuery relies on the data encoding format. For example, data loaded in Avro or Parquet formats already includes byte-order information, allowing BigQuery to handle it correctly across platforms.
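The NUMBER-to-BYTES direction described for FLINK-32565 likewise has to fix both a byte width and a byte order before `CAST(1234 AS BYTES)` is well defined. A minimal Python sketch of the choices involved (illustrative only, not Flink's implementation):

```python
import struct

n = 1234  # the INTEGER from the ticket's example: SELECT CAST(1234 AS BYTES)

# One possible encoding: 32-bit big-endian (network byte order); 1234 == 0x000004D2.
be = n.to_bytes(4, byteorder="big")
assert be == b"\x00\x00\x04\xd2"

# struct exposes the same choice via explicit format prefixes.
assert struct.pack(">i", n) == be        # '>' = big-endian
assert struct.pack("<i", n) == be[::-1]  # '<' = little-endian, byte-reversed

# Round-trip back to the original integer under the same convention.
assert int.from_bytes(be, byteorder="big") == 1234
```

The round-trip only works because the decoder uses the same width and byte order as the encoder, which is exactly the contract the ticket needs to pin down.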
[jira] [Updated] (FLINK-32564) Support cast from BYTES to NUMBER
[ https://issues.apache.org/jira/browse/FLINK-32564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanyu Zheng updated FLINK-32564:
Description:
We are dealing with a task that requires casting from the BYTES type to BIGINT. Specifically, we have the string '00T1p'. Our approach is to convert this string to BYTES and then cast the result to BIGINT with the following SQL query:
{code:java}
SELECT CAST(CAST('00T1p' AS BYTES) AS BIGINT);{code}
However, an issue arises when executing this query, likely due to an error in the conversion between BYTES and BIGINT. We aim to identify and rectify this issue so the query can run correctly. The tasks involved are:
# Investigate and identify the specific reason the conversion from BYTES to BIGINT fails.
# Design and implement a solution to ensure the query can function correctly.
# Test this solution across all required scenarios to guarantee its functionality.
See also:
1. PostgreSQL: supports casting from the BYTES type (BYTEA) to NUMBER types (INTEGER, BIGINT, DECIMAL, etc.) using CAST or the type-conversion operator (::). URL: [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS]
2. MySQL: supports casting from BYTES types (BLOB or BINARY) to NUMBER types (INTEGER, BIGINT, DECIMAL, etc.) using the CAST or CONVERT functions. URL: [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html]
3. Microsoft SQL Server: supports casting from BYTES types (VARBINARY, IMAGE) to NUMBER types (INT, BIGINT, NUMERIC, etc.) using the CAST or CONVERT functions. URL: [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql]
4. Oracle Database: supports casting from the RAW type (equivalent to BYTES) to NUMBER types (NUMBER, INTEGER, FLOAT, etc.) using the TO_NUMBER function. URL: [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html]
5. Apache Spark: Spark DataFrames support casting binary types (BinaryType or ByteType) to numeric types (IntegerType, LongType, DecimalType, etc.) with the {{cast}} function. URL: [https://spark.apache.org/docs/latest/api/sql/#cast]
A byte-order problem (little- vs. big-endian) may also arise. How other systems handle it:
1. Apache Hadoop: as an open-source framework, Hadoop has to deal with byte-order issues across different platforms and architectures. The Hadoop File System (HDFS) uses "sequence files", which include metadata describing the byte order of the data. This metadata ensures that data is read and written correctly regardless of the platform's endianness.
2. Apache Avro: Avro is a data serialization system used by big-data frameworks such as Hadoop and Apache Kafka. Avro uses a compact binary encoding format that includes a marker for the byte order, which lets it handle endianness seamlessly when data is exchanged between systems with different byte orders.
3. Apache Parquet: Parquet is a columnar storage format used in big-data processing frameworks such as Apache Spark. Parquet encodes numeric values in little-endian format, the most common format on modern systems. When reading or writing Parquet data, processing engines typically handle any necessary byte-order conversions transparently.
4. Apache Spark: Spark is a popular big-data processing engine for distributed systems. It relies on the underlying data formats it reads (e.g., Avro, Parquet, ORC) to manage byte-order issues; these formats are designed to handle byte order correctly, so Spark can process data correctly on different platforms.
5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by Google Cloud. When dealing with binary data and endianness, BigQuery relies on the data encoding format. For example, data loaded in Avro or Parquet formats already includes byte-order information, allowing BigQuery to handle it correctly across platforms.
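The common thread in the five systems listed above is that the byte order is recorded alongside the data rather than assumed. A toy sketch of that self-describing idea (a hypothetical one-byte marker for illustration, not any real file format such as Hadoop sequence files or Avro):

```python
import struct

def encode(value: int, order: str) -> bytes:
    """Prefix the 64-bit payload with a 1-byte marker recording the byte order."""
    marker = b">" if order == "big" else b"<"          # '>'/'<' as in struct formats
    return marker + struct.pack(marker.decode() + "q", value)

def decode(blob: bytes) -> int:
    """Read the marker first, then decode the payload with the recorded order."""
    marker = blob[:1].decode()
    return struct.unpack(marker + "q", blob[1:])[0]

# Either writer convention round-trips on any reader, because the byte
# order travels with the data, mirroring the metadata approach above.
assert decode(encode(1234, "big")) == 1234
assert decode(encode(1234, "little")) == 1234
```

A CAST between BYTES and BIGINT has no such marker available, which is why the convention has to be fixed and documented by the engine instead.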
URL: [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql] # *Oracle Database:* Oracle supports casting from RAW type (equivalent to BYTES) to NUMBER types (NUMBER, INTEGER, FLOAT, etc.). You can use the TO_NUMBER function for performing the conversion. URL: [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html] # *Apache Spark:* Spark DataFrame supports casting binary types (BinaryType or ByteType) to numeric types (IntegerType, LongType, DecimalType, etc.) by using the {{cast}} function. URL: [https://spark.apache.org/docs/latest/api/sql/#cast] > Support cast from BYTES to NUMBER > - > > Key: FLINK-32564 > URL: https://issues.apache.org/jira/browse/FLINK-32564 > Project: Flink > Issue Type: Sub-task >Reporter: Hanyu Zheng >Assignee: Hanyu Zheng >Priority: Major > Labels:
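For illustration, the byte-to-number semantics the referenced engines provide can be sketched in Python. Interpreting the UTF-8 bytes of the literal as a big-endian unsigned integer is one plausible semantics (roughly what Spark's binary-to-numeric cast does), not necessarily what Flink will implement:

```python
def bytes_to_bigint(raw: bytes) -> int:
    # Interpret the byte array as a big-endian unsigned integer,
    # one common definition of a BINARY -> numeric cast.
    return int.from_bytes(raw, byteorder="big", signed=False)

# The literal from the failing query, first cast to BYTES
# (UTF-8 encoding assumed here for illustration).
raw = "00T1p".encode("utf-8")
print(bytes_to_bigint(raw))
```

Under this interpretation the query would produce a deterministic BIGINT from the five bytes of the string, rather than failing.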
[GitHub] [flink] Samrat002 commented on a diff in pull request #21458: [FLINK-28513] Fix Flink Table API CSV streaming sink throws SerializedThrowable exception
Samrat002 commented on code in PR #21458: URL: https://github.com/apache/flink/pull/21458#discussion_r1284527153 ## flink-filesystems/flink-s3-fs-base/src/main/java/org/apache/flink/fs/s3/common/writer/S3RecoverableFsDataOutputStream.java: ## @@ -126,7 +126,16 @@ public long getPos() throws IOException { @Override public void sync() throws IOException { -fileStream.sync(); Review Comment: The `sync` method is called in the following scenarios: 1. `S3RecoverableWriter` 2. `FlinkS3FileSystem` creates a new instance of `S3RecoverableWriter` when `createRecoverableWriter()` is called 3. `CsvBulkWriter` uses `FlinkS3FileSystem` and calls the recoverable writer. 4. `BulkWriter` This change will not alter any processing guarantee. With the current changes, the `sync()` method takes the lock first, then flushes the filesystem and commits the remaining blocks (writes to S3). This flow preserves exactly-once semantics. The same code flow is implemented for `AzureBlobFsRecoverableDataOutputStream`. From the class `BlockBlobAppendStream`: ``` public void hsync() throws IOException { if (this.compactionEnabled) { this.flush(); } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
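The lock-then-flush-then-commit ordering described in the review comment can be sketched as follows (Python; class and method names are simplified stand-ins, not the actual Flink code):

```python
import threading

class RecoverableOutputStreamSketch:
    """Sketch of a sync() that takes the lock first, flushes buffered
    data, and then commits the remaining blocks, so that no concurrent
    write can interleave in the middle of a sync."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buffer = []       # data not yet persisted
        self._committed = []    # data durably written (e.g. to S3)

    def write(self, chunk: bytes) -> None:
        with self._lock:
            self._buffer.append(chunk)

    def sync(self) -> None:
        with self._lock:        # lock first, as in the PR
            # flush + commit remaining blocks under the same lock
            self._committed.extend(self._buffer)
            self._buffer.clear()

stream = RecoverableOutputStreamSketch()
stream.write(b"row-1")
stream.write(b"row-2")
stream.sync()
print(stream._committed)
```

Because write and sync share one lock, a sync never observes a half-written chunk, which is the property the comment relies on for the exactly-once argument.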
[jira] [Updated] (FLINK-32756) Reuse ZK connections when submitting OLAP jobs to Flink session cluster
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Summary: Reuse ZK connections when submitting OLAP jobs to Flink session cluster (was: Reuse ZK CuratorFramework when submitting OLAP jobs to Flink session cluster) > Reuse ZK connections when submitting OLAP jobs to Flink session cluster > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-24370) [FLIP-171] Documentation for Generic AsyncSinkBase
[ https://issues.apache.org/jira/browse/FLINK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer resolved FLINK-24370. --- Resolution: Done > [FLIP-171] Documentation for Generic AsyncSinkBase > -- > > Key: FLINK-24370 > URL: https://issues.apache.org/jira/browse/FLINK-24370 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common >Reporter: Zichen Liu >Assignee: Zichen Liu >Priority: Major > Labels: pull-request-available, stale-assigned > > h2. Motivation > To write documentation for FLIP-171 Async Sink Base. This will help sink > implementers get acquainted with the necessary information to write their > concrete sinks. > h2. References > More details to be found > [https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-24278) [FLIP-171] Async Sink Base Sink Developer Guide for Documentation
[ https://issues.apache.org/jira/browse/FLINK-24278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer resolved FLINK-24278. --- Resolution: Done > [FLIP-171] Async Sink Base Sink Developer Guide for Documentation > - > > Key: FLINK-24278 > URL: https://issues.apache.org/jira/browse/FLINK-24278 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common >Reporter: Zichen Liu >Assignee: Zichen Liu >Priority: Major > Labels: pull-request-available, stale-assigned > > As an Async Sink developer, I’d like to have a step by step guide to > implementing new Async Sinks > *Scope:* > * A mark down file in the async sink package guiding developers through > steps to create new async sink implementations. We could generate PDFs and > HTML pages from this file later, to share it in other places if needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-24278) [FLIP-171] Async Sink Base Sink Developer Guide for Documentation
[ https://issues.apache.org/jira/browse/FLINK-24278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751128#comment-17751128 ] Danny Cranmer commented on FLINK-24278: --- This is done, we published a blog https://flink.apache.org/2022/03/16/the-generic-asynchronous-base-sink/ > [FLIP-171] Async Sink Base Sink Developer Guide for Documentation > - > > Key: FLINK-24278 > URL: https://issues.apache.org/jira/browse/FLINK-24278 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common >Reporter: Zichen Liu >Assignee: Zichen Liu >Priority: Major > Labels: pull-request-available, stale-assigned > > As an Async Sink developer, I’d like to have a step by step guide to > implementing new Async Sinks > *Scope:* > * A mark down file in the async sink package guiding developers through > steps to create new async sink implementations. We could generate PDFs and > HTML pages from this file later, to share it in other places if needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] XComp commented on a diff in pull request #22341: [FLINK-27204][flink-runtime] Refactor FileSystemJobResultStore to execute I/O operations on the ioExecutor
XComp commented on code in PR #22341: URL: https://github.com/apache/flink/pull/22341#discussion_r1284412779 ## flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java: ## @@ -561,21 +563,28 @@ public CompletableFuture submitFailedJob( return archiveExecutionGraphToHistoryServer(executionGraphInfo); } +/** + * Checks whether the given job has already been submitted, executed, or awaiting termination. + * + * @param jobId identifying the submitted job + * @return true if the job has already been submitted (is running) or has been executed + * @throws Exception if the job scheduling status cannot be retrieved + */ +private boolean isDuplicateJob(JobID jobId) throws Exception { +return isInGloballyTerminalState(jobId).get() +|| jobManagerRunnerRegistry.isRegistered(jobId) +|| submittedAndWaitingTerminationJobIDs.contains(jobId); +} + Review Comment: This was accidentally re-added when rebasing to the most recent `master`. It can be removed again. ## flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/AbstractThreadsafeJobResultStore.java: ## @@ -112,10 +141,15 @@ private void withWriteLock(ThrowingRunnable runnable) throws IOExce } } -private T withReadLock(SupplierWithException runnable) throws IOException { +private CompletableFuture withReadLockAsync( +SupplierWithException runnable) { Review Comment: ```suggestion SupplierWithException supplier) { ``` copy error on my side. You changed it for `withReadLock` already. :+1: ## flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/nonha/embedded/EmbeddedJobResultStore.java: ## Review Comment: The commit message should have a `[hotfix][runtime]` prefix instead of being labeled with `[FLINK-27204]`. The change is independent of FLINK-27204 (i.e. more like a code cleanup that is shipped along with FLINK-27204). Think of it like this: the hotfix commit wouldn't need to be reverted when reverting FLINK-27204 because it's still a valid change to improve the code.
Additionally, you only cleaned `markResultAsCleanInternal` into the hotfix commit. `hasDirtyJobResultEntryInternal`, `hasCleanJobResultEntryInternal` and `getDirtyResultsInternal` were cleaned as well but in the wrong commit. These three method changes should end up in the hotfix commit as well. ## flink-runtime/src/test/java/org/apache/flink/runtime/highavailability/FileSystemJobResultStoreFileOperationsTest.java: ## @@ -116,46 +123,74 @@ public void testBaseDirectoryCreationOnResultStoreInitialization() throws Except assertThat(emptyBaseDirectory).doesNotExist(); fileSystemJobResultStore = -new FileSystemJobResultStore(basePath.getFileSystem(), basePath, false); +new FileSystemJobResultStore( +basePath.getFileSystem(), basePath, false, manuallyTriggeredExecutor); // Result store operations are creating the base directory on-the-fly assertThat(emptyBaseDirectory).doesNotExist(); -fileSystemJobResultStore.createDirtyResult(DUMMY_JOB_RESULT_ENTRY); +CompletableFuture dirtyResultAsync = + fileSystemJobResultStore.createDirtyResultAsync(DUMMY_JOB_RESULT_ENTRY); +manuallyTriggeredExecutor.triggerAll(); +dirtyResultAsync.get(); assertThat(emptyBaseDirectory).exists().isDirectory(); Review Comment: I noticed that we could extend the tests to check the async nature: We could put the same (but inverted) assert in front of the trigger statement. That way we check that no synchronous activity happens. The same also applies to the other tests. 
## flink-runtime/src/test/java/org/apache/flink/runtime/highavailability/JobResultStoreContractTest.java: ## @@ -46,72 +47,97 @@ public interface JobResultStoreContractTest { @Test default void testStoreJobResultsWithDuplicateIDsThrowsException() throws IOException { JobResultStore jobResultStore = createJobResultStore(); -jobResultStore.createDirtyResult(DUMMY_JOB_RESULT_ENTRY); +jobResultStore.createDirtyResultAsync(DUMMY_JOB_RESULT_ENTRY).join(); final JobResultEntry otherEntryWithDuplicateId = new JobResultEntry( TestingJobResultStore.createSuccessfulJobResult( DUMMY_JOB_RESULT_ENTRY.getJobId())); -assertThatThrownBy(() -> jobResultStore.createDirtyResult(otherEntryWithDuplicateId)) -.isInstanceOf(IllegalStateException.class); +assertThatThrownBy( +() -> +jobResultStore + .createDirtyResultAsync(otherEntryWithDuplicateId) +
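The review suggestion above — assert that nothing has happened before the executor is triggered, then trigger and assert the effect — can be sketched with a minimal manually triggered executor (Python; `ManuallyTriggeredExecutor` here is a hand-rolled stand-in for Flink's test utility):

```python
class ManuallyTriggeredExecutor:
    """Queues submitted tasks and runs them only when triggered,
    letting a test observe state before and after execution."""

    def __init__(self):
        self._tasks = []

    def submit(self, task):
        self._tasks.append(task)

    def trigger_all(self):
        while self._tasks:
            self._tasks.pop(0)()

executor = ManuallyTriggeredExecutor()
written = []
# Analogous to createDirtyResultAsync(...): work is only queued here.
executor.submit(lambda: written.append("dirty-result"))

assert written == []          # inverted assert: nothing ran synchronously
executor.trigger_all()
assert written == ["dirty-result"]
print("async behavior verified")
```

The inverted assertion before `trigger_all()` is what proves the operation is genuinely asynchronous, which is exactly the extension the reviewer proposes for the file-operation tests.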
[jira] [Updated] (FLINK-32746) Using ZGC in JDK17 to solve long time class unloading STW
[ https://issues.apache.org/jira/browse/FLINK-32746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32746: - Summary: Using ZGC in JDK17 to solve long time class unloading STW (was: Enable ZGC in JDK17 to solve long time class unloading STW) > Using ZGC in JDK17 to solve long time class unloading STW > - > > Key: FLINK-32746 > URL: https://issues.apache.org/jira/browse/FLINK-32746 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Runtime >Reporter: xiangyu feng >Priority: Major > > In an OLAP session cluster, a TM needs to frequently create new classloaders > and generate new classes. These classes accumulate in metaspace. > When metaspace usage reaches a threshold, a full GC with a long > stop-the-world pause is triggered. Currently, SerialGC, ParallelGC, and G1GC > all perform stop-the-world class unloading. Only ZGC supports concurrent > class unloading; see more in > [https://bugs.openjdk.org/browse/JDK-8218905] > > In our scenario, class unloading for a 2GB metaspace with 5 million classes > stopped the application for more than 40 seconds. After switching to ZGC, the > maximum STW pause of the application was reduced to less than 10ms. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
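For reference, ZGC's concurrent class unloading can be enabled on JDK 17 with standard HotSpot flags; the sketch below uses illustrative sizing values that are not taken from the issue:

```shell
# JDK 17: ZGC unloads classes concurrently, avoiding the stop-the-world
# metaspace cleanup performed by the Serial, Parallel, and G1 collectors.
# (Illustrative flags for a TaskManager JVM; tune sizes for your workload.)
java -XX:+UseZGC -XX:MaxMetaspaceSize=2g -jar flink-session-job.jar
```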
[GitHub] [flink-web] dannycranmer opened a new pull request, #669: MongoDB connector v1.0.2
dannycranmer opened a new pull request, #669: URL: https://github.com/apache/flink-web/pull/669 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-32751) DistinctAggregateITCaseBase.testMultiDistinctAggOnDifferentColumn got stuck on AZP
[ https://issues.apache.org/jira/browse/FLINK-32751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751105#comment-17751105 ] Matthias Pohl edited comment on FLINK-32751 at 8/4/23 1:00 PM: --- The actual test failure happened in {{SortDistinctAggregateITCase#testMultiDistinctAggOnDifferentColumn}} which derives the test from {{DistinctAggregateITCaseBase}}. FYI: This can be determined by looking at the surefire reporting which prints that {{HashDistinctAggregateITCase}} completed but {{SortDistinctAggregateITCase}} didn't. {code} [...] Aug 04 02:12:29 02:12:29.073 [INFO] Running org.apache.flink.table.planner.runtime.batch.sql.agg.SortDistinctAggregateITCase [...] Aug 04 02:19:04 02:19:04.720 [INFO] Running org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase Aug 04 02:20:38 02:20:38.255 [INFO] Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.527 s - in org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase [...] {code} The issue we're seeing seems to be independent of the actual test, though. The timeout happens when the {{CollectDynamicSink}} tries to request more data through the Dispatcher which forwards the request. Unfortunately, we don't have any logs from the Dispatcher side of that request. Therefore, we cannot reliably say where the request halted. [~Sergey Nuyanzin] had a point when pointing out that there are multiple other past Jira issues (FLINK-20254, FLINK-22129, FLINK-22181, FLINK-22100) that had a similar stacktrace. These Jiras were handled as duplicates of FLINK-21996 which was a bug in RPC layer with messages being swallowed. In the end, it's strange that the request wasn't completed in some way due to the {{MiniCluster}} having been shut down. 
There should be an error being thrown when trying to get the RPC endpoint for the dispatcher (through [LeaderGatewayRetriever#getFuture|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/retriever/LeaderGatewayRetriever.java#L49]) or for the JobMaster (through [Dispatcher#getJobMaster|https://github.com/apache/flink/blob/c6d58e17e8ce736a062234e1558ac8d7b65990ef/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L1455]) which we don't see. That makes it more likely that the RPC request/response was swallowed. was (Author: mapohl): The actual test failure happened in {{SortDistinctAggregateITCase#testMultiDistinctAggOnDifferentColumn}} which derives the test from {{DistinctAggregateITCaseBase}}. FYI: This can be determined by looking at the surefire reporting which prints that {{HashDistinctAggregateITCase}} completed but {{SortDistinctAggregateITCase}} didn't. {code} [...] Aug 04 02:12:29 02:12:29.073 [INFO] Running org.apache.flink.table.planner.runtime.batch.sql.agg.SortDistinctAggregateITCase [...] Aug 04 02:19:04 02:19:04.720 [INFO] Running org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase Aug 04 02:20:38 02:20:38.255 [INFO] Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.527 s - in org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase [...] {code} The issue we're seeing seems to be independent of the actual test, though. The timeout happens when the {{CollectDynamicSink}} tries to request more data through the Dispatcher which forwards the request. Unfortunately, we don't have any logs from the Dispatcher side of that request. Therefore, we cannot reliably say where the request halted. [~Sergey Nuyanzin] had a point when pointing out that there are multiple other past Jira issues (FLINK-20254, FLINK-22129, FLINK-22181, FLINK-22100) that had a similar stacktrace. 
These Jiras were handled as duplicates of FLINK-21996 which was a bug in RPC layer with messages being swallowed. In the end, it's strange that the request wasn't completed in some way due to the {{MiniCluster}} having been shut down. > DistinctAggregateITCaseBase.testMultiDistinctAggOnDifferentColumn got stuck > on AZP > -- > > Key: FLINK-32751 > URL: https://issues.apache.org/jira/browse/FLINK-32751 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > This build hangs > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51955=logs=ce3801ad-3bd5-5f06-d165-34d37e757d90=5e4d9387-1dcc-5885-a901-90469b7e6d2f=14399 > {noformat} > Aug 04 03:03:47 "ForkJoinPool-1-worker-51" #28 daemon prio=5 os_prio=0 > cpu=49342.66ms elapsed=3079.49s tid=0x7f67ccdd
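The hang in the stack trace below is an unbounded `CompletableFuture.get()` waiting on a reply that was apparently swallowed. As an analogy (Python, not the Flink code), a bounded wait turns such a lost response into a diagnosable timeout instead of a stuck build:

```python
from concurrent.futures import Future, TimeoutError as FutureTimeout

# A response future that is never completed, modeling a swallowed RPC reply.
response: Future = Future()

try:
    # Bounded wait: fails fast instead of parking the thread forever
    # the way the unbounded get() in the stack trace does.
    response.result(timeout=0.1)
    outcome = "completed"
except FutureTimeout:
    outcome = "timed out; the reply was likely lost"

print(outcome)
```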
[jira] [Commented] (FLINK-32604) PyFlink end-to-end fails with kafka-server-stop.sh: No such file or directory
[ https://issues.apache.org/jira/browse/FLINK-32604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751114#comment-17751114 ] Sergey Nuyanzin commented on FLINK-32604: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51978=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=0f3adb59-eefa-51c6-2858-3654d9e0749d=6843 > PyFlink end-to-end fails with kafka-server-stop.sh: No such file or > directory > --- > > Key: FLINK-32604 > URL: https://issues.apache.org/jira/browse/FLINK-32604 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51253=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=0f3adb59-eefa-51c6-2858-3654d9e0749d=7883 > fails as > {noformat} > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/kafka-common.sh: line > 117: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-27379214502/kafka_2.12-3.2.3/bin/kafka-server-stop.sh: > No such file or directory > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/kafka-common.sh: line > 121: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-27379214502/kafka_2.12-3.2.3/bin/zookeeper-server-stop.sh: > No such file or directory > Jul 13 19:43:07 [FAIL] Test script contains errors. > Jul 13 19:43:07 Checking of logs skipped. > Jul 13 19:43:07 > Jul 13 19:43:07 [FAIL] 'PyFlink end-to-end test' failed after 0 minutes and > 40 seconds! Test exited with exit code 1 > Jul 13 19:43:07 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30719) flink-runtime-web failed due to a corrupted
[ https://issues.apache.org/jira/browse/FLINK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751112#comment-17751112 ] Sergey Nuyanzin commented on FLINK-30719: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51978=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=11227 > flink-runtime-web failed due to a corrupted > > > Key: FLINK-30719 > URL: https://issues.apache.org/jira/browse/FLINK-30719 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend, Test Infrastructure, Tests >Affects Versions: 1.16.0, 1.17.0, 1.18.0 >Reporter: Matthias Pohl >Assignee: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44954=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=12550 > The build failed due to a corrupted nodejs dependency: > {code} > [ERROR] The archive file > /__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz > is corrupted and will be deleted. Please try the build again. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29913) Shared state would be discarded by mistake when maxConcurrentCheckpoint>1
[ https://issues.apache.org/jira/browse/FLINK-29913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1775#comment-1775 ] Feifan Wang commented on FLINK-29913: - [~roman] , I submitted two PRs for [1.16|https://github.com/apache/flink/pull/23137] and [1.17|https://github.com/apache/flink/pull/23139], please take a look. > Shared state would be discarded by mistake when maxConcurrentCheckpoint>1 > - > > Key: FLINK-29913 > URL: https://issues.apache.org/jira/browse/FLINK-29913 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0, 1.16.0, 1.17.0 >Reporter: Yanfei Lei >Assignee: Feifan Wang >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0, 1.16.3, 1.17.2 > > > When maxConcurrentCheckpoint>1, the shared state of Incremental rocksdb state > backend would be discarded by registering the same name handle. See > [https://github.com/apache/flink/pull/21050#discussion_r1011061072] > cc [~roman] -- This message was sent by Atlassian Jira (v8.20.10#820010)
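The bug class fixed here — concurrent checkpoints registering the same shared-state handle, with the registry then discarding state a live checkpoint still needs — can be illustrated with a tiny reference-counting registry (Python, hypothetical API, not the actual `SharedStateRegistry`):

```python
class SharedStateRegistrySketch:
    """State under a key may only be physically deleted once every
    checkpoint that registered it has released it."""

    def __init__(self):
        self._ref_counts = {}

    def register(self, key: str) -> None:
        self._ref_counts[key] = self._ref_counts.get(key, 0) + 1

    def unregister(self, key: str) -> bool:
        """Returns True only when the last reference is released,
        i.e. when it is safe to delete the underlying file."""
        self._ref_counts[key] -= 1
        if self._ref_counts[key] == 0:
            del self._ref_counts[key]
            return True
        return False

registry = SharedStateRegistrySketch()
registry.register("sst-000042")   # checkpoint 1
registry.register("sst-000042")   # checkpoint 2 (maxConcurrentCheckpoints > 1)
# Checkpoint 1 is subsumed: the shared file must NOT be deleted yet.
assert registry.unregister("sst-000042") is False
# Only when the last referencing checkpoint is discarded may it go.
assert registry.unregister("sst-000042") is True
print("shared state protected by ref-counting")
```

Registering by name alone, without counting references, is what allowed the second checkpoint's registration to clobber the first and discard state prematurely.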
[jira] [Commented] (FLINK-32667) Use standalone store and embedding writer for jobs with no-restart-strategy in session cluster
[ https://issues.apache.org/jira/browse/FLINK-32667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751107#comment-17751107 ] Matthias Pohl commented on FLINK-32667: --- Thanks, [~zjureel]. I think it's a good idea to initiate the discussion in the mailing list. That way we might gather additional feedback on the scenarios and avoid redoing things because we haven't considered certain edge cases. That said, I'm confident that this use case makes sense to consider as part of the Flink roadmap. I'm looking forward to your ML post. > Use standalone store and embedding writer for jobs with no-restart-strategy > in session cluster > -- > > Key: FLINK-32667 > URL: https://issues.apache.org/jira/browse/FLINK-32667 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Fang Yong >Assignee: Fang Yong >Priority: Major > Labels: pull-request-available > > When a Flink session cluster uses the ZK or K8s high-availability service, it > stores jobs in ZK or a ConfigMap. When we submit Flink OLAP jobs to the > session cluster, they always turn off the restart strategy. These jobs with > no-restart-strategy should not be stored in ZK or in a ConfigMap in K8s -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32751) DistinctAggregateITCaseBase.testMultiDistinctAggOnDifferentColumn got stuck on AZP
[ https://issues.apache.org/jira/browse/FLINK-32751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751105#comment-17751105 ] Matthias Pohl commented on FLINK-32751: --- The actual test failure happened in {{SortDistinctAggregateITCase#testMultiDistinctAggOnDifferentColumn}} which derives the test from {{DistinctAggregateITCaseBase}}. FYI: This can be determined by looking at the surefire reporting which prints that {{HashDistinctAggregateITCase}} completed but {{SortDistinctAggregateITCase}} didn't. {code} [...] Aug 04 02:12:29 02:12:29.073 [INFO] Running org.apache.flink.table.planner.runtime.batch.sql.agg.SortDistinctAggregateITCase [...] Aug 04 02:19:04 02:19:04.720 [INFO] Running org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase Aug 04 02:20:38 02:20:38.255 [INFO] Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.527 s - in org.apache.flink.table.planner.runtime.batch.sql.agg.HashDistinctAggregateITCase [...] {code} The issue we're seeing seems to be independent of the actual test, though. The timeout happens when the {{CollectDynamicSink}} tries to request more data through the Dispatcher which forwards the request. Unfortunately, we don't have any logs from the Dispatcher side of that request. Therefore, we cannot reliably say where the request halted. [~Sergey Nuyanzin] had a point when pointing out that there are multiple other past Jira issues (FLINK-20254, FLINK-22129, FLINK-22181, FLINK-22100) that had a similar stacktrace. These Jiras were handled as duplicates of FLINK-21996 which was a bug in RPC layer with messages being swallowed. In the end, it's strange that the request wasn't completed in some way due to the {{MiniCluster}} having been shut down. 
> DistinctAggregateITCaseBase.testMultiDistinctAggOnDifferentColumn got stuck > on AZP > -- > > Key: FLINK-32751 > URL: https://issues.apache.org/jira/browse/FLINK-32751 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > This build hangs > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51955=logs=ce3801ad-3bd5-5f06-d165-34d37e757d90=5e4d9387-1dcc-5885-a901-90469b7e6d2f=14399 > {noformat} > Aug 04 03:03:47 "ForkJoinPool-1-worker-51" #28 daemon prio=5 os_prio=0 > cpu=49342.66ms elapsed=3079.49s tid=0x7f67ccdd nid=0x5234 waiting on > condition [0x7f6791a19000] > Aug 04 03:03:47java.lang.Thread.State: WAITING (parking) > Aug 04 03:03:47 at > jdk.internal.misc.Unsafe.park(java.base@11.0.19/Native Method) > Aug 04 03:03:47 - parking to wait for <0xad3b1fb8> (a > java.util.concurrent.CompletableFuture$Signaller) > Aug 04 03:03:47 at > java.util.concurrent.locks.LockSupport.park(java.base@11.0.19/LockSupport.java:194) > Aug 04 03:03:47 at > java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.19/CompletableFuture.java:1796) > Aug 04 03:03:47 at > java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.19/ForkJoinPool.java:3118) > Aug 04 03:03:47 at > java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.19/CompletableFuture.java:1823) > Aug 04 03:03:47 at > java.util.concurrent.CompletableFuture.get(java.base@11.0.19/CompletableFuture.java:1998) > Aug 04 03:03:47 at > org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sendRequest(CollectResultFetcher.java:171) > Aug 04 03:03:47 at > org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:129) > Aug 04 03:03:47 at > org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:106) > Aug 04 03:03:47 at > 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80) > Aug 04 03:03:47 at > org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:222) > Aug 04 03:03:47 at > java.util.Iterator.forEachRemaining(java.base@11.0.19/Iterator.java:132) > Aug 04 03:03:47 at > org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:122) > Aug 04 03:03:47 at > org.apache.flink.table.planner.runtime.utils.BatchTestBase.executeQuery(BatchTestBase.scala:309) > Aug 04 03:03:47 at > org.apache.flink.table.planner.runtime.utils.BatchTestBase.check(BatchTestBase.scala:145) > Aug 04 03:03:47 at > org.apache.flink.table.planner.runtime.utils.BatchTestBase.checkResult(BatchTestBase.scala:109) > Aug 04 03:03:47 at >
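The stack trace above parks in `CollectResultFetcher.sendRequest` on an unbounded `CompletableFuture.get()`, which is why a swallowed RPC response (as in the FLINK-21996 family of bugs) shows up as an infinite hang rather than an error. A minimal, self-contained sketch of how a bounded wait turns a lost response into a visible timeout — class and method names here are illustrative, not Flink's actual code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedRequest {

    // Waits for an RPC-style response with a timeout instead of parking forever.
    // Returns the response, or the fallback if it did not arrive in time.
    static String await(CompletableFuture<String> response, long timeoutMs, String fallback) {
        try {
            return response.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback; // a swallowed message surfaces here, not as a hang
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A future that is never completed stands in for a lost RPC response.
        CompletableFuture<String> lost = new CompletableFuture<>();
        System.out.println(await(lost, 100, "timed out")); // prints "timed out"

        CompletableFuture<String> ok = CompletableFuture.completedFuture("done");
        System.out.println(await(ok, 100, "timed out")); // prints "done"
    }
}
```

This does not fix the underlying RPC problem, but it illustrates why the unbounded `get()` in the collect path leaves no diagnostic trail when the Dispatcher-side request is dropped.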
[jira] [Commented] (FLINK-32755) Add quick start guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-32755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751101#comment-17751101 ] dalongliu commented on FLINK-32755: --- Big +1 > Add quick start guide for Flink OLAP > > > Key: FLINK-32755 > URL: https://issues.apache.org/jira/browse/FLINK-32755 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Priority: Major > > I propose to add a new {{QUICKSTART.md}} guide that provides instructions for > beginner to build a production ready Flink OLAP Service by using > flink-jdbc-driver, flink-sql-gateway and flink session cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] luoyuxia commented on a diff in pull request #23109: [FLINK-32475][docs] Add doc for time travel
luoyuxia commented on code in PR #23109: URL: https://github.com/apache/flink/pull/23109#discussion_r1284335758 ## docs/content/docs/dev/table/sql/queries/time-travel.md: ## @@ -0,0 +1,68 @@ +--- +title: Time Travel +weight: 18 +type: docs +--- + + +# Time Travel + +{{< label Batch >}} {{< label Streaming >}} + +Time travel is an SQL syntax used for querying historical data. It allows users to specify a point in time, query the corresponding table data, and use the schema that corresponds to that time. Review Comment: I don't think we need to refer to schema as `query the corresponding table data` also means the schema at that time. WDTY? ## docs/content/docs/dev/table/sql/queries/time-travel.md: ## @@ -0,0 +1,68 @@ +--- +title: Time Travel +weight: 18 +type: docs +--- + + +# Time Travel + +{{< label Batch >}} {{< label Streaming >}} + +Time travel is an SQL syntax used for querying historical data. It allows users to specify a point in time, query the corresponding table data, and use the schema that corresponds to that time. + +Attention Currently, `Time Travel` requires corresponding Catalog implement the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/Catalog.java" name="getTable(ObjectPath tablePath, long timestamp)" >}} method。 + +```sql +SELECT select_list FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-07-31 00:00:00' +``` + +## Expression Description + +Attention `Time Travel` currently only supports some constant expressions and does not support the use of functions or udf. 
+ +### Constant Expression + +```sql +SELECT select_list FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-07-31 00:00:00' +``` + +### Constant Expression Addition and Subtraction + +```sql +SELECT select_list FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-07-31 00:00:00' - INTERVAL '1' DAY +``` + +### Time Function or UDF (Not Supported) + +When using UDF or functions, a valid timestamp cannot be generated due to limitations of the current framework, and an exception will be thrown when executing the following query. + +```sql +SELECT select_list FROM paimon_tb FOR SYSTEM_TIME AS OF TO_TIMESTAMP_LTZ(0, 3) +``` + +## Time Zone Handling + +By default, the data type generated by the TIMESTAMP expression should be TIMESTAMP type, while the Review Comment: Do we really need the implement details as it just a doc to user. I think we just should descibe the behavior for time travel with time zone handling. ## docs/content/docs/dev/table/sql/queries/time-travel.md: ## @@ -0,0 +1,68 @@ +--- +title: Time Travel +weight: 18 +type: docs +--- + + +# Time Travel + +{{< label Batch >}} {{< label Streaming >}} + +Time travel is an SQL syntax used for querying historical data. It allows users to specify a point in time, query the corresponding table data, and use the schema that corresponds to that time. 
+ +Attention Currently, `Time Travel` requires corresponding Catalog implement the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/Catalog.java" name="getTable(ObjectPath tablePath, long timestamp)" >}} method。 Review Comment: ```suggestion Attention Currently, `Time Travel` requires the corresponding catalog that the table belongs to implement the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/Catalog.java" name="getTable(ObjectPath tablePath, long timestamp)" >}} method。 ``` ## docs/content.zh/docs/dev/table/sql/queries/time-travel.md: ## @@ -0,0 +1,64 @@ +--- +title: 时间旅行 +type: docs +--- + + +# 时间旅行 Review Comment: Some comments in en doc are also suitable for zh doc. ## docs/content/docs/dev/table/sql/queries/time-travel.md: ## @@ -0,0 +1,68 @@ +--- +title: Time Travel +weight: 18 +type: docs +--- + + +# Time Travel + +{{< label Batch >}} {{< label Streaming >}} + +Time travel is an SQL syntax used for querying historical data. It allows users to specify a point in time, query the corresponding table data, and use the schema that corresponds to that time. + +Attention Currently, `Time Travel` requires corresponding Catalog implement the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/Catalog.java" name="getTable(ObjectPath tablePath, long timestamp)" >}} method。 Review Comment: ```suggestion Attention Currently, `Time Travel` requires the corresponding catalog that the table belongs to implement the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/Catalog.java" name="getTable(ObjectPath tablePath, long timestamp)" >}} method。 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:
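The catalog contract the doc refers to — resolving a table as of a point in time — boils down to finding the newest snapshot not later than the requested timestamp. A self-contained sketch of that lookup; the `TimeTravelCatalog` class, string "schema" values, and plain-string table names are stand-ins for illustration, not Flink's actual `Catalog` API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

public class TimeTravelCatalog {
    // table name -> (snapshot timestamp -> schema/metadata at that time)
    private final Map<String, TreeMap<Long, String>> snapshots = new HashMap<>();

    void register(String table, long timestamp, String schema) {
        snapshots.computeIfAbsent(table, k -> new TreeMap<>()).put(timestamp, schema);
    }

    // Analogue of Catalog#getTable(ObjectPath, long): resolve the table as of
    // `timestamp`, i.e. the newest snapshot not later than the requested time.
    String getTable(String table, long timestamp) {
        TreeMap<Long, String> versions = snapshots.get(table);
        Map.Entry<Long, String> entry = versions == null ? null : versions.floorEntry(timestamp);
        if (entry == null) {
            throw new NoSuchElementException("no snapshot of " + table + " at or before " + timestamp);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        TimeTravelCatalog catalog = new TimeTravelCatalog();
        catalog.register("paimon_tb", 100L, "schema_v1");
        catalog.register("paimon_tb", 200L, "schema_v2");
        System.out.println(catalog.getTable("paimon_tb", 150L)); // prints "schema_v1"
        System.out.println(catalog.getTable("paimon_tb", 250L)); // prints "schema_v2"
    }
}
```

The floor lookup also makes the "constant expression only" restriction intuitive: the planner must reduce `FOR SYSTEM_TIME AS OF ...` to a single long before it can call the catalog.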
[jira] [Commented] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
[ https://issues.apache.org/jira/browse/FLINK-32754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751098#comment-17751098 ] Yun Tang commented on FLINK-32754: -- I think this problem still existed, and [~ruanhang1993] could you please take a look? > Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE > -- > > Key: FLINK-32754 > URL: https://issues.apache.org/jira/browse/FLINK-32754 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.17.0, 1.17.1 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-08-04-18-28-05-897.png > > > We registered some metrics in the `enumerator` of the flip-27 source via > `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in > JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. > {*}Meanwhile, the task does not experience failover, and the Checkpoints > cannot be successfully created even after the task is in running state{*}. > We found that the implementation class of `SplitEnumerator` is > `LazyInitializedCoordinatorContext`, however, the metricGroup() is > initialized after calling lazyInitialize(). By reviewing the code, we found > that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() > has not been called yet, so NPE is thrown. > *Q: Why does this bug prevent the task from creating the Checkpoint?* > `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the > member variable `enumerator` in `SourceCoordinator` being null. > Unfortunately, all Checkpoint-related calls in `SourceCoordinator` are called > via `runInEventLoop()`. > In `runInEventLoop()`, if the enumerator is null, it will return directly. 
> *Q: Why this bug doesn't trigger a task failover?* > In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if > `internalCoordinator.resetToCheckpoint` throws an exception, then it will > catch the exception and call `cleanAndFailJob ` to try to fail the job. > However, `globalFailureHandler` is also initialized in `lazyInitialize()`, > while `schedulerExecutor.execute` will ignore the NPE triggered by > `globalFailureHandler.handleGlobalFailure(e)`. > Thus it appears that the task did not failover. > !image-2023-08-04-18-28-05-897.png|width=963,height=443! -- This message was sent by Atlassian Jira (v8.20.10#820010)
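The failure mode described above — `metricGroup()` returning null because `lazyInitialize()` has not run yet at `resetToCheckpoint()` time — can be modeled with a small sketch. All names here are simplified stand-ins for the Flink classes, and the null check is just one defensive option for user code, not the actual fix:

```java
public class LazyInitSketch {

    // Simplified stand-in for the coordinator context: metricGroup() returns
    // null until lazyInitialize() has run, mirroring the state of
    // LazyInitializedCoordinatorContext during resetToCheckpoint().
    static class Context {
        private Object metricGroup;
        void lazyInitialize() { metricGroup = new Object(); }
        Object metricGroup() { return metricGroup; }
    }

    // Defensive enumerator restore: skip (or defer) metric registration when
    // the group is not initialized yet, instead of dereferencing null.
    static String restoreEnumerator(Context ctx) {
        Object group = ctx.metricGroup();
        if (group == null) {
            return "metrics deferred";
        }
        return "metrics registered";
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        System.out.println(restoreEnumerator(ctx)); // before lazyInitialize(): "metrics deferred"
        ctx.lazyInitialize();
        System.out.println(restoreEnumerator(ctx)); // after: "metrics registered"
    }
}
```

The sketch also shows why the symptom is silent: if the restore path throws instead of deferring, the coordinator is left half-built, which matches the report's observation that checkpoints stall without a failover.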
[GitHub] [flink] luoyuxia commented on a diff in pull request #23116: [FLINK-32581][docs] Add docs for atomic CTAS and RTAS
luoyuxia commented on code in PR #23116: URL: https://github.com/apache/flink/pull/23116#discussion_r1284307429 ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -554,7 +554,13 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 * 暂不支持创建分区表。 * 暂不支持主键约束。 -**注意** 目前,CTAS 创建的目标表是非原子性的,如果在向表中插入数据时发生错误,该表不会被自动删除。 +**注意** 默认情况下,CTAS 是非原子性的,这意味着如果在向表中插入数据时发生错误,该表不会被自动删除。 + + 原子性 + +如果要启用 CTAS 的原子性,则应确保: +* Sink 已经实现了 CTAS 的原子性语义。通过阅读 Sink 的文档可以知道其是否已经支持了原子性语义。如果开发者想要实现原子性语义,请参考文档 [SupportsStaging]({{< ref "docs/dev/table/sourcesSinks" >}}#sink-abilities)。 Review Comment: ```suggestion * 对应的 Connector sink 已经实现了 CTAS 的原子性语义,你可能需要阅读对应 Connector 的文档看是否已经支持了原子性语义。如果开发者想要实现原子性语义,请参考文档 [SupportsStaging]({{< ref "docs/dev/table/sourcesSinks" >}}#sink-abilities)。 ``` ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -554,7 +554,13 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 * 暂不支持创建分区表。 * 暂不支持主键约束。 -**注意** 目前,CTAS 创建的目标表是非原子性的,如果在向表中插入数据时发生错误,该表不会被自动删除。 +**注意** 默认情况下,CTAS 是非原子性的,这意味着如果在向表中插入数据时发生错误,该表不会被自动删除。 + + 原子性 + +如果要启用 CTAS 的原子性,则应确保: +* Sink 已经实现了 CTAS 的原子性语义。通过阅读 Sink 的文档可以知道其是否已经支持了原子性语义。如果开发者想要实现原子性语义,请参考文档 [SupportsStaging]({{< ref "docs/dev/table/sourcesSinks" >}}#sink-abilities)。 +* 设置配置项 `table.rtas-ctas.atomicity-enabled` 为 `true`。 Review Comment: Add a link for `table.rtas-ctas.atomicity-enabled` ## docs/content/docs/dev/table/sourcesSinks.md: ## @@ -344,6 +344,11 @@ that a sink can still work on common data structures and perform a conversion at and consuming them to achieve the purpose of row(s) update. + +{{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/abilities/SupportsStaging.java" name="SupportsStaging" >}} +Enables to supports atomic semantic for CTAS or RTAS in a DynamicTableSink. The table sink is responsible for return `StagedTable` object that provide atomic semantics. 
Review Comment: ```suggestion Enables to support atomic semantic for CTAS(CREATE TABLE AS) or RTAS([CREATE OR] REPLACE TABLE AS SELECT) in a DynamicTableSink. The table sink is responsible for returning StagedTable object that provide atomic semantics. ``` ## docs/content.zh/docs/dev/table/sourcesSinks.md: ## @@ -268,15 +268,19 @@ Flink 会对工厂类逐个进行检查,确保其“标识符”是全局唯 {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/abilities/SupportsDeletePushDown.java" name="SupportsDeletePushDown" >}} -支持将 DELETE 语句中的过滤条件下推到 DynamicTableSink,sink 端可以直接根据过滤条件来删除数据。 +支持将 DELETE 语句中的过滤条件下推到 DynamicTableSink,sink 端可以直接根据过滤条件来删除数据。 {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/abilities/SupportsRowLevelDelete.java" name="SupportsRowLevelDelete" >}} -支持 DynamicTableSink 根据行级别的变更来删除已有的数据。该接口的实现者需要告诉 Planner 如何产生这些行变更,并且需要消费这些行变更从而达到删除数据的目的。 +支持 DynamicTableSink 根据行级别的变更来删除已有的数据。该接口的实现者需要告诉 Planner 如何产生这些行变更,并且需要消费这些行变更从而达到删除数据的目的。 {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/abilities/SupportsRowLevelUpdate.java" name="SupportsRowLevelUpdate" >}} -支持 DynamicTableSink 根据行级别的变更来更新已有的数据。该接口的实现者需要告诉 Planner 如何产生这些行变更,并且需要消费这些行变更从而达到更新数据的目的。 +支持 DynamicTableSink 根据行级别的变更来更新已有的数据。该接口的实现者需要告诉 Planner 如何产生这些行变更,并且需要消费这些行变更从而达到更新数据的目的。 + + +{{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/abilities/SupportsStaging.java" name="SupportsStaging" >}} +支持 DynamicTableSink 提供 CTAS 或 RTAS 的原子性语义。该接口的实现者需要返回一个提供原子性语义实现的 `StagedTable` 对象。 Review Comment: ```suggestion 支持 DynamicTableSink 提供 CTAS(CREATE TABLE AS) 或 RTAS([CREATE OR] REPLACE TABLE AS SELECT) 的原子性语义。该接口的实现者需要返回一个提供原子性语义实现的 StagedTable 对象。 ``` ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -608,6 +614,14 @@ INSERT INTO my_rtas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 * 暂不支持创建分区表。 * 暂不支持主键约束。 +**注意** 默认情况下,RTAS 
是非原子性的,这意味着如果在向表中插入数据时发生错误,该表不会被自动删除或还原成原来的表。 Review Comment: nit ```suggestion **注意:** 默认情况下,RTAS 是非原子性的,这意味着如果在向表中插入数据时发生错误,该表不会被自动删除或还原成原来的表。 ``` ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -554,7 +554,13 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 * 暂不支持创建分区表。 * 暂不支持主键约束。 -**注意** 目前,CTAS 创建的目标表是非原子性的,如果在向表中插入数据时发生错误,该表不会被自动删除。 +**注意** 默认情况下,CTAS 是非原子性的,这意味着如果在向表中插入数据时发生错误,该表不会被自动删除。 Review Comment: nit ```suggestion **注意:** 默认情况下,CTAS 是非原子性的,这意味着如果在向表中插入数据时发生错误,该表不会被自动删除。 ``` ## docs/content/docs/dev/table/sql/create.md: ## @@ -609,6 +615,14
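The atomicity being documented rests on a stage-then-commit pattern: rows only become visible if the whole insert succeeds. A self-contained sketch of that idea, using a plain map as a stand-in for the catalog — the names and the `StagedTable` shape here are illustrative, not Flink's actual `SupportsStaging`/`StagedTable` API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AtomicCtasSketch {

    // Rows are buffered in a staging area and only published to the "catalog"
    // on commit; on abort they are discarded, so a failed CTAS leaves no table.
    static class StagedTable {
        final List<String> staged = new ArrayList<>();
        void write(String row) { staged.add(row); }
    }

    static boolean ctas(Map<String, List<String>> catalog, String table,
                        List<String> rows, boolean failMidway) {
        StagedTable staging = new StagedTable();
        try {
            for (String row : rows) {
                if (failMidway && staging.staged.size() == 1) {
                    throw new RuntimeException("insert failed");
                }
                staging.write(row);
            }
            catalog.put(table, new ArrayList<>(staging.staged)); // commit: publish atomically
            return true;
        } catch (RuntimeException e) {
            staging.staged.clear(); // abort: nothing was ever visible
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> catalog = new HashMap<>();
        System.out.println(ctas(catalog, "t1", List.of("a", "b"), false)); // prints "true"
        System.out.println(catalog.containsKey("t1"));                     // prints "true"
        System.out.println(ctas(catalog, "t2", List.of("a", "b"), true));  // prints "false"
        System.out.println(catalog.containsKey("t2"));                     // prints "false"
    }
}
```

Contrast with the non-atomic default described in the docs, where the table is created first and a mid-insert failure leaves it behind.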
[jira] [Commented] (FLINK-32667) Use standalone store and embedding writer for jobs with no-restart-strategy in session cluster
[ https://issues.apache.org/jira/browse/FLINK-32667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751092#comment-17751092 ] Fang Yong commented on FLINK-32667: --- [~mapohl] I think performance testing is a good suggestion; we also discussed this and created FLINK-25356 at the beginning. I will consider how to add an e2e benchmark to the flink-benchmarks project, and I think we should add micro benchmarks for the primary issue. Unfortunately, there is currently no ML discussion about short-living jobs in Flink. We only discussed this with [~xtsong] offline when we created FLINK-25318, but I strongly agree with you that we need to initiate a broader discussion on the community dev ML. I will collect our practical experiences and start a discussion on the ML later. Thank you very much for your valuable suggestion! > Use standalone store and embedding writer for jobs with no-restart-strategy > in session cluster > -- > > Key: FLINK-32667 > URL: https://issues.apache.org/jira/browse/FLINK-32667 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Fang Yong >Assignee: Fang Yong >Priority: Major > Labels: pull-request-available > > When a flink session cluster use zk or k8s high availability service, it will > store jobs in zk or ConfigMap. When we submit flink olap jobs to the session > cluster, they always turn off restart strategy. These jobs with > no-restart-strategy should not be stored in zk or ConfigMap in k8s -- This message was sent by Atlassian Jira (v8.20.10#820010)

[jira] [Commented] (FLINK-32755) Add quick start guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-32755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751088#comment-17751088 ] Benchao Li commented on FLINK-32755: +1 > Add quick start guide for Flink OLAP > > > Key: FLINK-32755 > URL: https://issues.apache.org/jira/browse/FLINK-32755 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Priority: Major > > I propose to add a new {{QUICKSTART.md}} guide that provides instructions for > beginner to build a production ready Flink OLAP Service by using > flink-jdbc-driver, flink-sql-gateway and flink session cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32756) Reuse ZK CuratorFramework when submitting OLAP jobs to Flink session cluster
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Component/s: Client / Job Submission > Reuse ZK CuratorFramework when submitting OLAP jobs to Flink session cluster > > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32756) Reuse ZK CuratorFramework when submitting OLAP jobs to Flink session cluster
xiangyu feng created FLINK-32756: Summary: Reuse ZK CuratorFramework when submitting OLAP jobs to Flink session cluster Key: FLINK-32756 URL: https://issues.apache.org/jira/browse/FLINK-32756 Project: Flink Issue Type: Sub-task Reporter: xiangyu feng -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32755) Add quick start guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-32755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32755: - Description: I propose to add a new {{QUICKSTART.md}} guide that provides instructions for beginners to build a production-ready Flink OLAP Service by using flink-jdbc-driver, flink-sql-gateway and a flink session cluster. > Add quick start guide for Flink OLAP > > > Key: FLINK-32755 > URL: https://issues.apache.org/jira/browse/FLINK-32755 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Priority: Major > > I propose to add a new {{QUICKSTART.md}} guide that provides instructions for > beginners to build a production-ready Flink OLAP Service by using > flink-jdbc-driver, flink-sql-gateway and a flink session cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32755) Add quick start guide for Flink OLAP
xiangyu feng created FLINK-32755: Summary: Add quick start guide for Flink OLAP Key: FLINK-32755 URL: https://issues.apache.org/jira/browse/FLINK-32755 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: xiangyu feng -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32701) Potential Memory Leak in Flink CEP due to Persistent Starting States in NFAState
[ https://issues.apache.org/jira/browse/FLINK-32701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751083#comment-17751083 ] Puneet Duggal commented on FLINK-32701: --- [~martijnvisser] [~Juntao Hu] [~nicholasjiang] During the execution of a job, two primary sources of memory leaks have been identified in the CEP Operator: # {{NFAState}} # {{SharedBuffer.eventsCount}}. Implementations to resolve these memory leaks are as follows: *NFAState Leak Resolution* The {{NFAState}} keyed state in the CEP Operator contains two states: * {{Queue<ComputationState> partialMatches}} * {{Queue<ComputationState> completedMatches}} Despite all events for a key being processed (either matched or timed out), {{partialMatches}} still retains the starting state for that key. This occurs for every key encountered by CEP throughout the job execution, leading to a memory leak. To mitigate this, a check has been introduced: once all matches have been processed and the time for all states has advanced based on the watermark, the {{NFAState}} is cleared if {{completedMatches}} is empty and {{partialMatches}} contains only a single state (the starting state).
{code:java}
// STEP 4
updateNFA(nfaState);
// In order to remove dangling partial matches
if (nfaState.getPartialMatches().size() == 1
        && nfaState.getCompletedMatches().isEmpty()) {
    computationStates.clear();
}
{code}
The applied fix has been tested with the existing set of Flink unit test cases, all of which have passed. The fix has also been verified against our specific use case scenarios, and it functions as expected. *SharedBuffer.EventsCount Leak Resolution* The {{eventsCount}} map in the shared buffer maintains the mapping from timestamp to {{eventId}} for each event of a key. As the watermark surpasses the timestamp of an event, CEP removes the corresponding mappings from {{eventsCount}}. However, an empty map state for a key still consumes memory, resulting in a memory leak.
To rectify this, a check has been added: if the {{eventsCount}} map state is empty after the CEP Operator advances time (removing events and matches with a timestamp earlier than the watermark), it is cleared. This fix, upon testing, resulted in the failure of two unit test cases. These failures occurred because the tests assert a fixed number of total state writes in the CEP Operator when evaluating a pattern sequence. As expected, this number has increased because we are clearing the {{eventsCount}} map. However, when tested against our specific use case scenarios, the fix functioned correctly.
{code:java}
void advanceTime(long timestamp) throws Exception {
    Iterator<Long> iterator = eventsCount.keys().iterator();
    while (iterator.hasNext()) {
        Long next = iterator.next();
        if (next < timestamp) {
            iterator.remove();
        }
    }
    // memory leak resolution
    if (eventsCount.isEmpty()) {
        eventsCount.clear();
    }
}
{code}
Please let me know if there are any concerns or questions. > Potential Memory Leak in Flink CEP due to Persistent Starting States in > NFAState > > > Key: FLINK-32701 > URL: https://issues.apache.org/jira/browse/FLINK-32701 > Project: Flink > Issue Type: Bug > Components: Library / CEP >Affects Versions: 1.17.0, 1.17.1 >Reporter: Puneet Duggal >Priority: Critical > Attachments: Screenshot 2023-07-26 at 11.45.06 AM.png, Screenshot > 2023-07-26 at 11.50.28 AM.png > > > Our team has encountered a potential memory leak issue while working with the > Complex Event Processing (CEP) library in Flink v1.17. > h2. Context > The CEP Operator maintains a keyed state called NFAState, which holds two > queues: one for partial matches and one for completed matches. When a key is > first encountered, the CEP creates a starting computation state and stores it > in the partial matches queue.
As more events occur that match the defined > conditions (e.g., a TAKE condition), additional computation states get added > to the queue, with their specific type (normal, pending, end) depending on > the pattern sequence. > However, I have noticed that the starting computation state remains in the > partial matches queue even after the pattern sequence has been completely > matched. This is also the case for keys that have already timed out. As a > result, the state gets stored for all keys that the CEP ever encounters, > leading to a continual increase in the checkpoint size. > h2. How to reproduce this > # Pattern Sequence - A not_followed_by B within 5 mins > # Time Characteristic - EventTime > # StateBackend - FsStateBackend > On my local machine, I started this pipeline and started sending events at > the rate of 10
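The leak mechanics described in this report can be condensed into a self-contained model of the first leak: plain Java collections stand in for Flink keyed state, and all names are illustrative, not the CEP library's actual classes:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class NfaStateLeakSketch {

    // Every key the operator sees gets a starting computation state added
    // to its partial-matches queue.
    static void processKey(Map<String, Queue<String>> keyedState, String key) {
        keyedState.computeIfAbsent(key, k -> new ArrayDeque<>()).add("START");
        // ... pattern evaluation would add/remove further computation states ...
    }

    // The proposed check: drop the entry for a key once only the dangling
    // starting state remains and there are no completed matches.
    static void advanceTime(Map<String, Queue<String>> keyedState, String key,
                            boolean hasCompletedMatches) {
        Queue<String> partial = keyedState.get(key);
        if (partial != null && partial.size() == 1 && !hasCompletedMatches) {
            keyedState.remove(key); // state no longer grows with key cardinality
        }
    }

    public static void main(String[] args) {
        Map<String, Queue<String>> keyedState = new HashMap<>();
        for (int i = 0; i < 1000; i++) processKey(keyedState, "key-" + i);
        System.out.println(keyedState.size()); // 1000: one entry per key, retained forever without the fix
        for (int i = 0; i < 1000; i++) advanceTime(keyedState, "key-" + i, false);
        System.out.println(keyedState.size()); // 0 after the cleanup check
    }
}
```

The model shows why checkpoint size tracks total key cardinality rather than active matches: the starting state alone keeps each key's entry alive.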
[jira] [Updated] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
[ https://issues.apache.org/jira/browse/FLINK-32754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32754: Description: We registered some metrics in the `enumerator` of the flip-27 source via `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. {*}Meanwhile, the task does not experience failover, and the Checkpoints cannot be successfully created even after the task is in running state{*}. We found that the implementation class of `SplitEnumerator` is `LazyInitializedCoordinatorContext`, however, the metricGroup() is initialized after calling lazyInitialize(). By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so NPE is thrown. *Q: Why does this bug prevent the task from creating the Checkpoint?* `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the member variable `enumerator` in `SourceCoordinator` being null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` are called via `runInEventLoop()`. In `runInEventLoop()`, if the enumerator is null, it will return directly. *Q: Why this bug doesn't trigger a task failover?* In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, then it will catch the exception and call `cleanAndFailJob ` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, while `schedulerExecutor.execute` will ignore the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus it appears that the task did not failover. !image-2023-08-04-18-28-05-897.png|width=963,height=443! 
was: We registered some metrics in the `enumerator` of the flip-27 source via `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. Meanwhile, the task does not experience failover, and the Checkpoints cannot be successfully created even after the task is in running state. We found that the implementation class of `SplitEnumerator` is `LazyInitializedCoordinatorContext`, however, the metricGroup() is initialized after calling lazyInitialize(). By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so NPE is thrown. Q: Why does this bug prevent the task from creating the Checkpoint? `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the member variable `enumerator` in `SourceCoordinator` being null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` are called via `runInEventLoop()`. In `runInEventLoop()`, if the enumerator is null, it will return directly. Q: Why this bug doesn't trigger a task failover? In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, then it will catch the exception and call `cleanAndFailJob ` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, while `schedulerExecutor.execute` will ignore the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus it appears that the task did not failover. !image-2023-08-04-18-28-05-897.png|width=963,height=443! 
> Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE > -- > > Key: FLINK-32754 > URL: https://issues.apache.org/jira/browse/FLINK-32754 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.17.0, 1.17.1 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-08-04-18-28-05-897.png > > > We registered some metrics in the `enumerator` of the flip-27 source via > `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in > JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. > {*}Meanwhile, the task does not experience failover, and the Checkpoints > cannot be successfully created even after the task is in running state{*}. > We found that the implementation class of `SplitEnumerator` is > `LazyInitializedCoordinatorContext`, however, the metricGroup() is > initialized after calling lazyInitialize(). By reviewing the code, we found > that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() > has not been called yet, so NPE is thrown. > *Q: Why does this bug prevent the task from creating the Checkpoint?* > `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the > member variable `enumerator` in `SourceCoordinator` being null. > Unfortunately,
[jira] [Updated] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
[ https://issues.apache.org/jira/browse/FLINK-32754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32754: Description: We registered some metrics in the `enumerator` of the FLIP-27 source via `SplitEnumeratorContext.metricGroup()`, but found that the task prints NPE logs in the JM when restoring, indicating that `SplitEnumeratorContext.metricGroup()` is null. Meanwhile, the task does not fail over, and checkpoints cannot be created successfully even after the task is in the RUNNING state. We found that the implementation class behind `SplitEnumeratorContext` is `LazyInitializedCoordinatorContext`; however, its metricGroup() is only initialized after lazyInitialize() is called. By reviewing the code, we found that at the time of `SourceCoordinator.resetToCheckpoint()`, `lazyInitialize()` has not yet been called, so the NPE is thrown.

Q: Why does this bug prevent the task from creating checkpoints? `SourceCoordinator.resetToCheckpoint()` throws an NPE, which leaves the member variable `enumerator` in `SourceCoordinator` null. Unfortunately, all checkpoint-related calls in `SourceCoordinator` go through `runInEventLoop()`, which returns immediately if the enumerator is null.

Q: Why doesn't this bug trigger a task failover? In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, the exception is caught and `cleanAndFailJob` is called to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, and `schedulerExecutor.execute` swallows the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus the task appears not to fail over. !image-2023-08-04-18-28-05-897.png|width=963,height=443!

> Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
> --
>
> Key: FLINK-32754
> URL: https://issues.apache.org/jira/browse/FLINK-32754
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.17.0, 1.17.1
> Reporter: Yu Chen
> Priority: Major
> Attachments: image-2023-08-04-18-28-05-897.png
[jira] [Created] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
Yu Chen created FLINK-32754: --- Summary: Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE Key: FLINK-32754 URL: https://issues.apache.org/jira/browse/FLINK-32754 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Affects Versions: 1.17.1, 1.17.0 Reporter: Yu Chen Attachments: image-2023-08-04-18-28-05-897.png We registered some metrics in the `enumerator` of the FLIP-27 source via `SplitEnumeratorContext.metricGroup()`, but found that the task prints NPE logs in the JM when restoring, indicating that `SplitEnumeratorContext.metricGroup()` is null. Meanwhile, the task does not fail over, and checkpoints cannot be created successfully even after the task is in the RUNNING state. We found that the implementation class behind `SplitEnumeratorContext` is `LazyInitializedCoordinatorContext`; however, its metricGroup() is only initialized after lazyInitialize() is called. By reviewing the code, we found that at the time of `SourceCoordinator.resetToCheckpoint()`, `lazyInitialize()` has not yet been called, so the NPE is thrown. Q: Why does this bug prevent the task from creating checkpoints? `SourceCoordinator.resetToCheckpoint()` throws an NPE, which leaves the member variable `enumerator` in `SourceCoordinator` null. Unfortunately, all checkpoint-related calls in `SourceCoordinator` go through `runInEventLoop()`, which returns immediately if the enumerator is null. Q: Why doesn't this bug trigger a task failover? In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, the exception is caught and `cleanAndFailJob` is called to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, and `schedulerExecutor.execute` swallows the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus the task appears not to fail over. !image-2023-08-04-18-28-05-897.png|width=2442,height=1123!
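The lazy-initialization hazard described above can be sketched in isolation. This is an illustrative plain-Java model, not Flink's actual classes: a context whose metric group is only set by `lazyInitialize()`, so any caller that dereferences it before initialization (as `resetToCheckpoint()` does in the report) hits an NPE.

```java
// Illustrative sketch (hypothetical names, not Flink's implementation):
// metricGroup is null until lazyInitialize() runs.
class LazyContext {
    private Object metricGroup; // stays null until lazyInitialize()

    void lazyInitialize() {
        metricGroup = new Object();
    }

    Object metricGroup() {
        return metricGroup; // callers that dereference this too early get an NPE
    }
}

public class LazyInitSketch {
    public static void main(String[] args) {
        LazyContext ctx = new LazyContext();
        // In the reported bug, resetToCheckpoint() runs before lazyInitialize():
        try {
            ctx.metricGroup().toString(); // dereference of null -> NullPointerException
        } catch (NullPointerException e) {
            System.out.println("NPE before lazyInitialize()");
        }
        ctx.lazyInitialize();
        System.out.println(ctx.metricGroup() != null); // true after initialization
    }
}
```

The fix direction implied by the report is ordering: ensure initialization happens before any code path that touches the metric group, or guard the accessor.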
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32701) Potential Memory Leak in Flink CEP due to Persistent Starting States in NFAState
[ https://issues.apache.org/jira/browse/FLINK-32701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Puneet Duggal updated FLINK-32701: -- Description: Our team has encountered a potential memory leak issue while working with the Complex Event Processing (CEP) library in Flink v1.17.

h2. Context

The CEP operator maintains a keyed state called NFAState, which holds two queues: one for partial matches and one for completed matches. When a key is first encountered, the CEP creates a starting computation state and stores it in the partial-matches queue. As more events occur that match the defined conditions (e.g., a TAKE condition), additional computation states get added to the queue, with their specific type (normal, pending, end) depending on the pattern sequence. However, I have noticed that the starting computation state remains in the partial-matches queue even after the pattern sequence has been completely matched. This is also the case for keys that have already timed out. As a result, state is kept for every key that the CEP ever encounters, leading to a continual increase in the checkpoint size.

h2. How to reproduce this

# Pattern Sequence - A not_followed_by B within 5 mins
# Time Characteristic - EventTime
# StateBackend - FsStateBackend

On my local machine, I started this pipeline and sent events at a rate of 10 events per second (only A), and as expected, after 5 mins the CEP started emitting pattern-matched output at the same rate. The issue was that after every 2 mins (the checkpoint interval), the checkpoint size kept increasing. The expectation was that after 5 mins (2-3 checkpoints) the checkpoint size would remain constant, since any 5-minute window consists of the same number of unique keys (older ones get matched or timed out and hence are removed from state). But as the attached images show, the checkpoint size kept increasing through 40 checkpoints (around 1.5 hrs).

P.S. - After 3 checkpoints (6 mins), the checkpoint size was around 1.78 MB. Hence the assumption is that the ideal checkpoint size for a 5-minute window should be less than 1.78 MB.

After 39 checkpoints, I triggered a savepoint for this pipeline and then used the savepoint reader to investigate what is stored in the CEP state. The code below inspects the NFAState of the CEP operator for the potential memory leak.

{code:java}
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.cep.nfa.NFAState;
import org.apache.flink.cep.nfa.NFAStateSerializer;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.state.api.OperatorIdentifier;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import org.junit.jupiter.api.Test;

import java.io.Serializable;
import java.util.Objects;

public class NFAStateReaderTest {

    private static final String NFA_STATE_NAME = "nfaStateName";

    @Test
    public void testNfaStateReader() throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        SavepointReader savepointReader = SavepointReader.read(
                environment,
                "file:///opt/flink/savepoints/savepoint-093404-9bc0a38654df",
                new FsStateBackend("file:///abc"));
        DataStream stream = savepointReader.readKeyedState(
                OperatorIdentifier.forUid("select_pattern_events"),
                new NFAStateReaderTest.NFAStateReaderFunction());
        stream.print();
        environment.execute();
    }

    static class NFAStateReaderFunction extends KeyedStateReaderFunction {
        private ValueState<NFAState> computationStates;
        private static Long danglingNfaCount = 0L;
        private static Long newNfaCount = 0L;
        private static Long minTimestamp = Long.MAX_VALUE;
        private static Long minKeyForCurrentNfa = Long.MAX_VALUE;
        private static Long minKeyForDanglingNfa = Long.MAX_VALUE;
        private static Long maxKeyForDanglingNfa = Long.MIN_VALUE;
        private static Long maxKeyForCurrentNfa = Long.MIN_VALUE;

        @Override
        public void open(Configuration parameters) {
            computationStates = getRuntimeContext().getState(
                    new ValueStateDescriptor<>(NFA_STATE_NAME, new NFAStateSerializer()));
        }

        @Override
        public void readKey(DynamicTuple key, Context ctx, Collector out) throws Exception {
            NFAState nfaState = computationStates.value();
            if
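The growth pattern described in the report can be modeled with a small self-contained sketch. This is plain Java with made-up names, not Flink's NFA code: if the "start" computation state is never evicted from a per-key partial-matches queue, keyed state grows with every unique key ever seen, even after every key's match has completed or timed out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reported leak: a per-key queue of partial
// matches in which the initial "start" state is never removed.
public class NfaLeakSketch {
    static final Map<Long, Deque<String>> partialMatches = new HashMap<>();

    static void onEvent(long key) {
        // A start state is created for every newly seen key...
        partialMatches.computeIfAbsent(key, k -> {
            Deque<String> q = new ArrayDeque<>();
            q.add("start");
            return q;
        });
    }

    static void onMatchOrTimeout(long key) {
        // ...but cleanup only removes the non-start states,
        // so the per-key entry lingers forever.
        partialMatches.get(key).removeIf(s -> !s.equals("start"));
    }

    public static void main(String[] args) {
        for (long key = 0; key < 1000; key++) {
            onEvent(key);
            onMatchOrTimeout(key); // the key is "done", yet its state remains
        }
        // State size equals the number of unique keys ever seen:
        System.out.println(partialMatches.size()); // 1000
    }
}
```

Under this model, checkpoint size tracks the total number of distinct keys rather than the number of keys in the current 5-minute window, which matches the observed monotonic growth.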
[GitHub] [flink] flinkbot commented on pull request #23139: backport FLINK-29913 to release-1.17
flinkbot commented on PR #23139: URL: https://github.com/apache/flink/pull/23139#issuecomment-1665395206 ## CI report: * 47e7e3e67f61286867a5585ca92b67bc7aba4754 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-32721) agg max/min supports char type
[ https://issues.apache.org/jira/browse/FLINK-32721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751074#comment-17751074 ] dalongliu commented on FLINK-32721: --- [~jackylau] After tracking down the history and finding issue https://issues.apache.org/jira/browse/FLINK-12834, char type support was added for the Min function there. After discussing with [~lzljs3620320] offline, the reason the Max function doesn't support char type is simply an omission. So I think we should reuse the existing code, as was done for the Min function, rather than extracting a separate implementation for char alone. > agg max/min supports char type > -- > > Key: FLINK-32721 > URL: https://issues.apache.org/jira/browse/FLINK-32721 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner >Affects Versions: 1.18.0 >Reporter: Jacky Lau >Assignee: Jacky Lau >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > {code:java} > // flink > Flink SQL> CREATE TABLE Orders ( > > name char(10), > > price DECIMAL(32,2), > > buyer ROW, > > order_time TIMESTAMP(3) > > ) WITH ( > > 'connector' = 'datagen' > > ); > [INFO] Execute statement succeed. > Flink SQL> select max(name) from Orders; > [ERROR] Could not execute SQL statement. Reason: > org.apache.flink.table.api.TableException: Max aggregate function does not > support type: ''CHAR''. > Please re-check the data type. 
{code} > {code:java} > // mysql > CREATE TABLE IF NOT EXISTS `docs` ( > `id` int(6) unsigned NOT NULL, > `rev` int(3) unsigned NOT NULL, > `content` char(200) NOT NULL, > PRIMARY KEY (`id`,`rev`) > ) DEFAULT CHARSET=utf8; > INSERT INTO `docs` (`id`, `rev`, `content`) VALUES > ('1', '1', 'The earth is flat'), > ('2', '1', 'One hundred angels can dance on the head of a pin'), > ('1', '2', 'The earth is flat and rests on a bull\'s horn'), > ('1', '3', 'The earth is like a ball.'); > select max(content) from docs; > // result > |max(content)| > The earth is like a ball.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
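The MySQL result quoted above illustrates the expected semantics: MAX over character data is a plain lexicographic comparison. A minimal self-contained Java sketch of that semantics (not Flink planner code) over the same rows:

```java
import java.util.Comparator;
import java.util.List;

public class CharMaxSketch {
    public static void main(String[] args) {
        // Same `content` values as in the MySQL example above.
        List<String> content = List.of(
                "The earth is flat",
                "One hundred angels can dance on the head of a pin",
                "The earth is flat and rests on a bull's horn",
                "The earth is like a ball.");
        // MAX over CHAR/VARCHAR is character-by-character lexicographic comparison:
        // 'T' > 'O' eliminates the second row, then "like" > "flat" decides the rest.
        String max = content.stream().max(Comparator.naturalOrder()).orElseThrow();
        System.out.println(max); // The earth is like a ball.
    }
}
```

This matches the MySQL output in the issue, which is why reusing the existing string-comparison code path (as Min already does) is the natural fix.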
[GitHub] [flink] WencongLiu closed pull request #23126: Test
WencongLiu closed pull request #23126: Test URL: https://github.com/apache/flink/pull/23126 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-32557) API deprecations in Flink 1.18
[ https://issues.apache.org/jira/browse/FLINK-32557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song closed FLINK-32557. Resolution: Done > API deprecations in Flink 1.18 > -- > > Key: FLINK-32557 > URL: https://issues.apache.org/jira/browse/FLINK-32557 > Project: Flink > Issue Type: Technical Debt >Reporter: Xintong Song >Priority: Major > Fix For: 1.18.0 > > > As discussed in [1], we are deprecating multiple APIs in release 1.18, in > order to completely remove them in release 2.0. > The listed APIs possibly should have been deprecated already, i.e., already > (or won't) have replacements, but are somehow not yet. > [1] https://lists.apache.org/thread/3dw4f8frlg8hzlv324ql7n2755bzs9hy -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32558) Properly deprecate DataSet API
[ https://issues.apache.org/jira/browse/FLINK-32558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song closed FLINK-32558. Release Note: DataSet API is formally deprecated, and will be removed in the next major release. Resolution: Done master (1.18): aa98c18d2ba975479fcfa4930b0139fa575d303e > Properly deprecate DataSet API > -- > > Key: FLINK-32558 > URL: https://issues.apache.org/jira/browse/FLINK-32558 > Project: Flink > Issue Type: Sub-task > Components: API / DataSet >Reporter: Xintong Song >Assignee: Wencong Liu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > DataSet API is described as "legacy", "soft deprecated" in user documentation > [1]. The required tasks for formally deprecating / removing it, according to > FLIP-131 [2], are all completed. > This task includes marking all related API classes as `@Deprecated` and updating > the user documentation. > [1] > https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/dataset/overview/ > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] xintongsong closed pull request #23026: [FLINK-32558][flink-java] Deprecate all DataSet API
xintongsong closed pull request #23026: [FLINK-32558][flink-java] Deprecate all DataSet API URL: https://github.com/apache/flink/pull/23026 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] WencongLiu commented on pull request #23026: [FLINK-32558][flink-java] Deprecate all DataSet API
WencongLiu commented on PR #23026: URL: https://github.com/apache/flink/pull/23026#issuecomment-1665359267 The CI has passed. Thanks for the careful review. @xintongsong -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #23138: [FLINK-32753] Print JVM flags on AZP
flinkbot commented on PR #23138: URL: https://github.com/apache/flink/pull/23138#issuecomment-1665316557 ## CI report: * 8b118cc7dfa52c52038f98ea0ca67f3bf9221f34 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32753) Print JVM flags on AZP
[ https://issues.apache.org/jira/browse/FLINK-32753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zakelly Lan updated FLINK-32753: Description: I suggest printing JVM flags before the tests run, which could help investigate test failures (especially memory- or GC-related issues). An example of pipeline output [here|https://dev.azure.com/lzq82555906/flink-for-Zakelly/_build/results?buildId=122=logs=9dc1b5dc-bcfa-5f83-eaa7-0cb181ddc267=511d2595-ec54-5ab7-86ce-92f328796f20=165]. You may search for 'JVM information' in this log. (was: I suggest printing JVM flags before the tests run, which could help investigate test failures (especially memory- or GC-related issues). An example: [https://dev.azure.com/lzq82555906/42941eae-0335-452a-89c5-bd6a71809dc9/_apis/build/builds/122/logs/157] . You may search for 'JVM information' in this log.) > Print JVM flags on AZP > -- > > Key: FLINK-32753 > URL: https://issues.apache.org/jira/browse/FLINK-32753 > Project: Flink > Issue Type: Improvement > Components: Build System / Azure Pipelines >Reporter: Zakelly Lan >Assignee: Zakelly Lan >Priority: Minor > Labels: pull-request-available > Fix For: 1.18.0 > > > I suggest printing JVM flags before the tests run, which could help > investigate test failures (especially memory- or GC-related issues). An > example of pipeline output > [here|https://dev.azure.com/lzq82555906/flink-for-Zakelly/_build/results?buildId=122=logs=9dc1b5dc-bcfa-5f83-eaa7-0cb181ddc267=511d2595-ec54-5ab7-86ce-92f328796f20=165]. > You may search for 'JVM information' in this log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32753) Print JVM flags on AZP
[ https://issues.apache.org/jira/browse/FLINK-32753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32753: --- Labels: pull-request-available (was: ) > Print JVM flags on AZP > -- > > Key: FLINK-32753 > URL: https://issues.apache.org/jira/browse/FLINK-32753 > Project: Flink > Issue Type: Improvement > Components: Build System / Azure Pipelines >Reporter: Zakelly Lan >Assignee: Zakelly Lan >Priority: Minor > Labels: pull-request-available > Fix For: 1.18.0 > > > I suggest printing JVM flags before the tests run, which could help > investigate the test failures (especially memory or GC related issue). An > example: > [https://dev.azure.com/lzq82555906/42941eae-0335-452a-89c5-bd6a71809dc9/_apis/build/builds/122/logs/157] > . You may search 'JVM information' in this log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] Zakelly opened a new pull request, #23138: [FLINK-32753] Print JVM flags on AZP
Zakelly opened a new pull request, #23138: URL: https://github.com/apache/flink/pull/23138 ## What is the purpose of the change JVM flags could help investigate test failures (especially memory- or GC-related issues). This PR prints the JVM flags before the tests run. An example of pipeline output here: https://dev.azure.com/lzq82555906/42941eae-0335-452a-89c5-bd6a71809dc9/_apis/build/builds/122/logs/157 . You may search for 'JVM information' in this log. ## Brief change log Add a JVM-information printing section in `print_system_info` of the pipeline scripts. ## Verifying this change This change is a pipeline script update without any test coverage. Check the AZP log to see the printed output. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
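The actual change above is a shell addition to the Azure pipeline scripts, but the kind of JVM information it surfaces can also be inspected from Java itself. A small self-contained sketch (illustrative only, not the PR's code) using the standard `java.lang.management` API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

// Prints JVM details similar to what the pipeline change exposes before
// tests run: the VM name/version and the flags the JVM was started with.
public class JvmInfoSketch {
    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        System.out.println(runtime.getVmName() + " " + runtime.getVmVersion());
        // Input arguments include -Xmx, -XX:... and other startup flags, if any.
        List<String> jvmArgs = runtime.getInputArguments();
        jvmArgs.forEach(System.out::println);
    }
}
```

Having these values in the CI log makes memory- or GC-related test failures much easier to diagnose after the fact.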
[jira] [Assigned] (FLINK-32753) Print JVM flags on AZP
[ https://issues.apache.org/jira/browse/FLINK-32753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zakelly Lan reassigned FLINK-32753: --- Assignee: Zakelly Lan > Print JVM flags on AZP > -- > > Key: FLINK-32753 > URL: https://issues.apache.org/jira/browse/FLINK-32753 > Project: Flink > Issue Type: Improvement > Components: Build System / Azure Pipelines >Reporter: Zakelly Lan >Assignee: Zakelly Lan >Priority: Minor > Fix For: 1.18.0 > > > I suggest printing JVM flags before the tests run, which could help > investigate the test failures (especially memory or GC related issue). An > example: > [https://dev.azure.com/lzq82555906/42941eae-0335-452a-89c5-bd6a71809dc9/_apis/build/builds/122/logs/157] > . You may search 'JVM information' in this log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32752) Release Testing: Verify FLINK-24909 SQL syntax highlighting in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-32752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Nuyanzin updated FLINK-32752: Description: After FLINK-24909 it is possible to specify a syntax highlighting color scheme, as mentioned in the docs, via the {{sql-client.display.color-schema}} config option {code:sql} SET 'sql-client.display.color-schema' = ... {code} Possible values are {{chester}}, {{dracula}}, {{solarized}}, {{vs2010}}, {{obsidian}}, {{geshi}}, {{default}}. It allows highlighting of keywords, quoted text, quoted SQL identifiers (backticks for the default dialect and double quotes for Hive), comments (both one-line and block comments), and hints. was: After it is possible to specify syntax highlight color schema as mentioned in doc via {{sql-client.display.color-schema}} config option {code:sql} SET 'sql-client.display.color-schema' = ... {code} Possible values are {{chester}}, {{dracula}}, {{solarized}}, {{vs2010}}, {{obsidian}}, {{geshi}}, {{default}}. It allows to highlight keywords, quoted text, sql identifiers quoted text (ticks for default dialect and double quotes for Hive), comments (both one-line and block comments), hints > Release Testing: Verify FLINK-24909 SQL syntax highlighting in SQL Client > - > > Key: FLINK-32752 > URL: https://issues.apache.org/jira/browse/FLINK-32752 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Sergey Nuyanzin >Priority: Major > Labels: release-testing > > After FLINK-24909 it is possible to specify a syntax highlighting color scheme, as > mentioned in the docs, via the > {{sql-client.display.color-schema}} config option > {code:sql} > SET 'sql-client.display.color-schema' = ... > {code} > Possible values are {{chester}}, {{dracula}}, {{solarized}}, {{vs2010}}, > {{obsidian}}, {{geshi}}, {{default}}.
> It allows highlighting of keywords, quoted text, quoted SQL identifiers > (backticks for the default dialect and double quotes for Hive), comments (both > one-line and block comments), and hints. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32753) Print JVM flags on AZP
Zakelly Lan created FLINK-32753: --- Summary: Print JVM flags on AZP Key: FLINK-32753 URL: https://issues.apache.org/jira/browse/FLINK-32753 Project: Flink Issue Type: Improvement Components: Build System / Azure Pipelines Reporter: Zakelly Lan Fix For: 1.18.0 I suggest printing JVM flags before the tests run, which could help investigate test failures (especially memory- or GC-related issues). An example: [https://dev.azure.com/lzq82555906/42941eae-0335-452a-89c5-bd6a71809dc9/_apis/build/builds/122/logs/157] . You may search for 'JVM information' in this log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32752) Release Testing: Verify FLINK-24909 SQL syntax highlighting in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-32752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32752: -- Labels: release-testing (was: ) > Release Testing: Verify FLINK-24909 SQL syntax highlighting in SQL Client > - > > Key: FLINK-32752 > URL: https://issues.apache.org/jira/browse/FLINK-32752 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Sergey Nuyanzin >Priority: Major > Labels: release-testing > > After it is possible to specify syntax highlight color schema as mentioned in > doc via > {{sql-client.display.color-schema}} config option > {code:sql} > SET 'sql-client.display.color-schema' = ... > {code} > Possible values are {{chester}}, {{dracula}}, {{solarized}}, {{vs2010}}, > {{obsidian}}, {{geshi}}, {{default}}. > It allows to highlight keywords, quoted text, sql identifiers quoted text > (ticks for default dialect and double quotes for Hive), comments (both > one-line and block comments), hints -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32752) Release Testing: Verify FLINK-24909 SQL syntax highlighting in SQL Client
Sergey Nuyanzin created FLINK-32752: --- Summary: Release Testing: Verify FLINK-24909 SQL syntax highlighting in SQL Client Key: FLINK-32752 URL: https://issues.apache.org/jira/browse/FLINK-32752 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Sergey Nuyanzin After FLINK-24909 it is possible to specify a syntax highlighting color scheme, as mentioned in the docs, via the {{sql-client.display.color-schema}} config option {code:sql} SET 'sql-client.display.color-schema' = ... {code} Possible values are {{chester}}, {{dracula}}, {{solarized}}, {{vs2010}}, {{obsidian}}, {{geshi}}, {{default}}. It allows highlighting of keywords, quoted text, quoted SQL identifiers (backticks for the default dialect and double quotes for Hive), comments (both one-line and block comments), and hints. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32667) Use standalone store and embedding writer for jobs with no-restart-strategy in session cluster
[ https://issues.apache.org/jira/browse/FLINK-32667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751048#comment-17751048 ] Matthias Pohl commented on FLINK-32667: --- It would be great to have these performance tests set up before merging the changes. That would give us more assurance that working on those issues is reasonable. I'm also wondering whether there was a broader discussion on the ML about adding these short-lived jobs to the scope of Apache Flink. Do you have a link to that discussion? > Use standalone store and embedding writer for jobs with no-restart-strategy > in session cluster > -- > > Key: FLINK-32667 > URL: https://issues.apache.org/jira/browse/FLINK-32667 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Fang Yong >Assignee: Fang Yong >Priority: Major > Labels: pull-request-available > > When a Flink session cluster uses the ZooKeeper or Kubernetes high-availability service, it > stores jobs in ZooKeeper or a ConfigMap. When we submit Flink OLAP jobs to the session > cluster, they always turn off the restart strategy. These jobs with > no restart strategy should not be stored in ZooKeeper or in a ConfigMap in Kubernetes -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32656) Deprecate ManagedTable related APIs
[ https://issues.apache.org/jira/browse/FLINK-32656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song closed FLINK-32656. Release Note: ManagedTable related APIs are deprecated and will be removed in a future major release. Resolution: Done master (1.18): 34729c8891448b8f0a96dbbc12603b44a6e130c5 > Deprecate ManagedTable related APIs > --- > > Key: FLINK-32656 > URL: https://issues.apache.org/jira/browse/FLINK-32656 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.18.0 >Reporter: Jane Chan >Assignee: Jane Chan >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > Please refer to [FLIP-346: Deprecate ManagedTable related > APIs|https://cwiki.apache.org/confluence/display/FLINK/FLIP-346%3A+Deprecate+ManagedTable+related+APIs] > for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] xintongsong commented on pull request #23135: [FLINK-32741]Remove DataSet related descriptions in doc
xintongsong commented on PR #23135: URL: https://github.com/apache/flink/pull/23135#issuecomment-1665254926 Thanks, @pegasas. I think https://github.com/apache/flink/blob/master/docs/content.zh/docs/dev/dataset/hadoop_compatibility.md should be removed. Actually, this file only exists for `content.zh` but not for `content`. If you check the git history, you'll find that the Hadoop-format-related documents were moved to `connectors/datastream/formats` in FLINK-21407, where the Chinese translation was overlooked. I'll review the other changes asap. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-30629) ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat is unstable
[ https://issues.apache.org/jira/browse/FLINK-30629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-30629: -- Fix Version/s: (was: 1.17.2) > ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat is unstable > - > > Key: FLINK-30629 > URL: https://issues.apache.org/jira/browse/FLINK-30629 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Affects Versions: 1.17.0, 1.18.0 >Reporter: Xintong Song >Assignee: Liu >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.18.0 > > Attachments: ClientHeartbeatTestLog.txt, > logs-cron_azure-test_cron_azure_core-1685497478.zip > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44690=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=10819 > {code:java} > Jan 11 04:32:39 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 21.02 s <<< FAILURE! - in > org.apache.flink.client.ClientHeartbeatTest > Jan 11 04:32:39 [ERROR] > org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat > Time elapsed: 9.157 s <<< ERROR! > Jan 11 04:32:39 java.lang.IllegalStateException: MiniCluster is not yet > running or has already been shut down. 
> Jan 11 04:32:39 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniCluster.getDispatcherGatewayFuture(MiniCluster.java:1044) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:917) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniCluster.getJobStatus(MiniCluster.java:841) > Jan 11 04:32:39 at > org.apache.flink.runtime.minicluster.MiniClusterJobClient.getJobStatus(MiniClusterJobClient.java:91) > Jan 11 04:32:39 at > org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat(ClientHeartbeatTest.java:79) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] WencongLiu commented on a diff in pull request #22341: [FLINK-27204][flink-runtime] Refract FileSystemJobResultStore to execute I/O operations on the ioExecutor
WencongLiu commented on code in PR #22341: URL: https://github.com/apache/flink/pull/22341#discussion_r1284129028 ## flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java: ## @@ -516,7 +517,7 @@ public CompletableFuture submitJob(JobGraph jobGraph, Time timeout) try { if (isDuplicateJob(jobGraph.getJobID())) { -if (isInGloballyTerminalState(jobGraph.getJobID())) { +if (isInGloballyTerminalState(jobGraph.getJobID()).get()) { Review Comment: Fixed. I've fixed it with the async style. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
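The review point above — replacing a blocking `CompletableFuture.get()` with asynchronous composition so the caller's thread is never stalled on an I/O-bound lookup — can be sketched in isolation. The names below are illustrative, not the actual `Dispatcher` code:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncStyleSketch {
    // Stand-in for a result-store lookup that completes on another thread
    // (in the real code this is backed by I/O on the ioExecutor).
    static CompletableFuture<Boolean> isInGloballyTerminalState(String jobId) {
        return CompletableFuture.supplyAsync(() -> jobId.endsWith("-done"));
    }

    static CompletableFuture<String> submitJob(String jobId) {
        // Blocking style (what the review asked to avoid):
        //   if (isInGloballyTerminalState(jobId).get()) { ... }  // stalls the caller
        // Async style: compose the follow-up decision onto the future instead.
        return isInGloballyTerminalState(jobId)
                .thenApply(terminal -> terminal ? "DUPLICATE_TERMINAL" : "SUBMITTED");
    }

    public static void main(String[] args) {
        System.out.println(submitJob("job-1").join());      // SUBMITTED
        System.out.println(submitJob("job-2-done").join()); // DUPLICATE_TERMINAL
    }
}
```

The design point is that the submission path stays non-blocking end to end: the terminal-state check runs on the executor that owns the I/O, and only the continuation runs once the answer is available.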