flink????????????????????

2021-12-01 Thread ????

Issues in Batch Jobs Submission for a Session Cluster

2021-12-01 Thread Ghiya, Jay (GE Healthcare)
Hi Flink Team, Greetings from GE Healthcare team. Here is a stackoverflow post for the same too posted by fellow dev here : https://stackoverflow.com/questions/70068336/flink-job-not-getting-submitted-java-io-ioexception-cannot-allocate-memory Summary of the post: Here is the usecase and

Re:Re: 关于streamFileSink在checkpoint下生成文件问题

2021-12-01 Thread 黄志高
| 32684 | COMPLETED | 8/8 | 13:52:36 | 13:52:38 | 2s | 126 KB | 0 B | | | 32683 | COMPLETED | 8/8 | 13:42:36 | 13:42:39 | 2s | 126 KB | 0 B | | | 32682 | COMPLETED | 8/8 | 13:32:36 | 13:32:39 | 2s | 126 KB | 0 B | | | 32681 | COMPLETED | 8/8 | 13:22:36 | 13:22:39 | 2s | 125 KB | 0 B | | |

ERROR org.apache.flink.shaded.curator4.org.apache.curator.ConnectionState - Authentication failed

2021-12-01 Thread summer
在我CDH6.3.2集成Flink1.13.3的时候,在执行flink-sql的时候,在日志中会出现这个报错: ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal

ERROR org.apache.flink.shaded.curator4.org.apache.curator.ConnectionState - Authentication failed

2021-12-01 Thread summer
在我CDH6.3.2集成Flink1.13.3的时候,在执行flink-sql的时候,在日志中会出现这个报错: ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal

Re:Re: flink remote shuffle example运行出错

2021-12-01 Thread casel.chen
sorry, 应该是我本地切换FLINK_HOME没生效导致的,重新开启一个terminal新环境变量生效后重新操作就正常了,打扰各位了 在 2021-12-02 10:35:53,"weijie guo" 写道: >你好,你是在flink standalone模式下提交的作业吗,另外用的flink是官网download的Apache Flink 1.14.0 >for Scala 2.12

Re: 关于streamFileSink在checkpoint下生成文件问题

2021-12-01 Thread Caizhi Weng
Hi! 邮件里看不到图片和附件,建议使用外部图床。 partFile 文件是不是以英文句点开头的?这是因为 streamingFileSink 写文件的时候还没做 checkpoint,为了保证 exactly once,这些临时写下的 .partFile 文件都是不可见的,需要等 checkpoint 之后才会重命名成可见的文件。 黄志高 于2021年12月1日周三 下午9:53写道: > hi,各位大佬,咨询个问题 > > >

Re: 谁能解释一下 GlobalStreamExchangeMode 这几种交换模式的不同和使用场景吗?

2021-12-01 Thread Yingjie Cao
这个是可以直接控制内部连边的方式,可以参考一下这个的Java doc。不过这个是一个内部接口,还是建议使用 env.setRuntimeMode(RuntimeExecutionMode.BATCH),这个可以参考一下这个文档: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/execution_mode/ 。 public enum GlobalStreamExchangeMode { /** Set all job edges to be {@link

Re: flink remote shuffle example运行出错

2021-12-01 Thread Yingjie Cao
看起来flink-streaming-java应该不在class path下面,如果是官网下的flink,直接在FLINK_HOME下执行./bin/flink run提交job应该就不会出错。另外这个错误我理解本身和remote shuffle应该关系不大,不用remote shuffle,应该也会抛。建议是从官网下载一个Flink版本再试一下。

Re: flink remote shuffle example运行出错

2021-12-01 Thread weijie guo
你好,你是在flink standalone模式下提交的作业吗,另外用的flink是官网download的Apache Flink 1.14.0 for Scala 2.12 吗 casel.chen 于2021年12月2日周四 上午8:12写道: > 按照 https://github.com/flink-extended/flink-remote-shuffle 上的指南试着运行flink >

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-12-01 Thread Hang Ruan
Sorry, I spell it wrong, which I mean the PR. Here it is https://github.com/apache/flink/pull/17276 . Marco Villalobos 于2021年12月1日周三 下午9:18写道: > Thank you. One last question. What is an RP? Where can I read it? > > Marco > > On Nov 30, 2021, at 11:06 PM, Hang Ruan wrote: > > Hi, > > In

Re: flink sql collect函数使用问题

2021-12-01 Thread cyril cui
af里acc为个list,merge的时候合并,输出的时候 list拼成string即可 casel.chen 于2021年12月2日周四 上午9:46写道: > 使用场景如下,将kafka源表通过flink sql处理成mongodb汇表存入。按照班级进行group > by,输出对应班级所有的学生数据集合。请问用flink sql自带的collect函数能实现吗?如果能的话要怎么写sql? > 如果不能的话要怎么写UDAF,有例子参考吗?谢谢! > > kafka源表: > 班级 学号 姓名 年龄 > 1 20001张三

flink sql collect函数使用问题

2021-12-01 Thread casel.chen
使用场景如下,将kafka源表通过flink sql处理成mongodb汇表存入。按照班级进行group by,输出对应班级所有的学生数据集合。请问用flink sql自带的collect函数能实现吗?如果能的话要怎么写sql? 如果不能的话要怎么写UDAF,有例子参考吗?谢谢! kafka源表: 班级 学号 姓名 年龄 1 20001张三 15 2 20011李四 16 1 20002王五 16 2 20012吴六

????

2021-12-01 Thread zdj

谁能解释一下 GlobalStreamExchangeMode 这几种交换模式的不同和使用场景吗?

2021-12-01 Thread casel.chen
GlobalStreamExchangeMode 这几种交换模式的不同和使用场景是什么?哪些适合流式作业,哪些适合批式作业? Flink Remote Shuffle Service的推出是不是意味着可以在生产环境使用Flink处理批式作业?谢谢! package org.apache.flink.streaming.api.graph; import org.apache.flink.annotation.Internal; @Internal public enum GlobalStreamExchangeMode {

flink remote shuffle example运行出错

2021-12-01 Thread casel.chen
按照 https://github.com/flink-extended/flink-remote-shuffle 上的指南试着运行flink remote shuffle服务跑一个batch作业,结果报错如下。我本地使用的是scala 2.12 因此编译打包flink-remote-shuffle的时候使用的命令是:mvn clean install -DskipTests -Dscala.binary.version=2.12

Re: [DISCUSS] Deprecate Java 8 support

2021-12-01 Thread Chesnay Schepler
Flink can be built with Java 11 since 1.10. If I recall correctly we solved the tools.jar issue, which Hadoop depends on, by excluding that dependency. As far as we could tell it's not actually required. On 01/12/2021 19:56, Nicolás Ferrario wrote: Hi all, this would be awesome, I'm so tired

Re: [DISCUSS] Deprecate Java 8 support

2021-12-01 Thread Nicolás Ferrario
Hi all, this would be awesome, I'm so tired of seeing Java 8 everywhere (reminds me of Python 2.7). We're currently building our code against Java 11 because that's the latest version of Java available as a Flink Docker image, but it'd be great to use newer versions. I think it would also help to

Re: REST API for detached minicluster based integration test

2021-12-01 Thread Jin Yi
so i went ahead and put some logging in the WatermarkGeneartor.onEvent and .onPeriodicEmit functions in the test source watermark generator, and i do see the watermarks come by with values through those functions. they're just not being returned as expected via the rest api. On Tue, Nov 30, 2021

Re:Re: flink访问多个oss bucket问题

2021-12-01 Thread casel.chen
fs.oss.credentials.provider可以指定两个不同的provider吗? 如何区分是写数据的provider,还是做checkpoint/savepoint用的provider呢? 在 2021-12-01 10:58:18,"Caizhi Weng" 写道: >Hi! > >如果只是 bucket 不同的话,通过在 with 参数里指定 path 即可。 > >如果连 ak id 和 secret >都不同,可以考虑实现自己的 com.aliyun.oss.common.auth.CredentialsProvider

flink sql group by后收集数据集合问题

2021-12-01 Thread casel.chen
业务中使用flink sql group by操作后想收集每个分组下所有的数据,如下示例: kafka源表: 班级 学号 姓名 年龄 1 20001张三 15 2 20011李四 16 1 20002王五 16 2 20012吴六 15 create table source_table ( class_no: INT, student_no: INT, name: STRING,

关于streamFileSink在checkpoint下生成文件问题

2021-12-01 Thread 黄志高
hi,各位大佬,咨询个问题

Re: flink sql lookup join中维表不可以是视图吗?

2021-12-01 Thread Tony Wei
Hi, 如果兩次 left join 的話是否滿足你的需求呢? 然後在取 temporal table 的字段時,用 IF 去判斷取值。參考 SQL 如下 SELECT c.mer_cust_id, *IF(k.mer_cust_id IS NOT NULL AND a.mercust_id IS NOT NULL AND k.mer_cust_id <> '', k.update_time, NULL) AS update_time* FROM charge_log as c LEFT JOIN ka_mer_info FOR SYSTEM_TIME AS OF

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-12-01 Thread Marco Villalobos
Thank you. One last question. What is an RP? Where can I read it? Marco > On Nov 30, 2021, at 11:06 PM, Hang Ruan wrote: > > Hi, > > In 1.12.0-1.12.5 versions, committing offset to kafka when the checkpoint is > open is the default behavior in KafkaSourceBuilder. And it can not be changed

Re: [DISCUSS] Deprecate Java 8 support

2021-12-01 Thread Chesnay Schepler
Hello Gavin, If you run into any issues with Java 17, please report them in FLINK-15736 . I recently did some experiments with Java 17 myself; I would think that you will run into some blockers (like ASM requiring an upgrade

flink sql lookup join中维表不可以是视图吗?

2021-12-01 Thread casel.chen
lookup join用的维表需要从两张mysql表做关联后得到,因此创建了一个视图。但发现flink sql不支持lookup join关联视图,会抛 Temporal Table Join requires primary key in versioned table, but no primary key can be found. 请问这种情况要怎么解决? CREATE VIEW query_mer_view (mer_cust_id, update_time) AS SELECT a.mer_cust_id, k.update_time FROM

Flink checkpoint文件大小与对应内存大小映射关系

2021-12-01 Thread mayifan
Hi,All~! 麻烦大家一个问题,有大佬了解过checkpoint文件大小与实际内存对应的状态数据大小的映射关系吗? 比如Fs状态后端checkpoint后文件大小是1MB,对应的状态数据在内存中占用大概是多少呢? 感谢答复~!

Re: [DISCUSS] Deprecate Java 8 support

2021-12-01 Thread Gavin Lee
Thanks for sharing this info with us Chesnay. We've been using Flink for 5 years, and upgraded to 1.13.2 months ago. The java version is still 8. Currently we're testing with java 17 in our staging environment. There are no special concerns. Will update when tests complete. On Tue, Nov 30, 2021

Re: how to run streaming process after batch process is completed?

2021-12-01 Thread vtygoss
Hi Alexander, This is my ideal data pipeline. - 1. Sqoop transfer bounded data from database to hive. And I think flink batch process is more efficient than streaming process, so i want to process this bounded data in batch mode and write result in HiveTable2. - 2. There ares some tools to

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-12-01 Thread Yingjie Cao
Hi Jiangang, Great to hear that, welcome to work together to make the project better. Best, Yingjie 刘建刚 于2021年12月1日周三 下午3:27写道: > Good work for flink's batch processing! > Remote shuffle service can resolve the container lost problem and reduce > the running time for batch jobs once failover.

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-12-01 Thread Yingjie Cao
Hi Jiangang, Great to hear that, welcome to work together to make the project better. Best, Yingjie 刘建刚 于2021年12月1日周三 下午3:27写道: > Good work for flink's batch processing! > Remote shuffle service can resolve the container lost problem and reduce > the running time for batch jobs once failover.

Re:Re: flink sql group by后收集数据问题

2021-12-01 Thread casel.chen
我想要的是一个通用的收集ROW类型集合(ARRAY去重和不去重),不是只针对特定ROW @DataTypeHint("ROW") 这样写没有问题@DataTypeHint("ROW") 这样写会报错 在 2021-12-01 11:12:27,"Caizhi Weng" 写道: >Hi! > >UDF 支持 ROW 类型,详见 [1] 中关于 ROW 的示例。 > >[1]

Watermark behavior when connecting streams

2021-12-01 Thread Alexis Sarda-Espinosa
Hi everyone, Based on what I know, a single operator with parallelism > 1 checks the watermarks from all its streams and uses the smallest one out of the non-idle streams. My first question is whether watermarks are forwarded as long as a different watermark strategy is not applied downstream?

RE: Windows and data loss.

2021-12-01 Thread Schwalbe Matthias
Hi John, Sorry for the delay … I’m a little tight on spare time for user@flink currently. If you are still interested we could pick up the discussion and continue. However I’m don’t exactly understand what you want to achieve: 1. Would processing time windows be enough for you (and