Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Yuval Itzchakov
Hi All, I would really love to merge https://github.com/apache/flink/pull/15307 prior to 1.13 release cutoff, it just needs some more tests which I can hopefully get to today / tomorrow morning. This is a critical fix as now predicate pushdown won't work for any stream which generates a

Question about setting up Task-local recovery with a RocksDB state backend

2021-04-01 Thread Sonam Mandal
Hello, I've been going through the documentation for task-local recovery and came across this section which discusses that with incremental

Re: Re: How does Flink SQL read Avro union?

2021-04-01 Thread Arvid Heise
Hi Vincent, yes if you cannot influence the schema, then there is little you can do on SQL level and your workaround is probably the only way to go. However, I'd encourage you to speak with the other consumers/producers to find a way without unions. They are also ugly to use in all strongly

Re: Checkpoint timeouts at times of high load

2021-04-01 Thread Guowei Ma
Hi, I think there are many reasons that could lead to the checkpoint timeout. Would you like to share some detailed information of checkpoint? For example, the detailed checkpoint information from the web.[1] And which Flink version do you use? [1]

Re: Proper way to get DataStream

2021-04-01 Thread Arvid Heise
Hi, it seems as if the data is written with a confluent registry in mind, so you cannot use option 1: the kafka record is invalid avro as it contains a 5 byte prefix that identifies the schema. So the second way, is the way to go and it actually works well: it tells you that you have read with a

Re: s3 FileSystem Error "s3 file system implementation does not support recoverable writers"

2021-04-01 Thread Robert Cullen
Guowei, I changed to “s3a://argo-artifacts/“ but now I get this error: BTW: I'm using the example playground from here: [1] https://docs.ververica.com/getting_started/installation.html org.apache.flink.util.SerializedThrowable: 4bcbea1c-254c-4a7f-9348-14bf5f3e1f1a/cmdaa-0-0.ext: initiate

Re: DataStream from kafka topic

2021-04-01 Thread Arvid Heise
Arian gave good pointers, but I'd go even further: you should have ITCases where you pretty much just execute a mini job with docker-based Kafka and run it automatically. I strongly recommend to check out testcontainers [1], it makes writing such a test a really smooth experience. [1]

Re: Flink Taskmanager failure recovery and large state

2021-04-01 Thread Yaroslav Tkachenko
Hi Guowei, I thought Flink can support any HDFS-compatible object store like the majority of Big Data frameworks. So we just added "flink-shaded-hadoop-2-uber" and "gcs-connector-latest-hadoop2" dependencies to the classpath, after that using "gs" prefix seems to be possible:

Re: [External] : Re: Need help with executing Flink CLI for native Kubernetes deployment

2021-04-01 Thread Fuyao Li
Hi Yang, Thanks for sharing the insights. For problem 1: I think I can’t do telnet in the container. I tried to use curl 144.25.13.78:8081 and I could see the HTML of Flink dashboard UI. This proves such public IP is reachable inside the cluster. Just as you mentioned, there might still be

Re: DataStream from kafka topic

2021-04-01 Thread Arian Rohani
Thank you Arvid, I was going to suggest something like this also. We use TestContainers and the docker images provided by ververica to do exactly this in our team. I am currently working on a small project on github to start sharing for use cases like this. The project will contain some example

[ANNOUNCE] Release 1.13.0, release candidate #0

2021-04-01 Thread Dawid Wysakowicz
|Hi everyone,| |As promised I created a release candidate #0 for the version 1.13.0. I am not starting a vote for this release as I've created it mainly for verifying the release process. We are still aware of some improvements coming in shortly. However we will greatly appreciate any help testing

Re: s3 FileSystem Error "s3 file system implementation does not support recoverable writers"

2021-04-01 Thread Guowei Ma
Hi, Robert It seems that your AccessKeyId is not valid. I think you could find more detailed from [1] about how to configure the s3' access key. [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/ Best, Guowei On Thu, Apr 1, 2021 at 9:19 PM Robert Cullen

How to know if task-local recovery kicked in for some nodes?

2021-04-01 Thread Sonam Mandal
Hello, We are experimenting with task local recovery and I wanted to know whether there is a way to validate that some tasks of the job recovered from the local state rather than the remote state. We've currently set this up to have 2 Task Managers with 2 slots each, and we run a job with

Re: Flink Taskmanager failure recovery and large state

2021-04-01 Thread Guowei Ma
Hi, Yaroslav AFAIK Flink does not retry if the download checkpoint from the storage fails. On the other hand the FileSystem already has this retry mechanism already. So I think there is no need for flink to retry. I am not very sure but from the log it seems that the gfs's retry is interrupted by

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Guowei Ma
Hi, Yuval Thanks for your contribution. I am not a SQL expert, but it seems to be beneficial to users, and the amount of code is not much and only left is the test. Therefore, I am open to this entry into rc1. But according to the rules, you still have to see if there are other PMC's objections

Avro schema

2021-04-01 Thread Sumeet Malhotra
Hi, Is it possible to directly import Avro schema while ingesting data into Flink? Or do we always have to specify the entire schema in either SQL DDL for Table API or using DataStream data types? From a code maintenance standpoint, it would be really helpful to keep one source of truth for the

Re: [ANNOUNCE] Release 1.13.0, release candidate #0

2021-04-01 Thread Xingbo Huang
Hi Dawid, Thanks a lot for the great work! Regarding to the issue of flink-python, I have provided a quick fix and will try to fix it ASAP. Best, Xingbo Dawid Wysakowicz 于2021年4月2日周五 上午4:04写道: > Hi everyone, > As promised I created a release candidate #0 for the version 1.13.0. I am > not

Re: Avro schema

2021-04-01 Thread Sumeet Malhotra
Just realized, my question was probably not clear enough. :-) I understand that the Avro (or JSON for that matter) format can be ingested as described here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connect.html#apache-avro-format, but this still requires the entire table

Why is Hive dependency flink-sql-connector-hive not available on Maven Central?

2021-04-01 Thread Yik San Chan
The question is cross-posted in StackOverflow https://stackoverflow.com/questions/66914119/flink-why-is-hive-dependency-flink-sql-connector-hive-not-available-on-maven-ce According to [Flink SQL Hive: Using bundled hive jar](

Re: Measuring the Size of State, Savepoint Size vs. Restore time

2021-04-01 Thread Yun Tang
HI Kevin, Currently, you can view logs to find when to start and finish to restore [1] to know how much time spent on task side. Flink-1.13 also try to expose stage of task initializations [2] and maybe it could help you. state.backend.rocksdb.metrics.total-sst-files-size should be correct to

Re: JDBC connector support for JSON

2021-04-01 Thread Matthias Pohl
Hi Fanbin, I'm not that familiar with the FlinkSQL features. But it looks like the JdbcConnector does not support Json as stated in the documentation [1]. You might work around it by implementing your own user-defined functions [2]. I hope this helps. Matthias [1]

Checkpoint timeouts at times of high load

2021-04-01 Thread Geldenhuys, Morgan Karl
Hi Community, I have a number of flink jobs running inside my session cluster with varying checkpoint intervals plus a large amount of operator state and in times of high load, the jobs fail due to checkpoint timeouts (set to 6 minutes). I can only assume this is because the latencies for

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Dawid Wysakowicz
Hi all, @Kurt @Arvid I think it's fine to merge those two, as they are pretty much finished. We can wait for those two before creating the RC0. @Leonard Personally I'd be ok with 3 more days for that single PR. I find the request reasonable and I second that it's better to have a proper review

退订

2021-04-01 Thread Chouchou Mei
退订

Re: IO benchmarking

2021-04-01 Thread Matthias Pohl
For 2. there are also efforts to expose the state and operator initialization through the logs (see FLINK-17012 [1]). For 3. the TypeSerializer [2] might be another point of interest. It is used to serialize specific types. Other than that, the state serialzation depends heavily on the used state

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Arvid Heise
Hi Dawid and Guowei, I'd like to merge [FLINK-13550][rest][ui] Vertex Flame Graph [1]. We are pretty much just waiting for AZP to turn green, it's separate from other components, and it's a super useful feature for Flink users. Best, Arvid [1] https://github.com/apache/flink/pull/15054 On

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Kurt Young
Thanks Dawid, I have merged FLINK-20320. Best, Kurt On Thu, Apr 1, 2021 at 2:49 PM Dawid Wysakowicz wrote: > Hi all, > > @Kurt @Arvid I think it's fine to merge those two, as they are pretty much > finished. We can wait for those two before creating the RC0. > > @Leonard Personally I'd be ok

Re: ARM support

2021-04-01 Thread Guowei Ma
Hi, Rex I think that Flink does not have an official release that supports the arm architecture. There are some efforts and discussion [1][2][3] about supporting the architecture. I think you could find some builds at openlabtesting. [4] But AFAIK there is no clear timeline about that.(correct me

Re: Restoring from Flink Savepoint in Kubernetes not working

2021-04-01 Thread Matthias Pohl
The logs would have helped to understand better what you were doing. The stacktrace you shared indicates that you either asked for the status of a savepoint creation that had already been completed and was, therefore, removed from the operations cache or you used some job ID/request ID pair that

Re: Flink Taskmanager failure recovery and large state

2021-04-01 Thread Guowei Ma
Hi, Yaroslav AFAIK there is no official GCS FileSystem support in FLINK. Does the GCS is implemented by yourself? Would you like to share the whole log of jm? BTW: From the following log I think the implementation has already some retry mechanism. >>> Interrupted while sleeping before retry.

Re: 关于Flink水位线与时间戳分配的疑问

2021-04-01 Thread Shengkai Fang
hi, 图挂了。 1. 可以这么使用这个方法: ··· input.assignTimestampsAndWatermarks( WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMillis(10)) .withTimestampAssigner((event, timestamp) -> 42L)); ··· TimestampAssigner 会从输入的event上读取数据 并由watermark generator 决定输出对应的watermark. 3.

????

2021-04-01 Thread ???0?6

Re: 回复:现在的flink1.12支持批流混合的作业吗?

2021-04-01 Thread 键盘击打者
谢谢老哥的回复。 可能我举得例子不是很好。我其实还是单纯的想问 使用DataStream的API,支不支持同一个应用中批流混合的作业(比如一个应用多个job,job有流有批)... 就我目前看到的文档中来看,是支持的。但是指定execution.runtime-mode为批、流、自动模式后,作业的执行还是会依照批和流其中一种模式,所以在性能上不能很好的支持批流混合的作业。 不知道我理解的是否正确,学生狗一只。 祝好, 耳朵 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink 1.12.2 sql-cli 写入Hive报错 is_generic

2021-04-01 Thread Rui Li
你好, 我用你提供的这个DDL没有复现这个问题,有更详细的操作步骤么?另外如果kafka表是通过create table like创建的话有个已知问题: https://issues.apache.org/jira/browse/FLINK-21660 On Thu, Apr 1, 2021 at 4:08 PM HunterXHunter <1356469...@qq.com> wrote: > 当配置好HiveCatalog后, > SQL-Cli 也可以查到hive库表信息 > 创建kafka表: > > create table test.test_kafka( > word

Re: flink 1.12.2 sql-cli 写入Hive报错 is_generic

2021-04-01 Thread HunterXHunter
查看hdfs文件: 分区一直是这样的一个文件,没有生成 _SUCCESS文件 .part-40a2c94d-0437-4666-8d43-31c908aaa02e-0-0.inprogress.73dcc10b-44f4-47e3-abac-0c14bd59f9c9 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink 1.12.2 sql-cli 写入Hive报错 is_generic

2021-04-01 Thread HunterXHunter
你好,这个问题已经解决了。 我现在通过官方例子: SET table.sql-dialect=default; create table flink_kafka( sys_time bigint, rt AS TO_TIMESTAMP(FROM_UNIXTIME(sys_time / 1000, '-MM-dd HH:mm:ss')), WATERMARK FOR rt AS rt - INTERVAL '5' SECOND ) WITH ( 'connector' = 'kafka', 'topic' = 'xx',

Re: flink集群提交任务挂掉

2021-04-01 Thread shimin huang
增大`taskmanager.memory.task.off-heap.size`配置 bowen li 于2021年4月2日周五 上午10:54写道: > Hi,大家好: > 现在我们遇到的场景是这样的,提交任务的时候会报错。我们使用的版本是1.12.1,搭建模式是standalone的。下面是报错信息。 > >java.lang.OutOfMemoryError: Direct buffer memory. The direct > out-of-memory error has occurred. This can mean two things:

Re: jdbc connectors

2021-04-01 Thread guoyb
不是replace,用的是insert into duplicate key update ---Original--- From: "liujian"<13597820...@qq.com Date: Thu, Apr 1, 2021 16:02 PM To: "user-zh"

blink planner里的Scala代码,未来会由Java改写吗?

2021-04-01 Thread Luna Wong
目前blink planner中有大量Scala代码,Scala在这方面写起来确实简单不少。未来不需要用Java重写是吗?

flink集群提交任务挂掉

2021-04-01 Thread bowen li
Hi,大家好: 现在我们遇到的场景是这样的,提交任务的时候会报错。我们使用的版本是1.12.1,搭建模式是standalone的。下面是报错信息。 java.lang.OutOfMemoryError: Direct buffer memory. The direct out-of-memory error has occurred. This can mean two things: either job(s) require(s) a larger size of JVM direct memory or there is a direct memory

求助:通过实时 Pipeline 的手段消费 Hive Table 报java.lang.ArrayIndexOutOfBoundsException: -1,谢谢!

2021-04-01 Thread samuel....@ubtrobot.com
你好: 1. 实时通过读KAFKA,然后将数据写入了hive,建一张hive表,format 是 Parquet,是按天、小时、分钟来分区; 2. 通过实时 Pipeline 的手段消费 Hive Table 报java.lang.ArrayIndexOutOfBoundsException: -1 在flink sql client下: 1)直接select 所有字段,是没有问题,可以正常读出所有数据。 执行: select * from ubtCatalog.ubtHive.event_all_dwd /*+

Re: 关于flink CheckPoint 状态数据保存的问题

2021-04-01 Thread tison
这个配置本身我看了一下只能走 flink-conf.yaml,而且似乎是 per cluster 配置的,虽然 perjob / application 部署的时候没啥问题,但是 session 可能就不行了。配置这块 Flink 是有点全走 flink-conf.yaml + 默认你是用 perjob / application 的意思。 你提的数据看不到的问题,首先确认一下是否 chk 真的有数据。另外我依稀记得 tangyun(in cc) 做过一个改动,可以问下他的看法。 Best, tison. tison 于2021年4月1日周四 下午3:50写道: >

???????????????? OutOfMemoryError: Metaspace ????

2021-04-01 Thread ??????
?? sql ?? s3 clickhouse ?? kafka ?? task-manager OutOfMemoryError: Metaspace ?? flink ??1.12.2 ?? standalone kubernetes session ??

Re: 关于flink CheckPoint 状态数据保存的问题

2021-04-01 Thread lp
好的,谢谢 -- Sent from: http://apache-flink.147419.n8.nabble.com/

关于flink CheckPoint 状态数据保存的问题

2021-04-01 Thread lp
我写了一个带状态的function 采用了如下cp配置: env.enableCheckpointing(5000L, CheckpointingMode.EXACTLY_ONCE); env.getCheckpointConfig().setMinPauseBetweenCheckpoints(100L); env.getCheckpointConfig().setCheckpointTimeout(6L);

Re: 关于flink CheckPoint 状态数据保存的问题

2021-04-01 Thread lp
如题,除了通过这种全局配置文件中的方式修改,能在程序中通过代码的方式修改吗 -- Sent from: http://apache-flink.147419.n8.nabble.com/

???????????????? OutOfMemoryError: Metaspace ????

2021-04-01 Thread ??????
?? sql ?? s3 clickhouse ?? kafka ?? task-manager OutOfMemoryError: Metaspace ?? flink ??1.12.2 ?? standalone kubernetes session ??

退订

2021-04-01 Thread Chouchou Mei
退订

Re: 关于flink CheckPoint 状态数据保存的问题

2021-04-01 Thread tison
只有一个的问题是因为默认保留的 chk 数量是一个,可以修改这个配置[1]来改变。 Best, tison. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#state-checkpoints-num-retained lp <973182...@qq.com> 于2021年4月1日周四 下午3:48写道: > 我写了一个带状态的function > 采用了如下cp配置: > env.enableCheckpointing(5000L,

jdbc connectors

2021-04-01 Thread liujian
Hi: jdbc connectors sink??mysql, ?? replace into??,,mysqlid,AUTO_INCREMENT ??2

flink 1.12.2 sql-cli 写入Hive报错 is_generic

2021-04-01 Thread HunterXHunter
当配置好HiveCatalog后, SQL-Cli 也可以查到hive库表信息 创建kafka表: create table test.test_kafka( word VARCHAR ) WITH ( 'connector' = 'kafka', 'topic' = 'xx', 'scan.startup.mode' = 'latest-offset', 'properties.bootstrap.servers' = 'xx', 'properties.group.id' = 'test', 'format' = 'json',

回复:现在的flink1.12支持批流混合的作业吗?

2021-04-01 Thread 飞翔
哈哈,建议你去了解flink-cdc 发自我的iPhone -- 原始邮件 -- 发件人: 键盘击打者 http://apache-flink.147419.n8.nabble.com/file/t1425/1617266981%281%29.jpg; 祝好, 耳朵 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: 关于flink CheckPoint 状态数据保存的问题

2021-04-01 Thread Paul Lam
关于 chk 下只有 _metadata 的问题,大概是因为 state 比较小,被嵌入到 _medata 文件里了。可以参考这个配置项 [1]。 [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#state-backend-fs-memory-threshold Best, Paul Lam > 2021年4月1日 16:25,lp <973182...@qq.com> 写道: > > 好的,谢谢 > > > > -- > Sent from:

Re: Connection reset by peer

2021-04-01 Thread 骆凡
加内存试试 -- Sent from: http://apache-flink.147419.n8.nabble.com/

关于Flink水位线与时间戳分配的疑问

2021-04-01 Thread 陈柏含
您好: 我是目前正在自学Flink以完成毕业设计的计算机专业学生。目前对Flink时间戳与水位线的代码有很多不确定且找不到权威解答的疑问,自己调试程序也因为之前没有Flink经验而对Debug变量窗口中各种复杂的结构找不到头绪。因此,抱着试一试的想法尝试通过这个邮箱寻求解答。 下面两个图片是一个分配器 我有一下几个疑问:1.是不是我们只要调用env.addSource(...).assignTimestampsAndWatermarks(new

现在的flink1.12支持批流混合的作业吗?

2021-04-01 Thread 键盘击打者
嗨,您好! 比如我想在一个应用里同时跑实时数据清洗和历史数据清洗(批作业和流作业混合)? 类似下图的应用。 祝好, 耳朵 -- Sent from: http://apache-flink.147419.n8.nabble.com/

FS StateBackend 到 RocksDB StateBackend 状态恢复问题

2021-04-01 Thread LakeShen
Hi 社区, 如果实时任务状态后端之前是 FS StateBackend ,然后任务停止后,换成 RocksDB StateBackend 做恢复,作业状态能恢复吗? Best, LakeShen

回复:FS StateBackend 到 RocksDB StateBackend 状态恢复问题

2021-04-01 Thread 范瑞
Hi Lake: 目前的 Flink 版本应该都是不支持的,也就是说换了 StateBackend 不能正常恢复。Flink 1.13 做了这个事情,具体参考:FLIP41 和 FLINK-20976 https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Binary+format+for+Keyed+State https://issues.apache.org/jira/browse/FLINK-2097 Best, fanrui ---原始邮件--- 发件人: "LakeShen"

Re: FS StateBackend 到 RocksDB StateBackend 状态恢复问题

2021-04-01 Thread LakeShen
确定了 不能 LakeShen 于2021年4月1日周四 下午7:15写道: > Hi 社区, >如果实时任务状态后端之前是 FS StateBackend ,然后任务停止后,换成 RocksDB StateBackend > 做恢复,作业状态能恢复吗? > > Best, > LakeShen >

Re: FS StateBackend 到 RocksDB StateBackend 状态恢复问题

2021-04-01 Thread LakeShen
Hi fanrui, thank you so much! Best, LakeShen 范瑞 <836961...@qq.com> 于2021年4月1日周四 下午7:36写道: > Hi Lake: > > > 目前的 Flink 版本应该都是不支持的,也就是说换了 StateBackend 不能正常恢复。Flink 1.13 > 做了这个事情,具体参考:FLIP41 和 FLINK-20976 > > > >