Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-12-24 Thread Yun Gao
Hi all, I tested the previous PoC with the current tests and found some new issues that might cause divergence; sorry that there may also be some reversals of previous conclusions: 1. Which operators should wait for one more checkpoint before close? One

flink1.10.1: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.TimeoutException

2020-12-24 Thread bigdata
The program finished with the following exception: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.TimeoutException at

Re: Re: pyflink1.12: error using the connector read.query parameter

2020-12-24 Thread 肖越
Thanks for the guidance. Following 嘉伟's suggestion, I confirmed that pyflink1.12 indeed does not support this parameter~ I still hope the community will open it up: in my current work scenario, fetching data requires defining the whole table, and if the database changes, the code here becomes hard to maintain. Judging from local experiments, the efficiency of pyflink's internal query is not high; I am about to try it on a cluster~ On 2020-12-25 09:45:28, "Leonard Xu" wrote: >Hi, 嘉伟 > >1.12 should not support read.query; the community is still discussing whether to open it up, as there are some concerns. Simply put, with the query written this way, the JDBC >table created should be a View

Re: Re: Flink-1.11.1 streaming write to filesystem: partition commit question

2020-12-24 Thread amen...@163.com
I'd like to ask: when writing to a filesystem, commits rely on checkpoints; is the number of files that can be committed after each checkpoint determined by the parallelism? I find that my files are committed three at a time after each checkpoint completes. From: amen...@163.com Sent: 2020-12-24 18:47 To: user-zh Subject: Re: Re: Flink-1.11.1 streaming write to filesystem: partition commit question. One remark woke me from my dream, thanks for the reply @冯嘉伟. Because I first tested the submission in sql-client, I overlooked this issue. Thanks. best, amenhub

Keywords for Flink TaskManager failure logs

2020-12-24 Thread 赵一旦
As in the subject: does anyone know which keywords to look for? There are too many logs for each failure, showing cancels of all kinds of tasks, and then it suddenly fails... At the moment it often seems to be caused by cancel (180s), leading to "Task did not exit gracefully within 180 + seconds". Also, does anyone modify the log format and log files in production? Since I adjusted them, the log view in the WEB-UI has never worked. There is now a log list, but clicking it has no effect. I changed the log file names.

Flink reads Kafka with no error and no output, while the Kafka consumer side has data, thanks

2020-12-24 Thread Appleyuchi
Hi all, my environment: Flink 1.12, Kafka 2.5.0, Zookeeper 3.6.0. The complete code is https://paste.ubuntu.com/p/pRWpvJw4b8/ The Kafka consumer side (consuming via the command line) is confirmed to produce data. But the code above produces no output, and the DDL has been checked and is correct. Because I heard that executeSql submits a job, I commented out the final execute. How should the code be modified so that it produces output? Thanks

Re: yarn.provided.lib.dirs not taking effect for flink1.11 YARN submission

2020-12-24 Thread Yang Wang
It is strongly discouraged to put non-Flink-binary jars into yarn.provided.lib.dirs, because the jars under it are distributed via the YARN public distributed cache, cached on the NodeManagers, and shared with all applications. The root cause of your error is that when main runs locally, the udf is still on HDFS, so the error occurs on the client side. There are two fixes: 1. Do not put the udf into the provided lib dirs on HDFS, unless you really want to share it with many applications 2.

Re: yarn.provided.lib.dirs not taking effect for flink1.11 YARN submission

2020-12-24 Thread datayangl
It still fails to load with -D. Does yarn.provided.lib.dirs really only support application mode??? I see an Alibaba Cloud example using yarn-cluster: https://developer.aliyun.com/article/762501?spm=a2c6h.12873639.0.0.14ac3a9eM6GNSi

Re: yarn.provided.lib.dirs not taking effect for flink1.11 YARN submission

2020-12-24 Thread zhisheng
hi, try -Dyarn.provided.lib.dirs Best zhisheng datayangl wrote on Tue, Dec 22, 2020 at 16:56: > > > flink1.11 on yarn mode: I uploaded the jars under flink > lib and my UDF jar to HDFS in advance, and at submission used yarn.provided.lib.dirs > to point at the HDFS dependency path. The original plan was to use reflection in the program to find and instantiate the UDF class, but submission fails: the program does not find the UDF path > > Submit command: /usr/hdp/flink1.11/bin/flink run -m
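For reference, a sketch of the suggested submission (paths are hypothetical; yarn.provided.lib.dirs expects remote directories and is documented for application mode in 1.11, which may explain the yarn-cluster behaviour reported above):

```shell
# Hypothetical HDFS paths; multiple directories are separated by semicolons.
/usr/hdp/flink1.11/bin/flink run-application -t yarn-application \
  -Dyarn.provided.lib.dirs="hdfs:///flink/flink-1.11/lib;hdfs:///flink/plugins" \
  hdfs:///jobs/my-job.jar
```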

Re: StreamingFileSink closed file exception

2020-12-24 Thread Yun Gao
Hi Billy, StreamingFileSink does not expect the Encoder to close the stream passed into the encode method. However, ObjectMapper closes it at the end of its write method. Thus I think you could disable the close action in ObjectMapper, or change the encode implementation to
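The first suggestion can be sketched as follows (a minimal sketch, assuming Jackson is the JSON library in use, as in the original report; JsonLinesEncoder is a hypothetical name):

```java
import java.io.IOException;
import java.io.OutputStream;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.Encoder;

public class JsonLinesEncoder<T> implements Encoder<T> {
    private transient ObjectMapper mapper;

    @Override
    public void encode(T element, OutputStream stream) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
            // Keep writeValue() from closing the stream owned by the sink.
            mapper.getFactory().disable(JsonGenerator.Feature.AUTO_CLOSE_TARGET);
        }
        mapper.writeValue(stream, element);
        stream.write('\n'); // one JSON object per line
    }
}
```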

Re: pyflink1.12: error using the connector read.query parameter

2020-12-24 Thread Leonard Xu
Hi, 嘉伟 1.12 should not support read.query; the community is still discussing whether to open it up, as there are some concerns. Simply put, with the query written this way, the JDBC table created should be a View rather than mapping to an actual JDBC table, and such a table could only be used as a source, not as a sink. Best, Leonard > On 2020-12-24 at 19:16, 冯嘉伟 <1425385...@qq.com> wrote: > > hi! try this > > CREATE TABLE source_table( >yldrate DECIMAL, >

Re: Re: flink-1.11.1 setMinPauseBetweenCheckpoints not taking effect

2020-12-24 Thread nicygan
Dian Fu: Thanks for clearing that up; I'll try a different version. thank you by nicygan On 2020-12-24 22:44:04, "Dian Fu" wrote: >It should be a known issue, already fixed in 1.11.2: https://issues.apache.org/jira/browse/FLINK-18856 > >> On 2020-12-24 at 21:34, 赵一旦 wrote: >> >> I don't believe what you say... >> >> nicygan wrote on Thu, Dec 24, 2020 at 19:25: >> >>> dear all:

Re: Issue in WordCount Example with DataStream API in BATCH RuntimeExecutionMode

2020-12-24 Thread Aljoscha Krettek
Thanks for reporting this! This is not the expected behaviour; I created a Jira issue: https://issues.apache.org/jira/browse/FLINK-20764. Best, Aljoscha On 23.12.20 22:26, David Anderson wrote: I did a little experiment, and I was able to reproduce this if I use the sum aggregator on

Re: FileSink class in 1.12?

2020-12-24 Thread Billy Bain
Of course I found it shortly after submitting my query. compile group: 'org.apache.flink', name: 'flink-connector-files', version: '1.12.0' On 2020/12/24 15:57:20, Billy Bain wrote: > I can't seem to find the org.apache.flink.connector.file.sink.FileSink > class. > > I can find the

FileSink class in 1.12?

2020-12-24 Thread Billy Bain
I can't seem to find the org.apache.flink.connector.file.sink.FileSink class. I can find the StreamingFileSink, but not FileSink referenced here: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/connectors/file_sink.html Am I missing a dependency? compile group:
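For reference, a minimal sketch of using FileSink once flink-connector-files is on the classpath (the output path is a placeholder, and `stream` stands for some existing DataStream<String>):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;

FileSink<String> sink = FileSink
        .forRowFormat(new Path("/tmp/out"), new SimpleStringEncoder<String>("UTF-8"))
        .build();
stream.sinkTo(sink); // sinkTo() is the 1.12 entry point for the new unified sinks
```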

Realtime Data processing from HBase

2020-12-24 Thread s_penakalap...@yahoo.com
Hi Team, I recently encountered a use case in my project, described below: My data source is HBase. We receive a huge volume of data at very high speed into HBase tables from the source system. I need to read from HBase, perform computation, and insert into PostgreSQL. I would like a few inputs on the below

Re: flink-1.11.1 setMinPauseBetweenCheckpoints not taking effect

2020-12-24 Thread Dian Fu
It should be a known issue, already fixed in 1.11.2: https://issues.apache.org/jira/browse/FLINK-18856 > On 2020-12-24 at 21:34, 赵一旦 wrote: > > I don't believe what you say... > > nicygan wrote on Thu, Dec 24, 2020 at 19:25: > >> dear all: >> In my checkpoint settings I set >> >> checkpointConfig.setMinPauseBetweenCheckpoints(180_000L) >>

Flink + DDL reads Kafka with no output, but the Kafka consumer side has data

2020-12-24 Thread Appleyuchi
This is Flink 1.12. The Kafka consumer side can read data, but the code below reads nothing: after running there is no error and no output. Please help, thanks. import org.apache.flink.streaming.api.scala._ import org.apache.flink.table.api.{EnvironmentSettings, Table} import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment import org.apache.flink.types.Row import

StreamingFileSink closed file exception

2020-12-24 Thread Billy Bain
I am new to Flink and am trying to process a file and write it out formatted as JSON. This is a much simplified version. public class AndroidReader { public static void main(String[] args) throws Exception { final StreamExecutionEnvironment env =

Re: After the Flink application starts: flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function.

2020-12-24 Thread 赵一旦
Look at the error message: Caused by: org.apache.flink.shaded.guava18.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NumberFormatException: Not a version: 9. bigdata <1194803...@qq.com> wrote on Thu, Dec 24, 2020 at 21:49: > flink1.10.1 cluster DML error as follows >

After the Flink application starts: flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function.

2020-12-24 Thread bigdata
flink1.10.1 cluster DML error: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function. at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:269) at org.apache.flink.streaming.runtime.tasks.OperatorChain.

Re: flink-1.11.1 setMinPauseBetweenCheckpoints不生效

2020-12-24 Thread 赵一旦
I don't believe what you say... nicygan wrote on Thu, Dec 24, 2020 at 19:25: > dear all: > In my checkpoint settings I set > > checkpointConfig.setMinPauseBetweenCheckpoints(180_000L) > but it does not seem to take effect; > for example, checkpoint id=238 finishes at 17:13:30 > but id=239 also starts at 17:13:30 > > > My understanding is that id=239 should start no earlier than 17:16:30 > >

Re: flink.client.program.ProgramInvocationException: The main method caused an error: Unable to instantiate java compiler

2020-12-24 Thread 冯嘉伟
hi! You can try changing the configuration file: classloader.resolve-order: parent-first Or you can try the utility class org.apache.flink.table.runtime.generated.CompileUtils -- Sent from: http://apache-flink.147419.n8.nabble.com/
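The first suggestion as a config fragment (goes into flink-conf.yaml):

```yaml
# Resolve classes from the parent (Flink) classloader first; this can avoid
# "Unable to instantiate java compiler" when a user jar bundles its own Janino.
classloader.resolve-order: parent-first
```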

Re: pyflink1.12: error using the connector read.query parameter

2020-12-24 Thread 冯嘉伟
hi! try this CREATE TABLE source_table( yldrate DECIMAL, pf_id VARCHAR, symbol_id VARCHAR) WITH( 'connector' = 'jdbc', 'url' = 'jdbc:mysql://ip/db', 'driver' = 'com.mysql.cj.jdbc.Driver',
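A completed sketch of that DDL for reference (credentials are placeholders; the table name is the one mentioned elsewhere in the thread; in 1.12 the table is addressed via 'table-name' rather than read.query):

```sql
CREATE TABLE source_table (
    yldrate   DECIMAL,
    pf_id     VARCHAR,
    symbol_id VARCHAR
) WITH (
    'connector'  = 'jdbc',
    'url'        = 'jdbc:mysql://ip/db',
    'driver'     = 'com.mysql.cj.jdbc.Driver',
    'username'   = '...',
    'password'   = '...',
    'table-name' = 'TS_PF_SEC_YLDRATE'
);
```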

flink-1.11.1 setMinPauseBetweenCheckpoints not taking effect

2020-12-24 Thread nicygan
dear all: In my checkpoint settings I set checkpointConfig.setMinPauseBetweenCheckpoints(180_000L) but it does not seem to take effect. For example, checkpoint id=238 finishes at 17:13:30, but id=239 also starts at 17:13:30. My understanding is that id=239 should start no earlier than 17:16:30. Am I misunderstanding this parameter? thanks by nicygan
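For reference, a minimal sketch of the intended configuration (interval value is a placeholder): with a 3-minute minimum pause, checkpoint n+1 should not be triggered until 3 minutes after checkpoint n completes.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000L); // checkpoint interval: 1 minute
// At least 3 minutes between the end of one checkpoint and the start of the next.
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(180_000L);
```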

Re: Registering RawType with LegacyTypeInfo via Flink CREATE TABLE DDL

2020-12-24 Thread Yuval Itzchakov
An expansion to my question: What I really want is for the UDF to return `RAW(io.circe.Json, ?)` type, but I have to do a conversion between Table and DataStream, and TypeConversions.fromDataTypeToLegacyInfo cannot convert a plain RAW type back to TypeInformation. On Thu, Dec 24, 2020 at 12:59

Re: pyflink1.12: error using the connector read.query parameter

2020-12-24 Thread Dian Fu
The line 'table-name' = 'TS_PF_SEC_YLDRATE' is missing a comma after it. > On 2020-12-24 at 14:02, 肖越 <18242988...@163.com> wrote: > > Using DDL to define a connector to a MySQL database, hoping to fetch data by sending SQL directly: > source_ddl = """ > CREATE TABLE source_table( >yldrate DECIMAL, >pf_id VARCHAR, >symbol_id

Registering RawType with LegacyTypeInfo via Flink CREATE TABLE DDL

2020-12-24 Thread Yuval Itzchakov
Hi, I have a UDF which returns a type of MAP<STRING, RAW('io.circe.Json', ?)>. When I try to register this type with Flink via the CREATE TABLE DDL, I encounter an exception: - SQL parse failed. Encountered "(" at line 2, column 256. Was expecting one of: "NOT" ... "NULL" ... ">" ... "MULTISET" ...

Re: Re: Flink-1.11.1 streaming write to filesystem: partition commit question

2020-12-24 Thread amen...@163.com
One remark woke me from my dream, thanks for the reply @冯嘉伟. Because I first tested the submission in sql-client, I overlooked this issue. Thanks. best, amenhub From: 冯嘉伟 Sent: 2020-12-24 18:39 To: user-zh Subject: Re: Flink-1.11.1 streaming write to filesystem: partition commit question Is checkpointing enabled? Part files can be in one of three states: In-progress : The part file that is currently being written to is

Re: Flink-1.11.1 streaming write to filesystem: partition commit question

2020-12-24 Thread 冯嘉伟
Is checkpointing enabled? Part files can be in one of three states: In-progress: the part file that is currently being written to is in progress. Pending: closed (due to the specified rolling policy) in-progress files that are waiting to be committed. Finished: on successful checkpoints (STREAMING) or at
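The point above can be sketched as follows: in STREAMING mode, pending files only become finished when a checkpoint completes, so checkpointing must be enabled for the filesystem sink to commit anything (the interval is a placeholder):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Without this, a streaming filesystem sink never moves files out of "pending".
env.enableCheckpointing(60_000L); // e.g. one checkpoint per minute
```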

Re: How does Flink handle shorted lived keyed streams

2020-12-24 Thread narasimha
Thanks Xintong. I'll check it out and get back to you. On Thu, Dec 24, 2020 at 1:30 PM Xintong Song wrote: > I believe what you are looking for is the State TTL [1][2]. > > > Thank you~ > > Xintong Song > > > [1] >

Re: Flink-1.11.1 streaming write to filesystem: partition commit question

2020-12-24 Thread amen...@163.com
Now the problem is that apparently none of the partitions are being committed; they never commit. Why is that? From: amen...@163.com Sent: 2020-12-24 17:04 To: user-zh Subject: Flink-1.11.1 streaming write to filesystem: partition commit question hi everyone, I have recently been validating a requirement: a Kafka data stream written to the HDFS filesystem. Using the official Flink-1.11 demo from the documentation, after successfully submitting to YARN, the partition directories and files are generated as expected, but I have some questions about partition commits. Problem description:

Filter push down in DynamicTableSource

2020-12-24 Thread jy l
Hi: A question for you all. On flink-1.12.0 I implemented a custom DynamicTableSource supporting SupportsFilterPushDown, SupportsProjectionPushDown, and other abilities. My ScanRuntimeProvider uses InputFormatProvider. At runtime, the pushed-down filters arrive after the InputFormat has been created and after the copy() call, so how can I still use the filters to filter the data source inside the InputFormat?

Filter push down in DynamicTableSource

2020-12-24 Thread automths
Hi: A question for you all. On flink-1.12.0 I implemented a custom DynamicTableSource supporting SupportsFilterPushDown, SupportsProjectionPushDown, and other abilities. My ScanRuntimeProvider uses InputFormatProvider. At runtime, the pushed-down filters arrive after the InputFormat has been created and after the copy() call, so how can I still use the filters to filter the data source inside the InputFormat? Please let me know if you know, thanks! Best!
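A common pattern for this (a sketch, not the asker's code; MyInputFormat is a hypothetical InputFormat whose constructor accepts the predicates): record the filters in applyFilters(), which the planner calls before asking for the runtime provider, carry them through copy(), and construct the InputFormat lazily in getScanRuntimeProvider().

```java
public class MySource implements ScanTableSource, SupportsFilterPushDown {

    private List<ResolvedExpression> filters = new ArrayList<>();

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        this.filters = filters; // remember for later; called during planning
        // Report everything as remaining so the planner keeps a safety filter.
        return Result.of(Collections.emptyList(), filters);
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext context) {
        // Build the InputFormat only now, after push-down has happened.
        return InputFormatProvider.of(new MyInputFormat(filters));
    }

    @Override
    public DynamicTableSource copy() {
        MySource copy = new MySource();
        copy.filters = new ArrayList<>(filters); // preserve filters across planner copies
        return copy;
    }

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();
    }

    @Override
    public String asSummaryString() {
        return "MySource";
    }
}
```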

Flink-1.11.1 streaming write to filesystem: partition commit question

2020-12-24 Thread amen...@163.com
hi everyone, I have recently been validating a requirement: a Kafka data stream written to the HDFS filesystem. Using the official Flink-1.11 demo from the documentation, after successfully submitting to YARN, the partition directories and files are generated as expected, but I have some questions about partition commits. Problem description:

Re: Some questions about using Flink with Hive

2020-12-24 Thread Jacob
Hi, thanks for the reply. Right, you can also see it that way: overall there are two parts. First, process the stream messages and write them into the Hive table every 15 min; then run MapReduce over that 15-minute batch of data. The current state: the first step is handled by Flink, and the second is a scheduled job that processes the previous step's data. The improvement plan: I want to merge these two steps and handle both with Flink; the new Flink versions support Hive, so MapReduce would no longer be needed. I just don't know how to run this smoothly within a single job. - Thanks! Jacob -- Sent from: http://apache-flink.147419.n8.nabble.com/

RE: RE: checkpointing seems to be throttled.

2020-12-24 Thread Colletta, Edward
FYI, this was an EFS issue. I originally dismissed EFS as the cause because the Percent I/O limit metric was very low, but I later noticed that the throughput utilization was very high. We increased the provisioned throughput, and the checkpoint times are greatly reduced. From: Colletta,

Re: How does Flink handle shorted lived keyed streams

2020-12-24 Thread Xintong Song
I believe what you are looking for is the State TTL [1][2]. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl [2]
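For reference, a minimal sketch of the State TTL configuration from [1] (TTL value and state name are placeholders):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.hours(1))                                 // expire entries after 1 h
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite) // reset TTL on every write
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .build();

ValueStateDescriptor<String> desc = new ValueStateDescriptor<>("last-event", String.class);
desc.enableTimeToLive(ttlConfig); // state for short-lived keys is cleaned up automatically
```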