Flink application kept restarting

2021-02-25 Thread Rainie Li
Hi All, Our flink application kept restarting and it made lots of RPC calls to a dependency service. We saw this exception in the failed task manager log: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator at

Re: Permission management for Flink SQL integrated with Hive

2021-02-25 Thread Rui Li
Hi, the hive connector does not support Ranger yet; it only supports storage-based authorization on the HMS side. On Thu, Feb 25, 2021 at 8:49 PM 阿华田 wrote: > We are building a real-time computing platform on Flink SQL and use the Hive catalog to manage Flink SQL metadata. Hive itself uses Ranger for permission management, but when Flink SQL manages metadata through the Hive catalog, Ranger does not seem to be able to manage user permissions. Does anyone know of a good way to solve this? > 阿华田 > a15733178...@163.com

Re: Flink SQL application question

2021-02-25 Thread Smile
Hi, for the metrics question: open the Metrics tab of each individual operator and look at its numRecordsIn and numRecordsOut to find the first operator that has input but produces no output. The metrics you pasted look like they come from the Overview page, which reports them for the whole operator chain as a unit. GroupWindowAggregate(groupBy=[stkcode], window=[TumblingGroupWindow('w$, matchtime, 6)], properties=[w$start, w$end, w$rowtime, w$proctime],

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-02-25 Thread Kurt Young
Hi Timo, First of all I want to thank you for introducing this planner design back in 1.9; this is great work that allowed lots of Blink features to be merged into Flink in a reasonably short time. It greatly accelerated the evolution speed of Table & SQL. Everything comes with a cost, as you

Flink SQL application question

2021-02-25 Thread yinghua...@163.com
We are using Flink SQL to compute each stock's trading volume within one-minute windows. The SQL is as follows: CREATE TABLE t_stock_match_p_1( id VARCHAR, stkcode INT, volume INT, matchtime TIMESTAMP, WATERMARK FOR matchtime as matchtime ) WITH ( 'connector' = 'kafka-0.10', 'topic' = 'stock_match_p_1', 'scan.startup.mode' = 'latest-offset',
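
For reference, a minimal sketch of the one-minute aggregation this DDL appears to feed (the query itself is cut off in the archive). The universal 'kafka' connector, bootstrap servers and json format below are assumptions (the thread uses 'kafka-0.10'), and running it needs a broker plus the Kafka connector jar:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class StockVolumePerMinute {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Mirrors the DDL quoted above; TIMESTAMP(3) because a rowtime attribute
        // needs millisecond precision.
        tEnv.executeSql(
            "CREATE TABLE t_stock_match_p_1 (" +
            "  id VARCHAR, stkcode INT, volume INT, matchtime TIMESTAMP(3)," +
            "  WATERMARK FOR matchtime AS matchtime" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'stock_match_p_1'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'latest-offset'," +
            "  'format' = 'json')");

        // Per-stock trading volume over one-minute event-time tumbling windows.
        tEnv.executeSql(
            "SELECT stkcode," +
            "       TUMBLE_START(matchtime, INTERVAL '1' MINUTE) AS w_start," +
            "       SUM(volume) AS total_volume " +
            "FROM t_stock_match_p_1 " +
            "GROUP BY stkcode, TUMBLE(matchtime, INTERVAL '1' MINUTE)").print();
    }
}
```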

Where to place Flink connectors when deploying

2021-02-25 Thread automths
Hi: With Flink on YARN I may need many connectors (HBase, Kafka, ES, MySQL). I would like to prepare these connectors and their dependencies in the Flink installation ahead of time, so that user jobs no longer need to bundle them. I know they can be placed in the /lib directory of the Flink distribution, but that looks messy and is hard to manage; some connectors also share dependencies, and version mismatches may cause conflicts. I noticed there is a plugins directory and prepared it according to the README inside, but at runtime the jars did not take effect on the class path.

Re: Flink SQL parallelism question

2021-02-25 Thread Smile
Hi Jeff, For SQL you can currently only set the parallelism of the whole SQL job; you cannot raise the parallelism of a single operator. You could, however, implement the Kafka-consuming part with the DataStream API and then convert the DataStream into a Table to run the SQL, so that the Kafka consumer parallelism and the SQL parallelism can be configured separately. Another thought: if there is hash-style redistribution (e.g. a Group By) between your Kafka source and the UDF, perhaps you can ignore the Kafka
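
A minimal sketch of the suggested split (topic, address and partition count are hypothetical; assumes the flink-connector-kafka dependency): the Kafka source gets its own parallelism via the DataStream API, while the SQL runs at the environment's default parallelism.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import static org.apache.flink.table.api.Expressions.$;

public class KafkaToSqlSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // default parallelism used by the SQL operators

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed address
        props.setProperty("group.id", "demo");

        DataStream<String> raw = env
            .addSource(new FlinkKafkaConsumer<>("input_topic", new SimpleStringSchema(), props))
            .setParallelism(12); // e.g. match the Kafka partition count, independent of SQL parallelism

        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
        Table t = tEnv.fromDataStream(raw, $("line"));
        tEnv.createTemporaryView("kafka_lines", t);
        tEnv.executeSql("SELECT line, CHAR_LENGTH(line) FROM kafka_lines").print();
    }
}
```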

Re: Delayed dimension-table join in Flink

2021-02-25 Thread Smile
We have also run into dimension-table joins where the dimension data arrives later than the stream, but our streams usually have a unique key, so for now we use a session window to control how long the stream waits before joining. The idea: group the stream by its unique key and open a processing-time session window whose gap equals the desired delay. Because we group by a unique key, each session window contains exactly one row, so it always fires once the configured gap has elapsed. For example, to join the dimension table one minute after a stream record arrives, the SQL looks like: -- original SQL SELECT s.id, s.another_field, r.dim_id_value
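
A runnable sketch of the delay trick under assumed names (datagen stands in for the real source; the subsequent dimension lookup join is not shown): grouping a unique key over a processing-time session window emits each row roughly one gap-length after it arrives, and SESSION_PROCTIME keeps a time attribute for a later lookup join.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DelayedStreamSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        tEnv.executeSql(
            "CREATE TABLE src (" +
            "  id STRING, another_field STRING," +
            "  proctime AS PROCTIME()" +
            ") WITH ('connector' = 'datagen')");

        // Each id is (assumed) unique, so every session window holds exactly one row
        // and fires once the 1-minute gap has passed, i.e. ~1 minute after arrival.
        // SESSION_PROCTIME keeps a processing-time attribute for a later lookup join.
        tEnv.executeSql(
            "SELECT id," +
            "       LAST_VALUE(another_field) AS another_field," +
            "       SESSION_PROCTIME(proctime, INTERVAL '1' MINUTE) AS proctime " +
            "FROM src " +
            "GROUP BY id, SESSION(proctime, INTERVAL '1' MINUTE)").print();
    }
}
```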

Flink 1.12 sql_query(): same source tables, PyFlink takes 9 min while Java takes 3 s

2021-02-25 Thread xiaoyue
Has anyone run into this problem? In a streaming environment I connect to a MySQL database and use DDL to define two source tables, source1 and source2. sql = "SELECT ID, NAME, IP, PHONE FROM source1 JOIN source2 ON source1.ID = source2.ID WHERE ID = '123456' AND DATE BETWEEN '20160701' AND '20170307'" # get the query result query_table = env.sql_query(sql) query_table.to_pandas()

Re: org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-25 Thread sofya
What was the actual solution? Did you have to modify pom? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Flink 1.11.2: reading from Kafka and writing to HDFS fails after running for a while

2021-02-25 Thread 史 正超
Flink version 1.11.2. The job reads from Kafka and writes to HDFS; after running for a while it fails. The SQL and logs are as follows: ```sql CREATE TABLE T_ED_CELL_NUM_INFO_SRC( bigBox STRING, edCode STRING, mBoxAmount INT, mFaultBoxAmount INT, mFaultReserveBoxAmount INT, mReserveBoxAmount INT, mUseReserveBox INT, mUsedBoxCount INT,

Question about restoring from a Flink checkpoint

2021-02-25 Thread flink2021
May I ask: if my code logic has changed, can I still restore from an old checkpoint? I tried it, and after restoring from a historical checkpoint my modified logic was not applied. For example, the original logic was where flag=1 and I changed it to where flag>1; if I restore from checkpoint chk100 taken at 9:30, will the where flag>1 logic only take effect on data arriving after 9:30? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: FLINK consuming Kafka throws java.lang.OutOfMemoryError: Direct buffer memory

2021-02-25 Thread flink2021
We set taskmanager.memory.framework.off-heap.size to 512MB and use RocksDB for state storage; it is very stable now. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Mail from 徐嘉培

2021-02-25 Thread 徐嘉培
Unsubscribe

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-25 Thread Rion Williams
 Hi David, Thanks for your prompt reply, it was very helpful and the PseudoWindow example is excellent. I believe it closely aligns with an approach that I was tinkering with but seemed to be missing a few key pieces. In my case, I'm essentially going to want to be aggregating the messages

Best way to implemented non-windowed join

2021-02-25 Thread Yaroslav Tkachenko
Hello everyone, I have a question about implementing a join of N datastreams (where N > 2) without any time guarantees. According to my requirements, late data is not tolerable, so if I have a join between stream A and stream B and a message with key X arrives in stream B one year after arriving
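
One common pattern for this, sketched below (not necessarily what Yaroslav ends up with): connect two keyed streams and buffer each side in keyed state without any timers, so nothing is ever dropped as late; for N streams the same operator can be chained pairwise.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

// Buffers the latest value of each side per key forever and emits a pair
// whenever both sides are present. No time guarantees, no late-data handling.
public class StatefulJoin extends KeyedCoProcessFunction<String, String, String, Tuple2<String, String>> {

    private transient ValueState<String> left;
    private transient ValueState<String> right;

    @Override
    public void open(Configuration parameters) {
        left = getRuntimeContext().getState(new ValueStateDescriptor<>("left", String.class));
        right = getRuntimeContext().getState(new ValueStateDescriptor<>("right", String.class));
    }

    @Override
    public void processElement1(String value, Context ctx, Collector<Tuple2<String, String>> out) throws Exception {
        left.update(value);
        if (right.value() != null) {
            out.collect(Tuple2.of(value, right.value()));
        }
    }

    @Override
    public void processElement2(String value, Context ctx, Collector<Tuple2<String, String>> out) throws Exception {
        right.update(value);
        if (left.value() != null) {
            out.collect(Tuple2.of(left.value(), value));
        }
    }
}
```

It would be applied as streamA.keyBy(...).connect(streamB.keyBy(...)).process(new StatefulJoin()); the trade-off is that state grows without bound unless state TTL is configured.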

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-25 Thread David Anderson
Rion, What you want isn't really achievable with the APIs you are using. Without some sort of per-key (per-tenant) watermarking -- which Flink doesn't offer -- the watermarks and windows for one tenant can be held up by the failure of another tenant's events to arrive in a timely manner.
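
A minimal sketch of the kind of per-tenant "pseudo window" being referred to, assuming processing-time flushing is acceptable (names and types are illustrative, not from the thread): a KeyedProcessFunction keyed by tenant buffers events and flushes them on its own timer, independent of the global watermark, so one tenant's idleness cannot hold up the others.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class PerTenantPseudoWindow extends KeyedProcessFunction<String, String, List<String>> {

    private static final long FLUSH_DELAY_MS = 60_000; // flush one minute after the first buffered event

    private transient ListState<String> buffer;
    private transient ValueState<Long> timerTs;

    @Override
    public void open(Configuration parameters) {
        buffer = getRuntimeContext().getListState(new ListStateDescriptor<>("buffer", String.class));
        timerTs = getRuntimeContext().getState(new ValueStateDescriptor<>("timer", Long.class));
    }

    @Override
    public void processElement(String event, Context ctx, Collector<List<String>> out) throws Exception {
        buffer.add(event);
        if (timerTs.value() == null) { // first event of this pseudo window for this tenant
            long ts = ctx.timerService().currentProcessingTime() + FLUSH_DELAY_MS;
            ctx.timerService().registerProcessingTimeTimer(ts);
            timerTs.update(ts);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<List<String>> out) throws Exception {
        List<String> events = new ArrayList<>();
        for (String e : buffer.get()) {
            events.add(e);
        }
        out.collect(events); // emit the tenant's buffered "window" and reset
        buffer.clear();
        timerTs.clear();
    }
}
```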

Handling Data Separation / Watermarking from Kafka in Flink

2021-02-25 Thread Rion Williams
Hey folks, I have a somewhat high-level/advice question regarding Flink and if it has the mechanisms in place to accomplish what I’m trying to do. I’ve spent a good bit of time using Apache Beam, but recently pivoted over to native Flink simply because some of the connectors weren’t as mature or

Unsubscribe

2021-02-25 Thread uuid
Unsubscribe. Sent from my iPhone

Re: Is it possible to specify max process memory in flink 1.8.2, similar to what is possible in flink 1.11

2021-02-25 Thread Bariša
Small update: we believe that the off-heap memory is used by the Parquet writer (used in the sink to write to S3). On Wed, 24 Feb 2021 at 23:25, Bariša wrote: > I'm running flink 1.8.2 in a container, and under heavy load, the container > gets OOM from the kernel. > I'm guessing that the reason for

Re: Upgrading Kafka caused a Flink job to fail

2021-02-25 Thread freeza1...@outlook.com
Addendum: the early-morning upgrade was to Kafka. freeza1...@outlook.com From: freeza1...@outlook.com Sent: 2021-02-25 22:23 To: user-zh Subject: Upgrading Kafka caused a Flink job to fail Hi: We performed an upgrade in production in the early morning. There are 13 jobs online and only the one named gift failed; the others were fine, which is strange. The logs show the messages below. How can this kind of problem be avoided? Does Flink need any special settings? JM log [2021-02-08

Upgrading Kafka caused a Flink job to fail

2021-02-25 Thread freeza1...@outlook.com
Hi: We performed an upgrade in production in the early morning. There are 13 jobs online and only the one named gift failed; the others were fine, which is strange. The logs show the messages below. How can this kind of problem be avoided? Does Flink need any special settings? JM log [2021-02-08 00:49:23.714][org.apache.flink.runtime.checkpoint.CheckpointCoordinator][]Completed checkpoint 1435986 for job d31fe16e0525e20a19dc79c88ab958a2 (10872 bytes in 41 ms).

Permission management for Flink SQL integrated with Hive

2021-02-25 Thread 阿华田
We are building a real-time computing platform on Flink SQL and use the Hive catalog to manage Flink SQL metadata. Hive itself uses Ranger for permission management, but when Flink SQL manages metadata through the Hive catalog, Ranger does not seem to be able to manage user permissions. Does anyone know of a good way to solve this? 阿华田 a15733178...@163.com

Re: Flink custom trigger use case

2021-02-25 Thread Roman Khachatryan
Hi, Yes, you have an Iterable with window elements as the ProcessWindowFunction input. You can then emit them individually. Regards, Roman On Thu, Feb 25, 2021 at 7:22 AM Diwakar Jha wrote: > Hello, > > I tried using *processWindowFunction* since it gives access to > *globalstate* through
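
A minimal illustration of the point above (types are placeholders): the window's buffered elements arrive as an Iterable, and each one can be forwarded on its own.

```java
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class EmitEachElement extends ProcessWindowFunction<String, String, String, TimeWindow> {
    @Override
    public void process(String key, Context ctx, Iterable<String> elements, Collector<String> out) {
        for (String element : elements) {
            out.collect(element); // forward every buffered element of the window individually
        }
    }
}
```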

Flink 1.12.0 Distinct Aggregation

2021-02-25 Thread guanyq
The code is attached; it follows the demo from the official documentation. I can't tell where the problem is, could someone take a look? root |-- orderId: STRING |-- userId: INT |-- money: INT |-- createTime: BIGINT |-- pt: TIMESTAMP(3) *PROCTIME* 17:17:11,935 INFO org.apache.flink.api.java.typeutils.TypeExtractor [] - class org.apache.flink.types.Row is missing a
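
The attached code isn't reproduced in the archive; for reference, a sketch of a distinct aggregation over the schema printed above, with datagen standing in for the real source (connector and window size are assumptions):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DistinctAggSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Schema mirrors the printed root schema above; datagen is just a stand-in source.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  orderId STRING, userId INT, money INT, createTime BIGINT," +
            "  pt AS PROCTIME()" +
            ") WITH ('connector' = 'datagen')");

        // Distinct aggregates over one-minute processing-time tumbling windows.
        tEnv.executeSql(
            "SELECT TUMBLE_START(pt, INTERVAL '1' MINUTE) AS w_start," +
            "       COUNT(DISTINCT userId) AS distinct_users," +
            "       SUM(DISTINCT money)    AS distinct_money " +
            "FROM orders " +
            "GROUP BY TUMBLE(pt, INTERVAL '1' MINUTE)").print();
    }
}
```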

Re: Jackson object serialisations

2021-02-25 Thread Lasse Nedergaard
Thanks for your feedback. I'll go with specific Kryo serialisation as it makes the code easier to use, and if I encounter performance problems I can change the data format later. Best regards Lasse Nedergaard > On 24 Feb 2021, at 17.44, Maciej Obuchowski wrote > : > > Hey
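
Lasse's serializer isn't shown; a minimal sketch of what a "specific Kryo serialisation" for a Jackson ObjectNode could look like, assuming a JSON-string round trip is acceptable:

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Custom Kryo serializer that round-trips Jackson ObjectNodes as JSON strings.
public class ObjectNodeKryoSerializer extends Serializer<ObjectNode> {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public void write(Kryo kryo, Output output, ObjectNode node) {
        output.writeString(node.toString()); // serialize as compact JSON text
    }

    @Override
    public ObjectNode read(Kryo kryo, Input input, Class<ObjectNode> type) {
        try {
            return (ObjectNode) MAPPER.readTree(input.readString());
        } catch (Exception e) {
            throw new RuntimeException("Failed to deserialize ObjectNode", e);
        }
    }
}
```

It would then be registered with env.getConfig().registerTypeWithKryoSerializer(ObjectNode.class, ObjectNodeKryoSerializer.class).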

Are groupBy and keyBy used differently?

2021-02-25 Thread 李继
case class Student(name: String, age: Int,teacher:Teacher) case class Teacher(name:String,room:(Int,Int,Int),salary:Int) def main(args: Array[String]): Unit = { val teacher = Teacher("teacher-w",(1,2,3),99) val students = List(Student("a",11,teacher),Student("b",22,teacher)) val benv =
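
For comparison, a minimal Java sketch of where each call lives (data is illustrative): groupBy belongs to the batch DataSet API, keyBy to the DataStream API, and both accept a key selector.

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class GroupByVsKeyBy {
    public static void main(String[] args) throws Exception {
        // Batch: DataSet#groupBy, grouping on the first tuple field.
        ExecutionEnvironment batch = ExecutionEnvironment.getExecutionEnvironment();
        batch.fromElements(Tuple2.of("a", 11), Tuple2.of("b", 22), Tuple2.of("a", 33))
             .groupBy(0)
             .sum(1)
             .print();

        // Streaming: DataStream#keyBy, keying on the first tuple field via a KeySelector.
        StreamExecutionEnvironment streaming = StreamExecutionEnvironment.getExecutionEnvironment();
        streaming.fromElements(Tuple2.of("a", 11), Tuple2.of("b", 22), Tuple2.of("a", 33))
                 .keyBy(value -> value.f0)
                 .sum(1)
                 .print();
        streaming.execute("keyBy demo");
    }
}
```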

Re: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-25 Thread Jan Oelschlegel
Hi Benchao, I'm observing this behaviour only with the SQL API. With the DataStream API I can use more or fewer source tasks than the Kafka partition count. And FLIP-27 seems to belong to the DataStream API. The problem is only on the SQL side. Best, Jan From: Benchao Li Sent: Thursday,