Re: Does the Kafka source perform retractions on Key?

2021-02-26 Thread Rex Fenley
Digging around, it looks like Upsert Kafka, which requires a primary key, will actually do what I want and uses compaction, but it doesn't look compatible with the Debezium format? Is this on the roadmap? In the meantime, we're considering consuming from Debezium Kafka (still compacted) and then

java Flink local test failure (Could not create actor system)

2021-02-26 Thread Vijayendra Yadav
Hi Team, While running a Java Flink project locally, I am facing the following issue: *Could not create actor system; Caused by: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V* Could you suggest whether a Flink Java project needs Scala at run time? What versions might be

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-26 Thread Rion Williams
David and Timo, Firstly, thank you both so much for your contributions and advice. I believe I’ve implemented things along the lines that you both detailed and things appear to work just as expected (e.g. I can see things arriving, being added to windows, discarding late records, and

[Statefun] Exception occurs during function chaining / Async function

2021-02-26 Thread Le Xu
Hello! I'm getting an exception running a modified version of datastream/statefun example. (See exception details that follow.) The example was adapted from the original datastream example provided in statefun repo. I was trying to play with the example by chaining two functions (with the 1st

Re: Sliding Window Count: Tricky Edge Case / Count Zero Problem

2021-02-26 Thread Jan Brusch
Hi everybody, I just wanted to say thanks again for all your input and share the (surprisingly simple) solution that we came up with in the meantime: class SensorRecordCounter extends KeyedProcessFunction<SensorRecord, SensorCount> { private ValueState state; private long windowSizeMs =

Re: Kafka SQL Connector: dropping events if more partitions then source tasks

2021-02-26 Thread Shengkai Fang
Hi Jan. Thanks for your reply. Do you set the option `table.exec.source.idle-timeout` and `pipeline.auto-watermark-interval` ? If the `pipeline.auto-watermark-interval ` is zero, it will not trigger the detection of the idle source. Best, Shengkai Jan Oelschlegel 于2021年2月26日周五 下午11:09写道: >
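The two options Shengkai mentions can be set in the SQL client; a minimal sketch, with illustrative values that are assumptions, not from the thread:

```sql
-- Mark a source subtask idle after 10 s without records, so its
-- missing watermark no longer holds back downstream operators.
SET table.exec.source.idle-timeout=10 s;
-- Must be non-zero, otherwise idle-source detection never triggers.
SET pipeline.auto-watermark-interval=200 ms;
```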

Re: How to pass PROCTIME through an aggregate

2021-02-26 Thread Rex Fenley
This is great Timo. Maybe it only works in SQL but not Table API in the middle of a plan, which is fine. We'll give this a shot, thank you so much. On Fri, Feb 26, 2021 at 2:00 AM Timo Walther wrote: > Hi Rex, > > as far as I know, we recently allowed PROCTIME() also at arbitrary > locations in

Re: Processing-time temporal join is not supported yet.

2021-02-26 Thread Leonard Xu
Hi Eric, Firstly, FileSystemTableSource does not implement LookupTableSource, which means we cannot directly look up a Filesystem table. In FLINK-19830, we plan to support processing-time temporal joins against any table/view by looking up, in the join operator's state, the data scanned from the filesystem

Re: Producer Configuration

2021-02-26 Thread Alexey Trenikhun
I believe bootstrap.servers is a mandatory Kafka property, but it looks like you didn’t set it. From: Claude M Sent: Friday, February 26, 2021 12:02:10 PM To: user Subject: Producer Configuration Hello, I created a simple Producer and when the job ran, it was

Re: org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-26 Thread Timo Walther
If this problems affects multiple people, feel free to open an issue that explains how to easily reproduce the problem. This helps us or contributors to provide a fix. Regards, Timo On 26.02.21 05:08, sofya wrote: What was the actual solution? Did you have to modify pom? -- Sent from:

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-26 Thread Timo Walther
Hi Rion, I think what David was refering to is that you do the entire time handling yourself in process function. That means not using the `context.timerService()` or `onTimer()` that Flink provides but calling your own logic based on the timestamps that enter your process function and the

Re: Is it possible to specify max process memory in flink 1.8.2, similar to what is possible in flink 1.11

2021-02-26 Thread Timo Walther
Hi Barisa, by looking at the 1.8 documentation [1] it was possible to configure the off heap memory as well. Also other memory options were already present. So I don't think that you need an upgrade to 1.11 immediately. Please let us know if you could fix your problem, otherwise we can try to

Re: BroadcastState dropped when data deleted in Kafka

2021-02-26 Thread Arvid Heise
Hi, I have no idea what's going on. There is no mechanism in DataStream to react to deleted records. Can you reproduce it locally and debug through it? On Wed, Feb 24, 2021 at 5:21 PM bat man wrote: > Hi Arvid, > > The Flink application was not re-started. I had checked on that. > By adding

Flink CEP: can't process PatternStream

2021-02-26 Thread Люльченко Юрий Николаевич
Hello everyone.   I’m trying to use the Flink CEP library and I want to fetch some events by pattern. At first I created a simple HelloWorld project, but I have a problem exactly like the one described here:  https://stackoverflow.com/questions/39575991/flink-cep-no-results-printed   You can see my

Re: org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-26 Thread Timo Walther
Until we have more information, maybe this is also helpful: https://ci.apache.org/projects/flink/flink-docs-stable/ops/debugging/debugging_classloading.html#inverted-class-loading-and-classloader-resolution-order On 26.02.21 09:20, Timo Walther wrote: If this problems affects multiple people,

Re: BackPressure in RowTime Task of FlinkSql Job

2021-02-26 Thread Timo Walther
Hi Aeden, the rowtime task is actually just a simple map function that extracts the event-time timestamp into a field of the row for the next operator. It should not be the problem. Can you share a screenshot of your pipeline? What is your watermarking strategy? Is it possible that you are

Re: Flink CEP: can't process PatternStream

2021-02-26 Thread Dawid Wysakowicz
Hi Yuri, Which Flink version are you using? Is it 1.12? In 1.12 we changed the default TimeCharacteristic to EventTime. Therefore you need watermarks and timestamp[1] for your program to work correctly. If you want to apply your pattern in ProcessingTime you can do: PatternStream patternStream =
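The snippet Dawid started can be completed roughly as follows; a sketch only, assuming a Flink 1.12 `DataStream<Event> input` and a `Pattern<Event, ?> pattern` are already defined:

```java
// Apply the pattern in processing time instead of the 1.12
// default of event time, so no watermarks are required.
PatternStream<Event> patternStream =
    CEP.pattern(input, pattern).inProcessingTime();
```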

Re: Best way to implemented non-windowed join

2021-02-26 Thread Timo Walther
Hi Yaroslav, I think your approach is correct. Union is perfect to implement multiway joins if you normalize the type of all streams before. It can simply be a composite type with the key and a member variable for each stream where only one of those variables is not null. A keyed process

Re: Does the Kafka source perform retractions on Key?

2021-02-26 Thread Arvid Heise
Just to clarify: intermediate topics should in most cases not be compacted, for exactly these reasons, if your application depends on all intermediate data. For the final topic, it makes sense. If you also consume intermediate topics for a web application, one solution is to split it into two topics

Processing-time temporal join is not supported yet.

2021-02-26 Thread eric hoffmann
Hello, Working with Flink 1.12.1, I read in the docs that a processing-time temporal join is supported for KV-like joins, but when I try it I get: Exception in thread "main" org.apache.flink.table.api.TableException: Processing-time temporal join is not supported yet. at

Re: How to pass PROCTIME through an aggregate

2021-02-26 Thread Timo Walther
Hi Rex, as far as I know, we recently allowed PROCTIME() also at arbitrary locations in the query. So you don't have to pass it through the aggregate but you can call it afterwards again. Does that work in your use case? Something like: SELECT i, COUNT(*) FROM customers GROUP BY i,
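Timo's suggestion could look roughly like this; a sketch only, with the table and column names taken from his truncated example:

```sql
-- Call PROCTIME() after the aggregate instead of passing it through it.
SELECT i, cnt, PROCTIME() AS proc_time
FROM (
  SELECT i, COUNT(*) AS cnt
  FROM customers
  GROUP BY i
);
```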

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-26 Thread David Anderson
Yes indeed, Timo is correct -- I am proposing that you not use timers at all. Watermarks and event-time timers go hand in hand -- and neither mechanism can satisfy your requirements. You can instead put all of the timing logic in the processElement method -- effectively emulating what you would

Re: BackPressure in RowTime Task of FlinkSql Job

2021-02-26 Thread Aeden Jameson
>>Is it possible that you are generating too many watermarks that need to be sent to all downstream tasks? This was basically it. I had unexpected flooding on specific keys, which I am guessing caused intermittently hot partitions that were back-pressuring the rowtime task. I do have another question: how

Re: Does the Kafka source perform retractions on Key?

2021-02-26 Thread Rex Fenley
Does this also imply that it's not safe to compact the initial topic where data is coming from Debezium? I'd think that Flink's Kafka source would emit retractions on any existing data with a primary key as new data with the same pk arrived (in our case all data has primary keys). I guess that

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Matthias Pohl
Hi Debraj, thanks for reaching out to the Flink community. Without knowing the details on how you've set up the Single-Node YARN cluster, I would still guess that it is a configuration issue on the YARN side. Flink does not know about a .flink folder. Hence, there is no configuration to set this

Flink CEP: can't process PatternStream (v 1.12, EventTime mode)

2021-02-26 Thread Люльченко Юрий Николаевич
Hello,   I’ve already asked this question today and got the solution:  http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-CEP-can-t-process-PatternStream-td41722.html , and it’s clear to me now how PatternStream works with ProcessTime.   But I need help again: I can’t write

Re: Flink CEP: can't process PatternStream (v 1.12, EventTime mode)

2021-02-26 Thread Dawid Wysakowicz
Hi, What is exactly the problem? Is it that no patterns are being generated? Usually the problem is in idle parallel instances[1]. You need to have data flowing in each of the parallel instances for a watermark to progress. You can also read about it in the aspect of Kafka's partitions[2].

Re: Flink CEP: can't process PatternStream

2021-02-26 Thread Maminspapin
Hello, David. Yes, I’m using 1.12. And my code is now working. Thank you very much for your comment. Yuri L. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Debraj Manna
In my setup hadoop-yarn-nodemenager is running with yarn user. ubuntu@vrni-platform:/tmp/flink$ ps -ef | grep nodemanager yarn 4953 1 2 05:53 ?00:11:26 /usr/lib/jvm/java-8-openjdk/bin/java -Dproc_nodemanager -XX:+HeapDumpOnOutOfMemoryError

Re[2]: Flink CEP: can't process PatternStream (v 1.12, EventTime mode)

2021-02-26 Thread Люльченко Юрий Николаевич
David,   Thank you again for your reply. It really looks like this situation happened because of the parallel instances.   Best, Yuri L.   >Friday, February 26, 2021, 15:40 +03:00 from Dawid Wysakowicz >: >  >Hi, >What is exactly the problem? Is it that no patterns are being generated? >Usually

Re: Flink application kept restarting

2021-02-26 Thread Matthias Pohl
Hi Rainie, the network buffer pool was destroyed for some reason. This happens when the NettyShuffleEnvironment gets closed which is triggered when an operator is cleaned up, for instance. Maybe, the timeout in the metric system caused this. But I'm not sure how this is connected. I'm gonna add

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-02-26 Thread Seth Wiesman
Strong +1 Having two planners is confusing to users and the diverging semantics make it difficult to provide useful learning material. It is time to rip the bandage off. Seth On Fri, Feb 26, 2021 at 12:54 AM Kurt Young wrote: > change.> > > Hi Timo, > > First of all I want to thank you for

Re: apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-02-26 Thread Matthias Pohl
Hi Abhishek, this might be caused by the switch from log4j to log4j2 as the default in Flink 1.11 [1]. Have you had a chance to look at the logging documentation [2] to enable log4j again? Best, Matthias [1]
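For reference, getting logging back under Flink 1.11+'s log4j2 default usually means adjusting `conf/log4j.properties`; a minimal sketch, where the levels and pattern are assumptions (see the linked documentation for the authoritative format):

```properties
# Minimal log4j2 properties sketch; Flink 1.11+ reads this file via log4j2.
rootLogger.level = INFO
rootLogger.appenderRef.console.ref = ConsoleAppender
appender.console.name = ConsoleAppender
appender.console.type = CONSOLE
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```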

AW: Kafka SQL Connector: dropping events if more partitions then source tasks

2021-02-26 Thread Jan Oelschlegel
Hi Shengkai, I’m using Flink 1.11.2. The problem is that if I use a parallelism higher than my Kafka partition count, the watermarks are not increasing, and so the windows never get fired. I suspect that a source task is then not marked as idle and thus the watermark is not advanced. In any

Re: Kafka SQL Connector: dropping events if more partitions then source tasks

2021-02-26 Thread Shengkai Fang
Hi, Jan. Could you tell us which Flink version you use? As far as I know, the kafka sql connector has implemented `SupportWatermarkPushDown` in Flink-1.12. The `SupportWatermarkPushDown` pushes the watermark generator into the source and emits the minimum watermark among all the partitions. For

Re: Get JobId and JobManager RPC Address in RichMapFunction executed in TaskManager

2021-02-26 Thread Matthias Pohl
Hi Sandeep, thanks for reaching out to the community. Unfortunately, the information you're looking for is not exposed in a way that you could access it from within your RichMapFunction. Could you elaborate a bit more on what you're trying to achieve? Maybe, we can find another solution for your

Re: Processing-time temporal join is not supported yet.

2021-02-26 Thread Matthias Pohl
Hi Eric, it looks like you ran into FLINK-19830 [1]. I'm gonna add Leonard to the thread. Maybe, he has a workaround for your case. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-19830 On Fri, Feb 26, 2021 at 11:40 AM eric hoffmann wrote: > Hello > Working with flink 1.12.1 i

Re: Flink application kept restarting

2021-02-26 Thread Rainie Li
Thank you Mattias. It’s version1.9. Best regards Rainie On Fri, Feb 26, 2021 at 6:33 AM Matthias Pohl wrote: > Hi Rainie, > the network buffer pool was destroyed for some reason. This happens when > the NettyShuffleEnvironment gets closed which is triggered when an operator > is cleaned up,

Producer Configuration

2021-02-26 Thread Claude M
Hello, I created a simple Producer and when the job ran, it was getting the following error: Caused by: org.apache.kafka.common.errors.TimeoutException I read about increasing the request.timeout.ms. Thus, I added the following properties. Properties properties = new Properties();
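The producer configuration under discussion would look roughly like this; a sketch with a placeholder broker address, combining the timeout Claude raised with the `bootstrap.servers` setting that Alexey's reply points out is mandatory:

```properties
# Placeholder broker address; bootstrap.servers is mandatory for a producer.
bootstrap.servers=localhost:9092
# Raised request timeout, per the thread (the value is an assumption).
request.timeout.ms=60000
```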

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Debraj Manna
Thanks, Matthias, for replying. Yes, there was a YARN configuration issue on my side, which I mentioned in my last email. I am just getting started with Flink. So, just for my understanding: in a few links (posted below) it is reported that Flink needs to create a .flink directory in the user's home folder. Even

Re: Is it possible to specify max process memory in flink 1.8.2, similar to what is possible in flink 1.11

2021-02-26 Thread Matthias Pohl
Hi Bariša, have you had a chance to analyze the memory usage in more detail? An OutOfMemoryError might be an indication of a memory leak, which should be solved instead of lowering some memory configuration parameters. Or is it that the off-heap memory is not actually used but blocks the JVM

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Matthias Pohl
Hi Debraj, sorry for misleading you at first. You're right. I looked through the code once more and found something: there's the yarn.staging-directory [1], which is set to the user's home folder by default. This parameter is used by the YarnApplicationFileUploader [2] to upload the application files.

Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-02-26 Thread Alexey Trenikhun
Hello, We have a Flink job running in Kubernetes with Kubernetes HA enabled (the JM is deployed as a Job, a single TM as a StatefulSet). We took a savepoint with cancel=true. Now when we try to start the job using --fromSavepoint A, where A is the path we got from taking the savepoint (ClusterEntrypoint reports

Question about slow Flink checkpoints

2021-02-26 Thread Jacob
Hi All, We have a production job that runs stably on Hadoop cluster A, and its checkpoints are fast (the checkpoint interval is 30 s, each checkpoint is a few dozen KB, and a checkpoint completes in milliseconds). The same job, with no code changes at all, was migrated to another Hadoop cluster B, and there checkpointing is very slow: a single checkpoint takes over ten minutes, which paralyzes the job, since most of the time and resources go into checkpointing instead of processing our business logic.

Re: Question about a Flink SQL application

2021-02-26 Thread 占英华
It's not a metrics display problem; the data was never written to MySQL, and there were no error logs. This morning I restarted the job and then all the data was written to MySQL. > On Feb 26, 2021, at 15:02, Smile wrote: > > Hi, > > Regarding the metrics, you can go into each operator's Metrics page and look at its numRecordsIn and numRecordsOut to see which operator starts having input but no output. > The metrics pasted above look like they are from the Overview page, which shows aggregate metrics for the whole chain. > > GroupWindowAggregate(groupBy=[stkcode],

Unsubscribe

2021-02-26 Thread 李延念
Unsubscribe

Re: Question about a Flink SQL application

2021-02-26 Thread yinghua...@163.com
I don't know whether this is the cause: in Flink's web UI I don't see a watermark value on either the source or the sink task; the source's Watermarks tab shows "No Data" and the sink shows "No Watermark". My SQL statement is as follows: CREATE TABLE t_stock_match_p_1( id VARCHAR, stkcode INT, volume INT, matchtime BIGINT, ts as TO_TIMESTAMP(FROM_UNIXTIME(matchtime/1000,'yyyy-MM-dd HH:mm:ss')),
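The missing watermarks suggest the table needs a watermark declaration on `ts`; a hedged sketch extending that DDL, where the 5-second bound and the connector option are assumptions:

```sql
CREATE TABLE t_stock_match_p_1 (
  id VARCHAR,
  stkcode INT,
  volume INT,
  matchtime BIGINT,
  ts AS TO_TIMESTAMP(FROM_UNIXTIME(matchtime/1000, 'yyyy-MM-dd HH:mm:ss')),
  -- Declare ts as the event-time column so watermarks can advance.
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka'  -- remaining connector options as in the original DDL
);
```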

Re: Permission management for Flink SQL integrated with Hive

2021-02-26 Thread 阿华田
Ok, thanks. | | 阿华田 | | a15733178...@163.com | (signature generated by NetEase Mail Master) On Feb 26, 2021 at 15:29, Rui Li wrote: Hi, the hive connector does not support Ranger yet; only storage-based authorization on the HMS side is supported. On Thu, Feb 25, 2021 at 8:49 PM 阿华田 wrote:

Re: Flink SQL 1.12 write to Hive fails with a metastore error

2021-02-26 Thread will he
I ran into the same problem. The difference is that I have a Spring Boot project submitting the SQL; it worked on 1.11.3 but stopped working after switching to 1.12.1. The sql-client itself can execute the SQL, but I cannot submit it from my Spring Boot application, and the error is the same. How did the original poster solve this in the end? I suspect a jar conflict, but I cannot yet say which jar conflicts. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Watermark advancement in downstream jobs when relaying data through Kafka

2021-02-26 Thread yidan zhao
Or, if that doesn't work, I'll just keep them merged together. But then I need to solve another problem. Problem description: when restarting from a checkpoint/savepoint, is it possible to have only the KafkaSource ignore the offsets in the checkpoint/savepoint and instead start consuming from an offset I specify? In short: I want to keep the state while skipping some of the data. Scenario: the data is delayed and I want to catch up to the latest data quickly, but I don't want to restart the job without the savepoint, because some operators' state is important; it is day-level state that needs to be kept for a whole day. yidan zhao wrote on Fri, Feb 26, 2021 at 5:48 PM: > As in the subject, my job itself consists of several consecutive window operations. >

Re: Flink SQL 1.12 write to Hive fails with a metastore error

2021-02-26 Thread will he
I ran into a similar problem. How did the original poster solve it in the end? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Watermark advancement in downstream jobs when relaying data through Kafka

2021-02-26 Thread yidan zhao
As in the subject, my job itself consists of several consecutive window operations. I now want to split it and relay the data through Kafka. The first troublesome problem is watermark advancement. A simple implementation does work, but with, say, 5-minute windows everywhere, the downstream windows fire 5 minutes late. For example, with window1 => window2 and a maxOutOfOrderness of 1 min, the [0,5) window fires when data from minute 6 arrives. After splitting, window2's [0,5) computation only fires at minute 11, when window1's output for [5,10) reaches window2.

Re: flink-Kafka error: ByteArraySerializer is not an instance of org.apache.kafka.common.serialization.Serializer

2021-02-26 Thread felixzh
hi, this is a dependency problem. If flink-connector-kafka.jar is already present under the cluster's flink/lib, the submitted job's pom should declare it with provided scope. On 2021-01-22 16:14:17, "lp" <973182...@qq.com> wrote: > Test code as follows: > -- > public class Sink_KafkaSink_1{ > public static void main(String[] args) throws Exception { > final
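felixzh's advice in Maven terms; a sketch where the artifact id and version are assumptions and should be matched to the jar actually shipped in flink/lib:

```xml
<!-- Mark the connector as provided so the job jar does not bundle
     a second copy when flink/lib already ships one. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.11</artifactId>
  <version>1.12.1</version>
  <scope>provided</scope>
</dependency>
```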
