Digging around, it looks like Upsert Kafka, which requires a primary key,
will actually do what I want and uses compaction, but it doesn't look
compatible with the Debezium format? Is this on the roadmap?
In the meantime, we're considering consuming from Debezium Kafka (still
compacted) and then
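For illustration, a minimal sketch of what an upsert-kafka table could look
like, assuming Flink 1.12's connector options; the topic, fields, and broker
address are made up:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class UpsertKafkaExample {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());
            // upsert-kafka requires a PRIMARY KEY and works with compacted topics.
            tEnv.executeSql(
                    "CREATE TABLE customers_upsert (" +
                    "  id BIGINT," +
                    "  name STRING," +
                    "  PRIMARY KEY (id) NOT ENFORCED" +
                    ") WITH (" +
                    "  'connector' = 'upsert-kafka'," +
                    "  'topic' = 'customers'," +
                    "  'properties.bootstrap.servers' = 'broker:9092'," +
                    "  'key.format' = 'json'," +
                    "  'value.format' = 'json'" +
                    ")");
        }
    }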
Hi Team,
While running a Java Flink project locally, I am facing the following
issue: *Could not create actor system; Caused by:
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V*
Could you advise whether a Flink Java project needs Scala at runtime? What
versions might be
David and Timo,
Firstly, thank you both so much for your contributions and advice. I believe
I’ve implemented things along the lines that you both detailed and things
appear to work just as expected (e.g. I can see records arriving, being
added to windows, late records being discarded, and
Hello!
I'm getting an exception running a modified version of the
datastream/statefun example. (See exception details that follow.) The
example was adapted from the original DataStream example provided in the
statefun repo. I was trying to play with the example by chaining two
functions (with the 1st
Hi everybody,
I just wanted to say thanks again for all your input and share the
(surprisingly simple) solution that we came up with in the meantime:
class SensorRecordCounter extends KeyedProcessFunction<Key, SensorRecord, SensorCount> {
    private ValueState<SensorCount> state;
    private long windowSizeMs =
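For readers of the archive, a hedged, self-contained sketch of what such a
counter could look like; the POJO fields, the String key type (lost in the
archive above), and the one-minute window size are assumptions, not the
original code:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class SensorRecordCounter
            extends KeyedProcessFunction<String, SensorRecord, SensorCount> {

        // Assumed POJOs: SensorRecord { String sensorId; long timestampMs; }
        //                SensorCount  { String sensorId; long windowStartMs; long count; }
        private final long windowSizeMs = 60_000L; // assumed window size

        private transient ValueState<SensorCount> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("window-count", SensorCount.class));
        }

        @Override
        public void processElement(SensorRecord record, Context ctx,
                                   Collector<SensorCount> out) throws Exception {
            long windowStart = record.timestampMs - (record.timestampMs % windowSizeMs);
            SensorCount current = state.value();
            if (current != null && windowStart > current.windowStartMs) {
                out.collect(current); // a newer window started: emit the finished one
                current = null;
            } else if (current != null && windowStart < current.windowStartMs) {
                return; // late record for an already-emitted window: discard
            }
            if (current == null) {
                current = new SensorCount(ctx.getCurrentKey(), windowStart, 0L);
            }
            current.count++;
            state.update(current);
        }
    }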
Hi Jan.
Thanks for your reply. Do you set the options
`table.exec.source.idle-timeout` and `pipeline.auto-watermark-interval`?
If `pipeline.auto-watermark-interval` is zero, it will not trigger the
detection of the idle source.
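A minimal sketch of setting both options, assuming a TableEnvironment named
tEnv and Flink 1.12 (the values are examples, not recommendations):

    import org.apache.flink.configuration.Configuration;

    Configuration conf = tEnv.getConfig().getConfiguration();
    conf.setString("table.exec.source.idle-timeout", "30 s");
    conf.setString("pipeline.auto-watermark-interval", "200 ms");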
Best,
Shengkai
Jan Oelschlegel wrote on Friday, February 26, 2021 at 11:09 PM:
>
This is great, Timo. Maybe it only works in SQL but not in the Table API in
the middle of a plan, which is fine. We'll give this a shot, thank you so much.
On Fri, Feb 26, 2021 at 2:00 AM Timo Walther wrote:
> Hi Rex,
>
> as far as I know, we recently allowed PROCTIME() also at arbitrary
> locations in
Hi, Eric
Firstly, FileSystemTableSource does not implement LookupTableSource, which
means we cannot directly look up a filesystem table.
In FLINK-19830, we plan to support processing-time temporal joins against
any table/view by looking up the data in the join operator's state, which is
scanned from the filesystem
I believe `bootstrap.servers` is a mandatory Kafka property, but it looks
like you didn’t set it.
From: Claude M
Sent: Friday, February 26, 2021 12:02:10 PM
To: user
Subject: Producer Configuration
Hello,
I created a simple Producer and when the job ran, it was
If this problem affects multiple people, feel free to open an issue
that explains how to easily reproduce the problem. This helps us or
contributors to provide a fix.
Regards,
Timo
On 26.02.21 05:08, sofya wrote:
What was the actual solution? Did you have to modify pom?
Hi Rion,
I think what David was referring to is that you do the entire time
handling yourself in the process function. That means not using the
`context.timerService()` or `onTimer()` that Flink provides, but calling
your own logic based on the timestamps that enter your process function
and the
Hi Barisa,
by looking at the 1.8 documentation [1] it was possible to configure the
off-heap memory as well. Other memory options were also already present.
So I don't think that you need an upgrade to 1.11 immediately. Please
let us know if you could fix your problem; otherwise we can try to
Hi,
I have no idea what's going on. There is no mechanism in DataStream to
react to deleted records.
Can you reproduce it locally and debug through it?
On Wed, Feb 24, 2021 at 5:21 PM bat man wrote:
> Hi Arvid,
>
> The Flink application was not re-started. I had checked on that.
> By adding
Hello everyone.
I’m trying to use Flink Cep library and I want to fetch some events by pattern.
At first I’ve created a simple HelloWorld project. But I have a problem exactly
like it described here:
https://stackoverflow.com/questions/39575991/flink-cep-no-results-printed
You can see my
Until we have more information, maybe this is also helpful:
https://ci.apache.org/projects/flink/flink-docs-stable/ops/debugging/debugging_classloading.html#inverted-class-loading-and-classloader-resolution-order
On 26.02.21 09:20, Timo Walther wrote:
If this problem affects multiple people,
Hi Aeden,
the rowtime task is actually just a simple map function that extracts
the event-time timestamp into a field of the row for the next operator.
It should not be the problem. Can you share a screenshot of your
pipeline? What is your watermarking strategy? Is it possible that you
are
Hi Yuri,
Which Flink version are you using? Is it 1.12? In 1.12 we changed the
default TimeCharacteristic to EventTime. Therefore you need watermarks
and timestamps [1] for your program to work correctly. If you want to
apply your pattern in ProcessingTime you can do:
PatternStream patternStream =
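A hedged completion of the truncated line above, assuming Flink 1.12's CEP
API, an input DataStream named `input`, a Pattern named `pattern`, and an
assumed `Event` type:

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;

    // Evaluate the pattern in processing time instead of event time.
    PatternStream<Event> patternStream =
            CEP.pattern(input, pattern).inProcessingTime();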
Hi Yaroslav,
I think your approach is correct. Union is perfect for implementing multiway
joins if you normalize the types of all streams beforehand. It can simply be
a composite type with the key and a member variable for each stream,
where only one of those variables is non-null. A keyed process
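A hedged sketch of the composite type and union Timo describes; the stream
and event names (`orders`, `payments`, `OrderEvent`, `PaymentEvent`) are
assumptions:

    import org.apache.flink.streaming.api.datastream.DataStream;

    // Composite record: exactly one of `order` / `payment` is non-null.
    public class Joined {
        public String key;
        public OrderEvent order;
        public PaymentEvent payment;
    }

    DataStream<Joined> unioned = orders
            .map(o -> { Joined j = new Joined(); j.key = o.key; j.order = o; return j; })
            .returns(Joined.class)
            .union(payments
                    .map(p -> { Joined j = new Joined(); j.key = p.key; j.payment = p; return j; })
                    .returns(Joined.class));
    // unioned.keyBy(j -> j.key) then feeds a keyed process function that keeps
    // the latest record per input stream in state and emits joined results.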
Just to clarify: intermediate topics should in most cases not be compacted,
for exactly these reasons, if your application depends on all intermediate
data. For the final topic, it makes sense. If you also consume intermediate
topics for a web application, one solution is to split them into two topics
Hello
Working with Flink 1.12.1, I read in the docs that the processing-time
temporal join is supported for KV-like joins, but when I try it I get:
Exception in thread "main" org.apache.flink.table.api.TableException:
Processing-time temporal join is not supported yet.
at
Hi Rex,
as far as I know, we recently allowed PROCTIME() also at arbitrary
locations in the query. So you don't have to pass it through the
aggregate but you can call it afterwards again.
Does that work in your use case? Something like:
SELECT i, COUNT(*) FROM customers GROUP BY i,
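One possible shape of the completed query (the original is truncated, so the
outer column names are guesses), called through a TableEnvironment `tEnv`:

    // PROCTIME() applied after the aggregate, per Timo's suggestion above.
    tEnv.executeSql(
            "SELECT i, cnt, PROCTIME() AS proc_time " +
            "FROM (SELECT i, COUNT(*) AS cnt FROM customers GROUP BY i)");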
Yes indeed, Timo is correct -- I am proposing that you not use timers at
all. Watermarks and event-time timers go hand in hand -- and neither
mechanism can satisfy your requirements.
You can instead put all of the timing logic in the processElement method --
effectively emulating what you would
>> Is it possible that you are generating too many watermarks that need
to be sent to all downstream tasks?
This was basically it. I had unexpected flooding on specific keys, which
I guessed was causing intermittently hot partitions that were
back-pressuring the rowtime task.
I do have another question: how
Does this also imply that it's not safe to compact the initial topic where
data is coming from Debezium? I'd think that Flink's Kafka source would
emit retractions on any existing data with a primary key as new data with
the same pk arrived (in our case all data has primary keys). I guess that
Hi Debraj,
thanks for reaching out to the Flink community. Without knowing the details
on how you've set up the Single-Node YARN cluster, I would still guess that
it is a configuration issue on the YARN side. Flink does not know about a
.flink folder. Hence, there is no configuration to set this
Hello,
I’ve already asked the question today and got the solution:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-CEP-can-t-process-PatternStream-td41722.html
, and it’s clear to me now how PatternStream works with processing time.
But I need help again: I can’t write
Hi,
What exactly is the problem? Is it that no patterns are being generated?
Usually the problem is in idle parallel instances[1]. You need to have
data flowing in each of the parallel instances for a watermark to
progress. You can also read about it in the aspect of Kafka's partitions[2].
Hello, David.
Yes, I’m using 1.12. And my code is now working. Thank you very much for
your comment.
Yuri L.
In my setup hadoop-yarn-nodemanager is running as the yarn user.
ubuntu@vrni-platform:/tmp/flink$ ps -ef | grep nodemanager
yarn 4953 1 2 05:53 ? 00:11:26
/usr/lib/jvm/java-8-openjdk/bin/java -Dproc_nodemanager
-XX:+HeapDumpOnOutOfMemoryError
David,
Thank you again for the reply. It really looks like this situation happened
because of the parallel instances.
Best,
Yuri L.
>Friday, February 26, 2021, 15:40 +03:00, from Dawid Wysakowicz:
>
>Hi,
>What is exactly the problem? Is it that no patterns are being generated?
>Usually
Hi Rainie,
the network buffer pool was destroyed for some reason. This happens when
the NettyShuffleEnvironment gets closed which is triggered when an operator
is cleaned up, for instance. Maybe, the timeout in the metric system caused
this. But I'm not sure how this is connected. I'm gonna add
Strong +1
Having two planners is confusing to users and the diverging semantics make
it difficult to provide useful learning material. It is time to rip the
bandage off.
Seth
On Fri, Feb 26, 2021 at 12:54 AM Kurt Young wrote:
> change.>
>
> Hi Timo,
>
> First of all I want to thank you for
Hi Abhishek,
this might be caused by the switch from log4j to log4j2 as the default in
Flink 1.11 [1]. Have you had a chance to look at the logging documentation
[2] to enable log4j again?
Best,
Matthias
[1]
Hi Shengkai,
I’m using Flink 1.11.2. The problem is that if I use a parallelism higher
than my Kafka partition count, the watermarks are not increasing and so the
windows never get fired.
I suspect that a source task is then not marked as idle and thus the
watermark is not advanced. In any
Hi, Jan.
Could you tell us which Flink version you use? As far as I know, the Kafka
SQL connector has implemented `SupportsWatermarkPushDown` in Flink 1.12.
`SupportsWatermarkPushDown` pushes the watermark generator into the source
and emits the minimum watermark among all the partitions. For
Hi Sandeep,
thanks for reaching out to the community. Unfortunately, the information
you're looking for is not exposed in a way that you could access it from
within your RichMapFunction. Could you elaborate a bit more on what you're
trying to achieve? Maybe, we can find another solution for your
Hi Eric,
it looks like you ran into FLINK-19830 [1]. I'm gonna add Leonard to the
thread. Maybe, he has a workaround for your case.
Best,
Matthias
[1] https://issues.apache.org/jira/browse/FLINK-19830
On Fri, Feb 26, 2021 at 11:40 AM eric hoffmann
wrote:
> Hello
> Working with flink 1.12.1 i
Thank you Matthias.
It’s version 1.9.
Best regards
Rainie
On Fri, Feb 26, 2021 at 6:33 AM Matthias Pohl
wrote:
> Hi Rainie,
> the network buffer pool was destroyed for some reason. This happens when
> the NettyShuffleEnvironment gets closed which is triggered when an operator
> is cleaned up,
Hello,
I created a simple Producer and when the job ran, it was getting the
following error:
Caused by: org.apache.kafka.common.errors.TimeoutException
I read about increasing the request.timeout.ms. Thus, I added the
following properties.
Properties properties = new Properties();
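A hedged sketch of how such properties could be wired into a producer; the
broker address, topic, and timeout value are assumptions:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "broker-1:9092"); // mandatory
    properties.setProperty("request.timeout.ms", "60000");
    FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(
            "output-topic", new SimpleStringSchema(), properties);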
Thanks Matthias for replying.
Yes, there was a YARN configuration issue on my side, which I mentioned in
my last email.
I am just starting out with Flink. So, for my understanding: in a few links
(posted below) it is reported that Flink needs to create a .flink directory
in the user's home folder. Even
Hi Bariša,
have you had the chance to analyze the memory usage in more detail? An
OutOfMemoryError might be an indication for some memory leak which should
be solved instead of lowering some memory configuration parameters. Or is
it that the off-heap memory is not actually used but blocks the JVM
Hi Debraj,
sorry for misleading you first. You're right. I looked through the code
once more and found something: There's the yarn.staging-directory [1] that
is set to the user's home folder by default. This parameter is used by the
YarnApplicationFileUploader [2] to upload the application files.
Hello,
We have a Flink job running in Kubernetes with Kubernetes HA enabled (the JM
is deployed as a Job, a single TM as a StatefulSet). We took a savepoint with
cancel=true. Now when we try to start the job using --fromSavepoint A, where
A is the path we got from taking the savepoint (ClusterEntrypoint reports
Hi All,
A job in our production environment runs stably on Hadoop cluster A, and its
checkpoints are fast (the checkpoint interval is 30 s, each checkpoint is a
few tens of KB, and one checkpoint completes in milliseconds).
The same job, with no code changes at all, was migrated to another Hadoop
cluster B, where checkpointing became extremely slow: a single checkpoint
takes more than ten minutes, paralyzing the job; most of the time and
resources go into checkpointing rather than our business logic.
It's not a problem with the metrics display; the data was simply never
written to MySQL, and there were no error logs either. This morning I
restarted the job, and then all the data was written into MySQL.
> On February 26, 2021 at 15:02, Smile wrote:
>
> Hi,
>
> Regarding the metrics, you can open each operator's Metrics page and check
> its numRecordsIn and numRecordsOut, to find which operator starts having
> input but no output.
> The metrics pasted above look like the ones from the Overview page, which
> are computed for the whole operator chain.
>
> GroupWindowAggregate(groupBy=[stkcode],
Unsubscribe
I don't know whether this problem is the cause: in Flink's web UI I can't
see watermark values for either the source or the sink task; the source's
Watermarks tab shows No Data, and the sink shows No Watermark.
My SQL statement is as follows:
CREATE TABLE t_stock_match_p_1(
id VARCHAR,
stkcode INT,
volume INT,
matchtime BIGINT,
ts as TO_TIMESTAMP(FROM_UNIXTIME(matchtime/1000,'yyyy-MM-dd HH:mm:ss')),
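A hedged sketch of how a watermark could be declared on such a table so the
UI stops showing "No Watermark"; the WITH clause is entirely assumed, since
the original message truncates before it (tEnv is an assumed TableEnvironment):

    tEnv.executeSql(
            "CREATE TABLE t_stock_match_p_1 (" +
            "  id VARCHAR," +
            "  stkcode INT," +
            "  volume INT," +
            "  matchtime BIGINT," +
            "  ts AS TO_TIMESTAMP(FROM_UNIXTIME(matchtime / 1000, 'yyyy-MM-dd HH:mm:ss'))," +
            "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" + // declares ts as rowtime
            ") WITH (" +
            "  'connector' = 'kafka'," +                       // assumed connector
            "  'topic' = 'stock_match'," +                     // assumed topic
            "  'properties.bootstrap.servers' = 'broker:9092'," +
            "  'format' = 'json'" +                            // assumed format
            ")");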
OK, thanks.
阿华田
a15733178...@163.com
On February 26, 2021 at 15:29, Rui Li wrote:
Hi,
The Hive connector doesn't support Ranger yet; it only supports
storage-based authorization on the HMS side.
On Thu, Feb 25, 2021 at 8:49 PM 阿华田 wrote:
I ran into the same problem. The difference is that in my case a Spring Boot
project submits the SQL. It works on 1.11.3 but stopped working after
switching to 1.12.1. The sql-client itself can execute the SQL, but I can't
submit it from my Spring Boot project, and the error is the same. May I ask
how the original poster solved this in the end? I suspect a dependency
conflict, but I can't yet tell which jar is conflicting.
Or, if that doesn't work, I'll just keep them merged together.
But then I need to solve another problem.
Problem description:
When restarting from a checkpoint/savepoint, is it possible to make only the
KafkaSource not resume from the offsets stored in the checkpoint/savepoint,
and instead start consuming from offsets I specify?
In short: I want to keep the state while skipping part of the data. Use
case: the data is delayed and I want to catch up to the latest data quickly,
but I don't want to restart the job without the savepoint, because the state
of some operators is important: it is day-level state that has to be kept
for a whole day.
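A hedged sketch of the relevant API, assuming the FlinkKafkaConsumer; note
that this alone does not answer the question, because offsets restored from
a checkpoint/savepoint take precedence over these start-position settings
(topic, schema, and values are assumptions):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "broker:9092"); // assumed broker
    FlinkKafkaConsumer<String> consumer =
            new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), props);
    // Start from a given epoch-millis timestamp when NOT restoring from state:
    consumer.setStartFromTimestamp(1614300000000L);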
yidan zhao wrote on Friday, February 26, 2021 at 5:48 PM:
> As the subject says, my job itself consists of multiple consecutive window
> operations.
>
I ran into a similar problem; may I ask how the original poster solved it in
the end?
As the subject says, my job itself consists of multiple consecutive window
operations.
Now I want to split it up, relaying the data through Kafka. The first
troublesome problem is watermark advancement. A naive implementation works
functionally, but if my windows are all 5 minutes, the downstream window
fires 5 minutes late. For example, with window1 => window2 and a
maxOutOfOrderness of 1 minute, the [0,5) window fires when data from minute
6 arrives. After splitting, window2's [0,5) computation only fires at minute
11, when window1's output for [5,10) reaches window2.
Hi, this is a dependency problem. If flink-connector-kafka.jar is already
present under the cluster's flink/lib, the submitted job's pom should declare
it with provided scope (see the hedged pom fragment after the quoted code
below).
On 2021-01-22 16:14:17, "lp" <973182...@qq.com> wrote:
> The test code is as follows:
> --
> public class Sink_KafkaSink_1 {
>     public static void main(String[] args) throws Exception {
>         final
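For illustration, a hedged pom fragment marking the connector as provided so
it is not bundled into the job jar; the Scala suffix and version are
assumptions:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_2.11</artifactId>
        <version>1.12.1</version>
        <scope>provided</scope>
    </dependency>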