Hi,
I think it should be working. At least off the top of my head, I do not see
any reason why it shouldn't be working.
Just make sure that you are proxying all relevant methods, not only those
defined in `SourceFunction`. For example `FlinkKafkaConsumer` is
implementing/extending:
Hi,
Depending on how you configured your FlinkKafkaSource, you can make the
source commit consumed offsets back to Kafka. So one way to examine
them would be to check those offsets in Kafka (I don't know how, but I'm
pretty sure there is a way to do it).
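For what it's worth, Kafka ships with a CLI tool that can show committed
offsets per consumer group (a sketch; the broker address and group id are
hypothetical):

kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
    --describe --group my-flink-group

This prints the committed offset, log-end offset and lag for each partition.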
Secondly, if you want to examine
Hi,
One thing that you can do is to read this record using Avro, keeping
`Result` as `bytes`, and then in a subsequent mapping function change the
record type and deserialize the result. In the DataStream API:
source.map(new MapFunction<IN, OUT>() { ... })
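For instance, a minimal sketch of such a mapping function (the envelope
types, the `Result` class and the deserialization helper are hypothetical):

source.map(new MapFunction<RawEnvelope, EnrichedEnvelope>() {
    @Override
    public EnrichedEnvelope map(RawEnvelope record) throws Exception {
        // the `Result` field was read as raw bytes; decode it here
        Result result = deserializeResult(record.getResultBytes());
        return new EnrichedEnvelope(record, result);
    }
});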
Best,
Piotrek
Wed, 14 Apr 2021 at 03:17 Sumeet
OOM: java heap space. Where to move next? Simply bump up
> taskmanager.memory? or just increase heap?
> 3. What’s the final state? Job running fine and ensuring XYZ headroom in
> each memory component?
>
> Best
> Lu
>
> On Tue, Apr 6, 2021 at 12:26 AM Piotr Nowojski
> wrote:
>
> >
Hi,
This should be posted on the user mailing list, not the dev list.
Apart from that, this looks like normal/standard behaviour of JVM, and has
very little to do with Flink. Garbage Collector (GC) is kicking in when
memory usage is approaching some threshold:
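If you want to observe this yourself, you can enable GC logging for the
Flink JVMs, e.g. via flink-conf.yaml (a sketch; the flags below are for JDK
8, on JDK 9+ use -Xlog:gc* instead):

env.java.opts: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps

The GC log will show collections kicking in as heap usage approaches the
threshold.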
Hi,
What Flink version are you using and what is the scenario that's happening?
It can be a number of things; most likely an issue that the file mounted
under:
>
/mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa/shared/47993871-c7eb-4fec-ae23-207d307c384a
disappeared or stopped being
Hi,
I hope someone else might have a better answer, but one thing that would
most likely work is to convert this field and define event time during the
DataStream-to-Table conversion [1]. You could always pre-process this field
in the DataStream API, as sketched below.
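A minimal sketch of defining the event time attribute during the conversion
(field names are hypothetical):

import static org.apache.flink.table.api.Expressions.$;

// `stream` already has timestamps/watermarks assigned in the DataStream API
Table table = tableEnv.fromDataStream(
    stream,
    $("user"),
    $("eventTime").rowtime()); // declared as the event time attribute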
Piotrek
[1]
Hi Sandeep,
I think it should work fine with `StandaloneCompletedCheckpointStore`.
Have you checked whether your directory /Users/test/savepoint is being
populated in the first place? And if so, whether the restarted job is
throwing some exceptions, e.g. that it cannot access those files?
Also note, that
Hi Alexey,
You should definitely investigate why the job is stuck.
1. First of all, is it completely stuck, or is something moving? - Use
Flink metrics [1] (number of bytes/records processed), and go through all of
the operators/tasks to check this.
2. The stack traces like the one you quoted:
>
dated blog post
>
> Thanks,
> Alexey
>
> ------
> *From:* Piotr Nowojski
> *Sent:* Friday, March 19, 2021 5:01 AM
> *To:* Alexey Trenikhun
> *Cc:* Flink User Mail List
> *Subject:* Re: inputFloatingBuffersUsage=0?
>
> Hi,
>
>
` and
`busyTimeMsPerSecond`. We are planning to release a new updated blog post
about analysing backpressure in the following weeks.
Best,
Piotrek
Fri, 19 Mar 2021 at 11:57 Piotr Nowojski wrote:
> Hi Alexey,
>
> Have you looked at the documentation [1]?
>
> > inPoolUsage An estimate of
>
> Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure
> Thing. 23 Jul 2019 Nico Kruber & Piotr Nowojski. In a previous blog post,
> we presented how Flink's network stack works from the high-level
> abstractions to the low-level details. This second blog
available in Jira:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12349502&projectId=12315522
> > >
> > > We would like to thank all contributors of the Apache Flink community
> who
> > > made this release possible!
> > >
> > > Special thanks to Yuan Mei for managing the release and PMC members
> > Robert
> > > Metzger, Chesnay Schepler and Piotr Nowojski.
> > >
> > > Regards,
> > > Roman
> > >
> >
>
universal Flink producer, because of an older Kafka
> version I am reading from. So unfortunately for now I will have to go with
> the hack.
> Thanks
> --
> *From:* Piotr Nowojski
> *Sent:* 03 March 2021 21:10
> *To:* Witzany, Tomas
> *Cc:* user@flink.a
> Jan
>
>
> *Sent:* Wednesday, 03 March 2021, 19:53
> *From:* "Piotr Nowojski"
> *To:* "Jan Nitschke"
> *Cc:* "user"
> *Subject:* Re: Independence of task parallelism
> Hi Jan,
>
> As far as I remember, Flink does
er. However when I
> checked the registeredTaskManager variable, only one or two TaskManagers
> are registered even I started 9 TaskManagers. I would like to know how I
> can register every started TaskManager.
>
> Best regards,
>
> Hyejo
>
> Wed, 3 Mar 2021 at 7:37 PM, Piotr Nowoj
Hi Hemant,
The state of the latest seen watermarks is not persisted in the operators.
Currently the DataStream API assumes that after recovery watermarks are going
to be re-emitted sooner or later. What probably happens is that one of your
sources has emitted watermarks (maybe some very high one or even
Best,
> Kezhu Wang
>
>
> On March 4, 2021 at 22:13:48, Piotr Nowojski (piotr.nowoj...@gmail.com)
> wrote:
>
> Hi Meissner,
>
> Can you clarify, are you talking about stateful functions? [1] Or the
> stream iterations [2]? The first e-mail suggests stateful functions, but
>
Hi Meissner,
Can you clarify, are you talking about stateful functions? [1] Or the
stream iterations [2]? The first e-mail suggests stateful functions, but
the ticket that Kezhu created is talking about the latter.
Piotrek
[1] https://flink.apache.org/news/2020/04/07/release-statefun-2.0.0.html
re trying to contribute to the community. See FLINK-21110 [1] for the
> details.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-21110
>
> On Thu, Mar 4, 2021 at 3:39 PM Piotr Nowojski
> wrote:
>
>> Hi Joey,
>>
>>
Hi,
Sorry, I don't know. I've heard that this kind of pattern is discouraged by
Confluent. At least it used to be.
Maybe someone else from the community will be able to help from their
experience; however, keep in mind that under the hood Flink is simply
using KafkaConsumer and KafkaProducer
Great :)
Just one more note. Currently FlinkKafkaShuffle has a critical bug [1] that
will probably prevent you from using it directly. I hope it will be fixed
in an upcoming release. In the meantime you can use its source code as
inspiration for your own solution.
Best,
Piotrek
[1]
Hi Joey,
Sorry for not responding to your question sooner. As you can imagine, there
are not many users running Flink at such a scale. As far as I know, Alibaba
is running the largest/one of the largest clusters; I'm asking someone
who is familiar with those deployments to take a look at this
Hi,
The question would be, why do you want to do it? I think it might be
possible, but probably nobody has ever tested it. Flink is a distributed
system, so running it on an Android phone doesn't make much sense.
I would suggest you first make your app/example work outside of Android. To
make
Hi,
What Flink version and which FlinkKafkaProducer version are you using?
`FlinkKafkaProducerBase` is no longer used in the latest version. I would
guess you are on some older version, with FlinkKafkaProducer010 or later (no
longer supported).
I would suggest either to use the universal FlinkKafkaProducer
Hi,
I'm not sure what the reason behind this is. Probably classes are somehow
attached to the state, which would explain why you are experiencing this
issue. I've asked someone else from the community to chip in, but in the
meantime, could you not just prepare a new "version 1" of the job, with
Hi,
Could you not write the watermark as a special event to the "mid-topic"? In
the "new job2" you would parse this event and use it to assign the watermark
before `xxxWindow2`. I believe this is what FlinkKafkaShuffle is doing [1],
you could look at its code for inspiration.
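For illustration, a minimal sketch of the consuming side (the envelope type
is hypothetical): data and watermarks share one envelope type in the
"mid-topic", and the consumer turns embedded watermarks back into real ones:

WatermarkStrategy<Envelope> strategy = WatermarkStrategy
    .forGenerator(ctx -> new WatermarkGenerator<Envelope>() {
        @Override
        public void onEvent(Envelope e, long ts, WatermarkOutput out) {
            if (e.isWatermark()) {
                // re-emit the watermark that the upstream job wrote
                out.emitWatermark(new Watermark(e.getWatermarkTimestamp()));
            }
        }
        @Override
        public void onPeriodicEmit(WatermarkOutput out) {}
    });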
Piotrek
[1]
Hi,
1)
Do you want to output those metrics as Flink metrics? Or output those
"metrics"/counters as values to some external system (like Kafka)? The
problem discussed in [1] was that the metrics (Counters) were not fitting
in memory, so David suggested holding them in Flink's state and treating the
Hi Jan,
As far as I remember, Flink doesn't handle cases like (1-2-1-1-1) with two
Task Managers very well. There are no guarantees how the operators/subtasks
are going to be scheduled, but most likely it will be as you
mentioned/observed. The first task manager will be handling all of the
operators,
Hi Hyejo,
I don't think it's possible. May I ask why you want to do this?
Best, Piotrek
Mon, 1 Mar 2021 at 21:02 황혜조 wrote:
> Hi,
>
> I am looking for a way to allocate each created subTask to a specific
> TaskManager.
> Is there any way to force assigning tasks to specific
16:58 Kezhu Wang wrote:
> Piotr is right. So just ignore my words. It is the price of going deep
> down the rabbit hole:-).
>
>
> Best,
> Kezhu Wang
>
>
> On February 17, 2021 at 23:48:30, Piotr Nowojski (pnowoj...@apache.org)
> wrote:
>
> Note^2: InputSele
Note^2: InputSelectable is a `@PublicEvolving` API, so it can be used.
However, as Timo pointed out, it would block the checkpointing. If I
remember correctly, there is a checkState that will not allow using
`InputSelectable` with checkpointing enabled.
Piotrek
Wed, 17 Feb 2021 at 16:46 Kezhu Wang
Hi Salva,
I'm not sure, but I think you cannot access the state (especially the
keyed state) from within the metric, as metrics are evaluated outside
of the keyed context, and also from another thread. Also, things like
`ValueState`/`MapState` do not expose any size.
So probably you
Hi,
As Kezhu Wang pointed out, this MIGHT BE caused by the
https://issues.apache.org/jira/browse/FLINK-21028 issue.
During stop with savepoint procedure, source thread might be interrupted,
leaving the whole application in an invalid and inconsistent state. In
FLINK-1.12.x one potential symptom
checkpointing? I would expect
> Amazon to have enough resources. When I turn my sink (the next operator)
> into a print, it fails during checkpointing as well.
>
> I will explore what you mentioned though. Thank you.
>
> On Mon, Feb 1, 2021 at 6:53 AM Piotr Nowojski
> wrote:
>
>>
Hey,
Sorry for my hasty response. I didn't notice you had the import inside the
code block.
Have you maybe tried one of the responses suggested in the Stackoverflow by
other users?
Best,
Piotrek
Mon, 1 Feb 2021 at 15:49 Piotr Nowojski wrote:
> Hey Devin,
>
> Have you ma
Hi,
Yes, it's working. You would need to analyse what's working slower than
expected. Checkpointing times? (Async duration? Sync duration? Start
delay/back pressure?) Throughput? Recovery/startup? Are you being rate
limited by Amazon?
Piotrek
Thu, 28 Jan 2021 at 03:46 Marco Villalobos
Hey Devin,
Have you maybe tried looking for an answer via Google? Just by copy-pasting
your error message into Google I'm getting hundreds of results
pointing towards:
import org.apache.flink.api.scala._
Best,
Piotrek
Thu, 28 Jan 2021 at 04:13 Devin Bost wrote:
> I posted this
Hi Billy,
Could you maybe share some minimal code reproducing the problem? For
example, I would suggest starting with reading from local files with some
trivial application.
Best, Piotrek
Fri, 22 Jan 2021 at 00:21 Billy Bain wrote:
> I have a Streaming process where new directories are
Hi Sudharsan,
Sorry for the somewhat late response, but as far as I can tell, this comment
refers to this piece of code:
public void apply(KEY key, W window, Iterable<IN> values, Collector<OUT> out)
    throws Exception {
  List<IN> oneValues = new ArrayList<>();
Hi,
Yes, Dawid is correct. Communications between two tasks on the same
TaskManager are not going through the network, but via a "local" channel
(`LocalInputChannel`). It's still serializing and deserializing the data,
but there are no network overheads, and local channels have only half of
the
duplicate a
> job in order to do some testing out-of-bound from the normal job while
> slightly tweaking / tuning things. Is there any way to transfer offsets
> between consumer groups?
>
> On Tue, Jan 19, 2021 at 5:45 AM Piotr Nowojski
> wrote:
>
>> Hi,
>>
>>
No problem :)
Piotrek
Wed, 20 Jan 2021 at 02:12 Kazunori Shinhira
wrote:
> Hi,
>
>
> Thank you for your explanation.
>
> I now understand the need for checkpoint lock :)
>
>
>
> Best,
>
> Tue, 19 Jan 2021 at 18:00 Piotr Nowojski:
>
>> Hi,
>>
nt? Something else must be going on that's
> in addition to the normal alignment process.
>
> On Tue, Jan 19, 2021 at 8:14 AM Piotr Nowojski
> wrote:
>
>> Hey Rex,
>>
>> What do you mean by "Start Delay" when recovering from a checkpoint? Did
>> you me
Hey Rex,
What do you mean by "Start Delay" when recovering from a checkpoint? Did
you mean when taking a checkpoint? If so:
1. https://www.google.com/search?q=flink+checkpoint+start+delay
2. top 3 result (at least for me)
from a checkpoint or savepoints offsets if there are
> some (unless checkpointing offsets is turned off).
>
> Is this interpretation correct?
>
> Thanks!
>
>
> On Mon, Jan 18, 2021 at 3:23 AM Piotr Nowojski
> wrote:
>
>> Hi Rex,
>>
>> I believe this secti
nding correct ?
>
>
> Thank you for the information on the new Source interface.
>
> I’ll look into how to implement it.
>
>
>
> Best,
>
> Mon, 18 Jan 2021 at 23:45 Piotr Nowojski:
>
>> Hi Kazunori,
>>
>> The checkpoint lock is acquired preemptive
ed? Anything else I'm missing?
>
> Thanks!
>
> On Mon, Jan 18, 2021 at 2:49 AM Piotr Nowojski
> wrote:
>
>> Hi Rex,
>>
>> Good that you have found the source of your problem and thanks for
>> reporting it back.
>>
>> Regarding your question about th
Hi Kazunori,
The checkpoint lock is acquired preemptively inside the
SourceContext#collect call for the case where the source is stateless.
However, this is not enough if you are implementing a stateful
`SourceFunction`, since if you are modifying state in your source function
outside of the checkpoint lock, checkpoints can observe inconsistent state.
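For illustration, the usual pattern in a stateful source looks like this (a
sketch; `fetchNext` and `offset` are hypothetical):

@Override
public void run(SourceContext<T> ctx) throws Exception {
    while (running) {
        T record = fetchNext(); // blocking work outside of the lock
        synchronized (ctx.getCheckpointLock()) {
            // state update and emission are atomic w.r.t. checkpoints
            offset++;
            ctx.collect(record);
        }
    }
}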
Hi Rex,
I believe this section answers your question [1]
Piotrek
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration
Mon, 18 Jan 2021 at 09:00 赵一旦 wrote:
> If you changed the consumer group in your new job,
Hi Penguin,
Building on top of Yangze's response, you can also take a look at the more
detailed system resources usage [1] after adding an optional dependency to
the class path/lib directory.
Regarding the single task/task slot metrics, as Yangze noted there is
"almost" no isolation of the
Hi Rex,
Good that you have found the source of your problem and thanks for
reporting it back.
Regarding your question about the recovery steps (ignoring scheduling and
deploying). I think it depends on the used state backend. From your
other emails I see you are using RocksDB, so I believe this
> -- Forwarded message -
> From: *Piotr Nowojski*
> Date: Wed, 6 Jan 2021 at 17:26
>
Hi,
If you have an unstable network, which is dropping packets in a weird way
(data is lost, but the connection is still kept alive from the perspective
of the underlying operating system), it could happen that a task will be
perpetually blocked. But this is extremely rare. I would first suggest
Hi,
1. I think those changes will mostly bring new features/functionalities to
the existing Streaming APIs in order to fully support batch executions. For
example one way or another to better handle "bounded data streams" in the
DataStream API.
2. I think there is and there is not going to be one
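(Relating to point 1 above: for example, since Flink 1.12 a DataStream
program can already be switched to batch execution. A sketch:)

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// bounded sources run with batch scheduling and blocking shuffles
env.setRuntimeMode(RuntimeExecutionMode.BATCH);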
Hey,
Have you added the Kafka connector as a dependency? [1]
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/kafka.html#dependencies
Best,
Piotrek
Wed, 6 Jan 2021 at 04:37 Aeden Jameson wrote:
> I've upgraded from 1.11.1 to 1.12 in hopes of using the
Hi,
It's hard for me to guess what the problem could be. The same error was
reported a couple of months ago [1], but there is frankly no extra
information there.
Can we start by looking at the full TaskManager and JobManager logs?
Could you share them with us?
Best,
Piotrek
[1]
Hi Eric,
We have never measured it. Probably the most important overhead (probably
the only significant thing) is the type conversion. Especially if object
reuse is disabled, this means an extra serialization step.
The best thing for you would be to just try it out in your particular use
case.
Best,
> capacity for it or if ~1 GB is appropriate.
>
>
>
> taskmanager.memory.task.off-heap.size: 1536m
>
> taskmanager.memory.managed.size: 3g
>
> taskmanager.memory.task.heap.size: 6g
>
> taskmanager.memory.jvm-metaspace.size: 1536m
>
r with the jobmanager?
>
> -K
>
> On Tue, Dec 8, 2020 at 3:19 AM Piotr Nowojski
> wrote:
>
>> Hi Kye,
>>
>> Almost for sure this error is not the primary cause of the failure. This
>> error means that the node reporting it, has detected some fatal failure on
Hi Kye,
This error is almost certainly not the primary cause of the failure. It
means that the node reporting it has detected some fatal failure on
the other side of the wire (connection reset by peer), but the original
error is somehow too slow or unable to propagate to the JobManager
Hi Josson,
Thanks for the great investigation and for coming back to us. Aljoscha,
could you help us here? It looks like you were involved in the original
BEAM-3087 issue.
Best,
Piotrek
Fri, 23 Oct 2020 at 07:36 Josson Paul wrote:
> @Piotr Nowojski @Nico Kruber
>
> An update.
>
> [job graph diagram: a Broadcast stream and another stream are combined by
> a Union, which feeds into the Sink]
-docs-release-1.11/dev/table/common.html
>
> Thanks for your help~
>
> -- Original message ------
> *From:* "Piotr Nowojski";
> *Sent:* Wednesday, Oct 14, 2020, 10:20 PM
> *To:* "大森林";
> *Cc:* "user";
> *Subject:* Re: what's the datasets u
No problem :)
Piotrek
Thu, 15 Oct 2020 at 08:18 Pankaj Chand
wrote:
> Thank you for the quick and informative reply, Piotrek!
>
> On Thu, Oct 15, 2020 at 2:09 AM Piotr Nowojski
> wrote:
>
>> Hi Pankaj,
>>
>> Yes, you can trigger a window per each e
or not? The main problem I
> encountered when playing around with broadcast state was that I couldn’t
> figure out how to access the broadcast state within the sink, but perhaps I
> just haven’t thought about it the right way. I’ll meditate on the docs
> further
>
>
>
> Julian
for every processed record.
>
> How do I do this?
>
> Also, is there any way I can change the execution.buffer-timeout or
> setbuffertimeout(milliseconds) dynamically while the job is running?
>
> Thank you,
>
> Pankaj
>
> On Wed, Oct 14, 2020 at 9:42 AM Piotr N
I'm glad to hear that :)
Best regards,
Piotrek
Wed, 14 Oct 2020 at 18:28 Vijayendra Yadav
wrote:
> Thank You Piotre. I moved *flink-s3-fs-hadoop* library to plugin. Now
> it's good.
>
>
> On Wed, Oct 14, 2020 at 6:23 AM Piotr Nowojski
> wrote:
>
>> Hi,
>>
Great! Please let us know if it solves the issue or not.
Best,
Piotrek
Wed, 14 Oct 2020 at 17:46 Vijayendra Yadav
wrote:
> Hi Piotrek,
>
> That is correct I was still in 1.10, I am upgrading to 1.11.
>
> Regards,
> Vijay
>
> On Wed, Oct 14, 2020 at 6:12 AM Piotr No
>
> I mean:
> Is there such a dataset that can be downloaded
> to satisfy all the examples in the document?
>
> Thanks for your help
>
> -- Original message ------
> *From:* "Piotr Nowojski";
> *Sent:* Wednesday, Oct 14, 2020, 9:52 PM
> *To:* "大森林"
Hi,
It depends on how you defined `orders` in your example. For example, here [1]
> Table orders = tEnv.from("Orders"); // schema (a, b, c, rowtime)
`orders` is obtained from the environment, from a table registered under
the name "Orders". You would need to first register such a table, or register
a table yourself, for example:
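A minimal sketch (the connector and schema are hypothetical):

tEnv.executeSql(
    "CREATE TABLE Orders (a STRING, b INT, c INT, rowtime TIMESTAMP(3)) " +
    "WITH ('connector' = 'datagen')");
Table orders = tEnv.from("Orders");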
Hi Pankaj,
I'm not entirely sure I understand your question.
If you want to minimize latency, you should avoid using windows or any
other operators that buffer data for long periods of time. You
can still use windowing, but you might want to emit an updated value of the
window for every incoming element.
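For example, you can replace the window's trigger so it fires on every
element (a sketch; `getKey` and `merge` are hypothetical, and note this
replaces the default event-time trigger, so check that it fits your
semantics):

stream
    .keyBy(r -> r.getKey())
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .trigger(CountTrigger.of(1)) // emit an updated result per element
    .reduce((a, b) -> merge(a, b));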
Hi Julian,
Have you seen Broadcast State [1]? I have never used it personally, but it
sounds like something you want. Maybe your job should look like:
1. read raw messages from Kafka, without using the schema
2. read schema changes and broadcast them to 3. and 5.
3. deserialize kafka records in
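For illustration, a minimal sketch of steps 2-3 (all types and names are
hypothetical):

MapStateDescriptor<String, Schema> schemaDescriptor =
    new MapStateDescriptor<>("schemas", String.class, Schema.class);

BroadcastStream<Schema> schemaChanges =
    schemaStream.broadcast(schemaDescriptor);

rawRecords
    .connect(schemaChanges)
    .process(new BroadcastProcessFunction<byte[], Schema, MyRecord>() {
        @Override
        public void processElement(byte[] raw, ReadOnlyContext ctx,
                Collector<MyRecord> out) throws Exception {
            // deserialize with the latest broadcast schema
            Schema schema = ctx.getBroadcastState(schemaDescriptor).get("current");
            out.collect(deserialize(raw, schema));
        }

        @Override
        public void processBroadcastElement(Schema schema, Context ctx,
                Collector<MyRecord> out) throws Exception {
            ctx.getBroadcastState(schemaDescriptor).put("current", schema);
        }
    });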
Hi,
Are you sure you are loading the filesystems correctly? Are you using the
plugin mechanism? [1] Since Flink 1.10 plugins can only be loaded in this
way [2], while there were some changes to plug some holes in Flink 1.11 [3].
Best,
Piotrek
[1]
Hi Yadav,
What Flink version are you using? `getPartPrefix` and `getPartSufix`
methods were not public before 1.10.1/1.11.0, which might be causing
this problem for you. Other than that, if you are already using Flink
1.10.1 (or newer), please double-check what class you are extending.
The
if I upgrade to a newer
> JDK version (I tried with JDK 1.8.0_265) the issue doesn’t happen.
>
> Thank you for helping
> -Binh
>
> On Fri, Oct 9, 2020 at 11:36 AM Piotr Nowojski
> wrote:
>
>> Hi Binh,
>>
>> Could you try upgrading Flink's Java
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
>
> Thanks
> -Binh
>
> On Fri, Oct 9, 2020 at 10:23 AM Piotr Nowojski
> wrote:
>
>> Hi,
>>
>> One more thing. It looks like it's not a Flink issue, but some JDK bug.
Hi,
One more thing. It looks like it's not a Flink issue, but some JDK bug.
Others reported that upgrading JDK version (for example to jdk1.8.0_251)
seemed to be solving this problem. What JDK version are you using?
Piotrek
Fri, 9 Oct 2020 at 17:59 Piotr Nowojski wrote:
> Hi,
>
>
Hi,
Thanks for reporting the problem. I think this is a known issue [1] that
we are working to fix.
Piotrek
[1] https://issues.apache.org/jira/browse/FLINK-18196
Mon, 5 Oct 2020 at 08:54 Binh Nguyen Van wrote:
> Hi,
>
> I have a streaming job that is written in Apache Beam and uses
Hi Sunitha,
First and foremost, the DataSet API will be deprecated soon [1], so I would
suggest trying to migrate to the DataStream API. Using the DataStream
API doesn't mean that you cannot work with bounded inputs - you can.
Flink SQL (the Blink planner) is in fact using the DataStream API to
Hi,
It sounds more like an IntelliJ issue, not a Flink issue. But have you
checked the configured target language level for your modules?
Best regards,
Piotrek
Mon, 28 Sep 2020 at 10:57 Lu Weizheng wrote:
> Hi all,
>
> I recently upgraded IntelliJ IDEA from 2019 to 2020.2 Community
Hi Prasanna,
As Theo has suggested on Stackoverflow, can you use multiple independent
jobs instead?
Piotrek
Sat, 26 Sep 2020 at 19:17 Prasanna kumar
wrote:
> Hi,
>
> My requirement has been captured by the following stack overflow question.
>
>
>
d control streaming according to your
> first suggestion.
>
> Piotr Nowojski wrote on Wed, Sep 16, 2020 at 11:56 PM:
>
>> Hey,
>>
>> If you are worried about increased amount of buffered data by the
>> WindowOperator if watermarks/event time is not progressing uniformly across
>>
salunkhe
wrote:
> I would like to do performance testing for my Flink job, especially related
> to volume: how does my Flink job perform if more streaming data comes to my
> source connectors, and how do I measure benchmarks for various operators?
>
> On Wed, 16 Sep 2020 at 12:03, Piotr
Hi,
Could it be related to https://issues.apache.org/jira/browse/FLINK-18223 ?
Also, as a possible workaround, does it work if you enable object reuse
(`StreamExecutionEnvironment#getConfig().enableObjectReuse()`)?
Best regards
Piotrek
Wed, 16 Sep 2020 at 08:09 Lian Jiang wrote:
> Hi,
>
>
Hi,
I'm not sure what you are asking for. We do not provide benchmarks for all
of the operators. We currently have a couple of micro-benchmarks [1] for
some of the operators, and we also set up some ad-hoc benchmarks
when implementing various features. If you want to benchmark something
Hi,
Have you seen "Reinterpreting a pre-partitioned data stream as keyed
stream" feature? [1] However I'm not sure if and how can it be integrated
with the Table API. Maybe someone more familiar with the Table API can help
with that?
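For reference, in the DataStream API this looks like the following (a
sketch; the key selector must match how the data was actually
pre-partitioned):

KeyedStream<Event, String> keyed =
    DataStreamUtils.reinterpretAsKeyedStream(
        prePartitionedStream,
        event -> event.getKey());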
Piotrek
[1]
Hey,
If you are worried about an increased amount of buffered data in the
WindowOperator when watermarks/event time are not progressing uniformly across
multiple sources, then there is little you can do currently. FLIP-27 [1]
will allow us to address this problem in a more generic way. What you can
https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#network
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#network
Mon, 14 Sep 2020 at 05:04 Josson Paul wrote:
> @Piotr Nowojski @Nico Kruber
> I have attached the Ta
and Cpu. Both are holding good.
>
> In Flink 1.8 I could reach only 160 clients/sec and the issue started
> happening. Issue started within 15 minutes of starting the ingestion. @Piotr
> Nowojski , you can see that there is no meta space
> related issue. All the GC related details are available
om Heap dump to show you the difference
> between Flink 1.4 and 1.8 the way HeapKeyedStateBackend is created. Not
> sure whether this change has something to do with this memory issue that I
> am facing.
> Name Flink-1.4.jpg for the 1.4 and Flink-1.8.jpg for 1.8
>
>
> Thanks,
>
et
> GCed/Finalized. Any change around this between Flink 1.4 and Flink 1.8.
>
> My understanding of back pressure is that it is not based on heap memory
> but on how fast the network buffers are filled. Is this correct?
> Does Flink use TCP connection to communicate between tas
Flink 1.4 and Beam 2.4.0
>
> Any insights into this will help me to debug further
>
> Thanks,
> Josson
>
>
> On Thu, Sep 3, 2020 at 3:34 AM Piotr Nowojski
> wrote:
>
>> Hi,
>>
>> Have you tried using a more recent Flink version? 1.8.x is no longer
>> sup
Hi,
This is a known problem. Until recently, there was no way to solve this
problem generically for every source. This is changing now, as one of the
motivations behind FLIP-27 was to actually address this issue [1]. Note,
as of now, there are no FLIP-27 sources yet in the code base, but for
Hi,
Have you tried using a more recent Flink version? 1.8.x is no longer
supported, and the latest versions might not have this issue anymore.
Secondly, have you tried backtracking those references to the Finalizers?
Assuming that Finalizer is indeed the class causing problems.
Also, it may well be
Hi Ori,
No. Flink does it differently. Operators that keep track of late
events remember the latest watermark. If a new element arrives with
event time lower than the latest watermark, it is marked as a late
event [1].
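Conceptually (a simplified sketch, not the actual WindowOperator code):

// an element is considered late when its timestamp is not ahead of
// the latest watermark seen by the operator
boolean isLate = elementTimestamp <= currentWatermark;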
Piotrek
[1]
all operator instances
> from CANCELING to CANCELED is around 30s, do you have any ideas about
> this?
>
> Many thanks.
>
> Regards,
> Zhinan
>
> On Thu, 20 Aug 2020 at 21:26, Piotr Nowojski wrote:
> >
> > Hi,
> >
> > > I want to decompose the recove
JobGraph
> (assuming you aren't changing the Zookeeper configuration).
>
> If you wish to update the job, then you should cancel it (along with
> creating a savepoint), which will clear the Zookeeper state, and then
> create a new deployment
>
> On 20/08/2020 15:43, Piotr Nowojski
Hi,
It's hard for me to help you debug your code, but as long as:
- you are using event time for processing records (in operators like
`WindowOperator`)
- you do not have late records
- you are replaying the same records
- your code is deterministic
- you do not rely on the order of the records
Hi,
It looks more like a dependency convergence issue - you have two
conflicting versions of
`org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest`
on the class path. Or you built your jar with one version and are trying to
execute it with a different one.
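One way to check this, for illustration:

mvn dependency:tree -Dincludes=org.apache.hadoop

This shows which versions of the Hadoop/YARN artifacts end up on your class
path.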
Till is it some kind of a known