if I misunderstood. Thank you.
>
> Best Regards,
> Tony Wei
>
>
>
> 2018-06-09 9:45 GMT+08:00 TechnoMage :
>
>> Thank you all. This discussion is very helpful. It sounds like I can
>> wait for 1.6 though given our development status.
>>
>> Michael
>
>>>>
>>>> I have once implemented something like that here:
>>>>
>>>> https://github.com/pnowojski/flink/blob/bfc8858fc4b9125b8fc7acd03cb3f95c000926b2/flink-streaming-java/src/main/java/org/apache/flink/str
Since the predict function only supports DataSet and not DataStream,
>> a Flink contributor on Stack Overflow mentioned that this should be done
>> using a map/flatMap function.
>> Unfortunately, I have not been able to work this function out.
>>
>> It would be incredible for me if you could help me a little bit further!
>>
>> Kind regards and thanks in advance
>> Cederic Bosmans
>>
>
>
>
--
*David Anderson* | Training Coordinator | data Artisans
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
>>> best,
>>> Zhengwen
>>>
>>> On Tue, Aug 28, 2018 at 7:35 PM Nicos Maris wrote:
>>>>
>>>> Hi all,
>>>>
>>>>
>>>> How can I test in Java any streaming job that has a time window?
>>>>
-> Reload lookup data source into in-memory table
>> -> Continue processing
>>
>>
>> My concerns around these are:
>>
>> 1) Possibly storing the same copy of data in every task slot's memory or
>> state backend (RocksDB in my case).
>> 2) Having a dedicated refresh thread for each subtask instance (possibly
>> every Task Manager having multiple refresh threads)
>>
>> Am I thinking in the right direction? Or am I missing something very
>> obvious? It's confusing.
>>
>> Any leads are much appreciated. Thanks in advance.
>>
>> Cheers,
>> Chirag
>>
>>
>> --
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> Custom big data solutions & training
>> Flink, Solr, Hadoop, Cascading & Cassandra
>>
>>
--
*David Anderson* | Training Coordinator | data Artisans
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
> public TriggerResult onProcessingTime(long time, GlobalWindow window,
> TriggerContext ctx) throws Exception {
> return TriggerResult.CONTINUE;
> }
>
> @Override
> public TriggerResult onEventTime(long time, GlobalWindow window,
> TriggerContext ctx) throws Exception {
> return TriggerResult.CONTINUE;
> }
>
> @Override
> public void clear(GlobalWindow window, TriggerContext ctx) throws
> Exception {
> ctx.getPartitionedState(prevLunum).clear();
> }
> }
>
> I'm very grateful for your help.
>
> Regards,
>
> Daniel
>
>
--
*David Anderson* | Training Coordinator
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
Another solution to the watermarking issue is to write an
AssignerWithPeriodicWatermarks for the control stream that always returns
Watermark.MAX_WATERMARK as the current watermark. This produces watermarks
for the control stream that will effectively be ignored.
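A minimal sketch of such an assigner (the ControlEvent type is assumed; this uses the pre-1.11 AssignerWithPeriodicWatermarks interface):

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    public class MaxWatermarkAssigner implements AssignerWithPeriodicWatermarks<ControlEvent> {
        @Override
        public Watermark getCurrentWatermark() {
            // Always claim completeness, so the control stream never holds back event time.
            return Watermark.MAX_WATERMARK;
        }

        @Override
        public long extractTimestamp(ControlEvent element, long previousElementTimestamp) {
            return previousElementTimestamp; // timestamps are irrelevant here
        }
    }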
On Thu, Jan 3, 2019 at 9:18 PM
call
state.clear(), but in this way I can only clear the state for one key, and
actually I need a key group level cleanup. So I’m wondering is there any
best practice for my case? Thanks a lot!
>>
>> Best,
>> Paul Lam
>
>
--
David Anderson | Training Coordinator | data Artisans
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
it will be my last resort.
>
> I wonder if it’s possible to implement a clearAll() method for keyed
> states which clears user states for all namespaces, and does it violate the
> principle of keyed states?
>
> Thanks again!
>
> Best,
> Paul Lam
>
> On Sep 14, 2018, at 16
>> Hi All,
>> Can someone show a good example of how watermarks need to be
>> generated when using EventTime windows? What happens if the watermark is
>> same as the timestamp? How does the watermark help in the window to be
>> triggered and what if watermarks a
/flink-docs-stable/ops/config.html#metrics
David Anderson | Training Coordinator
Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
On Sat, Mar 23, 2019 at 1:03 PM Padarn Wilson wrote:
>
> Hi User,
>
> I am runnin
t seem to be exported to datadog (should
> it be?)
>
>
>
> On Sat, Mar 23, 2019 at 8:43 PM David Anderson
> wrote:
>
>> Because latency tracking is expensive, it is turned off by default. You
>> turn it on by setting the interval; that looks something like this:
>
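The elided snippet presumably resembled this sketch (the interval is in milliseconds, matching the metrics.latency.interval setting):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Emit a latency marker from each source every 1000 ms.
    env.getConfig().setLatencyTrackingInterval(1000L);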
There are some step-by-step instructions for setting up the sql client in
https://training.ververica.com/setup/sqlClient.html, plus some examples.
When you run a Flink job from within an IDE, you end up running with a
LocalStreamEnvironment (rather than a remote cluster) that by default does
not provide the Web UI. If you want the Flink running in the IDE to have
its own dashboard, you can do this by adding this to your application:
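A sketch of what that addition looks like (this also requires the flink-runtime-web dependency on the classpath):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    Configuration conf = new Configuration();
    // A local environment that also serves the web dashboard (default port 8081).
    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);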
What Watermarks do is to advance the event time clock. You can
consider a Watermark(t) as an assertion about the completeness of the
stream -- it marks a point in the stream and says that at that point,
the stream is (probably) now complete up to time t.
The autoWatermarkInterval determines how
I'm not sure I fully understand the scenario you envision. Are you
saying you want to have some sort of window that batches (and
deduplicates) up until a downstream map has finished processing the
previous deduplicated batch, and then the window should emit the new
batch?
If that's what you want,
The role of the watermarks in your job will be to trigger the closing
of the sliding event time windows. In order to play that role
properly, they should be based on the timestamps in the events, rather
than some arbitrary constant (L). The reason why the same object
is responsible for
If you still need help diagnosing the cause of the misbehavior, please
share more of the code with us.
On Wed, Aug 21, 2019 at 6:24 PM Eric Isling wrote:
>
> I should add that the behaviour persists, even when I force parallelism to 1.
>
> On Wed, Aug 21, 2019 at 5:19 PM Eric Isling wrote:
>>
e.o.gutierrez
> -- https://felipeogutierrez.blogspot.com
>
>
> On Wed, Aug 21, 2019 at 9:09 PM David Anderson wrote:
>>
>> What Watermarks do is to advance the event time clock. You can
>> consider a Watermark(t) as an assertion about the completeness of the
>> stream -
I believe there can be advantages and disadvantages in both
directions. For example, fewer containers with multiple slots reduces
the effort the Flink Master has to do whenever global coordination is
required, i.e., during checkpointing. And the network stack in the
task managers is optimized to
Yes, it's possible to sort a stream by the event time timestamps on
the events. This depends on having reliable watermarks -- as events
that are late can not be emitted in order.
There are several ways to accomplish this. You could, for example, use
a ProcessFunction, and implement the sorting
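As a rough sketch of that approach (an Event type with a long timestamp field is assumed): buffer events in keyed state, and rely on the fact that event-time timers fire in timestamp order as the watermark advances.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class SortFunction extends KeyedProcessFunction<String, Event, Event> {
        private transient MapState<Long, List<Event>> buffer;

        @Override
        public void open(Configuration config) {
            buffer = getRuntimeContext().getMapState(new MapStateDescriptor<>(
                    "buffer", Types.LONG, Types.LIST(TypeInformation.of(Event.class))));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<Event> out)
                throws Exception {
            // An event at or behind the watermark is late and cannot be emitted in order.
            if (event.timestamp <= ctx.timerService().currentWatermark()) {
                return;
            }
            List<Event> events = buffer.get(event.timestamp);
            if (events == null) {
                events = new ArrayList<>();
            }
            events.add(event);
            buffer.put(event.timestamp, events);
            // Emit once the watermark has passed this timestamp.
            ctx.timerService().registerEventTimeTimer(event.timestamp);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out)
                throws Exception {
            for (Event event : buffer.get(timestamp)) {
                out.collect(event);
            }
            buffer.remove(timestamp);
        }
    }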
Krzysztof,
Note that if you want to have Flink treat these sequence numbers as event
time timestamps, you probably can, so long as they are generally
increasing, and there's some bound on how out-of-order they can be.
The advantage to doing this is that you might be able to use Flink SQL's
event
I'll attempt to answer your questions.
If we have allowed lateness to be greater than 0 (say 5), then if an event
> which arrives at window end + 3 (within allowed lateness),
> (a) it is considered late and included in the window function as a
> late firing ?
>
An event with a timestamp that
code.
On Wed, Dec 11, 2019 at 2:08 PM David Anderson wrote:
> I'll attempt to answer your questions.
>
> If we have allowed lateness to be greater than 0 (say 5), then if an event
>> which arrives at window end + 3 (within allowed lateness),
>> (a) it is considered
The reason why the watermark is not advancing is that
assignAscendingTimestamps is a periodic watermark generator. This
style of watermark generator is called at regular intervals to create
watermarks -- by default, this is done every 200 msec. With only a
tiny bit of data to process, the job
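That default can be changed; as a sketch (assuming the usual job setup):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Generate periodic watermarks every 50 ms instead of the default 200 ms.
    env.getConfig().setAutoWatermarkInterval(50L);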
The Elasticsearch, HBase, and JDBC[1] table sinks all support streaming
UPSERT mode[2]. While not exactly the same as RETRACT mode, it seems like
this might get the job done (unless I'm missing something, which is
entirely possible).
David
[1]
[Note that this question is better suited for the user mailing list than
dev.]
In general using ListState and MapState is recommended rather than
using ValueState<List<T>> or ValueState<Map<K, V>>.
Some of the state backends are able to optimize common access patterns for
ListState and MapState in ways that are
Gaurav,
I haven't used it for a couple of years, so I don't know if it still works,
but https://github.com/jgrier/flink-stuff/tree/master/flink-influx-reporter is
an influxdb reporter (wrapped around
https://github.com/davidB/metrics-influxdb/tree/master/src/main/java/metrics_influxdb)
that uses
> In kafka010, ConsumerRecord has a field named timestamp. It is
encapsulated in Kafka010Fetcher.
> How can I get the timestamp when I write a flink job?
Kafka010Fetcher puts the timestamps into the StreamRecords that wrap your
events. If you want to access these timestamps, you can use a
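For example, a sketch using a ProcessFunction (the Event type is assumed):

    stream.process(new ProcessFunction<Event, Event>() {
        @Override
        public void processElement(Event event, Context ctx, Collector<Event> out) {
            // The timestamp the source attached to this record; for the
            // kafka010 connector this is the ConsumerRecord's timestamp.
            Long kafkaTimestamp = ctx.timestamp();
            out.collect(event);
        }
    });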
Watermarks are a tool for handling out-of-orderness when working with event
time timestamps. They provide a mechanism for managing the tradeoff between
latency and completeness, allowing you to manage how long to wait for any
out-of-orderness to resolve itself. Note the way that Flink uses these
The State Processor API goes a bit in the direction you're asking about, by
making it possible to query savepoints.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
Dear Flink Community!
For some time now Ververica has been hosting some freely available Apache
Flink training materials at https://training.ververica.com. This includes
tutorial content covering the core concepts of the DataStream API, and
hands-on exercises that accompany those explanations.
Hypothetically, yes, I think this is possible to some extent. You would
have to give up all the things that require a KeyedStream, such as timers,
and the RocksDB state backend. And performance would suffer.
As for the question of determining which key groups (and ultimately, which
keys) are
@Chesnay Flink doesn't seem to guarantee client-jobmanager compatibility,
even for bug-fix releases. For example, some jobs compiled with 1.9.0 don't
work with a cluster running 1.9.2. See
https://github.com/ververica/sql-training/issues/8#issuecomment-590966210 for
an example of a case when
he community should be aware of is that we also need to
> maintain the training material. Maybe you could share with us how much
> effort this usually is when updating the training material for a new Flink
> version, David?
>
> Cheers,
> Till
>
> On Fri, Apr 10, 2020 a
org and flink playgrounds respectively are the
>> best places to host this content.
>>
>> On Thu, Apr 9, 2020 at 2:56 PM Niels Basjes wrote:
>>
>>> Hi,
>>>
>>> Sounds like a very nice thing to have as part of the project ecosystem.
>
There are a few ways to pre-ingest data from a side input before beginning
to process another stream. One is to use the State Processor API [1] to
create a savepoint that has the data from that side input in its state. For
a simple example of bootstrapping state into a savepoint, see [2].
Another
ent
directory
- all these existence requests are S3 HEAD requests, which have very low
request rate limits
*David Anderson* | Training Coordinator
Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
On Fri, Mar 27, 2020 at 7
seeksst has already covered many of the relevant points, but a few more
thoughts:
I would start by checking to see if the checkpoints are failing because
they timeout, or for some other reason. Assuming they are timing out, then
a good place to start is to look and see if this can be explained by
Following up on Piotr's outline, there's an example in the documentation of
how to use a KeyedProcessFunction to implement an event-time tumbling
window [1]. Perhaps that can help you get started.
Regards,
David
[1]
There's no explicit, out-of-the-box support for autoscaling, but it can be
done.
Autoscaling came up a few times at the recent Virtual Flink Forward,
including a talk on Autoscaling Flink at Netflix [1] by Timothy Farkas.
[1] https://www.youtube.com/watch?v=NV0jvA5ZDNc
Regards,
David
On Mon,
With the FsStateBackend you could also try increasing the value
of state.backend.fs.memory-threshold [1]. Only those state chunks that are
larger than this value are stored in separate files; smaller chunks go into
the checkpoint metadata file. The default is 1KB, increasing this
should reduce
Humberto,
Although Flink CEP lacks notFollowedBy at the end of a Pattern, there is a
way to implement this by exploiting the timeout feature.
The Flink training includes an exercise [1] where the objective is to
identify taxi rides with a START event that is not followed by an END event
within
/alpinegizmo/ff3d2e748287853c88f21259830b29cf for
my version.
*David Anderson* | Training Coordinator
Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
On Thu, Mar 19, 2020 at 2:54 AM Dmitry Minaev wrote:
> Hi every
Map applies a MapFunction (or a RichMapFunction) to a DataStream and does a
one-to-one transformation of the stream elements.
Process applies a ProcessFunction, which can produce zero, one, or many
events in response to each event. And when used on a keyed stream, a
KeyedProcessFunction can use
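A minimal sketch of the difference (a plain String stream is assumed):

    DataStream<String> result = input.process(new ProcessFunction<String, String>() {
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            if (value.isEmpty()) {
                return;                       // zero outputs for this event
            }
            out.collect(value);               // one output ...
            out.collect(value.toUpperCase()); // ... or several
        }
    });

A MapFunction, by contrast, must return exactly one value per input.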
ad set the parallelism to 1
> >job wide, but when I reran the task, I added your line. Are there any
> >circumstances where despite having the global level set to 1, you
> >still need to set the level on individual operators?
> >
> >PS: I sent this to you directly I'm so
Pankaj,
The Flink web UI doesn't do any visualizations of histogram metrics, so the
only way to access the latency metrics is either through the REST api or a
metrics reporter.
The REST endpoint you tried is the correct place to find these metrics in
all recent versions of Flink, but somewhere
ommit ID: 7eb514a.
> Is it possible that the default SocketWindowWordCount job is too simple to
> generate Latency metrics? Or that the latency metrics disappear from the
> output JSON when the data ingestion is zero?
>
> Thanks,
>
> Pankaj
>
>
> On Wed, Sep 9, 2020 at 6:27 AM David An
Heidy, which state backend are you using? With RocksDB Flink will have to
do ser/de on every access and update, but with the FsStateBackend, your
sparse matrix will sit in memory, and only have to be serialized during
checkpointing.
David
On Wed, Sep 9, 2020 at 2:41 PM Heidi Hazem Mohamed
Arti,
The problem with watermarks and the File source operator will be fixed in
1.11.2 [1]. This bug was introduced in 1.10.0, and isn't related to the new
WatermarkStrategy api.
[1] https://issues.apache.org/jira/browse/FLINK-19109
David
On Wed, Sep 9, 2020 at 2:52 PM Arti Pande wrote:
> Hi
The details of what can go wrong will vary depending on the precise
scenario, but no, Flink is unable to provide any such guarantee. Doing so
would require being able to control the scheduling of various threads
running on different machines, which isn't possible.
Of course, if event A becomes
s, which is not happening in my case.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/monitoring/metrics.html#latency-tracking
>>
>> On Wed, Sep 9, 2020 at 10:08 AM David Anderson
>> wrote:
>>
>>> Pankaj,
>>>
>
Dan,
The first point you've raised is a known issue: When a job is stopped, the
unfinished part files are not transitioned to the finished state. This is
mentioned in the docs as Important Note 2 [1], and fixing this is waiting
on FLIP-46 [2]. That section of the docs also includes some
uld happen, happy to submit a
> patch.
>
>
>
>
> On Wed, Oct 7, 2020 at 1:37 AM David Anderson
> wrote:
>
>> Dan,
>>
>> The first point you've raised is a known issue: When a job is stopped,
>> the unfinished part files are not transitioned to the f
>> Could you tell me where to set RestOptions.PORT in the configuration?
>> What value should I set for RestOptions.PORT?
>>
>> Thanks.
>>
>>
>> -- Original Message --
>> *From:* "Arvid Heise";
>> *Sent:* Friday, Oct 9, 2020, 3:00 PM
For a dockerized playground that includes a dataset, many working examples,
and training slides, see [1].
[1] https://github.com/ververica/sql-training
David
On Thu, Oct 15, 2020 at 10:18 AM Piotr Nowojski
wrote:
> Hi,
>
> The examples in the documentation are not fully functional. They
It looks like you were trying to resume from a checkpoint taken with the
FsStateBackend into a revised version of the job that uses the
RocksDbStateBackend. Switching state backends in this way is not supported:
checkpoints and savepoints are written in a state-backend-specific format,
and can
kEnd)?
>
> Much Thanks~!
>
>
> -- Original Message --
> From: "大森林";
> Sent: Friday, Oct 2, 2020, 11:41 PM
> To: "David Anderson";
> Cc: "user";
> Subject: Re: need help about "incremental checkpoint", thanks
eBackend.*
> *It's NOT a match.*
> So I'm wrong in step 5?
> Is my above understanding right?
>
> Thanks for your help.
>
> -- Original Message --
> *From:* "David Anderson";
> *Sent:* Friday, Oct 2, 2020, 10:35 PM
> *To:* "大森林";
> *Cc:*
.
Best,
David
On Mon, Oct 5, 2020 at 2:38 PM 大森林 wrote:
> Could you give more details?
> Thanks
>
>
> -- Original Message --
> *From:* "大森林";
> *Sent:* Saturday, Oct 3, 2020, 9:30 AM
> *To:* "David Anderson";
> *Cc:* "user";
I think the pertinent question is whether there are interesting cases where
the BucketingSink is still a better choice. One case I'm not sure about is
the situation described in docs for the StreamingFileSink under Important
Note 2 [1]:
... upon normal termination of a job, the last
erstand both:
>
> ----
> Dear David Anderson:
> Is the whole command like this?
> flink run *--backpressure* -c wordcount_increstate
> d
t 5:32 PM 大森林 wrote:
>
> Could I use your command with no docker?
>
> -- Original Message --
> *From:* "David Anderson";
> *Sent:* Saturday, Oct 10, 2020, 10:30 PM
> *To:* "大森林";
> *Cc:* "Arvid Heise"; "user";
Piper,
1. Thanks for reporting the problem with the broken links. I've just fixed
this.
2. The exercises were recently rewritten so that they no longer use the old
file-based datasets. Now they use data generators that are included in the
project. As part of this update, the schema was modified
or be ignored (but not throw an exception). I
> could also see ignoring the timers in batch as the default, if this
> makes more sense.
>
> By the way, do you have any usecases in mind that will help us better
> shape our processing time timer handling?
>
> Kostas
>
> On
You should begin by trying to identify the cause of the backpressure,
because the appropriate fix depends on the details.
Possible causes that I have seen include:
- the job is inadequately provisioned
- blocking i/o is being done in a user function
- a huge number of timers are firing
Teodor,
This is happening because of the way that readTextFile works when it is
executing in parallel, which is to divide the input file into a bunch of
splits, which are consumed in parallel. This is making it so that the
watermark isn't able to move forward until much or perhaps all of the file
The purpose of the reduce() and aggregate() methods on windows is to allow
for incremental computation of window results. This has two principal
advantages: (1) the computation of the results is spread out, rather than
occurring all in one go at the end of each window, thereby reducing the
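For example, a sketch of incremental aggregation (the Event type and its fields are assumed):

    DataStream<Event> maxPerMinute = events
            .keyBy(e -> e.key)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            // Flink keeps just one pre-aggregated Event per key and window in
            // state, rather than buffering every event until the window ends.
            .reduce((a, b) -> a.value > b.value ? a : b);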
On Wed, Aug 26, 2020 at 7:41 PM David Anderson
wrote:
> You should begin by trying to identify the cause of the backpressure,
> because the appropriate fix depends on the details.
>
> Possible causes that I have seen include:
>
> - the job is inadequately provisioned
> - block
in the
> new repo.
>
> Best,
> Piper
>
> On Sun, Aug 23, 2020 at 10:15 AM David Anderson
> wrote:
>
>> Piper,
>>
>> 1. Thanks for reporting the problem with the broken links. I've just
>> fixed this.
>>
>> 2. The exercises were recently r
I assume that along with DataStream#fold you would also
remove WindowedStream#fold.
I'm in favor of going ahead with all of these.
David
On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
wrote:
> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the
>
Kostas,
I'm pleased to see some concrete details in this FLIP.
I wonder if the current proposal goes far enough in the direction of
recognizing the need some users may have for "batch" and "bounded
streaming" to be treated differently. If I've understood it correctly, the
section on scheduling
Steven,
I'm pretty sure this is a scenario that doesn't have an obvious good
solution. As you have discovered, the window API isn't much help; using a
process function does make sense. The challenge is finding a data structure
to use in keyed state that can be efficiently accessed and updated.
If you were to use per-partition watermarking, which you can do by
calling assignTimestampsAndWatermarks directly on the Flink Kafka consumer
[1], then I believe the idle partition(s) would consistently hold back the
overall watermark.
With per-partition watermarking, each Kafka source task will
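A sketch of per-partition watermarking with the 1.11+ WatermarkStrategy API (topic, schema, and Event type are assumed):

    FlinkKafkaConsumer<Event> consumer =
            new FlinkKafkaConsumer<>("events", new EventDeserializationSchema(), properties);

    // Applied to the consumer itself (not the resulting stream), this yields
    // one watermark generator per Kafka partition.
    consumer.assignTimestampsAndWatermarks(
            WatermarkStrategy
                    .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, timestamp) -> event.timestamp));

    DataStream<Event> events = env.addSource(consumer);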
The performance loss being referred to there is reduced throughput.
There's a blog post by Nico Kruber [1] that covers Flink's network stack in
considerable detail. The last section on latency vs throughput gives some
more insight on this point. In the experiment reported on there, the
difference
If your job crashes, Flink will automatically restart from the latest
checkpoint, without any manual intervention. JobManager HA is only needed
for automatic recovery after the failure of the Job Manager.
You only need externalized checkpoints and "-s :checkpointPath" if you want
to use
ProcessWindowFunction#process is passed a Context object that contains
public abstract KeyedStateStore windowState();
public abstract KeyedStateStore globalState();
which are available for you to use for custom state. Whatever you store in
windowState is scoped to a window, and is cleared
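A minimal sketch of using windowState (all names are assumed; this counts how many times each window has fired, e.g., due to late firings):

    public class CountFirings
            extends ProcessWindowFunction<Event, String, String, TimeWindow> {

        @Override
        public void process(String key, Context ctx, Iterable<Event> events,
                            Collector<String> out) throws Exception {
            // Scoped to this key and this window; cleared when the window is purged.
            ValueState<Long> firings = ctx.windowState().getState(
                    new ValueStateDescriptor<>("firings", Long.class));
            long count = firings.value() == null ? 1 : firings.value() + 1;
            firings.update(count);
            out.collect(key + " fired " + count + " time(s)");
        }
    }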
Jincheng,
One thing that I like about the way that the documentation is currently
organized is that it's relatively straightforward to compare the Python API
with the Java and Scala versions. I'm concerned that if the PyFlink docs
are more independent, it will be challenging to respond to
Flavio,
Have you looked at application mode [1] [2] [3], added in 1.11? It offers
at least some of what you are looking for -- the application jar and its
dependencies can be pre-uploaded to HDFS, and the main() method runs on the
job manager, so none of the classes have to be loaded in the
ent structure.
>
> >> it's relatively straightforward to compare the Python API with the
> Java and Scala versions.
>
> Regarding the comparison between Python API and Java/Scala API, I think
> the majority of users, especially the beginner users, would not have this
> deman
It sounds like you would like to have something like event-time-based
windowing, but with independent watermarking for every key. An approach
that can work, but it is somewhat cumbersome, is to not use watermarks or
windows, but instead put all of the logic in a KeyedProcessFunction (or
Vijay,
There's a section of the docs that describes some strategies for writing
tests of various types, and it includes some Scala examples [1].
There are also some nice examples from Konstantin Knauf in [2], though they
are mostly in Java.
[1]
With an AscendingTimestampExtractor, watermarks are not created for every
event, and as your job starts up, some events will be processed before the
first watermark is generated.
The impossible value you see is an initial value that's in place until the
first real watermark is available. On the
You can study LocalStreamingFileSinkTest [1] for an example of how to
approach this. You can use the test harnesses [2], keeping in mind that
- initializeState is called during instance creation
- the provided context indicates if state is being restored from a snapshot
- snapshot is called when
In this use case, couldn't the custom trigger register an event time timer
for MAX_WATERMARK, which would be triggered when the bounded input reaches
its end?
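A sketch of that idea (a GlobalWindow trigger is assumed; a bounded source emits MAX_WATERMARK when it reaches its end):

    public class EndOfInputTrigger extends Trigger<Object, GlobalWindow> {

        @Override
        public TriggerResult onElement(Object element, long timestamp,
                GlobalWindow window, TriggerContext ctx) {
            ctx.registerEventTimeTimer(Watermark.MAX_WATERMARK.getTimestamp());
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
            // Fires exactly once, when the final watermark arrives at end of input.
            return time == Watermark.MAX_WATERMARK.getTimestamp()
                    ? TriggerResult.FIRE : TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(GlobalWindow window, TriggerContext ctx) {
            ctx.deleteEventTimeTimer(Watermark.MAX_WATERMARK.getTimestamp());
        }
    }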
David
On Mon, Jul 20, 2020 at 5:47 PM Piotr Nowojski wrote:
> Hi,
>
> I'm afraid that there is not out of the box way of doing this.
Setting up a Flink metrics dashboard in Grafana requires setting up and
configuring one of Flink's metrics reporters [1] that is supported by
Grafana as a data source. That means your options for a metrics reporter
are Graphite, InfluxDB, Prometheus, or the Prometheus push reporter.
If you want
If the rules can be implemented by examining events in isolation (e.g.,
temperature > 25), then the DataStream API is all you need. But if you want
rules that are looking for temporal patterns that play across multiple
events, then CEP or MATCH_RECOGNIZE (part of Flink SQL) will simplify
the
You could approach testing this in the same way that Flink has implemented
its unit tests for KeyedBroadcastProcessFunctions, which is to use
a KeyedTwoInputStreamOperatorTestHarness with
a CoBroadcastWithKeyedOperator. To learn how to use Flink's test harnesses,
see [1], and for an example of
Backpressure is typically caused by something like one of these things:
* problems relating to i/o to external services (e.g., enrichment via an
API or database lookup, or a sink)
* data skew (e.g., a hot key)
* under-provisioning, or competition for resources
* spikes in traffic
* timer storms
Every job is required to have a sink, but there's no requirement that all
output be done via sinks. It's not uncommon, and doesn't have to cause
problems, to have other operators that do I/O.
What can be problematic, however, is doing blocking I/O. While your user
function is blocked, the
lly by the
> WatermarkAssignerOperator.
>
> Piotrek
>
> pon., 27 lip 2020 o 09:16 Flavio Pompermaier
> napisał(a):
>
>> Yes it could..where should I emit the MAX_WATERMARK and how do I detect
>> that the input reached its end?
>>
>> On Sat,
When you modify a job by removing a stateful operator, then during a
restart when Flink tries to restore the state, it will complain that the
snapshot contains state that can not be restored.
The solution to this is to restart from the savepoint (or retained
checkpoint), specifying that you want
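A sketch of such a restart from the CLI (the savepoint path and jar name are placeholders); the --allowNonRestoredState flag tells Flink to skip state that no longer maps to any operator:

    flink run -s hdfs:///savepoints/savepoint-abc123 --allowNonRestoredState my-job.jar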
I believe this should work, with a couple of caveats:
- You can't do this with unaligned checkpoints
- If you have dropped some state, you must specify --allowNonRestoredState
when you restart the job
David
On Wed, Jul 22, 2020 at 4:06 PM Sivaprasanna
wrote:
> Hi,
>
> We are trying out state
You should be able to tune your setup to avoid the OOM problem you have run
into with RocksDB. It will grow to use all of the memory available to it,
but shouldn't leak. Perhaps something is misconfigured.
As for performance, with the FSStateBackend you can expect:
* much better throughput and
Prasanna,
The Flink project does not have an SQS connector, and a quick google search
hasn't found one. Nor does Flink have an HTTP sink, but with a bit of
googling you can find that various folks have implemented this themselves.
As for implementing SQS as a custom sink, if you need exactly
; between 136kb and 2.45mb of state, with a checkpoint duration of 280ms to 2
> seconds. Each of the 32 subtasks appear to be handling 1,700-50,000 records
> an hour with a bytes received of 7mb and 170mb
>
>
>
> Am I barking up the wrong tree?
>
>
>
> -Steve
>
>
When Flink is running on a cluster, `DataStream#print()` prints to files in
the log directory.
Regards,
David
On Tue, Nov 24, 2020 at 6:03 AM Pankaj Chand
wrote:
> Please correct me if I am wrong. `DataStream#print()` only prints to the
> screen when running from the IDE, but does not work
taskmanager.cpu.cores is intended for internal use only -- you aren't meant
to set this option. What happens if you leave it alone?
Regards,
David
On Sat, Dec 5, 2020 at 8:04 AM Rex Fenley wrote:
> We're running this in a local environment so that may be contributing to
> what we're seeing.
>
nt()` but I don't quite understand how to
> > implement it. Could you please give me an example? I'm using Intellij so
> > what I would need is just to see the data on my screen.
> >
> > Thanks
> >
> >
> >
RocksDB does do compaction in the background, and incremental checkpoints
simply mirror to S3 the set of RocksDB SST files needed by the current set
of checkpoints.
However, unlike checkpoints, which can be incremental, savepoints are
always full snapshots. As for why one host would have much