Hi Garrett,
I agree, there seems to be an issue, and increasing the timeout is probably not
the right approach to solve it.
Are you running streaming or batch jobs, i.e., do some of the tasks finish
much earlier than others?
I'm adding Till to this thread who's very familiar with scheduling and
Hi Dulce,
This functionality is not supported by the JDBCOutputFormat.
Some database systems (AFAIK, MySQL) support Upsert writes, i.e., writes
that insert if the primary key is not present or update the row if the PK
exists. Not sure if that would meet your requirements.
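As a rough sketch with the JDBCOutputFormat (table name, columns, and connection
settings below are made up, and the upsert syntax is MySQL-specific):

import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;

// MySQL's INSERT ... ON DUPLICATE KEY UPDATE turns the write into an upsert.
JDBCOutputFormat upsertFormat = JDBCOutputFormat.buildJDBCOutputFormat()
    .setDrivername("com.mysql.jdbc.Driver")
    .setDBUrl("jdbc:mysql://localhost:3306/mydb")
    .setQuery("INSERT INTO kv_table (id, val) VALUES (?, ?) " +
              "ON DUPLICATE KEY UPDATE val = VALUES(val)")
    .finish();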
If you don't want to go
Hi Vishal,
In general, Kryo serializers are not very upgrade-friendly.
Serializer compatibility [1] might be the right approach here, but Gordon (in
CC) might know more about this.
Best, Fabian
[1]
Hi Avihai,
Rafi pointed out the two common approaches to deal with this situation. Let
me expand a bit on those.
1) Transactional producing into queues: There are two approaches to
accomplish exactly-once producing into queues, 1) using a system with
transactional support such as Kafka or 2)
> see an incorrect value from a dashboard.
> This is the biggest concern of mine at this point.
>
> Best,
>
> - Dongwon
>
>
> On Tue, Jun 19, 2018 at 7:14 PM, Fabian Hueske wrote:
>
>> Hi Dongwon,
>>
>> Do you need to number of active
. Is it hard to implement? I am new to the Flink Table API & SQL.
>
> Best Minglei.
>
> On Jun 19, 2018, at 10:36 PM, Fabian Hueske wrote:
>
> Hi,
>
> Which version are you using? We fixed a similar issue for Flink 1.5.0.
> If you can't upgrade yet, you can also implement a user-def
Hi,
Which version are you using? We fixed a similar issue for Flink 1.5.0.
If you can't upgrade yet, you can also implement a user-defined function
that evaluates the big CASE WHEN statement.
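A rough sketch of such a UDF (function, field, and table names are made up):

import org.apache.flink.table.functions.ScalarFunction;

// Replaces a large CASE WHEN over a code column with plain Java control flow.
public class MapCode extends ScalarFunction {
    public String eval(Integer code) {
        if (code == null) return "UNKNOWN";
        switch (code) {
            case 1:  return "A";
            case 2:  return "B";
            default: return "OTHER";
        }
    }
}

// tableEnv.registerFunction("mapCode", new MapCode());
// tableEnv.sqlQuery("SELECT mapCode(code) FROM myTable");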
Best, Fabian
2018-06-19 16:27 GMT+02:00 zhangminglei <18717838...@163.com>:
> Hi, friends.
>
> When I
oach wasn't driven by the requirements but by operational
> aspects (state size), so using a concept like idle state retention time
> would be a more natural fit.
>
> Thanks,
>
> Johannes
>
> On Mon, Jun 18, 2018 at 9:57 AM Fabian Hueske wrote:
>
>> Hi Johannes,
tate inside a trigger.
> TriggerContext only allows to interact with state that is scoped to the
> window and the key of the current trigger invocation (as shown in
> Trigger#TriggerContext)
>
> Now I've come to a conclusion that it might not be possible using
> DataStream API.
>
h
> more TM in play.
>
> @Ovidiu question is interesting to know too. @Till do you mind to share
> your thoughts?
>
> Thank you guys!
>
> --
> *From:* Ovidiu-Cristian MARCU
> *Sent:* Monday, June 18, 2018 6:28 PM
> *To:* Fabian Hueske
>
perfectly 1-2-4-8-16 because everything happens in the same TM. When
> scaled to 32, the performance drops and is not even on par with parallelism
> 16. Is this expected? Thank you.
>
> Regards,
> Yow
>
> --
> *From:* Fabian Hueske
> *Sent:* Mon
Hi Johannes,
EventTimeSessionWindows [1] use the EventTimeTrigger [2] as default trigger
(see EventTimeSessionWindows.getDefaultTrigger()).
I would take the EventTimeTrigger and extend it with early firing
functionality.
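A very rough sketch of what that could look like (not production code; for merging
session windows you'd also have to implement canMerge()/onMerge(), which I omitted):

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class EarlyFiringEventTimeTrigger extends Trigger<Object, TimeWindow> {
    private final long interval;   // processing-time interval for early firings

    public EarlyFiringEventTimeTrigger(long interval) { this.interval = interval; }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
        ctx.registerEventTimeTimer(window.maxTimestamp());
        ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + interval);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        // emit an intermediate result and schedule the next early firing
        ctx.registerProcessingTimeTimer(time + interval);
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        // pending processing-time timers should be cleaned up here as well
    }
}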
However, there are a few things to consider
* you need to be aware that
Hi,
Which Flink version are you using?
Did you try to analyze the bottleneck of the application, i.e., is it CPU,
disk IO, or network bound?
Regarding the task scheduling. AFAIK, before version 1.5.0, Flink tried to
schedule tasks on the same machine to reduce the amount of network transfer.
Hi,
At the moment (Flink 1.5.0), the operator UIDs depend on the overall
application and not only on the query.
Hence, changing the application by adding another query might change the
existing UIDs.
In general, you can only expect savepoint restarts to work if you don't
change the application
Hi everyone,
*Flink Forward Berlin 2018 will take place from September 3rd to 5th.*
The conference will start with one day of training and continue with two
days of keynotes and talks.
*The registration for Flink Forward Berlin 2018 is now open!*
A limited number of early-bird passes is
s caused by the fact
> that the in-memory states are large, as it throws an error that states are larger than
> a certain size. So the solution for (1) will possibly solve (2) as well.
>
> Thanks again,
>
> Ashish
>
>
> On Jun 7, 2018, at 4:25 PM, Fabian Hueske wrote:
>
> H
Hi Angelica,
The Flink cluster needs to provide a sufficient number of slots to process
the tasks of all submitted jobs.
Besides that, there is no limit. However, if you run a very large number of jobs, you
might need to tune a few configuration parameters.
Best, Fabian
2018-06-12 8:46 GMT+02:00 Sampath
Hi Antonio,
Cascading window aggregations, as done in your example, is a good idea and is
preferable if the aggregation function is combinable, which is true for sum
(a count can be computed as a sum of 1s).
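For example, a sketch with the built-in sum, assuming a keyed stream of (key, value)
tuples (names and window sizes are made up):

// 1-minute partial sums ...
DataStream<Tuple2<String, Long>> perMinute = input
    .keyBy(0)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .sum(1);

// ... are re-aggregated into 1-hour sums; the minute results carry the window end
// timestamp, so they fall into the correct hour window.
DataStream<Tuple2<String, Long>> perHour = perMinute
    .keyBy(0)
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .sum(1);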
Best, Fabian
2018-06-09 4:00 GMT+02:00 antonio saldivar :
> Hello
>
> Has anyone worked this way?
ctual timestamps of their input data. For me it was helpful to make
> this change in my Flink job: for late data output, include both processing
> time (DateTime.now()) along with the event time (original timestamp).
>
> On Mon, May 14, 2018 at 12:42 PM, Fabian Hueske wrote:
>
>
Hi Ashish,
Thanks for the great write up.
If I understood you correctly, there are two different issues that are
caused by the disabled checkpointing.
1) Recovery from a failure without restarting all operators to preserve the
state in the running tasks
2) Planned restarts of an application without
Hi Turar,
Managed state is a general concept in Flink's DataStream API and not
specifically designed for windows (although windows use it internally).
I'd recommend the broadcast state that Aljoscha proposed. It was
specifically designed for these use cases.
It is true that the state is currently
Hi,
Flink uses a few libraries that allocate direct (off-heap) memory (Netty,
RocksDB). Flink can also allocate direct memory by itself (only relevant
for batch setups though).
Therefore, Xmx controls only one part of Flink's memory footprint.
Best, Fabian
2018-06-04 16:48 GMT+02:00 aitozi :
>
One reason is that we shade away several of our dependencies to avoid version
conflicts with user dependencies or with the dependencies of our own dependencies.
Best, Fabian
2018-06-05 4:07 GMT+02:00 makeyang :
> thanks rongrong, but it seems irrelevant.
>
>
>
> --
> Sent from:
28150685815] [label:state_timeout] ontimer at
> 1528150685813(CLEANUP_TIME_2), clean/empty the rowMapState
> [{key:1528149885813, value:[Row:(orderId:002,userId:U123)]}]*
>
>
>
>
>
>
>
>
> [1] : https://issues.apache.org/jira/browse/FLINK-9524
>
>
> -
Hi,
The continuous file source is split into two components. 1) A split
generator that monitors a directory and generates splits when a new file is
observed, and 2) reading tasks that receive splits and read the referenced
files.
I think this is the code that generates input splits which are
Hi Nik,
Can you have a look at this JIRA ticket [1] and check if it is related to
the problems you are facing?
If so, would you mind leaving a comment there?
Thank you,
Fabian
[1] https://issues.apache.org/jira/browse/FLINK-8946
2018-05-31 4:41 GMT+02:00 Nikolas Davis :
> We keep track of
Hi everybody,
Due to popular demand, we've extended the Call for Presentations for Flink
Forward Berlin 2018 by one week.
The call will close on *Monday, June 11* (11:59pm CEST).
Please submit a proposal to present your Flink and Stream Processing use
case, experiences, and best practices in
Hi,
The release notes state that "multiple slots are not *fully* supported".
In Flink 1.5.0, the configured number of slots is ignored when requesting
containers for TaskManagers from a resource manager, i.e., Flink assumes
TMs with 1 slot.
Hence, Flink requests too many containers and starts too
the number of out-of-order events is very high
> though :thinking_face:
>
>
>
>
>
> On Wed, May 30, 2018 at 1:55 PM, Fabian Hueske wrote:
>
>> Hi Nara and Sihua,
>>
>> That's indeed an unexpected behavior and it would be good to identify the
>> reason for
Hi Isabelle,
Welcome to the Flink user mailing list!
You are mixing up the two ways to specify a function:
1. Defining a function as a class / object and passing an instance in the
map() method. Given your CustomMapFunction class, this looks as follows:
stream.keyBy(...).map(new
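A hypothetical sketch of option 1, assuming the events are plain Strings:

public class CustomMapFunction implements MapFunction<String, String> {
    @Override
    public String map(String value) {
        return value.toUpperCase();
    }
}

// pass an instance of the class to map():
DataStream<String> result = stream
    .keyBy(value -> value)
    .map(new CustomMapFunction());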
or ProcTimeBoundedRangeOver. I will update
> with my test result and fire a JIRA after that.
>
>
> Best
>
> Yan
> ------
> *From:* Fabian Hueske
> *Sent:* Wednesday, May 30, 2018 1:43:01 AM
> *To:* Dawid Wysakowicz
> *Cc:* user
> *Subje
Hi,
Dawid's analysis is certainly correct, but looking at the code this should
not happen.
I have a few questions:
- You said this only happens if you configure idle state retention times,
right?
- Does the error occur the first time without a previous recovery?
- Did you run the same query on
Hi,
It is mandatory for all DataStream programs and most DataSet programs.
Exceptions are DataSet.print() and
DataSet.collect().
Both methods are defined in the DataSet API and call
execute() internally.
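A minimal sketch to illustrate the difference:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> data = env.fromElements("a", "b", "c");

// print() (like collect()) calls execute() internally:
data.print();

// any other sink needs an explicit call to execute():
// data.writeAsText("/tmp/out");
// env.execute("my batch job");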
Best, Fabian
2018-05-29 12:31 GMT+02:00 Esa
Hi all,
This is the final reminder for the call for presentations for Flink Forward
Berlin 2018.
*The call closes in 7 days* (June 4, 11:59 PM CEST).
Submit your talk and get to present your Apache Flink and stream processing
ideas, experiences, use cases, and best practices on September 4-5 in
Hi,
Jörn is probably right.
In contrast to print(), which immediately triggers an execution,
writeToSink() just appends a sink operator and requires to explicitly
trigger the execution.
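For example (table, sink, and path are made up):

// writeToSink() only appends the sink; nothing runs until execute() is called.
result.writeToSink(new CsvTableSink("/tmp/out.csv", ","));
env.execute("table job");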
The INFO messages of the TypeExtractor are "just" telling you that Row
cannot be used as a POJO type, but
I agree, this should be fixed.
Thanks for noticing, Dhruv.
Would you mind creating a JIRA for this?
Thank you,
Fabian
2018-05-28 8:39 GMT+02:00 Bowen Li :
> Hi Dhruv,
>
> I can see it's confusing, and it does seem the comment should be improved.
> You can find concrete
Hi Chirag,
There have been some issues with very large execution graphs.
You might need to adjust the default configuration and configure larger
Akka buffers and/or timeouts.
Also, 2000 sources means that you run at least 2000 threads at once.
The FileInputFormat (and most of its sub-classes) in
Thank you Till for serving as a release manager for Flink 1.5!
2018-05-25 19:46 GMT+02:00 Till Rohrmann :
> Quick update: I had to update the date of the release blog post which also
> changed the URL. It can now be found here:
>
>
epends on its
> location in the graph and its input/output.
>
>
> Best
>
> Yan
> ------
> *From:* Fabian Hueske <fhue...@gmail.com>
> *Sent:* Wednesday, May 23, 2018 3:18:08 AM
> *To:* Yan Zhou [FDS Science]
> *Cc:* user@flink.apache.org
>
Hi Vishal,
Release candidate 5 (RC5) has been published and the voting period ends
later today.
Unless we find a blocking issue, we can push the release out today.
FYI, if you are interested in the release progress, you can subscribe to
the dev mailing list (or just check out the archives at
Hi Andrei,
With the current version of Flink, there is no general solution to this
problem.
The upcoming version 1.5.0 of Flink adds a feature called credit-based flow
control which might help here.
I'm adding @Piotr to this thread who knows more about the details of this
new feature.
Best,
Hi,
I've posted an answer on SO.
Best, Fabian
2018-05-22 18:11 GMT+02:00 Shimony, Shay :
> Hi everyone,
>
>
>
> I have this question in StackOverflow, and would be happy if you could
> answer.
>
> https://stackoverflow.com/questions/50340107/order-of-
>
Hi Esteban,
If you need the parameters to configure specific operators (instead of the
overall flow), you could pass the parameters as a file using the
distributed cache [1].
Note, the docs point to the DataSet (batch) API, but the feature works the
same way for DataStream programs as well.
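A rough sketch (file path and names are made up):

// register the file on the environment ...
env.registerCachedFile("hdfs:///path/to/params.txt", "params");

// ... and read it in a rich function:
public class ParamMapper extends RichMapFunction<String, String> {
    private final Properties params = new Properties();

    @Override
    public void open(Configuration config) throws Exception {
        File file = getRuntimeContext().getDistributedCache().getFile("params");
        try (InputStream in = new FileInputStream(file)) {
            params.load(in);
        }
    }

    @Override
    public String map(String value) {
        return params.getProperty(value, value);
    }
}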
Hi,
At the moment, you can only restore a query from a savepoint if the query
is not modified and the same Flink version is used.
Since SQL queries are automatically translated into data flows, it is not
transparent to the user which operators will be created.
We would need to expose an
Hi Gregory,
Rong's analysis is correct. The UNION with duplicate elimination is
translated into a UNION ALL and a subsequent grouping operator on all
attributes without an aggregation function.
Flink assumes that all grouping operators can produce retractions (updates)
and window-grouped
Hi all,
Flink Forward returns to Berlin on September 3-5, 2018.
We are happy to announce the Call for Presentations is now open!
Please submit a proposal if you'd like to present your Apache Flink
experience, best practices or use case in front of an international
audience of highly skilled and
Functions with different parallelism cannot be chained.
Chaining means that Functions are fused into a single operator and that
records are passed by method calls (instead of serializing them into an
in-memory or network channel).
Hence, chaining does not work if you have different parallelism.
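For example (the mapper classes are placeholders):

// The two maps end up in separate operators because their parallelism differs, so
// records between them are written into a channel instead of being passed by method call.
DataStream<String> result = input
    .map(new FirstMapper()).setParallelism(4)
    .map(new SecondMapper()).setParallelism(8);   // cannot be chained to the first map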
<Integer,
> Long>> {
> private static final long serialVersionUID = 1L;
>
> @Override
> public Tuple2<Integer, Long> reduce(Tuple2<Integer, Long> tup1,
> Tuple2<Integer, Long> tup2) throws Exception {
> Tuple2<Integer, Long> retTup = tup1;
I think this would be a very good feature.
There's a pretty old JIRA for it [1].
It's even from pre-Apache times because it was imported from the original
Github repository.
Cheers, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-766
2018-05-14 16:46 GMT+02:00 Flavio Pompermaier
Hi Ashish,
Did you use per-window state (also called partitioned state) in your
Trigger?
If yes, you need to make sure that it is completely removed in the clear()
method because processing time timers won't fire once a window was purged.
So you cannot (fully) rely on timers to clean up
ave
>> any idea on this? Is there some existing test that simulates out of order
>> input to flink's kafka consumer? I could try to build a test case based on
>> that to possibly reproduce my problem. I'm not sure how to gather enough
>> debug information on the production str
Hi,
Avro provides a schema for the data and can be used to serialize individual
records in a binary format.
It does not compress the data (although this can be put on top) but is more
space-efficient due to the binary serialization.
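As a small illustration of Avro's binary serialization (schema and field names are made up):

Schema schema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
GenericRecord record = new GenericData.Record(schema);
record.put("id", 42L);

ByteArrayOutputStream out = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
encoder.flush();
byte[] bytes = out.toByteArray();   // compact binary representation, no field names on the wire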
I think you can implement a Writer for the BucketingSink that writes
Hi Juho,
Thanks for bringing up this topic! I share your intuition.
IMO, records should only be filtered out and sent to a side output if any
of the windows they would be assigned to is closed already.
I had a look into the code and found that records are filtered out as late
based on the
Great, thank you!
2018-05-11 10:31 GMT+02:00 Juho Autio <juho.au...@rovio.com>:
> Thanks Fabian, here's the ticket:
> https://issues.apache.org/jira/browse/FLINK-9335
>
> On Wed, May 2, 2018 at 12:53 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Juh
Hi Varun,
The focus of the DataSet execution is on robustness. The smaller DataSet is
stored serialized in memory.
Also most of the communication happens via serialization (instead of
passing object references).
This serialization should add a significant overhead compared to a
Hi,
From the dev perspective, not much has been done on that component
for a long time [1].
Are there any users of this feature on the user list and can comment on how
it works for them?
Best, Fabian
[1] https://github.com/apache/flink/commits/master/flink-contrib/flink-storm
2018-05-11
Hi Peter,
Building the state for a DataStream job in a DataSet (batch) job is
currently not possible.
You can, however, implement a DataStream job that reads the batch data and
builds the state. When all data has been processed, you'd need to take a savepoint
and can resume a streaming job
L together in Scala code ?
>
>
>
> Best, Esa
>
>
>
> *From:* Fabian Hueske <fhue...@gmail.com>
> *Sent:* Tuesday, May 8, 2018 10:26 PM
>
> *To:* Esa Heikkinen <esa.heikki...@student.tut.fi>
> *Cc:* user@flink.apache.org
> *Subject:* Re: Reading csv-files in pa
tructure of main program ?
>
> I meant: if I want to read many csv-files in a certain
> consecutive reading order, is that possible and how?
>
>
>
> Actually I want to implement upper level (state-machine-based) logic for
> reading csv-files by certain order.
are running a custom build of Flink. Which version did
you base your build on?
Best, Fabian
2018-05-08 17:41 GMT+02:00 Chan, Regina <regina.c...@gs.com>:
> There’s no collect() explicitly from me. It has a cogroup operator before
> writing to DataSink.
>
>
>
>
>
> *Fro
Hi Helmut,
In fact this is possible with the DataSet API. However, AFAIK it is an
undocumented feature and probably not widely used.
You can do this by specifying so-called SplitDataProperties on a DataSource
as follows:
DataSource src = env.createInput(...);
SplitDataProperties splitProps =
at read csv-files
> parallel ?
>
>
>
> Best, Esa
>
>
>
> *From:* Fabian Hueske <fhue...@gmail.com>
> *Sent:* Monday, May 7, 2018 3:48 PM
> *To:* Esa Heikkinen <esa.heikki...@student.tut.fi>
> *Cc:* user@flink.apache.org
> *Subject:* Re: Reading csv-
Hi,
Flink will automatically stop the execution of a DataStream program once
all sources have finished providing data, i.e., when all SourceFunctions
have returned from the run() method.
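A minimal sketch of a finite source (names are made up):

public class FiniteSource implements SourceFunction<Long> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        for (long i = 0; i < 100 && running; i++) {
            ctx.collect(i);
        }
        // returning from run() signals that this source is finished
    }

    @Override
    public void cancel() {
        running = false;
    }
}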
The DeserializationSchema.isEndOfStream() method can be used to tell a
built-in SourceFunction such as a
Hi Esa,
you can certainly read CSV files in parallel. This works very well in a
batch query.
For streaming queries that expect data to be ingested in timestamp order,
this is much more challenging, because you 1) need to read the files in the
right order and 2) cannot split files (unless you
Hi Andre,
Sharing a Zookeeper cluster between Kafka and Flink should be OK.
If you're running just one cluster, you could in principle keep the default.
However, I'd change the configuration just in case.
Otherwise, you might get into trouble when you (accidentally) run another
Flink setup.
Hi Derek,
1. I've created a JIRA issue to improve the docs as you recommended [1].
2. This discussion goes quite a bit into the internals of the HA setup. Let
me pull in Till (in CC) who knows the details of HA.
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-9309
2018-05-05
Hi Dongwon,
I see that you are using the latest master (Flink 1.6-SNAPSHOT).
This is a known problem in the new FLIP-6 mode. The ResourceManager tries
to allocate too many resources, basically one TM per required slot, i.e., it
does not take the number of slots per TM into account.
The resources
Hi,
FoldFunction was deprecated because it doesn't support partial aggregation.
AggregateFunction is much more expressive, but requires a bit more
implementation effort.
In favor of a concise API, FoldFunction was deprecated since it doesn't
offer more functionality than AggregateFunction.
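For example, a sketch of an AggregateFunction that computes an average (accumulator =
(sum, count)), roughly following the interface of recent Flink versions:

public class Average implements AggregateFunction<Long, Tuple2<Long, Long>, Double> {

    @Override
    public Tuple2<Long, Long> createAccumulator() {
        return Tuple2.of(0L, 0L);
    }

    @Override
    public Tuple2<Long, Long> add(Long value, Tuple2<Long, Long> acc) {
        return Tuple2.of(acc.f0 + value, acc.f1 + 1L);   // partial aggregation per element
    }

    @Override
    public Double getResult(Tuple2<Long, Long> acc) {
        return ((double) acc.f0) / acc.f1;
    }

    @Override
    public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
        return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);      // combine partial aggregates
    }
}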
Hi Peter,
State initialization with historic data is a use case that's coming up
more and more.
Unfortunately, there's no good solution for this yet, just a couple of
workarounds that require careful design and don't work for all cases.
There was a talk about exactly this problem and some ideas
Hi,
Most, but not all, of the FLIP-6 features will be released with Flink 1.5.0.
I'm not sure if this deployment mode will be fully supported in 1.5.0. Gary
(in CC) might know details here.
Anyway, the deployment would work by starting the image using regular
Docker/Kubernetes tools.
The image
Hi Regina,
I see from the logs that you are using the DataSet API.
Are you trying to fetch a large result to your client using the collect()
method?
Best, Fabian
2018-05-02 0:38 GMT+02:00 Chan, Regina :
> Hi,
>
>
>
> I’m running a single TM with the following params -yn 1
mples in the documentation referenced below
> have a number of bugs, see FLINK-9299
> <https://issues.apache.org/jira/browse/FLINK-9299>.
>
>
> On May 4, 2018, at 7:35 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Ken,
>
> You can also
Hi Ken,
You are right. The merge() method combines partial aggregates, similar to a
combinable reducer.
The only situation when merge() is called in a DataStream job (that I am
aware of) is when session windows get merged.
For example when you define a session window with 30 minute gap and you
isk after the
> flatMaps?
>
> On Fri, May 4, 2018 at 6:02 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> That will happen if you join (or coGroup) the branched DataSets, i.e.,
>> you have branching and merging pattern in your stream.
>>
>> The problem in th
usage during the job and it corresponded exactly to the size estimated
> by the Flink UI, that is, twice its initial size).
> Probably there is no problem as long as you keep data in memory, but in my case
> this memory explosion is very problematic :(
>
> On Fri, May 4, 2018 at 5:14 PM, Fabian Huesk
Hi Flavio,
No, there's no way around it.
DataSets that are processed by more than one operator cannot be processed
by chained operators.
The records need to be copied to avoid concurrent modifications. However,
the data should not be shipped over the network if all operators have the
same
Hi Ken,
You can also use an additional ProcessWindowFunction [1] that is applied on
the result of the AggregateFunction to set the key.
Since the PWF is only applied on the final result, there is no overhead
(actually, it might even be slightly cheaper because the AggregateFunction
can be simpler).
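A rough sketch (input type, key selector, and the AggregateFunction are hypothetical):

input
    .keyBy(t -> t.f0)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .aggregate(new MyAggregate(),   // incrementally computes a single Double per window
        new ProcessWindowFunction<Double, Tuple2<String, Double>, String, TimeWindow>() {
            @Override
            public void process(String key, Context ctx, Iterable<Double> aggregates,
                                Collector<Tuple2<String, Double>> out) {
                // 'aggregates' holds exactly one element: the pre-aggregated window result
                out.collect(Tuple2.of(key, aggregates.iterator().next()));
            }
        });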
Hi,
Union is not an actual operator in Flink. Instead, the operator that is
applied on the unioned stream ingests its input from all unioned streams.
The parallelism of that operator is the configured default parallelism (can
be specified at the execution environment) unless it is explicitly
Hi,
Flink can add events to multiple windows. For instance, the built-in
sliding windows are doing this.
You can address your use case by implementing a custom WindowAssigner [1].
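A very rough sketch of the central method of such an assigner (the other methods,
getDefaultTrigger(), getWindowSerializer(), and isEventTime(), are omitted;
SlidingEventTimeWindows is a good template):

@Override
public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
    long hour = 60 * 60 * 1000L;
    long start = timestamp - (timestamp % hour);
    // assign every event to two (here: overlapping) windows
    return Arrays.asList(
        new TimeWindow(start, start + hour),
        new TimeWindow(start - hour / 2, start + hour / 2));
}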
Best, Fabian
[1]
Hi Wouter,
you can try to make the SerializationSchema serializable by overriding
Java's serialization methods writeObject() and readObject(), similar to what
Flink's AvroRowSerializationSchema [1] does.
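A sketch of that pattern, assuming a transient, non-serializable Avro Schema field:

private transient Schema schema;

private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    out.writeUTF(schema.toString());          // write the schema as a string
}

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    this.schema = new Schema.Parser().parse(in.readUTF());   // re-create the schema
}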
Best, Fabian
[1]
It's not a requirement but the exception reads "org.apache.flink.runtime.
client.JobClientActorConnectionTimeoutException: Lost connection to the
JobManager.".
So increasing the timeout might help.
Best, Fabian
2018-05-02 12:20 GMT+02:00 m@xi :
> Hello Fabian!
>
> Thanks
Hi Vishal,
AFAIK it is not possible with Flink's default time windows.
However, it should be possible to implement a custom WindowAssigner for
your use case.
I'd have a look at the TumblingEventTimeWindows class and copy/modify it to
your needs.
Best, Fabian
2018-05-02 15:12 GMT+02:00 Vishal
Hi,
did you try to increase the Akka timeout [1]?
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#distributed-coordination-via-akka
2018-04-29 19:44 GMT+02:00 m@xi :
> Guys seriously I have done the process as described in the
Hi Amit,
We recently fixed a bug in the network stack that affected batch jobs
(FLINK-9144).
The fix was added after your commit.
Do you have a chance to build the current release-1.5 branch and check if
the fix also resolves your problem?
Otherwise it would be great if you could open a blocker
Hi Juho,
I assume that these logs are generated from a different process, i.e., the
client process and not the JM or TM process.
Hence, they end up in a different log file and are not covered by the log
collection of the UI.
The reason is that this process might also be run on a machine outside
Hi Tao,
The watermarks of operators that consume from two (or more) streams are
always synced to the lowest watermark.
This behavior guarantees that data won't be late (unless it was late when
watermarks were assigned). However, the operator will most likely need to
buffer more events from the
>
> Best,
> Chengzhi
>
> On Thu, Apr 26, 2018 at 6:05 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Chengzhi,
>>
>> Functions in Flink are implemented in a way to preserve the timestamps of
>> elements or assign timestamps which are aligned
issue.
>
> Regards,
>
> — Ken
>
>
> On Apr 25, 2018, at 12:39 PM, Michael Latta <lat...@me.com> wrote:
>
> Using a flat map function, you can always buffer the non-meta data stream
> in the operator state until the metadata is aggregated, and then process
> an
Thanks for reporting the issue Chris!
Would you mind opening a JIRA issue [1] to track the bug for Flink 1.4?
Thank you, Fabian
[1] https://issues.apache.org/jira/browse/FLINK
2018-04-25 21:11 GMT+02:00 Chris Schneider :
> Hi Gang,
>
> FWIW, the code below works
Hi Chengzhi,
Functions in Flink are implemented in a way to preserve the timestamps of
elements or assign timestamps which are aligned with the existing
watermarks.
For example, the result of a time window aggregation has the end timestamp
of the window as a timestamp and records emitted by the
Hi,
This sounds like one of the partitions does not receive data. Watermark
generation is data driven, i.e., the watermark can only advance if the
TimestampAndWatermarkAssigner sees events.
By changing the parallelism between the map and the assigner, the events
are shuffled across and hence
You can certainly setup and build Flink applications with Gradle.
However, the bad news is that the Flink project does not provide a
pre-configured Gradle project/configuration yet.
The good news is, the community is working on that [1] and there's already
a PR [2] (opened 19 hours ago).
Btw. besides
Hi Sebastien,
I think you can do that with Flink's Table API / SQL and the
KafkaJsonTableSource.
Note that in Flink 1.4.2, the KafkaJsonTableSource does not support flat
JSON yet.
You'd also need a table-valued UDF for the parsing of the message and
joining the result with the original row.
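A hypothetical sketch of such a table-valued UDF (the message format, names, and types
are made up; a real implementation would parse JSON instead of key=value pairs):

public class ParseMessage extends TableFunction<Tuple2<String, String>> {
    public void eval(String message) {
        // emit one row per extracted key/value pair
        for (String kv : message.split(",")) {
            String[] parts = kv.split("=", 2);
            if (parts.length == 2) {
                collect(Tuple2.of(parts[0], parts[1]));
            }
        }
    }
}

// tableEnv.registerFunction("parseMessage", new ParseMessage());
// SELECT m.*, t.* FROM messages AS m, LATERAL TABLE(parseMessage(m.payload)) AS t(k, v)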
unds, that must be helpful for my case as well
>
> Thank you,
> Alex
>
> On Mon, Apr 23, 2018 at 2:14 PM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Miki,
>>
>> Sorry for the late response.
>> There are basically two ways to implement an enri
Hi Alex,
That's correct. The n refers to the n-th checkpoint. The checkpoint ID is
important, because operators need to align the barriers to ensure that they
consumed all inputs up to the point where the barriers were injected into
the stream.
Each operator checkpoints its own state. For
Hi Esa,
there's no built-in support for handling spatial data in Flink.
However, you can use any JVM-based spatial library in your program to
perform such computations.
One option is the ESRI library [1].
Also there is a JIRA issue [2] to add support for a few spatial functions
(as provided by
Hi,
I just realized that the documentation completely lacks a section about
implementing custom source connectors :-(
The JavaDocs of the relevant interface classes [1] [2] [3] [4] are quite
extensive though.
I'd also recommend to have a look at the implementations of other source
connectors
Hi Miguel,
Actually, a lot has changed since 1.4.
Flink 1.5 will feature a completely new (cluster) setup and deployment model.
The dev effort is known as FLIP-6 [1].
So it is not unlikely that you discovered a regression.
Would you mind opening a JIRA ticket for the issue?
Thank you very much,
Hi Miki,
Sorry for the late response.
There are basically two ways to implement an enrichment join as in your use
case.
1) Keep the meta data in the database and implement a job that reads the
stream from Kafka and queries the database in an ASyncIO operator for every
stream record. This should
Hi,
The semantics of the joins offered by the DataStream API in Flink 1.4 and
before as well as the upcoming 1.5 version are a bit messed up, IMO.
Since Flink 1.4, Flink SQL implements a better windowed join [1].
DataStream and SQL can be easily integrated with each other.
A similar