Hi Luqman,
From your description, it seems that you want to infer the type (case
class, tuple, etc.) of a stream dynamically at runtime.
AFAIK, I don’t think this is supported in Flink. You’re required to have
defined types for your DataStreams.
Could you also provide an example code of
Hi Jonas,
A few things to clarify first:
Stream A has a rate of 100k tuples/s. After processing the whole Kafka queue,
the rate drops to 10 tuples/s.
From this description it seems like the job is re-reading the topic from the
beginning, and once you reach the latest record at the head
Hi Dmitry,
I think currently the simplest way to do this is to add a program argument
as a flag indicating whether or not the current run is from a savepoint (so
you manually supply the flag whenever you’re starting the job from a
savepoint), and check that flag in the main method.
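A minimal sketch of that idea in plain Java. The flag name `--fromSavepoint` and the class `SavepointFlag` are hypothetical; Flink does not define such an option, so the job author picks the name and passes it by hand.

```java
import java.util.Arrays;

// Hypothetical sketch: "--fromSavepoint" is a made-up program argument,
// not a Flink option. It must be supplied manually when restoring.
final class SavepointFlag {
    static final String FLAG = "--fromSavepoint";

    // Returns true when the manually supplied flag is among the program arguments.
    static boolean startedFromSavepoint(String[] args) {
        return Arrays.asList(args).contains(FLAG);
    }

    public static void main(String[] args) {
        if (startedFromSavepoint(args)) {
            System.out.println("Run was flagged as restored from a savepoint.");
        } else {
            System.out.println("Run was flagged as a fresh start.");
        }
    }
}
```

The check only reflects what the operator typed, so a forgotten flag goes unnoticed.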
The main
Hi Ninad and Till,
Thank you for looking into the issue! This is actually a bug.
Till’s suggestion is correct:
The producer holds a `pendingRecords` value that is incremented on each
invoke() and decremented on each callback, used to check if the producer needs
to sync on pending callbacks on
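The counting pattern described in this message can be sketched roughly as follows. This is a simplified stand-in, not the FlinkKafkaProducer source: only `pendingRecords` and `invoke()` come from the email, while `onCallback()` and `hasPendingCallbacks()` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified illustration of a pending-record counter; not the actual producer code.
final class PendingRecordsSketch {
    private final AtomicLong pendingRecords = new AtomicLong();

    // Called once per record handed to the (simulated) asynchronous client.
    void invoke(String record) {
        pendingRecords.incrementAndGet();
    }

    // Called once per acknowledgement from the (simulated) client (hypothetical name).
    void onCallback() {
        pendingRecords.decrementAndGet();
    }

    // True while some sent records have not been acknowledged yet (hypothetical name).
    boolean hasPendingCallbacks() {
        return pendingRecords.get() > 0;
    }
}
```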
Hi Sandeep!
While auto scaling jobs in Flink still isn’t possible, in Flink 1.2 you will be
able to rescale jobs by stopping and restarting.
This works by taking a savepoint of the job before stopping the job, and then
redeploying the job with a higher / lower parallelism using the savepoint.
Upon
the
affected sinks participate in checkpointing is enough of a solution - is there
anything special about `SinkFunction[T]` extending `Checkpointed[S]`, or can I
just implement it as I would for e.g. a mapping function?
Thanks,
Andrew
On Jan 13, 2017, at 4:34 PM, Tzu-Li (Gordon) Tai <tz
and I could
filter the data more efficiently because the data would not need to go over the
network before this filter.
Afterwards I can scale it up to 'many' tasks for the heavier processing that
follows.
As a concept: Could that be made to work?
Niels
On Mon, Jan 9, 2017 at 9:14 AM, Tzu-Li (G
Hi,
You’re correct that the FlinkKafkaProducer may emit duplicates to Kafka topics,
as it currently only provides at-least-once guarantees.
Note that this isn’t a restriction only in the FlinkKafkaProducer, but a
general restriction for Kafka's message delivery.
This can definitely be improved
Hi Andrew,
Your observations are correct. Like you mentioned, the current problem revolves
around how we deal with the pending buffered requests in accordance with
Flink’s checkpointing.
I’ve filed a JIRA for this, as well as some thoughts for the solution in the
description:
Hi,
This is expected behaviour due to how the per-partition watermarks are designed
in the Kafka consumer, but I think it’s probably a good idea to handle idle
partitions also when the Kafka consumer itself emits watermarks. I’ve filed a
JIRA issue for this:
Henri Heiskanen <henri.heiska...@gmail.com>
wrote:
Hi,
We had the same problem when running 0.9 consumer against 0.10 Kafka. Upgrading
Flink Kafka connector to 0.10 fixed our issue.
Br,
Henkka
On Mon, Jan 9, 2017 at 5:39 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
Hi,
N
Hi,
Not sure what might be going on here. I’m pretty certain that for
FlinkKafkaConsumer09 when checkpointing is turned off, the internally used
KafkaConsumer client will auto commit offsets back to Kafka at a default
interval of 5000ms (the default value for “auto.commit.interval.ms”).
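For reference, the settings mentioned here can be written out as plain Kafka consumer properties. This is only a sketch: the class name is hypothetical, and the broker address and group id are placeholders. `enable.auto.commit` and `auto.commit.interval.ms` are standard Kafka consumer configuration keys.

```java
import java.util.Properties;

// Hypothetical helper that builds consumer properties with auto commit made explicit.
final class AutoCommitProps {
    static Properties consumerProperties() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "example-group");           // placeholder group id
        props.setProperty("enable.auto.commit", "true");          // periodic offset commits by the client
        props.setProperty("auto.commit.interval.ms", "5000");     // the 5000ms default, stated explicitly
        return props;
    }
}
```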
Could
Hi Niels,
Thank you for bringing this up. I recall there was some previous discussion
related to this: [1].
I don’t think this is possible at the moment, mainly because of how the API is
designed.
On the other hand, a KeyedStream in Flink is basically just a DataStream with a
hash
Hi Igor!
What you can actually do is let a single FlinkKafkaConsumer consume from both
topics, producing a single DataStream which you can keyBy afterwards.
All versions of the FlinkKafkaConsumer support consuming multiple Kafka topics
simultaneously. This is logically the same as union and
Hi Matt,
1. There’s some in-progress work on wrapper util classes for Kafka
de/serializers here [1] that allow Kafka de/serializers to be used with the
Flink Kafka Consumers/Producers with minimal user overhead.
The PR also has some proposed additions to the documentation for the wrappers.
2. I
Hi Dominik,
Do you mean how Flink redistributes an operator’s state when the parallelism of
the operator is changed?
If so, you can take a look at [1] and [2].
Cheers,
Gordon
[1] https://issues.apache.org/jira/browse/FLINK-3755
[2]
Hi,
The FlinkKafkaProducers currently support per-record topics through the
user-provided serialization schema, which has a "getTargetTopic(T element)"
method that is called for each record.
The producers also decide the partition each record is sent to through a custom
KafkaPartitioner, which is also provided
Hi Matt,
Just to be clear, what I'm looking for is a way to serialize a POJO class for
Kafka but also for Flink. I'm not sure whether the interfaces of both frameworks
are compatible, but it seems they aren't.
For Kafka (producer) I need a Serializer and a Deserializer class, and for
Flink (consumer) a
Hi Craig,
I think the email wasn't sent to the ‘dev’ list, somehow.
Have you tried this:
mvn clean install -DskipTests
# In Maven 3.3 the shading of flink-dist doesn't work properly in one run, so
we need to run mvn for flink-dist again.
cd flink-dist
mvn clean install -DskipTests
I agree that
Hi Phillip,
Thanks for testing it. From your log and my own tests, I can confirm the
problem is with Kinesalite not correctly
mocking the official Kinesis behaviour for the `describeStream` API.
There’s a PR for the fix here: https://github.com/apache/flink/pull/2822. With
this change, shard
Hi Matt,
Here’s an example of writing a DeserializationSchema for your POJOs: [1].
As for simply writing messages from WebSocket to Kafka using a Flink job, while
it is absolutely viable, I would not recommend it,
mainly because you’d never know if you might need to temporarily shut down
Flink
Hi Philipp,
When used against Kinesalite, can you tell if the connector is already reading
data from the test shard before any
of the shard discovery messages? If you have any spare time to test this, you
can set a larger value for the
`ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS`
the job from my last savepoint, it's consuming the stream fine
again with no problems.
Do you have any idea what might be causing this, or anything I should do to
investigate further?
Cheers,
Josh
On Wed, Oct 5, 2016 at 4:55 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
Hi S
processing?
Thanks,
Li
On Oct 17, 2016, at 11:10 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
Hi!
No, the operator does not need to pause processing input records while the
checkpointing of its state is in progress.
The checkpointing of operator state is asynchronous. The op
Really great to hear this!
Cheers,
Gordon
On October 4, 2016 at 8:13:27 PM, Till Rohrmann (trohrm...@apache.org) wrote:
It's always great to hear Flink success stories :-) Thanks for sharing it with
the community.
I hope Flink helps you to solve even more problems. And don't hesitate to
Hi,
Helping out here: this is the PR for async Kafka offset committing -
https://github.com/apache/flink/pull/2574.
It has already been merged into the master and release-1.1 branches, so you can
try out the changes now if you’d like.
The change should also be included in the 1.1.3 release,
rators? If I need the state stored on previous
operators, how can I fetch it?
I don’t think this is possible.
Fine, thanks.
Thanks.
On Mon, Oct 3, 2016 at 12:23 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:
Hi!
- I'm currently running my flink program on 1.2 SNAPSHOT with kafka s
Hi!
- I'm currently running my flink program on 1.2 SNAPSHOT with kafka source and
I have checkpoint enabled. When I look at the consumer offsets in kafka it
appears to be stagnant and there is a huge lag. But I can see my flink program
is in pace with kafka source in JMX metrics and outputs.
Hi!
This is definitely a planned feature for the Kafka connectors, there’s a JIRA
exactly for this [1].
We’re currently going through some blocking tasks to make this happen, I also
hope to speed up things over there :)
Your observation is correct that the Kafka consumer uses “assign()” instead
Hi!
Yes, you can set custom operator names by calling `.name(…)` on DataStreams
after a transformation.
For example, `.addSource(…).map(...).name(…)`. This name will be used for
visualization on the dashboard, and also for logging.
Regards,
Gordon
On September 12, 2016 at 3:44:58 PM, Bart
Hi David,
Is it possible that your Kafka installation is an older version than 0.9? Or
you may have used a different Kafka client major version in your job jar's
dependency?
This looks like an odd protocol incompatibility with the Kafka broker to me, as
the client in the Kafka consumer is reading
57 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:
Hi Josh,
Thank you for reporting this, I’m looking into it. There were some major changes
to the Kinesis connector after mid June, but the changes don’t seem to be
related to the iterator timeout, so it may be a bug that had always
Hi Josh,
Thank you for reporting this, I’m looking into it. There were some major changes
to the Kinesis connector after mid June, but the changes don’t seem to be
related to the iterator timeout, so it may be a bug that had always been there.
I’m not sure yet if it may be related, but may I
:)
Regards,
Gordon
On August 23, 2016 at 9:31:58 PM, Tzu-Li (Gordon) Tai (tzuli...@gmail.com)
wrote:
Slight misunderstanding here. The one thread per Kafka broker happens
*after* the assignment of Kafka partitions to the source instances. So,
with a total of 10 partitions and 10 source instances
stamps receiving and sending data. If there is
only one source instance per broker how does that happen?
Thanks,
Sameer
On Tue, Aug 23, 2016 at 7:17 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com>
wrote:
> Hi!
>
> Kinesis shards should be ideally evenly assigned to the source insta
on the source stream-
assignTimestampsAndWatermarks(new IngestionTimeExtractor<>())
Sameer
On Tue, Aug 23, 2016 at 7:29 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com>
wrote:
> Hi,
>
> For the Kinesis consumer, when you use Event Time but do not explicitly
> assign timestamps,
Hi,
For the Kinesis consumer, when you use Event Time but do not explicitly
assign timestamps, the Kinesis server-side timestamp (the time at which
Kinesis received the record) is attached to the record as default, not
Flink’s ingestion time.
Does this answer your question?
Regards,
Gordon
On
Hi!
Kinesis shards should ideally be evenly assigned to the source instances.
So, with your example of source parallelism of 10 and 20 shards, each
source instance will have 2 shards and will have 2 threads consuming them
(therefore, not in round robin).
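The arithmetic of that example can be checked with a small sketch. The index-modulo assignment used here is an assumption chosen only to produce an even split; it is not taken from the Kinesis connector's code, and the class and method names are hypothetical.

```java
// Illustrative only: index-modulo assignment is an assumed way to get an even split.
final class ShardAssignmentSketch {
    // Returns how many shards each source subtask would own.
    static int[] shardsPerSubtask(int shardCount, int parallelism) {
        int[] counts = new int[parallelism];
        for (int shard = 0; shard < shardCount; shard++) {
            counts[shard % parallelism]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 20 shards spread over a source parallelism of 10, as in the example above.
        int[] counts = shardsPerSubtask(20, 10);
        for (int subtask = 0; subtask < counts.length; subtask++) {
            System.out.println("subtask " + subtask + " -> " + counts[subtask] + " shards");
        }
    }
}
```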
For the Kafka consumer, in the source
Hi,
Please also note that the “auto.offset.reset” property is only respected
when there are no offsets under the same consumer group in ZK. So,
currently, in order to make sure you read from the latest / earliest
offsets every time you restart your Flink application, you’d have to use a
unique
Hi Shannon,
Thanks for your investigation on the issue and the JIRA. There's actually a
previous JIRA on this problem already:
https://issues.apache.org/jira/browse/FLINK-4023. Would you be ok with
tracking this issue on FLINK-4023, and closing FLINK-4069 as a duplicate
issue? As you can see, I've
Hi Soumya,
No, Flink currently has no officially supported S3 streaming source. As
far as I know, there isn't one out in the public yet either. The community
is open to submissions for new connectors, so if you happen to be working on
one for S3, you can file a JIRA to let us know.
Also,
Hi Rafal,
From your description, it seems like Flink is complaining because it cannot
access the Elasticsearch API related dependencies as well. You'd also have
to include the following into your Maven build, under :
org.elasticsearch
elasticsearch
2.3.2
jar
false
Hi Francis,
A part of every complete snapshot is the record positions associated with
the barrier that triggered the checkpointing of this snapshot. The snapshot
is completed only when all the records within the checkpoint reach the
sink. When a topology fails, all the operators' state will
Hi Saiph,
In Flink, the key for keyBy() can be provided in different ways:
https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#specifying-keys
(the doc is for DataSet API, but specifying keys is basically the same for
DataStream and DataSet).
As described in the
Hi Sourav,
A little help with more clarification on your last comment.
In the sense of "where" the driver program is executed, yes, the Flink
driver program runs in a mode similar to Spark's YARN-client.
However, the "role" of the driver program and the work that it is
responsible for is quite
Hi Giancarlo,
Since it has been a while since the last post and there hasn't been a JIRA
ticket opened for Kinesis connector yet, I'm wondering how you are doing on
the Kinesis connector and hope to help out with this feature :)
I've opened a JIRA