Hi,
Are there any plans to upload the videos to the Flink Forward YouTube
channel?
There are so many interesting talks I would like to watch and filling out a
form for each and every video makes it more difficult...
Thanks,
Rafi
Hi,
We're using Kinesis as our input & output of a job and are experiencing a parsing
exception while reading from the output stream. All streams contain 1 shard
only.
While investigating the issue I noticed a weird behaviour where records get
a PartitionKey I did not assign and the record Data is
> Piotrek
>
>
> On 24 May 2018, at 09:40, Rafi Aroch <rafi.ar...@gmail.com> wrote:
>
> Hi,
>
> We're using Kinesis as our input & output of a job and are experiencing a
> parsing exception while reading from the output stream. All streams contain
> 1 shard only.
Hi Avihai,
The problem is that every message queuing sink only provides an at-least-once
guarantee.
From what I see, a possible messaging queue which guarantees exactly-once is
Kafka 0.11, when using the Kafka transactional messaging
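For reference, a rough sketch of an exactly-once Kafka 0.11 sink (class and
package names from memory, so please double-check against the connector docs;
the topic, properties and stream are placeholders, and checkpointing must be
enabled for the transactions to commit):

import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper

val props = new Properties()
props.setProperty("bootstrap.servers", "broker:9092")

// EXACTLY_ONCE uses Kafka transactions: records are pre-committed on
// checkpoint and committed when the checkpoint completes.
val producer = new FlinkKafkaProducer011[String](
  "output-topic",
  new KeyedSerializationSchemaWrapper[String](new SimpleStringSchema()),
  props,
  FlinkKafkaProducer011.Semantic.EXACTLY_ONCE)

stream.addSink(producer)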
Hi,
Below are the In/Out metrics as they appear in the Flink UI.
I was wondering what the possible reasons are that the "Records sent" of
one operator is different from the "Records received" of the next one. I
would expect to see the same number...
[image: screenshot of the operators' In/Out metrics in the Flink UI]
* We're using Flink 1.5.0
Hi Jiayi,
This topic has been discussed by others; take a look here for some options
from Lyft: https://youtu.be/WdMcyN5QZZQ
Rafi
On Fri, Oct 12, 2018, 16:51 bupt_ljy wrote:
> Yes…that’s an option, but it’ll be very complicated because of our storage
> and business.
>
> Now I’m trying to write
Awesome, thanks!
On Mon, Oct 15, 2018, 17:36 Chesnay Schepler wrote:
> There is a known issue in 1.5.0 with the numRecordsIn/Out metrics for
> chained tasks: https://issues.apache.org/jira/browse/FLINK-9530
>
> This has been fixed in 1.5.1.
>
> On 15.10.2018 14:37, Rafi A
Hi,
I'm also experiencing this with Flink 1.5.2. This is probably related to
BucketingSink not working properly with S3 as the filesystem, due to the
eventual consistency of S3.
I see that https://issues.apache.org/jira/browse/FLINK-9752 will be part of
1.6.2 release. It might help, if you use
Hi,
I'm writing a Batch job which reads Parquet, does some aggregations and
writes back as Parquet files.
I would like the output to be partitioned by year, month and day of the event
time, similarly to the functionality of the BucketingSink.
I was able to achieve the reading/writing to/from Parquet by
Hi Steve,
We've encountered this also. We have way more than enough shards, but were
still getting exceptions.
We think we know the reason; we would love for someone to
confirm or reject it.
What we suspect is happening is as follows:
The KPL's RateLimit parameter is tracking the amount of
Hi Avi,
I can't see the part where you use assignTimestampsAndWatermarks.
If this part is not set properly, it's possible that watermarks are not
sent and nothing will be written to your Sink.
See here for more details:
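For example, a minimal sketch (MyEvent, its timestamp field and the 10-second
bound are placeholders, not from your job):

import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

// Extract the event-time timestamp and emit watermarks that tolerate 10
// seconds of out-of-orderness, so event-time windows can actually fire.
val withTimestamps = stream.assignTimestampsAndWatermarks(
  new BoundedOutOfOrdernessTimestampExtractor[MyEvent](Time.seconds(10)) {
    override def extractTimestamp(element: MyEvent): Long = element.timestampMillis
  })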
t it will do the job.
>
> Cheers,
> Kostas
>
>
>
> On Mon, Mar 25, 2019 at 9:49 AM Rafi Aroch wrote:
>
>> Hi Kostas,
>>
>> Thank you.
>> I'm currently testing my job against a small file, so it's finishing
>> before the checkpointing starts.
Hi,
In my job I want to read Parquet files from buckets by a date range.
For that I'm using the Hadoop Compatibility features to use
*ProtoParquetInputFormat*.
If the Parquet schema underwent changes within the processed date range (even
valid ones), the job fails with
et, for example, will not write the footer that is needed to be able
> to properly read the file.
>
> Kostas
>
>
> On Thu, Mar 21, 2019 at 8:03 PM Rafi Aroch wrote:
>
>> Hi Kostas,
>>
>> Yes I have.
>>
>> Rafi
>>
>> On Thu, Mar 21, 2019, 20:
Hi,
I'm trying to stream events in Protobuf format into a Parquet file.
I looked into both streaming-file options: BucketingSink &
StreamingFileSink.
I first tried using the newer *StreamingFileSink* with the *forBulkFormat* API.
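For reference, the bulk-format API looks roughly like this (shown with the
bundled Avro writer; the path and event class are placeholders, and
flink-parquet is assumed to be on the classpath):

import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

// Bulk-encoded Parquet sink; the bundled Parquet writers are Avro-based.
val sink = StreamingFileSink
  .forBulkFormat(
    new Path("s3://bucket/output"),
    ParquetAvroWriters.forReflectRecord(classOf[MyEvent]))
  .build()

stream.addSink(sink)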
I noticed there's currently support only for the Avro format with
t shouldn’t you be just reading committed files and
>> ignore in-progress? Maybe Kostas could add more insight to this topic.
>>
>> Piotr Nowojski
>>
>> On 20 Mar 2019, at 12:23, Rafi Aroch wrote:
>>
>> Hi,
>>
>> I'm trying to stream events in Pror
Hi Kostas,
Yes I have.
Rafi
On Thu, Mar 21, 2019, 20:47 Kostas Kloudas wrote:
> Hi Rafi,
>
> Have you enabled checkpointing for your job?
>
> Cheers,
> Kostas
>
> On Thu, Mar 21, 2019 at 5:18 PM Rafi Aroch wrote:
>
>> Hi Piotr and Kostas,
>>
>> Tha
Hi,
I'm seeing an issue when trying to set a different credentials provider for
AWS S3.
I'm setting *fs.s3a.aws.credentials.provider* in flink-conf.yaml to a
different value.
I'm using the *flink-s3-fs-hadoop* dependency and I get an exception when
job starts:
*RuntimeException: Failed to
Hi Vijay,
When using windows, you may use 'trigger' to set a custom Trigger which
would fire your *ProcessWindowFunction* accordingly.
In your case, you would probably use:
.trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(5)))
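For context, a rough sketch of where the trigger plugs into a windowed
pipeline (the stream, key, window size and window function are placeholders):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

events
  .keyBy(_.userId)
  .window(TumblingEventTimeWindows.of(Time.hours(1)))
  // fire the window function every 5 minutes of processing time while the
  // window is still open, instead of only once at the end
  .trigger(ContinuousProcessingTimeTrigger.of[TimeWindow](Time.minutes(5)))
  .process(new MyProcessWindowFunction)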
Thanks,
Rafi
On Mon, Jun 17, 2019 at 9:01 PM
Hi Debasish,
It would be a bit tedious, but in order to override the default
AvroSerializer you could specify a TypeInformation object where needed.
You would need to implement your own MyAvroTypeInfo instead of the provided
AvroTypeInfo.
For example:
env.addSource(kafkaConsumer)
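Something along these lines (a rough sketch; MyAvroTypeInfo and MyRecord are
placeholders for your own classes, and the explicit second argument list
passes the TypeInformation evidence in the Scala API):

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala.DataStream

// Pass the custom TypeInformation explicitly so the default AvroTypeInfo /
// AvroSerializer is not used for this source.
val typeInfo: TypeInformation[MyRecord] = new MyAvroTypeInfo(classOf[MyRecord])
val records: DataStream[MyRecord] = env.addSource(kafkaConsumer)(typeInfo)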
Hi Peter,
I also encountered this issue. As far as I know, it is not currently
possible to stream from files (or any bounded stream) into a
*StreamingFileSink*.
This is because files are rolled over only on checkpoints and NOT when the
stream closes. This is due to the fact that at the function
Hi,
If it helps, we're using Lightbend's Config for that:
* https://github.com/lightbend/config
*
https://www.stubbornjava.com/posts/environment-aware-configuration-with-typesafe-config
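For example, a minimal sketch (the config key is a placeholder; the
environment is selected with -Dconfig.resource=<env>.conf or -Dconfig.file=...):

import com.typesafe.config.{Config, ConfigFactory}

// Loads application.conf from the classpath, with system-property overrides.
val config: Config = ConfigFactory.load()
val kafkaBrokers: String = config.getString("kafka.brokers")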
Thanks,
Rafi
On Wed, Apr 17, 2019 at 7:07 AM Andy Hoang wrote:
> I have 3 different files for env: test,
Hi,
S3 would delete files only if you have 'lifecycle rules' [1] defined on the
bucket. Could that be the case? If so, make sure to disable / extend the
object expiration period.
[1]
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
Thanks,
Rafi
On Sat, Aug 17, 2019
Hi Ravi,
Consider moving to the RocksDB state backend, where you can enable incremental
checkpointing. This will make your checkpoint size stay pretty much
constant even as your state grows larger.
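A rough sketch of enabling it (the checkpoint URI is a placeholder):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend

// The second constructor argument enables incremental checkpoints.
env.setStateBackend(new RocksDBStateBackend("s3://my-bucket/checkpoints", true))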
Hi Debasish,
Have you taken a look at the AsyncIO API for running async operations? I
think this is the preferred way of doing it. [1]
So it would look something like this:
class AsyncDatabaseRequest extends AsyncFunction[String, (String, String)] {
/** The database specific client that can
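   issue concurrent requests with callbacks. (Snippet truncated above; what
   follows is a rough completion sketch, not the exact docs example --
   DatabaseClient, host, port and credentials are placeholders, and the needed
   imports are scala.concurrent._, scala.util.{Failure, Success} and
   org.apache.flink.streaming.api.scala.async._) */
  lazy val client: DatabaseClient = new DatabaseClient(host, port, credentials)

  // execution context for the future callbacks below
  implicit lazy val executor: ExecutionContext = ExecutionContext.global

  override def asyncInvoke(input: String, resultFuture: ResultFuture[(String, String)]): Unit = {
    val result: Future[String] = client.query(input)
    result.onComplete {
      case Success(value) => resultFuture.complete(Iterable((input, value)))
      case Failure(error) => resultFuture.completeExceptionally(error)
    }
  }
}

// Wire it into the pipeline (timeout and capacity values are just examples):
// val enriched = AsyncDataStream.unorderedWait(stream, new AsyncDatabaseRequest, 1000, TimeUnit.MILLISECONDS, 100)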
Hi,
I have a Flink 1.9.0 cluster deployed on AWS ECS. The cluster is running, but
metrics are not showing in the UI.
For other services (RPC / Data) it works because the connection is
initiated from the TM to the JM through a load-balancer. But it does not
work for metrics where JM tries to initiate
Hi Pooja,
Here's an implementation from Jamie Grier for de-duplication using an
in-memory cache with some expiration time:
https://github.com/jgrier/FilteringExample/blob/master/src/main/java/com/dataartisans/DedupeFilteringJob.java
If for your use-case you can limit the time period where
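If it helps, here's a simplified sketch of the same idea using keyed state
with a TTL instead of an in-memory cache (assuming Flink >= 1.6; the key type
and the 1-hour TTL are placeholders):

import org.apache.flink.api.common.functions.RichFilterFunction
import org.apache.flink.api.common.state.{StateTtlConfig, ValueState, ValueStateDescriptor}
import org.apache.flink.api.common.time.Time
import org.apache.flink.configuration.Configuration

// Keyed de-duplication: keep an event only the first time its key is seen.
// Apply after keyBy(...) on the de-duplication key.
class DedupeFilter extends RichFilterFunction[String] {

  @transient private var seen: ValueState[java.lang.Boolean] = _

  override def open(parameters: Configuration): Unit = {
    val ttl = StateTtlConfig.newBuilder(Time.hours(1)).build()
    val descriptor = new ValueStateDescriptor("seen", classOf[java.lang.Boolean])
    descriptor.enableTimeToLive(ttl) // old keys expire, so state stays bounded
    seen = getRuntimeContext.getState(descriptor)
  }

  override def filter(eventId: String): Boolean = {
    if (seen.value() == null) {
      seen.update(true)
      true
    } else {
      false
    }
  }
}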
Hi Chandu,
Maybe you can use a custom trigger:
.trigger(ContinuousEventTimeTrigger.of(Time.minutes(15)))
This would repeatedly fire your aggregation at the given interval.
Thanks,
Rafi
On Thu, Dec 5, 2019 at 1:09 PM Andrey Zagrebin wrote:
> Hi Chandu,
>
> I am not sure whether
Hi Ori,
Make sure that latency tracking is enabled; it's disabled by default. Also
check that the scope is set properly.
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#latency-tracking
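For example, a quick way to turn it on programmatically (the 1000 ms interval
is just an example; the equivalent config key is metrics.latency.interval):

// Emit latency markers every second.
env.getConfig.setLatencyTrackingInterval(1000)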
Hi,
This happens because StreamingFileSink does not support a finite input
stream.
In the docs it's mentioned under "Important Considerations":
[image: screenshot of the 'Important Considerations' note from the StreamingFileSink documentation]
This behaviour often surprises users...
There's a FLIP
I think one "edge" case which is not handled would be that the first event
(by event-time) arrives late, then a wrong "started-window" would be
reported.
Rafi
On Thu, Feb 20, 2020 at 12:36 PM Manas Kale wrote:
> Is the reason ValueState cannot be used because session windows are always
>
e
> data in partitioned.
>
> Thanks,
> Anuj
>
>
> On Thu, Oct 25, 2018 at 5:35 PM Andrey Zagrebin
> wrote:
>
>> Hi Rafi,
>>
>> At the moment I do not see any support of Parquet in DataSet API
>> except HadoopOutputFormat, mentioned in stack overflow
Hi,
Take a look here: https://github.com/eastcirclek/flink-service-discovery
I used it successfully quite a while ago, so things might have changed
since.
Thanks,
Rafi
On Wed, Dec 25, 2019, 05:54 vino yang wrote:
> Hi Mans,
>
> IMO, the mechanism of metrics reporter does not depend on any
Hi Eyal,
Sounds trivial, but can you verify that the file actually exists in
/opt/flink/conf/log4j-console.properties? Also, verify that the user
running the process has read permissions to that file.
You said you use Flink in YARN mode, but in the example above you run it
inside a Docker image, so
Hi Peter,
I've dealt with the cross-account delegation issues in the past (with no
relation to Flink) and got into the same ownership problems (accounts can't
access data, account A 'loses' access to its own data).
My 2-cents are that:
- The account that produces the data (A) should be the
Hi Ori,
In your code, are you using the process() API?
.process(new MyProcessWindowFunction());
If you do, the ProcessWindowFunction receives as its argument an Iterable
with ALL the elements collected along the session. This will make the state per
key potentially huge (like you're experiencing).
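If you don't actually need every element, a common alternative is to
pre-aggregate incrementally so only a small accumulator is kept in state. A
rough sketch (Event, sessionId, the window and the window function are
placeholders; the window function then receives the aggregate instead of all
the elements):

import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Keeps only a running count per session instead of every element.
class CountAggregate extends AggregateFunction[Event, Long, Long] {
  override def createAccumulator(): Long = 0L
  override def add(value: Event, accumulator: Long): Long = accumulator + 1
  override def getResult(accumulator: Long): Long = accumulator
  override def merge(a: Long, b: Long): Long = a + b
}

events
  .keyBy(_.sessionId)
  .window(EventTimeSessionWindows.withGap(Time.minutes(30)))
  // the window function now receives the single aggregated value per window
  .aggregate(new CountAggregate, new MyProcessWindowFunction)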
Hi Ori,
I guess you consume from Kafka starting at the earliest offset, so you consume
historical data and Flink is catching up.
Regarding: *My event-time timestamps also do not have big gaps*
Just to verify, if you do keyBy sessionId, do you check the gaps of
events from the same session?
Rafi
On
Hi Ori,
Once a session ends, its state should get purged. You should take care
that a session does end.
For example, if you wait for a 'session-end' event, limit it with some time
threshold. If the session is defined with an inactivity gap and your client keeps
sending events indefinitely, you could limit the session
Hi Arvid,
Would it be possible to implement a BucketAssigner that for example loads
the configuration periodically from an external source and according to the
event type decides on a different sub-folder?
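Something along these lines is what I have in mind (a rough sketch; Event,
eventType and loadRoutes() are placeholders):

import org.apache.flink.core.io.SimpleVersionedSerializer
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer

// Routes each event to a sub-folder based on its type; the mapping could be
// refreshed periodically (e.g. by a background thread) from an external source.
class EventTypeBucketAssigner extends BucketAssigner[Event, String] {

  @volatile private var routes: Map[String, String] = loadRoutes()

  override def getBucketId(element: Event, context: BucketAssigner.Context): String =
    routes.getOrElse(element.eventType, "unknown")

  override def getSerializer: SimpleVersionedSerializer[String] =
    SimpleVersionedStringSerializer.INSTANCE

  private def loadRoutes(): Map[String, String] =
    Map.empty // placeholder: fetch the eventType -> sub-folder mapping here
}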
Thanks,
Rafi
On Wed, Jun 24, 2020 at 10:53 PM Arvid Heise wrote:
> Hi Anuj,
>
> There
Hi Sidney,
Have a look at implementing a BucketAssigner for StreamingFileSink:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#bucket-assignment
Rafi
On Sat, Dec 26, 2020 at 11:48 PM Sidney Feiner
wrote:
> Hey,
> I would like to create a dynamic
Hi,
Take a look at the new 1.14 feature called Hybrid Source:
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/
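A rough sketch of the idea (boundedSource and kafkaSource are placeholders for
sources you build separately):

import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.connector.base.source.hybrid.HybridSource

// Reads the bounded (historical) source to completion, then switches to Kafka.
val hybridSource = HybridSource.builder(boundedSource)
  .addSource(kafkaSource)
  .build()

val stream = env.fromSource(
  hybridSource,
  WatermarkStrategy.noWatermarks[String](),
  "hybrid-source")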
Rafi
On Tue, Oct 26, 2021 at 7:46 PM Qihua Yang wrote:
> Hi,
>
> My flink app has two data sources. One is from a Kafka topic, one