Hi,
Sharing new findings.
The issue I mentioned above seems to happen only with the latest
version of EMR (emr-5.20.0, Hadoop: Amazon 2.8.5, Flink: 1.6.2), and it is
reproducible with our setup every time. I have verified the same setup
working and scaling without any issues on an older
Oh, sorry... Logically, since the ContinuousProcessingTimeTrigger never
PURGES but only FIRES, what I said is semantically true: the window
contents are never cleared.
What I missed is that in this case, since you're using a function that
incrementally reduces on the fly rather than processing all
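For context, a minimal sketch of the pairing being described. The actual job is not shown in this thread, so the record type, key, window size and fire interval below are assumptions, not the poster's code:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

public class ContinuousFireSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical keyed count stream; the real schema is not in the thread.
        DataStream<Tuple2<String, Long>> events =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L));

        events
                .keyBy(0)
                .timeWindow(Time.hours(1))
                // FIREs every minute but never PURGEs, so firing does not clear window state.
                .trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(1)))
                // The incremental ReduceFunction means the window holds only the current
                // partial aggregate per key, not every raw element.
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
                .print();

        env.execute("continuous fire sketch");
    }
}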
Hi Piotr,
Ideally on DEBUG level.
Best,
Gary
On Fri, Jan 18, 2019 at 3:41 PM Piotr Szczepanek wrote:
> Hey Gary,
> thanks for your reply.
> Previously we were using Flink version 1.5.2.
> With both versions we're using Flink deployed on YARN.
>
> Regarding logs, would you like to have log
Thanks. That makes sense :)
From: Kostas Kloudas
Date: Friday, 18 January 2019 at 8:25 PM
To: Harshith Kumar Bolar
Cc: "user@flink.apache.org"
Subject: [External] Re: Is there a way to find the age of an element in a
Global window?
Hi Harshith,
The evictor has 2 methods:
void
Hi Henry,
Can you share your pom.xml and the full stacktrace with us? It is expected
behavior that org.elasticsearch.client.RestClientBuilder is not shaded. That
class comes from the Elasticsearch Java client, and we only shade its
transitive dependencies. Could it be that you have a dependency
So, do you mean to have your application running in real time and use the
same instance of it to also process historical data at the same time?
If that's the case, then I would advise against doing it that way. What
I would recommend instead is to process that historical data with another
Hi Harshith,
The evictor has 2 methods:
void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window,
EvictorContext evictorContext);
void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window,
EvictorContext evictorContext);
In the iterables, you have access to the elements and their timestamps, and
the
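To make that concrete, here is a rough sketch of a time-based evictor built on those two methods. The class name and the 30-minute processing-time cutoff are my own example choices, not something prescribed by Flink:

import java.util.Iterator;
import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue;

public class StaleElementEvictor<T> implements Evictor<T, GlobalWindow> {

    // Assumed cutoff: elements older than 30 minutes are dropped.
    private static final long MAX_AGE_MS = 30 * 60 * 1000L;

    @Override
    public void evictBefore(Iterable<TimestampedValue<T>> elements, int size,
                            GlobalWindow window, EvictorContext ctx) {
        long cutoff = ctx.getCurrentProcessingTime() - MAX_AGE_MS;
        for (Iterator<TimestampedValue<T>> it = elements.iterator(); it.hasNext(); ) {
            if (it.next().getTimestamp() <= cutoff) {
                it.remove();  // removing from the iterator evicts the element
            }
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<T>> elements, int size,
                           GlobalWindow window, EvictorContext ctx) {
        // nothing to do after the window function has run
    }
}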
Hey Gary,
thanks for your reply.
Previously we were using Flink version 1.5.2.
With both versions we're using Flink deployed on YARN.
Regarding logs, would you like to have log entries with DEBUG enabled, or would
INFO be enough?
Thanks,
Piotr
On Fri, Jan 18, 2019 at 3:14 PM Gary Yao wrote:
> Hi
I'm not sure if this is required. It's quite convenient to be able to just
grab a single tarball and have everything you need.
I just did this for the latest binary release: it was 273 MB and took
about 25 seconds to download. Of course, I know connection speeds vary
quite a bit, but I
Yeah, that's what I have so far in my solutions pocket. Another problem is
spawning a huge application just to process a hundred entries... :(
If you want the whole picture: there are a number of devices with an internal
acknowledgment system to guarantee ordering. Nevertheless, sometimes the
network
Thanks a lot for clarifying :-)
- Harshith
From: Fabian Hueske
Date: Friday, 18 January 2019 at 4:31 PM
To: Harshith Kumar Bolar
Cc: "user@flink.apache.org"
Subject: [External] Re: Should the entire cluster be restarted if a single Task
Manager crashes?
Hi Harshith,
No, you don't need to
Hi Chesnay,
Thank you for the proposal.
I think this is a good idea.
We follow a similar approach already for Hadoop dependencies and connectors
(although in application space).
+1
Fabian
On Fri, Jan 18, 2019 at 10:59 AM Chesnay Schepler <ches...@apache.org> wrote:
> Hello,
>
> the
Hi Piotr,
What was the version you were using before 1.7.1?
How do you deploy your cluster, e.g., YARN, standalone?
Can you attach full TM and JM logs?
Best,
Gary
On Fri, Jan 18, 2019 at 3:05 PM Piotr Szczepanek wrote:
> Hello,
> we have a scenario with running data processing jobs that
Hi all,
We're running a standalone Flink cluster with 2 Job Managers and 3 Task
Managers. Whenever a TM crashes, we simply restart that particular TM and
proceed with the processing.
But reading the comments on
Hi Harshith,
No, you don't need to restart the whole cluster. Flink only needs enough
processing slots to recover the job.
If you have a standby TM, the job should restart immediately (according to
its restart policy). Otherwise, you have to start a new TM to provide more
slots. Once the slots
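For reference, the restart policy mentioned above can be set per job. A minimal sketch, where the attempt count and delay are just example values:

import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartPolicySketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Retry the job up to 3 times with a 10 second pause between attempts;
        // the actual restart still waits until enough free slots are available.
        env.setRestartStrategy(
                RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));
        // ... build and execute the job as usual ...
    }
}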
Hello,
the binary distribution that we release currently contains quite a lot of
optional components, including various filesystems, metric reporters and
libraries. Most users will only use a fraction of these, so they pretty
much only increase the size of flink-dist.
With Flink growing
Suggestions, please.
I'm thinking of these options:
1. Initializing the Spring application context in the 'open' method (see the
sketch below). Instead of loading the entire context, move service-related
beans into one or more packages and scan only those packages. Requires code
refactoring.
2. Direct database query - direct query cannot
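A rough sketch of option 1. The package name, the service interface and the enrichment logic are hypothetical placeholders, just to show the shape of initializing a narrowed Spring context inside open():

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class EnrichmentMapper extends RichMapFunction<String, String> {

    // Hypothetical service bean living in the scanned package.
    public interface LookupService {
        String enrich(String value);
    }

    private transient AnnotationConfigApplicationContext springContext;
    private transient LookupService lookupService;

    @Override
    public void open(Configuration parameters) {
        // Scan only the package(s) holding the beans this operator needs,
        // instead of bootstrapping the whole application context.
        springContext = new AnnotationConfigApplicationContext("com.example.lookup");
        lookupService = springContext.getBean(LookupService.class);
    }

    @Override
    public String map(String value) {
        return lookupService.enrich(value);
    }

    @Override
    public void close() {
        if (springContext != null) {
            springContext.close();
        }
    }
}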
Hello,
we have a scenario with running data processing jobs that generate export
files on demand. Our first approach was using the ClusterClient, but recently
we switched to the REST API for job submission. In the meantime we upgraded to
Flink 1.7.1, and that started to cause problems.
Some of our jobs
The output is a bunch of files in Parquet format. The thing reading them
would be Presto, so I can't really tell it to ignore some rows but not
others. Not to mention that the files would keep piling up, making SQL
queries super slow.
On Fri, Jan 18, 2019, 10:01 AM Jamie Grier
Sorry my earlier comment should read: "It would just read all the files in
order and NOT worry about which data rows are in which files"
On Fri, Jan 18, 2019 at 10:00 AM Jamie Grier wrote:
> Hmm.. I would have to look into the code for the StreamingFileSink more
> closely to understand the
Hmm... I would have to look into the code for the StreamingFileSink more
closely to understand the concern, but typically you should not be concerned
at all with *when* checkpoints happen. They are meant to be a completely
asynchronous background process that has absolutely no bearing on
Well, the problem is that, conceptually, the way I'm trying to approach
this is OK. But in practice, it has some edge cases.
So back to my original premise: if both the trigger and a checkpoint happen
around the same time, there is a chance that the streaming file sink rolls
the bucket BEFORE it
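For what it's worth, with a bulk format like Parquet the StreamingFileSink rolls part files on checkpoints. A minimal sketch; the record type and output path below are made up, not taken from the thread:

import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class ParquetSinkSketch {

    // Hypothetical record type standing in for the real Parquet schema.
    public static class OutputRecord {
        public String orderId;
        public long eventTime;
    }

    public static StreamingFileSink<OutputRecord> buildSink() {
        // With bulk formats such as Parquet, the sink uses the on-checkpoint
        // rolling policy: part files are finalized when a checkpoint completes,
        // not at arbitrary window/trigger times.
        return StreamingFileSink
                .forBulkFormat(
                        new Path("s3://my-bucket/output"),  // path is made up
                        ParquetAvroWriters.forReflectRecord(OutputRecord.class))
                .build();
    }
}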
Hi,
I have a requirement and need to understand if the same can be achieved with a
Flink retract stream. Let's say we have a stream with 4 attributes: userId,
orderId, status, event_time, where orderId is unique and hence any change for
the same orderId updates the previous value, as below:
*Changelog* *Event Stream*
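A minimal sketch of what a retract stream looks like from the Java Table API side. The schema is taken from the question above, but the sample rows and the query are only an illustration of add/retract messages; the query does not by itself deduplicate by orderId:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractStreamSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // userId, orderId, status, event_time - two sample events for the same order.
        DataStream<Tuple4<Long, String, String, Long>> orders = env.fromElements(
                Tuple4.of(1L, "o1", "PENDING", 1000L),
                Tuple4.of(1L, "o1", "SUCCESS", 2000L));
        tEnv.registerDataStream("orders", orders, "userId, orderId, status, event_time");

        // A non-windowed aggregation yields an updating table: when a result row
        // changes, the old row is emitted with flag=false (retract) and the new
        // row with flag=true (add).
        Table perStatus = tEnv.sqlQuery(
                "SELECT status, COUNT(orderId) AS cnt FROM orders GROUP BY status");
        DataStream<Tuple2<Boolean, Row>> retracts = tEnv.toRetractStream(perStatus, Row.class);
        retracts.print();

        env.execute("retract stream sketch");
    }
}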
Hi all,
I'm using Global Windows for my application with a custom trigger and custom
evictor based on some conditions. Now, I also want to evict those elements from
the window that have stayed there for too long, let's say 30 mins. How would I
go about doing this? Is there a utility that Flink