This works; I had some issues modifying my scripts, but it’s OK, and I could
confirm via JMX that env.java.opts.jobmanager took priority over the “normal”
heap size (calculated from the cutoff).
Thanks!
From: Yang Wang
Sent: Wednesday, November 20, 2019 03:52
To: Gwenhael Pasquiers
Cc: user
I see, good idea, I’ll try that and tell you the result.
Thanks,
From: Yang Wang
Sent: Wednesday, November 20, 2019 03:52
To: Gwenhael Pasquiers
Cc: user@flink.apache.org
Subject: Re: YARN : Different cutoff for job and task managers
Hi Gwenhael,
I'm afraid that we could not set different cut
Hello,
In a setup where we allocate most of the memory to RocksDB (off-heap) we have
a large cutoff.
Our issue is that the same cutoff applies to both task and job managers: the
heap size of the job manager then becomes too low.
Is there a way to apply different cutoffs to job
From what I understood, in your case you might solve your issue by using
specific key classes instead of Strings.
Maybe you could create key classes that have a user-specified hashcode that
could take the previous key's hashcode as a value. That way your data shouldn't
be sent over the wire
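A minimal sketch of the key-class idea described above; the class and field names are made up, and whether keyBy really avoids moving data across the network depends on Flink's internal hash/key-group assignment, so this would need to be verified:
    // Hypothetical wrapper key whose hashCode is fully controlled by the caller,
    // e.g. reusing the previous key's hashCode as suggested above.
    public class PinnedKey implements java.io.Serializable {
        private final String originalKey;
        private final int pinnedHash;

        public PinnedKey(String originalKey, int pinnedHash) {
            this.originalKey = originalKey;
            this.pinnedHash = pinnedHash;
        }

        @Override
        public int hashCode() {
            return pinnedHash;   // reuse the previous key's hashCode
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof PinnedKey
                    && ((PinnedKey) o).originalKey.equals(originalKey);
        }
    }

    // usage sketch:
    // stream.keyBy(new KeySelector<MyEvent, PinnedKey>() {
    //     public PinnedKey getKey(MyEvent e) {
    //         return new PinnedKey(e.getKey(), e.getKey().hashCode());
    //     }
    // })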
I think I finally found a way to "simulate" a Timer thanks to the
processWatermark function of the AbstractStreamOperator.
Sorry for the monologue.
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: Friday, November 10, 2017 16:02
To: 'user@flink.apache.
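A rough sketch of the processWatermark approach mentioned above; MyEvent and MyAggregate are hypothetical types, and the exact operator API differs between Flink versions:
    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.api.watermark.Watermark;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    public class PreFoldOperator
            extends AbstractStreamOperator<MyAggregate>
            implements OneInputStreamOperator<MyEvent, MyAggregate> {

        private transient MyAggregate current;          // in-memory, non-keyed buffer
        private long nextFlushTime = Long.MIN_VALUE;

        @Override
        public void processElement(StreamRecord<MyEvent> element) throws Exception {
            if (current == null) {
                current = new MyAggregate();
            }
            current.add(element.getValue());            // hypothetical fold step
        }

        @Override
        public void processWatermark(Watermark mark) throws Exception {
            // "simulated timer": flush once the watermark passes the next flush time
            if (current != null && mark.getTimestamp() >= nextFlushTime) {
                output.collect(new StreamRecord<>(current, mark.getTimestamp()));
                current = null;
                nextFlushTime = mark.getTimestamp() + 15 * 60 * 1000L;   // 15 minutes
            }
            super.processWatermark(mark);               // forward the watermark downstream
        }
    }

    // usage sketch:
    // stream.transform("pre-fold", TypeInformation.of(MyAggregate.class), new PreFoldOperator())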
Hello,
Finally, even after creating my operator, I still get the error: "Timers can
only be used on keyed operators".
Isn't there any way around this? A way to "key" my stream without shuffling
the data?
From: Gwenhael Pasquiers
Sent: Friday, November 10, 2017 11:42
To
Maybe you don't need to bother with that question.
I'm currently discovering AbstractStreamOperator, OneInputStreamOperator and
Triggerable.
That should do it :-)
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: Thursday, November 9, 2017 18:00
To: 'user@flink.apache.org
Hello,
(Flink 1.2.1)
For performance reasons I'm trying to reduce the volume of data of my stream
as soon as possible by windowing/folding it for 15 minutes before continuing to
the rest of the chain that contains keyBys and windows that will transfer data
everywhere.
Because of the huge
Hi,
Well yes, I could probably make it work with a constant number of operators
(and consequently buffers) by developing specific input and output classes, and
that way I'd have a workaround for that buffers issue.
The size of my job is input-dependent mostly because my code creates one full
Hi,
Is it possible to use checkpointing to restore the state of an app after a
restart on YARN?
From what I've seen it looks like checkpointing only works within a Flink
cluster's lifetime. However, the YARN mode has one cluster per app, and (unless
the app crashes and is automatically
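For what it's worth, one way to survive a full restart of a per-job YARN cluster (with Flink 1.2+) is to keep checkpoints on durable storage and retain them on cancellation, then resubmit the job from the retained checkpoint/savepoint path; the HDFS path below is a placeholder:
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // checkpoints must live outside the per-job cluster, e.g. on HDFS
            env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
            env.enableCheckpointing(60_000);

            // keep the last checkpoint when the job / YARN cluster is cancelled,
            // so a new cluster can be started from it later
            env.getCheckpointConfig().enableExternalizedCheckpoints(
                    ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

            // ... build the topology and call env.execute() ...
        }
    }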
Hello,
Sorry to ask you again, but no idea on this?
-Original Message-
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: Monday, August 21, 2017 12:04
To: Nico Kruber <n...@data-artisans.com>
Cc: Ufuk Celebi <u...@apache.org>; user@flink.apache.org
Subjec
(32 and 4) in order to
limit the number of files on HDFS.
-Original Message-
From: Nico Kruber [mailto:n...@data-artisans.com]
Sent: Friday, August 18, 2017 14:58
To: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
Cc: Ufuk Celebi <u...@apache.org>; user@flink.apache.org
Sub
4
To: Ufuk Celebi <u...@apache.org>
Cc: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>;
user@flink.apache.org; Nico Kruber <n...@data-artisans.com>
Subject: Re: Great number of jobs and numberOfBuffers
PS: Also pulling in Nico (CC'd) who is working on the network stack.
On Thu, Au
Hello,
We're hitting a limit with the numberOfBuffers.
In a quite complex job we do a lot of operations, with a lot of operators, on a
lot of folders (datehours).
In order to split the job into smaller "batches" (to limit the necessary
"numberOfBuffers") I've done a loop over the batches
possibly up-to-date watermark.
-Original Message-
From: Aljoscha Krettek [mailto:aljos...@apache.org]
Sent: Friday, August 4, 2017 12:22
To: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
Cc: Nico Kruber <n...@data-artisans.com>; user@flink.apache.org
Subject: Re: Event-ti
the watermark?
B.R.
-Original Message-
From: Nico Kruber [mailto:n...@data-artisans.com]
Sent: Thursday, August 3, 2017 16:30
To: user@flink.apache.org
Cc: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
Subject: Re: Event-time and first watermark
Hi Gwenhael,
"A Watermark(t) decla
Hi,
From my tests it seems that the initial watermark value is Long.MIN_VALUE even
though my first data passed through the timestamp extractor before arriving
into my ProcessFunction. It looks like the watermark "lags" behind the data by
one message.
Is there a way to have a watermark
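One way to keep the watermark as close to the data as possible is a punctuated assigner that emits a watermark after every element; MyEvent and its getTimestamp() accessor are hypothetical, and the element carrying a timestamp will still see the previous watermark, but following elements lag at most one message behind:
    import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    public class PerEventWatermarks implements AssignerWithPunctuatedWatermarks<MyEvent> {

        @Override
        public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
            return element.getTimestamp();
        }

        @Override
        public Watermark checkAndGetNextWatermark(MyEvent lastElement, long extractedTimestamp) {
            // emit a watermark right after every element
            return new Watermark(extractedTimestamp - 1);
        }
    }

    // usage sketch: stream.assignTimestampsAndWatermarks(new PerEventWatermarks())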
Thanks!
I didn’t know of this function and indeed it seems to match my needs better
than Windows. And I’ll be able to clear my state once it’s empty (and re-create
it when necessary).
B.R.
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Monday, June 26, 2017 12:13
To: Gwenhael Pasquiers
Hello,
This may be premature optimization for memory usage, but here is my question:
I have to do an app that will have to monitor sessions of (millions of) users.
I don’t know when a session starts or ends, nor do I have a reasonable maximum
duration.
I want to have a maximum duration (timeout) of
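A minimal sketch of the ProcessFunction approach discussed above, with per-key state and a processing-time timeout; MyEvent, MySession and their methods are hypothetical, and it assumes a keyed stream (stream.keyBy(...).process(new SessionMonitor())):
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class SessionMonitor extends ProcessFunction<MyEvent, MySession> {

        private static final long SESSION_TIMEOUT_MS = 30 * 60 * 1000L;

        private transient ValueState<MySession> session;

        @Override
        public void open(Configuration parameters) {
            session = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("session", MySession.class));
        }

        @Override
        public void processElement(MyEvent event, Context ctx, Collector<MySession> out) throws Exception {
            MySession current = session.value();
            if (current == null) {
                current = new MySession();
            }
            current.add(event);
            session.update(current);
            // (re-)arm a timeout; the state is cleared in onTimer once the session expires
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + SESSION_TIMEOUT_MS);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<MySession> out) throws Exception {
            MySession current = session.value();
            // ignore stale timers: only close the session if it has really been idle
            if (current != null && current.lastActivity() + SESSION_TIMEOUT_MS <= timestamp) {
                out.collect(current);
                session.clear();   // free the memory for this key
            }
        }
    }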
Hello,
Up to now we’ve been using kafka with jaas (plain login/password) the following
way:
- yarnship the jaas file
- add the jaas file name into “flink-conf.yaml” using property
“env.java.opts”
How to support multiple secured kafka 0.10 consumers and producers (with
ed, right?
Best,
Aljoscha
On 18. Apr 2017, at 16:19, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
Thanks for your answer.
Does that mean that Flink does not rely on the offset written to ZooKeeper
anymore, but relies on the sna
Hi,
Before doing it myself I thought it would be better to ask.
We need to consume from Kafka 0.8 and produce to Kafka 0.10 in a Flink app.
I guess there will be class and package name conflicts for a lot of
dependencies of both connectors.
The obvious solution is to make a “shaded” version
variables is kept in-memory on each node, it
should not become too large. For simpler things like scalar values you can
simply make parameters part of the closure of a function, or use the
withParameters(...) method to pass in a configuration.
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui
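A small example of the withParameters(...) route mentioned above, where the Configuration handed to the operator shows up in the rich function's open() method; the class name and the "threshold" key are made up:
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class WithParametersExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<Integer> data = env.fromElements(1, 10, 100);

            Configuration params = new Configuration();
            params.setInteger("threshold", 42);

            data.map(new RichMapFunction<Integer, Integer>() {
                    private int threshold;

                    @Override
                    public void open(Configuration parameters) {
                        // the configuration passed via withParameters(...) arrives here
                        threshold = parameters.getInteger("threshold", 0);
                    }

                    @Override
                    public Integer map(Integer value) {
                        return value > threshold ? value : 0;
                    }
                })
                .withParameters(params)
                .print();
        }
    }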
broadcasting it rather than sending its elements 1 by 1.
I’ll try to use it, I’ll take anything that will make my code cleaner!
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: Friday, March 3, 2017 17:55
To: user@flink.apache.org
Subject: RE: Cross operation on two huge datasets
d store the output as Flink State – that would be more fault tolerant
but storing it as cache in JVM should work too.
Is this a batch job or streaming?
By the way, I am a newbie to Flink, still only learning – so take my suggestions
with caution ☺
Thanks
Ankit
From: Gwenhael Pasquiers
<gwenh
for now even if the static thingie is a bit dirty.
However I’m surprised that reading 20 MB of Parquet becomes 21 GB of “bytes sent”
by the Flink reader.
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: Thursday, March 2, 2017 16:28
To: user@flink.apache.org
Subject: RE: Cross
with the partitioned point data
set.
Cheers,
Till
On Thu, Mar 2, 2017 at 3:29 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com>
wrote:
The best for me would be t
    getRuntimeContext().getBroadcastVariable("broadcast");
}
@Override
public void flatMap(Integer integer, Collector<Integer> collector) throws Exception {
}
}).withBroadcastSet(broadcastSet, "broadcast");
Cheers,
Till
On Thu, Mar 2, 2017 at 12:12 PM, Gwenhael Pasquiers
<gw
it is usually better to apply a coarse-grained spatial
partitioning and do a key-based join on the partitions. Within each partition
you'd perform a cross.
Best, Fabian
2017-02-21 18:34 GMT+01:00 Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com>
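As a rough illustration of the coarse-grained partitioning Fabian describes (imports and job setup omitted; Point and the cellOf(...) helper are hypothetical, and cellOf should map nearby points of both data sets to the same cell id):
    // tag every point of both data sets with a coarse grid-cell id
    DataSet<Tuple2<Long, Point>> taggedA = pointsA.map(new MapFunction<Point, Tuple2<Long, Point>>() {
        @Override
        public Tuple2<Long, Point> map(Point p) {
            return new Tuple2<>(cellOf(p), p);
        }
    });
    DataSet<Tuple2<Long, Point>> taggedB = pointsB.map(new MapFunction<Point, Tuple2<Long, Point>>() {
        @Override
        public Tuple2<Long, Point> map(Point p) {
            return new Tuple2<>(cellOf(p), p);
        }
    });

    // key-based join on the cell id: only pairs that share a cell are produced,
    // which replaces the full cross of the two huge data sets
    DataSet<Tuple3<Long, Point, Point>> candidatePairs = taggedA
        .join(taggedB)
        .where(0)
        .equalTo(0)
        .with(new JoinFunction<Tuple2<Long, Point>, Tuple2<Long, Point>, Tuple3<Long, Point, Point>>() {
            @Override
            public Tuple3<Long, Point, Point> join(Tuple2<Long, Point> a, Tuple2<Long, Point> b) {
                return new Tuple3<>(a.f0, a.f1, b.f1);
            }
        });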
Within each partition
you'd perform a cross.
Best, Fabian
2017-02-21 18:34 GMT+01:00 Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com>:
Hi,
I need (or at least I think I do) to do a cross operation between two huge
datasets. One dataset is a l
are on Flink 1.1.x, you will need to implement a custom operator which
is a much more low-level interface.
Best, Fabian
2017-01-11 17:16 GMT+01:00 Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com>:
Hi,
Sorry if this was already asked.
For
Hi,
Sorry if this was already asked.
For performance reasons (streaming as well as batch) I'd like to "group"
messages (let's say by batches of 1000) before sending them to my sink (kafka,
but mainly ES) so that I have a smaller overhead.
I've seen the "countWindow" operation but if I'm not
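A simple sketch of batching inside the sink itself instead of using countWindow; bulkInsert is a placeholder for the actual ES/Kafka bulk call, records buffered here are not part of Flink's checkpointed state (so they can be lost on failure), and the Elasticsearch connector also has its own bulk-flush settings worth checking first:
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class BatchingSink extends RichSinkFunction<String> {

        private final int batchSize;
        private transient List<String> buffer;

        public BatchingSink(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public void open(Configuration parameters) {
            buffer = new ArrayList<>(batchSize);
        }

        @Override
        public void invoke(String value) throws Exception {
            buffer.add(value);
            if (buffer.size() >= batchSize) {
                bulkInsert(buffer);   // hypothetical helper doing the actual bulk write
                buffer.clear();
            }
        }

        @Override
        public void close() throws Exception {
            if (buffer != null && !buffer.isEmpty()) {
                bulkInsert(buffer);   // flush what is left when the job stops
            }
        }

        private void bulkInsert(List<String> batch) {
            // send `batch` to Elasticsearch / Kafka in one request
        }
    }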
On Mon, Jan 2, 2017 at 1:34 AM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
Hi, and best wishes for the year to come ☺
I’d like to be able to programmatically get the (live) values of accumulators
in order to send them us
Hi, and best wishes for the year to come :)
I'd like to be able to programmatically get the (live) values of accumulators
in order to send them using a statsd (or another) client in the JobManager of a
yarn-deployed application. I say live because I'd like to use that in streaming
(24/7)
back?
Thanks a lot,
Fabian
2016-12-20 16:37 GMT+01:00 Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com>:
Thanks, it is working properly now.
NB: Had to delete the folder in code because Hadoop’s OutputFormats will only
overwrite file by file, n
Hi,
Sorry if it's already been asked, but is there a built-in way for Flink to
generate a _SUCCESS file in the folders it's been writing into (using the write
method with OutputFormat)?
We are replacing a spark job that was generating those files (and further
operations rely on it).
Best
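Lacking a built-in option, one workaround is to create the marker yourself after env.execute() returns, e.g. with the Hadoop FileSystem API; the class and the path handling below are only a sketch:
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class SuccessMarker {
        // drop an empty _SUCCESS marker into the output directory,
        // mimicking what the Hadoop/Spark output committers do
        public static void write(String outputDir) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            fs.create(new Path(outputDir, "_SUCCESS"), true).close();
        }
    }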
-path-directory
Regarding the phases: the best way to exchange data between batch jobs is via
files. You can then execute two programs one after the other, the first one
produces the files, which the second job uses as input.
– Ufuk
On Mon, Mar 21, 2016 at 12:14 PM, Gwenhael Pasquiers
Hello,
Sorry if this has already been asked or is already in the docs, I did not find
the answer:
Is there a way to read a given set of folders in Flink batch? Let's say we
have one folder per hour of data, written by Flume, and we'd like to read only
the N last hours (or any other pattern
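One approach that works with the plain DataSet API is to build one input per folder and union them; the folder naming and the hourFolder(...) helper below are hypothetical:
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    DataSet<String> lastHours = null;
    for (int i = 0; i < 6; i++) {                              // e.g. the 6 last hours
        String path = "hdfs:///flume/data/" + hourFolder(i);   // one folder per hour
        DataSet<String> oneHour = env.readTextFile(path);
        lastHours = (lastHours == null) ? oneHour : lastHours.union(oneHour);
    }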
;
> I added a new Ticket: https://issues.apache.org/jira/browse/FLINK-3336
>
> This will implement the data shipping pattern that you mentioned in your
> initial mail. I also have the Pull request almost ready.
>
>> On 04 Feb 2016, at 16:25, Gwenhael Pasquiers
>>
at 10:56 AM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
Hi,
I’ve got two more questions on different topic…
First one :
Is there a way to monitor the buffers status. In order to find bottleneck in
our application we th
Hi,
I’ve got two more questions on different topic…
First one :
Is there a way to monitor the buffers’ status? In order to find bottlenecks in
our application we thought it could be useful to be able to have a look at the
different exchange buffers’ status, to know if they are full (or as an
be at 24 if there was no chaining …
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: Thursday, February 4, 2016 09:55
To: user@flink.apache.org
Subject: RE: Distribution of sinks among the nodes
Don’t we need to set the number of slots to 24 (4 sources + 16 mappers + 4
sinks
re. That
is also the reason why the slots are per TaskManager, and not global, to
associate them with a constant set of resources (mainly memory).
Greetings,
Stephan
On Thu, Feb 4, 2016 at 9:54 AM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com>
, Feb 3, 2016 at 4:31 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
It is one type of mapper with a parallelism of 16
It's the same for the sinks and sources (parallelism of 4)
The settings are
Env.setParallelism(4)
Mapper.s
Hi,
We are trying to deploy an application with the following “architecture”:
4 Kafka sources => 16 maps => 4 Kafka sinks, on 4 nodes, with 24 slots (we
disabled operator chaining).
So we’d like on each node:
1x source => 4x map => 1x sink
That way there are no exchanges between different
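In code, the per-operator parallelism for that layout would look roughly like this (MySource, MyMapper and MySink are placeholders); note that how the subtasks are actually spread over the 4 nodes is decided by the scheduler, not by this snippet:
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.disableOperatorChaining();                      // chaining disabled, as above

    env.addSource(new MySource()).setParallelism(4)     // 4 source subtasks
       .map(new MyMapper()).setParallelism(16)          // 16 mapper subtasks
       .addSink(new MySink()).setParallelism(4);        // 4 sink subtasks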
you say 16 maps, are we talking about one mapper with parallelism 16 or 16
unique map operators?
Regards,
Aljoscha
> On 03 Feb 2016, at 15:48, Gwenhael Pasquiers
> <gwenhael.pasqui...@ericsson.com> wrote:
>
> Hi,
>
> We try to deploy an application with the following
then we have to check what the problem is.
Could you check the size of your user jars?
Cheers,
Till
On Wed, Jan 27, 2016 at 4:38 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
Hello,
We got a question about blob.storage.di
Hello,
We got a question about blob.storage.dir and its .buffer files:
What are they? And are they cleaned, or is there a way to limit their size and
to evaluate the necessary space?
The root volume of one of our nodes was filled by those files (~20GB) and it crashed.
Well, the root was filled
later on, the String[] poses no issues with Tuple and it seems
to be OK.
Now ... Let's write those String[] in parquet too :)
From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com]
Sent: Friday, December 18, 2015 10:04
To: user@flink.apache.org
Subject: Reading Parquet/Hive
Hi,
I'm tr
this step.
The reasoning for the initial solution (not removing anything) was to make sure
that no jobs are deleted by accident. But it looks like this is more confusing
than helpful.
– Ufuk
> On 23 Nov 2015, at 11:45, Gwenhael Pasquiers
> <gwenhael.pasqui...@ericsson.com> wrote:
&
. The reason is that
then all clusters will think that they belong together.
Cheers,
Till
On Mon, Nov 23, 2015 at 2:15 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
OK, I understand.
Maybe we are not really using flink as
On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
Hello,
We’re trying to set up high availability using an existing zookeeper quorum
already running in our Cloudera cluster.
So, as per the doc we’ve cha
Hi,
We're having some OOM PermGen exceptions when running on YARN.
We're not yet sure whether it is a consequence or a cause of our crashes, but
we've been trying to increase that value... and we did not find how to do it.
I've seen that yarn-daemon.sh sets a 256m value.
It looks to me
To: user@flink.apache.org
Subject: Re: Building Flink with hadoop 2.6
Great.
Which classes can it not find at runtime?
I'll try to build and run Flink with exactly the command you've provided.
On Wed, Oct 14, 2015 at 4:49 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.
Hi ;
We need to test some things with flink and hadoop 2.6 (the trunc method).
We've set up a build task on our Jenkins and everything seems okay.
However, when we replace the original jar from your 0.10-SNAPSHOT distribution
with ours, there are some missing dependencies (log4j, curator, and maybe
well it's working for you and if you're missing something.
It's a pretty new feature :)
Also note that you can use the fs sinks with hadoop versions below 2.7.0, then
we'll write some metadata containing the valid offsets.
On Wed, Oct 14, 2015 at 5:18 PM, Gwenhael Pasquiers
<gwenhael.pas
u're using for building Flink?
On Wed, Oct 14, 2015 at 4:37 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
Hi ;
We need to test some things with flink and hadoop 2.6 (the trunc method).
We’ve set up a build task on our Jenki
Hi everyone,
We're trying to consume a 0.8.1 Kafka on Flink 0.9.1 and we've run into the
following issue:
My offset became OutOfRange; however, now when I start my job, it loops on the
OutOfRangeException, no matter what the value of auto.offset.reset is...
(earliest, latest, largest,
-cluster-per-job
mode, then we'll have to try to find another solution.
Best,
Aljoscha
On Tue, 25 Aug 2015 at 16:22 Gwenhael Pasquiers
gwenhael.pasqui...@ericsson.com wrote:
Hi,
We’re developing the first of (we hope) many Flink streaming apps.
We’d like
Hi,
We're developing the first of (we hope) many Flink streaming apps.
We'd like to package the logging configuration (log4j) together with the jar.
Meaning, different applications will probably have different logging
configurations (e.g. different Logstash ports)...
Is there a way to