Hi Ben,
Sorry for the late reply, but I believe your question was answered
on StackOverflow, right?
Did that answer solve your problem?
Cheers,
Kostas
On Mon, Dec 21, 2020 at 9:09 AM Ben Beasley wrote:
>
> First off I want to thank the folks in this email list for their help thus
> far.
>
>
Glad I could help!
On Mon, Dec 21, 2020 at 3:42 AM Ben Beasley wrote:
>
> That worked. Thank you, Kostas.
>
>
>
> From: Kostas Kloudas
> Date: Sunday, December 20, 2020 at 7:21 AM
> To: Ben Beasley
> Cc: user@flink.apache.org
> Subject: Re: No execution.target s
Hi Ben,
You can try using StreamExecutionEnvironment
streamExecutionEnvironment =
StreamExecutionEnvironment.getExecutionEnvironment();
instead of directly creating a new one. This will allow it to pick up
the configuration parameters that you pass through the command line.
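A minimal sketch of this advice (the class and job names are illustrative, not from the thread):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MyJob {
    public static void main(String[] args) throws Exception {
        // Preferred: getExecutionEnvironment() honors the configuration
        // passed on the command line (e.g. execution.target) and in
        // flink-conf.yaml.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Avoid: a directly created environment ignores that configuration.
        // StreamExecutionEnvironment local =
        //         StreamExecutionEnvironment.createLocalEnvironment();

        env.fromElements(1, 2, 3).print();
        env.execute("my-job");
    }
}
```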
I hope this helps,
Kostas
On
Hi Lalala,
Even in session mode, the jobgraph is created before the job is
executed. So all the above hold.
Although I am not super familiar with the catalogs, what you want is
that two or more jobs share the same readers of a source. This is not
done automatically in DataStream or DataSet and I a
Hi Hector,
The main reasons for deprecating readFileStream() were that:
1) it was only capable of parsing Strings and in a rather limited way
as one could not even specify the encoding
2) it was not fault-tolerant, so your concerns about exactly-once were
not covered
One concern that I can fin
I am also cc'ing Timo to see if he has anything more to add on this.
Cheers,
Kostas
On Thu, Nov 19, 2020 at 9:41 PM Kostas Kloudas wrote:
>
> Hi,
>
> Thanks for reaching out!
>
> First of all, I would like to point out that an interesting
> alternative to the per-jo
Hi,
Thanks for reaching out!
First of all, I would like to point out that an interesting
alternative to the per-job cluster could be running your jobs in
application mode [1].
Given that you want to run arbitrary SQL queries, I do not think you
can "share" across queries the part of the job grap
Hi Marco,
I agree with you that the -m help message is misleading but I do not
think it has changed between releases.
You can specify the address of the jobmanager or, for example, you can
put "-m yarn-cluster" and depending on your environment setup Flink
will pick up a session cluster or will cr
Hi Ashwin,
Do you have any filtering or aggregation (or any operation that emits
less data than it receives) in your logic? If yes, you could for
example put it before the rescaling operation so that it gets chained
to your source and you reduce the amount of data you ship through the
network. Af
Hi Nikola,
Apart from the potential overhead you mentioned about having one more
operator, I cannot find any other. Also even this one I think is
negligible.
The reason why we recommend attaching the Watermark Generator to the
source is more about semantics rather than efficiency. It seems
natural
Hi Flavio,
Could this https://issues.apache.org/jira/browse/FLINK-20020 help?
Cheers,
Kostas
On Thu, Nov 5, 2020 at 9:39 PM Flavio Pompermaier wrote:
>
> Hi everybody,
> I was trying to use the JobListener in my job but onJobExecuted() on Flink
> 1.11.0 but I can't understand if the job succe
Could you also post the ticket here @Flavio Pompermaier and we will
have a look before the upcoming release.
Thanks,
Kostas
On Wed, Nov 4, 2020 at 10:58 AM Chesnay Schepler wrote:
>
> Good find, this is an oversight in the CliFrontendParser; no help is
> printed for the run-application action.
st of friction for some users.
>>
>> To be clear, I, personally, don't have a problem with removing it (we
>> have removed other connectors in the past that did not have a migration
>> plan), I just reject the argumentation.
>>
>> On 10/28/2020 12:21 PM, Kostas Kl
e users because there are better alternatives.
>
> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
> > I think that the mailing lists is the best we can do and I would say
> > that they seem to be working pretty well (e.g. the recent Mesos
> > discussion).
> > Of course they ar
Hi all,
I will have a look in the whole stack trace in a bit.
@Chesnay Schepler I think that we are setting the correct classloader
during jobgraph creation [1]. Is that what you mean?
Cheers,
Kostas
[1]
https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/c
strict.
On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler wrote:
>
> If the conclusion is that we shouldn't remove it if _anyone_ is using
> it, then we cannot remove it because the user ML obviously does not
> reach all users.
>
> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
> >
> > Seth
> >
> > https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
> >
> > On Thu, Oct 15, 2020 at 2:57 PM K
Thanks Piyush for the message.
After this, I revoke my +1. I agree with the previous opinions that we
cannot drop code that is actively used by users, especially if it is
something as deep in the stack as support for a cluster management
framework.
Cheers,
Kostas
On Fri, Oct 23, 2020 at 4:15 PM Piyu
+1 for adding a warning about the removal of Mesos support and I would
also propose to state explicitly in the warning the version that we
are planning to actually remove it (e.g. 1.13 or even 1.14 if we feel
it is too aggressive).
This will help as a reminder to users and devs about the upcoming
els rushed to remove it at this point.
>
> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas wrote:
>
> > @Chesnay Schepler Off the top of my head, I cannot find an easy way
> > to migrate from the BucketingSink to the StreamingFileSink. It may be
> > possible but it will
rsions of the module compatible with 1.12+?
>
> On 10/12/2020 4:30 PM, Kostas Kloudas wrote:
> > Hi all,
> >
> > As the title suggests, this thread is to discuss the removal of the
> > flink-connector-filesystem module which contains (only) the deprecated
> > Buc
Hi all,
As the title suggests, this thread is to discuss the removal of the
flink-connector-filesystem module which contains (only) the deprecated
BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1] in
favor of the relatively recently introduced StreamingFileSink.
For the sake of a
Hi Jason,
Your analysis seems correct.
As an alternative, you could:
1) either call readFile multiple times on the
StreamExecutionEnvironment (once for each dir you want to monitor) and
then union the streams, or
2) you could put all the dirs you want to monitor under a common
parent dir and spec
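A sketch of option 1 above, with two illustrative directories and scan interval (not from the thread):

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class MonitorTwoDirs {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Monitor each directory continuously, scanning every 10s.
        DataStream<String> dirA = env.readFile(
                new TextInputFormat(new Path("/data/dirA")), "/data/dirA",
                FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);

        DataStream<String> dirB = env.readFile(
                new TextInputFormat(new Path("/data/dirB")), "/data/dirB",
                FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);

        // Union the two monitored streams into one.
        dirA.union(dirB).print();

        env.execute("monitor-two-dirs");
    }
}
```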
Hi all,
I will have a look.
Kostas
On Mon, Sep 28, 2020 at 3:56 PM Till Rohrmann wrote:
>
> Hi Cristian,
>
> thanks for reporting this issue. It looks indeed like a very critical problem.
>
> The problem seems to be that the ApplicationDispatcherBootstrap class
> produces an exception (that th
iodically fetching a
> new version of data from some external storage.
>
> Thanks,
>
> Dongwon
>
> > On Sep 23, 2020, at 4:59 AM, Kostas Kloudas wrote:
> >
> > Hi Dongwon,
>
>
>
>
>
> >
> > If you know the data in advance, you can always use th
Hi Dongwon,
If you know the data in advance, you can always use the Yarn options
in [1] (e.g. the "yarn.ship-directories") to ship the directories with
the data you want only once to each Yarn container (i.e. TM) and then
write a udf which reads them in the open() method. This will allow the
data
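A sketch of the open() pattern described above, assuming the directory was shipped with "yarn.ship-directories" so it lands in the task manager's working directory (the file name, format, and record types are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class EnrichWithShippedData extends RichMapFunction<String, String> {

    private transient Map<String, String> lookup;

    @Override
    public void open(Configuration parameters) throws IOException {
        // The shipped directory is local to each container, so the data
        // can be loaded once per task with plain file IO.
        lookup = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("data/lookup.csv"))) {
            String[] parts = line.split(",", 2);
            lookup.put(parts[0], parts[1]);
        }
    }

    @Override
    public String map(String key) {
        return lookup.getOrDefault(key, "unknown");
    }
}
```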
Thanks a lot for the discussion!
I will open a voting thread shortly!
Kostas
On Mon, Aug 24, 2020 at 9:46 AM Kostas Kloudas wrote:
>
> Hi Guowei,
>
> Thanks for the insightful comment!
>
> I agree that this can be a limitation of the current runtime, but I
> think that t
available in the BATCH mode in current
> implementation.
> So maybe we need more checks in the AUTOMATIC execution mode.
>
> Best,
> Guowei
>
>
> On Thu, Aug 20, 2020 at 10:27 PM Kostas Kloudas wrote:
>>
>> Hi all,
>>
>> Thanks for the comments!
>>
mers at the end of
> a job would be interesting, and would help in (at least some of) the cases I
> have in mind. I don't have a better idea.
>
> David
>
> On Mon, Aug 17, 2020 at 8:24 PM Kostas Kloudas wrote:
>>
>> Hi Kurt and David,
>>
>> Thanks a lot
h" and "bounded
>> > streaming" to be treated differently. If I've understood it correctly, the
>> > section on scheduling allows me to choose STREAMING scheduling even if I
>> > have bounded sources. I like that approach, because it recognizes that
nt#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
> (deprecated in 1.2)
>
> Looking forward to more opinions on the issue.
>
> Best,
>
> Dawid
>
>
> On 17/08/2020 12:49, Kostas Kloudas wrote:
>
> Thanks a lot for starting this
t; run exactly the same job as in production, except with different sources and
> sinks. While it might be a reasonable default, I'm not convinced that
> switching a processing time streaming job to read from a bounded source
> should always cause it to fail.
>
> David
>
>
Thanks a lot for starting this Dawid,
Big +1 for the proposed clean-up, and I would also add the deprecated
methods of the StreamExecutionEnvironment like:
enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
enableCheckpointing()
isForceCheckpointing()
readFile(FileInputFor
Hi Dasraj,
Yes, I would recommend using the Public and, if necessary, PublicEvolving
APIs, as they provide better guarantees for future maintenance.
Unfortunately, there are no docs listing which APIs are Public or
PublicEvolving, but you can see the annotations on the classes in the
source code.
I guess
Hi Narasimha,
I am not sure why the TMs are not shutting down, as Yun said, so I am
cc'ing Till here as he may be able to shed some light.
For the application mode, the page in the documentation that you
pointed is the recommended way to deploy an application in application
mode.
Cheers,
Kostas
Hi all,
As described in FLIP-131 [1], we are aiming at deprecating the DataSet
API in favour of the DataStream API and the Table API. After this work
is done, the user will be able to write a program using the DataStream
API and this will execute efficiently on both bounded and unbounded
data. But
Hi Dasraj,
You are right. On your previous email I did not pay attention that you
migrated from 1.9.
Since 1.9 the ClusterClient has changed significantly as it is not
annotated as @Public API.
I am not sure how easy it is to use the old logic in your settings.
You could try copying the old code
Hi Dasraj,
Could you please specify where the clusterClient.run() method is and
how it submits a job to a cluster?
Is the clusterClient your custom code?
Any details will help us pin down the problem.
One thing that is worth looking at is the release-notes of 1.11 [1].
There you will find all
turned Not Found consistently returned a correct result.
> It had never occurred before and I am afraid now I could no longer observe it
> again. I appreciate it does not give too much information so I will come back
> with more info on this thread if it happens again.
>
> -Original
Hi Alex,
Maybe Seth (cc'ed) may have an opinion on this.
Cheers,
Kostas
On Thu, Jul 23, 2020 at 12:08 PM Александр Сергеенко
wrote:
>
> Hi,
>
> We use so-called "control stream" pattern to deliver settings to the Flink
> job using Apache Kafka topics. The settings are in fact an unlimited stre
Hi Senthil,
You can see the configuration from the WebUI or you can get from the
REST API[1].
In addition, if you enable debug logging, you will have a line
starting with "Effective executor configuration:" in your client logs
(although I am not 100% sure if this will contain all the
configuration
Hi Lorenzo,
If you want to learn how Flink uses watermarks, it is worth checking [1].
Now in a nutshell, what a watermark will do in a pipeline is that it
may fire timers that you may have registered, or windows that you may
have accumulated.
If you have no time-sensitive operations between the f
Hi Basanth,
If I understand your usecase correctly:
1) you want to find all A->B->C->D
2) for all of them you want to know how long it took to complete
3) if one completed within X it is considered ok, if not, it is
considered problematic
4) you want to count each one of them
One way to go is th
Hi Tomasz,
Thanks a lot for reporting this issue. If you have verified that the
job is running AND that the REST server is also up and running (e.g.
check the overview page) then I think that this should not be
happening. I am cc'ing Chesnay who may have an additional opinion on
this.
Cheers,
Kos
Hi Alexander,
Routing of input data in Flink can be done through keying and this can
guarantee collocation constraints. This means that you can send two
records to the same node by giving them the same key, e.g. the topic
name. Keep in mind that elements with different keys do not
necessarily go t
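A minimal sketch of the keying idea above; the Record type is a made-up POJO for illustration:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

public class RoutingExample {
    public static class Record {
        public String topic;
        public String payload;
    }

    public static DataStream<Record> routeByTopic(DataStream<Record> input) {
        // All records with the same key (here: topic name) go to the same
        // parallel instance. Records with *different* keys may still share
        // a node, as noted above.
        return input.keyBy(r -> r.topic)
                    .map(new MapFunction<Record, Record>() {
                        @Override
                        public Record map(Record r) {
                            return r; // placeholder for per-topic processing
                        }
                    });
    }
}
```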
Hi John,
I think that using different plugins is not going to be an issue,
assuming that the scheme of your FS's do not collide. This is already
the case for S3 within Flink, where we have 2 implementations, one
based on Presto and one based on Hadoop. For the first you can use the
scheme s3p whil
Hi Alan,
Unfortunately not but the release is expected to come out in the next
couple of weeks, so then it will be available.
Until then, you can either copy the code of the AvroWriterFactory to
your project and use it from there, or download the project from
github, as you said.
Cheers,
Kostas
Hi Alan,
In the upcoming Flink 1.11 release, there will be support for Avro
using the AvroWriterFactory as seen in [1].
Do you think that this can solve your problem?
You can also download the current release-1.11 branch and also test it
out to see if it fits your needs.
Cheers,
Kostas
[1] http
I understand. Thanks for looking into it Senthil!
Kostas
On Tue, Jun 9, 2020 at 7:32 PM Senthil Kumar wrote:
>
> OK, will do and report back.
>
> We are on 1.9.1,
>
> 1.10 will take some time __
>
> On 6/9/20, 2:06 AM, "Kostas Kloudas" wrote:
>
>
is cancelled, Flink sends an Interrupt signal to the Thread
> running the Source.run method
>
>
>
> For some reason (unknown to me), this does not happen when a Stop command
> is issued.
>
>
>
> We ran into some minor issues because of said behavior.
>
>
>
>
Hi Omkar,
For the first part of the question where you set the "drain" to false
and the state gets drained, this can be an issue on our side. Just to
clarify, no matter what is the value of the "drain", Flink always
takes a savepoint. Drain simply means that we also send MAX_WATERMARK
before takin
Hi Senthil,
From a quick look at the code, it seems that the cancel() of your
source function should be called, and the thread that it is running on
should be interrupted.
In order to pin down the problem and help us see if this is an actual
bug, could you please:
1) enable debug logging and see
What Arvid said is correct.
The only thing I have to add is that "stop" also allows exactly-once sinks
to push out their buffered data to their final destination (e.g.
Filesystem). In other words, it takes into account side-effects, so it
guarantees exactly-once end-to-end, assuming that you are
us
Hi all,
@Venkata, Do you have many small files being created as Arvid suggested? If
yes, then I tend to agree that S3 is probably not the best sink. Although I
did not get that from your description.
In addition, instead of PrintStream you can have a look at the code of the
SimpleStringEncoder in
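A sketch of a row-format StreamingFileSink using SimpleStringEncoder instead of manual PrintStream handling; the output path is illustrative, and checkpointing is enabled because the sink needs it to finalize files:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class StringSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // required for exactly-once sinks

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3://bucket/output"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.fromElements("a", "b", "c").addSink(sink);
        env.execute("string-sink-job");
    }
}
```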
Hi Singh,
The only thing to add to what Yang said is that the "execution.target"
configuration option (in the config file) is also used for the same
purpose from the execution environments.
Cheers,
Kostas
On Wed, May 27, 2020 at 4:49 AM Yang Wang wrote:
>
> Hi M Singh,
>
> The Flink CLI picks u
Hi all,
I would like to bring the discussion in
https://issues.apache.org/jira/browse/FLINK-17745 to the dev mailing
list, just to hear the opinions of the community.
In a nutshell, in the early days of Flink, users could submit their
jobs as fat-jars that had a specific structure. More concretel
Hi Eyal and Dawid,
@Eyal I think Dawid explained pretty well what is happening and why in
distributed settings, the underlying FS on which the StreamingFileSink
writes has to be durable and accessible to all parallel instances of
the job. Please let us know if you have any further questions.
Chee
Roshan Punnoose wrote:
>>
>> Nope just the s3a. I'll keep looking around to see if there is anything else
>> I can see. If you think of anything else to try, let me know.
>>
>> On Thu, Apr 9, 2020, 7:41 AM Kostas Kloudas wrote:
>>>
>>> It should n
;t see any exceptions there. Not sure where to look for
>>> s3 exceptions in particular.
>>>
>>> On Thu, Apr 9, 2020, 7:16 AM Kostas Kloudas wrote:
>>>>
>>>> Yes, this is why I reached out for further information.
>>>>
>>>> I
Hi Roshan,
Your logs refer to a simple run without any failures or re-running
from a savepoint, right?
I am asking because I am trying to reproduce it by running a modified
ParquetStreamingFileSinkITCase [1] and so far I cannot.
The ITCase runs against the local filesystem, and not S3, but I adde
Hi Eyal,
This is a known issue which is fixed now (see [1]) and will be part of
the next releases.
Cheers,
Kostas
[1] https://issues.apache.org/jira/browse/FLINK-16371
On Tue, Mar 24, 2020 at 11:10 AM Eyal Pe'er wrote:
>
> Hi all,
>
> I am trying to write a sink function that retrieves string
th the
>> `fileStateSizeThreshold` argument when constructing the `FsStateBackend`.
>> The purpose of that threshold is to ensure that the backend does not create
>> a large amount of very small files, where potentially the file pointers are
>> actually larger than the
ey Kostas,
>
> We’re a little bit off from a 1.10 update but I can certainly see if that
> CompressWriterFactory might solve my use case for when we do.
>
> If there is anything I can do to help document that feature, please let me
> know.
>
> Thanks!
>
> Austin
>
>
Hi Jacob,
Could you specify which StateBackend you are using?
The reason I am asking is that, from the documentation in [1]:
"Note that if you use the MemoryStateBackend, metadata and savepoint
state will be stored in the _metadata file. Since it is
self-contained, you may move the file and rest
m/austince/flink-streaming-file-sink-compression/tree/unbounded
>
> On Tue, Mar 3, 2020 at 3:50 AM Kostas Kloudas wrote:
>
>> Hi Austin and Rafi,
>>
>> @Rafi Thanks for providing the pointers!
>> Unfortunately there is no progress on the FLIP (or the issue).
>>
>
have
>> checkpointing enabled in my previous case -- still must be doing something
>> wrong with that config though.
>>
>> Thanks!
>> Austin
>>
>> [1]: https://github.com/austince/flink-streaming-file-sink-compression
>>
>>
>> On Mon, Fe
Hi Austin,
Dawid is correct in that you need to enable checkpointing for the
StreamingFileSink to work.
I hope this solves the problem,
Kostas
On Mon, Feb 24, 2020 at 11:08 AM Dawid Wysakowicz
wrote:
>
> Hi Austing,
>
> If I am not mistaken the StreamingFileSink by default flushes on checkpoint
Hi Juergen,
I will reply to your questions inline. As a general comment I would
suggest to also have a look at [3] so that you have an idea of some of
the alternatives.
With that said, here come the answers :)
1) We receive files every day, which are exports from some database
tables, containing
Hi Hemant,
Why not using simple connected streams, one containing the
measurements, and the other being the control stream with the
thresholds which are updated from time to time.
Both will be keyed by the device class, to make sure that the
measurements and the thresholds for a specific device cl
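A sketch of the connected-streams pattern suggested above: both streams keyed by device class, with the latest threshold kept in keyed state and applied to incoming measurements. The Measurement and Threshold POJOs are illustrative:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

class Measurement { public String deviceClass; public double value; }
class Threshold   { public String deviceClass; public double value; }

public class ThresholdFunction
        extends KeyedCoProcessFunction<String, Measurement, Threshold, String> {

    private transient ValueState<Double> threshold;

    @Override
    public void open(Configuration parameters) {
        threshold = getRuntimeContext().getState(
                new ValueStateDescriptor<>("threshold", Double.class));
    }

    @Override
    public void processElement1(Measurement m, Context ctx, Collector<String> out)
            throws Exception {
        Double t = threshold.value();
        if (t != null && m.value > t) {
            out.collect("alert for device class " + m.deviceClass);
        }
    }

    @Override
    public void processElement2(Threshold t, Context ctx, Collector<String> out)
            throws Exception {
        threshold.update(t.value); // control stream updates the threshold
    }
}
```

Wiring would look roughly like `measurements.connect(thresholds).keyBy(m -> m.deviceClass, t -> t.deviceClass).process(new ThresholdFunction())`.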
Parallelism for source function is 1 and for Process function its currently 2.
>
> Thanks for the response.
>
> —
> Akshay
>
> > On Feb 12, 2020, at 2:07 AM, Kostas Kloudas wrote:
> >
> > Hi Akshay,
> >
> > Could you be more specific on what you are tr
gt; Also wondering if it might be expressed in DataStream API.
>
> On Wed, Feb 12, 2020 at 13:06, Kostas Kloudas wrote:
>>
>> Hi Oleg,
>>
>> Could you be more specific about what you mean by
>> "for events of last n seconds(time units in general) for every incoming
Hi Salva,
Yes, the same applies to the Operator API as the output is not
thread-safe and there is no way of "checkpointing" the "in-flight"
data without explicit handling.
If you want to dig deeper, I would recommend to have a look also at
the source code of the AsyncWaitOperator to see how you co
Hi Akshay,
Could you be more specific on what you are trying to achieve with this scheme?
I am asking because if your source is too fast and you want to slow
it down so that it produces data at the same rate as your process
function can consume them, then Flink's backpressure will eventually
d
Hi Salva and Yun,
Yun is correct on that the collector is not thread-safe so writing
should be guarded.
In addition, such a pattern that issues a request to a 3rd party
multi-threaded library and registers a callback for the future does
not play well with checkpointing. In your case, if a failure
Hi Apoorv,
I am not so familiar with the internals of RocksDB and how the number
of open files correlates with the number of (keyed) states and the
parallelism you have, but as a starting point you can have a look to
[1] for recommendations on how to tune RocksDb for large state and I
am also cc'in
Hi Oleg,
Could you be more specific about what you mean by
"for events of last n seconds(time units in general) for every incoming event."?
Do you mean that you have a stream of parallelism 1 and you want for
each incoming element to have your function fire with input the event
itself and all the
Hi Fatima,
I am not super familiar with Parquet but your issue seems to be
related to [1], which seems to be expected behaviour on the Parquet
side.
The reason for this behaviour seems to be the format of the parquet
files which store only the leaf fields but not the structure of the
groups, so if
Hi John,
As you suggested, I would also lean towards increasing the number of
allowed open handles, but
for recommendations on best practices, I am cc'ing Piotr who may be
more familiar with the Kafka consumer.
Cheers,
Kostas
On Tue, Feb 11, 2020 at 9:43 PM John Smith wrote:
>
> Just wondering i
://issues.apache.org/jira/browse/FLINK-13027
[2] https://issues.apache.org/jira/browse/FLINK-15476
On Mon, Feb 3, 2020 at 8:14 PM Kostas Kloudas wrote:
> Hi Mark,
>
> Currently no, but if rolling on every checkpoint is ok with you, in future
> versions it is easy to allow to roll on every checkpoint,
BulkFormat?
>
> Best regards,
>
> Mark
> --
> *From:* Kostas Kloudas
> *Sent:* 03 February 2020 15:39
> *To:* Mark Harris
> *Cc:* Piotr Nowojski ; Cliff Resnick <
> cre...@gmail.com>; David Magalhães ; Till Rohrmann
> ; flink-u...@apache.org
> *Subj
s
On Mon, Feb 3, 2020 at 4:11 PM Mark Harris wrote:
> Hi Kostas,
>
> Sorry, stupid question: How do I set that for a StreamingFileSink?
>
> Best regards,
>
> Mark
> ----------
> *From:* Kostas Kloudas
> *Sent:* 03 February 2020 14:58
> *To:* Ma
Hi Mark,
Have you tried to set your rolling policy to close inactive part files
after some time [1]?
If the part files in the buckets are inactive and there are no new part
files, then the state handle for those buckets will also be removed.
Cheers,
Kostas
[1] https://ci.apache.org/projects/flink/fl
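A sketch of such a rolling policy on a row-format sink (the path and intervals are illustrative; the exact builder methods vary slightly across Flink versions):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

public class InactivityRollingSink {
    public static StreamingFileSink<String> build() {
        return StreamingFileSink
                .forRowFormat(new Path("s3://bucket/output"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(
                        DefaultRollingPolicy.builder()
                                // roll at least every 15 minutes
                                .withRolloverInterval(15 * 60 * 1000L)
                                // close part files that were idle > 5 minutes,
                                // so stale buckets get finalized and their
                                // state can be dropped
                                .withInactivityInterval(5 * 60 * 1000L)
                                .build())
                .build();
    }
}
```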
anks,
> Pawel
>
>
> On Fri, 24 Jan 2020 at 20:45, Kostas Kloudas wrote:
>>
>> Hi Pawel,
>>
>> You are correct that counters are unique within the same bucket but
>> NOT across buckets. Across buckets, you may see the same counter being
>> used.
>>
Hi Pawel,
You are correct that counters are unique within the same bucket but
NOT across buckets. Across buckets, you may see the same counter being
used.
The max counter is used only upon restoring from a failure, resuming
from a savepoint or rescaling and this is done to guarantee that n
valid d
Oops, sorry for not sending the reply to everyone
and thanks David for reposting it here.
Great to hear that you solved your issue!
Kostas
On Wed, Jan 15, 2020 at 1:57 PM David Magalhães wrote:
>
> Sorry, I only saw the replies today.
>
> Regarding my previous email,
>
>> Still, there is so
lose" from
> "success finish close" in StreamingFileSink?
>
> Best,
> Jingsong Lee
>
> On Mon, Dec 9, 2019 at 10:32 PM Kostas Kloudas wrote:
>>
>> Hi Li,
>>
>> This is the expected behavior. All the "exactly-once" sinks in Flink
>
Hi Krzysztof,
If I get it correctly, your main reason for not using side outputs
is that "side-output", by its name, sounds like a
"second-class citizen" compared to the main output.
I see your point but in terms of functionality, there is no difference
between the different outp
Hi Kristoff,
The recommended alternative is to use SideOutputs as described in [1].
Could you elaborate why you think side outputs are not a good choice
for your usecase?
Cheers,
Kostas
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html
On Thu, Dec 19, 2019
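A sketch of the side-output pattern from [1]; the splitting condition is illustrative. Functionally, the side output is on equal footing with the main output:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputExample {
    // The anonymous subclass keeps the type information for the tag.
    static final OutputTag<String> REJECTED = new OutputTag<String>("rejected") {};

    public static DataStream<String> split(DataStream<String> input) {
        SingleOutputStreamOperator<String> main =
                input.process(new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String value, Context ctx,
                                               Collector<String> out) {
                        if (value.isEmpty()) {
                            ctx.output(REJECTED, value); // side output
                        } else {
                            out.collect(value);          // main output
                        }
                    }
                });
        // The side output is retrieved from the main operator.
        main.getSideOutput(REJECTED).print();
        return main;
    }
}
```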
Thanks a lot for reporting this!
I believe that this can be really useful for the community!
Cheers,
Kostas
On Tue, Dec 17, 2019 at 1:29 PM spoganshev wrote:
>
> In case you experience an exception similar to the following:
>
> org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException:
Hi all,
With the feature freeze for release 1.10 already behind us, it is
time to focus a little bit on documenting the new features that the
community added to this release, and improving the already existing
documentation based on questions that we see in Flink's mailing lists.
To this end, I
configured to be 5 minutes. You see that the preceding commits follow this
> pattern of one commit per checkpoint interval, which is what we expect. It's
> very strange that two files for the same TopicPartition (same TaskManager)
> are committed.
>
>
> I am eager to hear y
Hi Pankaj,
When you start a session cluster with the bin/yarn-session.sh script,
Flink will create the cluster and then write a "Yarn Properties file"
named ".yarn-properties-YOUR_USER_NAME" in the directory:
either the one specified by the option "yarn.properties-file.location"
in the flink-conf.
Hi Li,
This is the expected behavior. All the "exactly-once" sinks in Flink
require checkpointing to be enabled.
We will update the documentation to be clearer in the upcoming release.
Thanks a lot,
Kostas
On Sat, Dec 7, 2019 at 3:47 AM Li Peng wrote:
>
> Ok I seem to have solved the issue by e
Hi Harrison,
One thing to keep in mind is that Flink will only write files if there
is data to write. If, for example, your partition is not active for a
period of time, then no files will be written.
Could this explain the fact that dt 2019-11-24T01 and 2019-11-24T02
are entirely skipped?
In add
As a side note, I am assuming that you are using the same Flink Job
before and after the savepoint and the same Flink version.
Am I correct?
Cheers,
Kostas
On Mon, Nov 25, 2019 at 2:40 PM Kostas Kloudas wrote:
>
> Hi Singh,
>
> This behaviour is strange.
> One thing I can reco
Hi Singh,
This behaviour is strange.
One thing I can recommend, to see if the two jobs are identical, is to
also launch the second job without a savepoint,
just start from scratch, and simply look at the web interface to see
if everything is there.
Also could you please provide some code from your
Hi Vinay,
You are correct when saying that the bulk formats only support
onCheckpointRollingPolicy.
The reason for this has to do with the fact that currently Flink
relies on the Hadoop writer for Parquet.
Bulk formats keep important details about how they write the actual
data (such as compress
Hi Amran,
If you want to know from which partition your input data come from,
you can always have a separate bucket for each partition.
As described in [1], you can extract the offset/partition/topic
information for an incoming record and based on this, decide the
appropriate bucket to put the rec
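A sketch of the per-partition bucketing idea above: a KafkaDeserializationSchema attaches the Kafka partition to each record, and a custom BucketAssigner uses it as the bucket id. The Event type is an illustrative POJO:

```java
import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

// Minimal illustrative record carrying the partition extracted in a
// KafkaDeserializationSchema.
class Event {
    public int partition;
    public String payload;
}

public class PartitionBucketAssigner implements BucketAssigner<Event, String> {

    @Override
    public String getBucketId(Event element, Context context) {
        // One bucket (directory) per originating Kafka partition.
        return "partition=" + element.partition;
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}
```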
Hi all,
Big +1 for contributing Stateful Functions to Flink and as for the
main question at hand, I would vote for putting it in the main
repository.
I understand that this can couple the release cadence of Flink and
Stateful Functions although I think the pros of having a "you break
it,
you fix
Hi Anton,
First of all, there is this PR
https://github.com/apache/flink/pull/9581 that may be interesting to
you.
Second, I think you have to keep in mind that the hourly bucket
reporting will be per-subtask. So if you have parallelism of 4, each
of the 4 tasks will report individually that they