ing some
> documentation and compiling it.
> I’m new with this but I’d like to do a small contribution, I will send
> another email to the list explaining how I’ll try to help.
>
> Thanks!!
>
>
--
Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier
c` methods have been added
> > ---
> >
> >
> > The intended effects would be:
> > - reduce friction in the PR process created by basic oversights such as
> > checkstyle violations or missing tests
> > - provide a helping hand for new contributors
> >
> SDK not defined
> Do we need to select JDK and point to local jdk install path?
>
Hey all,
I've noticed a few times now when trying to help users implement particular
things in the Flink API that it can be complicated to map what they know
they are trying to do onto higher-level Flink concepts such as windowing or
Connect/CoFlatMap/ValueState, etc.
At some point it just
> > .apply(...)
> >
> > DataStream lateData = windowedResult.getSideOutput();
> >
> > Right now, the result of window operations is a
> > SingleOutputStreamOperator, same as it is for all DataStream
> operations.
> > Making the result type more specific, i.e. a WindowedOperator, wou
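For reference, the shape this API eventually took in Flink's DataStream API keeps `SingleOutputStreamOperator` as the result type and routes late data through an explicit `OutputTag`. A sketch only; the `Event`, `Result`, and `MyWindowFunction` names are placeholders, not from this thread:

```java
// Late records are routed to a side output instead of being silently dropped.
final OutputTag<Event> lateTag = new OutputTag<Event>("late-data") {};

SingleOutputStreamOperator<Result> windowedResult = events
    .keyBy(e -> e.getKey())
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .sideOutputLateData(lateTag)          // capture late data under the tag
    .apply(new MyWindowFunction());

// Retrieve the late records from the main result using the same tag.
DataStream<Event> lateData = windowedResult.getSideOutput(lateTag);
```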
cases of their own they've run into or has ideas for a
solution to this problem.
Thanks for reading..
-Jamie
h this to the users first but currently I think it's not
> easy
> > > enough to be a good starting point.
> > >
> > > The user needs to understand a lot about the system if they don't want to
> > > hurt other parts of the pipeline. For instance, working with the
> as time passes, i.e. 0, 10, 15, 20. For this we need to keep track of
> > > the watermarks of A and B separately. But after we have a correct
> > > watermark for the bucket, all we need to care about is the bucket
> > > watermarks. So somewhere (most probably at the source) w
a predefined OutputTag over which users have no control or definition
> > an extra UDF which essentially filters out all main outputs and only lets
> > the sideOutput pass (like a FilterFunction)
> >
> > Thanks,
> > Chen
> >
> > > On Feb 24, 2017, at 1:17 PM, Jamie Grier
o this is to consume that configuration as a stream and
hold the configuration internally in the state of a particular user
function.
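The connect-a-configuration-stream pattern described here can be sketched with the broadcast state API that Flink later added (the `Event`, `Config`, and `Output` types below are placeholders):

```java
// Broadcast configuration updates to all parallel instances and keep the
// latest Config in broadcast state; data elements read it when processed.
MapStateDescriptor<String, Config> configDesc =
    new MapStateDescriptor<>("config", String.class, Config.class);

DataStream<Output> result = events
    .connect(configUpdates.broadcast(configDesc))
    .process(new BroadcastProcessFunction<Event, Config, Output>() {
        @Override
        public void processBroadcastElement(Config cfg, Context ctx,
                                            Collector<Output> out) throws Exception {
            // remember the most recent configuration
            ctx.getBroadcastState(configDesc).put("current", cfg);
        }

        @Override
        public void processElement(Event e, ReadOnlyContext ctx,
                                   Collector<Output> out) throws Exception {
            Config cfg = ctx.getBroadcastState(configDesc).get("current");
            // apply the current configuration to this element ...
        }
    });
```

Before broadcast state existed, the same effect was achieved with `connect` plus a `CoFlatMapFunction` that held the configuration in operator state, as described above.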
>
> Thanks
>
. Please advise on the better approach to handle these kinds of
> scenarios and how other applications are handling them. Thanks.
>
users to pass a custom parameter when registering the
> > callback, and the parameter would be passed to the onTimer method
> > - Allow users to pass custom callback functions when registering the
> > timers, but this would mean we have to support some sort of context for
> > accessi
ny potential scaling limitations as the processing capacity increases.
>
> Thanks
> Govind
>
re has already been an attempt to come to consensus [1],
> > > which
> > > > > resulted in two design documents. I tried to consolidate those two
> and
> > > > > also added a section about implementation plans. This is the
> resulting
> > > > > FLIP:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API
> > > > >
> > > > >
> > > > > In terms of semantics I tried to go with the minimal viable
> solution.
> > > > > The part that needs discussing is how we want to implement this. I
> > > > > outlined three possible implementation plans in the FLIP but what
> it
> > > > > boils down to is that we need to introduce some way of getting
> several
> > > > > inputs into an operator/task.
> > > > >
> > > > >
> > > > > Please have a look at the doc and let us know what you think.
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Aljoscha
> > > > >
> > > > >
> > > > >
> > > > > [1]
> > > > > https://lists.apache.org/thread.html/797df0ba066151b77c7951fd7d603a8afd7023920d0607a0c6337db3@1462181294@%3Cdev.flink.apache.org%3E
> > > > >
> > > >
> > >
>
eam()" API without the explicit
> >>>>> OutputTag.
> >>>>>
> >>>>> It seems the backwards compatibility checker doesn't like this and
> >>>>> complains about breaking binary backwards compatibility. +Robert
> >>> Metzger
I know this is a very simplistic idea but...
In general the issue Eron is describing occurs whenever two (or more)
parallel partitions are assigned to the same Flink sub-task and there is a
large time delta between them. This problem exists largely because
we are not making any decisions
Big +1 on trying to come up with a common framework for partition-based,
replayable sources. There is so much common code to be written that makes
it possible to write correct connectors and Gordon's bullet points are
exactly those -- and it's not just Kinesis and Kafka. It's also true for
Is the `flink-connector-filesystem` connector supposed to work with the
latest hadoop-free Flink releases, say along with the `flink-s3-fs-presto`
filesystem implementation?
-Jamie
.6 is to provide a BucketingSink that uses the Flink
> FileSystem and also works well with eventually consistent file systems.
>
> --
> Aljoscha
>
> > On 23. Feb 2018, at 06:31, Jamie Grier <jgr...@lyft.com> wrote:
> >
> > Is the `flink-connector-filesystem` connect
on YARN and
> only on a Hadoop-free cluster. Maybe...
>
> > On 23. Feb 2018, at 13:32, Jamie Grier <jgr...@lyft.com> wrote:
> >
> > Thanks, Aljoscha :)
> >
> > So is it possible to continue to use the new "native" filesystems along
> > with the BucketingSi
This is great, Gyula! A colleague here at Lyft has also done some work
around bootstrapping DataStream programs and we've also talked a bit about
doing this by running DataSet programs.
On Fri, Aug 17, 2018 at 3:28 AM, Gyula Fóra wrote:
> Hi All!
>
> I want to share with you a little project
I think we need to modify the way we write checkpoints to S3 for high-scale
jobs (those with many total tasks). The issue is that we are writing all
the checkpoint data under a common key prefix. This is the worst case
scenario for S3 performance since the key is used as a partition key.
In the
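The mitigation that eventually landed for this (see FLINK-9061 later in this digest) was "entropy injection": a marker in the configured checkpoint path is replaced with random characters so checkpoint keys spread across S3 partitions. A sketch of the flink-conf.yaml settings (the bucket name is an example):

```yaml
# Replace _entropy_ in the checkpoint path with 4 random characters,
# spreading checkpoint keys across S3 partitions.
state.checkpoints.dir: s3://my-bucket/_entropy_/checkpoints
s3.entropy.key: _entropy_
s3.entropy.length: 4
```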
Here's a doc I started describing some changes we would like to make,
starting with the Kinesis Source. It describes a refactoring of that code
specifically, and also hopefully a pattern and some reusable code we can use
in the other sources as well. The end goal would be best-effort event-time
Thanks Aljoscha for getting this effort going!
There's been plenty of discussion here already and I'll add my big +1 to
making this interface very simple to implement for a new
Source/SplitReader. Writing a new production quality connector for Flink
is very difficult today and requires a lot of
ds. The JobManager will accumulate or store all
> the
> > > reported progress and then give responses to the different source tasks.
> We
> > > can define a protocol for instructing the fast source task to sleep for a
> > > specific time, for example. To do so, the coordinator ha
I'll add to what Thomas already said.. The larger issue driving this is
that when reading from a source with many parallel partitions, especially
when reading lots of historical data (or recovering from downtime and there
is a backlog to read), it's quite common for there to develop an event-time
Okay, so I think there is a lot of agreement here about (a) This is a real
issue for people, and (b) an ideal long-term approach to solving it.
As Aljoscha and Elias said a full solution to this would be to also
redesign the source interface such that individual partitions are exposed
in the API
?
On Wed, Oct 10, 2018 at 6:58 AM Jamie Grier wrote:
> Okay, so I think there is a lot of agreement here about (a) This is a real
> issue for people, and (b) an ideal long-term approach to solving it.
>
> As Aljoscha and Elias said a full solution to this would be to also
> rede
I'm not sure if this is required. It's quite convenient to be able to just
grab a single tarball and you've got everything you need.
I just did this for the latest binary release and it was 273MB and took
about 25 seconds to download. Of course I know connection speeds vary
quite a bit but I
+1 to try the bot solution and see how it goes.
On Fri, Jan 11, 2019 at 6:54 AM jincheng sun
wrote:
> +1 for the bot solution!
> and I think Timo's suggestion is very useful!
> Thanks,
> Jincheng
>
>
> Timo Walther wrote on Fri, Jan 11, 2019 at 22:44:
>
> > Thanks for bringing up this discussion again. +1
One unfortunate problem with the current back-pressure detection mechanism
is that it doesn't work well with all of our sources. The problem is that
some sources (Kinesis for sure) emit elements from threads Flink knows
nothing about and therefore those stack traces aren't sampled. The result
is
.. Maybe we should add a way to register those threads such that they are
also sampled. Thoughts?
On Thu, Jan 3, 2019 at 10:25 AM Jamie Grier wrote:
> One unfortunate problem with the current back-pressure detection mechanism
> is that it doesn't work well with all of our s
I think maybe if I understood this correctly this design is going in the
wrong direction. The problem with Flink logging, when you are running
multiple jobs in the same TMs, is not just about separating out the
business level logging into separate files. The Flink framework itself
logs many
We've run into an issue that limits the max parallelism of jobs we can run
and what it seems to boil down to is that the JobManager becomes
unresponsive while essentially spending all of its time discarding
checkpoints from S3. This results in sluggish UI, sporadic
AkkaAskTimeouts, heartbeat
eartbeat misses and akka timeouts during deletes.
> Are some parts of the deletes happening synchronously in the actor thread?
>
> On Wed, Mar 6, 2019 at 3:40 PM Jamie Grier
> wrote:
>
> > We've run into an issue that limits the max parallelism of jobs we can
>
This is awesome, Stephan! Thanks for doing this.
-Jamie
On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote:
> Here is the pull request with a draft of the roadmap:
> https://github.com/apache/flink-web/pull/178
>
> Best,
> Stephan
>
> On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote:
>
>>
> Subject: Re: JobManager scale limitation - Slow S3 checkpoint deletes
> >
> > Nice!
> >
> > Perhaps for file systems without TTL/expiration support (AFAIK includes
> > HDFS), cleanup could be performed in the task managers?
> >
> >
> > On Wed
I had the same reaction initially as some of the others on this thread --
which is "Use Kafka quotas". I agree that in general a service should
protect itself with its own rate limiting rather than building it into
clients like the FlinkKafkaConsumer.
However, there are a few reasons we need
Hey all,
I need to make sense of this behavior. Any help would be appreciated.
Here’s an example of a set of Flink checkpoint metrics I don’t understand.
This is the first operator in a job and as you can see the end-to-end time
for the checkpoint is long, but it’s not explained by either sync,
still be
> alignment buffers, which need to be processes while the next checkpoint has
> already started.
>
> Cheers,
>
> Konstantin
>
>
>
> On Sat, Sep 14, 2019 at 1:35 AM Jamie Grier
> wrote:
>
> > Here's the second screenshot I forgot to include:
Hope that helps understanding what is going on.
>
> Best,
> Stephan
>
>
> On Thu, Sep 12, 2019 at 1:25 AM Seth Wiesman wrote:
>
> > Great timing, I just debugged this on Monday. E2e time is checkpoint
> > coordinator to checkpoint coordinator, so it includes RPC to
Here's the second screenshot I forgot to include:
https://pasteboard.co/IxhNIhc.png
On Fri, Sep 13, 2019 at 4:34 PM Jamie Grier wrote:
> Alright, here's another case where this is very pronounced. Here's a link
> to a couple of screenshots showing the overall stats for a slow task as
it take 29 minutes to consume this data in the sink?
Anyway, I'd appreciate any feedback or insights.
Thanks :)
-Jamie
On Fri, Sep 13, 2019 at 12:11 PM Jamie Grier wrote:
> Thanks Seth and Stephan,
>
> Yup, I had intended to upload a image. Here it is:
> https://pasteboard.co
appreciated!
-Jamie Grier
On 2022/10/28 06:06:49 Shengkai Fang wrote:
> Hi.
>
> > Is there a possibility for us to get engaged and at least introduce
> initial changes to support authentication/authorization?
>
> Yes. You can write a FLIP about the design and change. We can disc
Sorry for coming very late to this thread. I have not contributed much to
Flink publicly for quite some time but I have been involved with Flink,
daily, for years now and I'm keenly interested in where we take Flink SQL
going forward.
Thanks for the proposal!! I think it's definitely a step in
I think
> that will just add to users' confusion and learning curve which is already
> substantial with Flink. We need to make things easier rather than harder.
>
> Also, just to clarify, and sorry if my previous statement may not be that
> accurate, this is not a new concept, it is
Jamie Grier created FLINK-3617:
--
Summary: NPE from CaseClassSerializer when dealing with null
Option field
Key: FLINK-3617
URL: https://issues.apache.org/jira/browse/FLINK-3617
Project: Flink
Jamie Grier created FLINK-3627:
--
Summary: Task stuck on lock in StreamSource when cancelling
Key: FLINK-3627
URL: https://issues.apache.org/jira/browse/FLINK-3627
Project: Flink
Issue Type: Bug
Jamie Grier created FLINK-3679:
--
Summary: DeserializationSchema should handle zero or more outputs
for every input
Key: FLINK-3679
URL: https://issues.apache.org/jira/browse/FLINK-3679
Project: Flink
Jamie Grier created FLINK-3680:
--
Summary: Remove or improve (not set) text in the Job Plan UI
Key: FLINK-3680
URL: https://issues.apache.org/jira/browse/FLINK-3680
Project: Flink
Issue Type
Jamie Grier created FLINK-4391:
--
Summary: Provide support for asynchronous operations over streams
Key: FLINK-4391
URL: https://issues.apache.org/jira/browse/FLINK-4391
Project: Flink
Issue
Jamie Grier created FLINK-5635:
--
Summary: Improve Docker tooling to make it easier to build images
and launch Flink via Docker tools
Key: FLINK-5635
URL: https://issues.apache.org/jira/browse/FLINK-5635
Jamie Grier created FLINK-4947:
--
Summary: Make all configuration possible via flink-conf.yaml and
CLI.
Key: FLINK-4947
URL: https://issues.apache.org/jira/browse/FLINK-4947
Project: Flink
Jamie Grier created FLINK-4980:
--
Summary: Include example source code in Flink binary distribution
Key: FLINK-4980
URL: https://issues.apache.org/jira/browse/FLINK-4980
Project: Flink
Issue
Jamie Grier created FLINK-6199:
--
Summary: Single outstanding Async I/O operation per key
Key: FLINK-6199
URL: https://issues.apache.org/jira/browse/FLINK-6199
Project: Flink
Issue Type
Jamie Grier created FLINK-10154:
---
Summary: Make sure we always read at least one record in
KinesisConnector
Key: FLINK-10154
URL: https://issues.apache.org/jira/browse/FLINK-10154
Project: Flink
Jamie Grier created FLINK-9061:
--
Summary: S3 checkpoint data not partitioned well -- causes errors
and poor performance
Key: FLINK-9061
URL: https://issues.apache.org/jira/browse/FLINK-9061
Project
Jamie Grier created FLINK-10888:
---
Summary: Expose new global watermark RPC to sources
Key: FLINK-10888
URL: https://issues.apache.org/jira/browse/FLINK-10888
Project: Flink
Issue Type: Sub
Jamie Grier created FLINK-10886:
---
Summary: Event time synchronization across sources
Key: FLINK-10886
URL: https://issues.apache.org/jira/browse/FLINK-10886
Project: Flink
Issue Type
Jamie Grier created FLINK-10887:
---
Summary: Add source watermarking tracking to the JobMaster
Key: FLINK-10887
URL: https://issues.apache.org/jira/browse/FLINK-10887
Project: Flink
Issue Type
Jamie Grier created FLINK-10484:
---
Summary: New latency tracking metrics format causes metrics
cardinality explosion
Key: FLINK-10484
URL: https://issues.apache.org/jira/browse/FLINK-10484
Project
Jamie Grier created FLINK-11617:
---
Summary: CLONE - Handle AmazonKinesisException gracefully in
Kinesis Streaming Connector
Key: FLINK-11617
URL: https://issues.apache.org/jira/browse/FLINK-11617