This is awesome, Stephan! Thanks for doing this.
-Jamie
On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote:
> Here is the pull request with a draft of the roadmap:
> https://github.com/apache/flink-web/pull/178
>
> Best,
> Stephan
>
> On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote:
>
>>
Vishal, the answer to your question about IngestionTime is "no".
Ingestion time in this context means the time the data was read by Flink,
not the time it was written to Kafka.
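To make the distinction concrete, here's a tiny plain-Java sketch (not Flink's API; the `Event` class and its field names are made up for illustration): ingestion time is whatever the clock says when Flink first reads the record, while event time is a timestamp the record carries with it, such as the Kafka write time.

```java
// Hypothetical record type: it carries the time it was written to Kafka.
class Event {
    final long kafkaTimestamp; // set by the producer (this is the event time)

    Event(long kafkaTimestamp) {
        this.kafkaTimestamp = kafkaTimestamp;
    }
}

class TimeAssigner {
    // Ingestion time: the wall-clock reading when Flink reads the record.
    static long ingestionTime(Event e, long nowMillis) {
        return nowMillis;
    }

    // Event time: the timestamp carried inside the record itself.
    static long eventTime(Event e) {
        return e.kafkaTimestamp;
    }
}
```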
To get the effect you're looking for, you have to set
TimeCharacteristic.EventTime and follow the instructions here:
Run each job individually as described here:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn
Yes, they will run concurrently and be completely isolated from each other.
-Jamie
On Sun, Jan 27, 2019 at 6:08 AM Eran Twili
t every snapshot is written to a different folder. And
> they are supposed to represent the state of the whole table at a point in
> time.
>
> On Fri, Jan 18, 2019, 8:26 AM Jamie Grier
Sorry, my earlier comment should read: "It would just read all the files in
order and NOT worry about which data rows are in which files"
On Fri, Jan 18, 2019 at 10:00 AM Jamie Grier wrote:
> Hmm.. I would have to look into the code for the StreamingFileSink more
> close
Oh sorry.. Logically, since the ContinuousProcessingTimeTrigger never
PURGES but only FIRES, what I said is semantically true. The window
contents are never cleared.
What I missed is that in this case, since you're using a function that
incrementally reduces on the fly rather than processing all
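A tiny sketch of the FIRE-without-PURGE behavior described above (plain Java, not Flink's Trigger API; names are illustrative): each firing emits the full window contents, and because nothing is ever purged, elements keep accumulating across firings.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a window whose trigger FIREs but never PURGEs.
class FiringWindow {
    private final List<Integer> contents = new ArrayList<>();

    void add(int element) {
        contents.add(element);
    }

    // FIRE: emit a snapshot of the contents, but do NOT clear them.
    List<Integer> fire() {
        return new ArrayList<>(contents);
    }

    // PURGE would be contents.clear() -- this trigger never does that,
    // so every firing sees everything added so far.
}
```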
e from one months ago. And I’m searching a way on how
> to dump this data into a working flink application which already processed
> this data (watermarks are far away from those dates).
>
> On Fri 18. Jan 2019 at 03:22, Jamie Grier wrote:
>
I'm not sure if this is required. It's quite convenient to be able to just
grab a single tarball and have everything you need.
I just did this for the latest binary release and it was 273MB and took
about 25 seconds to download. Of course I know connection speeds vary
quite a bit but I
If I'm understanding you correctly, you're just trying to do some data
reduction so that you write data for each key once every five minutes
rather than for every CDC update. Is that correct? You also want to keep
the state for the most recent key you've ever seen so you don't apply writes
out of
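Assuming that reading is right, the core of the idea can be sketched in plain Java (not Flink's state APIs; the class and field names are hypothetical): keep only the latest update per key, ignore anything older than what you've already seen, and flush the map periodically instead of writing every update.

```java
import java.util.HashMap;
import java.util.Map;

// Keep the newest value per key; flush periodically instead of per update.
class KeyedReducer {
    private final Map<String, Long> latestVersion = new HashMap<>();
    private final Map<String, String> latestValue = new HashMap<>();

    // Apply an update only if it is newer than what we've already seen,
    // so out-of-order updates never overwrite fresher data.
    void update(String key, String value, long version) {
        Long seen = latestVersion.get(key);
        if (seen == null || version > seen) {
            latestVersion.put(key, version);
            latestValue.put(key, value);
        }
    }

    // Called e.g. once every five minutes: one write per key, not per update.
    Map<String, String> flush() {
        return new HashMap<>(latestValue);
    }
}
```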
I don't think I understood all of your question, but with regard to the
watermarking and keys: you are correct that watermarking (event time
advancement) is not per key. Event time is a local property of each Task
in an executing Flink job. It has nothing to do with keys. It has only to
do
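A rough illustration of event time being a per-Task property (plain Java, a deliberately simplified model, not Flink internals): a Task tracks one watermark per input channel, and its event-time clock is the minimum across channels. Keys never enter into it.

```java
import java.util.Arrays;

// Simplified model: a task tracks one watermark per input channel;
// its event-time clock is the minimum across channels. Keys play no role.
class TaskEventTime {
    private final long[] channelWatermarks;

    TaskEventTime(int numChannels) {
        channelWatermarks = new long[numChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    // Watermarks only ever advance on a given channel.
    void onWatermark(int channel, long watermark) {
        channelWatermarks[channel] =
                Math.max(channelWatermarks[channel], watermark);
    }

    // The task's event time is held back by its slowest input channel.
    long currentEventTime() {
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }
}
```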
+1 to what Zhenghua said. I think you're abusing the metrics system.
Rather, just do a stream.keyBy().sum() and then write a Sink to do something
with the data -- for example, push it to your metrics system if you wish.
However, from experience, many metrics systems don't like that sort of
thing.
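The suggested shape, sketched in plain Java rather than the actual DataStream API (the sink interface here is a hypothetical stand-in): aggregate per key inside the job, then hand each updated sum to a sink that forwards it wherever you like.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Stand-in for keyBy().sum() followed by a custom sink.
class KeyedSum {
    private final Map<String, Long> sums = new HashMap<>();
    private final BiConsumer<String, Long> sink; // e.g. push to a metrics system

    KeyedSum(BiConsumer<String, Long> sink) {
        this.sink = sink;
    }

    // For each element, update the running sum for its key and
    // emit the new aggregate to the sink.
    void process(String key, long value) {
        long updated = sums.merge(key, value, Long::sum);
        sink.accept(key, updated);
    }
}
```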
Avi,
The stack trace there is pretty much a red herring. That happens whenever
a job shuts down for any reason and is not a root cause. To diagnose this,
you will want to look at all the TaskManager logs as well as the JobManager
logs. If you have a way to easily grep these (all of them at
n 15, 2019 at 6:27 AM bastien dine wrote:
> Hello Jamie,
>
> Does #1 apply to batch jobs too?
>
> Regards,
>
> --
>
> Bastien DINE
> Data Architect / Software Engineer / Sysadmin
> bastiendine.io
>
>
> On Mon, Jan 14, 2019 at 20:39, Jami
There are a lot of different ways to deploy Flink. It would be easier to
answer your question with a little more context about your use case, but in
general I would advocate the following:
1) Don't run a "permanent" Flink cluster and then submit jobs to it.
Instead what you should do is run an
Flink is designed such that local state is backed up to a highly available
system such as HDFS or S3. When a TaskManager fails state is recovered
from there.
I suggest reading this:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html
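The recovery model can be caricatured in a few lines of plain Java (no Flink APIs; the "durable" map stands in for HDFS/S3): local state is periodically snapshotted to durable storage, and a recovered TaskManager rebuilds from the last completed snapshot, replaying the source for anything after it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: checkpoint local state to a "durable" store and restore from it.
class CheckpointedState {
    Map<String, String> local = new HashMap<>();
    private Map<String, String> durable = new HashMap<>(); // stands in for HDFS/S3

    // Snapshot the current local state to durable storage.
    void checkpoint() {
        durable = new HashMap<>(local);
    }

    // A recovered TaskManager starts from the last completed checkpoint;
    // anything written after it is recomputed by replaying the source.
    void recover() {
        local = new HashMap<>(durable);
    }
}
```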
On Fri, Jan 11, 2019
What you're describing is not possible. There is no runtime context or
metrics you can use at that point.
The best you can probably do (at least for start time) is just keep a flag
in your function and log a metric once and only once when it first starts
executing.
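That flag-based approach might look like this (plain Java; the `onFirstElement` hook is a hypothetical placeholder for whatever logging or metric call you'd make):

```java
// Record a "first element processed" signal exactly once per function instance.
class StartMarkingFunction {
    private boolean started = false;

    void processElement(String element) {
        if (!started) {
            started = true;
            // Fires once, on the very first element this instance sees.
            onFirstElement(System.currentTimeMillis());
        }
        // ... normal processing of the element goes here ...
    }

    // Hypothetical hook: log it, or bump a counter in your metrics system.
    protected void onFirstElement(long timestampMillis) {
    }
}
```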
On Wed, Nov 21, 2018 at 5:18
Hi Avi,
The typical approach would be as you've described in #1. #2 is not
necessary -- #1 is already doing basically exactly that.
-Jamie
On Wed, Nov 21, 2018 at 3:36 AM Avi Levi wrote:
> Hi,
> I am very new to flink so please be gentle :)
>
> *The challenge:*
> I have a road sensor that
Hi Chang,
The partitioning steps, like keyBy(), are not operators. In general, you can
let Flink's fluent-style API tell you the answer: if you can call .uid()
in the API and it compiles, then the thing just before that is an operator ;)
-Jamie
On Wed, Nov 21, 2018 at 5:59 AM Chang Liu wrote:
Hi Vishal,
No, there is no way to do this currently.
On Wed, Nov 21, 2018 at 10:22 AM Vishal Santoshi
wrote:
> Any one ?
>
> On Tue, Nov 20, 2018 at 12:48 PM Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> Is it possible to have checkpointing but reset the kafka offsets to
>>
interpreted
as the hostname for the jobmanager to bind to.
The solution was just to remove `cluster` from that command.
On Tue, Sep 25, 2018 at 10:15 AM Jamie Grier wrote:
> Anybody else seen this and know the solution? We're dead in the water
> with Flink 1.5.4.
>
> On Sun, Sep 23, 2018
Anybody else seen this and know the solution? We're dead in the water with
Flink 1.5.4.
On Sun, Sep 23, 2018 at 11:46 PM alex wrote:
> We started to see same errors after upgrading to flink 1.6.0 from 1.4.2. We
> have one JM and 5 TM on kubernetes. JM is running on HA mode. Taskmanagers
>
Anybody else seen this? I'm running both the JM and TM on the same host in
this setup. This was working fine w/ Flink 1.5.3.
On the TaskManager:
00:31:30.268 INFO o.a.f.r.t.TaskExecutor - Could not resolve
ResourceManager address akka.tcp://flink@localhost:6123/user/resourcemanager,
retrying
Hey Cliff, can you provide the stack trace of the issue you were seeing?
We recently ran into a similar issue that we're still debugging. Did it
look like this:
java.lang.IllegalStateException: Could not initialize operator state
> backend.
> at
>
Hi all,
I'm looking to learn if/how others are running Flink jobs in such a way
that they can survive failure of a single Amazon AWS availability zone.
If you're currently doing this I would love a reply detailing your setup.
Thanks!
-Jamie
but i
> need to know the topic to write to and for that I need to be able to read
> the key. Is there a way to do this?
>
>
> Is there a better way to do this, rather than using a KeyedStream.
>
>
> Paul
>
--
Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com
.print()
> -
> While this executes, it breaks the assignment of the keys to the tasks:
> The "ExpensiveOperation" is now not executed on the same nodes anymore all
> the time (visible by the prefixes in the print()).
>
> What am I doing wrong? Is the only cha
gt; to user sessions identified for windows.
>
> 4. I also may have an additional requirement of writing out each event
> enriched with current session and profile data. I basically could do this
> again with generic window function and write out each event with collector
> when it
ome code on github (tests files) where it’s done using the
> underlying akka framework, I don’t mind doing it the same way and creating
> an actor to get notifications messages, but I don’t know the best way, and
> there probably is a better one.
>
>
>
> Thanks in advance,
>
>
>
-docs-release-1.1/concepts/concepts.html
On Thu, Dec 15, 2016 at 3:21 PM, Jamie Grier <ja...@data-artisans.com>
wrote:
> All streams can be parallelized in Flink even with only one source. You
> can have multiple sinks as well.
>
> On Thu, Dec 15, 2016 at 7:56 AM, Meghashyam Sandee
ltiple sinks as well?
>
> On Dec 14, 2016 10:46 PM, <dromitl...@gmail.com> wrote:
>
>> Got it. Thanks!
>>
>> On Dec 15, 2016, at 02:58, Jamie Grier <ja...@data-artisans.com> wrote:
>>
>> Ahh, sorry, for #2: A single Flink job can have as many sou
when using a custom trigger, not the way it assigns windows, it makes sense
> now.
>
> Regarding #4, after doing some more tests I think it's more complex than I
> first thought. I'll probably create another thread explaining more that
> specific question.
>
> Thanks,
> Matt
&
rigger that fires on every new element, with up
> to 10 elements at a time. The result would be windows of sizes: 1 element,
> then 2, 3, ..., 9, 10, 10, 10, Is there a way to achieve this with
> predefined triggers or a custom trigger is the only way to go here?
>
> Best rega
PM, Jamie Grier <ja...@data-artisans.com> wrote:
> Another note. In the example the template variable type is "custom" and
> the values have to be enumerated manually. So in your case you would have
> to configure all the possible values of "subtask" to be 0-49.
&g
Another note. In the example the template variable type is "custom" and
the values have to be enumerated manually. So in your case you would have
to configure all the possible values of "subtask" to be 0-49.
On Tue, Nov 1, 2016 at 2:43 PM, Jamie Grier <ja...@d
t; multiple subtask values. I have tried the 'All' option for this templating
> variable- This give me an incorrect plot showing me negative values while
> the individual selection of subtask values when selected from the
> templating variable drop down yields correct result.
>
> Thank you!
>
> Regards,
> Anchit
>
>
>
one give me some hints on how Flink manage window buffer and
> how streaming manages its memory. I see this page on batch api memory
> management and wonder what is the equivalent for streaming?
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
>
> --
> C
anager and task-manager containers.
>
> Thanks,
> Prabhu
>
> On Thu, Aug 4, 2016 at 3:07 PM, Jamie Grier <ja...@data-artisans.com>
> wrote:
>
>> Use *env.java.opts*
>>
>> This will be respected by the YARN client.
>>
>>
>>
>> On Thu,
nored by the yarn client, is there a
> way to set the jvm opts for yarn ?
>
> Thanks,
> Prabhu
>
> On Wed, Aug 3, 2016 at 7:03 PM, Prabhu V <vpra...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there a way to set jvm options on the yarn application-manager and
>> task-
or adding new operations, windows, etc to a
> running application? Should I start multiple execution contexts?
>
stock market data. In this, for every symbol i
> want to find max of stock price in last 10 mins. I want to generate
> watermarks specific to key rather than across the stream. Is this possible
> in flink?
>
> --
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
to big logs files (>4MB).
>
> How can I disable from my Java code (through the Configuration object)
> the progress messages displayed in console?
>
> Thanks,
> Andres
>
his is the way to go. My
>> state
>> > backend is HDFS and I can see that the checkpoint path has the data
>> that has
>> > been buffered in the window.
>> >
>> > I want to start the job in a way such that it will read the
>> checkpointed
>> > data before
> estimation. This approach is based on estimation and may add execution
> latency to those windows.
>
> Which would be suggested way in general?
>
> Thanks,
> Chen
>
>
>
>
ately or is failure recovery a function
> of HA?
>
> 5. How would I migrate the RocksDB state once I move to HA mode? Is there
> a straight forward path?
>
> Thanks for your time,
>
> Ryan
>
ailure ?
>
> Thanks,
> Prabhu
>
>
>
.tuple.Tuple]*
>
> Is this a Scala issue? Should I switch over to Java?
>
>
> Thanks!
> Eamon
>
asReachedEnd;
> }
> }
>
> This class returns the content of the whole file as a string.
>
> Is this the right approach?
> It seems to work when run locally with local files but I wonder if it would
> run into problems when tested in a cluster.
>
> Thanks in advance.
>
trol
>> > > kafka topic. The concern we had there was we would almost
>> completely
>> > > lose insight into what was going on if there was a slow down.
>> > > 3. The current approach we are using for creating dynamic jobs is
>> >