Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-26 Thread Jamie Grier
This is awesome, Stephan! Thanks for doing this. -Jamie On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote: > Here is the pull request with a draft of the roadmap: > https://github.com/apache/flink-web/pull/178 > > Best, > Stephan > > On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote: > >>

Re: About KafkaConsumer and WM'ing and EventTime charactersitics

2019-01-30 Thread Jamie Grier
Vishal, that answer to your question about IngestionTime is "no". Ingestion time in this context means the time the data was read by Flink not the time it was written to Kafka. To get the effect you're looking for you have to set TimeCharacteristic.EventTime and follow the instructions here:

Re: Flink Yarn Cluster - Jobs Isolation

2019-01-29 Thread Jamie Grier
Run each job individually as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn Yes they will run concurrently and be completely isolated from each other. -Jamie On Sun, Jan 27, 2019 at 6:08 AM Eran Twili

Re: Dump snapshot of big table in real time using StreamingFileSink

2019-01-18 Thread Jamie Grier
t every snapshot is written to a different folder. And > they are supposed to represent the state of the whole table at a point in > time. > > On Fri, Jan 18, 2019, 8:26 AM Jamie Grier >> Oh sorry.. Logically, since the ContinuousProcessingTimeTrigger never >> PUR

Re: Dump snapshot of big table in real time using StreamingFileSink

2019-01-18 Thread Jamie Grier
Sorry my earlier comment should read: "It would just read all the files in order and NOT worry about which data rows are in which files" On Fri, Jan 18, 2019 at 10:00 AM Jamie Grier wrote: > Hmm.. I would have to look into the code for the StreamingFileSink more > close

Re: Dump snapshot of big table in real time using StreamingFileSink

2019-01-18 Thread Jamie Grier
Oh sorry.. Logically, since the ContinuousProcessingTimeTrigger never PURGES but only FIRES what I said is semantically true. The window contents are never cleared. What I missed is that in this case since you're using a function that incrementally reduces on the fly rather than processing all

Re: Any advice on how to replay an event-timed stream?

2019-01-18 Thread Jamie Grier
e from one months ago. And I’m searching a way on how > to dump this data into a working flink application which already processed > this data (watermarks are far away from those dates). > > On Fri 18. Jan 2019 at 03:22, Jamie Grier wrote: > >> I don't think I understood all of yo

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-18 Thread Jamie Grier
I'm not sure if this is required. It's quite convenient to be able to just grab a single tarball and you've got everything you need. I just did this for the latest binary release and it was 273MB and took about 25 seconds to download. Of course I know connection speeds vary quite a bit but I

Re: Dump snapshot of big table in real time using StreamingFileSink

2019-01-17 Thread Jamie Grier
If I'm understanding you correctly you're just trying to do some data reduction so that you write data for each key once every five minutes rather than for every CDC update.. Is that correct? You also want to keep the state for most recent key you've ever seen so you don't apply writes out of

Re: Any advice on how to replay an event-timed stream?

2019-01-17 Thread Jamie Grier
I don't think I understood all of your question but with regard to the watermarking and keys.. You are correct that watermarking (event time advancement) is not per key. Event-time is a local property of each Task in an executing Flink job. It has nothing to do with keys. It has only to do

Re: Issue with counter metrics for large number of keys

2019-01-17 Thread Jamie Grier
+1 to what Zhenghua said. You're abusing the metrics system I think. Rather just do a stream.keyBy().sum() and then write a Sink to do something with the data -- for example push it to your metrics system if you wish. However, from experience, many metrics systems don't like that sort of thing.

Re: Getting RemoteTransportException

2019-01-17 Thread Jamie Grier
Avi, The stack trace there is pretty much a red herring. That happens whenever a job shuts down for any reason and is not a root cause. To diagnose this you will want to look at all the TaskManager logs as well as the JobManager logs. If you have a way to easily grep these (all of them at

Re: One TaskManager per node or multiple TaskManager per node

2019-01-15 Thread Jamie Grier
n 15, 2019 at 6:27 AM bastien dine wrote: > Hello Jamie, > > Does #1 apply to batch jobs too ? > > Regards, > > -- > > Bastien DINE > Data Architect / Software Engineer / Sysadmin > bastiendine.io > > > Le lun. 14 janv. 2019 à 20:39, Jami

Re: One TaskManager per node or multiple TaskManager per node

2019-01-14 Thread Jamie Grier
There are a lot of different ways to deploy Flink. It would be easier to answer your question with a little more context about your use case but in general I would advocate the following: 1) Don't run a "permanent" Flink cluster and then submit jobs to it. Instead what you should do is run an

Re: What happen to state in Flink Task Manager when crash?

2019-01-11 Thread Jamie Grier
Flink is designed such that local state is backed up to a highly available system such as HDFS or S3. When a TaskManager fails state is recovered from there. I suggest reading this: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html On Fri, Jan 11, 2019

Re: Metric on JobManager

2018-11-21 Thread Jamie Grier
What you're describing is not possible. There is no runtime context or metrics you can use at that point. The best you can probably do (at least for start time) is just keep a flag in your function and log a metric once and only once when it first starts executing. On Wed, Nov 21, 2018 at 5:18

Re: your advice please regarding state

2018-11-21 Thread Jamie Grier
Hi Avi, The typical approach would be as you've described in #1. #2 is not necessary -- #1 is already doing basically exactly that. -Jamie On Wed, Nov 21, 2018 at 3:36 AM Avi Levi wrote: > Hi , > I am very new to flink so please be gentle :) > > *The challenge:* > I have a road sensor that

Re: Assign IDs to Operators

2018-11-21 Thread Jamie Grier
Hi Chang, The partitioning steps, like keyBy() are not operators. In general you can let Flink's fluent-style API tell you the answer. If you can call .uid() in the API and it compiles then the thing just before that is an operator ;) -Jamie On Wed, Nov 21, 2018 at 5:59 AM Chang Liu wrote:

Re: Reset kafka offets to latest on restart

2018-11-21 Thread Jamie Grier
Hi Vishal, No, there is no way to do this currently. On Wed, Nov 21, 2018 at 10:22 AM Vishal Santoshi wrote: > Any one ? > > On Tue, Nov 20, 2018 at 12:48 PM Vishal Santoshi < > vishal.santo...@gmail.com> wrote: > >> Is it possible to have checkpointing but reset the kafka offsets to >>

Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-25 Thread Jamie Grier
interpreted as the hostname for the jobmanager to bind to. The solution was just to remove `cluster` from that command. On Tue, Sep 25, 2018 at 10:15 AM Jamie Grier wrote: > Anybody else seen this and know the solution? We're dead in the water > with Flink 1.5.4. > > On Sun, Sep 23, 2018

Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-25 Thread Jamie Grier
Anybody else seen this and know the solution? We're dead in the water with Flink 1.5.4. On Sun, Sep 23, 2018 at 11:46 PM alex wrote: > We started to see same errors after upgrading to flink 1.6.0 from 1.4.2. We > have one JM and 5 TM on kubernetes. JM is running on HA mode. Taskmanagers >

Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-21 Thread Jamie Grier
Anybody else seen this? I'm running both the JM and TM on the same host in this setup. This was working fine w/ Flink 1.5.3. On the TaskManager: 00:31:30.268 INFO o.a.f.r.t.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@localhost:6123/user/resourcemanager, retrying

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Jamie Grier
Hey Cliff, can you provide the stack trace of the issue you were seeing? We recently ran into a similar issue that we're still debugging. Did it look like this: java.lang.IllegalStateException: Could not initialize operator state > backend. > at >

Running Flink in multiple AWS availability zones

2018-08-16 Thread Jamie Grier
Hi all, I'm looking to learn if/how others are running Flink jobs in such a way that they can survive failure of a single Amazon AWS availability zone. If you're currently doing this I would love a reply detailing your setup. Thanks! -Jamie

Re: Getting key from keyed stream

2017-01-12 Thread Jamie Grier
but i > need to know the topic to write to and for that I need to be able to read > the key. Is there a way to do this? > > > Is there a better way to do this, rather than using a KeyedStream. > > > Paul > ​ -- Jamie Grier data Artisans, Director of Applications En

Re: keyBy called twice. Second time, INetAddress and Array[Byte] are empty

2017-01-10 Thread Jamie Grier
t? > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/keyBy-called- > twice-Second-time-INetAddress-and-Array-Byte-are-empty-tp10907p10947.html > Sent from the Apache Flink User Mailing List archive. mailing list arc

Re: Set Parallelism and keyBy

2017-01-02 Thread Jamie Grier
.print() > - > While this executes, it breaks the assignment of the keys to the tasks: > The "ExpensiveOperation" is now not executed on the same nodes anymore all > the time (visible by the prefixes in the print()). > > What am I doing wrong? Is the only cha

Re: Flink streaming questions

2017-01-02 Thread Jamie Grier
gt; to user sessions identified for windows. > > 4. I also may have an additional requirement of writing out each event > enriched with current session and profile data. I basically could do this > again with generic window function and write out each event with collector > when it

Re: Programmatically get live values of accumulators

2017-01-02 Thread Jamie Grier
ome code on github (tests files) where it’s done using the > underlying akka framework, I don’t mind doing it the same way and creating > an actor to get notifications messages, but I don’t know the best way, and > there probably is a better one. > > > > Thanks in advance, > > >

Re: Hi, There is possibly an issue with EventTimeSessionWindows where a gap is specified for considering items in the same session. Here the logic is, if two adjacent items have a difference in event

2017-01-02 Thread Jamie Grier
Hemel Hempstead, Hertfordshire, HP2 > 4NN. Rave Technologies (India) Pvt Limited, registered in India under > number 117068 with a registered address of 2nd Floor, Ballard House, Adi > Marzban Marg, Ballard Estate, Mumbai, Maharashtra, India, 41. > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Multiple consumers and custom triggers

2016-12-15 Thread Jamie Grier
-docs-release-1.1/concepts/concepts.html On Thu, Dec 15, 2016 at 3:21 PM, Jamie Grier <ja...@data-artisans.com> wrote: > All streams can be parallelized in Flink even with only one source. You > can have multiple sinks as well. > > On Thu, Dec 15, 2016 at 7:56 AM, Meghashyam Sandee

Re: Multiple consumers and custom triggers

2016-12-15 Thread Jamie Grier
ltiple sinks as well? > > On Dec 14, 2016 10:46 PM, <dromitl...@gmail.com> wrote: > >> Got it. Thanks! >> >> On Dec 15, 2016, at 02:58, Jamie Grier <ja...@data-artisans.com> wrote: >> >> Ahh, sorry, for #2: A single Flink job can have as many sou

Re: Multiple consumers and custom triggers

2016-12-14 Thread Jamie Grier
when using a custom trigger, not the way it assigns windows, it makes sense > now. > > Regarding #4, after doing some more tests I think it's more complex than I > first thought. I'll probably create another thread explaining more that > specific question. > > Thanks, > Matt &

Re: Multiple consumers and custom triggers

2016-12-14 Thread Jamie Grier
rigger that fires on every new element, with up > to 10 elements at a time. The result would be windows of sizes: 1 element, > then 2, 3, ..., 9, 10, 10, 10, Is there a way to achieve this with > predefined triggers or a custom trigger is the only way to go here? > > Best rega

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-02 Thread Jamie Grier
http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Flink-Metrics- > InfluxDB-Grafana-Help-with-query-influxDB-query-for- > Grafana-to-plot-numRecordsIn-numRen-tp9775p9819.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com.

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
p-with-query-influxDB-query-for- > Grafana-to-plot-numRecordsIn-numRen-tp9775p9816.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
PM, Jamie Grier <ja...@data-artisans.com> wrote: > Another note. In the example the template variable type is "custom" and > the values have to be enumerated manually. So in your case you would have > to configure all the possible values of "subtask" to be 0-49. &g

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
Another note. In the example the template variable type is "custom" and the values have to be enumerated manually. So in your case you would have to configure all the possible values of "subtask" to be 0-49. On Tue, Nov 1, 2016 at 2:43 PM, Jamie Grier <ja...@d

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
t; multiple subtask values. I have tried the 'All' option for this templating > variable- This give me an incorrect plot showing me negative values while > the individual selection of subtask values when selected from the > templating variable drop down yields correct result. > > Thank you! > > Regards, > Anchit > > > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Memory Management in Streaming?

2016-09-03 Thread Jamie Grier
one give me some hints on how Flink manage window buffer and > how streaming manages its memory. I see this page on batch api memory > management and wonder what is the equivalent for streaming? > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525 > > -- > C

Re: set flink yarn jvm options

2016-08-04 Thread Jamie Grier
anager and task-manager containers. > > Thanks, > Prabhu > > On Thu, Aug 4, 2016 at 3:07 PM, Jamie Grier <ja...@data-artisans.com> > wrote: > >> Use *env.java.opts* >> >> This will be respected by the YARN client. >> >> >> >> On Thu,

Re: set flink yarn jvm options

2016-08-04 Thread Jamie Grier
nored by the yarn client, is there a > way to set the jvm opts for yarn ? > > Thanks, > Prabhu > > On Wed, Aug 3, 2016 at 7:03 PM, Prabhu V <vpra...@gmail.com> wrote: > >> Hi, >> >> Is there a way to set jvm options on the yarn application-manager and >> task-

Re: Adding and removing operations after execute

2016-07-07 Thread Jamie Grier
or adding new operations, windows, etc to a > running application? Should I start multiple execution contexts? > > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Adding-and-removing-operations-after-execute-tp7863.htm

Re: Timewindow watermarks specific to keys of stream

2016-07-05 Thread Jamie Grier
stock market data. In this, for every symbol i > want to find max of stock price in last 10 mins. I want to generate > watermarks specific to key rather than across the stream. Is this possible > in flink? > > -- > Regards, > Madhukara Phatak > http://datamantra.io/ > -

Re: disable console output

2016-07-05 Thread Jamie Grier
to big logs files (>4MB). > > How can I disable from my Java code (through the Configuration object) > the progress messages displayed in console? > > Thanks, > Andres > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Failed job restart - flink on yarn

2016-07-05 Thread Jamie Grier
his is the way to go. My >> state >> > backend is HDFS and I can see that the checkpoint path has the data >> that has >> > been buffered in the window. >> > >> > I want to start the job in a way such that it will read the >> checkpointed >> > data before

Re: Late arriving events

2016-07-05 Thread Jamie Grier
> estimation. This approach is based on estimation and may add execution > latency to those windows. > > Which would be suggested way in general? > > Thanks, > Chen > > > > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Error submitting stand-alone Flink job to EMR YARN cluster

2016-07-03 Thread Jamie Grier
t; > > *HERE Seattle* > > 701 Pike St, Suite 2000, Seattle, WA 98101 > > *47° 36' 41" N. 122° 19' 57" W > <http://here.com/usa/seattle/98101/pike-st/701/map=47.611439,-122.332741,17/title=HERE%20Seattle%20-%20701%20Pike%20Street>* > > > > <http://360.here.com/> <https://twitter.com/here> > <https://www.facebook.com/here><https://linkedin.com/company/heremaps> > <https://www.instagram.com/here> > > > > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Using standalone single node without HA in production, crazy?

2016-07-01 Thread Jamie Grier
ately or is failure recovery a function > of HA? > > 5. How would I migrate the RocksDB state once I move to HA mode? Is there > a straight forward path? > > Thanks for your time, > > Ryan > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Failed job restart - flink on yarn

2016-07-01 Thread Jamie Grier
ailure ? > > Thanks, > Prabhu > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Failed-job-restart-flink-on-yarn-tp7764.html > Sent from the Apache Flink User Mailing List archive. mailing list archive

Re: Error submitting stand-alone Flink job to EMR YARN cluster

2016-07-01 Thread Jamie Grier
attle* > > 701 Pike St, Suite 2000, Seattle, WA 98101 > > *47° 36' 41" N. 122° 19' 57" W > <http://here.com/usa/seattle/98101/pike-st/701/map=47.611439,-122.332741,17/title=HERE%20Seattle%20-%20701%20Pike%20Street>* > > > > <http://360.here.com/> <https://twitter.com/here> > <https://www.facebook.com/here><https://linkedin.com/company/heremaps> > <https://www.instagram.com/here> > > > > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Cassamdra Connector in Scala

2016-06-20 Thread Jamie Grier
.tuple.Tuple]* > > Is this a Scala issue? Should I switch over to Java? > > > Thanks! > Eamon > -- Jamie Grier data Artisans, Director of Applications Engineering @jamiegrier <https://twitter.com/jamiegrier> ja...@data-artisans.com

Re: Reading whole files (from S3)

2016-06-07 Thread Jamie Grier
asReachedEnd; > } > } > > This class returns the content of the whole file as a string. > > Is this the right approach? > It seems to work when run locally with local files but I wonder if it would > run into problems when tested in a cluster. > > Thanks in advance. >

Re: Multi-field "sum" function just like "keyBy"

2016-06-07 Thread Jamie Grier
luding any attachments thereto) without > producing, distributing or retaining any copies thereof. Any review, > dissemination or other use of, or taking of any action in reliance upon, > this information by persons or entities other than the intended > recipient(s) is prohibited. Thank you

Re: Large Numbers of Dynamically Created Jobs

2016-03-22 Thread Jamie Grier
trol >> > > kafka topic. The concern we had there was we would almost >> completely >> > > lose insight into what was going on if there was a slow down. >> > > 3. The current approach we are using for creating dynamic jobs is >> >