Re: Is taskmanager.heap.mb a valid configuration parameter in 1.7?

2019-03-07 Thread Yun Tang
Hi Yes, `taskmanager.heap.mb` is deprecated but still useful to keep backward comparability. I have already created an issue https://issues.apache.org/jira/browse/FLINK-11860 to move these deprecated options in documentation. Best Yun Tang From: anaray

Is taskmanager.heap.mb a valid configuration parameter in 1.7?

2019-03-07 Thread anaray
Hi, I see a reference about *taskmanager.heap.mb* in 1.7.1 config docs (https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html). I thought taskmanager.heap.mb got deprecated and new config is taskmanager.heap.size. Please correct me if I am wrong here. Thank you. -- Sent

Re: [ANNOUNCEMENT] March 2019 Bay Area Apache Flink Meetup

2019-03-07 Thread Xuefu Zhang
Hi all, As an update, this meetup will take place at 505 Brannan St · San Francisco, CA . Many thanks to Pinterest for their generosity of hosting the

Discrepancy between the part length file's length and the part file length during recover

2019-03-07 Thread Vishal Santoshi
Hello folks, I have flink 1.7.2 working with hadoop 2.6 and b'coz there is no in build truncate ( in hadoop 2.6 ) I am writing a method to cleanup ( truncate ) part files based on the length in the valid-length files dropped by flink during restore. I see some thing very strange

local disk cleanup after crash

2019-03-07 Thread Derek VerLee
I think that effort is put in to have task managers clean up their folders, however I have noticed that in some cases local folders are not cleaned up and can build up, eventually causing problems due to a full disk.  As far as I know this only happens with

Re: Schema Evolution on Dynamic Schema

2019-03-07 Thread Shahar Cizer Kobrinsky
Thanks for the response Rong. Would be happy to clarify more. So there are two possible changes that could happen: 1. There could be a change in the incoming source schema. Since there's a deserialization phase here (JSON -> Pojo) i expect a couple of options. Backward compatible changes

Re: Schema Evolution on Dynamic Schema

2019-03-07 Thread Rong Rong
Hi Shahar, I wasn't sure which schema are you describing that is going to "evolve" (is it the registered_table? or the output sink?). It will be great if you can clarify more. For the example you provided, IMO it is more considered as logic change instead of schema evolution: - if you are

Re: Timestamp synchronized message consumption across kafka partitions

2019-03-07 Thread Gerard Garcia
I'll answer myself. I guess the most viable option for now is to wait for the work in http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-td24489.html On Thu, Mar 7, 2019, 3:24 PM gerardg wrote: > I'm wondering if there is a way to avoid consuming too

Consume only a few of kafka topic partitions

2019-03-07 Thread Marke Builder
Hi, it is possible that my flume-kafka consumer read only from part of all Kafka patitions? What I mean is that I can use kafka to route certain messages into specific partitions. And with my flink job I would only consume this partitions (not all topic partitions). Thanks! Marke

expose number of entries in map state as a metric

2019-03-07 Thread Rinat
Hi mates, I would like to expose the number of keys in MapState as a job metric At first, I decided the Gauge metric is suitable for this purpose but then I think a little and decided that if I will iterate over the whole state on each request to Gauge, it will be too heavy. So, I decided to

Understanding timestamp and watermark assignment errors

2019-03-07 Thread Andrew Roberts
Hello, I’m trying to convert some of our larger stateful computations into something that aligns more with the Flink windowing framework, and particularly, start using “event time” instead of “ingest time” as a time characteristics. My data is coming in from Kafka (0.8.2.2, using the

Backoff strategies for async IO functions?

2019-03-07 Thread William Saar
Hi, Is there a way to specify an exponential backoff strategy for when async function calls fail? I have an async function that does web requests to a rate-limited API. Can you handle that with settings on the async function call? Thanks, William

Timestamp synchronized message consumption across kafka partitions

2019-03-07 Thread gerardg
I'm wondering if there is a way to avoid consuming too fast from partitions that not have as much data as the other ones in the same topic by keeping them more or less synchronized by its ingestion timestamp. Similar to what kafka streams does:

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-07 Thread Becket Qin
Absolutely! Thanks for the pointer. I'll submit a PR to update the ecosystem page and the navigation. Thanks, Jiangjie (Becket) Qin On Thu, Mar 7, 2019 at 8:47 PM Robert Metzger wrote: > Okay. I will reach out to spark-packages.org and see if they are willing > to share. > > Do you want to

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-07 Thread Robert Metzger
Okay. I will reach out to spark-packages.org and see if they are willing to share. Do you want to raise a PR to update the ecosystem page (maybe sync with the "Software Projects" listed here: https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink) and link it in the navigation? Best,

Flink zookeeper HA problem

2019-03-07 Thread Harris, Mark
Hi, We've got a problem trying to set up two flink clusters using the same zookeeper instance that we wonder if anyone has seen before or has any advice on. Our setup is two AWS EMR clusters running flink (v1.7.2) that are both trying to use a single zookeeper cluster (v3.4.6-1569965) for

Re: REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
Yep that was it. I have created https://issues.apache.org/jira/browse/FLINK-11853 so that it is easier for others to work around if they have restrictions on the HTTP client library choice On Thu, 7 Mar 2019 at 11:47, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Thu, 7 Mar

Re: REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
On Thu, 7 Mar 2019 at 11:33, Chesnay Schepler wrote: > I've heard of cases where client libraries are automatically changing > the HTTP method when provided with a body. > Hmmm thanks for that... I'll dig into it > > To figure out what exactly is received by Flink, enable TRACE logging, > try

Re: REST API question GET /jars/:jarid/plan

2019-03-07 Thread Chesnay Schepler
I've heard of cases where client libraries are automatically changing the HTTP method when provided with a body. To figure out what exactly is received by Flink, enable TRACE logging, try again and look for logging messages from "org.apache.flink.runtime.rest.handler.router.RouterHandler"

Re: DataStream EventTime last data cannot be output?

2019-03-07 Thread Stephen Connolly
I had this issue myself. Your timestamp assigner will only advance the window as it receives data, thus when you reach the end of the data there will be data which is newer than the last window. One solution is to have the source flag that there will be no more data. If you can do this then that

Re: okio and okhttp not shaded in the Flink Uber Jar on EMR

2019-03-07 Thread Chesnay Schepler
There is no (accurate) reference of included dependencies for the flink-shaded-hadoop uber jars. The contained NOTICE file is a good starting point, but for the time being we're using a generalized version that we apply to all hadoop versions (so some things may be missing). I believe for

Re: Timer question

2019-03-07 Thread Flavio Pompermaier
Here it is: https://issues.apache.org/jira/browse/FLINK-11852 for the moment On Thu, Mar 7, 2019 at 11:49 AM Kostas Kloudas wrote: > I believe you are right. > > It would be helpful to modify the example to delete the redundant timers > and some > text that explains that timers are also state

Re: Timer question

2019-03-07 Thread Kostas Kloudas
I believe you are right. It would be helpful to modify the example to delete the redundant timers and some text that explains that timers are also state and users should pay attention to that. Would you like to open a JIRA and submit a PR? Cheers, Kostas On Thu, Mar 7, 2019 at 11:30 AM Flavio

Re: Timer question

2019-03-07 Thread Flavio Pompermaier
Yes, you're right Kostas..in my code I was using processing time so I forgot to replace it with event time (used by the example). Maybe it could worth it to mention this problem in the doc..like pros and cons. What do you think? On Thu, Mar 7, 2019 at 11:27 AM Kostas Kloudas wrote: > Hi Flavio,

Re: Timer question

2019-03-07 Thread Kostas Kloudas
Hi Flavio, In general, deleting the redundant timers is definitely more memory-friendly. The reason why in the docs the code is presented the way it is, is: 1) it is mainly for pedagogical purposes, and 2) when the docs were written, Flink mechanism for deleting timers was not efficient as it

REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
In the documentation for the /jars/:jarid/plan endpoint https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-plan It says: > Program arguments can be passed both via the JSON request (recommended) or query parameters. Has anyone got sample code that sends

Timer question

2019-03-07 Thread Flavio Pompermaier
Hi to all, I was writing a process function similar to the one described in the Flink docs at [1]. Basically I need to set a timeout before emitting elements. However, the proposed approach creates a timer for every incoming tuple..isn't it dangerous if a key receives a very big burst of events?

Re: Task slot sharing: force reallocation

2019-03-07 Thread Till Rohrmann
The community no longer actively supports Flink < 1.6. Therefore I would try out whether you can upgrade to one of the latest versions. However, be aware that we reworked Flink's distributed architecture which slightly affected the scheduling behavior. In your case, it should actually be

?????? sql-client batch ????????????

2019-03-07 Thread yuess_coder
??debug # Define tables here such as sources, sinks, views, or temporal tables. tables: [] # empty list # Define scalar, aggregate, or table functions here. functions: [] # empty list # Execution properties allow for changing

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-07 Thread Becket Qin
Hi Robert, I think it at least worths checking if spark-packages.org owners are willing to share. Thanks for volunteering to write the requirement descriptions! In any case, that will be very helpful. Since a static page has almost no cost, and we will need it to redirect to the dynamic site