Re: jobmanager holds too many CLOSE_WAIT connection to datanode

2018-08-23 Thread vino yang
Hi Youjun, How long has your job been running? As far as I know, if it has only been running for a short time, checkpointing should not cause the JobManager to open so many connections to HDFS. What is your Flink cluster environment, standalone or Flink on YARN? In addition, does the JM's log show any timeout

Re: question about setting different time window for operators

2018-08-23 Thread vino yang
Hi Stephen, Yes, of course. You can reuse the upstream stream object to build different windows. Thanks, vino. Stephen wrote on Fri, Aug 24, 2018 at 2:40 AM: > Hi, > Is it possible to set different operators with different time windows in > a pipeline? For example, for the wordcount example, could I set

question about setting different time window for operators

2018-08-23 Thread Stephen
Hi, Is it possible to set different operators with different time windows in a pipeline? For example, for the wordcount example, could I set the execution period of one filter operator to 2s but another to 3s? Thank you.
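As vino's reply above confirms, the same upstream stream can feed several independent window operators. A minimal sketch of the idea in plain Python (not the Flink API; the timestamps and window sizes are made up for illustration) — one event stream grouped into tumbling windows of two different widths:

```python
def tumbling_windows(stream, size):
    """Group (timestamp, value) pairs into tumbling windows of width `size`."""
    windows = {}
    for ts, value in stream:
        # Each timestamp falls into exactly one window bucket: ts // size.
        windows.setdefault(ts // size, []).append(value)
    return [windows[k] for k in sorted(windows)]

# One stream, two window sizes -- analogous to applying two different
# window operators to the same upstream DataStream.
events = [(t, t) for t in range(6)]          # timestamps 0..5
two_sec = tumbling_windows(events, 2)        # windows of width 2
three_sec = tumbling_windows(events, 3)      # windows of width 3
print(two_sec)    # [[0, 1], [2, 3], [4, 5]]
print(three_sec)  # [[0, 1, 2], [3, 4, 5]]
```

In Flink itself this corresponds to calling `.keyBy(...).window(...)` twice on the same source stream, each with its own window assigner.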

AvroSchemaConverter and Tuple classes

2018-08-23 Thread françois lacombe
Hi all, I'm looking for best practices regarding Tuple instance creation. I have a TypeInformation object produced by AvroSchemaConverter.convertToTypeInfo("{...}"); Is it possible to define a corresponding Tuple instance with it (get the T from the TypeInformation)? Example: { "type":
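The core of the question is mapping a record schema's field order onto tuple positions. A hypothetical illustration in plain Python (this is not Flink's AvroSchemaConverter or TypeInformation API; the `Word` schema is invented for the example) of deriving a positional tuple from an Avro-style record schema:

```python
import json

# An Avro-style record schema; field order fixes the tuple positions,
# which is roughly what mapping a row-typed TypeInformation onto a
# TupleN amounts to.
schema = json.loads("""{
  "type": "record",
  "name": "Word",
  "fields": [
    {"name": "word",  "type": "string"},
    {"name": "count", "type": "int"}
  ]
}""")

def to_tuple(record, schema):
    # Pick each field out of the record in schema order.
    return tuple(record[f["name"]] for f in schema["fields"])

print(to_tuple({"word": "flink", "count": 3}, schema))  # ('flink', 3)
```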

jobmanager holds too many CLOSE_WAIT connection to datanode

2018-08-23 Thread Yuan,Youjun
Hi, After running for a while, my JobManager holds thousands of CLOSE_WAIT TCP connections to HDFS datanodes; the number grows slowly and will likely hit the max open file limit. My jobs checkpoint to HDFS every minute. If I run lsof -i -a -p $JMPID, I get tons of lines like the following
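A quick way to track the growth of the leak is to count the CLOSE_WAIT lines. On the real JobManager host you would pipe `lsof -i -a -p $JMPID` in; here two invented sample lines stand in for the lsof output so the pipeline is self-contained:

```shell
# Count connections stuck in CLOSE_WAIT (sample input in place of lsof).
printf '%s\n' \
  'java 123 flink 45u IPv4 TCP jm:34567->dn1:50010 (CLOSE_WAIT)' \
  'java 123 flink 46u IPv4 TCP jm:34568->dn2:50010 (ESTABLISHED)' \
  'java 123 flink 47u IPv4 TCP jm:34569->dn1:50010 (CLOSE_WAIT)' |
  grep -c 'CLOSE_WAIT'
# prints 2
```

Running this periodically (against the real lsof output) shows whether the count climbs monotonically with each checkpoint interval.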

Re: Data loss when restoring from savepoint

2018-08-23 Thread Juho Autio
I changed to allowedLateness=0; no change, still missing data when restoring from a savepoint. On Tue, Aug 21, 2018 at 10:43 AM Juho Autio wrote: > I realized that BucketingSink must not play any role in this problem. This > is because only when the 24-hour window triggers, BucketingSink gets a

Re: Question about QueryableState

2018-08-23 Thread Kostas Kloudas
Hi Pierre, You are right that this should not happen. It seems like a bug. Could you open a JIRA and post it here? Thanks, Kostas > On Aug 21, 2018, at 9:35 PM, Pierre Zemb wrote: > > Hi! > > I’ve started to deploy a small Flink cluster (4tm and 1jm for now on 1.6.0), > and deployed a small

Re: lack of function and low usability of provided function

2018-08-23 Thread Fabian Hueske
Hi Henry, Flink is an open source project; new built-in functions are constantly contributed to Flink. Right now, there are more than 5 open PRs to add or improve various functions. If you find that some functions are not working correctly or could be improved, you can open a Jira issue. The

Re: lack of function and low usability of provided function

2018-08-23 Thread Timo Walther
Hi Henry, thanks for giving feedback. The set of built-in functions is a continuous effort that will never be considered "done". If you think a function should be supported, you can open issues under FLINK-6810 and we can discuss its priority. Flink is an open source project, so feel also

Re: lack of function and low usability of provided function

2018-08-23 Thread vino yang
Hi Henry, I recently submitted some PRs for scalar functions; some have been merged and some are being reviewed, and some may be what you need. LOG2(x): https://issues.apache.org/jira/browse/FLINK-9928, will be released in Flink 1.7. EXP(x): exists here

lack of function and low usability of provided function

2018-08-23 Thread 徐涛
Hi All, I found Flink lacks some basic functions, for example string split, regular expression support, and JSON parse/extract support. These functions are used frequently in development, but they are not supported, so users have to write UDFs for them. And some of the

Re: Implement Joins with Lookup Data

2018-08-23 Thread Hequn Cheng
Hi Harsh, > What I don't get is, how would this work when I have more than 2 datasets involved? If you can ingest the product/account/rate information changes as a stream, I think there are two ways to enrich the positions. - One way is to connect multiple times: Positions connect Account connect
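The "connect multiple times" idea can be sketched in plain Python (not the Flink API; the field names `account_id`, `product_id`, etc. are invented for illustration): each connect step keeps the latest value from one metadata stream, the way a keyed CoProcessFunction would hold it in state, and enriches the position events flowing past it.

```python
def enrich(positions, accounts, products):
    # Latest-value state per key, as each connected stream would maintain it.
    account_state = dict(accounts)   # account_id -> account name
    product_state = dict(products)   # product_id -> product name
    out = []
    for pos in positions:
        enriched = dict(pos)
        # First "connect": look up account metadata.
        enriched["account_name"] = account_state.get(pos["account_id"])
        # Second "connect": look up product metadata.
        enriched["product_name"] = product_state.get(pos["product_id"])
        out.append(enriched)
    return out

positions = [{"account_id": "a1", "product_id": "p1", "qty": 10}]
accounts = [("a1", "Alice")]
products = [("p1", "Bond")]
print(enrich(positions, accounts, products))
```

With N lookup datasets the chain simply grows to N connect steps, each keyed by the join field it resolves.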

Count sliding window does not work as expected

2018-08-23 Thread Soheil Pourbafrani
Hi, I need a sliding window strategy that fills the window to a count of 400 and, for every 100 incoming records, processes the last 400 records. For example, suppose we have a data stream of count 16. For a count window of 400 with a slide of 100, I expect it output 1597 stream: 16 - 400
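A likely source of the surprise, assuming Flink's `countWindow(size, slide)` is implemented as `CountTrigger.of(slide)` plus `CountEvictor.of(size)`: the window fires every `slide` elements and is evicted down to the last `size` elements, so the early firings contain fewer than `size` elements. A plain-Python simulation of that assumed behavior (not the Flink API itself):

```python
from collections import deque

def count_sliding_windows(stream, size, slide):
    """Simulate a count window of `size` that fires every `slide` elements."""
    buf = deque(maxlen=size)   # evictor: keep only the last `size` elements
    since_fire, fired = 0, []
    for x in stream:
        buf.append(x)
        since_fire += 1
        if since_fire == slide:  # trigger: fire every `slide` elements
            fired.append(list(buf))
            since_fire = 0
    return fired

windows = count_sliding_windows(range(1, 601), size=400, slide=100)
print([len(w) for w in windows])  # [100, 200, 300, 400, 400, 400]
```

Under these semantics the first window with the full 400 elements only appears once 400 elements have arrived; before that, firings are smaller, which may explain output that "does not work as expected".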