Re: [DISCUSS] Dropping flink-storm?

2018-09-28 Thread Zhijiang(wangzhijiang999)
I very much agree with dropping it. +1 -- From: Jeff Carter Sent: Saturday, September 29, 2018 10:18 To: dev Cc: chesnay ; Till Rohrmann ; user Subject: Re: [DISCUSS] Dropping flink-storm? +1 to drop it. On Fri, Sep 28, 2018, 7:25 PM Hequn Cheng wrote: >

Flink standalone cluster load average is too high.

2018-09-28 Thread weilongxing
I am using Flink 1.5.2 in standalone mode. The machine has 8 cores and 32 GB of memory. I set slots to 4 or 8, and I set most jobs' parallelism to 1. Each machine has fewer jobs than the slot number. The load average is too high, as the picture below shows. Each line represents load average 1 on one

Re: [DISCUSS] Dropping flink-storm?

2018-09-28 Thread Hequn Cheng
Hi, +1 to drop it. It seems that few people use it. Best, Hequn On Fri, Sep 28, 2018 at 10:22 PM Chesnay Schepler wrote: > I'm very much in favor of dropping it. > > Flink has been continually growing in terms of features, and IMO we've > reached the point where we should cull some of the

Re: [DISCUSS] Dropping flink-storm?

2018-09-28 Thread vino yang
Hi, +1, I agree. In addition, some users ask questions about the integration of Storm compatibility mode with the newer Flink version on the mailing list. It seems that they are not aware that some of Flink's new features are no longer available in Storm compatibility mode. This can be confusing

Deserialization of serializer errored

2018-09-28 Thread Elias Levy
I am experiencing a rather odd error. We have a job running on a Flink 1.4.2 cluster with two Kafka input streams; one of the streams is processed by an async function, and the output of the async function and the other original stream are consumed by a CoProcessOperator, which in turn emits Scala
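
A minimal sketch of that topology, for reference (class, topic, and broker names are hypothetical; it is written against a recent Flink DataStream API, whose async I/O interface differs slightly from 1.4.2):

    import java.util.Collections;
    import java.util.Properties;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.async.AsyncFunction;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
    import org.apache.flink.util.Collector;

    public class TwoStreamCoProcessSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");

            // Two Kafka input streams (topic names are made up).
            DataStream<String> streamA = env.addSource(
                    new FlinkKafkaConsumer011<>("topic-a", new SimpleStringSchema(), props));
            DataStream<String> streamB = env.addSource(
                    new FlinkKafkaConsumer011<>("topic-b", new SimpleStringSchema(), props));

            // Stream A goes through an async function (e.g. an external lookup).
            DataStream<String> enrichedA = AsyncDataStream.unorderedWait(
                    streamA,
                    new AsyncFunction<String, String>() {
                        @Override
                        public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
                            CompletableFuture
                                    .supplyAsync(() -> input + "-enriched")
                                    .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
                        }
                    },
                    5, TimeUnit.SECONDS);

            // The async output and the second original stream meet in a CoProcessFunction.
            enrichedA
                    .connect(streamB)
                    .process(new CoProcessFunction<String, String, String>() {
                        @Override
                        public void processElement1(String value, Context ctx, Collector<String> out) {
                            out.collect("A: " + value);
                        }

                        @Override
                        public void processElement2(String value, Context ctx, Collector<String> out) {
                            out.collect("B: " + value);
                        }
                    })
                    .print();

            env.execute("two-stream-coprocess-sketch");
        }
    }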

Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-28 Thread Averell
Hello Kostas, Thank you very much for the details. Also thanks for that "feel free to disagree" (though I don't have any desire to disagree here :) thanks a lot). Regarding that mainStream.windowAll, did you mean that checkpointing of the two branches (the main one and the monitoring one) will

Re: Streaming to Parquet Files in HDFS

2018-09-28 Thread hao gao
Hi Bill, I wrote those two Medium posts you mentioned above, but clearly the techlab one is much better. I would suggest just "closing the file when checkpointing", which is the easiest way. If you use BucketingSink, you can modify the code to make it work. Just replace the code from line 691 to 693

Streaming to Parquet Files in HDFS

2018-09-28 Thread William Speirs
I'm trying to stream log messages (syslog fed into Kafka) into Parquet files on HDFS via Flink. I'm able to read, parse, and construct objects for my messages in Flink; however, writing to Parquet is tripping me up. I do *not* need to have this be real-time; a delay of a few minutes, even up to an
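
One option, different from the BucketingSink modification suggested in the reply above, is the bulk-encoded StreamingFileSink added in Flink 1.6. A minimal sketch, assuming Flink 1.6+ with the flink-parquet and flink-avro dependencies on the classpath; the LogRecord POJO, topic name, broker address, and HDFS path are made-up placeholders:

    import java.util.Properties;

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class SyslogToParquetSketch {

        // Hypothetical POJO for a parsed syslog message.
        public static class LogRecord {
            public long timestamp;
            public String host;
            public String message;
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Bulk-encoded sinks roll their part files on checkpoint, so checkpointing must be on.
            env.enableCheckpointing(60_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");

            DataStream<LogRecord> parsed = env
                    .addSource(new FlinkKafkaConsumer011<>("syslog", new SimpleStringSchema(), props))
                    .map(new MapFunction<String, LogRecord>() {
                        @Override
                        public LogRecord map(String line) {
                            // Placeholder parsing; replace with real syslog parsing.
                            LogRecord r = new LogRecord();
                            r.timestamp = System.currentTimeMillis();
                            r.host = "unknown";
                            r.message = line;
                            return r;
                        }
                    });

            StreamingFileSink<LogRecord> sink = StreamingFileSink
                    .forBulkFormat(
                            new Path("hdfs:///data/syslog-parquet"),
                            ParquetAvroWriters.forReflectRecord(LogRecord.class))
                    .build();

            parsed.addSink(sink);
            env.execute("syslog-to-parquet-sketch");
        }
    }

With a bulk format like Parquet the sink can only roll on checkpoint, so the checkpoint interval effectively controls the file granularity, which matches the "close the file when checkpointing" approach described in the reply above.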

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread Rong Rong
Yes. Thanks for bringing this up, Hequn! :-) I think Tuple would not be the best container to use. However, in searching for an alternative, wouldn't a Collection / List be a more suitable solution? Row does not seem to fit in this context (as a Row can hold elements of different types). I vaguely

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread Hequn Cheng
Hi, I haven't looked into the code. If this is limited by Tuple, would it be better to implement it with Row? Best, Hequn On Fri, Sep 28, 2018 at 9:27 PM Rong Rong wrote: > Hi Henry, Vino. > > I think IN operator was translated into either a RexSubQuery or a > SqlStdOperatorTable.IN operator. > I

Re: LIMIT and ORDER BY in hop window is not supported?

2018-09-28 Thread Hequn Cheng
Hi, You can implement top-N on the SQL/Table API or write a DataStream job with a ProcessFunction to solve the problem. Best, Hequn On Fri, Sep 28, 2018 at 9:38 AM 徐涛 wrote: > Hi Hequn, > If LIMIT n is not supported in streaming, how can the top-n problem be solved in > a streaming scenario? > > Best > Henry > >
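
For the DataStream route, here is a minimal sketch that computes a per-window top-N with a ProcessAllWindowFunction (the Tuple2<String, Long> element type, the one-minute window, and n = 2 are made-up placeholders):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class TopNSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Tuple2<String, Long>> counts = env.fromElements(
                    Tuple2.of("a", 3L), Tuple2.of("b", 7L), Tuple2.of("c", 5L));

            final int n = 2;

            counts
                    .windowAll(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                    .process(new ProcessAllWindowFunction<Tuple2<String, Long>, Tuple2<String, Long>, TimeWindow>() {
                        @Override
                        public void process(Context context,
                                            Iterable<Tuple2<String, Long>> elements,
                                            Collector<Tuple2<String, Long>> out) {
                            // Collect the window contents, sort by count descending, emit the first n.
                            List<Tuple2<String, Long>> buffer = new ArrayList<>();
                            for (Tuple2<String, Long> e : elements) {
                                buffer.add(e);
                            }
                            buffer.sort((l, r) -> Long.compare(r.f1, l.f1));
                            for (int i = 0; i < Math.min(n, buffer.size()); i++) {
                                out.collect(buffer.get(i));
                            }
                        }
                    })
                    .print();

            env.execute("top-n-window-sketch");
        }
    }

For large windows a pre-aggregation step would be more efficient; this is only meant to show the shape of the approach.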

Re: [DISCUSS] Dropping flink-storm?

2018-09-28 Thread Chesnay Schepler
I'm very much in favor of dropping it. Flink has been continually growing in terms of features, and IMO we've reached the point where we should cull some of the more obscure ones. flink-storm, while interesting from a theoretical standpoint, offers too little value. Note that the bolt/spout

Job failing during restore in different cluster

2018-09-28 Thread shashank734
Hi, I am trying to move my job from one cluster to another cluster using a savepoint. But it's failing while restoring on the new cluster. In the error, it's still trying to connect to some URL of the old cluster. I have checked all the properties and configuration. Does Flink save the URLs while

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread Rong Rong
Hi Henry, Vino. I think the IN operator is translated into either a RexSubQuery or a SqlStdOperatorTable.IN operator. I think Vino was referring to the first case. For the second case (I think that's what you are facing here), the elements are converted into tuples, and the maximum we currently have in Flink

[DISCUSS] Dropping flink-storm?

2018-09-28 Thread Till Rohrmann
Hi everyone, I would like to discuss how to proceed with Flink's Storm compatibility layer, flink-storm. While working on removing Flink's legacy mode, I noticed that some parts of flink-storm rely on the legacy Flink client. In fact, at the moment flink-storm does not work together with Flink's

Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-28 Thread Till Rohrmann
What do you think about reverting this change (FLINK-8696), because it is really hard to debug for users? A problem would be if people now rely on the second argument being the hostname. An alternative could be to filter out `cluster` and `local` if they should appear as second argument. This

Re: New received containers silently lost during job auto restarts

2018-09-28 Thread Till Rohrmann
Hi Paul, this looks to me like a YARN setup problem. Could you check which value you have set for dfs.namenode.delegation.token.max-lifetime? By default it is set to 7 days. The reason Flink needs to access HDFS is that the binaries and the configuration files are stored there for an

Re: Regarding implementation of aggregate function using a ProcessFunction

2018-09-28 Thread Gaurav Luthra
Hi Chesnay, I know it is an issue, and it won't be fixed because of the window merging feature in the case of session windows. But I am looking for someone who has implemented an aggregation function using a ProcessFunction and the process() method instead of an AggregateFunction and the aggregate() method. I hope you got my
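
For reference, a minimal sketch of that alternative: WindowedStream#process takes a ProcessWindowFunction, which extends AbstractRichFunction and can therefore access keyed state, unlike an AggregateFunction passed to aggregate(). The element type, key, and session gap below are made-up placeholders:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class ProcessWindowAggregationSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Tuple2<String, Long>> input = env.fromElements(
                    Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 5L));

            input
                    .keyBy(0)
                    .window(ProcessingTimeSessionWindows.withGap(Time.seconds(30)))
                    .process(new ProcessWindowFunction<Tuple2<String, Long>, Long, Tuple, TimeWindow>() {
                        private transient ValueState<Long> runningTotal;

                        @Override
                        public void open(Configuration parameters) {
                            // Rich-function hooks and keyed state are available here,
                            // unlike in an AggregateFunction passed to aggregate().
                            runningTotal = getRuntimeContext().getState(
                                    new ValueStateDescriptor<>("runningTotal", Long.class));
                        }

                        @Override
                        public void process(Tuple key, Context context,
                                            Iterable<Tuple2<String, Long>> elements,
                                            Collector<Long> out) throws Exception {
                            // Sum the window contents and keep a per-key running total across windows.
                            long sum = runningTotal.value() == null ? 0L : runningTotal.value();
                            for (Tuple2<String, Long> e : elements) {
                                sum += e.f1;
                            }
                            runningTotal.update(sum);
                            out.collect(sum);
                        }
                    })
                    .print();

            env.execute("process-window-aggregation-sketch");
        }
    }

The trade-off is that process() buffers all elements of the window, whereas aggregate() pre-aggregates incrementally.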

Re: Regarding implementation of aggregate function using a ProcessFunction

2018-09-28 Thread Chesnay Schepler
Please see: https://issues.apache.org/jira/browse/FLINK-10250 On 28.09.2018 11:27, vino yang wrote: Hi Gaurav, Yes, you are right, it is really not allowed to use a RichFunction. I will ping Timo; he may give you a more professional answer. Thanks, vino. Gaurav Luthra

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread vino yang
Hi Henry, Maybe the number of elements in your IN clause is out of range? The default limit is 20; you can modify it with this configuration item: *withInSubQueryThreshold(XXX)*. This API comes from Calcite. Thanks, vino. 徐涛 wrote on Fri, Sep 28, 2018 at 4:23 PM: > Hi, > > When I am executing the

Re: Regarding implementation of aggregate function using a ProcessFunction

2018-09-28 Thread vino yang
Hi Gaurav, Yes, you are right, it is really not allowed to use a RichFunction. I will ping Timo; he may give you a more professional answer. Thanks, vino. Gaurav Luthra wrote on Fri, Sep 28, 2018 at 4:27 PM: > Hi Vino, > > Kindly check the Flink code below. > > package

Re: Regarding implementation of aggregate function using a ProcessFunction

2018-09-28 Thread Gaurav Luthra
Hi Vino, Kindly check the Flink code below. package org.apache.flink.streaming.api.datastream.WindowedStream @PublicEvolving public SingleOutputStreamOperator aggregate(AggregateFunction function) { checkNotNull(function, "function"); if (*function instanceof RichFunction*) { throw new

Does Flink SQL "in" operation has length limit?

2018-09-28 Thread 徐涛
Hi, When I am executing the following SQL in Flink 1.6.1, an error is thrown saying that it has a support issue, but when I reduce the number of integers in the “in” clause, for example, trackId in (124427150,71648998), Flink does not complain at all, so I wonder whether there is any

Re: Regarding implementation of aggregate function using a ProcessFunction

2018-09-28 Thread vino yang
Hi Gaurav, This is very strange; can you share your code and the specific exceptions? Under normal circumstances, it should not throw an exception. Thanks, vino. Gaurav Luthra wrote on Fri, Sep 28, 2018 at 3:27 PM: > Hi Vino, > > RichAggregateFunction can surely access the state. But the problem is, In >

flink-1.6.1-bin-scala_2.11.tgz fails signture and hash verification

2018-09-28 Thread Gianluca Ortelli
Hi, I just downloaded flink-1.6.1-bin-scala_2.11.tgz from https://flink.apache.org/downloads.html and noticed that it fails signature verification with a gpg: BAD signature from "Till Rohrmann (stsffap) " message. The sha512 hash doesn't match either. I switched to 1.6.0, which verifies OK.

Re: Regarding implementation of aggregate function using a ProcessFunction

2018-09-28 Thread vino yang
Hi Gaurav, Why do you think the RichAggregateFunction cannot access the State API? RichAggregateFunction inherits from AbstractRichFunction (it provides a RuntimeContext that allows you to access the state API). Thanks, vino. Gaurav Luthra wrote on Fri, Sep 28, 2018 at 1:38 PM: > Hi, > > As we are aware,