Flink 1.8 on YARN: YARN ResourceManager HA question

2019-07-22 Thread ????
Hi, running Flink 1.8 on YARN with YARN ResourceManager HA enabled: there are two RMs, one Active and one Standby; … standby … 8032 (the RM port) … PS: Flink 1.7 on YARN … yarn … hadoop 2019-07-23

Re: Questions about writing to HDFS with StreamingFileSink or BucketingSink

2019-07-22 Thread 徐嘉培
May I ask how you solved the first problem? On 2019-07-18 11:15:49, "九思" <1048095...@qq.com> wrote: A question for the experts: we use StreamingFileSink or BucketingSink to write to HDFS. 1. Locally, when the program is killed, files stay in the in-progress state and are never promoted to finished files; how should this be handled? 2. After restarting the program, the part-file numbering starts from 0 again instead of continuing from the previous number. Looking at the source code, there is logic to fetch the previous number, but stepping through with a breakpoint I found it was never retrieved; what could be the reason?

Execution environments for testing: local vs collection vs mini cluster

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, In https://ci.apache.org/projects/flink/flink-docs-stable/dev/local_execution.html and https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/runtime/minicluster/MiniCluster.html I see there are 3 ways to create an execution environment for testing: -

MiniClusterResource class not found using AbstractTestBase

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, I'm trying to use AbstractTestBase in a test in order to use the mini cluster. I'm using specs2 with Scala, so I cannot extend AbstractTestBase because I also have to extend org.specs2.Specification, so I'm trying to access the mini cluster directly using Specs2 BeforeAll to initialize it as

Re: Job Manager becomes irresponsive if the size of the session cluster grows

2019-07-22 Thread Prakhar Mathur
On Mon, Jul 22, 2019, 16:08 Prakhar Mathur wrote: > Hi, > > We enabled GC logging, here are the logs > > [GC (Allocation Failure) [PSYoungGen: 6482015K->70303K(6776832K)] > 6955827K->544194K(20823552K), 0.0591479 secs] [Times: user=0.09 sys=0.00, > real=0.06 secs] > [GC (Allocation Failure)

Fwd: FsStateBackend,hdfs rpc api too much,FileCreated and FileDeleted is for what?

2019-07-22 Thread 陈Darling
Hi Yun Tang, your suggestion is very important to us. Following it, we have suggested that users increase the checkpoint interval (1 to 5 minutes) and set state.backend.fs.memory-threshold=10k. But we only have one HDFS cluster, and we are trying to reduce HDFS API calls; I don't know if there
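A sketch of the settings being discussed, as they might appear in flink-conf.yaml (the checkpoint path is a made-up example; the checkpoint interval itself is typically set in code via env.enableCheckpointing rather than here, and whether the threshold accepts a "10k" shorthand or only plain bytes varies by Flink version):

```yaml
state.backend: filesystem
# hypothetical checkpoint path on the single HDFS cluster
state.checkpoints.dir: hdfs:///flink/checkpoints
# state below this size is stored inline in the checkpoint metadata instead of
# as separate HDFS files, which reduces small-file create/delete RPC traffic
state.backend.fs.memory-threshold: 10240
```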

Fwd: FsStateBackend,hdfs rpc api too much,FileCreated and FileDeleted is for what?

2019-07-22 Thread 陈Darling
Hi, in my understanding the CreateFile and FileCreated APIs are different; FileCreated looks more like a check API, but I can't find where it is called in the source. I don't understand when the FileCreated API is called, and for what. Is FileCreated an HDFS-internal confirmation API? FLINK-11696 is to

Re:AW: Re:Unable to build Flink1.10 from source

2019-07-22 Thread Haibo Sun
Please check whether the following profile section exists in "flink-filesystems/flink-mapr-fs/pom.xml". If not, pull the latest code and try to compile again. If it does exist, please share the latest error message; it may differ from before.
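For reference, the profile being referred to is the one that adds the MapR repository; the fragment below is reconstructed from memory of the flink-mapr-fs build fix, so treat the exact ids and URL as approximate rather than authoritative:

```xml
<!-- activated with -Punsafe-mapr-repo; adds the plain-http MapR repository -->
<profile>
    <id>unsafe-mapr-repo</id>
    <activation>
        <property>
            <name>unsafe-mapr-repo</name>
        </property>
    </activation>
    <repositories>
        <repository>
            <id>mapr-releases</id>
            <url>http://repository.mapr.com/maven/</url>
            <snapshots><enabled>false</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
        </repository>
    </repositories>
</profile>
```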

Re: [SURVEY] How many people implement Flink job based on the interface Program?

2019-07-22 Thread Jeff Zhang
Hi Flavio, based on the discussion in the tickets you mentioned above, the program-class attribute was a mistake and the community intends to replace it with main-class. Deprecating the Program interface is part of the work on Flink's new client API. IIUC, your requirements are not so complicated. We

GroupBy result delay

2019-07-22 Thread Fanbin Bu
Hi, I have a Flink sql streaming job defined by: SELECT user_id , hop_end(created_at, interval '30' second, interval '1' minute) as bucket_ts , count(name) as count FROM event WHERE name = 'signin' GROUP BY user_id , hop(created_at, interval '30' second, interval '1' minute) there is
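The delay asked about here is inherent to hopping windows: a window's aggregate is emitted only after the watermark passes the window's end. Below is a minimal plain-Java sketch of the window assignment that HOP(created_at, INTERVAL '30' SECOND, INTERVAL '1' MINUTE) performs; the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class HopWindowDemo {
    // Sketch of hopping-window assignment: each event belongs to
    // size/slide windows, each identified by its [start, end) interval.
    static List<long[]> hopWindows(long ts, long slideMs, long sizeMs) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = ts - (ts % slideMs);   // latest slide boundary <= ts
        for (long start = lastStart; start > ts - sizeMs; start -= slideMs) {
            windows.add(new long[] {start, start + sizeMs});
        }
        return windows;
    }

    public static void main(String[] args) {
        // an event at t=75s with slide 30s and size 60s lands in two windows
        for (long[] w : hopWindows(75_000L, 30_000L, 60_000L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

Each event lands in size/slide = 2 windows, and the window ending at bucket_ts fires only once the watermark reaches that end timestamp, so results trail the raw events by up to the window size plus any watermark lag.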

AW: Re:Unable to build Flink1.10 from source

2019-07-22 Thread Yebgenya Lazarkhosrouabadi
Hi, I used the command mvn clean package -DskipTests -Punsafe-mapr-repo, but it didn't work; I get the same error. Regards, Yebgenya Lazar From: Haibo Sun Sent: Monday, July 22, 2019 04:40 To: Yebgenya Lazarkhosrouabadi Cc: user@flink.apache.org Subject: Re:Unable to build Flink1.10 from

Re: Timestamp(timezone) conversion bug in non blink Table/SQL runtime

2019-07-22 Thread Shuyi Chen
Hi Lasse, thanks for the reply. If your input is in epoch time, you are not getting local time; instead, you are getting a wrong time that makes no sense. For example, if the user input value is 0 (which means 00:00:00 UTC on 1 January 1970), and your local timezone is UTC-8, converting

Will StreamingFileSink.forBulkFormat(...) support overriding OnCheckpointRollingPolicy?

2019-07-22 Thread Elkhan Dadashov
Hi folks, will StreamingFileSink.forBulkFormat(...) support overriding OnCheckpointRollingPolicy? Does anyone use StreamingFileSink *with checkpointing disabled* for writing Parquet output files? The output Parquet files are generated, but they are empty and stay in *in-progress* state, even when

Re: Timestamp(timezone) conversion bug in non blink Table/SQL runtime

2019-07-22 Thread Lasse Nedergaard
Hi. I have encountered the same problem: when you feed epoch time into a window table function and then use window.start and window.end, the output is not in epoch time but in local time, and I traced the problem to the same internal function as you did. Med venlig hilsen / Best regards Lasse

Re: Job submission timeout with no error info.

2019-07-22 Thread Fakrudeen Ali Ahmed
It turns out the actual issue was a configuration issue and we just had to pore over job manager log carefully. We were using HDFS [really API on top of windows blob] as source and we didn’t provide the server location and it took the path prefix as the server. Only thing here would have been

Timestamp(timezone) conversion bug in non blink Table/SQL runtime

2019-07-22 Thread Shuyi Chen
Hi all, currently, in the non-Blink Table/SQL runtime, Flink uses SqlFunctions.internalToTimestamp(long v) from Calcite to convert event time (a long) to java.sql.Timestamp. However, as discussed on the Calcite mailing list recently (Jul. 19, 2019), SqlFunctions.internalToTimestamp() assumes the
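The described behavior can be reproduced without Calcite. The helper below mirrors what SqlFunctions.internalToTimestamp(long) effectively does (subtract the local timezone offset from the input), with the timezone passed in explicitly so the example is deterministic; the class name is made up for illustration:

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TzShiftDemo {
    // Mimics Calcite's SqlFunctions.internalToTimestamp(long): it subtracts
    // the local timezone offset, i.e. it assumes the incoming long is in
    // *local* time. Feed it true epoch millis and the instant gets shifted.
    static Timestamp internalToTimestamp(long v, TimeZone localTz) {
        return new Timestamp(v - localTz.getOffset(v));
    }

    public static void main(String[] args) {
        // epoch 0 is 1970-01-01T00:00:00 UTC; under a UTC-8 "local" zone the
        // conversion shifts the stored millis by +8 hours
        Timestamp t = internalToTimestamp(0L, TimeZone.getTimeZone("GMT-8"));
        System.out.println(t.getTime());  // 28800000 (8h in ms), not 0
    }
}
```

With a true epoch input of 0 and a UTC-8 local zone, the resulting Timestamp holds 28800000 ms rather than 0; the instant itself has been moved, which is the "wrong time" described in the thread.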

Re: Job submission timeout with no error info.

2019-07-22 Thread Fakrudeen Ali Ahmed
Thanks Andrey. The environment we run [Azure HD insight cluster] only supports Flink 1.4.2 now. So I can’t run with 1.8 in this environment. I can run in a different environment with 1.8 [on Kubernetes not YARN though] and report the results. Thanks, -Fakrudeen (define (sqrte n xn eph) (if (>

Re: Job submission timeout with no error info.

2019-07-22 Thread Andrey Zagrebin
Hi Fakrudeen, Thanks for sharing the logs. Could you also try it with Flink 1.8? Best, Andrey On Sat, Jul 20, 2019 at 12:44 AM Fakrudeen Ali Ahmed wrote: > Hi Andrey, > > > > > > Flink version: 1.4.2 > > Please find the client log attached and job manager log is at: job > manager log >

Use batch and stream environment in a single pipeline

2019-07-22 Thread Andres Angel
Hello everyone, I need to create a table from a stream environment, and thinking of a pure SQL approach I was wondering if I can create a few of the enrichment tables in a batch environment and keep only the streaming payload in a streaming table environment. I tried to create a batch table environment

Re: Extending REST API with new endpoints

2019-07-22 Thread Oytun Tez
I did take a look at it, but things got out of hand very quickly from there on :D I see that WebSubmissionExtension implements WebMonitorExtension, but WebSubmissionExtension is then used in DispatcherRestEndpoint, which I couldn't figure out how to manipulate or extend... How can I plug my Extension

Re: Extending REST API with new endpoints

2019-07-22 Thread Oytun Tez
I simply want to open up endpoints to query QueryableStates. What I had in mind was to give operators an interface to implement their own QueryableState controllers, e.g. serializers etc. We are trying to use Flink in more of an "application framework" fashion, so extensibility helps a lot. As

Re: [SURVEY] How many people implement Flink job based on the interface Program?

2019-07-22 Thread Flavio Pompermaier
Hi Tison, we use a modified version of the Program interface to enable a web UI to properly detect and run Flink jobs contained in a jar + their parameters. As stated in [1], we detect multiple Main classes per jar by handling an extra comma-separated Manifest entry (i.e. 'Main-classes'). As
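As an illustration, a jar manifest carrying such an entry might look like this (the class names are hypothetical; 'Main-classes' is the custom comma-separated entry described above, sitting alongside the standard Main-Class attribute):

```
Manifest-Version: 1.0
Main-Class: org.example.DefaultJob
Main-classes: org.example.JobA,org.example.JobB,org.example.JobC
```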

[SURVEY] How many people implement Flink job based on the interface Program?

2019-07-22 Thread Zili Chen
Hi guys, We want to have an accurate idea of how many people are implementing Flink job based on the interface Program, and how they actually implement it. The reason I ask for the survey is from this thread[1] where we notice this codepath is stale and less useful than it should be. As it is an

Re: Time extracting in flink

2019-07-22 Thread Andy Hoang
Thanks Biao, just want to not reinvent the wheel :) > On Jul 22, 2019, at 4:29 PM, Biao Liu wrote: > > Hi Andy, > > As far as I know, Flink does not support feature like that. > > I would suggest recording and calculating the time in user code. > For example, add a timestamp field (maybe an

timeout exception when consuming from kafka

2019-07-22 Thread Yitzchak Lieberman
Hi. I'm running a Flink application (version 1.8.0) that uses FlinkKafkaConsumer to fetch topic data and perform transformation on the data, with state backend as below: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.enableCheckpointing(5_000,

Re: Time extracting in flink

2019-07-22 Thread Biao Liu
Hi Andy, as far as I know, Flink does not support a feature like that. I would suggest recording and calculating the time in user code: for example, add a timestamp field (maybe an array) to your record and stamp a timestamp onto it at each processing step. Andy Hoang wrote on Mon, Jul 22, 2019 at 4:49 PM: > Hi
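Biao's suggestion of carrying timestamps inside the record can be sketched as below; the wrapper type and method names are invented for illustration, and in a real job each operator would call stamp() on the wrapped element as it passes through:

```java
import java.util.ArrayList;
import java.util.List;

public class LatencyStamps {
    // Hypothetical record wrapper: each operator appends a timestamp as it
    // handles the element, so a final sink can compute end-to-end and
    // per-stage processing times for auditing.
    static class Stamped<T> {
        final T value;
        final List<Long> stamps = new ArrayList<>();

        Stamped(T value) { this.value = value; }

        Stamped<T> stampAt(long millis) { stamps.add(millis); return this; }

        Stamped<T> stamp() { return stampAt(System.currentTimeMillis()); }

        long totalMillis() {
            return stamps.isEmpty() ? 0L
                    : stamps.get(stamps.size() - 1) - stamps.get(0);
        }
    }

    public static void main(String[] args) {
        // deterministic illustration with explicit clock values
        Stamped<String> s = new Stamped<>("event")
                .stampAt(1_000L)   // source
                .stampAt(1_040L)   // map
                .stampAt(1_100L);  // sink
        System.out.println(s.totalMillis());  // 100
    }
}
```

In production code the stamps list could instead be a fixed-size array keyed by operator index, and the sink would ship the computed durations to the metrics or ELK pipeline.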

Time extracting in flink

2019-07-22 Thread Andy Hoang
Hi guys, I’m trying to write ELK logs for Flink; this helps us store and calculate the processing time of a group of operators for business auditing. I read about ProcessFunction and "Debugging Windows & Event Time" in the docs. They focus on "keyed" events and monitoring via the web UI/metrics, whereas I want

Re: Extending REST API with new endpoints

2019-07-22 Thread Biao Liu
Hi, as far as I know, the RESTful handlers are not pluggable, and I don't see a strong reason in your description to make them so. Could you explain more about your requirement? Oytun Tez wrote on Sat, Jul 20, 2019 at 4:36 AM: > Yep, I scanned all of the issues in Jira and the codebase, I couldn't find > a way to

Re: Best way to compute the difference between 2 datasets

2019-07-22 Thread Ken Krugler
Hi Juan, If you want to deduplicate, then you could group by the record, and use a (very simple) reduce function to only emit a record if the group contains one element. There will be performance issues, though - Flink will have to generate all groups first, which typically means spilling to
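The difference operation under discussion can be sketched in plain Java; the class and helper names are made up, and a distributed Flink version would express the same per-key logic with coGroup, emitting a left-side element whenever the right-side group is empty:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DatasetDifference {
    // Plain-Java sketch of set difference: emit an element of `left` only
    // when it never occurs in `right`. A coGroup-based Flink job does the
    // same per key, emitting when the co-grouped right side is empty.
    static <T> List<T> difference(List<T> left, List<T> right) {
        Set<T> rightSet = new HashSet<>(right);  // the co-grouped side
        return left.stream()
                .filter(x -> !rightSet.contains(x))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(difference(Arrays.asList(1, 2, 3, 4), Arrays.asList(2, 4)));
        // prints [1, 3]
    }
}
```

As Ken notes, the grouping step is where the cost lives in a real Flink job: all groups must be materialized (possibly spilling to disk) before anything is emitted.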

Best way to compute the difference between 2 datasets

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, I've been trying to write a function to compute the difference between 2 datasets, by which I mean a dataset containing all the elements of one dataset that are not present in another. I first tried using coGroup, but it was very slow in a local execution environment, and

Re: Flink … (subject garbled)

2019-07-22 Thread ????
TaskManager log: 2019-07-22 05:39:03,987 WARN org.apache.hadoop.ipc.Client - Failed to connect to server: master/10.0.2.11:9000: try once and fail. java.nio.channels.ClosedByInterruptException at