Are savepoints / checkpoints co-ordinated?

2018-10-11 Thread anand.gopinath
Hi, I had a couple questions about savepoints / checkpoints When I issue "Cancel Job with Savepoint", how is that instruction co-ordinated with check points? Am I certain the savepoint will be the last operation (i.e. no more check points)? I have a kafka src>operation>kafka sink task in

Re: User jar is present in the flink job manager's class path

2018-10-11 Thread Gary Yao
Hi, Could it be that you are submitting the job in attached mode, i.e., without -d parameter? In the "job cluster attached mode", we actually start a Flink session cluster (and stop it again from the CLI) [1]. Therefore, in attached mode, the config option "yarn.per-job-cluster.include-user-jar"

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Jörn, Thanks for your feedback. Yes, I think Hive on Flink makes sense and in fact it is one of the two approaches that I named in the beginning of the thread. As also pointed out there, this isn't mutually exclusive from work we proposed inside Flink and they target at different user

RE: flink memory management / temp-io dir question

2018-10-11 Thread anand.gopinath
Its ok – I see the relevant docs now… "The RocksDBStateBackend holds in-flight data in a RocksDB database that is (per default) stored in the TaskManager data directories." Thanks for your help Anand From: Gopinath, Anand Sent: 11 October 2018 18:40 To: 'Till Rohrmann'; Kostas Kloudas Cc:

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Rong Rong
Hi Xuefu, Thanks for putting together the overview. I would like to add some more on top of Timo's comments. 1,2. I agree with Timo that a proper catalog support should also address the metadata compatibility issues. I was actually wondering if you are referring to something like utilizing table

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Timo, Thank you for your input. It's exciting to see that the community has already initiated some of the topics. We'd certainly like to leverage the current and previous work and make progress in phases. Here I'd like to comment on a few things on top of your feedback. 1. I think there

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Rong, Thanks for your feedback. Some of my earlier comments might have addressed some of your points, so here I'd like to cover some specifics. 1. Yes, I expect that table stats stored in Hive will be used in Flink plan optimization, but it's not part of compatibility concern (yet). 2. Both

RE: flink memory management / temp-io dir question

2018-10-11 Thread anand.gopinath
Hi Till, Thanks for the reply. I don’t use batch, so I assume what I am seeing is streaming related. I thought rocksdb writes to a different dir though ( as defined by checkpoint.data.uri)? Regards, Anand From: Till Rohrmann [mailto:trohrm...@apache.org] Sent: 08 October 2018 13:39 To:

Re: Taskmanager times out continuously for registration with Jobmanager

2018-10-11 Thread Abdul Qadeer
Hi Till, I didn't try with newer versions as it is not possible to update the Flink version atm. If you could give any pointers for debugging that would be great. On Thu, Oct 11, 2018 at 2:44 AM Till Rohrmann wrote: > Hi Abdul, > > have you tried whether this problem also occurs with newer

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
One other thought on the same lines was to use hive tables to store kafka information to process streaming tables. Something like "create table streaming_table ( bootstrapServers string, topic string, keySerialiser string, ValueSerialiser string)" Insert into streaming_table

答复: No data issued by flink window after a few hours

2018-10-11 Thread 潘 功森
Hi, I found the pictures maybe too big and the net here not so good, so the mail I wrote is not sent sucsessfully last night. Yes, I used event time. I found watermarks fired normally when the job started, but it stopped and no changed after running hours. And I changed as fs state backend, I

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
I think integrating Flink with Hive would be an amazing option and also to get Flink's SQL up to pace would be amazing. Current Flink Sql syntax to prepare and process a table is too verbose, users manually need to retype table definitions and that's a pain. Hive metastore integration should be

Re: User jar is present in the flink job manager's class path

2018-10-11 Thread yinhua.dai
Hi Gary, Yes you are right, we are using the attach mode. I will try to put my jar to flink/lib to get around with the issue. Thanks. I will open a jira for the discrepancy for flink 1.3 and 1.5, thanks a lot. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

答复: No data issued by flink window after a few hours

2018-10-11 Thread 潘 功森
Please look at the mail below, the others were out of data cause the bad network. Yours, September 发件人: 潘 功森 发送时间: Friday, October 12, 2018 11:05:49 AM 收件人: vino yang 抄送: user 主题: 答复: No data issued by flink window after a few hours Hi, I found the pictures

Re: When does Trigger.clear() get called?

2018-10-11 Thread Hequn Cheng
Hi Andrew, Do you use CountWindow? You can switch to TimeWindow to have a test. I'm not quite familiar with window. I checked the code and found that clear() is called only when timer is triggered, i.e, called at the end of time window. Hope this helps. Best, Hequn On Fri, Oct 12, 2018 at 6:23

When does Trigger.clear() get called?

2018-10-11 Thread Andrew Danks
Hello, I see that the clear() function is implemented for various types of Triggers in the Flink API. For example: https://github.com/apache/flink/blob/release-1.3/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/triggers/ContinuousProcessingTimeTrigger.java#L87

答复: 答复: No data issued by flink window after a few hours

2018-10-11 Thread 潘 功森
The second question looks fine. [cid:image004.png@01D461B6.93D6FF70] Yours, September 发件人: Dawid Wysakowicz 发送时间: 2018年10月11日 15:13 收件人: 潘 功森; vino yang 抄送: user 主题: Re: 答复:

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Taher, Thank you for your input. I think you emphasized two important points: 1. Hive metastore could be used for storing Flink metadata 2. There are some usability issues around Flink SQL configuration I think we all agree on #1. #2 may be well true and the usability should be improved.

Re: No data issued by flink window after a few hours

2018-10-11 Thread vino yang
Hi gongsen, Since you are running well locally, it should not be a configuration issue. You can refer to the Flink UI to see if your checkpoint is delayed. I hope that you can follow the instructions in the documentation[1] and provide some screenshots that will help the community help locate the

How do I initialize the window state on first run?

2018-10-11 Thread bupt_ljy
Hi, I’m going to run a new Flink program with some initialized window states. I can’t see there is an official way to do this, right? I’ve tried the bravo project, but it doesn’t support FsStateBackend and it costs too much work if we add a new StateBackend in it. Any good ideas about this?

Re: Getting NoMethod found error while running job on flink 1.6.1

2018-10-11 Thread Chandu Kempaiah
Flink is running as standalone cluster in High Availability mode, My application jar is a fat jar which has all the necessary dependencies included. I will check once again and verify by adding the flink-metrics-core to the classpath. Thanks Chandu On Wed, Oct 10, 2018 at 8:38 PM vino yang

Re: Issue while running integration test using AbstractTestBase

2018-10-11 Thread James Isaac
I'm using flink 1.5.0. The test gives the same error even with flink-1.6.0. Also, I introduced a Thread.sleep(3); before the assert statement. That didn't help either. Regards, James On Thu, Oct 11, 2018 at 11:11 AM James Isaac wrote: > Hi, > > I'm trying to run an integration test of my

Re: No data issued by flink window after a few hours

2018-10-11 Thread vino yang
Hi gongsen, Have you used event time as time semantics? If so, then the possible problem is related to watermark. Since I don't know the details of your program, it's hard to make a conclusion. You can check if your watermark is firing normally. Thanks, vino. 潘 功森 于2018年10月11日周四 下午12:12写道: >

Re: 答复: No data issued by flink window after a few hours

2018-10-11 Thread Dawid Wysakowicz
Hi, I agree with Vino, that you should check if the watermark is progressing for all subtasks, if you are using event time semantics. If this is not the problem it would help if you could share the code of your job. By the way have you tried reproducing the problem with collection source? Best,

Re: Taskmanager times out continuously for registration with Jobmanager

2018-10-11 Thread Dawid Wysakowicz
Hi Abdul, I've added Till and Gary to cc, who might be able to help you. Best, Dawid On 11/10/18 03:05, Abdul Qadeer wrote: > > Hi, > > > We are facing an issue in standalone HA mode in Flink 1.4.0 where > Taskmanager restarts and is not able to register with the Jobmanager. > It times out

Re: Partitions vs. Subpartitions

2018-10-11 Thread Fabian Hueske
Hi Chris, The terminology in the docs and code is not always consistent. It depends on the context. Both could also mean the same if they are used in different places. Can you point to the place(s) that refer to partition and subpartition? Fabian Am Do., 11. Okt. 2018 um 04:50 Uhr schrieb Kurt

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Timo Walther
Hi Xuefu, thanks for your proposal, it is a nice summary. Here are my thoughts to your list: 1. I think this is also on our current mid-term roadmap. Flink lacks a poper catalog support for a very long time. Before we can connect catalogs we need to define how to map all the information

Re: Streaming to Parquet Files in HDFS

2018-10-11 Thread Averell
Hi Kostas, Thanks for the info. That error caused by I built your code along with not up-to-date baseline. I rebased my branch build, and there's no more such issue. I've been testing, and until now have some questions/issues as below: 1. I'm not able to write to S3 with the following URI

Re: Flink leaves a lot RocksDB sst files in tmp directory

2018-10-11 Thread Sayat Satybaldiyev
Thank you Piotr for the reply! We didn't run this job on the previous version of Flink. Unfortunately, I don't have a log file from JM only TM logs. https://drive.google.com/file/d/14QSVeS4c0EETT6ibK3m_TMgdLUwD6H1m/view?usp=sharing On Wed, Oct 10, 2018 at 10:08 AM Piotr Nowojski wrote: > Hi, >

User jar is present in the flink job manager's class path

2018-10-11 Thread yinhua.dai
We have some customized log4j layout implementation so we need flink job manager/task manager be able to load the logger implementation which is packaged in the uber jar. However, we noticed that in flink 1.3, the user jar is put at the beginning of job manager, when we do the same again in flink

回复:What are channels mapped to?

2018-10-11 Thread Zhijiang(wangzhijiang999)
The channels are mapped to the subpartition index which would be consumed by specific downstream task parallelism. For example, if there are three reduce tasks parallelism, every map task would generate three subpartitions. If one record is hashed to the first channel, that means this record

Re: getRuntimeContext(): The runtime context has not been initialized.

2018-10-11 Thread Ahmad Hassan
Hi All, Thanks for the replies. Here is the code snippet of what we want to achieve: We have sliding windows of 24hrs with 5 minutes apart. inStream .filter(Objects::nonNull) .keyBy("tenant") .window(SlidingProcessingTimeWindows.of(Time.minutes(1440), Time.minutes(5))) .fold(new

Re: Taskmanager times out continuously for registration with Jobmanager

2018-10-11 Thread Till Rohrmann
Hi Abdul, have you tried whether this problem also occurs with newer Flink versions (1.5.4 or 1.6.1)? Cheers, Till On Thu, Oct 11, 2018 at 9:24 AM Dawid Wysakowicz wrote: > Hi Abdul, > > I've added Till and Gary to cc, who might be able to help you. > > Best, > > Dawid > > On 11/10/18 03:05,

Apache Flink: Kafka connector in Python streaming API, “Cannot load user class”

2018-10-11 Thread Kostas Evangelou
Hey all, Thank you so much for your efforts. I've already posted this question on stack overflow, but thought I should ask here as well. I am trying out Flink's new Python streaming API and attempting to run my script with ./flink-1.6.1/bin/pyflink-stream.sh examples/read_from_kafka.py. The

Re: Identifying missing events in keyed streams

2018-10-11 Thread Averell
Hi Fabian, Thanks for the suggestion. I will try with that support of removing timers. I have also tried approach (3) - using session windows, and it works: I set session gap to 2 minutes, and use an aggregation window function to keep the amount of in-memory data for each keyed stream to the

Re: Apache Flink: Kafka connector in Python streaming API, “Cannot load user class”

2018-10-11 Thread Dawid Wysakowicz
Hi Kostas, As far as I know you cannot just use java classes from within python API. I think Python API does not provide wrapper for kafka connector. I am adding Chesnay to cc to correct me if I am wrong. Best, Dawid On 11/10/18 12:18, Kostas Evangelou wrote: > Hey all,  > > Thank you so much

Re: Identifying missing events in keyed streams

2018-10-11 Thread Fabian Hueske
I'd go with 2) because the logic is simple and it is (IMO) much easier to understand what is going on and what state is kept. Am Do., 11. Okt. 2018 um 12:42 Uhr schrieb Averell : > Hi Fabian, > > Thanks for the suggestion. > I will try with that support of removing timers. > > I have also tried

Re: getRuntimeContext(): The runtime context has not been initialized.

2018-10-11 Thread Dawid Wysakowicz
Hi Ahmad, Few comments from my side:     1. FoldFunction is deprecated because of many problems, e.g. no possibility to merge contents of windows. Therefore you should at least use the AggregateFunction.     2. I am not sure if you need to store this in RocksDB, do you expect 24millions product

Re: Small checkpoint data takes too much time

2018-10-11 Thread 徐涛
Hi Zhijiang, Thanks for your response. I add the checkpointAlignmentTime, the data shows that the checkpointDuration is about 150s, and the checkpointAlignmentTims is about 4s. There is a big gap between them. Best Henry > 在 2018年10月10日,下午1:26,Zhijiang(wangzhijiang999) > 写道:

Re: [BucketingSink] notify on moving into pending/ final state

2018-10-11 Thread Rinat
Hi Piotr, during the migration to the latest Flink version, we’ve decided to try to contribute this functionality to the master branch. PR is available here https://github.com/apache/flink/pull/6824 More details about hooking the state changes in BucketingSink are available in

Re: User jar is present in the flink job manager's class path

2018-10-11 Thread yinhua.dai
Hi Timo, I didn't tried to configure the classloader order, according to the document, it should only be needed for yarn-session mode, right? I can see the ship files(-yt /path/dir/) is present in job manager's class path, so maybe I should put my uber jar in the -yt path so that it will be

Re: User jar is present in the flink job manager's class path

2018-10-11 Thread Timo Walther
Yes, you are right. I was not aware that the resolution order depends on the cluster deployment. I will loop in Gary (in CC) that might know about such a YARN setup. Regards, Timo Am 11.10.18 um 15:47 schrieb yinhua.dai: Hi Timo, I didn't tried to configure the classloader order, according

Re: User jar is present in the flink job manager's class path

2018-10-11 Thread yinhua.dai
Meanwhile, I can see below code in flink 1.5 public static final ConfigOption CLASSPATH_INCLUDE_USER_JAR = key("yarn.per-job-cluster.include-user-jar") .defaultValue("ORDER") .withDescription("Defines whether user-jars are included

Re: [BucketingSink] notify on moving into pending/ final state

2018-10-11 Thread Dawid Wysakowicz
Hi Ribat, I haven't checked your PR but we introduced a new connector in flink 1.6 called StreamingFileSink that is supposed to replace BucketingSink long term. I think it might solve a few problems of yours. Have you checked it by chance? Best, Dawid On Thu, 11 Oct 2018, 14:10 Rinat, wrote: >