Event time didn't advance because of some idle slots

2018-07-31 Thread Soheil Pourbafrani
In Flink Event time mode, I use the periodic watermark to advance event time. Every slot extract event time from the incoming message and to emit watermark, subtract it a network delay, say 3000ms. public Watermark getCurrentWatermark() { return new Watermark(MAX_TIMESTAMP - DELEY);

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread Reza Sameei
It's not a real solution; but why you don't change the parallelism for your `SourceFunction`? On Tue, Jul 31, 2018 at 10:51 AM Soheil Pourbafrani wrote: > In Flink Event time mode, I use the periodic watermark to advance event > time. Every slot extract event time from the incoming message and

Pass JVM option (-Dconfig.file) per job in standalone mode?

2018-07-31 Thread Chang Liu
Dear all, I would like to know is there a way to pass JVM options (for example, -Dconfig.file=application.conf) for each submitted flink job? I am using the Config library from lightbend. ./bin/flink run examples/Example.jar -Dconfig.file=/path/application.conf Best regards/祝好, Chang Liu 刘畅

Re: Questions on Unbounded number of keys

2018-07-31 Thread Till Rohrmann
Hi Ashish, FIRE_AND_PURGE should also clear the window state. Yes I mean with active windows, windows which have not been purged yet. Maybe Aljoscha knows more about why the window state is growing (I would not rule out a bug). Cheers, Till On Tue, Jul 31, 2018 at 1:45 PM ashish pok wrote: >

Re: Small-files source - partitioning based on prefix of file

2018-07-31 Thread Averell
Hi Fabian, Thanks for the information. I will try to look at the change to that complex logic that you mentioned when I have time. That would save one more shuffle (from 1 to 0), wouldn't that? BTW, regarding fault tolerant in the file reader task, could you help explain what would happen if the

Re: Questions on Unbounded number of keys

2018-07-31 Thread ashish pok
Thanks Till, I will try to create an instance of app will smaller heap and get a couple of dumps as well. I should be ok to share that on google drive.  - Ashish On Tuesday, July 31, 2018, 7:49 AM, Till Rohrmann wrote: Hi Ashish, FIRE_AND_PURGE should also clear the window state. Yes I mean

Re: Pass JVM option (-Dconfig.file) per job in standalone mode?

2018-07-31 Thread Chang Liu
Hi Dominik, Thanks for your reply. I thought so as well. But I am having some problems with passing program args. Currently, i have the following: object Configs { lazy val CONFIG: Config = ConfigFactory.load() lazy val config1: String = CONFIG.getString("key1") lazy val config2:

Access to Kafka Event Time

2018-07-31 Thread Vishal Santoshi
We have a use case where multiple topics are streamed to hdfsand we would want to created buckets based on ingestion time ( the time the event were pushed to kafka ). Our producers to kafka will set that the event time

Re: Questions on Unbounded number of keys

2018-07-31 Thread ashish pok
Hi Till, Keys are unbounded (a group of events have same key but that key doesnt repeat after it is fired other than some odd delayed events). So basically there 1 key that will be aligned to a window. When you say key space of active windows, does that include keys for windows that have

Re: Committing Kafka Transactions during Savepoint

2018-07-31 Thread Aljoscha Krettek
Hi Scott, Some more clarifications: Doing a stop-with-savepoint will suspend the checkpoint coordinator, meaning that no new checkpoints will happen between taking the savepoint and shutting down the job. This means you will be save from duplicates if you only use savepoints for this.

Re: Access to Kafka Event Time

2018-07-31 Thread Vishal Santoshi
In fact it may be available else where too ( for example ProcessFunction etc ) but do we have no need to create one, it is just a data relay ( kafka to hdfs ) and any intermediate processing should be avoided if possible IMHO. On Tue, Jul 31, 2018 at 9:10 AM, Vishal Santoshi wrote: > We have a

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread vino yang
Hi Soheil, Hequn has given you the usage of this method, see here : https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L639 Thanks, vino. 2018-07-31 17:56 GMT+08:00 Soheil

Re: Converting a DataStream into a Table throws error

2018-07-31 Thread vino yang
Hi Mich, The field specified by the fromDataStream API must match the number of fields contained in the DataStream stream object, your DataStream's type is just a string, example is here.[1] [1]:

Re: Converting a DataStream into a Table throws error

2018-07-31 Thread Hequn Cheng
Hi, Mich You can try adding "import org.apache.flink.table.api.scala._", so that the Symbol can be recognized as an Expression. Best, Hequn On Wed, Aug 1, 2018 at 6:16 AM, Mich Talebzadeh wrote: > Hi, > > I am following this example > > https://ci.apache.org/projects/flink/flink-docs- >

Re: Rest API calls

2018-07-31 Thread yuvraj singh
Hi vino , thanks for the information . But I was looking for the use case where I need to call a web service on the stream . Thanks Yubraj Singh On Wed, Aug 1, 2018, 8:32 AM vino yang wrote: > Hi yuvraj, > > The documentation of Flink REST API is here : >

Re: python vs java api

2018-07-31 Thread vino yang
Hi Nicos, For Checkpoint you can see the API in PythonStreamExecutionEnvironment. But it can not set TimeCharacteristic now. I will create a JIRA issue for this. Thanks, vino. 2018-08-01 0:15 GMT+08:00 Nicos Maris : > Thanks Vino, > > > Comparing functionalities in terms of the

Apache-flink -- Checkpointing to S3 Bucket

2018-07-31 Thread Chargel, Rafael
We have Apache Flink (1.4.2) running on an EMR cluster. We are checkpointing to an S3 bucket, and are pushing about 5,000 records per second through the flows. We recently saw the following error in our logs: java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed

Rest API calls

2018-07-31 Thread yuvraj singh
Hi I have a use case where I need to call rest apis from a flink . I am not getting much context form internet , please help me on this . Thanks

Re: Small-files source - partitioning based on prefix of file

2018-07-31 Thread Fabian Hueske
Hi Averell, please find my answers inlined. Best, Fabian 2018-07-31 13:52 GMT+02:00 Averell : > Hi Fabian, > > Thanks for the information. I will try to look at the change to that > complex > logic that you mentioned when I have time. That would save one more shuffle > (from 1 to 0), wouldn't

Re: [ANNOUNCE] Apache Flink 1.5.2 released

2018-07-31 Thread Bowen Li
Congratulations, community! On Tue, Jul 31, 2018 at 1:44 AM Chesnay Schepler wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.5.2, which is the second bugfix release for the Apache Flink 1.5 > series. > > Apache Flink® is an open-source stream

Re: python vs java api

2018-07-31 Thread Nicos Maris
Thanks Vino, Comparing functionalities in terms of the transformations is clear but what about timestamps and state? On Tue, Jul 31, 2018 at 6:47 PM vino yang wrote: > Hi Nicos, > > You can read the official documentation of latest Python API about > DataStream transformation[1] and latest

Re: My task managers are not starting

2018-07-31 Thread Felipe Gutierrez
Strange. I decreased the max value (TM_MAX_OFFHEAP_SIZE) even it says the slave will not use all space and it worked. # Long.MAX_VALUE in TB: This is an upper bound, much less direct memory will be used # TM_MAX_OFFHEAP_SIZE="8388607T" *--* *-- Felipe Gutierrez* *-- skype:

My task managers are not starting

2018-07-31 Thread Felipe Gutierrez
Hi all, I am deploying a Flink cluster using the version "flink-1.5.2-bin-hadoop28-scala_2.11.tgz". It is one master node and two slave nodes. I have configured the key-ssh between all nodes so I can log in without type the password (also the nodes of the cluster). When I start the cluster it

Re: python vs java api

2018-07-31 Thread vino yang
Hi Nicos, You can read the official documentation of latest Python API about DataStream transformation[1] and latest Java API transformation[2]. However, the latest documentation may not react the new feature especially for Python API, so you can also compare the implementation of

Re: Description of Flink event time processing

2018-07-31 Thread Fabian Hueske
Hi Elias, Sorry for the delay. I just made a pass over the document. I think it is very good. Let's have a look where this fits best into the docs and check if there is duplicate content on other pages that should be removed / reorganized. Best, Fabian 2018-07-31 3:17 GMT+02:00 Elias Levy : >

Converting a DataStream into a Table throws error

2018-07-31 Thread Mich Talebzadeh
Hi, I am following this example https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/table/common.html#integration-with-datastream-and-dataset-api This is my dataStream which is built on a Kafka topic // //Create a Kafka consumer // val dataStream = streamExecEnv

Yahoo Streaming Benchmark on a Flink 1.5 cluster

2018-07-31 Thread Naum Gjorgjeski
Hi, I am trying to run the data Artisans version of the Yahoo Streaming Benchmark. The benchmark applications are written for Flink 1.0.1. However, I need them to run on a Flink 1.5 cluster. When I try to build the benchmark applications with any version of Flink from 1.3.0 or higher, I

Re: Counting elements that appear "behind" the watermark

2018-07-31 Thread Elias Levy
Correct. Context gives you access to the element timestamp . But it also gives you access to the current watermark via timerService

Re: Rest API calls

2018-07-31 Thread vino yang
Hi yuvraj, The documentation of Flink REST API is here : https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/rest_api.html#monitoring-rest-api Thanks, vino. 2018-08-01 3:29 GMT+08:00 yuvraj singh <19yuvrajsing...@gmail.com>: > Hi I have a use case where I need to call rest

[ANNOUNCE] Apache Flink 1.5.2 released

2018-07-31 Thread Chesnay Schepler
|The Apache Flink community is very happy to announce the release of Apache Flink 1.5.2, which is the second bugfix release for the Apache Flink 1.5 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread Fabian Hueske
Hi, If you are using a custom source, you can call SourceContext.markAsTemporarilyIdle() to indicate that a task is currently not producing new records [1]. Best, Fabian 2018-07-31 8:50 GMT+02:00 Reza Sameei : > It's not a real solution; but why you don't change the parallelism for > your

Re: scala IT

2018-07-31 Thread Nicos Maris
Isn't the returns functions deprecated? On Tue, Jul 31, 2018, 5:32 AM vino yang wrote: > Hi Nicos, > > The thrown exception has given you a clear solution hint: > The return type of function 'apply(Mu >ltiplyByTwoTest.scala:43)' could not be determined automatically, due > to type erasure.

Re: 答复: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-31 Thread Till Rohrmann
I think that the web ui automatically redirects to the current leader. So if you should access the JobManager which is not leader, then you should get an HTTP redirect to the current leader. Due to that it should not be strictly necessary to know which of the JobManagers is the leader. The

Re: Questions on Unbounded number of keys

2018-07-31 Thread Till Rohrmann
Hi Ashish, the processing time session windows need to store state in the StateBackends and I assume that your key space of active windows is constantly growing. That could explain why you are seeing an ever increasing memory footprint. But without knowing the input stream and what the UDFs do

Re: watermark VS window trigger

2018-07-31 Thread Fabian Hueske
Hi, Watermarks are not holding back records. Instead they define the event-time at an operator (as Vino said) and can trigger the processing of data if the logic of an operator is based on time. For example, a window operator can emit complete results for a window once the time passed the

Re: Logs are not easy to read through webUI

2018-07-31 Thread Till Rohrmann
Hi Xinyu, thanks for starting this discussion. I think you should open a JIRA issue for this feature. I can see the benefit of such a feature if the DailyRollingAppender is activated. Cheers, Till On Mon, Jul 30, 2018 at 1:47 PM vino yang wrote: > Hi Xinyu, > > Thanks for your suggestion. I

Multiple output operations in a job vs multiple jobs

2018-07-31 Thread anna stax
Hi all, I am not sure when I should go for multiple jobs or have 1 job with all the sources and sinks. Following is my code. val env = StreamExecutionEnvironment.getExecutionEnvironment ... // create a Kafka source val srcstream = env.addSource(consumer) srcstream

Re: scala IT

2018-07-31 Thread vino yang
Hi Nicos, The returns API is not deprecated, just because it is part of the DataStream Java API. Thanks, vino. 2018-07-31 15:15 GMT+08:00 Nicos Maris : > Isn't the returns functions deprecated? > > On Tue, Jul 31, 2018, 5:32 AM vino yang wrote: > >> Hi Nicos, >> >> The thrown exception has

Re: Multiple output operations in a job vs multiple jobs

2018-07-31 Thread vino yang
Hi anna, 1. The srcstream is a very high volume stream and the window size is 2 weeks and 4 weeks. Is the window size a problem? In this case, I think it is not a problem because I am using reduce which stores only 1 value per window. Is that right? *>> Window Size is based on your business

Re: Small-files source - partitioning based on prefix of file

2018-07-31 Thread Fabian Hueske
Hi Averell, The records emitted by the monitoring tasks are "just" file splits, i.e., meta information that defines which data to read from where. The reader tasks receive these splits and process them by reading the corresponding files. You could of course partition the splits based on the file

Process with guava cache

2018-07-31 Thread Juan Gentile
Hello! I’m trying to have a process with a cache (using guava) and following this https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/process_function.html But when I run it I get the following exception: com.esotericsoftware.kryo.KryoException:

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread Hequn Cheng
Hi Soheil, You can set parallelism to 1 to solve the problem. Or use markAsTemporarilyIdle() as Fabian said(the link maybe is

scala cep with event time

2018-07-31 Thread 孙森
Hi,Fabian I am using flink CEP library with event time, but there is no output( the java code performed as expected, but scala did not) .My code is here: object EventTimeTest extends App { val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironment()

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread vino yang
Hi Soheil, The documentation of markAsTemporarilyIdle method is here : https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/api/functions/source/SourceFunction.SourceContext.html#markAsTemporarilyIdle-- Thanks, vino. 2018-07-31 17:14 GMT+08:00 Hequn