Fwd: Hi Flink Team

2018-03-01 Thread Ashish Attarde
Hi, I am new to Flink and to stream processing in general. I am using Flink to do real-time correlation between multiple records that arrive as part of the same stream. What I am doing is an "apply" operation on a time-windowed stream. When I submit the job with a parallelism factor of 4, I

Re: Flink Kafka reads too many bytes .... Very rarely

2018-03-01 Thread Fabian Hueske
Hi Phil, I've created a JIRA ticket for the problem that you described and linked it to this thread: FLINK-8820 [1]. Thank you, Fabian [1] https://issues.apache.org/jira/browse/FLINK-8820 2018-02-28 5:13 GMT+01:00 Philip Doctor : > >- The fact that I seem to get all

Getting warning messages (...hdfs.DataStreamer - caught exception) while running Flink with Hadoop as the state backend

2018-03-01 Thread PedroMrChaves
While my Flink job is running I keep getting the following warning message in the log: 2018-02-23 09:08:11,681 WARN org.apache.hadoop.hdfs.DataStreamer - Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method) at

Re: Slow Flink program

2018-03-01 Thread Piotr Nowojski
Hi, First of all, learn what's going on with your job: check the status of the machines and the CPU/network usage on the cluster. If CPU is not ~100%, analyse what is preventing the machines from working faster (network bottleneck, locking, blocking operations etc.). If CPU is ~100%, profile the

Re: Hi Flink Team

2018-03-01 Thread Ashish Attarde
Thanks Piotrek for your response. Teena responded with the same suggestion. I am implementing the changes to try it out. Yes, originally I did call keyBy for that same reason, so that I could parallelize the operation. On Thu, Mar 1, 2018 at 1:24 AM, Piotr Nowojski wrote: > Hi, > > timeWindowAll

Re: logging question

2018-03-01 Thread JP de Vooght
In the docker-compose.yaml I have a volume entry which maps my log4j.properties to /opt/flink/conf/log4j-console.properties. Not pretty, but it works once I determined how Flink was being launched. See below:

    version: "2.1"
    services:
      jobmanager:
        image: flink
        expose:
          - "6123"

Re: Reading csv-files

2018-03-01 Thread Fabian Hueske
Hi Esa, IMO, the easiest approach would be to implement a custom source function that reads the CSV files line-wise (in the correct timestamp order) and extracts timestamps. At the end of each file, you can emit a watermark. The order of files can either be hardcoded or determined from the file
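A minimal sketch of the source Fabian describes, assuming the timestamp is the first CSV column and the files are already listed in timestamp order (both assumptions, not from the thread):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    import org.apache.flink.streaming.api.functions.source.SourceFunction;
    import org.apache.flink.streaming.api.watermark.Watermark;

    // Requires the job to run with TimeCharacteristic.EventTime.
    public class CsvFileSource implements SourceFunction<String> {
        private final String[] files; // assumed to be in timestamp order
        private volatile boolean running = true;

        public CsvFileSource(String... files) { this.files = files; }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            long maxTimestamp = Long.MIN_VALUE;
            for (String file : files) {
                List<String> lines = Files.readAllLines(Paths.get(file));
                for (String line : lines) {
                    if (!running) return;
                    // Assumption: the first column holds the event timestamp.
                    long ts = Long.parseLong(line.split(",")[0]);
                    maxTimestamp = Math.max(maxTimestamp, ts);
                    ctx.collectWithTimestamp(line, ts);
                }
                // End of file: no earlier timestamps can follow, so emit a watermark.
                ctx.emitWatermark(new Watermark(maxTimestamp));
            }
        }

        @Override
        public void cancel() { running = false; }
    }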

Re: Hi Flink Team

2018-03-01 Thread Piotr Nowojski
Hi, timeWindowAll is a non-parallel operation, since it gathers all of the elements and processes them together:
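A minimal sketch of the keyed alternative (class and element names are illustrative, not from the thread): keying the stream first lets the windows be evaluated in parallel per key, whereas timeWindowAll funnels every element through a single task.

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class ParallelWindowSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(4);

            env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3))
                // keyBy partitions the stream, so the window below runs in
                // parallel per key; timeWindowAll() would force a single task.
                .keyBy(0)
                .timeWindow(Time.seconds(5))
                .sum(1)
                .print();

            env.execute("parallel-window-sketch");
        }
    }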

Re: Unexpected hop start & end timestamps after stream SQL join

2018-03-01 Thread Fabian Hueske
Hi Juho, I have to admit I lost track a bit of what you are trying to compute. I also don't understand the problem with the missing ids. The query that you shared in the last mail will return, for each record with a valid s_aid1, s_cid combination, how often that id combination has been seen so far

Re: Questions about the FlinkCEP

2018-03-01 Thread Kostas Kloudas
Hi Esa, The answers to the questions are inlined. > On Feb 28, 2018, at 8:32 PM, Esa Heikkinen wrote: > > Hi > > I have tried to learn FlinkCEP [1], but I have not yet found clear > answers to these questions: > 1) Whether the pattern of CEP is meant only for one data

Re: Questions about the FlinkCEP

2018-03-01 Thread Kostas Kloudas
Hi, So yes, you can do it with IterativeConditions. Cheers, Kostas > On Mar 1, 2018, at 1:15 PM, Esa Heikkinen > wrote: > > > Hi > > 6) I meant that in the first step the CEP pattern queries value for “Id” and > stores the value to (global) variable for
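A minimal sketch of that approach (the Event POJO and its fields are assumptions): the condition on the second pattern step reads the event matched at the first step through the iterative condition's context, instead of a global variable.

    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.IterativeCondition;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;

    public class IdCorrelationSketch {
        // Assumed event type, not from the thread.
        public static class Event {
            public String type;
            public String id;
        }

        public static Pattern<Event, ?> pattern() {
            return Pattern.<Event>begin("first")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event e) {
                        return "start".equals(e.type);
                    }
                })
                .followedBy("second")
                .where(new IterativeCondition<Event>() {
                    @Override
                    public boolean filter(Event e, Context<Event> ctx) throws Exception {
                        // Compare against the "Id" of the event matched at step "first".
                        for (Event first : ctx.getEventsForPattern("first")) {
                            if (first.id.equals(e.id)) {
                                return true;
                            }
                        }
                        return false;
                    }
                });
        }
    }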

RE: Questions about the FlinkCEP

2018-03-01 Thread Esa Heikkinen
Hi 6) I meant that in the first step the CEP pattern queries the value of “Id” and stores it in a (global) variable for later use in the same pattern, or even elsewhere in the application. Is this possible? Best, Esa From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] Sent:

RE: Reading csv-files

2018-03-01 Thread Esa Heikkinen
Hi Should the custom source function be written in Java rather than Scala, like in the RideCleansing exercise? Best, Esa From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Thursday, March 1, 2018 11:23 AM To: Esa Heikkinen Cc: user@flink.apache.org Subject: Re:

Configuring S3 with Flink and Mesos

2018-03-01 Thread Guillaume Balaine
Hi Flink fellas, I am painfully trying to get the TaskManager launched by the MesosTaskManagerRunner to recognize the fs.hdfs.hadoopconf property. It seems the HadoopUtils class is unable to recognize the possibleHadoopConfPath of fs.hdfs.hadoopconf / core-site.xml, despite it being mounted in

Re: Reading csv-files

2018-03-01 Thread Fabian Hueske
That does not matter. 2018-03-01 13:32 GMT+01:00 Esa Heikkinen : > Hi > > Should the custom source function be written in Java rather than Scala, like > in the RideCleansing exercise? > > Best, Esa > > *From:* Fabian Hueske [mailto:fhue...@gmail.com] >

Re: Slow Flink program

2018-03-01 Thread Supun Kamburugamuve
Yes, the program runs fine, I can see it on the UI. Sorry, I didn't include the part where execute() is called. Thanks, Supun. On Thu, Mar 1, 2018 at 10:27 AM, Fabian Hueske wrote: > Are you sure the program is doing anything at all? > Do you call execute() on the

Re: Slow Flink program

2018-03-01 Thread Supun Kamburugamuve
Thanks Piotrek, I did it this way on purpose to see how Flink performs. With 128000 messages it takes an unreasonable amount of time for Flink to complete the operation. With another framework the same operation completes in about 70 seconds for 1000 messages of size 128000, while Flink takes

Does Flink support Hadoop (HDFS) 2.9 ?

2018-03-01 Thread Soheil Pourbafrani
?

Re: Flink Kafka reads too many bytes .... Very rarely

2018-03-01 Thread Stephan Ewen
Can you specify exactly where you have that excess of data? Flink basically uses Kafka's standard consumer and passes byte[] unmodified to the DeserializationSchema. Can you help us check whether the "too many bytes" happens already before or after the DeserializationSchema? - If the "too many
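One way to check is sketched below (the wrapper class is illustrative, not from the thread; in older Flink versions DeserializationSchema and SimpleStringSchema live under org.apache.flink.streaming.util.serialization): log the raw record size before delegating to the real schema.

    import java.io.IOException;

    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class SizeLoggingStringSchema implements DeserializationSchema<String> {
        private static final Logger LOG = LoggerFactory.getLogger(SizeLoggingStringSchema.class);
        private final SimpleStringSchema delegate = new SimpleStringSchema();

        @Override
        public String deserialize(byte[] message) throws IOException {
            // Size of the record exactly as the Kafka consumer handed it over,
            // i.e. before any deserialization has happened.
            LOG.info("raw Kafka record: {} bytes", message.length);
            return delegate.deserialize(message);
        }

        @Override
        public boolean isEndOfStream(String nextElement) {
            return false;
        }

        @Override
        public TypeInformation<String> getProducedType() {
            return BasicTypeInfo.STRING_TYPE_INFO;
        }
    }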

Re: Slow Flink program

2018-03-01 Thread Fabian Hueske
Are you sure the program is doing anything at all? Do you call execute() on the StreamExecutionEnvironment? 2018-03-01 15:55 GMT+01:00 Supun Kamburugamuve : > Thanks Piotrek, > > I did it this way on purpose to see how Flink performs. With 128000 > messages it takes an
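For reference, a minimal sketch of the point (job name is illustrative): the API calls only build a lazy plan, and nothing runs until execute() is invoked.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExecuteSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromElements(1, 2, 3).print(); // only declares the pipeline
            env.execute("sanity-check");       // nothing runs until this call
        }
    }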

Re: accessing flink HA cluster with scala shell/zeppelin notebook

2018-03-01 Thread santoshg
Hi Alexis, Were you able to make this work? I am also looking for Zeppelin integration with Flink and this might be helpful. Thanks Santosh

Re: Getting warning messages (...hdfs.DataStreamer - caught exception) while running Flink with Hadoop as the state backend

2018-03-01 Thread PedroMrChaves
No. It is just a log message with no apparent side effects. Best Regards, Pedro Chaves

Re: Slow Flink program

2018-03-01 Thread Supun Kamburugamuve
Is there a way to not go through RocksDB? For this test application it seems unnecessary, as we don't need fault tolerance and this is a streaming case. Thanks, Supun. On Thu, Mar 1, 2018 at 11:55 AM, Stephan Ewen wrote: > Few quick checks: > > - Do you properly set

Question on event time functionality, using Flink in a IoT usecase

2018-03-01 Thread Shailesh Jain
Hi, We're working on problems in the IoT domain and using Flink to address certain use cases (predominantly CEP). There are multiple devices (of the same type, e.g. a temperature sensor) which are continuously pushing events. These (N) devices are distinct and independent data sources, mostly

Re: Does Flink support Hadoop (HDFS) 2.9 ?

2018-03-01 Thread Piotr Nowojski
Hi, You can build Flink against Hadoop 2.9: https://issues.apache.org/jira/browse/FLINK-8177 It seems we will only provide convenience binaries for it starting with 1.5: https://issues.apache.org/jira/browse/FLINK-8363

Re: Slow Flink program

2018-03-01 Thread Stephan Ewen
A few quick checks:
- Do you properly set the parallelism?
- If you start 640 tasks (parallelism) and use the same key for everything, that behaves like parallelism 1 (Piotr mentioned this).
- Do you use the RocksDB state backend? If yes, try the FsStateBackend. It looks like your state
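A minimal sketch of the state-backend switch Stephan suggests (the checkpoint URI is an assumption):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FsBackendSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Heap-based state backend that checkpoints to a file system,
            // avoiding RocksDB's per-access (de)serialization overhead.
            env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
        }
    }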

Re: Which test cluster to use for checkpointing tests?

2018-03-01 Thread Stephan Ewen
@Nico This has nothing to do with the DataSet API. The DataStream API supports finite programs as well. @Ken The issue you are running into is that checkpointing currently works only until the job reaches the point where the pipeline starts to drain out, i.e. when the sources are done. In your

Re: Getting warning messages (...hdfs.DataStreamer - caught exception) while running Flink with Hadoop as the state backend

2018-03-01 Thread Stephan Ewen
Is this happening around a failure/recovery? Flink interrupts threads when canceling user code in order to reset an operator to a checkpoint. On Thu, Mar 1, 2018 at 11:27 AM, PedroMrChaves wrote: > While my Flink job is running I keep getting the following warning

Re: Does Flink support Hadoop (HDFS) 2.9 ?

2018-03-01 Thread Soheil Pourbafrani
I mean Flink 1.4. On Thursday, March 1, 2018, Soheil Pourbafrani wrote: > ?