s3 read while writing inside sink function

2017-03-30 Thread Sathi Chowdhury
Hi fellow Flink enthusiasts, I am trying to figure out a recommended way to read s3 data while I am writing to s3 using BucketingSink. BucketingSink s3Sink = new BucketingSink("s3://" + entityBucket + "/") .setBucketer(new EntityCustomBucketer()) .setWriter(new
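For context, a minimal sketch of how such a sink is typically wired up, assuming Flink 1.2's BucketingSink API; entityBucket and EntityCustomBucketer come from the question, while the StringWriter and batch size are illustrative assumptions:

    import org.apache.flink.streaming.connectors.fs.StringWriter;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

    // Hedged sketch: bucket records under s3://<entityBucket>/ and roll files by size.
    BucketingSink<String> s3Sink = new BucketingSink<String>("s3://" + entityBucket + "/");
    s3Sink.setBucketer(new EntityCustomBucketer());      // custom Bucketer from the question
    s3Sink.setWriter(new StringWriter<String>());        // assumption: plain-text output
    s3Sink.setBatchSize(1024 * 1024 * 128);              // assumption: roll at ~128 MB
    stream.addSink(s3Sink);

Note that files being actively written carry in-progress/pending suffixes, which matters for any job reading the same bucket concurrently.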

Monitoring REST API and YARN session

2017-03-30 Thread Mohammad Kargar
How can I access the REST APIs for monitoring when the cluster is launched in a YARN session?
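For anyone with the same question: when Flink runs in a YARN session, the web monitor and its REST endpoints are typically reachable through the YARN ResourceManager's web proxy, along the lines of (host, port, and application id are placeholders):

    http://<resourcemanager-host>:8088/proxy/<application_id>/jobs

This is an assumption based on how the Flink-on-YARN web UI is served; the exact port depends on the ResourceManager configuration.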

Re: Flink 1.2 time window operation

2017-03-30 Thread Dominik Safaric
> First, some remarks here - sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet.

This is for sure true. However, the plot shows the number of records produced per second, where each record was assigned a created at

Re: deploying flink in AWS - some teething issues

2017-03-30 Thread Patrick Lucas
I think Log4j includes a Syslog appender. The log4j config included with Flink just logs to the logs/ dir, but you should be able to modify it (log4j.properties) to suit your needs. -- Patrick Lucas On Thu, Mar 30, 2017 at 2:39 PM, Chakravarthy varaga <chakravarth...@gmail.com> wrote: >
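As a concrete illustration of Patrick's suggestion, a hedged log4j.properties sketch routing Flink's logs to a syslog daemon; the host, facility, and pattern are assumptions to adapt:

    # Route everything to syslog instead of the default file appender.
    log4j.rootLogger=INFO, syslog
    # SyslogAppender ships with Log4j 1.x, which Flink uses by default
    log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
    log4j.appender.syslog.syslogHost=logs.internal.example.com
    log4j.appender.syslog.facility=LOCAL0
    log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
    log4j.appender.syslog.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n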

Async Functions and Scala async-client for mySql/MariaDB database connection

2017-03-30 Thread Andrea Spina
Dear Flink community, I started to use Async Functions in Scala, Flink 1.2.0, in order to retrieve enriching information from a MariaDB database. To do that, I first employed the classical jdbc library (org.mariadb.jdbc) and it worked as expected. Due to the blocking behavior of jdbc, I'm
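For readers unfamiliar with the API in question, a hedged sketch of Flink 1.2's AsyncCollector-based async I/O; NonBlockingMariaDbClient is a hypothetical stand-in for whichever async driver the thread settles on, assumed to return a CompletableFuture:

    import java.util.Collections;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
    import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;

    // Hedged sketch: enrich each id with a row fetched without blocking the task thread.
    public class EnrichFunction extends RichAsyncFunction<String, String> {
        private transient NonBlockingMariaDbClient client; // hypothetical async client

        @Override
        public void open(Configuration parameters) {
            // Create the client per subtask so it is never serialized with the job graph.
            client = new NonBlockingMariaDbClient();
        }

        @Override
        public void asyncInvoke(String id, AsyncCollector<String> collector) {
            client.queryAsync(id).whenComplete((row, err) -> {
                if (err != null) {
                    collector.collect(err); // surface the failure to Flink
                } else {
                    collector.collect(Collections.singletonList(id + "," + row));
                }
            });
        }
    }

    // Wire it in with a timeout and a cap on in-flight requests:
    DataStream<String> enriched = AsyncDataStream.unorderedWait(
        input, new EnrichFunction(), 5000, TimeUnit.MILLISECONDS, 100);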

Re: Cogrouped Stream never triggers tumbling event time window

2017-03-30 Thread Andrea Spina
Dear community, I finally solved the issue I bumped into. Basically, the cause of the problem was the behavior of my input: the incoming rates of the two streams differed widely (the second event type arrived really late and scarcely in event time). The solution I employed was to assign
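Since the fix concerns watermark assignment, a hedged sketch of assigning per-stream timestamps and watermarks ahead of the coGroup; the MyEvent type and five-second bound are assumptions:

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // A coGrouped event-time window only fires once the watermarks of BOTH
    // inputs pass the window end (the operator watermark is the minimum of
    // the two), which is why a late, scarce stream can stall firing.
    DataStream<MyEvent> withTs = events.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(MyEvent e) {
                return e.getCreatedAt(); // epoch millis, assumed field
            }
        });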

Re: Web Dashboard reports 0 for sent/received bytes and records

2017-03-30 Thread Konstantin Gregor
Hi Mohammad, that's normal. The source will never have a value other than zero in the "Records received" column, only in the "Records sent" column. But since your whole program is chained to one subtask, the source will never send anything to another subtask, because there is none. Same for the
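If you just want the dashboard to show non-zero figures to verify data flow, a hedged trick is to break the chain so records actually cross a subtask boundary; MyMapper is a placeholder for any operator in the job:

    // Break operator chaining so the dashboard shows records flowing between
    // subtasks (the counters stay zero inside a single chained subtask).
    env.disableOperatorChaining();                 // globally, or:
    stream.map(new MyMapper()).disableChaining();  // just at one operator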

Web Dashboard reports 0 for sent/received bytes and records

2017-03-30 Thread Mohammad Kargar
Hi, I have a simple job where FlinkKafkaConsumer08 reads data from a topic (with 8 partitions) and a custom Sink implementation inserts the data into an Accumulo database. I can verify that the data is getting inserted into the db by adding a simple System.out.println(...) to the invoke method of the
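For reference, the shape such a custom sink usually takes in Flink 1.2 (a hedged sketch; AccumuloWriter is a hypothetical stand-in for the actual Accumulo client code):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class AccumuloSink extends RichSinkFunction<String> {
        private transient AccumuloWriter writer; // hypothetical client wrapper

        @Override
        public void open(Configuration parameters) throws Exception {
            writer = AccumuloWriter.connect(); // one connection per subtask
        }

        @Override
        public void invoke(String value) throws Exception {
            System.out.println(value); // the debug print from the question
            writer.write(value);       // insert into Accumulo
        }

        @Override
        public void close() throws Exception {
            if (writer != null) writer.close();
        }
    }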

Re: deploying flink in AWS - some teething issues

2017-03-30 Thread Chakravarthy varaga
Hi, With regards to logging (both Flink & application specific logs) within the container, are there best practices that you know of to get the logs to a centralized location? For example, the Flink TM's logs are local inside the container and I don't wish to write to shared/mounted volumes,

OOM on flink when job restarts a lot

2017-03-30 Thread Frits Jalvingh
Hello List, We have a Flink job that reads a Kafka topic, then sends all messages via a SOAP call. We have had a situation where that SOAP call failed every time, causing the job to be RESTARTING every few seconds. After a few hours Flink itself terminates with an OutOfMemoryError. This
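Unrelated to the leak itself, but a hedged way to keep a permanently failing job from restarting every few seconds for hours is to bound the restart strategy; the attempt count and delay below are illustrative:

    import java.util.concurrent.TimeUnit;
    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;

    // Give up after 5 attempts, 30 seconds apart, instead of restarting forever.
    env.setRestartStrategy(
        RestartStrategies.fixedDelayRestart(5, Time.of(30, TimeUnit.SECONDS)));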

Re: Flink 1.2 time window operation

2017-03-30 Thread Tzu-Li (Gordon) Tai
Hi, Thanks for the clarification. What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet? First, some remarks here - sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet. Does

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Tzu-Li (Gordon) Tai
I'm wondering what I can tweak further to increase this. I was reading in this blog: https://data-artisans.com/extending-the-yahoo-streaming-benchmark/ about 3 million per sec with only 20 partitions. So I'm sure I should be able to squeeze more out of it. Not really sure if it is relevant

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Kamil Dziublinski
Thanks Ted, will read about it. While we are on throughput, do you guys have any suggestions on how to optimise Kafka reading from Flink? My current setup: Flink is on 15 machines on YARN; Kafka on 9 brokers with 40 partitions; source parallelism is 40 for Flink. And just for testing I left only
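For concreteness, a hedged sketch of the consumer setup being described; the topic name, connection strings, and schema are placeholders:

    import java.util.Properties;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    Properties props = new Properties();
    props.setProperty("zookeeper.connect", "zk:2181");     // the 0.8 consumer uses ZooKeeper
    props.setProperty("bootstrap.servers", "broker:9092");
    props.setProperty("group.id", "throughput-test");

    // Match source parallelism to the 40 Kafka partitions so every
    // source subtask owns exactly one partition.
    DataStream<String> stream = env
        .addSource(new FlinkKafkaConsumer08<>("events", new SimpleStringSchema(), props))
        .setParallelism(40);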

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Ted Yu
Kamil: In the upcoming HBase 2.0 release, there are more write path optimizations which would boost write performance further. FYI > On Mar 30, 2017, at 1:07 AM, Kamil Dziublinski wrote: > Hey guys, > Sorry for confusion it turned out that I had a bug in

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Kamil Dziublinski
Hey guys, Sorry for the confusion, it turned out that I had a bug in my code: I was not clearing the list in my batch object on each apply call. I forgot that it has to be added, since apply is different from fold. That led to the suspiciously high throughput. When I fixed this I was back to 160k per sec. I'm still
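The bug Kamil describes is easy to hit; a hedged sketch of the pattern with the fix, where Event and Batch are hypothetical types standing in for the thread's actual classes:

    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    // When a window function reuses a buffer across apply() calls, it must be
    // cleared first, otherwise every window re-emits all previous elements and
    // throughput looks inflated.
    public class BatchingWindow
            implements WindowFunction<Event, Batch, String, TimeWindow> {
        private final Batch batch = new Batch(); // reused across calls

        @Override
        public void apply(String key, TimeWindow window,
                          Iterable<Event> input, Collector<Batch> out) {
            batch.clear(); // <-- the missing line that caused the bug
            for (Event e : input) {
                batch.add(e);
            }
            out.collect(batch);
        }
    }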

flink: one transformation ends, the next transformation starts

2017-03-30 Thread rimin515
Hi all, I run a job, it is:

    val data = env.readTextFile("hdfs:///") // DataSet[(String, Array[String])]
    val dataVec = computeDataVect(data) // DataSet[(String, Int, Array[(Int, Double)])]
    val rescomm = computeCosSims

Re: Flink 1.2 time window operation

2017-03-30 Thread Dominik Safaric
Hi Gordon, The job was run using processing time. The Kafka broker version I’ve used was 0.10.1.1. Dominik > On 30 Mar 2017, at 08:35, Tzu-Li (Gordon) Tai wrote: > Hi Dominik, > Was the job running with processing time or event time? If event time, how are you

Re: Flink 1.2 time window operation

2017-03-30 Thread Tzu-Li (Gordon) Tai
Hi Dominik, Was the job running with processing time or event time? If event time, how are you producing the watermarks? Normally to understand how windows are firing in Flink, these two factors would be the place to look at. I can try to further explain this once you provide this info.

Re: Apache Flink Hackathon

2017-03-30 Thread Tzu-Li (Gordon) Tai
Sounds like a cool event! Thanks for sharing this! On March 27, 2017 at 11:40:24 PM, Lior Amar (lior.a...@parallelmachines.com) wrote: Hi all, My name is Lior and I am working at Parallel Machines (a startup company located in the Silicon Valley). We are hosting a Flink Hackathon on April

Re: How to rewind Kafka cursors into a Flink job ?

2017-03-30 Thread Tzu-Li (Gordon) Tai
Hi Dominique, What your plan A is suggesting is that a downstream operator can provide signals to upstream operators and alter their behaviour. In general, this isn’t possible, as in a distributed streaming environment it’s hard to guarantee what records exactly will be altered by the