How to flush all window states after Kafka (0.10.x) topic was removed

2017-09-04 Thread Tony Wei
Hi, I have a simple streaming job that consumes data from Kafka and uses a time window to aggregate it. I am wondering if there is a built-in function to send a max watermark when the consumer finds that the topic is no longer available, so that the window function can flush all state to the sink function. My Kafka
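
To illustrate why a max watermark would flush the windows: in Flink, an event-time window fires once the watermark reaches the window's last timestamp (end - 1), and the maximum watermark carries the timestamp Long.MAX_VALUE. The following is a minimal plain-Java simulation of that rule, not Flink code; the class MaxWatermarkFlushSketch and its methods are hypothetical names invented for this sketch, and it assumes tumbling windows with non-negative timestamps.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of tumbling event-time windows; hypothetical, not Flink code.
final class MaxWatermarkFlushSketch {
    private final long windowSizeMs;
    // Running sum per window, keyed by the window's exclusive end timestamp.
    private final TreeMap<Long, Long> sumsByWindowEnd = new TreeMap<>();

    MaxWatermarkFlushSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Assign a value to its tumbling window (assumes non-negative timestamps).
    void add(long eventTimeMs, long value) {
        long windowEnd = (eventTimeMs / windowSizeMs + 1) * windowSizeMs;
        sumsByWindowEnd.merge(windowEnd, value, Long::sum);
    }

    // Fire and drop every window whose last timestamp (end - 1) is covered by the watermark.
    List<Long> advanceWatermark(long watermark) {
        List<Long> fired = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = sumsByWindowEnd.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (entry.getKey() - 1 > watermark) {
                break; // all remaining windows end later
            }
            fired.add(entry.getValue());
            it.remove();
        }
        return fired;
    }

    public static void main(String[] args) {
        MaxWatermarkFlushSketch windows = new MaxWatermarkFlushSketch(1_000);
        windows.add(100, 1);
        windows.add(900, 2);
        windows.add(1_500, 5);
        windows.add(4_200, 7);
        // A regular watermark fires only the windows it has passed.
        System.out.println(windows.advanceWatermark(999));
        // A watermark of Long.MAX_VALUE covers every window, so all remaining state is emitted.
        System.out.println(windows.advanceWatermark(Long.MAX_VALUE));
    }
}
```

Whether the Kafka consumer can emit such a watermark on its own when a topic disappears is exactly the open question of this thread; the sketch only shows the effect once that watermark arrives.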

Does RocksDB need a dedicated CPU?

2017-09-04 Thread Bowen Li
Hi guys, does RocksDB need a dedicated CPU? Do we need to allocate one CPU for each RocksDB instance when deploying a Flink cluster with the RocksDB state backend? I think there's probably no need since RocksDB is a native 'library', but I want to confirm it with the Flink community. Thanks, Bowen

Process Function

2017-09-04 Thread Navneeth Krishnan
Hi All, I have a streaming pipeline which is keyed by userid and then goes to a flatmap function. I need to clear the state after some time and I was looking at the process function for it. Inside the processElement function, if I register a timer, wouldn't it create a timer for each incoming message? // s
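
As a rough illustration of the timer question: to my understanding, Flink keeps at most one timer per key and timestamp, so registering the same timestamp repeatedly for one key does not multiply timers, and rounding the target time lets several messages share a timer. The sketch below is a plain-Java stand-in that mimics this behaviour with a set; TimerRegistrySketch, Timer, and coalesce are hypothetical names, the one-second rounding and 60-second TTL are assumptions, and none of it is Flink's TimerService.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java stand-in for a keyed timer registry; hypothetical, not Flink's TimerService.
final class TimerRegistrySketch {

    // Timer identity: one entry per (key, timestamp) pair.
    static final class Timer {
        final String key;
        final long timestamp;

        Timer(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Timer)) {
                return false;
            }
            Timer other = (Timer) o;
            return key.equals(other.key) && timestamp == other.timestamp;
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, timestamp);
        }
    }

    private final Set<Timer> timers = new HashSet<>();

    // Registering the same (key, timestamp) twice keeps a single timer.
    void register(String key, long timestamp) {
        timers.add(new Timer(key, timestamp));
    }

    int size() {
        return timers.size();
    }

    // Round the cleanup time up to the next full second so that
    // messages arriving close together share one timer.
    static long coalesce(long eventTimeMs, long ttlMs) {
        long target = eventTimeMs + ttlMs;
        return ((target + 999) / 1000) * 1000;
    }

    public static void main(String[] args) {
        TimerRegistrySketch perMessage = new TimerRegistrySketch();
        TimerRegistrySketch coalesced = new TimerRegistrySketch();
        long[] arrivals = {1_000, 1_200, 1_700, 2_100};
        for (long arrival : arrivals) {
            perMessage.register("user-1", arrival + 60_000);
            coalesced.register("user-1", coalesce(arrival, 60_000));
        }
        System.out.println(perMessage.size() + " timers without rounding, "
                + coalesced.size() + " with rounding");
    }
}
```

The trade-off in this sketch is precision: state is cleared up to one second later than the exact TTL in exchange for fewer timers per key.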

State Maintenance

2017-09-04 Thread Navneeth Krishnan
Hi All, I have a couple of questions regarding state maintenance in Flink. - I have a connected stream and then a keyBy operator followed by a flatmap function. I use MapState and keys get added by data from stream1 and removed by messages from stream2. Stream2 acts as a control stream in my pipelin

Re: Rest API for Checkpoint Data

2017-09-04 Thread sohimankotia
Can you point me to code to deserialize this data? That would be great. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

LatencyMarker

2017-09-04 Thread aitozi
I am wondering whether backpressure will increase the latency of the LatencyMarker. Can the latency number be used to monitor backpressure?

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-09-04 Thread Stefan Richter
Hi Jared, I just wanted to follow up on this problem that you reported. Are there any new insights about this problem from your debugging efforts and does it still exist for you? Best, Stefan > On 09.07.2017 at 18:37, Jared Stehler wrote: > > We are using the rocksDB state backend. We h

Re: Rest API for Checkpoint Data

2017-09-04 Thread Aljoscha Krettek
Hi, This is the serialised data for the checkpoint. This is not meant to be user readable, though. There are some plans to make savepoints adhere to a specific schema, so that you could inspect savepoints (which are similar to checkpoints, but with additional metadata [1]). I'm not sure whether

Re: ANN: TouK Nussknacker - designing Flink processes with GUI

2017-09-04 Thread Maciek Próchniak
Hi Flavio, we have rather modest goals - so currently we don't plan to handle batch (although in theory it could be done). We also don't even think about visualizing and editing existing programs - in general this could be pretty difficult. We only let users define pretty simple diagrams

Re: count + aggregation

2017-09-04 Thread Fabian Hueske
Hi Alieh, I'm not aware of a solution to the first problem, but for the second issue you should use maxBy() instead of max(). Best, Fabian 2017-09-04 16:08 GMT+02:00 Alieh: > Hello all, > > 1st question: > Is there any way to know the count or the content of a "Flink DataSet" > without using co
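
The difference behind this advice can be shown without Flink: a field-wise maximum only guarantees the value of the aggregated field, while a "max by" selection returns the whole element that holds the maximum. The following plain-Java sketch illustrates that distinction; Row, maxField, and maxBy are hypothetical names, and the choice to carry over the first row's name in maxField is an assumption made only to make the effect visible, not a statement about which value Flink keeps.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of field-wise max versus selecting the max element; not Flink code.
final class MaxVsMaxBySketch {

    static final class Row {
        final String name;
        final int score;

        Row(String name, int score) {
            this.name = name;
            this.score = score;
        }

        @Override
        public String toString() {
            return "(" + name + ", " + score + ")";
        }
    }

    // Field-wise max: only the score is meaningful. The name is simply
    // carried over from the first row, so it may not belong to the max score.
    static Row maxField(List<Row> rows) {
        Row acc = rows.get(0);
        for (Row row : rows) {
            acc = new Row(acc.name, Math.max(acc.score, row.score));
        }
        return acc;
    }

    // Max-by: return the complete row that holds the highest score.
    static Row maxBy(List<Row> rows) {
        Row best = rows.get(0);
        for (Row row : rows) {
            if (row.score > best.score) {
                best = row;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("a", 3), new Row("b", 7), new Row("c", 5));
        System.out.println("field-wise max: " + maxField(rows));
        System.out.println("max by score:   " + maxBy(rows));
    }
}
```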

Re: Securing Flink Monitoring REST API

2017-09-04 Thread Fabian Hueske
Hi, you can configure SSL for Flink's network communication [1] (see jobmanager.web.ssl.enabled). However, Flink does not yet manage different user accounts or allow granting permissions. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/security-ssl.html 2017-

count + aggregation

2017-09-04 Thread Alieh
Hello all, 1st question: Is there any way to know the count or the content of a "Flink DataSet" without using count() or collect()? The problem is that I have a loop whose number of iterations depends on the count of a DataSet. Using count() may force the whole pipeline to be executed again

Re: ANN: TouK Nussknacker - designing Flink processes with GUI

2017-09-04 Thread Flavio Pompermaier
Thanks for sharing this nice tool, Maciek. Does it handle both batch and streaming? Is it also able to visualize an existing Flink program? Best, Flavio On Mon, Sep 4, 2017 at 3:03 PM, Maciek Próchniak wrote: > Hello, > > we would like to announce availability of TouK Nussknacker - tool for > cr

ANN: TouK Nussknacker - designing Flink processes with GUI

2017-09-04 Thread Maciek Próchniak
Hello, we would like to announce the availability of TouK Nussknacker - a tool for creating Flink processes with a GUI. It's our attempt to bring Flink a bit closer to analysts and business people - by letting them design processes with an MS Visio-like tool ;) Of course not every Flink process can b

Flink Job Deployment (Not enough resources)

2017-09-04 Thread Rinat
Hi everyone, I’ve got the following problem: when I try to submit a new job and the cluster does not have enough resources, job submission fails with the following exception. But in YARN the job hangs and waits for the requested resources. When resources become available, the job runs successfully. What can I

Flink Job Deployment

2017-09-04 Thread Rinat
Hi folks! I’ve got a question about running a Flink job on top of YARN. Is there any possibility to store job sources in HDFS, for example /app/flink/job-name/ - /lib/*.jar - /etc/*.properties, and specify directories that should be added to the job classpath? Thx.

Securing Flink Monitoring REST API

2017-09-04 Thread avivros
What is the best way to secure the Monitoring REST API? I am using the monitoring REST API in a production environment (starting/stopping jobs, etc.). I should only allow authenticated calls to be executed (called from a Java server process). What's the best way to go about this ( Kerberos? SSL

Re: Rest API for Checkpoint Data

2017-09-04 Thread sohimankotia
I tried to use the file system as the backend store. env.setStateBackend(new FsStateBackend("file:///home/user/Desktop/data/flink/checkpoints")); After running the job locally, I see a folder named *a2716e2992bfb6f8796347328ec23c82* under the directory /home/user/Desktop/data/flink/checkpoints. What

Re: Streaming job gets slower and slower

2017-09-04 Thread Till Rohrmann
Hi Aparup, the slow-down can have multiple reasons. One reason could be that your computation in Timeseries-Analytics becomes more complex over time and therefore it slows down resulting in back pressure at the sources. This could be, for example, caused by accumulating a large state. Here the que

Re: How to fill flink's datastream

2017-09-04 Thread AndreaKinn
Hi, I tried to remove the returns function but if I do it, the program returns an error (curious since the return value is a Double). I'm absolutely sure env.execute() is called because I see other streams printed. The program is connected, I followed exactly the example shown in the library, I

Re: termination of stream#iterate on finite streams

2017-09-04 Thread Xingcan Cui
Hi Peter, That's a good idea, but may not be applicable with an iteration operator. The operator can not determine when to generate the "end-of-stream message" for the feedback stream. The provided function (e.g., filter(_ > 0).map(_ - 1)) is stateless and has no side-effects. Best, Xingcan On

Re: Quick start guide

2017-09-04 Thread Aljoscha Krettek
Hi, Yes, I think this is wrong in the doc and should refer to the TaskManager logs. Best, Aljoscha > On 4. Sep 2017, at 03:56, Michael Fong wrote: > > Hi, > > I was following the quick start guide on official documents >

Re: How to fill flink's datastream

2017-09-04 Thread Fabian Hueske
Hi Andrea, a MapFunction calls its map() function for each stream element and returns exactly one result value. MapFunctions are used for 1-to-1 transformations. The returns() method allows you to specify the return type of an operator, in your case the MapOperator. It is only necessary if Flink canno

Optimized-Drizzle vs Flink

2017-09-04 Thread Biplob Biswas
I just came across this slide deck about Drizzle, where they claim to achieve sub-millisecond latency and compare it with Flink: https://www.slideshare.net/SparkSummit/drizzlelow-latency-execution-for-apache-spark-spark-summit-east-talk-by-shivaram-venkataraman The normal Drizzle still performs a bi

Re: Distributed reading and parsing of protobuf files from S3 in Apache Flink

2017-09-04 Thread Fabian Hueske
Hi, readFile() requests a FileInputFormat, i.e., your custom InputFormat would need to extend FileInputFormat. In general, any InputFormat decides what to read when generating InputSplits. In your case, the createInputSplits() method should return one InputSplit for each file it wants to rea
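
The "one split per file" idea can be sketched in plain Java as follows. This is not Flink's FileInputFormat or InputSplit API: OneSplitPerFileSketch, FileSplit, and createSplits are hypothetical stand-ins, and the paths in main are made-up examples. It only shows the shape of the mapping, where each file becomes exactly one numbered unit of work that a parallel reader can pick up as a whole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of "one split per file"; hypothetical stand-in, not Flink's InputSplit API.
final class OneSplitPerFileSketch {

    // A unit of work covering exactly one whole file.
    static final class FileSplit {
        final int splitNumber;
        final String path;

        FileSplit(int splitNumber, String path) {
            this.splitNumber = splitNumber;
            this.path = path;
        }

        @Override
        public String toString() {
            return "split " + splitNumber + " -> " + path;
        }
    }

    // Create one split per input file, numbered in listing order,
    // so a file is never divided between two readers.
    static List<FileSplit> createSplits(List<String> filePaths) {
        List<FileSplit> splits = new ArrayList<>();
        for (int i = 0; i < filePaths.size(); i++) {
            splits.add(new FileSplit(i, filePaths.get(i)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up example paths, used only for illustration.
        List<String> files = Arrays.asList("s3://example-bucket/part-0.pb", "s3://example-bucket/part-1.pb");
        for (FileSplit split : createSplits(files)) {
            System.out.println(split);
        }
    }
}
```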