Protobuf messages from Kafka to Elasticsearch using Flink

2016-03-08 Thread Madhukar Thota
Friends, can someone guide me or share an example of how to consume protobuf messages from Kafka and index them into Elasticsearch using Flink?
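A minimal sketch of one way this could be wired together with the Flink 1.0 Kafka 0.8 and Elasticsearch connectors (flink-connector-kafka-0.8 and flink-connector-elasticsearch). MyEvent stands in for whatever protobuf-generated class is involved, and the topic, cluster, index and field names are placeholders. The important piece is the custom DeserializationSchema, which parses the raw Kafka bytes with the protobuf-generated parseFrom and reports the produced type to Flink:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSink;
import org.apache.flink.streaming.connectors.elasticsearch.IndexRequestBuilder;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Requests;

public class ProtobufKafkaToEs {

  // MyEvent is a placeholder for a protobuf-generated class with a parseFrom(byte[]) method.
  public static class ProtobufSchema implements DeserializationSchema<MyEvent> {
    @Override
    public MyEvent deserialize(byte[] message) throws IOException {
      return MyEvent.parseFrom(message);
    }

    @Override
    public boolean isEndOfStream(MyEvent nextElement) {
      return false; // the Kafka stream never ends
    }

    @Override
    public TypeInformation<MyEvent> getProducedType() {
      return TypeExtractor.getForClass(MyEvent.class);
    }
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("zookeeper.connect", "localhost:2181");
    props.setProperty("group.id", "protobuf-to-es");

    // Consume protobuf-encoded bytes from Kafka and deserialize them.
    DataStream<MyEvent> events =
        env.addSource(new FlinkKafkaConsumer08<MyEvent>("my-topic", new ProtobufSchema(), props));

    Map<String, String> esConfig = new HashMap<>();
    esConfig.put("cluster.name", "my-cluster");
    esConfig.put("bulk.flush.max.actions", "1");

    // Index each event into Elasticsearch; map the protobuf fields as needed.
    events.addSink(new ElasticsearchSink<MyEvent>(esConfig, new IndexRequestBuilder<MyEvent>() {
      @Override
      public IndexRequest createIndexRequest(MyEvent element, RuntimeContext ctx) {
        Map<String, Object> json = new HashMap<>();
        json.put("payload", element.toString());
        return Requests.indexRequest().index("my-index").type("my-type").source(json);
      }
    }));

    env.execute("protobuf from Kafka to Elasticsearch");
  }
}
```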

JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-08 Thread Prez Cannady
I’m attempting to create a stream using JDBCInputFormat. Objective is to convert each record into a tuple and then serialize for input into a Kafka topic. Here’s what I have so far. ``` val env = StreamExecutionEnvironment.getExecutionEnvironment val inputFormat =

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Maximilian Bode
Hi Aljoscha, yeah I should have been clearer. I did mean those accumulators but am not trusting them in the sense of total number (as you said, they are reset on failure). On the other hand, if they do not change for a while it is pretty obvious that the job has ingested everything in the

Re: Type of TypeVariable could not be determined

2016-03-08 Thread Timo Walther
Hi Radu, the exception can have multiple causes. It would be great if you could share some example code. In most cases the problem is the following: public class MapFunction { } new MapFunction<WhatEverType>(); The type WhatEverType is type-erased by Java. The type
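For reference, a small self-contained sketch (not from the thread) of the usual symptom and fix in the DataStream Java API: a function whose output type parameter is lost to erasure, and an explicit returns(...) hint that hands the type back to Flink. ParseTo is just an illustrative name:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ErasureExample {

  // The output type T does not appear in the input type, so after Java's
  // type erasure Flink cannot infer it from the function alone.
  public static class ParseTo<T> implements MapFunction<String, T> {
    private final T defaultValue;

    public ParseTo(T defaultValue) {
      this.defaultValue = defaultValue;
    }

    @Override
    public T map(String value) {
      return defaultValue; // placeholder logic for the sketch
    }
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<String> input = env.fromElements("1", "2", "3");

    // Without the returns(...) hint this fails with
    // "Type of TypeVariable 'T' ... could not be determined".
    DataStream<Long> parsed = input
        .map(new ParseTo<Long>(0L))
        .returns(Long.class);

    parsed.print();
    env.execute("type erasure example");
  }
}
```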

RE: Type of TypeVariable could not be determined

2016-03-08 Thread Radu Tudoran
Hi, the issue is that this problem appears when I want to create a stream source: StreamExecutionEnvironment.addSource(new MySourceFunction()) … where the stream source class is MySourceFunction implements SourceFunction { … } In such a case I am not sure how I can pass the outer type, nor how
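One common way around this for sources (a sketch, assuming the produced type can be described by a Class) is to let the source report its own output type by implementing ResultTypeQueryable, so that addSource() does not have to rely on erased generics:

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// A generic source that tells Flink its output type explicitly,
// so the TypeExtractor does not have to guess it from erased generics.
public class MySourceFunction<T> implements SourceFunction<T>, ResultTypeQueryable<T> {

  private final Class<T> typeClass;
  private volatile boolean running = true;

  public MySourceFunction(Class<T> typeClass) {
    this.typeClass = typeClass;
  }

  @Override
  public void run(SourceContext<T> ctx) throws Exception {
    while (running) {
      // emit elements via ctx.collect(...) here
      Thread.sleep(1000);
    }
  }

  @Override
  public void cancel() {
    running = false;
  }

  @Override
  public TypeInformation<T> getProducedType() {
    return TypeExtractor.getForClass(typeClass);
  }
}
```

Such a source can then be added with, for example, env.addSource(new MySourceFunction<>(SomeType.class)) without the type extractor complaining.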

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Maximilian Bode
Hi Aljoscha, oh I see. I was under the impression this file was used internally and that the output would be completed at the end. OK, so I extracted the relevant lines using for i in part-*; do head -c $(cat "_$i.valid-length" | strings) "$i" > "$i.final"; done which seems to do the trick.

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Aljoscha Krettek
Hi, a missing part file for one of the parallel sinks is not necessarily a problem. This can happen if that parallel instance of the sink never received data after the job successfully restarted. Missing data, however, is a problem. Maybe I need some more information about your setup: -

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Maximilian Bode
Hi, thanks for the fast answer. Answers inline. > Am 08.03.2016 um 13:31 schrieb Aljoscha Krettek : > > Hi, > a missing part file for one of the parallel sinks is not necessarily a > problem. This can happen if that parallel instance of the sink never received > data after

Re: Flink streaming throughput

2016-03-08 Thread おぎばやしひろのり
Stephan, Sorry for the delay in my response. I tried the 3 cases you suggested. This time, I set parallelism to 1 for simplicity. 0) base performance (same as the first e-mail): 1,480 msg/sec 1) Disable checkpointing: almost the same as 0) 2) No ES sink, just print(): 1,510 msg/sec 3) JSON to TSV:

Re: Flink streaming throughput

2016-03-08 Thread Aljoscha Krettek
Hi, Another interesting test would be a combination of 3) and 2), i.e. no JSON parsing and no sink. This would show what the raw throughput can be before being slowed down by writing to Elasticsearch. Also, .print() is not feasible for production since it just prints every element to the
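A sketch of what such a "no parsing, no sink" measurement could look like, using DiscardingSink so that neither print() output nor Elasticsearch is in the path (the socket source below is only a stand-in for the real Kafka source from the original test):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class ThroughputTest {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Placeholder source: in the real test this would be the Kafka source
    // producing the raw JSON strings, with parsing and the ES sink removed.
    DataStream<String> raw = env.socketTextStream("localhost", 9999);

    // Discard every element: measures how fast the pipeline can run
    // when neither serialization to stdout nor Elasticsearch is in the way.
    raw.addSink(new DiscardingSink<String>());

    env.execute("raw throughput test");
  }
}
```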

Re: Window apply problem

2016-03-08 Thread Wang Yangjun
Hello Marcela, I am not sure what the “parameters mismatch” is here. From the example you have shown, it seems that you just want to do a windowed word count, right? Could you try this code and see if it is what you want? Best, Jun - StreamExecutionEnvironment
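The code in this reply is truncated in the archive; a minimal sketch of a windowed word count in the DataStream Java API (the source and the 5-second window size are assumptions, and the original reply may well have differed in detail):

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowWordCount {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<String> text = env.socketTextStream("localhost", 9999);

    DataStream<Tuple2<String, Integer>> counts = text
        // split each line into (word, 1) pairs
        .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
          @Override
          public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.toLowerCase().split("\\W+")) {
              if (!word.isEmpty()) {
                out.collect(new Tuple2<String, Integer>(word, 1));
              }
            }
          }
        })
        // key by the word (field 0) and sum the counts per 5-second window
        .keyBy(0)
        .timeWindow(Time.seconds(5))
        .sum(1);

    counts.print();
    env.execute("window word count");
  }
}
```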

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Aljoscha Krettek
Hi, are you taking the “.valid-length” files into account? The problem with doing “exactly-once” with HDFS is that before Hadoop 2.7 it was not possible to truncate files. So the trick we’re using is to write the length up to which a file is valid if we would normally need to truncate it. (If

Re: Webclient script misses building from source

2016-03-08 Thread Andrea Sella
I missed it! Thank you, Andrea 2016-03-08 14:27 GMT+01:00 Aljoscha Krettek : > Hi Andrea, > in Flink 1.0 there is no more a separate web client. The web client is > part of the default JobManager dashboard now. > > You can also disable the web client part of the JobManager

Webclient script misses building from source

2016-03-08 Thread Andrea Sella
Hi, I've built Flink from source but I was not able to find the start-webclient.sh script in build-target/bin to launch the WebUI. Is the script only available in the binaries, or do I have to add an argument to trigger its generation? Thanks in advance, Andrea

Re: Webclient script misses building from source

2016-03-08 Thread Aljoscha Krettek
Hi Andrea, in Flink 1.0 there is no longer a separate web client. The web client is now part of the default JobManager dashboard. You can also disable the web client part of the JobManager dashboard by setting: jobmanager.web.submit.enable: false in flink-conf.yaml. Cheers, Aljoscha > On 08 Mar

Re: rebalance of streaming job after taskManager restart

2016-03-08 Thread Aljoscha Krettek
Yes, there are plans to make this more streamlined but we are not there yet, unfortunately. > On 08 Mar 2016, at 16:07, Maciek Próchniak wrote: > > Hi, > > thanks for the quick answer - yes, it does what I want to accomplish, > but I was hoping for some "easier" solution. > Are there

rebalance of streaming job after taskManager restart

2016-03-08 Thread Maciek Próchniak
Hi, we have a streaming job with parallelism 2 and two task managers. The job is occupying one slot on each task manager. When I stop manager2 the job is restarted and it runs on manager1, occupying two of its slots. How can I trigger a restart (or other similar process) that will cause the job

[ANNOUNCE] Flink 1.0.0 has been released

2016-03-08 Thread Kostas Tzoumas
Hi everyone! As you might have noticed, Apache Flink 1.0.0 has been released and announced! You can read more about the release at the ASF blog and the Flink blog - https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces88 -

Re: rebalance of streaming job after taskManager restart

2016-03-08 Thread Aljoscha Krettek
Hi, I think what you can do is make a savepoint of your program, then cancel it and restart it from the savepoint. This should make Flink redistribute it on all TaskManagers. See https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html and

Re: rebalance of streaming job after taskManager restart

2016-03-08 Thread Maciek Próchniak
Hi, thanks for the quick answer - yes, it does what I want to accomplish, but I was hoping for some "easier" solution. Are there any plans for a "restart" button/command or something similar? I mean, the whole process of restarting is ready as I understand it - it is already triggered when a task manager dies.

RE: [ANNOUNCE] Flink 1.0.0 has been released

2016-03-08 Thread Radu Tudoran
Hi, Do you also have a LinkedIn post that I could share, or should I make a blog post in which I take up this announcement? Dr. Radu Tudoran Research Engineer - Big Data Expert IT R&D Division HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research Center Riesstrasse 25, 80992 München E-mail:

Re: Window apply problem

2016-03-08 Thread Marcela Charfuelan
Thanks Jun, very useful. I was confusing the parameters because my input is tuples, which I might not need in the end... I now have what I wanted (non-parallel and not so efficient I guess, any suggestion to improve it is welcome) and I have to modify the trigger so that it returns FIRE_AND_PURGE when it

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Aljoscha Krettek
Hi, with accumulators, do you mean the ones you get from RuntimeContext.addAccumulator/getAccumulator? I’m afraid these are not fault-tolerant, which means that the count in these probably doesn’t reflect the actual number of elements that were processed. When a job fails and restarts the

Re: Window apply problem

2016-03-08 Thread Aljoscha Krettek
Hi, there is also PurgingTrigger, which turns any Trigger into a trigger that also purges when firing. Use it like this: .trigger(PurgingTrigger.of(CountTrigger.of(5))) Cheers, Aljoscha > On 08 Mar 2016, at 17:23, Marcela Charfuelan > wrote: > > Thanks Jun, >
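In context, a sketch of how that could look for a keyed count that fires and purges every 5 elements per key (the global window and the (word, count) tuple shape are assumptions, not from the thread):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;
import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger;

public class PurgingTriggerExample {

  // Assumes an existing DataStream of (word, 1) pairs.
  public static DataStream<Tuple2<String, Integer>> countAndPurge(
      DataStream<Tuple2<String, Integer>> pairs) {
    return pairs
        .keyBy(0)
        .window(GlobalWindows.create())
        // fire every 5 elements per key, and purge the window contents afterwards
        .trigger(PurgingTrigger.of(CountTrigger.of(5)))
        .sum(1);
  }
}
```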

Type of TypeVariable could not be determined

2016-03-08 Thread Radu Tudoran
Hi, I am trying to create a custom stream source. I first built it with generics and ran into problems regarding type extraction. I tried to put in concrete types but ran into the same issue (see errors below). Can anyone provide a solution to this issue? Caused by:

Stack overflow from self referencing Avro schema

2016-03-08 Thread David Kim
Hello all, I'm running into a StackOverflowError using Flink 1.0.0. I have an Avro schema that has a self reference. For example: item.avsc { "namespace": "...", "type": "record", "name": "Item", "fields": [ { "name": "parent", "type": ["null", "Item"] } ] } When

Re: Type of TypeVariable could not be determined

2016-03-08 Thread Wang Yangjun
Hi Radu, I met this issue also. The reason is that outTypeInfo couldn't be created based on a generic type when a transform is applied: public <R> SingleOutputStreamOperator<R, ?> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperator<T, R> operator) The solution would be to pass

Re: [ANNOUNCE] Flink 1.0.0 has been released

2016-03-08 Thread Igor Berman
Congratulations! Very nice work, very interesting features. One question regarding CEP: do you think it's feasible to define a pattern over a window of 1 month or even more? Is there some deep explanation regarding how these partial states are saved? I mean, events that create the "funnel" might be