FlinkKafkaConsumer configuration to consume from Multiple Kafka Topics

2018-07-17 Thread sagar loke
Hi, We have a use case where we are consuming from more than a hundred Kafka topics. Each topic has a different number of partitions. As per the documentation, to parallelize a Kafka topic we need to use setParallelism() == the number of Kafka partitions for that topic. But if we are consuming multiple
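For context, Flink does not require the consumer's parallelism to equal the partition count; topic-partitions are distributed across the available consumer subtasks. A toy plain-Java sketch of such a distribution (the round-robin rule is an illustrative simplification, not Flink's exact assignment algorithm, and the topic names are hypothetical):

```java
import java.util.*;

public class PartitionAssignment {
    // Assign each topic-partition to one of `parallelism` subtasks (round-robin).
    static Map<Integer, List<String>> assign(List<String> topicPartitions, int parallelism) {
        Map<Integer, List<String>> bySubtask = new HashMap<>();
        for (int i = 0; i < topicPartitions.size(); i++) {
            bySubtask.computeIfAbsent(i % parallelism, k -> new ArrayList<>())
                     .add(topicPartitions.get(i));
        }
        return bySubtask;
    }

    public static void main(String[] args) {
        // Two topics with different partition counts, consumed by 2 subtasks.
        List<String> tps = Arrays.asList(
            "topicA-0", "topicA-1", "topicA-2", "topicB-0", "topicB-1");
        Map<Integer, List<String>> plan = assign(tps, 2);
        System.out.println(plan.get(0)); // [topicA-0, topicA-2, topicB-1]
        System.out.println(plan.get(1)); // [topicA-1, topicB-0]
    }
}
```

The point is that a single source with parallelism 2 can still cover 5 partitions; matching parallelism to partitions is only needed to avoid subtasks reading more than one partition each.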

Re: Parallel stream partitions

2018-07-17 Thread Ken Krugler
Hi Nick, > On Jul 17, 2018, at 9:09 AM, Nicholas Walton wrote: > Suppose I have a data stream of tuples with the sequence of ticks being 1,2,3,… for each separate k. > I understand that keyBy(2) I think you want keyBy(1), since it's 0-based. > will partition
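Ken's point about 0-based field positions can be illustrated outside Flink; a toy plain-Java grouping by tuple field index 1, i.e. the second field (the data and names are hypothetical, not from the thread):

```java
import java.util.*;
import java.util.stream.*;

public class KeyByIndex {
    public static void main(String[] args) {
        // Tuples of (tick, k). Field positions are 0-based, so the second
        // field (k) is selected with index 1, not 2.
        List<int[]> tuples = Arrays.asList(
            new int[]{1, 7}, new int[]{2, 7}, new int[]{1, 9}, new int[]{2, 9});

        // Group ticks by the key field at index 1 — the analogue of keyBy(1).
        Map<Integer, List<Integer>> byKey = tuples.stream()
            .collect(Collectors.groupingBy(t -> t[1],
                     Collectors.mapping(t -> t[0], Collectors.toList())));

        System.out.println(byKey.get(7)); // ticks for k = 7
        System.out.println(byKey.get(9)); // ticks for k = 9
    }
}
```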

Keeping only latest row by key?

2018-07-17 Thread Porritt, James
In Spark, if I want to get a set of unique rows by id, using the criterion of keeping the row with the latest timestamp, I would do the following: .withColumn("rn", F.row_number().over(
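The Spark row_number-over-window approach boils down to keeping the max-timestamp row per key; a minimal plain-Java sketch of that core logic (class and field names are hypothetical, not from the thread):

```java
import java.util.*;

public class LatestByKey {
    static final class Row {
        final String id; final long ts; final String value;
        Row(String id, long ts, String value) { this.id = id; this.ts = ts; this.value = value; }
    }

    // Keep only the row with the largest timestamp per id.
    static Map<String, Row> latest(List<Row> rows) {
        Map<String, Row> out = new HashMap<>();
        for (Row r : rows) {
            // merge keeps whichever of the old and new row has the later timestamp
            out.merge(r.id, r, (a, b) -> a.ts >= b.ts ? a : b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", 1, "old"), new Row("a", 5, "new"), new Row("b", 3, "only"));
        Map<String, Row> m = latest(rows);
        System.out.println(m.get("a").value); // new
        System.out.println(m.get("b").value); // only
    }
}
```

In streaming Flink the same effect is typically achieved with keyed state rather than a batch group-by, but the per-key "keep the max" reduction is identical.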

Re: clear method on Window Trigger

2018-07-17 Thread Hequn Cheng
Hi Soheil, The clear() method performs any action needed upon removal of the corresponding window; it is called when a window is purged. The difference between FIRE and FIRE_AND_PURGE is that FIRE only triggers the computation, while FIRE_AND_PURGE triggers the computation and clears the elements in
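The FIRE vs FIRE_AND_PURGE distinction can be sketched with a toy window buffer in plain Java (this is not the Flink Trigger API, purely an illustration of the retain-vs-clear semantics):

```java
import java.util.*;

public class FireVsPurge {
    static final List<Integer> buffer = new ArrayList<>();

    // FIRE: emit a result but keep the window contents for later firings.
    static int fire() { return buffer.stream().mapToInt(Integer::intValue).sum(); }

    // FIRE_AND_PURGE: emit a result, then clear the window contents.
    static int fireAndPurge() { int s = fire(); buffer.clear(); return s; }

    public static void main(String[] args) {
        buffer.addAll(Arrays.asList(1, 2, 3));
        System.out.println(fire());          // 6; elements retained
        System.out.println(fire());          // 6 again, same elements
        System.out.println(fireAndPurge());  // 6, then the buffer is emptied
        buffer.add(4);
        System.out.println(fire());          // 4; earlier elements were purged
    }
}
```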

Parallel stream partitions

2018-07-17 Thread Nicholas Walton
Suppose I have a data stream of tuples with the sequence of ticks being 1,2,3,… for each separate k. I understand that keyBy(2) will partition the stream so each partition has the same key in each tuple. I now have a sequence of functions to apply to the streams, say f(), g() and h(), in that

Re: clear method on Window Trigger

2018-07-17 Thread vino yang
Hi Soheil, Did you read the documentation about Flink Window/Trigger [1]? FIRE_AND_PURGE is usually used to implement the count window. Flink provides a PurgingTrigger as a wrapper for other triggers so that those triggers can purge. One use case of this class is the count window [2][3]. About your

Re: FlinkCEP and scientific papers ?

2018-07-17 Thread vino yang
Hi Esa, AFAIK, the earlier Flink CEP refers to the paper "Efficient Pattern Matching over Event Streams" [1]. Flink absorbed two major ideas from this paper: 1. the NFA-b model on event streams; 2. a shared versioned match buffer, which is an optimized data structure. To Till and Chesnay: Did I miss

[ANNOUNCE] Program for Flink Forward Berlin 2018 has been announced

2018-07-17 Thread Fabian Hueske
Hi everyone, I'd like to announce the program for Flink Forward Berlin 2018. The program committee [1] assembled a program of about 50 talks on use cases, operations, ecosystem, tech deep dive, and research topics. The conference will host speakers from Airbnb, Amazon, Google, ING, Lyft,

Re: flink javax.xml.parser Error

2018-07-17 Thread antonio saldivar
If somebody is facing this issue, I solved it by adding the exclusion to my POM.xml, and I am also using javax.xml: org.apache.flink:flink-core:1.4.2 with an exclusion for xml-apis:xml-apis, plus a javax.xml:jaxb-api:2.1 dependency. On Mon., 16

Re: AvroInputFormat NullPointerException issues

2018-07-17 Thread vino yang
Hi Porritt, OK, it looks good. Thanks, vino. 2018-07-17 23:13 GMT+08:00 Porritt, James : > I got to the bottom of this – it was a namespace issue. My schema was; > > { > > "type" : "record", > > "name" : "MyAvroSchema", > > "fields" : [ { > > "name" : "a", > > "type" : [ "null",

RE: AvroInputFormat NullPointerException issues

2018-07-17 Thread Porritt, James
I got to the bottom of this – it was a namespace issue. My schema was: { "type" : "record", "name" : "MyAvroSchema", "fields" : [ { "name" : "a", "type" : [ "null", "int" ] }, { "name" : "b", "type" : [ "null", "string" ] }] } But actually, I was putting the generated
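For readers hitting the same problem: the usual fix is to add a namespace to the schema so it matches the Java package of the generated class (the package name below is a hypothetical example, not from the thread):

```json
{
  "type": "record",
  "name": "MyAvroSchema",
  "namespace": "com.example.avro",
  "fields": [
    { "name": "a", "type": [ "null", "int" ] },
    { "name": "b", "type": [ "null", "string" ] }
  ]
}
```

With the namespace present, avro-tools generates the class in that package, and the reader's expected schema lines up with the writer's at deserialization time.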

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-17 Thread Zhijiang(wangzhijiang999)
Hi Gerard, From the jstack you provided, the task is serializing the output record, and during this process it will not process any more input data. This stack does not indicate an out-of-memory issue. And if the output buffer is exhausted, the task will be blocked on

clear method on Window Trigger

2018-07-17 Thread Soheil Pourbafrani
Hi, Can someone elaborate on when the clear method on the Trigger class will be called, and what its duty is? Also, I don't know the benefit of FIRE_AND_PURGE over FIRE, or its use case. For example, in a scenario where we have a count-of-3 window that also will trigger after a

RE: AvroInputFormat NullPointerException issues

2018-07-17 Thread Porritt, James
My MyAvroSchema class is as follows. It was generated using avro-tools: /** * Autogenerated by Avro * * DO NOT EDIT DIRECTLY */ import org.apache.avro.specific.SpecificData; import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; import

FlinkCEP and scientific papers ?

2018-07-17 Thread Esa Heikkinen
Hi, I don't know if this is the correct forum to ask, but do there exist some good scientific papers about FlinkCEP (Complex Event Processing)? I know Flink is based on Stratosphere, but what is FlinkCEP based on? BR Esa

RequiredParameters in Flink 1.5.1

2018-07-17 Thread Flavio Pompermaier
Hi to all, I'm trying to migrate a job from Flink 1.3.1 to 1.5.1, but it seems that RequiredParameters and ParameterTool work differently than before... My code is the following: ParameterTool parameters = ParameterTool.fromArgs(args); RequiredParameters required = new RequiredParameters();

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread 陈梓立
Yes I can see it now. Thank you all! Till Rohrmann wrote on Tue, Jul 17, 2018 at 7:53 PM: > Yes, pulling from https://git-wip-us.apache.org/repos/asf/flink.git > should show you the release-1.6 branch. > > Cheers, > Till > > On Tue, Jul 17, 2018 at 10:37 AM Chesnay Schepler > wrote: > >> The release-1.6

[ANNOUNCE] Weekly community update #29

2018-07-17 Thread Till Rohrmann
Dear community, this is the weekly community update thread #29. Please post any news and updates you want to share with the community to this thread. # Feature freeze Flink 1.6 The Flink community has cut off the release branch for Flink 1.6 [1]. From now on, the community will concentrate on

Serialization questions

2018-07-17 Thread Flavio Pompermaier
Hi to all, I was trying to check whether our jobs are properly typed or not. I've started disabling generic types[1] in order to discover untyped transformations and so I added the proper returns() to operators. Unfortunately there are jobs where we serialize Thrift and DateTime objects, so I

Re: StateMigrationException when switching from TypeInformation.of to createTypeInformation

2018-07-17 Thread Till Rohrmann
Hi Elias, I think introducing a new state and deprecating the old one is currently the only way to solve this problem. The community is currently working on supporting state evolution [1]. With this feature it should be possible to change serializers between two savepoints. Unfortunately,

Object reuse in DataStreams

2018-07-17 Thread Urs Schoenenberger
Hi all, we came across some interesting behaviour today. We enabled object reuse on a streaming job that looks like this: stream = env.addSource(source) stream.map(mapFnA).addSink(sinkA) stream.map(mapFnB).addSink(sinkB) Operator chaining is enabled, so the optimizer fuses all operations into a
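The hazard with object reuse plus chained operators sharing one upstream stream can be shown with a toy aliasing example in plain Java (not Flink code, purely illustrative): when the runtime hands the same mutable instance to both map functions, a mutation in one silently changes what the other already stored.

```java
import java.util.*;

public class ObjectReuseHazard {
    static final class Event { int value; }

    public static void main(String[] args) {
        // With object reuse, the runtime may hand the SAME instance downstream
        // for successive records instead of fresh copies.
        Event reused = new Event();
        List<Event> seenByA = new ArrayList<>();

        // mapFnA keeps a reference to the (reused) input object...
        reused.value = 1;
        seenByA.add(reused);

        // ...then the next record is written into the same instance.
        reused.value = 2;

        // mapFnA's stored element has silently changed underneath it:
        System.out.println(seenByA.get(0).value); // 2, not the original 1
    }
}
```

This is why object reuse is only safe when operators neither mutate their inputs nor hold references to them across invocations.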

Re: rowTime from json nested timestamp field in SQL-Client

2018-07-17 Thread Timo Walther
Hi Ashwin, if you quickly want to make this work you can look into org.apache.flink.table.descriptors.RowtimeValidator#getRowtimeComponents. This is the component that converts the string property into a org.apache.flink.table.sources.tsextractors.TimestampExtractor. You can implement your

Re: rowTime from json nested timestamp field in SQL-Client

2018-07-17 Thread Ashwin Sinha
Hi Timo, We want to add this functionality in a forked branch. Can you guide us with steps to quickly apply a patch/fix for the same? On Mon, Jul 16, 2018 at 9:06 PM Ashwin Sinha wrote: > Thanks Timo for the clarification, but our processing actually involves > aggregations on huge past data

Re: Flink CLI properties with HA

2018-07-17 Thread Till Rohrmann
Hi Sampath, technically the client does not need to know the `high-availability.storageDir` to submit a job. However, due to how we construct the ZooKeeperHaServices it is still needed. The reason behind this is that we use the same services for the server and the client. Thus, the implementation

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread Till Rohrmann
Yes, pulling from https://git-wip-us.apache.org/repos/asf/flink.git should show you the release-1.6 branch. Cheers, Till On Tue, Jul 17, 2018 at 10:37 AM Chesnay Schepler wrote: > The release-1.6 branch exists ( >

Re: Flink CLI properties with HA

2018-07-17 Thread vino yang
Hi Sampath, It seems the Flink CLI for standalone mode would not access *high-availability.storageDir*. What's the exception stack trace in your environment? Thanks, vino. 2018-07-17 15:08 GMT+08:00 Sampath Bhat : > Hi vino > > Should the flink CLI have access to the path mentioned in >

Re: Global latency metrics

2018-07-17 Thread Chesnay Schepler
No, you can only get the latency for each operator. For starters, how would a global latency even account for multiple sources/sinks? On 17.07.2018 10:22, shimin yang wrote: Hi All, Is there a method to get the global latency directly? Since I only find the latency for each operator in the

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread Chesnay Schepler
The release-1.6 branch exists (https://git-wip-us.apache.org/repos/asf?p=flink.git;a=shortlog;h=refs/heads/release-1.6), but wasn't synced to GitHub yet. On 17.07.2018 09:33, Timo Walther wrote: Hi Tison, I guess this was a mistake that will be fixed soon. Till (in CC) forked off the

Re: Flink WindowedStream - Need assistance

2018-07-17 Thread Titus Rakkesh
Friends, any assistance regarding this? On Mon, Jul 16, 2018 at 3:34 PM, Titus Rakkesh wrote: > We have 2 independent streams which will receive elements at different > frequencies, > > DataStream> splittedActivationTuple; > > DataStream> unionReloadsStream; > > We have a requirement to keep

Re: Parallelism and keyed streams

2018-07-17 Thread Nicholas Walton
Martin, To clarify things, the code causing the issue is here; nothing clever. The code fails at the line in bold. The Long index values are set earlier in the sequence 1,2,3,4,5,6,7… val scaledReadings : DataStream[(Int, Long, Double, Double)] = maxChannelReading .keyBy(0) .map { in

Re: Can not get OutPutTag datastream from Windowing function

2018-07-17 Thread Xingcan Cui
Hi Soheil, The `getSideOutput()` method is defined on the operator instead of the datastream. You can invoke it after any action (e.g., map, window) performed on a datastream. Best, Xingcan > On Jul 17, 2018, at 3:36 PM, Soheil Pourbafrani wrote: > > Hi, according to the documents I tried

Re: Can not get OutPutTag datastream from Windowing function

2018-07-17 Thread Dawid Wysakowicz
Hi Soheil, The getSideOutput method is part of SingleOutputStreamOperator, which extends DataStream. Try using SingleOutputStreamOperator as the type for your res variable. Best, Dawid On 17/07/18 09:36, Soheil Pourbafrani wrote: > Hi, according to the documents I tried to get late

Can not get OutPutTag datastream from Windowing function

2018-07-17 Thread Soheil Pourbafrani
Hi, according to the documents I tried to get late data using side output. final OutputTag> lateOutputTag = new OutputTag>("late-data"){}; DataStream> res = aggregatedTuple .assignTimestampsAndWatermarks(new Bound())

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread Timo Walther
Hi Tison, I guess this was a mistake that will be fixed soon. Till (in CC) forked off the release-1.6 branch yesterday? Regards, Timo Am 17.07.18 um 04:00 schrieb 陈梓立: Hi, I see no 1.6 branch or tag. What's the reason we skip 1.6 and now 1.7-SNAPSHOT? or there is a 1.6 I miss. Best,

Re: Flink CLI properties with HA

2018-07-17 Thread Sampath Bhat
Hi vino Should the flink CLI have access to the path mentioned in *high-availability.storageDir*? If my flink cluster is on set of machines and i submit my job from flink CLI from another independent machine by giving necessary details will the CLI try to access *high-availability.storageDir

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-17 Thread Piotr Nowojski
Hi, Thanks for the additional data. Just to make sure, are you using Flink 1.5.0? There are a couple of threads that seem to be looping in serialisation, while others are blocked, either waiting for new data or waiting for someone to consume some data. Could you debug or CPU profile the

Re: Persisting Table in Flink API

2018-07-17 Thread Shivam Sharma
Thanks, Vino & Hequn. On Mon, Jul 16, 2018 at 5:47 PM Hequn Cheng wrote: > Hi Shivam, > > I think the non-window stream-stream join can solve your problem. > The non-window join will store all data from both inputs and output joined > results. The semantics of non-window join is exactly the