Re: Splitting stream

2021-05-10 Thread Taher Koitawala
I think what you're looking for is a side output. Change the logic to a process function: what is true goes to the collector, and what is false can go to a side output. That now gives you two streams. On Mon, May 10, 2021, 8:14 PM Nikola Hrusov wrote: > Hi Arvid, > > In my case it's the latter, thus I have also
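The routing the reply describes, a ProcessFunction whose collector carries matching records while an OutputTag carries the rest, can be sketched without Flink on the classpath. This is only an illustration of the logic; the class and method names (`SideOutputSketch`, `route`) are hypothetical, and the comments point at the real Flink calls they stand in for:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the side-output idea: one pass over the input, elements that
// match the predicate go to the main "collector", the rest to a side output.
// In a Flink ProcessFunction this would be out.collect(value) for the main
// stream and ctx.output(sideTag, value) for the side output.
public class SideOutputSketch {
    public static <T> List<List<T>> route(List<T> input, Predicate<T> isMain) {
        List<T> main = new ArrayList<>();
        List<T> side = new ArrayList<>();
        for (T value : input) {
            if (isMain.test(value)) {
                main.add(value);   // out.collect(value)
            } else {
                side.add(value);   // ctx.output(sideTag, value)
            }
        }
        return List.of(main, side);
    }

    public static void main(String[] args) {
        List<List<Integer>> streams = route(List.of(1, 2, 3, 4, 5), v -> v % 2 == 0);
        System.out.println("main: " + streams.get(0) + ", side: " + streams.get(1));
    }
}
```

In the real job, `getSideOutput(sideTag)` on the result of the process function yields the second stream.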

Re: Timestamp Issue with OutputTags

2021-01-11 Thread Taher Koitawala
tagSet.addAll(tags); > > } > > > > for (String tag : tagSet) { > > outputMap.putIfAbsent(tag, new OutputTag(tag) > {}); > > ctx.output(outputMap.get(tag), value); > > } > > } > > } > > > > Exception comes at

Re: Timestamp Issue with OutputTags

2021-01-11 Thread Taher Koitawala
Can you please share your code? On Mon, Jan 11, 2021, 6:47 PM Priyanka Kalra A < priyanka.a.ka...@ericsson.com> wrote: > Hi Team, > > > > We are generating multiple side-output tags and using default processing > time on non-keyed stream. The class $YYY extends *ProcessFunction* O> and

Re: map JSON to scala case class & off-heap optimization

2020-07-09 Thread Taher Koitawala
on4s or > https://github.com/FasterXML/jackson-module-scala both only seem to > consume strings. > > Best, > Georg > > Am Do., 9. Juli 2020 um 19:17 Uhr schrieb Taher Koitawala < > taher...@gmail.com>: > >> You can try the Jackson ObjectMapper library and t

Re: map JSON to scala case class & off-heap optimization

2020-07-09 Thread Taher Koitawala
You can try the Jackson ObjectMapper library and that will get you from json to object. Regards, Taher Koitawala On Thu, Jul 9, 2020, 9:54 PM Georg Heiler wrote: > Hi, > > I want to map a stream of JSON documents from Kafka to a scala case-class. > How can this be accomp

Re: Data Stream Enrichement

2020-05-30 Thread Taher Koitawala
The open method would be great for that, and the close method could close it when the operator closes. Also, for external calls, AsyncIO is a great operator; give that a look. Regards, Taher Koitawala On Sat, May 30, 2020, 10:17 PM Aissa Elaffani wrote: > Hello Guys, > I want to enrich a data
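The AsyncIO idea mentioned above can be sketched with plain `CompletableFuture`s, no Flink required. This is a stand-in, not Flink's API: the `STORE` map models an external database opened once in open(), and `AsyncEnrichSketch` is a hypothetical name. Flink's real operator is `AsyncDataStream` wrapping an `AsyncFunction`, which additionally handles ordering and timeouts:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Sketch of async enrichment: each record triggers a non-blocking lookup and
// results are gathered when the futures complete, so the operator thread is
// never blocked on the external system.
public class AsyncEnrichSketch {
    // Stand-in for an external store (DB / service) opened once in open().
    private static final Map<String, String> STORE =
            Map.of("id-1", "Mumbai", "id-2", "Pune");

    static CompletableFuture<String> lookup(String id) {
        // supplyAsync models a non-blocking client call.
        return CompletableFuture.supplyAsync(() -> STORE.getOrDefault(id, "unknown"));
    }

    public static List<String> enrich(List<String> ids) {
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> lookup(id).thenApply(city -> id + "->" + city))
                .collect(Collectors.toList());
        // Wait for all lookups; Flink would instead emit each result as it resolves.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(enrich(List.of("id-1", "id-2", "id-3")));
    }
}
```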

Re: Blocking KeyedCoProcessFunction.processElement1

2020-01-28 Thread Taher Koitawala
Would AsyncIO operator not be an option for you to connect to RDBMS? On Tue, Jan 28, 2020, 12:45 PM Alexey Trenikhun wrote: > Thank you Yun Tang. > My implementation potentially could block for significant amount of time, > because I wanted to do RDBMS maintenance (create partitions for new

Re: How to emit changed data only w/ Flink trigger?

2019-11-01 Thread Taher Koitawala
You can do this by writing a custom trigger or evictor. On Fri, Nov 1, 2019 at 3:08 PM Qi Kang wrote: > Hi all, > > > We have a Flink job which aggregates sales volume and GMV data of each > site on a daily basis. The code skeleton is shown as follows. > > > ``` > sourceStream > .map(message
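The "emit only changed data" part of such a trigger can be sketched independently of Flink: remember the last value emitted per key and forward an update only when it differs. In the real job this last-emitted value would live in keyed state inside a custom trigger or ProcessFunction; the class name `ChangedOnlyEmitter` here is a hypothetical stand-in:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of "emit only changed data": keep the last value emitted per key and
// forward an update only when it differs from the previous one.
public class ChangedOnlyEmitter {
    private final Map<String, Long> lastEmitted = new HashMap<>();

    public List<String> onUpdate(String key, long aggregate) {
        List<String> out = new ArrayList<>();
        Long previous = lastEmitted.get(key);
        if (previous == null || previous != aggregate) {
            lastEmitted.put(key, aggregate);
            out.add(key + "=" + aggregate);  // only changed values are emitted
        }
        return out;
    }

    public static void main(String[] args) {
        ChangedOnlyEmitter emitter = new ChangedOnlyEmitter();
        System.out.println(emitter.onUpdate("site-1", 100)); // emits
        System.out.println(emitter.onUpdate("site-1", 100)); // suppressed, unchanged
        System.out.println(emitter.onUpdate("site-1", 120)); // emits
    }
}
```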

Re: Customize Part file naming (Flink 1.9.0)

2019-10-19 Thread Taher Koitawala
Beware when using BucketingSink, as it does not follow exactly-once semantics. It also has issues with S3 consistency. On Sat, Oct 19, 2019, 1:42 PM Ravi Bhushan Ratnakar < ravibhushanratna...@gmail.com> wrote: > Hi, > > As an alternative, you may use BucketingSink which provides you the >

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread taher koitawala
> On Sun 18 Aug 2019 at 16:24, taher koitawala wrote: > >> Hi Swapnil, >>We faced this problem once, I think changing checkpoint dir to >> hdfs and keeping sink dir to s3 with EMRFS s3 consistency enabled solves >> this problem. If you are not usin

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread taher koitawala
Hi Swapnil, We faced this problem once. I think changing the checkpoint dir to HDFS and keeping the sink dir on S3 with EMRFS S3 consistency enabled solves this problem. If you are not using EMR, then I don't know how else it can be solved. But in a nutshell, because EMRFS S3 consistency uses Dynamo

Re: RocksDB KeyValue store

2019-07-30 Thread taher koitawala
I believe Flink serialization is really fast, and GC is much better from the Flink 1.6 release; beyond that, state performance depends on what you do with it. Each task manager has its own instance of RocksDB and is responsible for snapshotting its own instance upon checkpointing. Furthermore, if you used a

Re: Event time window eviction

2019-07-29 Thread taher koitawala
progress? > > Thanks > > On Mon, Jul 29, 2019 at 10:51 AM taher koitawala > wrote: > >> I believe the approach to this is wrong... For fixing windows we can >> write our custom triggers to fire them... However what I'm not convinced >> with is switching between event

Re: Event time window eviction

2019-07-29 Thread taher koitawala
I believe the approach to this is wrong. For fixing windows, we can write our own custom triggers to fire them. However, what I'm not convinced about is switching between event and processing time. Write a custom trigger and fire the event-time window if you don't see any activity. That's

Re: Apache Flink - Manipulating the state of an object in an evictor or trigger

2019-07-18 Thread taher koitawala
As far as I know, it is completely safe. On Fri, Jul 19, 2019, 1:35 AM M Singh wrote: > Just wanted to see if there is any advice on this question. Thanks > > On Sunday, July 14, 2019, 09:07:45 AM EDT, M Singh > wrote: > > > Hi: > > Is it safe to manipulate the state of an object in the

Re: table toRetractStream missing last record and adding extra column (True)

2019-07-16 Thread taher koitawala
Looks like you need a window On Tue, Jul 16, 2019, 9:24 PM sri hari kali charan Tummala < kali.tumm...@gmail.com> wrote: > Hi All, > > I am trying to write toRetractSream to CSV which is kind of working ok but > I get extra values like True and then my output data values. > > Question1 :- > I
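The "extra column (True)" in the question is the retract flag: `toRetractStream` emits (flag, row) pairs, where true means the row is added and false means a previously emitted row is retracted. Materializing the pairs recovers the final table, which is why the flag must be kept (or filtered on) before writing to CSV. A self-contained sketch of that semantics, with `RetractStreamSketch` as a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of retract-stream semantics: apply (flag, row) changes to a multiset.
// flag=true adds the row, flag=false retracts one previous occurrence; rows
// whose count drops to zero disappear from the materialized result.
public class RetractStreamSketch {
    public static Map<String, Integer> materialize(List<Map.Entry<Boolean, String>> changes) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<Boolean, String> change : changes) {
            table.merge(change.getValue(), change.getKey() ? 1 : -1, Integer::sum);
            if (table.get(change.getValue()) == 0) {
                table.remove(change.getValue()); // fully retracted
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<Boolean, String>> changes = List.of(
                Map.entry(true, "count=1"),   // first result
                Map.entry(false, "count=1"),  // retraction of the old result
                Map.entry(true, "count=2"));  // updated result
        System.out.println(materialize(changes));
    }
}
```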

Re: Split Stream on a Split Stream

2019-02-27 Thread Taher Koitawala
No particular reason for not using the process function, just wanted to clarify if that was the correct way to do it. Thanks Knauf. Regards, Taher Koitawala GS Lab Pune +91 8407979163 On Wed, Feb 27, 2019 at 8:23 PM Konstantin Knauf wrote: > Hi Taher , > > a ProcessFunction is

Split Stream on a Split Stream

2019-02-26 Thread Taher Koitawala
ava.lang.IllegalStateException: Consecutive multiple splits are not supported. Splits are deprecated. Please use side-outputs Regards, Taher Koitawala GS Lab Pune +91 8407979163

Re: StreamingFileSink Avro batch size and compression

2019-01-26 Thread Taher Koitawala
thub.com/apache/flink/blob/master/flink-streaming-java/src/test/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssignerITCases.java > On 26/01/2019 08:18, Taher Koitawala wrote: > > Can someone please help with this? > > On Fri 25 Jan, 2019, 1:47 PM Taher Koitawala

Re: StreamingFileSink Avro batch size and compression

2019-01-25 Thread Taher Koitawala
Can someone please help with this? On Fri 25 Jan, 2019, 1:47 PM Taher Koitawala Hi All, > Is there a way to specify *batch size* and *compression* properties > when using StreamingFileSink, just like we did in BucketingSink? The only > parameters it is accepting is Inactivi

StreamingFileSink Avro batch size and compression

2019-01-25 Thread Taher Koitawala
the same kafka topics, however doing different operations. And each flink job is writing a file with different size and we would want to make it consistent. Regards, Taher Koitawala GS Lab Pune +91 8407979163
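For row formats, StreamingFileSink (as of Flink 1.9) takes a DefaultRollingPolicy with a maximum part size, a rollover interval, and an inactivity interval; bulk formats such as Avro instead roll on every checkpoint, which is why part sizes track checkpoint cadence. A self-contained sketch of that rolling decision (the class name `RollingPolicySketch` and exact thresholds are illustrative, not Flink's code):

```java
// Sketch of the rolling decision a row-format StreamingFileSink makes:
// roll the current part file when it exceeds a maximum size, has been open
// longer than the rollover interval, or has been idle longer than the
// inactivity interval. Bulk formats (e.g. Avro) roll on checkpoint instead.
public class RollingPolicySketch {
    private final long maxPartSizeBytes;
    private final long rolloverIntervalMs;
    private final long inactivityIntervalMs;

    public RollingPolicySketch(long maxPartSizeBytes, long rolloverIntervalMs,
                               long inactivityIntervalMs) {
        this.maxPartSizeBytes = maxPartSizeBytes;
        this.rolloverIntervalMs = rolloverIntervalMs;
        this.inactivityIntervalMs = inactivityIntervalMs;
    }

    public boolean shouldRoll(long partSizeBytes, long openForMs, long idleForMs) {
        return partSizeBytes >= maxPartSizeBytes
                || openForMs >= rolloverIntervalMs
                || idleForMs >= inactivityIntervalMs;
    }

    public static void main(String[] args) {
        // 128 MB max size, 15 min rollover, 5 min inactivity (example values).
        RollingPolicySketch policy =
                new RollingPolicySketch(128L << 20, 15 * 60_000L, 5 * 60_000L);
        System.out.println(policy.shouldRoll(200L << 20, 60_000L, 1_000L)); // size exceeded
        System.out.println(policy.shouldRoll(1L << 20, 60_000L, 1_000L));   // keep open
    }
}
```

Tuning these three thresholds is what makes part-file sizes consistent across jobs reading the same topics.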

StreamingFileSink cannot get AWS S3 credentials

2019-01-11 Thread Taher Koitawala
cket and folders, we are only facing this issue with StreamingFileSink. Regards, Taher Koitawala GS Lab Pune +91 8407979163

Re: SteamingFileSink with TwoPhaseCommit

2019-01-10 Thread Taher Koitawala
similar to the ones > that the older BucketingSink was doing. > > Cheers, > Kostas > > On Thu, Jan 10, 2019 at 10:47 AM Taher Koitawala < > taher.koitaw...@gslab.com> wrote: > >> Hi Kostas, >>Thanks you for the clarification, also can you pl

Re: SteamingFileSink with TwoPhaseCommit

2019-01-10 Thread Taher Koitawala
Hi Kostas, Thank you for the clarification. Also, can you please point out how StreamingFileSink uses TwoPhaseCommit, and the implementing class for that? Regards, Taher Koitawala GS Lab Pune +91 8407979163 On Thu, Jan 10, 2019 at 3:10 PM Kostas Kloudas wrote

Re: SteamingFileSink with TwoPhaseCommit

2019-01-10 Thread Taher Koitawala
Koitawala On Thu 10 Jan, 2019, 2:40 PM Kostas Kloudas Hi Taher, > > The StreamingFileSink implements a version of TwoPhaseCommit. Can you > elaborate a bit on what do you mean by " TwoPhaseCommit is not being used > "? > > Cheers, > Kostas > > On Thu, Jan 10, 201

SteamingFileSink with TwoPhaseCommit

2019-01-10 Thread Taher Koitawala
Hi All, As per my understanding and the API of StreamingFileSink, TwoPhaseCommit is not being used. Can someone please confirm if that's right? Also, if StreamingFileSink does not support TwoPhaseCommit, what is the best way to implement this? Regards, Taher Koitawala GS
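As the thread goes on to establish, StreamingFileSink does implement a version of two-phase commit: a part file is "in progress" while written, becomes "pending" when a checkpoint is taken (pre-commit), and is finalized only when that checkpoint completes (commit). A self-contained sketch of that protocol, with `TwoPhaseFileSketch` a hypothetical name and the file-state lists standing in for the sink's internal bucket state:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the two-phase pattern: snapshot() is the pre-commit (files move
// from in-progress to pending for that checkpoint), notifyCheckpointComplete()
// is the commit (pending files become finished). A failure before the commit
// leaves only pending/in-progress files, which can be discarded on restore,
// which is what gives exactly-once output.
public class TwoPhaseFileSketch {
    private final List<String> inProgress = new ArrayList<>();
    private final Map<Long, List<String>> pendingPerCheckpoint = new HashMap<>();
    private final List<String> finished = new ArrayList<>();

    public void write(String partFile) { inProgress.add(partFile); }

    public void snapshot(long checkpointId) {                 // phase 1: pre-commit
        pendingPerCheckpoint.put(checkpointId, new ArrayList<>(inProgress));
        inProgress.clear();
    }

    public void notifyCheckpointComplete(long checkpointId) { // phase 2: commit
        List<String> pending = pendingPerCheckpoint.remove(checkpointId);
        if (pending != null) finished.addAll(pending);
    }

    public List<String> finishedFiles() { return finished; }

    public static void main(String[] args) {
        TwoPhaseFileSketch sink = new TwoPhaseFileSketch();
        sink.write("part-0-0");
        sink.snapshot(1L);                   // pre-commit: file is pending
        sink.notifyCheckpointComplete(1L);   // commit: file becomes visible
        System.out.println(sink.finishedFiles());
    }
}
```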

Row format or bulk format

2018-12-27 Thread Taher Koitawala
please elaborate and explain how the row format and the bulk format work? The document only stresses how they will be serialized. Taher Koitawala GS Lab Pune +91 8407979163

Flink job execution graph hints

2018-12-09 Thread Taher Koitawala
Hi All, Is there a way to send hints to the job graph builder, like specifically disabling or enabling chaining?

Re: Flink join stream where one stream is coming 5 minutes late

2018-11-26 Thread Taher Koitawala
May I ask why you want to have two different window times? What's the use case? On Mon 26 Nov, 2018, 5:53 PM Abhijeet Kumar Hello Team, > > I've to join two streams where one stream is coming late. So, I planned > doing it by creating two windows, for first window the size will be 5 >
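An alternative to two differently sized windows is an interval-style join: keep each element of the on-time stream and match elements of the late stream whose timestamps fall within the lateness bound. Flink's interval join (or a CoProcessFunction with keyed state and timers) implements this; the sketch below only illustrates the matching condition, and `LateJoinSketch`/`Event` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of joining two streams where the right stream arrives up to boundMs
// later: match a left element with any right element of the same key whose
// timestamp is no more than boundMs after it.
public class LateJoinSketch {
    public static class Event {
        final String key;
        final long tsMs;
        public Event(String key, long tsMs) { this.key = key; this.tsMs = tsMs; }
    }

    public static List<String> join(List<Event> left, List<Event> right, long boundMs) {
        List<String> joined = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                // r may arrive up to boundMs after l (e.g. 5 minutes).
                if (l.key.equals(r.key) && r.tsMs >= l.tsMs && r.tsMs - l.tsMs <= boundMs) {
                    joined.add(l.key + "@" + l.tsMs + "+" + r.tsMs);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Event> left = List.of(new Event("k", 0));
        List<Event> right = List.of(new Event("k", 4 * 60_000L), new Event("k", 9 * 60_000L));
        System.out.println(join(left, right, 5 * 60_000L)); // only the 4-minute-late match
    }
}
```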

Re: understadning kafka connector - rebalance

2018-11-26 Thread Taher Koitawala
ion is by the key. rebalancing will not shuffle this partitioning ? > e.g > .addSource(source) > .rebalance > .keyBy(_.id) > .mapWithState(...) > > > On Mon, Nov 26, 2018 at 8:32 AM Taher Koitawala > wrote: > >> Hi Avi, >> No,

Re: Window is not working in streaming

2018-11-26 Thread Taher Koitawala
in my code then > it would be a great help. > > Thanks, > > > *Abhijeet Kumar* > Software Development Engineer, > Sentienz Solutions Pvt Ltd > Cognitive Data Platform - Perceive the Data ! > abhijeet.ku...@sentienz.com |www.sentienz.com | Bengaluru > > > On 26-N

Re: understadning kafka connector - rebalance

2018-11-25 Thread Taher Koitawala
Hi Avi, No, rebalance is not changing the number of Kafka partitions. Let's say you have 6 Kafka partitions and your Flink parallelism is 8; in this case, using rebalance will send records to all downstream operators in a round-robin fashion. Regards, Taher Koitawala GS Lab Pune +91
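The round-robin redistribution `rebalance()` performs can be sketched in a few lines. `RebalanceSketch` is a hypothetical name; the point is that records spread evenly over all downstream subtasks regardless of how many source partitions fed them:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of rebalance(): records go to downstream subtasks in round-robin
// order, so 6 Kafka partitions feeding a parallelism-8 operator still spread
// records over all 8 subtasks. It does not change the Kafka partition count.
public class RebalanceSketch {
    public static Map<Integer, Integer> distribute(int records, int parallelism) {
        Map<Integer, Integer> countPerSubtask = new HashMap<>();
        int next = 0;
        for (int i = 0; i < records; i++) {
            countPerSubtask.merge(next, 1, Integer::sum);
            next = (next + 1) % parallelism;  // round-robin channel selection
        }
        return countPerSubtask;
    }

    public static void main(String[] args) {
        // 800 records over 8 subtasks -> 100 each, regardless of source partitions.
        System.out.println(distribute(800, 8));
    }
}
```

Note that rebalancing happens before `keyBy`, so a later `keyBy(_.id)` still repartitions by key as usual.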

Re: RocksDB checkpointing dir per TM

2018-10-26 Thread Taher Koitawala
ributed file system > available on all TMs. > The location is set in Flink option ‘state.checkpoints.dir'. > This way job can restore from it with different set of TMs. > > Best, > Andrey > > > On 26 Oct 2018, at 08:29, Taher Koitawala > wrote: > > > >

RocksDB checkpointing dir per TM

2018-10-26 Thread Taher Koitawala
that we have an even spread when the RocksDB files are written? Thanks, Taher Koitawala

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-12 Thread Taher Koitawala
Sounds smashing! I think the initial integration will help 60% or so of Flink SQL users, and a lot of other use cases will emerge once we solve the first one. Thanks, Taher Koitawala On Fri 12 Oct, 2018, 10:13 AM Zhang, Xuefu, wrote: > Hi Taher, > > Thank you for your input. I think you e

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
;using" The way we use this is: Using streaming_table as configuration select count(*) from processingtable as streaming; This way users can now pass Flink SQL info easily and get rid of the Flink SQL configuration file all together. This is simple and easy to understand and I think most user

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
lect count(*) from flink_mailing_list process as batch; This way we could completely get rid of Flink SQL configuration files. Thanks, Taher Koitawala On Fri 12 Oct, 2018, 2:35 AM Zhang, Xuefu, wrote: > Hi Rong, > > Thanks for your feedback. Some of my earlier comments might ha

Re: Kafka Per-Partition Watermarks

2018-10-04 Thread Taher Koitawala
t fine. Thanks, Taher Koitawala On Fri 5 Oct, 2018, 1:28 AM Andrew Kowpak, wrote: > Hi all, > > I apologize if this has been discussed to death in the past, but, I'm > finding myself very confused, and google is not proving helpful. > > Based on the documentation, I understan

Re: Null Flink State

2018-09-25 Thread Taher Koitawala
Hi Dawid, Thanks for the answer. How do I get the state of the window, then? I do understand that elements are going into state, as the window in itself is a stateful operator. How do I get access to those elements? Regards, Taher Koitawala GS Lab Pune +91 8407979163 On Tue, Sep 25

Null Flink State

2018-09-25 Thread Taher Koitawala
t; valueState = ctx.globalState().getState(new ValueStateDescriptor<>("valueState", TypeInformation.of(new TypeHint() {}))); System.out.println(valueState.value()); collector.collect(T) }) Regards, Taher Koitawala GS Lab Pune +91 8407979163

Re: How does flink read data from kafka number of TM's are more than topic partitions

2018-09-21 Thread Taher Koitawala
Thanks a lot for the explanation. That was exactly what I thought should happen. However, it is always good to have a clear confirmation. Regards, Taher Koitawala GS Lab Pune +91 8407979163 On Fri, Sep 21, 2018 at 6:26 PM Piotr Nowojski wrote: > Hi, > > Yes, in your case half of the Kaf

How does flink read data from kafka number of TM's are more than topic partitions

2018-09-21 Thread Taher Koitawala
the number of TM's than the number of partitions in the Kafka topic guarantee high throughput? Regards, Taher Koitawala GS Lab Pune +91 8407979163
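The answer in the thread above is that each Kafka partition is read by exactly one source subtask, so the surplus subtasks simply idle. A self-contained sketch of the assignment, hedged: the exact partition-to-subtask mapping Flink uses is an implementation detail, and the modulo scheme plus the name `PartitionAssignmentSketch` are stand-ins for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of partition assignment when source parallelism exceeds the Kafka
// partition count: every partition is owned by one subtask, and subtasks left
// without a partition receive no data. Extra TMs therefore do not raise
// source throughput.
public class PartitionAssignmentSketch {
    public static Map<Integer, List<Integer>> assign(int partitions, int parallelism) {
        Map<Integer, List<Integer>> perSubtask = new HashMap<>();
        for (int subtask = 0; subtask < parallelism; subtask++) {
            perSubtask.put(subtask, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            perSubtask.get(p % parallelism).add(p); // stand-in mapping
        }
        return perSubtask;
    }

    public static void main(String[] args) {
        // 4 partitions, 8 subtasks: subtasks 4..7 receive no partitions.
        System.out.println(assign(4, 8));
    }
}
```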

Lookup from state

2018-09-17 Thread Taher Koitawala
' from Stream2 reaches after maxOutOfOrderness time has passed. In this scenario, as per my knowledge, X will be maintained in the Flink state. However, when X' comes, how do I do a lookup for X from the Flink state and carry on the further aggregation or whatever I want to do? Regards, Taher Koitawala

Re: Watermarks in Event Time windowing

2018-09-13 Thread Taher Koitawala
tate, and it's often difficult to achieve this state, because eventtime always has more or less delay as events are transmitted from the source to the processing system. Thanks, vino. [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/event_timestamps_watermarks.html Taher Koitawal

Watermarks in Event Time windowing

2018-09-13 Thread Taher Koitawala
Hi All, Can someone show a good example of how watermarks need to be generated when using EventTime windows? What happens if the watermark is the same as the timestamp? How does the watermark help the window to be triggered, and what if watermarks are kept behind the currentTimestamps in
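A common answer to these questions is the bounded-out-of-orderness strategy: the watermark trails the largest timestamp seen by the allowed lateness, and an event-time window [start, end) fires once the watermark reaches the window's last timestamp (end - 1). If the watermark equalled each element's timestamp (zero slack), any later element with a smaller timestamp would be late. A self-contained sketch of that arithmetic (`WatermarkSketch` is a hypothetical name; Flink's equivalent in this era is BoundedOutOfOrdernessTimestampExtractor):

```java
// Sketch of bounded-out-of-orderness watermarking: the watermark is the
// largest timestamp seen minus the allowed out-of-orderness, and a window
// fires when the watermark passes the window's last timestamp (end - 1).
public class WatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    public long onEvent(long timestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, timestampMs);
        return currentWatermark();
    }

    public long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrdernessMs;
    }

    public boolean windowFires(long windowEndMs) {
        // The window [start, end) fires when the watermark reaches end - 1.
        return currentWatermark() >= windowEndMs - 1;
    }

    public static void main(String[] args) {
        WatermarkSketch wm = new WatermarkSketch(2_000);
        wm.onEvent(9_000);
        System.out.println(wm.currentWatermark());  // 7000: held back 2s for stragglers
        System.out.println(wm.windowFires(10_000)); // false: window [0,10000) still waits
        wm.onEvent(12_500);
        System.out.println(wm.windowFires(10_000)); // true: watermark passed the window end
    }
}
```

Keeping watermarks behind the current timestamps (a positive out-of-orderness) is exactly what tolerates late-arriving events, at the cost of delayed window results.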

Re: How does flink read a DataSet?

2018-09-12 Thread Taher Koitawala
ender side (batch only). 2018-09-12 7:30 GMT-04:00 Taher Koitawala : > So flink TMs reads one line at a time from hdfs in parallel and keep > filling it in memory and keep passing the records to the next operator? I > just want to know how data comes in memory? How it is pa

Re: How does flink read a DataSet?

2018-09-12 Thread Taher Koitawala
;> by directly playing back the complete data set. A TaskManager fails, Flink >> will kick it out of the cluster, and the Task running on it will fail, but >> the result of stream processing and batch Task failure is different. For >> stream processing, it triggers a restart of the en

Re: How does flink read a DataSet?

2018-09-11 Thread Taher Koitawala
Furthermore, how does Flink deal with Task Managers dying when it is using the DataSet API? Is checkpointing done on a DataSet too, or does the whole dataset have to be re-read? Regards, Taher Koitawala GS Lab Pune +91 8407979163 On Tue, Sep 11, 2018 at 11:18 PM, Taher Koitawala wrote: > Hi

How does flink read a DataSet?

2018-09-11 Thread Taher Koitawala
Hi All, Just like Spark, does Flink read a dataset, keep it in memory, and keep applying transformations? Or are all records read by Flink via async parallel reads? Furthermore, how does Flink deal with Regards, Taher Koitawala GS Lab Pune +91 8407979163