Re: Collector.collect

2017-05-02 Thread Chesnay Schepler
day, May 01, 2017 12:56 PM *To:* Newport, Billy [Tech]; 'user@flink.apache.org' *Subject:* Re: Collector.collect Oh, you have multiple different output formats; I missed that. For the Batch API you are, I believe, correct: using a custom output format is the best solution. In the Streaming API the code be

RE: Collector.collect

2017-05-02 Thread Newport, Billy
, May 01, 2017 12:56 PM To: Newport, Billy [Tech]; 'user@flink.apache.org' Subject: Re: Collector.collect Oh, you have multiple different output formats; I missed that. For the Batch API you are, I believe, correct: using a custom output format is the best solution. In the Streaming API the code below

Re: Collector.collect

2017-05-01 Thread Chesnay Schepler
:41 AM *To:* user@flink.apache.org *Subject:* Re: Collector.collect Hello, @Billy, what prevented you from duplicating/splitting the record, based on the bitmask, in a map function before the sink? This shouldn't incur any serialization overhead if the sink is chained to the map. The emitted

RE: Collector.collect

2017-05-01 Thread Newport, Billy
approaches? From: Chesnay Schepler [mailto:ches...@apache.org] Sent: Monday, May 01, 2017 10:41 AM To: user@flink.apache.org Subject: Re: Collector.collect Hello, @Billy, what prevented you from duplicating/splitting the record, based on the bitmask, in a map function before the sink? This shouldn't

Re: Collector.collect

2017-05-01 Thread Chesnay Schepler
Hello, @Billy, what prevented you from duplicating/splitting the record, based on the bitmask, in a map function before the sink? This shouldn't incur any serialization overhead if the sink is chained to the map. The emitted Tuple could also share the GenericRecord, meaning you don't even
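
For readers following the suggestion in this message, here is a minimal sketch of the bitmask split in plain Java. It is not taken from the thread: `BitmaskSplit`, `Routed`, and `split` are hypothetical names, `Routed` stands in for a Flink tuple of (output index, record), and a `Consumer` stands in for the collector. It assumes each set bit in the mask selects one output, and it only illustrates that every emitted element can reference the same record object rather than a copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BitmaskSplit {

    /** Hypothetical stand-in for a tuple of (output index, record). */
    static final class Routed<T> {
        final int sinkIndex;
        final T record;

        Routed(int sinkIndex, T record) {
            this.sinkIndex = sinkIndex;
            this.record = record;
        }
    }

    /**
     * Emits one Routed element per set bit in the mask.
     * Every emitted element holds a reference to the same record instance,
     * so the record itself is never copied here.
     */
    static <T> void split(int bitmask, T record, Consumer<Routed<T>> out) {
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if ((bitmask & (1 << bit)) != 0) {
                out.accept(new Routed<>(bit, record));
            }
        }
    }

    public static void main(String[] args) {
        List<Routed<String>> emitted = new ArrayList<>();
        // Bits 0 and 2 are set, so the record is routed to outputs 0 and 2.
        split(0b101, "record-A", emitted::add);
        for (Routed<String> r : emitted) {
            System.out.println(r.sinkIndex + " -> " + r.record);
        }
    }
}
```

Whether sharing the reference avoids serialization in a real job depends on the operators being chained, as the message notes; this sketch does not exercise Flink itself.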

RE: Collector.collect

2017-05-01 Thread Newport, Billy
We’ve done that, but it’s very expensive from a serialization point of view when writing the same record multiple times, each in a different tuple. For example, we started with this: .collect(new Tuple