Hi,
Has any of you thought about writing a custom foreach sink writer which
can decide which record should go to which sink (based on some marker in
the record, which we could annotate during a transformation) and then
write to the specific sink accordingly?
This will mean that:
1. every custom sink writer will hold connections to as many sinks as
there are sink types that records can be routed to.
2. every record will be read once in the single query but can be written
to multiple sinks.
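
A rough sketch of what I mean (the "sinkType" marker column and the two
client classes are hypothetical placeholders, not real APIs):

  import org.apache.spark.sql.{ForeachWriter, Row}

  class RoutingForeachWriter extends ForeachWriter[Row] {
    // one connection per sink type, opened once per partition/epoch
    private var kafkaClient: HypotheticalKafkaClient = _
    private var jdbcClient: HypotheticalJdbcClient = _

    override def open(partitionId: Long, version: Long): Boolean = {
      kafkaClient = new HypotheticalKafkaClient()
      jdbcClient = new HypotheticalJdbcClient()
      true
    }

    override def process(record: Row): Unit =
      record.getAs[String]("sinkType") match { // marker added during transformation
        case "kafka" => kafkaClient.send(record)
        case "jdbc"  => jdbcClient.insert(record)
        case _       => () // unknown marker: drop, or route to a dead-letter sink
      }

    override def close(errorOrNull: Throwable): Unit = {
      if (kafkaClient != null) kafkaClient.close()
      if (jdbcClient != null) jdbcClient.close()
    }
  }

It would be used as df.writeStream.foreach(new RoutingForeachWriter).start().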

Do you guys see any drawback in this approach?
One drawback, of course, is that a sink is supposed to write records as
they are, whereas here we are putting some routing intelligence into the sink.
Apart from that, do you see any other issues with this approach?

Regards,
Chandan


On Thu, Feb 15, 2018 at 7:41 AM Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> Of course, you can write to multiple Kafka topics from a single query. If
> your dataframe that you want to write has a column named "topic" (along
> with "key", and "value" columns), it will write the contents of a row to
> the topic in that row. This automatically works. So the only thing you need
> to figure out is how to generate the value of that column.
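>
> For example, something like this (a minimal sketch; the routing logic and
> all column names other than "topic", "key", and "value" are placeholders):
>
>   df.selectExpr(
>       "CASE WHEN eventType = 'a' THEN 'topicA' ELSE 'topicB' END AS topic",
>       "CAST(id AS STRING) AS key",
>       "to_json(struct(*)) AS value")
>     .writeStream
>     .format("kafka")
>     .option("kafka.bootstrap.servers", "host1:9092")
>     .option("checkpointLocation", "/tmp/checkpoints/multi-topic")
>     .start()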
>
> This is documented -
> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#writing-data-to-kafka
>
> Or am I misunderstanding the problem?
>
> TD
>
>
>
>
> On Tue, Feb 13, 2018 at 10:45 AM, Yogesh Mahajan <ymaha...@snappydata.io>
> wrote:
>
>> I had a similar issue, and I think that's where the structured streaming
>> design is lacking.
>> It seems like Question #2 in your email is a viable workaround for you.
>>
>> In my case, I have a custom Sink backed by an efficient in-memory column
>> store suited for fast ingestion.
>>
>> I have a Kafka stream coming from one topic, and I need to classify the
>> stream based on schema.
>> For example, a Kafka topic can carry messages with three different
>> schemas, and I would like to ingest them into three different column
>> tables (each having a different schema) using my custom Sink implementation.
>>
>> Right now the only(?) option I have is to create three streaming queries
>> reading the same topic and ingesting into the respective column tables
>> using their Sink implementations.
>> These three streaming queries create three underlying
>> IncrementalExecutions and three KafkaSources, i.e. three queries all
>> reading the same data from the same Kafka topic.
>> Even with CachedKafkaConsumers at the partition level, this is not an
>> efficient way to handle a simple streaming use case.
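>>
>> Roughly, the workaround looks like this (just a sketch; the broker
>> address, topic, marker field, and the custom Sink provider name are
>> placeholders):
>>
>>   import org.apache.spark.sql.functions.{col, get_json_object}
>>
>>   val base = spark.readStream
>>     .format("kafka")
>>     .option("kafka.bootstrap.servers", "host1:9092")
>>     .option("subscribe", "events")
>>     .load()
>>     .selectExpr("CAST(value AS STRING) AS json")
>>
>>   // one query per schema type; each start() creates its own
>>   // IncrementalExecution and KafkaSource over the same topic
>>   Seq("typeA", "typeB", "typeC").foreach { t =>
>>     base.filter(get_json_object(col("json"), "$.schemaType") === t)
>>       .writeStream
>>       .format("my.custom.columnar.sink")  // hypothetical Sink provider
>>       .option("table", s"table_$t")
>>       .option("checkpointLocation", s"/tmp/ckpt_$t")
>>       .start()
>>   }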
>>
>> One workaround to overcome this limitation is to have the same schema for
>> all the messages in a Kafka partition; unfortunately, this is not in our
>> control, and customers cannot change it due to their dependencies on other
>> subsystems.
>>
>> Thanks,
>> http://www.snappydata.io/blog <http://snappydata.io>
>>
>> On Mon, Feb 12, 2018 at 5:54 PM, Priyank Shrivastava <
>> priy...@asperasoft.com> wrote:
>>
>>> I have a structured streaming query which sinks to Kafka.  This query
>>> has complex aggregation logic.
>>>
>>>
>>> I would like to sink the output DF of this query to multiple Kafka
>>> topics, each partitioned on a different ‘key’ column.  I don’t want to
>>> have multiple Kafka sinks, one for each of the different Kafka topics,
>>> because that would mean running multiple streaming queries, one per
>>> Kafka topic, especially since my aggregation logic is complex.
>>>
>>>
>>> Questions:
>>>
>>> 1.  Is there a way to output the results of a structured streaming query
>>> to multiple Kafka topics, each with a different key column, but without
>>> having to execute multiple streaming queries?
>>>
>>>
>>> 2.  If not, would it be efficient to cascade the multiple queries, such
>>> that the first query does the complex aggregation and writes its output
>>> to Kafka, and then the other queries just read the output of the first
>>> query and write their topics to Kafka, thus avoiding doing the complex
>>> aggregation again?
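>>>
>>> Concretely, I mean something like this (just a sketch; the topic names,
>>> key columns, and broker address are made up):
>>>
>>>   // query 1: does the expensive aggregation exactly once
>>>   aggregatedDF
>>>     .selectExpr("to_json(struct(*)) AS value")
>>>     .writeStream.format("kafka")
>>>     .option("kafka.bootstrap.servers", "host1:9092")
>>>     .option("topic", "agg-results")
>>>     .option("checkpointLocation", "/tmp/ckpt-agg")
>>>     .start()
>>>
>>>   // queries 2..N: cheap re-keying of the already-aggregated output
>>>   spark.readStream.format("kafka")
>>>     .option("kafka.bootstrap.servers", "host1:9092")
>>>     .option("subscribe", "agg-results")
>>>     .load()
>>>     .selectExpr(
>>>       "get_json_object(CAST(value AS STRING), '$.key2') AS key",
>>>       "value")
>>>     .writeStream.format("kafka")
>>>     .option("kafka.bootstrap.servers", "host1:9092")
>>>     .option("topic", "topic2")
>>>     .option("checkpointLocation", "/tmp/ckpt-2")
>>>     .start()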
>>>
>>>
>>> Thanks in advance for any help.
>>>
>>>
>>> Priyank
>>>
>>>
>>>
>>
>

-- 
Chandan Prakash
