Re: Flink custom parallel data source

Alexander Fedulov Tue, 31 Oct 2023 13:24:47 -0700

Flink natively supports a pull-based model for sources, where the source
operators request data from the external system when they are ready to
process it.  Implementing a TCP server socket operator essentially creates
a push-based source, which could lead to backpressure problems if the data
ingestion rate exceeds the processing rate. You also lose any delivery
guarantees because Flink's fault tolerance model relies on having
replayable sources.
Is using a message broker not feasible in your case?
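
To make the push/pull distinction concrete, here is a minimal plain-Java
sketch. It uses no Flink APIs, and the class name BoundedHandoffSketch and
the END sentinel are made up for illustration. A producer thread stands in
for a socket reader and hands records to the consumer through a bounded
queue, so a slow consumer blocks the producer instead of letting data pile
up in memory. With a real socket reader, a blocked reader stops draining the
socket and TCP flow control eventually slows the sender. This only sketches
the backpressure side; it does nothing for replayability, so records in
flight at the time of a failure are still lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration class; not part of Flink.
class BoundedHandoffSketch {

    // Sentinel marking the end of the simulated stream (an assumption of this sketch).
    private static final String END = "\u0000END";

    /**
     * Runs one "push" producer (standing in for a socket reader thread) and one
     * "pull" consumer (standing in for the task that emits records downstream).
     * Returns the records in the order the consumer received them.
     */
    static List<String> run(int capacity, int total) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    // put() blocks while the queue is full, so a slow consumer
                    // throttles the producer instead of letting memory grow.
                    queue.put("record-" + i);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> received = new ArrayList<>();
        try {
            while (true) {
                // take() is the pull side: the consumer asks for the next
                // record only when it is ready to process one.
                String next = queue.take();
                if (next.equals(END)) {
                    break;
                }
                received.add(next);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> out = run(4, 20);
        System.out.println(out.size() + " records, first = " + out.get(0));
    }
}
```

The queue capacity (4 in the usage above) is the only tuning knob here: a
smaller value applies backpressure sooner, a larger one absorbs longer bursts
at the cost of memory.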


Best,
Alexander Fedulov

On Tue, 31 Oct 2023 at 13:08, Kamal Mittal <kamal.mit...@ericsson.com>
wrote:

> Hello,
>
>
>
> We are writing a custom source function built on a TCP server socket: the
> server socket listener accepts connections and reads data.
>
> A single custom source function creates the server socket:
> ServerSocket serversocket = new ServerSocket(9999);
>
> A thread pool then accepts multiple connections in separate threads:
> new Runnable () -> serversocket.accept();
>
> Each accepted client socket is handed to a separate thread, which reads
> data from the TCP stream.
>
> Rgds,
>
> Kamal
>
> *From:* Alexander Fedulov <alexander.fedu...@gmail.com>
> *Sent:* 31 October 2023 04:03 PM
> *To:* Kamal Mittal <kamal.mit...@ericsson.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: Flink custom parallel data source
>
>
>
> Please note that the SourceFunction API is deprecated and is due to be
> removed, possibly in the next major version of Flink.
>
> Ideally you should not be manually spawning threads in your Flink
> applications. Typically you would only perform data fetching in the sources
> and do the processing in subsequent operators, which you can scale
> independently of the source parallelism. Can you describe what you are
> trying to achieve?
>
>
>
> Best,
>
> Alexander Fedulov
>
>
>
> On Tue, 31 Oct 2023 at 07:22, Kamal Mittal via user <user@flink.apache.org>
> wrote:
>
> Hello Community,
>
>
>
> I need a custom parallel data source (Flink ParallelSourceFunction) that
> fetches data based on some custom logic. This source function opens
> multiple threads via a Java thread pool to distribute the work further.
>
>
>
> These threads share the Flink-provided ‘SourceContext’ and emit records via
> the source_context.collect() method.
>
>
>
> Is it OK to share the source context across separate threads and collect
> data from them?
>
>
>
> Does this design cause any issues for downstream operators?
>
>
>
> Rgds,
>
> Kamal
>
>
