Re: Flink custom parallel data source

2023-11-03 Thread David Anderson
> As you suggested message broker below then how it is feasible in this
case?

To my mind, the idea would be to use something like a socket source for
Kafka Connect. This would give you a simple, reliable way to get the data
stored into a replayable data store. You'd then be able to start, stop, and
redeploy the Flink app without worrying about data loss because the data
reception and storage would be decoupled from the data processing.

David

On Tue, Oct 31, 2023 at 7:50 PM Kamal Mittal via user 
wrote:
>
> Thanks for sharing views.
>
>
>
> Our client supports TCP stream based traffic only which is in some
proprietary format and need to decode that. System which is accepting this
traffic is flink based and that’s why all this tried with custom data
source?
>
>
>
> As you suggested message broker below then how it is feasible in this
case?
>
>
>
> From: Alexander Fedulov 
> Sent: 01 November 2023 01:54 AM
> To: Kamal Mittal 
> Cc: user@flink.apache.org
> Subject: Re: Flink custom parallel data source
>
>
>
> Flink natively supports a pull-based model for sources, where the source
operators request data from the external system when they are ready to
process it.  Implementing a TCP server socket operator essentially creates
a push-based source, which could lead to backpressure problems if the data
ingestion rate exceeds the processing rate. You also lose any delivery
guarantees because Flink's fault tolerance model relies on having
replayable sources.
>
> Is using a message broker not feasible in your case?
>
> Best,
>
> Alexander Fedulov
>
>
>
> On Tue, 31 Oct 2023 at 13:08, Kamal Mittal 
wrote:
>
> Hello,
>
>
>
> We are writing TCP server socket custom source function in which TCP
server socket listener will accept connections and read data.
>
> Single Custom source server socket function – ServerSocket serversocket =
new ServerSocket();
>
> Now using thread pool accept multiple connections in separate threads =
new Runnable () -> serversocket.accept();
>
> So client socket will be accepted and given to separate thread for read
data from TCP stream.
>
> Rgds,
>
> Kamal
>
> From: Alexander Fedulov 
> Sent: 31 October 2023 04:03 PM
> To: Kamal Mittal 
> Cc: user@flink.apache.org
> Subject: Re: Flink custom parallel data source
>
>
>
> Please note that SourceFunction API is deprecated and is due to be
removed, possibly in the next major version of Flink.
>
> Ideally you should not be manually spawning threads in your Flink
applications. Typically you would only perform data fetching in the sources
and do processing in the subsequent operators which you can scale
independently from the source parallelism. Can you describe what you are
trying to achieve?
>
>
>
> Best,
>
> Alexander Fedulov
>
>
>
> On Tue, 31 Oct 2023 at 07:22, Kamal Mittal via user 
wrote:
>
> Hello Community,
>
>
>
> I need to have a custom parallel data source (Flink
ParallelSourceFunction) for fetching data based on some custom logic. In
this source function, opening multiple threads via java thread pool to
distribute work further.
>
>
>
> These threads share Flink provided ‘SourceContext’ and collect records
via source_context.collect() method.
>
>
>
> Is it ok to share source context in separate threads and get data?
>
>
>
> Is there any issue for downstream operators due to above design?
>
>
>
> Rgds,
>
> Kamal

>


RE: Flink custom parallel data source

2023-10-31 Thread Kamal Mittal via user
Thanks for sharing views.

Our client supports TCP stream based traffic only which is in some proprietary 
format and need to decode that. System which is accepting this traffic is flink 
based and that’s why all this tried with custom data source?

As you suggested message broker below then how it is feasible in this case?

From: Alexander Fedulov 
Sent: 01 November 2023 01:54 AM
To: Kamal Mittal 
Cc: user@flink.apache.org
Subject: Re: Flink custom parallel data source

Flink natively supports a pull-based model for sources, where the source 
operators request data from the external system when they are ready to process 
it.  Implementing a TCP server socket operator essentially creates a push-based 
source, which could lead to backpressure problems if the data ingestion rate 
exceeds the processing rate. You also lose any delivery guarantees because 
Flink's fault tolerance model relies on having replayable sources.
Is using a message broker not feasible in your case?

Best,
Alexander Fedulov

On Tue, 31 Oct 2023 at 13:08, Kamal Mittal 
mailto:kamal.mit...@ericsson.com>> wrote:
Hello,

We are writing TCP server socket custom source function in which TCP server 
socket listener will accept connections and read data.
Single Custom source server socket function – ServerSocket serversocket = new 
ServerSocket();
Now using thread pool accept multiple connections in separate threads = new 
Runnable () -> serversocket.accept();
So client socket will be accepted and given to separate thread for read data 
from TCP stream.
Rgds,
Kamal
From: Alexander Fedulov 
mailto:alexander.fedu...@gmail.com>>
Sent: 31 October 2023 04:03 PM
To: Kamal Mittal mailto:kamal.mit...@ericsson.com>>
Cc: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: Flink custom parallel data source

Please note that SourceFunction API is deprecated and is due to be removed, 
possibly in the next major version of Flink.
Ideally you should not be manually spawning threads in your Flink applications. 
Typically you would only perform data fetching in the sources and do processing 
in the subsequent operators which you can scale independently from the source 
parallelism. Can you describe what you are trying to achieve?

Best,
Alexander Fedulov

On Tue, 31 Oct 2023 at 07:22, Kamal Mittal via user 
mailto:user@flink.apache.org>> wrote:
Hello Community,

I need to have a custom parallel data source (Flink ParallelSourceFunction) for 
fetching data based on some custom logic. In this source function, opening 
multiple threads via java thread pool to distribute work further.

These threads share Flink provided ‘SourceContext’ and collect records via 
source_context.collect() method.

Is it ok to share source context in separate threads and get data?

Is there any issue for downstream operators due to above design?

Rgds,
Kamal


Re: Flink custom parallel data source

2023-10-31 Thread Alexander Fedulov
Flink natively supports a pull-based model for sources, where the source
operators request data from the external system when they are ready to
process it.  Implementing a TCP server socket operator essentially creates
a push-based source, which could lead to backpressure problems if the data
ingestion rate exceeds the processing rate. You also lose any delivery
guarantees because Flink's fault tolerance model relies on having
replayable sources.
Is using a message broker not feasible in your case?

Best,
Alexander Fedulov

On Tue, 31 Oct 2023 at 13:08, Kamal Mittal 
wrote:

> Hello,
>
>
>
> We are writing TCP server socket custom source function in which TCP
> server socket listener will accept connections and read data.
>
> Single Custom source server socket function – ServerSocket *serversocket* =
> new ServerSocket();
>
> Now using thread pool accept multiple connections in separate threads = new
>  *Runnable* () -> *serversocket*.accept();
>
> So client socket will be accepted and given to separate thread for read
> data from TCP stream.
>
> Rgds,
>
> Kamal
>
> *From:* Alexander Fedulov 
> *Sent:* 31 October 2023 04:03 PM
> *To:* Kamal Mittal 
> *Cc:* user@flink.apache.org
> *Subject:* Re: Flink custom parallel data source
>
>
>
> Please note that SourceFunction API is deprecated and is due to be
> removed, possibly in the next major version of Flink.
>
> Ideally you should not be manually spawning threads in your Flink
> applications. Typically you would only perform data fetching in the sources
> and do processing in the subsequent operators which you can scale
> independently from the source parallelism. Can you describe what you are
> trying to achieve?
>
>
>
> Best,
>
> Alexander Fedulov
>
>
>
> On Tue, 31 Oct 2023 at 07:22, Kamal Mittal via user 
> wrote:
>
> Hello Community,
>
>
>
> I need to have a custom parallel data source (Flink ParallelSourceFunction)
> for fetching data based on some custom logic. In this source function,
> opening multiple threads via java thread pool to distribute work further.
>
>
>
> These threads share Flink provided ‘SourceContext’ and collect records via
> source_context.collect() method.
>
>
>
> Is it ok to share source context in separate threads and get data?
>
>
>
> Is there any issue for downstream operators due to above design?
>
>
>
> Rgds,
>
> Kamal
>
>


RE: Flink custom parallel data source

2023-10-31 Thread Kamal Mittal via user
Hello,

We are writing TCP server socket custom source function in which TCP server 
socket listener will accept connections and read data.
Single Custom source server socket function – ServerSocket serversocket = new 
ServerSocket();
Now using thread pool accept multiple connections in separate threads = new 
Runnable () -> serversocket.accept();
So client socket will be accepted and given to separate thread for read data 
from TCP stream.
Rgds,
Kamal
From: Alexander Fedulov 
Sent: 31 October 2023 04:03 PM
To: Kamal Mittal 
Cc: user@flink.apache.org
Subject: Re: Flink custom parallel data source

Please note that SourceFunction API is deprecated and is due to be removed, 
possibly in the next major version of Flink.
Ideally you should not be manually spawning threads in your Flink applications. 
Typically you would only perform data fetching in the sources and do processing 
in the subsequent operators which you can scale independently from the source 
parallelism. Can you describe what you are trying to achieve?

Best,
Alexander Fedulov

On Tue, 31 Oct 2023 at 07:22, Kamal Mittal via user 
mailto:user@flink.apache.org>> wrote:
Hello Community,

I need to have a custom parallel data source (Flink ParallelSourceFunction) for 
fetching data based on some custom logic. In this source function, opening 
multiple threads via java thread pool to distribute work further.

These threads share Flink provided ‘SourceContext’ and collect records via 
source_context.collect() method.

Is it ok to share source context in separate threads and get data?

Is there any issue for downstream operators due to above design?

Rgds,
Kamal


Re: Flink custom parallel data source

2023-10-31 Thread Alexander Fedulov
Please note that SourceFunction API is deprecated and is due to be removed,
possibly in the next major version of Flink.
Ideally you should not be manually spawning threads in your Flink
applications. Typically you would only perform data fetching in the sources
and do processing in the subsequent operators which you can scale
independently from the source parallelism. Can you describe what you are
trying to achieve?

Best,
Alexander Fedulov

On Tue, 31 Oct 2023 at 07:22, Kamal Mittal via user 
wrote:

> Hello Community,
>
>
>
> I need to have a custom parallel data source (Flink ParallelSourceFunction)
> for fetching data based on some custom logic. In this source function,
> opening multiple threads via java thread pool to distribute work further.
>
>
>
> These threads share Flink provided ‘SourceContext’ and collect records via
> source_context.collect() method.
>
>
>
> Is it ok to share source context in separate threads and get data?
>
>
>
> Is there any issue for downstream operators due to above design?
>
>
>
> Rgds,
>
> Kamal
>


Flink custom parallel data source

2023-10-31 Thread Kamal Mittal via user
Hello Community,

I need to have a custom parallel data source (Flink ParallelSourceFunction) for 
fetching data based on some custom logic. In this source function, opening 
multiple threads via java thread pool to distribute work further.

These threads share Flink provided 'SourceContext' and collect records via 
source_context.collect() method.

Is it ok to share source context in separate threads and get data?

Is there any issue for downstream operators due to above design?

Rgds,
Kamal