Re: Data & Task distribution among the available Nodes

2023-06-30 Thread Shammon FY
Hi Mahmoud,

For the third question: Flink currently uses Fine-Grained Resource Management
to choose a TM for tasks. You can refer to the doc [1] for more information.


[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/finegrained_resource/
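
As a rough illustration of what the page in [1] describes: fine-grained
resource management is disabled by default and is enabled through a cluster
configuration option. A minimal sketch of the relevant line in the Flink
configuration file, assuming a Flink version that documents this option
(please check the docs for your release before relying on it):

```yaml
# Assumption: option name as given in the fine-grained resource management docs [1].
# The default is false, so it has to be enabled explicitly.
cluster.fine-grained-resource-management.enabled: true
```

With this enabled, resource requirements can be declared per slot sharing
group, as explained in [1].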

Best,
Shammon FY




Re: Data & Task distribution among the available Nodes

2023-06-29 Thread Martijn Visser
Hi Mahmoud,

While it's not an answer to your questions, I do want to point out
that the DataSet API is deprecated and will be removed in a future
version of Flink. I would recommend moving to either the Table API or
the DataStream API.

Best regards,

Martijn



Data & Task distribution among the available Nodes

2023-06-22 Thread Mahmoud Awad
Hello everyone,

I am trying to understand the mechanism by which Flink distributes the data and
the tasks among the nodes/task managers (TMs) in the cluster, assuming all TMs
have equal resources. I am using the DataSet API on my own machine.
My questions are as follows:

- When we first read the data from the source (text, CSV, etc.), how does Flink
ensure a fair distribution of data from the source to the next subtask?

- Are there any criteria by which Flink will prefer one task manager over
another (assuming all task managers have equal resources)?

- On what basis does Flink choose to deploy a specific task on a specific task
manager?

I hope I was able to explain my point. Thank you in advance.

Best regards
Mahmoud

