Hi,

How does Spark read data from S3 and run tasks in parallel?

Assume I have an S3 bucket holding 35 GB of data (Parquet files).

How will the SparkSession read and process the data in parallel? How does
it split the S3 data and assign the pieces to each executor task?

Please share your thoughts.

Note:
If we have an RDD, we can look at partitions.size or partitions.length to
check how many partitions a file has. But how is this accomplished for an
S3 bucket? A sketch of how I would check it is below.
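
For reference, a minimal Scala sketch of how I would inspect the partition
count after reading Parquet from S3 (the bucket path s3a://my-bucket/data/
and the app name are placeholders, not a real bucket):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("S3PartitionCheck")  // hypothetical app name
  .getOrCreate()

// Read the Parquet data from S3 (path is a placeholder)
val df = spark.read.parquet("s3a://my-bucket/data/")

// A DataFrame does not expose partitions.size directly,
// but its underlying RDD does
println(df.rdd.getNumPartitions)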

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
