Hi
So, for example, if I set parallelism to 100, 100 partitions will be
created, right? My question is: how does Spark divide the file? In other
words, how does it decide that the first x lines will be read by the first
partition, the next y lines by the second partition, and so on? In the
case of HDFS,
Hi Ayan,
You might be interested in the official Spark docs:
https://spark.apache.org/docs/latest/tuning.html#level-of-parallelism and
its spark.default.parallelism setting
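For illustration, a minimal sketch (mine, not from the docs page) of
setting that property in PySpark before the SparkContext is created:

from pyspark import SparkConf, SparkContext

# Set a default parallelism of 100 for this application.
conf = SparkConf().set("spark.default.parallelism", "100")
sc = SparkContext(conf=conf)

# Caveat: this setting drives sc.parallelize and shuffle defaults; for
# sc.textFile the partition count comes from the input splits (and the
# optional minPartitions argument), not from this property alone.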
Best,
On Mon, Jun 12, 2017 at 6:18 AM, ayan guha wrote:
I understand how it works with HDFS. My question is: when HDFS is not the
file system, how is the number of partitions calculated? Hope that makes it
clearer.
On Mon, 12 Jun 2017 at 2:42 am, vaquar khan wrote:
As per the Spark doc:
The textFile method also takes an optional second argument for controlling
the number of partitions of the file. *By default, Spark creates one
partition for each block of the file (blocks being 128MB by default in
HDFS)*, but you can also ask for a higher number of partitions by passing
a larger value.
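For example, a quick sketch of that optional second argument in PySpark
(the path is a placeholder):

# Ask for at least 100 partitions; Spark treats this as a minimum hint,
# so the actual count can be higher but should not be lower.
r = sc.textFile("file:///my/file", minPartitions=100)
print(r.getNumPartitions())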
Hi
My question is: what happens if I have one file of, say, 100 GB? How many
partitions will there be?
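If the 128 MB default block size from the doc quote applies, my rough
arithmetic would be:

# 100 GB file, 128 MB blocks (an assumption; non-HDFS file systems
# may split differently):
print((100 * 1024) // 128)  # => 800, i.e. roughly 800 partitions

Is that right?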
Best
Ayan
On Sun, 11 Jun 2017 at 9:36 am, vaquar khan wrote:
Hi Ayan,
If you have multiple files (for example, 12 files) and you are using the
following code, then you will get 12 partitions.
r = sc.textFile("file:///my/file/*")
Not sure what you want to know about the file system; please check the API doc.
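As a quick check (placeholder path again), you can verify the count
Spark actually created:

# With 12 input files each smaller than one split, textFile typically
# yields one partition per file.
r = sc.textFile("file:///my/file/*")
print(r.getNumPartitions())  # expected: 12 in the example above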
Regards,
Vaquar khan
On Jun 8, 2017 10:44 AM, "ayan guha" wrote:
Anyone?
On Thu, 8 Jun 2017 at 3:26 pm, ayan guha wrote:
Hi Guys
Quick one: how does Spark deal with (i.e., create partitions for) large
files sitting on NFS, assuming all executors can see the file in exactly
the same way?
i.e., when I run
r = sc.textFile("file:///my/file")
what happens if the file is on NFS? Is there any difference from
r = sc.textFile("hdfs://my/file")?