Re: Read Data From NFS

2017-06-13 Thread ayan guha
Hi, So for example, if I specify parallelism of 100, 100 partitions will be created, right? My question is how Spark divides the file. In other words, how does it decide that the first x lines will be read by the first partition, the next y lines by the second partition, and so on? In the case of HDFS,
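[Editor's note: a small sketch of one way to inspect this empirically; the path and partition count are illustrative, not from the thread. glom() exposes each partition's contents, so you can count how many lines ended up in each one.]

    from pyspark import SparkContext

    sc = SparkContext(appName="partition-inspection")

    # Ask for (at least) 100 partitions when reading; path is hypothetical.
    r = sc.textFile("file:///my/file", 100)

    # glom() turns each partition into a list of its lines, so mapping len()
    # over it shows how many lines landed in each partition.
    print(r.glom().map(len).collect())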

Re: Read Data From NFS

2017-06-13 Thread Riccardo Ferrari
Hi Ayan, You might be interested in the official Spark docs: https://spark.apache.org/docs/latest/tuning.html#level-of-parallelism and the spark.default.parallelism setting described there. Best, On Mon, Jun 12, 2017 at 6:18 AM, ayan guha wrote: > I understand how it works with hdfs. My question is when hdfs is
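[Editor's note: a minimal sketch of how that setting can be supplied; the value 100 is arbitrary. Note that spark.default.parallelism mainly affects parallelize() and shuffle operations, not how textFile() splits an input file.]

    from pyspark import SparkConf, SparkContext

    # Illustrative configuration; 100 is an arbitrary value for this example.
    conf = (SparkConf()
            .setAppName("nfs-read")
            .set("spark.default.parallelism", "100"))
    sc = SparkContext(conf=conf)

    # Default parallelism shows up e.g. in parallelize() when no slice count is given.
    print(sc.defaultParallelism)
    print(sc.parallelize(range(1000)).getNumPartitions())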

Re: Read Data From NFS

2017-06-11 Thread ayan guha
I understand how it works with HDFS. My question is: when HDFS is not the file system, how is the number of partitions calculated? Hope that makes it clearer. On Mon, 12 Jun 2017 at 2:42 am, vaquar khan wrote: > > > As per the Spark doc: > The textFile method also takes an optional second argument for

Re: Read Data From NFS

2017-06-11 Thread vaquar khan
As per the Spark doc: The textFile method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by
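[Editor's note: a hedged illustration of that optional second argument; the file path and numbers are made up for the example. The argument is a minimum, so you can ask for more partitions than blocks, but not reliably fewer.]

    # Default: roughly one partition per 128 MB HDFS block for this (hypothetical) file.
    rdd_default = sc.textFile("hdfs:///data/big.log")
    print(rdd_default.getNumPartitions())

    # Ask for at least 64 partitions instead.
    rdd_split = sc.textFile("hdfs:///data/big.log", 64)
    print(rdd_split.getNumPartitions())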

Re: Read Data From NFS

2017-06-11 Thread ayan guha
Hi, My question is: what happens if I have 1 file of, say, 100 GB? Then how many partitions will there be? Best, Ayan On Sun, 11 Jun 2017 at 9:36 am, vaquar khan wrote: > Hi Ayan, > > If you have multiple files (example 12 files) and you are using the following > code then you will get 12 partitions. > >

Re: Read Data From NFS

2017-06-10 Thread vaquar khan
Hi Ayan, If you have multiple files (for example, 12 files) and you are using the following code, then you will get 12 partitions. r = sc.textFile("file://my/file/*") Not sure what you want to know about the file system; please check the API doc. Regards, Vaquar khan On Jun 8, 2017 10:44 AM, "ayan guha" wrote:
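[Editor's note: a quick way to verify the partition count in a case like this; the glob path is hypothetical. Large individual files may be split into more than one partition, so the count can exceed the file count.]

    # Read all matching files; roughly one partition per small file,
    # more for files that span multiple blocks/split boundaries.
    r = sc.textFile("file:///my/files/*")
    print(r.getNumPartitions())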

Re: Read Data From NFS

2017-06-08 Thread ayan guha
Anyone? On Thu, 8 Jun 2017 at 3:26 pm, ayan guha wrote: > Hi Guys > > Quick one: how does Spark deal (i.e., create partitions) with large files sitting > on NFS, assuming all the executors can see the file in exactly the same way? > > i.e., when I run > > r = sc.textFile("file://my/file") > > what happens if the file is on NFS?

Read Data From NFS

2017-06-07 Thread ayan guha
Hi Guys, Quick one: how does Spark deal (i.e., create partitions) with large files sitting on NFS, assuming all the executors can see the file in exactly the same way? i.e., when I run r = sc.textFile("file://my/file") what happens if the file is on NFS? Is there any difference from r = sc.textFile("hdfs://my
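[Editor's note: a minimal sketch of the comparison being asked about, assuming the same data is reachable both on a locally mounted NFS path and in HDFS; both paths are hypothetical. With a file:// path, every executor must be able to resolve the same mount point, and the file is still split into partitions by the Hadoop input format's split size rather than by HDFS block metadata.]

    # Same data reachable two ways; paths are made up for this sketch.
    r_nfs  = sc.textFile("file:///mnt/nfs/my/file")
    r_hdfs = sc.textFile("hdfs:///my/file")

    # Compare how many partitions each read produced.
    print(r_nfs.getNumPartitions(), r_hdfs.getNumPartitions())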