Re: Spark Scheduler

Sai Prasanna Sun, 26 Jan 2014 20:25:00 -0800

Tathagata Das, with respect to HDFS, I think the job tracker will return
which of the replica nodes are the preferred locations. On a standalone
Spark system using the native filesystem, if the partitions are cached, it
is straightforward to return the nodes that hold them. But if they are not
cached and are replicated across 3 nodes, how will Spark return
preferredLocations(p) in the absence of Hadoop/HDFS?
What is the logic in this case?



On Sat, Jan 25, 2014 at 12:11 AM, Tathagata Das <[email protected]> wrote:

> The logic behind the preferred location of an RDD partition is pretty
> simple. For RDDs that are based on an HDFS file, the preferred location is
> set based on where the HDFS blocks corresponding to the RDD's
> partitions are located. This is done by querying the HDFS framework. For
> any RDD that may be cached, the preferred location is set based on where a
> partition is cached (it may be replicated as well). So the system does not
> maintain any history about block / partition access times, bandwidth, etc.
>
>
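
The lookup described above can be sketched roughly as follows. This is a toy
model in plain Python, not Spark's implementation: the class
`ToyLocationResolver`, its two dictionaries and the host names are all
hypothetical, and the order used here (cached copy first, then storage block
locations, then no preference) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyLocationResolver:
    """Toy model of the preferred-location lookup (not Spark's real code)."""

    # partition index -> hosts holding a cached copy (hypothetical bookkeeping)
    cached_hosts: Dict[int, List[str]] = field(default_factory=dict)
    # partition index -> hosts holding the backing block, e.g. as reported by HDFS
    block_hosts: Dict[int, List[str]] = field(default_factory=dict)

    def preferred_locations(self, partition: int) -> List[str]:
        # 1. Assumed priority: hosts that already cache the partition.
        cached = self.cached_hosts.get(partition, [])
        if cached:
            return list(cached)
        # 2. Otherwise use whatever the storage layer reports for the block.
        # 3. If nothing is known, the empty list means "no preference".
        return list(self.block_hosts.get(partition, []))


resolver = ToyLocationResolver(
    cached_hosts={0: ["node-a"]},
    block_hosts={0: ["node-b", "node-c"], 1: ["node-b", "node-c", "node-d"]},
)
print(resolver.preferred_locations(0))  # cached copy takes priority
print(resolver.preferred_locations(1))  # falls back to block locations
print(resolver.preferred_locations(2))  # nothing known: no preference
```

Note that no access times or bandwidth figures appear anywhere in the model:
the answer depends only on where the data currently sits.
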
> On Fri, Jan 24, 2014 at 1:15 AM, Sai Prasanna <[email protected]> wrote:
>
>> Hello everybody, please help me with this.
>>
>> The preferredLocations(p) method of an RDD gives the nodes where partition
>> p of that RDD can be accessed faster. How does Spark implement this
>> internally? Is any history about access times or network bandwidth for
>> the various partitions across nodes stored and used? Or, when there are
>> multiple copies of an RDD, are the preferredLocations determined only by
>> the jobs allocated to a node?
>> Or is the intelligence derived from the underlying framework, say HDFS?
>>
>> --
>> *Sai Prasanna. AN*
>> *II M.Tech (CS), SSSIHL*
>>
>>
>> *Entire water in the ocean can never sink a ship, Unless it gets inside.
>> All the pressures of life can never hurt you, Unless you let them in.*
>>
>
>


-- 
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*

