Hi,
What is the purpose of the taskBinary for a ShuffleMapTask? What does it
contain and how is it useful? Is it the representation of all the RDD
operations that will be applied for the partition that task will be
processing? (in the case below the task will process stage 0, partition 0)
If it is
I'm running Spark in local mode and getting these two log messages, which
appear to be similar. I want to understand what each is doing:
1. [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started
service 'sparkDriver' on port 60782.
2. [main] executor.Executor (Logging.scala:lo
I'm trying to build Spark using IntelliJ on Windows, but I keep
getting this error:
spark-master\external\flume-sink\src\main\scala\org\apache\spark\streaming\flume\sink\SparkAvroCallbackHandler.scala
Error:(46, 66) not found: type SparkFlumeProtocol
val transactionTimeout: Int, val backO
Spark is an in-memory engine and attempts to do computation in memory.
Tachyon is memory-centric distributed storage, OK, but how would that help
Spark run faster?
Consider the classic word count application over a 4-node cluster with a
sizable working set. What makes Spark run faster than MapReduce,
considering that Spark also has to write to disk during shuffle?
Thanks!
On Wed, Aug 5, 2015 at 5:24 PM, Saisai Shao wrote:
> Yes, in the end shuffle data will be written to disk for the reduce stage
> to pull, no matter how large you set the shuffle memory fraction.
>
> Thanks
> Saisai
>
> On Thu, Aug 6, 2015 at 7:50 AM, Muler wrote:
>
wrote:
> Hi Muler,
>
> Shuffle data will be written to disk no matter how much memory you have.
> More memory can alleviate shuffle spill, where temporary files are
> generated if memory is not enough.
>
> Yes, each node writes shuffle data to file and pulled from d
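
To make the quoted point concrete, here is a toy sketch in plain Python of a word count whose shuffle goes through files. It is not Spark code and does not mirror Spark's shuffle implementation; the names map_task, reduce_task and word_count, the file naming and the partitioning rule are all made up for illustration. It only shows the general shape: each map task writes one file per reduce partition, and each reduce task later pulls its partition from every map task.

```python
import os
import tempfile
from collections import Counter


def partition_for(word, num_reducers):
    # Sum of code points: a deterministic stand-in for a hash partitioner
    # (hypothetical rule, chosen only so runs are reproducible).
    return sum(map(ord, word)) % num_reducers


def map_task(map_id, lines, num_reducers, shuffle_dir):
    """Count words locally, then write one file per reduce partition."""
    buckets = [Counter() for _ in range(num_reducers)]
    for line in lines:
        for word in line.split():
            buckets[partition_for(word, num_reducers)][word] += 1
    for reduce_id, bucket in enumerate(buckets):
        path = os.path.join(shuffle_dir, f"shuffle_{map_id}_{reduce_id}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for word, count in bucket.items():
                f.write(f"{word}\t{count}\n")


def reduce_task(reduce_id, num_maps, shuffle_dir):
    """Pull this partition's file from every map task and merge the counts."""
    totals = Counter()
    for map_id in range(num_maps):
        path = os.path.join(shuffle_dir, f"shuffle_{map_id}_{reduce_id}.txt")
        with open(path, encoding="utf-8") as f:
            for row in f:
                word, count = row.rstrip("\n").split("\t")
                totals[word] += int(count)
    return totals


def word_count(partitions, num_reducers=2):
    """Run all map tasks, then all reduce tasks, with files in between."""
    with tempfile.TemporaryDirectory() as shuffle_dir:
        for map_id, lines in enumerate(partitions):
            map_task(map_id, lines, num_reducers, shuffle_dir)
        result = Counter()
        for reduce_id in range(num_reducers):
            result.update(reduce_task(reduce_id, len(partitions), shuffle_dir))
        return dict(result)
```

For example, word_count([["a b a"], ["b c"]]) treats each inner list as one input partition. The map-side counts are held in memory, but the hand-off to the reduce side always goes through files, which is the behaviour described in the quoted reply.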
Hi,
Suppose I'm running WordCount with 100m of data on a 4-node cluster.
Assume each node has 200g of RAM and I'm giving my executors 100g
(just enough memory for 100m data).
1. If I have enough memory, can Spark 100% avoid writing to disk?
2. During shuffle, where results have to b