Error running spark-sql-perf version 0.3.2 against Spark 1.6

2016-04-27 Thread Michael Slavitch
Hello;

I'm trying to run spark-sql-perf version 0.3.2 (hash cb0347b) against Spark 
1.6, and I get the following when running

 ./bin/run --benchmark DatsetPerformance 

Exception in thread "main" java.lang.ClassNotFoundException: 
com.databricks.spark.sql.perf.DatsetPerformance

This happens even though the classes are built. How do I resolve this?


Another question: why does the current version of the benchmark run only 
against Spark 2.0?







Re: Spark on Mobile platforms

2016-04-07 Thread Michael Slavitch
You should consider mobile agents that feed data into a Spark datacenter via 
Spark Streaming.


> On Apr 7, 2016, at 8:28 AM, Ashic Mahtab  wrote:
> 
> Spark may not be the right tool for this. Working on just the mobile device, 
> you won't be scaling out stuff, and as such most of the benefits of Spark 
> would be nullified. Moreover, it'd likely run slower than things that are 
> meant to work in a single process. Spark is also quite large, which is 
> another drawback in terms of mobile apps.
> 
> Perhaps check out TensorFlow, which may be better suited for this particular 
> requirement.
> 
> -Ashic.
> 
> > Date: Thu, 7 Apr 2016 04:50:18 -0700
> > From: sbarbhuiy...@qub.ac.uk 
> > To: user@spark.apache.org 
> > Subject: Spark on Mobile platforms
> > 
> > Hi all,
> > 
> > I have been trying to find if Spark can be run on a mobile device platform
> > (Android preferably) to analyse mobile log data for some performance
> > analysis. So, basically the idea is to collect and process the mobile log
> > data within the mobile device using the Spark framework to allow real-time
> > log data analysis, without offloading the data to remote server/cloud.
> > 
> > Does anybody have any idea about whether running Spark on a mobile platform
> > is supported by the existing Spark framework or is there any other option
> > for this?
> > 
> > Many thanks.
> > Sakil
> > 
> > 
> > 
> > --
> > View this message in context: 
> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Mobile-platforms-tp26699.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> > 
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> > 



Re: lost executor due to large shuffle spill memory

2016-04-06 Thread Michael Slavitch
Shuffle will always spill the local dataset to disk. Changing memory settings 
does nothing to alter this, so you need to point spark.local.dir at a fast 
disk.
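
For example, a minimal PySpark sketch (the mount point /mnt/fast-ssd is only an
illustration, and note that some cluster managers override this property with
the SPARK_LOCAL_DIRS environment variable):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("local-dir-example")
        # Shuffle files and spills land here; a comma-separated list of
        # directories spreads the IO across several disks.
        .set("spark.local.dir", "/mnt/fast-ssd/spark-tmp"))
sc = SparkContext(conf=conf)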


> On Apr 6, 2016, at 12:32 PM, Lishu Liu <lishu...@gmail.com> wrote:
> 
> Thanks Michael. I use 5 m3.2xlarge nodes. Should I increase 
> spark.storage.memoryFraction? Also I'm thinking maybe I should repartition 
> all_pairs so that each partition will be small enough to be handled. 
> 
> On Tue, Apr 5, 2016 at 8:03 PM, Michael Slavitch <slavi...@gmail.com> wrote:
> Do you have enough disk space for the spill?  It seems it has lots of memory 
> reserved but not enough for the spill. You will need a disk that can handle 
> the entire data partition for each host. Compression of the spilled data 
> saves about 50% in most if not all cases.
> 
> Given the large data set I would consider a 1TB SATA flash drive, formatted 
> as EXT4 or XFS  and give it exclusive access as spark.local.dir.  It will 
> slow things down but it won’t stop.  There are alternatives if you want to 
> discuss offline.
> 
> 
> > On Apr 5, 2016, at 6:37 PM, l <lishu...@gmail.com> wrote:
> >
> > I have a task to remap the index to the actual uuid in ALS prediction results.
> > But it consistently fails due to lost executors. I noticed there's a lot of
> > shuffle spill memory but I don't know how to improve it.
> >
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/24.png>
> >
> > I've tried to reduce the number of executors while assigning each to have
> > bigger memory.
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/31.png>
> >
> > But it still doesn't seem big enough. I don't know what to do.
> >
> > Below is my code:
> > user = load_user()
> > product = load_product()
> > user.cache()
> > product.cache()
> > model = load_model(model_path)
> > all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
> > all_prediction = model.predictAll(all_pairs)
> > user_reverse = user.map(lambda r: (r[1], r[0]))
> > product_reverse = product.map(lambda r: (r[1], r[0]))
> > user_reversed = all_prediction.map(lambda u: (u[0], (u[1],
> > u[2]))).join(user_reverse).map(lambda r: (r[1][0][0], (r[1][1],
> > r[1][0][1])))
> > both_reversed = user_reversed.join(product_reverse).map(lambda r:
> > (r[1][0][0], r[1][1], r[1][0][1]))
> > both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1],
> > x[2])).saveAsTextFile(recommendation_path)
> >
> > Both user and product are (uuid, index) tuples.
> >
> >
> >
> > --
> > View this message in context: 
> > http://apache-spark-user-list.1001560.n3.nabble.com/lost-executor-due-to-large-shuffle-spill-memory-tp26683.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
> 
> 



Re: lost executor due to large shuffle spill memory

2016-04-05 Thread Michael Slavitch
Do you have enough disk space for the spill? It seems the job has lots of 
memory reserved but not enough disk for the spill. You will need a disk that 
can handle the entire data partition for each host. Compression of the spilled 
data saves about 50% in most if not all cases.

Given the large data set, I would consider a 1TB SATA flash drive, formatted as 
EXT4 or XFS, and give it exclusive access as spark.local.dir. It will slow 
things down, but it won't stop the job. There are alternatives if you want to 
discuss offline.
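
A sketch of the corresponding configuration, assuming the drive is mounted at
/mnt/spill (the path is illustrative):

# conf/spark-defaults.conf, identical on every node
spark.local.dir               /mnt/spill
spark.shuffle.compress        true
spark.shuffle.spill.compress  true

Both compression flags already default to true; they are what buys the roughly
50% saving mentioned above.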


> On Apr 5, 2016, at 6:37 PM, l  wrote:
> 
> I have a task to remap the index to the actual uuid in ALS prediction results.
> But it consistently fails due to lost executors. I noticed there's a lot of
> shuffle spill memory but I don't know how to improve it.
> 
>  
> 
> I've tried to reduce the number of executors while assigning each to have
> bigger memory. 
>  
> 
> But it still doesn't seem big enough. I don't know what to do. 
> 
> Below is my code:
> user = load_user()
> product = load_product()
> user.cache()
> product.cache()
> model = load_model(model_path)
> all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
> all_prediction = model.predictAll(all_pairs)
> user_reverse = user.map(lambda r: (r[1], r[0]))
> product_reverse = product.map(lambda r: (r[1], r[0]))
> user_reversed = all_prediction.map(lambda u: (u[0], (u[1],
> u[2]))).join(user_reverse).map(lambda r: (r[1][0][0], (r[1][1],
> r[1][0][1])))
> both_reversed = user_reversed.join(product_reverse).map(lambda r:
> (r[1][0][0], r[1][1], r[1][0][1]))
> both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1],
> x[2])).saveAsTextFile(recommendation_path)
> 
> Both user and product are (uuid, index) tuples.
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/lost-executor-due-to-large-shuffle-spill-memory-tp26683.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 





Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Michael Slavitch
Just to be sure: have spark-env.sh and spark-defaults.conf been correctly 
propagated to all nodes? Are they identical?
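
A quick way to check (a sketch; the hostnames and conf path are illustrative)
is to compare checksums across the cluster:

import subprocess

for host in ("node1", "node2"):  # your worker hostnames
    # Hash both config files on the remote node over ssh.
    out = subprocess.check_output(
        ["ssh", host, "md5sum",
         "/opt/spark/conf/spark-env.sh",
         "/opt/spark/conf/spark-defaults.conf"])
    print(host, out.decode())

If any checksum differs between nodes, the configs have drifted.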


> On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
> 
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o resolution;
> c.f.:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html
> http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html
> ]
> 
> Hello,
> 
> In short, I'm reporting a problem concerning load imbalance of RDD
> partitions across a standalone cluster. Though there are 16 cores
> available per node, certain nodes will have >16 partitions, and some
> will correspondingly have <16 (and even 0).
> 
> In more detail: I am running some scalability/performance tests for
> vector-type operations. The RDDs I'm considering are simple block
> vectors of type RDD[(Int,Vector)] for a Breeze vector type. The RDDs
> are generated with a fixed number of elements given by some multiple
> of the available cores, and subsequently hash-partitioned by their
> integer block index.
> 
> I have verified that the hash partitioning key distribution, as well
> as the keys themselves, are both correct; the problem is truly that
> the partitions are *not* evenly distributed across the nodes.
> 
> For instance, here is a representative output for some stages and
> tasks in an iterative program. This is a very simple test with 2
> nodes, 64 partitions, 32 cores (16 per node), and 2 executors. Two
> example stages from the stderr log are stages 7 and 9:
> 7,mapPartitions at DummyVector.scala:113,64,1459771364404,1459771365272
> 9,mapPartitions at DummyVector.scala:113,64,1459771364431,1459771365639
> 
> When counting the location of the partitions on the compute nodes from
> the stderr logs, however, you can clearly see the imbalance. Example
> lines are:
> 13627 task 0.0 in stage 7.0 (TID 196,
> himrod-2, partition 0,PROCESS_LOCAL, 3987 bytes)&
> 13628 task 1.0 in stage 7.0 (TID 197,
> himrod-2, partition 1,PROCESS_LOCAL, 3987 bytes)&
> 13629 task 2.0 in stage 7.0 (TID 198,
> himrod-2, partition 2,PROCESS_LOCAL, 3987 bytes)&
> 
> Grep'ing the full set of above lines for each hostname, himrod-?,
> shows the problem occurs in each stage. Below is the output, where the
> number of partitions stored on each node is given alongside its
> hostname as in (himrod-?,num_partitions):
> Stage 7: (himrod-1,0) (himrod-2,64)
> Stage 9: (himrod-1,16) (himrod-2,48)
> Stage 12: (himrod-1,0) (himrod-2,64)
> Stage 14: (himrod-1,16) (himrod-2,48)
> The imbalance is also visible when the executor ID is used to count
> the partitions operated on by executors.
> 
> I am working off a fairly recent modification of 2.0.0-SNAPSHOT branch
> (but the modifications do not touch the scheduler, and are irrelevant
> for these particular tests). Has something changed radically in 1.6+
> that would make a previously (<=1.5) correct configuration go haywire?
> Have new configuration settings been added of which I'm unaware that
> could lead to this problem?
> 
> Please let me know if others in the community have observed this, and
> thank you for your time,
> Mike
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> 





Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
Yes, we see it on the final write. Our preference is to eliminate this.

On Fri, Apr 1, 2016, 7:25 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:

> Hi Michael, shuffle data (mapper output) has to be materialized to disk
> eventually, no matter how much memory you have; that is the design of
> Spark. In your scenario, since you have a lot of memory, shuffle spill should
> not happen frequently, and most of the disk IO you see is probably the final
> shuffle file write.
>
> So if you want to avoid this disk IO, you could use a ramdisk as Reynold
> suggested. If you want to avoid the FS overhead of a ramdisk, you could try to
> hack together a new shuffle implementation, since the shuffle framework is
> pluggable.
>
>
> On Sat, Apr 2, 2016 at 6:48 AM, Michael Slavitch <slavi...@gmail.com>
> wrote:
>
>> As I mentioned earlier, this flag is now ignored.
>>
>>
>> On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch <slavi...@gmail.com> wrote:
>>
>>> Shuffling a 1tb set of keys and values (aka sort by key)  results in
>>> about 500gb of io to disk if compression is enabled. Is there any way to
>>> eliminate shuffling causing io?
>>>
>>> On Fri, Apr 1, 2016, 6:32 PM Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Michael - I'm not sure if you actually read my email, but spill has
>>>> nothing to do with the shuffle files on disk. It was for the partitioning
>>>> (i.e. sorting) process. If that flag is off, Spark will just run out of
>>>> memory when data doesn't fit in memory.
>>>>
>>>>
>>>> On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch <slavi...@gmail.com>
>>>> wrote:
>>>>
>>>>> RAMdisk is a fine interim step but there is a lot of layers eliminated
>>>>> by keeping things in memory unless there is need for spillover.   At one
>>>>> time there was support for turning off spilling.  That was eliminated.
>>>>> Why?
>>>>>
>>>>>
>>>>> On Fri, Apr 1, 2016, 6:05 PM Mridul Muralidharan <mri...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I think Reynold's suggestion of using ram disk would be a good way to
>>>>>> test if these are the bottlenecks or something else is.
>>>>>> For most practical purposes, pointing local dir to ramdisk should
>>>>>> effectively give you 'similar' performance as shuffling from memory.
>>>>>>
>>>>>> Are there concerns with taking that approach to test? (I don't see
>>>>>> any, but I am not sure if I missed something.)
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com>
>>>>>> wrote:
>>>>>> > I totally disagree that it’s not a problem.
>>>>>> >
>>>>>> > - Network fetch throughput on 40G Ethernet exceeds the throughput
>>>>>> of NVME
>>>>>> > drives.
>>>>>> > - What Spark is depending on is Linux’s IO cache as an effective
>>>>>> buffer pool
>>>>>> > This is fine for small jobs but not for jobs with datasets in the
>>>>>> TB/node
>>>>>> > range.
>>>>>> > - On larger jobs flushing the cache causes Linux to block.
>>>>>> > - On a modern 56-hyperthread 2-socket host the latency caused by
>>>>>> multiple
>>>>>> > executors writing out to disk increases greatly.
>>>>>> >
>>>>>> > I thought the whole point of Spark was in-memory computing?  It’s
>>>>>> in fact
>>>>>> > in-memory for some things but  use spark.local.dir as a buffer pool
>>>>>> of
>>>>>> > others.
>>>>>> >
>>>>>> > Hence, the performance of  Spark is gated by the performance of
>>>>>> > spark.local.dir, even on large memory systems.
>>>>>> >
>>>>>> > "Currently it is not possible to not write shuffle files to disk.”
>>>>>> >
>>>>>> > What changes >would< make it possible?
>>>>>> >
>>>>>> > The only one that seems possible is to clone the shuffle service
>>>>>> > and make it in-memory.

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
Shuffling a 1 TB set of keys and values (aka sort by key) results in about
500 GB of IO to disk if compression is enabled. Is there any way to
eliminate the IO that shuffling causes?
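
For reference, the shape of the job in question (a minimal PySpark sketch on
synthetic data; the real job is of course far larger):

from pyspark import SparkContext

sc = SparkContext(appName="sort-by-key-demo")
# sortByKey is a full shuffle: every map-output record is written to
# spark.local.dir before the reducers fetch and merge it.
pairs = sc.parallelize(range(1000000)).map(lambda i: (i % 1000, i))
n = pairs.sortByKey().count()  # count() forces the shuffle to run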

On Fri, Apr 1, 2016, 6:32 PM Reynold Xin <r...@databricks.com> wrote:

> Michael - I'm not sure if you actually read my email, but spill has
> nothing to do with the shuffle files on disk. It was for the partitioning
> (i.e. sorting) process. If that flag is off, Spark will just run out of
> memory when data doesn't fit in memory.
>
>
> On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch <slavi...@gmail.com>
> wrote:
>
>> RAMdisk is a fine interim step but there is a lot of layers eliminated by
>> keeping things in memory unless there is need for spillover.   At one time
>> there was support for turning off spilling.  That was eliminated.  Why?
>>
>>
>> On Fri, Apr 1, 2016, 6:05 PM Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>
>>> I think Reynold's suggestion of using ram disk would be a good way to
>>> test if these are the bottlenecks or something else is.
>>> For most practical purposes, pointing local dir to ramdisk should
>>> effectively give you 'similar' performance as shuffling from memory.
>>>
>>> Are there concerns with taking that approach to test? (I don't see
>>> any, but I am not sure if I missed something.)
>>>
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>>
>>>
>>> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com>
>>> wrote:
>>> > I totally disagree that it’s not a problem.
>>> >
>>> > - Network fetch throughput on 40G Ethernet exceeds the throughput of
>>> NVME
>>> > drives.
>>> > - What Spark is depending on is Linux’s IO cache as an effective
>>> buffer pool
>>> > This is fine for small jobs but not for jobs with datasets in the
>>> TB/node
>>> > range.
>>> > - On larger jobs flushing the cache causes Linux to block.
>>> > - On a modern 56-hyperthread 2-socket host the latency caused by
>>> multiple
>>> > executors writing out to disk increases greatly.
>>> >
>>> > I thought the whole point of Spark was in-memory computing?  It’s in
>>> fact
>>> > in-memory for some things but  use spark.local.dir as a buffer pool of
>>> > others.
>>> >
>>> > Hence, the performance of  Spark is gated by the performance of
>>> > spark.local.dir, even on large memory systems.
>>> >
>>> > "Currently it is not possible to not write shuffle files to disk.”
>>> >
>>> > What changes >would< make it possible?
>>> >
>>> > The only one that seems possible is to clone the shuffle service and
>>> make it
>>> > in-memory.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Apr 1, 2016, at 4:57 PM, Reynold Xin <r...@databricks.com> wrote:
>>> >
>>> > spark.shuffle.spill actually has nothing to do with whether we write
>>> shuffle
>>> > files to disk. Currently it is not possible to not write shuffle files
>>> to
>>> > disk, and typically it is not a problem because the network fetch
>>> throughput
>>> > is lower than what disks can sustain. In most cases, especially with
>>> SSDs,
>>> > there is little difference between putting all of those in memory and
>>> on
>>> > disk.
>>> >
>>> > However, it is becoming more common to run Spark on a few number of
>>> beefy
>>> > nodes (e.g. 2 nodes each with 1TB of RAM). We do want to look into
>>> improving
>>> > performance for those. Meantime, you can setup local ramdisks on each
>>> node
>>> > for shuffle writes.
>>> >
>>> >
>>> >
>>> > On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hello;
>>> >>
>>> >> I’m working on spark with very large memory systems (2TB+) and notice
>>> that
>>> >> Spark spills to disk in shuffle.  Is there a way to force spark to
>>> stay in
>>> >> memory when doing shuffle operations?   The goal is to keep the
>>> shuffle data
>>> >> either in the heap or in off-heap memory (in 1.6.x) and never touch
>>> the IO
>>> >> subsystem.  I am willing to have the job fail if it runs out of RAM.
>>> >>
>>> >> spark.shuffle.spill true  is deprecated in 1.6 and does not work in
>>> >> Tungsten sort in 1.5.x
>>> >>
>>> >> "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but
>>> this
>>> >> is ignored by the tungsten-sort shuffle manager; its optimized
>>> shuffles will
>>> >> continue to spill to disk when necessary.”
>>> >>
>>> >> If this is impossible via configuration changes what code changes
>>> would be
>>> >> needed to accomplish this?
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> -
>>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: user-h...@spark.apache.org
>>> >>
>>> >
>>> >
>>>
>> --
>> Michael Slavitch
>> 62 Renfrew Ave.
>> Ottawa Ontario
>> K1S 1Z5
>>
>
> --
Michael Slavitch
62 Renfrew Ave.
Ottawa Ontario
K1S 1Z5


Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
RAMdisk is a fine interim step, but a lot of layers are eliminated by
keeping things in memory unless there is a need for spillover. At one time
there was support for turning off spilling. That was eliminated. Why?
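
For anyone who wants the interim step anyway, the smallest version is to point
the scratch space at an existing tmpfs (a sketch; it assumes /dev/shm is
RAM-backed, as on most Linux distributions, and that each node has the RAM to
spare):

# conf/spark-defaults.conf
spark.local.dir  /dev/shm/spark-local

Shuffle and spill files then live in memory, but they still pass through the
filesystem layer, which is exactly the stack of layers at issue here.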

On Fri, Apr 1, 2016, 6:05 PM Mridul Muralidharan <mri...@gmail.com> wrote:

> I think Reynold's suggestion of using ram disk would be a good way to
> test if these are the bottlenecks or something else is.
> For most practical purposes, pointing local dir to ramdisk should
> effectively give you 'similar' performance as shuffling from memory.
>
> Are there concerns with taking that approach to test? (I don't see
> any, but I am not sure if I missed something.)
>
>
> Regards,
> Mridul
>
>
>
>
> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com>
> wrote:
> > I totally disagree that it’s not a problem.
> >
> > - Network fetch throughput on 40G Ethernet exceeds the throughput of NVME
> > drives.
> > - What Spark is depending on is Linux’s IO cache as an effective buffer
> pool
> > This is fine for small jobs but not for jobs with datasets in the TB/node
> > range.
> > - On larger jobs flushing the cache causes Linux to block.
> > - On a modern 56-hyperthread 2-socket host the latency caused by multiple
> > executors writing out to disk increases greatly.
> >
> > I thought the whole point of Spark was in-memory computing?  It’s in fact
> > in-memory for some things but  use spark.local.dir as a buffer pool of
> > others.
> >
> > Hence, the performance of  Spark is gated by the performance of
> > spark.local.dir, even on large memory systems.
> >
> > "Currently it is not possible to not write shuffle files to disk.”
> >
> > What changes >would< make it possible?
> >
> > The only one that seems possible is to clone the shuffle service and
> make it
> > in-memory.
> >
> >
> >
> >
> >
> > On Apr 1, 2016, at 4:57 PM, Reynold Xin <r...@databricks.com> wrote:
> >
> > spark.shuffle.spill actually has nothing to do with whether we write
> shuffle
> > files to disk. Currently it is not possible to not write shuffle files to
> > disk, and typically it is not a problem because the network fetch
> throughput
> > is lower than what disks can sustain. In most cases, especially with
> SSDs,
> > there is little difference between putting all of those in memory and on
> > disk.
> >
> > However, it is becoming more common to run Spark on a few number of beefy
> > nodes (e.g. 2 nodes each with 1TB of RAM). We do want to look into
> improving
> > performance for those. Meantime, you can setup local ramdisks on each
> node
> > for shuffle writes.
> >
> >
> >
> > On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com>
> > wrote:
> >>
> >> Hello;
> >>
> >> I’m working on spark with very large memory systems (2TB+) and notice
> that
> >> Spark spills to disk in shuffle.  Is there a way to force spark to stay
> in
> >> memory when doing shuffle operations?   The goal is to keep the shuffle
> data
> >> either in the heap or in off-heap memory (in 1.6.x) and never touch the
> IO
> >> subsystem.  I am willing to have the job fail if it runs out of RAM.
> >>
> >> spark.shuffle.spill true  is deprecated in 1.6 and does not work in
> >> Tungsten sort in 1.5.x
> >>
> >> "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but
> this
> >> is ignored by the tungsten-sort shuffle manager; its optimized shuffles
> will
> >> continue to spill to disk when necessary.”
> >>
> >> If this is impossible via configuration changes what code changes would
> be
> >> needed to accomplish this?
> >>
> >>
> >>
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
> >
> >
>
-- 
Michael Slavitch
62 Renfrew Ave.
Ottawa Ontario
K1S 1Z5


Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
I totally disagree that it’s not a problem.

- Network fetch throughput on 40G Ethernet exceeds the throughput of NVMe 
drives.
- What Spark is depending on is Linux's IO cache as an effective buffer pool. 
This is fine for small jobs but not for jobs with datasets in the TB/node range.
- On larger jobs, flushing the cache causes Linux to block.
- On a modern 56-hyperthread 2-socket host, the latency caused by multiple 
executors writing out to disk increases greatly.

I thought the whole point of Spark was in-memory computing? It's in fact 
in-memory for some things, but it uses spark.local.dir as a buffer pool for 
others.

Hence, the performance of Spark is gated by the performance of 
spark.local.dir, even on large memory systems.

"Currently it is not possible to not write shuffle files to disk.”

What changes >would< make it possible?

The only one that seems possible is to clone the shuffle service and make it 
in-memory.





> On Apr 1, 2016, at 4:57 PM, Reynold Xin <r...@databricks.com> wrote:
> 
> spark.shuffle.spill actually has nothing to do with whether we write shuffle 
> files to disk. Currently it is not possible to not write shuffle files to 
> disk, and typically it is not a problem because the network fetch throughput 
> is lower than what disks can sustain. In most cases, especially with SSDs, 
> there is little difference between putting all of those in memory and on disk.
> 
> However, it is becoming more common to run Spark on a few number of beefy 
> nodes (e.g. 2 nodes each with 1TB of RAM). We do want to look into improving 
> performance for those. Meantime, you can setup local ramdisks on each node 
> for shuffle writes.
> 
> 
> 
> On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com> wrote:
> Hello;
> 
> I’m working on spark with very large memory systems (2TB+) and notice that 
> Spark spills to disk in shuffle.  Is there a way to force spark to stay in 
> memory when doing shuffle operations?   The goal is to keep the shuffle data 
> either in the heap or in off-heap memory (in 1.6.x) and never touch the IO 
> subsystem.  I am willing to have the job fail if it runs out of RAM.
> 
> spark.shuffle.spill true  is deprecated in 1.6 and does not work in Tungsten 
> sort in 1.5.x
> 
> "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is 
> ignored by the tungsten-sort shuffle manager; its optimized shuffles will 
> continue to spill to disk when necessary.”
> 
> If this is impossible via configuration changes what code changes would be 
> needed to accomplish this?
> 
> 
> 
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 
> 



In-Memory Only Spark Shuffle

2016-04-01 Thread slavitch
Hello;

I'm working on Spark with very large memory systems (2TB+) and notice that
Spark spills to disk in shuffle. Is there a way to force Spark to stay
exclusively in memory when doing shuffle operations? The goal is to keep
the shuffle data either in the heap or in off-heap memory (in 1.6.x) and
never touch the IO subsystem. I am willing to have the job fail if it runs
out of RAM.

The spark.shuffle.spill setting is deprecated in 1.6, and setting it to false
does not work with the Tungsten sort in 1.5.x:

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this
is ignored by the tungsten-sort shuffle manager; its optimized shuffles will
continue to spill to disk when necessary.”

If this is impossible via configuration changes, what code changes would be
needed to accomplish this?
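
For reference, the setting being attempted (a minimal PySpark sketch; as the
warning above says, the tungsten-sort manager ignores it, so it only affects
the older sort-based shuffle):

from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.shuffle.spill", "false")  # deprecated in 1.6
sc = SparkContext(conf=conf)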



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/In-Memory-Only-Spark-Shuffle-tp26661.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
Hello;

I'm working on Spark with very large memory systems (2TB+) and notice that 
Spark spills to disk in shuffle. Is there a way to force Spark to stay in 
memory when doing shuffle operations? The goal is to keep the shuffle data 
either in the heap or in off-heap memory (in 1.6.x) and never touch the IO 
subsystem. I am willing to have the job fail if it runs out of RAM.

The spark.shuffle.spill setting is deprecated in 1.6, and setting it to false 
does not work with the Tungsten sort in 1.5.x:

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is 
ignored by the tungsten-sort shuffle manager; its optimized shuffles will 
continue to spill to disk when necessary.”

If this is impossible via configuration changes, what code changes would be 
needed to accomplish this?







