Error running spark-sql-perf version 0.3.2 against Spark 1.6

2016-04-27 Thread Michael Slavitch
Hello; I'm trying to run spark-sql-perf version 0.3.2 (hash cb0347b) against Spark 1.6. When running ./bin/run --benchmark DatsetPerformance I get the following: Exception in thread "main" java.lang.ClassNotFoundException: com.databricks.spark.sql.perf.DatsetPerformance. Even though the cl…
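(Likely cause, inferred from the error alone: the benchmark name is misspelled. Assuming the class shipped with spark-sql-perf 0.3.2 is com.databricks.spark.sql.perf.DatasetPerformance, running ./bin/run --benchmark DatasetPerformance should at least get past class loading.)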

Re: Spark on Mobile platforms

2016-04-07 Thread Michael Slavitch
You should consider mobile agents that feed data into a Spark datacenter via Spark Streaming.

> On Apr 7, 2016, at 8:28 AM, Ashic Mahtab wrote:
>
> Spark may not be the right tool for this. Working on just the mobile device,
> you won't be scaling out stuff, and as such most of the benefits o…
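A minimal sketch of the receiving side of that suggestion, assuming the mobile agents push newline-delimited events to a TCP endpoint (hostname, port, and batch interval below are placeholders), using the Spark 1.6 streaming API:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object MobileIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("MobileIngest")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Mobile agents write their events to this socket endpoint.
        val events = ssc.socketTextStream("ingest.example.com", 9999)

        // Count events per batch as a stand-in for real processing.
        events.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }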

Re: lost executor due to large shuffle spill memory

2016-04-06 Thread Michael Slavitch
> …increase spark.storage.memoryFraction? Also I'm thinking maybe I should
> repartition all_pairs so that each partition will be small enough to be
> handled.
>
> On Tue, Apr 5, 2016 at 8:03 PM, Michael Slavitch <slavi...@gmail.com> wrote:
>> Do you have enough disk…
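For reference, a sketch of the two knobs being discussed. Note that spark.storage.memoryFraction is only honored in 1.6 when the legacy memory manager is enabled; allPairs below is a stand-in for the thread's all_pairs RDD, and the values are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("AllPairs")
      // spark.storage.memoryFraction only applies in 1.6 with legacy mode on.
      .set("spark.memory.useLegacyMode", "true")
      .set("spark.storage.memoryFraction", "0.4")
    val sc = new SparkContext(conf)

    // Stand-in for the all_pairs RDD from the thread.
    val allPairs = sc.parallelize(0 until 10000000).map(i => (i % 1024, i))

    // More partitions => smaller per-task shuffle data, less spill pressure.
    val repartitioned = allPairs.repartition(2000)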

Re: lost executor due to large shuffle spill memory

2016-04-05 Thread Michael Slavitch
Do you have enough disk space for the spill? It seems there is plenty of memory reserved, but not enough for the spill. You will need a disk that can hold the entire data partition for each host. Compression of the spilled data saves about 50% in most if not all cases. Given the large data set I…
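A sketch of the settings in play, pointing spark.local.dir at a volume big enough to hold a host's full spill (the path is a placeholder; both compression flags already default to true in 1.6):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Spill and shuffle files land here; size it for the whole partition.
      // (Cluster managers may override this via SPARK_LOCAL_DIRS or YARN dirs.)
      .set("spark.local.dir", "/data/spark-tmp")
      // Compress spilled and shuffled data; roughly halves the bytes written.
      .set("spark.shuffle.spill.compress", "true")
      .set("spark.shuffle.compress", "true")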

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Michael Slavitch
Just to be sure: have spark-env.sh and spark-defaults.conf been correctly propagated to all nodes? Are they identical?

> On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
>
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o resolution;…
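One rough way to check part of this from the Spark shell: sample an environment variable set in spark-env.sh on every executor host and compare (SPARK_LOCAL_DIRS is just an example variable; enough partitions are used to touch every executor at least once):

    // Assumes a running Spark shell with sc in scope.
    val perHost = sc.parallelize(1 to 1000, 1000).map { _ =>
      val host = java.net.InetAddress.getLocalHost.getHostName
      (host, sys.env.getOrElse("SPARK_LOCAL_DIRS", "<unset>"))
    }.distinct().collect()

    // Any host reporting a different value has a stale spark-env.sh.
    perHost.sorted.foreach { case (host, dirs) => println(s"$host -> $dirs") }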

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
> …you could try to hack a new shuffle implementation, since the shuffle
> framework is pluggable.
>
> On Sat, Apr 2, 2016 at 6:48 AM, Michael Slavitch wrote:
>> As I mentioned earlier, this flag is now ignored.
>>
>> On Fri, Apr 1, 2016, 6:39 PM Michael S…
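The pluggable hook referred to here is spark.shuffle.manager, which accepts a short name or a fully qualified class. A sketch of wiring in a custom implementation (com.example.MyShuffleManager is hypothetical; a real one must implement Spark's private[spark] ShuffleManager trait, so it has to be compiled inside the org.apache.spark package tree):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Built-in values in 1.6 are "sort" (default), "tungsten-sort", "hash";
      // anything else is treated as a class name to instantiate.
      .set("spark.shuffle.manager", "com.example.MyShuffleManager")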

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
As I mentioned earlier, this flag is now ignored.

On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch wrote:
> Shuffling a 1 TB set of keys and values (aka sort by key) results in about
> 500 GB of I/O to disk if compression is enabled. Is there any way to
> eliminate the I/O that shuffling causes? …
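The operation under discussion, sketched at toy scale (the data below stands in for the 1 TB set): sortByKey repartitions by range and sorts, so the entire dataset crosses the shuffle boundary, and in 1.6 its shuffle files are written to spark.local.dir even on a large-memory host:

    // Assumes a Spark shell with sc in scope.
    val pairs = sc.parallelize(1 to 10000000)
      .map(i => (scala.util.Random.nextLong(), i))

    // Full shuffle: every record is written to local shuffle files on disk.
    val sorted = pairs.sortByKey()
    sorted.count()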

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
> …spill has nothing to do with the shuffle files on disk. It was for the
> partitioning (i.e. sorting) process. If that flag is off, Spark will just
> run out of memory when data doesn't fit in memory.
>
> On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch wrote:
>> …
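The flag in question is spark.shuffle.spill. As of 1.6 the sort-based shuffle accepts it but ignores it (it logs a warning and spills to disk anyway once its in-memory buffers fill), so setting it to false is a no-op rather than an out-of-memory risk:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Ignored by the sort shuffle manager in 1.6; spilling still happens.
      .set("spark.shuffle.spill", "false")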

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
> …concerns with taking that approach to test? (I don't see any, but I am not
> sure if I missed something).
>
> Regards,
> Mridul
>
> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch wrote:
>> I totally disagree that it’s not a problem. …

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
> …number of beefy nodes (e.g. 2 nodes each with 1 TB of RAM). We do want to
> look into improving performance for those. Meantime, you can set up local
> ramdisks on each node for shuffle writes.
>
> On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com> wrote:
>> …
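A sketch of that ramdisk workaround, assuming a tmpfs mount already exists at /mnt/ramdisk on every node (the mount point is a placeholder, and the shuffle data must fit within the RAM backing it):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Shuffle files land here; a tmpfs mount keeps those writes in memory.
      .set("spark.local.dir", "/mnt/ramdisk")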

Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
Hello; I’m working on Spark with very-large-memory systems (2 TB+) and notice that Spark spills to disk during shuffle. Is there a way to force Spark to stay in memory when doing shuffle operations? The goal is to keep the shuffle data either in the heap or in off-heap memory (in 1.6.x) and never…
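For reference, the 1.6.x off-heap settings alluded to, sketched with a placeholder size. Note these govern execution/storage memory; shuffle output files are still written to disk regardless:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.memory.offHeap.enabled", "true")
      // Off-heap pool size in bytes (here 1 TB).
      .set("spark.memory.offHeap.size", (1024L * 1024 * 1024 * 1024).toString)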
