Re: Spark performance tests
Thanks Adam, Kazuaki!

On Tue, Jan 10, 2017 at 3:28 PM, Adam Roberts <arobe...@uk.ibm.com> wrote:
> Hi, I suggest HiBench and SparkSqlPerf. HiBench features many benchmarks
> that exercise several components of Spark (great for stressing core, SQL,
> and MLlib capabilities), while SparkSqlPerf features 99 TPC-DS queries
> (stressing the DataFrame API and therefore the Catalyst optimiser). Both
> work well with Spark 2.
>
> HiBench: https://github.com/intel-hadoop/HiBench
> SparkSqlPerf: https://github.com/databricks/spark-sql-perf
>
> From: "Kazuaki Ishizaki" <ishiz...@jp.ibm.com>
> To: Prasun Ratn <prasun.r...@gmail.com>
> Cc: Apache Spark Dev <dev@spark.apache.org>
> Date: 10/01/2017 09:22
> Subject: Re: Spark performance tests
>
> Hi,
> You may find several micro-benchmarks under
> https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark.
>
> Regards,
> Kazuaki Ishizaki
>
> From: Prasun Ratn <prasun.r...@gmail.com>
> To: Apache Spark Dev <dev@spark.apache.org>
> Date: 2017/01/10 12:52
> Subject: Spark performance tests
>
> Hi
>
> Are there performance tests or microbenchmarks for Spark - especially
> directed towards the CPU-specific parts? I looked at spark-perf but
> that doesn't seem to have been updated recently.
>
> Thanks
> Prasun
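For anyone picking this up later, a minimal sketch of driving the TPC-DS
queries from spark-sql-perf in spark-shell, based on that project's README
at the time; the names used here (TPCDS, tpcds1_4Queries, runExperiment,
waitForFinish) are taken on trust from the README and may differ across
versions, and the data-generation/setup steps are omitted:

    import com.databricks.spark.sql.perf.tpcds.TPCDS

    // sqlContext is provided by spark-shell; point the suite at it
    val tpcds = new TPCDS(sqlContext = sqlContext)

    // run the TPC-DS 1.4 query set and block until the experiment finishes
    val experiment = tpcds.runExperiment(tpcds.tpcds1_4Queries)
    experiment.waitForFinish(60 * 60 * 10)  // timeout in seconds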
Spark performance tests
Hi

Are there performance tests or microbenchmarks for Spark - especially
directed towards the CPU-specific parts? I looked at spark-perf but
that doesn't seem to have been updated recently.

Thanks
Prasun
Scaling issues due to contention in Random
Hi,

I am seeing a perf degradation in the SparkPi example on a single-node
setup (using local[K]). Using 1, 2, 4, and 8 cores, these are the
execution times in seconds for the same number of iterations:

    Random:            4.0, 7.0, 12.96, 17.96

If I change the code to use ThreadLocalRandom
(https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala#L35)
it scales properly:

    ThreadLocalRandom: 2.2, 1.4, 1.07, 1.00

I see a similar issue with the Kryo serializer in another app - the push
function shows up at the top of the profile data, but goes away completely
if I use ThreadLocalRandom:
https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/util/ObjectMap.java#L259

The JDK documentation
(https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html)
says:

> When applicable, use of ThreadLocalRandom rather than shared Random
> objects in concurrent programs will typically encounter much less
> overhead and contention. Use of ThreadLocalRandom is particularly
> appropriate when multiple tasks (for example, each a ForkJoinTask) use
> random numbers in parallel in thread pools.

I am using Spark 1.5 and Java 1.8.0_91. Is there any reason to prefer
Random over ThreadLocalRandom?

Thanks
Prasun
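For concreteness, a sketch of the change being described, roughly following
the SparkPi example on master (Spark 2.x style; on 1.5 the setup goes
through SparkContext directly). scala.math.random delegates to a single
shared java.util.Random, whose AtomicLong seed is updated with a
compare-and-set loop, so threads contend on it; ThreadLocalRandom gives
each thread its own generator:

    import java.util.concurrent.ThreadLocalRandom
    import org.apache.spark.sql.SparkSession

    object SparkPiTlr {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("SparkPi-TLR").getOrCreate()
        val slices = if (args.length > 0) args(0).toInt else 2
        val n = math.min(100000L * slices, Int.MaxValue).toInt
        val count = spark.sparkContext.parallelize(1 until n, slices).map { _ =>
          // ThreadLocalRandom.current() returns the generator owned by the
          // calling thread, so there is no CAS contention on a shared seed.
          val rng = ThreadLocalRandom.current()
          val x = rng.nextDouble() * 2 - 1
          val y = rng.nextDouble() * 2 - 1
          if (x * x + y * y <= 1) 1 else 0
        }.reduce(_ + _)
        println(s"Pi is roughly ${4.0 * count / (n - 1)}")
        spark.stop()
      }
    }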
Re: trying to use Spark applications with modified Kryo
Thanks a lot Steve!

On Mon, Oct 17, 2016 at 4:59 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 17 Oct 2016, at 10:02, Prasun Ratn <prasun.r...@gmail.com> wrote:
>
> Hi
>
> I want to run some Spark applications with some changes in the Kryo
> serializer.
>
> Please correct me, but I think I need to recompile Spark (instead of
> just the Spark applications) in order to use the newly built Kryo
> serializer?
>
> I obtained the Kryo 3.0.3 source and built it (mvn package install).
>
> Next, I took the source code for Spark 2.0.1 and built it (build/mvn
> -X -DskipTests -Dhadoop.version=2.6.0 clean package)
>
> I then compiled the Spark applications.
>
> However, I am not seeing my Kryo changes when I run the Spark
> applications.
>
>
> Kryo versions are very brittle.
>
> You'll:
>
> - need to get an up-to-date/consistent version of Chill, which is where
>   the transitive dependency on Kryo originates
> - rebuild Spark depending on that Chill release
>
> If you want Hive integration, probably also rebuild Hive to be consistent
> too; the main reason Spark has its own Hive version is that Kryo version
> sharing.
>
> https://github.com/JoshRosen/hive/commits/release-1.2.1-spark2
>
> Kryo has repackaged their class locations between versions. This lets the
> versions co-exist, but probably also explains why your apps aren't picking
> up the diffs.
>
> Finally, keep an eye on this GitHub issue:
>
> https://github.com/twitter/chill/issues/252
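One quick way to test Steve's last point - that the app may simply not be
loading the rebuilt classes - is to ask the JVM where the Kryo class in use
actually came from. A small sketch (standard java.lang.Class API; run it on
the driver, or inside a task to check the executor classpath):

    import com.esotericsoftware.kryo.Kryo

    object WhichKryo {
      def main(args: Array[String]): Unit = {
        // If this prints the stock chill/kryo jar rather than the locally
        // built one, the rebuilt Kryo never made it onto the classpath.
        val src = classOf[Kryo].getProtectionDomain.getCodeSource
        println(if (src != null) src.getLocation else "bootstrap classpath")
      }
    }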
trying to use Spark applications with modified Kryo
Hi

I want to run some Spark applications with some changes in the Kryo
serializer.

Please correct me, but I think I need to recompile Spark (instead of
just the Spark applications) in order to use the newly built Kryo
serializer?

I obtained the Kryo 3.0.3 source and built it (mvn package install).

Next, I took the source code for Spark 2.0.1 and built it (build/mvn
-X -DskipTests -Dhadoop.version=2.6.0 clean package).

I then compiled the Spark applications.

However, I am not seeing my Kryo changes when I run the Spark
applications.

Please let me know if my assumptions and steps are correct.

Thank you
Prasun
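Worth noting: if the goal is only to change how particular classes are
serialized (rather than to patch Kryo's internals), Spark exposes a
registrator hook that needs no rebuild of Spark or Kryo. A sketch, where
MyClass, MySerializer, and MyRegistrator are hypothetical placeholders for
your own types:

    import com.esotericsoftware.kryo.{Kryo, Serializer}
    import com.esotericsoftware.kryo.io.{Input, Output}
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Placeholder type standing in for an application class.
    class MyClass(val x: Int)

    // Custom Kryo 3.x serializer: write/read the fields by hand.
    class MySerializer extends Serializer[MyClass] {
      override def write(kryo: Kryo, output: Output, obj: MyClass): Unit =
        output.writeInt(obj.x)
      override def read(kryo: Kryo, input: Input, tpe: Class[MyClass]): MyClass =
        new MyClass(input.readInt())
    }

    // Registered with Spark via the spark.kryo.registrator setting.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit =
        kryo.register(classOf[MyClass], new MySerializer)
    }

    // Wire it up when building the application's SparkConf; the registrator
    // value must be the fully qualified class name.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator")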