Re: Spark performance tests

2017-01-10 Thread Prasun Ratn
Thanks Adam, Kazuaki!

On Tue, Jan 10, 2017 at 3:28 PM, Adam Roberts <arobe...@uk.ibm.com> wrote:
> Hi, I suggest HiBench and SparkSqlPerf. HiBench features many benchmarks
> that exercise several components of Spark (great for stressing core, SQL,
> and MLlib capabilities), while SparkSqlPerf features the 99 TPC-DS queries
> (stressing the DataFrame API and therefore the Catalyst optimiser). Both
> work well with Spark 2.
>
> HiBench: https://github.com/intel-hadoop/HiBench
> SparkSqlPerf: https://github.com/databricks/spark-sql-perf
>
>
>
>
> From:"Kazuaki Ishizaki" <ishiz...@jp.ibm.com>
> To:Prasun Ratn <prasun.r...@gmail.com>
> Cc:Apache Spark Dev <dev@spark.apache.org>
> Date:10/01/2017 09:22
> Subject:Re: Spark performance tests
> 
>
>
>
> Hi,
> You may find several micro-benchmarks under
> https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark.
>
> Regards,
> Kazuaki Ishizaki
>
>
>
> From: Prasun Ratn <prasun.r...@gmail.com>
> To: Apache Spark Dev <dev@spark.apache.org>
> Date: 2017/01/10 12:52
> Subject: Spark performance tests
> 
>
>
>
> Hi
>
> Are there performance tests or microbenchmarks for Spark - especially
> directed towards the CPU-specific parts? I looked at spark-perf but
> that doesn't seem to have been updated recently.
>
> Thanks
> Prasun
>
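
For reference, the in-tree suites Kazuaki links are built on Spark's internal
org.apache.spark.util.Benchmark helper, and each file's header comment gives
its run command (typically something like
build/sbt "sql/test-only *benchmark.AggregateBenchmark"). Below is a minimal
sketch of that pattern for a CPU-bound case; the constructor and addCase
signatures are assumptions taken from the Spark 2.x utility and may differ
between versions, so check the benchmark sources themselves.

import org.apache.spark.util.Benchmark  // internal Spark utility, not a public API

// Illustrative CPU-bound micro-benchmark in the style of the in-tree suites.
val n = 10000000L
val benchmark = new Benchmark("sum of squares", n)

benchmark.addCase("while loop") { _ =>
  var i = 0L
  var s = 0L
  while (i < n) { s += i * i; i += 1 }
}

benchmark.addCase("range.map.sum") { _ =>
  (0L until n).map(i => i * i).sum
}

// Prints per-case timings and relative speed, as the in-tree suites do.
benchmark.run()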




Spark performance tests

2017-01-09 Thread Prasun Ratn
Hi

Are there performance tests or microbenchmarks for Spark - especially
directed towards the CPU-specific parts? I looked at spark-perf but
that doesn't seem to have been updated recently.

Thanks
Prasun




Scaling issues due to contention in Random

2016-11-24 Thread Prasun Ratn
Hi,

I am seeing perf degradation in the SparkPi example on a single-node
setup (using local[K]).

Using 1, 2, 4, and 8 cores, these are the execution times in seconds for
the same number of iterations:
Random: 4.0, 7.0, 12.96, 17.96

If I change the code to use ThreadLocalRandom
(https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala#L35)
it scales properly:
ThreadLocalRandom: 2.2, 1.4, 1.07, 1.00
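
To make the change concrete, this is roughly the shape of the two variants,
assuming sc, n and slices are in scope as in the linked SparkPi example (a
sketch, not the exact upstream code):

import java.util.concurrent.ThreadLocalRandom

// Original shape: math.random goes through a single shared java.util.Random,
// so every task contends on the same atomically updated seed.
val countShared = sc.parallelize(1 until n, slices).map { _ =>
  val x = math.random * 2 - 1
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)

// Change described above: each executor thread draws from its own generator,
// so the seed contention disappears.
val countLocal = sc.parallelize(1 until n, slices).map { _ =>
  val rnd = ThreadLocalRandom.current()
  val x = rnd.nextDouble() * 2 - 1
  val y = rnd.nextDouble() * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)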

I see a similar issue in the Kryo serializer in another app - the push
function shows up at the top of the profile data, but goes away completely
if I use ThreadLocalRandom:

https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/util/ObjectMap.java#L259

The JDK documentation
(https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html)
says:

> When applicable, use of ThreadLocalRandom rather than shared Random objects
> in concurrent programs will typically encounter much less overhead and
> contention. Use of ThreadLocalRandom is particularly appropriate when
> multiple tasks (for example, each a ForkJoinTask) use random numbers in
> parallel in thread pools.
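
The contention is easy to reproduce outside Spark. A self-contained sketch
(plain Scala, no Spark) that times N threads drawing from one shared Random
versus ThreadLocalRandom:

import java.util.Random
import java.util.concurrent.ThreadLocalRandom

object RandomContention {
  val shared = new Random()

  // Run `nThreads` threads, each drawing `draws` doubles from `next`,
  // and return the elapsed wall-clock time in seconds.
  def run(nThreads: Int, draws: Long)(next: () => Double): Double = {
    val start = System.nanoTime()
    val threads = (1 to nThreads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          var i = 0L
          var acc = 0.0
          while (i < draws) { acc += next(); i += 1 }
          if (acc < 0) println(acc) // keep the JIT from eliding the loop
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    (System.nanoTime() - start) / 1e9
  }

  def main(args: Array[String]): Unit = {
    val draws = 10000000L
    for (n <- Seq(1, 2, 4, 8)) {
      val tShared = run(n, draws)(() => shared.nextDouble())
      val tLocal  = run(n, draws)(() => ThreadLocalRandom.current().nextDouble())
      println(f"$n%d threads: shared Random $tShared%.2f s, ThreadLocalRandom $tLocal%.2f s")
    }
  }
}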

I am using Spark 1.5 and Java 1.8.0_91.

Is there any reason to prefer Random over ThreadLocalRandom?

Thanks
Prasun




Re: trying to use Spark applications with modified Kryo

2016-10-17 Thread Prasun Ratn
Thanks a lot Steve!

On Mon, Oct 17, 2016 at 4:59 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 17 Oct 2016, at 10:02, Prasun Ratn <prasun.r...@gmail.com> wrote:
>
> Hi
>
> I want to run some Spark applications with some changes in the Kryo serializer.
>
> Please correct me, but I think I need to recompile Spark (instead of
> just the Spark applications) in order to use the newly built Kryo
> serializer?
>
> I obtained Kryo 3.0.3 source and built it (mvn package install).
>
> Next, I took the source code for Spark 2.0.1 and built it (build/mvn
> -X -DskipTests -Dhadoop.version=2.6.0 clean package)
>
> I then compiled the Spark applications.
>
> However, I am not seeing my Kryo changes when I run the Spark applications.
>
>
> Kryo versions are very brittle.
>
> You'll need to:
>
> - get an up-to-date/consistent version of Chill, which is where the
>   transitive dependency on Kryo originates
> - rebuild Spark depending on that Chill release
>
> If you want Hive integration, probably also rebuild Hive to be consistent
> too; the main reason Spark has its own Hive version is that same Kryo
> version sharing issue.
>
> https://github.com/JoshRosen/hive/commits/release-1.2.1-spark2
>
> Kryo has repackaged its class locations between versions. This lets the
> versions co-exist, but probably also explains why your apps aren't picking
> up the diffs.
>
> Finally, keep an eye on this GitHub issue:
>
> https://github.com/twitter/chill/issues/252
>
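
One quick way to confirm which Kryo actually ends up on the classpath is to
ask the JVM where the class was loaded from, e.g. from spark-shell or inside
the application. A small sketch using plain reflection (nothing
Spark-specific; the second class name is assumed from Chill's
com.twitter.chill package):

// Print the jar each class was loaded from. If Kryo still resolves to the
// stock dependency rather than the rebuilt jar, the modifications will
// never be picked up, whatever the application does.
def whereIs(className: String): Unit = {
  val cls = Class.forName(className)
  val src = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
  println(s"$className -> ${src.getOrElse("(no code source)")}")
}

whereIs("com.esotericsoftware.kryo.Kryo")           // the serializer itself
whereIs("com.twitter.chill.ScalaKryoInstantiator")  // Chill, which pulls Kryo in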




trying to use Spark applications with modified Kryo

2016-10-17 Thread Prasun Ratn
Hi

I want to run some Spark applications with some changes in the Kryo serializer.

Please correct me, but I think I need to recompile Spark (instead of
just the Spark applications) in order to use the newly built Kryo
serializer?

I obtained Kryo 3.0.3 source and built it (mvn package install).

Next, I took the source code for Spark 2.0.1 and built it (build/mvn
-X -DskipTests -Dhadoop.version=2.6.0 clean package)

I then compiled the Spark applications.

However, I am not seeing my Kryo changes when I run the Spark applications.

Please let me know if my assumptions and steps are correct.

Thank you
Prasun
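
One more thing worth checking alongside the build steps: RDD data is
serialized with JavaSerializer by default, so a rebuilt Kryo only comes into
play for data serialization once the application selects the Kryo serializer.
A minimal Spark 2.x sketch (MyRecord is a hypothetical class, shown only to
illustrate registration):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical user type, registered with Kryo purely for illustration.
case class MyRecord(id: Long, payload: Array[Byte])

val conf = new SparkConf()
  .setAppName("kryo-check")
  // Switch data serialization from the default JavaSerializer to Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Optional: registering classes avoids writing full class names per record.
  .registerKryoClasses(Array(classOf[MyRecord]))

val spark = SparkSession.builder().config(conf).getOrCreate()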
