Re: How does Spark utilize low-level architecture features?

2016-01-20 Thread Boric Tan
Anyone could shed some light on this?

Thanks,
Boric

On Tue, Jan 19, 2016 at 4:12 PM, Boric Tan  wrote:

> Hi there,
>
> I am new to Spark, and would like to get some help to understand if Spark
> can utilize the underlying architectures for better performance. If so, how
> does it do it?
>
> For example, assume there is a cluster built with machines of different
> CPUs, will Spark check the individual CPU information and use some
> machine-specific setting for the tasks assigned to that machine? Or is it
> totally dependent on the underlying JVM implementation to run the JAR file,
> and therefore the JVM is the place to check if certain CPU features can be
> used?
>
> Thanks,
> Boric
>


RE: Using CUDA within Spark / boosting linear algebra

2016-01-20 Thread Ulanov, Alexander
Hi Everyone,

I’ve updated the benchmark and run experiments on new hardware with 2x 
Nvidia Tesla K80 (physically 4x Tesla K40) and 2x modern Haswell CPUs (Intel 
Xeon E5-2650 v3 @ 2.30GHz).

This time I computed the average and median of 10 runs for each experiment and 
approximated FLOPS.
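For reference, the FLOPS approximation for a dense multiply can be sketched as follows. This is a hedged illustration using the standard 2·m·n·k operation count and made-up run times; the exact formula used in the spreadsheet may differ.

```python
import statistics

def gemm_flops(m, n, k, seconds):
    """Approximate FLOPS for an (m x k) * (k x n) dense multiply:
    each output cell needs k multiplies and k adds -> 2*m*n*k operations."""
    return 2.0 * m * n * k / seconds

# Average and median over 10 hypothetical run times (seconds).
runs = [1.9, 2.0, 2.1, 2.0, 1.8, 2.2, 2.0, 1.9, 2.1, 2.0]
avg = sum(runs) / len(runs)
med = statistics.median(runs)

print(avg, med)
print(gemm_flops(10000, 10000, 10000, med))  # 1e12 ops/s, i.e. about 1 TFLOPS
```

The median is usually the more robust summary here, since a single run hit by GC or OS jitter can skew the average.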

Results are available on Google Docs (old experiments are in the other two 
sheets):
https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing
Benchmark code:
https://github.com/avulanov/scala-blas

Best regards, Alexander


From: Sam Halliday [mailto:sam.halli...@gmail.com]
Sent: Thursday, March 26, 2015 9:27 AM
To: John Canny
Cc: Xiangrui Meng; dev@spark.apache.org; Joseph Bradley; Evan R. Sparks; 
Ulanov, Alexander
Subject: Re: Using CUDA within Spark / boosting linear algebra


John, I have to disagree with you there. Dense matrices come up a lot in 
industry, although your personal experience may be different.
On 26 Mar 2015 16:20, "John Canny" wrote:
I mentioned this earlier in the thread, but I'll put it out again. Dense BLAS 
are not very important for most machine learning workloads: at least for 
non-image workloads in industry (and for image processing you would probably 
want a deep learning/SGD solution with convolution kernels). E.g., it was only 
relevant for 1/7 of our recent benchmarks, which should be a reasonable sample. 
What really matters is sparse BLAS performance. BIDMat is still an order of 
magnitude faster there. Those kernels are only in BIDMat, since NVIDIA's sparse 
BLAS don't perform well on power-law data.

It's also the case that the overall performance of an algorithm is determined 
by the slowest kernel, not the fastest. If the goal is to get closer to 
BIDMach's performance on typical problems, you need to make sure that every 
kernel goes at comparable speed. So the real question is how much faster MLlib 
routines are on a complete problem with/without GPU acceleration. For BIDMach, 
it's close to a factor of 10. But that required running entirely on the GPU, 
and making sure every kernel is close to its limit.
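The "slowest kernel" argument is essentially Amdahl's law. A small sketch with hypothetical kernel timings shows how one unaccelerated kernel caps the end-to-end speedup:

```python
def end_to_end_speedup(kernel_times, speedups):
    """Overall speedup when kernel i (taking kernel_times[i] seconds)
    is accelerated by a factor of speedups[i]."""
    baseline = sum(kernel_times)
    accelerated = sum(t / s for t, s in zip(kernel_times, speedups))
    return baseline / accelerated

# Hypothetical pipeline: 60s of dense GEMM, 40s of sparse kernels.
# Accelerating only the dense kernel 10x leaves the sparse kernel dominant:
print(end_to_end_speedup([60, 40], [10, 1]))
# Only when every kernel speeds up comparably does the job approach 10x:
print(end_to_end_speedup([60, 40], [10, 10]))
```

With these made-up numbers the first case gives only about a 2.2x overall speedup, which is the point: a 10x dense-BLAS kernel buys little if the sparse kernels stay on the CPU.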

-John

If you think nvblas would be helpful, you should try it in some end-to-end 
benchmarks.
On 3/25/15, 6:23 PM, Evan R. Sparks wrote:
Yeah, much more reasonable - nice to know that we can get full GPU performance 
from breeze/netlib-java - meaning there's no compelling performance reason to 
switch out our current linear algebra library (at least as far as this 
benchmark is concerned).

Instead, it looks like a user guide for configuring Spark/MLlib to use the 
right BLAS library will get us most of the way there. Or, would it make sense 
to finally ship openblas compiled for some common platforms (64-bit linux, 
windows, mac) directly with Spark - hopefully eliminating the jblas warnings 
once and for all for most users? (Licensing is BSD) Or am I missing something?

On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander wrote:
As everyone suggested, the results were too good to be true, so I 
double-checked them. It turns out that nvblas did not do the multiplication, 
due to the NVBLAS_TILE_DIM parameter in "nvblas.conf", and returned a zero 
matrix. My previously posted results with nvblas reflect matrix copying only. 
The default NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. 
I handpicked other values that worked. As a result, netlib+nvblas is on par 
with BIDMat-cuda. As promised, I am going to post a how-to for nvblas 
configuration.
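Until that how-to lands, the relevant knob lives in nvblas.conf; a fragment along these lines is what needs tuning (values are illustrative, not recommendations; pick a tile size that fits your card and matrix sizes):

```
# nvblas.conf -- illustrative values, not recommendations
NVBLAS_LOGFILE        nvblas.log
# CPU BLAS library used as fallback for routines nvblas does not intercept
NVBLAS_CPU_BLAS_LIB   libopenblas.so
NVBLAS_GPU_LIST       ALL
# The default of 2048 was too large for this card/matrix size;
# smaller tiles made the multiplication actually run on the GPU
NVBLAS_TILE_DIM       1024
```

Checking the NVBLAS_LOGFILE output is a quick way to confirm whether calls are actually being routed to the GPU rather than silently failing.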

https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing



-Original Message-
From: Ulanov, Alexander
Sent: Wednesday, March 25, 2015 2:31 PM
To: Sam Halliday
Cc: dev@spark.apache.org; Xiangrui Meng; Joseph 
Bradley; Evan R. Sparks; jfcanny
Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again,

I finally managed to use nvblas within Spark+netlib-java. It has exceptional 
performance for big matrices with Double, faster than BIDMat-cuda with Float. 
But for smaller matrices, if you copy them to/from the GPU, OpenBLAS or MKL 
might be a better choice. This correlates with the original nvblas 
presentation at the 2013 GPU conference (slide 21): 
http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf

My results:
https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

To be clear, these tests are not meant to generalize the performance of 
different libraries. I just want to pick the library that performs dense 
matrix multiplication best for my task.

P.S. My previous issue with nvblas was the following: it exposes Fortran BLAS 
functions, while netlib-java uses C CBLAS functions. So one needs a CBLAS 
shared library to use nvblas through netlib-java. Fedora does not have 
cblas (but Debian and 

Re: spark task scheduling delay

2016-01-20 Thread Stephen Boesch
Which Resource Manager are you using?

2016-01-20 21:38 GMT-08:00 Renu Yadav :

> Any suggestions?
>
> On Wed, Jan 20, 2016 at 6:50 PM, Renu Yadav  wrote:
>
>> Hi ,
>>
>> I am facing a task scheduling delay issue in Spark 1.4.
>>
>> Suppose I have 1600 tasks running: 1550 tasks run fine, but for the
>> remaining 50 I am facing a task delay, even though the input size of these
>> tasks is the same as that of the other 1550 tasks.
>>
>> Please suggest some solution.
>>
>> Thanks & Regards
>> Renu Yadav
>>
>
>
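One knob worth ruling out for stragglers like this is speculative execution, which re-launches unusually slow tasks on other executors. This is a possible mitigation, not a confirmed fix for this case; the values below are simply the Spark defaults shown for illustration:

```
# spark-defaults.conf (illustrative; these keys exist in Spark 1.4)
spark.speculation            true
# re-launch a task once it runs this many times slower than the median
spark.speculation.multiplier 1.5
# fraction of tasks that must finish before speculation kicks in
spark.speculation.quantile   0.75
```

If the slow 50 tasks are pinned to particular nodes, it is also worth comparing data locality levels (PROCESS_LOCAL vs RACK_LOCAL/ANY) for those tasks in the Spark UI.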


Re: Removing the Mesos fine-grained mode

2016-01-20 Thread Iulian Dragoș
That'd be great, thanks Adam!

On Tue, Jan 19, 2016 at 5:41 PM, Adam McElwee  wrote:

> Sorry, I never got a chance to circle back with the master logs for this.
> I definitely can't share the job code, since it's used to build a pretty
> core dataset for my company, but let me see if I can pull some logs
> together in the next couple days.
>
> On Tue, Jan 19, 2016 at 10:08 AM, Iulian Dragoș <
> iulian.dra...@typesafe.com> wrote:
>
>> It would be good to get to the bottom of this.
>>
>> Adam, could you share the Spark app that you're using to test this?
>>
>> iulian
>>
>> On Mon, Nov 30, 2015 at 10:10 PM, Timothy Chen  wrote:
>>
>>> Hi Adam,
>>>
>>> Thanks for the graphs and the tests, definitely interested to dig a
>>> bit deeper to find out what could be the cause of this.
>>>
>>> Do you have the spark driver logs for both runs?
>>>
>>> Tim
>>>
>>> On Mon, Nov 30, 2015 at 9:06 AM, Adam McElwee  wrote:
>>> > To eliminate any skepticism around whether cpu is a good performance
>>> > metric for this workload, I did a couple comparison runs of an example
>>> > job to demonstrate a more universal change in performance metrics
>>> > (stage/job time) between coarse and fine-grained mode on mesos.
>>> >
>>> > The workload is identical here - pulling tgz archives from s3, parsing
>>> > json lines from the files and ultimately creating documents to index
>>> > into solr. The tasks are not inserting into solr (just to let you know
>>> > that there's no network side-effect of the map task). The runs are on
>>> > the same exact hardware in ec2 (m2.4xlarge, with 68GB of ram and 45G
>>> > executor memory), exact same jvm and it's not dependent on order of
>>> > running the jobs, meaning I get the same results whether I run the
>>> > coarse first or whether I run the fine-grained first. No other
>>> > frameworks/tasks are running on the mesos cluster during the test. I
>>> > see the same results whether it's a 3-node cluster, or whether it's a
>>> > 200-node cluster.
>>> >
>>> > With the CMS collector in fine-grained mode, the map stage takes
>>> > roughly 2.9h, and coarse-grained mode takes 3.4h. Because both modes
>>> > initially start out performing similarly, the total execution time gap
>>> > widens as the job size grows. To put that another way, the difference
>>> > is much smaller for jobs/stages < 1 hour. When I submit this job for a
>>> > much larger dataset that takes 5+ hours, the difference in total stage
>>> > time moves closer and closer to roughly 20-30% longer execution time.
>>> >
>>> > With the G1 collector in fine-grained mode, the map stage takes
>>> > roughly 2.2h, and coarse-grained mode takes 2.7h. Again, the fine and
>>> > coarse-grained execution tests are on the exact same machines, exact
>>> > same dataset, and only changing spark.mesos.coarse to true/false.
>>> >
>>> > Let me know if there's anything else I can provide here.
>>> >
>>> > Thanks,
>>> > -Adam
>>> >
>>> >
>>> > On Mon, Nov 23, 2015 at 11:27 AM, Adam McElwee 
>>> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Nov 23, 2015 at 7:36 AM, Iulian Dragoș
>>> >>  wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee 
>>> wrote:
>>> 
>>>  I've used fine-grained mode on our mesos spark clusters until this
>>>  week, mostly because it was the default. I started trying
>>>  coarse-grained because of the recent chatter on the mailing list about
>>>  wanting to move the mesos execution path to coarse-grained only. The
>>>  odd thing is, coarse-grained vs fine-grained seems to yield
>>>  drastically different cluster utilization metrics for any of our jobs
>>>  that I've tried out this week.
>>> 
>>>  If this is best as a new thread, please let me know, and I'll try not
>>>  to derail this conversation. Otherwise, details below:
>>> >>>
>>> >>>
>>> >>> I think it's ok to discuss it here.
>>> >>>
>>> 
>>>  We monitor our spark clusters with ganglia, and historically, we
>>>  maintain at least 90% cpu utilization across the cluster. Making a
>>>  single configuration change to use coarse-grained execution instead of
>>>  fine-grained consistently yields a cpu utilization pattern that starts
>>>  around 90% at the beginning of the job, and then slowly decreases over
>>>  the next 1-1.5 hours to level out around 65% cpu utilization on the
>>>  cluster. Does anyone have a clue why I'd be seeing such a negative
>>>  effect of switching to coarse-grained mode? GC activity is comparable
>>>  in both cases. I've tried 1.5.2, as well as the 1.6.0 preview tag
>>>  that's on github.
>>> >>>
>>> >>>
>>> >>> I'm not very familiar with Ganglia, and how it computes
>>> >>> utilization. But one thing comes to mind: did you enable dynamic
>>> >>> allocation on
>>> 
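For reference, the A/B toggle used in Adam's tests, together with the dynamic-allocation setting Iulian asks about, boils down to a fragment like this (illustrative spark-defaults.conf, not taken from the actual test setup):

```
# spark-defaults.conf -- the single setting flipped between the two runs
spark.mesos.coarse              true
# (false selects fine-grained mode)

# Dynamic allocation can idle executors down in coarse-grained mode,
# which would also show up as lower cluster-wide CPU utilization, so it
# is worth confirming whether it was enabled during the comparison.
spark.dynamicAllocation.enabled false
```

Everything else (hardware, JVM, dataset, GC collector per pair of runs) was held constant in the tests described above.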

Re: BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Marcelo Vanzin
On Wed, Jan 20, 2016 at 11:46 AM, Jacek Laskowski  wrote:
> /Users/jacek/dev/oss/spark/tags/target/scala-2.11/classes...
> [error] Cannot run program "javac": error=2, No such file or directory

That doesn't exactly look like a Spark problem.

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Jacek Laskowski
Hi,

The build is broken again for me :( I build for Scala 2.11.7 and use
Maven. Is this a known issue? Is anyone looking into it?

➜  spark git:(master) ✗ ./build/mvn -Pyarn -Phadoop-2.6
-Dhadoop.version=2.7.1 -Dscala-2.11 -Phive -Phive-thriftserver
-DskipTests clean install
...
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @
spark-test-tags_2.11 ---
[INFO] Using zinc server for incremental compilation
[warn] Pruning sources from previous analysis, due to incompatible CompileSetup.
[info] Compiling 3 Java sources to
/Users/jacek/dev/oss/spark/tags/target/scala-2.11/classes...
[error] Cannot run program "javac": error=2, No such file or directory
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [  2.843 s]
[INFO] Spark Project Test Tags  FAILURE [  0.321 s]


Pozdrawiam,
Jacek

Jacek Laskowski | https://medium.com/@jaceklaskowski/
Mastering Apache Spark
==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Jacek Laskowski
On Wed, Jan 20, 2016 at 8:48 PM, Marcelo Vanzin  wrote:
> On Wed, Jan 20, 2016 at 11:46 AM, Jacek Laskowski  wrote:
>> /Users/jacek/dev/oss/spark/tags/target/scala-2.11/classes...
>> [error] Cannot run program "javac": error=2, No such file or directory
>
> That doesn't exactly look like a Spark problem.

You're right; moreover, I just today upgraded Java 8 to the latest
release. Kafka compiles fine (they use Gradle, though). Other apps
build fine too. Just a friendly heads-up.

Jacek

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Jacek Laskowski
On Wed, Jan 20, 2016 at 8:48 PM, Marcelo Vanzin  wrote:
> On Wed, Jan 20, 2016 at 11:46 AM, Jacek Laskowski  wrote:
>> /Users/jacek/dev/oss/spark/tags/target/scala-2.11/classes...
>> [error] Cannot run program "javac": error=2, No such file or directory
>
> That doesn't exactly look like a Spark problem.

It *was* a Spark problem. The issue was that zinc was up while I
upgraded the JDK, and eventually it couldn't find the proper binaries. When I
killed com.typesafe.zinc.Nailgun the build went fine.

I remember seeing this issue reported in the past, and just when I was
completely hopeless about figuring it out without rebooting the machine, the
idea that zinc might be "misconfigured" came to me! `jps -lm` to the rescue!

Sorry for the noise.

Jacek

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Sean Owen
That's not a Spark problem. Your compiler was not available.

On Wed, Jan 20, 2016 at 10:44 PM, Jacek Laskowski  wrote:
> On Wed, Jan 20, 2016 at 8:48 PM, Marcelo Vanzin  wrote:
>> On Wed, Jan 20, 2016 at 11:46 AM, Jacek Laskowski  wrote:
>>> /Users/jacek/dev/oss/spark/tags/target/scala-2.11/classes...
>>> [error] Cannot run program "javac": error=2, No such file or directory
>>
>> That doesn't exactly look like a Spark problem.
>
> It *was* a Spark problem. The issue was that zinc was up while I
> upgraded JDK and eventually it couldn't find proper binaries. When I
> killed com.typesafe.zinc.Nailgun the build went fine.
>
> I remember I saw the issue reported in the past and when I was
> completely hopeless to figure it out without rebooting the machine the
> idea of zinc being "misconfigured" came! `jps -lm` to the rescue!
>
> Sorry for the noise.
>
> Jacek
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Optimized toIndexedRowMatrix

2016-01-20 Thread Driesprong, Fokko
Hi guys,

I've been working on an optimized implementation of the toIndexedRowMatrix
method of BlockMatrix. I have already created a ticket and submitted a pull
request on GitHub. What has to be done to get this accepted? All the tests
are passing.

On my own GitHub I created a project to see how performance is affected: for
dense matrices this is a speedup of almost 19 times. For sparse matrices it
will most likely be faster as well, since the current implementation requires
a lot of shuffling and creates high volumes of intermediate objects (unless
the matrix is super sparse, but then a BlockMatrix would not be a good fit
either).

I would appreciate suggestions or tips to get this accepted.

Cheers, Fokko Driesprong.
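For readers unfamiliar with the conversion, here is a minimal single-machine sketch of what toIndexedRowMatrix does: each global row is assembled from the row slices of the blocks in its block-row. This is plain Python with made-up helper names, not the actual Spark implementation, which does this distributed over an RDD of blocks:

```python
def blocks_to_indexed_rows(blocks, rows_per_block, cols_per_block, num_block_cols):
    """blocks: {(block_row, block_col): 2D list of values}.
    Returns {global_row_index: full row as a list}."""
    rows = {}
    for (brow, bcol), block in blocks.items():
        for i, slice_ in enumerate(block):
            global_row = brow * rows_per_block + i
            # Lazily create the full-width row, then paste this block's slice in.
            row = rows.setdefault(global_row, [0.0] * (num_block_cols * cols_per_block))
            start = bcol * cols_per_block
            row[start:start + len(slice_)] = slice_
    return rows

# 2x2 grid of 1x2 blocks forming a 2x4 matrix.
blocks = {
    (0, 0): [[1.0, 2.0]], (0, 1): [[3.0, 4.0]],
    (1, 0): [[5.0, 6.0]], (1, 1): [[7.0, 8.0]],
}
rows = blocks_to_indexed_rows(blocks, 1, 2, 2)
print(rows[0])  # [1.0, 2.0, 3.0, 4.0]
print(rows[1])  # [5.0, 6.0, 7.0, 8.0]
```

The shuffling cost discussed above comes from the distributed analogue of this grouping: every block in a block-row has to be co-located before its row slices can be concatenated.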