Let me be more specific:
With GC/CPU-aware task scheduling, the user doesn't have to worry about
specifying cores carefully. Even if the user always specifies cores = 100 or
1024 for every executor, they will still not hit OOM (in the vast majority of
cases). Internally, the scheduler will vary the number o
Thanks for the update.
What about cores per executor?
On Tue, 27 Mar 2018 at 6:45 Rohit Karlupia wrote:
Thanks Fawze!
On the memory front, I am currently working on GC- and CPU-aware task
scheduling. I am seeing wonderful results in my tests so far. Once the
feature is complete and available, Spark will work with whatever memory is
provided (at least enough for the largest possible task). It will al
Hi Rohit,
I would like to thank you for the unlimited patience and support that you
are providing here and behind the scene for all of us.
The tool is amazing and easy to use, and most of the metrics are easy to
understand ...
Thinking about whether we need to run it in cluster mode and all the time,
I think we can ski
Hi Shmuel,
In general it is hard to pinpoint the exact code responsible for a
specific stage. For example, when using Spark SQL, depending upon the kinds
of joins and aggregations used in a single line of a query, we will have
multiple stages in the Spark application. I usually try to spli
Hi Rohit,
Thanks for the analysis.
I can use repartition on the slow task. But how can I tell what part of the
code is in charge of the slow tasks?
It would be great if you could further explain the rest of the output.
Thanks in advance,
Shmuel
On Sun, Mar 25, 2018 at 12:46 PM, Rohit Karlupia
Thanks Shamuel for trying out sparklens!
Couple of things that I noticed:
1) 250 executors is probably overkill for this job. It would run in the same
time with around 100.
2) Many of the stages that take a long time have only 200 tasks, whereas we
have 750 cores available for the job. 200 is the default va
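For reference, 200 is the default value of spark.sql.shuffle.partitions, which caps the task count of Spark SQL shuffle stages. A sketch of raising it to match the available cores (750 here simply mirrors the core count mentioned in this thread, not a tuned value), either globally or per job:

```
# spark-defaults.conf (sketch)
spark.sql.shuffle.partitions  750
```

The same setting can be passed per job with `--conf spark.sql.shuffle.partitions=750` on spark-submit.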
I ran it on a single job.
SparkLens adds overhead to the job duration. I'm not ready to enable it
by default on all our jobs.
Attached is the output.
Still trying to understand what exactly it means.
On Sun, Mar 25, 2018 at 10:40 AM, Fawze Abujaber wrote:
Nice!
Shmuel, Were you able to run on a cluster level or for a specific job?
Did you configure it on the spark-default.conf?
On Sun, 25 Mar 2018 at 10:34 Shmuel Blitz
wrote:
Just to let you know, I have managed to run SparkLens on our cluster.
I switched to the spark_1.6 branch, and also compiled against the specific
image of Spark we are using (cdh5.7.6).
Now I need to figure out what the output means... :P
Shmuel
On Fri, Mar 23, 2018 at 7:24 PM, Fawze Abujaber w
Quick question:
How do I add --jars /path/to/sparklens_2.11-0.1.0.jar to the
spark-defaults conf? Should it be
spark.driver.extraClassPath /path/to/sparklens_2.11-0.1.0.jar, or should I
use the spark.jars option? Can anyone give an example of how it should be,
and if i the path for the jar
Hi Shmuel,
Did you compile the code against the right branch for Spark 1.6?
I tested it and it seems to be working, and I'm now running wider tests on
the branch. Please use the branch for Spark 1.6.
On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz
wrote:
Hi Rohit,
Thanks for sharing this great tool.
I tried running a Spark job with the tool, but it failed with an
*IncompatibleClassChangeError* exception.
I have opened an issue on GitHub
(https://github.com/qubole/sparklens/issues/1).
Shmuel
On Thu, Mar 22, 2018 at 5:05 PM, Shmuel Blitz
wrote:
Thanks.
We will give this a try and report back.
Shmuel
On Thu, Mar 22, 2018 at 4:22 PM, Rohit Karlupia wrote:
Thanks everyone!
Please share how it works and how it doesn't. Both help.
Fawze, I just made a few changes to make this work with Spark 1.6. Can you
please try building from branch *spark_1.6*.
thanks,
rohitk
On Thu, Mar 22, 2018 at 10:18 AM, Fawze Abujaber wrote:
It's super amazing. I see it was tested on Spark 2.0.0 and above; what
about Spark 1.6, which is still part of Cloudera's main versions?
We have a vast number of Spark applications on version 1.6.0.
On Thu, Mar 22, 2018 at 6:38 AM, Holden Karau wrote:
Super exciting! I look forward to digging through it this weekend.
On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) <
ravishankar.n...@gmail.com> wrote:
Excellent. You filled a missing link.
Best,
Passion
On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia wrote:
Hi,
Happy to announce the availability of Sparklens as an open source project.
It helps in understanding the scalability limits of Spark applications and
can be a useful guide on the path towards tuning applications for lower
runtime or cost.
Please clone from here: https://github.com/qubole/sparklens
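For anyone who wants to try it out, a minimal spark-submit invocation might look like the following (the jar path and the application class are placeholders; the listener class follows the project README):

```shell
# Sketch: attach Sparklens to an existing job via its Spark listener
spark-submit \
  --jars /path/to/sparklens_2.11-0.1.0.jar \
  --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
  --class com.example.YourApp your-app.jar
```

With the listener attached, Sparklens prints its analysis to the driver log when the application finishes.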