Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Rohit Karlupia
pe that helps. thanks, rohitk On Tue, Mar 27, 2018 at 9:20 AM, Fawze Abujaber wrote: > Thanks for the update. > > What about cores per executor? > > On Tue, 27 Mar 2018 at 6:45 Rohit Karlupia wrote: > >> Thanks Fawze! >> >> On the memory front, I

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Rohit Karlupia
m spark job with 1 exec and 3 > cores and for sure the same compare with different exec memory. > > Overall, it is such a good starting point, but it will be a GAME CHANGER > getting these metrics on the tool. > > @Rohit , Huge THANK YOU > > On Mon, Mar 26, 2018 at 1:35 PM, Roh

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-26 Thread Rohit Karlupia
est of the output. > > Thanks in advance, > Shmuel > > On Sun, Mar 25, 2018 at 12:46 PM, Rohit Karlupia > wrote: > >> Thanks Shamuel for trying out sparklens! >> >> Couple of things that I noticed: >> 1) 250 executors is probably overkill for this job

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Rohit Karlupia
> wrote: >>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> Thanks for sharing this great tool. >>>>>> I tried running a spark job with the tool, but it failed with an >>>>>> *IncompatibleClassChangeError >

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Rohit Karlupia
ng! I look forward to digging through it this weekend. >> >> On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) < >> ravishankar.n...@gmail.com> wrote: >> >>> Excellent. You filled a missing link. >>> >>> Best, >>> Passion >

Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Rohit Karlupia
Hi, Happy to announce the availability of Sparklens as an open source project. It helps in understanding the scalability limits of Spark applications and can be a useful guide on the path towards tuning applications for lower runtime or cost. Please clone from here: https://github.com/qubole/sparkl

Spark Tuning Tool

2018-01-22 Thread Rohit Karlupia
for some interest in the community if people find this work interesting and would like to try it out. thanks, Rohit Karlupia

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Rohit Karlupia
Here is the list that I will probably try to fill: 1. Check GC on the offending executor when the task is running. Maybe you need even more memory. 2. Go back to some previous successful run of the job and check the Spark UI for the offending stage and check max task time/max input/ma

Re: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

2017-05-07 Thread Rohit Karlupia
Last time I checked, this happens only with Spark < 2.0.0. The reason is that ServiceLoader is used to load all FileSystem implementations from the classpath. In Spark < 2.0.0, tachyon.hadoop.TFS was packaged with the Spark distribution and gets loaded whether or not it is used. Moving to Spark 2.0.0+ will

Re: Setting Optimal Number of Spark Executor Instances

2017-03-15 Thread Rohit Karlupia
The number of tasks is very likely not the reason for getting timeouts. A few things to look for: What is actually timing out? What kind of operation? Writing/reading to HDFS (NameNode or DataNode), or fetching shuffle data (External Shuffle Service or not), or the driver not being able to talk to an executor. Tri

Re: spark sql jobs heap memory

2016-11-24 Thread Rohit Karlupia
Datasets/DataFrames will use direct/raw/off-heap memory in the most efficient columnar fashion. Trying to fit the same amount of data in heap memory would likely increase your memory requirement and decrease the speed. So, in short, don't worry about it and increase the overhead. You can also set a bou