Hi Andrew,

I suggest narrowing the scope of your performance testing: use the same
setup throughout and make incremental changes, varying one parameter at a
time while keeping everything else the same.

Spark itself can run in local, standalone, YARN client and YARN cluster
modes, so you really need to target one particular run mode and one
particular type of application, such as SQL or streaming.

Then, for example, increase the memory while keeping the number of cores the
same, and so on.
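
As a rough illustration of that kind of one-variable sweep, here is a minimal Python sketch. It only builds and prints spark-submit command lines in which the executor memory changes and the cores, master and deploy mode stay fixed. The application name my_app.py, the memory steps and the helper names are assumptions made up for the example, not anything from this thread.

```python
import subprocess
import time


def build_submit_command(app, master, deploy_mode, executor_memory, executor_cores):
    """Return the spark-submit argument list for a single test run."""
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app,
    ]


def sweep_memory(app, master="yarn", deploy_mode="client", cores=2,
                 memory_steps=("2g", "4g", "8g")):
    """Build one command per memory setting; every other setting stays fixed."""
    return [
        build_submit_command(app, master, deploy_mode, mem, cores)
        for mem in memory_steps
    ]


def run_and_time(command):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(command, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # "my_app.py" is a hypothetical application used only for illustration.
    for cmd in sweep_memory("my_app.py"):
        print(" ".join(cmd))
        # On a machine with Spark installed, each run could be timed with:
        # elapsed = run_and_time(cmd)
```

Recording the elapsed time for each command in a file would replace the hand-maintained spreadsheet, and the same loop can be reused to vary cores while holding memory fixed.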

For test data, you can generate your own using Linux shell scripts or
similar. I would say the tests will be more meaningful that way.
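
The same idea can be sketched in Python instead of a shell script. The example below generates a small, reproducible CSV data set from a fixed random seed, so every test run sees identical input. The column names, value ranges and function names are assumptions chosen for illustration.

```python
import csv
import io
import random


def generate_rows(n_rows, seed=42):
    """Yield (id, category, amount) tuples; the same seed gives the same rows."""
    rng = random.Random(seed)
    for i in range(n_rows):
        category = rng.choice(["A", "B", "C"])
        amount = round(rng.uniform(0, 1000), 2)
        yield (i, category, amount)


def write_csv(out, n_rows, seed=42):
    """Write a header line followed by n_rows generated rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["id", "category", "amount"])
    writer.writerows(generate_rows(n_rows, seed))


if __name__ == "__main__":
    # Write to an in-memory buffer here; pass an open file to create a real data set.
    buffer = io.StringIO()
    write_csv(buffer, 5)
    print(buffer.getvalue())
```

Changing only n_rows scales the data volume while keeping its shape the same, which fits the approach of altering one thing per test.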

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 9 July 2016 at 05:28, Andrew Ehrlich <and...@aehrlich.com> wrote:

> Yea, I'm looking for any personal experiences people have had with tools
> like these.
>
> On Jul 8, 2016, at 8:57 PM, charles li <charles.up...@gmail.com> wrote:
>
> Hi Andrew, a Google search for "*spark performance test*" turns up
> plenty of material:
>
>
>    - https://github.com/databricks/spark-perf
>    - https://spark-summit.org/2014/wp-content/uploads/2014/06/Testing-Spark-Best-Practices-Anupama-Shetty-Neil-Marshall.pdf
>    - http://people.cs.vt.edu/~butta/docs/tpctc2015-sparkbench.pdf
>
>
>
> On Sat, Jul 9, 2016 at 11:40 AM, Andrew Ehrlich <and...@aehrlich.com>
> wrote:
>
>> Hi group,
>>
>> What solutions are people using to do performance testing and tuning of
>> spark applications? I have been doing a pretty manual technique where I lay
>> out an Excel sheet of various memory settings and caching parameters and
>> then execute each one by hand. It’s pretty tedious though, so I’m wondering
>> what others do, and if you do performance testing at all.  Also, is anyone
>> generating test data, or just operating on a static set? Is regression
>> testing for performance a thing?
>>
>> Andrew
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> *___________________*
> Quant | Engineer | Boy
> *___________________*
> *blog*:    http://litaotao.github.io
> *github*: www.github.com/litaotao
>
>
