This looks pretty comprehensive to me. A few quick suggestions:

- On the VM part: we've actually been avoiding this in all the Databricks 
training efforts because the VM itself can be annoying to install, and it makes 
it harder for people to really use Spark for development (they can learn it, 
but to do real development they'll probably want to install it in their own 
OS). We found that things work fine with just a binary distribution of Spark, 
as long as people install Java first. So it might be worth a try.

- You may want to add a section on general debugging and tuning of JVM 
programs. For example, using jstack to take a stack trace, using common 
profiling tools (e.g. jvisualvm), and maybe understanding the memory 
consumption of various data structures.

- You may want an early example that's an iterative algorithm, e.g. 
PageRank.
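
To make the iterative-algorithm suggestion concrete, here is a minimal sketch of PageRank in plain Python. It is not Spark code and is not taken from any of the training material mentioned in this thread; the function name `pagerank`, the toy graph `example_links`, the damping factor of 0.85, and the handling of pages without outgoing links are all assumptions chosen for illustration. The point is only the shape of the loop: every pass recomputes all ranks from the previous pass's ranks, which is the kind of repeated pass over the same dataset an iterative example would show.

```python
def pagerank(links, iterations=20, damping=0.85):
    """Return a dict of page -> rank.

    links maps each page to the list of pages it links to.
    All names and defaults here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        contribs = {page: 0.0 for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = ranks[page] / len(targets)
                for target in targets:
                    contribs[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages, so the total stays at 1.
                for target in pages:
                    contribs[target] += ranks[page] / n
        # New ranks depend only on the ranks from the previous pass.
        ranks = {
            page: (1.0 - damping) / n + damping * contribs[page]
            for page in pages
        }
    return ranks


# Hypothetical three-page graph: a -> b, a -> c, b -> c, c -> a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_ranks = pagerank(example_links, iterations=50)
```

In a Spark version the per-page loop would typically become transformations over a distributed collection of links and ranks, with the link structure kept in memory across iterations; the sketch above only shows the algorithm itself.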

In case you haven't seen it, we've made a ton of educational resources 
available for free from Databricks; the latest ones are the Spark Summit 2014 
training sessions (http://spark-summit.org/2014/training), which include both a 
basic track and an advanced track.

Matei

On August 4, 2014 at 9:40:12 AM, Victor Tso-Guillen (v...@paxata.com) wrote:

I made a few small comments. Still a relative newbie, but hope it helps!


On Mon, Aug 4, 2014 at 9:08 AM, Jörn Franke <jornfra...@gmail.com> wrote:
Hi Chris,

I am currently working out a university course on BI, NoSQL (key/value, 
columnar, graph, document, search), and big data (lambda architecture, Hadoop, 
Spark).
Your work looks quite ambitious.
You could also elaborate on how you integrate different data sources with 
the Spark cluster (Kafka, JDBC, etc.) and how to use Spark Streaming. 
Additionally, you could include dependency management between different 
software components as well as capacity management. Last but not least, you 
could elaborate on the software delivery pipeline from the development/CI 
cluster through the acceptance cluster to the production cluster.

All the best

On 4 August 2014 14:25, "Chris London" <m...@chrislondon.co> wrote:

Hey Everyone,

I'm thinking of creating an instructional video training course for Spark. I 
don't know whether I'll actually publish it; my goal is to become intimately 
familiar with Spark by creating the course. If you have a second, could you 
look over my outline and give me any feedback?

Are any sections too long? Too short? Each lecture is planned to be about 10 
minutes, so maybe one 10-minute lecture on unit testing isn't enough?

Thanks in advance for your time:
https://docs.google.com/a/chrislondon.co/document/d/124x-MXWjNbT6qL7cDeyuIwC6gShPF6GlALxvdRpAMMk/edit#

Chris London
