incubator-toree-website git commit: Added 'How it works' content

lbustelo Mon, 13 Jun 2016 13:27:12 -0700

Repository: incubator-toree-website
Updated Branches:
  refs/heads/OverhaulSite 9b329ef1f -> e7ff553dd



Added 'How it works' content


Project: http://git-wip-us.apache.org/repos/asf/incubator-toree-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-toree-website/commit/e7ff553d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-toree-website/tree/e7ff553d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-toree-website/diff/e7ff553d

Branch: refs/heads/OverhaulSite
Commit: e7ff553ddc9cabf263db3d18eb4ce241e9937233
Parents: 9b329ef
Author: Gino Bustelo <lbust...@us.ibm.com>
Authored: Mon Jun 13 15:21:11 2016 -0500
Committer: Gino Bustelo <lbust...@us.ibm.com>
Committed: Mon Jun 13 15:23:31 2016 -0500

----------------------------------------------------------------------
 assets/images/batch_mode.png          | Bin 0 -> 61060 bytes
 assets/images/interactive_mode.png    | Bin 0 -> 65268 bytes
 assets/images/toree_spark_gateway.png | Bin 0 -> 74504 bytes
 assets/images/toree_with_notebook.png | Bin 0 -> 52906 bytes
 documentation/user/how-it-works.md    |  60 +++++++++++++++++++++++++++--
 5 files changed, 57 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-toree-website/blob/e7ff553d/assets/images/batch_mode.png
----------------------------------------------------------------------
diff --git a/assets/images/batch_mode.png b/assets/images/batch_mode.png
new file mode 100644
index 0000000..18082f3
Binary files /dev/null and b/assets/images/batch_mode.png differ

http://git-wip-us.apache.org/repos/asf/incubator-toree-website/blob/e7ff553d/assets/images/interactive_mode.png
----------------------------------------------------------------------
diff --git a/assets/images/interactive_mode.png b/assets/images/interactive_mode.png
new file mode 100644
index 0000000..55abbc2
Binary files /dev/null and b/assets/images/interactive_mode.png differ

http://git-wip-us.apache.org/repos/asf/incubator-toree-website/blob/e7ff553d/assets/images/toree_spark_gateway.png
----------------------------------------------------------------------
diff --git a/assets/images/toree_spark_gateway.png b/assets/images/toree_spark_gateway.png
new file mode 100644
index 0000000..a18daa0
Binary files /dev/null and b/assets/images/toree_spark_gateway.png differ

http://git-wip-us.apache.org/repos/asf/incubator-toree-website/blob/e7ff553d/assets/images/toree_with_notebook.png
----------------------------------------------------------------------
diff --git a/assets/images/toree_with_notebook.png b/assets/images/toree_with_notebook.png
new file mode 100644
index 0000000..873142c
Binary files /dev/null and b/assets/images/toree_with_notebook.png differ

http://git-wip-us.apache.org/repos/asf/incubator-toree-website/blob/e7ff553d/documentation/user/how-it-works.md
----------------------------------------------------------------------
diff --git a/documentation/user/how-it-works.md b/documentation/user/how-it-works.md
index 86c6472..3aa999a 100644
--- a/documentation/user/how-it-works.md
+++ b/documentation/user/how-it-works.md
@@ -9,7 +9,61 @@ tagline: Apache Project !
 
 {% include JB/setup %}
 
-- Architecture in relation to Jupyter and Spark
-- Links to Jupyter kernel spec
-- Links to keynotes and presentations
+# How it works
+
+Toree provides an interactive programming interface to a Spark Cluster. Its API takes in `code` in a variety of
+languages and executes it. The `code` can perform Spark tasks using the provided Spark Context.
+
+To further understand how Toree works, it is worth exploring the role that it plays in several usage scenarios.
+
+### As a Kernel to Jupyter Notebooks
+
+Toree's primary role is as a [Jupyter](http://jupyter.org/) Kernel. It was originally created to add full Spark API
+support to a Jupyter Notebook using the Scala language. It has since grown to also support Python and R. The diagram
+below shows Toree in relation to a running Jupyter Notebook.
+
+![Toree with Jupyter Notebook](/assets/images/toree_with_notebook.png)
+
+When the user creates a new Notebook and selects Toree, the Notebook server launches a new Toree process that is
+configured to connect to a Spark cluster. Once in the Notebook, the user can interact with Spark by writing code that
+uses the managed Spark Context instance.
+
+The Notebook server and Toree communicate using the [Jupyter Kernel Protocol](https://ipython.org/ipython-doc/3/development/messaging.html).
+This is a [0MQ](http://zeromq.org/) based protocol that is language agnostic and allows for bidirectional communication
+between the client and the kernel (i.e. Toree). This protocol is the __ONLY__ network interface for communicating with a
+Toree process.
+
+When using Toree within a Jupyter Notebook, these technical details can be ignored, but they are very relevant when
+building custom clients. Several options are discussed in the next section.
+
+### As an Interactive Gateway to Spark
+
+One way of using Spark is what is commonly referred to as 'Batch' mode. Much like other Big Data systems, such as
+Hadoop, this mode has the user create a program that is submitted to the cluster. This program runs tasks in the
+cluster and ultimately writes data to some persistent store (e.g. HDFS or a NoSQL store). Spark provides `Batch` mode
+support through [Spark Submit](http://spark.apache.org/docs/latest/submitting-applications.html).
+
+![Toree Gateway to Spark](/assets/images/batch_mode.png)
+
+This mode of using Spark, although valid, suffers from considerable friction. For example, packaging and submitting jobs, as
+well as reading from and writing to storage, tend to introduce unwanted latencies. Spark alleviates some of this
+friction by relying on memory to hold data along with the concept of a SparkContext as a way to tie jobs together. What
+is missing from Spark is a way for applications to interact with a long-lived SparkContext.
+
+![Toree Gateway to Spark](/assets/images/interactive_mode.png)
+
+Toree provides this through a communication channel between an application and a SparkContext that allows access to the
+entire Spark API. Through this channel, the application interacts with Spark by exchanging code and data.
+
+The Jupyter Notebook is a good example of an application that relies on the presence of these interactive channels and
+uses Toree to access Spark. Other Spark-enabled applications can be built that directly connect to Toree through the
+`0MQ` protocol, but there are also other ways.
+
+![Toree Gateway to Spark](/assets/images/toree_spark_gateway.png)
+
+As shown above, the [Jupyter Kernel Gateway](https://github.com/jupyter/kernel_gateway) can be used to expose a
+WebSocket-based protocol to Toree. This makes Toree easier to integrate. In combination with the
+[jupyter-js-services](https://github.com/jupyter/jupyter-js-services) library, other web applications can access Spark
+interactively. The [Jupyter Dashboard Server](https://github.com/jupyter-incubator/dashboards_server) is an example of
+a web application that uses Toree as the backend for dynamic dashboards.
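
For readers building a custom client, the 0MQ-based Jupyter Kernel Protocol described in the page above can be illustrated with a small sketch. The Python below is not part of this commit or of Toree. It only assembles and signs the frames of one `execute_request` message, following the layout given in the Jupyter messaging documentation, and uses nothing beyond the standard library. The function names, the `example-client` username, the session id, and the key are assumptions made for illustration; a real client would read the key and ports from the kernel's connection file and send the frames over a 0MQ socket.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

# Frame that separates routing identities from the signed message parts.
DELIMITER = b"<IDS|MSG>"


def sign(parts, key):
    """HMAC-SHA256 over the four serialized dicts, returned as hex bytes."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")


def build_execute_request(code, session_id, key):
    """Return the frames of one signed execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": "example-client",  # hypothetical client name
        "msg_type": "execute_request",
        "version": "5.0",
        "date": datetime.now(timezone.utc).isoformat(),
    }
    content = {
        "code": code,  # source text the kernel is asked to run
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    }
    # Signed order: header, parent_header, metadata, content.
    parts = [json.dumps(p).encode("utf-8") for p in (header, {}, {}, content)]
    return [DELIMITER, sign(parts, key)] + parts


def verify(frames, key):
    """Recompute the signature of received frames and compare it safely."""
    idx = frames.index(DELIMITER)
    signature = frames[idx + 1]
    parts = frames[idx + 2 : idx + 6]
    return hmac.compare_digest(signature, sign(parts, key))


if __name__ == "__main__":
    # Assumed values: a made-up session id and key, and a Scala snippet as code.
    frames = build_execute_request("sc.parallelize(1 to 10).sum()", "session-1", b"secret")
    print(verify(frames, b"secret"))
```

The sketch stops short of opening a socket, so it can be run without a kernel; it is meant only to show what "exchanging code" over the protocol looks like at the message level.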
