Documentation: HDFS Adapter

Add Spark HDFS Adapter documentation.
Signed-off-by: Jonas Pfefferle <peppe...@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/incubator-crail/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-crail/commit/f1dcb0d2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-crail/tree/f1dcb0d2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-crail/diff/f1dcb0d2

Branch: refs/heads/master
Commit: f1dcb0d20b6b492861e32bb3a919217cf17a98ac
Parents: 0e536ca
Author: Jonas Pfefferle <peppe...@apache.org>
Authored: Wed Aug 15 10:45:54 2018 +0200
Committer: Jonas Pfefferle <peppe...@apache.org>
Committed: Thu Sep 6 12:59:41 2018 +0200

----------------------------------------------------------------------
 doc/source/spark.rst | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-crail/blob/f1dcb0d2/doc/source/spark.rst
----------------------------------------------------------------------
diff --git a/doc/source/spark.rst b/doc/source/spark.rst
index 3f222ad..b999ed8 100644
--- a/doc/source/spark.rst
+++ b/doc/source/spark.rst
@@ -1,9 +1,40 @@
 Spark
 =====
 
+Crail can be used to increase performance or enhance flexibility in
+`Apache Spark <https://spark.apache.org/>`_. We provide multiple plugins to allow
+Crail to be used as:
+
+* :ref:`HDFS Adapter`: input and output
+* :ref:`Spark-IO`: shuffle data and broadcast store
+
+HDFS Adapter
+------------
+
+The Crail HDFS adapter is provided with every Crail :ref:`deployment <Deploy Crail>`.
+The HDFS adapter allows to replace every HDFS path with a path on Crail.
+However for it to be used for input and output in Spark the jar file paths
+have to be added to the Spark configuration spark-defaults.conf:
+
+.. code-block:: bash
+
+   spark.driver.extraClassPath $CRAIL_HOME/jars/*
+   spark.executor.extraClassPath $CRAIL_HOME/jars/*
+
+Data in Crail can be accessed by prepending the value of :code:`crail.namenode.address`
+from :ref:`crail-site.conf` to any HDFS path. For example :code:`crail://localhost:9060/test`
+accesses :code:`/test` in Crail.
+Note that Crail works independent of HDFS and does not interact with HDFS in
+any way. However Crail does not completely replace HDFS since we do not offer
+durability and fault tolerance cf. :ref:`Introduction`.
+A good fit for Crail is for example inter-job data that can be recomputed
+from the original data in HDFS.
+
 Spark-IO
 --------
 
 
+
+
 Crail-TeraSort
 --------------
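
The added documentation says that data in Crail is addressed by prepending the value of crail.namenode.address to an HDFS path. A minimal sketch of that rule follows, using the example address and path from the documentation. The helper name to_crail_uri is hypothetical and is not part of Crail or Spark, and the sketch assumes the address has the form crail://host:port and the path is absolute.

```python
def to_crail_uri(namenode_address: str, hdfs_path: str) -> str:
    """Prepend the Crail namenode address to an absolute HDFS-style path.

    Hypothetical helper for illustration only; it is not a Crail or Spark API.
    """
    # Assumption: crail.namenode.address looks like crail://host:port.
    if not namenode_address.startswith("crail://"):
        raise ValueError("expected an address of the form crail://host:port")
    # Assumption: the path is absolute, e.g. /test.
    if not hdfs_path.startswith("/"):
        raise ValueError("expected an absolute path such as /test")
    # Drop a trailing slash on the address so the result has exactly one separator.
    return namenode_address.rstrip("/") + hdfs_path


if __name__ == "__main__":
    # Values taken from the example in the documentation text.
    print(to_crail_uri("crail://localhost:9060", "/test"))
```

A string built this way would be passed to Spark wherever an hdfs:// path is normally given, provided the Crail jars are on the driver and executor class paths as described in the diff.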