Please take a look at the Windows setup instructions that we created a
while back for our KDD tutorial. You may need to download a newer version
of winutils from https://github.com/steveloughran/winutils, and don't
forget to set the permissions on the Hive directory with winutils chmod.
The attached instructions are from 2017, so you will need to adjust the
versions accordingly.
From: Janardhan
To: Niketan Pansare , dev@systemml.apache.org,
Matthias Boehm
Date: 03/05/2019 07:54 PM
Subject: Hadoop is not working in dev environment on windows [since
2.7.7 update]. Thanks.
Hi,
Since the 2.7.7 update, my Hadoop and winutils setup [a prebuilt winutils.exe
is not available] has not been working because of file permissions.
As a workaround, I have changed the Hadoop source locally to bypass the access
check.
But is there any way to run the tests without Hadoop? spark-submit
is working fine for me.
Thanks,
Janardhan
1. Java
=======
The Java version should be at least 1.8 (Java 8).
> java -version
Set the JAVA_HOME environment variable and add %JAVA_HOME%\bin to the
PATH environment variable.
> dir "%JAVA_HOME%"
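The version check above can also be scripted. The following is only a minimal sketch, not part of the original instructions: the function names are made up for illustration, and it assumes the usual `version "..."` line that `java -version` prints.

```python
import re


def java_feature_version(version_output):
    """Return the Java feature release found in `java -version` output.

    "1.8.0_181" yields 8 and "11.0.2" yields 11. Returns None when no
    version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else 0
    # Releases before Java 9 report themselves as 1.x (1.8 means Java 8).
    return minor if major == 1 else major


def java_is_new_enough(version_output, minimum=8):
    """True when the reported feature release is at least `minimum`."""
    feature = java_feature_version(version_output)
    return feature is not None and feature >= minimum
```

To feed it the real output, capture the text with the subprocess module, keeping in mind that java -version writes to standard error rather than standard output.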
2. Spark
========
Download and extract Spark from https://spark.apache.org/downloads.html,
> tar -xzf spark-2.1.0-bin-hadoop2.7.tgz
and set the environment variable SPARK_HOME to point to the extracted directory.
Next, install winutils:
- Download winutils.exe from
http://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe
- Place it in c:\winutils\bin
- Set environment variable HADOOP_HOME to point to c:\winutils
- Add c:\winutils\bin to the environment variable PATH.
- Finally, modify the permissions of the Hive directory that will be used by Spark:
> winutils.exe chmod 777 /tmp/hive
Finally, check if Spark is correctly installed:
> %SPARK_HOME%\bin\spark-shell
> %SPARK_HOME%\bin\pyspark
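If spark-shell or pyspark fails to start, it can help to check the variables set in the steps above before digging further. The sketch below is an illustration only: the function name is hypothetical, and it assumes the layout described above (JAVA_HOME, SPARK_HOME and HADOOP_HOME set, with winutils.exe under the bin folder of HADOOP_HOME).

```python
import os


def find_setup_problems(env):
    """Return a list of human-readable problems found in the mapping `env`.

    `env` is any dict-like object such as os.environ. An empty list means
    that the variables named in these instructions look consistent.
    """
    problems = []
    for name in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
        path = env.get(name)
        if not path:
            problems.append(name + " is not set")
        elif not os.path.isdir(path):
            problems.append(name + " does not point to a directory: " + path)

    # winutils.exe is expected in the bin folder below HADOOP_HOME.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and os.path.isdir(hadoop_home):
        winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
        if not os.path.isfile(winutils):
            problems.append("winutils.exe not found at " + winutils)
    return problems
```

Calling find_setup_problems(os.environ) and printing each returned line gives a quick overview. It does not check the permissions on /tmp/hive, which still need the winutils chmod step.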
3. Python and Jupyter
=====================
Download and install Anaconda Python 2.7 from
https://www.continuum.io/downloads#macos
(includes Jupyter and pip).
4. Libraries used in this tutorial
==================================
4.1 Graphviz
To check if Graphviz is installed on your system,
> dot --help
If you get an error,
- Download and install
http://www.graphviz.org/pub/graphviz/stable/windows/graphviz-2.38.msi
- Please ensure that the C:\Program Files (x86)\Graphviz2.38\bin folder is
added to the PATH environment variable.
5. Apache SystemML
==================
cd to the tutorial folder and install this version of Apache SystemML,
> pip install ./systemml-1.0.0-SNAPSHOT-python.tgz
and start pyspark/Jupyter
> set PYSPARK_DRIVER_PYTHON=jupyter
> set PYSPARK_DRIVER_PYTHON_OPTS=notebook
> %SPARK_HOME%\bin\pyspark --driver-memory 8g
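Once the notebook is running, a quick sanity check is to confirm that the Python packages can be imported. This is a minimal sketch with a hypothetical helper name. It assumes that the SystemML package installed with pip above is importable as systemml.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

For example, missing_modules(["systemml"]) should return an empty list in a notebook cell if the pip install step succeeded.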