Re: Hadoop is not working in dev environment on windows [since 2.7.7 update]. Thanks.

2019-03-20 Thread Berthold Reinwald
Please take a look at our setup instructions for Windows that we created a 
while back for our KDD Tutorial. You may need to download a newer version 
of winutils from https://github.com/steveloughran/winutils, and don't 
forget to chmod the permissions.

The attached instructions are from 2017, so you will need to adjust the 
versions.





From:   Janardhan 
To: Niketan Pansare , dev@systemml.apache.org, 
Matthias Boehm 
Date:   03/05/2019 07:54 PM
Subject:Hadoop is not working in dev environment on windows [since 
2.7.7 update]. Thanks.



Hi,

Since the 2.7.7 update, my Hadoop and winutils [no prebuilt winutils.exe
available] are not working because of file permissions.

As a workaround, I have changed the Hadoop source locally to bypass the
access check.

But is there any way one could run the tests without Hadoop? spark-submit
is working fine for me.

Thanks,
Janardhan




1. Java 
=======
The Java version should be 1.8 or later.

> java -version

Set the JAVA_HOME environment variable and include %JAVA_HOME%\bin in the 
PATH environment variable:

> dir "%JAVA_HOME%"
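As a quick cross-check, a short Python sketch (my own illustrative check, not part of the original tutorial) can confirm that JAVA_HOME points at a real JDK:

```python
import os
from pathlib import Path

# Report whether JAVA_HOME is set and contains a java executable.
# Illustrative sketch only; the paths follow the usual JDK layout.
java_home = os.environ.get("JAVA_HOME")
if java_home is None:
    print("JAVA_HOME is not set")
else:
    exe = "java.exe" if os.name == "nt" else "java"
    java_bin = Path(java_home) / "bin" / exe
    print("java executable found:", java_bin.is_file())
```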

2. Spark
========
Download and extract Spark from https://spark.apache.org/downloads.html, 

> tar -xzf spark-2.1.0-bin-hadoop2.7.tgz

and set environment variable SPARK_HOME to point to the extracted directory.

Next step, install winutils:

- Download winutils.exe from 
http://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe
  
- Place it in c:\winutils\bin
- Set environment variable HADOOP_HOME to point to c:\winutils
- Add c:\winutils\bin to the environment variable PATH.
- Finally, modify the permissions of the hive scratch directory that Spark will use:

> winutils.exe chmod 777 /tmp/hive

Then, verify that Spark is correctly installed:

> %SPARK_HOME%\bin\spark-shell
> %SPARK_HOME%\bin\pyspark  
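Before launching the shells, a small Python sketch like the following (an added sanity check, not from the tutorial) can verify that the environment variables from the steps above are set and that winutils is reachable:

```python
import os
import shutil

# Print the Spark/Hadoop environment variables and report whether
# winutils.exe is on PATH. The variable names are the ones the
# steps above set; this is only a quick sanity check.
for var in ("SPARK_HOME", "HADOOP_HOME"):
    print(var, "=", os.environ.get(var, "<not set>"))
print("winutils on PATH:", shutil.which("winutils") is not None)
```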

3. Python and Jupyter
=====================
Download and install Anaconda Python 2.7 from 
https://www.continuum.io/downloads#macos
(includes Jupyter and pip).
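To confirm the install provides what the tutorial uses, a quick sketch can probe for the relevant modules (the module names here are the usual ones an Anaconda install ships, so treat them as assumptions):

```python
import importlib.util

# Check whether Jupyter, pip, and numpy are importable in this Python.
# find_spec returns None when a top-level module is not installed.
for mod in ("jupyter_core", "pip", "numpy"):
    available = importlib.util.find_spec(mod) is not None
    print(mod, "available:", available)
```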


4. Libraries used in this tutorial
==

4.1 Graphviz
------------

To check whether Graphviz is installed on your system, run:

> dot --help

If you get an error:

- Download and install 
http://www.graphviz.org/pub/graphviz/stable/windows/graphviz-2.38.msi
- Ensure that the C:\Program Files (x86)\Graphviz2.38\bin folder is 
added to the PATH environment variable.
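The same check can be scripted; this sketch (an added convenience, not from the tutorial) just looks for the `dot` executable on PATH and asks it for its version:

```python
import shutil
import subprocess

# Locate the Graphviz `dot` executable and, if present, print its version.
dot = shutil.which("dot")
if dot is None:
    print("Graphviz `dot` not found on PATH")
else:
    # `dot -V` prints its version string to stderr.
    out = subprocess.run([dot, "-V"], capture_output=True, text=True)
    print(out.stderr.strip())
```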

5. Apache SystemML
==================
cd to the tutorial folder, and install this version of Apache SystemML:

> pip install ./systemml-1.0.0-SNAPSHOT-python.tgz

and start pyspark/Jupyter:

> set PYSPARK_DRIVER_PYTHON=jupyter
> set PYSPARK_DRIVER_PYTHON_OPTS=notebook
> %SPARK_HOME%\bin\pyspark --driver-memory 8g
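Inside the pyspark/Jupyter session, a minimal smoke test of the Python API might look like this (a sketch under the assumption that the `systemml` pip package is installed and that a SparkSession named `spark` exists in the session, as pyspark provides):

```python
# Minimal SystemML smoke test for a pyspark session.
# Assumes the `systemml` pip package and a SparkSession `spark`;
# outside such a session the check is skipped rather than failing.
try:
    from systemml import MLContext, dml
    ml = MLContext(spark)
    ml.execute(dml("print('hello from SystemML')"))
except Exception as exc:
    print("SystemML check skipped:", exc)
```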

