I am running Spark programs on a large cluster (for which I do not have administrative privileges). numpy is not installed on the worker nodes, so I bundled numpy with my program, but I get the following error:
```
Traceback (most recent call last):
  File "/home/user/spark-script.py", line 12, in <module>
    import numpy
  File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 170, in <module>
  File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in <module>
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 8, in <module>
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 11, in <module>
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py", line 6, in <module>
ImportError: cannot import name multiarray
```

The script is actually quite simple:

```python
from pyspark import SparkConf, SparkContext

sc = SparkContext()
sc.addPyFile('numpy.zip')   # ship the bundled numpy to the worker nodes

import numpy

a = sc.parallelize(numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]))
print a.collect()
```

I understand that the error occurs because numpy dynamically loads its `multiarray.so` extension, and even though `multiarray.so` is included in my `numpy.zip`, the dynamic loading somehow does not work under Apache Spark. Why is that? And how would I otherwise build a standalone numpy module with static linking?

P.S. The `numpy.zip` I bundled with the program was a zipped copy of the numpy installation on my Ubuntu machine. I also tried downloading the numpy source, building it on my local machine, and bundling that instead, but the problem persisted. My local machine and the worker nodes both run 64-bit Ubuntu.

Thanks.
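Update: for reference, I created `numpy.zip` with something equivalent to the following sketch (the `dist-packages` path is simply where numpy happens to live on my machine, as seen in the traceback above):

```python
import shutil

# Archive the installed numpy package so that a top-level 'numpy/' directory
# sits at the root of the zip; sc.addPyFile() adds the zip itself to sys.path
# on the workers, so the package must be importable from the archive root.
shutil.make_archive('numpy', 'zip',
                    root_dir='/usr/local/lib/python2.7/dist-packages',
                    base_dir='numpy')
```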
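And to confirm that the shared object really is inside the archive, a quick check (a small sketch; the exact path of the extension inside the zip may vary between numpy versions):

```python
import zipfile

# List every entry in the archive that mentions multiarray; I expect to see
# something like 'numpy/core/multiarray.so' in the output.
with zipfile.ZipFile('numpy.zip') as zf:
    print [name for name in zf.namelist() if 'multiarray' in name]
```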