>> I asked several people; no one seems to believe that we can do this:
>> $ PYTHONPATH=/path/to/assembly/jar python
>> >>> import pyspark
That is because people usually don't package Python files into their jars.
For pyspark, however, this will work as long as the jar can be opened and
its contents read.
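To spell that out: Python treats any zip archive on sys.path (a jar is just
a zip with extra metadata) as an import root, via the built-in zipimport
machinery. A rough sketch, using the same placeholder jar path as above:
$ python
>>> import sys
>>> sys.path.insert(0, "/path/to/assembly/jar")  # same effect as PYTHONPATH above
>>> import pyspark  # succeeds only if zipimport can actually read the archive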
Yeah, we need to add a build warning to the Maven build. Would you be
able to try compiling Spark with Java 6? It would be good to narrow
down whether you are hitting this problem or something else.
On Mon, Jun 2, 2014 at 1:15 PM, Xu (Simon) Chen wrote:
Nope... didn't try Java 6. The standard installation guide didn't say
anything about Java 7 and suggested using "-DskipTests" for the build:
http://spark.apache.org/docs/latest/building-with-maven.html
So, I didn't see the warning message...
On Mon, Jun 2, 2014 at 3:48 PM, Patrick Wendell wrote:
Are you building Spark with Java 6 or Java 7? Java 6 uses the extended
Zip format and Java 7 uses Zip64. I think we've tried to add some
build warnings if Java 7 is used, for this reason:
https://github.com/apache/spark/blob/master/make-distribution.sh#L102
Any luck if you use JDK 6 to compile?
OK, my colleague found this:
https://mail.python.org/pipermail/python-list/2014-May/671353.html
And my jar file has 70011 files. Fantastic..
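For anyone who wants to check their own assembly jar, something like this
shows the entry count (just a sketch; it assumes the stdlib zipfile module
can still open the archive even when zipimport cannot, and the path is a
placeholder):
$ python
>>> import zipfile
>>> zf = zipfile.ZipFile("/path/to/assembly/jar")
>>> len(zf.namelist())  # more than 65535 entries forces the Zip64 format, which zipimport can't read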
On Mon, Jun 2, 2014 at 2:34 PM, Xu (Simon) Chen wrote:
I asked several people; no one seems to believe that we can do this:
$ PYTHONPATH=/path/to/assembly/jar python
>>> import pyspark
The following pull request did mention something about generating a zip
file for all Python-related modules:
https://www.mail-archive.com/reviews@spark.apache.org/msg0
So, I did specify SPARK_JAR in my pyspark program. I also checked the
workers; it seems that the jar file is distributed and included in the
classpath correctly.
I think the problem is likely at step 3..
I build my jar file with Maven, like this:
"mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1
1) yes, that sc.parallelize(range(10)).count() has the same error.
2) the files seem to be correct
3) I have trouble at this step: "ImportError: No module named pyspark",
but I do seem to have the files in the jar:
"""
$ PYTHONPATH=~/spark-assembly-1.0.0-hadoop2.3.0-cdh5.0.1.jar python
>>> import pyspark
ImportError: No module named pyspark
"""
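One more check that might narrow this down is poking zipimport directly,
since that is what the interpreter uses for zip/jar entries on PYTHONPATH.
Just a sketch, reusing the jar name above:
$ python
>>> import os, zipimport
>>> jar = os.path.expanduser("~/spark-assembly-1.0.0-hadoop2.3.0-cdh5.0.1.jar")
>>> zi = zipimport.zipimporter(jar)  # raises ZipImportError if the jar's directory can't be parsed
>>> zi.find_module("pyspark")        # None would mean pyspark/ isn't visible at the jar root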
Hi Simon,
You shouldn't have to install pyspark on every worker node. In YARN mode,
pyspark is packaged into your assembly jar and shipped to your executors
automatically. This seems like a more general problem. There are a few
things to try:
1) Run a simple pyspark shell with yarn-client, and do a quick
sc.parallelize(range(10)).count() to see if you hit the same error (sketch
below).
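As a minimal sketch of step 1, typed into the pyspark shell (which already
provides sc):
>>> sc.parallelize(range(10)).count()  # should return 10; an import error here points at the executors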
Hi folks,
I have a weird problem when using pyspark with YARN. I started IPython as
follows:
IPYTHON=1 ./pyspark --master yarn-client --executor-cores 4 --num-executors
4 --executor-memory 4G
When I create a notebook, I can see workers being created, and indeed I see
the Spark UI running on my client