Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-41798885
Maybe I have something configured wrong, but I'm still getting a lot of
EOFExceptions. Certain actions work fine, but when I try to do anything
that actually runs on the executors I get EOFExceptions again, along with
/usr/bin/python: No module named pyspark. I'm just using what's checked
into master.
# this works
>>> words = sc.textFile("README.md")
>>> words.filter(lambda w: w.startswith("spar")).take(5)
>>> words.collect()
# this doesn't
>>> words = sc.textFile("README.md")
>>> words.filter(lambda w: w.startswith("spar")).collect()
>>> words.count()
Ideas?
I checked, and PYTHONPATH on the executor is set to spark.jar, and py4j
is in the assembly jar. I'm launching with MASTER=yarn-client ./bin/pyspark
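The "No module named pyspark" error means the Python workers spawned on the executors can't find pyspark on their import path, even though the driver can. As a minimal sketch of that failure mode (plain Python, not Spark itself; importable_in_child is a hypothetical helper): a child process can only import a module if the PYTHONPATH it inherits actually points at it, which is what seems to be going wrong on the executor side here.

```python
import os
import subprocess
import sys

def importable_in_child(module, pythonpath):
    """Spawn a child interpreter with the given PYTHONPATH and report
    whether it can import the named module -- roughly what happens when
    an executor forks a Python worker."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module],
        env=env,
        capture_output=True,
    )
    return proc.returncode == 0

# A stdlib module imports regardless of PYTHONPATH; a module that only
# lives on PYTHONPATH fails exactly the way pyspark does when the
# executor's PYTHONPATH points at the wrong place.
print(importable_in_child("json", ""))
print(importable_in_child("no_such_module_xyz", ""))
```

If the executor's PYTHONPATH entry (here, spark.jar) doesn't actually contain the pyspark package at its root, the worker would fail in just this way.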