Github user dianacarroll commented on the pull request:
https://github.com/apache/spark/pull/294#issuecomment-39340552
Well, my perspective is always that of a new, easily confused user. (That's my target audience.) I myself got tired of typing "IPYTHON=1" every time I started pyspark, so I did what I'm guessing most people will do, which is to set it as an environment variable in my profile. That was fine until the first time I tried running a pyspark script.
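To spell out what I mean, here's a rough sketch of the two workflows. The script name is made up, and the launcher path and profile file assume a stock Spark checkout with a bash login shell:

    # What the docs recommend: type the variable every time you start the shell
    IPYTHON=1 ./bin/pyspark

    # What most people will actually do: set it once in their profile
    echo 'export IPYTHON=1' >> ~/.bash_profile

    # ...and then, with IPYTHON=1 always in the environment, running a
    # script (hypothetical name) no longer behaves the way a newcomer expects:
    ./bin/pyspark my_script.py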
Your doc change explicitly recommends I not do that, but...well, really?
It just makes learning pyspark that much more confusing. The "pyspark"
command is going to be most new users' main entry point into this new
technology.
In the new Spark class I'm working on, which uses mainly Python, I had planned all along to set IPYTHON for the students automatically, because it is so much easier to work in IPython than in the vanilla Python shell, and so tedious to type that in repeatedly (or use command history every time).
I think what's going to happen is that users will ignore your admonition to explicitly type the variable setting every time they start the shell and will set it in their environment instead... then they'll end up scratching their heads trying to figure out why their scripts aren't working. (The error that results is quite non-intuitive for a Spark newbie.)
I can live with it as is (with the doc change), but it isn't a very user-friendly thing.