[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-17 Thread minrk
Github user minrk commented on the issue:

https://github.com/apache/spark/pull/15659
  
Awesome, thanks @holdenk!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-12 Thread minrk
Github user minrk commented on the issue:

https://github.com/apache/spark/pull/15659
  
Sounds great!





[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-11 Thread minrk
Github user minrk commented on the issue:

https://github.com/apache/spark/pull/15659
  
> Any ideas on how we could follow up and support that install pattern?

To go all the way to the two commands I expected to work based on Python 
tradition:

```bash
./build/mvn -DskipTests clean package
cd python && pip install .
```

I believe you would need to copy (or hardlink or symlink-to-absolute-path) 
the jars into the Python directory during `build/mvn`, since `pip` copies the 
whole `python` directory to a temp location before running your setup.py.
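
To illustrate why the lookup breaks, here is a minimal sketch, assuming setup.py finds the jars through a path relative to the `python` directory. The `find_jars` helper, the simplified `assembly/target` layout, and the jar name are hypothetical stand-ins, and `shutil.copytree` only approximates what pip does when it copies the source tree:

```python
import glob
import os
import shutil
import tempfile


def find_jars(python_dir):
    """Look for built jars relative to the python directory (hypothetical layout)."""
    pattern = os.path.join(python_dir, os.pardir, "assembly", "target", "*.jar")
    return sorted(glob.glob(pattern))


def demo():
    """Return (jars found in-tree, jars found after copying python/ elsewhere)."""
    root = tempfile.mkdtemp()
    try:
        # Fake source tree: <root>/spark/python and <root>/spark/assembly/target
        spark = os.path.join(root, "spark")
        python_dir = os.path.join(spark, "python")
        target = os.path.join(spark, "assembly", "target")
        os.makedirs(python_dir)
        os.makedirs(target)
        open(os.path.join(target, "example.jar"), "w").close()

        in_tree = find_jars(python_dir)

        # Approximate pip: copy only the python directory to another location.
        copied = os.path.join(root, "pip-build", "python")
        shutil.copytree(python_dir, copied)
        after_copy = find_jars(copied)
        return len(in_tree), len(after_copy)
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print(demo())
```

In the copied location the relative path no longer points at the build output, which is why the jars would have to be physically present inside the `python` directory (or linked by absolute path) before pip makes its copy.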

One step short of this would be to make staging the jars an explicit step, 
rather than a temporary staging in setup.py that cleans up after itself. 
Then you would have one small extra step and the full command would be:

```bash
./build/mvn
cd python
python setup.py stage_jars # ./stage-jars.sh, whatever you prefer
pip install .
```

and the error message when jars are missing could point to the extra step.
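
As a rough sketch of what such an explicit staging step could do, assuming a simplified layout: the `stage_jars` function, the `assembly/target` source directory, and the `deps/jars` destination below are illustrative names, not the PR's actual code. It copies real files rather than creating temporary symlinks:

```python
import glob
import os
import shutil


def stage_jars(spark_home, python_dir):
    """Copy built jars into <python_dir>/deps/jars and return the staged paths."""
    source = os.path.join(spark_home, "assembly", "target")  # assumed build output
    dest = os.path.join(python_dir, "deps", "jars")  # assumed package location
    jars = sorted(glob.glob(os.path.join(source, "*.jar")))
    if not jars:
        raise RuntimeError("No jars found in %s; build Spark first." % source)
    os.makedirs(dest, exist_ok=True)
    # Copy real files (not symlinks) and leave them in place afterwards.
    return [shutil.copy2(jar, dest) for jar in jars]
```

Because the staged files are left in place, a later `pip install .` would carry them along when it copies the `python` directory.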

Yet another option, and the least work from what you have now, could be to 
put the full sequence in your missing-jars error message:

> If you are installing pyspark from the spark source, you must build spark 
and run sdist first:
> ```
> ./build/mvn -DskipTests clean package
> cd python
> python setup.py sdist
> pip install dist/*.tar.gz
> ```

I think most people wouldn't discover that without very explicit help, 
because they would need to know about both the temporary symlinks and pip's 
tempdir to work it out.
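
A minimal sketch of how setup.py could surface that message, assuming a hypothetical `check_jars` helper and a directory where staged jars are expected (neither name comes from the PR):

```python
import os

MISSING_JARS_HELP = """\
If you are installing pyspark from the spark source, you must build spark
and run sdist first:

    ./build/mvn -DskipTests clean package
    cd python
    python setup.py sdist
    pip install dist/*.tar.gz
"""


def check_jars(jars_dir):
    """Exit with copy/pasteable instructions if jars_dir contains no jars."""
    has_jars = os.path.isdir(jars_dir) and any(
        name.endswith(".jar") for name in os.listdir(jars_dir)
    )
    if not has_jars:
        # SystemExit with a string prints the message to stderr and exits non-zero.
        raise SystemExit(MISSING_JARS_HELP)
```

Calling `check_jars` early in setup.py would mean someone who tries `pip install .` sees the full working sequence instead of a bare missing-file error.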

Not being able to do `pip install .` is a bit odd for a Python package, as 
that is the standard command for installing any Python package from source 
(it's not `python setup.py install` anymore). But if you provide something 
copy/pasteable when people do try it, you are in pretty good shape.







[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-11 Thread minrk
Github user minrk commented on the issue:

https://github.com/apache/spark/pull/15659
  
I downloaded this branch and ran this to test (macOS, Python 3.5):

```bash
build/mvn -DskipTests clean package
cd python
python setup.py sdist
pip install dist/*.tar.gz
```

and everything worked great. Yay!

What didn't work (which I tried first) was `pip install .`, which is what 
I'm used to for installing a Python package from source. It fails because 
pip copies the python package directory to a temp location before running 
the installation. Since this happens before symlinking the jars, they don't 
get found. I'm not sure how big of a deal that is, since it won't affect 
stable-release installation, but `pip install .` is idiomatic for Python 
projects.

