[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
The logging approach has been merged and I'm closing this one.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
@cloud-fan, will try it. Thank you sincerely.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
I've not worked in the logging stuff yet, feel free to take it, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
> My proposal is, pandas and pyarrow should be a hard requirement for our 
jenkins, to make sure the features are well tested.

If this is a goal, I think another simple way is just to use an env set in 
Jenkins and throw an exception if both PyArrow or Pandas are not installed in 
the future.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Thank you for bearing with me @cloud-fan. I agree with it.

BTW, are you working on the logging thing BTW? I was thinking the simplest 
way to check is just print out once if PyArrow / Pandas are installed or not.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
I agree that pandas and pyarrow should not be a hard requirement for users, 
and this is what it is today: PySpark only throws exception when users try to 
use pandas related functions without pandas/pyarrow installed.

My proposal is, pandas and pyarrow should be a hard requirement for our 
jenkins, to make sure the features are well tested.

If there is a way to prove that py3 tests run well, and the environment 
issue is hard to fix, then we maybe we can deal with it later, after 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Also, if we should go in this way, I think we should enable some tests with 
PyPy too if I understood correctly and there isn't another problem I maybe 
missed:


https://github.com/apache/spark/blob/9623a98248837da302ba4ec240335d1c4268ee21/dev/sparktestsupport/modules.py#L457

At least, PyPy in my local has `numpy`.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Yup, explicitly logging sounds fine for now so that we can easily check.

>  I do prefer to have these conditional skips removed because sometimes it 
is hard to tell if everything passed or was just skipped

To be clear, I think it's more because our own testing script doesn't show 
the skipped tests output from unittests in the console.

Also, I think it's more because we couldn't make sure Pandas and Arrow were 
installed properly in testing env, Jenkins but not because we skip tests 
related with extra dependencies when they are not installed. Making them as 
required dependencies is a big deal IMHO.

FYI, I tried to install PyArrow with PyPy last time and I failed. I wonder 
if we can easily install it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20465
  
Yes, the tests are being run with python3. I do prefer to have these 
conditional skips removed because sometimes it is hard to tell if everything 
passed or was just skipped. But since pandas and pyarrow are optional 
dependencies, there should be some way for the user to skip with an environment 
variable or something.  At the very least, being able to verify they were run 
in a log would be good.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/20465
  
So, jenkins jobs run those tests with python3? If so, I feel better because 
those tests are not completely skipped in Jenkins. If it is hard to make them 
run with python 2. Let’s have a log to explicitly show if we are going to run 
tests using pandas/pyarrow, which will help us confirm if they get exercised 
with python 3 in Jenkins or not.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20465
  
Yup, there was a related discussion already. See this 
https://github.com/apache/spark/pull/19884#issuecomment-351916074 and 
https://github.com/apache/spark/pull/19884#issuecomment-353068446. We shouldn't 
run this for now. Also these are technically not hard dependencies. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20465
  
Looking back at when pyarrow was last upgraded in #19884, pandas and 
pyarrow were upgraded on all workers for python 3, but there were maybe some 
concerns or difficulties with upgrading for python 2 and pypy environments at 
that time. That is why the above failure is from python 2 with an older version 
of pandas. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20465
  
```
ImportError: Pandas >= 0.19.2 must be installed on calling Python process; 
however, your version was 0.16.0.
```

I guess the RISELab boxes will need some updates...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20465
  
sure, that's ok, I think we can revisit later (ie. next release) if we want 
to add an env switch or something to make them optional


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
>  I think there are some values in having a way to run python tests 
without Arrow?

I agree, but the more important thing is to make sure jenkins runs 
everything, so that we can be confident about our release.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/20465
  
@felixcheung jenkins is actually skipping those tests (see the failure of 
this pr). It makes sense to provide a way to allow developers to not run those 
tests. But, I'd prefer that we run those tests by default. So, we can make sure 
that jenkins is doing the right thing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20465
  
hmm, I think there are some values in having a way to run python tests 
without Arrow? I mean the test.py is not just for Jenkins but for everyone 
consuming the Spark release... unless we are saying Arrow is required now?

And in Jenkins we shouldn't be skipping any of these tests anyway? Is there 
a reason we need to change that if Jenkins isn't affected (and if I recall 
there is a way to check if running under Jenkins - we could always make Arrow 
tests not skipped in Jenkins)



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20465
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20465
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86910/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20465
  
**[Test build #86910 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86910/testReport)**
 for PR 20465 at commit 
[`8aba4f5`](https://github.com/apache/spark/commit/8aba4f502879b7e3b8c154b00ded22e4bcba8df2).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20465
  
**[Test build #86910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86910/testReport)**
 for PR 20465 at commit 
[`8aba4f5`](https://github.com/apache/spark/commit/8aba4f502879b7e3b8c154b00ded22e4bcba8df2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/460/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20465
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
cc @yhuai @icexelloss @BryanCutler @ueshin @shaneknapp 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org