Yikun commented on PR #36980:
URL: https://github.com/apache/spark/pull/36980#issuecomment-1166283208

   Information sync: from the latest build result (https://github.com/Yikun/spark/runs/7051222363?check_suite_focus=true#step:7:127), the cache works.
   - Currently, CI fails for the following reasons:
   
   <details><summary>1. ModuleNotFoundError: No module named '_pickle'</summary>
   
   ```
   Starting test(pypy3): pyspark.sql.tests.test_arrow (temp output: 
/tmp/pypy3__pyspark.sql.tests.test_arrow__jx96qdzs.log)
   Traceback (most recent call last):
     File "/usr/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
       mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
     File "/usr/lib/pypy3.8/runpy.py", line 111, in _get_module_details
       __import__(pkg_name)
     File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
       from pyspark.rdd import RDD, RDDBarrier
     File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
       from pyspark.java_gateway import local_connect_and_auth
     File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in 
<module>
       from pyspark.serializers import read_int, write_with_length, 
UTF8Deserializer
     File "/__w/spark/spark/python/pyspark/serializers.py", line 68, in <module>
       from pyspark import cloudpickle
     File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 4, in 
<module>
       from pyspark.cloudpickle.cloudpickle import *  # noqa
     File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 
57, in <module>
       from .compat import pickle
     File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in 
<module>
       from _pickle import Pickler  # noqa: F401
   ModuleNotFoundError: No module named '_pickle'
   Had test failures in pyspark.sql.tests.test_arrow with pypy3; see logs.
   ```
   
   </details>
   
   The latest Dockerfile build upgrades PyPy3 to 3.8 (originally 3.7), but it seems cloudpickle has a bug. This may be related: https://github.com/cloudpipe/cloudpickle/commit/8bbea3e140767f51dd935a3c8f21c9a8e8702b7c , but I tried applying it and it still failed. This needs a deeper look; **if you know the reason for this, please let me know.**
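   A guess at the mechanism, based on the traceback above: `_pickle` is CPython's C accelerator module, which PyPy does not provide, so the unconditional import in `cloudpickle/compat.py` line 13 fails there. A hedged sketch of the kind of guarded import that would avoid this (not Spark's or cloudpickle's actual fix, just an illustration):

```python
# PyPy does not ship CPython's C accelerator module `_pickle`, so an
# unconditional `from _pickle import Pickler` raises ModuleNotFoundError
# there. Falling back to the pure-Python pickler works on both runtimes:
try:
    from _pickle import Pickler  # CPython's C-accelerated pickler
except ImportError:
    from pickle import Pickler  # pure-Python fallback (available on PyPy)

print(Pickler.__name__)
```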
   
   <details><summary>2. fatal: unsafe repository</summary>
   
   ```
   fatal: unsafe repository ('/__w/spark/spark' is owned by someone else)
   To add an exception for this directory, call:
        git config --global --add safe.directory /__w/spark/spark
   fatal: unsafe repository ('/__w/spark/spark' is owned by someone else)
   To add an exception for this directory, call:
        git config --global --add safe.directory /__w/spark/spark
   Error: Process completed with exit code 128.
   ```
   </details>
   
   https://github.blog/2022-04-12-git-security-vulnerability-announced/
   https://github.com/actions/checkout/issues/760
   
   I did a quick fix; a separate PR needs to be submitted to address it.
   ```yaml
       - name: Github Actions permissions workaround
         run: |
           git config --global --add safe.directory ${GITHUB_WORKSPACE}
   ```
   
   <details><summary>3. Python lint: mypy annotation error</summary>
   
   ```
   starting mypy annotations test...
   annotations failed mypy checks:
   python/pyspark/pandas/frame.py:9970: error: Need type annotation for 
"raveled_column_labels"  [var-annotated]
   Found 1 error in 1 file (checked 337 source files)
   ```
   
   </details>
   
   This is due to the `numpy` upgrade; we could pin `numpy<=1.22.2` first.
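   Pinning aside, the mypy failure is the usual `[var-annotated]` complaint: an empty container whose element type cannot be inferred. A minimal standalone illustration (hypothetical variable names, not the actual `frame.py` code):

```python
from typing import List, Tuple

# A bare `labels = []` triggers mypy's [var-annotated] error, because the
# element type of an empty list cannot be inferred. An explicit annotation
# resolves it:
labels: List[Tuple[str, ...]] = []
labels.append(("a", "b"))
print(labels)  # [('a', 'b')]
```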
   
   <details><summary>4. R lint error </summary>
   
   ```
   Loading required namespace: SparkR
   Loading required namespace: lintr
   Failed with error:  ‘there is no package called ‘lintr’’
   Installing package into ‘/usr/lib/R/site-library’
   (as ‘lib’ is unspecified)
   Error in contrib.url(repos, type) : 
     trying to use CRAN without setting a mirror
   Calls: install.packages -> startsWith -> contrib.url
   Execution halted
   ```
   
   </details>
   
   No idea about this one yet:
   https://github.com/Yikun/spark/runs/7052215049?check_suite_focus=true
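   For what it's worth, the message itself names the immediate cause: `install.packages` runs without a CRAN mirror configured, so installing `lintr` fails. A possible workaround step in the same style as the git fix above (untested assumption; the step name and mirror URL are mine):

```yaml
    - name: Install lintr with an explicit CRAN mirror (possible workaround)
      run: |
        Rscript -e 'install.packages("lintr", repos = "https://cloud.r-project.org")'
```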
   
   <details><summary>5. sparkr </summary>
   
   ```
   Loading required namespace: SparkR
   Loading required namespace: lintr
   Failed with error:  ‘there is no package called ‘lintr’’
   Installing package into ‘/usr/lib/R/site-library’
   (as ‘lib’ is unspecified)
   Error in contrib.url(repos, type) : 
     trying to use CRAN without setting a mirror
   Calls: install.packages -> startsWith -> contrib.url
   Execution halted
   ```
   </details>
   
   No idea about this one yet:
   https://github.com/Yikun/spark/runs/7052215214?check_suite_focus=true#step:9:10200
   
   
   6. SparkR Arrow-related test cases failed:
   https://github.com/Yikun/spark/runs/7043826939?check_suite_focus=true#step:9:10904
   No idea about this one yet.
   
   <details><summary>7. NotImplementedError: pandas-on-Spark objects currently 
do not support <ufunc 'divide'></summary>
   
   ```
   ======================================================================
   ERROR [2.102s]: test_arithmetic_op_exceptions 
(pyspark.pandas.tests.test_series_datetime.SeriesDateTimeTest)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File 
"/__w/spark/spark/python/pyspark/pandas/tests/test_series_datetime.py", line 
99, in test_arithmetic_op_exceptions
       self.assertRaisesRegex(TypeError, expected_err_msg, lambda: other / 
psser)
     File "/usr/lib/python3.9/unittest/case.py", line 1276, in assertRaisesRegex
       return context.handle('assertRaisesRegex', args, kwargs)
     File "/usr/lib/python3.9/unittest/case.py", line 201, in handle
       callable_obj(*args, **kwargs)
     File 
"/__w/spark/spark/python/pyspark/pandas/tests/test_series_datetime.py", line 
99, in <lambda>
       self.assertRaisesRegex(TypeError, expected_err_msg, lambda: other / 
psser)
     File "/__w/spark/spark/python/pyspark/pandas/base.py", line 465, in 
__array_ufunc__
       raise NotImplementedError(
   NotImplementedError: pandas-on-Spark objects currently do not support <ufunc 
'divide'>.
   ----------------------------------------------------------------------
   ```
   
   </details>
   
   This is due to the `numpy` upgrade; we could pin `numpy<=1.22.2` first.
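   The mechanism behind failure 7 can be sketched: the traceback shows that `pyspark/pandas/base.py` defines `__array_ufunc__` and raises `NotImplementedError` for unsupported ufuncs, and with the upgraded NumPy the division now dispatches through that hook, so the test sees `NotImplementedError` instead of the `TypeError` it expects. A simplified stand-in (hypothetical class, not the real pandas-on-Spark Series):

```python
import numpy as np

class FakePandasOnSparkSeries:
    # Simplified stand-in: pyspark/pandas/base.py raises NotImplementedError
    # from __array_ufunc__ for ufuncs it cannot handle, such as np.divide.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        raise NotImplementedError(
            "pandas-on-Spark objects currently do not support %s." % ufunc
        )

obj = FakePandasOnSparkSeries()
try:
    np.array([1.0]) / obj  # NumPy dispatches np.divide to obj.__array_ufunc__
except NotImplementedError as exc:
    print(type(exc).__name__)  # NotImplementedError
```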
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

