HyukjinKwon opened a new pull request #30098:
URL: https://github.com/apache/spark/pull/30098


   ### What changes were proposed in this pull request?
   
   Some tests fail with PyArrow 2.0.0+:
   
   ```
   ======================================================================
    ERROR [0.774s]: test_grouped_over_window_with_key (pyspark.sql.tests.test_pandas_grouped_map.GroupedMapInPandasTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 595, in test_grouped_over_window_with_key
        .select('id', 'result').collect()
      File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 588, in collect
        sock_info = self._jdf.collectToPython()
      File "/__w/spark/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/__w/spark/spark/python/pyspark/sql/utils.py", line 117, in deco
        raise converted from None
    pyspark.sql.utils.PythonException:
      An exception was thrown from the Python worker. Please see the stack trace below.
    Traceback (most recent call last):
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main
        process()
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 593, in process
        serializer.dump_stream(out_iter, outfile)
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 255, in dump_stream
        return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream)
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 81, in dump_stream
        for batch in iterator:
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 248, in init_stream_yield_batches
        for series in iterator:
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 426, in mapper
        return f(keys, vals)
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 170, in <lambda>
        return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 158, in wrapped
        result = f(key, pd.concat(value_series, axis=1))
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 68, in wrapper
        return f(*args, **kwargs)
      File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 590, in f
        "{} != {}".format(expected_key[i][1], window_range)
    AssertionError: {'start': datetime.datetime(2018, 3, 15, 0, 0), 'end': datetime.datetime(2018, 3, 20, 0, 0)} != {'start': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>), 'end': datetime.datetime(2018, 3, 20, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>)}
   ```
   
   https://github.com/apache/spark/runs/1278917457
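
   The failure comes down to timezone awareness: under PyArrow 2.0.0+ the window boundaries arrive in the UDF as timezone-aware datetimes (note the `tzinfo=<StaticTzInfo 'Etc/UTC'>` on the right-hand side of the assertion), while the test builds naive ones. In Python, a naive and an aware `datetime` never compare equal, which is exactly what trips the assertion. A minimal illustration in plain Python (not Spark code):

   ```
   from datetime import datetime, timezone

   naive = datetime(2018, 3, 15)                        # what the test constructs
   aware = datetime(2018, 3, 15, tzinfo=timezone.utc)   # what arrives under PyArrow 2.0.0+

   # Equality between naive and aware datetimes is always False, so the
   # dict comparison in the test fails even though the wall-clock values match.
   print(naive == aware)  # False
   ```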
   
   This PR proposes to set an upper bound on the PyArrow version in the GitHub Actions build. The pin should be removed once PyArrow 2.0.0+ is properly supported (SPARK-33189).
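
   For reference, the kind of pin involved is just an upper bound on the installed package. The snippet below is a hypothetical install step, not the actual workflow diff; the exact workflow file and the rest of the installed packages are omitted here:

   ```
   # Hypothetical install step for the GitHub Actions workflow:
   # keep PyArrow below 2.0.0 until SPARK-33189 is resolved.
   python -m pip install "pyarrow<2.0.0"
   ```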
   
   ### Why are the changes needed?
   
   To make the build pass.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   The GitHub Actions run in this PR will test it out.

