dchvn opened a new pull request #34750:
URL: https://github.com/apache/spark/pull/34750


   ### What changes were proposed in this pull request?
   Skip the identical-index check in `Series.compare` when the config
'compute.eager_check' is disabled.
   
   ### Why are the changes needed?
   The identical-index check is expensive, so the config
'compute.eager_check' should allow users to skip it.
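
   As a plain-pandas illustration (not the actual pandas-on-Spark code path), the eager check conceptually boils down to an index-equality test; in pandas-on-Spark, evaluating it requires materializing and comparing the distributed indexes, which is what makes it costly:
   ```python
   import pandas as pd

   # Plain-pandas sketch of what the eager check conceptually verifies;
   # the real pandas-on-Spark check runs against distributed data.
   s1 = pd.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 3, 4, 5]))
   s2 = pd.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 4, 3, 5]))

   identical = s1.index.equals(s2.index)
   print(identical)  # the indexes differ in order, so this prints False
   ```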
   
   ### Does this PR introduce _any_ user-facing change?
   Yes
   
   Before this PR
   ```python
   >>> psser1 = ps.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 3, 4, 5]))
   >>> psser2 = ps.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 4, 3, 5]))
   >>> psser1.compare(psser2)
   Traceback (most recent call last):                                           
   
     File "<stdin>", line 1, in <module>
     File "/u02/spark/python/pyspark/pandas/series.py", line 5851, in compare
       raise ValueError("Can only compare identically-labeled Series objects")
   ValueError: Can only compare identically-labeled Series objects
   ```
   After this PR, when the config 'compute.eager_check' is False, pandas-on-Spark
skips the identical-index check and proceeds with the comparison.
   ```python
   >>> with ps.option_context("compute.eager_check", False):
   ...     psser1.compare(psser2)
   ... 
      self  other
   3     3      4
   4     4      3
   ```
   ### How was this patch tested?
   Unit tests
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


