LuciferYang commented on PR #42385:
URL: https://github.com/apache/spark/pull/42385#issuecomment-1676854237

   @utkarsh39 
   
   I found that this PR may have caused some PySpark test cases to fail in the 
Java 17 daily tests (`pyspark-sql` and `pyspark-connect` modules):
   
   - https://github.com/apache/spark/actions/runs/5837423492
   - https://github.com/apache/spark/actions/runs/5843658110
   - https://github.com/apache/spark/actions/runs/5849761680
   
   <img width="1157" alt="image" src="https://github.com/apache/spark/assets/1475305/bcab0032-5d96-4596-9f03-0aa364f91574">
   
   
   To verify this, I ran some local tests using Java 17:
   
   ```
   java -version
   openjdk version "17.0.8" 2023-07-18 LTS
   OpenJDK Runtime Environment Zulu17.44+15-CA (build 17.0.8+7-LTS)
   OpenJDK 64-Bit Server VM Zulu17.44+15-CA (build 17.0.8+7-LTS, mixed mode, sharing)
   ```
   
   1. Reset to the commit immediately before SPARK-44705 (SPARK-44765) and run 
the following commands:
   
   
   ```
   # [SPARK-44765][CONNECT] Simplify retries of ReleaseExecute
   git reset --hard 9bde882fcb39e9fedced0df9702df2a36c1a84e6
   export SKIP_UNIDOC=true
   export SKIP_MIMA=true
   export SKIP_PACKAGING=true
   ./dev/run-tests --parallelism 1 --modules "pyspark-sql"
   ```
   
   ```
   Finished test(python3.9): pyspark.sql.tests.test_udtf (57s) ... 2 tests were skipped
   Tests passed in 59 seconds
   ```
   
   The tests in `pyspark.sql.tests.test_udtf` passed.
   
   
   2. Reset to the SPARK-44705 commit and run the same commands:
   
   ```
   # [SPARK-44705][PYTHON] Make PythonRunner single-threaded
   git reset --hard 8aaff55839493e80e3ce376f928c04aa8f31d18c
   export SKIP_UNIDOC=true
   export SKIP_MIMA=true
   export SKIP_PACKAGING=true
   ./dev/run-tests --parallelism 1 --modules "pyspark-sql"
   ```
   
   
   ```
   ======================================================================
   FAIL: test_udtf_with_analyze_table_argument_adding_columns (pyspark.sql.tests.test_udtf.UDTFTests)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/tests/test_udtf.py", line 1340, in test_udtf_with_analyze_table_argument_adding_columns
       assertSchemaEqual(
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/testing/utils.py", line 356, in assertSchemaEqual
       raise PySparkAssertionError(
   pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_SCHEMA] Schemas do not match.
   --- actual
   +++ expected
   - StructType([StructField('a', LongType(), True)])
   + StructType([StructField('id', LongType(), False), StructField('is_even', BooleanType(), True)])
   
   ======================================================================
   FAIL: test_udtf_with_analyze_table_argument_repeating_rows (pyspark.sql.tests.test_udtf.UDTFTests) (query_no=0)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/tests/test_udtf.py", line 1394, in test_udtf_with_analyze_table_argument_repeating_rows
       assertSchemaEqual(df.schema, expected_schema)
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/testing/utils.py", line 356, in assertSchemaEqual
       raise PySparkAssertionError(
   pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_SCHEMA] Schemas do not match.
   --- actual
   +++ expected
   - StructType([StructField('id', LongType(), False), StructField('is_even', BooleanType(), True)])
   + StructType([StructField('id', LongType(), False)])
   
   ======================================================================
   FAIL: test_udtf_with_analyze_table_argument_repeating_rows (pyspark.sql.tests.test_udtf.UDTFTests)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/tests/test_udtf.py", line 1400, in test_udtf_with_analyze_table_argument_repeating_rows
       self.spark.sql(
   AssertionError: AnalysisException not raised
   
   ======================================================================
   FAIL: test_udtf_with_analyze_using_accumulator (pyspark.sql.tests.test_udtf.UDTFTests) (query_no=0)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/tests/test_udtf.py", line 1625, in test_udtf_with_analyze_using_accumulator
       assertSchemaEqual(df.schema, StructType().add("col1", IntegerType()))
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/testing/utils.py", line 356, in assertSchemaEqual
       raise PySparkAssertionError(
   pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_SCHEMA] Schemas do not match.
   --- actual
   +++ expected
   - StructType([StructField('a', IntegerType(), True), StructField('b', IntegerType(), True)])
   + StructType([StructField('col1', IntegerType(), True)])
   
   ======================================================================
   FAIL: test_udtf_with_analyze_using_accumulator (pyspark.sql.tests.test_udtf.UDTFTests)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/tests/test_udtf.py", line 1628, in test_udtf_with_analyze_using_accumulator
       self.assertEqual(test_accum.value, 222)
   AssertionError: 111 != 222
   
   ----------------------------------------------------------------------
   Ran 174 tests in 54.619s
   
   FAILED (failures=34, errors=6, skipped=2)
   ```
   
   With this PR merged, the run reports 34 test failures and 6 errors.
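   
   For anyone who wants to repeat the comparison, the two steps above can be 
wrapped in a small helper. This is only a sketch: `run_at_commit` and `DRY_RUN` 
are hypothetical names introduced here, not part of Spark's dev scripts. The 
commands it runs are the ones shown above. Note that `git reset --hard` 
discards local changes, so use a scratch checkout.
   
   ```shell
   # Hypothetical helper: check out a commit and run the pyspark-sql module.
   # With DRY_RUN=1 the commands are only printed, not executed.
   run_at_commit() {
     commit="$1"
     run=""
     if [ "${DRY_RUN:-0}" = "1" ]; then
       run="echo"
     fi
     # Same skip flags as in the manual steps above.
     export SKIP_UNIDOC=true SKIP_MIMA=true SKIP_PACKAGING=true
     $run git reset --hard "$commit" || return 1
     $run ./dev/run-tests --parallelism 1 --modules "pyspark-sql"
   }
   
   # Usage (from the root of a Spark checkout):
   #   run_at_commit 9bde882fcb39e9fedced0df9702df2a36c1a84e6   # before SPARK-44705
   #   run_at_commit 8aaff55839493e80e3ce376f928c04aa8f31d18c   # SPARK-44705
   ```
   
   Running it first with `DRY_RUN=1` prints the two commands for each commit 
without touching the working tree.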
   
   @utkarsh39 Do you have time to fix these test cases? I have created 
SPARK-44797 to track this.
   
   Or should we revert this PR first to restore the Java 17 daily tests? 
@HyukjinKwon @ueshin @dongjoon-hyun 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]