HeartSaVioR opened a new pull request #27001: [SPARK-30345][SQL] Fix intermittent test failure (ConnectException) on ThriftServerQueryTestSuite/ThriftServerWithSparkContextSuite
URL: https://github.com/apache/spark/pull/27001
 
 
   ### What changes were proposed in this pull request?
   
   This patch fixes the intermittent test failure in ThriftServerQueryTestSuite/ThriftServerWithSparkContextSuite, where tests get a `ConnectException` when querying the Thrift server.
   
(https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115646/testReport/)
   
   The relevant unit test log messages are as follows:
   
   ```
   19/12/23 13:33:01.875 pool-1-thread-1 INFO AbstractService: Service:ThriftBinaryCLIService is started.
   19/12/23 13:33:01.875 pool-1-thread-1 INFO AbstractService: Service:HiveServer2 is started.
   ...
   19/12/23 13:33:01.888 pool-1-thread-1 INFO ThriftServerWithSparkContextSuite: HiveThriftServer2 started successfully
   ...
   19/12/23 13:33:01.909 pool-1-thread-1-ScalaTest-running-ThriftServerWithSparkContextSuite INFO ThriftServerWithSparkContextSuite:
   
   ===== TEST OUTPUT FOR o.a.s.sql.hive.thriftserver.ThriftServerWithSparkContextSuite: 'SPARK-29911: Uncache cached tables when session closed' =====
   
   ...
   19/12/23 13:33:02.017 pool-1-thread-1-ScalaTest-running-ThriftServerWithSparkContextSuite INFO Utils: Supplied authorities: localhost:15441
   19/12/23 13:33:02.018 pool-1-thread-1-ScalaTest-running-ThriftServerWithSparkContextSuite INFO Utils: Resolved authority: localhost:15441
   19/12/23 13:33:02.078 HiveServer2-Background-Pool: Thread-213 INFO BaseSessionStateBuilder$$anon$2: Optimization rule 'org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation' is excluded from the optimizer.
   19/12/23 13:33:02.078 HiveServer2-Background-Pool: Thread-213 INFO BaseSessionStateBuilder$$anon$2: Optimization rule 'org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation' is excluded from the optimizer.
   19/12/23 13:33:02.121 pool-1-thread-1-ScalaTest-running-ThriftServerWithSparkContextSuite WARN HiveConnection: Failed to connect to localhost:15441
   19/12/23 13:33:02.124 pool-1-thread-1-ScalaTest-running-ThriftServerWithSparkContextSuite INFO ThriftServerWithSparkContextSuite:
   
   ===== FINISHED o.a.s.sql.hive.thriftserver.ThriftServerWithSparkContextSuite: 'SPARK-29911: Uncache cached tables when session closed' =====
   
   19/12/23 13:33:02.143 Thread-35 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 15441 with 5...500 worker threads
   19/12/23 13:33:02.327 pool-1-thread-1 INFO HiveServer2: Shutting down HiveServer2
   19/12/23 13:33:02.328 pool-1-thread-1 INFO ThriftCLIService: Thrift server has stopped
   ```
   (Here the error is logged as `WARN HiveConnection: Failed to connect to localhost:15441`; the actual stack trace can be seen in the Jenkins test summary.)
   
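   The cause of the test failure: Thrift(Binary|Http)CLIService prepares and launches the service asynchronously (in a new thread), but the suites do not wait for that to complete and start running tests right away. This ends up in a race condition: a test can try to connect before the server socket is listening. In the log above, `Failed to connect to localhost:15441` (13:33:02.121) is logged before `Starting ThriftBinaryCLIService on port 15441` (13:33:02.143).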
   
   This can easily be reproduced by adding an artificial sleep in `ThriftBinaryCLIService.run()` here:
   
https://github.com/apache/spark/blob/ba3f6330dd2b6054988f1f6f0ffe014fc4969088/sql/hive-thriftserver/v2.3/src/main/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java#L49
   
   (Note that the `sleep` should be added before the server socket is initialized, e.g. at line 57.)
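   To make the race concrete, the following self-contained Java program simulates it without Spark or Hive. It is a sketch, not Spark or Hive code, and all names in it (`RaceDemo`, `startServiceAsync`, `tryConnect`, `runDemo`) are hypothetical. The "service" only mimics the relevant behavior. Its startup returns immediately, and a background thread sleeps (like the artificial `sleep` described above) before binding the server socket. The client tries to connect as soon as startup returns, as the suites did.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;

// Hypothetical simulation of the startup race; not Spark or Hive code.
class RaceDemo {

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Asks the OS for a currently free loopback port.
    static int freePort() {
        try (ServerSocket s = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mimics an asynchronously started service: the caller returns at once,
    // while a background thread sleeps (the artificial delay), then binds its
    // server socket and keeps listening until `stop` is released.
    static void startServiceAsync(int port, long delayMillis, CountDownLatch stop) {
        Thread t = new Thread(() -> {
            sleepQuietly(delayMillis);
            try (ServerSocket server = new ServerSocket()) {
                server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
                stop.await();
            } catch (IOException | InterruptedException e) {
                // A real service would log this; the sketch just exits.
            }
        });
        t.setDaemon(true);
        t.start();
    }

    // Returns true if a TCP connection can be opened right now. A refused
    // connection (ConnectException, a subclass of IOException) means nothing
    // is listening on the port yet.
    static boolean tryConnect(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 200);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Returns {connected right after startup, connected after waiting}.
    static boolean[] runDemo(long delayMillis, long waitMillis) {
        int port = freePort();
        CountDownLatch stop = new CountDownLatch(1);
        startServiceAsync(port, delayMillis, stop);
        try {
            boolean immediate = tryConnect(port); // no waiting, like the suites
            sleepQuietly(waitMillis);
            boolean later = tryConnect(port);
            return new boolean[] {immediate, later};
        } finally {
            stop.countDown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = runDemo(500, 1500);
        System.out.println("connected immediately: " + r[0]);
        System.out.println("connected after waiting: " + r[1]);
    }
}
```

   On a typical machine, the first attempt is refused and the second succeeds. This matches the ordering in the log, where the connection failure is logged before the binary service starts.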
   
   This patch changes the test initialization logic to keep trying to execute a simple query until the service is available. The patch also refactors the code so that the change can easily be applied to both ThriftServerQueryTestSuite and ThriftServerWithSparkContextSuite.
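   The wait-until-available idea can be sketched in Java with a generic probe. The patch itself probes by executing a simple query against the Thrift server. Here, the probe is an arbitrary `Callable`. `ServiceWaiter` and `waitUntilAvailable` are hypothetical names, not the suites' actual helper, and the timeout and retry interval are illustrative values.

```java
import java.net.ConnectException;
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper illustrating "retry a probe until the service answers".
class ServiceWaiter {

    // Calls `probe` until it returns without throwing, or until `timeout`
    // elapses, sleeping `interval` between attempts. Returns the probe's value;
    // once the deadline has passed, the last failure is rethrown.
    static <T> T waitUntilAvailable(Callable<T> probe, Duration timeout, Duration interval)
            throws Exception {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                return probe.call();
            } catch (Exception e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
                Thread.sleep(interval.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated server that refuses the first three attempts, standing in
        // for a Thrift server whose socket is not bound yet.
        int[] attempts = {0};
        String result = waitUntilAvailable(() -> {
            attempts[0]++;
            if (attempts[0] <= 3) {
                throw new ConnectException("Connection refused");
            }
            return "1"; // stand-in for the result of a trivial query
        }, Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println(result + " after " + attempts[0] + " attempts"); // prints "1 after 4 attempts"
    }
}
```

   Bounding the retries with a deadline keeps a genuinely broken server from hanging the suite. Once the timeout passes, the last failure (e.g. the `ConnectException`) is rethrown, so the test still fails with a meaningful error.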
   
   ### Why are the changes needed?
   
   This patch fixes the intermittent failure observed here:
   
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115646/testReport/
   
   ### Does this PR introduce any user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Artificially made the test fail consistently (using the approach described above), and confirmed that the patch fixes the test.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.