jeff-xu-z opened a new pull request, #358: URL: https://github.com/apache/incubator-livy/pull/358
## What changes were proposed in this pull request?

Proposed code fix for [https://issues.apache.org/jira/browse/LIVY-896](https://issues.apache.org/jira/browse/LIVY-896).

## How was this patch tested?

### Unit tests

Two new unit tests are included in the PR. Their output is attached as [unit-tests.txt](https://github.com/apache/incubator-livy/files/9793910/unit-tests.txt).

### System tests

I ran system tests manually on an EMR cluster in AWS to verify the fix.

1. Installed open source Livy 0.7.1 and configured it to run on port 8999.
2. Uploaded my PySpark program run_sql.py to the cluster's HDFS (see the artifact below).
3. Looped my test_livy.py (see the artifact below) 10 times. I hit the issue in 5 out of 10 tries (Livy reported SessionState.SUCCESS even though spark-submit failed). See [reproduced.txt](https://github.com/apache/incubator-livy/files/9793894/reproduced.txt) for details.
5. Replaced the livy-server jar with the fixed version.
6. Looped my test_livy.py 100 times. I never hit the issue again. See [fixed.txt](https://github.com/apache/incubator-livy/files/9793943/fixed.txt) for details.
7. I also quickly verified that a Livy session does not have the issue. See [livy_session.txt](https://github.com/apache/incubator-livy/files/9793963/livy_session.txt) for details.
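The repeated runs in the system tests can be driven by a small wrapper. The following is only a sketch of how such a loop might look, not the script that produced the attached logs. It assumes test_livy.py sits in the current directory and writes its final `state=...` log line to stderr, as the artifact does. The helper names `final_state`, `run_trials` and `run_test_livy` are made up for illustration.

```python
import re
import subprocess
import sys
from collections import Counter

# Matches the "state=..." token that test_livy.py logs after batch.wait().
STATE_RE = re.compile(r"state=(\S+)")


def final_state(log_text):
    """Return the last state=... value found in the log text, or None."""
    matches = STATE_RE.findall(log_text)
    return matches[-1] if matches else None


def run_trials(tries, run_once):
    """Call run_once() `tries` times and tally the final state of each run."""
    tally = Counter()
    for _ in range(tries):
        tally[final_state(run_once())] += 1
    return tally


def run_test_livy():
    """Run test_livy.py once and return what it wrote to stderr.

    Assumption: the script is in the current working directory.
    """
    proc = subprocess.run(
        [sys.executable, "test_livy.py"],
        capture_output=True,
        text=True,
    )
    return proc.stderr
```

Calling `run_trials(10, run_test_livy)` would return a `Counter` keyed by the reported state. Since run_sql.py is expected to fail in this test, every run counted under `SessionState.SUCCESS` is an occurrence of the issue.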
### Artifact: test_livy.py (the verification program)

```
#!/usr/bin/env python3
from livy import LivyBatch
import sys
import logging

logging.basicConfig(format='%(asctime)s %(levelname)s %(message)s',
                    stream=sys.stderr, level=logging.INFO)

if __name__ == "__main__":
    batch = LivyBatch.create(
        url="http://ip-100-64-129-199.us-west-2.compute.internal:8999",
        file="/tmp/run_sql.py",
        args=["-s", "select * from abc"],
    )
    logging.info(f"batch id={batch.batch_id} created ...")
    batch.wait()
    logging.info(f"batch id={batch.batch_id}, state={batch.state}")
```

### Artifact: run_sql.py (the Spark program to run a given SQL)

```
from pyspark.sql import SparkSession
import sys
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        formatter_class = argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("-s", action="store", dest="sql")
    args = parser.parse_args()

    spark = SparkSession.builder.\
        appName("PySpark SparkSQL").\
        enableHiveSupport().\
        config("spark.ui.enabled", "false").\
        getOrCreate()

    try:
        spark.sql(args.sql).show()
    finally:
        spark.stop()
```

### Artifact: test_session.py (verify Livy session does not have the issue)

```
#!/usr/bin/env python3
from livy import LivySession, SessionKind
import sys
import logging

logging.basicConfig(format='%(asctime)s %(levelname)s %(message)s',
                    stream=sys.stderr, level=logging.INFO)

if __name__ == "__main__":
    sess = LivySession.create(
        url="http://ip-100-64-129-199.us-west-2.compute.internal:8999",
        kind=SessionKind.SQL)
    logging.info(f"session id={sess.session_id} created ...")
    sess.wait()
    logging.info(f"session id={sess.session_id} is ready")
    try:
        sess.download_sql("SELECT * from xxyyzz")
    except Exception as e:
        logging.info(str(e))
```
