HyukjinKwon opened a new pull request #34187: URL: https://github.com/apache/spark/pull/34187
### What changes were proposed in this pull request?

This PR fixes the test failure:

```
Running tests...
----------------------------------------------------------------------
  test_read_images (pyspark.ml.tests.test_image.ImageFileFormatTest) ... ERROR (12.050s)

======================================================================
ERROR [12.050s]: test_read_images (pyspark.ml.tests.test_image.ImageFileFormatTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/tests/test_image.py", line 35, in test_read_images
    self.assertEqual(df.count(), 4)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 507, in count
    return int(self._jdf.count())
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/utils.py", line 98, in deco
    return f(*a, **kw)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o32.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, amp-jenkins-worker-05.amp, executor driver): javax.imageio.IIOException: Unsupported Image Type
	at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1079)
	at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1050)
	at javax.imageio.ImageIO.read(ImageIO.java:1448)
	at javax.imageio.ImageIO.read(ImageIO.java:1352)
```

This exception apparently occurs when a malformed image is read while the `dropInvalid` option is on. The exception handling around `ImageIO.read` does not catch the `javax.imageio.IIOException` thrown for an invalid image, because that exception is not a `RuntimeException`. In fact, `javax.imageio.IIOException` signals "run-time failure of reading" (see also https://docs.oracle.com/javase/8/docs/api/javax/imageio/IIOException.html).

Therefore, this PR adds `javax.imageio.IIOException` to the exceptions caught when reading an image, so that malformed images are handled properly.

I am not yet sure why the test is flaky instead of failing consistently. However, the fix should be correct.

### Why are the changes needed?

To fix the flaky tests; see https://github.com/apache/spark/runs/3802639160 for an example.

### Does this PR introduce _any_ user-facing change?

Yes. When the `dropInvalid` option is enabled, users will be able to read data containing malformed images even in cases where `javax.imageio.IIOException` is thrown.

### How was this patch tested?

Existing unit tests. We should track whether the tests are still flaky.
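
For illustration only, here is a minimal Java sketch of the catching behavior described in this PR. It is not the actual patch. The names `Main`, `Decoder`, and `decodeOrNull` are hypothetical and exist only for this example. It assumes the intent is to treat `RuntimeException` and `javax.imageio.IIOException` as "invalid image" (returning `null` so the row can be dropped), while any other `IOException` still propagates.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.IIOException;
import javax.imageio.ImageIO;

public class Main {
    // Hypothetical abstraction over "turn bytes into an image", so the
    // failure modes can be exercised without a real malformed JPEG.
    interface Decoder {
        BufferedImage decode(byte[] bytes) throws IOException;
    }

    // Hypothetical helper: returns null for input regarded as an invalid image.
    // IIOException extends IOException, not RuntimeException, so it has to be
    // listed explicitly; other IOExceptions are deliberately not caught.
    static BufferedImage decodeOrNull(byte[] bytes, Decoder decoder) throws IOException {
        try {
            return decoder.decode(bytes);
        } catch (RuntimeException | IIOException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // A decoder backed by the real ImageIO.read call.
        Decoder imageIo = bytes -> ImageIO.read(new ByteArrayInputStream(bytes));
        byte[] notAnImage = "not an image".getBytes(StandardCharsets.US_ASCII);
        System.out.println("decoded non-image bytes to null: "
            + (decodeOrNull(notAnImage, imageIo) == null));

        // Simulate the failure from the stack trace above.
        Decoder failing = bytes -> { throw new IIOException("Unsupported Image Type"); };
        System.out.println("IIOException mapped to null: "
            + (decodeOrNull(notAnImage, failing) == null));

        System.out.println("IIOException is a RuntimeException: "
            + RuntimeException.class.isAssignableFrom(IIOException.class));
    }
}
```

Catching only `RuntimeException` in `decodeOrNull` would let the simulated `IIOException` escape, which matches the failure mode reported in the traceback.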
