[
https://issues.apache.org/jira/browse/SPARK-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908751#comment-16908751
]
Matt Foley commented on SPARK-28749:
Hi [~hyukjin.kwon], thanks for looking at the issue. I did try that, but it
doesn't work for the following reason:
In {{python/pyspark/streaming/tests.py}}
* {{ENABLE_KAFKA_0_8_TESTS}} is used to derive the boolean
{{are_kafka_tests_enabled}}
* The call to {{search_kafka_assembly_jar()}} is not guarded by the use of
{{are_kafka_tests_enabled}}.
* And the failure exception is thrown from {{search_kafka_assembly_jar()}}.
So making {{ENABLE_KAFKA_0_8_TESTS}} properly guard the call to
{{search_kafka_assembly_jar()}} would be a similar bug fix.
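
A minimal sketch of such a guard follows. It is illustrative only, not the actual contents of {{tests.py}}: the helper {{resolve_kafka_jar}} is hypothetical, the body of {{search_kafka_assembly_jar()}} is a stand-in that always fails, and the assumption that the flag is switched on with the value "1" is mine.

```python
import os


def search_kafka_assembly_jar():
    # Stand-in for the real lookup: here it always fails, mimicking a build
    # that was made without the kafka-0-8 profile.
    raise Exception("Failed to find Spark Streaming kafka assembly jar")


def resolve_kafka_jar(environ=None, search=search_kafka_assembly_jar):
    """Hypothetical helper: look for the jar only when the tests are enabled."""
    if environ is None:
        environ = os.environ
    # Assumption: the value "1" is what enables the tests.
    are_kafka_tests_enabled = environ.get("ENABLE_KAFKA_0_8_TESTS") == "1"
    if not are_kafka_tests_enabled:
        # The search is never called, so a missing jar cannot raise here.
        return False, None
    return True, search()


if __name__ == "__main__":
    # With the flag unset, no exception is raised even though the jar is absent.
    print(resolve_kafka_jar(environ={}))
```

With the flag unset the lookup is skipped entirely. With the flag set, the exception still surfaces, which seems like the right behaviour when someone explicitly asks for the kafka-0-8 tests.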
> Fix PySpark tests not to require kafka-0-8 in branch-2.4
>
>
> Key: SPARK-28749
> URL: https://issues.apache.org/jira/browse/SPARK-28749
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Tests
> Affects Versions: 2.4.3
> Reporter: Matt Foley
> Priority: Minor
>
> As noted in SPARK-27550 we want to encourage testing of Spark 2.4.x with
> Scala-2.12, and kafka-0-8 does not support Scala-2.12.
> Currently, the PySpark tests invoked by `python/run-tests` require the
> kafka-0-8 libraries to be present. If they are not, the following failure
> message is generated:
> {code}
> Traceback (most recent call last):
> File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
> File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
> exec code in run_globals
> File "spark/python/pyspark/streaming/tests.py", line 1579, in <module>
> kafka_assembly_jar = search_kafka_assembly_jar()
> File "spark/python/pyspark/streaming/tests.py", line 1524, in search_kafka_assembly_jar
> "You need to build Spark with "
> Exception: Failed to find Spark Streaming kafka assembly jar in
> spark/external/kafka-0-8-assembly. You need to build Spark with 'build/sbt
> -Pkafka-0-8 assembly/package streaming-kafka-0-8-assembly/assembly' or
> 'build/mvn -DskipTests -Pkafka-0-8 package' before running this test.
> Had test failures in pyspark.streaming.tests with
> spark/py_virtenv/bin/python; see logs.
> Process exited with code 255
> {code}
> This change is only targeted at branch-2.4, as most kafka-0-8 related
> materials have been removed in master and this problem no longer occurs there.
> PROPOSED SOLUTION
> The proposed solution is to make the kafka-0-8 stream testing optional for
> pyspark testing, exactly the same as the Kinesis stream testing currently is,
> in file `python/pyspark/streaming/tests.py`. The change is only a few lines.
> Ideally it would be limited to when SPARK_SCALA_VERSION >= 2.12, but it turns
> out to be somewhat onerous to reliably obtain that value from within the
> python test env, and no other python test code currently does so. So my
> proposed solution simply makes the use of the kafka-0-8 profile optional, and
> leaves it to the tester to include it for Scala-2.11 test builds and exclude
> it for Scala-2.12 test builds.
> PR will be available in a day or so.
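
One way to make a group of tests optional, as the quoted PROPOSED SOLUTION describes, is to skip them unless a flag is set. The sketch below uses the standard library's `unittest.skipIf`. The names `make_kafka_tests`, `KafkaStreamTests`, and `run` are hypothetical, and nothing here is taken from the actual PR or from how the Kinesis tests are implemented.

```python
import unittest


def make_kafka_tests(enabled):
    """Build a test case that is skipped as a whole unless `enabled` is true."""

    @unittest.skipIf(
        not enabled,
        "kafka-0-8 tests are not enabled; set ENABLE_KAFKA_0_8_TESTS to run them",
    )
    class KafkaStreamTests(unittest.TestCase):  # hypothetical test case
        def test_placeholder(self):
            # A real test would exercise the kafka-0-8 stream here.
            self.assertTrue(enabled)

    return KafkaStreamTests


def run(enabled):
    """Run the optional test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_kafka_tests(enabled))
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    for flag in (False, True):
        outcome = run(flag)
        print(flag, "skipped:", len(outcome.skipped), "ok:", outcome.wasSuccessful())
```

Skipped tests are reported as skipped rather than failed, so a Scala-2.12 build without the kafka-0-8 profile could still pass the overall run under this approach.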