Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22084#discussion_r209483839

    --- Diff: dev/make-distribution.sh ---
    @@ -188,6 +190,23 @@ if [ -f "$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar
       cp "$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar "$DISTDIR/yarn"
     fi
     
    +# Only copy external jars if built
    +if [ -f "$SPARK_HOME"/external/avro/target/spark-avro_${SCALA_VERSION}-${VERSION}.jar ]; then
    +  cp "$SPARK_HOME"/external/avro/target/spark-avro_${SCALA_VERSION}-${VERSION}.jar "$DISTDIR/external/jars/"
    +fi
    +if [ -f "$SPARK_HOME"/external/kafka-0-10/target/spark-streaming-kafka-0-10_${SCALA_VERSION}-${VERSION}.jar ]; then
    +  cp "$SPARK_HOME"/external/kafka-0-10/target/spark-streaming-kafka-0-10_${SCALA_VERSION}-${VERSION}.jar "$DISTDIR/external/jars/"
    --- End diff --

    I didn't want to include kinesis or ganglia because that would mean bundling OSS under licenses we can't redistribute. The existence of those modules is already a gray area.

    Let me look into what is built into these JARs. Some things, like kafka itself, we don't want to include, but we do want to include kafka-client, yeah. We don't want to include Spark either, for example.

    Yeah, it's a reasonable argument that nobody would use these directly anyway. The same could be said of some other JARs in the distro. Really, the purpose of jars/ here is to support running in standalone mode. That is, most vendor distros would already have spark-streaming-kafka on the classpath for you, but standalone doesn't. Standalone still won't pick up these new JARs because they're in external/jars/, but at least they ship with the distribution and can be moved into jars/ if you care to, for local deployment.
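To make that last point concrete, here is a minimal sketch of how someone running standalone might move the optional JARs onto the classpath by hand. It assumes the `external/jars/` layout proposed in this diff. The function name `promote_external_jars` and its single argument (the root of an unpacked distribution) are hypothetical and not part of `make-distribution.sh` or any Spark tooling.

```shell
# Hypothetical helper, not part of Spark: copy optional connector JARs from
# <dist>/external/jars/ into <dist>/jars/ so standalone mode picks them up.
# The body runs in a subshell, so its variables do not leak to the caller.
promote_external_jars() (
  dist_dir="$1"
  src="$dist_dir/external/jars"    # assumed layout from the diff above
  dest="$dist_dir/jars"

  # Only act if the external JARs were actually built and packaged.
  if [ -d "$src" ]; then
    mkdir -p "$dest"
    for jar in "$src"/*.jar; do
      [ -f "$jar" ] || continue    # skip the literal pattern when nothing matches
      cp "$jar" "$dest/"
    done
  fi
)
```

Calling `promote_external_jars "$SPARK_HOME"` on an unpacked distribution copies the JARs rather than moving them, so `external/jars/` stays intact. Which JARs are safe to put on the default classpath still depends on the licensing and dependency questions raised above.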