Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22084#discussion_r209483839
--- Diff: dev/make-distribution.sh ---
@@ -188,6 +190,23 @@ if [ -f "$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar
   cp "$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar "$DISTDIR/yarn"
 fi
+# Only copy external jars if built
+if [ -f "$SPARK_HOME"/external/avro/target/spark-avro_${SCALA_VERSION}-${VERSION}.jar ]; then
+  cp "$SPARK_HOME"/external/avro/target/spark-avro_${SCALA_VERSION}-${VERSION}.jar "$DISTDIR/external/jars/"
+fi
+if [ -f "$SPARK_HOME"/external/kafka-0-10/target/spark-streaming-kafka-0-10_${SCALA_VERSION}-${VERSION}.jar ]; then
+  cp "$SPARK_HOME"/external/kafka-0-10/target/spark-streaming-kafka-0-10_${SCALA_VERSION}-${VERSION}.jar "$DISTDIR/external/jars/"
--- End diff --
I didn't want to include kinesis or ganglia, because those would entail
bundling third-party OSS under licenses that we can't redistribute. The
existence of those modules is already a gray area.
Let me look into what is bundled into these JARs. Some things we don't want to
include, like kafka itself, though we do want kafka-clients. We also don't want
to include Spark itself, for example.
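One way to check what a connector JAR bundles is to list its entries and count them by package prefix. This is a sketch: count_entries_under is a hypothetical helper that reads a listing (one path per line) on stdin, such as the output of the JDK's `jar tf some.jar` or Info-ZIP's `unzip -Z1 some.jar`. It assumes the usual package layouts: Kafka broker classes live under `kafka/`, kafka-clients under `org/apache/kafka/`.

```shell
# Hypothetical helper: read a JAR listing on stdin (one entry path per line)
# and print how many entries start with the literal prefix given as $1.
count_entries_under() {
  # index() == 1 means "starts with"; it does a literal match, not a regex.
  awk -v p="$1" 'index($0, p) == 1 { n++ } END { print n + 0 }'
}
```

For example, `jar tf spark-streaming-kafka-0-10_2.11-2.4.0.jar | count_entries_under kafka/` should print 0 if the broker is not bundled. Note that the connector ships its own classes under `org/apache/spark/streaming/`, so a nonzero count for `org/apache/spark/` alone does not mean Spark core was bundled; a narrower prefix such as `org/apache/spark/SparkContext` is a better signal.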
Yeah, it's a reasonable argument that nobody would use these directly
anyway. The same could be said of some other JARs in the distro. Really, the
purpose of jars/ here is to support running in standalone mode: most vendor
distros would put spark-streaming-kafka on the classpath for you anyway, but
standalone doesn't.
Standalone still won't pick up these new JARs, because they're in
external/jars/, but at least they're present, and you can move them into jars/
if you want them for a local deployment.
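For a local standalone deployment, moving the connectors over is a single copy. A minimal sketch: promote_external_jars is a hypothetical name, and its argument is assumed to be the distribution root (what make-distribution.sh calls DISTDIR).

```shell
# Hypothetical helper: copy the optional connector JARs from external/jars/
# into jars/, the directory that standalone mode picks up.
promote_external_jars() {
  local dist=$1
  local jar
  [ -d "$dist/external/jars" ] || return 0   # nothing was built; no-op
  mkdir -p "$dist/jars"
  for jar in "$dist"/external/jars/*.jar; do
    [ -e "$jar" ] || continue   # the glob matched no files
    cp "$jar" "$dist/jars/"
  done
}
```

Copying rather than moving leaves external/jars/ intact, so the distribution layout stays as built and the step can be repeated safely.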