dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313
##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
+# Only create the hive-jackson directory if they exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv $f "$DISTDIR"/hive-jackson/
+done

Review Comment:
There are four main benefits, similar to those of the `yarn` directory, @viirya.

1. **Recoverability**: Spark 3 users can already achieve the same goal today by manually deleting those two files from Spark's `jars` directory. However, it is difficult to recover the deleted files when a production job fails because of a Hive UDF. This PR provides a more robust and safer way to do this via a configuration.
2. **Communication**: Because `hive-jackson` is physically split out of the distribution, like the `yarn` directory, we (and security teams) can easily communicate that it is not used. Users can also delete the whole directory easily (if they need to) without knowing the details of the dependency list inside it.
3. **Robustness**: If Apache Spark keeps everything in `jars`, it is difficult to prevent those jars from being loaded. We could, of course, use a tricky workaround that filters them out of the class file list by naming pattern, but that is less robust from a long-term perspective.
4. **Compatibility with `hive-jackson-provided`**: Together with the existing `hive-jackson-provided`, this provides a cleaner injection point for the provided dependencies. For example, custom-built Jackson dependencies can be placed in `hive-jackson` instead of `jars`. We are very reluctant to see users put their custom jar files directly into Apache Spark's `jars` directory.
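The loop in the diff is meant to create `hive-jackson` only when matching jars exist (per its comment). In bash, unless `nullglob` is enabled, a glob with no matches expands to the literal pattern, so the loop body still runs once: it creates an empty `hive-jackson` directory and then `mv` fails on the nonexistent path. The following is a minimal sketch of a guarded variant, assuming bash; the function name `move_hive_jackson_jars` is hypothetical and this is not the code from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): move Hive's legacy Jackson 1.x
# ("-asl") jars from <distdir>/jars into <distdir>/hive-jackson, creating
# that directory only when at least one such jar is present.
move_hive_jackson_jars() {
  local distdir="$1"
  local f
  for f in "$distdir"/jars/jackson-*-asl-*.jar; do
    # Without nullglob, an unmatched glob is passed through literally;
    # skip it so no empty hive-jackson directory is created.
    [ -e "$f" ] || continue
    mkdir -p "$distdir/hive-jackson"
    # Quote "$f" so paths containing spaces are moved correctly.
    mv "$f" "$distdir/hive-jackson/"
  done
}
```

For example, `move_hive_jackson_jars "$DISTDIR"` would move `jackson-core-asl-*.jar` and `jackson-mapper-asl-*.jar` while leaving Jackson 2.x jars such as `jackson-core-2.*.jar` in `jars`. The `[ -e "$f" ] || continue` guard is used instead of `shopt -s nullglob` so the helper does not change shell-wide globbing behavior for the rest of the script.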
