dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if they exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv $f "$DISTDIR"/hive-jackson/
+done

Review Comment:
   There are four main benefits, much like those of the `yarn` directory, @viirya.
   
   1. **Recoverability**: Spark 3 users can already achieve the same goal by 
manually deleting those two files from Spark's `jars` directory. However, 
it's difficult to recover the deleted files when a production job fails due to 
a Hive UDF. This PR provides a more robust and safer way via a configuration.
   
   2. **Communication**: We (and the security team) can easily communicate that 
`hive-jackson` is not used, as with the `yarn` directory, because it's 
physically split from the rest of the distribution. Also, they can delete the 
directory easily (if they need to) without knowing the details of the 
dependencies inside it.
   
   3. **Robustness**: If Apache Spark has everything in `jars`, it's difficult 
to prevent those jars from being loaded. Of course, we could choose a tricky 
way to filter them out of the class file list via a naming pattern, but that is 
still less robust in the long term.
   
   4. **Compatibility with `hive-jackson-provided`**: Together with the existing 
`hive-jackson-provided`, this provides a cleaner injection point for the 
provided dependencies. For example, custom-built Jackson dependencies can 
be placed in `hive-jackson` instead of `jars`. We would be very reluctant to 
see anyone put custom jar files directly into Apache Spark's `jars` directory.
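
To make the physical split concrete, here is a minimal sketch of moving the matching jars into a sibling `hive-jackson` directory and creating that directory only when at least one jar is present. The function name `split_hive_jackson` and its argument are hypothetical and used for illustration only; this is not the code in the PR. It assumes a `bash` shell and a distribution directory that contains a `jars` subdirectory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only, not part of the PR): move the
# legacy Jackson jars out of jars/ into a sibling hive-jackson/ directory.
split_hive_jackson() {
  local distdir="$1"
  local f
  for f in "$distdir"/jars/jackson-*-asl-*.jar; do
    # If the glob matches nothing, bash leaves the literal pattern in $f,
    # so skip anything that is not an existing file.
    [ -e "$f" ] || continue
    # Create the directory lazily, only once a matching jar is found.
    mkdir -p "$distdir/hive-jackson"
    mv "$f" "$distdir/hive-jackson/"
  done
}
```

It could be called as `split_hive_jackson "$DISTDIR"`. With this layout, removing the split-out dependencies is a matter of deleting one directory, and the jars left in `jars` are untouched.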



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
