cloud-fan commented on code in PR #56453:
URL: https://github.com/apache/spark/pull/56453#discussion_r3398069261


##########
dev/make-distribution.sh:
##########
@@ -259,9 +268,14 @@ if [ "$MAKE_PIP" == "true" ]; then
   pushd "$SPARK_HOME/python" > /dev/null
   # Delete the egg info file if it exists, this can cache older setup files.
   rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
+  # Ship the Apache LICENSE and NOTICE inside the PySpark source distributions
+  # (see MANIFEST.in). These are removed again after the sdists are built.
+  cp "$SPARK_HOME/LICENSE" LICENSE

Review Comment:
   The classic pyspark sdist bundles the full jar set 
(`packaging/classic/setup.py` symlinks the assembly jars into `deps/jars`), and 
for exactly that situation the binary dist ships 
`LICENSE-binary`/`NOTICE-binary` (lines 250-253 above). Should the classic 
sdist get the `-binary` variants instead, so the bundled jars' licenses are 
actually covered? The other three artifacts bundle no jars, so the source files 
are right for them.
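
Below is a sketch of what a per-artifact choice could look like. It assumes that `LICENSE-binary` and `NOTICE-binary` sit next to `LICENSE` and `NOTICE` in `$SPARK_HOME`, which is where the binary-dist step referenced above presumably reads them from. `stage_license_files` is a hypothetical helper and does not exist in `make-distribution.sh`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy LICENSE<suffix> and NOTICE<suffix> from src_dir
# into dest_dir as plain LICENSE and NOTICE, so MANIFEST.in picks them up.
#   $1 src_dir   directory holding LICENSE, NOTICE, LICENSE-binary, NOTICE-binary
#   $2 dest_dir  package directory the sdist is built from
#   $3 suffix    "" for the source texts, "-binary" for the jar-bundling build
stage_license_files() {
  local src_dir=$1 dest_dir=$2 suffix=${3:-}
  local name
  for name in LICENSE NOTICE; do
    if [ ! -f "$src_dir/$name$suffix" ]; then
      echo "missing $src_dir/$name$suffix" >&2
      return 1
    fi
    cp "$src_dir/$name$suffix" "$dest_dir/$name"
  done
}
```

In the pip branch, you would call `stage_license_files "$SPARK_HOME" . -binary` before `packaging/classic/setup.py sdist`, then call `stage_license_files "$SPARK_HOME" .` before the connect and client builds. The existing `rm -f LICENSE NOTICE` cleanup stays the same.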



##########
dev/make-distribution.sh:
##########
@@ -32,6 +32,15 @@ set -x
 SPARK_HOME="$(cd "`dirname "$0"`/.."; pwd)"
 DISTDIR="$SPARK_HOME/dist"
 
+# The Apache LICENSE and NOTICE are copied into the Python and R package
+# directories below so they are bundled into the source distributions. Remove
+# them on exit so a failed build does not leave stray files behind.
+function cleanup_dist_license_files {
+  rm -f "$SPARK_HOME/python/LICENSE" "$SPARK_HOME/python/NOTICE" \

Review Comment:
   If the script is killed by a signal that cannot be trapped (e.g. SIGKILL), these copies survive as untracked
files that can be committed by accident. `python/.gitignore` already ignores
the analogous transient copies `./setup.py`/`./setup.cfg`, so consider adding
`LICENSE`/`NOTICE` there and `pkg/LICENSE`/`pkg/NOTICE` to `R/.gitignore`.
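
Here is a minimal sketch of that failure mode, runnable outside the Spark tree. The diff's comment says the copies are removed "on exit", which suggests an EXIT trap, and this sketch assumes so. A child shell creates a file and registers an EXIT trap that deletes it. A normal exit or `exit 1` runs the trap, but SIGKILL does not, so the file is left behind. The helper name `run_child` and the scratch paths are illustrative only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory; stands in for $SPARK_HOME/python in this sketch.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical helper. Creates $1 in a child bash that registers an EXIT trap
# to delete it, then ends the child according to $2:
#   ok   -> exit 0   (trap runs)
#   fail -> exit 1   (trap runs)
#   kill -> SIGKILL to itself (cannot be trapped, so the file survives)
run_child() {
  local file=$1 mode=$2
  bash -c '
    file=$1
    mode=$2
    trap "rm -f \"$file\"" EXIT
    touch "$file"
    case "$mode" in
      ok)   exit 0 ;;
      fail) exit 1 ;;
      kill) kill -KILL $$ ;;
    esac
  ' _ "$file" "$mode" || true
}

for mode in ok fail kill; do
  f="$workdir/LICENSE.$mode"
  run_child "$f" "$mode"
  if [ -e "$f" ]; then
    echo "$mode: $f was left behind"
  else
    echo "$mode: cleaned up"
  fi
done
```

Only the `kill` case leaves a file behind. A `.gitignore` entry is the only thing that keeps such a leftover out of `git status`.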



##########
dev/make-distribution.sh:
##########
@@ -272,9 +286,14 @@ if [ "$MAKE_R" == "true" ]; then
   echo "Building R source package"
   R_PACKAGE_VERSION=`grep Version "$SPARK_HOME/R/pkg/DESCRIPTION" | awk '{print $NF}'`
   pushd "$SPARK_HOME/R" > /dev/null
+  # Ship the Apache LICENSE and NOTICE inside the SparkR source package. These
+  # are removed again after the package is built.
+  cp "$SPARK_HOME/LICENSE" pkg/LICENSE

Review Comment:
   Heads-up: with these files in `pkg/`, `R CMD check --as-cran` gains a NOTE — 
reproduced on R 4.5.2: "File LICENSE is not mentioned in the DESCRIPTION file" 
and "Non-standard file/directory found at top level: 'NOTICE'". Nothing breaks: 
the check still exits 0, and SparkR CI runs `check-cran.sh` without these files 
so the NOTE budget in `R/run-tests.sh` is unaffected. It will show up in 
release-build logs though; mentioning `file LICENSE` in DESCRIPTION would 
silence the first half.
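
The release build could also flag this before `R CMD check` does. A rough pre-flight check might look like the sketch below. It approximates only the first half of the NOTE, since R's own check is more thorough. It reads the `License:` field including indented continuation lines. `license_field` and `check_license_mentioned` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the License: field of an R DESCRIPTION file, including any indented
# continuation lines (DCF fields may span several lines).
license_field() {
  awk '
    /^License:/          { in_field = 1; print; next }
    in_field && /^[ \t]/ { print; next }
                         { in_field = 0 }
  ' "$1"
}

# Hypothetical check: if <pkg_dir>/LICENSE exists, the License field should
# mention "file LICENSE". Returns 1 (with a message on stderr) otherwise.
check_license_mentioned() {
  local pkg_dir=$1
  # No LICENSE shipped, so there is nothing to mention.
  [ -f "$pkg_dir/LICENSE" ] || return 0
  # grep without -q reads all input, so pipefail cannot trip on SIGPIPE.
  if license_field "$pkg_dir/DESCRIPTION" | grep -F 'file LICENSE' >/dev/null; then
    return 0
  fi
  echo "$pkg_dir/LICENSE is not mentioned in $pkg_dir/DESCRIPTION" >&2
  return 1
}
```

A call such as `check_license_mentioned "$SPARK_HOME/R/pkg"` right after the `cp` into `pkg/` would surface the mismatch earlier than the check log.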



##########
dev/make-distribution.sh:
##########
@@ -259,9 +268,14 @@ if [ "$MAKE_PIP" == "true" ]; then
   pushd "$SPARK_HOME/python" > /dev/null
   # Delete the egg info file if it exists, this can cache older setup files.
   rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
+  # Ship the Apache LICENSE and NOTICE inside the PySpark source distributions
+  # (see MANIFEST.in). These are removed again after the sdists are built.
+  cp "$SPARK_HOME/LICENSE" LICENSE
+  cp "$SPARK_HOME/NOTICE" NOTICE
   python3 packaging/classic/setup.py sdist
   python3 packaging/connect/setup.py sdist
   python3 packaging/client/setup.py sdist
+  rm -f LICENSE NOTICE

Review Comment:
   This gap shipped for years and was only caught by a -1 in an RC vote. A cheap
guard after the sdist builds would fail the release build if the gap recurs, e.g.
`for f in dist/pyspark*.tar.gz; do tar tzf "$f" | grep -q '/LICENSE$' || { echo "$f missing LICENSE"; exit 1; }; done`
(and the same for NOTICE and the SparkR tarball).
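
Below is a slightly more thorough version of that guard, written as a hypothetical `verify_license_files` helper. It checks LICENSE and NOTICE in one pass and accepts them only directly under the tarball's top-level directory (e.g. `pyspark-X.Y.Z/LICENSE`). It also saves the listing before grepping. That last point matters if the script runs with `set -o pipefail`: `grep -q` can exit on the first match while `tar` is still writing. The resulting SIGPIPE would then make the pipeline fail even though the file is present.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: return 1 unless every tarball given has LICENSE and
# NOTICE directly under its top-level directory; report each gap on stderr.
verify_license_files() {
  local tarball name listing status=0
  for tarball in "$@"; do
    # List once; grep then reads the saved listing, so no SIGPIPE race.
    listing=$(tar tzf "$tarball") || { echo "cannot list $tarball" >&2; return 1; }
    for name in LICENSE NOTICE; do
      if ! printf '%s\n' "$listing" | grep -E "^[^/]+/$name\$" >/dev/null; then
        echo "$tarball is missing $name" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

You could call it as `verify_license_files dist/pyspark*.tar.gz` inside the `pushd "$SPARK_HOME/python"` block, and the same way on the SparkR tarball (the exact paths are assumptions). Under `set -e`, a non-zero return aborts the build. If the glob matches nothing, `tar` fails on the literal pattern, which also fails the check.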



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
