cloud-fan commented on code in PR #56453:
URL: https://github.com/apache/spark/pull/56453#discussion_r3398069261
##########
dev/make-distribution.sh:
##########
@@ -259,9 +268,14 @@ if [ "$MAKE_PIP" == "true" ]; then
pushd "$SPARK_HOME/python" > /dev/null
# Delete the egg info file if it exists, this can cache older setup files.
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
+ # Ship the Apache LICENSE and NOTICE inside the PySpark source distributions
+ # (see MANIFEST.in). These are removed again after the sdists are built.
+ cp "$SPARK_HOME/LICENSE" LICENSE
Review Comment:
The classic pyspark sdist bundles the full jar set
(`packaging/classic/setup.py` symlinks the assembly jars into `deps/jars`), and
for exactly that situation the binary dist ships
`LICENSE-binary`/`NOTICE-binary` (lines 250-253 above). Should the classic
sdist get the `-binary` variants instead, so the bundled jars' licenses are
actually covered? The other three artifacts bundle no jars, so the source files
are right for them.
##########
dev/make-distribution.sh:
##########
@@ -32,6 +32,15 @@ set -x
SPARK_HOME="$(cd "`dirname "$0"`/.."; pwd)"
DISTDIR="$SPARK_HOME/dist"
+# The Apache LICENSE and NOTICE are copied into the Python and R package
+# directories below so they are bundled into the source distributions. Remove
+# them on exit so a failed build does not leave stray files behind.
+function cleanup_dist_license_files {
+ rm -f "$SPARK_HOME/python/LICENSE" "$SPARK_HOME/python/NOTICE" \
Review Comment:
If the script is killed by a signal that cannot be trapped (e.g. SIGKILL), the
cleanup function never runs and these copies survive as untracked files that
can get committed by accident. `python/.gitignore` already ignores the
analogous transient copies `./setup.py`/`./setup.cfg`; consider adding
`LICENSE`/`NOTICE` there, and `pkg/LICENSE`/`pkg/NOTICE` to `R/.gitignore`.
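To illustrate the failure mode, here is a minimal sketch (not code from this
PR) showing that an `EXIT` trap does not run when the shell receives SIGKILL.
The helper name `leftover_after_sigkill` and the temp-directory layout are
made up for the demo.

```shell
# Demo only: hypothetical helper, not part of make-distribution.sh.
# Reports whether a file guarded by an EXIT trap survives a SIGKILL.
leftover_after_sigkill() {
  demo_dir="$(mktemp -d)"
  # Child shell: register an EXIT trap that deletes the copy, create the
  # copy, then send SIGKILL to itself. $0 is the directory passed below.
  sh -c '
    trap "rm -f \"$0/LICENSE\"" EXIT
    touch "$0/LICENSE"
    kill -s KILL $$
  ' "$demo_dir" 2>/dev/null || true
  if [ -e "$demo_dir/LICENSE" ]; then
    echo "left behind"
  else
    echo "cleaned up"
  fi
  rm -rf "$demo_dir"
}

leftover_after_sigkill   # prints: left behind
```

Since the trap cannot help in that case, the ignore entries are the only
thing keeping the stray copies out of `git status`.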
##########
dev/make-distribution.sh:
##########
@@ -272,9 +286,14 @@ if [ "$MAKE_R" == "true" ]; then
echo "Building R source package"
R_PACKAGE_VERSION=`grep Version "$SPARK_HOME/R/pkg/DESCRIPTION" | awk '{print $NF}'`
pushd "$SPARK_HOME/R" > /dev/null
+ # Ship the Apache LICENSE and NOTICE inside the SparkR source package. These
+ # are removed again after the package is built.
+ cp "$SPARK_HOME/LICENSE" pkg/LICENSE
Review Comment:
Heads-up: with these files in `pkg/`, `R CMD check --as-cran` reports an
additional NOTE (reproduced on R 4.5.2): "File LICENSE is not mentioned in the
DESCRIPTION file" and "Non-standard file/directory found at top level:
'NOTICE'". Nothing breaks: the check still exits 0, and SparkR CI runs
`check-cran.sh` without these files, so the NOTE budget in `R/run-tests.sh` is
unaffected. It will show up in release-build logs though; mentioning
`file LICENSE` in DESCRIPTION would silence the first of the two messages.
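A quick way to tell whether the first message would still fire is to look at
the `License:` field. A minimal sketch, assuming the field fits on a single
line; `description_mentions_license_file` is a hypothetical helper, not
something that exists under `R/`.

```shell
# Hypothetical helper: succeeds if the License field of the given
# DESCRIPTION file mentions "file LICENSE".
# Assumes the License field is not folded across continuation lines.
description_mentions_license_file() {
  grep -Eq '^License:.*file LICENSE' "$1"
}
```

For example, `description_mentions_license_file "$SPARK_HOME/R/pkg/DESCRIPTION" || echo "expect the LICENSE NOTE"`.
This does nothing for the top-level `NOTICE` message.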
##########
dev/make-distribution.sh:
##########
@@ -259,9 +268,14 @@ if [ "$MAKE_PIP" == "true" ]; then
pushd "$SPARK_HOME/python" > /dev/null
# Delete the egg info file if it exists, this can cache older setup files.
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
+ # Ship the Apache LICENSE and NOTICE inside the PySpark source distributions
+ # (see MANIFEST.in). These are removed again after the sdists are built.
+ cp "$SPARK_HOME/LICENSE" LICENSE
+ cp "$SPARK_HOME/NOTICE" NOTICE
python3 packaging/classic/setup.py sdist
python3 packaging/connect/setup.py sdist
python3 packaging/client/setup.py sdist
+ rm -f LICENSE NOTICE
Review Comment:
This gap shipped for years and was only caught by a -1 on an RC vote. A cheap
guard after the sdist builds would fail the release build if it recurs, e.g.
`for f in dist/pyspark*.tar.gz; do tar tzf "$f" | grep -q '/LICENSE$' || { echo
"$f missing LICENSE"; exit 1; }; done` (and the same for NOTICE and for the
SparkR tarball).
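If a reusable form is preferred, the guard could look roughly like the sketch
below. `check_tarball_has_files` is a hypothetical name, and the pattern
assumes each tarball unpacks into a single top-level directory, so it is
stricter than the `'/LICENSE$'` match above and ignores same-named files
deeper in the tree.

```shell
# Hypothetical helper, not part of make-distribution.sh.
# Usage: check_tarball_has_files <tarball> <name>...
# Returns non-zero if any <name> is missing from the tarball's
# top-level directory.
check_tarball_has_files() {
  tarball="$1"
  shift
  # List the archive once; fail if tar cannot read it.
  listing="$(tar tzf "$tarball")" || return 1
  for name in "$@"; do
    # Match "<top-level-dir>/<name>" exactly. Redirecting to /dev/null
    # instead of using -q makes grep read the whole listing, which avoids
    # a SIGPIPE on the producer if the caller runs with pipefail.
    if ! printf '%s\n' "$listing" | grep -E "^[^/]+/${name}\$" > /dev/null; then
      echo "$tarball is missing $name" >&2
      return 1
    fi
  done
}
```

Usage after the sdist builds would be something like
`for f in dist/pyspark*.tar.gz; do check_tarball_has_files "$f" LICENSE NOTICE || exit 1; done`.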
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]