zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
URL: https://github.com/apache/spark/pull/27359#issuecomment-578581851

> Please note that I'm supporting your effort on this PR. Otherwise, I'll not chime in here to add comments.

Thank you, I appreciate that.

In general, the full reproducible setup is defined by the Dockerfile shown at the beginning, but to put it here for reference:

```
FROM rocker/verse:3.4.3

RUN apt-get update \
    && apt-get install -y --no-install-recommends gpg openjdk-8-jdk-headless \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN wget -qO- https://keybase.io/zero323/pgp_keys.asc | gpg --import
RUN git clone --depth 1 --branch SPARK-23435 https://github.com/zero323/spark.git
WORKDIR spark
RUN git rev-parse HEAD
RUN git verify-commit -v HEAD
RUN build/mvn -DskipTests -Phive -Psparkr clean package
RUN R --version
RUN R -e "install.packages(c('e1071', 'praise'))"
RUN R -e "install.packages('testthat', repos='https://cloud.r-project.org/'); packageVersion('testthat'); sessionInfo()"
RUN R/create-rd.sh
RUN R/create-docs.sh
RUN R/check-cran.sh
RUN R/run-tests.sh
```

It can be re-run to confirm that it reflects the current state of things. As shown in the cast, builds are done directly from the head of this branch (the signature is verified), with no changes to the codebase beyond what is proposed in this PR (and we don't touch any Arrow-related components here at all).

As for skipping Arrow tests: that's the default behavior defined in the respective test, for example here https://github.com/apache/spark/blob/43d9c7e7e57749ee611e0c97781a71a0645b5e9b/R/pkg/tests/fulltests/test_sparkSQL_arrow.R#L25 and the following lines. So it is neither a failure nor the result of any source modification.

Can we make the Arrow tests run? Possibly, but:

- The R Arrow package is not present in the snapshot repositories used by rocker images. Installing testthat from https://cloud.r-project.org already pushed things a lot. Additionally, some transitive dependencies have hidden version bounds.
- C++ Arrow bindings would require external system repositories, which can break dependencies for R.
- Using other images (let's say the official R-base) is not an option, as we need TeX as well as the OpenSSL and Curl dev libraries, and this will either break or require an update of R beyond 2.4 (at least it did for the other build configurations I considered).

At this point Spark has no coverage for any intermediate R version (Jenkins runs 3.1, and then there is a gap worth almost eight years of releases to 3.6 on AppVeyor), not to mention version-OS combinations. That's troubling and, as the work related to this PR has shown, can miss obvious errors, but it is not something that can really be addressed by running ad-hoc tests outside the project infrastructure.

Anyway... If you have specific concerns about the process used here, and you suspect that the proposed changes can lead to problems in the future, I'll do my best to address them.
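
For anyone who wants to re-run the Dockerfile quoted in this comment, a minimal sketch of one way to do it follows. The function name `rebuild_image`, the default tag `sparkr-testthat-check`, and the build-context directory are hypothetical and not part of the PR. The sketch assumes the Dockerfile is saved as `Dockerfile` in the context directory. Because the `git clone` happens inside a `RUN` step, a cached layer could hide newer commits on the branch, which is why the sketch disables the cache.

```shell
#!/usr/bin/env bash
# Sketch only: rebuild the image from scratch so that every RUN step
# (clone, signature check, Maven build, R/check-cran.sh, R/run-tests.sh)
# is executed again. Names and defaults below are hypothetical.

rebuild_image() {
    # Directory holding the Dockerfile (assumption: current directory by default).
    local context_dir="${1:-.}"
    # Tag for the resulting image (hypothetical default).
    local image_tag="${2:-sparkr-testthat-check}"

    # --no-cache makes Docker re-run every step instead of reusing cached layers;
    # if any RUN step fails, the build (and this function) exits non-zero.
    docker build --no-cache --tag "$image_tag" "$context_dir"
}
```

It could be invoked as `rebuild_image . sparkr-testthat-check` from the directory containing the Dockerfile; the exit status then reflects whether all steps, including the test scripts, completed.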
